
Intel Processors Machine Check Architectures in Microsoft Windows

Microsoft Windows generic hardware abstraction layers (HALs) for Intel architectures (halx86.dll, halapic.dll, halmps.dll, halia64.dll) support the Machine Check Architectures (MCA) of the Intel Pentium® Pro and Itanium™ processors. The HAL enables Machine Check Exception (MCE) reporting for all implementation-defined errors.

Intel Pentium Pro Processor Machine Check

The Machine Check Exception (MCE) is processor exception 18. The handler for MCE is implemented as a task gate for maximum reliability of the exception handler. The HAL provides a generic exception handler for all errors that cause an exception. This handler reports the machine check exception code on the screen and causes the operating system to halt gracefully, reducing the possibility of persistent data corruption.

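In addition, the HAL provides an MCA-specific interface that drivers can use to:

  1. Register an MCA driver and its callback routines with the HAL (HalSetSystemInformation).
  2. Read the MCA banks' status registers (HalQuerySystemInformation).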

Machine Check Exception Handling

If the MCA exception handler detects only Intel Pentium®-style MCE support on the platform, it does the following:

If the platform supports MCA (Pentium Pro processor), the exception handler determines whether the error is restartable. If it is not, the handler does the following:

If the error is restartable, the exception handler queues a DPC that, when it runs, reports the MCA bank error to the MCA driver through the DpcCallback routine.
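The branching described in this section can be modeled as a small decision function. This is a hypothetical sketch, not HAL source: it assumes that "restartable" is judged from the RIPV (restart IP valid) flag, bit 0 of the IA32_MCG_STATUS register defined in Intel's architecture manuals, and the names McePath and ClassifyMce are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

/* IA32_MCG_STATUS bit 0 (RIPV): execution can restart at the saved IP.
   Assumption: the HAL's restartability test is based on this flag. */
#define MCG_STATUS_RIPV (1ULL << 0)

/* Hypothetical names for the handler paths described in the text. */
typedef enum {
    MCE_PATH_PENTIUM_STYLE,          /* only Pentium-style MCE support detected */
    MCE_PATH_CALLBACK_THEN_BUGCHECK, /* MCA, nonrestartable: ExceptionCallback, then bugcheck */
    MCE_PATH_QUEUE_DPC               /* MCA, restartable: a DPC later invokes DpcCallback */
} McePath;

/* Choose a path from the platform capability and the MCG_STATUS value. */
McePath ClassifyMce(bool mcaSupported, uint64_t mcgStatus)
{
    if (!mcaSupported) {
        return MCE_PATH_PENTIUM_STYLE;
    }
    return (mcgStatus & MCG_STATUS_RIPV) ? MCE_PATH_QUEUE_DPC
                                         : MCE_PATH_CALLBACK_THEN_BUGCHECK;
}
```

A real handler would read IA32_MCG_STATUS with RDMSR inside the exception handler; the sketch takes the value as a parameter so the decision logic can be exercised in user mode.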

Intel Itanium Processor Machine Check

Machine checks, including Machine Check Aborts, cause Itanium™ processor execution to vector to the Processor Abstraction Layer (PAL) PALE_CHECK code in the Itanium ISA. When PALE_CHECK has finished processing, it passes control to the System Abstraction Layer (SAL) SAL_ENTRY code in the Itanium ISA, which in turn branches to the SAL MCA handler: SAL_CHECK.

Uncorrected machine checks are errors that cannot be corrected at the PAL or SAL layer. They may still be fully or partially recoverable at the operating system layer. From this point on, the control flow differs between corrected and uncorrected machine checks.

For corrected machine checks, the operating system's corrected-error interrupt handlers are invoked some time after control returns to the interrupted process.

For uncorrected machine checks, SAL exposes an interface for registering an OS_MCA callback. After validating this entry point, SAL_CHECK branches to it and passes an error record that allows the operating system to recover whenever possible. The error record passed by SAL must comply, at a minimum, with the V3.0 SAL specification, Error Record Structures, Appendix B, January 2001. The HAL exposes interfaces that let OEMs register a driver, and it passes the error record to that driver. This lets OEMs assist the generic HAL MCA handler by attempting to recover from platform-specific errors and by maintaining the integrity of the platform.

For details on the Itanium PAL, SAL, and operating system MCA handlers, refer to http://www.intel.com/design/ia-64/manuals.

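The Itanium reference HAL provides an MCA-specific interface that drivers can use to:

  1. Register MCA, corrected CPU error, and corrected platform error callback routines with the HAL (HalSetSystemInformation).
  2. Read the current MCA, corrected CPU error, and corrected platform error logs (HalQuerySystemInformation).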

Machine Check Exception Handling

After collecting the MCA log, the standard HAL MCA handler calls the MCA driver's ExceptionCallback routine, passing it the MCA record. This lets the MCA driver process the log and assess the stability of the system. The callback returns an error severity value that tells the HAL whether to treat the event as fatal, recoverable, or corrected by the MCA driver. If the event was corrected and the MCA driver has registered a DpcCallback, the HAL then calls that DpcCallback so the driver can collect the log asynchronously.
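The contract in this paragraph, where the driver returns a severity and the HAL acts on it, can be sketched as follows. The severity enum, the simplified McaRecord type, and both function names are hypothetical stand-ins: ntddk.h defines its own severity type, and the real record follows the SAL error record format. The text does not say what the HAL does with a recoverable result, so the sketch only labels it as an attempt at recovery.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity values mirroring the three outcomes in the text. */
typedef enum {
    SEVERITY_FATAL,
    SEVERITY_RECOVERABLE,
    SEVERITY_CORRECTED
} ErrorSeverity;

/* Hypothetical, heavily simplified view of an uncorrected MCA record. */
typedef struct {
    bool contextCorrupt;   /* processor state can no longer be trusted */
    bool driverCanRepair;  /* platform-specific error the OEM driver can fix */
} McaRecord;

/* Driver-side ExceptionCallback sketch. Like the real callback, it must not
   call kernel services or acquire spin locks; it only inspects the record. */
ErrorSeverity DriverExceptionCallback(void *deviceContext, const McaRecord *record)
{
    (void)deviceContext;  /* unused in this sketch */
    if (record->contextCorrupt) {
        return SEVERITY_FATAL;
    }
    if (record->driverCanRepair) {
        return SEVERITY_CORRECTED;  /* "corrected by the MCA driver" */
    }
    return SEVERITY_RECOVERABLE;
}

/* Hypothetical HAL-side reaction to the returned severity. */
typedef enum {
    HAL_ACTION_BUGCHECK,
    HAL_ACTION_ATTEMPT_RECOVERY,
    HAL_ACTION_RESUME,
    HAL_ACTION_QUEUE_DPC
} HalAction;

HalAction HalActionForSeverity(ErrorSeverity severity, bool dpcRegistered)
{
    switch (severity) {
    case SEVERITY_FATAL:
        return HAL_ACTION_BUGCHECK;
    case SEVERITY_CORRECTED:
        /* Asynchronous log collection only if a DpcCallback was registered. */
        return dpcRegistered ? HAL_ACTION_QUEUE_DPC : HAL_ACTION_RESUME;
    case SEVERITY_RECOVERABLE:
    default:
        return HAL_ACTION_ATTEMPT_RECOVERY;  /* assumption: not specified in the text */
    }
}
```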

For an uncorrected OS_MCA event, the HAL halts the system by calling KeBugCheckEx with the bugcheck code MACHINE_CHECK_EXCEPTION and the following four parameters:

  1. HAL Itanium™ MCA type, which can be one of the following values:

    HAL_BUGCHECK_MCA_ASSERT = 1

    HAL_BUGCHECK_MCA_GET_STATEINFO = 2

    HAL_BUGCHECK_MCA_CLEAR_STATEINFO = 3

    HAL_BUGCHECK_MCA_FATAL = 4

    The MCA driver should expect only the last value, HAL_BUGCHECK_MCA_FATAL; the other values indicate HAL-internal errors.

  2. MCA log address.
  3. MCA maximum log size.
  4. SAL status returned by the last SAL interface call.
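A crash-dump analysis tool might decode the four bugcheck parameters as sketched below. The constant values come from the list above; the McaBugcheckParams structure and the helper names are hypothetical, and the parameters are treated as 64-bit values because Itanium is a 64-bit platform.

```c
#include <stdint.h>

/* Parameter 1 values for the HAL Itanium MCA bugcheck. */
#define HAL_BUGCHECK_MCA_ASSERT          1
#define HAL_BUGCHECK_MCA_GET_STATEINFO   2
#define HAL_BUGCHECK_MCA_CLEAR_STATEINFO 3
#define HAL_BUGCHECK_MCA_FATAL           4

/* Hypothetical container for the four MACHINE_CHECK_EXCEPTION parameters. */
typedef struct {
    uint64_t mcaType;    /* 1: HAL Itanium MCA type */
    uint64_t logAddress; /* 2: MCA log address */
    uint64_t maxLogSize; /* 3: MCA maximum log size */
    uint64_t salStatus;  /* 4: status of the last SAL interface call */
} McaBugcheckParams;

McaBugcheckParams ParseMcaBugcheck(const uint64_t args[4])
{
    McaBugcheckParams p = { args[0], args[1], args[2], args[3] };
    return p;
}

const char *McaTypeName(uint64_t type)
{
    switch (type) {
    case HAL_BUGCHECK_MCA_ASSERT:          return "HAL_BUGCHECK_MCA_ASSERT";
    case HAL_BUGCHECK_MCA_GET_STATEINFO:   return "HAL_BUGCHECK_MCA_GET_STATEINFO";
    case HAL_BUGCHECK_MCA_CLEAR_STATEINFO: return "HAL_BUGCHECK_MCA_CLEAR_STATEINFO";
    case HAL_BUGCHECK_MCA_FATAL:           return "HAL_BUGCHECK_MCA_FATAL";
    default:                               return "unknown";
    }
}

/* Values 1 to 3 are HAL-internal errors; only 4 is expected on the MCA driver path. */
int IsHalInternalMcaError(uint64_t type)
{
    return type >= HAL_BUGCHECK_MCA_ASSERT && type <= HAL_BUGCHECK_MCA_CLEAR_STATEINFO;
}
```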

MCA Interface for Drivers

The Intel generic HALs provide drivers with the following MCA-specific interface for Intel Pentium® Pro and Itanium™ processors:

HalSetSystemInformation

HalSetSystemInformation can be used to register an MCA driver with the HAL.

NTSTATUS
HalSetSystemInformation(
    IN HAL_SET_INFORMATION_CLASS  InformationClass,
    IN ULONG  BufferSize,
    IN PVOID  Buffer
    );

Parameters

InformationClass
Specify HalMcaRegisterDriver to register an MCA driver's callback routines with the HAL. There are two callback routines: ExceptionCallback and DpcCallback. The ExceptionCallback routine is called while the Machine Check Exception handler processes a nonrestartable error, before the handler issues a bugcheck. The DpcCallback routine is called when the MCA error is restartable. For Itanium™ systems, specify HalCmcRegisterDriver to register a driver's Corrected CPU Error DpcCallback routine, and HalCpeRegisterDriver to register a driver's Corrected Platform Error DpcCallback routine.
BufferSize
Specifies the size, in bytes, of the buffer supplied by the caller.
Buffer
Pointer to a caller-supplied buffer of type MCA_DRIVER_INFO.
//
// Structure to record the callbacks from driver
//
typedef struct _MCA_DRIVER_INFO {
    PDRIVER_EXCPTN_CALLBACK ExceptionCallback;  // NULL for Itanium corrected error registration
    PKDEFERRED_ROUTINE      DpcCallback;
    PVOID                   DeviceContext;
} MCA_DRIVER_INFO, *PMCA_DRIVER_INFO;
ExceptionCallback
The driver-supplied routine that is called when a Machine Check Exception occurs for an uncorrected error. This routine cannot use any kernel services or spin-lock routines; it is subject to the same constraints as a driver running at the highest IRQL (HIGH_LEVEL).
DpcCallback
A driver-supplied routine that is called for corrected errors that caused a Machine Check Exception. The HAL calls this routine at DISPATCH_LEVEL.
DeviceContext
The device-specific context for this MCA Driver.

Headers

Declared in ntddk.h. Include ntddk.h.

Return Value

HalSetSystemInformation returns STATUS_SUCCESS if the registration is successful.

Comments

HalSetSystemInformation must be called before an MCA driver can use any of the other interface routines. Only one MCA driver can be registered with the HAL at any given time.
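The two rules in these comments (register first, and only one MCA driver at a time) can be illustrated with a user-mode mock. Everything here is hypothetical: MockMcaDriverInfo only mirrors the three MCA_DRIVER_INFO fields with simplified callback types, and rejecting a second registration, rather than replacing the first, is an assumption made for illustration.

```c
#include <stddef.h>

/* Simplified callback types; the real ones are PDRIVER_EXCPTN_CALLBACK
   and PKDEFERRED_ROUTINE from ntddk.h. */
typedef void (*MockExceptionCallback)(void *deviceContext, void *record);
typedef void (*MockDpcCallback)(void *deviceContext);

/* Mirrors the three MCA_DRIVER_INFO fields. */
typedef struct {
    MockExceptionCallback exceptionCallback;
    MockDpcCallback       dpcCallback;
    void                 *deviceContext;
} MockMcaDriverInfo;

typedef enum {
    MOCK_STATUS_SUCCESS = 0,
    MOCK_STATUS_INVALID_PARAMETER,
    MOCK_STATUS_ALREADY_REGISTERED,  /* assumption: a second registration is rejected */
    MOCK_STATUS_NOT_REGISTERED
} MockStatus;

static MockMcaDriverInfo g_driver;   /* the single registered MCA driver */
static int g_haveDriver = 0;

/* Mock of HalSetSystemInformation(HalMcaRegisterDriver, BufferSize, Buffer). */
MockStatus MockRegisterMcaDriver(size_t bufferSize, const void *buffer)
{
    if (buffer == NULL || bufferSize < sizeof(MockMcaDriverInfo)) {
        return MOCK_STATUS_INVALID_PARAMETER;
    }
    if (g_haveDriver) {
        return MOCK_STATUS_ALREADY_REGISTERED;
    }
    g_driver = *(const MockMcaDriverInfo *)buffer;
    g_haveDriver = 1;
    return MOCK_STATUS_SUCCESS;
}

/* Mock of any other MCA interface routine: usable only after registration. */
MockStatus MockOtherMcaRoutine(void)
{
    return g_haveDriver ? MOCK_STATUS_SUCCESS : MOCK_STATUS_NOT_REGISTERED;
}
```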

HalQuerySystemInformation

HalQuerySystemInformation can be used to read the MCA banks' status registers.

NTSTATUS 
HalQuerySystemInformation(
    IN HAL_QUERY_INFORMATION_CLASS  InformationClass,
    IN ULONG  BufferSize,
    OUT PVOID  Buffer,
    OUT PULONG  ReturnedLength
    );

Parameters

InformationClass
Specify HalMcaLogInformation to read the current MCA error log. If an uncorrected Machine Check error is found, it is returned in the buffer. For Itanium™ systems, specify HalCmcLogInformation to read the current corrected CPU error log, and HalCpeLogInformation to read the current corrected platform error log.
BufferSize
Specifies the size, in bytes, of the buffer supplied by the caller.
Buffer
Points to a caller-supplied buffer of type MCA_EXCEPTION that receives the information returned by this routine. For Itanium, the returned information must comply, at a minimum, with the V3.0 SAL specification, Error Record Structures, Appendix B, January 2001. For Pentium Pro, the buffer layout is defined by the following structures.
typedef union _MCI_STATS {
    struct {
        USHORT  McaCod;             // bits 15:0, MCA error code
        USHORT  MsCod;              // bits 31:16, model-specific error code
        ULONG   OtherInfo    : 25;  // bits 56:32, other information
        ULONG   Damage       : 1;   // bit 57, processor context corrupt (PCC)
        ULONG   AddressValid : 1;   // bit 58, MCi_ADDR contents valid
        ULONG   MiscValid    : 1;   // bit 59, MCi_MISC contents valid
        ULONG   Enabled      : 1;   // bit 60, error reporting enabled
        ULONG   UnCorrected  : 1;   // bit 61, error not corrected
        ULONG   OverFlow     : 1;   // bit 62, error overflow
        ULONG   Valid        : 1;   // bit 63, register contents valid
    } MciStats;

    ULONGLONG  QuadPart;
} MCI_STATS, *PMCI_STATS;

typedef union _MCI_ADDR {
    struct {
        ULONG  Address;
        ULONG  Reserved;
    } MciAddr;

    ULONGLONG  QuadPart;
} MCI_ADDR, *PMCI_ADDR;

typedef struct _MCA_EXCEPTION {
    ULONG               VersionNumber;    // version number of this record type
    MCA_EXCEPTION_TYPE  ExceptionType;    // MCA or MCE
    LARGE_INTEGER       TimeStamp;        // exception recording timestamp
    ULONG               ProcessorNumber;  // processor number

    union {
        struct {
            UCHAR      BankNumber;  // bank number
            MCI_STATS  Status;      // MCi_STATUS register contents
            MCI_ADDR   Address;     // MCi_ADDR register contents
            ULONGLONG  Misc;        // MCi_MISC register contents
        } Mca;

        struct {
            ULONGLONG  McAddress;   // physical address for the cycle causing the error
            ULONGLONG  McType;      // cycle specification causing the error
        } Mce;
    } u;
} MCA_EXCEPTION, *PMCA_EXCEPTION;
ReturnedLength
Specifies the number of bytes returned in Buffer.
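Because C leaves the allocation order of bit-fields implementation-defined, code that must be portable across compilers can decode MCI_STATS.QuadPart with shifts and masks instead of the MciStats bit-fields. The sketch below does that; the bit positions follow the MCi_STATUS layout in Intel's architecture manuals, while DecodedMciStatus, TestBit, and DecodeMciStatus are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field-by-field view of an MCi_STATUS value (MCI_STATS.QuadPart). */
typedef struct {
    uint16_t mcaCode;      /* bits 15:0  (McaCod) */
    uint16_t msCode;       /* bits 31:16 (MsCod) */
    uint32_t otherInfo;    /* bits 56:32 (OtherInfo, 25 bits) */
    bool     damage;       /* bit 57 (Damage, PCC) */
    bool     addressValid; /* bit 58 (AddressValid) */
    bool     miscValid;    /* bit 59 (MiscValid) */
    bool     enabled;      /* bit 60 (Enabled) */
    bool     uncorrected;  /* bit 61 (UnCorrected) */
    bool     overflow;     /* bit 62 (OverFlow) */
    bool     valid;        /* bit 63 (Valid) */
} DecodedMciStatus;

/* Return true if the given bit of a 64-bit value is set. */
static bool TestBit(uint64_t value, unsigned bit)
{
    return ((value >> bit) & 1u) != 0;
}

/* Split a raw MCi_STATUS value into its fields without relying on bit-fields. */
DecodedMciStatus DecodeMciStatus(uint64_t quadPart)
{
    DecodedMciStatus d;
    d.mcaCode      = (uint16_t)(quadPart & 0xFFFFu);
    d.msCode       = (uint16_t)((quadPart >> 16) & 0xFFFFu);
    d.otherInfo    = (uint32_t)((quadPart >> 32) & 0x1FFFFFFu);
    d.damage       = TestBit(quadPart, 57);
    d.addressValid = TestBit(quadPart, 58);
    d.miscValid    = TestBit(quadPart, 59);
    d.enabled      = TestBit(quadPart, 60);
    d.uncorrected  = TestBit(quadPart, 61);
    d.overflow     = TestBit(quadPart, 62);
    d.valid        = TestBit(quadPart, 63);
    return d;
}
```

A driver would typically check valid first and consult the MCi_ADDR value only when addressValid is set.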

Headers

Declared in ntddk.h. Include ntddk.h.

Return Value

HalQuerySystemInformation returns STATUS_SUCCESS if an error log exists.

Comments

Each call returns only the first available error. The MCA driver is responsible for calling this routine again to determine whether more errors are available.
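The calling pattern in this comment, where the driver keeps querying until no error log is returned, might look like the following. MockQueryMcaLog is a hypothetical user-mode stand-in for HalQuerySystemInformation with HalMcaLogInformation; it assumes that each successful query consumes the error it returns, and it reduces the MCA_EXCEPTION record to a single 64-bit status value.

```c
#include <stddef.h>
#include <stdint.h>

typedef enum { MOCK_SUCCESS = 0, MOCK_NO_ERROR_LOG } MockStatus;

#define MOCK_MAX_PENDING 8

/* Hypothetical queue of logged errors standing in for the MCA banks. */
static uint64_t g_pending[MOCK_MAX_PENDING];
static size_t   g_pendingCount = 0;

/* Test helper: queue an error status as if an MCA bank had logged it. */
int MockInjectError(uint64_t status)
{
    if (g_pendingCount == MOCK_MAX_PENDING) {
        return 0;
    }
    g_pending[g_pendingCount++] = status;
    return 1;
}

/* Hypothetical stand-in for HalQuerySystemInformation(HalMcaLogInformation, ...):
   returns the first pending error and removes it (assumed behavior). */
MockStatus MockQueryMcaLog(uint64_t *statusOut)
{
    if (g_pendingCount == 0) {
        return MOCK_NO_ERROR_LOG;
    }
    *statusOut = g_pending[0];
    for (size_t i = 1; i < g_pendingCount; ++i) {
        g_pending[i - 1] = g_pending[i];
    }
    --g_pendingCount;
    return MOCK_SUCCESS;
}

/* Driver-side pattern: query repeatedly until no more error logs are returned.
   Stops early when the caller's buffer is full, so no error is dropped. */
size_t DrainMcaErrors(uint64_t *out, size_t capacity)
{
    size_t count = 0;
    uint64_t status;
    while (count < capacity && MockQueryMcaLog(&status) == MOCK_SUCCESS) {
        out[count++] = status;
    }
    return count;
}
```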