Microsoft Windows generic hardware abstraction layers (HAL) for Intel architectures (halx86.dll, halapic.dll, halmps.dll, halia64.dll) support the Machine Check Architectures (MCA) for the Intel Pentium® Pro and Itanium™ processors. The HAL enables Machine Check Exception (MCE) reporting for all implementation-defined errors.
The Machine Check Exception (MCE) is processor exception 18. The handler for MCE is implemented as a task gate for maximum reliability of the exception handler. The HAL provides a generic exception handler for all errors that cause an exception. This handler reports the machine check exception code on the screen and causes the operating system to halt gracefully, reducing the possibility of persistent data corruption.
In addition, the HAL provides an MCA-specific interface that drivers can use to:
One case where an error does not generate an exception is when the bit that controls reporting of the machine check error for a specific bank (the MCi_CTL.EEj bit) is cleared. Some corrected errors also do not generate an MCE; they are only logged in the MCA banks.
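The reporting condition described above comes down to a single bit test. The following is a minimal sketch, assuming the MCi_CTL value has already been read into a 64-bit integer; the helper name mci_ctl_error_enabled is hypothetical and reading the register itself is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: returns true when error-reporting bit j (EEj)
 * is set in the given MCi_CTL value, meaning error j in this bank is
 * allowed to raise a machine check exception. */
bool mci_ctl_error_enabled(uint64_t mci_ctl, unsigned j)
{
    if (j >= 64) {
        return false;   /* no such bit in a 64-bit register */
    }
    return ((mci_ctl >> j) & 1u) != 0;
}
```

With a value of 0x5, for example, errors 0 and 2 are reported as exceptions, while error 1 is not.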
If the MCA exception handler detects only Intel Pentium® technology (style) MCE support on the platform, it does the following:
If MCA support (Pentium Pro processor) on the platform is detected, the exception handler determines if the error is restartable. If not, it does the following:
If the error is restartable, the exception handler queues a DPC which, when called, reports the MCA bank error to the MCA driver through the DpcCallback routine.
Machine checks, including Machine Check Aborts, cause Itanium™ processor execution to vector to the Processor Abstraction Layer (PAL) PALE_CHECK code in the Itanium ISA. When PALE_CHECK has finished processing, it passes control to the System Abstraction Layer (SAL) SAL_ENTRY code in the Itanium ISA, which in turn branches to the SAL MCA handler: SAL_CHECK.
Uncorrected machine checks are errors that cannot be corrected at the PAL or SAL layers. They may still be fully or partially recoverable at the operating system layer. From this point, the control flow differs between corrected and uncorrected machine checks.
For corrected machine checks, the operating system corrected error interrupt handlers will be invoked some time after returning to the interrupted process.
For uncorrected machine checks, SAL exposes an interface to register an OS_MCA callback. After validating this entry point, SAL_CHECK branches to it and provides an error record that will allow the operating system to recover whenever possible. The error record passed by SAL must comply, at a minimum, with the V3.0 SAL specification, Error Record Structures, Appendix B, January 2001. The HAL exposes interfaces for the OEMs to register a driver, and provides the error record to the driver. This enables the OEMs to assist the generic HAL MCA handler by attempting recovery of platform-specific errors and maintaining the integrity of the platform.
For details of the Itanium PAL, SAL, and operating system MCA handlers, refer to http://www.intel.com/design/ia-64/manuals.
The Itanium reference HAL provides an MCA-specific interface that can be used by drivers to:
After collecting the MCA log, the standard HAL MCA handler calls the MCA driver's ExceptionCallback function and passes it the MCA record. This allows the MCA driver to process the log and assess the stability of the system. The callback function returns an error severity value that tells the HAL whether to treat the event as fatal, recoverable, or corrected by the MCA driver. For a corrected event, the MCA driver's DpcCallback, if registered, is then called so that the driver can collect the log asynchronously.
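The callback sequence described above can be modeled in user-mode C. This is only a sketch under stated assumptions: the type names (severity_t, mca_driver_t), the enumerator names, and dispatch_mca_record are hypothetical stand-ins rather than the HAL's own definitions, and the DPC callback is called directly here, whereas the HAL would queue a DPC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical severity values standing in for the HAL's own type. */
typedef enum {
    SEVERITY_FATAL,
    SEVERITY_RECOVERABLE,
    SEVERITY_CORRECTED
} severity_t;

typedef severity_t (*exception_callback_t)(void *context, const void *record);
typedef void (*dpc_callback_t)(void *context);

typedef struct {
    exception_callback_t exception_callback;
    dpc_callback_t dpc_callback;    /* NULL when no DPC callback is registered */
    void *context;
} mca_driver_t;

/* Hands the record to the driver, stores the reported severity, and runs
 * the DPC callback only for a corrected event with a registered callback.
 * Returns true when the DPC callback was invoked. */
bool dispatch_mca_record(const mca_driver_t *drv, const void *record,
                         severity_t *severity_out)
{
    severity_t severity = drv->exception_callback(drv->context, record);
    *severity_out = severity;
    if (severity == SEVERITY_CORRECTED && drv->dpc_callback != NULL) {
        drv->dpc_callback(drv->context);
        return true;
    }
    return false;
}

/* Demo driver used to exercise the sketch. */
typedef struct {
    severity_t verdict;   /* severity the demo callback reports */
    int dpc_calls;        /* number of times the DPC callback ran */
} demo_context_t;

severity_t demo_exception_callback(void *context, const void *record)
{
    (void)record;
    return ((demo_context_t *)context)->verdict;
}

void demo_dpc_callback(void *context)
{
    ((demo_context_t *)context)->dpc_calls++;
}
```

In this model a fatal or recoverable verdict never reaches the DPC callback, which mirrors the text: asynchronous log collection applies to corrected events only.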
In case of an OS_MCA uncorrected event, the HAL calls KeBugCheckEx with the bugcheck code MACHINE_CHECK_EXCEPTION and the following four parameters to halt the system:
HAL_BUGCHECK_MCA_ASSERT = 1
HAL_BUGCHECK_MCA_GET_STATEINFO = 2
HAL_BUGCHECK_MCA_CLEAR_STATEINFO = 3
HAL_BUGCHECK_MCA_FATAL = 4
The last value is the only one an MCA driver should expect; the other values are HAL internal error values.
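When examining a crash, a tool may want to tell the HAL internal values apart from the one an MCA driver should expect. The following sketch copies the four values listed above into an enumeration; the two helper functions, hal_bugcheck_mca_name and hal_bugcheck_mca_is_internal, are hypothetical and are not part of the HAL.

```c
#include <stddef.h>

/* Values copied from the list above. */
enum {
    HAL_BUGCHECK_MCA_ASSERT = 1,
    HAL_BUGCHECK_MCA_GET_STATEINFO = 2,
    HAL_BUGCHECK_MCA_CLEAR_STATEINFO = 3,
    HAL_BUGCHECK_MCA_FATAL = 4
};

/* Hypothetical helper: maps a value to its symbolic name. */
const char *hal_bugcheck_mca_name(unsigned long value)
{
    switch (value) {
    case HAL_BUGCHECK_MCA_ASSERT:          return "HAL_BUGCHECK_MCA_ASSERT";
    case HAL_BUGCHECK_MCA_GET_STATEINFO:   return "HAL_BUGCHECK_MCA_GET_STATEINFO";
    case HAL_BUGCHECK_MCA_CLEAR_STATEINFO: return "HAL_BUGCHECK_MCA_CLEAR_STATEINFO";
    case HAL_BUGCHECK_MCA_FATAL:           return "HAL_BUGCHECK_MCA_FATAL";
    default:                               return "unknown";
    }
}

/* Hypothetical helper: nonzero for the values the text describes as
 * HAL internal errors (everything in the list except the last one). */
int hal_bugcheck_mca_is_internal(unsigned long value)
{
    return value >= HAL_BUGCHECK_MCA_ASSERT &&
           value <= HAL_BUGCHECK_MCA_CLEAR_STATEINFO;
}
```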
The Intel generic HALs provide the following Intel Pentium® Pro and Itanium™ technology MCA-specific interface for drivers:
HalSetSystemInformation can be used to register an MCA driver with the HAL.
NTSTATUS
HalSetSystemInformation(
    IN HAL_SET_INFORMATION_CLASS InformationClass,
    IN ULONG BufferSize,
    IN PVOID Buffer
    );
//
// Structure to record the callbacks from driver
//
typedef struct _MCA_DRIVER_INFO {
    PDRIVER_EXCPTN_CALLBACK ExceptionCallback; // NULL for Itanium corrected error registration
    PKDEFERRED_ROUTINE      DpcCallback;
    PVOID                   DeviceContext;
} MCA_DRIVER_INFO, *PMCA_DRIVER_INFO;
Declared in ntddk.h. Include ntddk.h.
HalSetSystemInformation returns STATUS_SUCCESS if the registration is successful.
HalSetSystemInformation must be called before an MCA driver can use any of the other interface routines. Only one MCA driver can be registered with the HAL at any given time.
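The single-registration rule can be pictured with a small user-mode model. This is not the HAL's implementation: every name below (mock_hal_t, mock_mca_driver_info_t, mock_register_mca_driver, mock_mca_interface_available) is hypothetical, and the choice to reject a second registration with -1 is an assumption, since the text does not say which status the HAL returns in that case.

```c
#include <stddef.h>

typedef void (*mock_callback_t)(void *context);

/* Hypothetical stand-in for the driver's registration record. */
typedef struct {
    mock_callback_t exception_callback;  /* may be NULL, as noted for Itanium */
    mock_callback_t dpc_callback;
    void *device_context;
} mock_mca_driver_info_t;

/* Hypothetical model of the HAL's single registration slot. */
typedef struct {
    int registered;
    mock_mca_driver_info_t info;
} mock_hal_t;

/* Returns 0 on success, or -1 when the arguments are invalid or a driver
 * is already registered (assumed behavior; see the paragraph above). */
int mock_register_mca_driver(mock_hal_t *hal, const mock_mca_driver_info_t *info)
{
    if (hal == NULL || info == NULL || hal->registered) {
        return -1;
    }
    hal->info = *info;
    hal->registered = 1;
    return 0;
}

/* The other interface routines are usable only after registration. */
int mock_mca_interface_available(const mock_hal_t *hal)
{
    return hal != NULL && hal->registered;
}
```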
HalQuerySystemInformation can be used to read MCA banks' status registers.
NTSTATUS
HalQuerySystemInformation(
    IN HAL_QUERY_INFORMATION_CLASS InformationClass,
    IN ULONG BufferSize,
    OUT PVOID Buffer,
    OUT PULONG ReturnedLength
    );
typedef union _MCI_STATS {
    struct {
        USHORT McaCod;
        USHORT MsCod;
        ULONG  OtherInfo : 25;
        ULONG  Damage : 1;
        ULONG  AddressValid : 1;
        ULONG  MiscValid : 1;
        ULONG  Enabled : 1;
        ULONG  UnCorrected : 1;
        ULONG  OverFlow : 1;
        ULONG  Valid : 1;
    } MciStats;
    ULONGLONG QuadPart;
} MCI_STATS, *PMCI_STATS;

typedef union _MCI_ADDR {
    struct {
        ULONG Address;
        ULONG Reserved;
    } MciAddr;
    ULONGLONG QuadPart;
} MCI_ADDR, *PMCI_ADDR;

typedef struct _MCA_EXCEPTION {
    ULONG              VersionNumber;   // Version number of this record type
    MCA_EXCEPTION_TYPE ExceptionType;   // MCA or MCE
    LARGE_INTEGER      TimeStamp;       // exception recording timestamp
    ULONG              ProcessorNumber; // processor number
    union {
        struct {
            UCHAR     BankNumber;       // bank number
            MCI_STATS Status;
            MCI_ADDR  Address;
            ULONGLONG Misc;
        } Mca;
        struct {
            ULONGLONG McAddress;        // physical address for the cycle causing the error
            ULONGLONG McType;           // cycle specification causing the error
        } Mce;
    } u;
} MCA_EXCEPTION, *PMCA_EXCEPTION;
Declared in ntddk.h. Include ntddk.h.
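To illustrate how the MCI_STATS fields line up with the 64-bit QuadPart, the sketch below decodes a raw status value with shifts and masks instead of bit fields. It assumes the fields are packed from the least significant bit upward in declaration order, which is the usual layout for Microsoft compilers on x86; the names decoded_status_t and decode_mci_status are hypothetical.

```c
#include <stdint.h>

/* Hypothetical plain-integer view of the MCI_STATS fields. */
typedef struct {
    uint16_t mca_cod;        /* bits 0-15  */
    uint16_t ms_cod;         /* bits 16-31 */
    uint32_t other_info;     /* bits 32-56 (25 bits) */
    unsigned damage;         /* bit 57 */
    unsigned address_valid;  /* bit 58 */
    unsigned misc_valid;     /* bit 59 */
    unsigned enabled;        /* bit 60 */
    unsigned uncorrected;    /* bit 61 */
    unsigned overflow;       /* bit 62 */
    unsigned valid;          /* bit 63 */
} decoded_status_t;

/* Splits a raw 64-bit status value into its fields, assuming the
 * low-to-high packing described in the paragraph above. */
decoded_status_t decode_mci_status(uint64_t quad)
{
    decoded_status_t d;
    d.mca_cod       = (uint16_t)(quad & 0xFFFFu);
    d.ms_cod        = (uint16_t)((quad >> 16) & 0xFFFFu);
    d.other_info    = (uint32_t)((quad >> 32) & 0x1FFFFFFu);
    d.damage        = (unsigned)((quad >> 57) & 1u);
    d.address_valid = (unsigned)((quad >> 58) & 1u);
    d.misc_valid    = (unsigned)((quad >> 59) & 1u);
    d.enabled       = (unsigned)((quad >> 60) & 1u);
    d.uncorrected   = (unsigned)((quad >> 61) & 1u);
    d.overflow      = (unsigned)((quad >> 62) & 1u);
    d.valid         = (unsigned)((quad >> 63) & 1u);
    return d;
}
```

A driver would typically check the valid flag first, since the remaining fields are meaningful only when the bank holds a logged error.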
HalQuerySystemInformation returns STATUS_SUCCESS if an error log exists.
This function returns the first error. It is the MCA driver's responsibility to call this routine again to see if there are any more errors available.
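The "call again until no more errors" pattern can be sketched as a drain loop. The model below is user-mode C and makes two assumptions: each successful query consumes the log it returns, and a nonzero return means no further log is available. All names (mock_record_t, query_fn_t, drain_error_logs, mock_queue_t, mock_queue_query) are hypothetical; in a real driver the query step would be the HalQuerySystemInformation call.

```c
#include <stddef.h>

/* Hypothetical, simplified error record. */
typedef struct {
    unsigned bank;
    unsigned long long status;
} mock_record_t;

/* Hypothetical query function: returns 0 and fills *out when an error
 * log exists, nonzero when there is nothing more to report. */
typedef int (*query_fn_t)(void *source, mock_record_t *out);

/* Calls the query repeatedly, storing up to capacity records in out.
 * Returns the number of records collected. */
size_t drain_error_logs(query_fn_t query, void *source,
                        mock_record_t *out, size_t capacity)
{
    size_t count = 0;
    while (count < capacity && query(source, &out[count]) == 0) {
        count++;
    }
    return count;
}

/* Demo source: a fixed array of pending records. */
typedef struct {
    const mock_record_t *items;
    size_t count;
    size_t next;
} mock_queue_t;

int mock_queue_query(void *source, mock_record_t *out)
{
    mock_queue_t *queue = (mock_queue_t *)source;
    if (queue->next >= queue->count) {
        return -1;   /* no more logs */
    }
    *out = queue->items[queue->next++];
    return 0;
}
```

Bounding the loop by capacity keeps the caller's buffer safe; any logs left over are picked up by the next drain call.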