  Message 1 of 3  
24 Jan 07 12:39
ntdev member 33000
xxxxxx@level5networks.com
Join Date:
Posts To This List: 10
Bugcheck: WHEA_UNCORRECTABLE_ERROR

Hi folks,

Got a nice little bugcheck that I'm having trouble debugging. I can decode the basic record header and section descriptor structures, but I don't know how to decode the actual data, and hence don't know how to determine what actually happened (e.g. something like "PCI device X asserted PERR# or SERR#").

The info I have is:

0: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 00000000, MCA_ASSERT
Arg2: 85045028, Address of WHEA_ERROR_RECORD structure
Arg3: b2000000, High 32 bits of MCi_STATUS MSR for the MCA bank that had the error
Arg4: 00070f0f, Low 32 bits of MCi_STATUS MSR for the MCA bank that had the error

Debugging Details:
------------------

WHEA_ERROR_RECORD:  !errrec ffffffff85045028

<non pertinent stuff snipped>

0: kd> ??(_WHEA_ERROR_RECORD *)(0x85045028);.echo done
struct _WHEA_ERROR_RECORD * 0x85045028
   +0x000 Header            : _WHEA_ERROR_RECORD_HEADER
   +0x088 SectionDescriptor : [1] _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR
done

0: kd> ??((_WHEA_ERROR_RECORD *)(0x85045028))->Header;.echo done
struct _WHEA_ERROR_RECORD_HEADER
   +0x000 Signature         : 0x52455043
   +0x004 Revision          : 0x100
   +0x006 Reserved1         : 0xffff
   +0x008 Reserved2         : 0xffff
   +0x00a SectionCount      : 1
   +0x00c Severity          : 1 ( WheaErrSevFatal )
   +0x010 ValidationBits    : 2
   +0x014 Length            : 0x2e8
   +0x018 Timestamp         : _LARGE_INTEGER 0x1c73fd6`1d68ff38
   +0x020 PlatformId        : _GUID {00000000-0000-0000-0000-000000000000}
   +0x030 PartitionId       : _GUID {00000000-0000-0000-0000-000000000000}
   +0x040 CreatorId         : _GUID {f9de0c24-0e4d-4c87-b410-f5701cab65c3}
   +0x050 NotifyType        : _GUID {e8f56ffe-919c-4cc5-ba88-65abe14913bb}
   +0x060 RecordId          : 1
   +0x068 Flags             : 0
   +0x070 PersistenceInfo   : _WHEA_PERSISTENCE_INFO
   +0x078 Reserved3         : [12] ""
done

0: kd> ??((_WHEA_ERROR_RECORD *)(0x85045028))->SectionDescriptor;.echo done
struct _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR [1] 0x850450b0
   +0x000 SectionOffset     : 0xd0
   +0x004 SectionLength     : 0x218
   +0x008 Revision          : 0x100
   +0x00a ValidationBits    : 0 ''
   +0x00b Reserved          : 0 ''
   +0x00c Flags             : 1
   +0x010 SectionType       : _GUID {e71254e9-c1b9-4940-ab76-909703a4320f}
   +0x020 FRUId             : _GUID {00000000-0000-0000-0000-000000000000}
   +0x030 SectionSeverity   : 1 ( WheaErrSevFatal )
   +0x034 FRUText           : [20] ""
done

0: kd> dd (0x85045028 + 0xd0) L 86
850450f8  74507245 00000000 00000218 00000000
85045108  00000100 00000000 00000000 00000000
85045118  00000000 00000001 00000000 00000000
85045128  00000000 00000002 00000002 00000000
85045138  0000017f 00000000 00030000 00000300
85045148  00020f12 00000000 00000000 00000000
85045158  00000000 00000000 00000000 00000000
85045168  00000000 00000000 00000000 00000000
85045178  00000000 00000000 00000000 00000000
85045188  00000000 00000000 00000000 00000000
85045198  00000000 00000000 00000000 00000000
850451a8  00000000 00000000 00000000 00000000
850451b8  00000000 00000000 00000000 00000000
850451c8  00000000 00000000 00000002 00000000
850451d8  00000000 00000000 00000000 00000000
850451e8  00000000 00000000 00000000 00000000
850451f8  00000000 00000000 00000000 00000000
85045208  00000001 00000000 00000001 00000001
85045218  1d68ff38 01c73fd6 00000002 00000000
85045228  00000004 00000000 00070f0f b2000000
85045238  00000000 00000000 00000000 00000000
85045248  00000000 00000000 00000000 00000000
85045258  00000000 00000000 00000000 00000000
85045268  00000000 00000000 00000000 00000000
85045278  00000000 00000000 00000000 00000000
85045288  00000000 00000000 00000000 00000000
85045298  00000000 00000000 00000000 00000000
850452a8  00000000 00000000 00000000 00000000
850452b8  00000000 00000000 00000000 00000000
850452c8  00000000 00000000 00000000 00000000
850452d8  00000000 00000000 00000000 00000000
850452e8  00000000 00000000 00000000 00000000
850452f8  00000000 00000000 00000000 00000000
85045308  00000000 00000000
  Message 2 of 3  
24 Jan 07 14:54
Ian Service
xxxxxx@microsoft.com
Join Date: 30 Sep 2005
Posts To This List: 7
RE: Bugcheck: WHEA_UNCORRECTABLE_ERROR

Hi Martin,

I asked one of the developers who works on this if he could help. Here is what he said. I get the daily summary, but that's all.

I can give you what sounds like a good description of the error. However, this particular error is somewhat generic and very difficult to root-cause.

Concatenating bugcheck arguments 3 and 4 (the high and low 32 bits of MCi_STATUS), you get the full machine check status code for the error: 0xb200000000070f0f. This is an AMD-specific error code reported in the processor's Northbridge machine check status bank. The error code means a HyperTransport watchdog timeout (WDTO) occurred. Basically, a PCI transaction failed to complete for some reason, HT timed out, and the processor raised the fatal machine check.

Some additional snooping is possible if this error occurs under a debugger, but because a hard lockup is likely, the OS does not attempt to probe the buses to find which device(s) have error bits set. Under a debugger, you can try dumping PCI config space to determine which device(s) report errors. This doesn't necessarily offer a great deal of help, but it may indicate which device(s) are involved.

The root cause could be any one or more of the following: a device driver is misprogramming hardware, thus hanging the bus; a device is not in a state in which it can properly respond to accesses (frequently the device turns out to be in a low-power state); or the HT timeout threshold is possibly too sensitive. This could potentially happen on DMA or PIO requests to storage devices and maybe network devices. Again, this could be a hardware, firmware, or device driver issue.

The Windows Hardware Error Architecture (WHEA) does allow the platform to cooperate with the OS to provide additional details about errors such as this.
This particular error, for instance, could potentially be described much better if the BIOS were to identify the device(s) involved, but until WHEA-aware platforms are available from vendors, details like this are not included in the error record.

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Martin Harvey
Sent: Wednesday, January 24, 2007 9:37 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Bugcheck: WHEA_UNCORRECTABLE_ERROR

<quoted text snipped>
  Message 3 of 3  
25 Jan 07 06:10
ntdev member 33000
xxxxxx@level5networks.com
Join Date:
Posts To This List: 10
Re: Bugcheck: WHEA_UNCORRECTABLE_ERROR

Ian Service wrote:
> Hi Martin, I asked one of the developers who works on this if he could
> help. Here is what he said.

Ian,

Many thanks for your colleague's time and that information; it's enough to get me started down a fruitful line of investigation. It's probably a bus analyzer and a fair amount of head-scratching from here on in!

MH.

All times are GMT -5.


Copyright ©2005, OSR Open Systems Resources, Inc.
Based on vBulletin Copyright ©2000 - 2005, Jelsoft Enterprises Ltd.
Modified under license