OSR Online Lists > ntdev

  Message 1 of 7  
02 Jul 12 16:36
QuasiCodo
xxxxxx@Yahoo.com
Join Date: 23 Dec 2002
Posts To This List: 52
BUGCHECK WHEA_UNCORRECTABLE_ERROR (124)

I need a little help interpreting a BUGCHECK WHEA_UNCORRECTABLE_ERROR (124). If anyone could provide tips on how to interpret the WHEA_ERROR_RECORD, I would appreciate it. It appears to be some sort of PCIe protocol error.

((&->

Here is a copy of the bugcheck and a dump of the WHEA_ERROR_RECORD:

2: kd> !analyze -v
ERROR: FindPlugIns 8007007b
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000005, Generic Error
Arg2: fffffa81391a5028, Address of the WHEA_ERROR_RECORD structure.
Arg3: 0000000000000000
Arg4: 0000000000000000

Debugging Details:
------------------

BUGCHECK_STR:  0x124_5
DEFAULT_BUCKET_ID:  VISTA_DRIVER_FAULT
PROCESS_NAME:  System
CURRENT_IRQL:  f

STACK_TEXT:
fffff880`0247c2b8 fffff800`01a21a3b : 00000000`00000124 00000000`00000005 fffffa81`391a5028 00000000`00000000 : nt!KeBugCheckEx
fffff880`0247c2c0 fffff800`01be4b43 : 00000000`00000001 fffffa82`31e207b0 00000000`00000000 00000000`00000000 : hal!HalBugCheckSystem+0x1e3
fffff880`0247c300 fffff800`01a1b6be : fffffa81`00002ba0 fffffa81`3856cbf0 fffff880`0247c3f0 fffff800`01a39470 : nt!WheaReportHwError+0x263
fffff880`0247c360 fffff800`01b86c61 : fffff880`0247c530 00000000`00000001 00000000`00000001 00000000`00000001 : hal!HalHandleNMI+0x66
fffff880`0247c390 fffff800`01ad4502 : 00000000`00000001 00000000`00000000 00000000`00000000 00000000`00000002 : nt!KiProcessNMI+0x131
fffff880`0247c3f0 fffff800`01ad4363 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!KxNmiInterrupt+0x82
fffff880`0247c530 fffff880`05932c61 : fffff800`01adfcf9 00000000`001dba6f fffffa82`31e13450 00000000`00000000 : nt!KiNmiInterrupt+0x163
fffff880`02499c98 fffff800`01adfcf9 : 00000000`001dba6f fffffa82`31e13450 00000000`00000000 00000000`00000000 : intelppm!MWaitIdle+0x19
fffff880`02499ca0 fffff800`01acee9c : fffff880`02471180 fffff880`00000001 00000000`00000000 fffff880`00000000 : nt!PoIdle+0x52a
fffff880`02499d80 00000000`00000000 : fffff880`0249a000 fffff880`02494000 fffff880`02499d40 00000000`00000000 : nt!KiIdleLoop+0x2c

STACK_COMMAND:  kb
FOLLOWUP_NAME:  MachineOwner
MODULE_NAME: hardware
IMAGE_NAME:  hardware
DEBUG_FLR_IMAGE_TIMESTAMP:  0
FAILURE_BUCKET_ID:  X64_0x124_5_PCIEXPRESS
BUCKET_ID:  X64_0x124_5_PCIEXPRESS

Followup: MachineOwner
---------

2: kd> dt _WHEA_ERROR_RECORD fffffa81391a5028 -b
nt!_WHEA_ERROR_RECORD
   +0x000 Header           : _WHEA_ERROR_RECORD_HEADER
      +0x000 Signature        : 0x52455043
      +0x004 Revision         : _WHEA_REVISION
         +0x000 MinorRevision    : 0x10 ''
         +0x001 MajorRevision    : 0x2 ''
         +0x000 AsUSHORT         : 0x210
      +0x006 SignatureEnd     : 0xffffffff
      +0x00a SectionCount     : 1
      +0x00c Severity         : 1 ( WheaErrSevFatal )
      +0x010 ValidBits        : _WHEA_ERROR_RECORD_HEADER_VALIDBITS
         +0x000 PlatformId       : 0y0
         +0x000 Timestamp        : 0y1
         +0x000 PartitionId      : 0y0
         +0x000 Reserved         : 0y00000000000000000000000000000 (0)
         +0x000 AsULONG          : 2
      +0x014 Length           : 0x198
      +0x018 Timestamp        : _WHEA_TIMESTAMP
         +0x000 Seconds          : 0y00000100 (0x4)
         +0x000 Minutes          : 0y00100001 (0x21)
         +0x000 Hours            : 0y00001101 (0xd)
         +0x000 Precise          : 0y0
         +0x000 Reserved         : 0y0000000 (0)
         +0x000 Day              : 0y00011110 (0x1e)
         +0x000 Month            : 0y00000110 (0x6)
         +0x000 Year             : 0y00001100 (0xc)
         +0x000 Century          : 0y00010100 (0x14)
         +0x000 AsLARGE_INTEGER  : _LARGE_INTEGER 0x140c061e`000d2104
            +0x000 LowPart          : 0xd2104
            +0x004 HighPart         : 0n336332318
            +0x000 u                : <unnamed-tag>
               +0x000 LowPart          : 0xd2104
               +0x004 HighPart         : 0n336332318
            +0x000 QuadPart         : 0n1444536306398732548
      +0x020 PlatformId       : _GUID {00000000-0000-0000-0000-000000000000}
         +0x000 Data1            : 0
         +0x004 Data2            : 0
         +0x006 Data3            : 0
         +0x008 Data4            : "" [00] 0 '' [01] 0 '' [02] 0 '' [03] 0 '' [04] 0 '' [05] 0 '' [06] 0 '' [07] 0 ''
      +0x030 PartitionId      : _GUID {00000000-0000-0000-0000-000000000000}
         +0x000 Data1            : 0
         +0x004 Data2            : 0
         +0x006 Data3            : 0
         +0x008 Data4            : "" [00] 0 '' [01] 0 '' [02] 0 '' [03] 0 '' [04] 0 '' [05] 0 '' [06] 0 '' [07] 0 ''
      +0x040 CreatorId        : _GUID {cf07c4bd-b789-4e18-b3c4-1f732cb57131}
         +0x000 Data1            : 0xcf07c4bd
         +0x004 Data2            : 0xb789
         +0x006 Data3            : 0x4e18
         +0x008 Data4            : "???" [00] 0xb3 '' [01] 0xc4 '' [02] 0x1f '' [03] 0x73 's' [04] 0x2c ',' [05] 0xb5 '' [06] 0x71 'q' [07] 0x31 '1'
      +0x050 NotifyType       : _GUID {3e62a467-ab40-409a-a698-f362d464b38f}
         +0x000 Data1            : 0x3e62a467
         +0x004 Data2            : 0xab40
         +0x006 Data3            : 0x409a
         +0x008 Data4            : "???" [00] 0xa6 '' [01] 0x98 '' [02] 0xf3 '' [03] 0x62 'b' [04] 0xd4 '' [05] 0x64 'd' [06] 0xb3 '' [07] 0x8f ''
      +0x060 RecordId         : 0x1cd56b9`9e48c4c6
      +0x068 Flags            : _WHEA_ERROR_RECORD_HEADER_FLAGS
         +0x000 Recovered        : 0y0
         +0x000 PreviousError    : 0y0
         +0x000 Simulated        : 0y0
         +0x000 Reserved         : 0y00000000000000000000000000000 (0)
         +0x000 AsULONG          : 0
      +0x06c PersistenceInfo  : _WHEA_PERSISTENCE_INFO
         +0x000 Signature        : 0y0101001001000101 (0x5245)
         +0x000 Length           : 0y000000000000000000000000 (0)
         +0x000 Identifier       : 0y0000000000000000 (0)
         +0x000 Attributes       : 0y00
         +0x000 DoNotLog         : 0y0
         +0x000 Reserved         : 0y00000 (0)
         +0x000 AsULONGLONG      : 0x5245
      +0x074 Reserved         : "" [00] 0 '' [01] 0 '' [02] 0 '' [03] 0 '' [04] 0 '' [05] 0 '' [06] 0 '' [07] 0 '' [08] 0 '' [09] 0 '' [10] 0 '' [11] 0 ''
   +0x080 SectionDescriptor : [00] _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR
      +0x000 SectionOffset    : 0xc8
      +0x004 SectionLength    : 0xd0
      +0x008 Revision         : _WHEA_REVISION
         +0x000 MinorRevision    : 0x1 ''
         +0x001 MajorRevision    : 0x2 ''
         +0x000 AsUSHORT         : 0x201
      +0x00a ValidBits        : _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR_VALIDBITS
         +0x000 FRUId            : 0y0
         +0x000 FRUText          : 0y0
         +0x000 Reserved         : 0y000000 (0)
         +0x000 AsUCHAR          : 0 ''
      +0x00b Reserved         : 0 ''
      +0x00c Flags            : _WHEA_ERROR_RECORD_SECTION_DESCRIPTOR_FLAGS
         +0x000 Primary          : 0y1
         +0x000 ContainmentWarning : 0y0
         +0x000 Reset            : 0y0
         +0x000 ThresholdExceeded : 0y0
         +0x000 ResourceNotAvailable : 0y0
         +0x000 LatentError      : 0y0
         +0x000 Reserved         : 0y00000000000000000000000000 (0)
         +0x000 AsULONG          : 1
      +0x010 SectionType      : _GUID {d995e954-bbc1-430f-ad91-b44dcb3c6f35}
         +0x000 Data1            : 0xd995e954
         +0x004 Data2            : 0xbbc1
         +0x006 Data3            : 0x430f
         +0x008 Data4            : "???" [00] 0xad '' [01] 0x91 '' [02] 0xb4 '' [03] 0x4d 'M' [04] 0xcb '' [05] 0x3c '<' [06] 0x6f 'o' [07] 0x35 '5'
      +0x020 FRUId            : _GUID {00000000-0000-0000-0000-000000000000}
         +0x000 Data1            : 0
         +0x004 Data2            : 0
         +0x006 Data3            : 0
         +0x008 Data4            : "" [00] 0 '' [01] 0 '' [02] 0 '' [03] 0 '' [04] 0 '' [05] 0 '' [06] 0 '' [07] 0 ''
      +0x030 SectionSeverity  : 1 ( WheaErrSevFatal )
      +0x034 FRUText          : "" [00] 0 '' [01] 0 '' [02] 0 '' [03] 0 '' [04] 0 '' [05] 0 '' [06] 0 '' [07] 0 '' [08] 0 '' [09] 0 '' [10] 0 '' [11] 0 '' [12] 0 '' [13] 0 '' [14] 0 '' [15] 0 '' [16] 0 '' [17] 0 '' [18] 0 '' [19] 0 ''
  Message 2 of 7  
02 Jul 12 16:57
Alex Grig
xxxxxx@broadcom.com
Join Date: 14 Apr 2008
Posts To This List: 1835
RE: BUGCHECK WHEA_UNCORRECTABLE_ERROR (124)

Try to use a PCIe analyzer.
  Message 3 of 7  
02 Jul 12 17:15
Calvin Guan
xxxxxx@gradovec.com
Join Date: 11 Oct 2009
Posts To This List: 380
Re: BUGCHECK WHEA_UNCORRECTABLE_ERROR (124)

WHEA (weird) is one of the most useless bugchecks. It says, hey, there is a little rat with wings somewhere in your 13-acre farmhouse; go find it...

You start with a PCI analyzer to prove or disprove whether the PCI device being monitored generated an NR error at the moment the system raised the WHEA error.

Calvin
  Message 4 of 7  
02 Jul 12 19:18
QuasiCodo
xxxxxx@Yahoo.com
Join Date: 23 Dec 2002
Posts To This List: 52
Re: BUGCHECK WHEA_UNCORRECTABLE_ERROR (124)

Thanks, guys.

I found that "!errrec <addr>" does a fair job of interpreting the WHEA error record. I was able to see which VEN_ID and DEV_ID caused the problem, along with the Command register, the Status register and the Uncorrectable Error Status. Basically, the PLX bridge fell off the bus for some reason. The hardware guys are now investigating.

I love the rat-with-wings-on-a-13-acre-farm analogy. That is too good :D

The problem with this error is that it only happens once a year, so putting a LeCroy PCIe analyzer on it is not really an option.

Thx
((&->
  Message 5 of 7  
02 Jul 12 22:23
Calvin Guan
xxxxxx@gradovec.com
Join Date: 11 Oct 2009
Posts To This List: 380
Re: BUGCHECK WHEA_UNCORRECTABLE_ERROR (124)

Nice trick! I usually bounced through all PCI devices in the system, starting from the suspected path, if I was lucky enough that the system was still cooperating.

Well, for a problem that happens once a year, it's hard to declare victory. In general, at least one negative and one positive are required to claim a valid fix.
  Message 6 of 7  
03 Jul 12 19:27
QuasiCodo
xxxxxx@Yahoo.com
Join Date: 23 Dec 2002
Posts To This List: 52
Re: BUGCHECK WHEA_UNCORRECTABLE_ERROR (124)

The victory is that it is not my problem any more; let someone else sweat bullets for a change :D

However, we have already figured it out. It turns out that the system's BIOS lied to us. The slot that is having problems is actually limited to 15W of power, while the BIOS reports that it will supply 25W. As a result, we consistently draw 25W on that slot. Most of the time, the cards in the system are not busy. However, every so often, all of the cards draw 25W on all of the PCIe slots at once and overtax the power supply. This causes a brownout on the slot, which causes our device to issue a Surprise Down PCIe Uncorrectable Error. This, of course, causes Windows to bugcheck.

((&->
  Message 7 of 7  
04 Jul 12 01:05
Joseph M. Newcomer
xxxxxx@flounder.com
Join Date: 20 Nov 2008
Posts To This List: 1892
Re: BUGCHECK WHEA_UNCORRECTABLE_ERROR (124)

It's always nice when we can blame the hardware guys. I've had to do this several times. (We once had a computer that would occasionally push the return address onto the stack but then not call the subroutine. The same architecture would also scramble the contents of a register if a DMA grant happened during a rotate-bit operation.)

Stories like this help me set students' expectations. Although I am no longer actively teaching, I'm working on mentoring opportunities at local universities, but I have other distractions in my life right now that are keeping me very busy.

Thank you for sharing this gem.
joe
All times are GMT -5.

Copyright ©2012, OSR Open Systems Resources, Inc.