
Analyze This - Analyzing a Crash Dump

 

One of our students taking our Kernel Debugging class recently brought in an excellent crash dump that demonstrates what we suspect is a multi-processor race condition in Windows NT 4.0.  In this article we will demonstrate our analysis of the crash dump with an eye towards assisting our readers in analyzing their own crash dumps.

 

The key to any analysis is, of course, using the right tools for the job.  In analyzing this crash dump we used both WinDBG (Build 2127.1 – the version provided with the Windows 2000 RC2 DDK) and i386kd (again, the version from the Windows 2000 RC2 DDK).  We normally use WinDBG, but what appear to be temporary development issues in that build forced us to use i386kd as well.  As of the publication of this article, the current version of WinDBG (from the final release) is considerably more stable, and we expect a new version to be demonstrated at WinHEC 2000.

 

Background

The crash dump was obtained from a quad processor system running Windows NT 4.0 SP5 (yes, you can use the Windows 2000 tools to debug on Windows NT 4.0) while running the latest version of the Microsoft Hardware Compatibility Tests (HCTs).  The hardware platform worked with previous versions of the HCTs, but the latest versions exhibited a mysterious PAGE_FAULT_IN_NONPAGED_AREA error while running the tests.

 

Of course, the initial concern was that this was uncovering some hardware bug (which is always a possibility when running the HCTs).  Then, the HCTs themselves came under fire.  When we started looking at it, we noted that the system crashed inside of the Windows NT operating system – no drivers were involved at all.  Thus, we argued that it could not be the fault of the HCTs, because an application should never be able to crash the operating system.

 

Analyzing the Stop Code

A stop code of 0x50 (PAGE_FAULT_IN_NONPAGED_AREA) is one of the most common stop codes that we observe with Windows NT systems, and thus we’re quite familiar with analyzing it.  This stop code occurs as a result of a page fault against an address within the system address space (normally between 0x80000000 and 0xFFFFFFFF) when that range of addresses does not, in fact, support paging.  Most system addresses do not.  Paged pool, of course, does support paging, and a page fault on a paged pool address would cause the system to retrieve the page from the paging file (assuming it is in the paging file).

 

Thus, this stop code occurs because some part of the system has accessed an invalid memory location, whether because it used an uninitialized data pointer, or a pointer to memory that has since been freed.  Regardless, the Memory Manager considers this to be a critical error and it halts the system.

 

In this case, the four parameters tell us quite a lot about why the system crashed.  The first parameter indicates the virtual address that was being accessed, and the second parameter indicates whether the address was being read (a value of zero) or written (a value of one).  The other two parameters aren’t generally useful in Windows NT 4.0 (we note that these values have changed meanings in Windows 2000, providing better information to simplify debugging).

 

In this case, the stop code was:

 

STOP: PAGE_FAULT_IN_NONPAGED_AREA (af3defc4, 0, 0, 0)

 

Thus, an attempt to access address af3defc4 failed.  The address is “allowed”, although it isn’t one we normally see in use; that’s probably because this system had 1GB of physical memory, which is also unusual (it yielded a large crash dump file, though!).  Thus, this is most likely some sort of programming bug, whether because something is using an uninitialized pointer, or a block of memory that has recently been freed, or because of some other programming problem.
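
The reading of the stop code parameters described above can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: `parse_stop_0x50` is a hypothetical helper (not part of any debugger), and the system-space lower bound is the 0x80000000 figure quoted earlier.

```python
import re

SYSTEM_SPACE_START = 0x80000000  # lower bound of the system range quoted in the text


def parse_stop_0x50(line):
    """Split a STOP 0x50 line into the faulting address and the access type."""
    match = re.search(r"\(([^)]*)\)", line)
    if match is None:
        raise ValueError("no parameter list found")
    params = [int(p.strip(), 16) for p in match.group(1).split(",")]
    address, is_write = params[0], params[1]
    return {
        "address": address,
        "access": "write" if is_write else "read",
        "in_system_space": address >= SYSTEM_SPACE_START,
    }


info = parse_stop_0x50("STOP: PAGE_FAULT_IN_NONPAGED_AREA (af3defc4, 0, 0, 0)")
print(hex(info["address"]), info["access"], info["in_system_space"])
```

For the stop code in this dump, the sketch reports a read of a system-space address, which matches the interpretation given in the text.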

 

Analyzing the Stack

Once we’ve determined that this is probably some programming bug, we start by looking at the stack that declared the halt.  On a multi-processor system this isn’t simple, since the halt might not have occurred on CPU 0 – but of course the debuggers will start using the CPU 0 information as the default.

 

Thus, we normally start by looking at the stack for each processor in an attempt to identify which processor called KeBugCheckEx.  In this case, we obtained the information shown in Figure 1.

 

 

0: kd> kv

cannot get version packet on a crash dump

cannot get version packet on a crash dump

ChildEBP RetAddr  Args to Child

f766ce14                 80003e47                80153f7c                 00000000                00000000                ntkrnlmp!KeWaitForSingleObject+0x9a(FPO: [Non-Fpo]

f766ce34                 8019ace2                7ffde000                 77fa5560                 00000000                halmps!ExAcquireFastMutex+0x2b (FPO: [0,2,0])

f766ce4c                 8019aba1                00000001                b980ae08                b980ae58                ntkrnlmp!PspExitProcess+0x8c(FPO: [Non-Fpo]

f766ced0                8019a53c                00000000                f766cf04                 0006fea4                 ntkrnlmp!PspExitThread+0x447(FPO: [Non-Fpo]

f766cef4                 80140da9                ffffffff                     00000000                00000000                ntkrnlmp!NtTerminateProcess+0x13c(FPO: [Non-Fpo]

f766cef4                 77f681ff                  ffffffff                     00000000                00000000                ntkrnlmp!KiSystemService+0xc9 (FPO: [0,0] TrapFrame @ f766cf04)

f766cdf4                 80153f70                 b980aea4                00000000                00000000                0x77f681ff [Stdcall: 257]

0006ff5c                 00000000                00000000                00000000                00000000               ntkrnlmp!PspActiveProcessMutex(FPO: [Non-Fpo]

0: kd> ~1

1: kd> kv

ChildEBP RetAddr  Args to Child

f7b9ab24                80143e8f                 00000000                af3defc4                 00000000                ntkrnlmp!MmAccessFault+0x29a(FPO: [Non-Fpo]

f7b9ab24                8015c925                00000000                af3defc4                 00000000                ntkrnlmp!KiTrap0E+0xc7 (FPO: [0,0] TrapFrame @ f7b9ab3c)

f7b9abb8                8015481a                f7abeca0                b980ae08                00010000                ntkrnlmp!ExpCopyProcessInfo+0x11 (FPO: [2,0,3])

f7b9ac38                8015b811                00b40000                00010000                f7b9aec8                ntkrnlmp!ExpGetProcessInformation+0x156(FPO: [Non-Fpo]

f7b9aeec                80140da9                00000005                00b40000                00010000                ntkrnlmp!NtQuerySystemInformation+0x725(FPO: [Non-Fpo]

f7b9aeec                77f67e27                 00000005                00b40000                00010000                ntkrnlmp!KiSystemService+0xc9 (FPO: [0,0] TrapFrame @ f7b9af04)

f7b9abac                b2ec4ff0                 f7abeca0                b2ec4e58                8015481a                0x77f67e27 [Stdcall: 257]

00b3fabc                00000000                00000000                00000000                00000000                0xffffffff`b2ec4ff0 [Stdcall: 257]

 

 

Figure 1 — Check Out Each Processor for Call to KeBugCheckEx

 

 

From this, then, we couldn’t actually tell which CPU had caused the halt (although we suspected it was CPU 1 – where the page fault occurred).  Thus, we turned to the OEM Support Tools KD extension to give us a bit more stack information.  We found that the stack for CPU 1 had called KeBugCheckEx (shown in Figure 2).

 

 

> !b.stack

T. Address  RetAddr  Called Procedure

*1 F7B9AAD0      8012E67A _KeBugCheckEx@20(00000050, AF3DEFC4, 00000000,...);

*0 F7B9AAFC      80118AE8 @KiFlushSingleTb@8(F7B9AB38, 801450C1, 80118AE8,...);

*0 F7B9AB04        801450C1 @FxsrSwapContextNotify@8(80118AE8, 80118AE8, 8011BB44,...);

*0 F7B9AB08        80118AE8 @KiFlushSingleTb@8(80118AE8, 8011BB44, 00000000,...);

*0 F7B9AB0C       80118AE8 @KiFlushSingleTb@8(8011BB44, 00000000, BC442E08,...);

*0 F7B9AB10        8011BB44 dword ptr EAX(00000000, BC442E08, FFFFF000,...);

*1 F7B9AB28        80143E8F _MmAccessFault@16(00000000, AF3DEFC4, 00000000,...);

*1 F7B9AB40        800031DA _KiIpiServiceRoutine@8(F7B9AB54, 800031E0, 0001001C,...);

*0 F7B9AB48        800031E0 _HalEndSystemInterrupt@8(0001001C, 000000E1, 00000010,...);

*0 F7B9AB64        80120DEB _MmMapLockedPagesSpecifyCache@24(00006C8E, 00000000, AC900023,...);

*1 F7B9ABBC      8015481A _ExpCopyProcessInfo@8(F7ABECA0, B980AE08, 00010000,...);

*1 F7B9AC3C       8015B811 _ExpGetProcessInformation@12(00B40000, 00010000, F7B9AEC8,...);

*0 F7B9AC58        F7C363A8 _NbtDereferenceDevice@4(B70D2E78, 80E6964C, 80E69528,...);

*1 F7B9AC74        801128AF dword ptr [ECX+EAX*4+38](B70D2E78, 80E69528, 0000004A,...);

*1 F7B9AC88        F7B49BBB @IofCallDriver@8(F7B9000E, 80E01279, 80E69400,...);

*1 F7B9ACAC      8012DF3E @KfReleaseSpinLock@8(F7B9ACDC, ABC8C008, C02AF230,...);

*1 F7B9ACC0       8012D140 @MiChargeCommitmentCantExpand@8(BCA7EFBC, 80150F30, 00000100,...);

*1 F7B9ACE0       8010A8BC _MmAllocateSpecialPool@12(00000100, 7366704E, 00000000,...);

*1 F7B9AD10       801134E1 @KfReleaseSpinLock@8(EBC40937, 00000000, EBC40938,...);

*1 F7B9AD14       EBC40937 _IoReleaseCancelSpinLock@4(00000000, EBC40938, A5332F00,...);

*1 F7B9AD5C       EBC4624B _NpAddDataQueueEntry@24(801096D9, F7B9ADC0, A5332F00,...);

*0 F7B9AD60       801096D9 @KfReleaseSpinLock@8(F7B9ADC0, A5332F00, F7B9ADE8,...);

*0 F7B9ADA8      8012DDE8 @KfReleaseSpinLock@8(00000000, A8E9EFFC, C4000010,...);

*0 F7B9ADD0      8012DC15 @KfReleaseSpinLock@8(F7B9AE34, F7B9AE34, 00000000,...);

*1 F7B9ADE8       80131164 @MiInsertNode@8(00B4FFFF, 00B40000, C4000010,...);

*1 F7B9AE38        80181E56 _MiInsertVad@4(80181E9B, F7B9AF04, 00B3FA3C,...);

*0 F7B9AE3C       80181E9B @ExReleaseFastMutex@4(F7B9AF04, 00B3FA3C, 801813DE,...);

*0 F7B9AE84        80139804 @KfReleaseSpinLock@8(00000004, BC8EEFD4, 00010000,...);

*1 F7B9AEF0        80140DA9 dword ptr EBX(00000005, 00B40000, 00010000,...);

 

 

Figure 2 — Stack for CPU-1 Using KD Extension

 

 

Note that we can observe the KeBugCheckEx call.  For this function, if it is present on the stack at all, even in a “ghost” stack frame, it must have been called.  After all, this is not a function that returns to the caller!
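
Searching a long listing for that one call is easy to automate.  The following Python sketch assumes the `!b.stack` output has been saved as text in the layout of Figure 2; `undecorate` and `find_calls` are hypothetical helper names, and the meaning of the first column is not interpreted here.

```python
import re

# flag, stack address, return address, then the called procedure up to "("
LINE = re.compile(r"^\*(\d)\s+([0-9A-Fa-f]{8})\s+([0-9A-Fa-f]{8})\s+(.+?)\(")


def undecorate(symbol):
    """Strip calling-convention decoration, e.g. _KeBugCheckEx@20 -> KeBugCheckEx."""
    return re.sub(r"@\d+$", "", symbol).lstrip("_@")


def find_calls(listing, target):
    """Return the stack addresses of every entry whose procedure is `target`."""
    hits = []
    for line in listing.splitlines():
        m = LINE.match(line.strip())
        if m and undecorate(m.group(4)) == target:
            hits.append(int(m.group(2), 16))
    return hits


# Three entries copied from Figure 2.
sample = """
*1 F7B9AAD0      8012E67A _KeBugCheckEx@20(00000050, AF3DEFC4, 00000000,...);
*0 F7B9AAFC      80118AE8 @KiFlushSingleTb@8(F7B9AB38, 801450C1, 80118AE8,...);
*1 F7B9AB28        80143E8F _MmAccessFault@16(00000000, AF3DEFC4, 00000000,...);
"""

bugcheck_frames = find_calls(sample, "KeBugCheckEx")
print([hex(a) for a in bugcheck_frames])
```

Running the same search over the listing saved for each processor would show which CPU's stack contains the call.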

 

Just a side note: if you aren’t using the OEM Support Tools package in your debugging, you’ve left a very powerful tool out of your toolbox.  We’ve been using it for a few years now (version 3.0 was released in March) and around OSR we swear by it. There is a version (Version 2.0) included in the Windows 2000 final release, but the newer version (V3) is available from Microsoft’s website (V3 with symbols currently at http://download.microsoft.com/download/win2000srv/Utility/3.0/NT45/EN-US/oem3sr0s.zip...of course this will change).

 

Thus, we have found the page fault that triggered the chain of events ending in the termination of the system.  Note the KiTrap0E on the stack – that is the page fault handler function within the kernel (Trap 14, or 0x0E, is the page fault on the IA32 CPU).  The fault occurred in the ExpCopyProcessInfo function.

 

This function, in turn, was invoked from the function ExpGetProcessInformation.  Unfortunately, we don’t have any source of information about the function ExpCopyProcessInfo (or ExpGetProcessInformation for that matter), although there is some (non-Microsoft and hence of suspect quality) information about the function NtQuerySystemInformation.  However, based upon the name of ExpCopyProcessInfo we can guess that it is attempting to copy process data from an EPROCESS structure into a buffer.  Thus, we probed the arguments to determine if one was in fact an EPROCESS structure.  It turns out that the second parameter was, in fact, an EPROCESS (Figure 3).

 

 

1: kd> !process b980ae08

!process b980ae08

PROCESS b980ae08  Cid: 0120    Peb: 7ffdf000  ParentCid: 007c

    DirBase: 08c6f000  ObjectTable: 00000000  TableSize:   0.

    Image: cgiapp.exe

    VadRoot a856cfc8 Clone 0 Private 30. Modified 0. Locked 0.

    B980AFC4 MutantState Signalled OwningThread 0

    Process Lock Owned by Thread   bf6b6dc0

    Token                                                 b0834eb0

    ElapsedTime                                     0:00:00.0500

    UserTime                                           0:00:00.0015

    KernelTime                                        0:00:00.0015

    QuotaPoolUsage[PagedPool]         3713

    QuotaPoolUsage[NonPagedPool] 832

    Working Set Sizes (now,min,max)  (145, 50, 345) (580KB, 200KB, 1380KB)

    PeakWorkingSetSize                       167

    VirtualSize                                         4 Mb

    PeakVirtualSize                                 9 Mb

    PageFaultCount                               164

    MemoryPriority                                BACKGROUND

    BasePriority                                      8

    CommitCharge                                  36

 

        THREAD bf6b6dc0  Cid 120.78  Teb: 00000000  Win32Thread: 00000000 RUNNING

        Not impersonating

        Owning Process b980ae08

        WaitTime (seconds)      338550

        Context Switch Count    53

        UserTime                                       0:00:00.0000

        KernelTime                                    0:00:00.0015

        Start Address 0x77f0528c

        Win32 Start Address 0x01001150

        Stack Init f766d000 Current f766cc80 Base f766d000 Limit f766a000 Call 0

        Priority 16 BasePriority 8 PriorityDecrement 0 DecrementCount 0

        ChildEBP       RetAddr                 Args to Child

        f766ce14         80003e47                80153f7c                 00000000                00000000                ntkrnlmp!KeWaitForSingleObject+0x9a

        f766ce34         8019ace2                7ffde000                 77fa5560                 00000000                halmps!ExAcquireFastMutex+0x2b

        f766ce4c         8019aba1                00000001                b980ae08                b980ae58                ntkrnlmp!PspExitProcess+0x8c

        f766ced0        8019a53c                00000000                f766cf04                 0006fea4                 ntkrnlmp!PspExitThread+0x447

        f766cef4         80140da9                ffffffff                     00000000                00000000                ntkrnlmp!NtTerminateProcess+0x13c

        f766cef4         77f681ff                  ffffffff                     00000000                00000000                ntkrnlmp!KiSystemService+0xc9

        f766cdf4         80153f70                 b980aea4                00000000                00000000                +0x77f681ff

        0006ff5c         00000000                00000000                00000000                00000000                ntkrnlmp!PspActiveProcessMutex

 

 

 

Figure 3 — Stack of Process Exiting...Suspicions Raised

 

 

The stack trace in Figure 3 was most interesting because the process is exiting.  From this we began to suspect that we might be observing an interesting bug – one process is gathering information about a second process, and the second process is terminating.  The two threads are running simultaneously – one on CPU 0, the other on CPU 1.

 

The thread on CPU 0 (the thread in the terminating process) is entering a wait condition.  It has not yet dispatched (so it is still running) but it has encountered an owned mutex and is going to wait for that mutex (we can determine this because of the call to ExAcquireFastMutex).

 

Alas, this does not conclusively demonstrate a bug, but it certainly raised our suspicions.  We decided it was time to turn our attention to the faulting thread – running on CPU 1.

 

Interpreting the Trap Frame

We decided to track back through the code for the faulting thread.  To accomplish this we used the trap frame information on the stack.  In this case, it was simple to find because the debugger detected and reported the location of the trap frame to us.  Had the debugger not told us where the trap frame was located, we would have looked for it (manually) on the stack.  On the IA32 platform running Windows NT the DS and ES segment registers both contain the value 0x23, and thus we can identify the location of the trap frame by looking for a pair of adjacent 0x23 values on the stack (the ES segment register is stored 0x34 bytes from the beginning of the trap frame, with DS immediately following it at 0x38).  This technique is actually described by Microsoft in Knowledge Base Article # Q159672.
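
The manual search can be sketched in Python as follows.  This is a rough illustration, not a debugger feature: `candidate_trap_frames` is a hypothetical helper, the stack contents below are synthetic, and the code assumes the 0x34 offset quoted above applies to the first of the two adjacent selector values.  Only the low 16 bits of each saved selector are compared, since a segment selector is a 16-bit value.

```python
SELECTOR = 0x23          # selector value quoted in the text
FIRST_SEG_OFFSET = 0x34  # assumption: offset of the first selector of the pair


def candidate_trap_frames(stack_base, dwords):
    """Return possible trap frame start addresses for a run of stack dwords."""
    found = []
    for i in range(len(dwords) - 1):
        first = dwords[i] & 0xFFFF
        second = dwords[i + 1] & 0xFFFF
        if first == SELECTOR and second == SELECTOR:
            found.append(stack_base + 4 * i - FIRST_SEG_OFFSET)
    return found


# Synthetic stack: 13 filler dwords (0x34 bytes), then two saved selectors.
stack_base = 0xF7B9AB3C
stack = [0x11111111] * 13 + [0xAC900023, 0x00000023, 0x00000481]

frames = candidate_trap_frames(stack_base, stack)
print([hex(a) for a in frames])
```

Each candidate still has to be checked by eye (for example, by confirming that the saved EIP and CS look plausible), because a pair of 0x23 values can appear on a stack by coincidence.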

 

The trap frame tells us what the values of the registers were at the time of the page fault.  From this information, we can then work backwards to try figuring out what the code was actually doing at the time the system crashed.  In this case the function we need to analyze had just been called – and this makes it easy for us to figure out what was going on.  Thus, using the debugger, we generated a listing of the assembly code for this function (shown in Figure 4).

 

 

> !trap f7b9ab3c

eax=af3defb0         ebx=b2ec4e58       ecx=00005d28        edx=00000481 esi=f7abeca0 edi=b980ae08

eip=8015c925        esp=f7b9abb0       ebp=f7b9ac38       iopl=0         nv up ei ng nz na pe nc

cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010282

ErrCode = 00000000

8015C925  8B4014           mov         eax,dword ptr [eax+14h]        

 

               

 

Figure 4 — Let’s Check Out the Assembly Code!

 

 

The trap frame is located at address 0xf7b9ab3c, and the fault occurred at the instruction at 0x8015c925 while it was attempting to access 0xaf3defc4 (the value in the EAX register, 0xaf3defb0, plus the displacement of 0x14).  Thus, the instruction:

 

8015C925  8B4014           mov         eax,dword ptr [eax+14h]                          

 

is attempting to retrieve some value in memory.  Working backwards from this, we try to determine where this code segment came up with this particular value.  Figure 5 shows the disassembly from the beginning of the current function (ExpCopyProcessInfo).

 

 

> u 8015c914

NT!_ExpCopyProcessInfo@8+0x0:

8015C914  53                         push        ebx                          

8015C915  56                         push        esi                          

8015C916  57                         push        edi                          

8015C917  8B7C2414            mov         edi,dword ptr [esp+14h]                          

8015C91B  8B8704010000    mov         eax,dword ptr [edi+104h]                          

8015C921  85C0                     test        eax,eax                          

8015C923  740C                     je             _ExpCopyProcessInfo@8+1Dh                          

8015C925  8B4014                 mov        eax,dword ptr [eax+14h]

 

 

 

Figure 5 — One Step Backwards...Disassembly from ExpCopyProcessInfo
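
Before working backwards, a quick arithmetic check ties the faulting instruction to the stop code.  The Python sketch below simply forms the effective address of the `[eax+14h]` operand from the EAX value in the trap frame and compares it with the first stop code parameter; `effective_address` is a hypothetical helper name.

```python
def effective_address(base, displacement):
    """Address formed by a [reg+disp] operand, wrapped to 32 bits."""
    return (base + displacement) & 0xFFFFFFFF


eax = 0xAF3DEFB0            # EAX from the !trap output
fault_address = 0xAF3DEFC4  # first parameter of the stop code

computed = effective_address(eax, 0x14)
print(hex(computed), computed == fault_address)
```

The two values agree, so the read through EAX at 8015C925 is the access that the Memory Manager rejected.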

 

 

Normally, when presented with a crash such as this one, we will attempt to work backwards from the current register values, following the trail of information back to see if we can determine what the problem was.

 

In this case, the contents of the EAX register were loaded from the address 0x104 bytes past the address contained in the EDI register.  This is most likely a dereference of some field within a data structure.  That “data structure address” in turn was read from the stack (the stack pointer is ESP) – specifically, from 0x14 bytes above the current stack pointer.  Since the previous three instructions pushed three values onto the stack (0x0C bytes) and the function return address is stored just above them, this looks to be a reference to parameter two (with parameter one at 0x10 from the current stack pointer).  Don’t forget that stacks grow down, so values at a positive offset from the current stack pointer are already on the stack.
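
The offset arithmetic can be written out explicitly.  This Python sketch assumes a function that has no frame pointer and whose parameters were pushed as 4-byte values before the call, as the disassembly suggests; `param_offset_from_esp` is a hypothetical helper.

```python
DWORD = 4  # every push and every parameter occupies four bytes on IA32


def param_offset_from_esp(param_number, pushes_since_entry):
    """Offset from ESP of a 1-based parameter after some pushes in the callee."""
    # The return address sits just above the registers pushed so far.
    return_address_slot = pushes_since_entry * DWORD
    return return_address_slot + param_number * DWORD


# ExpCopyProcessInfo pushes ebx, esi and edi before reading [esp+14h].
offsets = {n: param_offset_from_esp(n, 3) for n in (1, 2)}
print({n: hex(off) for n, off in offsets.items()})
```

With three pushes, parameter one lands at ESP+0x10 and parameter two at ESP+0x14, which is the operand used by the `mov edi` instruction.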

 

Since we noted earlier that parameter two is the EPROCESS, we believe this is consistent – that we are attempting to load some information from the EPROCESS structure.  Thus, our next question becomes: what is located in the EPROCESS at offset 0x104?  We use the “!strct” command (from kdex2x86) to display the format of the EPROCESS structure (see Figure 6).

 

> !strct eprocess

Structure EPROCESS (Size:0x1f8) member offsets:

+0000    Pcb(KPROCESS struct)

+0000      Header(DISPATCHER_HEADER struct)

+0010      ProfileListHead(LIST_ENTRY struct)

+0018      DirectoryTableBase

+0020      LdtDescriptor(KGDTENTRY struct)

+0028      Int21Descriptor(KIDTENTRY struct)

+0030      IopmOffset

+0032      Iopl

+0033      VdmFlag

+0034      ActiveProcessors

+0038      KernelTime

+003c      UserTime

+0040      ReadyListHead(LIST_ENTRY struct)

+0048      SwapListEntry(LIST_ENTRY struct)

+0050      ThreadListHead(LIST_ENTRY struct)

+0058      ProcessLock

+005c      Affinity

+0060      StackCount

+0062      BasePriority

+0063      ThreadQuantum

+0064      AutoAlignment

+0065      State

+0066      ThreadSeed

+0067      DisableBoost

+0068    ExitStatus

+006c    LockEvent(KEVENT struct)

+006c      Header(DISPATCHER_HEADER struct)

+007c    LockCount

+0080    CreateTime

+0088    ExitTime

+0090    LockOwner

+0094    UniqueProcessId

+0098    ActiveProcessLinks(LIST_ENTRY struct)

+0098      Flink

+009c      Blink

+00a0    QuotaPeakPoolUsage

+00a8    QuotaPoolUsage

+00b0    PagefileUsage

+00b4    CommitCharge

+00b8    PeakPagefileUsage

+00bc    PeakVirtualSize

+00c0    VirtualSize

+00c8    Vm(MMSUPPORT struct)

+00c8      LastTrimTime

+00d0      LastTrimFaultCount

+00d4      PageFaultCount

+00d8      PeakWorkingSetSize

+00dc      WorkingSetSize

+00e0      MinimumWorkingSetSize

+00e4      MaximumWorkingSetSize

+00e8      VmWorkingSetList

+00ec      WorkingSetExpansionLinks(LIST_ENTRY struct)

+00f4      AllowWorkingSetAdjustment

+00f5      AddressSpaceBeingDeleted

+00f6      ForegroundSwitchCount

+00f7      MemoryPriority

+00f8    LastProtoPteFault

+00fc    DebugPort

+0100    ExceptionPort

+0104    ObjectTable

+0108    Token

+010c    WorkingSetLock(FAST_MUTEX struct)

+010c      Count

+0110      Owner

+0114      Contention

+0118      Event(KEVENT struct)

+0128      OldIrql

+012c    WorkingSetPage

+0130    ProcessOutswapEnabled

+0131    ProcessOutswapped

+0132    AddressSpaceInitialized

+0133    AddressSpaceDeleted

+0134    AddressCreationLock(FAST_MUTEX struct)

+0134      Count

+0138      Owner

+013c      Contention

+0140      Event(KEVENT struct)

+0150      OldIrql

+0154    HyperSpaceLock

+0158    ForkInProgress

+015c    VmOperation

+015e    ForkWasSuccessful

+015f    MmAgressiveWsTrimMask

+0160    VmOperationEvent

+0164    PageDirectoryPte(HARDWARE_PTE struct)

+0164      Valid

+0164      Write

+0164      Owner

+0164      WriteThrough

+0164      CacheDisable

+0164      Accessed

+0164      Dirty

+0164      LargePage

+0164      Global

+0164      CopyOnWrite

+0164      Prototype

+0164      reserved

+0164      PageFrameNumber

+0168    LastFaultCount

+016c    ModifiedPageCount

+0170    VadRoot

+0174    VadHint

+0178    CloneRoot

+017c    NumberOfPrivatePages

+0180    NumberOfLockedPages

+0184    NextPageColor

+0186    ExitProcessCalled

+0187    CreateProcessReported

+0188    SectionHandle

+018c    Peb

+0190    SectionBaseAddress

+0194    QuotaBlock

+0198    LastThreadExitStatus

+019c    WorkingSetWatch

+01a0    Win32WindowStation

+01a4    InheritedFromUniqueProcessId

+01a8    GrantedAccess

+01ac    DefaultHardErrorProcessing

+01b0    LdtInformation

+01b4    VadFreeHint

+01b8    VdmObjects

+01bc    ProcessMutant(KMUTANT struct)

+01bc      Header(DISPATCHER_HEADER struct)

+01cc      MutantListEntry(LIST_ENTRY struct)

+01d4      OwnerThread

+01d8      Abandoned

+01d9      ApcDisable

+01dc    ImageFileName

+01ec    VmTrimFaultValue

+01f0    SetTimerResolution

+01f1    PriorityClass

+01f2    SubSystemMinorVersion

+01f3    SubSystemMajorVersion

+01f2    SubSystemVersion

+01f4    Win32Process

 

 

> * esp+14 looks like Param2

> * eax is (esp+14)->(104)

> * Test for null

> * eax = *(eax+14)

 

Figure 6 — Using !strct to Reveal the Format of the EPROCESS Structure
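
The lookup done by eye in Figure 6 amounts to finding the last member whose offset does not exceed the one in the instruction.  A minimal Python sketch follows, using only a handful of members copied from the listing; `field_at` is a hypothetical helper, and the table is deliberately incomplete.

```python
from bisect import bisect_right

# A few top-level EPROCESS members from Figure 6, sorted by offset.
FIELDS = [
    (0x0F8, "LastProtoPteFault"),
    (0x0FC, "DebugPort"),
    (0x100, "ExceptionPort"),
    (0x104, "ObjectTable"),
    (0x108, "Token"),
    (0x10C, "WorkingSetLock"),
]
OFFSETS = [offset for offset, _ in FIELDS]


def field_at(offset):
    """Name of the member that starts at or just before `offset`."""
    if offset < OFFSETS[0]:
        raise ValueError("offset precedes the first member in this partial table")
    index = bisect_right(OFFSETS, offset) - 1
    return FIELDS[index][1]


print(field_at(0x104))
```

Because the table is partial, an offset past 0x10C is attributed to the last entry; a complete table would be needed for general use.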

 

 

Note offset 0x104 – the ObjectTable.  Looking back at the code disassembly, we note that after loading this value into the EAX register it is tested to ensure that it is not a NULL pointer:

 

8015C921  85C0             test        eax,eax                          

8015C923  740C             je          _ExpCopyProcessInfo@8+1Dh                          

 

Since we are executing the instruction following the “je”, we know that the jump was not taken: the test found a non-NULL value.  Let us compare this result with the current contents of memory.  We accomplish this by dumping the contents of the EPROCESS structure using kdex2x86 (see Figure 7).

 

0: kd> !strct eprocess B980Ae08

Structure EPROCESS (Size:0x1f8) at 0xb980ae08:

+0000    Pcb(KPROCESS struct)

+0000      Header(DISPATCHER_HEADER struct)

+0010      ProfileListHead(LIST_ENTRY struct)

+0018      DirectoryTableBase =   08c6f000 21570000

+0020      LdtDescriptor(KGDTENTRY struct)

+0028      Int21Descriptor(KIDTENTRY struct)

+0030      IopmOffset =           20ad

+0032      Iopl =                 00

+0033      VdmFlag =              00

+0034      ActiveProcessors =     00000001

+0038      KernelTime =           00000001

+003c      UserTime =             00000001

+0040      ReadyListHead(LIST_ENTRY struct)

+0048      SwapListEntry(LIST_ENTRY struct)

+0050      ThreadListHead(LIST_ENTRY struct)

+0058      ProcessLock =          00000000

+005c      Affinity =             0000000f

+0060      StackCount =           0001

+0062      BasePriority =         08

+0063      ThreadQuantum =        24

+0064      AutoAlignment =        00

+0065      State =                00

+0066      ThreadSeed =           54

+0067      DisableBoost =         00

+0068    ExitStatus(NTSTATUS) = 0(STATUS_SUCCESS)

+006c    LockEvent(KEVENT struct)

+006c      Header(DISPATCHER_HEADER struct)

+007c    LockCount =            00000000

+0080    CreateTime(LARGE_INTEGER/ULARGE_INTEGER union) = following

+0080      None(Anonymous struct) = following

+0088    ExitTime(LARGE_INTEGER/ULARGE_INTEGER union) = following

+0088      None(Anonymous struct) = following

+0090    LockOwner =            BF6B6DC0 (-> PKTHREAD)

+0094    UniqueProcessId =      00000120 (-> HANDLE)

+0098    ActiveProcessLinks(LIST_ENTRY struct)

+0098      Flink =                BF2E4EA0 (-> PLIST_ENTRY)

+009c      Blink =                B2EC4EA0 (-> PLIST_ENTRY)

+00a0    QuotaPeakPoolUsage =   00000460 00002938

+00a8    QuotaPoolUsage =       00000340 00000e81

+00b0    PagefileUsage =        00000024

+00b4    CommitCharge =         00000024

+00b8    PeakPagefileUsage =    0000003a

+00bc    PeakVirtualSize =      00905000

+00c0    VirtualSize =          004e5000

+00c8    Vm(MMSUPPORT struct)

+00c8      LastTrimTime(LARGE_INTEGER/ULARGE_INTEGER union) = following

+00d0      LastTrimFaultCount =   000000a2

+00d4      PageFaultCount =       000000a4

+00d8      PeakWorkingSetSize =   000000a7

+00dc      WorkingSetSize =       00000091

+00e0      MinimumWorkingSetSize = 00000032

+00e4      MaximumWorkingSetSize = 00000159

+00e8      VmWorkingSetList =     C0502000 (-> PMMWSL)

+00ec      WorkingSetExpansionLinks(LIST_ENTRY struct)

+00f4      AllowWorkingSetAdjustment = 01

+00f5      AddressSpaceBeingDeleted = 00

+00f6      ForegroundSwitchCount = 00

+00f7      MemoryPriority =       00

+00f8    LastProtoPteFault =    00000000

+00fc    DebugPort =            00000000

+0100    ExceptionPort =        b3030f68

+0104    ObjectTable =          00000000 (-> PHANDLE_TABLE)

+0108    Token =                B0834EB0 (-> PACCESS_TOKEN)

+010c    WorkingSetLock(FAST_MUTEX struct)

+010c      Count =                00000001

+0110      Owner =                00000000 (-> PKTHREAD)

+0114      Contention =           00000000

+0118      Event(KEVENT struct)

+0128      OldIrql =              0000003d

+012c    WorkingSetPage =       0002ec71

+0130    ProcessOutswapEnabled = 00

+0131    ProcessOutswapped =    00

+0132    AddressSpaceInitialized = 01

+0133    AddressSpaceDeleted =  00

+0134    AddressCreationLock(FAST_MUTEX struct)

+0134      Count =                00000001

+0138      Owner =                00000000 (-> PKTHREAD)

+013c      Contention =           00000000

+0140      Event(KEVENT struct)

+0150      OldIrql =              00000000

+0154    HyperSpaceLock =       00000000

+0158    ForkInProgress =       00000000 (-> PETHREAD)

+015c    VmOperation =          0000

+015e    ForkWasSuccessful =    00

+015f    MmAgressiveWsTrimMask = 00

+0160    VmOperationEvent =     00000000 (-> PKEVENT)

+0164    PageDirectoryPte(HARDWARE_PTE struct)

+0168    LastFaultCount =       00000000

+016c    ModifiedPageCount =    00000000

+0170    VadRoot =              a856cfc8

+0174    VadHint =              a856cfc8

+0178    CloneRoot =            00000000

+017c    NumberOfPrivatePages = 0000001e

+0180    NumberOfLockedPages =  00000000

+0184    NextPageColor =        5d24

+0186    ExitProcessCalled =    01

+0187    CreateProcessReported = 00

+0188    SectionHandle =        00000004 (-> HANDLE)

+018c    Peb =                  7FFDF000 (-> PPEB)

+0190    SectionBaseAddress =   01000000

+0194    QuotaBlock =           BDCEEFD0 (-> PEPROCESS_QUOTA_BLOCK)

+0198    LastThreadExitStatus(NTSTATUS) = 0(STATUS_SUCCESS)

+019c    WorkingSetWatch =      00000000 (-> PPAGEFAULT_HISTORY)

+01a0    Win32WindowStation =   00000000 (-> HANDLE)

+01a4    InheritedFromUniqueProcessId = 0000007C (-> HANDLE)

+01a8    GrantedAccess(ACCESS_MASK) = 1f0fff( STANDARD_RIGHTS_ALL )

+01ac    DefaultHardErrorProcessing = 00008000

+01b0    LdtInformation =       00000000

+01b4    VadFreeHint =          ba2cafc8

+01b8    VdmObjects =           00000000

+01bc    ProcessMutant(KMUTANT struct)

+01bc      Header(DISPATCHER_HEADER struct)

+01cc      MutantListEntry(LIST_ENTRY struct)

+01d4      OwnerThread =          00000000 (-> PKTHREAD)

+01d8      Abandoned =            00

+01d9      ApcDisable =           00

+01dc    ImageFileName =        cgiapp.exe......    63 67 69 61 70 70 2e 65 78 65 00 00 00 00 00 00

+01ec    VmTrimFaultValue =     00000000

+01f0    SetTimerResolution =   00

+01f1    PriorityClass =        02

+01f2    SubSystemMinorVersion = 00

+01f3    SubSystemMajorVersion = 04

+01f2    SubSystemVersion =     0400

+01f4    Win32Process =         00000000

 

Figure 7 — Compare With Data in Memory

 

 

*Note the value at offset 0x104 – it is NULL!
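
The trap frame offers an independent check that the pointer was non-NULL when it was tested.  The saved EFLAGS value (efl=00010282 in Figure 4) still reflects the `test eax,eax` instruction, because neither `je` nor the faulting `mov` changes the flags.  The Python sketch below decodes three of the architectural flag bits; `decode_eflags` is a hypothetical helper.

```python
# Bit positions of three IA32 EFLAGS bits.
FLAG_BITS = {"ZF": 6, "SF": 7, "IF": 9}


def decode_eflags(value):
    """Return the state of the zero, sign and interrupt flags."""
    return {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}


flags = decode_eflags(0x00010282)  # efl from the !trap output
print(flags)
```

The zero flag is clear, so the value tested was not zero, and the sign flag is set, which is consistent with an EAX value of af3defb0 (top bit set).  The field in memory, by contrast, now reads as NULL.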

 

We terminated our analysis at this stage, believing that it was likely we had found a multiprocessor race condition within Windows NT.  Specifically, the object handle table had been deleted and deallocated at the same time a separate thread was attempting to dereference it.  We concluded this was sufficient analysis to report to Microsoft and that further study on our part would be inconclusive.
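
The suspected interleaving can be modelled deterministically.  The Python sketch below is not NT code and does not prove the hypothesis; it only replays the order of events we believe occurred, with a toy `Memory` class standing in for the handle table allocation and a made-up field value.

```python
class Fault(Exception):
    """Stands in for the page fault on a freed address."""


class Memory:
    """Toy address space: only allocated addresses can be read."""

    def __init__(self):
        self.live = {}

    def alloc(self, address, value):
        self.live[address] = value

    def free(self, address):
        del self.live[address]

    def read(self, address):
        if address not in self.live:
            raise Fault(hex(address))
        return self.live[address]


def run_interleaving():
    memory = Memory()
    table = 0xAF3DEFB0                # handle table address seen in EAX
    memory.alloc(table + 0x14, 7)     # hypothetical field value
    process = {"ObjectTable": table}

    # CPU 1 (query thread): load the pointer; it passes the NULL test.
    pointer = process["ObjectTable"]
    if pointer == 0:
        return None, process["ObjectTable"]

    # CPU 0 (exiting thread): clear the field and free the table.
    process["ObjectTable"] = 0
    memory.free(table + 0x14)

    # CPU 1 (query thread): dereference the now-stale pointer.
    try:
        memory.read(pointer + 0x14)
    except Fault as fault:
        return str(fault), process["ObjectTable"]
    return None, process["ObjectTable"]


result = run_interleaving()
print(result)
```

The model ends in the same state as the dump: a fault at af3defc4 and an ObjectTable field that reads as NULL afterwards.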

 

After class, one of the students who had access to the relevant NT source code advised us that the field being accessed was a count field in the object handle table.  He was unable to ascertain why the access to the field became invalid during use, but he confirmed our analysis.

 

This type of system level damage is common to MP race condition problems, where the problem occurs only under specific loads (such as running a new set of HCT tests with what may have been slightly different behavior characteristics) and the analysis ultimately leads not to a direct demonstration of the problem but to a system state that is inconsistent (such as this one).

 

We do not know if Microsoft has accepted this problem as a legitimate bug or if it is resolved in subsequent versions of Windows NT or Windows 2000.  Perhaps one of our loyal readers has more information?

 

 

 
