OSRLogo
OSRLogoOSRLogoOSRLogo x Seminar Ad
OSRLogo
x

Everything Windows Driver Development

x
x
x
GoToHomePage xLoginx
 
 

    Wed, 01 Oct 2014     104741 members

   Login
   Join


 
 
Contents
  About This Site
What's New?
OSR Dev Blog
The NT Insider
The Basics
File Systems
Downloads
ListServer / Forum
Driver Jobs
Store
  Express Links
  · The NT Insider Digital Edition - Sept-Oct 2014 Now Available!
  · Sept-Oct Issue of The NT Insider Released!
  · Writing WDF Drivers: Advanced Implementation Techniques
  · OSR Seminar Schedule
  · Windows 8.1 Update: VS Express Now Supported

You've Typed !analyze -v, Now What? -- Next Steps in Debugging


We often see people asking what to do when their driver doesn’t work – this question arises on an almost daily basis. While the actual setup of the debugger now seems to be reasonably well understood, the steps beyond that appear to be shrouded in mystery. This article will warn about common issues and delve into some of the basic steps that you should follow when debugging.

For the purposes of this article, we will assume that you either have a live system debugging set-up (and have setup the appropriate symbols - see Side Bar Accessing the Microsoft Symbol Server at conclusion of article) or you are looking at a post-mortem dump in the debugger. For this article we are using the current (as of this article) publicly available version of the Windows debugger (WinDBG) from Microsoft (6.3.17.0, May 27, 2004). If you are using a different version of the debugger, the details may vary from what we describe in this article.

The first place to start with any crash is the information provided using the command "!analyze –v". This command invokes the "analyze" routine embedded within the ext.dll library shipped with the current debugger. This library encapsulates a tremendous body of knowledge for the causes of most common failures – and each new version of the debugger reflects the continuous learning process at Microsoft on how to perform automated fault isolation.

For example, we looked through our archives and found a recent crash dump to describe the use of the analyze utility as the starting point of our investigation, see Figure 1.

kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
        use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
        use .trap on that value
Else
        .trap on the appropriate frame will show where the trap was taken
        (on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 00000008, EXCEPTION_DOUBLE_FAULT
Arg2: 80042000
Arg3: 00000000
Arg4: 00000000

Debugging Details:
------------------


BUGCHECK_STR:  0x7f_8

TSS:  00000028 -- (.tss 28)
eax=00030e01 ebx=812ca368 ecx=8117f300 edx=00000001 esi=8117f3ac edi=00010000
eip=baf3f9d7 esp=f5031fc0 ebp=f503205c iopl=0         nv up ei ng nz na po nc
cs=0008  ss=0010  ds=0023  es=0023  fs=0030  gs=0000             efl=00010286
atapi!IdeSendCommand+0x3:
baf3f9d7 53               push    ebx
Resetting default scope

DEFAULT_BUCKET_ID:  DRIVER_FAULT

LAST_CONTROL_TRANSFER:  from baf40776 to baf3f9d7

STACK_TEXT: 
f503201c baf40776 812ca368 8117f3ac 00010000 atapi!IdeSendCommand+0x3
f503205c baf4291c 812ca368 8117f3ac 812ca1c8 atapi!AtapiStartIo+0x22b
f5032088 80530f57 012ca030 00000002 0dd36002 atapi!IdeStartIoSynchronized+0x162
812ca030 00000000 812fdc18 00000000 00000000 nt!KeSynchronizeExecution+0x17


FOLLOWUP_IP:
atapi!IdeSendCommand+3
baf3f9d7 53               push    ebx

SYMBOL_STACK_INDEX:  0

FOLLOWUP_NAME:  MachineOwner

SYMBOL_NAME:  atapi!IdeSendCommand+3

MODULE_NAME:  atapi

IMAGE_NAME:  atapi.sys

DEBUG_FLR_IMAGE_TIMESTAMP:  3d6ddb04

STACK_COMMAND:  .tss 28 ; kb

BUCKET_ID:  0x7f_8_atapi!IdeSendCommand+3

Followup: MachineOwner
---------

Figure 1

Note that the output from !analyze -v does not tell us the specific cause of the crash, but it does give us a place to start looking. First, make sure to read the text that describes the bug check code. It is also worthwhile to read the online documentation with WinDBG in order to better understand the bug check code.

In this example, the bug check code (0x7F) indicates that this is a "kernel mode trap". Note that the example text indicates 8, which represents a double fault – exactly the case in our crash example (this is also a hint to us that this is probably a common case, since it was used as the example). The documentation here also refers us out to the Intel reference manual – these manuals are useful when debugging (on the x86) and can be found on the Intel web site (as of the writing of this article the Pentium 4 documentation can be found at http://developer.intel.com/design/Pentium4/documentation.htm). This includes information about the various types of traps supported by the x86 processor family. It confirms that trap 8 is the "double fault" trap.

The WinDBG documentation also provides further insight here:

0x00000008, or Double Fault, is when an exception occurs while trying to call the handler for a prior exception. Normally, the two exceptions can be handled serially. However, there are several exceptions that cannot be handled serially, and in this situation the processor signals a double fault. There are two common causes of a double fault:

  • A kernel stack overflow. This occurs when a guard page is hit, and then the kernel tries to push a trap frame. Since there is no stack left, a stack overflow results, causing the double fault. If you suspect this has occurred, use !thread to determine the stack limits, and then use kb (Display Stack Backtrace) with a large parameter (for example, kb 100) to display the full stack.
  • A hardware problem. This also points out an important attribute of analyzing any crash: understand the context in which the crash occurred. If this were a machine running your experimental driver, it would seem likely that a software bug, and not a hardware problem was the cause. That allows us to ignore the less likely cases.

Continuing on, let’s follow the advice of the documentation. So, in our sample crash we do as we’re told, see Figure 2.

kd> !thread
THREAD ffb6fda8  Cid 0314.0330  Teb: 7ffdb000 Win32Thread: e1785120 RUNNING on processor 0
IRP List:
    81662e28: (0006,01d8) Flags: 40000404  Mdl: 00000000
    81714f68: (0006,0094) Flags: 40000000  Mdl: 00000000
    81d1af68: (0006,0094) Flags: 40000000  Mdl: ffab9d78
    81d5af68: (0006,0094) Flags: 40000000  Mdl: ffabbde0
    81784f68: (0006,0094) Flags: 40000000  Mdl: ffabbe60
    81990f68: (0006,0094) Flags: 40000000  Mdl: ffabc6f8
    81760f68: (0006,0094) Flags: 40000900  Mdl: 811b3bd0
Not impersonating
DeviceMap                 e1001098
Owning Process            ffb750b0
Wait Start TickCount      36960          Elapsed Ticks: 0
Context Switch Count      251                 LargeStack
UserTime                  00:00:00.0100
KernelTime                00:00:00.0130
Start Address 0x77e7d342
Win32 Start Address 0x77f95b06
Stack Init f5035000 Current f50326f0 Base f5035000 Limit f5032000 Call 0
Priority 8 BasePriority 8 PriorityDecrement 0 DecrementCount 16
ChildEBP RetAddr  Args to Child             
805367e0 8052f283 0000007f 00000008 80042000 nt!KeBugCheckEx+0x19 (FPO: [Non-Fpo])
805367e0 baf3f9d7 0000007f 00000008 80042000 nt!KiTrap08+0x52 (FPO: TaskGate 28:0)
f503201c baf40776 812ca368 8117f3ac 00010000 atapi!IdeSendCommand+0x3 (FPO: [EBP 0xf503205c] [2,24,4])
f503205c baf4291c 812ca368 8117f3ac 812ca1c8 atapi!AtapiStartIo+0x22b (FPO: [Non-Fpo])
f5032088 80530f57 012ca030 00000002 0dd36002 atapi!IdeStartIoSynchronized+0x162 (FPO: [Non-Fpo])
812ca030 00000000 812fdc18 00000000 00000000 nt!KeSynchronizeExecution+0x17

Figure 2

We have emphasized the line that contains the actual stack limits. It is important to remember that successive addresses on the stack are lower than the previous limit, so that we should think of stacks as "growing down". Thus, the address of the stack base is greater than the limit of the stack.

If we look at the stack information provided by the !thread command we can also see that the stack addresses (along the far left edge the "ChildEBP" values) end with 0x805367e0 but a few lines below are 0xf503201c – the first address is not within the stack range for this thread (0xf5032000 to 0xf5035000) but that later address is – and just above the stack limit. This does seem to suggest some sort of stack overflow.

The debugger documentation suggests using kb with a large numeric value. We tried this, see Figure 3.

kd> kb 100

ChildEBP RetAddr  Args to Child             

805367e0 8052f283 0000007f 00000008 80042000 nt!KeBugCheckEx+0x19

805367e0 baf3f9d7 0000007f 00000008 80042000 nt!KiTrap08+0x52

f503201c baf40776 812ca368 8117f3ac 00010000 atapi!IdeSendCommand+0x3

f503205c baf4291c 812ca368 8117f3ac 812ca1c8 atapi!AtapiStartIo+0x22b

f5032088 80530f57 012ca030 00000002 0dd36002 atapi!IdeStartIoSynchronized+0x162

812ca030 00000000 812fdc18 00000000 00000000 nt!KeSynchronizeExecution+0x17

Figure 3

However, this doesn’t provide us with any more information than we had with the !thread command. At this point it would be easy to give up and say, "I don’t know. This documentation is wrong, the debugger is broken, and this is too hard!" DO NOT GIVE UP YET!

Note that we can still exploit the steps suggested by the debugger. We wandered off looking at the documentation, but the debugger suggested that we look at the results from the kv command, see Figure 4.

 

kd> kv

ChildEBP RetAddr  Args to Child             

805367e0 8052f283 0000007f 00000008 80042000 nt!KeBugCheckEx+0x19 (FPO: [Non-Fpo])

805367e0 baf3f9d7 0000007f 00000008 80042000 nt!KiTrap08+0x52 (FPO: TaskGate 28:0)

f503201c baf40776 812ca368 8117f3ac 00010000 atapi!IdeSendCommand+0x3 (FPO: [EBP 0xf503205c] [2,24,4])

f503205c baf4291c 812ca368 8117f3ac 812ca1c8 atapi!AtapiStartIo+0x22b (FPO: [Non-Fpo])

f5032088 80530f57 012ca030 00000002 0dd36002 atapi!IdeStartIoSynchronized+0x162 (FPO: [Non-Fpo])

812ca030 00000000 812fdc18 00000000 00000000 nt!KeSynchronizeExecution+0x17

Figure 4

Note that this is similar in output to the kb command, but includes additional information about how the particular call frame was constructed. We can see that the call to KiTrap08 includes a TaskGate – and the debugger advised that if we found a task gate we should use .tss and then obtain a second stack list. This gives us what we see in Figure 5.

kd> kv

  *** Stack trace for last set context - .thread/.cxr resets it

ChildEBP RetAddr  Args to Child             

f503201c baf40776 812ca368 8117f3ac 00010000 atapi!IdeSendCommand+0x3 (FPO: [EBP 0xf503205c] [2,24,4])

f503205c baf4291c 812ca368 8117f3ac 812ca1c8 atapi!AtapiStartIo+0x22b (FPO: [Non-Fpo])

f5032088 80530f57 012ca030 00000002 0dd36002 atapi!IdeStartIoSynchronized+0x162 (FPO: [Non-Fpo])

812ca030 00000000 812fdc18 00000000 00000000 nt!KeSynchronizeExecution+0x17

Figure 5

Alas, this doesn’t seem to tell us much more than we knew before. Again, it would be easy for us to quit at this point, but we can still look farther and see if we can find the underlying cause. Unfortunately, once the automated part of the analysis is done, we have to actually start poking around – and most of the time, what we write about in The NT Insider is that process of "poking around".

So, what do we do next? We already suspect that this is a stack overflow, so it would be nice to see if we can find potential culprits for this stack consumption. Such large consumers would show up as big gaps between the ChildEBP values we see on the stack – and would suggest where to look in the guilty driver to resolve the underlying problem.

Unfortunately, sometimes the debugger cannot unwind the stack. In such cases, we must manually walk the stack looking for another call frame and see if we can continue the stack unwind. We normally analyze the stack from the last valid frame (in this case the third line, not that final line). The third line lists a different stack value for the ChildEBP and thus is suspect. We sometimes see this in the debugger output and have learned over the years to treat that final entry with suspicion.

We’re specifically looking for a pair of addresses: the first will be a ChildEBP (pushed on the stack upon entry to a function) and the second will be a return address for the function that called. We can use the information from these two to reconstruct the call sequence. In the listing below we highlighted the prospective addresses – keep in mind that a ChildEBP would normally be an address above the current location, but still within the stack – so it will be numerically close, see Figure 6.

kd> dd f5032088

f5032088  812ca030 80530f57 012ca030 00000002

f5032098  0dd36002 baf42b57 812fd970 baf427ba

f50320a8  812ca030 ffa50e94 813034c0 f5032104

f50320b8  0dd36000 baf43827 00000000 00000000

f50320c8  00000000 812ca030 f9ad3c75 812ca030

f50320d8  00000000 806adee0 81303408 00000000

f50320e8  812ddbd8 813034c0 ffa25000 806ab698

f50320f8  00004000 81322220 00000000 f5032134

Figure 6

Thus, our first candidate is f5032104. The corresponding candidate return address is actually 0dd36000, but we know that can’t be a valid kernel address (since it is below the kernel address space). Interestingly, when we first did this we accidentally skipped this value and used the next stack value - baf43827. As it turns out, this was the correct value, but we arrived at it accidentally.

To determine if a value is a valid return address, we must remember that the return address will be the instruction after the call. So, in order to see the call instruction we must back up in the instruction stream. In general, we try to back up five or six bytes from the return address in order to hit the call – presumably, using that Intel manual, one can figure out the various possible sizes for a call operation, but nobody around here has done that analysis yet! This yields the results seen in Figure 7.

kd> u baf43827-5

atapi!IdePortAllocateAccessToken+0x12:

baf43822 e803f3ffff       call    atapi!CallIdeStartIoSynchronized (baf42b2a)

baf43827 eb0d            jmp atapi!IdePortAllocateAccessToken+0x26 (baf43836)

baf43829 682a2bf4ba       push    0xbaf42b2a

baf4382e ffb084000000     push    dword ptr [eax+0x84]

baf43834 ffd1             call    ecx

baf43836 c20400           ret     0x4

baf43839 cc               int     3

atapi!IdeProcessCompletedRequest:

baf4383a 55               push    ebp

Figure 7

So since we can see a call, we really do have a return address. We can feed this into WinDBG using the kb or kv command plus the optional arguments (the ESP EBP and EIP values at that point in the call sequence). We know the EBP – that was on the stack (f5032104), and we know the EIP – that was the call instruction (baf43822). The stack pointer must have been the location on the stack of the return address, since the processor pushed the return address onto the stack at the current value of ESP. Thus, ESP was f50320bc (we took the address of the first location listed – f50320b8 and added 4 to it – and this is when we noticed we grabbed the wrong address earlier!).

Just to touch on our erroneous guess before (and confirm that we "got lucky") we looked at the code of the called function seen in Figure 8.

kd> u baf42b2a

atapi!CallIdeStartIoSynchronized:

baf42b2a 53               push    ebx

baf42b2b 55               push    ebp

baf42b2c 8b6c2418         mov     ebp,[esp+0x18]

baf42b30 56               push    esi

baf42b31 8b7528           mov     esi,[ebp+0x28]

baf42b34 57               push    edi

baf42b35 8dbee0000000     lea     edi,[esi+0xe0]

baf42b3b 8bcf             mov     ecx,edi

Figure 8

Note that the first value pushed was the value in EBX and the second was the value in EBP. While we are not sure why the debugger could not unwind this, we know that we can do so – and we’ll capitalize on our earlier mistake! This gives us the results seen in Figure 9.

kd> kv = f5032104 f50320bc baf43822

ChildEBP RetAddr  Args to Child             

f50320b8 baf43827 00000000 00000000 00000000 atapi!IdePortAllocateAccessToken+0x12 (FPO: [1,0,0])

f50320cc f9ad3c75 812ca030 00000000 806adee0 atapi!IdePortAllocateAccessToken+0x17 (FPO: [1,0,0])

f50320d8 806adee0 81303408 00000000 812ddbd8 PCIIDEX!BmReceiveScatterGatherList+0x21 (FPO: [4,0,1])

f5032104 806ae0b4 00000000 81303408 ffa50e68 hal!HalBuildScatterGatherList+0x19c (FPO: [Non-Fpo])

f5032134 f9ad3d17 81322220 81303408 ffa50e68 hal!HalGetScatterGatherList+0x24 (FPO: [Non-Fpo])

f5032168 baf44114 813034c0 ffa25000 00004000 PCIIDEX!BmSetup+0x5d (FPO: [Non-Fpo])

f50321a0 804eb04f 812ca030 81ab6f20 812fd578 atapi!IdePortStartIo+0xe6 (FPO: [Non-Fpo])

f50321c0 baf434fc 812ca030 81ab6f20 00000000 nt!IoStartPacket+0x7b (FPO: [Non-Fpo])

f50321ec 804ea221 812ca030 00ab6f20 806ac214 atapi!IdePortDispatch+0x4c0 (FPO: [Non-Fpo])

f50321fc 8062c190 81ab6f20 bafb910c 00000000 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5032220 bafb9133 bafb910c bafb9898 812fd2a8 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5032228 bafb9898 812fd2a8 81ab6f20 812fd2a8 ACPI!ACPIDispatchForwardIrp+0x27 (FPO: [2,0,1])

f5032258 804ea221 812fd2a8 bafcd68c 806ac214 ACPI!ACPIDispatchIrp+0x158 (FPO: [Non-Fpo])

f5032268 8062c190 81ab6fd8 8117f300 812fd2a8 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f503228c f98a4028 ffa25000 81a18e28 00000000 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f503229c f98a3c4b 8117f300 81328998 81a18f28 CLASSPNP!SubmitTransferPacket+0x7e (FPO: [1,0,3])

f50322c8 f98a3b5a 00004000 00004000 813288e0 CLASSPNP!ServiceTransferRequest+0xe0 (FPO: [Non-Fpo])

f50322ec 804ea221 813288e0 00000000 806ac214 CLASSPNP!ClassReadWrite+0xfd (FPO: [Non-Fpo])

f50322fc 8062c190 813286b8 812fd040 81328770 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5032320 f9adb36c 812f7b00 81a18f4c f5032364 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

Figure 9

Of course, this still isn’t enough stack information. The previous trick (of providing the stack frame count) won’t work this time around because we overrode the stack parameters. Thus, we must use a different debugger trick – the .kframes directive. That changes the default so that the debugger shows us more frames by default. We set it to 0x100 and repeated the stack backtrace and we now have what you see in Figure 10.

kd> .kframes 100

Default stack trace depth is 0n256 frames

kd> kv = f5032104 f50320bc baf43822

ChildEBP RetAddr  Args to Child             

f50320b8 baf43827 00000000 00000000 00000000 atapi!IdePortAllocateAccessToken+0x12 (FPO: [1,0,0])

f50320cc f9ad3c75 812ca030 00000000 806adee0 atapi!IdePortAllocateAccessToken+0x17 (FPO: [1,0,0])

f50320d8 806adee0 81303408 00000000 812ddbd8 PCIIDEX!BmReceiveScatterGatherList+0x21 (FPO: [4,0,1])

f5032104 806ae0b4 00000000 81303408 ffa50e68 hal!HalBuildScatterGatherList+0x19c (FPO: [Non-Fpo])

f5032134 f9ad3d17 81322220 81303408 ffa50e68 hal!HalGetScatterGatherList+0x24 (FPO: [Non-Fpo])

f5032168 baf44114 813034c0 ffa25000 00004000 PCIIDEX!BmSetup+0x5d (FPO: [Non-Fpo])

f50321a0 804eb04f 812ca030 81ab6f20 812fd578 atapi!IdePortStartIo+0xe6 (FPO: [Non-Fpo])

f50321c0 baf434fc 812ca030 81ab6f20 00000000 nt!IoStartPacket+0x7b (FPO: [Non-Fpo])

f50321ec 804ea221 812ca030 00ab6f20 806ac214 atapi!IdePortDispatch+0x4c0 (FPO: [Non-Fpo])

f50321fc 8062c190 81ab6f20 bafb910c 00000000 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5032220 bafb9133 bafb910c bafb9898 812fd2a8 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5032228 bafb9898 812fd2a8 81ab6f20 812fd2a8 ACPI!ACPIDispatchForwardIrp+0x27 (FPO: [2,0,1])

f5032258 804ea221 812fd2a8 bafcd68c 806ac214 ACPI!ACPIDispatchIrp+0x158 (FPO: [Non-Fpo])

f5032268 8062c190 81ab6fd8 8117f300 812fd2a8 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f503228c f98a4028 ffa25000 81a18e28 00000000 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f503229c f98a3c4b 8117f300 81328998 81a18f28 CLASSPNP!SubmitTransferPacket+0x7e (FPO: [1,0,3])

f50322c8 f98a3b5a 00004000 00004000 813288e0 CLASSPNP!ServiceTransferRequest+0xe0 (FPO: [Non-Fpo])

f50322ec 804ea221 813288e0 00000000 806ac214 CLASSPNP!ClassReadWrite+0xfd (FPO: [Non-Fpo])

f50322fc 8062c190 813286b8 812fd040 81328770 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5032320 f9adb36c 812f7b00 81a18f4c f5032364 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5032330 804ea221 813286b8 81a18e28 806ac214 PartMgr!PmReadWrite+0x93 (FPO: [Non-Fpo])

f5032340 8062c190 81a18e28 81a18f70 812f7bb0 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5032364 baf781c6 812f7af8 812fbc88 81a18e00 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5032380 804ea221 812f7af8 81a18e28 806ac214 ftdisk!FtDiskReadWrite+0x194 (FPO: [Non-Fpo])

f5032390 8062c190 81a18f8c 81a18fb0 81a18e28 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f50323b4 f9883a4b 81a18e00 812f9020 812fdf38 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f50323c8 804ea221 812f90d8 81a18e28 806ac214 VolSnap!VolSnapWrite+0xb9 (FPO: [Non-Fpo])

f50323d8 8062c190 812c7738 81a18e28 81a18e0c nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f50323fc bae8fcd5 f50327ec 81a18e28 f50325e4 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f503240c bae8f77d f50327ec 812f9020 3f95d000 Ntfs!NtfsSingleAsync+0x6b (FPO: [Non-Fpo])

f50325e4 bae90b3a f50327ec 81a18e28 812c5a98 Ntfs!NtfsNonCachedIo+0x363 (FPO: [Non-Fpo])

f50327dc bae904f1 f50327ec 81a18e28 0110070a Ntfs!NtfsCommonWrite+0x1847 (FPO: [Non-Fpo])

f5032950 804ea221 812c7658 81a18e28 806ac214 Ntfs!NtfsFsdWrite+0xf3 (FPO: [Non-Fpo])

f5032960 8062c190 81a18e28 812f9350 812c70d8 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5032984 baf2c3b8 812c7020 812c5c00 f50329c8 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5032994 804ea221 812c7020 81a18e28 806ac214 sr!SrWrite+0xa8 (FPO: [Non-Fpo])

f50329a4 8062c190 81a18fdc 81a19000 812c5c90 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f50329c8 f468222f ffb5bc98 81280b60 ffb5bc00 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f50335f4 f4682377 ffb5bd50 81a18e28 804ea221 IOTRAP!IOTrapHookProc+0x27f (FPO: [Non-Fpo]) (CONV: stdcall) [c:\dev\iotrap\iotrap.c @ 2728]

f5033600 804ea221 ffb5bc98 81a18e28 806ac214 IOTRAP!IOTrapDispatch+0x1d (FPO: [2,0,0]) (CONV: stdcall) [c:\dev\iotrap\iotrap.c @ 2844]

f5033610 8062c190 812c5c90 007e3000 ffb5bc98 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5033634 804eb2d3 f5033828 f5033670 00000000 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5033648 80504af6 812c5c0a f5033670 f5033704 nt!IoSynchronousPageWrite+0xad (FPO: [Non-Fpo])

f5033720 80505466 e14e6f8c e14e6f9c 00000000 nt!MiFlushSectionInternal+0x37a (FPO: [7,44,3])

f503375c 804e048d 00000000 e14e6f8c 000007e3 nt!MmFlushSection+0x1e0 (FPO: [Non-Fpo])

f50337e4 baeb1f83 00004000 f503385c 00004000 nt!CcFlushCache+0x363 (FPO: [Non-Fpo])

f50338ac baeb2044 e144d8c0 e15f38a0 e144d8c0 Ntfs!LfsFlushLfcb+0x227 (FPO: [Non-Fpo])

f50338d0 baeb6d35 e144d8c0 e15f38a0 e14b8fd8 Ntfs!LfsFlushLbcb+0x7f (FPO: [Non-Fpo])

f50338f8 baeb23d1 e144d8c0 0c0fceb4 00000000 Ntfs!LfsFlushToLsnPriv+0xf1 (FPO: [Non-Fpo])

f5033938 804df58e e14b8fd8 0c0fceb4 00000000 Ntfs!LfsFlushToLsn+0x8e (FPO: [Non-Fpo])

f50339e0 804e03ba ffa655e8 00000000 00000001 nt!CcAcquireByteRangeForWrite+0x7dc (FPO: [Non-Fpo])

f5033a6c baec6055 00001000 00000000 00000000 nt!CcFlushCache+0x290 (FPO: [Non-Fpo])

f5033acc baec5fe2 f5033e3c e1274008 f5033b7c Ntfs!NtfsDeleteAllocationFromRecord+0x125 (FPO: [Non-Fpo])

f5033bfc baecc97d f5033e3c e1274008 e1676d20 Ntfs!NtfsDeleteFile+0x25a (FPO: [Non-Fpo])

f5033e18 baeafac0 f5033e3c 81662e28 812c7658 Ntfs!NtfsCommonCleanup+0xaaf (FPO: [Non-Fpo])

f5033f90 804ea221 812c7658 81662e28 806ac214 Ntfs!NtfsFsdCleanup+0xcf (FPO: [Non-Fpo])

f5033fa0 8062c190 812c70d8 812f9350 81662e28 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5033fc4 baf3066f 812c7020 812d1400 f5034008 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5033fd4 804ea221 812c7020 e175cdb0 806ac214 sr!SrCleanup+0xb1 (FPO: [Non-Fpo])

f5033fe4 8062c190 81662fdc 81663000 812d14c8 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5034008 f468222f ffb5bc98 81280b60 81662e00 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

Page f4000 too large to be in the dump file.

f5034c34 f4682377 ffb5bd50 81662e28 804ea221 IOTRAP!IOTrapHookProc+0x27f (FPO: [Non-Fpo]) (CONV: stdcall) [c:\dev\iotrap\iotrap.c @ 2728]

f5034c40 804ea221 ffb5bc98 81662e28 806ac214 IOTRAP!IOTrapDispatch+0x1d (FPO: [2,0,0]) (CONV: stdcall) [c:\dev\iotrap\iotrap.c @ 2844]

f5034c50 8062c190 812d14c8 81662e28 81662e38 nt!IopfCallDriver+0x31 (FPO: [0,0,1])

f5034c74 80560d4d 812d14b0 8130b560 00000001 nt!IovCallDriver+0x9e (FPO: [Non-Fpo])

f5034ca8 80597c03 ffb750b0 ffb5bc98 00110080 nt!IopCloseFile+0x261 (FPO: [Non-Fpo])

f5034cd8 805975bb ffb750b0 812d14c8 8130b560 nt!ObpDecrementHandleCount+0x119 (FPO: [Non-Fpo])

f5034d00 80597651 e157e280 812d14c8 00001a0c nt!ObpCloseHandleTableEntry+0x14b (FPO: [Non-Fpo])

f5034d48 80597777 00001a0c 00000001 00000000 nt!ObpCloseHandle+0x85 (FPO: [Non-Fpo])

f5034d58 8052d571 00001a0c 00530020 00740065 nt!NtClose+0x19 (FPO: [1,0,0])

f5034d58 7ffe0304 00001a0c 00530020 00740065 nt!KiSystemService+0xc4 (FPO: [0,0] TrapFrame @ f5034d64)

005df220 00000000 00000000 00000000 00000000 SharedUserData!SystemCallStub+0x4 (FPO: [0,0,0])

Figure 10

This does draw our attention to this call frame and we note that it does appear to consume an unhealthy amount of stack space – 3116 bytes to be exact. We note that this driver is on the stack twice, and the second invocation also uses 3116 bytes. Thus, a single driver in this one stack has used 6232 bytes of stack space. While itself not fatal, it certainly suggests where we should be turning our attention!

The key to debugging is perseverance. Follow all of the clues and all of the possible avenues. It is difficult when you first get started, but each time you do this, your understanding of the system improves as does your repertoire of scenarios. Eventually you will be debugging like a pro!

Accessing the Microsoft Symbol Server
One of the most common problems that new users have when using the debugger is that the symbols are not properly configured.  Current version of WinDBG have a new command that will reset your debugger configuration to retrieve symbols from the Microsoft symbol server.  This command is the .symfix command and it will adjust the symbol search path to default to srv**http://msdl.microsoft.com/download/symbols -- this is the default public location for symbols.  If you do not have Internet acess, this will not work (duh), but it is a terrific fix for the vast majority of configurations.

User Comments
Rate this article and give us feedback. Do you find anything missing? Share your opinion with the community!
Post Your Comment

"Internal function interfaces"
Where to find internal function interfaces/prototypes e.g. ntfs!ntfscommancleanup, hal!HalpIpiHandler, nt!KiIpiServiceRoutine, nt!KiFreezeTargetExecution, etc ?

Rating:
29-Oct-09, Natarajan Sundararajan


"Article is Really Good"
Article is Really Good, Commments too. I am a beginner in Windows Driver Development & Debugging, It's Helped me lot to Analysis the Real time crash dump.

Rating:
08-Jun-07, Mohamed Jaffar


"DDS command"
The part of the analysis talked about in relation to Fig 6 is made easier by using the DDS esp command. This command lists symbols for those dwords on the stack that can be translated to a symbol. This makes it easy to pick out potential return addresses.

Additionally I have usually found it sufficient to use KV = and leave off the EIP and ESP values. Plus, KV = can take a frame count, so you don't have to use .kframe command.

04-Feb-05, Chip Webb


"Great article"
I think it would be nice to finish the article by proving that the problem indeed was a stack overflow problem. This way we have at least proven that we need to look into the unhealthy stack memory consumption of iotrap:

TSS: 00000028 -- (.tss 28) eax=00030e01 ebx=812ca368 ecx=8117f300 edx=00000001 esi=8117f3ac edi=00010000 eip=baf3f9d7 esp=f5031fc0 ebp=f503205c iopl=0 nv up ei ng nz na po nc cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000 efl=00010286 atapi!IdeSendCommand+0x3: baf3f9d7 53 push ebx <- We're pushing data onto the stack beyond it's limit.

Current ESP = f5031fc0

Thread stack limits:

Stack Init f5035000 Current f50326f0 Base f5035000 Limit f5032000 Call 0

Note that we are already over the limit. Before the push instruction is executed. Which means that the routine probably corrupted another component's memory by writing more data to the stack.

I am still looking for an explanation why it failed at ESP = f5031fc0 and not at ESP = f5032000. Could it be that the page the stack was written to was paged out? A 'dd f5031f00' would show us whether or not the memory memory was valid. But perhaps this is not the reason of the double fault.

Loved the KV tip to unwind the stack with manual parameters. Never used the K commands like this before. A tip may be to use the !stack command which is, I think, included with the kdex2x86.dll extension. I think it would show a similar result.

Keep up the good work!

Rating:
28-Aug-04, Erwin Zoer


"Great article!"
Good work guys! Personally, I prefer the following steps: 1) use kv to capture the trap frame or TSS 2) .trap or .tts to where it died. 3) dds esp l200 4) then feed kv with what I saw in the output from 3)

Love to see more crash analyze.

Rating:
27-Aug-04, Calvin Guan


Post Your Comments.
Print this article.
Email this article.

Windows Internals and SW Drivers
LAB

Dulles/Sterling, VA
20-24 Oct 2014

Developing File Systems for Windows
Seattle, WA
4-7 Nov 2014

Kernel Debugging and Crash Analysis
LAB

Boston/Waltham, MA
10-14 Nov 2014

 
 

Windows Debugger

Checked Build Downloads
29-Apr-10

Debugging Symbols

WDK Documentation

Windows WDK

 
 
x
LetUsHelp
 

Need to develop a Windows file system solution?

We've got a kit for that.

Need Windows internals or kernel driver expertise?

Bring us your most challenging project - we can help!

System hangs/crashes?

We've got a special diagnostic team that's standing by.

Visit the OSR Corporate Web site for more information about how OSR can help!

 
bottom nav links