DRIVER_IRQL_NOT_LESS_OR_EQUAL after KiDeliverApc

I just got a crash I can’t seem to puzzle out. Here is the bugcheck info:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pagable (or completely invalid) address at
an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000000, memory referenced
Arg2: 0000001c, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 00000000, address which referenced memory

Ok, so referencing 0 is a pretty common cause of a crash, but it’s the
stack that puzzles me. The bugcheck analyze command shows me the stack
from the exception trap:

Trap Frame address: f57859b0
Trap Frame:
ErrCode = 00000000
eax=853c6c1c ebx=00000000 ecx=853c6cc4 edx=00000001 esi=853be328
edi=853c6be0
eip=00000000 esp=f5785a24 ebp=f5785a4c iopl=0 nv up ei pl zr na po
nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000
efl=00010246
00000000 ?? ???
*** Stack trace for last set context - .thread resets it
ChildEBP RetAddr
f5785a4c f67f7378 nt!KiTrap0E+0x284
f5785ab8 ffffffff RxJnlWrt!RxInstanceTableLock
ffb3b4c0 00000000 0xffffffff*** ERROR: Module load completed but symbols
could not be loaded for dcesm.sys

But “RxInstanceTableLock” is not a code address in my driver, it’s the
address of an ERESOURCE lock that this thread is trying to acquire. The
top of the “regular” stack looks like this:

ChildEBP RetAddr Args to Child
f57859b0 00000000 00000001 00000800 00000001 nt!KiTrap0E+0x284
f5785a20 8042f45d 00000100 853c6be0 ffdff000
f5785a4c 80469e00 00000000 00000000 00000000 nt!KiDeliverApc+0x1a1
f5785a6c 8042bd4d 00000000 f67f7378 00000000 nt!KiSwapThread+0x1e8
f5785a94 8041516b 85a9e868 00000000 00000000
nt!KeWaitForSingleObject+0x1a3
f5785ad0 804146df f67f7378 f67f7378 f5785af4 nt!ExpWaitForResource+0x2d
f5785ae0 80414733 f67f7378 853b7908 856b7d01
nt!ExpAcquireResourceExclusiveLite+0x7b
f5785af4 f67f5f2f f67f73ac 00000001 856b7d08
nt!ExAcquireResourceExclusiveLite+0x45

Ok, pretty normal, and the code was executing at PASSIVE_LEVEL here. I
was careful to call KeEnterCriticalRegion() before trying to acquire the
ERESOURCE lock exclusive.

Looking at the ERESOURCE lock I was trying to acquire, I see:
Resource @ RxJnlWrt!RxInstanceTableLock (0xf67f7378) Exclusively owned
Contention Count = 18740
NumberOfExclusiveWaiters = 6
Threads: 855f6820-01
1 total locks, 1 locks currently held

Notice that this lock is also the one that appears in the trap frame as a
possible “return address”. Why?

Also, although the !locks command says there are 6 exclusive waiters,
there are only 5 threads on the linked list for the associated lock
KEVENT. And the thread marked as the “owning” thread also seems to be
waiting for this lock (its stack goes on from
ExAcquireResourceExclusiveLite). Plus, the thread that actually crashed
doesn’t seem to be in the lock’s list of waiters at all. Perhaps the
crash caught the system just as it was about to grant the lock?

I’d like to hear any explanation for the stack I see after KiDeliverApc.

Thanks,
Carl Appellof

Carl,

First, I suspect that the reason you see this thread as OWNING the lock
while it also appears to be waiting on it is because it just got the lock
(it hasn’t returned back from the acquire call).

Of course, APC delivery can also trigger the thread running as well (ah, but
then it wouldn’t appear to OWN the lock, would it?)

Delivering pending APCs after being rescheduled is a normal OS function.
The context switch is done, the OS looks and notes there is a pending APC,
so it goes to examine the APC queue. This could very easily indicate that
there is a special kernel APC on the queue.

Since you didn’t give us the disassembly here, I took the liberty of
grabbing a random crash dump and disassembling the code in question (this
function doesn’t change much from OS version to OS version). I think I hit
paydirt here since the LAST instruction (prior to the KiDeliverApc+0x1a1)
is:

8042f45b ffd3 call ebx

Well, a call through EBX = 0 would give you an EIP of zero, just like you
have in your stack trace. 0 isn’t a valid address in this process address
space, so it generates a page fault. You are in the dispatcher so you are
running at IRQL SYNCH_LEVEL (0x1c) which is why KiTrap0E bailed out…

Of course, this makes me wonder where ebx came from. Looking back through
the disassembly (of course, realizing the entire time that I’m looking at
W2K and you might be doing this off on .NET RC1 and none of this would
apply…) I found that ebx was only SET in one place!

8042f2e3 8b1d1c064080 mov ebx,[nt!imp (8040061c)]

1: kd> dd 8040061c l1
8040061c 800654e0
1: kd> u @$p
hal!KfReleaseSpinLock:
800654e0 33c0 xor eax,eax

So EBX is supposed to be the spin lock release. This makes sense to me
because earlier in the disassembly (just before this call) I saw:

8042f458 8b4dec mov ecx,[ebp-0x14]

Having a pretty good idea of how APCs work (gee, call this function pointer
inside the KAPC object) I’m guessing that EBP-0x14 is a function loaded from
the KAPC. The code leading up to this you have to dig for:

8042f3da 747c jz nt!KiDeliverApc+0x19c (8042f458)
8042f3dc 807d0801 cmp byte ptr [ebp+0x8],0x1
8042f3e0 7576 jnz nt!KiDeliverApc+0x19c (8042f458)
8042f3e2 807f4a01 cmp byte ptr [edi+0x4a],0x1
8042f3e6 7570 jnz nt!KiDeliverApc+0x19c (8042f458)

So this is some sort of “if” statement (and we’re in the “else” clause).
Being curious of where ebp-0x14 came from, I searched backwards and found it
was one of the first things done after the function preamble:

8042f2c5 64a124010000 mov eax,fs:[00000124]
8042f2cb 8bf8 mov edi,eax
8042f2cd 8d8fe4000000 lea ecx,[edi+0xe4]
8042f2d3 894dec mov [ebp-0x14],ecx

That first instruction is one I’ve seen so many times I could identify it in
my sleep (MP kernel, uses the fs selector to point to the KPCR. Field at
0x124 = “CurrentThread”. Use !kdex2x86.strct KPCR on W2K or dt nt!KPCR (or
nt!_KPCR one of them works) on XP/.NET.) So this is EAX =
KeGetCurrentThread.

Thus edi+0xe4 is the ApcQueueLock address (imagine that!)

And THAT gets stored at ebp-0x14. So that’s nothing more than the address
of the lock (it never gets rewritten, mind you…)

Ah, but it is getting late. Since I’m not sure which version you were
running on and since it DOES vary between 2K/XP/.NET (we now have in stack
queued spin locks, mind you, and that required changes to the implementation
of this function) it is pretty clear I’m WAY out here on a limb. BUT, if
you were on a 2K system comparable to the one I looked at, it sure looks
like someone stomped on memory.

Of course, you might have additional information (like the version and
UP/MP) that would simplify unraveling this one.

Of course, it is worth noting that the debugger has a tough time unraveling
a jump through zero. It always has for whatever reason.

I hope that is enough rampant speculation for one evening.

Regards,

Tony

Tony Mason
Consulting Partner
OSR Open Systems Resources, Inc.
http://www.osr.com

-----Original Message-----
From: Carl Appellof [mailto:xxxxx@veritas.com]
Sent: Wednesday, November 06, 2002 2:53 PM
To: File Systems Developers
Subject: [ntfsd] DRIVER_IRQL_NOT_LESS_OR_EQUAL after KiDeliverApc

I just got a crash I can’t seem to puzzle out. Here is the bugcheck info:

DRIVER_IRQL_NOT_LESS_OR_EQUAL (d1)
An attempt was made to access a pagable (or completely invalid) address at
an
interrupt request level (IRQL) that is too high. This is usually
caused by drivers using improper addresses.
If kernel debugger is available get stack backtrace.
Arguments:
Arg1: 00000000, memory referenced
Arg2: 0000001c, IRQL
Arg3: 00000000, value 0 = read operation, 1 = write operation
Arg4: 00000000, address which referenced memory

Ok, so referencing 0 is a pretty common cause of a crash, but it’s the
stack that puzzles me. The bugcheck analyze command shows me the stack
from the exception trap:

Trap Frame address: f57859b0
Trap Frame:
ErrCode = 00000000
eax=853c6c1c ebx=00000000 ecx=853c6cc4 edx=00000001 esi=853be328
edi=853c6be0
eip=00000000 esp=f5785a24 ebp=f5785a4c iopl=0 nv up ei pl zr na po
nc
cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000
efl=00010246
00000000 ?? ???
*** Stack trace for last set context - .thread resets it
ChildEBP RetAddr
f5785a4c f67f7378 nt!KiTrap0E+0x284
f5785ab8 ffffffff RxJnlWrt!RxInstanceTableLock
ffb3b4c0 00000000 0xffffffff*** ERROR: Module load completed but symbols
could not be loaded for dcesm.sys

But “RxInstanceTableLock” is not a code address in my driver, it’s the
address of an ERESOURCE lock that this thread is trying to acquire. The
top of the “regular” stack looks like this:

ChildEBP RetAddr Args to Child
f57859b0 00000000 00000001 00000800 00000001 nt!KiTrap0E+0x284
f5785a20 8042f45d 00000100 853c6be0 ffdff000
f5785a4c 80469e00 00000000 00000000 00000000 nt!KiDeliverApc+0x1a1
f5785a6c 8042bd4d 00000000 f67f7378 00000000 nt!KiSwapThread+0x1e8
f5785a94 8041516b 85a9e868 00000000 00000000
nt!KeWaitForSingleObject+0x1a3
f5785ad0 804146df f67f7378 f67f7378 f5785af4 nt!ExpWaitForResource+0x2d
f5785ae0 80414733 f67f7378 853b7908 856b7d01
nt!ExpAcquireResourceExclusiveLite+0x7b
f5785af4 f67f5f2f f67f73ac 00000001 856b7d08
nt!ExAcquireResourceExclusiveLite+0x45

Ok, pretty normal, and the code was executing at PASSIVE_LEVEL here. I
was careful to call KeEnterCriticalRegion() before trying to acquire the
ERESOURCE lock exclusive.

Looking at the ERESOURCE lock I was trying to acquire, I see:
Resource @ RxJnlWrt!RxInstanceTableLock (0xf67f7378) Exclusively owned
Contention Count = 18740
NumberOfExclusiveWaiters = 6
Threads: 855f6820-01
1 total locks, 1 locks currently held

Notice that this lock is also the one that appears in the trap frame as a
possible “return address”. Why?

Also, although the !locks command says there are 6 exclusive waiters,
there are only 5 threads on the linked list for the associated lock
KEVENT. And the thread marked as the “owning” thread also seems to be
waiting for this lock (its stack goes on from
ExAcquireResourceExclusiveLite). Plus, the thread that actually crashed
doesn’t seem to be in the lock’s list of waiters at all. Perhaps the
crash caught the system just as it was about to grant the lock?

I’d like to hear any explanation for the stack I see after KiDeliverApc.

Thanks,
Carl Appellof


You are currently subscribed to ntfsd as: xxxxx@osr.com
To unsubscribe send a blank email to %%email.unsub%%

Thanks for the lucid analysis, Tony. You’ve taking me a few steps further
than I’ve gotten on my own. I got as far as the “call ebx” and failed to go
further.

If you can’t sleep tonight, you can continue with the following few new
pieces of information:
The machine is a 2-processor x86 system running Windows 2000 SP2.

I’ll also proceed to look further, given the insight you provided.

Thanks,
Carl Appellof

“Tony Mason” wrote in message news:xxxxx@ntfsd…
>
> Carl,
>
> First, I suspect that the reason you see this thread as OWNING the lock
> while it also appears to be waiting on it is because it just got the lock
> (it hasn’t returned back from the acquire call).
>
> Of course, APC delivery can also trigger the thread running as well (ah,
but
> then it wouldn’t appear to OWN the lock, would it?)
>
> Delivering pending APCs after being rescheduled is a normal OS function.
> The context switch is done, the OS looks and notes there is a pending APC,
> so it goes to examine the APC queue. This could very easily indicate that
> there is a special kernel APC on the queue.
>
> Since you didn’t give us the disassembly here, I took the liberty of
> grabbing a random crash dump and disassembling the code in question (this
> function doesn’t change much from OS version to OS version). I think I
hit
> paydirt here since the LAST instruction (prior to the KiDeliverApc+0x1a1)
> is:
>
> 8042f45b ffd3 call ebx
>
> Well, a call through EBX = 0 would give you an EIP of zero, just like you
> have in your stack trace. 0 isn’t a valid address in this process address
> space, so it generates a page fault. You are in the dispatcher so you are
> running at IRQL SYNCH_LEVEL (0x1c) which is why KiTrap0E bailed out…

Again, Tony, thanks for the pointers. I’ve extracted some points from your
previous note.

It looks to me as if the code is just at the end of trying to deliver a
kernal APC, and is attempting to release the thread’s ApcQueueLock that it
had acquired earlier. As you pointed out [ebp-0x14] should contain the
address of the thread’s ApcQueueLock, and ebx should contain the address of
hal!KfReleaseSpinLock. (As it happens, the crash dump I have matches your
system exactly.)

So the question is - how did ebx get zeroed? It was loaded once at the top
of the KiDeliverApc function. Must have gotten overwritten by some code in
a function called within this code path, because the KiDeliverApc function
itself never touches ebx after loading it, as far as I can tell. Weird.

Carl

“Tony Mason” wrote in message news:xxxxx@ntfsd…
>

> hal!KfReleaseSpinLock:
> 800654e0 33c0 xor eax,eax
>
> So EBX is supposed to be the spin lock release. This makes sense to me
> because earlier in the disassembly (just before this call) I saw:
>
> 8042f458 8b4dec mov ecx,[ebp-0x14]
>

.
>
> Thus edi+0xe4 is the ApcQueueLock address (imagine that!)
>
> And THAT gets stored at ebp-0x14. So that’s nothing more than the address
> of the lock (it never gets rewritten, mind you…)
>