hang waiting for IRP getting disk block for paging I/O

Hi,

I’m trying to implement paging I/O and running into a hang getting the
page’s data off disk:

kd> k 100
ChildEBP RetAddr
f9339fa8 804dc0f7 nt!KiSwapContext+0x2e
f9339fb4 804dc143 nt!KiSwapThread+0x46
f9339fdc 8066d356 nt!KeWaitForSingleObject+0x1c2
f933a004 f8aa87dd nt!VerifierKeWaitForSingleObject+0x56
f933a078 f8aa81a9 MWFS!_mos_windows_kern_device_rdwr+0x30d
[… OO goo…]
f933a704 f89f7f37 MWFS!fsi_read+0x23
f933a78c f89f79e0 MWFS!mwifs_rw+0x517
f933a7f4 f89e8453 MWFS!mwifs_rw_irp+0x190
f933a890 804e37f7 MWFS!mwifs_dispatch+0x433
f933a8a0 8066bec5 nt!IopfCallDriver+0x31
f933a8c4 804f95d8 nt!IovCallDriver+0xa0
f933a8d8 804f95ff nt!IopPageReadInternal+0xf4
f933a8f8 804f9264 nt!IoPageRead+0x1b
f933a96c 804eba6a nt!MiDispatchFault+0x274
f933a9bc 804f67f3 nt!MmAccessFault+0x5bc
f933a9fc 8057319d nt!MmCheckCachedPageState+0x461
f933aa88 f89f82d4 nt!CcCopyRead+0x3da
f933aab8 f89f7e03 MWFS!mwifs_cache_read+0x84
f933ab44 f89f79e0 MWFS!mwifs_rw+0x3e3
f933abac f89e8453 MWFS!mwifs_rw_irp+0x190
f933ac48 804e37f7 MWFS!mwifs_dispatch+0x433
f933ac58 8066bec5 nt!IopfCallDriver+0x31
f933ac7c 80567f81 nt!IovCallDriver+0xa0
f933ac90 805743a9 nt!IopSynchronousServiceTail+0x70
f933ad38 804de7ec nt!NtReadFile+0x580
f933ad38 7c90e4f4 nt!KiFastCallEntry+0xf8
0013e430 00000000 ntdll!KiFastSystemCallRet

The topmost _rdwr() got STATUS_PENDING from IoCallDriver() for the
IRP it issued. Examining the IRP shows PendingReturned is set and
the IRP is marked as complete, but the event was never signalled for
some reason. If I defer the paging IRP and handle it on a work queue,
everything’s fine. Am I wrong to assume I can handle the paging IRP
in the context of the fault? Any advice appreciated.

Jeff

I think you are being called in an arbitrary thread context (most probably at APC_LEVEL), so KeWaitFor*() with an infinite timeout shouldn’t be called. Instead of waiting on an event, register a completion routine. It will be called in the same thread context if the request is processed synchronously.

Some information is also missing: do you roll your own IRP, or do you pass down the IRP you received and fill in the next stack location? Why do you need to wait for the IRP in that case?
Is the IRP you got for the paging file or just for a memory-mapped file?

Some thoughts…
Is your UserEvent a NOTIFICATION event?
Also, there was a discussion on NTDEV saying that Tail.Overlay.Thread must be set for some disk devices.

Best Regards
Bronislav Gabrhelik

From the MSDN documentation for KeWaitForSingleObject:
Callers of KeWaitForSingleObject must be running at IRQL <= DISPATCH_LEVEL. However, if Timeout <> 0, the caller must be running at IRQL <= APC_LEVEL and in a nonarbitrary thread context.

Also look up the table named “Dispatch Routine IRQL and Thread Context” on MSDN.

> Am I wrong to assume I can handle the paging IRP
> in the context of the fault?

No, it is entirely usual, but (of course) your FSD has to expect it. What
is the event that MWFS!_mos_windows_kern_device_rdwr is waiting on? What
will set it?

Rod

Hi Rod and Bronislav,

Thanks for your replies!

On Thu, Mar 26, 2009 at 5:11 AM, Rod Widdowson wrote:
> No, it is entirely usual, but (of course) your FSD has to expect it. What
> is the event that MWFS!_mos_windows_kern_device_rdwr is waiting on? What
> will set it?

The event is on _rdwr()'s stack, was given to
IoBuildSynchronousFsdRequest() to be signalled when the IRP is
complete.

On Thu, Mar 26, 2009 at 5:10 AM, wrote:
> I think that you are called in arbitrary context (most probably at APC_LEVEL), so KeWaitFor*(INFINITE) shouldn’t be called. Instead waiting for event, register completion routine. It will be called in the same thread ctx if request will be processed synchronously.

You’re right–it is at APC_LEVEL, which is OK by itself, according to
the KeWaitForSingleObject() page on MSDN, as long as the context is
nonarbitrary. According to the “Dispatch Routine IRQL and Thread
Context” document the context is arbitrary. I don’t fully understand
the design of this yet (the thread is mine!), but thanks for the
references. They look like they’ll fill in the gaps in my
understanding.

I did try a completion routine but it gets executed in another thread.
Is there any way to control that?

> Missing also some info like… Do you roll your own IRP or pass down IRP you got and fill stack? Why do you need to wait for the IRP in such case?

I roll my own with IoBuildSynchronousFsdRequest().

> Is it paging file or just memory mapped file you got an IRP for?

It’s for a memory mapped file.

> Some thoughts…
> Is your UserEvent a NOTIFICATION event?

Yep.

> Also there was a discussion on NTDEV that Tail.Overlay.Thread must be set for some disk devices.

I’ll have a look. I tried setting this, but it doesn’t change things.

> From MSDN documentation for KeWaitForSingleObject…
> Callers of KeWaitForSingleObject must be running at IRQL <= DISPATCH_LEVEL. However, if Timeout <> 0, the caller must be running at IRQL <= APC_LEVEL and in a nonarbitrary thread context.
>
> Also look up for table named “Dispatch Routine IRQL and Thread Context” on MSDN.

So if I understand the docs, there are two constraints at play:

1) I shouldn’t be doing an infinite wait on the event because the
context may be “arbitrary”. (I don’t see how it is actually
arbitrary in this case, since it’d be the context that incurred the
page fault, wouldn’t it?) Does that mean I pretty much have to use a
work queue (to get a nonarbitrary context)?

2) Maybe the event can’t technically get signalled at completion at
APC_LEVEL. According to “Scheduling, Thread Context, and IRQL”, “The
I/O manager queues the special kernel-mode APC for I/O completion
whenever an I/O request completes.” This reads like it’s true for ALL
I/O requests (even though the paragraph talks about completing
buffered requests) and I guess that’s how the event gets signalled. If
that’s true, then the APC can’t be delivered until the IRQL drops to
PASSIVE_LEVEL right?

Am I understanding this? Thanks for your help.

Jeff

Jeff,

You are right that a paging read arrives in the context of the page-faulting thread, so in that sense it is nonarbitrary, but by Microsoft’s definition it is arbitrary. There is probably a danger of a priority-inversion deadlock.

An event is a dispatcher object, so it should get signaled at APC_LEVEL, which is below DISPATCH_LEVEL, because thread scheduling still works at APC_LEVEL. Frankly, I don’t understand why it is not signaled.
You shouldn’t use IoBuildSynchronousFsdRequest() if special kernel APCs are disabled, so test for that with KeAreAllApcsDisabled(). Unfortunately this API is only available since WS2003, which introduced guarded regions. For older OSes I use the following define.

#if (NTDDI_VERSION < NTDDI_WS03SP1)
/* Fallback for older OSes: treat any IRQL above PASSIVE_LEVEL as "APCs disabled". */
#define KeAreAllApcsDisabled() (KeGetCurrentIrql() > PASSIVE_LEVEL)
#endif

Don’t forget that in the completion routine you have to:

  1. unmap, unlock and free the MDL if it is not NULL
  2. free the system buffer if (Irp->Flags & IRP_DEALLOCATE_BUFFER)
  3. free the IRP itself
  4. return STATUS_MORE_PROCESSING_REQUIRED
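
A rough sketch of a completion routine that follows these four steps, for an IRP you own (built with IoBuildAsynchronousFsdRequest() or IoAllocateIrp()). The names OWN_IRP_CTX and OwnIrpComplete are made up for illustration, and the sketch assumes any MDL on the IRP was locked with MmProbeAndLockPages(). It is untested kernel-mode code that needs the WDK headers, so treat it as an outline only:

```c
#include <ntifs.h>

/* Hypothetical context: lets the issuer learn the result after the IRP is gone. */
typedef struct _OWN_IRP_CTX {
    KEVENT          Event;   /* initialized by the issuer before IoCallDriver() */
    IO_STATUS_BLOCK Iosb;    /* copy of Irp->IoStatus taken at completion */
} OWN_IRP_CTX, *POWN_IRP_CTX;

NTSTATUS
OwnIrpComplete(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    POWN_IRP_CTX ctx = (POWN_IRP_CTX)Context;
    PMDL mdl, next;

    UNREFERENCED_PARAMETER(DeviceObject);

    /* Save the result first; the IRP is freed below. */
    ctx->Iosb = Irp->IoStatus;

    /* 1. Unlock and free every MDL on the IRP. Assumes the pages were locked
     *    with MmProbeAndLockPages(); MmUnlockPages() also drops a system
     *    mapping made through MmGetSystemAddressForMdlSafe(). */
    for (mdl = Irp->MdlAddress; mdl != NULL; mdl = next) {
        next = mdl->Next;
        MmUnlockPages(mdl);
        IoFreeMdl(mdl);
    }
    Irp->MdlAddress = NULL;

    /* 2. Free the system buffer if the I/O manager allocated one. */
    if ((Irp->Flags & IRP_DEALLOCATE_BUFFER) &&
        Irp->AssociatedIrp.SystemBuffer != NULL) {
        ExFreePool(Irp->AssociatedIrp.SystemBuffer);
    }

    /* 3. Free the IRP itself. */
    IoFreeIrp(Irp);

    /* Wake the issuer last: ctx may live on its stack and can vanish
     * as soon as the event is set. */
    KeSetEvent(&ctx->Event, IO_NO_INCREMENT, FALSE);

    /* 4. Stop the I/O manager from touching the (now freed) IRP. */
    return STATUS_MORE_PROCESSING_REQUIRED;
}
```

Because the routine sets the event itself, nothing here depends on the I/O manager delivering its completion APC to the issuing thread.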

The event is not signaled because you created the IRP using IoBuildSynchronousFsdRequest, which expects the IRP’s UserEvent to be signaled by the I/O manager’s IopCompleteRequest. However, since the paging read came from MM to resolve a page fault taken by CC, there is a good chance that MM issued the IoPageRead with special kernel APCs disabled (inside a guarded region). As Bronislav said, you can detect this situation with KeAreAllApcsDisabled(). In your case, I suspect the I/O manager has queued IopCompleteRequest but it can never be delivered to the original thread because special kernel APCs are disabled. Rajeev’s book has a good description of IopCompleteRequest and the event, see page 169 - 4.

AFAIK, it is incorrect (unsafe) to call IoBuildSynchronousFsdRequest to create the IRP while servicing a paging I/O, because special kernel APCs “could” be disabled for the thread (it might be OK if the paging I/O is a fault on some kind of user-mode memory). If your FSD allocates the IRP itself with IoAllocateIrp and also sets a completion routine, you should be able to signal the event in your own completion routine. Queuing the job to a different thread may not be necessary, depending on your locking conditions. Please refer to the WDK sample FastFat for more details on how noncached/paging I/O is handled.
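
A minimal sketch of that approach, not taken from FastFat. ReadBlocksSync and SyncReadComplete are hypothetical names; the sketch assumes the target is a direct-I/O device and that the caller passes an MDL describing pages that are already locked (for example the MdlAddress of the paging IRP). It is untested kernel-mode code that needs the WDK headers:

```c
#include <ntifs.h>

/* Completion routine: runs in whatever thread completes the IRP, so no APC
 * has to be delivered to the waiting thread. */
static NTSTATUS
SyncReadComplete(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    UNREFERENCED_PARAMETER(DeviceObject);
    UNREFERENCED_PARAMETER(Irp);

    KeSetEvent((PKEVENT)Context, IO_NO_INCREMENT, FALSE);

    /* Keep the IRP alive; the issuer reads the status and frees it. */
    return STATUS_MORE_PROCESSING_REQUIRED;
}

NTSTATUS
ReadBlocksSync(PDEVICE_OBJECT Target, PMDL Mdl,
               LARGE_INTEGER Offset, ULONG Length)
{
    KEVENT             event;
    PIRP               irp;
    PIO_STACK_LOCATION sp;
    NTSTATUS           status;

    irp = IoAllocateIrp(Target->StackSize, FALSE);
    if (irp == NULL) {
        return STATUS_INSUFFICIENT_RESOURCES;
    }

    KeInitializeEvent(&event, NotificationEvent, FALSE);

    irp->MdlAddress = Mdl;                            /* borrowed, not owned */
    irp->Tail.Overlay.Thread = PsGetCurrentThread();  /* per the NTDEV hint */

    sp = IoGetNextIrpStackLocation(irp);
    sp->MajorFunction = IRP_MJ_READ;
    sp->Parameters.Read.Length = Length;
    sp->Parameters.Read.ByteOffset = Offset;

    IoSetCompletionRoutine(irp, SyncReadComplete, &event, TRUE, TRUE, TRUE);

    status = IoCallDriver(Target, irp);
    if (status == STATUS_PENDING) {
        KeWaitForSingleObject(&event, Executive, KernelMode, FALSE, NULL);
    }
    /* The completion routine has run by now, so IoStatus is final. */
    status = irp->IoStatus.Status;

    irp->MdlAddress = NULL;   /* the caller still owns the MDL */
    IoFreeIrp(irp);
    return status;
}
```

The wait is still an untimed KeWaitForSingleObject(), but it is satisfied by KeSetEvent() from the completion routine rather than by the I/O manager's completion APC.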

Hui

Hi Hui,

Thanks for the clarification. You are certainly right, I can use
IoAllocateIrp() and signal completion with a completion routine. It
certainly works. I’ll post a suggestion for the docs for
IoBuildSynchronousFsdRequest() on MSDN.

But is it OK to do? The MSDN docs for KeWaitForSingleObject() suggest
that waits should not be done at APC_LEVEL in an arbitrary context
when timeout == NULL. Practically, though, the wait is bounded by the IRP
processing duration. Is that sufficient? Any comments appreciated.

Thanks for spending the time to explain this to me!

Jeff

> But is it OK to do? The MSDN docs for KeWaitForSingleObject() suggest
> that waits should not be done at APC_LEVEL in an arbitrary context

It is OK to wait at APC_LEVEL.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com