Large file writes dead lock waiting for PagingIoResource

Large file writes with my isolation encryption filter end up dead locking while waiting for the PagingIoResource.

Like most isolation filters, the locking is based on fastfat but fastfat doesn’t implement the IRP_MJ_ACQUIRE_FOR_MOD_WRITE callbacks so I’m assuming that is the cause of the problem. Right now I’m using the same locking found in the IRP_MJ_ACQUIRE_FOR_CC_FLUSH. There are a few other posts that mention similiar issues but none have resolved it. One mentioned to return STATUS_CANT_WAIT in the ModWrite callback if access cannot be immediately granted. While that does make the whole operation complete without dead locking, it also seems to cause bogus data to be written.

Are there any examples or documentation about how the ModWriter should sync with the other components or at least describe what the locking should be?

Here are three threads waiting for the resource.


nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeWaitForGate+0x10b
nt!MiWaitForPageWriteCompletion+0xb5
nt!MiFlushSectionInternal+0xb48
nt!MmFlushSection+0xa9
nt!CcFlushCachePriv+0x493
nt!CcWriteBehindInternal+0x17b
nt!CcWriteBehind+0x95
nt!CcWorkerThread+0x22d
nt!ExpWorkerThread+0x28c
nt!PspSystemThreadStartup+0x58
nt!KiStartSystemThread+0x16


nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!ExpWaitForResource+0x29f
nt!ExAcquireResourceSharedLite+0x409
MYDRIVER!PreAcquireModWrite+0xbd
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPreFsFilterOperation+0x2dd
nt!FsFilterPerformCallbacks+0x12a
nt!FsRtlAcquireFileForModWriteEx+0xef
nt!MiGatherMappedPages+0x284
nt!MiMappedPageWriter+0x14f
nt!PspSystemThreadStartup+0x58
nt!KiStartSystemThread+0x16


nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!ExpWaitForResource+0x29f
nt!ExAcquireResourceSharedLite+0x409
MYDRIVER!PreAcquireCcFlush+0x10e
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPreFsFilterOperation+0x2dd
nt!FsFilterPerformCallbacks+0x12a
nt!FsRtlAcquireFileForCcFlushEx+0xec
nt!MmFlushSection+0x17a
nt!CcFlushCachePriv+0x493
nt!CcFlushCache+0x11
MYDRIVER!PreCleanup+0x2bd
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPassThroughInternal+0x8c
fltmgr!FltpPassThrough+0x2be
fltmgr!FltpDispatch+0x9a
nt!IopCloseFile+0x12d
nt!NtClose+0x1c3
nt!KiSystemServiceCopyEnd+0x13

The first thread is not waiting for the resource, it allegedly holds it waiting for a concurrent IO on a page that can’t be completed in another thread waiting for the resource. The !locks command output would be helpful here.

Here is !locks output along with all threads stacks referenced in it.

1: kd> !locks
**** DUMP OF ALL RESOURCE OBJECTS ****

Resource @ 0xffffe001d30e5010 Exclusively owned
Contention Count = 1
NumberOfExclusiveWaiters = 1
Threads: ffffe001d61d9080-01<*>
Threads Waiting On Exclusive Access:
ffffe001d4917880

Resource @ 0xffffe001d640a9a0 Exclusively owned
Contention Count = 30
NumberOfExclusiveWaiters = 1
Threads: ffffe001d61d9080-01<*>
Threads Waiting On Exclusive Access:
ffffe001d2fcc880

Resource @ 0xffffe001d30e65e0 Shared 2 owning threads
Threads: ffffe001d4b71040-01<*> ffffe001d61d9080-01<*>

Resource @ 0xffffe001d30e6560 Exclusively owned
Threads: ffffe001d61d9080-01<*>
KD: Scanning for held locks.
9884 total locks, 4 locks currently held

1: kd> !thread ffffe001d61d9080; !thread ffffe001d4917880; !thread ffffe001d61d9080; !thread ffffe001d2fcc880; !thread ffffe001d4b71040; !thread ffffe001d61d9080; !thread ffffe001d61d9080
THREAD ffffe001d61d9080 Cid 05a8.03cc Teb: 00007ff6d6cf0000 Win32Thread: fffff901406b3010 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
ffffd00020b73200 Gate
IRP List:
ffffe001d42db530: (0006,0670) Flags: 00000404 Mdl: 00000000
Not impersonating
DeviceMap ffffc001171766d0
Owning Process ffffe001d5a35840 Image: explorer.exe
Attached Process N/A Image: N/A
Wait Start TickCount 54993 Ticks: 2395 (0:00:00:37.421)
Context Switch Count 968 IdealProcessor: 1
UserTime 00:00:00.000
KernelTime 00:00:02.265
Win32 Start Address 0x00007fff25301df0
Stack Init ffffd00020b73c90 Current ffffd00020b72ee0
Base ffffd00020b74000 Limit ffffd00020b6e000 Call 0
Priority 15 BasePriority 8 UnusualBoost 5 ForegroundBoost 2 IoPriority 2 PagePriority 2
Call Site
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeWaitForGate+0x10b
nt!MiWaitForPageWriteCompletion+0xb5
nt!MiFlushSectionInternal+0xb48
nt!MmFlushSection+0x1a2
nt!CcFlushCachePriv+0x493
nt!CcFlushCache+0x11
MYDRIVER!PreCleanup+0x2bd
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPassThroughInternal+0x8c
fltmgr!FltpPassThrough+0x2be
fltmgr!FltpDispatch+0x9a
nt!IopCloseFile+0x12d
nt!NtClose+0x1c3
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd00020b73b00) 0x00007fff29c60d3a

THREAD ffffe001d4917880 Cid 05a8.0ba4 Teb: 00007ff6d6d2c000 Win32Thread: fffff90142213b60 WAIT: (WrResource) KernelMode Non-Alertable
ffffe001d44c5be0 SynchronizationEvent
IRP List:
ffffe001d4278990: (0006,0670) Flags: 00000884 Mdl: 00000000
Not impersonating
DeviceMap ffffc001171766d0
Owning Process ffffe001d5a35840 Image: explorer.exe
Attached Process N/A Image: N/A
Wait Start TickCount 57348 Ticks: 40 (0:00:00:00.625)
Context Switch Count 838 IdealProcessor: 0
UserTime 00:00:00.000
KernelTime 00:00:00.312
Win32 Start Address 0x00007fff29c033a0
Stack Init ffffd0002351bc90 Current ffffd0002351a5c0
Base ffffd0002351c000 Limit ffffd00023516000 Call 0
Priority 11 BasePriority 8 UnusualBoost 0 ForegroundBoost 2 IoPriority 2 PagePriority 5
Call Site
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!ExpWaitForResource+0x29f
nt!ExAcquireResourceExclusiveLite+0x1da
MYDRIVER!CtxAcquireResourceExclusive+0xda
MYDRIVER!PreCreate+0x5a0f
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPassThroughInternal+0x8c
fltmgr!FltpCreate+0x325
nt!IopParseDevice+0x6c1
nt!ObpLookupObjectName+0x6b9
nt!ObOpenObjectByName+0x1e3
nt!NtQueryAttributesFile+0x141
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd0002351bb00) 0x00007fff29c6101a

THREAD ffffe001d61d9080 Cid 05a8.03cc Teb: 00007ff6d6cf0000 Win32Thread: fffff901406b3010 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
ffffd00020b73200 Gate
IRP List:
ffffe001d42db530: (0006,0670) Flags: 00000404 Mdl: 00000000
Not impersonating
DeviceMap ffffc001171766d0
Owning Process ffffe001d5a35840 Image: explorer.exe
Attached Process N/A Image: N/A
Wait Start TickCount 54993 Ticks: 2395 (0:00:00:37.421)
Context Switch Count 968 IdealProcessor: 1
UserTime 00:00:00.000
KernelTime 00:00:02.265
Win32 Start Address 0x00007fff25301df0
Stack Init ffffd00020b73c90 Current ffffd00020b72ee0
Base ffffd00020b74000 Limit ffffd00020b6e000 Call 0
Priority 15 BasePriority 8 UnusualBoost 5 ForegroundBoost 2 IoPriority 2 PagePriority 2
Call Site
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeWaitForGate+0x10b
nt!MiWaitForPageWriteCompletion+0xb5
nt!MiFlushSectionInternal+0xb48
nt!MmFlushSection+0x1a2
nt!CcFlushCachePriv+0x493
nt!CcFlushCache+0x11
MYDRIVER!PreCleanup+0x2bd
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPassThroughInternal+0x8c
fltmgr!FltpPassThrough+0x2be
fltmgr!FltpDispatch+0x9a
nt!IopCloseFile+0x12d
nt!NtClose+0x1c3
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd00020b73b00) 0x00007fff29c60d3a

THREAD ffffe001d2fcc880 Cid 0004.0040 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrResource) KernelMode Non-Alertable
ffffe001d4947480 SynchronizationEvent
Not impersonating
DeviceMap ffffc0011540d010
Owning Process ffffe001d2ead040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 57314 Ticks: 74 (0:00:00:01.156)
Context Switch Count 108 IdealProcessor: 0
UserTime 00:00:00.000
KernelTime 00:00:00.046
Win32 Start Address nt!MiMappedPageWriter (0xfffff8013e1494a4)
Stack Init ffffd000723c6c90 Current ffffd000723c6130
Base ffffd000723c7000 Limit ffffd000723c1000 Call 0
Priority 18 BasePriority 8 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Call Site
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!ExpWaitForResource+0x29f
nt!ExAcquireResourceExclusiveLite+0x1da
MYDRIVER!PreAcquireModWrite+0xd6
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPreFsFilterOperation+0x2dd
nt!FsFilterPerformCallbacks+0x12a
nt!FsRtlAcquireFileForModWriteEx+0xef
nt!MiGatherMappedPages+0x284
nt!MiMappedPageWriter+0x14f
nt!PspSystemThreadStartup+0x58
nt!KiStartSystemThread+0x16

THREAD ffffe001d4b71040 Cid 0004.0af0 Teb: 0000000000000000 Win32Thread: 0000000000000000 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
ffffd00072d86560 Gate
Not impersonating
DeviceMap ffffc0011540d010
Owning Process ffffe001d2ead040 Image: System
Attached Process N/A Image: N/A
Wait Start TickCount 54993 Ticks: 2395 (0:00:00:37.421)
Context Switch Count 939 IdealProcessor: 1
UserTime 00:00:00.000
KernelTime 00:00:00.687
Win32 Start Address nt!ExpWorkerThread (0xfffff8013e0af100)
Stack Init ffffd00072d86c90 Current ffffd00072d86240
Base ffffd00072d87000 Limit ffffd00072d81000 Call 0
Priority 13 BasePriority 13 UnusualBoost 0 ForegroundBoost 0 IoPriority 2 PagePriority 5
Call Site
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeWaitForGate+0x10b
nt! ?? ::FNODOBFM::`string’+0x4bdaa
nt!MiFlushSectionInternal+0x8a7
nt!MmFlushSection+0xa9
nt!CcFlushCachePriv+0x493
nt!CcWriteBehindInternal+0x17b
nt!CcWriteBehind+0x95
nt!CcWorkerThread+0x22d
nt!ExpWorkerThread+0x28c
nt!PspSystemThreadStartup+0x58
nt!KiStartSystemThread+0x16

THREAD ffffe001d61d9080 Cid 05a8.03cc Teb: 00007ff6d6cf0000 Win32Thread: fffff901406b3010 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
ffffd00020b73200 Gate
IRP List:
ffffe001d42db530: (0006,0670) Flags: 00000404 Mdl: 00000000
Not impersonating
DeviceMap ffffc001171766d0
Owning Process ffffe001d5a35840 Image: explorer.exe
Attached Process N/A Image: N/A
Wait Start TickCount 54993 Ticks: 2395 (0:00:00:37.421)
Context Switch Count 968 IdealProcessor: 1
UserTime 00:00:00.000
KernelTime 00:00:02.265
Win32 Start Address 0x00007fff25301df0
Stack Init ffffd00020b73c90 Current ffffd00020b72ee0
Base ffffd00020b74000 Limit ffffd00020b6e000 Call 0
Priority 15 BasePriority 8 UnusualBoost 5 ForegroundBoost 2 IoPriority 2 PagePriority 2
Call Site
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeWaitForGate+0x10b
nt!MiWaitForPageWriteCompletion+0xb5
nt!MiFlushSectionInternal+0xb48
nt!MmFlushSection+0x1a2
nt!CcFlushCachePriv+0x493
nt!CcFlushCache+0x11
MYDRIVER!PreCleanup+0x2bd
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPassThroughInternal+0x8c
fltmgr!FltpPassThrough+0x2be
fltmgr!FltpDispatch+0x9a
nt!IopCloseFile+0x12d
nt!NtClose+0x1c3
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd00020b73b00) 0x00007fff29c60d3a

THREAD ffffe001d61d9080 Cid 05a8.03cc Teb: 00007ff6d6cf0000 Win32Thread: fffff901406b3010 WAIT: (WrVirtualMemory) KernelMode Non-Alertable
ffffd00020b73200 Gate
IRP List:
ffffe001d42db530: (0006,0670) Flags: 00000404 Mdl: 00000000
Not impersonating
DeviceMap ffffc001171766d0
Owning Process ffffe001d5a35840 Image: explorer.exe
Attached Process N/A Image: N/A
Wait Start TickCount 54993 Ticks: 2395 (0:00:00:37.421)
Context Switch Count 968 IdealProcessor: 1
UserTime 00:00:00.000
KernelTime 00:00:02.265
Win32 Start Address 0x00007fff25301df0
Stack Init ffffd00020b73c90 Current ffffd00020b72ee0
Base ffffd00020b74000 Limit ffffd00020b6e000 Call 0
Priority 15 BasePriority 8 UnusualBoost 5 ForegroundBoost 2 IoPriority 2 PagePriority 2
Call Site
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeWaitForGate+0x10b
nt!MiWaitForPageWriteCompletion+0xb5
nt!MiFlushSectionInternal+0xb48
nt!MmFlushSection+0x1a2
nt!CcFlushCachePriv+0x493
nt!CcFlushCache+0x11
MYDRIVER!PreCleanup+0x2bd
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPassThroughInternal+0x8c
fltmgr!FltpPassThrough+0x2be
fltmgr!FltpDispatch+0x9a
nt!IopCloseFile+0x12d
nt!NtClose+0x1c3
nt!KiSystemServiceCopyEnd+0x13 (TrapFrame @ ffffd00020b73b00) 0x00007fff29c60d3a

As I said, I’m using the same locking pattern for ModWrite as I do for AcquireCc which is based on FastFat.

//---------------------
ModWrite/AcquireCc()

ufcb = (PFSRTL_ADVANCED_FCB_HEADER)shCtx->UpperFileObject->FsContext;

if (!ExIsResourceAcquiredSharedLite(ufcb->Resource)) {
ExAcquireResourceExclusiveLite(ufcb->Resource, TRUE);
} else {
ExAcquireResourceSharedLite(ufcb->Resource, TRUE);
}

ExAcquireResourceSharedLite(ufcb->PagingIoResource, TRUE);
//---------------------

//---------------------
PreClose()

ufcb = (PFSRTL_ADVANCED_FCB_HEADER)shCtx->UpperFileObject->FsContext;

CcFlushCache(shCtx->UpperFileObject->SectionObjectPointer, NULL, 0, NULL);

KeEnterCriticalRegion();
ExAcquireResourceExclusiveLite(ufcb->PagingIoResource, TRUE);
ExReleaseResourceLite(ufcb->PagingIoResource);
KeLeaveCriticalRegion();

CcPurgeCacheSection(shCtx->UpperFileObject->SectionObjectPointer, NULL, 0, FALSE);
//---------------------

Look for you acquire for MPW code not acquiring a lock.

I went back to returning STATUS_CANT_WAIT if the lock can’t be acquired which prevents the dead lock but now I’m back to dealing with partial data corruption of the written file.

It doesn’t happen all the time and again, only seems to appear with very large files (10’s of MB). Generally the first couple of megs of data are bogus but then the rest of the file is OK.

Considering the corruption doesn’t happen all the time and only seems to appear when I add checks to return STATUS_CANT_WAIT if the acquire fails, I’m assuming there’s a locking issue going on.

Is STATUS_CANT_WAIT the appropriate code to return on acquire failure? Does the MPW still write to the file even if the lock can’t be acquired?

No. MapPW lefts pages marked dirty and repeats the attempt on the next round. The MapPW is not the only thread that might flush pages, the Cache Manger’s write behind thread can also do this concurrently with the MapPW.

The strange thing is that the thread ffffe001d4b71040 managed to enter MiFlushSectionInternal without acquiring the file’s resource held by the thread ffffe001d61d9080. I assume that these threads tried to flush pages for the same data stream.

Do you have IRP_MJ_ACQUIRE_FOR_CC_FLUSH ( aka AcquireForCcFlush ) synchronization in the filter?

Yes.

ufcb = (PFSRTL_ADVANCED_FCB_HEADER)shCtx->UpperFileObject->FsContext;

if (!ExIsResourceAcquiredSharedLite(ufcb->Resource)) {
ExAcquireResourceExclusiveLite(ufcb->Resource, TRUE);
} else {
ExAcquireResourceSharedLite(ufcb->Resource, TRUE);
}

ExAcquireResourceSharedLite(ufcb->PagingIoResource, TRUE);

According to the !locks output the ffffe001d4b71040 thread held only one resource which is not consistent with ufcb->Resource and ufcb->PagingIoResource being acquired by AcquireForCcFlush .

The resource held by ffffe001d4b71040 thread was shared with ffffe001d61d9080 thread which makes me believe that this was the resource for the same data stream. Both threads managed to progress to MiFlushSectionInternal which raises the question about resources acquisition.

Right. That’s what I said earlier that there seems to be a locking issue. Returning STATUS_CANT_WAIT from AcquireModWrite if the lock can’t be acquired right away prevents the dead lock I was seeing before but now seems to be the cause of some funny corruption with large data writes. The full file is not corrupted but just some pages. And because it doesn’t happen all the time (sometimes the file gets written correctly) I don’t know what else it would be other than a locking issue.

Trouble is, I’m having trouble figuring out what the actual issue is. I don’t understand why this only manifests with large files. Maybe if I did, it would help narrow down where I need to focus so if anyone has thoughts it’d be much appreciated. Also, a couple things that might help me:

-Should isolation filters set IoSetTopLevelIrp? I notice that if I set the top level similar to FastFat then any file is 100% corrupted. Documentation says only FSD’s should call this function but because isolation filters do FSD like things, it kind of makes sense to use it. Not sure why it would cause the issue I’m seeing though.
-What’s the rule for exclusive vs shared resource acquisition in relation to Cc and ModWrite? I noticed in FastFat that for AcquireLazyWrite for example, they use Shared while the example here (http://www.osronline.com/article.cfm?id=167) uses Exclusive. Which one is correct?
-Probably not related but just curious, in FastFat read and write CcFlushCache gets called between calls to acquire Resource and PagingIoResource exclusive. I thought when CcFlushCache gets called that AcquireCcFlush would be called which, based on the callbacks, also tries to acquire the same locks. But if they are already acquired exclusively how can AcquireCcFlush ever complete? I’m obviously missing something.

No, if a request will be passed to underlying filters and FSD. A filter should restore the original value for TopLevelIrp when passing a request down the stack. A filter can set TopLevelIrp to pass some information to itself in recursive invocations from Cc* and Mm* functions.

I would go with exclusive acquisition in case of existing data corruption.

ERESOURCE can be acquired recursively.
A deadlock happens when a thread tries to acquire a resource exclusively while holding it shared.
A thread can acquire a resource shared or exclusively while holding it exclusively.
A thread can acquire a resource shared when holding it shared.