Hey,
I have an "isolation filter" that has encountered the following deadlock:
Thread "A" is a private thread that acquires my FCB resources, calls CcCoherencyFlushAndPurgeCache() and then releases the locks. This thread is stuck in CcCoherencyFlushAndPurgeCache():
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!KeWaitForGate+0x10b
nt!MiWaitForFlushInProgress+0x94
nt!MiFlushSectionInternal+0x398
nt!MmFlushSection+0x1a2
nt!CcFlushCachePriv+0x616
nt!CcCoherencyFlushAndPurgeCache+0x60
mydriver!mydriverApiTryCcCoherencyFlushAndPurgeCache+0x71
mydriver!mydriverWorkItemRoutine+0x3ed
Meanwhile, the lazy writer thread is stuck in my paging I/O write dispatch routine, trying to acquire the same lock:
nt!KiSwapContext+0x76
nt!KiSwapThread+0x14e
nt!KiCommitThreadWait+0x129
nt!ExpWaitForResource+0x29f
nt!ExAcquireResourceExclusiveLite+0x1da
mydriver!mydriverFsWritePagingIo+0x41d
mydriver!mydriverWrite+0x2e
fltmgr!FltpPerformPreCallbacks+0x29f
fltmgr!FltpPassThroughInternal+0x8c
fltmgr!FltpPassThrough+0x2be
fltmgr!FltpDispatch+0x9a
nt!IoSynchronousPageWrite+0x138
nt!MiIssueSynchronousFlush+0x66
nt!MiFlushSectionInternal+0x775
nt! ?? ::FNODOBFM::`string'+0x5147d
nt! ?? ::FNODOBFM::`string'+0xd92a
nt!ObpRemoveObjectRoutine+0x64
nt!ObfDereferenceObjectWithTag+0x8f
nt!CcDeleteSharedCacheMap+0x101
nt!CcWriteBehindInternal+0x330
nt!ExpWorkerThread+0x28c
nt!PspSystemThreadStartup+0x58
nt!KiStartSystemThread+0x16
At this point I would expect the lazy writer to pre-acquire locks through the CC callbacks or the Fast-Io callbacks, but according to my logs the same thread performs the following operations before sending the write request:
1. Calls AcquireForLazyWrite, I return TRUE
2. Calls ReleaseFromLazyWrite
3. Calls AcquireForSectionSynchronization, I return STATUS_SUCCESS
4. Calls ReleaseForSectionSynchronization
5. Issues the write request that hangs.
Between #4 and #5, my private thread (thread "A") takes the resource and calls CcCoherencyFlushAndPurgeCache(), which never returns.
The odd thing is that I expected the write request to arrive between #1 and #2 or between #3 and #4. Instead it arrived afterwards, when no locks were pre-acquired.
I should add that I also implement AcquireForCcFlush and AcquireForModifiedPageWriter. According to my log, neither callback was called during this sequence.
Has anyone ever encountered this scenario?
I thought about returning STATUS_FILE_LOCK_CONFLICT in this case to prevent the deadlock, but I'm not sure that would be the right solution.
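To make the idea concrete, here is roughly what I have in mind, written as a small user-mode model rather than real driver code. MODEL_RESOURCE and the Model* functions are hypothetical stand-ins: in the driver the try-acquire would be ExAcquireResourceExclusiveLite() with Wait set to FALSE, and the status would be returned from my paging write path. Whether the flush is retried after this status in the stack above is exactly the part I am unsure about.

```c
#include <stdbool.h>
#include <stdint.h>

/* User-mode stand-ins so the sketch compiles outside the WDK. */
typedef int32_t NTSTATUS;
#define STATUS_SUCCESS            ((NTSTATUS)0x00000000)
#define STATUS_FILE_LOCK_CONFLICT ((NTSTATUS)0xC0000054)

/* Hypothetical model of the FCB resource: held exclusively or not. */
typedef struct MODEL_RESOURCE {
    bool HeldExclusive;
} MODEL_RESOURCE;

/* Models a non-blocking exclusive acquire (Wait == FALSE):
 * returns false immediately instead of waiting for the owner. */
static bool ModelTryAcquireExclusive(MODEL_RESOURCE *Resource)
{
    if (Resource->HeldExclusive) {
        return false;
    }
    Resource->HeldExclusive = true;
    return true;
}

static void ModelRelease(MODEL_RESOURCE *Resource)
{
    Resource->HeldExclusive = false;
}

/* Models the paging write path: if the resource is already owned
 * (for example by thread "A" inside its flush), fail the write
 * instead of blocking, so the two threads cannot wait on each other. */
static NTSTATUS ModelWritePagingIo(MODEL_RESOURCE *FcbResource)
{
    if (!ModelTryAcquireExclusive(FcbResource)) {
        return STATUS_FILE_LOCK_CONFLICT;
    }

    /* ... the paging write itself would be serviced here ... */

    ModelRelease(FcbResource);
    return STATUS_SUCCESS;
}
```

With the resource free, ModelWritePagingIo() takes it, does the write and returns STATUS_SUCCESS. With the resource already held, it returns STATUS_FILE_LOCK_CONFLICT without waiting, which is the behavior I am considering for the real write routine.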
Thanks!