thread stuck on NTFS write: Windows 7 64 bit

Amritanshu_Johri · February 1, 2013, 9:17am

Hello ntfsd,

We have a legacy filter driver that keeps track of changes happening
on files for a volume. It writes things back to a log file especially
when a file handle is closed.

We are observing an occasional hang in the system, it happens when a
new file has been created, here is how it looks.

Child-SP RetAddr Call Site

01 fffff8800919ed50 fffff800036e5d8f nt!KiCommitThreadWait+0x1d2
02 fffff8800919ede0 fffff8800165850a nt!KeWaitForSingleObject+0x19f
<– (stuck for a wait here)
03 fffff8800919ee80 fffff88001002bcf Ntfs!NtfsFsdWrite+0x26a
04 fffff8800919ef40 fffff880010016df
fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
05 fffff8800919efd0 fffff88001247b0b fltmgr!FltpDispatch+0xcf
06 fffff8800919f030 fffff88001248852 mfl!call_driver_with_sync+0xdb
<– (We create our own Write IRP and send it down.)
… internal bookkeeping done in these frames …
12 fffff8800919f600 fffff880012686c2 mfl!event_handle+0x96
13 fffff8800919f670 fffff8800127309c mfl!hook_cleanup_cmpl+0x122
14 fffff8800919f6d0 fffff880012314b9 mfl!fs_hook_cleanup_cmpl+0x3c
… More internal book keeping …
1a fffff8800919f8b0 fffff800039e25ef mfl!fs_flt_cleanup+0x89 <–
(we see a cleanup IRP)
1b fffff8800919f8f0 fffff800039d1d14 nt!IopCloseFile+0x11f
1c fffff8800919f980 fffff800039d1ad1 nt!ObpDecrementHandleCount+0xb4
1d fffff8800919fa00 fffff800039d2094 nt!ObpCloseHandleTableEntry+0xb1
1e fffff8800919fa90 fffff800036de253 nt!ObpCloseHandle+0x94
1f fffff8800919fae0 0000000077b4140a nt!KiSystemServiceCopyEnd+0x13
20 000000000008e318 0000000000000000 0x77b4140a

IRP that is stuck:

0: kd> !irp 0xfffffa80`074f5bd0
Irp is active with 11 stacks 10 is current (= 0xfffffa80074f5f28)
Mdl=fffffa800d996bf0: No System Buffer: Thread fffffa800bdf2b50: Irp
stack trace.
cmd flg cl Device File Completion-Context
[0, 0] 0 0 00000000 00000000 00000000-00000000

… skipped empty stack locations for brevity

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[4, 0] 0 e0 fffffa80098dc030 fffffa800c731a10 fffff88001005e40-fffffa800e0e6cb0 Success Error Cancel
\FileSystem\Ntfs fltmgr!FltpPassThroughCompletion
Args: 00000004 00000000 0000027e 00000000
[4, 0] 0 e1 fffffa8009654de0 fffffa800c731a10
fffff88001247b60-fffffa800c8cead0 Success Error Cancel pending
\FileSystem\FltMgr mfl!completion_routine
Args: 00000004 00000000 ffffffffffffffff 00000000

The associated file object

0: kd> !fileobj fffffa800c731a10

\sc\sclog

Device Object: 0xfffffa800961cb10 \Driver\volmgr
Vpb: 0xfffffa800961ca50
Event signalled
Access: Write SharedRead SharedWrite SharedDelete

Flags: 0x40042
Synchronous IO
Cache Supported
Handle Created

FsContext: 0xfffff8a01c114c70 FsContext2: 0xfffff8a017bc1b40
CurrentByteOffset: 0
Cache Data:
Section Object Pointers: fffffa800b9050c8
Shared Cache Map: 00000000

I tried to make a guess about what is going on by disassembling bits
but couldn’t find much, here is a pieced up block of what
code was executed

Ntfs!NtfsFsdWrite+0x164:
fffff88001658404 81a330010000fffbffff and dword ptr [rbx+130h],0FFFFFBFFh fffff8800165840e 8b8330010000 mov eax,dword ptr [rbx+130h]
fffff88001658414 098334010000 or dword ptr [rbx+134h],eax fffff8800165841a 488b4620 mov rax,qword ptr [rsi+20h]
fffff880`0165841e 48894340 mov qword ptr [rbx+40h],rax

Ntfs!NtfsFsdWrite+0x182:
fffff88001658422 4584ff test r15b,r15b fffff88001658425 7546 jne Ntfs!NtfsFsdWrite+0x1cd
(fffff880`0165846d)

Ntfs!NtfsFsdWrite+0x187:
fffff88001658427 81ff880100c0 cmp edi,0C0000188h \<-- this is false fffff8800165842d 0f8499000000 je Ntfs!NtfsFsdWrite+0x22c
(fffff880016584cc) Ntfs!NtfsFsdWrite+0x193: fffff88001658433 81ff67030000 cmp edi,367h
fffff88001658439 0f849a000000 je Ntfs!NtfsFsdWrite+0x239 (fffff880016584d9)

Ntfs!NtfsFsdWrite+0x239:
fffff880016584d9 498b8c24b8000000 mov rcx,qword ptr [r12+0B8h] fffff880016584e1 0fb64103 movzx eax,byte ptr [rcx+3]
fffff880016584e5 83e0fe and eax,0FFFFFFFEh fffff880016584e8 884103 mov byte ptr [rcx+3],al
fffff880016584eb 488b4b78 mov rcx,qword ptr [rbx+78h] fffff880016584ef 4883c108 add rcx,8
fffff880016584f3 48c744242000000000 mov qword ptr [rsp+20h],0 fffff880016584fc 4533c9 xor r9d,r9d
fffff880016584ff 4533c0 xor r8d,r8d fffff88001658502 33d2 xor edx,edx
fffff88001658504 ff153ee70200 call qword ptr [Ntfs!_imp_KeWaitForSingleObject (fffff88001686c48)] <– we are
waiting here

fffff8800165850a 894504 mov dword ptr [rbp+4],eax fffff8800165850d 48837b7800 cmp qword ptr [rbx+78h],0
fffff88001658512 7422 je Ntfs!NtfsFsdWrite+0x296 (fffff88001658536)

Do let me know if anyone knows anything about it. or how to go about
root causing this.

TIA,
Johri

Scott_Noone_OSR · February 1, 2013, 10:27am

“Amritanshu” wrote in message news:xxxxx@ntfsd…

Ntfs!NtfsFsdWrite+0x187:
cmp edi,0C0000188h <– this is false
je Ntfs!NtfsFsdWrite+0x22c
Ntfs!NtfsFsdWrite+0x193:
cmp edi,367h
je Ntfs!NtfsFsdWrite+0x239

Assuming that EDI contains an NTSTATUS value:

E:\>winerror -s C0000188
1502 ERROR_LOG_FILE_FULL <–> c0000188 STATUS_LOG_FILE_FULL

E:\>winerror -s 367
765 ERROR_WAIT_FOR_OPLOCK <–> 00000367 STATUS_WAIT_FOR_OPLOCK

So, seems like NTFS waiting for an oplock break. Any other threads out there
stuck in your driver or NTFS? (“!stacks 2 mfl!”, “!stacks 2 ntfs!”). It’s
also possible that the cleanup you’re currently blocking is the one that
needs to trigger the oplock break.

-scott
OSR

Hello ntfsd,

We have a legacy filter driver that keeps track of changes happening
on files for a volume. It writes things back to a log file especially
when a file handle is closed.

We are observing an occasional hang in the system, it happens when a
new file has been created, here is how it looks.

Child-SP RetAddr Call Site

01 fffff8800919ed50 fffff800036e5d8f nt!KiCommitThreadWait+0x1d2
02 fffff8800919ede0 fffff8800165850a nt!KeWaitForSingleObject+0x19f
<– (stuck for a wait here)
03 fffff8800919ee80 fffff88001002bcf Ntfs!NtfsFsdWrite+0x26a
04 fffff8800919ef40 fffff880010016df
fltmgr!FltpLegacyProcessingAfterPreCallbacksCompleted+0x24f
05 fffff8800919efd0 fffff88001247b0b fltmgr!FltpDispatch+0xcf
06 fffff8800919f030 fffff88001248852 mfl!call_driver_with_sync+0xdb
<– (We create our own Write IRP and send it down.)
… internal bookkeeping done in these frames …
12 fffff8800919f600 fffff880012686c2 mfl!event_handle+0x96
13 fffff8800919f670 fffff8800127309c mfl!hook_cleanup_cmpl+0x122
14 fffff8800919f6d0 fffff880012314b9 mfl!fs_hook_cleanup_cmpl+0x3c
… More internal book keeping …
1a fffff8800919f8b0 fffff800039e25ef mfl!fs_flt_cleanup+0x89 <–
(we see a cleanup IRP)
1b fffff8800919f8f0 fffff800039d1d14 nt!IopCloseFile+0x11f
1c fffff8800919f980 fffff800039d1ad1 nt!ObpDecrementHandleCount+0xb4
1d fffff8800919fa00 fffff800039d2094 nt!ObpCloseHandleTableEntry+0xb1
1e fffff8800919fa90 fffff800036de253 nt!ObpCloseHandle+0x94
1f fffff8800919fae0 0000000077b4140a nt!KiSystemServiceCopyEnd+0x13
20 000000000008e318 0000000000000000 0x77b4140a

IRP that is stuck:

0: kd> !irp 0xfffffa80`074f5bd0
Irp is active with 11 stacks 10 is current (= 0xfffffa80074f5f28)
Mdl=fffffa800d996bf0: No System Buffer: Thread fffffa800bdf2b50: Irp
stack trace.
cmd flg cl Device File Completion-Context
[0, 0] 0 0 00000000 00000000 00000000-00000000

… skipped empty stack locations for brevity

Args: 00000000 00000000 00000000 00000000
[0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000

[4, 0] 0 e0 fffffa80098dc030 fffffa800c731a10
fffff88001005e40-fffffa800e0e6cb0 Success Error Cancel
\FileSystem\Ntfs fltmgr!FltpPassThroughCompletion
Args: 00000004 00000000 0000027e 00000000
[4, 0] 0 e1 fffffa8009654de0 fffffa800c731a10
fffff88001247b60-fffffa800c8cead0 Success Error Cancel pending
\FileSystem\FltMgr mfl!completion_routine
Args: 00000004 00000000 ffffffffffffffff 00000000