Debugging hints: page_fault_in_nonpaged_area

I have a kernel mode wdm driver (portcls miniport adapter implementing wavcyclic) largely derived from MS’ msvad sample, used as a virtual input device.

Rarely, the driver will cause a BSOD due to page_fault_in_nonpaged_area while capturing from it. The implementation includes a small buffer (220k) which is updated from a user mode application using DeviceIOctrl writes. The stream is consumed through standard Windows MME APIs (e.g. NOT directX). The issue occurs on both Win 7 and Win 8 64 bit.

I confirmed that paging markup was used consistently throughout the implementation, and also removed paging all together (neither had any effect). I created tools to put the environment in a memory constrained state, also with no effect; I CANNOT reproduce the BSOD at will (100+ hours of continuous testing with no failure, and then a random failure during a manual test days later).

Could someone please suggest some pointers on how to debug this type of BSOD and what likely causes might be?

Disclaimer: I should be treated as a kernel driver novice.

Read this article:

http://www.osronline.com/article.cfm?article=335

And also post the output of !analyze -v

-scott
OSR
@OSRDrivers

wrote in message news:xxxxx@ntdev…

I have a kernel mode wdm driver (portcls miniport adapter implementing
wavcyclic) largely derived from MS’ msvad sample, used as a virtual input
device.

Rarely, the driver will cause a BSOD due to page_fault_in_nonpaged_area
while capturing from it. The implementation includes a small buffer (220k)
which is updated from a user mode application using DeviceIOctrl writes.
The stream is consumed through standard Windows MME APIs (e.g. NOT directX).
The issue occurs on both Win 7 and Win 8 64 bit.

I confirmed that paging markup was used consistently throughout the
implementation, and also removed paging all together (neither had any
effect). I created tools to put the environment in a memory constrained
state, also with no effect; I CANNOT reproduce the BSOD at will (100+
hours of continuous testing with no failure, and then a random failure
during a manual test days later).

Could someone please suggest some pointers on how to debug this type of BSOD
and what likely causes might be?

Disclaimer: I should be treated as a kernel driver novice.

Please note that the bsod has never occurred in a debugging environment (dump below from production user).

2: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fffffa810a99ee42, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff880058020c5, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000005, (reserved)

Debugging Details:

Could not read faulting driver name

READ_ADDRESS: GetPointerFromAddress: unable to read from fffff8000330b100
fffffa810a99ee42

FAULTING_IP:
dmic_x64+20c5
fffff880`058020c5 8b040a mov eax,dword ptr [rdx+rcx]

MM_INTERNAL_CODE: 5

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT

BUGCHECK_STR: 0x50

PROCESS_NAME:

CURRENT_IRQL: 0

TRAP_FRAME: fffff8800dec3660 – (.trap 0xfffff8800dec3660)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=fffffa800e4b5cfa rbx=0000000000000000 rcx=fffffa810e47fde8
rdx=fffffffffc51f05a rsi=0000000000000000 rdi=0000000000000000
rip=fffff880058020c5 rsp=fffff8800dec37f8 rbp=0000000000000000
r8=00000000ffffbdec r9=0000000000000000 r10=0000000000000000
r11=fffffa800e484000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na po nc
dmic_x64+0x20c5:
fffff880058020c5 8b040a mov eax,dword ptr [rdx+rcx] ds:fffffa810a99ee42=???
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff8000314f873 to fffff800030d1ec0

STACK_TEXT:
fffff8800dec34f8 fffff8000314f873 : 0000000000000050 fffffa810a99ee42 0000000000000000 fffff8800dec3660 : nt!KeBugCheckEx
fffff8800dec3500 fffff800030cffee : 0000000000000000 fffffa810a99ee42 0000000000000000 00000000ffffbdec : nt! ?? ::FNODOBFM::string'+0x43801 fffff8800dec3660 fffff880058020c5 : fffff880058011eb fffffa800b2260f0 fffff88005cd65fb fffffa800b2260e8 : nt!KiPageFault+0x16e fffff8800dec37f8 fffff880058011eb : fffffa800b2260f0 fffff88005cd65fb fffffa800b2260e8 0000000000001680 : dmic_x64+0x20c5 fffff8800dec3800 fffffa800b2260f0 : fffff88005cd65fb fffffa800b2260e8 0000000000001680 fffffa800adad060 : dmic_x64+0x11eb fffff8800dec3808 fffff88005cd65fb : fffffa800b2260e8 0000000000001680 fffffa800adad060 0000000000000001 : 0xfffffa800b2260f0
fffff8800dec3810 fffff8800580618f : 0000000000000000 00000000000014c6 fffffa800a99f000 fffffa800f6cab40 : ksthunk!CKernelFilterDevice::FileObjectToFilterFile+0x6b
fffff8800dec3840 0000000000000000 : 00000000000014c6 fffffa800a99f000 fffffa800f6cab40 fffffa800f6cab40 : dmic_x64+0x618f

STACK_COMMAND: kb

FOLLOWUP_IP:
dmic_x64+20c5
fffff880`058020c5 8b040a mov eax,dword ptr [rdx+rcx]

SYMBOL_STACK_INDEX: 3

SYMBOL_NAME: dmic_x64+20c5

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: dmic_x64

IMAGE_NAME: dmic_x64.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 5334c446

FAILURE_BUCKET_ID: X64_0x50_dmic_x64+20c5

BUCKET_ID: X64_0x50_dmic_x64+20c5

Followup: MachineOwner

Is dmic_x64 your driver?

-scott
OSR
@OSRDrivers

wrote in message news:xxxxx@ntdev…

Please note that the bsod has never occurred in a debugging environment
(dump below from production user).

2: kd> !analyze -v
*******************************************************************************
*
*
* Bugcheck Analysis
*
*
*
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by
try-except,
it must be protected by a Probe. Typically the address is just plain bad or
it
is pointing at freed memory.
Arguments:
Arg1: fffffa810a99ee42, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff880058020c5, If non-zero, the instruction address which
referenced the bad memory
address.
Arg4: 0000000000000005, (reserved)

Debugging Details:

Could not read faulting driver name

READ_ADDRESS: GetPointerFromAddress: unable to read from fffff8000330b100
fffffa810a99ee42

FAULTING_IP:
dmic_x64+20c5
fffff880`058020c5 8b040a mov eax,dword ptr [rdx+rcx]

MM_INTERNAL_CODE: 5

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT

BUGCHECK_STR: 0x50

PROCESS_NAME:

CURRENT_IRQL: 0

TRAP_FRAME: fffff8800dec3660 – (.trap 0xfffff8800dec3660)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=fffffa800e4b5cfa rbx=0000000000000000 rcx=fffffa810e47fde8
rdx=fffffffffc51f05a rsi=0000000000000000 rdi=0000000000000000
rip=fffff880058020c5 rsp=fffff8800dec37f8 rbp=0000000000000000
r8=00000000ffffbdec r9=0000000000000000 r10=0000000000000000
r11=fffffa800e484000 r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na po nc
dmic_x64+0x20c5:
fffff880058020c5 8b040a mov eax,dword ptr [rdx+rcx] ds:fffffa810a99ee42=???
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff8000314f873 to fffff800030d1ec0

STACK_TEXT:
fffff8800dec34f8 fffff8000314f873 : 0000000000000050 fffffa810a99ee42
0000000000000000 fffff8800dec3660 : nt!KeBugCheckEx
fffff8800dec3500 fffff800030cffee : 0000000000000000 fffffa810a99ee42
0000000000000000 00000000ffffbdec : nt! ?? ::FNODOBFM::string'+0x43801 fffff8800dec3660 fffff880058020c5 : fffff880058011eb fffffa800b2260f0 fffff88005cd65fb fffffa800b2260e8 : nt!KiPageFault+0x16e fffff8800dec37f8 fffff880058011eb : fffffa800b2260f0 fffff88005cd65fb fffffa800b2260e8 0000000000001680 : dmic_x64+0x20c5 fffff8800dec3800 fffffa800b2260f0 : fffff88005cd65fb fffffa800b2260e8 0000000000001680 fffffa800adad060 : dmic_x64+0x11eb fffff8800dec3808 fffff88005cd65fb : fffffa800b2260e8 0000000000001680 fffffa800adad060 0000000000000001 : 0xfffffa800b2260f0
fffff8800dec3810 fffff8800580618f : 0000000000000000 00000000000014c6
fffffa800a99f000 fffffa800f6cab40 :
ksthunk!CKernelFilterDevice::FileObjectToFilterFile+0x6b
fffff8800dec3840 0000000000000000 : 00000000000014c6 fffffa800a99f000
fffffa800f6cab40 fffffa800f6cab40 : dmic_x64+0x618f

STACK_COMMAND: kb

FOLLOWUP_IP:
dmic_x64+20c5
fffff880`058020c5 8b040a mov eax,dword ptr [rdx+rcx]

SYMBOL_STACK_INDEX: 3

SYMBOL_NAME: dmic_x64+20c5

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: dmic_x64

IMAGE_NAME: dmic_x64.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 5334c446

FAILURE_BUCKET_ID: X64_0x50_dmic_x64+20c5

BUCKET_ID: X64_0x50_dmic_x64+20c5

Followup: MachineOwner

Yes, it is the driver described in the first post.
Should there be additional diagnostics here?

I have the pdb and .sys in the symbol directory windbg is using and have confirmed that the driver is loaded locally.

xxxxx@gmail.com wrote:

Please note that the bsod has never occurred in a debugging environment (dump below from production user).

Note the actual values here:

TRAP_FRAME: fffff8800dec3660 – (.trap 0xfffff8800dec3660)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rcx=fffffa810e47fde8
rdx=fffffffffc51f05a

dmic_x64+0x20c5:
fffff880058020c5 8b040a mov eax,dword ptr [rdx+rcx] ds:fffffa810a99ee42=???

The value in rcx might be a real data structure, but you’re adding a
very large negative value to it.

You do need to figure out where in your driver this occurs. It looks
like an array index problem. So, do you really believe you have set
.sympath to the directory that contains the correct PDB? Remember that
your PDB must be from the exact same build that produced the SYS file in
this dump. It can’t be close. The timestamps must be the same.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I was reasonably sure until you asked. Doubling checking, I found that the pdb and sym did not have the correct name (as the build process renames the output for signing). O.o!

After correcting that, I now have a much more useful dump (did I mention I was novice)!

This seems to clearly indicate where the problem is (now I just need to determine what condition leads to the issue - which should be in my wheelhouse)

Thanks for the patience and assistance!

0: kd> !analyze -v
*******************************************************************************
* *
* Bugcheck Analysis *
* *
*******************************************************************************

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fffffa80158ac000, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff8800f6c9faa, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000000, (reserved)

Debugging Details:

Could not read faulting driver name

READ_ADDRESS: GetPointerFromAddress: unable to read from fffff800032c3100
fffffa80158ac000

FAULTING_IP:
dmic_x64!memcpy+ca [d:\winmain\minkernel\crts\crtw32\string\amd64\memcpy.asm @ 205]
fffff880`0f6c9faa 4c8b540af8 mov r10,qword ptr [rdx+rcx-8]

MM_INTERNAL_CODE: 0

CUSTOMER_CRASH_COUNT: 1

DEFAULT_BUCKET_ID: VISTA_DRIVER_FAULT

BUGCHECK_STR: 0x50

PROCESS_NAME: xxxxxx

CURRENT_IRQL: 0

TRAP_FRAME: fffff8800b3ae660 – (.trap 0xfffff8800b3ae660)
NOTE: The trap frame does not contain all registers.
Some register values may be zeroed or incorrect.
rax=0000000000000000 rbx=0000000000000000 rcx=fffffa800ee0d530
rdx=0000000006a9ead6 rsi=0000000000000000 rdi=0000000000000000
rip=fffff8800f6c9faa rsp=fffff8800b3ae7f8 rbp=0000000000000000
r8=0000000000005ba4 r9=0000000000000242 r10=00000000000001d7
r11=fffffa800ee0c1aa r12=0000000000000000 r13=0000000000000000
r14=0000000000000000 r15=0000000000000000
iopl=0 nv up ei ng nz na po nc
dmic_x64!memcpy+0xca:
fffff8800f6c9faa 4c8b540af8 mov r10,qword ptr [rdx+rcx-8] ds:3780:fffffa80158abffe=???
Resetting default scope

LAST_CONTROL_TRANSFER: from fffff80003102c07 to fffff80003086a40

STACK_TEXT:
fffff8800b3ae4f8 fffff80003102c07 : 0000000000000050 fffffa80158ac000 0000000000000000 fffff8800b3ae660 : nt!KeBugCheckEx
fffff8800b3ae500 fffff80003084b6e : 0000000000000000 fffffa80158ac000 fffffa800d586a00 00000000ffffa29c : nt! ?? ::FNODOBFM::string'+0x41c2f fffff8800b3ae660 fffff8800f6c9faa : fffff8800f6c6642 fffffa800b0520f0 fffff880061fa5fb fffffa800b0520e8 : nt!KiPageFault+0x16e fffff8800b3ae7f8 fffff8800f6c6642 : fffffa800b0520f0 fffff880061fa5fb fffffa800b0520e8 0000000000000348 : dmic_x64!memcpy+0xca [d:\winmain\minkernel\crts\crtw32\string\amd64\memcpy.asm @ 205] fffff8800b3ae800 0000000000000000 : 0000000000000000 0000000000000000 0000000000000000 00000000`00000000 : dmic_x64!CAudioBuffer::WriteData+0x76 [c:xxxx\baseaudiobuffer.cpp @ 230]

STACK_COMMAND: kb

FOLLOWUP_IP:
dmic_x64!memcpy+ca [d:\winmain\minkernel\crts\crtw32\string\amd64\memcpy.asm @ 205]
fffff880`0f6c9faa 4c8b540af8 mov r10,qword ptr [rdx+rcx-8]

FAULTING_SOURCE_CODE:
No source found for ‘d:\winmain\minkernel\crts\crtw32\string\amd64\memcpy.asm’

SYMBOL_STACK_INDEX: 3

SYMBOL_NAME: dmic_x64!memcpy+ca

FOLLOWUP_NAME: MachineOwner

MODULE_NAME: dmic_x64

IMAGE_NAME: dmic_x64.sys

DEBUG_FLR_IMAGE_TIMESTAMP: 553f8a59

FAILURE_BUCKET_ID: X64_0x50_dmic_x64!memcpy+ca

BUCKET_ID: X64_0x50_dmic_x64!memcpy+ca

Followup: MachineOwner