Bugchecks Explained: PAGE_FAULT_IN_NONPAGED_AREA
OSR Staff | Published: 24-Aug-04 | Modified: 24-Aug-04
To understand this bugcheck code, it’s first necessary to understand what a "page fault" is. If you’re not completely sure you understand this concept, read the article "So, Exactly What Is A Page Fault" here at OSR Online.
The Windows Memory Manager reserves pre-defined ranges of kernel virtual address space for specific uses. Because the Windows operating system utilizes virtual memory, the Memory Manager does not necessarily assign physical memory to every possible kernel virtual address within its pre-defined ranges. The Memory Manager knows that some of its kernel virtual address ranges are used for pageable memory, and other ranges are used for non-pageable memory. For example, the kernel virtual address space that is reserved for use by the non-paged pool is (obviously) part of one of the Memory Manager’s non-pageable address spaces.
Whenever the Memory Manager detects a page fault (that is, a failure to translate a kernel virtual address to a physical address) in one of its pre-assigned address ranges in which the memory is supposed to be non-pageable, it halts system execution with a PAGE_FAULT_IN_NONPAGED_AREA bugcheck.
The only thing that can cause one of these page faults is an inadvertent reference by a kernel mode component to an invalid memory address that just happens to correspond to one of the Memory Manager’s pre-assigned non-pageable address ranges. The most common reason for this bugcheck is a driver de-referencing a bad pointer.
Countless things can lead to an invalid memory access, so tracking down these bugchecks can sometimes be particularly difficult. The most common causes are buffer overruns and underruns, and accesses to a completely bogus address.
Who Did It?
When analyzing these crash dumps, it is either immediately obvious who caused the problem or it can take some serious detective work. The bugcheck parameters for this particular code are the invalid address that was accessed, whether the access was a read or a write, and the address of the instruction that caused the invalid access. Here are some things to think about when analyzing these dumps:
1) Why is the address bad?
a) Was it previously freed? The !pool WinDBG command can be helpful in determining this.
b) Is this potentially a buffer underrun? A buffer overrun? To determine this, you will need to look at how the address is being used. If, for example, the address is being used in a copy operation, starting your analysis believing it to be a buffer overrun might not be a bad assumption (but just don’t forget that it might not be the right assumption!).
c) Is the address just completely bogus? The !pool command is also useful here, as is the !pte command.
2) Where did the address being accessed come from?
3) At which point did the address become bad? Was it previously used successfully by another component?
Using the information gathered from the above steps you can usually begin to get a better idea as to where things went wrong.
How Should I Fix It?
Using Driver Verifier and the checked build of Windows should allow you to better pinpoint the offending driver in the system. If you have no control over the offending driver, the only available option is to disable it until a fixed version is available.
Here’s an example that puts the above guidelines to use and tracks down a misbehaving driver. For clarity, the WinDBG output in this example has been stripped down to the parts important to our discussion.
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arg1: ff8b6000, memory referenced.
Arg2: 00000000, value 0 = read operation, 1 = write operation.
Arg3: 804238fd, If non-zero, the instruction address which referenced the bad memory address.
Arg4: 00000000, (reserved)
READ_ADDRESS: ff8b6000 Nonpaged pool
804238fd f3a5 rep movsd
bed4ec7c 804b06e7 811bba08 bed4ecc4 bed4ecb8 nt!IopCompleteRequest+0xab
bed4eca4 804ac360 8143e4d0 80000005 81158f88 nt!IopSynchronousServiceTail+0x8f
bed4ed48 80466389 0000084c 0155f8c8 0155f8b0 nt!NtQueryVolumeInformationFile+0x320
bed4ed48 77f8e593 0000084c 0155f8c8 0155f8b0 nt!KiSystemService+0xc9
0155f870 767ebb9f 0000084c 0155f8c8 0155f8b0 ntdll!ZwQueryVolumeInformationFile+0xb
OK, so our system bugchecked because we tried to read address 0xFF8B6000, presumably while trying to complete an IRP_MJ_QUERY_VOLUME_INFORMATION IRP. The address looks reasonable and the bugcheck info is telling me that the address is in the nonpaged address space, so let’s see what the debugger says about the address:
0: kd> !pool ff8b6000
ff8b6000: Unable to get contents of pool block
That wasn’t much help, but because the !pool command didn’t tell me that the address had been freed, I’m going to assume that we’re not dealing with a memory access to freed pool. This may be a completely invalid assumption, but it allows me to move on for the moment.
0: kd> !pte ff8b6000
FF8B6000 - PDE at C0300FF8        PTE at C03FE2D8
           contains 01036963      contains 7F8BD000
           pfn 1036 G-DA--KWV     not valid
The page table entry for the nonpaged address that we accessed is invalid, which is why the system bugchecked. The faulting IP from the bugcheck info is a rep movsd instruction, which is a copy instruction on the x86. So, I’m going to assume for the time being that this bugcheck occurred because of a buffer overrun. With that info in hand, I can move on to step two and figure out where the address came from.
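The addresses in the !pte output can be reproduced by hand. The following is a minimal sketch, assuming a 32-bit x86 system without PAE, where the page tables are self-mapped at 0xC0000000 and the page directory at 0xC0300000 (the output above is consistent with that layout). The helper names are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 32-bit x86 Windows without PAE. */
#define PTE_BASE 0xC0000000u   /* start of the self-mapped page tables   */
#define PDE_BASE 0xC0300000u   /* start of the self-mapped page directory */

/* Virtual address of the PTE for va: one 4-byte entry per 4 KB page. */
static uint32_t pte_address(uint32_t va)
{
    return PTE_BASE + (va >> 12) * 4u;
}

/* Virtual address of the PDE for va: one 4-byte entry per 4 MB region. */
static uint32_t pde_address(uint32_t va)
{
    return PDE_BASE + (va >> 22) * 4u;
}

/* Bit 0 of a hardware PDE or PTE is the Valid (Present) bit. */
static bool entry_is_valid(uint32_t entry)
{
    return (entry & 1u) != 0;
}

/* Bits 12..31 of a valid entry hold the page frame number. */
static uint32_t entry_pfn(uint32_t entry)
{
    return entry >> 12;
}
```

Feeding in the values from this dump, pte_address(0xFF8B6000) gives 0xC03FE2D8 and pde_address(0xFF8B6000) gives 0xC0300FF8. The PDE contents 0x01036963 have bit 0 set (valid, pfn 0x1036), while the PTE contents 0x7F8BD000 have bit 0 clear, which is what the debugger reports as "not valid".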
Looking in the DDK documentation, I see that IRP_MJ_QUERY_VOLUME_INFORMATION IRPs all use METHOD_BUFFERED. Therefore, when I find the IRP that is being completed here, its data buffer is going to be at Irp->AssociatedIrp.SystemBuffer. Now, unfortunately, the two most recent calls on the call stack (IopCompleteRequest and IopSynchronousServiceTail) aren’t documented. This means that I have no idea what their parameters are and so I have no idea where to find the IRP. Because of that, I have to find the IRP the hard way. Dumping the memory contents starting at the most recent frame’s EBP (0xBED4EC7C) and executing the !irp command on anything that looks like an IRP eventually leads to success:
0: kd> !irp 811bb9c8
Irp is active with 1 stacks 3 is current (= 0x811bba80)
No Mdl System buffer = ff8b5fe8 Thread 813ad980: Irp is completed.
cmd flg cl Device File Completion-Context
[ a, 0] 0 0 8143e4d0 00000000 00000000-00000000
Args: 00000000 00000000 00000000 00000000
That system buffer address looks awfully suspect in terms of the address that generated the blue screen, so let’s see what we can find out about it:
0: kd> !pool ff8b5fe8
*ff8b5fe0 size: 20 previous size: 20 (Allocated) Process: 81457020
0: kd> ? ff8b5fe0+0x20
Evaluate expression: -7643136 = ff8b6000
I can see here that the allocation that the system buffer address lies in is valid from 0xFF8B5FE0 up to but not including 0xFF8B6000, the address that killed us. Taking a look at the completion status of the IRP:
0: kd> dt nt!_IRP 811bb9c8 -r
+0x018 IoStatus :
+0x000 Status : 0x80000005
+0x000 Pointer : 0x80000005
+0x004 Information : 0x1c
Aha! The offending driver in this case has returned STATUS_BUFFER_OVERFLOW (0x80000005) with the Information field set to the number of bytes needed to complete this request. Unfortunately, what this driver writer didn’t realize was that STATUS_BUFFER_OVERFLOW is a warning status, not an error, so the I/O Manager will go ahead and copy the number of bytes specified by Information out of the system buffer and into the user’s buffer (remember, these IRPs are all METHOD_BUFFERED). Note that these are not the same semantics that you get when returning the error code STATUS_BUFFER_TOO_SMALL, which is what the developer meant to return. If a driver completes an IRP with that code, the user is returned the number of bytes needed to complete the request but no data is copied.
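The difference between the two status codes comes down to the severity bits of the NTSTATUS value. The following is a user-mode model in portable C, not WDK code: nt_error mirrors the NT_ERROR() macro, while bytes_copied_on_completion is a hypothetical helper that captures only the rule described above (data is copied back for a buffered request unless the status has error severity). The real I/O Manager checks more than this.

```c
#include <stddef.h>
#include <stdint.h>

/* NTSTATUS values; the top two bits encode the severity. */
#define STATUS_SUCCESS          0x00000000u  /* severity 0: success */
#define STATUS_BUFFER_OVERFLOW  0x80000005u  /* severity 2: warning */
#define STATUS_BUFFER_TOO_SMALL 0xC0000023u  /* severity 3: error   */

/* Mirrors NT_ERROR(): true only when the severity bits are 11. */
static int nt_error(uint32_t status)
{
    return (status >> 30) == 3u;
}

/* Hypothetical model of buffered completion: the number of bytes
 * copied from the system buffer to the caller's buffer. Anything
 * that is not an error status causes Information bytes to be copied. */
static size_t bytes_copied_on_completion(uint32_t status, size_t information)
{
    return nt_error(status) ? 0 : information;
}
```

With the values from this crash, bytes_copied_on_completion(STATUS_BUFFER_OVERFLOW, 0x1c) is 0x1c, so the copy proceeds even though the system buffer holds at most 0x18 bytes. Had the driver returned STATUS_BUFFER_TOO_SMALL, the model yields 0 and nothing is copied. A driver that does want to return STATUS_BUFFER_OVERFLOW should set Information to the number of bytes it actually placed in the buffer, never to the number it would have liked to write.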
So, because of this mishap, when the IRP was completed with IoCompleteRequest the I/O Manager attempted to copy 0x1C bytes of data starting at address 0xFF8B5FE8 into the user’s buffer. That copy runs to 0xFF8B6004, 4 bytes past the end of the allocation at 0xFF8B6000, which led to a buffer overrun and a PAGE_FAULT_IN_NONPAGED_AREA bugcheck.
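The numbers in the dump line up exactly with the faulting address, and the arithmetic is easy to check. This is a minimal sketch in portable C with hypothetical helper names. It assumes the copy proceeds one dword at a time from low to high addresses, as the rep movsd at the faulting instruction suggests.

```c
#include <stdint.h>

/* Bytes by which a copy of len bytes starting at src runs past
 * block_end (exclusive end of the allocation); 0 if the copy fits. */
static uint32_t overrun_bytes(uint32_t src, uint32_t len, uint32_t block_end)
{
    uint32_t end = src + len;
    return end > block_end ? end - block_end : 0;
}

/* For a dword-at-a-time copy of len bytes from src, the address of
 * the first dword read at or beyond limit, or 0 if every read stays
 * below limit. */
static uint32_t first_dword_at_or_past(uint32_t src, uint32_t len, uint32_t limit)
{
    for (uint32_t off = 0; off + 4 <= len; off += 4) {
        if (src + off >= limit)
            return src + off;
    }
    return 0;
}

/* True when addr is the first byte of a 4 KB page. */
static int is_page_aligned(uint32_t addr)
{
    return (addr & 0xFFFu) == 0;
}
```

Using the pool block from the !pool output (start 0xFF8B5FE0, size 0x20, so end 0xFF8B6000), overrun_bytes(0xFF8B5FE8, 0x1C, 0xFF8B6000) is 4, and first_dword_at_or_past(0xFF8B5FE8, 0x1C, 0xFF8B6000) is 0xFF8B6000, the address in Arg1. The block also ends on a page boundary, so the very first byte past the allocation sits in the next page, which here has an invalid PTE. Had the next page been valid, the same bug would have silently leaked four bytes of adjacent pool to the caller instead of crashing.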
It's very good! Thank you, I have learned much.
22-Aug-11, ming li
"this article is really helpful"
This article is really helpful for driver developers. We hope you proceed to post similar kind of bugcheck explanations.
25-Sep-06, Srilatha Bala
Excellent bug trace.
31-Mar-05, Peter Trinh
"RE: How do I lock the driver code to Non paged pool ?"
By default, all driver code is non paged. The only way driver code becomes pageable is if it is explicitly marked as pageable via a #pragma or it is paged out using the MmPageEntireDriver DDI. This is all discussed under "Making Drivers Pageable" in the DDK (note that that is the name of the section in the 3790 DDK, it may be different in earlier DDKs).
13-Sep-04, Scott Noone
"How do I lock the driver code to Non paged pool ?"
How do I ensure that code executed at IRQL above DISPATCH_LEVEL does not page fault? How can I lock that part of the code to Non paged pool?
12-Sep-04, Mohamed Husain
Excellent article, no loose ends. Keep it up!
29-Aug-04, Erwin Zoer
(1) Again an excellent article. I really like the use of the examples of using Windbg to analyze the BugCheck. You get to learn debugging techniques and they are very very good. Thanks much!!!
25-Aug-04, William Jones
"Bugchecks Explained: PAGE_FAULT_IN_NONPAGED_AREA"
Great article! I'm not a driver addict, I just like to program using the DDK, and I'm trying to write a little storage class "driver".
Nevertheless, articles like these really help in understanding the way the operating system works.
Thanks a lot!
25-Aug-04, David Landelle