The NT Insider:Defensive Driver Writing - Watch Out For the Other Guy...

Defensive Driver Writing - Watch Out For the Other Guy...
The NT Insider, Vol 6, Issue 2, Mar-Apr 1999 | Published: 15-Apr-99 | Modified: 16-Aug-02

Many of us who have been writing drivers for “long enough” have seen drivers of varying degrees of supportability. At one extreme are those drivers that are virtually impossible to debug (usually blamed on someone who left the company). These drivers often have minimal comments (or, worse, comments that don’t match the code), a monolithic design, and a structure so fragile that as soon as anyone touches anything, something else breaks. At the other extreme are those drivers that are a pleasure to work on. These drivers seem to find their own bugs (well, almost). This article is meant to help both new and experienced driver writers by highlighting some of the tools that Windows NT driver writers can use to make their drivers “defensive drivers”.

I’m going to ignore the really basic software engineering stuff that applies to writing any software, such as the need to add comments and write well-structured code. While these practices are important to the supportability of your driver, I’m going to assume that all of you know about them. If you want more information about improving the readability of your driver, you can find several decent software engineering books at the local library or bookstore. A couple of my favorites are Writing Solid Code (by Steve Maguire, Microsoft Press; a good basic book for beginners), and Code Complete: A Practical Handbook of Software Construction (by Steve McConnell, Microsoft Press).

Instead of the basic precepts of software engineering, in this article I’m going to focus on specific techniques for preventing ugly driver problems. Developing and supporting drivers requires a whole set of specific techniques. This is because many driver problems can be very difficult to find and debug. However, by using just a few tried-and-true techniques you can prevent, or at least minimize the impact of, many of the ugliest driver problems.

Preventing Memory Corruption

One of the worst problems to debug in a driver is random memory corruption. You normally see this type of problem as a system that crashes “randomly”, just when everything seemed to be working correctly. What has actually happened is that, at some earlier point, a driver scribbled on some random location in memory.

Usually after some period of time following the scribble, bad things start happening. Those bad things could be a BSOD, or worse (file corruption), or much worse (trashed disk). As it usually turns out, the driver at fault is not easily identifiable as the culprit, since it usually lays down its “land-mines” and quietly goes on to do something else. Subsequently, some other driver trips over the corruption and blows up.

When memory corruption occurs, it’s often due to a driver corrupting storage in the paged or non-paged pool. This article will give you some techniques for preventing pool corruption in your driver, and for detecting any corruption as soon as possible after it occurs.

At OSR, we have implemented our own memory allocation and deallocation routines (OsrAllocMemory() and OsrDeallocMemory()) that call the Windows NT memory allocation and deallocation routines. This allows us to do extensive memory analysis on each allocation and deallocation, and alert the driver writer to problems immediately after they occur. Because the bulk of the verification logic lives in these routines, picking up all the added memory validation is as simple as changing ExAllocatePool() and ExFreePool() calls to OsrAllocMemory() and OsrDeallocMemory() calls.

Fill Allocated and Deallocated Memory with Special Pattern

One of the most common problems that we encounter involves a reference to memory after that memory has been previously deallocated. After it is deallocated, it is possible that the same block of memory might be allocated by another driver. With both drivers thinking they own the same block of memory, and subsequently modifying and referencing that same block of memory, “bad things” are guaranteed to happen. In some cases, the problem might not show up right away, especially if the memory block is reused as the same type of data structure.

An example of when this type of pool corruption occurs might be when a driver frees up a data structure in pool, and then later references that same structure. Take, for instance, the case in WDM when during PnP Remove processing, a driver deletes its Device Object by calling IoDeleteDevice(). This of course also deletes the related Device Extension. But what happens when the driver subsequently refers to the Device Extension? Very likely, pool corruption!

One way to detect this problem almost immediately is to fill memory with a specific pattern when the memory is deallocated. By doing this, any locations in the freed memory that might have held a pointer to another structure are immediately rendered invalid. At OSR, we use the alternating bit pattern (0xa5a5a5a5), but almost any pattern can be used. We recommend against zero filling the memory block: zeroed data too closely resembles memory in use, and is not easy to identify visually as deallocated memory when inspecting memory.

Incidentally, we also fill allocated memory with a special pattern. This forces our drivers to initialize their allocated memory blocks before they are used, and helps us ensure that no “old data” remains in the allocated block.
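The fill technique can be sketched in a few lines. What follows is a user-mode stand-in, not OSR's actual routines: malloc() and free() take the place of the NT pool routines so that the sketch can be compiled and run outside the kernel, and every Demo name, as well as the 0xC3 allocation pattern, is hypothetical. Only the 0xA5 free pattern comes from the text.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_ALLOC_FILL 0xC3u  /* hypothetical pattern for freshly allocated blocks */
#define DEMO_FREE_FILL  0xA5u  /* byte form of the 0xa5a5a5a5 pattern */

/* Allocate a block and fill it, so callers cannot rely on leftover or zeroed data. */
void *DemoAllocFilled(size_t size)
{
    void *block = malloc(size);          /* stand-in for ExAllocatePoolWithTag() */
    if (block != NULL) {
        memset(block, (int)DEMO_ALLOC_FILL, size);
    }
    return block;
}

/* Overwrite a block with the "freed" pattern. The volatile pointer keeps the
   compiler from discarding stores to memory that is about to be released. */
void DemoPoisonBlock(void *block, size_t size)
{
    volatile unsigned char *bytes = (volatile unsigned char *)block;
    size_t i;
    for (i = 0; i < size; i++) {
        bytes[i] = (unsigned char)DEMO_FREE_FILL;
    }
}

/* Poison, then release. The caller supplies the size here; a real wrapper
   would read it from its own block header. */
void DemoFreeFilled(void *block, size_t size)
{
    if (block == NULL) {
        return;
    }
    DemoPoisonBlock(block, size);
    free(block);                         /* stand-in for ExFreePool() */
}

/* Return 1 if every byte of the block holds the given fill value. */
int DemoIsFilledWith(const void *block, size_t size, unsigned char fill)
{
    const unsigned char *bytes = (const unsigned char *)block;
    size_t i;
    for (i = 0; i < size; i++) {
        if (bytes[i] != fill) {
            return 0;
        }
    }
    return 1;
}
```

With this in place, a pointer field read from a freed structure comes back as 0xa5a5a5a5 rather than as a plausible address, so a stale reference tends to fault quickly instead of silently corrupting someone else's data.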

Pool Headers/Trailers

Another very common problem is a driver walking off the end of an allocated block of memory. This can occur for a variety of reasons, including the infamous “off-by-one” errors, allocating n characters instead of n wide-characters (has anybody not done that?) and general pointer arithmetic errors. In our memory allocation routine we add a header and trailer to each block of memory that we allocate (see Figure 1). Within the header and the trailer, we store a predefined value (a Magic Number). When it is time to deallocate the block of memory, we check to make sure the Magic Numbers in the header and the trailer are still intact. If not, we issue a debug message or breakpoint (depending on the debug level).

Also, when the memory block is deallocated, we change the Magic Numbers to different values, so that we can detect when a block of memory is deallocated twice. If the block being deallocated has a Magic Number indicating that it has already been deallocated, we issue a debug message or a breakpoint.
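A minimal sketch of the header/trailer check, under the same caveats as any illustration: this is user-mode C with malloc() standing in for the pool allocator, the Demo names and the two Magic Number values are hypothetical, and the header is deliberately smaller than a production one would be. To keep the double-free check well defined, this demo marks the block as freed but does not hand the storage back to the heap; a real routine would call ExFreePool() after marking.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_MAGIC_LIVE 0x4F535241u   /* hypothetical "allocated" value */
#define DEMO_MAGIC_FREE 0x46524545u   /* hypothetical "deallocated" value */

typedef struct {
    size_t Size;     /* bytes requested by the caller */
    size_t Magic;    /* last field, so it sits directly before the caller's data */
} DEMO_HEADER;

typedef enum {
    DemoBlockOk, DemoBlockUnderrun, DemoBlockOverrun, DemoBlockAlreadyFreed
} DEMO_STATUS;

/* The trailer follows the caller's data and may be unaligned, so it is
   always accessed with memcpy(). */
static void DemoWriteTrailer(DEMO_HEADER *header, unsigned int magic)
{
    memcpy((char *)(header + 1) + header->Size, &magic, sizeof(magic));
}

static unsigned int DemoReadTrailer(const DEMO_HEADER *header)
{
    unsigned int magic;
    memcpy(&magic, (const char *)(header + 1) + header->Size, sizeof(magic));
    return magic;
}

/* Allocate header + data + trailer and return a pointer to the data. */
void *DemoGuardedAlloc(size_t size)
{
    DEMO_HEADER *header = malloc(sizeof(DEMO_HEADER) + size + sizeof(unsigned int));
    if (header == NULL) {
        return NULL;
    }
    header->Size = size;
    header->Magic = DEMO_MAGIC_LIVE;
    DemoWriteTrailer(header, DEMO_MAGIC_LIVE);
    return header + 1;
}

/* Verify both Magic Numbers. The trailer is only located through Size once
   the header itself has been found intact. */
DEMO_STATUS DemoCheckBlock(const void *userData)
{
    const DEMO_HEADER *header = (const DEMO_HEADER *)userData - 1;
    if (header->Magic == DEMO_MAGIC_FREE) {
        return DemoBlockAlreadyFreed;
    }
    if (header->Magic != DEMO_MAGIC_LIVE) {
        return DemoBlockUnderrun;
    }
    if (DemoReadTrailer(header) != DEMO_MAGIC_LIVE) {
        return DemoBlockOverrun;
    }
    return DemoBlockOk;
}

/* Check the block and, if it is intact, switch both Magic Numbers to the
   "deallocated" value so that a second free is recognized. */
DEMO_STATUS DemoGuardedFree(void *userData)
{
    DEMO_STATUS status = DemoCheckBlock(userData);
    if (status == DemoBlockOk) {
        DEMO_HEADER *header = (DEMO_HEADER *)userData - 1;
        header->Magic = DEMO_MAGIC_FREE;
        DemoWriteTrailer(header, DEMO_MAGIC_FREE);
    }
    return status;
}
```

Writing a single byte past the end of an 8-byte block (the classic off-by-one) changes the trailer, so the next check reports an overrun; freeing the same block twice reports that it was already freed.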

Figure 1

Tracking Allocated Memory Blocks

Another useful method of detecting memory problems is to maintain a list of currently allocated memory within the driver. As a driver allocates a block of memory, the memory block (including its header and trailer, as described previously) is put in a list of allocated memory. When the driver deallocates the memory, it is taken out of the list. Having a list of allocated memory allows the driver to periodically scan the allocated memory, looking for memory problems such as those we’ve already discussed. For example, if a memory block on the allocated memory list does not have a valid Magic Number in the header and trailer, there is sure to be a memory problem, and either a debug message will be displayed or a breakpoint will occur. Depending on the desired debug level, the allocated memory can be scanned frequently (such as on every allocation or deallocation), or less frequently.
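The list and its scan might look something like the sketch below. It is a simplified user-mode illustration with hypothetical names: only a header Magic Number is checked (a trailer would be verified in the same pass), malloc() stands in for the pool allocator, and the lock that a real driver would need around the list (a spin lock, for example) is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define DEMO_MAGIC 0x4F535241u         /* hypothetical Magic Number */

typedef struct DEMO_BLOCK {
    struct DEMO_BLOCK *Next;           /* doubly linked list of live blocks */
    struct DEMO_BLOCK *Prev;
    size_t Size;
    size_t Magic;                      /* last field: directly before the data */
} DEMO_BLOCK;

/* Circular list head; an empty list points at itself. */
static DEMO_BLOCK DemoListHead = { &DemoListHead, &DemoListHead, 0, 0 };

/* Allocate a block and insert it at the front of the allocated list. */
void *DemoTrackedAlloc(size_t size)
{
    DEMO_BLOCK *block = malloc(sizeof(DEMO_BLOCK) + size);
    if (block == NULL) {
        return NULL;
    }
    block->Size = size;
    block->Magic = DEMO_MAGIC;
    block->Next = DemoListHead.Next;
    block->Prev = &DemoListHead;
    DemoListHead.Next->Prev = block;
    DemoListHead.Next = block;
    return block + 1;
}

/* Remove the block from the allocated list and release it. */
void DemoTrackedFree(void *userData)
{
    DEMO_BLOCK *block = (DEMO_BLOCK *)userData - 1;
    block->Prev->Next = block->Next;
    block->Next->Prev = block->Prev;
    free(block);
}

/* Walk every live block. Returns the number of blocks whose Magic Number
   is damaged and stores the number of live blocks in *liveBlocks. */
size_t DemoScanAllocations(size_t *liveBlocks)
{
    size_t damaged = 0;
    size_t live = 0;
    DEMO_BLOCK *block;
    for (block = DemoListHead.Next; block != &DemoListHead; block = block->Next) {
        live++;
        if (block->Magic != DEMO_MAGIC) {
            damaged++;   /* a driver would print a message or break here */
        }
    }
    if (liveBlocks != NULL) {
        *liveBlocks = live;
    }
    return damaged;
}
```

A side benefit of the list is leak detection: if the live count is not zero when the driver unloads, something was never deallocated.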

Storing Tracking Information

In the event that a memory corruption is found during a scan through the allocated memory list, it would be helpful if we had more information about the use of the memory block. Information such as where the memory block was allocated, from which pool (paged or non-paged) it was allocated, and what part of the driver was using it can be useful for debugging the memory problem. In the OSR memory allocation routines, we store a number of pieces of information in the memory block header that can be useful in debugging memory problems. We save the caller’s address, as well as the caller’s caller’s address. We also save the pool type (paged or non-paged) where the memory block was allocated, and a special “tag” that the driver writer can use to label the memory block.

Our memory block headers are defined by the following structure:

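// Private memory allocation control header block
typedef struct {
    LIST_ENTRY Link;                  // links the block into the list of allocated memory
    ULONG      MagicNumber;           // predefined value, verified at deallocation
    ULONG      Size;
    POOL_TYPE  Type;                  // pool (paged or non-paged) the block came from
    ULONG      Reserved;
    PVOID      CallersAddress;        // address of the caller
    PVOID      CallersCallersAddress; // address of the caller's caller
    ULONG      Tag;                   // label supplied by the driver writer
    ULONG      FileId;
} OSR_ALLOC_BLOCK, *POSR_ALLOC_BLOCK;

Our memory block trailers are defined as:

// Private memory allocation control trailer block
typedef struct {
    ULONG MagicNumber;                // predefined value, verified at deallocation
} OSR_TRAILER_BLOCK, *POSR_TRAILER_BLOCK;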

We find that by carefully tracking and checking allocated pool blocks, we can eliminate many common memory scribble problems… and find many others in as short a time as possible.
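One way to capture this kind of information at the point of allocation is a wrapper macro. The sketch below is an assumption-laden illustration, not the OSR implementation: it records a tag, a per-file identifier, and a source line number (the line number is an addition that the structure above does not have) in place of the caller's addresses, which cannot be obtained portably. All Demo names are hypothetical and malloc() stands in for the pool allocator.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    unsigned long Tag;      /* label chosen by the driver writer */
    unsigned long FileId;   /* which source file made the allocation */
    unsigned long Line;     /* source line of the allocation (assumption) */
    size_t        Size;     /* bytes requested */
} DEMO_TRACK_HEADER;

/* Hypothetical convention: each source file defines its own number. */
#define DEMO_FILE_ID 7u

/* Allocate a block and record where the request came from. */
void *DemoAllocTracked(size_t size, unsigned long tag,
                       unsigned long fileId, unsigned long line)
{
    DEMO_TRACK_HEADER *header = malloc(sizeof(DEMO_TRACK_HEADER) + size);
    if (header == NULL) {
        return NULL;
    }
    header->Tag = tag;
    header->FileId = fileId;
    header->Line = line;
    header->Size = size;
    return header + 1;
}

/* Callers use the macro, so the file and line are filled in for them. */
#define DEMO_ALLOC(size, tag) \
    DemoAllocTracked((size), (tag), DEMO_FILE_ID, (unsigned long)__LINE__)

/* Recover the tracking header from a pointer returned by DEMO_ALLOC(). */
const DEMO_TRACK_HEADER *DemoHeaderOf(const void *userData)
{
    return (const DEMO_TRACK_HEADER *)userData - 1;
}

void DemoFreeTracked(void *userData)
{
    free((DEMO_TRACK_HEADER *)userData - 1);
}
```

When a scan finds a damaged block, printing these fields tells you immediately which allocation site to start looking at.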

Pool tags

Pool tags are four-character values that can be used to “label” a pool block that is allocated. The driver writer should assign a different pool tag to each different type of data structure that the driver allocates. By using multiple pool tags, it is easier to match a faulty memory block with the place it was allocated and how it is being used.

Let’s look at an example. If you use the following call in your driver:

pBuffer = ExAllocatePoolWithTag(NonPagedPool, SizeInBytes, 'ArsO');

NT will allocate a block of non-paged memory of the indicated size for your driver, and associate it with the label ‘OsrA’. (The tag appears “backwards” in the call because Intel processors are little endian, so the characters of the constant are stored in memory in reverse order.) In the checked build of NT, if you don’t call ExAllocatePoolWithTag(), and instead simply call ExAllocatePool(), the DDK actually calls ExAllocatePoolWithTag() with the tag “DDK”.
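The byte-order effect is easy to demonstrate. The sketch below builds a tag value from the four characters in the order they should be displayed, then writes the value out low byte first, the way a little-endian machine stores it. The helper names are hypothetical. The resulting value, 0x4172734F, is what a compiler that places the first character of a multi-character constant in the high byte (as the Microsoft compiler is understood to do) produces for 'ArsO'.

```c
#include <assert.h>
#include <string.h>

/* Build a tag from four characters given in display order:
   the first character lands in the lowest byte of the value. */
unsigned long DemoMakeTag(char c0, char c1, char c2, char c3)
{
    return  (unsigned long)(unsigned char)c0
         | ((unsigned long)(unsigned char)c1 << 8)
         | ((unsigned long)(unsigned char)c2 << 16)
         | ((unsigned long)(unsigned char)c3 << 24);
}

/* Lay the tag out as a little-endian machine stores it (low byte first)
   and terminate it so it can be printed as a string. */
void DemoTagToText(unsigned long tag, char text[5])
{
    int i;
    for (i = 0; i < 4; i++) {
        text[i] = (char)((tag >> (8 * i)) & 0xFFu);
    }
    text[4] = '\0';
}
```

Using a helper like DemoMakeTag('O', 's', 'r', 'A') also avoids depending on how a particular compiler evaluates multi-character constants, which the C language leaves implementation-defined.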

Once allocation operations are associated with a pool tag, a pool monitoring application can be used to display information about the allocated pool fragments associated with each pool tag. The number of allocation and deallocation operations that take place per pool tag, for each of the two pools, as well as the total amount of pool space allocated for each pool tag can be displayed.

The pool monitoring application supplied as part of the standard NT DDK is Poolmon. Another (the one that we use at OSR) is PoolTag, which can be downloaded from the OSR website (www.osr.com).

If you use PoolTag, you can watch the memory allocation for tag OsrA to make sure it doesn’t grow endlessly, which would be an indication that there is a memory leak. For a sample output from PoolTag, see Figure 2.

Figure 2

IRQL checking

One other feature you might want to add to your memory allocation and deallocation routines (you’ve come this far, why not add one more bell and whistle?) is a check of the current IRQL. If the IRQL is DISPATCH_LEVEL or above, you should not allow the allocation of paged pool, because touching paged pool at that level can cause a page fault. A page fault at DISPATCH_LEVEL or above results in a nice IRQL_NOT_LESS_OR_EQUAL bug check, the cause of which may not be obvious to the driver writer. It would be better to display a more useful message, and maybe override the request so that non-paged pool is allocated instead.
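The decision itself is only a few lines. The sketch below is a user-mode illustration with hypothetical names; the IRQL is passed in as a parameter so that the logic can be exercised outside the kernel. In a driver, the current level would come from KeGetCurrentIrql() and the message would go out through DbgPrint().

```c
#include <assert.h>
#include <stdio.h>

typedef enum { DemoNonPagedPool, DemoPagedPool } DEMO_POOL_TYPE;

/* Stand-ins for the NT IRQL constants of the same names. */
#define DEMO_PASSIVE_LEVEL  0
#define DEMO_APC_LEVEL      1
#define DEMO_DISPATCH_LEVEL 2

/* Decide which pool a request should really use. A paged pool request made
   at DISPATCH_LEVEL or above is reported and redirected to non-paged pool;
   *overridden is set to 1 when that happens, 0 otherwise. */
DEMO_POOL_TYPE DemoCheckPoolRequest(DEMO_POOL_TYPE requested,
                                    int currentIrql,
                                    int *overridden)
{
    *overridden = 0;
    if (requested == DemoPagedPool && currentIrql >= DEMO_DISPATCH_LEVEL) {
        fprintf(stderr,
                "Paged pool requested at IRQL %d; using non-paged pool\n",
                currentIrql);
        *overridden = 1;
        return DemoNonPagedPool;
    }
    return requested;
}
```

Whether to override quietly, or to break into the debugger so the caller gets fixed, is a matter of debug level; the override keeps a test run alive, while the breakpoint makes sure the bug is not forgotten.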

Memory Enhancements in Windows NT

V4.0 of Windows NT includes functionality to help driver writers with memory corruption problems. For example, ExFreePool() in Windows NT V4.0 and later checks for ERESOURCEs in the block of memory being freed. If the block contains an ERESOURCE that hasn’t been released, Windows NT will bugcheck.

Even seasoned NT driver writers are often surprised to learn that the checked build of Windows NT V4.0 supports a number of pool checking features similar to those described earlier, including deallocation checking and tail checking. These features, however, are only present in the checked build of NT, and must be manually enabled using a program such as GFLAGS (located in the Windows NT Resource Kit). Using GFLAGS, set Enable Pool Tail Checking and Enable Pool Free Checking in the Registry and reboot the system. See Figure 3 for the list of GlobalFlags. Also refer to Microsoft Knowledge Base Articles Q147314 and Q164933 for more information. We have found that combining the support provided by these flags with the assistance of your own pool-tracking scheme can reduce the incidence of pool corruption to near zero.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\GlobalFlag:

    FLG_STOP_ON_EXCEPTION             0x00000001
    FLG_SHOW_LDR_SNAPS                0x00000002
    FLG_DEBUG_INITIAL_COMMAND         0x00000004
    FLG_STOP_ON_HUNG_GUI              0x00000008
    FLG_HEAP_ENABLE_TAIL_CHECK        0x00000010
    FLG_HEAP_ENABLE_FREE_CHECK        0x00000020
    FLG_HEAP_VALIDATE_PARAMETERS      0x00000040
    FLG_HEAP_VALIDATE_ALL             0x00000080
    FLG_POOL_ENABLE_TAIL_CHECK        0x00000100
    FLG_POOL_ENABLE_FREE_CHECK        0x00000200
    FLG_POOL_ENABLE_TAGGING           0x00000400
    FLG_HEAP_ENABLE_TAGGING           0x00000800
    FLG_USER_STACK_TRACE_DB           0x00001000
    FLG_KERNEL_STACK_TRACE_DB         0x00002000
    FLG_MAINTAIN_OBJECT_TYPELIST      0x00004000
    FLG_HEAP_ENABLE_TAG_BY_DLL        0x00008000
    FLG_IGNORE_DEBUG_PRIV             0x00010000
    FLG_ENABLE_CSRDEBUG               0x00020000
    FLG_ENABLE_KDEBUG_SYMBOL_LOAD     0x00040000
    FLG_DISABLE_PAGE_KERNEL_STACKS    0x00080000
    FLG_HEAP_ENABLE_CALL_TRACING      0x00100000
    FLG_HEAP_DISABLE_COALESCING       0x00200000
    FLG_VALID_BITS                    0x003FFFFF

Figure 3
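The GlobalFlag value is simply the bitwise OR of the flags you want. As a small worked example (the function name is hypothetical; the constants are copied from Figure 3), the following computes the value that turns on pool tail checking and pool free checking while preserving whatever flags are already set:

```c
#include <assert.h>

/* Values copied from Figure 3. */
#define FLG_POOL_ENABLE_TAIL_CHECK 0x00000100UL
#define FLG_POOL_ENABLE_FREE_CHECK 0x00000200UL
#define FLG_POOL_ENABLE_TAGGING    0x00000400UL
#define FLG_VALID_BITS             0x003FFFFFUL

/* Return the GlobalFlag value with pool tail checking and pool free
   checking added to the flags that are already set. Bits outside
   FLG_VALID_BITS are discarded. */
unsigned long DemoEnablePoolChecks(unsigned long currentGlobalFlag)
{
    return (currentGlobalFlag
            | FLG_POOL_ENABLE_TAIL_CHECK
            | FLG_POOL_ENABLE_FREE_CHECK) & FLG_VALID_BITS;
}
```

Starting from a GlobalFlag of zero, the result is 0x00000300; if pool tagging (0x00000400) was already enabled, the result is 0x00000700.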

In Windows NT V5 (Windows 2000), write protection will be enforced on read-only memory (for example, pages containing executable code). If there is an attempt to write to these pages, NT will raise an exception. This will help catch memory scribbles over what should be execute-only code. It will be problematic for some driver writers, though. There is a “technique” for patching another driver’s code so that you can insert functionality into its execution path. Basically, you overwrite an instruction in the other driver with a jump to your driver. When your code is done, it jumps back to the other driver to let it continue processing. Unfortunately (for those who use this technique), this will now cause an exception.

Another feature, designed for Windows 2000 but also appearing in NT V4, SP4 is Special Pool. When Special Pool is enabled on Windows NT, it will align the allocated memory either at the front of a page, or at the end of a page. If it is aligned at the front of a page, the system will cause a Bug Check if there is an attempt to write before the start of the allocated memory. Likewise, if it is aligned at the tail of a page, the system will Bug Check if there is an attempt to write past the end of the allocated memory block. A word of caution though… if Special Pool is enabled, the Windows NT memory allocation routines will allocate a minimum of one page per allocated memory block. So you can see how this will quickly eat up pool. It’s best not to enable Special Pool on your system unless you have lots of physical memory. One nice feature of Special Pool is that it can be enabled on a per-driver or per-pooltag basis.
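To get a feel for how quickly this adds up, here is some back-of-the-envelope arithmetic in C. It assumes 4 KB pages (the x86 page size) and uses only the one-page-minimum rule described above; any additional bookkeeping the system may do is not modeled, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_PAGE_SIZE 4096u   /* assumption: x86 page size */

/* Pages consumed by one block when every block gets at least one page. */
size_t DemoPagesPerBlock(size_t size)
{
    size_t pages = (size + DEMO_PAGE_SIZE - 1) / DEMO_PAGE_SIZE;
    return pages == 0 ? 1 : pages;
}

/* Total bytes of pool consumed by 'count' live blocks of 'size' bytes. */
size_t DemoSpecialPoolFootprint(size_t count, size_t size)
{
    return count * DemoPagesPerBlock(size) * DEMO_PAGE_SIZE;
}

/* Offset into its first page at which a block must start so that its
   last byte is the last byte of a page (tail alignment). */
size_t DemoTailAlignedOffset(size_t size)
{
    return DemoPagesPerBlock(size) * DEMO_PAGE_SIZE - size;
}
```

By this estimate, 1,000 live 32-byte allocations (32,000 bytes of actual data) occupy 4,096,000 bytes of pool, and a tail-aligned 32-byte block starts at offset 4064 in its page so that the very next byte after it belongs to a different page.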

In Summary

We’ve discussed a few techniques that can help reduce driver problems. Sure, it takes some time to implement these features on the front end, early in a development project. And, we all know that when you start a new project, the last thing you feel like writing is boring infrastructure code. However, it’s often such code that makes the difference between a project being on time, and being hopelessly lost. After all, it pays to practice Defensive Driver Writing.
