
The NT Insider

Rolling Your Own - Building IRPs to Perform I/O
(By: The NT Insider, Vol 4, Issue 1, Jan-Feb 1997 | Published: 15-Feb-97| Modified: 22-Aug-02)

Code associated with this article is available as a Zip archive (20KB).

One frequent question from NT driver writers is how to perform I/O operations from within their driver. The question usually takes one of two forms. Either it asks how to do I/O when the only thing available is a FILE_OBJECT, since the Zw routines require a file handle. Or it asks why the handle returned from ZwCreateFile cannot later be used by the driver. The real question is how to do I/O from a driver, typically in multiple thread contexts.

Of course, there are several possible ways of solving this problem, but one key way is for your driver to build its own I/O requests (IRPs). The IRP contains all the information necessary for an I/O operation to be completed in an arbitrary thread context. IRPs rely upon FILE_OBJECTs and DEVICE_OBJECTs, which are valid in any context. File handles, by contrast, are valid only in a particular process context. This article describes and demonstrates various ways you can accomplish this. Everything we describe is based upon material and information in the DDK, and you do not need to use any undocumented kernel APIs.

There are a variety of reasons why your driver might need to construct its own IRPs. These include communicating with file systems to perform I/O on files, or to take advantage of the kernel-only features supported by NT file systems. Perhaps your driver is augmenting the functionality of an existing device, as the NT fault tolerant driver (FTDISK) does. Perhaps you have two cooperating drivers that need to call each other. Or perhaps you are implementing a physical file system on Windows NT and need to communicate with a media driver or transport driver. Whatever the reason, building your own IRPs and passing them to other drivers is the best way to accomplish this task in Windows NT.

In this article we will start by describing how to build IRPs "on your own". Then we'll describe some of the routines provided by the I/O Manager to further ease the task of allocating and managing IRPs.

Allocation

IRPs can be allocated in one of two ways. The simpler, and most common, way is to call IoAllocateIrp(...). The I/O Manager will allocate an IRP with the appropriate number of I/O stack locations (which you specify in the call).

A warning to the unwary, however: do not call IoInitializeIrp(...) on an IRP you obtained from IoAllocateIrp(...). The DDK documentation has led many innocent victims astray on this point. In NT 3.51, calling IoInitializeIrp(...) caused the I/O Manager to clear an important IRP field (the Zoned flag). A subsequent call to IoFreeIrp(...) would then call ExFreePool(...) on that IRP. If the IRP had come from the "zone" rather than from pool, this would corrupt non-paged pool.

There is a reason the I/O Manager exports IoInitializeIrp(...): your driver might wish to allocate its IRPs itself. One way to do this is to allocate memory from non-paged pool using ExAllocatePool(...). Your driver can use IoSizeOfIrp(...) to compute the correct amount of pool for an IRP with the desired number of stack locations. You then format the newly allocated pool as an IRP by calling IoInitializeIrp(...), which sets up the fields within the IRP. An example of doing this is shown in Figure 1.

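PIRP MyAllocateIrp(CCHAR NumberOfStackLocations) {
        USHORT IrpSize = IoSizeOfIrp(NumberOfStackLocations);
        PIRP Irp;

        // Allocate raw non-paged storage for the IRP and its stack locations.
        Irp = ExAllocatePool(NonPagedPool, IrpSize);

        if (!Irp) {
            return NULL; // allocation failed
        }

        // Format the block as an IRP with the requested number of stack locations.
        IoInitializeIrp(Irp, IrpSize, NumberOfStackLocations);
        return Irp;
}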

Figure 1

Normally, drivers rely upon the I/O Manager for the allocation and management of IRPs. There are instances, however, when it becomes more efficient to have your driver allocate and manage its own IRPs.

When NT first starts up, the I/O Manager builds two look-aside lists: one for IRPs with a single I/O stack location, and one for IRPs with four I/O stack locations. When you allocate IRPs with larger stack sizes, or when the look-aside lists are empty, the I/O Manager allocates new IRPs from non-paged pool. If you know you will be using IRPs that require more than four I/O stack locations, you can improve performance somewhat by keeping IRPs in your own free list. In NT 3.51 this might be a list or a zone; in NT 4.0, a non-paged look-aside list.

That is, your driver could create a pool of IRPs when it starts and keep them on a private list. When your driver needs an IRP, it takes one from the list, and it returns the IRP to the list when the I/O operation is done. This eliminates the overhead of allocating and freeing pool memory. We've shown a simple skeleton for doing this in the code examples on the OSR web page, in the file "roll.c".
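The private free list described above can be sketched in user-mode C. This is a minimal model under stated assumptions:

- The names (model_irp_pool, model_pool_get, model_pool_put, MODEL_IRP_BYTES) are hypothetical.
- Each fixed-size block merely stands in for an IRP sized for more than four stack locations.
- There is no locking.

In a real NT 4.0 driver, this unsynchronized code would be replaced by a non-paged look-aside list (ExInitializeNPagedLookasideList with ExAllocateFromNPagedLookasideList and ExFreeToNPagedLookasideList) or by a spin-lock-protected list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: each block stands in for one preallocated IRP with a
   large stack size. The byte size is illustrative only. */
#define MODEL_IRP_BYTES 512

typedef union model_irp_block {
    union model_irp_block *next_free;   /* link used only while on the free list */
    unsigned char storage[MODEL_IRP_BYTES];
} model_irp_block;

typedef struct {
    model_irp_block *blocks;     /* single allocation made at "driver start" */
    model_irp_block *free_head;  /* private free list */
    size_t capacity;
    size_t in_use;
} model_irp_pool;

/* Preallocate every block up front and thread them onto the free list. */
static int model_pool_init(model_irp_pool *pool, size_t count)
{
    pool->blocks = calloc(count, sizeof(model_irp_block));
    if (pool->blocks == NULL)
        return 0;
    pool->free_head = NULL;
    for (size_t i = count; i > 0; i--) {
        pool->blocks[i - 1].next_free = pool->free_head;
        pool->free_head = &pool->blocks[i - 1];
    }
    pool->capacity = count;
    pool->in_use = 0;
    return 1;
}

/* Take a block from the free list; NULL means the list is exhausted and the
   caller would have to fall back to a general-purpose allocation. */
static void *model_pool_get(model_irp_pool *pool)
{
    model_irp_block *block = pool->free_head;
    if (block == NULL)
        return NULL;
    pool->free_head = block->next_free;
    pool->in_use++;
    return block;
}

/* Return a block when its I/O is done; nothing is freed here. */
static void model_pool_put(model_irp_pool *pool, void *irp)
{
    model_irp_block *block = irp;
    assert(pool->in_use > 0);
    block->next_free = pool->free_head;
    pool->free_head = block;
    pool->in_use--;
}

static void model_pool_destroy(model_irp_pool *pool)
{
    assert(pool->in_use == 0);   /* every block must be back on the list */
    free(pool->blocks);
    pool->blocks = NULL;
    pool->free_head = NULL;
}
```

The blocks are allocated once and only relinked afterwards, so getting and putting an IRP never touches the allocator.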

How do you convince the I/O Manager to give that IRP back to your driver so it can be returned to your look-aside list? Simply use an I/O completion routine. Look under the covers at how I/O completion routines really work. Your driver doesn't have an I/O stack location of its own in the IRPs it creates, yet it can still register an I/O completion routine.

Why is this? Consider how completion routines are used. The last (lowest-level) driver to be called never needs an I/O completion routine, since it is the driver that completes the I/O request. So the last I/O stack location can store the completion routine of the next-to-last driver. Continue this up the driver call chain: each driver's completion routine is stored in the stack location of the driver below it. That leaves the first I/O stack location free to hold one extra completion routine, which is available to the original creator of the IRP.
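The stack-location bookkeeping behind this extra completion routine can be modeled in a few lines of user-mode C. This is a simplified sketch, not the real IRP layout. The names model_irp, model_set_completion, and model_complete are hypothetical, and MODEL_MORE_PROCESSING only stands in for STATUS_MORE_PROCESSING_REQUIRED. Each caller stores its routine in the next driver's slot, as IoSetCompletionRoutine(...) does with the next I/O stack location. The lowest driver's slot therefore holds its caller's routine, and slot 0 is left for the IRP's creator.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-mode model of completion-routine storage; this is NOT
   the DDK's IRP or IO_STACK_LOCATION layout. */
#define MODEL_MAX_STACK 8
#define MODEL_CONTINUE 0          /* let completion continue upward */
#define MODEL_MORE_PROCESSING 1   /* stands in for STATUS_MORE_PROCESSING_REQUIRED */

struct model_irp;
typedef int (*model_completion)(struct model_irp *irp, void *context);

typedef struct {
    model_completion routine;  /* routine of the caller ABOVE this slot's owner */
    void *context;
} model_stack_location;

typedef struct model_irp {
    int stack_count;  /* one slot per driver the IRP passes through */
    model_stack_location stack[MODEL_MAX_STACK];
} model_irp;

/* Slot 0 belongs to the first driver called, slot stack_count - 1 to the
   lowest. The IRP's creator is "level -1", so its routine lands in slot 0.
   Every caller writes into the NEXT slot, never its own. */
static void model_set_completion(model_irp *irp, int caller_level,
                                 model_completion routine, void *context)
{
    int next = caller_level + 1;
    assert(next >= 0 && next < irp->stack_count);
    irp->stack[next].routine = routine;
    irp->stack[next].context = context;
}

/* Invoked when the lowest driver completes the request: walk back up the
   stack, calling each registered routine until one claims the IRP. */
static void model_complete(model_irp *irp)
{
    for (int slot = irp->stack_count - 1; slot >= 0; slot--) {
        model_stack_location *loc = &irp->stack[slot];
        if (loc->routine != NULL &&
            loc->routine(irp, loc->context) == MODEL_MORE_PROCESSING)
            return;  /* the owner reclaimed the IRP; stop here */
    }
}

/* Example routine: records which caller's routine ran, in order. */
static int trace[MODEL_MAX_STACK];
static int trace_len;

static int record_caller(model_irp *irp, void *context)
{
    (void)irp;
    trace[trace_len++] = *(const int *)context;
    return MODEL_CONTINUE;
}
```

Take a stack count of three. The creator and the first two drivers each register a routine, and the lowest driver registers none. Completing the request runs the second driver's routine, then the first driver's, and finally the creator's.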

For the creator of the IRP, setting up this I/O completion routine is identical to how intermediate drivers set up theirs: simply a call to IoSetCompletionRoutine(...). We'll describe how to build your completion routines later in this article.

Building

Once you've allocated your IRP, either with IoAllocateIrp or from your own look-aside list, you must initialize it to indicate what operation you are requesting of the lower driver. Part of this work is the same as for any call to a lower-level driver: simply set up the parameter block in the next I/O stack location. The additional work is that your driver is now also responsible for initializing the other fields within the IRP (Table 1).

Parameter                     Description
MdlAddress                    Points to the MDL describing the data buffer (if any).
Flags                         Any appropriate IRP_* flags (see ntddk.h).
AssociatedIrp.SystemBuffer    The system-space data buffer for this request, when buffered I/O is used.
RequestorMode                 UserMode or KernelMode. Typically UserMode if the arguments being
                              passed should be validated, KernelMode otherwise.
UserBuffer                    The requestor's data buffer for this request (if any).
Tail.Overlay.Thread           The PETHREAD of the original requestor.

Table 1

Of course, not all of these fields will be needed for a particular I/O operation. For example, of MdlAddress, AssociatedIrp.SystemBuffer, and UserBuffer, your driver is likely to use only one. Which one depends upon the exact I/O operation being performed.

The Tail.Overlay.Thread field matters only for certain types of devices, such as removable media drives. It tells the system how to handle "error pop-ups", such as the abort/retry/cancel dialog presented when no media is loaded in the drive.

Several IRP flags control how underlying drivers (notably file systems) interpret the contents of the I/O request (List 1).

IRP_NOCACHE: data for this I/O request should be read from the actual backing media and not from cache.
IRP_PAGING_IO: the I/O operation in question is performing paging I/O. This bit is used by the Memory Manager.
IRP_MOUNT_COMPLETION: the I/O operation in question is performing a mount operation.
IRP_SYNCHRONOUS_API: the API in question expects synchronous behavior. While synchronous behavior is advised when this bit is set, it is not required.
IRP_ASSOCIATED_IRP: the IRP in question is associated with some larger I/O operation.
IRP_BUFFERED_IO: the AssociatedIrp.SystemBuffer field is valid.
IRP_DEALLOCATE_BUFFER: the system buffer was allocated from pool and should be deallocated by the I/O Manager.
IRP_INPUT_OPERATION: the I/O operation is for input. This is used by the Memory Manager to indicate a page-in operation.
IRP_SYNCHRONOUS_PAGING_IO: the paging operation should complete synchronously. This bit is used by the Memory Manager.
IRP_CREATE_OPERATION: the IRP represents a file system create operation.
IRP_READ_OPERATION: the IRP represents a read operation.
IRP_WRITE_OPERATION: the IRP represents a write operation.
IRP_CLOSE_OPERATION: the IRP represents a close operation.
IRP_DEFER_IO_COMPLETION: the IRP should be processed asynchronously. While asynchronous behavior is advised when this bit is set, it is not required.

List 1

Set any of these bits with care. They fundamentally affect how underlying drivers (especially file systems) treat the I/O operation.

As noted earlier, your driver is also responsible for setting up the "next" I/O stack location, which in this case happens to be the first one. Call IoGetNextIrpStackLocation(...) to retrieve a pointer to the I/O stack location for the next (here, the first) driver to be called. The fields your driver is responsible for initializing are listed in Table 2.

Parameter        Description
MajorFunction    The function code for the I/O to be performed.
MinorFunction    A minor function code for the I/O; zero if there is no minor function code.
Flags            Any flags needed to modify the behavior of the I/O operation (see ntddk.h for the SL_* flags).
DeviceObject     The device to which your driver will pass the IRP.
FileObject       The file object representing the file for this I/O operation. This is only
                 used when sending IRPs to a file system.

Table 2
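Why is the "next" stack location the first one used? The simplified user-mode model below shows how the current location moves. It is only a sketch, loosely patterned on how NT initializes an IRP and how IoCallDriver(...) advances it. The names model_irp, model_next_stack_location, and model_call_driver are hypothetical, and the structure is not the DDK's IRP or IO_STACK_LOCATION. The current pointer starts one past the last stack location, so the creator's "next" location is the highest-numbered one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical, simplified model: stack locations follow the header in one
   allocation, and the "current" pointer starts one past the last location.
   Field names echo Table 2, but this is not the DDK layout. */
typedef struct {
    unsigned char major_function;
    unsigned char minor_function;
    unsigned char flags;
    void *device_object;
    void *file_object;
} model_stack_location;

typedef struct {
    int stack_count;
    int current_location;                 /* stack_count + 1 before any call */
    model_stack_location *current_stack;  /* one past the end before any call */
    model_stack_location stack[];         /* stack_count entries follow */
} model_irp;

static model_irp *model_allocate_irp(int stack_count)
{
    model_irp *irp = calloc(1, sizeof(model_irp) +
                               (size_t)stack_count * sizeof(model_stack_location));
    if (irp == NULL)
        return NULL;
    irp->stack_count = stack_count;
    irp->current_location = stack_count + 1;
    irp->current_stack = irp->stack + stack_count;
    return irp;
}

/* Modeled on IoGetNextIrpStackLocation: the slot just below "current". */
static model_stack_location *model_next_stack_location(model_irp *irp)
{
    return irp->current_stack - 1;
}

/* Bookkeeping a call-driver step performs before the target driver runs:
   the "next" location becomes the target driver's current location. */
static model_stack_location *model_call_driver(model_irp *irp)
{
    assert(irp->current_location > 1);  /* otherwise out of stack locations */
    irp->current_location--;
    irp->current_stack--;
    return irp->current_stack;
}
```

For a two-location IRP, the creator fills in stack[1]. Once the model call-driver step runs, stack[1] becomes the current location. stack[0] becomes the next location for the driver that received the IRP.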

The Flags field modifies the behavior of the underlying driver when it processes various I/O requests. The possible flags, the I/O operations they modify, and the purpose of each flag are described in Table 3.

Flag                        Associated I/O Operation     Purpose
SL_FORCE_ACCESS_CHECK       CREATE                       Force a security check, even when the call originated in kernel mode.
SL_OPEN_PAGING_FILE         CREATE                       The file being opened is a paging file.
SL_OPEN_TARGET_DIRECTORY    CREATE                       The file being opened need not exist if its directory exists. Used to create a file object for a subsequent rename operation.
SL_CASE_SENSITIVE           CREATE                       The file name should be handled in a case-sensitive fashion.
SL_KEY_SPECIFIED            READ/WRITE                   The key argument is valid.
SL_OVERRIDE_VERIFY_VOLUME   READ/WRITE/DEVICE CONTROL    For removable media, perform the I/O even though the DO_VERIFY_VOLUME bit is set in the driver's device object.
SL_WRITE_THROUGH            READ/WRITE                   The data should be written through the cache.
SL_FT_SEQUENTIAL_WRITE      READ/WRITE                   Used by FT Disk; purpose unspecified.
SL_FAIL_IMMEDIATELY         LOCK                         The operation should fail if the lock cannot be granted immediately.
SL_EXCLUSIVE_LOCK           LOCK                         The lock requested is for exclusive access to the specified range.
SL_RESTART_SCAN             DIRECTORY_CONTROL/QUERY_EA   Enumeration of the directory or EA list should start from the beginning of the list.
SL_RETURN_SINGLE_ENTRY      DIRECTORY_CONTROL/QUERY_EA   At most one entry should be returned to the caller from the directory or EA query.
SL_INDEX_SPECIFIED          DIRECTORY_CONTROL/QUERY_EA   The current position in the directory or EA list should be set from the specified index value.
SL_WATCH_TREE               DIRECTORY_CONTROL            For a directory change notification, the request applies to the entire directory tree.
SL_ALLOW_RAW_MOUNT          FILE_SYSTEM_CONTROL          When processing a mount, the RAW file system should mount this drive if no other file system does.

Table 3

Finally, your driver must also initialize the I/O-specific parameters for the particular I/O operation. For a read or write, these include the offset, length, and key values for the I/O operation.
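As a rough illustration of filling in the next stack location for a read, the model below sets the Table 2 fields and then the read-specific offset, length, and key. Everything here is hypothetical (model_stack_location, model_build_read, MODEL_MJ_READ, MODEL_SL_KEY_SPECIFIED). The constant values are placeholders rather than the ntddk.h definitions. Following Table 3, the model sets its stand-in for SL_KEY_SPECIFIED only when a key is supplied.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical placeholder constants; the real IRP_MJ_READ and
   SL_KEY_SPECIFIED values come from ntddk.h. */
#define MODEL_MJ_READ 1
#define MODEL_SL_KEY_SPECIFIED 0x01

typedef struct {
    unsigned char major_function;
    unsigned char minor_function;
    unsigned char flags;
    void *device_object;
    void *file_object;
    struct {
        uint32_t length;
        uint32_t key;
        int64_t byte_offset;
    } read;  /* the I/O-specific parameters for a read */
} model_stack_location;

/* Fill the next stack location for a read: the Table 2 fields first, then
   the read-specific offset, length, and optional key. */
static void model_build_read(model_stack_location *next, void *device,
                             void *file, int64_t offset, uint32_t length,
                             const uint32_t *key)
{
    memset(next, 0, sizeof(*next));
    next->major_function = MODEL_MJ_READ;
    next->minor_function = 0;          /* no minor function code */
    next->device_object = device;
    next->file_object = file;          /* only meaningful for a file system */
    next->read.byte_offset = offset;
    next->read.length = length;
    if (key != NULL) {
        next->read.key = *key;
        next->flags |= MODEL_SL_KEY_SPECIFIED;  /* mark the key as valid */
    }
}
```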

Completion

When you build your own IRPs, you will sometimes provide an I/O completion routine. The specific rules to follow here are not clearly documented. But as with most things in NT, if you get them wrong, the system will fall apart at some future time.

The most important reason to provide a completion routine is that you can then reuse the IRP. A less important reason, also mentioned in the DDK documentation, is that you can free the IRP. Either way, the I/O Manager no longer needs to perform I/O completion processing on the operation. In both cases, you tell the I/O Manager to stop I/O completion processing by returning STATUS_MORE_PROCESSING_REQUIRED from your completion routine.

So when should you NOT use a completion routine? When you don't care about the completion status of the I/O operation, or when you cannot free the IRP from your I/O completion routine. The latter case is not described in the DDK documentation, but it is important to correct system operation.

Typically, when the I/O Manager creates an I/O operation for a thread, it stores the associated IRP on a linked list off the thread (via the "ThreadListEntry" field within the IRP). This allows NT to clean up outstanding I/O when the thread exits. Suppose your driver's completion routine calls IoFreeIrp(...) and returns STATUS_MORE_PROCESSING_REQUIRED. The IRP might still remain on the thread's I/O list, which guarantees significant problems sometime later. It turns out that some of the I/O Manager functions for IRP creation add the IRP to the thread's list, while others do not. Thus, when constructing your completion routine, it is a good idea to check that your IRP isn't on the thread's I/O list!
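The rule above can be captured in a completion routine that reclaims an IRP only when it is safe to do so. This is a user-mode sketch with hypothetical names (model_irp, model_driver, model_completion). MODEL_MORE_PROCESSING stands in for STATUS_MORE_PROCESSING_REQUIRED. The list entry is modeled on the LIST_ENTRY convention in which an initialized entry that points at itself is on no list. How a real driver performs the check depends on how the IRP was built; the model shows only the decision.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical circular doubly linked list entry in the LIST_ENTRY style:
   an initialized entry that points at itself is not on any list. */
typedef struct model_list_entry {
    struct model_list_entry *flink, *blink;
} model_list_entry;

static void model_init_list(model_list_entry *e) { e->flink = e->blink = e; }

static void model_insert_tail(model_list_entry *head, model_list_entry *e)
{
    e->flink = head;
    e->blink = head->blink;
    head->blink->flink = e;
    head->blink = e;
}

static int model_entry_is_queued(const model_list_entry *e) { return e->flink != e; }

#define MODEL_CONTINUE 0
#define MODEL_MORE_PROCESSING 1  /* stands in for STATUS_MORE_PROCESSING_REQUIRED */

typedef struct model_irp {
    model_list_entry thread_list_entry;  /* models Irp->ThreadListEntry */
    struct model_irp *next_free;
    int status;
} model_irp;

typedef struct {
    model_irp *free_head;   /* the driver's private free list */
    int last_status;        /* completion status the driver cares about */
} model_driver;

/* Completion routine: record the status, then reclaim the IRP only if it is
   not on a thread's I/O list. */
static int model_completion(model_irp *irp, void *context)
{
    model_driver *drv = context;
    drv->last_status = irp->status;
    if (model_entry_is_queued(&irp->thread_list_entry))
        return MODEL_CONTINUE;      /* unsafe to reclaim; let completion run */
    irp->next_free = drv->free_head;
    drv->free_head = irp;
    return MODEL_MORE_PROCESSING;   /* the driver owns the IRP again */
}
```

An IRP that the driver allocated itself goes back on the private free list. One that a helper routine queued to the thread is left for the I/O Manager to complete.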

Returning STATUS_MORE_PROCESSING_REQUIRED stops any additional I/O completion processing by the I/O Manager. Your completion routine is therefore free to do just about anything it wants with the IRP. As you might expect, there are caveats:

- You cannot assume, from within your completion routine, that you are in the context of the thread that originally started the I/O operation. Object handles and user addresses therefore aren't necessarily valid.
- You cannot assume that your completion routine is called at PASSIVE_LEVEL. You might instead be called at DISPATCH_LEVEL, for example because the driver you called completed the I/O request from its DPC routine.

Keep this in mind when you design your I/O completion routine. If you need to do any complicated completion processing, you may need to defer it to a worker routine to ensure it is safe.
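One way to handle the IRQL caveat is to have the completion routine do the heavy work inline only at passive level and otherwise hand it to a worker. The sketch below is a user-mode model: the IRQL values, model_work_queue, and model_worker_drain are stand-ins, not kernel APIs. In an NT 4.0 driver, the level check would typically use KeGetCurrentIrql(), and the hand-off would go to the executive worker threads (for example, via ExQueueWorkItem()). Consult the DDK documentation for those routines; this model does not implement them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of IRQL-aware completion processing. The level names
   echo NT's, but the values and the queue are stand-ins. */
enum model_irql { MODEL_PASSIVE_LEVEL = 0, MODEL_DISPATCH_LEVEL = 2 };

#define MODEL_QUEUE_MAX 16

typedef struct {
    int pending[MODEL_QUEUE_MAX];  /* request ids awaiting passive-level work */
    int count;
    int processed;                 /* requests that finished the heavy work */
} model_work_queue;

/* The "complicated" part of completion, assumed safe only at passive level. */
static void model_heavy_completion_work(model_work_queue *q, int request_id)
{
    (void)request_id;
    q->processed++;
}

/* Completion routine: it cannot assume its IRQL, so it checks and defers the
   heavy work when running above passive level. Returns 1 if deferred. */
static int model_on_complete(model_work_queue *q, enum model_irql irql, int request_id)
{
    if (irql == MODEL_PASSIVE_LEVEL) {
        model_heavy_completion_work(q, request_id);
        return 0;
    }
    assert(q->count < MODEL_QUEUE_MAX);
    q->pending[q->count++] = request_id;  /* hand off to a worker */
    return 1;
}

/* What the worker does later, back at passive level. */
static void model_worker_drain(model_work_queue *q)
{
    for (int i = 0; i < q->count; i++)
        model_heavy_completion_work(q, q->pending[i]);
    q->count = 0;
}
```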

Reuse

We described earlier how your driver could keep these IRPs in a look-aside list. When you no longer need the IRP, your driver can place it back on the look-aside list in your completion routine. However, you might have to do some additional processing before the IRP is ready for reuse.

For example, suppose you called into a file system driver and specified a user buffer for the request (by setting Irp->UserBuffer). The file system driver may have built an MDL to describe the buffer. If so, because you are performing the cleanup of the IRP, you are responsible for unmapping, unlocking, and freeing the MDL associated with the IRP:

1. Obtain the system address for the MDL via MmGetSystemAddressForMdl(...).
2. Pass the returned address to MmUnmapLockedPages(...).
3. Call MmUnlockPages(...).
4. Free the MDL with IoFreeMdl(...).

Those of you who have looked at the MDL manipulation routines in ntddk.h might have noticed a possible optimization. Fields within the MDL could tell you whether it has been mapped (or even locked) at all. If it hasn't been mapped, there is no need to call MmUnmapLockedPages. However, this requires looking inside the MDL, and the DDK is very clear that the MDL itself is opaque, so such code risks breaking in some future release of NT. For your project, you might decide that the short-term performance benefit is worth the potential future problem.

Short-cuts

Now that we've described how to build your own IRPs, we'll mention that the I/O Manager provides at least three "short-cut" routines you can use to ease the pain involved. These I/O Manager functions are less flexible than building your own IRPs. However, they can quickly build most of the IRP, and you can finish the initialization within your driver. These routines are:

IoBuildAsynchronousFsdRequest(...)
IoBuildSynchronousFsdRequest(...)
IoBuildDeviceIoControlRequest(...)

None of these three routines initializes the FileObject field in the IRP's I/O stack location. If you are calling into a file system, your driver will have to set that field. Now that you understand how to build your own IRP within your driver, you will be able to augment the basic IRPs built by the I/O Manager to suit your own needs.

As mentioned earlier, using a completion routine with these I/O Manager helper functions can be somewhat complicated. You cannot free the IRPs built by two of them (IoBuildSynchronousFsdRequest(...) and IoBuildDeviceIoControlRequest(...)) in your completion routine. You can, however, free those built by the third (IoBuildAsynchronousFsdRequest(...)). The reason is that the first two routines add the IRP to the thread's IRP list, and no I/O Manager call removes the IRP from that list. The only option is to allow completion of the request.

Using any of these three greatly simplifies the creation of IRPs, but it also restricts your driver to the operations these helper routines support. IoBuildDeviceIoControlRequest(...) supports only IRP_MJ_DEVICE_CONTROL and IRP_MJ_INTERNAL_DEVICE_CONTROL. IoBuildSynchronousFsdRequest(...) and IoBuildAsynchronousFsdRequest(...) support only IRP_MJ_READ, IRP_MJ_WRITE, IRP_MJ_FLUSH_BUFFERS, and IRP_MJ_SHUTDOWN. The samples accompanying this article (available on the OSR Web Page) demonstrate the use of each of these three functions.
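The helper-routine rules above amount to a small decision table. This sketch simply encodes the article's statements in C. The enum names and values (helper_kind, major_kind, MJ_*) are hypothetical and are not the ntddk.h IRP_MJ_* codes. MJ_QUERY_INFORMATION appears only as an example of a request that none of the helpers builds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the rules described above; the enumerators are
   illustrative, not the ntddk.h IRP_MJ_* codes. */
typedef enum { HELPER_ASYNC_FSD, HELPER_SYNC_FSD, HELPER_DEVICE_IO_CONTROL } helper_kind;
typedef enum {
    MJ_READ, MJ_WRITE, MJ_FLUSH_BUFFERS, MJ_SHUTDOWN,
    MJ_DEVICE_CONTROL, MJ_INTERNAL_DEVICE_CONTROL, MJ_QUERY_INFORMATION
} major_kind;

/* Which major functions each helper can build. */
static bool helper_supports(helper_kind helper, major_kind major)
{
    switch (helper) {
    case HELPER_DEVICE_IO_CONTROL:
        return major == MJ_DEVICE_CONTROL || major == MJ_INTERNAL_DEVICE_CONTROL;
    case HELPER_SYNC_FSD:
    case HELPER_ASYNC_FSD:
        return major == MJ_READ || major == MJ_WRITE ||
               major == MJ_FLUSH_BUFFERS || major == MJ_SHUTDOWN;
    }
    return false;
}

/* Per the article, only the asynchronous helper's IRP is not placed on the
   thread's IRP list, so only it may be freed from a completion routine. */
static bool helper_irp_freeable_in_completion(helper_kind helper)
{
    return helper == HELPER_ASYNC_FSD;
}

/* Anything no helper covers means building the IRP yourself. */
static bool must_roll_own(major_kind major)
{
    return !helper_supports(HELPER_SYNC_FSD, major) &&
           !helper_supports(HELPER_DEVICE_IO_CONTROL, major);
}
```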

Of course, if you need to perform other operations, your driver will have to build its own IRPs.

A sample showing how to use IoBuildSynchronousFsdRequest is available in the file sync.c, part of the supplementary examples for this article on the OSR Web Page.

Extra Credit

Congratulations! If you've made it this far, you are now ready to show the NT world that you can build your own I/O requests. It turns out that this technique can do more than normal "file I/O" to a device. It also lets you take advantage of some of the advanced features provided by the file system drivers. What are those "advanced" features, you might ask? We've mentioned some of them in previous articles. Two are new to NT 4.0.

DPC-based I/O

For certain types of drivers the ability to do I/O at DPC level can provide a dramatic improvement in overall performance. For example, suppose you were implementing a driver that did data collection. By queuing the I/O operation to the file system directly from your DPC, you could get data to disk in short order without relying upon an intermediate "data collection" program sitting in user mode.

How do you do this? Simple. From your DPC routine you can: build an IRP; attach an MDL which describes the data you just read from your data acquisition device; indicate that this I/O request is from a DPC routine (IRP_MN_DPC); and send the I/O operation off to the file system. The file system will return STATUS_PENDING (after all, it cannot really do the I/O at DISPATCH_LEVEL). Using this technique you can dramatically optimize the performance of your driver over the "traditional" approach of building an application program to communicate with your driver and then write the data to disk.

Use this with care! Not all file systems actually support I/O done at DISPATCH_LEVEL.

MDL-based I/O

Here's another interesting trick you can use in your driver when building your own IRPs. When you are doing I/O to the file system, it can provide you with an MDL that maps directly into the VM cache. This approach avoids an extra copy of the data. How this is actually done varies depending upon the I/O operation. Note that this interface is not necessarily supported by all NT file systems.

With any of the MDL-based I/O routines, performing the operation requires that you make two calls, not one. Since the MDL is provided to you by the file system, you must call back into the file system in order to release that MDL once you are done with it.

Read

There's nothing tricky about this, except that you build the IRP with no data buffer. By setting the IRP_MN_MDL minor function code, you tell the file system that you want it to provide the MDL for you. Other than that, you build the IRP just like a normal read operation.

Upon return, your driver can use the FSD-provided MDL, found in Irp->MdlAddress (to perform I/O to its device, for instance). When you are done with the MDL, you call back into the FSD with another read operation, this time with the minor function IRP_MN_COMPLETE_MDL. This releases the MDL back to the underlying file system.
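The two-call protocol for an MDL read can be modeled as follows. This is a user-mode sketch under explicit assumptions:

- model_fsd, model_mdl, and model_fsd_read are hypothetical.
- The MODEL_MN_* bits stand in for the minor function codes rather than reproducing ntddk.h.
- A fixed character array plays the part of the cache.

The first call hands back a descriptor pointing into the file system's own buffer instead of copying data. The second call returns it so the file system can release it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two-call MDL protocol. The minor-code bits and
   the "file system" are stand-ins, not ntddk.h values or a real FSD. */
#define MODEL_MN_MDL      0x2   /* "give me an MDL into your cache" */
#define MODEL_MN_COMPLETE 0x4   /* combined with MDL: "I'm done with it" */

typedef struct {
    char cache[64];        /* stands in for the cached file data */
    int mdls_outstanding;  /* MDLs handed out but not yet returned */
} model_fsd;

typedef struct { char *address; size_t length; } model_mdl;

static int model_fsd_read(model_fsd *fsd, unsigned minor, size_t offset,
                          size_t length, model_mdl *mdl)
{
    if (minor == MODEL_MN_MDL) {
        /* First call: describe the FSD's own cache pages; no data is copied. */
        if (offset + length > sizeof(fsd->cache))
            return -1;
        mdl->address = fsd->cache + offset;
        mdl->length = length;
        fsd->mdls_outstanding++;
        return 0;
    }
    if (minor == (MODEL_MN_MDL | MODEL_MN_COMPLETE)) {
        /* Second call: hand the MDL back so the FSD can release it. */
        assert(fsd->mdls_outstanding > 0);
        fsd->mdls_outstanding--;
        mdl->address = NULL;
        mdl->length = 0;
        return 0;
    }
    return -1;  /* other minor codes are not modeled here */
}
```

The caller reads the data directly through mdl.address; nothing is ever copied into a caller-supplied buffer.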

Write

An MDL write works much like an MDL read, except that the MDL the FSD returns may not contain any valid data. When you request an MDL from the file system for a write operation, the FSD may take advantage of the fact that you will be writing that entire section of memory. It therefore might not read from disk whole pages that are going to be overwritten. Avoiding such unnecessary I/O can be a huge performance boost. Thus, your driver must be prepared to overwrite the entire section of the file specified in the IRP.

Once the file system has returned the MDL, your driver fills in the buffer it describes. When your driver is ready to release that MDL back to the underlying file system, it sets the IRP_MN_COMPLETE_MDL minor function code and sends the write IRP back to the file system.

Compressed

Another option new in NT 4.0 is that a kernel-mode driver can retrieve data from the file system in compressed format. To support this, a new set of values was added to the minor function codes for read and write: the IRP_MN_COMPRESSED options. These "bit values" can be combined with the existing MDL operations to retrieve and store data from the file system in compressed format. SRV uses this to allow data transmission in compressed format between NT 4.0 systems.

Conclusion

Building your own IRPs is merely another tool you can add to your arsenal when building real-world NT device drivers. While we encourage you to use these techniques, you should do so only when it is truly necessary. Our experience indicates that, while powerful, these techniques can inject unneeded complexity and increase the time it takes to debug your project. That's fine if you really need these features.

As we mentioned earlier in this article, a set of code samples has been made available on the OSR Web Page (see Table 4 below). These are examples of how to build your own IRPs, not definitive statements of how it must be done. Use them as a base for developing your own routines to issue I/O requests from your driver. In addition, a full driver example (the Kernel File Copy driver, or "kfc") demonstrates a simple kernel driver that takes two files and copies one to the other entirely in kernel mode using IRPs.

File         Description
async.c      Demonstrates using IoBuildAsynchronousFsdRequest
devctrl.c    Demonstrates using IoBuildDeviceIoControlRequest
roll.c       How to "roll your own" IRPs from non-paged pool
sync.c       Demonstrates using IoBuildSynchronousFsdRequest

Table 4

This article was printed from OSR Online http://www.osronline.com