The NT Insider

I/O Manager & Vista
(By: The NT Insider, Vol 14, Issue 1, January - February 2007 | Published: 08-Feb-07 | Modified: 08-Feb-07)

By the time you read this article, we'll all have woken up on January 30th, 2007 to a whole new world. High Windows Experience Index ratings will be the new currency of computer manliness. Each one of us will be required to look deep into our psyche and categorize ourselves into one of four groups (I'm a Home Premium and anyone who isn't is a lamer). Most important, however, is the fact that those of us in the driver community will finally have had to come to grips with the reality that, yes, Windows Vista is here.

Luckily, on the surface not much has changed for WDM and KMDF drivers on Vista. Sure, there are some nits here and there, such as the new rules for driver signing and a major revamp of the build environments, but for the most part drivers are architecturally the same. Does that mean that the only things different in Vista are the addition of Flip 3D and resizable thumbnails?

Well, on some level, yes, but...

Of course not! Vista gave the kernel devs at Microsoft the chance to make some fairly significant and interesting changes to all parts of the O/S. While we could probably choose any one of the Executive subsystems and write an entire article on it, we decided to explore the changes made to the I/O Manager, seeing as how that's the subsystem that we most often interact with as driver writers. After some whittling down, we've determined that the three most interesting changes (to us, and we wrote this article) are:

  • Enhanced I/O cancellation support
  • I/O prioritization and bandwidth reservation
  • Thread agnostic user I/O

I/O Cancellation Support
Because application hangs due to I/O not completing are such a nuisance, Vista has tried to make it easier for applications to cancel the I/O operations they've issued by adding two new cancel APIs: CancelIoEx and CancelSynchronousIo.

CancelIoEx is only of passing interest to us in this article seeing as how it's not a big departure from what CancelIo does. The main difference in the Ex version is that Windows has now given you the ability to cancel all I/O requests sent to the handle, not just the ones sent by the current thread. The Ex version also has an additional parameter that allows you to specify the exact request to cancel by supplying the overlapped structure used when sending it.

CancelSynchronousIo is a far more interesting API that allows an application to specify a thread handle and cancel any synchronous I/O request that the thread may be waiting on. Previously, the only way to cancel the synchronous I/O was to terminate the thread and force the I/O Manager to cancel all I/O requests associated with it.

CancelSynchronousIo, on the other hand, provides a slightly more elegant mechanism. What the API does is call a brand new native system service that queues a kernel APC to the target thread. While the thread is blocked in its wait, the kernel APC is delivered to the thread and executed, at which point the APC cancels the synchronous I/O issued by the thread. Once the APC has finished its processing, the thread returns to its original wait, which will be satisfied either when the IRP is finally completed or when the thread is aborted.

Creates Are Even More Cancellable
As you can imagine, creates are a bit of a special operation within the O/S. One particularly interesting point is that they are always handled synchronously within the I/O Manager, so there is no way to send an asynchronous create operation. Also, in previous versions of the O/S, the I/O Manager performed a non-alertable kernel wait if the create request was pended from within a driver. Therefore, even terminating the thread was not enough to cancel a create request that the user felt was taking too long.

To fix this problem, the I/O Manager now performs an alertable kernel wait when the create request has been pended by the driver. Now, if you attempt to terminate the thread waiting on the create request, the wait will be satisfied with STATUS_ALERTED and the I/O Manager will know to attempt to cancel the pending create request. This is because even though the user APC is blocked, the alert that the Process Manager fires to indicate the thread termination is not.

In addition to the I/O Manager performing an alertable wait in the create path, the Multiple UNC Provider (MUP) now uses alertable waits while performing name resolution operations inside its create handler. This gives the user the ability to cancel attempts to browse machines on the network that are not responding in a timely fashion.

You might ask at this point, "Well, why not just perform a non-alertable user wait like the I/O manager does with other synchronous services?" The trouble with this approach is that user waits are subject to having their kernel stacks paged out, and due to the special path that CreateFile requests take through the operating system, this would be an undesirable side effect. (See sidebar, Why Wait Alertable? at the end of this article.)

Note: Everything we've talked about in terms of cancellation works only if the driver servicing the I/O supports cancel! The pros and cons of supporting cancel have already been discussed many times in The NT Insider, but based on Microsoft's push to take advantage of cancellation, it's probably worth your time to again think about how and if you deal with cancel in your driver. Remember the rule: If your driver can't guarantee that every IRP it receives will complete quickly (typically, this means within a couple of seconds), your driver needs to supply a cancel routine. This now includes IRP_MJ_CREATEs received by your driver.

I/O Prioritization and Bandwidth Reservation
When a driver receives an I/O operation, it's "just another thing to do" and typically what the I/O "means" to the user is long gone. For example, consider the storage stack. If the ATAPORT driver receives a flood of read requests, should those requests be processed in any particular order? If you're the user and half of those I/O requests are for search indexing purposes and half are for the FLAC audio that you're playing, the answer is obvious. However, the ATAPORT driver doesn't know which requests are the important ones. Well, at least not until Vista. Enter the concept of I/O prioritization.

I/O Prioritization
The end result of I/O prioritization is that IRPs now have a specific priority hint associated with them. This hint can be one of the five values defined in the IO_PRIORITY_HINT enumeration, with some helpful comments within the WDK's wdm.h (See Figure 1).

// Support to set priority hints on a filehandle.

typedef enum _IO_PRIORITY_HINT {
    IoPriorityVeryLow = 0, // Winfs promotion, defragging, content
                           // indexing and other background I/Os

    IoPriorityLow,         // Prefetching for applications.

    IoPriorityNormal,      // Normal I/Os

    IoPriorityHigh,        // Used by filesystems for checkpoint
                           // I/O

    IoPriorityCritical,    // Used by memory manager. Not available
                           // for applications.

    MaxIoPriorityTypes
} IO_PRIORITY_HINT;

Figure 1 - The IO_PRIORITY_HINT Enumeration

Clearly, this was all invented with the storage stack in mind, considering that the comments relate the priorities to storage-related activities. However, as Figure 2 shows, the DDIs required to get or set the priority hint of an IRP are fully documented in the WDK.

IO_PRIORITY_HINT
IoGetIoPriorityHint(
    __in PIRP Irp
    );

NTSTATUS
IoSetIoPriorityHint(
    __in PIRP  Irp,
    __in IO_PRIORITY_HINT  PriorityHint
    );

Figure 2 - DDIs To Get or Set the Priority Hint of an IRP

Note that these are just hints, so there is nothing to say that you must pay attention to the hint or that someone you send an IRP to will pay attention to the hint. However, if your driver manipulates I/O operations in the context of the storage stack, it's something that you probably want to be prepared to handle in your driver. Alternatively, it might be a feature that you want to add to your own driver if it is something that might be beneficial to your application. For more information on how you can adjust the priority hint from within an application, see the updated MSDN documentation for SetPriorityClass, SetThreadPriority, and SetFileInformationByHandle.
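To make the "hint" idea a little more concrete, here is a minimal user-mode sketch of how a driver-like queue might choose which pending request to service next. Everything in it is an assumption made for illustration: the Request structure, the pick_next function, and the local PriorityHint enum are hypothetical stand-ins (a real driver would use the WDK's IO_PRIORITY_HINT and read the hint from the IRP rather than from a structure field), and nothing here claims to describe how the Vista storage stack actually orders its work.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in that mirrors the five levels of IO_PRIORITY_HINT.
   Hypothetical: a real driver would use the definition in wdm.h. */
typedef enum {
    HintVeryLow = 0,
    HintLow,
    HintNormal,
    HintHigh,
    HintCritical
} PriorityHint;

/* Hypothetical pending request: an id, its hint, an arrival sequence
   number, and a flag saying whether it has already been serviced. */
typedef struct {
    int           id;
    PriorityHint  hint;
    unsigned long seq;
    int           done;
} Request;

/* Return the index of the next request to service: highest hint first,
   oldest arrival first within the same hint. Returns -1 if nothing is
   pending. */
int pick_next(const Request *reqs, size_t count)
{
    int best = -1;

    assert(reqs != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (reqs[i].done) {
            continue;   /* already serviced */
        }
        if (best < 0 ||
            reqs[i].hint > reqs[best].hint ||
            (reqs[i].hint == reqs[best].hint &&
             reqs[i].seq < reqs[best].seq)) {
            best = (int)i;
        }
    }
    return best;
}
```

With an indexing read tagged HintVeryLow and two audio reads tagged HintNormal, pick_next returns the audio reads in arrival order before the indexing read. Note that strict ordering like this can starve low-priority requests under sustained load, which is one reason a hint is only a hint.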

Although the ability to provide a hint that indicates the relative priority of an I/O request is helpful, some applications not only want their I/Os to have a high priority but also want a guarantee in terms of how much of the total available I/O bandwidth will be allocated to their requests. Imagine a user streaming a video from a hard drive. Not only does the media player want its I/O to have a higher priority than that of other tasks in the system, but it would also like a percentage of the overall disk I/O throughput reserved for its I/O requests to ensure glitch-free video playback. This desire to reserve a chunk of the available bandwidth on a device to provide some sort of delivery guarantee was the inspiration for the bandwidth reservation concept in Vista.

Bandwidth Reservation (or "Scheduled File I/O")
Bandwidth reservation is entirely undocumented in kernel mode, but in user mode it is abstracted through two APIs: GetFileBandwidthReservation and SetFileBandwidthReservation. The basic theory here is that the application specifies a handle to a file on which it wants to reserve bandwidth. The lowest-level driver in the branch then uses an algorithm based on the supplied parameters to determine how often and in what quantity it will process the I/Os issued against the corresponding file object. Thus, while providing no hard guarantees (this is Windows, after all), it does promise to be a fairly clever way to ensure that certain applications can get consistent access to a device's resources.
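Since the kernel side is undocumented, the best we can offer is a toy model of what "how often and in what quantity" could mean. The user-mode API expresses a reservation as a period in milliseconds and a number of bytes per period, and the sketch below simply does the arithmetic on those two numbers and tracks a per-period byte budget. The Reservation structure and both function names are hypothetical, and the budgeting scheme is our assumption for illustration, not the algorithm Windows uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one reservation: "bytes_per_period bytes every
   period_ms milliseconds", plus bookkeeping for the current period. */
typedef struct {
    uint32_t period_ms;
    uint32_t bytes_per_period;
    uint64_t period_start_ms;   /* start time of the current period */
    uint32_t used_in_period;    /* bytes already charged this period */
} Reservation;

/* Throughput implied by the reservation, in bytes per second. */
uint64_t reserved_bytes_per_second(const Reservation *r)
{
    assert(r != NULL && r->period_ms > 0);
    return (uint64_t)r->bytes_per_period * 1000u / r->period_ms;
}

/* Try to charge a request of `bytes` issued at time now_ms against the
   reservation. Returns 1 if it fits in the budget left for the current
   period, 0 if it does not (in this model such a request would simply
   compete with ordinary I/O). */
int charge_to_reservation(Reservation *r, uint64_t now_ms, uint32_t bytes)
{
    assert(r != NULL && r->period_ms > 0);
    assert(now_ms >= r->period_start_ms);

    uint64_t elapsed = now_ms - r->period_start_ms;
    if (elapsed >= r->period_ms) {
        /* Move to the period that contains now_ms and reset the budget. */
        r->period_start_ms += (elapsed / r->period_ms) * r->period_ms;
        r->used_in_period = 0;
    }

    if (bytes > r->bytes_per_period - r->used_in_period) {
        return 0;
    }
    r->used_in_period += bytes;
    return 1;
}
```

For example, a reservation of 200,000 bytes every 1,000 milliseconds works out to 200,000 bytes per second. In this model a 150,000-byte read at time 0 fits, a further 100,000-byte read 10 milliseconds later does not, and the same read fits again once the next period begins.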

Thread Agnostic User I/O
If you weren't already aware, when a user issues an I/O request, the IRP that the I/O Manager creates to represent the I/O request is queued to the thread. Among other things, this ensures that a thread will not be fully torn down until all its outstanding I/O requests are completed. It also provides a convenient way for the Process Manager to find and cancel all the thread's outstanding I/O operations during thread termination.

New with Vista, however, is the concept of thread agnostic user I/O, meaning that the IRPs created by the I/O Manager are not queued to the thread. Instead, they are queued to the file object that the thread issued the I/O against. As far as we can tell, this feature was added for one reason and one reason alone: completion ports.

Worker Threads and Completion Ports
Completion ports provide a way to associate a "port" with an existing file handle. You can then send multiple overlapped I/Os to the file handle and create a set of worker threads that all wait on the completion port. When an I/O completes, the completion port is signaled and one of the waiting worker threads is awakened to work on the operation's result and, if necessary, resubmit the I/O request. This is all fine and good, but due to the way that I/O completion has historically worked, there is a bit of a wrinkle.

I/O Completion...Again...
(Yes, that's right, another discussion about I/O completion in The NT Insider! We promise to make this one short though.)

Remember that the final stage of I/O completion (often referred to as Stage 2) occurs within a special kernel APC in the context of the requesting thread (that is, the thread the IRP is queued to). In the completion port scenario above, the thread that initiated the I/O isn't necessarily the one that processes the completion of the I/O. This might lead to a situation in which a thread doing useful work is interrupted to perform Stage 2 I/O completion on a request it submitted some time ago. This is counterproductive because you are not distributing the workload as evenly as you would like - why shouldn't the Stage 2 I/O completion be done by some idle thread? Also, it defeats the "cache warming" side effect of I/O completion you get in a non-completion port scenario.

Thread Agnostic I/O Completion
Microsoft was aware of this problem and devised a two-part plan to counteract it:

  1. Convert all I/O sent to a HANDLE with an associated completion port into thread agnostic I/O. Thus, all those user I/Os are now queued to the file object and not the thread.
  2. Devise a slightly new I/O completion mechanism for thread agnostic I/Os. Basically, the idea is to not queue the special kernel APC for I/O completion to the thread that initiated the I/O request, but instead queue it to the completion port. The completion port then becomes signaled to indicate the completion of a new I/O, and the thread that wakes up to service the completion is the one that executes Stage 2 I/O completion. Thus, rather than pick some arbitrary thread to process Stage 2 I/O completion, it picks the thread that will process the result of the completion. This leads to already busy threads continuing with their work and gets us back the cache warming effect.

And Much More...
This article, of course, served only to highlight some of the more interesting I/O Manager features that you may have heard grumblings about. As the year progresses, expect to hear more interesting bits about additions and updates.

Why Wait Alertable? Sidebar

In this issue's I/O Manager & Vista article we note that both the I/O Manager and MUP now perform alertable waits in the create path. We also noted that this "magically" allows the operating system to provide a way for the user to cancel creates that are taking too long. So what gives? Why does the alertable wait make such a big difference?

Thread termination has historically been signaled exclusively through the use of an APC. And, if you check out the table in the WDK under the section titled Do Waiting Threads Receive Alerts and APCs?, you'll note that while performing a kernel wait there is no way to abort the wait with an APC. So, if a kernel component was performing a kernel mode wait and the user attempted to terminate the thread, the component continued to wait for the object to become signaled.

Over time, this has proven to not be the best idea. If the user is trying to terminate the thread, the kernel component should really be notified that the user no longer cares about the operation being waited on. This would give the kernel code the chance to either abort what it is currently doing or decide that, regardless of what the user wants, it needs to wait for the operation to complete.

Thus, a change was made to the process manager in Vista to not only queue an APC to the terminating thread but to also "alert" the thread using an undocumented native API. If the kernel thread is in a non-alertable wait state then absolutely no change in behavior is seen. However, if the thread is in an alertable wait state then the wait is immediately aborted with STATUS_ALERTED. The kernel component can use this as a hint that the user is most likely furiously hitting Control+C on the keyboard hoping for the threads to terminate as quickly and gracefully as possible.
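The rule described above is small enough to write down as a truth table. The sketch below is a user-mode model, not kernel code: the WaitOutcome and TerminationSignal types and all three function names are hypothetical, and WAIT_ALERTED merely stands in for a wait that returns STATUS_ALERTED. It covers only kernel-mode waits, which is the case this sidebar is about.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of a kernel-mode wait when the thread's
   termination is signaled. */
typedef enum {
    WAIT_CONTINUES,   /* thread keeps waiting for the object */
    WAIT_ALERTED      /* wait aborted; stands in for STATUS_ALERTED */
} WaitOutcome;

/* How termination is signaled to the thread in this model. */
typedef struct {
    bool apc_queued;  /* termination APC queued to the thread */
    bool alerted;     /* thread also alerted (the Vista addition) */
} TerminationSignal;

/* Before Vista: termination is signaled by the APC alone. */
TerminationSignal signal_pre_vista(void)
{
    return (TerminationSignal){ true, false };
}

/* Vista: the APC is still queued, and the thread is alerted as well. */
TerminationSignal signal_vista(void)
{
    return (TerminationSignal){ true, true };
}

/* Outcome of a kernel-mode wait for a given termination signal. The
   APC on its own never aborts a kernel-mode wait, so only the
   combination "alertable wait + alert" ends the wait early. */
WaitOutcome kernel_wait_outcome(bool alertable, TerminationSignal sig)
{
    assert(sig.apc_queued);   /* in this model termination always queues the APC */

    if (alertable && sig.alerted) {
        return WAIT_ALERTED;
    }
    return WAIT_CONTINUES;
}
```

Of the four combinations, only an alertable wait paired with the Vista-style signal yields WAIT_ALERTED. That is why the I/O Manager and MUP had to switch to alertable waits in the create path before they could benefit from the new alert.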

This article was printed from OSR Online

Copyright 2017 OSR Open Systems Resources, Inc.