Several years ago we described the various synchronization primitives available to drivers in the Windows NT environment (The NT Insider, January 1997, “Getting ‘N Sync - A Catalog of NT Synchronization Mechanisms”). With the release of Windows XP, several new variants have become available for those developing drivers. In this article we will focus on the synchronization primitives available to various types of drivers (standard kernel mode, file system, and WDM drivers).
When choosing the synchronization techniques used in protecting a particular data structure within a driver, there are a few key things to consider:
- The IRQL at which the data is being accessed;
- Whether or not the lock needs to be re-acquirable by the same thread;
- The usage pattern of the data.
For access at IRQL below DISPATCH_LEVEL the locks of choice are the dispatcher locks: the mutex, the semaphore, and the synchronization event. In addition, there are two other locks available – the fast mutex and the ERESOURCE. We will describe these in more detail later in this article.
For access at elevated IRQL (DISPATCH_LEVEL or above) the lock of choice has traditionally been the spin lock. For standard kernel mode drivers or file system drivers, a new variant, the in-stack queued spin lock, is available (this function is not available for WDM drivers). This new spin lock provides better performance, particularly on machines with a large number of processors.
In addition to the locks mentioned above, there are operations that rely upon hardware primitives for proper behavior: the interlocked operations. These are often an excellent alternative to using spin locks.
The broadest category of synchronization mechanisms is that of dispatcher locks. These locks all share a common data structure (the DISPATCHER_HEADER) and can be used as the “object” parameter to KeWaitForSingleObject or KeWaitForMultipleObjects. For any of these, the object is said to be in the “signaled” or “not signaled” state. Essentially, a dispatcher lock is available if it is signaled and is unavailable if it is not signaled. When a dispatcher object of any type (including a lock) is not signaled, the thread will block until the object changes state (becomes signaled), the wait for the object times out, or an asynchronous procedure call becomes available for the waiting thread. Regardless, the return code from KeWaitForSingleObject or KeWaitForMultipleObjects indicates the reason the routine returned.
The simplest of these locks is the event. An event consists of only a dispatcher header, with no additional information. Thus, when a thread owns this lock, it releases it by setting the event (KeSetEvent). One disadvantage to this type of lock arises during debugging: an event does not record its owning thread, which complicates finding deadlocks involving events.
While one thread owns the event, the event is in the not-signaled state. Any other thread that attempts to acquire that lock by calling KeWaitForSingleObject will either return immediately with a timeout status or block until the state of the lock changes to signaled.
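The event-as-lock pattern can be sketched as a kernel-mode fragment like the following; the names (g_StateLock and the wrapper routines) are illustrative, not from the article:

```c
#include <ntddk.h>

/* A synchronization (auto-reset) event used as a simple lock.
   Initialized to TRUE (signaled), meaning the "lock" starts out available. */
KEVENT g_StateLock;

VOID InitStateLock(VOID)
{
    KeInitializeEvent(&g_StateLock, SynchronizationEvent, TRUE);
}

VOID LockState(VOID)
{
    /* Blocks until the event is signaled; a synchronization event
       automatically resets to not-signaled when the wait is satisfied. */
    KeWaitForSingleObject(&g_StateLock, Executive, KernelMode, FALSE, NULL);
}

VOID UnlockState(VOID)
{
    /* Sets the event back to signaled, releasing the "lock". */
    KeSetEvent(&g_StateLock, IO_NO_INCREMENT, FALSE);
}
```

Note that nothing in this code records which thread holds the lock – exactly the debugging weakness described above.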
Similar to events are kernel mutexes. Unlike events, kernel mutexes allow a thread to acquire a mutex that it already owns. To achieve this, the operating system stores identification for the thread that initially acquired the mutex. If the thread then attempts to acquire a mutex that it already owns, the ownership is granted immediately and the count of the number of times the mutex has been acquired is adjusted appropriately. Also, as a side-effect of this call, kernel APCs are disabled (essentially, it is as if a call to KeEnterCriticalRegion has been made). This is to protect against arbitrary re-entrancy, since that might inject unresolvable deadlocks into the operating system. Again, so long as one thread owns a kernel mutex, any other thread that attempts to acquire the kernel mutex will block and wait.
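A kernel mutex is used with the same wait call; only the release routine differs. A minimal sketch (names illustrative):

```c
#include <ntddk.h>

KMUTEX g_Mutex;   /* illustrative name */

VOID InitMutex(VOID)
{
    KeInitializeMutex(&g_Mutex, 0);    /* second parameter is reserved */
}

VOID DoWork(VOID)
{
    KeWaitForSingleObject(&g_Mutex, Executive, KernelMode, FALSE, NULL);

    /* Because kernel mutexes are recursive, a routine called from here
       may acquire g_Mutex again without deadlocking; kernel APCs are
       disabled while the mutex is held. */

    KeReleaseMutex(&g_Mutex, FALSE);   /* FALSE: no immediate follow-on wait */
}
```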
Kernel semaphores are used whenever a restricted number of threads (but normally more than one) are allowed to proceed through a code path. For example, when there is a pool of some scarce resource (pre-allocated buffers, for instance), a semaphore can be used to restrict access to the scarce resource. Thus, there are two values used when controlling a semaphore – the limit, which indicates the maximum count for the semaphore, and the count, which indicates how many threads may proceed before the semaphore is no longer in the signaled state. When resources become available, the semaphore’s count value is adjusted using KeReleaseSemaphore. Thus, a semaphore-based lock is acquired using KeWaitForSingleObject and released using KeReleaseSemaphore.
As alternatives to the standard kernel locks, the executive makes available two lock mechanisms of interest to drivers: the fast mutex, which is often used as an alternative to the kernel mutex, and the ERESOURCE, which is often used for controlling access to data structures with high levels of read-only concurrent access. Both of these structures are built, in turn, upon kernel synchronization primitives.
The fast mutex is typically used for data access that must be protected, but does not require support for re-acquisition of the lock by the owning thread. Because of this, these locks can be implemented in a simpler fashion that is very cheap when there is no contention. However, if a fast mutex is owned by one thread and another thread attempts to acquire it, the acquiring thread will block, waiting on an event stored within the FAST_MUTEX data structure.
Because fast mutex structures cannot be re-acquired by the same thread, it is important that any code using them block asynchronous procedure calls, to ensure that APC routines do not call back into driver code that uses fast mutexes. Normally, the fast mutex routines (ExAcquireFastMutex and ExReleaseFastMutex) handle this automatically because they raise the IRQL to APC_LEVEL. However, many I/O operations do not work reliably at APC_LEVEL (this is related to the fact that I/O completion processing uses APCs under certain circumstances) and thus this can cause other unexpected problems. A variant of the fast mutex calls (ExAcquireFastMutexUnsafe and ExReleaseFastMutexUnsafe) can be used instead, but then the driver is responsible for controlling APC state itself, either by using KeEnterCriticalRegion and KeLeaveCriticalRegion, or by raising the IRQL to APC_LEVEL.
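The two variants can be sketched side by side; g_FastMutex and the wrapper routines are illustrative names:

```c
#include <ntddk.h>

FAST_MUTEX g_FastMutex;   /* initialize once with ExInitializeFastMutex */

VOID NormalVariant(VOID)
{
    /* Raises IRQL to APC_LEVEL, which implicitly blocks APC delivery. */
    ExAcquireFastMutex(&g_FastMutex);
    /* ... touch the protected data ... */
    ExReleaseFastMutex(&g_FastMutex);
}

VOID UnsafeVariant(VOID)
{
    /* IRQL is left unchanged, so the driver must block kernel APCs itself. */
    KeEnterCriticalRegion();
    ExAcquireFastMutexUnsafe(&g_FastMutex);
    /* ... touch the protected data; I/O still works normally at this IRQL ... */
    ExReleaseFastMutexUnsafe(&g_FastMutex);
    KeLeaveCriticalRegion();
}
```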
ERESOURCE locks are useful for arbitrating access to data structures where the data is read-mostly. These locks work by allowing concurrent shared access or single-threaded exclusive access. They also allow re-acquisition by a thread that already owns the lock. Perhaps their best advantage is the debugging support built into the operating system for isolating and resolving deadlocks involving ERESOURCEs (notably the !locks command in the kernel debugger’s extension libraries). In addition, lock acquisition is upgraded when a thread requests shared access to a lock for which it already has exclusive ownership!
Because there are two different modes of ownership for ERESOURCE locks, there are a number of complications. For example, if a thread attempts to acquire an ERESOURCE lock for exclusive ownership (using ExAcquireResourceExclusiveLite), and there are existing threads that own the ERESOURCE in shared mode (acquired using ExAcquireResourceSharedLite), subsequent shared access is blocked and must wait for the thread desiring exclusive ownership to actually run. This ensures that exclusive ownership threads do not “starve” for ownership of the resource.
Sometimes, though, certain high-importance code paths would rather starve those exclusive ownership threads and thus this package provides a mechanism for acquiring shared access – even at the risk of starving exclusive access threads. This is achieved by using ExAcquireSharedStarveExclusive.
The Executive implements this logic by using both a semaphore and an event. Shared access threads wait on the semaphore (when there are exclusive access threads) and exclusive access threads wait on the event (when there are any owning threads). Each time an ERESOURCE is initialized, it is placed on a global list of ERESOURCE locks within the system (and this applies to both checked and free versions of the operating system). Thus, when debugging, it is possible for the debugger to walk that list of ERESOURCEs and display additional information about each one. For those attempting to find deadlocks, they can use information from that display, and from the ERESOURCE itself to identify the owning threads and the waiting threads. Typically, this is enough to find and identify a deadlock involving ERESOURCEs.
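The reader and writer paths can be sketched as follows; g_Resource and the surrounding routines are illustrative names:

```c
#include <ntddk.h>

ERESOURCE g_Resource;     /* must reside in nonpaged memory */

VOID InitResource(VOID)
{
    /* Once, at driver initialization; also places the lock on the
       system's global ERESOURCE list used by the !locks debugger command. */
    ExInitializeResourceLite(&g_Resource);
}

VOID ReaderPath(VOID)
{
    KeEnterCriticalRegion();                        /* APCs must be disabled */
    ExAcquireResourceSharedLite(&g_Resource, TRUE); /* TRUE: wait if needed */
    /* ... read the protected data; many readers may run concurrently ... */
    ExReleaseResourceLite(&g_Resource);
    KeLeaveCriticalRegion();
}

VOID WriterPath(VOID)
{
    KeEnterCriticalRegion();
    ExAcquireResourceExclusiveLite(&g_Resource, TRUE);
    /* ... modify the protected data; re-acquisition by this thread is OK ... */
    ExReleaseResourceLite(&g_Resource);
    KeLeaveCriticalRegion();
}
```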
For this reason, ERESOURCEs are used extensively by the parts of the OS that are highly re-entrant and highly prone to deadlocks – the virtual memory system and the file systems.
For drivers that need to synchronize access at elevated IRQL (that is, at or above DISPATCH_LEVEL) the normal option is the spin lock. Spin locks traditionally come in one of two varieties: interrupt spin locks and executive spin locks.
Regardless of the type of spin lock, the purpose is the same – to ensure proper synchronization between critical OS code (including device driver code) and other parts of the OS. For example, synchronizing data between an interrupt service routine and a dispatch routine requires that the critical code all run at the IRQL of the interrupt service routine and under the protection of the interrupt spin lock.
Interrupt spin locks are associated with interrupt objects (and hence interrupt service routines) and are acquired at an IRQL above DISPATCH_LEVEL. The operating system acquires the interrupt spin lock prior to calling the interrupt service routine, and releases it after exiting the interrupt service routine. Drivers can acquire the spin lock using KeSynchronizeExecution. This routine takes a pointer to a driver-provided function. The operating system raises the IRQL and acquires the spin lock, as specified in the interrupt object provided to this call. Then it calls the driver-provided function. When that function returns, the operating system releases the spin lock and lowers the IRQL.
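A sketch of the KeSynchronizeExecution pattern follows; the DEVICE_EXTENSION layout and routine names are hypothetical:

```c
#include <ntddk.h>

/* Hypothetical per-device context; only the fields used here are shown. */
typedef struct _DEVICE_EXTENSION {
    PKINTERRUPT InterruptObject;   /* obtained from IoConnectInterrupt */
    ULONG       LatchedStatus;     /* data shared with the ISR */
} DEVICE_EXTENSION, *PDEVICE_EXTENSION;

/* Runs at the device's interrupt IRQL with the interrupt spin lock held. */
BOOLEAN ReadSharedStatus(PVOID Context)
{
    PDEVICE_EXTENSION devExt = (PDEVICE_EXTENSION)Context;
    /* ... safely read/update data also touched by the ISR ... */
    devExt->LatchedStatus = 0;
    return TRUE;
}

VOID TouchIsrData(PDEVICE_EXTENSION devExt)
{
    /* The OS raises IRQL, acquires the interrupt spin lock, calls
       ReadSharedStatus, then releases the lock and lowers the IRQL. */
    KeSynchronizeExecution(devExt->InterruptObject, ReadSharedStatus, devExt);
}
```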
An alternative to using KeSynchronizeExecution, available in Windows XP and newer versions, is KeAcquireInterruptSpinLock and the corresponding call KeReleaseInterruptSpinLock. The model for using these is comparable to the model for using executive spin locks (KeAcquireSpinLock and KeReleaseSpinLock).
Executive spin locks are used for protecting code paths that execute at IRQL DISPATCH_LEVEL. Such code is typically associated with deferred procedure calls within device drivers. Often, however, executive spin locks are used indirectly to protect data structures, because the OS exports operations whose access to those structures is itself serialized by a spin lock. For example, drivers use the interlocked queue operations (e.g., ExInterlockedInsertTailList) rather than combining a dispatcher lock (e.g., an event) with the InsertTailList macro.
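The interlocked-queue pattern looks roughly like this; WORK_ITEM and the other names are hypothetical:

```c
#include <ntddk.h>

/* Hypothetical per-request structure queued between driver routines. */
typedef struct _WORK_ITEM {
    LIST_ENTRY Links;
    /* ... driver-specific fields ... */
} WORK_ITEM, *PWORK_ITEM;

KSPIN_LOCK g_QueueLock;   /* illustrative names */
LIST_ENTRY g_QueueHead;

VOID InitQueue(VOID)
{
    KeInitializeSpinLock(&g_QueueLock);
    InitializeListHead(&g_QueueHead);
}

VOID QueueWorkItem(PWORK_ITEM Item)
{
    /* The routine raises IRQL to DISPATCH_LEVEL, acquires the spin lock,
       links the entry, releases the lock, and restores the IRQL. */
    ExInterlockedInsertTailList(&g_QueueHead, &Item->Links, &g_QueueLock);
}
```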
In Windows XP two new types of spin locks are exported as well. The first type, available for file system drivers (in ntifs.h) is the queued spin lock. These are spin locks that the OS maintains in a list (or queue) and that are accessed by number, rather than by address. Typically, though, a driver does not need to acquire these directly, because the OS provides routines that (indirectly) acquire these spin locks.
The other new type of spin lock is the in-stack queued spin lock. These locks are extremely efficient spin locks for multi-processor machines because they eliminate contention between processors for the same piece of memory. They use the same base data structure (the KSPIN_LOCK) but track waiting processors using a queue mechanism, rather than the traditional memory contention-based spin lock mechanism. Note that the two mechanisms are incompatible with one another, even though they use the same spin lock data structure. A driver thread acquires an in-stack queued spin lock by using KeAcquireInStackQueuedSpinLock or KeAcquireInStackQueuedSpinLockAtDpcLevel and releases it with KeReleaseInStackQueuedSpinLock or KeReleaseInStackQueuedSpinLockFromDpcLevel. Because these locks are more efficient, we expect drivers written for Windows XP or newer versions of Windows will use them rather than the older spin locks.
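The in-stack variant gets its name from the KLOCK_QUEUE_HANDLE, which lives on the acquiring thread's stack. A sketch (g_Lock and TouchSharedData are illustrative):

```c
#include <ntddk.h>

KSPIN_LOCK g_Lock;   /* initialized once with KeInitializeSpinLock */

VOID TouchSharedData(VOID)
{
    KLOCK_QUEUE_HANDLE lockHandle;   /* per-acquisition state, on this stack */

    KeAcquireInStackQueuedSpinLock(&g_Lock, &lockHandle);
    /* ... access the protected data at DISPATCH_LEVEL ... */
    KeReleaseInStackQueuedSpinLock(&lockHandle);
}
```

Because the two mechanisms are incompatible, a given KSPIN_LOCK must be used with either the queued routines or the traditional KeAcquireSpinLock/KeReleaseSpinLock routines – never both.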
An alternative to using any lock is to use an interlocked operation. These operations rely on hardware synchronization techniques – such as atomic memory read/modify/write cycles – rather than on OS-provided mechanisms such as dispatcher objects or spin locks. They are typically independent of IRQL and thus can be used safely between different parts of a driver.
For example, reference counts are typically managed by using InterlockedIncrement. The same effect can be achieved by using ExInterlockedAddUlong, but that routine uses a spin lock to serialize access.
These routines are typically at least as efficient as, and frequently more efficient than, implementing the same functionality using spin locks directly. For example, InterlockedExchange could be implemented by using a spin lock to protect the exchange within a driver, but the spin lock version requires additional CPU instructions while performing the same number of memory-contended operations.
One interesting technique to note is that the interlocked queue operations (e.g., ExInterlockedInsertHeadList) use spin locks, but they do so in a manner that allows queues to be safely shared among the ISR, DPC, and dispatch entry point code within a driver without relying upon the interrupt object’s spin lock to serialize access. This is an exception to the general rule that interrupt service routines must not acquire executive spin locks.
Of course, for the developer, the common response to a description of locks is “OK, but which lock is appropriate for my driver?” The answer is that the lock to use depends upon the IRQL of the code using the lock, the type of access required, the re-entrancy requirements of the code, the version(s) of the operating system to be supported, and the preference of the driver writer.
For code running at elevated IRQL (above APC_LEVEL), the correct choice is to use a spin lock. If the code in question is part of an interrupt service routine, an interrupt spin lock must be used to arbitrate access. Otherwise, either an executive spin lock, or an in-stack queued spin lock (for Windows XP and newer) is the right choice.
For code running below IRQL DISPATCH_LEVEL, the dispatcher objects are usually the correct choice. If the code in question requires re-entrant locks, then kernel mutexes or ERESOURCE locks are potential candidates – where ERESOURCE locks are the best choice for data that is read-mostly, while kernel mutexes are the best choice for data that is write-mostly.
If the code in question does not require re-entrant locks but does require exclusive access (write-mostly) then fast mutexes or synchronization events are the right choice. If the code in question does not require exclusive access (read-mostly) then ERESOURCEs are still probably the best choice.
If the code in question needs the special features of semaphores (shared access to some resource), then they should be used.
Windows NT, Windows 2000, and Windows XP offer an amazing array of synchronization primitives in order to allow shared access to data between the various code paths of a driver. Choosing the right one will provide both correct and efficient performance for your driver.