File Systems

A Brief Explanation of FsRtlCheckOplock
(By: OSR Staff | Published: 07-May-03 | Modified: 07-May-03)

As of the Windows Server 2003 IFS Kit (May 2003), the routine FsRtlCheckOplock is not documented, so it may be unclear to those using this API precisely what it does or how to use it. This article attempts to describe both.

We assume that you know what an oplock is. If not, see the article When Opportunity Locks - Oplocks on Windows NT, since this discussion will not make much sense without that background.

One of the important functions performed by a file system that supports oplocks is to check the state of oplocks when performing operations that might invalidate cached information for some client - the holder of the oplock. Typically, the local holder of the oplock is a file server, which acts as a proxy for the remote system that really owns the oplock. Fortunately, these details are hidden within the file system runtime library and most file systems can rely upon Windows to "do the right thing".

A quick glance at the FASTFAT source code in the Windows Server 2003 IFS Kit turned up nine separate uses of FsRtlCheckOplock:

FatCommonCleanup - if this file is holding a Level 1 or Level 2 oplock, it must be released as part of closing the file since the client is no longer using this file. Note that the semantics of Batch Oplocks and Filter Oplocks do not require breaking the oplock at this point.

FatOpenExistingFcb - in this case we have to worry about whether or not some other client is already holding an oplock. The exact semantics depend upon the type of access being granted and any oplocks already held against this file. For example, if an existing client owns a Level 1 oplock on the file, we cannot allow file access until the client has flushed any changes back to the original file. Generally, this case would block and wait, but sometimes (if FILE_COMPLETE_IF_OPLOCKED is specified) the call might return before the oplock is broken, indicating STATUS_OPLOCK_BREAK_IN_PROGRESS. If an existing client owns a Level 2 oplock, this oplock might need to be broken if the current caller is performing a destructive action (e.g., FILE_OVERWRITE or FILE_SUPERSEDE), but otherwise the open will generally be allowed. There are many different cases here: open options, type of oplocks already granted, number of existing open instances of this file and even whether or not the caller is requesting a filter oplock. Each must be addressed if you do not use this routine within your file system.

FatCommonSetInformation (for FileEndOfFileInformation or FileAllocationInformation). This is clearly a case where the new information has to be reported to the remote client, since the size of the file is changing. Thus, any cached information on the client is likely to be stale or invalid at that point.

FatSetRenameInfo - the two data oplock types (Level 1 and Level 2) are not affected by changes to the name of the file, but the two attribute oplock types (Batch and Filter) are affected. Thus, it is necessary to break batch oplocks when the file name is changed.

FatCommonLockControl - only filter oplocks are compatible with byte range locks. Thus, all other types of oplocks must be broken.

FatCommonRead - if a read arrives on a file where there is a remote client holding a Level 1 oplock, then the remote client must flush any cached data back before the read returns its data. Note that the FAT implementation ignores paging I/O operations, which has some interesting ramifications - for example, a memory-mapped file on the local system will not force a break of those oplocks, so the local client might read different data than the remote client has cached.

FatCommonWrite - clearly remote client caching just doesn't work if the file is modified (except by the client that owns the oplock, of course) so all outstanding oplocks of any type must be broken. Remote clients will need to discard their cached information and refresh their caches.
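The cases above in which the text names specific oplock types can be summarized as a small lookup table. What follows is only a user-mode sketch of that summary, not the logic inside FsRtlCheckOplock: the flag names, the fs_operation enum and both functions are hypothetical names introduced for illustration. The create and set-information paths are left out because the text does not enumerate the affected types for them, and the read entry records only the Level 1 case the text mentions.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit flags for the four oplock types named in the article. */
enum {
    OPLOCK_LEVEL1 = 1 << 0,
    OPLOCK_LEVEL2 = 1 << 1,
    OPLOCK_BATCH  = 1 << 2,
    OPLOCK_FILTER = 1 << 3,
    OPLOCK_ALL    = OPLOCK_LEVEL1 | OPLOCK_LEVEL2 | OPLOCK_BATCH | OPLOCK_FILTER
};

/* Hypothetical labels for the FASTFAT call sites summarized here. */
typedef enum {
    OP_CLEANUP,       /* FatCommonCleanup     */
    OP_RENAME,        /* FatSetRenameInfo     */
    OP_LOCK_CONTROL,  /* FatCommonLockControl */
    OP_READ,          /* FatCommonRead        */
    OP_WRITE,         /* FatCommonWrite       */
    OP_COUNT
} fs_operation;

/* Returns the set of oplock types the article describes as affected. */
unsigned affected_oplocks(fs_operation op)
{
    static const unsigned table[OP_COUNT] = {
        [OP_CLEANUP]      = OPLOCK_LEVEL1 | OPLOCK_LEVEL2,
        [OP_RENAME]       = OPLOCK_BATCH | OPLOCK_FILTER,
        [OP_LOCK_CONTROL] = OPLOCK_ALL & ~OPLOCK_FILTER,
        [OP_READ]         = OPLOCK_LEVEL1,
        [OP_WRITE]        = OPLOCK_ALL
    };

    assert((unsigned)op < OP_COUNT);
    return table[op];
}

/* True when any oplock in `held` is affected by the operation. */
bool must_break(fs_operation op, unsigned held)
{
    return (affected_oplocks(op) & held) != 0;
}
```

For instance, must_break(OP_LOCK_CONTROL, OPLOCK_FILTER) is false because filter oplocks are compatible with byte range locks, while must_break(OP_WRITE, OPLOCK_FILTER) is true because a write breaks every type.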

Oplock break operations can take a considerable amount of time. When that happens, the oplock package may post the original request so that it can be processed after the oplock break (or breaks) complete.

The oplock package provides two hooks that file system drivers can use to learn when an IRP is posted and, later, when the oplock break for the posted IRP has completed. Both are optional.

The OPLOCK_FS_PREPOST_IRP routine is called by the oplock package prior to enqueuing the original operation (the one the file system passes to FsRtlCheckOplock). There is no opportunity here for the file system to modify the course of events - the oplock package is merely informing the file system that the given IRP is being posted.

The OPLOCK_WAIT_COMPLETE_ROUTINE routine is called by the oplock package after the oplock break has been handled. If this routine is provided by the file system, the IRP is returned to the file system for additional processing. If this routine is not provided by the file system, the IRP is completed by the oplock package. Thus, if the file system does not need to do additional processing on the IRP after the oplock break is processed, this routine need not be specified.
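The interplay of the two hooks can be illustrated with a small user-mode model. This is a sketch of the behavior described above, not kernel code: model_irp, model_oplock, model_check_oplock, model_break_acknowledged and the counting callbacks are all hypothetical stand-ins, and the model tracks at most one posted request at a time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an IRP; it only records what happened to it. */
typedef struct model_irp {
    bool posted;                /* queued behind an oplock break        */
    bool returned_to_fs;        /* handed back via the completion hook  */
    bool completed_by_package;  /* completed by the package itself      */
} model_irp;

typedef void (*prepost_fn)(void *context, model_irp *irp);
typedef void (*wait_complete_fn)(void *context, model_irp *irp);

/* Hypothetical stand-in for the oplock state kept per file. */
typedef struct model_oplock {
    bool break_in_progress;
    model_irp *pending;
    void *context;
    wait_complete_fn on_complete;
} model_oplock;

/* Returns true if the caller may proceed now, false if the IRP was posted. */
bool model_check_oplock(model_oplock *o, model_irp *irp, void *context,
                        wait_complete_fn complete, prepost_fn prepost)
{
    if (!o->break_in_progress)
        return true;
    if (prepost != NULL)
        prepost(context, irp);  /* notification only, cannot veto posting */
    irp->posted = true;
    o->pending = irp;
    o->context = context;
    o->on_complete = complete;
    return false;
}

/* Models the moment the oplock break has been handled. */
void model_break_acknowledged(model_oplock *o)
{
    model_irp *irp = o->pending;

    o->break_in_progress = false;
    o->pending = NULL;
    if (irp == NULL)
        return;
    if (o->on_complete != NULL) {
        irp->returned_to_fs = true;        /* file system resumes the IRP */
        o->on_complete(o->context, irp);
    } else {
        irp->completed_by_package = true;  /* no hook: package completes  */
    }
}

/* Example hooks that simply count how often they run. */
typedef struct hook_counters {
    int prepost_calls;
    int complete_calls;
} hook_counters;

void count_prepost(void *context, model_irp *irp)
{
    (void)irp;
    ((hook_counters *)context)->prepost_calls++;
}

void count_complete(void *context, model_irp *irp)
{
    (void)irp;
    ((hook_counters *)context)->complete_calls++;
}
```

In this model, a request checked while a break is in progress is posted and later either handed back through the completion hook or, when no hook was supplied, marked as completed by the package, mirroring the two outcomes described above.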

Most file systems that support oplocks will be relying upon the file system runtime library to implement this functionality. We will leave it to another time (and another article) to attempt explaining the details for those who wish to implement this themselves.

This article was printed from OSR Online http://www.osronline.com