The NT Insider

Windows 8 Preview: File System Changes
(By: The NT Insider, Vol 18, Issue 3, Sept-Oct 2011 | Published: 17-Oct-11 | Modified: 17-Oct-11)

With the recent release of the Windows Developer Preview (i.e., Win8), I took the opportunity to review the changes Microsoft has made public that will affect file system and file system filter driver developers for the Windows 8 platform.

While one can certainly look at the documentation, my approach in preparing this article was to look at the changes in the critical header files: ntifs.h and fltKernel.h, as these suggest much of the current scope of changes.

With that said, my experience with previous pre-release versions of Windows is that everything is subject to change.  Thus, it is unwise to count on any feature or change that we describe in this article until the final release of Windows 8.

New File Systems

The header files suggest the presence of new file systems as part of the Windows 8 release.  This includes the Protogon file system (although there is no new Protogon file system that I could see in the Developer Preview).  The headers indicate:

#define FILESYSTEM_STATISTICS_TYPE_PROTOGON 4

typedef struct _PROTOGON_STATISTICS {
    ULONG Unused;
} PROTOGON_STATISTICS, *PPROTOGON_STATISTICS;

Speculation in the press is that this will be the database-as-filesystem model previously promoted as WinFS.  If so, I suspect it will be a substantially different implementation (likely with a strong emphasis on performance).  Then again, if it wasn’t far enough along to include in the Developer Preview, it might be one of those features that doesn’t make the initial release.

In addition, there is quite a bit of new information about something called CSVFS.  This is perhaps the “cluster shared volume file system”, intended to work with “cluster shared volumes”; if so, it may just be exposing new information about an existing technology.  Again, it seems a bit premature to speculate, especially given that if it is related to clustering, this feature is not likely to be present in the client release.

Emphasis on Data Verification

Many of the changes we observed relate to data verification support, although it is not clear whether this is an NTFS-only feature or whether it will be present in other file systems (e.g., Protogon).  There are two new FSCTLs (FSCTL_GET_INTEGRITY_INFORMATION and FSCTL_SET_INTEGRITY_INFORMATION) that appear to be related to these changes, and a structure used to control this information (as shown below):

#define FSCTL_INTEGRITY_FLAG_CHECKSUM_ENFORCEMENT_OFF (1)

typedef struct _FSCTL_INTEGRITY_INFORMATION_BUFFER {
    USHORT ChecksumAlgorithm; // Checksum algorithm. e.g. CHECKSUM_TYPE_UNCHANGED, CHECKSUM_TYPE_NONE, CHECKSUM_TYPE_CRC32
    USHORT Reserved; // Must be 0
    ULONG Flags; // FSCTL_INTEGRITY_FLAG_xxx
} FSCTL_INTEGRITY_INFORMATION_BUFFER, *PFSCTL_INTEGRITY_INFORMATION_BUFFER;

Equally interesting is the introduction of “integrity streams”:

#define FILE_SUPPORTS_INTEGRITY_STREAMS 0x04000000 // winnt
 

Frankly, this makes quite a bit of sense: modern disk drive technologies are vastly more complicated than they have been in the past and can (and do) suffer from real issues with respect to data integrity.  Thus, it is quite possible to read data back from the drive and receive something other than what was originally written.  While it doesn’t happen frequently, it does happen, and most applications are not written to withstand that sort of data corruption.  Some applications, particularly database applications, are highly susceptible to this type of data corruption: the relationship information between the components can be lost when critical database information is lost.  While transactional database models will protect against some forms of failure, they do not protect against incorrect data being returned from the underlying storage media.
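
To make the idea concrete, here is a minimal user-mode sketch of checksum-based corruption detection.  It is not how Windows 8 implements integrity streams (the headers do not tell us that); it simply stores a CRC-32 alongside a block when it is written and recomputes it when the block is read back.  The checked_block type and the helper names are hypothetical, and the CRC-32 variant used (reflected polynomial 0xEDB88320) is an assumption, since the header only names CHECKSUM_TYPE_CRC32.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 using the common reflected polynomial 0xEDB88320. */
static uint32_t crc32_compute(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 1u)
                crc = (crc >> 1) ^ 0xEDB88320u;
            else
                crc >>= 1;
        }
    }
    return ~crc;
}

/* Hypothetical on-media record: the data plus the checksum stored at write time. */
typedef struct {
    uint8_t  data[16];
    uint32_t stored_crc;
} checked_block;

/* "Write" a block: copy the payload (zero padded) and record its checksum. */
static void block_write(checked_block *b, const uint8_t *src, size_t len)
{
    assert(len <= sizeof b->data);
    memset(b->data, 0, sizeof b->data);
    memcpy(b->data, src, len);
    b->stored_crc = crc32_compute(b->data, sizeof b->data);
}

/* "Read" check: returns nonzero only if the data still matches its checksum. */
static int block_verify(const checked_block *b)
{
    return crc32_compute(b->data, sizeof b->data) == b->stored_crc;
}
```

Flipping a single bit in data after block_write causes block_verify to return zero, which is exactly the "read back something other than what was written" case that an unprotected application would never notice.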

Data Deduplication

Another intriguing hint from the header files is the suggestion of support for data deduplication techniques.  For example, the following new FSCTL operations for data deduplication are present (see below):

// Dedup FSCTLs
// Values 162 - 170 are reserved for Dedup.
//
#if (_WIN32_WINNT >= 0x0602)
#define FSCTL_DEDUP_FILE CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 165, METHOD_BUFFERED, FILE_ANY_ACCESS)
#define FSCTL_DEDUP_QUERY_FILE_HASHES CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 166, METHOD_NEITHER, FILE_READ_DATA)
#endif /*_WIN32_WINNT >= 0x0602 */

 

This suggests a scheme in which file hashes are queried and data deduplication is performed for those blocks with identical hash values (for good deduplication hash algorithms, the likelihood of a data hash collision is very low).
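
As an illustration of hash-based deduplication in general (not of whatever Windows 8 actually does behind these FSCTLs), the sketch below hashes fixed-size blocks and counts how many are distinct.  Everything in it is an assumption made for the example: the 8-byte BLOCK_SIZE, the use of 64-bit FNV-1a as a stand-in hash, and the function names.  A production scheme would use a much stronger hash and an index rather than a quadratic scan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 8  /* hypothetical block size, kept tiny for illustration */

/* 64-bit FNV-1a, used here only as a simple stand-in for a real dedup hash. */
static uint64_t fnv1a64(const uint8_t *p, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */
    for (size_t i = 0; i < n; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/*
 * Count the distinct blocks in a buffer whose length is a multiple of
 * BLOCK_SIZE. A block is a duplicate when an earlier block has the same
 * hash AND the same bytes; the byte comparison guards against collisions.
 */
static size_t count_unique_blocks(const uint8_t *data, size_t len)
{
    assert(len % BLOCK_SIZE == 0);
    size_t nblocks = len / BLOCK_SIZE;
    size_t unique = 0;

    for (size_t i = 0; i < nblocks; i++) {
        const uint8_t *cur = data + i * BLOCK_SIZE;
        uint64_t h = fnv1a64(cur, BLOCK_SIZE);
        int seen = 0;

        for (size_t j = 0; j < i && !seen; j++) {
            const uint8_t *prev = data + j * BLOCK_SIZE;
            if (fnv1a64(prev, BLOCK_SIZE) == h &&
                memcmp(prev, cur, BLOCK_SIZE) == 0)
                seen = 1;
        }
        if (!seen)
            unique++;
    }
    return unique;
}
```

For a 32-byte buffer made of the blocks A, B, A, C, only three blocks would need to be stored; the second A could be replaced by a reference to the first.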

Offload Support

The header files strongly hint at new support for “offloading” I/O operations.  While we are speculating a bit, if this is similar to other forms of offloading, it would suggest the use of hardware to perform some operations (e.g., I/O operations as well as computing hashes).  This might be used, for example, for intelligent disk drives to allow them to perform additional high level processing, such as is done in Object Storage Devices.  When combined into file systems, such devices can actually provide specialized support and can even split data and meta-data across multiple drives (local and remote).  Whether or not that is what is envisioned here is still uncertain (after all, this is just based upon header file information).

There are two new FSCTL operations for offload (below):

#define FSCTL_OFFLOAD_READ CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 153, METHOD_BUFFERED, FILE_READ_ACCESS)
#define FSCTL_OFFLOAD_WRITE CTL_CODE(FILE_DEVICE_FILE_SYSTEM, 154, METHOD_BUFFERED, FILE_WRITE_ACCESS)
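
For readers who want to recognize these codes in a trace or debugger, the following sketch expands the two definitions by hand.  It re-declares the standard CTL_CODE bit layout and the relevant constants with an X_ prefix so that it compiles without the Windows headers; the prefixed names and the ctl_* helper functions are mine, while the numeric values are those of the SDK/WDK definitions.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the SDK/WDK values, prefixed with X_ to avoid clashes. */
#define X_FILE_DEVICE_FILE_SYSTEM 0x00000009u
#define X_METHOD_BUFFERED         0u
#define X_FILE_READ_ACCESS        0x0001u
#define X_FILE_WRITE_ACCESS       0x0002u

/* Same bit layout as CTL_CODE: device type, access, function, method. */
#define X_CTL_CODE(DeviceType, Function, Method, Access)              \
    (((uint32_t)(DeviceType) << 16) | ((uint32_t)(Access) << 14) |    \
     ((uint32_t)(Function) << 2)    |  (uint32_t)(Method))

#define X_FSCTL_OFFLOAD_READ                                          \
    X_CTL_CODE(X_FILE_DEVICE_FILE_SYSTEM, 153, X_METHOD_BUFFERED,     \
               X_FILE_READ_ACCESS)    /* 0x00094264 */
#define X_FSCTL_OFFLOAD_WRITE                                         \
    X_CTL_CODE(X_FILE_DEVICE_FILE_SYSTEM, 154, X_METHOD_BUFFERED,     \
               X_FILE_WRITE_ACCESS)   /* 0x00098268 */

/* Helpers that pull the individual fields back out of a control code. */
static uint32_t ctl_device_type(uint32_t code) { return code >> 16; }
static uint32_t ctl_access(uint32_t code)      { return (code >> 14) & 0x3u; }
static uint32_t ctl_function(uint32_t code)    { return (code >> 2) & 0xFFFu; }
static uint32_t ctl_method(uint32_t code)      { return code & 0x3u; }

/* Sanity check that the hand expansion matches the expected raw values. */
static void check_offload_codes(void)
{
    assert(X_FSCTL_OFFLOAD_READ  == 0x00094264u);
    assert(X_FSCTL_OFFLOAD_WRITE == 0x00098268u);
    assert(ctl_function(X_FSCTL_OFFLOAD_READ) == 153u);
    assert(ctl_method(X_FSCTL_OFFLOAD_WRITE) == X_METHOD_BUFFERED);
}
```

The access bits are worth noting: the read operation requires read access to the handle and the write operation requires write access, unlike FSCTL_DEDUP_FILE above, which is defined with FILE_ANY_ACCESS.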

 

What is particularly important about this new support is that it is disabled if file system filter drivers do not support it, according to the comments within the header file itself:

//
// To better guarentee backwards compatibility for
// selective new file system functionality, this new
// functionality will be disabled until all mini
// file system filters as well as legacy file system
// filters explicitly opt-in to this new functional-
// ity. This is controlled by a new registry key in
// the filters service defintion called
// "SupportedFeatures".
//
// File System filters need to update their .INF
// files to set state that the given functionality is
// now supported. Even if a filter can't actually
// support the given operations they should mark in
// the .INF that it is supported and modify their
// filter to fail the operations they don't support.
//
 

Thus, it is important for those building file system filter drivers to be aware of this new functionality and to ensure that they support it.  Otherwise, a filter that has not opted in runs the risk of breaking some other product’s functionality (or at least degrading performance).
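
One way to read the opt-in rule in the header comment is as a bitwise AND across every filter on the stack: a feature survives only if each filter has declared support for it.  The sketch below models that reading in user mode.  It is my interpretation of the comment, not code from Windows, and the FEATURE_* bit values and the effective_features function are hypothetical names chosen for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bit assignments for this sketch only. */
#define FEATURE_OFFLOAD_READ  0x1u
#define FEATURE_OFFLOAD_WRITE 0x2u

/*
 * Given the features the file system would like to offer and the
 * "SupportedFeatures" value declared by each filter, return the features
 * that remain enabled. A single filter that has not opted in to a feature
 * clears that feature for the whole stack.
 */
static uint32_t effective_features(uint32_t requested,
                                   const uint32_t *filter_supported,
                                   size_t nfilters)
{
    assert(filter_supported != NULL || nfilters == 0);

    uint32_t enabled = requested;
    for (size_t i = 0; i < nfilters; i++)
        enabled &= filter_supported[i];
    return enabled;
}
```

Under this model, a stack of three filters declaring 0x3, 0x1, and 0x3 leaves only offload read enabled, and one legacy filter declaring nothing disables both features for everyone.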

Advanced FCB Header Changes

File system filter driver writers won’t notice this change as much as those of us building file systems, but the Windows 8 version of the advanced FCB header has changed.  The advanced FCB header has gained a new field (Oplock) and the definition of FsRtlSetupAdvancedHeader (implemented in the header file) has changed to initialize the header properly.

The comment from the header file is fairly clear on this one:

//
// The following fields are valid only if the Version
// field in the FSRTL_COMMON_FCB_HEADER is greater
// than or equal to FSRTL_FCB_HEADER_V2. These
// fields are present in Windows 8 and beyond.
//
// For local file system this is the oplock field
// used by the oplock package to maintain current
// information about opportunistic locks on this
// file/directory.
//
// For remote file systems this field is reserved.


     union {

          OPLOCK Oplock;
          PVOID ReservedForRemote;

     };
#endif
#define FSRTL_FCB_HEADER_V2 (0x02)
 

Note that this declares there to be a new FCB header version (2), set in the Version field of the common FCB header.

Oplocks

This new oplock field appears to tie into a new round of oplock support changes in Windows 8.  For example, we have new support functions, including FsRtlCheckLockForOplockRequest and FsRtlAreThereWaitingFileLocks.  Interestingly, these new support routines have been optimized so that byte range locks on regions of the file outside the allocated range will not conflict with oplocks (recall that byte range locks and oplocks are traditionally incompatible).

Logically, this makes sense.  The rationale for not allowing both is that applications using byte range locks want to obtain coherent copies of the data, which is incompatible with the basic premise of caching.  However, byte range locks on regions of the file where there is no data are used by applications as a means of inter-process communication (potentially between different computers) and are not related to data coherency.  Thus, these routines should actually permit the use of oplocks in a broader range of situations.
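
The rule described above can be written down as a simple predicate.  This is only a model of the behavior as I understand it from the headers, not the logic of FsRtlCheckLockForOplockRequest itself; the function name, its parameters, and the treatment of zero-length locks are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of the relaxed rule: a byte range lock is treated as conflicting
 * with an oplock only if it covers at least one byte inside the file's
 * allocated range [0, allocation_size).
 *
 * Assumption: a zero-length lock covers no bytes and never conflicts.
 */
static int lock_conflicts_with_oplock(uint64_t lock_offset,
                                      uint64_t lock_length,
                                      uint64_t allocation_size)
{
    /* The locked range must not wrap past the end of the offset space. */
    assert(lock_length <= UINT64_MAX - lock_offset);

    if (lock_length == 0)
        return 0;

    /* A non-empty range starting below allocation_size overlaps the data. */
    return lock_offset < allocation_size;
}
```

With a 4096-byte allocation, a lock on bytes 4000 through 4199 still conflicts because it touches real data, whereas a one-byte "signalling" lock at an offset far beyond the allocation does not.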

New ECP Types

Another interesting change related to oplocks is the new support for extra create parameters (ECPs) for tracking oplock “ownership” state for not only the target of an open operation, but also the parent.  Note the definition of the new dual oplock key:

DEFINE_GUID( GUID_ECP_DUAL_OPLOCK_KEY, 0x41621a14, 0xb08b, 0x4df1, 0xb6, 0x76, 0xa0, 0x5f, 0xfd, 0xf0, 0x1b, 0xea );

Presumably, this would be useful for rename operations (for example) and would allow operations to proceed without forcing oplock breaks (much like the GUID_ECP_OPLOCK_KEY was used previously to keep from breaking oplocks with associated opens to the original oplock holder).

Beyond this, we also have a solution to the frustration of receiving back an error whenever a reparse occurs inside a call to IoCreateFileSpecifyDeviceObjectHint (or any function that calls it, including the various Filter Manager APIs for opening files).  This error (STATUS_MOUNT_POINT_NOT_RESOLVED) is difficult to resolve cleanly in a file system filter driver.  Having done so myself (by iterating through the path until finding the reparse point, then opening it to query the contents of the reparse information), I find it encouraging that there will be a solution to this problem in future releases of Windows.

Basically, this ECP information will allow a filter to obtain detailed information about the reparse point:

typedef struct _IO_DEVICE_HINT_ECP_CONTEXT {
    PDEVICE_OBJECT TargetDevice;
    UNICODE_STRING RemainingName;
} IO_DEVICE_HINT_ECP_CONTEXT, *PIO_DEVICE_HINT_ECP_CONTEXT;

Indeed, the comment from the header file fairly clearly describes the model for this new ECP:

// This GUID and structure are for passing back
// information from the I/O manager to the filter
// manager about a reparse when the reparse target
// goes to a new device.

Cache Manager

There are a number of Cache Manager changes present in the Windows 8 header file as well.  One that has me a bit mystified is the exposure of a new structure:

typedef struct _READ_AHEAD_PARAMETERS {

    CSHORT NodeByteSize;

    //
    //  Granularity of read aheads, which must be an
    //  even power of 2 and >= PAGE_SIZE
    //  See Also: CcSetReadAheadGranularity.

    ULONG Granularity;

    //
    //  The request size in number of bytes, to be
    //  used when performing pipelined read-aheads.
    //  Each read ahead request that is pipelined is
    //  broken into smaller PipelinedRequestSize
    //  sized requests. This is typically used to
    //  increase the throughput by parallelizing
    //  multiple requets instead of one single big
    //  one.
    //
    //  Special behavior:
    //  If this value is zero, then Cc will break
    //  every read-ahead request into two. This is
    //  used for backward compatibility where we
    //  used to break every read-ahead request for
    //  remote FS into two.
    //

    ULONG PipelinedRequestSize;

    //
    //  Growth of read ahead in percentage of the
    //  data that has already been read by the
    //  application so far
    //

    ULONG ReadAheadGrowthPercentage;

} READ_AHEAD_PARAMETERS, *PREAD_AHEAD_PARAMETERS;

There is some interesting information in here, but the existing interface does not seem to provide any mechanism for us to obtain or directly modify these values.  Perhaps we will see something in a future release, or some clarification as to the ultimate use of this structure.
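
Even without an interface to set these values, the comments describe rules that are easy to model.  The sketch below is a user-mode illustration of my reading of those comments, not Cache Manager code: the function names are hypothetical, SKETCH_PAGE_SIZE is assumed to be 4096, and I interpret “even power of 2” as simply “a power of two”.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed page size for this sketch */

/* Granularity must be a power of two and at least one page. */
static int granularity_is_valid(uint32_t granularity)
{
    return granularity >= SKETCH_PAGE_SIZE &&
           (granularity & (granularity - 1u)) == 0u;
}

/*
 * Number of sub-requests a read-ahead of `length` bytes would be split
 * into. A PipelinedRequestSize of zero selects the backward-compatible
 * behavior described in the header comment: split the request in two.
 */
static uint32_t pipelined_request_count(uint32_t length,
                                        uint32_t pipelined_request_size)
{
    if (length == 0u)
        return 0u;
    if (pipelined_request_size == 0u)
        return 2u;

    /* Round up without risking overflow in length + size - 1. */
    return length / pipelined_request_size +
           (length % pipelined_request_size != 0u ? 1u : 0u);
}

/* Small self-check tying the two rules together. */
static void check_read_ahead_model(void)
{
    assert(granularity_is_valid(SKETCH_PAGE_SIZE));
    assert(pipelined_request_count(1u << 20, 256u * 1024u) == 4u);
}
```

Under this model, a 1 MB read-ahead with a 256 KB PipelinedRequestSize becomes four parallel requests, while the same read-ahead with a PipelinedRequestSize of zero becomes two.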

There are a number of additional new functions exported from the Cache Manager as well: CcCopyReadEx, CcScheduleReadAheadEx, and CcSetAdditionalCacheAttributesEx.  These new functions introduce the concept of an “I/O issuer”: they take an extra parameter, a PEPROCESS pointer.

Filter Manager Changes

There are a number of Filter Manager changes visible in the new header files as well.  Perhaps the most interesting is that there is now a section context, so that file system filter drivers can associate specific state with a section (in addition to the other context types already supported by the Filter Manager).  From the material available (including preliminary documentation) it appears that the purpose of this is to allow mini-filters to synchronize changes to section objects.  A number of new APIs have been introduced to support this, including FltGetSectionContext, FltSetSectionContext, FltRegisterForDataScan, FltCreateSectionForDataScan, and FltCloseSectionForDataScan.

The registration structures have changed to accommodate the new section contexts.  Note that section contexts are only present in Windows 8 and later.

Beginning with Windows 8, Filter Manager now supports filters for the named pipe file system (NPFS) as well as the mail slot file system (MSFS).  To filter these, mini-filters must indicate their interest as part of their registration information.  To further aid in supporting these, the Filter Manager now provides new functions for opening named pipes (FltCreateNamedPipeFile) and mail slots (FltCreateMailslotFile).  Presumably these are wrappers around the existing OS calls.

The Filter Manager APIs have been extended to support new features, such as the reparse point support (see GUID_ECP_FLT_CREATEFILE_TARGET). 

The MDL interfaces are now also available as part of FltReadFileEx and FltWriteFileEx.  While not fundamental changes to the model, they should simplify development for those mini-filters that wish to use the capabilities of existing OS interfaces that have not previously been available via the Filter Manager interface directly.

There are also new Filter Manager APIs for invoking the fast I/O MDL operations directly, as well as Filter Manager wrappers around the get/set interface for quota information.

Finally, there is now support for retrieving multiple contexts simultaneously using the new FltGetContextsEx.  There is a corresponding “bulk release” operation in FltReleaseContextsEx.

Conclusions

There are other changes (notably in the security area) that we haven’t been able to cover, but it is clear that while there are a number of interesting changes for file systems and filter drivers in Windows 8, they are more evolutionary than revolutionary.

Stay tuned as we watch Windows 8 evolve – after all, as we said before, none of this is known until the final version ships.

Copyright 2017 OSR Open Systems Resources, Inc.