The NT Insider

Don't be Afraid to Commit -- The Transactional File System (TxFS) in Windows
(By: The NT Insider, Vol 12, Issue 3, May-June 2005 | Published: 20-Apr-05 | Modified: 25-Apr-05)

One of the interesting aspects of our file systems work here at OSR is that we often start learning about things long before they are actually released in a new version of Windows.  For several years now, Microsoft has been talking about the nifty new features that will debut in Windows Longhorn.  While quite a bit of attention was focused on WinFS (which has subsequently been deferred from the Longhorn release), far less attention has been paid to the Transactional File System (TxFS) that is still scheduled for release with Windows Longhorn.

Recently, detailed information about Microsoft's forthcoming TxFS became generally available from a source that is often overlooked for technical insight--specifically, patents issued by the US Patent and Trademark Office (USPTO).  Patent #6,856,993 was issued on February 15, 2005, and provides us with some insight into this new technology.  Further, the IFS Kit has included some prospective changes, which those of us building file system filter drivers must also keep in mind as the release of Windows Longhorn inexorably draws closer.

First, please note that while we can use this material to gain some appreciation of the technology, it is quite possible that the final version of TxFS that ships with Longhorn may differ substantially from the system that is described in the patent.  Nevertheless, it gives us a snapshot of the technology's past, and hopefully a better understanding of what to expect.  Another advantage of reviewing such documents is that they provide a conceptual description, rather than an implementation-level overview, such as is usually given in Microsoft documentation.

Let's start with a basic drawing, taken directly from the patent, showing the various components and their interactions--see Figure 1.

Figure 1

The goal of the transactional file system is to allow associating a series of discrete file system operations into a "transaction;" in other words, a set of operations that must all occur, or must all not occur.
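To make the all-or-nothing property concrete, here is a minimal user-mode sketch in C.  It is not TxFS code and is not taken from the patent; the names (toy_store, toy_txn, txn_write, txn_commit, txn_rollback) and the fixed-size in-memory "file store" are assumptions made purely for illustration.

```c
#include <assert.h>
#include <string.h>

#define TOY_FILES   4    /* number of slots in the toy "file system" */
#define TOY_DATALEN 16   /* bytes of content per slot, including the NUL */

/* The committed state: what everyone outside a transaction sees. */
typedef struct {
    char content[TOY_FILES][TOY_DATALEN];
} toy_store;

/* A transaction buffers its writes privately until it commits. */
typedef struct {
    char pending[TOY_FILES][TOY_DATALEN];
    int  dirty[TOY_FILES];   /* 1 if the slot was written in this transaction */
} toy_txn;

/* Start (or reset) a transaction with no buffered writes. */
void txn_begin(toy_txn *t)
{
    memset(t, 0, sizeof(*t));
}

/* Buffer a write; the committed store is not touched. */
void txn_write(toy_txn *t, int slot, const char *data)
{
    assert(slot >= 0 && slot < TOY_FILES);
    strncpy(t->pending[slot], data, TOY_DATALEN - 1);
    t->pending[slot][TOY_DATALEN - 1] = '\0';
    t->dirty[slot] = 1;
}

/* Commit: every buffered write becomes visible in the store. */
void txn_commit(toy_txn *t, toy_store *s)
{
    for (int i = 0; i < TOY_FILES; i++) {
        if (t->dirty[i]) {
            memcpy(s->content[i], t->pending[i], TOY_DATALEN);
        }
    }
    txn_begin(t);
}

/* Rollback: none of the buffered writes become visible. */
void txn_rollback(toy_txn *t)
{
    txn_begin(t);
}
```

A caller would write to two slots and then either commit both or roll both back; there is no path that publishes only one of them.  A real implementation must also survive a crash part-way through the commit, which this sketch makes no attempt to do.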

Windows applications use the CreateFileEx operation to indicate that operations against a particular file are to be done transactionally; the patent document also indicates that transactional handling may instead be an attribute of the file itself.  Thus, from the file system's perspective, we can expect to see new options in the IRP_MJ_CREATE handler to indicate these transactional semantics, plus a transaction identifier; the patent document specifies that a GUID is used to track the transaction.

Once a transaction is started against a file handle, subsequent operations using that handle will be part of the same transaction.  Thus from the file system's perspective, individual FILE_OBJECT structures are associated with different transactions.  Because operations within the transaction must not be visible outside the transaction (at least not until the transaction has completed), it is important that they be sequestered from one another.
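The association between an open instance of a file and a transaction can be pictured with a small user-mode sketch.  Everything here is hypothetical: toy_guid and toy_file_object are stand-ins for a GUID and a FILE_OBJECT, and toy_open is not a real Windows call.  The sketch only illustrates the idea that the transaction identity is fixed when the file is opened and travels with that open instance.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a 128-bit GUID; hypothetical, not the Windows GUID type. */
typedef struct {
    unsigned char bytes[16];
} toy_guid;

/* Stand-in for a FILE_OBJECT: one per open instance of a file. */
typedef struct {
    const char *path;
    int         transacted;  /* 0 = opened without a transaction */
    toy_guid    txn_id;      /* meaningful only when transacted != 0 */
} toy_file_object;

/* Open a file, optionally binding the new file object to a transaction. */
toy_file_object toy_open(const char *path, const toy_guid *txn_id)
{
    toy_file_object fo;
    memset(&fo, 0, sizeof(fo));
    fo.path = path;
    if (txn_id != NULL) {
        fo.transacted = 1;
        fo.txn_id = *txn_id;
    }
    return fo;
}

/* Two file objects may share uncommitted state only if both are
   transacted and both are bound to the same transaction id. */
int toy_same_transaction(const toy_file_object *a, const toy_file_object *b)
{
    assert(a != NULL && b != NULL);
    return a->transacted && b->transacted &&
           memcmp(a->txn_id.bytes, b->txn_id.bytes,
                  sizeof(a->txn_id.bytes)) == 0;
}
```

Under this model, two opens of the same path with different identifiers, or one open with an identifier and one without, are kept apart even though they name the same file.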

Transactions are not bound to a specific thread, and the patent document assumes that multiple processes and threads may operate within the same transaction.  In addition, the model clearly allows for network-level file system participation (presumably through MRXSMB.SYS--the LanManager file system).  The drawings show one case where the transactional components (via a service implementing an inverted call model that they refer to as a "proxy") interact with the distributed transaction coordinator (DTC)--an existing Windows component used to coordinate transactional operations between different services.

The specific details of how this is to be done are also fascinating.  The document describes using a pair of logs: one contains operational information (changes to file system state, but not user data), while a separate data write log contains updates to user data.  The model explicitly allows for memory-mapped file updates, which are kept separate from updates made outside the transaction--thus applications do not see updates done inside the transaction until the transaction has committed.
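As a rough illustration of the two-log split, the following C sketch keeps operational records and data write records in separate arrays.  The record layouts, the names (txn_logs, log_meta, log_data) and the fixed capacity are all assumptions for the sake of the example; the patent does not prescribe these structures.

```c
#include <assert.h>
#include <stddef.h>

#define LOG_CAPACITY 8   /* arbitrary size for the sketch */

typedef enum { META_CREATE, META_DELETE, META_RENAME } meta_op;

/* Operational log record: a change to file system state, no user data. */
typedef struct {
    meta_op  op;
    unsigned file_id;
} meta_record;

/* Data write log record: which byte range of user data changed.
   The payload itself is omitted to keep the sketch small. */
typedef struct {
    unsigned file_id;
    size_t   offset;
    size_t   length;
} data_record;

/* The pair of logs kept for one transaction. */
typedef struct {
    meta_record meta[LOG_CAPACITY];
    size_t      meta_count;
    data_record data[LOG_CAPACITY];
    size_t      data_count;
} txn_logs;

/* Append to the operational log; returns 0 on success, -1 if full. */
int log_meta(txn_logs *logs, meta_op op, unsigned file_id)
{
    assert(logs != NULL);
    if (logs->meta_count == LOG_CAPACITY) {
        return -1;
    }
    logs->meta[logs->meta_count].op = op;
    logs->meta[logs->meta_count].file_id = file_id;
    logs->meta_count++;
    return 0;
}

/* Append to the data write log; returns 0 on success, -1 if full. */
int log_data(txn_logs *logs, unsigned file_id, size_t offset, size_t length)
{
    assert(logs != NULL);
    if (logs->data_count == LOG_CAPACITY) {
        return -1;
    }
    logs->data[logs->data_count].file_id = file_id;
    logs->data[logs->data_count].offset = offset;
    logs->data[logs->data_count].length = length;
    logs->data_count++;
    return 0;
}
```

Keeping the two record types apart means a create or rename never has to carry user data, and a large write never has to be interleaved with namespace changes.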

The document also describes the interactions between transaction-aware operations and transaction-unaware operations.  Applications that are not aware of transactions will observe different behavior than applications that are.  For example, when a file is deleted within a transaction, the deletion will be visible within the transaction and to anyone viewing the file in a transaction-unaware environment.  However, other transactional operations will not see the deletion--they are isolated from the operation, a process referred to as "name space isolation."
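The deletion example can be written down as a single visibility check.  This C sketch encodes only the rule as stated in the paragraph above; the type ns_viewer and the function ns_deletion_visible are hypothetical, transaction identities are reduced to small integers, and the behavior of the shipping system may well differ.

```c
#include <assert.h>
#include <stddef.h>

/* Who is looking at the directory: a transacted or non-transacted viewer. */
typedef struct {
    int      transacted;  /* 0 = transaction-unaware viewer */
    unsigned txn_id;      /* meaningful only when transacted != 0 */
} ns_viewer;

/* Returns 1 if a not-yet-committed deletion performed by transaction
   deleting_txn is visible to the viewer, 0 otherwise. */
int ns_deletion_visible(const ns_viewer *v, unsigned deleting_txn)
{
    assert(v != NULL);
    if (!v->transacted) {
        return 1;                        /* unaware viewers see the deletion */
    }
    return v->txn_id == deleting_txn;    /* only the deleting transaction does */
}
```

A viewer inside a different transaction gets 0 from this check and continues to see the file, which is the isolation the patent describes.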

While we chose to show the architectural drawing depicting a transactionally aware application, the underlying technology allows for both transactionally aware and transactionally unaware application programs.  Transactionally aware applications would interact through a COM service of some sort, while transactionally unaware applications merely continue to use normal file system and memory mapping APIs.

Future Work
Our interest in this work is that it will have a major impact on file system filter drivers in particular--filters that perform data modifications will need to be aware of transactions and the semantics of transactions.   For example, even in the current IFS Kit there are prospective changes for this work.  In NTIFS.H we find:

VOID
FsRtlNotifyFilterChangeDirectory (
    __in PNOTIFY_SYNC NotifySync,
    __in PLIST_ENTRY NotifyList,
    __in PVOID FsContext,
    __in PSTRING FullDirectoryName,
    __in BOOLEAN WatchTree,
    __in BOOLEAN IgnoreBuffer,
    __in ULONG CompletionFilter,
    __in PIRP NotifyIrp,
    __in_opt PCHECK_FOR_TRAVERSE_ACCESS TraverseCallback,
    __in_opt PSECURITY_SUBJECT_CONTEXT SubjectContext,
    __in_opt PFILTER_REPORT_CHANGE FilterCallback
    );

This new function (which has actually been in the IFS Kit for quite some time) exposes the need to check whether or not a callback is necessary (or useful) for a given change operation.  This is part of the concept of "name space isolation" as seen from a file system's perspective.
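To show how a filter callback of this kind could decide whether a given watcher hears about a change, here is a user-mode simulation in C.  It does not call the kernel routine; toy_filter_cb, toy_watcher and toy_report_change are hypothetical names, and the callback shape (two opaque context pointers and a yes/no result) is our assumption about how such a filter might be consulted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter callback: returns nonzero if the watcher identified
   by notify_context should be told about the change described by
   filter_context. */
typedef int (*toy_filter_cb)(void *notify_context, void *filter_context);

/* One pending directory-change watcher. */
typedef struct {
    void *notify_context;  /* per-watcher state supplied at registration */
    int   notified;        /* number of notifications delivered so far */
} toy_watcher;

/* Report a change to every watcher the callback accepts.  A NULL callback
   means "no filtering".  Returns the number of watchers notified. */
size_t toy_report_change(toy_watcher *watchers, size_t count,
                         void *filter_context, toy_filter_cb cb)
{
    size_t delivered = 0;
    assert(watchers != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (cb == NULL || cb(watchers[i].notify_context, filter_context)) {
            watchers[i].notified++;
            delivered++;
        }
    }
    return delivered;
}

/* Example callback: both contexts point at a transaction id, and the
   change is reported only to watchers in the same transaction. */
int toy_same_txn_filter(void *notify_context, void *filter_context)
{
    return *(unsigned *)notify_context == *(unsigned *)filter_context;
}
```

With two watchers in different transactions and a change made in one of them, only the matching watcher is notified; passing no callback notifies both, which corresponds to the unfiltered behavior.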

Of course, we do not anticipate seeing third-party file systems adding this support anytime soon; with Microsoft holding a patent on adding this support inside the file system, it appears that other file systems won't be able to match this new feature without licensing the technology from them.

As file system filter drivers are not "applications," and in fact are an integral part of the interaction stack, they may need to actively participate in the transaction system.  For example, because individual views of a file may now change depending upon whether or not the file is inside a transaction, a data-modifying filter must be aware of the distinctions.  In Figure 2, we see an example (directly from the patent document) that shows a file, accessed by different applications, and the resulting change in presentation with respect to in-memory data structures.

Figure 2

Notice, for example, that while the sample represents the same file, the file objects refer to different section objects.  The patent document is not clear if these two files would be using the same stream control block (the SCB from the drawing), but it is clear they would have distinct section object pointers, and thus distinct views.  It is also interesting to note (certainly from the OS perspective) that this suggests the Memory Manager has also been modified in order to accommodate these new semantics.
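The relationship in the drawing can be approximated with a small C sketch in which one per-stream structure hands out a separate section per transaction.  This is a sketch under stated assumptions: fs_stream and fs_section are hypothetical stand-ins for the SCB and the section object pointers, the sketch assumes the stream structure is shared (which the patent leaves unclear), and transaction id 0 is used to mean "no transaction."

```c
#include <assert.h>
#include <stddef.h>

#define FS_MAX_SECTIONS 4   /* arbitrary limit for the sketch */

/* Stand-in for section object pointers: one cached view of a stream. */
typedef struct {
    unsigned owner_txn;  /* 0 = the committed, non-transacted view */
    int      in_use;
} fs_section;

/* Stand-in for the per-stream structure; assumed here to be shared by
   every open of the same file. */
typedef struct {
    fs_section sections[FS_MAX_SECTIONS];
} fs_stream;

/* Return the section belonging to the given transaction, creating it on
   first use.  Returns NULL if the table is full. */
fs_section *fs_section_for(fs_stream *stream, unsigned txn)
{
    fs_section *free_slot = NULL;
    assert(stream != NULL);
    for (size_t i = 0; i < FS_MAX_SECTIONS; i++) {
        fs_section *sec = &stream->sections[i];
        if (sec->in_use && sec->owner_txn == txn) {
            return sec;              /* existing view for this transaction */
        }
        if (!sec->in_use && free_slot == NULL) {
            free_slot = sec;         /* remember the first empty slot */
        }
    }
    if (free_slot != NULL) {
        free_slot->in_use = 1;
        free_slot->owner_txn = txn;
    }
    return free_slot;
}
```

Two opens of the same stream, one transacted and one not, receive different sections and therefore different views, while repeated opens within one transaction receive the same section.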

What this does mean is that file system filters--particularly any that access or modify the data of files--will likely need substantial modifications to support the new transactional semantics in TxFS.

To learn more about this, we need to watch closely.  The necessary information will no doubt be included in the Longhorn Windows Driver Kit (WDK)--the combination of the DDK and IFS Kit in earlier versions.  In the meantime, there are always past and future discussions at the regularly-hosted Plugfest to ponder.

This article was printed from OSR Online

Copyright 2017 OSR Open Systems Resources, Inc.