The NT Insider

(Un)Expected Behavior: Windows Vista and File Systems
(By: The NT Insider, Vol 14, Issue 4, November - December 2007 | Published: 04-Jan-08 | Modified: 04-Jan-08)

For at least the past two years, we've been hearing about and experiencing the impact of the changes in Windows Vista on our work in file systems. Despite knowing there were numerous changes, we continue to find small, subtle ones that we didn't know about and that materially affect our own development efforts. In this article, we'll discuss several of those key differences and how they make developing file systems for Windows Vista more complicated.

System File Protection

In Windows XP, we were all familiar with the model of protecting system files via a background thread that monitored whether they had been overwritten. In Windows Vista, all system files belong to the "Trusted Installer" and cannot easily be overwritten by anyone except the owner. This became interesting for us when we needed to substitute a checked version of ntdll.dll (we were receiving some strange errors during start-up and wanted the extra loader snap information that this DLL provides). The bootstrap process trivially allows us to vector to an alternate kernel and HAL, but to replace anything else (e.g., ntfs.sys or ntdll.dll) we have to overwrite the file. (Let's ignore the hoops that one must jump through to extract the file from the checked image in the first place, and don't forget the need to match ntdll.dll with the version of the kernel/HAL that you are running.)

It was when we tried to copy over the file that we stumbled on the ACL protecting this DLL. Then we found out that the "security" tab that used to be displayed in Explorer was gone. The command line utility (cacls) had been replaced as well. To make this all work, we ended up taking ownership of the file (so that we could modify the ACL) and granting ourselves write access to it. The process, while certainly understandable once it was all done, took almost an entire day of trial-and-error changes to replace a single file.

Despite the pain this caused us, kudos to Microsoft for using its own security system in a fashion that we've been suggesting for years.

Opening the Root Directory

Sometimes the differences in behavior from older versions of Windows are mystifying. For example, we have a generic routine that attempts to open the target, independent of whether it is a file or a directory. Since we're handling the caching in our parallel file system layer, we normally indicate that we do not want buffering by setting the FILE_NO_INTERMEDIATE_BUFFERING bit in the create options. In Windows Vista we found that this fails on the root directory of an NTFS volume, returning STATUS_INVALID_PARAMETER.

Working around this issue was not overly difficult - it just involved adding a loop, so that if we received STATUS_INVALID_PARAMETER we would retry the open, this time using the FILE_DIRECTORY_FILE option bit instead of the FILE_NO_INTERMEDIATE_BUFFERING bit. This particular change mystifies us, but it is a typical example of the subtle behavioral changes that we've learned to expect in Vista.
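The retry can be illustrated with a small user-mode model. This is only a sketch under stated assumptions, not our driver code: mock_open is a hypothetical stand-in for the real create call and models nothing beyond the failure described above, and open_target and the NtStatus typedef are illustrative names. The constants carry the same values as in the Windows headers and are redefined here so that the example builds anywhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same values as in the Windows headers; redefined so the sketch is portable. */
typedef uint32_t NtStatus;
#define STATUS_SUCCESS                  0x00000000u
#define STATUS_INVALID_PARAMETER        0xC000000Du
#define FILE_DIRECTORY_FILE             0x00000001u
#define FILE_NO_INTERMEDIATE_BUFFERING  0x00000008u

/*
 * Hypothetical stand-in for the real create call. It models only the
 * behavior described in the text: opening the root directory with
 * FILE_NO_INTERMEDIATE_BUFFERING set is rejected.
 */
NtStatus mock_open(bool is_root_dir, uint32_t create_options)
{
    if (is_root_dir && (create_options & FILE_NO_INTERMEDIATE_BUFFERING)) {
        return STATUS_INVALID_PARAMETER;
    }
    return STATUS_SUCCESS;
}

/*
 * Try the unbuffered open first. If it fails with
 * STATUS_INVALID_PARAMETER, retry once with FILE_DIRECTORY_FILE in place
 * of FILE_NO_INTERMEDIATE_BUFFERING. The options of the final attempt are
 * reported through options_used.
 */
NtStatus open_target(bool is_root_dir, uint32_t *options_used)
{
    uint32_t options = FILE_NO_INTERMEDIATE_BUFFERING;
    NtStatus status = mock_open(is_root_dir, options);

    if (status == STATUS_INVALID_PARAMETER) {
        options = (options & ~FILE_NO_INTERMEDIATE_BUFFERING) | FILE_DIRECTORY_FILE;
        status = mock_open(is_root_dir, options);
    }

    *options_used = options;
    return status;
}
```

Calling open_target(true, &used) in this model succeeds on the second attempt with FILE_DIRECTORY_FILE, while an ordinary file succeeds on the first attempt with the unbuffered option intact. A real implementation would also have to decide what to do if the retry fails, which this sketch simply reports to the caller.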

Protecting the Keyboard

One of our file systems toolkits, the Data Modification Kit, works by maintaining a parallel stack next to the normal I/O stack. We do this because it ensures that other OS components cannot bypass us through the file object (since the file object points to our file systems stack, not the native file systems stack). This eliminates a class of problems with which we've wrangled over the past 10 years or so. In earlier versions of this toolkit, we actually tried to only pass along those things that "mattered" to our parallel stack, but found that the interaction issues (process "A" gets the encrypted data, process "B" gets the decrypted data, both for the same file) made this impractical. Thus, we've moved to a model in which we redirect everything that might be of interest to us to our parallel stack.

Once we did that to the \Windows directory, we found things misbehaved rather horribly. Winlogon.exe would fail to start up and as a result the system would crash. Of course, as is so often the case when debugging application failures at the file system level, there were few hints as to what made it fail. Some of the details that we ended up examining were the loader snaps, but they didn't yield much useful information for this particular problem. After a slow process of trial and error we were able to identify that a single DLL was causing the problem - kbdus.dll. (We were installing with a US keyboard layout. If we'd used a different layout, the file would have been different but the results would have been the same.)

With many hours in the debugger tracing what was happening, we finally found a particularly useful stack backtrace (shown in Figure 1).

kd> kv
ChildEBP RetAddr  Args to Child
8ba6d034 83b00843 8b5e4290 8ba6d054 8ba6d074 EwDmk!PreTraceQueryInfo+0x2bf (FPO: [Non-Fpo]) (CONV: stdcall) [c:\projects\client\src\core\minifilter.cpp @ 685]
8ba6d090 83b02f10 8ba6d0d8 00000000 8ba6d0d8 fltmgr!FltpPerformPreCallbacks+0x2e5 (FPO: [Non-Fpo])
8ba6d0a4 83b037eb 8ba6d0d8 00000000 8321e7d8 fltmgr!FltpPassThroughInternal+0x32 (FPO: [Non-Fpo])
8ba6d0c0 83b03c07 8ba6d000 8321e7d8 00000000 fltmgr!FltpPassThrough+0x199 (FPO: [Non-Fpo])
8ba6d0f0 81867928 8321e7d8 833e0720 833e0720 fltmgr!FltpDispatch+0xb1 (FPO: [Non-Fpo])
8ba6d108 81a1927e 8ba6d30a 8cad8008 000001dc nt!IofCallDriver+0x63
8ba6d138 81a00643 000001dc 00000009 8cad8008 nt!IopGetFileInformation+0xf1
8ba6d190 81a19f8e 8331b8f8 00000000 8ba6d2d4 nt!IopQueryNameInternal+0x1e6
8ba6d1b0 81a0d4dd 8331b8f8 8277d300 8ba6d2d4 nt!IopQueryName+0x1b
8ba6d254 81a0dd4b 8331b8f8 8ba6d2d4 00000210 nt!ObpQueryNameString+0xd6
8ba6d270 8ee19ec7 8331b8f8 8ba6d2d4 00000210 nt!ObQueryNameString+0x18
8ba6d4e8 8ee19d76 000000a4 00000000 00000000 win32k!ConvertHandleAndVerifyLoc+0x99 (FPO: [Non-Fpo])
8ba6d4fc 8ee04dd5 83416c58 000000a4 00000000 win32k!xxxSafeLoadKeyboardLayoutEx+0x14 (FPO: [Non-Fpo])
8ba6d7b4 8ee05495 8ba6d814 00000001 02000000 win32k!xxxCreateWindowStation+0x771 (FPO: [Non-Fpo])
8ba6dd40 81845f7a 0010f974 02000000 000000a4 win32k!NtUserCreateWindowStation+0x330 (FPO: [Non-Fpo])
8ba6dd40 77d20f34 0010f974 02000000 000000a4 nt!KiFastCallEntry+0x12a (FPO: [0,3] TrapFrame @ 8ba6dd64)
0010f930 77921826 779217d0 0010f974 02000000 ntdll!KiFastSystemCallRet (FPO: [0,0,0])
WARNING: Frame IP not in any known module. Following frames may be wrong.
0010fc84 77921629 0010fc9c 02000000 00000000 0x77921826
0010fca4 00de4d4c 00de56b8 00000000 02000000 0x77921629
0010fcf0 00de23dc 00df5104 0010fd34 00df5108 0xde4d4c


Figure 1 - Backtrace on the kbdus.dll Nightmare

The name "ConvertHandleAndVerifyLoc" suggested that this was in some fashion name sensitive. Recall that we are redirecting to a parallel stack, so the device name will be different, even if the path and file name are the same. As we're so fond of saying around here, "Names don't really mean much in Windows", but apparently that wisdom hasn't sunk in everywhere.

With a bit more work in the debugger, watching the behavior of this particular function, we were able to confirm that it does in fact take the file object, query its name, and compare that to the name that it computes. Thus, we returned our name, \Device\<our parallel device>\Windows\System32\kbdus.dll, while Win32 computed that it should be \Device\HarddiskVolume1\Windows\System32\kbdus.dll. When the two didn't match, it took the fatal exit path.

This presented an interesting challenge, since we do not control the name of our device. Thus, there was no immediately obvious way to "make this work." The solution that we ultimately employed relies upon the fact that each file object contains a VPB and the VPB within the file object need not be the same as the VPB in the media device volume. Previously, we had not been using a VPB (since we're a parallel stack, we look more like a network file system). However, we decided that for physical volumes we would create a VPB. That VPB points to our parallel file system device object and to the underlying media volume.

That now means that when Win32 queries the name, the I/O Manager will know to extract the name of the underlying media volume and do a name query to the file system. Since it follows FileObject->Vpb and not DeviceObject->Vpb, it gets the right device name back.
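A toy model may make the effect easier to see. The sketch below is not the I/O Manager's actual logic and not our driver code. It only mimics the behavior as described here: the device portion of the name comes from the volume reached through the file object's VPB when one is present, and from the file object's own device otherwise. All of the struct and function names are illustrative, and the device name \Device\ParallelFs used in the example is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-ins for DEVICE_OBJECT, VPB and FILE_OBJECT. */
struct device      { const char *name; };
struct vpb         { struct device *fs_device; struct device *real_device; };
struct file_object { struct device *device; struct vpb *vpb; const char *path; };

/*
 * Build "<device name><path>" for a file object. With a VPB attached, the
 * device name is taken from the underlying media volume (real_device);
 * without one, it is taken from the device the file object points at.
 * Returns the value of snprintf (length of the full name).
 */
int query_full_name(const struct file_object *fo, char *buf, size_t len)
{
    const struct device *dev =
        (fo->vpb != NULL) ? fo->vpb->real_device : fo->device;

    return snprintf(buf, len, "%s%s", dev->name, fo->path);
}

/* The kind of check the caller appears to make: exact match on the name. */
bool name_matches(const char *queried, const char *expected)
{
    return strcmp(queried, expected) == 0;
}
```

In this model, a file object that points at \Device\ParallelFs with no VPB yields \Device\ParallelFs\Windows\System32\kbdus.dll, which fails the comparison. Attaching a VPB whose real_device is \Device\HarddiskVolume1 yields the name that the caller expects, even though the file object still points at the parallel device.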

Very few applications are sensitive to the actual name of the underlying device on which they are located. If we ignored this DLL, the system booted and ran without incident. Thus, that it was so sensitive for this specific file surprised us. Our only theory was that this is an attempt in Windows Vista to detect and prevent key loggers.

Filtering MUP instead of Redirector

One of the big changes of which we were aware in Vista was the decision to move the Filter Manager's attachment from the individual redirector to the MUP aggregation device. We have previously observed that this does create strange issues for legacy filters (e.g., our own legacy filter kit, which predates the callback model and watches for filters both by name and by their registration with MUP). We suspected that we would find other subtle issues as well and we have not been disappointed.

A common technique in file systems and filter drivers is to create stream file objects via IoCreateStreamFileObjectEx (or the other two variants of this function). We do this because in our parallel file system stack we need a file object to use when interacting with the underlying file system - in this case MUP.

Imagine our surprise when, on Windows Vista, we began observing MUP bugchecks (MUP_FILE_SYSTEM, 0x103). Here's one crash screen from the debugger (see Figure 2, below).

1: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

MUP_FILE_SYSTEM (103)
MUP file system detected an error.
Arguments:
Arg1: 00000001, MUP_BUGCHECK_NO_FILECONTEXT
Could not locate MUP file context corresponding to a file object.
Arg2: 9a364f00, Irp Address
Arg3: 83545e38, FILE_OBJECT Address whose MUP file context could not be found
Arg4: 83cb7c68, DEVICE_OBJECT Address

Debugging Details:
------------------


DEVICE_OBJECT: 83cb7c68
DRIVER_OBJECT: 83cb7e20
IMAGE_NAME:  mup.sys
DEBUG_FLR_IMAGE_TIMESTAMP:  4549acc8
MODULE_NAME: mup
FAULTING_MODULE: 84c29000 mup
DEFAULT_BUCKET_ID:  VISTA_RC
BUGCHECK_STR:  0x103
PROCESS_NAME:  System
CURRENT_IRQL:  0
LAST_CONTROL_TRANSFER:  from 818ad13f to 81835688

STACK_TEXT:
943df24c 818ad13f 00000003 943d49dc 00000000 nt!RtlpBreakWithStatusInstruction
943df29c 818adbac 00000003 83545e38 9a364f00 nt!KiBugCheckDebugBreak+0x1c
943df648 818acfcb 00000103 00000001 9a364f00 nt!KeBugCheck2+0x5f4
943df668 84c2e52c 00000103 00000001 9a364f00 nt!KeBugCheckEx+0x1e
943df694 81ac2681 83cb7c68 9a364f00 83cb75f8 mup!MupCleanup+0x61
943df6b8 81867c80 84f6cc35 9a364f00 83cb7c68 nt!IovCallDriver+0x252
943df6cc 84f6cc35 83cb75f8 91a5be08 9a364f00 nt!IofCallDriver+0x1b
943df6f8 81ac2681 83cb75f8 9a364f00 9a364f00 fltmgr!FltpDispatch+0xdf
943df71c 81867c80 81a0b05c 9a364f10 83cb75f8 nt!IovCallDriver+0x252
943df730 81a0b05c 82eb4940 82edc350 00000001 nt!IofCallDriver+0x1b
943df774 819dbe24 82eb4940 83545e38 00000001 nt!IopCloseFile+0x386
943df7c4 819dc6e3 82eb4940 01edc350 00000001 nt!ObpDecrementHandleCount+0x14c
943df814 819dc780 85c00358 9078d220 82eb4940 nt!ObpCloseHandleTableEntry+0x23a
943df844 819dc858 82eb4940 00000000 00000000 nt!ObpCloseHandle+0x73
943df858 818461fa 80000910 943df8e8 81843b51 nt!NtClose+0x20
943df858 81843b51 80000910 943df8e8 81843b51 nt!KiFastCallEntry+0x12a
943df8d4 940fd8a0 80000910 00000000 994eaf88 nt!ZwClose+0x11
943df8e8 940fbc2f 80000910 00000000 9190b560 OsrDmk!DmkCloseFile+0x30 [g:\projects\dmkmain\src\core\miscsupp.cpp @ 450]
943df90c 940fb70c 99520fa0 84b1f648 943df944 OsrDmk!TearDownLowerFileState+0x4f [g:\projects\dmkmain\src\core\fileobjsup.cpp @ 429]
943df948 940e11ab 99520fa0 00000000 84b1f648 OsrDmk!DmkTeardownFO+0x17c [g:\projects\dmkmain\src\core\fileobjsup.cpp @ 121]
943dfa30 9410b10a 84b1f590 9a158f00 994eaf88 OsrDmk!DmkCreateWork+0xa5b [g:\projects\dmkmain\src\core\create.cpp @ 833]
943dfa90 81ac2681 84b1f590 9a158f00 8352825c OsrDmk!ShadowDispatch+0x17a [g:\projects\dmkmain\src\core\shadow.cpp @ 253]
943dfab4 81867c80 819c8e57 835a1c5c 84b1f590 nt!IovCallDriver+0x252
943dfac8 819c8e57 943d40c0 835c7dc4 84b1f578 nt!IofCallDriver+0x1b
943dfb80 819dacdd 84b1f590 00000000 835c7d20 nt!IopParseDevice+0xcff
943dfc10 819cc94e 00000000 943dfc68 00000040 nt!ObpLookupObjectName+0x615
943dfc70 819f2198 0006f558 00000000 8186a801 nt!ObOpenObjectByName+0x13c
943dfce4 81a1879d 00300ff4 c0110000 0006f558 nt!IopCreateFile+0x5ec
943dfd30 818461fa 00300ff4 c0110000 0006f558 nt!NtCreateFile+0x34
943dfd30 77520f34 00300ff4 c0110000 0006f558 nt!KiFastCallEntry+0x12a
WARNING: Frame IP not in any known module. Following frames may be wrong.
0006f850 00000000 00000000 00000000 00000000 0x77520f34


STACK_COMMAND:  kb

FOLLOWUP_IP:
mup!MupCleanup+61
84c2e52c 83650c00        and     dword ptr [ebp+0Ch],0

SYMBOL_STACK_INDEX:  4
FOLLOWUP_NAME:  MachineOwner
SYMBOL_NAME:  mup!MupCleanup+61
FAILURE_BUCKET_ID:  0x103_VRF_mup!MupCleanup+61
BUCKET_ID:  0x103_VRF_mup!MupCleanup+61
Followup: MachineOwner

Figure 2 -- Crash Info from Bugcheck in MUP


This was an error path case - for whatever reason, the open on the file had failed and we had thus decided to tear down the file object. Since the file object points to MUP, the IRP_MJ_CLEANUP operation is passed to MUP and it then dies a horrible death because there is no FsContext value. Unfortunately, this behavior differs dramatically from what the other file systems - and notably those in the WDK - actually do in this case. Here's the relevant code from the FAT source code in the WDK (See Figure 3, below).

    //
    //  Special case the unopened file object.  This will occur only when
    //  we are initializing Vcb and IoCreateStreamFileObject is being
    //  called.
    //

    if (TypeOfOpen == UnopenedFileObject) {

        DebugTrace(0, Dbg, "Unopened File Object\n", 0);

        FatCompleteRequest( IrpContext, Irp, STATUS_SUCCESS );

        DebugTrace(-1, Dbg, "FatCommonCleanup -> STATUS_SUCCESS\n", 0);
        return STATUS_SUCCESS;
    }


Figure 3 -- Relevant Source in FAT

The description is on target here - this is an unopened file object that is being torn down because something else has gone wrong. We maintain that this really is a defect in MUP, since this behavior does not manifest in any of the actual file systems (and they all handle this case in similar fashion).
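The defensive pattern that FAT follows can be modeled outside the kernel. The following is a minimal user-mode sketch, not WDK code and not a description of MUP's internals: file_context, file_object and common_cleanup are illustrative names, and the NULL fs_context check stands in for the "unopened file object" test in Figure 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t NtStatus;
#define STATUS_SUCCESS 0x00000000u

/* Simplified per-file state that a file system would hang off FsContext. */
struct file_context { int open_handles; };

/* Simplified stand-in for FILE_OBJECT; fs_context is NULL until an open succeeds. */
struct file_object { struct file_context *fs_context; };

/*
 * Cleanup handler that tolerates an unopened file object. If the open
 * never completed, there is no per-file state to release, so the request
 * is completed successfully without touching anything. did_teardown
 * reports whether any per-file state was actually modified.
 */
NtStatus common_cleanup(struct file_object *fo, bool *did_teardown)
{
    if (fo->fs_context == NULL) {
        /* Unopened file object: nothing to clean up, succeed quietly. */
        *did_teardown = false;
        return STATUS_SUCCESS;
    }

    /* Normal path: release the state associated with this handle. */
    fo->fs_context->open_handles--;
    *did_teardown = true;
    return STATUS_SUCCESS;
}
```

The design choice is simply to treat "no context" as a legitimate state on the cleanup path rather than as a fatal inconsistency, since a stream file object can reach cleanup without ever having been opened.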

Working around this issue proved to be a bit painful. We needed to dereference the file object to make it go away in this case but we could not have it point to MUP. Thus, we decided to have it point to our device object. This just left the "small detail" of reference counts which, unfortunately, required that we do something rather ugly (after all, we're manipulating I/O Manager reference counts). Here's one of the code segments that we used for handling this issue (See Figure 4, below).

                if (pFileObject->DeviceObject &&
                    (DmkGetMupDevice()== pFileObject->DeviceObject)) {
                    LONG refCnt;
                    KIRQL oldIrql;

                    //
                    // MUP does not properly handle the IRP_MJ_CLEANUP.  This logic
                    // works around this issue but relies on several things:
                    //  - That MUP isn't really going to "go away" (unload), so the
                    //    ref count won't go to zero.
                    //  - That setting the device object to NULL will just cause the
                    //    FO to be deleted (no irp)
                    //
                    // Note that this is really an issue on Vista where the filter
                    // sits on top of MUP, versus earlier versions where it sat on
                    // the redirector.
                    //
                    // This is a VERY ugly fix - while we serialize "properly" we
                    // cannot do teardown here.  If the ref count ever goes to zero
                    // we will need to rethink this whole process.  Hopefully, we can
                    // convince Microsoft that this is a bug worthy of being fixed.
                    //
                    //
                    oldIrql = KeAcquireQueuedSpinLock(LockQueueIoDatabaseLock);

                    //
                    // Bump our own ref count.
                    //
                    DeviceObject->ReferenceCount++;

                    //
                    // Drop the ref count on the original (MUP) device
                    //
                    pFileObject->DeviceObject->ReferenceCount--;
                    refCnt = pFileObject->DeviceObject->ReferenceCount;
                    KeReleaseQueuedSpinLock(LockQueueIoDatabaseLock, oldIrql);

                    //
                    // Now we can point this file object at our own device, since
                    // we handle it properly in teardown.
                    //
                    pFileObject->DeviceObject = DeviceObject;

                    //
                    // Since we don't believe we'll ever make this go to zero, let's
                    // assert it here in case we're wrong.
                    //
                    ASSERT(refCnt);
              
                }


Figure 4 -- How We Address the MUP Issue (partially)

In our own parallel file system stack, we properly handle the IRP_MJ_CLEANUP in this case. This effectively resolved this particular problem.

Conclusion

Our experiences to date have reinforced for us the understanding that Windows Vista is a major new version of Windows. It includes numerous changes, often subtle, that directly affect file systems and file system filter drivers. Some of these are easy to work around; others are far more complex.

The lesson for everyone in this field is to keep in mind that your file systems and file system filters will observe different behavior in Vista than they did in earlier OS versions. The challenge is thus to find these differences and resolve them in a fashion that is applicable to all the platforms that you support.

This article was printed from OSR Online http://www.osronline.com

Copyright 2017 OSR Open Systems Resources, Inc.