Handling for NTFS Delete operations

Hi Folks,

We are working on an encryption minifilter driver. An encrypted file consists of a header followed by the file data in encrypted form.

We have a test suite that copies files to a USB drive; the driver encrypts them once they are copied to the removable media and changes the file extension.

The test cases perform these operations in a loop:

1. Copy the files to the USB drive

2. Check that they are encrypted

3. Delete the files using DELETE_ON_CLOSE

4. Go to step 1

We clean up our state for the file being deleted during the cleanup processing of the delete-on-close file object.

We observe that after a few iterations some of the files remain unencrypted; this is the issue we want to resolve.

Our analysis of the behavior: while a second or later iteration of the suite is running, even though the encrypted file has been deleted, a PostCreate arrives for one of the encrypted files from a previous test iteration. Because the header is read from the deleted file, our driver treats the file's state as already encrypted, and as a result the new file is never encrypted.

As part of the check, we tried an IRP_MJ_QUERY_INFORMATION with FileStandardInformation to see whether the delete-pending flag is set, so that we could avoid setting our state for the deleted file. However, that did not help.
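
For reference, the check we tried was roughly the following (a simplified sketch; the helper name is ours and error handling is trimmed):

#include <fltKernel.h>

// Sketch of the delete-pending check attempted above, issued from our
// PostCreate callback. FltQueryInformationFile and
// FILE_STANDARD_INFORMATION are the documented filter-manager APIs.
BOOLEAN
IsDeletePending(
    _In_ PFLT_INSTANCE Instance,
    _In_ PFILE_OBJECT FileObject
    )
{
    FILE_STANDARD_INFORMATION info;
    ULONG returned = 0;

    NTSTATUS status = FltQueryInformationFile(Instance,
                                              FileObject,
                                              &info,
                                              sizeof(info),
                                              FileStandardInformation,
                                              &returned);

    // DeletePending is only TRUE while a delete disposition exists on
    // the original file; a new file reusing the name reports FALSE.
    return NT_SUCCESS(status) && info.DeletePending;
}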

So we are looking for a way to identify, from the context of the old I/O that we receive in PostCreate, whether the file has been deleted.

Any inputs will be appreciated,

Thanks,

Mehtab

@Symantec

What is your current target operating system for the test suite? If it is Windows 8 or above, you won't see the SetDisposition at all when DELETE_ON_CLOSE is used; that is an optimization introduced in Windows 8. Better to check and decide at PreCreate: look for the FILE_DELETE_ON_CLOSE flag and, if it is set, record a flag in your FCB or stream context. At PreCleanup, check that flag and issue an FltSetInformationFile with FileDispositionInformation, as sketched below.
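
A rough sketch of the idea (MY_STREAM_CONTEXT and the context plumbing are illustrative placeholders, not a complete filter):

#include <fltKernel.h>

// Illustrative per-stream state; only the remembered flag matters here.
typedef struct _MY_STREAM_CONTEXT {
    BOOLEAN DeleteOnClose;
} MY_STREAM_CONTEXT, *PMY_STREAM_CONTEXT;

// PreCreate: FILE_DELETE_ON_CLOSE is visible in the create options, so
// note it here; on Windows 8+ you may never see the corresponding
// SetDisposition.
FLT_PREOP_CALLBACK_STATUS
MyPreCreate(
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Outptr_result_maybenull_ PVOID *CompletionContext
    )
{
    UNREFERENCED_PARAMETER(FltObjects);

    // The create options live in the low 24 bits of Create.Options.
    BOOLEAN deleteOnClose =
        BooleanFlagOn(Data->Iopb->Parameters.Create.Options,
                      FILE_DELETE_ON_CLOSE);

    // Hand the flag to PostCreate, which would store it in the stream
    // context attached to the file (context setup omitted).
    *CompletionContext = (PVOID)(ULONG_PTR)deleteOnClose;
    return FLT_PREOP_SUCCESS_WITH_CALLBACK;
}

// PreCleanup: if delete-on-close was seen at create time, set the
// disposition explicitly so teardown runs on a deterministic path.
FLT_PREOP_CALLBACK_STATUS
MyPreCleanup(
    _Inout_ PFLT_CALLBACK_DATA Data,
    _In_ PCFLT_RELATED_OBJECTS FltObjects,
    _Outptr_result_maybenull_ PVOID *CompletionContext
    )
{
    UNREFERENCED_PARAMETER(Data);
    *CompletionContext = NULL;

    PMY_STREAM_CONTEXT ctx = NULL;
    if (NT_SUCCESS(FltGetStreamContext(FltObjects->Instance,
                                       FltObjects->FileObject,
                                       (PFLT_CONTEXT *)&ctx))) {
        if (ctx->DeleteOnClose) {
            FILE_DISPOSITION_INFORMATION disp;
            disp.DeleteFile = TRUE;
            FltSetInformationFile(FltObjects->Instance,
                                  FltObjects->FileObject,
                                  &disp,
                                  sizeof(disp),
                                  FileDispositionInformation);
        }
        FltReleaseContext(ctx);
    }
    return FLT_PREOP_SUCCESS_NO_CALLBACK;
}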

Hope this helps.

.nT

Are you working on an isolation filter, where the filter initializes a file object and completes the create request itself, or do you just modify requests for a file object initialized by the underlying file system driver? Your description sounds more like the latter. That design has inherent race conditions that can't be eradicated completely. You can try to serialize all critical operations to remove concurrency from the design.

On 07/05/2017 07:11 PM, Nadeem Syed66 wrote:

The test cases perform these operations in a loop:

1. Copy the files to the USB drive

2. Check that they are encrypted

3. Delete the files using DELETE_ON_CLOSE

4. Go to step 1

So, you’d get a precreate and postcreate, as well as precleanup and
postcleanup in #1, and again in #3. I’m assuming your test does the
right thing in #3 and calls CloseHandle() and waits for that to finish.

Our analysis of the behavior: while a second or later iteration of the suite is running, even though the encrypted file has been deleted, a PostCreate arrives for one of the encrypted files from a previous test iteration. Because the header is read from the deleted file, our driver treats the file's state as already encrypted, and as a result the new file is never encrypted.

This doesn’t make sense, because the test needs to wait for its creates
to complete before proceeding. If the test was really running #3 and #1
in parallel, then you’d have other problems (that have nothing to do
with your driver.)

But even if some other opener was floating around the system and
happened to hit this race, note that NTFS will consider the old file and
the new file to be completely different files. They’ll have different
file contexts, stream contexts, file IDs, everything. About the only
thing they have in common is a name.

So one possibility is you're trying to perform name-based lookups/equality checks to answer this question, and that's obviously vulnerable to race conditions as names get reused via deletes or renames.
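
If you need to key state to a particular file, one option is to key it on the volume's file ID rather than the name. A minimal sketch (the routine name is illustrative):

#include <fltKernel.h>

// Sketch: fetch NTFS's 64-bit file reference number so per-file state
// can be keyed on identity rather than on a reusable name.
NTSTATUS
GetFileId(
    _In_ PFLT_INSTANCE Instance,
    _In_ PFILE_OBJECT FileObject,
    _Out_ PLONGLONG FileId
    )
{
    FILE_INTERNAL_INFORMATION info;
    ULONG returned = 0;

    NTSTATUS status = FltQueryInformationFile(Instance,
                                              FileObject,
                                              &info,
                                              sizeof(info),
                                              FileInternalInformation,
                                              &returned);
    if (NT_SUCCESS(status)) {
        // A recreated file with the same name gets a different ID, so a
        // lookup keyed on this value won't conflate the two files.
        *FileId = info.IndexNumber.QuadPart;
    }
    return status;
}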

As part of the check, we tried an IRP_MJ_QUERY_INFORMATION with
FileStandardInformation to see whether the delete-pending flag is set, so
that we could avoid setting our state for the deleted file. However, that
did not help.

There are two cases here:

  1. If the test is really executing postcreate from step #1, then it
    hasn’t gotten far enough to mark the object as deleted. Similarly, if
    #1 and #3 are running in parallel, the create for #3 hasn’t had its
    handle closed yet, so the file won’t be delete pending.
  2. If this create came from some other source, it could either go to the
    pre-deleted old file, or to the new file that’s using the same name
    after the previous delete. In the latter case you wouldn’t expect it to
    report itself as deleted. Again, this smells like conflating the two
    different files.

I'll provide some more analysis on this in case it helps others. But first, answers to some of the queries above.

I am running this on Windows 7 as well as Windows 10, so the issue is there on both OSes.

For Malcolm:

So, you’d get a precreate and postcreate, as well as precleanup and postcleanup in #1, and again in #3. I’m assuming your test does the right thing in #3 and calls CloseHandle() and waits for that to finish.

Yes, the scripts run serially; they are basically Python commands for file operations like copy and delete.

Updated Analysis

When the new file is created, it is assigned the data blocks of the previously deleted file. So during PostCreate processing of the new file, when the driver reads the blocks directly from disk, it reads the old encryption header still present from the previously deleted file, since the new data has not yet been flushed. This leads it to incorrectly update the file's state as encrypted.

-Mehtab

On 07/06/2017 07:41 PM, xxxxx@gmail.com wrote:

When the new file is created, it is assigned the data blocks of the previously deleted file. So during PostCreate processing of the new file, when the driver reads the blocks directly from disk, it reads the old encryption header still present from the previously deleted file, since the new data has not yet been flushed.

Why would it read directly from disk? Since contents there are
undefined, the behavior of anything trying to reason about it is also
undefined. If the filter read through the file, the new file would not
have an updated VDL and would not return stale data.


http://www.malsmith.net

Why would it read directly from disk? Since contents there are
undefined, the behavior of anything trying to reason about it is also
undefined. If the filter read through the file, the new file would not
have an updated VDL and would not return stale data.

The new file contents are being copied, and as a result the file size and VDL are not 0. We are doing non-cached I/O to the underlying FS (NTFS in this case) and it is returning the old data.

On 07/08/2017 04:46 AM, xxxxx@gmail.com wrote:

The new file contents are being copied, and as a result the file size and VDL are not 0. We are doing non-cached I/O to the underlying FS (NTFS in this case) and it is returning the old data.

There are three ways to read data:

  1. Through the cache. In NT, this will always return any dirty data, so
    unwritten data is still returned in the read.
  2. As noncached (nonpaging). In NTFS, this will cause the region being
    read to be flushed to the disk, then the read goes to the disk. This
    will also return dirty data, albeit by flushing it first.
  3. As paging. Paging really means, “I’m the memory manager, trust me, I
    know what I’m doing.” It will read what’s really on the disk without
    attempting to flush anything, because the memory manager would never ask
    to read something if it had a newer version of it.

So it sounds like you’re issuing a paging read to observe this behavior.

Some options:

  1. Use a noncached, non-paging read (see the sketch after this list).
  2. Use a cached read.
  3. Read through a memory mapping.
  4. Flush the file before issuing the read.

(There are probably many more.)
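
As a sketch of option 1 (HEADER_SIZE and the routine name are illustrative; for noncached I/O the buffer must be sector-aligned, nonpaged memory and the offset sector-aligned):

#include <fltKernel.h>

#define HEADER_SIZE 512  // assumed sector-aligned header length

// Sketch: read the header with a noncached, non-paging FltReadFile.
// Without FLTFL_IO_OPERATION_PAGING, NTFS flushes the range first, so
// the read reflects the current logical contents rather than whatever
// stale blocks happen to be on disk.
NTSTATUS
ReadHeaderNonCached(
    _In_ PFLT_INSTANCE Instance,
    _In_ PFILE_OBJECT FileObject,
    _Out_writes_bytes_(HEADER_SIZE) PVOID Buffer
    )
{
    LARGE_INTEGER offset;
    ULONG bytesRead = 0;

    offset.QuadPart = 0;

    return FltReadFile(Instance,
                       FileObject,
                       &offset,
                       HEADER_SIZE,
                       Buffer,
                       FLTFL_IO_OPERATION_NON_CACHED,
                       &bytesRead,
                       NULL,    // no completion callback: synchronous
                       NULL);
}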

But there’s also something here that I’m unclear about. If this logic
were in post create, there’s no ability for a caller to dirty the data
in the cache until at least one handle exists to the file. Is this
logic in post create or at some other time?

  • M


http://www.malsmith.net

Yes, we are doing a non-cached paging read; that helps explain why it returns the old data. I tried doing a non-cached, non-paging read and that resolved the issue.

Also, this logic is in post-create. However, it is not the first create on which we hit this issue; it is during some later create processing, when other applications try to access the file while the new file is still being copied, that we hit this state.

Thanks,
Mehtab