Seeing only IRP_MJ_CLOSE after a successful IRP_MJ_CREATE. Missing IRP_MJ_CLEANUP??

Antti_Nivala · August 4, 2009, 4:56pm

I have encountered a repeatable sequence that leads to our driver receiving only IRP_MJ_CLOSE for a file object for which we have just processed IRP_MJ_CREATE and returned a success code. This is happening when saving a document in Word 2007 if we delay the processing of the IRP_MJ_CREATE request that asks for access 0x01130089.
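
For readers following along, the access mask 0x01130089 can be broken into its individual rights. The following is a minimal sketch in portable user-mode C, not driver code; the bit values are the ones used by the Windows headers, the table lists only the bits relevant here, and the helper name `describe_access` is made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Access-right bits with the values used by the Windows headers.
 * Only the rights relevant to this thread are listed. */
struct access_bit {
    uint32_t    bit;
    const char *name;
};

static const struct access_bit known_bits[] = {
    { 0x00000001u, "FILE_READ_DATA" },
    { 0x00000008u, "FILE_READ_EA" },
    { 0x00000080u, "FILE_READ_ATTRIBUTES" },
    { 0x00010000u, "DELETE" },
    { 0x00020000u, "READ_CONTROL" },
    { 0x00100000u, "SYNCHRONIZE" },
    { 0x01000000u, "ACCESS_SYSTEM_SECURITY" },
};

/*
 * Hypothetical helper: writes the names of the recognised bits in `mask`
 * into `out`, separated by " | ", and returns the bits it did not
 * recognise. Names that do not fit in `out` are skipped.
 */
uint32_t describe_access(uint32_t mask, char *out, size_t out_len)
{
    uint32_t unknown = mask;
    size_t used = 0;

    assert(out != NULL && out_len > 0);
    out[0] = '\0';

    for (size_t i = 0; i < sizeof known_bits / sizeof known_bits[0]; i++) {
        if ((mask & known_bits[i].bit) == 0)
            continue;
        unknown &= ~known_bits[i].bit;

        const char *sep = used ? " | " : "";
        size_t need = strlen(sep) + strlen(known_bits[i].name);
        if (need < out_len - used) {
            /* Room for the text plus the terminating NUL. */
            used += (size_t)snprintf(out + used, out_len - used, "%s%s",
                                     sep, known_bits[i].name);
        }
    }
    return unknown;
}
```

With this table, 0x01130089 decomposes into FILE_READ_DATA, FILE_READ_EA, FILE_READ_ATTRIBUTES, DELETE, READ_CONTROL, SYNCHRONIZE and ACCESS_SYSTEM_SECURITY, with no bits left over.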

If we delay this IRP_MJ_CREATE a bit, then (I’m guessing) some filter driver or the OS or Word decides to cancel the request. The problem is that in our FSD (full FSD, not filter), we have already processed IRP_MJ_CREATE and returned a success code. In FileMon, I see that the IRP_MJ_CREATE request appears with the SUCCESS return value. Of course, while processing IRP_MJ_CREATE, we have called IoSetShareAccess or IoCheckShareAccess and updated the relevant ShareAccess.

Now that our driver has updated ShareAccess and completed IRP_MJ_CREATE with a success code, we get into trouble when we don’t get IRP_MJ_CLEANUP (ShareAccess doesn’t get undone). We see only IRP_MJ_CLOSE in FileMon output in this case. In FileMon, the IRP_MJ_CLOSE that appears after the successful IRP_MJ_CREATE appears to be invoked by System.

I have verified with various methods that the IRP_MJ_CREATE and IRP_MJ_CLOSE that I am talking about are really for the same CCB / FileObject and that really there is no IRP_MJ_CLEANUP in between. Actually, I am not yet even sure if that IRP_MJ_CLOSE gets into our driver either, but in FileMon I can see it.

I can consistently reproduce this problem on two machines (identical HP laptops). I cannot reproduce it on many other machines. Reproducing the problem also seems to require that F-Secure Anti-Virus 8.0 is enabled, but even with F-Secure Anti-Virus enabled, I cannot reproduce the problem on any machines other than these HP laptops.

To me, it seems likely that one of the filter drivers on the machines could be misbehaving. Looking at IoCancelFileOpen documentation it seems that maybe some filter driver is incorrectly cancelling a request even though a handle has already been created. But I need to know this for sure.

Or, is the fault in the end in our FSD after all? Should we be prepared for this kind of situation (= not receiving IRP_MJ_CLEANUP for a handle we have opened with a successful IRP_MJ_CREATE)? Are we expected to revoke ShareAccess in IRP_MJ_CLOSE if IRP_MJ_CLEANUP is missing?

Best regards,
Antti Nivala

Scott_Noone_OSR · August 5, 2009, 2:27pm

It certainly sounds like a bug in a filter above you.

Does the create ultimately fail back to the user? If a filter took your
successful create request and then completed it with an error without
calling IoCancelFileOpen I could see this happening. It’s why
IoCancelFileOpen exists, to send an IRP_MJ_CLEANUP request to the devices
below that believe the request was successful.

If the request is failing, then the question is of course who is failing it.
If you have everything isolated down to a reproducible create request you
could always put an access breakpoint on Irp->IoStatus.Status.

Good luck!

-scott

–
Scott Noone
Consulting Associate
OSR Open Systems Resources, Inc.
http://www.osronline.com

Daniel_Terhell · August 5, 2009, 3:17pm

Normally you should be prepared not to see a cleanup. Cleanup is sent when
the last handle to a file object is closed. Close is sent when the last
reference to a file object is deleted. If you are sure to have a valid
handle I would think there’s something wrong.

//Daniel

Scott_Noone_OSR · August 5, 2009, 3:29pm

> If you are sure to have a valid handle I would think there’s something
> wrong.

My reading was that he completed the IRP_MJ_CREATE request with success from
his FSD. If that is the case then he should expect to get a cleanup.

-scott

Daniel_Terhell · August 5, 2009, 3:46pm

Right, this is an FSD, not a filter. I just put my “knowledge” here in case
anyone reads this as meaning you can always expect a cleanup for every file
object created.

//Daniel

Antti_Nivala · August 5, 2009, 3:55pm

Scott, Daniel,

Many thanks for the feedback.

Scott wrote:
> Does the create ultimately fail back to the user?

In FileMon, I see that the IRP_MJ_CREATE request appears with the SUCCESS return value. I thought that meant the request was succeeding back to the user (winword.exe), but I guess that is not certain if some filter sits above FileMon.

I haven’t yet gained kernel debugging access to the machine where this problem is reproducible so I’m currently relying on user-mode remote debugging info and FileMon logs. I will try to get full access to the system and confirm these things.

Daniel wrote:
> If you are sure to have a valid handle I would think there’s something wrong.

When exactly can I be sure that such a valid handle exists? Previously, I simply assumed that if I complete IRP_MJ_CREATE with STATUS_SUCCESS, then from my point of view the create is 100 percent done and I can rely on receiving IRP_MJ_CLEANUP and IRP_MJ_CLOSE later. Is this assumption wrong? For example, is it possible that after I have completed IRP_MJ_CREATE with STATUS_SUCCESS, something special happens in the system and a “valid handle” is not created after all?

(I apologize, I may not be using the concepts correctly here. I guess I don’t fully understand the difference between the handle getting created and our FSD completing the IRP_MJ_CREATE request.)

In the documentation I see the FileObject flag FO_HANDLE_CREATED. From the FastFat source I have concluded that this flag is set by the system when the time is right. We are not setting it in our FSD, but whenever I get IRP_MJ_CLEANUP, that flag seems to have been set, which makes sense.

Daniel wrote:
> Normally you should be prepared not to see a cleanup. Cleanup is sent when
> the last handle to a file object is closed. Close is sent when the last
> reference to a file object is deleted.

Daniel, does this mean that we should call IoRemoveShareAccess in IRP_MJ_CLOSE processing in some cases? In the FastFat source, I see that IoRemoveShareAccess is called during IRP_MJ_CLEANUP only (plus in the failure paths of IRP_MJ_CREATE, of course). A key part of the problem I am seeing is that the successful IRP_MJ_CREATE processing in our FSD has called IoSetShareAccess or IoCheckShareAccess, and that update is never revoked later because we don’t get IRP_MJ_CLEANUP.

I think it is likely that something I am describing here is incorrect, but I will be able to see which part only after I get kernel-mode debugging access to this particular machine. Meanwhile, I’d appreciate it if you could clarify the FSD’s responsibility for calling IoRemoveShareAccess: Is it sufficient to do it in IRP_MJ_CLEANUP only? Or is it possible that IRP_MJ_CLEANUP is not always received, so that the FSD should be prepared to call IoRemoveShareAccess in its IRP_MJ_CLOSE processing in some cases (unlike FastFat)?

-Antti

Antti_Nivala · August 5, 2009, 4:06pm

Sorry, I posted just before I saw the additions by Scott and Daniel.

Scott wrote:
> My reading was that he completed the IRP_MJ_CREATE request with success from
> his FSD. If that is the case then he should expect to get a cleanup.

OK, this answers that part of my concerns.

I conclude that I do not need to look at adding IoRemoveShareAccess calls to IRP_MJ_CLOSE processing. The proper place to call IoRemoveShareAccess is in IRP_MJ_CLEANUP processing, as we see in the FastFat source.

So, the remaining mystery to me is why I do not receive IRP_MJ_CLEANUP for this particular file object. I’ll try to get to the bottom of it with additional debugging as per Scott’s advice.

-Antti

OSR_Community_User · August 5, 2009, 4:13pm

> I conclude that I do not need to look at adding IoRemoveShareAccess calls to IRP_MJ_CLOSE
> processing. The proper place to call IoRemoveShareAccess is in IRP_MJ_CLEANUP processing,

I would have a special path “close without preceding cleanup” just to resolve the interop with that filter, and call IoRemoveShareAccess there.

–
Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
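
The "close without preceding cleanup" path suggested above can be sketched as a small user-mode model. This is an illustration under stated assumptions, not driver code: `struct fcb`, `struct ccb` and the `on_*` handlers are hypothetical stand-ins, and the counter stands in for the kernel's share-access bookkeeping (in a real FSD the release step would be a call to IoRemoveShareAccess).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-file context (an FCB in FSD terms). The counter stands
 * in for the share-access state that a real driver keeps per file. */
struct fcb {
    int share_access_opens;
};

/* Hypothetical per-open context (a CCB). The flag remembers whether this
 * open still contributes to the FCB's share-access state. */
struct ccb {
    bool share_access_held;
};

/* Models the successful IRP_MJ_CREATE path after the share-access update. */
void on_create_success(struct fcb *f, struct ccb *c)
{
    f->share_access_opens++;
    c->share_access_held = true;
}

/* Idempotent release: safe to call from both cleanup and close. */
static void release_share_access(struct fcb *f, struct ccb *c)
{
    if (!c->share_access_held)
        return;                      /* already released by cleanup */
    assert(f->share_access_opens > 0);
    f->share_access_opens--;         /* real driver: IoRemoveShareAccess */
    c->share_access_held = false;
}

/* Models IRP_MJ_CLEANUP: the normal place to release share access. */
void on_cleanup(struct fcb *f, struct ccb *c)
{
    release_share_access(f, c);
}

/* Models IRP_MJ_CLOSE: releases only if no cleanup was ever seen. */
void on_close(struct fcb *f, struct ccb *c)
{
    release_share_access(f, c);
}
```

The per-open flag is the design choice that matters here: it makes the release idempotent, so the normal create, cleanup, close sequence and the abnormal create, close sequence both leave the counter at zero.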

OSR_Community_User · August 5, 2009, 5:41pm

Hi Antti -

When Word does a replace of the tmp and doc files, it opens the file with ACCESS_SYSTEM_SECURITY. If the thread doesn’t have that permission, the open will fail. NTFS usually checks for this and returns an error status, but I have seen cases where it doesn’t. FAT does not check for this and I am assuming your FSD doesn’t either. What happens then is that the open completes successfully, but before the IoMgr returns an open handle to the user, it does the access check. At that point the open is failed. I have always seen a cleanup in this case; I am not sure why you don’t, if this is the same problem you are seeing. This sequence of events can leave the share access on the file out of whack if caching is initiated on the file in the create path, and later opens that should succeed won’t.

OSR_Community_User · August 5, 2009, 6:03pm

Hi Antti -

I pulled out the details of the issue, and I had forgotten that in this scenario a cleanup IRP is never sent. Here is what happens with FAT:

Word calls ReplaceFile() on the doc file

NtOpenFile() is called on the doc file by ReplaceFile() with desired access
DELETE and ACCESS_SYSTEM_SECURITY

The FAT file system gets the open request, increments the READERS and DELETERS
counters in the File Control Block (FCB) and returns STATUS_SUCCESS

If caching is initiated post create, the cache manager and the memory manager take
references on the file object.

The open request returns to the IoMgr and it now tries to create a handle for
the opened file.

Because the ACCESS_SYSTEM_SECURITY access flag was set for the open,
SePrivilegeCheck() is called to check the thread for privilege
SeSecurityPrivilege.

The thread does not have the SeSecurityPrivilege privilege so the IoMgr fails
the open and returns STATUS_PRIVILEGE_NOT_HELD to ReplaceFile(). Because the
open failed, a CLEANUP IRP is never sent, so the file object referenced by the
cache manager and memory manager is never cleaned up and the FCB reference
count will always include this file object.

A little bit later Word tries to open the doc file with share_read and
share_write access but not shared_delete. Since the FCB has a DELETER
reference, FAT fails the request.
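
The effect of the missing cleanup on later opens can be reproduced with a simplified user-mode model of the share-access counters. This is only a sketch: the structure and function names are hypothetical, the conflict test is loosely modelled on the documented behaviour of IoCheckShareAccess rather than copied from any real implementation, and the share mode of the first open is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for per-file share-access bookkeeping. */
struct share_state {
    unsigned open_count;
    unsigned readers, writers, deleters;               /* access in use  */
    unsigned shared_read, shared_write, shared_delete; /* sharing allowed */
};

/* One open request: the access it wants and the sharing it permits. */
struct open_request {
    bool read, write, del;
    bool share_read, share_write, share_delete;
};

/* Returns true and records the open if it is compatible with the opens
 * already recorded; returns false (a sharing violation) otherwise. */
bool check_and_add_open(struct share_state *s, const struct open_request *r)
{
    if (!(r->read || r->write || r->del))
        return true;    /* opens without data/delete access are not tracked */

    unsigned n = s->open_count;
    bool conflict =
        (r->read  && s->shared_read   < n) ||  /* someone forbids readers  */
        (r->write && s->shared_write  < n) ||  /* someone forbids writers  */
        (r->del   && s->shared_delete < n) ||  /* someone forbids deleters */
        (s->readers  != 0 && !r->share_read)  ||
        (s->writers  != 0 && !r->share_write) ||
        (s->deleters != 0 && !r->share_delete);
    if (conflict)
        return false;

    s->open_count++;
    s->readers       += r->read;
    s->writers       += r->write;
    s->deleters      += r->del;
    s->shared_read   += r->share_read;
    s->shared_write  += r->share_write;
    s->shared_delete += r->share_delete;
    return true;
}

/* Undoes a recorded open; normally reached from cleanup processing. */
void remove_open(struct share_state *s, const struct open_request *r)
{
    if (!(r->read || r->write || r->del))
        return;
    assert(s->open_count > 0);
    s->open_count--;
    s->readers       -= r->read;
    s->writers       -= r->write;
    s->deleters      -= r->del;
    s->shared_read   -= r->share_read;
    s->shared_write  -= r->share_write;
    s->shared_delete -= r->share_delete;
}
```

If the first open (read and delete access, assumed here to share read and delete) is recorded and never removed, a later open that wants read access and shares read and write but not delete is rejected, because a deleter is still recorded. Once `remove_open` runs for the first open, the same later open is accepted.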

Antti_Nivala · August 6, 2009, 3:36pm

Hi Melissa,

Thanks for the extremely useful information. I haven’t yet been able to try to confirm that this is what is going on in our case, but based on your description it seems very likely. Our FSD implementation is similar to FastFat in access checks and as you assumed, we are not checking for ACCESS_SYSTEM_SECURITY.

I will try to work around the problem by, for example, adding a check for ACCESS_SYSTEM_SECURITY so that we can fail the IRP_MJ_CREATE request ourselves in this case (like NTFS would do, I guess).

I will also consider Maxim’s suggestion of also preparing to call IoRemoveShareAccess in IRP_MJ_CLOSE if IRP_MJ_CLEANUP has not been seen.

-Antti

Antti_Nivala · August 10, 2009, 5:00pm

Hi,

Below is similar and also very precise information from Kaspersky Lab that I received by e-mail. The sequence they describe is exactly what we are seeing:

------ begin quote

Now we can explain what’s going on:

1. Assume some file is opened with a desired access of GENERIC_READ | ACCESS_SYSTEM_SECURITY | DELETE and a share mode of FILE_SHARE_READ | FILE_SHARE_DELETE, and that the thread which initiates this create operation does not hold SeSecurityPrivilege.
2. Almost all AV products (if not each and every one) with resident on-access monitor modules check files in their post-create paths; in other words, they read files.
3. Now suppose that such an AV product performs a cached read from the previously mentioned file. This cached read operation initializes a cache map on the file object for this file, so the file object is now referenced by the Cache Manager.
4. If this file is opened on an NTFS partition, then step 3 is not executed, because the underlying NTFS driver fails the create request with the STATUS_PRIVILEGE_NOT_HELD error code, which in turn is propagated to the post-create routine of the AV filter driver, so the filter driver doesn’t perform a read operation on this file.
5. In the case of a file opened on a FAT partition the situation changes. The FAT driver doesn’t check SeSecurityPrivilege when the ACCESS_SYSTEM_SECURITY flag is set. So the FAT driver successfully completes such a create request, updates the share access mask in the corresponding FCB and returns the request to the I/O Manager. In this case the AV filter driver does perform a cached read operation on the file in question (because the create operation succeeded).
6. When the I/O Manager performs final processing for this request, it checks whether SeSecurityPrivilege is held when the ACCESS_SYSTEM_SECURITY flag is set. If not, the I/O Manager simply fails the request, and since a file handle has not yet been created, it doesn’t initiate an IRP_MJ_CLEANUP request.
7. Finally, assume the file in question is opened once more with a desired access of GENERIC_READ and a share mode of FILE_SHARE_READ. On a FAT partition this new request will fail with the STATUS_SHARING_VIOLATION error code, because the corresponding FCB (which is still safe and sound due to the Cache Manager’s reference on the related file object) still records the earlier open’s DELETE access, and this new create request didn’t specify FILE_SHARE_DELETE.

So, taking into account the above explanation, you can try to resolve this issue directly in your FSD by checking SeSecurityPrivilege against the ACCESS_SYSTEM_SECURITY flag while processing IRP_MJ_CREATE requests: if ACCESS_SYSTEM_SECURITY is requested but SeSecurityPrivilege is not held, you should fail the request. I suppose it won’t make things worse, because the I/O Manager will check this privilege anyway, and if it is not held the I/O Manager will forcibly fail such requests, but with the nasty side effect of blocked files.

------ end quote

Many thanks to everybody who helped, and special thanks to Melissa from Symantec and the guys at Kaspersky Lab who were able to hit the nail on the head!

I have implemented the proposed change to our FSD (= checking for SeSecurityPrivilege when ACCESS_SYSTEM_SECURITY is requested and failing the request in our FSD if necessary). As far as I can tell, this has fully solved this particular problem.
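
The shape of that change can be sketched in a user-mode model. This is not the actual code from our FSD or from any shipping driver: `precheck_system_security`, `handle_create` and `struct fcb_model` are hypothetical names, and the boolean argument stands in for a kernel privilege check (presumably something like SeSinglePrivilegeCheck against SeSecurityPrivilege, which is an assumption about the wiring and is not shown). The status and access-mask values are the ones from the Windows headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ACCESS_SYSTEM_SECURITY     0x01000000u
#define STATUS_SUCCESS             0x00000000u
#define STATUS_PRIVILEGE_NOT_HELD  0xC0000061u

/* Hypothetical stand-in for the per-file share-access state. */
struct fcb_model {
    unsigned share_opens;
};

/* Fails the request early if ACCESS_SYSTEM_SECURITY is requested and the
 * caller does not hold the security privilege. */
uint32_t precheck_system_security(uint32_t desired_access,
                                  bool caller_holds_security_privilege)
{
    if ((desired_access & ACCESS_SYSTEM_SECURITY) != 0 &&
        !caller_holds_security_privilege) {
        return STATUS_PRIVILEGE_NOT_HELD;
    }
    return STATUS_SUCCESS;
}

/* Models the create path: the privilege check runs before any share-access
 * state is touched, so a failed request leaves nothing to undo. */
uint32_t handle_create(struct fcb_model *fcb, uint32_t desired_access,
                       bool caller_holds_security_privilege)
{
    assert(fcb != NULL);

    uint32_t status = precheck_system_security(desired_access,
                                               caller_holds_security_privilege);
    if (status != STATUS_SUCCESS)
        return status;

    fcb->share_opens++;   /* real driver: share-access check and update */
    return STATUS_SUCCESS;
}
```

The ordering is the point of the sketch: because the request is rejected before the share-access update, the FSD never ends up holding state for an open that the I/O Manager would later fail without sending IRP_MJ_CLEANUP.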

However, I think the main issue still remains… Things are not going well if an FSD returns STATUS_SUCCESS from IRP_MJ_CREATE, then an AV filter performs a read, and then I/O Manager decides to fail the create request. In this case, the FSD doesn’t receive IRP_MJ_CLEANUP and ShareAccess is not revoked.

To me, this seems to have similarities to the “Emerging issue with IoCancelFileOpen” article here on the OSR site. Anyone willing to comment on this further?

Best regards,
Antti

Antti_Nivala · August 10, 2009, 5:22pm

Additional note: I found the following article on the Internet that suggests that this kind of conflict has been fixed in Vista SP1:

http://www.sophos.com/support/knowledgebase/article/46352.html

Does anyone know more about this? Does this apply to the case we have discussed in this thread? If yes, how was it fixed? By changing FastFat, or by changing the I/O Manager, or what?

-Antti

rod_widdowson · August 11, 2009, 5:21am

> To me, this seems to have similarities to the "Emerging issue with
> IoCancelFileOpen" article here on the OSR site. Anyone willing to comment
> on this further?

Absolutely. Consider your mail:

> Now suppose that such an AV product performs a cached read from the
> previously mentioned file, so this cached read operation initializes a
> cache map on a file object for this file, therefore this file object is
> now referenced by the Cache Manager.

And consider the OSR article

> OSR developers have suggested that in fact this is a bug in the lower
> filter because it is never safe to perform a cached I/O call (IRP_MJ_READ
> or IRP_MJ_WRITE) with the file object from the IRP_MJ_CREATE.

So it is a broken filter.

NTFS tries quite hard to defend itself from this by always growing a stream
file object to back the cache with and I would suggest you investigate that.
However note that it isn’t totally safe since the filter might instead map a
section which will pin the file object.