Hitting an oplock in Pre-Read

Ged_Murphy · August 24, 2015, 9:34am

Hi all,

Can someone please give me some pointers on the correct way to avoid
hitting an oplock?

My driver mirrors files from another location on the local disk. On first
open, the driver creates a zero-byte placeholder file to match the remote
file.
I want to pre-populate the placeholder on first access, so my intention was
to copy the data into position in the Pre-Read or Pre-Write callbacks, then
let the normal operation go through. However, I find that my call to
FltCreateFile* never returns, and it appears I’m hitting an oplock (adding
FILE_OPEN_REQUIRING_OPLOCK will return STATUS_CANNOT_BREAK_OPLOCK).

What’s the correct method for handling this scenario?
I obviously can’t use the FILE_OBJECT that is passed in, but I can’t get my
own FILE_OBJECT. If I break the oplock using FILE_COMPLETE_IF_OPLOCKED then
it hangs in the FltWriteFile when copying the data into the file.

Thanks.

OSR_Community_User · August 24, 2015, 3:17pm

You will need to defer your work to after the oplock break.

The problem is that you aren’t “breaking the oplocks” when you specify “FILE_COMPLETE_IF_OPLOCKED”. You’re telling the FSD that you’ll wait until the oplock break is done (which you find out by issuing an FSCTL_OPLOCK_BREAK_NOTIFY) before you do any I/O. You violated the promise you made to the file system, and so the write hangs because the oplock break is still pending.

The point is that until you return control back to the caller, the oplocks break isn’t going to happen (typically because kernel APCs are disabled). The very act of returning control back to the caller normally causes kernel APCs to be re-enabled, so the oplocks IRP can be completed and the oplock break can be processed.

So why don’t you hold your write until the oplocks break notify completes? You can safely do the write at that point (after FSCTL_OPLOCK_BREAK_NOTIFY returns from the FSD).

Tony
OSR

Ged_Murphy · August 24, 2015, 3:50pm

Hi Tony,

Thanks for clearing up the FILE_COMPLETE_IF_OPLOCKED usage in FltCreateFile,
that part makes sense now.

I’m not sure how I can defer the work until the oplock is broken though. At
what point is the oplock broken?
I’m assuming (perhaps incorrectly) that the lock is held on the read or
write operation before it enters my pre-callback. I’m also assuming that the
lock is only broken once I’ve relinquished control and allowed the operation
to go down to the FSD and back up to the caller. However for the read
operation, the caller is expecting data on its return. If I let the call go
down to the FSD without pre-populating the file, the FSD will just see an
empty file and the caller will get an END_OF_FILE error.

I suppose I could fulfil the caller’s initial request using data from the
remote location, then do the copy once the caller has broken the lock, but
that feels a bit hacky, and it’s even more yucky for the write operation.

If the lock is broken by the FSD, I suppose I could also modify the file and
the operation in the post-callback, but that feels wrong too.

What do you suggest? Do I have my whole ‘pre-populate’ architecture wrong?

Thanks,
Ged.

OSR_Community_User · August 24, 2015, 4:15pm

Oplocks are broken in a number of places, but create is one of the more common. In this case, a second open to the file will trigger the oplocks break. Due to the implementation model used by the SMB server, it’s possible for the thread performing the create to ALSO be the thread that “owns” the oplocks (typically on behalf of a different client). But oplocks breaks are processed via a callback (kernel APC). As long as kernel APCs are blocked, the oplocks cannot be broken.

Once a file has multiple opens on it, the choice of oplocks that are allowed is vastly smaller - those that permit a client to cache a CLEAN copy of the data. If the data is written, those oplocks can be broken, but that won’t block the caller (it’s just a notification to the client that the cached copy is no longer valid).

You’re missing the point: if STATUS_OPLOCK_BREAK_IN_PROGRESS is returned (in the create path), the caller can’t do a read from the file (or a write to the file) until they’ve been told the oplocks break is done. They must issue FSCTL_OPLOCK_BREAK_NOTIFY (via the file system control path) and wait for it to complete before they can use the file object.

If you really are seeing reads on the file object BEFORE they’ve completed the oplock break then there’s a bug. Of course, it wouldn’t be the first time I’ve seen that sort of behavior “just work” and find applications exploiting it.

So, assuming that you have reads on files that you must satisfy, then you should do so from your cached information. Think of it as an asynchronous write-back cache: you read from the remote into your in-memory cache. When you do a read, you look and see if it is in your in-memory cache first and if it is, read from there. If it isn’t, either fetch from remote or read from local (as appropriate).

Then you can present it as a “performance optimization”!
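
The read path described above (check the in-memory copy first, fetch from the remote only on a miss) can be sketched roughly as follows. This is a user-mode model in plain C, not filter driver code: `mirror_cache`, `mirror_fetch`, `mirror_read`, and the `remote_data` buffer are hypothetical names invented for illustration, and a real driver would also need locking, partial fetches, and proper memory management.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MIRROR_CAPACITY 64

/* Hypothetical stand-in for the remote file's contents. */
static const char remote_data[] = "remote file contents";

/* Hypothetical per-file cache: an in-memory copy of the remote data. */
typedef struct {
    char   bytes[MIRROR_CAPACITY];
    size_t length;     /* number of valid bytes in bytes[] */
    int    populated;  /* nonzero once the remote data has been fetched */
    int    fetches;    /* how many times the remote was read (illustration only) */
} mirror_cache;

/* Fetch the whole remote file into the cache (the "read from remote" step). */
static void mirror_fetch(mirror_cache *cache)
{
    assert(cache != NULL);
    cache->length = sizeof(remote_data) - 1;   /* exclude the trailing NUL */
    assert(cache->length <= MIRROR_CAPACITY);
    memcpy(cache->bytes, remote_data, cache->length);
    cache->populated = 1;
    cache->fetches++;
}

/* Satisfy a read of up to len bytes at offset.
 * Returns the number of bytes copied, or 0 at or past end of file. */
static size_t mirror_read(mirror_cache *cache, size_t offset,
                          char *out, size_t len)
{
    assert(cache != NULL && out != NULL);
    if (!cache->populated)
        mirror_fetch(cache);                   /* cache miss: go to the remote */
    if (offset >= cache->length)
        return 0;
    size_t available = cache->length - offset;
    size_t n = len < available ? len : available;
    memcpy(out, cache->bytes + offset, n);
    return n;
}
```

Starting from a zero-initialized `mirror_cache`, the first `mirror_read` triggers one fetch and later reads are served from memory, so the local placeholder does not have to be populated before the first read is answered.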

Tony

Ged_Murphy · August 25, 2015, 12:57pm

Tony Mason wrote:

> If you really are seeing reads on the file object BEFORE they’ve
> completed the oplock break then there’s a bug. Of course, it
> wouldn’t be the first time I’ve seen that sort of behavior “just
> work” and find applications exploiting it.

Perhaps there is, although I’d be surprised if I uncovered a bug in doing
something relatively simple.

What’s happening in this specific case is:
- On directory refresh, I’m adding non-existent (fake/remote) files to the
QueryDirectory routine’s return buffer. (Explorer will eventually try to
read from these files to learn what’s in them).
- Explorer requests handles to the files, and on the first pre-create I create
the file, which is just an empty placeholder. I then allow the call to
continue to the FSD and it succeeds.
- Explorer opens multiple other handles to the same file. All these
obviously open without any issues.
- Now Explorer tries to read from one of the files. It’s at this point (in
my pre-read callback) that I need to pre-populate the file so the FSD has
something to read. I open the remote file and I try to open the local file
in order to copy the data into place before I allow the call to go down the
stack. It’s the opening of this local/destination file that’s hitting the
oplock. In my mind, this shouldn’t be happening.

Using your advice, what I think I’m going to have to do to work around it is
to create a generic work item and issue FSCTL_OPLOCK_BREAK_NOTIFY on that
thread, waiting until I can access the file. In the meantime I’ll read from
the remote file’s FILE_OBJECT to fulfil the request on the local file, and
complete the operation so it doesn’t enter the FSD. When the oplock is broken,
I’ll use the worker thread to do the copy instead of trying to do it in the
pre-read.
I’ll worry about speeding things up with caching at a later date and just
rely on Cc to handle my caching for now.
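
The plan above boils down to a small per-file state machine, sketched here as a user-mode model in plain C rather than real minifilter code. All names (`placeholder`, `on_pre_read`, `on_oplock_break_complete`, and the enum values) are hypothetical. In an actual minifilter the two read outcomes would presumably map to completing the operation in the pre-read callback (FLT_PREOP_COMPLETE) versus letting it continue down to the FSD, and the break-complete event would be the point at which FSCTL_OPLOCK_BREAK_NOTIFY returns on the worker thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lifecycle of a local placeholder file. */
typedef enum {
    PLACEHOLDER_EMPTY,      /* zero-byte file, nothing scheduled yet */
    COPY_DEFERRED,          /* work item queued, waiting on the oplock break */
    PLACEHOLDER_POPULATED   /* remote data has been copied into place */
} placeholder_state;

/* What the pre-read logic decides to do with an incoming read. */
typedef enum {
    READ_PASS_TO_FSD,           /* local file is valid: let the read go down */
    READ_COMPLETE_FROM_REMOTE   /* answer from the remote, skip the FSD */
} read_action;

typedef struct {
    placeholder_state state;
    int work_items_queued;  /* illustration only: counts deferred copies */
} placeholder;

/* Decide how to handle a read without ever blocking in the pre-read path. */
static read_action on_pre_read(placeholder *p)
{
    assert(p != NULL);
    if (p->state == PLACEHOLDER_POPULATED)
        return READ_PASS_TO_FSD;
    if (p->state == PLACEHOLDER_EMPTY) {
        /* First read: queue the deferred copy exactly once. */
        p->state = COPY_DEFERRED;
        p->work_items_queued++;
    }
    return READ_COMPLETE_FROM_REMOTE;
}

/* Called by the worker once the oplock break notification has completed. */
static void on_oplock_break_complete(placeholder *p)
{
    assert(p != NULL);
    if (p->state == COPY_DEFERRED)
        p->state = PLACEHOLDER_POPULATED;  /* the remote-to-local copy goes here */
}
```

The point of the model is that the pre-read path never waits on the oplock: reads that arrive while the copy is deferred are answered from the remote, and only one work item is queued no matter how many reads arrive in the meantime.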

It still feels a little hacky though, and I’m still not clear on why I’m
seeing this lock in the pre-read on a local file.

Thanks for your help with this.
Ged.