STATUS_THREAD_IS_TERMINATING returned by FltSendMessage

Hi,

I have just analyzed once more a very strange scenario with FltSendMessage in our minifilter. We reproduced it three or four times in the last few days, but each time only during long and heavy stress testing, so it's very hard to reproduce.

We are using FltSendMessage to send requests to a user-mode process, with up to N requests in flight at any time (properly synchronized using a counter protected by a spinlock). In user mode we have N threads, each waiting for one message with FilterGetMessage. Each thread uses its own OVERLAPPED structure and calls WaitForMultipleObjects on two handles: the overlapped event, which signals that a message was received, and a termination event. It then calls GetOverlappedResult if necessary, processes the message, and sends a reply using FilterReplyMessage.
The minifilter waits in FltSendMessage for up to 11000 ms for an answer. If no answer arrives, the call times out with STATUS_TIMEOUT.

We usually send tens of thousands of requests (send message / wait for answer cycles) without problems. Then, very rarely, the FltSendMessage call suddenly returns STATUS_THREAD_IS_TERMINATING, usually very quickly (so it is NOT a timeout), even before we could process the message in user mode, and WITHOUT the user-mode thread that processes the message (or any other of our user-mode processing threads) actually being terminated. When we then process the message in user mode and call FilterReplyMessage, it returns ERROR_FLT_NO_WAITER_FOR_REPLY (normally unexpected for us, but logical in this situation).

For example, one of our concrete cases is:

  • before FltSendMessage, timestamp 18:18:31.91
  • message received in user mode, after GetOverlappedResult, timestamp 18:18:31.91
  • after FltSendMessage, timestamp 18:18:31.97, STATUS_THREAD_IS_TERMINATING
  • message processed in user mode, before FilterReplyMessage, timestamp 18:18:32.24
  • after FilterReplyMessage, timestamp 18:18:32.24, ERROR_FLT_NO_WAITER_FOR_REPLY

The strange part is that we precisely trace the start and termination of all of our message-processing threads. In this particular case, all threads were gracefully terminated on request (when we signaled the termination event that each thread's WaitForMultipleObjects was waiting on) LONG after this event: all threads terminated between 18:39:27.82 and 18:39:27.84, almost 21 minutes after we hit this particular error in FltSendMessage during the test. Also, apart from this error, throughout the whole stress test all other messages/requests, both before and after this particular one, were processed correctly, both in the minifilter and in all of the user-mode threads.

Any ideas, opinions, or similar behavior observed?

thank you very much,

Sandor LUKACS
Virus Analyst, SOFTWIN

If you see this more often on Vista than on XP, I would guess that
STATUS_THREAD_IS_TERMINATING means it is *not* your receiving user-mode
thread that is terminating, but rather the thread in whose context you
call FltSendMessage.

This would be related to the topic “how to handle canceling of synchronous
I/O”.


Thank you,

frank wrote:

> In case you observe this more often on Vista than on XP, I would guess
> STATUS_THREAD_IS_TERMINATING means that *not* your receiving usermode thread
> is terminating…but instead the thread in whose context you call
> FltSendMessage is terminating.
>
> This would be related to the topic “how to handle canceling of synchronous
> I/O”.

I don't think so, although I can't exclude it 100%. More precisely, consider a very simple stress-testing scenario consisting of, let's say, 10 batch scripts that each try to copy all files (*.*) from directories containing only viruses. When I run this scenario with our filter loaded, we are called on every Create/Open request for the source files, and we deny the open (with STATUS_ACCESS_DENIED). The scanning is done in user mode, so we need to send the requests up from the minifilter. If I don't receive an answer in a timely fashion (and have no second-chance option, as in this testing/debugging setup), I must allow that create. The result is obvious: one missed file, which actually gets copied over to the destination directory.

If this were a synchronous I/O that is being canceled, then I think the copy should NOT go on. But the copy does go on after this STATUS_THREAD_IS_TERMINATING, and many other copies follow it (which, on the other hand, are successfully blocked). Also strange is that such a BAT script would NOT use multiple threads AFAIK, and its primary thread is not closed.

And yes, this seems to be Vista-specific (although we haven't stressed it nearly as much on XP as on Vista).

Any other idea / suggestion?

thank you,

Sandor LUKACS
Virus Analyst, SOFTWIN

> —
> Questions? First check the IFS FAQ at https://www.osronline.com/article.cfm?id=17
>
> You are currently subscribed to ntfsd as: xxxxx@bitdefender.com
> To unsubscribe send a blank email to xxxxx@lists.osr.com
>
>

The ability for FltSendMessage to break out of a wait if the thread is
terminating was introduced only in Vista. In XP, the thread will
continue to hang.

At any rate - FltSendMessage() will break out of the wait if the thread
in whose context it is running is being terminated, not the waiter thread.
This is to let the thread progress in its termination; otherwise hangs will
be perceived. Your design should deal with this case, or else have a kernel
thread that does FltSendMessage() with timeouts.

On a different note, with multiple user threads, it would be much more
efficient to use a completion port that is associated that you poll.

Ravi

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Sandor LUKACS
Sent: Tuesday, June 05, 2007 5:42 AM
To: Windows File Systems Devs Interest List
Subject: Re: [ntfsd] STATUS_THREAD_IS_TERMINATING returned by
FltSendMessage

>“Ravisankar Pudipeddi” wrote
>news:xxxxx@ntfsd…
>On a different note, with multiple user threads, it would be much more
>efficient to use a completion port that is associated that you poll.

Sorry, but waiting overlapped on a CommunicationPort is polling?

My comment was: if you use an *I/O completion* port that is associated
with the communication port handle, instead of having each user thread wait
on the communication port directly, your app would be more scalable. This
avoids, for instance, multiple threads waking up on a single-processor
machine and fighting for the CPU.
Ravi

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of frank
Sent: Tuesday, June 05, 2007 3:22 PM
To: Windows File Systems Devs Interest List
Subject: Re:[ntfsd] STATUS_THREAD_IS_TERMINATING returned by
FltSendMessage

Thank you very much Ravi, Frank.

> At any rate - FltSendMessage() will break out of the wait if the thread in whose context it is running is
> being terminated, not the waiter thread. This is to let the thread progress in its termination - or
> hangs will be perceived. Your design should deal with this case - or you have a kernel thread that
> does FltSendMessage() with timeouts.

This means that in our simplest scenario (the one with N batch scripts running COPY), one of the COPY threads would actually have to be terminating for FltSendMessage to return STATUS_THREAD_IS_TERMINATING. As far as I know, this is not the case, but I will double-check it.

(And yes, we actually have both implementations, with and without I/O completion ports.)

have a nice day,

Sandor

Ah yes, misunderstanding on my side. Apologies and thanks for the comment.

>“Ravisankar Pudipeddi” wrote
>news:xxxxx@ntfsd…
>My comment was: if you use an i/o completion port that is associated
>with the communication port handle, instead of each user thread waiting
>on the communication port directly, your app would be more scalable -
>instead of multiple threads waking up on a single processor machine for
>instance and fighting for CPU.
>Ravi
