FltSendMessage does not always get the response

I saw that there are a couple of threads on this, but none of them fixes my issue. I will gladly accept any help on this.
So the problem:
As usual, I have a minifilter driver and a user-mode application that communicate through the FltXxx/FilterXxx interface. The driver-side code is similar to this:

#define REL_TIMEOUT_IN_SEC(Time) ((Time) * -1 * ((LONGLONG)1 * 10 * 1000 * 1000))
timeout.QuadPart = REL_TIMEOUT_IN_SEC(10);
status = FltSendMessage(Filter, &ClientPort, rawMsg, sizeof(RawMessage),
                        outBuffer, &outputLength, &timeout);
where outBuffer is a plain 4096-byte buffer, and outputLength is set to
outputLength = 4096 + sizeof(FILTER_REPLY_HEADER);
(I also tried outputLength = 4096 on its own, but it made no difference to my problem.)
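
As a sanity check on the timeout value, here is a minimal sketch of the arithmetic behind the macro. It assumes the usual kernel convention that a negative value is a relative timeout counted in 100-nanosecond units. The LONGLONG alias is a stand-in so the snippet builds outside the WDK, and isRelativeTimeout is a hypothetical helper, not part of any API.

```cpp
#include <cstdint>

// Stand-in for the kernel LONGLONG type so this builds without the WDK.
using LONGLONG = std::int64_t;

// Same arithmetic as the driver macro, with the argument parenthesized
// so that expressions such as (5 + 5) expand correctly.
#define REL_TIMEOUT_IN_SEC(Time) ((Time) * -1 * ((LONGLONG)1 * 10 * 1000 * 1000))

// Hypothetical helper: a negative value means "relative to now".
constexpr bool isRelativeTimeout(LONGLONG quadPart) { return quadPart < 0; }

// 10 seconds expressed in 100-ns units, negated to mark it as relative.
constexpr LONGLONG kTenSeconds = REL_TIMEOUT_IN_SEC(10);

static_assert(kTenSeconds == -100000000LL, "10 s is 100,000,000 units of 100 ns");
static_assert(isRelativeTimeout(kTenSeconds), "value must be negative");
```

So the value passed to FltSendMessage really is a 10-second relative timeout, provided the macro argument is a plain number as it is here.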

And here is the user-mode code:

while (run_)
{
    MessageClientWrapper recivedMessage;
    HRESULT r = FilterGetMessage(port_, &recivedMessage.header_, 6500, NULL);
    if (FAILED(r))
    { … error handling… ; continue; }
    switch (recivedMessage.msg_.type_)
    {
    case type1:
        f1(recivedMessage);
    … // other cases
    }
}

struct Replay
{
    FILTER_REPLY_HEADER header_;
    char fileHeader_[4096];
};

void f1(MessageClientWrapper& op)
{
    Replay replay = { { 0, 0 }, { 0 } };
    replay.header_.MessageId = op.header_.MessageId;
    replay.header_.Status = 0;
    replay.fileHeader_ =
    HRESULT result = FilterReplyMessage(port_, (PFILTER_REPLY_HEADER)&replay, sizeof(FILTER_REPLY_HEADER) + 4096);
}
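
Since the reply size is often a source of trouble with FilterReplyMessage, here is a minimal sketch that checks whether structure padding could make sizeof(Replay) differ from sizeof(FILTER_REPLY_HEADER) + 4096. ReplyHeaderStandIn is a stand-in that assumes FILTER_REPLY_HEADER is an NTSTATUS followed by a ULONGLONG. ReplyStandIn, explicitReplySize and trailingPadding are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for FILTER_REPLY_HEADER, assumed to be
// { NTSTATUS Status; ULONGLONG MessageId; }.
struct ReplyHeaderStandIn {
    std::int32_t  Status;
    std::uint64_t MessageId;
};

// Hypothetical reply layout: the header followed by N payload bytes.
template <std::size_t N>
struct ReplyStandIn {
    ReplyHeaderStandIn header_;
    char payload_[N];
};

// Size computed as header + payload, the form used in the
// FilterReplyMessage call above.
template <std::size_t N>
constexpr std::size_t explicitReplySize() {
    return sizeof(ReplyHeaderStandIn) + N;
}

// Bytes the compiler appends after the payload to keep the struct aligned.
template <std::size_t N>
constexpr std::size_t trailingPadding() {
    return sizeof(ReplyStandIn<N>) - explicitReplySize<N>();
}

// With a 4096-byte payload there is no trailing padding.
static_assert(trailingPadding<4096>() == 0, "header + 4096 equals sizeof(struct)");
```

Under that assumption the 4096-byte payload leaves no trailing padding, so both ways of computing the size agree for this layout. A payload whose size is not a multiple of the header alignment (for example trailingPadding<1>()) would give a non-zero result, and that is the case where the two sizes diverge.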

For some of the messages the communication works fine, but from time to time I get STATUS_TIMEOUT in the driver code and STATUS_FLT_NO_WAITER_FOR_REPLY in the user-mode code. There is no way that preparing the reply takes more than 10 seconds.
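
To back up that timing claim with numbers, a minimal sketch of how the handler time could be measured in the message loop. timedMs and exceedsDriverTimeout are hypothetical helpers, and kDriverTimeoutMs simply restates the 10-second timeout from the driver code.

```cpp
#include <chrono>

// Hypothetical helper: runs `handler` and returns the elapsed time in ms.
template <typename Handler>
long long timedMs(Handler&& handler) {
    const auto start = std::chrono::steady_clock::now();
    handler();
    const auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
}

// The driver-side timeout from the FltSendMessage call, in milliseconds.
constexpr long long kDriverTimeoutMs = 10 * 1000;

// True when the measured time is long enough that the driver's wait
// would already have expired.
constexpr bool exceedsDriverTimeout(long long elapsedMs) {
    return elapsedMs >= kDriverTimeoutMs;
}
```

In the loop this could be used as timedMs([&] { f1(recivedMessage); }) with the result logged per message. Note that it only covers the handler itself. As far as I understand the documentation, the FltSendMessage timeout also includes the time a message waits before FilterGetMessage picks it up, so with a single loop thread a slow handler for one message can eat into the budget of the messages queued behind it.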
Can anyone please tell me what I'm doing wrong?