CPU utilization goes up to 100% using NT's Executive built-in worker thread pool

Palak_Kapoor · January 15, 2018, 4:20am

I’m trying to create worker threads in my disk class filter driver using NT’s Executive built-in worker thread pool to process read/write commands from a custom IOCTL sent from the application. While doing this, the CPU utilization goes up to 100%, which leads to a system hang. Kindly suggest what can be done to avoid this.

Palak_Kapoor · January 15, 2018, 4:26am

I am using the first approach described in the article linked below to create worker threads:

http://www.osronline.com/article.cfm?id=65

Jamey_Kirby · January 15, 2018, 11:17am

Need a little more detail. What are you waiting on in the thread?

–
NTDEV is sponsored by OSR

Visit the list online at: <http://www.osronline.com/showlists.cfm?list=ntdev>

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at <http://www.osronline.com/page.cfm?name=ListServer>

Jamey_Kirby · January 15, 2018, 11:19am

Also, if you are using work routines rather than a private thread, you can
run into issues processing IO in a worker. The kernel maintains a limited
number of worker threads, and it could be easy to use them all up. I
suggest using PsCreateSystemThread() and managing your own threads.

Tim_Roberts · January 15, 2018, 11:24am

How are you waiting for the next request in your thread? If, for
example, you pass an event handle to KeWaitForSingleObject with a zero
timeout, you would get 100% CPU utilization. You need to make sure your
thread releases the CPU at some point.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Gabriel_Bercea · January 15, 2018, 1:24pm

Do not use work items for your task. Create your own dedicated
threads to handle your tasks.

Gabriel
www.kasardia.com

Palak_Kapoor · January 17, 2018, 12:59am

I’m sending read/write commands to my driver from the application through a custom device IOCTL. In the disk class filter driver I mark the IRP as pending, initialize a notification event with KeInitializeEvent, and use an NT Executive work item to start a thread routine. In the thread start routine I call KeWaitForSingleObject and then IoSkipCurrentIrpStackLocation. The IRP is then sent with IoCallDriver to the next lower driver.

If I pass any value other than 0 to KeWaitForSingleObject, a crash happens at the IoSkipCurrentIrpStackLocation line with the error IRQL_NOT_LESS_OR_EQUAL. If 0 is used, the system hangs. In both cases the CPU utilization is around 100%. The same thing happens if PsCreateSystemThread is used.

The code in filter driver is as follows:

pstThreadExtension = ExAllocatePoolWithTag(NonPagedPool, sizeof(THREAD_EXTENSION), POOL_TAG_THREAD_EXTENSION);
pstThreadExtension->pstDeviceExtension = pstDeviceExtension;
pstThreadExtension->pstIrp= pstIrp;
KeInitializeEvent(&pstThreadExtension->stNotificationEvent, NotificationEvent, FALSE);

pstThreadExtension->pWorkItem = ExAllocatePool(NonPagedPool, sizeof(WORK_QUEUE_ITEM));
pstThreadExtension->WorkItemFlag = TRUE;

ExInitializeWorkItem((PIO_WORKITEM)pstThreadExtension->pWorkItem, ThreadStartRoutine, pstThreadExtension);
ExQueueWorkItem(pstThreadExtension->pWorkItem, DelayedWorkQueue);
IoMarkIrpPending(pstIrp);
KeSetEvent(&pstThreadExtension->stNotificationEvent, IO_NO_INCREMENT, FALSE);

VOID
ThreadStartRoutine(
    __in PVOID pvThreadContext
    )
{
    PTHREAD_EXTENSION pstThreadExtension = (PTHREAD_EXTENSION) pvThreadContext;
    PDEVICE_EXTENSION pstDeviceExtension = NULL;

    ASSERT(pstThreadExtension != NULL);
    pstDeviceExtension = pstThreadExtension->pstDeviceExtension;
    KeWaitForSingleObject(&pstThreadExtension->stNotificationEvent, Executive, KernelMode, FALSE, NULL);

    IoSkipCurrentIrpStackLocation(pstThreadExtension->pstIrp);

    IoCallDriver(pstDeviceExtension->pstNextLowerDriver, pstThreadExtension->pstIrp);
    IoReleaseRemoveLock(&pstDeviceExtension->stRemoveLock, pstThreadExtension->pstIrp);
    ExFreePool(pstThreadExtension->pWorkItem);
    ExFreePoolWithTag(pstThreadExtension, POOL_TAG_THREAD_EXTENSION);

    return;
}

Tim_Roberts · January 17, 2018, 2:29pm

Do you mean 0 or do you mean NULL? One of the oddities of
KeWaitForSingleObject is that those have two very different meanings.
If you pass NULL for the Timeout, it waits forever. If you pass a pointer
to a LARGE_INTEGER containing 0, it does not wait at all.
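
The NULL-versus-zero distinction can be pictured with a small user-mode model. This is only a sketch built on POSIX threads, not kernel code: `model_event_t`, `model_wait`, and the `MODEL_*` status codes are names made up for illustration, and the model covers just the two cases discussed here (a NULL timeout pointer and a pointer to zero).

```c
/* User-mode model of the Timeout argument; NOT kernel code.
 * Every name here is hypothetical and exists only for illustration. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum { MODEL_SUCCESS = 0, MODEL_TIMEOUT = 1 };

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    bool            signaled;   /* notification-style: stays set once signaled */
} model_event_t;

void model_event_init(model_event_t *ev) {
    pthread_mutex_init(&ev->lock, NULL);
    pthread_cond_init(&ev->cond, NULL);
    ev->signaled = false;
}

void model_event_set(model_event_t *ev) {
    pthread_mutex_lock(&ev->lock);
    ev->signaled = true;
    pthread_cond_broadcast(&ev->cond);
    pthread_mutex_unlock(&ev->lock);
}

/* timeout == NULL : block until the event is signaled.
 * *timeout == 0   : check once and return immediately.
 * Any other value is outside the scope of this sketch. */
int model_wait(model_event_t *ev, const int64_t *timeout) {
    assert(timeout == NULL || *timeout == 0);
    int status = MODEL_SUCCESS;
    pthread_mutex_lock(&ev->lock);
    if (timeout == NULL) {
        while (!ev->signaled)
            pthread_cond_wait(&ev->cond, &ev->lock);  /* sleeps, uses no CPU */
    } else if (!ev->signaled) {
        status = MODEL_TIMEOUT;   /* did not wait at all */
    }
    pthread_mutex_unlock(&ev->lock);
    return status;
}
```

Calling `model_wait` with a pointer to zero on an unsignaled event returns `MODEL_TIMEOUT` straight away, so a caller that ignores the return value proceeds as if the event had been set, and a caller that retries in a loop never gives up the CPU.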

You should probably call IoMarkIrpPending before queueing the work
item. Yes, you’re supposedly waiting for the event before you pass the
IRP along, but if you marked the IRP first, you wouldn’t have to wait.

What does your code do after it calls KeSetEvent? Are you returning
STATUS_PENDING? Is it possible your code path is accidentally calling
IoCompleteRequest?

Why spawn off a work item for this? Is there lots more you aren’t
showing us? You aren’t setting a completion routine (and you CAN’T do
so with IoSkipCurrentIrpStackLocation), so what’s the point of the
callback? Are you just delaying?

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Jamey_Kirby · January 18, 2018, 9:04am

Also, work items are not the best way to do this. Use a system thread. Work
items are for infrequent use. You’ll have other issues down the road.

Gabriel_Bercea · January 18, 2018, 11:58am

As I read it, your problem is an IRP processing issue, not a 100% CPU issue
(which it also is).
You need to pay attention to one aspect that might seem trivial, but that
a lot of people nonetheless seem to get wrong.
If you mark an IRP pending, you must return STATUS_PENDING from that
dispatch routine.
In your case it is not clear whether you are doing that or not.

In fact, it is not really clear what you are doing there at all.

If you plan to pend your IRP in the dispatch routine just so you can
immediately send it down in the worker thread, I am not sure why you would
do that in the first place. Just “skip and forget” in the dispatch routine.
There are “tons” of ways to play around with IRP processing and how to send
IRPs down the stack, so I am not really sure what you want to do.

Again, if you are sending from your UM application to your CDO then you
don’t really need to do much. If you want to process that async, pend the
IRP, return STATUS_PENDING from your dispatch routine,
and from a worker thread all you need to do is call:
IoGetCurrentIrpStackLocation(irp) -> optional, if you have any need for the
irpSp
IoCompleteRequest() -> this will release your UM application

If you are processing this in some storage stack, then the processing of
the IRP is not trivial anymore, and you need to figure out what you want to
do and give us more details.

Gabriel.
www.kasardia.com

Palak_Kapoor · January 20, 2018, 6:46am

I’m doing this as an experiment to compare the reliability of PsCreateSystemThread and work items. But in both cases I’m getting 100% CPU utilization and a system hang. Even if I pass NULL to KeWaitForSingleObject, CPU utilization is 100%. And yes, I’m returning STATUS_PENDING.

The entire code in disk class filter driver is as follows:

NTSTATUS
FilterDispatchIo(
    PDEVICE_OBJECT pstDeviceObject,
    PIRP pstIrp
    )
{
    NTSTATUS ntStatus;
    HANDLE hThreadHandle;
    PTHREAD_EXTENSION pstThreadExtension = NULL;
    PDEVICE_EXTENSION pstDeviceExtension = (PDEVICE_EXTENSION)
        pstDeviceObject->DeviceExtension;

    do
    {
        IoAcquireRemoveLock(&pstDeviceExtension->stRemoveLock, pstIrp);
        ntStatus = STATUS_SUCCESS;
        pstThreadExtension = ExAllocatePoolWithTag(NonPagedPool, sizeof(THREAD_EXTENSION),
            POOL_TAG_THREAD_EXTENSION);
        if (NULL == pstThreadExtension)
        {
            break;
        }

        RtlZeroMemory(pstThreadExtension, sizeof(THREAD_EXTENSION));
        pstThreadExtension->pstDeviceExtension = pstDeviceExtension;
        pstThreadExtension->pstIrp = pstIrp;
        KeInitializeEvent(&pstThreadExtension->stNotificationEvent, NotificationEvent,
            FALSE);

        if (STATUS_SUCCESS != PsCreateSystemThread(&hThreadHandle,
            THREAD_ALL_ACCESS, NULL, NULL, NULL, ThreadStartRoutine, pstThreadExtension))
        {
            break;
        }

        ZwClose(hThreadHandle);

        IoMarkIrpPending(pstIrp);
        KeSetEvent(&pstThreadExtension->stNotificationEvent, IO_NO_INCREMENT, FALSE);
        ntStatus = STATUS_PENDING;
    } while (FALSE);

    return ntStatus;
}

VOID
ThreadStartRoutine(
    __in PVOID pvThreadContext
    )
{
    PTHREAD_EXTENSION pstThreadExtension = (PTHREAD_EXTENSION) pvThreadContext;
    PDEVICE_EXTENSION pstDeviceExtension = NULL;
    LARGE_INTEGER timeout = { 0 };

    ASSERT(pstThreadExtension != NULL);
    pstDeviceExtension = pstThreadExtension->pstDeviceExtension;
    KeWaitForSingleObject(&pstThreadExtension->stNotificationEvent, Executive, KernelMode, FALSE,
        &timeout);
    IoSkipCurrentIrpStackLocation(pstThreadExtension->pstIrp);
    IoCallDriver(pstDeviceExtension->pstNextLowerDriver, pstThreadExtension->pstIrp);
    IoReleaseRemoveLock(&pstDeviceExtension->stRemoveLock, pstThreadExtension->pstIrp);

    ExFreePoolWithTag(pstThreadExtension, POOL_TAG_THREAD_EXTENSION);

    return;
}

Palak_Kapoor · January 20, 2018, 6:51am

This experiment is for processing asynchronous read/write commands.

Jamey_Kirby · January 20, 2018, 8:41am

So much wrong here.

Tim_Roberts · January 20, 2018, 6:16pm

KeWaitForSingleObject, when given a timeout value of 0, always returns immediately. It does not wait for anything. If it was able to acquire the resource, it returns STATUS_SUCCESS. Otherwise, it returns STATUS_TIMEOUT without acquiring the resource. You’re not checking the return value, so you don’t have any clue whether you acquired the lock or not, which means you don’t know if you need to release the lock or not.

Where is it hanging? Have you broken in with the kernel debugger to check the processor states to see where the hang is?
—
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Gabriel_Bercea · January 21, 2018, 5:18am

This is simply wrong.
First of all, it bugs me that you create a thread every time your dispatch
routine is called.
Just create this worker thread, or a set of threads, from DriverEntry and
have it/them process pending IRPs through an internal queuing mechanism,
rather than the way you do it now, which is one thread per dispatch call.
That alone could cause 100% CPU usage. It simply hurts to watch.
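
As a rough illustration of the "one long-lived worker plus an internal queue" idea, here is a user-mode model using POSIX threads. It is an analogy, not driver code: in a driver the thread would presumably come from PsCreateSystemThread and block in KeWaitForSingleObject with a NULL timeout, and every name below (`worker_t`, `worker_submit`, and so on) is hypothetical.

```c
/* User-mode model with POSIX threads; NOT kernel code.
 * All type and function names are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  not_empty;
    int       items[QUEUE_CAPACITY];  /* stand-ins for pended requests */
    size_t    head, count;
    bool      stopping;
    int       processed;              /* requests handled by the worker */
    pthread_t thread;
} worker_t;

/* Runs for the lifetime of the worker: sleeps until work arrives. */
void *worker_main(void *arg) {
    worker_t *w = arg;
    pthread_mutex_lock(&w->lock);
    for (;;) {
        while (w->count == 0 && !w->stopping)
            pthread_cond_wait(&w->not_empty, &w->lock);  /* blocks, no spinning */
        if (w->count == 0)            /* stopping and fully drained */
            break;
        int request = w->items[w->head];
        w->head = (w->head + 1) % QUEUE_CAPACITY;
        w->count--;
        pthread_mutex_unlock(&w->lock);
        (void)request;                /* real code would forward or complete it */
        pthread_mutex_lock(&w->lock);
        w->processed++;
    }
    pthread_mutex_unlock(&w->lock);
    return NULL;
}

/* Create the single worker once, up front. Returns 0 on success. */
int worker_start(worker_t *w) {
    assert(w != NULL);
    pthread_mutex_init(&w->lock, NULL);
    pthread_cond_init(&w->not_empty, NULL);
    w->head = w->count = 0;
    w->stopping = false;
    w->processed = 0;
    return pthread_create(&w->thread, NULL, worker_main, w);
}

/* Called per request: enqueue and wake the worker; no thread is created. */
bool worker_submit(worker_t *w, int request) {
    bool queued = false;
    pthread_mutex_lock(&w->lock);
    if (!w->stopping && w->count < QUEUE_CAPACITY) {
        w->items[(w->head + w->count) % QUEUE_CAPACITY] = request;
        w->count++;
        queued = true;
        pthread_cond_signal(&w->not_empty);
    }
    pthread_mutex_unlock(&w->lock);
    return queued;
}

/* Ask the worker to drain the queue and exit; returns the processed count. */
int worker_stop(worker_t *w) {
    pthread_mutex_lock(&w->lock);
    w->stopping = true;
    pthread_cond_broadcast(&w->not_empty);
    pthread_mutex_unlock(&w->lock);
    pthread_join(w->thread, NULL);
    return w->processed;
}
```

The point of the shape is that the per-request path only enqueues and signals, while the one worker sleeps whenever the queue is empty, so an idle system costs no CPU time.
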
Moving along, it would seem that you do not really understand how events,
threads, scheduling, synchronization and async behavior work, or at
least you are making some assumptions which are simply not right.
What do I mean by that?
In the dispatch routine you are:

1) Creating a thread
2) Marking the IRP pending
3) Setting the event
4) Returning pending status

while on the thread you are:
a) Waiting for the event with a timeout of 0, which is not waiting at all
b) Skip stack location
c) IoCallDriver

The actions from the two threads could happen in any interleaved order:
1 -> a -> b -> c -> 2 -> etc., or 1 -> 2 -> a -> b -> 3 -> 4 -> c, and
so on.

Your assumption seems to be that the thread will start only after you mark
your IRP pending, which is not true. As soon as PsCreateSystemThread returns
you can consider your thread already running; in some cases it could
already be finished. Thus, by doing this you will call the lower drivers
with an IRP which is not pending. Worse, you are sending the IRP down the
stack from the worker, but in the dispatch routine you continue to “modify”
the IRP (which you don’t own anymore since the worker sent it down) by
marking it pending, while the thread has already skipped the stack location
and called the lower drivers.
I hope you understand the illegalities you are committing here.
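
The ordering problem can be made concrete by enumerating the interleavings. The following single-threaded sketch is only a model under stated assumptions: steps 1 to 4 are the dispatch routine's actions (create thread, mark pending, set event, return), steps a to c are the thread's (wait, skip stack location, IoCallDriver), the thread cannot run before step 1, and an ordering is counted as unsafe when step c comes before step 2. The names `explore`, `count_orderings`, and `race_count_t` are invented for the example.

```c
/* Single-threaded enumeration of possible step orderings.
 * A model only, not driver code; all names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

enum { DISPATCH_STEPS = 4, THREAD_STEPS = 3 };

typedef struct {
    int total;    /* orderings that satisfy the constraints */
    int unsafe;   /* orderings where step c precedes step 2 */
} race_count_t;

/* d = dispatch steps already done (1 create, 2 mark pending, 3 set event, 4 return)
 * t = thread steps already done   (a wait, b skip stack location, c IoCallDriver) */
static void explore(int d, int t, bool wait_blocks, bool unsafe, race_count_t *out) {
    assert(d <= DISPATCH_STEPS && t <= THREAD_STEPS);
    if (d == DISPATCH_STEPS && t == THREAD_STEPS) {
        out->total++;
        if (unsafe)
            out->unsafe++;
        return;
    }
    if (d < DISPATCH_STEPS)
        explore(d + 1, t, wait_blocks, unsafe, out);   /* dispatch runs next */
    if (t < THREAD_STEPS) {
        /* The thread cannot run before it is created (step 1). A wait that
         * really blocks also cannot finish before the event is set (step 3). */
        bool can_run = d >= 1 && !(t == 0 && wait_blocks && d < 3);
        if (can_run) {
            /* About to run step c while step 2 has not happened yet? */
            bool now_unsafe = unsafe || (t == 2 && d < 2);
            explore(d, t + 1, wait_blocks, now_unsafe, out);
        }
    }
}

/* wait_blocks = false models a zero timeout; true models a wait that blocks. */
race_count_t count_orderings(bool wait_blocks) {
    race_count_t result = {0, 0};
    explore(0, 0, wait_blocks, false, &result);
    return result;
}
```

In this model `count_orderings(false)` finds 1 unsafe ordering among 20, while `count_orderings(true)`, where the wait really blocks until the event is set, finds none among 4. The counts say nothing about likelihood, only that the bad ordering is possible when the wait returns immediately.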

I am not really sure what to advise you from here. Just try to make
everything work from within the dispatch routine only. Use a completion
routine and then skip and forget. And do your processing in the completion
callback. Don’t forget to mark the IRP pending from there though if you are
taking this exact approach.

Good luck,
Gabriel

Palak_Kapoor · January 21, 2018, 5:32am

Thanks. I’ll try to rectify my code as per your suggestion.