Notifying user mode applications of a device interrupt without a DPC

Randy · June 19, 2017, 4:11pm

I currently have a driver whose job is to notify a waiting single-threaded user mode application of any interrupts sent by the device, which generates interrupts every 1ms. Currently, the driver’s implementation of the problem is pretty standard:

- The user mode application waits on an event for the next interrupt.
- The interrupt handler queues a DPC if it recognizes an interrupt.
- The DPC sets the event, waking up the user mode app.

Long story short, on older machines running Windows 7 or 10, the DPC scheduling latency (that is, the latency between when a DPC is scheduled and when it is executed) is sometimes large enough to miss the next interrupt or two, occurring on average once or twice per minute. For reference, I profiled the scheduling latency using KeQueryPerformanceCounter, which I believe is one of the right ways of going about that based on previous posts on this list. There is nothing else running on these machines, and I’ve set the resolution of the system clock to 1ms in the user mode app using the media API functions. Even unplugging and disabling all network adapters to reduce the number of DPCs from NDIS did not eliminate the problem.
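
For anyone wanting to reproduce this kind of measurement, here is a minimal sketch of the arithmetic in portable C. It assumes the driver captures one performance-counter reading in the ISR and a second one at DPC entry, together with the counter frequency in ticks per second (KeQueryPerformanceCounter can report both). The helper names `ticks_to_us` and `missed_periods` are hypothetical and are not taken from the driver discussed here.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the difference between two counter readings to microseconds.
   freq_hz is the counter frequency in ticks per second. */
static inline uint64_t ticks_to_us(uint64_t start_ticks, uint64_t end_ticks,
                                   uint64_t freq_hz)
{
    assert(freq_hz > 0 && end_ticks >= start_ticks);
    uint64_t delta = end_ticks - start_ticks;
    /* Handle whole seconds and the remainder separately so that
       delta * 1000000 cannot overflow for long intervals. */
    return (delta / freq_hz) * 1000000u
         + ((delta % freq_hz) * 1000000u) / freq_hz;
}

/* Number of whole interrupt periods that elapsed while the DPC was pending. */
static inline uint64_t missed_periods(uint64_t latency_us, uint64_t period_us)
{
    assert(period_us > 0);
    return latency_us / period_us;
}
```

With a 10 MHz counter, a gap of 25,000 ticks between the ISR and the DPC works out to 2,500 us, which at a 1 ms period means two further interrupts arrived before the DPC ran.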

What I would really like to do is get rid of this latency. My first idea was to get rid of the DPC, but the event APIs are not available at DIRQL AFAIK, and it’s generally frowned upon to do any work in the interrupt handler anyway. However, apparently some competitors are somehow magically not experiencing this latency, so I’m wondering if it really is possible to just not use DPCs. I was thinking about using the Interlocked* APIs somehow with possibly a spin lock (not in the interrupt handler though!), but I was wondering if there was another way since that sounds beyond tricky.

I do understand that I’m trying to do something that really shouldn’t be done, and that Windows isn’t a RTOS, but sometimes in the corporate world there’s not always the luxury to say no.

Thanks in advance for any help.

Jeffrey_Tippet_MSFT · June 19, 2017, 5:12pm

If your thread is sleeping, then you need to interact with the scheduler to make it runnable again. There’s not a way to interact with the scheduler from an ISR. At a minimum, you need to drop down to a DPC, so you can call KeSetEvent or whatever.

If there’s literally work to do every 1ms, and missing work is catastrophic, then the usermode thread should never sleep. Have it spin while it doesn’t see any work to do. You won’t need an event or any scheduling primitive; an interlocked flag that’s visible to both the ISR and the usermode thread is sufficient.

Spinning efficiently is harder than it looks; you need to avoid issuing too many interlocked operations, lest you overwhelm the memory bus. See also YieldProcessor.
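
A minimal sketch of that polling pattern, written with portable C11 atomics rather than the Windows Interlocked* APIs so that it stands alone. Everything named here (`shared_page_t`, `signal_interrupt`, `wait_for_interrupt`, `cpu_relax`) is hypothetical, and how the page gets shared between the driver and the application is deliberately left out. The point it illustrates is that the consumer spins on plain atomic loads and only the producer performs a read-modify-write, with a pause hint in the loop playing the role that YieldProcessor plays on Windows.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical layout of memory visible to both the ISR and the app. */
typedef struct {
    _Atomic uint32_t irq_count;   /* incremented once per interrupt */
} shared_page_t;

/* Spin-wait hint; a no-op on compilers/CPUs without the x86 builtin. */
static inline void cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_ia32_pause();
#endif
}

/* Producer side: stands in for what the ISR would do. */
static inline void signal_interrupt(shared_page_t *p)
{
    assert(p);
    atomic_fetch_add_explicit(&p->irq_count, 1, memory_order_release);
}

/* Consumer side: spin on plain loads until the count changes or the
   spin budget runs out. Returns how many interrupts arrived since
   *last_seen (0 if the budget expired). A result above 1 means the
   consumer fell behind by at least one interrupt. */
static inline uint32_t wait_for_interrupt(shared_page_t *p,
                                          uint32_t *last_seen,
                                          uint64_t max_spins)
{
    assert(p && last_seen);
    for (uint64_t i = 0; i < max_spins; i++) {
        uint32_t now = atomic_load_explicit(&p->irq_count,
                                            memory_order_acquire);
        if (now != *last_seen) {
            uint32_t delta = now - *last_seen;  /* unsigned wrap is fine */
            *last_seen = now;
            return delta;
        }
        cpu_relax();
    }
    return 0;
}
```

Using a counter instead of a single boolean flag is a design choice of this sketch: it lets the application notice when it has skipped an interrupt instead of silently losing it.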

If the work is bursty, you can write code that kicks into and out of the above polling mode. Use a Sleep to exit polling mode when there appears to be no work to do, and a DPC + KeSetEvent to kick the thread back into polling mode when more work arrives.
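
The mode switching can be sketched as a small state machine. This is only an illustration under assumptions: `hybrid_wait_t`, `idle_limit` and the function names are hypothetical, and the handshake that tells the driver whether the thread is currently blocked (so it knows when a DPC + KeSetEvent is needed) is not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { MODE_POLLING, MODE_BLOCKED } wait_mode_t;

typedef struct {
    wait_mode_t mode;
    uint32_t idle_polls;   /* consecutive polls that found no work */
    uint32_t idle_limit;   /* hypothetical tuning knob */
} hybrid_wait_t;

static inline void hybrid_init(hybrid_wait_t *w, uint32_t idle_limit)
{
    assert(w && idle_limit > 0);
    w->mode = MODE_POLLING;
    w->idle_polls = 0;
    w->idle_limit = idle_limit;
}

/* Call after each poll attempt. Returns the mode for the next iteration:
   stay in polling mode while work keeps arriving, drop to blocked mode
   after idle_limit consecutive empty polls. */
static inline wait_mode_t hybrid_after_poll(hybrid_wait_t *w, bool found_work)
{
    assert(w && w->mode == MODE_POLLING);
    if (found_work) {
        w->idle_polls = 0;
    } else if (++w->idle_polls >= w->idle_limit) {
        w->mode = MODE_BLOCKED;   /* caller now sleeps or waits on the event */
    }
    return w->mode;
}

/* Call when the blocking wait returns because the event was signaled. */
static inline void hybrid_on_wake(hybrid_wait_t *w)
{
    assert(w);
    w->mode = MODE_POLLING;
    w->idle_polls = 0;
}
```

The value of `idle_limit` trades wasted CPU against the chance of paying the wake-up latency at the start of the next burst, so it would need tuning per workload.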

If the work is paced regularly, you can try using a usermode timer. But remember that the clock tick is itself not going to be better than 1ms (and may be worse on some hardware), so a timer has its own problems. If you set a 1ms timer, you might get unlucky and the timer tick is out of phase with the work by 990us. Then your thread only has 10us remaining in its deadline. Since it’s unlikely that your hardware’s clock is disciplined to the system’s clock, you’ll probably drift over time, which means you have to endure all possible phases, including the bad ones.
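
The phase arithmetic can be made concrete with a short sketch. The function names are hypothetical, and the 50 ppm drift figure in the note below is an assumed value for illustration, not a measurement of any real device.

```c
#include <assert.h>
#include <stdint.h>

/* Time left before the next device interrupt when the timer fires
   phase_us after the most recent one. */
static inline uint32_t slack_us(uint32_t period_us, uint32_t phase_us)
{
    assert(period_us > 0);
    return period_us - (phase_us % period_us);
}

/* Seconds for the timer phase to slide through one full device period,
   given the relative drift between the two clocks in parts per million.
   1 ppm of drift shifts the phase by 1 us every second. */
static inline double seconds_per_phase_sweep(uint32_t period_us,
                                             double drift_ppm)
{
    assert(drift_ppm > 0.0);
    return (double)period_us / drift_ppm;
}
```

For the case above, `slack_us(1000, 990)` gives the 10 us of remaining deadline. If the two clocks drifted apart by an assumed 50 ppm, the phase would sweep through a whole 1 ms period about every 20 seconds, so the bad phases would recur regularly rather than being a one-time piece of bad luck.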

Note that the thread scheduling quantum on Windows is typically an order of magnitude greater than 1ms. So if your thread ever gets preempted by another CPU-bound thread, your thread will miss several deadlines. Depending on the severity of that, you’ll need to take variously aggressive measures to win the scheduler’s undivided attention. One aggressive possibility is to declare that you need a CPU all to yourself. That’s okay for an industrial control unit, not so okay for a consumer-marketed device that’ll ship to a million desktop users.

Although KeQueryPerformanceCounter is fine enough, it’s hard to get a big picture “feel” for how things are going with it. My favorite tool for this sort of analysis is WPA. With the right view, you can get it to show you a nice little visualization of the ISR, DPC, and usermode thread. It’ll show you how much latency there was from the ISR to the DPC. If there are outliers, you can look and see whose code was hogging the CPU. WPA can also give you a very high-level hint at what your competitors are doing.

1ms is pretty low. If it’s at all possible to redefine the problem, you’ll have better success if you can increase the period by an order of magnitude. Then you’re in the range of what the graphics & audio stacks manage to do with reasonable success rates across a wide diversity of commodity hardware. I’m not saying 1ms deadlines are out of the question, just that they’re high cost.

As you seem to have noticed, it’s not easy to extract RTOS guarantees from a non-RTOS kernel. You can never reach the point where you miss zero deadlines. It’s a game of reducing the % of deadlines that you miss, and increasing the % of hardware configurations that work “good enough”. You’ll want to ask yourself “how much CPU am I prepared to waste in order to miss x% fewer deadlines?” As well as “how many constraints can I impose on the system?” and “am I interested in baby-sitting a fragile algorithm that is specifically tuned for the current scheduling details of the Windows kernel?”

Randy · June 19, 2017, 5:56pm

Thanks for the quick response, Jeff.

I’ll check out using WPA for measuring the latency. I was ignorant of the dpcisr option, so I’ll give that a spin.

I’ll go ahead and attempt to use an interlocked flag then and experiment with the best way to wait efficiently. I’m not sure the UM timer would work out for the reasons you listed, but I’ll look more into that as well.

MBond · June 19, 2017, 7:21pm

It is possible to do many things but

> However, apparently some competitors are somehow magically not experiencing this latency

Whatever you have been told in this regard is obviously not true. It is entirely possible that your HW / driver are of poor quality, but having a scheduling latency of a few ms is certainly not abnormal.

Modern hardware/drivers where performance is an issue will attempt to coalesce interrupts and even alternate between interrupt and polling mode depending on the data rate. The problem you have is not the presence of a DPC per se but, almost assuredly, what you are doing in it.

A better description of what your environment looks like will elicit better answers, but for a start, using an event to communicate with UM is usually the wrong approach. Given that you say “every 1 ms”, we understand that you are working with a deterministic data rate, and given that you have tried disabling things, you have some control over the operating environment. Given this, please describe some of these things so that we can help you better.

Randy · June 19, 2017, 9:07pm

> It is possible to do many things but
>
> However, apparently some competitors are somehow magically not experiencing this latency
>
> Whatever you have been told in this regard is obviously not true. It is entirely possible that your HW / driver are of poor quality, but having a scheduling latency of a few ms is certainly not abnormal.

With respect, this was the type of answer I was hoping to avoid. I understand that my request is unusual, and I have a good feel of the bottlenecks of the driver after strenuously profiling it.

Scott_Noone_OSR · June 20, 2017, 9:36am

Note that you can also add your own custom events to the trace. So, for
example, you can log an event each time you miss your deadline and then use
that as a guide while looking through the resulting data. See here:

https://www.osr.com/nt-insider/2015-issue1/happiness-xperf/

-scott
OSR
@OSRDrivers

Peter_Viscarola_OSR · June 20, 2017, 5:37pm

What Mr. Tippet said. He knows what he’s talking about. I agree 100%, particularly, about this:

and even more particularly with his comment that indicates “this is harder to do than it might at first seem.”

You SURE? Discrete events? 1ms apart? On Windows?

Again, Mr. Tippet speaks the truth:

Can you re-balance the work done in user-mode and in kernel-mode? Consider, perhaps, doing some of the work in your DpcForIsr instead of waking the thread?

I’ve been doing this a while, and I personally would not sign up to trigger a user-mode app every 1ms using discrete events. In my experience, if something is REALLY this difficult to do, either I’m doing the wrong thing or doing the right thing with the wrong tools.

Peter
OSR
@OSRDrivers

anton_bassov · June 20, 2017, 5:50pm

You make fundamentally wrong assumptions…

First of all, you have to think about what happens after you signal an event from a DPC.

Depending on its priority, your DPC will go either to the front or the back of the DPC queue. Let’s say you are just desperate, so you queue it to the front and, hence, it signals the event pretty shortly. Does it necessarily imply that your target thread starts running straight away? Think again.

To begin with, thread processing is not going to happen until the DPC queue on a given CPU gets flushed, and it may take quite a while in some situations (for example, consider heavy network traffic).

Certainly, another CPU may run the target thread while you still process the DPC queue, but this part depends on how the target thread’s priority fares against those of other threads in the system.

This is where your latency comes from - it has nothing to do with DPCs. What you have to do in order to eliminate latency is to raise your target thread’s priority to maximum. However, beware - if your target thread does not yield the CPU pretty shortly and/or the event it blocks on gets signaled by the DPC a bit too often, the system may become “not-so-user-friendly”, so to say. For example, its GUI may freeze on a regular basis, file caches will be left unflushed, etc.

Anton Bassov

Randy · June 20, 2017, 6:08pm

@Scott, I’ll definitely spend some time doing this in the future. I don’t know why this isn’t a thing in WPA itself yet.

@Peter, the requirement is from a customer, and the normal pattern of using DPCs+events on faster, pretty modern machines almost always meets the requirement. However, the customer’s test environment as well as my much slower test machines run into the DPC scheduling latency issue. Frankly I may just have to tell the customer to abandon Windows in favor of something else.

@Anton, the target thread is already running at REALTIME_PRIORITY. That’s the only way it can keep up with events every 1ms.

Don_Burn · June 20, 2017, 8:03pm

But as Anton pointed out there are things you can do to improve the DPC
response. Also, as has been pointed out 1ms is pretty challenging for
Windows, you can’t normally get there. Mr Tippet has it right, if you are
really going to do this, you need to have the user thread spin on an
interlocked operation, and accept the fact that the CPU it runs on will do
nothing else.

Don Burn
Windows Driver Consulting
Website: http://www.windrvr.com

MBond · June 20, 2017, 9:28pm

Well, let’s analyze that statement a bit more.

Assume that you are correct and that you have extensively analyzed your driver and the timing that it exhibits under at least one particular load on one system. What is the bottleneck that you feel is the problem? What process have you observed pre-empting your code (or preventing you from being scheduled) and therefore causing the unwanted delays?

I don’t know the answers to these questions and I am not asking them to be difficult. I am just pointing out that if you had effectively done this kind of analysis, you wouldn’t have asked the question as you did.

I will also reiterate that using events to trigger UM code in this manner is an inherently low performance design. You may not be able to do anything about this, but I would be looking at this part before attempting anything fancy in the ISR / DPC code.

OSR_Community_User · June 20, 2017, 10:02pm

I think that the problem can be solved if the hardware interrupts not once
every ms but 10x every ms.

This is a hardware issue not a software issue. With the quartz oscillators
found these days it should be possible to develop such a device and have a
much lower number of missed interrupts.

MBond · June 20, 2017, 10:03pm

No, using the realtime priority class is the only way that you can keep up the illusion that this works.

The design is broken. This is not a Windows thing, but may be much more exposed on Windows than on other platforms.

Without even looking at your requirements or hardware specs, I can virtually assure you that this can be done, but requires a different design.

anton_bassov · June 21, 2017, 6:15am

> Frankly I may just have to tell the customer to abandon Windows in favor of something else.

Seems to be the best idea - a REALTIME_PRIORITY thread blocking on an event that gets signaled every ms… well, it looks like the system does not really have a chance to do anything apart from running your app, does it? Therefore, it might not be a bad idea to replace the whole thing with an RTOS or even with a bare-metal application under these circumstances…

Anton Bassov

Daniel_Terhell · June 21, 2017, 10:13am

You need to carefully select the hardware and measure the drivers that run a
system, then it’s achievable. If this should be able to run anywhere, you
are out of luck.

//Daniel

anton_bassov · June 21, 2017, 10:41am

> I think that the problem can be solved if the hardware interrupts not once every ms but 10x every ms.

Long live interrupt storm!!!

Anton Bassov

Peter_Viscarola_OSR · June 21, 2017, 12:29pm

You know… we LOVE questions like this on NTDEV. All the complexity, with few of the details and none of the consequences (to US who comment) of actually being on the hook to make it work. Everyone has an opinion.

So… also sprach Mr. Terhell:

This.

If you look at the ISR-to-DPC latency profile on Windows, you’ll find the average latency is very good… a few micro-seconds… but the distribution is very wide. It’s the outliers that’ll kill you. What’s worse is that you can think you have things working great, and then a customer updates a driver (intentionally or using WU), and all of a sudden that OTHER driver’s DPC behavior changes dramatically and your driver no longer meets the time requirement. We have seen this many, many, many, times in the real world over the years on Windows.
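
The gap between the average and the outliers is easy to see once the samples are bucketed. Below is a minimal sketch of such an accumulator, assuming latencies have already been converted to microseconds; the type and function names and the power-of-two bucket layout are hypothetical choices for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bucket i counts latencies in [2^i, 2^(i+1)) us. Bucket 0 also holds
   a latency of 0, and the last bucket holds everything larger. */
#define LAT_BUCKETS 16

typedef struct {
    uint64_t count;
    uint64_t sum_us;
    uint64_t max_us;
    uint64_t over_deadline;          /* samples at or beyond the deadline */
    uint64_t bucket[LAT_BUCKETS];
} latency_stats_t;

static inline void latency_record(latency_stats_t *s, uint64_t latency_us,
                                  uint64_t deadline_us)
{
    assert(s);
    unsigned b = 0;
    /* b becomes floor(log2(latency_us)), clamped to the last bucket. */
    for (uint64_t v = latency_us; v > 1 && b < LAT_BUCKETS - 1; v >>= 1)
        b++;
    s->bucket[b]++;
    s->count++;
    s->sum_us += latency_us;
    if (latency_us > s->max_us)
        s->max_us = latency_us;
    if (latency_us >= deadline_us)
        s->over_deadline++;
}

static inline uint64_t latency_mean_us(const latency_stats_t *s)
{
    assert(s);
    return s->count ? s->sum_us / s->count : 0;
}
```

Feeding it 999 samples of 5 us and a single sample of 2005 us yields a mean of 7 us, which looks healthy, while `max_us` and `over_deadline` show that one sample blew straight through a 1 ms deadline.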

If I was tasked to do this, I would eliminate the DPC completely and would use some shared memory directly between the ISR and the user-mode app … which is what Mr. Tippet suggested originally.

Peter
OSR
@OSRDrivers

prabhakar_vinayagam · June 21, 2017, 2:44pm

Have the user application create an event and a thread, and pass the event handle to the driver through an IOCTL. On the driver side, set up the corresponding event object. Once the interrupt is generated, have the ISR queue a DPC, and set the event in the DPC routine. The thread that has been waiting on the event (wait for single object) then gets the notification. In that thread, call whatever function you need and increment a counter each time the thread executes, which indicates that the interrupt was generated.

Randy · June 21, 2017, 2:50pm

Thanks to everyone for all the feedback. To reiterate, I don’t have the luxury of changing the design of notifying a UM app of the interrupt, so I’m attempting the interlocked flag pattern. I have a feeling this will have its own set of problems, so I may end up giving up and recommending a different OS.

Pavel_A1 · June 21, 2017, 7:47pm

> I don’t have the luxury of changing the design

… but swapping the OS? Isn’t it a much bigger luxury?

– pa