Periodic DPC scheduling latency spikes on certain logical processors

Tomas_Whitlock · February 5, 2016, 11:44am

Hi all,

I’m seeing a very regular, periodic spike in the latency between scheduling a DPC (from an ISR) and the DPC beginning to execute, on certain logical processors (other than the one where the ISR executes). The period of these latency spikes is about 330 ms and the worst-case latency appears to be around 15 ms. Average-case latency is on the order of a microsecond.

The background is that a customer with a real-time application using our drivers experiences occasional data loss. As an experiment, we tried distributing various DPCs to different logical processors, in case some driver was occasionally hogging the logical processor where the ISR executes. That is when we noticed these strange latency spikes. While KeSetTargetProcessorDpcEx is best avoided most of the time, we don’t understand why using it should produce such poor worst-case DPC latency (~15 ms), or why the spikes should be periodic.

All power management options in the BIOS and Windows are off; with those options on, the latency spikes become essentially random and much more frequent. That’s to be expected, I guess, but what I don’t expect is for latency spikes to occur periodically even with all power management options turned off.

The OS is Windows 7 Ultimate. The processor is an Intel Core i7 920 and the motherboard is a Supermicro X8SAX.

We have gone to some effort to verify that our driver is not the source of the latency spikes (e.g. by getting caught up on our own spinlocks). Driver Verifier does not flag any incorrect behaviour. I guess carelessness with IRQLs and spinlocks could cause a problem like this, but probably not with such regular periodicity, and Verifier would probably catch it.

The period of the latency spikes does not change if we change the behaviour of our test program, e.g. less or more activity, different activity patterns etc.

I’m aware of LatencyMon and similar tools. LatencyMon in particular does not flag anything. It reports DPC scheduling latencies on the order of 100 microseconds.

Any ideas on what might cause a ~15 ms latency spike every ~330 ms? Is it in fact normal behaviour in Windows 7 for certain cores of the CPU?

It is possible that these latency spikes are not real, but an artifact of how we measure them or a spinlock bug in our driver. I’m putting this question out here in case a third of a second seems like a familiar number to someone.

regards
Tomas

Peter_Viscarola_OSR · February 5, 2016, 12:40pm

First question: What’s the PRIORITY (that is, the Importance) of the DPC that you’re queuing to the non-local processor?

Queuing remote DPCs (that is, DPCs that are not targeted to the same processor on which you’re executing) is almost never a good idea. This is in great part due to the fact that folks mostly misunderstand how this works.

When you queue a remote DPC, you take the DPC object and put it on the queue for the remote processor. Simple, right? Except: How will the remote processor become aware of the fact that there’s a new entry in its DPC list to service? And when will the remote processor SERVICE this queued DPC?

The answer is, the remotely queued DPC object won’t be serviced until something on the remote processor generates an IRQL DISPATCH_LEVEL interrupt. Ordinarily, this means something needs to happen on the remote processor (an interrupt, a timer coming due, something like that) that’ll cause that remote processor to examine its DPC list and discover the (cross-queued) DPC object.

Result: Weird latency between queuing the DPC onto the remote processor and the DPC running.

You can overcome this by setting the Importance of the DPC object that’s being queued to HighImportance. Not only will this cause the DPC object to be queued at the head of the queue (instead of the tail), but it will also cause an Interprocessor Interrupt (IPI) to be generated from the current processor to the remote processor. When this IPI is serviced, an IRQL DISPATCH_LEVEL software interrupt is logically generated, which in turn results in the DPC list being examined. Bingo! The newly queued DPC object is discovered.
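The queuing rule described above can be sketched as a toy, user-mode C model. Everything in it (toy_dpc_queue, toy_queue_remote, the TOY_* importance values, the fixed queue size) is hypothetical and only mirrors Peter's description of a DPC queued to a remote processor: normal (medium) importance appends to the tail and leaves the target unaware, while high importance inserts at the head and requests an IPI. It is not how the kernel's DPC list is actually implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the kernel's KDPC_IMPORTANCE or KDPC list. */
typedef enum { TOY_MEDIUM, TOY_HIGH } toy_importance;

#define TOY_QUEUE_MAX 16

typedef struct {
    int ids[TOY_QUEUE_MAX];  /* queued DPC ids, index 0 is the head */
    size_t count;            /* number of queued entries */
    bool ipi_requested;      /* set when the target has been told to drain now */
} toy_dpc_queue;

/* Queue DPC `id` on a REMOTE processor's toy list.
 * Returns false if the toy queue is full. */
static bool toy_queue_remote(toy_dpc_queue *q, int id, toy_importance imp)
{
    assert(q != NULL);
    if (q->count >= TOY_QUEUE_MAX)
        return false;
    if (imp == TOY_HIGH) {
        /* Shift existing entries back one slot, then insert at the head. */
        for (size_t i = q->count; i > 0; --i)
            q->ids[i] = q->ids[i - 1];
        q->ids[0] = id;
        q->ipi_requested = true;     /* stands in for the IPI */
    } else {
        /* Tail insert; the target only finds this entry at its next
         * DISPATCH_LEVEL event (interrupt, timer, ...). */
        q->ids[q->count] = id;
    }
    q->count++;
    return true;
}
```

For example, queueing DPC 1 at medium importance and then DPC 2 at high importance leaves the list as {2, 1} with ipi_requested set; only the second call would have woken the target.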

Of course, the downside of this is that generating and servicing IPIs is not a low overhead activity, and has the effect of disturbing the “normal order of things” on the remote processor.

Hence: Queuing remote DPCs (that is, DPCs that are not targeted to the same processor on which you’re executing) is almost never a good idea.

Peter
OSR
@OSRDrivers

Tomas_Whitlock · February 5, 2016, 1:05pm

Peter,

I agree that targeting a different processor to the one that the ISR is executing on is generally a bad idea, but this was an experiment, based on the theory that the occasional high DPC scheduling latency seen in the customer’s system was due to some other DPC behaving badly and hogging that processor. The normal production version of our driver doesn’t call KeSetTargetProcessorDpcEx at all.

We don’t intend to use KeSetTargetProcessorDpcEx as a solution; we just wanted to see whether we could confirm the theory that our DPC is occasionally delayed by another DPC with a very long execution time. Putting our DPC on an “unused” processor (as far as DPCs are concerned, anyway) and finding that the customer’s occasional data losses went away would have told us something.

I’m queueing the DPC to the remote processor with normal importance (the default, MediumImportance), which, as you describe, could account for the weird scheduling latency. I guess that without the IPI, the target processor, which has no ISRs running on it that queue DPCs, might only notice and process its DPC queue at timer ticks (which seem to be 15 ms on this machine).
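A back-of-the-envelope model of this guess, sketched in C under one explicit assumption: the only DISPATCH_LEVEL event on the target processor is its clock tick, so a DPC queued there without an IPI waits until the next tick. The 15625 µs value is the default Windows clock interval (64 Hz); the actual interval can be shorter if some application has raised the timer resolution. The function name is hypothetical. The model bounds the wait at one tick, which is consistent with the ~15 ms worst case, but on its own it does not explain the ~330 ms periodicity.

```c
#include <assert.h>
#include <stdint.h>

/* Default Windows clock interval: 1/64 s = 15.625 ms. */
#define DEFAULT_TICK_US 15625u

/* Hypothetical model: microseconds from queueing at `queued_us` until the
 * next tick boundary, with ticks at every multiple of `tick_us`.
 * A DPC queued exactly on a tick is assumed to be picked up at that tick. */
static uint32_t wait_for_next_tick_us(uint64_t queued_us, uint32_t tick_us)
{
    assert(tick_us > 0);
    uint32_t into_tick = (uint32_t)(queued_us % tick_us);
    return into_tick == 0 ? 0 : tick_us - into_tick;
}
```

For a DPC queued 1 µs after a tick, wait_for_next_tick_us(1, DEFAULT_TICK_US) returns 15624 µs, i.e. roughly the worst case Tomas measured. With an IPI, the wait would instead be governed by how quickly the target services the interrupt.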

Having gone back to the page in the DDK docs that explains DPC queues, I see that I missed the part that explains the differences between queueing the DPC locally and remotely. So thanks for the tip; I’ll go away and try changing the importance to HighImportance, to see if it eliminates the large scheduling latencies.

regards
Tomas

Peter_Viscarola_OSR · February 5, 2016, 1:18pm

With respect, I think that’s likely to be a faulty experiment.

If the interrupt occurs on Processor A and you queue the DPC remotely to Processor B, what basis do you have for believing that the “DPC behaving badly and hogging the processor” won’t be running on Processor B? You can’t know… Right?

The way you know these things is by measuring the perf of the system. WPA (XPERF) is particularly good for exactly this purpose.

Peter
OSR
@OSRDrivers

Tim_Roberts · February 5, 2016, 1:27pm

As an experiment, if you disable the audio device, does the problem
persist? We had a very similar problem caused by a stupid spin-wait in
a Realtek audio driver, although that was more than a decade ago, and I
assume they are smarter now.

That board does not have onboard graphics, right? We have sometimes
seen latency issues caused by monitor refresh tying up the main memory
bus, and 15 ms is close to the 16.7 ms period of a typical 60 Hz monitor
refresh. That shouldn’t be an issue with an external graphics card.
Plus, the memory bus on that motherboard is pretty hot.

–
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Tomas_Whitlock · February 5, 2016, 1:27pm

Peter,

That had occurred to me, and you’re right that I can’t know for sure.

However, a positive outcome (i.e. it appears to eliminate or mitigate the data loss issue) would be evidence, though not solid proof, that a CPU-hogging DPC isn’t running on the other processor. We can try all of the cores if necessary…

A negative outcome (it doesn’t help or it is worse) would be evidence that a CPU-hogging DPC is running on the other processor.

XPERF was on my list of things to try, but setting the DPC to another logical processor seemed like it would be a quicker test (although potentially a faulty one, as you say).

Yes, I will do some measurements using XPERF.

thanks
Tomas

Tomas_Whitlock · February 5, 2016, 1:45pm

@Peter,

Update: I’ve tried setting the DPC importance to HighImportance, and it has indeed made the 15 ms latency spikes go away. Thanks!

And I accept your advice about XPERF. If I can run it on the customer’s machine, I will.

@Tim,

Thanks for the ideas. I’ll discuss trying some of these things with the customer.

regards
Tomas

Alex_Grig · February 5, 2016, 2:32pm

You want MediumHighImportance.
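Alex's one-line suggestion can be unpacked as a small lookup table, sketched in C. As I read it (and as I understand the WDK's description of DPC queue organization), MediumHighImportance keeps a remotely queued DPC at the tail of the target's queue, so it does not jump ahead of DPCs already queued there, while still causing the target processor to be interrupted, as HighImportance does. Low and Medium are modeled the way Peter describes normal importance above: tail, no IPI. The enum and struct names are hypothetical stand-ins for the kernel's KDPC_IMPORTANCE values; this models the thread's discussion rather than kernel code, and the exact rules (which may vary by Windows version) should be checked against the WDK documentation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the KDPC_IMPORTANCE values. */
typedef enum { IMP_LOW, IMP_MEDIUM, IMP_MEDIUM_HIGH, IMP_HIGH } importance;

typedef struct {
    bool at_head;            /* inserted ahead of DPCs already queued? */
    bool interrupts_target;  /* target interrupted (IPI) to drain its queue now? */
} remote_queue_effect;

/* Modeled effect of queueing a DPC to a processor OTHER than the current one. */
static remote_queue_effect effect_on_remote_target(importance imp)
{
    remote_queue_effect e = { false, false };
    switch (imp) {
    case IMP_HIGH:          /* head of queue plus IPI */
        e.at_head = true;
        e.interrupts_target = true;
        break;
    case IMP_MEDIUM_HIGH:   /* tail of queue, but still an IPI */
        e.interrupts_target = true;
        break;
    case IMP_MEDIUM:        /* tail, target notices at its next DISPATCH_LEVEL event */
    case IMP_LOW:
        break;
    default:
        assert(!"unknown importance value");
        break;
    }
    return e;
}
```

The design point is the trade-off Peter raised: both HighImportance and MediumHighImportance pay for an IPI, but only HighImportance also reorders the target's queue ahead of other drivers' DPCs.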