IRQLs

hello,

what is the exact reason to have soft IRQLs in the kernel?

thanks

AP

To be able to suspend preemption, for instance (this is a must to implement a spinlock).
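
To make that concrete, here is a minimal sketch of the classic WDM pattern, using the documented KeAcquireSpinLock/KeReleaseSpinLock calls (the function and variable names around them are just illustrative):

#include <wdm.h>

KSPIN_LOCK MyLock;   /* shared; initialized once with KeInitializeSpinLock */

VOID TouchSharedState(VOID)
{
    KIRQL oldIrql;

    /* Raises IRQL to DISPATCH_LEVEL - suspending pre-emption on this
       CPU - and then spins until the lock is acquired. */
    KeAcquireSpinLock(&MyLock, &oldIrql);

    /* ... critical section, running at DISPATCH_LEVEL ... */

    /* Releases the lock and lowers IRQL, re-enabling pre-emption. */
    KeReleaseSpinLock(&MyLock, oldIrql);
}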


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

“A P” wrote in message news:xxxxx@ntdev…
> what is the exact reason to have soft IRQLs in the kernel?

Thanks, that is what I told the ‘guy’ who asked me. He is an architect, but
he refuses to agree with me. The funniest part is, he says ‘go google the
correct answer’ and won’t tell me his logic :))

On Wed, Jan 21, 2009 at 1:02 PM, Maxim S. Shatskih wrote:

> To be able to suspend preemption, for instance (this is a must to
> implement a spinlock).

OK, gather the responses from this forum and show them to him.

IRQL is derived from the hardware interrupt level (the interrupt controller’s priority register). Historically this is a PDP/VAX-11 feature, and thus a VMS feature, though things are coming full circle - in modern x64 CPUs, the CR8 register is aliased to the APIC TPR, so once again the IRQL register is embedded in the CPU.

But it is also very convenient to implement “preemption suspend” as an IRQL raise. After all, ISRs run with preemption suspended.
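
As a rough illustration of “the IRQL register is embedded in the CPU”: on x64 NT the current IRQL is simply the value of CR8, which the processor aliases to the APIC TPR. A small sketch using the MSVC x64 intrinsics from <intrin.h> (the intrinsics are real; a driver should of course call KeRaiseIrql/KeLowerIrql rather than touch CR8 directly):

unsigned __int64 oldIrql = __readcr8();  /* CR8 holds the current IRQL */
__writecr8(2);                           /* 2 == DISPATCH_LEVEL on NT */
/* ... interrupts whose priority class is at or below 2 are held pending ... */
__writecr8(oldIrql);                     /* lower IRQL; pending interrupts fire */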


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com
“A P” wrote in message news:xxxxx@ntdev…
> Thanks, that is what I told the ‘guy’ who asked me. He is an architect,
> but he refuses to agree with me. The funniest part is, he says ‘go
> google the correct answer’ and won’t tell me his logic :))

Well, I didn’t know about the history of the PDP, but I did tell him it is
to have a code execution hierarchy for multitasking…

On Wed, Jan 21, 2009 at 1:16 PM, Maxim S. Shatskih wrote:

> IRQL is derived from the hardware interrupt level (the interrupt
> controller’s priority register). Historically this is a PDP/VAX-11
> feature, and thus a VMS feature - in modern x64 CPUs, the CR8 register
> is aliased to the APIC TPR, so once again the IRQL register is embedded
> in the CPU.

> won’t tell me his logic

I would say this is just a question of convenience - it is really convenient to use a software interrupt level as the implementation of the “can block vs. cannot block” concept. For example, when you want to queue a DPC from an ISR while a spinlock is held by non-DPC code, all you have to do is request a software interrupt via the ICR, so that it will fire automatically when the spinlock gets released (i.e., when a write to the TPR is made) - pure and simple. Otherwise, you would have to write extra code to handle this situation (or, perhaps, just defer DPC processing until a call to one of the dispatcher functions is made)…
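
For the curious, a hedged sketch of the self-IPI idea described above, assuming an xAPIC whose registers are identity-mapped at the architectural default address; the vector value is illustrative, not a documented contract:

#define APIC_ICR_LOW   0xFEE00300u     /* ICR low dword (xAPIC MMIO) */
#define ICR_SELF       (1u << 18)      /* destination shorthand: self */
#define DPC_VECTOR     0x2F            /* illustrative DISPATCH-class vector */

static void RequestDispatchInterrupt(void)
{
    volatile unsigned int *icr = (volatile unsigned int *)APIC_ICR_LOW;

    /* Fixed delivery, edge-triggered, sent to self.  The interrupt is
       held pending while the TPR is at or above its priority class and
       fires automatically once the spinlock release lowers the TPR. */
    *icr = ICR_SELF | DPC_VECTOR;
}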

Anton Bassov

> I would say this is just a question of convenience - it is really
> convenient to use a software interrupt level as the implementation of
> the “can block vs. cannot block” concept.

Something I discovered recently - Windows w2k3sp2 (and presumably Vista,
w2k8 and w7 too) doesn’t write the TPR at all once booted, which is why it
virtualises so well compared to XP.

I always assumed that w2k3sp2 just had some optimizations in place to
minimise TPR writes, but according to MMIO traces under Xen, it actually
doesn’t touch it at all!

James

James,

> Windows w2k3sp2 (and presumably Vista, w2k8 and w7 too) doesn’t write the TPR at all once booted

Do you mean they took the Linux approach? Although Linux does not support the concept of IRQL to the extent Windows does, it still prioritizes software interrupts relative to one another. However, it does not write to the TPR - instead, it implements the whole thing in software. I believe the main reason for this is cross-platform portability - otherwise, the TPR seems to be an ideal way of implementing interrupt priorities, so it is really bizarre for an OS that targets mainly the x86 and x86_64 platforms to take this approach…
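
For what it’s worth, “implementing the whole thing in software” can be as simple as a per-CPU pending mask that is drained whenever the current level drops. A sketch of the shape of the idea - not Linux’s actual softirq code; dispatch_handler is a hypothetical per-level handler:

extern void dispatch_handler(int level);   /* hypothetical handler table */

static volatile unsigned int Pending;   /* bit n set = level-n interrupt requested */
static unsigned int CurrentLevel;

void RequestSoftInterrupt(unsigned int level)
{
    Pending |= 1u << level;             /* just remember it; no hardware involved */
}

void LowerLevel(unsigned int newLevel)
{
    int lvl;
    CurrentLevel = newLevel;
    /* Deliver every pending "interrupt" above the new level, highest first. */
    for (lvl = 31; lvl > (int)newLevel; lvl--)
        if (Pending & (1u << lvl)) {
            Pending &= ~(1u << lvl);
            dispatch_handler(lvl);
        }
}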

Anton Bassov

It’s a little bit more than a convenience.

IRQL, fundamentally, is a way to do two things:

1. Strongly order all the various interrupt sources (timers, devices, self-interrupts, interprocessor interrupts, etc.).

2. Selectively and progressively mask those interrupts.

Take those and put them together with the fact that building a pre-emptive
scheduler almost requires using interrupts, and you notice that you’ll have
some IRQL which stops pre-emption.

Furthermore, it allows you to assign an IRQL to each and every spinlock in
the system. Before spinning waiting to acquire a lock, you raise IRQL to
the level of the lock, ensuring that pre-emption can’t deadlock the system
by unscheduling the code which currently holds the lock. (Almost all the
locks drivers use have an IRQL of DISPATCH_LEVEL.)

With traditional spinlocks, you can even be interrupted to do something
higher priority (like a TLB flush) while spinning waiting for a lock. Most
non-Cutler OSes put all locks at the same priority, which doesn’t allow
this. Granted, though, with queued spinlocks, this becomes a lot less
useful, since queued locks almost demand that you mask all interrupts.
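
Putting the “raise, then spin” contract into code: a minimal sketch of a traditional spinlock acquire under that scheme. This is roughly the pattern KeAcquireSpinLock implements, not the actual kernel source; Lock is an illustrative global:

#include <wdm.h>

static volatile LONG Lock;              /* 0 = free, 1 = held */

VOID Acquire(PKIRQL OldIrql)
{
    KeRaiseIrql(DISPATCH_LEVEL, OldIrql);       /* pre-emption off first */
    while (InterlockedBitTestAndSet(&Lock, 0))  /* atomic test-and-set */
        while (Lock != 0)
            YieldProcessor();                   /* read-only spin; no bus locking */
}

VOID Release(KIRQL OldIrql)
{
    InterlockedAnd(&Lock, 0);                   /* fenced clear */
    KeLowerIrql(OldIrql);                       /* pending DPCs etc. fire here */
}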

See also:

http://www.microsoft.com/whdc/driver/kernel/IRQL.mspx
http://www.microsoft.com/whdc/driver/kernel/locks.mspx


Jake Oshins
Hyper-V I/O Architect (former HAL guy, and that involved a lot of IRQL
manipulation)
Windows Kernel Team

This post implies no warranties and confers no rights.


“Maxim S. Shatskih” wrote in message news:xxxxx@ntdev…
> IRQL is derived from the hardware interrupt level (the interrupt
> controller’s priority register). [...]
>
> But it is also very convenient to implement “preemption suspend” as an
> IRQL raise. After all, ISRs run with preemption suspended.

Jake,

could you explain a bit more, please, why queued locks
“almost demand that you mask all interrupts”?
The locks paper doesn’t suggest this… all it says
is that queued spinlocks are “a more efficient variation of ordinary
spin locks”.

– pa

Jake Oshins wrote:

> IRQL, fundamentally, is a way to do two things: strongly order all the
> various interrupt sources, and selectively and progressively mask those
> interrupts. [...] Granted, though, with queued spinlocks, this becomes
> a lot less useful, since queued locks almost demand that you mask all
> interrupts.

Sure.

Examine, first, the way that a traditional spinlock works. Some atomic
compare and swap instruction (lock cmpxchg, lock bts, etc.) causes the
processor to attempt to own the cache line containing the lock. This causes
a lot of bus traffic and leads to starvation issues in large machines. If
the processor can’t acquire the lock, the process starts over again, leading
to more bus traffic. But if a processor gets interrupted, it stops trying
to acquire the lock and goes off and does useful work. Other processors can
acquire the lock while the interrupted processor handles its interrupt.

With a queued lock, the processor attempts to insert its lock waiter
structure at the end of the list of processors waiting on the lock. There
is some bus traffic as various processors try to insert themselves in the
queue. But once the processor has inserted itself into the queue, then it
spins on a local value waiting for the moment when ownership is assigned to
it by the previous owner. This tends to cause no bus traffic, as that cache
line can remain in the waiting processor’s cache. When the current owner
releases the lock, it follows the linked list to the next waiter and assigns
the lock to that waiter.

Notice here that if a processor waiting on a queued lock is interrupted, it
remains in the queue. If the current owner releases the lock and assigns it
to a processor which is off servicing an interrupt, all the waiters wait
until the interrupted processor returns from the interrupt. This is not
good.

So queued locks are better for high contention locks with short hold times,
particularly in NUMA machines. Traditional locks are better for low
contention locks with long hold times. (Actually, ideally all spinlocks
would have short hold times.)

In-stack queued locks have an IRQL of DISPATCH_LEVEL in Windows, so that the
contract on locking doesn’t change much in the driver model if a driver uses
queued locks, and so that drivers can’t easily cause TLB updates and other
really high priority interrupts to be delayed. The hottest locks in the
kernel itself are queued spinlocks with an IRQL of HIGH_LEVEL so that
waiters can’t be interrupted.
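
The documented, driver-visible form of the DISPATCH_LEVEL variant is KeAcquireInStackQueuedSpinLock; the waiter structure lives in the caller’s KLOCK_QUEUE_HANDLE on the stack, hence “in-stack”. A short usage sketch:

KSPIN_LOCK Lock;                  /* shared; KeInitializeSpinLock'd once */

VOID Example(VOID)
{
    KLOCK_QUEUE_HANDLE handle;    /* per-acquirer queue entry, on the stack */

    KeAcquireInStackQueuedSpinLock(&Lock, &handle);
    /* ... short critical section at DISPATCH_LEVEL ... */
    KeReleaseInStackQueuedSpinLock(&handle);
}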


Jake Oshins
Hyper-V I/O Architect
Windows Kernel Team

This post implies no warranties and confers no rights.


“Pavel A.” wrote in message news:xxxxx@ntdev…
> could you explain a bit more, please, why queued locks
> “almost demand that you mask all interrupts”?

Thanks!
–pa

Jake Oshins wrote:

> So queued locks are better for high contention locks with short hold
> times, particularly in NUMA machines. Traditional locks are better for
> low contention locks with long hold times.

Thanks a lot for explaining the up- and downsides of queued spinlocks vs.
conventional ones!

Contrary to the documentation’s universal recommendation of queued locks,
I’ve always suspected that their relatively higher overhead could outweigh
the potential advantages in low-contention scenarios.

(By the way, is it “spinlock” or “spin lock”? “Spinlock” is more common,
but MSDN thinks otherwise.)

- Cay

On Wed, 21 Jan 2009 22:35:14 +0100, Jake Oshins wrote:

> In-stack queued locks have an IRQL of DISPATCH_LEVEL in Windows, so that
> the contract on locking doesn’t change much in the driver model if a
> driver uses queued locks [...]. The hottest locks in the kernel itself
> are queued spinlocks with an IRQL of HIGH_LEVEL so that waiters can’t
> be interrupted.

Maxim S. Shatskih wrote:

> IRQL is derived from the hardware interrupt level (the interrupt
> controller’s priority register). Historically this is a PDP/VAX-11
> feature, and thus a VMS feature...

So, are these the reasons why Linux has no IRQL concept:

a. Linux supports architectures where the “IRQL register” does not exist;
b. the original Linux kernel was influenced by Win95 and some PC Unixes,
rather than WinNT, Digital Unix, etc.?

regards,
–pa

Pavel A. wrote:

> b. the original Linux kernel was influenced by Win95 and some PC
> Unixes, rather than WinNT, Digital Unix, etc.?

The original Linux kernel was released in 1991, long before Windows 95
and NT were even in beta. In fact, that was even before Windows 3.1.

The design of the Linux kernel was influenced by Unix and MINIX.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> So, are these the reasons why Linux has no IRQL concept:

Well, it just does not prioritize hardware interrupts relative to one another. However, software interrupt priority is still there…

> a. Linux supports architectures where the “IRQL register” does not exist;

This is why it implements software interrupt priority completely in software…

> b. the original Linux kernel was influenced by Win95 and some PC Unixes, rather than WinNT, Digital Unix, etc.?

Please note that the TPR turned up on x86 only with the P II family that introduced the APIC - earlier CPUs did not have it. Therefore, NT originally implemented the concept of IRQL in software as well…
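
A hedged sketch of what “IRQL in software” could look like on PIC-era x86: keep a precomputed 8259 interrupt mask per IRQL and reprogram the PICs on every raise/lower, so hardware interrupts below the current level are simply held off. Port numbers are the standard PC/AT ones; the table name is made up, not the HAL’s:

#include <intrin.h>

static const unsigned short MaskForIrql[32] = { /* precomputed per level */ };

void SetIrql(int irql)
{
    unsigned short mask = MaskForIrql[irql];
    __outbyte(0x21, (unsigned char)(mask & 0xFF));  /* master 8259 OCW1 */
    __outbyte(0xA1, (unsigned char)(mask >> 8));    /* slave 8259 OCW1 */
}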

Anton Bassov

> The design of the Linux kernel was influenced by Unix and MINIX.

Only by the former …

From the very beginning it abandoned the microkernel model that MINIX relied upon - instead, it followed the “classical” UNIX design of the 1970s, effectively making it a totally different system. Please google the “Tanenbaum vs. Torvalds” discussion for more details…

Anton Bassov

> > The design of the Linux kernel was influenced by Unix and MINIX.
>
> Only by the former … From the very beginning it abandoned the
> microkernel model that MINIX relied upon… Please google the
> “Tanenbaum vs. Torvalds” discussion for more details…

And use the term ‘flamewar’ in there for good measure too :)

James

> since queued locks almost demand that you mask all interrupts.

Sorry Jake, can you clarify this a bit more?


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

> If the current owner releases the lock and assigns it to a processor
> which is off servicing an interrupt, all the waiters wait until the
> interrupted processor returns from the interrupt. This is not good.

Thanks Jake!


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com