Overlapped and APC

I know that completion ports fixed all the problems related to APCs.

But as for APCs (I mean completion routines): what problems caused by overlapped I/O were fixed by their invention?

In overlapped I/O the main thread calls GetOverlappedResult; GetOverlappedResult calls WaitForXxx and starts to wait.

With completion ports the main thread also calls (in fact, must call) WaitForXxxEx and starts to wait.

Completion routines have a load-balancing problem, so completion ports were developed. But what was the problem with overlapped I/O that led to the invention of completion routines?

Thanks

KeWaitForSingleBill(…)

Overlapped structs pretty much require that you have a thread for every
pending I/O, or at the very least, you handle completing I/O on the same
thread that initiated it. Remember that when this stuff was designed, very
few operating systems were multi-threaded. I think that the original
designers thought that threads would end up being so fundamentally light
that creating lots and lots of them would be useful and common. Much of the
Win32 user-mode programming model has this assumption.

The last twenty years have taught us that you really don’t want hundreds of
threads, each dedicated to some task. The interaction between them is
overwhelmingly complex and error prone, not to mention inefficient. What we
now know is that you really don’t want very many more threads than there are
processors (virtual, physical or logical) to run your code. It makes more
sense to treat each completing I/O as something to be handled by a pool of
threads.

This is the idea behind the completion port. (And it’s the idea behind most
of the kernel-mode constructs in NT from the beginning.)

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.



Thank you, Mr. Jake Oshins, for explaining the idea behind completion ports. But my question was a little different.

Let me sort the IO types:

1-) Synchronous I/O
2-) Overlapped I/O
3-) Completion Routines
4-) Completion Ports

As you explained, I understand the difference between 3-) Completion Routines and 4-) Completion Ports. But I’m just trying to figure out why the developers introduced Completion Routines in addition to Overlapped I/O.

As I said, in both cases the main thread eventually calls WaitForXxx, and the same thread handles completion. So what is the difference?

Thanks.

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
Sent: Saturday, March 12, 2011 6:01 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Overlapped and APC

Overlapped structs pretty much require that you have a thread for every
pending I/O, or at the very least, you handle completing I/O on the same
thread that initiated it. Remember that when this stuff was designed, very
few operating systems were multi-threaded. I think that the original
designers thought that threads would end up being so fundamentally light
that creating lots and lots of them would be useful and common. Much of the
Win32 user-mode programming model has this assumption.
*****

Actually, this is not true. You do not need one thread per pending I/O, and
there is no requirement to handle completion in the initiating thread.

When this was designed, Windows was already multithreaded, and the threads
were known to not be “light”, nor was it expected that such usage would be
common. I have never seen any assumption that this was so, or assumed so at
user level. We knew it wasn’t true.

Note that async callback I/O was not created until end users demanded it,
and to make it viable, the concept of “alertable wait state” had to be used,
because the async callbacks of VAX VMS were a total flaming disaster! (I
believe the phrase used by Dave Cutler was “Asynchronous callback I/O will
go into Windows over my dead body!” and I point out that (a) Windows has
async callback I/O and (b) Dave Cutler is alive.) What happened to satisfy
him that async callback I/O was viable? Alertable wait state! (I tried to
use async callback I/O under VMS, and there was no possible way to make it
work right, because it Just Happened, which could result in recursive calls
to the allocator when the allocator had been preempted by a previous async
call.)

It is quite common to have a small number of threads (perhaps one)
initiating I/O, and a small number of threads (usually equal to the number
of cores) to handle completion. Using callback I/O is one of the single
worst I/O models that exists, because it is (a) hard to use, (b) requires
handling the callback in the initiating thread, thus reducing throughput,
and (c) requires polling for completion (otherwise known as “entering
alertable wait state”), which must be done frequently enough to maximize
throughput but not so often as to overload the scheduler with unnecessary
alertable-wait activations. It should be avoided. The best model is the
I/O Completion Port model, which allows multiple threads to respond to I/O
completion.

******

The last twenty years have taught us that you really don’t want hundreds of
threads, each dedicated to some task. The interaction between them is
overwhelmingly complex and error prone, not to mention inefficient. What we
now know is that you really don’t want very many more threads than there are
processors (virtual, physical or logical) to run your code. It makes more
sense to treat each completing I/O as something to be handled by a pool of
threads.

*****

Threads have complex interactions only if programs are badly written.
Synchronization, which all introductory courses teach, should be avoided
like the plague; synchronization represents where threads “rub together”,
and like any physical system, all this does is cause friction which
generates heat and wastes energy. Only in the kernel is synchronization
critical, because we work so close to the hardware, and often outside the
fundamental OS concepts like “scheduler”. In application space, I consider
that the instant you add a mutex or CRITICAL_SECTION to code, your design has
failed. There are better models, such as the agent pattern, that handle
multithreading much more cleanly than explicit programmer synchronization.
The problem with the way we teach it is that we spend so much time teaching
it, with lots of exercises, and lots of emphasis on its importance, that the
students end up thinking it is the *only* way to handle multithreading, and
it is not only *not* the only way, it is usually the *worst possible* way.
I typically use an IOCP without handle binding to do interthread message
queueing, or just use the standard PostMessage queue, and thus never have
issues about synchronization (synchronization *is* needed, of course, but I,
as a programmer, should never have to THINK about it as an issue). I have
not used a mutex or CRITICAL_SECTION in multithreaded applications in at
least 15 years, once I stopped using MS-DOS (otherwise known as Windows 9x,
which didn’t have IOCPs).

One of the problems about “interthread communication” is “how do I
communicate what should happen when this I/O operation completes”. The
simplest way is to embed the OVERLAPPED structure in a structure of my own
that carries all the necessary context, such as pointers to the buffers,
information about what to do, etc. When I get the pointer to the OVERLAPPED
structure in my IOCP handler, I just cast it to the larger structure and I
have everything I could possibly need (and if I don’t, I built my structure
wrong). No global variables are needed, for example. Generally, one of the
worst disasters is the old habit of using global variables to hold state,
which means that all the threads have to interact in complex ways. If you
simply ignore the concept of global variables, life gets a lot simpler.

typedef struct {
    OVERLAPPED ovl;    // must be the first member
    // ...my context data here: buffer pointers, what-to-do information, etc.
} MY_OVERLAPPED;

*****

This is the idea behind the completion port. (And it’s the idea behind most
of the kernel-mode constructs in NT from the beginning.)



NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer


This message has been scanned for viruses and
dangerous content by MailScanner, and is
believed to be clean.

See below…

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@gmail.com
Sent: Saturday, March 12, 2011 3:18 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Overlapped and APC

I know that completion ports fixed all the problems related to APCs.

But as for APCs (I mean completion routines): what problems caused by overlapped I/O were fixed by their invention?

In overlapped I/O the main thread calls GetOverlappedResult; GetOverlappedResult calls WaitForXxx and starts to wait.

*****
Actually, GetOverlappedResult takes an option as to whether or not it should
wait. It should not wait; the last parameter would always be “FALSE”. But
if you think you should be waiting in the main thread, you have already made
a serious design error. The main thread should start the I/O transaction,
and should forget about it (the “fire-and-forget” pattern).
*****

With completion ports the main thread also calls (in fact, must call)
WaitForXxxEx and starts to wait.
****
Huh? No, if you are using IOCPs, the main thread does fire-and-forget, and
SOME OTHER thread gets notified that the I/O has completed! If the main
thread ever cares, the I/O handler thread will tell it, but it shouldn’t
care after the I/O is initiated.
****

Completion routines have a load-balancing problem, so completion ports were
developed. But what was the problem with overlapped I/O that led to the
invention of completion routines?

****
Overlapped I/O using callback requires that the initiating thread handle the
completion. This means that the thread becomes its own bottleneck; it has
to handle the completion, and it has to poll for completion (known as
“entering alertable wait state”). First rule of asynchronous I/O: the
thread that initiates it does not care about it once it has been fired off.
If there is ANY reason the thread needs to know about it, it will be
notified (most commonly by PostMessage) asynchronously.

I have no idea why the callback I/O mechanism was invented; it strikes me
as one of the Really Bad Ideas of Windows, from a performance viewpoint and
from a complexity viewpoint. I used it once, about 15 years ago, and I
consider it one of the great mistakes I made in programming. If that app
ever crosses my desk again, it is going to be rewritten.
joe
****

Thanks

KeWaitForSingleBill(…)



I’m not sure what kind of programs you have been writing these 15 years, but
while I agree that unnecessary synchronization should be avoided at all
costs, IMHO any kind of systems programming, in KM or UM, requires thread
sync. Good designs will allow threads to ‘run free’ and operate on as
orthogonal a datum as possible, with as thread-local and NUMA-local a
dataset as possible, but eventually, whether through a routine in the
application, a system service, or simply a memory allocation, a shared
resource will need to be accessed and a sync point generated. This is
especially true in OO languages, where object creation and destruction are
frequent events and access to the heap is often a limiting factor on an
otherwise independent set of computations.

“Joseph M. Newcomer” wrote in message news:xxxxx@ntdev…


> Overlapped structs pretty much require that you have a thread for every pending I/O, or at the very least, you handle completing I/O on the same thread that initiated it.

???

What about WaitForMultipleObjects()? IIRC, you can use this function under Windows pretty much the same way you would use select() in a UNIX environment: specify the bWaitAll parameter as FALSE, and at this point the function’s behavior becomes semantically identical to that of select(). The only differences are that you specify event handles rather than file descriptors, and you get informed about only a single completion per invocation.

Don’t forget that the I/O model based on select() was invented long before multithreading came into play, and it was meant to work in a single-threaded process. There is no need for either a “thread for every pending I/O” or for “completing I/O on the same thread that initiated it” - just create a dedicated thread that calls WaitForMultipleObjects() with bWaitAll specified as FALSE in a loop, and think of it as a SIGIO handler that calls select() under UNIX. Simple, huh?

Anton Bassov

The kind of code that 30 years ago would have involved several mutexes and
lots of synchronization. It took me years to learn that synchronization
should be avoided by adopting higher-level patterns, and since I started
doing this, I don’t have deadlocks, performance problems, lock contention,
or any of the other problems that plague people who believe what they were
taught about synchronization.

Low-level locks on shared queues are, of course, essential. The key idea is
not that synchronization should be avoided, but that *reasoning* about
synchronization should be avoided, because this is where all the errors are
made. Any “convergence” involves asynchronous notifications; for example,
never block on a set of threads waiting for them to finish; instead, keep
track of the threads you have launched (as a “reference count”) and as each
thread terminates, it sends a notification that it has terminated, at which
point you decrement the reference count; when the reference count reaches
zero, you know all the threads have finished. These kinds of patterns tend
to also be robust-under-maintenance, always keeping in mind that maintenance
is done by unskilled programmers (the new hire, or yourself six months
later).

In OO, as in many contexts, contention on the storage allocator is a
problem. This is rarely addressed by the OO languages, which tend not to
allow multiple allocators, which in turn leads to architectural problems
when multiple storage pools are used in multiple threads. I teach courses on
this problem. It is interesting that C++0x does simplify some of these
issues, and I am currently revising my course to take C++0x (VS2010) into
account.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of m
Sent: Monday, March 14, 2011 7:27 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Overlapped and APC

I’m not sure what kind of programs you have been writing these 15 years, but
while I agree that unnecessary synchronization should be avoided at all
costs, IMHO any kind of systems programming, in KM or UM, requires thread
sync. Good designs will allow threads to ‘run free’ and operate on as
orthogonal a datum as possible, with as thread-local and NUMA-local a
dataset as possible, but eventually, whether through a routine in the
application, a system service, or simply a memory allocation, a shared
resource will need to be accessed and a sync point generated. This is
especially true in OO languages, where object creation and destruction are
frequent events and access to the heap is often a limiting factor on an
otherwise independent set of computations.




I pity you for being assigned to teach a C++ course; I avoid it like the
plague because, in my experience, the things those short-cuts hide are
exactly the things that cause the most problems. The last time I let one of
my guys convince me that it would be OK if he wrote it in C++, I spent more
time reviewing and fixing it than he did writing it.

Notwithstanding that rant, I wholeheartedly agree that a sound high-level
design is essential to a good implementation; my point was rather that
whereas you say sync should be avoided as a principle, I support the maximum
orthogonality of multi-threaded tasks as a precept. Perhaps the difference
is semantic, but I have found that if I start talking about too many sync
points, then they just don’t use sync objects (when needed), rather than
understanding the point that the design is wrong. When put in terms of set
theory, it seems easier to understand that thread A assigning a task to
thread B and then waiting for completion is a useless design for increasing
parallelism in an application.

As I am sure you will recall, when the application becomes too generic,
deadlock managers become essential because sometimes there is no way to
control lock acquisition order (à la SQL servers, etc.), and while this
extremely high-level pattern doesn’t suffer from deadlocks per se, it can’t
be said that its performance on a specific dataset will be better than a
dedicated algorithm with either an exclusive lock, shared-reader lock, or
refcount (aka rundown) access pattern. The usual trade-offs between perf
and flexibility apply - but I digress …

“Joseph M. Newcomer” wrote in message news:xxxxx@ntdev…

The kind of code that 30 years ago would have involved several mutexes and
lots of synchronization. It took me years to learn that synchronization
should be avoided by adopting higher-level patterns, and since I started
doing this, I don’t have deadlocks, performance problems, lock contention,
or any of the other problems that plague people who believe what they were
taught about synchronization.

Low-level locks on shared queues are, of course, essential. The key idea is
not that synchronization should be avoided, but that *reasoning* about
synchronization should be avoided, because this is where all the errors are
made. Any “convergence” involves asynchronous notifications; for example,
never block on a set of threads waiting for them to finish; instead, keep
track of the threads you have launched (as a “reference count”) and as each
thread terminates, it sends a notification that it has terminated, at which
point you decrement the reference count; when the reference count reaches
zero, you know all the threads have finished. These kinds of patterns tend
to also be robust-under-maintenance, always keeping in mind that maintenance
is done by unskilled programmers (the new hire, or yourself six months
later).

In OO, as in many contexts, contention on the storage allocator is a
problem. This is rarely addressed by the OO languages, which tend to not
allow multiple allocators, and which also leads to architectural problems
when multiple storage pools are used in multiple threads. I teach courses on
this problem. It is interesting that C++0x does simplify some of these
issues, and I am currently revising my course to take C++0x (VS2010) into
account.
joe

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of m
Sent: Monday, March 14, 2011 7:27 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Overlapped and APC

I’m not sure what kind of programs you have been writing these 15 years, but
while I agree that unnecessary synchronization should be avoided at all
costs, IMHO any kind of systems programming, in KM or UM, requires thread
sync. Good designs will allow threads to ‘run free’ and operate on as
orthogonal a datum as possible, with as thread-local and NUMA-local a dataset
as possible, but eventually, whether through a routine in the application, a
system service, or simply a memory allocation, a shared resource will need to
be accessed and a sync point generated. This is especially true in OO
languages, where object creation and destruction are frequent events and
access to the heap is often a limiting factor on an otherwise independent
set of computations.

“Joseph M. Newcomer” wrote in message news:xxxxx@ntdev…

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Jake Oshins
Sent: Saturday, March 12, 2011 6:01 PM
To: Windows System Software Devs Interest List
Subject: Re:[ntdev] Overlapped and APC

Overlapped structs pretty much require that you have a thread for every
pending I/O, or at the very least, you handle completing I/O on the same
thread that initiated it. Remember that when this stuff was designed, very
few operating systems were multi-threaded. I think that the original
designers thought that threads would end up being so fundamentally light
that creating lots and lots of them would be useful and common. Much of the
Win32 user-mode programming model has this assumption.
*****

Actually, this is not true. You do not need one thread per pending I/O, and
there is no requirement to handle completion in the initiating thread.

When this was designed, Windows was already multithreaded, and the threads
were known to not be “light”, nor was it expected that such usage would be
common. I have never seen any assumption that this was so, or assumed so at
user level. We knew it wasn’t true.

Note that async callback I/O was not created until end users demanded it,
and to make it viable, the concept of “alertable wait state” had to be used,
because the async callbacks of VAX VMS were a total flaming disaster! (I
believe the phrase used by Dave Cutler was “Asynchronous callback I/O will
go into Windows over my dead body!”, and I point out that (a) Windows has
async callback I/O and (b) Dave Cutler is alive.) What happened to satisfy
him that async callback I/O was viable? Alertable wait state! (I tried to
use async callback I/O under VMS, and there was no possible way to make it
work right, because it Just Happened, which could result in recursive calls
to the allocator when the allocator had been preempted by a previous async
call.)

It is quite common to have a small number of threads (perhaps one)
initiating I/O, and a small number of threads (usually equal to the number
of cores) to handle completion. Using callback I/O is one of the worst I/O
models that exists, because it is (a) hard to use; (b) requires handling the
callback in the initiating thread, thus reducing throughput; and (c) requires
polling for completion (otherwise known as “entering alertable wait state”),
which must be done frequently enough to maximize throughput but not so often
as to overload the scheduler with unnecessary AWS activations. It should be
avoided. The best model is the I/O Completion Port model, which allows
multiple threads to respond to I/O completion.

******

The last twenty years have taught us that you really don’t want hundreds of
threads, each dedicated to some task. The interaction between them is
overwhelmingly complex and error prone, not to mention inefficient. What we
now know is that you really don’t want very many more threads than there are
processors (virtual, physical or logical) to run your code. It makes more
sense to treat each completing I/O as something to be handled by a pool of
threads.

*****

Threads have complex interactions only if programs are badly written.
Synchronization, which all introductory courses teach, should be avoided
like the plague; synchronization represents where threads “rub together”,
and like any physical system, all this does is cause friction which
generates heat and wastes energy. Only in the kernel is synchronization
critical, because we work so close to the hardware, and often outside the
fundamental OS concepts like “scheduler”. In application space, I consider
that the instant you add a mutex or CRITICAL_SECTION to code, your design has
failed. There are better models, such as the agent pattern, that handle
multithreading much more cleanly than explicit programmer synchronization.
The problem with the way we teach it is that we spend so much time teaching
it, with lots of exercises, and lots of emphasis on its importance, that the
students end up thinking it is the *only* way to handle multithreading, and
it is not only *not* the only way, it is usually the *worst possible* way.
I typically use an IOCP without handle binding to do interthread message
queueing, or just use the standard PostMessage queue, and thus never have
issues about synchronization (synchronization *is* needed, of course, but I,
as a programmer, should never have to THINK about it as an issue). I have
not used a mutex or CRITICAL_SECTION in multithreaded applications in at
least 15 years, once I stopped using MS-DOS (otherwise known as Windows 9x,
which didn’t have IOCPs).

One of the problems about “interthread communication” is “how do I
communicate what should happen when this I/O operation completes”. The
simplest way is to embed the OVERLAPPED structure in a structure of my own
that carries all the necessary context, such as pointers to the buffers,
information about what to do, etc. When I get the pointer to the OVERLAPPED
structure in my IOCP handler, I just cast it to the larger structure and I
have everything I could possibly need (and if I don’t, I built my structure
wrong). No global variables are needed, for example. Generally, one of the
worst disasters is the old habit of using global variables to hold state,
which means that all the threads have to interact in complex ways. If you
simply ignore the concept of global variables, life gets a lot simpler.

typedef struct {
    OVERLAPPED ovl;
    /* ... my context data here ... */
} MY_OVERLAPPED;

*****

This is the idea behind the completion port. (And it’s the idea behind most
of the kernel-mode constructs in NT from the beginning.)

Jake Oshins
Hyper-V I/O Architect
Windows Kernel Group

This post implies no warranties and confers no rights.



This is rude.
Lock the thread please.

xxxxx@gmail.com wrote:

This is rude.

Err … what is rude, exactly?

Lock the thread please.

Why?

>> Lock the thread please.

Why?

Actually, it seems to be going in an “exciting” direction - it is not that far from turning into a classical
“C vs C++” discussion. Considering that this thread was meant to be off-topic for NTDEV, since the original question firmly belongs in the UM MSFT-sponsored groups like “kernel”, “Visual C++” et al., the OP apparently feels responsible for this, so he asks the list slaves to lock the thread before it spirals out of control. In any case, I think that at this stage it is, indeed, better to continue this discussion on NTTALK…

Anton Bassov

I apologise if I have offended you somehow, but I believe that all of this
discussion is related to your original post. In any case, I’ll consider the
thread closed and hope that you have found adequate answers to your
questions.

wrote in message news:xxxxx@ntdev…

This is rude.
Lock the thread please.