Painfully long DPC latencies - who's to blame?

Hello everyone,

I’ll start off with a quick rundown of the situation: We have a USB device from which we are reading data over a bulk endpoint. The data being read is generated by a timed process on the device itself. If this endpoint is not serviced with IN tokens for an extended period of time, the data accumulates in and eventually overflows the onboard buffer on the device. This is an error and is reported immediately to the user. (In other words, we cannot tolerate any missing data.)

Allow me first to pre-empt what some of you may be thinking: Yes, we are aware that host controllers make no guarantee about servicing bulk endpoints. If we want deterministic service, we should use interrupt or isochronous endpoints, etc. etc. etc. For various reasons, we decided to go down the bulk road many years ago, and it has served (and continues to serve) us well. It’s only on a recent crop of machines that we are beginning to notice some issues.

On certain machines, mostly those running Vista, USB analysis shows long periods of time (upwards of several milliseconds) during which the host controller is not sending out IN tokens. This all happens despite the fact that we always have several URBs queued to the USB driver stack, each of which holds 32 KB of data from the device.

From the software side of things, our URB completion DPC copies the data out of the buffer associated with the URB to the buffer the user ultimately reads from, does some accounting, then resubmits the IRP to be reused. We instrumented our DPC, which normally runs once every (32 KB / device_data_rate_in_bytes_per_second) seconds. In the failure case, that is, during the long periods of no IN tokens issuing from the host controller, we notice our DPCs being delayed by up to 3 or 4 milliseconds from when they normally run (for the most part, they complete in a timely fashion).
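(For the curious, the instrumentation amounts to little more than timestamping each completion DPC and comparing against the previous one - a minimal sketch with illustrative names, not our actual driver code:)

#include <ntddk.h>

static LARGE_INTEGER g_LastCompletion;  /* QPC timestamp of the previous completion DPC */

/* Called from the URB completion DPC: logs the interval since the previous
 * completion so delays stand out against the expected period of
 * (32 KB / device_data_rate_in_bytes_per_second). */
VOID
LogCompletionInterval(VOID)
{
    LARGE_INTEGER freq;
    LARGE_INTEGER now = KeQueryPerformanceCounter(&freq);

    if (g_LastCompletion.QuadPart != 0) {
        LONGLONG deltaUs = (now.QuadPart - g_LastCompletion.QuadPart) * 1000000 / freq.QuadPart;
        if (deltaUs > 3000) {
            DbgPrint("URB completion delayed: %I64d us since previous one\n", deltaUs);
        }
    }
    g_LastCompletion = now;
}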

A colleague pointed me to the DPC Latency Checker (http://www.thesycon.de/deu/latency_check.shtml), which sure enough shows a gigantic spike in system DPC latency every 7 or 8 seconds. By ‘gigantic’, I mean over two orders of magnitude. Baseline DPC latency averages about 40 us with periodic spikes as high as 9 ms. Not surprisingly, our software failures correlate very strongly with the spikes.

I thought I might find the offending actor in the system by using tracemon/tracerpt, but the system doesn’t appear to be spending inordinately long amounts of time in ISRs or DPCs according to the histograms.

So if my DPCs are not being delayed by ISRs or other DPCs taking a long time, what would be causing them to run so much later than they should? Could it be a thread of execution in the kernel raising its IRQL and spinning on something, or performing some otherwise very long operation at that elevated IRQL? I have no idea how I might narrow down the offending driver on this system. I’ve already tried disabling as many drivers as I can, but I can’t get this spike to stop occurring, and I’m just about at the end of my toolbox trying to determine what is ultimately causing the periodic long DPC latencies. (By the way, there are a number of online reports of DPC latency spikes in recent machines. There seems to be no consensus as to what is causing these spikes.)

Any advice or guidance would be appreciated.

Thanks,
Curtis Petersen



> So if my DPCs are not being delayed by ISRs or other DPCs taking a long time, what would be causing them to run so much later than they should?

Plenty of things, both normal and abnormal. As an example of the former, consider heavy network traffic. As an example of the latter, consider a misbehaving driver like StarForce. According to MSFT requirements, your driver cannot spend more than 100 microseconds in a DPC routine if you want it to be certified, but StarForce found a workaround - it makes its DPC re-queue itself, so that you can wait for up to 3 seconds (!!!) before your machine starts responding again. They do it on purpose - believe it or not, StarForce is MSFT-certified, because they don’t spend more than 100 microseconds in the DPC routine itself, and hence technically don’t violate the above-mentioned requirement. The only reason I gave you this example is to show you that there are tricks that can fool testing/monitoring tools…

Anton Bassov

I can’t answer the DPC part but can comment on the USB part. Is it a high-speed or full-speed device? We don’t have any problem with a full-speed device serviced by a user-mode thread at speeds near the maximal bus bandwidth. The key is to have enough URBs of reasonable size queued. Several may not be enough. Currently I use 32 URBs of 4 kB each and it seems to work well on both XP and Vista. For USB 2.0 a 32 kB request could be sufficient, but it might be necessary to use tens of them to avoid any delays: 32 kB is enough for about 1 ms, and 32 such requests are enough for 2 timer ticks. Note that too big a request is also bad. There are delays at request boundaries, probably caused by data processing when the previous request is completed. Using 256 kB requests causes significant delays even for a full-speed device.

First, I’d try to figure out whether the problem is really caused by the delayed DPC. The solution is above: queue more requests. It is not an RT OS, anyway, and you can’t count on DPC latency in the tens of microseconds. If it doesn’t help, whatever causes the DPC delays may also be blocking HC processing (an interrupt of higher priority?).

BTW, your design with a DPC running “once every (32 KB / device_data_rate_in_bytes_per_second) seconds” seems suspicious. Why not have enough URBs queued, process the data in the completion routine, and then resubmit the URB?
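To make the suggestion concrete, a rough WDM-style sketch of that pattern: several driver-allocated read requests kept permanently in flight, with the completion routine copying the data and sending the same request straight back down. PENDING_READ, SubmitBulkRead, CopyToRing and DEVICE_EXTENSION are illustrative names only, not a tested implementation:

#include <ntddk.h>
#include <usb.h>

typedef struct _PENDING_READ {          /* hypothetical per-request bookkeeping */
    PIRP   Irp;                         /* driver-allocated with IoAllocateIrp  */
    URB    Urb;                         /* bulk IN transfer URB                 */
    PUCHAR TransferBuffer;              /* e.g. a 32 KB buffer                  */
    struct _DEVICE_EXTENSION *DevExt;
} PENDING_READ, *PPENDING_READ;

/* Hypothetical helpers: */
VOID SubmitBulkRead(PPENDING_READ Read);  /* rebuilds the URB/IRP and calls IoCallDriver */
VOID CopyToRing(struct _DEVICE_EXTENSION *DevExt, PUCHAR Data, ULONG Length);

#define NUM_PENDING_READS 8   /* keep several bulk IN requests outstanding at all times */

/* Prime the pipe once at start of streaming. */
VOID
StartBulkReads(PENDING_READ Reads[NUM_PENDING_READS])
{
    ULONG i;
    for (i = 0; i < NUM_PENDING_READS; i++) {
        SubmitBulkRead(&Reads[i]);
    }
}

/* Completion routine for each outstanding read: copy the data out, then
 * immediately resubmit the same request so the host controller always has
 * transfers queued for the endpoint. */
NTSTATUS
BulkReadComplete(PDEVICE_OBJECT DeviceObject, PIRP Irp, PVOID Context)
{
    PPENDING_READ read = (PPENDING_READ)Context;

    UNREFERENCED_PARAMETER(DeviceObject);

    if (NT_SUCCESS(Irp->IoStatus.Status)) {
        /* Copy from the URB transfer buffer into the ring the application reads from. */
        CopyToRing(read->DevExt, read->TransferBuffer,
                   read->Urb.UrbBulkOrInterruptTransfer.TransferBufferLength);
    }

    /* Re-arm and resend the same URB/IRP so the pipe never runs dry. */
    SubmitBulkRead(read);

    /* The IRP is ours (driver-allocated); stop further completion processing. */
    return STATUS_MORE_PROCESSING_REQUIRED;
}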

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]



Curtis Petersen wrote:

> On certain machines, mostly those running Vista, USB analysis shows long periods of time (upwards of several milliseconds) during which the host controller is not sending out IN tokens. This all happens despite the fact that we always have several URBs queued to the USB driver stack, each of which holds 32 KB of data from the device.

There are some rather serious known issues in the USB host controller
driver in Vista, and recent posts suggest that almost all have been
eliminated in Vista SP1. Do you have an opportunity to test the SP1 beta?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> (By the way, there are a number of online reports of DPC latency spikes in recent machines. There seems to be no consensus as to what is causing these spikes.)

Are DPCs migrating from one processor to another on a multiprocessor
machine?

Loren

> Are DPCs migrating from one processor to another on a multiprocessor machine?

Not until Vista…

Under earlier OSes, a DPC routine had to be executed on the same CPU where it got queued. However, Vista allows DPCs to target a particular processor (oops, it looks like a probable explanation for the whole thing - according to the OP, this problem arises only under Vista)…

Anton Bassov

KeSetTargetProcessorDpc has existed for a very long time (probably since the NT 3.1 days) and has been documented for a long time.

Here is a good description of DPC internals:
http://technet.microsoft.com/en-us/sysinternals/bb963898.aspx
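For reference, targeting a DPC is a one-liner at initialization time - a minimal sketch, nothing here is specific to the OP's driver:

#include <ntddk.h>

/* Bind a DPC to processor 0 so it is always queued to, and runs on, that CPU. */
VOID
InitPinnedDpc(PKDPC Dpc, PKDEFERRED_ROUTINE Routine, PVOID Context)
{
    KeInitializeDpc(Dpc, Routine, Context);
    KeSetTargetProcessorDpc(Dpc, 0);   /* 0 = zero-based processor number to target */
}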

Dmitriy Budko
VMware


> KeSetTargetProcessorDpc has existed for a very long time (probably since the NT 3.1 days) and has been documented for a long time.

I just confused targeted DPCs with threaded ones - I remember that Vista introduced some new features to DPC handling, but somehow forgot which particular feature turned up in Vista and which one had existed long before…

Anton Bassov

----- Original Message ----
From: Michal Vodicka
To: Windows System Software Devs Interest List
Sent: Tuesday, January 22, 2008 6:25:11 PM
Subject: RE: [ntdev] Painfully long DPC latencies - who’s to blame?

> BTW, your design with DPC running “once every (32 KB / device_data_rate_in_bytes_per_second) seconds” seems suspicious. Why not have enough URBs queued and process data in the completion routine and then resubmit URB?

That is exactly what I do. Currently, the design queues 4 URBs of 32 KB each. The completion routine for each URB copies the data then resubmits. I will definitely try tweaking those values to see if I can survive the spikes.

By the way, I’ve played with those two values in the past on XP to see if there was a correlation between URB size and performance. I noticed on the analyzer that smaller URBs sometimes result in an increased number of IN tokens per microframe, which improves performance significantly. As I continue investigating this issue and trying different things, I find myself frustrated at just how much of a ‘black box’ the USB driver stack is. There is precious little information on how usbehci.sys/usbhub.sys/usbport.sys and friends work internally. I can read the EHCI specification and get a decent idea of what the structures in the host controller look like and how they interact, but since it’s the driver setting up the qHs/qTDs etc., there’s a bit of a disconnect in my knowledge. Perhaps Microsoft could release some more information on this part of the driver stack, especially as it pertains to how to ‘get the most out of it’ - sort of like a USB best-practices white paper. I can see something like that being an extremely beneficial resource for the driver developer community.

Curtis Petersen


----- Original Message ----
From: Loren Wilton
To: Windows System Software Devs Interest List
Sent: Tuesday, January 22, 2008 9:24:31 PM
Subject: Re: [ntdev] Painfully long DPC latencies - who’s to blame?

> Are DPCs migrating from one processor to another on a multiprocessor machine?

I’m afraid I’m not quite sure what you mean by that (or rather, whose DPCs you’re referring to). As I understand it, non-targeted DPCs are scheduled to run on the same processor that IoRequestDpc()/KeInsertQueueDpc() is called on. I would guess that the DPCs scheduled by the USB driver stack below me are not targeted to a specific processor, although I can’t know that, given that the DPC object is created and scheduled by the USB driver stack. Would the migrating behavior be detrimental to my performance or to the system in general?

Curtis Petersen


> ----------

From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of Curtis Petersen[SMTP:xxxxx@yahoo.com]
Reply To: Windows System Software Devs Interest List
Sent: Wednesday, January 23, 2008 4:25 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Painfully long DPC latencies - who’s to blame?

> That is exactly what I do. Currently, the design queues 4 URBs of 32 KB each. The completion routine for each URB copies the data then resubmits. I will definitely try tweaking those values to see if I can survive the spikes.

In my experience 4 URBs aren’t enough even for a full-speed device. It is quite sufficient when the OS runs just your app, but under heavy load and other hardware access it isn’t. I experimented with it and found that 8 URBs work in most situations and 16 almost always, so I use 32 to be safe :)
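For what it's worth, a KMDF driver gets this "keep N reads in flight" behaviour from the framework's USB continuous reader with a single configuration call. The context type, helper and the numbers below are illustrative only:

#include <ntddk.h>
#include <wdf.h>
#include <wdfusb.h>

typedef struct _DEVICE_CONTEXT DEVICE_CONTEXT, *PDEVICE_CONTEXT;       /* hypothetical */
VOID CopyToRing(PDEVICE_CONTEXT DevCtx, PVOID Data, size_t Length);    /* hypothetical */

/* EVT_WDF_USB_READER_COMPLETION_ROUTINE: called once per completed read. */
VOID
EvtBulkReadComplete(WDFUSBPIPE Pipe, WDFMEMORY Buffer, size_t NumBytesTransferred, WDFCONTEXT Context)
{
    UNREFERENCED_PARAMETER(Pipe);
    /* Copy the just-arrived data into the ring the application reads from. */
    CopyToRing((PDEVICE_CONTEXT)Context, WdfMemoryGetBuffer(Buffer, NULL), NumBytesTransferred);
}

NTSTATUS
ConfigureBulkReader(WDFUSBPIPE BulkInPipe, PDEVICE_CONTEXT DevCtx)
{
    WDF_USB_CONTINUOUS_READER_CONFIG config;
    NTSTATUS status;

    /* 32 KB per read; the framework keeps 8 reads outstanding at all times. */
    WDF_USB_CONTINUOUS_READER_CONFIG_INIT(&config, EvtBulkReadComplete, DevCtx, 32 * 1024);
    config.NumPendingReads = 8;

    status = WdfUsbTargetPipeConfigContinuousReader(BulkInPipe, &config);
    if (!NT_SUCCESS(status)) {
        return status;
    }

    /* Start pumping reads down the pipe. */
    return WdfIoTargetStart(WdfUsbTargetPipeGetIoTarget(BulkInPipe));
}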

> By the way, I’ve played with those two values in the past on XP to see if there was a correlation between URB size and performance. I noticed on the analyzer that smaller URBs sometimes result in an increased number of IN tokens/microframe, which improves performance significantly. This leads me to say that as I continue investigating this issue and trying different things, I find myself frustrated at just how much of a ‘black box’ the USB driver stack is.

It isn’t just USB; it is influenced by the behaviour of the whole OS. NT is not, and never was, an RTOS, so you can’t count on anything. The best you can do is be prepared for long delays caused by something else you can’t influence.

> I have found that there exists precious little information on how usbehci.sys/usbhub.sys/usbport.sys and friends work internally. I can read the EHCI specification and get a decent idea of what the structures in the host controller look like and how they will interact, but since it’s the driver setting up the qHs/qTDs etc., there’s a bit of a disconnect in my knowledge. Perhaps Microsoft could release some more information on this part of the driver stack, especially as it pertains to how to ‘get the most out of it’ - sort of like a USB best practices white paper. I can see something like that being an extremely beneficial resource for the driver developer community.

More information is better, of course. But one shouldn’t depend on implementation details or current behaviour. It can change between OS versions, service packs, or even with the latest hotfix. And it does…

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]

Hi All,

I’m writing a 10Gb Ethernet driver and suffering from the same problem that Anton has described, but I’m looking at it from the opposite direction.

In other words, the HW (and the sender) is able to deliver 10Gb per second, which means that the receiver side should process around 850,000 packets per second. No matter how efficiently I try to write my code, it takes a complete CPU to process it, and this can go on forever.

So I have a few alternatives that I have tried:

  1. Go by the book, and ask for another interrupt every 100 us. All works fine, but performance is bad.
  2. Stay in a DPC for 16 milliseconds. BW is (sometimes) better, but the computer becomes very unresponsive. By the way, since the application receiving the data might also be starved by the DPC, the BW might also suffer.
  3. Issue an interrupt and queue a DPC. If the DPC runs for more than 100 us, use a kernel thread to continue the processing.

So, do I have other alternatives here, or is using a kernel thread the only solution?
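To make option 3 above concrete, here is a rough sketch of a time-budgeted DPC that hands leftover work to a dedicated system thread. Everything marked hypothetical is a stand-in for the driver's own ring/indication code, not a real API:

#include <ntddk.h>

#define DPC_BUDGET_100NS (100 * 10)   /* ~100 microseconds, expressed in 100 ns units */

typedef struct _RX_ENGINE {
    KEVENT  WorkAvailable;   /* SynchronizationEvent, signalled when the DPC gives up its slice */
    BOOLEAN StopThread;
    /* ... receive ring state ... */
} RX_ENGINE, *PRX_ENGINE;

/* Hypothetical helpers standing in for the real ring/indication handling: */
BOOLEAN MoreRxDescriptorsReady(PRX_ENGINE Rx);
VOID    IndicateOnePacket(PRX_ENGINE Rx);
VOID    ReenableRxInterrupt(PRX_ENGINE Rx);

/* DPC: drain the receive ring, but never for more than ~100 us; hand any
 * remainder to the passive-level worker. (Synchronization between the DPC
 * and the worker is elided for brevity.) */
VOID
RxDpc(PKDPC Dpc, PVOID Context, PVOID Arg1, PVOID Arg2)
{
    PRX_ENGINE rx = (PRX_ENGINE)Context;
    ULONGLONG start = KeQueryInterruptTime();

    UNREFERENCED_PARAMETER(Dpc);
    UNREFERENCED_PARAMETER(Arg1);
    UNREFERENCED_PARAMETER(Arg2);

    while (MoreRxDescriptorsReady(rx)) {
        IndicateOnePacket(rx);

        if (KeQueryInterruptTime() - start > DPC_BUDGET_100NS) {
            /* Budget exhausted: wake the worker thread and get off the CPU. */
            KeSetEvent(&rx->WorkAvailable, IO_NO_INCREMENT, FALSE);
            return;
        }
    }
    ReenableRxInterrupt(rx);
}

/* Dedicated worker, created with PsCreateSystemThread at driver start. */
VOID
RxWorkerThread(PVOID Context)
{
    PRX_ENGINE rx = (PRX_ENGINE)Context;

    while (!rx->StopThread) {
        KeWaitForSingleObject(&rx->WorkAvailable, Executive, KernelMode, FALSE, NULL);
        while (MoreRxDescriptorsReady(rx)) {
            IndicateOnePacket(rx);
        }
        ReenableRxInterrupt(rx);
    }
    PsTerminateSystemThread(STATUS_SUCCESS);
}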

Thanks
Tzachi


> So, do I have other alternatives here, or is using a kernel thread the only solution.

What about trying a low-priority DPC? IIRC, its processing gets postponed until either the CPU has no threads to dispatch (so that it dispatches the idle thread) or the queue grows above a certain limit…
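Marking a DPC as low importance is a single call at initialization - a minimal sketch:

#include <ntddk.h>

/* A low-importance DPC is not necessarily dispatched immediately; the kernel
 * may leave it queued for a while, which naturally batches more work per DPC
 * invocation under heavy interrupt load. */
VOID
InitLowImportanceDpc(PKDPC Dpc, PKDEFERRED_ROUTINE Routine, PVOID Context)
{
    KeInitializeDpc(Dpc, Routine, Context);
    KeSetImportanceDpc(Dpc, LowImportance);
}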

Anton Bassov

My advice would be not to mess with DPC priorities or target processor in an NDIS driver. NDIS already does numerous things (like using low priority DPCs) to try to “coalesce” multiple interrupt notifications into single DpcForIsr callbacks.

850K packets/second is a lot of freakin data. Even if your device is well designed, you’ve limited the need to recopy any packet data in your driver, and you’ve written your code efficiently, there’s STILL gonna be a lot of work for you to do. I mean… just scanning the ring buffer and calling NdisMCompleteRequest (or whatever it is) for 850K packets is going to take some time.

I don’t know what to begin to suggest here…

Peter
OSR
(one of my more helpful posts, I know…)

I would suggest that if the OP has actually managed to accomplish this feat,
but is concerned that “it takes a complete CPU to process it, and this can
go on for ever”, he should declare victory, document that this throughput
consumes an entire processor for as long as needed, and check this stuff
into the build.



Mark Roddy

> 850K packets/second is a lot of freakin data. Even if your device is well designed, you’ve limited the need to recopy any packet data in your driver, and you’ve written your code efficiently, there’s STILL gonna be a lot of work for you to do. I mean… just scanning the ring buffer and calling NdisMCompleteRequest (or whatever it is) for 850K packets is going to take some time.

It is not only that…

The problem is that a bound protocol may send data in the context of an indication (and TCP/IP definitely does so in the case of TCP). Consider the case of a large TCP transmission. You have received data and indicated it up the stack. TCPIP will send ACKs to the counter-party, and it may do so right in the context of PtReceivePacket(). As a result, a SendComplete() DPC gets queued. When it finishes, TCPIP may send data again (probably, this time, in the context of a totally unrelated connection), so another DPC gets queued. Meanwhile the host that received the ACKs sends the next part of the transmission, another packet gets indicated to TCP, a new send happens in the context of PtReceivePacket(), and so on and so forth. Any more questions about why the machine goes unresponsive? If network traffic is really heavy, the CPU just has no time to do anything apart from dispatching the resulting DPCs, which never seem to end.

IIRC, Linux does not run more than N (I don’t remember the precise number) tasklets in a row, and the rationale behind this is to avoid the scenario we are speaking about. However, Windows takes a rather different approach - unless a DPC is of low priority, it drains the whole queue. This is why the very first idea that got into my head was to try low-priority DPCs.

Another option could be to introduce a spinlock in the indication handler and make the send handler check its state (without trying to acquire it, of course) - if the spinlock is held, it means the send is being done in the context of an indication. If this is the case, you can queue a work item instead of proceeding with the send operation straight away. If you do it this way, all such sends get done only after the indication has completed…
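Roughly sketching that idea (using the current-thread pointer rather than a spinlock to detect the reentrant case; all names are illustrative and none of this is tested):

#include <ntddk.h>

typedef struct _ADAPTER {
    PKTHREAD IndicatingThread;   /* thread currently inside our receive indication, or NULL */
    /* ... */
} ADAPTER, *PADAPTER;

/* Hypothetical helpers: */
VOID IndicateUpTheStack(PADAPTER Adapter);   /* e.g. the NdisMIndicateReceivePacket path */
VOID QueueDeferredSend(PADAPTER Adapter);    /* queues the transmit to a work item */
VOID TransmitNow(PADAPTER Adapter);          /* the normal inline send path */

/* Receive path: remember which thread is performing the indication. */
VOID
ReceiveIndicationSketch(PADAPTER Adapter)
{
    Adapter->IndicatingThread = KeGetCurrentThread();
    IndicateUpTheStack(Adapter);
    Adapter->IndicatingThread = NULL;
}

/* Send path: if TCPIP called us back from within our own indication,
 * defer the transmit so it runs only after the indication has unwound. */
VOID
SendHandlerSketch(PADAPTER Adapter)
{
    if (Adapter->IndicatingThread == KeGetCurrentThread()) {
        QueueDeferredSend(Adapter);
        return;
    }
    TransmitNow(Adapter);
}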

In any case, it is hard to give a precise recipe - I think the poster should try different options and see which of them works best…

Anton Bassov

Perhaps this means that 10GbE adapters are intended for high-end systems that have fast CPUs and DRAM and don’t have any “bad” devices/drivers that may be OK in PCs or laptops.

Also, such devices are expected to use special “high-end” tricks like offloads, RSS, RDMA and so on…

–PA


LOL… Half empty or half full?

*I* was going to suggest that if the OP has really accomplished this task, and it takes an entire CPU and the system becomes quite unresponsive, that he declare defeat, cancel his product, and forget about having to write the driver. If the I/O Subsystem can’t support 850K I/O completions per second GIVEN A PARTICULAR HARDWARE DESIGN without the system becoming unresponsive, it’s probably not a generically useful device and should be abandoned.

But, yes… I agree: Either declare victory and check it in or declare defeat and forget it.

I wonder how long it’ll take to download pr0n torrent at 10GB??

Peter
OSR

I think you added in “and the system becomes quite unresponsive”, which, if the system was supposed to do anything else besides stream 850,000 packets/sec, would indeed amount to a defeat that one should walk away from. It is always difficult to guess what these Oppies are actually up to, as they tend to post some inner detail of some great mystery they are working on, and then we are supposed to fill in the blanks and provide sensible answers.





Mark Roddy

> ----------

From: xxxxx@lists.osr.com[SMTP:xxxxx@lists.osr.com] on behalf of xxxxx@osr.com[SMTP:xxxxx@osr.com]
Reply To: Windows System Software Devs Interest List
Sent: Friday, January 25, 2008 2:37 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Painfully long DPC latencies - who’s to blame?

> *I* was going to suggest that if the OP has really accomplished this task, and it takes an entire CPU and the system becomes quite unresponsive, that he declare defeat, cancel his product, and forget about having to write the driver. If the I/O Subsystem can’t support 850K I/O completions per second GIVEN A PARTICULAR HARDWARE DESIGN without the system becoming unresponsive, it’s probably not a generically useful device and should be abandoned.

Hardware is more and more powerful every day and dual-core CPUs have already become standard. What about a quad-core CPU with one or more cores dedicated just to this purpose? I’m just speculating; I’m not sure whether processor affinity can be set in a way that would completely separate networking from the rest of the OS.

> I wonder how long it’ll take to download pr0n torrent at 10GB??

It’d probably be faster than you can watch ;)

Best regards,

Michal Vodicka
UPEK, Inc.
[xxxxx@upek.com, http://www.upek.com]