Observing low RX throughput for 10G ethernet NIC

Hi Experts,

I am developing a 10G Ethernet driver based on the NDIS 6.4 framework.
I am observing low Rx throughput, but the Rx throughput observed is as expected for a 10G device.

From my analysis, I have found that the driver is able to receive the MSI interrupt for a received Rx packet. The MSI interrupt handler just disables the Rx interrupt, and then the MSI DPC is called, where packets are processed by receiving them from the DMA channel. Each DPC call processes a maximum of 64 packets.

As per MSDN, the DPC is called on the same processor on which the MSI interrupt handler ran. Hence, until the DPC has completed, the driver will not receive any further interrupt. Is my understanding correct? If yes, does this have an impact on throughput?

Please help me with how to proceed with further debugging of this issue.
From my debugging, the HW does not have any issue, since the same HW works fine under Linux and achieves the expected throughput (TP). The mapped HW register values are the same on Linux and Windows.

Any insight into how to find the root cause would be of great help.

Thanks and Regards,
Sachin

xxxxx@gmail.com wrote:

As per MSDN, the DPC is called on the same processor on which the MSI interrupt handler ran. Hence, until the DPC has completed, the driver will not receive any further interrupt. Is my understanding correct?

No. The interrupt has a higher IRQL than the DPC, so if an interrupt
comes in, the DPC will be suspended while the ISR runs, unless your DPC
has grabbed the interrupt lock.
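
To make the IRQL argument concrete, here is a minimal user-mode sketch in Python (not driver code). PASSIVE_LEVEL, APC_LEVEL and DISPATCH_LEVEL are the usual Windows values; EXAMPLE_DIRQL and both function names are hypothetical and exist only for illustration. The model covers IRQL only and says nothing about whether the interrupt is masked at the device.

```python
# Toy model of IRQL-based preemption on a single CPU (not driver code).
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2   # DPC routines run at this level
EXAMPLE_DIRQL = 9    # hypothetical; the system assigns the real device IRQL


def dpc_irql(holds_interrupt_lock, dirql=EXAMPLE_DIRQL):
    """IRQL the DPC is running at.

    Normally DISPATCH_LEVEL. While the DPC holds the interrupt spin lock,
    the CPU is raised to the device IRQL.
    """
    return dirql if holds_interrupt_lock else DISPATCH_LEVEL


def isr_preempts(current_irql, interrupt_irql):
    """An interrupt is serviced only if its IRQL is strictly higher than
    the IRQL the CPU is currently running at."""
    return interrupt_irql > current_irql


if __name__ == "__main__":
    print("DPC without lock is preempted:",
          isr_preempts(dpc_irql(False), EXAMPLE_DIRQL))
    print("DPC holding interrupt lock is preempted:",
          isr_preempts(dpc_irql(True), EXAMPLE_DIRQL))
```

In this model the ISR interrupts a running DPC in the first case and has to wait in the second, which is the exception described above.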

Please help me with how to proceed with further debugging of this issue.

What issue?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

First, your conclusion that your hardware does not have any problems because it works on *nix is unsound.

Second, your DPC may run on the same CPU, but the critical factor in receiving more interrupts is when your driver re-enables them. You say that this is after a maximum of 64 packets. Most high-performance NIC drivers alternate between an interrupt-driven mode and a polling mode depending on the volume of traffic, and most include multiple queues of classified traffic so that higher layers (e.g. TCP) can receive in-order packets for specific streams of data while also allowing a fan-out of processing across multiple CPUs. What does yours do?
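
For a sense of scale, a rough back-of-the-envelope calculation in Python follows. It assumes standard Ethernet framing (8 bytes of preamble/SFD plus a 12-byte inter-frame gap per frame) and a worst case in which traffic arrives at line rate, every DPC drains exactly 64 packets, and the Rx interrupt is then re-enabled. The function names are made up for this sketch; the 64-packet budget is the figure from the original post.

```python
# Per-frame overhead on the wire that is not part of the frame itself:
# preamble + start-of-frame delimiter (8 bytes) and inter-frame gap (12 bytes).
WIRE_OVERHEAD_BYTES = 20
LINK_BPS = 10_000_000_000  # 10 Gb/s


def packets_per_second(link_bps, frame_bytes):
    """Maximum frames per second at line rate.

    frame_bytes is the Ethernet frame size including MAC header and FCS
    (64 for minimum-size frames, 1518 for a full 1500-byte payload).
    """
    bits_on_wire = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps / bits_on_wire


def interrupts_per_second(pps, packets_per_dpc):
    """Interrupt rate if each interrupt leads to one DPC that handles
    packets_per_dpc packets before the interrupt is re-enabled."""
    return pps / packets_per_dpc


if __name__ == "__main__":
    for frame in (64, 1518):
        pps = packets_per_second(LINK_BPS, frame)
        print(f"{frame}-byte frames: {pps:,.0f} packets/s, "
              f"{interrupts_per_second(pps, 64):,.0f} interrupts/s at 64 per DPC")
```

Even with full-size frames the driver in this model takes an interrupt roughly every 80 microseconds, so the cost of each interrupt/DPC round trip, and the point at which interrupts are re-enabled, matter a great deal.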


From: xxxxx@gmail.com
Sent: May 12, 2017 5:44 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] Observing low RX throughput for 10G ethernet NIC

NTDEV is sponsored by OSR

Visit the list online at: http:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at http:

Thanks for your valuable comments…

In our driver there is no polling mode for processing packets.
Packets are processed in the DPC, once the DPC function queued by the ISR is called.

For a polling mechanism, do we need to poll for Rx packets on a different CPU by creating a thread to handle it?

Regards,
Sachin

This topic is too large for a discussion on a forum like this, but briefly consider the following:

If the volume of packets arriving at your NIC is very low, then there will be a long time between received packets. During the interval between packets, it would be wasteful for a CPU to continually poll your device for new data that does not exist, so the OS should go ahead and schedule other activities, and when a packet arrives and the device needs attention, the device should trigger an interrupt. This allows the OS to schedule threads and use CPU resources efficiently while still giving your device the attention it needs when there is actually something to do.

If the volume of packets arriving at your NIC is very high, then there will be no time at all between received packets. The packets will be read off of the wire and become available for consumption by the OS faster than any single CPU could handle them. In this situation, using an interrupt would be wasteful because it requires a context switch that will consume CPU resources that could otherwise be used to retrieve or process packets. In this situation a long running DPC that continually provides upper layers in the OS new packets to process would allow a machine with a sufficient number of cores to achieve a higher throughput. Your device requires continuous attention and it gets it. A refinement that further increases throughput is to pre-classify the packets on your device into the probable network streams so that each "connection" can be handled in a separate instance of the DPC function that can run concurrently (a facet of RSS).
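
The pre-classification idea can be sketched in a few lines of Python. This is only an illustration: real RSS hardware computes a Toeplitz hash over the packet headers and looks the result up in an indirection table, whereas this sketch substitutes zlib.crc32 and a plain modulo as a stand-in. The names flow_queue and steer, and the tuple layout of a packet, are assumptions made for the example.

```python
import zlib


def flow_queue(src_ip, src_port, dst_ip, dst_port, num_queues):
    """Map a connection 4-tuple to a receive queue index.

    Stand-in hash (CRC32); real RSS uses a keyed Toeplitz hash.
    The same flow always maps to the same queue.
    """
    key = f"{src_ip}:{src_port}>{dst_ip}:{dst_port}".encode()
    return zlib.crc32(key) % num_queues


def steer(packets, num_queues):
    """Distribute (flow, seq) packets across per-CPU queues.

    Packets of one flow stay in arrival order inside a single queue,
    while different flows can be processed on different CPUs.
    """
    queues = [[] for _ in range(num_queues)]
    for flow, seq in packets:
        queues[flow_queue(*flow, num_queues)].append((flow, seq))
    return queues


if __name__ == "__main__":
    a = ("10.0.0.1", 5000, "10.0.0.9", 80)
    b = ("10.0.0.2", 5001, "10.0.0.9", 80)
    rx = [(a, 0), (b, 0), (a, 1), (b, 1), (a, 2)]
    for cpu, q in enumerate(steer(rx, 4)):
        print(cpu, q)
```

Because every packet of a flow lands in the same queue, each queue can be drained by its own DPC instance without reordering a TCP stream.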

The key problem is knowing when to use each mode and how to limit the monopolization of system resources when in polling mode. This will be device and system specific, but well designed algorithms can handle this.
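
A toy simulation of such a policy, in Python, might look like the sketch below. It is a simplified model, not an NDIS implementation: time advances in ticks, one DPC pass runs per tick, the ring never overflows, and the policy is simply "keep polling while the ring is non-empty, but go back to interrupt mode after max_consecutive_polls passes". The function and parameter names are invented for the example; the default budget of 64 is the figure from the original post.

```python
def simulate(arrivals, budget=64, max_consecutive_polls=8):
    """Simulate an Rx path that switches between interrupt and polling mode.

    arrivals: number of packets added to the Rx ring at each tick.
    Returns (interrupts, dpc_passes, packets_processed).
    """
    ring = 0                 # packets waiting in the Rx ring
    polling = False          # True while the DPC keeps itself scheduled
    consecutive = 0          # DPC passes since the last interrupt
    interrupts = passes = processed = 0

    for n in arrivals:
        ring += n
        if not polling:
            if ring == 0:
                continue     # idle: no interrupt, CPU is free for other work
            interrupts += 1  # device interrupts; ISR masks Rx and queues the DPC
            consecutive = 0

        # One DPC pass: handle at most `budget` packets.
        taken = min(ring, budget)
        ring -= taken
        processed += taken
        passes += 1
        consecutive += 1

        # Keep polling only if work remains and the cap is not reached;
        # otherwise re-enable the Rx interrupt.
        polling = ring > 0 and consecutive < max_consecutive_polls

    return interrupts, passes, processed


if __name__ == "__main__":
    print("light load:", simulate([1, 0, 0, 0, 1, 0, 0, 0]))
    print("heavy load, adaptive:", simulate([100] * 16))
    print("heavy load, interrupt per pass:",
          simulate([100] * 16, max_consecutive_polls=1))
```

Under light load the model takes one interrupt per packet and leaves the CPU idle in between; under heavy load the adaptive variant processes the same number of packets with far fewer interrupts than the variant that re-enables the interrupt after every pass. The cap on consecutive passes is one simple way to limit how long the DPC monopolizes a CPU.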

This is not to mention the myriad of hardware offloads and other optimizations that are supported in the modern stack. These add significant complications to the tradeoff I mention here. As I said, this is far too large a problem to have an effective answer from a forum like this.


From: xxxxx@gmail.com
Sent: May 15, 2017 1:02 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] Observing low RX throughput for 10G ethernet NIC