Handling high speed data received in PC's RAM

Hello,

Can you please explain how an Ethernet card handles incoming UDP messages?

If a message is sent to the card and the user does not call ‘receive’,
where does the data wait?

Is there a buffer handled by the kernel that holds the data until the user calls
receive?

If the source keeps sending and the PC does not call ‘receive’, what happens
when there is no room left in the buffer?

Why am I asking?

We are starting to develop an FPGA design that gets data from an A/D converter and has to write it to
the PC’s RAM.
The input rate is ~10 Gb/sec.

I would prefer that the data be copied once, into pre-allocated RAM in the PC, rather than
twice: first to a buffer handled by the kernel and from there to the application’s virtual address
space.

Best regards,
Z.V

You know, Mr. Vered, it’s hard teaching you how to write drivers one email at a time over the course of these many months. Is this a university project you’re working on? I mention this because getting piecemeal advice is rarely conducive to a good project outcome. What makes sense in isolation, when a specific question is answered, doesn’t necessarily make sense when the architecture is viewed as a whole.

Sigh.

The answer to all these questions is, mostly, it depends.

Network cards typically provide some amount of on-device buffering. Sometimes this buffer is ALL that’s required. For other devices, the driver provides buffers from non-paged host memory to the card (sometimes a selection of buffer sizes is provided… such as small buffers and large buffers). As the card fills these buffers (host-supplied or device-resident), it indicates to the driver that these buffers are ready.

There are no in-built features of Windows that handle this aspect of the receive automatically. It’s up to each driver, and it’s what makes writing NIC drivers particularly interesting and (for some of us) enjoyable.

The driver then indicates the filled buffers to NDIS. Clearly, the details of this aspect aren’t of interest to you.

If the NIC runs out of buffer space, it does exactly what you’d expect… it signals a “buffer overrun”… it starts tossing data blocks. The driver can respond in ways more or less clever to this condition, as you can imagine.
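
To make the overrun behavior concrete, here is a minimal sketch in plain C of a fixed set of receive buffers that drops new frames when every buffer is full. It is only a model of the idea, not how any particular NIC or driver does it; the names (`rx_ring`, `rx_ring_put`, `rx_ring_get`) and the sizes are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 4      /* hypothetical number of receive buffers */
#define SLOT_BYTES 2048   /* hypothetical size of each buffer */

typedef struct {
    uint8_t       data[RING_SLOTS][SLOT_BYTES];
    size_t        len[RING_SLOTS];
    unsigned      head;      /* next slot the producer (the card) fills */
    unsigned      tail;      /* next slot the consumer drains */
    unsigned      count;     /* slots currently holding a frame */
    unsigned long overruns;  /* frames tossed because no slot was free */
} rx_ring;

void rx_ring_init(rx_ring *r) { memset(r, 0, sizeof *r); }

/* Producer side. Returns 1 if the frame was stored, 0 if it was dropped
 * because the ring was full, -1 if the frame size is not acceptable. */
int rx_ring_put(rx_ring *r, const void *frame, size_t n)
{
    if (n == 0 || n > SLOT_BYTES)
        return -1;
    if (r->count == RING_SLOTS) {
        r->overruns++;            /* "buffer overrun": the data is lost */
        return 0;
    }
    memcpy(r->data[r->head], frame, n);
    r->len[r->head] = n;
    r->head = (r->head + 1) % RING_SLOTS;
    r->count++;
    return 1;
}

/* Consumer side. Copies out the oldest frame (truncated to cap bytes) and
 * frees its slot. Returns the number of bytes copied, or 0 if empty. */
size_t rx_ring_get(rx_ring *r, void *out, size_t cap)
{
    if (r->count == 0)
        return 0;
    size_t n = r->len[r->tail];
    if (n > cap)
        n = cap;
    memcpy(out, r->data[r->tail], n);
    r->tail = (r->tail + 1) % RING_SLOTS;
    r->count--;
    return n;
}
```

If the producer calls `rx_ring_put` six times before anyone calls `rx_ring_get`, four frames are held and `overruns` ends up at 2. A cleverer driver could react to a rising overrun count, for example by supplying more buffers.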

Zero-copy networking is, to some, the “holy grail” of a network stack. I remember a certain executive vice president of a certain large software vendor (who has recently left the company) who was obsessed with zero-copy networking. Drove me absolutely crazy. But he eventually got it.

Ignoring networks, which have considerable infrastructure that is specific to messaging… getting data back to an application is all about implementing the right upper edge interface. We’ve done this for high-speed throughput devices many, many times. The only way you can avoid the copy is, of course, by having the application and the driver share some memory. I suggest having the app provide at startup a series of LARGE buffers (1MB or more each) to the driver. The app then indicates which parts of these buffers (initially all of them) are available to the driver by sending a series of IOCTLs, which the driver keeps pending. The driver then fills these buffers and, based on fill-level and/or time (whichever occurs first), signals data back to the application by completing one or more of the IOCTLs. When the app finishes processing the returned data, it sends the IOCTL back (again) to the driver to indicate that the buffer range is once again available for the driver’s use.

That’s one way. It’s the cleanest way I know of, and it works quite well for high volumes of data.
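
The ownership hand-off described above can be modeled in a few lines of portable C. This is a user-mode simulation of the bookkeeping only, not driver code: `app_post` stands in for the app sending an IOCTL that the driver keeps pending, and completing a region stands in for completing that IOCTL. All names, the region count, and the 10 ms flush limit are assumptions made for the sketch. It also assumes the driver fills regions in index order and the app re-posts them in the same order. For scale: at ~10 Gb/sec (about 1.25 GB/sec) a 1MB buffer fills in a bit under a millisecond.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define NUM_REGIONS 4    /* hypothetical number of shared buffers */
#define FLUSH_MS    10   /* hypothetical limit before a partial buffer is returned */

typedef enum { OWNER_APP, OWNER_DRIVER } owner_t;

typedef struct {
    owner_t  owner;      /* who may touch this region right now */
    size_t   filled;     /* valid bytes written by the driver */
    uint64_t first_ms;   /* time the first byte landed in this region */
} region_t;

typedef struct {
    uint8_t *mem;            /* shared memory, NUM_REGIONS * region_bytes long */
    size_t   region_bytes;
    region_t region[NUM_REGIONS];
    unsigned cur;            /* region the driver fills next (index order) */
    uint64_t dropped_bytes;  /* data lost because no region was available */
} shared_pool;

void pool_init(shared_pool *p, uint8_t *mem, size_t region_bytes)
{
    memset(p, 0, sizeof *p);   /* every region starts out owned by the app */
    p->mem = mem;
    p->region_bytes = region_bytes;
}

/* App side: hand region i to the driver (the "pending IOCTL"). */
int app_post(shared_pool *p, unsigned i)
{
    if (i >= NUM_REGIONS || p->region[i].owner == OWNER_DRIVER)
        return -1;
    p->region[i].filled = 0;
    p->region[i].owner = OWNER_DRIVER;
    return 0;
}

/* Driver side: give the current region back (the "IOCTL completion"). */
static void complete_region(shared_pool *p)
{
    p->region[p->cur].owner = OWNER_APP;
    p->cur = (p->cur + 1) % NUM_REGIONS;
}

/* Driver side: store n incoming bytes; completes each region that fills up.
 * Returns the number of bytes stored; the rest is counted as dropped. */
size_t driver_write(shared_pool *p, const uint8_t *src, size_t n, uint64_t now_ms)
{
    size_t done = 0;
    while (done < n) {
        region_t *r = &p->region[p->cur];
        if (r->owner != OWNER_DRIVER) {        /* app has not re-posted yet */
            p->dropped_bytes += n - done;
            break;
        }
        if (r->filled == 0)
            r->first_ms = now_ms;
        size_t room = p->region_bytes - r->filled;
        size_t k = (n - done < room) ? n - done : room;
        memcpy(p->mem + (size_t)p->cur * p->region_bytes + r->filled, src + done, k);
        r->filled += k;
        done += k;
        if (r->filled == p->region_bytes)      /* fill-level trigger */
            complete_region(p);
    }
    return done;
}

/* Driver side: time trigger. Returns 1 if a partial region was completed. */
int driver_tick(shared_pool *p, uint64_t now_ms)
{
    region_t *r = &p->region[p->cur];
    if (r->owner == OWNER_DRIVER && r->filled > 0 &&
        now_ms - r->first_ms >= FLUSH_MS) {
        complete_region(p);
        return 1;
    }
    return 0;
}
```

When a region comes back with `owner == OWNER_APP`, the app reads `filled` bytes from it and then calls `app_post` again to return it. In a real driver the buffers would be locked down and mapped once at startup, and the timer would run in the driver rather than being polled.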

Peter
OSR
@OSRDrivers

It sounds like the fairly new user mode API called registered I/O might be a fit for you. See https://technet.microsoft.com/en-us/library/Hh997032.aspx

RIO is an attempt to apply some of the strategies learned from RDMA hardware to the normal TCP/IP stack. You won’t be able to achieve zero memory copies, but you should achieve a consistent single copy from the receive buffers DMAed by the hardware to the buffers queued by the application. RIO looks like a good fit for message-based protocols like UDP, and a less good fit for streaming protocols like TCP. I believe RIO can also process traffic without requiring a user/kernel transition per message, as the prequeued buffers can be filled from DPC level (possibly on a different core, although since a copy is involved you might want the DPC to run on the same core as the user mode code, to optimize cache benefits), and a user mode app can poll the completion queue. A pool of buffers is registered with the kernel, so you avoid the overhead of continually mapping/unmapping them when ownership passes back and forth between user and kernel mode.

Jan

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of Zvi Vered
Sent: Friday, July 31, 2015 11:43 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Handling high speed data received in PC’s RAM


Guys… I don’t think he’s actually interested in networking. He used UDP as an example. He has an A/D converter that’s returning data to user mode.

Peter
OSR
@OSRDrivers

> Is there a buffer handled by the kernel that holds the data till user calls
> receive ?

Yes, at the socket layer above TCP/IP (in AFD.SYS).

TDI and WinSock Kernel clients must reproduce this logic of AFD.SYS themselves.

> If the source keeps sending and the PC does not call ‘receive’ what happens
> when there is no place in the buffer ?

The packets are dropped. UDP allows for this.
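
For an ordinary UDP socket, the size of that socket-layer buffer can be requested with the standard `SO_RCVBUF` option, which delays the point at which datagrams start being dropped. A minimal sketch follows. For brevity it uses the POSIX socket headers; Winsock offers the same `setsockopt`/`getsockopt` calls and option name, but there you would include `winsock2.h`, call `WSAStartup` first, and close with `closesocket`. The helper name `udp_rcvbuf_request` is made up for illustration.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/* Creates a UDP socket, asks for a receive buffer of requested_bytes, and
 * returns the size the OS reports afterwards, or -1 on error. The OS may
 * round or clamp the request, so the result can differ from what was asked. */
int udp_rcvbuf_request(int requested_bytes)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0)
        return -1;

    int got = -1;
    socklen_t len = sizeof got;
    if (setsockopt(s, SOL_SOCKET, SO_RCVBUF,
                   &requested_bytes, sizeof requested_bytes) != 0 ||
        getsockopt(s, SOL_SOCKET, SO_RCVBUF, &got, &len) != 0)
        got = -1;

    close(s);   /* this sketch only inspects the setting, it receives nothing */
    return got;
}
```

A larger buffer only absorbs bursts. If the sender stays faster than the receiver, the buffer still fills and packets are still dropped.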


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com