I have migrated from TDI to Winsock (WSK), but when I compare I/O performance, I find that I/O takes much longer with the Winsock API than with TDI.
I am using the following APIs for the Winsock implementation:
WskRegister
WskCaptureProviderNPI
WskSocketConnect
WskSend
WskReceive
WskDisconnect
WskCloseSocket
WskReleaseProviderNPI
WskDeregister
Could anyone please let me know what is wrong here?
Any help is much appreciated.
If you register a receive event handler, you are called back at DPC level, during the packet indication from the NIC miniport, with a list of buffer fragments for the connection. I believe this happens before the copy, so if you can process the fragmented data in place, you avoid the copy into a virtually contiguous buffer. This is about as fast a path as you can get, unless you move to a queue-polling architecture (RDMA QP or PacketDirect). If you queue a buffer instead, the receive IRP completes after the copy.
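To make "process the fragmented data" concrete, here is a minimal user-mode sketch in plain C of parsing a message directly out of a chain of fragments, without first flattening it into one contiguous buffer. Everything in it is an assumption for illustration: `struct frag`, `struct frag_cursor`, `parse_message`, and the wire format (a 4-byte big-endian length followed by payload bytes) are hypothetical and are not WSK types or anything from the original poster's protocol. In a real WSK receive event callback the fragments would arrive as a `WSK_DATA_INDICATION` list whose buffers are described by MDL chains, and the walking logic would have to follow those structures instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for one received buffer fragment.
 * NOT a WSK type; it only models "pointer + length + next". */
struct frag {
    const uint8_t *data;
    size_t len;
    const struct frag *next;
};

/* Read position inside a fragment chain. */
struct frag_cursor {
    const struct frag *cur;
    size_t off; /* byte offset inside cur */
};

/* Hypothetical parsed result: payload length and a byte sum. */
struct message {
    uint32_t payload_len;
    uint32_t payload_sum;
};

/* Read one byte, stepping over exhausted or empty fragments.
 * Returns 0 on success, -1 if the chain has no more data. */
static int cursor_get_u8(struct frag_cursor *c, uint8_t *out)
{
    while (c->cur != NULL && c->off >= c->cur->len) {
        c->cur = c->cur->next;
        c->off = 0;
    }
    if (c->cur == NULL)
        return -1;
    *out = c->cur->data[c->off++];
    return 0;
}

/* Read a big-endian 32-bit value that may straddle fragments. */
static int cursor_get_be32(struct frag_cursor *c, uint32_t *out)
{
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) {
        uint8_t b;
        if (cursor_get_u8(c, &b) != 0)
            return -1;
        v = (v << 8) | b;
    }
    *out = v;
    return 0;
}

/* Parse [be32 length][payload] in place, never copying the payload.
 * Returns 0 on success, -1 if the chain ends before the message does. */
static int parse_message(const struct frag *head, struct message *out)
{
    struct frag_cursor c = { head, 0 };
    uint32_t len, sum = 0;

    assert(out != NULL);
    if (cursor_get_be32(&c, &len) != 0)
        return -1;
    for (uint32_t i = 0; i < len; i++) {
        uint8_t b;
        if (cursor_get_u8(&c, &b) != 0)
            return -1;
        sum += b;
    }
    out->payload_len = len;
    out->payload_sum = sum;
    return 0;
}
```

The byte-at-a-time cursor is chosen for clarity, not speed; a production parser would more likely consume each fragment in runs. The point is only that header fields and payload can straddle fragment boundaries and still be consumed without an intermediate copy.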
If your sends of small data seem slow, are you setting the push flag, and is Nagle possibly enabled?
Also note that NICs generally do interrupt moderation, which decreases interrupt overhead but also increases receive latency. Are you transferring large blocks or tiny ones?
Jan
From: xxxxx@lists.osr.com on behalf of xxxxx@gmail.com Sent: Thursday, September 29, 2016 10:46:21 AM To: Windows System Software Devs Interest List Subject: [ntdev] Performance of winsock implementation