TDI Driver Reconnect Issue

Hi,

I have a TDI driver that communicates with a server over TCP. I have registered a “ClientEventDisconnect” handler, which is called when a disconnect occurs. My driver starts at boot time, and during boot this callback is invoked. However, when I examined the server side, it clearly showed that the server had not issued a disconnect on the connection, which means that something in the local stack has issued a disconnect to my driver.

After receiving this disconnect, I perform a local TCP disconnect using a TDI_DISCONNECT request. This request fails with status STATUS_CONNECTION_INVALID. I then try to connect to the same server using the same IP/port settings, but every connection attempt fails with status STATUS_IO_TIMEOUT.

I analyzed the network traffic and found that the previous TCP connection between the two machines is still active. In addition, for every connection attempt I make, the local TCP stack sends a SYN to the server, and the server replies with an ACK instead of a SYN-ACK. My driver attempts to connect in a loop because I need to resume as soon as possible. This cycle continues indefinitely with no successful connection; the TCP connection is never reset.

This reconnection code works well in the following scenarios:

  1. The server process is killed and restarted
  2. The network cables are removed, i.e. the connection is physically disrupted

I don't know what else can be done to make the local TCP stack reset the connection and start afresh. Can anybody help me out?
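One possible reading of the SYN/ACK exchange described above, not confirmed anywhere in this thread: a server tends to answer a SYN with a bare ACK when that SYN matches a connection it still considers established, which would happen if the reconnect reuses the same local port as the old connection. The sketch below is a toy user-mode model of that decision, following the RFC 5961 style of handling. All names (`sk_four_tuple`, `sk_reply_to_syn`, and so on) are hypothetical and are not part of TDI or any real stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only. */
typedef struct {
    uint32_t client_ip;
    uint16_t client_port;
    uint32_t server_ip;
    uint16_t server_port;
} sk_four_tuple;

typedef enum { SK_REPLY_SYN_ACK, SK_REPLY_ACK, SK_REPLY_RST } sk_reply;

/* Two connections are the same only if all four fields match. */
static int sk_same_tuple(const sk_four_tuple *a, const sk_four_tuple *b)
{
    return a->client_ip == b->client_ip &&
           a->client_port == b->client_port &&
           a->server_ip == b->server_ip &&
           a->server_port == b->server_port;
}

/*
 * Toy model of how a server answers an incoming SYN.
 * Assumption: RFC 5961 style handling, where a SYN that matches an
 * established connection draws an ACK. Classic RFC 793 handling may
 * answer an in-window SYN with a RST instead.
 */
sk_reply sk_reply_to_syn(const sk_four_tuple *established, size_t count,
                         const sk_four_tuple *syn, uint16_t listen_port)
{
    size_t i;

    assert(syn != NULL);
    assert(established != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (sk_same_tuple(&established[i], syn)) {
            return SK_REPLY_ACK;      /* old connection still alive */
        }
    }
    if (syn->server_port == listen_port) {
        return SK_REPLY_SYN_ACK;      /* genuinely new connection */
    }
    return SK_REPLY_RST;              /* nobody is listening */
}
```

If this model applies, a reconnect from a different local port would be treated as a new connection, while a reconnect from the old local port would keep drawing ACKs until the server side drops the stale connection. Comparing the source ports in the capture should show whether that is what is happening here.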

What was specified in the DisconnectFlags when the stack signals
ClientEventDisconnect?

Is the ConnectionContext correct as well (meaning, is it the value you
provided when you created the Connection Endpoint)?

What is the stack back-trace if you set a breakpoint on your
ClientEventDisconnect? Does the stack (with properly aligned symbols, of
course) show any hint as to why the indication is being delivered to your
client? Basically, is there an NDIS Packet receive occurring, a PnP event,
something else?

What happens if you ignore the disconnect? Can you still use the connection
endpoint? Do you still get receive events?

Is it possible that you formed the TDI_DISCONNECT IRP improperly in response
to the event?

Can you ‘close’ the Connection Endpoint?

Good Luck,
Dave Cattley
Consulting Engineer
Systems Software Development

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Friday, March 13, 2009 7:03 AM
To: Windows System Software Devs Interest List
Subject: [ntdev] TDI Driver Reconnect Issue

NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

I can offer a hint as to why this happened. I had a similar issue, but in a protocol driver: after receiving and sending some number of packets, the protocol driver got an unexpected unbind-adapter call from NDIS.
After analyzing the connection speed, I found that despite the fact that both the client and the server had 1Gb cards, the speed was 100Kb. This happened because one of the PCs was connected to a 100Kb switch.
Something in the Windows networking stack tries to reset the NIC because it expects a 1Gb speed but actually gets 100Kb. You should check your connection speed.

Igor Sharovar

First of all, I apologize for not replying sooner.

Thanks, David and Igor, for replying.

David, I investigated the questions you asked and came up with the following.

What was specified in the DisconnectFlags when the stack signals ClientEventDisconnect?
The disconnect flags contain the value 2, which I suppose is the value of “TDI_DISCONNECT_ABORT | TDI_DISCONNECT_RELEASE”. I am not quite sure what ORing these values actually means.
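For anyone reading the flag value above, here is a minimal sketch of decoding DisconnectFlags bit by bit. The three constants mirror what I recall from tdi.h (WAIT = 0x1, ABORT = 0x2, RELEASE = 0x4) and are redefined under hypothetical SK_ names so the snippet builds in user mode; please check them against your WDK headers. If those values are right, a value of 2 would be ABORT on its own rather than ABORT ORed with RELEASE.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed values, recalled from tdi.h; verify against your WDK headers. */
#define SK_TDI_DISCONNECT_WAIT    0x0001u
#define SK_TDI_DISCONNECT_ABORT   0x0002u
#define SK_TDI_DISCONNECT_RELEASE 0x0004u

/*
 * Writes a '|'-separated list of the flag names set in 'flags' into buf
 * (or "NONE" if no known bit is set) and returns buf.
 * Hypothetical helper for illustration; not a TDI API.
 */
const char *sk_describe_disconnect_flags(unsigned long flags,
                                         char *buf, size_t len)
{
    static const struct {
        unsigned long bit;
        const char *name;
    } table[] = {
        { SK_TDI_DISCONNECT_WAIT,    "WAIT"    },
        { SK_TDI_DISCONNECT_ABORT,   "ABORT"   },
        { SK_TDI_DISCONNECT_RELEASE, "RELEASE" },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (flags & table[i].bit) {
            if (buf[0] != '\0') {
                strncat(buf, "|", len - strlen(buf) - 1);
            }
            strncat(buf, table[i].name, len - strlen(buf) - 1);
        }
    }
    if (buf[0] == '\0') {
        strncat(buf, "NONE", len - 1);
    }
    return buf;
}
```

Under these assumptions, an abortive disconnect means the stack considers the connection gone for good, as opposed to an orderly release where the peer has merely finished sending.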

Is the ConnectionContext correct as well (meaning, is it the value you provided when you created the Connection Endpoint)?
Yes, the connection endpoint is correct. I see the values that I provided at creation.

What is the stack back-trace if you set a break-point on your ClientEventDisconnect?
The following stack backtrace is seen when I get a disconnect:
kd> kb
ChildEBP RetAddr Args to Child
80550214 f96afdbd 822e2970 00000000 00000000 Proto!TDI_DisconnectEvent [l:\vdi\client\windows\proto.cpp @ 1824]
8055024c f96b5e85 822e2970 00000000 c00000b5 tcpip!NotifyOfDisc+0x17a
805502d0 f96a83ec f96f08e0 00000000 805503fc tcpip!TCBTimeout+0x78b
805502e0 80501543 f96f08f0 f96f08e0 8fb00fae tcpip!TCBTimeoutdpc+0xf
805503fc 8050165f 8055b0a0 ffdff9c0 ffdff000 nt!KiTimerListExpire+0x14b
80550428 80544e5f 8055b4a0 00000000 00000bf8 nt!KiTimerExpiration+0xb1
80550450 80544d44 00000000 0000000e 00000000 nt!KiRetireDpcList+0x61
80550454 00000000 0000000e 00000000 00000000 nt!KiIdleLoop+0x28
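A side note on the trace above: the third value shown for tcpip!NotifyOfDisc, c00000b5, looks like an NTSTATUS. As far as I know, 0xC00000B5 is STATUS_IO_TIMEOUT, the same status the connect attempts fail with, although the "Args to Child" columns in kb output are raw stack values and are not guaranteed to be real arguments. The sketch below splits an NTSTATUS into its documented bit fields and names the two status codes mentioned in this thread. The helper names are hypothetical, and the numeric values are quoted from memory of ntstatus.h, so please verify them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit fields of an NTSTATUS value. */
typedef struct {
    unsigned severity;  /* bits 31-30: 0 success, 1 info, 2 warning, 3 error */
    unsigned customer;  /* bit 29: set for customer-defined codes */
    unsigned facility;  /* bits 27-16 */
    unsigned code;      /* bits 15-0 */
} sk_ntstatus_fields;

/* Splits 'status' into its fields. Hypothetical helper, not an NT API. */
void sk_split_ntstatus(uint32_t status, sk_ntstatus_fields *out)
{
    assert(out != NULL);
    out->severity = (status >> 30) & 0x3u;
    out->customer = (status >> 29) & 0x1u;
    out->facility = (status >> 16) & 0xFFFu;
    out->code     = status & 0xFFFFu;
}

/* Names only the statuses mentioned in this thread (values assumed). */
const char *sk_status_name(uint32_t status)
{
    switch (status) {
    case 0xC00000B5u: return "STATUS_IO_TIMEOUT";
    case 0xC000023Au: return "STATUS_CONNECTION_INVALID";
    default:          return "(not in this table)";
    }
}
```

If that reading is correct, the disconnect indication comes from a timer on the connection expiring inside tcpip.sys (TCBTimeout), which would fit traffic for the connection no longer getting through locally rather than a reset arriving from the server.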

Does the stack (with properly aligned symbols, of course) show any hint as to why the indication is being delivered to your client? Basically, is there an NDIS Packet receive occurring, a PnP event, something else?
The only common factor I can see is a Microsoft driver, “ipnat.sys”. The disconnect occurs while this driver is loading. If I prevent this driver from loading, the issue doesn't occur. Still, I am not sure why this driver would issue a disconnect on the stack.

What happens if you ignore the disconnect? Can you still use the connection endpoint? Do you still get receive events?
If I ignore the disconnect, I am not able to transmit, nor do I get any receive events. The connection endpoint becomes invalid.

Is it possible that you formed the TDI_DISCONNECT Irp improperly in response to the event?
No, I don't think so. The same code works fine if I issue a disconnect from my end. I have tested the code in a variety of scenarios, such as killing the server process and restarting it, disabling networks, etc., and it works fine. It is just this one scenario in which the code fails.

Can you ‘close’ the Connection Endpoint?
No, I am not able to close the connection endpoint. The moment I try to close it, I get the error “STATUS_CONNECTION_INVALID”.

From all this, what I was able to figure out is that the driver “Ipnat.sys” makes some calls that lead to this disconnect. This happens while the driver is loading, at approximately winlogon. The stack suggests that a TCB timeout is occurring, but there is no reason why this should happen, especially when the connection is intact (this also happens when the machines are connected peer to peer). The driver is the “IP Network Address Translator”, and my best guess is that the disconnect is due to some security issue that arises because the machine has established TCP connections before anyone has even logged in. Still, I am not sure. Any ideas?

> Now what i was able to figure out from all this was that the driver “Ipnat.sys” was making some calls which leads to this disconnect. This happens when the driver is getting loaded , approx at winlogon.

ipnat.sys is the Windows Firewall core (not only the NAT). So it probably breaks all connections on load.


Maxim S. Shatskih
Windows DDK MVP
xxxxx@storagecraft.com
http://www.storagecraft.com

I tried disabling the Windows Firewall, but I still get the disconnect call. It's only when I uninstall the device that the problem goes away. I tried adding local port exceptions for this, but still no luck. Is there any workaround for this?

I think you need to figure out what ‘this’ is first. Are you 100% certain
that the connections are being proactively reset by some action of a system
component (like the firewall driver) or is that still conjecture?

Good Luck,
Dave Cattley

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@yahoo.com
Sent: Monday, March 30, 2009 3:38 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] TDI Driver Reconnect Issue


Hi,

Yes, I am 100% certain that the “ipnat” driver is responsible for the connection being reset. Once I remove this driver, I never encounter the disconnect issue. But if this driver is the core of the Windows Firewall, as Maxim suggested, then I can't disable or uninstall it, considering its usage. I tried adding port exceptions to the Windows Firewall, but no luck.