Memory leak in NDIS5 <-> NDIS6 wrapper layer?

Hi folks,

I’ve been attempting to debug a memory leak for a little while, and come
to an unsavoury conclusion.

I’m testing an NDIS5 driver on 2K8. On the Rx side of things:

  • I allocate packets with NdisAllocatePacket
  • Chain an MDL into the packet using NdisChainBufferAtFront.
  • Indicate the packet up the stack.

The one thing to note is that the MDL is not allocated using
NdisAllocateBuffer but is an MDL allocated by one of our other drivers.
I note however that NdisAllocateBuffer simply does an IoAllocateMdl
followed by an MmBuildMdlForNonPagedPool.

Later on, when the stack indicates it’s finished with the packet:

  • I unchain the buffer using NdisUnchainBufferAtFront.
  • I free the packet using NdisFreePacket.

Unfortunately, there’s a sizeable memory leak with the pooltag NDnd. We
have a fancy “packet recycling” algorithm, which I disabled, so that
packets are always explicitly allocated or freed. However, the memory
leak still exists.

Investigation revealed that the most likely candidate was this allocation:

NDIS!ndisAllocateFromNPagedPool
NDIS!ndisPplAllocate+0x61
NDIS!NdisAllocateNetBufferAndNetBufferList+0x53
NDIS!ndisXlateRecvPacketArrayToNetBufferLists+0xc8
NDIS!ndisMIndicatePacketsToNetBufferLists+0x3d
sfcndis5!NICHandleRxPush+0x1ea

So it looks like the net buffer list or some associated structure
created to “encapsulate” the NDIS5 packet (which roughly translates to
an NDIS6 net buffer) is not getting freed.

I’ve checked that we free pretty much everything we allocate:

+0xc8c PacketPoolAllocs : 1
+0xc90 PacketPoolFrees : 0
+0xc94 PacketAllocs : 0x6fea1
+0xc98 PacketFrees : 0x6fe9c
+0xc9c BuffersChained : 0x6fea1
+0xca0 BuffersUnchained : 0x6fe9c
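
As a quick sanity check on the counters above, the number of packets and buffers still held by the stack is just the difference between each pair. The sketch below is plain user-mode C, not driver code; the struct and function names are hypothetical and only mirror the field names in the dump.

```c
#include <stdint.h>

/* Hypothetical mirror of the statistics fields shown in the dump above. */
typedef struct RxCounters {
    uint32_t PacketAllocs;
    uint32_t PacketFrees;
    uint32_t BuffersChained;
    uint32_t BuffersUnchained;
} RxCounters;

/* Packets allocated but not yet freed (still indicated up the stack). */
uint32_t packets_in_flight(const RxCounters *c)
{
    return c->PacketAllocs - c->PacketFrees;
}

/* Buffers chained into packets but not yet unchained. */
uint32_t buffers_in_flight(const RxCounters *c)
{
    return c->BuffersChained - c->BuffersUnchained;
}

/* Non-zero when every outstanding packet has exactly one outstanding buffer. */
int counters_consistent(const RxCounters *c)
{
    return packets_in_flight(c) == buffers_in_flight(c);
}
```

Fed the values from the dump (0x6fea1 allocs, 0x6fe9c frees), both differences come out at 5, so at the moment of the snapshot only five packets were outstanding from the driver's point of view.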

and the result of !poolused does not look good:

1: kd> !poolused 0x1 NDnd
   Sorting by Tag

 Pool Used:
             NonPaged                            Paged
 Tag     Allocs   Frees    Diff       Used   Allocs   Frees    Diff    Used
 NDnd    600438      90  600348  153701592        0       0       0       0
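
To put the !poolused numbers in perspective, the outstanding allocation count and the average size per outstanding allocation can be worked out directly from the NonPaged columns. This is a minimal user-mode C sketch of that arithmetic; the struct and function names are hypothetical, and the average is only a rough indicator since it assumes all NDnd allocations are of similar size.

```c
#include <stdint.h>

/* Hypothetical holder for one row of !poolused output (NonPaged columns). */
typedef struct PoolTagUsage {
    uint64_t allocs;      /* "Allocs" column */
    uint64_t frees;       /* "Frees" column  */
    uint64_t bytes_used;  /* "Used" column   */
} PoolTagUsage;

/* Allocations made under the tag that have not been freed ("Diff"). */
uint64_t outstanding_allocs(const PoolTagUsage *u)
{
    return u->allocs - u->frees;
}

/* Average bytes per outstanding allocation, rounded down; 0 if none. */
uint64_t avg_bytes_per_alloc(const PoolTagUsage *u)
{
    uint64_t n = outstanding_allocs(u);
    return n ? u->bytes_used / n : 0;
}
```

With the NDnd row above (600438 allocs, 90 frees, 153701592 bytes) this gives 600348 outstanding allocations of roughly 256 bytes each, against only a handful of packets outstanding according to the driver's own counters.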

What have I forgotten? If nothing, then looks like a flaw in the wrapper.

MH.

Martin,

From your description, it does not sound like you have done anything
unsavory.

A couple of things to check and/or try:

VERIFY that the Mdl->Next pointer is NULL in the MDL you are getting from
your other driver. Also, AFAIK, NDIS will not like this MDL unless it
describes NonPagedPool.

TRY allocating a new MDL using NdisAllocateBuffer describing the packet (in
the MDL you get) and indicate *that* to NDIS.

Does your other driver pass you an MDL ‘Chain’ or a single MDL?
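
The "single MDL or chain" question comes down to walking the Next pointers. The sketch below shows that walk in user-mode C against a stand-in type; FAKE_MDL and both function names are hypothetical, and in a real driver the same loop would run over PMDL/PNDIS_BUFFER with the kernel's own definitions.

```c
#include <stddef.h>

/* User-mode stand-in for the kernel MDL; only the fields used here. */
typedef struct FAKE_MDL {
    struct FAKE_MDL *Next;   /* next MDL in the chain, or NULL */
    unsigned long ByteCount; /* bytes described by this MDL */
} FAKE_MDL;

/* Count the MDLs reachable through Next, starting at head. */
size_t mdl_chain_length(const FAKE_MDL *head)
{
    size_t n = 0;
    for (; head != NULL; head = head->Next) {
        n++;
    }
    return n;
}

/* Non-zero when head is a lone MDL, i.e. its Next pointer is NULL. */
int is_single_mdl(const FAKE_MDL *head)
{
    return head != NULL && head->Next == NULL;
}
```

A check like is_single_mdl on each buffer before chaining it into the packet would catch a stray Next pointer left over from the driver that allocated the MDL.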

Good Luck,
-dave

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Martin Harvey
Sent: Wednesday, January 23, 2008 2:23 PM
To: Windows System Software Devs Interest List
Subject: [ntdev] Memory leak in NDIS5 <-> NDIS6 wrapper layer?


NTDEV is sponsored by OSR

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at
http://www.osronline.com/page.cfm?name=ListServer

David R. Cattley wrote:

> A couple of things to check and/or try:
>
> VERIFY that the Mdl->Next pointer is NULL in the MDL you are getting from
> your other driver. Also, AFAIK, NDIS will not like this MDL unless it
> describes NonPagedPool.

  • Verified that the Mdl->Next pointer is NULL. Yes it is, so I’m
    chaining a single buffer per packet.

  • The MDLs that we give it describe a DMA common buffer, which I
    believe is nonpaged, although not necessarily from the pool. We do use
    MmBuildMdlForNonPagedPool.

> TRY allocating a new MDL using NdisAllocateBuffer describing the packet (in
> the MDL you get) and indicate *that* to NDIS.

After a little bit of rearrangement concerning the use of
packet->MiniportReserved, I’ve now done that. Instead of passing our own
MDLs up, I now use NdisAllocateBuffer, and pass that MDL up, freeing it
on completion. Unfortunately, this seems not to have improved matters at
all, and I still have the memory leak.

> Does your other driver pass you an MDL ‘Chain’ or a single MDL?

Single MDL, so we don’t have to consider splitting or building MDL
chains at the moment.

So, I think I’ve covered all your suggestions, and “no change” appears
to be the position here. We initially discovered the failure during soak
testing: we had to run a 10 gigabit ethernet link at line rate for about
a day to get a box to run out of nonpaged pool and become unresponsive,
but the leak happens regardless of load: even a simple ping will result
in memory being leaked, so it’s not an overflow or low-resources code path.

MH.

A horrible solution but a useful diagnostic…

Have you tried *copying* the data from your CBDMA buffer into an allocation
from NPPool described by an NDIS_BUFFER and indicating *that*?

I have a vague recollection that NDIS/WDM drivers cannot indicate receives
from other than NPPool - but this could be completely dis-mis-information.
At the moment I cannot find any reference to why it is that I have that
recollection. However, NDIS docs were at one time explicit that
NDIS_BUFFERs (MDLs) could only come from NDIS buffer pools unless you were a
protocol driver (in other words, a send packet or a packet which is the
target of an NdisTransferData call).

It may well be that NDIS, being unaware that your miniport is (beneath it)
receiving into a CBDMA buffer, makes assumptions or takes actions
incompatible with that fact.

That is sure not an answer and I should be clear - I have no idea *why* the
MDLs are leaking and I think I am about out of ideas for how to further box
the symptom.

Good Luck,
-dave

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Martin Harvey
Sent: Thursday, January 24, 2008 11:10 AM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Memory leak in NDIS5 <-> NDIS6 wrapper layer?


David R. Cattley wrote:

> A horrible solution but a useful diagnostic…
>
> Have you tried *copying* the data from your CBDMA buffer into an allocation
> from NPPool described by an NDIS_BUFFER and indicating *that*?

I have now. No change. :(

> That is sure not an answer and I should be clear - I have no idea *why* the
> MDLs are leaking and I think I am about out of ideas for how to further box
> the symptom.

I don’t believe that MDLs are leaking; I believe that NetBufferLists
and/or NetBuffers are leaking. However, given that I’m now allocating
and passing everything up the “NDIS way”, I’m starting to suspect this
might be a problem inside NDIS.

MH.

Martin,

Sorry for confusing MDL with NBL in my last message. Too many TLAs, not
enough neurons.

-dave

-----Original Message-----
From: xxxxx@lists.osr.com
[mailto:xxxxx@lists.osr.com] On Behalf Of Martin Harvey
Sent: Thursday, January 24, 2008 1:59 PM
To: Windows System Software Devs Interest List
Subject: Re: [ntdev] Memory leak in NDIS5 <-> NDIS6 wrapper layer?


> However, NDIS docs were at one time explicit that NDIS_BUFFERs (MDLs)
> could only come from NDIS buffer pools unless you were a protocol driver
> (in other words, a send packet or a packet which is the target of an
> NdisTransferData call).

NT OSes never used NDIS buffer pools; NdisFreeBuffer is a synonym for
IoFreeMdl.


Maxim Shatskih, Windows DDK MVP
StorageCraft Corporation
xxxxx@storagecraft.com
http://www.storagecraft.com

For what it’s worth, I’m trying to track down (what is probably) the same memory leak. At first, I thought it had to do with reassembling fragmented IP packets, but that doesn’t appear to be the case since I see the memory loss when using non-fragmented packets. I have a simple UDP echo program that helps illustrate this problem using an NDIS 5 driver on Vista.

Patrick Klos