KMDF DMA Chaining

I’m having some issues wrapping my head around the implementation of chaining Scatter/Gather DMA using the KMDF. I’ll start with what my situation is, then what I was doing in WDM, then explain what I’m currently doing in KMDF (and why that’s not adequate).

Situation:
I have a piece of hardware with Scatter/Gather capability. I have one 16MB buffer in user memory that gets 16KB DMA’d to it from the board on every other hardware-generated interrupt (approx 2000 Hz). I also have eight 4MB buffers that get 1KB DMA’d on every hardware-generated interrupt. These hardware-generated interrupts tell the driver the specific offset into user memory where the data on the board should go. The possibility exists that I could have nine pieces of data, each going to a different offset into 9 different MDLs. Imagine I need line 1 of each buffer below:

============== Buffer 0
0 ************
1 ------------
2 ************
3 ************

============== Buffer 3
0 ************
1 ------------
2 ************
3 ************

============== Buffer 7
0 ************
1 ------------
2 ************
3 ************

What I was doing in WDM:
I had been using GetScatterGatherList() and using the same DMA chain in each of the execution routines but didn’t use the execution routine to actually start the transfer. Once all the SGLs were converted and added to the chain, I’d light off the transfer by setting up the DMA registers on the board.

What I’m currently doing in KMDF:
I very closely model the DMA routines referenced in the Orwick/Smith book (the PLX9656 sample driver). I’m using WdfDmaTransactionInitializeUsingOffset(dmaTransaction, EvtProgramVideoDma, WdfDmaDirectionReadFromDevice, pMdl, offsetIntoUserMemory, sizeofTransfer) then WdfDmaTransactionExecute. Inside EvtProgramVideoDma, I set up the DTEs in a common buffer (currently one for each of the 9 possible buffers) using the Scatter/Gather List provided by the framework.

Where I’m at right now:
This works wonderfully if I only ever need to transfer one at a time. However, I can’t program one DMA channel for multiple DMA transactions at the same time. I’d love to be able to transfer all these together in one DMA chain. The conceptual problem I have is that I don’t understand how I can accomplish that, because I only have access to the SGL inside EvtProgramVideoDma. I was looking into chaining the MDLs using the “Next” member, but I can’t determine if WdfDmaTransactionInitializeUsingOffset will offset into each MDL or only the first one. I assume it is only the first one. Even if it is each MDL, the 16MB buffer will have a different offset than the 4MB ones (which all DO have the same offset).

I’m really hoping for some guidance. I probably left out a bunch of important details, so please let me know what I can provide.

Thanks for your time.

- Phil

I may get some flak from the WDF purists, but you could just call WdfDmaEnablerWdmGetDmaAdapter and get the same kind of DMAAdapter you used before, and use your old code. WDF is a super useful framework, but if it starts making a problem harder than it was with WDM, perhaps that’s a sign it’s not such a good fit for a specific problem. If they REALLY didn’t want you to use a DMAAdapter on your own, they would not have provided WdfDmaEnablerWdmGetDmaAdapter.

Just my $0.02, and standard disclaimer that I don’t know all the details of your problem from one newsgroup posting, so feel free to reject this suggestion or to build on it.

Jan

On 10/3/15, 4:05 PM, “xxxxx@lists.osr.com on behalf of xxxxx@acm.org” wrote:

>I’m having some issues wrapping my head around the implementation of chaining Scatter/Gather DMA using the KMDF. I’ll start with what my situation is, then what I was doing in WDM, then explain what I’m currently doing in KMDF (and why that’s not adequate).

Thanks for the suggestion, Jan. I’ll go that route while I wait to see if anyone has any WDF-only suggestions.

Here’s a follow-up:

I tried doing the same thing I did in the old driver, but it seems the SGL pointer is not always valid after the execution routine completes. However, implementing the old way gave me the idea to make it work using WDF.

Instead of having 9 different common buffers, I now create just one. In each execution routine, I use the same common buffer and just increment a counter for how many transfer elements I have used. I use that counter as the offset into the common buffer to know where to start with the next element in the next execution routine. Once all the relevant routines have been run, I go to the last transfer element and set the “last transfer” bits before setting up the DMA registers to actually start the transfer.
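The bookkeeping described above can be sketched roughly as follows. This is a user-mode model only, not driver code: the Dte layout, DTE_FLAG_LAST, MAX_DTES, SgElement, and every function name are hypothetical stand-ins for the board's real descriptor format, the single common buffer, and the framework-provided scatter/gather list. It also assumes the execution routines run one after another, so a real driver would need its own synchronization around the counter.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor (DTE) layout; real hardware defines its own. */
typedef struct {
    uint64_t phys_addr;  /* bus address of this fragment */
    uint32_t length;     /* bytes in this fragment */
    uint32_t flags;      /* DTE_FLAG_LAST marks the end of the chain */
} Dte;

#define DTE_FLAG_LAST 0x1u
#define MAX_DTES      64   /* assumed capacity of the common buffer */

/* Stand-in for one element of a scatter/gather list. */
typedef struct {
    uint64_t address;
    uint32_t length;
} SgElement;

/* Stand-in for the single common buffer plus the running element counter. */
typedef struct {
    Dte    table[MAX_DTES];
    size_t used;
} DteChain;

/* Start a new chain before the first execution routine runs. */
static void chain_reset(DteChain *c)
{
    c->used = 0;
}

/* What each execution routine would do: append its SGL at the current
 * offset. Returns 0 on success, -1 if the table would overflow. */
static int chain_append(DteChain *c, const SgElement *sgl, size_t count)
{
    if (count > MAX_DTES - c->used) {
        return -1;
    }
    for (size_t i = 0; i < count; i++) {
        Dte *d = &c->table[c->used++];
        d->phys_addr = sgl[i].address;
        d->length    = sgl[i].length;
        d->flags     = 0;
    }
    return 0;
}

/* After all routines have run: flag the final element and return the
 * element count, at which point the DMA registers would be programmed. */
static size_t chain_finalize(DteChain *c)
{
    if (c->used > 0) {
        c->table[c->used - 1].flags |= DTE_FLAG_LAST;
    }
    return c->used;
}

/* Example: one 16 KB piece in four fragments plus two 1 KB pieces.
 * All addresses are made up for illustration. */
static size_t demo_build_chain(DteChain *chain)
{
    const SgElement big[] = {
        {0x10000000u, 4096}, {0x10350000u, 4096},
        {0x10a02000u, 4096}, {0x10a03000u, 4096},
    };
    const SgElement small0[] = { {0x20000400u, 1024} };
    const SgElement small1[] = { {0x20400400u, 512}, {0x20c11000u, 512} };

    chain_reset(chain);
    int rc = chain_append(chain, big, 4);
    rc |= chain_append(chain, small0, 1);
    rc |= chain_append(chain, small1, 2);
    assert(rc == 0);
    return chain_finalize(chain);
}
```

Calling demo_build_chain on a DteChain leaves seven elements in the table, with only the last one carrying the "last transfer" flag. In the driver, chain_append corresponds to the body of each execution routine, and chain_finalize to the step taken just before starting the transfer.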

If anyone is interested in doing this themselves, I’m happy to provide any code/guidance.