IoAllocateMdl() vs MmAllocatePagesForMdl()

Hi guys,
I am writing a driver for a PCI device with scatter/gather DMA.
I have almost everything done, but when I tried the driver on a 64-bit system, it didn't work.
I am attaching the two techniques I tried.
The goal is to allocate some memory in the lower 32-bit range, because my device doesn't support 64-bit addressing.
The problem is in the second technique, when I want to write to the buffer from the driver. Unfortunately I need to store some values in this buffer for my PCIe device. It causes a BSOD.

The first technique works well, but the allocated address is below 4GB only on 32-bit systems, so it fails on 64-bit systems.

Thank you in advance, Ondrej

//first technique - good

sgDescVA = MmAllocateNonCachedMemory(sgDescLength);
RtlZeroMemory(sgDescVA,sgDescLength);

deviceData->ScatterGatherDescMdl = IoAllocateMdl(sgDescVA,sgDescLength,FALSE,FALSE,NULL);

MmBuildMdlForNonPagedPool(deviceData->ScatterGatherDescMdl);

sgDescVA = MmGetMdlVirtualAddress(deviceData->ScatterGatherDescMdl);

//writing to the buffer, no problems
RtlZeroMemory(sgDescVA,sgDescLength); //OK

//second technique - bad
lowAddr.QuadPart = 0;
highAddr.QuadPart = 0xFFFFFFFF; //4GB

sgDescVA = MmAllocateMappingAddress(sgDescLength,'tag2');

deviceData->ScatterGatherDescMdl = MmAllocatePagesForMdl(lowAddr, highAddr, lowAddr, sgDescLength);

__try
{
//doesn't work with or without MmProbeAndLockPages; I tried both alternatives
//MmProbeAndLockPages(deviceData->ScatterGatherDescMdl, KernelMode, IoModifyAccess);
}
__except(EXCEPTION_EXECUTE_HANDLER)
{

}

//here I tried both MmMapLockedPagesWithReservedMapping and MmMapLockedPagesSpecifyCache
//a
sgDescVA = MmMapLockedPagesWithReservedMapping(sgDescVA,'tag2', deviceData->ScatterGatherDescMdl,MmNonCached);

//b
//sgDescVA = MmMapLockedPagesSpecifyCache(deviceData->ScatterGatherDescMdl, KernelMode, MmNonCached, sgDescVA, FALSE, NormalPagePriority);

//test
sgDescVA = MmGetMdlVirtualAddress(deviceData->ScatterGatherDescMdl);

//writing to the buffer, it causes a BSOD
RtlZeroMemory(sgDescVA,sgDescLength); //causes a BSOD every time

Hmmm… This looks to ME like you’re allocating an intermediate buffer between your driver and the application. If your driver supports scatter/gather, why are you allocating an intermediate buffer?

You know that Windows will automatically handle the issue of a user data buffer being above 4GB and your device only being capable of 32-bit DMA, right?

Peter
OSR

I think this needs some more commentary. I actually use two buffers. The first is the data buffer for the DMA transaction. The second is the descriptor buffer for scatter/gather. I have no problem with the first buffer. It is mapped to user space; my hardware device writes to it and I read the data from user space. I need to fill the second buffer with data for the PLX chipset in my hardware device, which performs the S/G transaction. So the second buffer contains the physical addresses and sizes of the data blocks, and I want to fill it from the driver. But when I try this, it causes a BSOD. I also need the physical addresses to be under 4GB.

I have this problem only with the second technique I tried. I would use the first technique, but it works only on 32-bit systems…

So I am looking for a way to allocate a piece of memory (physical address under 4GB) that I can modify from the driver.

I used RtlZeroMemory(sgDescVA,sgDescLength); only as an example.

thanks a lot
Ondrej

I hope it is clear now.
Ondrej

The correct way to handle this is to use the kernel’s well-defined and
well-exercised DMA abstraction to handle this for you. Call
IoGetDmaAdapter, describing your DMA engine accurately. Then you can
call AllocateCommonBuffer and FreeCommonBuffer to allocate your
descriptor space, already mapped and locked.

You already have a DMA_ADAPTER to create your scatter/gather lists,
don’t you?


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

I *still* don’t understand the driver model you’re using, but… would it not be better to use AllocateCommonBuffer for what you’re doing?

This would ensure that the buffer you allocate falls within your device’s DMA range.

Peter
OSR

> sgDescVA = MmAllocateNonCachedMemory(sgDescLength);

Wrong.

Use IoGetDmaAdapter and its methods.

Your problem of being unable to address >4GB is then trivially solved by a single BOOLEAN flag in the DEVICE_DESCRIPTION passed to IoGetDmaAdapter.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com

The downside of AllocateCommonBuffer is that it allocates physically contiguous memory, and thus can fail when memory has become fragmented. If your device doesn’t support scatter-gather for the shared memory segment, that’s all you can do.

If you could use scatter gather DMA for the shared memory buffer, Windows doesn’t have a great story. AllocateCommonBuffer works if there’s memory. MmAllocatePagesForMdl also works, but only if you don’t mind being naughty and grabbing the physical addresses from MM rather than the DMA APIs. It also wouldn’t necessarily set up any IOMMU mappings which might hypothetically be needed in the future.

-p

-----Original Message-----
From: xxxxx@lists.osr.com [mailto:xxxxx@lists.osr.com] On Behalf Of xxxxx@osr.com
Sent: Wednesday, December 4, 2013 10:50 AM
To: Windows System Software Devs Interest List
Subject: RE:[ntdev] IoAllocateMdl() vs MmAllocatePagesForMdl()

NTDEV is sponsored by OSR

Visit the list at: http://www.osronline.com/showlists.cfm?list=ntdev

OSR is HIRING!! See http://www.osr.com/careers

For our schedule of WDF, WDM, debugging and other seminars visit:
http://www.osr.com/seminars

To unsubscribe, visit the List Server section of OSR Online at http://www.osronline.com/page.cfm?name=ListServer

> //test
> sgDescVA = MmGetMdlVirtualAddress(deviceData->ScatterGatherDescMdl);

Since it causes a BSOD every time, the obvious thing to do is set a
breakpoint on the next line and see what the value of sgDescVA is. I kind
of expect it is NULL, because you don’t show a test for the return result.
Also, this returns the virtual address of the MDL. What is the thread
context in which this is issued, and do you know the address to be valid
in that thread context? Knowing the value you are getting is the first
step in understanding what went wrong. Also, this is the “base VA” so the
official start of the buffer requires adding MmGetMdlByteOffset to this
value. And where is the !analyze -v output? You have asked us to solve a
problem for which you have failed to supply any of the critical
information required to solve it.

Also, I question why you feel compelled to zero out an area you are about
to overwrite anyway. What is the purpose of this action? (But given the
BSOD, it also means that your attempt to write input data to it will also
fail). Look instead at MmGetSystemAddressForMdlSafe.

Note also that you must ensure that all those pages are locked down if you
are not at PASSIVE_LEVEL, and in any case must be locked down before you
create the s/g list.

So give us something useful to go on, here.
joe

PeterWie! How’ve you been?!

Quite correct, but as a matter of implementation and NOT of architecture I’ll note. The memory allocated is LOGICALLY contiguous, but not necessarily – by architecture – physically contiguous.

Yes, totally useless to your point, I realize.

As long as the buffer requirement is modest, the memory on the system is not crazily constrained, and the driver is started “early enough” in the system’s life cycle, allocating the common buffer should be fine.

But, remember: I *still* don’t understand the OP’s device usage model. If he needs a location for a buffer that’s used for communication with his device, that IS (one of the) main use of a common buffer after all.

OP: What size buffer are you allocating?

Peter
OSR

> OP: What size buffer are you allocating?

I’d guess he allocates a single page. As the OP explained, this buffer is not
for data - it is for the descriptor chain.
He uses a PLX chip; on very old ones one had to write this descriptor
chain into chip memory, but starting from PLX8080 / PLX8056 ( remember
these oldies?) the descriptor chain for each scatter gather DMA transfer is
located in host memory.

Best regards,
Alex Krol

Which makes me wonder why the memory was allocated non-cached; other than
incurring a serious performance hit building the s/g list, the non-caching
does not offer any value over caching, because cache coherency is
automatically maintained by the hardware. If there is an issue about the
overhead of handling cache writeback coherency during the fetching of the
s/g list because of cross-core usage, one argument could be that calling
KeFlushIoBuffers() on the s/g list might be a solution, but otherwise I do
not see any value in having allocated that memory uncached, even if the
chip had to read the entire s/g list in before starting operation. This
is so obviously a bad idea that it was fixed in the later releases of that
chipset. I am also a bit concerned that the memory is allocated before
the length of the s/g list is known. What happens if a page is not enough
to hold the s/g list?

joe

Hi guys,
at the beginning I used AllocateCommonBuffer. But I need a lot of memory, and I want to cope well with memory fragmentation.
I used this sequence. It works. But I want to allocate physical addresses under 4GB. The example at the end (modification of the buffer) works.

thanks for the advice
Ondrej

sequence is:

  1. IoGetDmaAdapter
  2. MmAllocateNonCachedMemory
  3. RtlZeroMemory
  4. IoAllocateMdl
  5. MmBuildMdlForNonPagedPool
  6. GetScatterGatherList

code:

dmaBuffLength = (ULONG)(ROUND_TO_PAGES(dmaBuffLength));

/* Set device description for DMA. */
RtlZeroMemory(&devDescr,sizeof(devDescr));
devDescr.Version = DEVICE_DESCRIPTION_VERSION;
devDescr.Master = TRUE;
devDescr.ScatterGather = TRUE;
devDescr.Dma32BitAddresses = TRUE;
devDescr.InterfaceType = PCIBus;
devDescr.MaximumLength = MAXDMABUFFLENGTH;

/* Get DMA adapter object.*/
deviceData->DmaAdapter = IoGetDmaAdapter(deviceData->pdo,&devDescr,&numberOfMapRegisters);
if(deviceData->DmaAdapter == NULL) return(STATUS_INSUFFICIENT_RESOURCES);

if(dmaBuffLength > numberOfMapRegisters*PAGE_SIZE) return(STATUS_INSUFFICIENT_RESOURCES);

/*Allocate DMA data buffer.*/
dmaBuffVA = (void *)MmAllocateNonCachedMemory(dmaBuffLength);
RtlZeroMemory(dmaBuffVA,dmaBuffLength);

currentMdl = IoAllocateMdl(dmaBuffVA,dmaBuffLength,FALSE,TRUE,NULL);

MmBuildMdlForNonPagedPool(deviceData->DmaBufferFirstMdl);

/*Build Scatter/Gather list for DMA data buffer.*/
dmaBuffVA = MmGetMdlVirtualAddress(deviceData->DmaBufferFirstMdl);

*pContext = ID_SGDATABUFF; //ID for callback function
status = deviceData->DmaAdapter->DmaOperations->GetScatterGatherList(deviceData->DmaAdapter,
DeviceObject,
deviceData->DmaBufferFirstMdl,
dmaBuffVA,
dmaBuffLength,
MyAdapterListControl, //callback
pContext,
FALSE);//transfer to buffer from device

//example, my function, works fine
StoreData(dmaBuffVA,offset,value); //only example

/************************************************/

When I use the second sequence below, the example at the end (modification of the buffer) causes a BSOD.

second sequence:

  1. IoGetDmaAdapter
  2. MmAllocateMappingAddress
  3. MmAllocatePagesForMdl
  4. MmMapLockedPagesWithReservedMapping
  5. GetScatterGatherList

dmaBuffLength = (ULONG)(ROUND_TO_PAGES(dmaBuffLength));

/* Set device description for DMA. */
RtlZeroMemory(&devDescr,sizeof(devDescr));
devDescr.Version = DEVICE_DESCRIPTION_VERSION;
devDescr.Master = TRUE;
devDescr.ScatterGather = TRUE;
devDescr.Dma32BitAddresses = TRUE;
devDescr.Dma64BitAddresses = FALSE;
devDescr.InterfaceType = PCIBus;
devDescr.MaximumLength = MAXDMABUFFLENGTH;

/* Get DMA adapter object.*/
deviceData->DmaAdapter = IoGetDmaAdapter(deviceData->pdo,&devDescr,&numberOfMapRegisters);
if(deviceData->DmaAdapter == NULL) return(STATUS_INSUFFICIENT_RESOURCES);

if(dmaBuffLength > numberOfMapRegisters*PAGE_SIZE) return(STATUS_INSUFFICIENT_RESOURCES);

/*Allocate DMA data buffer.*/

lowAddr.QuadPart = 0;
highAddr.QuadPart = 0xFFFFFFFF; //4GB

dmaBuffVA = MmAllocateMappingAddress(dmaBuffLength,'tag1');
deviceData->DmaBufferFirstMdl = MmAllocatePagesForMdl(lowAddr, highAddr, lowAddr, dmaBuffLength);

checkVA = MmMapLockedPagesWithReservedMapping(dmaBuffVA,'tag1',deviceData->DmaBufferFirstMdl,MmNonCached);

/*Build Scatter/Gather list for DMA data buffer.*/
dmaBuffVA = MmGetMdlVirtualAddress(deviceData->DmaBufferFirstMdl);

*pContext = ID_SGDATABUFF; //ID for callback function
status = deviceData->DmaAdapter->DmaOperations->GetScatterGatherList(deviceData->DmaAdapter,
DeviceObject,
deviceData->DmaBufferFirstMdl,
dmaBuffVA,
dmaBuffLength,
MyAdapterListControl, //callback
pContext,
FALSE);//transfer to buffer from device

//example, my function, causes a BSOD
StoreData(dmaBuffVA,offset,value); //only example

Yeah, maybe the problem is in MmGetMdlByteOffset. I did not account for that.

> at the beginning I used AllocateCommonBuffer. But I need a lot of memory.

Allocate several common buffers.

> I used this sequence. It works.

The sequence you’re using works by mere luck, and it is not surprising that it fails on some setups.

> But I want to allocate physical addresses under 4GB.

->AllocateCommonBuffer

Call it several times if you want.

Why do you need lots of memory for SGLs, BTW? Each entry is < 100 bytes, correct? So 1MB is like 10,000 SG entries, which is a huge lot.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com
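
The estimate above (1MB holding roughly 10,000 entries of under 100 bytes each) can be checked with a small user-mode sketch. The 100-byte entry size is only the upper bound quoted in the post, not a figure from any PLX datasheet, and sg_entries_per_buffer and buffers_needed are hypothetical helper names introduced for illustration. The second helper follows the "allocate several common buffers" suggestion by rounding up to a whole number of buffers.

```c
#include <stddef.h>

/* How many descriptors of entry_size bytes fit in one buffer of
 * buffer_size bytes. Returns 0 if entry_size is 0. */
size_t sg_entries_per_buffer(size_t buffer_size, size_t entry_size)
{
    return entry_size ? buffer_size / entry_size : 0;
}

/* How many buffers of buffer_size bytes are needed to hold entry_count
 * descriptors when no descriptor is split across two buffers.
 * Returns 0 if not even one descriptor fits in a buffer. */
size_t buffers_needed(size_t entry_count, size_t buffer_size, size_t entry_size)
{
    size_t per_buffer = sg_entries_per_buffer(buffer_size, entry_size);
    if (per_buffer == 0)
        return 0;
    return (entry_count + per_buffer - 1) / per_buffer; /* round up */
}
```

With a 1MB buffer and 100-byte entries this gives 10,485 descriptors per buffer, in line with the "about 10,000" figure; a single 4KB page would hold 40.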

OK, I’m going to stop focusing on WHY you’re trying to do what you’re doing (because you seem to refuse to explain it to us), and try to JUST answer the question you’ve been asking us.

The FIRST sequence you show has, for me, a lot that I don’t like in it. A lot.

The SECOND sequence you show, the one that you say BSODs, should work just fine.

Have you walked through this code in the debugger?

  1. Why aren’t you validating the checkVA returned from the call to MmMapLockedPagesWithReservedMapping? Does checkVA == dmaBuffVA immediately after this call?

  2. What is MmGetMdlVirtualAddress returning? Is it a non-zero value?

  3. FINALLY, after all this… you keep saying it crashes. WHAT’S THE CRASH CODE that you’re getting? Show us the output of !analyze -v (and please, be sure the symbols are set up correctly).

Peter
OSR

OK. MmGetMdlVirtualAddress can only be used for one purpose - to pass to GetScatterGatherList. Mdl->VirtualAddress is only set in IoAllocateMdl. You CANNOT use it to access memory.

To access memory you use the address returned by MmMapLockedPagesWithReservedMapping.

Hmmm… Given that GetScatterGatherList actually succeeds… then IN THIS CASE, won’t the values be the same (and non-zero)?

He’s allocating FULL PAGES, and the pages are mapped into KV Address Space.

Granted, he’s not allowed to do what he’s doing architecturally, but shouldn’t the addresses he gets back from calling MmMapLockedPagesWithReservedMapping, MmAllocateMappingAddress, and MmGetMdlVirtualAddress all be the same?

OP… ARE those three addresses all the same? eh??

Peter
OSR

>OP… ARE those three addresses all the same? eh??

The allocated mapping address and the returned mapping address will be the same, per the documentation, unless the mapping fails because of incorrect parameters.

Mdl->VirtualAddress will not be set. Mdl->SystemVA may be set, or not.

GetSGL only cares about the offset of the VirtualAddress it receives, relative to Mdl->VirtualAddress. Mdl->VirtualAddress is often NULL, in case of I/O outside of a process context (such as flushing dirty pages).
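
The distinction can be illustrated with a user-mode model. This is only a sketch: MODEL_MDL is a simplified stand-in, not the real MDL layout from wdm.h, and all three model_* helpers are hypothetical names. The one part taken from the real headers is the arithmetic of the MmGetMdlVirtualAddress macro (StartVa + ByteOffset); the claim that an MDL from MmAllocatePagesForMdl has no VA recorded is the one made in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the two MDL fields discussed here. */
typedef struct {
    void    *StartVa;    /* base VA recorded when the MDL was built
                            (page-aligned in a real MDL); NULL if none */
    uint32_t ByteOffset; /* offset of the buffer within its first page */
} MODEL_MDL;

/* Same arithmetic as the MmGetMdlVirtualAddress macro. */
void *model_get_mdl_va(const MODEL_MDL *mdl)
{
    return (void *)((uintptr_t)mdl->StartVa + mdl->ByteOffset);
}

/* What a CurrentVa argument means to the s/g code according to the
 * thread: only its distance from the MDL's VA, never the pointer itself. */
size_t model_offset_in_mdl(const MODEL_MDL *mdl, const void *current_va)
{
    return (size_t)((uintptr_t)current_va -
                    (uintptr_t)model_get_mdl_va(mdl));
}

/* Address the CPU should use for byte `offset` of the buffer: derived
 * from the pointer the mapping call returned, not from the MDL's VA.
 * A failed mapping (NULL) is propagated instead of being dereferenced. */
void *model_cpu_address(void *mapped_va, size_t offset)
{
    return mapped_va ? (void *)((uint8_t *)mapped_va + offset) : NULL;
}
```

In the failing sequence the MDL comes from MmAllocatePagesForMdl, so the modeled VA is NULL. That is still a usable reference point for GetScatterGatherList (offset 0), but writing through it faults. CPU accesses would go through the pointer returned by MmMapLockedPagesWithReservedMapping (checkVA in the OP's code), after checking it for NULL.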

Ah! Yes, quite correct. Thank you for the reminder!!

I was so engaged in the problem about the MAPPING, the fact that s/g only cares about the VirtualAddress to determine the offset into the page entirely escaped my mind. Absolutely correct. Bravo.

Nice catch.

There you go, OP… Mr. Grig once again, solves a problem for an NTDEV member. Specifically, in this case, your problem OP.

Peter
OSR

> GetSGL only cares about the offset of VirtualAddress it receives, relative to Mdl->VirtualAddress.
>
> Mdl->VirtualAddress is often NULL

The same is true of IoBuildPartialMdl.

Its “offset” parameter must be (PUCHAR)MmGetMdlVirtualAddress(MasterMdl) + Offset

Too bad MS did not give this function explicit Offset/Length parameters instead of the nonsensical VirtualAddress.


Maxim S. Shatskih
Microsoft MVP on File System And Storage
xxxxx@storagecraft.com
http://www.storagecraft.com