Re: IPortWaveRTStream::AllocatePagesForMdl & MmAllocatePagesForMdlEx

xxxxx@gmail.com wrote:

>> That doesn’t mean a different kind of memory. If you’re worried about this, you just need to use GetScatterGatherList in the DMA_ADAPTER to get the page physical addresses.
> GetScatterGatherList is for packet-based transfers

No, GetScatterGatherList is for discontiguous buffers.

> the “problem” is when you need a static common buffer that can be discontiguous (so as to prevent allocation failures as much as possible, for example). I know that I can hoard system resources with GetScatterGatherList (well, theoretically - 64-bit hardware so bounce buffers won’t be utilized) but this is abusing the API.

No, it’s not. Look, the problem is that you have invented a category
that does not exist. There are two types of DMA transfers: common
buffer and packet. Common buffers are contiguous. If you have a
discontiguous common buffer, then it’s not a common buffer. It’s just
memory, and you will do packet DMA. SOMEHOW you have to get a
scatter/gather list for that buffer so you can do DMA, and for that
you’ll use GetScatterGatherList.
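
To make the point about discontiguous buffers concrete, here is a minimal user-mode model of what a scatter/gather list is. It is not WDK code and does not call GetScatterGatherList; `SgElement`, `build_sg_list`, and the 4096-byte page size are assumptions made up for illustration. It only shows how a buffer whose pages are scattered in physical memory becomes a list of (address, length) pairs for the device.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size for this model */

/* Hypothetical stand-in for one scatter/gather element:
   a device-visible start address plus a byte length. */
typedef struct {
    uint64_t address;
    uint64_t length;
} SgElement;

/* Turn an array of page frame numbers (one per page of the buffer, in
   buffer order) into scatter/gather elements, merging pages that happen
   to be physically adjacent. Returns the number of elements written,
   or 0 if 'capacity' is too small (or if page_count is 0). */
size_t build_sg_list(const uint64_t *pfns, size_t page_count,
                     SgElement *out, size_t capacity)
{
    size_t used = 0;
    assert(pfns != NULL && out != NULL);

    for (size_t i = 0; i < page_count; i++) {
        uint64_t addr = pfns[i] * PAGE_SIZE_BYTES;

        /* Extend the previous element when this page directly follows it. */
        if (used > 0 &&
            out[used - 1].address + out[used - 1].length == addr) {
            out[used - 1].length += PAGE_SIZE_BYTES;
            continue;
        }
        if (used == capacity) {
            return 0;  /* caller's array cannot hold the list */
        }
        out[used].address = addr;
        out[used].length = PAGE_SIZE_BYTES;
        used++;
    }
    return used;
}
```

A fully contiguous buffer collapses to a single element in this model, which is the sense in which a classic common buffer is just the one-element special case.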

Now, if you really feel guilty about it, then I suppose you’ll free the
scatter/gather list after every use, but that seems like a waste of
resources.

> I’m aware that it’s the same kind of memory and that on current hardware and versions of Windows (with disabled IOMMUs) there’s no difference between device-logical and physical addresses - I was just wondering why the DMA API hasn’t included a function that’s analogous to MmAllocatePagesForMdlEx or something similar (exallocatepool/etc),

Because it’s totally unnecessary. What would such a function do? How
would it differ from ExAllocatePool? There is no alternate pool of
memory you could use. You’ll still need a scatter/gather list at some
point. The only detail is when you create that list.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

> No, GetScatterGatherList is for discontiguous buffers.

I thought it was part of the packet-based API, since it’s supposed to be an alternative to the MapTransfer sequence.

> Common buffers are contiguous. If you have a discontiguous common buffer, then it’s not a common buffer. It’s just memory, and you will do packet DMA.

The audio driver model and a lot of data (video/audio/whatever) streaming applications with timing constraints beg to differ. This has been discussed in this forum before. Quoting Peter Wieland from this thread:
https://www.osronline.com/showthread.cfm?link=251064

Also quoting Peter Viscarola:

…which to me implies that in the future, AllocateCommonBuffer (might) not be a problem for IOMMU-equipped machines (with the IOMMU being used).

> Because it’s totally unnecessary. What would such a function do? How would it differ from ExAllocatePool? There is no alternate pool of memory you could use. You’ll still need a scatter/gather list at some point. The only detail is when you create that list.

If I use ExAllocatePool, and then use GetScatterGatherList, and GetScatterGatherList needs to map my buffer to map registers (the bounce-buffer type), I’m wasting memory that was mapped to the bounce buffers since I’m still holding on to it. In addition, what if it does use bounce buffers, and I want to access the memory both from the CPU (driver, user-mode process, etc) and from the device? As far as I’m aware, manually crafting MDLs is not officially supported in WDM, as noted in this thread:
https://www.osronline.com/showthread.cfm?link=251064
…nevermind the additional overhead.

-Alex

xxxxx@gmail.com wrote:

>> No, GetScatterGatherList is for discontiguous buffers.
> I thought it was part of the packet-based API, since it’s supposed to be an alternative to the MapTransfer sequence.

As opposed to what? That’s my point. You seem to be implying that there
is some THIRD mode. I disagree.

>> Because it’s totally unnecessary. What would such a function do? How would it differ from ExAllocatePool? There is no alternate pool of memory you could use. You’ll still need a scatter/gather list at some point. The only detail is when you create that list.
> If I use ExAllocatePool, and then use GetScatterGatherList, and GetScatterGatherList needs to map my buffer to map registers (the bounce-buffer type), I’m wasting memory that was mapped to the bounce buffers since I’m still holding on to it.

Yes, but what’s the alternative? If you need bounce buffers for your
transfer, then you need bounce buffers, and GetScatterGatherList will
allocate them. If you feel guilty about that, then you create and free
the s/g list every time you need to transfer from your buffer. If you
need to transfer from it continuously, then you NEED to hold on to all
of those bounce buffers. Your device will not work without them (on
systems where they are required).

> In addition, what if it does use bounce buffers, and I want to access the memory both from the CPU (driver, user-mode process, etc) and from the device?

If you’re using an IOMMU, then the logical, physical, and virtual
addresses all map to the same memory. As long as you use
FlushAdapterBuffers to ensure coherency, everything works.

The only time there is a disconnect is when your DMA is limited to
32-bit addressing. In that case, you simply cannot have long-lived DMA
transactions. The DMA abstraction will copy the data back and forth,
but you are limited to 64k at a time.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

>If you’re using an IOMMU, then the logical, physical, and virtual addresses all map to the same memory. As long as you use FlushAdapterBuffers to ensure coherency, everything works. The only time there is a disconnect is when your DMA is limited to 32-bit addressing. In that case, you simply cannot have long-lived DMA transactions. The DMA abstraction will copy the data back and forth, but you are limited to 64k at a time.

The whole idea is to have something which will either allocate a static, not necessarily contiguous buffer (no copying back and forth, can be accessed by the system and the device at the same time), or fail. That is what the device requires. GetScatterGatherList can potentially use map registers, thereby making the bus mapping and the (system) virtual address mapping incoherent.

AllocateCommonBuffer is actually better in the sense that it’ll get these bounce buffers beforehand and then map them as part of the virtual contiguous space (removing the need to flush adapter buffers, which is a memcpy operation - not acceptable for some tasks). Of course it currently allocates contiguous physical memory only, so… :)

Again, I’m not using an IOMMU, and both my system and my device are 64-bit capable, so this is not really a problem for me now. However, as people have noted before (at least one MS employee among them), the WDM/KMDF API is lacking in this respect. If this can’t be done because, let’s say, some 128-bit arch is supported by Windows and somebody tries to use the device on it (and the physical RAM is mapped at such an offset in the physical address space as to be beyond the reach of the device, etc.), then I’d very much want it to fail. It’s just a matter of doing it properly, and not (quoting Peter Wieland yet again) “being naughty” or using backdoors/etc (“Jeremiah” in the comments: http://blogs.msdn.com/b/peterwie/archive/2006/05/11/595460.aspx).

Again, the original question - is there anything planned to add the functionality, and what does PortClass’ WaveRTStream port do in the AllocatePagesForMdl method (probably exactly what it sounds like, but maybe there’s something more going on).

> Look, the problem is that you have invented a category that does not exist.

Is it a category that doesn’t exist at all, or is it something that doesn’t exist in Windows terminology? (given the fact that WaveRT is supposed to work exactly with a discontiguous common buffer, I’m thinking it’s neither… or I’m still failing to see something).

I guess this can only be answered by a current Microsoft employee on the drivers team, and if it’s impossible (non-disclosure or otherwise) to give me the answer, please let me know :)

P.S. pingback to the original thread: http://www.osronline.com/showthread.cfm?link=269409

Hmmmm… Lots of assumptions in this thread, Mr. Spasov.

Let’s see if I understand correctly:

a) You want to use a continuous (everyone read carefully, please, I did not write conTIGuous but rather continuous) DMA model… what we typically refer to in Windows as “Common Buffer” DMA.

b) You don’t want the memory allocated contiguously, because it could fail (which is true, but vastly unlikely for a 64-bit device on a system with any workable amount of memory, but let’s ignore that), and your device supports Scatter/Gather.

c) You don’t want to “be naughty” (copyright PeterWie) and allocate your memory with MmAllocatePagesForMdlEx and call MmGetPhysicalAddress (THIS, I certainly applaud… I *hate* it when people call MmGetPhysicalAddress).

d) You very much want to avoid the overhead of double-buffering.

Did I get that right?

So you’re asking for an API that’ll take a DMA Adapter and length as input, and return a Scatter/Gather list that describes a buffer of “length” that’s been allocated in physical memory reachable by the device, and also return a kernel virtual address for that buffer (mapped into the high half of kernel virtual address space)… Did I get that right?

Assuming so… I *think* you’re over complicating this by a lot.

You just call MmAllocatePagesForMdlEx specifying the highest PA to which your device can DMA and feed it into the existing packet-based APIs, as Mr. Roberts has been saying. You get your S/G list to use for the duration of your continuous DMA operations… which will almost certainly be the life of your Device Object.

This does EXACTLY what you’re asking.

It fails if the memory can’t be allocated.

There’s no chance of bounce buffers.

It returns a S/G list for your DMA operations.

Problem solved, right!?
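
The "allocate below the device's highest address, or fail" behaviour described above can be modelled in a few lines of portable C. This is only a sketch: `pick_reachable_pages` and the 4096-byte page size are hypothetical, and nothing here calls the real memory manager. The point it illustrates is that if every page is chosen from within the device's reach up front, no page of the buffer ever needs to be bounced.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_BYTES 4096u  /* assumed page size for this model */

/* Pick 'needed' pages from 'free_pfns' whose last byte lies at or below
   'highest_address' (the highest address the device can reach).
   On success the chosen page frame numbers are stored in 'chosen' and
   true is returned. On failure false is returned and the contents of
   'chosen' are unspecified: the model is all-or-nothing. */
bool pick_reachable_pages(const uint64_t *free_pfns, size_t free_count,
                          uint64_t highest_address, size_t needed,
                          uint64_t *chosen)
{
    size_t found = 0;
    assert(free_pfns != NULL && chosen != NULL);

    for (size_t i = 0; i < free_count && found < needed; i++) {
        uint64_t last_byte =
            free_pfns[i] * PAGE_SIZE_BYTES + (PAGE_SIZE_BYTES - 1);

        if (last_byte <= highest_address) {
            chosen[found++] = free_pfns[i];  /* device can reach this page */
        }
    }
    return found == needed;
}
```

One difference worth keeping in mind: the model fails outright when it cannot find enough pages, whereas the documentation for MmAllocatePagesForMdlEx says it may return an MDL describing less memory than requested, so a real driver would need to check the byte count of the MDL it gets back.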

Actually, this isn’t how AllocateCommonBuffer works on x86/x64 systems. AllocateCommonBuffer simply calls MmAllocateContiguousMemorySpecifyCache, specifying the highest physical address of the memory block as the maximum PA that your device can DMA to (based on your DMA Adapter).

The morals of this story are:

a) Don’t get all caught-up in architectural terminology… worry instead about architectural concepts.

b) Don’t make assumptions about how things work.

c) Listen to Mr. Roberts and re-read point a.

Peter
OSR
@OSRDrivers

xxxxx@gmail.com wrote:

>> Look, the problem is that you have invented a category that does not exist.
> Is it a category that doesn’t exist at all, or is it something that doesn’t exist in Windows terminology? (given the fact that WaveRT is supposed to work exactly with a discontiguous common buffer, I’m thinking it’s neither… or I’m still failing to see something).

Where does WaveRT use a discontiguous common buffer? The circular
buffer in a WaveRT device is in hardware, and therefore is physically
contiguous. Transfers into that space happen in the user mode Audio Engine.


Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Except if it’s in the 20-30MB range (multiple of these). The target machines are quite beefy, but still.

very. At the estimated transfer rates (up to 5GB/s bidirectional), this can hog quite a bit of CPU time by itself.

Mr. Roberts was mentioning the ExAllocatePool functions. I hadn’t thought about the combination of MmAllocatePagesForMdlEx (and the ability to specify an address range there) + GetScatterGatherList essentially functioning as a discontiguous common buffer equivalent. Shame on me, and thank you for the advice!

…I’d still prefer an actual allocatecommonbuffer-style function just for the peace of mind that I’ll be fully compliant with the API (as I won’t be calling FlushAdapterBuffers).

You’re referring to this I imagine:

I guess I should’ve typed “concept” instead of “terminology”, my bad.

I assume you’re referring to the AllocateCommonBuffer part here. The documentation specifically said that it “allocates map registers to map the buffer, if required by the system” - I interpreted this as potentially meaning that it might use part of the allocated bounce buffer space (as the term “map registers” is used for both the actual and the software-emulated ones in the DDK), even though that sounds very wasteful (and/or use actual map registers, if available and depending on how many, etc, etc). Still, my main point was that it’d set it up as a common buffer - the rest was mostly irrelevant. Let me know if it was something else ;)

> c) Listen to Mr. Roberts

I am and I was ;)

https://msdn.microsoft.com/en-us/library/windows/hardware/ff536922(v=vs.85).aspx

There is always a cyclic buffer in system memory, unless I am missing something.

You STILL need to call FlushAdapterBuffers. You ALWAYS call it. It’s part of the Windows DMA architectural abstraction.

Peter
OSR
@OSRDrivers

Fair enough - then I also can’t (100%) safely (according to the abstraction/contract) assume that GetScatterGather won’t do some kind of buffering somewhere (for whatever reason it might be). I know that it’s far beyond a practical concern (at this moment and using this method), but it’s still bugging me as a… “hack”, in the sense of, “this is documented as possibly doing this, but since we do it with that kind of params/in that kind of environment/etc, it shouldn’t do it”. It still does seem to be the cleanest way to set up this kind of buffer, but it’s not 100% guaranteed to be the case :)

You’re hung-up on the buffering, but the buffering isn’t the abstraction. It’s an implementation of the abstraction.

The bottom line is the contract between your driver and the HAL is that the HAL guarantees your driver will work without modification, regardless of how the I/O bus is connected to the memory bus and regardless of DMA limitations in your hardware. In return for this grand favor, you agree to use the HAL and not bypass it.

What the HAL does behind the scenes to get its job done is none of your concern, by architectural definition. Sure, by implementation, you’d like to have the best chance of performance possible. We all would. But by the very definition of the architecture, this is of necessity opaque to your driver. The HAL does whatever it does. The fact that you happen to know that it might sometimes use bounce buffers is a detail of the implementation about which you architecturally have no right to be concerned. Your driver has to work in either case.
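
A small user-mode model may help show why the driver's code is the same whether or not bounce buffers are in play. Everything here is hypothetical (`DmaMapping`, `device_view`, `flush_after_device_write`); it is not how the HAL is implemented, and the real FlushAdapterBuffers may do other work as well. The sketch models only the bounce-copy case: the driver always calls the flush step, and the flush step decides whether there is anything to copy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a DMA mapping. When 'bounce' is NULL the device
   reads and writes the driver's buffer directly; otherwise the device
   sees only the bounce copy. */
typedef struct {
    unsigned char *driver_buffer;
    unsigned char *bounce;   /* NULL when no double buffering is in use */
    size_t length;
} DmaMapping;

/* The memory the (simulated) device would actually write to. */
unsigned char *device_view(DmaMapping *m)
{
    assert(m != NULL);
    return m->bounce != NULL ? m->bounce : m->driver_buffer;
}

/* Model of the flush step after a device-to-memory transfer: copies the
   data back from the bounce buffer if one exists, otherwise does nothing.
   The caller does not need to know which case applies. */
void flush_after_device_write(DmaMapping *m)
{
    assert(m != NULL && m->driver_buffer != NULL);
    if (m->bounce != NULL) {
        memcpy(m->driver_buffer, m->bounce, m->length);
    }
}
```

In this model a driver that skips the flush works only in the direct case, which is the kind of hidden dependency on the implementation that the contract above rules out.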

Do you follow what I’m saying?

Peter
OSR
@OSRDrivers

I’m aware of all that, and I guess I should restate the question from the original thread, to see why I’m digging in:

The bottom line is, why is this functionality (allocating a discontiguous common buffer for continuous DMA purposes) available for portcls-based audio drivers, but not via the general purpose APIs (WDM/WDF/etc)? Also, any planned additions to the DMA API to cover this (given the fact that the GetScatterGatherList method is still, from an abstraction point of view, a workaround)?

Well, I know less than nothing about portcls… so I really have no insight to provide there.

I’ve explained to you how to do this using supported mechanisms and APIs.

Well, frankly, the answer is almost certainly that when AllocateCommonBuffer (then, HalAllocateCommonBuffer) was designed and implemented 27 years ago, this seemed like “the thing to do” – Devices that supported Scatter/Gather were not nearly as common as they are today. Nobody really knew what hardware would need to be supported in order to connect the I/O Bus with the system Memory Bus. Folks at the time thought the x86 was (at best) an aberration, and that *real* workstations (based on the cutting-edge MIPS R3000, for example) would be designed that would require some sort of hardware redirection (like the VAX and even the PDP-11 before it). So, knocking together AllocateCommonBuffer to fill the limited purposes of the occasional Bus Master DMA device that needed to support Continuous mode DMA, by allocating a chunk of memory within the DMA range of the device and calling MmGetPhysicalAddress to get the “Device Bus Logical Address” for that memory, seemed like the best idea at the time for the x86.

And again, if I had to GUESS, we don’t have any newer or different interface today because, well… nobody cares enough to change it, really. The existing API is simple and elegant, and most continuous DMA areas are relatively small. Plus, the existing supported APIs allow you to DO whatever you want, there’s nothing that PREVENTS you from doing it.

In terms of “any planned additions” to support this… I’d be surprised if there were. Now, I don’t work in the Windows Operating System Group, but if I was the architect responsible for the I/O subsystem I’d tell whoever proposed this “if it ain’t broke, don’t fix it.” Lots of better stuff to do than to implement an API for something everyone can already do if they need to.

Peter
OSR
@OSRDrivers