How to get physical pages from user space for DMA transaction.

Hello,

I need help getting the physical pages of a buffer allocated in user space, for a DMA transaction. I want to control DMA from user space. Currently I am using WDF_COMMON_BUFFER inside the driver. I copy the user buffer data to the common buffer's virtual address and then pass its logical address to user space. After that, this PHYSICAL_ADDRESS is used to prepare the descriptor for the DMA transaction. I now feel that this is the costliest operation.

Is there any way that I can lock this memory into physical pages and use those physical pages for the DMA transaction, so that I don’t need to copy every time I transfer data to the driver? I need functionality similar to WD_DMALock().

Thanks,
Kishan Patel

On Aug 28, 2018, at 8:14 PM, xxxxx@gmail.com wrote:
> Hello,
>
> I need help to get physical page for allocated buffer in user space for DMA transaction. I want to control DMA from user space.

Don’t do that. It’s just that simple. User-mode processes are not protected from one another. By doing this, you are opening up a HUGE security hole. Anyone who can open your process handle can force you to do arbitrary copies anywhere in memory.

Seriously. Don’t do it.

> Currently I am using WDF_COMMON_BUFFER inside driver. I am copying user buffer data to common buffers virtual address and then I am passing this logical address to user space. After that this PHYSICAL_ADDRESS is used for preparing descriptor for DMA transaction. Now I feel that this is costliest operation.

Of course. Every user/kernel transition is expensive. Just do the whole DMA operation in kernel mode.

Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.

Apart from the issues that Tim has already mentioned, consider a system with an IOMMU, where the expression (physical_address == bus_address) ? TRUE : FALSE may not always evaluate to TRUE…

Anton Bassov

Thank you, Tim and Anton Bassov, for your quick responses.
With this operation I am getting 10 times less throughput compared to that.

To improve throughput I will try to do all DMA operations inside kernel space, as Tim suggested.

Let me know if you have a better idea for getting good throughput.

Thanks,
Kishan Patel

>Every user/kernel transition is expensive.

Well, apparently, not the transition per se (although, AFAIK, it used to be a really expensive “luxury” on earlier systems) - these days the main overhead is, apparently, related to the parameter validation that has to be done upon every call made by userland. In the OP’s case it results in a noticeable performance degradation.

Anton Bassov

Aren’t you doing it backwards? Can’t the user-space app send up a buffer, and then have the driver pin that buffer in memory and DMA into it? No copy required THAT way… does that not work for some reason?

Peter
OSR
@OSRDrivers

> Can’t the user-space app send up a buffer, and then have the driver pin that buffer
> in memory and DMA into it?.. does that not work for some reason?

The very first idea that gets into my head is that the OP’s hardware requires a buffer that is physically contiguous - if I got it right, he made it clear that he needs a common buffer. In such a case the above-mentioned approach is not going to work - you cannot get physically contiguous memory from userland, can you?

What he can try here is to allocate the contiguous buffer in the kernel and to map it to the userland.
Apart from the obvious security-related issues that such an approach is fraught with, it should work just fine…

Anton Bassov

You CAN allocate large pages in user mode, and pass them down for DMA. Each large page (2M) will be physically contiguous. The question really is: does the OP need to do individual DMA transfers larger than 2 Mbytes? Even for a request ring buffer, 2 Mbytes is decently big. If each request slot was 256 bytes, a 2M large page holds 8192 requests. If you have request queues with thousands of slots, you may want multiple queues anyway.

On some hardware, it also may give higher performance to copy highly fragmented request buffers into large contiguous buffers. Like if you were doing many 256 byte write requests, it may give optimal DMA performance if you had rings with 256 byte slots (or variable slots per request) and you just copied the data into the ring. The hardware then could just inhale big chunks from the ring memory with no need to do secondary buffer read requests.
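To make the slot arithmetic and the copy-into-the-ring idea concrete, here is a minimal user-mode sketch. The names `ring_t`, `ring_init` and `ring_push`, the 2 MB buffer size and the 256-byte slot size are illustrative assumptions, not anything taken from the OP's driver. A real ring would also need a consumer index, a doorbell write and memory barriers, all of which are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LARGE_PAGE_SIZE (2u * 1024u * 1024u)  /* assumed 2 MB large page    */
#define SLOT_SIZE       256u                  /* assumed request slot size  */

/* Hypothetical ring descriptor: one contiguous buffer split into slots. */
typedef struct {
    uint8_t *base;   /* start of the contiguous buffer */
    size_t   slots;  /* number of slots in the buffer  */
    size_t   head;   /* index of the next slot to fill */
    size_t   used;   /* slots currently occupied       */
} ring_t;

static void ring_init(ring_t *r, void *buffer, size_t bytes)
{
    assert(bytes >= SLOT_SIZE);
    r->base  = (uint8_t *)buffer;
    r->slots = bytes / SLOT_SIZE;
    r->head  = 0;
    r->used  = 0;
}

/* Copy one request into the next free slot.
   Returns the slot index, or -1 if the ring is full or the request
   does not fit into a single slot. */
static long ring_push(ring_t *r, const void *data, size_t len)
{
    if (len > SLOT_SIZE || r->used == r->slots)
        return -1;
    size_t slot = r->head;
    memcpy(r->base + slot * SLOT_SIZE, data, len);
    r->head = (r->head + 1) % r->slots;
    r->used++;
    return (long)slot;
}
```

Initialising the ring over a 2 MB buffer gives `slots == 8192`, which matches the figure quoted above. Whether the extra copy pays off depends on the device, so treat this as something to measure rather than a rule.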

A downside of depending on 2M large pages is there is some risk that non-Intel processors support different sized large pages than Intel.

Jan

NTDEV is sponsored by OSR

Visit the list online at: http:

MONTHLY seminars on crash dump analysis, WDF, Windows internals and software drivers!
Details at http:

To unsubscribe, visit the List Server section of OSR Online at http:

> You CAN allocate large pages in user mode, and pass them down for DMA.
> Each large page (2M) will be physically contiguous

Sure, but there are still a few issues here:

  1. Consider the scenario of the target buffer being larger than 2M

  2. You need SeLockMemoryPrivilege, which, IIRC, is not assigned by default even to Admins, and has to be added to the account. IIRC, this part cannot be done programmatically and requires manual interaction with the management console

  3. Even if you work around (1) and (2), there is no guarantee that your request gets granted. The longer the system runs, the lower the probability of finding an available large page, because of memory fragmentation
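Issue (1) can be illustrated with a small sketch. Assuming the hardware accepts a scatter/gather list (the OP has not said whether it does), a transfer larger than one large page has to be split so that no piece crosses a large-page boundary, because only the inside of each large page is guaranteed to be physically contiguous. The type `chunk_t`, the function `split_by_large_page` and the 2 MB page size are hypothetical names and values chosen for illustration. The offsets are relative to the start of the buffer, and the driver would still have to obtain a bus address for every piece.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LARGE_PAGE_SIZE ((uint64_t)2 * 1024 * 1024)  /* assumed 2 MB */

/* Hypothetical descriptor: one physically contiguous piece of a transfer. */
typedef struct {
    uint64_t offset;  /* byte offset from the start of the buffer */
    uint64_t length;  /* number of bytes in this piece            */
} chunk_t;

/* Split [start, start + len) into pieces that never cross a large-page
   boundary. Returns the number of pieces written to `out`, or 0 if `len`
   is 0 or `max` entries are not enough. */
static size_t split_by_large_page(uint64_t start, uint64_t len,
                                  chunk_t *out, size_t max)
{
    size_t n = 0;
    while (len > 0) {
        /* Bytes left before the next large-page boundary. */
        uint64_t room = LARGE_PAGE_SIZE - (start % LARGE_PAGE_SIZE);
        uint64_t take = len < room ? len : room;
        if (n == max)
            return 0;
        out[n].offset = start;
        out[n].length = take;
        n++;
        start += take;
        len   -= take;
    }
    return n;
}
```

A 5 MB transfer starting at offset 0 comes out as three pieces (2 MB, 2 MB, 1 MB). If the device can only take a single contiguous address per transfer, this does not help, and a common buffer remains the fallback.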

Anton Bassov

>You need SeLockMemoryPrivilege

Not necessarily. You allocate the buffer in user-mode, and send it up to the driver, where he maps it using Direct I/O. The driver keeps the IRP/Request in progress until the app closes the handle (or exits). This ensures tidiness.

We call this the “big honkin’ hanging Direct I/O” technique… and it’s shockingly simple and practical.

Peter
OSR
@OSRDrivers

>> You need SeLockMemoryPrivilege
>
> Not necessarily. You allocate the buffer in user-mode, and send it up
> to the driver, where he maps it using Direct I/O.

Please don’t forget that we are speaking about the large pages/physically contiguous buffers here.

Certainly, if the OP’s hardware is OK with a buffer that is physically non-contiguous, then the approach that you propose is most definitely the best way to go - this is beyond question. However, if I got it right, he made it clear that he needs a common buffer…

Anton Bassov

>we are speaking about the large pages/physically contiguous buffers here

Hmmm? What am I missing?

The user calls VirtualAllocEx specifying reserve, commit, and large pages. Then passes up the UVA to the driver in some sort of Direct I/O request.

Isn’t that all there is to it? No mystic privs required in that case, are there?

Peter
OSR
@OSRDrivers

> Isn’t that all there is to it? No mystic privs required in that case, are there?

https://docs.microsoft.com/en-us/windows/desktop/memory/large-page-support

As you can see, before you can call VirtualAlloc() to get a large page you have to enable SeLockMemoryPrivilege with AdjustTokenPrivileges(), and before you are in a position to do that you have to ensure that the account has actually been assigned this privilege - you cannot add new privileges by means of AdjustTokenPrivileges(), can you. IIRC, this part can be done only via the management console and not programmatically…

Anton Bassov

Well, I learned something today.

Despite the docs you pointed to being very clear, a blog post by Raymond Chen (I mean, what does HE know, right?), and numerous articles on SO… I couldn’t believe it. So, I worked super hard and wrote my own program (OK, not really, I copied 90% of it from somebody somewhere):

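#include <windows.h>
#include <stdio.h>

int main(void)
{
    SIZE_T min = GetLargePageMinimum();

    void *p = VirtualAlloc(NULL,
                           min,
                           MEM_COMMIT | MEM_RESERVE | MEM_LARGE_PAGES,
                           PAGE_READWRITE);

    // GetLastError() is only meaningful when the call has failed.
    DWORD rc = (p == NULL) ? GetLastError() : ERROR_SUCCESS;

    printf("Pointer = %p, Error = %lu.\n", p, rc);
    return 0;
}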

You know what? Error = 1314 – “A required privilege is not held”… I’ll be darned.

Gotta keep those pesky user-mode programmers under control, I guess! Can’t have them allocating LARGE PAGES. OMG!

Thanks Anton. I hereby grant you a free pass to make one rude, crude, arrogant, and socially irresponsible post here on NTDEV without being banned for life.

:wink:

Peter
OSR
@OSRDrivers

Actually, I believe I should earn some “good karma” on NTDEV before even thinking about switching to the “ironical mode”, taking into consideration the troll-management capabilities of the upcoming platform. Therefore, for the time being I will try my best to make some posts that fall into the “useful” category in order to avoid “The Hanging Judge’s” potential wrath…

Anton Bassov

There is an important reason for this restriction: large pages are not pageable. Think about it. You have some page file(s) full of 4k chunks with lots of data in them. Assume for efficiency that you manage the data in these files with some kind of bitmap - like, say, the Windows team has done - and then say I now want a 2 MB chunk to page out this large page I have been working on. Clearly it doesn’t work.

The “obvious” solution is to say that large pages are chiefly used in limited situations, such as mapping kernel32.dll, or by “advanced” users who have lots of RAM and full control over their systems, and therefore it is okay to just make them not pageable. Once you have arrived at this conclusion it is not hard to see how the need for the lock pages in memory privilege becomes the next requirement.

IIRC Anton is wrong however in that this privilege is granted (but not enabled) for administrators by default. This has changed from Windows version to version and it has been a long time since I worked outside of a GPO controlled environment so I might have this mistaken.

Any properly written code that tries to use large pages as a performance optimization can fall back to and successfully use normal pages. That’s what they are, a performance optimization for reducing the TLB lookup cost. Any other use is nonsense.

> IIRC Anton is wrong however in that this privilege is granted (but not enabled) for
> administrators by default.

Please read my post more carefully. Look what I had said

If it was granted by default there would be no problem whatsoever - the only thing that you would be required to do in such a case is to enable this privilege in a token, which, unlike adding privileges to the account, may be done programmatically. In fact, there would be no need to even mention it in the first place. However, the fact that it requires user interaction with a console adds sort of an extra “complication”. Let’s face it - telling end users that they have to configure the OS in a certain way before your piece of hardware can be utilised does not really seem to add any extra selling points to your product, don’t you think…

> This has changed from Windows version to version and it has been a long time
> since I worked outside of a GPO controlled environment so I might have this mistaken

Fair enough - assuming that this feature may change from one OS version to another, I have to admit that my practical experience with Windows as a user is VERY outdated. Unless we count booting up a new machine and taking whatever steps are necessary under the given OS version before the “defenestration process” can be successfully launched as a “user experience”, the last Windows version that I have practical experience with is XP.

The only thing that I don’t understand is WHY it should be changing under different OS versions, in the first place. After all, the principle of “least privilege by default” seems to be pretty universal, and the ability of user apps to lock physical pages in RAM does not seem to be of crucial importance in 95+% of cases, does it…

Anton Bassov

Went through some code I wrote many years ago for testing LARGE_PAGES. Here
is what I had to do:

#ifdef _WIN32
#include <windows.h>
// Pointer type for GetLargePageMinimum, which is looked up at run time.
typedef SIZE_T (WINAPI *PGLPM)(void);
#else
#include <stdlib.h>   // posix_memalign
#endif

static LPVOID Alloc(DWORD dwSize)
{
#ifdef _WIN32
    DWORD type = MEM_COMMIT | MEM_RESERVE;
#if defined (_USE_LARGE_PAGES)
    SIZE_T isize = dwSize;   // SIZE_T, so rounding up cannot truncate
    HANDLE hToken = NULL;    // stays NULL if OpenProcessToken fails
    BOOL enabled = FALSE;    // TRUE once the privilege has been enabled
    TOKEN_PRIVILEGES tp;

    // open process token
    if (OpenProcessToken(GetCurrentProcess(),
                         TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
    {
        // get the luid
        if (LookupPrivilegeValueA(NULL, "SeLockMemoryPrivilege",
                                  &tp.Privileges[0].Luid))
        {
            BOOL status;
            DWORD error;
            tp.PrivilegeCount = 1;
            tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
            // enable privilege
            status = AdjustTokenPrivileges(hToken, FALSE, &tp, 0,
                                           (PTOKEN_PRIVILEGES)NULL, 0);
            // It is possible for AdjustTokenPrivileges to return TRUE and
            // still not succeed, so always check the last error value.
            error = GetLastError();
            if (status && (error == ERROR_SUCCESS))
            {
                HMODULE hModule;
                PGLPM pGLPM = (PGLPM)NULL;
                SIZE_T minsize = 0;

                enabled = TRUE;

                hModule = GetModuleHandleA("Kernel32.dll");
                if (hModule)
                    pGLPM = (PGLPM)GetProcAddress(hModule, "GetLargePageMinimum");

                // Get environment specific large page size
                if ((PGLPM)NULL != pGLPM)
                    minsize = pGLPM();

                if (0 == minsize)
                {
                    // Either the processor does not support large pages (the
                    // return value is zero) or GetLargePageMinimum does not
                    // exist in Kernel32.dll. The minimum large page size
                    // varies, but it is typically 2 MB or greater.
                    minsize = 2 * 1024 * 1024;
                }

                if (isize >= minsize)
                {   // If we get this far, we know that we can allocate large pages
                    SIZE_T blocks = isize / minsize;
                    // Allocation size must be a multiple of the large page size
                    if (isize % minsize)
                        blocks++;
                    isize = blocks * minsize;
                    type |= MEM_LARGE_PAGES;
                }   // Too small; no need for large pages
#if defined (_FORCE_LARGE_PAGES)
                else
                {
                    isize = minsize;
                    type |= MEM_LARGE_PAGES;
                }
#endif
            }   // Error setting privileges
        }   // Error on privilege lookup
    }   // Error opening token

    // If we failed to enable the large page privilege above, the function
    // allocates as normal, so that case is not a failure.
    PVOID m = VirtualAlloc(NULL, isize, type, PAGE_READWRITE);

    if (hToken != NULL)
    {
        if (enabled)
        {
            // disable privilege (the LUID is only valid if we enabled it)
            tp.Privileges[0].Attributes = 0;
            tp.PrivilegeCount = 1;
            AdjustTokenPrivileges(hToken, FALSE, &tp, 0, (PTOKEN_PRIVILEGES)NULL, 0);
        }
        // close the handle
        CloseHandle(hToken);
    }

    if (!m && (type & MEM_LARGE_PAGES))
    {   // Large page allocation failed, revert back to normal allocation
        type &= ~MEM_LARGE_PAGES;
        m = VirtualAlloc(NULL, dwSize, type, PAGE_READWRITE);
    }

    return m;
#else
    return VirtualAlloc(NULL, dwSize, type, PAGE_READWRITE);
#endif
#else
    void *p = NULL;
    if (0 == posix_memalign(&p, 0x1000, dwSize))
        return p;
    return NULL;
#endif
}
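
The rounding step in the function above (the allocation size must be a multiple of the large page size) can be pulled out into a small portable helper. This is only a sketch: the name `round_up_to_page` is made up for illustration, and the overflow check is an addition that is not in the code above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round `size` up to the next multiple of `page` (page must be non-zero).
   Returns 0 if the rounded value would not fit in a size_t. */
static size_t round_up_to_page(size_t size, size_t page)
{
    assert(page != 0);
    size_t rem = size % page;
    if (rem == 0)
        return size;              /* already a multiple */
    size_t pad = page - rem;
    if (size > SIZE_MAX - pad)
        return 0;                 /* would overflow */
    return size + pad;
}
```

With a 2 MB large page, a request of 2 MB + 1 byte rounds up to 4 MB, while a 20 MB request is already a multiple and stays at 20 MB.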

Jamey Kirby
Disrupting the establishment since 1964

This is a personal email account and as such, emails are not subject to
archiving. Nothing else really matters.

Hi All,

Thank you for your valuable responses. I am concerned about security as well.

I need help with one more thing: can I allocate more than 4M in a call like

WdfCommonBufferCreate(DmaEnabler,WriteCommonBufferSize = 20M, WDF_NO_OBJECT_ATTRIBUTES, &DevExt->WriteCommonBuffer);

If yes, then please tell me the requirements, because while allocating this much I am not able to install the driver.

Thanks,
Kishan Patel

Hmmmmm…

How much memory do you have on the system? You’re (effectively) trying to allocate 20MB of contiguous memory… which can be difficult once the system is up and running.

Peter
OSR
@OSRDrivers