I need help getting the physical pages of a buffer allocated in user space for a DMA transaction. I want to control DMA from user space. Currently I am using WDF_COMMON_BUFFER inside the driver. I copy the user buffer data to the common buffer’s virtual address and then pass its logical address to user space. That PHYSICAL_ADDRESS is then used to prepare the descriptor for the DMA transaction. I feel that this copy is the costliest operation.
Is there any way that I can lock this memory in physical pages and use those pages for the DMA transaction, so that I don’t need to copy every time I transfer data to the driver? I need functionality similar to WD_DMALock().
On Aug 28, 2018, at 8:14 PM, xxxxx@gmail.com wrote:
> Hello,
> I need help to get physical page for allocated buffer in user space for DMA transaction. I want to control DMA from user space.
Don’t do that. It’s just that simple. User-mode processes are not protected from one another. By doing this, you are opening up a HUGE security hole. Anyone who can open your process handle can force you to do arbitrary copies anywhere in memory.
Seriously. Don’t do it.
> Currently I am using WDF_COMMON_BUFFER inside driver. I am copying user buffer data to common buffers virtual address and then I am passing this logical address to user space. After that this PHYSICAL_ADDRESS is used for preparing descriptor for DMA transaction. Now I feel that this is costliest operation.
Of course. Every user/kernel transition is expensive. Just do the whole DMA operation in kernel mode.
Tim Roberts, xxxxx@probo.com
Providenza & Boekelheide, Inc.
Apart from the issues that Tim has already mentioned, consider a system with an IOMMU, where the expression (physical_address == bus_address) ? TRUE : FALSE may not always evaluate to TRUE…
Well, apparently not the transition per se (although, AFAIK, it used to be a really expensive “luxury” on earlier systems) - these days the main overhead is related to the parameter validation that has to be done on every call made from userland. In the OP’s case it results in a noticeable performance degradation.
Aren’t you doing it backwards? Can’t the user-space app send up a buffer, and then have the driver pin that buffer in memory and DMA into it? No copy required THAT way… does that not work for some reason?
> Can’t the user-space app send up a buffer, and then have the driver pin that buffer in memory and DMA into it?.. does that not work for some reason?
The very first idea that comes to my mind is that the OP’s hardware requires a buffer that is physically contiguous - if I got it right, he made it clear that he needs a common buffer. In that case the above-mentioned approach is not going to work - you cannot get physically contiguous memory from userland, can you?
What he can try here is to allocate the contiguous buffer in the kernel and to map it to the userland.
Apart from the obvious security-related issues that such an approach is fraught with, it should work just fine…
You CAN allocate large pages in user mode, and pass them down for DMA. Each large page (2 MB) will be physically contiguous. The real question is whether the OP needs to do individual DMA transfers larger than 2 MB. Even for a request ring buffer, 2 MB is decently big. If each request slot is 256 bytes, a 2 MB large page holds 8192 requests. If you have request queues with thousands of slots, you may want multiple queues anyway.
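The slot arithmetic above can be sketched in a few lines of C. This is only an illustration: the 2 MB large page size is an assumption (it is the common x64 value, not a guarantee), and the function names are made up for this example.

```c
#include <stddef.h>

/* Assumed large page size: 2 MB, the usual x64 value. */
#define LARGE_PAGE_BYTES ((size_t)2 * 1024 * 1024)

/* How many fixed-size request slots fit in one large page. */
size_t slots_per_large_page(size_t slot_bytes)
{
    return slot_bytes ? LARGE_PAGE_BYTES / slot_bytes : 0;
}

/* Nonzero if a ring of `slots` slots fits in a single large page,
   i.e. the whole ring is physically contiguous. */
int ring_fits_in_large_page(size_t slots, size_t slot_bytes)
{
    return slot_bytes != 0 && slots <= LARGE_PAGE_BYTES / slot_bytes;
}
```

With 256-byte slots, slots_per_large_page(256) gives the 8192 figure quoted above; a ring one slot bigger no longer fits in a single page.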
On some hardware, it also may give higher performance to copy highly fragmented request buffers into large contiguous buffers. Like if you were doing many 256 byte write requests, it may give optimal DMA performance if you had rings with 256 byte slots (or variable slots per request) and you just copied the data into the ring. The hardware then could just inhale big chunks from the ring memory with no need to do secondary buffer read requests.
A downside of depending on 2M large pages is there is some risk that non-Intel processors support different sized large pages than Intel.
Jan
> You CAN allocate large pages in user mode, and pass them down for DMA. Each large page (2M) will be physically contiguous
Sure, but there are still a few issues here:
1. Consider the scenario of the target buffer being larger than 2M.
2. You need SeLockMemoryPrivilege, which, IIRC, is not assigned by default even to Admins, and has to be added to the account. IIRC, this part cannot be done programmatically and requires manual interaction with the management console.
3. Even if you work around (1) and (2), there is no guarantee that your request gets granted. The longer the system runs, the lower the probability of finding an available large page, because of memory fragmentation.
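To illustrate point (1): each large page is physically contiguous on its own, but two neighbouring large pages need not be adjacent in physical memory, so a transfer that crosses a large page boundary has to be split into one piece per page. A rough sketch follows, assuming a 2 MB large page and a buffer that starts on a large page boundary; the struct and function names are hypothetical and do not come from any real driver.

```c
#include <stddef.h>

/* Assumed large page size: 2 MB. */
#define LARGE_PAGE_BYTES ((size_t)2 * 1024 * 1024)

/* Hypothetical descriptor: one physically contiguous piece of a transfer,
   expressed as an offset from the start of the large-page buffer. */
struct dma_segment {
    size_t offset;
    size_t length;
};

/* Split [offset, offset + length) at large page boundaries.
   Returns the number of segments required; at most max_segs are written. */
size_t split_at_large_pages(size_t offset, size_t length,
                            struct dma_segment *segs, size_t max_segs)
{
    size_t count = 0;
    while (length > 0) {
        /* Bytes left before the next large page boundary. */
        size_t room = LARGE_PAGE_BYTES - (offset % LARGE_PAGE_BYTES);
        size_t chunk = length < room ? length : room;
        if (count < max_segs) {
            segs[count].offset = offset;
            segs[count].length = chunk;
        }
        count++;
        offset += chunk;
        length -= chunk;
    }
    return count;
}
```

A 5 MB transfer from offset 0 comes out as three segments (2 MB, 2 MB, 1 MB), so the hardware has to cope with more than one descriptor per transfer in that case.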
Not necessarily. You allocate the buffer in user-mode, and send it up to the driver, where he maps it using Direct I/O. The driver keeps the IRP/Request in progress until the app closes the handle (or exits). This ensures tidiness.
We call this the “big honkin’ hanging Direct I/O” technique… and it’s shockingly simple and practical.
> Not necessarily. You allocate the buffer in user-mode, and send it up to the driver, where he maps it using Direct I/O.
Please don’t forget that we are speaking about the large pages/physically contiguous buffers here.
Certainly, if the OP’s hardware is OK with a buffer that is physically non-contiguous, then the approach that you propose is most definitely the best way to go - that is beyond question. However, if I got it right, he made it clear that he needs a common buffer…
As you can see, before you can call VirtualAlloc() to get a large page you have to enable SeLockMemoryPrivilege with AdjustTokenPrivileges(), and before you are in a position to do that you have to ensure that the account has actually been assigned this privilege - you cannot add new privileges by means of AdjustTokenPrivileges(), can you? IIRC, this part can be done only via the management console and not programmatically…
Well, I learned something today.
Despite the docs you pointed to being very clear, a blog post by Raymond Chen (I mean, what does HE know, right?), and numerous articles on SO… I couldn’t believe it. So, I worked super hard and wrote my own program (OK, not really, I copied 90% of it from somebody somewhere):
You know what? Error = 1314 – “A required privilege is not held”… I’ll be darned.
Gotta keep those pesky user-mode programmers under control, I guess! Can’t have them allocating LARGE PAGES. OMG!
Thanks Anton. I hereby grant you a free pass to make one rude, crude, arrogant, and socially irresponsible post here on NTDEV without being banned for life.
Actually, I believe I should earn some “good karma” on NTDEV before even thinking about switching to the “ironical mode”, taking into consideration the troll-management capabilities of the upcoming platform. Therefore, for the time being I will try my best to make some posts that fall into the “useful” category in order to avoid “The Hanging Judge’s” potential wrath…
There is an important reason for this restriction: large pages are not pageable. Think about it. You have some page file(s) full of 4k chunks with lots of data in them. Assume for efficiency that you manage the data in these files with some kind of bitmap (like, say, the Windows team has done), and then say that I now want a 2 MB chunk to page out this large page I have been working on. Clearly it doesn’t work.
The “obvious” solution is to say that large pages are chiefly used in limited situations, such as mapping kernel32.dll, or by “advanced” users who have lots of RAM and full control over their systems, and therefore it is okay to just make them not pageable. Once you have arrived at this conclusion it is not hard to see how the need for the lock pages in memory privilege becomes the next requirement.
IIRC Anton is wrong however in that this privilege is granted (but not enabled) for administrators by default. This has changed from Windows version to version and it has been a long time since I worked outside of a GPO controlled environment so I might have this mistaken.
Any properly written code that tries to use large pages as a performance optimization can fall back to, and successfully use, normal pages. That’s what they are, a performance optimization for reducing the TLB lookup cost. Any other use is nonsense.
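As a back-of-the-envelope illustration of the TLB point, the sketch below counts how many page mappings a buffer needs at a given page size. The 4 KB and 2 MB sizes are the usual x64 values and are assumptions here, as is the function name.

```c
#include <stddef.h>

/* Number of pages (and so, roughly, TLB entries) needed to map `bytes`
   bytes with pages of `page_bytes` bytes, rounding up. */
size_t pages_to_map(size_t bytes, size_t page_bytes)
{
    if (page_bytes == 0)
        return 0;
    return bytes / page_bytes + (bytes % page_bytes != 0);
}
```

For a 20 MB buffer (a size picked only as an example) this gives 5120 mappings with 4 KB pages against 10 with 2 MB pages, which is where the saving in TLB lookups comes from.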
> IIRC Anton is wrong however in that this privilege is granted (but not enabled) for administrators by default.
Please read my post more carefully. Look at what I said.
If it was granted by default there would be no problem whatsoever - the only thing that you would be required to do in that case is to enable this privilege in the token, which, unlike adding privileges to the account, may be done programmatically. In fact, there would be no need to even mention it in the first place. However, the fact that it requires user interaction with a console adds sort of an extra “complication”. Let’s face it - telling end users that they have to configure the OS in a certain way before your piece of hardware can be utilised does not really seem to add any extra selling points to your product, does it…
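For anyone who wants to see which of the two cases applies on a given box, here is a minimal sketch that looks at the current process token and reports whether SeLockMemoryPrivilege is present and whether it is enabled. The function name and its return convention are made up for this example; the Win32 calls are the standard token APIs. The non-Windows stub is there only so that the file compiles elsewhere.

```c
#ifdef _WIN32
#include <windows.h>
#include <stdlib.h>

/* Returns 1 if SeLockMemoryPrivilege is in the token and enabled,
   0 if it is in the token but disabled,
  -1 if it is not in the token at all (or on any error). */
int lock_memory_privilege_state(void)
{
    HANDLE token = NULL;
    LUID luid;
    DWORD len = 0;
    int state = -1;

    if (!OpenProcessToken(GetCurrentProcess(), TOKEN_QUERY, &token))
        return -1;

    if (LookupPrivilegeValueA(NULL, "SeLockMemoryPrivilege", &luid)) {
        /* First call only reports the buffer size that is needed. */
        GetTokenInformation(token, TokenPrivileges, NULL, 0, &len);
        TOKEN_PRIVILEGES *tp = (TOKEN_PRIVILEGES *)malloc(len);
        if (tp && GetTokenInformation(token, TokenPrivileges, tp, len, &len)) {
            for (DWORD i = 0; i < tp->PrivilegeCount; i++) {
                if (tp->Privileges[i].Luid.LowPart == luid.LowPart &&
                    tp->Privileges[i].Luid.HighPart == luid.HighPart) {
                    state = (tp->Privileges[i].Attributes & SE_PRIVILEGE_ENABLED) ? 1 : 0;
                    break;
                }
            }
        }
        free(tp);
    }
    CloseHandle(token);
    return state;
}
#else
/* Stub for non-Windows builds: there is no token to inspect. */
int lock_memory_privilege_state(void)
{
    return -1;
}
#endif
```

A result of 0 means a call to AdjustTokenPrivileges() is all that is missing; a result of -1 means the privilege has to be assigned to the account first, which is the case being argued about here.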
> This has changed from Windows version to version and it has been a long time since I worked outside of a GPO controlled environment so I might have this mistaken
Fair enough - assuming that this feature may change from one OS version to another, I have to admit that my practical experience with Windows as a user is VERY outdated. Unless we count booting up a new machine and taking whatever steps are necessary under the given OS version before the “defenestration process” can be successfully launched as a “user experience”, the last Windows version that I have practical experience with is XP.
The only thing that I don’t understand is WHY it should be changing under different OS versions, in the first place. After all, the principle of “least privilege by default” seems to be pretty universal, and the ability of user apps to lock physical pages in RAM does not seem to be of crucial importance in 95+% of cases, does it…
#include <windows.h>

typedef SIZE_T (WINAPI *PGLPM)(void); // signature of GetLargePageMinimum

// The post showed only a function body; the wrapper name and its parameters
// are placeholders. dwSize is the requested size in bytes, type is the
// VirtualAlloc allocation type (e.g. MEM_RESERVE | MEM_COMMIT).
PVOID AllocPreferLargePages(SIZE_T dwSize, DWORD type)
{
    HANDLE hToken = INVALID_HANDLE_VALUE;
    TOKEN_PRIVILEGES tp = {0};
    SIZE_T isize = dwSize;

    // open process token
    if (OpenProcessToken(GetCurrentProcess(), TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
    {
        // get the luid
        if (LookupPrivilegeValueA(NULL, "SeLockMemoryPrivilege", &tp.Privileges[0].Luid))
        {
            BOOL status;
            DWORD error;
            tp.PrivilegeCount = 1;
            tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
            // enable privilege
            status = AdjustTokenPrivileges(hToken, FALSE, &tp, 0, (PTOKEN_PRIVILEGES)NULL, 0);
            // It is possible for AdjustTokenPrivileges to return TRUE and still
            // not succeed, so always check the last error value.
            error = GetLastError();
            if (status && (error == ERROR_SUCCESS))
            {
                // GetLargePageMinimum may not exist in kernel32.dll,
                // so look it up at run time instead of linking to it.
                HMODULE hModule = GetModuleHandleA("kernel32.dll");
                PGLPM pGLPM = hModule ? (PGLPM)GetProcAddress(hModule, "GetLargePageMinimum") : (PGLPM)NULL;
                // Get environment specific large page size
                SIZE_T minsize = 0;
                if ((PGLPM)NULL != pGLPM)
                    minsize = pGLPM();
                if (0 == minsize)
                {
                    // If the processor does not support large pages, the return value is zero.
                    // Or GetLargePageMinimum does not exist in kernel32.dll.
                    // The minimum large page size varies, but it is typically 2 MB or greater.
                    minsize = 2 * 1024 * 1024;
                }
                if (isize >= minsize)
                {   // If we get this far, we know that we can allocate large pages
                    SIZE_T blocks = isize / minsize;
                    // Allocation size must be multiple of large page size
                    if (isize % minsize)
                        blocks++;
                    isize = blocks * minsize;
                    type |= MEM_LARGE_PAGES;
                } // Too small; no need for large pages
#if defined (_FORCE_LARGE_PAGES)
                else
                {
                    isize = minsize;
                    type |= MEM_LARGE_PAGES;
                }
#endif
            } // Error setting privileges
        } // Error on privilege lookup
    } // Error opening token
    // If we failed to enable large page security above, the function will
    // allocate as normal; so no failure in this case
    PVOID m = VirtualAlloc(NULL, isize, type, PAGE_READWRITE);
    if (hToken != INVALID_HANDLE_VALUE)
    {
        tp.Privileges[0].Attributes = 0;
        tp.PrivilegeCount = 1;
        // disable privilege
        AdjustTokenPrivileges(hToken, FALSE, &tp, 0, (PTOKEN_PRIVILEGES)NULL, 0);
        // close the handle
        CloseHandle(hToken);
    }
    if (!m && (type & MEM_LARGE_PAGES))
    {   // Large page allocation failed, revert back to normal allocation
        // of the size originally requested
        type &= ~MEM_LARGE_PAGES;
        m = VirtualAlloc(NULL, dwSize, type, PAGE_READWRITE);
    }
    return m;
}
– Jamey Kirby Disrupting the establishment since 1964
This is a personal email account and as such, emails are not subject to archiving. Nothing else really matters.
How much memory do you have on the system? You’re (effectively) trying to allocate 20MB of contiguous memory… which can be difficult once the system is up and running.