Transfer very large amounts of data from kernel to user mode

I have a requirement to build a data link from kernel to user mode that needs very high throughput, potentially up to 4GB/sec. This is for transferring video, i.e., a sequence of frames, where each frame can be up to 128MB at 30 frames per second. I’m considering the following options:

  1. IOCTL with METHOD_DIRECT
  2. IOCTL with a buffer allocated by VirtualAlloc in user mode, with MmGetSystemAddressForMdlSafe on the kernel side
  3. Shared memory (ZwCreateSection/ZwMapViewOfSection)

Any input on the pros and cons of the above methods, or on other transfer options, is hugely welcome.
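For scale, the figures in the question can be checked with a little arithmetic. This is a minimal sketch, assuming "MB" means 2^20 bytes and that every frame is the full 128 MB; the helper names are hypothetical. It also computes how much memory a ring of N frame buffers would occupy, which matters when those buffers are locked down and described by an MDL (as METHOD_DIRECT does).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: sustained bandwidth needed for fixed-size frames. */
static uint64_t required_bytes_per_sec(uint64_t frame_bytes, uint32_t fps)
{
    return frame_bytes * fps;
}

/* Hypothetical helper: memory held by a ring of 'slots' frame buffers. */
static uint64_t ring_bytes(uint64_t frame_bytes, uint32_t slots)
{
    return frame_bytes * slots;
}
```

With 128 MiB frames at 30 fps this gives 4,026,531,840 bytes/s (3.75 GiB/s, roughly 4 GB/s in decimal units), matching the figure in the question; even a three-slot ring holds 384 MiB.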

Maybe the following threads can be useful for what you need:

https://www.osronline.com/showthread.cfm?link=262104
http://osronline.com/showThread.CFM?link=215686
https://www.osronline.com/showthread.cfm?link=267149

ZwMapViewOfSection doesn’t map in the kernel address space, i.e. the upper half of the address space. You can map in the lower half of the System process but this will be available only in the System process context.

Rumit, Slava, thank you for your help!

Slava Imameev wrote:

ZwMapViewOfSection doesn’t map in the kernel address space, i.e. the upper half
of the address space. You can map in the lower half of the System process but
this will be available only in the System process context.

Can’t the driver create a kernel mapping of this section, as usual (with an MDL)?
For a “serious” project like this, the hardware design is important (well-done DMA and so on),
and the capabilities of the host machine are important (how many CPUs? Can processing be parallelized?)
From the OP we don’t have enough info.

Regards,
– pa

I’ve mapped a section between KM and UM. Performance is better than with the IOCTL
method, but many on this list pooh-pooh this method. I’m OK with it.
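The thread does not show how the two sides coordinate once a section is mapped into both the driver and the application. As an illustration only, here is a minimal single-producer/single-consumer ring of frame slots in portable C11. The layout (`struct frame_ring`), the slot count and the function names are hypothetical, and both sides are simulated in one user-mode process; a real driver would need kernel-mode equivalents of these atomic operations plus a wake-up mechanism (for example an event or a pending IOCTL), which this sketch leaves out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 4u /* hypothetical slot count; must be a power of two */

/* Hypothetical control block at the start of the shared region.
   Frame data follows it: slot i starts at data + i * slot_bytes. */
struct frame_ring {
    _Atomic uint32_t head;          /* next slot the producer (driver) fills */
    _Atomic uint32_t tail;          /* next slot the consumer (app) reads */
    uint32_t slot_bytes;            /* capacity of each frame slot */
    uint32_t frame_len[RING_SLOTS]; /* valid bytes in each slot */
};

/* Producer: copy one frame into the next free slot (the memcpy stands in
   for whatever fills the slot in a real driver, e.g. a DMA transfer). */
static bool ring_produce(struct frame_ring *r, uint8_t *data,
                         const uint8_t *frame, uint32_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);
    if (head - tail == RING_SLOTS || len > r->slot_bytes)
        return false; /* ring full (consumer too slow) or frame too large */
    uint32_t slot = head % RING_SLOTS;
    memcpy(data + (size_t)slot * r->slot_bytes, frame, len);
    r->frame_len[slot] = len;
    /* Release: the frame bytes become visible before the new head value. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return true;
}

/* Consumer: get a pointer to the oldest frame without copying it. */
static const uint8_t *ring_peek(struct frame_ring *r, const uint8_t *data,
                                uint32_t *len)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);
    if (head == tail)
        return NULL; /* no frame available */
    uint32_t slot = tail % RING_SLOTS;
    *len = r->frame_len[slot];
    return data + (size_t)slot * r->slot_bytes;
}

/* Consumer: hand the slot back to the producer once done with it. */
static void ring_release(struct frame_ring *r)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
}
```

The consumer reads each frame in place through `ring_peek` and only then advances `tail` with `ring_release`, so the user-mode side never copies the frame again; the release/acquire pairs on `head` and `tail` keep the producer from reusing a slot while it is still being read.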

>I use a thread that waits on the process object in order to handle cleanup

This does not resolve the context issue.

Yes, but this does not resolve the context issue.

Either you have a keyboard macro that keeps unintentionally firing, or you really like that sentence.

By and large, there IS no “context issue”… if I’m writing a device driver, which is what we’re talking about, it is vanishingly unlikely that somebody is going to instantiate a filter device above my FDO. If they DO that, and my driver stops working, the filter is broken.

You are not allowed to write filter drivers that arbitrarily change the context in which a driver expects to be called. If you break my Fast I/O entry point, I will be unhappy. I will explain to my customer: “You install that filter, the driver stops working. You remove that filter, the driver works again. Conclusion: the filter is broken.”

If I’m at the top of the “driver stack”, supporting a unique device, and expect to be the first driver entered from the calling user-mode app, a filter is not allowed to change that. Full stop. A filter can filter… it cannot fuck up my driver. If it does, the filter is architecturally broken.

Peter
OSR
@OSRDrivers

(Following-up my own post… bad form, I know)

Worse, it doesn’t even matter.

Let’s say I AM called in an arbitrary process context. Nothing stops me from changing to a specific context to do the mapping.

I’ll say it again: There IS no “context issue”…

Peter
OSR
@OSRDrivers

In the cases where I have done it, I use a worker to do the mapping in
system process context.

And yes, there is no performance difference if you don’t tear down the mapping on each
IOCTL.

When the um app disappears, it is easy to use another worker to undo the
section mapping.

Reverse IOCTL is the de facto way to do this; I just don’t like it. It feels
hacky to me. But what do I know.

The context issue is exactly this:

If driver A incorrectly assumes that it will not be called in an arbitrary context and takes no appropriate steps to remedy this, then it will fail under some circumstances which may be hard to describe.

If driver B correctly assumes that it cannot be called in an arbitrary context, and takes no steps to remedy this issue, then it will never fail (at least for this cause).

If driver C correctly assumes that it can be called in an arbitrary context, and takes appropriate steps, then it will never fail (at least for this cause).

If driver A, B or C is subjected to operating conditions that do not conform with the documented requirements of the OS, then many kinds of malfunctions can be expected to occur.

This post is nonsense.

People seem to be creating a false architectural constraint for Windows drivers, stating that they are always called in an arbitrary process context. This is NOT an architectural precept of the Windows OS.

The first driver entered from user mode will always be called in the context of the requesting process. This is a fixed architectural precept in Windows. If it were not, there would be no Fast I/O. No file systems would work.

If a filter driver violates this, the filter driver is at fault. Full stop. End of story.

(To the WDF devs reading this: I am absolutely NOT saying you can make assumptions in your EvtIo event processing callbacks about the context in which you will be called. The fact that EvtIoRead (for example) is called in an arbitrary process context is a WDF architectural concept. That is NOT what I’m talking about in this thread, which is why I *specifically* mentioned Fast I/O.)

Peter
OSR
@OSRDrivers