I fail to understand the overall intent of this exercise.
If you are in the business of writing a driver that talks to a GPU, don't you already have the ability to see all of the calls to the display driver and the miniport, by virtue of writing the driver for them?
If you are creating a virtual device, then you are on your own to provide the services requested by your device/device-driver pair.
I can only speculate that you are really writing something that makes a few more monitors appear to the user, diverting those monitors to a USB dongle, a network-based display solution, and the like.
In that scenario, I can see how you would be tempted to have an existing device simply expose a couple of extra monitors, and then intercept the traffic to and from those devices.
In general, this is both hard and unsupported, in the actual hardware as well as in the OS and the IHV software.
For the hardware part: video cards are generally configured with a limited number of regions of video memory that can be fed to the RAMDAC. For example, the majority of commodity cards have 2 scanout sources that can be hooked up to 5 targets, with not all paths allowed at the same time.
Assuming that you fake a couple of extra scanout sources, how would the real hardware react to that?
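To illustrate the constraint, here is a hypothetical topology table of the kind a miniport has to respect. The names, counts, and the path_is_valid helper are all made up for the sketch, not a real API:

```c
/* Illustrative only: a made-up source/target topology in the spirit of
 * what real scanout hardware exposes. Nothing here is a real API. */
#include <stdbool.h>
#include <stdio.h>

#define NUM_SOURCES 2   /* scanout engines feeding the RAMDAC/encoders */
#define NUM_TARGETS 5   /* physical connectors: DVI, HDMI, DP, ...     */

/* allowed[s][t] is true when source s has a physical path to target t. */
static const bool allowed[NUM_SOURCES][NUM_TARGETS] = {
    { true,  true,  false, true,  false },
    { false, true,  true,  false, true  },
};

/* A "faked" extra source has no row in this table: the hardware simply
 * has no path from it to any target, which is the crux of the problem. */
static bool path_is_valid(int source, int target)
{
    if (source < 0 || source >= NUM_SOURCES) return false; /* no such engine */
    if (target < 0 || target >= NUM_TARGETS) return false;
    return allowed[source][target];
}

int main(void)
{
    printf("source 2 -> target 0: %s\n",
           path_is_valid(2, 0) ? "ok" : "rejected by hardware");
    return 0;
}
```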
For the IHV part: in the XPDM days, the DDraw/D3D part of the XPDM stack was fragile at best. I still think that certain IHVs never understood the difference between session space and system space, and their kernel worker threads, spawned off by D3D interactions, made the system really unstable. Intercepting all of that will make the system even more fragile.
For the OS part: VideoPrt.sys has a very private hardware enumeration contract with the miniport, and that has little to do with PnP. The reason is that VideoPrt sits at the bottom of the device stack for \\.\DisplayX, which is what Win32 and the DDraw stack use. The binding between dxg.sys and win32k.sys is, again, private and involved at best.
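As a small illustration of that stack, a user-mode process can open \\.\DisplayX directly and issue one of the public IOCTL_VIDEO_* requests, which are ultimately completed by VideoPrt and the miniport rather than by PnP. This sketch assumes an XPDM-era system, the ntddvdeo.h header from the WDK, and sufficient privileges for the open to succeed:

```c
#include <windows.h>
#include <ntddvdeo.h>   /* IOCTL_VIDEO_*, VIDEO_NUM_MODES (WDK header) */
#include <stdio.h>

int main(void)
{
    HANDLE h = CreateFileW(L"\\\\.\\DISPLAY1",
                           GENERIC_READ | GENERIC_WRITE,
                           FILE_SHARE_READ | FILE_SHARE_WRITE,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) {
        printf("open failed: %lu\n", GetLastError());
        return 1;
    }

    VIDEO_NUM_MODES modes;
    DWORD returned;
    /* Completed by VideoPrt/the miniport via their private contract,
     * not by the PnP machinery. */
    if (DeviceIoControl(h, IOCTL_VIDEO_QUERY_NUM_AVAIL_MODES,
                        NULL, 0, &modes, sizeof(modes), &returned, NULL)) {
        printf("%lu modes, %lu bytes each\n",
               modes.NumModes, modes.ModeInformationLength);
    }
    CloseHandle(h);
    return 0;
}
```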
The semi-sustainable solution is to have a fully virtual miniport/display-driver pair, teamed up with a user-mode renderer process.
The miniport would communicate with the worker process over an out-of-band (OOB) channel.
The worker process would then use traditional mechanisms to talk to the already-installed GPU to perform rasterization and rendering. Upon completion of rendering, the result would be shared back with the virtual display driver, and the miniport could complete the relevant operation that triggered the rasterization.
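To make the shape of that OOB channel concrete, here is a hedged sketch of the worker-process side using the classic inverted-call pattern. The device name (\\.\VDispCtl), the IOCTL codes, and the VDISP_WORK descriptor are all hypothetical; a real driver pair would define its own private contract in a shared header:

```c
#include <windows.h>
#include <winioctl.h>

/* Hypothetical private contract with the virtual miniport. */
#define IOCTL_VDISP_GET_WORK  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x800, \
                                       METHOD_BUFFERED, FILE_ANY_ACCESS)
#define IOCTL_VDISP_COMPLETE  CTL_CODE(FILE_DEVICE_UNKNOWN, 0x801, \
                                       METHOD_BUFFERED, FILE_ANY_ACCESS)

typedef struct {          /* hypothetical work descriptor */
    ULONG  OpCode;        /* blit, flip, present, ...      */
    ULONG  Sequence;      /* echoed back on completion     */
    RECT   Dirty;         /* region to rasterize           */
} VDISP_WORK;

int main(void)
{
    HANDLE h = CreateFileW(L"\\\\.\\VDispCtl", GENERIC_READ | GENERIC_WRITE,
                           0, NULL, OPEN_EXISTING, 0, NULL);
    if (h == INVALID_HANDLE_VALUE) return 1;

    for (;;) {
        VDISP_WORK work;
        DWORD got;
        /* Inverted call: the request pends in the miniport until it
         * has an operation for us, so no polling is needed. */
        if (!DeviceIoControl(h, IOCTL_VDISP_GET_WORK, NULL, 0,
                             &work, sizeof(work), &got, NULL))
            break;

        /* ... rasterize work.Dirty on the real GPU via D3D here ... */

        DeviceIoControl(h, IOCTL_VDISP_COMPLETE, &work.Sequence,
                        sizeof(work.Sequence), NULL, 0, &got, NULL);
    }
    CloseHandle(h);
    return 0;
}
```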
The major problems I can see with all of this are latency of operations, synchronization, and lock inversions.
Most of the rendering stack in Win32k is expected to be synchronous, top to bottom: the rendering happens while system-wide locks are held, and those locks will not be released until your operation completes. Since you cannot render on the real GPU while your virtual GPU is being used, you will have to fake successful completion.
That implies that you can never get a correct screen capture, because you have to report success while you are still deferring the work to the renderer process.
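As a sketch of where the lie happens, consider an XPDM display-driver entry point such as DrvCopyBits. The signature is the real one from winddi.h; the queueing helper is hypothetical, and this assumes a WDK display-driver build environment:

```c
#include <winddi.h>   /* XPDM display DDI types (WDK) */

/* Hypothetical helper implemented elsewhere in the virtual driver. */
extern VOID QueueWorkForRenderer(SURFOBJ *psoDst, SURFOBJ *psoSrc,
                                 CLIPOBJ *pco, XLATEOBJ *pxlo,
                                 RECTL *prclDst, POINTL *pptlSrc);

BOOL APIENTRY DrvCopyBits(SURFOBJ *psoDst, SURFOBJ *psoSrc, CLIPOBJ *pco,
                          XLATEOBJ *pxlo, RECTL *prclDst, POINTL *pptlSrc)
{
    /* Record the operation for the user-mode renderer... */
    QueueWorkForRenderer(psoDst, psoSrc, pco, pxlo, prclDst, pptlSrc);

    /* ...and lie: GDI holds its locks until we return, so we cannot
     * wait for the renderer here. Any surface GDI reads back is
     * therefore stale until the deferred work lands. */
    return TRUE;
}
```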
If you were ever tempted to issue DDraw/D3D rendering from kernel mode, forwarding from the virtual GPU to the real GPU: again, this has serious locking and re-entrancy implications.
I’m afraid that your CPU-based rendering is about as good as this exercise can get.