
From Andy's Bookshelf: So you Wanna Write a Video Driver


The world of NT device drivers seems to be a nice, orderly world, even though pinball is considered an Accessory.  Certainly, there are rules.  As espoused many times in this forum, if you follow the rules, your driver will work quite nicely.   Don’t follow the rules, and one will see why it’s not nice to fool with mother NT.  While for the most part, device drivers have a common form, fit and finish (e.g., DriverEntry, IRPs, Resources, Events, etc.), there are some notable exceptions.  Some drivers  (like file systems) involve extra semantics and rules that you have to abide by in order to present to the user a “unified” front.   Others have even been compartmentalized with the Port/Miniport architecture.  Plug-n-play will add more “features” to the world of the NT device drivers.


And then…there are the graphics device drivers.  These are, um, different.


Graphics device driver writers tend to take a large amount of flack, for many reasons other than their personality.  They are responsible (or are perceived as responsible) for system performance through the impatient eyes of the user.  Take a DEC Alpha 800MHz chip or a quad processor Merced, both of which are fast enough to invoke an instruction before you think about it.  If the graphics adapter on that machine is any less than cutting-edge and it takes some extra time to move the window around the screen, the entire machine is deemed to be “too slow”.  Finding that panning in Quake is jumpy?  Or does Laura lack a little of her usual luster?  How about some extra black lines across the screen?  I can hear the stampede of CPU/platform/system designers/graphics hardware designers/top scorers/marketing pointyhead-types heading toward your cubicle to tell you why this is your problem, not theirs.


Therefore, some education of the general NT device driver populace on the trials and tribulations of graphics device drivers is in order.  Hey, if you don’t feel sorry for us, you can always set your graphics adapter to VGA mode which works just fine and dandy with 16 colors and 640x480.


First, some graphics basics.


Let’s start with pixels, the itty-bitty dots on the screen.  The big trick here is to determine correctly, repeatedly and reliably, which ones to light with which colors.  A linear collection of these constitutes a scanline, while a  rectangular region of pixels constitutes a bitmap (which, yes, can be a collection of scanlines for you canonical-heads).  If we try to move a bitmap from one place to another, this is known as a bitblt (bitmap block transfer), spelled any number of ways, but all pronounced the same.  If you take a bitmap and “spread” it across another rectangular region, stretching the bitmap or stamping multiple copies of the bitmap, you have just performed texture mapping.  Congratulations!  You are now graphical buzzword competent.


Now, there are really 2 distinct graphics subsystems that reside on your workstation:  2D and 3D graphics.  The 2D stuff concerns itself mostly with actions like painting a region of pixels or getting a set of pixels from one place to another (as in moving a window or dialog box around the screen).  2D graphics tend to limit themselves to a simple, pixel (so that’s integer) coordinate system, so we only worry about moving/drawing things on pixel boundaries.  (Yes, there are irritating sub-rules for exactly which pixels are lit for any type of operation, and these can be a black hole for graphics hardware designers or a nightmare for graphics device driver writers if the hardware guys don’t get it quite right.)


Then there is the 3D beast.  Think games.  Think cool pictures.  Think complexity.  The deal here is that the graphics primitives are specified not in screen-type, pixel coordinates, but in 3D floating point x,y,z (z?).  That is, not an integer coordinate system, but one that really has very little to do with the screen.  So, that means that the 3D folks have to do some fairly complex floating point computations to map these 3D coordinates into the 2D pixel world in which the screen resides.  This mashing of coordinates is known as vertex transformation, the mashee being the said 3D coordinate vertices (usually describing a triangle or quadrilateral), and the masher being a thing called the 3D pipeline.  This 3D pipeline takes in a massive amount of state (as well as memory and compute cycles): the color of the primitive; the types, color and number of lights in the scene; the way that the primitive reacts to the light (how it reflects it); whether there’s any sort of texturing applied; and how much of the primitive is clipped.  The 3D pipeline takes all this information, mixes it up, and determines which pixels in the 2D screen array map to the 3D vertices and how to fill in the pixels in-between.  Of course, there are a number of ways to approach this, all of which are based on sound computer graphics principles, and all of which are discussed by religious zealots of all denominations at great length.  This 3D pipeline architectural argument, as well as the discussion of which part of the workstation (the CPU or the graphics adapter) does which parts of the 3D pipeline, is a constant source of debate in the architecture and design of the graphics adapter.  Add to these the time constraint imposed by the graphics industry to get a new graphics adapter out every 6 months, with an approximate doubling of speed every 12 months (usually in terms of texture mapped triangles per second).
All of these items are areas for other discussions, but it’s probably enough to show folks that it’s not a pretty sight (internally, that is).


So, What’s the Big Picture?


So, how does this stuff all work?  Let’s start at the top, when a user-mode program issues a Win32 graphics request, like Polyline, or “display the dialog box” (see Figure 1).  This request is a call into the Win32 GDI library (GDI32.DLL).  GDI32.DLL does a bunch of request and state checking, possibly returning information directly.  If GDI32.DLL decides that it cannot do the job itself, it transitions into kernel mode (via the interrupt gate on the i386) and calls into the system module win32k.sys.  This is what’s known as the kernel-mode Graphics Display driver, also known as GDI (yep, the same name, so it pays to know whether you’re talking to a user person or a kernel person).

Figure 1 - Overall Graphics Architecture

The basic design for the graphics subsystem was to segregate the graphics adapter specific functions from the device independent functions – the classic Device Independent/Device Dependent (DI/DD) functional split.  The GDI takes the graphics request, checks the Win32 graphics state, and then breaks the request up into smaller, simpler graphics requests.  These simpler requests are then sent to the device specific display driver, and it is this driver which interacts (through the HAL) with the graphics adapter to render the correct image.  Note that this breakup of functionality greatly facilitates the quick implementation of a graphics device driver: the DI/DD interface is well-defined, contains a set of graphics primitives and operations (line, paint, bitblt, etc.), and much of the Win32 state and graphics object management lives in the GDI.


Now, good students of NT device drivers may ask, “Where’s the IRP from the I/O Manager?”  Um, there isn’t one.  The graphics device driver is a very special kind of device driver, having its own interfaces and rules.  In fact, this graphics device driver world is a much more restrictive world than that of the usual NT device driver.  The relationship between the GDI and the graphics device driver is that of a synchronous call-return interface, and not an asynchronous, packet interface.  The motivation here is that the graphics requests must be done atomically, in order and serially, so that the user has the correct graphical feedback that will determine the next user operation: continue to move a window, click on a box, wash the car, generate a blue screen, etc.


But it’s not quite all that simple.  The device specific display driver actually consists of two separate drivers: a Device Display Interface (or DDI) driver and a Video Miniport driver (see Figure 2 - DDI and Miniport Drivers).  The DDI is a kernel mode DLL which is responsible for all rendering activities, and really only deals with rendering functionality.  That is, the DDI receives graphics requests from the Win32 subsystem and then interfaces with the video display hardware to produce the correct graphical representation.  The Video Miniport driver is a kernel mode driver (well, almost) which is responsible for the non-rendering tasks required by a DDI for a particular graphics adapter.   For instance, tasks such as graphics adapter initialization, or mapping of adapter registers, or allocation of resources.   Both of these drivers are a matched pair and together are considered a device specific display driver.


Figure 2 - DDI and Miniport Drivers


Video Port/Miniport Drivers


Video Miniport drivers have some of the same characteristics as other miniport drivers in Windows NT.  In fact, folks who have worked with the SCSI port/miniport architecture will see some quite striking similarities, not only in the form, but in the actual interface between the Video Port driver and the Video Miniport driver.  The Video Miniport drivers are “wrapped” by a higher level driver (the Video Port driver), and the miniport driver (if written correctly) is compatible across Windows NT platform architectures.  Please note, however, that the Video Miniport driver for Windows NT is not compatible with the Video Miniport drivers for Windows 9x platforms.


The Video Port driver (videoprt.sys) is implemented by Microsoft as a standard NT kernel mode driver. This port/miniport design is used to insulate the graphics device driver from the usual NT executive interfaces, such as IRPs, DriverEntry(…), etc.  Because many of these interfaces and mechanisms are the same for all graphics devices, the common processing has been pooled in this port driver so that the graphics device miniport driver writer can concentrate more on implementation of the graphics device.  There exists a Video Port driver export library (videoprt.lib) to which the Video Miniport driver links.


When the Video Port driver is loaded, Windows NT will invoke its DriverEntry(…) routine which will perform the normal device driver initialization (routine registration, object creation, etc.).  Once loaded, the Video Port driver then queries the registry for the available video services, and, when found, the Video Miniport driver for that service will then be loaded. 


Because the Video Miniport driver is wrapped by the Video Port driver, its format and interface are much simpler than those of standard kernel mode device drivers.  As a result, Video Miniport drivers are typically shorter and simpler than an equivalent standard kernel mode driver.  Additionally, the standard structure of a Windows NT kernel mode driver does not apply to Video Miniport drivers.  Specifically, the Video Port/Video Miniport interface is a call-return interface, and therefore all access to a graphics adapter is serialized.  There is also a per-adapter resource lock which the Video Port driver must obtain before any request is submitted to the Video Miniport driver.  This imposes a single-threaded access paradigm for all Video Miniport requests and thus, single-threaded access to the graphics adapter itself.


Like typical drivers in Windows NT, the Video Miniport drivers start with a DriverEntry(…) routine that is called when the driver is loaded.  However, this is where the similarities between the Video Miniport drivers and normal kernel mode drivers end.  The initialization consists of a number of sequential handshake calls between the Video Miniport and the Video Port driver.  The Video Miniport driver allocates device specific storage areas, sets function entry points (to be called by the Video Port driver), scans the buses looking for known graphics devices, chats to the registry, and connects to interrupts, among other typical initialization activities.  This activity culminates in the return from the Video Miniport’s DriverEntry(…) routine and, if successful, leaves the graphics device fully initialized and ready to receive non-rendering graphics adapter requests (typically IOCTLs).


Because the Video Miniport driver is a wrapped driver, its interface to the I/O Manager is controlled by the Video Port driver.  Specifically, the Video Miniport driver does not see IRPs.  Instead, the Video Port driver will repackage IRPs into Video Request Packets (VRPs).  These VRPs are received by the Video Miniport driver in the Video Miniport’s StartIO(…) routine (specified during initialization).


There are a large number of HAL-ish routines exported by the Video Port driver as the interface which the Video Miniport driver is to use when talking directly to the graphics hardware.  This ensures the HAL portability paradigm across NT architectures (e.g., Don’t bypass the HAL).  All of these routines are prefixed by the VideoPort moniker.  However, because the Video Miniport is similar to a standard NT device driver, all of the NT kernel executive routines are callable from it (even though the Video Miniport rules stipulate that developers should restrict themselves to only those routines exported by the Video Port driver).  Thus, normal NT driver-type activities (like DMA) can occur within the Video Miniport, as long as the requisite NT-aware care is taken when using those interfaces and mechanisms.


Video Display Driver


The architecture for the kernel mode graphics rendering subsystem for Windows NT is notably different from the other kernel mode subsystems.  As mentioned previously, it consists of a device independent driver, known as the kernel-mode GDI, which receives user-mode graphics requests and determines how these are to be rendered upon the display device.  GDI will then make graphics requests to the resident device specific graphics rendering driver (DDI).


For the display driver, the folks at Microsoft chose a similar “wrapped” architecture, a bit like the Video Miniport driver, where the “port” driver piece is the GDI and the “miniport” piece is the DDI.  Beyond this notion of the architectural division of labor, any similarities with the typical NT port/miniport disappear.  In fact, the initial entry point for the DDI is not called DriverEntry(…), there is no StartIO(…) routine, and the DDI does not handle anything like IRPs/VRPs.  The GDI/DDI interface is predicated on the call-return model, with the added notion that upon return from any DDI function, the display has been correctly updated and is ready for the next graphics request.


There may be a number of reasons for this type of architecture in the Win32 graphics subsystem.  Among them is that there really is only one graphics subsystem in the NT environment, and this coordinates and serializes all graphics activity in a single subsystem.  Additionally, the Win32 graphics interface is quite large and complex, containing a magnificent number of graphics and window primitives, ranging from simple lines to complex dialog boxes to mip-mapped, 5 light sourced triangles.  Since most of the windowing and menu primitives are composed of some number of simpler 2D graphics primitives, the device independent GDI layer can present a consistent interface to the underlying graphics options.  This architecture allows a reduction in the number of graphics primitives that a DDI (and also the underlying graphics adapter) must implement, and therefore reduces the complexity of the supporting graphics software and adapter.  Additionally, GDI handles all Win32 graphics state management which is not dependent upon graphics display characteristics (e.g., Win32 primitive composition, display resolution, color bits per pixel, graphics memory, etc.), and determines, in a consistent manner, the graphical primitives that are to be rendered.  This frees the DDI to concentrate on only the rendering aspects of the specific graphics adapter for a known set of graphic rendering primitives.


Another major architectural reason for this wrapper architecture was to keep the DDI from having to deal directly with the NT executive.  This may or may not have come about because of the transition of GDI from user-mode (in NT 3.51 and before) to kernel mode (NT 4.0 and after).  This move had the effect of freeing the graphics device implementers from having to figure out how to interface with all of the NT executive components (like the I/O Manager) and enabled them to concentrate on the details of rendering the correct pixels to the display device.  However, since the interface with NT is completely specified by GDI, the DDI has no access to any NT kernel executive functions, except for those exported by the GDI.  In fact, building a DDI requires a special library and a special TARGETTYPE.  Additionally, when a DDI is loaded, a check is made to see if the DDI was linked against anything other than GDI.  If it was, the GDI will refuse to load the DDI.  Note that if the load fails, there is no failure or warning message (even on the checked build).  You’ll know that your DDI didn’t load because the system will load the fallback VGA driver instead.


GDI is implemented as a kernel mode driver (win32k.sys), and there exists an export library (win32k.lib) against which the DDI links.  During boot, GDI is loaded after the Video Port and Video Miniport drivers have been loaded, at which point it will run through its initialization and then load the DDI associated with the Video Miniport.  During initialization, the DDI will generate a table of index/function entry point pairs.  This table informs the GDI of the GDI primitives that the DDI can render.  Additionally, the DDI will allocate and initialize a per-graphics adapter state block.


How Do We Turn This Thing On?


So we want to draw something.  All Win32 drawing/rendering requests are received by the GDI and are broken down into one or more DDI rendering requests.  Most GDI request parameters consist of a target bitmap (or “surface”), the primitive (e.g., line, rectangle, etc.), a list of rectangles into which the drawing is to occur  (known as the “clip list”) and a raster operation (the bit-wise operation to be performed on the destination pixel).  The actual parameter list is specific to the graphics function and may contain more GDI graphics objects.  The DDI graphics function must then traverse the clip list for each clip rectangle and perform the rendering operation, returning either TRUE (if successful) or FALSE.  If FALSE is returned, an error is logged, but more importantly, this means that the graphics request did not complete and therefore the display will be missing the graphical update (i.e., the desktop display may be “corrupted”).  Importantly, there is no “automatic” fallback if the display operation fails and FALSE is returned.  Bummer.


So, it sounds pretty straightforward.  The DDI registers with the GDI, telling the GDI what it can render; the GDI then takes Win32 requests, breaks them apart and calls the DDI with the pieces that the DDI knows how to render.  There’s only a small fly in this oatmeal, and that’s the number of combinations of pixel formats that the requests can take.  The pixel format is the number of bits of a pixel used to determine the “color” of the pixel.  Unfortunately, this can take many forms: one bit/pixel, which is monochromatic; four bits/pixel, a holdover from the olden days; eight bits/pixel, also known as pseudocolor; 16 bits/pixel; or 24 bits/pixel, known as True Color or “millions of colors”, among others.  But, more maddening, even for a given number of bits per pixel, the bits may be formatted in a number of ways.  For example, the 16 bits/pixel may be 16 bits of greyscale, a 4,4,4 color cube with 3 “extra” bits (see 3D), or a 5,5,5,1 color cube (those 3D guys…).  Even without the data explosion of the actual bit formats, the killer is that if your DDI says that it will render a primitive, it must render that primitive for all pixel formats in any combination.  That means that the DDI code must know how to interpret and use all pixel formats, both for input pixels and output pixels.  Oh, and let’s not forget that the resultant pixel may have to be combined with the destination pixel in some sort of arithmetic (raster) operation, and the hardware guys forgot to put it into the adapter.  And these are NOT limited to the usual 16 suspects.  In fact, GDI has a whole truckload.  Yuck.


So, what’s a DDI writer to do?   Return FALSE and don’t update the screen?  That’s a little harsh.  Well, fortunately for the sanity of the DDI implementers, there is an “out” for the cases where the requested operation may be too difficult for the DDI to perform.  The very astute reader might realize that during initialization, the DDI had informed the GDI what graphics primitives it could render.  What if the DDI couldn’t render anything?  That implies that the GDI must have capabilities to perform any graphic operation with any set of GDI operators internally, with a software renderer.  In fact, the GDI can render all graphics operations, regardless of the diversity and complexity of the graphics request parameters.  Thus, on such a difficult graphics request, the DDI can call back into the GDI and have the GDI’s Graphics Rendering Engine (GRE) perform the graphics request.  So, the DDI generates a temporary bitmap, copies the contents of the screen onto the bitmap and calls into the GDI to render onto this temporary bitmap.  After the GDI has done the rendering, the DDI will then copy the contents of the temporary bitmap to the screen and complete the operation by returning TRUE.  In DDI/GDI parlance, this technique is known as punting.  With this technique, all graphics operations can be realized by the DDI and there will not be any lost graphics, and thus, no corrupted displays.  This also decreases the burden upon the DDI to support all possible variations of the GDI’s graphics requests either in the graphics adapter itself or in DDI software emulation.  Of course, the downside is that, since this is done in software, performance is an issue.  In fact,  most implementations will make determinations of the most frequent requests and their parameters and either generate very fast local code or implement the operation on the graphics adapter itself  (gee, I wonder if some NT consulting and seminar teaching firm has such a tool?).   
On a side note, the GDI does have a notion of “hooking” in DDI, but this I leave for another time.


An interesting aspect of DDI drivers is that the DDI can optionally specify whether or not GDI’s access to the DDI is single-threaded.  By default, GDI allows multiple threads to access different bitmaps simultaneously, but constrains access so that each individual bitmap is only ever touched by one thread at a time.  The DDI can further tighten this constraint by specifying that all accesses to all bitmaps are single-threaded, effectively making GDI’s overall access to the DDI single-threaded.  Hey, who needs all of those Resources and ExAcquireResource and spinlocks and stuff?  We’re the only graphics subsystem in NT anyway.  We can be single-threaded!


This single-threaded mode may be desired for some graphics adapters that process graphic requests at a rate different from that of the host CPU.  Specifically, it is assumed that upon return from the graphics request, the DDI has completed the entire rendering activity and the contents of the display (or bitmap) is complete and available for the next request.  This has obvious ramifications if the graphics adapter is still updating an area of the display while another GDI thread is accessing that same display area.  So, there are still situations where the DDI folks need to worry about synchronization with the screen and GDI, and, sure enough, there are some GDI/DDI interfaces to handle that situation.


So, we’re chugging along, painting pixels on our screen with the GDI/DDI interface.  But one should take pause to wonder.  This is a synchronous, call-return interface.  Aren’t there other techniques to get data to the graphics adapter that might be a little bit faster?  Sure, don’t all other types of devices have DMA?  Sure, why not buffer up graphics data into a known buffer (known between the user and kernel) and then fill it with graphics commands and then blast that data directly to the graphics adapter via DMA?


Well, if you recall, the video display driver architecture is a “wrapped” architecture.  In fact, it’s quite a restrictive architecture, in that the DDI can only call routines exported from GDI (win32k.sys) and is prevented from calling anything outside of these exports (like IoAllocateAdapterChannel(…)).  But wait, can’t the Video Miniport make NT executive calls?  It’s bending the rules a bit; wrappers, by rule, should not call routines outside of their wrapped interfaces.  But since the DDI has this hard restriction (enforced by the loader, remember), and since the Video Miniport driver does not have this same restriction, it seems like we have a way out.  So, while the Video Port driver does not currently (as of Windows NT 4.0 SP3) contain direct support for DMA functionality (e.g., VideoPort versions of memory functions, event functions), the Video Miniport driver may use the various NT executive routines to perform the necessary ancillary activities to enable DMA.  In addition, the DDI can communicate with the Video Miniport via IOCTLs (EngDeviceIoControl(…), as exported from the GDI).  Given these basic tools, DMA functionality can be supported in a graphics device driver subsystem.


A Word About the Active Stuff


Under the “Active” umbrella, there is a user mode facility called ActiveMovie.  This, of course, lets folks get CNN video clips of some pie throwing incident displayed on their workstation.  This is actually implemented as bitblts (one for each frame of video) onto the screen.  However, using the bitblt functionality of GDI is not sufficient for such streaming video, because the GDI bitblt is a synchronous call-return interface.  So, for each frame you must copy the bits to memory, make the request (which copies the data to the screen), and then wait for the bitblt to complete before control returns back to the requester to send down the next frame.  This, of course, can result in choppy animation, even on machines that might have a very fast funnel from host memory to the graphics adapter.  So, a faster bitblt engine (actually, a faster bitblt interface) had to be created so that frame production would not be bottlenecked by synchronous transactions.  Thus (well, partly), the “Active” stuff was born.  Internally, this takes on the form of “Direct” stuff.  For facilitation of fast bitblt display, this is known as DirectDraw.


The motivation behind DirectDraw was that it was faster to move bitmaps from host memory into the graphics adapter than it was to get those bitmaps onto the screen.  Thus, one could tell the adapter to bitblt a bitmap onto the screen and, while the bitblt was taking place, load subsequent “to-be-displayed” bitmaps into the graphics adapter.  This removes the synchronous bottleneck of waiting for one bitblt to finish before starting the next, and allows the graphics adapter to bitblt at its own pace without interfering with the loading of bitmaps into the graphics adapter memory (assuming that the host CPU can push down the bitmaps very quickly).


As for the implementation, since this is a Microsoft gizmo, there is a complete emulation layer of DirectDraw contained in the GDI, so that graphics adapters that do not support this functionality have something to fall back upon.  This turns out to be a boon for graphics device driver writers, since they can initially fall back to this software emulation and then, later in the development cycle, go back and wire the DirectDraw calls directly to their graphics adapter without losing user-mode functionality.


There’s only a small beetle in this porridge.  The (once again) astute reader may perceive that we’re now accessing the graphics adapter hardware asynchronously.  In fact, while the graphics adapter engine is updating the screen (and very possibly some memory on the device), we may be loading the next bitmap frame into that same memory at the same time.  Two things: 

·         The graphics adapter memory architecture better be able to handle multiple threads accessing its graphics memory simultaneously.  All of a sudden, while our DDI may still have the notion of being single-threaded, DirectDraw doesn’t have this notion and we have multithreaded access to hardware resources.

·         An underlying principle behind DirectDraw access is that the user-mode program can get a memory address that points to the first byte of the bitmap.  Typically this is for the next frame, so the memory on the graphics device may need to be accessible from user-mode.  Sure, you can perform the requisite memory management tricks to get this to work correctly, but there is the additional assumption that the memory is linearly addressable (a starting address and a number of bytes per scanline, or “stride”).  For simple (read: older) graphics devices, this may not have been a problem.  But with newer graphics modules, the memory architecture may not be very “simple” (since we may have a lot of other memory access requirements on the graphics board to support 3D functionality) and therefore may not lend itself easily to a linear framebuffer.

So, You Want Pretty Pictures?


So, those are the basic 2D interfaces.  What about the cool, snazzy looking 3D stuff?  Sure, but first a word from the OSR Graphics History department about graphics APIs.


First, there was Win32, and it did all of the 2D stuff just fine and dandy.  Word and PowerPoint and Excel worked just great, and there wasn’t anything else in the world that had to run, right?  But there were folks in the other graphics world (really, the UN*X workstation world) that wanted to do 3D graphics.  In this UN*X world, much activity had been devoted to the problem of 3D APIs since the dawn of Sketchpad.  3D APIs came and went in 5-7 year cycles (and it seems that this is still the case – you have been warned).  For the dawning of 3D on Windows NT, the question was whether to expand the Win32 graphics API, create a new API, or use the current flavor of the day.  The current “decision” for 3D APIs for NT is in the form of Direct3D (from the “Direct” umbrella of Active stuff) and OpenGL, the former being something grown at Microsoft, the latter being the current “open” industry standard, based on the GL API of Silicon Graphics.  Neither of these APIs is part of the base Win32 graphics API architecture; both are separate entities, requiring separate major functional modules to be implemented.  Both have pros, cons and associated religious baggage, none of which we’re willing to speak about unless threatened by the Independent Counsel.  However, it suffices to say that the Microsoft party-line is that Direct3D (speed over precision) is geared towards consumer applications (e.g., games) and OpenGL (precision and repeatability over speed) is geared towards technical graphics (e.g., CAD-CAM, computer animation/simulation, etc.).  And, we’ll add, that the line separating these two camps is as clear as the morning fog on the Golden Gate Bridge.


In either case, the two 3D subsystems share some general architectural traits and problems.  Both have a device-independent/device-dependent structure, in an attempt to remove some of the burden of software emulation and to facilitate the device-specific display driver implementation.  In the case of Direct3D under NT 4.0, the device-dependent interface, while not officially supported, is still undergoing evolutionary changes.  In the case of OpenGL, an older interface, there is a fairly straightforward device-dependent interface, which was only recently made available through Microsoft.  However, one of the major generic problems with 3D architectures (Direct3D and OpenGL included) is that this device-independent/device-dependent dividing line is difficult to identify and difficult to maintain while graphics technology evolves at its current rapid pace.


One problem is that the 3D pipeline consists of a number of almost discrete stages: transforming vertices (a number of times), computing the lighting and shading characteristics, clipping the primitives, applying textures, and determining the color values for all pixels contained in a primitive.  Any of these stages can be done in hardware: individually, in certain combinations, or all together.  The complete hardware solutions tend to be very expensive (some on the order of six digits just for the adapter), but the graphics are very fast and pretty cool.  There are solutions where only certain stages are implemented in silicon, and every hardware implementation has its own idea of which stages to implement and what the interfaces should be.  Obviously, this doesn’t lend itself to a truly device-independent solution.  In fact, the graphics adapter architecture (and the interface to the graphics adapter) tends to migrate up toward the top of the 3D pipeline.  Add to this the memory requirements of the 3D pipeline, which must store and access large numbers of vertices (depending upon the primitive) and large amounts of state (with an incredible explosion of state combinations), and which needs large, fast access to memory.  And then, because performance is the name of the game (have to make those brick hallways look nice while we’re running down them), certain combinations of 3D graphics state and primitives are known to be used frequently.  These combinations may be very well suited for “compression” of the general 3D pipeline into a specific solution implemented directly in hardware (so, yet another interface).  These are some of the reasons why a number of graphics adapter manufacturers discard the DI/DD structure supplied by Microsoft (for OpenGL) and implement their own entire OpenGL subsystem.
Such a task is a very large engineering effort and takes a tremendous amount of time and expertise, but it allows the greatest flexibility of implementation and, ultimately, the greatest control over performance.  In the case of Direct3D, the jury is still out and deliberating on exactly what the interface should or could be, but it can be seen that Direct3D will have the same problems and issues to deal with.
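To make the “discrete stages” concrete, here is a toy sketch of the very first one: transforming a homogeneous vertex by a 4x4 matrix (column-major, as OpenGL uses).  Everything here, names included, is illustrative and not any vendor’s DDI:

```c
/* One stage of a toy 3D pipeline: transform a homogeneous vertex by a
 * 4x4 column-major matrix.  A real pipeline chains several such stages
 * (transform, lighting, clipping, texturing, rasterization), and
 * hardware may absorb any subset of them. */
typedef struct { float x, y, z, w; } Vec4;

static Vec4 transform(const float m[16], Vec4 v)
{
    Vec4 r;
    r.x = m[0] * v.x + m[4] * v.y + m[8]  * v.z + m[12] * v.w;
    r.y = m[1] * v.x + m[5] * v.y + m[9]  * v.z + m[13] * v.w;
    r.z = m[2] * v.x + m[6] * v.y + m[10] * v.z + m[14] * v.w;
    r.w = m[3] * v.x + m[7] * v.y + m[11] * v.z + m[15] * v.w;
    return r;
}
```

Where a vendor draws the line, and which of these stages land in silicon, is exactly the DI/DD boundary problem described above.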


Add to this the fact that the general 3D device interface has been changed by market forces.  The case in point is texture mapping.  In the days before Doom, the emphasis was on shaded primitives (triangles, quadrilaterals, polygons, take your pick).  But the folks at Id (and others) showed how texture mapping could be done quickly, easily, and cheaply (you’ll notice that in Doom you can’t look up or down; that’s because of the restrictions of the texture mapping algorithm).  So the texture mapping stages and interfaces became much more important.  Now texture mapping is a basic part of all interfaces to graphics engines, software and hardware.  (Wonder why AGP was made?  Texture mapping needs a lot of memory and very fast access to it, and AGP provides fast access to host memory without crossing the system bus.)  Figure there’s going to be more to come and more “interfaces” to deal with.  Sigh.
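The look-up/down restriction comes from affine texture mapping: interpolating a texture coordinate linearly in screen space is only exact when depth is constant across the span, which Doom guarantees by keeping walls vertical and floors flat.  A hedged sketch of the difference, with illustrative names (`affine_u`, `perspective_u`):

```c
/* Affine interpolation of a texture coordinate u across a span:
 * cheap, but only correct when depth is constant along the span. */
static float affine_u(float u0, float u1, float t)
{
    return u0 + t * (u1 - u0);
}

/* Perspective-correct interpolation: interpolate u/z and 1/z
 * linearly in screen space, then divide back per pixel. */
static float perspective_u(float u0, float z0, float u1, float z1, float t)
{
    float uoz = u0 / z0 + t * (u1 / z1 - u0 / z0);
    float ooz = 1.0f / z0 + t * (1.0f / z1 - 1.0f / z0);
    return uoz / ooz;
}
```

When z0 equals z1 the two agree exactly; when depth varies, the affine version visibly warps the texture, which is why general-purpose engines (and hardware) pay for the per-pixel divide.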




So, that’s the deal with the current state of 2D and 3D graphics on your box.  What’s coming up?


In the 2D space, for NT 5.0, it looks like there are a number of goodies that we think will show up (note that we’ve been burned before, so reader beware).  Among the things we’ve seen/heard about: 

·         Multihead monitor support.  This used to be in 3.51 (sort of) and some manufacturers even delivered multihead drivers for 4.0, but this type of configuration was not officially supported by Microsoft.  The buzz is that this will be supported in 5.0 with all of the virtual desktop stuff as well.  Now, if you wrote your display driver correctly and according to all the rules, then support for multihead is supposed to be very simple and straightforward.

·         True floating point support looks like it might exist.  The problem here was that the floating point registers (for non-3D requests) were not saved, so things got a little dicey.

·         DMA seems to be more fully supported in the miniport driver as well as the display driver.  There seem to be DMA interfaces for the miniport, and event primitives exposed through the DDI.

·         File IO (well, mapped file IO) seems to have made a reappearance after it disappeared from 3.51.  This seems to be support for texture mapping and DirectDraw.

·         Hydra.  This allows multiple users to hook into a single workstation/server and get graphics, mouse events, and keyboard events remotely.  While this doesn’t have anything to do directly with graphics device drivers (as far as I know), it’s a neat (long-awaited) piece which was previously shipped as a product by Citrix.

What else?  Well, recall that back in January, Microsoft and SGI reached an agreement that put the Direct3D vs. OpenGL controversy into a kind of stasis.  More interestingly, they announced that they will jointly design yet-another-3D-API, called Fahrenheit.  Looking at my 3D API watch, it seems that it’s just about on time.


So, that’s the deal on video device drivers.  They live in a controlled suburb of the NT device driver community and have a whole bunch of extra rules and regulations applied to them.  But, they also own the only horse in town, so everyone has to go through them to get anywhere.  Have pity on your graphics device driver guys.  We do the stuff you use all the time.


And remember – you can always switch back to VGA mode.  So, there.


Related Articles
From Andy's Bookshelf: Loading Video Drivers, a Mystery Solved
From Andy's Bookshelf: Floating Point Triage
Loading DLLs for Graphics Drivers
From Andy's Bookshelf: WinDbg Extension for GDI
From Andy's Bookshelf: Video Drivers and the Registry

User Comments

"VideoMiniport and SCSI MIniport: Not Same Story!"
I want further clarification on this comment: "There is also a per-adapter resource lock which the Video Port driver must obtain before any request is submitted to the Video Miniport driver. This imposes a single-threaded access paradigm for all Video Miniport requests and thus, single-threaded access to the graphics adapter itself."  The author compares this structure with the SCSI driver mechanism; but first of all, the wrapper module in the SCSI interface raises the IRQL during access to the miniport module (during StartIo).  This is not true for VideoPort: StartIo requests in the Video Miniport are serviced at PASSIVE_LEVEL.  Secondly, in the video miniport it is true that the VideoPort module will impose single-threaded access to the Video Miniport from the OS, but hardware interrupts can come in during the servicing of any threads from upper levels.  I also find that there is not necessarily single-threaded access into our graphics hardware; VideoPort does not guarantee single-threaded access to the graphics adapter, as observed on machines with hyperthreading / multi-CPU systems.  However, I am a very new video driver developer, and if I am sadly mistaken I would appreciate it if my presumptions were corrected.  Thanks!

08-Oct-03, Pallavi Sharma Prasad
