I was remembering a time, long ago, when I told a friend about some fascinating computations. He was an integer kind of guy. Knew and loved integers. And he was incredulous to find that I often dealt with floating point numbers.
"Why would anyone want to do that?" he would constantly ask me. "For graphics, all you need is a good line, fill and bitblt engine, and in return you get all the integer coordinates you would ever want to blast away at the pixels".
Needless to say, this conversation took place in the dark ages of computer graphics, a distant time that’s much too embarrassing for me to divulge. My friend still (only sometimes now) wonders aloud why someone would want to use floating point. I gently remind him that all his neat games with really cool graphics might just require these operations, either to render the wall he’s shooting at or use some special Intel-ish, MMX-ish instructions to display the movie that he’s shooting at.
So, how does one use floating point computations in the kernel? Well, we know it’s certainly possible. Figure that things like Direct3D and OpenGL and DirectDraw all need to do floating point computations at some time. But if you’re a pure 2D graphics device driver or just any old NT device driver and you’re not in the middle of some sort of 3D or DirectDraw rendering request, then you’ll find that blindly performing floating point operations may not come out just right.
Why? You see, in The Beginning, someone in kernel-land rightfully noted that the operation of saving and restoring floating point registers takes some non-zero amount of time. Since all kernel activity (at the time) was done with integers, they questioned why the extra work was necessary. Let’s save some cycles! After all, graphics was done in user mode anyway, and folks could use all the floating point they wanted up there. Then, along came NT 4.0 (GDI moves into the kernel), and we have these annoying 3D requests. Well, through the grace of GDI, it notices when it has a 3D request and "turns on" floating point for the request, saving and restoring the floating point registers like one would expect. In fact, the 3D requests are so special that they even get more stack space, and not the measly 12Kbytes that DDI folks get. So, if the request is anything but these more equal 3D citizens, that request is out of luck.
So, what’s a graphics device driver writer to do? Well, the DDK offers the FLOATOBJ construct to allow DDI writers to perform floating point operations. In fact, an entire set of operators exist to do most, if not all, floating point operations. Fully supported and fully functional. What a deal. But, a routine call for each floating point operation? Sure hope we’re not in the middle of any 4x4 transform/clip/lighting stages or some sort of extra cool FFT.
Walk on the Wild Side: Figure, who’s gonna really alter the values of the floating point registers while my driver is doing its computation? As for those pesky applications, I guess if they're doing floating point operations, I might have to save the floating point registers (Um, just how's that done, Clem?).
Anyway, this computation is important to my graphics driver and it’s only a couple of multiplies and adds. And, the chip is running at a zillion Hz anyway, so it’ll be done in plenty of time before the next context swi…
Live with it: Use the FLOATOBJ constructs. They’re fully documented. They work. So there.
Escape to the Promised Land: Have the client issue a special GL escape and do your operation in the context of the OpenGL driver. This requires a lot of coordination between your driver and the OpenGL driver. Maybe too much coordination.
Back to Basics: Use fixed point arithmetic. This is actually quite realistic, although it might require you to dust off your old EE books. Take a look at exactly what algorithm or function you’re trying to provide. If you do not require infinite (or almost infinite) precision and/or the values are bounded to some reasonable 32-bit (or even 64-bit) value, you can use integer arithmetic, with the low-order bits being your precision. Since all operations are now done with integers, this may actually prove to be faster than floating point.
Use a Very Big Hammer: The thought behind this approach is to work around the problem that the floating point registers are not saved during context switches. Ok, so you become the big man on the block. Raise your IRQL to HIGH_LEVEL, perform your floating point operations, and then return back to your original IRQL (PASSIVE_LEVEL for the DDI). Of course, there are some minor things to watch out for. Like, you better hope there are no floating point exceptions in your code. And, like, certainly no other thread in the entire system would be worried that the stuff in the floating point registers might get some newer and better values when they get back in (hey, Clem, just how do you save and restore those floating point registers?).
And, like, just how do you raise your IRQL from DDI since you can’t call KeRaiseIrql(…)? And, sure hope no one (or anything) on the system wants to do anything during this time.
Use an Expensive Hammer: There are existing retail packages that will perform the floating point operations in kernel mode. Exactly how this is done is probably a secret that only they know for sure. However, from some of the versions that I’ve seen, the exposed interfaces are procedural. Normally (in my thinking anyway), the deal with floating point operations is that they are time-critical and not something that one wants to call a routine for every operation. However, since I don’t know exactly how they’re implemented (maybe they’re MACRO wrappers? Maybe not.), it’s tough for me to really comment about them.
Use a Different Hammer: If one had come from a company that had just been recently bought out by Compaq and had used that company’s 64-bit architectures, one might say, "What’s the big deal with using floating point?" On DEC Alpha platforms, one would notice that, hey, floating point just plain works. And works just fine and dandy. That’s because the floating point registers are saved and restored as part of all context switches on Alpha platforms. All floating point, all the time. However, cross-platform code integration might prove to be a little tricky.
Wait for the Promised Land: Peeking ahead into the NT 5.0 Beta 1 DDK (distributed with the MSDN January 1998 release), one spies a couple of routines which look sort of interesting: EngSaveFloatingPointState(…) and EngRestoreFloatingPointState(…). Aha! Be still my floating point heart. In fact, the documentation notes that this is precisely the thing that folks want to use for the MMX instructions on Intel platforms. What a deal! Only a couple of problems with this:
- It’s in NT 5.0
- Even though it’s documented in the Beta 1 DDK release, we all know that nothing is final until the fat lady delivers the NT Final DDK MSDN set to our cubicle
- It’s in NT 5.0.
Oh, one other (floating) point (sorry). If you come from the UN*X world, you may have relied on stack-based exception handling. That is, you register an exception handler and, if, horror of horrors, your code gets forced into some floating point no-no, your exception handler gets called and you can do some controlled recovery. However, on NT, there is no such mechanism. In fact, the only exception handling that one can properly employ is try/except handlers around sections of code. This is the price to pay for doing NT, so it may be best to do as much floating point triage as possible before executing a section of code.