Unless you've been living under a rock, you've heard that AMD has introduced a new 64-bit processor. Code named the "Hammer" family of processors, the AMD Opteron and Athlon-64 are poised to kick the normally evolution-prone world of x86 computing into revolution. This article provides a brief introduction to some of the architectural highlights of the Hammer family, and the forthcoming support for the AMD64 on 64-bit Windows systems.
It’s Compatible and Fast
The AMD-64 can operate in one of two modes: Legacy Mode or Long Mode.
In Legacy Mode, the AMD-64 appears for all intents and purposes like a standard 32-bit x86 system. It runs ordinary versions of Windows. In this mode, it uses a 32-bit operating system and standard 32-bit drivers. It runs 32-bit applications unchanged. From our brief tests, the AMD-64 appeared to run in Legacy Mode as fast as any other modern system running the same software.
Of course, a fast new 32-bit processor really isn’t big news these days. What makes the AMD-64 interesting, to driver writers and users alike, is its 64-bit support.
Still Compatible and Still Faster: Long Mode
It’s in Long Mode that the AMD-64 really shines. In Long Mode, the AMD-64 utilizes a 64-bit operating system and 64-bit drivers. Standard 32-bit programs can be run without any recompilation or change using the AMD-64’s Compatibility sub-mode. In this mode, 32-bit programs execute without translation or any sort of weird emulation. And, guess what? 32-bit programs run in Compatibility sub-mode with performance that’s as good as a high-end 32-bit system.
Of course, it’s Long Mode’s 64-bit sub-mode that we’re most interested in. The AMD-64 supports the standard x86 instructions (with the exception of a handful of op-codes that were almost never used anyhow), and extends them to properly support 64-bit native operation.
Rarely do we find a system where so many things seem to have been done right. What’s irritated you over the years about the x86 architecture? Chances are it’s been fixed in the AMD-64, running in Long Mode’s 64-bit sub-mode.
One thing that’s long irritated me about the x86 is the paucity of general purpose registers. The AMD-64 fixes that by adding eight new general purpose registers, numbered R8 through R15. Of course, the entire register set has been lengthened to 64 bits. The full 64-bit width of each register is accessed by using the “R” prefix. So, for example, as is traditional, AX is a 16-bit register and EAX is the 32-bit designation for this same register. The 64-bit version of this register is accessed by specifying RAX.
The new registers, numbered R8 through R15, can also use explicit width designators. For example, R8W designates the low 16 bits of R8, and the 32-bit designation for that same register is R8D. Sweet, huh?
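If the naming scheme is easier to see in code, here is a minimal sketch in portable C. It only models the narrower names as views of the low bits of one 64-bit value; the function names are made up for illustration and nothing here touches a real register.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: treat a uint64_t as the full 64-bit register
   (RAX, or R8) and return the part that each narrower name refers to. */

/* Low 32 bits: what EAX (or R8D) names. */
uint32_t low32_view(uint64_t reg64)
{
    return (uint32_t)(reg64 & 0xFFFFFFFFu);
}

/* Low 16 bits: what AX (or R8W) names. */
uint16_t low16_view(uint64_t reg64)
{
    return (uint16_t)(reg64 & 0xFFFFu);
}

/* Sanity check that the narrower view is always contained in the wider one. */
void check_views_nest(uint64_t reg64)
{
    assert(low16_view(reg64) == (uint16_t)(low32_view(reg64) & 0xFFFFu));
}
```

So for a register holding 0x1122334455667788, the 32-bit name refers to 0x55667788 and the 16-bit name to 0x7788.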
How about those annoying segment registers? The AMD-64 does away with them when running native 64-bit code. In their place, flat addressing is used.
Perhaps the cumbersome X87 floating point instructions and register set are your hot button. Well, to maintain compatibility for existing 32-bit programs, the AMD-64 includes those old X87 registers and indeed supports the X87 FP instruction set. But 64-bit versions of Windows only support these registers for 32-bit programs running in compatibility sub-mode. For 64-bit mode programs, floating point and media operations are supported exclusively by the 128-bit XMM registers (used by SSE/SSE2 instructions). Like the GP register set, the XMM register set has also been augmented with the addition of 8 new XMM registers (numbered XMM8 through XMM15). The floating point parts of the C run-time library have been re-written to use SSE/SSE2 instructions for floating point operations. And, are you ready for the best news yet for driver writers? The SSE/SSE2 instruction set is fully supported in kernel mode, and is automatically context switched. No more goofy calls to KeSaveFloatingPointState() required if you need FP or media-specific instructions.
What About Windows?
When running under 64-bit Windows on the AMD-64, 32-bit apps have access to either a 2GB or 4GB virtual address space depending on whether or not they are large address aware. Apps that are re-compiled to native 64-bit mode will get an address space of about 8TB.
What does the kernel-mode address space look like? Well, there’s 248TB (that’s not a typo: two hundred and forty eight terabytes is correct) of kernel virtual address space. The paged and non-paged pools each are allocated 128GB of address space; System cache gets 1TB of address space. I’m thinkin’ that’s enough address space for a while. I’ll call you when I run out, assuming I’m still around.
As mentioned previously, 32-bit apps run under the 64-bit version of Windows without translation or emulation. The only “help” such applications receive is that 32-bit system service calls are “thunked” (extended) to their 64-bit equivalents. This is entirely transparent to the application.
And The Drivers?
For most drivers, moving to the AMD-64 is trivial. Here at OSR we’ve already ported several drivers to support AMD-64. All that’s required is a careful read-through of the code to ensure 64-bit compliance (discussed below) and a re-compile using the 64-bit compiler provided as part of the DDK.
An interesting fact to note is that 64-bit Windows on the AMD-64 uses the “longlong pointer” architecture model (commonly abbreviated LLP64). This means that while pointers become 64-bit values (the same width as a ULONGLONG), the ULONG and LONG data types stay 32 bits wide. This significantly eases the process of moving your driver to the AMD-64. Note that this is different from some Unix systems, where moving from a 32-bit to a 64-bit system makes the long data type 64 bits wide.
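To see which model a given compiler uses, a small portable check like the following can help. This is a sketch in standard C, not DDK code: the DATA_MODEL enum and both function names are invented for illustration, and uint32_t and uint64_t merely stand in for ULONG and ULONGLONG.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the common data models. */
typedef enum {
    MODEL_ILP32,   /* int, long, and pointers are all 32 bits           */
    MODEL_LP64,    /* long and pointers are 64 bits (many Unix systems) */
    MODEL_LLP64,   /* only long long and pointers are 64 bits           */
    MODEL_OTHER
} DATA_MODEL;

/* Reports the model of whatever compiler builds this file. */
DATA_MODEL data_model(void)
{
    if (sizeof(void *) == 8 && sizeof(long) == 4) return MODEL_LLP64;
    if (sizeof(void *) == 8 && sizeof(long) == 8) return MODEL_LP64;
    if (sizeof(void *) == 4 && sizeof(long) == 4) return MODEL_ILP32;
    return MODEL_OTHER;
}

/* Fixed-width types keep their size under every model above, which is
   why structure fields declared with them do not shift during a port. */
void check_fixed_width_types(void)
{
    assert(sizeof(uint32_t) == 4);   /* stand-in for ULONG     */
    assert(sizeof(uint64_t) == 8);   /* stand-in for ULONGLONG */
}
```

Built with a 64-bit Windows compiler, data_model() would be expected to report MODEL_LLP64; a typical 64-bit Unix compiler reports MODEL_LP64.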
What does making your driver 64-bit compliant entail? We’ll publish a complete article on the details of how to upgrade your driver to AMD-64. Look for it soon in The NT Insider. In the meantime, recall that Windows will “thunk” data buffer pointers from 32-bit apps to 64 bits. Therefore, the process of moving a driver to the AMD-64 is typically no more involved than carefully checking to ensure that there are no assumptions about pointer lengths. For example, check carefully for places where you might cast a pointer to a ULONG.
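Here is the kind of cast to hunt for, as a minimal sketch in portable C. The function names are hypothetical, and uint32_t and uintptr_t play the roles that ULONG and a pointer-sized integer type (ULONG_PTR in the Windows headers) would play in a driver.

```c
#include <assert.h>
#include <stdint.h>

/* The bug pattern: squeezing a pointer through a 32-bit integer.
   On a 64-bit build this silently throws away the upper 32 bits. */
uint32_t pointer_as_32bit(const void *p)
{
    return (uint32_t)(uintptr_t)p;
}

/* The fix: store the pointer in an integer type that is as wide
   as a pointer on every build. */
uintptr_t pointer_as_integer(const void *p)
{
    return (uintptr_t)p;
}

/* Returns 1 if this particular pointer would survive the buggy cast. */
int survives_32bit_round_trip(const void *p)
{
    assert(sizeof(uintptr_t) >= sizeof(void *));
    return (uintptr_t)pointer_as_32bit(p) == pointer_as_integer(p);
}
```

The nasty part is that the buggy version can appear to work in testing whenever the addresses involved happen to fall below 4GB, so a read-through of the code is more reliable than waiting for a crash.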
In moving your driver to a 64-bit Windows system, one pointer issue you will need to deal with is pointers that are embedded in your driver’s data buffers. For example, do you use an IOCTL to pass a data structure into your driver that includes a pointer?
Embedding pointers in your IOCTL data buffers isn’t considered a wonderful practice in any case. But if you haven’t been able to avoid it, you’ll have to account for the difference in length of these pointers when they come from 64-bit and from 32-bit applications. It’s not like Windows can look into your IOCTL data buffer and know that there’s a pointer in there, right? So, this is the one type of pointer Windows can’t automagically “thunk” for you.
Another driver conversion issue: If your driver includes any code in assembly language, including any X87 or MMX instructions, you’re going to have to make some changes. The Windows AMD-64 compiler does not support in-line assembly language, so any such code will need to be converted to use compiler intrinsics or (as a final alternative) call an assembly language sub-routine that resides in a separate file. And, as previously mentioned, Windows on the AMD-64 does not support the use of MMX, X87, or 3DNow! instructions in kernel mode at all. Any use of MMX, X87, or 3DNow! must be removed. If you still need floating point or media instructions, use the SSE/SSE2 instruction set.
This is a good opportunity to take a look at the assembly language and ask yourself: Do I really need this assembler-language code? Be honest. C’mon, now. In most cases, you should be able to just get rid of it and write what you want in C. If that’s not possible, chances are that you can do your fancy floating point or media work using SSE/SSE2 compiler intrinsics. A bonus is that using either ordinary C or the compiler intrinsics makes your code far less machine dependent. The same code will work on ANY system that supports SSE/SSE2.
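For a taste of what intrinsics look like, here is a minimal sketch that scales an array of doubles two at a time using SSE2. The function name and the HAVE_SSE2 macro are made up for illustration, and this is ordinary C rather than driver code. The intrinsics themselves come from the standard emmintrin.h header, with a plain C fallback so the file still builds where SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__) || defined(_M_X64) || defined(_M_AMD64)
#include <emmintrin.h>   /* SSE2 intrinsics */
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

/* Multiplies each of the count doubles in values by factor, in place. */
void scale_doubles(double *values, size_t count, double factor)
{
    size_t i = 0;

    assert(values != NULL || count == 0);

#if HAVE_SSE2
    {
        /* Broadcast factor into both lanes of a 128-bit XMM value. */
        __m128d f = _mm_set1_pd(factor);

        /* Process two doubles per iteration; unaligned loads and
           stores are used so the caller's buffer needs no special
           alignment. */
        for (; i + 2 <= count; i += 2) {
            __m128d v = _mm_loadu_pd(values + i);
            _mm_storeu_pd(values + i, _mm_mul_pd(v, f));
        }
    }
#endif

    /* Scalar loop: handles the odd element left over, or the whole
       array when SSE2 is not available. */
    for (; i < count; ++i) {
        values[i] *= factor;
    }
}
```

No in-line assembler, no separate .asm file, and the compiler takes care of register allocation for you.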
I still haven’t convinced you to lose the assembler language? Gad! What are you, a video driver writer? Well, maybe this will convince you: Because Windows for the AMD-64 uses a table-based exception handling scheme, you won’t be able to just re-compile your assembler functions, or cut and paste your in-line assembler, for use on the AMD-64. You’ll need to re-code your assembler functions to comply with the new AMD-64 calling conventions. According to this convention, all calls are fastcalls, with initial arguments passed in registers (the first four integer arguments travel in RCX, RDX, R8, and R9). Plus, each function needs a specific prolog and epilog. Refer to the 64-Bit Windows on the AMD-64 Calling Conventions document supplied with the DDK for all the details. Those intrinsics are looking better and better I bet, huh?
There’s no doubt about it: The AMD-64 is one incredible processor. It’s pure dynamite. Do your part by getting your drivers moved over to the AMD-64 now. Perhaps then you’ll be able to convince your boss that you need one of these babies for your desktop. I know an AMD-64 desktop is definitely in my future.