
The Wide World Of The AMD64


Unless you've been living under a rock, you've heard that AMD has introduced a new 64-bit processor. Code-named "Hammer," the AMD Opteron and Athlon 64 family of processors is poised to kick the normally evolution-prone world of x86 computing into revolution. This article provides a brief introduction to some of the architectural highlights of the Hammer family, and to the forthcoming support for the AMD64 in 64-bit Windows.


It’s Compatible and Fast

The AMD-64 can operate in one of two modes: Legacy Mode or Long Mode.


In Legacy Mode, the AMD-64 looks, for all intents and purposes, like a standard 32-bit x86 system. It runs an ordinary 32-bit version of Windows with standard 32-bit drivers, and it runs 32-bit applications unchanged. In our brief tests, the AMD-64 ran in Legacy Mode as fast as any other modern system running the same software.


Of course, a fast new 32-bit processor really isn’t big news these days.  What makes the AMD-64 interesting, to driver writers and users alike, is its 64-bit support.


Still Compatible and Still Faster: Long Mode

It’s in Long Mode that the AMD-64 really shines. In Long Mode, the AMD-64 uses a 64-bit operating system and 64-bit drivers. Standard 32-bit programs can be run without any recompilation or change using the AMD-64’s Compatibility sub-mode. In this mode, 32-bit programs execute without translation or any sort of weird emulation. And, guess what? 32-bit programs run in Compatibility sub-mode as fast as they do on a high-end 32-bit system.


Of course, it’s Long Mode’s 64-bit sub-mode that we’re most interested in. The AMD-64 supports the standard x86 instructions (with the exception of a handful of opcodes that were almost never used anyhow), and extends them to properly support native 64-bit operation.


Rarely do we find a system where so many things seem to have been done right. What’s irritated you over the years about the x86 architecture? Chances are it’s been fixed in the AMD-64, running in Long Mode’s 64-bit sub-mode.


One thing that’s long irritated me about the x86 is the paucity of general purpose registers.  The AMD-64 fixes that by adding eight new general purpose registers, numbered R8 through R15.  Of course, the entire register set has been lengthened to 64 bits. The full 64-bit width of each register is accessed by using the “R” prefix.  So, for example, as is traditional, AX is a 16-bit register and EAX is the 32-bit designation for this same register.  The 64-bit version of this register is accessed by specifying RAX.


The new registers, R8 through R15, can also be accessed with explicit width designators. For example, R8W designates the low 16 bits of R8, and R8D designates the low 32 bits of that same register. Sweet, huh?
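To make the naming concrete, here is a minimal C sketch that models a 64-bit register as a uint64_t and treats each narrower name as a view of its low-order bits. The helper names (low32, low16, low8) are hypothetical, and the sketch models reads only; it says nothing about how the hardware handles writes through the narrower names.

```c
#include <stdint.h>

/* Hypothetical helpers: each narrower register name reads the
   low-order bits of the same 64-bit register. */
uint32_t low32(uint64_t reg) { return (uint32_t)reg; }  /* EAX of RAX, R8D of R8 */
uint16_t low16(uint64_t reg) { return (uint16_t)reg; }  /* AX of RAX,  R8W of R8 */
uint8_t  low8(uint64_t reg)  { return (uint8_t)reg;  }  /* AL of RAX,  R8B of R8 */
```

For example, if RAX holds 0x1122334455667788, then EAX reads as 0x55667788, AX as 0x7788, and AL as 0x88.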


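How about those annoying segment registers? The AMD-64 effectively does away with them when running native 64-bit code. CS, DS, ES, and SS are treated as having a base of zero and no limit, so flat addressing is used in their place; only FS and GS keep a base address, which is useful for locating per-thread and per-processor data.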
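The difference can be sketched as two address calculations. This is a deliberately simplified model, not a description of every hardware check: the segment_t type and both function names are hypothetical, and expand-down segments, page granularity, access rights, and canonical-address checks are all left out. (Windows x64 uses the GS base to find the TEB in user mode and the KPCR in kernel mode.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified segment descriptor: base and byte-granular limit only. */
typedef struct {
    uint32_t base;
    uint32_t limit;   /* highest valid offset within the segment */
} segment_t;

/* Legacy (32-bit protected) mode: check the offset against the limit,
   then add the segment base. Returns false where the CPU would fault. */
bool legacy_linear(const segment_t *seg, uint32_t offset, uint32_t *linear)
{
    if (offset > seg->limit)
        return false;
    *linear = seg->base + offset;   /* wraps modulo 2^32 */
    return true;
}

/* 64-bit mode: no limit check, and CS/DS/ES/SS contribute a base of zero,
   so the linear address is just the offset. FS and GS still add a base. */
uint64_t long_mode_linear(uint64_t offset, bool is_fs_or_gs, uint64_t fs_gs_base)
{
    return is_fs_or_gs ? fs_gs_base + offset : offset;
}
```

In legacy mode every data reference pays for a base addition and a limit check; in 64-bit mode, apart from FS and GS, the offset simply is the address.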


Perhaps the cumbersome x87 floating point instructions and register set are your hot button. Well, to maintain compatibility for existing 32-bit programs, the AMD-64 includes those old x87 registers and indeed supports the x87 FP instruction set. But 64-bit versions of Windows only support these registers for 32-bit programs running in Compatibility sub-mode. For 64-bit mode programs, floating point and media operations are supported exclusively by the 128-bit XMM registers (used by SSE/SSE2 instructions). Like the GP register set, the XMM register set has been augmented with the addition of 8 new XMM registers (numbered XMM8 through XMM15). The floating point parts of the C run-time library have been rewritten to use SSE/SSE2 instructions for floating point operations. And, are you ready for the best news yet for driver writers? The SSE/SSE2 instruction set is fully supported in kernel mode, and is automatically context switched. No more goofy calls to KeSaveFloatingPointState() are required if you need FP or media-specific instructions.


What About Windows?

When running under 64-bit Windows on the AMD-64, 32-bit apps have access to either a 2GB or a 4GB virtual address space, depending on whether or not they are large address aware (linked with /LARGEADDRESSAWARE). Apps that are recompiled for native 64-bit mode get an address space of about 8TB.


What does the kernel-mode address space look like? Well, there’s 248TB (that’s not a typo: two hundred and forty-eight terabytes is correct) of kernel virtual address space. The paged and non-paged pools are each allocated 128GB of address space, and the system cache gets 1TB. I’m thinkin’ that’s enough address space for a while. I’ll call you when I run out, assuming I’m still around.
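The sizes quoted above are all powers of two. Assuming the 48-bit virtual address width cited in the comments below, the following C sketch shows one way the figures fit together: 2^48 bytes is 256TB, and subtracting the 8TB (2^43 bytes) user space leaves the 248TB quoted for the kernel. That subtraction is our reading of where the 248TB figure comes from, not a statement about how Windows actually lays out its address space. The function names are hypothetical.

```c
#include <stdint.h>

#define GB (1ULL << 30)
#define TB (1ULL << 40)

/* Size in bytes of an address space with the given number of address bits. */
uint64_t space_bytes(unsigned bits) { return 1ULL << bits; }

uint64_t total_va_tb(void) { return space_bytes(48) / TB; }         /* 48-bit VA: 256TB */
uint64_t user64_tb(void)   { return space_bytes(43) / TB; }         /* 64-bit app: 8TB */
uint64_t kernel_tb(void)   { return total_va_tb() - user64_tb(); }  /* 256 - 8 = 248TB */

/* 32-bit app: 2GB normally, 4GB if large address aware. */
uint64_t user32_gb(int large_address_aware)
{
    return space_bytes(large_address_aware ? 32 : 31) / GB;
}
```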


As mentioned previously, 32-bit apps run under the 64-bit version of Windows without translation or emulation.  The only “help” such applications receive is that 32-bit system service calls are “thunked” (extended) to their 64-bit equivalents.  This is entirely transparent to the application.


And The Drivers?

For most drivers, moving to the AMD-64 is trivial. Here at OSR we’ve already ported several drivers to the AMD-64. All that’s required is a careful read-through of the code to ensure 64-bit compliance (discussed below) and recompiling with the 64-bit compiler provided as part of the DDK.


An interesting fact to note is that 64-bit Windows uses the “long long pointer” (LLP64) data model. This means that while pointers become 64-bit values (the same width as a ULONGLONG), the ULONG data type stays 32 bits. This significantly eases the process of moving your driver to the AMD-64. Note that this is different from many 64-bit Unix systems, which use the LP64 model: when moving from a 32-bit to a 64-bit system there, the C long type becomes 64 bits long.
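The difference between the data models is visible in the sizes of the basic C types, so a compiler's model can be classified directly. The enum and the data_model() function below are hypothetical names used only for this sketch.

```c
/* Hypothetical classification of the compiler's C data model. */
typedef enum { MODEL_ILP32, MODEL_LLP64, MODEL_LP64, MODEL_OTHER } data_model_t;

data_model_t data_model(void)
{
    if (sizeof(void *) == 4 && sizeof(long) == 4 && sizeof(int) == 4)
        return MODEL_ILP32;   /* 32-bit Windows and 32-bit Unix */
    if (sizeof(void *) == 8 && sizeof(long) == 4 && sizeof(long long) == 8)
        return MODEL_LLP64;   /* 64-bit Windows: long (and so ULONG) stays 32 bits */
    if (sizeof(void *) == 8 && sizeof(long) == 8)
        return MODEL_LP64;    /* typical 64-bit Unix: long grows to 64 bits */
    return MODEL_OTHER;
}
```

Compiled with the 64-bit Windows compiler this should report LLP64, while a typical 64-bit Linux compiler reports LP64 for the same source.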


What does making your driver 64-bit compliant entail? We’ll publish a complete article on the details of how to upgrade your driver to the AMD-64; look for it soon in The NT Insider. In the meantime, recall that Windows will “thunk” data buffer pointers from 32-bit apps to 64 bits. Therefore, moving a driver to the AMD-64 is typically no more involved than carefully checking that there are no assumptions about pointer lengths. For example, check carefully for places where you might cast a pointer to a ULONG. If you really need to hold a pointer in an integer, use ULONG_PTR, which is pointer-sized on both 32-bit and 64-bit Windows.
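Here is a minimal sketch of why the pointer-to-ULONG cast is dangerous. So that it compiles anywhere, it models ULONG as uint32_t and 64-bit ULONG_PTR as uint64_t; these stand-in typedefs and both function names are assumptions for illustration, not the real Windows headers.

```c
#include <stdint.h>

/* Stand-ins for the Windows types on 64-bit Windows (assumptions). */
typedef uint32_t ULONG_T;      /* models ULONG: 32 bits */
typedef uint64_t ULONG_PTR_T;  /* models ULONG_PTR: pointer-sized, 64 bits */

/* The bug: squeezing a 64-bit address into a ULONG drops the upper 32 bits. */
uint64_t round_trip_through_ulong(uint64_t address)
{
    ULONG_T truncated = (ULONG_T)address;   /* high half silently lost */
    return (uint64_t)truncated;
}

/* The fix: keep the value in the pointer-sized integer type. */
uint64_t round_trip_through_ulong_ptr(uint64_t address)
{
    ULONG_PTR_T kept = (ULONG_PTR_T)address;
    return (uint64_t)kept;
}
```

Note that the buggy version returns the right answer for any address below 4GB, which is exactly why this kind of bug can survive testing. An address such as 0xFFFFF80012345678 comes back as 0x12345678.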


In moving your driver to a 64-bit Windows system, one pointer issue you will need to deal with is pointers that are embedded in your driver’s data buffers. For example, do you use an IOCTL to pass a data structure into your driver that includes a pointer?

Embedding pointers in your IOCTL data buffers isn’t considered a wonderful practice in any case.  But if you haven’t been able to avoid it, you’ll have to account for the difference in length of these pointers when they come from 64-bit and from 32-bit applications.  It’s not like Windows can look into your IOCTL data buffer and know that there’s a pointer in there, right?  So, this is the one type of pointer Windows can’t automagically “thunk” for you.
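One common way to cope is to define a second, 32-bit layout of the input structure and choose the layout based on the requesting process. In a real WDM driver that test would typically be IoIs32bitProcess(Irp); in this user-mode sketch it is a plain is_32bit_caller flag. The structure names, field names, and read_ioctl_input() are hypothetical, and the embedded pointers are modeled as fixed-width integers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical IOCTL input as a 64-bit application lays it out. */
typedef struct {
    uint64_t buffer;   /* embedded user-mode pointer, 64 bits wide */
    uint32_t length;
} my_ioctl_in64_t;

/* The same structure as a 32-bit application lays it out. */
typedef struct {
    uint32_t buffer;   /* embedded user-mode pointer, only 32 bits wide */
    uint32_t length;
} my_ioctl_in32_t;

/* Normalize the caller's input into the 64-bit layout. Returns false if
   the input buffer is too small for the caller's layout. */
bool read_ioctl_input(const void *in, size_t in_len, bool is_32bit_caller,
                      my_ioctl_in64_t *out)
{
    if (is_32bit_caller) {
        my_ioctl_in32_t in32;
        if (in_len < sizeof in32)
            return false;
        memcpy(&in32, in, sizeof in32);
        out->buffer = in32.buffer;   /* 32-bit caller's address is below 4GB: zero-extend */
        out->length = in32.length;
    } else {
        if (in_len < sizeof *out)
            return false;
        memcpy(out, in, sizeof *out);
    }
    return true;
}
```

Checking the input length against the caller's layout matters: an 8-byte buffer from a 32-bit app is valid, but the same 8 bytes are too short for the 64-bit layout.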


Another driver conversion issue: if your driver includes any code in assembly language, including any x87 or MMX instructions, you’re going to have to make some changes. The Windows AMD-64 compiler does not support inline assembly language, so any such code will need to be converted to use compiler intrinsics or (as a last resort) to call an assembly language subroutine that resides in a separate file. And, as previously mentioned, Windows on the AMD-64 does not support the use of MMX, x87, or 3DNow! instructions in kernel mode at all. Any use of MMX, x87, or 3DNow! must be removed. If you still need floating point or media instructions, use the SSE/SSE2 instruction set.
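As an example of the intrinsic route, a CPU feature check that might once have been an inline __asm block around CPUID can call the compiler's intrinsic instead. This sketch assumes an x86 or x64 target: MSVC provides __cpuid() in <intrin.h>, and GCC and Clang provide __get_cpuid() in <cpuid.h>. The function name cpu_has_sse2 is our own.

```c
#include <stdbool.h>

#if defined(_MSC_VER)
#include <intrin.h>

/* MSVC: __cpuid fills regs[] with EAX, EBX, ECX, EDX for the given leaf. */
bool cpu_has_sse2(void)
{
    int regs[4];
    __cpuid(regs, 1);                    /* leaf 1: processor feature flags */
    return ((regs[3] >> 26) & 1) != 0;   /* EDX bit 26 = SSE2 */
}
#else
#include <cpuid.h>

/* GCC/Clang: __get_cpuid returns 0 if the requested leaf is unsupported. */
bool cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return false;
    return ((edx >> 26) & 1) != 0;       /* EDX bit 26 = SSE2 */
}
#endif
```

On any AMD-64 processor this should return true, since SSE2 is part of the architecture's baseline.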


This is a good opportunity to take a look at the assembly language and ask yourself: Do I really need this assembler code? Be honest. C’mon, now. In most cases, you should be able to just get rid of it and write what you want in C. If that’s not possible, chances are that you can do your fancy floating point or media work using SSE/SSE2 compiler intrinsics. A bonus is that ordinary C is machine independent, and code that uses the compiler intrinsics will work on ANY system that supports SSE/SSE2.
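Here is a small sketch of media-style work done with SSE2 intrinsics instead of assembler: adding two arrays of doubles, two elements per packed instruction. The intrinsics come from <emmintrin.h>, which both MSVC and GCC/Clang provide for x86 and x64 targets; the function name add_doubles_sse2 is hypothetical.

```c
#include <stddef.h>
#include <emmintrin.h>   /* SSE2: __m128d, _mm_loadu_pd, _mm_add_pd, _mm_storeu_pd */

/* out[i] = a[i] + b[i], processing two doubles per SSE2 operation. */
void add_doubles_sse2(const double *a, const double *b, double *out, size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);            /* unaligned load of a[i], a[i+1] */
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(out + i, _mm_add_pd(va, vb));  /* packed add, then store */
    }
    for (; i < n; i++)                               /* scalar tail when n is odd */
        out[i] = a[i] + b[i];
}
```

The same source compiles for 32-bit x86 (with SSE2 code generation enabled) and for the AMD-64, with no prolog, epilog, or register bookkeeping to maintain by hand.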


I still haven’t convinced you to lose the assembler language? Gad! What are you, a video driver writer? Well, maybe this will convince you: because Windows for the AMD-64 uses a table-based exception handling scheme, you won’t be able to just recompile your assembler functions, or cut and paste your inline assembler, for use on the AMD-64. You’ll need to re-code your assembler functions to comply with the new AMD-64 calling convention. According to this convention, all calls are fastcalls, with the first four arguments passed in registers. Plus, each function needs a specific prolog and epilog, described by the unwind data that the table-based exception handling relies on. Refer to the 64-Bit Windows on the AMD-64 Calling Conventions document supplied with the DDK for all the details. Those intrinsics are looking better and better, I bet, huh?
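To give a feel for what hand-written assembler must respect, this hypothetical C helper reports where the Windows x64 convention places a given argument on entry to the callee: integer and pointer arguments in RCX, RDX, R8, and R9, floating point arguments in XMM0 through XMM3, and the rest on the stack above a 32-byte home area that the caller reserves. Only scalar integer/pointer and float/double arguments are modeled; the DDK document remains the authority for the full rules.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: location of 0-based argument `index` on entry. */
void win64_arg_location(unsigned index, bool is_floating, char *buf, size_t len)
{
    static const char *const int_regs[4] = { "RCX", "RDX", "R8", "R9" };
    static const char *const fp_regs[4]  = { "XMM0", "XMM1", "XMM2", "XMM3" };

    if (index < 4) {
        /* Slots are positional: the second argument uses RDX or XMM1,
           depending on its type, never both. */
        snprintf(buf, len, "%s", is_floating ? fp_regs[index] : int_regs[index]);
    } else {
        /* [RSP] holds the return address and [RSP+8h..RSP+27h] is the
           32-byte home area, so argument n >= 4 is at [RSP + 8*(n+1)]. */
        snprintf(buf, len, "[RSP+%Xh]", 8u * (index + 1));
    }
}
```

For example, the fifth argument (index 4) is found at [RSP+28h], and a double passed as the second argument arrives in XMM1.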


Pure Dynamite

There’s no doubt about it: The AMD-64 is one incredible processor.  It’s pure dynamite.  Do your part by getting your drivers moved over to the AMD-64 now.  Perhaps then you’ll be able to convince your boss that you need one of these babies for your desktop.  I know an AMD-64 desktop is definitely in my future.


Related Articles
Upsizing - Managing Address Space Increases for IA-64
Stop Interrupting Me -- Of PICs and APICs
What Are Rings
What is Real Mode?
No More Embedded Assembler or x87 FP
No Deadlock Verification on x64 UP Systems
Living With 64-Bit Windows

User Comments

"mathematical functions"
Who can tell me how to implement sin, cos, exp, log and other mathematical functions by sse/sse2?

12-Sep-04, Chun-yi Lee

"No x87 = Good Riddance"
Since virtually everything we do in the kernel is high-bandwidth floating point, I won't miss the x87 kludge a bit. I do, however, want to add my voice to the "equal time for the IA64" call. The AMD product may be new and exciting, but I doubt Intel's 64-bit efforts are languishing with Longhorn in sight.

11-Nov-03, Jim Barber

"RE: Flat memory security is a disaster"
If the AMD-64 were being designed for a new operating system, perhaps it would support whatever new protection constructs were used and invented for that operating system. Maybe multi-rings, maybe segments, maybe something else.

But there ain't no new O/S, and the current O/S'es that we have don't use the intervening rings -- and segment registers as they are used today are nothing if not an annoyance.

Was Multics a revolutionary operating system? Sure. In 1965. It was written in a high-level language, which was by itself revolutionary at the time. It got some high security rating by the US Govt IIRC, too. Of course, it solved a very different class of problems than an O/S like NT solves today.

The AMD64 is about supporting TODAY's operating systems and applications, efficiently and easily, TODAY. In this, it does an outstanding job.

16-Jul-03, Peter Viscarola

What a bad statement! "How about those annoying segment registers? The AMD-64 does away with them when running native 64-bit code. In their place, flat addressing is used."

Why a bad mistake? The very heart of the security architecture of the x86 structure, with and after the 80286, was expressed in segment structures (long agreed even by the former head of Microsoft's own UK research centre) with the ring structure as VITAL parts of overall system security! For example, device drivers MUST not be in the same address space or privilege level as the system kernel including those critical crypto functions, etc. Imagine if proper stack segment limit registers were used - no buffer overflows! Imagine if code/data segment enforcement were implemented - no execution of data!

Hold on! Flat memory is a security DISASTER! That has been proven over and over and over again! and two state computer systems were demonstrated to be insufficient in the 1970s! ( See MULTICS!)

We need secure Windows driver systems now - not just some add-in Microsoft NGSCB "mess" - yes, it has a new "ring". Heaven knows what that does for device driver writing let alone to the AMD64 structure!

Regards, Bill Caelli w.caelli@qut.edu.au

16-Jul-03, William Caelli

"I like AMD64"
I like AMD64 more than Itanium. It's easy to understand with x86 knowledge. I don't need to program with EFI. At least it's not necessary to learn a totally new assembly language. I wish AMD64 could be popular in the near future.

01-Jul-03, Xinhai Kang

"RE: AMD64"
> How do you get 256TB VA Space?

The virtual addressing capability for the current AMD64 processors is 256TB, based on the max supported VA length, which is 48 bits.

(36 bit physical addressing using PAE, which is the 64TB limit that you mention, applies to the Pentium Pro and later not the 386).

26-Jun-03, Peter Viscarola

(1) How do you get 248 TB as the max virtual memory address space? I would show the calculation using the CPU's regs if it is easy. Being as I am, I will find the answer out. Just like when I read 64 TB was the large memory model for the 386 and above.

26-Jun-03, William Jones

"AMD-64 intro"
(1) Good article. (2) Note that the learning curve for the AMD-64 is easier than the IA64. Would love to see more on the IA64 - architecture - equal time.

26-Jun-03, William Jones

From the description of this article, I am excited about the pure dynamite; but the question is whether AMD64 will be accepted by the market in the future.

23-Jun-03, jianwu zheng
