
The NT Insider

Rock On With 64-bit Windows -- Porting Windows Drivers to AMD64
(By: The NT Insider, Vol 10, Issue 5, September - October 2003 | Published: 10-Nov-03| Modified: 10-Nov-03)

In our last issue, we promised you (in our article entitled The Wide World of the AMD64, see http://www.osronline.com/article.cfm?id=243) the details on what steps to take to port your existing Windows driver to Windows-64 for the AMD64. When we promise, we deliver. In this article, we'll cover all the necessary porting steps, and more.

Background

Recall that the AMD64 processors, the Opteron and Athlon 64, can boot and run the regular 32-bit versions of Windows (Windows XP and Server 2003) available today. That's because the AMD64 processors implement the x86 instruction set, with extensions to allow for optional 64-bit operation. In fact, the Athlon 64 running the standard 32-bit version of Windows XP has already become a favorite among gamers and the LAN party crowd. Have you seen some of these cool new systems, by the way? If not, check out the systems that Shuttle has come out with. I have one of these systems in my office, and let me tell you, it rocks!

While you get cutting-edge performance running AMD64 systems in 32-bit mode, the chip really comes alive running the 64-bit version of Windows. Microsoft has publicly stated that a 64-bit version of Windows for the AMD64 will ship concurrently with Windows Server 2003 SP1. That version of the O/S is, in fact, already in beta and should be released fairly soon.

Why should anybody care about the 64-bit version of Windows? Well, for one thing, the 64-bit version of the Windows operating system itself has less overhead than the 32-bit version. A 64-bit version of Windows means that virtual address space for the O/S is just never an issue. Check out the numbers:

Total of 248TB kernel virtual address space
1TB system cache
128GB non-paged pool
128GB paged pool

Anybody think the system will run out of non-paged pool when there is 128GB (yes, that's GIGAbytes) of space available for it? Nah, me neither.

Of course, the nice thing about 64-bit Windows on the AMD64 is that it runs 32-bit Windows apps without any translation or emulation. That means that regular, out of the box, standard, vanilla, Windows apps run with performance equal to what you'd get on a plain 32-bit system. Without even being recompiled these apps benefit from larger potential address space. Of course, 64-bit Windows apps have even more address space to play with: They can utilize up to 8TB of user virtual address space. Pretty decent, I'd say.

So, the AMD64 rocks, and the 64-bit version of Windows looks to be a winner. But, nothing in the computer world can ever be perfect. So, what's the sticking point? Well, that's where we all come in. The only issue with running 64-bit Windows on the AMD64 is that all drivers for 64-bit Windows have to be 64 bit. That means we need to port our drivers from 32-bit Windows to 64-bit Windows and re-compile and link them. However, lucky for us, porting drivers to the AMD64 version of 64-bit Windows is usually a pretty simple job - and problems are even pretty easy to debug given that the AMD64 uses the regular x86 instruction set.

Porting Brain Dump

Here's the quick rundown on everything you need to know about porting your driver to run on the AMD64 under 64-bit Windows. We'll talk more about each of these topics later in the article, but this list will give you all the basics:

Old Data Types: Mostly, data types stay the same. CHAR/UCHAR is 8 bits, SHORT/USHORT is 16 bits and LONG/ULONG is still 32 bits. Got that? The only old data types that change are pointers: They become 64 bits long. So, PUCHAR and PDEVICE_OBJECT are 64 bits long.

New Data Types: To account for the change in the size of pointers, there are "pointer precision" data types. The most common one you'll use is ULONG_PTR. This is an unsigned value that has the same size as a pointer on whatever machine your code is compiled for. So, if you compile your driver for 32-bit Windows, ULONG_PTR is 32 bits long. If you compile it for 64-bit Windows, a ULONG_PTR is 64 bits long.

Pointers Passed As Parameters Are Auto-Thunked: Windows will automagically convert any pointer parameters passed to it by a 32-bit app to 64 bits. Microsoft famously calls this conversion "thunking." So, for example, on 64-bit Windows your driver will get a 64-bit pointer in Irp->UserBuffer regardless of whether the caller is a 32-bit or 64-bit application. That makes things pretty simple, right?

Pointers Embedded In IOCTL Buffers: There's one small case of which a few drivers will need to be aware: If you have an IOCTL where the contents of one of the IOCTL buffers contains a pointer, your driver will have to be aware of whether the caller is a 32-bit app or a 64-bit app. Note, the pointers to the IOCTL IN_BUFFER and OUT_BUFFER are automagically thunked by Windows. But if either of these buffers contains a pointer, you'll have to figure out if the caller is a 32-bit app, and if it is, extend the pointer to 64-bits yourself.

In-Line Assembler, MMX, 3DNow, and X87 FP Instructions: If you use any of the aforementioned instructions in your driver, first (unless your driver is for audio or video media), shame on you. In any case, you'll need to eliminate all use of embedded assembler, MMX, 3DNow and X87 floating point from your code. It's simply not supported. Luckily, SSE and SSE2 instructions are supported, and are available as intrinsics that you can use directly in your C code.

DMA: This just shouldn't be an issue. If your driver runs properly on 32-bit Windows systems using PAE (these systems have more than 4GB of physical memory) - which it should - then you're all set. Porting to AMD64 will just give you an opportunity to do a quick check to ensure that you're not assuming the high 32 bits of those PHYSICAL_ADDRESS structures are zero.

Installation: You'll need to hack your INF, by adding sections decorated with .NTAMD64, to install the right version of your driver on 64-bit systems.

So, you see, if your driver is like most - if you don't use embedded assembler, you don't pass pointers inside your IOCTL buffers (yuck!), and your DMA implementation isn't broken - all you'll need to do to get your driver running on the AMD64 under 64-bit Windows is find the places that you store or cast pointers, and change those data types from ULONG to ULONG_PTR. Then you re-compile, add the necessary "installMyDriver.NTAMD64" section to your INF, and you're done.

Data Types Old and New

So far, we've explained to a few hundred driver writers how to convert their 32-bit drivers to run under 64-bit Windows on the AMD64. Of the things they need to learn, it seems to me that the hardest thing for them to grasp is what they don't need to do. So, let me be very clear:

The size of LONG and ULONG does not change. It's always 32 bits.

Got that? Likewise INT and UINT and DWORD (you're not using these data types anyhow, right?) don't change. They're always 32 bits. Similarly, the size of LONGLONG and ULONGLONG is always 64 bits. It doesn't matter whether you're running on 64-bit or 32-bit Windows.

Another thing you don't need to worry about: Unlike other 64-bit Windows platforms, on the AMD64 failing to align data structures on their "natural" alignment (that is, failing to align ULONGS on integral 32 bit boundaries and ULONGLONGS on integral 64 bit boundaries) does not cause a fatal error. Of course, it's always good practice to align your data items naturally for best performance. But, if you forget to do so, 64-bit Windows on the AMD64 will not blue screen.

Also, you don't have to worry about "normal" use of pointers. Let's say you have a pointer in your device extension, that's declared as follows:

PMDL CurrentMdl;

In this case, there's no work required on your part. The PMDL will be 32 bits on 32-bit Windows and 64 bits on 64-bit Windows.
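To make the size rules concrete, here is a minimal sketch in portable C11. It does not use the DDK headers; the `*_STANDIN` typedefs and the `pointer_bits` helper are assumptions made for illustration, chosen to mirror the sizes described above (ULONG fixed at 32 bits, ULONGLONG fixed at 64 bits, ULONG_PTR following the pointer size).

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the DDK types. Assumption: in a real driver these
 * come from the DDK headers, not from typedefs like these. */
typedef uint32_t  ULONG_STANDIN;      /* ULONG: 32 bits on every platform   */
typedef uint64_t  ULONGLONG_STANDIN;  /* ULONGLONG: 64 bits on every platform */
typedef uintptr_t ULONG_PTR_STANDIN;  /* ULONG_PTR: as wide as a pointer     */

/* These hold whether the code is compiled for 32 or 64 bits. */
static_assert(sizeof(ULONG_STANDIN) == 4, "ULONG stays 32 bits");
static_assert(sizeof(ULONGLONG_STANDIN) == 8, "ULONGLONG stays 64 bits");
static_assert(sizeof(ULONG_PTR_STANDIN) == sizeof(void *),
              "pointer-precision type tracks the pointer size");

/* Returns the pointer width, in bits, for the target being compiled. */
size_t pointer_bits(void)
{
    return sizeof(void *) * 8;
}
```

Compiled for a 32-bit target, pointer_bits() reports 32; compiled for a 64-bit target, it reports 64. The three static assertions pass either way, which is the whole point.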

What do you have to worry about in terms of data structures? Well, basically, you have to worry about places where you explicitly or implicitly cast pointers to ULONGs. For such cases, where you want a data type that's "pointer precision" (i.e., as long as a pointer is on whatever system your driver's compiled for), you use ULONG_PTR. So, let's say you have the following field declarations in your device extension:

ULONG currentMdl;
ULONG MdlPointers[10];

And you use these as follows in your driver code:

devExt->MdlPointers[currentMdl++] = (ULONG)Irp->MdlAddress;

Well, that's not going to work on a 64-bit system, is it! Because ULONG is always 32 bits, on a 64-bit system you're trying to hammer a 64-bit pointer into a 32-bit storage location. Fortunately, the compiler will see this and slap you with an error.

How's this fixed? All you need to do is change the data type and cast as follows:

Declaration:

ULONG currentMdl;
ULONG_PTR MdlPointers[10];

Code:

devExt->MdlPointers[currentMdl++] = (ULONG_PTR)Irp->MdlAddress;

There are a couple of things to take note of in the changes above. First, there's no reason to change the declaration of currentMdl. I'm thinkin' that 32 bits is probably big enough to hold a number that'll never get higher than 9. By changing the declaration of MdlPointers from ULONG to ULONG_PTR, we've said that the field will be 32 bits on 32-bit Windows and 64 bits on 64-bit Windows. As an aside, you might note that if the data structure had been more strongly typed initially, the cast would have been unnecessary and there wouldn't have been any changes required to the code. Thus, if the initial author had written:

PMDL MdlPointers[10];

the compiler would have no cause to complain and the storage would be automatically 64 bits.
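A small sketch of what goes wrong, and why the pointer-precision type fixes it. This is plain user-mode C, not driver code: uint32_t stands in for ULONG and uintptr_t for ULONG_PTR, and both function names are made up for the illustration.

```c
#include <stdint.h>

/* What a (ULONG) cast does to a 64-bit address: the upper 32 bits
 * are silently discarded. The address is passed as a plain 64-bit
 * integer so the effect is the same on any host. */
uint32_t truncate_to_ulong(uint64_t address)
{
    return (uint32_t)address;
}

/* Storing a pointer in a pointer-precision integer and reading it
 * back yields the original pointer on 32-bit and 64-bit targets. */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;  /* plays the role of ULONG_PTR */
    return (void *)bits;
}
```

For example, truncate_to_ulong(0xFFFFF80012345678) hands back 0x12345678, which is no longer a usable address, while round_trip always returns the pointer it was given.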

Another interesting point to mention here is that the compiler will align the pointer precision data structures on their natural boundaries by default. But if you want to be 100% sure, or you want to make such an alignment change obvious to future maintainers, or your code includes changes to the default packing, you can use the following syntax:

Example of forced alignment change:

(Do not do this unless absolutely required by your code):

#pragma pack(1) // change packing to byte alignment... YUCK!
ULONG currentMdl;
ULONG_PTR POINTER_ALIGNMENT MdlPointers[10];

The POINTER_ALIGNMENT declaration forces the data type being declared to be aligned on an integral (i.e., "natural") boundary based on its length.
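To see the problem that POINTER_ALIGNMENT addresses, the following sketch compares the offset of a pointer-precision array under byte packing with its offset under default packing. It does not use POINTER_ALIGNMENT itself (that comes from the DDK headers); the structure and function names are hypothetical, and uintptr_t stands in for ULONG_PTR.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte packing: the array lands immediately after the 4-byte counter,
 * so on a 64-bit target it is not naturally aligned. */
#pragma pack(push, 1)
struct packed_ext {
    uint32_t  currentMdl;
    uintptr_t MdlPointers[10];
};
#pragma pack(pop)

/* Default packing: the compiler inserts padding as needed so the
 * array starts on its natural boundary. */
struct natural_ext {
    uint32_t  currentMdl;
    uintptr_t MdlPointers[10];
};

size_t packed_offset(void)
{
    return offsetof(struct packed_ext, MdlPointers);
}

size_t natural_offset(void)
{
    return offsetof(struct natural_ext, MdlPointers);
}
```

Under byte packing the array always sits at offset 4. With default packing its offset is a multiple of the pointer-precision type's alignment, which is the placement that POINTER_ALIGNMENT is meant to restore when packing has been changed.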

Pointers Passed Inside Data Buffers

As described previously, Windows handles the conversion of pointers that are passed as parameters from 32 bits to 64 bits. But, how about pointers that might exist within a driver-private data structure that's passed within a data buffer? For example, consider the following data structure:

typedef struct _myBuffer {
     ULONG Count;
     PUCHAR SecondaryBuffer;
     ULONG SecondaryBufferSize;
     UCHAR Buffer[BUF_SIZE];
} MY_BUFFER, * PMY_BUFFER;

Let's suppose a 32-bit application sends this data structure to your driver, using the following code:

MY_BUFFER buf;

buf.Count = CharsPassed;
StringCchCopyNA((char *)buf.Buffer, BUF_SIZE, DataSource, buf.Count);
buf.SecondaryBufferSize = OtherBufferSize;
buf.SecondaryBuffer = PointerToOtherBuffer;

worked = DeviceIoControl( hDev,
                        IOCTL_MYDRV_SEND_BUFFER,
                        &buf,
                        sizeof(MY_BUFFER),
                        NULL,
                        0,
                        &bytesReturned,
                        NULL);

See the problem? Windows will thunk buf (the IOCTL IN_BUFFER pointer) to 64 bits automatically. But there's no way Windows can know that the 32-bit caller's IN_BUFFER contains a 32-bit pointer (SecondaryBuffer in our example). The problem gets a bit trickier when this same IOCTL can be sent from either a 32-bit application or a 64-bit application. The layout of the IOCTL buffer, the proper length of that buffer, and the size of the pointer all vary depending on whether the caller is 32-bit or 64-bit.
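The layout difference can be sketched in portable C11 by modeling the embedded pointer as a 4-byte field for a 32-bit caller and as an 8-byte, 8-byte-aligned field for a 64-bit caller. The structure names are hypothetical, and BUF_SIZE is set to 64 purely as an assumption for the illustration; the article does not give its value.

```c
#include <stddef.h>
#include <stdint.h>

#define BUF_SIZE 64  /* assumed value, for illustration only */

/* MY_BUFFER as a 32-bit caller lays it out: the pointer is 4 bytes. */
struct my_buffer_from_32 {
    uint32_t Count;
    uint32_t SecondaryBuffer;      /* 32-bit pointer value */
    uint32_t SecondaryBufferSize;
    uint8_t  Buffer[BUF_SIZE];
};

/* MY_BUFFER as a 64-bit caller lays it out: the pointer is 8 bytes
 * and 8-byte aligned, so padding follows Count. */
struct my_buffer_from_64 {
    uint32_t Count;
    _Alignas(8) uint64_t SecondaryBuffer;  /* 64-bit pointer value */
    uint32_t SecondaryBufferSize;
    uint8_t  Buffer[BUF_SIZE];
};
```

With these definitions, SecondaryBufferSize sits at offset 8 in the 32-bit layout and at offset 16 in the 64-bit layout, and the overall sizes differ as well. A driver that assumes one layout while reading the other picks up the wrong fields.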

How do we solve this? Well, the cleanest way to solve this particular problem would be to redefine the IOCTL so that it doesn't pass the pointer within the data structure. How about this:

typedef struct _myNEWBuffer {
     ULONG Count;
     UCHAR Buffer[BUF_SIZE];
} MY_NEW_BUFFER, * PMY_NEW_BUFFER;

We eliminate the embedded buffer pointer entirely. The size of the buffer is now identical for 32-bit and 64-bit applications (because ULONG is always 32 bits and UCHAR is always 8 bits, regardless of the platform), so the driver won't have any difficulty validating the buffer size.

We change the IOCTL call itself to pass the secondary buffer pointer and length as the OUT_BUFFER parameter to the IOCTL call:

MY_NEW_BUFFER buf;

buf.Count = CharsPassed;
StringCchCopyNA((char *)buf.Buffer, BUF_SIZE, DataSource, buf.Count);

worked = DeviceIoControl( hDev,
                        IOCTL_MYDRV_SEND_BUFFER,
                        &buf,
                        sizeof(MY_NEW_BUFFER),
                        PointerToOtherBuffer,
                        OtherBufferSize,
                        &bytesReturned,
                        NULL);

Nice, huh? This way, you get Windows to do the thunk for you. And, you get better, cleaner, more reliable code, at the same time.

There are a few down-sides to this approach, of course. One is that you'll need to change and re-compile all the apps that ever call this driver to use the new IOCTL format. That might be acceptable for some uses; for others, it might not be.

I can hear you coming up with other objections too: "Yeah, as if I would have passed the secondary buffer pointer in the data structure in the first place if the IOCTL OUT_BUFFER was unused. Duh! That's no solution at all. What do I do if I have to pass the secondary buffer pointer embedded in the IN_BUFFER?"

Fine. Be that way. It's still no big problem. In this case, we simply define two structure types:

//
// Structure version used by all applications
// (and for 64-bit apps in Driver)
//
typedef struct _myBuffer {
        ULONG Count;
        PUCHAR SecondaryBuffer;
        ULONG SecondaryBufferSize;
        UCHAR Buffer[BUF_SIZE];
} MY_BUFFER, * PMY_BUFFER;

//
// Structure version used exclusively by driver when
// getting data from 32-bit apps
//
typedef struct _myBuffer_32 {
        ULONG Count;
        UCHAR * POINTER_32 SecondaryBuffer;
        ULONG SecondaryBufferSize;
        UCHAR Buffer[BUF_SIZE];
} MY_BUFFER_32, * PMY_BUFFER_32;

The first structure is unchanged from the original definition. It's the structure type that's used by all applications, either 32-bit or 64-bit, when talking to the driver. Of course, for 32-bit apps, the SecondaryBuffer pointer will be 32 bits long and for 64-bit apps the SecondaryBuffer pointer will be 64 bits long.

The second structure, MY_BUFFER_32, is the version that the driver will use when it's built for 64-bit Windows, but it's talking to a 32-bit application. The driver determines whether it has received a request from a 32-bit or a 64-bit caller using the function IoIs32bitProcess(...) (and yes: the "b" in "bit" is supposed to be lower case, it's not a typo). This is done as follows:

case IOCTL_MYDRV_SEND_BUFFER:

#ifdef _WIN64
    // If it's a 32-bit caller, we validate the size of the 32-bit structure.
    if (IoIs32bitProcess(Irp)) {
        if (ios->Parameters.DeviceIoControl.InputBufferLength >= sizeof(MY_BUFFER_32)) { ...

        }
    } else
#endif
    {
        if (ios->Parameters.DeviceIoControl.InputBufferLength >= sizeof(MY_BUFFER)) { ...

        }
    }
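Once the size check passes, the driver still has to get the 32-bit caller's fields into the form the rest of its code expects. The following is a minimal user-mode sketch of that step, not actual driver code. The caller_is_32bit flag plays the role of the IoIs32bitProcess(Irp) result, the structure and function names are hypothetical, and BUF_SIZE is assumed to be 64. A real driver would also have to validate the embedded user pointer before touching it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 64  /* assumed value, for illustration only */

/* Layout sent by a 32-bit caller (models MY_BUFFER_32). */
struct my_buffer_32 {
    uint32_t Count;
    uint32_t SecondaryBuffer;      /* 32-bit pointer value */
    uint32_t SecondaryBufferSize;
    uint8_t  Buffer[BUF_SIZE];
};

/* Layout used inside a 64-bit driver (models MY_BUFFER). */
struct my_buffer_native {
    uint32_t Count;
    _Alignas(8) uint64_t SecondaryBuffer;  /* 64-bit pointer value */
    uint32_t SecondaryBufferSize;
    uint8_t  Buffer[BUF_SIZE];
};

/* Validates the input length against the caller's layout and copies
 * the request into the native layout. Returns 0 on success, -1 if
 * the input buffer is too short. */
int normalize_request(const void *in, size_t in_len, int caller_is_32bit,
                      struct my_buffer_native *out)
{
    assert(in != NULL && out != NULL);

    if (caller_is_32bit) {
        struct my_buffer_32 small;

        if (in_len < sizeof(small)) {
            return -1;
        }
        memcpy(&small, in, sizeof(small));

        memset(out, 0, sizeof(*out));
        out->Count = small.Count;
        /* Zero-extend (never sign-extend) the 32-bit pointer value. */
        out->SecondaryBuffer = (uint64_t)small.SecondaryBuffer;
        out->SecondaryBufferSize = small.SecondaryBufferSize;
        memcpy(out->Buffer, small.Buffer, BUF_SIZE);
        return 0;
    }

    if (in_len < sizeof(*out)) {
        return -1;
    }
    memcpy(out, in, sizeof(*out));
    return 0;
}
```

The design choice here is to convert once, at the top of the IOCTL handler, so that everything downstream deals with a single structure layout.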

In-Line Assembler, MMX, and Other Monstrosities

We talked about this in the previous article, so I'll make it brief. Got code in your driver that does something like the following (whatever this does - I copied it off some web site):

_asm {
     movsd  xmm1, QWORD PTR foo;
     mulsd  xmm1, xmm0;
     ...
}

If you do this, or even if you do very simple embedded assembly language type stuff, by now you should have read the memo that Hector posted to OSR Online back in June. The AMD64 compiler does not support in-line assembler code. In addition, from kernel mode, the 64-bit version of Windows does not support MMX, 3DNow, or X87 floating point.

So, get with it and change your code. Give up any outdated infatuation that you might have with assembler language. Write what you need to do in C. If you're writing a driver for some cool multimedia stuff, or you lust after fast floating point operations in kernel mode, you can use the SSE/SSE2 intrinsics directly from C, for example:

_mm_div_sd(foo, bar);

does a double-precision floating divide. Or something. Anyhow, describing this is beyond both the scope of this article (thankfully) and the scope of my knowledge. Type "SSE intrinsics" into Google and gorge yourself on the results.
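For what it's worth, here is a small sketch of a scalar double-precision divide written with SSE2 intrinsics instead of assembler. It assumes an x86 or x64 compiler that provides the standard emmintrin.h header, and the divide_sd wrapper is a made-up name. It is written as ordinary C to show the intrinsics themselves and is not taken from any driver.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Divides two doubles using the SSE2 scalar divide (divsd). */
double divide_sd(double numerator, double denominator)
{
    __m128d a = _mm_set_sd(numerator);    /* low lane = numerator   */
    __m128d b = _mm_set_sd(denominator);  /* low lane = denominator */
    __m128d q = _mm_div_sd(a, b);         /* low lane = a / b       */

    return _mm_cvtsd_f64(q);              /* extract the low lane   */
}
```

So divide_sd(10.0, 4.0) gives 2.5. The compiler picks the registers, which is exactly the work you used to do by hand in the _asm block.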

One more thing I do want to mention: you cannot simply copy your assembler language as written to a .asm file, shove an entry point at the top and a ret at the bottom, and be done. The 64-bit version of Windows for the AMD64 uses table-based exception handling, so each function needs to have a very specific prolog and epilog. This isn't hard, it's just something you need to do. If you find that you need to do this, you'll need to be sure to study the "Calling Conventions for x64 64-bit Environments" document that's supplied with the DDK.

DMA

There's not much to say about DMA. Of course, you're already using the Windows DDIs (GetScatterGatherList, for example) to support your DMA operations, and you've already tested your code with DMA Verifier to ensure that it works on systems with more than 4GB of physical memory. If this is the case, then doing DMA on 64-bit Windows on the AMD64 doesn't impose any additional requirements beyond what you're already doing. If you try to bypass the DMA DDIs, or you haven't tested on PAE type systems yet, you'd better fix that now. Your code's not right even for 32-bit Windows.
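The one habit worth checking for is code that quietly drops the upper half of a physical address. The sketch below shows the idea in portable C, assuming a hypothetical device that takes a 64-bit DMA address as two 32-bit register values. The structure and function names are invented for the illustration, and a plain uint64_t stands in for the 64-bit value carried by a PHYSICAL_ADDRESS.

```c
#include <stdint.h>

/* Register values for a hypothetical device that takes a DMA address
 * as separate low and high 32-bit halves. */
struct dma_address_regs {
    uint32_t low;
    uint32_t high;
};

/* Splits a 64-bit physical address without losing the upper half. */
struct dma_address_regs split_physical_address(uint64_t physical)
{
    struct dma_address_regs regs;

    regs.low  = (uint32_t)(physical & 0xFFFFFFFFu);
    regs.high = (uint32_t)(physical >> 32);
    return regs;
}

/* Returns 1 if the address is reachable by a 32-bit-only device,
 * 0 if any of the upper 32 bits are set. */
int fits_in_32_bits(uint64_t physical)
{
    return (physical >> 32) == 0;
}
```

A buffer located just above the 4GB line, at 0x100000000, comes back with high equal to 1 and low equal to 0. Code that only ever programs the low half would send the device to physical address zero instead.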

Installation

Since you're compiling and linking a separate version of your driver for the AMD64, you'll need to modify your INF file to install this version on 64-bit Windows running on the AMD64. But fear not, if your customer accidentally tries to install the 32-bit version of your driver, the installer will display the message "Driver is not intended for this platform."

You'll probably want to create an added install section with the .NTAMD64 decoration that automatically selects the appropriate version of your executables, as follows:

[myInstall.NTx86]
CopyFiles=copyX86Stuff
[myInstall.NTAMD64]
CopyFiles=CopyAMD64Stuff

The appropriate section is selected, based on the platform on which your INF is being executed.

Summing Up

Are you getting the idea that converting to 64-bit Windows on the AMD64 is no big deal? If so, you're hearing the right message. Most stuff stays the same. Pointers become 64 bits. If you do unusual things in your driver, like use assembler or embedded pointers in your IOCTL buffers, this is when you pay for having done so. But even then, the price you'll have to pay isn't so severe.

I like to think of the process of porting to 64-bit Windows on the AMD64 as an opportunity to go through your driver and do a bit of long-deferred private code review and maintenance. Perhaps that assembler language made sense back when your driver was initially written to run on the 386-33/DX2, but does it still make sense now? Given what we know about security issues today, does it still make sense to pass pointers into your driver stuffed into an IOCTL data buffer? Maybe so, maybe not.

In any case, the AMD64 is way cool - and 64-bit Windows is certainly the wave of the future for power users. As you can see, the risk of doing a port is small for all but the most complex drivers. With Athlon 64 systems available in volume now, and 64-bit Windows already in beta, the time is definitely right to jump on the 64-bit bandwagon.

This article was printed from OSR Online http://www.osronline.com