By Bruno van Dooren
Suppose that you have a custom program that works with a specific device - for example, an application that measures helium pressure in a glove box for handling nuclear fuel. If you took a knife and carved the application open from the GUI right down to the hardware (or used WinDbg to trace the stack), you would be able to identify four distinct interface layers:
The application interface is what users have at their disposal to do what they need to do. This interface can be a scripting interface, GUI, Windows service, or anything else that fits user requirements.
The user-mode interface is the bridge between user-mode and kernel-mode code. In this layer, operating system internals and communication details are abstracted to device capabilities.
The device driver interface is the set of IOCTLs and other functions exported by the device driver to allow communication with the device itself.
The hardware interface is what hardware manufacturers provide - i.e., the PCI registers, USB end points, or anything that allows the outside world to interact with the hardware.
The Implications of This Model
The fundamental reality behind software is that it is a means to an end. There would be no need for professional software development if no one needed to use a computer to automate a process that would take more time or effort to do manually. That is, if it were possible to do manually at all.
To enable someone to write a decent application, each interface between the user and the hardware has to be designed and developed in a way that allows functionality to emerge "naturally" to the level on top of it. The application should allow users to translate functional requirements into use-appropriate application commands.
The user-mode API should allow the application to translate application commands and processing into use-appropriate API function calls. The device driver should allow the user-mode API to translate function calls securely to use-appropriate device control operations.
Finally, the device itself should allow the device driver to translate control commands to use-appropriate physical communications.
Use-appropriate in this context means "suitable for its purpose in a natural way." That is, the functionality should logically follow from the requirements and the capabilities of the layer below.
Of the four layers mentioned above, the user application layer and the actual hardware layer are of lesser interest to readers of The NT Insider. On the other hand, the user-mode API, kernel-mode API, and the interactions between them are The NT Insider's raison d'etre.
Six Key Properties
By API we mean either a user-mode API (implemented as a DLL) or a kernel-mode API (implemented as the interface to a device driver). To be used naturally and efficiently, a good API has the following six key properties:
- The API exposes the device functionality in a logical and use-appropriate manner.
- The API is completely thread-safe.
- The API is flexible and, where appropriate, usable via a variety of languages.
- The API ensures that any locking and synchronization invoked is as fine-grained as is practical.
- The API provides performance that is appropriate for both applications and devices.
- The API does not, under any circumstances, lead to a system crash, an application crash, or a deadlock, nor may it lead to violations of system security (such as disclosure of inappropriate information). This is true even when the API is invoked improperly.
This list is short, but its implications are enormous. Not only should an API do what it's supposed to do, but it also should avoid doing things it's not supposed to do.
The latter is much harder to achieve than the former. To do what it's supposed to do, an API just has to follow its requirements specification. To not do what it's not supposed to do, an API has to protect itself from an almost infinite amount of abuse that can be thrown at it.
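Two of these properties, thread safety and fine-grained locking, together with the requirement to survive improper invocation, can be sketched in a few lines of portable C++. The two-device table and the dev_ function names below are hypothetical and chosen only for illustration; a real API would keep whatever per-device state its hardware requires.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>

// Hypothetical per-device state. Each device owns its lock, so callers
// working with different devices never contend with each other.
struct Device {
    std::mutex lock;              // guards only this device's fields
    std::uint64_t samples = 0;    // number of samples recorded so far
};

static const int kDeviceCount = 2;
static Device g_devices[kDeviceCount];

static bool valid_index(int index) {
    return index >= 0 && index < kDeviceCount;
}

// Internal helper: public entry points validate the index before calling.
static Device& device_at(int index) {
    assert(valid_index(index));
    return g_devices[index];
}

// Records one sample. Returns 0 on success, -1 for an invalid device index.
extern "C" int dev_record_sample(int index) {
    if (!valid_index(index)) return -1;          // bad input becomes an error code
    Device& d = device_at(index);
    std::lock_guard<std::mutex> guard(d.lock);   // held only for the update
    ++d.samples;
    return 0;
}

// Copies the sample count into *count. Returns 0 on success, -1 on bad input.
extern "C" int dev_sample_count(int index, std::uint64_t* count) {
    if (!valid_index(index) || count == nullptr) return -1;
    Device& d = device_at(index);
    std::lock_guard<std::mutex> guard(d.lock);
    *count = d.samples;
    return 0;
}
```

Because each device carries its own lock, a caller recording samples on device 0 never waits for a caller working with device 1, and an out-of-range index or a null pointer yields an error code rather than a crash.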
Before we can discuss functional design and implementation issues, let's try to describe what an actual API looks like.
Providing a Good Kernel-Mode API
Good API design starts at the device driver interface. The first tendency of many driver writers is to throw together whatever driver interface functions are most conveniently implemented, and "fix it up in user mode." This can lead to torturous implementation constraints on the user-mode DLL. Fortunately, with a bit of forethought, such problems can usually be avoided.
In designing the driver interface, you must consider how the device will most likely be used. You also need to carefully consider issues such as how the device's namespace is structured and which types of requests (IOCTL, Read, or Write) are used.
Also of primary importance in designing the device interface are reliability and security. When crafting your interface, never make the mistake of assuming that "the only thing that will ever talk to my driver will be the user-mode DLL" that you're providing. The interfaces you provide must be both reliable and secure, irrespective of how they're used (remember the Six Key Properties, described previously).
Finally, a well-crafted driver interface will ensure that you meet application-to-device performance goals. The design that you choose for the user-mode to kernel-mode boundary will define how the device performs in ordinary use. Often, time spent in the design phase to create a well-crafted buffering scheme will pay off in much enhanced device performance (or lower CPU utilization) during use.
Providing a Good User-Mode API
Writing a device driver takes a lot of skill and effort, but the work doesn't stop there. Unless the device driver supports a predefined device interface that the system uses directly (e.g. file system drivers), or the driver is to be accessed only from kernel mode, someone has to provide an API that makes the driver's functionality easily accessible to applications.
The reason for providing a user-mode API is the same as for supplying a device driver in the first place: You want your product to be easy to use and as reliable as possible.
Application programmers don't want to deal directly with IOCTLs, file handles, overlapped results, and lots of other muck. They just want to be able to use the device functionality without too much hassle.
As a result, you'll want to provide a user-mode API that hides the gory device-driver details from their view. Instead, you will provide an interface with function names and parameters that reflect actual device capabilities. This takes a lot more than just wrapping a thin function call around each IOCTL.
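As a rough illustration of the difference between a thin IOCTL wrapper and a capability-oriented function, consider the glove box example from the introduction. Everything in this sketch is an assumption made for illustration: the control code, the raw reply format, the 12-bit sensor scaling, and the gbx_ names. The stub fake_device_control stands in for the real driver call (on Windows that role would be played by DeviceIoControl on a device handle) so that the sketch runs anywhere.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// --- Simulated driver transport (assumption, not a real driver) ------------
static const std::uint32_t CTL_READ_PRESSURE_RAW = 0x801;  // hypothetical code

static bool fake_device_control(std::uint32_t code, void* out,
                                std::size_t out_size, std::size_t* returned) {
    if (code != CTL_READ_PRESSURE_RAW || out_size < sizeof(std::uint16_t))
        return false;
    const std::uint16_t raw_counts = 2048;      // pretend sensor reading
    std::memcpy(out, &raw_counts, sizeof raw_counts);
    *returned = sizeof raw_counts;
    return true;
}

// --- Public API: speaks in device capabilities, not control codes ----------
enum GbxStatus { GBX_OK = 0, GBX_INVALID_ARGUMENT = 1, GBX_DEVICE_ERROR = 2 };

// Hypothetical scaling: 4096 counts correspond to 200 kPa.
static double counts_to_kpa(std::uint16_t raw) {
    assert(raw <= 4095);                        // caller has range-checked raw
    return raw * 200.0 / 4096.0;
}

// Reads the helium pressure in kilopascals into *kpa.
extern "C" GbxStatus gbx_read_helium_pressure_kpa(double* kpa) {
    if (kpa == nullptr) return GBX_INVALID_ARGUMENT;

    std::uint16_t raw = 0;
    std::size_t returned = 0;
    if (!fake_device_control(CTL_READ_PRESSURE_RAW, &raw, sizeof raw, &returned))
        return GBX_DEVICE_ERROR;                // transport failure
    if (returned != sizeof raw || raw > 4095)
        return GBX_DEVICE_ERROR;                // malformed or out-of-range reply

    *kpa = counts_to_kpa(raw);                  // caller sees engineering units
    return GBX_OK;
}
```

The application programmer calls one function named after what the device does and receives a value in physical units. The control code, buffer sizes, reply validation, and unit conversion all stay inside the API.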
If your API is part of a commercial package, people will be less inclined to use your product if your software has a bad reputation. This is especially true if an application programmer can choose between your product and a competing product.
If you are creating software that will be used inside your company, you don't want to be known as being someone who causes delays and cost overruns by delivering crap software. Your coworkers will hate working with you, and you'll be an ideal layoff candidate.
What Does a User-Mode API Interface Look Like?
Unless your device, driver, and application are part of a turn-key solution, you will want to enable as many developers as possible to use your product. This implies that they should be able to use it in their programming language of choice.
A .NET interface is not yet a viable solution, because a large proportion of languages do not support the .NET framework.
A COM/ActiveX interface is better in that regard, as long as you supply a type library with your component. Still, not all languages support it, and a number of issues are involved, such as data marshalling and threading apartments.
That leaves us with one other logical option: a dynamic-link library (DLL) that exports a list of functions for interfacing with a device. Virtually all programming languages support using DLLs. A DLL also has the best performance characteristics for transferring large amounts of data because no marshalling is involved.
One other advantage of using a DLL is that it becomes almost trivial to implement .NET and ActiveX classes on top of it, thus providing an easy interface for modern programming languages.
Unfortunately, even now there are still a number of things to watch out for:
- You cannot export classes to represent functionality. There is no binary definition of what a compiled C++ class should look like; different compilers will have different ideas about it. Don't use classes unless you want only to support C++ and recompile your API with every compiler.
- Do not transfer memory ownership across API boundaries. Memory allocated by a certain heap manager must be deleted by the same heap manager. Otherwise, the application will crash.
- Do not expect applications to provide function pointers. Several popular programming languages cannot provide C-style function pointers to subroutines. Using them would mean that programmers have to wrap your DLL.
- Do not export global variables. Several popular programming languages cannot access exported variables.
- Do not use structures that have arrays or strings inside them. Some programming languages have problems defining such structures because they use other representations for those data types.
- If you are using C++ to implement your interface, make sure that names of exported functions are not mangled.
- If you are using C++ to implement your interface, make sure that you do not let exceptions escape beyond the API boundaries. Client applications may not be able to handle them.
- Do not let any structured exceptions escape beyond the API boundaries. A subset of the platform functions can throw them, so you don't always have a choice to not use structured exceptions.
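- Do not use __declspec(thread) for thread local storage. It is extremely convenient, but on versions of Windows before Windows Vista a DLL that uses it cannot safely be loaded with LoadLibrary or LoadLibraryEx: the thread-local data is not set up for a dynamically loaded DLL, and touching it can cause an access violation. It can also cause non-standard behavior when an application uses delay loading on the DLL.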
At this moment you are probably thinking that there isn't much infrastructure left to use, and you are right. As long as you use only C-style functions that pass data and pointers back and forth, you are good to go.
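Several of the rules above can be seen working together in a minimal sketch of such a C-style boundary. The dev_ names, the status codes, and the internal Session class are hypothetical. On Windows the three functions would additionally be exported from the DLL, for example with __declspec(dllexport) or a module-definition file, which is omitted here so that the sketch stays portable.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>
#include <string>
#include <vector>

// Internal C++ implementation. Callers never see this class.
class Session {
public:
    explicit Session(const std::string& name)
        : name_(name), queue_{1, 2, 3, 4, 5} {}   // pretend incoming data

    // Copies up to 'cap' queued bytes into 'dst' and returns the count.
    std::size_t read(std::uint8_t* dst, std::size_t cap) {
        assert(dst != nullptr);                   // checked at the API boundary
        const std::size_t n = std::min(cap, queue_.size());
        std::copy_n(queue_.begin(), n, dst);
        queue_.erase(queue_.begin(), queue_.begin() + n);
        return n;
    }
private:
    std::string name_;
    std::vector<std::uint8_t> queue_;
};

// Plain integer status codes instead of exceptions.
enum { DEV_OK = 0, DEV_BAD_ARG = -1, DEV_NO_MEMORY = -2, DEV_FAILED = -3 };

// Opaque to callers: a public header would only declare 'struct dev_session;'.
struct dev_session {
    Session impl;
    explicit dev_session(const char* name) : impl(name) {}
};

extern "C" {   // unmangled names, no classes in the signatures

// The API allocates the session and is also the only code that frees it.
int dev_open(const char* name, dev_session** out) {
    if (name == nullptr || out == nullptr) return DEV_BAD_ARG;
    *out = nullptr;
    try {
        *out = new dev_session(name);
        return DEV_OK;
    } catch (const std::bad_alloc&) {
        return DEV_NO_MEMORY;                     // known cause, handled gracefully
    } catch (...) {
        return DEV_FAILED;                        // no C++ exception crosses the boundary
    }
}

// The caller owns 'buffer'; the API only fills it and reports the byte count.
int dev_read(dev_session* s, std::uint8_t* buffer, std::size_t capacity,
             std::size_t* copied) {
    if (s == nullptr || buffer == nullptr || copied == nullptr) return DEV_BAD_ARG;
    try {
        *copied = s->impl.read(buffer, capacity);
        return DEV_OK;
    } catch (...) {
        return DEV_FAILED;
    }
}

void dev_close(dev_session* s) { delete s; }      // freed by the allocating side

}  // extern "C"
```

Memory ownership never crosses the boundary: the session is created and destroyed inside the API, and data travels through a buffer that the caller allocates and frees. Data is fetched by calling dev_read rather than through a callback, so no function pointers are required of the client language.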
By now you might think, "What programming languages do not support function pointers, complex structures, or a .NET interface?" If you go beyond IT and consumer software, many non-Microsoft tools and programming languages have their own way of doing things, as well as their own internal data types. The LabVIEW programming language, for example, is widely used by NASA, ESA, and many other big companies to automate measurement systems. It has native drivers for all sorts of measurement hardware, image acquisition, motion control, etc. It can use third-party DLLs to add functionality, but the DLL interface does not support function pointers or complex C-type structures, nor did it have a functional .NET interface until last year.
This is where your design decisions can make the difference between "Works flawlessly, immediately" and "Works most of the time after jumping through hoops."
Nuance of the "Do Not Crash" Commandment
The requirement to prevent structured exceptions from leaving the API boundary is a bit flexible. You should definitely catch all exceptions of which you can judge the consequences.
A good example of this is the exception that could be raised when you initialize a critical section. In that case, you know the cause and the implication, so you can handle this situation gracefully.
There are also situations in which you do not have this luxury.
Consider the situation where the API user passes a buffer pointer to the API for storage of incoming data. This pointer could point to memory located somewhere on the stack. Somehow the buffer size is calculated incorrectly, and the code in the API overflows the buffer with several thousand bytes of data. At that point, the stack is ruined, and all data structures in the overwritten memory area are corrupted. Sooner or later there will be an exception.
It is possible to surround all API function bodies with exception handlers, so that all exceptions are caught. But this will only hide the real problem.
The internal state of your API is completely unknown, so from that moment onward, the only safe thing to do would be to fail all API function calls.
The state of the application is also unknown, so anything it does afterward can cause further data corruption or a secondary exception.
As a result, the best thing to do is to let the exception escape out of the API and crash the calling application. An immediate crash prevents further data corruption and also makes reproducing the problem easier for developers, testers, or users.
This concludes the first article in my API development series. Each follow-up article will focus on a specific topic that is relevant for API development, explaining techniques, best practices, and common pitfalls.
Properly designing and implementing a good API is a lot of work. You have to know that there is a good reason to make this effort. Otherwise, you might be tempted to cut corners that could cause problems later on. This is true for both user-mode and kernel-mode sides.
If you create an API that many different people will use, walking the straight and narrow is the only option for long-term success. Stray but a little, and you will have a doomed project on your hands sooner or later.
So before I could discuss the technical side of things, I needed to list the rules and constraints that you have to live by. If you can follow them, you are ready to think about implementation.
OSR Staff members contributed to this article.
Bruno van Dooren is a consultant for CIT Engineering nv and a Microsoft MVP, specializing in Visual C++. He can be reached at firstname.lastname@example.org. He also maintains a blog at www.msmvps.com/blogs/vanDooren, where he blogs about topics related to C++ and the Windows platform.