Recently, someone asked this innocuous question on the NTFSD forum, "What is the simplest hash technique you would use for storing file handles in a hash table?" This seemingly simple question provided some interesting "food for thought" about a topic that most of us, at one time or another, have to address. In this article we will first look at the general issues of hashing and then turn our attention to the specifics of hashing, offering up a simple demonstration package we recently used.

** **

**Hashing Fundamentals**

A hash table is a mechanism for organizing information so it is easy to find. For example, the original motivating question mentioned above related to file handles. In Windows, file handles are **ULONG_PTR** values, thus 32 or 64 bits on the various supported platforms. Even this smaller value gives us 2^{32 }values to consider. Since most of these values won't be in use, we need to have some easy way to determine whether or not a handle is in our table. The purpose of the hashing operation is to provide a quick way to look up something based upon its key.

If you know your data set extremely well, you can actually tailor the hash function to be *perfect*. In other words, you will never compute the hash value for two different input keys and come up with the same output. Frequently, however, we must deal with hash functions that are not perfect, providing us with two different input keys that have the same hash value. When this happens we are said to have a *collision*. There are various techniques for avoiding collisions such as having a series of possible hash values or changing to a different mechanism for resolving collisions.

Collisions are more frequent as the size of the data set grows relative to the size of the hash value. If you use a one byte hash value (256 possible values) and 1024 entries, you would expect a high degree of collisions. If your hash function is good, it will yield four entries per hash value. If your hash function is bad, you might end up using just one or two hash values.

Thus, when developing a practical hash table package, it is important to consider the following items:

Each of these issues should be considered prior to describing our sample hash table solution.

**Hash Table Size**

Fortunately, analyzing the size of the hash table is a fairly simple issue on the surface. The dominant factor is the size of the hash table relative to the number of entries expected in the hash table. The challenge is that sometimes it is difficult to predict the exact size of the data set. The original question posted on the NTFSD forum asking about hashing file handles becomes, "How many file handles will there be in the target system?" If this is a single process, then it is likely there will be a few dozen handles for the process and thus a modestly sized hash table should yield good results. On the other hand, if this is a hash of *file objects *on the entire system of a large server, such as a high end 64-processor IA64 box with 1TB of physical memory, then we'll need to use a much larger table that has perhaps 50,000 or more entries.

If your hashing may span both extremes, you may need to construct an extensible scheme. In the past, when constructing general purpose hash tables, one approach we used was to maintain a pre-computed list chosen from the first few hundred prime numbers. We would start with a number of hash buckets from the lowest entry in the table. When the size of the hash table exceeded the number of hash buckets by a significant percentage (e.g., 2x), then we'd rebuild the hash table using a bigger prime number. Thus, if the number of entries was very small, the table would be small. As the number of entries grew, the size of the table grew ‑ the size of the table grew logarithmically in size versus the number of entries. Think of this as mimicking the scatter of prime numbers as one wanders off to infinity. The hash value was actually computed in two parts. The first part computed a fixed size value (e.g., 32 bit) based upon the data itself and was normally a function pointer provided when the hash table was first initialized. The second part of the computation took this fixed size value and performed modulo arithmetic on it to find the actual hash bucket. Once the correct bucket was identified we would walk a doubly-linked list of entries to do full matching.

Prime numbers are popular for hash table sizes because modulo arithmetic over prime numbers has good characteristics since they form a *field* and thus have the same behavior as the integers (see http://en.wikibooks.org/wiki/Abstract_algebra:Fields for a description of this property of prime integer fields). If your collision management system is tied to doing a search of a sequence of locations, using prime numbers works extremely well.

However, developers don't traditionally like to use modulo arithmetic because it is so computationally expensive, even though this should be only one instruction in potentially many instructions. Thus, in many environments, this issue is not nearly as important now as it once was. However, the level of importance will depend on the usage pattern for your particular hash package.

Often it may be "good enough" to choose a fixed size table, which usually effectively shifts the complexity into handling collisions. For example, in one project we recently chose to go with a fixed size (256 entries) table because our goal was to minimize the lock contention on the table, not to minimize collisions. We'll discuss collisions later in the article.

** **

**Hashing Functions**

Having chosen your table size, it is equally important to choose a good hash function. Ideally, you want the chosen hash function to "scatter" the values so similar input values yield different hash values. Choosing a good hash function is not immediately obvious because it requires some insight into the actual data set. For example, when hashing file object addresses, we noted that the low order two bits and the upper bit have essentially no information value due to the way memory is allocated on the system. Thus, we decided that a "good enough" algorithm was to take the middle two bytes from a 4 byte address and XOR them together. While not perfect, we concluded it was not only sufficient for our purposes, but also because it was very cheap to compute.

A search of the Internet seems to suggest there are numerous algorithms for hash functions. Much of the analysis relates to the cost of computing the hash and how good of a job it does at "scattering" similar input values. For example, cryptographic hash functions do an excellent job of scattering the value, but are computationally expensive. The article at http://burtleburtle.net/bob/hash/doobs.html does a good job of describing a broad variety of hash functions and also provides a model for analyzing them.

**Collisions**

Having chosen a hash function and deciding how to handle table sizing, the only remaining issue is how to handle collisions. There are several techniques that can be used, but we'll talk about the technique OSR normally uses - linked list or table.

The simplest technique is to use a linked list of entries. Thus, once we have identified the right hash "bucket," we use a linked list of some sort to look through each of the values in the bucket, performing a comparison until we find the right entry or run out of entries.

Linked lists work well for small lists of a dozen or so entries. If you have a 256 entry hash table, with each hash table containing a list of 10 entries, you have 2560 entries. This technique also works well with an extensible hash table scheme, which shows that generally a linked list is sufficient.

However, beyond that, it often makes sense to use a table lookup scheme. For example, Windows includes both an AVLhash bucket.

Note: There are other collision techniques available that are not mentioned in this article.

** **

**Locking**

One of the interesting motivators for some of our hash table work has been the need to parallelize locking. For example, we had a handle lookup table that was originally protected with a single lock around the entire table. During some performance analysis, we found that this lock was the source of considerable contention. To parallelize this we tied the locks to the individual buckets of the table itself. While most discussions of hashing ignore the need for parallel access, this may be an important consideration in the kernel environment, particularly if you expect to handle parallel activity such as would be seen in a storage level driver.

Thus, to protect our hash tables we used two ERESOURCE locks. One was a lock on the entire table, which allows one thread to traverse the entire table without worrying it will change. The second was a lock on the individual hash bucket. Therefore, we had the following three table lock options:

- Acquiring the "whole hash table" lock in exclusive mode, which allows access to anything in the table. This lock, which is used sparingly (e.g., volume dismount or shutdown cases), is for walking the whole table lock.
- Locking an individual hash table "bucket" for lookup. This requires holding the "whole hash table" lock in shared mode and the "bucket lock" in shared mode. In this mode, entries can be searched, but cannot be not inserted or removed. This lock allows concurrent lookups, which were our most common case.
- Locking an individual hash table "bucket" for insert/remove. This requires holding the "whole hash table" lock in shared mode and the "bucket lock" in exclusive mode. Entries can be searched, inserted, or removed, but only within a single thread at a time.

By tying our locking into the structure of our hash table package, we were able to optimize our performance within our specific environment.

** **

**Implementation**

The code for one realization of a hash table with integrated locking is posted with this article, on OSR Online (www.osronline.com). In providing this code, keep in mind that our goal was to provide you with a demonstration, not provide a definitive solution. We're sure there are better hash algorithms, better implementations, and quite likely a few bugs in *this* implementation. However, we've successfully used this code in a test file system we constructed. Note that we customized this solution to our specific situation. In fact, we used *three* hash tables in our test file system: one hashed based on a handle (file or directory), one hashed based on the 64-bit file ID, and one hashed based on the 128-bit object ID. Therefore, this is not a generic hash table package, but rather a package customized to the specifics of the problem we were solving.

** **

**Conclusions**

Conceptually hashing seems simple, but it is often difficult to balance between common cases (small hash tables) and uncommon cases (large hash tables). Our experience is that uncommon cases are often the more performance sensitive. The more you know about the size and layout of your data set and your performance needs, the better your implementation will perform.

All but the most complex hashing implementation should take no more than a few hours to put together since there are dozens of samples to use when constructing your own.