element searched for is equally likely to be any one of the elements in
the table, so the longer the list, the more likely that the search is for one
of its elements. Even so, the expected search time still turns out to be
Θ(1 + α).
Theorem 11.2
In a hash table in which collisions are resolved by chaining, a successful
search takes Θ(1 + α) time on average, under the assumption of
independent uniform hashing.
Proof We assume that the element being searched for is equally likely
to be any of the n elements stored in the table. The number of elements
examined during a successful search for an element x is 1 more than the
number of elements that appear before x in x’s list. Because new elements are placed at the front of the list, elements before x in the list
were all inserted after x was inserted. Let xi denote the ith element inserted into the table, for i = 1, 2, …, n, and let ki = xi.key.
Our analysis uses indicator random variables extensively. For each
slot q in the table and for each pair of distinct keys ki and kj, we define the indicator random variable
Xijq = I {the search is for xi, h( ki) = q, and h( kj) = q}.


That is, Xijq = 1 when keys ki and kj collide at slot q and the search is for element xi. Because Pr{the search is for xi} = 1/n, Pr{h(ki) = q} = 1/m, Pr{h(kj) = q} = 1/m, and these events are all independent, we have that Pr{Xijq = 1} = 1/(nm²). Lemma 5.1 on page 130 gives E[Xijq] = 1/(nm²).
Next, we define, for each element xj, the indicator random variable

Yj = I{xj appears in a list prior to the element being searched for}
   = Σ_{q=0}^{m−1} Σ_{i=1}^{j−1} Xijq,

since at most one of the Xijq equals 1, namely when the element xi being searched for belongs to the same list as xj (pointed to by slot q), and i < j (so that xi appears after xj in the list).
Our final random variable is Z, which counts how many elements
appear in the list prior to the element being searched for:

Z = Σ_{j=1}^{n} Yj.

Because we must count the element being searched for as well as all
those preceding it in its list, we wish to compute E[Z + 1]. Using linearity of expectation (equation (C.24) on page 1192), we have

E[Z + 1] = 1 + Σ_{j=1}^{n} E[Yj]
         = 1 + Σ_{j=1}^{n} Σ_{q=0}^{m−1} Σ_{i=1}^{j−1} E[Xijq]
         = 1 + Σ_{j=1}^{n} m(j − 1) · (1/(nm²))
         = 1 + (1/(nm)) Σ_{j=1}^{n} (j − 1)
         = 1 + (1/(nm)) · (n(n − 1)/2)
         = 1 + (n − 1)/(2m)
         = 1 + α/2 − α/(2n).

Thus, the total time required for a successful search (including the time for computing the hash function) is Θ(2 + α/2 − α/(2n)) = Θ(1 + α).
▪
What does this analysis mean? If the number of elements in the table
is at most proportional to the number of hash-table slots, we have n =
O( m) and, consequently, α = n/ m = O( m)/ m = O(1). Thus, searching takes constant time on average. Since insertion takes O(1) worst-case
time and deletion takes O(1) worst-case time when the lists are doubly
linked (assuming that the list element to be deleted is known, and not
just its key), we can support all dictionary operations in O(1) time on
average.
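As a concrete illustration of these operations, here is a minimal chained hash table in Python (our own sketch, not code from the text; Python's built-in hash stands in for the idealized independent uniform hash function, and new elements are prepended to their chain as in the analysis above):

```python
class ChainedHashTable:
    """Hash table with collision resolution by chaining (illustrative sketch)."""

    def __init__(self, m):
        self.m = m                           # number of slots
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _h(self, key):
        # Stand-in hash function; the analysis assumes independent
        # uniform hashing, which Python's hash() only approximates.
        return hash(key) % self.m

    def insert(self, key, value):
        # O(1) worst case: prepend to the chain at the hashed slot.
        self.slots[self._h(key)].insert(0, (key, value))

    def search(self, key):
        # Expected Theta(1 + alpha) time under uniform hashing.
        for k, v in self.slots[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # O(1) with doubly linked lists and a pointer to the element;
        # this list-based sketch scans the chain instead.
        chain = self.slots[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```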
The analysis in the preceding two theorems depends only on two
essential properties of independent uniform hashing: uniformity (each
key is equally likely to hash to any one of the m slots), and
independence (so any two distinct keys collide with probability 1/ m).
Exercises
11.2-1
You use a hash function h to hash n distinct keys into an array T of length m. Assuming independent uniform hashing, what is the expected
number of collisions? More precisely, what is the expected cardinality of
{{ k 1, k 2} : k 1 ≠ k 2 and h( k 1) = h( k 2)}?
11.2-2
Consider a hash table with 9 slots and the hash function h( k) = k mod 9.
Demonstrate what happens upon inserting the keys 5, 28, 19, 15, 20, 33,
12, 17, 10 with collisions resolved by chaining.
11.2-3
Professor Marley hypothesizes that he can obtain substantial
performance gains by modifying the chaining scheme to keep each list
in sorted order. How does the professor’s modification affect the
running time for successful searches, unsuccessful searches, insertions,
and deletions?
11.2-4
Suggest how to allocate and deallocate storage for elements within the
hash table itself by creating a “free list”: a linked list of all the unused
slots. Assume that one slot can store a flag and either one element plus a
pointer or two pointers. All dictionary and free-list operations should
run in O(1) expected time. Does the free list need to be doubly linked, or
does a singly linked free list suffice?
11.2-5
You need to store a set of n keys in a hash table of size m. Show that if the keys are drawn from a universe U with | U| > ( n − 1) m, then U has a subset of size n consisting of keys that all hash to the same slot, so that
the worst-case searching time for hashing with chaining is Θ( n).
11.2-6
You have stored n keys in a hash table of size m, with collisions resolved by chaining, and you know the length of each chain, including the
length L of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns
it in expected time O( L · (1 + 1/ α)).
11.3 Hash functions
For hashing to work well, it needs a good hash function. Along with
being efficiently computable, what properties does a good hash function
have? How do you design good hash functions?
This section first attempts to answer these questions based on two ad
hoc approaches for creating hash functions: hashing by division and
hashing by multiplication. Although these methods work well for some
sets of input keys, they are limited because they try to provide a single
fixed hash function that works well on any data—an approach called
static hashing.
We then see that provably good average-case performance for any
data can be obtained by designing a suitable family of hash functions and choosing a hash function at random from this family at runtime,
independent of the data to be hashed. The approach we examine is
called random hashing. A particular kind of random hashing, universal
hashing, works well. As we saw with quicksort in Chapter 7,
randomization is a powerful algorithmic design tool.
What makes a good hash function?
A good hash function satisfies (approximately) the assumption of
independent uniform hashing: each key is equally likely to hash to any
of the m slots, independently of where any other keys have hashed to.
What does “equally likely” mean here? If the hash function is fixed, any
probabilities would have to be based on the probability distribution of
the input keys.
Unfortunately, you typically have no way to check this condition,
unless you happen to know the probability distribution from which the
keys are drawn. Moreover, the keys might not be drawn independently.
Occasionally you might know the distribution. For example, if you
know that the keys are random real numbers k independently and
uniformly distributed in the range 0 ≤ k < 1, then the hash function

h(k) = ⌊km⌋

satisfies the condition of independent uniform hashing.
A good static hashing approach derives the hash value in a way that
you expect to be independent of any patterns that might exist in the
data. For example, the “division method” (discussed in Section 11.3.1) computes the hash value as the remainder when the key is divided by a
specified prime number. This method may give good results, if you
(somehow) choose a prime number that is unrelated to any patterns in
the distribution of keys.
Random hashing, described in Section 11.3.2, picks the hash function to be used at random from a suitable family of hashing
functions. This approach removes any need to know anything about the
probability distribution of the input keys, as the randomization
necessary for good average-case behavior then comes from the (known)
random process used to pick the hash function from the family of hash
functions, rather than from the (unknown) process used to create the
input keys. We recommend that you use random hashing.
Keys are integers, vectors, or strings
In practice, a hash function is designed to handle keys that are one of
the following two types:
A short nonnegative integer that fits in a w-bit machine word.
Typical values for w would be 32 or 64.
A short vector of nonnegative integers, each of bounded size. For
example, each element might be an 8-bit byte, in which case the
vector is often called a (byte) string. The vector might be of
variable length.
To begin, we assume that keys are short nonnegative integers. Handling
vector keys is more complicated and is discussed in Sections 11.3.5 and 11.5.
11.3.1 Static hashing
Static hashing uses a single, fixed hash function. The only
randomization available is through the (usually unknown) distribution
of input keys. This section discusses two standard approaches for static
hashing: the division method and the multiplication method. Although
static hashing is no longer recommended, the multiplication method
also provides a good foundation for “nonstatic” hashing—better known
as random hashing—where the hash function is chosen at random from
a suitable family of hash functions.
The division method
The division method for creating hash functions maps a key k into one of m slots by taking the remainder of k divided by m. That is, the hash function is
h( k) = k mod m.
For example, if the hash table has size m = 12 and the key is k = 100,
then h( k) = 4. Since it requires only a single division operation, hashing by division is quite fast.
The division method may work well when m is a prime not too close
to an exact power of 2. There is no guarantee that this method provides
good average-case performance, however, and it may complicate
applications since it constrains the size of the hash tables to be prime.
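A sketch of the division method (the function name is ours):

```python
def division_hash(k, m):
    # Division method: h(k) = k mod m.
    # Choose m to be a prime not too close to an exact power of 2.
    return k % m
```

For instance, `division_hash(100, 12)` reproduces the example above, and with m = 9 it gives the slots used in Exercise 11.2-2.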
The multiplication method
The general multiplication method for creating hash functions operates
in two steps. First, multiply the key k by a constant A in the range 0 < A
< 1 and extract the fractional part of kA. Then, multiply this value by m and take the floor of the result. That is, the hash function is
h(k) = ⌊m(kA mod 1)⌋,

where “kA mod 1” means the fractional part of kA, that is, kA − ⌊kA⌋.
The general multiplication method has the advantage that the value of
m is not critical and you can choose it independently of how you choose
the multiplicative constant A.
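The general multiplication method can be sketched directly (our own code; note that floating-point rounding can make this differ slightly from exact arithmetic for very large keys):

```python
def multiplication_hash(k, m, A):
    # h(k) = floor(m * (k*A mod 1)) for a constant 0 < A < 1.
    frac = (k * A) % 1.0     # fractional part of k*A
    return int(m * frac)     # floor, since frac >= 0
```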
Figure 11.4 The multiply-shift method to compute a hash function. The w-bit representation of the key k is multiplied by the w-bit value a = A · 2^w. The ℓ highest-order bits of the lower w-bit half of the product form the desired hash value ha(k).
The multiply-shift method
In practice, the multiplication method is best in the special case where
the number m of hash-table slots is an exact power of 2, so that m = 2^ℓ
for some integer ℓ, where ℓ ≤ w and w is the number of bits in a machine
word. If you choose a fixed w-bit positive integer a = A · 2^w, where 0 < A
< 1 as in the multiplication method so that a is in the range 0 < a < 2^w, you can implement the function on most computers as follows. We
assume that a key k fits into a single w-bit word.
Referring to Figure 11.4, first multiply k by the w-bit integer a. The result is a 2w-bit value r1 · 2^w + r0, where r1 is the high-order w-bit word of the product and r0 is the low-order w-bit word of the product. The
desired ℓ-bit hash value consists of the ℓ most significant bits of r0.
(Since r1 is ignored, the hash function can be implemented on a
computer that produces only a w-bit product given two w-bit inputs, that is, where the multiplication operation computes modulo 2^w.)
In other words, you define the hash function h = ha, where

ha(k) = (ka mod 2^w) ⋙ (w − ℓ)

for a fixed nonzero w-bit value a. Since the product ka of two w-bit words occupies 2w bits, taking this product modulo 2^w zeroes out the
high-order w bits (r1), leaving only the low-order w bits (r0). The ⋙
operator performs a logical right shift by w − ℓ bits, shifting zeros into the vacated positions on the left, so that the ℓ most significant bits of r0
move into the ℓ rightmost positions. (It’s the same as dividing by 2^(w−ℓ)
and taking the floor of the result.) The resulting value equals the ℓ most
significant bits of r0. The hash function ha can be implemented with three machine instructions: multiplication, subtraction, and logical right
shift.
As an example, suppose that k = 123456, ℓ = 14, m = 2^14 = 16384, and w = 32. Suppose further that we choose a = 2654435769 (following
a suggestion of Knuth [261]). Then ka = 327706022297664 = (76300 ·
2^32) + 17612864, and so r1 = 76300 and r0 = 17612864. The 14 most
significant bits of r0 yield the value ha(k) = 67.
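The computation above can be checked directly in Python (a sketch; Python integers are unbounded, so the mod-2^w step is done with a bit mask):

```python
def multiply_shift_hash(k, a, w=32, l=14):
    # Multiply-shift: keep the low-order w bits of k*a (that is, r0),
    # then logically right-shift by w - l to keep r0's top l bits.
    r0 = (k * a) & ((1 << w) - 1)
    return r0 >> (w - l)

# Reproduces the worked example: k = 123456, a = 2654435769 -> 67.
```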
Even though the multiply-shift method is fast, it doesn’t provide any
guarantee of good average-case performance. The universal hashing
approach presented in the next section provides such a guarantee. A simple randomized variant of the multiply-shift method works well on
the average, when the program begins by picking a as a randomly
chosen odd integer.
11.3.2 Random hashing
Suppose that a malicious adversary chooses the keys to be hashed by
some fixed hash function. Then the adversary can choose n keys that all
hash to the same slot, yielding an average retrieval time of Θ( n). Any static hash function is vulnerable to such terrible worst-case behavior.
The only effective way to improve the situation is to choose the hash
function randomly in a way that is independent of the keys that are actually going to be stored. This approach is called random hashing. A
special case of this approach, called universal hashing, can yield provably
good performance on average when collisions are handled by chaining,
no matter which keys the adversary chooses.
To use random hashing, at the beginning of program execution you
select the hash function at random from a suitable family of functions.
As in the case of quicksort, randomization guarantees that no single
input always evokes worst-case behavior. Because you randomly select
the hash function, the algorithm can behave differently on each
execution, even for the same set of keys to be hashed, guaranteeing
good average-case performance.
Let H be a finite family of hash functions that map a given universe
U of keys into the range {0, 1, …, m − 1}. Such a family is said to be
universal if for each pair of distinct keys k 1, k 2 ∈ U, the number of hash functions h ∈ H for which h( k 1) = h( k 2) is at most |H|/ m. In other words, with a hash function randomly chosen from H, the chance of a
collision between distinct keys k 1 and k 2 is no more than the chance 1/ m of a collision if h( k 1) and h( k 2) were randomly and independently chosen from the set {0, 1, …, m − 1}.
Independent uniform hashing is the same as picking a hash function
uniformly at random from a family of m^n hash functions, each member
of that family mapping the n keys to the m hash values in a different
way. Every independent uniform family of hash functions is
universal, but the converse need not be true: consider the case where U
= {0, 1, …, m − 1} and the only hash function in the family is the identity function. The probability that two distinct keys collide is zero,
even though each key hashes to a fixed value.
The following corollary to Theorem 11.2 on page 279 says that
universal hashing provides the desired payoff: it becomes impossible for
an adversary to pick a sequence of operations that forces the worst-case
running time.
Corollary 11.3
Using universal hashing and collision resolution by chaining in an
initially empty table with m slots, it takes Θ( s) expected time to handle any sequence of s INSERT, SEARCH, and DELETE operations
containing n = O( m) INSERT operations.
Proof The INSERT and DELETE operations take constant time.
Since the number n of insertions is O( m), we have that α = O(1).
Furthermore, the expected time for each SEARCH operation is O(1),
which can be seen by examining the proof of Theorem 11.2. That
analysis depends only on collision probabilities, which are 1/ m for any
pair k1, k2 of keys by the choice of an independent uniform hash function in that theorem. Using a universal family of hash functions
here instead of independent uniform hashing changes the
probability of collision from exactly 1/m to at most 1/m. By linearity of expectation, therefore, the expected time for the entire sequence of s
operations is O(s). Since each operation takes Ω(1) time, the Θ(s) bound follows.
▪
11.3.3 Achievable properties of random hashing
There is a rich literature on the properties a family H of hash functions
can have, and how they relate to the efficiency of hashing. We
summarize a few of the most interesting ones here.
Let H be a family of hash functions, each with domain U and range
{0, 1, …, m − 1}, and let h be any hash function that is picked uniformly at random from H. The probabilities mentioned are probabilities over
the picks of h.
The family H is uniform if for any key k in U and any slot q in the range {0, 1, …, m − 1}, the probability that h( k) = q is 1/ m.
The family H is universal if for any distinct keys k 1 and k 2 in U, the probability that h( k 1) = h( k 2) is at most 1/ m.
The family H of hash functions is ϵ-universal if for any distinct
keys k1 and k2 in U, the probability that h(k1) = h(k2) is at most ϵ. Therefore, a universal family of hash functions is also 1/m-universal.
The family H is d-independent if for any distinct keys k1, k2, …, kd in U and any slots q1, q2, …, qd, not necessarily distinct, in {0, 1, …, m − 1}, the probability that h(ki) = qi for i = 1, 2, …, d is 1/m^d.
Universal hash-function families are of particular interest, as they are
the simplest type supporting provably efficient hash-table operations for
any input data set. Many other interesting and desirable properties, such
as those noted above, are also possible and allow for efficient specialized
hash-table operations.
11.3.4 Designing a universal family of hash functions
This section presents two ways to design a universal (or ϵ-universal)
family of hash functions: one based on number theory and another
based on a randomized variant of the multiply-shift method presented
in Section 11.3.1. The first method is a bit easier to prove universal, but the second method is newer and faster in practice.
A universal family of hash functions based on number theory



We can design a universal family of hash functions using a little number
theory. You may wish to refer to Chapter 31 if you are unfamiliar with basic concepts in number theory.
Begin by choosing a prime number p large enough so that every
possible key k lies in the range 0 to p − 1, inclusive. We assume here that p has a “reasonable” length. (See Section 11.3.5 for a discussion of methods for handling long input keys, such as variable-length strings.)
Let ℤp denote the set {0, 1, …, p − 1}, and let ℤ*p denote the set {1, 2,
…, p − 1}. Since p is prime, we can solve equations modulo p with the methods given in Chapter 31. Because the size of the universe of keys is greater than the number of slots in the hash table (otherwise, just use
direct addressing), we have p > m.
Given any a ∈ ℤ*p and any b ∈ ℤp, define the hash function hab as a
linear transformation followed by reductions modulo p and then
modulo m:

hab(k) = ((ak + b) mod p) mod m.    (11.3)
For example, with p = 17 and m = 6, we have
h 3,4(8) = ((3 · 8 + 4) mod 17) mod 6
= (28 mod 17) mod 6
= 11 mod 6
= 5.
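The same computation in Python (a sketch with our own function name):

```python
def h_ab(k, a, b, p, m):
    # h_ab(k) = ((a*k + b) mod p) mod m, with a in {1, ..., p-1}
    # and b in {0, ..., p-1}, p prime, p > m.
    return ((a * k + b) % p) % m

# The example above: p = 17, m = 6 gives h_ab(8, 3, 4, 17, 6) == 5.
```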
Given p and m, the family of all such hash functions is

Hpm = {hab : a ∈ ℤ*p and b ∈ ℤp}.    (11.4)

Each hash function hab maps ℤp to ℤm. This family of hash functions has the nice property that the size m of the output range (which is the
size of the hash table) is arbitrary: it need not be prime. Since you can
choose from among p − 1 values for a and p values for b, the family Hpm contains p(p − 1) hash functions.
Theorem 11.4
The family H pm of hash functions defined by equations (11.3) and
(11.4) is universal.
Proof Consider two distinct keys k 1 and k 2 from ℤ p, so that k 1 ≠ k 2.
For a given hash function hab, let
r 1 = ( ak 1 + b) mod p,
r 2 = ( ak 2 + b) mod p.
We first note that r1 ≠ r2. Why? Observe that r1 − r2 ≡ a(k1 − k2) (mod p). Because p is prime and both a and (k1 − k2) are nonzero modulo p, Theorem 31.6 on page 908 tells us that their product a(k1 − k2) is also nonzero modulo p, and hence r1 ≠ r2. Therefore, when computing any hab ∈
Hpm, distinct inputs k1 and k2 map to distinct values r1 and r2
modulo p, and there are no collisions yet at the “mod p level.”
Moreover, each of the p(p − 1) possible choices for the pair (a, b) with a
≠ 0 yields a different resulting pair (r1, r2) with r1 ≠ r2, since we can solve for a and b given r1 and r2:

a = ((r1 − r2)((k1 − k2)⁻¹ mod p)) mod p,
b = (r1 − ak1) mod p,

where ((k1 − k2)⁻¹ mod p) denotes the unique multiplicative inverse, modulo p, of k1 − k2. For each of the p possible values of r1, there are only p − 1 possible values of r2 that do not equal r1, making only p(p −
1) possible pairs (r1, r2) with r1 ≠ r2. Therefore, there is a one-to-one correspondence between pairs (a, b) with a ≠ 0 and pairs (r1, r2) with r1
≠ r2. Thus, for any given pair of distinct inputs k1 and k2, if we pick (a, b) uniformly at random from ℤ*p × ℤp, the resulting pair (r1, r2) is
equally likely to be any pair of distinct values modulo p.
Therefore, the probability that distinct keys k1 and k2 collide is equal to the probability that r1 ≡ r2 (mod m) when r1 and r2 are randomly
chosen as distinct values modulo p. For a given value of r1, of the p − 1
possible remaining values for r2, the number of values r2 such that r2 ≠
r1 and r2 ≡ r1 (mod m) is at most

⌈p/m⌉ − 1 ≤ ((p + m − 1)/m) − 1
           = (p − 1)/m.

The probability that r2 collides with r1 when reduced modulo m is at most ((p − 1)/m)/(p − 1) = 1/m, since r2 is equally likely to be any of the p − 1 values in ℤp that are different from r1, but at most (p − 1)/m of those values are equivalent to r1 modulo m.
Therefore, for any pair of distinct values k 1, k 2 ∈ ℤ p,
Pr{ hab( k 1) = hab( k 2)} ≤ 1/ m,
so that H pm is indeed universal.
▪
A 2/ m-universal family of hash functions based on the multiply-shift
method
We recommend that in practice you use the following hash-function
family based on the multiply-shift method. It is exceptionally efficient
and (although we omit the proof) provably 2/ m-universal. Define H to
be the family of multiply-shift hash functions with odd constants a:

H = {ha : a is odd and 0 < a < 2^w}.    (11.5)
Theorem 11.5
The family of hash functions H given by equation (11.5) is 2/ m-
universal.
▪
That is, the probability that any two distinct keys collide is at most
2/ m. In many practical situations, the speed of computing the hash
function more than compensates for the higher upper bound on the probability that two distinct keys collide when compared with a
universal hash function.
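Picking a member of this family at program start might look like this (our own sketch):

```python
import random

def random_multiply_shift(w=64, l=10):
    # Choose a uniformly random odd w-bit multiplier a, selecting one
    # member of the 2/m-universal multiply-shift family (m = 2^l).
    a = random.randrange(1, 1 << w, 2)   # odd a with 0 < a < 2^w
    mask = (1 << w) - 1
    def h(k):
        return ((k * a) & mask) >> (w - l)
    return h
```

Once chosen, the same function is used for the lifetime of the table, so repeated hashes of a key are consistent.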
11.3.5 Hashing long inputs such as vectors or strings
Sometimes hash function inputs are so long that they cannot be easily
encoded modulo a reasonably sized prime number p or encoded within
a single word of, say, 64 bits. As an example, consider the class of
vectors, such as vectors of 8-bit bytes (which is how strings in many
programming languages are stored). A vector might have an arbitrary
nonnegative length, in which case the length of the input to the hash
function may vary from input to input.
Number-theoretic approaches
One way to design good hash functions for variable-length inputs is to
extend the ideas used in Section 11.3.4 to design universal hash functions. Exercise 11.3-6 explores one such approach.
Cryptographic hashing
Another way to design a good hash function for variable-length inputs
is to use a hash function designed for cryptographic applications.
Cryptographic hash functions are complex pseudorandom functions,
designed for applications requiring properties beyond those needed
here, but are robust, widely implemented, and usable as hash functions
for hash tables.
A cryptographic hash function takes as input an arbitrary byte string
and returns a fixed-length output. For example, the NIST standard
deterministic cryptographic hash function SHA-256 [346] produces a 256-bit (32-byte) output for any input.
Some chip manufacturers include instructions in their CPU
architectures to provide fast implementations of some cryptographic
functions. Of particular interest are instructions that efficiently
implement rounds of the Advanced Encryption Standard (AES), the
“AES-NI” instructions. These instructions execute in a few tens of
nanoseconds, which is generally fast enough for use with hash tables. A
message authentication code such as CBC-MAC based on AES and the
use of the AES-NI instructions could be a useful and efficient hash
function. We don’t pursue the potential use of specialized instruction
sets further here.
Cryptographic hash functions are useful because they provide a way
of implementing an approximate version of a random oracle. As noted
earlier, a random oracle is equivalent to an independent uniform hash
function family. From a theoretical point of view, a random oracle is an
unachievable ideal: a deterministic function that provides a randomly
selected output for each input. Because it is deterministic, it provides
the same output if queried again for the same input. From a practical
point of view, constructions of hash function families based on
cryptographic hash functions are sensible substitutes for random
oracles.
There are many ways to use a cryptographic hash function as a hash
function. For example, we could define
h( k) = SHA-256( k) mod m.
To define a family of such hash functions one may prepend a “salt”
string a to the input before hashing it, as in
ha( k) = SHA-256( a ‖ k) mod m,
where a ‖ k denotes the string formed by concatenating the strings a and k. The literature on message authentication codes (MACs) provides
additional approaches.
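Using Python's standard library, the salted construction might be sketched as follows (the function and parameter names are ours):

```python
import hashlib

def sha256_table_hash(key: bytes, salt: bytes, m: int) -> int:
    # ha(k) = SHA-256(a || k) mod m, where the salt a is prepended
    # to the key before hashing.
    digest = hashlib.sha256(salt + key).digest()
    return int.from_bytes(digest, 'big') % m
```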
Cryptographic approaches to hash-function design are becoming
more practical as computers arrange their memories in hierarchies of
differing capacities and speeds. Section 11.5 discusses one hash-function design based on the RC6 encryption method.
Exercises
11.3-1
You wish to search a linked list of length n, where each element contains
a key k along with a hash value h( k). Each key is a long character string.
How might you take advantage of the hash values when searching the
list for an element with a given key?
11.3-2
You hash a string of r characters into m slots by treating it as a radix-128 number and then using the division method. You can represent the
number m as a 32-bit computer word, but the string of r characters, treated as a radix-128 number, takes many words. How can you apply
the division method to compute the hash value of the character string
without using more than a constant number of words of storage outside
the string itself?
11.3-3
Consider a version of the division method in which h(k) = k mod m, where m = 2^p − 1 and k is a character string interpreted in radix 2^p.
Show that if string x can be converted to string y by permuting its characters, then x and y hash to the same value. Give an example of an
application in which this property would be undesirable in a hash
function.
11.3-4
Consider a hash table of size m = 1000 and a corresponding hash
function h(k) = ⌊m(kA mod 1)⌋ for A = (√5 − 1)/2. Compute the
locations to which the keys 61, 62, 63, 64, and 65 are mapped.
★ 11.3-5
Show that any ϵ-universal family H of hash functions from a finite set
U to a finite set Q has ϵ ≥ 1/| Q| − 1/| U|.
★ 11.3-6
Let U be the set of d-tuples of values drawn from ℤp, and let Q = ℤp, where p is prime. Define the hash function hb : U → Q for b ∈ ℤp on an input d-tuple 〈a0, a1, …, a(d−1)〉 from U as

hb(〈a0, a1, …, a(d−1)〉) = (Σ_{j=0}^{d−1} aj b^j) mod p,

and let H = {hb : b ∈ ℤp}. Argue that H is ϵ-universal for ϵ = (d −
1)/p. (Hint: See Exercise 31.4-4.)
11.4 Open addressing
This section describes open addressing, a method for collision
resolution that, unlike chaining, does not make use of storage outside of
the hash table itself. In open addressing, all elements occupy the hash table itself. That is, each table entry contains either an element of the
dynamic set or NIL. No lists or elements are stored outside the table,
unlike in chaining. Thus, in open addressing, the hash table can “fill up”
so that no further insertions can be made. One consequence is that the
load factor α can never exceed 1.
Collisions are handled as follows: when a new element is to be
inserted into the table, it is placed in its “first-choice” location if
possible. If that location is already occupied, the new element is placed
in its “second-choice” location. The process continues until an empty
slot is found in which to place the new element. Different elements have
different preference orders for the locations.
To search for an element, systematically examine the preferred table
slots for that element, in order of decreasing preference, until either you
find the desired element or you find an empty slot and thus verify that
the element is not in the table.
Of course, you could use chaining and store the linked lists inside the
hash table, in the otherwise unused hash-table slots (see Exercise 11.2-
4), but the advantage of open addressing is that it avoids pointers
altogether. Instead of following pointers, you compute the sequence of
slots to be examined. The memory freed by not storing pointers
provides the hash table with a larger number of slots in the same
amount of memory, potentially yielding fewer collisions and faster
retrieval.
To perform insertion using open addressing, successively examine, or
probe, the hash table until you find an empty slot in which to put the
key. Instead of being fixed in the order 0, 1, …, m − 1 (which implies a
Θ( n) search time), the sequence of positions probed depends upon the
key being inserted. To determine which slots to probe, the hash function
includes the probe number (starting from 0) as a second input. Thus, the
hash function becomes
h : U × {0, 1, …, m − 1} → {0, 1, …, m − 1}.
Open addressing requires that for every key k, the probe sequence 〈 h( k, 0), h( k, 1), …, h( k, m − 1)〉 be a permutation of 〈0, 1, …, m − 1〉, so that every hash-table position is eventually considered as a slot for a new key
as the table fills up. The HASH-INSERT procedure on the following
page assumes that the elements in the hash table T are keys with no satellite information: the key k is identical to the element containing key
k. Each slot contains either a key or NIL (if the slot is empty). The HASH-INSERT procedure takes as input a hash table T and a key
k that is assumed to be not already present in the hash table. It either
returns the slot number where it stores key k or flags an error because
the hash table is already full.
HASH-INSERT(T, k)
1  i = 0
2  repeat
3      q = h(k, i)
4      if T[q] == NIL
5          T[q] = k
6          return q
7      else i = i + 1
8  until i == m
9  error “hash table overflow”
HASH-SEARCH(T, k)
1  i = 0
2  repeat
3      q = h(k, i)
4      if T[q] == k
5          return q
6      i = i + 1
7  until T[q] == NIL or i == m
8  return NIL
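The two procedures translate almost line for line into Python (a sketch: the table is a list with None playing the role of NIL, and h is passed in as a two-argument probe function):

```python
def hash_insert(T, k, h):
    # HASH-INSERT: probe h(k,0), h(k,1), ... until an empty slot.
    m = len(T)
    for i in range(m):
        q = h(k, i)
        if T[q] is None:
            T[q] = k
            return q
    raise OverflowError("hash table overflow")

def hash_search(T, k, h):
    # HASH-SEARCH: stop on a match, an empty slot, or after m probes.
    m = len(T)
    for i in range(m):
        q = h(k, i)
        if T[q] == k:
            return q
        if T[q] is None:
            return None
    return None
```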
The algorithm for searching for key k probes the same sequence of
slots that the insertion algorithm examined when key k was inserted.
Therefore, the search can terminate (unsuccessfully) when it finds an
empty slot, since k would have been inserted there and not later in its
probe sequence. The procedure HASH-SEARCH takes as input a hash
table T and a key k, returning q if it finds that slot q contains key k, or NIL if key k is not present in table T.
Deletion from an open-address hash table is tricky. When you delete
a key from slot q, it would be a mistake to mark that slot as empty by
simply storing NIL in it. If you did, you might be unable to retrieve any
key k for which slot q was probed and found occupied when k was inserted. One way to solve this problem is by marking the slot, storing
in it the special value DELETED instead of NIL. The HASH-INSERT
procedure then has to treat such a slot as empty so that it can insert a
new key there. The HASH-SEARCH procedure passes over DELETED
values while searching, since slots containing DELETED were filled
when the key being searched for was inserted. Using the special value
DELETED, however, means that search times no longer depend on the
load factor α, and for this reason chaining is frequently selected as a collision resolution technique when keys must be deleted. There is a
simple special case of open addressing, linear probing, that avoids the
need to mark slots with DELETED. Section 11.5.1 shows how to delete from a hash table when using linear probing.
In our analysis, we assume independent uniform permutation hashing
(also confusingly known as uniform hashing in the literature): the probe
sequence of each key is equally likely to be any of the m! permutations
of 〈0, 1, …, m − 1〉. Independent uniform permutation hashing
generalizes the notion of independent uniform hashing defined earlier to
a hash function that produces not just a single slot number, but a whole probe sequence. True independent uniform permutation hashing is
difficult to implement, however, and in practice suitable approximations
(such as double hashing, defined below) are used.
We’ll examine both double hashing and its special case, linear
probing. These techniques guarantee that 〈 h( k, 0), h( k, 1), …, h( k, m −
1)〉 is a permutation of 〈0, 1, …, m − 1〉 for each key k. (Recall that the second parameter to the hash function h is the probe number.) Neither
double hashing nor linear probing meets the assumption of independent
uniform permutation hashing, however. Double hashing cannot
generate more than m² different probe sequences (instead of the m! that independent uniform permutation hashing requires). Nonetheless,
double hashing has a large number of possible probe sequences and, as
you might expect, seems to give good results. Linear probing is even
more restricted, capable of generating only m different probe sequences.
Double hashing
Double hashing offers one of the best methods available for open
addressing because the permutations produced have many of the
characteristics of randomly chosen permutations. Double hashing uses a
hash function of the form
h(k, i) = (h₁(k) + i·h₂(k)) mod m,

where both h₁ and h₂ are auxiliary hash functions. The initial probe goes to position T[h₁(k)], and successive probe positions are offset from previous positions by the amount h₂(k), modulo m. Thus, the probe sequence here depends in two ways upon the key k, since the initial probe position h₁(k), the step size h₂(k), or both, may vary. Figure 11.5
gives an example of insertion by double hashing.
In order for the entire hash table to be searched, the value h 2( k) must be relatively prime to the hash-table size m. (See Exercise 11.4-5.) A convenient way to ensure this condition is to let m be an exact power of
2 and to design h 2 so that it always produces an odd number. Another
way is to let m be prime and to design h 2 so that it always returns a positive integer less than m. For example, you could choose m prime and let
Figure 11.5 Insertion by double hashing. The hash table has size 13 with h₁(k) = k mod 13 and h₂(k) = 1 + (k mod 11). Since 14 ≡ 1 (mod 13) and 14 ≡ 3 (mod 11), we have h₁(14) = 1 and h₂(14) = 4, so the key 14 goes into empty slot 9, after slots 1 and 5 are examined and found to be occupied.
h 1( k) = k mod m,
h 2( k) = 1 + ( k mod m′),
where m′ is chosen to be slightly less than m (say, m − 1). For example, if k = 123456, m = 701, and m′ = 700, then h 1( k) = 80 and h 2( k) = 257, so that the first probe goes to position 80, and successive probes examine
every 257th slot (modulo m) until the key has been found or every slot
has been examined.
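The numbers in this example can be checked directly. The following sketch (ours, not the book's) generates the full probe sequence for the parameters above:

```python
# Probe sequence for double hashing with h1(k) = k mod m and
# h2(k) = 1 + (k mod m'), using the example's parameters.

def double_hash_probes(k, m, m_prime):
    h1 = k % m
    h2 = 1 + (k % m_prime)
    return [(h1 + i * h2) % m for i in range(m)]

seq = double_hash_probes(123456, 701, 700)
```

Since m = 701 is prime and h₂(k) = 257 is nonzero modulo 701, the sequence visits every slot exactly once, i.e., it is a permutation of 〈0, 1, …, 700〉.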
Although values of m other than primes or exact powers of 2 can in
principle be used with double hashing, in practice it becomes more
difficult to efficiently generate h 2( k) (other than choosing h 2( k) = 1, which gives linear probing) in a way that ensures that it is relatively
prime to m, in part because the relative density ϕ( m)/ m of such numbers for general m may be small (see equation (31.25) on page 921).
When m is prime or an exact power of 2, double hashing produces
Θ(m²) probe sequences, since each possible (h₁(k), h₂(k)) pair yields a distinct probe sequence. As a result, for such values of m, double
hashing appears to perform close to the “ideal” scheme of independent
uniform permutation hashing.
Linear probing
Linear probing, a special case of double hashing, is the simplest open-
addressing approach to resolving collisions. As with double hashing, an
auxiliary hash function h₁ determines the first probe position h₁(k) for inserting an element. If slot T[h₁(k)] is already occupied, probe the next position T[h₁(k) + 1]. Keep going as necessary, on up to slot T[m − 1], and then wrap around to slots T[0], T[1], and so on, but never going past slot T[h₁(k) − 1]. To view linear probing as a special case of double hashing, just set the double-hashing step function h₂ to be fixed at 1: h₂(k) = 1 for all k. That is, the hash function is

h(k, i) = (h₁(k) + i) mod m

for i = 0, 1, …, m − 1. The value of h₁(k) determines the entire probe sequence, and so assuming that h₁(k) can take on any value in {0, 1, …, m − 1}, linear probing allows only m distinct probe sequences.
We’ll revisit linear probing in Section 11.5.1.
Analysis of open-address hashing
As in our analysis of chaining in Section 11.2, we analyze open addressing in terms of the load factor α = n/ m of the hash table. With open addressing, at most one element occupies each slot, and thus n ≤
m, which implies α ≤ 1. The analysis below requires α to be strictly less than 1, and so we assume that at least one slot is empty. Because
deleting from an open-address hash table does not really free up a slot,
we assume as well that no deletions occur.
For the hash function, we assume independent uniform permutation
hashing. In this idealized scheme, the probe sequence 〈 h( k, 0), h( k, 1),
…, h( k, m − 1)〉 used to insert or search for each key k is equally likely to be any permutation of 〈0, 1, …, m − 1〉. Of course, any given key has
a unique fixed probe sequence associated with it. What we mean here is
that, considering the probability distribution on the space of keys and
the operation of the hash function on the keys, each possible probe
sequence is equally likely.
We now analyze the expected number of probes for hashing with
open addressing under the assumption of independent uniform
permutation hashing, beginning with the expected number of probes
made in an unsuccessful search (assuming, as stated above, that α < 1).
The bound proven, of 1/(1 − α) = 1 + α + α 2 + α 3 + ⋯, has an intuitive interpretation. The first probe always occurs. With probability
approximately α, the first probe finds an occupied slot, so that a second
probe happens. With probability approximately α 2, the first two slots are occupied so that a third probe ensues, and so on.
Theorem 11.6
Given an open-address hash table with load factor α = n/ m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 −
α), assuming independent uniform permutation hashing and no
deletions.
Proof In an unsuccessful search, every probe but the last accesses an
occupied slot that does not contain the desired key, and the last slot
probed is empty. Let the random variable X denote the number of
probes made in an unsuccessful search, and define the event Ai, for i =
1, 2, …, as the event that an i th probe occurs and it is to an occupied
slot. Then the event {X ≥ i} is the intersection of events A1 ⋂ A2 ⋂ ⋯ ⋂ Ai−1. We bound Pr{X ≥ i} by bounding Pr{A1 ⋂ A2 ⋂ ⋯ ⋂ Ai−1}. By Exercise C.2-5 on page 1190,

Pr{A1 ⋂ A2 ⋂ ⋯ ⋂ Ai−1} = Pr{A1} · Pr{A2 | A1} · Pr{A3 | A1 ⋂ A2} ⋯ Pr{Ai−1 | A1 ⋂ A2 ⋂ ⋯ ⋂ Ai−2}.
Since there are n elements and m slots, Pr{ A 1} = n/ m. For j > 1, the probability that there is a j th probe and it is to an occupied slot, given
that the first j − 1 probes were to occupied slots, is ( n − j + 1)/( m − j +
1). This probability follows because the j th probe would be finding one
of the remaining ( n − ( j − 1)) elements in one of the ( m − ( j − 1)) unexamined slots, and by the assumption of independent uniform
permutation hashing, the probability is the ratio of these quantities.
Since n < m implies that (n − j)/(m − j) ≤ n/m for all j in the range 0 ≤ j < m, it follows that for all i in the range 1 ≤ i ≤ m, we have

Pr{X ≥ i} = (n/m) · ((n − 1)/(m − 1)) · ((n − 2)/(m − 2)) ⋯ ((n − i + 2)/(m − i + 2))
≤ (n/m)^{i−1}
= α^{i−1}.

The product in the first line has i − 1 factors. When i = 1, the product is empty and equals 1, the identity for multiplication, and we get Pr{X ≥ 1} = 1, which makes sense, since there must always be at least 1 probe. If each of the first n probes is to an occupied slot, then all occupied slots have been probed, and so the (n + 1)st probe must be to an empty slot, which gives Pr{X ≥ i} = 0 for i > n + 1. Now, we use equation (C.28) on page 1193 to bound the expected number of probes:

E[X] = Σ_{i=1}^{∞} Pr{X ≥ i} ≤ Σ_{i=1}^{∞} α^{i−1} = Σ_{i=0}^{∞} α^{i} = 1/(1 − α).
▪
If α is a constant, Theorem 11.6 predicts that an unsuccessful search
runs in O(1) time. For example, if the hash table is half full, the average
number of probes in an unsuccessful search is at most 1/(1 − .5) = 2. If it
is 90% full, the average number of probes is at most 1/(1 − .9) = 10.
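A quick simulation (ours, not from the text) makes the bound concrete. Under independent uniform permutation hashing, an unsuccessful search probes a uniformly random permutation of slots until it finds an empty one; with n of m slots occupied, the average probe count should sit near, and below, 1/(1 − α).

```python
import random

# Simulate unsuccessful searches: n of m slots are occupied, the probe
# sequence is a uniformly random permutation, and we count probes until
# the first empty slot is found.
def avg_unsuccessful_probes(m, n, trials, seed=1):
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        occupied = set(rng.sample(range(m), n))
        for probes, slot in enumerate(rng.sample(range(m), m), start=1):
            if slot not in occupied:
                total += probes
                break
    return total / trials
```

For m = 1000 and n = 500 (α = 0.5), the empirical average comes out close to the exact expectation (m + 1)/(m − n + 1) ≈ 2, matching the 1/(1 − α) bound.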
Theorem 11.6 yields almost immediately how well the HASH-
INSERT procedure performs.
Corollary 11.7
Inserting an element into an open-address hash table with load factor α,
where α < 1, requires at most 1/(1 − α) probes on average, assuming independent uniform permutation hashing and no deletions.
Proof An element is inserted only if there is room in the table, and thus
α < 1. Inserting a key requires an unsuccessful search followed by placing the key into the first empty slot found. Thus, the expected
number of probes is at most 1/(1 − α).
▪
It takes a little more work to compute the expected number of
probes for a successful search.
Theorem 11.8
Given an open-address hash table with load factor α < 1, the expected
number of probes in a successful search is at most

(1/α) ln(1/(1 − α)),
assuming independent uniform permutation hashing with no deletions
and assuming that each key in the table is equally likely to be searched
for.
Proof A search for a key k reproduces the same probe sequence as when the element with key k was inserted. If k was the ( i + 1)st key inserted into the hash table, then the load factor at the time it was
inserted was i/ m, and so by Corollary 11.7, the expected number of probes made in a search for k is at most 1/(1 − i/ m) = m/( m − i).
Averaging over all n keys in the hash table gives us the expected number of probes in a successful search:

(1/n) Σ_{i=0}^{n−1} m/(m − i) = (m/n) Σ_{i=0}^{n−1} 1/(m − i)
= (1/α) Σ_{k=m−n+1}^{m} 1/k
≤ (1/α) ∫_{m−n}^{m} (1/x) dx   (bounding the harmonic sum by an integral)
= (1/α) ln(m/(m − n))
= (1/α) ln(1/(1 − α)).
▪
If the hash table is half full, the expected number of probes in a
successful search is less than 1.387. If the hash table is 90% full, the
expected number of probes is less than 2.559. If α = 1, then in an unsuccessful search, all m slots must be probed. Exercise 11.4-4 asks you to analyze a successful search when α = 1.
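The two figures just quoted follow directly from the bound of Theorem 11.8; a one-line check (ours):

```python
import math

# Upper bound from Theorem 11.8 on the expected number of probes
# in a successful search: (1/alpha) * ln(1/(1 - alpha)).
def successful_search_bound(alpha):
    return (1 / alpha) * math.log(1 / (1 - alpha))
```

For α = 0.5 this evaluates to 2 ln 2 ≈ 1.386 < 1.387, and for α = 0.9 to (1/0.9) ln 10 ≈ 2.558 < 2.559.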
Exercises
11.4-1
Consider inserting the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash
table of length m = 11 using open addressing. Illustrate the result of inserting these keys using linear probing with h( k, i) = ( k + i) mod m and using double hashing with h 1( k) = k and h 2( k) = 1 + ( k mod ( m −
1)).
11.4-2
Write pseudocode for HASH-DELETE that fills the deleted key’s slot
with the special value DELETED, and modify HASH-SEARCH and
HASH-INSERT as needed to handle DELETED.
11.4-3
Consider an open-address hash table with independent uniform
permutation hashing and no deletions. Give upper bounds on the
expected number of probes in an unsuccessful search and on the
expected number of probes in a successful search when the load factor is
3/4 and when it is 7/8.
11.4-4
Show that the expected number of probes required for a successful
search when α = 1 (that is, when n = m) is Hm, the mth harmonic number.
★ 11.4-5
Show that, with double hashing, if m and h 2( k) have greatest common divisor d ≥ 1 for some key k, then an unsuccessful search for key k examines (1/ d)th of the hash table before returning to slot h 1( k). Thus, when d = 1, so that m and h 2( k) are relatively prime, the search may examine the entire hash table. ( Hint: See Chapter 31. )
★ 11.4-6
Consider an open-address hash table with a load factor α. Approximate
the nonzero value α for which the expected number of probes in an
unsuccessful search equals twice the expected number of probes in a
successful search. Use the upper bounds given by Theorems 11.6 and
11.8 for these expected numbers of probes.
Efficient hash table algorithms are not only of theoretical interest, but
also of immense practical importance. Constant factors can matter. For
this reason, this section discusses two aspects of modern CPUs that are
not included in the standard RAM model presented in Section 2.2:
Memory hierarchies: The memory of modern CPUs has a number of
levels, from the fast registers, through one or more levels of cache
memory, to the main-memory level. Each successive level stores more
data than the previous level, but access is slower. As a consequence, a complex computation (such as a complicated hash function) that
works entirely within the fast registers can take less time than a single
read operation from main memory. Furthermore, cache memory is
organized in cache blocks of (say) 64 bytes each, which are always
fetched together from main memory. There is a substantial benefit for
ensuring that memory usage is local: reusing the same cache block is
much more efficient than fetching a different cache block from main
memory.
The standard RAM model measures efficiency of a hash-table
operation by counting the number of hash-table slots probed. In
practice, this metric is only a crude approximation to the truth, since
once a cache block is in the cache, successive probes to that cache
block are much faster than probes that must access main memory.
Advanced instruction sets: Modern CPUs may have sophisticated
instruction sets that implement advanced primitives useful for
encryption or other forms of cryptography. These instructions may be
useful in the design of exceptionally efficient hash functions.
Section 11.5.1 discusses linear probing, which becomes the collision-resolution method of choice in the presence of a memory hierarchy.
Section 11.5.2 suggests how to construct “advanced” hash functions based on cryptographic primitives, suitable for use on computers with
hierarchical memory models.
11.5.1 Linear probing
Linear probing is often disparaged because of its poor performance in
the standard RAM model. But linear probing excels for hierarchical
memory models, because successive probes are usually to the same
cache block of memory.
Deletion with linear probing
Another reason why linear probing is often not used in practice is that
deletion seems complicated or impossible without using the special
DELETED value. Yet we’ll now see that deletion from a hash table
based on linear probing is not all that difficult, even without the DELETED marker. The deletion procedure works for linear probing,
but not for open-address probing in general, because with linear
probing keys all follow the same simple cyclic probing sequence (albeit
with different starting points).
The deletion procedure relies on an “inverse” function to the linear-
probing hash function h( k, i) = ( h 1( k) + i) mod m, which maps a key k and a probe number i to a slot number in the hash table. The inverse
function g maps a key k and a slot number q, where 0 ≤ q < m, to the probe number that reaches slot q:
g( k, q) = ( q − h 1( k)) mod m.
If h( k, i) = q, then g( k, q) = i, and so h( k, g( k, q)) = q.
The procedure LINEAR-PROBING-HASH-DELETE on the facing
page deletes the key stored in position q from hash table T. Figure 11.6
shows how it works. The procedure first deletes the key in position q by
setting T[ q] to NIL in line 2. It then searches for a slot q′ (if any) that contains a key that should be moved to the slot q just vacated by key k.
Line 9 asks the critical question: does the key k′ in slot q′ need to be moved to the vacated slot q in order to preserve the accessibility of k′? If g( k′, q) < g( k′, q′), then during the insertion of k′ into the table, slot q was examined but found to be already occupied. But now slot q, where a
search will look for k′, is empty. In this case, key k′ moves to slot q in line 10, and the search continues, to see whether any later key also needs
to be moved to the slot q′ that was just freed up when k′ moved.
Figure 11.6 Deletion in a hash table that uses linear probing. The hash table has size 10 with h 1( k) = k mod 10. (a) The hash table after inserting keys in the order 74, 43, 93, 18, 82, 38, 92.
(b) The hash table after deleting the key 43 from slot 3. Key 93 moves up to slot 3 to keep it accessible, and then key 92 moves up to slot 5 just vacated by key 93. No other keys need to be moved.
LINEAR-PROBING-HASH-DELETE(T, q)
 1  while TRUE
 2      T[q] = NIL                   // make slot q empty
 3      q′ = q                       // starting point for search
 4      repeat
 5          q′ = (q′ + 1) mod m      // next slot number with linear probing
 6          k′ = T[q′]               // next key to try to move
 7          if k′ == NIL
 8              return               // return when an empty slot is found
 9      until g(k′, q) < g(k′, q′)   // was empty slot q probed before q′?
10      T[q] = k′                    // move k′ into slot q
11      q = q′                       // free up slot q′
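A Python transcription of the procedure (our sketch, not the book's code) shows the mechanism on the Figure 11.6 example; g is the inverse function defined above, and h₁(k) = k mod 10 as in the figure.

```python
# Linear-probing hash table with the deletion procedure above (a sketch).
# g(k, q) is the probe number at which key k reaches slot q.

m = 10
T = [None] * m

def h1(k):
    return k % m

def g(k, q):
    return (q - h1(k)) % m

def insert(k):
    q = h1(k)
    while T[q] is not None:
        q = (q + 1) % m
    T[q] = k

def delete_slot(q):
    while True:
        T[q] = None                     # make slot q empty
        qp = q                          # starting point for search
        while True:
            qp = (qp + 1) % m           # next slot, linear probing
            kp = T[qp]
            if kp is None:
                return                  # empty slot found: done
            if g(kp, q) < g(kp, qp):    # slot q was probed before qp
                break
        T[q] = kp                       # move kp into slot q
        q = qp                          # and free up slot qp
```

Inserting 74, 43, 93, 18, 82, 38, 92 and then deleting 43 from slot 3 reproduces the moves described in the Figure 11.6 caption: 93 moves to slot 3 and 92 moves to slot 5.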
Analysis of linear probing
Linear probing is popular to implement, but it exhibits a phenomenon
known as primary clustering. Long runs of occupied slots build up,
increasing the average search time. Clusters arise because an empty slot preceded by i full slots gets filled next with probability ( i + 1)/ m. Long runs of occupied slots tend to get longer, and the average search time
increases.
In the standard RAM model, primary clustering is a problem, and
general double hashing usually performs better than linear probing. By
contrast, in a hierarchical memory model, primary clustering is a
beneficial property, as elements are often stored together in the same
cache block. Searching proceeds through one cache block before
advancing to search the next cache block. With linear probing, the
running time for a key k of HASH-INSERT, HASH-SEARCH, or
LINEAR-PROBING-HASH-DELETE is at most proportional to the
distance from h 1( k) to the next empty slot.
The following theorem is due to Pagh et al. [351]. A more recent proof is given by Thorup [438]. We omit the proof here. The need for 5-independence is by no means obvious; see the cited proofs.
Theorem 11.9
If h 1 is 5-independent and α ≤ 2/3, then it takes expected constant time to search for, insert, or delete a key in a hash table using linear probing.
▪
(Indeed, the expected operation time is O(1/ ϵ 2) for α = 1 − ϵ.)
★ 11.5.2 Hash functions for hierarchical memory models
This section illustrates an approach for designing efficient hash tables in
a modern computer system having a memory hierarchy.
Because of the memory hierarchy, linear probing is a good choice for
resolving collisions, as probe sequences are sequential and tend to stay
within cache blocks. But linear probing is most efficient when the hash
function is complex (for example, 5-independent as in Theorem 11.9).
Fortunately, having a memory hierarchy means that complex hash
functions can be implemented efficiently.
As noted in Section 11.3.5, one approach is to use a cryptographic hash function such as SHA-256. Such functions are complex and


sufficiently random for hash table applications. On machines with
specialized instructions, cryptographic functions can be quite efficient.
Instead, we present here a simple hash function based only on
addition, multiplication, and swapping the halves of a word. This
function can be implemented entirely within the fast registers, and on a
machine with a memory hierarchy, its latency is small compared with
the time taken to access a random slot of the hash table. It is related to
the RC6 encryption algorithm and can for practical purposes be
considered a “random oracle.”
The wee hash function
Let w denote the word size of the machine (e.g., w = 64), assumed to be even, and let a and b be w-bit unsigned (nonnegative) integers such that a is odd. Let swap(x) denote the w-bit result of swapping the two w/2-bit halves of the w-bit input x. That is,

swap(x) = (x ⋙ (w/2)) + (x ⋘ (w/2)),

where “⋙” is “logical right shift” (as in equation (11.2)) and “⋘” is “left shift.” Define

f_a(k) = swap((2k² + ak) mod 2^w).

Thus, to compute f_a(k), evaluate the quadratic function 2k² + ak modulo 2^w and then swap the left and right halves of the result.
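As a sketch (assuming w = 64), swap and f_a translate to Python as follows; the mask keeps all arithmetic within one w-bit word.

```python
W = 64
MASK = (1 << W) - 1  # keep values to w bits

def swap(x):
    # exchange the two w/2-bit halves of a w-bit word
    half = W // 2
    return ((x >> half) | (x << half)) & MASK

def f(a, k):
    # f_a(k) = swap((2k^2 + ak) mod 2^w); a should be odd
    return swap((2 * k * k + a * k) & MASK)

def f_r(a, k, r):
    # iterate f_a for r rounds: f_a^(r)(k)
    for _ in range(r):
        k = f(a, k)
    return k
```

Note that swap is its own inverse, and (per Exercise 11.5-1) f_a^(r) is one-to-one for odd a, so distinct inputs always hash to distinct intermediate values.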
Let r denote a desired number of “rounds” for the computation of the hash function. We’ll use r = 4, but the hash function is well defined for any nonnegative r. Denote by f_a^{(r)}(k) the result of iterating f_a a total of r times (that is, r rounds) starting with input value k, so that f_a^{(0)}(k) = k and f_a^{(r)}(k) = f_a(f_a^{(r−1)}(k)) for r ≥ 1. For any odd a and any r ≥ 0, the function f_a^{(r)}, although complicated, is one-to-one (see Exercise 11.5-1). A cryptographer would view f_a^{(r)} as a simple block cipher operating on w-bit input blocks, with r rounds and key a.
We first define the wee hash function h for short inputs, where by “short” we mean “whose length t is at most w bits,” so that the input fits within one computer word. We would like inputs of different lengths to be hashed differently. The wee hash function ha,b,t,r(k) for parameters a, b, and r on t-bit input k is defined as

ha,b,t,r(k) = f_{a+2t}^{(r)}(k + b) mod m.    (11.7)

That is, the hash value for t-bit input k is obtained by applying f_{a+2t}^{(r)} to k + b, then taking the final result modulo m. Adding the value b provides hash-dependent randomization of the input, in a way that ensures that for variable-length inputs the 0-length input does not have a fixed hash value. Adding the value 2t to a ensures that the hash function acts differently for inputs of different lengths. (We use 2t rather than t to ensure that the key a + 2t is odd if a is odd.) We call this hash function
“wee” because it uses a tiny amount of memory—more precisely, it can
be implemented efficiently using only the computer’s fast registers. (This
hash function does not have a name in the literature; it is a variant we
developed for this textbook.)
Speed of the wee hash function
It is surprising how much efficiency can be bought with locality.
Experiments (unpublished, by the authors) suggest that evaluating the
wee hash function takes less time than probing a single randomly chosen
slot in a hash table. These experiments were run on a laptop (2019
MacBook Pro) with w = 64 and a = 123. For large hash tables, evaluating the wee hash function was 2 to 10 times faster than
performing a single probe of the hash table.
The wee hash function for variable-length inputs
Sometimes inputs are long—more than one w-bit word in length—or
have variable length, as discussed in Section 11.3.5. We can extend the wee hash function, defined above for inputs that are at most a single w-bit
word in length, to handle long or variable-length inputs. Here is one
method for doing so.
Suppose that an input k has length t (measured in bits). Break k into a sequence 〈 k 1, k 2, …, ku〉 of w-bit words, where u = ⌈ t/ w⌉, k 1 contains the least-significant w bits of k, and ku contains the most significant
bits. If t is not a multiple of w, then ku contains fewer than w bits, in which case, pad out the unused high-order bits of ku with 0-bits. Define
the function chop to return a sequence of the w-bit words in k:
chop( k) = 〈 k 1, k 2, …, ku〉.
The most important property of the chop operation is that it is one-to-
one, given t: for any two t-bit keys k and k′, if k ≠ k′ then chop( k) ≠
chop( k′), and k can be derived from chop( k) and t. The chop operation also has the useful property that a single-word input key yields a single-word output sequence: chop( k) = 〈 k〉.
With the chop function in hand, we specify the wee hash function ha,b,t,r(k) for an input k of length t bits as follows:

ha,b,t,r(k) = WEE(k, a, b, t, r, m),

where the procedure WEE defined on the facing page iterates through the w-bit words returned by chop(k), applying f_{a+2t}^{(r)} to the sum of the current word ki and the previously computed hash value so far, finally returning the result obtained modulo m. This definition for
variable-length and long (multiple-word) inputs is a consistent extension
of the definition in equation (11.7) for short (single-word) inputs. For
practical use, we recommend that a be a randomly chosen odd w-bit word, b be a randomly chosen w-bit word, and that r = 4.
Note that the wee hash function is really a hash function family, with
individual hash functions determined by parameters a, b, t, r, and m.
The (approximate) 5-independence of the wee hash function family for
variable-length inputs can be argued based on the assumption that the
1-word wee hash function is a random oracle and on the security of the
cipher-block-chaining message authentication code (CBC-MAC), as
studied by Bellare et al. [42]. The case here is actually simpler than that studied in the literature, since if two messages have different lengths t and t′, then their “keys” are different: a + 2 t ≠ a + 2 t′. We omit the details.
WEE(k, a, b, t, r, m)
1  u = ⌈t/w⌉
2  〈k1, k2, …, ku〉 = chop(k)
3  q = b
4  for i = 1 to u
5      q = f_{a+2t}^{(r)}((q + ki) mod 2^w)
6  return q mod m
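Putting the pieces together, here is a sketch of the variable-length wee hash in Python (w = 64 assumed; reducing q + ki modulo 2^w is our assumption, consistent with w-bit register arithmetic).

```python
W = 64
MASK = (1 << W) - 1

def swap(x):
    # exchange the two halves of a w-bit word
    return ((x >> (W // 2)) | (x << (W // 2))) & MASK

def f_r(a, k, r):
    # r rounds of f_a(k) = swap((2k^2 + ak) mod 2^w)
    for _ in range(r):
        k = swap((2 * k * k + a * k) & MASK)
    return k

def chop(k, t):
    # split a t-bit key into u = ceil(t/w) little-endian w-bit words
    u = -(-t // W)
    return [(k >> (W * i)) & MASK for i in range(u)]

def wee(k, a, b, t, r, m):
    q = b
    for ki in chop(k, t):
        # key a + 2t stays odd when a is odd
        q = f_r(a + 2 * t, (q + ki) & MASK, r)
    return q % m
```

For a single-word input, chop(k) = 〈k〉 and the loop body runs once, matching the short-input definition in equation (11.7).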
This definition of a cryptographically inspired hash-function family
is meant to be realistic, yet only illustrative, and many variations and
improvements are possible. See the chapter notes for suggestions.
In summary, we see that when the memory system is hierarchical, it
becomes advantageous to use linear probing (a special case of double
hashing), since successive probes tend to stay in the same cache block.
Furthermore, hash functions that can be implemented using only the
computer’s fast registers are exceptionally efficient, so they can be quite
complex and even cryptographically inspired, providing the high degree
of independence needed for linear probing to work most efficiently.
Exercises
★ 11.5-1
Complete the argument that for any odd positive integer a and any integer r ≥ 0, the function f_a^{(r)} is one-to-one. Use a proof by contradiction and make use of the fact that the function f_a works modulo 2^w.
★ 11.5-2
Argue that a random oracle is 5-independent.
★ 11.5-3
Consider what happens to the value f_a^{(r)}(k) as we flip a single bit ki of the input value k, for various values of r. Let 〈k_{w−1}, k_{w−2}, …, k_0〉 and 〈b_{w−1}, b_{w−2}, …, b_0〉 denote the bit values ki in the input k (with k_0 the least-significant bit) and the bit values bj in g_a(k) = (2k² + ak) mod 2^w (where g_a(k) is the value that, when its halves are swapped, becomes f_a(k)). Suppose that flipping a single bit ki of the input k may cause any bit bj of g_a(k) to flip, for j ≥ i. What is the least value of r for which flipping the value of any single bit ki may cause any bit of the output to flip? Explain.
Problems
11-1 Longest-probe bound for hashing
Suppose you are using an open-addressed hash table of size m to store n
≤ m/2 items.
a. Assuming independent uniform permutation hashing, show that for i = 1, 2, …, n, the probability is at most 2^{−p} that the ith insertion requires strictly more than p probes.
b. Show that for i = 1, 2, …, n, the probability is O(1/n²) that the ith insertion requires more than 2 lg n probes.
Let the random variable Xi denote the number of probes required by the ith insertion. You have shown in part (b) that Pr{Xi > 2 lg n} = O(1/n²).
Let the random variable X = max { Xi : 1 ≤ i ≤ n} denote the maximum number of probes required by any of the n insertions.
c. Show that Pr{ X > 2 lg n} = O(1/ n).
d. Show that the expected length E[ X] of the longest probe sequence is O(lg n).
11-2 Searching a static set
You are asked to implement a searchable set of n elements in which the
keys are numbers. The set is static (no INSERT or DELETE
operations), and the only operation required is SEARCH. You are given

an arbitrary amount of time to preprocess the n elements so that
SEARCH operations run quickly.
a. Show how to implement SEARCH in O(lg n) worst-case time using no extra storage beyond what is needed to store the elements of the set
themselves.
b. Consider implementing the set by open-address hashing on m slots,
and assume independent uniform permutation hashing. What is the
minimum amount of extra storage m − n required to make the average
performance of an unsuccessful SEARCH operation be at least as
good as the bound in part (a)? Your answer should be an asymptotic
bound on m − n in terms of n.
11-3 Slot-size bound for chaining
Given a hash table with n slots, with collisions resolved by chaining, suppose that n keys are inserted into the table. Each key is equally likely
to be hashed to each slot. Let M be the maximum number of keys in
any slot after all the keys have been inserted. Your mission is to prove
an O(lg n / lg lg n) upper bound on E[ M], the expected value of M.
a. Argue that the probability Qk that exactly k keys hash to a particular slot is given by

Qk = (n choose k) (1/n)^k (1 − 1/n)^{n−k},

where (n choose k) = n!/(k!(n − k)!) is the binomial coefficient.
b. Let Pk be the probability that M = k, that is, the probability that the slot containing the most keys contains k keys. Show that Pk ≤ nQk.
c. Show that Qk < e^k/k^k. (Hint: Use Stirling’s approximation, equation (3.25) on page 67.)
d. Show that there exists a constant c > 1 such that Q_{k0} < 1/n³ for k0 = c lg n / lg lg n. Conclude that Pk < 1/n² for k ≥ k0 = c lg n / lg lg n.
e. Argue that

E[M] ≤ Pr{M > k0} · n + Pr{M ≤ k0} · k0.

Conclude that E[M] = O(lg n / lg lg n).
11-4 Hashing and authentication
Let H be a family of hash functions in which each hash function h ∈
H maps the universe U of keys to {0, 1, …, m − 1}.
a. Show that if the family H of hash functions is 2-independent, then it
is universal.
b. Suppose that the universe U is the set of n-tuples of values drawn from ℤp = {0, 1, …, p − 1}, where p is prime. Consider an element x = 〈x0, x1, …, xn−1〉 ∈ U. For any n-tuple a = 〈a0, a1, …, an−1〉 ∈ U, define the hash function ha by

ha(x) = (Σ_{j=0}^{n−1} aj xj) mod p.

Let H = {ha : a ∈ U}. Show that H is universal, but not 2-independent. (Hint: Find a key for which all hash functions in H produce the same value.)
c. Suppose that we modify H slightly from part (b): for any a ∈ U and for any b ∈ ℤp, define

h′ab(x) = (Σ_{j=0}^{n−1} aj xj + b) mod p

and H′ = {h′ab : a ∈ U and b ∈ ℤp}. Argue that H′ is 2-independent. (Hint: Consider fixed n-tuples x ∈ U and y ∈ U, with xi ≠ yi for some i. What happens to h′ab(x) and h′ab(y) as ai and b range over ℤp?)
d. Alice and Bob secretly agree on a hash function h from a 2-
independent family H of hash functions. Each h ∈ H maps from a
universe of keys U to ℤ p, where p is prime. Later, Alice sends a message m to Bob over the internet, where m ∈ U. She authenticates this message to Bob by also sending an authentication tag t = h( m), and Bob checks that the pair ( m, t) he receives indeed satisfies t =
h( m). Suppose that an adversary intercepts ( m, t) en route and tries to fool Bob by replacing the pair ( m, t) with a different pair ( m′, t′).
Argue that the probability that the adversary succeeds in fooling Bob
into accepting ( m′, t′) is at most 1/ p, no matter how much computing power the adversary has, even if the adversary knows the family H of
hash functions used.
Chapter notes
The books by Knuth [261] and Gonnet and Baeza-Yates [193] are excellent references for the analysis of hashing algorithms. Knuth
credits H. P. Luhn (1953) for inventing hash tables, along with the
chaining method for resolving collisions. At about the same time, G. M.
Amdahl originated the idea of open addressing. The notion of a
random oracle was introduced by Bellare et al. [43]. Carter and Wegman [80] introduced the notion of universal families of hash functions in 1979.
Dietzfelbinger et al. [113] invented the multiply-shift hash function and gave a proof of Theorem 11.5. Thorup [437] provides extensions and additional analysis. Thorup [438] gives a simple proof that linear probing with 5-independent hashing takes constant expected time per
operation. Thorup also describes the method for deletion in a hash table
using linear probing.
Fredman, Komlós, and Szemerédi [154] developed a perfect hashing
scheme for static sets—“perfect” because all collisions are avoided. An
extension of their method to dynamic sets, handling insertions and
deletions in amortized expected time O(1), has been given by
Dietzfelbinger et al. [114].
The wee hash function is based on the RC6 encryption algorithm
[379]. Leiserson et al. [292] propose an “RC6MIX” function that is essentially the same as the wee hash function. They give experimental
evidence that it has good randomness, and they also give a “DOTMIX”
function for dealing with variable-length inputs. Bellare et al. [42]
provide an analysis of the security of the cipher-block-chaining message
authentication code. This analysis implies that the wee hash function
has the desired pseudorandomness properties.
1 The definition of “average-case” requires care—are we assuming an input distribution over the keys, or are we randomizing the choice of hash function itself? We’ll consider both approaches, but with an emphasis on the use of a randomly chosen hash function.
2 In the literature, a (c/m)-universal hash function is sometimes called c-universal or c-approximately universal. We’ll stick with the notation (c/m)-universal.
12 Binary Search Trees
The search tree data structure supports each of the dynamic-set
operations listed on page 250: SEARCH, MINIMUM, MAXIMUM,
PREDECESSOR, SUCCESSOR, INSERT, and DELETE. Thus, you
can use a search tree both as a dictionary and as a priority queue.
Basic operations on a binary search tree take time proportional to
the height of the tree. For a complete binary tree with n nodes, such operations run in Θ(lg n) worst-case time. If the tree is a linear chain of
n nodes, however, the same operations take Θ( n) worst-case time. In
Chapter 13, we’ll see a variation of binary search trees, red-black trees, whose operations guarantee a height of O(lg n). We won’t prove it here, but if you build a binary search tree on a random set of n keys, its expected height is O(lg n) even if you don’t try to limit its height.
After presenting the basic properties of binary search trees, the
following sections show how to walk a binary search tree to print its
values in sorted order, how to search for a value in a binary search tree,
how to find the minimum or maximum element, how to find the
predecessor or successor of an element, and how to insert into or delete
from a binary search tree. The basic mathematical properties of trees
appear in Appendix B.
12.1 What is a binary search tree?
A binary search tree is organized, as the name suggests, in a binary tree,
as shown in Figure 12.1. You can represent such a tree with a linked
data structure, as in Section 10.3. In addition to a key and satellite data, each node object contains attributes left, right, and p that point to the nodes corresponding to its left child, its right child, and its parent,
respectively. If a child or the parent is missing, the appropriate attribute
contains the value NIL. The tree itself has an attribute root that points
to the root node, or NIL if the tree is empty. The root node T.root is the
only node in a tree T whose parent is NIL.
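As a concrete sketch (not part of the text's pseudocode), the linked representation described above can be written in Python, with None playing the role of NIL; the class names here are illustrative assumptions:

```python
class Node:
    """A binary-search-tree node: a key, optional satellite data,
    and pointers to the left child, right child, and parent."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data    # satellite data
        self.left = None    # left child (None plays the role of NIL)
        self.right = None   # right child
        self.p = None       # parent

class Tree:
    """A binary search tree is just a pointer to its root node."""
    def __init__(self):
        self.root = None    # None when the tree is empty

# A two-node tree: the root's parent is None, as the text requires.
t = Tree()
t.root = Node(6)
t.root.left = Node(5)
t.root.left.p = t.root
```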
Figure 12.1 Binary search trees. For any node x, the keys in the left subtree of x are at most x.key, and the keys in the right subtree of x are at least x.key. Different binary search trees can represent the same set of values. The worst-case running time for most search-tree operations is proportional to the height of the tree. (a) A binary search tree on 6 nodes with height 2. The top figure shows how to view the tree conceptually, and the bottom figure shows the left, right, and p attributes in each node, in the style of Figure 10.6 on page 266. (b) A less efficient binary search tree, with height 4, that contains the same keys.
The keys in a binary search tree are always stored in such a way as to
satisfy the binary-search-tree property:
Let x be a node in a binary search tree. If y is a node in the left
subtree of x, then y.key ≤ x.key. If y is a node in the right subtree of x, then y.key ≥ x.key.
Thus, in Figure 12.1(a), the key of the root is 6, the keys 2, 5, and 5
in its left subtree are no larger than 6, and the keys 7 and 8 in its right
subtree are no smaller than 6. The same property holds for every node
in the tree. For example, looking at the root’s left child as the root of a
subtree, this subtree root has the key 5, the key 2 in its left subtree is no
larger than 5, and the key 5 in its right subtree is no smaller than 5.
Because of the binary-search-tree property, you can print out all the
keys in a binary search tree in sorted order by a simple recursive
algorithm, called an inorder tree walk, given by the procedure
INORDER-TREE-WALK. This algorithm is so named because it
prints the key of the root of a subtree between printing the values in its
left subtree and printing those in its right subtree. (Similarly, a preorder
tree walk prints the root before the values in either subtree, and a postorder tree walk prints the root after the values in its subtrees.) To print all the elements in a binary search tree T, call INORDER-TREE-WALK( T.root). For example, the inorder tree walk prints the keys in
each of the two binary search trees from Figure 12.1 in the order 2, 5, 5, 6, 7, 8. The correctness of the algorithm follows by induction directly
from the binary-search-tree property.
INORDER-TREE-WALK(x)
1  if x ≠ NIL
2      INORDER-TREE-WALK(x.left)
3      print x.key
4      INORDER-TREE-WALK(x.right)
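The procedure translates directly into Python. In this sketch the keys are collected into a list rather than printed, and the tree built by hand has the keys 2, 5, 5, 6, 7, 8 of Figure 12.1(a):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def inorder_tree_walk(x, out):
    # Visit the left subtree, then the node itself, then the right subtree.
    if x is not None:
        inorder_tree_walk(x.left, out)
        out.append(x.key)
        inorder_tree_walk(x.right, out)

# A height-2 tree on the keys 2, 5, 5, 6, 7, 8 with root key 6.
root = Node(6)
root.left = Node(5); root.right = Node(7)
root.left.left = Node(2); root.left.right = Node(5)
root.right.right = Node(8)

keys = []
inorder_tree_walk(root, keys)
print(keys)  # [2, 5, 5, 6, 7, 8]
```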
It takes Θ( n) time to walk an n-node binary search tree, since after
the initial call, the procedure calls itself recursively exactly twice for
each node in the tree—once for its left child and once for its right child.
The following theorem gives a formal proof that it takes linear time to
perform an inorder tree walk.
Theorem 12.1
If x is the root of an n-node subtree, then the call INORDER-TREE-
WALK( x) takes Θ( n) time.
Proof Let T( n) denote the time taken by INORDER-TREE-WALK
when it is called on the root of an n-node subtree. Since INORDER-
TREE-WALK visits all n nodes of the subtree, we have T( n) = Ω( n). It remains to show that T( n) = O( n).
Since INORDER-TREE-WALK takes a small, constant amount of
time on an empty subtree (for the test x ≠ NIL), we have T(0) = c for some constant c > 0.
For n > 0, suppose that INORDER-TREE-WALK is called on a
node x whose left subtree has k nodes and whose right subtree has n − k
− 1 nodes. The time to perform INORDER-TREE-WALK( x) is
bounded by T( n) ≤ T( k) + T( n − k − 1) + d for some constant d > 0 that reflects an upper bound on the time to execute the body of INORDER-TREE-WALK( x), exclusive of the time spent in recursive calls.
We use the substitution method to show that T(n) = O(n) by proving that T(n) ≤ (c + d)n + c. For n = 0, we have (c + d) · 0 + c = c = T(0). For n > 0, we have
T(n) ≤ T(k) + T(n − k − 1) + d
     ≤ ((c + d)k + c) + ((c + d)(n − k − 1) + c) + d
     = (c + d)n + c − (c + d) + c + d
     = (c + d)n + c,
which completes the proof.
▪
Exercises
12.1-1
For the set {1, 4, 5, 10, 16, 17, 21} of keys, draw binary search trees of
heights 2, 3, 4, 5, and 6.
12.1-2
What is the difference between the binary-search-tree property and the
min-heap property on page 163? Can the min-heap property be used to
print out the keys of an n-node tree in sorted order in O( n) time? Show how, or explain why not.
12.1-3
Give a nonrecursive algorithm that performs an inorder tree walk.
( Hint: An easy solution uses a stack as an auxiliary data structure. A
more complicated, but elegant, solution uses no stack but assumes that
you can test two pointers for equality.)
12.1-4
Give recursive algorithms that perform preorder and postorder tree
walks in Θ( n) time on a tree of n nodes.
12.1-5
Argue that since sorting n elements takes Ω( n lg n) time in the worst case in the comparison model, any comparison-based algorithm for
constructing a binary search tree from an arbitrary list of n elements takes Ω( n lg n) time in the worst case.
12.2 Querying a binary search tree
Binary search trees can support the queries MINIMUM, MAXIMUM,
SUCCESSOR, and PREDECESSOR, as well as SEARCH. This
section examines these operations and shows how to support each one
in O( h) time on any binary search tree of height h.
Searching
To search for a node with a given key in a binary search tree, call the
TREE-SEARCH procedure. Given a pointer x to the root of a subtree
and a key k, TREE-SEARCH( x, k) returns a pointer to a node with key k if one exists in the subtree; otherwise, it returns NIL. To search for key
k in the entire binary search tree T, call TREE-SEARCH( T.root, k).
TREE-SEARCH(x, k)
1  if x == NIL or k == x.key
2      return x
3  if k < x.key
4      return TREE-SEARCH(x.left, k)
5  else return TREE-SEARCH(x.right, k)
ITERATIVE-TREE-SEARCH(x, k)
1  while x ≠ NIL and k ≠ x.key
2      if k < x.key
3          x = x.left
4      else x = x.right
5  return x
The TREE-SEARCH procedure begins its search at the root and
traces a simple path downward in the tree, as shown in Figure 12.2(a).
For each node x it encounters, it compares the key k with x.key. If the two keys are equal, the search terminates. If k is smaller than x.key, the search continues in the left subtree of x, since the binary-search-tree property implies that k cannot reside in the right subtree. Symmetrically,
if k is larger than x.key, the search continues in the right subtree. The nodes encountered during the recursion form a simple path downward
from the root of the tree, and thus the running time of TREE-SEARCH
is O( h), where h is the height of the tree.
Figure 12.2 Queries on a binary search tree. Nodes and paths followed in each query are colored blue. (a) A search for the key 13 in the tree follows the path 15 → 6 → 7 → 13 from the root. (b) The minimum key in the tree is 2, which is found by following left pointers from the root. The maximum key 20 is found by following right pointers from the root. (c) The successor of the node with key 15 is the node with key 17, since it is the minimum key in the right subtree of 15.
(d) The node with key 13 has no right subtree, and thus its successor is its lowest ancestor whose left child is also an ancestor. In this case, the node with key 15 is its successor.
Since the TREE-SEARCH procedure recurses on either the left
subtree or the right subtree, but not both, we can rewrite the algorithm
to “unroll” the recursion into a while loop. On most computers, the
ITERATIVE-TREE-SEARCH procedure on the facing page is more
efficient.
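The iterative version can be sketched in Python as follows (the Node class here is an illustrative assumption, with None in place of NIL):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def tree_search(x, k):
    # Descend from x, going left or right by comparing k with each key,
    # until k is found or we fall off the bottom of the tree.
    while x is not None and k != x.key:
        x = x.left if k < x.key else x.right
    return x  # node with key k, or None if no such node exists

root = Node(6)
root.left = Node(5); root.right = Node(7)
root.left.left = Node(2)
```

For example, `tree_search(root, 7)` returns the node `root.right`, while `tree_search(root, 3)` returns None after following the path 6 → 5 → 2.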
Minimum and maximum
To find an element in a binary search tree whose key is a minimum, just follow left child pointers from the root until you encounter a NIL, as
shown in Figure 12.2(b). The TREE-MINIMUM procedure returns a
pointer to the minimum element in the subtree rooted at a given node x,
which we assume to be non-NIL.
TREE-MINIMUM(x)
1  while x.left ≠ NIL
2      x = x.left
3  return x

TREE-MAXIMUM(x)
1  while x.right ≠ NIL
2      x = x.right
3  return x
The binary-search-tree property guarantees that TREE-MINIMUM
is correct. If node x has no left subtree, then since every key in the right
subtree of x is at least as large as x.key, the minimum key in the subtree rooted at x is x.key. If node x has a left subtree, then since no key in the right subtree is smaller than x.key and every key in the left subtree is not
larger than x.key, the minimum key in the subtree rooted at x resides in the subtree rooted at x.left.
The pseudocode for TREE-MAXIMUM is symmetric. Both TREE-
MINIMUM and TREE-MAXIMUM run in O( h) time on a tree of
height h since, as in TREE-SEARCH, the sequence of nodes
encountered forms a simple path downward from the root.
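Both procedures are one-line loops in Python; this sketch reuses an assumed Node class and the six-key tree of Figure 12.1(a):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None

def tree_minimum(x):
    # Keep following left pointers until there is no left child.
    while x.left is not None:
        x = x.left
    return x

def tree_maximum(x):
    # Symmetrically, keep following right pointers.
    while x.right is not None:
        x = x.right
    return x

# The tree on keys 2, 5, 5, 6, 7, 8 with root key 6.
root = Node(6)
root.left = Node(5); root.right = Node(7)
root.left.left = Node(2); root.left.right = Node(5)
root.right.right = Node(8)
```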
Successor and predecessor
Given a node in a binary search tree, how can you find its successor in
the sorted order determined by an inorder tree walk? If all keys are
distinct, the successor of a node x is the node with the smallest key greater than x.key. Regardless of whether the keys are distinct, we define
the successor of a node as the next node visited in an inorder tree walk.
The structure of a binary search tree allows you to determine the
successor of a node without comparing keys. The TREE-SUCCESSOR
procedure on the facing page returns the successor of a node x in a binary search tree if it exists, or NIL if x is the last node that would be
visited during an inorder walk.
The code for TREE-SUCCESSOR has two cases. If the right subtree
of node x is nonempty, then the successor of x is just the leftmost node in x’s right subtree, which line 2 finds by calling TREE-MINIMUM( x.right). For example, the successor of the node with key
15 in Figure 12.2(c) is the node with key 17.
On the other hand, as Exercise 12.2-6 asks you to show, if the right
subtree of node x is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x. In Figure 12.2(d), the successor of the node with key 13 is the node with key 15. To find y,
go up the tree from x until you encounter either the root or a node that
is the left child of its parent. Lines 4–8 of TREE-SUCCESSOR handle
this case.
TREE-SUCCESSOR(x)
1  if x.right ≠ NIL
2      return TREE-MINIMUM(x.right)  // leftmost node in right subtree
3  else  // find the lowest ancestor of x whose left child is an ancestor of x
4      y = x.p
5      while y ≠ NIL and x == y.right
6          x = y
7          y = y.p
8      return y
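The two cases translate into Python as below. The small tree built by hand is only an illustration (its shape echoes the 15 → 6 → 7 → 13 path of Figure 12.2, but it is not the full figure), and parent pointers are set explicitly so that the upward walk of case 2 works:

```python
class Node:
    def __init__(self, key, parent=None):
        self.key = key
        self.left = None
        self.right = None
        self.p = parent  # parent pointer, needed for the upward walk

def tree_minimum(x):
    while x.left is not None:
        x = x.left
    return x

def tree_successor(x):
    if x.right is not None:
        # Case 1: the leftmost node of x's right subtree.
        return tree_minimum(x.right)
    # Case 2: climb until we move up from a left child (or hit the root).
    y = x.p
    while y is not None and x is y.right:
        x = y
        y = y.p
    return y  # None if x is the last node of the inorder walk

n15 = Node(15)
n6  = Node(6, n15);  n15.left  = n6
n18 = Node(18, n15); n15.right = n18
n7  = Node(7, n6);   n6.right  = n7
n13 = Node(13, n7);  n7.right  = n13
```

Here the successor of the node with key 13 (which has no right subtree) is the node with key 15, found by climbing two levels, while the successor of the node with key 6 is found in its right subtree.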
The running time of TREE-SUCCESSOR on a tree of height h is
O( h), since it either follows a simple path up the tree or follows a simple path down the tree. The procedure TREE-PREDECESSOR, which is
symmetric to TREE-SUCCESSOR, also runs in O( h) time.
In summary, we have proved the following theorem.
Theorem 12.2
The dynamic-set operations SEARCH, MINIMUM, MAXIMUM,
SUCCESSOR, and PREDECESSOR can be implemented so that each
one runs in O( h) time on a binary search tree of height h.
▪
Exercises
12.2-1
You are searching for the number 363 in a binary search tree containing
numbers between 1 and 1000. Which of the following sequences cannot
be the sequence of nodes examined?
a. 2, 252, 401, 398, 330, 344, 397, 363.
b. 924, 220, 911, 244, 898, 258, 362, 363.
c. 925, 202, 911, 240, 912, 245, 363.
d. 2, 399, 387, 219, 266, 382, 381, 278, 363.
e. 935, 278, 347, 621, 299, 392, 358, 363.
12.2-2
Write recursive versions of TREE-MINIMUM and TREE-
MAXIMUM.
12.2-3
Write the TREE-PREDECESSOR procedure.
12.2-4
Professor Kilmer claims to have discovered a remarkable property of
binary search trees. Suppose that the search for key k in a binary search
tree ends up at a leaf. Consider three sets: A, the keys to the left of the
search path; B, the keys on the search path; and C, the keys to the right of the search path. Professor Kilmer claims that any three keys a ∈ A, b
∈ B, and c ∈ C must satisfy a ≤ b ≤ c. Give a smallest possible counterexample to the professor’s claim.
12.2-5
Show that if a node in a binary search tree has two children, then its successor has no left child and its predecessor has no right child.
12.2-6
Consider a binary search tree T whose keys are distinct. Show that if the
right subtree of a node x in T is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x. (Recall that every node is its own ancestor.)
12.2-7
An alternative method of performing an inorder tree walk of an n-node
binary search tree finds the minimum element in the tree by calling
TREE-MINIMUM and then making n − 1 calls to TREE-
SUCCESSOR. Prove that this algorithm runs in Θ( n) time.
12.2-8
Prove that no matter what node you start at in a height- h binary search
tree, k successive calls to TREE-SUCCESSOR take O( k + h) time.
12.2-9
Let T be a binary search tree whose keys are distinct, let x be a leaf node, and let y be its parent. Show that y.key is either the smallest key in T larger than x.key or the largest key in T smaller than x.key.
12.3 Insertion and deletion
The operations of insertion and deletion cause the dynamic set
represented by a binary search tree to change. The data structure must
be modified to reflect this change, but in such a way that the binary-
search-tree property continues to hold. We’ll see that modifying the tree
to insert a new element is relatively straightforward, but deleting a node
from a binary search tree is more complicated.
Insertion
The TREE-INSERT procedure inserts a new node into a binary search
tree. The procedure takes a binary search tree T and a node z for which
z.key has already been filled in, z.left = NIL, and z.right = NIL. It modifies T and some of the attributes of z so as to insert z into an appropriate position in the tree.
TREE-INSERT(T, z)
 1  x = T.root  // node being compared with z
 2  y = NIL     // y will be parent of z
 3  while x ≠ NIL  // descend until reaching a leaf
 4      y = x
 5      if z.key < x.key
 6          x = x.left
 7      else x = x.right
 8  z.p = y  // found the location—insert z with parent y
 9  if y == NIL
10      T.root = z  // tree T was empty
11  elseif z.key < y.key
12      y.left = z
13  else y.right = z
Figure 12.3 shows how TREE-INSERT works. Just like the
procedures TREE-SEARCH and ITERATIVE-TREE-SEARCH,
TREE-INSERT begins at the root of the tree and the pointer x traces a
simple path downward looking for a NIL to replace with the input node
z. The procedure maintains the trailing pointer y as the parent of x.
After initialization, the while loop in lines 3–7 causes these two pointers
to move down the tree, going left or right depending on the comparison
of z.key with x.key, until x becomes NIL. This NIL occupies the position where node z will go. More precisely, this NIL is a left or right attribute of the node that will become z’s parent, or it is T.root if tree T
is currently empty. The procedure needs the trailing pointer y, because
by the time it finds the NIL where z belongs, the search has proceeded
one step beyond the node that needs to be changed. Lines 8–13 set the
pointers that cause z to be inserted.
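The descent with a trailing pointer is easy to mirror in Python. In this sketch the Node and Tree classes are illustrative assumptions, and an inorder walk checks that the binary-search-tree property holds after a sequence of insertions:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.p = None

class Tree:
    def __init__(self):
        self.root = None

def tree_insert(T, z):
    x = T.root   # node being compared with z
    y = None     # y trails x: it will become z's parent
    while x is not None:             # descend until reaching a None
        y = x
        x = x.left if z.key < x.key else x.right
    z.p = y      # found the location: insert z with parent y
    if y is None:
        T.root = z                   # tree T was empty
    elif z.key < y.key:
        y.left = z
    else:
        y.right = z

def inorder(x, out):
    if x is not None:
        inorder(x.left, out)
        out.append(x.key)
        inorder(x.right, out)

T = Tree()
for k in [12, 5, 18, 2, 9, 15, 19, 13]:
    tree_insert(T, Node(k))

keys = []
inorder(T.root, keys)
print(keys)  # [2, 5, 9, 12, 13, 15, 18, 19]
```

Because each insertion follows a single root-to-leaf path, inserting into a tree of height h takes O(h) time, just as for the search procedures.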
Figure 12.3 Inserting a node with key 13 into a binary search tree. The simple path from the root down to the position where the node is inserted is shown in blue. The new node and the link to its parent are highlighted in orange.