element searched for is equally likely to be any one of the elements in

the table, so the longer the list, the more likely that the search is for one

of its elements. Even so, the expected search time still turns out to be

Θ(1 + α).

Theorem 11.2

In a hash table in which collisions are resolved by chaining, a successful

search takes Θ(1 + α) time on average, under the assumption of

independent uniform hashing.

Proof We assume that the element being searched for is equally likely

to be any of the n elements stored in the table. The number of elements

examined during a successful search for an element x is 1 more than the

number of elements that appear before x in x’s list. Because new elements are placed at the front of the list, elements before x in the list

were all inserted after x was inserted. Let x_i denote the ith element inserted into the table, for i = 1, 2, …, n, and let k_i = x_i.key.

Our analysis uses indicator random variables extensively. For each

slot q in the table and for each pair of distinct keys k_i and k_j, we define the indicator random variable

X_ijq = I{the search is for x_i, h(k_i) = q, and h(k_j) = q}.


That is, X_ijq = 1 when keys k_i and k_j collide at slot q and the search is for element x_i. Because Pr{the search is for x_i} = 1/n, Pr{h(k_i) = q} = 1/m, Pr{h(k_j) = q} = 1/m, and these events are all independent, we have that Pr{X_ijq = 1} = 1/nm². Lemma 5.1 on page 130 gives E[X_ijq] = 1/nm².

Next, we define, for each element x_j, the indicator random variable

Y_j = I{x_j appears in a list prior to the element being searched for}
    = Σ_{q=0}^{m−1} Σ_{i=1}^{j−1} X_ijq,

since at most one of the X_ijq equals 1, namely when the element x_i being searched for belongs to the same list as x_j (the list pointed to by slot q), and i < j (so that x_i appears after x_j in the list).

Our final random variable is Z, which counts how many elements appear in the list prior to the element being searched for:

Z = Σ_{j=1}^{n} Y_j.

Because we must count the element being searched for as well as all those preceding it in its list, we wish to compute E[Z + 1]. Using linearity of expectation (equation (C.24) on page 1192), we have

E[Z + 1] = 1 + E[Σ_{j=1}^{n} Y_j]
         = 1 + Σ_{j=1}^{n} Σ_{q=0}^{m−1} Σ_{i=1}^{j−1} E[X_ijq]
         = 1 + m · (n(n − 1)/2) · (1/nm²)
         = 1 + (n − 1)/(2m)
         = 1 + α/2 − α/(2n).

Thus, the total time required for a successful search (including the time for computing the hash function) is Θ(2 + α/2 − α/(2n)) = Θ(1 + α).

What does this analysis mean? If the number of elements in the table

is at most proportional to the number of hash-table slots, we have n =

O(m) and, consequently, α = n/m = O(m)/m = O(1). Thus, searching takes constant time on average. Since insertion takes O(1) worst-case

time and deletion takes O(1) worst-case time when the lists are doubly

linked (assuming that the list element to be deleted is known, and not

just its key), we can support all dictionary operations in O(1) time on

average.
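To make the chaining scheme concrete, here is a minimal Python sketch of a chained hash table (an illustration, not the book's pseudocode). It inserts at the front of each chain, as the analysis above assumes; note that deleting by key from a Python list takes time proportional to the chain length, unlike the O(1) worst-case deletion with doubly linked lists and a handle on the element.

class ChainedHashTable:
    def __init__(self, m, h):
        self.slots = [[] for _ in range(m)]   # each slot holds a chain
        self.h = h                            # hash function into {0,...,m-1}

    def insert(self, key, value):
        self.slots[self.h(key)].insert(0, (key, value))   # front of the chain

    def search(self, key):
        for k, v in self.slots[self.h(key)]:              # walk the chain
            if k == key:
                return v
        return None

    def delete(self, key):
        chain = self.slots[self.h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return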

The analysis in the preceding two theorems depends only on two

essential properties of independent uniform hashing: uniformity (each

key is equally likely to hash to any one of the m slots), and

independence (so any two distinct keys collide with probability 1/m).

Exercises

11.2-1

You use a hash function h to hash n distinct keys into an array T of length m. Assuming independent uniform hashing, what is the expected number of collisions? More precisely, what is the expected cardinality of {{k_1, k_2} : k_1 ≠ k_2 and h(k_1) = h(k_2)}?

11.2-2

Consider a hash table with 9 slots and the hash function h(k) = k mod 9.

Demonstrate what happens upon inserting the keys 5, 28, 19, 15, 20, 33,

12, 17, 10 with collisions resolved by chaining.

11.2-3

Professor Marley hypothesizes that he can obtain substantial

performance gains by modifying the chaining scheme to keep each list

in sorted order. How does the professor’s modification affect the

running time for successful searches, unsuccessful searches, insertions,

and deletions?

11.2-4

Suggest how to allocate and deallocate storage for elements within the

hash table itself by creating a “free list”: a linked list of all the unused

slots. Assume that one slot can store a flag and either one element plus a

pointer or two pointers. All dictionary and free-list operations should

run in O(1) expected time. Does the free list need to be doubly linked, or

does a singly linked free list suffice?

11.2-5

You need to store a set of n keys in a hash table of size m. Show that if the keys are drawn from a universe U with |U| > (n − 1)m, then U has a subset of size n consisting of keys that all hash to the same slot, so that the worst-case searching time for hashing with chaining is Θ(n).

11.2-6

You have stored n keys in a hash table of size m, with collisions resolved by chaining, and you know the length of each chain, including the length L of the longest chain. Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L · (1 + 1/α)).

11.3 Hash functions

For hashing to work well, it needs a good hash function. Along with

being efficiently computable, what properties does a good hash function

have? How do you design good hash functions?

This section first attempts to answer these questions based on two ad

hoc approaches for creating hash functions: hashing by division and

hashing by multiplication. Although these methods work well for some

sets of input keys, they are limited because they try to provide a single

fixed hash function that works well on any data—an approach called

static hashing.

We then see that provably good average-case performance for any

data can be obtained by designing a suitable family of hash functions and choosing a hash function at random from this family at runtime,

independent of the data to be hashed. The approach we examine is

called random hashing. A particular kind of random hashing, universal

hashing, works well. As we saw with quicksort in Chapter 7,

randomization is a powerful algorithmic design tool.

What makes a good hash function?

A good hash function satisfies (approximately) the assumption of

independent uniform hashing: each key is equally likely to hash to any

of the m slots, independently of where any other keys have hashed to.

What does “equally likely” mean here? If the hash function is fixed, any

probabilities would have to be based on the probability distribution of

the input keys.

Unfortunately, you typically have no way to check this condition,

unless you happen to know the probability distribution from which the

keys are drawn. Moreover, the keys might not be drawn independently.

Occasionally you might know the distribution. For example, if you

know that the keys are random real numbers k independently and

uniformly distributed in the range 0 ≤ k < 1, then the hash function

h(k) = ⌊km⌋

satisfies the condition of independent uniform hashing.

A good static hashing approach derives the hash value in a way that

you expect to be independent of any patterns that might exist in the

data. For example, the “division method” (discussed in Section 11.3.1) computes the hash value as the remainder when the key is divided by a

specified prime number. This method may give good results, if you

(somehow) choose a prime number that is unrelated to any patterns in

the distribution of keys.

Random hashing, described in Section 11.3.2, picks the hash function to be used at random from a suitable family of hashing

functions. This approach removes any need to know anything about the

probability distribution of the input keys, as the randomization

necessary for good average-case behavior then comes from the (known)

random process used to pick the hash function from the family of hash

functions, rather than from the (unknown) process used to create the

input keys. We recommend that you use random hashing.

Keys are integers, vectors, or strings

In practice, a hash function is designed to handle keys that are one of

the following two types:

A short nonnegative integer that fits in a w-bit machine word.

Typical values for w would be 32 or 64.

A short vector of nonnegative integers, each of bounded size. For

example, each element might be an 8-bit byte, in which case the

vector is often called a (byte) string. The vector might be of

variable length.

To begin, we assume that keys are short nonnegative integers. Handling

vector keys is more complicated and discussed in Sections 11.3.5 and

11.5.2.

11.3.1 Static hashing

Static hashing uses a single, fixed hash function. The only

randomization available is through the (usually unknown) distribution

of input keys. This section discusses two standard approaches for static

hashing: the division method and the multiplication method. Although

static hashing is no longer recommended, the multiplication method

also provides a good foundation for “nonstatic” hashing—better known

as random hashing—where the hash function is chosen at random from

a suitable family of hash functions.

The division method

The division method for creating hash functions maps a key k into one of m slots by taking the remainder of k divided by m. That is, the hash function is

h(k) = k mod m.

For example, if the hash table has size m = 12 and the key is k = 100, then h(k) = 4. Since it requires only a single division operation, hashing by division is quite fast.


The division method may work well when m is a prime not too close

to an exact power of 2. There is no guarantee that this method provides

good average-case performance, however, and it may complicate

applications since it constrains the size of the hash tables to be prime.

The multiplication method

The general multiplication method for creating hash functions operates

in two steps. First, multiply the key k by a constant A in the range 0 < A < 1 and extract the fractional part of kA. Then, multiply this value by m and take the floor of the result. That is, the hash function is

h(k) = ⌊m (kA mod 1)⌋,

where "kA mod 1" means the fractional part of kA, that is, kA − ⌊kA⌋.

The general multiplication method has the advantage that the value of

m is not critical and you can choose it independently of how you choose

the multiplicative constant A.

Figure 11.4 The multiply-shift method to compute a hash function. The w-bit representation of the key k is multiplied by the w-bit value a = A · 2^w. The ℓ highest-order bits of the lower w-bit half of the product form the desired hash value h_a(k).

The multiply-shift method

In practice, the multiplication method is best in the special case where

the number m of hash-table slots is an exact power of 2, so that m = 2^ℓ for some integer ℓ with ℓ ≤ w, where w is the number of bits in a machine


word. If you choose a fixed w-bit positive integer a = A · 2^w, where 0 < A < 1 as in the multiplication method so that a is in the range 0 < a < 2^w, you can implement the function on most computers as follows. We

assume that a key k fits into a single w-bit word.

Referring to Figure 11.4, first multiply k by the w-bit integer a. The result is a 2w-bit value r_1 · 2^w + r_0, where r_1 is the high-order w-bit word of the product and r_0 is the low-order w-bit word of the product. The desired ℓ-bit hash value consists of the ℓ most significant bits of r_0. (Since r_1 is ignored, the hash function can be implemented on a computer that produces only a w-bit product given two w-bit inputs, that is, where the multiplication operation computes modulo 2^w.)

In other words, you define the hash function h = h_a, where

h_a(k) = (ka mod 2^w) ⋙ (w − ℓ)    (11.2)

for a fixed nonzero w-bit value a. Since the product ka of two w-bit words occupies 2w bits, taking this product modulo 2^w zeroes out the high-order w bits (r_1), leaving only the low-order w bits (r_0). The ⋙ operator performs a logical right shift by w − ℓ bits, shifting zeros into the vacated positions on the left, so that the ℓ most significant bits of r_0 move into the ℓ rightmost positions. (It's the same as dividing by 2^{w−ℓ} and taking the floor of the result.) The resulting value equals the ℓ most significant bits of r_0. The hash function h_a can be implemented with three machine instructions: multiplication, subtraction, and logical right shift.

As an example, suppose that k = 123456, ℓ = 14, m = 2^14 = 16384, and w = 32. Suppose further that we choose a = 2654435769 (following a suggestion of Knuth [261]). Then ka = 327706022297664 = (76300 · 2^32) + 17612864, and so r_1 = 76300 and r_0 = 17612864. The 14 most significant bits of r_0 yield the value h_a(k) = 67.
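The following Python sketch (an illustration, not part of the text) implements the multiply-shift hash of equation (11.2) and reproduces the example above; Python's arbitrary-precision integers stand in for w-bit machine words.

def multiply_shift(k, a, ell, w=32):
    """h_a(k) = (k*a mod 2^w) >> (w - ell): the ell highest-order
    bits of the low-order w bits of the product (equation (11.2))."""
    r0 = (k * a) % (1 << w)    # low-order w-bit word of k*a
    return r0 >> (w - ell)     # keep the ell most significant bits of r0

# Reproduces the worked example: k = 123456, a = 2654435769, ell = 14, w = 32.
print(multiply_shift(123456, 2654435769, 14))  # prints 67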

Even though the multiply-shift method is fast, it doesn’t provide any

guarantee of good average-case performance. The universal hashing

approach presented in the next section provides such a guarantee. A simple randomized variant of the multiply-shift method works well on

the average, when the program begins by picking a as a randomly

chosen odd integer.

11.3.2 Random hashing

Suppose that a malicious adversary chooses the keys to be hashed by

some fixed hash function. Then the adversary can choose n keys that all

hash to the same slot, yielding an average retrieval time of Θ( n). Any static hash function is vulnerable to such terrible worst-case behavior.

The only effective way to improve the situation is to choose the hash

function randomly in a way that is independent of the keys that are actually going to be stored. This approach is called random hashing. A

special case of this approach, called universal hashing, can yield provably

good performance on average when collisions are handled by chaining,

no matter which keys the adversary chooses.

To use random hashing, at the beginning of program execution you

select the hash function at random from a suitable family of functions.

As in the case of quicksort, randomization guarantees that no single

input always evokes worst-case behavior. Because you randomly select

the hash function, the algorithm can behave differently on each

execution, even for the same set of keys to be hashed, guaranteeing

good average-case performance.

Let H be a finite family of hash functions that map a given universe U of keys into the range {0, 1, …, m − 1}. Such a family is said to be universal if for each pair of distinct keys k_1, k_2 ∈ U, the number of hash functions h ∈ H for which h(k_1) = h(k_2) is at most |H|/m. In other words, with a hash function randomly chosen from H, the chance of a collision between distinct keys k_1 and k_2 is no more than the chance 1/m of a collision if h(k_1) and h(k_2) were randomly and independently chosen from the set {0, 1, …, m − 1}.

Independent uniform hashing is the same as picking a hash function uniformly at random from a family of m^n hash functions, each member of that family mapping the n keys to the m hash values in a different way. Every independent uniform family of hash functions is universal, but the converse need not be true: consider the case where U = {0, 1, …, m − 1} and the only hash function in the family is the identity function. The probability that two distinct keys collide is zero, even though each key hashes to a fixed value.

The following corollary to Theorem 11.2 on page 279 says that

universal hashing provides the desired payoff: it becomes impossible for

an adversary to pick a sequence of operations that forces the worst-case

running time.

Corollary 11.3

Using universal hashing and collision resolution by chaining in an

initially empty table with m slots, it takes Θ(s) expected time to handle any sequence of s INSERT, SEARCH, and DELETE operations containing n = O(m) INSERT operations.

Proof The INSERT and DELETE operations take constant time. Since the number n of insertions is O(m), we have that α = O(1). Furthermore, the expected time for each SEARCH operation is O(1), which can be seen by examining the proof of Theorem 11.2. That analysis depends only on collision probabilities, which are 1/m for any pair k_1, k_2 of keys by the choice of an independent uniform hash function in that theorem. Using a universal family of hash functions here instead of using independent uniform hashing changes the probability of collision from exactly 1/m to at most 1/m. By linearity of expectation, therefore, the expected time for the entire sequence of s operations is O(s). Since each operation takes Ω(1) time, the Θ(s) bound follows.

11.3.3 Achievable properties of random hashing

There is a rich literature on the properties a family H of hash functions

can have, and how they relate to the efficiency of hashing. We

summarize a few of the most interesting ones here.

Let H be a family of hash functions, each with domain U and range

{0, 1, …, m − 1}, and let h be any hash function that is picked uniformly at random from H. The probabilities mentioned are probabilities over

the picks of h.

The family H is uniform if for any key k in U and any slot q in the range {0, 1, …, m − 1}, the probability that h(k) = q is 1/m.

The family H is universal if for any distinct keys k_1 and k_2 in U, the probability that h(k_1) = h(k_2) is at most 1/m.

The family H of hash functions is ϵ-universal if for any distinct keys k_1 and k_2 in U, the probability that h(k_1) = h(k_2) is at most ϵ. Therefore, a universal family of hash functions is also 1/m-universal.

The family H is d-independent if for any distinct keys k_1, k_2, …, k_d in U and any slots q_1, q_2, …, q_d, not necessarily distinct, in {0, 1, …, m − 1}, the probability that h(k_i) = q_i for i = 1, 2, …, d is 1/m^d.

Universal hash-function families are of particular interest, as they are

the simplest type supporting provably efficient hash-table operations for

any input data set. Many other interesting and desirable properties, such

as those noted above, are also possible and allow for efficient specialized

hash-table operations.

11.3.4 Designing a universal family of hash functions

This section presents two ways to design a universal (or ϵ-universal)

family of hash functions: one based on number theory and another

based on a randomized variant of the multiply-shift method presented

in Section 11.3.1. The first method is a bit easier to prove universal, but the second method is newer and faster in practice.

A universal family of hash functions based on number theory


We can design a universal family of hash functions using a little number

theory. You may wish to refer to Chapter 31 if you are unfamiliar with basic concepts in number theory.

Begin by choosing a prime number p large enough so that every possible key k lies in the range 0 to p − 1, inclusive. We assume here that p has a "reasonable" length. (See Section 11.3.5 for a discussion of methods for handling long input keys, such as variable-length strings.) Let ℤ_p denote the set {0, 1, …, p − 1}, and let ℤ*_p denote the set {1, 2, …, p − 1}. Since p is prime, we can solve equations modulo p with the methods given in Chapter 31. Because the size of the universe of keys is greater than the number of slots in the hash table (otherwise, just use direct addressing), we have p > m.

Given any a ∈ ℤ*_p and any b ∈ ℤ_p, define the hash function h_ab as a linear transformation followed by reductions modulo p and then modulo m:

h_ab(k) = ((ak + b) mod p) mod m.    (11.3)

For example, with p = 17 and m = 6, we have

h_{3,4}(8) = ((3 · 8 + 4) mod 17) mod 6
           = (28 mod 17) mod 6
           = 11 mod 6
           = 5.

Given p and m, the family of all such hash functions is

H_pm = {h_ab : a ∈ ℤ*_p and b ∈ ℤ_p}.    (11.4)

Each hash function h_ab maps ℤ_p to ℤ_m. This family of hash functions has the nice property that the size m of the output range (which is the size of the hash table) is arbitrary—it need not be prime. Since you can choose from among p − 1 values for a and p values for b, the family H_pm contains p(p − 1) hash functions.
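As an illustration (not from the text), here is how one might draw a hash function at random from H_pm in Python; the helper name is an assumption of the sketch.

import random

def random_hash_from_Hpm(p, m):
    """Pick h_ab uniformly from H_pm (equations (11.3) and (11.4))."""
    a = random.randrange(1, p)       # a in Z*_p = {1, ..., p-1}
    b = random.randrange(0, p)       # b in Z_p  = {0, ..., p-1}
    return lambda k: ((a * k + b) % p) % m

# With a = 3, b = 4, p = 17, m = 6 this matches the example: h_{3,4}(8) = 5.
h34 = lambda k: ((3 * k + 4) % 17) % 6
print(h34(8))  # prints 5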

Theorem 11.4


The family H_pm of hash functions defined by equations (11.3) and

(11.4) is universal.

Proof Consider two distinct keys k_1 and k_2 from ℤ_p, so that k_1 ≠ k_2. For a given hash function h_ab, let

r_1 = (ak_1 + b) mod p,
r_2 = (ak_2 + b) mod p.

We first note that r_1 ≠ r_2. Why? Since we have r_1 − r_2 ≡ a(k_1 − k_2) (mod p), it follows that r_1 ≠ r_2 because p is prime and both a and (k_1 − k_2) are nonzero modulo p. By Theorem 31.6 on page 908, their product must also be nonzero modulo p. Therefore, when computing any h_ab ∈ H_pm, distinct inputs k_1 and k_2 map to distinct values r_1 and r_2 modulo p, and there are no collisions yet at the "mod p level." Moreover, each of the possible p(p − 1) choices for the pair (a, b) with a ≠ 0 yields a different resulting pair (r_1, r_2) with r_1 ≠ r_2, since we can solve for a and b given r_1 and r_2:

a = ((r_1 − r_2)((k_1 − k_2)^{-1} mod p)) mod p,
b = (r_1 − ak_1) mod p,

where ((k_1 − k_2)^{-1} mod p) denotes the unique multiplicative inverse, modulo p, of k_1 − k_2. For each of the p possible values of r_1, there are only p − 1 possible values of r_2 that do not equal r_1, making only p(p − 1) possible pairs (r_1, r_2) with r_1 ≠ r_2. Therefore, there is a one-to-one correspondence between pairs (a, b) with a ≠ 0 and pairs (r_1, r_2) with r_1 ≠ r_2. Thus, for any given pair of distinct inputs k_1 and k_2, if we pick (a, b) uniformly at random from ℤ*_p × ℤ_p, the resulting pair (r_1, r_2) is equally likely to be any pair of distinct values modulo p.

Therefore, the probability that distinct keys k_1 and k_2 collide is equal to the probability that r_1 ≡ r_2 (mod m) when r_1 and r_2 are randomly chosen as distinct values modulo p. For a given value of r_1, of the p − 1 possible remaining values for r_2, the number of values r_2 such that r_2 ≠ r_1 and r_2 ≡ r_1 (mod m) is at most

⌈p/m⌉ − 1 ≤ ((p + m − 1)/m) − 1
           = (p − 1)/m.

The probability that r_2 collides with r_1 when reduced modulo m is at most ((p − 1)/m)/(p − 1) = 1/m, since r_2 is equally likely to be any of the p − 1 values in ℤ_p that are different from r_1, but at most (p − 1)/m of those values are equivalent to r_1 modulo m.

Therefore, for any pair of distinct values k_1, k_2 ∈ ℤ_p,

Pr{h_ab(k_1) = h_ab(k_2)} ≤ 1/m,

so that H_pm is indeed universal.

A 2/m-universal family of hash functions based on the multiply-shift method

We recommend that in practice you use the following hash-function family based on the multiply-shift method. It is exceptionally efficient and (although we omit the proof) provably 2/m-universal. Define H to be the family of multiply-shift hash functions with odd constants a:

H = {h_a : a is odd and 0 < a < 2^w},    (11.5)

where each h_a is as defined in equation (11.2).

Theorem 11.5

The family of hash functions H given by equation (11.5) is 2/m-universal.

That is, the probability that any two distinct keys collide is at most 2/m. In many practical situations, the speed of computing the hash function more than compensates for the higher upper bound on the probability that two distinct keys collide, when compared with a universal hash function.
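A sketch of drawing a member of this family at random (an illustration; the word size w = 64 and table size are assumptions of the example):

import random

w, ell = 64, 14                       # table size m = 2**ell
a = random.randrange(1, 1 << w) | 1   # random odd w-bit multiplier

def h(k):
    """A randomly chosen member of the 2/m-universal family (equation (11.5))."""
    return ((k * a) % (1 << w)) >> (w - ell)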

11.3.5 Hashing long inputs such as vectors or strings

Sometimes hash function inputs are so long that they cannot be easily

encoded modulo a reasonably sized prime number p or encoded within

a single word of, say, 64 bits. As an example, consider the class of

vectors, such as vectors of 8-bit bytes (which is how strings in many

programming languages are stored). A vector might have an arbitrary

nonnegative length, in which case the length of the input to the hash

function may vary from input to input.

Number-theoretic approaches

One way to design good hash functions for variable-length inputs is to

extend the ideas used in Section 11.3.4 to design universal hash functions. Exercise 11.3-6 explores one such approach.

Cryptographic hashing

Another way to design a good hash function for variable-length inputs

is to use a hash function designed for cryptographic applications.

Cryptographic hash functions are complex pseudorandom functions,

designed for applications requiring properties beyond those needed

here, but are robust, widely implemented, and usable as hash functions

for hash tables.

A cryptographic hash function takes as input an arbitrary byte string

and returns a fixed-length output. For example, the NIST standard

deterministic cryptographic hash function SHA-256 [346] produces a 256-bit (32-byte) output for any input.

Some chip manufacturers include instructions in their CPU

architectures to provide fast implementations of some cryptographic

functions. Of particular interest are instructions that efficiently

implement rounds of the Advanced Encryption Standard (AES), the

“AES-NI” instructions. These instructions execute in a few tens of

nanoseconds, which is generally fast enough for use with hash tables. A

message authentication code such as CBC-MAC based on AES and the

use of the AES-NI instructions could be a useful and efficient hash

function. We don’t pursue the potential use of specialized instruction

sets further here.

Cryptographic hash functions are useful because they provide a way

of implementing an approximate version of a random oracle. As noted

earlier, a random oracle is equivalent to an independent uniform hash

function family. From a theoretical point of view, a random oracle is an

unachievable ideal: a deterministic function that provides a randomly

selected output for each input. Because it is deterministic, it provides

the same output if queried again for the same input. From a practical

point of view, constructions of hash function families based on

cryptographic hash functions are sensible substitutes for random

oracles.

There are many ways to use a cryptographic hash function as a hash

function. For example, we could define

h(k) = SHA-256(k) mod m.

To define a family of such hash functions one may prepend a "salt" string a to the input before hashing it, as in

h_a(k) = SHA-256(ak) mod m,

where ak denotes the string formed by concatenating the strings a and k. The literature on message authentication codes (MACs) provides additional approaches.
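For instance, a salted SHA-256 hash along these lines can be written in Python with the standard hashlib module (a sketch; the salt value and byte-string keys are assumptions of the example):

import hashlib

def make_salted_hash(salt: bytes, m: int):
    """h_a(k) = SHA-256(ak) mod m, with the salt playing the role of a."""
    def h(key: bytes) -> int:
        digest = hashlib.sha256(salt + key).digest()   # 32-byte output
        return int.from_bytes(digest, "big") % m
    return h

h = make_salted_hash(b"randomly-chosen-salt", 1024)
print(h(b"some key"))   # a slot number in {0, ..., 1023}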

Cryptographic approaches to hash-function design are becoming

more practical as computers arrange their memories in hierarchies of

differing capacities and speeds. Section 11.5 discusses one hash-function design based on the RC6 encryption method.

Exercises

11.3-1

You wish to search a linked list of length n, where each element contains

a key k along with a hash value h(k). Each key is a long character string.


How might you take advantage of the hash values when searching the

list for an element with a given key?

11.3-2

You hash a string of r characters into m slots by treating it as a radix-128 number and then using the division method. You can represent the

number m as a 32-bit computer word, but the string of r characters, treated as a radix-128 number, takes many words. How can you apply

the division method to compute the hash value of the character string

without using more than a constant number of words of storage outside

the string itself?

11.3-3

Consider a version of the division method in which h(k) = k mod m, where m = 2^p − 1 and k is a character string interpreted in radix 2^p.

Show that if string x can be converted to string y by permuting its characters, then x and y hash to the same value. Give an example of an

application in which this property would be undesirable in a hash

function.

11.3-4

Consider a hash table of size m = 1000 and a corresponding hash function h(k) = ⌊m (kA mod 1)⌋ for A = (√5 − 1)/2. Compute the locations to which the keys 61, 62, 63, 64, and 65 are mapped.

11.3-5

Show that any ϵ-universal family H of hash functions from a finite set

U to a finite set Q has ϵ ≥ 1/|Q| − 1/|U|.

11.3-6

Let U be the set of d-tuples of values drawn from ℤ_p, and let Q = ℤ_p, where p is prime. Define the hash function h_b : U → Q for b ∈ ℤ_p on an input d-tuple ⟨a_0, a_1, …, a_{d−1}⟩ from U as

h_b(⟨a_0, a_1, …, a_{d−1}⟩) = (Σ_{j=0}^{d−1} a_j b^j) mod p,

and let H = {h_b : b ∈ ℤ_p}. Argue that H is ϵ-universal for ϵ = (d − 1)/p. (Hint: See Exercise 31.4-4.)

11.4 Open addressing

This section describes open addressing, a method for collision

resolution that, unlike chaining, does not make use of storage outside of

the hash table itself. In open addressing, all elements occupy the hash table itself. That is, each table entry contains either an element of the

dynamic set or NIL. No lists or elements are stored outside the table,

unlike in chaining. Thus, in open addressing, the hash table can “fill up”

so that no further insertions can be made. One consequence is that the

load factor α can never exceed 1.

Collisions are handled as follows: when a new element is to be

inserted into the table, it is placed in its “first-choice” location if

possible. If that location is already occupied, the new element is placed

in its “second-choice” location. The process continues until an empty

slot is found in which to place the new element. Different elements have

different preference orders for the locations.

To search for an element, systematically examine the preferred table

slots for that element, in order of decreasing preference, until either you

find the desired element or you find an empty slot and thus verify that

the element is not in the table.

Of course, you could use chaining and store the linked lists inside the

hash table, in the otherwise unused hash-table slots (see Exercise 11.2-

4), but the advantage of open addressing is that it avoids pointers

altogether. Instead of following pointers, you compute the sequence of

slots to be examined. The memory freed by not storing pointers

provides the hash table with a larger number of slots in the same

amount of memory, potentially yielding fewer collisions and faster

retrieval.

To perform insertion using open addressing, successively examine, or

probe, the hash table until you find an empty slot in which to put the

key. Instead of being fixed in the order 0, 1, …, m − 1 (which implies a

Θ( n) search time), the sequence of positions probed depends upon the

key being inserted. To determine which slots to probe, the hash function

includes the probe number (starting from 0) as a second input. Thus, the

hash function becomes

h : U × {0, 1, …, m − 1} → {0, 1, …, m − 1}.

Open addressing requires that for every key k, the probe sequence ⟨h(k, 0), h(k, 1), …, h(k, m − 1)⟩ be a permutation of ⟨0, 1, …, m − 1⟩, so that every hash-table position is eventually considered as a slot for a new key

as the table fills up. The HASH-INSERT procedure on the following

page assumes that the elements in the hash table T are keys with no satellite information: the key k is identical to the element containing key

k. Each slot contains either a key or NIL (if the slot is empty). The HASH-INSERT procedure takes as input a hash table T and a key

k that is assumed to be not already present in the hash table. It either

returns the slot number where it stores key k or flags an error because

the hash table is already full.

HASH-INSERT(T, k)
1  i = 0
2  repeat
3      q = h(k, i)
4      if T[q] == NIL
5          T[q] = k
6          return q
7      else i = i + 1
8  until i == m
9  error “hash table overflow”

HASH-SEARCH(T, k)
1  i = 0
2  repeat
3      q = h(k, i)
4      if T[q] == k
5          return q
6      i = i + 1
7  until T[q] == NIL or i == m
8  return NIL

The algorithm for searching for key k probes the same sequence of

slots that the insertion algorithm examined when key k was inserted.

Therefore, the search can terminate (unsuccessfully) when it finds an

empty slot, since k would have been inserted there and not later in its

probe sequence. The procedure HASH-SEARCH takes as input a hash

table T and a key k, returning q if it finds that slot q contains key k, or NIL if key k is not present in table T.
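The two procedures translate directly into Python. The sketch below is illustrative, not the book's code: it takes the probe function as a parameter, and the simple linear_probe shown is just one possible choice.

def linear_probe(k, i, m):
    return (k + i) % m          # one illustrative probe function

def hash_insert(T, k, h=linear_probe):
    m = len(T)
    for i in range(m):
        q = h(k, i, m)
        if T[q] is None:        # None plays the role of NIL
            T[q] = k
            return q
    raise OverflowError("hash table overflow")

def hash_search(T, k, h=linear_probe):
    m = len(T)
    for i in range(m):
        q = h(k, i, m)
        if T[q] == k:
            return q
        if T[q] is None:        # k would have been placed here: not present
            return None
    return None

T = [None] * 11
hash_insert(T, 15)
print(hash_search(T, 15))       # prints 4 (since 15 mod 11 = 4)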

Deletion from an open-address hash table is tricky. When you delete

a key from slot q, it would be a mistake to mark that slot as empty by

simply storing NIL in it. If you did, you might be unable to retrieve any

key k for which slot q was probed and found occupied when k was inserted. One way to solve this problem is by marking the slot, storing

in it the special value DELETED instead of NIL. The HASH-INSERT

procedure then has to treat such a slot as empty so that it can insert a

new key there. The HASH-SEARCH procedure passes over DELETED

values while searching, since slots containing DELETED were filled

when the key being searched for was inserted. Using the special value

DELETED, however, means that search times no longer depend on the

load factor α, and for this reason chaining is frequently selected as a collision resolution technique when keys must be deleted. There is a

simple special case of open addressing, linear probing, that avoids the

need to mark slots with DELETED. Section 11.5.1 shows how to delete from a hash table when using linear probing.

In our analysis, we assume independent uniform permutation hashing

(also confusingly known as uniform hashing in the literature): the probe

sequence of each key is equally likely to be any of the m! permutations

of 〈0, 1, …, m − 1〉. Independent uniform permutation hashing

generalizes the notion of independent uniform hashing defined earlier to

a hash function that produces not just a single slot number, but a whole probe sequence. True independent uniform permutation hashing is

difficult to implement, however, and in practice suitable approximations

(such as double hashing, defined below) are used.

We’ll examine both double hashing and its special case, linear

probing. These techniques guarantee that ⟨h(k, 0), h(k, 1), …, h(k, m − 1)⟩ is a permutation of ⟨0, 1, …, m − 1⟩ for each key k. (Recall that the second parameter to the hash function h is the probe number.) Neither

double hashing nor linear probing meets the assumption of independent

uniform permutation hashing, however. Double hashing cannot

generate more than m² different probe sequences (instead of the m! that independent uniform permutation hashing requires). Nonetheless,

double hashing has a large number of possible probe sequences and, as

you might expect, seems to give good results. Linear probing is even

more restricted, capable of generating only m different probe sequences.

Double hashing

Double hashing offers one of the best methods available for open

addressing because the permutations produced have many of the

characteristics of randomly chosen permutations. Double hashing uses a

hash function of the form

h(k, i) = (h_1(k) + i·h_2(k)) mod m,

where both h_1 and h_2 are auxiliary hash functions. The initial probe goes to position T[h_1(k)], and successive probe positions are offset from previous positions by the amount h_2(k), modulo m. Thus, the probe sequence here depends in two ways upon the key k, since the initial probe position h_1(k), the step size h_2(k), or both, may vary. Figure 11.5

gives an example of insertion by double hashing.

In order for the entire hash table to be searched, the value h_2(k) must be relatively prime to the hash-table size m. (See Exercise 11.4-5.) A convenient way to ensure this condition is to let m be an exact power of 2 and to design h_2 so that it always produces an odd number. Another


way is to let m be prime and to design h_2 so that it always returns a positive integer less than m. For example, you could choose m prime and let

Figure 11.5 Insertion by double hashing. The hash table has size 13 with h_1(k) = k mod 13 and h_2(k) = 1 + (k mod 11). Since 14 ≡ 1 (mod 13) and 14 ≡ 3 (mod 11), the key 14 goes into empty slot 9, after slots 1 and 5 are examined and found to be occupied.

h_1(k) = k mod m,
h_2(k) = 1 + (k mod m′),

where m′ is chosen to be slightly less than m (say, m − 1). For example, if k = 123456, m = 701, and m′ = 700, then h_1(k) = 80 and h_2(k) = 257, so that the first probe goes to position 80, and successive probes examine every 257th slot (modulo m) until the key has been found or every slot has been examined.
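For example, this probe sequence can be computed as follows (a Python sketch of the scheme just described, using the same constants):

def double_hash_probe(k, i, m=701, m_prime=700):
    """h(k, i) = (h1(k) + i*h2(k)) mod m with h1(k) = k mod m
    and h2(k) = 1 + (k mod m')."""
    return (k % m + i * (1 + k % m_prime)) % m

# For k = 123456: h1 = 80 and h2 = 257, so the probes begin 80, 337, 594, 150.
print([double_hash_probe(123456, i) for i in range(4)])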

Although values of m other than primes or exact powers of 2 can in

principle be used with double hashing, in practice it becomes more

difficult to efficiently generate h_2(k) (other than choosing h_2(k) = 1, which gives linear probing) in a way that ensures that it is relatively prime to m, in part because the relative density ϕ(m)/m of such numbers for general m may be small (see equation (31.25) on page 921).


When m is prime or an exact power of 2, double hashing produces

Θ(m²) probe sequences, since each possible (h_1(k), h_2(k)) pair yields a distinct probe sequence. As a result, for such values of m, double

hashing appears to perform close to the “ideal” scheme of independent

uniform permutation hashing.

Linear probing

Linear probing, a special case of double hashing, is the simplest open-

addressing approach to resolving collisions. As with double hashing, an

auxiliary hash function h_1 determines the first probe position h_1(k) for inserting an element. If slot T[h_1(k)] is already occupied, probe the next position T[h_1(k) + 1]. Keep going as necessary, on up to slot T[m − 1], and then wrap around to slots T[0], T[1], and so on, but never going past slot T[h_1(k) − 1]. To view linear probing as a special case of double hashing, just set the double-hashing step function h_2 to be fixed at 1: h_2(k) = 1 for all k. That is, the hash function is

h(k, i) = (h_1(k) + i) mod m

for i = 0, 1, …, m − 1. The value of h_1(k) determines the entire probe sequence, and so assuming that h_1(k) can take on any value in {0, 1, …, m − 1}, linear probing allows only m distinct probe sequences.

We’ll revisit linear probing in Section 11.5.1.

Analysis of open-address hashing

As in our analysis of chaining in Section 11.2, we analyze open addressing in terms of the load factor α = n/m of the hash table. With open addressing, at most one element occupies each slot, and thus n ≤ m, which implies α ≤ 1. The analysis below requires α to be strictly less than 1, and so we assume that at least one slot is empty. Because

deleting from an open-address hash table does not really free up a slot,

we assume as well that no deletions occur.

For the hash function, we assume independent uniform permutation

hashing. In this idealized scheme, the probe sequence ⟨h(k, 0), h(k, 1), …, h(k, m − 1)⟩ used to insert or search for each key k is equally likely to be any permutation of ⟨0, 1, …, m − 1⟩. Of course, any given key has

a unique fixed probe sequence associated with it. What we mean here is

that, considering the probability distribution on the space of keys and

the operation of the hash function on the keys, each possible probe

sequence is equally likely.

We now analyze the expected number of probes for hashing with

open addressing under the assumption of independent uniform

permutation hashing, beginning with the expected number of probes

made in an unsuccessful search (assuming, as stated above, that α < 1).

The bound proven, of 1/(1 − α) = 1 + α + α² + α³ + ⋯, has an intuitive interpretation. The first probe always occurs. With probability

approximately α, the first probe finds an occupied slot, so that a second

probe happens. With probability approximately α², the first two slots are occupied so that a third probe ensues, and so on.

Theorem 11.6

Given an open-address hash table with load factor α = n/ m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 −

α), assuming independent uniform permutation hashing and no

deletions.

Proof In an unsuccessful search, every probe but the last accesses an occupied slot that does not contain the desired key, and the last slot probed is empty. Let the random variable X denote the number of probes made in an unsuccessful search, and define the event A_i, for i = 1, 2, …, as the event that an ith probe occurs and it is to an occupied slot. Then the event {X ≥ i} is the intersection of events A_1 ∩ A_2 ∩ ⋯ ∩ A_{i−1}. We bound Pr{X ≥ i} by bounding Pr{A_1 ∩ A_2 ∩ ⋯ ∩ A_{i−1}}. By Exercise C.2-5 on page 1190,

Pr{A_1 ∩ A_2 ∩ ⋯ ∩ A_{i−1}} = Pr{A_1} · Pr{A_2 | A_1} · Pr{A_3 | A_1 ∩ A_2} ⋯ Pr{A_{i−1} | A_1 ∩ A_2 ∩ ⋯ ∩ A_{i−2}}.

Since there are n elements and m slots, Pr{A_1} = n/m. For j > 1, the probability that there is a jth probe and it is to an occupied slot, given that the first j − 1 probes were to occupied slots, is (n − j + 1)/(m − j + 1). This probability follows because the jth probe would be finding one of the remaining (n − (j − 1)) elements in one of the (m − (j − 1)) unexamined slots, and by the assumption of independent uniform permutation hashing, the probability is the ratio of these quantities. Since n < m implies that (n − j)/(m − j) ≤ n/m for all j in the range 0 ≤ j < m, it follows that for all i in the range 1 ≤ i ≤ m, we have

Pr{X ≥ i} = (n/m) · ((n − 1)/(m − 1)) ⋯ ((n − i + 2)/(m − i + 2))
          ≤ (n/m)^{i−1}
          = α^{i−1}.

The product in the first line has i − 1 factors. When i = 1, the product is 1, the identity for multiplication, and we get Pr{X ≥ 1} = 1, which makes sense, since there must always be at least 1 probe. If each of the first n probes is to an occupied slot, then all occupied slots have been probed. Then, the (n + 1)st probe must be to an empty slot, which gives Pr{X ≥ i} = 0 for i > n + 1. Now, we use equation (C.28) on page 1193 to bound the expected number of probes:

E[X] = Σ_{i=1}^{∞} Pr{X ≥ i}
     ≤ Σ_{i=1}^{∞} α^{i−1}
     = Σ_{i=0}^{∞} α^i
     = 1/(1 − α).

If α is a constant, Theorem 11.6 predicts that an unsuccessful search

runs in O(1) time. For example, if the hash table is half full, the average

number of probes in an unsuccessful search is at most 1/(1 − .5) = 2. If it

is 90% full, the average number of probes is at most 1/(1 − .9) = 10.

Theorem 11.6 yields almost immediately how well the HASH-

INSERT procedure performs.

Corollary 11.7

Inserting an element into an open-address hash table with load factor α,

where α < 1, requires at most 1/(1 − α) probes on average, assuming independent uniform permutation hashing and no deletions.

Proof An element is inserted only if there is room in the table, and thus

α < 1. Inserting a key requires an unsuccessful search followed by placing the key into the first empty slot found. Thus, the expected

number of probes is at most 1/(1 − α).

It takes a little more work to compute the expected number of

probes for a successful search.

Theorem 11.8

Given an open-address hash table with load factor α < 1, the expected

number of probes in a successful search is at most

(1/α) ln(1/(1 − α)),

assuming independent uniform permutation hashing with no deletions

and assuming that each key in the table is equally likely to be searched

for.

Proof A search for a key k reproduces the same probe sequence as when the element with key k was inserted. If k was the (i + 1)st key inserted into the hash table, then the load factor at the time it was inserted was i/m, and so by Corollary 11.7, the expected number of probes made in a search for k is at most 1/(1 − i/m) = m/(m − i). Averaging over all n keys in the hash table gives us the expected number of probes in a successful search:

(1/n) Σ_{i=0}^{n−1} m/(m − i) = (m/n) Σ_{i=0}^{n−1} 1/(m − i)
 = (1/α) Σ_{k=m−n+1}^{m} 1/k
 ≤ (1/α) ∫_{m−n}^{m} (1/x) dx      (bounding the sum by an integral)
 = (1/α) ln(m/(m − n))
 = (1/α) ln(1/(1 − α)).

If the hash table is half full, the expected number of probes in a

successful search is less than 1.387. If the hash table is 90% full, the

expected number of probes is less than 2.559. If α = 1, then in an unsuccessful search, all m slots must be probed. Exercise 11.4-4 asks you to analyze a successful search when α = 1.

Exercises

11.4-1

Consider inserting the keys 10, 22, 31, 4, 15, 28, 17, 88, 59 into a hash table of length m = 11 using open addressing. Illustrate the result of inserting these keys using linear probing with h(k, i) = (k + i) mod m and using double hashing with h_1(k) = k and h_2(k) = 1 + (k mod (m − 1)).

11.4-2

Write pseudocode for HASH-DELETE that fills the deleted key’s slot

with the special value DELETED, and modify HASH-SEARCH and

HASH-INSERT as needed to handle DELETED.

11.4-3

Consider an open-address hash table with independent uniform

permutation hashing and no deletions. Give upper bounds on the

expected number of probes in an unsuccessful search and on the

expected number of probes in a successful search when the load factor is

3/4 and when it is 7/8.

11.4-4

Show that the expected number of probes required for a successful

search when α = 1 (that is, when n = m), is H_m, the mth harmonic number.

11.4-5

Show that, with double hashing, if m and h_2(k) have greatest common divisor d ≥ 1 for some key k, then an unsuccessful search for key k examines (1/d)th of the hash table before returning to slot h_1(k). Thus, when d = 1, so that m and h_2(k) are relatively prime, the search may examine the entire hash table. (Hint: See Chapter 31.)

11.4-6

Consider an open-address hash table with a load factor α. Approximate

the nonzero value α for which the expected number of probes in an

unsuccessful search equals twice the expected number of probes in a

successful search. Use the upper bounds given by Theorems 11.6 and

11.8 for these expected numbers of probes.

11.5 Practical considerations

Efficient hash table algorithms are not only of theoretical interest, but

also of immense practical importance. Constant factors can matter. For

this reason, this section discusses two aspects of modern CPUs that are

not included in the standard RAM model presented in Section 2.2:

Memory hierarchies: The memory of modern CPUs has a number of

levels, from the fast registers, through one or more levels of cache

memory, to the main-memory level. Each successive level stores more

data than the previous level, but access is slower. As a consequence, a complex computation (such as a complicated hash function) that

works entirely within the fast registers can take less time than a single

read operation from main memory. Furthermore, cache memory is

organized in cache blocks of (say) 64 bytes each, which are always

fetched together from main memory. There is a substantial benefit for

ensuring that memory usage is local: reusing the same cache block is

much more efficient than fetching a different cache block from main

memory.

The standard RAM model measures efficiency of a hash-table

operation by counting the number of hash-table slots probed. In

practice, this metric is only a crude approximation to the truth, since

once a cache block is in the cache, successive probes to that cache

block are much faster than probes that must access main memory.

Advanced instruction sets: Modern CPUs may have sophisticated

instruction sets that implement advanced primitives useful for

encryption or other forms of cryptography. These instructions may be

useful in the design of exceptionally efficient hash functions.

Section 11.5.1 discusses linear probing, which becomes the collision-resolution method of choice in the presence of a memory hierarchy.

Section 11.5.2 suggests how to construct “advanced” hash functions based on cryptographic primitives, suitable for use on computers with

hierarchical memory models.

11.5.1 Linear probing

Linear probing is often disparaged because of its poor performance in

the standard RAM model. But linear probing excels for hierarchical

memory models, because successive probes are usually to the same

cache block of memory.

Deletion with linear probing

Another reason why linear probing is often not used in practice is that

deletion seems complicated or impossible without using the special

DELETED value. Yet we’ll now see that deletion from a hash table

based on linear probing is not all that difficult, even without the DELETED marker. The deletion procedure works for linear probing,

but not for open-address probing in general, because with linear

probing keys all follow the same simple cyclic probing sequence (albeit

with different starting points).

The deletion procedure relies on an "inverse" function to the linear-probing hash function h(k, i) = (h_1(k) + i) mod m, which maps a key k and a probe number i to a slot number in the hash table. The inverse function g maps a key k and a slot number q, where 0 ≤ q < m, to the probe number that reaches slot q:

g(k, q) = (q − h_1(k)) mod m.

If h(k, i) = q, then g(k, q) = i, and so h(k, g(k, q)) = q.

The procedure LINEAR-PROBING-HASH-DELETE below deletes the key stored in position q from hash table T. Figure 11.6 shows how it works. The procedure first deletes the key in position q by setting T[q] to NIL in line 2. It then searches for a slot q′ (if any) that contains a key that should be moved to the slot q just vacated by key k. Line 9 asks the critical question: does the key k′ in slot q′ need to be moved to the vacated slot q in order to preserve the accessibility of k′? If g(k′, q) < g(k′, q′), then during the insertion of k′ into the table, slot q was examined but found to be already occupied. But now slot q, where a search will look for k′, is empty. In this case, key k′ moves to slot q in line 10, and the search continues, to see whether any later key also needs to be moved to the slot q′ that was just freed up when k′ moved.


Figure 11.6 Deletion in a hash table that uses linear probing. The hash table has size 10 with h_1(k) = k mod 10. (a) The hash table after inserting keys in the order 74, 43, 93, 18, 82, 38, 92. (b) The hash table after deleting the key 43 from slot 3. Key 93 moves up to slot 3 to keep it accessible, and then key 92 moves up to slot 5 just vacated by key 93. No other keys need to be moved.

LINEAR-PROBING-HASH-DELETE(T, q)
1  while TRUE
2      T[q] = NIL                   // make slot q empty
3      q′ = q                       // starting point for search
4      repeat
5          q′ = (q′ + 1) mod m      // next slot number with linear probing
6          k′ = T[q′]               // next key to try to move
7          if k′ == NIL
8              return               // return when an empty slot is found
9      until g(k′, q) < g(k′, q′)   // was empty slot q probed before q′?
10     T[q] = k′                    // move k′ into slot q
11     q = q′                       // now free up slot q′
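Here is a direct Python transcription of the procedure (a sketch under stated assumptions: table T uses linear probing with auxiliary hash h1, and None stands for NIL):

def linear_probing_hash_delete(T, q, h1=lambda k, m: k % m):
    m = len(T)
    g = lambda k, slot: (slot - h1(k, m)) % m  # probe number that reaches slot
    while True:
        T[q] = None                            # make slot q empty
        qp = q                                 # qp plays the role of q'
        while True:
            qp = (qp + 1) % m                  # next slot with linear probing
            kp = T[qp]                         # next key k' to try to move
            if kp is None:
                return                         # empty slot found: done
            if g(kp, q) < g(kp, qp):           # slot q probed before q' for k'?
                break
        T[q] = kp                              # move k' into slot q
        q = qp                                 # now free up slot q'

# Reproduces Figure 11.6: delete key 43 from slot 3.
T = [None] * 10
for k in [74, 43, 93, 18, 82, 38, 92]:         # insert with linear probing
    q = k % 10
    while T[q] is not None:
        q = (q + 1) % 10
    T[q] = k
linear_probing_hash_delete(T, 3)
print(T)   # 93 has moved to slot 3 and 92 to slot 5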

Analysis of linear probing

Linear probing is popular to implement, but it exhibits a phenomenon

known as primary clustering. Long runs of occupied slots build up,

increasing the average search time. Clusters arise because an empty slot preceded by i full slots gets filled next with probability (i + 1)/m. Long runs of occupied slots tend to get longer, and the average search time

increases.

In the standard RAM model, primary clustering is a problem, and

general double hashing usually performs better than linear probing. By

contrast, in a hierarchical memory model, primary clustering is a

beneficial property, as elements are often stored together in the same

cache block. Searching proceeds through one cache block before

advancing to search the next cache block. With linear probing, the

running time for a key k of HASH-INSERT, HASH-SEARCH, or

LINEAR-PROBING-HASH-DELETE is at most proportional to the

distance from h_1(k) to the next empty slot.

The following theorem is due to Pagh et al. [351]. A more recent proof is given by Thorup [438]. We omit the proof here. The need for 5-independence is by no means obvious; see the cited proofs.

Theorem 11.9

If h_1 is 5-independent and α ≤ 2/3, then it takes expected constant time to search for, insert, or delete a key in a hash table using linear probing. (Indeed, the expected operation time is O(1/ϵ²) for α = 1 − ϵ.)

★ 11.5.2 Hash functions for hierarchical memory models

This section illustrates an approach for designing efficient hash tables in

a modern computer system having a memory hierarchy.

Because of the memory hierarchy, linear probing is a good choice for

resolving collisions, as probe sequences are sequential and tend to stay

within cache blocks. But linear probing is most efficient when the hash

function is complex (for example, 5-independent as in Theorem 11.9).

Fortunately, having a memory hierarchy means that complex hash

functions can be implemented efficiently.

As noted in Section 11.3.5, one approach is to use a cryptographic hash function such as SHA-256. Such functions are complex and


sufficiently random for hash table applications. On machines with

specialized instructions, cryptographic functions can be quite efficient.

Instead, we present here a simple hash function based only on

addition, multiplication, and swapping the halves of a word. This

function can be implemented entirely within the fast registers, and on a

machine with a memory hierarchy, its latency is small compared with

the time taken to access a random slot of the hash table. It is related to

the RC6 encryption algorithm and can for practical purposes be

considered a “random oracle.”

The wee hash function

Let w denote the word size of the machine (e.g., w = 64), assumed to be even, and let a and b be w-bit unsigned (nonnegative) integers such that a is odd. Let swap(x) denote the w-bit result of swapping the two w/2-bit halves of w-bit input x. That is,

swap(x) = (x ⋙ (w/2)) + (x ⋘ (w/2)),

where "⋙" is "logical right shift" (as in equation (11.2)) and "⋘" is "left shift." Define

f_a(k) = swap((2k² + ak) mod 2^w).

Thus, to compute f_a(k), evaluate the quadratic function 2k² + ak modulo 2^w and then swap the left and right halves of the result.

Let r denote a desired number of "rounds" for the computation of the hash function. We'll use r = 4, but the hash function is well defined for any nonnegative r. Denote by f_a^(r)(k) the result of iterating f_a a total of r times (that is, r rounds) starting with input value k. For any odd a and any r ≥ 0, the function f_a^(r), although complicated, is one-to-one (see Exercise 11.5-1). A cryptographer would view f_a^(r) as a simple block cipher operating on w-bit input blocks, with r rounds and key a.

We first define the wee hash function h for short inputs, where by "short" we mean "whose length t is at most w bits," so that the input fits within one computer word. We would like inputs of different lengths to be hashed differently. The wee hash function h_{a,b,t,r}(k) for parameters a, b, and r on t-bit input k is defined as

h_{a,b,t,r}(k) = f_{a+2t}^(r)(k + b) mod m.    (11.7)

That is, the hash value for t-bit input k is obtained by applying f_{a+2t}^(r) to k + b, then taking the final result modulo m. Adding the value b provides hash-dependent randomization of the input, in a way that ensures that for variable-length inputs the 0-length input does not have a fixed hash value. Adding the value 2t to a ensures that the hash function acts differently for inputs of different lengths. (We use 2t rather than t to ensure that the key a + 2t is odd if a is odd.) We call this hash function "wee" because it uses a tiny amount of memory—more precisely, it can be implemented efficiently using only the computer's fast registers. (This hash function does not have a name in the literature; it is a variant we developed for this textbook.)

Speed of the wee hash function

It is surprising how much efficiency can be bought with locality.

Experiments (unpublished, by the authors) suggest that evaluating the

wee hash function takes less time than probing a single randomly chosen

slot in a hash table. These experiments were run on a laptop (2019

MacBook Pro) with w = 64 and a = 123. For large hash tables, evaluating the wee hash function was 2 to 10 times faster than

performing a single probe of the hash table.

The wee hash function for variable-length inputs

Sometimes inputs are long—more than one w-bit word in length—or

have variable length, as discussed in Section 11.3.5. We can extend the wee hash function, defined above for inputs that are at most single w-bit

word in length, to handle long or variable-length inputs. Here is one

method for doing so.

Suppose that an input k has length t (measured in bits). Break k into a sequence ⟨k_1, k_2, …, k_u⟩ of w-bit words, where u = ⌈t/w⌉, k_1 contains the least-significant w bits of k, and k_u contains the most significant


bits. If t is not a multiple of w, then k_u contains fewer than w bits, in which case, pad out the unused high-order bits of k_u with 0-bits. Define the function chop to return a sequence of the w-bit words in k:

chop(k) = ⟨k_1, k_2, …, k_u⟩.

The most important property of the chop operation is that it is one-to-one, given t: for any two t-bit keys k and k′, if k ≠ k′ then chop(k) ≠ chop(k′), and k can be derived from chop(k) and t. The chop operation also has the useful property that a single-word input key yields a single-word output sequence: chop(k) = ⟨k⟩.

With the chop function in hand, we specify the wee hash function h_{a,b,t,r}(k) for an input k of length t bits as follows:

h_{a,b,t,r}(k) = WEE(k, a, b, t, r, m),

where the procedure WEE defined below iterates through the w-bit words returned by chop(k), applying f_{a+2t}^(r) to the sum of the current word k_i and the previously computed hash value so far, finally returning the result obtained modulo m. This definition for variable-length and long (multiple-word) inputs is a consistent extension of the definition in equation (11.7) for short (single-word) inputs. For practical use, we recommend that a be a randomly chosen odd w-bit word, b be a randomly chosen w-bit word, and that r = 4.

Note that the wee hash function is really a hash function family, with

individual hash functions determined by parameters a, b, t, r, and m.

The (approximate) 5-independence of the wee hash function family for

variable-length inputs can be argued based on the assumption that the

1-word wee hash function is a random oracle and on the security of the

cipher-block-chaining message authentication code (CBC-MAC), as

studied by Bellare et al. [42]. The case here is actually simpler than that studied in the literature, since if two messages have different lengths t and t′, then their “keys” are different: a + 2 ta + 2 t′. We omit the details.

WEE(k, a, b, t, r, m)
1  u = ⌈t/w⌉
2  ⟨k1, k2, …, ku⟩ = chop(k)
3  q = b
4  for i = 1 to u
5      q = f_{a+2t}^{(r)}(q + ki)
6  return q mod m
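
Putting the pieces together, a Python sketch of WEE might look as follows. It inlines chop and the round function, and it treats the addition q + ki as w-bit (modulo 2^w) arithmetic, which is an assumption on our part.

    def wee(k: int, a: int, b: int, t: int, r: int, m: int, w: int = 64) -> int:
        """Sketch of WEE(k, a, b, t, r, m): fold the words of chop(k)
        through the r-round cipher with key a + 2t, then reduce mod m."""
        mask = (1 << w) - 1
        half = w // 2
        key = a + 2 * t
        words = [(k >> (i * w)) & mask for i in range(-(-t // w))]  # chop(k)
        q = b
        for ki in words:
            q = (q + ki) & mask                       # sum with running value
            for _ in range(r):                        # apply f_{a+2t}^{(r)}
                q = (2 * q * q + key * q) & mask
                q = ((q >> half) | (q << half)) & mask
        return q % m

Per the recommendation above, one might call this with a random odd 64-bit a, a random 64-bit b, and r = 4.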

This definition of a cryptographically inspired hash-function family

is meant to be realistic, yet only illustrative, and many variations and

improvements are possible. See the chapter notes for suggestions.

In summary, we see that when the memory system is hierarchical, it

becomes advantageous to use linear probing (a special case of double

hashing), since successive probes tend to stay in the same cache block.

Furthermore, hash functions that can be implemented using only the

computer’s fast registers are exceptionally efficient, so they can be quite

complex and even cryptographically inspired, providing the high degree

of independence needed for linear probing to work most efficiently.

Exercises

11.5-1

Complete the argument that for any odd positive integer a and any integer r ≥ 0, the function f_a^{(r)} is one-to-one. Use a proof by contradiction, and make use of the fact that the function f_a works modulo 2^w.

11.5-2

Argue that a random oracle is 5-independent.

11.5-3

Consider what happens to the value f_a^{(r)}(k) as we flip a single bit ki of the input value k, for various values of r. Let $k = \sum_{i=0}^{w-1} k_i 2^i$ and $g_a(k) = \sum_{j=0}^{w-1} b_j 2^j$ define the bit values ki in the input (with k0 the least-significant bit) and the bit values bj in g_a(k) = (2k² + ak) mod 2^w (where g_a(k) is the value that, when its halves are swapped, becomes f_a(k)). Suppose that flipping a single bit ki of the input k may cause any bit bj of g_a(k) to flip, for j ≥ i. What is the least value of r for which flipping the value of any single bit ki may cause any bit of the output to flip? Explain.

Problems

11-1 Longest-probe bound for hashing

Suppose you are using an open-addressed hash table of size m to store n ≤ m/2 items.

a. Assuming independent uniform permutation hashing, show that for i = 1, 2, …, n, the probability is at most 2^−p that the ith insertion requires strictly more than p probes.

b. Show that for i = 1, 2, …, n, the probability is O(1/n²) that the ith insertion requires more than 2 lg n probes.

Let the random variable Xi denote the number of probes required by the ith insertion. You have shown in part (b) that Pr{Xi > 2 lg n} = O(1/n²). Let the random variable X = max{Xi : 1 ≤ i ≤ n} denote the maximum number of probes required by any of the n insertions.

c. Show that Pr{ X > 2 lg n} = O(1/ n).

d. Show that the expected length E[ X] of the longest probe sequence is O(lg n).

11-2 Searching a static set

You are asked to implement a searchable set of n elements in which the

keys are numbers. The set is static (no INSERT or DELETE

operations), and the only operation required is SEARCH. You are given


an arbitrary amount of time to preprocess the n elements so that

SEARCH operations run quickly.

a. Show how to implement SEARCH in O(lg n) worst-case time using no extra storage beyond what is needed to store the elements of the set

themselves.

b. Consider implementing the set by open-address hashing on m slots, and assume independent uniform permutation hashing. What is the minimum amount of extra storage m − n required to make the average performance of an unsuccessful SEARCH operation be at least as good as the bound in part (a)? Your answer should be an asymptotic bound on m − n in terms of n.

11-3 Slot-size bound for chaining

Given a hash table with n slots, with collisions resolved by chaining, suppose that n keys are inserted into the table. Each key is equally likely

to be hashed to each slot. Let M be the maximum number of keys in

any slot after all the keys have been inserted. Your mission is to prove

an O(lg n / lg lg n) upper bound on E[ M], the expected value of M.

a. Argue that the probability Qk that exactly k keys hash to a particular slot is given by

$$Q_k = \binom{n}{k} \left(\frac{1}{n}\right)^k \left(1 - \frac{1}{n}\right)^{n-k}.$$

b. Let Pk be the probability that M = k, that is, the probability that the slot containing the most keys contains k keys. Show that Pk ≤ nQk.

c. Show that Qk < e^k/k^k. Hint: Use Stirling's approximation, equation (3.25) on page 67.

d. Show that there exists a constant c > 1 such that $Q_{k_0} < 1/n^3$ for k0 = c lg n / lg lg n. Conclude that Pk < 1/n² for k ≥ k0 = c lg n / lg lg n.

e. Argue that

$$\mathrm{E}[M] \le \Pr\{M > k_0\} \cdot n + \Pr\{M \le k_0\} \cdot k_0.$$

Conclude that E[M] = O(lg n / lg lg n).

11-4 Hashing and authentication

Let H be a family of hash functions in which each hash function h ∈ H maps the universe U of keys to {0, 1, …, m − 1}.

a. Show that if the family H of hash functions is 2-independent, then it

is universal.

b. Suppose that the universe U is the set of n-tuples of values drawn from ℤp = {0, 1, …, p − 1}, where p is prime. Consider an element x = ⟨x0, x1, …, x_{n−1}⟩ ∈ U. For any n-tuple a = ⟨a0, a1, …, a_{n−1}⟩ ∈ U, define the hash function ha by

$$h_a(x) = \left( \sum_{j=0}^{n-1} a_j x_j \right) \bmod p.$$

Let H = {ha : a ∈ U}. Show that H is universal, but not 2-independent. (Hint: Find a key for which all hash functions in H produce the same value.)

c. Suppose that we modify H slightly from part (b): for any a ∈ U and for any b ∈ ℤp, define

$$h'_{a,b}(x) = \left( \sum_{j=0}^{n-1} a_j x_j + b \right) \bmod p$$

and H′ = {h′_{a,b} : a ∈ U and b ∈ ℤp}. Argue that H′ is 2-independent. (Hint: Consider fixed n-tuples x ∈ U and y ∈ U, with xi ≠ yi for some i. What happens to h′_{a,b}(x) and h′_{a,b}(y) as ai and b range over ℤp?)

d. Alice and Bob secretly agree on a hash function h from a 2-

independent family H of hash functions. Each h ∈ H maps from a

universe of keys U to ℤp, where p is prime. Later, Alice sends a message m to Bob over the internet, where m ∈ U. She authenticates this message to Bob by also sending an authentication tag t = h(m), and Bob checks that the pair (m, t) he receives indeed satisfies t =

h( m). Suppose that an adversary intercepts ( m, t) en route and tries to fool Bob by replacing the pair ( m, t) with a different pair ( m′, t′).

Argue that the probability that the adversary succeeds in fooling Bob

into accepting ( m′, t′) is at most 1/ p, no matter how much computing power the adversary has, even if the adversary knows the family H of

hash functions used.
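
For concreteness (and not as a solution to the problem), here is a small Python sketch of the families from parts (b) and (c); the prime p and tuple length n below are chosen arbitrarily.

    import random

    p = 101                                   # a small prime, chosen arbitrarily
    n = 3                                     # tuple length, chosen arbitrarily

    def h(a, x):
        """Part (b): h_a(x) = (sum of a_j * x_j) mod p."""
        return sum(aj * xj for aj, xj in zip(a, x)) % p

    def h_prime(a, b, x):
        """Part (c): the same dot product, plus an offset b, mod p."""
        return (sum(aj * xj for aj, xj in zip(a, x)) + b) % p

    a = tuple(random.randrange(p) for _ in range(n))
    b = random.randrange(p)
    print(h(a, (3, 1, 4)), h_prime(a, b, (3, 1, 4)))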

Chapter notes

The books by Knuth [261] and Gonnet and Baeza-Yates [193] are excellent references for the analysis of hashing algorithms. Knuth

credits H. P. Luhn (1953) for inventing hash tables, along with the

chaining method for resolving collisions. At about the same time, G. M.

Amdahl originated the idea of open addressing. The notion of a

random oracle was introduced by Bellare et al. [43]. Carter and Wegman [80] introduced the notion of universal families of hash functions in 1979.

Dietzfelbinger et al. [113] invented the multiply-shift hash function and gave a proof of Theorem 11.5. Thorup [437] provides extensions and additional analysis. Thorup [438] gives a simple proof that linear probing with 5-independent hashing takes constant expected time per

operation. Thorup also describes the method for deletion in a hash table

using linear probing.

Fredman, Komlós, and Szemerédi [154] developed a perfect hashing

scheme for static sets—“perfect” because all collisions are avoided. An

extension of their method to dynamic sets, handling insertions and

deletions in amortized expected time O(1), has been given by

Dietzfelbinger et al. [114].

The wee hash function is based on the RC6 encryption algorithm

[379]. Leiserson et al. [292] propose an “RC6MIX” function that is essentially the same as the wee hash function. They give experimental

evidence that it has good randomness, and they also give a “DOTMIX”

function for dealing with variable-length inputs. Bellare et al. [42]

provide an analysis of the security of the cipher-block-chaining message

authentication code. This analysis implies that the wee hash function

has the desired pseudorandomness properties.

1 The definition of “average-case” requires care—are we assuming an input distribution over the keys, or are we randomizing the choice of hash function itself? We’ll consider both approaches, but with an emphasis on the use of a randomly chosen hash function.

2 In the literature, a ( c/ m)-universal hash function is sometimes called c-universal or c-

approximately universal. We’ll stick with the notation ( c/ m)-universal.

12 Binary Search Trees

The search tree data structure supports each of the dynamic-set

operations listed on page 250: SEARCH, MINIMUM, MAXIMUM,

PREDECESSOR, SUCCESSOR, INSERT, and DELETE. Thus, you

can use a search tree both as a dictionary and as a priority queue.

Basic operations on a binary search tree take time proportional to

the height of the tree. For a complete binary tree with n nodes, such operations run in Θ(lg n) worst-case time. If the tree is a linear chain of

n nodes, however, the same operations take Θ( n) worst-case time. In

Chapter 13, we’ll see a variation of binary search trees, red-black trees, whose operations guarantee a height of O(lg n). We won’t prove it here, but if you build a binary search tree on a random set of n keys, its expected height is O(lg n) even if you don’t try to limit its height.

After presenting the basic properties of binary search trees, the

following sections show how to walk a binary search tree to print its

values in sorted order, how to search for a value in a binary search tree,

how to find the minimum or maximum element, how to find the

predecessor or successor of an element, and how to insert into or delete

from a binary search tree. The basic mathematical properties of trees

appear in Appendix B.

12.1 What is a binary search tree?

A binary search tree is organized, as the name suggests, in a binary tree,

as shown in Figure 12.1. You can represent such a tree with a linked

data structure, as in Section 10.3. In addition to a key and satellite data, each node object contains attributes left, right, and p that point to the nodes corresponding to its left child, its right child, and its parent,

respectively. If a child or the parent is missing, the appropriate attribute

contains the value NIL. The tree itself has an attribute root that points

to the root node, or NIL if the tree is empty. The root node T.root is the

only node in a tree T whose parent is NIL.


Figure 12.1 Binary search trees. For any node x, the keys in the left subtree of x are at most x.key, and the keys in the right subtree of x are at least x.key. Different binary search trees can represent the same set of values. The worst-case running time for most search-tree operations is proportional to the height of the tree. (a) A binary search tree on 6 nodes with height 2. The top figure shows how to view the tree conceptually, and the bottom figure shows the left, right, and p attributes in each node, in the style of Figure 10.6 on page 266. (b) A less efficient binary search tree, with height 4, that contains the same keys.

The keys in a binary search tree are always stored in such a way as to

satisfy the binary-search-tree property:

Let x be a node in a binary search tree. If y is a node in the left

subtree of x, then y.key ≤ x.key. If y is a node in the right subtree of x, then y.key ≥ x.key.

Thus, in Figure 12.1(a), the key of the root is 6, the keys 2, 5, and 5

in its left subtree are no larger than 6, and the keys 7 and 8 in its right

subtree are no smaller than 6. The same property holds for every node

in the tree. For example, looking at the root’s left child as the root of a

subtree, this subtree root has the key 5, the key 2 in its left subtree is no

larger than 5, and the key 5 in its right subtree is no smaller than 5.

Because of the binary-search-tree property, you can print out all the

keys in a binary search tree in sorted order by a simple recursive

algorithm, called an inorder tree walk, given by the procedure

INORDER-TREE-WALK. This algorithm is so named because it

prints the key of the root of a subtree between printing the values in its

left subtree and printing those in its right subtree. (Similarly, a preorder

tree walk prints the root before the values in either subtree, and a postorder tree walk prints the root after the values in its subtrees.) To print all the elements in a binary search tree T, call INORDER-TREE-WALK( T.root). For example, the inorder tree walk prints the keys in

each of the two binary search trees from Figure 12.1 in the order 2, 5, 5, 6, 7, 8. The correctness of the algorithm follows by induction directly

from the binary-search-tree property.

INORDER-TREE-WALK(x)
1  if x ≠ NIL
2      INORDER-TREE-WALK(x.left)
3      print x.key
4      INORDER-TREE-WALK(x.right)
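
For readers who want to run the walk, here is a direct Python transcription, with a minimal node class of our own devising (None plays the role of NIL):

    class Node:
        def __init__(self, key):
            self.key = key
            self.left = None
            self.right = None
            self.p = None            # parent pointer, as in the text

    def inorder_tree_walk(x):
        """Print the keys of the subtree rooted at x in sorted order."""
        if x is not None:
            inorder_tree_walk(x.left)
            print(x.key)
            inorder_tree_walk(x.right)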

It takes Θ( n) time to walk an n-node binary search tree, since after

the initial call, the procedure calls itself recursively exactly twice for

each node in the tree—once for its left child and once for its right child.

The following theorem gives a formal proof that it takes linear time to

perform an inorder tree walk.

Theorem 12.1

If x is the root of an n-node subtree, then the call INORDER-TREE-

WALK( x) takes Θ( n) time.

Proof Let T( n) denote the time taken by INORDER-TREE-WALK

when it is called on the root of an n-node subtree. Since INORDER-

TREE-WALK visits all n nodes of the subtree, we have T( n) = Ω( n). It remains to show that T( n) = O( n).

Since INORDER-TREE-WALK takes a small, constant amount of

time on an empty subtree (for the test x ≠ NIL), we have T(0) = c for some constant c > 0.

For n > 0, suppose that INORDER-TREE-WALK is called on a node x whose left subtree has k nodes and whose right subtree has n − k − 1 nodes. The time to perform INORDER-TREE-WALK(x) is bounded by T(n) ≤ T(k) + T(n − k − 1) + d for some constant d > 0 that reflects an upper bound on the time to execute the body of INORDER-TREE-WALK(x), exclusive of the time spent in recursive calls.

We use the substitution method to show that T(n) = O(n) by proving that T(n) ≤ (c + d)n + c. For n = 0, we have (c + d) · 0 + c = c = T(0). For n > 0, we have

T(n) ≤ T(k) + T(n − k − 1) + d
     ≤ ((c + d)k + c) + ((c + d)(n − k − 1) + c) + d
     = (c + d)n + c − (c + d) + c + d
     = (c + d)n + c,

which completes the proof.

Exercises

12.1-1

For the set {1, 4, 5, 10, 16, 17, 21} of keys, draw binary search trees of

heights 2, 3, 4, 5, and 6.

12.1-2

What is the difference between the binary-search-tree property and the

min-heap property on page 163? Can the min-heap property be used to

print out the keys of an n-node tree in sorted order in O( n) time? Show how, or explain why not.

12.1-3

Give a nonrecursive algorithm that performs an inorder tree walk.

( Hint: An easy solution uses a stack as an auxiliary data structure. A

more complicated, but elegant, solution uses no stack but assumes that

you can test two pointers for equality.)

12.1-4

Give recursive algorithms that perform preorder and postorder tree

walks in Θ( n) time on a tree of n nodes.

12.1-5

Argue that since sorting n elements takes Ω( n lg n) time in the worst case in the comparison model, any comparison-based algorithm for

constructing a binary search tree from an arbitrary list of n elements takes Ω( n lg n) time in the worst case.

12.2 Querying a binary search tree

Binary search trees can support the queries MINIMUM, MAXIMUM,

SUCCESSOR, and PREDECESSOR, as well as SEARCH. This

section examines these operations and shows how to support each one

in O( h) time on any binary search tree of height h.

Searching

To search for a node with a given key in a binary search tree, call the

TREE-SEARCH procedure. Given a pointer x to the root of a subtree

and a key k, TREE-SEARCH( x, k) returns a pointer to a node with key k if one exists in the subtree; otherwise, it returns NIL. To search for key

k in the entire binary search tree T, call TREE-SEARCH( T.root, k).

TREE-SEARCH(x, k)
1  if x == NIL or k == x.key
2      return x
3  if k < x.key
4      return TREE-SEARCH(x.left, k)
5  else return TREE-SEARCH(x.right, k)

ITERATIVE-TREE-SEARCH(x, k)
1  while x ≠ NIL and k ≠ x.key
2      if k < x.key
3          x = x.left
4      else x = x.right
5  return x
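
Both procedures translate directly to Python, using None for NIL and the Node class sketched earlier:

    def tree_search(x, k):
        """Recursive search: return a node with key k in x's subtree, or None."""
        if x is None or k == x.key:
            return x
        if k < x.key:
            return tree_search(x.left, k)
        return tree_search(x.right, k)

    def iterative_tree_search(x, k):
        """Same result as tree_search, with the recursion unrolled into a loop."""
        while x is not None and k != x.key:
            x = x.left if k < x.key else x.right
        return x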

The TREE-SEARCH procedure begins its search at the root and

traces a simple path downward in the tree, as shown in Figure 12.2(a).

For each node x it encounters, it compares the key k with x.key. If the two keys are equal, the search terminates. If k is smaller than x.key, the search continues in the left subtree of x, since the binary-search-tree property implies that k cannot reside in the right subtree. Symmetrically,

if k is larger than x.key, the search continues in the right subtree. The nodes encountered during the recursion form a simple path downward

from the root of the tree, and thus the running time of TREE-SEARCH

is O( h), where h is the height of the tree.


Figure 12.2 Queries on a binary search tree. Nodes and paths followed in each query are colored blue. (a) A search for the key 13 in the tree follows the path 15 → 6 → 7 → 13 from the root. (b) The minimum key in the tree is 2, which is found by following left pointers from the root. The maximum key 20 is found by following right pointers from the root. (c) The successor of the node with key 15 is the node with key 17, since it is the minimum key in the right subtree of 15.

(d) The node with key 13 has no right subtree, and thus its successor is its lowest ancestor whose left child is also an ancestor. In this case, the node with key 15 is its successor.

Since the TREE-SEARCH procedure recurses on either the left

subtree or the right subtree, but not both, we can rewrite the algorithm

to “unroll” the recursion into a while loop. On most computers, the

ITERATIVE-TREE-SEARCH procedure on the facing page is more

efficient.

Minimum and maximum

To find an element in a binary search tree whose key is a minimum, just follow left child pointers from the root until you encounter a NIL, as

shown in Figure 12.2(b). The TREE-MINIMUM procedure returns a

pointer to the minimum element in the subtree rooted at a given node x,

which we assume to be non-NIL.

TREE-MINIMUM(x)
1  while x.left ≠ NIL
2      x = x.left
3  return x

TREE-MAXIMUM(x)
1  while x.right ≠ NIL
2      x = x.right
3  return x
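
In Python, again with None playing the role of NIL:

    def tree_minimum(x):
        """Follow left pointers from x to the node with the smallest key."""
        while x.left is not None:
            x = x.left
        return x

    def tree_maximum(x):
        """Symmetrically, follow right pointers to the largest key."""
        while x.right is not None:
            x = x.right
        return x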

The binary-search-tree property guarantees that TREE-MINIMUM

is correct. If node x has no left subtree, then since every key in the right

subtree of x is at least as large as x.key, the minimum key in the subtree rooted at x is x.key. If node x has a left subtree, then since no key in the right subtree is smaller than x.key and every key in the left subtree is not

larger than x.key, the minimum key in the subtree rooted at x resides in the subtree rooted at x.left.

The pseudocode for TREE-MAXIMUM is symmetric. Both TREE-

MINIMUM and TREE-MAXIMUM run in O( h) time on a tree of

height h since, as in TREE-SEARCH, the sequence of nodes

encountered forms a simple path downward from the root.

Successor and predecessor

Given a node in a binary search tree, how can you find its successor in

the sorted order determined by an inorder tree walk? If all keys are

distinct, the successor of a node x is the node with the smallest key greater than x.key. Regardless of whether the keys are distinct, we define

the successor of a node as the next node visited in an inorder tree walk.

The structure of a binary search tree allows you to determine the

successor of a node without comparing keys. The TREE-SUCCESSOR

procedure on the facing page returns the successor of a node x in a binary search tree if it exists, or NIL if x is the last node that would be

visited during an inorder walk.

The code for TREE-SUCCESSOR has two cases. If the right subtree

of node x is nonempty, then the successor of x is just the leftmost node in x’s right subtree, which line 2 finds by calling TREE-MINIMUM( x.right). For example, the successor of the node with key

15 in Figure 12.2(c) is the node with key 17.

On the other hand, as Exercise 12.2-6 asks you to show, if the right

subtree of node x is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x. In Figure 12.2(d), the successor of the node with key 13 is the node with key 15. To find y,

go up the tree from x until you encounter either the root or a node that

is the left child of its parent. Lines 4–8 of TREE-SUCCESSOR handle

this case.

TREE-SUCCESSOR(x)
1  if x.right ≠ NIL
2      return TREE-MINIMUM(x.right)  // leftmost node in right subtree
3  else  // find the lowest ancestor of x whose left child is an ancestor of x
4      y = x.p
5      while y ≠ NIL and x == y.right
6          x = y
7          y = y.p
8      return y
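
A Python transcription, reusing tree_minimum from the sketch above:

    def tree_successor(x):
        """Return the next node in inorder, or None if x holds the last key."""
        if x.right is not None:
            return tree_minimum(x.right)   # leftmost node in right subtree
        y = x.p
        while y is not None and x is y.right:
            x = y                          # climb while x is a right child
            y = y.p
        return y                           # lowest ancestor whose left child
                                           # is also an ancestor of x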

The running time of TREE-SUCCESSOR on a tree of height h is

O( h), since it either follows a simple path up the tree or follows a simple path down the tree. The procedure TREE-PREDECESSOR, which is

symmetric to TREE-SUCCESSOR, also runs in O( h) time.

In summary, we have proved the following theorem.

Theorem 12.2

The dynamic-set operations SEARCH, MINIMUM, MAXIMUM,

SUCCESSOR, and PREDECESSOR can be implemented so that each

one runs in O( h) time on a binary search tree of height h.

Exercises

12.2-1

You are searching for the number 363 in a binary search tree containing

numbers between 1 and 1000. Which of the following sequences cannot

be the sequence of nodes examined?

a. 2, 252, 401, 398, 330, 344, 397, 363.

b. 924, 220, 911, 244, 898, 258, 362, 363.

c. 925, 202, 911, 240, 912, 245, 363.

d. 2, 399, 387, 219, 266, 382, 381, 278, 363.

e. 935, 278, 347, 621, 299, 392, 358, 363.

12.2-2

Write recursive versions of TREE-MINIMUM and TREE-

MAXIMUM.

12.2-3

Write the TREE-PREDECESSOR procedure.

12.2-4

Professor Kilmer claims to have discovered a remarkable property of

binary search trees. Suppose that the search for key k in a binary search

tree ends up at a leaf. Consider three sets: A, the keys to the left of the

search path; B, the keys on the search path; and C, the keys to the right of the search path. Professor Kilmer claims that any three keys a ∈ A, b ∈ B, and c ∈ C must satisfy a ≤ b ≤ c. Give a smallest possible counterexample to the professor's claim.

12.2-5

Show that if a node in a binary search tree has two children, then its successor has no left child and its predecessor has no right child.

12.2-6

Consider a binary search tree T whose keys are distinct. Show that if the

right subtree of a node x in T is empty and x has a successor y, then y is the lowest ancestor of x whose left child is also an ancestor of x. (Recall that every node is its own ancestor.)

12.2-7

An alternative method of performing an inorder tree walk of an n-node

binary search tree finds the minimum element in the tree by calling

TREE-MINIMUM and then making n − 1 calls to TREE-

SUCCESSOR. Prove that this algorithm runs in Θ( n) time.

12.2-8

Prove that no matter what node you start at in a height- h binary search

tree, k successive calls to TREE-SUCCESSOR take O( k + h) time.

12.2-9

Let T be a binary search tree whose keys are distinct, let x be a leaf node, and let y be its parent. Show that y.key is either the smallest key in T larger than x.key or the largest key in T smaller than x.key.

12.3 Insertion and deletion

The operations of insertion and deletion cause the dynamic set

represented by a binary search tree to change. The data structure must

be modified to reflect this change, but in such a way that the binary-

search-tree property continues to hold. We’ll see that modifying the tree

to insert a new element is relatively straightforward, but deleting a node

from a binary search tree is more complicated.

Insertion

The TREE-INSERT procedure inserts a new node into a binary search

tree. The procedure takes a binary search tree T and a node z for which

z.key has already been filled in, z.left = NIL, and z.right = NIL. It modifies T and some of the attributes of z so as to insert z into an appropriate position in the tree.

TREE-INSERT(T, z)
 1  x = T.root         // node being compared with z
 2  y = NIL            // y will be parent of z
 3  while x ≠ NIL      // descend until reaching a leaf
 4      y = x
 5      if z.key < x.key
 6          x = x.left
 7      else x = x.right
 8  z.p = y            // found the location—insert z with parent y
 9  if y == NIL
10      T.root = z     // tree T was empty
11  elseif z.key < y.key
12      y.left = z
13  else y.right = z
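
In Python, assuming a tree object with a root attribute (a minimal convention of ours) and the Node class from earlier:

    class Tree:
        def __init__(self):
            self.root = None

    def tree_insert(T, z):
        """Insert node z (z.key set, z.left == z.right == None) into tree T."""
        x = T.root                 # node being compared with z
        y = None                   # y will be z's parent
        while x is not None:       # descend until reaching a NIL position
            y = x
            x = x.left if z.key < x.key else x.right
        z.p = y
        if y is None:
            T.root = z             # tree was empty
        elif z.key < y.key:
            y.left = z
        else:
            y.right = z

For example, tree_insert(T, Node(13)) inserts a node with key 13, as in Figure 12.3 below.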

Figure 12.3 shows how TREE-INSERT works. Just like the

procedures TREE-SEARCH and ITERATIVE-TREE-SEARCH,

TREE-INSERT begins at the root of the tree and the pointer x traces a

simple path downward looking for a NIL to replace with the input node

z. The procedure maintains the trailing pointer y as the parent of x.

After initialization, the while loop in lines 3–7 causes these two pointers

to move down the tree, going left or right depending on the comparison

of z.key with x.key, until x becomes NIL. This NIL occupies the position where node z will go. More precisely, this NIL is a left or right attribute of the node that will become z’s parent, or it is T.root if tree T

is currently empty. The procedure needs the trailing pointer y, because

by the time it finds the NIL where z belongs, the search has proceeded

one step beyond the node that needs to be changed. Lines 8–13 set the

pointers that cause z to be inserted.


Figure 12.3 Inserting a node with key 13 into a binary search tree. The simple path from the root down to the position where the node is inserted is shown in blue. The new node and the link to its parent are highlighted in orange.