Exercises
9.2-1
Show that RANDOMIZED-SELECT never makes a recursive call to a
0-length array.
9.2-2
Write an iterative version of RANDOMIZED-SELECT.
9.2-3
Suppose that RANDOMIZED-SELECT is used to select the minimum
element of the array A = 〈2, 3, 0, 5, 7, 9, 1, 8, 6, 4〉. Describe a sequence
of partitions that results in a worst-case performance of
RANDOMIZED-SELECT.
9.2-4
Argue that the expected running time of RANDOMIZED-SELECT
does not depend on the order of the elements in its input array A[ p : r].
That is, the expected running time is the same for any permutation of
the input array A[ p : r]. ( Hint: Argue by induction on the length n of the input array.)
9.3 Selection in worst-case linear time
We’ll now examine a remarkable and theoretically interesting selection
algorithm whose running time is Θ( n) in the worst case. Although the
RANDOMIZED-SELECT algorithm from Section 9.2 achieves linear
expected time, we saw that its running time in the worst case was
quadratic. The selection algorithm presented in this section achieves
linear time in the worst case, but it is not nearly as practical as
RANDOMIZED-SELECT. It is mostly of theoretical interest.
Like the expected linear-time RANDOMIZED-SELECT, the worst-
case linear-time algorithm SELECT finds the desired element by
recursively partitioning the input array. Unlike RANDOMIZED-
SELECT, however, SELECT guarantees a good split by choosing a
provably good pivot when partitioning the array. The cleverness in the
algorithm is that it finds the pivot recursively. Thus, there are two
invocations of SELECT: one to find a good pivot, and a second to
recursively find the desired order statistic.
The partitioning algorithm used by SELECT is like the deterministic
partitioning algorithm PARTITION from quicksort (see Section 7.1), but modified to take the element to partition around as an additional
input parameter. Like PARTITION, the PARTITION-AROUND
algorithm returns the index of the pivot. Since it’s so similar to
PARTITION, the pseudocode for PARTITION-AROUND is omitted.
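Although the pseudocode for PARTITION-AROUND is omitted, a plausible sketch follows directly from the PARTITION procedure of Section 7.1: locate the pivot value x, swap it into the last position, and partition as usual. The Python sketch below is an illustration only (the function name and the 0-origin, inclusive-bounds convention are ours, not the text's):

```python
def partition_around(A, p, r, x):
    """Partition A[p..r] (inclusive, 0-origin) around the value x,
    which must appear in A[p..r]. Returns the final index of x."""
    # Move the pivot x to the last position.
    for j in range(p, r + 1):
        if A[j] == x:
            A[j], A[r] = A[r], A[j]
            break
    # Standard Lomuto-style partition around A[r] = x.
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1
```

After the call, every element to the left of the returned index is at most x and every element to its right is at least x.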
The SELECT procedure takes as input a subarray A[ p : r] of n = r –
p + 1 elements and an integer i in the range 1 ≤ i ≤ n. It returns the i th smallest element of A. The pseudocode is actually more understandable
than it might appear at first.
SELECT(A, p, r, i)
 1  while (r − p + 1) mod 5 ≠ 0
 2      for j = p + 1 to r              // put the minimum into A[p]
 3          if A[p] > A[j]
 4              exchange A[p] with A[j]
 5      // If we want the minimum of A[p : r], we’re done.
 6      if i == 1
 7          return A[p]
 8      // Otherwise, we want the (i − 1)st element of A[p + 1 : r].
 9      p = p + 1
10      i = i − 1
11  g = (r − p + 1)/5                   // number of 5-element groups
12  for j = p to p + g − 1              // sort each group
13      sort ⟨A[j], A[j + g], A[j + 2g], A[j + 3g], A[j + 4g]⟩ in place
14  // All group medians now lie in the middle fifth of A[p : r].
15  // Find the pivot x recursively as the median of the group medians.
16  x = SELECT(A, p + 2g, p + 3g − 1, ⌈g/2⌉)
17  q = PARTITION-AROUND(A, p, r, x)    // partition around the pivot x
18  // The rest is just like lines 3–9 of RANDOMIZED-SELECT.
19  k = q − p + 1
20  if i == k
21      return A[q]                     // the pivot value is the answer
22  elseif i < k
23      return SELECT(A, p, q − 1, i)
24  else return SELECT(A, q + 1, r, i − k)
The pseudocode starts by executing the while loop in lines 1–10 to
reduce the number r – p + 1 of elements in the subarray until it is divisible by 5. The while loop executes 0 to 4 times, each time
rearranging the elements of A[ p : r] so that A[ p] contains the minimum element. If i = 1, which means that we actually want the minimum
element, then the procedure simply returns it in line 7. Otherwise,
SELECT eliminates the minimum from the subarray A[ p : r] and iterates
to find the ( i – 1)st element in A[ p + 1 : r]. Lines 9–10 do so by incrementing p and decrementing i. If the while loop completes all of its iterations without returning a result, the procedure executes the core of
the algorithm in lines 11–24, assured that the number r – p + 1 of elements in A[ p : r] is evenly divisible by 5.
Figure 9.3 The relationships between elements (shown as circles) immediately after line 17 of the selection algorithm SELECT. There are g = ( r – p + 1)/5 groups of 5 elements, each of which occupies a column. For example, the leftmost column contains elements A[ p], A[ p + g], A[ p +
2 g], A[ p + 3 g], A[ p + 4 g], and the next column contains A[ p + 1], A[ p + g + 1], A[ p + 2 g + 1], A[ p
+ 3 g + 1], A[ p + 4 g + 1]. The medians of the groups are red, and the pivot x is labeled. Arrows go from smaller elements to larger. The elements on the blue background are all known to be less than or equal to x and cannot fall into the high side of the partition around x. The elements on the yellow background are known to be greater than or equal to x and cannot fall into the low side of the partition around x. The pivot x belongs to both the blue and yellow regions and is shown on a green background. The elements on the white background could lie on either side of the partition.
The next part of the algorithm implements the following idea,
illustrated in Figure 9.3. Divide the elements in A[ p : r] into g = ( r – p +
1)/5 groups of 5 elements each. The first 5-element group is
〈 A[ p], A[ p + g], A[ p + 2 g], A[ p + 3 g], A[ p + 4 g]〉, the second is
〈 A[ p + 1], A[ p + g + 1], A[ p + 2 g + 1], A[ p + 3 g + 1], A[ p + 4 g + 1]〉, and so forth until the last, which is
〈 A[ p + g – 1], A[ p + 2 g – 1], A[ p + 3 g – 1], A[ p + 4 g – 1], A[ r]〉.
(Note that r = p + 5 g – 1.) Line 13 puts each group in order using, for example, insertion sort (Section 2.1), so that for j = p, p + 1, … , p + g –
1, we have
A[ j] ≤ A[ j + g] ≤ A[ j + 2 g] ≤ A[ j + 3 g] ≤ A[ j + 4 g].
Each vertical column in Figure 9.3 depicts a sorted group of 5 elements.
The median of each 5-element group is A[ j + 2 g], and thus all the 5-element medians, shown in red, lie in the range A[ p + 2 g : p + 3 g – 1].
Next, line 16 determines the pivot x by recursively calling SELECT
to find the median (specifically, the ⌈ g/2⌉th smallest) of the g group medians. Line 17 uses the modified PARTITION-AROUND algorithm
to partition the elements of A[ p : r] around x, returning the index q of x, so that A[ q] = x, elements in A[ p : q] are all at most x, and elements in A[ q : r] are greater than or equal to x.
The remainder of the code mirrors that of RANDOMIZED-SELECT. If the pivot x is the i th smallest, the procedure returns it. Otherwise, the procedure recursively calls itself on either A[p : q − 1] or A[q + 1 : r], depending on the value of i.
Let’s analyze the running time of SELECT and see how the judicious
choice of the pivot x plays into a guarantee on its worst-case running
time.
Theorem 9.3
The running time of SELECT on an input of n elements is Θ( n).
Proof Define T ( n) as the worst-case time to run SELECT on any input subarray A[ p : r] of size at most n, that is, for which r – p + 1 ≤ n. By this definition, T ( n) is monotonically increasing.
We first determine an upper bound on the time spent outside the
recursive calls in lines 16, 23, and 24. The while loop in lines 1–10
executes 0 to 4 times, which is O(1) times. Since the dominant time
within the loop is the computation of the minimum in lines 2–4, which
takes Θ( n) time, lines 1–10 execute in O(1) · Θ( n) = O( n) time. The sorting of the 5-element groups in lines 12–13 takes Θ( n) time because
each 5-element group takes Θ(1) time to sort (even using an
asymptotically inefficient sorting algorithm such as insertion sort), and
there are g groups to sort, where n/5 – 1 < g ≤ n/5. Finally, the time to partition in line 17 is Θ( n), as Exercise 7.1-3 on page 187 asks you to
show. Because the remaining bookkeeping only costs Θ(1) time, the
total amount of time spent outside of the recursive calls is O( n) + Θ( n) +
Θ( n) + Θ(1) = Θ( n).
Now let’s determine the running time for the recursive calls. The
recursive call to find the pivot in line 16 takes T ( g) ≤ T ( n/5) time, since g ≤ n/5 and T ( n) monotonically increases. Of the two recursive calls in lines 23 and 24, at most one is executed. But we’ll see that no matter
which of these two recursive calls to SELECT actually executes, the
number of elements in the recursive call turns out to be at most 7 n/10,
and hence the worst-case cost for lines 23 and 24 is at most T (7 n/10).
Let’s now show that the machinations with group medians and the choice of the pivot x as the median of the group medians guarantee this property.
Figure 9.3 helps to visualize what’s going on. There are g ≤ n/5
groups of 5 elements, with each group shown as a column sorted from
bottom to top. The arrows show the ordering of elements within the
columns. The columns are ordered from left to right with groups to the
left of x’s group having a group median less than x and those to the right of x’s group having a group median greater than x. Although the
relative order within each group matters, the relative order among
groups to the left of x’s column doesn’t really matter, and neither does
the relative order among groups to the right of x’s column. The
important thing is that the groups to the left have group medians less
than x (shown by the horizontal arrows entering x), and that the groups to the right have group medians greater than x (shown by the horizontal
arrows leaving x). Thus, the yellow region contains elements that we
know are greater than or equal to x, and the blue region contains
elements that we know are less than or equal to x.
These two regions each contain at least 3g/2 elements. The number of group medians in the yellow region is ⌊g/2⌋ + 1, and for each group median, two additional elements are greater than it, making a total of at least 3(⌊g/2⌋ + 1) ≥ 3g/2 elements. Similarly, the number of group medians in the blue region is ⌈g/2⌉, and for each group median, two additional elements are less than it, making a total of at least 3⌈g/2⌉ ≥ 3g/2 elements.
The elements in the yellow region cannot fall into the low side of the
partition around x, and those in the blue region cannot fall into the high side. The elements in neither region—those lying on a white
background—could fall into either side of the partition. But since the
low side of the partition excludes the elements in the yellow region, and
there are a total of 5 g elements, we know that the low side of the partition can contain at most 5 g – 3 g/2 = 7 g/2 ≤ 7 n/10 elements.
Likewise, the high side of the partition excludes the elements in the blue
region, and a similar calculation shows that it also contains at most
7 n/10 elements.
All of which leads to the following recurrence for the worst-case running time of SELECT:

    T(n) ≤ T(n/5) + T(7n/10) + Θ(n)  if n ≥ 5,        (9.1)

with T(n) = Θ(1) for n < 5.
We can show that T ( n) = O( n) by substitution. 2 More specifically, we’ll prove that T ( n) ≤ cn for some suitably large constant c > 0 and all n > 0.
Substituting this inductive hypothesis into the right-hand side of
recurrence (9.1) and assuming that n ≥ 5 yields
T ( n) ≤ c( n/5) + c(7 n/10) + Θ( n)
≤ 9 cn/10 + Θ( n)
= cn – cn/10 + Θ( n)
≤ cn
if c is chosen large enough that c/10 dominates the upper-bound constant hidden by the Θ( n). In addition to this constraint, we can pick
c large enough that T ( n) ≤ cn for all n ≤ 4, which is the base case of the recursion within SELECT. The running time of SELECT is therefore
O( n) in the worst case, and because line 13 alone takes Θ( n) time, the total time is Θ( n).
▪
As in a comparison sort (see Section 8.1), SELECT and
RANDOMIZED-SELECT determine information about the relative
order of elements only by comparing elements. Recall from Chapter 8
that sorting requires Ω( n lg n) time in the comparison model, even on
average (see Problem 8-1). The linear-time sorting algorithms in
Chapter 8 make assumptions about the type of the input. In contrast, the linear-time selection algorithms in this chapter do not require any
assumptions about the input’s type, only that the elements are distinct
and can be pairwise compared according to a linear order. The
algorithms in this chapter are not subject to the Ω( n lg n) lower bound, because they manage to solve the selection problem without sorting all
the elements. Thus, solving the selection problem by sorting and
indexing, as presented in the introduction to this chapter, is
asymptotically inefficient in the comparison model.
Exercises
9.3-1
In the algorithm SELECT, the input elements are divided into groups
of 5. Show that the algorithm works in linear time if the input elements
are divided into groups of 7 instead of 5.
9.3-2
Suppose that the preprocessing in lines 1–10 of SELECT is replaced by a base case for n < n0, where n0 is a suitable constant; that g is chosen as ⌊(r – p + 1)/5⌋; and that the elements in A[p + 5g : r] belong to no group.
Show that although the recurrence for the running time becomes
messier, it still solves to Θ( n).
9.3-3
Show how to use SELECT as a subroutine to make quicksort run in
O( n lg n) time in the worst case, assuming that all elements are distinct.
Figure 9.4 Professor Olay needs to determine the position of the east-west oil pipeline that minimizes the total length of the north-south spurs.
★ 9.3-4
Suppose that an algorithm uses only comparisons to find the i th
smallest element in a set of n elements. Show that it can also find the i –
1 smaller elements and the n – i larger elements without performing any additional comparisons.
9.3-5
Show how to determine the median of a 5-element set using only 6
comparisons.
9.3-6
You have a “black-box” worst-case linear-time median subroutine. Give
a simple, linear-time algorithm that solves the selection problem for an
arbitrary order statistic.
9.3-7
Professor Olay is consulting for an oil company, which is planning a
large pipeline running east to west through an oil field of n wells. The
company wants to connect a spur pipeline from each well directly to the
main pipeline along a shortest route (either north or south), as shown in
Figure 9.4. Given the x- and y-coordinates of the wells, how should the professor pick an optimal location of the main pipeline to minimize the
total length of the spurs? Show how to determine an optimal location in linear time.
9.3-8
The k th quantiles of an n-element set are the k – 1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O( n lg k)-time algorithm to list the k th quantiles of a set.
9.3-9
Describe an O( n)-time algorithm that, given a set S of n distinct numbers and a positive integer k ≤ n, determines the k numbers in S that are closest to the median of S.
9.3-10
Let X[1 : n] and Y [1 : n] be two arrays, each containing n numbers already in sorted order. Give an O(lg n)-time algorithm to find the median of all 2 n elements in arrays X and Y. Assume that all 2 n numbers are distinct.
Problems
9-1 Largest i numbers in sorted order
You are given a set of n numbers, and you wish to find the i largest in sorted order using a comparison-based algorithm. Describe the
algorithm that implements each of the following methods with the best
asymptotic worst-case running time, and analyze the running times of
the algorithms in terms of n and i.
a. Sort the numbers, and list the i largest.
b. Build a max-priority queue from the numbers, and call EXTRACT-
MAX i times.
c. Use an order-statistic algorithm to find the i th largest number,
partition around that number, and sort the i largest numbers.
9-2 Variant of randomized selection


Professor Mendel has proposed simplifying RANDOMIZED-SELECT
by eliminating the check for whether i and k are equal. The simplified
procedure is SIMPLER-RANDOMIZED-SELECT.
SIMPLER-RANDOMIZED-SELECT(A, p, r, i)
1  if p == r
2      return A[p]        // 1 ≤ i ≤ r − p + 1 means that i = 1
3  q = RANDOMIZED-PARTITION(A, p, r)
4  k = q − p + 1
5  if i ≤ k
6      return SIMPLER-RANDOMIZED-SELECT(A, p, q, i)
7  else return SIMPLER-RANDOMIZED-SELECT(A, q + 1, r, i − k)
a. Argue that in the worst case, SIMPLER-RANDOMIZED-SELECT
never terminates.
b. Prove that the expected running time of SIMPLER-
RANDOMIZED-SELECT is still O( n).
9-3 Weighted median
Consider n elements x1, x2, … , xn with positive weights w1, w2, … , wn such that Σ_{i=1}^{n} wi = 1. The weighted (lower) median is an element xk satisfying

    Σ_{xi < xk} wi < 1/2

and

    Σ_{xi > xk} wi ≤ 1/2.
For example, consider the following elements xi and weights wi:

    i    1     2     3      4     5     6      7
    xi   3     8     2      5     4     1      6
    wi   0.12  0.35  0.025  0.08  0.15  0.075  0.2
For these elements, the median is x 5 = 4, but the weighted median is x 7
= 6. To see why the weighted median is x 7, observe that the elements
less than x 7 are x 1, x 3, x 4, x 5, and x 6, and the sum w 1 + w 3 + w 4 + w 5
+ w 6 = 0.45, which is less than 1/2. Furthermore, only element x 2 is greater than x 7, and w 2 = 0.35, which is no greater than 1/2.
a. Argue that the median of x 1, x 2, … , xn is the weighted median of the xi with weights wi = 1/ n for i = 1, 2, … , n.
b. Show how to compute the weighted median of n elements in O( n lg n) worst-case time using sorting.
c. Show how to compute the weighted median in Θ( n) worst-case time using a linear-time median algorithm such as SELECT from Section 9.3.
The post-office location problem is defined as follows. The input is n points p1, p2, … , pn with associated weights w1, w2, … , wn. A solution is a point p (not necessarily one of the input points) that minimizes the sum Σ_{i=1}^{n} wi d(p, pi), where d( a, b) is the distance between points a and b.
d. Argue that the weighted median is a best solution for the one-
dimensional post-office location problem, in which points are simply
real numbers and the distance between points a and b is d( a, b) = | a –
b|.
e. Find the best solution for the two-dimensional post-office location
problem, in which the points are ( x, y) coordinate pairs and the
distance between points a = ( x 1, y 1) and b = ( x 2, y 2) is the Manhattan distance given by d( a, b) = | x 1 – x 2| + | y 1 – y 2|.
9-4 Small order statistics
Let’s denote by S( n) the worst-case number of comparisons used by SELECT to select the i th order statistic from n numbers. Although S( n)
= Θ( n), the constant hidden by the Θ-notation is rather large. When i is small relative to n, there is an algorithm that uses SELECT as a
subroutine but makes fewer comparisons in the worst case.
a. Describe an algorithm that uses Ui( n) comparisons to find the i th smallest of n elements, where

    Ui(n) = S(n)                             if i ≥ n/2,
    Ui(n) = ⌊n/2⌋ + Ui(⌈n/2⌉) + S(2i)        if i < n/2.

( Hint: Begin with ⌊n/2⌋ disjoint pairwise comparisons, and recurse on the set containing the smaller element from each pair.)
b. Show that, if i < n/2, then Ui( n) = n + O( S(2 i) lg( n/ i)).
c. Show that if i is a constant less than n/2, then Ui( n) = n + O(lg n).
d. Show that if i = n/ k for k ≥ 2, then Ui( n) = n + O( S(2 n/ k) lg k).
9-5 Alternative analysis of randomized selection
In this problem, you will use indicator random variables to analyze the
procedure RANDOMIZED-SELECT in a manner akin to our analysis
of RANDOMIZED-QUICKSORT in Section 7.4.2.
As in the quicksort analysis, we assume that all elements are distinct,
and we rename the elements of the input array A as z 1, z 2, … , zn, where zi is the i th smallest element. Thus the call RANDOMIZED-SELECT( A, 1, n, i) returns zi.
For 1 ≤ j < k ≤ n, let
Xijk = I { zj is compared with zk sometime during the execution of the algorithm to find zi}.
a. Give an exact expression for E [ Xijk]. ( Hint: Your expression may have different values, depending on the values of i, j, and k.)
b. Let Xi denote the total number of comparisons between elements of
array A when finding zi. Show that
c. Show that E [ Xi] ≤ 4 n.
d. Conclude that, assuming all elements of array A are distinct,
RANDOMIZED-SELECT runs in O( n) expected time.
9-6 Select with groups of 3
Exercise 9.3-1 asks you to show that the SELECT algorithm still runs in
linear time if the elements are divided into groups of 7. This problem
asks about dividing into groups of 3.
a. Show that SELECT runs in linear time if you divide the elements into
groups whose size is any odd constant greater than 3.
b. Show that SELECT runs in O( n lg n) time if you divide the elements into groups of size 3.
Because the bound in part (b) is just an upper bound, we do not
know whether the groups-of-3 strategy actually runs in O( n) time. But
by repeating the groups-of-3 idea on the middle group of medians, we
can pick a pivot that guarantees O( n) time. The SELECT3 algorithm on
the next page determines the i th smallest of an input array of n > 1
distinct elements.
c. Describe in English how the SELECT3 algorithm works. Include in
your description one or more suitable diagrams.
d. Show that SELECT3 runs in O( n) time in the worst case.
Chapter notes
The worst-case linear-time median-finding algorithm was devised by
Blum, Floyd, Pratt, Rivest, and Tarjan [62]. The fast randomized
version is due to Hoare [218]. Floyd and Rivest [147] have developed an improved randomized version that partitions around an element
recursively selected from a small sample of the elements.
SELECT3(A, p, r, i)
 1  while (r − p + 1) mod 9 ≠ 0
 2      for j = p + 1 to r              // put the minimum into A[p]
 3          if A[p] > A[j]
 4              exchange A[p] with A[j]
 5      // If we want the minimum of A[p : r], we’re done.
 6      if i == 1
 7          return A[p]
 8      // Otherwise, we want the (i − 1)st element of A[p + 1 : r].
 9      p = p + 1
10      i = i − 1
11  g = (r − p + 1)/3                   // number of 3-element groups
12  for j = p to p + g − 1              // run through the groups
13      sort ⟨A[j], A[j + g], A[j + 2g]⟩ in place
14  // All group medians now lie in the middle third of A[p : r].
15  g′ = g/3                            // number of 3-element subgroups
16  for j = p + g to p + g + g′ − 1     // sort the subgroups
17      sort ⟨A[j], A[j + g′], A[j + 2g′]⟩ in place
18  // All subgroup medians now lie in the middle ninth of A[p : r].
19  // Find the pivot x recursively as the median of the subgroup medians.
20  x = SELECT3(A, p + 4g′, p + 5g′ − 1, ⌈g′/2⌉)
21  q = PARTITION-AROUND(A, p, r, x)    // partition around the pivot x
22  // The rest is just like lines 19–24 of SELECT.
23  k = q − p + 1
24  if i == k
25      return A[q]                     // the pivot value is the answer
26  elseif i < k
27      return SELECT3(A, p, q − 1, i)
28  else return SELECT3(A, q + 1, r, i − k)
It is still unknown exactly how many comparisons are needed to
determine the median. Bent and John [48] gave a lower bound of 2 n comparisons for median finding, and Schönhage, Paterson, and
Pippenger [397] gave an upper bound of 3 n. Dor and Zwick have improved on both of these bounds. Their upper bound [123] is slightly less than 2.95 n, and their lower bound [124] is (2 + ϵ) n, for a small positive constant ϵ, thereby improving slightly on related work by Dor
et al. [122]. Paterson [354] describes some of these results along with other related work.
Problem 9-6 was inspired by a paper by Chen and Dumitrescu [84].
1 As in the footnote on page 182, you can enforce the assumption that the numbers are distinct by converting each input value A[ i] to an ordered pair ( A[ i], i) with ( A[ i], i) < ( A[ j], j) if either A[ i] < A[ j] or A[ i] = A[ j] and i < j.
2 We could also use the Akra-Bazzi method from Section 4.7, which involves calculus, to solve this recurrence. Indeed, a similar recurrence (4.24) on page 117 was used to illustrate that method.
Sets are as fundamental to computer science as they are to mathematics.
Whereas mathematical sets are unchanging, the sets manipulated by
algorithms can grow, shrink, or otherwise change over time. We call
such sets dynamic. The next four chapters present some basic techniques
for representing finite dynamic sets and manipulating them on a
computer.
Algorithms may require several types of operations to be performed
on sets. For example, many algorithms need only the ability to insert
elements into, delete elements from, and test membership in a set. We
call a dynamic set that supports these operations a dictionary. Other algorithms require more complicated operations. For example, min-priority queues, which Chapter 6 introduced in the context of the heap data structure, support the operations of inserting an element into and
extracting the smallest element from a set. The best way to implement a
dynamic set depends upon the operations that you need to support.
Elements of a dynamic set
In a typical implementation of a dynamic set, each element is
represented by an object whose attributes can be examined and
manipulated given a pointer to the object. Some kinds of dynamic sets
assume that one of the object’s attributes is an identifying key. If the keys are all different, we can think of the dynamic set as being a set of
key values. The object may contain satellite data, which are carried
around in other object attributes but are otherwise unused by the set
implementation. It may also have attributes that are manipulated by the set operations. These attributes may contain data or pointers to other
objects in the set.
Some dynamic sets presuppose that the keys are drawn from a totally
ordered set, such as the real numbers, or the set of all words under the
usual alphabetic ordering. A total ordering allows us to define the
minimum element of the set, for example, or to speak of the next
element larger than a given element in a set.
Operations on dynamic sets
Operations on a dynamic set can be grouped into two categories:
queries, which simply return information about the set, and modifying
operations, which change the set. Here is a list of typical operations. Any
specific application will usually require only a few of these to be
implemented.
SEARCH( S, k)
A query that, given a set S and a key value k, returns a pointer x to an element in S such that x.key = k, or NIL if no such element belongs to S.
INSERT( S, x)
A modifying operation that adds the element pointed to by x to the
set S. We usually assume that any attributes in element x needed by
the set implementation have already been initialized.
DELETE( S, x)
A modifying operation that, given a pointer x to an element in the
set S, removes x from S. (Note that this operation takes a pointer to an element x, not a key value.)
MINIMUM( S) and MAXIMUM( S)
Queries on a totally ordered set S that return a pointer to the
element of S with the smallest (for MINIMUM) or largest (for
MAXIMUM) key.
SUCCESSOR( S, x)
A query that, given an element x whose key is from a totally ordered set S, returns a pointer to the next larger element in S, or NIL if x is the maximum element.
PREDECESSOR( S, x)
A query that, given an element x whose key is from a totally ordered
set S, returns a pointer to the next smaller element in S, or NIL if x is the minimum element.
In some situations, we can extend the queries SUCCESSOR and
PREDECESSOR so that they apply to sets with nondistinct keys. For a
set on n keys, the normal presumption is that a call to MINIMUM
followed by n – 1 calls to SUCCESSOR enumerates the elements in the
set in sorted order.
We usually measure the time taken to execute a set operation in
terms of the size of the set. For example, Chapter 13 describes a data structure that can support any of the operations listed above on a set of
size n in O(lg n) time.
Of course, you can always choose to implement a dynamic set with
an array. The advantage of doing so is that the algorithms for the
dynamic-set operations are simple. The downside, however, is that many
of these operations have a worst-case running time of Θ( n). If the array
is not sorted, INSERT and DELETE can take Θ(1) time, but the
remaining operations take Θ( n) time. If instead the array is maintained
in sorted order, then MINIMUM, MAXIMUM, SUCCESSOR, and
PREDECESSOR take Θ(1) time; SEARCH takes O(lg n) time if
implemented with binary search; but INSERT and DELETE take Θ( n)
time in the worst case. The data structures studied in this part improve
on the array implementation for many of the dynamic-set operations.
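As an illustration of these trade-offs, here is a minimal sketch of a dynamic set backed by a sorted array in Python (the class and method names are ours; Python's bisect module supplies the binary search):

```python
import bisect

class SortedArraySet:
    """Dynamic set kept in a sorted array. SEARCH, SUCCESSOR,
    MINIMUM, and MAXIMUM are fast; INSERT and DELETE take Θ(n)
    time because elements must be shifted to preserve the order."""
    def __init__(self):
        self.a = []
    def insert(self, k):          # Θ(n): shifts elements on insertion
        bisect.insort(self.a, k)
    def delete(self, k):          # Θ(n): shifts elements on removal
        i = bisect.bisect_left(self.a, k)
        if i < len(self.a) and self.a[i] == k:
            self.a.pop(i)
    def search(self, k):          # O(lg n) via binary search
        i = bisect.bisect_left(self.a, k)
        return k if i < len(self.a) and self.a[i] == k else None
    def minimum(self):            # Θ(1)
        return self.a[0] if self.a else None
    def maximum(self):            # Θ(1)
        return self.a[-1] if self.a else None
    def successor(self, k):       # O(lg n): next larger key, or None
        i = bisect.bisect_right(self.a, k)
        return self.a[i] if i < len(self.a) else None
```

Starting from `minimum()` and repeatedly calling `successor` enumerates the keys in sorted order, as the presumption above requires.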
Overview of Part III
Chapters 10–13 describe several data structures that we can use to implement dynamic sets. We’ll use many of these data structures later to
construct efficient algorithms for a variety of problems. We already saw
another important data structure—the heap—in Chapter 6.
Chapter 10 presents the essentials of working with simple data structures such as arrays, matrices, stacks, queues, linked lists, and
rooted trees. If you have taken an introductory programming course,
then much of this material should be familiar to you.
Chapter 11 introduces hash tables, a widely used data structure supporting the dictionary operations INSERT, DELETE, and
SEARCH. In the worst case, hash tables require Θ( n) time to perform a
SEARCH operation, but the expected time for hash-table operations is
O(1). We rely on probability to analyze hash-table operations, but you
can understand how the operations work even without probability.
Binary search trees, which are covered in Chapter 12, support all the dynamic-set operations listed above. In the worst case, each operation
takes Θ( n) time on a tree with n elements. Binary search trees serve as the basis for many other data structures.
Chapter 13 introduces red-black trees, which are a variant of binary search trees. Unlike ordinary binary search trees, red-black trees are
guaranteed to perform well: operations take O(lg n) time in the worst case. A red-black tree is a balanced search tree. Chapter 18 in Part V
presents another kind of balanced search tree, called a B-tree. Although
the mechanics of red-black trees are somewhat intricate, you can glean
most of their properties from the chapter without studying the
mechanics in detail. Nevertheless, you probably will find walking
through the code to be instructive.
In this chapter, we examine the representation of dynamic sets by simple
data structures that use pointers. Although you can construct many
complex data structures using pointers, we present only the rudimentary
ones: arrays, matrices, stacks, queues, linked lists, and rooted trees.
10.1 Simple array-based data structures: arrays, matrices, stacks, queues
10.1.1 Arrays
We assume that, as in most programming languages, an array is stored
as a contiguous sequence of bytes in memory. If the first element of an
array has index s (for example, in an array with 1-origin indexing, s = 1), the array starts at memory address a, and each array element occupies b
bytes, then the i th element occupies bytes a + b( i – s) through a + b( i – s
+ 1) – 1. Since most of the arrays in this book are indexed starting at 1,
and a few starting at 0, we can simplify these formulas a little. When s =
1, the i th element occupies bytes a + b( i – 1) through a + bi – 1, and when s = 0, the i th element occupies bytes a + bi through a + b( i + 1) –
1. Assuming that the computer can access all memory locations in the
same amount of time (as in the RAM model described in Section 2.2), it takes constant time to access any array element, regardless of the index.
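The address formulas above can be captured in a short helper (a hypothetical function for illustration only):

```python
def element_bytes(a, b, s, i):
    """Byte range occupied by the i-th element of an array that starts
    at address a, has b-byte elements, and uses s-origin indexing.
    Returns (first_byte, last_byte), both inclusive."""
    first = a + b * (i - s)
    return (first, first + b - 1)
```

For example, with a 4-byte element size, 1-origin indexing, and starting address 1000, element 3 occupies bytes 1008 through 1011.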
Most programming languages require each element of a particular
array to be the same size. If the elements of a given array might occupy
different numbers of bytes, then the above formulas fail to apply, since

the element size b is not a constant. In such cases, the array elements are
usually objects of varying sizes, and what actually appears in each array
element is a pointer to the object. The number of bytes occupied by a
pointer is typically the same, no matter what the pointer references, so
that to access an object in an array, the above formulas give the address
of the pointer to the object and then the pointer must be followed to
access the object itself.
Figure 10.1 Four ways to store the 2 × 3 matrix M from equation (10.1). (a) In row-major order, in a single array. (b) In column-major order, in a single array. (c) In row-major order, with one array per row (tan) and a single array (blue) of pointers to the row arrays. (d) In column-major order, with one array per column (tan) and a single array (blue) of pointers to the column arrays.
10.1.2 Matrices
We typically represent a matrix or two-dimensional array by one or
more one-dimensional arrays. The two most common ways to store a
matrix are row-major and column-major order. Let’s consider an m × n
matrix—a matrix with m rows and n columns. In row-major order, the matrix is stored row by row, and in column-major order, the matrix is stored column by column. For example, consider the 2 × 3 matrix M from equation (10.1), whose rows are 〈1, 2, 3〉 and 〈4, 5, 6〉. Row-major order stores the two rows 1 2 3 and 4 5 6, whereas column-major order stores the three columns 1 4; 2 5; and 3 6.
Parts (a) and (b) of Figure 10.1 show how to store this matrix using a single one-dimensional array. It’s stored in row-major order in part (a)
and in column-major order in part (b). If the rows, columns, and the
single array all are indexed starting at s, then M [ i, j]—the element in row i and column j—is at array index s + ( n( i – s)) + ( j – s) with row-
major order and s + ( m( j – s)) + ( i – s) with column-major order. When s = 1, the single-array indices are n( i – 1) + j with row-major order and i
+ m( j – 1) with column-major order. When s = 0, the single-array indices are simpler: ni + j with row-major order and i + mj with column-major order. For the example matrix M with 1-origin indexing, element
M [2, 1] is stored at index 3(2 – 1) + 1 = 4 in the single array using row-
major order and at index 2 + 2(1 – 1) = 2 using column-major order.
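The index arithmetic above can be stated as two small Python helpers (a sketch with our own function names; s defaults to 1-origin indexing, as in the example):

```python
def rowmajor_index(i, j, n, s=1):
    """Index of M[i, j] in a single array storing an m x n matrix in
    row-major order with s-origin indexing: s + n(i - s) + (j - s)."""
    return s + n * (i - s) + (j - s)

def colmajor_index(i, j, m, s=1):
    """Column-major counterpart: s + m(j - s) + (i - s)."""
    return s + m * (j - s) + (i - s)
```

For the 2 × 3 matrix M with 1-origin indexing, rowmajor_index(2, 1, 3) gives 4 and colmajor_index(2, 1, 2) gives 2, agreeing with the worked example.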
Parts (c) and (d) of Figure 10.1 show multiple-array strategies for storing the example matrix. In part (c), each row is stored in its own
array of length n, shown in tan. Another array, with m elements, shown
in blue, points to the m row arrays. If we call the blue array A, then A[ i]
points to the array storing the entries for row i of M, and array element A[ i] [ j] stores matrix element M [ i, j]. Part (d) shows the column-major version of the multiple-array representation, with n arrays, each of
length m, representing the n columns. Matrix element M [ i, j] is stored in array element A[ j] [ i].
Single-array representations are typically more efficient on modern
machines than multiple-array representations. But multiple-array
representations can sometimes be more flexible, for example, allowing
for “ragged arrays,” in which the rows in the row-major version may
have different lengths, or symmetrically for the column-major version,
where columns may have different lengths.
Occasionally, other schemes are used to store matrices. In the block
representation, the matrix is divided into blocks, and each block is
stored contiguously. For example, consider a 4 × 4 matrix whose rows are 〈1, 2, 3, 4〉, 〈5, 6, 7, 8〉, 〈9, 10, 11, 12〉, and 〈13, 14, 15, 16〉, divided into four 2 × 2 blocks. This matrix might be stored in a single array in the order 〈1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 13, 14, 11, 12, 15, 16〉.
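One way such a block layout could be produced is sketched below; block_order is our own helper, not a procedure from the book, and it assumes p divides m and q divides n:

```python
def block_order(M, p, q):
    """Flatten matrix M (a list of rows) into block order: M is divided
    into blocks of p rows by q columns, blocks are visited in row-major
    order, and each block is itself flattened row by row."""
    m, n = len(M), len(M[0])
    out = []
    for bi in range(0, m, p):        # top row of each band of blocks
        for bj in range(0, n, q):    # left column of each block
            for i in range(bi, bi + p):
                for j in range(bj, bj + q):
                    out.append(M[i][j])
    return out
```

Applied to the 4 × 4 example with 2 × 2 blocks, this reproduces the order 〈1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 13, 14, 11, 12, 15, 16〉.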
10.1.3 Stacks and queues
Stacks and queues are dynamic sets in which the element removed from
the set by the DELETE operation is prespecified. In a stack, the
element deleted from the set is the one most recently inserted: the stack
implements a last-in, first-out, or LIFO, policy. Similarly, in a queue, the element deleted is always the one that has been in the set for the longest
time: the queue implements a first-in, first-out, or FIFO, policy.
There are several efficient ways to implement stacks and queues on a computer. Here, you will see how to use an array with attributes to store them.
Stacks
The INSERT operation on a stack is often called PUSH, and the
DELETE operation, which does not take an element argument, is often
called POP. These names are allusions to physical stacks, such as the
spring-loaded stacks of plates used in cafeterias. The order in which
plates are popped from the stack is the reverse of the order in which
they were pushed onto the stack, since only the top plate is accessible.
Figure 10.2 shows how to implement a stack of at most n elements with an array S[1 : n]. The stack has attributes S.top, indexing the most recently inserted element, and S.size, equaling the size n of the array.
The stack consists of elements S[1 : S.top], where S[1] is the element at the bottom of the stack and S[ S.top] is the element at the top.
Figure 10.2 An array implementation of a stack S. Stack elements appear only in the tan positions. (a) Stack S has 4 elements. The top element is 9. (b) Stack S after the calls PUSH( S, 17) and PUSH( S, 3). (c) Stack S after the call POP( S) has returned the element 3, which is the one most recently pushed. Although element 3 still appears in the array, it is no longer in the stack. The top is element 17.
When S.top = 0, the stack contains no elements and is empty. We can
test whether the stack is empty with the query operation STACK-
EMPTY. Upon an attempt to pop an empty stack, the stack underflows, which is normally an error. If S.top exceeds S.size, the stack overflows.
The procedures STACK-EMPTY, PUSH, and POP implement each
of the stack operations with just a few lines of code. Figure 10.2 shows the effects of the modifying operations PUSH and POP. Each of the
three stack operations takes O(1) time.
STACK-EMPTY(S)
1  if S.top == 0
2      return TRUE
3  else return FALSE

PUSH(S, x)
1  if S.top == S.size
2      error “overflow”
3  else S.top = S.top + 1
4      S[S.top] = x

POP(S)
1  if STACK-EMPTY(S)
2      error “underflow”
3  else S.top = S.top – 1
4      return S[S.top + 1]
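The three procedures translate almost line for line into Python. This sketch uses 0-origin indexing, so an empty stack has top == −1 rather than top == 0; the class name Stack is ours:

```python
class Stack:
    """Array-backed stack of at most `size` elements."""
    def __init__(self, size):
        self.S = [None] * size
        self.size = size
        self.top = -1                 # -1 plays the role of S.top == 0

    def stack_empty(self):
        return self.top == -1

    def push(self, x):
        if self.top == self.size - 1:
            raise OverflowError("overflow")
        self.top += 1
        self.S[self.top] = x

    def pop(self):
        if self.stack_empty():
            raise IndexError("underflow")
        self.top -= 1
        return self.S[self.top + 1]   # element just above the new top
```

Replaying Figure 10.2 on this class: after pushing 15, 6, 2, 9, then 17 and 3, a pop returns 3, the most recently pushed element.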
Figure 10.3 A queue implemented using an array Q[1 : 12]. Queue elements appear only in the tan positions. (a) The queue has 5 elements, in locations Q[7 : 11]. (b) The configuration of the queue after the calls ENQUEUE( Q, 17), ENQUEUE( Q, 3), and ENQUEUE( Q, 5). (c) The configuration of the queue after the call DEQUEUE( Q) returns the key value 15 formerly at the head of the queue. The new head has key 6.
Queues
We call the INSERT operation on a queue ENQUEUE, and we call the
DELETE operation DEQUEUE. Like the stack operation POP,
DEQUEUE takes no element argument. The FIFO property of a queue
causes it to operate like a line of customers waiting for service. The
queue has a head and a tail. When an element is enqueued, it takes its
place at the tail of the queue, just as a newly arriving customer takes a
place at the end of the line. The element dequeued is always the one at
the head of the queue, like the customer at the head of the line, who has
waited the longest.
Figure 10.3 shows one way to implement a queue of at most n – 1
elements using an array Q[1 : n], with the attribute Q.size equaling the size n of the array. The queue has an attribute Q.head that indexes, or points to, its head. The attribute Q.tail indexes the next location at which a newly arriving element will be inserted into the queue. The
elements in the queue reside in locations Q.head, Q.head + 1, … , Q.tail
– 1, where we “wrap around” in the sense that location 1 immediately
follows location n in a circular order. When Q.head = Q.tail, the queue is empty. Initially, we have Q.head = Q.tail = 1. An attempt to dequeue an element from an empty queue causes the queue to underflow. When
Q.head = Q.tail + 1 or both Q.head = 1 and Q.tail = Q.size, the queue is full, and an attempt to enqueue an element causes the queue to
overflow.
In the procedures ENQUEUE and DEQUEUE, we have omitted the
error checking for underflow and overflow. (Exercise 10.1-5 asks you to
supply these checks.) Figure 10.3 shows the effects of the ENQUEUE
and DEQUEUE operations. Each operation takes O(1) time.
ENQUEUE(Q, x)
1  Q[Q.tail] = x
2  if Q.tail == Q.size
3      Q.tail = 1
4  else Q.tail = Q.tail + 1

DEQUEUE(Q)
1  x = Q[Q.head]
2  if Q.head == Q.size
3      Q.head = 1
4  else Q.head = Q.head + 1
5  return x
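The wraparound in lines 2–4 of each procedure can equivalently be written with modular arithmetic. This Python sketch does so, and also adds the underflow and overflow checks that Exercise 10.1-5 asks you to supply; the class name Queue is ours:

```python
class Queue:
    """Circular array-backed queue holding at most n - 1 elements."""
    def __init__(self, n):
        self.Q = [None] * n
        self.size = n
        self.head = 0                 # index of the front element
        self.tail = 0                 # next free slot

    def empty(self):
        return self.head == self.tail

    def enqueue(self, x):
        if (self.tail + 1) % self.size == self.head:
            raise OverflowError("overflow")     # queue is full
        self.Q[self.tail] = x
        self.tail = (self.tail + 1) % self.size # wrap around

    def dequeue(self):
        if self.empty():
            raise IndexError("underflow")
        x = self.Q[self.head]
        self.head = (self.head + 1) % self.size
        return x
```

Elements leave in the order they arrived, so after enqueuing 4, 1, 3 the first dequeue returns 4.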
Exercises
10.1-1
Consider an m × n matrix in row-major order, where both m and n are powers of 2 and rows and columns are indexed from 0. We can represent
a row index i in binary by the lg m bits 〈i_{lg m−1}, i_{lg m−2}, … , i_0〉 and a column index j in binary by the lg n bits 〈j_{lg n−1}, j_{lg n−2}, … , j_0〉.
Suppose that this matrix is a 2 × 2 block matrix, where each block has
m/2 rows and n/2 columns, and it is to be represented by a single array with 0-origin indexing. Show how to construct the binary representation
of the (lg m + lg n)-bit index into the single array from the binary representations of i and j.
10.1-2
Using Figure 10.2 as a model, illustrate the result of each operation in the sequence PUSH(S, 4), PUSH(S, 1), PUSH(S, 3), POP(S), PUSH(S, 8), and POP(S) on an initially empty stack S stored in array S[1 : 6].
10.1-3
Explain how to implement two stacks in one array A[1 : n] in such a way that neither stack overflows unless the total number of elements in both
stacks together is n. The PUSH and POP operations should run in O(1)
time.
10.1-4
Using Figure 10.3 as a model, illustrate the result of each operation in the sequence ENQUEUE( Q, 4), ENQUEUE( Q, 1), ENQUEUE( Q, 3),
DEQUEUE( Q), ENQUEUE( Q, 8), and DEQUEUE( Q) on an initially
empty queue Q stored in array Q[1 : 6].
10.1-5
Rewrite ENQUEUE and DEQUEUE to detect underflow and overflow
of a queue.
10.1-6
Whereas a stack allows insertion and deletion of elements at only one
end, and a queue allows insertion at one end and deletion at the other
end, a deque (double-ended queue, pronounced like “deck”) allows
insertion and deletion at both ends. Write four O(1)-time procedures to
insert elements into and delete elements from both ends of a deque
implemented by an array.
10.1-7
Show how to implement a queue using two stacks. Analyze the running
time of the queue operations.
10.1-8
Show how to implement a stack using two queues. Analyze the running
time of the stack operations.
10.2 Linked lists
A linked list is a data structure in which the objects are arranged in a
linear order. Unlike an array, however, in which the linear order is
determined by the array indices, the order in a linked list is determined
by a pointer in each object. Since the elements of linked lists often
contain keys that can be searched for, linked lists are sometimes called
search lists. Linked lists provide a simple, flexible representation for dynamic sets, supporting (though not necessarily efficiently) all the
operations listed on page 250.
As shown in Figure 10.4, each element of a doubly linked list L is an object with an attribute key and two pointer attributes: next and prev.
The object may also contain other satellite data. Given an element x in
the list, x.next points to its successor in the linked list, and x.prev points to its predecessor. If x.prev = NIL, the element x has no predecessor and is therefore the first element, or head, of the list. If x.next = NIL, the element x has no successor and is therefore the last element, or tail, of the list. An attribute L.head points to the first element of the list. If
L.head = NIL, the list is empty.
Figure 10.4 (a) A doubly linked list L representing the dynamic set {1, 4, 9, 16}. Each element in the list is an object with attributes for the key and pointers (shown by arrows) to the next and previous objects. The next attribute of the tail and the prev attribute of the head are NIL, indicated by a diagonal slash. The attribute L.head points to the head. (b) Following the execution of LIST-PREPEND( L, x), where x.key = 25, the linked list has an object with key 25
as the new head. This new object points to the old head with key 9. (c) The result of calling LIST-INSERT( x, y), where x.key = 36 and y points to the object with key 9. (d) The result of the subsequent call LIST-DELETE( L, x), where x points to the object with key 4.
A list may have one of several forms. It may be either singly linked or
doubly linked, it may be sorted or not, and it may be circular or not. If
a list is singly linked, each element has a next pointer but not a prev pointer. If a list is sorted, the linear order of the list corresponds to the
linear order of keys stored in elements of the list. The minimum element
is then the head of the list, and the maximum element is the tail. If the
list is unsorted, the elements can appear in any order. In a circular list, the prev pointer of the head of the list points to the tail, and the next
pointer of the tail of the list points to the head. You can think of a
circular list as a ring of elements. In the remainder of this section, we
assume that the lists we are working with are unsorted and doubly
linked.
Searching a linked list
The procedure LIST-SEARCH( L, k) finds the first element with key k
in list L by a simple linear search, returning a pointer to this element. If
no object with key k appears in the list, then the procedure returns NIL.
For the linked list in Figure 10.4(a), the call LIST-SEARCH( L, 4)
returns a pointer to the third element, and the call LIST-SEARCH( L, 7) returns NIL. To search a list of n objects, the LIST-SEARCH
procedure takes Θ( n) time in the worst case, since it may have to search
the entire list.
LIST-SEARCH(L, k)
1  x = L.head
2  while x ≠ NIL and x.key ≠ k
3      x = x.next
4  return x
Inserting into a linked list
Given an element x whose key attribute has already been set, the LIST-
PREPEND procedure adds x to the front of the linked list, as shown in
Figure 10.4(b). (Recall that our attribute notation can cascade, so that L.head.prev denotes the prev attribute of the object that L.head points to.) The running time for LIST-PREPEND on a list of n elements is
O(1).
LIST-PREPEND(L, x)
1  x.next = L.head
2  x.prev = NIL
3  if L.head ≠ NIL
4      L.head.prev = x
5  L.head = x
You can insert anywhere within a linked list. As Figure 10.4(c)
shows, if you have a pointer y to an object in the list, the LIST-INSERT
procedure on the facing page “splices” a new element x into the list, immediately following y, in O(1) time. Since LIST-INSERT never
references the list object L, it is not supplied as a parameter.
LIST-INSERT(x, y)
1  x.next = y.next
2  x.prev = y
3  if y.next ≠ NIL
4      y.next.prev = x
5  y.next = x
Deleting from a linked list
The procedure LIST-DELETE removes an element x from a linked list
L. It must be given a pointer to x, and it then “splices” x out of the list by updating pointers. To delete an element with a given key, first call
LIST-SEARCH to retrieve a pointer to the element. Figure 10.4(d)
shows how an element is deleted from a linked list. LIST-DELETE runs
in O(1) time, but to delete an element with a given key, the call to LIST-
SEARCH makes the worst-case running time be Θ( n).
LIST-DELETE(L, x)
1  if x.prev ≠ NIL
2      x.prev.next = x.next
3  else L.head = x.next
4  if x.next ≠ NIL
5      x.next.prev = x.prev
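The four list procedures fit together as a small Python class. This is a sketch in which None stands in for NIL; the names Node and LinkedList are ours:

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None      # None plays the role of NIL
        self.next = None

class LinkedList:
    """Unsorted doubly linked list with the book's four operations."""
    def __init__(self):
        self.head = None

    def search(self, k):                  # LIST-SEARCH: Theta(n) worst case
        x = self.head
        while x is not None and x.key != k:
            x = x.next
        return x

    def prepend(self, x):                 # LIST-PREPEND: O(1)
        x.next = self.head
        x.prev = None
        if self.head is not None:
            self.head.prev = x
        self.head = x

    def insert(self, x, y):               # LIST-INSERT: splice x after y
        x.next = y.next
        x.prev = y
        if y.next is not None:
            y.next.prev = x
        y.next = x

    def delete(self, x):                  # LIST-DELETE: O(1) given a pointer to x
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev
```

Replaying Figure 10.4: starting from {1, 4, 9, 16}, inserting 36 after 9 and then deleting 4 leaves the list 1, 9, 36, 16.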
Insertion and deletion are faster operations on doubly linked lists
than on arrays. If you want to insert a new first element into an array or
delete the first element in an array, maintaining the relative order of all
the existing elements, then each of the existing elements needs to be
moved by one position. In the worst case, therefore, insertion and
deletion take Θ( n) time in an array, compared with O(1) time for a doubly linked list. (Exercise 10.2-1 asks you to show that deleting an
element from a singly linked list takes Θ( n) time in the worst case.) If,
however, you want to find the k th element in the linear order, it takes
just O(1) time in an array regardless of k, but in a linked list, you’d have to traverse k elements, taking Θ( k) time.
Sentinels
The code for LIST-DELETE is simpler if you ignore the boundary
conditions at the head and tail of the list:
Figure 10.5 A circular, doubly linked list with a sentinel. The sentinel L.nil, in blue, appears between the head and tail. The attribute L.head is no longer needed, since the head of the list is L.nil.next. (a) An empty list. (b) The linked list from Figure 10.4(a), with key 9 at the head and key 1 at the tail. (c) The list after executing LIST-INSERT′ ( x, L.nil), where x.key = 25. The new object becomes the head of the list. (d) The list after deleting the object with key 1. The new tail is the object with key 4. (e) The list after executing LIST-INSERT′ ( x, y), where x.key = 36 and y points to the object with key 9.
LIST-DELETE′(x)
1  x.prev.next = x.next
2  x.next.prev = x.prev
A sentinel is a dummy object that allows us to simplify boundary
conditions. In a linked list L, the sentinel is an object L.nil that represents NIL but has all the attributes of the other objects in the list.
References to NIL are replaced by references to the sentinel L.nil. As shown in Figure 10.5, this change turns a regular doubly linked list into a circular, doubly linked list with a sentinel, in which the sentinel L.nil lies between the head and tail. The attribute L.nil.next points to the head of the list, and L.nil.prev points to the tail. Similarly, both the next
attribute of the tail and the prev attribute of the head point to L.nil.
Since L.nil.next points to the head, the attribute L.head is eliminated altogether, with references to it replaced by references to L.nil.next.
Figure 10.5(a) shows that an empty list consists of just the sentinel, and both L.nil.next and L.nil.prev point to L.nil.
To delete an element from the list, just use the two-line procedure
LIST-DELETE′ from before. Just as LIST-INSERT never references
the list object L, neither does LIST-DELETE′. You should never delete
the sentinel L.nil unless you are deleting the entire list!
The LIST-INSERT′ procedure inserts an element x into the list
following object y. No separate procedure for prepending is necessary:
to insert at the head of the list, let y be L.nil; and to insert at the tail, let y be L.nil.prev. Figure 10.5 shows the effects of LIST-INSERT′ and LIST-DELETE′ on a sample list.
LIST-INSERT′(x, y)
1  x.next = y.next
2  x.prev = y
3  y.next.prev = x
4  y.next = x
Searching a circular, doubly linked list with a sentinel has the same
asymptotic running time as without a sentinel, but it is possible to
decrease the constant factor. The test in line 2 of LIST-SEARCH makes
two comparisons: one to check whether the search has run off the end
of the list and, if not, one to check whether the key resides in the current
element x. Suppose that you know that the key is somewhere in the list.
Then you do not need to check whether the search runs off the end of
the list, thereby eliminating one comparison in each iteration of the
while loop.
The sentinel provides a place to put the key before starting the
search. The search starts at the head L.nil.next of list L, and it stops if it finds the key somewhere in the list. Now the search is guaranteed to find
the key, either in the sentinel or before reaching the sentinel. If the key is
found before reaching the sentinel, then it really is in the element where
the search stops. If, however, the search goes through all the elements in the list and finds the key only in the sentinel, then the key is not really in
the list, and the search returns NIL. The procedure LIST-SEARCH′
embodies this idea. (If your sentinel requires its key attribute to be NIL,
then you might want to assign L.nil.key = NIL before line 5.)
LIST-SEARCH′(L, k)
1  L.nil.key = k      // store the key in the sentinel to guarantee it is in the list
2  x = L.nil.next     // start at the head of the list
3  while x.key ≠ k
4      x = x.next
5  if x == L.nil      // found k in the sentinel
6      return NIL     // k was not really in the list
7  else return x      // found k in element x
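Putting the sentinel versions together in Python (a sketch; None stands in for an absent key, and the class names are ours):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = self.next = None

class SentinelList:
    """Circular, doubly linked list with a sentinel: nil.next is the
    head, nil.prev is the tail, and an empty list is just the sentinel
    pointing at itself."""
    def __init__(self):
        self.nil = Node(None)
        self.nil.next = self.nil.prev = self.nil

    def insert(self, x, y):        # LIST-INSERT': splice x after y
        x.next = y.next
        x.prev = y
        y.next.prev = x
        y.next = x

    def delete(self, x):           # LIST-DELETE': no boundary cases
        x.prev.next = x.next
        x.next.prev = x.prev

    def search(self, k):           # LIST-SEARCH'
        self.nil.key = k           # plant k in the sentinel
        x = self.nil.next          # start at the head
        while x.key != k:          # one comparison per iteration
            x = x.next             # guaranteed to stop at the latest at nil
        return None if x is self.nil else x
```

To insert at the head, pass y = lst.nil; to insert at the tail, pass y = lst.nil.prev, exactly as described above.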
Sentinels often simplify code and, as in searching a linked list, they
might speed up code by a small constant factor, but they don’t typically
improve the asymptotic running time. Use them judiciously. When there
are many small lists, the extra storage used by their sentinels can
represent significant wasted memory. In this book, we use sentinels only
when they significantly simplify the code.
Exercises
10.2-1
Explain why the dynamic-set operation INSERT on a singly linked list
can be implemented in O(1) time, but the worst-case time for DELETE
is Θ( n).
10.2-2
Implement a stack using a singly linked list. The operations PUSH and
POP should still take O(1) time. Do you need to add any attributes to
the list?
10.2-3
Implement a queue using a singly linked list. The operations
ENQUEUE and DEQUEUE should still take O(1) time. Do you need
to add any attributes to the list?
10.2-4
The dynamic-set operation UNION takes two disjoint sets S_1 and S_2 as input, and it returns a set S = S_1 ∪ S_2 consisting of all the elements of S_1 and S_2. The sets S_1 and S_2 are usually destroyed by the operation.
Show how to support UNION in O(1) time using a suitable list data
structure.
10.2-5
Give a Θ( n)-time nonrecursive procedure that reverses a singly linked
list of n elements. The procedure should use no more than constant
storage beyond that needed for the list itself.
★ 10.2-6
Explain how to implement doubly linked lists using only one pointer
value x.np per item instead of the usual two ( next and prev). Assume that all pointer values can be interpreted as k-bit integers, and define x.np = x.next XOR x.prev, the k-bit “exclusive-or” of x.next and x.prev.
The value NIL is represented by 0. Be sure to describe what information
you need to access the head of the list. Show how to implement the
SEARCH, INSERT, and DELETE operations on such a list. Also show
how to reverse such a list in O(1) time.
10.3 Representing rooted trees
Linked lists work well for representing linear relationships, but not all
relationships are linear. In this section, we look specifically at the
problem of representing rooted trees by linked data structures. We first
look at binary trees, and then we present a method for rooted trees in
which nodes can have an arbitrary number of children.
We represent each node of a tree by an object. As with linked lists,
we assume that each node contains a key attribute. The remaining
attributes of interest are pointers to other nodes, and they vary according to the type of tree.
Binary trees
Figure 10.6 shows how to use the attributes p, left, and right to store pointers to the parent, left child, and right child of each node in a
binary tree T. If x.p = NIL, then x is the root. If node x has no left child, then x.left = NIL, and similarly for the right child. The root of
the entire tree T is pointed to by the attribute T.root. If T.root = NIL, then the tree is empty.
Rooted trees with unbounded branching
It’s simple to extend the scheme for representing a binary tree to any
class of trees in which the number of children of each node is at most
some constant k: replace the left and right attributes by child_1, child_2, … , child_k. This scheme no longer works when the number of children of a
node is unbounded, however, since we do not know how many
attributes to allocate in advance. Moreover, if k, the number of children,
is bounded by a large constant but most nodes have a small number of
children, we may waste a lot of memory.
Fortunately, there is a clever scheme to represent trees with arbitrary
numbers of children. It has the advantage of using only O( n) space for
any n-node rooted tree. The left-child, right-sibling representation appears in Figure 10.7. As before, each node contains a parent pointer p, and T.root points to the root of tree T. Instead of having a pointer to each of its children, however, each node x has only two pointers:
1. x.left-child points to the leftmost child of node x, and
2. x.right-sibling points to the sibling of x immediately to its right.
If node x has no children, then x.left-child = NIL, and if node x is the rightmost child of its parent, then x.right-sibling = NIL.
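A sketch of this representation in Python, together with an O(n) traversal in the spirit of Exercise 10.3-4; all names here (TreeNode, add_child, collect_keys) are ours:

```python
class TreeNode:
    def __init__(self, key):
        self.key = key
        self.p = None              # parent pointer
        self.left_child = None     # leftmost child
        self.right_sibling = None  # sibling immediately to the right

def add_child(parent, child):
    """Attach child as the new leftmost child of parent in O(1) time."""
    child.p = parent
    child.right_sibling = parent.left_child
    parent.left_child = child

def collect_keys(x, keys=None):
    """Collect the keys of the subtree rooted at x in O(n) time by
    following left-child and then right-sibling pointers."""
    if keys is None:
        keys = []
    if x is not None:
        keys.append(x.key)
        collect_keys(x.left_child, keys)      # visit x's children
        collect_keys(x.right_sibling, keys)   # then x's right siblings
    return keys
```

Only two pointers per node (plus the parent pointer) suffice, no matter how many children a node has, which is what keeps the representation at O(n) space.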

Figure 10.6 The representation of a binary tree T. Each node x has the attributes x.p (top), x.left (lower left), and x.right (lower right). The key attributes are not shown.
Figure 10.7 The left-child, right-sibling representation of a tree T. Each node x has attributes x.p (top), x.left-child (lower left), and x.right-sibling (lower right). The key attributes are not shown.
Other tree representations
We sometimes represent rooted trees in other ways. In Chapter 6, for example, we represented a heap, which is based on a complete binary
tree, by a single array along with an attribute giving the index of the last
node in the heap. The trees that appear in Chapter 19 are traversed only toward the root, and so only the parent pointers are present: there are
no pointers to children. Many other schemes are possible. Which
scheme is best depends on the application.
Exercises
10.3-1
Draw the binary tree rooted at index 6 that is represented by the
following attributes:
index  key  left  right
  1    17    8      9
  2    14   NIL    NIL
  3    12   NIL    NIL
  4    20   10     NIL
  5    33    2     NIL
  6    15    1      4
  7    28   NIL    NIL
  8    22   NIL    NIL
  9    13    3      7
 10    25   NIL     5
10.3-2
Write an O( n)-time recursive procedure that, given an n-node binary tree, prints out the key of each node in the tree.
10.3-3
Write an O( n)-time nonrecursive procedure that, given an n-node binary tree, prints out the key of each node in the tree. Use a stack as an
auxiliary data structure.
10.3-4
Write an O( n)-time procedure that prints out all the keys of an arbitrary rooted tree with n nodes, where the tree is stored using the left-child, right-sibling representation.
★ 10.3-5
Write an O( n)-time nonrecursive procedure that, given an n-node binary tree, prints out the key of each node. Use no more than constant extra
space outside of the tree itself and do not modify the tree, even
temporarily, during the procedure.
★ 10.3-6
The left-child, right-sibling representation of an arbitrary rooted tree
uses three pointers in each node: left-child, right-sibling, and parent.
From any node, its parent can be accessed in constant time and all its
children can be accessed in time linear in the number of children. Show
how to use only two pointers and one boolean value in each node x so
that x’s parent or all of x’s children can be accessed in time linear in the number of x’s children.
Problems
10-1 Comparisons among lists
For each of the four types of lists in the following table, what is the
asymptotic worst-case running time for each dynamic-set operation
listed?
             unsorted,      sorted,        unsorted,      sorted,
             singly linked  singly linked  doubly linked  doubly linked
SEARCH
INSERT
DELETE
SUCCESSOR
PREDECESSOR
MINIMUM
MAXIMUM
10-2 Mergeable heaps using linked lists
A mergeable heap supports the following operations: MAKE-HEAP
(which creates an empty mergeable heap), INSERT, MINIMUM,
EXTRACT-MIN, and UNION.1 Show how to implement mergeable
heaps using linked lists in each of the following cases. Try to make each
operation as efficient as possible. Analyze the running time of each
operation in terms of the size of the dynamic set(s) being operated on.
a. Lists are sorted.
b. Lists are unsorted.
c. Lists are unsorted, and dynamic sets to be merged are disjoint.
10-3 Searching a sorted compact list
We can represent a singly linked list with two arrays, key and next.
Given the index i of an element, its value is stored in key[ i], and the index of its successor is given by next[ i], where next[ i] = NIL for the last element. We also need the index head of the first element in the list. An
n-element list stored in this way is compact if it is stored only in positions 1 through n of the key and next arrays.
Let’s assume that all keys are distinct and that the compact list is
also sorted, that is, key[ i] < key[ next[ i]] for all i = 1, 2, … , n such that next[ i] ≠ NIL. Under these assumptions, you will show that the randomized algorithm COMPACT-LIST-SEARCH searches the list for
key k in O(√n) expected time.
COMPACT-LIST-SEARCH(key, next, head, n, k)
 1  i = head
 2  while i ≠ NIL and key[i] < k
 3      j = RANDOM(1, n)
 4      if key[i] < key[j] and key[j] ≤ k
 5          i = j
 6          if key[i] == k
 7              return i
 8      i = next[i]
 9  if i == NIL or key[i] > k
10      return NIL
11  else return i
If you ignore lines 3–7 of the procedure, you can see that it’s an
ordinary algorithm for searching a sorted linked list, in which index i
points to each position of the list in turn. The search terminates once
the index i “falls off” the end of the list or once key[ i] ≥ k. In the latter case, if key[ i] = k, the procedure has found a key with the value k. If, however, key[ i] > k, then the search will never find a key with the value k, so that terminating the search was the correct action.
Lines 3–7 attempt to skip ahead to a randomly chosen position j.
Such a skip helps if key[ j] is larger than key[ i] and no larger than k. In such a case, j marks a position in the list that i would reach during an ordinary list search. Because the list is compact, we know that any
choice of j between 1 and n indexes some element in the list.
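A direct Python transcription of the procedure (a sketch, adapted to 0-origin indexing: positions run from 0 to n − 1, None plays the role of NIL, and RANDOM(1, n) becomes random.randrange(n)):

```python
import random

def compact_list_search(key, next_, head, n, k):
    """Randomized search in a sorted compact list stored in the parallel
    lists key[0:n] and next_[0:n] (next_[i] is None for the last element).
    Returns an index i with key[i] == k, or None if k is absent."""
    i = head
    while i is not None and key[i] < k:
        j = random.randrange(n)                # random skip candidate
        if key[i] < key[j] and key[j] <= k:
            i = j                              # j lies ahead of i in the list
            if key[i] == k:
                return i
        i = next_[i]                           # ordinary step along the list
    if i is None or key[i] > k:
        return None                            # k is not in the list
    return i
```

Note that the random skips affect only the running time, never the answer: the procedure returns the same result as an ordinary sorted-list search.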
Instead of analyzing the performance of COMPACT-LIST-SEARCH directly, you will analyze a related algorithm, COMPACT-LIST-SEARCH′, which executes two separate loops. This algorithm takes an additional parameter t, which specifies an upper bound on the number of iterations of the first loop.
COMPACT-LIST-SEARCH′(key, next, head, n, k, t)
 1  i = head
 2  for q = 1 to t
 3      j = RANDOM(1, n)
 4      if key[i] < key[j] and key[j] ≤ k
 5          i = j
 6          if key[i] == k
 7              return i
 8  while i ≠ NIL and key[i] < k
 9      i = next[i]
10  if i == NIL or key[i] > k
11      return NIL
12  else return i
To compare the execution of the two algorithms, assume that the
sequence of calls of RANDOM(1, n) yields the same sequence of
integers for both algorithms.
a. Argue that for any value of t, COMPACT-LIST-SEARCH( key, next, head, n, k) and COMPACT-LIST-SEARCH′ ( key, next, head, n, k, t) return the same result and that the number of iterations of the while
loop of lines 2–8 in COMPACT-LIST-SEARCH is at most the total
number of iterations of both the for and while loops in COMPACT-
LIST-SEARCH′.
In the call COMPACT-LIST-SEARCH′(key, next, head, n, k, t), let X_t be the random variable that describes the distance in the linked list (that is, through the chain of next pointers) from position i to the desired key k after t iterations of the for loop of lines 2–7 have occurred.
b. Argue that COMPACT-LIST-SEARCH′(key, next, head, n, k, t) has an expected running time of O(t + E[X_t]).
c. Show that E[X_t] ≤ ∑_{r=1}^{n} (1 − r/n)^t. (Hint: Use equation (C.28) on page 1193.)
d. Show that ∑_{r=0}^{n−1} r^t ≤ n^{t+1}/(t + 1). (Hint: Use inequality (A.18) on page 1150.)
e. Prove that E[X_t] ≤ n/(t + 1).
f. Show that COMPACT-LIST-SEARCH′(key, next, head, n, k, t) has an expected running time of O(t + n/t).
g. Conclude that COMPACT-LIST-SEARCH runs in O(√n) expected time.
h. Why do we assume that all keys are distinct in COMPACT-LIST-
SEARCH? Argue that random skips do not necessarily help
asymptotically when the list contains repeated key values.
Chapter notes
Aho, Hopcroft, and Ullman [6] and Knuth [259] are excellent references for elementary data structures. Many other texts cover both basic data
structures and their implementation in a particular programming
language. Examples of these types of textbooks include Goodrich and
Tamassia [196], Main [311], Shaffer [406], and Weiss [452, 453, 454]. The book by Gonnet and Baeza-Yates [193] provides experimental data on the performance of many data-structure operations.
The origin of stacks and queues as data structures in computer
science is unclear, since corresponding notions already existed in
mathematics and paper-based business practices before the introduction
of digital computers. Knuth [259] cites A. M. Turing for the development of stacks for subroutine linkage in 1947.
Pointer-based data structures also seem to be a folk invention.
According to Knuth, pointers were apparently used in early computers
with drum memories. The A-1 language developed by G. M. Hopper in
1951 represented algebraic formulas as binary trees. Knuth credits the
IPL-II language, developed in 1956 by A. Newell, J. C. Shaw, and H. A.
Simon, for recognizing the importance and promoting the use of
pointers. Their IPL-III language, developed in 1957, included explicit
stack operations.
1 Because we have defined a mergeable heap to support MINIMUM and EXTRACT-MIN, we can also refer to it as a mergeable min-heap. Alternatively, if it supports MAXIMUM and EXTRACT-MAX, it is a mergeable max-heap.
11 Hash Tables
Many applications require a dynamic set that supports only the
dictionary operations INSERT, SEARCH, and DELETE. For example,
a compiler that translates a programming language maintains a symbol
table, in which the keys of elements are arbitrary character strings
corresponding to identifiers in the language. A hash table is an effective
data structure for implementing dictionaries. Although searching for an
element in a hash table can take as long as searching for an element in a
linked list—Θ( n) time in the worst case—in practice, hashing performs
extremely well. Under reasonable assumptions, the average time to
search for an element in a hash table is O(1). Indeed, the built-in dictionaries of Python are implemented with hash tables.
A hash table generalizes the simpler notion of an ordinary array.
Directly addressing into an ordinary array takes advantage of the O(1)
access time for any array element. Section 11.1 discusses direct addressing in more detail. To use direct addressing, you must be able to
allocate an array that contains a position for every possible key.
When the number of keys actually stored is small relative to the total
number of possible keys, hash tables become an effective alternative to
directly addressing an array, since a hash table typically uses an array of
size proportional to the number of keys actually stored. Instead of using
the key as an array index directly, we compute the array index from the
key. Section 11.2 presents the main ideas, focusing on “chaining” as a way to handle “collisions,” in which more than one key maps to the
same array index. Section 11.3 describes how to compute array indices from keys using hash functions. We present and analyze several
variations on the basic theme. Section 11.4 looks at “open addressing,”
which is another way to deal with collisions. The bottom line is that
hashing is an extremely effective and practical technique: the basic
dictionary operations require only O(1) time on the average. Section
11.5 discusses the hierarchical memory systems of modern computers
and illustrates how to design hash tables that work well in such
systems.
11.1 Direct-address tables
Direct addressing is a simple technique that works well when the
universe U of keys is reasonably small. Suppose that an application
needs a dynamic set in which each element has a distinct key drawn
from the universe U = {0, 1, …, m − 1}, where m is not too large.
To represent the dynamic set, you can use an array, or direct-address
table, denoted by T[0 : m − 1], in which each position, or slot, corresponds to a key in the universe U. Figure 11.1 illustrates this approach. Slot k points to an element in the set with key k. If the set contains no element with key k, then T[ k] = NIL.
The dictionary operations DIRECT-ADDRESS-SEARCH,
DIRECT-ADDRESS-INSERT, and DIRECT-ADDRESS-DELETE
below are trivial to implement. Each takes only O(1) time.
For some applications, the direct-address table itself can hold the
elements in the dynamic set. That is, rather than storing an element’s
key and satellite data in an object external to the direct-address table,
with a pointer from a slot in the table to the object, save space by
storing the object directly in the slot. To indicate an empty slot, use a
special key. Then again, why store the key of the object at all? The index
of the object is its key! Of course, then you’d need some way to tell whether slots are empty.
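A minimal Python sketch of this space-saving variant follows. The class and method names are our own, and a unique sentinel object (one of several possible ways to mark emptiness) answers the question of how to tell whether a slot is empty; since the slot index is the key, the key itself is never stored:

```python
EMPTY = object()   # unique sentinel marking an empty slot (our convention)

class CompactDirectAddressTable:
    """Direct-address table storing satellite data directly in the slots.

    The slot index is the element's key, so keys need not be stored."""
    def __init__(self, m):
        self.slots = [EMPTY] * m

    def search(self, k):
        # Returns the satellite data for key k, or None if slot k is empty.
        return None if self.slots[k] is EMPTY else self.slots[k]

    def insert(self, k, data):
        self.slots[k] = data

    def delete(self, k):
        self.slots[k] = EMPTY
```

Each operation is still O(1); the sentinel is compared with `is`, so even `None` can be stored as legitimate satellite data without being mistaken for an empty slot (though `search` as sketched would then be ambiguous for the caller).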
Figure 11.1 How to implement a dynamic set by a direct-address table T. Each key in the universe U = {0, 1, …, 9} corresponds to an index into the table. The set K = {2, 3, 5, 8} of actual keys determines the slots in the table that contain pointers to elements. The other slots, in blue, contain NIL.
DIRECT-ADDRESS-SEARCH( T, k)
1 return T[ k]
DIRECT-ADDRESS-INSERT( T, x)
1 T[ x.key] = x
DIRECT-ADDRESS-DELETE( T, x)
1 T[ x.key] = NIL
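For concreteness, the three procedures above can be sketched in Python. The class names and the `data` attribute are our own; as in the pseudocode, each element is an object with a key attribute, and `None` plays the role of NIL:

```python
class Element:
    """An element with a key and optional satellite data."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data

class DirectAddressTable:
    """Direct-address table T[0 : m-1]; slot k holds the element with key k."""
    def __init__(self, m):
        self.slots = [None] * m          # None plays the role of NIL

    def search(self, k):                 # DIRECT-ADDRESS-SEARCH(T, k)
        return self.slots[k]

    def insert(self, x):                 # DIRECT-ADDRESS-INSERT(T, x)
        self.slots[x.key] = x

    def delete(self, x):                 # DIRECT-ADDRESS-DELETE(T, x)
        self.slots[x.key] = None
```

Each operation is a single array access or assignment, hence O(1) time, exactly as in the pseudocode.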
Exercises
11.1-1
A dynamic set S is represented by a direct-address table T of length m.
Describe a procedure that finds the maximum element of S. What is the
worst-case performance of your procedure?
11.1-2
A bit vector is simply an array of bits (each either 0 or 1). A bit vector of
length m takes much less space than an array of m pointers. Describe
how to use a bit vector to represent a dynamic set of distinct elements drawn from the set {0, 1, …, m − 1} and with no satellite data.
Dictionary operations should run in O(1) time.
11.1-3
Suggest how to implement a direct-address table in which the keys of
stored elements do not need to be distinct and the elements can have
satellite data. All three dictionary operations (INSERT, DELETE, and
SEARCH) should run in O(1) time. (Don’t forget that DELETE takes
as an argument a pointer to an object to be deleted, not a key.)
★ 11.1-4
Suppose that you want to implement a dictionary by using direct
addressing on a huge array. That is, if the array size is m and the dictionary contains at most n elements at any one time, then m ≫ n. At the start, the array entries may contain garbage, and initializing the
entire array is impractical because of its size. Describe a scheme for
implementing a direct-address dictionary on a huge array. Each stored
object should use O(1) space; the operations SEARCH, INSERT, and
DELETE should take O(1) time each; and initializing the data structure
should take O(1) time. ( Hint: Use an additional array, treated somewhat like a stack whose size is the number of keys actually stored in the
dictionary, to help determine whether a given entry in the huge array is
valid or not.)
11.2 Hash tables
The downside of direct addressing is apparent: if the universe U is large
or infinite, storing a table T of size | U| may be impractical, or even impossible, given the memory available on a typical computer.
Furthermore, the set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted.
When the set K of keys stored in a dictionary is much smaller than
the universe U of all possible keys, a hash table requires much less storage than a direct-address table. Specifically, the storage requirement
reduces to Θ(| K|) while maintaining the benefit that searching for an element in the hash table still requires only O(1) time. The catch is that
this bound is for the average-case time, 1 whereas for direct addressing it holds for the worst-case time.
With direct addressing, an element with key k is stored in slot k, but
with hashing, we use a hash function h to compute the slot number from
the key k, so that the element goes into slot h( k). The hash function h maps the universe U of keys into the slots of a hash table T[0 : m − 1]:
h : U → {0, 1, …, m − 1},
where the size m of the hash table is typically much less than | U|. We say that an element with key k hashes to slot h( k), and we also say that h( k) is the hash value of key k. Figure 11.2 illustrates the basic idea. The hash function reduces the range of array indices and hence the size of the
array. Instead of a size of | U|, the array can have size m. An example of a simple, but not particularly good, hash function is h( k) = k mod m.
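Here is that division-method hash function in Python, applied to a few sample keys of our own choosing. Every key from a large universe lands in a valid slot of a small table, and two of the sample keys happen to collide, previewing the problem discussed next:

```python
def h(k, m):
    """Division-method hash: maps key k into a slot in {0, 1, ..., m-1}."""
    return k % m

m = 11                                  # table size, far smaller than the key universe
keys = [2048, 1025, 3]                  # sample keys (our own choice)
slots = [h(k, m) for k in keys]
assert all(0 <= s < m for s in slots)   # every key lands in a valid slot
# Keys 2048 and 1025 both hash to slot 2: a collision.
```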
There is one hitch, namely that two keys may hash to the same slot.
We call this situation a collision. Fortunately, there are effective
techniques for resolving the conflict created by collisions.
Of course, the ideal solution is to avoid collisions altogether. We
might try to achieve this goal by choosing a suitable hash function h.
One idea is to make h appear to be “random,” thus avoiding collisions
or at least minimizing their number. The very term “to hash,” evoking
images of random mixing and chopping, captures the spirit of this
approach. (Of course, a hash function h must be deterministic in that a
given input k must always produce the same output h( k).) Because | U| > m, however, there must be at least two keys that have the same hash value, and avoiding collisions altogether is impossible. Thus, although a
well-designed, “random”-looking hash function can reduce the number
of collisions, we still need a method for resolving the collisions that do
occur.
Figure 11.2 Using a hash function h to map keys to hash-table slots. Because keys k2 and k5 map to the same slot, they collide.
The remainder of this section first presents a definition of
“independent uniform hashing,” which captures the simplest notion of
what it means for a hash function to be “random.” It then presents and
analyzes the simplest collision resolution technique, called chaining.
Section 11.4 introduces an alternative method for resolving collisions, called open addressing.
Independent uniform hashing
An “ideal” hashing function h would have, for each possible input k in
the domain U, an output h( k) that is an element randomly and independently chosen uniformly from the range {0, 1, …, m − 1}. Once
a value h( k) is randomly chosen, each subsequent call to h with the same input k yields the same output h( k).
We call such an ideal hash function an independent uniform hash
function. Such a function is also often called a random oracle [43]. When hash tables are implemented with an independent uniform hash
function, we say we are using independent uniform hashing.
Independent uniform hashing is an ideal theoretical abstraction, but
it is not something that can reasonably be implemented in practice.
Nonetheless, we’ll analyze the efficiency of hashing under the
assumption of independent uniform hashing and then present ways of
achieving useful practical approximations to this ideal.
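One way to make this abstraction concrete for experiments is to draw a uniform slot the first time each key is seen and memoize it, so that repeated calls with the same key return the same slot. This sketch (class and attribute names are ours) is only a model for analysis and simulation, not a practical hash function, since it must remember a value for every key it has ever seen:

```python
import random

class RandomOracle:
    """Model of an independent uniform hash function h : U -> {0, ..., m-1}.

    The first time a key is seen, its slot is drawn uniformly at random and
    memoized; later calls with the same key return the same slot, so the
    function is deterministic on repeated inputs, as required."""
    def __init__(self, m, seed=None):
        self.m = m
        self.memo = {}                   # key -> previously drawn slot
        self.rng = random.Random(seed)

    def __call__(self, k):
        if k not in self.memo:
            self.memo[k] = self.rng.randrange(self.m)
        return self.memo[k]

h = RandomOracle(m=8, seed=1)
assert h("alice") == h("alice")          # same key, same slot
assert 0 <= h("bob") < 8                 # slots always lie in the range
```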
Figure 11.3 Collision resolution by chaining. Each nonempty hash-table slot T[ j] points to a linked list of all the keys whose hash value is j. For example, h( k1) = h( k4) and h( k5) = h( k2) = h( k7). The list can be either singly or doubly linked. We show it as doubly linked because deletion may be faster that way when the deletion procedure knows which list element (not just which key) is to be deleted.
Collision resolution by chaining
At a high level, you can think of hashing with chaining as a
nonrecursive form of divide-and-conquer: the input set of n elements is
divided randomly into m subsets, each of approximate size n/ m. A hash function determines which subset an element belongs to. Each subset is
managed independently as a list.
Figure 11.3 shows the idea behind chaining: each nonempty slot points to a linked list, and all the elements that hash to the same slot go
into that slot’s linked list. Slot j contains a pointer to the head of the list of all stored elements with hash value j. If there are no such elements,
then slot j contains NIL.
When collisions are resolved by chaining, the dictionary operations
are straightforward to implement. They appear below and
use the linked-list procedures from Section 10.2. The worst-case running time for insertion is O(1). The insertion procedure is fast in part because
it assumes that the element x being inserted is not already present in the table. To enforce this assumption, you can search (at additional cost) for
an element whose key is x.key before inserting. For searching, the worst-
case running time is proportional to the length of the list. (We’ll analyze
this operation more closely below.) Deletion takes O(1) time if the lists
are doubly linked, as in Figure 11.3. (Since CHAINED-HASH-
DELETE takes as input an element x and not its key k, no search is needed. If the hash table supports deletion, then its linked lists should
be doubly linked in order to delete an item quickly. If the lists were only
singly linked, then by Exercise 10.2-1, deletion could take time
proportional to the length of the list. With singly linked lists, both
deletion and searching would have the same asymptotic running times.)
CHAINED-HASH-INSERT( T, x)
1 LIST-PREPEND( T[ h( x.key)], x)
CHAINED-HASH-SEARCH( T, k)
1 return LIST-SEARCH( T[ h( k)], k)
CHAINED-HASH-DELETE( T, x)
1 LIST-DELETE( T[ h( x.key)], x)
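The three procedures above can be sketched in Python as follows. For simplicity this sketch (all names ours) uses Python lists as the chains instead of the doubly linked lists of Section 10.2, so `delete` must scan its chain; the book's CHAINED-HASH-DELETE avoids that scan because it is handed the list element itself:

```python
class Item:
    """An element with a key and optional satellite data."""
    def __init__(self, key, data=None):
        self.key = key
        self.data = data

class ChainedHashTable:
    """Hash table with collision resolution by chaining.

    Chains are Python lists here; with the doubly linked lists of
    Section 10.2, deletion given a pointer to the element is O(1)."""
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]       # one (initially empty) chain per slot

    def _h(self, k):
        return hash(k) % self.m                   # stand-in for the hash function h

    def insert(self, x):                          # CHAINED-HASH-INSERT: O(1)
        self.table[self._h(x.key)].insert(0, x)   # prepend; assumes x.key not yet present

    def search(self, k):                          # CHAINED-HASH-SEARCH: scans one chain
        for x in self.table[self._h(k)]:
            if x.key == k:
                return x
        return None                               # unsuccessful search

    def delete(self, x):                          # CHAINED-HASH-DELETE
        self.table[self._h(x.key)].remove(x)      # O(chain length) with a plain list
```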
Analysis of hashing with chaining
How well does hashing with chaining perform? In particular, how long
does it take to search for an element with a given key?
Given a hash table T with m slots that stores n elements, we define the load factor α for T as n/ m, that is, the average number of elements stored in a chain. Our analysis will be in terms of α, which can be less
than, equal to, or greater than 1.
The worst-case behavior of hashing with chaining is terrible: all n
keys hash to the same slot, creating a list of length n. The worst-case time for searching is thus Θ( n) plus the time to compute the hash
function—no better than using one linked list for all the elements. We
clearly don’t use hash tables for their worst-case performance.
The average-case performance of hashing depends on how well the
hash function h distributes the set of keys to be stored among the m slots, on the average (meaning with respect to the distribution of keys to
be hashed and with respect to the choice of hash function, if this choice
is randomized). Section 11.3 discusses these issues, but for now we assume that any given element is equally likely to hash into any of the m
slots. That is, the hash function is uniform. We further assume that where a given element hashes to is independent of where any other
elements hash to. In other words, we assume that we are using
independent uniform hashing.
Because hashes of distinct keys are assumed to be independent,
independent uniform hashing is universal: the chance that any two
distinct keys k 1 and k 2 collide is at most 1/ m. Universality is important in our analysis and also in the specification of universal families of hash
functions, which we’ll see in Section 11.3.2.
For j = 0, 1, …, m − 1, denote the length of the list T[ j] by nj, so that n = n0 + n1 + ⋯ + nm−1, and the expected value of nj is E[ nj] = α = n/ m.
We assume that O(1) time suffices to compute the hash value h( k), so that the time required to search for an element with key k depends
linearly on the length nh( k) of the list T[ h( k)]. Setting aside the O(1) time required to compute the hash function and to access slot h( k), we’ll consider the expected number of elements examined by the search
algorithm, that is, the number of elements in the list T[ h( k)] that the algorithm checks to see whether any have a key equal to k. We consider
two cases. In the first, the search is unsuccessful: no element in the table
has key k. In the second, the search successfully finds an element with
key k.
Theorem 11.1
In a hash table in which collisions are resolved by chaining, an
unsuccessful search takes Θ(1 + α) time on average, under the
assumption of independent uniform hashing.
Proof Under the assumption of independent uniform hashing, any key k not already stored in the table is equally likely to hash to any of the m slots. The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[ h( k)], which has expected length E[ nh( k)] = α. Thus, the expected number of elements examined in an unsuccessful search is α, and the total time required (including the
time for computing h( k)) is Θ(1 + α).
▪
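As an empirical sanity check on Theorem 11.1, the following experiment (our own construction) simulates independent uniform hashing with uniform random slot choices: it hashes n stored keys into m chains, then measures the average length of the chain probed by a search for an absent key. The result should be close to the load factor α = n/m:

```python
import random

def avg_unsuccessful_probe_length(n, m, trials=1000, seed=0):
    """Average number of elements examined by an unsuccessful search,
    with independent uniform hashing simulated by uniform slot choices."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        chain_len = [0] * m
        for _ in range(n):                        # hash the n stored keys
            chain_len[rng.randrange(m)] += 1
        total += chain_len[rng.randrange(m)]      # chain probed by the absent key
    return total / trials

alpha = 1000 / 64                                 # load factor n/m = 15.625
estimate = avg_unsuccessful_probe_length(n=1000, m=64)
assert abs(estimate - alpha) < 0.6                # empirically close to alpha
```

The estimate concentrates around α because each chain's length is Binomial(n, 1/m) with mean n/m, matching E[nh( k)] = α in the proof.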
The situation for a successful search is slightly different. An
unsuccessful search is equally likely to go to any slot of the hash table. A
successful search, however, cannot go to an empty slot, since it is for an
element that is present in one of the linked lists. We assume that the