Exercises

9.2-1

Show that RANDOMIZED-SELECT never makes a recursive call to a

0-length array.

9.2-2

Write an iterative version of RANDOMIZED-SELECT.

9.2-3

Suppose that RANDOMIZED-SELECT is used to select the minimum

element of the array A = 〈2, 3, 0, 5, 7, 9, 1, 8, 6, 4〉. Describe a sequence

of partitions that results in a worst-case performance of

RANDOMIZED-SELECT.

9.2-4

Argue that the expected running time of RANDOMIZED-SELECT

does not depend on the order of the elements in its input array A[ p : r].

That is, the expected running time is the same for any permutation of

the input array A[ p : r]. ( Hint: Argue by induction on the length n of the input array.)

9.3 Selection in worst-case linear time

We’ll now examine a remarkable and theoretically interesting selection

algorithm whose running time is Θ( n) in the worst case. Although the

RANDOMIZED-SELECT algorithm from Section 9.2 achieves linear

expected time, we saw that its running time in the worst case was

quadratic. The selection algorithm presented in this section achieves

linear time in the worst case, but it is not nearly as practical as

RANDOMIZED-SELECT. It is mostly of theoretical interest.

Like the expected linear-time RANDOMIZED-SELECT, the worst-

case linear-time algorithm SELECT finds the desired element by

recursively partitioning the input array. Unlike RANDOMIZED-

SELECT, however, SELECT guarantees a good split by choosing a

provably good pivot when partitioning the array. The cleverness in the

algorithm is that it finds the pivot recursively. Thus, there are two

invocations of SELECT: one to find a good pivot, and a second to

recursively find the desired order statistic.

The partitioning algorithm used by SELECT is like the deterministic

partitioning algorithm PARTITION from quicksort (see Section 7.1), but modified to take the element to partition around as an additional

input parameter. Like PARTITION, the PARTITION-AROUND

algorithm returns the index of the pivot. Since it’s so similar to

PARTITION, the pseudocode for PARTITION-AROUND is omitted.

The SELECT procedure takes as input a subarray A[p : r] of n = r – p + 1 elements and an integer i in the range 1 ≤ i ≤ n. It returns the ith smallest element of A. The pseudocode is actually more understandable than it might appear at first.

SELECT(A, p, r, i)
 1  while (r – p + 1) mod 5 ≠ 0
 2      for j = p + 1 to r            // put the minimum into A[p]
 3          if A[p] > A[j]
 4              exchange A[p] with A[j]
 5      // If we want the minimum of A[p : r], we're done.
 6      if i == 1
 7          return A[p]
 8      // Otherwise, we want the (i – 1)st element of A[p + 1 : r].
 9      p = p + 1
10      i = i – 1
11  g = (r – p + 1)/5                 // number of 5-element groups
12  for j = p to p + g – 1            // sort each group
13      sort 〈A[j], A[j + g], A[j + 2g], A[j + 3g], A[j + 4g]〉 in place
14  // All group medians now lie in the middle fifth of A[p : r].
15  // Find the pivot x recursively as the median of the group medians.
16  x = SELECT(A, p + 2g, p + 3g – 1, ⌈g/2⌉)
17  q = PARTITION-AROUND(A, p, r, x)  // partition around the pivot x
18  // The rest is just like lines 3–9 of RANDOMIZED-SELECT.
19  k = q – p + 1
20  if i == k
21      return A[q]                   // the pivot value is the answer
22  elseif i < k
23      return SELECT(A, p, q – 1, i)
24  else return SELECT(A, q + 1, r, i – k)

The pseudocode starts by executing the while loop in lines 1–10 to

reduce the number r – p + 1 of elements in the subarray until it is divisible by 5. The while loop executes 0 to 4 times, each time

rearranging the elements of A[ p : r] so that A[ p] contains the minimum element. If i = 1, which means that we actually want the minimum

element, then the procedure simply returns it in line 7. Otherwise,

SELECT eliminates the minimum from the subarray A[ p : r] and iterates


to find the ( i – 1)st element in A[ p + 1 : r]. Lines 9–10 do so by incrementing p and decrementing i. If the while loop completes all of its iterations without returning a result, the procedure executes the core of

the algorithm in lines 11–24, assured that the number r – p + 1 of elements in A[p : r] is evenly divisible by 5.

Figure 9.3 The relationships between elements (shown as circles) immediately after line 17 of the selection algorithm SELECT. There are g = (r – p + 1)/5 groups of 5 elements, each of which occupies a column. For example, the leftmost column contains elements A[ p], A[ p + g], A[ p +

2 g], A[ p + 3 g], A[ p + 4 g], and the next column contains A[ p + 1], A[ p + g + 1], A[ p + 2 g + 1], A[ p

+ 3 g + 1], A[ p + 4 g + 1]. The medians of the groups are red, and the pivot x is labeled. Arrows go from smaller elements to larger. The elements on the blue background are all known to be less than or equal to x and cannot fall into the high side of the partition around x. The elements on the yellow background are known to be greater than or equal to x and cannot fall into the low side of the partition around x. The pivot x belongs to both the blue and yellow regions and is shown on a green background. The elements on the white background could lie on either side of the partition.

The next part of the algorithm implements the following idea,

illustrated in Figure 9.3. Divide the elements in A[p : r] into g = (r – p + 1)/5 groups of 5 elements each. The first 5-element group is

〈A[p], A[p + g], A[p + 2g], A[p + 3g], A[p + 4g]〉, the second is

〈A[p + 1], A[p + g + 1], A[p + 2g + 1], A[p + 3g + 1], A[p + 4g + 1]〉, and so forth until the last, which is

〈A[p + g – 1], A[p + 2g – 1], A[p + 3g – 1], A[p + 4g – 1], A[r]〉.

(Note that r = p + 5g – 1.) Line 13 puts each group in order using, for example, insertion sort (Section 2.1), so that for j = p, p + 1, … , p + g – 1, we have

A[ j] ≤ A[ j + g] ≤ A[ j + 2 g] ≤ A[ j + 3 g] ≤ A[ j + 4 g].

Each vertical column in Figure 9.3 depicts a sorted group of 5 elements.

The median of each 5-element group is A[ j + 2 g], and thus all the 5-element medians, shown in red, lie in the range A[ p + 2 g : p + 3 g – 1].

Next, line 16 determines the pivot x by recursively calling SELECT

to find the median (specifically, the ⌈ g/2⌉th smallest) of the g group medians. Line 17 uses the modified PARTITION-AROUND algorithm

to partition the elements of A[ p : r] around x, returning the index q of x, so that A[ q] = x, elements in A[ p : q] are all at most x, and elements in A[ q : r] are greater than or equal to x.

The remainder of the code mirrors that of RANDOMIZED-

SELECT. If the pivot x is the ith smallest, the procedure returns it.

Otherwise, the procedure recursively calls itself on either A[ p : q – 1] or A[ q + 1 : r], depending on the value of i.

Let’s analyze the running time of SELECT and see how the judicious

choice of the pivot x plays into a guarantee on its worst-case running

time.

Theorem 9.3

The running time of SELECT on an input of n elements is Θ( n).

Proof Define T(n) as the worst-case time to run SELECT on any input subarray A[p : r] of size at most n, that is, for which r – p + 1 ≤ n. By this definition, T(n) is monotonically increasing.

We first determine an upper bound on the time spent outside the

recursive calls in lines 16, 23, and 24. The while loop in lines 1–10

executes 0 to 4 times, which is O(1) times. Since the dominant time

within the loop is the computation of the minimum in lines 2–4, which

takes Θ( n) time, lines 1–10 execute in O(1) · Θ( n) = O( n) time. The sorting of the 5-element groups in lines 12–13 takes Θ( n) time because

each 5-element group takes Θ(1) time to sort (even using an

asymptotically inefficient sorting algorithm such as insertion sort), and

there are g groups to sort, where n/5 – 1 < g ≤ n/5. Finally, the time to partition in line 17 is Θ(n), as Exercise 7.1-3 on page 187 asks you to

show. Because the remaining bookkeeping only costs Θ(1) time, the

total amount of time spent outside of the recursive calls is O( n) + Θ( n) +

Θ( n) + Θ(1) = Θ( n).

Now let’s determine the running time for the recursive calls. The

recursive call to find the pivot in line 16 takes T(g) ≤ T(n/5) time, since g ≤ n/5 and T(n) monotonically increases. Of the two recursive calls in lines 23 and 24, at most one is executed. But we'll see that no matter

which of these two recursive calls to SELECT actually executes, the

number of elements in the recursive call turns out to be at most 7 n/10,

and hence the worst-case cost for lines 23 and 24 is at most T (7 n/10).

Let’s now show that the machinations with group medians and the

choice of the pivot x as the median of the group medians guarantees this property.

Figure 9.3 helps to visualize what's going on. There are g ≤ n/5

groups of 5 elements, with each group shown as a column sorted from

bottom to top. The arrows show the ordering of elements within the

columns. The columns are ordered from left to right with groups to the

left of x’s group having a group median less than x and those to the right of x’s group having a group median greater than x. Although the

relative order within each group matters, the relative order among

groups to the left of x’s column doesn’t really matter, and neither does

the relative order among groups to the right of x’s column. The

important thing is that the groups to the left have group medians less

than x (shown by the horizontal arrows entering x), and that the groups to the right have group medians greater than x (shown by the horizontal

arrows leaving x). Thus, the yellow region contains elements that we

know are greater than or equal to x, and the blue region contains

elements that we know are less than or equal to x.


These two regions each contain at least 3g/2 elements. The number of group medians in the yellow region is ⌊g/2⌋ + 1, and for each group median, two additional elements are greater than it, making a total of 3(⌊g/2⌋ + 1) ≥ 3g/2 elements. Similarly, the number of group medians in the blue region is ⌈g/2⌉, and for each group median, two additional elements are less than it, making a total of 3⌈g/2⌉ ≥ 3g/2.

The elements in the yellow region cannot fall into the low side of the

partition around x, and those in the blue region cannot fall into the high side. The elements in neither region—those lying on a white

background—could fall into either side of the partition. But since the

low side of the partition excludes the elements in the yellow region, and

there are a total of 5 g elements, we know that the low side of the partition can contain at most 5 g – 3 g/2 = 7 g/2 ≤ 7 n/10 elements.

Likewise, the high side of the partition excludes the elements in the blue

region, and a similar calculation shows that it also contains at most

7 n/10 elements.

All of which leads to the following recurrence for the worst-case running time of SELECT:

T(n) ≤ T(n/5) + T(7n/10) + Θ(n) if n ≥ 5,    (9.1)

with T(n) = Θ(1) for n < 5. We can show that T(n) = O(n) by substitution.2 More specifically, we'll prove that T(n) ≤ cn for some suitably large constant c > 0 and all n > 0.

Substituting this inductive hypothesis into the right-hand side of

recurrence (9.1) and assuming that n ≥ 5 yields

T(n) ≤ c(n/5) + c(7n/10) + Θ(n)
≤ 9cn/10 + Θ(n)
= cn – cn/10 + Θ(n)
≤ cn

if c is chosen large enough that c/10 dominates the upper-bound constant hidden by the Θ( n). In addition to this constraint, we can pick

c large enough that T ( n) ≤ cn for all n ≤ 4, which is the base case of the recursion within SELECT. The running time of SELECT is therefore

O( n) in the worst case, and because line 13 alone takes Θ( n) time, the total time is Θ( n).
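As a quick numerical illustration (not part of the proof), we can iterate the recurrence with floors standing in for the exact subproblem sizes and the Θ(n) term taken as exactly n, and observe that T(n) stays within the bound the substitution argument predicts.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # Worst-case recurrence for SELECT: constant work at the base case,
    # otherwise the two recursive calls plus linear work, with floors
    # in place of the exact subproblem sizes.
    if n <= 4:
        return 1
    return n + T(n // 5) + T(7 * n // 10)

# Solving c = 1 + c/5 + 7c/10 gives c = 10, so with these particular
# constants the substitution argument predicts T(n) <= 10n.
assert all(T(n) <= 10 * n for n in range(1, 5000))
```

The key point, visible in both the algebra and the check, is that 1/5 + 7/10 = 9/10 < 1, so the linear work at each level shrinks geometrically.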

As in a comparison sort (see Section 8.1), SELECT and

RANDOMIZED-SELECT determine information about the relative

order of elements only by comparing elements. Recall from Chapter 8

that sorting requires Ω( n lg n) time in the comparison model, even on

average (see Problem 8-1). The linear-time sorting algorithms in

Chapter 8 make assumptions about the type of the input. In contrast, the linear-time selection algorithms in this chapter do not require any

assumptions about the input’s type, only that the elements are distinct

and can be pairwise compared according to a linear order. The

algorithms in this chapter are not subject to the Ω( n lg n) lower bound, because they manage to solve the selection problem without sorting all

the elements. Thus, solving the selection problem by sorting and

indexing, as presented in the introduction to this chapter, is

asymptotically inefficient in the comparison model.

Exercises

9.3-1

In the algorithm SELECT, the input elements are divided into groups

of 5. Show that the algorithm works in linear time if the input elements

are divided into groups of 7 instead of 5.

9.3-2

Suppose that the preprocessing in lines 1–10 of SELECT is replaced by a base case for n ≤ n0, where n0 is a suitable constant; that g is chosen as ⌊(r – p + 1)/5⌋; and that the elements in A[5g : n] belong to no group.

Show that although the recurrence for the running time becomes

messier, it still solves to Θ( n).

9.3-3

Show how to use SELECT as a subroutine to make quicksort run in

O( n lg n) time in the worst case, assuming that all elements are distinct.


Figure 9.4 Professor Olay needs to determine the position of the east-west oil pipeline that minimizes the total length of the north-south spurs.

9.3-4

Suppose that an algorithm uses only comparisons to find the i th

smallest element in a set of n elements. Show that it can also find the i – 1 smaller elements and the n – i larger elements without performing any additional comparisons.

9.3-5

Show how to determine the median of a 5-element set using only 6

comparisons.

9.3-6

You have a “black-box” worst-case linear-time median subroutine. Give

a simple, linear-time algorithm that solves the selection problem for an

arbitrary order statistic.

9.3-7

Professor Olay is consulting for an oil company, which is planning a

large pipeline running east to west through an oil field of n wells. The

company wants to connect a spur pipeline from each well directly to the

main pipeline along a shortest route (either north or south), as shown in

Figure 9.4. Given the x- and y-coordinates of the wells, how should the professor pick an optimal location of the main pipeline to minimize the

total length of the spurs? Show how to determine an optimal location in linear time.

9.3-8

The k th quantiles of an n-element set are the k – 1 order statistics that divide the sorted set into k equal-sized sets (to within 1). Give an O( n lg k)-time algorithm to list the k th quantiles of a set.

9.3-9

Describe an O(n)-time algorithm that, given a set S of n distinct numbers and a positive integer k ≤ n, determines the k numbers in S that are closest to the median of S.

9.3-10

Let X[1 : n] and Y [1 : n] be two arrays, each containing n numbers already in sorted order. Give an O(lg n)-time algorithm to find the median of all 2 n elements in arrays X and Y. Assume that all 2 n numbers are distinct.

Problems

9-1 Largest i numbers in sorted order

You are given a set of n numbers, and you wish to find the i largest in sorted order using a comparison-based algorithm. Describe the

algorithm that implements each of the following methods with the best

asymptotic worst-case running time, and analyze the running times of

the algorithms in terms of n and i.

a. Sort the numbers, and list the i largest.

b. Build a max-priority queue from the numbers, and call EXTRACT-

MAX i times.

c. Use an order-statistic algorithm to find the i th largest number,

partition around that number, and sort the i largest numbers.

9-2 Variant of randomized selection


Professor Mendel has proposed simplifying RANDOMIZED-SELECT

by eliminating the check for whether i and k are equal. The simplified

procedure is SIMPLER-RANDOMIZED-SELECT.

SIMPLER-RANDOMIZED-SELECT(A, p, r, i)
1  if p == r
2      return A[p]    // 1 ≤ i ≤ r – p + 1 means that i = 1
3  q = RANDOMIZED-PARTITION(A, p, r)
4  k = q – p + 1
5  if i ≤ k
6      return SIMPLER-RANDOMIZED-SELECT(A, p, q, i)
7  else return SIMPLER-RANDOMIZED-SELECT(A, q + 1, r, i – k)

a. Argue that in the worst case, SIMPLER-RANDOMIZED-SELECT

never terminates.

b. Prove that the expected running time of SIMPLER-

RANDOMIZED-SELECT is still O( n).

9-3 Weighted median

Consider n elements x1, x2, … , xn with positive weights w1, w2, … , wn such that w1 + w2 + ⋯ + wn = 1. The weighted (lower) median is an element xk satisfying

∑_{xi < xk} wi < 1/2

and

∑_{xi > xk} wi ≤ 1/2.

For example, consider the following elements xi and weights wi:

  i  |  1   |  2   |  3    |  4   |  5   |  6    |  7
  xi |  3   |  8   |  2    |  5   |  4   |  1    |  6
  wi | 0.12 | 0.35 | 0.025 | 0.08 | 0.15 | 0.075 | 0.2

For these elements, the median is x 5 = 4, but the weighted median is x 7

= 6. To see why the weighted median is x 7, observe that the elements

less than x 7 are x 1, x 3, x 4, x 5, and x 6, and the sum w 1 + w 3 + w 4 + w 5

+ w 6 = 0.45, which is less than 1/2. Furthermore, only element x 2 is greater than x 7, and w 2 = 0.35, which is no greater than 1/2.
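The example can be checked mechanically. The sketch below implements the sorting approach of part (b); the function name and the prefix-weight formulation are ours: the weighted lower median is the first element, in sorted order, at which the cumulative weight reaches 1/2.

```python
def weighted_median(xs, ws):
    """Weighted (lower) median by sorting, an O(n lg n) method.
    Assumes the weights are positive and sum to 1."""
    pairs = sorted(zip(xs, ws))      # sort by element value
    total = 0.0
    for x, w in pairs:
        total += w
        if total >= 0.5:             # prefix weight first reaches 1/2 here
            return x

xs = [3, 8, 2, 5, 4, 1, 6]
ws = [0.12, 0.35, 0.025, 0.08, 0.15, 0.075, 0.2]
```

Running it on the table's data returns 6, matching x7 above, and with the uniform weights wi = 1/7 it returns the ordinary lower median 4, as part (a) predicts.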

a. Argue that the median of x 1, x 2, … , xn is the weighted median of the xi with weights wi = 1/ n for i = 1, 2, … , n.

b. Show how to compute the weighted median of n elements in O( n lg n) worst-case time using sorting.

c. Show how to compute the weighted median in Θ( n) worst-case time

using a linear-time median algorithm such as SELECT from Section

9.3.

The post-office location problem is defined as follows. The input is n points p1, p2, … , pn with associated weights w1, w2, … , wn. A solution is a point p (not necessarily one of the input points) that minimizes the sum ∑_{i=1}^{n} wi d(p, pi), where d(a, b) is the distance between points a and b.

d. Argue that the weighted median is a best solution for the one-

dimensional post-office location problem, in which points are simply

real numbers and the distance between points a and b is d( a, b) = | a

b|.

e. Find the best solution for the two-dimensional post-office location

problem, in which the points are ( x, y) coordinate pairs and the

distance between points a = ( x 1, y 1) and b = ( x 2, y 2) is the Manhattan distance given by d( a, b) = | x 1 – x 2| + | y 1 – y 2|.

9-4 Small order statistics


Let’s denote by S( n) the worst-case number of comparisons used by SELECT to select the i th order statistic from n numbers. Although S( n)

= Θ( n), the constant hidden by the Θ-notation is rather large. When i is small relative to n, there is an algorithm that uses SELECT as a

subroutine but makes fewer comparisons in the worst case.

a. Describe an algorithm that uses Ui(n) comparisons to find the ith smallest of n elements, where

Ui(n) = S(n) if i ≥ n/2, and Ui(n) = ⌊n/2⌋ + Ui(⌈n/2⌉) + S(2i) if i < n/2.

(Hint: Begin with ⌊n/2⌋ disjoint pairwise comparisons, and recurse on the set containing the smaller element from each pair.)

b. Show that, if i < n/2, then Ui( n) = n + O( S(2 i) lg( n/ i)).

c. Show that if i is a constant less than n/2, then Ui( n) = n + O(lg n).

d. Show that if i = n/ k for k ≥ 2, then Ui( n) = n + O( S(2 n/ k) lg k).

9-5 Alternative analysis of randomized selection

In this problem, you will use indicator random variables to analyze the

procedure RANDOMIZED-SELECT in a manner akin to our analysis

of RANDOMIZED-QUICKSORT in Section 7.4.2.

As in the quicksort analysis, we assume that all elements are distinct,

and we rename the elements of the input array A as z 1, z 2, … , zn, where zi is the i th smallest element. Thus the call RANDOMIZED-SELECT( A, 1, n, i) returns zi.

For 1 ≤ j < k ≤ n, let

Xijk = I { zj is compared with zk sometime during the execution of the algorithm to find zi}.

a. Give an exact expression for E [ Xijk]. ( Hint: Your expression may have different values, depending on the values of i, j, and k.)


b. Let Xi denote the total number of comparisons between elements of

array A when finding zi. Show that

c. Show that E [ Xi] ≤ 4 n.

d. Conclude that, assuming all elements of array A are distinct,

RANDOMIZED-SELECT runs in O( n) expected time.

9-6 Select with groups of 3

Exercise 9.3-1 asks you to show that the SELECT algorithm still runs in

linear time if the elements are divided into groups of 7. This problem

asks about dividing into groups of 3.

a. Show that SELECT runs in linear time if you divide the elements into

groups whose size is any odd constant greater than 3.

b. Show that SELECT runs in O( n lg n) time if you divide the elements into groups of size 3.

Because the bound in part (b) is just an upper bound, we do not

know whether the groups-of-3 strategy actually runs in O( n) time. But

by repeating the groups-of-3 idea on the middle group of medians, we

can pick a pivot that guarantees O( n) time. The SELECT3 algorithm on

the next page determines the i th smallest of an input array of n > 1

distinct elements.

c. Describe in English how the SELECT3 algorithm works. Include in

your description one or more suitable diagrams.

d. Show that SELECT3 runs in O( n) time in the worst case.

Chapter notes

The worst-case linear-time median-finding algorithm was devised by

Blum, Floyd, Pratt, Rivest, and Tarjan [62]. The fast randomized

version is due to Hoare [218]. Floyd and Rivest [147] have developed an improved randomized version that partitions around an element

recursively selected from a small sample of the elements.

SELECT3(A, p, r, i)
 1  while (r – p + 1) mod 9 ≠ 0
 2      for j = p + 1 to r            // put the minimum into A[p]
 3          if A[p] > A[j]
 4              exchange A[p] with A[j]
 5      // If we want the minimum of A[p : r], we're done.
 6      if i == 1
 7          return A[p]
 8      // Otherwise, we want the (i – 1)st element of A[p + 1 : r].
 9      p = p + 1
10      i = i – 1
11  g = (r – p + 1)/3                 // number of 3-element groups
12  for j = p to p + g – 1            // run through the groups
13      sort 〈A[j], A[j + g], A[j + 2g]〉 in place
14  // All group medians now lie in the middle third of A[p : r].
15  g′ = g/3                          // number of 3-element subgroups
16  for j = p + g to p + g + g′ – 1   // sort the subgroups
17      sort 〈A[j], A[j + g′], A[j + 2g′]〉 in place
18  // All subgroup medians now lie in the middle ninth of A[p : r].
19  // Find the pivot x recursively as the median of the subgroup medians.
20  x = SELECT3(A, p + 4g′, p + 5g′ – 1, ⌈g′/2⌉)
21  q = PARTITION-AROUND(A, p, r, x)  // partition around the pivot x
22  // The rest is just like lines 19–24 of SELECT.
23  k = q – p + 1
24  if i == k
25      return A[q]                   // the pivot value is the answer
26  elseif i < k
27      return SELECT3(A, p, q – 1, i)
28  else return SELECT3(A, q + 1, r, i – k)

It is still unknown exactly how many comparisons are needed to

determine the median. Bent and John [48] gave a lower bound of 2 n comparisons for median finding, and Schönhage, Paterson, and

Pippenger [397] gave an upper bound of 3 n. Dor and Zwick have improved on both of these bounds. Their upper bound [123] is slightly less than 2.95 n, and their lower bound [124] is (2 + ϵ) n, for a small positive constant ϵ, thereby improving slightly on related work by Dor

et al. [122]. Paterson [354] describes some of these results along with other related work.

Problem 9-6 was inspired by a paper by Chen and Dumitrescu [84].

1 As in the footnote on page 182, you can enforce the assumption that the numbers are distinct by converting each input value A[ i] to an ordered pair ( A[ i], i) with ( A[ i], i) < ( A[ j], j) if either A[ i] < A[ j] or A[ i] = A[ j] and i < j.

2 We could also use the Akra-Bazzi method from Section 4.7, which involves calculus, to solve this recurrence. Indeed, a similar recurrence (4.24) on page 117 was used to illustrate that method.

Part III Data Structures

Introduction

Sets are as fundamental to computer science as they are to mathematics.

Whereas mathematical sets are unchanging, the sets manipulated by

algorithms can grow, shrink, or otherwise change over time. We call

such sets dynamic. The next four chapters present some basic techniques

for representing finite dynamic sets and manipulating them on a

computer.

Algorithms may require several types of operations to be performed

on sets. For example, many algorithms need only the ability to insert

elements into, delete elements from, and test membership in a set. We

call a dynamic set that supports these operations a dictionary. Other algorithms require more complicated operations. For example, min-priority queues, which Chapter 6 introduced in the context of the heap data structure, support the operations of inserting an element into and

extracting the smallest element from a set. The best way to implement a

dynamic set depends upon the operations that you need to support.

Elements of a dynamic set

In a typical implementation of a dynamic set, each element is

represented by an object whose attributes can be examined and

manipulated given a pointer to the object. Some kinds of dynamic sets

assume that one of the object’s attributes is an identifying key. If the keys are all different, we can think of the dynamic set as being a set of

key values. The object may contain satellite data, which are carried

around in other object attributes but are otherwise unused by the set

implementation. It may also have attributes that are manipulated by the set operations. These attributes may contain data or pointers to other

objects in the set.

Some dynamic sets presuppose that the keys are drawn from a totally

ordered set, such as the real numbers, or the set of all words under the

usual alphabetic ordering. A total ordering allows us to define the

minimum element of the set, for example, or to speak of the next

element larger than a given element in a set.
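The description above can be made concrete with a small sketch (the attribute names are illustrative, not prescribed by the text):

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Element:
    """One element of a dynamic set: an identifying key, satellite data
    that the set implementation carries around but never examines, and
    an attribute (here a single pointer) manipulated by the set
    operations themselves."""
    key: int
    satellite: Any = None               # payload, unused by the set code
    next: Optional["Element"] = None    # pointer used by, e.g., a linked list
```

Given a pointer to such an object, an implementation examines `key` to order or locate elements, follows `next` to traverse the structure, and simply carries `satellite` along.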

Operations on dynamic sets

Operations on a dynamic set can be grouped into two categories:

queries, which simply return information about the set, and modifying

operations, which change the set. Here is a list of typical operations. Any

specific application will usually require only a few of these to be

implemented.

SEARCH( S, k)

A query that, given a set S and a key value k, returns a pointer x to an element in S such that x.key = k, or NIL if no such element belongs to S.

INSERT( S, x)

A modifying operation that adds the element pointed to by x to the

set S. We usually assume that any attributes in element x needed by

the set implementation have already been initialized.

DELETE( S, x)

A modifying operation that, given a pointer x to an element in the

set S, removes x from S. (Note that this operation takes a pointer to an element x, not a key value.)

MINIMUM( S) and MAXIMUM( S)

Queries on a totally ordered set S that return a pointer to the

element of S with the smallest (for MINIMUM) or largest (for

MAXIMUM) key.

SUCCESSOR( S, x)

A query that, given an element x whose key is from a totally ordered set S, returns a pointer to the next larger element in S, or NIL if x is the maximum element.

PREDECESSOR( S, x)

A query that, given an element x whose key is from a totally ordered

set S, returns a pointer to the next smaller element in S, or NIL if x is the minimum element.

In some situations, we can extend the queries SUCCESSOR and

PREDECESSOR so that they apply to sets with nondistinct keys. For a

set on n keys, the normal presumption is that a call to MINIMUM

followed by n – 1 calls to SUCCESSOR enumerates the elements in the

set in sorted order.

We usually measure the time taken to execute a set operation in

terms of the size of the set. For example, Chapter 13 describes a data structure that can support any of the operations listed above on a set of

size n in O(lg n) time.

Of course, you can always choose to implement a dynamic set with

an array. The advantage of doing so is that the algorithms for the

dynamic-set operations are simple. The downside, however, is that many

of these operations have a worst-case running time of Θ( n). If the array

is not sorted, INSERT and DELETE can take Θ(1) time, but the

remaining operations take Θ( n) time. If instead the array is maintained

in sorted order, then MINIMUM, MAXIMUM, SUCCESSOR, and

PREDECESSOR take Θ(1) time; SEARCH takes O(lg n) time if

implemented with binary search; but INSERT and DELETE take Θ( n)

time in the worst case. The data structures studied in this part improve

on the array implementation for many of the dynamic-set operations.
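The sorted-array trade-offs just described can be sketched as follows (a minimal illustration, not a data structure from this book; the class and method names are ours):

```python
import bisect

class SortedArraySet:
    """Dynamic set backed by a sorted array. SEARCH is O(lg n) by
    binary search; MINIMUM and MAXIMUM are O(1); INSERT and DELETE
    are Theta(n) in the worst case because elements must shift."""

    def __init__(self):
        self.keys = []

    def search(self, k):              # O(lg n): binary search
        i = bisect.bisect_left(self.keys, k)
        return i if i < len(self.keys) and self.keys[i] == k else None

    def insert(self, k):              # Theta(n): shifts elements right
        bisect.insort(self.keys, k)

    def delete(self, k):              # Theta(n): shifts elements left
        i = self.search(k)
        if i is not None:
            self.keys.pop(i)

    def minimum(self):                # O(1): first slot
        return self.keys[0] if self.keys else None

    def maximum(self):                # O(1): last slot
        return self.keys[-1] if self.keys else None

    def successor(self, k):           # O(lg n) to locate k by key;
        i = bisect.bisect_right(self.keys, k)   # O(1) given its position
        return self.keys[i] if i < len(self.keys) else None
```

The structures in Chapters 10–13 exist precisely to avoid the linear-time shifting that `insert` and `delete` incur here.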

Overview of Part III

Chapters 10–13 describe several data structures that we can use to implement dynamic sets. We’ll use many of these data structures later to

construct efficient algorithms for a variety of problems. We already saw

another important data structure—the heap—in Chapter 6.

Chapter 10 presents the essentials of working with simple data structures such as arrays, matrices, stacks, queues, linked lists, and

rooted trees. If you have taken an introductory programming course,

then much of this material should be familiar to you.

Chapter 11 introduces hash tables, a widely used data structure supporting the dictionary operations INSERT, DELETE, and

SEARCH. In the worst case, hash tables require Θ( n) time to perform a

SEARCH operation, but the expected time for hash-table operations is

O(1). We rely on probability to analyze hash-table operations, but you

can understand how the operations work even without probability.

Binary search trees, which are covered in Chapter 12, support all the dynamic-set operations listed above. In the worst case, each operation

takes Θ( n) time on a tree with n elements. Binary search trees serve as the basis for many other data structures.

Chapter 13 introduces red-black trees, which are a variant of binary search trees. Unlike ordinary binary search trees, red-black trees are

guaranteed to perform well: operations take O(lg n) time in the worst case. A red-black tree is a balanced search tree. Chapter 18 in Part V

presents another kind of balanced search tree, called a B-tree. Although

the mechanics of red-black trees are somewhat intricate, you can glean

most of their properties from the chapter without studying the

mechanics in detail. Nevertheless, you probably will find walking

through the code to be instructive.

10 Elementary Data Structures

In this chapter, we examine the representation of dynamic sets by simple

data structures that use pointers. Although you can construct many

complex data structures using pointers, we present only the rudimentary

ones: arrays, matrices, stacks, queues, linked lists, and rooted trees.

10.1 Simple array-based data structures: arrays, matrices,

stacks, queues

10.1.1 Arrays

We assume that, as in most programming languages, an array is stored

as a contiguous sequence of bytes in memory. If the first element of an

array has index s (for example, in an array with 1-origin indexing, s = 1), the array starts at memory address a, and each array element occupies b bytes, then the ith element occupies bytes a + b(i – s) through a + b(i – s + 1) – 1. Since most of the arrays in this book are indexed starting at 1, and a few starting at 0, we can simplify these formulas a little. When s = 1, the ith element occupies bytes a + b(i – 1) through a + bi – 1, and when s = 0, the ith element occupies bytes a + bi through a + b(i + 1) –

1. Assuming that the computer can access all memory locations in the

same amount of time (as in the RAM model described in Section 2.2), it takes constant time to access any array element, regardless of the index.
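The addressing formula can be checked with a few lines of code (the function name and the sample addresses are ours):

```python
def element_address(a, b, s, i):
    """Byte address of the i-th element of an array that starts at
    address a, uses b bytes per element, and is indexed from s."""
    return a + b * (i - s)

# With 1-origin indexing (s = 1), the i-th element starts at a + b(i - 1):
assert element_address(1000, 4, 1, 1) == 1000   # first element at the base
assert element_address(1000, 4, 1, 3) == 1008   # third element: skip 2 slots
# With 0-origin indexing (s = 0), element i starts at a + b*i:
assert element_address(1000, 4, 0, 3) == 1012
```

Since the formula is a single multiply and add, the constant-time access claim follows directly on a machine with uniform memory access.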

Most programming languages require each element of a particular

array to be the same size. If the elements of a given array might occupy

different numbers of bytes, then the above formulas fail to apply, since


the element size b is not a constant. In such cases, the array elements are

usually objects of varying sizes, and what actually appears in each array

element is a pointer to the object. The number of bytes occupied by a

pointer is typically the same, no matter what the pointer references, so

that to access an object in an array, the above formulas give the address

of the pointer to the object and then the pointer must be followed to

access the object itself.

Figure 10.1 Four ways to store the 2 × 3 matrix M from equation (10.1). (a) In row-major order, in a single array. (b) In column-major order, in a single array. (c) In row-major order, with one array per row (tan) and a single array (blue) of pointers to the row arrays. (d) In column-major order, with one array per column (tan) and a single array (blue) of pointers to the column arrays.

10.1.2 Matrices

We typically represent a matrix or two-dimensional array by one or

more one-dimensional arrays. The two most common ways to store a

matrix are row-major and column-major order. Let’s consider an m × n

matrix—a matrix with m rows and n columns. In row-major order, the matrix is stored row by row, and in column-major order, the matrix is stored column by column. For example, consider the 2 × 3 matrix
M = [ 1  2  3 ]
    [ 4  5  6 ] .        (10.1)

Row-major order stores the two rows 1 2 3 and 4 5 6, whereas column-

major order stores the three columns 1 4; 2 5; and 3 6.

Parts (a) and (b) of Figure 10.1 show how to store this matrix using a single one-dimensional array. It’s stored in row-major order in part (a)

and in column-major order in part (b). If the rows, columns, and the

single array all are indexed starting at s, then M[i, j]—the element in row i and column j—is at array index s + n(i − s) + (j − s) with row-major order and s + m(j − s) + (i − s) with column-major order. When s = 1, the single-array indices are n(i − 1) + j with row-major order and i + m(j − 1) with column-major order. When s = 0, the single-array indices are simpler: ni + j with row-major order and i + mj with column-major order. For the example matrix M with 1-origin indexing, element

M [2, 1] is stored at index 3(2 – 1) + 1 = 4 in the single array using row-

major order and at index 2 + 2(1 – 1) = 2 using column-major order.
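The two index formulas can be written directly as Python functions (the function names are ours). The calls below reproduce the M[2, 1] example from the text:

```python
def row_major_index(i, j, m, n, s=1):
    """Index of M[i, j] in a single array storing an m x n matrix
    row by row, with rows, columns, and the array indexed from s."""
    return s + n * (i - s) + (j - s)

def column_major_index(i, j, m, n, s=1):
    """Index of M[i, j] when the matrix is stored column by column."""
    return s + m * (j - s) + (i - s)

# M[2, 1] in a 2 x 3 matrix with 1-origin indexing:
print(row_major_index(2, 1, 2, 3))     # 4
print(column_major_index(2, 1, 2, 3))  # 2
```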

Parts (c) and (d) of Figure 10.1 show multiple-array strategies for storing the example matrix. In part (c), each row is stored in its own

array of length n, shown in tan. Another array, with m elements, shown

in blue, points to the m row arrays. If we call the blue array A, then A[ i]

points to the array storing the entries for row i of M, and array element A[ i] [ j] stores matrix element M [ i, j]. Part (d) shows the column-major version of the multiple-array representation, with n arrays, each of

length m, representing the n columns. Matrix element M [ i, j] is stored in array element A[ j] [ i].

Single-array representations are typically more efficient on modern

machines than multiple-array representations. But multiple-array

representations can sometimes be more flexible, for example, allowing

for “ragged arrays,” in which the rows in the row-major version may

have different lengths, or symmetrically for the column-major version,

where columns may have different lengths.

Occasionally, other schemes are used to store matrices. In the block

representation, the matrix is divided into blocks, and each block is

stored contiguously. For example, a 4 × 4 matrix that is divided into 2 ×

2 blocks, such as
[  1   2 |  3   4 ]
[  5   6 |  7   8 ]
[  9  10 | 11  12 ]
[ 13  14 | 15  16 ]

might be stored in a single array in the order 〈1, 2, 5, 6, 3, 4, 7, 8, 9, 10,

13, 14, 11, 12, 15, 16〉.
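A sketch of the block representation (our own helper, assuming block dimensions that evenly divide the matrix): each p × q block is emitted contiguously in row-major order, and the blocks themselves are visited in row-major order.

```python
def block_order(matrix, p, q):
    """Flatten a matrix into block representation: cut it into p x q
    blocks, store each block contiguously in row-major order, and
    visit the blocks themselves in row-major order."""
    m, n = len(matrix), len(matrix[0])
    out = []
    for bi in range(0, m, p):          # top row of each block
        for bj in range(0, n, q):      # left column of each block
            for i in range(bi, bi + p):
                for j in range(bj, bj + q):
                    out.append(matrix[i][j])
    return out

M = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
print(block_order(M, 2, 2))
# [1, 2, 5, 6, 3, 4, 7, 8, 9, 10, 13, 14, 11, 12, 15, 16]
```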

10.1.3 Stacks and queues


Stacks and queues are dynamic sets in which the element removed from

the set by the DELETE operation is prespecified. In a stack, the

element deleted from the set is the one most recently inserted: the stack

implements a last-in, first-out, or LIFO, policy. Similarly, in a queue, the element deleted is always the one that has been in the set for the longest

time: the queue implements a first-in, first-out, or FIFO, policy. There are several efficient ways to implement stacks and queues on a

computer. Here, you will see how to use an array with attributes to store

them.

Stacks

The INSERT operation on a stack is often called PUSH, and the

DELETE operation, which does not take an element argument, is often

called POP. These names are allusions to physical stacks, such as the

spring-loaded stacks of plates used in cafeterias. The order in which

plates are popped from the stack is the reverse of the order in which

they were pushed onto the stack, since only the top plate is accessible.

Figure 10.2 shows how to implement a stack of at most n elements with an array S[1 : n]. The stack has attributes S.top, indexing the most recently inserted element, and S.size, equaling the size n of the array.

The stack consists of elements S[1 : S.top], where S[1] is the element at the bottom of the stack and S[ S.top] is the element at the top.

Figure 10.2 An array implementation of a stack S. Stack elements appear only in the tan positions. (a) Stack S has 4 elements. The top element is 9. (b) Stack S after the calls PUSH( S, 17) and PUSH( S, 3). (c) Stack S after the call POP( S) has returned the element 3, which is the one most recently pushed. Although element 3 still appears in the array, it is no longer in the stack. The top is element 17.

When S.top = 0, the stack contains no elements and is empty. We can

test whether the stack is empty with the query operation STACK-

EMPTY. Upon an attempt to pop an empty stack, the stack underflows, which is normally an error. If S.top exceeds S.size, the stack overflows.

The procedures STACK-EMPTY, PUSH, and POP implement each

of the stack operations with just a few lines of code. Figure 10.2 shows the effects of the modifying operations PUSH and POP. Each of the

three stack operations takes O(1) time.

STACK-EMPTY(S)
1  if S.top == 0
2      return TRUE
3  else return FALSE

PUSH(S, x)
1  if S.top == S.size
2      error "overflow"
3  else S.top = S.top + 1
4      S[S.top] = x

POP(S)
1  if STACK-EMPTY(S)
2      error "underflow"
3  else S.top = S.top − 1
4      return S[S.top + 1]
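As an informal translation, here is the same fixed-capacity stack in Python. It uses a 0-origin list in place of the 1-origin array S[1 : n], so an empty stack has top = −1 rather than top = 0; the class name is ours.

```python
class Stack:
    """Fixed-capacity array stack mirroring STACK-EMPTY, PUSH, and
    POP, with 0-origin indexing (top == -1 means the stack is empty)."""
    def __init__(self, size):
        self.S = [None] * size
        self.size = size
        self.top = -1
    def stack_empty(self):
        return self.top == -1
    def push(self, x):
        if self.top + 1 == self.size:
            raise OverflowError("overflow")
        self.top += 1
        self.S[self.top] = x
    def pop(self):
        if self.stack_empty():
            raise IndexError("underflow")
        self.top -= 1
        return self.S[self.top + 1]   # the element that was on top

s = Stack(6)
for x in [15, 6, 2, 9, 17, 3]:
    s.push(x)
print(s.pop(), s.pop())  # 3 17
```

As in Figure 10.2, a popped element still sits in the array; only the top index moves.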


Figure 10.3 A queue implemented using an array Q[1 : 12]. Queue elements appear only in the tan positions. (a) The queue has 5 elements, in locations Q[7 : 11]. (b) The configuration of the queue after the calls ENQUEUE( Q, 17), ENQUEUE( Q, 3), and ENQUEUE( Q, 5). (c) The configuration of the queue after the call DEQUEUE( Q) returns the key value 15 formerly at the head of the queue. The new head has key 6.

Queues

We call the INSERT operation on a queue ENQUEUE, and we call the

DELETE operation DEQUEUE. Like the stack operation POP,

DEQUEUE takes no element argument. The FIFO property of a queue

causes it to operate like a line of customers waiting for service. The

queue has a head and a tail. When an element is enqueued, it takes its

place at the tail of the queue, just as a newly arriving customer takes a

place at the end of the line. The element dequeued is always the one at

the head of the queue, like the customer at the head of the line, who has

waited the longest.

Figure 10.3 shows one way to implement a queue of at most n – 1

elements using an array Q[1 : n], with the attribute Q.size equaling the size n of the array. The queue has an attribute Q.head that indexes, or points to, its head. The attribute Q.tail indexes the next location at which a newly arriving element will be inserted into the queue. The

elements in the queue reside in locations Q.head, Q.head + 1, … , Q.tail

– 1, where we “wrap around” in the sense that location 1 immediately

follows location n in a circular order. When Q.head = Q.tail, the queue is empty. Initially, we have Q.head = Q.tail = 1. An attempt to dequeue an element from an empty queue causes the queue to underflow. When

Q.head = Q.tail + 1 or both Q.head = 1 and Q.tail = Q.size, the queue is full, and an attempt to enqueue an element causes the queue to

overflow.

In the procedures ENQUEUE and DEQUEUE, we have omitted the

error checking for underflow and overflow. (Exercise 10.1-5 asks you to

supply these checks.) Figure 10.3 shows the effects of the ENQUEUE

and DEQUEUE operations. Each operation takes O(1) time.

ENQUEUE(Q, x)
1  Q[Q.tail] = x
2  if Q.tail == Q.size
3      Q.tail = 1
4  else Q.tail = Q.tail + 1

DEQUEUE(Q)
1  x = Q[Q.head]
2  if Q.head == Q.size
3      Q.head = 1
4  else Q.head = Q.head + 1
5  return x
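A Python sketch of the same circular queue (class name ours, 0-origin indexing). Like the pseudocode, it omits the underflow and overflow checks that Exercise 10.1-5 asks for, and the modulo operator replaces the explicit wraparound test:

```python
class Queue:
    """Circular array queue mirroring ENQUEUE and DEQUEUE, with
    0-origin indexing; head == tail means the queue is empty, so at
    most size - 1 elements fit (no overflow check, as in the text)."""
    def __init__(self, size):
        self.Q = [None] * size
        self.size = size
        self.head = 0
        self.tail = 0
    def enqueue(self, x):
        self.Q[self.tail] = x
        self.tail = (self.tail + 1) % self.size   # wrap around
    def dequeue(self):
        x = self.Q[self.head]
        self.head = (self.head + 1) % self.size   # wrap around
        return x

q = Queue(12)
for x in [15, 6, 9, 8, 4]:
    q.enqueue(x)
print(q.dequeue())  # 15
```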

Exercises

10.1-1

Consider an m × n matrix in row-major order, where both m and n are powers of 2 and rows and columns are indexed from 0. We can represent

a row index i in binary by the lg m bits 〈 i lg m – 1, i lg m – 2, … , i 0〉 and a column index j in binary by the lg n bits 〈 j lg n – 1, j lg n – 2, … , j 0〉.

Suppose that this matrix is a 2 × 2 block matrix, where each block has

m/2 rows and n/2 columns, and it is to be represented by a single array with 0-origin indexing. Show how to construct the binary representation

of the (lg m + lg n)-bit index into the single array from the binary representations of i and j.

10.1-2

Using Figure 10.2 as a model, illustrate the result of each operation in the sequence PUSH( S, 4), PUSH( S, 1), PUSH( S, 3), POP( S), PUSH( S, 8), and POP( S) on an initially empty stack S stored in array S[1 : 6]

10.1-3

Explain how to implement two stacks in one array A[1 : n] in such a way that neither stack overflows unless the total number of elements in both

stacks together is n. The PUSH and POP operations should run in O(1)

time.

10.1-4

Using Figure 10.3 as a model, illustrate the result of each operation in the sequence ENQUEUE( Q, 4), ENQUEUE( Q, 1), ENQUEUE( Q, 3),

DEQUEUE( Q), ENQUEUE( Q, 8), and DEQUEUE( Q) on an initially

empty queue Q stored in array Q[1 : 6].

10.1-5

Rewrite ENQUEUE and DEQUEUE to detect underflow and overflow

of a queue.

10.1-6

Whereas a stack allows insertion and deletion of elements at only one

end, and a queue allows insertion at one end and deletion at the other

end, a deque (double-ended queue, pronounced like “deck”) allows

insertion and deletion at both ends. Write four O(1)-time procedures to

insert elements into and delete elements from both ends of a deque

implemented by an array.

10.1-7

Show how to implement a queue using two stacks. Analyze the running

time of the queue operations.

10.1-8

Show how to implement a stack using two queues. Analyze the running

time of the stack operations.

10.2 Linked lists

A linked list is a data structure in which the objects are arranged in a

linear order. Unlike an array, however, in which the linear order is

determined by the array indices, the order in a linked list is determined

by a pointer in each object. Since the elements of linked lists often

contain keys that can be searched for, linked lists are sometimes called

search lists. Linked lists provide a simple, flexible representation for dynamic sets, supporting (though not necessarily efficiently) all the

operations listed on page 250.

As shown in Figure 10.4, each element of a doubly linked list L is an object with an attribute key and two pointer attributes: next and prev.

The object may also contain other satellite data. Given an element x in

the list, x.next points to its successor in the linked list, and x.prev points to its predecessor. If x.prev = NIL, the element x has no predecessor and is therefore the first element, or head, of the list. If x.next = NIL, the element x has no successor and is therefore the last element, or tail, of the list. An attribute L.head points to the first element of the list. If

L.head = NIL, the list is empty.


Figure 10.4 (a) A doubly linked list L representing the dynamic set {1, 4, 9, 16}. Each element in the list is an object with attributes for the key and pointers (shown by arrows) to the next and previous objects. The next attribute of the tail and the prev attribute of the head are NIL, indicated by a diagonal slash. The attribute L.head points to the head. (b) Following the execution of LIST-PREPEND( L, x), where x.key = 25, the linked list has an object with key 25

as the new head. This new object points to the old head with key 9. (c) The result of calling LIST-INSERT( x, y), where x.key = 36 and y points to the object with key 9. (d) The result of the subsequent call LIST-DELETE( L, x), where x points to the object with key 4.

A list may have one of several forms. It may be either singly linked or

doubly linked, it may be sorted or not, and it may be circular or not. If

a list is singly linked, each element has a next pointer but not a prev pointer. If a list is sorted, the linear order of the list corresponds to the

linear order of keys stored in elements of the list. The minimum element

is then the head of the list, and the maximum element is the tail. If the

list is unsorted, the elements can appear in any order. In a circular list, the prev pointer of the head of the list points to the tail, and the next

pointer of the tail of the list points to the head. You can think of a

circular list as a ring of elements. In the remainder of this section, we

assume that the lists we are working with are unsorted and doubly

linked.

Searching a linked list

The procedure LIST-SEARCH( L, k) finds the first element with key k

in list L by a simple linear search, returning a pointer to this element. If

no object with key k appears in the list, then the procedure returns NIL.

For the linked list in Figure 10.4(a), the call LIST-SEARCH( L, 4)

returns a pointer to the third element, and the call LIST-SEARCH( L, 7) returns NIL. To search a list of n objects, the LIST-SEARCH

procedure takes Θ( n) time in the worst case, since it may have to search

the entire list.

LIST-SEARCH(L, k)
1  x = L.head
2  while x ≠ NIL and x.key ≠ k
3      x = x.next
4  return x

Inserting into a linked list

Given an element x whose key attribute has already been set, the LIST-

PREPEND procedure adds x to the front of the linked list, as shown in

Figure 10.4(b). (Recall that our attribute notation can cascade, so that L.head.prev denotes the prev attribute of the object that L.head points to.) The running time for LIST-PREPEND on a list of n elements is

O(1).

LIST-PREPEND(L, x)
1  x.next = L.head
2  x.prev = NIL
3  if L.head ≠ NIL
4      L.head.prev = x
5  L.head = x

You can insert anywhere within a linked list. As Figure 10.4(c)

shows, if you have a pointer y to an object in the list, the LIST-INSERT

procedure on the facing page “splices” a new element x into the list, immediately following y, in O(1) time. Since LIST-INSERT never

references the list object L, it is not supplied as a parameter.

LIST-INSERT(x, y)
1  x.next = y.next
2  x.prev = y
3  if y.next ≠ NIL
4      y.next.prev = x
5  y.next = x

Deleting from a linked list

The procedure LIST-DELETE removes an element x from a linked list

L. It must be given a pointer to x, and it then "splices" x out of the list by updating pointers. To delete an element with a given key, first call

LIST-SEARCH to retrieve a pointer to the element. Figure 10.4(d)

shows how an element is deleted from a linked list. LIST-DELETE runs

in O(1) time, but to delete an element with a given key, the call to LIST-

SEARCH makes the worst-case running time be Θ( n).

LIST-DELETE(L, x)
1  if x.prev ≠ NIL
2      x.prev.next = x.next
3  else L.head = x.next
4  if x.next ≠ NIL
5      x.next.prev = x.prev
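The four list procedures translate almost line for line into Python, with None playing the role of NIL (the class names are ours):

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.prev = None   # None plays the role of NIL
        self.next = None

class LinkedList:
    """Unsorted doubly linked list following LIST-SEARCH,
    LIST-PREPEND, LIST-INSERT, and LIST-DELETE."""
    def __init__(self):
        self.head = None
    def search(self, k):
        x = self.head
        while x is not None and x.key != k:
            x = x.next
        return x
    def prepend(self, x):
        x.next = self.head
        x.prev = None
        if self.head is not None:
            self.head.prev = x
        self.head = x
    @staticmethod
    def insert(x, y):
        """Splice x in immediately after y; the list object is unused,
        just as LIST-INSERT never references L."""
        x.next = y.next
        x.prev = y
        if y.next is not None:
            y.next.prev = x
        y.next = x
    def delete(self, x):
        if x.prev is not None:
            x.prev.next = x.next
        else:
            self.head = x.next
        if x.next is not None:
            x.next.prev = x.prev

L = LinkedList()
for k in [16, 4, 1, 9]:          # prepending yields 9, 1, 4, 16
    L.prepend(Node(k))
L.delete(L.search(4))            # now 9, 1, 16
print([L.head.key, L.head.next.key, L.head.next.next.key])  # [9, 1, 16]
```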

Insertion and deletion are faster operations on doubly linked lists

than on arrays. If you want to insert a new first element into an array or

delete the first element in an array, maintaining the relative order of all

the existing elements, then each of the existing elements needs to be

moved by one position. In the worst case, therefore, insertion and

deletion take Θ( n) time in an array, compared with O(1) time for a doubly linked list. (Exercise 10.2-1 asks you to show that deleting an

element from a singly linked list takes Θ( n) time in the worst case.) If,

however, you want to find the k th element in the linear order, it takes

just O(1) time in an array regardless of k, but in a linked list, you’d have to traverse k elements, taking Θ( k) time.


Sentinels

The code for LIST-DELETE is simpler if you ignore the boundary

conditions at the head and tail of the list:

Figure 10.5 A circular, doubly linked list with a sentinel. The sentinel L.nil, in blue, appears between the head and tail. The attribute L.head is no longer needed, since the head of the list is L.nil.next. (a) An empty list. (b) The linked list from Figure 10.4(a), with key 9 at the head and key 1 at the tail. (c) The list after executing LIST-INSERT′ ( x, L.nil), where x.key = 25. The new object becomes the head of the list. (d) The list after deleting the object with key 1. The new tail is the object with key 4. (e) The list after executing LIST-INSERT′ ( x, y), where x.key = 36 and y points to the object with key 9.

LIST-DELETE′(x)
1  x.prev.next = x.next
2  x.next.prev = x.prev

A sentinel is a dummy object that allows us to simplify boundary

conditions. In a linked list L, the sentinel is an object L.nil that represents NIL but has all the attributes of the other objects in the list.

References to NIL are replaced by references to the sentinel L.nil. As shown in Figure 10.5, this change turns a regular doubly linked list into a circular, doubly linked list with a sentinel, in which the sentinel L.nil lies between the head and tail. The attribute L.nil.next points to the head of the list, and L.nil.prev points to the tail. Similarly, both the next

attribute of the tail and the prev attribute of the head point to L.nil.

Since L.nil.next points to the head, the attribute L.head is eliminated altogether, with references to it replaced by references to L.nil.next.

Figure 10.5(a) shows that an empty list consists of just the sentinel, and both L.nil.next and L.nil.prev point to L.nil.

To delete an element from the list, just use the two-line procedure

LIST-DELETE′ from before. Just as LIST-INSERT never references

the list object L, neither does LIST-DELETE′. You should never delete

the sentinel L.nil unless you are deleting the entire list!

The LIST-INSERT′ procedure inserts an element x into the list

following object y. No separate procedure for prepending is necessary:

to insert at the head of the list, let y be L.nil; and to insert at the tail, let y be L.nil.prev. Figure 10.5 shows the effects of LIST-INSERT′ and LIST-DELETE′ on a sample list.

LIST-INSERT′(x, y)
1  x.next = y.next
2  x.prev = y
3  y.next.prev = x
4  y.next = x

Searching a circular, doubly linked list with a sentinel has the same

asymptotic running time as without a sentinel, but it is possible to

decrease the constant factor. The test in line 2 of LIST-SEARCH makes

two comparisons: one to check whether the search has run off the end

of the list and, if not, one to check whether the key resides in the current

element x. Suppose that you know that the key is somewhere in the list.

Then you do not need to check whether the search runs off the end of

the list, thereby eliminating one comparison in each iteration of the

while loop.

The sentinel provides a place to put the key before starting the

search. The search starts at the head L.nil.next of list L, and it stops if it finds the key somewhere in the list. Now the search is guaranteed to find

the key, either in the sentinel or before reaching the sentinel. If the key is

found before reaching the sentinel, then it really is in the element where

the search stops. If, however, the search goes through all the elements in the list and finds the key only in the sentinel, then the key is not really in

the list, and the search returns NIL. The procedure LIST-SEARCH′

embodies this idea. (If your sentinel requires its key attribute to be NIL,

then you might want to assign L.nil.key = NIL before line 5.)

LIST-SEARCH′(L, k)
1  L.nil.key = k      // store the key in the sentinel to guarantee it is in list
2  x = L.nil.next     // start at the head of the list
3  while x.key ≠ k
4      x = x.next
5  if x == L.nil      // found k in the sentinel
6      return NIL     // k was not really in the list
7  else return x      // found k in element x
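The whole sentinel scheme fits in a few lines of Python (class names ours). The empty list is the sentinel pointing at itself in both directions, and neither insertion nor deletion needs a boundary test:

```python
class SentinelList:
    """Circular, doubly linked list with a sentinel nil object,
    following LIST-INSERT', LIST-DELETE', and LIST-SEARCH'."""
    class _Node:
        def __init__(self, key=None):
            self.key = key
            self.prev = self.next = None
    def __init__(self):
        self.nil = self._Node()
        self.nil.prev = self.nil.next = self.nil   # empty list
    def insert(self, x, y):
        """Insert node x just after node y; pass y = nil to prepend."""
        x.next = y.next
        x.prev = y
        y.next.prev = x
        y.next = x
    @staticmethod
    def delete(x):
        x.prev.next = x.next
        x.next.prev = x.prev
    def search(self, k):
        self.nil.key = k           # the search is now guaranteed to stop
        x = self.nil.next
        while x.key != k:          # only one comparison per iteration
            x = x.next
        return None if x is self.nil else x

L = SentinelList()
for k in [1, 4, 9]:                # inserting at nil prepends: head is 9
    L.insert(SentinelList._Node(k), L.nil)
print(L.search(4).key, L.search(7))  # 4 None
```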

Sentinels often simplify code and, as in searching a linked list, they

might speed up code by a small constant factor, but they don’t typically

improve the asymptotic running time. Use them judiciously. When there

are many small lists, the extra storage used by their sentinels can

represent significant wasted memory. In this book, we use sentinels only

when they significantly simplify the code.

Exercises

10.2-1

Explain why the dynamic-set operation INSERT on a singly linked list

can be implemented in O(1) time, but the worst-case time for DELETE

is Θ( n).

10.2-2

Implement a stack using a singly linked list. The operations PUSH and

POP should still take O(1) time. Do you need to add any attributes to

the list?

10.2-3

Implement a queue using a singly linked list. The operations

ENQUEUE and DEQUEUE should still take O(1) time. Do you need

to add any attributes to the list?

10.2-4

The dynamic-set operation UNION takes two disjoint sets S 1 and S 2 as

input, and it returns a set S = S 1 ⋃ S 2 consisting of all the elements of S 1 and S 2. The sets S 1 and S 2 are usually destroyed by the operation.

Show how to support UNION in O(1) time using a suitable list data

structure.

10.2-5

Give a Θ( n)-time nonrecursive procedure that reverses a singly linked

list of n elements. The procedure should use no more than constant

storage beyond that needed for the list itself.

10.2-6

Explain how to implement doubly linked lists using only one pointer

value x.np per item instead of the usual two ( next and prev). Assume that all pointer values can be interpreted as k-bit integers, and define x.np = x.next XOR x.prev, the k-bit “exclusive-or” of x.next and x.prev.

The value NIL is represented by 0. Be sure to describe what information

you need to access the head of the list. Show how to implement the

SEARCH, INSERT, and DELETE operations on such a list. Also show

how to reverse such a list in O(1) time.

10.3 Representing rooted trees

Linked lists work well for representing linear relationships, but not all

relationships are linear. In this section, we look specifically at the

problem of representing rooted trees by linked data structures. We first

look at binary trees, and then we present a method for rooted trees in

which nodes can have an arbitrary number of children.

We represent each node of a tree by an object. As with linked lists,

we assume that each node contains a key attribute. The remaining

attributes of interest are pointers to other nodes, and they vary according to the type of tree.

Binary trees

Figure 10.6 shows how to use the attributes p, left, and right to store pointers to the parent, left child, and right child of each node in a

binary tree T. If x.p = NIL, then x is the root. If node x has no left child, then x.left = NIL, and similarly for the right child. The root of

the entire tree T is pointed to by the attribute T.root. If T.root = NIL, then the tree is empty.

Rooted trees with unbounded branching

It’s simple to extend the scheme for representing a binary tree to any

class of trees in which the number of children of each node is at most

some constant k: replace the left and right attributes by child 1, child 2, …

, child k. This scheme no longer works when the number of children of a

node is unbounded, however, since we do not know how many

attributes to allocate in advance. Moreover, if k, the number of children,

is bounded by a large constant but most nodes have a small number of

children, we may waste a lot of memory.

Fortunately, there is a clever scheme to represent trees with arbitrary

numbers of children. It has the advantage of using only O( n) space for

any n-node rooted tree. The left-child, right-sibling representation appears in Figure 10.7. As before, each node contains a parent pointer p, and T.root points to the root of tree T. Instead of having a pointer to each of its children, however, each node x has only two pointers:

1. x.left-child points to the leftmost child of node x, and

2. x.right-sibling points to the sibling of x immediately to its right.

If node x has no children, then x.left-child = NIL, and if node x is the rightmost child of its parent, then x.right-sibling = NIL.
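A minimal Python sketch of the left-child, right-sibling representation (names ours). Adding a child at the front of the sibling list takes O(1) time, and iterating over a node's children walks the sibling chain:

```python
class TreeNode:
    """Node in the left-child, right-sibling representation: two
    child-related pointers suffice no matter how many children."""
    def __init__(self, key):
        self.key = key
        self.left_child = None     # leftmost child, or None
        self.right_sibling = None  # next sibling to the right, or None
    def add_child(self, child):
        # New children go to the front of the sibling list, in O(1).
        child.right_sibling = self.left_child
        self.left_child = child
    def children(self):
        c = self.left_child
        while c is not None:
            yield c
            c = c.right_sibling

root = TreeNode(1)
for k in [4, 3, 2]:
    root.add_child(TreeNode(k))
print([c.key for c in root.children()])  # [2, 3, 4]
```

Because each node stores O(1) pointers, any n-node tree uses O(n) space, as claimed above.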


Figure 10.6 The representation of a binary tree T. Each node x has the attributes x.p (top), x.left (lower left), and x.right (lower right). The key attributes are not shown.

Figure 10.7 The left-child, right-sibling representation of a tree T. Each node x has attributes x.p (top), x.left-child (lower left), and x.right-sibling (lower right). The key attributes are not shown.

Other tree representations

We sometimes represent rooted trees in other ways. In Chapter 6, for example, we represented a heap, which is based on a complete binary

tree, by a single array along with an attribute giving the index of the last

node in the heap. The trees that appear in Chapter 19 are traversed only toward the root, and so only the parent pointers are present: there are

no pointers to children. Many other schemes are possible. Which

scheme is best depends on the application.

Exercises

10.3-1

Draw the binary tree rooted at index 6 that is represented by the

following attributes:

index  key  left  right
  1     17    8     9
  2     14   NIL   NIL
  3     12   NIL   NIL
  4     20   10    NIL
  5     33    2    NIL
  6     15    1     4
  7     28   NIL   NIL
  8     22   NIL   NIL
  9     13    3     7
 10     25   NIL    5

10.3-2

Write an O( n)-time recursive procedure that, given an n-node binary tree, prints out the key of each node in the tree.

10.3-3

Write an O( n)-time nonrecursive procedure that, given an n-node binary tree, prints out the key of each node in the tree. Use a stack as an

auxiliary data structure.

10.3-4

Write an O( n)-time procedure that prints out all the keys of an arbitrary rooted tree with n nodes, where the tree is stored using the left-child, right-sibling representation.

10.3-5

Write an O( n)-time nonrecursive procedure that, given an n-node binary tree, prints out the key of each node. Use no more than constant extra

space outside of the tree itself and do not modify the tree, even

temporarily, during the procedure.

10.3-6

The left-child, right-sibling representation of an arbitrary rooted tree

uses three pointers in each node: left-child, right-sibling, and parent.

From any node, its parent can be accessed in constant time and all its

children can be accessed in time linear in the number of children. Show

how to use only two pointers and one boolean value in each node x so

that x’s parent or all of x’s children can be accessed in time linear in the number of x’s children.

Problems

10-1 Comparisons among lists

For each of the four types of lists in the following table, what is the

asymptotic worst-case running time for each dynamic-set operation

listed?

              sorted,        sorted,        unsorted,      unsorted,
              singly linked  doubly linked  singly linked  doubly linked

SEARCH
INSERT
DELETE
SUCCESSOR
PREDECESSOR
MINIMUM
MAXIMUM

10-2 Mergeable heaps using linked lists

A mergeable heap supports the following operations: MAKE-HEAP

(which creates an empty mergeable heap), INSERT, MINIMUM,

EXTRACT-MIN, and UNION.1 Show how to implement mergeable

heaps using linked lists in each of the following cases. Try to make each

operation as efficient as possible. Analyze the running time of each

operation in terms of the size of the dynamic set(s) being operated on.

a. Lists are sorted.

b. Lists are unsorted.

c. Lists are unsorted, and dynamic sets to be merged are disjoint.

10-3 Searching a sorted compact list

We can represent a singly linked list with two arrays, key and next.

Given the index i of an element, its value is stored in key[ i], and the index of its successor is given by next[ i], where next[ i] = NIL for the last element. We also need the index head of the first element in the list. An

n-element list stored in this way is compact if it is stored only in positions 1 through n of the key and next arrays.

Let’s assume that all keys are distinct and that the compact list is

also sorted, that is, key[ i] < key[ next[ i]] for all i = 1, 2, … , n such that next[ i] ≠ NIL. Under these assumptions, you will show that the randomized algorithm COMPACT-LIST-SEARCH searches the list for

key k in

expected time.

COMPACT-LIST-SEARCH(key, next, head, n, k)
 1  i = head
 2  while i ≠ NIL and key[i] < k
 3      j = RANDOM(1, n)
 4      if key[i] < key[j] and key[j] ≤ k
 5          i = j
 6      if key[i] == k
 7          return i
 8      i = next[i]
 9  if i == NIL or key[i] > k
10      return NIL
11  else return i

If you ignore lines 3–7 of the procedure, you can see that it’s an

ordinary algorithm for searching a sorted linked list, in which index i

points to each position of the list in turn. The search terminates once

the index i “falls off” the end of the list or once key[ i] ≥ k. In the latter case, if key[ i] = k, the procedure has found a key with the value k. If, however, key[ i] > k, then the search will never find a key with the value k, so that terminating the search was the correct action.

Lines 3–7 attempt to skip ahead to a randomly chosen position j.

Such a skip helps if key[ j] is larger than key[ i] and no larger than k. In such a case, j marks a position in the list that i would reach during an ordinary list search. Because the list is compact, we know that any

choice of j between 1 and n indexes some element in the list.
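A direct Python transcription may make the skipping step easier to follow. It uses 0-origin arrays and None for NIL; the four-element list below is an assumed toy example (the sorted list 3 → 7 → 12 → 19 stored out of order, with its head at index 1):

```python
import random

def compact_list_search(key, next_, head, n, k):
    """Randomized search for k in a sorted compact list stored in
    the parallel arrays key and next_ (None plays the role of NIL)."""
    i = head
    while i is not None and key[i] < k:
        j = random.randrange(n)            # random position to try
        if key[i] < key[j] <= k:
            i = j                          # skip ahead to position j
        if key[i] == k:
            return i
        i = next_[i]
    if i is None or key[i] > k:
        return None
    return i

# Sorted compact list 3 -> 7 -> 12 -> 19, stored out of order:
key   = [12, 3, 19, 7]
next_ = [2, 3, None, 0]   # successor indices; 3's successor is 7, etc.
print(compact_list_search(key, next_, head=1, n=4, k=12))  # 0
```

Whatever positions RANDOM chooses, the skip is taken only when it lands on an element that an ordinary left-to-right search would reach anyway, so the result is the same as the deterministic search.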

Instead of analyzing the performance of COMPACT-LIST-SEARCH directly, you will analyze a related algorithm, COMPACT-LIST-SEARCH′, which executes two separate loops. This algorithm takes an additional parameter t, which specifies an upper bound on the

number of iterations of the first loop.

COMPACT-LIST-SEARCH′(key, next, head, n, k, t)
 1  i = head
 2  for q = 1 to t
 3      j = RANDOM(1, n)
 4      if key[i] < key[j] and key[j] ≤ k
 5          i = j
 6      if key[i] == k
 7          return i
 8  while i ≠ NIL and key[i] < k
 9      i = next[i]
10  if i == NIL or key[i] > k
11      return NIL
12  else return i

To compare the execution of the two algorithms, assume that the

sequence of calls of RANDOM(1, n) yields the same sequence of

integers for both algorithms.

a. Argue that for any value of t, COMPACT-LIST-SEARCH( key, next, head, n, k) and COMPACT-LIST-SEARCH′ ( key, next, head, n, k, t) return the same result and that the number of iterations of the while

loop of lines 2–8 in COMPACT-LIST-SEARCH is at most the total

number of iterations of both the for and while loops in COMPACT-

LIST-SEARCH′.

In the call COMPACT-LIST-SEARCH′ ( key, next, head, n, k, t), let Xt be the random variable that describes the distance in the linked list (that

is, through the chain of next pointers) from position i to the desired key k after t iterations of the for loop of lines 2–7 have occurred.

b. Argue that COMPACT-LIST-SEARCH′ ( key, next, head, n, k, t) has an expected running time of O( t + E [ Xt]).

c. Show that E[Xt] = Σ_{r=1}^{n} Pr{Xt ≥ r}. ( Hint: Use equation (C.28) on page 1193.)

d. Show that Pr{Xt ≥ r} ≤ ((n − r)/n)^t. ( Hint: Use inequality (A.18) on page 1150.)

e. Prove that E [ Xt] ≤ n/( t + 1).

f. Show that COMPACT-LIST-SEARCH′ ( key, next, head, n, k, t) has an expected running time of O( t + n/ t).

g. Conclude that COMPACT-LIST-SEARCH runs in O(√n) expected time.

h. Why do we assume that all keys are distinct in COMPACT-LIST-

SEARCH? Argue that random skips do not necessarily help

asymptotically when the list contains repeated key values.

Chapter notes

Aho, Hopcroft, and Ullman [6] and Knuth [259] are excellent references for elementary data structures. Many other texts cover both basic data

structures and their implementation in a particular programming

language. Examples of these types of textbooks include Goodrich and

Tamassia [196], Main [311], Shaffer [406], and Weiss [452, 453, 454]. The book by Gonnet and Baeza-Yates [193] provides experimental data on the performance of many data-structure operations.

The origin of stacks and queues as data structures in computer

science is unclear, since corresponding notions already existed in

mathematics and paper-based business practices before the introduction

of digital computers. Knuth [259] cites A. M. Turing for the development of stacks for subroutine linkage in 1947.

Pointer-based data structures also seem to be a folk invention.

According to Knuth, pointers were apparently used in early computers

with drum memories. The A-1 language developed by G. M. Hopper in

1951 represented algebraic formulas as binary trees. Knuth credits the

IPL-II language, developed in 1956 by A. Newell, J. C. Shaw, and H. A.

Simon, for recognizing the importance and promoting the use of

pointers. Their IPL-III language, developed in 1957, included explicit

stack operations.

1 Because we have defined a mergeable heap to support MINIMUM and EXTRACT-MIN, we can also refer to it as a mergeable min-heap. Alternatively, if it supports MAXIMUM and EXTRACT-MAX, it is a mergeable max-heap.

11 Hash Tables

Many applications require a dynamic set that supports only the

dictionary operations INSERT, SEARCH, and DELETE. For example,

a compiler that translates a programming language maintains a symbol

table, in which the keys of elements are arbitrary character strings

corresponding to identifiers in the language. A hash table is an effective

data structure for implementing dictionaries. Although searching for an

element in a hash table can take as long as searching for an element in a

linked list—Θ( n) time in the worst case—in practice, hashing performs

extremely well. Under reasonable assumptions, the average time to

search for an element in a hash table is O(1). Indeed, the built-in dictionaries of Python are implemented with hash tables.

A hash table generalizes the simpler notion of an ordinary array.

Directly addressing into an ordinary array takes advantage of the O(1)

access time for any array element. Section 11.1 discusses direct addressing in more detail. To use direct addressing, you must be able to

allocate an array that contains a position for every possible key.

When the number of keys actually stored is small relative to the total

number of possible keys, hash tables become an effective alternative to

directly addressing an array, since a hash table typically uses an array of

size proportional to the number of keys actually stored. Instead of using

the key as an array index directly, we compute the array index from the

key. Section 11.2 presents the main ideas, focusing on “chaining” as a way to handle “collisions,” in which more than one key maps to the

same array index. Section 11.3 describes how to compute array indices from keys using hash functions. We present and analyze several

variations on the basic theme. Section 11.4 looks at “open addressing,”

which is another way to deal with collisions. The bottom line is that

hashing is an extremely effective and practical technique: the basic

dictionary operations require only O(1) time on the average. Section

11.5 discusses the hierarchical memory systems of modern computers and illustrates how to design hash tables that work well in such systems.

11.1 Direct-address tables

Direct addressing is a simple technique that works well when the

universe U of keys is reasonably small. Suppose that an application

needs a dynamic set in which each element has a distinct key drawn

from the universe U = {0, 1, …, m − 1}, where m is not too large.

To represent the dynamic set, you can use an array, or direct-address

table, denoted by T[0 : m − 1], in which each position, or slot, corresponds to a key in the universe U. Figure 11.1 illustrates this approach. Slot k points to an element in the set with key k. If the set contains no element with key k, then T[ k] = NIL.

The dictionary operations DIRECT-ADDRESS-SEARCH,

DIRECT-ADDRESS-INSERT, and DIRECT-ADDRESS-DELETE

on the following page are trivial to implement. Each takes only O(1) time.

For some applications, the direct-address table itself can hold the

elements in the dynamic set. That is, rather than storing an element’s

key and satellite data in an object external to the direct-address table,

with a pointer from a slot in the table to the object, save space by

storing the object directly in the slot. To indicate an empty slot, use a

special key. Then again, why store the key of the object at all? The index

of the object is its key! Of course, then you’d need some way to tell whether slots are empty.

Figure 11.1 How to implement a dynamic set by a direct-address table T. Each key in the universe U = {0, 1, …, 9} corresponds to an index into the table. The set K = {2, 3, 5, 8} of actual keys determines the slots in the table that contain pointers to elements. The other slots, in blue, contain NIL.

DIRECT-ADDRESS-SEARCH( T, k)

1 return T[ k]

DIRECT-ADDRESS-INSERT( T, x)

1 T[ x.key] = x

DIRECT-ADDRESS-DELETE( T, x)

1 T[ x.key] = NIL
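The three procedures above can be sketched in Python as follows. This is a minimal illustration of ours, not code from the text: the class name, the `namedtuple` element type, and the use of `None` for NIL are our own choices.

```python
from collections import namedtuple

# A simple element with a key and satellite data (our own illustrative type).
Element = namedtuple("Element", ["key", "data"])

class DirectAddressTable:
    def __init__(self, m):
        self.slots = [None] * m        # T[0 : m - 1], every slot initially NIL

    def search(self, k):               # DIRECT-ADDRESS-SEARCH: return T[k]
        return self.slots[k]

    def insert(self, x):               # DIRECT-ADDRESS-INSERT: T[x.key] = x
        self.slots[x.key] = x

    def delete(self, x):               # DIRECT-ADDRESS-DELETE: T[x.key] = NIL
        self.slots[x.key] = None

T = DirectAddressTable(10)
T.insert(Element(3, "three"))
```

Each method performs a single array access, matching the O(1) bound claimed for the pseudocode.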

Exercises

11.1-1

A dynamic set S is represented by a direct-address table T of length m.

Describe a procedure that finds the maximum element of S. What is the

worst-case performance of your procedure?

11.1-2

A bit vector is simply an array of bits (each either 0 or 1). A bit vector of

length m takes much less space than an array of m pointers. Describe

how to use a bit vector to represent a dynamic set of distinct elements drawn from the set {0, 1, …, m − 1} and with no satellite data.

Dictionary operations should run in O(1) time.

11.1-3

Suggest how to implement a direct-address table in which the keys of

stored elements do not need to be distinct and the elements can have

satellite data. All three dictionary operations (INSERT, DELETE, and

SEARCH) should run in O(1) time. (Don’t forget that DELETE takes

as an argument a pointer to an object to be deleted, not a key.)

11.1-4

Suppose that you want to implement a dictionary by using direct

addressing on a huge array. That is, if the array size is m and the dictionary contains at most n elements at any one time, then m ≫ n. At the start, the array entries may contain garbage, and initializing the

entire array is impractical because of its size. Describe a scheme for

implementing a direct-address dictionary on a huge array. Each stored

object should use O(1) space; the operations SEARCH, INSERT, and

DELETE should take O(1) time each; and initializing the data structure

should take O(1) time. ( Hint: Use an additional array, treated somewhat like a stack whose size is the number of keys actually stored in the

dictionary, to help determine whether a given entry in the huge array is

valid or not.)

11.2 Hash tables

The downside of direct addressing is apparent: if the universe U is large

or infinite, storing a table T of size | U| may be impractical, or even impossible, given the memory available on a typical computer.

Furthermore, the set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted.

When the set K of keys stored in a dictionary is much smaller than

the universe U of all possible keys, a hash table requires much less storage than a direct-address table. Specifically, the storage requirement

reduces to Θ(| K|) while maintaining the benefit that searching for an element in the hash table still requires only O(1) time. The catch is that

this bound is for the average-case time, 1 whereas for direct addressing it holds for the worst-case time.

With direct addressing, an element with key k is stored in slot k, but

with hashing, we use a hash function h to compute the slot number from

the key k, so that the element goes into slot h( k). The hash function h maps the universe U of keys into the slots of a hash table T[0 : m − 1]: h : U → {0, 1, …, m − 1},

where the size m of the hash table is typically much less than | U|. We say that an element with key k hashes to slot h( k), and we also say that h( k) is the hash value of key k. Figure 11.2 illustrates the basic idea. The hash function reduces the range of array indices and hence the size of the

array. Instead of a size of | U|, the array can have size m. An example of a simple, but not particularly good, hash function is h( k) = k mod m.

There is one hitch, namely that two keys may hash to the same slot.

We call this situation a collision. Fortunately, there are effective

techniques for resolving the conflict created by collisions.
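A collision is easy to produce with the division-method hash h( k) = k mod m mentioned above. The particular values of m and the keys below are our own illustration:

```python
m = 9
h = lambda k: k % m   # the simple division-method hash from the text

# Two distinct keys that hash to the same slot -- a collision:
h(13)   # slot 4
h(31)   # slot 4 as well, since 31 mod 9 == 13 mod 9 == 4
```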

Of course, the ideal solution is to avoid collisions altogether. We

might try to achieve this goal by choosing a suitable hash function h.

One idea is to make h appear to be “random,” thus avoiding collisions

or at least minimizing their number. The very term “to hash,” evoking

images of random mixing and chopping, captures the spirit of this

approach. (Of course, a hash function h must be deterministic in that a

given input k must always produce the same output h( k).) Because | U| > m, however, there must be at least two keys that have the same hash value, and avoiding collisions altogether is impossible. Thus, although a

well-designed, “random”-looking hash function can reduce the number

of collisions, we still need a method for resolving the collisions that do

occur.

Figure 11.2 Using a hash function h to map keys to hash-table slots. Because keys k 2 and k 5

map to the same slot, they collide.

The remainder of this section first presents a definition of

“independent uniform hashing,” which captures the simplest notion of

what it means for a hash function to be “random.” It then presents and

analyzes the simplest collision resolution technique, called chaining.

Section 11.4 introduces an alternative method for resolving collisions, called open addressing.

Independent uniform hashing

An “ideal” hashing function h would have, for each possible input k in

the domain U, an output h( k) that is an element randomly and independently chosen uniformly from the range {0, 1, …, m − 1}. Once

a value h( k) is randomly chosen, each subsequent call to h with the same input k yields the same output h( k).

We call such an ideal hash function an independent uniform hash

function. Such a function is also often called a random oracle [43]. When hash tables are implemented with an independent uniform hash

function, we say we are using independent uniform hashing.

Independent uniform hashing is an ideal theoretical abstraction, but

it is not something that can reasonably be implemented in practice.

Nonetheless, we’ll analyze the efficiency of hashing under the

assumption of independent uniform hashing and then present ways of

achieving useful practical approximations to this ideal.

Figure 11.3 Collision resolution by chaining. Each nonempty hash-table slot T[ j] points to a linked list of all the keys whose hash value is j. For example, h( k 1) = h( k 4) and h( k 5) = h( k 2) =

h( k 7). The list can be either singly or doubly linked. We show it as doubly linked because deletion may be faster that way when the deletion procedure knows which list element (not just which key) is to be deleted.

Collision resolution by chaining

At a high level, you can think of hashing with chaining as a

nonrecursive form of divide-and-conquer: the input set of n elements is

divided randomly into m subsets, each of approximate size n/ m. A hash function determines which subset an element belongs to. Each subset is

managed independently as a list.

Figure 11.3 shows the idea behind chaining: each nonempty slot points to a linked list, and all the elements that hash to the same slot go

into that slot’s linked list. Slot j contains a pointer to the head of the list of all stored elements with hash value j. If there are no such elements,

then slot j contains NIL.

When collisions are resolved by chaining, the dictionary operations

are straightforward to implement. They appear on the next page and

use the linked-list procedures from Section 10.2. The worst-case running time for insertion is O(1). The insertion procedure is fast in part because

it assumes that the element x being inserted is not already present in the table. To enforce this assumption, you can search (at additional cost) for

an element whose key is x.key before inserting. For searching, the worst-

case running time is proportional to the length of the list. (We’ll analyze

this operation more closely below.) Deletion takes O(1) time if the lists

are doubly linked, as in Figure 11.3. (Since CHAINED-HASH-

DELETE takes as input an element x and not its key k, no search is needed. If the hash table supports deletion, then its linked lists should

be doubly linked in order to delete an item quickly. If the lists were only

singly linked, then by Exercise 10.2-1, deletion could take time

proportional to the length of the list. With singly linked lists, both

deletion and searching would have the same asymptotic running times.)

CHAINED-HASH-INSERT( T, x)

1 LIST-PREPEND( T[ h( x.key)], x)

CHAINED-HASH-SEARCH( T, k)

1 return LIST-SEARCH( T[ h( k)], k)

CHAINED-HASH-DELETE( T, x)

1 LIST-DELETE( T[ h( x.key)], x)
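The chained-hash procedures can be sketched in Python roughly as follows. This is our own simplification, not the book's implementation: it uses Python lists as the chains instead of doubly linked lists, and it deletes by key rather than by a pointer to the element, so deletion here takes time proportional to the chain length rather than the O(1) the text obtains with doubly linked lists. The division-method hash is also our own choice.

```python
class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]      # one chain per slot

    def _h(self, k):
        return k % self.m                        # simple division-method hash

    def insert(self, key, value):                # cf. CHAINED-HASH-INSERT
        # Prepend, like LIST-PREPEND; assumes key is not already present.
        self.table[self._h(key)].insert(0, (key, value))

    def search(self, key):                       # cf. CHAINED-HASH-SEARCH
        for k, v in self.table[self._h(key)]:    # scan the chain for the key
            if k == key:
                return v
        return None                              # unsuccessful search

    def delete(self, key):                       # deletes by key (a deviation
        chain = self.table[self._h(key)]         # from CHAINED-HASH-DELETE,
        for i, (k, _) in enumerate(chain):       # which takes the element x)
            if k == key:
                del chain[i]
                return
```

With m = 9, the keys 13 and 31 collide and end up in the same chain, yet both remain searchable.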

Analysis of hashing with chaining

How well does hashing with chaining perform? In particular, how long

does it take to search for an element with a given key?

Given a hash table T with m slots that stores n elements, we define the load factor α for T as n/ m, that is, the average number of elements stored in a chain. Our analysis will be in terms of α, which can be less

than, equal to, or greater than 1.

The worst-case behavior of hashing with chaining is terrible: all n

keys hash to the same slot, creating a list of length n. The worst-case time for searching is thus Θ( n) plus the time to compute the hash

function—no better than using one linked list for all the elements. We

clearly don’t use hash tables for their worst-case performance.

The average-case performance of hashing depends on how well the

hash function h distributes the set of keys to be stored among the m slots, on the average (meaning with respect to the distribution of keys to

be hashed and with respect to the choice of hash function, if this choice

is randomized). Section 11.3 discusses these issues, but for now we assume that any given element is equally likely to hash into any of the m

slots. That is, the hash function is uniform. We further assume that where a given element hashes to is independent of where any other

elements hash to. In other words, we assume that we are using

independent uniform hashing.

Because hashes of distinct keys are assumed to be independent,

independent uniform hashing is universal: the chance that any two

distinct keys k 1 and k 2 collide is at most 1/ m. Universality is important in our analysis and also in the specification of universal families of hash

functions, which we’ll see in Section 11.3.2.

For j = 0, 1, …, m − 1, denote the length of the list T[ j] by nj, so that n = n 0 + n 1 + ⋯ + nm−1, and the expected value of nj is E[ nj] = α = n/ m.

We assume that O(1) time suffices to compute the hash value h( k), so that the time required to search for an element with key k depends

linearly on the length nh( k) of the list T[ h( k)]. Setting aside the O(1) time required to compute the hash function and to access slot h( k), we’ll consider the expected number of elements examined by the search

algorithm, that is, the number of elements in the list T[ h( k)] that the algorithm checks to see whether any have a key equal to k. We consider

two cases. In the first, the search is unsuccessful: no element in the table

has key k. In the second, the search successfully finds an element with

key k.

Theorem 11.1

In a hash table in which collisions are resolved by chaining, an

unsuccessful search takes Θ(1 + α) time on average, under the

assumption of independent uniform hashing.

Proof Under the assumption of independent uniform hashing, any key k not already stored in the table is equally likely to hash to any of the m slots. The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[ h( k)], which has expected length E[ nh( k)] = α. Thus, the expected number of elements examined in an unsuccessful search is α, and the total time required (including the

time for computing h( k)) is Θ(1 + α).
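Theorem 11.1 is easy to check empirically. The simulation below (our own illustration) models independent uniform hashing by assigning n keys to m slots uniformly at random, then measures the average chain length seen by unsuccessful searches that probe a uniformly random slot; the sample mean should be close to α = n/ m.

```python
import random

def unsuccessful_search_cost(n, m, searches=10_000, seed=1):
    """Average number of elements examined by random unsuccessful searches."""
    rng = random.Random(seed)
    chain_len = [0] * m
    for _ in range(n):                     # independent uniform hashing:
        chain_len[rng.randrange(m)] += 1   # each key lands in a random slot
    # An unsuccessful search scans the entire chain in its slot.
    total = sum(chain_len[rng.randrange(m)] for _ in range(searches))
    return total / searches

# With n = 1000 and m = 128, alpha = n/m = 7.8125; the measured cost
# should be close to that.
cost = unsuccessful_search_cost(1000, 128)
```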

The situation for a successful search is slightly different. An

unsuccessful search is equally likely to go to any slot of the hash table. A

successful search, however, cannot go to an empty slot, since it is for an

element that is present in one of the linked lists. We assume that the