Since coin flips are mutually independent, for any given event Aik, the probability that all k flips are heads is

Pr { Aik} = 1/2^k.   (5.9)

Thus, the probability that a streak of heads of length at least 2 ⌈lg n⌉ begins in position i is quite small:

Pr { Ai,2⌈lg n⌉} = 1/2^{2⌈lg n⌉} ≤ 1/2^{2 lg n} = 1/n^2.

There are at most n − 2 ⌈lg n⌉ + 1 positions where such a streak can begin. The probability that a streak of heads of length at least 2 ⌈lg n⌉ begins anywhere is therefore

Pr { ⋃_{i=1}^{n−2⌈lg n⌉+1} Ai,2⌈lg n⌉} ≤ Σ_{i=1}^{n−2⌈lg n⌉+1} 1/n^2
                                      < Σ_{i=1}^{n} 1/n^2
                                      = 1/n.   (5.10)
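As an aside, the bound in inequality (5.10) can be checked by brute force for small n. The following sketch (an illustration of ours, not part of the text's analysis) enumerates all 2^n sequences for n = 8 and confirms that the probability of a streak of at least 2 ⌈lg n⌉ = 6 heads stays below 1/n.

```python
from math import ceil, log2

def prob_long_streak(n):
    """Exact probability, by exhaustive enumeration, that n fair coin
    flips contain a streak of at least 2*ceil(lg n) consecutive heads."""
    target = 2 * ceil(log2(n))
    count = 0
    for bits in range(2 ** n):
        # Bit i of `bits` represents flip i: 1 = heads, 0 = tails.
        run = best = 0
        for i in range(n):
            run = run + 1 if (bits >> i) & 1 else 0
            best = max(best, run)
        if best >= target:
            count += 1
    return count / 2 ** n

# For n = 8 the threshold is 6 heads; the exact probability is 8/256 = 1/32,
# comfortably below the 1/n = 1/8 bound.
print(prob_long_streak(8))
```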

We can use inequality (5.10) to bound the length of the longest streak. For j = 0, 1, 2, … , n, let Lj be the event that the longest streak of heads has length exactly j, and let L be the length of the longest streak. By the definition of expected value, we have

E [ L] = Σ_{j=0}^{n} j Pr { Lj}.   (5.11)

We could try to evaluate this sum using upper bounds on each Pr { Lj}

similar to those computed in inequality (5.10). Unfortunately, this

method yields weak bounds. We can use some intuition gained by the

above analysis to obtain a good bound, however. For no individual term

in the summation in equation (5.11) are both the factors j and Pr { Lj}

large. Why? When j ≥ 2 ⌈lg n⌉, then Pr { Lj} is very small, and when j < 2

⌈lg n⌉, then j is fairly small. More precisely, since the events Lj for j = 0, 1, … , n are disjoint, the probability that a streak of heads of length at least 2 ⌈lg n⌉ begins anywhere is Σ_{j=2⌈lg n⌉}^{n} Pr { Lj}. Inequality (5.10) tells us that the probability that a streak of heads of length at least 2 ⌈lg n⌉ begins anywhere is less than 1/ n, which means that Σ_{j=2⌈lg n⌉}^{n} Pr { Lj} < 1/ n. Also, noting that Σ_{j=0}^{n} Pr { Lj} = 1, we have that Σ_{j=0}^{2⌈lg n⌉−1} Pr { Lj} ≤ 1. Thus, we obtain

E [ L] = Σ_{j=0}^{n} j Pr { Lj}
      = Σ_{j=0}^{2⌈lg n⌉−1} j Pr { Lj} + Σ_{j=2⌈lg n⌉}^{n} j Pr { Lj}
      ≤ Σ_{j=0}^{2⌈lg n⌉−1} (2 ⌈lg n⌉) Pr { Lj} + Σ_{j=2⌈lg n⌉}^{n} n Pr { Lj}
      = 2 ⌈lg n⌉ Σ_{j=0}^{2⌈lg n⌉−1} Pr { Lj} + n Σ_{j=2⌈lg n⌉}^{n} Pr { Lj}
      < 2 ⌈lg n⌉ · 1 + n · (1/ n)
      = O(lg n).

The probability that a streak of heads exceeds r ⌈lg n⌉ flips diminishes quickly with r. Let’s get a rough bound on the probability that a streak of at least r ⌈lg n⌉ heads occurs, for r ≥ 1. The probability that a streak of at least r ⌈lg n⌉ heads starts in position i is

Pr { Ai,r⌈lg n⌉} = 1/2^{r⌈lg n⌉} ≤ 1/ n^r.

A streak of at least r ⌈lg n⌉ heads can start only in the first n − r ⌈lg n⌉ + 1 positions, but let’s overestimate the probability of such a streak by allowing it to start anywhere within the n coin flips. Then the probability that a streak of at least r ⌈lg n⌉ heads occurs is at most

Σ_{i=1}^{n} 1/ n^r = n · (1/ n^r) = 1/ n^{r−1}.

Equivalently, the probability is at least 1 − 1/ n^{r−1} that the longest streak has length less than r ⌈lg n⌉.

As an example, during n = 1000 coin flips, the probability of

encountering a streak of at least 2 ⌈lg n⌉ = 20 heads is at most 1/ n =

1/1000. The chance of a streak of at least 3 ⌈lg n⌉ = 30 heads is at most 1/ n^2 = 1/1,000,000.
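The n = 1000 example is easy to probe empirically. Here is a seeded Monte Carlo sketch of ours (the seed and trial count are arbitrary): over a few thousand simulated runs of 1000 flips, streaks of 20 or more heads should be very rare.

```python
import random

def longest_streak(n, rng):
    """Length of the longest run of heads in n simulated fair coin flips."""
    run = best = 0
    for _ in range(n):
        run = run + 1 if rng.random() < 0.5 else 0
        best = max(best, run)
    return best

rng = random.Random(1)  # fixed seed so the experiment is reproducible
trials = 2000
hits = sum(longest_streak(1000, rng) >= 20 for _ in range(trials))
# The analysis bounds the per-trial probability by 1/1000, so `hits`
# should be on the order of trials/1000.
print(hits / trials)
```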

Let’s now prove a complementary lower bound: the expected length of the longest streak of heads in n coin flips is Ω(lg n). To prove this bound, we look for streaks of length s by partitioning the n flips into approximately n/ s groups of s flips each. If we choose s = ⌊(lg n)/2⌋, we’ll see that it is likely that at least one of these groups comes up all heads, which means that it’s likely that the longest streak has length at least s = Ω(lg n). We’ll then show that the longest streak has expected length Ω(lg n).

Let’s partition the n coin flips into at least ⌊ n/⌊(lg n)/2⌋⌋ groups of ⌊(lg n)/2⌋ consecutive flips and bound the probability that no group comes up all heads. By equation (5.9), the probability that the group starting in position i comes up all heads is

Pr { Ai,⌊(lg n)/2⌋} = 1/2^{⌊(lg n)/2⌋} ≥ 1/√n.

The probability that a streak of heads of length at least ⌊(lg n)/2⌋ does not begin in position i is therefore at most 1 − 1/√n. Since the ⌊ n/⌊(lg n)/2⌋⌋ groups are formed from mutually exclusive, independent coin flips, the probability that every one of these groups fails to be a streak of length ⌊(lg n)/2⌋ is at most

(1 − 1/√n)^{⌊n/⌊(lg n)/2⌋⌋} ≤ e^{−⌊n/⌊(lg n)/2⌋⌋/√n}
                          ≤ e^{−lg n}
                          = O(1/ n).   (5.12)

For this argument, we used inequality (3.14), 1 + x ≤ e^x, on page 66 and the fact, which you may verify, that ⌊ n/⌊(lg n)/2⌋⌋/√n ≥ lg n for sufficiently large n.

We want to bound the probability that the longest streak equals or exceeds ⌊(lg n)/2⌋. To do so, let L be the event that the longest streak of heads equals or exceeds s = ⌊(lg n)/2⌋. Let L̄ be the complementary event, that the longest streak of heads is strictly less than s, so that Pr { L} + Pr { L̄} = 1. Let F be the event that every group of s flips fails to be a streak of s heads. By inequality (5.12), we have Pr { F} = O(1/ n). If the longest streak of heads is less than s, then certainly every group of s flips fails to be a streak of s heads, which means that event L̄ implies event F. Of course, event F could occur even if event L̄ does not (for example, if a streak of s or more heads crosses over the boundary between two groups), and so we have Pr { L̄} ≤ Pr { F} = O(1/ n). Since Pr { L} + Pr { L̄} = 1, we have that

Pr { L} = 1 − Pr { L̄}
       ≥ 1 − Pr { F}
       = 1 − O(1/ n).

That is, the probability that the longest streak equals or exceeds ⌊(lg n)/2⌋ is

Σ_{j=⌊(lg n)/2⌋}^{n} Pr { Lj} ≥ 1 − O(1/ n).   (5.13)

We can now calculate a lower bound on the expected length of the longest streak, beginning with equation (5.11) and proceeding in a manner similar to our analysis of the upper bound:

E [ L] = Σ_{j=0}^{n} j Pr { Lj}
      = Σ_{j=0}^{⌊(lg n)/2⌋−1} j Pr { Lj} + Σ_{j=⌊(lg n)/2⌋}^{n} j Pr { Lj}
      ≥ Σ_{j=0}^{⌊(lg n)/2⌋−1} 0 · Pr { Lj} + Σ_{j=⌊(lg n)/2⌋}^{n} ⌊(lg n)/2⌋ Pr { Lj}
      = ⌊(lg n)/2⌋ Σ_{j=⌊(lg n)/2⌋}^{n} Pr { Lj}
      ≥ ⌊(lg n)/2⌋ (1 − O(1/ n))      (by inequality (5.13))
      = Ω(lg n).

As with the birthday paradox, we can obtain a simpler, but

approximate, analysis using indicator random variables. Instead of

determining the expected length of the longest streak, we’ll find the

expected number of streaks with at least a given length. Let Xik = I { Aik} be the indicator random variable associated with a streak of heads of length at least k beginning with the i th coin flip. To count the total number of such streaks, define

X = Σ_{i=1}^{n−k+1} Xik.

Taking expectations and using linearity of expectation, we have

E [ X] = E [ Σ_{i=1}^{n−k+1} Xik]
      = Σ_{i=1}^{n−k+1} E [ Xik]
      = Σ_{i=1}^{n−k+1} Pr { Aik}
      = Σ_{i=1}^{n−k+1} 1/2^k
      = ( n − k + 1)/2^k.
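The closed form (n − k + 1)/2^k can be checked exactly for small n by enumerating every sequence of flips. This sketch is illustrative only and not part of the text:

```python
from itertools import product

def expected_streak_count(n, k):
    """Exact E[X]: average, over all 2^n equally likely sequences, of the
    number of positions i where a streak of at least k heads begins."""
    total = 0
    for flips in product((0, 1), repeat=n):  # 1 = heads, 0 = tails
        total += sum(all(flips[i:i + k]) for i in range(n - k + 1))
    return total / 2 ** n

# Matches the formula (n - k + 1)/2^k; e.g. n = 10, k = 3 gives 8/8 = 1.0.
assert expected_streak_count(10, 3) == (10 - 3 + 1) / 2 ** 3
```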

By plugging in various values for k, we can calculate the expected

number of streaks of length at least k. If this expected number is large

(much greater than 1), then we expect many streaks of length k to occur,

and the probability that one occurs is high. If this expected number is

small (much less than 1), then we expect to see few streaks of length k,

and the probability that one occurs is low. If k = c lg n, for some positive constant c, we obtain

E [ Xc lg n] = ( n − c lg n + 1)/2^{c lg n}
            = ( n − c lg n + 1)/ n^c
            = Θ(1/ n^{c−1}).

If c is large, the expected number of streaks of length c lg n is small, and we conclude that they are unlikely to occur. On the other hand, if c = 1/2, then we obtain E [ X(1/2) lg n] = Θ(1/ n^{1/2−1}) = Θ( n^{1/2}), and we expect there to be numerous streaks of length (1/2) lg n. Therefore, one

streak of such a length is likely to occur. We can conclude that the

expected length of the longest streak is Θ(lg n).
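The Θ(lg n) conclusion is easy to observe empirically. Here is a seeded simulation sketch of ours (seed, n, and trial count are arbitrary choices) that estimates E[L] for n = 1024 and checks that it lands between the two bounds derived above:

```python
import random
from math import ceil, floor, log2

def longest_streak(n, rng):
    """Length of the longest run of heads in n simulated fair coin flips."""
    run = best = 0
    for _ in range(n):
        run = run + 1 if rng.random() < 0.5 else 0
        best = max(best, run)
    return best

rng = random.Random(7)
n, trials = 1024, 500
mean = sum(longest_streak(n, rng) for _ in range(trials)) / trials
lower = floor(log2(n) / 2)   # floor((lg n)/2) = 5 for n = 1024
upper = 2 * ceil(log2(n))    # 2*ceil(lg n)   = 20 for n = 1024
assert lower <= mean <= upper
print(mean)  # typically close to lg n, i.e. around 9 or 10
```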

5.4.4 The online hiring problem

As a final example, let’s consider a variant of the hiring problem.

Suppose now that you do not wish to interview all the candidates in

order to find the best one. You also want to avoid hiring and firing as

you find better and better applicants. Instead, you are willing to settle

for a candidate who is close to the best, in exchange for hiring exactly

once. You must obey one company requirement: after each interview

you must either immediately offer the position to the applicant or

immediately reject the applicant. What is the trade-off between

minimizing the amount of interviewing and maximizing the quality of

the candidate hired?

We can model this problem in the following way. After meeting an

applicant, you are able to give each one a score. Let score( i) denote the score you give to the i th applicant, and assume that no two applicants

receive the same score. After you have seen j applicants, you know which

of the j has the highest score, but you do not know whether any of the remaining n − j applicants will receive a higher score. You decide to adopt the strategy of selecting a positive integer k < n, interviewing and then rejecting the first k applicants, and hiring the first applicant thereafter who has a higher score than all preceding applicants. If it

turns out that the best-qualified applicant was among the first k

interviewed, then you hire the n th applicant—the last one interviewed.

We formalize this strategy in the procedure ONLINE-MAXIMUM( k,

n), which returns the index of the candidate you wish to hire.

ONLINE-MAXIMUM( k, n)
1  best-score = −∞
2  for i = 1 to k
3      if score( i) > best-score
4          best-score = score( i)
5  for i = k + 1 to n
6      if score( i) > best-score
7          return i
8  return n
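A direct Python transcription of ONLINE-MAXIMUM may be helpful (the `scores` list and the 1-based indexing convention are our own choices):

```python
def online_maximum(scores, k):
    """Return the 1-based index hired by the ONLINE-MAXIMUM strategy:
    observe the first k scores, then take the first later applicant who
    beats every score seen so far (or applicant n if none does)."""
    n = len(scores)
    best_score = float('-inf')
    for i in range(1, k + 1):        # lines 2-4: observe the first k
        if scores[i - 1] > best_score:
            best_score = scores[i - 1]
    for i in range(k + 1, n + 1):    # lines 5-7: hire the first improvement
        if scores[i - 1] > best_score:
            return i
    return n                         # line 8: the best was among the first k

# With scores 3, 5, 2, 9, 4 and k = 2, the benchmark is max(3, 5) = 5,
# and applicant 4 (score 9) is the first to beat it.
print(online_maximum([3, 5, 2, 9, 4], 2))  # prints 4
```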


If we determine, for each possible value of k, the probability that you

hire the most qualified applicant, then you can choose the best possible

k and implement the strategy with that value. For the moment, assume

that k is fixed. Let M( j) = max { score( i) : 1 ≤ ij} denote the maximum score among applicants 1 through j. Let S be the event that you succeed in choosing the best-qualified applicant, and let Si be the event that you

succeed when the best-qualified applicant is the i th one interviewed.

Since the various Si are disjoint, we have that Pr { S} = Σ_{i=1}^{n} Pr { Si}. Noting that you never succeed when the best-qualified applicant is one of the first k, we have that Pr { Si} = 0 for i = 1, 2, … , k. Thus, we obtain

Pr { S} = Σ_{i=k+1}^{n} Pr { Si}.   (5.14)

We now compute Pr { Si}. In order to succeed when the best-

qualified applicant is the i th one, two things must happen. First, the best-qualified applicant must be in position i, an event which we denote

by Bi. Second, the algorithm must not select any of the applicants in

positions k + 1 through i − 1, which happens only if, for each j such that k + 1 ≤ j ≤ i − 1, line 6 finds that score( j) < best-score. (Because scores are unique, we can ignore the possibility of score( j) = best-score.) In other words, all of the values score( k + 1) through score( i − 1) must be less than M( k). If any are greater than M( k), the algorithm instead returns the index of the first one that is greater. We use Oi to denote the event that none of the applicants in positions k + 1 through i − 1 are chosen.

Fortunately, the two events Bi and Oi are independent. The event Oi depends only on the relative ordering of the values in positions 1

through i – 1, whereas Bi depends only on whether the value in position i is greater than the values in all other positions. The ordering of the values in positions 1 through i – 1 does not affect whether the value in

position i is greater than all of them, and the value in position i does not affect the ordering of the values in positions 1 through i – 1. Thus, we

can apply equation (C.17) on page 1188 to obtain

Pr { Si} = Pr { Bi ∩ Oi} = Pr { Bi} Pr { Oi}.

We have Pr { Bi} = 1/ n since the maximum is equally likely to be in any

one of the n positions. For event Oi to occur, the maximum value in positions 1 through i –1, which is equally likely to be in any of these i – 1

positions, must be in one of the first k positions. Consequently, Pr { Oi}

= k/( i − 1) and Pr { Si} = k/( n( i − 1)). Using equation (5.14), we have

Pr { S} = Σ_{i=k+1}^{n} k/( n( i − 1))
       = ( k/ n) Σ_{i=k+1}^{n} 1/( i − 1)
       = ( k/ n) Σ_{i=k}^{n−1} 1/ i.   (5.15)

We approximate by integrals to bound this summation from above and below. By the inequalities (A.19) on page 1150, we have

∫_k^n (1/ x) dx ≤ Σ_{i=k}^{n−1} 1/ i ≤ ∫_{k−1}^{n−1} (1/ x) dx.

Evaluating these definite integrals gives us the bounds

( k/ n)(ln n − ln k) ≤ Pr { S} ≤ ( k/ n)(ln( n − 1) − ln( k − 1)),

which provide a rather tight bound for Pr { S}. Because you wish to

maximize your probability of success, let us focus on choosing the value

of k that maximizes the lower bound on Pr { S}. (Besides, the lower-bound expression is easier to maximize than the upper-bound

expression.) Differentiating the expression ( k/ n)(ln n − ln k) with respect to k, we obtain

(1/ n)(ln n − ln k − 1).

Setting this derivative equal to 0, we see that you maximize the lower bound on the probability when ln k = ln n − 1 = ln( n/ e) or, equivalently, when k = n/ e. Thus, if you implement our strategy with k = n/ e, you succeed in hiring the best-qualified applicant with probability at least

1/ e.
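Both the exact success probability (5.15) and the 1/e guarantee are easy to check numerically. The sketch below (our own; n, seed, and trial count are arbitrary) computes Pr { S} = (k/n) Σ_{i=k}^{n−1} 1/i for k = round(n/e), confirms it exceeds 1/e, and cross-checks with a seeded simulation:

```python
import math
import random

def success_probability(n, k):
    """Exact Pr{S} from equation (5.15): (k/n) * sum_{i=k}^{n-1} 1/i."""
    return (k / n) * sum(1 / i for i in range(k, n))

def simulate(n, k, trials, rng):
    """Fraction of random orderings in which the strategy hires the best."""
    wins = 0
    for _ in range(trials):
        scores = list(range(n))          # n distinct scores, best is n - 1
        rng.shuffle(scores)
        benchmark = max(scores[:k])      # best among the first k (rejected)
        hired = next((s for s in scores[k:] if s > benchmark), scores[-1])
        wins += (hired == n - 1)
    return wins / trials

n = 50
k = round(n / math.e)                    # k = 18 for n = 50
exact = success_probability(n, k)
assert exact > 1 / math.e                # the lower bound from the text
est = simulate(n, k, 5000, random.Random(3))
assert abs(est - exact) < 0.04           # simulation agrees closely
```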

Exercises

5.4-1

How many people must there be in a room before the probability that

someone has the same birthday as you do is at least 1/2? How many

people must there be before the probability that at least two people have

a birthday on July 4 is greater than 1/2?

5.4-2

How many people must there be in a room before the probability that

two people have the same birthday is at least 0.99? For that many

people, what is the expected number of pairs of people who have the

same birthday?

5.4-3

You toss balls into b bins until some bin contains two balls. Each toss is

independent, and each ball is equally likely to end up in any bin. What

is the expected number of ball tosses?

5.4-4

For the analysis of the birthday paradox, is it important that the

birthdays be mutually independent, or is pairwise independence

sufficient? Justify your answer.

5.4-5

How many people should be invited to a party in order to make it likely

that there are three people with the same birthday?

5.4-6

What is the probability that a k-string (defined on page 1179) over a set

of size n forms a k-permutation? How does this question relate to the


birthday paradox?

5.4-7

You toss n balls into n bins, where each toss is independent and the ball is equally likely to end up in any bin. What is the expected number of

empty bins? What is the expected number of bins with exactly one ball?

5.4-8

Sharpen the lower bound on streak length by showing that in n flips of a

fair coin, the probability is at least 1 – 1/ n that a streak of length lg n – 2

lg lg n consecutive heads occurs.

Problems

5-1 Probabilistic counting

With a b-bit counter, we can ordinarily only count up to 2^b − 1. With R. Morris’s probabilistic counting, we can count up to a much larger value at the expense of some loss of precision.

We let a counter value of i represent a count of ni for i = 0, 1, … , 2^b − 1, where the ni form an increasing sequence of nonnegative values. We assume that the initial value of the counter is 0, representing a count of n0 = 0. The INCREMENT operation works on a counter containing the value i in a probabilistic manner. If i = 2^b − 1, then the operation reports an overflow error. Otherwise, the INCREMENT operation increases the counter by 1 with probability 1/( ni+1 − ni), and it leaves the counter unchanged with probability 1 − 1/( ni+1 − ni).

If we select ni = i for all i ≥ 0, then the counter is an ordinary one. More interesting situations arise if we select, say, ni = 2^{i−1} for i > 0 or ni = Fi (the i th Fibonacci number—see equation (3.31) on page 69).

For this problem, assume that

is large enough that the

probability of an overflow error is negligible.

a. Show that the expected value represented by the counter after n INCREMENT operations have been performed is exactly n.

b. The analysis of the variance of the count represented by the counter

depends on the sequence of the ni. Let us consider a simple case: ni =

100 i for all i ≥ 0. Estimate the variance in the value represented by the register after n INCREMENT operations have been performed.

5-2 Searching an unsorted array

This problem examines three algorithms for searching for a value x in

an unsorted array A consisting of n elements.

Consider the following randomized strategy: pick a random index i

into A. If A[ i] = x, then terminate; otherwise, continue the search by picking a new random index into A. Continue picking random indices

into A until you find an index j such that A[ j] = x or until every element of A has been checked. This strategy may examine a given element more

than once, because it picks from the whole set of indices each time.

a. Write pseudocode for a procedure RANDOM-SEARCH to

implement the strategy above. Be sure that your algorithm terminates

when all indices into A have been picked.

b. Suppose that there is exactly one index i such that A[ i] = x. What is the expected number of indices into A that must be picked before x is

found and RANDOM-SEARCH terminates?

c. Generalizing your solution to part (b), suppose that there are k ≥ 1

indices i such that A[ i] = x. What is the expected number of indices into A that must be picked before x is found and RANDOM-SEARCH terminates? Your answer should be a function of n and k.

d. Suppose that there are no indices i such that A[ i] = x. What is the expected number of indices into A that must be picked before all

elements of A have been checked and RANDOM-SEARCH

terminates?

Now consider a deterministic linear search algorithm. The algorithm,

which we call DETERMINISTIC-SEARCH, searches A for x in order,

considering A[1], A[2], A[3], … , A[ n] until either it finds A[ i] = x or it reaches the end of the array. Assume that all possible permutations of

the input array are equally likely.

e. Suppose that there is exactly one index i such that A[ i] = x. What is the average-case running time of DETERMINISTIC-SEARCH?

What is the worst-case running time of DETERMINISTIC-

SEARCH?

f. Generalizing your solution to part (e), suppose that there are k ≥ 1

indices i such that A[ i] = x. What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time

of DETERMINISTIC-SEARCH? Your answer should be a function

of n and k.

g. Suppose that there are no indices i such that A[ i] = x. What is the average-case running time of DETERMINISTIC-SEARCH? What is

the worst-case running time of DETERMINISTIC-SEARCH?

Finally, consider a randomized algorithm SCRAMBLE-SEARCH that

first randomly permutes the input array and then runs the deterministic

linear search given above on the resulting permuted array.

h. Letting k be the number of indices i such that A[ i] = x, give the worst-case and expected running times of SCRAMBLE-SEARCH for the

cases in which k = 0 and k = 1. Generalize your solution to handle the

case in which k ≥ 1.

i. Which of the three searching algorithms would you use? Explain your

answer.

Chapter notes

Bollobás [65], Hofri [223], and Spencer [420] contain a wealth of advanced probabilistic techniques. The advantages of randomized

algorithms are discussed and surveyed by Karp [249] and Rabin [372].

The textbook by Motwani and Raghavan [336] gives an extensive treatment of randomized algorithms.

The RANDOMLY-PERMUTE procedure is by Durstenfeld [128], based on an earlier procedure by Fisher and Yates [143, p. 34].

Several variants of the hiring problem have been widely studied.

These problems are more commonly referred to as “secretary

problems.” Examples of work in this area are the paper by Ajtai,

Megiddo, and Waarts [11] and another by Kleinberg [258], which ties the secretary problem to online ad auctions.

Part II Sorting and Order Statistics


Introduction

This part presents several algorithms that solve the following sorting

problem:

Input: A sequence of n numbers 〈 a1, a2, … , an〉.

Output: A permutation (reordering) 〈 a′1, a′2, … , a′n〉 of the input sequence such that a′1 ≤ a′2 ≤ ⋯ ≤ a′n.

The input sequence is usually an n-element array, although it may be represented in some other fashion, such as a linked list.

The structure of the data

In practice, the numbers to be sorted are rarely isolated values. Each is

usually part of a collection of data called a record. Each record contains

a key, which is the value to be sorted. The remainder of the record consists of satellite data, which are usually carried around with the key.

In practice, when a sorting algorithm permutes the keys, it must

permute the satellite data as well. If each record includes a large amount

of satellite data, it often pays to permute an array of pointers to the

records rather than the records themselves in order to minimize data

movement.

In a sense, it is these implementation details that distinguish an

algorithm from a full-blown program. A sorting algorithm describes the

method to determine the sorted order, regardless of whether what’s

being sorted are individual numbers or large records containing many

bytes of satellite data. Thus, when focusing on the problem of sorting, we typically assume that the input consists only of numbers. Translating

an algorithm for sorting numbers into a program for sorting records is

conceptually straightforward, although in a given engineering situation

other subtleties may make the actual programming task a challenge.

Why sorting?

Many computer scientists consider sorting to be the most fundamental

problem in the study of algorithms. There are several reasons:

Sometimes an application inherently needs to sort information.

For example, in order to prepare customer statements, banks need

to sort checks by check number.

Algorithms often use sorting as a key subroutine. For example, a

program that renders graphical objects which are layered on top

of each other might have to sort the objects according to an

“above” relation so that it can draw these objects from bottom to

top. We will see numerous algorithms in this text that use sorting

as a subroutine.

We can draw from among a wide variety of sorting algorithms,

and they employ a rich set of techniques. In fact, many important

techniques used throughout algorithm design appear in sorting

algorithms that have been developed over the years. In this way,

sorting is also a problem of historical interest.

We can prove a nontrivial lower bound for sorting (as we’ll do in

Chapter 8). Since the best upper bounds match the lower bound

asymptotically, we can conclude that certain of our sorting

algorithms are asymptotically optimal. Moreover, we can use the

lower bound for sorting to prove lower bounds for various other

problems.

Many engineering issues come to the fore when implementing

sorting algorithms. The fastest sorting program for a particular

situation may depend on many factors, such as prior knowledge

about the keys and satellite data, the memory hierarchy (caches

and virtual memory) of the host computer, and the software

environment. Many of these issues are best dealt with at the algorithmic level, rather than by “tweaking” the code.

Sorting algorithms

We introduced two algorithms that sort n real numbers in Chapter 2.

Insertion sort takes Θ( n^2) time in the worst case. Because its inner loops

are tight, however, it is a fast sorting algorithm for small input sizes.

Moreover, unlike merge sort, it sorts in place, meaning that at most a

constant number of elements of the input array are ever stored outside

the array, which can be advantageous for space efficiency. Merge sort

has a better asymptotic running time, Θ( n lg n), but the MERGE

procedure it uses does not operate in place. (We’ll see a parallelized

version of merge sort in Section 26.3. )

This part introduces two more algorithms that sort arbitrary real

numbers. Heapsort, presented in Chapter 6, sorts n numbers in place in O( n lg n) time. It uses an important data structure, called a heap, which can also implement a priority queue.

Quicksort, in Chapter 7, also sorts n numbers in place, but its worst-case running time is Θ( n^2). Its expected running time is Θ( n lg n), however, and it generally outperforms heapsort in practice. Like

insertion sort, quicksort has tight code, and so the hidden constant

factor in its running time is small. It is a popular algorithm for sorting

large arrays.

Insertion sort, merge sort, heapsort, and quicksort are all

comparison sorts: they determine the sorted order of an input array by

comparing elements. Chapter 8 begins by introducing the decision-tree model in order to study the performance limitations of comparison

sorts. Using this model, we prove a lower bound of Ω( n lg n) on the worst-case running time of any comparison sort on n inputs, thus

showing that heapsort and merge sort are asymptotically optimal

comparison sorts.

Chapter 8 then goes on to show that we might be able to beat this

lower bound of Ω( n lg n) if an algorithm can gather information about

the sorted order of the input by means other than comparing elements.

The counting sort algorithm, for example, assumes that the input

numbers belong to the set {0, 1, … , k}. By using array indexing as a tool for determining relative order, counting sort can sort n numbers in

Θ( k + n) time. Thus, when k = O( n), counting sort runs in time that is linear in the size of the input array. A related algorithm, radix sort, can

be used to extend the range of counting sort. If there are n integers to

sort, each integer has d digits, and each digit can take on up to k possible values, then radix sort can sort the numbers in Θ( d( n + k)) time.

When d is a constant and k is O( n), radix sort runs in linear time. A third algorithm, bucket sort, requires knowledge of the probabilistic

distribution of numbers in the input array. It can sort n real numbers uniformly distributed in the half-open interval [0, 1) in average-case

O( n) time.
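The counting-sort idea described above (using array indexing to determine relative order) can be sketched briefly. This simplified version of ours sorts bare keys rather than full records with satellite data:

```python
def counting_sort(A, k):
    """Sort a list of integers drawn from {0, 1, ..., k} in Theta(k + n)
    time by tallying how often each key occurs."""
    count = [0] * (k + 1)
    for key in A:
        count[key] += 1                  # tally occurrences of each key
    out = []
    for key in range(k + 1):
        out.extend([key] * count[key])   # emit each key count[key] times
    return out

print(counting_sort([4, 1, 3, 4, 0, 2, 1], 4))  # prints [0, 1, 1, 2, 3, 4, 4]
```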

The table on the following page summarizes the running times of the

sorting algorithms from Chapters 2 and 6–8. As usual, n denotes the number of items to sort. For counting sort, the items to sort are integers

in the set {0, 1, … , k}. For radix sort, each item is a d-digit number, where each digit takes on k possible values. For bucket sort, we assume

that the keys are real numbers uniformly distributed in the half-open

interval [0, 1). The rightmost column gives the average-case or expected

running time, indicating which one it gives when it differs from the

worst-case running time. We omit the average-case running time of

heapsort because we do not analyze it in this book.

Algorithm        Worst-case running time    Average-case/expected running time
Insertion sort   Θ(n^2)                     Θ(n^2)
Merge sort       Θ(n lg n)                  Θ(n lg n)
Heapsort         O(n lg n)
Quicksort        Θ(n^2)                     Θ(n lg n) (expected)
Counting sort    Θ(k + n)                   Θ(k + n)
Radix sort       Θ(d(n + k))                Θ(d(n + k))
Bucket sort      Θ(n^2)                     Θ(n) (average-case)

Order statistics

The i th order statistic of a set of n numbers is the i th smallest number in the set. You can, of course, select the i th order statistic by sorting the

input and indexing the i th element of the output. With no assumptions

about the input distribution, this method runs in Ω( n lg n) time, as the lower bound proved in Chapter 8 shows.

Chapter 9 shows how to find the i th smallest element in O( n) time, even when the elements are arbitrary real numbers. We present a

randomized algorithm with tight pseudocode that runs in Θ( n^2) time in

the worst case, but whose expected running time is O( n). We also give a more complicated algorithm that runs in O( n) worst-case time.

Background

Although most of this part does not rely on difficult mathematics, some

sections do require mathematical sophistication. In particular, analyses

of quicksort, bucket sort, and the order-statistic algorithm use

probability, which is reviewed in Appendix C, and the material on probabilistic analysis and randomized algorithms in Chapter 5.

6 Heapsort

This chapter introduces another sorting algorithm: heapsort. Like

merge sort, but unlike insertion sort, heapsort’s running time is O( n lg n). Like insertion sort, but unlike merge sort, heapsort sorts in place: only a constant number of array elements are stored outside the input

array at any time. Thus, heapsort combines the better attributes of the

two sorting algorithms we have already discussed.

Heapsort also introduces another algorithm design technique: using

a data structure, in this case one we call a “heap,” to manage

information. Not only is the heap data structure useful for heapsort, but

it also makes an efficient priority queue. The heap data structure will

reappear in algorithms in later chapters.

The term “heap” was originally coined in the context of heapsort,

but it has since come to refer to “garbage-collected storage,” such as the

programming languages Java and Python provide. Please don’t be

confused. The heap data structure is not garbage-collected storage. This

book is consistent in using the term “heap” to refer to the data

structure, not the storage class.

6.1 Heaps

The (binary) heap data structure is an array object that we can view as a

nearly complete binary tree (see Section B.5.3), as shown in Figure 6.1.

Each node of the tree corresponds to an element of the array. The tree is

completely filled on all levels except possibly the lowest, which is filled


from the left up to a point. An array A[1 : n] that represents a heap is an object with an attribute A.heap-size, which represents how many

elements in the heap are stored within array A. That is, although A[1 : n]

may contain numbers, only the elements in A[1 : A.heap-size], where 0 ≤ A.heap-size ≤ n, are valid elements of the heap. If A.heap-size = 0, then the heap is empty. The root of the tree is A[1], and given the index i of a node, there’s a simple way to compute the indices of its parent, left

child, and right child with the one-line procedures PARENT, LEFT,

and RIGHT.

Figure 6.1 A max-heap viewed as (a) a binary tree and (b) an array. The number within the circle at each node in the tree is the value stored at that node. The number above a node is the corresponding index in the array. Above and below the array are lines showing parent-child relationships, with parents always to the left of their children. The tree has height 3, and the node at index 4 (with value 8) has height 1.

PARENT( i)
1  return ⌊ i/2⌋

LEFT( i)
1  return 2 i

RIGHT( i)
1  return 2 i + 1

On most computers, the LEFT procedure can compute 2 i in one

instruction by simply shifting the binary representation of i left by one

bit position. Similarly, the RIGHT procedure can quickly compute 2 i +

1 by shifting the binary representation of i left by one bit position and

then adding 1. The PARENT procedure can compute ⌊ i/2⌋ by shifting i

right one bit position. Good implementations of heapsort often

implement these procedures as macros or inline procedures.
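In a language with bitwise operators, the shift-based versions might look like this (a sketch using 1-based heap indices, as in the text):

```python
def parent(i):
    return i >> 1        # floor(i/2): shift right one bit

def left(i):
    return i << 1        # 2*i: shift left one bit

def right(i):
    return (i << 1) | 1  # 2*i + 1: shift left, then set the low bit

assert (parent(4), left(4), right(4)) == (2, 8, 9)
assert parent(left(7)) == 7 and parent(right(7)) == 7
```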

There are two kinds of binary heaps: max-heaps and min-heaps. In

both kinds, the values in the nodes satisfy a heap property, the specifics

of which depend on the kind of heap. In a max-heap, the max-heap property is that for every node i other than the root,

A[PARENT( i)] ≥ A[ i],

that is, the value of a node is at most the value of its parent. Thus, the

largest element in a max-heap is stored at the root, and the subtree

rooted at a node contains values no larger than that contained at the

node itself. A min-heap is organized in the opposite way: the min-heap

property is that for every node i other than the root,

A[PARENT( i)] ≤ A[ i].

The smallest element in a min-heap is at the root.

The heapsort algorithm uses max-heaps. Min-heaps commonly

implement priority queues, which we discuss in Section 6.5. We’ll be precise in specifying whether we need a max-heap or a min-heap for any

particular application, and when properties apply to either max-heaps

or min-heaps, we just use the term “heap.”
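The max-heap property is straightforward to verify programmatically. A small checker sketch of ours, using the array representation with 1-based indices:

```python
def is_max_heap(A):
    """Check A[PARENT(i)] >= A[i] for every node i > 1, where A is the
    1-based array representation (A[0] is unused here for clarity)."""
    n = len(A) - 1                       # number of heap elements
    return all(A[i // 2] >= A[i] for i in range(2, n + 1))

# The array of Figure 6.1 satisfies the max-heap property;
# an ascending array does not (each parent is smaller than its children).
assert is_max_heap([None, 16, 14, 10, 8, 7, 9, 3, 2, 4, 1])
assert not is_max_heap([None, 1, 2, 3, 4, 5])
```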

Viewing a heap as a tree, we define the height of a node in a heap to

be the number of edges on the longest simple downward path from the

node to a leaf, and we define the height of the heap to be the height of

its root. Since a heap of n elements is based on a complete binary tree,

its height is Θ(lg n) (see Exercise 6.1-2). As we’ll see, the basic operations on heaps run in time at most proportional to the height of

the tree and thus take O(lg n) time. The remainder of this chapter presents some basic procedures and shows how they are used in a

sorting algorithm and a priority-queue data structure.

The MAX-HEAPIFY procedure, which runs in O(lg n) time, is

the key to maintaining the max-heap property.

The BUILD-MAX-HEAP procedure, which runs in linear time,

produces a max-heap from an unordered input array.

The HEAPSORT procedure, which runs in O( n lg n) time, sorts an array in place.

The procedures MAX-HEAP-INSERT, MAX-HEAP-EXTRACT-MAX, MAX-HEAP-INCREASE-KEY, and MAX-HEAP-MAXIMUM allow the heap data structure to implement a priority queue. They run in O(lg n) time plus the time for mapping between objects being inserted into the priority queue and indices in the heap.

Exercises

6.1-1

What are the minimum and maximum numbers of elements in a heap of

height h?

6.1-2

Show that an n-element heap has height ⌊lg n⌋.

6.1-3

Show that in any subtree of a max-heap, the root of the subtree contains

the largest value occurring anywhere in that subtree.

6.1-4

Where in a max-heap might the smallest element reside, assuming that

all elements are distinct?

6.1-5

At which levels in a max-heap might the kth largest element reside, for 2 ≤ k ≤ ⌊n/2⌋, assuming that all elements are distinct?

6.1-6

Is an array that is in sorted order a min-heap?

6.1-7

Is the array with values 〈33, 19, 20, 15, 13, 10, 2, 13, 16, 12〉 a max-heap?

6.1-8

Show that, with the array representation for storing an n-element heap,

the leaves are the nodes indexed by ⌊n/2⌋ + 1, ⌊n/2⌋ + 2, … , n.

6.2 Maintaining the heap property

The procedure MAX-HEAPIFY on the facing page maintains the max-

heap property. Its inputs are an array A with the heap-size attribute and an index i into the array. When it is called, MAX-HEAPIFY assumes

that the binary trees rooted at LEFT( i) and RIGHT( i) are max-heaps,

but that A[ i] might be smaller than its children, thus violating the max-heap property. MAX-HEAPIFY lets the value at A[ i] “float down” in

the max-heap so that the subtree rooted at index i obeys the max-heap

property.

Figure 6.2 illustrates the action of MAX-HEAPIFY. Each step determines the largest of the elements A[ i], A[LEFT( i)], and A[RIGHT( i)] and stores the index of the largest element in largest. If A[ i] is largest, then the subtree rooted at node i is already a max-heap and nothing else needs to be done. Otherwise, one of the two children

contains the largest element. Positions i and largest swap their contents, which causes node i and its children to satisfy the max-heap property.

The node indexed by largest, however, just had its value decreased, and

thus the subtree rooted at largest might violate the max-heap property.

Consequently, MAX-HEAPIFY calls itself recursively on that subtree.


Figure 6.2 The action of MAX-HEAPIFY( A, 2), where A.heap-size = 10. The node that potentially violates the max-heap property is shown in blue. (a) The initial configuration, with A[2] at node i = 2 violating the max-heap property since it is not larger than both children. The max-heap property is restored for node 2 in (b) by exchanging A[2] with A[4], which destroys the max-heap property for node 4. The recursive call MAX-HEAPIFY( A, 4) now has i = 4. After A[4] and A[9] are swapped, as shown in (c), node 4 is fixed up, and the recursive call MAX-HEAPIFY( A, 9) yields no further change to the data structure.

MAX-HEAPIFY(A, i)
1  l = LEFT(i)
2  r = RIGHT(i)
3  if l ≤ A.heap-size and A[l] > A[i]
4      largest = l
5  else largest = i
6  if r ≤ A.heap-size and A[r] > A[largest]
7      largest = r
8  if largest ≠ i
9      exchange A[i] with A[largest]
10     MAX-HEAPIFY(A, largest)

To analyze MAX-HEAPIFY, let T ( n) be the worst-case running

time that the procedure takes on a subtree of size at most n. For a tree

rooted at a given node i, the running time is the Θ(1) time to fix up the

relationships among the elements A[ i], A[LEFT( i)], and A[RIGHT( i)], plus the time to run MAX-HEAPIFY on a subtree rooted at one of the

children of node i (assuming that the recursive call occurs). The

children’s subtrees each have size at most 2 n/3 (see Exercise 6.2-2), and

therefore we can describe the running time of MAX-HEAPIFY by the

recurrence

T(n) ≤ T(2n/3) + Θ(1).      (6.1)

The solution to this recurrence, by case 2 of the master theorem

(Theorem 4.1 on page 102), is T ( n) = O(lg n). Alternatively, we can characterize the running time of MAX-HEAPIFY on a node of height

h as O( h).

Exercises

6.2-1

Using Figure 6.2 as a model, illustrate the operation of MAX-HEAPIFY( A, 3) on the array A = 〈27, 17, 3, 16, 13, 10, 1, 5, 7, 12, 4, 8, 9, 0〉.

6.2-2

Show that each child of the root of an n-node heap is the root of a subtree containing at most 2 n/3 nodes. What is the smallest constant α

such that each subtree has at most α n nodes? How does that affect the

recurrence (6.1) and its solution?

6.2-3

Starting with the procedure MAX-HEAPIFY, write pseudocode for the

procedure MIN-HEAPIFY( A, i), which performs the corresponding

manipulation on a min-heap. How does the running time of MIN-HEAPIFY compare with that of MAX-HEAPIFY?

6.2-4

What is the effect of calling MAX-HEAPIFY( A, i) when the element A[ i] is larger than its children?

6.2-5

What is the effect of calling MAX-HEAPIFY( A, i) for i > A.heap-size/2?

6.2-6

The code for MAX-HEAPIFY is quite efficient in terms of constant

factors, except possibly for the recursive call in line 10, for which some

compilers might produce inefficient code. Write an efficient MAX-

HEAPIFY that uses an iterative control construct (a loop) instead of

recursion.

6.2-7

Show that the worst-case running time of MAX-HEAPIFY on a heap

of size n is Ω(lg n). ( Hint: For a heap with n nodes, give node values that cause MAX-HEAPIFY to be called recursively at every node on a

simple path from the root down to a leaf.)

6.3 Building a heap

The procedure BUILD-MAX-HEAP converts an array A[1 : n] into a

max-heap by calling MAX-HEAPIFY in a bottom-up manner. Exercise

6.1-8 says that the elements in the subarray A[⌊n/2⌋ + 1 : n] are all leaves of the tree, and so each is a 1-element heap to begin with. BUILD-MAX-HEAP goes through the remaining nodes of the tree and runs

MAX-HEAPIFY on each one. Figure 6.3 shows an example of the action of BUILD-MAX-HEAP.

BUILD-MAX-HEAP(A, n)
1  A.heap-size = n
2  for i = ⌊n/2⌋ downto 1
3      MAX-HEAPIFY(A, i)
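A Python sketch of the same bottom-up construction (0-based indexing; the float-down helper is repeated here so the block is self-contained):

```python
def max_heapify(a, i, heap_size):
    """Float a[i] down until the subtree rooted at i is a max-heap."""
    l, r = 2 * i + 1, 2 * i + 2
    largest = l if l < heap_size and a[l] > a[i] else i
    if r < heap_size and a[r] > a[largest]:
        largest = r
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, largest, heap_size)

def build_max_heap(a):
    """Convert an arbitrary list into a max-heap, bottom-up."""
    n = len(a)
    # Indices n//2 .. n-1 are leaves; heapify the internal nodes in
    # decreasing index order, so each call sees two max-heap subtrees.
    for i in range(n // 2 - 1, -1, -1):
        max_heapify(a, i, n)
```

On the input array of Figure 6.3, ⟨4, 1, 3, 2, 16, 9, 10, 14, 8, 7⟩, this produces the max-heap of the figure's final panel.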

To show why BUILD-MAX-HEAP works correctly, we use the

following loop invariant:

At the start of each iteration of the for loop of lines 2–3, each

node i + 1, i + 2, … , n is the root of a max-heap.

We need to show that this invariant is true prior to the first loop

iteration, that each iteration of the loop maintains the invariant, that

the loop terminates, and that the invariant provides a useful property to

show correctness when the loop terminates.

Initialization: Prior to the first iteration of the loop, i = ⌊n/2⌋. Each node ⌊n/2⌋ + 1, ⌊n/2⌋ + 2, … , n is a leaf and is thus the root of a trivial max-heap.

Maintenance: To see that each iteration maintains the loop invariant,

observe that the children of node i are numbered higher than i. By the

loop invariant, therefore, they are both roots of max-heaps. This is

precisely the condition required for the call MAX-HEAPIFY( A, i) to

make node i a max-heap root. Moreover, the MAX-HEAPIFY call

preserves the property that nodes i + 1, i + 2, … , n are all roots of max-heaps. Decrementing i in the for loop update reestablishes the

loop invariant for the next iteration.


Figure 6.3 The operation of BUILD-MAX-HEAP, showing the data structure before the call to MAX-HEAPIFY in line 3 of BUILD-MAX-HEAP. The node indexed by i in each iteration is shown in blue. (a) A 10-element input array A and the binary tree it represents. The loop index i refers to node 5 before the call MAX-HEAPIFY( A, i). (b) The data structure that results. The loop index i for the next iteration refers to node 4. (c)–(e) Subsequent iterations of the for loop in BUILD-MAX-HEAP. Observe that whenever MAX-HEAPIFY is called on a node, the two

subtrees of that node are both max-heaps. (f) The max-heap after BUILD-MAX-HEAP

finishes.

Termination: The loop makes exactly ⌊n/2⌋ iterations, and so it

terminates. At termination, i = 0. By the loop invariant, each node 1,

2, … , n is the root of a max-heap. In particular, node 1 is.

We can compute a simple upper bound on the running time of

BUILD-MAX-HEAP as follows. Each call to MAX-HEAPIFY costs

O(lg n) time, and BUILD-MAX-HEAP makes O( n) such calls. Thus, the running time is O( n lg n). This upper bound, though correct, is not as tight as it can be.


We can derive a tighter asymptotic bound by observing that the time

for MAX-HEAPIFY to run at a node varies with the height of the node

in the tree, and that the heights of most nodes are small. Our tighter

analysis relies on the properties that an n-element heap has height ⌊lg n⌋ (see Exercise 6.1-2) and at most ⌈n/2^(h+1)⌉ nodes of any height h (see Exercise 6.3-4).

The time required by MAX-HEAPIFY when called on a node of

height h is O( h). Letting c be the constant implicit in the asymptotic notation, we can express the total cost of BUILD-MAX-HEAP as

being bounded from above by

Σ_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ · ch.

As Exercise 6.3-2 shows, we have ⌈n/2^(h+1)⌉ ≥ 1/2 for 0 ≤ h ≤ ⌊lg n⌋. Since ⌈x⌉ ≤ 2x for any x ≥ 1/2, we have ⌈n/2^(h+1)⌉ ≤ n/2^h. We thus obtain

Σ_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ · ch ≤ Σ_{h=0}^{⌊lg n⌋} (n/2^h) · ch
                                 = cn Σ_{h=0}^{⌊lg n⌋} h/2^h
                                 ≤ cn Σ_{h=0}^{∞} h/2^h
                                 = 2cn      (since Σ_{h=0}^{∞} h/2^h = 2)
                                 = O(n).

Hence, we can build a max-heap from an unordered array in linear time.
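This bound is easy to sanity-check numerically. A small Python sketch (taking the constant c to be 1, and using integer arithmetic for the ceiling) evaluates the summation directly:

```python
def heapify_cost_bound(n):
    """Evaluate the summation bounding BUILD-MAX-HEAP's cost with c = 1:
    the sum of h * ceil(n / 2^(h+1)) over heights h = 0 .. floor(lg n)."""
    floor_lg_n = n.bit_length() - 1             # floor(lg n) for n >= 1
    # -(-n // d) is an exact integer ceiling of n / d.
    return sum(h * -(-n // (1 << (h + 1))) for h in range(floor_lg_n + 1))
```

By the derivation above, the value never exceeds 2n.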

To build a min-heap, use the procedure BUILD-MIN-HEAP, which

is the same as BUILD-MAX-HEAP but with the call to MAX-

HEAPIFY in line 3 replaced by a call to MIN-HEAPIFY (see Exercise

6.2-3). BUILD-MIN-HEAP produces a min-heap from an unordered

linear array in linear time.

Exercises

6.3-1

Using Figure 6.3 as a model, illustrate the operation of BUILD-MAX-

HEAP on the array A = 〈5, 3, 17, 10, 84, 19, 6, 22, 9〉.

6.3-2

Show that ⌈n/2^(h+1)⌉ ≥ 1/2 for 0 ≤ h ≤ ⌊lg n⌋.

6.3-3

Why does the loop index i in line 2 of BUILD-MAX-HEAP decrease

from ⌊n/2⌋ to 1 rather than increase from 1 to ⌊n/2⌋?

6.3-4

Show that there are at most ⌈n/2^(h+1)⌉ nodes of height h in any n-

element heap.

6.4 The heapsort algorithm

The heapsort algorithm, given by the procedure HEAPSORT, starts by

calling the BUILD-MAX-HEAP procedure to build a max-heap on the

input array A[1 : n]. Since the maximum element of the array is stored at the root A[1], HEAPSORT can place it into its correct final position by

exchanging it with A[ n]. If the procedure then discards node n from the heap—and it can do so by simply decrementing A.heap-size—the

children of the root remain max-heaps, but the new root element might

violate the max-heap property. To restore the max-heap property, the

procedure just calls MAX-HEAPIFY( A, 1), which leaves a max-heap in

A[1 : n – 1]. The HEAPSORT procedure then repeats this process for the max-heap of size n – 1 down to a heap of size 2. (See Exercise 6.4-2

for a precise loop invariant.)

HEAPSORT(A, n)
1  BUILD-MAX-HEAP(A, n)
2  for i = n downto 2
3      exchange A[1] with A[i]
4      A.heap-size = A.heap-size – 1
5      MAX-HEAPIFY(A, 1)
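The whole algorithm fits in a short Python sketch (0-based indexing, with the float-down helper inlined so the block is self-contained):

```python
def heapsort(a):
    """Sort the list a in place using a max-heap."""
    def max_heapify(i, heap_size):
        l, r = 2 * i + 1, 2 * i + 2
        largest = l if l < heap_size and a[l] > a[i] else i
        if r < heap_size and a[r] > a[largest]:
            largest = r
        if largest != i:
            a[i], a[largest] = a[largest], a[i]
            max_heapify(largest, heap_size)

    n = len(a)
    for i in range(n // 2 - 1, -1, -1):      # BUILD-MAX-HEAP
        max_heapify(i, n)
    for heap_size in range(n - 1, 0, -1):    # move each max to the end,
        a[0], a[heap_size] = a[heap_size], a[0]
        max_heapify(0, heap_size)            # then restore the smaller heap
```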

Figure 6.4 shows an example of the operation of HEAPSORT after

line 1 has built the initial max-heap. The figure shows the max-heap

before the first iteration of the for loop of lines 2–5 and after each

iteration.


Figure 6.4 The operation of HEAPSORT. (a) The max-heap data structure just after BUILD-

MAX-HEAP has built it in line 1. (b)–(j) The max-heap just after each call of MAX-HEAPIFY

in line 5, showing the value of i at that time. Only blue nodes remain in the heap. Tan nodes contain the largest values in the array, in sorted order. (k) The resulting sorted array A.

The HEAPSORT procedure takes O( n lg n) time, since the call to BUILD-MAX-HEAP takes O( n) time and each of the n – 1 calls to MAX-HEAPIFY takes O(lg n) time.

Exercises

6.4-1

Using Figure 6.4 as a model, illustrate the operation of HEAPSORT on the array A = 〈5, 13, 2, 25, 7, 17, 20, 8, 4〉.

6.4-2

Argue the correctness of HEAPSORT using the following loop

invariant:

At the start of each iteration of the for loop of lines 2–5, the subarray A[1 : i] is a max-heap containing the i smallest elements of A[1 : n], and the subarray A[i + 1 : n] contains the n – i largest elements of A[1 : n], sorted.

6.4-3

What is the running time of HEAPSORT on an array A of length n that

is already sorted in increasing order? How about if the array is already

sorted in decreasing order?

6.4-4

Show that the worst-case running time of HEAPSORT is Ω( n lg n).

6.4-5

Show that when all the elements of A are distinct, the best-case running

time of HEAPSORT is Ω( n lg n).

6.5 Priority queues

In Chapter 8, we will see that any comparison-based sorting algorithm requires Ω( n lg n) comparisons and hence Ω( n lg n) time. Therefore, heapsort is asymptotically optimal among comparison-based sorting

algorithms. Yet, a good implementation of quicksort, presented in

Chapter 7, usually beats it in practice. Nevertheless, the heap data structure itself has many uses. In this section, we present one of the

most popular applications of a heap: as an efficient priority queue. As

with heaps, priority queues come in two forms: max-priority queues and

min-priority queues. We’ll focus here on how to implement max-priority

queues, which are in turn based on max-heaps. Exercise 6.5-3 asks you

to write the procedures for min-priority queues.

A priority queue is a data structure for maintaining a set S of elements, each with an associated value called a key. A max-priority queue supports the following operations:

INSERT( S, x, k) inserts the element x with key k into the set S, which is equivalent to the operation S = S ⋃ { x}.

MAXIMUM( S) returns the element of S with the largest key.

EXTRACT-MAX( S) removes and returns the element of S with the

largest key.

INCREASE-KEY( S, x, k) increases the value of element x’s key to the new value k, which is assumed to be at least as large as x’s current key value.
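For comparison, Python's standard heapq module implements a binary min-heap, so one common way to obtain a max-priority queue is to negate keys on the way in. This is a sketch of the interface above, not the book's pseudocode; the tie-breaking counter is an added detail so that elements themselves are never compared.

```python
import heapq
from itertools import count

class MaxPriorityQueue:
    """Max-priority queue built on heapq's min-heap by negating keys."""
    def __init__(self):
        self._heap = []          # entries are (-key, tie-breaker, element)
        self._tie = count()

    def insert(self, x, key):
        heapq.heappush(self._heap, (-key, next(self._tie), x))

    def maximum(self):
        return self._heap[0][2]

    def extract_max(self):
        return heapq.heappop(self._heap)[2]
```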

Among their other applications, you can use max-priority queues to

schedule jobs on a computer shared among multiple users. The max-

priority queue keeps track of the jobs to be performed and their relative

priorities. When a job is finished or interrupted, the scheduler selects the

highest-priority job from among those pending by calling EXTRACT-

MAX. The scheduler can add a new job to the queue at any time by

calling INSERT.

Alternatively, a min-priority queue supports the operations INSERT,

MINIMUM, EXTRACT-MIN, and DECREASE-KEY. A min-

priority queue can be used in an event-driven simulator. The items in

the queue are events to be simulated, each with an associated time of

occurrence that serves as its key. The events must be simulated in order

of their time of occurrence, because the simulation of an event can cause

other events to be simulated in the future. The simulation program calls

EXTRACT-MIN at each step to choose the next event to simulate. As

new events are produced, the simulator inserts them into the min-

priority queue by calling INSERT. We’ll see other uses for min-priority

queues, highlighting the DECREASE-KEY operation, in Chapters 21

and 22.
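The simulation loop just described can be sketched with heapq. The handle callback and its return convention are illustrative assumptions, not part of the text: handling an event may schedule further (time, event) pairs, which go back into the min-priority queue keyed by time.

```python
import heapq

def simulate(initial_events, handle):
    """Event-driven simulation: always process the earliest event next.

    initial_events is an iterable of (time, event) pairs. handle(time, event)
    may return an iterable of future (time, event) pairs to schedule.
    """
    pending = list(initial_events)
    heapq.heapify(pending)
    order = []
    while pending:
        t, ev = heapq.heappop(pending)            # EXTRACT-MIN
        order.append((t, ev))
        for t2, ev2 in handle(t, ev) or ():
            assert t2 >= t, "events may only be scheduled in the future"
            heapq.heappush(pending, (t2, ev2))    # INSERT
    return order
```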

When you use a heap to implement a priority queue within a given

application, elements of the priority queue correspond to objects in the

application. Each object contains a key. If the priority queue is

implemented by a heap, you need to determine which application object

corresponds to a given heap element, and vice versa. Because the heap

elements are stored in an array, you need a way to map application

objects to and from array indices.

One way to map between application objects and heap elements uses

handles, which are additional information stored in the objects and heap

elements that give enough information to perform the mapping.

Handles are often implemented to be opaque to the surrounding code,

thereby maintaining an abstraction barrier between the application and

the priority queue. For example, the handle within an application object

might contain the corresponding index into the heap array. But since

only the code for the priority queue accesses this index, the index is

entirely hidden from the application code. Because heap elements

change locations within the array during heap operations, an actual

implementation of the priority queue, upon relocating a heap element,

must also update the array indices in the corresponding handles.

Conversely, each element in the heap might contain a pointer to the

corresponding application object, but the heap element knows this

pointer as only an opaque handle and the application maps this handle

to an application object. Typically, the worst-case overhead for

maintaining handles is O(1) per access.

As an alternative to incorporating handles in application objects, you

can store within the priority queue a mapping from application objects

to array indices in the heap. The advantage of doing so is that the

mapping is contained entirely within the priority queue, so that the

application objects need no further embellishment. The disadvantage

lies in the additional cost of establishing and maintaining the mapping.

One option for the mapping is a hash table (see Chapter 11). 1 The added expected time for a hash table to map an object to an array index

is just O(1), though the worst-case time can be as bad as Θ( n).
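One way to sketch this dictionary-based alternative in Python is a mapping from (hashable, distinct) objects to their array slots, updated on every swap. The class and method names here are invented for illustration; the essential point is that every swap of array slots must go through a helper that keeps the mapping current.

```python
class HeapHandles:
    """Heap array plus an object-to-index map standing in for handles."""
    def __init__(self):
        self.array = []
        self.index_of = {}       # object -> current array index, O(1) expected

    def append(self, x):
        self.index_of[x] = len(self.array)
        self.array.append(x)

    def swap(self, i, j):
        # Update the mapping first, then the array, so both stay consistent.
        a = self.array
        self.index_of[a[i]], self.index_of[a[j]] = j, i
        a[i], a[j] = a[j], a[i]
```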

Let’s see how to implement the operations of a max-priority queue

using a max-heap. In the previous sections, we treated the array

elements as the keys to be sorted, implicitly assuming that any satellite data moved with the corresponding keys. When a heap implements a

priority queue, we instead treat each array element as a pointer to an

object in the priority queue, so that the object is analogous to the

satellite data when sorting. We further assume that each such object has

an attribute key, which determines where in the heap the object belongs.

For a heap implemented by an array A, we refer to A[ i]. key.

The procedure MAX-HEAP-MAXIMUM on the facing page

implements the MAXIMUM operation in Θ(1) time, and MAX-HEAP-

EXTRACT-MAX implements the operation EXTRACT-MAX. MAX-

HEAP-EXTRACT-MAX is similar to the for loop body (lines 3–5) of

the HEAPSORT procedure. We implicitly assume that MAX-

HEAPIFY compares priority-queue objects based on their key

attributes. We also assume that when MAX-HEAPIFY exchanges

elements in the array, it is exchanging pointers and also that it updates

the mapping between objects and array indices. The running time of

MAX-HEAP-EXTRACT-MAX is O(lg n), since it performs only a

constant amount of work on top of the O(lg n) time for MAX-

HEAPIFY, plus whatever overhead is incurred within MAX-

HEAPIFY for mapping priority-queue objects to array indices.

The procedure MAX-HEAP-INCREASE-KEY on page 176

implements the INCREASE-KEY operation. It first verifies that the

new key k will not cause the key in the object x to decrease, and if there is no problem, it gives x the new key value. The procedure then finds the

index i in the array corresponding to object x, so that A[ i] is x. Because increasing the key of A[ i] might violate the max-heap property, the procedure then, in a manner reminiscent of the insertion loop (lines 5–

7) of INSERTION-SORT on page 19, traverses a simple path from this

node toward the root to find a proper place for the newly increased key.

As MAX-HEAP-INCREASE-KEY traverses this path, it repeatedly

compares an element’s key to that of its parent, exchanging pointers and

continuing if the element’s key is larger, and terminating if the element’s

key is smaller, since the max-heap property now holds. (See Exercise

6.5-7 for a precise loop invariant.) Like MAX-HEAPIFY when used in

a priority queue, MAX-HEAP-INCREASE-KEY updates the

information that maps objects to array indices when array elements are

exchanged. Figure 6.5 shows an example of a MAX-HEAP-INCREASE-KEY operation. In addition to the overhead for mapping

priority queue objects to array indices, the running time of MAX-

HEAP-INCREASE-KEY on an n-element heap is O(lg n), since the path traced from the node updated in line 3 to the root has length O(lg

n).

MAX-HEAP-MAXIMUM(A)
1  if A.heap-size < 1
2      error “heap underflow”
3  return A[1]

MAX-HEAP-EXTRACT-MAX( A)

1 max = MAX-HEAP-MAXIMUM( A)

2 A[1] = A[ A.heap-size]

3 A.heap-size = A.heap-size – 1

4 MAX-HEAPIFY( A, 1)

5 return max

The procedure MAX-HEAP-INSERT on the next page implements

the INSERT operation. It takes as inputs the array A implementing the

max-heap, the new object x to be inserted into the max-heap, and the

size n of array A. The procedure first verifies that the array has room for the new element. It then expands the max-heap by adding to the tree a

new leaf whose key is –∞. Then it calls MAX-HEAP-INCREASE-KEY

to set the key of this new element to its correct value and maintain the

max-heap property. The running time of MAX-HEAP-INSERT on an

n-element heap is O(lg n) plus the overhead for mapping priority queue objects to indices.

In summary, a heap can support any priority-queue operation on a

set of size n in O(lg n) time, plus the overhead for mapping priority queue objects to array indices.

MAX-HEAP-INCREASE-KEY(A, x, k)
1  if k < x.key
2      error “new key is smaller than current key”
3  x.key = k
4  find the index i in array A where object x occurs
5  while i > 1 and A[PARENT(i)].key < A[i].key
6      exchange A[i] with A[PARENT(i)], updating the information that maps priority queue objects to array indices
7      i = PARENT(i)

MAX-HEAP-INSERT(A, x, n)
1  if A.heap-size == n
2      error “heap overflow”
3  A.heap-size = A.heap-size + 1
4  k = x.key
5  x.key = –∞
6  A[A.heap-size] = x
7  map x to index heap-size in the array
8  MAX-HEAP-INCREASE-KEY(A, x, k)
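A compact Python sketch of these two operations over objects with keys (0-based indices, so PARENT(i) = (i - 1) // 2; an object-to-index dictionary stands in for handles; the class and attribute names are illustrative):

```python
import math

class MaxHeapPQ:
    """Max-heap priority queue over hashable, distinct objects."""
    def __init__(self):
        self.a = []              # heap array of objects
        self.key = {}            # object -> key
        self.index = {}          # object -> index in self.a (the "handle")

    def _swap(self, i, j):
        self.a[i], self.a[j] = self.a[j], self.a[i]
        self.index[self.a[i]], self.index[self.a[j]] = i, j

    def increase_key(self, x, k):
        if k < self.key[x]:
            raise ValueError("new key is smaller than current key")
        self.key[x] = k
        i = self.index[x]
        # Float x up while its key beats its parent's key.
        while i > 0 and self.key[self.a[(i - 1) // 2]] < self.key[self.a[i]]:
            self._swap(i, (i - 1) // 2)
            i = (i - 1) // 2

    def insert(self, x, k):
        # New leaf starts at -infinity; its key is then raised to k.
        self.a.append(x)
        self.index[x] = len(self.a) - 1
        self.key[x] = -math.inf
        self.increase_key(x, k)

    def maximum(self):
        return self.a[0]
```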

Exercises

6.5-1

Suppose that the objects in a max-priority queue are just keys. Illustrate

the operation of MAX-HEAP-EXTRACT-MAX on the heap A = 〈15,

13, 9, 5, 12, 8, 7, 4, 0, 6, 2, 1〉.

6.5-2

Suppose that the objects in a max-priority queue are just keys. Illustrate

the operation of MAX-HEAP-INSERT( A, 10) on the heap A = 〈15, 13,

9, 5, 12, 8, 7, 4, 0, 6, 2, 1〉.

6.5-3

Write pseudocode to implement a min-priority queue with a min-heap

by writing the procedures MIN-HEAP-MINIMUM, MIN-HEAP-

EXTRACT-MIN, MIN-HEAP-DECREASE-KEY, and MIN-HEAP-

INSERT.


6.5-4

Write pseudocode for the procedure MAX-HEAP-DECREASE-

KEY( A, x, k) in a max-heap. What is the running time of your procedure?

Figure 6.5 The operation of MAX-HEAP-INCREASE-KEY. Only the key of each element in

the priority queue is shown. The node indexed by i in each iteration is shown in blue. (a) The max-heap of Figure 6.4(a) with i indexing the node whose key is about to be increased. (b) This node has its key increased to 15. (c) After one iteration of the while loop of lines 5–7, the node and its parent have exchanged keys, and the index i moves up to the parent. (d) The max-heap after one more iteration of the while loop. At this point, A[PARENT( i)] ≥ A[ i]. The max-heap property now holds and the procedure terminates.

6.5-5

Why does MAX-HEAP-INSERT bother setting the key of the inserted

object to –∞ in line 5 given that line 8 will set the object’s key to the desired value?

6.5-6

Professor Uriah suggests replacing the while loop of lines 5–7 in MAX-

HEAP-INCREASE-KEY by a call to MAX-HEAPIFY. Explain the

flaw in the professor’s idea.

6.5-7

Argue the correctness of MAX-HEAP-INCREASE-KEY using the

following loop invariant:

At the start of each iteration of the while loop of lines 5–7:

a. If both nodes PARENT(i) and LEFT(i) exist, then A[PARENT(i)].key ≥ A[LEFT(i)].key.

b. If both nodes PARENT(i) and RIGHT(i) exist, then A[PARENT(i)].key ≥ A[RIGHT(i)].key.

c. The subarray A[1 : A.heap-size] satisfies the max-heap property,

except that there may be one violation, which is that A[ i]. key may be greater than A[PARENT( i)]. key.

You may assume that the subarray A[1 : A.heap-size] satisfies the max-

heap property at the time MAX-HEAP-INCREASE-KEY is called.

6.5-8

Each exchange operation on line 6 of MAX-HEAP-INCREASE-KEY

typically requires three assignments, not counting the updating of the

mapping from objects to array indices. Show how to use the idea of the

inner loop of INSERTION-SORT to reduce the three assignments to

just one assignment.

6.5-9

Show how to implement a first-in, first-out queue with a priority queue.

Show how to implement a stack with a priority queue. (Queues and

stacks are defined in Section 10.1.3.)

6.5-10

The operation MAX-HEAP-DELETE( A, x) deletes the object x from

max-heap A. Give an implementation of MAX-HEAP-DELETE for an

n-element max-heap that runs in O(lg n) time plus the overhead for mapping priority queue objects to array indices.

6.5-11

Give an O( n lg k)-time algorithm to merge k sorted lists into one sorted list, where n is the total number of elements in all the input lists. ( Hint: Use a min-heap for k-way merging.)

Problems

6-1 Building a heap using insertion

One way to build a heap is by repeatedly calling MAX-HEAP-INSERT

to insert the elements into the heap. Consider the procedure BUILD-

MAX-HEAP′ on the facing page. It assumes that the objects being

inserted are just the heap elements.

BUILD-MAX-HEAP′(A, n)
1  A.heap-size = 1
2  for i = 2 to n
3      MAX-HEAP-INSERT(A, A[i], n)

a. Do the procedures BUILD-MAX-HEAP and BUILD-MAX-HEAP′

always create the same heap when run on the same input array? Prove

that they do, or provide a counterexample.

b. Show that in the worst case, BUILD-MAX-HEAP′ requires Θ( n lg n) time to build an n-element heap.

6-2 Analysis of d-ary heaps

A d-ary heap is like a binary heap, but (with one possible exception) nonleaf nodes have d children instead of two children. In all parts of this problem, assume that the time to maintain the mapping between

objects and heap elements is O(1) per operation.

a. Describe how to represent a d-ary heap in an array.

b. Using Θ-notation, express the height of a d-ary heap of n elements in terms of n and d.

c. Give an efficient implementation of EXTRACT-MAX in a d-ary max-heap. Analyze its running time in terms of d and n.

d. Give an efficient implementation of INCREASE-KEY in a d-ary

max-heap. Analyze its running time in terms of d and n.

e. Give an efficient implementation of INSERT in a d-ary max-heap.

Analyze its running time in terms of d and n.

6-3 Young tableaus

An m × n Young tableau is an m × n matrix such that the entries of each row are in sorted order from left to right and the entries of each column

are in sorted order from top to bottom. Some of the entries of a Young

tableau may be ∞, which we treat as nonexistent elements. Thus, a

Young tableau can be used to hold r ≤ mn finite numbers.

a. Draw a 4 × 4 Young tableau containing the elements {9, 16, 3, 2, 4, 8,

5, 14, 12}.

b. Argue that an m × n Young tableau Y is empty if Y [1, 1] = ∞. Argue that Y is full (contains mn elements) if Y [ m, n] < ∞.

c. Give an algorithm to implement EXTRACT-MIN on a nonempty m

× n Young tableau that runs in O( m + n) time. Your algorithm should use a recursive subroutine that solves an m × n problem by recursively

solving either an ( m – 1) × n or an m × ( n – 1) subproblem. ( Hint: Think about MAX-HEAPIFY.) Explain why your implementation of

EXTRACT-MIN runs in O( m + n) time.

d. Show how to insert a new element into a nonfull m × n Young tableau in O( m + n) time.

e. Using no other sorting method as a subroutine, show how to use an n

× n Young tableau to sort n 2 numbers in O( n 3) time.

f. Give an O( m + n)-time algorithm to determine whether a given number is stored in a given m × n Young tableau.

Chapter notes


The heapsort algorithm was invented by Williams [456], who also described how to implement a priority queue with a heap. The BUILD-MAX-HEAP procedure was suggested by Floyd [145]. Schaffer and Sedgewick [395] showed that in the best case, the number of times elements move in the heap during heapsort is approximately ( n/2) lg n

and that the average number of moves is approximately n lg n.

We use min-heaps to implement min-priority queues in Chapters 15,

21, and 22. Other, more complicated, data structures give better time bounds for certain min-priority queue operations. Fredman and Tarjan

[156] developed Fibonacci heaps, which support INSERT and

DECREASE-KEY in O(1) amortized time (see Chapter 16). That is, the average worst-case running time for these operations is O(1). Brodal,

Lagogiannis, and Tarjan [73] subsequently devised strict Fibonacci heaps, which make these time bounds the actual running times. If the

keys are unique and drawn from the set {0, 1, … , n – 1} of nonnegative

integers, van Emde Boas trees [440, 441] support the operations INSERT, DELETE, SEARCH, MINIMUM, MAXIMUM,

PREDECESSOR, and SUCCESSOR in O(lg lg n) time.

If the data are b-bit integers, and the computer memory consists of

addressable b-bit words, Fredman and Willard [157] showed how to implement MINIMUM in O(1) time and INSERT and EXTRACT-MIN in O(√lg n) time. Thorup [436] has improved the O(√lg n) bound to O(lg lg n) time by using randomized hashing, requiring only linear space.

An important special case of priority queues occurs when the

sequence of EXTRACT-MIN operations is monotone, that is, the values

returned by successive EXTRACT-MIN operations are monotonically

increasing over time. This case arises in several important applications,

such as Dijkstra’s single-source shortest-paths algorithm, which we

discuss in Chapter 22, and in discrete-event simulation. For Dijkstra’s algorithm it is particularly important that the DECREASE-KEY

operation be implemented efficiently. For the monotone case, if the data

are integers in the range 1, 2, … , C, Ahuja, Mehlhorn, Orlin, and

Tarjan [8] describe how to implement EXTRACT-MIN and INSERT in

O(lg C) amortized time (Chapter 16 presents amortized analysis) and


DECREASE-KEY in O(1) time, using a data structure called a radix

heap. The O(lg C) bound can be improved to O(√lg C) using Fibonacci

heaps in conjunction with radix heaps. Cherkassky, Goldberg, and

Silverstein [90] further improved the bound to O(lg^(1/3+ϵ) C) expected time by combining the multilevel bucketing structure of Denardo and Fox [112] with the heap of Thorup mentioned earlier. Raman [375] further improved these results to obtain a bound of O(min {lg^(1/4+ϵ) C, lg^(1/3+ϵ) n}), for any fixed ϵ > 0.

Many other variants of heaps have been proposed. Brodal [72]

surveys some of these developments.

1 In Python, dictionaries are implemented with hash tables.

7 Quicksort

The quicksort algorithm has a worst-case running time of Θ(n²) on an

input array of n numbers. Despite this slow worst-case running time,

quicksort is often the best practical choice for sorting because it is

remarkably efficient on average: its expected running time is Θ( n lg n) when all numbers are distinct, and the constant factors hidden in the

Θ( n lg n) notation are small. Unlike merge sort, it also has the advantage of sorting in place (see page 158), and it works well even in

virtual-memory environments.

Our study of quicksort is broken into four sections. Section 7.1

describes the algorithm and an important subroutine used by quicksort

for partitioning. Because the behavior of quicksort is complex, we’ll

start with an intuitive discussion of its performance in Section 7.2 and analyze it precisely at the end of the chapter. Section 7.3 presents a randomized version of quicksort. When all elements are distinct, 1 this randomized algorithm has a good expected running time and no

particular input elicits its worst-case behavior. (See Problem 7-2 for the

case in which elements may be equal.) Section 7.4 analyzes the randomized algorithm, showing that it runs in Θ( n 2) time in the worst

case and, assuming distinct elements, in expected O( n lg n) time.

7.1 Description of quicksort

Quicksort, like merge sort, applies the divide-and-conquer method introduced in Section 2.3.1. Here is the three-step divide-and-conquer process for sorting a subarray A[ p : r]:

Divide by partitioning (rearranging) the array A[ p : r] into two (possibly empty) subarrays A[ p : q – 1] (the low side) and A[ q + 1 : r] (the high side) such that each element in the low side of the partition is less than

or equal to the pivot A[ q], which is, in turn, less than or equal to each element in the high side. Compute the index q of the pivot as part of

this partitioning procedure.

Conquer by calling quicksort recursively to sort each of the subarrays

A[ p : q – 1] and A[ q + 1 : r].

Combine by doing nothing: because the two subarrays are already

sorted, no work is needed to combine them. All elements in A[p : q – 1] are sorted and less than or equal to A[q], and all elements in A[q + 1 : r] are sorted and greater than or equal to the pivot A[q]. The entire subarray A[p : r] cannot help but be sorted!

The QUICKSORT procedure implements quicksort. To sort an

entire n-element array A[1 : n], the initial call is QUICKSORT ( A, 1, n).

QUICKSORT(A, p, r)
1  if p < r
2      // Partition the subarray around the pivot, which ends up in A[q].
3      q = PARTITION(A, p, r)
4      QUICKSORT(A, p, q – 1)      // recursively sort the low side
5      QUICKSORT(A, q + 1, r)      // recursively sort the high side

Partitioning the array

The key to the algorithm is the PARTITION procedure on the next

page, which rearranges the subarray A[ p : r] in place, returning the index of the dividing point between the two sides of the partition.

Figure 7.1 shows how PARTITION works on an 8-element array.

PARTITION always selects the element x = A[ r] as the pivot. As the

procedure runs, each element falls into exactly one of four regions, some of which may be empty. At the start of each iteration of the for loop in

lines 3–6, the regions satisfy certain properties, shown in Figure 7.2. We state these properties as a loop invariant:

PARTITION(A, p, r)
1  x = A[r]                         // the pivot
2  i = p – 1                        // highest index into the low side
3  for j = p to r – 1               // process each element other than the pivot
4      if A[j] ≤ x                  // does this element belong on the low side?
5          i = i + 1                // index of a new slot in the low side
6          exchange A[i] with A[j]  // put this element there
7  exchange A[i + 1] with A[r]      // pivot goes just to the right of the low side
8  return i + 1                     // new index of the pivot
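Translated into 0-indexed Python, the two procedures look as follows. This is an illustrative sketch (the function names and indexing conventions are ours, not the book's); for an n-element list the initial call is quicksort(A, 0, len(A) - 1).

```python
def partition(A, p, r):
    """Partition A[p..r] around the pivot A[r]; return the pivot's new index."""
    x = A[r]                           # the pivot
    i = p - 1                          # highest index into the low side
    for j in range(p, r):              # process each element other than the pivot
        if A[j] <= x:                  # does this element belong on the low side?
            i = i + 1                  # index of a new slot in the low side
            A[i], A[j] = A[j], A[i]    # put this element there
    A[i + 1], A[r] = A[r], A[i + 1]    # pivot goes just right of the low side
    return i + 1                       # new index of the pivot

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)         # pivot ends up in A[q]
        quicksort(A, p, q - 1)         # recursively sort the low side
        quicksort(A, q + 1, r)         # recursively sort the high side
```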

At the beginning of each iteration of the loop of lines 3–6, for any array index k, the following conditions hold:

1. if p ≤ k ≤ i, then A[k] ≤ x (the tan region of Figure 7.2);
2. if i + 1 ≤ k ≤ j – 1, then A[k] > x (the blue region);
3. if k = r, then A[k] = x (the yellow region).

We need to show that this loop invariant is true prior to the first

iteration, that each iteration of the loop maintains the invariant, that

the loop terminates, and that correctness follows from the invariant

when the loop terminates.

Initialization: Prior to the first iteration of the loop, we have i = p – 1

and j = p. Because no values lie between p and i and no values lie between i + 1 and j – 1, the first two conditions of the loop invariant are trivially satisfied. The assignment in line 1 satisfies the third

condition.

Maintenance: As Figure 7.3 shows, we consider two cases, depending on the outcome of the test in line 4. Figure 7.3(a) shows what happens when A[ j] > x: the only action in the loop is to increment j. After j has been incremented, the second condition holds for A[ j – 1] and all other entries remain unchanged. Figure 7.3(b) shows what happens when A[ j] ≤ x: the loop increments i, swaps A[ i] and A[ j], and then increments j. Because of the swap, we now have that A[ i] ≤ x, and condition 1 is satisfied. Similarly, we also have that A[ j – 1] > x, since the item that was swapped into A[ j – 1] is, by the loop invariant, greater than x.

Termination: Since the loop makes exactly r – p iterations, it terminates, whereupon j = r. At that point, the unexamined subarray A[j : r – 1] is empty, and every entry in the array belongs to one of the other three

sets described by the invariant. Thus, the values in the array have been

partitioned into three sets: those less than or equal to x (the low side),

those greater than x (the high side), and a singleton set containing x

(the pivot).
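One way to build confidence in this argument is to check the invariant mechanically. The following sketch (our instrumentation, 0-indexed Python, not the book's code) reruns the partitioning loop and asserts all three conditions at the start of every iteration.

```python
def partition_checked(A, p, r):
    """PARTITION with the three loop-invariant conditions asserted
    at the start of every iteration of the for loop."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        # Condition 1: every entry in A[p..i] is at most x (tan region).
        assert all(A[k] <= x for k in range(p, i + 1))
        # Condition 2: every entry in A[i+1..j-1] exceeds x (blue region).
        assert all(A[k] > x for k in range(i + 1, j))
        # Condition 3: A[r] still holds the pivot (yellow region).
        assert A[r] == x
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1
```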


Figure 7.1 The operation of PARTITION on a sample array. Array entry A[ r] becomes the pivot element x. Tan array elements all belong to the low side of the partition, with values at most x.

Blue elements belong to the high side, with values greater than x. White elements have not yet been put into either side of the partition, and the yellow element is the pivot x. (a) The initial array and variable settings. None of the elements have been placed into either side of the partition. (b) The value 2 is “swapped with itself” and put into the low side. (c)–(d) The values 8

and 7 are placed into the high side. (e) The values 1 and 8 are swapped, and the low side grows. (f) The values 3 and 7 are swapped, and the low side grows. (g)–(h) The high side of the partition grows to include 5 and 6, and the loop terminates. (i) Line 7 swaps the pivot element so that it lies between the two sides of the partition, and line 8 returns the pivot’s new index.

The final two lines of PARTITION finish up by swapping the pivot

with the leftmost element greater than x, thereby moving the pivot into

its correct place in the partitioned array, and then returning the pivot’s

new index. The output of PARTITION now satisfies the specifications

given for the divide step. In fact, it satisfies a slightly stronger condition:


after line 3 of QUICKSORT, A[ q] is strictly less than every element of

A[ q + 1 : r].

Figure 7.2 The four regions maintained by the procedure PARTITION on a subarray A[ p : r].

The tan values in A[ p : i] are all less than or equal to x, the blue values in A[ i + 1 : j – 1] are all greater than x, the white values in A[ j : r – 1] have unknown relationships to x, and A[ r] = x.

Figure 7.3 The two cases for one iteration of procedure PARTITION. (a) If A[ j] > x, the only action is to increment j, which maintains the loop invariant. (b) If A[ j] ≤ x, index i is incremented, A[ i] and A[ j] are swapped, and then j is incremented. Again, the loop invariant is maintained.

Exercise 7.1-3 asks you to show that the running time of

PARTITION on a subarray A[p : r] of n = r – p + 1 elements is Θ(n).

Exercises

7.1-1

Using Figure 7.1 as a model, illustrate the operation of PARTITION on the array A = 〈13, 19, 9, 5, 12, 8, 7, 4, 21, 2, 6, 11〉.

7.1-2

What value of q does PARTITION return when all elements in the

subarray A[p : r] have the same value? Modify PARTITION so that q = ⌊(p + r)/2⌋ when all elements in the subarray A[p : r] have the same value.

7.1-3

Give a brief argument that the running time of PARTITION on a

subarray of size n is Θ( n).

7.1-4

Modify QUICKSORT to sort into monotonically decreasing order.

7.2 Performance of quicksort

The running time of quicksort depends on how balanced each

partitioning is, which in turn depends on which elements are used as

pivots. If the two sides of a partition are about the same size—the

partitioning is balanced—then the algorithm runs asymptotically as fast

as merge sort. If the partitioning is unbalanced, however, it can run

asymptotically as slowly as insertion sort. To allow you to gain some

intuition before diving into a formal analysis, this section informally

investigates how quicksort performs under the assumptions of balanced

versus unbalanced partitioning.

But first, let’s briefly look at the maximum amount of memory that

quicksort requires. Although quicksort sorts in place according to the

definition on page 158, the amount of memory it uses—aside from the

array being sorted—is not constant. Since each recursive call requires a

constant amount of space on the runtime stack, outside of the array

being sorted, quicksort requires space proportional to the maximum

depth of the recursion. As we’ll see now, that could be as bad as Θ( n) in

the worst case.

Worst-case partitioning

The worst-case behavior for quicksort occurs when the partitioning

produces one subproblem with n – 1 elements and one with 0 elements.

(See Section 7.4.1.) Let us assume that this unbalanced partitioning arises in each recursive call. The partitioning costs Θ( n) time. Since the

recursive call on an array of size 0 just returns without doing anything,

T (0) = Θ(1), and the recurrence for the running time is

T ( n) = T ( n – 1) + T (0) + Θ( n)

= T ( n – 1) + Θ( n).

By summing the costs incurred at each level of the recursion, we obtain

an arithmetic series (equation (A.3) on page 1141), which evaluates to

Θ( n 2). Indeed, the substitution method can be used to prove that the recurrence T ( n) = T ( n – 1) + Θ( n) has the solution T ( n) = Θ( n 2). (See Exercise 7.2-1.)

Thus, if the partitioning is maximally unbalanced at every recursive

level of the algorithm, the running time is Θ( n 2). The worst-case

running time of quicksort is therefore no better than that of insertion

sort. Moreover, the Θ( n 2) running time occurs when the input array is

already completely sorted—a situation in which insertion sort runs in

O( n) time.
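The quadratic behavior on sorted input is easy to observe by counting element comparisons. This sketch is our instrumentation (not the book's code), written iteratively with an explicit stack so that a sorted array does not overflow Python's recursion limit; on sorted input of size n, PARTITION's loop performs exactly (n – 1) + (n – 2) + ⋯ + 1 = n(n – 1)/2 comparisons in total.

```python
def quicksort_count(A):
    """Quicksort with A[r] as the pivot, using an explicit stack of
    (p, r) subproblems; returns the number of element comparisons."""
    comparisons = 0
    stack = [(0, len(A) - 1)]
    while stack:
        p, r = stack.pop()
        if p < r:
            x = A[r]
            i = p - 1
            for j in range(p, r):
                comparisons += 1       # one comparison per loop iteration
                if A[j] <= x:
                    i += 1
                    A[i], A[j] = A[j], A[i]
            A[i + 1], A[r] = A[r], A[i + 1]
            q = i + 1
            stack.append((p, q - 1))   # low side
            stack.append((q + 1, r))   # high side
    return comparisons
```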

Best-case partitioning

In the most even possible split, PARTITION produces two

subproblems, each of size no more than n/2, since one is of size ⌊(n – 1)/2⌋ ≤ n/2 and one of size ⌈(n – 1)/2⌉ – 1 ≤ n/2. In this case, quicksort runs much faster. An upper bound on the running time can then be

described by the recurrence

T ( n) = 2 T ( n/2) + Θ( n).

By case 2 of the master theorem (Theorem 4.1 on page 102), this

recurrence has the solution T ( n) = Θ( n lg n). Thus, if the partitioning is

equally balanced at every level of the recursion, an asymptotically faster algorithm results.

Balanced partitioning

As the analyses in Section 7.4 will show, the average-case running time of quicksort is much closer to the best case than to the worst case. By

appreciating how the balance of the partitioning affects the recurrence

describing the running time, we can gain an understanding of why.

Suppose, for example, that the partitioning algorithm always

produces a 9-to-1 proportional split, which at first blush seems quite

unbalanced. We then obtain the recurrence

T(n) = T(9n/10) + T(n/10) + Θ(n) on the running time of quicksort. Figure 7.4 shows the recursion tree for this recurrence, where for simplicity the Θ(n) driving function has been replaced by n, which won’t affect the asymptotic solution of the recurrence (as Exercise 4.7-1 on page 118 justifies). Every level of the

tree has cost n, until the recursion bottoms out in a base case at depth

log10 n = Θ(lg n), and then the levels have cost at most n. The recursion terminates at depth log10/9 n = Θ(lg n). Thus, with a 9-to-1

proportional split at every level of recursion, which intuitively seems

highly unbalanced, quicksort runs in O( n lg n) time—asymptotically the same as if the split were right down the middle. Indeed, even a 99-to-1

split yields an O( n lg n) running time. In fact, any split of constant proportionality yields a recursion tree of depth Θ(lg n), where the cost

at each level is O( n). The running time is therefore O( n lg n) whenever the split has constant proportionality. The ratio of the split affects only

the constant hidden in the O-notation.
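The claim that any constant-proportionality split yields Θ(lg n) recursion depth is easy to verify numerically. In this sketch (ours, for illustration only), we follow the larger side of the split until it shrinks to size 1 and compare the resulting depths for 9-to-1 and even splits; they differ only by the constant factor 1/lg(10/9) ≈ 6.6.

```python
import math

def largest_side_depth(n, frac):
    """Depth at which the larger side of a frac-to-(1 - frac) split
    shrinks from n down to size 1."""
    depth, size = 0, float(n)
    while size > 1:
        size *= frac                   # always follow the larger side
        depth += 1
    return depth

n = 10**6
d_nine_to_one = largest_side_depth(n, 0.9)   # about log_{10/9} n, roughly 131
d_even = largest_side_depth(n, 0.5)          # about lg n, roughly 20
ratio = d_nine_to_one / d_even               # the constant factor, roughly 6.6
```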


Figure 7.4 A recursion tree for QUICKSORT in which PARTITION always produces a 9-to-1

split, yielding a running time of O( n lg n). Nodes show subproblem sizes, with per-level costs on the right.

Intuition for the average case

To develop a clear notion of the expected behavior of quicksort, we

must assume something about how its inputs are distributed. Because

quicksort determines the sorted order using only comparisons between

input elements, its behavior depends on the relative ordering of the

values in the array elements given as the input, not on the particular

values in the array. As in the probabilistic analysis of the hiring problem

in Section 5.2, assume that all permutations of the input numbers are equally likely and that the elements are distinct.

When quicksort runs on a random input array, the partitioning is

highly unlikely to happen in the same way at every level, as our informal

analysis has assumed. We expect that some of the splits will be

reasonably well balanced and that some will be fairly unbalanced. For

example, Exercise 7.2-6 asks you to show that about 80% of the time

PARTITION produces a split that is at least as balanced as 9 to 1, and


about 20% of the time it produces a split that is less balanced than 9 to

1.

Figure 7.5 (a) Two levels of a recursion tree for quicksort. The partitioning at the root costs n and produces a “bad” split: two subarrays of sizes 0 and n – 1. The partitioning of the subarray of size n – 1 costs n – 1 and produces a “good” split: subarrays of size ( n – 1)/2 – 1 and ( n – 1)/2.

(b) A single level of a recursion tree that is well balanced. In both parts, the partitioning cost for the subproblems shown with blue shading is Θ( n). Yet the subproblems remaining to be solved in (a), shown with tan shading, are no larger than the corresponding subproblems remaining to be solved in (b).

In the average case, PARTITION produces a mix of “good” and

“bad” splits. In a recursion tree for an average-case execution of

PARTITION, the good and bad splits are distributed randomly

throughout the tree. Suppose for the sake of intuition that the good and

bad splits alternate levels in the tree, and that the good splits are best-

case splits and the bad splits are worst-case splits. Figure 7.5(a) shows the splits at two consecutive levels in the recursion tree. At the root of

the tree, the cost is n for partitioning, and the subarrays produced have

sizes n – 1 and 0: the worst case. At the next level, the subarray of size n

– 1 undergoes best-case partitioning into subarrays of size ( n – 1)/2 – 1

and ( n – 1)/2. Let’s assume that the base-case cost is 1 for the subarray

of size 0.

The combination of the bad split followed by the good split produces

three subarrays of sizes 0, ( n – 1)/2 – 1, and ( n – 1)/2 at a combined partitioning cost of Θ( n) + Θ( n – 1) = Θ( n). This situation is at most a constant factor worse than that in Figure 7.5(b), namely, where a single level of partitioning produces two subarrays of size ( n – 1)/2, at a cost of

Θ( n). Yet this latter situation is balanced! Intuitively, the Θ( n – 1) cost of the bad split in Figure 7.5(a) can be absorbed into the Θ( n) cost of the good split, and the resulting split is good. Thus, the running time of

quicksort, when levels alternate between good and bad splits, is like the running time for good splits alone: still O( n lg n), but with a slightly larger constant hidden by the O-notation. We’ll analyze the expected

running time of a randomized version of quicksort rigorously in Section

7.4.2.

Exercises

7.2-1

Use the substitution method to prove that the recurrence T(n) = T(n – 1) + Θ(n) has the solution T(n) = Θ(n²), as claimed at the beginning of

Section 7.2.

7.2-2

What is the running time of QUICKSORT when all elements of array A

have the same value?

7.2-3

Show that the running time of QUICKSORT is Θ( n 2) when the array A

contains distinct elements and is sorted in decreasing order.

7.2-4

Banks often record transactions on an account in order of the times of

the transactions, but many people like to receive their bank statements

with checks listed in order by check number. People usually write checks

in order by check number, and merchants usually cash them with

reasonable dispatch. The problem of converting time-of-transaction

ordering to check-number ordering is therefore the problem of sorting

almost-sorted input. Explain persuasively why the procedure

INSERTION-SORT might tend to beat the procedure QUICKSORT

on this problem.

7.2-5

Suppose that the splits at every level of quicksort are in the constant proportion α to β, where α + β = 1 and 0 < α ≤ β < 1. Show that the minimum depth of a leaf in the recursion tree is approximately log1/α n and that the maximum depth is approximately log1/β n. (Don’t worry about integer round-off.)

7.2-6

Consider an array with distinct elements and for which all permutations

of the elements are equally likely. Argue that for any constant 0 < α

1/2, the probability is approximately 1 – 2 α that PARTITION produces

a split at least as balanced as 1 – α to α.

7.3 A randomized version of quicksort

In exploring the average-case behavior of quicksort, we have assumed

that all permutations of the input numbers are equally likely. This

assumption does not always hold, however, as, for example, in the

situation laid out in the premise for Exercise 7.2-4. Section 5.3 showed that judicious randomization can sometimes be added to an algorithm

to obtain good expected performance over all inputs. For quicksort,

randomization yields a fast and practical algorithm. Many software

libraries provide a randomized version of quicksort as their algorithm

of choice for sorting large data sets.

In Section 5.3, the RANDOMIZED-HIRE-ASSISTANT procedure

explicitly permutes its input and then runs the deterministic HIRE-

ASSISTANT procedure. We could do the same for quicksort as well,

but a different randomization technique yields a simpler analysis.

Instead of always using A[ r] as the pivot, a randomized version randomly chooses the pivot from the subarray A[ p : r], where each element in A[ p : r] has an equal probability of being chosen. It then exchanges that element with A[ r] before partitioning. Because the pivot is chosen randomly, we expect the split of the input array to be

reasonably well balanced on average.

The changes to PARTITION and QUICKSORT are small. The new

partitioning procedure, RANDOMIZED-PARTITION, simply swaps

before performing the partitioning. The new quicksort procedure,

RANDOMIZED-QUICKSORT, calls RANDOMIZED-PARTITION

instead of PARTITION. We’ll analyze this algorithm in the next section.

RANDOMIZED-PARTITION(A, p, r)
1  i = RANDOM(p, r)
2  exchange A[r] with A[i]
3  return PARTITION(A, p, r)

RANDOMIZED-QUICKSORT(A, p, r)
1  if p < r
2      q = RANDOMIZED-PARTITION(A, p, r)
3      RANDOMIZED-QUICKSORT(A, p, q – 1)
4      RANDOMIZED-QUICKSORT(A, q + 1, r)
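In Python the randomized variant is the same small change; random.randint(p, r), which is inclusive on both endpoints, plays the role of RANDOM(p, r). A sketch (our names and indexing), repeating the partition procedure of Section 7.1 so the example is self-contained:

```python
import random

def partition(A, p, r):
    """Partition A[p..r] around the pivot A[r]; return the pivot's index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    i = random.randint(p, r)       # RANDOM(p, r): uniform over p, ..., r
    A[r], A[i] = A[i], A[r]        # move the random pivot into A[r]
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```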

Exercises

7.3-1

Why do we analyze the expected running time of a randomized

algorithm and not its worst-case running time?

7.3-2

When RANDOMIZED-QUICKSORT runs, how many calls are made

to the random-number generator RANDOM in the worst case? How

about in the best case? Give your answer in terms of Θ-notation.

7.4 Analysis of quicksort

Section 7.2 gave some intuition for the worst-case behavior of quicksort and for why we expect the algorithm to run quickly. This section

analyzes the behavior of quicksort more rigorously. We begin with a

worst-case analysis, which applies to either QUICKSORT or

RANDOMIZED-QUICKSORT, and conclude with an analysis of the

expected running time of RANDOMIZED-QUICKSORT.

7.4.1 Worst-case analysis


We saw in Section 7.2 that a worst-case split at every level of recursion in quicksort produces a Θ( n 2) running time, which, intuitively, is the worst-case running time of the algorithm. We now prove this assertion.

We’ll use the substitution method (see Section 4.3) to show that the running time of quicksort is O( n 2). Let T ( n) be the worst-case time for the procedure QUICKSORT on an input of size n. Because the