Since coin flips are mutually independent, for any given event Aik, the probability that all k flips are heads is

Pr {Aik} = 1/2^k,     (5.9)

and thus the probability that a streak of heads of length at least 2⌈lg n⌉ begins in position i is

Pr {Ai,2⌈lg n⌉} = 1/2^(2⌈lg n⌉) ≤ 1/2^(2 lg n) = 1/n²,

which is quite small. There are at most n − 2⌈lg n⌉ + 1 positions where such a streak can begin. The probability that a streak of heads of length at least 2⌈lg n⌉ begins anywhere is therefore

Pr {⋃_{i=1}^{n−2⌈lg n⌉+1} Ai,2⌈lg n⌉} ≤ Σ_{i=1}^{n−2⌈lg n⌉+1} 1/n² < n · (1/n²) = 1/n.     (5.10)
We can use inequality (5.10) to bound the length of the longest streak. For j = 0, 1, 2, … , n, let Lj be the event that the longest streak of heads has length exactly j, and let L be the length of the longest streak. By the definition of expected value, we have

E [L] = Σ_{j=0}^{n} j Pr {Lj}.     (5.11)
We could try to evaluate this sum using upper bounds on each Pr { Lj}
similar to those computed in inequality (5.10). Unfortunately, this
method yields weak bounds. We can use some intuition gained by the
above analysis to obtain a good bound, however. For no individual term
in the summation in equation (5.11) are both the factors j and Pr { Lj}
large. Why? When j ≥ 2⌈lg n⌉, then Pr {Lj} is very small, and when j < 2⌈lg n⌉, then j is fairly small. More precisely, since the events Lj for j = 0, 1, … , n are disjoint, the probability that a streak of heads of length at least 2⌈lg n⌉ begins anywhere is Σ_{j=2⌈lg n⌉}^{n} Pr {Lj}. Inequality (5.10) tells us that the probability that a streak of heads of length at least 2⌈lg n⌉ begins anywhere is less than 1/n, which means that Σ_{j=2⌈lg n⌉}^{n} Pr {Lj} < 1/n. Also, noting that Σ_{j=0}^{n} Pr {Lj} = 1, we have that Σ_{j=0}^{2⌈lg n⌉−1} Pr {Lj} ≤ 1. Thus, we obtain

E [L] = Σ_{j=0}^{n} j Pr {Lj}
      = Σ_{j=0}^{2⌈lg n⌉−1} j Pr {Lj} + Σ_{j=2⌈lg n⌉}^{n} j Pr {Lj}
      < Σ_{j=0}^{2⌈lg n⌉−1} (2⌈lg n⌉) Pr {Lj} + Σ_{j=2⌈lg n⌉}^{n} n Pr {Lj}
      = 2⌈lg n⌉ Σ_{j=0}^{2⌈lg n⌉−1} Pr {Lj} + n Σ_{j=2⌈lg n⌉}^{n} Pr {Lj}
      < 2⌈lg n⌉ · 1 + n · (1/n)
      = O(lg n).
The probability that a streak of heads exceeds r⌈lg n⌉ flips diminishes quickly with r. Let’s get a rough bound on the probability that a streak of at least r⌈lg n⌉ heads occurs, for r ≥ 1. The probability that a streak of at least r⌈lg n⌉ heads starts in position i is

Pr {Ai,r⌈lg n⌉} = 1/2^(r⌈lg n⌉) ≤ 1/n^r.

A streak of at least r⌈lg n⌉ heads cannot start in the last r⌈lg n⌉ − 1 flips, but let’s overestimate the probability of such a streak by allowing it to start anywhere within the n coin flips. Then the probability that a streak of at least r⌈lg n⌉ heads occurs is at most

n · (1/n^r) = 1/n^(r−1).
Equivalently, the probability is at least 1 – 1/ nr–1 that the longest streak
has length less than r ⌈lg n⌉.
As an example, during n = 1000 coin flips, the probability of
encountering a streak of at least 2⌈lg n⌉ = 20 heads is at most 1/n = 1/1000. The chance of a streak of at least 3⌈lg n⌉ = 30 heads is at most 1/n² = 1/1,000,000.
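These bounds are easy to check empirically. The following Python sketch is my own illustration, not part of the text; the function names are invented. It estimates the probability that a streak of at least 2⌈lg n⌉ heads appears somewhere in n fair coin flips:

```python
import math
import random

def longest_streak(flips):
    """Length of the longest run of heads (True values) in a flip sequence."""
    best = cur = 0
    for f in flips:
        cur = cur + 1 if f else 0
        best = max(best, cur)
    return best

def estimate_long_streak_prob(n, trials=10000, seed=1):
    """Estimate Pr{some streak of >= 2*ceil(lg n) heads occurs in n flips}."""
    random.seed(seed)
    bound = 2 * math.ceil(math.log2(n))
    hits = sum(
        longest_streak(random.random() < 0.5 for _ in range(n)) >= bound
        for _ in range(trials)
    )
    return hits / trials

# For n = 1000, inequality (5.10) says the probability is at most 1/n = 0.001,
# so the empirical frequency should come out tiny.
print(estimate_long_streak_prob(1000))
```

For n = 1000 the empirical frequency should come out well under the 1/n bound, consistent with the calculation above.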
Let’s now prove a complementary lower bound: the expected length of the longest streak of heads in n coin flips is Ω(lg n). To prove this bound, we look for streaks of length s by partitioning the n flips into approximately n/s groups of s flips each. If we choose s = ⌊(lg n)/2⌋, we’ll see that it is likely that at least one of these groups comes up all heads, which means that it’s likely that the longest streak has length at least s = Ω(lg n). We’ll then show that the longest streak has expected length Ω(lg n).
Let’s partition the n coin flips into at least ⌊n/⌊(lg n)/2⌋⌋ groups of ⌊(lg n)/2⌋ consecutive flips and bound the probability that no group comes up all heads. By equation (5.9), the probability that the group starting in position i comes up all heads is

Pr {Ai,⌊(lg n)/2⌋} = 1/2^⌊(lg n)/2⌋ ≥ 1/√n.

The probability that a streak of heads of length at least ⌊(lg n)/2⌋ does not begin in position i is therefore at most 1 − 1/√n. Since the ⌊n/⌊(lg n)/2⌋⌋ groups are formed from mutually exclusive, independent coin flips, the probability that every one of these groups fails to be a streak of length ⌊(lg n)/2⌋ is at most

(1 − 1/√n)^⌊n/⌊(lg n)/2⌋⌋ ≤ e^(−⌊n/⌊(lg n)/2⌋⌋/√n)
                          ≤ e^(−((2n/lg n) − 1)/√n)
                          = O(e^(−lg n))
                          = O(1/n).     (5.12)

For this argument, we used inequality (3.14), 1 + x ≤ e^x, on page 66 and the fact, which you may verify, that ((2n/lg n) − 1)/√n ≥ lg n for sufficiently large n.
We want to bound the probability that the longest streak equals or exceeds ⌊(lg n)/2⌋. To do so, let L be the event that the longest streak of heads equals or exceeds s = ⌊(lg n)/2⌋. Let L̄ be the complementary event, that the longest streak of heads is strictly less than s, so that Pr {L} + Pr {L̄} = 1. Let F be the event that every group of s flips fails to be a streak of s heads. By inequality (5.12), we have Pr {F} = O(1/n). If the longest streak of heads is less than s, then certainly every group of s flips fails to be a streak of s heads, which means that event L̄ implies event F. Of course, event F could occur even if event L̄ does not (for example, if a streak of s or more heads crosses over the boundary between two groups), and so we have Pr {L̄} ≤ Pr {F} = O(1/n). Since Pr {L} + Pr {L̄} = 1, we have that

Pr {L} = 1 − Pr {L̄}
       ≥ 1 − Pr {F}
       = 1 − O(1/n).

That is, the probability that the longest streak equals or exceeds ⌊(lg n)/2⌋ is at least 1 − O(1/n).
We can now calculate a lower bound on the expected length of the longest streak, beginning with equation (5.11) and proceeding in a manner similar to our analysis of the upper bound. With s = ⌊(lg n)/2⌋, we have

E [L] = Σ_{j=0}^{n} j Pr {Lj}
      = Σ_{j=0}^{s−1} j Pr {Lj} + Σ_{j=s}^{n} j Pr {Lj}
      ≥ Σ_{j=0}^{s−1} 0 · Pr {Lj} + Σ_{j=s}^{n} s Pr {Lj}
      = s Σ_{j=s}^{n} Pr {Lj}
      = s (1 − O(1/n))
      = Ω(lg n).
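Taken together, the two bounds say that E [L] = Θ(lg n). A small simulation (the function name and parameters here are invented for illustration, not from the text) shows the empirical mean of the longest streak tracking lg n:

```python
import math
import random

def expected_longest_streak(n, trials=2000, seed=1):
    """Empirical mean of the longest run of heads in n fair coin flips."""
    random.seed(seed)
    total = 0
    for _ in range(trials):
        best = cur = 0
        for _ in range(n):
            cur = cur + 1 if random.random() < 0.5 else 0
            best = max(best, cur)
        total += best
    return total / trials

# The empirical mean should stay within a small constant of lg n.
for n in (256, 1024, 4096):
    print(n, expected_longest_streak(n), math.log2(n))
```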
As with the birthday paradox, we can obtain a simpler, but
approximate, analysis using indicator random variables. Instead of
determining the expected length of the longest streak, we’ll find the
expected number of streaks with at least a given length. Let Xik = I {Aik} be the indicator random variable associated with a streak of heads of length at least k beginning with the i th coin flip. To count the total number of such streaks, define

X = Σ_{i=1}^{n−k+1} Xik.

Taking expectations and using linearity of expectation, we have

E [X] = E [Σ_{i=1}^{n−k+1} Xik]
      = Σ_{i=1}^{n−k+1} E [Xik]
      = Σ_{i=1}^{n−k+1} Pr {Aik}
      = Σ_{i=1}^{n−k+1} 1/2^k
      = (n − k + 1)/2^k.
By plugging in various values for k, we can calculate the expected
number of streaks of length at least k. If this expected number is large
(much greater than 1), then we expect many streaks of length k to occur,
and the probability that one occurs is high. If this expected number is
small (much less than 1), then we expect to see few streaks of length k,
and the probability that one occurs is low. If k = c lg n, for some positive constant c, we obtain

E [X_{c lg n}] = (n − c lg n + 1)/2^(c lg n)
             = (n − c lg n + 1)/n^c
             = 1/n^(c−1) − (c lg n − 1)/n^c
             = Θ(1/n^(c−1)).

If c is large, the expected number of streaks of length c lg n is small, and we conclude that they are unlikely to occur. On the other hand, if c = 1/2, then we obtain E [X_{(1/2) lg n}] = Θ(1/n^(1/2−1)) = Θ(n^(1/2)), and we expect there to be numerous streaks of length (1/2) lg n. Therefore, one
streak of such a length is likely to occur. We can conclude that the
expected length of the longest streak is Θ(lg n).
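The linearity-of-expectation count is also easy to confirm by simulation. This Python sketch (my own illustration; the names are invented) counts streak starts and compares the empirical average with (n − k + 1)/2^k:

```python
import random

def count_streaks(flips, k):
    """Number of positions i where a run of at least k heads begins,
    i.e., the number of indicator variables X_ik that equal 1."""
    n = len(flips)
    return sum(all(flips[i:i + k]) for i in range(n - k + 1))

def average_streak_count(n, k, trials=5000, seed=1):
    """Empirical average of X over many sequences of n fair flips."""
    random.seed(seed)
    total = 0
    for _ in range(trials):
        flips = [random.random() < 0.5 for _ in range(n)]
        total += count_streaks(flips, k)
    return total / trials

n, k = 64, 6
# Linearity of expectation predicts E[X] = (n - k + 1)/2^k.
print(average_streak_count(n, k), (n - k + 1) / 2**k)
```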
5.4.4 The online hiring problem
As a final example, let’s consider a variant of the hiring problem.
Suppose now that you do not wish to interview all the candidates in
order to find the best one. You also want to avoid hiring and firing as
you find better and better applicants. Instead, you are willing to settle
for a candidate who is close to the best, in exchange for hiring exactly
once. You must obey one company requirement: after each interview
you must either immediately offer the position to the applicant or
immediately reject the applicant. What is the trade-off between
minimizing the amount of interviewing and maximizing the quality of
the candidate hired?
We can model this problem in the following way. After meeting an
applicant, you are able to give each one a score. Let score( i) denote the score you give to the i th applicant, and assume that no two applicants
receive the same score. After you have seen j applicants, you know which
of the j has the highest score, but you do not know whether any of the
remaining n – j applicants will receive a higher score. You decide to adopt the strategy of selecting a positive integer k < n, interviewing and then rejecting the first k applicants, and hiring the first applicant thereafter who has a higher score than all preceding applicants. If it
turns out that the best-qualified applicant was among the first k
interviewed, then you hire the n th applicant—the last one interviewed.
We formalize this strategy in the procedure ONLINE-MAXIMUM( k,
n), which returns the index of the candidate you wish to hire.
ONLINE-MAXIMUM(k, n)
1  best-score = −∞
2  for i = 1 to k
3      if score(i) > best-score
4          best-score = score(i)
5  for i = k + 1 to n
6      if score(i) > best-score
7          return i
8  return n

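For concreteness, here is a direct Python transcription of ONLINE-MAXIMUM. It is illustrative only; the `score` argument is a function mapping 1-indexed applicant numbers to scores, matching the pseudocode's convention:

```python
def online_maximum(score, k, n):
    """Reject the first k applicants, recording their best score, then hire
    the first later applicant who beats all of them; failing that, hire the
    n-th (last) applicant, exactly as in the ONLINE-MAXIMUM pseudocode."""
    best_score = float("-inf")
    for i in range(1, k + 1):
        if score(i) > best_score:
            best_score = score(i)
    for i in range(k + 1, n + 1):
        if score(i) > best_score:
            return i
    return n
```

For example, with scores ⟨3, 7, 2, 9, 5⟩ and k = 2, the procedure rejects applicants 1 and 2 (best score 7) and hires applicant 4, the first one scoring above 7.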
If we determine, for each possible value of k, the probability that you
hire the most qualified applicant, then you can choose the best possible
k and implement the strategy with that value. For the moment, assume
that k is fixed. Let M( j) = max { score( i) : 1 ≤ i ≤ j} denote the maximum score among applicants 1 through j. Let S be the event that you succeed in choosing the best-qualified applicant, and let Si be the event that you
succeed when the best-qualified applicant is the i th one interviewed.
Since the various Si are disjoint, we have that Pr {S} = Σ_{i=1}^{n} Pr {Si}. Noting that you never succeed when the best-qualified applicant is one of the first k, we have that Pr {Si} = 0 for i = 1, 2, … , k. Thus, we obtain

Pr {S} = Σ_{i=k+1}^{n} Pr {Si}.     (5.14)
We now compute Pr { Si}. In order to succeed when the best-
qualified applicant is the i th one, two things must happen. First, the best-qualified applicant must be in position i, an event which we denote
by Bi. Second, the algorithm must not select any of the applicants in
positions k + 1 through i – 1, which happens only if, for each j such that k + 1 ≤ j ≤ i – 1, line 6 finds that score( j) < best-score. (Because scores are unique, we can ignore the possibility of score( j) = best-score.) In other words, all of the values score( k + 1) through score( i – 1) must be less than M( k). If any are greater than M( k), the algorithm instead returns the index of the first one that is greater. We use Oi to denote the event
that none of the applicants in position k + 1 through i – 1 are chosen.
Fortunately, the two events Bi and Oi are independent. The event Oi depends only on the relative ordering of the values in positions 1
through i – 1, whereas Bi depends only on whether the value in position i is greater than the values in all other positions. The ordering of the values in positions 1 through i – 1 does not affect whether the value in
position i is greater than all of them, and the value in position i does not affect the ordering of the values in positions 1 through i – 1. Thus, we
can apply equation (C.17) on page 1188 to obtain



Pr { Si} = Pr { Bi ∩ Oi} = Pr { Bi} Pr { Oi}.
We have Pr { Bi} = 1/ n since the maximum is equally likely to be in any
one of the n positions. For event Oi to occur, the maximum value in positions 1 through i –1, which is equally likely to be in any of these i – 1
positions, must be in one of the first k positions. Consequently, Pr {Oi} = k/(i − 1) and Pr {Si} = k/(n(i − 1)). Using equation (5.14), we have

Pr {S} = Σ_{i=k+1}^{n} k/(n(i − 1))
       = (k/n) Σ_{i=k+1}^{n} 1/(i − 1)
       = (k/n) Σ_{i=k}^{n−1} 1/i.

We approximate by integrals to bound this summation from above and below. By the inequalities (A.19) on page 1150, we have

∫_k^n (1/x) dx ≤ Σ_{i=k}^{n−1} 1/i ≤ ∫_{k−1}^{n−1} (1/x) dx.

Evaluating these definite integrals gives us the bounds

(k/n) ln(n/k) ≤ Pr {S} ≤ (k/n) ln((n − 1)/(k − 1)),
which provide a rather tight bound for Pr { S}. Because you wish to
maximize your probability of success, let us focus on choosing the value
of k that maximizes the lower bound on Pr { S}. (Besides, the lower-bound expression is easier to maximize than the upper-bound
expression.) Differentiating the expression (k/n)(ln n − ln k) with respect to k, we obtain

(1/n)(ln n − ln k − 1).

Setting this derivative equal to 0, we see that you maximize the lower bound on the probability when ln k = ln n − 1 = ln(n/e) or, equivalently, when k = n/e. Thus, if you implement our strategy with k = n/e, you succeed in hiring the best-qualified applicant with probability at least 1/e.
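A quick simulation, not part of the text's analysis, supports the 1/e result; the function and parameter names here are invented for illustration:

```python
import math
import random

def success_rate(n, trials=20000, seed=1):
    """Empirical probability that the k = round(n/e) strategy hires the
    single best applicant, over random orderings of distinct scores."""
    random.seed(seed)
    k = round(n / math.e)
    wins = 0
    for _ in range(trials):
        scores = list(range(n))        # distinct scores; n - 1 is the best
        random.shuffle(scores)
        threshold = max(scores[:k])    # best score among the first k
        hired = n - 1                  # default: hire the last applicant
        for i in range(k, n):
            if scores[i] > threshold:
                hired = i
                break
        wins += scores[hired] == n - 1
    return wins / trials

# The analysis predicts a success probability of at least 1/e ~ 0.368.
print(success_rate(50))
```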
Exercises
5.4-1
How many people must there be in a room before the probability that
someone has the same birthday as you do is at least 1/2? How many
people must there be before the probability that at least two people have
a birthday on July 4 is greater than 1/2?
5.4-2
How many people must there be in a room before the probability that
two people have the same birthday is at least 0.99? For that many
people, what is the expected number of pairs of people who have the
same birthday?
5.4-3
You toss balls into b bins until some bin contains two balls. Each toss is
independent, and each ball is equally likely to end up in any bin. What
is the expected number of ball tosses?
★ 5.4-4
For the analysis of the birthday paradox, is it important that the
birthdays be mutually independent, or is pairwise independence
sufficient? Justify your answer.
★ 5.4-5
How many people should be invited to a party in order to make it likely
that there are three people with the same birthday?
★ 5.4-6
What is the probability that a k-string (defined on page 1179) over a set
of size n forms a k-permutation? How does this question relate to the
birthday paradox?
★ 5.4-7
You toss n balls into n bins, where each toss is independent and the ball is equally likely to end up in any bin. What is the expected number of
empty bins? What is the expected number of bins with exactly one ball?
★ 5.4-8
Sharpen the lower bound on streak length by showing that in n flips of a
fair coin, the probability is at least 1 – 1/ n that a streak of length lg n – 2
lg lg n consecutive heads occurs.
Problems
5-1 Probabilistic counting
With a b-bit counter, we can ordinarily only count up to 2^b − 1. With R. Morris's probabilistic counting, we can count up to a much larger value at the expense of some loss of precision.

We let a counter value of i represent a count of n_i for i = 0, 1, … , 2^b − 1, where the n_i form an increasing sequence of nonnegative values. We assume that the initial value of the counter is 0, representing a count of n_0 = 0. The INCREMENT operation works on a counter containing the value i in a probabilistic manner. If i = 2^b − 1, then the operation reports an overflow error. Otherwise, the INCREMENT operation increases the counter by 1 with probability 1/(n_{i+1} − n_i), and it leaves the counter unchanged with probability 1 − 1/(n_{i+1} − n_i).

If we select n_i = i for all i ≥ 0, then the counter is an ordinary one. More interesting situations arise if we select, say, n_i = 2^{i−1} for i > 0 or n_i = F_i (the i th Fibonacci number—see equation (3.31) on page 69). For this problem, assume that n_{2^b−1} is large enough that the probability of an overflow error is negligible.
a. Show that the expected value represented by the counter after n INCREMENT operations have been performed is exactly n.
b. The analysis of the variance of the count represented by the counter
depends on the sequence of the ni. Let us consider a simple case: ni =
100 i for all i ≥ 0. Estimate the variance in the value represented by the register after n INCREMENT operations have been performed.
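To make the INCREMENT mechanics concrete for the simple case n_i = 100i of part (b), here is a hypothetical Python sketch. It is not a solution to the problem, and all names are invented:

```python
import random

def increment(counter, n, b=8):
    """One probabilistic INCREMENT: the stored value i rises by 1 with
    probability 1/(n(i+1) - n(i)); for n(i) = 100*i that probability is 1/100."""
    if counter == 2**b - 1:
        raise OverflowError("counter is full")
    if random.random() < 1 / (n(counter + 1) - n(counter)):
        return counter + 1
    return counter

def represented(counter, n):
    """The count that the stored value represents."""
    return n(counter)

random.seed(1)
n = lambda i: 100 * i
c = 0
for _ in range(10_000):
    c = increment(c, n)
# Part (a) says the represented value is n in expectation, so this
# should land in the general vicinity of 10000.
print(represented(c, n))
```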
5-2 Searching an unsorted array
This problem examines three algorithms for searching for a value x in
an unsorted array A consisting of n elements.
Consider the following randomized strategy: pick a random index i
into A. If A[ i] = x, then terminate; otherwise, continue the search by picking a new random index into A. Continue picking random indices
into A until you find an index j such that A[ j] = x or until every element of A has been checked. This strategy may examine a given element more
than once, because it picks from the whole set of indices each time.
a. Write pseudocode for a procedure RANDOM-SEARCH to
implement the strategy above. Be sure that your algorithm terminates
when all indices into A have been picked.
b. Suppose that there is exactly one index i such that A[ i] = x. What is the expected number of indices into A that must be picked before x is
found and RANDOM-SEARCH terminates?
c. Generalizing your solution to part (b), suppose that there are k ≥ 1
indices i such that A[ i] = x. What is the expected number of indices into A that must be picked before x is found and RANDOM-SEARCH terminates? Your answer should be a function of n and k.
d. Suppose that there are no indices i such that A[ i] = x. What is the expected number of indices into A that must be picked before all
elements of A have been checked and RANDOM-SEARCH
terminates?
Now consider a deterministic linear search algorithm. The algorithm,
which we call DETERMINISTIC-SEARCH, searches A for x in order,
considering A[1], A[2], A[3], … , A[ n] until either it finds A[ i] = x or it reaches the end of the array. Assume that all possible permutations of
the input array are equally likely.
e. Suppose that there is exactly one index i such that A[ i] = x. What is the average-case running time of DETERMINISTIC-SEARCH?
What is the worst-case running time of DETERMINISTIC-
SEARCH?
f. Generalizing your solution to part (e), suppose that there are k ≥ 1
indices i such that A[ i] = x. What is the average-case running time of DETERMINISTIC-SEARCH? What is the worst-case running time
of DETERMINISTIC-SEARCH? Your answer should be a function
of n and k.
g. Suppose that there are no indices i such that A[ i] = x. What is the average-case running time of DETERMINISTIC-SEARCH? What is
the worst-case running time of DETERMINISTIC-SEARCH?
Finally, consider a randomized algorithm SCRAMBLE-SEARCH that
first randomly permutes the input array and then runs the deterministic
linear search given above on the resulting permuted array.
h. Letting k be the number of indices i such that A[ i] = x, give the worst-case and expected running times of SCRAMBLE-SEARCH for the
cases in which k = 0 and k = 1. Generalize your solution to handle the
case in which k ≥ 1.
i. Which of the three searching algorithms would you use? Explain your
answer.
Chapter notes
Bollobás [65], Hofri [223], and Spencer [420] contain a wealth of advanced probabilistic techniques. The advantages of randomized
algorithms are discussed and surveyed by Karp [249] and Rabin [372].
The textbook by Motwani and Raghavan [336] gives an extensive treatment of randomized algorithms.
The RANDOMLY-PERMUTE procedure is by Durstenfeld [128], based on an earlier procedure by Fisher and Yates [143, p. 34].
Several variants of the hiring problem have been widely studied.
These problems are more commonly referred to as “secretary
problems.” Examples of work in this area are the paper by Ajtai, Megiddo, and Waarts [11] and another by Kleinberg [258], which ties the secretary problem to online ad auctions.
Part II Sorting and Order Statistics

This part presents several algorithms that solve the following sorting
problem:
Input: A sequence of n numbers 〈a1, a2, … , an〉.
Output: A permutation (reordering) 〈a′1, a′2, … , a′n〉 of the input sequence such that a′1 ≤ a′2 ≤ ⋯ ≤ a′n.
The input sequence is usually an n-element array, although it may be represented in some other fashion, such as a linked list.
The structure of the data
In practice, the numbers to be sorted are rarely isolated values. Each is
usually part of a collection of data called a record. Each record contains
a key, which is the value to be sorted. The remainder of the record consists of satellite data, which are usually carried around with the key.
In practice, when a sorting algorithm permutes the keys, it must
permute the satellite data as well. If each record includes a large amount
of satellite data, it often pays to permute an array of pointers to the
records rather than the records themselves in order to minimize data
movement.
In a sense, it is these implementation details that distinguish an
algorithm from a full-blown program. A sorting algorithm describes the
method to determine the sorted order, regardless of whether what’s
being sorted are individual numbers or large records containing many
bytes of satellite data. Thus, when focusing on the problem of sorting, we typically assume that the input consists only of numbers. Translating
an algorithm for sorting numbers into a program for sorting records is
conceptually straightforward, although in a given engineering situation
other subtleties may make the actual programming task a challenge.
Why sorting?
Many computer scientists consider sorting to be the most fundamental
problem in the study of algorithms. There are several reasons:
Sometimes an application inherently needs to sort information.
For example, in order to prepare customer statements, banks need
to sort checks by check number.
Algorithms often use sorting as a key subroutine. For example, a
program that renders graphical objects which are layered on top
of each other might have to sort the objects according to an
“above” relation so that it can draw these objects from bottom to
top. We will see numerous algorithms in this text that use sorting
as a subroutine.
We can draw from among a wide variety of sorting algorithms,
and they employ a rich set of techniques. In fact, many important
techniques used throughout algorithm design appear in sorting
algorithms that have been developed over the years. In this way,
sorting is also a problem of historical interest.
We can prove a nontrivial lower bound for sorting (as we’ll do in
Chapter 8). Since the best upper bounds match the lower bound
asymptotically, we can conclude that certain of our sorting
algorithms are asymptotically optimal. Moreover, we can use the
lower bound for sorting to prove lower bounds for various other
problems.
Many engineering issues come to the fore when implementing
sorting algorithms. The fastest sorting program for a particular
situation may depend on many factors, such as prior knowledge
about the keys and satellite data, the memory hierarchy (caches
and virtual memory) of the host computer, and the software
environment. Many of these issues are best dealt with at the algorithmic level, rather than by “tweaking” the code.
Sorting algorithms
We introduced two algorithms that sort n real numbers in Chapter 2.
Insertion sort takes Θ( n 2) time in the worst case. Because its inner loops
are tight, however, it is a fast sorting algorithm for small input sizes.
Moreover, unlike merge sort, it sorts in place, meaning that at most a
constant number of elements of the input array are ever stored outside
the array, which can be advantageous for space efficiency. Merge sort
has a better asymptotic running time, Θ( n lg n), but the MERGE
procedure it uses does not operate in place. (We’ll see a parallelized
version of merge sort in Section 26.3. )
This part introduces two more algorithms that sort arbitrary real
numbers. Heapsort, presented in Chapter 6, sorts n numbers in place in O( n lg n) time. It uses an important data structure, called a heap, which can also implement a priority queue.
Quicksort, in Chapter 7, also sorts n numbers in place, but its worst-case running time is Θ( n 2). Its expected running time is Θ( n lg n), however, and it generally outperforms heapsort in practice. Like
insertion sort, quicksort has tight code, and so the hidden constant
factor in its running time is small. It is a popular algorithm for sorting
large arrays.
Insertion sort, merge sort, heapsort, and quicksort are all
comparison sorts: they determine the sorted order of an input array by
comparing elements. Chapter 8 begins by introducing the decision-tree model in order to study the performance limitations of comparison
sorts. Using this model, we prove a lower bound of Ω( n lg n) on the worst-case running time of any comparison sort on n inputs, thus
showing that heapsort and merge sort are asymptotically optimal
comparison sorts.
Chapter 8 then goes on to show that we might be able to beat this
lower bound of Ω( n lg n) if an algorithm can gather information about
the sorted order of the input by means other than comparing elements.
The counting sort algorithm, for example, assumes that the input
numbers belong to the set {0, 1, … , k}. By using array indexing as a tool for determining relative order, counting sort can sort n numbers in
Θ( k + n) time. Thus, when k = O( n), counting sort runs in time that is linear in the size of the input array. A related algorithm, radix sort, can
be used to extend the range of counting sort. If there are n integers to
sort, each integer has d digits, and each digit can take on up to k possible values, then radix sort can sort the numbers in Θ( d( n + k)) time.
When d is a constant and k is O( n), radix sort runs in linear time. A third algorithm, bucket sort, requires knowledge of the probabilistic
distribution of numbers in the input array. It can sort n real numbers uniformly distributed in the half-open interval [0, 1) in average-case
O( n) time.
The table on the following page summarizes the running times of the
sorting algorithms from Chapters 2 and 6–8. As usual, n denotes the number of items to sort. For counting sort, the items to sort are integers
in the set {0, 1, … , k}. For radix sort, each item is a d-digit number, where each digit takes on k possible values. For bucket sort, we assume
that the keys are real numbers uniformly distributed in the half-open
interval [0, 1). The rightmost column gives the average-case or expected
running time, indicating which one it gives when it differs from the
worst-case running time. We omit the average-case running time of
heapsort because we do not analyze it in this book.
Algorithm        Worst-case running time    Average-case/expected running time
Insertion sort   Θ(n²)                      Θ(n²)
Merge sort       Θ(n lg n)                  Θ(n lg n)
Heapsort         O(n lg n)                  —
Quicksort        Θ(n²)                      Θ(n lg n) (expected)
Counting sort    Θ(k + n)                   Θ(k + n)
Radix sort       Θ(d(n + k))                Θ(d(n + k))
Bucket sort      Θ(n²)                      Θ(n) (average-case)
Order statistics
The i th order statistic of a set of n numbers is the i th smallest number in the set. You can, of course, select the i th order statistic by sorting the
input and indexing the i th element of the output. With no assumptions
about the input distribution, this method runs in Ω( n lg n) time, as the lower bound proved in Chapter 8 shows.
Chapter 9 shows how to find the i th smallest element in O( n) time, even when the elements are arbitrary real numbers. We present a
randomized algorithm with tight pseudocode that runs in Θ( n 2) time in
the worst case, but whose expected running time is O( n). We also give a more complicated algorithm that runs in O( n) worst-case time.
Background
Although most of this part does not rely on difficult mathematics, some
sections do require mathematical sophistication. In particular, analyses
of quicksort, bucket sort, and the order-statistic algorithm use
probability, which is reviewed in Appendix C, and the material on probabilistic analysis and randomized algorithms in Chapter 5.
6 Heapsort

This chapter introduces another sorting algorithm: heapsort. Like
merge sort, but unlike insertion sort, heapsort’s running time is O( n lg n). Like insertion sort, but unlike merge sort, heapsort sorts in place: only a constant number of array elements are stored outside the input
array at any time. Thus, heapsort combines the better attributes of the
two sorting algorithms we have already discussed.
Heapsort also introduces another algorithm design technique: using
a data structure, in this case one we call a “heap,” to manage
information. Not only is the heap data structure useful for heapsort, but
it also makes an efficient priority queue. The heap data structure will
reappear in algorithms in later chapters.
The term “heap” was originally coined in the context of heapsort,
but it has since come to refer to “garbage-collected storage,” such as the
programming languages Java and Python provide. Please don’t be
confused. The heap data structure is not garbage-collected storage. This
book is consistent in using the term “heap” to refer to the data
structure, not the storage class.
6.1 Heaps

The (binary) heap data structure is an array object that we can view as a
nearly complete binary tree (see Section B.5.3), as shown in Figure 6.1.
Each node of the tree corresponds to an element of the array. The tree is
completely filled on all levels except possibly the lowest, which is filled
from the left up to a point. An array A[1 : n] that represents a heap is an object with an attribute A.heap-size, which represents how many
elements in the heap are stored within array A. That is, although A[1 : n]
may contain numbers, only the elements in A[1 : A.heap-size], where 0 ≤
A.heap-size ≤ n, are valid elements of the heap. If A.heap-size = 0, then the heap is empty. The root of the tree is A[1], and given the index i of a node, there’s a simple way to compute the indices of its parent, left
child, and right child with the one-line procedures PARENT, LEFT,
and RIGHT.
Figure 6.1 A max-heap viewed as (a) a binary tree and (b) an array. The number within the circle at each node in the tree is the value stored at that node. The number above a node is the corresponding index in the array. Above and below the array are lines showing parent-child relationships, with parents always to the left of their children. The tree has height 3, and the node at index 4 (with value 8) has height 1.
PARENT(i)
1  return ⌊i/2⌋

LEFT(i)
1  return 2i

RIGHT(i)
1  return 2i + 1
On most computers, the LEFT procedure can compute 2 i in one
instruction by simply shifting the binary representation of i left by one
bit position. Similarly, the RIGHT procedure can quickly compute 2 i +
1 by shifting the binary representation of i left by one bit position and
then adding 1. The PARENT procedure can compute ⌊i/2⌋ by shifting i
right one bit position. Good implementations of heapsort often
implement these procedures as macros or inline procedures.
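In a language with bitwise operators, these index computations look like the following Python sketch (illustrative only; the text's procedures are pseudocode):

```python
def parent(i):
    return i >> 1        # floor(i/2) via a right shift

def left(i):
    return i << 1        # 2i via a left shift

def right(i):
    return (i << 1) + 1  # 2i, then add 1

# Matches Figure 6.1: node 4 has parent 2 and children 8 and 9.
print(parent(4), left(4), right(4))  # → 2 8 9
```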
There are two kinds of binary heaps: max-heaps and min-heaps. In
both kinds, the values in the nodes satisfy a heap property, the specifics
of which depend on the kind of heap. In a max-heap, the max-heap property is that for every node i other than the root,
A[PARENT( i)] ≥ A[ i],
that is, the value of a node is at most the value of its parent. Thus, the
largest element in a max-heap is stored at the root, and the subtree
rooted at a node contains values no larger than that contained at the
node itself. A min-heap is organized in the opposite way: the min-heap
property is that for every node i other than the root,
A[PARENT( i)] ≤ A[ i].
The smallest element in a min-heap is at the root.
The heapsort algorithm uses max-heaps. Min-heaps commonly
implement priority queues, which we discuss in Section 6.5. We’ll be precise in specifying whether we need a max-heap or a min-heap for any
particular application, and when properties apply to either max-heaps
or min-heaps, we just use the term “heap.”
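As an illustration (not from the text), a small Python checker makes the max-heap property concrete; note the translation from the text's 1-indexed arrays to Python's 0-indexed lists:

```python
def is_max_heap(A, heap_size=None):
    """Check the max-heap property A[PARENT(i)] >= A[i] for every node
    other than the root. A is stored 0-indexed here, so the node at
    0-indexed position j has its parent at (j - 1) // 2."""
    n = len(A) if heap_size is None else heap_size
    return all(A[(j - 1) // 2] >= A[j] for j in range(1, n))

# An array satisfying the property (consistent with Figure 6.1's caption,
# where the node at index 4 holds the value 8):
print(is_max_heap([16, 14, 10, 8, 7, 9, 3, 2, 4, 1]))  # → True
print(is_max_heap([1, 2, 3]))  # root smaller than its children → False
```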
Viewing a heap as a tree, we define the height of a node in a heap to
be the number of edges on the longest simple downward path from the
node to a leaf, and we define the height of the heap to be the height of
its root. Since a heap of n elements is based on a complete binary tree,
its height is Θ(lg n) (see Exercise 6.1-2). As we’ll see, the basic operations on heaps run in time at most proportional to the height of
the tree and thus take O(lg n) time. The remainder of this chapter presents some basic procedures and shows how they are used in a
sorting algorithm and a priority-queue data structure.
The MAX-HEAPIFY procedure, which runs in O(lg n) time, is
the key to maintaining the max-heap property.
The BUILD-MAX-HEAP procedure, which runs in linear time,
produces a max-heap from an unordered input array.
The HEAPSORT procedure, which runs in O( n lg n) time, sorts an array in place.
The
procedures
MAX-HEAP-INSERT,
MAX-HEAP-
EXTRACT-MAX, MAX-HEAP-INCREASE-KEY, and MAX-
HEAP-MAXIMUM allow the heap data structure to implement
a priority queue. They run in O(lg n) time plus the time for mapping between objects being inserted into the priority queue
and indices in the heap.
Exercises
6.1-1
What are the minimum and maximum numbers of elements in a heap of
height h?
6.1-2
Show that an n-element heap has height ⌊lg n⌋.
6.1-3
Show that in any subtree of a max-heap, the root of the subtree contains
the largest value occurring anywhere in that subtree.
6.1-4
Where in a max-heap might the smallest element reside, assuming that
all elements are distinct?
6.1-5
At which levels in a max-heap might the k th largest element reside, for 2 ≤ k ≤ ⌊n/2⌋, assuming that all elements are distinct?
6.1-6
Is an array that is in sorted order a min-heap?
6.1-7
Is the array with values 〈33, 19, 20, 15, 13, 10, 2, 13, 16, 12〉 a max-heap?
6.1-8
Show that, with the array representation for storing an n-element heap, the leaves are the nodes indexed by ⌊n/2⌋ + 1, ⌊n/2⌋ + 2, … , n.
6.2 Maintaining the heap property
The procedure MAX-HEAPIFY on the facing page maintains the max-
heap property. Its inputs are an array A with the heap-size attribute and an index i into the array. When it is called, MAX-HEAPIFY assumes
that the binary trees rooted at LEFT( i) and RIGHT( i) are max-heaps,
but that A[ i] might be smaller than its children, thus violating the max-heap property. MAX-HEAPIFY lets the value at A[ i] “float down” in
the max-heap so that the subtree rooted at index i obeys the max-heap
property.
Figure 6.2 illustrates the action of MAX-HEAPIFY. Each step determines the largest of the elements A[ i], A[LEFT( i)], and A[RIGHT( i)] and stores the index of the largest element in largest. If A[ i] is largest, then the subtree rooted at node i is already a max-heap and nothing else needs to be done. Otherwise, one of the two children
contains the largest element. Positions i and largest swap their contents, which causes node i and its children to satisfy the max-heap property.
The node indexed by largest, however, just had its value decreased, and
thus the subtree rooted at largest might violate the max-heap property.
Consequently, MAX-HEAPIFY calls itself recursively on that subtree.
Figure 6.2 The action of MAX-HEAPIFY( A, 2), where A.heap-size = 10. The node that potentially violates the max-heap property is shown in blue. (a) The initial configuration, with A[2] at node i = 2 violating the max-heap property since it is not larger than both children. The max-heap property is restored for node 2 in (b) by exchanging A[2] with A[4], which destroys the max-heap property for node 4. The recursive call MAX-HEAPIFY( A, 4) now has i = 4. After A[4] and A[9] are swapped, as shown in (c), node 4 is fixed up, and the recursive call MAX-HEAPIFY( A, 9) yields no further change to the data structure.
MAX-HEAPIFY(A, i)
1  l = LEFT(i)
2  r = RIGHT(i)
3  if l ≤ A.heap-size and A[l] > A[i]
4      largest = l
5  else largest = i
6  if r ≤ A.heap-size and A[r] > A[largest]
7      largest = r
8  if largest ≠ i
9      exchange A[i] with A[largest]
10     MAX-HEAPIFY(A, largest)
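MAX-HEAPIFY translates almost line for line into a real language. Below is a minimal Python sketch using 0-based indexing, so that LEFT(i) = 2i + 1 and RIGHT(i) = 2i + 2; the function name and the driver are ours, not part of the book's pseudocode.

```python
def max_heapify(a, heap_size, i):
    """Float a[i] down until the subtree rooted at i satisfies the
    max-heap property. Assumes both children of i already root max-heaps."""
    left = 2 * i + 1                  # LEFT(i) with 0-based indexing
    right = 2 * i + 2                 # RIGHT(i)
    largest = i
    if left < heap_size and a[left] > a[largest]:
        largest = left
    if right < heap_size and a[right] > a[largest]:
        largest = right
    if largest != i:
        a[i], a[largest] = a[largest], a[i]   # swap, then recurse downward
        max_heapify(a, heap_size, largest)

# The configuration of Figure 6.2: node 2 of the 1-based tree (index 1 here)
# holds 4 and violates the max-heap property.
a = [16, 4, 10, 14, 7, 9, 3, 2, 8, 1]
max_heapify(a, len(a), 1)
print(a)   # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```

The value 4 floats down exactly as in Figure 6.2: first it swaps with 14, then with 8.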
To analyze MAX-HEAPIFY, let T ( n) be the worst-case running
time that the procedure takes on a subtree of size at most n. For a tree
rooted at a given node i, the running time is the Θ(1) time to fix up the
relationships among the elements A[ i], A[LEFT( i)], and A[RIGHT( i)], plus the time to run MAX-HEAPIFY on a subtree rooted at one of the
children of node i (assuming that the recursive call occurs). The
children’s subtrees each have size at most 2 n/3 (see Exercise 6.2-2), and
therefore we can describe the running time of MAX-HEAPIFY by the
recurrence
T(n) ≤ T(2n/3) + Θ(1).    (6.1)
The solution to this recurrence, by case 2 of the master theorem
(Theorem 4.1 on page 102), is T ( n) = O(lg n). Alternatively, we can characterize the running time of MAX-HEAPIFY on a node of height
h as O( h).
Exercises
6.2-1
Using Figure 6.2 as a model, illustrate the operation of MAX-HEAPIFY( A, 3) on the array A = 〈27, 17, 3, 16, 13, 10, 1, 5, 7, 12, 4, 8, 9, 0〉.
6.2-2
Show that each child of the root of an n-node heap is the root of a subtree containing at most 2 n/3 nodes. What is the smallest constant α
such that each subtree has at most α n nodes? How does that affect the
recurrence (6.1) and its solution?
6.2-3
Starting with the procedure MAX-HEAPIFY, write pseudocode for the
procedure MIN-HEAPIFY( A, i), which performs the corresponding
manipulation on a min-heap. How does the running time of MIN-HEAPIFY compare with that of MAX-HEAPIFY?
6.2-4
What is the effect of calling MAX-HEAPIFY( A, i) when the element A[ i] is larger than its children?
6.2-5
What is the effect of calling MAX-HEAPIFY( A, i) for i > A.heap-size/2?
6.2-6
The code for MAX-HEAPIFY is quite efficient in terms of constant
factors, except possibly for the recursive call in line 10, for which some
compilers might produce inefficient code. Write an efficient MAX-
HEAPIFY that uses an iterative control construct (a loop) instead of
recursion.
6.2-7
Show that the worst-case running time of MAX-HEAPIFY on a heap
of size n is Ω(lg n). ( Hint: For a heap with n nodes, give node values that cause MAX-HEAPIFY to be called recursively at every node on a
simple path from the root down to a leaf.)
6.3 Building a heap
The procedure BUILD-MAX-HEAP converts an array A[1 : n] into a
max-heap by calling MAX-HEAPIFY in a bottom-up manner. Exercise
6.1-8 says that the elements in the subarray A[⌊n/2⌋ + 1 : n] are all leaves of the tree, and so each is a 1-element heap to begin with. BUILD-MAX-HEAP goes through the remaining nodes of the tree and runs
MAX-HEAPIFY on each one. Figure 6.3 shows an example of the action of BUILD-MAX-HEAP.
BUILD-MAX-HEAP(A, n)
1  A.heap-size = n
2  for i = ⌊n/2⌋ downto 1
3      MAX-HEAPIFY(A, i)
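As a sketch, BUILD-MAX-HEAP looks like this in Python (0-based indexing, our own names): with 0-based indices the internal nodes are 0 through ⌊n/2⌋ − 1, so the loop runs from ⌊n/2⌋ − 1 down to 0.

```python
def max_heapify(a, heap_size, i):
    # Float a[i] down; both children of i must already root max-heaps.
    largest = i
    for child in (2 * i + 1, 2 * i + 2):      # LEFT(i), RIGHT(i)
        if child < heap_size and a[child] > a[largest]:
            largest = child
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, heap_size, largest)

def build_max_heap(a):
    # Bottom-up over the internal nodes, as in lines 2-3 of the pseudocode.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, len(a), i)

a = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]          # the input array of Figure 6.3
build_max_heap(a)
print(a)   # [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]
```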
To show why BUILD-MAX-HEAP works correctly, we use the
following loop invariant:
At the start of each iteration of the for loop of lines 2–3, each
node i + 1, i + 2, … , n is the root of a max-heap.
We need to show that this invariant is true prior to the first loop
iteration, that each iteration of the loop maintains the invariant, that
the loop terminates, and that the invariant provides a useful property to
show correctness when the loop terminates.
Initialization: Prior to the first iteration of the loop, i = ⌊n/2⌋. Each node ⌊n/2⌋ + 1, ⌊n/2⌋ + 2, … , n is a leaf and is thus the root of a trivial max-heap.
Maintenance: To see that each iteration maintains the loop invariant,
observe that the children of node i are numbered higher than i. By the
loop invariant, therefore, they are both roots of max-heaps. This is
precisely the condition required for the call MAX-HEAPIFY( A, i) to
make node i a max-heap root. Moreover, the MAX-HEAPIFY call
preserves the property that nodes i + 1, i + 2, … , n are all roots of max-heaps. Decrementing i in the for loop update reestablishes the
loop invariant for the next iteration.
Figure 6.3 The operation of BUILD-MAX-HEAP, showing the data structure before the call to MAX-HEAPIFY in line 3 of BUILD-MAX-HEAP. The node indexed by i in each iteration is shown in blue. (a) A 10-element input array A and the binary tree it represents. The loop index i refers to node 5 before the call MAX-HEAPIFY( A, i). (b) The data structure that results. The loop index i for the next iteration refers to node 4. (c)–(e) Subsequent iterations of the for loop in BUILD-MAX-HEAP. Observe that whenever MAX-HEAPIFY is called on a node, the two
subtrees of that node are both max-heaps. (f) The max-heap after BUILD-MAX-HEAP
finishes.
Termination: The loop makes exactly ⌊n/2⌋ iterations, and so it
terminates. At termination, i = 0. By the loop invariant, each node 1,
2, … , n is the root of a max-heap. In particular, node 1 is.
We can compute a simple upper bound on the running time of
BUILD-MAX-HEAP as follows. Each call to MAX-HEAPIFY costs
O(lg n) time, and BUILD-MAX-HEAP makes O( n) such calls. Thus, the running time is O( n lg n). This upper bound, though correct, is not as tight as it can be.

We can derive a tighter asymptotic bound by observing that the time
for MAX-HEAPIFY to run at a node varies with the height of the node
in the tree, and that the heights of most nodes are small. Our tighter
analysis relies on the properties that an n-element heap has height ⌊lg n⌋ (see Exercise 6.1-2) and at most ⌈n/2^(h+1)⌉ nodes of any height h (see Exercise 6.3-4).
The time required by MAX-HEAPIFY when called on a node of
height h is O(h). Letting c be the constant implicit in the asymptotic notation, we can express the total cost of BUILD-MAX-HEAP as being bounded from above by

∑_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ · ch.

As Exercise 6.3-2 shows, we have ⌈n/2^(h+1)⌉ ≥ 1/2 for 0 ≤ h ≤ ⌊lg n⌋. Since ⌈x⌉ ≤ 2x for any x ≥ 1/2, we have ⌈n/2^(h+1)⌉ ≤ n/2^h. We thus obtain

∑_{h=0}^{⌊lg n⌋} ⌈n/2^(h+1)⌉ · ch ≤ ∑_{h=0}^{⌊lg n⌋} (n/2^h) · ch ≤ cn ∑_{h=0}^{∞} h/2^h = 2cn = O(n).
Hence, we can build a max-heap from an unordered array in linear time.
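The linear bound is easy to check empirically: the number of exchanges BUILD-MAX-HEAP performs is at most the sum of the node heights, which is less than n. The following Python sketch (0-based indexing, our own names) counts exchanges for several input sizes.

```python
import random

def build_max_heap_counting(a):
    """Build a max-heap in place and return the number of exchanges."""
    swaps = 0
    def heapify(i, heap_size):
        nonlocal swaps
        largest = i
        for child in (2 * i + 1, 2 * i + 2):
            if child < heap_size and a[child] > a[largest]:
                largest = child
        if largest != i:
            a[i], a[largest] = a[largest], a[i]
            swaps += 1
            heapify(largest, heap_size)
    for i in range(len(a) // 2 - 1, -1, -1):
        heapify(i, len(a))
    return swaps

for n in (10, 100, 1000, 10000):
    a = [random.random() for _ in range(n)]
    # O(n) work in total, not O(n lg n): fewer than n exchanges.
    assert build_max_heap_counting(a) < n
```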
To build a min-heap, use the procedure BUILD-MIN-HEAP, which
is the same as BUILD-MAX-HEAP but with the call to MAX-
HEAPIFY in line 3 replaced by a call to MIN-HEAPIFY (see Exercise
6.2-3). BUILD-MIN-HEAP produces a min-heap from an unordered
linear array in linear time.
Exercises
6.3-1
Using Figure 6.3 as a model, illustrate the operation of BUILD-MAX-
HEAP on the array A = 〈5, 3, 17, 10, 84, 19, 6, 22, 9〉.
6.3-2
Show that ⌈n/2^(h+1)⌉ ≥ 1/2 for 0 ≤ h ≤ ⌊lg n⌋.
6.3-3
Why does the loop index i in line 2 of BUILD-MAX-HEAP decrease from ⌊n/2⌋ to 1 rather than increase from 1 to ⌊n/2⌋?
6.3-4
Show that there are at most ⌈n/2^(h+1)⌉ nodes of height h in any n-element heap.
6.4 The heapsort algorithm
The heapsort algorithm, given by the procedure HEAPSORT, starts by
calling the BUILD-MAX-HEAP procedure to build a max-heap on the
input array A[1 : n]. Since the maximum element of the array is stored at the root A[1], HEAPSORT can place it into its correct final position by
exchanging it with A[ n]. If the procedure then discards node n from the heap—and it can do so by simply decrementing A.heap-size—the
children of the root remain max-heaps, but the new root element might
violate the max-heap property. To restore the max-heap property, the
procedure just calls MAX-HEAPIFY( A, 1), which leaves a max-heap in
A[1 : n – 1]. The HEAPSORT procedure then repeats this process for the max-heap of size n – 1 down to a heap of size 2. (See Exercise 6.4-2
for a precise loop invariant.)
HEAPSORT(A, n)
1  BUILD-MAX-HEAP(A, n)
2  for i = n downto 2
3      exchange A[1] with A[i]
4      A.heap-size = A.heap-size – 1
5      MAX-HEAPIFY(A, 1)
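A compact Python rendering of HEAPSORT (0-based indexing; the helper names are ours): build the max-heap, then repeatedly move the root to the end of the shrinking heap.

```python
def max_heapify(a, heap_size, i):
    largest = i
    for child in (2 * i + 1, 2 * i + 2):
        if child < heap_size and a[child] > a[largest]:
            largest = child
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        max_heapify(a, heap_size, largest)

def heapsort(a):
    # Line 1 of HEAPSORT: build a max-heap bottom-up.
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, len(a), i)
    # Lines 2-5: move the maximum to its final slot, shrink, restore.
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]     # largest remaining element to a[end]
        max_heapify(a, end, 0)          # heap now has `end` elements

a = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
heapsort(a)
print(a)   # [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
```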
Figure 6.4 shows an example of the operation of HEAPSORT after
line 1 has built the initial max-heap. The figure shows the max-heap
before the first iteration of the for loop of lines 2–5 and after each
iteration.
Figure 6.4 The operation of HEAPSORT. (a) The max-heap data structure just after BUILD-
MAX-HEAP has built it in line 1. (b)–(j) The max-heap just after each call of MAX-HEAPIFY
in line 5, showing the value of i at that time. Only blue nodes remain in the heap. Tan nodes contain the largest values in the array, in sorted order. (k) The resulting sorted array A.
The HEAPSORT procedure takes O( n lg n) time, since the call to BUILD-MAX-HEAP takes O( n) time and each of the n – 1 calls to MAX-HEAPIFY takes O(lg n) time.
Exercises
6.4-1
Using Figure 6.4 as a model, illustrate the operation of HEAPSORT on the array A = 〈5, 13, 2, 25, 7, 17, 20, 8, 4〉.
6.4-2
Argue the correctness of HEAPSORT using the following loop
invariant:
At the start of each iteration of the for loop of lines 2–5, the
subarray A[1 : i] is a max-heap containing the i smallest elements of A[1 : n], and the subarray A[ i + 1 : n] contains the n
– i largest elements of A[1 : n], sorted.
6.4-3
What is the running time of HEAPSORT on an array A of length n that
is already sorted in increasing order? How about if the array is already
sorted in decreasing order?
6.4-4
Show that the worst-case running time of HEAPSORT is Ω( n lg n).
★ 6.4-5
Show that when all the elements of A are distinct, the best-case running
time of HEAPSORT is Ω( n lg n).
6.5 Priority queues
In Chapter 8, we will see that any comparison-based sorting algorithm requires Ω(n lg n) comparisons and hence Ω(n lg n) time. Therefore, heapsort is asymptotically optimal among comparison-based sorting
algorithms. Yet, a good implementation of quicksort, presented in
Chapter 7, usually beats it in practice. Nevertheless, the heap data structure itself has many uses. In this section, we present one of the
most popular applications of a heap: as an efficient priority queue. As
with heaps, priority queues come in two forms: max-priority queues and
min-priority queues. We’ll focus here on how to implement max-priority
queues, which are in turn based on max-heaps. Exercise 6.5-3 asks you
to write the procedures for min-priority queues.
A priority queue is a data structure for maintaining a set S of elements, each with an associated value called a key. A max-priority queue supports the following operations:
INSERT( S, x, k) inserts the element x with key k into the set S, which is equivalent to the operation S = S ⋃ { x}.
MAXIMUM( S) returns the element of S with the largest key.
EXTRACT-MAX( S) removes and returns the element of S with the
largest key.
INCREASE-KEY( S, x, k) increases the value of element x’s key to the new value k, which is assumed to be at least as large as x’s current key value.
Among their other applications, you can use max-priority queues to
schedule jobs on a computer shared among multiple users. The max-
priority queue keeps track of the jobs to be performed and their relative
priorities. When a job is finished or interrupted, the scheduler selects the
highest-priority job from among those pending by calling EXTRACT-
MAX. The scheduler can add a new job to the queue at any time by
calling INSERT.
Alternatively, a min-priority queue supports the operations INSERT,
MINIMUM, EXTRACT-MIN, and DECREASE-KEY. A min-
priority queue can be used in an event-driven simulator. The items in
the queue are events to be simulated, each with an associated time of
occurrence that serves as its key. The events must be simulated in order
of their time of occurrence, because the simulation of an event can cause
other events to be simulated in the future. The simulation program calls
EXTRACT-MIN at each step to choose the next event to simulate. As
new events are produced, the simulator inserts them into the min-
priority queue by calling INSERT. We’ll see other uses for min-priority
queues, highlighting the DECREASE-KEY operation, in Chapters 21
and 22.
When you use a heap to implement a priority queue within a given
application, elements of the priority queue correspond to objects in the
application. Each object contains a key. If the priority queue is
implemented by a heap, you need to determine which application object
corresponds to a given heap element, and vice versa. Because the heap
elements are stored in an array, you need a way to map application
objects to and from array indices.
One way to map between application objects and heap elements uses
handles, which are additional information stored in the objects and heap
elements that give enough information to perform the mapping.
Handles are often implemented to be opaque to the surrounding code,
thereby maintaining an abstraction barrier between the application and
the priority queue. For example, the handle within an application object
might contain the corresponding index into the heap array. But since
only the code for the priority queue accesses this index, the index is
entirely hidden from the application code. Because heap elements
change locations within the array during heap operations, an actual
implementation of the priority queue, upon relocating a heap element,
must also update the array indices in the corresponding handles.
Conversely, each element in the heap might contain a pointer to the
corresponding application object, but the heap element knows this
pointer as only an opaque handle and the application maps this handle
to an application object. Typically, the worst-case overhead for
maintaining handles is O(1) per access.
As an alternative to incorporating handles in application objects, you
can store within the priority queue a mapping from application objects
to array indices in the heap. The advantage of doing so is that the
mapping is contained entirely within the priority queue, so that the
application objects need no further embellishment. The disadvantage
lies in the additional cost of establishing and maintaining the mapping.
One option for the mapping is a hash table (see Chapter 11).¹ The added expected time for a hash table to map an object to an array index is just O(1), though the worst-case time can be as bad as Θ(n).
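One way to realize such a mapping in Python is a dict keyed by the application object, updated on every swap of heap slots. The class and method names below are purely illustrative, not from the text.

```python
class HeapIndexMap:
    """Sketch of a heap array plus an object-to-index mapping (a hash table)."""

    def __init__(self):
        self.heap = []        # heap[i] is an application object
        self.index = {}       # maps each object to its current heap index

    def append(self, obj):
        self.heap.append(obj)
        self.index[obj] = len(self.heap) - 1

    def swap(self, i, j):
        # Every swap of heap slots must also update both mapping entries,
        # so that object-to-index lookups stay O(1) expected time.
        self.heap[i], self.heap[j] = self.heap[j], self.heap[i]
        self.index[self.heap[i]] = i
        self.index[self.heap[j]] = j

m = HeapIndexMap()
for job in ("backup", "compile", "render"):
    m.append(job)
m.swap(0, 2)
print(m.index["backup"])   # 2
```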
Let’s see how to implement the operations of a max-priority queue
using a max-heap. In the previous sections, we treated the array
elements as the keys to be sorted, implicitly assuming that any satellite data moved with the corresponding keys. When a heap implements a
priority queue, we instead treat each array element as a pointer to an
object in the priority queue, so that the object is analogous to the
satellite data when sorting. We further assume that each such object has
an attribute key, which determines where in the heap the object belongs.
For a heap implemented by an array A, we refer to A[ i]. key.
The procedure MAX-HEAP-MAXIMUM on the facing page
implements the MAXIMUM operation in Θ(1) time, and MAX-HEAP-
EXTRACT-MAX implements the operation EXTRACT-MAX. MAX-
HEAP-EXTRACT-MAX is similar to the for loop body (lines 3–5) of
the HEAPSORT procedure. We implicitly assume that MAX-
HEAPIFY compares priority-queue objects based on their key
attributes. We also assume that when MAX-HEAPIFY exchanges
elements in the array, it is exchanging pointers and also that it updates
the mapping between objects and array indices. The running time of
MAX-HEAP-EXTRACT-MAX is O(lg n), since it performs only a
constant amount of work on top of the O(lg n) time for MAX-
HEAPIFY, plus whatever overhead is incurred within MAX-
HEAPIFY for mapping priority-queue objects to array indices.
The procedure MAX-HEAP-INCREASE-KEY on page 176
implements the INCREASE-KEY operation. It first verifies that the
new key k will not cause the key in the object x to decrease, and if there is no problem, it gives x the new key value. The procedure then finds the
index i in the array corresponding to object x, so that A[ i] is x. Because increasing the key of A[ i] might violate the max-heap property, the procedure then, in a manner reminiscent of the insertion loop (lines 5–
7) of INSERTION-SORT on page 19, traverses a simple path from this
node toward the root to find a proper place for the newly increased key.
As MAX-HEAP-INCREASE-KEY traverses this path, it repeatedly
compares an element’s key to that of its parent, exchanging pointers and
continuing if the element’s key is larger, and terminating if the element’s
key is smaller, since the max-heap property now holds. (See Exercise
6.5-7 for a precise loop invariant.) Like MAX-HEAPIFY when used in
a priority queue, MAX-HEAP-INCREASE-KEY updates the
information that maps objects to array indices when array elements are
exchanged. Figure 6.5 shows an example of a MAX-HEAP-INCREASE-KEY operation. In addition to the overhead for mapping
priority queue objects to array indices, the running time of MAX-
HEAP-INCREASE-KEY on an n-element heap is O(lg n), since the path traced from the node updated in line 3 to the root has length O(lg
n).
MAX-HEAP-MAXIMUM(A)
1  if A.heap-size < 1
2      error “heap underflow”
3  return A[1]

MAX-HEAP-EXTRACT-MAX(A)
1  max = MAX-HEAP-MAXIMUM(A)
2  A[1] = A[A.heap-size]
3  A.heap-size = A.heap-size – 1
4  MAX-HEAPIFY(A, 1)
5  return max
The procedure MAX-HEAP-INSERT on the next page implements
the INSERT operation. It takes as inputs the array A implementing the
max-heap, the new object x to be inserted into the max-heap, and the
size n of array A. The procedure first verifies that the array has room for the new element. It then expands the max-heap by adding to the tree a
new leaf whose key is –∞. Then it calls MAX-HEAP-INCREASE-KEY
to set the key of this new element to its correct value and maintain the
max-heap property. The running time of MAX-HEAP-INSERT on an
n-element heap is O(lg n) plus the overhead for mapping priority queue objects to indices.
In summary, a heap can support any priority-queue operation on a
set of size n in O(lg n) time, plus the overhead for mapping priority queue objects to array indices.
MAX-HEAP-INCREASE-KEY(A, x, k)
1  if k < x.key
2      error “new key is smaller than current key”
3  x.key = k
4  find the index i in array A where object x occurs
5  while i > 1 and A[PARENT(i)].key < A[i].key
6      exchange A[i] with A[PARENT(i)], updating the information that maps priority queue objects to array indices
7      i = PARENT(i)
MAX-HEAP-INSERT(A, x, n)
1  if A.heap-size == n
2      error “heap overflow”
3  A.heap-size = A.heap-size + 1
4  k = x.key
5  x.key = –∞
6  A[A.heap-size] = x
7  map x to index heap-size in the array
8  MAX-HEAP-INCREASE-KEY(A, x, k)
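The four operations fit in a short Python sketch if we treat the objects as bare keys (as Exercises 6.5-1 and 6.5-2 do), which sidesteps the object-to-index mapping. Indexing is 0-based, so PARENT(i) = (i − 1)/2; all names are ours.

```python
def _max_heapify(a, i):
    largest = i
    for child in (2 * i + 1, 2 * i + 2):
        if child < len(a) and a[child] > a[largest]:
            largest = child
    if largest != i:
        a[i], a[largest] = a[largest], a[i]
        _max_heapify(a, largest)

def maximum(a):
    if not a:
        raise IndexError("heap underflow")
    return a[0]

def extract_max(a):
    m = maximum(a)
    a[0] = a[-1]              # move the last element to the root,
    a.pop()                   # shrink the heap by one,
    if a:
        _max_heapify(a, 0)    # and float the new root down
    return m

def increase_key(a, i, k):
    if k < a[i]:
        raise ValueError("new key is smaller than current key")
    a[i] = k
    while i > 0 and a[(i - 1) // 2] < a[i]:   # walk up toward the root
        a[i], a[(i - 1) // 2] = a[(i - 1) // 2], a[i]
        i = (i - 1) // 2

def insert(a, k):
    a.append(float("-inf"))   # new leaf with key -infinity
    increase_key(a, len(a) - 1, k)

h = []
for k in [15, 13, 9, 5, 12, 8, 7]:
    insert(h, k)
print(extract_max(h))   # 15
```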
Exercises
6.5-1
Suppose that the objects in a max-priority queue are just keys. Illustrate
the operation of MAX-HEAP-EXTRACT-MAX on the heap A = 〈15,
13, 9, 5, 12, 8, 7, 4, 0, 6, 2, 1〉.
6.5-2
Suppose that the objects in a max-priority queue are just keys. Illustrate
the operation of MAX-HEAP-INSERT( A, 10) on the heap A = 〈15, 13,
9, 5, 12, 8, 7, 4, 0, 6, 2, 1〉.
6.5-3
Write pseudocode to implement a min-priority queue with a min-heap
by writing the procedures MIN-HEAP-MINIMUM, MIN-HEAP-
EXTRACT-MIN, MIN-HEAP-DECREASE-KEY, and MIN-HEAP-
INSERT.
6.5-4
Write pseudocode for the procedure MAX-HEAP-DECREASE-
KEY( A, x, k) in a max-heap. What is the running time of your procedure?
Figure 6.5 The operation of MAX-HEAP-INCREASE-KEY. Only the key of each element in
the priority queue is shown. The node indexed by i in each iteration is shown in blue. (a) The max-heap of Figure 6.4(a) with i indexing the node whose key is about to be increased. (b) This node has its key increased to 15. (c) After one iteration of the while loop of lines 5–7, the node and its parent have exchanged keys, and the index i moves up to the parent. (d) The max-heap after one more iteration of the while loop. At this point, A[PARENT( i)] ≥ A[ i]. The max-heap property now holds and the procedure terminates.
6.5-5
Why does MAX-HEAP-INSERT bother setting the key of the inserted
object to –∞ in line 5 given that line 8 will set the object’s key to the desired value?
6.5-6
Professor Uriah suggests replacing the while loop of lines 5–7 in MAX-HEAP-INCREASE-KEY by a call to MAX-HEAPIFY. Explain the
flaw in the professor’s idea.
6.5-7
Argue the correctness of MAX-HEAP-INCREASE-KEY using the
following loop invariant:
At the start of each iteration of the while loop of lines 5–7:
a. If both nodes PARENT( i) and LEFT( i) exist, then
A[PARENT( i)]. key ≥ A[LEFT( i)]. key.
b. If both nodes PARENT( i) and RIGHT( i) exist, then
A[PARENT( i)]. key ≥ A[RIGHT( i)]. key.
c. The subarray A[1 : A.heap-size] satisfies the max-heap property,
except that there may be one violation, which is that A[ i]. key may be greater than A[PARENT( i)]. key.
You may assume that the subarray A[1 : A.heap-size] satisfies the max-
heap property at the time MAX-HEAP-INCREASE-KEY is called.
6.5-8
Each exchange operation on line 6 of MAX-HEAP-INCREASE-KEY
typically requires three assignments, not counting the updating of the
mapping from objects to array indices. Show how to use the idea of the
inner loop of INSERTION-SORT to reduce the three assignments to
just one assignment.
6.5-9
Show how to implement a first-in, first-out queue with a priority queue.
Show how to implement a stack with a priority queue. (Queues and
stacks are defined in Section 10.1.3.)
6.5-10
The operation MAX-HEAP-DELETE( A, x) deletes the object x from
max-heap A. Give an implementation of MAX-HEAP-DELETE for an
n-element max-heap that runs in O(lg n) time plus the overhead for mapping priority queue objects to array indices.
6.5-11
Give an O(n lg k)-time algorithm to merge k sorted lists into one sorted list, where n is the total number of elements in all the input lists. (Hint: Use a min-heap for k-way merging.)
Problems
6-1 Building a heap using insertion
One way to build a heap is by repeatedly calling MAX-HEAP-INSERT
to insert the elements into the heap. Consider the procedure BUILD-
MAX-HEAP′ on the facing page. It assumes that the objects being
inserted are just the heap elements.
BUILD-MAX-HEAP′(A, n)
1  A.heap-size = 1
2  for i = 2 to n
3      MAX-HEAP-INSERT(A, A[i], n)
a. Do the procedures BUILD-MAX-HEAP and BUILD-MAX-HEAP′
always create the same heap when run on the same input array? Prove
that they do, or provide a counterexample.
b. Show that in the worst case, BUILD-MAX-HEAP′ requires Θ( n lg n) time to build an n-element heap.
6-2 Analysis of d-ary heaps
A d-ary heap is like a binary heap, but (with one possible exception) nonleaf nodes have d children instead of two children. In all parts of this problem, assume that the time to maintain the mapping between
objects and heap elements is O(1) per operation.
a. Describe how to represent a d-ary heap in an array.
b. Using Θ-notation, express the height of a d-ary heap of n elements in terms of n and d.
c. Give an efficient implementation of EXTRACT-MAX in a d-ary max-heap. Analyze its running time in terms of d and n.
d. Give an efficient implementation of INCREASE-KEY in a d-ary
max-heap. Analyze its running time in terms of d and n.
e. Give an efficient implementation of INSERT in a d-ary max-heap.
Analyze its running time in terms of d and n.
6-3 Young tableaus
An m × n Young tableau is an m × n matrix such that the entries of each row are in sorted order from left to right and the entries of each column
are in sorted order from top to bottom. Some of the entries of a Young
tableau may be ∞, which we treat as nonexistent elements. Thus, a
Young tableau can be used to hold r ≤ mn finite numbers.
a. Draw a 4 × 4 Young tableau containing the elements {9, 16, 3, 2, 4, 8,
5, 14, 12}.
b. Argue that an m × n Young tableau Y is empty if Y [1, 1] = ∞. Argue that Y is full (contains mn elements) if Y [ m, n] < ∞.
c. Give an algorithm to implement EXTRACT-MIN on a nonempty m
× n Young tableau that runs in O( m + n) time. Your algorithm should use a recursive subroutine that solves an m × n problem by recursively
solving either an ( m – 1) × n or an m × ( n – 1) subproblem. ( Hint: Think about MAX-HEAPIFY.) Explain why your implementation of
EXTRACT-MIN runs in O( m + n) time.
d. Show how to insert a new element into a nonfull m × n Young tableau in O( m + n) time.
e. Using no other sorting method as a subroutine, show how to use an n × n Young tableau to sort n² numbers in O(n³) time.
f. Give an O( m + n)-time algorithm to determine whether a given number is stored in a given m × n Young tableau.
Chapter notes

The heapsort algorithm was invented by Williams [456], who also described how to implement a priority queue with a heap. The BUILD-MAX-HEAP procedure was suggested by Floyd [145]. Schaffer and Sedgewick [395] showed that in the best case, the number of times elements move in the heap during heapsort is approximately ( n/2) lg n
and that the average number of moves is approximately n lg n.
We use min-heaps to implement min-priority queues in Chapters 15,
21, and 22. Other, more complicated, data structures give better time bounds for certain min-priority queue operations. Fredman and Tarjan
[156] developed Fibonacci heaps, which support INSERT and
DECREASE-KEY in O(1) amortized time (see Chapter 16). That is, the average worst-case running time for these operations is O(1). Brodal,
Lagogiannis, and Tarjan [73] subsequently devised strict Fibonacci heaps, which make these time bounds the actual running times. If the
keys are unique and drawn from the set {0, 1, … , n – 1} of nonnegative
integers, van Emde Boas trees [440, 441] support the operations INSERT, DELETE, SEARCH, MINIMUM, MAXIMUM,
PREDECESSOR, and SUCCESSOR in O(lg lg n) time.
If the data are b-bit integers, and the computer memory consists of addressable b-bit words, Fredman and Willard [157] showed how to implement MINIMUM in O(1) time and INSERT and EXTRACT-MIN in O(√(lg n)) time. Thorup [436] has improved the bound to O(lg lg n) time by using randomized hashing, requiring only linear space.
An important special case of priority queues occurs when the
sequence of EXTRACT-MIN operations is monotone, that is, the values
returned by successive EXTRACT-MIN operations are monotonically
increasing over time. This case arises in several important applications,
such as Dijkstra’s single-source shortest-paths algorithm, which we
discuss in Chapter 22, and in discrete-event simulation. For Dijkstra’s algorithm it is particularly important that the DECREASE-KEY
operation be implemented efficiently. For the monotone case, if the data
are integers in the range 1, 2, … , C, Ahuja, Mehlhorn, Orlin, and
Tarjan [8] describe how to implement EXTRACT-MIN and INSERT in
O(lg C) amortized time (Chapter 16 presents amortized analysis) and
DECREASE-KEY in O(1) time, using a data structure called a radix
heap. The O(lg C) bound can be improved to O(√(lg C)) using Fibonacci heaps in conjunction with radix heaps. Cherkassky, Goldberg, and
Silverstein [90] further improved the bound to O(lg^(1/3+ϵ) C) expected time by combining the multilevel bucketing structure of Denardo and Fox [112] with the heap of Thorup mentioned earlier. Raman [375] further improved these results to obtain a bound of O(min {lg^(1/4+ϵ) C, lg^(1/3+ϵ) n}), for any fixed ϵ > 0.
Many other variants of heaps have been proposed. Brodal [72]
surveys some of these developments.
1 In Python, dictionaries are implemented with hash tables.
7 Quicksort
The quicksort algorithm has a worst-case running time of Θ(n²) on an
input array of n numbers. Despite this slow worst-case running time,
quicksort is often the best practical choice for sorting because it is
remarkably efficient on average: its expected running time is Θ( n lg n) when all numbers are distinct, and the constant factors hidden in the
Θ( n lg n) notation are small. Unlike merge sort, it also has the advantage of sorting in place (see page 158), and it works well even in
virtual-memory environments.
Our study of quicksort is broken into four sections. Section 7.1
describes the algorithm and an important subroutine used by quicksort
for partitioning. Because the behavior of quicksort is complex, we’ll
start with an intuitive discussion of its performance in Section 7.2 and analyze it precisely at the end of the chapter. Section 7.3 presents a randomized version of quicksort. When all elements are distinct, 1 this randomized algorithm has a good expected running time and no
particular input elicits its worst-case behavior. (See Problem 7-2 for the
case in which elements may be equal.) Section 7.4 analyzes the randomized algorithm, showing that it runs in Θ( n 2) time in the worst
case and, assuming distinct elements, in expected O( n lg n) time.
7.1 Description of quicksort
Quicksort, like merge sort, applies the divide-and-conquer method introduced in Section 2.3.1. Here is the three-step divide-and-conquer process for sorting a subarray A[p : r]:
Divide by partitioning (rearranging) the array A[ p : r] into two (possibly empty) subarrays A[ p : q – 1] (the low side) and A[ q + 1 : r] (the high side) such that each element in the low side of the partition is less than
or equal to the pivot A[ q], which is, in turn, less than or equal to each element in the high side. Compute the index q of the pivot as part of
this partitioning procedure.
Conquer by calling quicksort recursively to sort each of the subarrays
A[ p : q – 1] and A[ q + 1 : r].
Combine by doing nothing: because the two subarrays are already
sorted, no work is needed to combine them. All elements in A[ p : q –
1] are sorted and less than or equal to A[ q], and all elements in A[ q + 1
: r] are sorted and greater than or equal to the pivot A[ q]. The entire subarray A[ p : r] cannot help but be sorted!
The QUICKSORT procedure implements quicksort. To sort an
entire n-element array A[1 : n], the initial call is QUICKSORT ( A, 1, n).
QUICKSORT(A, p, r)
1  if p < r
2      // Partition the subarray around the pivot, which ends up in A[q].
3      q = PARTITION(A, p, r)
4      QUICKSORT(A, p, q – 1)    // recursively sort the low side
5      QUICKSORT(A, q + 1, r)    // recursively sort the high side
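A direct Python transcription of QUICKSORT, together with the PARTITION procedure it calls (PARTITION is defined in the text that follows; the code below anticipates it). Indexing is 0-based, and the pivot is the last element of the subarray.

```python
def partition(a, p, r):
    x = a[r]                          # the pivot
    i = p - 1                         # highest index into the low side
    for j in range(p, r):             # process each element other than the pivot
        if a[j] <= x:                 # belongs on the low side?
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # pivot goes just past the low side
    return i + 1                      # new index of the pivot

def quicksort(a, p, r):
    if p < r:
        q = partition(a, p, r)        # pivot ends up in a[q]
        quicksort(a, p, q - 1)        # recursively sort the low side
        quicksort(a, q + 1, r)        # recursively sort the high side

a = [2, 8, 7, 1, 3, 5, 6, 4]          # the 8-element array of Figure 7.1
quicksort(a, 0, len(a) - 1)
print(a)   # [1, 2, 3, 4, 5, 6, 7, 8]
```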
Partitioning the array
The key to the algorithm is the PARTITION procedure on the next
page, which rearranges the subarray A[ p : r] in place, returning the index of the dividing point between the two sides of the partition.
Figure 7.1 shows how PARTITION works on an 8-element array.
PARTITION always selects the element x = A[ r] as the pivot. As the
procedure runs, each element falls into exactly one of four regions, some of which may be empty. At the start of each iteration of the for loop in
lines 3–6, the regions satisfy certain properties, shown in Figure 7.2. We state these properties as a loop invariant:
PARTITION(A, p, r)
1  x = A[r]                         // the pivot
2  i = p – 1                        // highest index into the low side
3  for j = p to r – 1               // process each element other than the pivot
4      if A[j] ≤ x                  // does this element belong on the low side?
5          i = i + 1                // index of a new slot in the low side
6          exchange A[i] with A[j]  // put this element there
7  exchange A[i + 1] with A[r]      // pivot goes just to the right of the low side
8  return i + 1                     // new index of the pivot
At the beginning of each iteration of the loop of lines 3–6, for
any array index k, the following conditions hold:
1. if p ≤ k ≤ i, then A[ k] ≤ x (the tan region of Figure 7.2);
2. if i + 1 ≤ k ≤ j – 1, then A[ k] > x (the blue region);
3. if k = r, then A[ k] = x (the yellow region).
We need to show that this loop invariant is true prior to the first
iteration, that each iteration of the loop maintains the invariant, that
the loop terminates, and that correctness follows from the invariant
when the loop terminates.
Initialization: Prior to the first iteration of the loop, we have i = p – 1
and j = p. Because no values lie between p and i and no values lie between i + 1 and j – 1, the first two conditions of the loop invariant are trivially satisfied. The assignment in line 1 satisfies the third
condition.
Maintenance: As Figure 7.3 shows, we consider two cases, depending on the outcome of the test in line 4. Figure 7.3(a) shows what happens when A[ j] > x: the only action in the loop is to increment j. After j has been incremented, the second condition holds for A[ j – 1] and all other entries remain unchanged. Figure 7.3(b) shows what happens when A[ j] ≤ x: the loop increments i, swaps A[ i] and A[ j], and then increments j. Because of the swap, we now have that A[ i] ≤ x, and condition 1 is satisfied. Similarly, we also have that A[ j – 1] > x, since the item that was swapped into A[ j – 1] is, by the loop invariant, greater than x.
Termination: Since the loop makes exactly r – p iterations, it terminates, whereupon j = r. At that point, the unexamined subarray A[ j : r – 1] is empty, and every entry in the array belongs to one of the other three
sets described by the invariant. Thus, the values in the array have been
partitioned into three sets: those less than or equal to x (the low side),
those greater than x (the high side), and a singleton set containing x
(the pivot).
Figure 7.1 The operation of PARTITION on a sample array. Array entry A[ r] becomes the pivot element x. Tan array elements all belong to the low side of the partition, with values at most x.
Blue elements belong to the high side, with values greater than x. White elements have not yet been put into either side of the partition, and the yellow element is the pivot x. (a) The initial array and variable settings. None of the elements have been placed into either side of the partition. (b) The value 2 is “swapped with itself” and put into the low side. (c)–(d) The values 8
and 7 are placed into the high side. (e) The values 1 and 8 are swapped, and the low side grows. (f) The values 3 and 7 are swapped, and the low side grows. (g)–(h) The high side of the partition grows to include 5 and 6, and the loop terminates. (i) Line 7 swaps the pivot element so that it lies between the two sides of the partition, and line 8 returns the pivot’s new index.
The final two lines of PARTITION finish up by swapping the pivot
with the leftmost element greater than x, thereby moving the pivot into
its correct place in the partitioned array, and then returning the pivot’s
new index. The output of PARTITION now satisfies the specifications
given for the divide step. In fact, it satisfies a slightly stronger condition:

after line 3 of QUICKSORT, A[ q] is strictly less than every element of
A[ q + 1 : r].
Figure 7.2 The four regions maintained by the procedure PARTITION on a subarray A[ p : r].
The tan values in A[ p : i] are all less than or equal to x, the blue values in A[ i + 1 : j – 1] are all greater than x, the white values in A[ j : r – 1] have unknown relationships to x, and A[ r] = x.
Figure 7.3 The two cases for one iteration of procedure PARTITION. (a) If A[ j] > x, the only action is to increment j, which maintains the loop invariant. (b) If A[ j] ≤ x, index i is incremented, A[ i] and A[ j] are swapped, and then j is incremented. Again, the loop invariant is maintained.
Exercise 7.1-3 asks you to show that the running time of
PARTITION on a subarray A[ p : r] of n = r – p + 1 elements is Θ( n).
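The PARTITION and QUICKSORT procedures translate almost line for line into executable code. The following Python sketch (our transcription, not part of the text) uses 0-based indices in place of the pseudocode’s 1-based ones and runs on the sample array of Figure 7.1:

```python
def partition(A, p, r):
    x = A[r]                          # the pivot
    i = p - 1                         # highest index into the low side
    for j in range(p, r):             # process each element other than the pivot
        if A[j] <= x:                 # does this element belong on the low side?
            i += 1                    # index of a new slot in the low side
            A[i], A[j] = A[j], A[i]   # put this element there
    A[i + 1], A[r] = A[r], A[i + 1]   # pivot goes just right of the low side
    return i + 1                      # new index of the pivot

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)        # pivot ends up in A[q]
        quicksort(A, p, q - 1)        # recursively sort the low side
        quicksort(A, q + 1, r)        # recursively sort the high side

A = [2, 8, 7, 1, 3, 5, 6, 4]          # the array of Figure 7.1
quicksort(A, 0, len(A) - 1)
print(A)                              # → [1, 2, 3, 4, 5, 6, 7, 8]
```

A single call `partition(A, 0, 7)` on the array of Figure 7.1 reproduces the figure: it returns index 3, with 4 in its final position.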
Exercises
7.1-1
Using Figure 7.1 as a model, illustrate the operation of PARTITION on the array A = 〈13, 19, 9, 5, 12, 8, 7, 4, 21, 2, 6, 11〉.
7.1-2
What value of q does PARTITION return when all elements in the
subarray A[ p : r] have the same value? Modify PARTITION so that q =
⌊( p + r)/2⌋ when all elements in the subarray A[ p : r] have the same value.
7.1-3
Give a brief argument that the running time of PARTITION on a
subarray of size n is Θ( n).
7.1-4
Modify QUICKSORT to sort into monotonically decreasing order.
7.2 Performance of quicksort
The running time of quicksort depends on how balanced each
partitioning is, which in turn depends on which elements are used as
pivots. If the two sides of a partition are about the same size—the
partitioning is balanced—then the algorithm runs asymptotically as fast
as merge sort. If the partitioning is unbalanced, however, it can run
asymptotically as slowly as insertion sort. To allow you to gain some
intuition before diving into a formal analysis, this section informally
investigates how quicksort performs under the assumptions of balanced
versus unbalanced partitioning.
But first, let’s briefly look at the maximum amount of memory that
quicksort requires. Although quicksort sorts in place according to the
definition on page 158, the amount of memory it uses—aside from the
array being sorted—is not constant. Since each recursive call requires a
constant amount of space on the runtime stack, outside of the array
being sorted, quicksort requires space proportional to the maximum
depth of the recursion. As we’ll see now, that could be as bad as Θ( n) in
the worst case.
Worst-case partitioning
The worst-case behavior for quicksort occurs when the partitioning
produces one subproblem with n – 1 elements and one with 0 elements.
(See Section 7.4.1.) Let us assume that this unbalanced partitioning arises in each recursive call. The partitioning costs Θ( n) time. Since the
recursive call on an array of size 0 just returns without doing anything,
T (0) = Θ(1), and the recurrence for the running time is
T ( n) = T ( n – 1) + T (0) + Θ( n)
= T ( n – 1) + Θ( n).
By summing the costs incurred at each level of the recursion, we obtain
an arithmetic series (equation (A.3) on page 1141), which evaluates to
Θ( n²). Indeed, the substitution method can be used to prove that the recurrence T ( n) = T ( n – 1) + Θ( n) has the solution T ( n) = Θ( n²). (See Exercise 7.2-1.)
Thus, if the partitioning is maximally unbalanced at every recursive
level of the algorithm, the running time is Θ( n²). The worst-case
running time of quicksort is therefore no better than that of insertion
sort. Moreover, the Θ( n²) running time occurs when the input array is
already completely sorted—a situation in which insertion sort runs in
O( n) time.
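The arithmetic series is easy to observe directly. The following sketch (our own experiment, not from the text) instruments PARTITION with a comparison counter and runs quicksort on an already-sorted array, where every split is maximally unbalanced:

```python
# Count the element comparisons PARTITION makes when quicksort runs on
# an already-sorted array: each call compares the pivot against every
# other element, and the splits are n-1 and 0 at every level.

def quicksort_comparisons(A):
    count = 0
    def partition(p, r):
        nonlocal count
        x = A[r]
        i = p - 1
        for j in range(p, r):
            count += 1                # one comparison of A[j] with the pivot
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        return i + 1
    def sort(p, r):
        if p < r:
            q = partition(p, r)
            sort(p, q - 1)
            sort(q + 1, r)
    sort(0, len(A) - 1)
    return count

n = 100
print(quicksort_comparisons(list(range(n))))   # n(n - 1)/2 = 4950
```

On sorted input the pivot A[r] is always the maximum, so the calls cost (n – 1) + (n – 2) + ⋯ + 1 = n(n – 1)/2 comparisons, matching the Θ(n²) bound.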
Best-case partitioning
In the most even possible split, PARTITION produces two
subproblems, each of size no more than n/2, since one is of size ⌊( n – 1)/2⌋ ≤ n/2 and one of size ⌈( n – 1)/2⌉ ≤ n/2. In this case, quicksort runs much faster. An upper bound on the running time can then be
described by the recurrence
T ( n) = 2 T ( n/2) + Θ( n).
By case 2 of the master theorem (Theorem 4.1 on page 102), this
recurrence has the solution T ( n) = Θ( n lg n). Thus, if the partitioning is
equally balanced at every level of the recursion, an asymptotically faster algorithm results.
Balanced partitioning
As the analyses in Section 7.4 will show, the average-case running time of quicksort is much closer to the best case than to the worst case. By
appreciating how the balance of the partitioning affects the recurrence
describing the running time, we can gain an understanding of why.
Suppose, for example, that the partitioning algorithm always
produces a 9-to-1 proportional split, which at first blush seems quite
unbalanced. We then obtain the recurrence
T ( n) = T (9 n/10) + T ( n/10) + Θ( n)
on the running time of quicksort. Figure 7.4 shows the recursion tree for this recurrence, where for simplicity the Θ( n) driving function has been replaced by n, which won’t affect the asymptotic solution of the recurrence (as Exercise 4.7-1 on page 118 justifies). Every level of the
tree has cost n, until the recursion bottoms out in a base case at depth
log_{10} n = Θ(lg n), and then the levels have cost at most n. The recursion terminates at depth log_{10/9} n = Θ(lg n). Thus, with a 9-to-1
proportional split at every level of recursion, which intuitively seems
highly unbalanced, quicksort runs in O( n lg n) time—asymptotically the same as if the split were right down the middle. Indeed, even a 99-to-1
split yields an O( n lg n) running time. In fact, any split of constant proportionality yields a recursion tree of depth Θ(lg n), where the cost
at each level is O( n). The running time is therefore O( n lg n) whenever the split has constant proportionality. The ratio of the split affects only
the constant hidden in the O-notation.
Figure 7.4 A recursion tree for QUICKSORT in which PARTITION always produces a 9-to-1
split, yielding a running time of O( n lg n). Nodes show subproblem sizes, with per-level costs on the right.
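The claim that any constant-proportionality split yields Θ(lg n) recursion depth can be checked numerically. This sketch (an illustration we add here, not from the text) follows the larger side of a split, keeping a fraction `frac` of the elements at each level, and compares the resulting depth against log_{1/frac} n:

```python
import math

def split_depth(n, frac):
    """Levels until the larger side of a frac-proportional split reaches size 1."""
    depth = 0
    while n > 1:
        n = math.floor(n * frac)   # keep only the larger side of the split
        depth += 1
    return depth

n = 1_000_000
for frac in (1/2, 9/10, 99/100):
    # Even a 99-to-1 split changes the depth only by a constant factor.
    print(frac, split_depth(n, frac), round(math.log(n, 1/frac), 1))
```

For frac = 1/2 the depth is about lg n; for frac = 9/10 it is about log_{10/9} n ≈ 6.6 lg n. The ratio between them is a constant, which is exactly the constant hidden in the O-notation.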
Intuition for the average case
To develop a clear notion of the expected behavior of quicksort, we
must assume something about how its inputs are distributed. Because
quicksort determines the sorted order using only comparisons between
input elements, its behavior depends on the relative ordering of the
values in the array elements given as the input, not on the particular
values in the array. As in the probabilistic analysis of the hiring problem
in Section 5.2, assume that all permutations of the input numbers are equally likely and that the elements are distinct.
When quicksort runs on a random input array, the partitioning is
highly unlikely to happen in the same way at every level, as our informal
analysis has assumed. We expect that some of the splits will be
reasonably well balanced and that some will be fairly unbalanced. For
example, Exercise 7.2-6 asks you to show that about 80% of the time
PARTITION produces a split that is at least as balanced as 9 to 1, and
about 20% of the time it produces a split that is less balanced than 9 to
1.
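As a sanity check on these figures (our own simulation, not part of the text), we can pick a pivot rank uniformly at random and measure how often the smaller side of the split contains at least n/10 of the elements:

```python
import random

random.seed(1)   # fixed seed so the experiment is repeatable

def balanced_split(n, alpha=0.1):
    """With a uniformly random pivot rank, is the split at least (1-alpha)-to-alpha?"""
    rank = random.randrange(n)        # pivot's position in sorted order, 0-based
    low, high = rank, n - 1 - rank    # sizes of the two sides of the partition
    return min(low, high) >= alpha * n

n, trials = 1000, 100_000
hits = sum(balanced_split(n) for _ in range(trials))
print(hits / trials)                  # close to 0.8
```

The pivot lands in the middle 80% of the ranks with probability about 0.8, so roughly 80% of splits are at least as balanced as 9 to 1, as the exercise claims.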
Figure 7.5 (a) Two levels of a recursion tree for quicksort. The partitioning at the root costs n and produces a “bad” split: two subarrays of sizes 0 and n – 1. The partitioning of the subarray of size n – 1 costs n – 1 and produces a “good” split: subarrays of size ( n – 1)/2 – 1 and ( n – 1)/2.
(b) A single level of a recursion tree that is well balanced. In both parts, the partitioning cost for the subproblems shown with blue shading is Θ( n). Yet the subproblems remaining to be solved in (a), shown with tan shading, are no larger than the corresponding subproblems remaining to be solved in (b).
In the average case, PARTITION produces a mix of “good” and
“bad” splits. In a recursion tree for an average-case execution of
PARTITION, the good and bad splits are distributed randomly
throughout the tree. Suppose for the sake of intuition that the good and
bad splits alternate levels in the tree, and that the good splits are best-
case splits and the bad splits are worst-case splits. Figure 7.5(a) shows the splits at two consecutive levels in the recursion tree. At the root of
the tree, the cost is n for partitioning, and the subarrays produced have
sizes n – 1 and 0: the worst case. At the next level, the subarray of size n
– 1 undergoes best-case partitioning into subarrays of size ( n – 1)/2 – 1
and ( n – 1)/2. Let’s assume that the base-case cost is 1 for the subarray
of size 0.
The combination of the bad split followed by the good split produces
three subarrays of sizes 0, ( n – 1)/2 – 1, and ( n – 1)/2 at a combined partitioning cost of Θ( n) + Θ( n – 1) = Θ( n). This situation is at most a constant factor worse than that in Figure 7.5(b), namely, where a single level of partitioning produces two subarrays of size ( n – 1)/2, at a cost of
Θ( n). Yet this latter situation is balanced! Intuitively, the Θ( n – 1) cost of the bad split in Figure 7.5(a) can be absorbed into the Θ( n) cost of the good split, and the resulting split is good. Thus, the running time of
quicksort, when levels alternate between good and bad splits, is like the running time for good splits alone: still O( n lg n), but with a slightly larger constant hidden by the O-notation. We’ll analyze the expected
running time of a randomized version of quicksort rigorously in Section 7.4.
Exercises
7.2-1
Use the substitution method to prove that the recurrence T ( n) = T ( n –
1) + Θ( n) has the solution T ( n) = Θ( n²), as claimed at the beginning of Section 7.2.
7.2-2
What is the running time of QUICKSORT when all elements of array A
have the same value?
7.2-3
Show that the running time of QUICKSORT is Θ( n²) when the array A
contains distinct elements and is sorted in decreasing order.
7.2-4
Banks often record transactions on an account in order of the times of
the transactions, but many people like to receive their bank statements
with checks listed in order by check number. People usually write checks
in order by check number, and merchants usually cash them with
reasonable dispatch. The problem of converting time-of-transaction
ordering to check-number ordering is therefore the problem of sorting
almost-sorted input. Explain persuasively why the procedure
INSERTION-SORT might tend to beat the procedure QUICKSORT
on this problem.
7.2-5
Suppose that the splits at every level of quicksort are in the constant
proportion α to β, where α + β = 1 and 0 < α ≤ β < 1. Show that the minimum depth of a leaf in the recursion tree is approximately log_{1/α} n
and that the maximum depth is approximately log_{1/β} n. (Don’t worry about integer round-off.)
7.2-6
Consider an array with distinct elements and for which all permutations
of the elements are equally likely. Argue that for any constant 0 < α ≤
1/2, the probability is approximately 1 – 2 α that PARTITION produces
a split at least as balanced as 1 – α to α.
7.3 A randomized version of quicksort
In exploring the average-case behavior of quicksort, we have assumed
that all permutations of the input numbers are equally likely. This
assumption does not always hold, however, as, for example, in the
situation laid out in the premise for Exercise 7.2-4. Section 5.3 showed that judicious randomization can sometimes be added to an algorithm
to obtain good expected performance over all inputs. For quicksort,
randomization yields a fast and practical algorithm. Many software
libraries provide a randomized version of quicksort as their algorithm
of choice for sorting large data sets.
In Section 5.3, the RANDOMIZED-HIRE-ASSISTANT procedure
explicitly permutes its input and then runs the deterministic HIRE-
ASSISTANT procedure. We could do the same for quicksort as well,
but a different randomization technique yields a simpler analysis.
Instead of always using A[ r] as the pivot, a randomized version randomly chooses the pivot from the subarray A[ p : r], where each element in A[ p : r] has an equal probability of being chosen. It then exchanges that element with A[ r] before partitioning. Because the pivot is chosen randomly, we expect the split of the input array to be
reasonably well balanced on average.
The changes to PARTITION and QUICKSORT are small. The new
partitioning procedure, RANDOMIZED-PARTITION, simply swaps
before performing the partitioning. The new quicksort procedure,
RANDOMIZED-QUICKSORT, calls RANDOMIZED-PARTITION
instead of PARTITION. We’ll analyze this algorithm in the next section.
RANDOMIZED-PARTITION( A, p, r)
1 i = RANDOM( p, r)
2 exchange A[ r] with A[ i]
3 return PARTITION( A, p, r)
RANDOMIZED-QUICKSORT( A, p, r)
1 if p < r
2
q = RANDOMIZED-PARTITION( A, p, r)
3
RANDOMIZED-QUICKSORT( A, p, q – 1)
4
RANDOMIZED-QUICKSORT( A, q + 1, r)
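In Python, the randomized version is again a small change (a sketch with 0-based indices; `random.randint` plays the role of RANDOM, and `partition` is a transcription of the PARTITION pseudocode):

```python
import random

def partition(A, p, r):
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    i = random.randint(p, r)          # RANDOM(p, r): uniform over [p, r]
    A[r], A[i] = A[i], A[r]           # move the random pivot into A[r]
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)

A = list(range(100, 0, -1))           # decreasing order: worst case for plain quicksort
randomized_quicksort(A, 0, len(A) - 1)
print(A == list(range(1, 101)))       # → True
```

On this decreasing input, plain QUICKSORT would make Θ(n²) comparisons, whereas random pivot selection makes such inputs no worse than any other.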
Exercises
7.3-1
Why do we analyze the expected running time of a randomized
algorithm and not its worst-case running time?
7.3-2
When RANDOMIZED-QUICKSORT runs, how many calls are made
to the random-number generator RANDOM in the worst case? How
about in the best case? Give your answer in terms of Θ-notation.
7.4 Analysis of quicksort
Section 7.2 gave some intuition for the worst-case behavior of quicksort and for why we expect the algorithm to run quickly. This section
analyzes the behavior of quicksort more rigorously. We begin with a
worst-case analysis, which applies to either QUICKSORT or
RANDOMIZED-QUICKSORT, and conclude with an analysis of the
expected running time of RANDOMIZED-QUICKSORT.
7.4.1 Worst-case analysis
We saw in Section 7.2 that a worst-case split at every level of recursion in quicksort produces a Θ( n²) running time, which, intuitively, is the worst-case running time of the algorithm. We now prove this assertion.
We’ll use the substitution method (see Section 4.3) to show that the running time of quicksort is O( n²). Let T ( n) be the worst-case time for the procedure QUICKSORT on an input of size n. Because the