procedure PARTITION produces two subproblems with total size n – 1, we obtain the recurrence

$$T(n) = \max\{T(q) + T(n-1-q) : 0 \le q \le n-1\} + \Theta(n). \qquad (7.1)$$
We guess that $T(n) \le cn^2$ for some constant $c > 0$. Substituting this guess into recurrence (7.1) yields

$$\begin{aligned}
T(n) &\le \max\{cq^2 + c(n-1-q)^2 : 0 \le q \le n-1\} + \Theta(n) \\
&= c \cdot \max\{q^2 + (n-1-q)^2 : 0 \le q \le n-1\} + \Theta(n).
\end{aligned}$$
Let’s focus our attention on the maximization. For q = 0, 1, …, n – 1, we have

$$\begin{aligned}
q^2 + (n-1-q)^2 &= q^2 + (n-1)^2 - 2q(n-1) + q^2 \\
&= (n-1)^2 + 2q(q-(n-1)) \\
&\le (n-1)^2
\end{aligned}$$

because 0 ≤ q ≤ n – 1 implies that $2q(q-(n-1)) \le 0$. Thus every term in the maximization is bounded by $(n-1)^2$.
Continuing with our analysis of T(n), we obtain

$$\begin{aligned}
T(n) &\le c(n-1)^2 + \Theta(n) \\
&\le cn^2 - c(2n-1) + \Theta(n) \\
&\le cn^2,
\end{aligned}$$

by picking the constant c large enough that the c(2n – 1) term dominates the Θ(n) term. Thus $T(n) = O(n^2)$. Section 7.2 showed a specific case where quicksort takes $\Omega(n^2)$ time: when partitioning is maximally unbalanced. Thus, the worst-case running time of quicksort is $\Theta(n^2)$.
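As a quick numerical sanity check on the quadratic bound, the sketch below evaluates recurrence (7.1) exactly under two illustrative assumptions that are not part of the text: T(0) = 0, and the Θ(n) term costs exactly n. With those choices the maximum is attained at q = 0 (equivalently q = n – 1), and the values come out to n(n + 1)/2.

```python
def worst_case_costs(n_max: int) -> list[int]:
    """Evaluate T(n) = max{T(q) + T(n-1-q) : 0 <= q <= n-1} + n for n = 0..n_max.

    Illustrative assumptions: T(0) = 0 and the Theta(n) term is exactly n.
    """
    T = [0] * (n_max + 1)
    for n in range(1, n_max + 1):
        # Try every split q and keep the most expensive one (worst case).
        T[n] = max(T[q] + T[n - 1 - q] for q in range(n)) + n
    return T


print(worst_case_costs(10))  # [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```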
7.4.2 Expected running time
We have already seen the intuition behind why the expected running time of RANDOMIZED-QUICKSORT is O(n lg n): if, in each level of recursion, the split induced by RANDOMIZED-PARTITION puts any constant fraction of the elements on one side of the partition, then the recursion tree has depth Θ(lg n) and O(n) work is performed at each level. Even if we add a few new levels with the most unbalanced split possible between these levels, the total time remains O(n lg n). We can analyze the expected running time of RANDOMIZED-QUICKSORT precisely by first understanding how the partitioning procedure operates and then using this understanding to derive an O(n lg n) bound on the expected running time. This upper bound on the expected running time, combined with the Θ(n lg n) best-case bound we saw in Section 7.2, yields a Θ(n lg n) expected running time. We assume throughout that the values of the elements being sorted are distinct.
Running time and comparisons
The QUICKSORT and RANDOMIZED-QUICKSORT procedures differ only in how they select pivot elements. They are the same in all other respects. We can therefore analyze RANDOMIZED-QUICKSORT by considering the QUICKSORT and PARTITION procedures, but with the assumption that pivot elements are selected randomly from the subarray passed to RANDOMIZED-PARTITION.
Let’s start by relating the asymptotic running time of QUICKSORT to the number of times elements are compared (all in line 4 of PARTITION), understanding that this analysis also applies to RANDOMIZED-QUICKSORT. Note that we are counting the number of times that array elements are compared, not comparisons of indices.
Lemma 7.1
The running time of QUICKSORT on an n-element array is O(n + X), where X is the number of element comparisons performed.
Proof The running time of QUICKSORT is dominated by the time spent in the PARTITION procedure. Each time PARTITION is called, it selects a pivot element, which is never included in any future recursive calls to QUICKSORT and PARTITION. Thus, there can be at most n calls to PARTITION over the entire execution of the quicksort algorithm. Each time QUICKSORT calls PARTITION, it also recursively calls itself twice, so, counting the initial call, there are at most 2n + 1 calls to the QUICKSORT procedure itself.
One call to PARTITION takes O(1) time plus an amount of time that is proportional to the number of iterations of the for loop in lines 3–6. Each iteration of this for loop performs one comparison in line 4, comparing the pivot element to another element of the array A. Therefore, the total time spent in the for loop across all executions is proportional to X. Since there are at most n calls to PARTITION and the time spent outside the for loop is O(1) for each call, the total time spent in PARTITION outside of the for loop is O(n). Thus the total time for quicksort is O(n + X).
▪
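To make Lemma 7.1 concrete, here is a minimal Python sketch that counts X directly. It assumes the Lomuto-style PARTITION of Section 7.1, with a uniformly random pivot swapped into the last position first (as RANDOMIZED-PARTITION does), and uses 0-based indices rather than the pseudocode's 1-based ones. The function name and the `rng` parameter are illustrative choices.

```python
import random


def randomized_quicksort(a: list, rng: random.Random) -> int:
    """Sort a in place with randomized quicksort; return X, the number of element comparisons.

    Sketch assumptions: Lomuto-style PARTITION (pivot moved to a[r]) and 0-based indices.
    """
    comparisons = 0

    def randomized_partition(p: int, r: int) -> int:
        nonlocal comparisons
        k = rng.randint(p, r)              # pivot chosen uniformly from a[p..r]
        a[k], a[r] = a[r], a[k]            # move it to the end, as RANDOMIZED-PARTITION does
        x = a[r]
        i = p - 1
        for j in range(p, r):
            comparisons += 1               # the one element comparison per loop iteration
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        return i + 1

    def quicksort(p: int, r: int) -> None:
        if p < r:
            q = randomized_partition(p, r)
            quicksort(p, q - 1)
            quicksort(q + 1, r)

    quicksort(0, len(a) - 1)
    return comparisons


rng = random.Random(1)
data = rng.sample(range(1000), 1000)
x = randomized_quicksort(data, rng)
print(data == sorted(data), x)
```

Every unit of work outside the counted comparisons is O(1) per PARTITION or QUICKSORT call, which is exactly the O(n) term in the lemma.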
Our goal for analyzing RANDOMIZED-QUICKSORT, therefore, is to compute the expected value E[X] of the random variable X denoting the total number of comparisons performed in all calls to PARTITION. To do so, we must understand when the quicksort algorithm compares two elements of the array and when it does not. For ease of analysis, let’s index the elements of the array A by their position in the sorted output, rather than their position in the input. That is, although the elements in A may start out in any order, we’ll refer to them by $z_1, z_2, \ldots, z_n$, where $z_1 < z_2 < \cdots < z_n$, with strict inequality because we assume that all elements are distinct. We denote the set $\{z_i, z_{i+1}, \ldots, z_j\}$ by $Z_{ij}$.
The next lemma characterizes when two elements are compared.
Lemma 7.2
During the execution of RANDOMIZED-QUICKSORT on an array of n distinct elements $z_1 < z_2 < \cdots < z_n$, an element $z_i$ is compared with an element $z_j$, where i < j, if and only if one of them is chosen as a pivot before any other element in the set $Z_{ij}$. Moreover, no two elements are ever compared twice.
Proof Let’s look at the first time that an element $x \in Z_{ij}$ is chosen as a pivot during the execution of the algorithm. There are three cases to consider. If x is neither $z_i$ nor $z_j$, that is, $z_i < x < z_j$, then $z_i$ and $z_j$ are not compared at any subsequent time, because they fall into different sides of the partition around x. If $x = z_i$, then PARTITION compares $z_i$ with every other item in $Z_{ij}$. Similarly, if $x = z_j$, then PARTITION compares $z_j$ with every other item in $Z_{ij}$. Thus, $z_i$ and $z_j$ are compared if and only if the first element to be chosen as a pivot from $Z_{ij}$ is either $z_i$ or $z_j$. In the latter two cases, where one of $z_i$ and $z_j$ is chosen as a pivot, since the pivot is removed from future comparisons, it is never compared again with the other element.
▪
As an example of this lemma, consider an input to quicksort of the
numbers 1 through 10 in some arbitrary order. Suppose that the first
pivot element is 7. Then the first call to PARTITION separates the
numbers into two sets: {1, 2, 3, 4, 5, 6} and {8, 9, 10}. In the process,
the pivot element 7 is compared with all other elements, but no number
from the first set (e.g., 2) is or ever will be compared with any number
from the second set (e.g., 9). The values 7 and 9 are compared because 7 is the first item from $Z_{7,9}$ to be chosen as a pivot. In contrast, 2 and 9 are never compared because the first pivot element chosen from $Z_{2,9}$ is 7. The next lemma gives the probability that two elements are compared.
Lemma 7.3
Consider an execution of the procedure RANDOMIZED-QUICKSORT on an array of n distinct elements $z_1 < z_2 < \cdots < z_n$. Given two arbitrary elements $z_i$ and $z_j$ where i < j, the probability that they are compared is $2/(j - i + 1)$.
Proof Let’s look at the tree of recursive calls that RANDOMIZED-QUICKSORT makes, and consider the sets of elements provided as input to each call. Initially, the root set contains all the elements of $Z_{ij}$, since the root set contains every element in A. The elements belonging to $Z_{ij}$ all stay together for each recursive call of RANDOMIZED-QUICKSORT until PARTITION chooses some element $x \in Z_{ij}$ as a pivot. From that point on, the pivot x appears in no subsequent input set. The first time that RANDOMIZED-PARTITION chooses a pivot $x \in Z_{ij}$ from a set containing all the elements of $Z_{ij}$, each element in $Z_{ij}$ is equally likely to be x because the pivot is chosen uniformly at random. Since $|Z_{ij}| = j - i + 1$, the probability is $1/(j - i + 1)$ that any given element in $Z_{ij}$ is the first pivot chosen from $Z_{ij}$. Thus, by Lemma 7.2, we have
$$\begin{aligned}
\Pr\{z_i \text{ is compared with } z_j\} &= \Pr\{z_i \text{ or } z_j \text{ is the first pivot chosen from } Z_{ij}\} \\
&= \Pr\{z_i \text{ is the first pivot chosen from } Z_{ij}\} + \Pr\{z_j \text{ is the first pivot chosen from } Z_{ij}\} \\
&= \frac{2}{j-i+1},
\end{aligned}$$

where the second line follows from the first because the two events are mutually exclusive.
▪
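Lemmas 7.2 and 7.3 can also be checked empirically. The sketch below is a simplified, list-based randomized quicksort (it builds new lists instead of calling PARTITION in place, a simplification made for brevity) run on a shuffled copy of 1, 2, …, n, so that $z_k = k$. It counts how often each pair is compared with a pivot. In every run no pair should be counted twice, adjacent values $z_i, z_{i+1}$ should always be compared (probability 2/2 = 1), and over many runs a pair $(z_i, z_j)$ should be compared with frequency close to $2/(j - i + 1)$. The names `compared_pairs` and `estimate` are illustrative.

```python
import random
from collections import Counter


def compared_pairs(n: int, rng: random.Random) -> Counter:
    """Sort a shuffled 1..n with random pivots; count comparisons per pair (i, j), i < j."""
    values = list(range(1, n + 1))
    rng.shuffle(values)
    counts: Counter = Counter()

    def sort(items: list) -> list:
        if len(items) < 2:
            return items
        pivot = rng.choice(items)                           # uniform random pivot
        low, high = [], []
        for x in items:
            if x == pivot:
                continue
            counts[(min(x, pivot), max(x, pivot))] += 1     # x is compared with the pivot
            (low if x < pivot else high).append(x)
        return sort(low) + [pivot] + sort(high)

    sort(values)
    return counts


def estimate(i: int, j: int, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of runs in which z_i and z_j are compared (Lemma 7.3 predicts 2/(j-i+1))."""
    rng = random.Random(seed)
    return sum((i, j) in compared_pairs(n, rng) for _ in range(trials)) / trials


print(estimate(1, 10, 10, 20000))   # close to 2/10
print(estimate(3, 5, 10, 20000))    # close to 2/3
```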
We can now complete the analysis of randomized quicksort.
Theorem 7.4

The expected running time of RANDOMIZED-QUICKSORT on an
input of n distinct elements is O( n lg n).
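Proof The analysis uses indicator random variables (see Section 5.2). Let the n distinct elements be $z_1 < z_2 < \cdots < z_n$, and for $1 \le i < j \le n$, define the indicator random variable $X_{ij} = I\{z_i \text{ is compared with } z_j\}$. From Lemma 7.2, each pair is compared at most once, and so we can express X as follows:

$$X = \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}.$$

By taking expectations of both sides and using linearity of expectation (equation (C.24) on page 1192) and Lemma 5.1 on page 130, we obtain

$$\begin{aligned}
E[X] &= E\left[\sum_{i=1}^{n-1} \sum_{j=i+1}^{n} X_{ij}\right] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} E[X_{ij}] \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared with } z_j\} \\
&= \sum_{i=1}^{n-1} \sum_{j=i+1}^{n} \frac{2}{j-i+1} \qquad \text{(by Lemma 7.3).}
\end{aligned}$$

We can evaluate this sum using a change of variables (k = j – i) and the bound on the harmonic series in equation (A.9) on page 1142:

$$\begin{aligned}
E[X] &= \sum_{i=1}^{n-1} \sum_{k=1}^{n-i} \frac{2}{k+1} \\
&< \sum_{i=1}^{n-1} \sum_{k=1}^{n} \frac{2}{k} \\
&= \sum_{i=1}^{n-1} O(\lg n) \\
&= O(n \lg n).
\end{aligned}$$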
This bound and Lemma 7.1 allow us to conclude that the expected
running time of RANDOMIZED-QUICKSORT is O( n lg n) (assuming
that the element values are distinct).
▪
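The double sum in this proof can be evaluated exactly for small n. The sketch below computes $E[X] = \sum_{i<j} 2/(j - i + 1)$ with Python's `fractions.Fraction`, checks it against the cruder bound $2(n-1)H_n$ (which follows from $2/(k+1) < 2/k$, where $H_n$ is the nth harmonic number), and also checks the standard closed form $E[X] = 2(n+1)H_n - 4n$, obtained by grouping the terms by k = j – i. The function names are illustrative.

```python
from fractions import Fraction


def expected_comparisons(n: int) -> Fraction:
    """Exact E[X] = sum over 1 <= i < j <= n of 2/(j - i + 1)."""
    total = Fraction(0)
    for i in range(1, n):
        for j in range(i + 1, n + 1):
            total += Fraction(2, j - i + 1)
    return total


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n, computed exactly."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))


for n in (3, 10, 100):
    ex = expected_comparisons(n)
    below_bound = ex < 2 * (n - 1) * harmonic(n)
    matches_closed_form = ex == 2 * (n + 1) * harmonic(n) - 4 * n
    print(n, float(ex), below_bound, matches_closed_form)
```

For n = 3, for instance, the sum is 2/2 + 2/2 + 2/3 = 8/3.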
Exercises
7.4-1
Show that the recurrence

$$T(n) = \max\{T(q) + T(n-q-1) : 0 \le q \le n-1\} + \Theta(n)$$

has a lower bound of $T(n) = \Omega(n^2)$.
7.4-2
Show that quicksort’s best-case running time is Ω(n lg n).
7.4-3
Show that the expression $q^2 + (n-q-1)^2$ achieves its maximum value over q = 0, 1, …, n – 1 when q = 0 or q = n – 1.
7.4-4
Show that RANDOMIZED-QUICKSORT’s expected running time is Ω(n lg n).
7.4-5
Coarsening the recursion, as we did in Problem 2-1 for merge sort, is a common way to improve the running time of quicksort in practice. We modify the base case of the recursion so that if the array has fewer than k elements, the subarray is sorted by insertion sort, rather than by continued recursive calls to quicksort. Argue that the randomized version of this sorting algorithm runs in O(nk + n lg(n/k)) expected time. How should you pick k, both in theory and in practice?
★ 7.4-6
Consider modifying the PARTITION procedure by randomly picking three elements from subarray A[p : r] and partitioning about their median (the middle value of the three elements). Approximate the probability of getting worse than an α-to-(1 – α) split, as a function of α in the range 0 < α < 1/2.
Problems
7-1 Hoare partition correctness
The version of PARTITION given in this chapter is not the original
partitioning algorithm. Here is the original partitioning algorithm,
which is due to C. A. R. Hoare.
HOARE-PARTITION(A, p, r)
1   x = A[p]
2   i = p – 1
3   j = r + 1
4   while TRUE
5       repeat
6           j = j – 1
7       until A[j] ≤ x
8       repeat
9           i = i + 1
10      until A[i] ≥ x
11      if i < j
12          exchange A[i] with A[j]
13      else return j
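For experimenting with the questions below, here is a direct Python transliteration of HOARE-PARTITION. It is a sketch that assumes 0-based, inclusive indices p and r, so partitioning a whole list is `hoare_partition(a, 0, len(a) - 1)`. Each repeat-until loop runs its body once before testing, which is why j is decremented (and i incremented) before the inner `while` loop.

```python
def hoare_partition(a: list, p: int, r: int) -> int:
    """Transliteration of HOARE-PARTITION on a[p..r] (inclusive, 0-based); pivot x = a[p]."""
    x = a[p]
    i = p - 1
    j = r + 1
    while True:
        j -= 1                      # repeat j = j - 1 ...
        while a[j] > x:             # ... until a[j] <= x
            j -= 1
        i += 1                      # repeat i = i + 1 ...
        while a[i] < x:             # ... until a[i] >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j
```

The index this returns can be compared against the claims in parts (d) and (e) on small random arrays.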
a. Demonstrate the operation of HOARE-PARTITION on the array A = 〈13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21〉, showing the values of the array and the indices i and j after each iteration of the while loop in lines 4–13.
b. Describe how the PARTITION procedure in Section 7.1 differs from HOARE-PARTITION when all elements in A[p : r] are equal. Describe a practical advantage of HOARE-PARTITION over PARTITION for use in quicksort.
The next three questions ask you to give a careful argument that the procedure HOARE-PARTITION is correct. Assuming that the subarray A[p : r] contains at least two elements, prove the following:
c. The indices i and j are such that the procedure never accesses an element of A outside the subarray A[p : r].
d. When HOARE-PARTITION terminates, it returns a value j such that p ≤ j < r.
e. Every element of A[p : j] is less than or equal to every element of A[j + 1 : r] when HOARE-PARTITION terminates.
The PARTITION procedure in Section 7.1 separates the pivot value (originally in A[r]) from the two partitions it forms. The HOARE-PARTITION procedure, on the other hand, always places the pivot value (originally in A[p]) into one of the two partitions A[p : j] and A[j + 1 : r]. Since p ≤ j < r, neither partition is empty.
f. Rewrite the QUICKSORT procedure to use HOARE-PARTITION.
7-2 Quicksort with equal element values
The analysis of the expected running time of randomized quicksort in
Section 7.4.2 assumes that all element values are distinct. This problem examines what happens when they are not.
a. Suppose that all element values are equal. What is randomized quicksort’s running time in this case?
b. The PARTITION procedure returns an index q such that each element of A[p : q – 1] is less than or equal to A[q] and each element of A[q + 1 : r] is greater than A[q]. Modify the PARTITION procedure to produce a procedure PARTITION′(A, p, r), which permutes the elements of A[p : r] and returns two indices q and t, where p ≤ q ≤ t ≤ r, such that
all elements of A[q : t] are equal,
each element of A[p : q – 1] is less than A[q], and
each element of A[t + 1 : r] is greater than A[q].
Like PARTITION, your PARTITION′ procedure should take Θ(r – p) time.
c. Modify the RANDOMIZED-PARTITION procedure to call PARTITION′, and name the new procedure RANDOMIZED-PARTITION′. Then modify the QUICKSORT procedure to produce a procedure QUICKSORT′(A, p, r) that calls RANDOMIZED-PARTITION′ and recurses only on partitions where elements are not known to be equal to each other.
d. Using QUICKSORT′, adjust the analysis in Section 7.4.2 to avoid the assumption that all elements are distinct.
7-3 Alternative quicksort analysis
An alternative analysis of the running time of randomized quicksort
focuses on the expected running time of each individual recursive call to
RANDOMIZED-QUICKSORT, rather than on the number of
comparisons performed. As in the analysis of Section 7.4.2, assume that the values of the elements are distinct.
a. Argue that, given an array of size n, the probability that any particular element is chosen as the pivot is 1/n. Use this probability to define indicator random variables $X_i = I\{i\text{th smallest element is chosen as the pivot}\}$. What is $E[X_i]$?
b. Let T(n) be a random variable denoting the running time of quicksort on an array of size n. Argue that

$$E[T(n)] = E\left[\sum_{q=1}^{n} X_q \bigl(T(q-1) + T(n-q) + \Theta(n)\bigr)\right]. \qquad (7.2)$$

c. Show how to rewrite equation (7.2) as

$$E[T(n)] = \frac{2}{n} \sum_{q=1}^{n-1} E[T(q)] + \Theta(n). \qquad (7.3)$$

d. Show that

$$\sum_{q=1}^{n-1} q \lg q \le \frac{n^2}{2} \lg n - \frac{n^2}{8} \qquad (7.4)$$

for n ≥ 2. (Hint: Split the summation into two parts, one summation for q = 1, 2, …, ⌈n/2⌉ – 1 and one summation for q = ⌈n/2⌉, …, n – 1.)
e. Using the bound from equation (7.4), show that the recurrence in equation (7.3) has the solution E[T(n)] = O(n lg n). (Hint: Show, by substitution, that E[T(n)] ≤ an lg n for sufficiently large n and for some positive constant a.)
7-4 Stooge sort
Professors Howard, Fine, and Howard have proposed a deceptively simple sorting algorithm, named stooge sort in their honor, shown below.
a. Argue that the call STOOGE-SORT(A, 1, n) correctly sorts the array A[1 : n].
b. Give a recurrence for the worst-case running time of STOOGE-SORT and a tight asymptotic (Θ-notation) bound on the worst-case running time.
c. Compare the worst-case running time of STOOGE-SORT with that of insertion sort, merge sort, heapsort, and quicksort. Do the professors deserve tenure?
STOOGE-SORT(A, p, r)
  if A[p] > A[r]
      exchange A[p] with A[r]
  if p + 1 < r
      k = ⌊(r – p + 1)/3⌋              // round down
      STOOGE-SORT(A, p, r – k)         // first two-thirds
      STOOGE-SORT(A, p + k, r)         // last two-thirds
      STOOGE-SORT(A, p, r – k)         // first two-thirds again
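A direct Python transliteration of STOOGE-SORT can be handy for testing an answer to part (a) on small arrays. This sketch assumes 0-based inclusive indices p and r (so the whole list is `stooge_sort(a, 0, len(a) - 1)`) and a nonempty list.

```python
def stooge_sort(a: list, p: int, r: int) -> None:
    """Transliteration of STOOGE-SORT on a[p..r] (inclusive, 0-based)."""
    if a[p] > a[r]:
        a[p], a[r] = a[r], a[p]
    if p + 1 < r:
        k = (r - p + 1) // 3          # round down
        stooge_sort(a, p, r - k)      # first two-thirds
        stooge_sort(a, p + k, r)      # last two-thirds
        stooge_sort(a, p, r - k)      # first two-thirds again


data = [5, 2, 9, 1, 7]
stooge_sort(data, 0, len(data) - 1)
print(data)  # [1, 2, 5, 7, 9]
```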
7-5 Stack depth for quicksort
The QUICKSORT procedure of Section 7.1 makes two recursive calls to itself. After QUICKSORT calls PARTITION, it recursively sorts the low side of the partition and then it recursively sorts the high side of the partition. The second recursive call in QUICKSORT is not really necessary, because the procedure can instead use an iterative control structure. This transformation technique, called tail-recursion elimination, is provided automatically by good compilers. Applying tail-recursion elimination transforms QUICKSORT into the TRE-QUICKSORT procedure.
TRE-QUICKSORT(A, p, r)
  while p < r
      // Partition and then sort the low side.
      q = PARTITION(A, p, r)
      TRE-QUICKSORT(A, p, q – 1)
      p = q + 1
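A Python sketch of TRE-QUICKSORT follows. It assumes the Lomuto-style PARTITION of Section 7.1 (pivot A[r]) and 0-based inclusive indices. CPython does not perform tail-recursion elimination itself, so the loop that replaces the second recursive call is written out explicitly, just as in the pseudocode.

```python
def partition(a: list, p: int, r: int) -> int:
    """Lomuto-style PARTITION around the pivot a[r]; returns the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def tre_quicksort(a: list, p: int, r: int) -> None:
    """Recurse on the low side only; the high side is handled by the loop."""
    while p < r:
        q = partition(a, p, r)
        tre_quicksort(a, p, q - 1)
        p = q + 1          # continue with the high side a[q+1..r]


data = [3, 7, 1, 4, 1, 5]
tre_quicksort(data, 0, len(data) - 1)
print(data)  # [1, 1, 3, 4, 5, 7]
```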
a. Argue that TRE-QUICKSORT(A, 1, n) correctly sorts the array A[1 : n].
Compilers usually execute recursive procedures by using a stack that
contains pertinent information, including the parameter values, for each
recursive call. The information for the most recent call is at the top of
the stack, and the information for the initial call is at the bottom. When
a procedure is called, its information is pushed onto the stack, and when
it terminates, its information is popped. Since we assume that array
parameters are represented by pointers, the information for each
procedure call on the stack requires O(1) stack space. The stack depth is the maximum amount of stack space used at any time during a
computation.
b. Describe a scenario in which TRE-QUICKSORT’s stack depth is Θ(n) on an n-element input array.
c. Modify TRE-QUICKSORT so that the worst-case stack depth is Θ(lg n). Maintain the O(n lg n) expected running time of the algorithm.
7-6 Median-of-3 partition
One way to improve the RANDOMIZED-QUICKSORT procedure is
to partition around a pivot that is chosen more carefully than by
picking a random element from the subarray. A common approach is
the median-of-3 method: choose the pivot as the median (middle
element) of a set of 3 elements randomly selected from the subarray.
(See Exercise 7.4-6.) For this problem, assume that the n elements in the input subarray A[p : r] are distinct and that n ≥ 3. Denote the sorted version of A[p : r] by $z_1, z_2, \ldots, z_n$. Using the median-of-3 method to choose the pivot element x, define $p_i = \Pr\{x = z_i\}$.
a. Give an exact formula for $p_i$ as a function of n and i for i = 2, 3, …, n – 1. (Observe that $p_1 = p_n = 0$.)
b. By what amount does the median-of-3 method increase the likelihood of choosing the pivot to be $x = z_{\lfloor (n+1)/2 \rfloor}$, the median of A[p : r], compared with the ordinary implementation? Assume that n → ∞, and give the limiting ratio of these probabilities.
c. Suppose that we define a “good” split to mean choosing the pivot as $x = z_i$, where n/3 ≤ i ≤ 2n/3. By what amount does the median-of-3 method increase the likelihood of getting a good split compared with the ordinary implementation? (Hint: Approximate the sum by an integral.)
d. Argue that in the Ω(n lg n) running time of quicksort, the median-of-3 method affects only the constant factor.
7-7 Fuzzy sorting of intervals
Consider a sorting problem in which you do not know the numbers
exactly. Instead, for each number, you know an interval on the real line
to which it belongs. That is, you are given n closed intervals of the form $[a_i, b_i]$, where $a_i \le b_i$. The goal is to fuzzy-sort these intervals: to produce a permutation $\langle i_1, i_2, \ldots, i_n \rangle$ of the intervals such that for j = 1, 2, …, n, there exist $c_j \in [a_{i_j}, b_{i_j}]$ satisfying $c_1 \le c_2 \le \cdots \le c_n$.
a. Design a randomized algorithm for fuzzy-sorting n intervals. Your
algorithm should have the general structure of an algorithm that
quicksorts the left endpoints (the ai values), but it should take
advantage of overlapping intervals to improve the running time. (As
the intervals overlap more and more, the problem of fuzzy-sorting the
intervals becomes progressively easier. Your algorithm should take
advantage of such overlapping, to the extent that it exists.)
b. Argue that your algorithm runs in Θ(n lg n) expected time in general, but runs in Θ(n) expected time when all of the intervals overlap (i.e., when there exists a value x such that $x \in [a_i, b_i]$ for all i). Your algorithm should not be checking for this case explicitly, but rather, its performance should naturally improve as the amount of overlap increases.
Chapter notes
Quicksort was invented by Hoare [219], and his version of PARTITION appears in Problem 7-1. Bentley [51, p. 117] attributes the PARTITION procedure given in Section 7.1 to N. Lomuto. The analysis in Section 7.4 is based on an analysis due to Motwani and Raghavan [336]. Sedgewick [401] and Bentley [51] provide good references on the details of implementation and how they matter.
McIlroy [323] shows how to engineer a “killer adversary” that produces an array on which virtually any implementation of quicksort takes $\Theta(n^2)$ time.
1 You can enforce the assumption that the values in an array A are distinct at the cost of Θ(n) additional space and only constant overhead in running time by converting each input value A[i] to an ordered pair (A[i], i) with (A[i], i) < (A[j], j) if A[i] < A[j] or if A[i] = A[j] and i < j. There are also more practical variants of quicksort that work well when elements are not distinct.
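The pairing trick in this footnote maps directly onto Python’s lexicographic tuple comparison. In the sketch below (the helper name `make_distinct` is illustrative, and indices are 0-based), equal values become distinct keys ordered by their original position, and discarding the index after sorting recovers the sorted values.

```python
def make_distinct(a: list) -> list[tuple]:
    """Replace each A[i] by the pair (A[i], i); tuples compare lexicographically."""
    return [(value, i) for i, value in enumerate(a)]


keys = make_distinct([3, 1, 3, 2])
print(sorted(keys))                          # [(1, 1), (2, 3), (3, 0), (3, 2)]
print([value for value, _ in sorted(keys)])  # [1, 2, 3, 3]
```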
8 Sorting in Linear Time
We have now seen a handful of algorithms that can sort n numbers in O(n lg n) time. Whereas merge sort and heapsort achieve this upper bound in the worst case, quicksort achieves it on average. Moreover, for each of these algorithms, we can produce a sequence of n input numbers that causes the algorithm to run in Ω(n lg n) time.
These algorithms share an interesting property: the sorted order they determine is based only on comparisons between the input elements. We call such sorting algorithms comparison sorts. All the sorting algorithms introduced thus far are comparison sorts.
In Section 8.1, we’ll prove that any comparison sort must make Ω(n lg n) comparisons in the worst case to sort n elements. Thus, merge sort and heapsort are asymptotically optimal, and no comparison sort exists that is faster by more than a constant factor.
Sections 8.2, 8.3, and 8.4 examine three sorting algorithms (counting sort, radix sort, and bucket sort) that run in linear time on certain types of inputs. Of course, these algorithms use operations other than comparisons to determine the sorted order. Consequently, the Ω(n lg n) lower bound does not apply to them.
8.1 Lower bounds for sorting
A comparison sort uses only comparisons between elements to gain order information about an input sequence $\langle a_1, a_2, \ldots, a_n \rangle$. That is, given two elements $a_i$ and $a_j$, it performs one of the tests $a_i < a_j$, $a_i \le a_j$, $a_i = a_j$, $a_i \ge a_j$, or $a_i > a_j$ to determine their relative order. It may not inspect the values of the elements or gain order information about them in any other way.
Since we are proving a lower bound, we assume without loss of generality in this section that all the input elements are distinct. After all, a lower bound for distinct elements applies when elements may or may not be distinct. Consequently, comparisons of the form $a_i = a_j$ are useless, which means that we can assume that no comparisons for exact equality occur. Moreover, the comparisons $a_i \le a_j$, $a_i \ge a_j$, $a_i > a_j$, and $a_i < a_j$ are all equivalent in that they yield identical information about the relative order of $a_i$ and $a_j$. We therefore assume that all comparisons have the form $a_i \le a_j$.
Figure 8.1 The decision tree for insertion sort operating on three elements. An internal node (shown in blue) annotated by i : j indicates a comparison between $a_i$ and $a_j$. A leaf annotated by the permutation $\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$ indicates the ordering $a_{\pi(1)} \le a_{\pi(2)} \le \cdots \le a_{\pi(n)}$. The highlighted path indicates the decisions made when sorting the input sequence $\langle a_1 = 6, a_2 = 8, a_3 = 5 \rangle$. Going left from the root node, labeled 1:2, indicates that $a_1 \le a_2$. Going right from the node labeled 2:3 indicates that $a_2 > a_3$. Going right from the node labeled 1:3 indicates that $a_1 > a_3$. Therefore, we have the ordering $a_3 \le a_1 \le a_2$, as indicated in the leaf labeled 〈3, 1, 2〉. Because the three input elements have 3! = 6 possible permutations, the decision tree must have at least 6 leaves.
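The highlighted path in Figure 8.1 can be reproduced by instrumenting insertion sort. The sketch below is an illustration of our own (the function name and record format are assumptions): it tags each element with its 1-based original position and records every comparison as (i, j, a_i ≤ a_j), where $a_i$ is the element already in the sorted prefix and $a_j$ is the key being inserted. On 〈6, 8, 5〉 it yields the comparisons 1:2 (≤), 2:3 (>), 1:3 (>) and the leaf 〈3, 1, 2〉.

```python
def insertion_sort_path(a: list) -> tuple[list, tuple[int, ...]]:
    """Run insertion sort, recording each comparison with 1-based original positions.

    Returns (path, leaf): path holds (i, j, a_i <= a_j) records in the order made;
    leaf is the permutation <pi(1), ..., pi(n)> of original positions in sorted order.
    """
    items = [(value, pos) for pos, value in enumerate(a, start=1)]
    path = []
    for k in range(1, len(items)):
        key = items[k]
        m = k - 1
        while m >= 0:
            left = items[m]
            left_le_key = left[0] <= key[0]
            path.append((left[1], key[1], left_le_key))
            if left_le_key:
                break                   # key belongs just after `left`
            items[m + 1] = left         # shift the larger element one slot right
            m -= 1
        items[m + 1] = key
    return path, tuple(pos for _, pos in items)


path, leaf = insertion_sort_path([6, 8, 5])
print(path, leaf)  # [(1, 2, True), (2, 3, False), (1, 3, False)] (3, 1, 2)
```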
The decision-tree model
We can view comparison sorts abstractly in terms of decision trees. A
decision tree is a full binary tree (each node is either a leaf or has both
children) that represents the comparisons between elements that are
performed by a particular sorting algorithm operating on an input of a given size. Control, data movement, and all other aspects of the
algorithm are ignored. Figure 8.1 shows the decision tree corresponding to the insertion sort algorithm from Section 2.1 operating on an input sequence of three elements.
A decision tree has each internal node annotated by i : j for some i and j in the range 1 ≤ i, j ≤ n, where n is the number of elements in the input sequence. We also annotate each leaf by a permutation $\langle \pi(1), \pi(2), \ldots, \pi(n) \rangle$. (See Section C.1 for background on permutations.) Indices in the internal nodes and the leaves always refer to the original positions of the array elements at the start of the sorting algorithm. The execution of the comparison sorting algorithm corresponds to tracing a simple path from the root of the decision tree down to a leaf. Each internal node indicates a comparison $a_i \le a_j$. The left subtree then dictates subsequent comparisons once we know that $a_i \le a_j$, and the right subtree dictates subsequent comparisons when $a_i > a_j$. Arriving at a leaf, the sorting algorithm has established the ordering $a_{\pi(1)} \le a_{\pi(2)} \le \cdots \le a_{\pi(n)}$.
Because any correct sorting algorithm must be able to produce each
permutation of its input, each of the n! permutations on n elements must appear as at least one of the leaves of the decision tree for a
comparison sort to be correct. Furthermore, each of these leaves must
be reachable from the root by a downward path corresponding to an
actual execution of the comparison sort. (We call such leaves
“reachable.”) Thus, we consider only decision trees in which each
permutation appears as a reachable leaf.
A lower bound for the worst case
The length of the longest simple path from the root of a decision tree to
any of its reachable leaves represents the worst-case number of
comparisons that the corresponding sorting algorithm performs.
Consequently, the worst-case number of comparisons for a given
comparison sort algorithm equals the height of its decision tree. A lower
bound on the heights of all decision trees in which each permutation
appears as a reachable leaf is therefore a lower bound on the running
time of any comparison sort algorithm. The following theorem
establishes such a lower bound.
Theorem 8.1
Any comparison sort algorithm requires Ω(n lg n) comparisons in the worst case.
Proof From the preceding discussion, it suffices to determine the height of a decision tree in which each permutation appears as a reachable leaf. Consider a decision tree of height h with l reachable leaves corresponding to a comparison sort on n elements. Because each of the n! permutations of the input appears as one or more leaves, we have n! ≤ l. Since a binary tree of height h has no more than $2^h$ leaves, we have

$$n! \le l \le 2^h,$$

which, by taking logarithms, implies

$$\begin{aligned}
h &\ge \lg(n!) && \text{(since the lg function is monotonically increasing)} \\
&= \Omega(n \lg n) && \text{(by equation (3.28) on page 67).}
\end{aligned}$$
▪
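Because a tree’s height is an integer, the argument of Theorem 8.1 actually gives h ≥ ⌈lg(n!)⌉. The sketch below (the function name is illustrative) computes that bound with exact integer arithmetic, using the fact that ⌈lg m⌉ equals the bit length of m – 1 for any integer m ≥ 1, so no floating-point rounding is involved. For n = 3 it gives 3, the height of the insertion-sort tree in Figure 8.1.

```python
import math


def min_worst_case_comparisons(n: int) -> int:
    """Return ceil(lg(n!)), the decision-tree height lower bound for sorting n elements."""
    m = math.factorial(n)
    return (m - 1).bit_length()       # equals ceil(lg m) for integers m >= 1


for n in (1, 2, 3, 4, 5, 10):
    print(n, min_worst_case_comparisons(n), round(n * math.log2(n), 1))
```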
Corollary 8.2
Heapsort and merge sort are asymptotically optimal comparison sorts.
Proof The O(n lg n) upper bounds on the running times for heapsort and merge sort match the Ω(n lg n) worst-case lower bound from Theorem 8.1.
▪
Exercises
8.1-1
What is the smallest possible depth of a leaf in a decision tree for a
comparison sort?
8.1-2
Obtain asymptotically tight bounds on lg(n!) without using Stirling’s approximation. Instead, evaluate the summation $\sum_{k=1}^{n} \lg k$ using techniques from Section A.2.
8.1-3
Show that there is no comparison sort whose running time is linear for at least half of the n! inputs of length n. What about a fraction of 1/n of the inputs of length n? What about a fraction $1/2^n$?
8.1-4
You are given an n-element input sequence, and you know in advance
that it is partly sorted in the following sense. Each element initially in
position i such that i mod 4 = 0 is either already in its correct position, or it is one place away from its correct position. For example, you know
that after sorting, the element initially in position 12 belongs in position
11, 12, or 13. You have no advance information about the other
elements, in positions i where i mod 4 ≠ 0. Show that an Ω(n lg n) lower bound on comparison-based sorting still holds in this case.
8.2 Counting sort
Counting sort assumes that each of the n input elements is an integer in the range 0 to k, for some integer k. It runs in Θ(n + k) time, so that when k = O(n), counting sort runs in Θ(n) time.
Counting sort first determines, for each input element x, the number
of elements less than or equal to x. It then uses this information to place
element x directly into its position in the output array. For example, if
17 elements are less than or equal to x, then x belongs in output position 17. We must modify this scheme slightly to handle the situation
in which several elements have the same value, since we do not want
them all to end up in the same position.
The COUNTING-SORT procedure below takes as input an array A[1 : n], the size n of this array, and the limit k on the nonnegative integer values in A. It returns its sorted output in the array B[1 : n] and uses an array C[0 : k] for temporary working storage.
COUNTING-SORT(A, n, k)
1   let B[1 : n] and C[0 : k] be new arrays
2   for i = 0 to k
3       C[i] = 0
4   for j = 1 to n
5       C[A[j]] = C[A[j]] + 1
6   // C[i] now contains the number of elements equal to i.
7   for i = 1 to k
8       C[i] = C[i] + C[i – 1]
9   // C[i] now contains the number of elements less than or equal to i.
10  // Copy A to B, starting from the end of A.
11  for j = n downto 1
12      B[C[A[j]]] = A[j]
13      C[A[j]] = C[A[j]] – 1    // to handle duplicate values
14  return B
Figure 8.2 illustrates counting sort. After the for loop of lines 2–3 initializes the array C to all zeros, the for loop of lines 4–5 makes a pass over the array A to inspect each input element. Each time it finds an input element whose value is i, it increments C[i]. Thus, after line 5, C[i] holds the number of input elements equal to i for each integer i = 0, 1, …, k. Lines 7–8 determine for each i = 0, 1, …, k how many input elements are less than or equal to i by keeping a running sum of the array C.
Finally, the for loop of lines 11–13 makes another pass over A, but in reverse, to place each element A[j] into its correct sorted position in the output array B. If all n elements are distinct, then when line 11 is first entered, for each A[j], the value C[A[j]] is the correct final position of A[j] in the output array, since there are C[A[j]] elements less than or equal to A[j]. Because the elements might not be distinct, the loop decrements C[A[j]] each time it places a value A[j] into B. Decrementing C[A[j]] causes the previous element in A with a value equal to A[j], if one exists, to go to the position immediately before A[j] in the output array B.
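Here is a minimal Python sketch of COUNTING-SORT. It assumes 0-based output positions, so each element lands at C[key] – 1 rather than C[key] (the code decrements first, then stores), and it adds an optional `key` parameter, which is not part of the pseudocode, so that records carrying satellite data can be sorted by an integer key in the range 0 to k. Scanning A from the end, as lines 11–13 do, is what keeps equal keys in their input order.

```python
def counting_sort(a: list, k: int, key=lambda x: x) -> list:
    """Stable counting sort of a, where key(x) is an integer in 0..k."""
    c = [0] * (k + 1)
    for x in a:                      # lines 4-5: count how many elements have each key
        c[key(x)] += 1
    for i in range(1, k + 1):        # lines 7-8: running sums
        c[i] += c[i - 1]
    # c[i] is now the number of elements whose key is <= i
    b = [None] * len(a)
    for x in reversed(a):            # lines 11-13: scan a from the end
        c[key(x)] -= 1               # decrement first, because b is 0-based
        b[c[key(x)]] = x
    return b


print(counting_sort([3, 1, 4, 1, 5, 0, 2], 5))   # [0, 1, 1, 2, 3, 4, 5]
records = [("b", 1), ("a", 0), ("c", 1), ("d", 0)]
print(counting_sort(records, 1, key=lambda rec: rec[1]))
# [('a', 0), ('d', 0), ('b', 1), ('c', 1)]
```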
How much time does counting sort require? The for loop of lines 2–3 takes Θ(k) time, the for loop of lines 4–5 takes Θ(n) time, the for loop of lines 7–8 takes Θ(k) time, and the for loop of lines 11–13 takes Θ(n) time. Thus, the overall time is Θ(k + n). In practice, we usually use counting sort when we have k = O(n), in which case the running time is Θ(n).
Counting sort can beat the lower bound of Ω(n lg n) proved in Section 8.1 because it is not a comparison sort. In fact, no comparisons between input elements occur anywhere in the code. Instead, counting sort uses the actual values of the elements to index into an array. The Ω(n lg n) lower bound for sorting does not apply when we depart from the comparison sort model.
Figure 8.2 The operation of COUNTING-SORT on an input array A[1 : 8], where each element of A is a nonnegative integer no larger than k = 5. (a) The array A and the auxiliary array C
after line 5. (b) The array C after line 8. (c)–(e) The output array B and the auxiliary array C
after one, two, and three iterations of the loop in lines 11–13, respectively. Only the tan elements of array B have been filled in. (f) The final sorted output array B.
An important property of counting sort is that it is stable: elements
with the same value appear in the output array in the same order as they
do in the input array. That is, it breaks ties between two elements by the
rule that whichever element appears first in the input array appears first
in the output array. Normally, the property of stability is important only when satellite data are carried around with the element being sorted.
Counting sort’s stability is important for another reason: counting sort
is often used as a subroutine in radix sort. As we shall see in the next
section, in order for radix sort to work correctly, counting sort must be
stable.
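Stability is easiest to see when satellite data travel with the keys. The Python sketch below applies the same counting idea to records sorted by a key function. The name `counting_sort_by_key` and the sample records are hypothetical, and keys are assumed to be integers in the range 0 to k.

```python
def counting_sort_by_key(records, key, k):
    """Stable counting sort of records by key(record), an integer in 0..k."""
    C = [0] * (k + 1)
    for rec in records:
        C[key(rec)] += 1
    for i in range(1, k + 1):
        C[i] += C[i - 1]
    out = [None] * len(records)
    # Right-to-left scan: the last record with a given key takes the last free
    # slot for that key, so records with equal keys keep their input order.
    for rec in reversed(records):
        C[key(rec)] -= 1
        out[C[key(rec)]] = rec
    return out

people = [(3, "ann"), (1, "bob"), (3, "cy"), (0, "dee"), (1, "eve")]
print(counting_sort_by_key(people, key=lambda p: p[0], k=3))
# [(0, 'dee'), (1, 'bob'), (1, 'eve'), (3, 'ann'), (3, 'cy')]
```

Note that "bob" stays ahead of "eve" and "ann" stays ahead of "cy", exactly as in the input.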
Exercises
8.2-1
Using Figure 8.2 as a model, illustrate the operation of COUNTING-
SORT on the array A = 〈6, 0, 2, 0, 1, 3, 4, 6, 1, 3, 2〉.
8.2-2
Prove that COUNTING-SORT is stable.
8.2-3
Suppose that we were to rewrite the for loop header in line 11 of the
COUNTING-SORT as
11 for j = 1 to n
Show that the algorithm still works properly, but that it is not stable.
Then rewrite the pseudocode for counting sort so that elements with the
same value are written into the output array in order of increasing index
and the algorithm is stable.
8.2-4
Prove the following loop invariant for COUNTING-SORT:
At the start of each iteration of the for loop of lines 11–13, the
last element in A with value i that has not yet been copied into
B belongs in B[ C [ i]].
8.2-5
Suppose that the array being sorted contains only integers in the range 0
to k and that there are no satellite data to move with those keys. Modify
counting sort to use just the arrays A and C, putting the sorted result back into array A instead of into a new array B.
8.2-6
Describe an algorithm that, given n integers in the range 0 to k, preprocesses its input and then answers any query about how many of
the n integers fall into a range [ a : b] in O(1) time. Your algorithm should use Θ( n + k) preprocessing time.
8.2-7
Counting sort can also work efficiently if the input values have
fractional parts, but the number of digits in the fractional part is small.
Suppose that you are given n numbers in the range 0 to k, each with at
most d decimal (base 10) digits to the right of the decimal point. Modify
counting sort to run in Θ(n + 10^d k) time.
Radix sort is the algorithm used by the card-sorting machines you now
find only in computer museums. The cards have 80 columns, and in each
column a machine can punch a hole in one of 12 places. The sorter can
be mechanically “programmed” to examine a given column of each card
in a deck and distribute the card into one of 12 bins depending on
which place has been punched. An operator can then gather the cards
bin by bin, so that cards with the first place punched are on top of cards
with the second place punched, and so on.
Figure 8.3 The operation of radix sort on seven 3-digit numbers. The leftmost column is the input. The remaining columns show the numbers after successive sorts on increasingly significant digit positions. Tan shading indicates the digit position sorted on to produce each list from the previous one.
For decimal digits, each column uses only 10 places. (The other two
places are reserved for encoding nonnumeric characters.) A d-digit
number occupies a field of d columns. Since the card sorter can look at
only one column at a time, the problem of sorting n cards on a d-digit
number requires a sorting algorithm.
Intuitively, you might sort numbers on their most significant
(leftmost) digit, sort each of the resulting bins recursively, and then
combine the decks in order. Unfortunately, since the cards in 9 of the 10
bins must be put aside to sort each of the bins, this procedure generates
many intermediate piles of cards that you would have to keep track of.
(See Exercise 8.3-6.)
Radix sort solves the problem of card sorting—counterintuitively—
by sorting on the least significant digit first. The algorithm then
combines the cards into a single deck, with the cards in the 0 bin
preceding the cards in the 1 bin preceding the cards in the 2 bin, and so
on. Then it sorts the entire deck again on the second-least significant
digit and recombines the deck in a like manner. The process continues
until the cards have been sorted on all d digits. Remarkably, at that point the cards are fully sorted on the d-digit number. Thus, only d passes through the deck are required to sort. Figure 8.3 shows how radix sort operates on a “deck” of seven 3-digit numbers.
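The least-significant-digit-first procedure can be traced with a short Python sketch that imitates the card sorter. It distributes the numbers into 10 bins by one decimal digit, then gathers the bins in order, 0 through 9. The function name and the input list are illustrative assumptions, and the numbers are assumed to be nonnegative with at most d decimal digits.

```python
def lsd_radix_passes(nums, d):
    """Sort by decimal digits, least significant first; return the deck after each pass."""
    deck = list(nums)
    history = []
    for pos in range(d):                         # pos 0 is the least significant digit
        bins = [[] for _ in range(10)]           # one bin per decimal digit
        for x in deck:                           # appending keeps each bin in deck order
            bins[(x // 10 ** pos) % 10].append(x)
        deck = [x for b in bins for x in b]      # gather bin 0, then bin 1, and so on
        history.append(deck)
    return history

for deck in lsd_radix_passes([329, 457, 657, 839, 436, 720, 355], 3):
    print(deck)
# [720, 355, 436, 457, 657, 329, 839]
# [720, 329, 436, 839, 355, 457, 657]
# [329, 355, 436, 457, 657, 720, 839]
```

After the first two passes the deck is sorted only on its last two digits; the third pass finishes the job because the gathering step never reorders cards within a bin.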
In order for radix sort to work correctly, the digit sorts must be
stable. The sort performed by a card sorter is stable, but the operator
must be careful not to change the order of the cards as they come out of
a bin, even though all the cards in a bin have the same digit in the chosen column.
In a typical computer, which is a sequential random-access machine,
we sometimes use radix sort to sort records of information that are
keyed by multiple fields. For example, we might wish to sort dates by
three keys: year, month, and day. We could run a sorting algorithm with
a comparison function that, given two dates, compares years, and if
there is a tie, compares months, and if another tie occurs, compares
days. Alternatively, we could sort the information three times with a
stable sort: first on day (the “least significant” part), next on month, and
finally on year.
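As a sketch of the two approaches for dates, the Python below represents each date as a hypothetical (year, month, day) tuple. It relies on the documented guarantee that Python's built-in `sorted` is stable. The one-pass version uses tuple comparison, which compares years first, then months, then days.

```python
dates = [(2021, 3, 15), (2020, 12, 1), (2021, 3, 2), (2020, 1, 31), (2021, 1, 15)]

# Three passes with a stable sort, least significant field first.
by_fields = sorted(dates, key=lambda t: t[2])        # day
by_fields = sorted(by_fields, key=lambda t: t[1])    # month
by_fields = sorted(by_fields, key=lambda t: t[0])    # year

# One pass with a comparison that checks year, then month, then day.
by_compare = sorted(dates)

assert by_fields == by_compare
print(by_fields)
# [(2020, 1, 31), (2020, 12, 1), (2021, 1, 15), (2021, 3, 2), (2021, 3, 15)]
```

If any of the three passes were unstable, a later pass could scramble the order established by an earlier one, and the two results could differ.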
The code for radix sort is straightforward. The RADIX-SORT
procedure assumes that each element in array A[1 : n] has d digits, where digit 1 is the lowest-order digit and digit d is the highest-order digit.
RADIX-SORT(A, n, d)
1 for i = 1 to d
2     use a stable sort to sort array A[1 : n] on digit i
Although the pseudocode for RADIX-SORT does not specify which
stable sort to use, COUNTING-SORT is commonly used. If you use
COUNTING-SORT as the stable sort, you can make RADIX-SORT a
little more efficient by revising COUNTING-SORT to take a pointer to
the output array as a parameter, having RADIX-SORT preallocate this
array, and alternating input and output between the two arrays in
successive iterations of the for loop in RADIX-SORT.
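The preallocation trick described above might look like the following Python sketch. It assumes nonnegative integers with at most d digits in a given base (the `base` parameter is our addition; the text uses decimal digits). The helper writes into a caller-supplied output list, and `radix_sort` swaps the roles of the two lists after each pass instead of allocating a fresh output array every time. Both function names are hypothetical.

```python
def counting_sort_digit(src, dst, digit, base):
    """Stable counting sort of src into the preallocated list dst, keyed on digit(x)."""
    C = [0] * base
    for x in src:
        C[digit(x)] += 1
    for v in range(1, base):
        C[v] += C[v - 1]                 # C[v] = number of keys whose digit is <= v
    for x in reversed(src):              # right-to-left scan keeps the sort stable
        C[digit(x)] -= 1
        dst[C[digit(x)]] = x

def radix_sort(A, d, base=10):
    """Sort nonnegative ints with at most d base-`base` digits, lowest digit first."""
    src = list(A)
    dst = [0] * len(src)                 # the second array is allocated once, up front
    for i in range(d):                   # i = 0 is the lowest-order digit
        counting_sort_digit(src, dst, lambda x: (x // base ** i) % base, base)
        src, dst = dst, src              # this pass's output is the next pass's input
    return src
```

For instance, `radix_sort([329, 457, 657, 839, 436, 720, 355], 3)` returns `[329, 355, 436, 457, 657, 720, 839]`. Only two arrays of length n ever exist, no matter how many passes run.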
Lemma 8.3
Given n d-digit numbers in which each digit can take on up to k possible values, RADIX-SORT correctly sorts these numbers in Θ( d( n + k)) time if the stable sort it uses takes Θ( n + k) time.
Proof The correctness of radix sort follows by induction on the column
being sorted (see Exercise 8.3-3). The analysis of the running time
depends on the stable sort used as the intermediate sorting algorithm.
When each digit lies in the range 0 to k – 1 (so that it can take on k possible values), and k is not too large, counting sort is the obvious choice. Each pass over n d-digit numbers then takes Θ( n + k) time.
There are d passes, and so the total time for radix sort is Θ( d( n + k)).
▪
When d is constant and k = O( n), we can make radix sort run in linear time. More generally, we have some flexibility in how to break
each key into digits.
Lemma 8.4
Given n b-bit numbers and any positive integer r ≤ b, RADIX-SORT
correctly sorts these numbers in Θ((b/r)(n + 2^r)) time if the stable sort it uses takes Θ(n + k) time for inputs in the range 0 to k.
Proof For a value r ≤ b, view each key as having d = ⌈b/r⌉ digits of r bits each. Each digit is an integer in the range 0 to 2^r – 1, so that we can use
counting sort with k = 2^r – 1. (For example, we can view a 32-bit word
as having four 8-bit digits, so that b = 32, r = 8, k = 2^r – 1 = 255, and d
= b/r = 4.) Each pass of counting sort takes Θ(n + k) = Θ(n + 2^r) time and there are d passes, for a total running time of Θ(d(n + 2^r)) = Θ((b/r)(n + 2^r)).
▪
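Digits in Lemma 8.4 need not be decimal. The Python sketch below assumes keys are nonnegative integers below 2^b. It extracts each r-bit digit with a shift and a mask and runs one counting-sort pass per digit, ⌈b/r⌉ passes in all. The function name `radix_sort_bits` is hypothetical.

```python
def radix_sort_bits(A, b, r):
    """Sort nonnegative ints below 2**b, viewing each as ceil(b/r) digits of r bits."""
    mask = (1 << r) - 1                  # k = 2**r - 1, the largest digit value
    d = -(-b // r)                       # d = ceil(b / r) passes
    for p in range(d):                   # p = 0 handles the lowest-order r bits
        shift = p * r
        C = [0] * (mask + 1)
        for x in A:
            C[(x >> shift) & mask] += 1
        for v in range(1, mask + 1):
            C[v] += C[v - 1]
        out = [0] * len(A)
        for x in reversed(A):            # stable: equal digits keep their order
            digit = (x >> shift) & mask
            C[digit] -= 1
            out[C[digit]] = x
        A = out                          # rebinds the local name; caller's list is untouched
    return A
```

With b = 32 and r = 8 this makes four passes, each with a count array of 256 entries, matching the 32-bit example in the proof. Choosing r = 5 instead gives seven passes with 32-entry count arrays; the top digit then has only 2 significant bits, which is harmless.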
Given n and b, what value of r ≤ b minimizes the expression (b/r)(n +
2^r)? As r decreases, the factor b/r increases, but as r increases so does 2^r.
The answer depends on whether b < ⌊lg n⌋. If b < ⌊lg n⌋, then r ≤ b implies (n + 2^r) = Θ(n). Thus, choosing r = b yields a running time of (b/b)(n + 2^b) = Θ(n), which is asymptotically optimal. If b ≥ ⌊lg n⌋, then choosing r = ⌊lg n⌋ gives the best running time to within a constant factor, which we can see as follows.¹ Choosing r = ⌊lg n⌋ yields a running time of Θ(bn/lg n). As r increases above ⌊lg n⌋, the 2^r term in the
numerator increases faster than the r term in the denominator, and so increasing r above ⌊lg n⌋ yields a running time of Ω(bn/lg n). If instead r were to decrease below ⌊lg n⌋, then the b/r term increases and the n + 2^r term remains at Θ(n).
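The rule for choosing r can be written as a small helper. This Python sketch encodes the rule from the analysis (r = b when b < ⌊lg n⌋, otherwise r = ⌊lg n⌋) together with the cost expression (b/r)(n + 2^r), ignoring constant factors. Because the rule is optimal only to within a constant factor, the exact minimizer of the expression can be a slightly smaller r. Both function names are hypothetical.

```python
def choose_r(n, b):
    """Digit width from the analysis: b if b < floor(lg n), else floor(lg n); needs n > 1."""
    lg_n = n.bit_length() - 1            # floor(lg n) for n >= 1
    return b if b < lg_n else lg_n

def cost(n, b, r):
    """The expression (b/r)(n + 2**r) that the analysis minimizes."""
    return (b / r) * (n + 2 ** r)
```

For n = 2^20 and b = 64, `choose_r` returns 20, while the expression itself is smallest at r = 17. The two costs differ by a factor of about 1.5, consistent with the "within a constant factor" claim.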
Is radix sort preferable to a comparison-based sorting algorithm,
such as quicksort? If b = O(lg n), as is often the case, and r ≈ lg n, then radix sort’s running time is Θ( n), which appears to be better than
quicksort’s expected running time of Θ( n lg n). The constant factors hidden in the Θ-notation differ, however. Although radix sort may make
fewer passes than quicksort over the n keys, each pass of radix sort may
take significantly longer. Which sorting algorithm to prefer depends on
the characteristics of the implementations, of the underlying machine
(e.g., quicksort often uses hardware caches more effectively than radix
sort), and of the input data. Moreover, the version of radix sort that
uses counting sort as the intermediate stable sort does not sort in place,
which many of the Θ( n lg n)-time comparison sorts do. Thus, when primary memory storage is at a premium, an in-place algorithm such as
quicksort could be the better choice.
Exercises
8.3-1
Using Figure 8.3 as a model, illustrate the operation of RADIX-SORT
on the following list of English words: COW, DOG, SEA, RUG, ROW,
MOB, BOX, TAB, BAR, EAR, TAR, DIG, BIG, TEA, NOW, FOX.
8.3-2
Which of the following sorting algorithms are stable: insertion sort,
merge sort, heapsort, and quicksort? Give a simple scheme that makes
any comparison sort stable. How much additional time and space does
your scheme entail?
8.3-3
Use induction to prove that radix sort works. Where does your proof
need the assumption that the intermediate sort is stable?
8.3-4
Suppose that COUNTING-SORT is used as the stable sort within
RADIX-SORT. If RADIX-SORT calls COUNTING-SORT d times,
then since each call of COUNTING-SORT makes two passes over the
data (lines 4–5 and 11–13), altogether 2 d passes over the data occur.
Describe how to reduce the total number of passes to d + 1.
8.3-5
Show how to sort n integers in the range 0 to n^3 – 1 in O(n) time.
★ 8.3-6
In the first card-sorting algorithm in this section, which sorts on the
most significant digit first, exactly how many sorting passes are needed
to sort d-digit decimal numbers in the worst case? How many piles of
cards does an operator need to keep track of in the worst case?
Bucket sort assumes that the input is drawn from a uniform distribution
and has an average-case running time of O( n). Like counting sort, bucket sort is fast because it assumes something about the input.
Whereas counting sort assumes that the input consists of integers in a
small range, bucket sort assumes that the input is generated by a
random process that distributes elements uniformly and independently
over the interval [0, 1). (See Section C.2 for a definition of a uniform distribution.)
Bucket sort divides the interval [0, 1) into n equal-sized subintervals,
or buckets, and then distributes the n input numbers into the buckets.
Since the inputs are uniformly and independently distributed over [0, 1),
we do not expect many numbers to fall into each bucket. To produce the
output, we simply sort the numbers in each bucket and then go through
the buckets in order, listing the elements in each.
The BUCKET-SORT procedure on the next page assumes that the
input is an array A[1 : n] and that each element A[ i] in the array satisfies 0 ≤ A[ i] < 1. The code requires an auxiliary array B[0 : n – 1] of linked lists (buckets) and assumes that there is a mechanism for maintaining
such lists. (Section 10.2 describes how to implement basic operations on linked lists.) Figure 8.4 shows the operation of bucket sort on an input array of 10 numbers.
Figure 8.4 The operation of BUCKET-SORT for n = 10. (a) The input array A[1 : 10]. (b) The array B[0 : 9] of sorted lists (buckets) after line 7 of the algorithm, with slashes indicating the end of each bucket. Bucket i holds values in the half-open interval [ i/10, ( i + 1)/10). The sorted output consists of a concatenation of the lists B[0], B[1], … , B[9] in order.
BUCKET-SORT(A, n)
1 let B[0 : n – 1] be a new array
2 for i = 0 to n – 1
3     make B[i] an empty list
4 for i = 1 to n
5     insert A[i] into list B[⌊n · A[i]⌋]
6 for i = 0 to n – 1
7     sort list B[i] with insertion sort
8 concatenate the lists B[0], B[1], … , B[n – 1] together in order
9 return the concatenated lists
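A direct Python transcription of BUCKET-SORT might look like this sketch. It assumes every input satisfies 0 ≤ x < 1 and uses Python lists in place of the linked lists that the pseudocode calls for. The helper names are our own. The `min(..., n - 1)` clamp is an added guard against floating-point rounding and is not part of the pseudocode.

```python
def insertion_sort(lst):
    """In-place insertion sort, used here on each (expected short) bucket."""
    for j in range(1, len(lst)):
        key = lst[j]
        i = j - 1
        while i >= 0 and lst[i] > key:
            lst[i + 1] = lst[i]
            i -= 1
        lst[i + 1] = key

def bucket_sort(A):
    """Sort numbers with 0 <= x < 1 using n buckets, following BUCKET-SORT."""
    n = len(A)
    B = [[] for _ in range(n)]                   # lines 1-3: n empty buckets
    for x in A:                                  # lines 4-5: bucket floor(n * x)
        B[min(int(n * x), n - 1)].append(x)      # clamp guards against rounding up to n
    for bucket in B:                             # lines 6-7: sort each bucket
        insertion_sort(bucket)
    return [x for bucket in B for x in bucket]   # lines 8-9: concatenate in order
```

Because `int` truncates toward zero, `int(n * x)` equals ⌊n · x⌋ for the nonnegative inputs assumed here.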
To see that this algorithm works, consider two elements A[i] and A[j].
Assume without loss of generality that A[i] ≤ A[j]. Since ⌊n · A[i]⌋ ≤ ⌊n · A[j]⌋, either element A[i] goes into the same bucket as A[j] or it goes into a bucket with a lower index. If A[i] and A[j] go into the same bucket, then the for loop of lines 6–7 puts them into the proper order. If A[i]
and A[j] go into different buckets, then line 8 puts them into the proper order. Therefore, bucket sort works correctly.
To analyze the running time, observe that, together, all lines except
line 7 take O( n) time in the worst case. We need to analyze the total time taken by the n calls to insertion sort in line 7.
To analyze the cost of the calls to insertion sort, let n_i be the random
variable denoting the number of elements placed in bucket B[i]. Since insertion sort runs in quadratic time (see Section 2.2), the running time of bucket sort is
$$T(n) = \Theta(n) + \sum_{i=0}^{n-1} O(n_i^2). \tag{8.1}$$
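We now analyze the average-case running time of bucket sort by
computing the expected value of the running time, where we take the
expectation over the input distribution. Taking expectations of both
sides and using linearity of expectation (equation (C.24) on page 1192),
we have
$$\begin{aligned} E[T(n)] &= E\left[\Theta(n) + \sum_{i=0}^{n-1} O(n_i^2)\right] \\ &= \Theta(n) + \sum_{i=0}^{n-1} E[O(n_i^2)] \\ &= \Theta(n) + \sum_{i=0}^{n-1} O(E[n_i^2]). \end{aligned} \tag{8.2}$$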
We claim that
$$E[n_i^2] = 2 - 1/n \tag{8.3}$$
for i = 0, 1, … , n – 1. It is no surprise that each bucket i has the same value of E[n_i^2], since each value in the input array A is equally likely to
fall in any bucket.
To prove equation (8.3), view each random variable n_i as the number
of successes in n Bernoulli trials (see Section C.4). Success in a trial occurs when an element goes into bucket B[i], with a probability p = 1/n of success and q = 1 – 1/n of failure. A binomial distribution counts n_i, the number of successes, in the n trials. By equations (C.41) and (C.44)
on pages 1199–1200, we have E[n_i] = np = n(1/n) = 1 and Var[n_i] = npq
= 1 – 1/n. Equation (C.31) on page 1194 gives
$$\begin{aligned} E[n_i^2] &= \mathrm{Var}[n_i] + E^2[n_i] \\ &= (1 - 1/n) + 1^2 \\ &= 2 - 1/n, \end{aligned}$$
which proves equation (8.3). Using this expected value in equation (8.2),
we get that the average-case running time for bucket sort is Θ(n) + n ·
O(2 – 1/n) = Θ(n).
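Equation (8.3) can be checked exactly for small n by summing directly over the binomial distribution. The Python sketch below does so with exact rational arithmetic. It is a sanity check for particular values of n, not a substitute for the proof, and the function name is hypothetical.

```python
from fractions import Fraction
from math import comb

def expected_square_bucket_size(n):
    """E[n_i^2] for n_i ~ Binomial(n, 1/n), computed exactly from the distribution."""
    p = Fraction(1, n)                   # probability that an element lands in bucket i
    q = 1 - p
    return sum(k * k * comb(n, k) * p**k * q**(n - k) for k in range(n + 1))

for n in range(1, 9):
    assert expected_square_bucket_size(n) == 2 - Fraction(1, n)
```

For n = 10, for example, the sum comes out to exactly 19/10, which is 2 – 1/10.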
Even if the input is not drawn from a uniform distribution, bucket
sort may still run in linear time. As long as the input has the property
that the sum of the squares of the bucket sizes is linear in the total
number of elements, equation (8.1) tells us that bucket sort runs in
linear time.
Exercises
8.4-1
Using Figure 8.4 as a model, illustrate the operation of BUCKET-SORT on the array A = 〈.79, .13, .16, .64, .39, .20, .89, .53, .71, .42〉.
8.4-2
Explain why the worst-case running time for bucket sort is Θ( n 2). What
simple change to the algorithm preserves its linear average-case running
time and makes its worst-case running time O( n lg n)?
8.4-3
Let X be a random variable that is equal to the number of heads in two
flips of a fair coin. What is E[X^2]? What is E^2[X]?
8.4-4
An array A of size n > 10 is filled in the following way. For each element A[i], choose two random variables x_i and y_i uniformly and independently from [0, 1). Then set
Modify bucket sort so that it sorts the array A in O( n) expected time.
★ 8.4-5
You are given n points in the unit disk, p_i = (x_i, y_i), such that 0 < x_i^2 + y_i^2 ≤ 1 for i = 1, 2, … , n. Suppose that the points are uniformly
distributed, that is, the probability of finding a point in any region of the
disk is proportional to the area of that region. Design an algorithm with
an average-case running time of Θ( n) to sort the n points by their distances
from the origin. ( Hint: Design the bucket sizes
in BUCKET-SORT to reflect the uniform distribution of the points in
the unit disk.)
★ 8.4-6
A probability distribution function P(x) for a random variable X is defined by P(x) = Pr{X ≤ x}. Suppose that you draw a list of n random variables X_1, X_2, … , X_n from a continuous probability distribution function P that is computable in O(1) time (given y you can find x such that P(x) = y in O(1) time). Give an algorithm that sorts these numbers in linear average-case time.
Problems
8-1 Probabilistic lower bounds on comparison sorting
In this problem, you will prove a probabilistic Ω( n lg n) lower bound on the running time of any deterministic or randomized comparison sort
on n distinct input elements. You’ll begin by examining a deterministic
comparison sort A with decision tree T_A. Assume that every permutation of A’s inputs is equally likely.
a. Suppose that each leaf of T_A is labeled with the probability that it is reached given a random input. Prove that exactly n! leaves are labeled
1/n! and that the rest are labeled 0.
b. Let D( T) denote the external path length of a decision tree T—the sum of the depths of all the leaves of T. Let T be a decision tree with k
> 1 leaves, and let LT and RT be the left and right subtrees of T. Show that D( T) = D( LT) + D( RT) + k.
c. Let d( k) be the minimum value of D( T) over all decision trees T with k > 1 leaves. Show that d( k) = min { d( i) + d( k – i) + k : 1 ≤ i ≤ k – 1}.
(Hint: Consider a decision tree T with k leaves that achieves the minimum. Let i_0 be the number of leaves in LT and k – i_0 the number of leaves in RT.)
d. Prove that for a given value of k > 1 and i in the range 1 ≤ i ≤ k – 1, the function i lg i + ( k – i) lg( k – i) is minimized at i = k/2. Conclude that d( k) = Ω ( k lg k).
e. Prove that D(T_A) = Ω(n! lg(n!)), and conclude that the average-case time to sort n elements is Ω(n lg n).
Now consider a randomized comparison sort B. We can extend the decision-tree model to handle randomization by incorporating two
kinds of nodes: ordinary comparison nodes and “randomization”
nodes. A randomization node models a random choice of the form
RANDOM(1, r) made by algorithm B. The node has r children, each of which is equally likely to be chosen during an execution of the
algorithm.
f. Show that for any randomized comparison sort B, there exists a
deterministic comparison sort A whose expected number of
comparisons is no more than those made by B.
8-2 Sorting in place in linear time
You have an array of n data records to sort, each with a key of 0 or 1.
An algorithm for sorting such a set of records might possess some
subset of the following three desirable characteristics:
1. The algorithm runs in O( n) time.
2. The algorithm is stable.
3. The algorithm sorts in place, using no more than a constant
amount of storage space in addition to the original array.
a. Give an algorithm that satisfies criteria 1 and 2 above.
b. Give an algorithm that satisfies criteria 1 and 3 above.
c. Give an algorithm that satisfies criteria 2 and 3 above.
d. Can you use any of your sorting algorithms from parts (a)–(c) as the
sorting method used in line 2 of RADIX-SORT, so that RADIX-
SORT sorts n records with b-bit keys in O( bn) time? Explain how or why not.
e. Suppose that the n records have keys in the range from 1 to k. Show how to modify counting sort so that it sorts the records in place in
O( n + k) time. You may use O( k) storage outside the input array. Is your algorithm stable?
8-3 Sorting variable-length items
a. You are given an array of integers, where different integers may have
different numbers of digits, but the total number of digits over all the
integers in the array is n. Show how to sort the array in O( n) time.
b. You are given an array of strings, where different strings may have
different numbers of characters, but the total number of characters
over all the strings is n. Show how to sort the strings in O( n) time.
(The desired order is the standard alphabetical order: for example, a
< ab < b.)
8-4 Water jugs
You are given n red and n blue water jugs, all of different shapes and sizes. All the red jugs hold different amounts of water, as do all the blue
jugs, and you cannot tell from the size of a jug how much water it holds.
Moreover, for every jug of one color, there is a jug of the other color
that holds the same amount of water.
Your task is to group the jugs into pairs of red and blue jugs that
hold the same amount of water. To do so, you may perform the
following operation: pick a pair of jugs in which one is red and one is
blue, fill the red jug with water, and then pour the water into the blue
jug. This operation tells you whether the red jug or the blue jug can hold
more water, or that they have the same volume. Assume that such a
comparison takes one time unit. Your goal is to find an algorithm that
makes a minimum number of comparisons to determine the grouping.
Remember that you may not directly compare two red jugs or two blue
jugs.
a. Describe a deterministic algorithm that uses Θ( n 2) comparisons to
group the jugs into pairs.
b. Prove a lower bound of Ω( n lg n) for the number of comparisons that an algorithm solving this problem must make.
c. Give a randomized algorithm whose expected number of
comparisons is O( n lg n), and prove that this bound is correct. What is the worst-case number of comparisons for your algorithm?
8-5 Average sorting
Suppose that, instead of sorting an array, we just require that the
elements increase on average. More precisely, we call an n-element array
A k-sorted if, for all i = 1, 2, … , n – k, the following holds:
$$\frac{\sum_{j=i}^{i+k-1} A[j]}{k} \le \frac{\sum_{j=i+1}^{i+k} A[j]}{k}.$$
a. What does it mean for an array to be 1-sorted?
b. Give a permutation of the numbers 1, 2, … , 10 that is 2-sorted, but
not sorted.
c. Prove that an n-element array is k-sorted if and only if A[ i] ≤ A[ i + k]
for all i = 1, 2, … , n – k.
d. Give an algorithm that k-sorts an n-element array in O( n lg( n/ k)) time.
We can also show a lower bound on the time to produce a k-sorted
array, when k is a constant.
e. Show how to sort a k-sorted array of length n in O( n lg k) time. ( Hint: Use the solution to Exercise 6.5-11.)
f. Show that when k is a constant, k-sorting an n-element array requires Ω( n lg n) time. ( Hint: Use the solution to part (e) along with the lower bound on comparison sorts.)
8-6 Lower bound on merging sorted lists
The problem of merging two sorted lists arises frequently. We have seen
a procedure for it as the subroutine MERGE in Section 2.3.1. In this problem, you will prove a lower bound of 2 n – 1 on the worst-case
number of comparisons required to merge two sorted lists, each
containing n items. First, you will show a lower bound of 2 n – o( n) comparisons by using a decision tree.
a. Given 2 n numbers, compute the number of possible ways to divide
them into two sorted lists, each with n numbers.
b. Using a decision tree and your answer to part (a), show that any
algorithm that correctly merges two sorted lists must perform at least
2 n – o( n) comparisons.
Now you will show a slightly tighter 2 n – 1 bound.
c. Show that if two elements are consecutive in the sorted order and
from different lists, then they must be compared.
d. Use your answer to part (c) to show a lower bound of 2 n – 1
comparisons for merging two sorted lists.
8-7 The 0-1 sorting lemma and columnsort
A compare-exchange operation on two array elements A[ i] and A[ j], where i < j, has the form
COMPARE-EXCHANGE(A, i, j)
1 if A[i] > A[j]
2     exchange A[i] with A[j]
After the compare-exchange operation, we know that A[ i] ≤ A[ j].
An oblivious compare-exchange algorithm operates solely by a
sequence of prespecified compare-exchange operations. The indices of
the positions compared in the sequence must be determined in advance,
and although they can depend on the number of elements being sorted,
they cannot depend on the values being sorted, nor can they depend on
the result of any prior compare-exchange operation. For example, the
COMPARE-EXCHANGE-INSERTION-SORT procedure on the
facing page shows a variation of insertion sort as an oblivious compare-
exchange algorithm. (Unlike the INSERTION-SORT procedure on
page 19, the oblivious version runs in Θ( n 2) time in all cases.)
The 0-1 sorting lemma provides a powerful way to prove that an
oblivious compare-exchange algorithm produces a sorted result. It
states that if an oblivious compare-exchange algorithm correctly sorts
all input sequences consisting of only 0s and 1s, then it correctly sorts
all inputs containing arbitrary values.
COMPARE-EXCHANGE-INSERTION-SORT(A, n)
1 for i = 2 to n
2     for j = i – 1 downto 1
3         COMPARE-EXCHANGE(A, j, j + 1)
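The following Python sketch transcribes COMPARE-EXCHANGE and COMPARE-EXCHANGE-INSERTION-SORT with 0-based indices. It also adds a brute-force helper, our own and hypothetical, that tests a sorting procedure on all 2^n inputs of 0s and 1s. The sequence of index pairs touched depends only on n, which is what makes the procedure oblivious. Checking every 0-1 input of one size illustrates the hypothesis of the 0-1 sorting lemma; it does not prove the lemma.

```python
from itertools import product

def compare_exchange(A, i, j):
    """After the call, A[i] <= A[j] (0-based indices, i < j)."""
    if A[i] > A[j]:
        A[i], A[j] = A[j], A[i]

def compare_exchange_insertion_sort(A):
    n = len(A)
    for i in range(1, n):                # pseudocode i = 2 to n, shifted to 0-based
        for j in range(i - 1, -1, -1):   # pseudocode j = i - 1 downto 1, shifted
            compare_exchange(A, j, j + 1)

def sorts_all_zero_one_inputs(sort, n):
    """Return True if `sort` (in place) correctly sorts every 0-1 list of length n."""
    for bits in product((0, 1), repeat=n):
        A = list(bits)
        sort(A)
        if A != sorted(bits):
            return False
    return True
```

The helper returns True for the insertion-sort network at any small n you try. A deliberately incomplete network, such as a single `compare_exchange(A, 0, 1)` on three elements, fails on some 0-1 input.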
You will prove the 0-1 sorting lemma by proving its contrapositive: if
an oblivious compare-exchange algorithm fails to sort an input
containing arbitrary values, then it fails to sort some 0-1 input. Assume
that an oblivious compare-exchange algorithm X fails to correctly sort
the array A[1 : n]. Let A[ p] be the smallest value in A that algorithm X
puts into the wrong location, and let A[ q] be the value that algorithm X
moves to the location into which A[p] should have gone. Define an array B[1 : n] of 0s and 1s as follows:
$$B[i] = \begin{cases} 0 & \text{if } A[i] \le A[p], \\ 1 & \text{if } A[i] > A[p]. \end{cases}$$
a. Argue that A[ q] > A[ p], so that B[ p] = 0 and B[ q] = 1.
b. To complete the proof of the 0-1 sorting lemma, prove that algorithm
X fails to sort array B correctly.
Now you will use the 0-1 sorting lemma to prove that a particular
sorting algorithm works correctly. The algorithm, columnsort, works on
a rectangular array of n elements. The array has r rows and s columns (so that n = rs), subject to three restrictions:
r must be even,
s must be a divisor of r, and
r ≥ 2s^2.
When columnsort completes, the array is sorted in column-major order:
reading down each column in turn, from left to right, the elements
monotonically increase.
Columnsort operates in eight steps, regardless of the value of n. The
odd steps are all the same: sort each column individually. Each even
step is a fixed permutation. Here are the steps:
1. Sort each column.
2. Transpose the array, but reshape it back to r rows and s columns.
In other words, turn the leftmost column into the top r/ s rows, in
order; turn the next column into the next r/ s rows, in order; and
so on.
3. Sort each column.
4. Perform the inverse of the permutation performed in step 2.
5. Sort each column.
6. Shift the top half of each column into the bottom half of the
same column, and shift the bottom half of each column into the
top half of the next column to the right. Leave the top half of the
leftmost column empty. Shift the bottom half of the last column
into the top half of a new rightmost column, and leave the
bottom half of this new column empty.
7. Sort each column.
8. Perform the inverse of the permutation performed in step 6.
Figure 8.5 The steps of columnsort. (a) The input array with 6 rows and 3 columns. (This example does not obey the r ≥ 2s^2 requirement, but it works.) (b) After sorting each column in step 1. (c) After transposing and reshaping in step 2. (d) After sorting each column in step 3. (e) After performing step 4, which inverts the permutation from step 2. (f) After sorting each column in step 5. (g) After shifting by half a column in step 6. (h) After sorting each column in step 7. (i) After performing step 8, which inverts the permutation from step 6. Steps 6–8 sort the bottom half of each column with the top half of the next column. After step 8, the array is sorted in column-major order.
You can think of steps 6–8 as a single step that sorts the bottom half of
each column and the top half of the next column. Figure 8.5 shows an example of the steps of columnsort with r = 6 and s = 3. (Even though
this example violates the requirement that r ≥ 2s^2, it happens to work.)
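Here is a Python sketch of the eight steps. It assumes the input arrives as a flat list in column-major order, with r even, s dividing r, and r ≥ 2s^2. It uses Python's `sorted` for the odd steps and implements step 6 by padding with -inf and +inf where the text leaves cells empty. The padding values and the function name are our assumptions.

```python
import math

def columnsort(values, r, s):
    """Columnsort on a flat column-major list (r rows, s columns); returns a sorted flat list."""
    assert len(values) == r * s and r % 2 == 0 and r % s == 0
    half = r // 2

    def columns(flat, count):
        # Split a column-major flat list into `count` columns of length r.
        return [flat[c * r:(c + 1) * r] for c in range(count)]

    def sort_columns(cols):
        return [sorted(col) for col in cols]

    cols = sort_columns(columns(list(values), s))                     # step 1
    flat = [x for col in cols for x in col]                           # read column-major...
    rows = [flat[i * s:(i + 1) * s] for i in range(r)]                # step 2: ...write row-major
    cols = sort_columns([[row[c] for row in rows] for c in range(s)]) # step 3
    flat = [cols[c][i] for i in range(r) for c in range(s)]           # step 4: read row-major,
    cols = sort_columns(columns(flat, s))                             # write column-major; step 5
    # Step 6: shift by half a column; -inf and +inf fill the cells the text leaves empty.
    flat = [-math.inf] * half + [x for col in cols for x in col] + [math.inf] * half
    cols = sort_columns(columns(flat, s + 1))                         # step 7
    return [x for col in cols for x in col][half:-half]               # step 8: undo the shift
```

After step 7, the -inf padding stays at the top of the first column and the +inf padding at the bottom of the extra column. Dropping the first and last r/2 entries therefore performs the inverse permutation of step 6.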
c. Argue that we can treat columnsort as an oblivious compare-exchange algorithm, even if we do not know what sorting method the
odd steps use.
Although it might seem hard to believe that columnsort actually
sorts, you will use the 0-1 sorting lemma to prove that it does. The 0-1
sorting lemma applies because we can treat columnsort as an oblivious
compare-exchange algorithm. A couple of definitions will help you
apply the 0-1 sorting lemma. We say that an area of an array is clean if
we know that it contains either all 0s or all 1s or if it is empty.
Otherwise, the area might contain mixed 0s and 1s, and it is dirty. From
here on, assume that the input array contains only 0s and 1s, and that
we can treat it as an array with r rows and s columns.
d. Prove that after steps 1–3, the array consists of clean rows of 0s at the
top, clean rows of 1s at the bottom, and at most s dirty rows between
them. (One of the clean rows could be empty.)
e. Prove that after step 4, the array, read in column-major order, starts
with a clean area of 0s, ends with a clean area of 1s, and has a dirty
area of at most s^2 elements in the middle. (Again, one of the clean
areas could be empty.)
f. Prove that steps 5–8 produce a fully sorted 0-1 output. Conclude that
columnsort correctly sorts all inputs containing arbitrary values.
g. Now suppose that s does not divide r. Prove that after steps 1–3, the array consists of clean rows of 0s at the top, clean rows of 1s at the
bottom, and at most 2s – 1 dirty rows between them. (Once again, one
of the clean areas could be empty.) How large must r be, compared
with s, for columnsort to correctly sort when s does not divide r?
h. Suggest a simple change to step 1 that allows us to maintain the
requirement that r ≥ 2s^2 even when s does not divide r, and prove that with your change, columnsort correctly sorts.
Chapter notes
The decision-tree model for studying comparison sorts was introduced
by Ford and Johnson [150]. Knuth’s comprehensive treatise on sorting
[261] covers many variations on the sorting problem, including the information-theoretic lower bound on the complexity of sorting given
here. Ben-Or [46] studied lower bounds for sorting using generalizations of the decision-tree model.
Knuth credits H. H. Seward with inventing counting sort in 1954, as
well as with the idea of combining counting sort with radix sort. Radix
sorting starting with the least significant digit appears to be a folk
algorithm widely used by operators of mechanical card-sorting
machines. According to Knuth, the first published reference to the
method is a 1929 document by L. J. Comrie describing punched-card
equipment. Bucket sorting has been in use since 1956, when the basic
idea was proposed by Isaac and Singleton [235].
Munro and Raman [338] give a stable sorting algorithm that performs O(n^{1+ϵ}) comparisons in the worst case, where 0 < ϵ ≤ 1 is any fixed constant. Although any of the O(n lg n)-time algorithms make fewer comparisons, the algorithm by Munro and Raman moves data
only O( n) times and operates in place.
The case of sorting n b-bit integers in o( n lg n) time has been considered by many researchers. Several positive results have been
obtained, each under slightly different assumptions about the model of
computation and the restrictions placed on the algorithm. All the
results assume that the computer memory is divided into addressable b-bit words. Fredman and Willard [157] introduced the fusion tree data structure and used it to sort n integers in O(n lg n/lg lg n) time. This bound was later improved to
O(n √(lg n)) time by Andersson [17]. These
algorithms require the use of multiplication and several precomputed
constants. Andersson, Hagerup, Nilsson, and Raman [18] have shown how to sort n integers in O( n lg lg n) time without using multiplication, but their method requires storage that can be unbounded in terms of n.
Using multiplicative hashing, we can reduce the storage needed to O( n), but then the O( n lg lg n) worst-case bound on the running time becomes an expected-time bound. Generalizing the exponential search trees of
Andersson [17], Thorup [434] gave an O(n (lg lg n)^2)-time sorting
algorithm that does not use multiplication or randomization, and it uses linear space. Combining these techniques with some new ideas, Han
[207] improved the bound for sorting to O( n lg lg n lg lg lg n) time.
Although these algorithms are important theoretical breakthroughs,
they are all fairly complicated and at the present time seem unlikely to
compete with existing sorting algorithms in practice.
The columnsort algorithm in Problem 8-7 is by Leighton [286].
1 The choice of r = ⌊lg n⌋ assumes that n > 1. If n ≤ 1, there is nothing to sort.
9 Medians and Order Statistics
The i th order statistic of a set of n elements is the i th smallest element.
For example, the minimum of a set of elements is the first order statistic
(i = 1), and the maximum is the nth order statistic (i = n). A median, informally, is the “halfway point” of the set. When n is odd, the median is unique, occurring at i = (n + 1)/2. When n is even, there are two medians, the lower median occurring at i = n/2 and the upper median occurring at i = n/2 + 1. Thus, regardless of the parity of n, medians occur at i = ⌊(n + 1)/2⌋ and i = ⌈(n + 1)/2⌉. For simplicity in this text, however, we consistently use the phrase “the median” to refer to the lower median.
This chapter addresses the problem of selecting the i th order statistic
from a set of n distinct numbers. We assume for convenience that the set
contains distinct numbers, although virtually everything that we do
extends to the situation in which a set contains repeated values. We
formally specify the selection problem as follows:
Input: A set A of n distinct numbers¹ and an integer i, with 1 ≤ i ≤ n.
Output: The element x ∈ A that is larger than exactly i – 1 other elements of A.
We can solve the selection problem in O( n lg n) time simply by sorting the numbers using heapsort or merge sort and then outputting the i th
element in the sorted array. This chapter presents asymptotically faster
algorithms.
Section 9.1 examines the problem of selecting the minimum and maximum of a set of elements. More interesting is the general selection
problem, which we investigate in the subsequent two sections. Section
9.2 analyzes a practical randomized algorithm that achieves an O( n)
expected running time, assuming distinct elements. Section 9.3 contains an algorithm of more theoretical interest that achieves the O( n) running time in the worst case.
How many comparisons are necessary to determine the minimum of a
set of n elements? To obtain an upper bound of n – 1 comparisons, just
examine each element of the set in turn and keep track of the smallest
element seen so far. The MINIMUM procedure assumes that the set
resides in array A[1 : n].
MINIMUM(A, n)
    min = A[1]
    for i = 2 to n
        if min > A[i]
            min = A[i]
    return min
It’s no more difficult to find the maximum with n – 1 comparisons.
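To make the n – 1 count concrete, here is a minimal Python sketch of MINIMUM and its mirror image for the maximum, instrumented to report how many element comparisons each makes. The function names and the comparison counter are illustrative additions, not part of the pseudocode, and Python lists are 0-indexed where the text's array is A[1 : n].

```python
def minimum_with_count(a):
    """Scan a nonempty sequence left to right, as MINIMUM does.

    Returns (smallest element, number of element comparisons made).
    """
    if len(a) == 0:
        raise ValueError("need at least one element")
    smallest = a[0]
    comparisons = 0
    for x in a[1:]:
        comparisons += 1          # exactly one comparison per remaining element
        if smallest > x:
            smallest = x
    return smallest, comparisons


def maximum_with_count(a):
    """Mirror image of minimum_with_count; also makes n - 1 comparisons."""
    if len(a) == 0:
        raise ValueError("need at least one element")
    largest = a[0]
    comparisons = 0
    for x in a[1:]:
        comparisons += 1
        if largest < x:
            largest = x
    return largest, comparisons
```

For the five-element list [5, 3, 8, 1, 9], each function makes 4 comparisons.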
Is this algorithm for minimum the best we can do? Yes, because it
turns out that there’s a lower bound of n – 1 comparisons for the
problem of determining the minimum. Think of any algorithm that
determines the minimum as a tournament among the elements. Each
comparison is a match in the tournament in which the smaller of the
two elements wins. Since every element except the winner must lose at
least one match, we can conclude that n – 1 comparisons are necessary
to determine the minimum. Hence the algorithm MINIMUM is
optimal with respect to the number of comparisons performed.
Simultaneous minimum and maximum
Some applications need to find both the minimum and the maximum of
a set of n elements. For example, a graphics program may need to scale
a set of ( x, y) data to fit onto a rectangular display screen or other graphical output device. To do so, the program must first determine the
minimum and maximum value of each coordinate.
Of course, we can determine both the minimum and the maximum of
n elements using Θ( n) comparisons. We simply find the minimum and maximum independently, using n – 1 comparisons for each, for a total
of 2 n – 2 = Θ( n) comparisons.
Although 2 n – 2 comparisons is asymptotically optimal, it is possible
to improve the leading constant. We can find both the minimum and the
maximum using at most 3⌊n/2⌋ comparisons. The trick is to maintain both the minimum and maximum elements seen thus far. Rather than processing each element of the input by comparing it against the current minimum and maximum, at a cost of 2 comparisons per element, process elements in pairs. Compare pairs of elements from the input first with each other, and then compare the smaller with the current minimum and the larger with the current maximum, at a cost of 3 comparisons for every 2 elements.
How you set up initial values for the current minimum and
maximum depends on whether n is odd or even. If n is odd, set both the
minimum and maximum to the value of the first element, and then
process the rest of the elements in pairs. If n is even, perform 1
comparison on the first 2 elements to determine the initial values of the
minimum and maximum, and then process the rest of the elements in
pairs as in the case for odd n.
Let’s count the total number of comparisons. If n is odd, then 3⌊n/2⌋ comparisons occur. If n is even, 1 initial comparison occurs, followed by another 3(n – 2)/2 comparisons, for a total of 3n/2 – 2. Thus, in either case, the total number of comparisons is at most 3⌊n/2⌋.
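The pairwise scheme and its odd/even initialization can be sketched in Python as follows. This is an illustrative sketch (the name `min_max_pairs` is hypothetical); it counts only comparisons between elements, so the returned count can be checked against 3⌊n/2⌋ for odd n and 3n/2 – 2 for even n.

```python
def min_max_pairs(a):
    """Find (min, max, comparisons) of a nonempty list by processing pairs."""
    n = len(a)
    if n == 0:
        raise ValueError("need at least one element")
    comparisons = 0
    if n % 2 == 1:
        # Odd n: the first element initializes both the minimum and the maximum.
        lo = hi = a[0]
        start = 1
    else:
        # Even n: one comparison orders the first two elements.
        comparisons += 1
        if a[0] < a[1]:
            lo, hi = a[0], a[1]
        else:
            lo, hi = a[1], a[0]
        start = 2
    # The remaining n - start elements form (n - start) / 2 pairs.
    for j in range(start, n - 1, 2):
        x, y = a[j], a[j + 1]
        comparisons += 1              # compare the pair with each other
        if x > y:
            x, y = y, x               # now x <= y
        comparisons += 2              # smaller vs. min, larger vs. max
        if x < lo:
            lo = x
        if y > hi:
            hi = y
    return lo, hi, comparisons
```

For example, five elements cost 3 · 2 = 6 comparisons and four elements cost 1 + 3 = 4 comparisons.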
Exercises
9.1-1
Show that the second smallest of n elements can be found with n + ⌈lg n⌉ – 2 comparisons in the worst case. ( Hint: Also find the smallest element.)
9.1-2
Given n > 2 distinct numbers, you want to find a number that is neither
the minimum nor the maximum. What is the smallest number of
comparisons that you need to perform?
9.1-3
A racetrack can run races with five horses at a time to determine their
relative speeds. For 25 horses, it takes six races to determine the fastest
horse, assuming transitivity (see page 1159). What’s the minimum
number of races it takes to determine the fastest three horses out of 25?
★ 9.1-4
Prove the lower bound of ⌈3 n/2⌉ – 2 comparisons in the worst case to
find both the maximum and minimum of n numbers. ( Hint: Consider how many numbers are potentially either the maximum or minimum,
and investigate how a comparison affects these counts.)
9.2 Selection in expected linear time
The general selection problem—finding the i th order statistic for any value of i—appears more difficult than the simple problem of finding a
minimum. Yet, surprisingly, the asymptotic running time for both
problems is the same: Θ( n). This section presents a divide-and-conquer
algorithm for the selection problem. The algorithm RANDOMIZED-
SELECT is modeled after the quicksort algorithm of Chapter 7. Like quicksort it partitions the input array recursively. But unlike quicksort,
which recursively processes both sides of the partition,
RANDOMIZED-SELECT works on only one side of the partition.
This difference shows up in the analysis: whereas quicksort has an
expected running time of Θ( n lg n), the expected running time of RANDOMIZED-SELECT is Θ( n), assuming that the elements are
distinct.
RANDOMIZED-SELECT uses the procedure RANDOMIZED-
PARTITION introduced in Section 7.3. Like RANDOMIZED-
QUICKSORT, it is a randomized algorithm, since its behavior is
determined in part by the output of a random-number generator. The
RANDOMIZED-SELECT procedure returns the i th smallest element
of the array A[ p : r], where 1 ≤ i ≤ r – p + 1.
RANDOMIZED-SELECT(A, p, r, i)
1  if p == r
2      return A[p]    // 1 ≤ i ≤ r – p + 1 when p == r means that i = 1
3  q = RANDOMIZED-PARTITION(A, p, r)
4  k = q – p + 1
5  if i == k
6      return A[q]    // the pivot value is the answer
7  elseif i < k
8      return RANDOMIZED-SELECT(A, p, q – 1, i)
9  else return RANDOMIZED-SELECT(A, q + 1, r, i – k)
Figure 9.1 illustrates how the RANDOMIZED-SELECT procedure
works. Line 1 checks for the base case of the recursion, in which the
subarray A[ p : r] consists of just one element. In this case, i must equal 1, and line 2 simply returns A[ p] as the i th smallest element. Otherwise, the call to RANDOMIZED-PARTITION in line 3 partitions the array
A[ p : r] into two (possibly empty) subarrays A[ p : q – 1] and A[ q + 1 : r]
such that each element of A[ p : q – 1] is less than or equal to A[ q], which in turn is less than each element of A[ q + 1 : r]. (Although our analysis assumes that the elements are distinct, the procedure still yields the
correct result even if equal elements are present.) As in quicksort, we’ll
refer to A[ q] as the pivot element. Line 4 computes the number k of elements in the subarray A[ p : q], that is, the number of elements in the low side of the partition, plus 1 for the pivot element. Line 5 then
checks whether A[ q] is the i th smallest element. If it is, then line 6
returns A[ q]. Otherwise, the algorithm determines in which of the two
subarrays A[ p: q – 1] and A[ q + 1 : r] the i th smallest element lies. If i < k, then the desired element lies on the low side of the partition, and line
8 recursively selects it from the subarray. If i > k, however, then the desired element lies on the high side of the partition. Since we already
know k values that are smaller than the i th smallest element of A[ p : r]—
namely, the elements of A[ p : q]—the desired element is the ( i – k)th smallest element of A[ q + 1 : r], which line 9 finds recursively. The code appears to allow recursive calls to subarrays with 0 elements, but
Exercise 9.2-1 asks you to show that this situation cannot happen.
Figure 9.1 The action of RANDOMIZED-SELECT as successive partitionings narrow the subarray A[ p: r], showing the values of the parameters p, r, and i at each recursive call. The subarray A[ p : r] in each recursive step is shown in tan, with the dark tan element selected as the pivot for the next partitioning. Blue elements are outside A[ p : r]. The answer is the tan element in the bottom array, where p = r = 5 and i = 1. The array designations A(0), A(1), … , A(5), the partitioning numbers, and whether the partitioning is helpful are explained on the following page.
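A runnable Python sketch of RANDOMIZED-SELECT follows. Because RANDOMIZED-PARTITION from Section 7.3 is not reproduced here, the sketch assumes a Lomuto-style partition around a uniformly random pivot (the helper name `randomized_partition` and the `rng` parameter are illustrative). Indices are 0-based and inclusive, while the rank i stays 1-based as in the pseudocode. Since both recursive calls in lines 8 and 9 are tail calls, the sketch replaces them with a loop that narrows [p, r].

```python
import random


def randomized_partition(a, p, r, rng):
    """Partition a[p..r] (inclusive) around a randomly chosen pivot.

    Afterward a[p..q-1] <= a[q] < a[q+1..r]; returns q.
    """
    s = rng.randint(p, r)            # random pivot index in [p, r]
    a[s], a[r] = a[r], a[s]          # move the pivot to the end
    pivot = a[r]
    boundary = p - 1                 # last index of the "<= pivot" region
    for j in range(p, r):
        if a[j] <= pivot:
            boundary += 1
            a[boundary], a[j] = a[j], a[boundary]
    a[boundary + 1], a[r] = a[r], a[boundary + 1]
    return boundary + 1


def randomized_select(a, p, r, i, rng=random):
    """Return the ith smallest (1-based) element of a[p..r]; permutes a in place."""
    while True:
        if p == r:                   # lines 1-2: one element, so i must be 1
            return a[p]
        q = randomized_partition(a, p, r, rng)   # line 3
        k = q - p + 1                # line 4: size of low side plus the pivot
        if i == k:                   # lines 5-6: the pivot is the answer
            return a[q]
        elif i < k:                  # lines 7-8: continue on the low side
            r = q - 1
        else:                        # line 9: continue on the high side
            p = q + 1
            i -= k
```

For instance, `randomized_select(list(range(10)), 0, 9, 4)` returns 3 regardless of which pivots are drawn; only the amount of work varies.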
The worst-case running time for RANDOMIZED-SELECT is $\Theta(n^2)$, even to find the minimum, because it could be extremely unlucky and always partition around the largest remaining element before identifying the ith smallest when only one element remains. In this worst case, each recursive step removes only the pivot from consideration. Because partitioning n elements takes Θ(n) time, the recurrence for the worst-case running time is the same as for QUICKSORT: T(n) = T(n – 1) + Θ(n), with the solution $T(n) = \Theta(n^2)$. We’ll see that the algorithm has a linear expected running time, however, and because it is randomized, no particular input elicits the worst-case behavior.
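The following Python sketch replays that unlucky scenario deterministically: instead of drawing pivots at random, it always picks the largest remaining element (a hypothetical adversarial stand-in for the worst sequence of random choices) and counts the comparisons made during partitioning. It assumes the same Lomuto-style partition as a concrete choice. Searching for the minimum then costs (n – 1) + (n – 2) + ⋯ + 1 = n(n – 1)/2 comparisons.

```python
def unlucky_select_comparisons(values, i):
    """Count partitioning comparisons when each pivot is the largest remaining element."""
    a = list(values)
    p, r = 0, len(a) - 1
    comparisons = 0
    while p < r:
        # Adversarial pivot choice: index of the largest element in a[p..r].
        s = max(range(p, r + 1), key=lambda t: a[t])
        a[s], a[r] = a[r], a[s]
        pivot, boundary = a[r], p - 1
        for j in range(p, r):
            comparisons += 1                 # compare a[j] with the pivot
            if a[j] <= pivot:
                boundary += 1
                a[boundary], a[j] = a[j], a[boundary]
        q = boundary + 1
        a[q], a[r] = a[r], a[q]
        k = q - p + 1
        if i == k:
            break
        elif i < k:
            r = q - 1                        # only the pivot was discarded
        else:
            p, i = q + 1, i - k
    return comparisons
```

With 10 elements and i = 1 the count is 45, and with 100 elements it is 4950, matching n(n – 1)/2.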
To see the intuition behind the linear expected running time, suppose
that each time the algorithm randomly selects a pivot element, the pivot
lies somewhere within the second and third quartiles—the “middle
half”—of the remaining elements in sorted order. If the i th smallest element is less than the pivot, then all the elements greater than the
pivot are ignored in all future recursive calls. These ignored elements
include at least the uppermost quartile, and possibly more. Likewise, if
the i th smallest element is greater than the pivot, then all the elements
less than the pivot—at least the first quartile—are ignored in all future
recursive calls. Either way, therefore, at least 1/4 of the remaining elements are ignored in all future recursive calls, leaving at most 3/4 of the remaining elements in play, that is, residing in the subarray A[p : r]. Since RANDOMIZED-PARTITION takes Θ(n) time on a subarray of n elements, the recurrence for the worst-case running time under this assumption is T(n) = T(3n/4) + Θ(n). By case 3 of the master method (Theorem 4.1 on page 102), this recurrence has solution T(n) = Θ(n).
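The same bound can be seen by unrolling the recurrence directly. As a sketch that ignores floors and the base case, suppose the Θ(n) term is at most cn for some constant c > 0. Then the per-level costs form a decreasing geometric series:

```latex
T(n) \le cn + c\left(\tfrac{3}{4}\right)n + c\left(\tfrac{3}{4}\right)^2 n + \cdots
     \le cn \sum_{j=0}^{\infty} \left(\tfrac{3}{4}\right)^j
     = \frac{cn}{1 - 3/4}
     = 4cn ,
```

so T(n) = O(n), and the Θ(n) cost of the first partitioning gives the matching lower bound.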
Of course, the pivot does not necessarily fall into the middle half
every time. Since the pivot is selected at random, the probability that it
falls into the middle half is about 1/2 each time. We can view the process
of selecting the pivot as a Bernoulli trial (see Section C.4) with success equating to the pivot residing in the middle half. Thus the expected
number of trials needed for success is given by a geometric distribution:
just two trials on average (equation (C.36) on page 1197). In other words, we expect that half of the partitionings shrink the number of elements still in play to at most 3/4 of its previous value and that half of the partitionings do not help as much. Consequently, the expected number of partitionings
at most doubles from the case when the pivot always falls into the
middle half. The cost of each extra partitioning is less than the one that
preceded it, so that the expected running time is still Θ( n).
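As a worked check of the “two trials on average” figure: if each trial succeeds independently with probability p, the number X of trials up to and including the first success is geometric, and using the series identity Σ k x^(k–1) = 1/(1 – x)² for |x| < 1,

```latex
\mathrm{E}[X] = \sum_{k=1}^{\infty} k\,(1-p)^{k-1} p
             = \frac{p}{\bigl(1-(1-p)\bigr)^2}
             = \frac{1}{p}
             \le 2 \quad \text{whenever } p \ge \tfrac{1}{2}.
```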
To make the above argument rigorous, we start by defining the
random variable A( j) as the set of elements of A that are still in play
after j partitionings (that is, within the subarray A[ p : r] after j calls of RANDOMIZED-SELECT), so that A(0) consists of all the elements in
A. Since each partitioning removes at least one element—the pivot—
from being in play, the sequence | A(0)|, | A(1)|, | A(2)|, … strictly decreases. Set A( j–1) is in play before the j th partitioning, and set A( j) remains in play afterward. For convenience, assume that the initial set
A(0) is the result of a 0th “dummy” partitioning.
Let’s call the j th partitioning helpful if | A( j)| ≤ (3/4)| A( j–1)|. Figure
9.1 shows the sets A( j) and whether partitionings are helpful for an
example array. A helpful partitioning corresponds to a successful
Bernoulli trial. The following lemma shows that a partitioning is at least
as likely to be helpful as not.
Lemma 9.1
A partitioning is helpful with probability at least 1/2.
Proof Whether a partitioning is helpful depends on the randomly
chosen pivot. We discussed the “middle half” in the informal argument
above. Let’s more precisely define the middle half of an n-element
subarray as all but the smallest ⌈ n/4⌉ – 1 and greatest ⌈ n/4⌉ – 1 elements (that is, all but the first ⌈ n/4⌉ – 1 and last ⌈ n/4⌉ – 1 elements if the subarray were sorted). We’ll prove that if the pivot falls into the middle
half, then the pivot leads to a helpful partitioning, and we’ll also prove
that the probability of the pivot falling into the middle half is at least
1/2.
Regardless of where the pivot falls, either all the elements greater than it or all the elements less than it, along with the pivot itself, will no longer be in play after partitioning. If the pivot falls into the middle half, therefore, at least ⌈n/4⌉ – 1 elements less than the pivot or ⌈n/4⌉ – 1 elements greater than the pivot, plus the pivot, will no longer be in play after partitioning. That is, at least ⌈n/4⌉ elements will no longer be in play. The number of elements remaining in play will be at most n – ⌈n/4⌉, which equals ⌊3n/4⌋ by Exercise 3.3-2 on page 70. Since ⌊3n/4⌋ ≤ 3n/4, the partitioning is helpful.


To determine a lower bound on the probability that a randomly
chosen pivot falls into the middle half, we determine an upper bound on
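the probability that it does not. The pivot misses the middle half only if it is one of the 2(⌈n/4⌉ – 1) excluded elements, and since ⌈n/4⌉ ≤ n/4 + 1, that probability is

$$\frac{2(\lceil n/4\rceil - 1)}{n} \le \frac{2((n/4 + 1) - 1)}{n} = \frac{n/2}{n} = \frac{1}{2}.$$

Thus, the pivot has a probability of at least 1/2 of falling into the middle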
half, and so the probability is at least 1/2 that a partitioning is helpful.
▪
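Both claims in the proof of Lemma 9.1 are easy to spot-check numerically. The sketch below (function names are illustrative) computes the 1-based ranks of the middle half, verifies that it holds at least half of the n elements, and verifies that every pivot rank inside it leaves at most 3n/4 elements in play on whichever side survives. This is a finite check for small n, not a substitute for the proof.

```python
def middle_half_bounds(n):
    """1-based ranks [lo, hi] of the middle half of an n-element set."""
    drop = -(-n // 4) - 1            # ceil(n/4) - 1 elements excluded at each end
    return drop + 1, n - drop


def lemma_9_1_holds(n):
    lo, hi = middle_half_bounds(n)
    # Claim 1: the middle half contains at least n/2 elements.
    prob_ok = 2 * (hi - lo + 1) >= n
    # Claim 2: a pivot of rank `rank` leaves rank - 1 elements (low side) or
    # n - rank elements (high side) in play; either must be at most 3n/4.
    helpful_ok = all(4 * max(rank - 1, n - rank) <= 3 * n
                     for rank in range(lo, hi + 1))
    return prob_ok and helpful_ok
```

For n = 5, the middle half consists of ranks 2 through 4, and the worst such pivot leaves ⌊15/4⌋ = 3 elements in play.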
We can now bound the expected running time of RANDOMIZED-
SELECT.
Theorem 9.2
The procedure RANDOMIZED-SELECT on an input array of n
distinct elements has an expected running time of Θ( n).
Proof Since not every partitioning is necessarily helpful, let’s give each partitioning an index starting at 0 and denote by ⟨h_0, h_1, h_2, …, h_m⟩ the sequence of partitionings that are helpful, so that the h_kth partitioning is helpful for k = 0, 1, 2, …, m. Although the number m of helpful partitionings is a random variable, we can bound it, since after at most ⌈log_{4/3} n⌉ helpful partitionings, only one element remains in play. Consider the dummy 0th partitioning as helpful, so that h_0 = 0. Denote $|A^{(h_k)}|$ by n_k, where $n_0 = |A^{(0)}|$ is the original problem size. Since the h_kth partitioning is helpful and the sizes of the sets $A^{(j)}$ strictly decrease, we have

$$n_k = |A^{(h_k)}| \le \tfrac{3}{4}\,|A^{(h_k - 1)}| \le \tfrac{3}{4}\,|A^{(h_{k-1})}| = \tfrac{3}{4}\,n_{k-1}$$

for k = 1, 2, …, m. By iterating n_k ≤ (3/4)n_{k–1}, we have that $n_k \le (3/4)^k n_0$ for k = 0, 1, 2, …, m.
Figure 9.2 The sets within each generation in the proof of Theorem 9.2. Vertical lines represent the sets, with the height of each line indicating the size of the set, which equals the number of elements in play. Each generation starts with a set $A^{(h_k)}$, which is the result of a helpful partitioning. These sets are drawn in black and are at most 3/4 the size of the sets to their immediate left. Sets drawn in orange are not the first within a generation. A generation may contain just one set. The sets in generation k are $A^{(h_k)}, A^{(h_k+1)}, \ldots, A^{(h_{k+1}-1)}$. The sets $A^{(h_k)}$ satisfy $|A^{(h_k)}| = n_k \le (3/4)^k n_0$. If the partitioning gets all the way to generation m, set $A^{(h_m)}$ has at most one element in play.
As Figure 9.2 depicts, we break up the sequence of sets $A^{(j)}$ into m generations consisting of consecutively partitioned sets, starting with the result $A^{(h_k)}$ of a helpful partitioning and ending with the last set $A^{(h_{k+1}-1)}$ before the next helpful partitioning, so that the sets in generation k are $A^{(h_k)}, A^{(h_k+1)}, \ldots, A^{(h_{k+1}-1)}$. Because set sizes strictly decrease, for each set of elements $A^{(j)}$ in the kth generation, we have that $|A^{(j)}| \le |A^{(h_k)}| = n_k \le (3/4)^k n_0$.
Next, we define the random variable

$$X_k = h_{k+1} - h_k$$

for k = 0, 1, 2, …, m – 1. That is, X_k is the number of sets in the kth generation, so that the sets in the kth generation are $A^{(h_k)}, A^{(h_k+1)}, \ldots, A^{(h_k+X_k-1)}$.
By Lemma 9.1, the probability that a partitioning is helpful is at
least 1/2. The probability is actually even higher, since a partitioning is helpful even if the pivot does not fall into the middle half but the ith smallest element happens to lie in the smaller side of the partitioning. We’ll just use the lower bound of 1/2, however, and then equation (C.36) gives that E[X_k] ≤ 2 for k = 0, 1, 2, …, m – 1.
Let’s derive an upper bound on how many comparisons are made
altogether during partitioning, since the running time is dominated by
the comparisons. Since we are calculating an upper bound, assume that
the recursion goes all the way until only one element remains in play.
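The jth partitioning takes the set $A^{(j-1)}$ of elements in play, and it compares the randomly chosen pivot with all the other $|A^{(j-1)}| - 1$ elements, so that the jth partitioning makes fewer than $|A^{(j-1)}|$ comparisons. The sets in the kth generation have sizes $|A^{(h_k)}|, |A^{(h_k+1)}|, \ldots, |A^{(h_k+X_k-1)}|$. Thus, the total number of comparisons during partitioning is less than

$$\sum_{k=0}^{m-1} \sum_{j=h_k}^{h_k+X_k-1} |A^{(j)}| \le \sum_{k=0}^{m-1} \sum_{j=h_k}^{h_k+X_k-1} \left(\frac{3}{4}\right)^k n_0 = \sum_{k=0}^{m-1} X_k \left(\frac{3}{4}\right)^k n_0 .$$

Since E[X_k] ≤ 2, we have that the expected total number of comparisons during partitioning is less than

$$\mathrm{E}\left[\sum_{k=0}^{m-1} X_k \left(\frac{3}{4}\right)^k n_0\right] = n_0 \sum_{k=0}^{m-1} \left(\frac{3}{4}\right)^k \mathrm{E}[X_k] \le 2n_0 \sum_{k=0}^{m-1} \left(\frac{3}{4}\right)^k < 2n_0 \sum_{k=0}^{\infty} \left(\frac{3}{4}\right)^k = 8n_0 .$$

Since n_0 is the size of the original array A, we conclude that the expected number of comparisons, and thus the expected running time, for RANDOMIZED-SELECT is O(n). All n elements are examined in the first call of RANDOMIZED-PARTITION, giving a lower bound of Ω(n). Hence the expected running time is Θ(n).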
▪
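The 8n_0 bound can be compared with an experiment. The Python sketch below counts the comparisons made during partitioning by randomized selection, again assuming a Lomuto-style partition with a uniformly random pivot in place of RANDOMIZED-PARTITION (the name `select_comparisons` is illustrative). Each run makes at least n – 1 comparisons in its first partitioning, and averaging many seeded runs should land well under 8n; the experiment illustrates the theorem but does not prove it.

```python
import random


def select_comparisons(values, i, rng):
    """Select the ith smallest (1-based) of values by randomized partitioning.

    Returns (answer, number of element-versus-pivot comparisons).
    """
    a = list(values)
    p, r = 0, len(a) - 1
    comparisons = 0
    while p < r:
        s = rng.randint(p, r)                 # uniformly random pivot index
        a[s], a[r] = a[r], a[s]
        pivot, boundary = a[r], p - 1
        for j in range(p, r):
            comparisons += 1                  # compare a[j] with the pivot
            if a[j] <= pivot:
                boundary += 1
                a[boundary], a[j] = a[j], a[boundary]
        q = boundary + 1
        a[q], a[r] = a[r], a[q]
        k = q - p + 1
        if i == k:
            return a[q], comparisons
        if i < k:
            r = q - 1
        else:
            p, i = q + 1, i - k
    return a[p], comparisons
```

A typical experiment draws `data = rng.sample(range(10 * n), n)` from a seeded `random.Random`, runs `select_comparisons(data, n // 2, rng)` a few dozen times, and divides the average count by n; the proof guarantees that the expected ratio is below 8.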