procedure PARTITION produces two subproblems with total size n – 1,

we obtain the recurrence

T(n) = max {T(q) + T(n – 1 – q) : 0 ≤ q ≤ n – 1} + Θ(n).   (7.1)

We guess that T(n) ≤ cn² for some constant c > 0. Substituting this guess into recurrence (7.1) yields

T(n) ≤ max {cq² + c(n – 1 – q)² : 0 ≤ q ≤ n – 1} + Θ(n)

= c · max {q² + (n – 1 – q)² : 0 ≤ q ≤ n – 1} + Θ(n).

Let’s focus our attention on the maximization. For q = 0, 1, …, n – 1, we have

q² + (n – 1 – q)² = q² + (n – 1)² – 2q(n – 1) + q²

= (n – 1)² + 2q(q – (n – 1))

≤ (n – 1)²

because q ≤ n – 1 implies that 2q(q – (n – 1)) ≤ 0. Thus every term in the maximization is bounded by (n – 1)².
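As a quick numerical sanity check (our own addition, not part of the text's argument), the following Python snippet confirms that q² + (n – 1 – q)² is maximized at the endpoints q = 0 and q = n – 1, where it equals (n – 1)²; this is also the content of Exercise 7.4-3.

```python
def max_term(n):
    """Return (max value, argmax q) of q**2 + (n - 1 - q)**2 over q = 0, ..., n - 1."""
    values = {q: q**2 + (n - 1 - q)**2 for q in range(n)}
    q_star = max(values, key=values.get)   # first q attaining the maximum
    return values[q_star], q_star

# The maximum always equals (n - 1)**2 and is achieved at an endpoint.
for n in range(2, 50):
    best, q = max_term(n)
    assert best == (n - 1)**2
    assert q in (0, n - 1)
```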

Continuing with our analysis of T(n), we obtain

T(n) ≤ c(n – 1)² + Θ(n)

= cn² – c(2n – 1) + Θ(n)

≤ cn²,

by picking the constant c large enough that the c(2n – 1) term dominates the Θ(n) term. Thus T(n) = O(n²). Section 7.2 showed a specific case where quicksort takes Ω(n²) time: when partitioning is maximally unbalanced. Thus, the worst-case running time of quicksort is Θ(n²).

7.4.2 Expected running time

We have already seen the intuition behind why the expected running

time of RANDOMIZED-QUICKSORT is O( n lg n): if, in each level of recursion, the split induced by RANDOMIZED-PARTITION puts any

constant fraction of the elements on one side of the partition, then the

recursion tree has depth Θ(lg n) and O( n) work is performed at each level. Even if we add a few new levels with the most unbalanced split

possible between these levels, the total time remains O( n lg n). We can analyze the expected running time of RANDOMIZED-QUICKSORT

precisely by first understanding how the partitioning procedure operates

and then using this understanding to derive an O( n lg n) bound on the expected running time. This upper bound on the expected running time,

combined with the Θ( n lg n) best-case bound we saw in Section 7.2,

yields a Θ( n lg n) expected running time. We assume throughout that the values of the elements being sorted are distinct.

Running time and comparisons

The QUICKSORT and RANDOMIZED-QUICKSORT procedures

differ only in how they select pivot elements. They are the same in all

other respects. We can therefore analyze RANDOMIZED-

QUICKSORT by considering the QUICKSORT and PARTITION

procedures, but with the assumption that pivot elements are selected

randomly from the subarray passed to RANDOMIZED-PARTITION.

Let’s start by relating the asymptotic running time of QUICKSORT to

the number of times elements are compared (all in line 4 of

PARTITION), understanding that this analysis also applies to

RANDOMIZED-QUICKSORT. Note that we are counting the

number of times that array elements are compared, not comparisons of

indices.

Lemma 7.1

The running time of QUICKSORT on an n-element array is O( n + X), where X is the number of element comparisons performed.

Proof The running time of QUICKSORT is dominated by the time

spent in the PARTITION procedure. Each time PARTITION is called,

it selects a pivot element, which is never included in any future recursive

calls to QUICKSORT and PARTITION. Thus, there can be at most n

calls to PARTITION over the entire execution of the quicksort

algorithm. Each time QUICKSORT calls PARTITION, it also

recursively calls itself twice, so there are at most 2n calls to the QUICKSORT procedure itself.

One call to PARTITION takes O(1) time plus an amount of time

that is proportional to the number of iterations of the for loop in lines

3–6. Each iteration of this for loop performs one comparison in line 4,

comparing the pivot element to another element of the array A.

Therefore, the total time spent in the for loop across all executions is

proportional to X. Since there are at most n calls to PARTITION and

the time spent outside the for loop is O(1) for each call, the total time

spent in PARTITION outside of the for loop is O( n). Thus the total time for quicksort is O( n + X).
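To make the count X concrete, here is a Python sketch (ours; the function names are our own) of quicksort with a Lomuto-style PARTITION instrumented to count the element comparisons of line 4. Lemma 7.1 says the running time is proportional to n plus this count.

```python
def quicksort_counting(A):
    """Sort A in place and return X, the number of element comparisons
    performed in the partitioning step (line 4 of PARTITION in the text)."""
    comparisons = [0]

    def partition(p, r):
        x = A[r]                      # pivot
        i = p - 1
        for j in range(p, r):
            comparisons[0] += 1       # the one element comparison per iteration
            if A[j] <= x:
                i += 1
                A[i], A[j] = A[j], A[i]
        A[i + 1], A[r] = A[r], A[i + 1]
        return i + 1

    def sort(p, r):
        if p < r:
            q = partition(p, r)
            sort(p, q - 1)
            sort(q + 1, r)

    sort(0, len(A) - 1)
    return comparisons[0]
```

On a reverse-sorted array of n elements, every partition is maximally unbalanced, so the count is n(n – 1)/2, matching the Θ(n²) worst case.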

Our goal for analyzing RANDOMIZED-QUICKSORT, therefore,

is to compute the expected value E [ X] of the random variable X

denoting the total number of comparisons performed in all calls to

PARTITION. To do so, we must understand when the quicksort

algorithm compares two elements of the array and when it does not. For

ease of analysis, let’s index the elements of the array A by their position

in the sorted output, rather than their position in the input. That is,

although the elements in A may start out in any order, we’ll refer to them by z₁, z₂, …, zₙ, where z₁ < z₂ < ⋯ < zₙ, with strict inequality because we assume that all elements are distinct. We denote the set {zᵢ, zᵢ₊₁, …, zⱼ} by Zᵢⱼ.

The next lemma characterizes when two elements are compared.

Lemma 7.2

During the execution of RANDOMIZED-QUICKSORT on an array

of n distinct elements z₁ < z₂ < ⋯ < zₙ, an element zᵢ is compared with an element zⱼ, where i < j, if and only if one of them is chosen as a pivot before any other element in the set Zᵢⱼ. Moreover, no two elements are ever compared twice.

Proof Let’s look at the first time that an element x ∈ Zᵢⱼ is chosen as a pivot during the execution of the algorithm. There are three cases to consider. If x is neither zᵢ nor zⱼ—that is, zᵢ < x < zⱼ—then zᵢ and zⱼ are not compared at any subsequent time, because they fall into different sides of the partition around x. If x = zᵢ, then PARTITION compares zᵢ with every other item in Zᵢⱼ. Similarly, if x = zⱼ, then PARTITION compares zⱼ with every other item in Zᵢⱼ. Thus, zᵢ and zⱼ are compared if and only if the first element to be chosen as a pivot from Zᵢⱼ is either zᵢ or zⱼ. In the latter two cases, where one of zᵢ and zⱼ is chosen as a pivot, since the pivot is removed from future comparisons, it is never compared again with the other element.

As an example of this lemma, consider an input to quicksort of the

numbers 1 through 10 in some arbitrary order. Suppose that the first

pivot element is 7. Then the first call to PARTITION separates the

numbers into two sets: {1, 2, 3, 4, 5, 6} and {8, 9, 10}. In the process,

the pivot element 7 is compared with all other elements, but no number

from the first set (e.g., 2) is or ever will be compared with any number

from the second set (e.g., 9). The values 7 and 9 are compared because 7

is the first item from Z₇,₉ to be chosen as a pivot. In contrast, 2 and 9 are never compared because the first pivot element chosen from Z₂,₉ is 7. The next lemma gives the probability that two elements are compared.

Lemma 7.3


Consider an execution of the procedure RANDOMIZED-

QUICKSORT on an array of n distinct elements z₁ < z₂ < ⋯ < zₙ. Given two arbitrary elements zᵢ and zⱼ where i < j, the probability that they are compared is 2/(j – i + 1).

Proof Let’s look at the tree of recursive calls that RANDOMIZED-

QUICKSORT makes, and consider the sets of elements provided as

input to each call. Initially, the root set contains all the elements of Zᵢⱼ, since the root set contains every element in A. The elements belonging to Zᵢⱼ all stay together for each recursive call of RANDOMIZED-QUICKSORT until PARTITION chooses some element x ∈ Zᵢⱼ as a pivot. From that point on, the pivot x appears in no subsequent input set. The first time that RANDOMIZED-PARTITION chooses a pivot x ∈ Zᵢⱼ from a set containing all the elements of Zᵢⱼ, each element in Zᵢⱼ is equally likely to be x because the pivot is chosen uniformly at random. Since |Zᵢⱼ| = j – i + 1, the probability is 1/(j – i + 1) that any given element in Zᵢⱼ is the first pivot chosen from Zᵢⱼ. Thus, by Lemma 7.2, we have

Pr {zᵢ is compared with zⱼ} = Pr {zᵢ or zⱼ is the first pivot chosen from Zᵢⱼ}

= Pr {zᵢ is the first pivot chosen from Zᵢⱼ} + Pr {zⱼ is the first pivot chosen from Zᵢⱼ}

= 1/(j – i + 1) + 1/(j – i + 1)

= 2/(j – i + 1),

where the second equality follows from the first because the two events are mutually exclusive.

We can now complete the analysis of randomized quicksort.

Theorem 7.4


The expected running time of RANDOMIZED-QUICKSORT on an

input of n distinct elements is O( n lg n).

Proof The analysis uses indicator random variables (see Section 5.2).

Let the n distinct elements be z₁ < z₂ < ⋯ < zₙ, and for 1 ≤ i < j ≤ n, define the indicator random variable Xᵢⱼ = I {zᵢ is compared with zⱼ}. From Lemma 7.2, each pair is compared at most once, and so we can express X as follows:

X = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Xᵢⱼ.

By taking expectations of both sides and using linearity of expectation (equation (C.24) on page 1192) and Lemma 5.1 on page 130, we obtain

E[X] = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} Pr {zᵢ is compared with zⱼ} = Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} 2/(j – i + 1).

We can evaluate this sum using a change of variables (k = j – i) and the bound on the harmonic series in equation (A.9) on page 1142:

E[X] = Σ_{i=1}^{n-1} Σ_{k=1}^{n-i} 2/(k + 1)

< Σ_{i=1}^{n-1} Σ_{k=1}^{n} 2/k

= Σ_{i=1}^{n-1} O(lg n)

= O(n lg n).

This bound and Lemma 7.1 allow us to conclude that the expected

running time of RANDOMIZED-QUICKSORT is O( n lg n) (assuming

that the element values are distinct).
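As a cross-check of the proof (our own illustration), the double sum Σ_{i=1}^{n-1} Σ_{j=i+1}^{n} 2/(j – i + 1) can be evaluated exactly and compared against the harmonic-series bound 2n·Hₙ:

```python
def expected_comparisons(n):
    """E[X] for randomized quicksort on n distinct elements, from Lemma 7.3:
    the sum of 2/(j - i + 1) over all pairs i < j.  Grouping the pairs by
    the distance k = j - i gives n - k pairs at each distance k."""
    return sum((n - k) * 2.0 / (k + 1) for k in range(1, n))

# The sum is O(n lg n): it is bounded above by 2n times the harmonic number H_n.
for n in (10, 100, 1000):
    harmonic = sum(1.0 / k for k in range(1, n + 1))
    assert expected_comparisons(n) < 2 * n * harmonic
```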

Exercises

7.4-1

Show that the recurrence

T(n) = max {T(q) + T(n – q – 1) : 0 ≤ q ≤ n – 1} + Θ(n)

has a lower bound of T(n) = Ω(n²).

7.4-2

Show that quicksort’s best-case running time is Ω( n lg n).

7.4-3

Show that the expression q² + (n – q – 1)² achieves its maximum value over q = 0, 1, …, n – 1 when q = 0 or q = n – 1.

7.4-4

Show that RANDOMIZED-QUICKSORT’s expected running time is

Ω( n lg n).

7.4-5

Coarsening the recursion, as we did in Problem 2-1 for merge sort, is a common way to improve the running time of quicksort in practice. We

modify the base case of the recursion so that if the array has fewer than

k elements, the subarray is sorted by insertion sort, rather than by continued recursive calls to quicksort. Argue that the randomized

version of this sorting algorithm runs in O(nk + n lg(n/k)) expected time.

How should you pick k, both in theory and in practice?

7.4-6

Consider modifying the PARTITION procedure by randomly picking

three elements from subarray A[ p : r] and partitioning about their median (the middle value of the three elements). Approximate the

probability of getting worse than an α-to-(1– α) split, as a function of α

in the range 0 < α < 1/2.

Problems

7-1 Hoare partition correctness

The version of PARTITION given in this chapter is not the original

partitioning algorithm. Here is the original partitioning algorithm,

which is due to C. A. R. Hoare.

HOARE-PARTITION(A, p, r)
1   x = A[p]
2   i = p – 1
3   j = r + 1
4   while TRUE
5       repeat
6           j = j – 1
7       until A[j] ≤ x
8       repeat
9           i = i + 1
10      until A[i] ≥ x
11      if i < j
12          exchange A[i] with A[j]
13      else return j
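Here is a 0-indexed Python transcription of HOARE-PARTITION (our own sketch, not part of the problem), handy for experimenting with the trace requested in part (a):

```python
def hoare_partition(A, p, r):
    """Hoare's original partitioning scheme, transcribed from the pseudocode
    (0-indexed).  Returns the split index j with p <= j < r."""
    x = A[p]                 # the pivot is the first element
    i = p - 1
    j = r + 1
    while True:
        j -= 1
        while A[j] > x:      # repeat j = j - 1 until A[j] <= x
            j -= 1
        i += 1
        while A[i] < x:      # repeat i = i + 1 until A[i] >= x
            i += 1
        if i < j:
            A[i], A[j] = A[j], A[i]
        else:
            return j

# The array from part (a) of this problem.
A = [13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21]
j = hoare_partition(A, 0, len(A) - 1)
# Every element of A[p : j] is <= every element of A[j + 1 : r] (part (e)).
assert max(A[:j + 1]) <= min(A[j + 1:])
```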

a. Demonstrate the operation of HOARE-PARTITION on the array A

= 〈13, 19, 9, 5, 12, 8, 7, 4, 11, 2, 6, 21〉, showing the values of the array

and the indices i and j after each iteration of the while loop in lines 4–13.

b. Describe how the PARTITION procedure in Section 7.1 differs from HOARE-PARTITION when all elements in A[ p : r] are equal.

Describe a practical advantage of HOARE-PARTITION over

PARTITION for use in quicksort.

The next three questions ask you to give a careful argument that the procedure HOARE-PARTITION is correct. Assuming that the subarray A[p : r] contains at least two elements, prove the following:

c. The indices i and j are such that the procedure never accesses an element of A outside the subarray A[p : r].

d. When HOARE-PARTITION terminates, it returns a value j such that p ≤ j < r.

e. Every element of A[p : j] is less than or equal to every element of A[j + 1 : r] when HOARE-PARTITION terminates.

The PARTITION procedure in Section 7.1 separates the pivot value (originally in A[r]) from the two partitions it forms. The HOARE-PARTITION procedure, on the other hand, always places the pivot value (originally in A[p]) into one of the two partitions A[p : j] and A[j + 1 : r]. Since p ≤ j < r, neither partition is empty.

f. Rewrite the QUICKSORT procedure to use HOARE-PARTITION.

7-2 Quicksort with equal element values

The analysis of the expected running time of randomized quicksort in

Section 7.4.2 assumes that all element values are distinct. This problem examines what happens when they are not.

a. Suppose that all element values are equal. What is randomized quicksort’s running time in this case?

b. The PARTITION procedure returns an index q such that each element of A[p : q – 1] is less than or equal to A[q] and each element of A[q + 1 : r] is greater than A[q]. Modify the PARTITION procedure to produce a procedure PARTITION′(A, p, r), which permutes the elements of A[p : r] and returns two indices q and t, where p ≤ q ≤ t ≤ r, such that

all elements of A[q : t] are equal,

each element of A[p : q – 1] is less than A[q], and

each element of A[t + 1 : r] is greater than A[q].

Like PARTITION, your PARTITION′ procedure should take Θ(r – p) time.

c. Modify the RANDOMIZED-PARTITION procedure to call

PARTITION′, and name the new procedure RANDOMIZED-

PARTITION′. Then modify the QUICKSORT procedure to produce

a procedure QUICKSORT′ ( A, p, r) that calls RANDOMIZED-

PARTITION′ and recurses only on partitions where elements are not

known to be equal to each other.

d. Using QUICKSORT′, adjust the analysis in Section 7.4.2 to avoid the assumption that all elements are distinct.

7-3 Alternative quicksort analysis

An alternative analysis of the running time of randomized quicksort

focuses on the expected running time of each individual recursive call to

RANDOMIZED-QUICKSORT, rather than on the number of

comparisons performed. As in the analysis of Section 7.4.2, assume that the values of the elements are distinct.

a. Argue that, given an array of size n, the probability that any

particular element is chosen as the pivot is 1/ n. Use this probability to

define indicator random variables Xᵢ = I {the ith smallest element is chosen as the pivot}. What is E[Xᵢ]?


b. Let T(n) be a random variable denoting the running time of quicksort on an array of size n. Argue that

E[T(n)] = E[ Σ_{q=1}^{n} X_q (T(q – 1) + T(n – q) + Θ(n)) ].   (7.2)

c. Show how to rewrite equation (7.2) as

E[T(n)] = (2/n) Σ_{q=2}^{n-1} E[T(q)] + Θ(n).   (7.3)

d. Show that

Σ_{q=1}^{n-1} q lg q ≤ (1/2) n² lg n – (1/8) n²   (7.4)

for n ≥ 2. (Hint: Split the summation into two parts, one summation for q = 1, 2, …, ⌈n/2⌉ – 1 and one summation for q = ⌈n/2⌉, …, n – 1.)

e. Using the bound from equation (7.4), show that the recurrence in

equation (7.3) has the solution E [ T ( n)] = O( n lg n). ( Hint: Show, by substitution, that E [ T ( n)] ≤ an lg n for sufficiently large n and for some positive constant a.)

7-4 Stooge sort

Professors Howard, Fine, and Howard have proposed a deceptively

simple sorting algorithm, named stooge sort in their honor, appearing

on the following page.

a. Argue that the call STOOGE-SORT( A, 1, n) correctly sorts the array A[1 : n].

b. Give a recurrence for the worst-case running time of STOOGE-SORT

and a tight asymptotic (Θ-notation) bound on the worst-case running

time.

c. Compare the worst-case running time of STOOGE-SORT with that of insertion sort, merge sort, heapsort, and quicksort. Do the

professors deserve tenure?

STOOGE-SORT(A, p, r)
1   if A[p] > A[r]
2       exchange A[p] with A[r]
3   if p + 1 < r
4       k = ⌊(r – p + 1)/3⌋    // round down
5       STOOGE-SORT(A, p, r – k)    // first two-thirds
6       STOOGE-SORT(A, p + k, r)    // last two-thirds
7       STOOGE-SORT(A, p, r – k)    // first two-thirds again
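For experimentation before tackling parts (a)–(c), here is a direct Python transcription (ours) of the pseudocode:

```python
def stooge_sort(A, p, r):
    """Sort A[p : r] in place (inclusive bounds, as in the pseudocode)."""
    if A[p] > A[r]:
        A[p], A[r] = A[r], A[p]
    if p + 1 < r:
        k = (r - p + 1) // 3          # round down
        stooge_sort(A, p, r - k)      # first two-thirds
        stooge_sort(A, p + k, r)      # last two-thirds
        stooge_sort(A, p, r - k)      # first two-thirds again

A = [5, 2, 4, 6, 1, 3]
stooge_sort(A, 0, len(A) - 1)
assert A == [1, 2, 3, 4, 5, 6]
```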

7-5 Stack depth for quicksort

The QUICKSORT procedure of Section 7.1 makes two recursive calls

to itself. After QUICKSORT calls PARTITION, it recursively sorts the

low side of the partition and then it recursively sorts the high side of the

partition. The second recursive call in QUICKSORT is not really

necessary, because the procedure can instead use an iterative control

structure. This transformation technique, called tail-recursion

elimination, is provided automatically by good compilers. Applying tail-

recursion elimination transforms QUICKSORT into the TRE-

QUICKSORT procedure.

TRE-QUICKSORT(A, p, r)
1   while p < r
2       // Partition and then sort the low side.
3       q = PARTITION(A, p, r)
4       TRE-QUICKSORT(A, p, q – 1)
5       p = q + 1
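A runnable Python sketch of TRE-QUICKSORT (ours), using the Lomuto PARTITION of Section 7.1 transcribed to 0-indexed arrays:

```python
def partition(A, p, r):
    """Lomuto partition around the pivot A[r]; returns the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def tre_quicksort(A, p, r):
    """Quicksort with the second recursive call replaced by iteration."""
    while p < r:
        # Partition and then sort the low side.
        q = partition(A, p, r)
        tre_quicksort(A, p, q - 1)
        p = q + 1          # loop again on the high side

A = [2, 8, 7, 1, 3, 5, 6, 4]
tre_quicksort(A, 0, len(A) - 1)
assert A == [1, 2, 3, 4, 5, 6, 7, 8]
```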

a. Argue that TRE-QUICKSORT( A, 1, n) correctly sorts the array A[1 : n].

Compilers usually execute recursive procedures by using a stack that

contains pertinent information, including the parameter values, for each

recursive call. The information for the most recent call is at the top of

the stack, and the information for the initial call is at the bottom. When

a procedure is called, its information is pushed onto the stack, and when

it terminates, its information is popped. Since we assume that array

parameters are represented by pointers, the information for each

procedure call on the stack requires O(1) stack space. The stack depth is the maximum amount of stack space used at any time during a

computation.

b. Describe a scenario in which TRE-QUICKSORT’s stack depth is

Θ( n) on an n-element input array.

c. Modify TRE-QUICKSORT so that the worst-case stack depth is Θ(lg

n). Maintain the O( n lg n) expected running time of the algorithm.

7-6 Median-of-3 partition

One way to improve the RANDOMIZED-QUICKSORT procedure is

to partition around a pivot that is chosen more carefully than by

picking a random element from the subarray. A common approach is

the median-of-3 method: choose the pivot as the median (middle

element) of a set of 3 elements randomly selected from the subarray.

(See Exercise 7.4-6.) For this problem, assume that the n elements in the input subarray A[p : r] are distinct and that n ≥ 3. Denote the sorted version of A[p : r] by z₁, z₂, …, zₙ. Using the median-of-3 method to choose the pivot element x, define pᵢ = Pr {x = zᵢ}.

a. Give an exact formula for pᵢ as a function of n and i for i = 2, 3, …, n – 1. (Observe that p₁ = pₙ = 0.)

b. By what amount does the median-of-3 method increase the likelihood of choosing the pivot to be x = z_⌊(n+1)/2⌋, the median of A[p : r],


compared with the ordinary implementation? Assume that n → ∞,

and give the limiting ratio of these probabilities.

c. Suppose that we define a “good” split to mean choosing the pivot as

x = zᵢ, where n/3 ≤ i ≤ 2n/3. By what amount does the median-of-3

method increase the likelihood of getting a good split compared with

the ordinary implementation? ( Hint: Approximate the sum by an

integral.)

d. Argue that in the Ω( n lg n) running time of quicksort, the median-of-3

method affects only the constant factor.

7-7 Fuzzy sorting of intervals

Consider a sorting problem in which you do not know the numbers

exactly. Instead, for each number, you know an interval on the real line

to which it belongs. That is, you are given n closed intervals of the form

[aᵢ, bᵢ], where aᵢ ≤ bᵢ. The goal is to fuzzy-sort these intervals: to produce a permutation 〈i₁, i₂, …, iₙ〉 of the intervals such that for j = 1, 2, …, n, there exist cⱼ ∈ [a_{iⱼ}, b_{iⱼ}] satisfying c₁ ≤ c₂ ≤ ⋯ ≤ cₙ.

a. Design a randomized algorithm for fuzzy-sorting n intervals. Your

algorithm should have the general structure of an algorithm that

quicksorts the left endpoints (the ai values), but it should take

advantage of overlapping intervals to improve the running time. (As

the intervals overlap more and more, the problem of fuzzy-sorting the

intervals becomes progressively easier. Your algorithm should take

advantage of such overlapping, to the extent that it exists.)

b. Argue that your algorithm runs in Θ( n lg n) expected time in general, but runs in Θ( n) expected time when all of the intervals overlap (i.e.,

when there exists a value x such that x ∈ [ ai, bi] for all i). Your algorithm should not be checking for this case explicitly, but rather, its

performance should naturally improve as the amount of overlap

increases.

Chapter notes

Quicksort was invented by Hoare [219], and his version of PARTITION

appears in Problem 7-1. Bentley [51, p. 117] attributes the PARTITION

procedure given in Section 7.1 to N. Lomuto. The analysis in Section 7.4 is based on an analysis due to Motwani and Raghavan [336]. Sedgewick

[401] and Bentley [51] provide good references on the details of implementation and how they matter.

McIlroy [323] shows how to engineer a “killer adversary” that produces an array on which virtually any implementation of quicksort

takes Θ( n 2) time.

1 You can enforce the assumption that the values in an array A are distinct at the cost of Θ( n) additional space and only constant overhead in running time by converting each input value A[ i]

to an ordered pair ( A[ i], i) with ( A[ i], i) < ( A[ j], j) if A[ i] < A[ j] or if A[ i] = A[ j] and i < j. There are also more practical variants of quicksort that work well when elements are not distinct.

8 Sorting in Linear Time

We have now seen a handful of algorithms that can sort n numbers in

O( n lg n) time. Whereas merge sort and heapsort achieve this upper bound in the worst case, quicksort achieves it on average. Moreover, for

each of these algorithms, we can produce a sequence of n input numbers

that causes the algorithm to run in Ω( n lg n) time.

These algorithms share an interesting property: the sorted order they

determine is based only on comparisons between the input elements. We

call such sorting algorithms comparison sorts. All the sorting algorithms

introduced thus far are comparison sorts.

In Section 8.1, we’ll prove that any comparison sort must make Ω( n lg n) comparisons in the worst case to sort n elements. Thus, merge sort and heapsort are asymptotically optimal, and no comparison sort exists

that is faster by more than a constant factor.

Sections 8.2, 8.3, and 8.4 examine three sorting algorithms—

counting sort, radix sort, and bucket sort—that run in linear time on

certain types of inputs. Of course, these algorithms use operations other

than comparisons to determine the sorted order. Consequently, the Ω( n

lg n) lower bound does not apply to them.

8.1 Lower bounds for sorting

A comparison sort uses only comparisons between elements to gain

order information about an input sequence 〈a₁, a₂, …, aₙ〉. That is, given two elements aᵢ and aⱼ, it performs one of the tests aᵢ < aⱼ, aᵢ ≤ aⱼ, aᵢ = aⱼ, aᵢ ≥ aⱼ, or aᵢ > aⱼ to determine their relative order. It may not inspect the values of the elements or gain order information about them in any other way.

Since we are proving a lower bound, we assume without loss of

generality in this section that all the input elements are distinct. After

all, a lower bound for distinct elements applies when elements may or

may not be distinct. Consequently, comparisons of the form ai = aj are

useless, which means that we can assume that no comparisons for exact

equality occur. Moreover, the comparisons aiaj, aiaj, ai > aj, and ai

< aj are all equivalent in that they yield identical information about the

relative order of ai and aj. We therefore assume that all comparisons have the form aiaj.

Figure 8.1 The decision tree for insertion sort operating on three elements. An internal node (shown in blue) annotated by i : j indicates a comparison between aᵢ and aⱼ. A leaf annotated by the permutation 〈π(1), π(2), …, π(n)〉 indicates the ordering a_π(1) ≤ a_π(2) ≤ ⋯ ≤ a_π(n). The highlighted path indicates the decisions made when sorting the input sequence 〈a₁ = 6, a₂ = 8, a₃ = 5〉. Going left from the root node, labeled 1:2, indicates that a₁ ≤ a₂. Going right from the node labeled 2:3 indicates that a₂ > a₃. Going right from the node labeled 1:3 indicates that a₁ > a₃. Therefore, we have the ordering a₃ ≤ a₁ ≤ a₂, as indicated in the leaf labeled 〈3, 1, 2〉. Because the three input elements have 3! = 6 possible permutations, the decision tree must have at least 6 leaves.

The decision-tree model

We can view comparison sorts abstractly in terms of decision trees. A

decision tree is a full binary tree (each node is either a leaf or has both

children) that represents the comparisons between elements that are

performed by a particular sorting algorithm operating on an input of a given size. Control, data movement, and all other aspects of the

algorithm are ignored. Figure 8.1 shows the decision tree corresponding to the insertion sort algorithm from Section 2.1 operating on an input sequence of three elements.

A decision tree has each internal node annotated by i : j for some i and j in the range 1 ≤ i, j ≤ n, where n is the number of elements in the input sequence. We also annotate each leaf by a permutation 〈π(1), π(2), …, π(n)〉. (See Section C.1 for background on permutations.) Indices in the internal nodes and the leaves always refer to the original positions of the array elements at the start of the sorting algorithm. The execution of the comparison sorting algorithm corresponds to tracing a simple path from the root of the decision tree down to a leaf. Each internal node indicates a comparison aᵢ ≤ aⱼ. The left subtree then dictates subsequent comparisons once we know that aᵢ ≤ aⱼ, and the right subtree dictates subsequent comparisons when aᵢ > aⱼ. Arriving at a leaf, the sorting algorithm has established the ordering a_π(1) ≤ a_π(2) ≤ ⋯ ≤ a_π(n).

Because any correct sorting algorithm must be able to produce each

permutation of its input, each of the n! permutations on n elements must appear as at least one of the leaves of the decision tree for a

comparison sort to be correct. Furthermore, each of these leaves must

be reachable from the root by a downward path corresponding to an

actual execution of the comparison sort. (We call such leaves

“reachable.”) Thus, we consider only decision trees in which each

permutation appears as a reachable leaf.

A lower bound for the worst case

The length of the longest simple path from the root of a decision tree to

any of its reachable leaves represents the worst-case number of

comparisons that the corresponding sorting algorithm performs.

Consequently, the worst-case number of comparisons for a given

comparison sort algorithm equals the height of its decision tree. A lower

bound on the heights of all decision trees in which each permutation

appears as a reachable leaf is therefore a lower bound on the running

time of any comparison sort algorithm. The following theorem

establishes such a lower bound.

Theorem 8.1

Any comparison sort algorithm requires Ω( n lg n) comparisons in the worst case.

Proof From the preceding discussion, it suffices to determine the height

of a decision tree in which each permutation appears as a reachable leaf.

Consider a decision tree of height h with l reachable leaves corresponding to a comparison sort on n elements. Because each of the n! permutations of the input appears as one or more leaves, we have n! ≤ l. Since a binary tree of height h has no more than 2ʰ leaves, we have

n! ≤ l ≤ 2ʰ,

which, by taking logarithms, implies

h ≥ lg(n!)   (since the lg function is monotonically increasing)

= Ω(n lg n)   (by equation (3.28) on page 67).
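To get a feel for the bound (our own illustration, not part of the proof), we can compute ⌈lg(n!)⌉, the minimum height of any decision tree with n! reachable leaves, and check that it grows like n lg n:

```python
import math

def min_tree_height(n):
    """Smallest h with 2**h >= n!, i.e. ceil(lg(n!)) -- a lower bound on the
    worst-case comparison count of any comparison sort on n elements."""
    lg_factorial = sum(math.log2(k) for k in range(2, n + 1))
    return math.ceil(lg_factorial)

# For n = 3 there are 3! = 6 permutations, so the tree needs height >= 3,
# matching the decision tree of Figure 8.1.
assert min_tree_height(3) == 3

# The bound grows like n lg n: since n! >= (n/2)**(n/2),
# lg(n!) >= (n/2) lg(n/2).
for n in (8, 64, 512):
    assert min_tree_height(n) >= (n / 2) * math.log2(n / 2)
```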

Corollary 8.2

Heapsort and merge sort are asymptotically optimal comparison sorts.

Proof The O( n lg n) upper bounds on the running times for heapsort and merge sort match the Ω( n lg n) worst-case lower bound from Theorem 8.1.

Exercises

8.1-1

What is the smallest possible depth of a leaf in a decision tree for a

comparison sort?

8.1-2

Obtain asymptotically tight bounds on lg(n!) without using Stirling’s approximation. Instead, evaluate the summation Σ_{k=1}^{n} lg k using techniques from Section A.2.

8.1-3

Show that there is no comparison sort whose running time is linear for

at least half of the n! inputs of length n? What about a fraction of 1/n of the inputs of length n? What about a fraction 1/2ⁿ?

8.1-4

You are given an n-element input sequence, and you know in advance

that it is partly sorted in the following sense. Each element initially in

position i such that i mod 4 = 0 is either already in its correct position, or it is one place away from its correct position. For example, you know

that after sorting, the element initially in position 12 belongs in position

11, 12, or 13. You have no advance information about the other

elements, in positions i where i mod 4 ≠ 0. Show that an Ω( n lg n) lower bound on comparison-based sorting still holds in this case.

8.2 Counting sort

Counting sort assumes that each of the n input elements is an integer in the range 0 to k, for some integer k. It runs in Θ( n + k) time, so that when k = O( n), counting sort runs in Θ( n) time.

Counting sort first determines, for each input element x, the number

of elements less than or equal to x. It then uses this information to place

element x directly into its position in the output array. For example, if

17 elements are less than or equal to x, then x belongs in output position 17. We must modify this scheme slightly to handle the situation

in which several elements have the same value, since we do not want

them all to end up in the same position.

The COUNTING-SORT procedure on the facing page takes as

input an array A[1 : n], the size n of this array, and the limit k on the nonnegative integer values in A. It returns its sorted output in the array

B[1 : n] and uses an array C [0 : k] for temporary working storage.

COUNTING-SORT(A, n, k)
1   let B[1 : n] and C[0 : k] be new arrays
2   for i = 0 to k
3       C[i] = 0
4   for j = 1 to n
5       C[A[j]] = C[A[j]] + 1
6   // C[i] now contains the number of elements equal to i.
7   for i = 1 to k
8       C[i] = C[i] + C[i – 1]
9   // C[i] now contains the number of elements less than or equal to i.
10  // Copy A to B, starting from the end of A.
11  for j = n downto 1
12      B[C[A[j]]] = A[j]
13      C[A[j]] = C[A[j]] – 1    // to handle duplicate values
14  return B

Figure 8.2 illustrates counting sort. After the for loop of lines 2–3

initializes the array C to all zeros, the for loop of lines 4–5 makes a pass

over the array A to inspect each input element. Each time it finds an input element whose value is i, it increments C [ i]. Thus, after line 5, C

[ i] holds the number of input elements equal to i for each integer i = 0, 1, … , k. Lines 7–8 determine for each i = 0, 1, … , k how many input elements are less than or equal to i by keeping a running sum of the array C.

Finally, the for loop of lines 11–13 makes another pass over A, but in

reverse, to place each element A[j] into its correct sorted position in the output array B. If all n elements are distinct, then when line 11 is first entered, for each A[j], the value C[A[j]] is the correct final position of A[j] in the output array, since there are C[A[j]] elements less than or equal to A[j]. Because the elements might not be distinct, the loop decrements C[A[j]] each time it places a value A[j] into B. Decrementing C[A[j]] causes the previous element in A with a value equal to A[j], if one exists, to go to the position immediately before A[j] in the output array B.
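A 0-indexed Python transcription (ours) of COUNTING-SORT; the backward pass in the last loop is what makes the sort stable:

```python
def counting_sort(A, k):
    """Return a stably sorted copy of A, whose elements are ints in 0..k."""
    n = len(A)
    C = [0] * (k + 1)
    for j in range(n):                 # lines 4-5: count occurrences
        C[A[j]] += 1
    for i in range(1, k + 1):          # lines 7-8: running sum, so that
        C[i] += C[i - 1]               # C[i] = number of elements <= i
    B = [0] * n
    for j in range(n - 1, -1, -1):     # lines 11-13: copy from the end of A
        C[A[j]] -= 1                   # pre-decrement: count -> 0-indexed slot
        B[C[A[j]]] = A[j]
    return B

# The input array of Figure 8.2.
assert counting_sort([2, 5, 3, 0, 2, 3, 0, 3], 5) == [0, 0, 2, 2, 3, 3, 3, 5]
```

Because the pseudocode is 1-indexed, the decrement of C[A[j]] moves before the store when translating to 0-indexed arrays.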

How much time does counting sort require? The for loop of lines 2–3

takes Θ( k) time, the for loop of lines 4–5 takes Θ( n) time, the for loop of lines 7–8 takes Θ( k) time, and the for loop of lines 11–13 takes Θ( n) time. Thus, the overall time is Θ( k + n). In practice, we usually use counting sort when we have k = O( n), in which case the running time is Θ( n).

Counting sort can beat the lower bound of Ω( n lg n) proved in

Section 8.1 because it is not a comparison sort. In fact, no comparisons between input elements occur anywhere in the code. Instead, counting

sort uses the actual values of the elements to index into an array. The

Ω( n lg n) lower bound for sorting does not apply when we depart from

the comparison sort model.

Figure 8.2 The operation of COUNTING-SORT on an input array A[1 : 8], where each element of A is a nonnegative integer no larger than k = 5. (a) The array A and the auxiliary array C

after line 5. (b) The array C after line 8. (c)–(e) The output array B and the auxiliary array C

after one, two, and three iterations of the loop in lines 11–13, respectively. Only the tan elements of array B have been filled in. (f) The final sorted output array B.

An important property of counting sort is that it is stable: elements

with the same value appear in the output array in the same order as they

do in the input array. That is, it breaks ties between two elements by the

rule that whichever element appears first in the input array appears first

in the output array. Normally, the property of stability is important only when satellite data are carried around with the element being sorted.

Counting sort’s stability is important for another reason: counting sort

is often used as a subroutine in radix sort. As we shall see in the next

section, in order for radix sort to work correctly, counting sort must be

stable.
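The procedure just described can be sketched in Python. This is an illustrative translation, not the book's pseudocode: arrays here are 0-indexed, so the counter is decremented before the element is placed.

```python
def counting_sort(A, k):
    """Stable counting sort of a list A of integers in the range 0..k.

    Runs in Theta(n + k) time: one pass over C of size k + 1 and two
    passes over A of size n.
    """
    n = len(A)
    C = [0] * (k + 1)
    for x in A:                      # count occurrences of each key
        C[x] += 1
    for i in range(1, k + 1):        # C[i] = number of elements <= i
        C[i] += C[i - 1]
    B = [None] * n
    for j in range(n - 1, -1, -1):   # reverse pass preserves stability
        C[A[j]] -= 1
        B[C[A[j]]] = A[j]
    return B
```

Because the final loop scans A from right to left, elements with equal keys keep their relative input order, which is exactly the stability property that radix sort will rely on.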

Exercises

8.2-1

Using Figure 8.2 as a model, illustrate the operation of COUNTING-

SORT on the array A = 〈6, 0, 2, 0, 1, 3, 4, 6, 1, 3, 2〉.

8.2-2

Prove that COUNTING-SORT is stable.

8.2-3

Suppose that we were to rewrite the for loop header in line 11 of COUNTING-SORT as

11 for j = 1 to n

Show that the algorithm still works properly, but that it is not stable.

Then rewrite the pseudocode for counting sort so that elements with the

same value are written into the output array in order of increasing index

and the algorithm is stable.

8.2-4

Prove the following loop invariant for COUNTING-SORT:

At the start of each iteration of the for loop of lines 11–13, the

last element in A with value i that has not yet been copied into

B belongs in B[ C [ i]].

8.2-5

Suppose that the array being sorted contains only integers in the range 0

to k and that there are no satellite data to move with those keys. Modify

counting sort to use just the arrays A and C, putting the sorted result back into array A instead of into a new array B.

8.2-6

Describe an algorithm that, given n integers in the range 0 to k, preprocesses its input and then answers any query about how many of

the n integers fall into a range [ a : b] in O(1) time. Your algorithm should use Θ( n + k) preprocessing time.

8.2-7

Counting sort can also work efficiently if the input values have

fractional parts, but the number of digits in the fractional part is small.

Suppose that you are given n numbers in the range 0 to k, each with at

most d decimal (base 10) digits to the right of the decimal point. Modify

counting sort to run in Θ(n + 10^d k) time.

8.3 Radix sort

Radix sort is the algorithm used by the card-sorting machines you now

find only in computer museums. The cards have 80 columns, and in each

column a machine can punch a hole in one of 12 places. The sorter can

be mechanically “programmed” to examine a given column of each card

in a deck and distribute the card into one of 12 bins depending on

which place has been punched. An operator can then gather the cards

bin by bin, so that cards with the first place punched are on top of cards

with the second place punched, and so on.


Figure 8.3 The operation of radix sort on seven 3-digit numbers. The leftmost column is the input. The remaining columns show the numbers after successive sorts on increasingly significant digit positions. Tan shading indicates the digit position sorted on to produce each list from the previous one.

For decimal digits, each column uses only 10 places. (The other two

places are reserved for encoding nonnumeric characters.) A d-digit

number occupies a field of d columns. Since the card sorter can look at

only one column at a time, the problem of sorting n cards on a d-digit

number requires a sorting algorithm.

Intuitively, you might sort numbers on their most significant

(leftmost) digit, sort each of the resulting bins recursively, and then

combine the decks in order. Unfortunately, since the cards in 9 of the 10

bins must be put aside to sort each of the bins, this procedure generates

many intermediate piles of cards that you would have to keep track of.

(See Exercise 8.3-6.)

Radix sort solves the problem of card sorting—counterintuitively—

by sorting on the least significant digit first. The algorithm then

combines the cards into a single deck, with the cards in the 0 bin

preceding the cards in the 1 bin preceding the cards in the 2 bin, and so

on. Then it sorts the entire deck again on the second-least significant

digit and recombines the deck in a like manner. The process continues

until the cards have been sorted on all d digits. Remarkably, at that point the cards are fully sorted on the d-digit number. Thus, only d passes through the deck are required to sort. Figure 8.3 shows how radix sort operates on a “deck” of seven 3-digit numbers.

In order for radix sort to work correctly, the digit sorts must be

stable. The sort performed by a card sorter is stable, but the operator

must be careful not to change the order of the cards as they come out of

a bin, even though all the cards in a bin have the same digit in the chosen column.

In a typical computer, which is a sequential random-access machine,

we sometimes use radix sort to sort records of information that are

keyed by multiple fields. For example, we might wish to sort dates by

three keys: year, month, and day. We could run a sorting algorithm with

a comparison function that, given two dates, compares years, and if

there is a tie, compares months, and if another tie occurs, compares

days. Alternatively, we could sort the information three times with a

stable sort: first on day (the “least significant” part), next on month, and

finally on year.
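Because Python's built-in sort is stable, the three-pass scheme for dates can be sketched directly; the sample dates below are made up for illustration.

```python
# Sort (year, month, day) records by least significant field first.
# Each pass is stable, so the order established by earlier passes
# survives ties in later passes.
dates = [(2024, 3, 15), (2023, 7, 1), (2024, 3, 2), (2023, 7, 1)]

by_day = sorted(dates, key=lambda d: d[2])      # "least significant"
by_month = sorted(by_day, key=lambda d: d[1])
by_year = sorted(by_month, key=lambda d: d[0])  # "most significant"

# Equivalent to a single sort on the full (year, month, day) key:
assert by_year == sorted(dates)
```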

The code for radix sort is straightforward. The RADIX-SORT

procedure assumes that each element in array A[1 : n] has d digits, where digit 1 is the lowest-order digit and digit d is the highest-order digit.

RADIX-SORT(A, n, d)

1 for i = 1 to d
2     use a stable sort to sort array A[1 : n] on digit i

Although the pseudocode for RADIX-SORT does not specify which

stable sort to use, COUNTING-SORT is commonly used. If you use

COUNTING-SORT as the stable sort, you can make RADIX-SORT a

little more efficient by revising COUNTING-SORT to take a pointer to

the output array as a parameter, having RADIX-SORT preallocate this

array, and alternating input and output between the two arrays in

successive iterations of the for loop in RADIX-SORT.
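As a concrete sketch (in Python, assuming decimal digits, and with counting sort keyed on digit i inlined as the stable sort; this is illustrative, not the book's pseudocode):

```python
def radix_sort(A, d):
    """Sort nonnegative integers with at most d decimal digits.

    Each pass is a stable counting sort on digit i, where digit 1 is
    the lowest-order digit, so the total time is Theta(d * (n + 10)).
    """
    for i in range(d):
        divisor = 10 ** i
        C = [0] * 10
        for x in A:                      # count keys for digit i
            C[(x // divisor) % 10] += 1
        for v in range(1, 10):           # prefix sums give positions
            C[v] += C[v - 1]
        B = [None] * len(A)
        for x in reversed(A):            # reverse pass keeps stability
            digit = (x // divisor) % 10
            C[digit] -= 1
            B[C[digit]] = x
        A = B                            # output becomes next input
    return A
```

Note how `A = B` at the end of each pass plays the role of the alternating-arrays optimization mentioned above, reusing the previous pass's output as the next pass's input.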

Lemma 8.3

Given n d-digit numbers in which each digit can take on up to k possible values, RADIX-SORT correctly sorts these numbers in Θ( d( n + k)) time if the stable sort it uses takes Θ( n + k) time.

Proof The correctness of radix sort follows by induction on the column

being sorted (see Exercise 8.3-3). The analysis of the running time

depends on the stable sort used as the intermediate sorting algorithm.

When each digit lies in the range 0 to k – 1 (so that it can take on k possible values), and k is not too large, counting sort is the obvious choice. Each pass over n d-digit numbers then takes Θ( n + k) time.

There are d passes, and so the total time for radix sort is Θ( d( n + k)).

When d is constant and k = O( n), we can make radix sort run in linear time. More generally, we have some flexibility in how to break

each key into digits.

Lemma 8.4

Given n b-bit numbers and any positive integer r ≤ b, RADIX-SORT correctly sorts these numbers in Θ((b/r)(n + 2^r)) time if the stable sort it uses takes Θ(n + k) time for inputs in the range 0 to k.

Proof For a value r ≤ b, view each key as having d = ⌈b/r⌉ digits of r bits each. Each digit is an integer in the range 0 to 2^r − 1, so that we can use counting sort with k = 2^r − 1. (For example, we can view a 32-bit word as having four 8-bit digits, so that b = 32, r = 8, k = 2^r − 1 = 255, and d = b/r = 4.) Each pass of counting sort takes Θ(n + k) = Θ(n + 2^r) time and there are d passes, for a total running time of Θ(d(n + 2^r)) = Θ((b/r)(n + 2^r)).

Given n and b, what value of r ≤ b minimizes the expression (b/r)(n + 2^r)? As r decreases, the factor b/r increases, but as r increases, so does 2^r. The answer depends on whether b < ⌊lg n⌋. If b < ⌊lg n⌋, then r ≤ b implies (n + 2^r) = Θ(n). Thus, choosing r = b yields a running time of (b/b)(n + 2^b) = Θ(n), which is asymptotically optimal. If b ≥ ⌊lg n⌋, then choosing r = ⌊lg n⌋ gives the best running time to within a constant factor, which we can see as follows.1 Choosing r = ⌊lg n⌋ yields a running time of Θ(bn/lg n). As r increases above ⌊lg n⌋, the 2^r term in the numerator increases faster than the r term in the denominator, and so increasing r above ⌊lg n⌋ yields a running time of Ω(bn/lg n). If instead r were to decrease below ⌊lg n⌋, then the b/r term increases and the n + 2^r term remains at Θ(n).

Is radix sort preferable to a comparison-based sorting algorithm,

such as quicksort? If b = O(lg n), as is often the case, and r ≈ lg n, then radix sort’s running time is Θ( n), which appears to be better than

quicksort’s expected running time of Θ( n lg n). The constant factors hidden in the Θ-notation differ, however. Although radix sort may make

fewer passes than quicksort over the n keys, each pass of radix sort may

take significantly longer. Which sorting algorithm to prefer depends on

the characteristics of the implementations, of the underlying machine

(e.g., quicksort often uses hardware caches more effectively than radix

sort), and of the input data. Moreover, the version of radix sort that

uses counting sort as the intermediate stable sort does not sort in place,

which many of the Θ( n lg n)-time comparison sorts do. Thus, when primary memory storage is at a premium, an in-place algorithm such as

quicksort could be the better choice.

Exercises

8.3-1

Using Figure 8.3 as a model, illustrate the operation of RADIX-SORT

on the following list of English words: COW, DOG, SEA, RUG, ROW,

MOB, BOX, TAB, BAR, EAR, TAR, DIG, BIG, TEA, NOW, FOX.

8.3-2

Which of the following sorting algorithms are stable: insertion sort,

merge sort, heapsort, and quicksort? Give a simple scheme that makes

any comparison sort stable. How much additional time and space does

your scheme entail?

8.3-3

Use induction to prove that radix sort works. Where does your proof

need the assumption that the intermediate sort is stable?

8.3-4

Suppose that COUNTING-SORT is used as the stable sort within

RADIX-SORT. If RADIX-SORT calls COUNTING-SORT d times,

then since each call of COUNTING-SORT makes two passes over the

data (lines 4–5 and 11–13), altogether 2 d passes over the data occur.

Describe how to reduce the total number of passes to d + 1.

8.3-5

Show how to sort n integers in the range 0 to n^3 − 1 in O(n) time.

8.3-6

In the first card-sorting algorithm in this section, which sorts on the

most significant digit first, exactly how many sorting passes are needed

to sort d-digit decimal numbers in the worst case? How many piles of

cards does an operator need to keep track of in the worst case?

8.4 Bucket sort

Bucket sort assumes that the input is drawn from a uniform distribution

and has an average-case running time of O( n). Like counting sort, bucket sort is fast because it assumes something about the input.

Whereas counting sort assumes that the input consists of integers in a

small range, bucket sort assumes that the input is generated by a

random process that distributes elements uniformly and independently

over the interval [0, 1). (See Section C.2 for a definition of a uniform distribution.)

Bucket sort divides the interval [0, 1) into n equal-sized subintervals,

or buckets, and then distributes the n input numbers into the buckets.

Since the inputs are uniformly and independently distributed over [0, 1),

we do not expect many numbers to fall into each bucket. To produce the

output, we simply sort the numbers in each bucket and then go through

the buckets in order, listing the elements in each.

The BUCKET-SORT procedure on the next page assumes that the

input is an array A[1 : n] and that each element A[ i] in the array satisfies 0 ≤ A[ i] < 1. The code requires an auxiliary array B[0 : n – 1] of linked lists (buckets) and assumes that there is a mechanism for maintaining


such lists. (Section 10.2 describes how to implement basic operations on linked lists.) Figure 8.4 shows the operation of bucket sort on an input array of 10 numbers.

Figure 8.4 The operation of BUCKET-SORT for n = 10. (a) The input array A[1 : 10]. (b) The array B[0 : 9] of sorted lists (buckets) after line 7 of the algorithm, with slashes indicating the end of each bucket. Bucket i holds values in the half-open interval [ i/10, ( i + 1)/10). The sorted output consists of a concatenation of the lists B[0], B[1], … , B[9] in order.

BUCKET-SORT(A, n)

1 let B[0 : n – 1] be a new array
2 for i = 0 to n – 1
3     make B[i] an empty list
4 for i = 1 to n
5     insert A[i] into list B[⌊n · A[i]⌋]
6 for i = 0 to n – 1
7     sort list B[i] with insertion sort
8 concatenate the lists B[0], B[1], … , B[n – 1] together in order
9 return the concatenated lists

To see that this algorithm works, consider two elements A[i] and A[j]. Assume without loss of generality that A[i] ≤ A[j]. Since ⌊n · A[i]⌋ ≤ ⌊n · A[j]⌋, either element A[i] goes into the same bucket as A[j] or it goes into


a bucket with a lower index. If A[ i] and A[ j] go into the same bucket, then the for loop of lines 6–7 puts them into the proper order. If A[ i]

and A[ j] go into different buckets, then line 8 puts them into the proper order. Therefore, bucket sort works correctly.
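A direct Python sketch of BUCKET-SORT follows; it uses Python lists as the buckets and the built-in sort of each bucket as a stand-in for insertion sort.

```python
import math

def bucket_sort(A):
    """Sort values in [0, 1): distribute the n inputs into n
    equal-width buckets, sort each bucket, and concatenate.
    Average-case Theta(n) when the input is uniform on [0, 1)."""
    n = len(A)
    B = [[] for _ in range(n)]
    for x in A:
        B[math.floor(n * x)].append(x)   # x lands in bucket floor(n*x)
    out = []
    for bucket in B:
        bucket.sort()                    # stand-in for insertion sort
        out.extend(bucket)               # buckets concatenated in order
    return out
```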

To analyze the running time, observe that, together, all lines except

line 7 take O( n) time in the worst case. We need to analyze the total time taken by the n calls to insertion sort in line 7.

To analyze the cost of the calls to insertion sort, let n_i be the random variable denoting the number of elements placed in bucket B[i]. Since insertion sort runs in quadratic time (see Section 2.2), the running time of bucket sort is

T(n) = Θ(n) + Σ_{i=0}^{n−1} O(n_i^2).   (8.1)

We now analyze the average-case running time of bucket sort, by computing the expected value of the running time, where we take the expectation over the input distribution. Taking expectations of both sides and using linearity of expectation (equation (C.24) on page 1192), we have

E[T(n)] = E[Θ(n) + Σ_{i=0}^{n−1} O(n_i^2)]
        = Θ(n) + Σ_{i=0}^{n−1} E[O(n_i^2)]
        = Θ(n) + Σ_{i=0}^{n−1} O(E[n_i^2]).   (8.2)

We claim that

E[n_i^2] = 2 − 1/n   (8.3)

for i = 0, 1, … , n − 1. It is no surprise that each bucket i has the same value of E[n_i^2], since each value in the input array A is equally likely to fall in any bucket.


To prove equation (8.3), view each random variable n_i as the number of successes in n Bernoulli trials (see Section C.4). Success in a trial occurs when an element goes into bucket B[i], with a probability p = 1/n of success and q = 1 − 1/n of failure. A binomial distribution counts n_i, the number of successes, in the n trials. By equations (C.41) and (C.44) on pages 1199–1200, we have E[n_i] = np = n(1/n) = 1 and Var[n_i] = npq = 1 − 1/n. Equation (C.31) on page 1194 gives

E[n_i^2] = Var[n_i] + E^2[n_i]
         = (1 − 1/n) + 1^2
         = 2 − 1/n,

which proves equation (8.3). Using this expected value in equation (8.2),

we get that the average-case running time for bucket sort is Θ( n) + n ·

O(2 – 1/ n) = Θ( n).

Even if the input is not drawn from a uniform distribution, bucket

sort may still run in linear time. As long as the input has the property

that the sum of the squares of the bucket sizes is linear in the total

number of elements, equation (8.1) tells us that bucket sort runs in

linear time.

Exercises

8.4-1

Using Figure 8.4 as a model, illustrate the operation of BUCKET-SORT on the array A = 〈.79, .13, .16, .64, .39, .20, .89, .53, .71, .42〉.

8.4-2

Explain why the worst-case running time for bucket sort is Θ(n^2). What

simple change to the algorithm preserves its linear average-case running

time and makes its worst-case running time O( n lg n)?

8.4-3

Let X be a random variable that is equal to the number of heads in two flips of a fair coin. What is E[X^2]? What is E^2[X]?


8.4-4

An array A of size n > 10 is filled in the following way. For each element A[ i], choose two random variables xi and yi uniformly and independently from [0, 1). Then set

Modify bucket sort so that it sorts the array A in O( n) expected time.

8.4-5

You are given n points in the unit disk, p_i = (x_i, y_i), such that x_i^2 + y_i^2 ≤ 1 for i = 1, 2, … , n. Suppose that the points are uniformly distributed, that is, the probability of finding a point in any region of the disk is proportional to the area of that region. Design an algorithm with an average-case running time of Θ(n) to sort the n points by their distances d_i = √(x_i^2 + y_i^2) from the origin. (Hint: Design the bucket sizes in BUCKET-SORT to reflect the uniform distribution of the points in the unit disk.)

8.4-6

A probability distribution function P(x) for a random variable X is defined by P(x) = Pr{X ≤ x}. Suppose that you draw a list of n random variables X_1, X_2, … , X_n from a continuous probability distribution function P that is computable in O(1) time (given y, you can find x such that P(x) = y in O(1) time). Give an algorithm that sorts these numbers in linear average-case time.

Problems

8-1 Probabilistic lower bounds on comparison sorting

In this problem, you will prove a probabilistic Ω( n lg n) lower bound on the running time of any deterministic or randomized comparison sort

on n distinct input elements. You’ll begin by examining a deterministic

comparison sort A with decision tree TA. Assume that every permutation of A’s inputs is equally likely.

a. Suppose that each leaf of TA is labeled with the probability that it is reached given a random input. Prove that exactly n! leaves are labeled

1/ n! and that the rest are labeled 0.

b. Let D(T) denote the external path length of a decision tree T (the sum of the depths of all the leaves of T). Let T be a decision tree with k > 1 leaves, and let LT and RT be the left and right subtrees of T. Show that D(T) = D(LT) + D(RT) + k.

c. Let d(k) be the minimum value of D(T) over all decision trees T with k > 1 leaves. Show that d(k) = min {d(i) + d(k − i) + k : 1 ≤ i ≤ k − 1}. (Hint: Consider a decision tree T with k leaves that achieves the minimum. Let i_0 be the number of leaves in LT and k − i_0 the number of leaves in RT.)

d. Prove that for a given value of k > 1 and i in the range 1 ≤ i ≤ k − 1, the function i lg i + (k − i) lg(k − i) is minimized at i = k/2. Conclude that d(k) = Ω(k lg k).

e. Prove that D( TA) = Ω ( n! lg( n!)), and conclude that the average-case time to sort n elements is Ω( n lg n).

Now consider a randomized comparison sort B. We can extend the decision-tree model to handle randomization by incorporating two

kinds of nodes: ordinary comparison nodes and “randomization”

nodes. A randomization node models a random choice of the form

RANDOM(1, r) made by algorithm B. The node has r children, each of which is equally likely to be chosen during an execution of the

algorithm.

f. Show that for any randomized comparison sort B, there exists a

deterministic comparison sort A whose expected number of

comparisons is no more than those made by B.

8-2 Sorting in place in linear time

You have an array of n data records to sort, each with a key of 0 or 1.

An algorithm for sorting such a set of records might possess some

subset of the following three desirable characteristics:

1. The algorithm runs in O( n) time.

2. The algorithm is stable.

3. The algorithm sorts in place, using no more than a constant

amount of storage space in addition to the original array.

a. Give an algorithm that satisfies criteria 1 and 2 above.

b. Give an algorithm that satisfies criteria 1 and 3 above.

c. Give an algorithm that satisfies criteria 2 and 3 above.

d. Can you use any of your sorting algorithms from parts (a)–(c) as the

sorting method used in line 2 of RADIX-SORT, so that RADIX-

SORT sorts n records with b-bit keys in O( bn) time? Explain how or why not.

e. Suppose that the n records have keys in the range from 1 to k. Show how to modify counting sort so that it sorts the records in place in

O( n + k) time. You may use O( k) storage outside the input array. Is your algorithm stable?

8-3 Sorting variable-length items

a. You are given an array of integers, where different integers may have

different numbers of digits, but the total number of digits over all the

integers in the array is n. Show how to sort the array in O( n) time.

b. You are given an array of strings, where different strings may have

different numbers of characters, but the total number of characters

over all the strings is n. Show how to sort the strings in O( n) time.

(The desired order is the standard alphabetical order: for example, a

< ab < b.)

8-4 Water jugs


You are given n red and n blue water jugs, all of different shapes and sizes. All the red jugs hold different amounts of water, as do all the blue

jugs, and you cannot tell from the size of a jug how much water it holds.

Moreover, for every jug of one color, there is a jug of the other color

that holds the same amount of water.

Your task is to group the jugs into pairs of red and blue jugs that

hold the same amount of water. To do so, you may perform the

following operation: pick a pair of jugs in which one is red and one is

blue, fill the red jug with water, and then pour the water into the blue

jug. This operation tells you whether the red jug or the blue jug can hold

more water, or that they have the same volume. Assume that such a

comparison takes one time unit. Your goal is to find an algorithm that

makes a minimum number of comparisons to determine the grouping.

Remember that you may not directly compare two red jugs or two blue

jugs.

a. Describe a deterministic algorithm that uses Θ(n^2) comparisons to

group the jugs into pairs.

b. Prove a lower bound of Ω( n lg n) for the number of comparisons that an algorithm solving this problem must make.

c. Give a randomized algorithm whose expected number of

comparisons is O( n lg n), and prove that this bound is correct. What is the worst-case number of comparisons for your algorithm?

8-5 Average sorting

Suppose that, instead of sorting an array, we just require that the

elements increase on average. More precisely, we call an n-element array A k-sorted if, for all i = 1, 2, … , n − k, the following holds:

(A[i] + A[i+1] + ⋯ + A[i+k−1]) / k ≤ (A[i+1] + A[i+2] + ⋯ + A[i+k]) / k.

a. What does it mean for an array to be 1-sorted?

b. Give a permutation of the numbers 1, 2, … , 10 that is 2-sorted, but

not sorted.

c. Prove that an n-element array is k-sorted if and only if A[i] ≤ A[i + k] for all i = 1, 2, … , n − k.

d. Give an algorithm that k-sorts an n-element array in O( n lg( n/ k)) time.

We can also show a lower bound on the time to produce a k-sorted

array, when k is a constant.

e. Show how to sort a k-sorted array of length n in O( n lg k) time. ( Hint: Use the solution to Exercise 6.5-11.)

f. Show that when k is a constant, k-sorting an n-element array requires Ω( n lg n) time. ( Hint: Use the solution to part (e) along with the lower bound on comparison sorts.)

8-6 Lower bound on merging sorted lists

The problem of merging two sorted lists arises frequently. We have seen

a procedure for it as the subroutine MERGE in Section 2.3.1. In this problem, you will prove a lower bound of 2n − 1 on the worst-case number of comparisons required to merge two sorted lists, each containing n items. First, you will show a lower bound of 2n − o(n) comparisons by using a decision tree.

a. Given 2 n numbers, compute the number of possible ways to divide

them into two sorted lists, each with n numbers.

b. Using a decision tree and your answer to part (a), show that any

algorithm that correctly merges two sorted lists must perform at least 2n − o(n) comparisons.

Now you will show a slightly tighter 2 n – 1 bound.

c. Show that if two elements are consecutive in the sorted order and

from different lists, then they must be compared.

d. Use your answer to part (c) to show a lower bound of 2 n – 1

comparisons for merging two sorted lists.

8-7 The 0-1 sorting lemma and columnsort

A compare-exchange operation on two array elements A[ i] and A[ j], where i < j, has the form

COMPARE-EXCHANGE(A, i, j)

1 if A[i] > A[j]
2     exchange A[i] with A[j]

After the compare-exchange operation, we know that A[ i] ≤ A[ j].

An oblivious compare-exchange algorithm operates solely by a

sequence of prespecified compare-exchange operations. The indices of

the positions compared in the sequence must be determined in advance,

and although they can depend on the number of elements being sorted,

they cannot depend on the values being sorted, nor can they depend on

the result of any prior compare-exchange operation. For example, the

COMPARE-EXCHANGE-INSERTION-SORT procedure on the

facing page shows a variation of insertion sort as an oblivious compare-

exchange algorithm. (Unlike the INSERTION-SORT procedure on

page 19, the oblivious version runs in Θ(n^2) time in all cases.)

The 0-1 sorting lemma provides a powerful way to prove that an

oblivious compare-exchange algorithm produces a sorted result. It

states that if an oblivious compare-exchange algorithm correctly sorts

all input sequences consisting of only 0s and 1s, then it correctly sorts

all inputs containing arbitrary values.

COMPARE-EXCHANGE-INSERTION-SORT(A, n)

1 for i = 2 to n
2     for j = i – 1 downto 1
3         COMPARE-EXCHANGE(A, j, j + 1)
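The two procedures can be rendered as a runnable Python sketch. Note that the sequence of index pairs compared depends only on n, never on the array's contents, which is what makes the algorithm oblivious.

```python
def compare_exchange(A, i, j):
    """Assumes i < j; afterwards A[i] <= A[j]."""
    if A[i] > A[j]:
        A[i], A[j] = A[j], A[i]

def compare_exchange_insertion_sort(A):
    """Oblivious insertion sort: the (i, j) pairs compared are fixed
    in advance by len(A), so the procedure always performs exactly
    n(n-1)/2 compare-exchange operations, i.e. Theta(n^2) time."""
    n = len(A)
    for i in range(1, n):                # pseudocode's i = 2 to n
        for j in range(i - 1, -1, -1):   # pseudocode's j = i-1 downto 1
            compare_exchange(A, j, j + 1)
    return A
```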

You will prove the 0-1 sorting lemma by proving its contrapositive: if

an oblivious compare-exchange algorithm fails to sort an input

containing arbitrary values, then it fails to sort some 0-1 input. Assume

that an oblivious compare-exchange algorithm X fails to correctly sort

the array A[1 : n]. Let A[ p] be the smallest value in A that algorithm X


puts into the wrong location, and let A[ q] be the value that algorithm X

moves to the location into which A[p] should have gone. Define an array B[1 : n] of 0s and 1s as follows:

B[i] = 0 if A[i] ≤ A[p], and B[i] = 1 if A[i] > A[p].

a. Argue that A[ q] > A[ p], so that B[ p] = 0 and B[ q] = 1.

b. To complete the proof of the 0-1 sorting lemma, prove that algorithm

X fails to sort array B correctly.

Now you will use the 0-1 sorting lemma to prove that a particular

sorting algorithm works correctly. The algorithm, columnsort, works on

a rectangular array of n elements. The array has r rows and s columns (so that n = rs), subject to three restrictions:

r must be even,

s must be a divisor of r, and

r ≥ 2s^2.

When columnsort completes, the array is sorted in column-major order:

reading down each column in turn, from left to right, the elements

monotonically increase.

Columnsort operates in eight steps, regardless of the value of n. The

odd steps are all the same: sort each column individually. Each even

step is a fixed permutation. Here are the steps:

1. Sort each column.

2. Transpose the array, but reshape it back to r rows and s columns.

In other words, turn the leftmost column into the top r/ s rows, in

order; turn the next column into the next r/ s rows, in order; and

so on.

3. Sort each column.

4. Perform the inverse of the permutation performed in step 2.

5. Sort each column.


6. Shift the top half of each column into the bottom half of the

same column, and shift the bottom half of each column into the

top half of the next column to the right. Leave the top half of the

leftmost column empty. Shift the bottom half of the last column

into the top half of a new rightmost column, and leave the

bottom half of this new column empty.

7. Sort each column.

8. Perform the inverse of the permutation performed in step 6.

Figure 8.5 The steps of columnsort. (a) The input array with 6 rows and 3 columns. (This example does not obey the r ≥ 2s^2 requirement, but it works.) (b) After sorting each column in step 1. (c) After transposing and reshaping in step 2. (d) After sorting each column in step 3. (e) After performing step 4, which inverts the permutation from step 2. (f) After sorting each column in step 5. (g) After shifting by half a column in step 6. (h) After sorting each column in step 7. (i) After performing step 8, which inverts the permutation from step 6. Steps 6–8 sort the bottom half of each column with the top half of the next column. After step 8, the array is sorted in column-major order.

You can think of steps 6–8 as a single step that sorts the bottom half of

each column and the top half of the next column. Figure 8.5 shows an example of the steps of columnsort with r = 6 and s = 3. (Even though

this example violates the requirement that r ≥ 2s^2, it happens to work.)

c. Argue that we can treat columnsort as an oblivious compare-exchange algorithm, even if we do not know what sorting method the

odd steps use.

Although it might seem hard to believe that columnsort actually

sorts, you will use the 0-1 sorting lemma to prove that it does. The 0-1

sorting lemma applies because we can treat columnsort as an oblivious

compare-exchange algorithm. A couple of definitions will help you

apply the 0-1 sorting lemma. We say that an area of an array is clean if

we know that it contains either all 0s or all 1s or if it is empty.

Otherwise, the area might contain mixed 0s and 1s, and it is dirty. From

here on, assume that the input array contains only 0s and 1s, and that

we can treat it as an array with r rows and s columns.

d. Prove that after steps 1–3, the array consists of clean rows of 0s at the

top, clean rows of 1s at the bottom, and at most s dirty rows between

them. (One of the clean rows could be empty.)

e. Prove that after step 4, the array, read in column-major order, starts

with a clean area of 0s, ends with a clean area of 1s, and has a dirty

area of at most s^2 elements in the middle. (Again, one of the clean

areas could be empty.)

f. Prove that steps 5–8 produce a fully sorted 0-1 output. Conclude that

columnsort correctly sorts all inputs containing arbitrary values.

g. Now suppose that s does not divide r. Prove that after steps 1–3, the array consists of clean rows of 0s at the top, clean rows of 1s at the

bottom, and at most 2s − 1 dirty rows between them. (Once again, one

of the clean areas could be empty.) How large must r be, compared

with s, for columnsort to correctly sort when s does not divide r?

h. Suggest a simple change to step 1 that allows us to maintain the

requirement that r ≥ 2s^2 even when s does not divide r, and prove that with your change, columnsort correctly sorts.

Chapter notes


The decision-tree model for studying comparison sorts was introduced

by Ford and Johnson [150]. Knuth’s comprehensive treatise on sorting

[261] covers many variations on the sorting problem, including the information-theoretic lower bound on the complexity of sorting given

here. Ben-Or [46] studied lower bounds for sorting using generalizations of the decision-tree model.

Knuth credits H. H. Seward with inventing counting sort in 1954, as

well as with the idea of combining counting sort with radix sort. Radix

sorting starting with the least significant digit appears to be a folk

algorithm widely used by operators of mechanical card-sorting

machines. According to Knuth, the first published reference to the

method is a 1929 document by L. J. Comrie describing punched-card

equipment. Bucket sorting has been in use since 1956, when the basic

idea was proposed by Isaac and Singleton [235].

Munro and Raman [338] give a stable sorting algorithm that performs O(n^(1+ϵ)) comparisons in the worst case, where 0 < ϵ ≤ 1 is any fixed constant. Although any of the O(n lg n)-time algorithms make fewer comparisons, the algorithm by Munro and Raman moves data

only O( n) times and operates in place.

The case of sorting n b-bit integers in o( n lg n) time has been considered by many researchers. Several positive results have been

obtained, each under slightly different assumptions about the model of

computation and the restrictions placed on the algorithm. All the

results assume that the computer memory is divided into addressable b-bit words. Fredman and Willard [157] introduced the fusion tree data structure and used it to sort n integers in O( n lg n/lg lg n) time. This bound was later improved by Andersson [17]. These

algorithms require the use of multiplication and several precomputed

constants. Andersson, Hagerup, Nilsson, and Raman [18] have shown how to sort n integers in O( n lg lg n) time without using multiplication, but their method requires storage that can be unbounded in terms of n.

Using multiplicative hashing, we can reduce the storage needed to O( n), but then the O( n lg lg n) worst-case bound on the running time becomes an expected-time bound. Generalizing the exponential search trees of

Andersson [17], Thorup [434] gave an O( n(lg lg n)2)-time sorting

algorithm that does not use multiplication or randomization, and it uses linear space. Combining these techniques with some new ideas, Han

[207] improved the bound for sorting to O( n lg lg n lg lg lg n) time.

Although these algorithms are important theoretical breakthroughs,

they are all fairly complicated and at the present time seem unlikely to

compete with existing sorting algorithms in practice.

The columnsort algorithm in Problem 8-7 is by Leighton [286].

1 The choice of r = ⌊lg n⌋ assumes that n > 1. If n ≤ 1, there is nothing to sort.

9 Medians and Order Statistics

The i th order statistic of a set of n elements is the i th smallest element.

For example, the minimum of a set of elements is the first order statistic

( i = 1), and the maximum is the n th order statistic ( i = n). A median, informally, is the “halfway point” of the set. When n is odd, the median

is unique, occurring at i = ( n + 1)/2. When n is even, there are two medians, the lower median occurring at i = n/2 and the upper median occurring at i = n/2 + 1. Thus, regardless of the parity of n, medians occur at i = ⌊( n + 1)/2⌋ and i = ⌈( n + 1)/2⌉. For simplicity in this text, however, we consistently use the phrase “the median” to refer to the

lower median.

This chapter addresses the problem of selecting the i th order statistic

from a set of n distinct numbers. We assume for convenience that the set

contains distinct numbers, although virtually everything that we do

extends to the situation in which a set contains repeated values. We

formally specify the selection problem as follows:

Input: A set A of n distinct numbers1 and an integer i, with 1 ≤ i ≤ n.

Output: The element x ∈ A that is larger than exactly i – 1 other elements of A.

We can solve the selection problem in O( n lg n) time simply by sorting the numbers using heapsort or merge sort and then outputting the i th

element in the sorted array. This chapter presents asymptotically faster

algorithms.

Section 9.1 examines the problem of selecting the minimum and maximum of a set of elements. More interesting is the general selection

problem, which we investigate in the subsequent two sections. Section

9.2 analyzes a practical randomized algorithm that achieves an O( n)

expected running time, assuming distinct elements. Section 9.3 contains an algorithm of more theoretical interest that achieves the O( n) running time in the worst case.

9.1 Minimum and maximum

How many comparisons are necessary to determine the minimum of a

set of n elements? To obtain an upper bound of n – 1 comparisons, just

examine each element of the set in turn and keep track of the smallest

element seen so far. The MINIMUM procedure assumes that the set

resides in array A[1 : n].

MINIMUM( A, n)
1  min = A[1]
2  for i = 2 to n
3      if min > A[ i]
4          min = A[ i]
5  return min
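As a quick sketch, the procedure translates directly into Python (assuming 0-indexed lists rather than the book's 1-indexed arrays):

```python
def minimum(A):
    """Return the smallest element of A using exactly len(A) - 1 comparisons."""
    smallest = A[0]          # corresponds to min = A[1]
    for x in A[1:]:          # corresponds to the loop for i = 2 to n
        if smallest > x:     # one comparison per remaining element
            smallest = x
    return smallest
```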

It’s no more difficult to find the maximum with n – 1 comparisons.

Is this algorithm for minimum the best we can do? Yes, because it

turns out that there’s a lower bound of n – 1 comparisons for the

problem of determining the minimum. Think of any algorithm that

determines the minimum as a tournament among the elements. Each

comparison is a match in the tournament in which the smaller of the

two elements wins. Since every element except the winner must lose at

least one match, we can conclude that n – 1 comparisons are necessary

to determine the minimum. Hence the algorithm MINIMUM is

optimal with respect to the number of comparisons performed.

Simultaneous minimum and maximum

Some applications need to find both the minimum and the maximum of

a set of n elements. For example, a graphics program may need to scale

a set of ( x, y) data to fit onto a rectangular display screen or other graphical output device. To do so, the program must first determine the

minimum and maximum value of each coordinate.

Of course, we can determine both the minimum and the maximum of

n elements using Θ( n) comparisons. We simply find the minimum and maximum independently, using n – 1 comparisons for each, for a total

of 2 n – 2 = Θ( n) comparisons.

Although 2 n – 2 comparisons is asymptotically optimal, it is possible

to improve the leading constant. We can find both the minimum and the

maximum using at most 3 ⌊ n/2⌋ comparisons. The trick is to maintain

both the minimum and maximum elements seen thus far. Rather than

processing each element of the input by comparing it against the current

minimum and maximum, at a cost of 2 comparisons per element,

process elements in pairs. Compare pairs of elements from the input

first with each other, and then compare the smaller with the current minimum and the larger with the current maximum, at a cost of 3

comparisons for every 2 elements.

How you set up initial values for the current minimum and

maximum depends on whether n is odd or even. If n is odd, set both the

minimum and maximum to the value of the first element, and then

process the rest of the elements in pairs. If n is even, perform 1

comparison on the first 2 elements to determine the initial values of the

minimum and maximum, and then process the rest of the elements in

pairs as in the case for odd n.

Let’s count the total number of comparisons. If n is odd, then 3 ⌊ n/2⌋

comparisons occur. If n is even, 1 initial comparison occurs, followed by

another 3( n – 2)/2 comparisons, for a total of 3 n/2 – 2. Thus, in either case, the total number of comparisons is at most 3 ⌊ n/2⌋.
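The pairwise scheme above can be sketched in Python. The comparison counter is an illustrative addition used only to check the 3⌊n/2⌋ bound; it is not part of the algorithm itself:

```python
def min_max(A):
    """Find (minimum, maximum) of A with at most 3 * (len(A) // 2) comparisons."""
    n = len(A)
    comparisons = 0
    if n % 2 == 1:                   # odd n: both start at the first element
        lo = hi = A[0]
        start = 1
    else:                            # even n: 1 comparison sets the initial pair
        comparisons += 1
        lo, hi = (A[0], A[1]) if A[0] < A[1] else (A[1], A[0])
        start = 2
    for i in range(start, n, 2):     # process the rest in pairs
        comparisons += 3             # 1 to order the pair, 2 against lo and hi
        small, large = (A[i], A[i + 1]) if A[i] < A[i + 1] else (A[i + 1], A[i])
        if small < lo:
            lo = small
        if large > hi:
            hi = large
    assert comparisons <= 3 * (n // 2)
    return lo, hi
```

For example, `min_max([5, 2, 9, 1, 7])` returns `(1, 9)` after 6 comparisons, exactly the 3⌊5/2⌋ bound.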

Exercises

9.1-1

Show that the second smallest of n elements can be found with n + ⌈lg n⌉ – 2 comparisons in the worst case. ( Hint: Also find the smallest element.)

9.1-2

Given n > 2 distinct numbers, you want to find a number that is neither

the minimum nor the maximum. What is the smallest number of

comparisons that you need to perform?

9.1-3

A racetrack can run races with five horses at a time to determine their

relative speeds. For 25 horses, it takes six races to determine the fastest

horse, assuming transitivity (see page 1159). What’s the minimum

number of races it takes to determine the fastest three horses out of 25?

9.1-4

Prove the lower bound of ⌈3 n/2⌉ – 2 comparisons in the worst case to

find both the maximum and minimum of n numbers. ( Hint: Consider how many numbers are potentially either the maximum or minimum,

and investigate how a comparison affects these counts.)

9.2 Selection in expected linear time

The general selection problem—finding the i th order statistic for any value of i—appears more difficult than the simple problem of finding a

minimum. Yet, surprisingly, the asymptotic running time for both

problems is the same: Θ( n). This section presents a divide-and-conquer

algorithm for the selection problem. The algorithm RANDOMIZED-

SELECT is modeled after the quicksort algorithm of Chapter 7. Like quicksort, it partitions the input array recursively. But unlike quicksort,

which recursively processes both sides of the partition,

RANDOMIZED-SELECT works on only one side of the partition.

This difference shows up in the analysis: whereas quicksort has an

expected running time of Θ( n lg n), the expected running time of RANDOMIZED-SELECT is Θ( n), assuming that the elements are

distinct.

RANDOMIZED-SELECT uses the procedure RANDOMIZED-

PARTITION introduced in Section 7.3. Like RANDOMIZED-

QUICKSORT, it is a randomized algorithm, since its behavior is

determined in part by the output of a random-number generator. The

RANDOMIZED-SELECT procedure returns the i th smallest element

of the array A[ p : r], where 1 ≤ i ≤ r – p + 1.

RANDOMIZED-SELECT( A, p, r, i)
1  if p == r
2      return A[ p]      // 1 ≤ i ≤ r – p + 1 when p == r means that i = 1
3  q = RANDOMIZED-PARTITION( A, p, r)
4  k = q – p + 1
5  if i == k
6      return A[ q]      // the pivot value is the answer
7  elseif i < k
8      return RANDOMIZED-SELECT( A, p, q – 1, i)
9  else return RANDOMIZED-SELECT( A, q + 1, r, i – k)

Figure 9.1 illustrates how the RANDOMIZED-SELECT procedure

works. Line 1 checks for the base case of the recursion, in which the

subarray A[ p : r] consists of just one element. In this case, i must equal 1, and line 2 simply returns A[ p] as the i th smallest element. Otherwise, the call to RANDOMIZED-PARTITION in line 3 partitions the array

A[ p : r] into two (possibly empty) subarrays A[ p : q – 1] and A[ q + 1 : r]

such that each element of A[ p : q – 1] is less than or equal to A[ q], which in turn is less than each element of A[ q + 1 : r]. (Although our analysis assumes that the elements are distinct, the procedure still yields the

correct result even if equal elements are present.) As in quicksort, we’ll

refer to A[ q] as the pivot element. Line 4 computes the number k of elements in the subarray A[ p : q], that is, the number of elements in the low side of the partition, plus 1 for the pivot element. Line 5 then

checks whether A[ q] is the i th smallest element. If it is, then line 6

returns A[ q]. Otherwise, the algorithm determines in which of the two

subarrays A[ p: q – 1] and A[ q + 1 : r] the i th smallest element lies. If i < k, then the desired element lies on the low side of the partition, and line


8 recursively selects it from the subarray. If i > k, however, then the desired element lies on the high side of the partition. Since we already

know k values that are smaller than the i th smallest element of A[ p : r]—

namely, the elements of A[ p : q]—the desired element is the ( i – k)th smallest element of A[ q + 1 : r], which line 9 finds recursively. The code appears to allow recursive calls to subarrays with 0 elements, but

Exercise 9.2-1 asks you to show that this situation cannot happen.
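A runnable sketch of the procedure in Python follows. Since Section 7.3's pseudocode is not reproduced in this chapter, a Lomuto-style randomized_partition is written out here as an assumption, and 0-indexed closed intervals [p, r] replace the book's 1-indexed A[ p : r]:

```python
import random

def randomized_partition(A, p, r):
    """Partition A[p..r] around a randomly chosen pivot; return the pivot's index."""
    j = random.randint(p, r)
    A[j], A[r] = A[r], A[j]            # move the random pivot to the end
    pivot = A[r]
    i = p - 1
    for k in range(p, r):
        if A[k] <= pivot:              # grow the low side of the partition
            i += 1
            A[i], A[k] = A[k], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]    # place the pivot between the two sides
    return i + 1

def randomized_select(A, p, r, i):
    """Return the ith smallest element (i >= 1) of A[p..r], modifying A in place."""
    if p == r:
        return A[p]                    # base case: i must equal 1
    q = randomized_partition(A, p, r)
    k = q - p + 1                      # number of elements in A[p..q]
    if i == k:
        return A[q]                    # the pivot value is the answer
    elif i < k:
        return randomized_select(A, p, q - 1, i)
    else:
        return randomized_select(A, q + 1, r, i - k)
```

For example, `randomized_select(list(a), 0, len(a) - 1, 3)` returns the third smallest element of `a`.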

Figure 9.1 The action of RANDOMIZED-SELECT as successive partitionings narrow the subarray A[ p: r], showing the values of the parameters p, r, and i at each recursive call. The subarray A[ p : r] in each recursive step is shown in tan, with the dark tan element selected as the pivot for the next partitioning. Blue elements are outside A[ p : r]. The answer is the tan element in the bottom array, where p = r = 5 and i = 1. The array designations A(0), A(1), … , A(5), the partitioning numbers, and whether the partitioning is helpful are explained on the following page.

The worst-case running time for RANDOMIZED-SELECT is

Θ( n 2), even to find the minimum, because it could be extremely unlucky

and always partition around the largest remaining element before

identifying the i th smallest when only one element remains. In this worst

case, each recursive step removes only the pivot from consideration.

Because partitioning n elements takes Θ( n) time, the recurrence for the

worst-case running time is the same as for QUICKSORT: T ( n) = T ( n – 1) + Θ( n), with the solution T ( n) = Θ( n 2). We’ll see that the algorithm has a linear expected running time, however, and because it is

randomized, no particular input elicits the worst-case behavior.

To see the intuition behind the linear expected running time, suppose

that each time the algorithm randomly selects a pivot element, the pivot

lies somewhere within the second and third quartiles—the “middle

half”—of the remaining elements in sorted order. If the i th smallest element is less than the pivot, then all the elements greater than the

pivot are ignored in all future recursive calls. These ignored elements

include at least the uppermost quartile, and possibly more. Likewise, if

the i th smallest element is greater than the pivot, then all the elements

less than the pivot—at least the first quartile—are ignored in all future

recursive calls. Either way, therefore, at least 1/4 of the remaining

elements are ignored in all future recursive calls, leaving at most 3/4 of

the remaining elements in play: residing in the subarray A[ p : r]. Since RANDOMIZED-PARTITION takes Θ( n) time on a subarray of n

elements, the recurrence for the worst-case running time is T ( n) = T

(3 n/4) + Θ( n). By case 3 of the master method (Theorem 4.1 on page 102), this recurrence has solution T ( n) = Θ( n).

Of course, the pivot does not necessarily fall into the middle half

every time. Since the pivot is selected at random, the probability that it

falls into the middle half is about 1/2 each time. We can view the process

of selecting the pivot as a Bernoulli trial (see Section C.4) with success equating to the pivot residing in the middle half. Thus the expected

number of trials needed for success is given by a geometric distribution:

just two trials on average (equation (C.36) on page 1197). In other

words, we expect that half of the partitionings reduce the number of

elements still in play by at least 3/4 and that half of the partitionings do

not help as much. Consequently, the expected number of partitionings

at most doubles from the case when the pivot always falls into the

middle half. The cost of each extra partitioning is less than the one that

preceded it, so that the expected running time is still Θ( n).

To make the above argument rigorous, we start by defining the

random variable A( j) as the set of elements of A that are still in play

after j partitionings (that is, within the subarray A[ p : r] after j calls of RANDOMIZED-SELECT), so that A(0) consists of all the elements in

A. Since each partitioning removes at least one element—the pivot—

from being in play, the sequence | A(0)|, | A(1)|, | A(2)|, … strictly decreases. Set A( j–1) is in play before the j th partitioning, and set A( j) remains in play afterward. For convenience, assume that the initial set

A(0) is the result of a 0th “dummy” partitioning.

Let’s call the j th partitioning helpful if | A( j)| ≤ (3/4)| A( j–1)|. Figure

9.1 shows the sets A( j) and whether partitionings are helpful for an

example array. A helpful partitioning corresponds to a successful

Bernoulli trial. The following lemma shows that a partitioning is at least

as likely to be helpful as not.

Lemma 9.1

A partitioning is helpful with probability at least 1/2.

Proof Whether a partitioning is helpful depends on the randomly

chosen pivot. We discussed the “middle half” in the informal argument

above. Let’s more precisely define the middle half of an n-element

subarray as all but the smallest ⌈ n/4⌉ – 1 and greatest ⌈ n/4⌉ – 1 elements (that is, all but the first ⌈ n/4⌉ – 1 and last ⌈ n/4⌉ – 1 elements if the subarray were sorted). We’ll prove that if the pivot falls into the middle

half, then the pivot leads to a helpful partitioning, and we’ll also prove

that the probability of the pivot falling into the middle half is at least

1/2.

Regardless of where the pivot falls, either all the elements greater

than it or all the elements less than it, along with the pivot itself, will no

longer be in play after partitioning. If the pivot falls into the middle

half, therefore, at least ⌈ n/4⌉ – 1 elements less than the pivot or ⌈ n/4⌉ – 1

elements greater than the pivot, plus the pivot, will no longer be in play

after partitioning. That is, at least ⌈ n/4⌉ elements will no longer be in play. The number of elements remaining in play will be at most n

⌈ n/4⌉, which equals ⌊3 n/4⌋ by Exercise 3.3-2 on page 70. Since ⌊3 n/4⌋ ≤

3 n/4, the partitioning is helpful.


To determine a lower bound on the probability that a randomly

chosen pivot falls into the middle half, we determine an upper bound on

the probability that it does not. Exactly 2(⌈ n/4⌉ – 1) of the n elements lie outside the middle half, and each element is equally likely to be chosen as the pivot. That probability is therefore

2(⌈ n/4⌉ – 1)/ n ≤ 2(( n + 3)/4 – 1)/ n (because ⌈ n/4⌉ ≤ ( n + 3)/4)

= ( n – 1)/2 n

≤ 1/2.

Thus, the pivot has a probability of at least 1/2 of falling into the middle

half, and so the probability is at least 1/2 that a partitioning is helpful.
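As a quick numerical check of the lemma's bound (an illustrative addition, not part of the proof), the fraction of pivot positions that miss the middle half never exceeds 1/2:

```python
import math

def prob_not_middle_half(n):
    """Probability that a uniformly random pivot misses the middle half of n elements."""
    outside = 2 * (math.ceil(n / 4) - 1)   # smallest and greatest ceil(n/4) - 1 elements
    return outside / n

# the bound 2(ceil(n/4) - 1)/n <= 1/2 holds for every subarray size n >= 1
assert all(prob_not_middle_half(n) <= 1 / 2 for n in range(1, 10_000))
```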

We can now bound the expected running time of RANDOMIZED-

SELECT.

Theorem 9.2

The procedure RANDOMIZED-SELECT on an input array of n

distinct elements has an expected running time of Θ( n).

Proof Since not every partitioning is necessarily helpful, let’s give each

partitioning an index starting at 0 and denote by 〈 h 0, h 1, h 2, … , hm〉

the sequence of partitionings that are helpful, so that the hk th

partitioning is helpful for k = 0, 1, 2, … , m. Although the number m of helpful partitionings is a random variable, we can bound it, since after

at most ⌈log4/3 n⌉ helpful partitionings, only one element remains in

play. Consider the dummy 0th partitioning as helpful, so that h 0 = 0.

Denote | A( hk)| by nk, where n 0 = | A(0)| is the original problem size. Since the hk th partitioning is helpful and the sizes of the sets A( j) strictly decrease, we have nk ≤ (3/4) nk–1 for k = 1, 2, … , m. By iterating nk ≤ (3/4) nk–1, we have that nk ≤ (3/4) k n 0 for k = 0, 1, 2, … , m.


Figure 9.2 The sets within each generation in the proof of Theorem 9.2. Vertical lines represent the sets, with the height of each line indicating the size of the set, which equals the number of elements in play. Each generation starts with a set A( hk), which is the result of a helpful partitioning. These sets are drawn in black and are at most 3/4 the size of the sets to their immediate left. Sets drawn in orange are not the first within a generation. A generation may contain just one set. The sets in generation k are A( hk), A( hk + 1), … , A( hk+1 – 1). The sizes nk are defined so that nk = | A( hk)|. If the partitioning gets all the way to generation m, set A( hm) has at most one element in play.

As Figure 9.2 depicts, we break up the sequence of sets A( j) into m generations consisting of consecutively partitioned sets, starting with the result A( hk) of a helpful partitioning and ending with the last set A( hk+1 – 1) before the next helpful partitioning, so that the sets in generation k are A( hk), A( hk + 1), … , A( hk+1 – 1). Then for each set of elements A( j) in the k th generation, we have that | A( j)| ≤ | A( hk)| = nk.

Next, we define the random variable

Xk = hk + 1 – hk

for k = 0, 1, 2, … , m – 1. That is, Xk is the number of sets in the k th generation, so that the sets in the k th generation are A( hk), A( hk + 1), … , A( hk + Xk – 1).

By Lemma 9.1, the probability that a partitioning is helpful is at

least 1/2. The probability is actually even higher, since a partitioning is


helpful even if the pivot does not fall into the middle half but the i th smallest element happens to lie in the smaller side of the partitioning.

We’ll just use the lower bound of 1/2, however, and then equation (C.36)

gives that E [ Xk] ≤ 2 for k = 0, 1, 2, … , m – 1.

Let’s derive an upper bound on how many comparisons are made

altogether during partitioning, since the running time is dominated by

the comparisons. Since we are calculating an upper bound, assume that

the recursion goes all the way until only one element remains in play.

The j th partitioning takes the set A( j–1) of elements in play, and it compares the randomly chosen pivot with all the other | A( j–1)| – 1

elements, so that the j th partitioning makes fewer than | A( j–1)|

comparisons. The sets in the k th generation have sizes at most nk. Thus, the total number of comparisons during partitioning is less than

∑( k=0 to m–1) Xk nk.

Since E [ Xk] ≤ 2 and nk ≤ (3/4) k n 0, we have that the expected total number of comparisons during partitioning is less than

∑( k=0 to m–1) 2 (3/4) k n 0 ≤ 2 n 0 ∑( k=0 to ∞) (3/4) k = 2 n 0 · 4 = 8 n 0.


Since n 0 is the size of the original array A, we conclude that the expected number of comparisons, and thus the expected running time,

for RANDOMIZED-SELECT is O( n). All n elements are examined in

the first call of RANDOMIZED-PARTITION, giving a lower bound of

Ω( n). Hence the expected running time is Θ( n).
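The closing geometric-series bound can be checked numerically (an illustrative sketch, not part of the proof): the partial sums of ∑ 2 (3/4)^k stay below their limit 2 · 4 = 8, so the expected number of comparisons is less than 8 n 0 no matter how many generations occur.

```python
# Partial sums of sum over k of 2 * (3/4)^k approach, but never reach, 2 * 4 = 8.
partial = sum(2 * (3 / 4) ** k for k in range(50))
assert partial < 8             # every partial sum is strictly below the limit
assert 8 - partial < 1e-4      # and 50 terms already come within 1e-4 of it
```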