a. T(n) = 2T(n/4) + 1.

b. T(n) = 2T(n/4) + √n.

c. T(n) = 2T(n/4) + √n lg² n.

d. T(n) = 2T(n/4) + n.

e. T(n) = 2T(n/4) + n².
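For the parts whose driving functions are polynomials n^k, the master method reduces to comparing k with the watershed exponent log_4 2 = 1/2. A small sketch of that comparison (the helper `master_case` is ours, and it ignores case 3's regularity check, which polynomial driving functions satisfy):

```python
import math

# Classify T(n) = a*T(n/b) + n**k against the watershed n**(log_b a).
# Only driving functions of the form n**k are handled; parts (a), (d),
# and (e) above have k = 0, 1, and 2 respectively.
def master_case(a, b, k):
    watershed = math.log(a, b)  # exponent of the watershed function
    if k < watershed:
        return f"case 1: Theta(n^{watershed:g})"
    if k == watershed:
        return f"case 2: Theta(n^{watershed:g} lg n)"
    return f"case 3: Theta(n^{k:g})"

print(master_case(2, 4, 0))  # part (a)
print(master_case(2, 4, 1))  # part (d)
print(master_case(2, 4, 2))  # part (e)
```

Here the watershed is n^{1/2}, so a constant driving function falls in case 1 while n and n² fall in case 3.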

4.5-2

Professor Caesar wants to develop a matrix-multiplication algorithm that is asymptotically faster than Strassen's algorithm. His algorithm will use the divide-and-conquer method, dividing each matrix into n/4 × n/4 submatrices, and the divide and combine steps together will take Θ(n²) time. Suppose that the professor's algorithm creates a recursive subproblems of size n/4. What is the largest integer value of a for which his algorithm could possibly run asymptotically faster than Strassen's?
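Since the recurrence here is T(n) = aT(n/4) + Θ(n²), the master method gives Θ(n^{log_4 a}) whenever log_4 a > 2, and beating Strassen's Θ(n^{lg 7}) requires log_4 a < lg 7, i.e. √a < 7, i.e. a < 49. A quick numeric confirmation (a sketch; the brute-force search is ours):

```python
import math

# log_4(a) = lg(sqrt(a)), so the algorithm beats Strassen's n^(lg 7)
# exactly when sqrt(a) < 7, i.e. a < 49.  Search for the largest such a.
largest = max(a for a in range(1, 100) if math.sqrt(a) < 7)
print(largest)  # 48
```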

4.5-3

Use the master method to show that the solution to the binary-search recurrence T(n) = T(n/2) + Θ(1) is T(n) = Θ(lg n). (See Exercise 2.3-6 for a description of binary search.)
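As a reminder of where the recurrence comes from, here is one way binary search might be written (a sketch, not the book's pseudocode): each call does Θ(1) work and makes one recursive call on half the range, giving T(n) = T(n/2) + Θ(1).

```python
def binary_search(A, x, lo, hi):
    """Return an index of x in the sorted slice A[lo:hi], or None."""
    if lo >= hi:                      # empty range: Theta(1) base case
        return None
    mid = (lo + hi) // 2              # Theta(1) work per call
    if A[mid] == x:
        return mid
    if A[mid] < x:
        return binary_search(A, x, mid + 1, hi)  # one half-size subproblem
    return binary_search(A, x, lo, mid)

print(binary_search([2, 3, 5, 7, 11, 13], 7, 0, 6))  # 3
```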

4.5-4

Consider the function f(n) = lg n. Argue that although f(n/2) < f(n), the regularity condition af(n/b) ≤ cf(n) with a = 1 and b = 2 does not hold for any constant c < 1. Argue further that for any ϵ > 0, the condition in case 3 that f(n) = Ω(n^{log_b a + ϵ}) = Ω(n^ϵ) does not hold.

4.5-5

Show that for suitable constants a, b, and ϵ, the function f(n) = 2^⌈lg n⌉ satisfies all the conditions in case 3 of the master theorem except the regularity condition.

★ 4.6 Proof of the continuous master theorem

Proving the master theorem (Theorem 4.1) in its full generality,

especially dealing with the knotty technical issue of floors and ceilings,

is beyond the scope of this book. This section, however, states and

proves a variant of the master theorem, called the continuous master theorem,1 in which the master recurrence (4.17) is defined over sufficiently large positive real numbers. The proof of this version,

uncomplicated by floors and ceilings, contains the main ideas needed to

understand how master recurrences behave. Section 4.7 discusses floors and ceilings in divide-and-conquer recurrences at greater length,

presenting sufficient conditions for them not to affect the asymptotic

solutions.

Of course, since you need not understand the proof of the master

theorem in order to apply the master method, you may choose to skip

this section. But if you wish to study more-advanced algorithms beyond

the scope of this textbook, you may appreciate a better understanding

of the underlying mathematics, which the proof of the continuous

master theorem provides.

Although we usually assume that recurrences are algorithmic and

don’t require an explicit statement of a base case, we must be much

more careful for proofs that justify the practice. The lemmas and


theorem in this section explicitly state the base cases, because the

inductive proofs require mathematical grounding. It is common in the

world of mathematics to be extraordinarily careful proving theorems

that justify acting more casually in practice.

The proof of the continuous master theorem involves two lemmas.

Lemma 4.2 uses a slightly simplified master recurrence with a threshold

constant of n 0 = 1, rather than the more general n 0 > 0 threshold constant implied by the unstated base case. The lemma employs a

recursion tree to reduce the solution of the simplified master recurrence

to that of evaluating a summation. Lemma 4.3 then provides asymptotic

bounds for the summation, mirroring the three cases of the master

theorem. Finally, the continuous master theorem itself (Theorem 4.4)

gives asymptotic bounds for master recurrences, while generalizing to

an arbitrary threshold constant n0 > 0 as implied by the unstated base case.

Some of the proofs use the properties described in Problem 3-5 on

pages 72–73 to combine and simplify complicated asymptotic

expressions. Although Problem 3-5 addresses only Θ-notation, the

properties enumerated there can be extended to O-notation and Ω-

notation as well.

Here’s the first lemma.

Lemma 4.2

Let a > 0 and b > 1 be constants, and let f(n) be a function defined over real numbers n ≥ 1. Then the recurrence

T(n) = Θ(1) if 0 < n < 1, and T(n) = aT(n/b) + f(n) if n ≥ 1

has solution

T(n) = Θ(n^{log_b a}) + Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j).    (4.18)

Proof Consider the recursion tree in Figure 4.3. Let’s look first at its internal nodes. The root of the tree has cost f ( n), and it has a children,


each with cost f(n/b). (It is convenient to think of a as being an integer, especially when visualizing the recursion tree, but the mathematics does not require it.) Each of these children has a children, making a² nodes at depth 2, and each of the a² nodes at depth 2 has cost f(n/b²). In general, there are a^j nodes at depth j, and each node has cost f(n/b^j).

Now, let's move on to understanding the leaves. The tree grows downward until n/b^j becomes less than 1. Thus, the tree has height ⌊log_b n⌋ + 1, because

n/b^{⌊log_b n⌋} ≥ n/b^{log_b n} = 1

and

n/b^{⌊log_b n⌋+1} < n/b^{log_b n} = 1.

Since, as we have observed, the number of nodes at depth j is a^j and all the leaves are at depth ⌊log_b n⌋ + 1, the tree contains a^{⌊log_b n⌋+1} leaves.

Using the identity (3.21) on page 66, we have a^{log_b n} = n^{log_b a}, since a is constant, and a^{⌊log_b n⌋+1} = Θ(a^{log_b n}) = Θ(n^{log_b a}). Consequently, the total number of leaves is Θ(n^{log_b a})—asymptotically, the watershed function.

We are now in a position to derive equation (4.18) by summing the

costs of the nodes at each depth in the tree, as shown in the figure. The

first term in the equation is the total cost of the leaves. Since each leaf is at depth ⌊log_b n⌋ + 1 and n/b^{⌊log_b n⌋+1} < 1, the base case of the recurrence gives the cost of a leaf: T(n/b^{⌊log_b n⌋+1}) = Θ(1). Hence the cost of all a^{⌊log_b n⌋+1} leaves is Θ(a^{⌊log_b n⌋+1}) = Θ(n^{log_b a}) by Problem 3-5(d). The second term in equation (4.18) is the cost of the internal nodes, which, in the underlying divide-and-conquer algorithm, represents the costs of dividing problems into subproblems and then recombining the subproblems. Since the cost for all the internal nodes at depth j is a^j f(n/b^j), the total cost of all internal nodes is

Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j).


Figure 4.3 The recursion tree generated by T(n) = aT(n/b) + f(n). The tree is a complete a-ary tree with Θ(n^{log_b a}) leaves and height ⌊log_b n⌋ + 1. The cost of the nodes at each depth is shown at the right, and their sum is given in equation (4.18).

As we’ll see, the three cases of the master theorem depend on the

distribution of the total cost across levels of the recursion tree:

Case 1: The costs increase geometrically from the root to the leaves,

growing by a constant factor with each level.

Case 2: The costs depend on the value of k in the theorem. With k = 0,

the costs are equal for each level; with k = 1, the costs grow linearly

from the root to the leaves; with k = 2, the growth is quadratic; and in

general, the costs grow polynomially in k.

Case 3: The costs decrease geometrically from the root to the leaves,

shrinking by a constant factor with each level.
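The three distributions are easy to see numerically by tabulating the per-level cost a^j f(n/b^j). A sketch (the parameter choices are illustrative, not taken from the text):

```python
# Per-level cost a^j * f(n/b^j) of the recursion tree, for depths
# j = 0, 1, ..., floor(log_b n).
def level_costs(a, b, f, n):
    costs, j = [], 0
    while n / b**j >= 1:
        costs.append(a**j * f(n / b**j))
        j += 1
    return costs

n = 4096
print(level_costs(4, 2, lambda x: x, n)[:4])      # case 1: geometric increase
print(level_costs(2, 2, lambda x: x, n)[:4])      # case 2, k = 0: equal levels
print(level_costs(2, 2, lambda x: x * x, n)[:4])  # case 3: geometric decrease
```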

The summation in equation (4.18) describes the cost of the dividing

and combining steps in the underlying divide-and-conquer algorithm.

The next lemma provides asymptotic bounds on the summation’s

growth.


Lemma 4.3

Let a > 0 and b > 1 be constants, and let f(n) be a function defined over real numbers n ≥ 1. Then the asymptotic behavior of the function

g(n) = Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j),    (4.19)

defined for n ≥ 1, can be characterized as follows:

1. If there exists a constant ϵ > 0 such that f(n) = O(n^{log_b a − ϵ}), then g(n) = O(n^{log_b a}).

2. If there exists a constant k ≥ 0 such that f(n) = Θ(n^{log_b a} lg^k n), then g(n) = Θ(n^{log_b a} lg^{k+1} n).

3. If there exists a constant c in the range 0 < c < 1 such that 0 < af(n/b) ≤ cf(n) for all n ≥ 1, then g(n) = Θ(f(n)).

Proof For case 1, we have f(n) = O(n^{log_b a − ϵ}), which implies that f(n/b^j) = O((n/b^j)^{log_b a − ϵ}). Substituting into equation (4.19) yields

g(n) = O( Σ_{j=0}^{⌊log_b n⌋} a^j (n/b^j)^{log_b a − ϵ} )
     = O( n^{log_b a − ϵ} Σ_{j=0}^{⌊log_b n⌋} (b^ϵ)^j )
     = O( n^{log_b a − ϵ} · (b^{ϵ(⌊log_b n⌋+1)} − 1)/(b^ϵ − 1) ),

the last series being geometric. Since b and ϵ are constants, the b^ϵ − 1 denominator doesn't affect the asymptotic growth of g(n), and neither does the −1 in the numerator. Since b^{ϵ(⌊log_b n⌋+1)} ≤ b^{ϵ(log_b n + 1)} = b^ϵ n^ϵ = O(n^ϵ), we obtain g(n) = O(n^{log_b a − ϵ} · n^ϵ) = O(n^{log_b a}), thereby proving case 1.

Case 2 assumes that f(n) = Θ(n^{log_b a} lg^k n), from which we can conclude that f(n/b^j) = Θ((n/b^j)^{log_b a} lg^k(n/b^j)). Substituting into equation (4.19) and repeatedly applying Problem 3-5(c) yields

g(n) = Θ( Σ_{j=0}^{⌊log_b n⌋} a^j (n/b^j)^{log_b a} lg^k(n/b^j) )
     = Θ( n^{log_b a} Σ_{j=0}^{⌊log_b n⌋} lg^k(n/b^j) ).

The summation within the Θ-notation can be bounded from above as follows:

Σ_{j=0}^{⌊log_b n⌋} lg^k(n/b^j) ≤ (⌊log_b n⌋ + 1) lg^k n = O(lg^{k+1} n).

Exercise 4.6-1 asks you to show that the summation can similarly be bounded from below by Ω(lg^{k+1} n). Since we have tight upper and lower bounds, the summation is Θ(lg^{k+1} n), from which we can conclude that g(n) = Θ(n^{log_b a} lg^{k+1} n), thereby completing the proof of case 2.

For case 3, observe that f(n) appears in the definition (4.19) of g(n) (when j = 0) and that all terms of g(n) are positive. Therefore, we must have g(n) = Ω(f(n)), and it only remains to prove that g(n) = O(f(n)). Performing j iterations of the inequality af(n/b) ≤ cf(n) yields a^j f(n/b^j) ≤ c^j f(n). Substituting into equation (4.19), we obtain

g(n) = Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j)
     ≤ Σ_{j=0}^{⌊log_b n⌋} c^j f(n)
     ≤ f(n) Σ_{j=0}^{∞} c^j
     = f(n) · 1/(1 − c)
     = O(f(n)).

Thus, we can conclude that g(n) = Θ(f(n)). With case 3 proved, the entire proof of the lemma is complete.
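Case 3's geometric decay can be sanity-checked numerically. Assuming a = 2, b = 4, and f(n) = n² (our choices for the sketch), af(n/b) = n²/8, so the regularity condition holds with c = 1/8, and the geometric-series argument above promises g(n) ≤ f(n)/(1 − c):

```python
# g(n) = sum of a^j * f(n/b^j) over 0 <= j <= floor(log_b n), as in (4.19).
a, b, c = 2, 4, 1 / 8
f = lambda n: n * n

def g(n):
    total, j = 0.0, 0
    while n / b**j >= 1:
        total += a**j * f(n / b**j)
        j += 1
    return total

for n in [16, 256, 4096]:
    assert f(n) <= g(n) <= f(n) / (1 - c)
print("f(n) <= g(n) <= f(n)/(1 - c) holds")
```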

We can now state and prove the continuous master theorem.


Theorem 4.4 (Continuous master theorem)

Let a > 0 and b > 1 be constants, and let f(n) be a driving function that is defined and nonnegative on all sufficiently large reals. Define the algorithmic recurrence T(n) on the positive real numbers by

T(n) = aT(n/b) + f(n).

Then the asymptotic behavior of T(n) can be characterized as follows:

1. If there exists a constant ϵ > 0 such that f(n) = O(n^{log_b a − ϵ}), then T(n) = Θ(n^{log_b a}).

2. If there exists a constant k ≥ 0 such that f(n) = Θ(n^{log_b a} lg^k n), then T(n) = Θ(n^{log_b a} lg^{k+1} n).

3. If there exists a constant ϵ > 0 such that f(n) = Ω(n^{log_b a + ϵ}), and if f(n) additionally satisfies the regularity condition af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

Proof The idea is to bound the summation (4.18) from Lemma 4.2 by applying Lemma 4.3. But we must account for Lemma 4.2 using a base case for 0 < n < 1, whereas this theorem uses an implicit base case for 0 < n < n0, where n0 > 0 is an arbitrary threshold constant. Since the recurrence is algorithmic, we can assume that f(n) is defined for n ≥ n0. For n > 0, let us define two auxiliary functions T′(n) = T(n0 n) and f′(n) = f(n0 n). We have

T′(n) = T(n0 n)
      = aT(n0 n/b) + f(n0 n)
      = aT′(n/b) + f′(n)

for n ≥ 1, with T′(n) = Θ(1) for 0 < n < 1. We have obtained a recurrence for T′(n) that satisfies the conditions of Lemma 4.2, and by that lemma, the solution is

T′(n) = Θ(n^{log_b a}) + Σ_{j=0}^{⌊log_b n⌋} a^j f′(n/b^j).


To solve T ′( n), we first need to bound f ′( n). Let’s examine the individual cases in the theorem.

The condition for case 1 is f(n) = O(n^{log_b a − ϵ}) for some constant ϵ > 0. We have

f′(n) = f(n0 n) = O((n0 n)^{log_b a − ϵ}) = O(n^{log_b a − ϵ}),

since a, b, n0, and ϵ are all constant. The function f′(n) satisfies the conditions of case 1 of Lemma 4.3, and the summation in equation (4.18) of Lemma 4.2 evaluates to O(n^{log_b a}). Because a, b, and n0 are all constants, we have

T(n) = T′(n/n0)
     = Θ((n/n0)^{log_b a}) + O((n/n0)^{log_b a})
     = Θ(n^{log_b a}),

thereby completing case 1 of the theorem.

The condition for case 2 is f(n) = Θ(n^{log_b a} lg^k n) for some constant k ≥ 0. We have

f′(n) = f(n0 n) = Θ((n0 n)^{log_b a} lg^k(n0 n)) = Θ(n^{log_b a} lg^k n).

Similar to the proof of case 1, the function f′(n) satisfies the conditions of case 2 of Lemma 4.3. The summation in equation (4.18) of Lemma 4.2 is therefore Θ(n^{log_b a} lg^{k+1} n), which implies that

T(n) = T′(n/n0)
     = Θ((n/n0)^{log_b a}) + Θ((n/n0)^{log_b a} lg^{k+1}(n/n0))
     = Θ(n^{log_b a} lg^{k+1} n),

which proves case 2 of the theorem.

Finally, the condition for case 3 is f(n) = Ω(n^{log_b a + ϵ}) for some constant ϵ > 0, and f(n) additionally satisfies the regularity condition af(n/b) ≤ cf(n) for all n ≥ n0 and some constants c < 1 and n0 > 1. The first part of case 3 is like case 1:

f′(n) = f(n0 n) = Ω((n0 n)^{log_b a + ϵ}) = Ω(n^{log_b a + ϵ}).

Using the definition of f′(n) and the fact that n0 n ≥ n0 for all n ≥ 1, we have for n ≥ 1 that

af′(n/b) = af(n0 n/b)
         ≤ cf(n0 n)
         = cf′(n).

Thus f′(n) satisfies the requirements for case 3 of Lemma 4.3, and the summation in equation (4.18) of Lemma 4.2 evaluates to Θ(f′(n)), yielding

T(n) = T′(n/n0)
     = Θ((n/n0)^{log_b a}) + Θ(f′(n/n0))
     = Θ(f′(n/n0))
     = Θ(f(n)),

which completes the proof of case 3 of the theorem and thus the whole theorem.


Exercises

4.6-1

Show that Σ_{j=0}^{⌊log_b n⌋} lg^k(n/b^j) = Ω(lg^{k+1} n).

4.6-2

Show that case 3 of the master theorem is overstated (which is also why case 3 of Lemma 4.3 does not require that f(n) = Ω(n^{log_b a + ϵ})) in the sense that the regularity condition af(n/b) ≤ cf(n) for some constant c < 1 implies that there exists a constant ϵ > 0 such that f(n) = Ω(n^{log_b a + ϵ}).

4.6-3

For f(n) = n^{log_b a}/lg n, prove that the summation in equation (4.19) has solution g(n) = Θ(n^{log_b a} lg lg n). Conclude that a master recurrence T(n) using f(n) as its driving function has solution T(n) = Θ(n^{log_b a} lg lg n).

★ 4.7 Akra-Bazzi recurrences

This section provides an overview of two advanced topics related to

divide-and-conquer recurrences. The first deals with technicalities

arising from the use of floors and ceilings, and the second discusses the

Akra-Bazzi method, which involves a little calculus, for solving

complicated divide-and-conquer recurrences.

In particular, we’ll look at the class of algorithmic divide-and-

conquer recurrences originally studied by M. Akra and L. Bazzi [13].

These Akra-Bazzi recurrences take the form

T(n) = f(n) + Σ_{i=1}^{k} a_i T(n/b_i),    (4.22)

where k is a positive integer; all the constants a1, a2, … , ak ∈ ℝ are strictly positive; all the constants b1, b2, … , bk ∈ ℝ are strictly greater than 1; and the driving function f(n) is defined on sufficiently large nonnegative reals and is itself nonnegative.


Akra-Bazzi recurrences generalize the class of recurrences addressed

by the master theorem. Whereas master recurrences characterize the

running times of divide-and-conquer algorithms that break a problem

into equal-sized subproblems (modulo floors and ceilings), Akra-Bazzi

recurrences can describe the running time of divide-and-conquer

algorithms that break a problem into different-sized subproblems. The

master theorem, however, allows you to ignore floors and ceilings, but

the Akra-Bazzi method for solving Akra-Bazzi recurrences needs an

additional requirement to deal with floors and ceilings.

But before diving into the Akra-Bazzi method itself, let’s understand

the limitations involved in ignoring floors and ceilings in Akra-Bazzi

recurrences. As you’re aware, algorithms generally deal with integer-

sized inputs. The mathematics for recurrences is often easier with real

numbers, however, than with integers, where we must cope with floors

and ceilings to ensure that terms are well defined. The difference may

not seem to be much—especially because that’s often the truth with

recurrences—but to be mathematically correct, we must be careful with

our assumptions. Since our end goal is to understand algorithms and

not the vagaries of mathematical corner cases, we’d like to be casual yet

rigorous. How can we treat floors and ceilings casually while still

ensuring rigor?

From a mathematical point of view, the difficulty in dealing with

floors and ceilings is that some driving functions can be really, really

weird. So it’s not okay in general to ignore floors and ceilings in Akra-

Bazzi recurrences. Fortunately, most of the driving functions we

encounter in the study of algorithms behave nicely, and floors and

ceilings don’t make a difference.

The polynomial-growth condition

If the driving function f ( n) in equation (4.22) is well behaved in the following sense, it’s okay to drop floors and ceilings.

A function f(n) defined on all sufficiently large positive reals satisfies the polynomial-growth condition if there exists a constant n̂ > 0 such that the following holds: for every constant ϕ ≥ 1, there exists a constant d > 1 (depending on ϕ) such that f(n)/d ≤ f(ψn) ≤ df(n) for all 1 ≤ ψ ≤ ϕ and n ≥ n̂.

This definition may be one of the hardest in this textbook to get your

head around. To a first order, it says that f ( n) satisfies the property that f (Θ( n)) = Θ( f ( n)), although the polynomial-growth condition is actually somewhat stronger (see Exercise 4.7-4). The definition also

implies that f ( n) is asymptotically positive (see Exercise 4.7-3).

Examples of functions that satisfy the polynomial-growth condition include any function of the form f(n) = Θ(n^α lg^β n (lg lg n)^γ), where α, β, and γ are constants. Most of the polynomially bounded functions used in this book satisfy the condition. Exponentials and superexponentials do not (see Exercise 4.7-2, for example), and there also exist polynomially bounded functions that do not.

Floors and ceilings in “nice” recurrences

When the driving function in an Akra-Bazzi recurrence satisfies the

polynomial-growth condition, floors and ceilings don’t change the

asymptotic behavior of the solution. The following theorem, which is

presented without proof, formalizes this notion.

Theorem 4.5

Let T ( n) be a function defined on the nonnegative reals that satisfies recurrence (4.22), where f ( n) satisfies the polynomial-growth condition.

Let T′(n) be another function defined on the natural numbers also satisfying recurrence (4.22), except that each T(n/b_i) is replaced either with T(⌈n/b_i⌉) or with T(⌊n/b_i⌋). Then we have T′(n) = Θ(T(n)).

Floors and ceilings represent a minor perturbation to the arguments

in the recursion. By inequality (3.2) on page 64, they perturb an

argument by at most 1. But much larger perturbations are tolerable. As

long as the driving function f ( n) in recurrence (4.22) satisfies the polynomial-growth condition, it turns out that replacing any term T


(n/b_i) with T(n/b_i + h_i(n)), where |h_i(n)| = O(n/lg^{1+ϵ} n) for some constant ϵ > 0 and sufficiently large n, leaves the asymptotic solution unaffected. Thus, the divide step in a divide-and-conquer algorithm can

be moderately coarse without affecting the solution to its running-time

recurrence.

The Akra-Bazzi method

The Akra-Bazzi method, not surprisingly, was developed to solve Akra-Bazzi recurrences (4.22) and, by dint of Theorem 4.5, applies in the presence of floors and ceilings or even larger perturbations, as just discussed. The method involves first determining the unique real number p such that Σ_{i=1}^{k} a_i/b_i^p = 1. Such a p always exists, because when p → –∞, the sum goes to ∞; it decreases as p increases; and when p → ∞, it goes to 0. The Akra-Bazzi method then gives the solution to the recurrence as

T(n) = Θ( n^p (1 + ∫_1^n f(x)/x^{p+1} dx) ).    (4.23)

As an example, consider the recurrence

T(n) = T(n/5) + T(7n/10) + n.    (4.24)

We'll see the similar recurrence (9.1) on page 240 when we study an algorithm for selecting the ith smallest element from a set of n numbers. This recurrence has the form of equation (4.22), where a1 = a2 = 1, b1 = 5, b2 = 10/7, and f(n) = n. To solve it, the Akra-Bazzi method says that we should determine the unique p satisfying

(1/5)^p + (7/10)^p = 1.

Solving for p is kind of messy—it turns out that p = 0.83978 …—but we

can solve the recurrence without actually knowing the exact value for p.

Observe that (1/5)⁰ + (7/10)⁰ = 2 and (1/5)¹ + (7/10)¹ = 9/10, and thus p lies in the range 0 < p < 1. That turns out to be sufficient for the Akra-Bazzi method to give us the solution. We'll use the fact from calculus that if k ≠ –1, then ∫ x^k dx = x^{k+1}/(k + 1), which we'll apply with k = –p ≠ –1. The Akra-Bazzi solution (4.23) gives us

T(n) = Θ( n^p (1 + ∫_1^n x/x^{p+1} dx) )
     = Θ( n^p (1 + ∫_1^n x^{–p} dx) )
     = Θ( n^p (1 + (n^{1–p} – 1)/(1 – p)) )
     = Θ(n),

since 0 < p < 1 implies that the n^p · n^{1–p} = n term dominates.
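The exponent p need not be found in closed form: simple bisection on (1/5)^p + (7/10)^p = 1 recovers the value p = 0.83978… quoted above. A sketch:

```python
# The left-hand side minus 1 is strictly decreasing in p, with value 1 at
# p = 0 and -1/10 at p = 1, so the root lies in (0, 1) and bisection applies.
def h(p):
    return 0.2**p + 0.7**p - 1

lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if h(mid) > 0:     # still left of the root
        lo = mid
    else:
        hi = mid
print(round(lo, 5))  # 0.83978
```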

Although the Akra-Bazzi method is more general than the master

theorem, it requires calculus and sometimes a bit more reasoning. You

also must ensure that your driving function satisfies the polynomial-

growth condition if you want to ignore floors and ceilings, although

that’s rarely a problem. When it applies, the master method is much

simpler to use, but only when subproblem sizes are more or less equal.

They are both good tools for your algorithmic toolkit.

Exercises

4.7-1

Consider an Akra-Bazzi recurrence T(n) on the reals as given in recurrence (4.22), and define T′(n) as

T′(n) = cf(n) + Σ_{i=1}^{k} a_i T′(n/b_i),

where c > 0 is constant. Prove that whatever the implicit initial conditions for T(n) might be, there exist initial conditions for T′(n) such that T′(n) = cT(n) for all n > 0. Conclude that we can drop the asymptotics on a driving function in any Akra-Bazzi recurrence without affecting its asymptotic solution.

4.7-2

Show that f(n) = n² satisfies the polynomial-growth condition but that f(n) = 2ⁿ does not.
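A numeric illustration of the contrast (a sketch; the sampled values of ψ and n are ours): for f(n) = n², the ratio f(ψn)/f(n) = ψ² is bounded by d = ϕ² for all 1 ≤ ψ ≤ ϕ, independent of n, while for f(n) = 2ⁿ the ratio 2^((ψ−1)n) grows without bound in n, so no constant d can work.

```python
def ratio_range(f, psis, ns):
    # Extremes of f(psi*n)/f(n) over the sampled psi and n values.
    ratios = [f(psi * n) / f(n) for psi in psis for n in ns]
    return min(ratios), max(ratios)

psis = [1.0, 1.5, 2.0]  # 1 <= psi <= phi with phi = 2
print(ratio_range(lambda n: n**2, psis, [10, 100, 1000]))    # (1.0, 4.0)
print(ratio_range(lambda n: 2.0**n, psis, [10, 50, 100])[1])  # astronomically large
```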

4.7-3

Let f ( n) be a function that satisfies the polynomial-growth condition.

Prove that f(n) is asymptotically positive, that is, there exists a constant n0 ≥ 0 such that f(n) > 0 for all n ≥ n0.

4.7-4

Give an example of a function f ( n) that does not satisfy the polynomial-growth condition but for which f (Θ( n)) = Θ( f ( n)).

4.7-5

Use the Akra-Bazzi method to solve the following recurrences.

a. T ( n) = T ( n/2) + T ( n/3) + T ( n/6) + n lg n.

b. T ( n) = 3 T ( n/3) + 8 T ( n/4) + n 2/lg n.

c. T ( n) = (2/3) T ( n/3) + (1/3) T (2 n/3) + lg n.

d. T ( n) = (1/3) T ( n/3) + 1/ n.

e. T ( n) = 3 T ( n/3) + 3 T (2 n/3) + n 2.

4.7-6

Use the Akra-Bazzi method to prove the continuous master theorem.

Problems

4-1 Recurrence examples

Give asymptotically tight upper and lower bounds for T ( n) in each of

the following algorithmic recurrences. Justify your answers.


a. T(n) = 2T(n/2) + n³.

b. T(n) = T(8n/11) + n.

c. T(n) = 16T(n/4) + n².

d. T(n) = 4T(n/2) + n² lg n.

e. T(n) = 8T(n/3) + n².

f. T(n) = 7T(n/2) + n² lg n.

g.

.

h. T(n) = T(n − 2) + n².

4-2 Parameter-passing costs

Throughout this book, we assume that parameter passing during

procedure calls takes constant time, even if an N-element array is being

passed. This assumption is valid in most systems because a pointer to

the array is passed, not the array itself. This problem examines the

implications of three parameter-passing strategies:

1. Arrays are passed by pointer. Time = Θ(1).

2. Arrays are passed by copying. Time = Θ( N), where N is the size

of the array.

3. Arrays are passed by copying only the subrange that might be

accessed by the called procedure. Time = Θ( n) if the subarray

contains n elements.

Consider the following three algorithms:

a. The recursive binary-search algorithm for finding a number in a

sorted array (see Exercise 2.3-6).

b. The MERGE-SORT procedure from Section 2.3.1.

c. The MATRIX-MULTIPLY-RECURSIVE procedure from Section

4.1.


Give nine recurrences T_a1(N, n), T_a2(N, n), … , T_c3(N, n) for the worst-case running times of each of the three algorithms above when arrays

and matrices are passed using each of the three parameter-passing

strategies above. Solve your recurrences, giving tight asymptotic bounds.

4-3 Solving recurrences with a change of variables

Sometimes, a little algebraic manipulation can make an unknown recurrence similar to one you have seen before. Let's solve the recurrence

T(n) = 2T(√n) + Θ(lg n)    (4.25)

by using the change-of-variables method.

a. Define m = lg n and S(m) = T(2^m). Rewrite recurrence (4.25) in terms of m and S(m).

b. Solve your recurrence for S( m).

c. Use your solution for S( m) to conclude that T ( n) = Θ(lg n lg lg n).

d. Sketch the recursion tree for recurrence (4.25), and use it to explain

intuitively why the solution is T ( n) = Θ(lg n lg lg n).

Solve the following recurrences by changing variables:

e.

.

f.

.
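Assuming recurrence (4.25) is T(n) = 2T(√n) + Θ(lg n), consistent with the substitution in part (a) and the solution in part (c), the claimed Θ(lg n lg lg n) growth can be spot-checked by evaluating the recurrence directly (the base case T(n) = 1 for n ≤ 2 is our choice for the sketch):

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    # T(n) = 2*T(floor(sqrt(n))) + lg n, with an assumed Theta(1) base case.
    if n <= 2:
        return 1.0
    return 2 * T(math.isqrt(n)) + math.log2(n)

for n in [2**16, 2**64, 2**256]:
    ratio = T(n) / (math.log2(n) * math.log2(math.log2(n)))
    print(f"n = 2^{n.bit_length() - 1}: T(n)/(lg n lg lg n) = {ratio:.3f}")
```

The printed ratios settle toward a constant as n grows, matching part (c).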

4-4 More recurrence examples

Give asymptotically tight upper and lower bounds for T ( n) in each of

the following recurrences. Justify your answers.

a. T(n) = 5T(n/3) + n lg n.

b. T(n) = 3T(n/3) + n/lg n.

c.

.

d. T(n) = 2T(n/2 − 2) + n/2.

e. T(n) = 2T(n/2) + n/lg n.


f. T(n) = T(n/2) + T(n/4) + T(n/8) + n.

g. T(n) = T(n − 1) + 1/n.

h. T(n) = T(n − 1) + lg n.

i. T(n) = T(n − 2) + 1/lg n.

j.

.

4-5 Fibonacci numbers

This problem develops properties of the Fibonacci numbers, which are

defined by recurrence (3.31) on page 69. We’ll explore the technique of

generating functions to solve the Fibonacci recurrence. Define the generating function (or formal power series) F as

F(z) = Σ_{i=0}^{∞} F_i z^i,

where F_i is the ith Fibonacci number.

a. Show that F(z) = z + zF(z) + z²F(z).

b. Show that

F(z) = z/(1 − z − z²) = z/((1 − ϕz)(1 − ϕ̂z)) = (1/√5) (1/(1 − ϕz) − 1/(1 − ϕ̂z)),

where ϕ is the golden ratio, and ϕ̂ is its conjugate (see page 69).

c. Show that

F(z) = Σ_{i=0}^{∞} (1/√5)(ϕ^i − ϕ̂^i) z^i.

You may use without proof the generating-function version of equation (A.7) on page 1142, Σ_{k=0}^{∞} x^k = 1/(1 − x). Because this

equation involves a generating function, x is a formal variable, not a


real-valued variable, so that you don’t have to worry about

convergence of the summation or about the requirement in equation

(A.7) that | x| < 1, which doesn’t make sense here.

d. Use part (c) to prove that F_i = ϕ^i/√5 for i > 0, rounded to the nearest integer. (Hint: Observe that |ϕ̂| < 1.)

e. Prove that F_{i+2} ≥ ϕ^i for i ≥ 0.
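Parts (d) and (e) are easy to spot-check numerically before proving them (a sketch; the iterative `fib` helper is ours):

```python
import math

phi = (1 + math.sqrt(5)) / 2      # golden ratio
phi_hat = (1 - math.sqrt(5)) / 2  # its conjugate

def fib(i):
    # Iterative Fibonacci with F_0 = 0, F_1 = 1.
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

for i in range(1, 25):
    assert fib(i) == round(phi**i / math.sqrt(5))  # part (d)
    assert fib(i + 2) >= phi**i                    # part (e)
print("parts (d) and (e) check out for small i")
```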

4-6 Chip testing

Professor Diogenes has n supposedly identical integrated-circuit chips

that in principle are capable of testing each other. The professor’s test jig

accommodates two chips at a time. When the jig is loaded, each chip

tests the other and reports whether it is good or bad. A good chip

always reports accurately whether the other chip is good or bad, but the

professor cannot trust the answer of a bad chip. Thus, the four possible

outcomes of a test are as follows:

Chip A says    Chip B says    Conclusion
B is good      A is good      both are good, or both are bad
B is good      A is bad       at least one is bad
B is bad       A is good      at least one is bad
B is bad       A is bad       at least one is bad

a. Show that if at least n/2 chips are bad, the professor cannot

necessarily determine which chips are good using any strategy based

on this kind of pairwise test. Assume that the bad chips can conspire

to fool the professor.

Now you will design an algorithm to identify which chips are good and

which are bad, assuming that more than n/2 of the chips are good. First,

you will determine how to identify one good chip.

b. Show that ⌊n/2⌋ pairwise tests are sufficient to reduce the problem to one of nearly half the size. That is, show how to use ⌊n/2⌋ pairwise tests to obtain a set with at most ⌈n/2⌉ chips that still has the property that more than half of the chips are good.

c. Show how to apply the solution to part (b) recursively to identify one

good chip. Give and solve the recurrence that describes the number of

tests needed to identify one good chip.

You have now determined how to identify one good chip.

d. Show how to identify all the good chips with an additional Θ( n)

pairwise tests.

4-7 Monge arrays

An m × n array A of real numbers is a Monge array if for all i, j, k, and l such that 1 ≤ i < k ≤ m and 1 ≤ j < l ≤ n, we have A[i, j] + A[k, l] ≤ A[i, l] + A[k, j].

In other words, whenever we pick two rows and two columns of a

Monge array and consider the four elements at the intersections of the

rows and the columns, the sum of the upper-left and lower-right

elements is less than or equal to the sum of the lower-left and upper-

right elements. For example, the following array is Monge:

10 17 13 28 23

17 22 16 29 23

24 28 22 34 24

11 13 6 17 7

45 44 32 37 23

36 33 19 21 6

75 66 51 53 34

a. Prove that an array is Monge if and only if for all i = 1, 2, …, m – 1

and j = 1, 2, …, n – 1, we have

A[ i, j] + A[ i + 1, j + 1] ≤ A[ i, j + 1] + A[ i + 1, j].

( Hint: For the “if” part, use induction separately on rows and

columns.)
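Part (a)'s adjacent-2×2 criterion is easy to check mechanically; a sketch (`is_monge` is our helper) applied to the 7 × 5 Monge array shown earlier:

```python
def is_monge(A):
    # By part (a), it suffices to check every adjacent 2x2 submatrix.
    for i in range(len(A) - 1):
        for j in range(len(A[0]) - 1):
            if A[i][j] + A[i + 1][j + 1] > A[i][j + 1] + A[i + 1][j]:
                return False
    return True

example = [
    [10, 17, 13, 28, 23],
    [17, 22, 16, 29, 23],
    [24, 28, 22, 34, 24],
    [11, 13,  6, 17,  7],
    [45, 44, 32, 37, 23],
    [36, 33, 19, 21,  6],
    [75, 66, 51, 53, 34],
]
print(is_monge(example))  # True
```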

b. The following array is not Monge. Change one element in order to make it Monge. ( Hint: Use part (a).)

37 23 22 32

21  6  7 10

53 34 30 31

32 13  9  6

43 21 15  8

c. Let f ( i) be the index of the column containing the leftmost minimum element of row i. Prove that f (1) ≤ f (2) ≤ ⋯ ≤ f ( m) for any m × n Monge array.

d. Here is a description of a divide-and-conquer algorithm that

computes the leftmost minimum element in each row of an m × n

Monge array A:

Construct a submatrix A′ of A consisting of the even-numbered

rows of A. Recursively determine the leftmost minimum for

each row of A′. Then compute the leftmost minimum in the

odd-numbered rows of A.

Explain how to compute the leftmost minimum in the odd-numbered

rows of A (given that the leftmost minimum of the even-numbered

rows is known) in O( m + n) time.

e. Write the recurrence for the running time of the algorithm in part (d).

Show that its solution is O( m + n log m).

Chapter notes

Divide-and-conquer as a technique for designing algorithms dates back

at least to 1962 in an article by Karatsuba and Ofman [242], but it might have been used well before then. According to Heideman, Johnson, and

Burrus [211], C. F. Gauss devised the first fast Fourier transform algorithm in 1805, and Gauss’s formulation breaks the problem into

smaller subproblems whose solutions are combined.

Strassen’s algorithm [424] caused much excitement when it appeared in 1969. Before then, few imagined the possibility of an algorithm

asymptotically faster than the basic MATRIX-MULTIPLY procedure.

Shortly thereafter, S. Winograd reduced the number of submatrix

additions from 18 to 15 while still using seven submatrix multiplications.

This improvement, which Winograd apparently never published (and

which is frequently miscited in the literature), may enhance the

practicality of the method, but it does not affect its asymptotic

performance. Probert [368] described Winograd’s algorithm and showed that with seven multiplications, 15 additions is the minimum possible.

Strassen's Θ(n^{lg 7}) = O(n^{2.81}) bound for matrix multiplication held until 1987, when Coppersmith and Winograd [103] made a significant advance, improving the bound to O(n^{2.376}) time with a mathematically

sophisticated but wildly impractical algorithm based on tensor

products. It took approximately 25 years before the asymptotic upper

bound was again improved. In 2012 Vassilevska Williams [445]

improved it to O(n^{2.37287}), and two years later Le Gall [278] achieved O(n^{2.37286}), both of them using mathematically fascinating but

impractical algorithms. The best lower bound to date is just the obvious

Ω(n²) bound (obvious because any algorithm for matrix multiplication must fill in the n² elements of the product matrix).

The performance of MATRIX-MULTIPLY-RECURSIVE can be

improved in practice by coarsening the leaves of the recursion. It also

exhibits better cache behavior than MATRIX-MULTIPLY, although

MATRIX-MULTIPLY can be improved by “tiling.” Leiserson et al.

[293] conducted a performance-engineering study of matrix

multiplication in which a parallel and vectorized divide-and-conquer

algorithm achieved the highest performance. Strassen’s algorithm can be

practical for large dense matrices, although large matrices tend to be

sparse, and sparse methods can be much faster. When using limited-

precision floating-point values, Strassen’s algorithm produces larger

numerical errors than the Θ(n³) algorithms do, although Higham [215]

demonstrated that Strassen’s algorithm is amply accurate for some applications.

Recurrences were studied as early as 1202 by Leonardo Bonacci [66], also known as Fibonacci, for whom the Fibonacci numbers are named,

although Indian mathematicians had discovered Fibonacci numbers

centuries before. The French mathematician De Moivre [108]

introduced the method of generating functions with which he studied

Fibonacci numbers (see Problem 4-5). Knuth [259] and Liu [302] are good resources for learning the method of generating functions.

Aho, Hopcroft, and Ullman [5, 6] offered one of the first general methods for solving recurrences arising from the analysis of divide-and-conquer algorithms. The master method was adapted from Bentley,

Haken, and Saxe [52]. The Akra-Bazzi method is due (unsurprisingly) to Akra and Bazzi [13]. Divide-and-conquer recurrences have been studied by many researchers, including Campbell [79], Graham, Knuth, and Patashnik [199], Kuszmaul and Leiserson [274], Leighton [287], Purdom and Brown [371], Roura [389], Verma [447], and Yap [462].

The issue of floors and ceilings in divide-and-conquer recurrences,

including a theorem similar to Theorem 4.5, was studied by Leighton

[287]. Leighton proposed a version of the polynomial-growth condition.

Campbell [79] removed several limitations in Leighton’s statement of it and showed that there were polynomially bounded functions that do

not satisfy Leighton’s condition. Campbell also carefully studied many

other technical issues, including the well-definedness of divide-and-

conquer recurrences. Kuszmaul and Leiserson [274] provided a proof of Theorem 4.5 that does not involve calculus or other higher math. Both

Campbell and Leighton explored the perturbations of arguments

beyond simple floors and ceilings.

1 This terminology does not mean that either T ( n) or f ( n) need be continuous, only that the domain of T ( n) is the real numbers, as opposed to integers.

5 Probabilistic Analysis and Randomized

Algorithms

This chapter introduces probabilistic analysis and randomized

algorithms. If you are unfamiliar with the basics of probability theory,

you should read Sections C.1–C.4 of Appendix C, which review this material. We’ll revisit probabilistic analysis and randomized algorithms

several times throughout this book.

5.1 The hiring problem

Suppose that you need to hire a new office assistant. Your previous

attempts at hiring have been unsuccessful, and you decide to use an

employment agency. The employment agency sends you one candidate

each day. You interview that person and then decide either to hire that

person or not. You must pay the employment agency a small fee to

interview an applicant. To actually hire an applicant is more costly,

however, since you must fire your current office assistant and also pay a

substantial hiring fee to the employment agency. You are committed to

having, at all times, the best possible person for the job. Therefore, you

decide that, after interviewing each applicant, if that applicant is better

qualified than the current office assistant, you will fire the current office

assistant and hire the new applicant. You are willing to pay the resulting

price of this strategy, but you wish to estimate what that price will be.

The procedure HIRE-ASSISTANT on the facing page expresses this

strategy for hiring in pseudocode. The candidates for the office assistant

job are numbered 1 through n and interviewed in that order. The procedure assumes that after interviewing candidate i, you can

determine whether candidate i is the best candidate you have seen so far.

It starts by creating a dummy candidate, numbered 0, who is less

qualified than each of the other candidates.

The cost model for this problem differs from the model described in

Chapter 2. We focus not on the running time of HIRE-ASSISTANT, but instead on the fees paid for interviewing and hiring. On the surface,

analyzing the cost of this algorithm may seem very different from

analyzing the running time of, say, merge sort. The analytical

techniques used, however, are identical whether we are analyzing cost or

running time. In either case, we are counting the number of times

certain basic operations are executed.

HIRE-ASSISTANT(n)
1  best = 0      // candidate 0 is a least-qualified dummy candidate
2  for i = 1 to n
3      interview candidate i
4      if candidate i is better than candidate best
5          best = i
6          hire candidate i

Interviewing has a low cost, say ci, whereas hiring is expensive, costing ch. Letting m be the number of people hired, the total cost associated with this algorithm is O(ci n + ch m). No matter how many people you hire, you always interview n candidates and thus always incur the cost ci n associated with interviewing. We therefore concentrate on analyzing ch m, the hiring cost. This quantity depends on the order in which you interview candidates.
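To make the cost model concrete, here is a minimal Python sketch of this strategy. The function name and the use of numeric scores to model candidate quality are assumptions of the sketch, not part of the text.

```python
def hire_assistant(scores):
    """Interview candidates in the given order, hiring whenever a
    candidate beats the best seen so far. Returns (interviews, hires),
    so the total cost is ci * interviews + ch * hires."""
    best = float("-inf")    # stands in for the dummy candidate 0
    interviews = 0
    hires = 0
    for score in scores:
        interviews += 1     # the interview fee ci is paid for every candidate
        if score > best:    # candidate is better than the current best
            best = score
            hires += 1      # the hiring fee ch is paid only on an upgrade
    return interviews, hires
```

On candidates arriving in strictly increasing order of quality, every candidate is hired; on strictly decreasing order, only the first one is.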

This scenario serves as a model for a common computational

paradigm. Algorithms often need to find the maximum or minimum

value in a sequence by examining each element of the sequence and

maintaining a current “winner.” The hiring problem models how often

a procedure updates its notion of which element is currently winning.

Worst-case analysis

In the worst case, you actually hire every candidate that you interview.

This situation occurs if the candidates come in strictly increasing order

of quality, in which case you hire n times, for a total hiring cost of O(ch n).

Of course, the candidates do not always come in increasing order of

quality. In fact, you have no idea about the order in which they arrive,

nor do you have any control over this order. Therefore, it is natural to

ask what we expect to happen in a typical or average case.

Probabilistic analysis

Probabilistic analysis is the use of probability in the analysis of

problems. Most commonly, we use probabilistic analysis to analyze the

running time of an algorithm. Sometimes we use it to analyze other

quantities, such as the hiring cost in procedure HIRE-ASSISTANT. In

order to perform a probabilistic analysis, we must use knowledge of, or

make assumptions about, the distribution of the inputs. Then we

analyze our algorithm, computing an average-case running time, where

we take the average, or expected value, over the distribution of the

possible inputs. When reporting such a running time, we refer to it as

the average-case running time.

You must be careful in deciding on the distribution of inputs. For

some problems, you may reasonably assume something about the set of

all possible inputs, and then you can use probabilistic analysis as a

technique for designing an efficient algorithm and as a means for

gaining insight into a problem. For other problems, you cannot

characterize a reasonable input distribution, and in these cases you

cannot use probabilistic analysis.

For the hiring problem, we can assume that the applicants come in a

random order. What does that mean for this problem? We assume that

you can compare any two candidates and decide which one is better

qualified, which is to say that there is a total order on the candidates.

(See Section B.2 for the definition of a total order.) Thus, you can rank each candidate with a unique number from 1 through n, using rank( i) to denote the rank of applicant i, and adopt the convention that a higher

rank corresponds to a better qualified applicant. The ordered list

〈rank(1), rank(2), … , rank(n)〉 is a permutation of the list 〈1, 2, … , n〉.

Saying that the applicants come in a random order is equivalent to

saying that this list of ranks is equally likely to be any one of the n!

permutations of the numbers 1 through n. Alternatively, we say that the

ranks form a uniform random permutation, that is, each of the possible n!

permutations appears with equal probability.

Section 5.2 contains a probabilistic analysis of the hiring problem.

Randomized algorithms

In order to use probabilistic analysis, you need to know something

about the distribution of the inputs. In many cases, you know little

about the input distribution. Even if you do know something about the

distribution, you might not be able to model this knowledge

computationally. Yet, probability and randomness often serve as tools

for algorithm design and analysis, by making part of the algorithm

behave randomly.

In the hiring problem, it may seem as if the candidates are being

presented to you in a random order, but you have no way of knowing

whether they really are. Thus, in order to develop a randomized

algorithm for the hiring problem, you need greater control over the

order in which you’ll interview the candidates. We will, therefore,

change the model slightly. The employment agency sends you a list of

the n candidates in advance. On each day, you choose, randomly, which

candidate to interview. Although you know nothing about the

candidates (besides their names), we have made a significant change.

Instead of accepting the order given to you by the employment agency

and hoping that it’s random, you have instead gained control of the

process and enforced a random order.

More generally, we call an algorithm randomized if its behavior is

determined not only by its input but also by values produced by a

random-number generator. We assume that we have at our disposal a

random-number generator RANDOM. A call to RANDOM( a, b)

returns an integer between a and b, inclusive, with each such integer being equally likely. For example, RANDOM(0, 1) produces 0 with

probability 1/2, and it produces 1 with probability 1/2. A call to RANDOM(3, 7) returns any one of 3, 4, 5, 6, or 7, each with

probability 1/5. Each integer returned by RANDOM is independent of

the integers returned on previous calls. You may imagine RANDOM as

rolling a (b – a + 1)-sided die to obtain its output. (In practice, most programming environments offer a pseudorandom-number generator: a

deterministic algorithm returning numbers that “look” statistically

random.)
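Python's standard library provides such a pseudorandom-number generator, so RANDOM can be sketched directly. The seeding call below is only to make the pseudorandom sequence reproducible.

```python
import random

def RANDOM(a, b):
    """Return an integer between a and b inclusive, each of the
    b - a + 1 values equally likely (a sketch on top of Python's
    pseudorandom-number generator)."""
    return random.randint(a, b)

random.seed(1)  # seeding makes the pseudorandom sequence reproducible
draws = [RANDOM(3, 7) for _ in range(10_000)]
# Each of 3, 4, 5, 6, 7 should appear roughly 1/5 of the time.
```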

When analyzing the running time of a randomized algorithm, we

take the expectation of the running time over the distribution of values

returned by the random number generator. We distinguish these

algorithms from those in which the input is random by referring to the

running time of a randomized algorithm as an expected running time. In

general, we discuss the average-case running time when the probability

distribution is over the inputs to the algorithm, and we discuss the

expected running time when the algorithm itself makes random choices.

Exercises

5.1-1

Show that the assumption that you are always able to determine which

candidate is best, in line 4 of procedure HIRE-ASSISTANT, implies

that you know a total order on the ranks of the candidates.

5.1-2

Describe an implementation of the procedure RANDOM( a, b) that

makes calls only to RANDOM(0, 1). What is the expected running time

of your procedure, as a function of a and b?

5.1-3

You wish to implement a program that outputs 0 with probability 1/2

and 1 with probability 1/2. At your disposal is a procedure BIASED-

RANDOM that outputs either 0 or 1, but it outputs 1 with some

probability p and 0 with probability 1 – p, where 0 < p < 1. You do not know what p is. Give an algorithm that uses BIASED-RANDOM as a

subroutine, and returns an unbiased answer, returning 0 with


probability 1/2 and 1 with probability 1/2. What is the expected running

time of your algorithm as a function of p?

5.2 Indicator random variables

In order to analyze many algorithms, including the hiring problem, we

use indicator random variables. Indicator random variables provide a

convenient method for converting between probabilities and

expectations. Given a sample space S and an event A, the indicator random variable I {A} associated with event A is defined as

I {A} = 1 if A occurs, or 0 if A does not occur.     (5.1)

As a simple example, let us determine the expected number of heads

obtained when flipping a fair coin. The sample space for a single coin

flip is S = { H, T}, with Pr { H} = Pr { T} = 1/2. We can then define an indicator random variable XH, associated with the coin coming up

heads, which is the event H. This variable counts the number of heads

obtained in this flip, and it is 1 if the coin comes up heads and 0

otherwise. We write

XH = I {H} = 1 if the coin comes up heads, or 0 if it comes up tails.

The expected number of heads obtained in one flip of the coin is simply

the expected value of our indicator variable XH:

E [ XH] = E [I { H}]

= 1 · Pr { H} + 0 · Pr { T}

= 1 · (1/2) + 0 · (1/2)

= 1/2.

Thus the expected number of heads obtained by one flip of a fair coin is

1/2. As the following lemma shows, the expected value of an indicator


random variable associated with an event A is equal to the probability

that A occurs.

Lemma 5.1

Given a sample space S and an event A in the sample space S, let XA =

I { A}. Then E [ XA] = Pr { A}.

Proof By the definition of an indicator random variable from equation

(5.1) and the definition of expected value, we have

E [XA] = E [I {A}]
       = 1 · Pr {A} + 0 · Pr {Ā}
       = Pr {A},

where Ā denotes S – A, the complement of A.

Although indicator random variables may seem cumbersome for an

application such as counting the expected number of heads on a flip of a

single coin, they are useful for analyzing situations that perform

repeated random trials. In Appendix C, for example, indicator random variables provide a simple way to determine the expected number of

heads in n coin flips. One option is to consider separately the probability

of obtaining 0 heads, 1 head, 2 heads, etc. to arrive at the result of

equation (C.41) on page 1199. Alternatively, we can employ the simpler

method proposed in equation (C.42), which uses indicator random

variables implicitly. Making this argument more explicit, let Xi be the indicator random variable associated with the event in which the i th flip

comes up heads: Xi = I {the i th flip results in the event H}. Let X be the random variable denoting the total number of heads in the n coin flips,

so that

X = X1 + X2 + ⋯ + Xn.

In order to compute the expected number of heads, take the expectation of both sides of the above equation to obtain

E [X] = E [X1 + X2 + ⋯ + Xn].     (5.2)

By Lemma 5.1, the expectation of each of the random variables is E [Xi] = 1/2 for i = 1, 2, … , n. Then we can compute the sum of the expectations: E [X1] + E [X2] + ⋯ + E [Xn] = n/2. But equation (5.2) calls for the

expectation of the sum, not the sum of the expectations. How can we

resolve this conundrum? Linearity of expectation, equation (C.24) on

page 1192, to the rescue: the expectation of the sum always equals the

sum of the expectations. Linearity of expectation applies even when

there is dependence among the random variables. Combining indicator

random variables with linearity of expectation gives us a powerful

technique to compute expected values when multiple events occur. We

now can compute the expected number of heads:

E [X] = E [X1 + X2 + ⋯ + Xn] = E [X1] + E [X2] + ⋯ + E [Xn] = n/2.

Thus, compared with the method used in equation (C.41), indicator

random variables greatly simplify the calculation. We use indicator

random variables throughout this book.
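As a quick check of this analysis, a short Python simulation (a sketch; the trial count is arbitrary) estimates the expected number of heads in n flips by averaging the sum of the indicators over many trials:

```python
import random

def expected_heads_simulated(n, trials=50_000, seed=0):
    """Estimate E[X] = E[X_1 + ... + X_n] for n fair coin flips by
    averaging the sum of the indicator variables over many trials."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # rng.randint(0, 1) plays the role of X_i = I{the i-th flip is heads}
        total += sum(rng.randint(0, 1) for _ in range(n))
    return total / trials
```

The estimate should be close to n/2, in agreement with the calculation above.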

Analysis of the hiring problem using indicator random variables

Returning to the hiring problem, we now wish to compute the expected

number of times that you hire a new office assistant. In order to use a

probabilistic analysis, let’s assume that the candidates arrive in a

random order, as discussed in Section 5.1. (We’ll see in Section 5.3 how


to remove this assumption.) Let X be the random variable whose value

equals the number of times you hire a new office assistant. We could

then apply the definition of expected value from equation (C.23) on

page 1192 to obtain

E [X] = 1 · Pr {X = 1} + 2 · Pr {X = 2} + ⋯ + n · Pr {X = n},

but this calculation would be cumbersome. Instead, let’s simplify the

calculation by using indicator random variables.

To use indicator random variables, instead of computing E [ X] by

defining just one variable denoting the number of times you hire a new

office assistant, think of the process of hiring as repeated random trials

and define n variables indicating whether each particular candidate is hired. In particular, let Xi be the indicator random variable associated

with the event in which the i th candidate is hired. Thus,

Xi = I {candidate i is hired} = 1 if candidate i is hired, or 0 if candidate i is not hired,

and

X = X1 + X2 + ⋯ + Xn.

Lemma 5.1 gives

E [ Xi] = Pr {candidate i is hired},

and we must therefore compute the probability that lines 5–6 of HIRE-

ASSISTANT are executed.

Candidate i is hired, in line 6, exactly when candidate i is better than each of candidates 1 through i – 1. Because we have assumed that the

candidates arrive in a random order, the first i candidates have appeared

in a random order. Any one of these first i candidates is equally likely to

be the best qualified so far. Candidate i has a probability of 1/ i of being better qualified than candidates 1 through i – 1 and thus a probability of

1/i of being hired. By Lemma 5.1, we conclude that

E [Xi] = 1/i.

Now we can compute E [X]:

E [X] = E [X1 + X2 + ⋯ + Xn]
      = E [X1] + E [X2] + ⋯ + E [Xn]
      = 1/1 + 1/2 + ⋯ + 1/n
      = ln n + O(1),     (5.6)

where the last line follows because the harmonic series sums to ln n + O(1).

Even though you interview n people, you actually hire only

approximately ln n of them, on average. We summarize this result in the

following lemma.

Lemma 5.2

Assuming that the candidates are presented in a random order,

algorithm HIRE-ASSISTANT has an average-case total hiring cost of

O( ch ln n).

Proof The bound follows immediately from our definition of the hiring

cost and equation (5.6), which shows that the expected number of hires

is approximately ln n.

The average-case hiring cost is a significant improvement over the

worst-case hiring cost of O(ch n).
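A small Python simulation (a sketch, with an arbitrary trial count) can corroborate this bound by averaging the number of hires over random permutations of the ranks and comparing against the harmonic number 1 + 1/2 + ⋯ + 1/n:

```python
import random

def average_hires(n, trials=5_000, seed=0):
    """Estimate the expected number of hires made by HIRE-ASSISTANT
    when the ranks of the n candidates form a uniform random permutation."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))
    total = 0
    for _ in range(trials):
        rng.shuffle(ranks)
        best = 0            # rank of the dummy candidate 0
        hires = 0
        for r in ranks:
            if r > best:    # lines 5-6 of HIRE-ASSISTANT execute
                best = r
                hires += 1
        total += hires
    return total / trials
```

For n = 100 the estimate should be close to the harmonic number H100 ≈ 5.19, which is ln 100 + O(1).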

Exercises

5.2-1

In HIRE-ASSISTANT, assuming that the candidates are presented in a

random order, what is the probability that you hire exactly one time?

What is the probability that you hire exactly n times?

5.2-2

In HIRE-ASSISTANT, assuming that the candidates are presented in a

random order, what is the probability that you hire exactly twice?

5.2-3

Use indicator random variables to compute the expected value of the

sum of n dice.

5.2-4

This exercise asks you to (partly) verify that linearity of expectation

holds even if the random variables are not independent. Consider two 6-

sided dice that are rolled independently. What is the expected value of

the sum? Now consider the case where the first die is rolled normally

and then the second die is set equal to the value shown on the first die.

What is the expected value of the sum? Now consider the case where the

first die is rolled normally and the second die is set equal to 7 minus the

value of the first die. What is the expected value of the sum?

5.2-5

Use indicator random variables to solve the following problem, which is

known as the hat-check problem. Each of n customers gives a hat to a

hat-check person at a restaurant. The hat-check person gives the hats

back to the customers in a random order. What is the expected number

of customers who get back their own hat?

5.2-6

Let A[1 : n] be an array of n distinct numbers. If i < j and A[ i] > A[ j], then the pair ( i, j) is called an inversion of A. (See Problem 2-4 on page 47 for more on inversions.) Suppose that the elements of A form a

uniform random permutation of 〈1, 2, … , n〉. Use indicator random

variables to compute the expected number of inversions.

5.3 Randomized algorithms

In the previous section, we showed how knowing a distribution on the

inputs can help us to analyze the average-case behavior of an algorithm.

What if you do not know the distribution? Then you cannot perform an

average-case analysis. As mentioned in Section 5.1, however, you might be able to use a randomized algorithm.

For a problem such as the hiring problem, in which it is helpful to

assume that all permutations of the input are equally likely, a

probabilistic analysis can guide us when developing a randomized

algorithm. Instead of assuming a distribution of inputs, we impose a distribution. In particular, before running the algorithm, let’s randomly

permute the candidates in order to enforce the property that every

permutation is equally likely. Although we have modified the algorithm,

we still expect to hire a new office assistant approximately ln n times.

But now we expect this to be the case for any input, rather than for inputs drawn from a particular distribution.

Let us further explore the distinction between probabilistic analysis

and randomized algorithms. In Section 5.2, we claimed that, assuming that the candidates arrive in a random order, the expected number of

times you hire a new office assistant is about ln n. This algorithm is deterministic: for any particular input, the number of times a new office

assistant is hired is always the same. Furthermore, the number of times

you hire a new office assistant differs for different inputs, and it depends

on the ranks of the various candidates. Since this number depends only

on the ranks of the candidates, to represent a particular input, we can

just list, in order, the ranks 〈 rank(1), rank(2), … , rank( n)〉 of the candidates. Given the rank list A 1 = 〈1, 2, 3, 4, 5, 6, 7, 8, 9, 10〉, a new

office assistant is always hired 10 times, since each successive candidate

is better than the previous one, and lines 5–6 of HIRE-ASSISTANT are

executed in each iteration. Given the list of ranks A 2 = 〈10, 9, 8, 7, 6, 5,

4, 3, 2, 1〉, a new office assistant is hired only once, in the first iteration.

Given a list of ranks A 3 = 〈5, 2, 1, 8, 4, 7, 10, 9, 3, 6〉, a new office assistant is hired three times, upon interviewing the candidates with

ranks 5, 8, and 10. Recalling that the cost of our algorithm depends on

how many times you hire a new office assistant, we see that there are

expensive inputs such as A 1, inexpensive inputs such as A 2, and moderately expensive inputs such as A 3.

Consider, on the other hand, the randomized algorithm that first permutes the list of candidates and then determines the best candidate.

In this case, we randomize in the algorithm, not in the input

distribution. Given a particular input, say A 3 above, we cannot say how

many times the maximum is updated, because this quantity differs with

each run of the algorithm. The first time you run the algorithm on A 3, it

might produce the permutation A 1 and perform 10 updates. But the

second time you run the algorithm, it might produce the permutation

A 2 and perform only one update. The third time you run the algorithm,

it might perform some other number of updates. Each time you run the

algorithm, its execution depends on the random choices made and is

likely to differ from the previous execution of the algorithm. For this

algorithm and many other randomized algorithms, no particular input

elicits its worst-case behavior. Even your worst enemy cannot produce a

bad input array, since the random permutation makes the input order

irrelevant. The randomized algorithm performs badly only if the

random-number generator produces an “unlucky” permutation.

For the hiring problem, the only change needed in the code is to

randomly permute the array, as done in the RANDOMIZED-HIRE-

ASSISTANT procedure. This simple change creates a randomized

algorithm whose performance matches that obtained by assuming that

the candidates were presented in a random order.

RANDOMIZED-HIRE-ASSISTANT(n)
1  randomly permute the list of candidates
2  HIRE-ASSISTANT(n)
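A self-contained Python sketch of this randomized variant might look as follows; using numeric scores to model candidate quality is an assumption of the sketch, not part of the pseudocode.

```python
import random

def randomized_hire_assistant(scores, rng=random):
    """Randomly permute the candidate list, then run the hiring strategy
    on the permuted order. Returns the number of hires made."""
    candidates = list(scores)
    rng.shuffle(candidates)      # line 1: randomly permute the list of candidates
    best = float("-inf")         # stands in for the dummy candidate 0
    hires = 0
    for s in candidates:         # line 2: HIRE-ASSISTANT on the permuted order
        if s > best:
            best = s
            hires += 1
    return hires
```

Because of the shuffle, the number of hires can differ from run to run even on the same input.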

Lemma 5.3

The expected hiring cost of the procedure RANDOMIZED-HIRE-

ASSISTANT is O( ch ln n).

Proof Permuting the input array achieves a situation identical to that

of the probabilistic analysis of HIRE-ASSISTANT in Section 5.2.

By carefully comparing Lemmas 5.2 and 5.3, you can see the

difference between probabilistic analysis and randomized algorithms.

Lemma 5.2 makes an assumption about the input. Lemma 5.3 makes no

such assumption, although randomizing the input takes some

additional time. To remain consistent with our terminology, we couched

Lemma 5.2 in terms of the average-case hiring cost and Lemma 5.3 in

terms of the expected hiring cost. In the remainder of this section, we

discuss some issues involved in randomly permuting inputs.

Randomly permuting arrays

Many randomized algorithms randomize the input by permuting a

given input array. We’ll see elsewhere in this book other ways to

randomize an algorithm, but now, let’s see how we can randomly

permute an array of n elements. The goal is to produce a uniform random permutation, that is, a permutation that is as likely as any other

permutation. Since there are n! possible permutations, we want the

probability that any particular permutation is produced to be 1/ n!.

You might think that to prove that a permutation is a uniform

random permutation, it suffices to show that, for each element A[ i], the probability that the element winds up in position j is 1/ n. Exercise 5.3-4

shows that this weaker condition is, in fact, insufficient.

Our method to generate a random permutation permutes the array

in place: at most a constant number of elements of the input array are

ever stored outside the array. The procedure RANDOMLY-

PERMUTE permutes an array A[1 : n] in place in Θ( n) time. In its i th iteration, it chooses the element A[ i] randomly from among elements A[ i] through A[ n]. After the i th iteration, A[ i] is never altered.

RANDOMLY-PERMUTE(A, n)
1  for i = 1 to n
2      swap A[i] with A[RANDOM(i, n)]
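In Python, RANDOMLY-PERMUTE can be sketched as follows, with the pseudocode's 1-based indices shifted to Python's 0-based ones:

```python
import random

def randomly_permute(A, rng=random):
    """Permute A in place and return it. In the i-th iteration the
    element A[i] is swapped with a uniformly chosen element of A[i:]
    (0-based indices here, versus 1-based in the pseudocode)."""
    n = len(A)
    for i in range(n):
        j = rng.randint(i, n - 1)    # corresponds to RANDOM(i, n)
        A[i], A[j] = A[j], A[i]      # A[i] is never altered after iteration i
    return A
```

Over many runs on a 3-element array, all 3! = 6 permutations should show up, each with probability 1/6.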

We use a loop invariant to show that procedure RANDOMLY-

PERMUTE produces a uniform random permutation. A k-permutation

on a set of n elements is a sequence containing k of the n elements, with no repetitions. (See page 1180 in Appendix C.) There are n!/(n – k)! such possible k-permutations.

Lemma 5.4

Procedure RANDOMLY-PERMUTE computes a uniform random

permutation.

Proof We use the following loop invariant:

Just prior to the i th iteration of the for loop of lines 1–2, for each possible (i – 1)-permutation of the n elements, the subarray A[1 : i – 1] contains this (i – 1)-permutation with probability (n – i + 1)!/n!.

We need to show that this invariant is true prior to the first loop

iteration, that each iteration of the loop maintains the invariant, that

the loop terminates, and that the invariant provides a useful property to

show correctness when the loop terminates.

Initialization: Consider the situation just before the first loop iteration,

so that i = 1. The loop invariant says that for each possible 0-

permutation, the subarray A[1 : 0] contains this 0-permutation with

probability (n – i + 1)!/n! = n!/n! = 1. The subarray A[1 : 0] is an empty subarray, and a 0-permutation has no elements. Thus, A[1 : 0]

contains any 0-permutation with probability 1, and the loop invariant

holds prior to the first iteration.

Maintenance: By the loop invariant, we assume that just before the i th

iteration, each possible ( i – 1)-permutation appears in the subarray

A[1 : i – 1] with probability ( ni + 1)!/ n!. We shall show that after the i th iteration, each possible i-permutation appears in the subarray A[1 : i] with probability ( ni)!/ n!. Incrementing i for the next iteration then maintains the loop invariant.


Let us examine the i th iteration. Consider a particular i-permutation, and denote the elements in it by 〈x1, x2, … , xi〉. This permutation consists of an (i – 1)-permutation 〈x1, … , xi–1〉 followed by the value xi that the algorithm places in A[i]. Let E1 denote the event in which the first i – 1 iterations have created the particular (i – 1)-permutation 〈x1, … , xi–1〉 in A[1 : i – 1]. By the loop invariant, Pr {E1} = (n – i + 1)!/n!. Let E2 be the event that the i th iteration puts xi in position A[i].

The i-permutation 〈 x 1, … , xi〉 appears in A[1 : i] precisely when both E 1 and E 2 occur, and so we wish to compute Pr { E 2 ∩ E 1}. Using equation (C.16) on page 1187, we have

Pr { E 2 ∩ E 1} = Pr { E 2 | E 1} Pr { E 1}.

The probability Pr {E2 | E1} equals 1/(n – i + 1) because in line 2 the algorithm chooses xi randomly from the n – i + 1 values in positions A[i : n]. Thus, we have

Pr {E2 ∩ E1} = Pr {E2 | E1} Pr {E1}
             = (1/(n – i + 1)) · ((n – i + 1)!/n!)
             = (n – i)!/n!.

Termination: The loop terminates, since it is a for loop iterating n times.

At termination, i = n + 1, and we have that the subarray A[1 : n] is a given n-permutation with probability ( n – ( n + 1) + 1)!/ n! = 0!/ n! =

1/ n!.

Thus, RANDOMLY-PERMUTE produces a uniform random

permutation.

A randomized algorithm is often the simplest and most efficient way

to solve a problem.

Exercises

5.3-1

Professor Marceau objects to the loop invariant used in the proof of

Lemma 5.4. He questions whether it holds prior to the first iteration. He

reasons that we could just as easily declare that an empty subarray

contains no 0-permutations. Therefore, the probability that an empty

subarray contains a 0-permutation should be 0, thus invalidating the

loop invariant prior to the first iteration. Rewrite the procedure

RANDOMLY-PERMUTE so that its associated loop invariant applies

to a nonempty subarray prior to the first iteration, and modify the

proof of Lemma 5.4 for your procedure.

5.3-2

Professor Kelp decides to write a procedure that produces at random

any permutation except the identity permutation, in which every element

ends up where it started. He proposes the procedure PERMUTE-

WITHOUT-IDENTITY. Does this procedure do what Professor Kelp

intends?

PERMUTE-WITHOUT-IDENTITY(A, n)
1  for i = 1 to n – 1
2      swap A[i] with A[RANDOM(i + 1, n)]

5.3-3

Consider the PERMUTE-WITH-ALL procedure on the facing page,

which instead of swapping element A[ i] with a random element from the

subarray A[ i : n], swaps it with a random element from anywhere in the array. Does PERMUTE-WITH-ALL produce a uniform random

permutation? Why or why not?

PERMUTE-WITH-ALL(A, n)
1  for i = 1 to n
2      swap A[i] with A[RANDOM(1, n)]

5.3-4

Professor Knievel suggests the procedure PERMUTE-BY-CYCLE to

generate a uniform random permutation. Show that each element A[ i]

has a 1/ n probability of winding up in any particular position in B. Then show that Professor Knievel is mistaken by showing that the resulting

permutation is not uniformly random.

PERMUTE-BY-CYCLE(A, n)
1  let B[1 : n] be a new array
2  offset = RANDOM(1, n)
3  for i = 1 to n
4      dest = i + offset
5      if dest > n
6          dest = dest – n
7      B[dest] = A[i]
8  return B

5.3-5

Professor Gallup wants to create a random sample of the set {1, 2, 3, … ,

n}, that is, an m-element subset S, where 0 ≤ m ≤ n, such that each m-subset is equally likely to be created. One way is to set A[i] = i, for i = 1, 2, 3, … , n, call RANDOMLY-PERMUTE(A, n), and then take just the

first m array elements. This method makes n calls to the RANDOM

procedure. In Professor Gallup’s application, n is much larger than m,

and so the professor wants to create a random sample with fewer calls

to RANDOM.

RANDOM-SAMPLE(m, n)
1  S = ∅
2  for k = n – m + 1 to n      // iterates m times
3      i = RANDOM(1, k)
4      if i ∈ S
5          S = S ⋃ {k}
6      else S = S ⋃ {i}
7  return S

Show that the procedure RANDOM-SAMPLE on the previous page

returns a random m-subset S of {1, 2, 3, … , n}, in which each m-subset is equally likely, while making only m calls to RANDOM.
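A direct Python transcription of RANDOM-SAMPLE (a sketch only; it does not substitute for the proof the exercise asks for) might look like this:

```python
import random

def random_sample(m, n, rng=random):
    """Return a random m-subset of {1, ..., n} using only m calls to the
    random-number generator, transcribing RANDOM-SAMPLE into Python."""
    S = set()
    for k in range(n - m + 1, n + 1):   # iterates m times
        i = rng.randint(1, k)
        if i in S:
            S.add(k)     # k cannot already be in S, so |S| grows by one
        else:
            S.add(i)
    return S
```

Each iteration adds exactly one new element, so the result always has exactly m elements.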

★ 5.4 Probabilistic analysis and further uses of indicator

random variables

This advanced section further illustrates probabilistic analysis by way of

four examples. The first determines the probability that in a room of k

people, two of them share the same birthday. The second example

examines what happens when randomly tossing balls into bins. The

third investigates “streaks” of consecutive heads when flipping coins.

The final example analyzes a variant of the hiring problem in which you

have to make decisions without actually interviewing all the candidates.

5.4.1 The birthday paradox

Our first example is the birthday paradox. How many people must there

be in a room before there is a 50% chance that two of them were born

on the same day of the year? The answer is surprisingly few. The

paradox is that it is in fact far fewer than the number of days in a year,

or even half the number of days in a year, as we shall see.

To answer this question, we index the people in the room with the

integers 1, 2, … , k, where k is the number of people in the room. We

ignore the issue of leap years and assume that all years have n = 365

days. For i = 1, 2, … , k, let bi be the day of the year on which person i’s birthday falls, where 1 ≤ bi ≤ n. We also assume that birthdays are uniformly distributed across the n days of the year, so that Pr {bi = r} =

1/ n for i = 1, 2, … , k and r = 1, 2, … , n.

The probability that two given people, say i and j, have matching birthdays depends on whether the random selection of birthdays is

independent. We assume from now on that birthdays are independent,

so that the probability that i’s birthday and j’s birthday both fall on day r is

Pr {bi = r and bj = r} = Pr {bi = r} Pr {bj = r} = 1/n².

Thus, the probability that they both fall on the same day is

Pr {bi = bj} = n · (1/n²) = 1/n,     (5.7)

summing the probability 1/n² over the n equally likely days on which both birthdays could coincide.

More intuitively, once bi is chosen, the probability that bj is chosen to be the same day is 1/ n. As long as the birthdays are independent, the

probability that i and j have the same birthday is the same as the probability that the birthday of one of them falls on a given day.

We can analyze the probability of at least 2 out of k people having

matching birthdays by looking at the complementary event. The

probability that at least two of the birthdays match is 1 minus the

probability that all the birthdays are different. The event Bk that k people have distinct birthdays is

Bk = A1 ∩ A2 ∩ ⋯ ∩ Ak,

where Ai is the event that person i’s birthday is different from person j’s for all j < i. Since we can write Bk = Ak ∩ Bk–1, we obtain from equation (C.18) on page 1189 the recurrence

Pr { Bk} = Pr { Bk–1} Pr { Ak | Bk–1},                  (5.8)

where we take Pr { B 1} = Pr { A 1} = 1 as an initial condition. In other words, the probability that b 1, b 2, … , bk are distinct birthdays equals the probability that b 1, b 2, … , bk–1 are distinct birthdays multiplied by the probability that bk ≠ bi for i = 1, 2, … , k – 1, given that b 1, b 2, … , bk–1 are distinct.

If b 1, b 2, … , bk–1 are distinct, the conditional probability that bk ≠ bi for i = 1, 2, … , k – 1 is Pr { Ak | Bk–1} = ( n – k + 1)/ n, since out of the n days, n – ( k – 1) days are not taken. We iteratively apply the recurrence (5.8) to obtain

Pr { Bk} = Pr { Bk–1} Pr { Ak | Bk–1}
         = Pr { Bk–2} Pr { Ak–1 | Bk–2} Pr { Ak | Bk–1}
         ⋮
         = Pr { B 1} Pr { A 2 | B 1} Pr { A 3 | B 2} ⋯ Pr { Ak | Bk–1}
         = 1 · (( n – 1)/ n)(( n – 2)/ n) ⋯ (( n – k + 1)/ n).

Inequality (3.14) on page 66, 1 + x ≤ e^x, gives us

Pr { Bk} = (1 – 1/ n)(1 – 2/ n) ⋯ (1 – ( k – 1)/ n)
         ≤ e^(–1/ n) e^(–2/ n) ⋯ e^(–( k – 1)/ n)
         = e^(– k( k – 1)/2 n)
         ≤ 1/2

when – k( k – 1)/2 n ≤ ln(1/2). The probability that all k birthdays are distinct is at most 1/2 when k( k – 1) ≥ 2 n ln 2 or, solving the quadratic equation, when k ≥ (1 + √(1 + (8 ln 2) n))/2. For n = 365, we must have k ≥ 23. Thus, if at least 23 people are in a room, the probability is at least

1/2 that at least two people have the same birthday. Since a year on

Mars is 669 Martian days long, it takes 31 Martians to get the same

effect.
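The thresholds above are easy to check numerically. The following Python sketch (function names are our own) iterates the product for Pr { Bk} exactly, rather than using the e^x bound:

```python
# Exact birthday-paradox calculation: for k people and n equally likely
# birthdays, Pr{all distinct} = (1 - 1/n)(1 - 2/n)...(1 - (k-1)/n).

def prob_all_distinct(k, n):
    """Probability that k birthdays drawn uniformly from n days are all distinct."""
    p = 1.0
    for i in range(k):
        p *= (n - i) / n
    return p

def people_needed(n):
    """Smallest k for which Pr{some matching pair} >= 1/2."""
    k = 1
    while 1.0 - prob_all_distinct(k, n) < 0.5:
        k += 1
    return k
```

For n = 365 this reproduces the threshold of 23 people (the probability that all 23 birthdays are distinct is about 0.493), and for a 669-day Martian year it gives 31.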

An analysis using indicator random variables

Indicator random variables afford a simpler but approximate analysis of

the birthday paradox. For each pair ( i, j) of the k people in the room,

define the indicator random variable Xij, for 1 ≤ i < j ≤ k, by

Xij = I {person i and person j have the same birthday}
    = 1 if person i and person j have the same birthday,
      0 otherwise.

By equation (5.7), the probability that two people have matching birthdays is 1/ n, and thus by Lemma 5.1 on page 130, we have

E [ Xij] = Pr {person i and person j have the same birthday}
         = 1/ n.

Letting X be the random variable that counts the number of pairs of individuals having the same birthday, we have

X = Σi=1..k Σj=i+1..k Xij.

Taking expectations of both sides and applying linearity of expectation, we obtain

E [ X] = E [ Σi=1..k Σj=i+1..k Xij ]
       = Σi=1..k Σj=i+1..k E [ Xij ]
       = ( k choose 2) · (1/ n)
       = k( k – 1)/2 n.

When k( k – 1) ≥ 2 n, therefore, the expected number of pairs of people with the same birthday is at least 1. Thus, if we have at least √(2 n) + 1 individuals in a room, we can expect at least two to have the same

birthday. For n = 365, if k = 28, the expected number of pairs with the same birthday is (28 · 27)/(2 · 365) ≈ 1.0356. Thus, with at least 28


people, we expect to find at least one matching pair of birthdays. On

Mars, with 669 days per year, we need at least 38 Martians.
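The indicator-variable bound is even simpler to evaluate. A small sketch (helper names are our own) computes E [ X] = k( k – 1)/2 n and the smallest k that makes it at least 1:

```python
# Expected number of matching-birthday pairs among k people and n days,
# from the linearity-of-expectation analysis: E[X] = k(k-1)/(2n).

def expected_pairs(k, n):
    return k * (k - 1) / (2 * n)

def people_for_one_expected_pair(n):
    """Smallest k with k(k-1) >= 2n, i.e. E[X] >= 1."""
    k = 1
    while k * (k - 1) < 2 * n:
        k += 1
    return k
```

For n = 365 this gives k = 28 with expected_pairs(28, 365) ≈ 1.0356, and for n = 669 it gives k = 38, matching the figures in the text.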

The first analysis, which used only probabilities, determined the

number of people required for the probability to exceed 1/2 that a

matching pair of birthdays exists, and the second analysis, which used

indicator random variables, determined the number such that the

expected number of matching birthdays is 1. Although the exact

numbers of people differ for the two situations, they are the same

asymptotically: Θ(√ n).

5.4.2 Balls and bins

Consider a process in which you randomly toss identical balls into b

bins, numbered 1, 2, … , b. The tosses are independent, and on each toss the ball is equally likely to end up in any bin. The probability that a

tossed ball lands in any given bin is 1/ b. If we view the ball-tossing process as a sequence of Bernoulli trials (see Appendix C.4), where success means that the ball falls in the given bin, then each trial has a

probability 1/ b of success. This model is particularly useful for analyzing

hashing (see Chapter 11), and we can answer a variety of interesting questions about the ball-tossing process. (Problem C-2 asks additional

questions about balls and bins.)

How many balls fall in a given bin? The number of balls that fall in

a given bin follows the binomial distribution b( k; n, 1/ b). If you toss n balls, equation (C.41) on page 1199 tells us that the

expected number of balls that fall in the given bin is n/ b.

How many balls must you toss, on the average, until a given bin

contains a ball? The number of tosses until the given bin receives a

ball follows the geometric distribution with probability 1/ b and,

by equation (C.36) on page 1197, the expected number of tosses

until success is 1/(1/ b) = b.
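The geometric-distribution claim is easy to confirm by simulation. In this sketch (bin labels, trial counts, and function names are our own choices), we repeatedly toss until a fixed bin receives a ball and average the toss counts:

```python
import random

# Simulating the expected number of tosses until a given bin (bin 0)
# receives a ball.  Each toss succeeds with probability 1/b, so the
# number of tosses is geometric with mean b.

def tosses_until_hit(b, rng):
    tosses = 0
    while True:
        tosses += 1
        if rng.randrange(b) == 0:   # ball landed in the given bin
            return tosses

def average_tosses(b, trials, seed=0):
    rng = random.Random(seed)
    return sum(tosses_until_hit(b, rng) for _ in range(trials)) / trials
```

With b = 10 and a few tens of thousands of trials, the average comes out close to 10, as the analysis predicts.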

How many balls must you toss until every bin contains at least one

ball? Let us call a toss in which a ball falls into an empty bin a

“hit.” We want to know the expected number n of tosses required

to get b hits.


Using the hits, we can partition the n tosses into stages. The i th

stage consists of the tosses after the ( i – 1)st hit up to and

including the i th hit. The first stage consists of the first toss, since

you are guaranteed to have a hit when all bins are empty. For each

toss during the i th stage, i – 1 bins contain balls and b – i + 1 bins are empty. Thus, for each toss in the i th stage, the probability of

obtaining a hit is ( bi + 1)/ b.

Let ni denote the number of tosses in the i th stage. The number of tosses required to get b hits is n = n 1 + n 2 + ⋯ + nb. Each random variable ni has a geometric distribution with probability of success ( b – i + 1)/ b and thus, by equation (C.36), we have

E [ ni] = b/( b – i + 1).

By linearity of expectation, we have

E [ n] = E [ n 1 + n 2 + ⋯ + nb]
       = Σi=1..b E [ ni]
       = Σi=1..b b/( b – i + 1)
       = b Σj=1..b 1/ j     (reindexing with j = b – i + 1)
       = b(ln b + O(1)).

It therefore takes approximately b ln b tosses before we can expect

that every bin has a ball. This problem is also known as the

coupon collector’s problem, which says that if you are trying to

collect each of b different coupons, then you should expect to

acquire approximately b ln b randomly obtained coupons in order

to succeed.
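A quick numerical check of the b ln b estimate (the helper name is ours): the exact expectation is b times the b th harmonic number.

```python
import math

# Exact expected number of tosses until every one of b bins is hit:
# E[n] = sum over i of b/(b - i + 1) = b * H_b, where H_b is the b-th
# harmonic number.  This is the coupon collector's expectation.

def expected_tosses(b):
    return b * sum(1 / j for j in range(1, b + 1))
```

For b = 100, expected_tosses(100) ≈ 518.7, while 100 ln 100 ≈ 460.5; the gap is the O(1)-per-term slack in the harmonic-series approximation.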


5.4.3 Streaks

Suppose that you flip a fair coin n times. What is the longest streak of

consecutive heads that you expect to see? We’ll prove upper and lower

bounds separately to show that the answer is Θ(lg n).
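Before proving the bounds, the Θ(lg n) behavior is easy to observe empirically. This sketch (function names are our own) measures the longest run of heads in a flip sequence:

```python
import random

# Longest streak of consecutive heads in a coin-flip sequence.  For a
# fair coin and n flips, the longest streak is typically close to lg n.

def longest_streak(flips):
    """Length of the longest run of heads ('H') in a sequence of flips."""
    best = run = 0
    for f in flips:
        run = run + 1 if f == 'H' else 0
        best = max(best, run)
    return best

def random_flips(n, rng=random):
    return [rng.choice('HT') for _ in range(n)]
```

With n = 2^16 = 65536 flips, the longest streak usually lands near lg n = 16, in line with the Θ(lg n) answer derived below.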

We first prove that the expected length of the longest streak of heads

is O(lg n). The probability that each coin flip is a head is 1/2. Let Aik be the event that a streak of heads of length at least k begins with the i th coin flip or, more precisely, the event that the k consecutive coin flips i, i

+ 1, … , i + k – 1 yield only heads, where 1 ≤ k ≤ n and 1 ≤ i ≤ n – k + 1.