a. T(n) = 2T(n/4) + 1.
b. T(n) = 2T(n/4) + √n.
c. T(n) = 2T(n/4) + √n lg²n.
d. T(n) = 2T(n/4) + n.
e. T(n) = 2T(n/4) + n².
4.5-2
Professor Caesar wants to develop a matrix-multiplication algorithm
that is asymptotically faster than Strassen’s algorithm. His algorithm
will use the divide-and-conquer method, dividing each matrix into n/4 × n/4 submatrices, and the divide and combine steps together will take Θ(n²) time. Suppose that the professor’s algorithm creates a recursive subproblems of size n/4. What is the largest integer value of a for which his algorithm could possibly run asymptotically faster than Strassen’s?
4.5-3
Use the master method to show that the solution to the binary-search
recurrence T ( n) = T ( n/2) + Θ(1) is T ( n) = Θ(lg n). (See Exercise 2.3-6
for a description of binary search.)
4.5-4
Consider the function f(n) = lg n. Argue that although f(n/2) < f(n), the regularity condition af(n/b) ≤ cf(n) with a = 1 and b = 2 does not hold for any constant c < 1. Argue further that for any ϵ > 0, the condition in case 3 that f(n) = Ω(n^ϵ) does not hold.
4.5-5
Show that for suitable constants a, b, and ϵ, the function f(n) = 2^⌈lg n⌉ satisfies all the conditions in case 3 of the master theorem except the regularity condition.
★ 4.6 Proof of the continuous master theorem
Proving the master theorem (Theorem 4.1) in its full generality,
especially dealing with the knotty technical issue of floors and ceilings,
is beyond the scope of this book. This section, however, states and
proves a variant of the master theorem, called the continuous master
theorem,¹ in which the master recurrence (4.17) is defined over sufficiently large positive real numbers. The proof of this version,
uncomplicated by floors and ceilings, contains the main ideas needed to
understand how master recurrences behave. Section 4.7 discusses floors and ceilings in divide-and-conquer recurrences at greater length,
presenting sufficient conditions for them not to affect the asymptotic
solutions.
Of course, since you need not understand the proof of the master
theorem in order to apply the master method, you may choose to skip
this section. But if you wish to study more-advanced algorithms beyond
the scope of this textbook, you may appreciate a better understanding
of the underlying mathematics, which the proof of the continuous
master theorem provides.
Although we usually assume that recurrences are algorithmic and
don’t require an explicit statement of a base case, we must be much
more careful for proofs that justify the practice. The lemmas and theorem in this section explicitly state the base cases, because the
inductive proofs require mathematical grounding. It is common in the
world of mathematics to be extraordinarily careful proving theorems
that justify acting more casually in practice.
The proof of the continuous master theorem involves two lemmas.
Lemma 4.2 uses a slightly simplified master recurrence with a threshold
constant of n₀ = 1, rather than the more general n₀ > 0 threshold constant implied by the unstated base case. The lemma employs a
recursion tree to reduce the solution of the simplified master recurrence
to that of evaluating a summation. Lemma 4.3 then provides asymptotic
bounds for the summation, mirroring the three cases of the master
theorem. Finally, the continuous master theorem itself (Theorem 4.4)
gives asymptotic bounds for master recurrences, while generalizing to
an arbitrary threshold constant n₀ > 0 as implied by the unstated base case.
Some of the proofs use the properties described in Problem 3-5 on
pages 72–73 to combine and simplify complicated asymptotic
expressions. Although Problem 3-5 addresses only Θ-notation, the
properties enumerated there can be extended to O-notation and Ω-
notation as well.
Here’s the first lemma.
Lemma 4.2
Let a > 0 and b > 1 be constants, and let f(n) be a function defined over real numbers n ≥ 1. Then the recurrence

    T(n) = Θ(1)             if 0 < n < 1,
    T(n) = aT(n/b) + f(n)   if n ≥ 1

has solution

    T(n) = Θ(n^{log_b a}) + Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j).    (4.18)
Proof Consider the recursion tree in Figure 4.3. Let’s look first at its internal nodes. The root of the tree has cost f(n), and it has a children, each with cost f(n/b). (It is convenient to think of a as being an integer, especially when visualizing the recursion tree, but the mathematics does not require it.) Each of these children has a children, making a² nodes at depth 2, and each of the a² nodes has cost f(n/b²). In general, there are a^j nodes at depth j, and each node has cost f(n/b^j).
Now, let’s move on to understanding the leaves. The tree grows downward until n/b^j becomes less than 1. Thus, the tree has height ⌊log_b n⌋ + 1, because

    n/b^{⌊log_b n⌋} ≥ n/b^{log_b n} = 1

and

    n/b^{⌊log_b n⌋+1} < n/b^{log_b n} = 1.

Since, as we have observed, the number of nodes at depth j is a^j and all the leaves are at depth ⌊log_b n⌋ + 1, the tree contains a^{⌊log_b n⌋+1} leaves.
Using the identity (3.21) on page 66, we have a^{⌊log_b n⌋+1} = a · a^{⌊log_b n⌋} = Θ(a^{log_b n}), since a is a constant, and a^{log_b n} = n^{log_b a}. Consequently, the total number of leaves is Θ(n^{log_b a})—asymptotically, the watershed function.
We are now in a position to derive equation (4.18) by summing the costs of the nodes at each depth in the tree, as shown in the figure. The first term in the equation is the total cost of the leaves. Since each leaf is at depth ⌊log_b n⌋ + 1 and 1/b ≤ n/b^{⌊log_b n⌋+1} < 1, the base case of the recurrence gives the cost of a leaf: T(n/b^{⌊log_b n⌋+1}) = Θ(1). Hence the cost of all Θ(n^{log_b a}) leaves is Θ(n^{log_b a}) · Θ(1) = Θ(n^{log_b a}) by Problem 3-5(d). The second term in equation (4.18) is the cost of the internal nodes, which, in the underlying divide-and-conquer algorithm, represents the costs of dividing problems into subproblems and then recombining the subproblems. Since the cost for all the internal nodes at depth j is a^j f(n/b^j), the total cost of all internal nodes is

    Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j).
▪

Figure 4.3 The recursion tree generated by T(n) = aT(n/b) + f(n). The tree is a complete a-ary tree with Θ(n^{log_b a}) leaves and height ⌊log_b n⌋ + 1. The cost of the nodes at each depth is shown at the right, and their sum is given in equation (4.18).
As we’ll see, the three cases of the master theorem depend on the
distribution of the total cost across levels of the recursion tree:
Case 1: The costs increase geometrically from the root to the leaves,
growing by a constant factor with each level.
Case 2: The costs depend on the value of k in the theorem. With k = 0,
the costs are equal for each level; with k = 1, the costs grow linearly
from the root to the leaves; with k = 2, the growth is quadratic; and in
general, the costs grow polynomially in k.
Case 3: The costs decrease geometrically from the root to the leaves,
shrinking by a constant factor with each level.
The summation in equation (4.18) describes the cost of the dividing
and combining steps in the underlying divide-and-conquer algorithm.
The next lemma provides asymptotic bounds on the summation’s
growth.
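As a concrete check on this picture (our illustration, not part of the text), here is a minimal Python sketch that evaluates the level-by-level sum numerically. The function name and the choice a = 4, b = 2 are ours, picked so that the watershed function is n².

import math

def level_sum(a, b, f, n):
    # Sum a^j * f(n / b^j) over depths j = 0, 1, ..., floor(log_b n),
    # the internal-node cost in equations (4.18) and (4.19).
    total, j = 0.0, 0
    while n / b**j >= 1:
        total += a**j * f(n / b**j)
        j += 1
    return total

# With a = 4 and b = 2, the watershed function is n^(log_2 4) = n^2.
n = 2**16
for f, label in [(lambda x: x,    "case 1: f(n) = n"),
                 (lambda x: x**2, "case 2: f(n) = n^2 (k = 0)"),
                 (lambda x: x**3, "case 3: f(n) = n^3")]:
    print(label, level_sum(4, 2, f, n) / n**2)
# The three ratios behave as the cases predict: roughly constant,
# roughly lg n, and roughly proportional to n, respectively.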







Lemma 4.3
Let a > 0 and b > 1 be constants, and let f(n) be a function defined over real numbers n ≥ 1. Then the asymptotic behavior of the function

    g(n) = Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j),    (4.19)

defined for n ≥ 1, can be characterized as follows:
1. If there exists a constant ϵ > 0 such that f(n) = O(n^{log_b a−ϵ}), then g(n) = O(n^{log_b a}).
2. If there exists a constant k ≥ 0 such that f(n) = Θ(n^{log_b a} lg^k n), then g(n) = Θ(n^{log_b a} lg^{k+1} n).
3. If there exists a constant c in the range 0 < c < 1 such that 0 < af(n/b) ≤ cf(n) for all n ≥ 1, then g(n) = Θ(f(n)).
Proof For case 1, we have f(n) = O(n^{log_b a−ϵ}), which implies that f(n/b^j) = O((n/b^j)^{log_b a−ϵ}). Substituting into equation (4.19) yields

    g(n) = O( Σ_{j=0}^{⌊log_b n⌋} a^j (n/b^j)^{log_b a−ϵ} )
         = O( n^{log_b a−ϵ} Σ_{j=0}^{⌊log_b n⌋} (ab^ϵ/b^{log_b a})^j )
         = O( n^{log_b a−ϵ} Σ_{j=0}^{⌊log_b n⌋} (b^ϵ)^j )
         = O( n^{log_b a−ϵ} · (b^{ϵ(⌊log_b n⌋+1)} − 1)/(b^ϵ − 1) ),

the last series being geometric. Since b and ϵ are constants, the b^ϵ − 1 denominator doesn’t affect the asymptotic growth of g(n), and neither does the −1 in the numerator. Since b^{ϵ(⌊log_b n⌋+1)} ≤ b^{ϵ(log_b n + 1)} = b^ϵ n^ϵ = O(n^ϵ), we obtain g(n) = O(n^{log_b a−ϵ} · n^ϵ) = O(n^{log_b a}), thereby proving case 1.
Case 2 assumes that f(n) = Θ(n^{log_b a} lg^k n), from which we can conclude that f(n/b^j) = Θ((n/b^j)^{log_b a} lg^k(n/b^j)). Substituting into equation (4.19) and repeatedly applying Problem 3-5(c) yields

    g(n) = Θ( Σ_{j=0}^{⌊log_b n⌋} a^j (n/b^j)^{log_b a} lg^k(n/b^j) )
         = Θ( n^{log_b a} Σ_{j=0}^{⌊log_b n⌋} (a/b^{log_b a})^j lg^k(n/b^j) )
         = Θ( n^{log_b a} Σ_{j=0}^{⌊log_b n⌋} lg^k(n/b^j) ).

The summation within the Θ-notation can be bounded from above as follows:

    Σ_{j=0}^{⌊log_b n⌋} lg^k(n/b^j) ≤ (⌊log_b n⌋ + 1) lg^k n
                                    = O(lg n · lg^k n)
                                    = O(lg^{k+1} n).

Exercise 4.6-1 asks you to show that the summation can similarly be bounded from below by Ω(lg^{k+1} n). Since we have tight upper and lower bounds, the summation is Θ(lg^{k+1} n), from which we can conclude that g(n) = Θ(n^{log_b a} lg^{k+1} n), thereby completing the proof of case 2.
For case 3, observe that f(n) appears in the definition (4.19) of g(n) (when j = 0) and that all terms of g(n) are positive. Therefore, we must have g(n) = Ω(f(n)), and it only remains to prove that g(n) = O(f(n)). Performing j iterations of the inequality af(n/b) ≤ cf(n) yields a^j f(n/b^j) ≤ c^j f(n). Substituting into equation (4.19), we obtain

    g(n) = Σ_{j=0}^{⌊log_b n⌋} a^j f(n/b^j)
         ≤ Σ_{j=0}^{⌊log_b n⌋} c^j f(n)
         ≤ f(n) Σ_{j=0}^{∞} c^j
         = f(n) · 1/(1 − c)     (since 0 < c < 1)
         = O(f(n)).

Thus, we can conclude that g(n) = Θ(f(n)). With case 3 proved, the entire proof of the lemma is complete.
▪
We can now state and prove the continuous master theorem.





Theorem 4.4 (Continuous master theorem)
Let a > 0 and b > 1 be constants, and let f(n) be a driving function that is defined and nonnegative on all sufficiently large reals. Define the algorithmic recurrence T(n) on the positive real numbers by

    T(n) = aT(n/b) + f(n).

Then the asymptotic behavior of T(n) can be characterized as follows:
1. If there exists a constant ϵ > 0 such that f(n) = O(n^{log_b a−ϵ}), then T(n) = Θ(n^{log_b a}).
2. If there exists a constant k ≥ 0 such that f(n) = Θ(n^{log_b a} lg^k n), then T(n) = Θ(n^{log_b a} lg^{k+1} n).
3. If there exists a constant ϵ > 0 such that f(n) = Ω(n^{log_b a+ϵ}), and if f(n) additionally satisfies the regularity condition af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
Proof The idea is to bound the summation (4.18) from Lemma 4.2 by applying Lemma 4.3. But we must account for Lemma 4.2 using a base case for 0 < n < 1, whereas this theorem uses an implicit base case for 0 < n < n₀, where n₀ > 0 is an arbitrary threshold constant. Since the recurrence is algorithmic, we can assume that f(n) is defined for n ≥ n₀. For n > 0, let us define two auxiliary functions T′(n) = T(n₀n) and f′(n) = f(n₀n). We have

    T′(n) = T(n₀n)
          = aT(n₀n/b) + f(n₀n)
          = aT′(n/b) + f′(n).

We have obtained a recurrence for T′(n) that satisfies the conditions of Lemma 4.2, and by that lemma, the solution is

    T′(n) = Θ(n^{log_b a}) + Σ_{j=0}^{⌊log_b n⌋} a^j f′(n/b^j).







To solve T′(n), we first need to bound f′(n). Let’s examine the individual cases in the theorem.
The condition for case 1 is f(n) = O(n^{log_b a−ϵ}) for some constant ϵ > 0. We have

    f′(n) = f(n₀n) = O((n₀n)^{log_b a−ϵ}) = O(n^{log_b a−ϵ}),

since a, b, n₀, and ϵ are all constant. The function f′(n) satisfies the conditions of case 1 of Lemma 4.3, and the summation in equation (4.18) of Lemma 4.2 evaluates to O(n^{log_b a}). Because a, b, and n₀ are all constants, we have

    T(n) = T′(n/n₀)
         = Θ((n/n₀)^{log_b a}) + O((n/n₀)^{log_b a})
         = Θ(n^{log_b a}),

thereby completing case 1 of the theorem.
The condition for case 2 is f(n) = Θ(n^{log_b a} lg^k n) for some constant k ≥ 0. We have

    f′(n) = f(n₀n) = Θ((n₀n)^{log_b a} lg^k(n₀n)) = Θ(n^{log_b a} lg^k n).

Similar to the proof of case 1, the function f′(n) satisfies the conditions of case 2 of Lemma 4.3. The summation in equation (4.18) of Lemma 4.2 is therefore Θ(n^{log_b a} lg^{k+1} n), which implies that

    T(n) = T′(n/n₀)
         = Θ((n/n₀)^{log_b a}) + Θ((n/n₀)^{log_b a} lg^{k+1}(n/n₀))
         = Θ(n^{log_b a} lg^{k+1} n),

which proves case 2 of the theorem.
Finally, the condition for case 3 is f(n) = Ω(n^{log_b a+ϵ}) for some constant ϵ > 0, and f(n) additionally satisfies the regularity condition af(n/b) ≤ cf(n) for all n ≥ n₀ and some constants c < 1 and n₀ > 1. The first part of case 3 is like case 1:

    f′(n) = f(n₀n) = Ω((n₀n)^{log_b a+ϵ}) = Ω(n^{log_b a+ϵ}).

Using the definition of f′(n) and the fact that n₀n ≥ n₀ for all n ≥ 1, we have for n ≥ 1 that

    af′(n/b) = af(n₀n/b)
             ≤ cf(n₀n)
             = cf′(n).

Thus f′(n) satisfies the requirements for case 3 of Lemma 4.3, and the summation in equation (4.18) of Lemma 4.2 evaluates to Θ(f′(n)), yielding

    T(n) = T′(n/n₀)
         = Θ((n/n₀)^{log_b a}) + Θ(f′(n/n₀))
         = Θ(f(n)),

where the last line holds because f(n) = Ω(n^{log_b a+ϵ}) dominates the Θ((n/n₀)^{log_b a}) term. This completes the proof of case 3 of the theorem and thus the whole theorem.
▪






Exercises
4.6-1
Show that Σ_{j=0}^{⌊log_b n⌋} lg^k(n/b^j) = Ω(lg^{k+1} n).
★ 4.6-2
Show that case 3 of the master theorem is overstated (which is also why case 3 of Lemma 4.3 does not require that f(n) = Ω(n^{log_b a+ϵ})) in the sense that the regularity condition af(n/b) ≤ cf(n) for some constant c < 1 implies that there exists a constant ϵ > 0 such that f(n) = Ω(n^{log_b a+ϵ}).
★ 4.6-3
For f(n) = Θ(n^{log_b a}/lg n), prove that the summation in equation (4.19) has solution g(n) = Θ(n^{log_b a} lg lg n). Conclude that a master recurrence T(n) using f(n) as its driving function has solution T(n) = Θ(n^{log_b a} lg lg n).
★ 4.7 Akra-Bazzi recurrences
This section provides an overview of two advanced topics related to
divide-and-conquer recurrences. The first deals with technicalities
arising from the use of floors and ceilings, and the second discusses the
Akra-Bazzi method, which involves a little calculus, for solving
complicated divide-and-conquer recurrences.
In particular, we’ll look at the class of algorithmic divide-and-
conquer recurrences originally studied by M. Akra and L. Bazzi [13].
These Akra-Bazzi recurrences take the form

    T(n) = f(n) + Σ_{i=1}^{k} a_i T(n/b_i),    (4.22)

where k is a positive integer; all the constants a₁, a₂, … , a_k ∈ ℝ are strictly positive; all the constants b₁, b₂, … , b_k ∈ ℝ are strictly greater than 1; and the driving function f(n) is defined on sufficiently large nonnegative reals and is itself nonnegative.
Akra-Bazzi recurrences generalize the class of recurrences addressed
by the master theorem. Whereas master recurrences characterize the
running times of divide-and-conquer algorithms that break a problem
into equal-sized subproblems (modulo floors and ceilings), Akra-Bazzi
recurrences can describe the running time of divide-and-conquer
algorithms that break a problem into different-sized subproblems. The
master theorem, however, allows you to ignore floors and ceilings, but
the Akra-Bazzi method for solving Akra-Bazzi recurrences needs an
additional requirement to deal with floors and ceilings.
But before diving into the Akra-Bazzi method itself, let’s understand
the limitations involved in ignoring floors and ceilings in Akra-Bazzi
recurrences. As you’re aware, algorithms generally deal with integer-
sized inputs. The mathematics for recurrences is often easier with real
numbers, however, than with integers, where we must cope with floors
and ceilings to ensure that terms are well defined. The difference may not seem like much—and with recurrences it often isn’t—but to be mathematically correct, we must be careful with
our assumptions. Since our end goal is to understand algorithms and
not the vagaries of mathematical corner cases, we’d like to be casual yet
rigorous. How can we treat floors and ceilings casually while still
ensuring rigor?
From a mathematical point of view, the difficulty in dealing with
floors and ceilings is that some driving functions can be really, really
weird. So it’s not okay in general to ignore floors and ceilings in Akra-
Bazzi recurrences. Fortunately, most of the driving functions we
encounter in the study of algorithms behave nicely, and floors and
ceilings don’t make a difference.
The polynomial-growth condition
If the driving function f ( n) in equation (4.22) is well behaved in the following sense, it’s okay to drop floors and ceilings.
A function f(n) defined on all sufficiently large positive reals satisfies the polynomial-growth condition if there exists a constant n̂ > 0 such that the following holds: for every constant ϕ ≥ 1, there exists a constant d > 1 (depending on ϕ) such that f(n)/d ≤ f(ψn) ≤ df(n) for all 1 ≤ ψ ≤ ϕ and n ≥ n̂.
This definition may be one of the hardest in this textbook to get your
head around. To a first order, it says that f ( n) satisfies the property that f (Θ( n)) = Θ( f ( n)), although the polynomial-growth condition is actually somewhat stronger (see Exercise 4.7-4). The definition also
implies that f ( n) is asymptotically positive (see Exercise 4.7-3).
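As a quick sanity check (our example, not the book’s): for f(n) = n² and any constants ϕ ≥ 1 and 1 ≤ ψ ≤ ϕ, we have f(ψn) = ψ²f(n), so taking d = ϕ² > 1 gives f(n)/d ≤ f(n) ≤ f(ψn) ≤ df(n) for all n ≥ 1, and the condition holds with n̂ = 1. For f(n) = 2^n, by contrast, taking ϕ = ψ = 2 gives f(2n)/f(n) = 2^n, which is unbounded, so no constant d can work (compare Exercise 4.7-2).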
Examples of functions that satisfy the polynomial-growth condition
include any function of the form f(n) = Θ(n^α lg^β n lg lg^γ n), where α, β, and γ are constants. Most of the polynomially bounded functions used
in this book satisfy the condition. Exponentials and superexponentials
do not (see Exercise 4.7-2, for example), and there also exist
polynomially bounded functions that do not.
Floors and ceilings in “nice” recurrences
When the driving function in an Akra-Bazzi recurrence satisfies the
polynomial-growth condition, floors and ceilings don’t change the
asymptotic behavior of the solution. The following theorem, which is
presented without proof, formalizes this notion.
Theorem 4.5
Let T ( n) be a function defined on the nonnegative reals that satisfies recurrence (4.22), where f ( n) satisfies the polynomial-growth condition.
Let T′(n) be another function defined on the natural numbers also satisfying recurrence (4.22), except that each T(n/b_i) is replaced either with T(⌈n/b_i⌉) or with T(⌊n/b_i⌋). Then we have T′(n) = Θ(T(n)).
▪
Floors and ceilings represent a minor perturbation to the arguments
in the recursion. By inequality (3.2) on page 64, they perturb an
argument by at most 1. But much larger perturbations are tolerable. As
long as the driving function f(n) in recurrence (4.22) satisfies the polynomial-growth condition, it turns out that replacing any term T(n/b_i) with T(n/b_i + h_i(n)), where |h_i(n)| = O(n/lg^{1+ϵ} n) for some constant ϵ > 0 and sufficiently large n, leaves the asymptotic solution unaffected. Thus, the divide step in a divide-and-conquer algorithm can
be moderately coarse without affecting the solution to its running-time
recurrence.
The Akra-Bazzi method
The Akra-Bazzi method, not surprisingly, was developed to solve Akra-Bazzi recurrences (4.22), which, by dint of Theorem 4.5, applies in the presence of floors and ceilings or even larger perturbations, as just discussed. The method involves first determining the unique real number p such that Σ_{i=1}^{k} a_i/b_i^p = 1. Such a p always exists, because when p → −∞, the sum goes to ∞; it decreases as p increases; and when p → ∞, it goes to 0. The Akra-Bazzi method then gives the solution to the recurrence as

    T(n) = Θ( n^p (1 + ∫_1^n f(x)/x^{p+1} dx) ).    (4.23)
As an example, consider the recurrence

    T(n) = T(n/5) + T(7n/10) + n.    (4.24)
We’ll see the similar recurrence (9.1) on page 240 when we study an
algorithm for selecting the i th smallest element from a set of n numbers.
This recurrence has the form of equation (4.22), where a₁ = a₂ = 1, b₁ = 5, b₂ = 10/7, and f(n) = n. To solve it, the Akra-Bazzi method says that we should determine the unique p satisfying

    (1/5)^p + (7/10)^p = 1.

Solving for p is kind of messy—it turns out that p = 0.83978 …—but we can solve the recurrence without actually knowing the exact value for p. Observe that (1/5)⁰ + (7/10)⁰ = 2 and (1/5)¹ + (7/10)¹ = 9/10, and thus p
lies in the range 0 < p < 1. That turns out to be sufficient for the Akra-Bazzi method to give us the solution. We’ll use the fact from calculus that if k ≠ −1, then ∫x^k dx = x^{k+1}/(k + 1), which we’ll apply with k = −p ≠ −1. The Akra-Bazzi solution (4.23) gives us

    T(n) = Θ( n^p (1 + ∫_1^n x/x^{p+1} dx) )
         = Θ( n^p (1 + ∫_1^n x^{−p} dx) )
         = Θ( n^p (1 + (n^{1−p} − 1)/(1 − p)) )
         = Θ( n^p + (n − n^p)/(1 − p) )
         = Θ(n),

since 0 < p < 1.
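To make the method concrete, here is a small Python sketch (ours, not from the text) that locates p for this recurrence by bisection and evaluates the closed form just derived. The function name akra_bazzi_p is invented for illustration.

def akra_bazzi_p(terms, lo=-10.0, hi=10.0):
    # Bisection for the unique p with sum of a_i / b_i**p equal to 1.
    # terms is a list of (a_i, b_i) pairs; the sum is strictly
    # decreasing in p, so the root is unique.
    g = lambda p: sum(a / b**p for a, b in terms) - 1.0
    for _ in range(200):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) > 0 else (lo, mid)
    return (lo + hi) / 2

p = akra_bazzi_p([(1, 5), (1, 10/7)])   # T(n) = T(n/5) + T(7n/10) + n
print(p)                                # about 0.83978

n = 1e6
closed_form = n**p * (1 + (n**(1 - p) - 1) / (1 - p))
print(closed_form / n)   # tends to 1/(1 - p) as n grows: T(n) = Θ(n)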
Although the Akra-Bazzi method is more general than the master
theorem, it requires calculus and sometimes a bit more reasoning. You
also must ensure that your driving function satisfies the polynomial-
growth condition if you want to ignore floors and ceilings, although
that’s rarely a problem. When it applies, the master method is much
simpler to use, but only when subproblem sizes are more or less equal.
They are both good tools for your algorithmic toolkit.
Exercises
★ 4.7-1
Consider an Akra-Bazzi recurrence T(n) on the reals as given in recurrence (4.22), and define T′(n) as

    T′(n) = cf(n) + Σ_{i=1}^{k} a_i T′(n/b_i),

where c > 0 is constant. Prove that whatever the implicit initial conditions for T(n) might be, there exist initial conditions for T′(n) such that T′(n) = cT(n) for all n > 0. Conclude that we can drop the asymptotics on a driving function in any Akra-Bazzi recurrence without affecting its asymptotic solution.
4.7-2
Show that f(n) = n² satisfies the polynomial-growth condition but that f(n) = 2^n does not.
4.7-3
Let f ( n) be a function that satisfies the polynomial-growth condition.
Prove that f(n) is asymptotically positive, that is, there exists a constant n₀ ≥ 0 such that f(n) ≥ 0 for all n ≥ n₀.
★ 4.7-4
Give an example of a function f ( n) that does not satisfy the polynomial-growth condition but for which f (Θ( n)) = Θ( f ( n)).
4.7-5
Use the Akra-Bazzi method to solve the following recurrences.
a. T(n) = T(n/2) + T(n/3) + T(n/6) + n lg n.
b. T(n) = 3T(n/3) + 8T(n/4) + n²/lg n.
c. T(n) = (2/3)T(n/3) + (1/3)T(2n/3) + lg n.
d. T(n) = (1/3)T(n/3) + 1/n.
e. T(n) = 3T(n/3) + 3T(2n/3) + n².
★ 4.7-6
Use the Akra-Bazzi method to prove the continuous master theorem.
Problems
4-1 Recurrence examples
Give asymptotically tight upper and lower bounds for T ( n) in each of
the following algorithmic recurrences. Justify your answers.
a. T(n) = 2T(n/2) + n³.
b. T(n) = T(8n/11) + n.
c. T(n) = 16T(n/4) + n².
d. T(n) = 4T(n/2) + n² lg n.
e. T(n) = 8T(n/3) + n².
f. T(n) = 7T(n/2) + n² lg n.
g.
.
h. T(n) = T(n − 2) + n².
4-2 Parameter-passing costs
Throughout this book, we assume that parameter passing during
procedure calls takes constant time, even if an N-element array is being
passed. This assumption is valid in most systems because a pointer to
the array is passed, not the array itself. This problem examines the
implications of three parameter-passing strategies:
1. Arrays are passed by pointer. Time = Θ(1).
2. Arrays are passed by copying. Time = Θ( N), where N is the size
of the array.
3. Arrays are passed by copying only the subrange that might be
accessed by the called procedure. Time = Θ( n) if the subarray
contains n elements.
Consider the following three algorithms:
a. The recursive binary-search algorithm for finding a number in a
sorted array (see Exercise 2.3-6).
b. The MERGE-SORT procedure from Section 2.3.1.
c. The MATRIX-MULTIPLY-RECURSIVE procedure from Section 4.1.
Give nine recurrences T_{a1}(N, n), T_{a2}(N, n), … , T_{c3}(N, n) for the worst-case running times of each of the three algorithms above when arrays
and matrices are passed using each of the three parameter-passing
strategies above. Solve your recurrences, giving tight asymptotic bounds.
4-3 Solving recurrences with a change of variables
Sometimes, a little algebraic manipulation can make an unknown
recurrence similar to one you have seen before. Let’s solve the recurrence

    T(n) = 2T(√n) + Θ(lg n)    (4.25)

by using the change-of-variables method.
a. Define m = lg n and S(m) = T(2^m). Rewrite recurrence (4.25) in terms of m and S(m).
b. Solve your recurrence for S( m).
c. Use your solution for S( m) to conclude that T ( n) = Θ(lg n lg lg n).
d. Sketch the recursion tree for recurrence (4.25), and use it to explain
intuitively why the solution is T ( n) = Θ(lg n lg lg n).
Solve the following recurrences by changing variables:
e.
.
f.
.
4-4 More recurrence examples
Give asymptotically tight upper and lower bounds for T ( n) in each of
the following recurrences. Justify your answers.
a. T(n) = 5T(n/3) + n lg n.
b. T(n) = 3T(n/3) + n/lg n.
c.
.
d. T(n) = 2T(n/2 − 2) + n/2.
e. T(n) = 2T(n/2) + n/lg n.





f. T(n) = T(n/2) + T(n/4) + T(n/8) + n.
g. T(n) = T(n − 1) + 1/n.
h. T(n) = T(n − 1) + lg n.
i. T(n) = T(n − 2) + 1/lg n.
j.
.
4-5 Fibonacci numbers
This problem develops properties of the Fibonacci numbers, which are
defined by recurrence (3.31) on page 69. We’ll explore the technique of
generating functions to solve the Fibonacci recurrence. Define the
generating function (or formal power series) F as

    F(z) = Σ_{i=0}^{∞} F_i z^i = 0 + z + z² + 2z³ + 3z⁴ + 5z⁵ + ⋯,

where F_i is the ith Fibonacci number.
a. Show that F(z) = z + zF(z) + z²F(z).
b. Show that

    F(z) = z/(1 − z − z²)
         = z/((1 − ϕz)(1 − ϕ̂z))
         = (1/√5) (1/(1 − ϕz) − 1/(1 − ϕ̂z)),

where ϕ is the golden ratio, and ϕ̂ is its conjugate (see page 69).
c. Show that

    F(z) = Σ_{i=0}^{∞} (1/√5)(ϕ^i − ϕ̂^i) z^i.

You may use without proof the generating-function version of equation (A.7) on page 1142, Σ_{k=0}^{∞} x^k = 1/(1 − x). Because this
equation involves a generating function, x is a formal variable, not a

real-valued variable, so that you don’t have to worry about
convergence of the summation or about the requirement in equation
(A.7) that | x| < 1, which doesn’t make sense here.
d. Use part (c) to prove that F_i = ϕ^i/√5 for i > 0, rounded to the nearest integer. (Hint: Observe that |ϕ̂| < 1.)
e. Prove that F_{i+2} ≥ ϕ^i for i ≥ 0.
4-6 Chip testing
Professor Diogenes has n supposedly identical integrated-circuit chips
that in principle are capable of testing each other. The professor’s test jig
accommodates two chips at a time. When the jig is loaded, each chip
tests the other and reports whether it is good or bad. A good chip
always reports accurately whether the other chip is good or bad, but the
professor cannot trust the answer of a bad chip. Thus, the four possible
outcomes of a test are as follows:
Chip A says    Chip B says    Conclusion
B is good      A is good      both are good, or both are bad
B is good      A is bad       at least one is bad
B is bad       A is good      at least one is bad
B is bad       A is bad       at least one is bad
a. Show that if at least n/2 chips are bad, the professor cannot
necessarily determine which chips are good using any strategy based
on this kind of pairwise test. Assume that the bad chips can conspire
to fool the professor.
Now you will design an algorithm to identify which chips are good and
which are bad, assuming that more than n/2 of the chips are good. First,
you will determine how to identify one good chip.
b. Show that ⌊n/2⌋ pairwise tests are sufficient to reduce the problem to one of nearly half the size. That is, show how to use ⌊n/2⌋ pairwise tests to obtain a set with at most ⌈n/2⌉ chips that still has the property that more than half of the chips are good.
c. Show how to apply the solution to part (b) recursively to identify one
good chip. Give and solve the recurrence that describes the number of
tests needed to identify one good chip.
You have now determined how to identify one good chip.
d. Show how to identify all the good chips with an additional Θ( n)
pairwise tests.
4-7 Monge arrays
An m × n array A of real numbers is a Monge array if for all i, j, k, and l such that 1 ≤ i < k ≤ m and 1 ≤ j < l ≤ n, we have A[i, j] + A[k, l] ≤ A[i, l] + A[k, j].
In other words, whenever we pick two rows and two columns of a
Monge array and consider the four elements at the intersections of the
rows and the columns, the sum of the upper-left and lower-right
elements is less than or equal to the sum of the lower-left and upper-
right elements. For example, the following array is Monge:
10 17 13 28 23
17 22 16 29 23
24 28 22 34 24
11 13 6 17 7
45 44 32 37 23
36 33 19 21 6
75 66 51 53 34
a. Prove that an array is Monge if and only if for all i = 1, 2, …, m – 1
and j = 1, 2, …, n – 1, we have
A[ i, j] + A[ i + 1, j + 1] ≤ A[ i, j + 1] + A[ i + 1, j].
( Hint: For the “if” part, use induction separately on rows and
columns.)
b. The following array is not Monge. Change one element in order to make it Monge. (Hint: Use part (a).)

37 23 22 32
21  6  7 10
53 34 30 31
32 13  9  6
43 21 15  8
c. Let f ( i) be the index of the column containing the leftmost minimum element of row i. Prove that f (1) ≤ f (2) ≤ ⋯ ≤ f ( m) for any m × n Monge array.
d. Here is a description of a divide-and-conquer algorithm that
computes the leftmost minimum element in each row of an m × n
Monge array A:
Construct a submatrix A′ of A consisting of the even-numbered
rows of A. Recursively determine the leftmost minimum for
each row of A′. Then compute the leftmost minimum in the
odd-numbered rows of A.
Explain how to compute the leftmost minimum in the odd-numbered
rows of A (given that the leftmost minimum of the even-numbered
rows is known) in O( m + n) time.
e. Write the recurrence for the running time of the algorithm in part (d).
Show that its solution is O( m + n log m).
Chapter notes
Divide-and-conquer as a technique for designing algorithms dates back
at least to 1962 in an article by Karatsuba and Ofman [242], but it might have been used well before then. According to Heideman, Johnson, and
Burrus [211], C. F. Gauss devised the first fast Fourier transform algorithm in 1805, and Gauss’s formulation breaks the problem into
smaller subproblems whose solutions are combined.
Strassen’s algorithm [424] caused much excitement when it appeared in 1969. Before then, few imagined the possibility of an algorithm
asymptotically faster than the basic MATRIX-MULTIPLY procedure.
Shortly thereafter, S. Winograd reduced the number of submatrix
additions from 18 to 15 while still using seven submatrix multiplications.
This improvement, which Winograd apparently never published (and
which is frequently miscited in the literature), may enhance the
practicality of the method, but it does not affect its asymptotic
performance. Probert [368] described Winograd’s algorithm and showed that with seven multiplications, 15 additions is the minimum possible.
Strassen’s Θ(n^{lg 7}) = O(n^{2.81}) bound for matrix multiplication held until 1987, when Coppersmith and Winograd [103] made a significant advance, improving the bound to O(n^{2.376}) time with a mathematically sophisticated but wildly impractical algorithm based on tensor products. It took approximately 25 years before the asymptotic upper bound was again improved. In 2012 Vassilevska Williams [445] improved it to O(n^{2.37287}), and two years later Le Gall [278] achieved O(n^{2.37286}), both of them using mathematically fascinating but impractical algorithms. The best lower bound to date is just the obvious Ω(n²) bound (obvious because any algorithm for matrix multiplication must fill in the n² elements of the product matrix).
The performance of MATRIX-MULTIPLY-RECURSIVE can be
improved in practice by coarsening the leaves of the recursion. It also
exhibits better cache behavior than MATRIX-MULTIPLY, although
MATRIX-MULTIPLY can be improved by “tiling.” Leiserson et al.
[293] conducted a performance-engineering study of matrix
multiplication in which a parallel and vectorized divide-and-conquer
algorithm achieved the highest performance. Strassen’s algorithm can be
practical for large dense matrices, although large matrices tend to be
sparse, and sparse methods can be much faster. When using limited-
precision floating-point values, Strassen’s algorithm produces larger
numerical errors than the Θ(n³) algorithms do, although Higham [215]
demonstrated that Strassen’s algorithm is amply accurate for some applications.
Recurrences were studied as early as 1202 by Leonardo Bonacci [66], also known as Fibonacci, for whom the Fibonacci numbers are named,
although Indian mathematicians had discovered Fibonacci numbers
centuries before. The French mathematician De Moivre [108]
introduced the method of generating functions with which he studied
Fibonacci numbers (see Problem 4-5). Knuth [259] and Liu [302] are good resources for learning the method of generating functions.
Aho, Hopcroft, and Ullman [5, 6] offered one of the first general methods for solving recurrences arising from the analysis of divide-and-conquer algorithms. The master method was adapted from Bentley,
Haken, and Saxe [52]. The Akra-Bazzi method is due (unsurprisingly) to Akra and Bazzi [13]. Divide-and-conquer recurrences have been studied by many researchers, including Campbell [79], Graham, Knuth, and Patashnik [199], Kuszmaul and Leiserson [274], Leighton [287], Purdom and Brown [371], Roura [389], Verma [447], and Yap [462].
The issue of floors and ceilings in divide-and-conquer recurrences,
including a theorem similar to Theorem 4.5, was studied by Leighton
[287]. Leighton proposed a version of the polynomial-growth condition.
Campbell [79] removed several limitations in Leighton’s statement of it and showed that there were polynomially bounded functions that do
not satisfy Leighton’s condition. Campbell also carefully studied many
other technical issues, including the well-definedness of divide-and-
conquer recurrences. Kuszmaul and Leiserson [274] provided a proof of Theorem 4.5 that does not involve calculus or other higher math. Both
Campbell and Leighton explored the perturbations of arguments
beyond simple floors and ceilings.
¹ This terminology does not mean that either T(n) or f(n) need be continuous, only that the domain of T(n) is the real numbers, as opposed to integers.
5 Probabilistic Analysis and Randomized Algorithms
This chapter introduces probabilistic analysis and randomized
algorithms. If you are unfamiliar with the basics of probability theory,
you should read Sections C.1–C.4 of Appendix C, which review this material. We’ll revisit probabilistic analysis and randomized algorithms
several times throughout this book.
5.1 The hiring problem
Suppose that you need to hire a new office assistant. Your previous
attempts at hiring have been unsuccessful, and you decide to use an
employment agency. The employment agency sends you one candidate
each day. You interview that person and then decide either to hire that
person or not. You must pay the employment agency a small fee to
interview an applicant. To actually hire an applicant is more costly,
however, since you must fire your current office assistant and also pay a
substantial hiring fee to the employment agency. You are committed to
having, at all times, the best possible person for the job. Therefore, you
decide that, after interviewing each applicant, if that applicant is better
qualified than the current office assistant, you will fire the current office
assistant and hire the new applicant. You are willing to pay the resulting
price of this strategy, but you wish to estimate what that price will be.
The procedure HIRE-ASSISTANT on the facing page expresses this
strategy for hiring in pseudocode. The candidates for the office assistant
job are numbered 1 through n and interviewed in that order. The procedure assumes that after interviewing candidate i, you can
determine whether candidate i is the best candidate you have seen so far.
It starts by creating a dummy candidate, numbered 0, who is less
qualified than each of the other candidates.
The cost model for this problem differs from the model described in
Chapter 2. We focus not on the running time of HIRE-ASSISTANT, but instead on the fees paid for interviewing and hiring. On the surface,
analyzing the cost of this algorithm may seem very different from
analyzing the running time of, say, merge sort. The analytical
techniques used, however, are identical whether we are analyzing cost or
running time. In either case, we are counting the number of times
certain basic operations are executed.
HIRE-ASSISTANT(n)
1  best = 0      // candidate 0 is a least-qualified dummy candidate
2  for i = 1 to n
3      interview candidate i
4      if candidate i is better than candidate best
5          best = i
6          hire candidate i
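For readers who want to experiment, here is a direct Python transcription of the procedure; it is a sketch under our own conventions (ranks stand in for candidate quality, and we count hires rather than actually hiring anyone).

def hire_assistant(ranks):
    # ranks[i] is the rank of candidate i + 1; a higher rank means
    # better qualified. Returns how many times lines 5-6 execute.
    best = 0          # rank of the dummy candidate 0
    hires = 0
    for rank in ranks:            # interview candidates in order
        if rank > best:           # better than the current best
            best = rank           # line 5
            hires += 1            # line 6: hire this candidate
    return hires

print(hire_assistant([1, 2, 3, 4]))   # increasing quality: 4 hires
print(hire_assistant([4, 3, 2, 1]))   # decreasing quality: 1 hire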
Interviewing has a low cost, say c_i, whereas hiring is expensive, costing c_h. Letting m be the number of people hired, the total cost associated with this algorithm is O(c_i n + c_h m). No matter how many people you hire, you always interview n candidates and thus always incur the cost c_i n associated with interviewing. We therefore concentrate on analyzing c_h m, the hiring cost. This quantity depends on the order in
which you interview candidates.
This scenario serves as a model for a common computational
paradigm. Algorithms often need to find the maximum or minimum
value in a sequence by examining each element of the sequence and
maintaining a current “winner.” The hiring problem models how often
a procedure updates its notion of which element is currently winning.
In the worst case, you actually hire every candidate that you interview.
This situation occurs if the candidates come in strictly increasing order
of quality, in which case you hire n times, for a total hiring cost of O( chn).
Of course, the candidates do not always come in increasing order of
quality. In fact, you have no idea about the order in which they arrive,
nor do you have any control over this order. Therefore, it is natural to
ask what we expect to happen in a typical or average case.
Probabilistic analysis
Probabilistic analysis is the use of probability in the analysis of
problems. Most commonly, we use probabilistic analysis to analyze the
running time of an algorithm. Sometimes we use it to analyze other
quantities, such as the hiring cost in procedure HIRE-ASSISTANT. In
order to perform a probabilistic analysis, we must use knowledge of, or
make assumptions about, the distribution of the inputs. Then we
analyze our algorithm, computing an average-case running time, where
we take the average, or expected value, over the distribution of the
possible inputs. When reporting such a running time, we refer to it as
the average-case running time.
You must be careful in deciding on the distribution of inputs. For
some problems, you may reasonably assume something about the set of
all possible inputs, and then you can use probabilistic analysis as a
technique for designing an efficient algorithm and as a means for
gaining insight into a problem. For other problems, you cannot
characterize a reasonable input distribution, and in these cases you
cannot use probabilistic analysis.
For the hiring problem, we can assume that the applicants come in a
random order. What does that mean for this problem? We assume that
you can compare any two candidates and decide which one is better
qualified, which is to say that there is a total order on the candidates.
(See Section B.2 for the definition of a total order.) Thus, you can rank each candidate with a unique number from 1 through n, using rank( i) to denote the rank of applicant i, and adopt the convention that a higher
rank corresponds to a better qualified applicant. The ordered list
〈 rank(1), rank(2), … , rank( n)〉 is a permutation of the list 〈1, 2, … , n〉.
Saying that the applicants come in a random order is equivalent to
saying that this list of ranks is equally likely to be any one of the n!
permutations of the numbers 1 through n. Alternatively, we say that the
ranks form a uniform random permutation, that is, each of the possible n!
permutations appears with equal probability.
Section 5.2 contains a probabilistic analysis of the hiring problem.
Randomized algorithms
In order to use probabilistic analysis, you need to know something
about the distribution of the inputs. In many cases, you know little
about the input distribution. Even if you do know something about the
distribution, you might not be able to model this knowledge
computationally. Yet, probability and randomness often serve as tools
for algorithm design and analysis, by making part of the algorithm
behave randomly.
In the hiring problem, it may seem as if the candidates are being
presented to you in a random order, but you have no way of knowing
whether they really are. Thus, in order to develop a randomized
algorithm for the hiring problem, you need greater control over the
order in which you’ll interview the candidates. We will, therefore,
change the model slightly. The employment agency sends you a list of
the n candidates in advance. On each day, you choose, randomly, which
candidate to interview. Although you know nothing about the
candidates (besides their names), we have made a significant change.
Instead of accepting the order given to you by the employment agency
and hoping that it’s random, you have instead gained control of the
process and enforced a random order.
More generally, we call an algorithm randomized if its behavior is
determined not only by its input but also by values produced by a
random-number generator. We assume that we have at our disposal a
random-number generator RANDOM. A call to RANDOM( a, b)
returns an integer between a and b, inclusive, with each such integer being equally likely. For example, RANDOM(0, 1) produces 0 with
probability 1/2, and it produces 1 with probability 1/2. A call to RANDOM(3, 7) returns any one of 3, 4, 5, 6, or 7, each with
probability 1/5. Each integer returned by RANDOM is independent of
the integers returned on previous calls. You may imagine RANDOM as
rolling a ( b – a + 1)-sided die to obtain its output. (In practice, most programming environments offer a pseudorandom-number generator: a
deterministic algorithm returning numbers that “look” statistically
random.)
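A minimal stand-in for RANDOM in Python might look as follows; this is our sketch, using the standard library’s pseudorandom-number generator rather than a true source of randomness.

import random

def RANDOM(a, b):
    # Each integer in [a, b] is equally likely, like rolling a
    # (b - a + 1)-sided die.
    return random.randint(a, b)

print(RANDOM(3, 7))   # any of 3, 4, 5, 6, 7, each with probability 1/5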
When analyzing the running time of a randomized algorithm, we
take the expectation of the running time over the distribution of values
returned by the random number generator. We distinguish these
algorithms from those in which the input is random by referring to the
running time of a randomized algorithm as an expected running time. In
general, we discuss the average-case running time when the probability
distribution is over the inputs to the algorithm, and we discuss the
expected running time when the algorithm itself makes random choices.
Exercises
5.1-1
Show that the assumption that you are always able to determine which
candidate is best, in line 4 of procedure HIRE-ASSISTANT, implies
that you know a total order on the ranks of the candidates.
★ 5.1-2
Describe an implementation of the procedure RANDOM( a, b) that
makes calls only to RANDOM(0, 1). What is the expected running time
of your procedure, as a function of a and b?
★ 5.1-3
You wish to implement a program that outputs 0 with probability 1/2
and 1 with probability 1/2. At your disposal is a procedure BIASED-
RANDOM that outputs either 0 or 1, but it outputs 1 with some
probability p and 0 with probability 1 – p, where 0 < p < 1. You do not know what p is. Give an algorithm that uses BIASED-RANDOM as a
subroutine, and returns an unbiased answer, returning 0 with probability 1/2 and 1 with probability 1/2. What is the expected running
time of your algorithm as a function of p?
5.2 Indicator random variables
In order to analyze many algorithms, including the hiring problem, we
use indicator random variables. Indicator random variables provide a
convenient method for converting between probabilities and
expectations. Given a sample space S and an event A, the indicator random variable I { A} associated with event A is defined as
As a simple example, let us determine the expected number of heads
obtained when flipping a fair coin. The sample space for a single coin
flip is S = {H, T}, with Pr{H} = Pr{T} = 1/2. We can then define an indicator random variable X_H, associated with the coin coming up heads, which is the event H. This variable counts the number of heads obtained in this flip, and it is 1 if the coin comes up heads and 0 otherwise. We write

    X_H = I{H}.
The expected number of heads obtained in one flip of the coin is simply the expected value of our indicator variable X_H:

    E[X_H] = E[I{H}]
           = 1 · Pr{H} + 0 · Pr{T}
           = 1 · (1/2) + 0 · (1/2)
           = 1/2.
Thus the expected number of heads obtained by one flip of a fair coin is
1/2. As the following lemma shows, the expected value of an indicator
random variable associated with an event A is equal to the probability
that A occurs.
Lemma 5.1
Given a sample space S and an event A in the sample space S, let X_A = I{A}. Then E[X_A] = Pr{A}.
Proof By the definition of an indicator random variable from equation (5.1) and the definition of expected value, we have

    E[X_A] = E[I{A}]
           = 1 · Pr{A} + 0 · Pr{Ā}
           = Pr{A},

where Ā denotes S − A, the complement of A.
▪
Although indicator random variables may seem cumbersome for an
application such as counting the expected number of heads on a flip of a
single coin, they are useful for analyzing situations that perform
repeated random trials. In Appendix C, for example, indicator random variables provide a simple way to determine the expected number of
heads in n coin flips. One option is to consider separately the probability
of obtaining 0 heads, 1 head, 2 heads, etc. to arrive at the result of
equation (C.41) on page 1199. Alternatively, we can employ the simpler
method proposed in equation (C.42), which uses indicator random
variables implicitly. Making this argument more explicit, let X_i be the indicator random variable associated with the event in which the ith flip comes up heads: X_i = I{the ith flip results in the event H}. Let X be the random variable denoting the total number of heads in the n coin flips, so that

    X = Σ_{i=1}^{n} X_i.

In order to compute the expected number of heads, take the expectation of both sides of the above equation to obtain

    E[X] = E[ Σ_{i=1}^{n} X_i ].    (5.2)

By Lemma 5.1, the expectation of each of the random variables is E[X_i] = 1/2 for i = 1, 2, … , n. Then we can compute the sum of the expectations: Σ_{i=1}^{n} E[X_i] = n/2. But equation (5.2) calls for the
expectation of the sum, not the sum of the expectations. How can we
resolve this conundrum? Linearity of expectation, equation (C.24) on
page 1192, to the rescue: the expectation of the sum always equals the
sum of the expectations. Linearity of expectation applies even when
there is dependence among the random variables. Combining indicator
random variables with linearity of expectation gives us a powerful
technique to compute expected values when multiple events occur. We
now can compute the expected number of heads:

    E[X] = E[ Σ_{i=1}^{n} X_i ]
         = Σ_{i=1}^{n} E[X_i]
         = Σ_{i=1}^{n} 1/2
         = n/2.
Thus, compared with the method used in equation (C.41), indicator
random variables greatly simplify the calculation. We use indicator
random variables throughout this book.
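A short simulation (ours, not the book’s) makes the indicator-variable argument tangible: summing n 0/1 indicators per trial and averaging over many trials approaches n/2.

import random

def expected_heads_estimate(n, trials=10_000):
    # X = X_1 + ... + X_n, where each X_i is the indicator I{flip i
    # comes up heads}; linearity of expectation predicts E[X] = n/2.
    total = 0
    for _ in range(trials):
        total += sum(random.randint(0, 1) for _ in range(n))
    return total / trials

print(expected_heads_estimate(10))   # close to 5.0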
Analysis of the hiring problem using indicator random variables
Returning to the hiring problem, we now wish to compute the expected
number of times that you hire a new office assistant. In order to use a
probabilistic analysis, let’s assume that the candidates arrive in a
random order, as discussed in Section 5.1. (We’ll see in Section 5.3 how to remove this assumption.) Let X be the random variable whose value equals the number of times you hire a new office assistant. We could then apply the definition of expected value from equation (C.23) on page 1192 to obtain

    E[X] = Σ_{x=1}^{n} x · Pr{X = x},
but this calculation would be cumbersome. Instead, let’s simplify the
calculation by using indicator random variables.
To use indicator random variables, instead of computing E [ X] by
defining just one variable denoting the number of times you hire a new
office assistant, think of the process of hiring as repeated random trials
and define n variables indicating whether each particular candidate is hired. In particular, let Xi be the indicator random variable associated
with the event in which the i th candidate is hired. Thus,
and
Lemma 5.1 gives
E [ Xi] = Pr {candidate i is hired},
and we must therefore compute the probability that lines 5–6 of HIRE-
ASSISTANT are executed.
Candidate i is hired, in line 6, exactly when candidate i is better than each of candidates 1 through i – 1. Because we have assumed that the
candidates arrive in a random order, the first i candidates have appeared
in a random order. Any one of these first i candidates is equally likely to
be the best qualified so far. Candidate i has a probability of 1/ i of being better qualified than candidates 1 through i – 1 and thus a probability of
1/i of being hired. By Lemma 5.1, we conclude that

    E[X_i] = 1/i.

Now we can compute E[X]:

    E[X] = E[ Σ_{i=1}^{n} X_i ]
         = Σ_{i=1}^{n} E[X_i]
         = Σ_{i=1}^{n} 1/i
         = ln n + O(1).    (5.6)
Even though you interview n people, you actually hire only
approximately ln n of them, on average. We summarize this result in the
following lemma.
Lemma 5.2
Assuming that the candidates are presented in a random order,
algorithm HIRE-ASSISTANT has an average-case total hiring cost of
O(c_h ln n).
Proof The bound follows immediately from our definition of the hiring
cost and equation (5.6), which shows that the expected number of hires
is approximately ln n.
▪
The average-case hiring cost is a significant improvement over the
worst-case hiring cost of O(c_h n).
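The ln n behavior is easy to see empirically. The following sketch (our code, repeating the earlier hire-counting loop so that it is self-contained) averages the number of hires over random candidate orders and compares it with ln n.

import math
import random

def count_hires(ranks):
    # Number of times the running best is replaced (lines 5-6).
    best = hires = 0
    for rank in ranks:
        if rank > best:
            best, hires = rank, hires + 1
    return hires

def average_hires(n, trials=1000):
    # Equation (5.6) predicts 1 + 1/2 + ... + 1/n = ln n + O(1).
    total = 0
    for _ in range(trials):
        ranks = list(range(1, n + 1))
        random.shuffle(ranks)
        total += count_hires(ranks)
    return total / trials

n = 1000
print(average_hires(n), math.log(n))   # roughly 7.5 and 6.9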
Exercises
5.2-1
In HIRE-ASSISTANT, assuming that the candidates are presented in a
random order, what is the probability that you hire exactly one time?
What is the probability that you hire exactly n times?
5.2-2
In HIRE-ASSISTANT, assuming that the candidates are presented in a
random order, what is the probability that you hire exactly twice?
5.2-3
Use indicator random variables to compute the expected value of the
sum of n dice.
5.2-4
This exercise asks you to (partly) verify that linearity of expectation
holds even if the random variables are not independent. Consider two 6-
sided dice that are rolled independently. What is the expected value of
the sum? Now consider the case where the first die is rolled normally
and then the second die is set equal to the value shown on the first die.
What is the expected value of the sum? Now consider the case where the
first die is rolled normally and the second die is set equal to 7 minus the
value of the first die. What is the expected value of the sum?
5.2-5
Use indicator random variables to solve the following problem, which is
known as the hat-check problem. Each of n customers gives a hat to a
hat-check person at a restaurant. The hat-check person gives the hats
back to the customers in a random order. What is the expected number
of customers who get back their own hat?
5.2-6
Let A[1 : n] be an array of n distinct numbers. If i < j and A[ i] > A[ j], then the pair ( i, j) is called an inversion of A. (See Problem 2-4 on page 47 for more on inversions.) Suppose that the elements of A form a
uniform random permutation of 〈1, 2, … , n〉. Use indicator random
variables to compute the expected number of inversions.
5.3 Randomized algorithms
In the previous section, we showed how knowing a distribution on the
inputs can help us to analyze the average-case behavior of an algorithm.
What if you do not know the distribution? Then you cannot perform an
average-case analysis. As mentioned in Section 5.1, however, you might be able to use a randomized algorithm.
For a problem such as the hiring problem, in which it is helpful to
assume that all permutations of the input are equally likely, a
probabilistic analysis can guide us when developing a randomized
algorithm. Instead of assuming a distribution of inputs, we impose a distribution. In particular, before running the algorithm, let’s randomly
permute the candidates in order to enforce the property that every
permutation is equally likely. Although we have modified the algorithm,
we still expect to hire a new office assistant approximately ln n times.
But now we expect this to be the case for any input, rather than for inputs drawn from a particular distribution.
Let us further explore the distinction between probabilistic analysis
and randomized algorithms. In Section 5.2, we claimed that, assuming that the candidates arrive in a random order, the expected number of
times you hire a new office assistant is about ln n. This algorithm is deterministic: for any particular input, the number of times a new office
assistant is hired is always the same. Furthermore, the number of times
you hire a new office assistant differs for different inputs, and it depends
on the ranks of the various candidates. Since this number depends only
on the ranks of the candidates, to represent a particular input, we can
just list, in order, the ranks 〈rank(1), rank(2), … , rank(n)〉 of the candidates. Given the rank list A₁ = 〈1, 2, 3, 4, 5, 6, 7, 8, 9, 10〉, a new office assistant is always hired 10 times, since each successive candidate is better than the previous one, and lines 5–6 of HIRE-ASSISTANT are executed in each iteration. Given the list of ranks A₂ = 〈10, 9, 8, 7, 6, 5, 4, 3, 2, 1〉, a new office assistant is hired only once, in the first iteration. Given a list of ranks A₃ = 〈5, 2, 1, 8, 4, 7, 10, 9, 3, 6〉, a new office assistant is hired three times, upon interviewing the candidates with ranks 5, 8, and 10. Recalling that the cost of our algorithm depends on how many times you hire a new office assistant, we see that there are expensive inputs such as A₁, inexpensive inputs such as A₂, and moderately expensive inputs such as A₃.
Consider, on the other hand, the randomized algorithm that first permutes the list of candidates and then determines the best candidate. In this case, we randomize in the algorithm, not in the input distribution. Given a particular input, say A₃ above, we cannot say how many times the maximum is updated, because this quantity differs with each run of the algorithm. The first time you run the algorithm on A₃, it might produce the permutation A₁ and perform 10 updates. But the second time you run the algorithm, it might produce the permutation A₂ and perform only one update. The third time you run the algorithm, it
it might perform some other number of updates. Each time you run the
algorithm, its execution depends on the random choices made and is
likely to differ from the previous execution of the algorithm. For this
algorithm and many other randomized algorithms, no particular input
elicits its worst-case behavior. Even your worst enemy cannot produce a
bad input array, since the random permutation makes the input order
irrelevant. The randomized algorithm performs badly only if the
random-number generator produces an “unlucky” permutation.
For the hiring problem, the only change needed in the code is to
randomly permute the array, as done in the RANDOMIZED-HIRE-
ASSISTANT procedure. This simple change creates a randomized
algorithm whose performance matches that obtained by assuming that
the candidates were presented in a random order.
RANDOMIZED-HIRE-ASSISTANT(n)
1  randomly permute the list of candidates
2  HIRE-ASSISTANT(n)
Lemma 5.3
The expected hiring cost of the procedure RANDOMIZED-HIRE-
ASSISTANT is O( ch ln n).
Proof Permuting the input array achieves a situation identical to that of the probabilistic analysis of HIRE-ASSISTANT in Section 5.2.
▪
By carefully comparing Lemmas 5.2 and 5.3, you can see the
difference between probabilistic analysis and randomized algorithms.
Lemma 5.2 makes an assumption about the input. Lemma 5.3 makes no
such assumption, although randomizing the input takes some
additional time. To remain consistent with our terminology, we couched
Lemma 5.2 in terms of the average-case hiring cost and Lemma 5.3 in
terms of the expected hiring cost. In the remainder of this section, we
discuss some issues involved in randomly permuting inputs.
Randomly permuting arrays
Many randomized algorithms randomize the input by permuting a
given input array. We’ll see elsewhere in this book other ways to
randomize an algorithm, but now, let’s see how we can randomly
permute an array of n elements. The goal is to produce a uniform random permutation, that is, a permutation that is as likely as any other
permutation. Since there are n! possible permutations, we want the
probability that any particular permutation is produced to be 1/ n!.
You might think that to prove that a permutation is a uniform
random permutation, it suffices to show that, for each element A[ i], the probability that the element winds up in position j is 1/ n. Exercise 5.3-4
shows that this weaker condition is, in fact, insufficient.
Our method to generate a random permutation permutes the array
in place: at most a constant number of elements of the input array are
ever stored outside the array. The procedure RANDOMLY-
PERMUTE permutes an array A[1 : n] in place in Θ( n) time. In its i th iteration, it chooses the element A[ i] randomly from among elements A[ i] through A[ n]. After the i th iteration, A[ i] is never altered.
RANDOMLY-PERMUTE(A, n)
1  for i = 1 to n
2      swap A[i] with A[RANDOM(i, n)]
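In Python, the same in-place scheme (often called the Fisher-Yates shuffle) can be sketched as follows, with 0-based indexing in place of the pseudocode’s 1-based indexing.

import random

def randomly_permute(A):
    # Position i is swapped with a uniformly chosen position in
    # A[i : n], mirroring line 2 of RANDOMLY-PERMUTE.
    n = len(A)
    for i in range(n):
        j = random.randint(i, n - 1)   # RANDOM(i, n) in the pseudocode
        A[i], A[j] = A[j], A[i]

A = [1, 2, 3, 4, 5]
randomly_permute(A)
print(A)   # each of the 5! orderings is equally likely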
We use a loop invariant to show that procedure RANDOMLY-
PERMUTE produces a uniform random permutation. A k-permutation
on a set of n elements is a sequence containing k of the n elements, with no repetitions. (See page 1180 in Appendix C.) There are n!/(n − k)! such possible k-permutations.
Lemma 5.4
Procedure RANDOMLY-PERMUTE computes a uniform random
permutation.
Proof We use the following loop invariant:
Just prior to the i th iteration of the for loop of lines 1–2, for each possible ( i – 1)-permutation of the n elements, the subarray
A[1 : i – 1] contains this ( i – 1)-permutation with probability ( n
– i + 1)!/ n!.
We need to show that this invariant is true prior to the first loop
iteration, that each iteration of the loop maintains the invariant, that
the loop terminates, and that the invariant provides a useful property to
show correctness when the loop terminates.
Initialization: Consider the situation just before the first loop iteration,
so that i = 1. The loop invariant says that for each possible 0-
permutation, the subarray A[1 : 0] contains this 0-permutation with
probability ( n – i + 1)!/ n! = n!/ n! = 1. The subarray A[1 : 0] is an empty subarray, and a 0-permutation has no elements. Thus, A[1 : 0]
contains any 0-permutation with probability 1, and the loop invariant
holds prior to the first iteration.
Maintenance: By the loop invariant, we assume that just before the i th
iteration, each possible ( i – 1)-permutation appears in the subarray
A[1 : i – 1] with probability ( n – i + 1)!/ n!. We shall show that after the i th iteration, each possible i-permutation appears in the subarray A[1 : i] with probability ( n – i)!/ n!. Incrementing i for the next iteration then maintains the loop invariant.
Let us examine the i th iteration. Consider a particular i-permutation, and denote the elements in it by 〈 x 1, x 2, … , xi〉. This permutation consists of an ( i – 1)-permutation 〈 x 1, … , xi–1〉 followed by the value xi that the algorithm places in A[ i]. Let E 1 denote the event in which the first i – 1 iterations have created the particular ( i – 1)-permutation
〈 x 1, … , xi–1〉 in A[1 : i – 1]. By the loop invariant, Pr { E 1} = ( n – i +
1)!/ n!. Let E 2 be the event that the i th iteration puts xi in position A[ i].
The i-permutation 〈 x 1, … , xi〉 appears in A[1 : i] precisely when both E 1 and E 2 occur, and so we wish to compute Pr { E 2 ∩ E 1}. Using equation (C.16) on page 1187, we have
Pr { E 2 ∩ E 1} = Pr { E 2 | E 1} Pr { E 1}.
The probability Pr{E_2 | E_1} equals 1/(n − i + 1) because in line 2 the algorithm chooses x_i randomly from the n − i + 1 values in positions A[i : n]. Thus, we have

    Pr{E_2 ∩ E_1} = Pr{E_2 | E_1} Pr{E_1}
                  = (1/(n − i + 1)) · ((n − i + 1)!/n!)
                  = (n − i)!/n!.
Termination: The loop terminates, since it is a for loop iterating n times. At termination, i = n + 1, and we have that the subarray A[1 : n] is a given n-permutation with probability (n – (n + 1) + 1)!/n! = 0!/n! = 1/n!.
Thus, RANDOMLY-PERMUTE produces a uniform random
permutation.
▪
A randomized algorithm is often the simplest and most efficient way
to solve a problem.
Exercises

5.3-1
Professor Marceau objects to the loop invariant used in the proof of
Lemma 5.4. He questions whether it holds prior to the first iteration. He
reasons that we could just as easily declare that an empty subarray
contains no 0-permutations. Therefore, the probability that an empty
subarray contains a 0-permutation should be 0, thus invalidating the
loop invariant prior to the first iteration. Rewrite the procedure
RANDOMLY-PERMUTE so that its associated loop invariant applies
to a nonempty subarray prior to the first iteration, and modify the
proof of Lemma 5.4 for your procedure.
5.3-2
Professor Kelp decides to write a procedure that produces at random
any permutation except the identity permutation, in which every element
ends up where it started. He proposes the procedure PERMUTE-
WITHOUT-IDENTITY. Does this procedure do what Professor Kelp
intends?
PERMUTE-WITHOUT-IDENTITY(A, n)
1  for i = 1 to n – 1
2      swap A[i] with A[RANDOM(i + 1, n)]
5.3-3
Consider the PERMUTE-WITH-ALL procedure below, which instead of swapping element A[i] with a random element from the subarray A[i : n], swaps it with a random element from anywhere in the array. Does PERMUTE-WITH-ALL produce a uniform random permutation? Why or why not?
PERMUTE-WITH-ALL(A, n)
1  for i = 1 to n
2      swap A[i] with A[RANDOM(1, n)]
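One way to investigate questions like this one empirically is to tally the distribution that a shuffling procedure induces on the permutations of a small array. The harness below is only a sketch, assuming a 0-based Python transcription of PERMUTE-WITH-ALL; a uniform shuffler would produce each of the 3! = 6 permutations of a 3-element array about equally often.

    import random
    from collections import Counter

    def permute_with_all(A):
        # PERMUTE-WITH-ALL in 0-based form: swap A[i] with any position.
        n = len(A)
        for i in range(n):
            j = random.randint(0, n - 1)  # RANDOM(1, n), 0-based
            A[i], A[j] = A[j], A[i]

    def empirical_distribution(shuffle, n=3, trials=600000):
        # Count how often each permutation of (0, ..., n - 1) is produced.
        counts = Counter()
        for _ in range(trials):
            A = list(range(n))
            shuffle(A)
            counts[tuple(A)] += 1
        return counts

    # Uniformity would give each of the 6 permutations about 100,000 hits;
    # the counts for permute_with_all deviate noticeably from that.
    counts = empirical_distribution(permute_with_all)
    for perm in sorted(counts):
        print(perm, counts[perm])

The same harness can be pointed at a transcription of PERMUTE-BY-CYCLE from the next exercise.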
5.3-4
Professor Knievel suggests the procedure PERMUTE-BY-CYCLE to generate a uniform random permutation. Show that each element A[i] has a 1/n probability of winding up in any particular position in B. Then show that Professor Knievel is mistaken by showing that the resulting permutation is not uniformly random.
PERMUTE-BY-CYCLE(A, n)
1  let B[1 : n] be a new array
2  offset = RANDOM(1, n)
3  for i = 1 to n
4      dest = i + offset
5      if dest > n
6          dest = dest – n
7      B[dest] = A[i]
8  return B
5.3-5
Professor Gallup wants to create a random sample of the set {1, 2, 3, …, n}, that is, an m-element subset S, where 0 ≤ m ≤ n, such that each m-subset is equally likely to be created. One way is to set A[i] = i, for i = 1, 2, 3, …, n, call RANDOMLY-PERMUTE(A, n), and then take just the first m array elements. This method makes n calls to the RANDOM procedure. In Professor Gallup’s application, n is much larger than m, and so the professor wants to create a random sample with fewer calls to RANDOM.
RANDOM-SAMPLE(m, n)
1  S = ∅
2  for k = n – m + 1 to n      // iterates m times
3      i = RANDOM(1, k)
4      if i ∈ S
5          S = S ∪ {k}
6      else S = S ∪ {i}
7  return S
Show that the procedure RANDOM-SAMPLE above returns a random m-subset S of {1, 2, 3, …, n}, in which each m-subset is equally likely, while making only m calls to RANDOM.
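For experimentation, a Python transcription of RANDOM-SAMPLE might read as follows (a sketch, assuming random.randint plays the role of RANDOM; the returned set contains values from {1, …, n}, so no index translation is needed):

    import random

    def random_sample(m, n):
        # Returns an m-subset of {1, ..., n} using only m calls to the
        # random-number generator, following RANDOM-SAMPLE line by line.
        S = set()
        for k in range(n - m + 1, n + 1):  # iterates m times
            i = random.randint(1, k)       # RANDOM(1, k)
            if i in S:
                S.add(k)
            else:
                S.add(i)
        return S

    print(random_sample(5, 1000))  # a 5-element subset of {1, ..., 1000}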
★ 5.4 Probabilistic analysis and further uses of indicator random variables
This advanced section further illustrates probabilistic analysis by way of
four examples. The first determines the probability that in a room of k
people, two of them share the same birthday. The second example
examines what happens when randomly tossing balls into bins. The
third investigates “streaks” of consecutive heads when flipping coins.
The final example analyzes a variant of the hiring problem in which you
have to make decisions without actually interviewing all the candidates.
5.4.1 The birthday paradox
Our first example is the birthday paradox. How many people must there
be in a room before there is a 50% chance that two of them were born
on the same day of the year? The answer is surprisingly few. The
paradox is that it is in fact far fewer than the number of days in a year,
or even half the number of days in a year, as we shall see.
To answer this question, we index the people in the room with the
integers 1, 2, … , k, where k is the number of people in the room. We
ignore the issue of leap years and assume that all years have n = 365
days. For i = 1, 2, …, k, let bi be the day of the year on which person i’s birthday falls, where 1 ≤ bi ≤ n. We also assume that birthdays are uniformly distributed across the n days of the year, so that Pr {bi = r} = 1/n for i = 1, 2, …, k and r = 1, 2, …, n.
The probability that two given people, say i and j, have matching birthdays depends on whether the random selection of birthdays is
independent. We assume from now on that birthdays are independent,
so that the probability that i’s birthday and j’s birthday both fall on day r is

Pr {bi = r and bj = r} = Pr {bi = r} Pr {bj = r}
                       = 1/n².

Thus, the probability that they both fall on the same day is

Pr {bi = bj} = Σ_{r=1}^{n} Pr {bi = r and bj = r}
             = Σ_{r=1}^{n} 1/n²
             = 1/n.                                    (5.7)
More intuitively, once bi is chosen, the probability that bj is chosen to be the same day is 1/ n. As long as the birthdays are independent, the
probability that i and j have the same birthday is the same as the probability that the birthday of one of them falls on a given day.
We can analyze the probability of at least 2 out of k people having
matching birthdays by looking at the complementary event. The
probability that at least two of the birthdays match is 1 minus the
probability that all the birthdays are different. The event Bk that k people have distinct birthdays is

Bk = A1 ∩ A2 ∩ ⋯ ∩ Ak,

where Ai is the event that person i’s birthday is different from person j’s for all j < i. Since we can write Bk = Ak ∩ Bk–1, we obtain from equation (C.18) on page 1189 the recurrence

Pr {Bk} = Pr {Bk–1} Pr {Ak | Bk–1},                    (5.8)

where we take Pr {B1} = Pr {A1} = 1 as an initial condition. In other words, the probability that b1, b2, …, bk are distinct birthdays equals the probability that b1, b2, …, bk–1 are distinct birthdays multiplied by the probability that bk ≠ bi for i = 1, 2, …, k – 1, given that b1, b2, …, bk–1 are distinct.
If b1, b2, …, bk–1 are distinct, the conditional probability that bk ≠ bi for i = 1, 2, …, k – 1 is Pr {Ak | Bk–1} = (n – k + 1)/n, since out of the n days, n – (k – 1) days are not taken. We iteratively apply the recurrence (5.8) to obtain

Pr {Bk} = Pr {Bk–1} Pr {Ak | Bk–1}
        = Pr {B1} Pr {A2 | B1} Pr {A3 | B2} ⋯ Pr {Ak | Bk–1}
        = 1 · ((n – 1)/n) ((n – 2)/n) ⋯ ((n – k + 1)/n).

Inequality (3.14) on page 66, 1 + x ≤ e^x, gives us

Pr {Bk} ≤ e^{–1/n} e^{–2/n} ⋯ e^{–(k–1)/n} = e^{–k(k–1)/2n} ≤ 1/2

when –k(k – 1)/2n ≤ ln(1/2). The probability that all k birthdays are distinct is at most 1/2 when k(k – 1) ≥ 2n ln 2 or, solving the quadratic equation, when k ≥ (1 + √(1 + (8 ln 2)n))/2. For n = 365, we must have k ≥ 23. Thus, if at least 23 people are in a room, the probability is at least
1/2 that at least two people have the same birthday. Since a year on
Mars is 669 Martian days long, it takes 31 Martians to get the same
effect.
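To see numerically where these thresholds come from, one can iterate recurrence (5.8) exactly. The Python sketch below is an illustration, not part of the analysis above; it finds the smallest k for which Pr {Bk} drops to 1/2 or below.

    def prob_all_distinct(k, n):
        # Pr{B_k} via recurrence (5.8): multiply in (n - i)/n for each
        # new person whose birthday must avoid the first i birthdays.
        p = 1.0
        for i in range(1, k):
            p *= (n - i) / n
        return p

    for n in (365, 669):  # Earth, then Mars
        k = 1
        while prob_all_distinct(k, n) > 0.5:
            k += 1
        print(n, k)  # prints 365 23, then 669 31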
An analysis using indicator random variables
Indicator random variables afford a simpler but approximate analysis of
the birthday paradox. For each pair (i, j) of the k people in the room, define the indicator random variable Xij, for 1 ≤ i < j ≤ k, by

Xij = I {person i and person j have the same birthday}
    = 1 if person i and person j have the same birthday,
      0 otherwise.

By equation (5.7), the probability that two people have matching birthdays is 1/n, and thus by Lemma 5.1 on page 130, we have

E [Xij] = Pr {person i and person j have the same birthday}
        = 1/n.
Letting X be the random variable that counts the number of pairs of individuals having the same birthday, we have

X = Σ_{i=1}^{k} Σ_{j=i+1}^{k} Xij.

Taking expectations of both sides and applying linearity of expectation, we obtain

E [X] = Σ_{i=1}^{k} Σ_{j=i+1}^{k} E [Xij]
      = (k(k – 1)/2) (1/n)
      = k(k – 1)/2n.
When k(k – 1) ≥ 2n, therefore, the expected number of pairs of people with the same birthday is at least 1. Thus, if we have at least √(2n) + 1 individuals in a room, we can expect at least two to have the same birthday. For n = 365, if k = 28, the expected number of pairs with the same birthday is (28 · 27)/(2 · 365) ≈ 1.0356. Thus, with at least 28
people, we expect to find at least one matching pair of birthdays. On
Mars, with 669 days per year, we need at least 38 Martians.
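A quick check of this threshold: the smallest k with k(k – 1) ≥ 2n can be found directly (again a sketch for illustration only).

    def people_for_expected_pair(n):
        # Smallest k with k(k - 1) >= 2n, i.e., E[X] = k(k - 1)/(2n) >= 1.
        k = 1
        while k * (k - 1) < 2 * n:
            k += 1
        return k

    print(people_for_expected_pair(365))  # 28
    print(people_for_expected_pair(669))  # 38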
The first analysis, which used only probabilities, determined the
number of people required for the probability to exceed 1/2 that a
matching pair of birthdays exists, and the second analysis, which used
indicator random variables, determined the number such that the
expected number of matching birthdays is 1. Although the exact
numbers of people differ for the two situations, they are the same
asymptotically: Θ(√n).
5.4.2 Balls and bins
Consider a process in which you randomly toss identical balls into b
bins, numbered 1, 2, … , b. The tosses are independent, and on each toss the ball is equally likely to end up in any bin. The probability that a
tossed ball lands in any given bin is 1/ b. If we view the ball-tossing process as a sequence of Bernoulli trials (see Appendix C.4), where success means that the ball falls in the given bin, then each trial has a
probability 1/ b of success. This model is particularly useful for analyzing
hashing (see Chapter 11), and we can answer a variety of interesting questions about the ball-tossing process. (Problem C-2 asks additional
questions about balls and bins.)
How many balls fall in a given bin? If you toss n balls, the number that fall in a given bin follows the binomial distribution b(k; n, 1/b), and equation (C.41) on page 1199 tells us that the expected number of balls that fall in the given bin is n/b.
How many balls must you toss, on the average, until a given bin
contains a ball? The number of tosses until the given bin receives a
ball follows the geometric distribution with probability 1/ b and,
by equation (C.36) on page 1197, the expected number of tosses
until success is 1/(1/ b) = b.
How many balls must you toss until every bin contains at least one
ball? Let us call a toss in which a ball falls into an empty bin a
“hit.” We want to know the expected number n of tosses required
to get b hits.

Using the hits, we can partition the n tosses into stages. The ith stage consists of the tosses after the (i – 1)st hit up to and including the ith hit. The first stage consists of the first toss, since you are guaranteed to have a hit when all bins are empty. For each toss during the ith stage, i – 1 bins contain balls and b – i + 1 bins are empty. Thus, for each toss in the ith stage, the probability of obtaining a hit is (b – i + 1)/b.
Let ni denote the number of tosses in the ith stage. The number of tosses required to get b hits is

n = Σ_{i=1}^{b} ni.

Each random variable ni has a geometric distribution with probability of success (b – i + 1)/b and thus, by equation (C.36), we have

E [ni] = b/(b – i + 1).
By linearity of expectation, we have

E [n] = E [Σ_{i=1}^{b} ni]
      = Σ_{i=1}^{b} E [ni]
      = Σ_{i=1}^{b} b/(b – i + 1)
      = b Σ_{i=1}^{b} 1/i
      = b(ln b + O(1)),

where the last equality uses the harmonic-series bound Σ_{i=1}^{b} 1/i = ln b + O(1).
It therefore takes approximately b ln b tosses before we can expect
that every bin has a ball. This problem is also known as the
coupon collector’s problem, which says that if you are trying to
collect each of b different coupons, then you should expect to
acquire approximately b ln b randomly obtained coupons in order
to succeed.
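To make the b ln b estimate concrete, the following sketch compares the exact expectation b(1 + 1/2 + ⋯ + 1/b) with an average over simulated runs of the ball-tossing process (the function names are illustrative, not from the text).

    import random

    def expected_tosses_exact(b):
        # E[n] = b * H_b, the harmonic-number form derived above.
        return b * sum(1 / i for i in range(1, b + 1))

    def simulated_tosses(b):
        # Toss balls uniformly into b bins until no bin is empty.
        hit = [False] * b
        empty, tosses = b, 0
        while empty > 0:
            tosses += 1
            j = random.randrange(b)
            if not hit[j]:
                hit[j] = True
                empty -= 1
        return tosses

    b = 100
    print(expected_tosses_exact(b))   # about 518.7; compare b ln b = 460.5
    print(sum(simulated_tosses(b) for _ in range(1000)) / 1000)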
5.4.3 Streaks
Suppose that you flip a fair coin n times. What is the longest streak of
consecutive heads that you expect to see? We’ll prove upper and lower
bounds separately to show that the answer is Θ(lg n).
We first prove that the expected length of the longest streak of heads is O(lg n). The probability that each coin flip is a head is 1/2. Let Aik be the event that a streak of heads of length at least k begins with the ith coin flip or, more precisely, the event that the k consecutive coin flips i, i + 1, …, i + k – 1 yield only heads, where 1 ≤ k ≤ n and 1 ≤ i ≤ n – k + 1.