lg lg n = lg(lg n) (composition).
We adopt the following notational convention: in the absence of parentheses, a logarithm function applies only to the next term in the formula, so that lg n + 1 means (lg n) + 1 and not lg(n + 1).
For any constant b > 1, the function log_b n is undefined if n ≤ 0, strictly increasing if n > 0, negative if 0 < n < 1, positive if n > 1, and 0 if n = 1. For all real a > 0, b > 0, c > 0, and n, we have

a = b^{log_b a},                      (3.15)
log_c(ab) = log_c a + log_c b,        (3.16)
log_b a^n = n log_b a,                (3.17)
log_b(1/a) = −log_b a,                (3.18)
log_b a = log_c a / log_c b,          (3.19)
log_b a = 1 / log_a b,                (3.20)
a^{log_b c} = c^{log_b a},            (3.21)

where, in each equation above, logarithm bases are not 1.
By equation (3.19), changing the base of a logarithm from one
constant to another changes the value of the logarithm by only a
constant factor. Consequently, we often use the notation “lg n” when we
don’t care about constant factors, such as in O-notation. Computer
scientists find 2 to be the most natural base for logarithms because so
many algorithms and data structures involve splitting a problem into
two parts.
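To see the constant-factor claim concretely, here is a quick numeric check in Python (our illustration, not part of the text): changing the base from 2 to 10 scales every logarithm by the fixed factor 1/lg 10.

import math

# Changing the base of a logarithm changes its value by only a
# constant factor: log10 n = (lg n) / (lg 10) for all n > 0.
for n in [2, 100, 10**6]:
    assert math.isclose(math.log10(n), math.log2(n) / math.log2(10))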
There is a simple series expansion for ln(1 + x) when |x| < 1:

ln(1 + x) = x − x^2/2 + x^3/3 − x^4/4 + ⋯.
We also have the following inequalities for x > −1:

x/(1 + x) ≤ ln(1 + x) ≤ x,
where equality holds only for x = 0.
We say that a function f(n) is polylogarithmically bounded if f(n) = O(lg^k n) for some constant k. We can relate the growth of polynomials and polylogarithms by substituting lg n for n and 2^a for a in equation (3.13). For all real constants a > 0 and b, we have

lg^b n = o(n^a).
Thus, any positive polynomial function grows faster than any
polylogarithmic function.




Factorials
The notation n! (read "n factorial") is defined for integers n ≥ 0 as

n! = 1                if n = 0,
n! = n · (n − 1)!     if n > 0.

Thus, n! = 1 · 2 · 3 ⋯ n.
A weak upper bound on the factorial function is n! ≤ n^n, since each of the n terms in the factorial product is at most n. Stirling's approximation,

n! = √(2πn) (n/e)^n (1 + Θ(1/n)),

where e is the base of the natural logarithm, gives us a tighter upper bound, and a lower bound as well. Exercise 3.3-4 asks you to prove the three facts

n! = o(n^n),            (3.26)
n! = ω(2^n),            (3.27)
lg(n!) = Θ(n lg n),     (3.28)

where Stirling's approximation is helpful in proving equation (3.28). The following equation also holds for all n ≥ 1:

n! = √(2πn) (n/e)^n e^{α_n},     (3.29)

where

1/(12n + 1) < α_n < 1/(12n).
Functional iteration
We use the notation f^{(i)}(n) to denote the function f(n) iteratively applied i times to an initial value of n. Formally, let f(n) be a function over the reals. For nonnegative integers i, we recursively define

f^{(i)}(n) = n                      if i = 0,
f^{(i)}(n) = f(f^{(i−1)}(n))        if i > 0.

For example, if f(n) = 2n, then f^{(i)}(n) = 2^i n.
The iterated logarithm function
We use the notation lg* n (read "log star of n") to denote the iterated logarithm, defined as follows. Let lg^{(i)} n be as defined above, with f(n) = lg n. Because the logarithm of a nonpositive number is undefined, lg^{(i)} n is defined only if lg^{(i−1)} n > 0. Be sure to distinguish lg^{(i)} n (the logarithm function applied i times in succession, starting with argument n) from lg^i n (the logarithm of n raised to the ith power). Then we define the iterated logarithm function as

lg* n = min {i ≥ 0 : lg^{(i)} n ≤ 1}.
The iterated logarithm is a very slowly growing function:

lg* 2 = 1,
lg* 4 = 2,
lg* 16 = 3,
lg* 65536 = 4,
lg* (2^65536) = 5.

Since the number of atoms in the observable universe is estimated to be about 10^80, which is much less than 2^65536 = 10^{65536/lg 10} ≈ 10^{19,728}, we rarely encounter an input size n for which lg* n > 5.
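To make the definition concrete, here is a minimal Python sketch (ours, not the book's) of the iterated logarithm:

import math

def lg_star(n):
    """Iterated logarithm: how many times lg must be applied
    to n before the result drops to 1 or below."""
    count = 0
    while n > 1:
        n = math.log2(n)
        count += 1
    return count

# Matches the values above: lg_star(16) == 3 and lg_star(65536) == 4.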
Fibonacci numbers
We define the Fibonacci numbers F_i, for i ≥ 0, as follows:

F_0 = 0,
F_1 = 1,
F_i = F_{i−1} + F_{i−2}     for i ≥ 2.
Thus, after the first two, each Fibonacci number is the sum of the two
previous ones, yielding the sequence
0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, ….
Fibonacci numbers are related to the golden ratio ϕ and its conjugate ϕ̂, which are the two roots of the equation

x^2 = x + 1.

As Exercise 3.3-7 asks you to prove, the golden ratio is given by

ϕ = (1 + √5)/2 = 1.61803…,

and its conjugate, by

ϕ̂ = (1 − √5)/2 = −0.61803….

Specifically, we have

F_i = (ϕ^i − ϕ̂^i)/√5,

which can be proved by induction (Exercise 3.3-8). Since |ϕ̂| < 1, we have

|ϕ̂^i|/√5 < 1/√5 < 1/2,

which implies that

F_i = ⌊ϕ^i/√5 + 1/2⌋,

which is to say that the ith Fibonacci number F_i is equal to ϕ^i/√5 rounded to the nearest integer. Thus, Fibonacci numbers grow exponentially.
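A short Python check (our illustration) of the rounding formula against the defining recurrence for small i:

import math

PHI = (1 + math.sqrt(5)) / 2  # the golden ratio

def fib(i):
    """Fibonacci numbers computed from the defining recurrence."""
    a, b = 0, 1
    for _ in range(i):
        a, b = b, a + b
    return a

# F_i equals phi^i / sqrt(5) rounded to the nearest integer.
for i in range(30):
    assert fib(i) == math.floor(PHI**i / math.sqrt(5) + 0.5)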
Exercises
3.3-1
Show that if f ( n) and g( n) are monotonically increasing functions, then so are the functions f ( n) + g( n) and f ( g( n)), and if f ( n) and g( n) are in addition nonnegative, then f ( n) · g( n) is monotonically increasing.
3.3-2
Prove that ⌊αn⌋ + ⌈(1 − α)n⌉ = n for any integer n and real number α in the range 0 ≤ α ≤ 1.
3.3-3
Use equation (3.14) or other means to show that (n + o(n))^k = Θ(n^k) for any real constant k. Conclude that ⌈n⌉^k = Θ(n^k) and ⌊n⌋^k = Θ(n^k).
3.3-4
Prove the following:
a. Equation (3.21).
b. Equations (3.26)–(3.28).
c. lg(Θ( n)) = Θ(lg n).
★ 3.3-5
Is the function ⌈lg n⌉! polynomially bounded? Is the function ⌈lg lg n⌉!
polynomially bounded?
★ 3.3-6
Which is asymptotically larger: lg(lg* n) or lg*(lg n)?



3.3-7
Show that the golden ratio ϕ and its conjugate ϕ̂ both satisfy the equation x^2 = x + 1.
3.3-8
Prove by induction that the ith Fibonacci number satisfies the equation

F_i = (ϕ^i − ϕ̂^i)/√5,

where ϕ is the golden ratio and ϕ̂ is its conjugate.
3.3-9
Show that k lg k = Θ( n) implies k = Θ( n/lg n).
Problems
3-1 Asymptotic behavior of polynomials
Let

p(n) = Σ_{i=0}^{d} a_i n^i,

where a_d > 0, be a degree-d polynomial in n, and let k be a constant.
Use the definitions of the asymptotic notations to prove the following
properties.
a. If k ≥ d, then p(n) = O(n^k).
b. If k ≤ d, then p(n) = Ω(n^k).
c. If k = d, then p(n) = Θ(n^k).
d. If k > d, then p(n) = o(n^k).
e. If k < d, then p(n) = ω(n^k).
3-2 Relative asymptotic growths
Indicate, for each pair of expressions (A, B) in the table below, whether A is O, o, Ω, ω, or Θ of B. Assume that k ≥ 1, ϵ > 0, and c > 1 are constants. Write your answer in the form of the table with "yes" or "no" written in each box.

        A             B
a.   lg^k n         n^ϵ
b.   n^k            c^n
c.   √n             n^{sin n}
d.   2^n            2^{n/2}
e.   n^{lg c}       c^{lg n}
f.   lg(n!)         lg(n^n)
3-3 Ordering by asymptotic growth rates
a. Rank the following functions by order of growth. That is, find an arrangement g_1, g_2, … , g_30 of the functions satisfying g_1 = Ω(g_2), g_2 = Ω(g_3), … , g_29 = Ω(g_30). Partition your list into equivalence classes such that functions f(n) and g(n) belong to the same class if and only if f(n) = Θ(g(n)).
lg(lg* n)     2^{lg* n}        (√2)^{lg n}     n^2            n!            (lg n)!
(3/2)^n       n^3              lg^2 n          lg(n!)         2^{2^n}       n^{1/lg n}
ln ln n       lg* n            n · 2^n         n^{lg lg n}    ln n          1
2^{lg n}      (lg n)^{lg n}    e^n             4^{lg n}       (n + 1)!      √(lg n)
lg*(lg n)     2^{√(2 lg n)}    n               2^n            n lg n        2^{2^{n+1}}
b. Give an example of a single nonnegative function f(n) such that for all functions g_i(n) in part (a), f(n) is neither O(g_i(n)) nor Ω(g_i(n)).

3-4 Asymptotic notation properties
Let f ( n) and g( n) be asymptotically positive functions. Prove or disprove each of the following conjectures.
a. f ( n) = O( g( n)) implies g( n) = O( f ( n)).
b. f ( n) + g( n) = Θ(min { f ( n), g( n)}).
c. f ( n) = O( g( n)) implies lg f ( n) = O(lg g( n)), where lg g( n) ≥ 1 and f ( n)
≥ 1 for all sufficiently large n.
d. f(n) = O(g(n)) implies 2^{f(n)} = O(2^{g(n)}).
e. f(n) = O((f(n))^2).
f. f ( n) = O( g( n)) implies g( n) = Ω( f ( n)).
g. f ( n) = Θ( f ( n/2)).
h. f ( n) + o( f ( n)) = Θ( f ( n)).
3-5 Manipulating asymptotic notation
Let f ( n) and g( n) be asymptotically positive functions. Prove the following identities:
a. Θ(Θ( f ( n))) = Θ( f ( n)).
b. Θ( f ( n)) + O( f ( n)) = Θ( f ( n)).
c. Θ( f ( n)) + Θ( g( n)) = Θ( f ( n) + g( n)).
d. Θ( f ( n)) · Θ( g( n)) = Θ( f ( n) · g( n)).
e. Argue that for any real constants a_1, b_1 > 0 and integer constants k_1, k_2, the following asymptotic bound holds:

(a_1 n)^{k_1} lg^{k_2}(b_1 n) = Θ(n^{k_1} lg^{k_2} n).
★ f. Prove that for S ⊆ Z, we have

Σ_{k∈S} Θ(f(k)) = Θ( Σ_{k∈S} f(k) ),

assuming that both sums converge.
★ g. Show that for S ⊆ Z, the following asymptotic bound does not necessarily hold, even assuming that both products converge, by giving a counterexample:

Π_{k∈S} Θ(f(k)) = Θ( Π_{k∈S} f(k) ).
3-6 Variations on O and Ω
Some authors define Ω-notation in a slightly different way than this
textbook does. We'll use the nomenclature Ω^∞ (read "omega infinity") for this alternative definition. We say that f(n) = Ω^∞(g(n)) if there exists a positive constant c such that f(n) ≥ cg(n) ≥ 0 for infinitely many integers n.
a. Show that for any two asymptotically nonnegative functions f(n) and g(n), we have f(n) = O(g(n)) or f(n) = Ω^∞(g(n)) (or both).
b. Show that there exist two asymptotically nonnegative functions f ( n) and g( n) for which neither f ( n) = O( g( n)) nor f ( n) = Ω( g( n)) holds.
c. Describe the potential advantages and disadvantages of using Ω^∞-
notation instead of Ω-notation to characterize the running times of
programs.
Some authors also define O in a slightly different manner. We’ll use O′
for the alternative definition: f ( n) = O′( g( n)) if and only if | f ( n)| =
O( g( n)).
d. What happens to each direction of the “if and only if” in Theorem
3.1 on page 56 if we substitute O′ for O but still use Ω?
Some authors define Õ (read "soft-oh") to mean O with logarithmic factors ignored:

Õ(g(n)) = {f(n) : there exist positive constants c, k, and n_0 such that 0 ≤ f(n) ≤ cg(n) lg^k(n) for all n ≥ n_0}.
e. Define Ω̃ (soft-omega) and Θ̃ (soft-theta) in a similar manner. Prove the corresponding analog to Theorem 3.1.
3-7 Iterated functions
We can apply the iteration operator * used in the lg* function to any monotonically increasing function f(n) over the reals. For a given constant c ∈ R, we define the iterated function f*_c by

f*_c(n) = min {i ≥ 0 : f^{(i)}(n) ≤ c},

which need not be well defined in all cases. In other words, the quantity f*_c(n) is the minimum number of iterated applications of the function f required to reduce its argument down to c or less.
For each of the functions f(n) and constants c in the table below, give as tight a bound as possible on f*_c(n). If there is no i such that f^{(i)}(n) ≤ c, write "undefined" as your answer.
      f(n)       c
a.   n − 1      0
b.   lg n       1
c.   n/2        1
d.   n/2        2
e.   √n         2
f.   √n         1
g.   n^{1/3}    2
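As an informal aid for this problem, here is a small Python sketch (ours; the function and parameter names are hypothetical) of the iterated-function operator f*_c:

def f_star(f, n, c, limit=10**6):
    """Minimum number of applications of f needed to drive n down
    to c or less; returns None if that never happens within
    `limit` iterations (suggesting f*_c(n) is undefined)."""
    count = 0
    while n > c:
        n = f(n)
        count += 1
        if count > limit:
            return None
    return count

# Example: f(n) = n/2 with c = 1 takes about lg n applications.
print(f_star(lambda n: n / 2, 1024, 1))  # prints 10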
Chapter notes
Knuth [259] traces the origin of the O-notation to a number-theory text by P. Bachmann in 1892. The o-notation was invented by E. Landau in


1909 for his discussion of the distribution of prime numbers. The Ω and
Θ notations were advocated by Knuth [265] to correct the popular, but technically sloppy, practice in the literature of using O-notation for both
upper and lower bounds. As noted earlier in this chapter, many people
continue to use the O-notation where the Θ-notation is more technically
precise. The soft-oh notation in Problem 3-6 was introduced by Babai,
Luks, and Seress [31], although it was originally written as O~. Some authors now define Õ(g(n)) as ignoring factors that are logarithmic in g(n), rather than in n. With this definition, we can say that n · 2^n = Õ(2^n), but with the definition in Problem 3-6, this statement is not true.
Further discussion of the history and development of asymptotic
notations appears in works by Knuth [259, 265] and Brassard and Bratley [70].
Not all authors define the asymptotic notations in the same way,
although the various definitions agree in most common situations. Some
of the alternative definitions encompass functions that are not
asymptotically nonnegative, as long as their absolute values are
appropriately bounded.
Equation (3.29) is due to Robbins [381]. Other properties of elementary mathematical functions can be found in any good
mathematical reference, such as Abramowitz and Stegun [1] or Zwillinger [468], or in a calculus book, such as Apostol [19] or Thomas et al. [433]. Knuth [259] and Graham, Knuth, and Patashnik [199]
contain a wealth of material on discrete mathematics as used in
computer science.
1 Within set notation, a colon means “such that.”
The divide-and-conquer method is a powerful strategy for designing
asymptotically efficient algorithms. We saw an example of divide-and-
conquer in Section 2.3.1 when learning about merge sort. In this chapter, we’ll explore applications of the divide-and-conquer method
and acquire valuable mathematical tools that you can use to solve the
recurrences that arise when analyzing divide-and-conquer algorithms.
Recall that for divide-and-conquer, you solve a given problem
(instance) recursively. If the problem is small enough—the base case—
you just solve it directly without recursing. Otherwise—the recursive
case—you perform three characteristic steps:
Divide the problem into one or more subproblems that are smaller
instances of the same problem.
Conquer the subproblems by solving them recursively.
Combine the subproblem solutions to form a solution to the original
problem.
A divide-and-conquer algorithm breaks down a large problem into
smaller subproblems, which themselves may be broken down into even
smaller subproblems, and so forth. The recursion bottoms out when it
reaches a base case and the subproblem is small enough to solve directly
without further recursing.
Recurrences
To analyze recursive divide-and-conquer algorithms, we’ll need some mathematical tools. A recurrence is an equation that describes a
function in terms of its value on other, typically smaller, arguments.
Recurrences go hand in hand with the divide-and-conquer method
because they give us a natural way to characterize the running times of
recursive algorithms mathematically. You saw an example of a
recurrence in Section 2.3.2 when we analyzed the worst-case running time of merge sort.
For the divide-and-conquer matrix-multiplication algorithms
presented in Sections 4.1 and 4.2, we’ll derive recurrences that describe their worst-case running times. To understand why these two divide-and-conquer algorithms perform the way they do, you’ll need to learn
how to solve the recurrences that describe their running times. Sections
4.3–4.7 teach several methods for solving recurrences. These sections
also explore the mathematics behind recurrences, which can give you
stronger intuition for designing your own divide-and-conquer
algorithms.
We want to get to the algorithms as soon as possible. So, let’s just
cover a few recurrence basics now, and then we’ll look more deeply at
recurrences, especially how to solve them, after we see the matrix-
multiplication examples.
The general form of a recurrence is an equation or inequality that
describes a function over the integers or reals using the function itself. It
contains two or more cases, depending on the argument. If a case
involves the recursive invocation of the function on different (usually
smaller) inputs, it is a recursive case. If a case does not involve a recursive invocation, it is a base case. There may be zero, one, or many
functions that satisfy the statement of the recurrence. The recurrence is
well defined if there is at least one function that satisfies it, and ill defined otherwise.
Algorithmic recurrences
We’ll be particularly interested in recurrences that describe the running
times of divide-and-conquer algorithms. A recurrence T ( n) is
algorithmic if, for every sufficiently large threshold constant n_0 > 0, the following two properties hold:
1. For all n < n_0, we have T(n) = Θ(1).
2. For all n ≥ n_0, every path of recursion terminates in a defined base case within a finite number of recursive invocations.
Similar to how we sometimes abuse asymptotic notation (see page 60),
when a function is not defined for all arguments, we understand that
this definition is constrained to values of n for which T ( n) is defined.
Why would a recurrence T ( n) that represents a (correct) divide-and-
conquer algorithm’s worst-case running time satisfy these properties for
all sufficiently large threshold constants? The first property says that
there exist constants c_1, c_2 such that 0 < c_1 ≤ T(n) ≤ c_2 for n < n_0. For every legal input, the algorithm must output the solution to the problem it's solving in finite time (see Section 1.1). Thus we can let c_1 be the minimum amount of time to call and return from a procedure, which
must be positive, because machine instructions need to be executed to
invoke a procedure. The running time of the algorithm may not be
defined for some values of n if there are no legal inputs of that size, but
it must be defined for at least one, or else the “algorithm” doesn’t solve
any problem. Thus we can let c_2 be the algorithm's maximum running time on any input of size n < n_0, where n_0 is sufficiently large that the algorithm solves at least one problem of size less than n_0. The maximum is well defined, since there are at most a finite number of inputs of size less than n_0, and there is at least one if n_0 is sufficiently large. Consequently, T(n) satisfies the first property. If the second property fails to hold for T(n), then the algorithm isn't correct, because it would end up in an infinite recursive loop or otherwise fail to
compute a solution. Thus, it stands to reason that a recurrence for the
worst-case running time of a correct divide-and-conquer algorithm
would be algorithmic.
Conventions for recurrences
We adopt the following convention:
Whenever a recurrence is stated without an explicit base case, we
assume that the recurrence is algorithmic.
That means you’re free to pick any sufficiently large threshold constant
n 0 for the range of base cases where T ( n) = Θ(1). Interestingly, the asymptotic solutions of most algorithmic recurrences you’re likely to see
when analyzing algorithms don’t depend on the choice of threshold
constant, as long as it’s large enough to make the recurrence well
defined.
Asymptotic solutions of algorithmic divide-and-conquer recurrences
also don’t tend to change when we drop any floors or ceilings in a
recurrence defined on the integers to convert it to a recurrence defined
on the reals. Section 4.7 gives a sufficient condition for ignoring floors and ceilings that applies to most of the divide-and-conquer recurrences
you’re likely to see. Consequently, we’ll frequently state algorithmic
recurrences without floors and ceilings. Doing so generally simplifies the
statement of the recurrences, as well as any math that we do with them.
You may sometimes see recurrences that are not equations, but
rather inequalities, such as T ( n) ≤ 2 T ( n/2) + Θ( n). Because such a recurrence states only an upper bound on T ( n), we express its solution using O-notation rather than Θ-notation. Similarly, if the inequality is
reversed to T ( n) ≥ 2 T ( n/2) + Θ( n), then, because the recurrence gives only a lower bound on T ( n), we use Ω-notation in its solution.
Divide-and-conquer and recurrences
This chapter illustrates the divide-and-conquer method by presenting
and using recurrences to analyze two divide-and-conquer algorithms for
multiplying n × n matrices. Section 4.1 presents a simple divide-and-conquer algorithm that solves a matrix-multiplication problem of size n
by breaking it into eight subproblems of size n/2, which it then solves recursively. The running time of the algorithm can be characterized by
the recurrence
T(n) = 8T(n/2) + Θ(1),

which turns out to have the solution T(n) = Θ(n^3). Although this divide-and-conquer algorithm is no faster than the straightforward
method that uses a triply nested loop, it leads to an asymptotically
faster divide-and-conquer algorithm due to V. Strassen, which we’ll
explore in Section 4.2. Strassen's remarkable algorithm divides a problem of size n into seven subproblems of size n/2, which it solves recursively. The running time of Strassen's algorithm can be described by the recurrence

T(n) = 7T(n/2) + Θ(n^2),

which has the solution T(n) = Θ(n^{lg 7}) = O(n^{2.81}). Strassen's algorithm beats the straightforward looping method asymptotically.
These two divide-and-conquer algorithms both break a problem of
size n into several subproblems of size n/2. Although it is common when using divide-and-conquer for all the subproblems to have the same size,
that isn’t always the case. Sometimes it’s productive to divide a problem
of size n into subproblems of different sizes, and then the recurrence describing the running time reflects the irregularity. For example,
consider a divide-and-conquer algorithm that divides a problem of size
n into one subproblem of size n/3 and another of size 2 n/3, taking Θ( n) time to divide the problem and combine the solutions to the
subproblems. Then the algorithm’s running time can be described by the
recurrence
T(n) = T(n/3) + T(2n/3) + Θ(n),

which turns out to have solution T(n) = Θ(n lg n). We'll even see an algorithm in Chapter 9 that solves a problem of size n by recursively solving a subproblem of size n/5 and another of size 7n/10, taking Θ(n) time for the divide and combine steps. Its performance satisfies the recurrence

T(n) = T(n/5) + T(7n/10) + Θ(n),

which has solution T(n) = Θ(n).
Although divide-and-conquer algorithms usually create subproblems
with sizes a constant fraction of the original problem size, that’s not
always the case. For example, a recursive version of linear search (see Exercise 2.1-4) creates just one subproblem, with one element less than
the original problem. Each recursive call takes constant time plus the
time to recursively solve a subproblem with one less element, leading to
the recurrence
T ( n) = T ( n – 1) + Θ(1),
which has solution T ( n) = Θ( n). Nevertheless, the vast majority of efficient divide-and-conquer algorithms solve subproblems that are a
constant fraction of the size of the original problem, which is where
we’ll focus our efforts.
Solving recurrences
After learning about divide-and-conquer algorithms for matrix
multiplication in Sections 4.1 and 4.2, we’ll explore several mathematical tools for solving recurrences—that is, for obtaining
asymptotic Θ-, O-, or Ω-bounds on their solutions. We want simple-to-
use tools that can handle the most commonly occurring situations. But
we also want general tools that work, perhaps with a little more effort,
for less common cases. This chapter offers four methods for solving
recurrences:
In the substitution method (Section 4.3), you guess the form of a bound and then use mathematical induction to prove your guess
correct and solve for constants. This method is perhaps the most
robust method for solving recurrences, but it also requires you to
make a good guess and to produce an inductive proof.
The recursion-tree method (Section 4.4) models the recurrence as a tree whose nodes represent the costs incurred at various levels of
the recursion. To solve the recurrence, you determine the costs at
each level and add them up, perhaps using techniques for
bounding summations from Section A.2. Even if you don’t use this method to formally prove a bound, it can be helpful in
guessing the form of the bound for use in the substitution
method.
The master method (Sections 4.5 and 4.6) is the easiest method, when it applies. It provides bounds for recurrences of the form
T ( n) = aT ( n/ b) + f ( n),
where a > 0 and b > 1 are constants and f ( n) is a given “driving”
function. This type of recurrence tends to arise more frequently in
the study of algorithms than any other. It characterizes a divide-
and-conquer algorithm that creates a subproblems, each of which
is 1/ b times the size of the original problem, using f ( n) time for the divide and combine steps. To apply the master method, you
need to memorize three cases, but once you do, you can easily determine asymptotic bounds on running times for many divide-and-conquer algorithms. (A short sketch after this list illustrates the three cases for polynomial driving functions.)
The Akra-Bazzi method (Section 4.7) is a general method for solving divide-and-conquer recurrences. Although it involves
calculus, it can be used to attack more complicated recurrences
than those addressed by the master method.
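As promised above, here is a minimal Python sketch (ours, and deliberately simplified) of how the master method's three cases play out in the common special case where the driving function is f(n) = Θ(n^d) for a constant d ≥ 0:

import math

def master_bound(a, b, d):
    """Asymptotic solution of T(n) = a*T(n/b) + Theta(n^d), for
    constants a > 0, b > 1, and d >= 0 (simplified master method)."""
    crit = math.log(a, b)              # the critical exponent log_b(a)
    if d < crit:
        return f"Theta(n^{crit:.4g})"  # leaves dominate
    if d == crit:
        return f"Theta(n^{d:g} lg n)"  # costs balanced across levels
    return f"Theta(n^{d:g})"           # root dominates

print(master_bound(8, 2, 0))  # recurrence (4.9):  Theta(n^3)
print(master_bound(7, 2, 2))  # recurrence (4.10): Theta(n^2.807)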
4.1 Multiplying square matrices
We can use the divide-and-conquer method to multiply square matrices.
If you’ve seen matrices before, then you probably know how to multiply
them. (Otherwise, you should read Section D.1.) Let A = (a_ik) and B = (b_kj) be square n × n matrices. The matrix product C = A · B is also an n × n matrix, where for i, j = 1, 2, … , n, the (i, j) entry of C is given by

c_ij = Σ_{k=1}^{n} a_ik · b_kj.     (4.1)

Generally, we'll assume that the matrices are dense, meaning that most of the n^2 entries are not 0, as opposed to sparse, where most of the n^2 entries are 0 and the nonzero entries can be stored more compactly than in an n × n array.
Computing the matrix C requires computing n^2 matrix entries, each of which is the sum of n pairwise products of input elements from A and B. The MATRIX-MULTIPLY procedure implements this strategy in a straightforward manner, and it generalizes the problem slightly. It takes as input three n × n matrices A, B, and C, and it adds the matrix product A · B to C, storing the result in C. Thus, it computes C = C + A · B, instead of just C = A · B. If only the product A · B is needed, just initialize all n^2 entries of C to 0 before calling the procedure, which takes an additional Θ(n^2) time. We'll see that the cost of matrix multiplication asymptotically dominates this initialization cost.
MATRIX-MULTIPLY(A, B, C, n)
1  for i = 1 to n            // compute entries in each of n rows
2      for j = 1 to n        // compute n entries in row i
3          for k = 1 to n
4              c_ij = c_ij + a_ik · b_kj    // add in another term of equation (4.1)
The pseudocode for MATRIX-MULTIPLY works as follows. The for loop of lines 1–4 computes the entries of each row i, and within a given row i, the for loop of lines 2–4 computes each of the entries c_ij for each column j. Each iteration of the for loop of lines 3–4 adds in one more term of equation (4.1).
Because each of the triply nested for loops runs for exactly n iterations, and each execution of line 4 takes constant time, the MATRIX-MULTIPLY procedure operates in Θ(n^3) time. Even if we add in the Θ(n^2) time for initializing C to 0, the running time is still Θ(n^3).
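For readers who want to run the procedure, here is a direct Python transcription (our sketch, using 0-based indexing rather than the pseudocode's 1-based indexing):

def matrix_multiply(A, B, C, n):
    """Add the product A*B to C in place, where A, B, and C are
    n x n matrices stored as lists of lists."""
    for i in range(n):          # compute entries in each of n rows
        for j in range(n):      # compute n entries in row i
            for k in range(n):  # add in another term of equation (4.1)
                C[i][j] += A[i][k] * B[k][j]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
C = [[0, 0], [0, 0]]
matrix_multiply(A, B, C, 2)
print(C)  # [[19, 22], [43, 50]]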
A simple divide-and-conquer algorithm
Let’s see how to compute the matrix product A · B using divide-and-conquer. For n > 1, the divide step partitions the n × n matrices into four n/2 × n/2 submatrices. We’ll assume that n is an exact power of 2, so


that as the algorithm recurses, we are guaranteed that the submatrix
dimensions are integer. (Exercise 4.1-1 asks you to relax this
assumption.) As with MATRIX-MULTIPLY, we’ll actually compute C
= C + A · B. But to simplify the math behind the algorithm, let’s assume that C has been initialized to the zero matrix, so that we are indeed computing C = A · B.
The divide step views each of the n × n matrices A, B, and C as four n/2 × n/2 submatrices:

A = ( A_11  A_12 ),    B = ( B_11  B_12 ),    C = ( C_11  C_12 ).     (4.2)
    ( A_21  A_22 )         ( B_21  B_22 )         ( C_21  C_22 )

Then we can write the matrix product as

( C_11  C_12 )  =  ( A_11  A_12 ) · ( B_11  B_12 ),
( C_21  C_22 )     ( A_21  A_22 )   ( B_21  B_22 )

which corresponds to the equations

C_11 = A_11 · B_11 + A_12 · B_21,     (4.5)
C_12 = A_11 · B_12 + A_12 · B_22,     (4.6)
C_21 = A_21 · B_11 + A_22 · B_21,     (4.7)
C_22 = A_21 · B_12 + A_22 · B_22.     (4.8)

Equations (4.5)–(4.8) involve eight n/2 × n/2 multiplications and four additions of n/2 × n/2 submatrices.
As we look to transform these equations to an algorithm that can be
described with pseudocode, or even implemented for real, there are two
common approaches for implementing the matrix partitioning.
One strategy is to allocate temporary storage to hold A's four submatrices A_11, A_12, A_21, and A_22 and B's four submatrices B_11, B_12, B_21, and B_22. Then copy each element in A and B to its corresponding location in the appropriate submatrix. After the recursive conquer step, copy the elements in each of C's four submatrices C_11, C_12, C_21, and C_22 to their corresponding locations in C. This approach takes Θ(n^2) time, since 3n^2 elements are copied.
The second approach uses index calculations and is faster and more
practical. A submatrix can be specified within a matrix by indicating
where within the matrix the submatrix lies without touching any matrix
elements. Partitioning a matrix (or recursively, a submatrix) only
involves arithmetic on this location information, which has constant
size independent of the size of the matrix. Changes to the submatrix
elements update the original matrix, since they occupy the same storage.
Going forward, we’ll assume that index calculations are used and
that partitioning can be performed in Θ(1) time. Exercise 4.1-3 asks you
to show that it makes no difference to the overall asymptotic running
time of matrix multiplication, however, whether the partitioning of
matrices uses the first method of copying or the second method of index
calculation. But for other divide-and-conquer matrix calculations, such
as matrix addition, it can make a difference, as Exercise 4.1-4 asks you
to show.
The procedure MATRIX-MULTIPLY-RECURSIVE uses equations
(4.5)–(4.8) to implement a divide-and-conquer strategy for square-
matrix multiplication. Like MATRIX-MULTIPLY, the procedure
MATRIX-MULTIPLY-RECURSIVE computes C = C + A · B since, if necessary, C can be initialized to 0 before the procedure is called in order to compute only C = A · B.
MATRIX-MULTIPLY-RECURSIVE(A, B, C, n)
1   if n == 1
2       // Base case.
3       c_11 = c_11 + a_11 · b_11
4       return
5   // Divide.
6   partition A, B, and C into n/2 × n/2 submatrices
        A_11, A_12, A_21, A_22; B_11, B_12, B_21, B_22;
        and C_11, C_12, C_21, C_22, respectively
7   // Conquer.
8   MATRIX-MULTIPLY-RECURSIVE(A_11, B_11, C_11, n/2)
9   MATRIX-MULTIPLY-RECURSIVE(A_11, B_12, C_12, n/2)
10  MATRIX-MULTIPLY-RECURSIVE(A_21, B_11, C_21, n/2)
11  MATRIX-MULTIPLY-RECURSIVE(A_21, B_12, C_22, n/2)
12  MATRIX-MULTIPLY-RECURSIVE(A_12, B_21, C_11, n/2)
13  MATRIX-MULTIPLY-RECURSIVE(A_12, B_22, C_12, n/2)
14  MATRIX-MULTIPLY-RECURSIVE(A_22, B_21, C_21, n/2)
15  MATRIX-MULTIPLY-RECURSIVE(A_22, B_22, C_22, n/2)
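A runnable Python analogue (our sketch, not the book's): NumPy slices serve as the "index calculations," since a slice is a view that shares storage with the original matrix, so partitioning touches no elements.

import numpy as np

def matrix_multiply_recursive(A, B, C, n):
    """Add A @ B to C in place; assumes n is an exact power of 2."""
    if n == 1:
        C[0, 0] += A[0, 0] * B[0, 0]  # base case
        return
    h = n // 2  # divide: the four n/2 x n/2 submatrices are views
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    C11, C12, C21, C22 = C[:h, :h], C[:h, h:], C[h:, :h], C[h:, h:]
    # conquer: eight recursive multiplications, as in lines 8-15
    matrix_multiply_recursive(A11, B11, C11, h)
    matrix_multiply_recursive(A11, B12, C12, h)
    matrix_multiply_recursive(A21, B11, C21, h)
    matrix_multiply_recursive(A21, B12, C22, h)
    matrix_multiply_recursive(A12, B21, C11, h)
    matrix_multiply_recursive(A12, B22, C12, h)
    matrix_multiply_recursive(A22, B21, C21, h)
    matrix_multiply_recursive(A22, B22, C22, h)

rng = np.random.default_rng(1)
A, B = rng.random((4, 4)), rng.random((4, 4))
C = np.zeros((4, 4))
matrix_multiply_recursive(A, B, C, 4)
assert np.allclose(C, A @ B)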
As we walk through the pseudocode, we’ll derive a recurrence to
characterize its running time. Let T ( n) be the worst-case time to multiply two n × n matrices using this procedure.
In the base case, when n = 1, line 3 performs just the one scalar multiplication and one addition, which means that T (1) = Θ(1). As is
our convention for constant base cases, we can omit this base case in the
statement of the recurrence.
The recursive case occurs when n > 1. As discussed, we’ll use index
calculations to partition the matrices in line 6, taking Θ(1) time. Lines
8–15 recursively call MATRIX-MULTIPLY-RECURSIVE a total of
eight times. The first four recursive calls compute the first terms of
equations (4.5)–(4.8), and the subsequent four recursive calls compute
and add in the second terms. Each recursive call adds the product of a
submatrix of A and a submatrix of B to the appropriate submatrix of C
in place, thanks to index calculations. Because each recursive call
multiplies two n/2 × n/2 matrices, thereby contributing T ( n/2) to the overall running time, the time taken by all eight recursive calls is 8 T
( n/2). There is no combine step, because the matrix C is updated in place. The total time for the recursive case, therefore, is the sum of the
partitioning time and the time for all the recursive calls, or Θ(1) + 8 T
( n/2).
Thus, omitting the statement of the base case, our recurrence for the
running time of MATRIX-MULTIPLY-RECURSIVE is

T(n) = 8T(n/2) + Θ(1).     (4.9)

As we'll see from the master method in Section 4.5, recurrence (4.9) has the solution T(n) = Θ(n^3), which means that it has the same asymptotic running time as the straightforward MATRIX-MULTIPLY procedure.
Why is the Θ(n^3) solution to this recurrence so much larger than the Θ(n lg n) solution to the merge-sort recurrence (2.3) on page 41? After all, the recurrence for merge sort contains a Θ(n) term, whereas the recurrence for recursive matrix multiplication contains only a Θ(1) term.
Let’s think about what the recursion tree for recurrence (4.9) would
look like as compared with the recursion tree for merge sort, illustrated
in Figure 2.5 on page 43. The factor of 2 in the merge-sort recurrence determines how many children each tree node has, which in turn
determines how many terms contribute to the sum at each level of the
tree. In comparison, for the recurrence (4.9) for MATRIX-MULTIPLY-
RECURSIVE, each internal node in the recursion tree has eight
children, not two, leading to a “bushier” recursion tree with many more
leaves, despite the fact that the internal nodes are each much smaller.
Consequently, the solution to recurrence (4.9) grows much more quickly
than the solution to recurrence (2.3), which is borne out in the actual
solutions: Θ(n^3) versus Θ(n lg n).
Exercises
Note: You may wish to read Section 4.5 before attempting some of these exercises.
4.1-1
Generalize MATRIX-MULTIPLY-RECURSIVE to multiply n × n
matrices for which n is not necessarily an exact power of 2. Give a recurrence describing its running time. Argue that it runs in Θ(n^3) time
in the worst case.
4.1-2
How quickly can you multiply a kn × n matrix (kn rows and n columns) by an n × kn matrix, where k ≥ 1, using MATRIX-MULTIPLY-RECURSIVE as a subroutine? Answer the same question for multiplying an n × kn matrix by a kn × n matrix. Which is asymptotically faster, and by how much?
4.1-3
Suppose that instead of partitioning matrices by index calculation in
MATRIX-MULTIPLY-RECURSIVE, you copy the appropriate
elements of A, B, and C into separate n/2 × n/2 submatrices A_11, A_12, A_21, A_22; B_11, B_12, B_21, B_22; and C_11, C_12, C_21, C_22, respectively. After the recursive calls, you copy the results from C_11, C_12, C_21, and C_22 back into the appropriate places in C. How does recurrence (4.9)
change, and what is its solution?
4.1-4
Write pseudocode for a divide-and-conquer algorithm MATRIX-ADD-
RECURSIVE that sums two n × n matrices A and B by partitioning each of them into four n/2 × n/2 submatrices and then recursively summing corresponding pairs of submatrices. Assume that matrix
partitioning uses Θ(1)-time index calculations. Write a recurrence for
the worst-case running time of MATRIX-ADD-RECURSIVE, and
solve your recurrence. What happens if you use Θ(n^2)-time copying to
implement the partitioning instead of index calculations?
4.2 Strassen’s algorithm for matrix multiplication
You might find it hard to imagine that any matrix multiplication algorithm could take less than Θ(n^3) time, since the natural definition of matrix multiplication requires n^3 scalar multiplications. Indeed, many mathematicians presumed that it was not possible to multiply matrices in o(n^3) time until 1969, when V. Strassen [424] published a remarkable recursive algorithm for multiplying n × n matrices. Strassen's algorithm runs in Θ(n^{lg 7}) time. Since lg 7 = 2.8073549…, Strassen's algorithm runs in O(n^{2.81}) time, which is asymptotically better than the Θ(n^3)
MATRIX-MULTIPLY and MATRIX-MULTIPLY-RECURSIVE
procedures.
The key to Strassen’s method is to use the divide-and-conquer idea
from the MATRIX-MULTIPLY-RECURSIVE procedure, but make
the recursion tree less bushy. We’ll actually increase the work for each
divide and combine step by a constant factor, but the reduction in
bushiness will pay off. We won’t reduce the bushiness from the eight-way
branching of recurrence (4.9) all the way down to the two-way
branching of recurrence (2.3), but we’ll improve it just a little, and that
will make a big difference. Instead of performing eight recursive
multiplications of n/2 × n/2 matrices, Strassen’s algorithm performs only seven. The cost of eliminating one matrix multiplication is several new
additions and subtractions of n/2 × n/2 matrices, but still only a constant number. Rather than saying “additions and subtractions”
everywhere, we’ll adopt the common terminology of calling them both
“additions” because subtraction is structurally the same computation as
addition, except for a change of sign.
To get an inkling how the number of multiplications might be
reduced, as well as why reducing the number of multiplications might be
desirable for matrix calculations, suppose that you have two numbers x
and y, and you want to calculate the quantity x^2 − y^2. The straightforward calculation requires two multiplications to square x and y, followed by one subtraction (which you can think of as a "negative addition"). But let's recall the old algebra trick x^2 − y^2 = x^2 − xy + xy − y^2 = x(x − y) + y(x − y) = (x + y)(x − y). Using this formulation of the desired quantity, you could instead compute the sum x + y and the difference x − y and then multiply them, requiring only a single multiplication and two additions. At the cost of an extra addition, only one multiplication is needed to compute an expression that looks as if it requires two. If x and y are scalars, there's not much difference: both
case the second method outperforms the first, although not
asymptotically.
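The same trick in two lines of Python (our illustration):

def diff_of_squares(x, y):
    """Compute x^2 - y^2 using one multiplication instead of two."""
    return (x + y) * (x - y)

assert diff_of_squares(7, 3) == 7**2 - 3**2  # both equal 40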
Strassen’s strategy for reducing the number of matrix multiplications
at the expense of more matrix additions is not at all obvious—perhaps
the biggest understatement in this book! As with MATRIX-
MULTIPLY-RECURSIVE, Strassen’s algorithm uses the divide-and-
conquer method to compute C = C + A · B, where A, B, and C are all n
× n matrices and n is an exact power of 2. Strassen's algorithm computes the four submatrices C_11, C_12, C_21, and C_22 of C from equations (4.5)–(4.8) on page 82 in four steps. We'll analyze costs as we
go along to develop a recurrence T ( n) for the overall running time. Let’s see how it works:
1. If n = 1, the matrices each contain a single element. Perform a
single scalar multiplication and a single scalar addition, as in line
3 of MATRIX-MULTIPLY-RECURSIVE, taking Θ(1) time,
and return. Otherwise, partition the input matrices A and B and
output matrix C into n/2 × n/2 submatrices, as in equation (4.2).
This step takes Θ(1) time by index calculation, just as in
MATRIX-MULTIPLY-RECURSIVE.
2. Create n/2 × n/2 matrices S_1, S_2, … , S_10, each of which is the sum or difference of two submatrices from step 1. Create and zero the entries of seven n/2 × n/2 matrices P_1, P_2, … , P_7 to hold seven n/2 × n/2 matrix products. All 17 matrices can be created, and the P_i initialized, in Θ(n^2) time.
3. Using the submatrices from step 1 and the matrices S_1, S_2, … , S_10 created in step 2, recursively compute each of the seven matrix products P_1, P_2, … , P_7, taking 7T(n/2) time.
4. Update the four submatrices C_11, C_12, C_21, C_22 of the result matrix C by adding or subtracting various P_i matrices, which takes Θ(n^2) time.
We’ll see the details of steps 2–4 in a moment, but we already have
enough information to set up a recurrence for the running time of
Strassen’s method. As is common, the base case in step 1 takes Θ(1)
time, which we'll omit when stating the recurrence. When n > 1, steps 1, 2, and 4 take a total of Θ(n^2) time, and step 3 requires seven multiplications of n/2 × n/2 matrices. Hence, we obtain the following recurrence for the running time of Strassen's algorithm:

T(n) = 7T(n/2) + Θ(n^2).     (4.10)
Compared with MATRIX-MULTIPLY-RECURSIVE, we have traded
off one recursive submatrix multiplication for a constant number of
submatrix additions. Once you understand recurrences and their
solutions, you’ll be able to see why this trade-off actually leads to a
lower asymptotic running time. By the master method in Section 4.5, recurrence (4.10) has the solution T(n) = Θ(n^{lg 7}) = O(n^{2.81}), beating the Θ(n^3)-time algorithms.
Now, let’s delve into the details. Step 2 creates the following 10
matrices:
S_1 = B_12 − B_22,
S_2 = A_11 + A_12,
S_3 = A_21 + A_22,
S_4 = B_21 − B_11,
S_5 = A_11 + A_22,
S_6 = B_11 + B_22,
S_7 = A_12 − A_22,
S_8 = B_21 + B_22,
S_9 = A_11 − A_21,
S_10 = B_11 + B_12.

This step adds or subtracts n/2 × n/2 matrices 10 times, taking Θ(n^2) time.
Step 3 recursively multiplies n/2 × n/2 matrices 7 times to compute
the following n/2 × n/2 matrices, each of which is the sum or difference
of products of A and B submatrices:
P_1 = A_11 · S_1    (= A_11 · B_12 − A_11 · B_22),
P_2 = S_2 · B_22    (= A_11 · B_22 + A_12 · B_22),
P_3 = S_3 · B_11    (= A_21 · B_11 + A_22 · B_11),
P_4 = A_22 · S_4    (= A_22 · B_21 − A_22 · B_11),
P_5 = S_5 · S_6     (= A_11 · B_11 + A_11 · B_22 + A_22 · B_11 + A_22 · B_22),
P_6 = S_7 · S_8     (= A_12 · B_21 + A_12 · B_22 − A_22 · B_21 − A_22 · B_22),
P_7 = S_9 · S_10    (= A_11 · B_11 + A_11 · B_12 − A_21 · B_11 − A_21 · B_12).
The only multiplications that the algorithm performs are those in the
middle column of these equations. The right-hand column just shows
what these products equal in terms of the original submatrices created
in step 1, but the terms are never explicitly calculated by the algorithm.
Step 4 adds to and subtracts from the four n/2 × n/2 submatrices of the product C the various P_i matrices created in step 3. We start with

C_11 = C_11 + P_5 + P_4 − P_2 + P_6.

Expanding each P_i on the right-hand side and canceling terms, we see that the update to C_11 equals

A_11 · B_11 + A_12 · B_21,

which corresponds to equation (4.5). Similarly, setting

C_12 = C_12 + P_1 + P_2

means that the update to C_12 equals

A_11 · B_12 + A_12 · B_22,

corresponding to equation (4.6). Setting

C_21 = C_21 + P_3 + P_4

means that the update to C_21 equals

A_21 · B_11 + A_22 · B_21,

corresponding to equation (4.7). Finally, setting

C_22 = C_22 + P_5 + P_1 − P_3 − P_7

means that the update to C_22 equals

A_21 · B_12 + A_22 · B_22,

which corresponds to equation (4.8). Altogether, since we add or subtract n/2 × n/2 matrices 12 times in step 4, this step indeed takes Θ(n^2) time.
We can see that Strassen’s remarkable algorithm, comprising steps 1–
4, produces the correct matrix product using 7 submatrix
multiplications and 18 submatrix additions. We can also see that
recurrence (4.10) characterizes its running time. Since Section 4.5 shows that this recurrence has the solution T(n) = Θ(n^{lg 7}) = o(n^3), Strassen's method asymptotically beats the Θ(n^3) MATRIX-MULTIPLY and MATRIX-MULTIPLY-RECURSIVE procedures.
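To tie steps 1–4 together, here is a compact NumPy sketch of Strassen's algorithm (ours, not the book's; it returns A · B rather than updating C in place, and assumes n is an exact power of 2):

import numpy as np

def strassen(A, B):
    """Multiply two n x n matrices by Strassen's method."""
    n = A.shape[0]
    if n == 1:
        return A * B  # base case: 1 x 1 matrices
    h = n // 2  # step 1: partition into n/2 x n/2 submatrices
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # steps 2 and 3: the 10 sums S_i feed the 7 recursive products P_i
    P1 = strassen(A11, B12 - B22)
    P2 = strassen(A11 + A12, B22)
    P3 = strassen(A21 + A22, B11)
    P4 = strassen(A22, B21 - B11)
    P5 = strassen(A11 + A22, B11 + B22)
    P6 = strassen(A12 - A22, B21 + B22)
    P7 = strassen(A11 - A21, B11 + B12)
    # step 4: assemble the four submatrices of C
    C = np.empty((n, n))
    C[:h, :h] = P5 + P4 - P2 + P6
    C[:h, h:] = P1 + P2
    C[h:, :h] = P3 + P4
    C[h:, h:] = P5 + P1 - P3 - P7
    return C

rng = np.random.default_rng(0)
A, B = rng.random((8, 8)), rng.random((8, 8))
assert np.allclose(strassen(A, B), A @ B)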
Exercises
Note: You may wish to read Section 4.5 before attempting some of these exercises.
4.2-1
Use Strassen’s algorithm to compute the matrix product
Show your work.
4.2-2
Write pseudocode for Strassen’s algorithm.
4.2-3
What is the largest k such that if you can multiply 3 × 3 matrices using k
multiplications (not assuming commutativity of multiplication), then
you can multiply n × n matrices in o(n^{lg 7}) time? What is the running time of this algorithm?
4.2-4
V. Pan discovered a way of multiplying 68 × 68 matrices using 132,464
multiplications, a way of multiplying 70 × 70 matrices using 143,640
multiplications, and a way of multiplying 72 × 72 matrices using
155,424 multiplications. Which method yields the best asymptotic
running time when used in a divide-and-conquer matrix-multiplication
algorithm? How does it compare with Strassen’s algorithm?
4.2-5
Show how to multiply the complex numbers a + bi and c + di using only three multiplications of real numbers. The algorithm should take a, b, c, and d as input and produce the real component ac − bd and the imaginary component ad + bc separately.
4.2-6
Suppose that you have a Θ(n^α)-time algorithm for squaring n × n matrices, where α ≥ 2. Show how to use that algorithm to multiply two different n × n matrices in Θ(n^α) time.
4.3 The substitution method for solving recurrences
Now that you have seen how recurrences characterize the running times
of divide-and-conquer algorithms, let’s learn how to solve them. We
start in this section with the substitution method, which is the most general of the four methods in this chapter. The substitution method
comprises two steps:
1. Guess the form of the solution using symbolic constants.
2. Use mathematical induction to show that the solution works,
and find the constants.
To apply the inductive hypothesis, you substitute the guessed solution
for the function on smaller values—hence the name “substitution
method.” This method is powerful, but you must guess the form of the
answer. Although generating a good guess might seem difficult, a little
practice can quickly improve your intuition.
You can use the substitution method to establish either an upper or a
lower bound on a recurrence. It’s usually best not to try to do both at
the same time. That is, rather than trying to prove a Θ-bound directly,
first prove an O-bound, and then prove an Ω-bound. Together, they give
you a Θ-bound (Theorem 3.1 on page 56).
As an example of the substitution method, let's determine an asymptotic upper bound on the recurrence

T(n) = 2T(⌊n/2⌋) + Θ(n).     (4.11)
This recurrence is similar to recurrence (2.3) on page 41 for merge sort,
except for the floor function, which ensures that T ( n) is defined over the integers. Let’s guess that the asymptotic upper bound is the same— T ( n)
= O( n lg n)—and use the substitution method to prove it.
We'll adopt the inductive hypothesis that T(n) ≤ cn lg n for all n ≥ n_0, where we'll choose the specific constants c > 0 and n_0 > 0 later, after we see what constraints they need to obey. If we can establish this
inductive hypothesis, we can conclude that T ( n) = O( n lg n). It would be dangerous to use T ( n) = O( n lg n) as the inductive hypothesis because
the constants matter, as we’ll see in a moment in our discussion of pitfalls.
Assume by induction that this bound holds for all numbers at least as big as n_0 and less than n. In particular, therefore, if n ≥ 2n_0, it holds for ⌊n/2⌋, yielding T(⌊n/2⌋) ≤ c⌊n/2⌋ lg(⌊n/2⌋). Substituting into recurrence (4.11)—hence the name "substitution" method—yields

T(n) ≤ 2(c⌊n/2⌋ lg(⌊n/2⌋)) + Θ(n)
     ≤ 2(c(n/2) lg(n/2)) + Θ(n)
     = cn lg(n/2) + Θ(n)
     = cn lg n − cn lg 2 + Θ(n)
     = cn lg n − cn + Θ(n)
     ≤ cn lg n,
where the last step holds if we constrain the constants n 0 and c to be sufficiently large that for n ≥ 2 n 0, the quantity cn dominates the anonymous function hidden by the Θ( n) term.
We’ve shown that the inductive hypothesis holds for the inductive
case, but we also need to prove that the inductive hypothesis holds for
the base cases of the induction, that is, that T ( n) ≤ cn lg n when n 0 ≤ n < 2 n 0. As long as n 0 > 1 (a new constraint on n 0), we have lg n > 0, which implies that n lg n > 0. So let’s pick n 0 = 2. Since the base case of recurrence (4.11) is not stated explicitly, by our convention, T ( n) is algorithmic, which means that T (2) and T (3) are constant (as they should be if they describe the worst-case running time of any real
program on inputs of size 2 or 3). Picking c = max { T (2), T (3)} yields T (2) ≤ c < (2 lg 2) c and T (3) ≤ c < (3 lg 3) c, establishing the inductive hypothesis for the base cases.
Thus, we have T ( n) ≤ cn lg n for all n ≥ 2, which implies that the solution to recurrence (4.11) is T ( n) = O( n lg n).
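As an informal sanity check (not a substitute for the proof), a few lines of Python can tabulate the recurrence against the bound; the base case T(n) = 1 for n < 2 and the instantiation of the Θ(n) term as exactly n are our hypothetical choices:

import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Recurrence (4.11) with T(n) = 1 for n < 2 and the
    Theta(n) term taken to be exactly n."""
    if n < 2:
        return 1
    return 2 * T(n // 2) + n

# The bound T(n) <= c n lg n appears to hold with c = 2 for n >= 2.
for n in [2, 10, 100, 10**4, 10**6]:
    assert T(n) <= 2 * n * math.log2(n)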
In the algorithms literature, people rarely carry out their substitution
proofs to this level of detail, especially in their treatment of base cases.
The reason is that for most algorithmic divide-and-conquer recurrences,
the base cases are all handled in pretty much the same way. You ground




the induction on a range of values from a convenient positive constant n_0 up to some constant n_0′ such that for n ≥ n_0′, the recurrence always bottoms out in a constant-sized base case between n_0 and n_0′. (This example used n_0 = 2 and n_0′ = 4.) Then, it's usually apparent, without spelling out the details, that with a suitably large choice of the leading constant (such as c for this example), the inductive hypothesis can be made to hold for all the values in the range from n_0 to n_0′.
Making a good guess
Unfortunately, there is no general way to correctly guess the tightest
asymptotic solution to an arbitrary recurrence. Making a good guess
takes experience and, occasionally, creativity. Fortunately, learning some
recurrence-solving heuristics, as well as playing around with recurrences
to gain experience, can help you become a good guesser. You can also
use recursion trees, which we’ll see in Section 4.4, to help generate good guesses.
If a recurrence is similar to one you’ve seen before, then guessing a
similar solution is reasonable. As an example, consider the recurrence
T ( n) = 2 T ( n/2 + 17) + Θ( n),
defined on the reals. This recurrence looks somewhat like the merge-sort
recurrence (2.3), but it’s more complicated because of the added “17” in
the argument to T on the right-hand side. Intuitively, however, this
additional term shouldn’t substantially affect the solution to the
recurrence. When n is large, the relative difference between n/2 and n/2 +
17 is not that large: both cut n nearly in half. Consequently, it makes
sense to guess that T ( n) = O( n lg n), which you can verify is correct using the substitution method (see Exercise 4.3-1).
Another way to make a good guess is to determine loose upper and
lower bounds on the recurrence and then reduce your range of
uncertainty. For example, you might start with a lower bound of T ( n) =
Ω(n) for recurrence (4.11), since the recurrence includes the term Θ(n), and you can prove an initial upper bound of T(n) = O(n^2). Then split
the lower bound until you converge on the correct, asymptotically tight
solution, which in this case is T ( n) = Θ( n lg n).
A trick of the trade: subtracting a low-order term
Sometimes, you might correctly guess a tight asymptotic bound on the
solution of a recurrence, but somehow the math fails to work out in the
induction proof. The problem frequently turns out to be that the
inductive assumption is not strong enough. The trick to resolving this
problem is to revise your guess by subtracting a lower-order term when
you hit such a snag. The math then often goes through.
Consider the recurrence

T(n) = 2T(n/2) + Θ(1),     (4.12)

defined on the reals. Let's guess that the solution is T(n) = O(n) and try to show that T(n) ≤ cn for n ≥ n_0, where we choose the constants c, n_0 > 0 suitably. Substituting our guess into the recurrence, we obtain
T ( n) ≤ 2( c( n/2)) + Θ(1)
= cn + Θ(1),
which, unfortunately, does not imply that T ( n) ≤ cn for any choice of c.
We might be tempted to try a larger guess, say T(n) = O(n^2). Although this larger guess works, it provides only a loose upper bound. It turns
out that our original guess of T ( n) = O( n) is correct and tight. In order to show that it is correct, however, we must strengthen our inductive
hypothesis.
Intuitively, our guess is nearly right: we are off only by Θ(1), a lower-
order term. Nevertheless, mathematical induction requires us to prove
the exact form of the inductive hypothesis. Let’s try our trick of
subtracting a lower-order term from our previous guess: T ( n) ≤ cn – d, where d ≥ 0 is a constant. We now have
T ( n) ≤ 2( c( n/2) – d) + Θ(1)
= cn – 2 d + Θ(1)
≤ cn – d – ( d – Θ(1))
≤ cn – d
as long as we choose d to be larger than the anonymous upper-bound
constant hidden by the Θ-notation. Subtracting a lower-order term
works! Of course, we must not forget to handle the base case, which is to
choose the constant c large enough that cn – d dominates the implicit base cases.
You might find the idea of subtracting a lower-order term to be
counterintuitive. After all, if the math doesn’t work out, shouldn’t you
increase your guess? Not necessarily! When the recurrence contains
more than one recursive invocation (recurrence (4.12) contains two), if
you add a lower-order term to the guess, then you end up adding it once
for each of the recursive invocations. Doing so takes you even further
away from the inductive hypothesis. On the other hand, if you subtract a
lower-order term from the guess, then you get to subtract it once for
each of the recursive invocations. In the above example, we subtracted
the constant d twice because the coefficient of T ( n/2) is 2. We ended up with the inequality T ( n) ≤ cn – d – ( d – Θ(1)), and we readily found a suitable value for d.
Avoiding pitfalls
Avoid using asymptotic notation in the inductive hypothesis for the
substitution method because it’s error prone. For example, for
recurrence (4.11), we can falsely "prove" that T(n) = O(n) if we unwisely adopt T(n) = O(n) as our inductive hypothesis:

T(n) ≤ 2 · O(⌊n/2⌋) + Θ(n)
     = 2 · O(n) + Θ(n)
     = O(n).     ⇐ wrong!
The problem with this reasoning is that the constant hidden by the O-
notation changes. We can expose the fallacy by repeating the “proof”
using an explicit constant. For the inductive hypothesis, assume that T(n) ≤ cn for all n ≥ n_0, where c, n_0 > 0 are constants. Repeating the first two steps in the inequality chain yields

T(n) ≤ 2(c⌊n/2⌋) + Θ(n)
     ≤ cn + Θ(n).
Now, indeed cn + Θ( n) = O( n), but the constant hidden by the O-
notation must be larger than c because the anonymous function hidden
by the Θ( n) is asymptotically positive. We cannot take the third step to
conclude that cn + Θ( n) ≤ cn, thus exposing the fallacy.
When using the substitution method, or more generally
mathematical induction, you must be careful that the constants hidden
by any asymptotic notation are the same constants throughout the
proof. Consequently, it’s best to avoid asymptotic notation in your
inductive hypothesis and to name constants explicitly.
Here’s another fallacious use of the substitution method to show that
the solution to recurrence (4.11) is T(n) = O(n). We guess T(n) ≤ cn and then argue

T(n) ≤ 2(c⌊n/2⌋) + Θ(n)
     ≤ cn + Θ(n)
     = O(n),     ⇐ wrong!
since c is a positive constant. The mistake stems from the difference between our goal—to prove that T ( n) = O( n)—and our inductive hypothesis—to prove that T ( n) ≤ cn. When using the substitution method, or in any inductive proof, you must prove the exact statement
of the inductive hypothesis. In this case, we must explicitly prove that T
( n) ≤ cn to show that T ( n) = O( n).
Exercises
4.3-1
Use the substitution method to show that each of the following
recurrences defined on the reals has the asymptotic solution specified:
a. T(n) = T(n − 1) + n has solution T(n) = O(n^2).
b. T(n) = T(n/2) + Θ(1) has solution T(n) = O(lg n).
c. T(n) = 2T(n/2) + n has solution T(n) = Θ(n lg n).
d. T(n) = 2T(n/2 + 17) + n has solution T(n) = O(n lg n).
e. T(n) = 2T(n/3) + Θ(n) has solution T(n) = Θ(n).
f. T(n) = 4T(n/2) + Θ(n) has solution T(n) = Θ(n^2).
4.3-2
The solution to the recurrence T(n) = 4T(n/2) + n turns out to be T(n) = Θ(n^2). Show that a substitution proof with the assumption T(n) ≤ cn^2 fails. Then show how to subtract a lower-order term to make a substitution proof work.
fails. Then show how to subtract a lower-order term to make a
substitution proof work.
4.3-3
The recurrence T(n) = 2T(n − 1) + 1 has the solution T(n) = O(2^n). Show that a substitution proof fails with the assumption T(n) ≤ c2^n, where c > 0 is constant. Then show how to subtract a lower-order term
to make a substitution proof work.
4.4 The recursion-tree method for solving recurrences
Although you can use the substitution method to prove that a solution
to a recurrence is correct, you might have trouble coming up with a
good guess. Drawing out a recursion tree, as we did in our analysis of
the merge-sort recurrence in Section 2.3.2, can help. In a recursion tree, each node represents the cost of a single subproblem somewhere in the
set of recursive function invocations. You typically sum the costs within
each level of the tree to obtain the per-level costs, and then you sum all
the per-level costs to determine the total cost of all levels of the
recursion. Sometimes, however, adding up the total cost takes more
creativity.
A recursion tree is best used to generate intuition for a good guess,
which you can then verify by the substitution method. If you are
meticulous when drawing out a recursion tree and summing the costs,
however, you can use a recursion tree as a direct proof of a solution to a
recurrence. But if you use it only to generate a good guess, you can often
tolerate a small amount of “sloppiness,” which can simplify the math.
When you verify your guess with the substitution method later on, your
math should be precise. This section demonstrates how you can use
recursion trees to solve recurrences, generate good guesses, and gain
intuition for recurrences.
An illustrative example
Let's see how a recursion tree can provide a good guess for an upper-bound solution to the recurrence

T(n) = 3T(n/4) + Θ(n^2).

Figure 4.1 shows how to derive the recursion tree for T(n) = 3T(n/4) + cn^2, where the constant c > 0 is the upper-bound constant in the Θ(n^2) term. Part (a) of the figure shows T(n), which part (b) expands into an equivalent tree representing the recurrence. The cn^2 term at the root represents the cost at the top level of recursion, and the three subtrees of
the root represent the costs incurred by the subproblems of size n/4. Part
(c) shows this process carried one step further by expanding each node
with cost T(n/4) from part (b). The cost for each of the three children of the root is c(n/4)^2. We continue expanding each node in the tree by breaking it into its constituent parts as determined by the recurrence.
Figure 4.1 Constructing a recursion tree for the recurrence T(n) = 3T(n/4) + cn^2. Part (a) shows T(n), which progressively expands in (b)–(d) to form the recursion tree. The fully expanded tree in (d) has height log_4 n.
Because subproblem sizes decrease by a factor of 4 every time we go down one level, the recursion must eventually bottom out in a base case where n < n_0. By convention, the base case is T(n) = Θ(1) for n < n_0, where n_0 > 0 is any threshold constant sufficiently large that the


recurrence is well defined. For the purpose of intuition, however, let’s
simplify the math a little. Let’s assume that n is an exact power of 4 and
that the base case is T (1) = Θ(1). As it turns out, these assumptions don’t affect the asymptotic solution.
What’s the height of the recursion tree? The subproblem size for a node at depth i is n/4^i. As we descend the tree from the root, the subproblem size hits n = 1 when n/4^i = 1 or, equivalently, when i = log₄ n. Thus, the tree has internal nodes at depths 0, 1, 2, …, log₄ n − 1 and leaves at depth log₄ n.
Part (d) of Figure 4.1 shows the cost at each level of the tree. Each level has three times as many nodes as the level above, and so the number of nodes at depth i is 3^i. Because subproblem sizes reduce by a factor of 4 for each level further from the root, each internal node at depth i = 0, 1, 2, …, log₄ n − 1 has a cost of c(n/4^i)². Multiplying, we see that the total cost of all nodes at a given depth i is 3^i c(n/4^i)² = (3/16)^i cn². The bottom level, at depth log₄ n, contains

3^{log₄ n} = n^{log₄ 3}

leaves (using equation (3.21) on page 66). Each leaf contributes Θ(1), leading to a total leaf cost of Θ(n^{log₄ 3}).
Now we add up the costs over all levels to determine the cost for the entire tree:

T(n) = cn² + (3/16)cn² + (3/16)²cn² + ⋯ + (3/16)^{log₄ n − 1}cn² + Θ(n^{log₄ 3})
     = Σ_{i=0}^{log₄ n − 1} (3/16)^i cn² + Θ(n^{log₄ 3})
     < Σ_{i=0}^{∞} (3/16)^i cn² + Θ(n^{log₄ 3})
     = cn²/(1 − 3/16) + Θ(n^{log₄ 3})
     = (16/13)cn² + Θ(n^{log₄ 3})
     = O(n²).

We’ve derived the guess of T(n) = O(n²) for the original recurrence. In this example, the coefficients of cn² form a decreasing geometric series. By equation (A.7), the sum of these coefficients is bounded from above by the constant 16/13. Since the root’s contribution to the total cost is cn², the cost of the root dominates the total cost of the tree.
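To see the geometric decay concretely, here is a short Python sketch (ours, not part of the text) that tabulates the per-level internal-node costs (3/16)^i cn² for a sample n and compares their total against the (16/13)cn² bound:

    import math

    def level_costs(n, c=1.0):
        """Per-level internal-node costs 3^i * c * (n/4^i)^2 of the
        recursion tree for T(n) = 3T(n/4) + c*n^2."""
        depth = round(math.log(n, 4))        # leaves sit at depth log_4 n
        return [(3 ** i) * c * (n / 4 ** i) ** 2 for i in range(depth)]

    n = 4 ** 8                               # an exact power of 4, as assumed above
    costs = level_costs(n)
    print(costs[0])                          # root cost c*n^2
    print(sum(costs))                        # total internal-node cost
    print((16 / 13) * n ** 2)                # the geometric-series bound

The totals stay below (16/13)cn² for any n you try, which is exactly the sense in which the root cost dominates.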
In fact, if O(n²) is indeed an upper bound for the recurrence (as we’ll verify in a moment), then it must be a tight bound. Why? The first recursive call contributes a cost of Θ(n²), and so Ω(n²) must be a lower bound for the recurrence.
Let’s now use the substitution method to verify that our guess is correct, namely, that T(n) = O(n²) is an upper bound for the recurrence T(n) = 3T(n/4) + Θ(n²). We want to show that T(n) ≤ dn² for some constant d > 0. Using the same constant c > 0 as before, we have

T(n) ≤ 3T(n/4) + cn²
     ≤ 3d(n/4)² + cn²
     = (3/16)dn² + cn²
     ≤ dn²,

where the last step holds if we choose d ≥ (16/13)c.
For the base case of the induction, let n₀ > 0 be a sufficiently large threshold constant that the recurrence is well defined when T(n) = Θ(1) for n < n₀. We can pick d large enough that d dominates the constant hidden by the Θ, in which case dn² ≥ d ≥ T(n) for 1 ≤ n < n₀, completing the proof of the base case.
The substitution proof we just saw involves two named constants, c
and d. We named c and used it to stand for the upper-bound constant
hidden and guaranteed to exist by the Θ-notation. We cannot pick c
arbitrarily—it’s given to us—although, for any such c, any constant c′ ≥
c also suffices. We also named d, but we were free to choose any value
for it that fit our needs. In this example, the value of d happened to
depend on the value of c, which is fine, since d is constant if c is constant.
An irregular example
Let’s find an asymptotic upper bound for another, more irregular, example. Figure 4.2 shows the recursion tree for the recurrence

T(n) = T(n/3) + T(2n/3) + Θ(n).     (4.14)

This recursion tree is unbalanced, with different root-to-leaf paths having different lengths. Going left at any node produces a subproblem of one-third the size, and going right produces a subproblem of two-thirds the size. Let n₀ > 0 be the implicit threshold constant such that T(n) = Θ(1) for 0 < n < n₀, and let c represent the upper-bound constant hidden by the Θ(n) term for n ≥ n₀. There are actually two n₀ constants here: one for the threshold in the recurrence, and the other for the threshold in the Θ-notation, so we’ll let n₀ be the larger of the two constants.
Figure 4.2 A recursion tree for the recurrence T(n) = T(n/3) + T(2n/3) + cn.
The height of the tree runs down the right edge of the tree, corresponding to subproblems of sizes n, (2/3)n, (4/9)n, …, Θ(1) with costs bounded by cn, c(2n/3), c(4n/9), …, Θ(1), respectively. We hit the rightmost leaf when (2/3)^h n < n₀ ≤ (2/3)^{h−1} n, which happens when h = ⌊log_{3/2}(n/n₀)⌋ + 1 since, applying the floor bounds in equation (3.2) on page 64 with x = log_{3/2}(n/n₀), we have (2/3)^h n = (2/3)^{⌊x⌋+1} n < (2/3)^x n = (n₀/n)n = n₀ and (2/3)^{h−1} n = (2/3)^{⌊x⌋} n ≥ (2/3)^x n = (n₀/n)n = n₀. Thus, the height of the tree is h = Θ(lg n).
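Here is a small sketch (ours; the threshold n₀ = 2 is an arbitrary choice) that walks down the rightmost path of the tree and checks the measured height against the formula ⌊log_{3/2}(n/n₀)⌋ + 1:

    import math

    def rightmost_height(n, n0=2.0):
        """Length of the rightmost root-to-leaf path: the number of times
        n can be multiplied by 2/3 before it drops below the threshold n0."""
        h = 0
        while n >= n0:
            n *= 2 / 3
            h += 1
        return h

    for n in [10.0, 1e3, 1e6, 1e9]:
        predicted = math.floor(math.log(n / 2.0, 1.5)) + 1   # floor(log_{3/2}(n/n0)) + 1
        print(n, rightmost_height(n), predicted)             # the two columns agree

Floating-point rounding aside, the measured and predicted heights coincide, and both grow like log_{3/2} n = Θ(lg n).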
We’re now in a position to understand the upper bound. Let’s
postpone dealing with the leaves for a moment. Summing the costs of
internal nodes across each level, we have at most cn per level times the
Θ(lg n) tree height for a total cost of O( n lg n) for all internal nodes.
It remains to deal with the leaves of the recursion tree, which represent base cases, each costing Θ(1). How many leaves are there? It’s tempting to upper-bound their number by the number of leaves in a complete binary tree of height h = ⌊log_{3/2}(n/n₀)⌋ + 1, since the recursion tree is contained within such a complete binary tree. But this approach turns out to give us a poor bound. The complete binary tree has 1 node at the root, 2 nodes at depth 1, and generally 2^k nodes at depth k. Since the height is h = ⌊log_{3/2}(n/n₀)⌋ + 1, there are

2^h = 2^{⌊log_{3/2}(n/n₀)⌋+1} ≤ 2 · 2^{log_{3/2} n} = 2n^{log_{3/2} 2} = O(n^{1.71})

leaves in the complete binary tree (assuming n₀ ≥ 1), which is an upper bound on the number of leaves in the recursion tree. Because the cost of each leaf is Θ(1), this analysis says that the total cost of all leaves in the recursion tree is O(n^{1.71}), which is an asymptotically greater bound than the O(n lg n) cost of all internal nodes. In fact, as we’re about to see, this bound is not tight. The cost of all leaves in the recursion tree is O(n), asymptotically less than O(n lg n). In other words, the cost of the internal nodes dominates the cost of the leaves, not vice versa.
Rather than analyzing the leaves, we could quit right now and prove
by substitution that T ( n) = Θ( n lg n). This approach works (see Exercise 4.4-3), but it’s instructive to understand how many leaves this recursion
tree has. You may see recurrences for which the cost of leaves dominates
the cost of internal nodes, and then you’ll be in better shape if you’ve
had some experience analyzing the number of leaves.
To figure out how many leaves there really are, let’s write a recurrence L(n) for the number of leaves in the recursion tree for T(n). Since all the leaves in T(n) belong either to the left subtree or the right subtree of the root, we have

L(n) = 1                   if 0 < n < n₀,
L(n) = L(n/3) + L(2n/3)    if n ≥ n₀.     (4.15)

This recurrence is similar to recurrence (4.14), but it’s missing the Θ(n) term, and it contains an explicit base case. Because this recurrence omits the Θ(n) term, it is much easier to solve. Let’s apply the substitution method to show that it has solution L(n) = O(n). Using the inductive hypothesis L(n) ≤ dn for some constant d > 0, and assuming that the inductive hypothesis holds for all values less than n, we have
L(n) = L(n/3) + L(2n/3)
     ≤ d(n/3) + d(2n/3)
     = dn,

which holds for any d > 0. We can now choose d large enough to handle the base case L(n) = 1 for 0 < n < n₀, for which d = 1 suffices, thereby completing the substitution method for the upper bound on leaves. (Exercise 4.4-2 asks you to prove that L(n) = Θ(n).)
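To see the linear leaf count concretely, the following sketch (ours; it treats n as a real and picks threshold n₀ = 1) evaluates recurrence (4.15) directly and prints the ratio L(n)/n:

    def L(n, n0=1.0):
        """Leaf count of the recursion tree for T(n) = T(n/3) + T(2n/3) + Θ(n),
        following recurrence (4.15): one leaf per base case."""
        if n < n0:
            return 1
        return L(n / 3, n0) + L(2 * n / 3, n0)

    for n in [10.0, 100.0, 1000.0, 10000.0]:
        print(n, L(n), L(n) / n)             # L(n)/n stays bounded: L(n) = Θ(n)

The ratio stays bounded by a small constant, which is what L(n) = Θ(n) predicts.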
Returning to recurrence (4.14) for T(n), it now becomes apparent that the total cost of leaves over all levels must be L(n) · Θ(1) = Θ(n). Since we have derived the bound of O(n lg n) on the cost of the internal nodes, it follows that the solution to recurrence (4.14) is T(n) = O(n lg n) + Θ(n) = O(n lg n). (Exercise 4.4-3 asks you to prove that T(n) = Θ(n lg n).)
It’s wise to verify any bound obtained with a recursion tree by using
the substitution method, especially if you’ve made simplifying
assumptions. But another strategy altogether is to use more-powerful
mathematics, typically in the form of the master method in the next
section (which unfortunately doesn’t apply to recurrence (4.14)) or the
Akra-Bazzi method (which does, but requires calculus). Even if you use
a powerful method, a recursion tree can improve your intuition for
what’s going on beneath the heavy math.
Exercises
4.4-1
For each of the following recurrences, sketch its recursion tree, and
guess a good asymptotic upper bound on its solution. Then use the
substitution method to verify your answer.
a. T(n) = T(n/2) + n³.
b. T(n) = 4T(n/3) + n.
c. T(n) = 4T(n/2) + n.
d. T(n) = 3T(n − 1) + 1.
4.4-2
Use the substitution method to prove that recurrence (4.15) has the asymptotic lower bound L(n) = Ω(n). Conclude that L(n) = Θ(n).
4.4-3
Use the substitution method to prove that recurrence (4.14) has the solution T(n) = Ω(n lg n). Conclude that T(n) = Θ(n lg n).
4.4-4
Use a recursion tree to justify a good guess for the solution to the recurrence T(n) = T(αn) + T((1 − α)n) + Θ(n), where α is a constant in the range 0 < α < 1.
4.5 The master method for solving recurrences
The master method provides a “cookbook” method for solving algorithmic recurrences of the form

T(n) = aT(n/b) + f(n),

where a > 0 and b > 1 are constants. We call f(n) a driving function, and we call a recurrence of this general form a master recurrence. To use the master method, you need to memorize three cases, but then you’ll be able to solve many master recurrences quite easily.
A master recurrence describes the running time of a divide-and-conquer algorithm that divides a problem of size n into a subproblems, each of size n/b < n. The algorithm solves the a subproblems recursively, each in T(n/b) time. The driving function f(n) encompasses the cost of dividing the problem before the recursion, as well as the cost of combining the results of the recursive solutions to subproblems. For example, the recurrence arising from Strassen’s algorithm is a master recurrence with a = 7, b = 2, and driving function f(n) = Θ(n²).
As we have mentioned, in solving a recurrence that describes the running time of an algorithm, one technicality that we’d often prefer to ignore is the requirement that the input size n be an integer. For example, we saw that the running time of merge sort can be described by recurrence (2.3), T(n) = 2T(n/2) + Θ(n), on page 41. But if n is an odd number, we really don’t have two problems of exactly half the size. Rather, to ensure that the problem sizes are integers, we round one subproblem down to size ⌊n/2⌋ and the other up to size ⌈n/2⌉, so the true recurrence is T(n) = T(⌈n/2⌉) + T(⌊n/2⌋) + Θ(n). But this floors-and-ceilings recurrence is longer to write and messier to deal with than recurrence (2.3), which is defined on the reals. We’d rather not worry about floors and ceilings, if we don’t have to, especially since the two recurrences have the same Θ(n lg n) solution.
The master method allows you to state a master recurrence without
floors and ceilings and implicitly infer them. No matter how the
arguments are rounded up or down to the nearest integer, the
asymptotic bounds that it provides remain the same. Moreover, as we’ll
see in Section 4.6, if you define your master recurrence on the reals, without implicit floors and ceilings, the asymptotic bounds still don’t
change. Thus you can ignore floors and ceilings for master recurrences.
Section 4.7 gives sufficient conditions for ignoring floors and ceilings in more general divide-and-conquer recurrences.
The master theorem
The master method depends upon the following theorem.
Theorem 4.1 (Master theorem)
Let a > 0 and b > 1 be constants, and let f(n) be a driving function that is defined and nonnegative on all sufficiently large reals. Define the recurrence T(n) on n ∈ ℕ by

T(n) = aT(n/b) + f(n),

where aT(n/b) actually means a′T(⌊n/b⌋) + a″T(⌈n/b⌉) for some constants a′ ≥ 0 and a″ ≥ 0 satisfying a = a′ + a″. Then the asymptotic behavior of T(n) can be characterized as follows:
1. If there exists a constant ϵ > 0 such that f(n) = O(n^{log_b a − ϵ}), then T(n) = Θ(n^{log_b a}).
2. If there exists a constant k ≥ 0 such that f(n) = Θ(n^{log_b a} lg^k n), then T(n) = Θ(n^{log_b a} lg^{k+1} n).
3. If there exists a constant ϵ > 0 such that f(n) = Ω(n^{log_b a + ϵ}), and if f(n) additionally satisfies the regularity condition af(n/b) ≤ cf(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).
▪
Before applying the master theorem to some examples, let’s spend a few moments to understand broadly what it says. The function n^{log_b a} is called the watershed function. In each of the three cases, we compare the driving function f(n) to the watershed function n^{log_b a}. Intuitively, if the watershed function grows asymptotically faster than the driving function, then case 1 applies. Case 2 applies if the two functions grow at nearly the same asymptotic rate. Case 3 is the “opposite” of case 1, where the driving function grows asymptotically faster than the watershed function. But the technical details matter.
In case 1, not only must the watershed function grow asymptotically faster than the driving function, it must grow polynomially faster. That is, the watershed function n^{log_b a} must be asymptotically larger than the driving function f(n) by at least a factor of Θ(n^ϵ) for some constant ϵ > 0. The master theorem then says that the solution is T(n) = Θ(n^{log_b a}). In this case, if we look at the recursion tree for the recurrence, the cost per level grows at least geometrically from root to leaves, and the total cost of leaves dominates the total cost of the internal nodes.
In case 2, the watershed and driving functions grow at nearly the same asymptotic rate. But more specifically, the driving function grows faster than the watershed function by a factor of Θ(lg^k n), where k ≥ 0. The master theorem says that we tack on an extra lg n factor to f(n), yielding the solution T(n) = Θ(n^{log_b a} lg^{k+1} n). In this case, each level of the recursion tree costs approximately the same, namely Θ(n^{log_b a} lg^k n), and there are Θ(lg n) levels. In practice, the most common situation for case 2 occurs when k = 0, in which case the watershed and driving functions have the same asymptotic growth, and the solution is T(n) = Θ(n^{log_b a} lg n).
Case 3 mirrors case 1. Not only must the driving function grow asymptotically faster than the watershed function, it must grow polynomially faster. That is, the driving function f(n) must be asymptotically larger than the watershed function n^{log_b a} by at least a factor of Θ(n^ϵ) for some constant ϵ > 0. Moreover, the driving function must satisfy the regularity condition that af(n/b) ≤ cf(n). This condition is satisfied by most of the polynomially bounded functions that you’re likely to encounter when applying case 3. The regularity condition might not be satisfied if the driving function grows slowly in local areas, yet relatively quickly overall. (Exercise 4.5-5 gives an example of such a function.) For case 3, the master theorem says that the solution is T(n) = Θ(f(n)). If we look at the recursion tree, the cost per level drops at least geometrically from the root to the leaves, and the root cost dominates the cost of all other nodes.
It’s worth looking again at the requirement that there be polynomial separation between the watershed function and the driving function for either case 1 or case 3 to apply. The separation doesn’t need to be much, but it must be there, and it must grow polynomially. For example, for the recurrence T(n) = 4T(n/2) + n^{1.99} (admittedly not a recurrence you’re likely to see when analyzing an algorithm), the watershed function is n^{log₂ 4} = n². Hence the driving function f(n) = n^{1.99} is polynomially smaller by a factor of n^{0.01}. Thus case 1 applies with ϵ = 0.01.
Using the master method
To use the master method, you determine which case (if any) of the
master theorem applies and write down the answer.
As a first example, consider the recurrence T(n) = 9T(n/3) + n. For this recurrence, we have a = 9 and b = 3, which implies that the watershed function is n^{log₃ 9} = n². Since f(n) = n = O(n^{2−ϵ}) for any constant ϵ ≤ 1, we can apply case 1 of the master theorem to conclude that the solution is T(n) = Θ(n²).
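As a numeric illustration (our own sketch, not from the text), evaluating this recurrence exactly with base case T(1) = 1 shows T(n)/n² settling toward a constant, consistent with the Θ(n²) solution:

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        """Exact evaluation of T(n) = 9T(n/3) + n on powers of 3, with T(1) = 1."""
        if n < 3:
            return 1
        return 9 * T(n // 3) + n

    for n in [3 ** k for k in range(1, 12)]:
        print(n, T(n) / n ** 2)              # the ratio converges toward 1.5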
Now consider the recurrence T(n) = T(2n/3) + 1, which has a = 1 and b = 3/2, which means that the watershed function is n^{log_{3/2} 1} = n⁰ = 1. Case 2 applies since f(n) = 1 = Θ(n^{log_{3/2} 1} lg⁰ n), with k = 0. The solution to the recurrence is T(n) = Θ(lg n).
For the recurrence T(n) = 3T(n/4) + n lg n, we have a = 3 and b = 4, which means that the watershed function is n^{log₄ 3} = O(n^{0.793}). Since f(n) = n lg n = Ω(n^{log₄ 3 + ϵ}), where ϵ can be as large as approximately 0.2, case 3 applies as long as the regularity condition holds for f(n). It does, because for sufficiently large n, we have that af(n/b) = 3(n/4) lg(n/4) ≤ (3/4)n lg n = cf(n) for c = 3/4. By case 3, the solution to the recurrence is T(n) = Θ(n lg n).
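The regularity check here is a single inequality, easy to test numerically. The sketch below (ours) evaluates both sides of af(n/b) ≤ cf(n) for f(n) = n lg n with a = 3, b = 4, and c = 3/4:

    import math

    def f(n):
        return n * math.log2(n)              # the driving function n lg n

    a, b, c = 3, 4, 3 / 4
    for n in [16, 1e3, 1e6]:
        print(n, a * f(n / b) <= c * f(n))   # regularity af(n/b) <= cf(n) holds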
Next, let’s look at the recurrence T(n) = 2T(n/2) + n lg n, where we have a = 2, b = 2, and watershed function n^{log₂ 2} = n. Case 2 applies since f(n) = n lg n = Θ(n^{log₂ 2} lg¹ n), with k = 1. We conclude that the solution is T(n) = Θ(n lg² n).
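For driving functions of the common form f(n) = n^d lg^k n, this case analysis can be mechanized. The helper below is our own sketch, not part of the text; it compares d against the watershed exponent log_b a and reports which case of Theorem 4.1 applies, if any (for this form of f, the case-3 regularity condition holds automatically):

    import math

    def master_case(a, b, d, k=0):
        """Classify T(n) = a*T(n/b) + Theta(n^d * lg^k n) by the master theorem.
        Returns a description of T(n), or None if f(n) falls into a gap."""
        w = math.log(a, b)                       # watershed exponent log_b a
        if math.isclose(d, w):                   # same polynomial growth rate
            # case 2 needs k >= 0; k < 0 falls into the gap below case 2
            return f"Theta(n^{w:g} * lg^{k + 1} n)" if k >= 0 else None
        if d < w:                                # watershed polynomially larger: case 1
            return f"Theta(n^{w:g})"
        return f"Theta(n^{d:g} * lg^{k} n)"      # driving polynomially larger: case 3

    print(master_case(9, 3, 1))      # T(n) = 9T(n/3) + n       -> Theta(n^2)
    print(master_case(2, 2, 1, 1))   # T(n) = 2T(n/2) + n lg n  -> Theta(n^1 * lg^2 n)
    print(master_case(2, 2, 1, -1))  # T(n) = 2T(n/2) + n/lg n  -> None (a gap)

The floating-point comparison via math.isclose is a pragmatic stand-in for deciding whether d equals log_b a exactly; for textbook constants it gives the right answer.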
We can use the master method to solve the recurrences we saw in
Sections 2.3.2, 4.1, and 4.2.
Recurrence (2.3), T(n) = 2T(n/2) + Θ(n), on page 41, characterizes the running time of merge sort. Since a = 2 and b = 2, the watershed function is n^{log₂ 2} = n. Case 2 applies because f(n) = Θ(n) = Θ(n^{log₂ 2} lg⁰ n), and the solution is T(n) = Θ(n lg n).
Recurrence (4.9), T(n) = 8T(n/2) + Θ(1), on page 84, describes the running time of the simple recursive algorithm for matrix multiplication. We have a = 8 and b = 2, which means that the watershed function is n^{log₂ 8} = n³. Since n³ is polynomially larger than the driving function f(n) = Θ(1) (indeed, we have f(n) = O(n^{3−ϵ}) for any positive ϵ < 3), case 1 applies. We conclude that T(n) = Θ(n³).
Finally, recurrence (4.10), T(n) = 7T(n/2) + Θ(n²), on page 87, arose from the analysis of Strassen’s algorithm for matrix multiplication. For this recurrence, we have a = 7 and b = 2, and the watershed function is n^{lg 7}. Observing that lg 7 = 2.807355…, we can let ϵ = 0.8 and bound the driving function f(n) = Θ(n²) = O(n^{lg 7 − ϵ}). Case 1 applies with solution T(n) = Θ(n^{lg 7}).
When the master method doesn’t apply
There are situations where you can’t use the master theorem. For example, it can be that the watershed function and the driving function cannot be asymptotically compared. We might have that f(n) ≪ n^{log_b a} for an infinite number of values of n but also that f(n) ≫ n^{log_b a} for an infinite number of different values of n. As a practical matter, however, most of the driving functions that arise in the study of algorithms can be meaningfully compared with the watershed function. If you encounter a master recurrence for which that’s not the case, you’ll have to resort to substitution or other methods.
Even when the relative growths of the driving and watershed functions can be compared, the master theorem does not cover all the possibilities. There is a gap between cases 1 and 2 when f(n) = o(n^{log_b a}), yet the watershed function does not grow polynomially faster than the driving function. Similarly, there is a gap between cases 2 and 3 when f(n) = ω(n^{log_b a}) and the driving function grows more than polylogarithmically faster than the watershed function, but it does not grow polynomially faster. If the driving function falls into one of these gaps, or if the regularity condition in case 3 fails to hold, you’ll need to use something other than the master method to solve the recurrence.
As an example of a driving function falling into a gap, consider the recurrence T(n) = 2T(n/2) + n/lg n. Since a = 2 and b = 2, the watershed function is n^{log₂ 2} = n. The driving function is n/lg n = o(n), which means that it grows asymptotically more slowly than the watershed function n. But n/lg n grows only logarithmically slower than n, not polynomially slower. More precisely, equation (3.24) on page 67 says that lg n = o(n^ϵ) for any constant ϵ > 0, which means that 1/lg n = ω(n^{−ϵ}) and n/lg n = ω(n^{1−ϵ}). Thus no constant ϵ > 0 exists such that n/lg n = O(n^{1−ϵ}), which is required for case 1 to apply. Case 2 fails to apply as well, since n/lg n = Θ(n lg^k n) with k = −1, but k must be nonnegative for case 2 to apply.
To solve this kind of recurrence, you must use another method, such as the substitution method (Section 4.3) or the Akra-Bazzi method (Section 4.7). (Exercise 4.6-3 asks you to show that the answer is Θ(n lg lg n).) Although the master theorem doesn’t handle this particular recurrence, it does handle the overwhelming majority of recurrences that tend to arise in practice.
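Although a recursion tree or the Akra-Bazzi method is needed for a proof, a numeric sketch (ours, with base case T(2) = 1 on powers of 2) suggests the Θ(n lg lg n) behavior: the ratio T(n)/(n lg lg n) levels off as n grows.

    import math
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def T(n):
        """Exact evaluation of T(n) = 2T(n/2) + n/lg n on powers of 2, T(2) = 1."""
        if n <= 2:
            return 1
        return 2 * T(n // 2) + n / math.log2(n)

    for k in [8, 16, 32, 64]:
        n = 2 ** k
        print(n, T(n) / (n * math.log2(math.log2(n))))   # slowly approaches ln 2

The ratio drifts slowly toward ln 2 ≈ 0.693, reflecting the harmonic sum hidden in the per-level costs.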
Exercises
4.5-1
Use the master method to give tight asymptotic bounds for the
following recurrences.