r_1 = 1 from solution 1 = 1 (no cuts),
r_2 = 5 from solution 2 = 2 (no cuts),
r_3 = 8 from solution 3 = 3 (no cuts),
r_4 = 10 from solution 4 = 2 + 2,
r_5 = 13 from solution 5 = 2 + 3,
r_6 = 17 from solution 6 = 6 (no cuts),
r_7 = 18 from solution 7 = 1 + 6 or 7 = 2 + 2 + 3,
r_8 = 22 from solution 8 = 2 + 6,
r_9 = 25 from solution 9 = 3 + 6,
r_10 = 30 from solution 10 = 10 (no cuts).
More generally, we can express the values r_n for n ≥ 1 in terms of optimal revenues from shorter rods:

r_n = max {p_n, r_1 + r_{n−1}, r_2 + r_{n−2}, …, r_{n−1} + r_1}.   (14.1)
The first argument, pn, corresponds to making no cuts at all and selling
the rod of length n as is. The other n − 1 arguments to max correspond
to the maximum revenue obtained by making an initial cut of the rod
into two pieces of size i and n − i, for each i = 1, 2, …, n − 1, and then optimally cutting up those pieces further, obtaining revenues r_i and r_{n−i} from those two pieces. Since you don’t know ahead of time which value
of i optimizes revenue, you have to consider all possible values for i and pick the one that maximizes revenue. You also have the option of
picking no i at all if the greatest revenue comes from selling the rod uncut.
To solve the original problem of size n, you solve smaller problems of
the same type. Once you make the first cut, the two resulting pieces form
independent instances of the rod-cutting problem. The overall optimal
solution incorporates optimal solutions to the two resulting
subproblems, maximizing revenue from each of those two pieces. We say
that the rod-cutting problem exhibits optimal substructure: optimal
solutions to a problem incorporate optimal solutions to related
subproblems, which you may solve independently.
In a related, but slightly simpler, way to arrange a recursive structure
for the rod-cutting problem, let’s view a decomposition as consisting of
a first piece of length i cut off the left-hand end, and then a right-hand
remainder of length n − i. Only the remainder, and not the first piece, may be further divided. Think of every decomposition of a length- n rod
in this way: as a first piece followed by some decomposition of the
remainder. Then we can express the solution with no cuts at all by saying that the first piece has size i = n and revenue p_n and that the remainder has size 0 with corresponding revenue r_0 = 0. We thus obtain the following simpler version of equation (14.1):

r_n = max {p_i + r_{n−i} : 1 ≤ i ≤ n}.   (14.2)
In this formulation, an optimal solution embodies the solution to only
one related subproblem—the remainder—rather than two.
Recursive top-down implementation
The CUT-ROD procedure on the following page implements the
computation implicit in equation (14.2) in a straightforward, top-down,
recursive manner. It takes as input an array p[1 : n] of prices and an integer n, and it returns the maximum revenue possible for a rod of length n. For length n = 0, no revenue is possible, and so CUT-ROD
returns 0 in line 2. Line 3 initializes the maximum revenue q to −∞, so
that the for loop in lines 4–5 correctly computes q = max {p_i + CUT-ROD(p, n − i) : 1 ≤ i ≤ n}. Line 6 then returns this value. A simple induction on n proves that this answer is equal to the desired answer r_n, using equation (14.2).
CUT-ROD(p, n)
1  if n == 0
2      return 0
3  q = −∞
4  for i = 1 to n
5      q = max {q, p[i] + CUT-ROD(p, n − i)}
6  return q
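CUT-ROD translates almost line for line into runnable code. In this Python sketch, the function name cut_rod and the 0-indexed price list are choices made here, not part of the text; the prices used in the usage note below (p_1 = 1, p_2 = 5, p_3 = 8) come from the revenue list given earlier.

```python
def cut_rod(p, n):
    """Naive top-down rod cutting (equation 14.2).

    p[i - 1] is the price of a piece of length i.
    Runs in exponential time, as analyzed below.
    """
    if n == 0:
        return 0
    q = float("-inf")
    for i in range(1, n + 1):  # i is the size of the first piece cut off
        q = max(q, p[i - 1] + cut_rod(p, n - i))
    return q
```

For example, cut_rod([1, 5, 8], 3) returns 8, matching r_3 above.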
If you code up CUT-ROD in your favorite programming language
and run it on your computer, you’ll find that once the input size
becomes moderately large, your program takes a long time to run. For n
= 40, your program may take several minutes and possibly more than an
hour. For large values of n, you’ll also discover that each time you increase n by 1, your program’s running time approximately doubles.
Why is CUT-ROD so inefficient? The problem is that CUT-ROD
calls itself recursively over and over again with the same parameter
values, which means that it solves the same subproblems repeatedly.
Figure 14.3 shows a recursion tree demonstrating what happens for n =
4: CUT-ROD( p, n) calls CUT-ROD( p, n − i) for i = 1, 2, …, n.
Equivalently, CUT-ROD( p, n) calls CUT-ROD( p, j) for each j = 0, 1,
…, n − 1. When this process unfolds recursively, the amount of work
done, as a function of n, grows explosively.
To analyze the running time of CUT-ROD, let T( n) denote the total
number of calls made to CUT-ROD( p, n) for a particular value of n.
This expression equals the number of nodes in a subtree whose root is
labeled n in the recursion tree. The count includes the initial call at its
root. Thus, T(0) = 1 and

T(n) = 1 + Σ_{j=0}^{n−1} T(j).   (14.3)

The initial 1 is for the call at the root, and the term T(j) counts the number of calls (including recursive calls) due to the call CUT-ROD(p, n − i), where j = n − i. As Exercise 14.1-1 asks you to show,

T(n) = 2^n,   (14.4)

and so the running time of CUT-ROD is exponential in n.
In retrospect, this exponential running time is not so surprising.
CUT-ROD explicitly considers all possible ways of cutting up a rod of
length n. How many ways are there? A rod of length n has n − 1
potential locations to cut. Each possible way to cut up the rod makes a
cut at some subset of these n − 1 locations, including the empty set, which makes for no cuts. Viewing each cut location as a distinct member of a set of n − 1 elements, you can see that there are 2^{n−1} subsets. Each leaf in the recursion tree of Figure 14.3 corresponds to one possible way to cut up the rod. Hence, the recursion tree has 2^{n−1} leaves. The labels
on the simple path from the root to a leaf give the sizes of each
remaining right-hand piece before making each cut. That is, the labels
give the corresponding cut points, measured from the right-hand end of
the rod.
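The 2^{n−1} count is easy to check for small n by enumerating the subsets of the n − 1 cut locations directly. In this sketch (the function name is illustrative, not from the text), each subset of cut points yields one decomposition into piece sizes:

```python
from itertools import combinations

def all_decompositions(n):
    """Enumerate every way to cut a rod of length n.

    One decomposition per subset of the n - 1 internal cut
    locations {1, ..., n - 1}; the empty subset means no cuts.
    """
    decomps = []
    for r in range(n):  # subset sizes 0 through n - 1
        for cuts in combinations(range(1, n), r):
            points = [0] + list(cuts) + [n]
            pieces = [points[k + 1] - points[k] for k in range(len(points) - 1)]
            decomps.append(pieces)
    return decomps
```

For n = 4 this yields 2³ = 8 decompositions, including [4] (no cuts) and [1, 1, 1, 1].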
Figure 14.3 The recursion tree showing recursive calls resulting from a call CUT-ROD(p, n) for n = 4. Each node label gives the size n of the corresponding subproblem, so that an edge from a parent with label s to a child with label t corresponds to cutting off an initial piece of size s − t and leaving a remaining subproblem of size t. A path from the root to a leaf corresponds to one of the 2^{n−1} ways of cutting up a rod of length n. In general, this recursion tree has 2^n nodes and 2^{n−1} leaves.
Using dynamic programming for optimal rod cutting
Now, let’s see how to use dynamic programming to convert CUT-ROD
into an efficient algorithm.
The dynamic-programming method works as follows. Instead of
solving the same subproblems repeatedly, as in the naive recursive solution, arrange for each subproblem to be solved only once. There’s actually an obvious way to do so: the first time you solve a subproblem,
save its solution. If you need to refer to this subproblem’s solution again
later, just look it up, rather than recomputing it.
Saving subproblem solutions comes with a cost: the additional
memory needed to store solutions. Dynamic programming thus serves
as an example of a time-memory trade-off. The savings may be dramatic.
For example, we’re about to use dynamic programming to go from the
exponential-time algorithm for rod cutting down to a Θ(n²)-time
algorithm. A dynamic-programming approach runs in polynomial time
when the number of distinct subproblems involved is polynomial in the
input size and you can solve each such subproblem in polynomial time.
There are usually two equivalent ways to implement a dynamic-
programming approach. Solutions to the rod-cutting problem illustrate
both of them.
The first approach is top-down with memoization. 2 In this approach, you write the procedure recursively in a natural manner, but modified to
save the result of each subproblem (usually in an array or hash table).
The procedure now first checks to see whether it has previously solved
this subproblem. If so, it returns the saved value, saving further
computation at this level. If not, the procedure computes the value in
the usual manner but also saves it. We say that the recursive procedure
has been memoized: it “remembers” what results it has computed
previously.
The second approach is the bottom-up method. This approach
typically depends on some natural notion of the “size” of a subproblem,
such that solving any particular subproblem depends only on solving
“smaller” subproblems. Solve the subproblems in size order, smallest
first, storing the solution to each subproblem when it is first solved. In
this way, when solving a particular subproblem, there are already saved solutions for all of the smaller subproblems its solution depends upon.
You need to solve each subproblem only once, and when you first see it,
you have already solved all of its prerequisite subproblems.
These two approaches yield algorithms with the same asymptotic
running time, except in unusual circumstances where the top-down
approach does not actually recurse to examine all possible subproblems.
The bottom-up approach often has much better constant factors, since
it has lower overhead for procedure calls.
The procedures MEMOIZED-CUT-ROD and MEMOIZED-CUT-
ROD-AUX on the facing page demonstrate how to memoize the top-
down CUT-ROD procedure. The main procedure MEMOIZED-CUT-
ROD initializes a new auxiliary array r[0 : n] with the value −∞, which, since known revenue values are always nonnegative, is a convenient
choice for denoting “unknown.” MEMOIZED-CUT-ROD then calls
its helper procedure, MEMOIZED-CUT-ROD-AUX, which is just the
memoized version of the exponential-time procedure, CUT-ROD. It
first checks in line 1 to see whether the desired value is already known
and, if it is, then line 2 returns it. Otherwise, lines 3–7 compute the
desired value q in the usual manner, line 8 saves it in r[ n], and line 9
returns it.
The bottom-up version, BOTTOM-UP-CUT-ROD on the next
page, is even simpler. Using the bottom-up dynamic-programming
approach, BOTTOM-UP-CUT-ROD takes advantage of the natural
ordering of the subproblems: a subproblem of size i is “smaller” than a
subproblem of size j if i < j. Thus, the procedure solves subproblems of sizes j = 0, 1, …, n, in that order.
MEMOIZED-CUT-ROD(p, n)
1  let r[0 : n] be a new array    // will remember solution values in r
2  for i = 0 to n
3      r[i] = −∞
4  return MEMOIZED-CUT-ROD-AUX(p, n, r)

MEMOIZED-CUT-ROD-AUX(p, n, r)
1  if r[n] ≥ 0                    // already have a solution for length n?
2      return r[n]
3  if n == 0
4      q = 0
5  else q = −∞
6       for i = 1 to n            // i is the position of the first cut
7           q = max {q, p[i] + MEMOIZED-CUT-ROD-AUX(p, n − i, r)}
8  r[n] = q                       // remember the solution value for length n
9  return q
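The memoized pair carries over directly to Python. The −∞ sentinel works exactly as described above because known revenues are nonnegative; function names are choices made here.

```python
def memoized_cut_rod(p, n):
    """Top-down rod cutting with memoization.

    p[i - 1] is the price of a piece of length i.
    """
    r = [float("-inf")] * (n + 1)  # -inf marks "unknown": revenues are nonnegative
    return memoized_cut_rod_aux(p, n, r)

def memoized_cut_rod_aux(p, n, r):
    if r[n] >= 0:                  # already have a solution for length n?
        return r[n]
    if n == 0:
        q = 0
    else:
        q = float("-inf")
        for i in range(1, n + 1):  # i is the size of the first piece cut off
            q = max(q, p[i - 1] + memoized_cut_rod_aux(p, n - i, r))
    r[n] = q                       # remember the solution value for length n
    return q
```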
BOTTOM-UP-CUT-ROD(p, n)
1  let r[0 : n] be a new array    // will remember solution values in r
2  r[0] = 0
3  for j = 1 to n                 // for increasing rod length j
4      q = −∞
5      for i = 1 to j             // i is the position of the first cut
6          q = max {q, p[i] + r[j − i]}
7      r[j] = q                   // remember the solution value for length j
8  return r[n]
Line 1 of BOTTOM-UP-CUT-ROD creates a new array r[0 : n] in
which to save the results of the subproblems, and line 2 initializes r[0] to
0, since a rod of length 0 earns no revenue. Lines 3–6 solve each
subproblem of size j, for j = 1, 2, …, n, in order of increasing size. The approach used to solve a problem of a particular size j is the same as
that used by CUT-ROD, except that line 6 now directly references array
entry r[ j − i] instead of making a recursive call to solve the subproblem of size j − i. Line 7 saves in r[ j] the solution to the subproblem of size j.
Finally, line 8 returns r[ n], which equals the optimal value rn.
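A Python sketch of BOTTOM-UP-CUT-ROD follows. The sample prices below, ⟨1, 5, 8, 9, 10, 17, 17, 20, 24, 30⟩ for lengths 1 through 10, are assumed to be those of Figure 14.1, which is not reproduced in this excerpt.

```python
def bottom_up_cut_rod(p, n):
    """Bottom-up rod cutting; p[i - 1] is the price of a piece of length i."""
    r = [0] * (n + 1)              # r[0] = 0: a rod of length 0 earns nothing
    for j in range(1, n + 1):      # solve subproblems in order of increasing size
        q = float("-inf")
        for i in range(1, j + 1):  # i is the size of the first piece cut off
            q = max(q, p[i - 1] + r[j - i])  # table lookup replaces recursion
        r[j] = q
    return r[n]

# Prices assumed from Figure 14.1: index i holds the price of length i + 1.
PRICES = [1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
```

With these prices, bottom_up_cut_rod(PRICES, 10) returns 30 and bottom_up_cut_rod(PRICES, 7) returns 18, matching the revenue list above.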
The bottom-up and top-down versions have the same asymptotic running time. The running time of BOTTOM-UP-CUT-ROD is Θ(n²), due to its doubly nested loop structure. The number of iterations of its inner for loop, in lines 5–6, forms an arithmetic series. The running time of its top-down counterpart, MEMOIZED-CUT-ROD, is also Θ(n²), although this running time may be a little harder to see. Because a
recursive call to solve a previously solved subproblem returns
immediately, MEMOIZED-CUT-ROD solves each subproblem just
once. It solves subproblems for sizes 0, 1, …, n. To solve a subproblem
of size n, the for loop of lines 6–7 iterates n times. Thus, the total number of iterations of this for loop, over all recursive calls of
MEMOIZED-CUT-ROD, forms an arithmetic series, giving a total of
Θ(n²) iterations, just like the inner for loop of BOTTOM-UP-CUT-
ROD. (We actually are using a form of aggregate analysis here. We’ll see
aggregate analysis in detail in Section 16.1.)
Figure 14.4 The subproblem graph for the rod-cutting problem with n = 4. The vertex labels give the sizes of the corresponding subproblems. A directed edge ( x, y) indicates that solving subproblem x requires a solution to subproblem y. This graph is a reduced version of the recursion tree of Figure 14.3, in which all nodes with the same label are collapsed into a single vertex and all edges go from parent to child.
Subproblem graphs
When you think about a dynamic-programming problem, you need to
understand the set of subproblems involved and how subproblems
depend on one another.
The subproblem graph for the problem embodies exactly this
information. Figure 14.4 shows the subproblem graph for the rod-cutting problem with n = 4. It is a directed graph, containing one vertex
for each distinct subproblem. The subproblem graph has a directed edge
from the vertex for subproblem x to the vertex for subproblem y if
determining an optimal solution for subproblem x involves directly considering an optimal solution for subproblem y. For example, the
subproblem graph contains an edge from x to y if a top-down recursive
procedure for solving x directly calls itself to solve y. You can think of the subproblem graph as a “reduced” or “collapsed” version of the
recursion tree for the top-down recursive method, with all nodes for the
same subproblem coalesced into a single vertex and all edges directed
from parent to child.
The bottom-up method for dynamic programming considers the
vertices of the subproblem graph in such an order that you solve the
subproblems y adjacent to a given subproblem x before you solve subproblem x. (As Section B.4 notes, the adjacency relation in a directed graph is not necessarily symmetric.) Using terminology that
we’ll see in Section 20.4, in a bottom-up dynamic-programming algorithm, you consider the vertices of the subproblem graph in an
order that is a “reverse topological sort,” or a “topological sort of the
transpose” of the subproblem graph. In other words, no subproblem is
considered until all of the subproblems it depends upon have been
solved. Similarly, using notions that we’ll visit in Section 20.3, you can view the top-down method (with memoization) for dynamic
programming as a “depth-first search” of the subproblem graph.
The size of the subproblem graph G = ( V, E) can help you determine the running time of the dynamic-programming algorithm. Since you
solve each subproblem just once, the running time is the sum of the
times needed to solve each subproblem. Typically, the time to compute
the solution to a subproblem is proportional to the degree (number of
outgoing edges) of the corresponding vertex in the subproblem graph,
and the number of subproblems is equal to the number of vertices in the
subproblem graph. In this common case, the running time of dynamic
programming is linear in the number of vertices and edges.
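For rod cutting, the subproblem graph described above is easy to generate explicitly. In this sketch (names are mine), vertex j has an edge to every vertex j − i for i = 1, …, j, so |V| = n + 1 and |E| = n(n + 1)/2, consistent with the Θ(n²) running time of the dynamic-programming algorithms.

```python
def rod_cutting_subproblem_graph(n):
    """Build the subproblem graph for rod cutting (compare Figure 14.4).

    A directed edge (j, j - i) records that solving the subproblem of
    size j directly consults the subproblem of size j - i.
    """
    vertices = list(range(n + 1))
    edges = [(j, j - i) for j in vertices for i in range(1, j + 1)]
    return vertices, edges
```

For n = 4 this produces 5 vertices and 10 edges, matching the reduced recursion tree of Figure 14.4.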
Reconstructing a solution
The procedures MEMOIZED-CUT-ROD and BOTTOM-UP-CUT-
ROD return the value of an optimal solution to the rod-cutting
problem, but they do not return the solution itself: a list of piece sizes.
Let’s see how to extend the dynamic-programming approach to
record not only the optimal value computed for each subproblem, but
also a choice that led to the optimal value. With this information, you
can readily print an optimal solution. The procedure EXTENDED-
BOTTOM-UP-CUT-ROD on the next page computes, for each rod size
j, not only the maximum revenue r_j, but also s_j, the optimal size of the first piece to cut off. It’s similar to BOTTOM-UP-CUT-ROD, except that it creates the array s in line 1, and it updates s[j] in line 8 to hold the optimal size i of the first piece to cut off when solving a subproblem of
size j.
The procedure PRINT-CUT-ROD-SOLUTION on the following
page takes as input an array p[1 : n] of prices and a rod size n. It calls EXTENDED-BOTTOM-UP-CUT-ROD to compute the array s[1 : n]
of optimal first-piece sizes. Then it prints out the complete list of piece
sizes in an optimal decomposition of a rod of length n. For the sample
price chart appearing in Figure 14.1, the call EXTENDED-BOTTOM-
UP-CUT-ROD( p, 10) returns the following arrays:
i      0  1  2  3   4   5   6   7   8   9  10
r[i]   0  1  5  8  10  13  17  18  22  25  30
s[i]   –  1  2  3   2   2   6   1   2   3  10
A call to PRINT-CUT-ROD-SOLUTION(p, 10) prints just 10, but a call with n = 7 prints the cuts 1 and 6, which correspond to the first optimal decomposition for r_7 given earlier.
EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
1   let r[0 : n] and s[1 : n] be new arrays
2   r[0] = 0
3   for j = 1 to n             // for increasing rod length j
4       q = −∞
5       for i = 1 to j         // i is the position of the first cut
6           if q < p[i] + r[j − i]
7               q = p[i] + r[j − i]
8               s[j] = i       // best cut location so far for length j
9       r[j] = q               // remember the solution value for length j
10  return r and s
PRINT-CUT-ROD-SOLUTION(p, n)
1  (r, s) = EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
2  while n > 0
3      print s[n]      // cut location for length n
4      n = n − s[n]    // length of the remainder of the rod
Exercises
14.1-1
Show that equation (14.4) follows from equation (14.3) and the initial
condition T(0) = 1.
14.1-2
Show, by means of a counterexample, that the following “greedy”
strategy does not always determine an optimal way to cut rods. Define
the density of a rod of length i to be p_i/i, that is, its value per inch. The greedy strategy for a rod of length n cuts off a first piece of length i, where 1 ≤ i ≤ n, having maximum density. It then continues by applying
the greedy strategy to the remaining piece of length n − i.
14.1-3
Consider a modification of the rod-cutting problem in which, in
addition to a price p_i for each rod, each cut incurs a fixed cost of c. The revenue associated with a solution is now the sum of the prices of the
pieces minus the costs of making the cuts. Give a dynamic-programming
algorithm to solve this modified problem.
14.1-4
Modify CUT-ROD and MEMOIZED-CUT-ROD-AUX so that their
for loops go up to only ⌊n/2⌋, rather than up to n. What other changes
to the procedures do you need to make? How are their running times
affected?
14.1-5
Modify MEMOIZED-CUT-ROD to return not only the value but the
actual solution.
14.1-6
The Fibonacci numbers are defined by recurrence (3.31) on page 69.
Give an O(n)-time dynamic-programming algorithm to compute the nth Fibonacci number. Draw the subproblem graph. How many vertices
and edges does the graph contain?
14.2 Matrix-chain multiplication
Our next example of dynamic programming is an algorithm that solves
the problem of matrix-chain multiplication. Given a sequence (chain)
〈A_1, A_2, …, A_n〉 of n matrices to be multiplied, where the matrices aren’t necessarily square, the goal is to compute the product

A_1 A_2 ⋯ A_n   (14.5)

using the standard algorithm3 for multiplying rectangular matrices, which we’ll see in a moment, while minimizing the number of scalar multiplications.
You can evaluate the expression (14.5) using the algorithm for
multiplying pairs of rectangular matrices as a subroutine once you have
parenthesized it to resolve all ambiguities in how the matrices are
multiplied together. Matrix multiplication is associative, and so all
parenthesizations yield the same product. A product of matrices is fully
parenthesized if it is either a single matrix or the product of two fully parenthesized matrix products, surrounded by parentheses. For
example, if the chain of matrices is 〈A_1, A_2, A_3, A_4〉, then you can fully parenthesize the product A_1 A_2 A_3 A_4 in five distinct ways:

(A_1(A_2(A_3 A_4))),
(A_1((A_2 A_3) A_4)),
((A_1 A_2)(A_3 A_4)),
((A_1(A_2 A_3)) A_4),
(((A_1 A_2) A_3) A_4).
How you parenthesize a chain of matrices can have a dramatic
impact on the cost of evaluating the product. Consider first the cost of
multiplying two rectangular matrices. The standard algorithm is given
by the procedure RECTANGULAR-MATRIX-MULTIPLY, which
generalizes the square-matrix multiplication procedure MATRIX-
MULTIPLY on page 81. The RECTANGULAR-MATRIX-
MULTIPLY procedure computes C = C + A · B for three matrices A =
(a_ij), B = (b_ij), and C = (c_ij), where A is p × q, B is q × r, and C is p × r.
RECTANGULAR-MATRIX-MULTIPLY(A, B, C, p, q, r)
1  for i = 1 to p
2      for j = 1 to r
3          for k = 1 to q
4              c_ij = c_ij + a_ik · b_kj
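A direct Python transcription follows, 0-indexed so that C[i][j] plays the role of c_ij. The procedure updates C in place, computing C = C + A · B with the pqr scalar multiplications counted above.

```python
def rectangular_matrix_multiply(A, B, C, p, q, r):
    """Compute C = C + A * B, where A is p x q, B is q x r, and C is p x r.

    Matrices are lists of row lists; line 4 of the pseudocode becomes
    the innermost statement, performing p * q * r scalar multiplications.
    """
    for i in range(p):
        for j in range(r):
            for k in range(q):
                C[i][j] += A[i][k] * B[k][j]
```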
The running time of RECTANGULAR-MATRIX-MULTIPLY is
dominated by the number of scalar multiplications in line 4, which is
pqr. Therefore, we’ll consider the cost of multiplying matrices to be the
number of scalar multiplications. (The number of scalar multiplications
dominates even if we consider initializing C = 0 to perform just C = A · B.)

To illustrate the different costs incurred by different
parenthesizations of a matrix product, consider the problem of a chain
〈A_1, A_2, A_3〉 of three matrices. Suppose that the dimensions of the matrices are 10 × 100, 100 × 5, and 5 × 50, respectively. Multiplying according to the parenthesization ((A_1 A_2) A_3) performs 10 · 100 · 5 = 5000 scalar multiplications to compute the 10 × 5 matrix product A_1 A_2, plus another 10 · 5 · 50 = 2500 scalar multiplications to multiply this matrix by A_3, for a total of 7500 scalar multiplications. Multiplying according to the alternative parenthesization (A_1(A_2 A_3)) performs 100 · 5 · 50 = 25,000 scalar multiplications to compute the 100 × 50 matrix product A_2 A_3, plus another 10 · 100 · 50 = 50,000 scalar multiplications to multiply A_1 by this matrix, for a total of 75,000 scalar multiplications. Thus, computing the product according to the first parenthesization is 10 times faster.
We state the matrix-chain multiplication problem as follows: given a
chain 〈A_1, A_2, …, A_n〉 of n matrices, where for i = 1, 2, …, n, matrix A_i has dimension p_{i−1} × p_i, fully parenthesize the product A_1 A_2 ⋯ A_n in a way that minimizes the number of scalar multiplications. The input is the sequence of dimensions 〈p_0, p_1, p_2, …, p_n〉.
The matrix-chain multiplication problem does not entail actually
multiplying matrices. The goal is only to determine an order for
multiplying matrices that has the lowest cost. Typically, the time
invested in determining this optimal order is more than paid for by the
time saved later on when actually performing the matrix multiplications
(such as performing only 7500 scalar multiplications instead of 75,000).
Counting the number of parenthesizations
Before solving the matrix-chain multiplication problem by dynamic
programming, let us convince ourselves that exhaustively checking all
possible parenthesizations is not an efficient algorithm. Denote the
number of alternative parenthesizations of a sequence of n matrices by
P( n). When n = 1, the sequence consists of just one matrix, and therefore there is only one way to fully parenthesize the matrix product.
When n ≥ 2, a fully parenthesized matrix product is the product of two
fully parenthesized matrix subproducts, and the split between the two
subproducts may occur between the k th and ( k + 1)st matrices for any k
= 1, 2, …, n − 1. Thus, we obtain the recurrence

P(n) = 1 if n = 1, and P(n) = Σ_{k=1}^{n−1} P(k) P(n − k) if n ≥ 2.   (14.6)
Problem 12-4 on page 329 asked you to show that the solution to a
similar recurrence is the sequence of Catalan numbers, which grows as
Ω(4^n / n^{3/2}). A simpler exercise (see Exercise 14.2-3) is to show that the solution to the recurrence (14.6) is Ω(2^n). The number of solutions is thus exponential in n, and the brute-force method of exhaustive search
makes for a poor strategy when determining how to optimally
parenthesize a matrix chain.
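Recurrence (14.6) is easy to tabulate bottom-up, which also previews the dynamic-programming style used below. This sketch (the function name is mine) reproduces, for n = 4, the five parenthesizations listed earlier.

```python
def num_parenthesizations(n):
    """Count the full parenthesizations of a chain of n >= 1 matrices,
    tabulating recurrence (14.6) bottom-up: P(1) = 1 and, for m >= 2,
    P(m) = sum over k of P(k) * P(m - k)."""
    P = [0] * (n + 1)
    P[1] = 1
    for m in range(2, n + 1):
        P[m] = sum(P[k] * P[m - k] for k in range(1, m))
    return P[n]
```

The values 1, 1, 2, 5, 14, … are the Catalan numbers mentioned above.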
Applying dynamic programming
Let’s use the dynamic-programming method to determine how to
optimally parenthesize a matrix chain, by following the four-step
sequence that we stated at the beginning of this chapter:
1. Characterize the structure of an optimal solution.
2. Recursively define the value of an optimal solution.
3. Compute the value of an optimal solution.
4. Construct an optimal solution from computed information.
We’ll go through these steps in order, demonstrating how to apply each
step to the problem.
Step 1: The structure of an optimal parenthesization
In the first step of the dynamic-programming method, you find the
optimal substructure and then use it to construct an optimal solution to
the problem from optimal solutions to subproblems. To perform this
step for the matrix-chain multiplication problem, it’s convenient to first
introduce some notation. Let A_{i:j}, where i ≤ j, denote the matrix that results from evaluating the product A_i A_{i+1} ⋯ A_j. If the problem is nontrivial, that is, i < j, then to parenthesize the product A_i A_{i+1} ⋯ A_j, the product must split between A_k and A_{k+1} for some integer k in the range i ≤ k < j. That is, for some value of k, first compute the matrices A_{i:k} and A_{k+1:j}, and then multiply them together to produce the final product A_{i:j}. The cost of parenthesizing this way is the cost of computing the matrix A_{i:k}, plus the cost of computing A_{k+1:j}, plus the cost of multiplying them together.
The optimal substructure of this problem is as follows. Suppose that
to optimally parenthesize AiAi+1 ⋯ Aj, you split the product between
A_k and A_{k+1}. Then the way you parenthesize the “prefix” subchain A_i A_{i+1} ⋯ A_k within this optimal parenthesization of A_i A_{i+1} ⋯ A_j must be an optimal parenthesization of A_i A_{i+1} ⋯ A_k. Why? If there were a less costly way to parenthesize A_i A_{i+1} ⋯ A_k, then you could substitute that parenthesization in the optimal parenthesization of A_i A_{i+1} ⋯ A_j to produce another way to parenthesize A_i A_{i+1} ⋯ A_j whose cost is lower than the optimum: a contradiction. A similar observation holds for how to parenthesize the subchain A_{k+1} A_{k+2} ⋯ A_j in the optimal parenthesization of A_i A_{i+1} ⋯ A_j: it must be an optimal parenthesization of A_{k+1} A_{k+2} ⋯ A_j.
Now let’s use the optimal substructure to show how to construct an
optimal solution to the problem from optimal solutions to subproblems.
Any solution to a nontrivial instance of the matrix-chain multiplication
problem requires splitting the product, and any optimal solution
contains within it optimal solutions to subproblem instances. Thus, to
build an optimal solution to an instance of the matrix-chain
multiplication problem, split the problem into two subproblems
(optimally parenthesizing AiAi+1 ⋯ Ak and Ak+1 Ak+2 ⋯ Aj), find optimal solutions to the two subproblem instances, and then combine
these optimal subproblem solutions. To ensure that you’ve examined the
optimal split, you must consider all possible splits.
Step 2: A recursive solution
The next step is to define the cost of an optimal solution recursively in
terms of the optimal solutions to subproblems. For the matrix-chain
multiplication problem, a subproblem is to determine the minimum cost
of parenthesizing A_i A_{i+1} ⋯ A_j for 1 ≤ i ≤ j ≤ n. Given the input dimensions 〈p_0, p_1, p_2, …, p_n〉, an index pair i, j specifies a subproblem. Let m[i, j] be the minimum number of scalar multiplications needed to compute the matrix A_{i:j}. For the full problem, the lowest-cost way to compute A_{1:n} is thus m[1, n].
We can define m[i, j] recursively as follows. If i = j, the problem is trivial: the chain consists of just one matrix A_{i:i} = A_i, so that no scalar multiplications are necessary to compute the product. Thus, m[i, i] = 0 for i = 1, 2, …, n. To compute m[i, j] when i < j, we take advantage of the structure of an optimal solution from step 1. Suppose that an optimal parenthesization splits the product A_i A_{i+1} ⋯ A_j between A_k and A_{k+1}, where i ≤ k < j. Then m[i, j] equals the minimum cost m[i, k] for computing the subproduct A_{i:k}, plus the minimum cost m[k + 1, j] for computing the subproduct A_{k+1:j}, plus the cost of multiplying these two matrices together. Because each matrix A_i is p_{i−1} × p_i, computing the matrix product A_{i:k} A_{k+1:j} takes p_{i−1} p_k p_j scalar multiplications. Thus, we obtain

m[i, j] = m[i, k] + m[k + 1, j] + p_{i−1} p_k p_j.
This recursive equation assumes that you know the value of k. But
you don’t, at least not yet. You have to try all possible values of k. How
many are there? Just j − i, namely k = i, i + 1, …, j − 1. Since the optimal parenthesization must use one of these values for k, you need only check them all to find the best. Thus, the recursive definition for the minimum cost of parenthesizing the product A_i A_{i+1} ⋯ A_j becomes

m[i, j] = 0 if i = j,
m[i, j] = min {m[i, k] + m[k + 1, j] + p_{i−1} p_k p_j : i ≤ k < j} if i < j.   (14.7)

The m[i, j] values give the costs of optimal solutions to subproblems, but they do not provide all the information you need to construct an optimal solution. To help you do so, let’s define s[i, j] to be a value of k at which you split the product A_i A_{i+1} ⋯ A_j in an optimal parenthesization. That is, s[i, j] equals a value k such that m[i, j] = m[i, k] + m[k + 1, j] + p_{i−1} p_k p_j.
Step 3: Computing the optimal costs
At this point, you could write a recursive algorithm based on recurrence
(14.7) to compute the minimum cost m[1, n] for multiplying A 1 A 2 ⋯
An. But as we saw for the rod-cutting problem, and as we shall see in
Section 14.3, this recursive algorithm takes exponential time. That’s no better than the brute-force method of checking each way of
parenthesizing the product.
Fortunately, there aren’t all that many distinct subproblems: just one subproblem for each choice of i and j satisfying 1 ≤ i ≤ j ≤ n, or (n choose 2) + n = Θ(n²) in all.4 A recursive algorithm may encounter each
subproblem many times in different branches of its recursion tree. This
property of overlapping subproblems is the second hallmark of when
dynamic programming applies (the first hallmark being optimal
substructure).
Instead of computing the solution to recurrence (14.7) recursively,
let’s compute the optimal cost by using a tabular, bottom-up approach,
as in the procedure MATRIX-CHAIN-ORDER. (The corresponding top-down approach using memoization appears in Section 14.3.) The input is a sequence p = 〈p_0, p_1, …, p_n〉 of matrix dimensions, along with n, so that for i = 1, 2, …, n, matrix A_i has dimensions p_{i−1} × p_i. The procedure uses an auxiliary table m[1 : n, 1 : n] to store the m[i, j] costs and another auxiliary table s[1 : n − 1, 2 : n] that records which index k achieved the optimal cost in computing m[i, j]. The table s will help in constructing an optimal solution.
MATRIX-CHAIN-ORDER(p, n)
1   let m[1 : n, 1 : n] and s[1 : n − 1, 2 : n] be new tables
2   for i = 1 to n                 // chain length 1
3       m[i, i] = 0
4   for l = 2 to n                 // l is the chain length
5       for i = 1 to n − l + 1     // chain begins at A_i
6           j = i + l − 1          // chain ends at A_j
7           m[i, j] = ∞
8           for k = i to j − 1     // try A_{i:k} A_{k+1:j}
9               q = m[i, k] + m[k + 1, j] + p_{i−1} p_k p_j
10              if q < m[i, j]
11                  m[i, j] = q    // remember this cost
12                  s[i, j] = k    // remember this index
13  return m and s
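A Python sketch of MATRIX-CHAIN-ORDER follows, using 0-based lists to emulate the 1-based tables (row and column 0 are simply unused). On the three-matrix example from earlier, with dimensions ⟨10, 100, 5, 50⟩, it finds cost 7500 with the split at k = 2, that is, ((A_1 A_2) A_3).

```python
from math import inf

def matrix_chain_order(p):
    """Bottom-up matrix-chain ordering.

    p lists the n + 1 dimensions, so matrix A_i is p[i-1] x p[i].
    Returns tables m and s indexed as m[i][j], s[i][j] for 1 <= i <= j <= n.
    """
    n = len(p) - 1
    m = [[0] * (n + 1) for _ in range(n + 1)]
    s = [[0] * (n + 1) for _ in range(n + 1)]
    for l in range(2, n + 1):            # l is the chain length
        for i in range(1, n - l + 2):    # chain begins at A_i
            j = i + l - 1                # chain ends at A_j
            m[i][j] = inf
            for k in range(i, j):        # try the split A_{i:k} A_{k+1:j}
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q          # remember this cost
                    s[i][j] = k          # remember this split index
    return m, s
```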
In what order should the algorithm fill in the table entries? To answer
this question, let’s see which entries of the table need to be accessed
when computing the cost m[i, j]. Equation (14.7) tells us that to compute the cost of the matrix product A_{i:j}, first the costs of the products A_{i:k} and A_{k+1:j} need to have been computed for all k = i, i + 1, …, j − 1. The chain A_i A_{i+1} ⋯ A_j consists of j − i + 1 matrices, and the chains A_i A_{i+1} ⋯ A_k and A_{k+1} A_{k+2} ⋯ A_j consist of k − i + 1 and j − k matrices, respectively. Since k < j, a chain of k − i + 1 matrices consists of fewer than j − i + 1 matrices. Likewise, since k ≥ i, a chain of j − k matrices consists of fewer than j − i + 1 matrices. Thus, the algorithm should fill in the table m from shorter matrix chains to longer matrix chains. That
is, for the subproblem of optimally parenthesizing the chain AiAi+1 ⋯
Aj, it makes sense to consider the subproblem size as the length j − i + 1
of the chain.
Now, let’s see how the MATRIX-CHAIN-ORDER procedure fills in
the m[ i, j] entries in order of increasing chain length. Lines 2–3 initialize m[ i, i] = 0 for i = 1, 2, …, n, since any matrix chain with just one matrix requires no scalar multiplications. In the for loop of lines 4–12, the loop
variable l denotes the length of matrix chains whose minimum costs are
being computed. Each iteration of this loop uses recurrence (14.7) to
compute m[ i, i + l − 1] for i = 1, 2, …, n − l + 1. In the first iteration, l =
2, and so the loop computes m[ i, i + 1] for i = 1, 2, …, n − 1: the minimum costs for chains of length l = 2. The second time through the
loop, it computes m[ i, i + 2] for i = 1, 2, …, n − 2: the minimum costs for chains of length l = 3. And so on, ending with a single matrix chain
of length l = n and computing m[1, n]. When lines 7–12 compute an m[ i, j] cost, this cost depends only on table entries m[ i, k] and m[ k + 1, j], which have already been computed.
Figure 14.5 illustrates the m and s tables, as filled in by the MATRIX-CHAIN-ORDER procedure on a chain of n = 6 matrices.
Since m[ i, j] is defined only for i ≤ j, only the portion of the table m on or above the main diagonal is used. The figure shows the table rotated to
make the main diagonal run horizontally. The matrix chain is listed
along the bottom. Using this layout, the minimum cost m[ i, j] for multiplying a subchain AiAi+1 ⋯ Aj of matrices appears at the intersection of lines running northeast from Ai and northwest from Aj.
Reading across, each diagonal in the table contains the entries for
matrix chains of the same length. MATRIX-CHAIN-ORDER
computes the rows from bottom to top and from left to right within
each row. It computes each entry m[ i, j] using the products pi−1 pk pj for k = i, i + 1, …, j − 1 and all entries southwest and southeast from m[ i, j].
A simple inspection of the nested loop structure of MATRIX-
CHAIN-ORDER yields a running time of O( n 3) for the algorithm. The
loops are nested three deep, and each loop index ( l, i, and k) takes on at most n − 1 values. Exercise 14.2-5 asks you to show that the running time of this algorithm is in fact also Ω( n 3). The algorithm requires Θ( n 2) space to store the m and s tables. Thus, MATRIX-CHAIN-ORDER is
much more efficient than the exponential-time method of enumerating
all possible parenthesizations and checking each one.

Figure 14.5 The m and s tables computed by MATRIX-CHAIN-ORDER for n = 6 and the following matrix dimensions:

matrix       A1       A2       A3      A4      A5       A6
dimension    30 × 35  35 × 15  15 × 5  5 × 10  10 × 20  20 × 25

The tables are rotated so that the main diagonal runs horizontally. The m table uses only the main diagonal and upper triangle, and the s table uses only the upper triangle. The minimum number of scalar multiplications to multiply the 6 matrices is m[1, 6] = 15,125. Of the entries that are not tan, the pairs that have the same color are taken together in line 9 when computing an entry.

Step 4: Constructing an optimal solution
Although MATRIX-CHAIN-ORDER determines the optimal number
of scalar multiplications needed to compute a matrix-chain product, it
does not directly show how to multiply the matrices. The table s[1 : n −
1, 2 : n] provides the information needed to do so. Each entry s[ i, j]
records a value of k such that an optimal parenthesization of AiAi+1 ⋯
Aj splits the product between Ak and Ak+1. The final matrix multiplication in computing A 1: n optimally is A 1: s[1, n] As[1, n]+1: n. The s table contains the information needed to determine the earlier matrix
multiplications as well, using recursion: s[1, s[1, n]] determines the last matrix multiplication when computing A 1: s[1, n] and s[ s[1, n] + 1, n]
determines the last matrix multiplication when computing As[1, n]+1: n.
The recursive procedure PRINT-OPTIMAL-PARENS on the facing
page prints an optimal parenthesization of the matrix chain product
AiAi+1 ⋯ Aj, given the s table computed by MATRIX-CHAIN-ORDER and the indices i and j. The initial call PRINT-OPTIMAL-PARENS( s, 1, n) prints an optimal parenthesization of the full matrix
chain product A1 A2 ⋯ An. In the example of Figure 14.5, the call PRINT-OPTIMAL-PARENS(s, 1, 6) prints the optimal parenthesization ((A1(A2 A3))((A4 A5) A6)).
PRINT-OPTIMAL-PARENS(s, i, j)
1 if i == j
2     print “A”i
3 else print “(”
4     PRINT-OPTIMAL-PARENS(s, i, s[i, j])
5     PRINT-OPTIMAL-PARENS(s, s[i, j] + 1, j)
6     print “)”
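A Python rendering of the same recursion (names are ours) returns the parenthesization as a string rather than printing it piecemeal. The tiny hand-built s table in the usage example is a hypothetical 3-matrix case, not the Figure 14.5 data.

```python
def optimal_parens(s, i, j):
    """Return an optimal parenthesization of A_i..A_j as a string,
    where s[i][j] = k means the top-level split is between A_k and A_k+1."""
    if i == j:
        return f"A{i}"
    k = s[i][j]
    return "(" + optimal_parens(s, i, k) + optimal_parens(s, k + 1, j) + ")"

# Hypothetical 3-matrix chain whose optimal top-level split is between
# A1 and A2, with A2 and A3 multiplied together first:
s = [[0] * 4 for _ in range(4)]
s[1][3] = 1
s[2][3] = 2
print(optimal_parens(s, 1, 3))  # (A1(A2A3))
```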
Exercises
14.2-1
Find an optimal parenthesization of a matrix-chain product whose
sequence of dimensions is 〈5, 10, 3, 12, 5, 50, 6〉.
14.2-2
Give a recursive algorithm MATRIX-CHAIN-MULTIPLY( A, s, i, j) that actually performs the optimal matrix-chain multiplication, given
the sequence of matrices 〈 A 1, A 2, …, An〉, the s table computed by MATRIX-CHAIN-ORDER, and the indices i and j. (The initial call is
MATRIX-CHAIN-MULTIPLY( A, s, 1, n).) Assume that the call
RECTANGULAR-MATRIX-MULTIPLY( A, B) returns the product
of matrices A and B.
14.2-3
Use the substitution method to show that the solution to the recurrence
(14.6) is Ω(2 n).
14.2-4
Describe the subproblem graph for matrix-chain multiplication with an
input chain of length n. How many vertices does it have? How many
edges does it have, and which edges are they?
14.2-5
Let R(i, j) be the number of times that table entry m[i, j] is referenced while computing other table entries in a call of MATRIX-CHAIN-ORDER. Show that the total number of references for the entire table is

(n³ − n)/3.

(Hint: You may find equation (A.4) on page 1141 useful.)
14.2-6
Show that a full parenthesization of an n-element expression has exactly
n − 1 pairs of parentheses.
14.3 Elements of dynamic programming
Although you have just seen two complete examples of the dynamic-
programming method, you might still be wondering just when the
method applies. From an engineering perspective, when should you look
for a dynamic-programming solution to a problem? In this section, we’ll
examine the two key ingredients that an optimization problem must
have in order for dynamic programming to apply: optimal substructure
and overlapping subproblems. We’ll also revisit and discuss more fully
how memoization might help you take advantage of the overlapping-
subproblems property in a top-down recursive approach.
Optimal substructure
The first step in solving an optimization problem by dynamic
programming is to characterize the structure of an optimal solution.
Recall that a problem exhibits optimal substructure if an optimal
solution to the problem contains within it optimal solutions to
subproblems. When a problem exhibits optimal substructure, that gives
you a good clue that dynamic programming might apply. (As Chapter
15 discusses, it also might mean that a greedy strategy applies, however.)
Dynamic programming builds an optimal solution to the problem from
optimal solutions to subproblems. Consequently, you must take care to
ensure that the range of subproblems you consider includes those used
in an optimal solution.
Optimal substructure was key to solving both of the previous
problems in this chapter. In Section 14.1, we observed that the optimal way of cutting up a rod of length n (if Serling Enterprises makes any cuts at all) involves optimally cutting up the two pieces resulting from
the first cut. In Section 14.2, we noted that an optimal parenthesization of the matrix chain product AiAi+1 ⋯ Aj that splits the product between Ak and Ak+1 contains within it optimal solutions to the problems of parenthesizing AiAi+1 ⋯ Ak and Ak+1 Ak+2 ⋯ Aj.
You will find yourself following a common pattern in discovering
optimal substructure:
1. You show that a solution to the problem consists of making a
choice, such as choosing an initial cut in a rod or choosing an
index at which to split the matrix chain. Making this choice
leaves one or more subproblems to be solved.
2. You suppose that for a given problem, you are given the choice
that leads to an optimal solution. You do not concern yourself
yet with how to determine this choice. You just assume that it
has been given to you.
3. Given this choice, you determine which subproblems ensue and
how to best characterize the resulting space of subproblems.
4. You show that the solutions to the subproblems used within an
optimal solution to the problem must themselves be optimal by
using a “cut-and-paste” technique. You do so by supposing that
each of the subproblem solutions is not optimal and then
deriving a contradiction. In particular, by “cutting out” the
nonoptimal solution to each subproblem and “pasting in” the
optimal one, you show that you can get a better solution to the
original problem, thus contradicting your supposition that you
already had an optimal solution. If an optimal solution gives rise
to more than one subproblem, they are typically so similar that
you can modify the cut-and-paste argument for one to apply to
the others with little effort.
To characterize the space of subproblems, a good rule of thumb says
to try to keep the space as simple as possible and then expand it as
necessary. For example, the space of subproblems for the rod-cutting
problem contained the problems of optimally cutting up a rod of length
i for each size i. This subproblem space worked well, and it was not necessary to try a more general space of subproblems.
Conversely, suppose that you tried to constrain the subproblem
space for matrix-chain multiplication to matrix products of the form
A 1 A 2 ⋯ Aj. As before, an optimal parenthesization must split this product between Ak and Ak+1 for some 1 ≤ k < j. Unless you can guarantee that k always equals j − 1, you will find that you have subproblems of the form A 1 A 2 ⋯ Ak and Ak+1 Ak+2 ⋯ Aj. Moreover, the latter subproblem does not have the form A 1 A 2 ⋯ Aj. To solve this problem by dynamic programming, you need to allow the subproblems
to vary at “both ends.” That is, both i and j need to vary in the subproblem of parenthesizing the product AiAi+1 ⋯ Aj.
Optimal substructure varies across problem domains in two ways:
1. how many subproblems an optimal solution to the original
problem uses, and
2. how many choices you have in determining which subproblem(s)
to use in an optimal solution.
In the rod-cutting problem, an optimal solution for cutting up a rod of size n uses just one subproblem (of size n − i), but we have to consider n choices for i in order to determine which one yields an optimal solution.
Matrix-chain multiplication for the subchain AiAi+1 ⋯ Aj serves as an example with two subproblems and j − i choices. For a given matrix Ak where the product splits, two subproblems arise—parenthesizing
AiAi+1 ⋯ Ak and parenthesizing Ak+1 Ak+2 ⋯ Aj—and we have to solve both of them optimally. Once we determine the optimal solutions
to subproblems, we choose from among j − i candidates for the index k.
Informally, the running time of a dynamic-programming algorithm
depends on the product of two factors: the number of subproblems
overall and how many choices you look at for each subproblem. In rod
cutting, we had Θ( n) subproblems overall, and at most n choices to examine for each, yielding an O( n 2) running time. Matrix-chain multiplication had Θ( n 2) subproblems overall, and each had at most n −
1 choices, giving an O( n 3) running time (actually, a Θ( n 3) running time, by Exercise 14.2-5).
Usually, the subproblem graph gives an alternative way to perform
the same analysis. Each vertex corresponds to a subproblem, and the
choices for a subproblem are the edges incident from that subproblem.
Recall that in rod cutting, the subproblem graph has n vertices and at
most n edges per vertex, yielding an O( n 2) running time. For matrix-chain multiplication, if you were to draw the subproblem graph, it
would have Θ( n 2) vertices and each vertex would have degree at most n
− 1, giving a total of O( n 3) vertices and edges.
Dynamic programming often uses optimal substructure in a bottom-
up fashion. That is, you first find optimal solutions to subproblems and,
having solved the subproblems, you find an optimal solution to the
problem. Finding an optimal solution to the problem entails making a
choice among subproblems as to which you will use in solving the
problem. The cost of the problem solution is usually the subproblem
costs plus a cost that is directly attributable to the choice itself. In rod
cutting, for example, first we solved the subproblems of determining
optimal ways to cut up rods of length i for i = 0, 1, …, n − 1, and then we determined which of these subproblems yielded an optimal solution
for a rod of length n, using equation (14.2). The cost attributable to the
choice itself is the term pi in equation (14.2). In matrix-chain
multiplication, we determined optimal parenthesizations of subchains
of AiAi+1 ⋯ Aj, and then we chose the matrix Ak at which to split the product. The cost attributable to the choice itself is the term pi−1 pk pj.
Chapter 15 explores “greedy algorithms,” which have many
similarities to dynamic programming. In particular, problems to which
greedy algorithms apply have optimal substructure. One major
difference between greedy algorithms and dynamic programming is that
instead of first finding optimal solutions to subproblems and then
making an informed choice, greedy algorithms first make a “greedy”
choice—the choice that looks best at the time—and then solve a
resulting subproblem, without bothering to solve all possible related
smaller subproblems. Surprisingly, in some cases this strategy works!
Subtleties
You should be careful not to assume that optimal substructure applies
when it does not. Consider the following two problems whose input
consists of a directed graph G = ( V, E) and vertices u, v ∈ V.
Unweighted shortest path:5 Find a path from u to v consisting of the fewest edges. Such a path must be simple, since removing a cycle from
a path produces a path with fewer edges.
Unweighted longest simple path: Find a simple path from u to v
consisting of the most edges. (Without the requirement that the path
must be simple, the problem is undefined, since repeatedly traversing a
cycle creates paths with an arbitrarily large number of edges.)
The unweighted shortest-path problem exhibits optimal
substructure. Here’s how. Suppose that u ≠ v, so that the problem is nontrivial. Then, any path p from u to v must contain an intermediate vertex, say w. (Note that w may be u or v.) Then, we can decompose the path p into subpaths p1, from u to w, and p2, from w to v. The number of edges in p equals the number of edges in p1 plus the number of edges in p2. We claim that if p is an optimal (i.e., shortest) path from u to v, then p1 must be a shortest path from u to w. Why? As suggested earlier, use a “cut-and-paste” argument: if there were another path, say p′1, from u to w with fewer edges than p1, then we could cut out p1 and paste in p′1 to produce a path from u to v with fewer edges than p, thus contradicting p’s optimality. Likewise, p2 must be a shortest path from w to v. Thus, to find a shortest path from u to v, consider all intermediate vertices w, find a shortest path from u to w and a shortest path from w to v, and choose an intermediate vertex w that yields the overall shortest path.
Section 23.2 uses a variant of this observation of optimal substructure to find a shortest path between every pair of vertices on a weighted,
directed graph.
You might be tempted to assume that the problem of finding an
unweighted longest simple path exhibits optimal substructure as well.
After all, if we decompose a longest simple path p into subpaths p1, from u to w, and p2, from w to v, then mustn’t p1 be a longest simple path from u to w, and mustn’t p2 be a longest simple path from w to v? The answer is no!
Figure 14.6 supplies an example. Consider the path q → r → t, which is a longest simple path from q to t. Is q → r a longest simple path from q to r? No, for the path q → s → t → r is a simple path that is longer. Is r
→ t a longest simple path from r to t? No again, for the path r → q → s
→ t is a simple path that is longer.
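The counterexample is small enough to check exhaustively. The sketch below (names ours) enumerates simple paths in our reading of the Figure 14.6 graph, whose edges we take to be q→r, r→q, q→s, s→t, t→r, and r→t.

```python
def simple_paths(edges, u, v, path=None):
    """Yield every simple path from u to v in a directed graph,
    given as a dict mapping each vertex to its list of successors."""
    path = [u] if path is None else path
    if u == v:
        yield list(path)
        return
    for w in edges.get(u, []):
        if w not in path:          # keep the path simple
            path.append(w)
            yield from simple_paths(edges, w, v, path)
            path.pop()

# Our reading of the graph in Figure 14.6:
edges = {"q": ["r", "s"], "r": ["q", "t"], "s": ["t"], "t": ["r"]}

# A longest simple path from q to t has 2 edges (e.g. q->r->t), yet a
# longest simple path from q to r has 3 edges (q->s->t->r): the subpath
# q->r of an optimal q-to-t path is not itself optimal.
print(max(simple_paths(edges, "q", "t"), key=len))
print(max(simple_paths(edges, "q", "r"), key=len))
```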
Figure 14.6 A directed graph showing that the problem of finding a longest simple path in an unweighted directed graph does not have optimal substructure. The path q → r → t is a longest simple path from q to t, but the subpath q → r is not a longest simple path from q to r, nor is the subpath r → t a longest simple path from r to t.
This example shows that for longest simple paths, not only does the
problem lack optimal substructure, but you cannot necessarily assemble
a “legal” solution to the problem from solutions to subproblems. If you
combine the longest simple paths q → s → t → r and r → q → s → t, you get the path q → s → t → r → q → s → t, which is not simple.
Indeed, the problem of finding an unweighted longest simple path does
not appear to have any sort of optimal substructure. No efficient
dynamic-programming algorithm for this problem has ever been found.
In fact, this problem is NP-complete, which—as we shall see in Chapter
34—means that we are unlikely to find a way to solve it in polynomial
time.
Why is the substructure of a longest simple path so different from
that of a shortest path? Although a solution to a problem for both
longest and shortest paths uses two subproblems, the subproblems in
finding the longest simple path are not independent, whereas for shortest
paths they are. What do we mean by subproblems being independent?
We mean that the solution to one subproblem does not affect the
solution to another subproblem of the same problem. For the example
of Figure 14.6, we have the problem of finding a longest simple path from q to t with two subproblems: finding longest simple paths from q to r and from r to t. For the first of these subproblems, we chose the path q → s → t → r, which used the vertices s and t. These vertices cannot appear in a solution to the second subproblem, since the
combination of the two solutions to subproblems yields a path that is
not simple. If vertex t cannot be in the solution to the second problem,
then there is no way to solve it, since t is required to be on the path that
forms the solution, and it is not the vertex where the subproblem
solutions are “spliced” together (that vertex being r). Because vertices s
and t appear in one subproblem solution, they cannot appear in the
other subproblem solution. One of them must be in the solution to the
other subproblem, however, and an optimal solution requires both.
Thus, we say that these subproblems are not independent. Looked at
another way, using resources in solving one subproblem (those resources
being vertices) renders them unavailable for the other subproblem.
Why, then, are the subproblems independent for finding a shortest
path? The answer is that by nature, the subproblems do not share resources. We claim that if a vertex w is on a shortest path p from u to v, then we can splice together any shortest path p1 from u to w and any shortest path p2 from w to v to produce a shortest path from u to v. We are assured that, other than w, no vertex can appear in both paths p1 and p2. Why? Suppose that some vertex x ≠ w appears in both p1 and p2, so that we can decompose p1 as u ⇝ x ⇝ w and p2 as w ⇝ x ⇝ v. By the optimal substructure of this problem, path p has as many edges as p1 and p2 together. Let’s say that p has e edges. Now let us construct a path p′ = u ⇝ x ⇝ v from u to v. Because we have excised the paths from x to w and from w to x, each of which contains at least one edge, path p′ contains at most e − 2 edges, which contradicts the assumption that p is a shortest path. Thus, we are assured that the subproblems for the shortest-path problem are independent.
The two problems examined in Sections 14.1 and 14.2 have independent subproblems. In matrix-chain multiplication, the
subproblems are multiplying subchains AiAi+1 ⋯ Ak and Ak+1 Ak+2
⋯ Aj. These subchains are disjoint, so that no matrix could possibly be
included in both of them. In rod cutting, to determine the best way to
cut up a rod of length n, we looked at the best ways of cutting up rods of
length i for i = 0, 1, …, n − 1. Because an optimal solution to the length-n problem includes just one of these subproblem solutions (after cutting
off the first piece), independence of subproblems is not an issue.
Overlapping subproblems
The second ingredient that an optimization problem must have for
dynamic programming to apply is that the space of subproblems must
be “small” in the sense that a recursive algorithm for the problem solves
the same subproblems over and over, rather than always generating new
subproblems. Typically, the total number of distinct subproblems is a
polynomial in the input size. When a recursive algorithm revisits the
same problem repeatedly, we say that the optimization problem has
overlapping subproblems. 6 In contrast, a problem for which a divide-and-conquer approach is suitable usually generates brand-new problems
at each step of the recursion. Dynamic-programming algorithms
typically take advantage of overlapping subproblems by solving each
subproblem once and then storing the solution in a table where it can be
looked up when needed, using constant time per lookup.
Figure 14.7 The recursion tree for the computation of RECURSIVE-MATRIX-CHAIN( p, 1, 4). Each node contains the parameters i and j. The computations performed in a subtree shaded blue are replaced by a single table lookup in MEMOIZED-MATRIX-CHAIN.
In Section 14.1, we briefly examined how a recursive solution to rod cutting makes exponentially many calls to find solutions of smaller
subproblems. The dynamic-programming solution reduces the running
time from the exponential time of the recursive algorithm down to
quadratic time.
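As a concrete aside, this table-of-solutions idea applies directly to rod cutting, with Python's `functools.lru_cache` standing in for the memo table. The intermediate prices p4, p5, p7, p8, and p9 below are the standard values from the book's price table (only the resulting revenues appear in Section 14.1's list), so treat them as assumptions consistent with that example.

```python
from functools import lru_cache

# Prices for rod lengths 1..10 (index 0 unused), assumed from the
# rod-cutting example of Section 14.1.
p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]

@lru_cache(maxsize=None)
def cut_rod(n):
    """Maximum revenue r_n for a rod of length n. lru_cache plays the
    role of the memo table, so each subproblem is solved only once."""
    if n == 0:
        return 0
    # Either sell the rod whole (p[n]) or make a first cut of length i
    # and optimally cut up the length n - i remainder.
    return max([p[n]] + [p[i] + cut_rod(n - i) for i in range(1, n)])

print([cut_rod(n) for n in range(1, 11)])
# matches the revenues r_1..r_10 listed in Section 14.1
```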
To illustrate the overlapping-subproblems property in greater detail,
let’s revisit the matrix-chain multiplication problem. Referring back to
Figure 14.5, observe that MATRIX-CHAIN-ORDER repeatedly looks
up the solution to subproblems in lower rows when solving subproblems
in higher rows. For example, it references entry m[3, 4] four times: during the computations of m[2, 4], m[1, 4], m[3, 5], and m[3, 6]. If the algorithm were to recompute m[3, 4] each time, rather than just looking
it up, the running time would increase dramatically. To see how,
consider the inefficient recursive procedure RECURSIVE-MATRIX-
CHAIN on the facing page, which determines m[ i, j], the minimum number of scalar multiplications needed to compute the matrix-chain
product Ai: j = AiAi+1 ⋯ Aj. The procedure is based directly on the recurrence (14.7). Figure 14.7 shows the recursion tree produced by the call RECURSIVE-MATRIX-CHAIN( p, 1, 4). Each node is labeled by
the values of the parameters i and j. Observe that some pairs of values occur many times.
In fact, the time to compute m[1, n] by this recursive procedure is at
least exponential in n. To see why, let T(n) denote the time taken by RECURSIVE-MATRIX-CHAIN to compute an optimal parenthesization of a chain of n matrices. Because the execution of lines 1–2 and of lines 6–7 each take at least unit time, as does the multiplication in line 5, inspection of the procedure yields the recurrence

T(1) ≥ 1,
T(n) ≥ 1 + Σ_{k=1}^{n−1} (T(k) + T(n − k) + 1)   for n > 1.
RECURSIVE-MATRIX-CHAIN(p, i, j)
1 if i == j
2     return 0
3 m[i, j] = ∞
4 for k = i to j − 1
5     q = RECURSIVE-MATRIX-CHAIN(p, i, k)
          + RECURSIVE-MATRIX-CHAIN(p, k + 1, j)
          + pi−1 pk pj
6     if q < m[i, j]
7         m[i, j] = q
8 return m[i, j]
Noting that for i = 1, 2, …, n − 1, each term T(i) appears once as T(k) and once as T(n − k), and collecting the n − 1 1s in the summation together with the 1 out front, we can rewrite the recurrence as

T(n) ≥ 2 Σ_{i=1}^{n−1} T(i) + n.
Let’s prove that T(n) = Ω(2^n) using the substitution method. Specifically, we’ll show that T(n) ≥ 2^{n−1} for all n ≥ 1. For the base case n = 1, the summation is empty, and we get T(1) ≥ 1 = 2^0. Inductively, for n ≥ 2 we have

T(n) ≥ 2 Σ_{i=1}^{n−1} 2^{i−1} + n
     = 2 (2^{n−1} − 1) + n
     = 2^n + n − 2
     ≥ 2^{n−1},

which completes the proof. Thus, the total amount of work performed
by the call RECURSIVE-MATRIX-CHAIN( p, 1, n) is at least
exponential in n.
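The blow-up is easy to observe directly. The sketch below (names ours) transcribes RECURSIVE-MATRIX-CHAIN into Python and counts its procedure calls; since the call count does not depend on the dimension values, the chain uses all-ones dimensions.

```python
import math

def recursive_matrix_chain(p, i, j, counter):
    """Transcription of RECURSIVE-MATRIX-CHAIN, instrumented to count
    how many times the procedure is invoked."""
    counter[0] += 1
    if i == j:
        return 0
    best = math.inf
    for k in range(i, j):
        q = (recursive_matrix_chain(p, i, k, counter)
             + recursive_matrix_chain(p, k + 1, j, counter)
             + p[i - 1] * p[k] * p[j])
        best = min(best, q)
    return best

for n in range(1, 11):
    counter = [0]
    recursive_matrix_chain([1] * (n + 1), 1, n, counter)
    # The counts come out to 3^(n-1): 1, 3, 9, 27, ... -- comfortably
    # above the Omega(2^n) lower bound.
    assert counter[0] == 3 ** (n - 1) >= 2 ** (n - 1)
```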
Compare this top-down, recursive algorithm (without memoization)
with the bottom-up dynamic-programming algorithm. The latter is
more efficient because it takes advantage of the overlapping-
subproblems property. Matrix-chain multiplication has only Θ( n 2)
distinct subproblems, and the dynamic-programming algorithm solves
each exactly once. The recursive algorithm, on the other hand, must
solve each subproblem every time it reappears in the recursion tree.
Whenever a recursion tree for the natural recursive solution to a
problem contains the same subproblem repeatedly, and the total
number of distinct subproblems is small, dynamic programming can
improve efficiency, sometimes dramatically.
Reconstructing an optimal solution
As a practical matter, you’ll often want to store in a separate table
which choice you made in each subproblem so that you do not have to
reconstruct this information from the table of costs.
For matrix-chain multiplication, the table s[ i, j] saves a significant amount of work when we need to reconstruct an optimal solution.
Suppose that the MATRIX-CHAIN-ORDER procedure on page 378
did not maintain the s[ i, j] table, so that it filled in only the table m[ i, j]
containing optimal subproblem costs. The procedure chooses from
among j − i possibilities when determining which subproblems to use in
an optimal solution to parenthesizing AiAi+1 ⋯ Aj, and j − i is not a constant. Therefore, it would take Θ( j − i) = ω(1) time to reconstruct which subproblems it chose for a solution to a given problem. Because
MATRIX-CHAIN-ORDER stores in s[ i, j] the index of the matrix at which it split the product AiAi+1 ⋯ Aj, the PRINT-OPTIMAL-PARENS procedure on page 381 can look up each choice in O(1) time.
Memoization
As we saw for the rod-cutting problem, there is an alternative approach
to dynamic programming that often offers the efficiency of the bottom-
up dynamic-programming approach while maintaining a top-down
strategy. The idea is to memoize the natural, but inefficient, recursive algorithm. As in the bottom-up approach, you maintain a table with
subproblem solutions, but the control structure for filling in the table is
more like the recursive algorithm.
A memoized recursive algorithm maintains an entry in a table for the
solution to each subproblem. Each table entry initially contains a
special value to indicate that the entry has yet to be filled in. When the
subproblem is first encountered as the recursive algorithm unfolds, its
solution is computed and then stored in the table. Each subsequent
encounter of this subproblem simply looks up the value stored in the
table and returns it. 7
The procedure MEMOIZED-MATRIX-CHAIN is a memoized
version of the procedure RECURSIVE-MATRIX-CHAIN on page
389. Note where it resembles the memoized top-down method on page
369 for the rod-cutting problem.
MEMOIZED-MATRIX-CHAIN(p, n)
1 let m[1 : n, 1 : n] be a new table
2 for i = 1 to n
3     for j = i to n
4         m[i, j] = ∞
5 return LOOKUP-CHAIN(m, p, 1, n)

LOOKUP-CHAIN(m, p, i, j)
1 if m[i, j] < ∞
2     return m[i, j]
3 if i == j
4     m[i, j] = 0
5 else for k = i to j − 1
6          q = LOOKUP-CHAIN(m, p, i, k)
                + LOOKUP-CHAIN(m, p, k + 1, j) + pi−1 pk pj
7          if q < m[i, j]
8              m[i, j] = q
9 return m[i, j]
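The two procedures above combine into a compact Python sketch (names are ours), with a dictionary standing in for the ∞-initialized table: a missing key plays the role of m[i, j] = ∞.

```python
def memoized_matrix_chain(p):
    """Top-down memoized matrix-chain order: same recurrence as the
    recursive procedure, but each subproblem cost is computed once."""
    n = len(p) - 1
    m = {}  # memo table: (i, j) -> minimum cost of computing A_i..A_j

    def lookup_chain(i, j):
        if (i, j) in m:            # previously computed: just look it up
            return m[(i, j)]
        if i == j:
            m[(i, j)] = 0
        else:
            m[(i, j)] = min(
                lookup_chain(i, k) + lookup_chain(k + 1, j)
                + p[i - 1] * p[k] * p[j]
                for k in range(i, j)
            )
        return m[(i, j)]

    return lookup_chain(1, n)

# Same dimensions as Figure 14.5:
print(memoized_matrix_chain([30, 35, 15, 5, 10, 20, 25]))  # 15125
```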
The MEMOIZED-MATRIX-CHAIN procedure, like the bottom-
up MATRIX-CHAIN-ORDER procedure on page 378, maintains a
table m[1 : n, 1 : n] of computed values of m[ i, j], the minimum number of scalar multiplications needed to compute the matrix Ai: j. Each table entry initially contains the value ∞ to indicate that the entry has yet to
be filled in. Upon calling LOOKUP-CHAIN( m, p, i, j), if line 1 finds that m[ i, j] < ∞, then the procedure simply returns the previously computed cost m[ i, j] in line 2. Otherwise, the cost is computed as in RECURSIVE-MATRIX-CHAIN, stored in m[ i, j], and returned. Thus,
LOOKUP-CHAIN( m, p, i, j) always returns the value of m[ i, j], but it computes it only upon the first call of LOOKUP-CHAIN with these
specific values of i and j. Figure 14.7 illustrates how MEMOIZED-MATRIX-CHAIN saves time compared with RECURSIVE-MATRIX-
CHAIN. Subtrees shaded blue represent values that are looked up
rather than recomputed.
Like the bottom-up procedure MATRIX-CHAIN-ORDER, the
memoized procedure MEMOIZED-MATRIX-CHAIN runs in O( n 3)
time. To begin with, line 4 of MEMOIZED-MATRIX-CHAIN executes
Θ( n 2) times, which dominates the running time outside of the call to
LOOKUP-CHAIN in line 5. We can categorize the calls of LOOKUP-
CHAIN into two types:
1. calls in which m[ i, j] = ∞, so that lines 3–9 execute, and
2. calls in which m[ i, j] < ∞, so that LOOKUP-CHAIN simply returns in line 2.
There are Θ( n 2) calls of the first type, one per table entry. All calls of the second type are made as recursive calls by calls of the first type.
Whenever a given call of LOOKUP-CHAIN makes recursive calls, it
makes O( n) of them. Therefore, there are O( n 3) calls of the second type in all. Each call of the second type takes O(1) time, and each call of the
first type takes O( n) time plus the time spent in its recursive calls. The total time, therefore, is O( n 3). Memoization thus turns an Ω(2 n)-time algorithm into an O( n 3)-time algorithm.
We have seen how to solve the matrix-chain multiplication problem
by either a top-down, memoized dynamic-programming algorithm or a
bottom-up dynamic-programming algorithm in O( n 3) time. Both the bottom-up and memoized methods take advantage of the overlapping-subproblems property. There are only Θ( n 2) distinct subproblems in
total, and either of these methods computes the solution to each
subproblem only once. Without memoization, the natural recursive
algorithm runs in exponential time, since solved subproblems are
repeatedly solved.
In general practice, if all subproblems must be solved at least once, a
bottom-up dynamic-programming algorithm usually outperforms the
corresponding top-down memoized algorithm by a constant factor,
because the bottom-up algorithm has no overhead for recursion and
less overhead for maintaining the table. Moreover, for some problems
you can exploit the regular pattern of table accesses in the dynamic-
programming algorithm to reduce time or space requirements even
further. On the other hand, in certain situations, some of the
subproblems in the subproblem space might not need to be solved at all.
In that case, the memoized solution has the advantage of solving only
those subproblems that are definitely required.
Exercises
14.3-1
Which is a more efficient way to determine the optimal number of
multiplications in a matrix-chain multiplication problem: enumerating
all the ways of parenthesizing the product and computing the number of
multiplications for each, or running RECURSIVE-MATRIX-CHAIN?
Justify your answer.
14.3-2
Draw the recursion tree for the MERGE-SORT procedure from Section
2.3.1 on an array of 16 elements. Explain why memoization fails to
speed up a good divide-and-conquer algorithm such as MERGE-
SORT.
14.3-3
Consider the antithetical variant of the matrix-chain multiplication
problem where the goal is to parenthesize the sequence of matrices so as
to maximize, rather than minimize, the number of scalar multiplications.
Does this problem exhibit optimal substructure?
14.3-4
As stated, in dynamic programming, you first solve the subproblems
and then choose which of them to use in an optimal solution to the
problem. Professor Capulet claims that she does not always need to
solve all the subproblems in order to find an optimal solution. She
suggests that she can find an optimal solution to the matrix-chain
multiplication problem by always choosing the matrix Ak at which to
split the subproduct AiAi+1 ⋯ Aj (by selecting k to minimize the quantity pi−1 pk pj) before solving the subproblems. Find an instance of the matrix-chain multiplication problem for which this greedy approach
yields a suboptimal solution.
14.3-5
Suppose that the rod-cutting problem of Section 14.1 also had a limit li on the number of pieces of length i allowed to be produced, for i = 1, 2,
…, n. Show that the optimal-substructure property described in Section 14.1 no longer holds.

14.4 Longest common subsequence
Biological applications often need to compare the DNA of two (or
more) different organisms. A strand of DNA consists of a string of
molecules called bases, where the possible bases are adenine, cytosine,
guanine, and thymine. Representing each of these bases by its initial
letter, we can express a strand of DNA as a string over the 4-element set
{A, C, G, T}. (See Section C.1 for the definition of a string.) For example, the DNA of one organism may be S 1 =
ACCGGTCGAGTGCGCGGAAGCCGGCCGAA, and the DNA of another
organism may be S 2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA. One
reason to compare two strands of DNA is to determine how “similar”
the two strands are, as some measure of how closely related the two
organisms are. We can, and do, define similarity in many different ways.
For example, we can say that two DNA strands are similar if one is a
substring of the other. (Chapter 32 explores algorithms to solve this problem.) In our example, neither S 1 nor S 2 is a substring of the other.
Alternatively, we could say that two strands are similar if the number of
changes needed to turn one into the other is small. (Problem 14-5 looks
at this notion.) Yet another way to measure the similarity of strands S 1
and S 2 is by finding a third strand S 3 in which the bases in S 3 appear in each of S 1 and S 2. These bases must appear in the same order, but not necessarily consecutively. The longer the strand S 3 we can find, the more similar S 1 and S 2 are. In our example, the longest strand S 3 is GTCGTCGGAAGCCGGCCGAA.
We formalize this last notion of similarity as the longest-common-
subsequence problem. A subsequence of a given sequence is just the
given sequence with 0 or more elements left out. Formally, given a
sequence X = 〈 x 1, x 2, …, xm〉, another sequence Z = 〈 z 1, z 2, …, zk〉 is a subsequence of X if there exists a strictly increasing sequence 〈 i 1, i 2, …, ik〉 of indices of X such that for all j = 1, 2, …, k, we have
x_{i_j} = z_j. For
example, Z = 〈 B, C, D, B〉 is a subsequence of X = 〈 A, B, C, B, D, A, B〉
with corresponding index sequence 〈2, 3, 5, 7〉.
Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y. For example, if X = 〈 A, B, C, B, D, A, B〉 and Y = 〈 B, D, C, A, B, A〉, the sequence 〈 B, C, A〉 is a common subsequence of both X and Y. The sequence 〈 B, C, A〉 is not a longest common subsequence ( LCS) of X
and Y, however, since it has length 3 and the sequence 〈 B, C, B, A〉, which is also common to both sequences X and Y, has length 4. The sequence 〈 B, C, B, A〉 is an LCS of X and Y, as is the sequence 〈 B, D, A, B〉, since X and Y have no common subsequence of length 5 or greater.
In the longest-common-subsequence problem, the input is two
sequences X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉, and the goal is to find a maximum-length common subsequence of X and Y. This
section shows how to efficiently solve the LCS problem using dynamic
programming.
Step 1: Characterizing a longest common subsequence
You can solve the LCS problem with a brute-force approach: enumerate
all subsequences of X and check each subsequence to see whether it is
also a subsequence of Y, keeping track of the longest subsequence you
find. Each subsequence of X corresponds to a subset of the indices {1,
2, …, m} of X. Because X has 2^m subsequences, this approach requires exponential time, making it impractical for long sequences.
The LCS problem has an optimal-substructure property, however, as
the following theorem shows. As we’ll see, the natural classes of
subproblems correspond to pairs of “prefixes” of the two input
sequences. To be precise, given a sequence X = 〈 x 1, x 2, …, xm〉, we define the i th prefix of X, for i = 0, 1, …, m, as Xi = 〈 x 1, x 2, …, xi〉. For
example, if X = 〈 A, B, C, B, D, A, B〉, then X 4 = 〈 A, B, C, B〉 and X 0 is the empty sequence.
Theorem 14.1 (Optimal substructure of an LCS)
Let X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉 be sequences, and let Z
= 〈 z 1, z 2, …, zk〉 be any LCS of X and Y.
1. If xm = yn, then zk = xm = yn and Zk−1 is an LCS of Xm−1
and Yn−1.
2. If xm ≠ yn and zk ≠ xm, then Z is an LCS of Xm−1 and Y.
3. If xm ≠ yn and zk ≠ yn, then Z is an LCS of X and Yn−1.
Proof (1) If zk ≠ xm, then we could append xm = yn to Z to obtain a common subsequence of X and Y of length k + 1, contradicting the supposition that Z is a longest common subsequence of X and Y. Thus, we must have zk = xm = yn. Now, the prefix Zk−1 is a length-( k − 1) common subsequence of Xm−1 and Yn−1. We wish to show that it is an
LCS. Suppose for the purpose of contradiction that there exists a
common subsequence W of Xm−1 and Yn−1 with length greater than k
− 1. Then, appending xm = yn to W produces a common subsequence of X and Y whose length is greater than k, which is a contradiction.
(2) If zk ≠ xm, then Z is a common subsequence of Xm−1 and Y. If there were a common subsequence W of Xm−1 and Y with length greater than k, then W would also be a common subsequence of Xm and Y, contradicting the assumption that Z is an LCS of X and Y.
(3) The proof is symmetric to (2).
▪
The way that Theorem 14.1 characterizes longest common
subsequences says that an LCS of two sequences contains within it an
LCS of prefixes of the two sequences. Thus, the LCS problem has an
optimal-substructure property. A recursive solution also has the
overlapping-subproblems property, as we’ll see in a moment.
Step 2: A recursive solution
Theorem 14.1 implies that you should examine either one or two
subproblems when finding an LCS of X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉. If xm = yn, you need to find an LCS of Xm−1 and Yn−1.
Appending xm = yn to this LCS yields an LCS of X and Y. If xm ≠ yn, then you have to solve two subproblems: finding an LCS of Xm−1 and
Y and finding an LCS of X and Yn−1. Whichever of these two LCSs is longer is an LCS of X and Y. Because these cases exhaust all possibilities, one of the optimal subproblem solutions must appear
within an LCS of X and Y.
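To make the case analysis concrete, here is a minimal Python sketch of the resulting (exponential-time) recursion. The function name and the 0-based indexing are my own, not the book's pseudocode.

```python
def lcs_length_naive(X, Y):
    """Exponential-time recursion following the case analysis above.

    X and Y are sequences (strings or lists); indices here are 0-based,
    unlike the 1-based convention in the text.
    """
    def c(i, j):
        # A length-0 prefix has an empty LCS.
        if i == 0 or j == 0:
            return 0
        if X[i - 1] == Y[j - 1]:
            # Matching last symbols: extend an LCS of the two shorter prefixes.
            return c(i - 1, j - 1) + 1
        # Mismatch: drop the last symbol of one sequence or the other.
        return max(c(i - 1, j), c(i, j - 1))

    return c(len(X), len(Y))
```

On the running example, `lcs_length_naive("ABCBDAB", "BDCABA")` returns 4, matching the LCS 〈B, C, B, A〉.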
The LCS problem has the overlapping-subproblems property. Here’s
how. To find an LCS of X and Y, you might need to find the LCSs of X
and Yn−1 and of Xm−1 and Y. But each of these subproblems has the subsubproblem of finding an LCS of Xm−1 and Yn−1. Many other
subproblems share subsubproblems.
As in the matrix-chain multiplication problem, solving the LCS
problem recursively involves establishing a recurrence for the value of an
optimal solution. Let’s define c[ i, j] to be the length of an LCS of the sequences Xi and Yj. If either i = 0 or j = 0, one of the sequences has length 0, and so the LCS has length 0. The optimal substructure of the
LCS problem gives the recursive formula

c[i, j] = 0                                if i = 0 or j = 0,
c[i, j] = c[i − 1, j − 1] + 1              if i, j > 0 and xi = yj,        (14.9)
c[i, j] = max {c[i, j − 1], c[i − 1, j]}   if i, j > 0 and xi ≠ yj.
In this recursive formulation, a condition in the problem restricts
which subproblems to consider. When xi = yj, you can and should consider the subproblem of finding an LCS of Xi−1 and Yj−1.
Otherwise, you instead consider the two subproblems of finding an LCS
of Xi and Yj−1 and of Xi−1 and Yj. In the previous dynamic-programming algorithms we have examined—for rod cutting and
matrix-chain multiplication—we didn’t rule out any subproblems due to
conditions in the problem. Finding an LCS is not the only dynamic-programming algorithm that rules out subproblems based on conditions
in the problem. For example, the edit-distance problem (see Problem 14-
5) has this characteristic.
Step 3: Computing the length of an LCS
Based on equation (14.9), you could write an exponential-time recursive
algorithm to compute the length of an LCS of two sequences. Since the
LCS problem has only Θ( mn) distinct subproblems (computing c[ i, j] for 0 ≤ i ≤ m and 0 ≤ j ≤ n), dynamic programming can compute the solutions bottom up.
The procedure LCS-LENGTH on the next page takes two sequences
X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉 as inputs, along with their lengths. It stores the c[ i, j] values in a table c[0 : m, 0 : n], and it computes the entries in row-major order. That is, the procedure fills in
the first row of c from left to right, then the second row, and so on. The
procedure also maintains the table b[1 : m, 1 : n] to help in constructing an optimal solution. Intuitively, b[ i, j] points to the table entry corresponding to the optimal subproblem solution chosen when
computing c[ i, j]. The procedure returns the b and c tables, where c[ m, n]
contains the length of an LCS of X and Y. Figure 14.8 shows the tables produced by LCS-LENGTH on the sequences X = 〈 A, B, C, B, D, A, B〉
and Y = 〈 B, D, C, A, B, A〉. The running time of the procedure is Θ( mn), since each table entry takes Θ(1) time to compute.
LCS-LENGTH(X, Y, m, n)
 1  let b[1 : m, 1 : n] and c[0 : m, 0 : n] be new tables
 2  for i = 1 to m
 3      c[i, 0] = 0
 4  for j = 0 to n
 5      c[0, j] = 0
 6  for i = 1 to m        // compute table entries in row-major order
 7      for j = 1 to n
 8          if xi == yj
 9              c[i, j] = c[i − 1, j − 1] + 1
10              b[i, j] = “↖”
11          elseif c[i − 1, j] ≥ c[i, j − 1]
12              c[i, j] = c[i − 1, j]
13              b[i, j] = “↑”
14          else c[i, j] = c[i, j − 1]
15              b[i, j] = “←”
16  return c and b
PRINT-LCS(b, X, i, j)
1  if i == 0 or j == 0
2      return                // the LCS has length 0
3  if b[i, j] == “↖”
4      PRINT-LCS(b, X, i − 1, j − 1)
5      print xi              // same as yj
6  elseif b[i, j] == “↑”
7      PRINT-LCS(b, X, i − 1, j)
8  else PRINT-LCS(b, X, i, j − 1)
Step 4: Constructing an LCS
With the b table returned by LCS-LENGTH, you can quickly construct
an LCS of X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉. Begin at b[ m, n]
and trace through the table by following the arrows. Each “↖”
encountered in an entry b[ i, j] implies that xi = yj is an element of the LCS that LCS-LENGTH found. This method gives you the elements of
this LCS in reverse order. The recursive procedure PRINT-LCS prints
out an LCS of X and Y in the proper, forward order.
Figure 14.8 The c and b tables computed by LCS-LENGTH on the sequences X = 〈 A, B, C, B, D, A, B〉 and Y = 〈 B, D, C, A, B, A〉. The square in row i and column j contains the value of c[ i, j] and the appropriate arrow for the value of b[ i, j]. The entry 4 in c[7, 6]—the lower right-hand corner of the table—is the length of an LCS 〈 B, C, B, A〉 of X and Y. For i, j > 0, entry c[ i, j]
depends only on whether xi = yj and the values in entries c[ i − 1, j], c[ i, j − 1], and c[ i − 1, j − 1], which are computed before c[ i, j]. To reconstruct the elements of an LCS, follow the b[ i, j] arrows from the lower right-hand corner, as shown by the sequence shaded blue. Each “↖” on the shaded-blue sequence corresponds to an entry (highlighted) for which xi = yj is a member of an LCS.
The initial call is PRINT-LCS( b, X, m, n). For the b table in Figure
14.8, this procedure prints BCBA. The procedure takes O( m + n) time,
since it decrements at least one of i and j in each recursive call.
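As a concrete check, the two procedures can be transcribed into Python roughly as follows. Here 0-based lists stand in for the 1-indexed sequences, and the print helper collects the LCS into a list rather than printing, but the logic mirrors the pseudocode line for line (the names are mine).

```python
def lcs_length(X, Y):
    """Python rendering of LCS-LENGTH; position i in the text maps to X[i-1]."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]     # c[0:m, 0:n], rows/column 0 are 0
    b = [[None] * (n + 1) for _ in range(m + 1)]  # b[1:m, 1:n]
    for i in range(1, m + 1):                     # row-major order
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"
    return c, b

def print_lcs(b, X, i, j, out):
    """Python rendering of PRINT-LCS, appending the LCS elements to `out`."""
    if i == 0 or j == 0:
        return                          # the LCS has length 0
    if b[i][j] == "↖":
        print_lcs(b, X, i - 1, j - 1, out)
        out.append(X[i - 1])            # x_i, same as y_j
    elif b[i][j] == "↑":
        print_lcs(b, X, i - 1, j, out)
    else:
        print_lcs(b, X, i, j - 1, out)
```

Running it on X = ABCBDAB and Y = BDCABA gives c[7][6] = 4 and reconstructs BCBA, matching Figure 14.8.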
Improving the code
Once you have developed an algorithm, you will often find that you can
improve on the time or space it uses. Some changes can simplify the
code and improve constant factors but otherwise yield no asymptotic
improvement in performance. Others can yield substantial asymptotic
savings in time and space.
In the LCS algorithm, for example, you can eliminate the b table
altogether. Each c[ i, j] entry depends on only three other c table entries: c[ i − 1, j − 1], c[ i − 1, j], and c[ i, j − 1]. Given the value of c[ i, j], you can
determine in O(1) time which of these three values was used to compute c[ i, j], without inspecting table b. Thus, you can reconstruct an LCS in O( m+ n) time using a procedure similar to PRINT-LCS. (Exercise 14.4-2
asks you to give the pseudocode.) Although this method saves Θ( mn)
space, the auxiliary space requirement for computing an LCS does not
asymptotically decrease, since the c table takes Θ( mn) space anyway.
You can, however, reduce the asymptotic space requirements for
LCS-LENGTH, since it needs only two rows of table c at a time: the
row being computed and the previous row. (In fact, as Exercise 14.4-4
asks you to show, you can use only slightly more than the space for one
row of c to compute the length of an LCS.) This improvement works if
you need only the length of an LCS. If you need to reconstruct the
elements of an LCS, the smaller table does not keep enough information
to retrace the algorithm’s steps in O( m + n) time.
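The two-row idea can be sketched in Python as follows (the function name is mine; Exercise 14.4-4 asks for an even tighter space bound than this):

```python
def lcs_length_two_rows(X, Y):
    """Length-only LCS computation keeping just two rows of the c table:
    the row being computed and the previous row."""
    m, n = len(X), len(Y)
    prev = [0] * (n + 1)          # row i − 1 of c; row 0 is all zeros
    for i in range(1, m + 1):
        curr = [0] * (n + 1)      # row i of c; curr[0] = 0 as in the base case
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[n]
```

This uses O(n) auxiliary space but, as the text notes, retains too little information to reconstruct the LCS itself in O(m + n) time.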
Exercises
14.4-1
Determine an LCS of 〈1, 0, 0, 1, 0, 1, 0, 1〉 and 〈0, 1, 0, 1, 1, 0, 1, 1, 0〉.
14.4-2
Give pseudocode to reconstruct an LCS from the completed c table and
the original sequences X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉 in O( m + n) time, without using the b table.
14.4-3
Give a memoized version of LCS-LENGTH that runs in O( mn) time.
14.4-4
Show how to compute the length of an LCS using only 2 · min { m, n}
entries in the c table plus O(1) additional space. Then show how to do
the same thing, but using min { m, n} entries plus O(1) additional space.
14.4-5
Give an O( n 2)-time algorithm to find the longest monotonically increasing subsequence of a sequence of n numbers.
14.4-6
Give an O( n lg n)-time algorithm to find the longest monotonically increasing subsequence of a sequence of n numbers. ( Hint: The last element of a candidate subsequence of length i is at least as large as the
last element of a candidate subsequence of length i −1. Maintain
candidate subsequences by linking them through the input sequence.)
14.5 Optimal binary search trees
Suppose that you are designing a program to translate text from English
to Latvian. For each occurrence of each English word in the text, you
need to look up its Latvian equivalent. You can perform these lookup
operations by building a binary search tree with n English words as keys
and their Latvian equivalents as satellite data. Because you will search
the tree for each individual word in the text, you want the total time
spent searching to be as low as possible. You can ensure an O(lg n) search time per occurrence by using a red-black tree or any other
balanced binary search tree. Words appear with different frequencies,
however, and a frequently used word such as the can end up appearing
far from the root while a rarely used word such as naumachia appears
near the root. Such an organization would slow down the translation,
since the number of nodes visited when searching for a key in a binary
search tree equals 1 plus the depth of the node containing the key. You
want words that occur frequently in the text to be placed nearer the
root. 8 Moreover, some words in the text might have no Latvian translation,9 and such words would not appear in the binary search tree at all. How can you organize a binary search tree so as to minimize the
number of nodes visited in all searches, given that you know how often
each word occurs?
What you need is an optimal binary search tree. Formally, given a
sequence K = 〈 k 1, k 2, …, kn〉 of n distinct keys such that k 1 < k 2 < … < kn, build a binary search tree containing them. For each key ki, you are given the probability pi that any given search is for key ki. Since some searches may be for values not in K, you also have n + 1 “dummy” keys
d 0, d 1, d 2, …, dn representing those values. In particular, d 0 represents all values less than k 1, dn represents all values greater than kn, and for i
= 1, 2, …, n − 1, the dummy key di represents all values between ki and ki+1. For each dummy key di, you have the probability qi that a search corresponds to di. Figure 14.9 shows two binary search trees for a set of n = 5 keys. Each key ki is an internal node, and each dummy key di is a leaf. Since every search is either successful (finding some key ki) or unsuccessful (finding some dummy key di), we have

Σ_{i=1}^{n} pi + Σ_{i=0}^{n} qi = 1.        (14.10)
Figure 14.9 Two binary search trees for a set of n = 5 keys with the following probabilities:

i     0     1     2     3     4     5
pi          0.15  0.10  0.05  0.10  0.20
qi    0.05  0.10  0.05  0.05  0.05  0.10

(a) A binary search tree with expected search cost 2.80. (b) A binary search tree with expected search cost 2.75. This tree is optimal.
Knowing the probabilities of searches for each key and each dummy
key allows us to determine the expected cost of a search in a given
binary search tree T. Let us assume that the actual cost of a search equals the number of nodes examined, which is the depth of the node
found by the search in T, plus 1. Then the expected cost of a search in T
is

E[search cost in T] = Σ_{i=1}^{n} (depth_T(ki) + 1) · pi + Σ_{i=0}^{n} (depth_T(di) + 1) · qi
                    = 1 + Σ_{i=1}^{n} depth_T(ki) · pi + Σ_{i=0}^{n} depth_T(di) · qi,        (14.11)
where depth T denotes a node’s depth in the tree T. The last equation
follows from equation (14.10). Figure 14.9 shows how to calculate the expected search cost node by node.
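As a sanity check, here is one way to carry out that node-by-node calculation in Python for the two trees of Figure 14.9. The depth assignments below are my reading of the figure, chosen to be consistent with the stated costs; treat them as an illustration rather than a definitive transcription.

```python
# Probabilities from Figure 14.9: p[1..5] for keys, q[0..5] for dummy keys.
p = [None, 0.15, 0.10, 0.05, 0.10, 0.20]
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]

def expected_cost(key_depth, dummy_depth):
    """Expected search cost: sum of (depth + 1) * probability over all nodes."""
    return (sum((key_depth[i] + 1) * p[i] for i in range(1, 6))
            + sum((dummy_depth[i] + 1) * q[i] for i in range(6)))

# (a) root k2 with children k1 and k4; k4's children are k3 and k5.
cost_a = expected_cost({1: 1, 2: 0, 3: 2, 4: 1, 5: 2},
                       {0: 2, 1: 2, 2: 3, 3: 3, 4: 3, 5: 3})
# (b) root k2 with children k1 and k5; k5's left child k4; k4's left child k3.
cost_b = expected_cost({1: 1, 2: 0, 3: 3, 4: 2, 5: 1},
                       {0: 2, 1: 2, 2: 4, 3: 4, 4: 3, 5: 2})
```

With these depths, `cost_a` works out to 2.80 and `cost_b` to 2.75, matching the figure caption.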
For a given set of probabilities, your goal is to construct a binary
search tree whose expected search cost is smallest. We call such a tree an
optimal binary search tree. Figure 14.9(a) shows one binary search tree, with expected cost 2.80, for the probabilities given in the figure caption.
Part (b) of the figure displays an optimal binary search tree, with
expected cost 2.75. This example demonstrates that an optimal binary
search tree is not necessarily a tree whose overall height is smallest. Nor
does an optimal binary search tree always have the key with the greatest
probability at the root. Here, key k 5 has the greatest search probability
of any key, yet the root of the optimal binary search tree shown is k 2.
(The lowest expected cost of any binary search tree with k 5 at the root is
2.85.)
As with matrix-chain multiplication, exhaustive checking of all
possibilities fails to yield an efficient algorithm. You can label the nodes
of any n-node binary tree with the keys k 1, k 2, …, kn to construct a binary search tree, and then add in the dummy keys as leaves. In
Problem 12-4 on page 329, we saw that the number of binary trees with
n nodes is Ω(4^n / n^{3/2}). Thus you would need to examine an exponential number of binary search trees to perform an exhaustive search. We’ll see
how to solve this problem more efficiently with dynamic programming.
Step 1: The structure of an optimal binary search tree
To characterize the optimal substructure of optimal binary search trees,
we start with an observation about subtrees. Consider any subtree of a
binary search tree. It must contain keys in a contiguous range ki, …, kj,
for some 1 ≤ i ≤ j ≤ n. In addition, a subtree that contains keys ki, …, kj must also have as its leaves the dummy keys di−1, …, dj.
Now we can state the optimal substructure: if an optimal binary
search tree T has a subtree T′ containing keys ki, …, kj, then this subtree T′ must be optimal as well for the subproblem with keys ki, …,
kj and dummy keys di−1, …, dj. The usual cut-and-paste argument applies. If there were a subtree T″ whose expected cost is lower than that
of T′, then cutting T′ out of T and pasting in T″ would result in a binary search tree of lower expected cost than T, thus contradicting the
optimality of T.
With the optimal substructure in hand, here is how to construct an
optimal solution to the problem from optimal solutions to subproblems.
Given keys ki, …, kj, one of these keys, say kr ( i ≤ r ≤ j), is the root of an optimal subtree containing these keys. The left subtree of the root kr
contains the keys ki, …, kr−1 (and dummy keys di−1, …, dr−1), and the right subtree contains the keys kr+1, …, kj (and dummy keys dr, …, dj).
As long as you examine all candidate roots kr, where i ≤ r ≤ j, and you determine all optimal binary search trees containing ki, …, kr−1 and those containing kr+1, …, kj, you are guaranteed to find an optimal binary search tree.
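The "examine all candidate roots" idea can be sketched directly as a memoized recursion. The recurrence is developed formally in the steps that follow, so treat this as an illustrative Python sketch (the names and structure are mine), not the book's procedure:

```python
from functools import lru_cache

def optimal_bst_cost(p, q, n):
    """Expected cost of an optimal BST over keys k_1..k_n.

    p[1..n] are the key probabilities (p[0] is unused) and q[0..n] are the
    dummy-key probabilities. Every candidate root k_r is tried for each
    range of keys k_i..k_j, exactly as described above.
    """
    @lru_cache(maxsize=None)
    def w(i, j):
        # Total probability mass of keys k_i..k_j and dummies d_{i-1}..d_j.
        return sum(p[i:j + 1]) + sum(q[i - 1:j + 1])

    @lru_cache(maxsize=None)
    def e(i, j):
        if j == i - 1:            # "empty" subtree: just the dummy key d_{i-1}
            return q[i - 1]
        # Choosing root k_r adds w(i, j): every node in both subtrees,
        # plus the root itself, sits one level deeper in the full tree.
        return min(e(i, r - 1) + e(r + 1, j) + w(i, j)
                   for r in range(i, j + 1))

    return e(1, n)
```

On the probabilities of Figure 14.9 (p = 0.15, 0.10, 0.05, 0.10, 0.20 and q = 0.05, 0.10, 0.05, 0.05, 0.05, 0.10), this returns 2.75, the cost of the optimal tree in the figure.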
There is one technical detail worth understanding about “empty”
subtrees. Suppose that in a subtree with keys ki, …, kj, you select ki as the root. By the above argument, ki’s left subtree contains the keys ki,