r1 = 1 from solution 1 = 1 (no cuts),
r2 = 5 from solution 2 = 2 (no cuts),
r3 = 8 from solution 3 = 3 (no cuts),
r4 = 10 from solution 4 = 2 + 2,
r5 = 13 from solution 5 = 2 + 3,
r6 = 17 from solution 6 = 6 (no cuts),
r7 = 18 from solution 7 = 1 + 6 or 7 = 2 + 2 + 3,
r8 = 22 from solution 8 = 2 + 6,
r9 = 25 from solution 9 = 3 + 6,
r10 = 30 from solution 10 = 10 (no cuts).
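These optimal revenues can be checked by brute force. The sketch below assumes the price table of Figure 14.1 (not reproduced in this excerpt), where price[i] is the price of a rod of length i:

```python
# Brute-force check of the optimal revenues r_1 .. r_10 listed above,
# assuming the Figure 14.1 price table: price[i] = price of a length-i rod.
price = {1: 1, 2: 5, 3: 8, 4: 9, 5: 10, 6: 17, 7: 17, 8: 20, 9: 24, 10: 30}

def best_revenue(n):
    """Maximum revenue over every way of cutting a rod of length n."""
    if n == 0:
        return 0
    # Try every size for the first piece and recurse on the remainder.
    return max(price[i] + best_revenue(n - i) for i in range(1, n + 1))

print([best_revenue(n) for n in range(1, 11)])
# [1, 5, 8, 10, 13, 17, 18, 22, 25, 30]
```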

More generally, we can express the values rn for n ≥ 1 in terms of optimal revenues from shorter rods:

rn = max {pn, r1 + rn−1, r2 + rn−2, …, rn−1 + r1}.   (14.1)

The first argument, pn, corresponds to making no cuts at all and selling the rod of length n as is. The other n − 1 arguments to max correspond to the maximum revenue obtained by making an initial cut of the rod into two pieces of size i and n − i, for each i = 1, 2, …, n − 1, and then optimally cutting up those pieces further, obtaining revenues ri and rn−i from those two pieces. Since you don’t know ahead of time which value of i optimizes revenue, you have to consider all possible values for i and pick the one that maximizes revenue. You also have the option of picking no i at all if the greatest revenue comes from selling the rod uncut.

To solve the original problem of size n, you solve smaller problems of

the same type. Once you make the first cut, the two resulting pieces form

independent instances of the rod-cutting problem. The overall optimal

solution incorporates optimal solutions to the two resulting

subproblems, maximizing revenue from each of those two pieces. We say

that the rod-cutting problem exhibits optimal substructure: optimal

solutions to a problem incorporate optimal solutions to related

subproblems, which you may solve independently.

In a related, but slightly simpler, way to arrange a recursive structure for the rod-cutting problem, let’s view a decomposition as consisting of a first piece of length i cut off the left-hand end, and then a right-hand remainder of length n − i. Only the remainder, and not the first piece, may be further divided. Think of every decomposition of a length-n rod in this way: as a first piece followed by some decomposition of the remainder. Then we can express the solution with no cuts at all by saying that the first piece has size i = n and revenue pn and that the remainder has size 0 with corresponding revenue r0 = 0. We thus obtain the following simpler version of equation (14.1):

rn = max {pi + rn−i : 1 ≤ i ≤ n}.   (14.2)

In this formulation, an optimal solution embodies the solution to only

one related subproblem—the remainder—rather than two.

Recursive top-down implementation

The CUT-ROD procedure on the following page implements the

computation implicit in equation (14.2) in a straightforward, top-down,

recursive manner. It takes as input an array p[1 : n] of prices and an integer n, and it returns the maximum revenue possible for a rod of length n. For length n = 0, no revenue is possible, and so CUT-ROD

returns 0 in line 2. Line 3 initializes the maximum revenue q to −∞, so

that the for loop in lines 4–5 correctly computes q = max {pi + CUT-ROD(p, n − i) : 1 ≤ i ≤ n}. Line 6 then returns this value. A simple induction on n proves that this answer is equal to the desired answer rn, using equation (14.2).

CUT-ROD(p, n)
1  if n == 0
2      return 0
3  q = −∞
4  for i = 1 to n
5      q = max {q, p[i] + CUT-ROD(p, n − i)}
6  return q
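As a sketch, a direct Python transcription of CUT-ROD might look like the following, with prices 1-indexed so that p[0] is unused and p[i] is the price of a length-i rod:

```python
# Naive recursive rod cutting, a direct transcription of CUT-ROD.
def cut_rod(p, n):
    """Return the maximum revenue for a rod of length n."""
    if n == 0:
        return 0
    q = float("-inf")
    for i in range(1, n + 1):          # i is the size of the first piece
        q = max(q, p[i] + cut_rod(p, n - i))
    return q

p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # Figure 14.1 prices (assumed)
print(cut_rod(p, 4))  # 10
```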

If you code up CUT-ROD in your favorite programming language

and run it on your computer, you’ll find that once the input size

becomes moderately large, your program takes a long time to run. For n

= 40, your program may take several minutes and possibly more than an

hour. For large values of n, you’ll also discover that each time you increase n by 1, your program’s running time approximately doubles.

Why is CUT-ROD so inefficient? The problem is that CUT-ROD

calls itself recursively over and over again with the same parameter

values, which means that it solves the same subproblems repeatedly.

Figure 14.3 shows a recursion tree demonstrating what happens for n = 4: CUT-ROD(p, n) calls CUT-ROD(p, n − i) for i = 1, 2, …, n.

Equivalently, CUT-ROD( p, n) calls CUT-ROD( p, j) for each j = 0, 1,

…, n − 1. When this process unfolds recursively, the amount of work

done, as a function of n, grows explosively.

To analyze the running time of CUT-ROD, let T( n) denote the total

number of calls made to CUT-ROD( p, n) for a particular value of n.

This expression equals the number of nodes in a subtree whose root is

labeled n in the recursion tree. The count includes the initial call at its

root. Thus, T(0) = 1 and

T(n) = 1 + T(0) + T(1) + ⋯ + T(n − 1).   (14.3)

The initial 1 is for the call at the root, and the term T(j) counts the number of calls (including recursive calls) due to the call CUT-ROD(p, n − i), where j = n − i. As Exercise 14.1-1 asks you to show,

T(n) = 2^n,   (14.4)

and so the running time of CUT-ROD is exponential in n.
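The call-count recurrence (T(0) = 1 and T(n) = 1 plus the sum of T(j) for j = 0, …, n − 1) can be checked numerically against the closed form 2^n that Exercise 14.1-1 asks you to derive:

```python
# Evaluate the call-count recurrence directly: T(0) = 1,
# T(n) = 1 + sum of T(j) for j = 0 .. n-1, and compare with 2^n.
def T(n):
    total = 1                  # the call at the root
    for j in range(n):         # one subtree per recursive call
        total += T(j)
    return total

print([T(n) for n in range(6)])  # [1, 2, 4, 8, 16, 32]
```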

In retrospect, this exponential running time is not so surprising.

CUT-ROD explicitly considers all possible ways of cutting up a rod of

length n. How many ways are there? A rod of length n has n − 1

potential locations to cut. Each possible way to cut up the rod makes a

cut at some subset of these n − 1 locations, including the empty set, which makes for no cuts. Viewing each cut location as a distinct member

of a set of n − 1 elements, you can see that there are 2^{n−1} subsets. Each leaf in the recursion tree of Figure 14.3 corresponds to one possible way to cut up the rod. Hence, the recursion tree has 2^{n−1} leaves. The labels

on the simple path from the root to a leaf give the sizes of each

remaining right-hand piece before making each cut. That is, the labels

give the corresponding cut points, measured from the right-hand end of

the rod.

Figure 14.3 The recursion tree showing recursive calls resulting from a call CUT-ROD(p, n) for n = 4. Each node label gives the size n of the corresponding subproblem, so that an edge from a parent with label s to a child with label t corresponds to cutting off an initial piece of size s − t and leaving a remaining subproblem of size t. A path from the root to a leaf corresponds to one of the 2^{n−1} ways of cutting up a rod of length n. In general, this recursion tree has 2^n nodes and 2^{n−1} leaves.

Using dynamic programming for optimal rod cutting

Now, let’s see how to use dynamic programming to convert CUT-ROD

into an efficient algorithm.

The dynamic-programming method works as follows. Instead of

solving the same subproblems repeatedly, as in the naive recursive solution, arrange for each subproblem to be solved only once. There’s actually an obvious way to do so: the first time you solve a subproblem,

save its solution. If you need to refer to this subproblem’s solution again

later, just look it up, rather than recomputing it.

Saving subproblem solutions comes with a cost: the additional

memory needed to store solutions. Dynamic programming thus serves

as an example of a time-memory trade-off. The savings may be dramatic.

For example, we’re about to use dynamic programming to go from the

exponential-time algorithm for rod cutting down to a Θ(n^2)-time

algorithm. A dynamic-programming approach runs in polynomial time

when the number of distinct subproblems involved is polynomial in the

input size and you can solve each such subproblem in polynomial time.

There are usually two equivalent ways to implement a dynamic-

programming approach. Solutions to the rod-cutting problem illustrate

both of them.

The first approach is top-down with memoization. 2 In this approach, you write the procedure recursively in a natural manner, but modified to

save the result of each subproblem (usually in an array or hash table).

The procedure now first checks to see whether it has previously solved

this subproblem. If so, it returns the saved value, saving further

computation at this level. If not, the procedure computes the value in

the usual manner but also saves it. We say that the recursive procedure

has been memoized: it “remembers” what results it has computed

previously.

The second approach is the bottom-up method. This approach

typically depends on some natural notion of the “size” of a subproblem,

such that solving any particular subproblem depends only on solving

“smaller” subproblems. Solve the subproblems in size order, smallest

first, storing the solution to each subproblem when it is first solved. In

this way, when solving a particular subproblem, there are already saved solutions for all of the smaller subproblems its solution depends upon.

You need to solve each subproblem only once, and when you first see it,

you have already solved all of its prerequisite subproblems.

These two approaches yield algorithms with the same asymptotic

running time, except in unusual circumstances where the top-down

approach does not actually recurse to examine all possible subproblems.

The bottom-up approach often has much better constant factors, since

it has lower overhead for procedure calls.

The procedures MEMOIZED-CUT-ROD and MEMOIZED-CUT-

ROD-AUX on the facing page demonstrate how to memoize the top-

down CUT-ROD procedure. The main procedure MEMOIZED-CUT-

ROD initializes a new auxiliary array r[0 : n] with the value −∞ which, since known revenue values are always nonnegative, is a convenient

choice for denoting “unknown.” MEMOIZED-CUT-ROD then calls

its helper procedure, MEMOIZED-CUT-ROD-AUX, which is just the

memoized version of the exponential-time procedure, CUT-ROD. It

first checks in line 1 to see whether the desired value is already known

and, if it is, then line 2 returns it. Otherwise, lines 3–7 compute the

desired value q in the usual manner, line 8 saves it in r[ n], and line 9

returns it.

The bottom-up version, BOTTOM-UP-CUT-ROD on the next

page, is even simpler. Using the bottom-up dynamic-programming

approach, BOTTOM-UP-CUT-ROD takes advantage of the natural

ordering of the subproblems: a subproblem of size i is “smaller” than a

subproblem of size j if i < j. Thus, the procedure solves subproblems of sizes j = 0, 1, …, n, in that order.

MEMOIZED-CUT-ROD(p, n)
1  let r[0 : n] be a new array  // will remember solution values in r
2  for i = 0 to n
3      r[i] = −∞
4  return MEMOIZED-CUT-ROD-AUX(p, n, r)

MEMOIZED-CUT-ROD-AUX(p, n, r)
1  if r[n] ≥ 0  // already have a solution for length n?
2      return r[n]
3  if n == 0
4      q = 0
5  else q = −∞
6      for i = 1 to n  // i is the position of the first cut
7          q = max {q, p[i] + MEMOIZED-CUT-ROD-AUX(p, n − i, r)}
8  r[n] = q  // remember the solution value for length n
9  return q
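A Python sketch of the memoized top-down version; here a dictionary plays the role of the array r, with absence of a key meaning “unknown” instead of −∞:

```python
# Top-down rod cutting with memoization; r caches solved subproblems.
def memoized_cut_rod(p, n, r=None):
    if r is None:
        r = {}
    if n in r:
        return r[n]                # already have a solution for length n
    if n == 0:
        q = 0
    else:
        q = float("-inf")
        for i in range(1, n + 1):  # i is the position of the first cut
            q = max(q, p[i] + memoized_cut_rod(p, n - i, r))
    r[n] = q                       # remember the solution value for length n
    return q

p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # Figure 14.1 prices (assumed)
print(memoized_cut_rod(p, 10))  # 30
```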

BOTTOM-UP-CUT-ROD(p, n)
1  let r[0 : n] be a new array  // will remember solution values in r
2  r[0] = 0
3  for j = 1 to n  // for increasing rod length j
4      q = −∞
5      for i = 1 to j  // i is the position of the first cut
6          q = max {q, p[i] + r[j − i]}
7      r[j] = q  // remember the solution value for length j
8  return r[n]
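The bottom-up version translates just as directly; this sketch fills r[j] for increasing rod length j:

```python
# Bottom-up rod cutting: r[j] holds the optimal revenue for length j.
def bottom_up_cut_rod(p, n):
    r = [0] * (n + 1)              # r[0] = 0: a length-0 rod earns nothing
    for j in range(1, n + 1):      # increasing rod length j
        q = float("-inf")
        for i in range(1, j + 1):  # i is the position of the first cut
            q = max(q, p[i] + r[j - i])
        r[j] = q
    return r[n]

p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # Figure 14.1 prices (assumed)
print(bottom_up_cut_rod(p, 7))  # 18
```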

Line 1 of BOTTOM-UP-CUT-ROD creates a new array r[0 : n] in

which to save the results of the subproblems, and line 2 initializes r[0] to

0, since a rod of length 0 earns no revenue. Lines 3–6 solve each

subproblem of size j, for j = 1, 2, …, n, in order of increasing size. The approach used to solve a problem of a particular size j is the same as

that used by CUT-ROD, except that line 6 now directly references array

entry r[j − i] instead of making a recursive call to solve the subproblem of size j − i. Line 7 saves in r[j] the solution to the subproblem of size j.

Finally, line 8 returns r[ n], which equals the optimal value rn.

The bottom-up and top-down versions have the same asymptotic

running time. The running time of BOTTOM-UP-CUT-ROD is Θ(n^2),

due to its doubly nested loop structure. The number of iterations of its

inner for loop, in lines 5–6, forms an arithmetic series. The running time

of its top-down counterpart, MEMOIZED-CUT-ROD, is also Θ(n^2), although this running time may be a little harder to see. Because a

recursive call to solve a previously solved subproblem returns

immediately, MEMOIZED-CUT-ROD solves each subproblem just

once. It solves subproblems for sizes 0, 1, …, n. To solve a subproblem

of size n, the for loop of lines 6–7 iterates n times. Thus, the total number of iterations of this for loop, over all recursive calls of

MEMOIZED-CUT-ROD, forms an arithmetic series, giving a total of

Θ(n^2) iterations, just like the inner for loop of BOTTOM-UP-CUT-ROD. (We actually are using a form of aggregate analysis here. We’ll see

aggregate analysis in detail in Section 16.1.)

Figure 14.4 The subproblem graph for the rod-cutting problem with n = 4. The vertex labels give the sizes of the corresponding subproblems. A directed edge ( x, y) indicates that solving subproblem x requires a solution to subproblem y. This graph is a reduced version of the recursion tree of Figure 14.3, in which all nodes with the same label are collapsed into a single vertex and all edges go from parent to child.

Subproblem graphs

When you think about a dynamic-programming problem, you need to

understand the set of subproblems involved and how subproblems

depend on one another.

The subproblem graph for the problem embodies exactly this

information. Figure 14.4 shows the subproblem graph for the rod-cutting problem with n = 4. It is a directed graph, containing one vertex

for each distinct subproblem. The subproblem graph has a directed edge

from the vertex for subproblem x to the vertex for subproblem y if

determining an optimal solution for subproblem x involves directly considering an optimal solution for subproblem y. For example, the

subproblem graph contains an edge from x to y if a top-down recursive

procedure for solving x directly calls itself to solve y. You can think of the subproblem graph as a “reduced” or “collapsed” version of the

recursion tree for the top-down recursive method, with all nodes for the

same subproblem coalesced into a single vertex and all edges directed

from parent to child.

The bottom-up method for dynamic programming considers the

vertices of the subproblem graph in such an order that you solve the

subproblems y adjacent to a given subproblem x before you solve subproblem x. (As Section B.4 notes, the adjacency relation in a directed graph is not necessarily symmetric.) Using terminology that

we’ll see in Section 20.4, in a bottom-up dynamic-programming algorithm, you consider the vertices of the subproblem graph in an

order that is a “reverse topological sort,” or a “topological sort of the

transpose” of the subproblem graph. In other words, no subproblem is

considered until all of the subproblems it depends upon have been

solved. Similarly, using notions that we’ll visit in Section 20.3, you can view the top-down method (with memoization) for dynamic

programming as a “depth-first search” of the subproblem graph.

The size of the subproblem graph G = ( V, E) can help you determine the running time of the dynamic-programming algorithm. Since you

solve each subproblem just once, the running time is the sum of the

times needed to solve each subproblem. Typically, the time to compute

the solution to a subproblem is proportional to the degree (number of

outgoing edges) of the corresponding vertex in the subproblem graph,

and the number of subproblems is equal to the number of vertices in the

subproblem graph. In this common case, the running time of dynamic

programming is linear in the number of vertices and edges.

Reconstructing a solution

The procedures MEMOIZED-CUT-ROD and BOTTOM-UP-CUT-

ROD return the value of an optimal solution to the rod-cutting

problem, but they do not return the solution itself: a list of piece sizes.

Let’s see how to extend the dynamic-programming approach to

record not only the optimal value computed for each subproblem, but

also a choice that led to the optimal value. With this information, you

can readily print an optimal solution. The procedure EXTENDED-

BOTTOM-UP-CUT-ROD on the next page computes, for each rod size

j, not only the maximum revenue rj, but also sj, the optimal size of the first piece to cut off. It’s similar to BOTTOM-UP-CUT-ROD, except

that it creates the array s in line 1, and it updates s[ j] in line 8 to hold the optimal size i of the first piece to cut off when solving a subproblem of

size j.

The procedure PRINT-CUT-ROD-SOLUTION on the following

page takes as input an array p[1 : n] of prices and a rod size n. It calls EXTENDED-BOTTOM-UP-CUT-ROD to compute the array s[1 : n]

of optimal first-piece sizes. Then it prints out the complete list of piece

sizes in an optimal decomposition of a rod of length n. For the sample

price chart appearing in Figure 14.1, the call EXTENDED-BOTTOM-

UP-CUT-ROD( p, 10) returns the following arrays:

 i    0  1  2  3  4  5  6  7  8  9 10
r[i]  0  1  5  8 10 13 17 18 22 25 30
s[i]     1  2  3  2  2  6  1  2  3 10

A call to PRINT-CUT-ROD-SOLUTION( p, 10) prints just 10, but a

call with n = 7 prints the cuts 1 and 6, which correspond to the first optimal decomposition for r7 given earlier.

EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
1  let r[0 : n] and s[1 : n] be new arrays
2  r[0] = 0
3  for j = 1 to n  // for increasing rod length j
4      q = −∞
5      for i = 1 to j  // i is the position of the first cut
6          if q < p[i] + r[j − i]
7              q = p[i] + r[j − i]
8              s[j] = i  // best cut location so far for length j
9      r[j] = q  // remember the solution value for length j
10 return r and s

PRINT-CUT-ROD-SOLUTION(p, n)
1  (r, s) = EXTENDED-BOTTOM-UP-CUT-ROD(p, n)
2  while n > 0
3      print s[n]  // cut location for length n
4      n = n − s[n]  // length of the remainder of the rod
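A Python sketch of both procedures; s[j] records the optimal size of the first piece for a rod of length j, and the printing procedure is written here to return the list of pieces instead of printing, so the result is easy to inspect:

```python
# Extended bottom-up rod cutting: also record the optimal first cut s[j].
def extended_bottom_up_cut_rod(p, n):
    r = [0] * (n + 1)
    s = [0] * (n + 1)
    for j in range(1, n + 1):          # increasing rod length j
        q = float("-inf")
        for i in range(1, j + 1):      # i is the position of the first cut
            if q < p[i] + r[j - i]:
                q = p[i] + r[j - i]
                s[j] = i               # best first cut so far for length j
        r[j] = q
    return r, s

def cut_rod_solution(p, n):
    """Return the list of piece sizes in an optimal decomposition."""
    r, s = extended_bottom_up_cut_rod(p, n)
    pieces = []
    while n > 0:
        pieces.append(s[n])            # size of the first piece
        n -= s[n]                      # length of the remainder of the rod
    return pieces

p = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]  # Figure 14.1 prices (assumed)
print(cut_rod_solution(p, 7))   # [1, 6]
print(cut_rod_solution(p, 10))  # [10]
```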

Exercises

14.1-1

Show that equation (14.4) follows from equation (14.3) and the initial

condition T(0) = 1.

14.1-2

Show, by means of a counterexample, that the following “greedy”

strategy does not always determine an optimal way to cut rods. Define

the density of a rod of length i to be pi/i, that is, its value per inch. The greedy strategy for a rod of length n cuts off a first piece of length i, where 1 ≤ i ≤ n, having maximum density. It then continues by applying the greedy strategy to the remaining piece of length n − i.

14.1-3

Consider a modification of the rod-cutting problem in which, in

addition to a price pi for each rod, each cut incurs a fixed cost of c. The revenue associated with a solution is now the sum of the prices of the

pieces minus the costs of making the cuts. Give a dynamic-programming

algorithm to solve this modified problem.

14.1-4

Modify CUT-ROD and MEMOIZED-CUT-ROD-AUX so that their

for loops go up to only ⌊n/2⌋, rather than up to n. What other changes

to the procedures do you need to make? How are their running times

affected?

14.1-5


Modify MEMOIZED-CUT-ROD to return not only the value but the

actual solution.

14.1-6

The Fibonacci numbers are defined by recurrence (3.31) on page 69.

Give an O(n)-time dynamic-programming algorithm to compute the nth Fibonacci number. Draw the subproblem graph. How many vertices

and edges does the graph contain?

14.2 Matrix-chain multiplication

Our next example of dynamic programming is an algorithm that solves

the problem of matrix-chain multiplication. Given a sequence (chain)

〈A1, A2, …, An〉 of n matrices to be multiplied, where the matrices aren’t necessarily square, the goal is to compute the product

A1 A2 ⋯ An,   (14.5)

using the standard algorithm3 for multiplying rectangular matrices, which we’ll see in a moment, while minimizing the number of scalar multiplications.

You can evaluate the expression (14.5) using the algorithm for

multiplying pairs of rectangular matrices as a subroutine once you have

parenthesized it to resolve all ambiguities in how the matrices are

multiplied together. Matrix multiplication is associative, and so all

parenthesizations yield the same product. A product of matrices is fully

parenthesized if it is either a single matrix or the product of two fully parenthesized matrix products, surrounded by parentheses. For

example, if the chain of matrices is 〈A1, A2, A3, A4〉, then you can fully parenthesize the product A1 A2 A3 A4 in five distinct ways:

(A1 (A2 (A3 A4))),
(A1 ((A2 A3) A4)),
((A1 A2) (A3 A4)),
((A1 (A2 A3)) A4),
(((A1 A2) A3) A4).

How you parenthesize a chain of matrices can have a dramatic

impact on the cost of evaluating the product. Consider first the cost of

multiplying two rectangular matrices. The standard algorithm is given

by the procedure RECTANGULAR-MATRIX-MULTIPLY, which

generalizes the square-matrix multiplication procedure MATRIX-

MULTIPLY on page 81. The RECTANGULAR-MATRIX-

MULTIPLY procedure computes C = C + A · B for three matrices A =

( aij), B = ( bij), and C = ( cij), where A is p × q, B is q × r, and C is p × r.

RECTANGULAR-MATRIX-MULTIPLY(A, B, C, p, q, r)
1  for i = 1 to p
2      for j = 1 to r
3          for k = 1 to q
4              cij = cij + aik · bkj
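A Python sketch of RECTANGULAR-MATRIX-MULTIPLY, using plain nested lists and 0-based indices:

```python
# C = C + A * B, where A is p x q, B is q x r, and C is p x r.
def rectangular_matrix_multiply(A, B, C, p, q, r):
    for i in range(p):
        for j in range(r):
            for k in range(q):          # pqr scalar multiplications in all
                C[i][j] += A[i][k] * B[k][j]

A = [[1, 2], [3, 4]]          # 2 x 2
B = [[5, 6, 7], [8, 9, 10]]   # 2 x 3
C = [[0] * 3 for _ in range(2)]
rectangular_matrix_multiply(A, B, C, 2, 2, 3)
print(C)  # [[21, 24, 27], [47, 54, 61]]
```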

The running time of RECTANGULAR-MATRIX-MULTIPLY is

dominated by the number of scalar multiplications in line 4, which is

pqr. Therefore, we’ll consider the cost of multiplying matrices to be the

number of scalar multiplications. (The number of scalar multiplications

dominates even if we consider initializing C = 0 to perform just C = A · B.)

To illustrate the different costs incurred by different parenthesizations of a matrix product, consider the problem of a chain 〈A1, A2, A3〉 of three matrices. Suppose that the dimensions of the matrices are 10 × 100, 100 × 5, and 5 × 50, respectively. Multiplying

according to the parenthesization (( A 1 A 2) A 3) performs 10 · 100 · 5 =

5000 scalar multiplications to compute the 10 × 5 matrix product A 1 A 2, plus another 10 · 5 · 50 = 2500 scalar multiplications to multiply this

matrix by A 3, for a total of 7500 scalar multiplications. Multiplying according to the alternative parenthesization ( A 1( A 2 A 3)) performs 100 ·

5 · 50 = 25,000 scalar multiplications to compute the 100 × 50 matrix


product A 2 A 3, plus another 10 · 100 · 50 = 50,000 scalar multiplications to multiply A 1 by this matrix, for a total of 75,000 scalar

multiplications. Thus, computing the product according to the first

parenthesization is 10 times faster.
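The arithmetic above is easy to check directly from the dimension sequence:

```python
# Costs of the two parenthesizations of A1 A2 A3 with dimensions
# 10x100, 100x5, 5x50, i.e. dimension sequence p = <10, 100, 5, 50>.
p = [10, 100, 5, 50]
cost_left = p[0] * p[1] * p[2] + p[0] * p[2] * p[3]   # ((A1 A2) A3)
cost_right = p[1] * p[2] * p[3] + p[0] * p[1] * p[3]  # (A1 (A2 A3))
print(cost_left, cost_right)  # 7500 75000
```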

We state the matrix-chain multiplication problem as follows: given a

chain 〈 A 1, A 2, …, An〉 of n matrices, where for i = 1, 2, …, n, matrix Ai has dimension pi−1 × pi, fully parenthesize the product A 1 A 2 ⋯ An in a way that minimizes the number of scalar multiplications. The input is

the sequence of dimensions 〈 p 0, p 1, p 2, …, pn〉.

The matrix-chain multiplication problem does not entail actually

multiplying matrices. The goal is only to determine an order for

multiplying matrices that has the lowest cost. Typically, the time

invested in determining this optimal order is more than paid for by the

time saved later on when actually performing the matrix multiplications

(such as performing only 7500 scalar multiplications instead of 75,000).

Counting the number of parenthesizations

Before solving the matrix-chain multiplication problem by dynamic

programming, let us convince ourselves that exhaustively checking all

possible parenthesizations is not an efficient algorithm. Denote the

number of alternative parenthesizations of a sequence of n matrices by

P( n). When n = 1, the sequence consists of just one matrix, and therefore there is only one way to fully parenthesize the matrix product.

When n ≥ 2, a fully parenthesized matrix product is the product of two

fully parenthesized matrix subproducts, and the split between the two

subproducts may occur between the k th and ( k + 1)st matrices for any k

= 1, 2, …, n − 1. Thus, we obtain the recurrence

P(n) = 1                                    if n = 1,
P(n) = Σ_{k=1}^{n−1} P(k) P(n − k)          if n ≥ 2.   (14.6)

Problem 12-4 on page 329 asked you to show that the solution to a similar recurrence is the sequence of Catalan numbers, which grows as Ω(4^n / n^{3/2}). A simpler exercise (see Exercise 14.2-3) is to show that the solution to the recurrence (14.6) is Ω(2^n). The number of solutions is thus exponential in n, and the brute-force method of exhaustive search

makes for a poor strategy when determining how to optimally

parenthesize a matrix chain.
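The recurrence (14.6) can be evaluated directly for small n, and the values it produces are the Catalan numbers:

```python
# Number of parenthesizations P(n), computed straight from recurrence (14.6).
def P(n):
    if n == 1:
        return 1
    # Split between the k-th and (k+1)st matrices, for each k.
    return sum(P(k) * P(n - k) for k in range(1, n))

print([P(n) for n in range(1, 8)])  # [1, 1, 2, 5, 14, 42, 132]
```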

Applying dynamic programming

Let’s use the dynamic-programming method to determine how to

optimally parenthesize a matrix chain, by following the four-step

sequence that we stated at the beginning of this chapter:

1. Characterize the structure of an optimal solution.

2. Recursively define the value of an optimal solution.

3. Compute the value of an optimal solution.

4. Construct an optimal solution from computed information.

We’ll go through these steps in order, demonstrating how to apply each

step to the problem.

Step 1: The structure of an optimal parenthesization

In the first step of the dynamic-programming method, you find the

optimal substructure and then use it to construct an optimal solution to

the problem from optimal solutions to subproblems. To perform this

step for the matrix-chain multiplication problem, it’s convenient to first

introduce some notation. Let Ai: j, where i ≤ j, denote the matrix that results from evaluating the product AiAi+1 ⋯ Aj. If the problem is nontrivial, that is, i < j, then to parenthesize the product AiAi+1 ⋯ Aj, the product must split between Ak and Ak+1 for some integer k in the range i ≤ k < j. That is, for some value of k, first compute the matrices Ai: k and Ak+1: j, and then multiply them together to produce the final product Ai: j. The cost of parenthesizing this way is the cost of

computing the matrix Ai: k, plus the cost of computing Ak+1: j, plus the cost of multiplying them together.

The optimal substructure of this problem is as follows. Suppose that

to optimally parenthesize AiAi+1 ⋯ Aj, you split the product between

Ak and Ak+1. Then the way you parenthesize the “prefix” subchain AiAi+1 ⋯ Ak within this optimal parenthesization of AiAi+1 ⋯ Aj must be an optimal parenthesization of AiAi+1 ⋯ Ak. Why? If there were a less costly way to parenthesize AiAi+1 ⋯ Ak, then you could substitute that parenthesization in the optimal parenthesization of

AiAi+1 ⋯ Aj to produce another way to parenthesize AiAi+1 ⋯ Aj whose cost is lower than the optimum: a contradiction. A similar

observation holds for how to parenthesize the subchain Ak+1 Ak+2 ⋯

Aj in the optimal parenthesization of AiAi+1 ⋯ Aj: it must be an optimal parenthesization of Ak+1 Ak+2 ⋯ Aj.

Now let’s use the optimal substructure to show how to construct an

optimal solution to the problem from optimal solutions to subproblems.

Any solution to a nontrivial instance of the matrix-chain multiplication

problem requires splitting the product, and any optimal solution

contains within it optimal solutions to subproblem instances. Thus, to

build an optimal solution to an instance of the matrix-chain

multiplication problem, split the problem into two subproblems

(optimally parenthesizing AiAi+1 ⋯ Ak and Ak+1 Ak+2 ⋯ Aj), find optimal solutions to the two subproblem instances, and then combine

these optimal subproblem solutions. To ensure that you’ve examined the

optimal split, you must consider all possible splits.

Step 2: A recursive solution

The next step is to define the cost of an optimal solution recursively in

terms of the optimal solutions to subproblems. For the matrix-chain

multiplication problem, a subproblem is to determine the minimum cost of parenthesizing AiAi+1 ⋯ Aj for 1 ≤ i ≤ j ≤ n. Given the input dimensions 〈p0, p1, p2, …, pn〉, an index pair i, j specifies a subproblem. Let m[i, j] be the minimum number of scalar multiplications needed to compute the matrix Ai: j. For the full problem, the lowest-cost way to compute A1: n is thus m[1, n].

We can define m[ i, j] recursively as follows. If i = j, the problem is trivial: the chain consists of just one matrix Ai: i = Ai, so that no scalar multiplications are necessary to compute the product. Thus, m[ i, i] = 0

for i = 1, 2, …, n. To compute m[ i, j] when i < j, we take advantage of the structure of an optimal solution from step 1. Suppose that an

optimal parenthesization splits the product AiAi+1 ⋯ Aj between Ak and Ak+1, where i ≤ k < j. Then, m[i, j] equals the minimum cost m[i, k]

for computing the subproduct Ai: k, plus the minimum cost m[ k+1, j] for computing the subproduct, Ak+1: j, plus the cost of multiplying these

two matrices together. Because each matrix Ai is pi−1 × pi, computing the matrix product Ai: kAk+1: j takes pi−1 pk pj scalar multiplications.

Thus, we obtain

m[ i, j] = m[ i, k] + m[ k + 1, j] + pi−1 pk pj.

This recursive equation assumes that you know the value of k. But

you don’t, at least not yet. You have to try all possible values of k. How

many are there? Just j − i, namely k = i, i + 1, …, j − 1. Since the optimal parenthesization must use one of these values for k, you need only check them all to find the best. Thus, the recursive definition for the minimum cost of parenthesizing the product AiAi+1 ⋯ Aj becomes

m[i, j] = 0                                                      if i = j,
m[i, j] = min {m[i, k] + m[k + 1, j] + pi−1 pk pj : i ≤ k < j}   if i < j.   (14.7)

The m[ i, j] values give the costs of optimal solutions to subproblems, but they do not provide all the information you need to construct an

optimal solution. To help you do so, let’s define s[ i, j] to be a value of k at which you split the product AiAi+1 ⋯ Aj in an optimal


parenthesization. That is, s[ i, j] equals a value k such that m[ i, j] = m[ i, k]

+ m[ k + 1, j] + pi−1 pk pj.

Step 3: Computing the optimal costs

At this point, you could write a recursive algorithm based on recurrence

(14.7) to compute the minimum cost m[1, n] for multiplying A 1 A 2 ⋯

An. But as we saw for the rod-cutting problem, and as we shall see in

Section 14.3, this recursive algorithm takes exponential time. That’s no better than the brute-force method of checking each way of

parenthesizing the product.

Fortunately, there aren’t all that many distinct subproblems: just one

subproblem for each choice of i and j satisfying 1 ≤ i ≤ j ≤ n, or (n choose 2) + n = Θ(n^2) in all.4 A recursive algorithm may encounter each

subproblem many times in different branches of its recursion tree. This

property of overlapping subproblems is the second hallmark of when

dynamic programming applies (the first hallmark being optimal

substructure).

Instead of computing the solution to recurrence (14.7) recursively,

let’s compute the optimal cost by using a tabular, bottom-up approach,

as in the procedure MATRIX-CHAIN-ORDER. (The corresponding

top-down approach using memoization appears in Section 14.3. ) The input is a sequence p = 〈 p 0, p 1, …, pn〉 of matrix dimensions, along with n, so that for i = 1, 2, …, n, matrix Ai has dimensions pi−1 × pi. The procedure uses an auxiliary table m[1 : n, 1 : n] to store the m[ i, j] costs and another auxiliary table s[1 : n − 1, 2 : n] that records which index k achieved the optimal cost in computing m[ i, j]. The table s will help in constructing an optimal solution.

MATRIX-CHAIN-ORDER(p, n)
1  let m[1 : n, 1 : n] and s[1 : n − 1, 2 : n] be new tables
2  for i = 1 to n  // chain length 1
3      m[i, i] = 0
4  for l = 2 to n  // l is the chain length
5      for i = 1 to n − l + 1  // chain begins at Ai
6          j = i + l − 1  // chain ends at Aj
7          m[i, j] = ∞
8          for k = i to j − 1  // try Ai: k Ak+1: j
9              q = m[i, k] + m[k + 1, j] + pi−1 pk pj
10             if q < m[i, j]
11                 m[i, j] = q  // remember this cost
12                 s[i, j] = k  // remember this index
13 return m and s
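A Python sketch of MATRIX-CHAIN-ORDER; m and s are dictionaries keyed by (i, j) with 1-based indices to match the pseudocode, and p is the dimension sequence 〈p0, p1, …, pn〉:

```python
import math

# Bottom-up matrix-chain ordering: fill m and s for increasing chain length.
def matrix_chain_order(p, n):
    m = {}
    s = {}
    for i in range(1, n + 1):
        m[i, i] = 0                    # a chain of one matrix costs nothing
    for l in range(2, n + 1):          # l is the chain length
        for i in range(1, n - l + 2):  # chain begins at A_i
            j = i + l - 1              # chain ends at A_j
            m[i, j] = math.inf
            for k in range(i, j):      # try splitting between A_k and A_k+1
                q = m[i, k] + m[k + 1, j] + p[i - 1] * p[k] * p[j]
                if q < m[i, j]:
                    m[i, j] = q        # remember this cost
                    s[i, j] = k        # remember this index
    return m, s

p = [10, 100, 5, 50]                   # A1: 10x100, A2: 100x5, A3: 5x50
m, s = matrix_chain_order(p, 3)
print(m[1, 3], s[1, 3])  # 7500 2
```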

In what order should the algorithm fill in the table entries? To answer

this question, let’s see which entries of the table need to be accessed

when computing the cost m[i, j]. Equation (14.7) tells us that to compute the cost of matrix product Ai: j, first the costs of the products Ai: k and Ak+1: j need to have been computed for all k = i, i + 1, …, j − 1. The chain AiAi+1 ⋯ Aj consists of j − i + 1 matrices, and the chains AiAi+1 ⋯ Ak and Ak+1 Ak+2 ⋯ Aj consist of k − i + 1 and j − k matrices, respectively. Since k < j, a chain of k − i + 1 matrices consists of fewer than j − i + 1 matrices. Likewise, since k ≥ i, a chain of j − k matrices consists of fewer than j − i + 1 matrices. Thus, the algorithm should fill in the table m from shorter matrix chains to longer matrix chains. That

is, for the subproblem of optimally parenthesizing the chain AiAi+1 ⋯

Aj, it makes sense to consider the subproblem size as the length ji + 1

of the chain.

Now, let’s see how the MATRIX-CHAIN-ORDER procedure fills in

the m[ i, j] entries in order of increasing chain length. Lines 2–3 initialize m[ i, i] = 0 for i = 1, 2, …, n, since any matrix chain with just one matrix requires no scalar multiplications. In the for loop of lines 4–12, the loop

variable l denotes the length of matrix chains whose minimum costs are

being computed. Each iteration of this loop uses recurrence (14.7) to

compute m[i, i + l − 1] for i = 1, 2, …, n − l + 1. In the first iteration, l =

2, and so the loop computes m[ i, i + 1] for i = 1, 2, …, n − 1: the minimum costs for chains of length l = 2. The second time through the

loop, it computes m[ i, i + 2] for i = 1, 2, …, n − 2: the minimum costs for chains of length l = 3. And so on, ending with a single matrix chain

of length l = n and computing m[1, n]. When lines 7–12 compute an m[ i, j] cost, this cost depends only on table entries m[ i, k] and m[ k + 1, j], which have already been computed.

Figure 14.5 illustrates the m and s tables, as filled in by the MATRIX-CHAIN-ORDER procedure on a chain of n = 6 matrices.

Since m[i, j] is defined only for i ≤ j, only the portion of the table m on or above the main diagonal is used. The figure shows the table rotated to

make the main diagonal run horizontally. The matrix chain is listed

along the bottom. Using this layout, the minimum cost m[ i, j] for multiplying a subchain AiAi+1 ⋯ Aj of matrices appears at the intersection of lines running northeast from Ai and northwest from Aj.

Reading across, each diagonal in the table contains the entries for

matrix chains of the same length. MATRIX-CHAIN-ORDER

computes the rows from bottom to top and from left to right within

each row. It computes each entry m[ i, j] using the products pi−1 pk pj for k = i, i + 1, …, j − 1 and all entries southwest and southeast from m[ i, j].

A simple inspection of the nested loop structure of MATRIX-

CHAIN-ORDER yields a running time of O( n 3) for the algorithm. The

loops are nested three deep, and each loop index ( l, i, and k) takes on at most n − 1 values. Exercise 14.2-5 asks you to show that the running time of this algorithm is in fact also Ω( n 3). The algorithm requires Θ( n 2) space to store the m and s tables. Thus, MATRIX-CHAIN-ORDER is

much more efficient than the exponential-time method of enumerating

all possible parenthesizations and checking each one.
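To make the procedure concrete, here is a direct Python transcription of MATRIX-CHAIN-ORDER (a sketch, not code from the text: the tables are kept 1-indexed by leaving row and column 0 unused, and ∞ becomes float("inf")).

```python
INF = float("inf")

def matrix_chain_order(p):
    """p[0..n] holds the dimensions: matrix A_i is p[i-1] x p[i]."""
    n = len(p) - 1
    # Row/column 0 of each table is unused, so indices match the 1-based pseudocode.
    m = [[0] * (n + 1) for _ in range(n + 1)]  # m[i][j]: min scalar mults for A_i..A_j
    s = [[0] * (n + 1) for _ in range(n + 1)]  # s[i][j]: index k of an optimal split
    for l in range(2, n + 1):                  # l is the chain length
        for i in range(1, n - l + 2):          # chain begins at A_i
            j = i + l - 1                      # chain ends at A_j
            m[i][j] = INF
            for k in range(i, j):              # try splitting between A_k and A_k+1
                q = m[i][k] + m[k + 1][j] + p[i - 1] * p[k] * p[j]
                if q < m[i][j]:
                    m[i][j] = q                # remember this cost
                    s[i][j] = k                # remember this index
    return m, s
```

With the dimensions of Figure 14.5, p = [30, 35, 15, 5, 10, 20, 25], this yields m[1][6] = 15,125, matching the figure.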


Figure 14.5 The m and s tables computed by MATRIX-CHAIN-ORDER for n = 6 and the following matrix dimensions:

matrix       A1        A2        A3       A4       A5        A6
dimension    30 × 35   35 × 15   15 × 5   5 × 10   10 × 20   20 × 25

The tables are rotated so that the main diagonal runs horizontally. The m table uses only the main diagonal and upper triangle, and the s table uses only the upper triangle. The minimum number of scalar multiplications to multiply the 6 matrices is m[1, 6] = 15,125. Of the entries that are not tan, the pairs that have the same color are taken together in line 9 when computing an entry.

Step 4: Constructing an optimal solution

Although MATRIX-CHAIN-ORDER determines the optimal number

of scalar multiplications needed to compute a matrix-chain product, it

does not directly show how to multiply the matrices. The table s[1 : n − 1, 2 : n] provides the information needed to do so. Each entry s[i, j]

records a value of k such that an optimal parenthesization of AiAi+1 ⋯

Aj splits the product between Ak and Ak+1. The final matrix multiplication in computing A 1: n optimally is A 1: s[1, n] As[1, n]+1: n. The s table contains the information needed to determine the earlier matrix

multiplications as well, using recursion: s[1, s[1, n]] determines the last matrix multiplication when computing A 1: s[1, n] and s[ s[1, n] + 1, n]

determines the last matrix multiplication when computing As[1, n]+1: n.

The recursive procedure PRINT-OPTIMAL-PARENS on the facing

page prints an optimal parenthesization of the matrix chain product

AiAi+1 ⋯ Aj, given the s table computed by MATRIX-CHAIN-ORDER and the indices i and j. The initial call PRINT-OPTIMAL-PARENS(s, 1, n) prints an optimal parenthesization of the full matrix chain product A1A2 ⋯ An. In the example of Figure 14.5, the call PRINT-OPTIMAL-PARENS(s, 1, 6) prints the optimal parenthesization ((A1(A2A3))((A4A5)A6)).

PRINT-OPTIMAL-PARENS(s, i, j)
1  if i == j
2      print “A”i
3  else print “(”
4      PRINT-OPTIMAL-PARENS(s, i, s[i, j])
5      PRINT-OPTIMAL-PARENS(s, s[i, j] + 1, j)
6      print “)”
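In Python, the same recursion can return the parenthesization as a string instead of printing it piece by piece. This is a sketch: the s table here is a dictionary holding only the split indices implied by the example's answer ((A1(A2A3))((A4A5)A6)), not a full table computed by MATRIX-CHAIN-ORDER.

```python
def print_optimal_parens(s, i, j):
    """Return the optimal parenthesization of A_i..A_j as a string."""
    if i == j:
        return "A" + str(i)
    k = s[i, j]                      # the split chosen for this subchain
    return ("(" + print_optimal_parens(s, i, k)
            + print_optimal_parens(s, k + 1, j) + ")")

# Split indices implied by the example's answer ((A1(A2A3))((A4A5)A6)):
s = {(1, 6): 3, (1, 3): 1, (2, 3): 2, (4, 6): 5, (4, 5): 4}
```

Calling print_optimal_parens(s, 1, 6) reproduces the parenthesization given in the text.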

Exercises

14.2-1

Find an optimal parenthesization of a matrix-chain product whose

sequence of dimensions is 〈5, 10, 3, 12, 5, 50, 6〉.

14.2-2

Give a recursive algorithm MATRIX-CHAIN-MULTIPLY( A, s, i, j) that actually performs the optimal matrix-chain multiplication, given

the sequence of matrices 〈 A 1, A 2, …, An〉, the s table computed by MATRIX-CHAIN-ORDER, and the indices i and j. (The initial call is

MATRIX-CHAIN-MULTIPLY( A, s, 1, n).) Assume that the call

RECTANGULAR-MATRIX-MULTIPLY( A, B) returns the product

of matrices A and B.


14.2-3

Use the substitution method to show that the solution to the recurrence

(14.6) is Ω(2 n).

14.2-4

Describe the subproblem graph for matrix-chain multiplication with an

input chain of length n. How many vertices does it have? How many

edges does it have, and which edges are they?

14.2-5

Let R(i, j) be the number of times that table entry m[i, j] is referenced while computing other table entries in a call of MATRIX-CHAIN-ORDER. Show that the total number of references for the entire table is (n³ − n)/3. (Hint: You may find equation (A.4) on page 1141 useful.)

14.2-6

Show that a full parenthesization of an n-element expression has exactly

n − 1 pairs of parentheses.

14.3 Elements of dynamic programming

Although you have just seen two complete examples of the dynamic-

programming method, you might still be wondering just when the

method applies. From an engineering perspective, when should you look

for a dynamic-programming solution to a problem? In this section, we’ll

examine the two key ingredients that an optimization problem must

have in order for dynamic programming to apply: optimal substructure

and overlapping subproblems. We’ll also revisit and discuss more fully

how memoization might help you take advantage of the overlapping-

subproblems property in a top-down recursive approach.

Optimal substructure

The first step in solving an optimization problem by dynamic

programming is to characterize the structure of an optimal solution.

Recall that a problem exhibits optimal substructure if an optimal

solution to the problem contains within it optimal solutions to

subproblems. When a problem exhibits optimal substructure, that gives

you a good clue that dynamic programming might apply. (As Chapter

15 discusses, it also might mean that a greedy strategy applies, however.)

Dynamic programming builds an optimal solution to the problem from

optimal solutions to subproblems. Consequently, you must take care to

ensure that the range of subproblems you consider includes those used

in an optimal solution.

Optimal substructure was key to solving both of the previous

problems in this chapter. In Section 14.1, we observed that the optimal way of cutting up a rod of length n (if Serling Enterprises makes any cuts at all) involves optimally cutting up the two pieces resulting from

the first cut. In Section 14.2, we noted that an optimal parenthesization of the matrix chain product AiAi+1 ⋯ Aj that splits the product between Ak and Ak+1 contains within it optimal solutions to the problems of parenthesizing AiAi+1 ⋯ Ak and Ak+1 Ak+2 ⋯ Aj.

You will find yourself following a common pattern in discovering

optimal substructure:

1. You show that a solution to the problem consists of making a

choice, such as choosing an initial cut in a rod or choosing an

index at which to split the matrix chain. Making this choice

leaves one or more subproblems to be solved.

2. You suppose that for a given problem, you are given the choice

that leads to an optimal solution. You do not concern yourself

yet with how to determine this choice. You just assume that it

has been given to you.

3. Given this choice, you determine which subproblems ensue and

how to best characterize the resulting space of subproblems.

4. You show that the solutions to the subproblems used within an

optimal solution to the problem must themselves be optimal by

using a “cut-and-paste” technique. You do so by supposing that

each of the subproblem solutions is not optimal and then

deriving a contradiction. In particular, by “cutting out” the

nonoptimal solution to each subproblem and “pasting in” the

optimal one, you show that you can get a better solution to the

original problem, thus contradicting your supposition that you

already had an optimal solution. If an optimal solution gives rise

to more than one subproblem, they are typically so similar that

you can modify the cut-and-paste argument for one to apply to

the others with little effort.

To characterize the space of subproblems, a good rule of thumb says

to try to keep the space as simple as possible and then expand it as

necessary. For example, the space of subproblems for the rod-cutting

problem contained the problems of optimally cutting up a rod of length

i for each size i. This subproblem space worked well, and it was not necessary to try a more general space of subproblems.

Conversely, suppose that you tried to constrain the subproblem

space for matrix-chain multiplication to matrix products of the form

A 1 A 2 ⋯ Aj. As before, an optimal parenthesization must split this product between Ak and Ak+1 for some 1 ≤ k < j. Unless you can guarantee that k always equals j − 1, you will find that you have subproblems of the form A 1 A 2 ⋯ Ak and Ak+1 Ak+2 ⋯ Aj. Moreover, the latter subproblem does not have the form A 1 A 2 ⋯ Aj. To solve this problem by dynamic programming, you need to allow the subproblems

to vary at “both ends.” That is, both i and j need to vary in the subproblem of parenthesizing the product AiAi+1 ⋯ Aj.

Optimal substructure varies across problem domains in two ways:

1. how many subproblems an optimal solution to the original

problem uses, and

2. how many choices you have in determining which subproblem(s)

to use in an optimal solution.

In the rod-cutting problem, an optimal solution for cutting up a rod of size n uses just one subproblem (of size n − i), but we have to consider n choices for i in order to determine which one yields an optimal solution. Matrix-chain multiplication for the subchain AiAi+1 ⋯ Aj serves as an example with two subproblems and j − i choices. For a given matrix Ak at which the product splits, two subproblems arise—parenthesizing AiAi+1 ⋯ Ak and parenthesizing Ak+1Ak+2 ⋯ Aj—and we have to solve both of them optimally. Once we determine the optimal solutions to subproblems, we choose from among j − i candidates for the index k.

Informally, the running time of a dynamic-programming algorithm

depends on the product of two factors: the number of subproblems

overall and how many choices you look at for each subproblem. In rod

cutting, we had Θ(n) subproblems overall, and at most n choices to examine for each, yielding an O(n²) running time. Matrix-chain multiplication had Θ(n²) subproblems overall, and each had at most n − 1 choices, giving an O(n³) running time (actually, a Θ(n³) running time, by Exercise 14.2-5).

Usually, the subproblem graph gives an alternative way to perform

the same analysis. Each vertex corresponds to a subproblem, and the

choices for a subproblem are the edges incident from that subproblem.

Recall that in rod cutting, the subproblem graph has n vertices and at

most n edges per vertex, yielding an O( n 2) running time. For matrix-chain multiplication, if you were to draw the subproblem graph, it

would have Θ(n²) vertices, and each vertex would have degree at most n − 1, giving a total of O(n³) vertices and edges.

Dynamic programming often uses optimal substructure in a bottom-

up fashion. That is, you first find optimal solutions to subproblems and,

having solved the subproblems, you find an optimal solution to the

problem. Finding an optimal solution to the problem entails making a

choice among subproblems as to which you will use in solving the

problem. The cost of the problem solution is usually the subproblem

costs plus a cost that is directly attributable to the choice itself. In rod

cutting, for example, first we solved the subproblems of determining

optimal ways to cut up rods of length i for i = 0, 1, …, n − 1, and then we determined which of these subproblems yielded an optimal solution

for a rod of length n, using equation (14.2). The cost attributable to the

choice itself is the term pi in equation (14.2). In matrix-chain

multiplication, we determined optimal parenthesizations of subchains

of AiAi+1 ⋯ Aj, and then we chose the matrix Ak at which to split the product. The cost attributable to the choice itself is the term pi−1 pk pj.

Chapter 15 explores “greedy algorithms,” which have many

similarities to dynamic programming. In particular, problems to which

greedy algorithms apply have optimal substructure. One major

difference between greedy algorithms and dynamic programming is that

instead of first finding optimal solutions to subproblems and then

making an informed choice, greedy algorithms first make a “greedy”

choice—the choice that looks best at the time—and then solve a

resulting subproblem, without bothering to solve all possible related

smaller subproblems. Surprisingly, in some cases this strategy works!

Subtleties

You should be careful not to assume that optimal substructure applies

when it does not. Consider the following two problems whose input

consists of a directed graph G = (V, E) and vertices u, v ∈ V.

Unweighted shortest path:5 Find a path from u to v consisting of the fewest edges. Such a path must be simple, since removing a cycle from

a path produces a path with fewer edges.

Unweighted longest simple path: Find a simple path from u to v

consisting of the most edges. (Without the requirement that the path

must be simple, the problem is undefined, since repeatedly traversing a

cycle creates paths with an arbitrarily large number of edges.)

The unweighted shortest-path problem exhibits optimal

substructure. Here’s how. Suppose that u ≠ v, so that the problem is nontrivial. Then, any path p from u to v must contain an intermediate vertex, say w. (Note that w may be u or v.) Then, we can decompose the path p into subpaths p1, from u to w, and p2, from w to v. The number of edges in p equals the number of edges in p1 plus the number of edges in p2. We claim that if p is an optimal (i.e., shortest) path from u to v, then p1 must be a shortest path from u to w. Why? As suggested earlier, use a “cut-and-paste” argument: if there were another path, say p′1, from u to w with fewer edges than p1, then we could cut out p1 and paste in p′1 to produce a path from u to v with fewer edges than p, thus contradicting p’s optimality. Likewise, p2 must be a shortest path from w to v. Thus, to find a shortest path from u to v, consider all intermediate vertices w, find a shortest path from u to w and a shortest path from w to v, and choose an intermediate vertex w that yields the overall shortest path.

Section 23.2 uses a variant of this observation of optimal substructure to find a shortest path between every pair of vertices on a weighted,

directed graph.

You might be tempted to assume that the problem of finding an

unweighted longest simple path exhibits optimal substructure as well.

After all, if we decompose a longest simple path from u to v into subpaths p1, from u to w, and p2, from w to v, then mustn’t p1 be a longest simple path from u to w, and mustn’t p2 be a longest simple path from w to v? The answer is no! Figure 14.6 supplies an example. Consider the path q → r → t, which is a longest simple path from q to t. Is q → r a longest simple path from q to r? No, for the path q → s → t → r is a simple path that is longer. Is r → t a longest simple path from r to t? No again, for the path r → q → s → t is a simple path that is longer.

Figure 14.6 A directed graph showing that the problem of finding a longest simple path in an unweighted directed graph does not have optimal substructure. The path q → r → t is a longest simple path from q to t, but the subpath q → r is not a longest simple path from q to r, nor is the subpath r → t a longest simple path from r to t.

This example shows that for longest simple paths, not only does the

problem lack optimal substructure, but you cannot necessarily assemble

a “legal” solution to the problem from solutions to subproblems. If you

combine the longest simple paths q → s → t → r and r → q → s → t, you get the path q → s → t → r → q → s → t, which is not simple.

Indeed, the problem of finding an unweighted longest simple path does

not appear to have any sort of optimal substructure. No efficient

dynamic-programming algorithm for this problem has ever been found.

In fact, this problem is NP-complete, which—as we shall see in Chapter

34—means that we are unlikely to find a way to solve it in polynomial

time.
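The counterexample is small enough to check by brute force. The sketch below assumes an edge set for the graph of Figure 14.6 that is consistent with every path named in the text (the figure itself is not reproduced here, so the exact edges are an assumption), and enumerates simple paths by depth-first search.

```python
# A small directed graph consistent with all the paths named in the text.
# (Assumed reconstruction of Figure 14.6's edge set.)
EDGES = {("q", "r"), ("r", "q"), ("q", "s"), ("s", "t"), ("t", "r"), ("r", "t")}

def simple_paths(u, v):
    """All simple paths from u to v, found by brute-force depth-first search."""
    found = []
    def dfs(node, path):
        if node == v:                      # a simple path to v ends at v
            found.append(tuple(path))
            return
        for a, b in EDGES:
            if a == node and b not in path:  # extend only to unvisited vertices
                dfs(b, path + [b])
    dfs(u, [u])
    return found

def longest_simple(u, v):
    """A longest simple path from u to v (any one, if there are ties)."""
    return max(simple_paths(u, v), key=len)
```

The longest simple paths from q to r and from r to t each have 3 edges, yet a longest simple path from q to t has only 2: the two subproblem "optima" cannot be spliced into a simple path, as the discussion above explains.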

Why is the substructure of a longest simple path so different from

that of a shortest path? Although a solution to a problem for both

longest and shortest paths uses two subproblems, the subproblems in

finding the longest simple path are not independent, whereas for shortest

paths they are. What do we mean by subproblems being independent?

We mean that the solution to one subproblem does not affect the

solution to another subproblem of the same problem. For the example

of Figure 14.6, we have the problem of finding a longest simple path from q to t with two subproblems: finding longest simple paths from q to r and from r to t. For the first of these subproblems, we chose the path q → s → t → r, which used the vertices s and t. These vertices cannot appear in a solution to the second subproblem, since the

combination of the two solutions to subproblems yields a path that is

not simple. If vertex t cannot be in the solution to the second problem,

then there is no way to solve it, since t is required to be on the path that

forms the solution, and it is not the vertex where the subproblem

solutions are “spliced” together (that vertex being r). Because vertices s

and t appear in one subproblem solution, they cannot appear in the

other subproblem solution. One of them must be in the solution to the

other subproblem, however, and an optimal solution requires both.

Thus, we say that these subproblems are not independent. Looked at

another way, using resources in solving one subproblem (those resources

being vertices) renders them unavailable for the other subproblem.

Why, then, are the subproblems independent for finding a shortest

path? The answer is that by nature, the subproblems do not share

resources. We claim that if a vertex w is on a shortest path p from u to v, then we can splice together any shortest path p1 from u to w and any shortest path p2 from w to v to produce a shortest path from u to v. We are assured that,

other than w, no vertex can appear in both paths p 1 and p 2. Why?

Suppose that some vertex x ≠ w appears in both p1 and p2, so that we can decompose p1 as u ⇝ x ⇝ w and p2 as w ⇝ x ⇝ v. By the optimal substructure of this problem, path p has as many edges as p1 and p2 together. Let’s say that p has e edges. Now let us construct a path p′ from u to v by following p1 from u to x and then following p2 from x to v. Because we have excised the paths from x to w and from w to x, each of which contains at least one edge, path p′ contains at most e − 2 edges, which contradicts the assumption that p is a shortest path. Thus, we are assured that the subproblems for the

shortest-path problem are independent.

The two problems examined in Sections 14.1 and 14.2 have independent subproblems. In matrix-chain multiplication, the

subproblems are multiplying subchains AiAi+1 ⋯ Ak and Ak+1Ak+2 ⋯ Aj. These subchains are disjoint, so that no matrix could possibly be

included in both of them. In rod cutting, to determine the best way to

cut up a rod of length n, we looked at the best ways of cutting up rods of

length i for i = 0, 1, …, n − 1. Because an optimal solution to the length-n problem includes just one of these subproblem solutions (after cutting

off the first piece), independence of subproblems is not an issue.

Overlapping subproblems

The second ingredient that an optimization problem must have for

dynamic programming to apply is that the space of subproblems must

be “small” in the sense that a recursive algorithm for the problem solves

the same subproblems over and over, rather than always generating new

subproblems. Typically, the total number of distinct subproblems is a

polynomial in the input size. When a recursive algorithm revisits the

same problem repeatedly, we say that the optimization problem has

overlapping subproblems. 6 In contrast, a problem for which a divide-and-conquer approach is suitable usually generates brand-new problems


at each step of the recursion. Dynamic-programming algorithms

typically take advantage of overlapping subproblems by solving each

subproblem once and then storing the solution in a table where it can be

looked up when needed, using constant time per lookup.

Figure 14.7 The recursion tree for the computation of RECURSIVE-MATRIX-CHAIN( p, 1, 4). Each node contains the parameters i and j. The computations performed in a subtree shaded blue are replaced by a single table lookup in MEMOIZED-MATRIX-CHAIN.

In Section 14.1, we briefly examined how a recursive solution to rod cutting makes exponentially many calls to find solutions of smaller

subproblems. The dynamic-programming solution reduces the running

time from the exponential time of the recursive algorithm down to

quadratic time.
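The bottom-up rod-cutting procedure that achieves this quadratic time is short. The sketch below assumes the sample price table from Section 14.1 (the table itself is not reproduced in this passage, so its values are an assumption here); the revenues it produces agree with the values r1, …, r10 computed earlier in the chapter.

```python
def bottom_up_cut_rod(p, n):
    """Bottom-up rod cutting: p[i] is the price of a length-i piece, with
    p[0] = 0. Returns the maximum revenue r_n in Theta(n^2) time."""
    r = [0] * (n + 1)
    for j in range(1, n + 1):          # solve subproblems in increasing size
        q = float("-inf")
        for i in range(1, j + 1):      # first piece cut off has length i
            q = max(q, p[i] + r[j - i])
        r[j] = q
    return r[n]

# Assumed price table from Section 14.1, indexed by length (p[0] unused).
PRICES = [0, 1, 5, 8, 9, 10, 17, 17, 20, 24, 30]
```

Each r[j] is computed once, after all shorter lengths, so the two nested loops give the Θ(n²) bound claimed above.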

To illustrate the overlapping-subproblems property in greater detail,

let’s revisit the matrix-chain multiplication problem. Referring back to

Figure 14.5, observe that MATRIX-CHAIN-ORDER repeatedly looks

up the solution to subproblems in lower rows when solving subproblems

in higher rows. For example, it references entry m[3, 4] four times: during the computations of m[2, 4], m[1, 4], m[3, 5], and m[3, 6]. If the algorithm were to recompute m[3, 4] each time, rather than just looking

it up, the running time would increase dramatically. To see how,

consider the inefficient recursive procedure RECURSIVE-MATRIX-

CHAIN on the facing page, which determines m[ i, j], the minimum number of scalar multiplications needed to compute the matrix-chain

product Ai: j = AiAi+1 ⋯ Aj. The procedure is based directly on the

Image 500

Image 501

recurrence (14.7). Figure 14.7 shows the recursion tree produced by the call RECURSIVE-MATRIX-CHAIN( p, 1, 4). Each node is labeled by

the values of the parameters i and j. Observe that some pairs of values occur many times.

In fact, the time to compute m[1, n] by this recursive procedure is at

least exponential in n. To see why, let T( n) denote the time taken by RECURSIVE-MATRIX-CHAIN

to

compute

an

optimal

parenthesization of a chain of n matrices. Because the execution of lines

1–2 and of lines 6–7 each take at least unit time, as does the

multiplication in line 5, inspection of the procedure yields the recurrence

RECURSIVE-MATRIX-CHAIN(p, i, j)
1  if i == j
2      return 0
3  m[i, j] = ∞
4  for k = i to j − 1
5      q = RECURSIVE-MATRIX-CHAIN(p, i, k)
           + RECURSIVE-MATRIX-CHAIN(p, k + 1, j) + pi−1 pk pj
6      if q < m[i, j]
7          m[i, j] = q
8  return m[i, j]

T(1) ≥ 1,
T(n) ≥ 1 + Σk=1..n−1 (T(k) + T(n − k) + 1)   for n > 1.

Noting that for i = 1, 2, …, n − 1, each term T(i) appears once as T(k) and once as T(n − k), and collecting the n − 1 1s in the summation together with the 1 out front, we can rewrite the recurrence as

T(n) ≥ 2 Σi=1..n−1 T(i) + n.

Let’s prove that T(n) = Ω(2ⁿ) using the substitution method. Specifically, we’ll show that T(n) ≥ 2^(n−1) for all n ≥ 1. For the base case n = 1, the summation is empty, and we get T(1) ≥ 1 = 2⁰. Inductively, for n ≥ 2 we have

T(n) ≥ 2 Σi=1..n−1 2^(i−1) + n
     = 2(2^(n−1) − 1) + n
     = 2ⁿ − 2 + n
     ≥ 2^(n−1),

which completes the proof. Thus, the total amount of work performed

by the call RECURSIVE-MATRIX-CHAIN( p, 1, n) is at least

exponential in n.
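A Python transcription of RECURSIVE-MATRIX-CHAIN makes this easy to run (a sketch: the table m of the pseudocode becomes a local variable, since no values are shared between calls).

```python
def recursive_matrix_chain(p, i, j):
    """Direct transcription of RECURSIVE-MATRIX-CHAIN. Runs in exponential
    time because it re-solves the same subproblems over and over."""
    if i == j:
        return 0
    best = float("inf")
    for k in range(i, j):              # try every split between A_k and A_k+1
        q = (recursive_matrix_chain(p, i, k)
             + recursive_matrix_chain(p, k + 1, j)
             + p[i - 1] * p[k] * p[j])
        if q < best:
            best = q
    return best
```

On the 6-matrix example it still finishes quickly, but the number of calls grows exponentially with the chain length.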

Compare this top-down, recursive algorithm (without memoization)

with the bottom-up dynamic-programming algorithm. The latter is

more efficient because it takes advantage of the overlapping-

subproblems property. Matrix-chain multiplication has only Θ( n 2)

distinct subproblems, and the dynamic-programming algorithm solves

each exactly once. The recursive algorithm, on the other hand, must

solve each subproblem every time it reappears in the recursion tree.

Whenever a recursion tree for the natural recursive solution to a

problem contains the same subproblem repeatedly, and the total

number of distinct subproblems is small, dynamic programming can

improve efficiency, sometimes dramatically.

Reconstructing an optimal solution

As a practical matter, you’ll often want to store in a separate table

which choice you made in each subproblem so that you do not have to

reconstruct this information from the table of costs.

For matrix-chain multiplication, the table s[ i, j] saves a significant amount of work when we need to reconstruct an optimal solution.

Suppose that the MATRIX-CHAIN-ORDER procedure on page 378

did not maintain the s[ i, j] table, so that it filled in only the table m[ i, j]

containing optimal subproblem costs. The procedure chooses from

among j − i possibilities when determining which subproblems to use in an optimal solution to parenthesizing AiAi+1 ⋯ Aj, and j − i is not a constant. Therefore, it would take Θ(j − i) = ω(1) time to reconstruct which subproblems it chose for a solution to a given problem. Because

MATRIX-CHAIN-ORDER stores in s[ i, j] the index of the matrix at which it split the product AiAi+1 ⋯ Aj, the PRINT-OPTIMAL-PARENS procedure on page 381 can look up each choice in O(1) time.

Memoization

As we saw for the rod-cutting problem, there is an alternative approach

to dynamic programming that often offers the efficiency of the bottom-

up dynamic-programming approach while maintaining a top-down

strategy. The idea is to memoize the natural, but inefficient, recursive algorithm. As in the bottom-up approach, you maintain a table with

subproblem solutions, but the control structure for filling in the table is

more like the recursive algorithm.

A memoized recursive algorithm maintains an entry in a table for the

solution to each subproblem. Each table entry initially contains a

special value to indicate that the entry has yet to be filled in. When the

subproblem is first encountered as the recursive algorithm unfolds, its

solution is computed and then stored in the table. Each subsequent

encounter of this subproblem simply looks up the value stored in the

table and returns it. 7

The procedure MEMOIZED-MATRIX-CHAIN is a memoized

version of the procedure RECURSIVE-MATRIX-CHAIN on page

389. Note where it resembles the memoized top-down method on page

369 for the rod-cutting problem.

MEMOIZED-MATRIX-CHAIN(p, n)
1  let m[1 : n, 1 : n] be a new table
2  for i = 1 to n
3      for j = i to n
4          m[i, j] = ∞
5  return LOOKUP-CHAIN(m, p, 1, n)

LOOKUP-CHAIN(m, p, i, j)
1  if m[i, j] < ∞
2      return m[i, j]
3  if i == j
4      m[i, j] = 0
5  else for k = i to j − 1
6      q = LOOKUP-CHAIN(m, p, i, k)
           + LOOKUP-CHAIN(m, p, k + 1, j) + pi−1 pk pj
7      if q < m[i, j]
8          m[i, j] = q
9  return m[i, j]

The MEMOIZED-MATRIX-CHAIN procedure, like the bottom-

up MATRIX-CHAIN-ORDER procedure on page 378, maintains a

table m[1 : n, 1 : n] of computed values of m[ i, j], the minimum number of scalar multiplications needed to compute the matrix Ai: j. Each table entry initially contains the value ∞ to indicate that the entry has yet to

be filled in. Upon calling LOOKUP-CHAIN( m, p, i, j), if line 1 finds that m[ i, j] < ∞, then the procedure simply returns the previously computed cost m[ i, j] in line 2. Otherwise, the cost is computed as in RECURSIVE-MATRIX-CHAIN, stored in m[ i, j], and returned. Thus,

LOOKUP-CHAIN( m, p, i, j) always returns the value of m[ i, j], but it computes it only upon the first call of LOOKUP-CHAIN with these

specific values of i and j. Figure 14.7 illustrates how MEMOIZED-MATRIX-CHAIN saves time compared with RECURSIVE-MATRIX-

CHAIN. Subtrees shaded blue represent values that are looked up

rather than recomputed.

Like the bottom-up procedure MATRIX-CHAIN-ORDER, the

memoized procedure MEMOIZED-MATRIX-CHAIN runs in O( n 3)

time. To begin with, line 4 of MEMOIZED-MATRIX-CHAIN executes

Θ( n 2) times, which dominates the running time outside of the call to

LOOKUP-CHAIN in line 5. We can categorize the calls of LOOKUP-

CHAIN into two types:

1. calls in which m[ i, j] = ∞, so that lines 3–9 execute, and

2. calls in which m[ i, j] < ∞, so that LOOKUP-CHAIN simply returns in line 2.

There are Θ( n 2) calls of the first type, one per table entry. All calls of the second type are made as recursive calls by calls of the first type.

Whenever a given call of LOOKUP-CHAIN makes recursive calls, it

makes O( n) of them. Therefore, there are O( n 3) calls of the second type in all. Each call of the second type takes O(1) time, and each call of the

first type takes O( n) time plus the time spent in its recursive calls. The total time, therefore, is O( n 3). Memoization thus turns an Ω(2 n)-time algorithm into an O( n 3)-time algorithm.
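The memoized pair of procedures translates to Python just as directly (a sketch; the m table again leaves index 0 unused to match the pseudocode's 1-based indexing).

```python
INF = float("inf")

def memoized_matrix_chain(p):
    """Top-down memoized matrix-chain order: initialize every entry to
    infinity ("not yet computed"), then recurse with lookups."""
    n = len(p) - 1
    m = [[INF] * (n + 1) for _ in range(n + 1)]
    return lookup_chain(m, p, 1, n)

def lookup_chain(m, p, i, j):
    if m[i][j] < INF:                  # already computed: just look it up
        return m[i][j]
    if i == j:
        m[i][j] = 0
    else:
        for k in range(i, j):          # same splits as the recursive version
            q = (lookup_chain(m, p, i, k)
                 + lookup_chain(m, p, k + 1, j)
                 + p[i - 1] * p[k] * p[j])
            if q < m[i][j]:
                m[i][j] = q
    return m[i][j]
```

Each of the Θ(n²) entries is computed exactly once, so the call pattern matches the O(n³) accounting above.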

We have seen how to solve the matrix-chain multiplication problem

by either a top-down, memoized dynamic-programming algorithm or a

bottom-up dynamic-programming algorithm in O( n 3) time. Both the bottom-up and memoized methods take advantage of the overlapping-subproblems property. There are only Θ( n 2) distinct subproblems in

total, and either of these methods computes the solution to each

subproblem only once. Without memoization, the natural recursive

algorithm runs in exponential time, since solved subproblems are

repeatedly solved.

In general practice, if all subproblems must be solved at least once, a

bottom-up dynamic-programming algorithm usually outperforms the

corresponding top-down memoized algorithm by a constant factor,

because the bottom-up algorithm has no overhead for recursion and

less overhead for maintaining the table. Moreover, for some problems

you can exploit the regular pattern of table accesses in the dynamic-

programming algorithm to reduce time or space requirements even

further. On the other hand, in certain situations, some of the

subproblems in the subproblem space might not need to be solved at all.

In that case, the memoized solution has the advantage of solving only

those subproblems that are definitely required.

Exercises

14.3-1

Which is a more efficient way to determine the optimal number of

multiplications in a matrix-chain multiplication problem: enumerating

all the ways of parenthesizing the product and computing the number of

multiplications for each, or running RECURSIVE-MATRIX-CHAIN?

Justify your answer.

14.3-2

Draw the recursion tree for the MERGE-SORT procedure from Section

2.3.1 on an array of 16 elements. Explain why memoization fails to

speed up a good divide-and-conquer algorithm such as MERGE-

SORT.

14.3-3

Consider the antithetical variant of the matrix-chain multiplication

problem where the goal is to parenthesize the sequence of matrices so as

to maximize, rather than minimize, the number of scalar multiplications.

Does this problem exhibit optimal substructure?

14.3-4

As stated, in dynamic programming, you first solve the subproblems

and then choose which of them to use in an optimal solution to the

problem. Professor Capulet claims that she does not always need to

solve all the subproblems in order to find an optimal solution. She

suggests that she can find an optimal solution to the matrix-chain

multiplication problem by always choosing the matrix Ak at which to

split the subproduct AiAi+1 ⋯ Aj (by selecting k to minimize the quantity pi−1 pk pj) before solving the subproblems. Find an instance of the matrix-chain multiplication problem for which this greedy approach

yields a suboptimal solution.

14.3-5

Suppose that the rod-cutting problem of Section 14.1 also had a limit li on the number of pieces of length i allowed to be produced, for i = 1, 2,

…, n. Show that the optimal-substructure property described in Section

14.1 no longer holds.

14.4 Longest common subsequence

Biological applications often need to compare the DNA of two (or

more) different organisms. A strand of DNA consists of a string of

molecules called bases, where the possible bases are adenine, cytosine,

guanine, and thymine. Representing each of these bases by its initial

letter, we can express a strand of DNA as a string over the 4-element set

{A, C, G, T}. (See Section C.1 for the definition of a string.) For example, the DNA of one organism may be S 1 =

ACCGGTCGAGTGCGCGGAAGCCGGCCGAA, and the DNA of another

organism may be S 2 = GTCGTTCGGAATGCCGTTGCTCTGTAAA. One

reason to compare two strands of DNA is to determine how “similar”

the two strands are, as some measure of how closely related the two

organisms are. We can, and do, define similarity in many different ways.

For example, we can say that two DNA strands are similar if one is a

substring of the other. (Chapter 32 explores algorithms to solve this problem.) In our example, neither S 1 nor S 2 is a substring of the other.

Alternatively, we could say that two strands are similar if the number of

changes needed to turn one into the other is small. (Problem 14-5 looks

at this notion.) Yet another way to measure the similarity of strands S 1

and S 2 is by finding a third strand S 3 in which the bases in S 3 appear in each of S 1 and S 2. These bases must appear in the same order, but not necessarily consecutively. The longer the strand S 3 we can find, the more similar S 1 and S 2 are. In our example, the longest strand S 3 is GTCGTCGGAAGCCGGCCGAA.

We formalize this last notion of similarity as the longest-common-

subsequence problem. A subsequence of a given sequence is just the

given sequence with 0 or more elements left out. Formally, given a

sequence X = 〈 x 1, x 2, …, xm〉, another sequence Z = 〈 z 1, z 2, …, zk〉 is a subsequence of X if there exists a strictly increasing sequence 〈 i 1, i 2, …, ik〉 of indices of X such that for all j = 1, 2, …, k, we have x_{i_j} = zj. For example, Z = 〈 B, C, D, B〉 is a subsequence of X = 〈 A, B, C, B, D, A, B〉 with corresponding index sequence 〈2, 3, 5, 7〉.
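The definition can be checked mechanically. A sketch (the function name is my own) that scans X left to right, matching each element of Z at the earliest possible index:

```python
def is_subsequence(Z, X):
    """Return True if Z is a subsequence of X, by greedily matching
    each element of Z at the earliest possible index of X."""
    i = 0  # next position in X to examine
    for z in Z:
        # advance through X until we find z
        while i < len(X) and X[i] != z:
            i += 1
        if i == len(X):
            return False  # ran out of X before matching all of Z
        i += 1  # consume the matched element
    return True

# The book's example: Z = <B, C, D, B> is a subsequence of
# X = <A, B, C, B, D, A, B> (1-based indices 2, 3, 5, 7).
print(is_subsequence("BCDB", "ABCBDAB"))  # True
```

Greedy earliest matching is correct here: if any valid index sequence exists, the leftmost one does too.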

Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y. For example, if X = 〈 A, B, C, B, D, A, B〉 and Y = 〈 B, D, C, A, B, A〉, the sequence 〈 B, C, A〉 is a common subsequence of both X and Y. The sequence 〈 B, C, A〉 is not a longest common subsequence ( LCS) of X

and Y, however, since it has length 3 and the sequence 〈 B, C, B, A〉, which is also common to both sequences X and Y, has length 4. The sequence 〈 B, C, B, A〉 is an LCS of X and Y, as is the sequence 〈 B, D, A, B〉, since X and Y have no common subsequence of length 5 or greater.

In the longest-common-subsequence problem, the input is two

sequences X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉, and the goal is to find a maximum-length common subsequence of X and Y. This

section shows how to efficiently solve the LCS problem using dynamic

programming.

Step 1: Characterizing a longest common subsequence

You can solve the LCS problem with a brute-force approach: enumerate

all subsequences of X and check each subsequence to see whether it is

also a subsequence of Y, keeping track of the longest subsequence you

find. Each subsequence of X corresponds to a subset of the indices {1,

2, …, m} of X. Because X has 2^m subsequences, this approach requires exponential time, making it impractical for long sequences.
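The brute-force approach can be sketched directly with bitmask-style index subsets (only usable for very short sequences; the helper name is my own):

```python
from itertools import combinations

def lcs_brute_force(X, Y):
    """Enumerate all subsequences of X (subsets of its indices) and
    return a longest one that is also a subsequence of Y.
    Exponential time: X has 2^len(X) subsequences."""
    def is_subseq(Z, S):
        it = iter(S)  # each element of S is consumed at most once
        return all(any(z == s for s in it) for z in Z)

    # try longer subsequences first so the first hit is a longest one
    for k in range(len(X), 0, -1):
        for idx in combinations(range(len(X)), k):
            Z = "".join(X[i] for i in idx)
            if is_subseq(Z, Y):
                return Z
    return ""

# Example from the text: these two sequences have an LCS of length 4.
print(len(lcs_brute_force("ABCBDAB", "BDCABA")))  # 4
```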

The LCS problem has an optimal-substructure property, however, as

the following theorem shows. As we’ll see, the natural classes of

subproblems correspond to pairs of “prefixes” of the two input

sequences. To be precise, given a sequence X = 〈 x 1, x 2, …, xm〉, we define the i th prefix of X, for i = 0, 1, …, m, as Xi = 〈 x 1, x 2, …, xi〉. For

example, if X = 〈 A, B, C, B, D, A, B〉, then X 4 = 〈 A, B, C, B〉 and X 0 is the empty sequence.

Theorem 14.1 (Optimal substructure of an LCS)

Let X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉 be sequences, and let Z

= 〈 z 1, z 2, …, zk〉 be any LCS of X and Y.

1. If xm = yn, then zk = xm = yn and Zk−1 is an LCS of Xm−1

and Yn−1.

2. If xm ≠ yn and zk ≠ xm, then Z is an LCS of Xm−1 and Y.

3. If xm ≠ yn and zk ≠ yn, then Z is an LCS of X and Yn−1.

Proof (1) If zk ≠ xm, then we could append xm = yn to Z to obtain a common subsequence of X and Y of length k + 1, contradicting the supposition that Z is a longest common subsequence of X and Y. Thus, we must have zk = xm = yn. Now, the prefix Zk−1 is a length-( k − 1) common subsequence of Xm−1 and Yn−1. We wish to show that it is an

LCS. Suppose for the purpose of contradiction that there exists a

common subsequence W of Xm−1 and Yn−1 with length greater than k

− 1. Then, appending xm = yn to W produces a common subsequence of X and Y whose length is greater than k, which is a contradiction.

(2) If zk ≠ xm, then Z is a common subsequence of Xm−1 and Y. If there were a common subsequence W of Xm−1 and Y with length greater than k, then W would also be a common subsequence of Xm and Y, contradicting the assumption that Z is an LCS of X and Y.

(3) The proof is symmetric to (2).

The way that Theorem 14.1 characterizes longest common

subsequences says that an LCS of two sequences contains within it an

LCS of prefixes of the two sequences. Thus, the LCS problem has an

optimal-substructure property. A recursive solution also has the

overlapping-subproblems property, as we’ll see in a moment.

Step 2: A recursive solution

Theorem 14.1 implies that you should examine either one or two

subproblems when finding an LCS of X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉. If xm = yn, you need to find an LCS of Xm−1 and Yn−1.

Appending xm = yn to this LCS yields an LCS of X and Y. If xm ≠ yn, then you have to solve two subproblems: finding an LCS of Xm−1 and

Y and finding an LCS of X and Yn−1. Whichever of these two LCSs is longer is an LCS of X and Y. Because these cases exhaust all possibilities, one of the optimal subproblem solutions must appear

within an LCS of X and Y.

The LCS problem has the overlapping-subproblems property. Here’s

how. To find an LCS of X and Y, you might need to find the LCSs of X

and Yn−1 and of Xm−1 and Y. But each of these subproblems has the subsubproblem of finding an LCS of Xm−1 and Yn−1. Many other

subproblems share subsubproblems.

As in the matrix-chain multiplication problem, solving the LCS

problem recursively involves establishing a recurrence for the value of an

optimal solution. Let’s define c[ i, j] to be the length of an LCS of the sequences Xi and Yj. If either i = 0 or j = 0, one of the sequences has length 0, and so the LCS has length 0. The optimal substructure of the

LCS problem gives the recursive formula

c[i, j] = 0                              if i = 0 or j = 0,
c[i, j] = c[i − 1, j − 1] + 1            if i, j > 0 and xi = yj,      (14.9)
c[i, j] = max {c[i, j − 1], c[i − 1, j]} if i, j > 0 and xi ≠ yj.

In this recursive formulation, a condition in the problem restricts

which subproblems to consider. When xi = yj, you can and should consider the subproblem of finding an LCS of Xi−1 and Yj−1.

Otherwise, you instead consider the two subproblems of finding an LCS

of Xi and Yj−1 and of Xi−1 and Yj. In the previous dynamic-programming algorithms we have examined—for rod cutting and

matrix-chain multiplication—we didn’t rule out any subproblems due to

conditions in the problem. Finding an LCS is not the only dynamic-programming algorithm that rules out subproblems based on conditions

in the problem. For example, the edit-distance problem (see Problem 14-

5) has this characteristic.
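The recursive formulation can be transcribed directly as a memoized recursion (a sketch using 0-based Python strings in place of the book's 1-based sequences):

```python
from functools import lru_cache

def lcs_length_memo(X, Y):
    """Top-down LCS length: c(i, j) is the LCS length of the prefixes
    X[:i] and Y[:j], following the three cases of the recurrence."""
    @lru_cache(maxsize=None)
    def c(i, j):
        if i == 0 or j == 0:           # one prefix is empty
            return 0
        if X[i - 1] == Y[j - 1]:       # last symbols match
            return c(i - 1, j - 1) + 1
        return max(c(i, j - 1), c(i - 1, j))

    return c(len(X), len(Y))

# Example from the text: the LCS of these sequences has length 4.
print(lcs_length_memo("ABCBDAB", "BDCABA"))  # 4
```

The memoization table plays the role of the c table: each of the Θ(mn) subproblems is solved once.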

Step 3: Computing the length of an LCS

Based on equation (14.9), you could write an exponential-time recursive

algorithm to compute the length of an LCS of two sequences. Since the

LCS problem has only Θ( mn) distinct subproblems (computing c[ i, j] for 0 ≤ i ≤ m and 0 ≤ j ≤ n), dynamic programming can compute the solutions bottom up.

The procedure LCS-LENGTH on the next page takes two sequences

X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉 as inputs, along with their lengths. It stores the c[ i, j] values in a table c[0 : m, 0 : n], and it computes the entries in row-major order. That is, the procedure fills in

the first row of c from left to right, then the second row, and so on. The

procedure also maintains the table b[1 : m, 1 : n] to help in constructing an optimal solution. Intuitively, b[ i, j] points to the table entry corresponding to the optimal subproblem solution chosen when

computing c[ i, j]. The procedure returns the b and c tables, where c[ m, n]

contains the length of an LCS of X and Y. Figure 14.8 shows the tables produced by LCS-LENGTH on the sequences X = 〈 A, B, C, B, D, A, B〉

and Y = 〈 B, D, C, A, B, A〉. The running time of the procedure is Θ( mn), since each table entry takes Θ(1) time to compute.

LCS-LENGTH(X, Y, m, n)
 1  let b[1 : m, 1 : n] and c[0 : m, 0 : n] be new tables
 2  for i = 1 to m
 3      c[i, 0] = 0
 4  for j = 0 to n
 5      c[0, j] = 0
 6  for i = 1 to m        // compute table entries in row-major order
 7      for j = 1 to n
 8          if xi == yj
 9              c[i, j] = c[i − 1, j − 1] + 1
10              b[i, j] = “↖”
11          elseif c[i − 1, j] ≥ c[i, j − 1]
12              c[i, j] = c[i − 1, j]
13              b[i, j] = “↑”
14          else c[i, j] = c[i, j − 1]
15              b[i, j] = “←”
16  return c and b

PRINT-LCS(b, X, i, j)
1  if i == 0 or j == 0
2      return                          // the LCS has length 0
3  if b[i, j] == “↖”
4      PRINT-LCS(b, X, i − 1, j − 1)
5      print xi                        // same as yj
6  elseif b[i, j] == “↑”
7      PRINT-LCS(b, X, i − 1, j)
8  else PRINT-LCS(b, X, i, j − 1)
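A possible Python transcription of the two procedures (0-based indices; the b table stores the arrow characters directly, and the traceback returns the LCS as a string rather than printing it):

```python
def lcs_length(X, Y):
    """Fill the c and b tables in row-major order, as in LCS-LENGTH.
    c[i][j] is the LCS length of the prefixes X[:i] and Y[:j]."""
    m, n = len(X), len(Y)
    c = [[0] * (n + 1) for _ in range(m + 1)]
    b = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                c[i][j] = c[i - 1][j - 1] + 1
                b[i][j] = "↖"
            elif c[i - 1][j] >= c[i][j - 1]:
                c[i][j] = c[i - 1][j]
                b[i][j] = "↑"
            else:
                c[i][j] = c[i][j - 1]
                b[i][j] = "←"
    return c, b

def print_lcs(b, X, i, j):
    """Return the LCS recovered by following the arrows from b[i][j]."""
    if i == 0 or j == 0:
        return ""
    if b[i][j] == "↖":
        return print_lcs(b, X, i - 1, j - 1) + X[i - 1]
    if b[i][j] == "↑":
        return print_lcs(b, X, i - 1, j)
    return print_lcs(b, X, i, j - 1)

c, b = lcs_length("ABCBDAB", "BDCABA")
print(c[7][6], print_lcs(b, "ABCBDAB", 7, 6))  # 4 BCBA
```

The tie-breaking rule on lines with “≥” matches the pseudocode, so the traceback recovers the same LCS, BCBA, as Figure 14.8.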

Step 4: Constructing an LCS

With the b table returned by LCS-LENGTH, you can quickly construct

an LCS of X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉. Begin at b[ m, n]

and trace through the table by following the arrows. Each “↖”

encountered in an entry b[ i, j] implies that xi = yj is an element of the LCS that LCS-LENGTH found. This method gives you the elements of

this LCS in reverse order. The recursive procedure PRINT-LCS prints

out an LCS of X and Y in the proper, forward order.

Figure 14.8 The c and b tables computed by LCS-LENGTH on the sequences X = 〈 A, B, C, B, D, A, B〉 and Y = 〈 B, D, C, A, B, A〉. The square in row i and column j contains the value of c[ i, j] and the appropriate arrow for the value of b[ i, j]. The entry 4 in c[7, 6]—the lower right-hand corner of the table—is the length of an LCS 〈 B, C, B, A〉 of X and Y. For i, j > 0, entry c[ i, j]

depends only on whether xi = yj and the values in entries c[ i − 1, j], c[ i, j − 1], and c[ i − 1, j − 1], which are computed before c[ i, j]. To reconstruct the elements of an LCS, follow the b[ i, j] arrows from the lower right-hand corner, as shown by the sequence shaded blue. Each “↖” on the shaded-blue sequence corresponds to an entry (highlighted) for which xi = yj is a member of an LCS.

The initial call is PRINT-LCS( b, X, m, n). For the b table in Figure

14.8, this procedure prints BCBA. The procedure takes O( m + n) time,

since it decrements at least one of i and j in each recursive call.

Improving the code

Once you have developed an algorithm, you will often find that you can

improve on the time or space it uses. Some changes can simplify the

code and improve constant factors but otherwise yield no asymptotic

improvement in performance. Others can yield substantial asymptotic

savings in time and space.

In the LCS algorithm, for example, you can eliminate the b table

altogether. Each c[ i, j] entry depends on only three other c table entries: c[ i − 1, j − 1], c[ i − 1, j], and c[ i, j − 1]. Given the value of c[ i, j], you can

determine in O(1) time which of these three values was used to compute c[ i, j], without inspecting table b. Thus, you can reconstruct an LCS in O( m+ n) time using a procedure similar to PRINT-LCS. (Exercise 14.4-2

asks you to give the pseudocode.) Although this method saves Θ( mn)

space, the auxiliary space requirement for computing an LCS does not

asymptotically decrease, since the c table takes Θ( mn) space anyway.

You can, however, reduce the asymptotic space requirements for

LCS-LENGTH, since it needs only two rows of table c at a time: the

row being computed and the previous row. (In fact, as Exercise 14.4-4

asks you to show, you can use only slightly more than the space for one

row of c to compute the length of an LCS.) This improvement works if

you need only the length of an LCS. If you need to reconstruct the

elements of an LCS, the smaller table does not keep enough information

to retrace the algorithm’s steps in O( m + n) time.
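The two-row idea can be sketched as follows (a sketch computing only the length, as the text notes; the function name is my own):

```python
def lcs_length_two_rows(X, Y):
    """LCS length keeping only two rows of the c table at a time:
    the row being computed and the previous row."""
    m, n = len(X), len(Y)
    prev = [0] * (n + 1)          # row i - 1 of the c table
    for i in range(1, m + 1):
        curr = [0] * (n + 1)      # row i, being computed
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr               # discard the older row
    return prev[n]

print(lcs_length_two_rows("ABCBDAB", "BDCABA"))  # 4
```

This uses Θ(n) space instead of Θ(mn), but, as noted above, it no longer retains enough information to reconstruct the LCS itself in O(m + n) time.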

Exercises

14.4-1

Determine an LCS of 〈1, 0, 0, 1, 0, 1, 0, 1〉 and 〈0, 1, 0, 1, 1, 0, 1, 1, 0〉.

14.4-2

Give pseudocode to reconstruct an LCS from the completed c table and

the original sequences X = 〈 x 1, x 2, …, xm〉 and Y = 〈 y 1, y 2, …, yn〉 in O( m + n) time, without using the b table.

14.4-3

Give a memoized version of LCS-LENGTH that runs in O( mn) time.

14.4-4

Show how to compute the length of an LCS using only 2 · min { m, n}

entries in the c table plus O(1) additional space. Then show how to do

the same thing, but using min { m, n} entries plus O(1) additional space.

14.4-5

Give an O( n 2)-time algorithm to find the longest monotonically increasing subsequence of a sequence of n numbers.

14.4-6

Give an O( n lg n)-time algorithm to find the longest monotonically increasing subsequence of a sequence of n numbers. ( Hint: The last element of a candidate subsequence of length i is at least as large as the

last element of a candidate subsequence of length i −1. Maintain

candidate subsequences by linking them through the input sequence.)

14.5 Optimal binary search trees

Suppose that you are designing a program to translate text from English

to Latvian. For each occurrence of each English word in the text, you

need to look up its Latvian equivalent. You can perform these lookup

operations by building a binary search tree with n English words as keys

and their Latvian equivalents as satellite data. Because you will search

the tree for each individual word in the text, you want the total time

spent searching to be as low as possible. You can ensure an O(lg n) search time per occurrence by using a red-black tree or any other

balanced binary search tree. Words appear with different frequencies,

however, and a frequently used word such as the can end up appearing

far from the root while a rarely used word such as naumachia appears

near the root. Such an organization would slow down the translation,

since the number of nodes visited when searching for a key in a binary

search tree equals 1 plus the depth of the node containing the key. You

want words that occur frequently in the text to be placed nearer the

root. 8 Moreover, some words in the text might have no Latvian translation,9 and such words would not appear in the binary search tree at all. How can you organize a binary search tree so as to minimize the

number of nodes visited in all searches, given that you know how often

each word occurs?

What you need is an optimal binary search tree. Formally, given a

sequence K = 〈 k 1, k 2, …, kn〉 of n distinct keys such that k 1 < k 2 < … < kn, build a binary search tree containing them. For each key ki, you are given the probability pi that any given search is for key ki. Since some searches may be for values not in K, you also have n + 1 “dummy” keys

d 0, d 1, d 2, …, dn representing those values. In particular, d 0 represents all values less than k 1, dn represents all values greater than kn, and for i

= 1, 2, …, n − 1, the dummy key di represents all values between ki and ki+1. For each dummy key di, you have the probability qi that a search corresponds to di. Figure 14.9 shows two binary search trees for a set of n = 5 keys. Each key ki is an internal node, and each dummy key di is a leaf. Since every search is either successful (finding some key ki) or unsuccessful (finding some dummy key di), we have

∑_{i=1}^{n} pi + ∑_{i=0}^{n} qi = 1.     (14.10)

Figure 14.9 Two binary search trees for a set of n = 5 keys with the following probabilities:

  i     0     1     2     3     4     5
  pi          0.15  0.10  0.05  0.10  0.20
  qi    0.05  0.10  0.05  0.05  0.05  0.10

(a) A binary search tree with expected search cost 2.80. (b) A binary search tree with expected search cost 2.75. This tree is optimal.

Knowing the probabilities of searches for each key and each dummy

key allows us to determine the expected cost of a search in a given

binary search tree T. Let us assume that the actual cost of a search equals the number of nodes examined, which is the depth of the node

found by the search in T, plus 1. Then the expected cost of a search in T

is

E[search cost in T] = ∑_{i=1}^{n} (depth_T(ki) + 1) · pi + ∑_{i=0}^{n} (depth_T(di) + 1) · qi
                    = 1 + ∑_{i=1}^{n} depth_T(ki) · pi + ∑_{i=0}^{n} depth_T(di) · qi,     (14.11)

where depth T denotes a node’s depth in the tree T. The last equation

follows from equation (14.10). Figure 14.9 shows how to calculate the expected search cost node by node.

For a given set of probabilities, your goal is to construct a binary

search tree whose expected search cost is smallest. We call such a tree an

optimal binary search tree. Figure 14.9(a) shows one binary search tree, with expected cost 2.80, for the probabilities given in the figure caption.

Part (b) of the figure displays an optimal binary search tree, with

expected cost 2.75. This example demonstrates that an optimal binary

search tree is not necessarily a tree whose overall height is smallest. Nor

does an optimal binary search tree always have the key with the greatest

probability at the root. Here, key k 5 has the greatest search probability

of any key, yet the root of the optimal binary search tree shown is k 2.

(The lowest expected cost of any binary search tree with k 5 at the root is

2.85.)

As with matrix-chain multiplication, exhaustive checking of all

possibilities fails to yield an efficient algorithm. You can label the nodes

of any n-node binary tree with the keys k 1, k 2, …, kn to construct a binary search tree, and then add in the dummy keys as leaves. In

Problem 12-4 on page 329, we saw that the number of binary trees with

n nodes is Ω(4^n/n^{3/2}). Thus you would need to examine an exponential number of binary search trees to perform an exhaustive search. We’ll see

how to solve this problem more efficiently with dynamic programming.

Step 1: The structure of an optimal binary search tree

To characterize the optimal substructure of optimal binary search trees,

we start with an observation about subtrees. Consider any subtree of a

binary search tree. It must contain keys in a contiguous range ki, …, kj,

for some 1 ≤ i ≤ j ≤ n. In addition, a subtree that contains keys ki, …, kj must also have as its leaves the dummy keys di−1, …, dj.

Now we can state the optimal substructure: if an optimal binary

search tree T has a subtree T′ containing keys ki, …, kj, then this subtree T′ must be optimal as well for the subproblem with keys ki, …,

kj and dummy keys di−1, …, dj. The usual cut-and-paste argument applies. If there were a subtree T″ whose expected cost is lower than that

of T′, then cutting T′ out of T and pasting in T″ would result in a binary search tree of lower expected cost than T, thus contradicting the

optimality of T.

With the optimal substructure in hand, here is how to construct an

optimal solution to the problem from optimal solutions to subproblems.

Given keys ki, …, kj, one of these keys, say kr (i ≤ r ≤ j), is the root of an optimal subtree containing these keys. The left subtree of the root kr

contains the keys ki, …, kr−1 (and dummy keys di−1, …, dr−1), and the right subtree contains the keys kr+1, …, kj (and dummy keys dr, …, dj).

As long as you examine all candidate roots kr, where i ≤ r ≤ j, and you determine all optimal binary search trees containing ki, …, kr−1 and those containing kr+1, …, kj, you are guaranteed to find an optimal binary search tree.
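Trying every candidate root, with memoization over key ranges, already yields the optimal cost (a sketch anticipating the recurrence developed later in the section; the names e and w are my own: e(i, j) is the minimum expected cost for keys ki, …, kj, and w(i, j) the total probability mass of that subproblem):

```python
from functools import lru_cache

# Figure 14.9 probabilities (p[0] is padding so that keys are 1-based).
p = [0, 0.15, 0.10, 0.05, 0.10, 0.20]
q = [0.05, 0.10, 0.05, 0.05, 0.05, 0.10]
n = 5

def w(i, j):
    """Total probability of keys ki..kj and dummy keys d(i-1)..dj."""
    return sum(p[i:j + 1]) + sum(q[i - 1:j + 1])

@lru_cache(maxsize=None)
def e(i, j):
    """Minimum expected search cost over all BSTs on keys ki..kj.
    An empty range (j == i - 1) is just the dummy key d(i-1)."""
    if j == i - 1:
        return q[i - 1]
    # Try every key kr as the root. Its two subtrees must themselves be
    # optimal, and hanging them under kr deepens every node by one level,
    # which adds w(i, j) to the expected cost.
    return min(e(i, r - 1) + e(r + 1, j) + w(i, j) for r in range(i, j + 1))

print(round(e(1, n), 2))  # 2.75, the cost of the optimal tree in Figure 14.9
```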

There is one technical detail worth understanding about “empty”

subtrees. Suppose that in a subtree with keys ki, …, kj, you select ki as the root. By the above argument, ki’s left subtree contains the keys ki,