The question remains of how to choose the first derivatives of f(x) at the knots. One method is to require the second derivatives to be continuous at the knots:

f″_i(1) = f″_{i+1}(0)

for i = 0, 1, … , n − 2. At the first and last knots, assume that f″_0(0) = 0 and f″_{n−1}(1) = 0. These assumptions make f(x) a natural cubic spline.

b. Use the continuity constraints on the second derivative to show that

D_{i−1} + 4D_i + D_{i+1} = 3(y_{i+1} − y_{i−1})    (28.23)

for i = 1, 2, … , n − 1.

c. Show that

2D_0 + D_1 = 3(y_1 − y_0),    (28.24)
D_{n−1} + 2D_n = 3(y_n − y_{n−1}).    (28.25)

d. Rewrite equations (28.23)–(28.25) as a matrix equation involving the vector D = (D_0 D_1 D_2 ⋯ D_n)ᵀ of unknowns. What attributes does the matrix in your equation have?

e. Argue that a natural cubic spline can interpolate a set of n + 1 point-value pairs in O( n) time (see Problem 28-1).

f. Show how to determine a natural cubic spline that interpolates a set

of n + 1 points ( xi, yi) satisfying x 0 < x 1 < ⋯ < xn, even when xi is not necessarily equal to i. What matrix equation must your method

solve, and how quickly does your algorithm run?
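The O(n) bound in part e comes from the structure of the system: equations (28.23)–(28.25) are tridiagonal, so forward elimination followed by back substitution (the Thomas algorithm) solves them in linear time. Below is a minimal Python sketch, assuming the natural-spline equations 2D_0 + D_1 = 3(y_1 − y_0), D_{i−1} + 4D_i + D_{i+1} = 3(y_{i+1} − y_{i−1}), and D_{n−1} + 2D_n = 3(y_n − y_{n−1}); the function name is ours.

```python
def natural_spline_slopes(y):
    """Solve the tridiagonal system for the derivatives D_0, ..., D_n of a
    natural cubic spline on knots x_i = i, in O(n) time (Thomas algorithm)."""
    n = len(y) - 1
    sub = [0.0] + [1.0] * n                      # sub-diagonal (first entry unused)
    diag = [2.0] + [4.0] * (n - 1) + [2.0]       # main diagonal
    sup = [1.0] * n + [0.0]                      # super-diagonal (last entry unused)
    rhs = ([3.0 * (y[1] - y[0])]
           + [3.0 * (y[i + 1] - y[i - 1]) for i in range(1, n)]
           + [3.0 * (y[n] - y[n - 1])])
    # forward elimination: each equation is touched once
    for i in range(1, n + 1):
        m = sub[i] / diag[i - 1]
        diag[i] -= m * sup[i - 1]
        rhs[i] -= m * rhs[i - 1]
    # back substitution, again one pass
    D = [0.0] * (n + 1)
    D[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        D[i] = (rhs[i] - sup[i] * D[i + 1]) / diag[i]
    return D

# for data on a straight line, every slope should come out (approximately) 2
print(natural_spline_slopes([0, 2, 4, 6, 8]))
```

Each of the two passes does O(1) work per equation, which is the O(n) total claimed in part e.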

Chapter notes

Many excellent texts describe numerical and scientific computation in

much greater detail than we have room for here. The following are

especially readable: George and Liu [180], Golub and Van Loan [192], Press, Teukolsky, Vetterling, and Flannery [365, 366], and Strang [422,

423].

Golub and Van Loan [192] discuss numerical stability. They show why det(A) is not necessarily a good indicator of the stability of a matrix A, proposing instead to use ∥A∥∞ ∥A−1∥∞, where ∥A∥∞ = max_{1≤i≤n} Σ_{1≤j≤n} |aij|. They also address the question of how to compute this value without actually computing A−1.

Gaussian elimination, upon which the LU and LUP decompositions

are based, was the first systematic method for solving linear systems of

equations. It was also one of the earliest numerical algorithms.

Although it was known earlier, its discovery is commonly attributed to

C. F. Gauss (1777–1855). In his famous paper [424], Strassen showed that an n × n matrix can be inverted in O(n^{lg 7}) time. Winograd [460]

originally proved that matrix multiplication is no harder than matrix

inversion, and the converse is due to Aho, Hopcroft, and Ullman [5].

Another important matrix decomposition is the singular value decomposition, or SVD. The SVD factors an m × n matrix A into A = Q1ΣQ2ᵀ, where Σ is an m × n matrix with nonzero values only on the diagonal, Q1 is m × m with mutually orthonormal columns, and Q2 is n × n, also with mutually orthonormal columns. Two vectors are orthonormal if their inner product is 0 and each vector has a norm of 1.

The books by Strang [422, 423] and Golub and Van Loan [192] contain good treatments of the SVD.

Strang [423] has an excellent presentation of symmetric positive-definite matrices and of linear algebra in general.

1 The year in which Introduction to Algorithms was first published.

29 Linear Programming

Many problems take the form of maximizing or minimizing an

objective, given limited resources and competing constraints. If you can

specify the objective as a linear function of certain variables, and if you

can specify the constraints on resources as equalities or inequalities on

those variables, then you have a linear-programming problem. Linear

programs arise in a variety of practical applications. We begin by

studying an application in electoral politics.

A political problem

Suppose that you are a politician trying to win an election. Your district

has three different types of areas—urban, suburban, and rural. These

areas have, respectively, 100,000, 200,000, and 50,000 registered voters.

Although not all the registered voters actually go to the polls, you

decide that to govern effectively, you would like at least half the

registered voters in each of the three regions to vote for you. You are

honorable and would never consider supporting policies you don’t

believe in. You realize, however, that certain issues may be more

effective in winning votes in certain places. Your primary issues are

preparing for a zombie apocalypse, equipping sharks with lasers,

building highways for flying cars, and allowing dolphins to vote.

According to your campaign staff’s research, you can estimate how

many votes you win or lose from each population segment by spending

$1,000 on advertising on each issue. This information appears in the

table of Figure 29.1. In this table, each entry indicates the number of


thousands of either urban, suburban, or rural voters who would be won

over by spending $1,000 on advertising in support of a particular issue.

Negative entries denote votes that would be lost. Your task is to figure

out the minimum amount of money that you need to spend in order to

win 50,000 urban votes, 100,000 suburban votes, and 25,000 rural votes.

You could, by trial and error, devise a strategy that wins the required

number of votes, but the strategy you come up with might not be the

least expensive one. For example, you could devote $20,000 of

advertising to preparing for a zombie apocalypse, $0 to equipping

sharks with lasers, $4,000 to building highways for flying cars, and

$9,000 to allowing dolphins to vote. In this case, you would win (20 ·

−2) + (0 · 8) + (4 · 0) + (9 · 10) = 50 thousand urban votes, (20 · 5) + (0 ·

2) + (4 · 0) + (9 · 0) = 100 thousand suburban votes, and (20 · 3) + (0 ·

−5) + (4 · 10) + (9 · −2) = 82 thousand rural votes. You would win the

exact number of votes desired in the urban and suburban areas and

more than enough votes in the rural area. (In fact, according to your

model, in the rural area you would receive more votes than there are

voters.) In order to garner these votes, you would have paid for 20 + 0 +

4 + 9 = 33 thousand dollars of advertising.

Figure 29.1 The effects of policies on voters. Each entry describes the number of thousands of urban, suburban, or rural voters who could be won over by spending $1,000 on advertising support of a policy on a particular issue. Negative entries denote votes that would be lost.
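The worked arithmetic above is easy to check mechanically. In the sketch below the table entries are reconstructed from that arithmetic (issue rows: zombies, sharks, highways, dolphins; columns: urban, suburban, rural), so treat them as an assumption rather than a transcription of Figure 29.1.

```python
# spending in $1,000s on: zombies, sharks, highways, dolphins
spend = [20, 0, 4, 9]
# votes gained (thousands) per $1,000, per issue: (urban, suburban, rural)
effect = [(-2, 5, 3), (8, 2, -5), (0, 0, 10), (10, 0, -2)]

# total votes won in each region, in thousands
votes = [sum(s * e[region] for s, e in zip(spend, effect)) for region in range(3)]
print(votes, sum(spend))  # → [50, 100, 82] 33
```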

It’s natural to wonder whether this strategy is the best possible. That

is, can you achieve your goals while spending less on advertising?

Additional trial and error might help you to answer this question, but a

better approach is to formulate (or model) this question mathematically.

The first step is to decide what decisions you have to make and to

introduce variables that capture these decisions. Since you have four

decisions, you introduce four decision variables:


x 1 is the number of thousands of dollars spent on advertising on

preparing for a zombie apocalypse,

x 2 is the number of thousands of dollars spent on advertising on

equipping sharks with lasers,

x 3 is the number of thousands of dollars spent on advertising on

building highways for flying cars, and

x 4 is the number of thousands of dollars spent on advertising on

allowing dolphins to vote.

You then think about constraints, which are limits, or restrictions, on the

values that the decision variables can take. You can write the

requirement that you win at least 50,000 urban votes as

−2x1 + 8x2 + 0x3 + 10x4 ≥ 50.    (29.1)

Similarly, you can write the requirements that you win at least 100,000 suburban votes and 25,000 rural votes as

5x1 + 2x2 + 0x3 + 0x4 ≥ 100    (29.2)

and

3x1 − 5x2 + 10x3 − 2x4 ≥ 25.    (29.3)

Any setting of the variables x1, x2, x3, x4 that satisfies inequalities (29.1)–(29.3) yields a strategy that wins a sufficient number of each type

of vote.

Finally, you think about your objective, which is the quantity that

you wish to either minimize or maximize. In order to keep costs as small

as possible, you would like to minimize the amount spent on

advertising. That is, you want to minimize the expression

x1 + x2 + x3 + x4.    (29.4)

Although negative advertising often occurs in political campaigns, there is no such thing as negative-cost advertising. Consequently, you require that

x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, and x4 ≥ 0.    (29.5)

Combining inequalities (29.1)–(29.3) and (29.5) with the objective of minimizing (29.4) produces what is known as a “linear program.” We can format this problem tabularly as

minimize  x1 + x2 + x3 + x4

subject to

−2x1 + 8x2 + 0x3 + 10x4 ≥ 50
5x1 + 2x2 + 0x3 + 0x4 ≥ 100
3x1 − 5x2 + 10x3 − 2x4 ≥ 25
x1, x2, x3, x4 ≥ 0.

The solution to this linear program yields your optimal strategy.
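You do not need a solver to check a proposed strategy: exact rational arithmetic suffices. The sketch below verifies that the $33,000 trial strategy is feasible and that a cheaper candidate, x = (2050/111, 425/111, 0, 625/111) (assumed here for illustration, not derived), is also feasible and costs only 3100/111, about $27,930.

```python
from fractions import Fraction as F

# constraint rows (urban, suburban, rural) and required vote totals (thousands)
rows = [(-2, 8, 0, 10), (5, 2, 0, 0), (3, -5, 10, -2)]
need = [50, 100, 25]

def feasible(x):
    """True if x is nonnegative and wins enough votes in every region."""
    return (all(xi >= 0 for xi in x)
            and all(sum(a * xi for a, xi in zip(row, x)) >= b
                    for row, b in zip(rows, need)))

trial = [F(20), F(0), F(4), F(9)]                       # the $33,000 strategy
candidate = [F(2050, 111), F(425, 111), F(0), F(625, 111)]  # assumed optimum
assert feasible(trial) and feasible(candidate)
print(sum(trial), sum(candidate))  # → 33 3100/111
```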

The remainder of this chapter covers how to formulate linear

programs and is an introduction to modeling in general. Modeling

refers to the general process of converting a problem into a

mathematical form amenable to solution by an algorithm. Section 29.1

discusses briefly the algorithmic aspects of linear programming,

although it does not include the details of a linear-programming

algorithm. Throughout this book, we have seen ways to model

problems, such as by shortest paths and connectivity in a graph. When

modeling a problem as a linear program, you go through the steps used

in this political example—identifying the decision variables, specifying

the constraints, and formulating the objective function. In order to

model a problem as a linear program, the constraints and objectives

must be linear. In Section 29.2, we will see several other examples of modeling via linear programs. Section 29.3 discusses duality, an important concept in linear programming and other optimization

algorithms.

29.1 Linear programming formulations and algorithms

Linear programs take a particular form, which we will examine in this

section. Multiple algorithms have been developed to solve linear

programs. Some run in polynomial time, some do not, but they are all

too complicated to show here. Instead, we will give an example that


demonstrates some ideas behind the simplex algorithm, which is

currently the most commonly deployed solution method.

General linear programs

In the general linear-programming problem, we wish to optimize a

linear function subject to a set of linear inequalities. Given a set of real

numbers a1, a2, … , an and a set of variables x1, x2, … , xn, we define a linear function f on those variables by

f(x1, x2, … , xn) = a1x1 + a2x2 + ⋯ + anxn.

If b is a real number and f is a linear function, then the equation

f(x1, x2, … , xn) = b

is a linear equality and the inequalities

f(x1, x2, … , xn) ≤ b  and  f(x1, x2, … , xn) ≥ b

are linear inequalities. We use the general term linear constraints to denote either linear equalities or linear inequalities. Linear

programming does not allow strict inequalities. Formally, a linear-

programming problem is the problem of either minimizing or

maximizing a linear function subject to a finite set of linear constraints.

If minimizing, we call the linear program a minimization linear program,

and if maximizing, we call the linear program a maximization linear

program.

In order to discuss linear-programming algorithms and properties, it

will be helpful to use a standard notation for the input. By convention,

a maximization linear program takes as input n real numbers c 1, c 2, … , cn; m real numbers b 1, b 2, … , bm; and mn real numbers aij for i = 1, 2,

… , m and j = 1, 2, … , n.

The goal is to find n real numbers x1, x2, … , xn that

maximize  Σ_{j=1}^{n} c_j x_j    (29.11)

subject to

Σ_{j=1}^{n} a_ij x_j ≤ b_i  for i = 1, 2, … , m,    (29.12)
x_j ≥ 0  for j = 1, 2, … , n.    (29.13)

We call expression (29.11) the objective function and the n + m inequalities in lines (29.12) and (29.13) the constraints. The n constraints in line (29.13) are the nonnegativity constraints. It can sometimes be more convenient to express a linear program in a more compact form. If

we create an m × n matrix A = ( aij), an m-vector b = ( bi), an n-vector c

= (cj), and an n-vector x = (xj), then we can rewrite the linear program defined in (29.11)–(29.13) as

maximize  cᵀx    (29.14)

subject to

Ax ≤ b,    (29.15)
x ≥ 0.    (29.16)

In line (29.14), cᵀx is the inner product of two n-vectors. In inequality (29.15), Ax is the m-vector that is the product of an m × n matrix and an n-vector, and in inequality (29.16), x ≥ 0 means that each entry of the vector x must be nonnegative. We call this representation the standard form for a linear program, and we adopt the convention that A, b, and c always have the dimensions given above.

The standard form above may not naturally correspond to real-life

situations you are trying to model. For example, you might have

equality constraints or variables that can take on negative values.

Exercises 29.1-6 and 29.1-7 ask you to show how to convert any linear

program into this standard form.
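The two conversions behind those exercises are mechanical, and a tiny sketch makes them concrete (the helper names are ours; a constraint is represented as a coefficient list plus a bound):

```python
def equality_to_inequalities(a, b):
    """a.x = b  is equivalent to the pair  a.x <= b  and  (-a).x <= -b."""
    return [(a, b), ([-ai for ai in a], -b)]

def min_to_max(c):
    """minimize c.x  is equivalent to  maximize (-c).x (same optimal x,
    with the optimal objective value negated)."""
    return [-ci for ci in c]

print(equality_to_inequalities([1, 1], 7))  # → [([1, 1], 7), ([-1, -1], -7)]
print(min_to_max([-2, 3]))                  # → [2, -3]
```

Handling a variable that may go negative (the remaining conversion) replaces it by the difference of two nonnegative variables, x = x′ − x″.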

We now introduce terminology to describe solutions to linear

programs. We denote a particular setting of the values in a variable, say x, by putting a bar over the variable name: x̄. If x̄ satisfies all the constraints, then it is a feasible solution, but if it fails to satisfy at least one constraint, then it is an infeasible solution. We say that a solution x̄ has objective value cᵀx̄. A feasible solution x̄ whose objective value is maximum over all feasible solutions is an optimal solution, and we call


its objective value cᵀx̄ the optimal objective value. If a linear program has no feasible solutions, we say that the linear program is infeasible, and otherwise, it is feasible. The set of points that satisfy all the constraints is the feasible region. If a linear program has some feasible

solutions but does not have a finite optimal objective value, then the

feasible region is unbounded and so is the linear program. Exercise 29.1-

5 asks you to show that a linear program can have a finite optimal

objective value even if the feasible region is unbounded.

One of the reasons for the power and popularity of linear

programming is that linear programs can, in general, be solved

efficiently. There are two classes of algorithms, known as ellipsoid

algorithms and interior-point algorithms, that solve linear programs in

polynomial time. In addition, the simplex algorithm is widely used.

Although it does not run in polynomial time in the worst case, it tends

to perform well in practice.

We will not give a detailed algorithm for linear programming, but

will discuss a few important ideas. First, we will give an example of

using a geometric procedure to solve a two-variable linear program.

Although this example does not immediately generalize to an efficient

algorithm for larger problems, it introduces some important concepts

for linear programming and for optimization in general.

A two-variable linear program

Let us first consider the following linear program with two variables:

maximize  x1 + x2    (29.18)

subject to

4x1 − x2 ≤ 8    (29.19)
2x1 + x2 ≤ 10    (29.20)
5x1 − 2x2 ≥ −2    (29.21)
x1, x2 ≥ 0.

Figure 29.2(a) graphs the constraints in the ( x 1, x 2)-Cartesian coordinate system. The feasible region in the two-dimensional space

(highlighted in blue in the figure) is convex. 1 Conceptually, you could

evaluate the objective function x 1 + x 2 at each point in the feasible region, and then identify a point that has the maximum objective value

as an optimal solution. For this example (and for most linear

programs), however, the feasible region contains an infinite number of

points, and so to solve this linear program, you need an efficient way to

find a point that achieves the maximum objective value without

explicitly evaluating the objective function at every point in the feasible

region.

In two dimensions, you can optimize via a graphical procedure. The

set of points for which x 1 + x 2 = z, for any z, is a line with a slope of

−1. Plotting x 1 + x 2 = 0 produces the line with slope −1 through the

origin, as in Figure 29.2(b). The intersection of this line and the feasible region is the set of feasible solutions that have an objective value of 0. In

this case, that intersection of the line with the feasible region is the

single point (0, 0). More generally, for any value z, the intersection of

the line x 1 + x 2 = z and the feasible region is the set of feasible solutions that have objective value z. Figure 29.2(b) shows the lines x 1 +

x 2 = 0, x 1 + x 2 = 4, and x 1 + x 2 = 8. Because the feasible region in

Figure 29.2 is bounded, there must be some maximum value z for which the intersection of the line x 1 + x 2 = z and the feasible region is nonempty. Any point in the feasible region that maximizes x 1 + x 2 is an optimal solution to the linear program, which in this case is the vertex

of the feasible region at x 1 = 2 and x 2 = 6, with objective value 8.


Figure 29.2 (a) The linear program given in (29.18)–(29.21). Each constraint is represented by a line and a direction. The intersection of the constraints, which is the feasible region, is highlighted in blue. (b) The red lines show, respectively, the points for which the objective value is 0, 4, and 8. The optimal solution to the linear program is x 1 = 2 and x 2 = 6 with objective value 8.

It is no accident that an optimal solution to the linear program

occurs at a vertex of the feasible region. The maximum value of z for

which the line x 1 + x 2 = z intersects the feasible region must be on the boundary of the feasible region, and thus the intersection of this line

with the boundary of the feasible region is either a single vertex or a line

segment. If the intersection is a single vertex, then there is just one

optimal solution, and it is that vertex. If the intersection is a line

segment, every point on that line segment must have the same objective

value. In particular, both endpoints of the line segment are optimal

solutions. Since each endpoint of a line segment is a vertex, there is an

optimal solution at a vertex in this case as well.
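The vertex argument suggests a brute-force method for tiny instances: intersect every pair of constraint boundary lines, keep the feasible intersection points, and take the best one. The sketch below does this for the two-variable example, with the constraint coefficients assumed to be 4x1 − x2 ≤ 8, 2x1 + x2 ≤ 10, 5x1 − 2x2 ≥ −2, and x1, x2 ≥ 0 (our reading of (29.19)–(29.21); the text confirms the optimum (2, 6) with objective value 8).

```python
from fractions import Fraction as F
from itertools import combinations

# each constraint written as a*x1 + b*x2 <= c (coefficients assumed, see above)
cons = [(4, -1, 8), (2, 1, 10), (-5, 2, 2), (-1, 0, 0), (0, -1, 0)]

def feasible(p):
    return all(a * p[0] + b * p[1] <= c for a, b, c in cons)

best = None
for (a1, b1, c1), (a2, b2, c2) in combinations(cons, 2):
    det = a1 * b2 - a2 * b1
    if det == 0:
        continue                                  # parallel boundaries: no vertex
    # intersection of the two boundary lines, by Cramer's rule
    p = (F(c1 * b2 - c2 * b1, det), F(a1 * c2 - a2 * c1, det))
    if feasible(p) and (best is None or p[0] + p[1] > best[0] + best[1]):
        best = p

print(best[0], best[1], best[0] + best[1])  # → 2 6 8
```

With n variables and m constraints this enumeration examines O(mⁿ) candidate vertices, which is exactly why practical algorithms such as simplex walk between vertices instead of listing them all.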

Although you cannot easily graph linear programs with more than

two variables, the same intuition holds. If you have three variables, then

each constraint corresponds to a half-space in three-dimensional space.

The intersection of these half-spaces forms the feasible region. The set

of points for which the objective function obtains a given value z is now

a plane (assuming no degenerate conditions). If all coefficients of the

objective function are nonnegative, and if the origin is a feasible

solution to the linear program, then as you move this plane away from

the origin, in a direction normal to the objective function, you find points of increasing objective value. (If the origin is not feasible or if

some coefficients in the objective function are negative, the intuitive

picture becomes slightly more complicated.) As in two dimensions,

because the feasible region is convex, the set of points that achieve the

optimal objective value must include a vertex of the feasible region.

Similarly, if you have n variables, each constraint defines a half-space in

n-dimensional space. We call the feasible region formed by the

intersection of these half-spaces a simplex. The objective function is now a hyperplane and, because of convexity, an optimal solution still

occurs at a vertex of the simplex. Any algorithm for linear programming

must also identify linear programs that have no solutions, as well as

linear programs that have no finite optimal solution.

The simplex algorithm takes as input a linear program and returns an

optimal solution. It starts at some vertex of the simplex and performs a

sequence of iterations. In each iteration, it moves along an edge of the

simplex from a current vertex to a neighboring vertex whose objective

value is no smaller than that of the current vertex (and usually is larger).

The simplex algorithm terminates when it reaches a local maximum,

which is a vertex from which all neighboring vertices have a smaller

objective value. Because the feasible region is convex and the objective

function is linear, this local optimum is actually a global optimum. In

Section 29.3, we’ll see an important concept called “duality,” which we’ll use to prove that the solution returned by the simplex algorithm is

indeed optimal.
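To make the iteration concrete, here is a compact tableau version of that idea. It is a teaching sketch, not production code: it assumes every bi ≥ 0 (so the origin is a feasible starting vertex) and omits an anti-cycling rule.

```python
def simplex(c, A, b):
    """Maximize c.x subject to A x <= b and x >= 0, assuming b >= 0."""
    m, n = len(A), len(c)
    # tableau rows: constraints with slack variables, then the objective row
    T = [A[i] + [1.0 if j == i else 0.0 for j in range(m)] + [b[i]]
         for i in range(m)]
    T.append([-cj for cj in c] + [0.0] * (m + 1))
    basis = list(range(n, n + m))                 # slack variables start basic
    while True:
        e = min(range(n + m), key=lambda j: T[m][j])   # entering variable
        if T[m][e] >= 0:
            break                                 # no improving neighbor: optimal
        ratios = [(T[i][-1] / T[i][e], i) for i in range(m) if T[i][e] > 1e-12]
        if not ratios:
            raise ValueError("linear program is unbounded")
        _, l = min(ratios)                        # leaving row (minimum ratio)
        piv = T[l][e]
        T[l] = [v / piv for v in T[l]]            # pivot: move to neighbor vertex
        for i in range(m + 1):
            if i != l and T[i][e] != 0:
                f = T[i][e]
                T[i] = [vi - f * vl for vi, vl in zip(T[i], T[l])]
        basis[l] = e
    x = [0.0] * n
    for i, j in enumerate(basis):
        if j < n:
            x[j] = T[i][-1]
    return x, T[m][-1]

# the two-variable example: optimum at the vertex (2, 6) with objective value 8
x, z = simplex([1, 1], [[4, -1], [2, 1], [-5, 2]], [8, 10, 2])
print([round(v, 6) for v in x], round(z, 6))
```

Each pass of the loop is one simplex iteration: pick an entering variable whose objective coefficient is negative, pick the leaving variable by the minimum-ratio test, and pivot to the neighboring vertex.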

The simplex algorithm, when implemented carefully, often solves

general linear programs quickly in practice. With some carefully

contrived inputs, however, the simplex algorithm can require

exponential time. The first polynomial-time algorithm for linear

programming was the ellipsoid algorithm, which runs slowly in practice.

A second class of polynomial-time algorithms are known as interior-

point methods. In contrast to the simplex algorithm, which moves along

the exterior of the feasible region and maintains a feasible solution that

is a vertex of the simplex at each iteration, these algorithms move

through the interior of the feasible region. The intermediate solutions,

while feasible, are not necessarily vertices of the simplex, but the final

solution is a vertex. For large inputs, interior-point algorithms can run as fast as, and sometimes faster than, the simplex algorithm. The

chapter notes point you to more information about these algorithms.

If you add to a linear program the additional requirement that all

variables take on integer values, you have an integer linear program.

Exercise 34.5-3 on page 1098 asks you to show that just finding a

feasible solution to this problem is NP-hard. Since no polynomial-time

algorithms are known for any NP-hard problems, there is no known

polynomial-time algorithm for integer linear programming. In contrast,

a general linear-programming problem can be solved in polynomial

time.

Exercises

29.1-1

Consider the linear program

minimize −2 x 1 + 3 x 2

subject to

x 1 + x 2 = 7

x 1 − 2 x 2 ≤ 4

x 1 ≥ 0.

Give three feasible solutions to this linear program. What is the

objective value of each one?

29.1-2

Consider the following linear program, which has a nonpositivity

constraint:

minimize 2x1 + 7x2 + x3

subject to

x1 − x3 = 7
3x1 + x2 ≥ 24
x2 ≥ 0
x3 ≤ 0.

Give three feasible solutions to this linear program. What is the

objective value of each one?

29.1-3

Show that the following linear program is infeasible:

maximize 3 x 1 − 2 x 2

subject to

x 1 + x 2 ≤ 2

−2 x 1 − 2 x 2 ≤ −10

x 1, x 2 ≥ 0.

29.1-4

Show that the following linear program is unbounded:

maximize x 1 − x 2

subject to

−2 x 1 + x 2 ≤ −1

x 1 − 2 x 2 ≤ −2

x 1, x 2 ≥ 0.

29.1-5

Give an example of a linear program for which the feasible region is not

bounded, but the optimal objective value is finite.

29.1-6

Sometimes, in a linear program, you need to convert constraints from

one form to another.

a. Show how to convert an equality constraint into an equivalent set of inequalities. That is, given a constraint Σ_{j=1}^{n} a_j x_j = b, give a set of inequalities that will be satisfied if and only if Σ_{j=1}^{n} a_j x_j = b.

b. Show how to convert an inequality constraint Σ_{j=1}^{n} a_j x_j ≤ b into an equality constraint and a nonnegativity constraint. You will need to introduce an additional variable s, and use the constraint that s ≥ 0.

29.1-7

Explain how to convert a minimization linear program to an equivalent

maximization linear program, and argue that your new linear program

is equivalent to the original one.

29.1-8

In the political problem at the beginning of this chapter, there are

feasible solutions that correspond to winning more voters than there

actually are in the district. For example, you can set x 2 to 200, x 3 to 200, and x 1 = x 4 = 0. That solution is feasible, yet it seems to say that you will win 400,000 suburban voters, even though there are only

200,000 actual suburban voters. What constraints can you add to the

linear program to ensure that you never seem to win more voters than

there actually are? Even if you don’t add these constraints, argue that

the optimal solution to this linear program can never win more voters

than there actually are in the district.

29.2 Formulating problems as linear programs

Linear programming has many applications. Any textbook on

operations research is filled with examples of linear programming, and

linear programming has become a standard tool taught to students in

most business schools. The election scenario is one typical example.

Here are two more examples:

An airline wishes to schedule its flight crews. The Federal Aviation

Administration imposes several constraints, such as limiting the

number of consecutive hours that each crew member can work

and insisting that a particular crew work only on one model of aircraft during each month. The airline wants to schedule crews

on all of its flights using as few crew members as possible.

An oil company wants to decide where to drill for oil. Siting a

drill at a particular location has an associated cost and, based on

geological surveys, an expected payoff of some number of barrels

of oil. The company has a limited budget for locating new drills

and wants to maximize the amount of oil it expects to find, given

this budget.

Linear programs also model and solve graph and combinatorial

problems, such as those appearing in this book. We have already seen a

special case of linear programming used to solve systems of difference

constraints in Section 22.4. In this section, we’ll study how to formulate

several graph and network-flow problems as linear programs. Section

35.4 uses linear programming as a tool to find an approximate solution

to another graph problem.

Perhaps the most important aspect of linear programming is to be

able to recognize when you can formulate a problem as a linear

program. Once you cast a problem as a polynomial-sized linear

program, you can solve it in polynomial time by the ellipsoid algorithm

or interior-point methods. Several linear-programming software

packages can solve problems efficiently, so that once the problem is in

the form of a linear program, such a package can solve it.

We’ll look at several concrete examples of linear-programming

problems. We start with two problems that we have already studied: the

single-source shortest-paths problem from Chapter 22 and the maximum-flow problem from Chapter 24. We then describe the minimum-cost-flow problem. (Although the minimum-cost-flow

problem has a polynomial-time algorithm that is not based on linear

programming, we won’t describe the algorithm.) Finally, we describe the

multicommodity-flow problem, for which the only known polynomial-

time algorithm is based on linear programming.

When we solved graph problems in Part VI, we used attribute notation, such as v. d and ( u, v). f. Linear programs typically use subscripted variables rather than objects with attached attributes,


however. Therefore, when we express variables in linear programs, we

indicate vertices and edges through subscripts. For example, we denote

the shortest-path weight for vertex v not by v. d but by dv, and we denote the flow from vertex u to vertex v not by ( u, v). f but by fuv. For quantities that are given as inputs to problems, such as edge weights or

capacities, we continue to use notations such as w( u, v) and c( u, v).

Shortest paths

We can formulate the single-source shortest-paths problem as a linear

program. We’ll focus on how to formulate the single-pair shortest-path

problem, leaving the extension to the more general single-source

shortest-paths problem as Exercise 29.2-2.

In the single-pair shortest-path problem, the input is a weighted,

directed graph G = ( V, E), with weight function w : E → ℝ mapping edges to real-valued weights, a source vertex s, and destination vertex t.

The goal is to compute the value dt, which is the weight of a shortest

path from s to t. To express this problem as a linear program, you need to determine a set of variables and constraints that define when you

have a shortest path from s to t. The triangle inequality (Lemma 22.10

on page 633) gives dv ≤ du + w(u, v) for each edge (u, v) ∈ E. The source vertex initially receives a value ds = 0, which never changes. Thus the following linear program expresses the shortest-path weight from s to t:

maximize  dt    (29.22)

subject to

dv ≤ du + w(u, v)  for each edge (u, v) ∈ E,    (29.23)
ds = 0.    (29.24)

You might be surprised that this linear program maximizes an objective

function when it is supposed to compute shortest paths. Minimizing the

objective function would be a mistake, because when all the edge

weights are nonnegative, setting d̄v = 0 for all v ∈ V (recall that a bar over a variable name denotes a specific setting of the variable’s value)

would yield an optimal solution to the linear program without solving


the shortest-paths problem. Maximizing is the right thing to do because

an optimal solution to the shortest-paths problem sets each dv to min {du + w(u, v) : u ∈ V and (u, v) ∈ E}, so that dv is the largest value that is less than or equal to all of the values in the set {du + w(u, v) : u ∈ V and (u, v) ∈ E}.

Therefore, it makes sense to maximize dv for all vertices v on a shortest path from s to t subject to these constraints, and maximizing dt achieves this goal.

This linear program has |V| variables dv, one for each vertex v ∈ V. It also has |E| + 1 constraints: one for each edge, plus the additional constraint that the source vertex’s shortest-path weight always has the

value 0.
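The maximization argument can be spot-checked on a small graph: the true shortest-path weights, computed here by Bellman–Ford-style relaxation, satisfy every constraint dv ≤ du + w(u, v) along with ds = 0, and dt equals the weight of an actual shortest path. The graph below is our own toy example.

```python
V = ['s', 'a', 'b', 't']
w = {('s', 'a'): 1, ('s', 'b'): 4, ('a', 'b'): 2, ('a', 't'): 6, ('b', 't'): 3}

# Bellman-Ford relaxation from source s
d = {v: float('inf') for v in V}
d['s'] = 0
for _ in range(len(V) - 1):
    for (u, v), wt in w.items():
        d[v] = min(d[v], d[u] + wt)

# the shortest-path weights are a feasible solution to the linear program:
# d_s = 0 and every edge constraint d_v <= d_u + w(u, v) holds
assert d['s'] == 0
assert all(d[v] <= d[u] + wt for (u, v), wt in w.items())
print(d['t'])  # → 6, the weight of the shortest path s -> a -> b -> t
```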

Maximum flow

Next, let’s express the maximum-flow problem as a linear program.

Recall that the input is a directed graph G = ( V, E) in which each edge ( u, v) ∈ E has a nonnegative capacity c( u, v) ≥ 0, and two distinguished vertices: a source s and a sink t. As defined in Section 24.1, a flow is a nonnegative real-valued function f : V × V → ℝ that satisfies the capacity constraint and flow conservation. A maximum flow is a flow

that satisfies these constraints and maximizes the flow value, which is

the total flow coming out of the source minus the total flow into the

source. A flow, therefore, satisfies linear constraints, and the value of a

flow is a linear function. Recalling also that we assume that c(u, v) = 0 if (u, v) ∉ E and that there are no antiparallel edges, the maximum-flow problem can be expressed as a linear program:

maximize  Σ_{v∈V} fsv − Σ_{v∈V} fvs    (29.25)

subject to

fuv ≤ c(u, v)  for each u, v ∈ V,    (29.26)
Σ_{v∈V} fvu = Σ_{v∈V} fuv  for each u ∈ V − {s, t},    (29.27)
fuv ≥ 0  for each u, v ∈ V.    (29.28)

This linear program has |V|² variables, corresponding to the flow between each pair of vertices, and it has 2|V|² + |V| − 2 constraints.

It is usually more efficient to solve a smaller-sized linear program.

The linear program in (29.25)–(29.28) has, for ease of notation, a flow

and capacity of 0 for each pair of vertices u, v with ( u, v) ∉ E. It is more efficient to rewrite the linear program so that it has O( V + E) constraints. Exercise 29.2-4 asks you to do so.

Minimum-cost flow

In this section, we have used linear programming to solve problems for

which we already knew efficient algorithms. In fact, an efficient

algorithm designed specifically for a problem, such as Dijkstra’s

algorithm for the single-source shortest-paths problem, will often be

more efficient than linear programming, both in theory and in practice.

The real power of linear programming comes from the ability to

solve new problems. Recall the problem faced by the politician in the

beginning of this chapter. The problem of obtaining a sufficient number

of votes, while not spending too much money, is not solved by any of

the algorithms that we have studied in this book, yet it can be solved by

linear programming. Books abound with such real-world problems that

linear programming can solve. Linear programming is also particularly

useful for solving variants of problems for which we may not already

know of an efficient algorithm.

Figure 29.3 (a) An example of a minimum-cost-flow problem. Capacities are denoted by c and costs by a. Vertex s is the source, and vertex t is the sink. The goal is to send 4 units of flow from s to t. (b) A solution to the minimum-cost flow problem in which 4 units of flow are sent from s to t. For each edge, the flow and capacity are written as flow/capacity.


Consider, for example, the following generalization of the maximum-

flow problem. Suppose that, in addition to a capacity c( u, v) for each edge ( u, v), you are given a real-valued cost a( u, v). As in the maximum-flow problem, assume that c( u, v) = 0 if ( u, v) ∉ E and that there are no antiparallel edges. If you send fuv units of flow over edge ( u, v), you incur a cost of a( u, v) · fuv. You are also given a flow demand d. You wish to send d units of flow from s to t while minimizing the total cost

∑_{(u,v)∈E} a(u, v) · fuv incurred by the flow. This problem is known as the minimum-cost-flow problem.

Figure 29.3(a) shows an example of the minimum-cost-flow problem,

with a goal of sending 4 units of flow from s to t while incurring the minimum total cost. Any particular legal flow, that is, a function f

satisfying constraints (29.26)–(29.28), incurs a total cost of

∑( u, v)∈ E a( u, v) · fuv. What is the particular 4-unit flow that minimizes this cost? Figure 29.3(b) shows an optimal solution, with total cost

∑( u, v)∈ E a( u, v) · fuv = (2 · 2) + (5 · 2) + (3 · 1) + (7 · 1) + (1 · 3) = 27.

There are polynomial-time algorithms specifically designed for the

minimum-cost-flow problem, but they are beyond the scope of this

book. The minimum-cost-flow problem can be expressed as a linear

program, however. The linear program looks similar to the one for the

maximum-flow problem with the additional constraint that the value of

the flow must be exactly d units, and with the new objective function of

minimizing the cost:

minimize ∑_{(u,v)∈E} a(u, v) · f_{uv}   (29.29)

subject to

f_{uv} ≤ c(u, v) for each u, v ∈ V,
∑_{v∈V} f_{vu} − ∑_{v∈V} f_{uv} = 0 for each u ∈ V − {s, t},   (29.30)
∑_{v∈V} f_{sv} − ∑_{v∈V} f_{vs} = d,
f_{uv} ≥ 0 for each u, v ∈ V.
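To make the formulation concrete, here is a small Python sketch. The 4-vertex network, its capacities, costs, and demand are invented for illustration (this is not the network of Figure 29.3); the functions check exactly the constraints and objective that the linear program encodes:

```python
# Hypothetical 4-vertex network (not the one in Figure 29.3):
# edge -> (capacity c(u, v), cost a(u, v)); flow demand d = 4.
edges = {('s', 'a'): (4, 1), ('s', 'b'): (2, 2), ('a', 'b'): (2, 1),
         ('a', 't'): (3, 3), ('b', 't'): (3, 1)}
d = 4

def is_feasible(flow):
    """Check the LP constraints: capacities, conservation, and demand."""
    for e, f in flow.items():
        if not 0 <= f <= edges[e][0]:            # 0 <= f_uv <= c(u, v)
            return False
    for v in ('a', 'b'):                         # conservation at inner vertices
        into = sum(f for (u, w), f in flow.items() if w == v)
        out = sum(f for (u, w), f in flow.items() if u == v)
        if into != out:
            return False
    return sum(f for (u, w), f in flow.items() if u == 's') == d

def cost(flow):
    """The objective: sum over edges of a(u, v) * f_uv."""
    return sum(edges[e][1] * f for e, f in flow.items())

# A candidate flow sending d = 4 units from s to t.
flow = {('s', 'a'): 3, ('s', 'b'): 1, ('a', 'b'): 2,
        ('a', 't'): 1, ('b', 't'): 3}
```

For this flow, `is_feasible(flow)` holds and `cost(flow)` is 13; a linear-programming solver minimizes this cost over all feasible flows.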

Multicommodity flow


As a final example, let’s consider another flow problem. Suppose that

the Lucky Puck company from Section 24.1 decides to diversify its product line and ship not only hockey pucks, but also hockey sticks and

hockey helmets. Each piece of equipment is manufactured in its own

factory, has its own warehouse, and must be shipped, each day, from

factory to warehouse. The sticks are manufactured in Vancouver and

are needed in Saskatoon, and the helmets are manufactured in

Edmonton and must be shipped to Regina. The capacity of the shipping

network does not change, however, and the different items, or

commodities, must share the same network.

This example is an instance of a multicommodity-flow problem. The

input to this problem is once again a directed graph G = ( V, E) in which each edge ( u, v) ∈ E has a nonnegative capacity c( u, v) ≥ 0. As in the maximum-flow problem, implicitly assume that c( u, v) = 0 for ( u, v) ∉ E

and that the graph has no antiparallel edges. In addition, there are k

different commodities, K 1, K 2, … , Kk, with commodity i specified by the triple Ki = ( si, ti, di). Here, vertex si is the source of commodity i, vertex ti is the sink of commodity i, and di is the demand for commodity i, which is the desired flow value for the commodity from si to ti. We define a flow for commodity i, denoted by fi, (so that fiuv is the flow of commodity i from vertex u to vertex v) to be a real-valued function that satisfies the flow-conservation and capacity constraints. We define fuv, the aggregate flow, to be the sum of the various commodity flows, so that

fuv = f1uv + f2uv + ⋯ + fkuv. The aggregate flow on edge ( u, v) must be no more

than the capacity of edge ( u, v). This problem has no objective function: the question is to determine whether such a flow exists. Thus the linear

program for this problem has a “null” objective function:

minimize 0

subject to

∑_{i=1}^{k} f_{iuv} ≤ c(u, v) for each u, v ∈ V,
∑_{v∈V} f_{ivu} − ∑_{v∈V} f_{iuv} = 0 for each commodity i and each u ∈ V − {s_i, t_i},
∑_{v∈V} f_{i,s_i,v} − ∑_{v∈V} f_{i,v,s_i} = d_i for each commodity i,
f_{iuv} ≥ 0 for each commodity i and each u, v ∈ V.
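A minimal Python sketch of the key shared constraint, using a made-up two-commodity instance (the vertices, capacities, and routings are hypothetical): the aggregate flow on each edge is the sum of the per-commodity flows and must not exceed the edge’s capacity.

```python
# Hypothetical two-commodity instance sharing edge ('m', 't').
cap = {('s1', 'm'): 3, ('s2', 'm'): 2, ('m', 't'): 4}

# Commodity 1: 2 units routed s1 -> m -> t.
f1 = {('s1', 'm'): 2, ('s2', 'm'): 0, ('m', 't'): 2}
# Commodity 2: 2 units routed s2 -> m -> t.
f2 = {('s1', 'm'): 0, ('s2', 'm'): 2, ('m', 't'): 2}

# Aggregate flow f_uv = f_1uv + f_2uv must respect every capacity.
aggregate = {e: f1[e] + f2[e] for e in cap}
feasible = all(aggregate[e] <= cap[e] for e in cap)
```

Here the shared edge ('m', 't') carries the full aggregate of 4 units, exactly at capacity, so the instance is feasible.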

The only known polynomial-time algorithm for this problem expresses

it as a linear program and then solves it with a polynomial-time linear-

programming algorithm.

Exercises

29.2-1

Write out explicitly the linear program corresponding to finding the

shortest path from vertex s to vertex x in Figure 22.2(a) on page 609.

29.2-2

Given a graph G, write a linear program for the single-source shortest-

paths problem. The solution should have the property that dv is the

shortest-path weight from the source vertex s to v for each vertex v ∈ V.

29.2-3

Write out explicitly the linear program corresponding to finding the

maximum flow in Figure 24.1(a).

29.2-4

Rewrite the linear program for maximum flow (29.25)–(29.28) so that it

uses only O( V + E) constraints.

29.2-5

Write a linear program that, given a bipartite graph G = ( V, E), solves the maximum-bipartite-matching problem.

29.2-6

There can be more than one way to model a particular problem as a linear program. This exercise gives an alternative formulation for the

maximum-flow problem. Let P = { P 1, P 2, … , Pp} be the set of all possible directed simple paths from source s to sink t. Using decision variables x 1, … , xp, where xi is the amount of flow on path i, formulate a linear program for the maximum-flow problem. What is an upper

bound on p, the number of directed simple paths from s to t?

29.2-7

In the minimum-cost multicommodity-flow problem, the input is a

directed graph G = ( V, E) in which each edge ( u, v) ∈ E has a nonnegative capacity c( u, v) ≥ 0 and a cost a( u, v). As in the multicommodity-flow problem, there are k different commodities, K 1, K 2, … , Kk, with commodity i specified by the triple Ki = ( si, ti, di). We define the flow fi for commodity i and the aggregate flow fuv on edge ( u, v) as in the multicommodity-flow problem. A feasible flow is one in

which the aggregate flow on each edge ( u, v) is no more than the capacity of edge ( u, v). The cost of a flow is ∑_{(u,v)∈E} a( u, v) · fuv, and the goal is to find the feasible flow of minimum cost. Express this problem

as a linear program.

29.3 Duality

We will now introduce a powerful concept called linear-programming

duality. In general, given a maximization problem, duality allows you to

formulate a related minimization problem that has the same objective

value. The idea of duality is actually more general than linear

programming, but we restrict our attention to linear programming in

this section.

Duality enables us to prove that a solution is indeed optimal. We saw

an example of duality in Chapter 24 with Theorem 24.6, the max-flow min-cut theorem. Suppose that, given an instance of a maximum-flow

problem, you find a flow f with value | f|. How do you know whether f is a maximum flow? By the max-flow min-cut theorem, if you can find a


cut whose value is also | f|, then you have verified that f is indeed a maximum flow. This relationship provides an example of duality: given

a maximization problem, define a related minimization problem such

that the two problems have the same optimal objective values.

Given a linear program in standard form in which the objective is to

maximize, let’s see how to formulate a dual linear program in which the

objective is to minimize and whose optimal value is identical to that of

the original linear program. When referring to dual linear programs, we

call the original linear program the primal.

Given the primal linear program

maximize ∑_{j=1}^{n} c_j x_j   (29.31)

subject to

∑_{j=1}^{n} a_{ij} x_j ≤ b_i for i = 1, 2, … , m,   (29.32)
x_j ≥ 0 for j = 1, 2, … , n,   (29.33)

its dual is

minimize ∑_{i=1}^{m} b_i y_i   (29.34)

subject to

∑_{i=1}^{m} a_{ij} y_i ≥ c_j for j = 1, 2, … , n,   (29.35)
y_i ≥ 0 for i = 1, 2, … , m.   (29.36)

Mechanically, to form the dual, change the maximization to a

minimization, exchange the roles of coefficients on the right-hand sides

and in the objective function, and replace each ≤ by ≥. Each of the m

constraints in the primal corresponds to a variable yi in the dual.

Likewise, each of the n constraints in the dual corresponds to a variable

xj in the primal. For example, consider the following primal linear

program:

maximize 3x_1 + x_2 + 4x_3   (29.37)

subject to

x_1 + x_2 + 3x_3 ≤ 30   (29.38)
2x_1 + 2x_2 + 5x_3 ≤ 24   (29.39)
4x_1 + x_2 + 2x_3 ≤ 36   (29.40)
x_1, x_2, x_3 ≥ 0.   (29.41)

Its dual is

minimize 30y_1 + 24y_2 + 36y_3

subject to

y_1 + 2y_2 + 4y_3 ≥ 3
y_1 + 2y_2 + y_3 ≥ 1
3y_1 + 5y_2 + 2y_3 ≥ 4
y_1, y_2, y_3 ≥ 0.
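The mechanical recipe can be sketched in a few lines of Python (a toy helper, not a standard library routine), applied here to the coefficients of the example primal (29.37)–(29.41):

```python
def dual(c, A, b):
    """Form the dual of: maximize c^T x subject to Ax <= b, x >= 0.

    Returns (b, A_transpose, c), read as: minimize b^T y
    subject to A^T y >= c, y >= 0.
    """
    A_T = [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]
    return b, A_T, c

# The example primal: maximize 3x1 + x2 + 4x3.
c = [3, 1, 4]
A = [[1, 1, 3], [2, 2, 5], [4, 1, 2]]
b = [30, 24, 36]
obj, A_T, rhs = dual(c, A, b)
```

Here `obj` is (30, 24, 36), and each row of `A_T` paired with the corresponding entry of `rhs` gives one dual constraint, e.g. y1 + 2y2 + 4y3 ≥ 3.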

Although forming the dual can be considered a mechanical

operation, there is an intuitive explanation. Consider the primal

maximization problem (29.37)–(29.41). Each constraint gives an upper

bound on the objective function. In addition, if you take one or more

constraints and add together nonnegative multiples of them, you get a

valid constraint. For example, you can add constraints (29.38) and

(29.39) to obtain the constraint 3 x 1 + 3 x 2 + 8 x 3 ≤ 54. Any feasible solution to the primal must satisfy this new constraint, but there is

something else interesting about it. Comparing this new constraint to

the objective function (29.37), you can see that for each variable, the

corresponding coefficient is at least as large as the coefficient in the

objective function. Thus, since the variables x 1, x 2 and x 3 are nonnegative, we have that

3 x 1 + x 2 + 4 x 3 ≤ 3 x 1 + 3 x 2 + 8 x 3 ≤ 54, and so the solution value to the primal is at most 54. In other words,

adding these two constraints together has generated an upper bound on

the objective value.
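A quick numeric check of this reasoning in Python: with multipliers y = (1, 1, 0), which corresponds to adding constraints (29.38) and (29.39), the combined coefficients dominate those of the objective, giving the upper bound 54:

```python
# Data of the example primal: maximize 3x1 + x2 + 4x3.
c = [3, 1, 4]
A = [[1, 1, 3], [2, 2, 5], [4, 1, 2]]
b = [30, 24, 36]

y = [1, 1, 0]   # add the first two constraints, ignore the third

# Coefficient of x_j in the combined constraint: sum over i of y_i * a_ij.
combined = [sum(y[i] * A[i][j] for i in range(3)) for j in range(3)]
bound = sum(y[i] * b[i] for i in range(3))

# The combination is a valid upper bound because it dominates c termwise.
dominates = all(combined[j] >= c[j] for j in range(3))
```

The combined constraint is 3x1 + 3x2 + 8x3 ≤ 54, matching the text.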

In general, for any nonnegative multipliers y_1, y_2, and y_3, you can generate a constraint

y_1(x_1 + x_2 + 3x_3) + y_2(2x_1 + 2x_2 + 5x_3) + y_3(4x_1 + x_2 + 2x_3) ≤ 30y_1 + 24y_2 + 36y_3

from the primal constraints or, by distributing and regrouping,

(y_1 + 2y_2 + 4y_3)x_1 + (y_1 + 2y_2 + y_3)x_2 + (3y_1 + 5y_2 + 2y_3)x_3 ≤ 30y_1 + 24y_2 + 36y_3.

Now, as long as this constraint has coefficients of x 1, x 2, and x 3 that are at least their objective-function coefficients, it is a valid upper

bound. That is, as long as

y 1 + 2 y 2 + 4 y 3 ≥ 3,

y 1 + 2 y 2 + y 3 ≥ 1,

3 y 1 + 5 y 2 + 2 y 3 ≥ 4,

you have a valid upper bound of 30 y 1+24 y 2+36 y 3. The multipliers y 1, y 2, and y 3 must be nonnegative, because otherwise you cannot combine

the inequalities. Of course, you would like the upper bound to be as

small as possible, and so you want to choose y to minimize 30 y 1 + 24 y 2

+ 36 y 3. Observe that we have just described the dual linear program as

the problem of finding the smallest possible upper bound on the primal.

We’ll formalize this idea and show in Theorem 29.4 that, if the linear

program and its dual are feasible and bounded, then the optimal value

of the dual linear program is always equal to the optimal value of the

primal linear program. We begin by demonstrating weak duality, which

states that any feasible solution to the primal linear program has a value

no greater than that of any feasible solution to the dual linear program.

Lemma 29.1 (Weak linear-programming duality)

Let x be any feasible solution to the primal linear program in (29.31)–(29.33), and let ӯ be any feasible solution to its dual linear program in (29.34)–(29.36). Then

∑_{j=1}^{n} c_j x_j ≤ ∑_{i=1}^{m} b_i ӯ_i.

Proof We have

∑_{j=1}^{n} c_j x_j ≤ ∑_{j=1}^{n} (∑_{i=1}^{m} a_{ij} ӯ_i) x_j   (by the dual constraints (29.35) and x_j ≥ 0)
= ∑_{i=1}^{m} (∑_{j=1}^{n} a_{ij} x_j) ӯ_i
≤ ∑_{i=1}^{m} b_i ӯ_i   (by the primal constraints (29.32) and ӯ_i ≥ 0).
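Weak duality is easy to check numerically. The sketch below picks arbitrary feasible solutions to the example primal and dual from earlier in the section (the particular vectors are chosen only for illustration):

```python
c = [3, 1, 4]                       # objective of the example primal
A = [[1, 1, 3], [2, 2, 5], [4, 1, 2]]
b = [30, 24, 36]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

x = [1, 1, 1]                       # a (far from optimal) primal-feasible point
y = [1, 1, 1]                       # a dual-feasible point

assert all(dot(A[i], x) <= b[i] for i in range(3))        # Ax <= b
assert all(dot([A[i][j] for i in range(3)], y) >= c[j]    # A^T y >= c
           for j in range(3))

primal_value = dot(c, x)
dual_value = dot(b, y)
```

Here the primal value is 8 and the dual value is 90; weak duality guarantees the first can never exceed the second.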

Corollary 29.2

Let x be a feasible solution to the primal linear program in (29.31)–

(29.33), and let ӯ be a feasible solution to its dual linear program in (29.34)–(29.36). If

∑_{j=1}^{n} c_j x_j = ∑_{i=1}^{m} b_i ӯ_i,

then x and ӯ are optimal solutions to the primal and dual linear programs, respectively.

Proof By Lemma 29.1, the objective value of a feasible solution to the

primal cannot exceed that of a feasible solution to the dual. The primal

linear program is a maximization problem and the dual is a

minimization problem. Thus, if feasible solutions x and ӯ have the same objective value, neither can be improved.

We now show that, at optimality, the primal and dual objective

values are indeed equal. To prove linear programming duality, we will

require one lemma from linear algebra, known as Farkas’s lemma, the

proof of which Problem 29-4 asks you to provide. Farkas’s lemma can

take several forms, each of which is about when a set of linear inequalities

has a solution. In stating the lemma, we use m + 1 as a dimension because it matches our use below.

Lemma 29.3 (Farkas’s lemma)

Given M ∈ ℝ^{(m+1)×n} and g ∈ ℝ^{m+1}, exactly one of the following statements is true:

1. There exists v ∈ ℝ^n such that Mv ≤ g.

2. There exists w ∈ ℝ^{m+1} such that w ≥ 0, w^T M = 0 (an n-vector of all zeros), and w^T g < 0.
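A tiny hand-built instance may help. The system v ≤ −1, −v ≤ 0 is infeasible, so condition 1 fails, and Farkas’s lemma then guarantees a certificate w as in condition 2 (here found by inspection):

```python
# Infeasible system Mv <= g: it demands v <= -1 and -v <= 0 (i.e., v >= 0).
M = [[1], [-1]]        # here m + 1 = 2 rows, n = 1 column
g = [-1, 0]

w = [1, 1]             # nonnegative certificate, found by inspection

# w^T M = 0 and w^T g < 0: adding the rows gives 0 <= -1, a contradiction.
wTM = [sum(w[i] * M[i][j] for i in range(2)) for j in range(1)]
wTg = sum(w[i] * g[i] for i in range(2))
```

The certificate w combines the inequalities into the absurdity 0 ≤ −1, witnessing infeasibility.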

Theorem 29.4 (Linear-programming duality)

Given the primal linear program in (29.31)–(29.33) and its

corresponding dual in (29.34)–(29.36), if both are feasible and bounded,

then for optimal solutions x* and y*, we have c T x* = b T y*.

Proof Let μ = b^T y* be the optimal value of the dual linear program given in (29.34)–(29.36). Consider an augmented set of primal constraints in which we add a constraint to (29.31)–(29.33) that the objective value is at least μ. We write out this augmented primal as

Ax ≤ b,   (29.47)
c^T x ≥ μ,   (29.48)
x ≥ 0.

We can multiply (29.48) through by −1 and rewrite (29.47)–(29.48) as

Mx ≤ g.   (29.49)

Here, M denotes the (m+1)×n matrix obtained by appending the row −c^T to A, x is an n-vector, and g denotes the (m+1)-vector obtained by appending the entry −μ to b.

We claim that if there is a feasible solution x to the augmented

primal, then the theorem is proved. To establish this claim, observe that

x is also a feasible solution to the original primal and that it has objective value at least μ. We can then apply Lemma 29.1, which states

that the objective value of the primal is at most μ, to complete the proof

of the theorem.

It therefore remains to show that the augmented primal has a

feasible solution. Suppose, for the purpose of contradiction, that the augmented primal is infeasible, which means that there is no v ∈ ℝ^n such that Mv ≤ g. We can apply Farkas’s lemma, Lemma 29.3, to inequality (29.49) with this choice of M and g.

Because the augmented primal is infeasible, condition 1 of Farkas’s

lemma does not hold. Therefore, condition 2 must apply, so that there

must exist a w ∈ ℝ^{m+1} such that w ≥ 0, w^T M = 0, and w^T g < 0. Let’s write w as the vector ӯ ∈ ℝ^m of its first m entries followed by the scalar λ ∈ ℝ as its last entry, where ӯ ≥ 0 and λ ≥ 0. Substituting for w, M, and g in condition 2 gives

ӯ^T A − λc^T = 0 and ӯ^T b − λμ < 0.

Unpacking the matrix notation gives

ӯ^T A = λc^T and ӯ^T b < λμ.   (29.50)

We now show that the requirements in (29.50) contradict the

assumption that μ is the optimal solution value for the dual linear

program. We consider two cases.

The first case is when λ = 0. In this case, (29.50) simplifies to

ӯ^T A = 0 and ӯ^T b < 0.   (29.51)

We’ll now construct a dual feasible solution y′ with an objective value

smaller than b^T y*. Set y′ = y* + ϵӯ, for any ϵ > 0. Since

y′^T A = (y* + ϵӯ)^T A
= y*^T A + ϵӯ^T A
= y*^T A   (by (29.51))
≥ c^T   (because y* is feasible),

y′ is feasible. Now consider the objective value

b^T y′ = b^T(y* + ϵӯ)
= b^T y* + ϵb^T ӯ
< b^T y*,

where the last inequality follows because ϵ > 0 and, by (29.51), b^T ӯ = ӯ^T b < 0 (both are the inner product of b and ӯ), so that ϵb^T ӯ < 0. Thus we have a feasible dual solution of value less than μ, which contradicts μ being the optimal objective value.

We now consider the second case, where λ > 0. In this case, we can

take (29.50) and divide through by λ to obtain

(ӯ^T/λ) A = c^T and (ӯ^T/λ) b < μ.   (29.52)

Now set y′ = ӯ/λ in (29.52), giving

y′^T A = c^T and y′^T b < μ.

Thus, y′ is a feasible dual solution with objective value strictly less than

μ, a contradiction. We conclude that the augmented primal has a

feasible solution, and the theorem is proved.

Fundamental theorem of linear programming

We conclude this chapter by stating the fundamental theorem of linear

programming, which extends Theorem 29.4 to the cases when the linear

program may be either feasible or unbounded. Exercise 29.3-8 asks you

to provide the proof.

Theorem 29.5 (Fundamental theorem of linear programming)

Any linear program, given in standard form, either

1. has an optimal solution with a finite objective value,

2. is infeasible, or

3. is unbounded.

Exercises

29.3-1

Formulate the dual of the linear program given in lines (29.6)–(29.10)

on page 852.

29.3-2

You have a linear program that is not in standard form. You could

produce the dual by first converting it to standard form, and then

taking the dual. It would be more convenient, however, to produce the

dual directly. Explain how to directly take the dual of an arbitrary linear

program.

29.3-3

Write down the dual of the maximum-flow linear program, as given in

lines (29.25)–(29.28) on page 862. Explain how to interpret this

formulation as a minimum-cut problem.

29.3-4

Write down the dual of the minimum-cost-flow linear program, as given

in lines (29.29)–(29.30) on page 864. Explain how to interpret this

problem in terms of graphs and flows.

29.3-5

Show that the dual of the dual of a linear program is the primal linear

program.

29.3-6

Which result from Chapter 24 can be interpreted as weak duality for the maximum-flow problem?

29.3-7

Consider the following 1-variable primal linear program:

maximize tx

subject to

rx ≤ s

x ≥ 0,

where r, s, and t are arbitrary real numbers. State for which values of r, s, and t you can assert that

1. Both the primal linear program and its dual have optimal

solutions with finite objective values.

2. The primal is feasible, but the dual is infeasible.

3. The dual is feasible, but the primal is infeasible.

4. Neither the primal nor the dual is feasible.

29.3-8

Prove the fundamental theorem of linear programming, Theorem 29.5.

Problems

29-1 Linear-inequality feasibility

Given a set of m linear inequalities on n variables x 1, x 2, … , xn, the linear-inequality feasibility problem asks whether there is a setting of the

variables that simultaneously satisfies each of the inequalities.

a. Given an algorithm for the linear-programming problem, show how

to use it to solve a linear-inequality feasibility problem. The number

of variables and constraints that you use in the linear-programming

problem should be polynomial in n and m.

b. Given an algorithm for the linear-inequality feasibility problem, show

how to use it to solve a linear-programming problem. The number of

variables and linear inequalities that you use in the linear-inequality

feasibility problem should be polynomial in n and m, the number of

variables and constraints in the linear program.

29-2 Complementary slackness

Complementary slackness describes a relationship between the values of

primal variables and dual constraints and between the values of dual

variables and primal constraints. Let x be a feasible solution to the primal linear program given in (29.31)–(29.33), and let ӯ be a feasible


solution to the dual linear program given in (29.34)–(29.36).

Complementary slackness states that the following conditions are necessary and sufficient for x and ӯ to be optimal:

∑_{i=1}^{m} a_{ij} ӯ_i = c_j or x_j = 0, for j = 1, 2, … , n,

and

∑_{j=1}^{n} a_{ij} x_j = b_i or ӯ_i = 0, for i = 1, 2, … , m.

a. Verify that complementary slackness holds for the linear program in

lines (29.37)–(29.41).

b. Prove that complementary slackness holds for any primal linear

program and its corresponding dual.

c. Prove that a feasible solution x to a primal linear program given in lines (29.31)–(29.33) is optimal if and only if there exist values ӯ =

( ӯ 1, ӯ 2, … , ӯm) such that

1. ӯ is a feasible solution to the dual linear program given in

(29.34)–(29.36),

2. ∑_{i=1}^{m} a_{ij} ӯ_i = c_j for all j such that xj > 0, and

3. ӯi = 0 for all i such that ∑_{j=1}^{n} a_{ij} xj < bi.

29-3 Integer linear programming

An integer linear-programming problem is a linear-programming

problem with the additional constraint that the variables x must take on

integer values. Exercise 34.5-3 on page 1098 shows that just determining

whether an integer linear program has a feasible solution is NP-hard,

which means that there is no known polynomial-time algorithm for this

problem.

a. Show that weak duality (Lemma 29.1) holds for an integer linear

program.

b. Show that duality (Theorem 29.4) does not always hold for an integer linear program.

c. Given a primal linear program in standard form, let P be the optimal objective value for the primal linear program, D be the optimal

objective value for its dual, IP be the optimal objective value for the

integer version of the primal (that is, the primal with the added

constraint that the variables take on integer values), and ID be the

optimal objective value for the integer version of the dual. Assuming

that both the primal integer program and the dual integer program

are feasible and bounded, show that

IP ≤ P = D ≤ ID.

29-4 Farkas’s lemma

Prove Farkas’s lemma, Lemma 29.3.

29-5 Minimum-cost circulation

This problem considers a variant of the minimum-cost-flow problem

from Section 29.2 in which there is no demand, source, or sink. Instead, the input, as before, contains a flow network, capacity constraints c( u, v), and edge costs a( u, v). A flow is feasible if it satisfies the capacity constraint on every edge and flow conservation at every vertex. The goal

is to find, among all feasible flows, the one of minimum cost. We call

this problem the minimum-cost-circulation problem.

a. Formulate the minimum-cost-circulation problem as a linear

program.

b. Suppose that for all edges ( u, v) ∈ E, we have a( u, v) > 0. What does an optimal solution to the minimum-cost-circulation problem look

like?

c. Formulate the maximum-flow problem as a minimum-cost-circulation

problem linear program. That is, given a maximum-flow problem

instance G = ( V, E) with source s, sink t and edge capacities c, create a minimum-cost-circulation problem by giving a (possibly different)

network G′ = ( V′, E′) with edge capacities c′ and edge costs a′ such

that you can derive a solution to the maximum-flow problem from a

solution to the minimum-cost-circulation problem.

d. Formulate the single-source shortest-path problem as a minimum-

cost-circulation problem linear program.

Chapter notes

This chapter only begins to study the wide field of linear programming.

A number of books are devoted exclusively to linear programming,

including those by Chvátal [94], Gass [178], Karloff [246], Schrijver

[398], and Vanderbei [444]. Many other books give a good coverage of linear programming, including those by Papadimitriou and Steiglitz

[353] and Ahuja, Magnanti, and Orlin [7]. The coverage in this chapter draws on the approach taken by Chvátal.

The simplex algorithm for linear programming was invented by G.

Dantzig in 1947. Shortly after, researchers discovered how to formulate

a number of problems in a variety of fields as linear programs and solve

them with the simplex algorithm. As a result, applications of linear

programming flourished, along with several algorithms. Variants of the

simplex algorithm remain the most popular methods for solving linear-

programming problems. This history appears in a number of places,

including the notes in [94] and [246].

The ellipsoid algorithm was the first polynomial-time algorithm for

linear programming and is due to L. G. Khachian in 1979. It was based

on earlier work by N. Z. Shor, D. B. Judin, and A. S. Nemirovskii.

Grötschel, Lovász, and Schrijver [201] describe how to use the ellipsoid algorithm to solve a variety of problems in combinatorial optimization.

To date, the ellipsoid algorithm does not appear to be competitive with

the simplex algorithm in practice.

Karmarkar’s paper [247] includes a description of the first interior-point algorithm. Many subsequent researchers designed interior-point

algorithms. Good surveys appear in the article of Goldfarb and Todd

[189] and the book by Ye [463].

Analysis of the simplex algorithm remains an active area of research.

V. Klee and G. J. Minty constructed an example on which the simplex

algorithm runs through 2^n − 1 iterations. The simplex algorithm usually performs well in practice, and many researchers have tried to give

theoretical justification for this empirical observation. A line of research

begun by K. H. Borgwardt, and carried on by many others, shows that

under certain probabilistic assumptions on the input, the simplex

algorithm converges in expected polynomial time. Spielman and Teng

[421] made progress in this area, introducing the “smoothed analysis of algorithms” and applying it to the simplex algorithm.

The simplex algorithm is known to run efficiently in certain special

cases. Particularly noteworthy is the network-simplex algorithm, which

is the simplex algorithm, specialized to network-flow problems. For

certain network problems, including the shortest-paths, maximum-flow,

and minimum-cost-flow problems, variants of the network-simplex

algorithm run in polynomial time. See, for example, the article by Orlin

[349] and the citations therein.

1 An intuitive definition of a convex region is that for any two points in the region, all points on the line segment between them are also in the region.


30 Polynomials and the FFT

The straightforward method of adding two polynomials of degree n

takes Θ( n) time, but the straightforward method of multiplying them

takes Θ( n 2) time. This chapter will show how the fast Fourier transform,

or FFT, can reduce the time to multiply polynomials to Θ( n lg n).

The most common use for Fourier transforms, and hence the FFT, is

in signal processing. A signal is given in the time domain: as a function

mapping time to amplitude. Fourier analysis expresses the signal as a

weighted sum of phase-shifted sinusoids of varying frequencies. The

weights and phases associated with the frequencies characterize the

signal in the frequency domain. Among the many everyday applications

of FFTs are compression techniques used to encode digital video and

audio information, including MP3 files. Many fine books delve into the

rich area of signal processing, and the chapter notes reference a few of

them.

Polynomials

A polynomial in the variable x over an algebraic field F represents a function A( x) as a formal sum:

A(x) = ∑_{j=0}^{n−1} a_j x^j.

The values a 0, a 1, … , an−1 are the coefficients of the polynomial. The coefficients and x are drawn from a field F, typically the set ℂ of complex numbers. A polynomial A( x) has degree k if its highest nonzero


coefficient is ak, in which case we say that degree( A) = k. Any integer strictly greater than the degree of a polynomial is a degree-bound of that

polynomial. Therefore, the degree of a polynomial of degree-bound n

may be any integer between 0 and n − 1, inclusive.

A variety of operations extend to polynomials. For polynomial

addition, if A( x) and B( x) are polynomials of degree-bound n, their sum is a polynomial C( x), also of degree-bound n, such that C( x) =

A( x)+ B( x) for all x in the underlying field. That is, if

A(x) = ∑_{j=0}^{n−1} a_j x^j and B(x) = ∑_{j=0}^{n−1} b_j x^j,

then

C(x) = ∑_{j=0}^{n−1} c_j x^j,

where cj = aj + bj for j = 0, 1, … , n − 1. For example, given the polynomials A( x) = 6 x 3 + 7 x 2 − 10 x + 9 and B( x) = −2 x 3 + 4 x − 5, their sum is C( x) = 4 x 3 + 7 x 2 − 6 x + 4.
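Coefficient-wise addition is a one-liner in Python. The sketch below pads the shorter vector with zeros so polynomials of different degree-bounds can be added; index j holds the coefficient of x^j:

```python
def poly_add(a, b):
    """Add coefficient vectors; a[j] and b[j] hold the coefficients of x^j."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return [a[j] + b[j] for j in range(n)]

# A(x) = 6x^3 + 7x^2 - 10x + 9 plus B(x) = -2x^3 + 4x - 5:
c = poly_add([9, -10, 7, 6], [-5, 4, 0, -2])   # 4x^3 + 7x^2 - 6x + 4
```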

For polynomial multiplication, if A( x) and B( x) are polynomials of degree-bound n, their product C( x) is a polynomial of degree-bound 2 n

− 1 such that C( x) = A( x) B( x) for all x in the underlying field. You probably have multiplied polynomials before, by multiplying each term

in A( x) by each term in B( x) and then combining terms with equal powers. For example, multiplying A( x) = 6 x 3 + 7 x 2 − 10 x + 9 by B( x) = −2 x 3 + 4 x − 5 this way yields C( x) = −12 x 6 − 14 x 5 + 44 x 4 − 20 x 3 − 75 x 2 + 86 x − 45.

Another way to express the product C( x) is

C(x) = ∑_{j=0}^{2n−2} c_j x^j,   (30.1)

where

c_j = ∑_{k=0}^{j} a_k b_{j−k}.   (30.2)

(By the definition of degree, ak = 0 for all k > degree( A) and bk = 0 for all k > degree( B).) If A is a polynomial of degree-bound n a and B is a polynomial of degree-bound nb, then C must be a polynomial of degree-bound n a + nb − 1, because degree( C) = degree( A) + degree( B). Since a polynomial of degree-bound k is also a polynomial of degree-bound k +

1, we normally make the somewhat simpler statement that the product

polynomial C is a polynomial of degree-bound n a + nb.
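The Θ(n²) multiplication described by equations (30.1) and (30.2) is a straightforward double loop over coefficient vectors, shown here in Python:

```python
def poly_mul(a, b):
    """Multiply coefficient vectors per equation (30.2): c_j = sum_k a_k b_{j-k}."""
    c = [0] * (len(a) + len(b) - 1)
    for j in range(len(a)):
        for k in range(len(b)):
            c[j + k] += a[j] * b[k]
    return c

# The running example: (6x^3 + 7x^2 - 10x + 9)(-2x^3 + 4x - 5).
c = poly_mul([9, -10, 7, 6], [-5, 4, 0, -2])
```

The result is the coefficient vector of −12x^6 − 14x^5 + 44x^4 − 20x^3 − 75x^2 + 86x − 45, matching the product computed above; this quadratic-time loop is exactly what the FFT later improves to Θ(n lg n).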

Chapter outline

Section 30.1 presents two ways to represent polynomials: the coefficient representation and the point-value representation. The straightforward

method for multiplying polynomials of degree n—equations (30.1) and

(30.2)—takes Θ( n 2) time with polynomials represented in coefficient

form, but only Θ( n) time with point-value form. Converting between the

two representations, however, reduces the time to multiply polynomials

to just Θ( n lg n). To see why this approach works, you must first

understand complex roots of unity, which Section 30.2 covers. Section

30.2 then uses the FFT and its inverse to perform the conversions.

Because the FFT is used so often in signal processing, it is often

implemented as a circuit in hardware, and Section 30.3 illustrates the structure of such circuits.

This chapter relies on complex numbers, and within this chapter the

symbol i denotes √−1 exclusively.

30.1 Representing polynomials

The coefficient and point-value representations of polynomials are in a

sense equivalent: a polynomial in point-value form has a unique

counterpart in coefficient form. This section introduces the two


representations and shows how to combine them in order to multiply

two degree-bound n polynomials in Θ( n lg n) time.

Coefficient representation

A coefficient representation of a polynomial A(x) = ∑_{j=0}^{n−1} a_j x^j of degree-

bound n is a vector of coefficients a = ( a 0, a 1, … , an−1). Matrix equations in this chapter generally treat vectors as column vectors.

The coefficient representation is convenient for certain operations on

polynomials. For example, the operation of evaluating the polynomial

A( x) at a given point x 0 consists of computing the value of A( x 0). To evaluate a polynomial in Θ( n) time, use Horner’s rule:

A(x_0) = a_0 + x_0(a_1 + x_0(a_2 + ⋯ + x_0(a_{n−2} + x_0 a_{n−1}) ⋯ )).
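Horner’s rule translates directly into a Θ(n)-time loop (a Python sketch; a[j] holds the coefficient of x^j):

```python
def horner(a, x0):
    """Evaluate A(x0) in Theta(n) time; a[j] holds the coefficient of x^j."""
    result = 0
    for coefficient in reversed(a):
        result = result * x0 + coefficient
    return result

# A(x) = 6x^3 + 7x^2 - 10x + 9 evaluated at x = 2.
value = horner([9, -10, 7, 6], 2)
```

For this example, A(2) = 48 + 28 − 20 + 9 = 65.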

Similarly, adding two polynomials represented by the coefficient vectors

a = ( a 0, a 1, … , an−1) and b = ( b 0, b 1, … , bn−1) takes Θ( n) time: just produce the coefficient vector c = ( c 0, c 1, … , cn−1), where cj = aj + bj for j = 0, 1, … , n− 1.

Now, consider multiplying two degree-bound n polynomials A( x) and B( x) represented in coefficient form. The method described by equations (30.1) and (30.2) takes Θ( n 2) time, since it multiplies each coefficient in the vector a by each coefficient in the vector b. The operation of multiplying polynomials in coefficient form seems to be

considerably more difficult than that of evaluating a polynomial or

adding two polynomials. The resulting coefficient vector c, given by

equation (30.2), is also called the convolution of the input vectors a and b, denoted c = ab. Since multiplying polynomials and computing convolutions are fundamental computational problems of considerable

practical importance, this chapter concentrates on efficient algorithms

for them.

Point-value representation

A point-value representation of a polynomial A( x) of degree-bound n is a set of n point-value pairs


{( x 0, y 0), ( x 1, y 1), … , ( xn−1, yn−1)}

such that all of the xk are distinct and

y_k = A(x_k)   (30.3)

for k = 0, 1, … , n − 1. A polynomial has many different point-value

representations, since any set of n distinct points x 0, x 1, … , xn−1 can serve as a basis for the representation.

Computing a point-value representation for a polynomial given in

coefficient form is in principle straightforward, since all you have to do

is select n distinct points x 0, x 1, … , xn−1 and then evaluate A( xk) for k

= 0, 1, … , n − 1. With Horner’s method, evaluating a polynomial at n

points takes Θ( n 2) time. We’ll see later that if you choose the points xk

cleverly, you can accelerate this computation to run in Θ( n lg n) time.

The inverse of evaluation—determining the coefficient form of a

polynomial from a point-value representation—is interpolation. The

following theorem shows that interpolation is well defined when the

desired interpolating polynomial must have a degree-bound equal to the

given number of point-value pairs.

Theorem 30.1 (Uniqueness of an interpolating polynomial)

For any set {( x 0, y 0), ( x 1, y 1), … , ( xn−1, yn−1)} of n point-value pairs such that all the xk values are distinct, there is a unique polynomial A( x) of degree-bound n such that yk = A( xk) for k = 0, 1, … , n − 1.

Proof The proof relies on the existence of the inverse of a certain matrix. Equation (30.3) is equivalent to the matrix equation

⎛ 1  x_0      x_0^2      ⋯  x_0^{n−1}     ⎞ ⎛ a_0     ⎞   ⎛ y_0     ⎞
⎜ 1  x_1      x_1^2      ⋯  x_1^{n−1}     ⎟ ⎜ a_1     ⎟ = ⎜ y_1     ⎟   (30.4)
⎜ ⋮  ⋮        ⋮          ⋱  ⋮             ⎟ ⎜ ⋮       ⎟   ⎜ ⋮       ⎟
⎝ 1  x_{n−1}  x_{n−1}^2  ⋯  x_{n−1}^{n−1} ⎠ ⎝ a_{n−1} ⎠   ⎝ y_{n−1} ⎠

The matrix on the left is denoted V( x 0, x 1, … , xn−1) and is known as a Vandermonde matrix. By Problem D-1 on page 1223, this matrix

has determinant

∏_{0 ≤ j < k ≤ n−1} (x_k − x_j),

and therefore, by Theorem D.5 on page 1221, it is invertible (that is,

nonsingular) if the xk are distinct. To solve for the coefficients aj uniquely given the point-value representation, use the inverse of the

Vandermonde matrix:

a = V( x 0, x 1, … , xn−1)−1 y.

The proof of Theorem 30.1 describes an algorithm for interpolation

based on solving the set (30.4) of linear equations. Section 28.1 shows how to solve these equations in O( n 3) time.
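The determinant formula from the proof of Theorem 30.1 can be sanity-checked for a small case in Python; the brute-force Leibniz determinant below is exponential-time and suitable only for tiny matrices, used here purely as a check:

```python
from itertools import permutations

def det(M):
    """Leibniz-formula determinant -- fine only for tiny matrices."""
    n = len(M)
    total = 0
    for perm in permutations(range(n)):
        sign = 1
        for i in range(n):                 # sign = (-1)^(number of inversions)
            for j in range(i + 1, n):
                if perm[i] > perm[j]:
                    sign = -sign
        prod = 1
        for i in range(n):
            prod *= M[i][perm[i]]
        total += sign * prod
    return total

xs = [1, 2, 4]                                 # three distinct sample points
V = [[x ** j for j in range(3)] for x in xs]   # Vandermonde rows (1, x_k, x_k^2)
lhs = det(V)

rhs = 1                                        # product of (x_k - x_j) over j < k
for k in range(3):
    for j in range(k):
        rhs *= xs[k] - xs[j]
```

For xs = (1, 2, 4), both sides come out to (2−1)(4−1)(4−2) = 6, and the nonzero value confirms invertibility for distinct points.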

A faster algorithm for n-point interpolation is based on Lagrange’s formula:

A(x) = ∑_{k=0}^{n−1} y_k ∏_{j≠k} (x − x_j)/(x_k − x_j).   (30.5)

You might want to verify that the right-hand side of equation (30.5) is a

polynomial of degree-bound n that satisfies A( xk) = yk for all k.

Exercise 30.1-5 asks you how to compute the coefficients of A using

Lagrange’s formula in Θ( n 2) time.
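Evaluating the interpolating polynomial at a single point straight from Lagrange's formula already takes only Θ(n²) time; the following sketch (function and variable names ours) does exactly that.

```python
def lagrange_eval(pairs, x):
    """Evaluate at x the unique interpolating polynomial through the given
    point-value pairs, using Lagrange's formula: Theta(n^2) arithmetic ops."""
    total = 0
    for k, (xk, yk) in enumerate(pairs):
        term = yk
        for j, (xj, _) in enumerate(pairs):
            if j != k:
                term *= (x - xj) / (xk - xj)  # j-th factor of the k-th product
        total += term
    return total

pairs = [(0, 3), (1, 6), (2, 11)]   # samples of A(x) = 3 + 2x + x^2
lagrange_eval(pairs, 3)             # A(3) = 18.0
```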

Thus, n-point evaluation and interpolation are well-defined inverse

operations that transform between the coefficient representation of a

polynomial and a point-value representation.1 The algorithms described above for these problems take Θ( n 2) time.

The point-value representation is quite convenient for many operations on polynomials. For addition, if C(x) = A(x) + B(x), then C(x_k) = A(x_k) + B(x_k) for any point x_k. More precisely, given point-value representations for A,

{(x_0, y_0), (x_1, y_1), … , (x_{n−1}, y_{n−1})},

and for B,

{(x_0, y′_0), (x_1, y′_1), … , (x_{n−1}, y′_{n−1})},

where A and B are evaluated at the same n points, then a point-value representation for C is

{(x_0, y_0 + y′_0), (x_1, y_1 + y′_1), … , (x_{n−1}, y_{n−1} + y′_{n−1})}.

Thus the time to add two polynomials of degree-bound n in point-value form is Θ(n).

Similarly, the point-value representation is convenient for

multiplying polynomials. If C( x) = A( x) B( x), then C( xk) = A( xk) B( xk) for any point xk, and to obtain a point-value representation for C, just pointwise multiply a point-value representation for A by a point-value

representation for B. Polynomial multiplication differs from polynomial

addition in one key aspect, however: degree( C) = degree( A) + degree( B), so that if A and B have degree-bound n, then C has degree-bound 2 n. A standard point-value representation for A and B consists of n point-value pairs for each polynomial. Multiplying these together gives n

point-value pairs, but 2 n pairs are necessary to interpolate a unique polynomial C of degree-bound 2 n. (See Exercise 30.1-4.) Instead, begin

with “extended” point-value representations for A and for B consisting

of 2 n point-value pairs each. Given an extended point-value

representation for A,

{(x_0, y_0), (x_1, y_1), … , (x_{2n−1}, y_{2n−1})},

and a corresponding extended point-value representation for B,

{(x_0, y′_0), (x_1, y′_1), … , (x_{2n−1}, y′_{2n−1})},

then a point-value representation for C is

{(x_0, y_0 y′_0), (x_1, y_1 y′_1), … , (x_{2n−1}, y_{2n−1} y′_{2n−1})}.

Given two input polynomials in extended point-value form, multiplying

them to obtain the point-value form of the result takes just Θ( n) time,

much less than the Θ( n 2) time required to multiply polynomials in

coefficient form.
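The whole recipe — evaluate A and B at 2n common points, then multiply pointwise — can be sketched in a few lines (helper and variable names ours; Horner's rule supplies the evaluations):

```python
def evaluate(a, x):
    """Horner's-rule evaluation of the polynomial with coefficient vector a."""
    r = 0
    for c in reversed(a):
        r = r * x + c
    return r

a = [1, 2]          # A(x) = 1 + 2x, degree-bound n = 2
b = [3, 4]          # B(x) = 3 + 4x
xs = [0, 1, 2, 3]   # 2n = 4 evaluation points: the extended representation

ya = [evaluate(a, x) for x in xs]
yb = [evaluate(b, x) for x in xs]
yc = [p * q for p, q in zip(ya, yb)]  # Theta(n) pointwise multiplications

# C(x) = A(x)B(x) = 3 + 10x + 8x^2, and indeed yc == [C(x) for x in xs]
#                                              == [3, 21, 55, 105]
```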


Finally, let’s consider how to evaluate a polynomial given in point-

value form at a new point. For this problem, the simplest approach

known is to first convert the polynomial to coefficient form and then

evaluate it at the new point.

Fast multiplication of polynomials in coefficient form

Can the linear-time multiplication method for polynomials in point-

value form expedite polynomial multiplication in coefficient form? The

answer hinges on whether it is possible to convert a polynomial quickly

from coefficient form to point-value form (evaluate) and vice versa

(interpolate).

Figure 30.1 A graphical outline of an efficient polynomial-multiplication process. Representations on the top are in coefficient form, and those on the bottom are in point-value form. The arrows from left to right correspond to the multiplication operation. The ω_{2n} terms are complex (2n)th roots of unity.

Any points can serve as evaluation points, but certain evaluation

points allow conversion between representations in only Θ( n lg n) time.

As we’ll see in Section 30.2, if “complex roots of unity” are the evaluation points, then the discrete Fourier transform (or DFT)

evaluates and the inverse DFT interpolates. Section 30.2 shows how the FFT accomplishes the DFT and inverse DFT operations in Θ( n lg n) time.

Figure 30.1 shows this strategy graphically. One minor detail concerns degree-bounds. The product of two polynomials of degree-bound n is a polynomial of degree-bound 2 n. Before evaluating the input polynomials A and B, therefore, first double their degree-bounds

to 2n by adding n high-order coefficients of 0. Because the vectors have 2n elements, use "complex (2n)th roots of unity," which are denoted by the ω_{2n} terms in Figure 30.1.

The following procedure takes advantage of the FFT to multiply two

polynomials A(x) and B(x) of degree-bound n in Θ(n lg n) time, where the input and output representations are in coefficient form. The procedure assumes that n is an exact power of 2; if it isn't, just add high-order zero coefficients until it is.

1. Double degree-bound: Create coefficient representations of A( x) and B( x) as degree-bound 2 n polynomials by adding n high-order zero coefficients to each.

2. Evaluate: Compute point-value representations of A( x) and B( x) of length 2 n by applying the FFT of order 2 n on each

polynomial. These representations contain the values of the two

polynomials at the (2 n)th roots of unity.

3. Pointwise multiply: Compute a point-value representation for the

polynomial C( x) = A( x) B( x) by multiplying these values together pointwise. This representation contains the value of C( x) at each

(2 n)th root of unity.

4. Interpolate: Create the coefficient representation of the

polynomial C( x) by applying the FFT on 2 n point-value pairs to

compute the inverse DFT.

Steps (1) and (3) take Θ( n) time, and steps (2) and (4) take Θ( n lg n) time. Thus, once we show how to use the FFT, we will have proven the

following.

Theorem 30.2

Two polynomials of degree-bound n with both the input and output

representations in coefficient form can be multiplied in Θ( n lg n) time.
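Using a library FFT, the four steps above fit in a few lines. This sketch leans on NumPy's `np.fft` routines rather than the FFT procedure of Section 30.2, and it assumes integer coefficients so that the final rounding is safe; the function name `poly_multiply` is ours.

```python
import numpy as np

def poly_multiply(a, b):
    """Multiply coefficient vectors a and b (equal length n) in Theta(n lg n)
    time, following steps 1-4 above."""
    n = len(a)
    fa = np.fft.fft(a, 2 * n)   # steps 1-2: pad to degree-bound 2n and evaluate
    fb = np.fft.fft(b, 2 * n)
    fc = fa * fb                # step 3: pointwise multiply
    c = np.fft.ifft(fc)         # step 4: interpolate via the inverse DFT
    return np.real(c).round().astype(int)

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2:
poly_multiply([1, 2], [3, 4])   # -> [3, 10, 8, 0]
```

NumPy's forward transform uses ω = e^{−2πi/n}, the conjugate of this chapter's convention; either convention works for convolution as long as the forward and inverse transforms are paired consistently.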

Exercises

30.1-1


Multiply the polynomials A(x) = 7x^3 − x^2 + x − 10 and B(x) = 8x^3 − 6x + 3 using equations (30.1) and (30.2).

30.1-2

Another way to evaluate a polynomial A(x) of degree-bound n at a given point x_0 is to divide A(x) by the polynomial (x − x_0), obtaining a quotient polynomial q(x) of degree-bound n − 1 and a remainder r, such that

A(x) = q(x)(x − x_0) + r.

Then we have A(x_0) = r. Show how to compute the remainder r and the coefficients of q(x) from x_0 and the coefficients of A in Θ(n) time.

30.1-3

Given a polynomial A(x) = Σ_{j=0}^{n−1} a_j x^j, define its reversal A^rev(x) = Σ_{j=0}^{n−1} a_{n−1−j} x^j. Show how to derive a point-value representation for A^rev(x) from a point-value representation for A(x), assuming that none of the points is 0.

30.1-4

Prove that n distinct point-value pairs are necessary to uniquely specify

a polynomial of degree-bound n, that is, if fewer than n distinct point-value pairs are given, they fail to specify a unique polynomial of degree-

bound n. ( Hint: Using Theorem 30.1, what can you say about a set of n

− 1 point-value pairs to which you add one more arbitrarily chosen

point-value pair?)

30.1-5

Show how to use equation (30.5) to interpolate in Θ(n^2) time. (Hint: First compute the coefficient representation of the polynomial ∏_j (x − x_j) and then divide by (x − x_k) as necessary for the numerator of each term (see Exercise 30.1-2). You can compute each of the n denominators in O(n) time.)

30.1-6

Explain what is wrong with the “obvious” approach to polynomial division using a point-value representation: dividing the corresponding

y values. Discuss separately the case in which the division comes out exactly and the case in which it doesn’t.

30.1-7

Consider two sets A and B, each having n integers in the range from 0 to 10n. The Cartesian sum of A and B is defined by

C = {x + y : x ∈ A and y ∈ B}.

The integers in C lie in the range from 0 to 20 n. Show how, in O( n lg n) time, to find the elements of C and the number of times each element of

C is realized as a sum of elements in A and B. ( Hint: Represent A and B

as polynomials of degree at most 10 n.)

30.2 The DFT and FFT

In Section 30.1, we claimed that by computing the DFT and its inverse by using the FFT, it is possible to evaluate and interpolate a polynomial of degree-bound n at the complex roots of unity in Θ(n lg n) time. This section defines complex roots of unity, studies their properties, defines the DFT,

and then shows how the FFT computes the DFT and its inverse in Θ( n

lg n) time.

Complex roots of unity

A complex nth root of unity is a complex number ω such that

ω^n = 1.

There are exactly n complex nth roots of unity: e^{2πik/n} for k = 0, 1, … , n − 1. To interpret this formula, use the definition of the exponential of a complex number:

e^{iu} = cos u + i sin u.

Figure 30.2 shows that the n complex roots of unity are equally spaced around the circle of unit radius centered at the origin of the complex plane. The value

ω_n = e^{2πi/n}                                          (30.6)

is the principal nth root of unity.² All other complex nth roots of unity are powers of ω_n.

Figure 30.2 The values of ω_8^0, ω_8^1, … , ω_8^7 in the complex plane, where ω_8 = e^{2πi/8} is the principal 8th root of unity.

The n complex nth roots of unity,

ω_n^0, ω_n^1, … , ω_n^{n−1},

form a group under multiplication (see Section 31.3). This group has the same structure as the additive group (ℤ_n, +) modulo n, since ω_n^n = ω_n^0 = 1 implies that ω_n^j ω_n^k = ω_n^{j+k} = ω_n^{(j+k) mod n}. Similarly, ω_n^{−1} = ω_n^{n−1}. The following lemmas furnish some essential properties of the complex nth roots of unity.
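These group properties are easy to check numerically with Python's `cmath` module (the variable names are ours):

```python
import cmath

n = 8
w = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]   # w[k] = omega_n^k

# omega_n^j * omega_n^k == omega_n^{(j+k) mod n}, up to floating-point roundoff:
j, k = 5, 6
assert abs(w[j] * w[k] - w[(j + k) % n]) < 1e-12

# omega_n^{-1} == omega_n^{n-1}:
assert abs(1 / w[1] - w[n - 1]) < 1e-12
```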

Lemma 30.3 (Cancellation lemma)

For any integers n > 0, k ≥ 0, and d > 0,

ω_{dn}^{dk} = ω_n^k.                                     (30.7)

Proof The lemma follows directly from equation (30.6), since

ω_{dn}^{dk} = (e^{2πi/(dn)})^{dk} = (e^{2πi/n})^k = ω_n^k.

Corollary 30.4

For any even integer n > 0,

ω_n^{n/2} = ω_2 = −1.

Proof The proof is left as Exercise 30.2-1.

Lemma 30.5 (Halving lemma)

If n > 0 is even, then the squares of the n complex n th roots of unity are the n/2 complex ( n/2)th roots of unity.

Proof By the cancellation lemma, ω_n^{2k} = ω_{n/2}^k for any nonnegative integer k. Squaring all of the complex nth roots of unity produces each (n/2)th root of unity exactly twice, since

(ω_n^{k+n/2})^2 = ω_n^{2k+n} = ω_n^{2k} ω_n^n = ω_n^{2k} = (ω_n^k)^2.

Thus ω_n^k and ω_n^{k+n/2} have the same square. We could also have used Corollary 30.4 to prove this property, since ω_n^{n/2} = −1 implies ω_n^{k+n/2} = −ω_n^k, and thus (ω_n^{k+n/2})^2 = (ω_n^k)^2.

As we’ll see, the halving lemma is essential to the divide-and-conquer

approach for converting between coefficient and point-value

representations of polynomials, since it guarantees that the recursive

subproblems are only half as large.

Lemma 30.6 (Summation lemma)

For any integer n ≥ 1 and nonzero integer k not divisible by n,

$$\sum_{j=0}^{n-1} (\omega_n^k)^j = 0.$$

Proof Equation (A.6) on page 1142 applies to complex values as well as to reals, giving

$$\sum_{j=0}^{n-1} (\omega_n^k)^j = \frac{(\omega_n^k)^n - 1}{\omega_n^k - 1} = \frac{(\omega_n^n)^k - 1}{\omega_n^k - 1} = \frac{1^k - 1}{\omega_n^k - 1} = 0.$$

To see that the denominator is not 0, note that ω_n^k = 1 only when k is divisible by n, which the lemma statement prohibits.
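The lemma is easy to check numerically; a small sketch with `cmath` (the function name is ours):

```python
import cmath

def root_power_sum(n, k):
    """Compute sum_{j=0}^{n-1} (omega_n^k)^j directly."""
    wk = cmath.exp(2j * cmath.pi * k / n)   # omega_n^k
    return sum(wk ** j for j in range(n))

abs(root_power_sum(8, 3)) < 1e-9        # True: 8 does not divide 3, so the sum is 0
abs(root_power_sum(8, 16) - 8) < 1e-9   # True: when n | k, every term is 1
```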

The DFT

Recall the goal of evaluating a polynomial

$$A(x) = \sum_{j=0}^{n-1} a_j x^j$$

of degree-bound n at ω_n^0, ω_n^1, … , ω_n^{n−1} (that is, at the n complex nth roots of unity).³ The polynomial A is given in coefficient form: a = (a_0, a_1, … , a_{n−1}). Let us define the results y_k, for k = 0, 1, … , n − 1, by

$$y_k = A(\omega_n^k) = \sum_{j=0}^{n-1} a_j \omega_n^{kj}. \qquad (30.8)$$

The vector y = (y_0, y_1, … , y_{n−1}) is the discrete Fourier transform (DFT) of the coefficient vector a = (a_0, a_1, … , a_{n−1}). We also write y = DFT_n(a).

The FFT

The fast Fourier transform (FFT) takes advantage of the special

properties of the complex roots of unity to compute DFT n( a) in Θ( n lg n) time, as opposed to the Θ( n 2) time of the straightforward method.


Assume throughout that n is an exact power of 2. Although strategies

for dealing with sizes that are not exact powers of 2 are known, they are

beyond the scope of this book.

The FFT method employs a divide-and-conquer strategy, using the even-indexed and odd-indexed coefficients of A(x) separately to define the two new polynomials A^even(x) and A^odd(x) of degree-bound n/2:

A^even(x) = a_0 + a_2 x + a_4 x^2 + ⋯ + a_{n−2} x^{n/2−1},
A^odd(x) = a_1 + a_3 x + a_5 x^2 + ⋯ + a_{n−1} x^{n/2−1}.

Note that A^even contains all the even-indexed coefficients of A (the binary representation of the index ends in 0) and A^odd contains all the odd-indexed coefficients (the binary representation of the index ends in 1). It follows that

A(x) = A^even(x^2) + x A^odd(x^2),                       (30.9)

so that the problem of evaluating A(x) at ω_n^0, ω_n^1, … , ω_n^{n−1} reduces to

1. evaluating the degree-bound n/2 polynomials A^even(x) and A^odd(x) at the points

(ω_n^0)^2, (ω_n^1)^2, … , (ω_n^{n−1})^2,                 (30.10)

and then

2. combining the results according to equation (30.9).
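The even/odd decomposition is simple to verify numerically for a sample coefficient vector (helper and sample values ours):

```python
def evaluate(a, x):
    """Horner's-rule evaluation of the polynomial with coefficient vector a."""
    r = 0
    for c in reversed(a):
        r = r * x + c
    return r

a = [9, 2, 7, 4, 5, 1, 8, 3]        # degree-bound n = 8
a_even, a_odd = a[0::2], a[1::2]    # coefficients of A^even and A^odd

x = 2
# A(x) = A^even(x^2) + x * A^odd(x^2), equation (30.9):
assert evaluate(a, x) == evaluate(a_even, x * x) + x * evaluate(a_odd, x * x)
```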

By the halving lemma, the list of values (30.10) consists not of n

distinct values but only of the n/2 complex ( n/2)th roots of unity, with each root occurring exactly twice. Therefore, the FFT recursively

evaluates the polynomials A even and A odd of degree-bound n/2 at the n/2 complex ( n/2)th roots of unity. These subproblems have exactly the

same form as the original problem, but are half the size, dividing an n-

element DFT n computation into two n/2-element DFT n/2

computations. This decomposition is the basis for the FFT procedure


below, which computes the DFT of an n-element vector a = (a_0, a_1, … , a_{n−1}), where n is an exact power of 2.

The FFT procedure works as follows. Lines 1–2 represent the base case of the recursion. The DFT of 1 element is the element itself, since in this case

y_0 = a_0 ω_1^0 = a_0 · 1 = a_0.

Lines 5–6 define the coefficient vectors for the polynomials A^even and A^odd. Lines 3, 4, and 12 guarantee that ω is updated properly so that whenever lines 10–11 are executed, ω = ω_n^k. (Keeping a running value of ω from iteration to iteration saves time over computing ω_n^k from scratch each time through the for loop.⁴) Lines 7–8 perform the recursive DFT_{n/2} computations, setting, for k = 0, 1, … , n/2 − 1,

y_k^even = A^even(ω_{n/2}^k),
y_k^odd = A^odd(ω_{n/2}^k),

FFT(a, n)
 1  if n == 1
 2      return a                     // DFT of 1 element is the element itself
 3  ω_n = e^{2πi/n}
 4  ω = 1
 5  a^even = (a_0, a_2, … , a_{n−2})
 6  a^odd = (a_1, a_3, … , a_{n−1})
 7  y^even = FFT(a^even, n/2)
 8  y^odd = FFT(a^odd, n/2)
 9  for k = 0 to n/2 − 1             // at this point, ω = ω_n^k
10      y_k = y_k^even + ω y_k^odd
11      y_{k+n/2} = y_k^even − ω y_k^odd
12      ω = ω ω_n
13  return y


or, since ω_{n/2}^k = ω_n^{2k} by the cancellation lemma,

y_k^even = A^even(ω_n^{2k}),
y_k^odd = A^odd(ω_n^{2k}).

Lines 10–11 combine the results of the recursive DFT_{n/2} calculations. For the first n/2 results y_0, y_1, … , y_{n/2−1}, line 10 yields

y_k = y_k^even + ω_n^k y_k^odd
    = A^even(ω_n^{2k}) + ω_n^k A^odd(ω_n^{2k})
    = A(ω_n^k)                                  (by equation (30.9)).

For y_{n/2}, y_{n/2+1}, … , y_{n−1}, letting k = 0, 1, … , n/2 − 1, line 11 yields

y_{k+n/2} = y_k^even − ω_n^k y_k^odd
          = y_k^even + ω_n^{k+n/2} y_k^odd                         (since ω_n^{k+n/2} = −ω_n^k)
          = A^even(ω_n^{2k}) + ω_n^{k+n/2} A^odd(ω_n^{2k})
          = A^even(ω_n^{2k+n}) + ω_n^{k+n/2} A^odd(ω_n^{2k+n})     (since ω_n^{2k+n} = ω_n^{2k})
          = A(ω_n^{k+n/2})                                         (by equation (30.9)).

Thus the vector y returned by FFT is indeed the DFT of the input vector a.

Lines 10 and 11 multiply each value y_k^odd by ω_n^k, for k = 0, 1, … , n/2 − 1. Line 10 adds this product to y_k^even, and line 11 subtracts it. Because each factor ω_n^k appears in both its positive and negative forms, we call the factors twiddle factors.

To determine the running time of the procedure FFT, note that

exclusive of the recursive calls, each invocation takes Θ( n) time, where n

is the length of the input vector. The recurrence for the running time is

therefore

T( n) = 2 T( n/2) + Θ( n)

= Θ( n lg n),

by case 2 of the master theorem (Theorem 4.1). Thus the FFT can

evaluate a polynomial of degree-bound n at the complex n th roots of unity in Θ( n lg n) time.
