The question remains of how to choose the first derivatives of f(x) at the knots. One method is to require the second derivatives to be continuous at the knots:

f″_i(1) = f″_{i+1}(0)

for i = 0, 1, … , n − 2. At the first and last knots, assume that f″_0(0) = 0 and f″_{n−1}(1) = 0. These assumptions make f(x) a natural cubic spline.
b. Use the continuity constraints on the second derivative to show that, for i = 1, 2, … , n − 1,

D_{i−1} + 4D_i + D_{i+1} = 3(y_{i+1} − y_{i−1}).   (28.23)
c. Show that

2D_0 + D_1 = 3(y_1 − y_0),   (28.24)
D_{n−1} + 2D_n = 3(y_n − y_{n−1}).   (28.25)
d. Rewrite equations (28.23)–(28.25) as a matrix equation involving the vector D = (D_0 D_1 D_2 ⋯ D_n)^T of unknowns. What attributes does the matrix in your equation have?
e. Argue that a natural cubic spline can interpolate a set of n + 1 point-value pairs in O( n) time (see Problem 28-1).
f. Show how to determine a natural cubic spline that interpolates a set
of n + 1 points ( xi, yi) satisfying x 0 < x 1 < ⋯ < xn, even when xi is not necessarily equal to i. What matrix equation must your method
solve, and how quickly does your algorithm run?
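Parts (b)–(e) amount to solving a tridiagonal linear system, which a forward sweep and back substitution handle in O(n) time. Below is a minimal Python sketch, assuming the standard natural-spline system with unit-spaced knots: 2D_0 + D_1 = 3(y_1 − y_0), D_{i−1} + 4D_i + D_{i+1} = 3(y_{i+1} − y_{i−1}) for 1 ≤ i ≤ n − 1, and D_{n−1} + 2D_n = 3(y_n − y_{n−1}). The function name is invented for illustration.

```python
def natural_spline_slopes(y):
    """Solve the tridiagonal system for the derivatives D_0, ..., D_n of a
    natural cubic spline through (0, y_0), ..., (n, y_n), in O(n) time via
    a forward sweep and back substitution (the Thomas algorithm)."""
    n = len(y) - 1
    # Main diagonal is [2, 4, 4, ..., 4, 2]; both off-diagonals are all 1s.
    diag = [2.0] + [4.0] * (n - 1) + [2.0]
    rhs = [3.0 * (y[1] - y[0])]
    rhs += [3.0 * (y[i + 1] - y[i - 1]) for i in range(1, n)]
    rhs += [3.0 * (y[n] - y[n - 1])]
    # Forward elimination: the sub- and superdiagonal entries are all 1.
    for i in range(1, n + 1):
        w = 1.0 / diag[i - 1]
        diag[i] -= w          # subtract w times the superdiagonal entry (1)
        rhs[i] -= w * rhs[i - 1]
    # Back substitution.
    D = [0.0] * (n + 1)
    D[n] = rhs[n] / diag[n]
    for i in range(n - 1, -1, -1):
        D[i] = (rhs[i] - D[i + 1]) / diag[i]
    return D
```

Because the matrix is tridiagonal and diagonally dominant, no pivoting is needed, and the total work is a constant number of operations per knot.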
Chapter notes
Many excellent texts describe numerical and scientific computation in
much greater detail than we have room for here. The following are
especially readable: George and Liu [180], Golub and Van Loan [192], Press, Teukolsky, Vetterling, and Flannery [365, 366], and Strang [422,
423].
Golub and Van Loan [192] discuss numerical stability. They show why det(A) is not necessarily a good indicator of the stability of a matrix A, proposing instead to use ∥A∥∞∥A^{−1}∥∞, where ∥A∥∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij|. They also address the question of how to compute this value without actually computing A^{−1}.
Gaussian elimination, upon which the LU and LUP decompositions
are based, was the first systematic method for solving linear systems of
equations. It was also one of the earliest numerical algorithms.
Although it was known earlier, its discovery is commonly attributed to
C. F. Gauss (1777–1855). In his famous paper [424], Strassen showed that an n × n matrix can be inverted in O(n^{lg 7}) time. Winograd [460]
originally proved that matrix multiplication is no harder than matrix
inversion, and the converse is due to Aho, Hopcroft, and Ullman [5].
Another important matrix decomposition is the singular value
decomposition, or SVD. The SVD factors an m × n matrix A into A = Q_1 Σ Q_2^T, where Σ is an m × n matrix with nonzero values only on the
diagonal, Q 1 is m× m with mutually orthonormal columns, and Q 2 is n
× n, also with mutually orthonormal columns. Two vectors are
orthonormal if their inner product is 0 and each vector has a norm of 1.
The books by Strang [422, 423] and Golub and Van Loan [192] contain good treatments of the SVD.
Strang [423] has an excellent presentation of symmetric positive-definite matrices and of linear algebra in general.
1 The year in which Introduction to Algorithms was first published.
Many problems take the form of maximizing or minimizing an
objective, given limited resources and competing constraints. If you can
specify the objective as a linear function of certain variables, and if you
can specify the constraints on resources as equalities or inequalities on
those variables, then you have a linear-programming problem. Linear
programs arise in a variety of practical applications. We begin by
studying an application in electoral politics.
A political problem
Suppose that you are a politician trying to win an election. Your district
has three different types of areas—urban, suburban, and rural. These
areas have, respectively, 100,000, 200,000, and 50,000 registered voters.
Although not all the registered voters actually go to the polls, you
decide that to govern effectively, you would like at least half the
registered voters in each of the three regions to vote for you. You are
honorable and would never consider supporting policies you don’t
believe in. You realize, however, that certain issues may be more
effective in winning votes in certain places. Your primary issues are
preparing for a zombie apocalypse, equipping sharks with lasers,
building highways for flying cars, and allowing dolphins to vote.
According to your campaign staff’s research, you can estimate how
many votes you win or lose from each population segment by spending
$1,000 on advertising on each issue. This information appears in the
table of Figure 29.1. In this table, each entry indicates the number of
thousands of either urban, suburban, or rural voters who would be won
over by spending $1,000 on advertising in support of a particular issue.
Negative entries denote votes that would be lost. Your task is to figure
out the minimum amount of money that you need to spend in order to
win 50,000 urban votes, 100,000 suburban votes, and 25,000 rural votes.
You could, by trial and error, devise a strategy that wins the required
number of votes, but the strategy you come up with might not be the
least expensive one. For example, you could devote $20,000 of
advertising to preparing for a zombie apocalypse, $0 to equipping
sharks with lasers, $4,000 to building highways for flying cars, and
$9,000 to allowing dolphins to vote. In this case, you would win (20 ·
−2) + (0 · 8) + (4 · 0) + (9 · 10) = 50 thousand urban votes, (20 · 5) + (0 ·
2) + (4 · 0) + (9 · 0) = 100 thousand suburban votes, and (20 · 3) + (0 ·
−5) + (4 · 10) + (9 · −2) = 82 thousand rural votes. You would win the
exact number of votes desired in the urban and suburban areas and
more than enough votes in the rural area. (In fact, according to your
model, in the rural area you would receive more votes than there are
voters.) In order to garner these votes, you would have paid for 20 + 0 +
4 + 9 = 33 thousand dollars of advertising.
Figure 29.1 The effects of policies on voters. Each entry describes the number of thousands of urban, suburban, or rural voters who could be won over by spending $1,000 on advertising in support of a policy on a particular issue. Negative entries denote votes that would be lost. The entries, as used in the worked example above, are:

policy                              urban   suburban   rural
preparing for a zombie apocalypse     −2        5        3
equipping sharks with lasers           8        2       −5
building highways for flying cars      0        0       10
allowing dolphins to vote             10        0       −2
It’s natural to wonder whether this strategy is the best possible. That
is, can you achieve your goals while spending less on advertising?
Additional trial and error might help you to answer this question, but a
better approach is to formulate (or model) this question mathematically.
The first step is to decide what decisions you have to make and to
introduce variables that capture these decisions. Since you have four
decisions, you introduce four decision variables:

x_1 is the number of thousands of dollars spent on advertising on preparing for a zombie apocalypse,

x_2 is the number of thousands of dollars spent on advertising on equipping sharks with lasers,

x_3 is the number of thousands of dollars spent on advertising on building highways for flying cars, and

x_4 is the number of thousands of dollars spent on advertising on allowing dolphins to vote.
You then think about constraints, which are limits, or restrictions, on the
values that the decision variables can take. You can write the
requirement that you win at least 50,000 urban votes as

−2x_1 + 8x_2 + 0x_3 + 10x_4 ≥ 50.   (29.1)

Similarly, you can write the requirements that you win at least 100,000 suburban votes and 25,000 rural votes as

5x_1 + 2x_2 + 0x_3 + 0x_4 ≥ 100   (29.2)

and

3x_1 − 5x_2 + 10x_3 − 2x_4 ≥ 25.   (29.3)

Any setting of the variables x_1, x_2, x_3, x_4 that satisfies inequalities (29.1)–(29.3) yields a strategy that wins a sufficient number of each type of vote.
Finally, you think about your objective, which is the quantity that
you wish to either minimize or maximize. In order to keep costs as small
as possible, you would like to minimize the amount spent on
advertising. That is, you want to minimize the expression

x_1 + x_2 + x_3 + x_4.   (29.4)

Although negative advertising often occurs in political campaigns, there is no such thing as negative-cost advertising. Consequently, you require that

x_1 ≥ 0, x_2 ≥ 0, x_3 ≥ 0, and x_4 ≥ 0.   (29.5)
Combining inequalities (29.1)–(29.3) and (29.5) with the objective of
minimizing (29.4) produces what is known as a “linear program.” We
can format this problem tabularly as

minimize    x_1 + x_2 + x_3 + x_4
subject to
           −2x_1 + 8x_2 + 0x_3 + 10x_4 ≥ 50
            5x_1 + 2x_2 + 0x_3 +  0x_4 ≥ 100
            3x_1 − 5x_2 + 10x_3 − 2x_4 ≥ 25
            x_1, x_2, x_3, x_4 ≥ 0.

The solution to this linear program yields your optimal strategy.
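For an instance this small you can even find the optimum by brute force: every vertex of the feasible region lies at the intersection of four constraint boundaries, so enumerating those intersections and keeping the cheapest feasible one solves the program exactly. A sketch in Python using exact rational arithmetic; the coefficients are read off the worked example above, and the function names are invented:

```python
from fractions import Fraction
from itertools import combinations

def _solve_square(rows, rhs):
    """Gauss-Jordan elimination over Fractions; returns the unique solution
    of the square system rows * x = rhs, or None if the system is singular."""
    n = len(rhs)
    m = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return None
        m[col], m[pivot] = m[pivot], m[col]
        inv = m[col][col]
        m[col] = [v / inv for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def solve_lp_min(constraints, c):
    """Minimize c . x subject to a . x >= b for every (a, b) in constraints,
    by enumerating candidate vertices: each vertex of the feasible region is
    an intersection of n constraint boundaries. Exponential in general --
    fine for a 4-variable toy, never for real instances."""
    n = len(c)
    best = None
    for subset in combinations(constraints, n):
        x = _solve_square([a for a, _ in subset], [b for _, b in subset])
        if x is None:
            continue
        if all(sum(ai * xi for ai, xi in zip(a, x)) >= b for a, b in constraints):
            val = sum(ci * xi for ci, xi in zip(c, x))
            if best is None or val < best[0]:
                best = (val, x)
    return best

# The three vote constraints (coefficients from the worked example above)
# plus the four nonnegativity constraints x_i >= 0.
cons = [((-2, 8, 0, 10), 50), ((5, 2, 0, 0), 100), ((3, -5, 10, -2), 25)]
for i in range(4):
    e = [0] * 4
    e[i] = 1
    cons.append((tuple(e), 0))

cost, spend = solve_lp_min(cons, (1, 1, 1, 1))
```

On this instance the enumeration returns a total cost of 3100/111 ≈ 27.93 thousand dollars, noticeably better than the $33,000 trial-and-error strategy.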
The remainder of this chapter covers how to formulate linear
programs and is an introduction to modeling in general. Modeling
refers to the general process of converting a problem into a
mathematical form amenable to solution by an algorithm. Section 29.1
discusses briefly the algorithmic aspects of linear programming,
although it does not include the details of a linear-programming
algorithm. Throughout this book, we have seen ways to model
problems, such as by shortest paths and connectivity in a graph. When
modeling a problem as a linear program, you go through the steps used
in this political example—identifying the decision variables, specifying
the constraints, and formulating the objective function. In order to
model a problem as a linear program, the constraints and objectives
must be linear. In Section 29.2, we will see several other examples of modeling via linear programs. Section 29.3 discusses duality, an important concept in linear programming and other optimization
algorithms.
29.1 Linear programming formulations and algorithms
Linear programs take a particular form, which we will examine in this
section. Multiple algorithms have been developed to solve linear
programs. Some run in polynomial time, some do not, but they are all
too complicated to show here. Instead, we will give an example that

demonstrates some ideas behind the simplex algorithm, which is
currently the most commonly deployed solution method.
General linear programs
In the general linear-programming problem, we wish to optimize a
linear function subject to a set of linear inequalities. Given a set of real
numbers a_1, a_2, … , a_n and a set of variables x_1, x_2, … , x_n, we define a linear function f on those variables by

f(x_1, x_2, … , x_n) = a_1x_1 + a_2x_2 + ⋯ + a_nx_n = Σ_{j=1}^{n} a_j x_j.
If b is a real number and f is a linear function, then the equation
f( x 1, x 2, … , xn) = b
is a linear equality and the inequalities
f(x_1, x_2, … , x_n) ≤ b and f(x_1, x_2, … , x_n) ≥ b

are linear inequalities. We use the general term linear constraints to denote either linear equalities or linear inequalities. Linear
programming does not allow strict inequalities. Formally, a linear-
programming problem is the problem of either minimizing or
maximizing a linear function subject to a finite set of linear constraints.
If minimizing, we call the linear program a minimization linear program,
and if maximizing, we call the linear program a maximization linear
program.
In order to discuss linear-programming algorithms and properties, it
will be helpful to use a standard notation for the input. By convention,
a maximization linear program takes as input n real numbers c 1, c 2, … , cn; m real numbers b 1, b 2, … , bm; and mn real numbers aij for i = 1, 2,
… , m and j = 1, 2, … , n.
The goal is to find n real numbers x_1, x_2, … , x_n that

maximize   Σ_{j=1}^{n} c_j x_j   (29.11)

subject to

Σ_{j=1}^{n} a_ij x_j ≤ b_i   for i = 1, 2, … , m   (29.12)
x_j ≥ 0   for j = 1, 2, … , n.   (29.13)
We call expression (29.11) the objective function and the n + m inequalities in lines (29.12) and (29.13) the constraints. The n constraints in line (29.13) are the nonnegativity constraints. It can sometimes be more convenient to express a linear program in a more compact form. If
we create an m × n matrix A = ( aij), an m-vector b = ( bi), an n-vector c
= (c_j), and an n-vector x = (x_j), then we can rewrite the linear program defined in (29.11)–(29.13) as

maximize   c^T x   (29.14)

subject to

Ax ≤ b   (29.15)
x ≥ 0.   (29.16)
In line (29.14), c^T x is the inner product of two n-vectors. In inequality (29.15), Ax is the m-vector that is the product of an m × n matrix and an n-vector, and in inequality (29.16), x ≥ 0 means that each entry of the vector x must be nonnegative. We call this representation the standard
form for a linear program, and we adopt the convention that A, b, and c always have the dimensions given above.
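In this matrix form, checking whether a proposed solution is feasible is mechanical: multiply and compare. A small sketch (the function names and the two-constraint instance are invented for illustration):

```python
def is_feasible(A, b, x):
    """True iff Ax <= b componentwise and x >= 0, i.e., x is a feasible
    solution of the standard-form program: maximize c^T x s.t. Ax <= b, x >= 0."""
    Ax = [sum(aij * xj for aij, xj in zip(row, x)) for row in A]
    return all(lhs <= bi for lhs, bi in zip(Ax, b)) and all(xj >= 0 for xj in x)

def objective_value(c, x):
    """The objective value c^T x of a proposed solution x."""
    return sum(cj * xj for cj, xj in zip(c, x))

# A made-up standard-form instance with m = 2 constraints, n = 2 variables.
A = [[4, -1], [2, 1]]
b = [8, 10]
c = [1, 1]
```

With these data, `is_feasible(A, b, [2, 6])` is True with objective value 8, while `[3, -1]` fails the nonnegativity constraints and `[5, 1]` violates the first row of Ax ≤ b.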
The standard form above may not naturally correspond to real-life
situations you are trying to model. For example, you might have
equality constraints or variables that can take on negative values.
Exercises 29.1-6 and 29.1-7 ask you to show how to convert any linear
program into this standard form.
We now introduce terminology to describe solutions to linear
programs. We denote a particular setting of the values in a variable, say x, by putting a bar over the variable name: x̄. If x̄ satisfies all the constraints, then it is a feasible solution, but if it fails to satisfy at least one constraint, then it is an infeasible solution. We say that a solution x̄ has objective value c^T x̄. A feasible solution x̄ whose objective value is maximum over all feasible solutions is an optimal solution, and we call its objective value c^T x̄ the optimal objective value. If a linear program has no feasible solutions, we say that the linear program is infeasible, and otherwise, it is feasible. The set of points that satisfy all the constraints is the feasible region. If a linear program has some feasible
solutions but does not have a finite optimal objective value, then the
feasible region is unbounded and so is the linear program. Exercise 29.1-
5 asks you to show that a linear program can have a finite optimal
objective value even if the feasible region is unbounded.
One of the reasons for the power and popularity of linear
programming is that linear programs can, in general, be solved
efficiently. There are two classes of algorithms, known as ellipsoid
algorithms and interior-point algorithms, that solve linear programs in
polynomial time. In addition, the simplex algorithm is widely used.
Although it does not run in polynomial time in the worst case, it tends
to perform well in practice.
We will not give a detailed algorithm for linear programming, but
will discuss a few important ideas. First, we will give an example of
using a geometric procedure to solve a two-variable linear program.
Although this example does not immediately generalize to an efficient
algorithm for larger problems, it introduces some important concepts
for linear programming and for optimization in general.
A two-variable linear program
Let us first consider the following linear program with two variables:

maximize   x_1 + x_2   (29.17)

subject to

4x_1 − x_2 ≤ 8   (29.18)
2x_1 + x_2 ≤ 10   (29.19)
5x_1 − 2x_2 ≥ −2   (29.20)
x_1, x_2 ≥ 0.   (29.21)
Figure 29.2(a) graphs the constraints in the ( x 1, x 2)-Cartesian coordinate system. The feasible region in the two-dimensional space
(highlighted in blue in the figure) is convex. 1 Conceptually, you could
evaluate the objective function x 1 + x 2 at each point in the feasible region, and then identify a point that has the maximum objective value
as an optimal solution. For this example (and for most linear
programs), however, the feasible region contains an infinite number of
points, and so to solve this linear program, you need an efficient way to
find a point that achieves the maximum objective value without
explicitly evaluating the objective function at every point in the feasible
region.
In two dimensions, you can optimize via a graphical procedure. The
set of points for which x 1 + x 2 = z, for any z, is a line with a slope of
−1. Plotting x 1 + x 2 = 0 produces the line with slope −1 through the
origin, as in Figure 29.2(b). The intersection of this line and the feasible region is the set of feasible solutions that have an objective value of 0. In
this case, that intersection of the line with the feasible region is the
single point (0, 0). More generally, for any value z, the intersection of
the line x 1 + x 2 = z and the feasible region is the set of feasible solutions that have objective value z. Figure 29.2(b) shows the lines x 1 +
x 2 = 0, x 1 + x 2 = 4, and x 1 + x 2 = 8. Because the feasible region in
Figure 29.2 is bounded, there must be some maximum value z for which the intersection of the line x 1 + x 2 = z and the feasible region is nonempty. Any point in the feasible region that maximizes x 1 + x 2 is an optimal solution to the linear program, which in this case is the vertex
of the feasible region at x 1 = 2 and x 2 = 6, with objective value 8.
Figure 29.2 (a) The linear program given in (29.18)–(29.21). Each constraint is represented by a line and a direction. The intersection of the constraints, which is the feasible region, is highlighted in blue. (b) The red lines show, respectively, the points for which the objective value is 0, 4, and 8. The optimal solution to the linear program is x 1 = 2 and x 2 = 6 with objective value 8.
It is no accident that an optimal solution to the linear program
occurs at a vertex of the feasible region. The maximum value of z for
which the line x 1 + x 2 = z intersects the feasible region must be on the boundary of the feasible region, and thus the intersection of this line
with the boundary of the feasible region is either a single vertex or a line
segment. If the intersection is a single vertex, then there is just one
optimal solution, and it is that vertex. If the intersection is a line
segment, every point on that line segment must have the same objective
value. In particular, both endpoints of the line segment are optimal
solutions. Since each endpoint of a line segment is a vertex, there is an
optimal solution at a vertex in this case as well.
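Since an optimum must occur at a vertex, a tiny two-variable program can be solved by enumerating the vertices directly: intersect the constraint boundary lines pairwise and evaluate the objective only at the feasible intersection points. A sketch for the example above (the constraints are restated here in a · x ≤ b form, with the last two rows encoding x_1, x_2 ≥ 0; this brute force is only for illustration):

```python
from itertools import combinations

# The two-variable example: 4x1 - x2 <= 8, 2x1 + x2 <= 10,
# -5x1 + 2x2 <= 2 (i.e., 5x1 - 2x2 >= -2), -x1 <= 0, -x2 <= 0.
cons = [((4, -1), 8), ((2, 1), 10), ((-5, 2), 2), ((-1, 0), 0), ((0, -1), 0)]

def feasible_vertices(cons, tol=1e-9):
    """Intersect every pair of constraint boundary lines (Cramer's rule)
    and keep only the intersection points satisfying all constraints."""
    for (a1, b1), (a2, b2) in combinations(cons, 2):
        det = a1[0] * a2[1] - a1[1] * a2[0]
        if det == 0:
            continue  # parallel boundary lines never form a vertex
        x1 = (b1 * a2[1] - a1[1] * b2) / det
        x2 = (a1[0] * b2 - b1 * a2[0]) / det
        if all(a[0] * x1 + a[1] * x2 <= b + tol for a, b in cons):
            yield (x1, x2)

# The optimum of max x1 + x2 is attained at one of these vertices.
best = max(feasible_vertices(cons), key=lambda p: p[0] + p[1])
```

Here `best` comes out as the vertex (2, 6) with objective value 8, matching the graphical argument.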
Although you cannot easily graph linear programs with more than
two variables, the same intuition holds. If you have three variables, then
each constraint corresponds to a half-space in three-dimensional space.
The intersection of these half-spaces forms the feasible region. The set
of points for which the objective function obtains a given value z is now
a plane (assuming no degenerate conditions). If all coefficients of the
objective function are nonnegative, and if the origin is a feasible
solution to the linear program, then as you move this plane away from
the origin, in a direction normal to the objective function, you find points of increasing objective value. (If the origin is not feasible or if
some coefficients in the objective function are negative, the intuitive
picture becomes slightly more complicated.) As in two dimensions,
because the feasible region is convex, the set of points that achieve the
optimal objective value must include a vertex of the feasible region.
Similarly, if you have n variables, each constraint defines a half-space in
n-dimensional space. We call the feasible region formed by the
intersection of these half-spaces a simplex. The objective function is now a hyperplane and, because of convexity, an optimal solution still
occurs at a vertex of the simplex. Any algorithm for linear programming
must also identify linear programs that have no solutions, as well as
linear programs that have no finite optimal solution.
The simplex algorithm takes as input a linear program and returns an
optimal solution. It starts at some vertex of the simplex and performs a
sequence of iterations. In each iteration, it moves along an edge of the
simplex from a current vertex to a neighboring vertex whose objective
value is no smaller than that of the current vertex (and usually is larger).
The simplex algorithm terminates when it reaches a local maximum,
which is a vertex from which all neighboring vertices have a smaller
objective value. Because the feasible region is convex and the objective
function is linear, this local optimum is actually a global optimum. In
Section 29.3, we’ll see an important concept called “duality,” which we’ll use to prove that the solution returned by the simplex algorithm is
indeed optimal.
The simplex algorithm, when implemented carefully, often solves
general linear programs quickly in practice. With some carefully
contrived inputs, however, the simplex algorithm can require
exponential time. The first polynomial-time algorithm for linear
programming was the ellipsoid algorithm, which runs slowly in practice.
A second class of polynomial-time algorithms are known as interior-
point methods. In contrast to the simplex algorithm, which moves along
the exterior of the feasible region and maintains a feasible solution that
is a vertex of the simplex at each iteration, these algorithms move
through the interior of the feasible region. The intermediate solutions,
while feasible, are not necessarily vertices of the simplex, but the final
solution is a vertex. For large inputs, interior-point algorithms can run as fast as, and sometimes faster than, the simplex algorithm. The
chapter notes point you to more information about these algorithms.
If you add to a linear program the additional requirement that all
variables take on integer values, you have an integer linear program.
Exercise 34.5-3 on page 1098 asks you to show that just finding a
feasible solution to this problem is NP-hard. Since no polynomial-time
algorithms are known for any NP-hard problems, there is no known
polynomial-time algorithm for integer linear programming. In contrast,
a general linear-programming problem can be solved in polynomial
time.
Exercises
29.1-1
Consider the linear program
minimize −2 x 1 + 3 x 2
subject to
x 1 + x 2 = 7
x 1 − 2 x 2 ≤ 4
x 1 ≥ 0.
Give three feasible solutions to this linear program. What is the
objective value of each one?
29.1-2
Consider the following linear program, which has a nonpositivity
constraint:
minimize 2 x 1 + 7 x 2 + x 3
subject to
x_1 − x_3 = 7
3x_1 + x_2 ≥ 24
x_3 ≤ 0.
Give three feasible solutions to this linear program. What is the
objective value of each one?
29.1-3
Show that the following linear program is infeasible:
maximize 3 x 1 − 2 x 2
subject to
x 1 + x 2 ≤ 2
−2 x 1 − 2 x 2 ≤ −10
x 1, x 2 ≥ 0.
29.1-4
Show that the following linear program is unbounded:
maximize x 1 − x 2
subject to
−2 x 1 + x 2 ≤ −1
− x 1 − 2 x 2 ≤ −2
x 1, x 2 ≥ 0.
29.1-5
Give an example of a linear program for which the feasible region is not
bounded, but the optimal objective value is finite.
29.1-6
Sometimes, in a linear program, you need to convert constraints from
one form to another.


a. Show how to convert an equality constraint into an equivalent set of inequalities. That is, given a constraint f(x_1, x_2, … , x_n) = b, give a set of inequalities that will be satisfied if and only if f(x_1, x_2, … , x_n) = b.

b. Show how to convert an inequality constraint f(x_1, x_2, … , x_n) ≤ b into an equality constraint and a nonnegativity constraint. You will need to introduce an additional variable s and use the constraint that s ≥ 0.
29.1-7
Explain how to convert a minimization linear program to an equivalent
maximization linear program, and argue that your new linear program
is equivalent to the original one.
29.1-8
In the political problem at the beginning of this chapter, there are
feasible solutions that correspond to winning more voters than there
actually are in the district. For example, you can set x 2 to 200, x 3 to 200, and x 1 = x 4 = 0. That solution is feasible, yet it seems to say that you will win 400,000 suburban voters, even though there are only
200,000 actual suburban voters. What constraints can you add to the
linear program to ensure that you never seem to win more voters than
there actually are? Even if you don’t add these constraints, argue that
the optimal solution to this linear program can never win more voters
than there actually are in the district.
29.2 Formulating problems as linear programs
Linear programming has many applications. Any textbook on
operations research is filled with examples of linear programming, and
linear programming has become a standard tool taught to students in
most business schools. The election scenario is one typical example.
Here are two more examples:
An airline wishes to schedule its flight crews. The Federal Aviation
Administration imposes several constraints, such as limiting the
number of consecutive hours that each crew member can work
and insisting that a particular crew work only on one model of aircraft during each month. The airline wants to schedule crews
on all of its flights using as few crew members as possible.
An oil company wants to decide where to drill for oil. Siting a
drill at a particular location has an associated cost and, based on
geological surveys, an expected payoff of some number of barrels
of oil. The company has a limited budget for locating new drills
and wants to maximize the amount of oil it expects to find, given
this budget.
Linear programs also model and solve graph and combinatorial
problems, such as those appearing in this book. We have already seen a
special case of linear programming used to solve systems of difference
constraints in Section 22.4. In this section, we’ll study how to formulate
several graph and network-flow problems as linear programs. Section
35.4 uses linear programming as a tool to find an approximate solution
to another graph problem.
Perhaps the most important aspect of linear programming is to be
able to recognize when you can formulate a problem as a linear
program. Once you cast a problem as a polynomial-sized linear
program, you can solve it in polynomial time by the ellipsoid algorithm
or interior-point methods. Several linear-programming software
packages can solve problems efficiently, so that once the problem is in
the form of a linear program, such a package can solve it.
We’ll look at several concrete examples of linear-programming
problems. We start with two problems that we have already studied: the
single-source shortest-paths problem from Chapter 22 and the maximum-flow problem from Chapter 24. We then describe the minimum-cost-flow problem. (Although the minimum-cost-flow
problem has a polynomial-time algorithm that is not based on linear
programming, we won’t describe the algorithm.) Finally, we describe the
multicommodity-flow problem, for which the only known polynomial-
time algorithm is based on linear programming.
When we solved graph problems in Part VI, we used attribute notation, such as v. d and ( u, v). f. Linear programs typically use subscripted variables rather than objects with attached attributes,

however. Therefore, when we express variables in linear programs, we
indicate vertices and edges through subscripts. For example, we denote
the shortest-path weight for vertex v not by v. d but by dv, and we denote the flow from vertex u to vertex v not by ( u, v). f but by fuv. For quantities that are given as inputs to problems, such as edge weights or
capacities, we continue to use notations such as w( u, v) and c( u, v).
Shortest paths
We can formulate the single-source shortest-paths problem as a linear
program. We’ll focus on how to formulate the single-pair shortest-path
problem, leaving the extension to the more general single-source
shortest-paths problem as Exercise 29.2-2.
In the single-pair shortest-path problem, the input is a weighted,
directed graph G = ( V, E), with weight function w : E → ℝ mapping edges to real-valued weights, a source vertex s, and destination vertex t.
The goal is to compute the value dt, which is the weight of a shortest
path from s to t. To express this problem as a linear program, you need to determine a set of variables and constraints that define when you
have a shortest path from s to t. The triangle inequality (Lemma 22.10
on page 633) gives dv ≤ du + w( u, v) for each edge ( u, v) ∈ E. The source vertex initially receives a value ds = 0, which never changes. Thus the
following linear program expresses the shortest-path weight from s to t:

maximize   d_t   (29.22)

subject to

d_v ≤ d_u + w(u, v)   for each edge (u, v) ∈ E,   (29.23)
d_s = 0.   (29.24)
You might be surprised that this linear program maximizes an objective
function when it is supposed to compute shortest paths. Minimizing the
objective function would be a mistake, because when all the edge
weights are nonnegative, setting dv = 0 for all v ∈ V (recall that a bar over a variable name denotes a specific setting of the variable’s value)
would yield an optimal solution to the linear program without solving

the shortest-paths problem. Maximizing is the right thing to do because
an optimal solution to the shortest-paths problem sets each dv to min
{du + w(u, v) : u ∈ V and (u, v) ∈ E}, so that dv is the largest value that is less than or equal to all of the values in the set {du + w(u, v) : u ∈ V and (u, v) ∈ E}.
Therefore, it makes sense to maximize dv for all vertices v on a shortest path from s to t subject to these constraints, and maximizing dt achieves this goal.
This linear program has | V| variables dv, one for each vertex v ∈ V. It also has | E| + 1 constraints: one for each edge, plus the additional constraint that the source vertex’s shortest-path weight always has the
value 0.
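To see why maximizing works, it helps to check a concrete feasible point. The Bellman-Ford relaxation from Chapter 22 produces exactly the componentwise-largest feasible solution of this linear program. A quick sketch on a made-up graph (the vertices and weights are invented for illustration):

```python
# A made-up weighted digraph: edges (u, v, w(u, v)), source 's', sink 't'.
edges = [('s', 'a', 3), ('s', 'b', 5), ('a', 'b', 1), ('a', 't', 6), ('b', 't', 2)]
vertices = {'s', 'a', 'b', 't'}

# Bellman-Ford relaxation: after |V| - 1 passes over the edges, d[v] holds
# the shortest-path weight from s, which is the LP's optimal d_v.
INF = float('inf')
d = {v: INF for v in vertices}
d['s'] = 0
for _ in range(len(vertices) - 1):
    for u, v, w in edges:
        if d[u] + w < d[v]:
            d[v] = d[u] + w

# The computed values satisfy every LP constraint -- d_v <= d_u + w(u, v)
# for each edge, and d_s = 0 -- and increasing any d_v on a shortest path
# would violate a constraint, which is why the LP *maximizes* d_t.
feasible = all(d[v] <= d[u] + w for u, v, w in edges) and d['s'] == 0
```

Here `d['t']` comes out as 6 (the path s → a → b → t) and `feasible` is True.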
Maximum flow
Next, let’s express the maximum-flow problem as a linear program.
Recall that the input is a directed graph G = ( V, E) in which each edge ( u, v) ∈ E has a nonnegative capacity c( u, v) ≥ 0, and two distinguished vertices: a source s and a sink t. As defined in Section 24.1, a flow is a nonnegative real-valued function f : V × V → ℝ that satisfies the capacity constraint and flow conservation. A maximum flow is a flow
that satisfies these constraints and maximizes the flow value, which is
the total flow coming out of the source minus the total flow into the
source. A flow, therefore, satisfies linear constraints, and the value of a
flow is a linear function. Recalling also that we assume that c(u, v) = 0 if (u, v) ∉ E and that there are no antiparallel edges, the maximum-flow problem can be expressed as a linear program:

maximize   Σ_{v∈V} f_sv − Σ_{v∈V} f_vs   (29.25)

subject to

f_uv ≤ c(u, v)   for each u, v ∈ V,   (29.26)
Σ_{v∈V} f_vu = Σ_{v∈V} f_uv   for each u ∈ V − {s, t},   (29.27)
f_uv ≥ 0   for each u, v ∈ V.   (29.28)
This linear program has | V|2 variables, corresponding to the flow
between each pair of vertices, and it has 2 | V|2 + | V| − 2 constraints.
It is usually more efficient to solve a smaller-sized linear program.
The linear program in (29.25)–(29.28) has, for ease of notation, a flow
and capacity of 0 for each pair of vertices u, v with ( u, v) ∉ E. It is more efficient to rewrite the linear program so that it has O( V + E) constraints. Exercise 29.2-4 asks you to do so.
Minimum-cost flow
In this section, we have used linear programming to solve problems for
which we already knew efficient algorithms. In fact, an efficient
algorithm designed specifically for a problem, such as Dijkstra’s
algorithm for the single-source shortest-paths problem, will often be
more efficient than linear programming, both in theory and in practice.
The real power of linear programming comes from the ability to
solve new problems. Recall the problem faced by the politician in the
beginning of this chapter. The problem of obtaining a sufficient number
of votes, while not spending too much money, is not solved by any of
the algorithms that we have studied in this book, yet it can be solved by
linear programming. Books abound with such real-world problems that
linear programming can solve. Linear programming is also particularly
useful for solving variants of problems for which we may not already
know of an efficient algorithm.
Figure 29.3 (a) An example of a minimum-cost-flow problem. Capacities are denoted by c and costs by a. Vertex s is the source, and vertex t is the sink. The goal is to send 4 units of flow from s to t. (b) A solution to the minimum-cost flow problem in which 4 units of flow are sent from s to t. For each edge, the flow and capacity are written as flow/capacity.

Consider, for example, the following generalization of the maximum-
flow problem. Suppose that, in addition to a capacity c( u, v) for each edge ( u, v), you are given a real-valued cost a( u, v). As in the maximum-flow problem, assume that c( u, v) = 0 if ( u, v) ∉ E and that there are no antiparallel edges. If you send fuv units of flow over edge ( u, v), you incur a cost of a( u, v) · fuv. You are also given a flow demand d. You wish to send d units of flow from s to t while minimizing the total cost
∑( u, v)∈ E a( u, v) · fuv incurred by the flow. This problem is known as the minimum-cost-flow problem.
Figure 29.3(a) shows an example of the minimum-cost-flow problem,
with a goal of sending 4 units of flow from s to t while incurring the minimum total cost. Any particular legal flow, that is, a function f
satisfying constraints (29.26)–(29.28), incurs a total cost of
∑( u, v)∈ E a( u, v) · fuv. What is the particular 4-unit flow that minimizes this cost? Figure 29.3(b) shows an optimal solution, with total cost
∑( u, v)∈ E a( u, v) · fuv = (2 · 2) + (5 · 2) + (3 · 1) + (7 · 1) + (1 · 3) = 27.
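The objective being minimized is itself just a linear expression, so evaluating a candidate flow's cost is a one-liner. The numbers below reproduce the arithmetic for Figure 29.3(b); the edge names e1 through e5 are placeholders, since the figure's edges are not listed here:

```python
# Per-edge (cost a(u, v), flow f_uv) pairs matching the sum above; each
# term contributes a(u, v) * f_uv to the total cost of the flow.
edge_cost_flow = {'e1': (2, 2), 'e2': (5, 2), 'e3': (3, 1),
                  'e4': (7, 1), 'e5': (1, 3)}

total_cost = sum(a * fl for a, fl in edge_cost_flow.values())
```

`total_cost` evaluates to 27, matching the sum above.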
There are polynomial-time algorithms specifically designed for the
minimum-cost-flow problem, but they are beyond the scope of this
book. The minimum-cost-flow problem can be expressed as a linear
program, however. The linear program looks similar to the one for the
maximum-flow problem with the additional constraint that the value of
the flow must be exactly d units, and with the new objective function of
minimizing the cost:
minimize ∑( u, v)∈ E a( u, v) · fuv     (29.29)
subject to
fuv ≤ c( u, v) for each u, v ∈ V,
∑ v∈ V fvu − ∑ v∈ V fuv = 0 for each u ∈ V − { s, t},
∑ v∈ V fsv − ∑ v∈ V fvs = d,
fuv ≥ 0 for each u, v ∈ V.     (29.30)
Multicommodity flow
As a final example, let’s consider another flow problem. Suppose that
the Lucky Puck company from Section 24.1 decides to diversify its product line and ship not only hockey pucks, but also hockey sticks and
hockey helmets. Each piece of equipment is manufactured in its own
factory, has its own warehouse, and must be shipped, each day, from
factory to warehouse. The sticks are manufactured in Vancouver and
are needed in Saskatoon, and the helmets are manufactured in
Edmonton and must be shipped to Regina. The capacity of the shipping
network does not change, however, and the different items, or
commodities, must share the same network.
This example is an instance of a multicommodity-flow problem. The
input to this problem is once again a directed graph G = ( V, E) in which each edge ( u, v) ∈ E has a nonnegative capacity c( u, v) ≥ 0. As in the maximum-flow problem, implicitly assume that c( u, v) = 0 for ( u, v) ∉ E
and that the graph has no antiparallel edges. In addition, there are k
different commodities, K 1, K 2, … , Kk, with commodity i specified by the triple Ki = ( si, ti, di). Here, vertex si is the source of commodity i, vertex ti is the sink of commodity i, and di is the demand for commodity i, which is the desired flow value for the commodity from si to ti. We define a flow for commodity i, denoted by fi (so that fiuv is the flow of commodity i from vertex u to vertex v), to be a real-valued function that satisfies the flow-conservation and capacity constraints. We define fuv, the aggregate flow, to be the sum of the various commodity flows, so that
fuv = f 1 uv + f 2 uv + ⋯ + fkuv. The aggregate flow on edge ( u, v) must be no more
than the capacity of edge ( u, v). This problem has no objective function: the question is to determine whether such a flow exists. Thus the linear
program for this problem has a “null” objective function of minimizing 0, subject to the constraints that the aggregate flow fuv on each edge ( u, v) is at most c( u, v), that each commodity i obeys flow conservation at every vertex other than si and ti, that commodity i has flow value di from si to ti, and that each fiuv is nonnegative.
The only known polynomial-time algorithm for this problem expresses
it as a linear program and then solves it with a polynomial-time linear-
programming algorithm.
Exercises
29.2-1
Write out explicitly the linear program corresponding to finding the
shortest path from vertex s to vertex x in Figure 22.2(a) on page 609.
29.2-2
Given a graph G, write a linear program for the single-source shortest-
paths problem. The solution should have the property that dv is the
shortest-path weight from the source vertex s to v for each vertex v ∈ V.
29.2-3
Write out explicitly the linear program corresponding to finding the
maximum flow in Figure 24.1(a).
29.2-4
Rewrite the linear program for maximum flow (29.25)–(29.28) so that it
uses only O( V + E) constraints.
29.2-5
Write a linear program that, given a bipartite graph G = ( V, E), solves the maximum-bipartite-matching problem.
29.2-6
There can be more than one way to model a particular problem as a linear program. This exercise gives an alternative formulation for the
maximum-flow problem. Let P = { P 1, P 2, … , Pp} be the set of all possible directed simple paths from source s to sink t. Using decision variables x 1, … , xp, where xi is the amount of flow on path i, formulate a linear program for the maximum-flow problem. What is an upper
bound on p, the number of directed simple paths from s to t?
29.2-7
In the minimum-cost multicommodity-flow problem, the input is a
directed graph G = ( V, E) in which each edge ( u, v) ∈ E has a nonnegative capacity c( u, v) ≥ 0 and a cost a( u, v). As in the multicommodity-flow problem, there are k different commodities, K 1, K 2, … , Kk, with commodity i specified by the triple Ki = ( si, ti, di). We define the flow fi for commodity i and the aggregate flow fuv on edge ( u, v) as in the multicommodity-flow problem. A feasible flow is one in
which the aggregate flow on each edge ( u, v) is no more than the capacity of edge ( u, v). The cost of a flow is ∑ u, v∈ E a( u, v) · fuv, and the goal is to find the feasible flow of minimum cost. Express this problem
as a linear program.
29.3 Duality
We will now introduce a powerful concept called linear-programming
duality. In general, given a maximization problem, duality allows you to
formulate a related minimization problem that has the same optimal
objective value. The idea of duality is actually more general than linear
programming, but we restrict our attention to linear programming in
this section.
Duality enables us to prove that a solution is indeed optimal. We saw
an example of duality in Chapter 24 with Theorem 24.6, the max-flow min-cut theorem. Suppose that, given an instance of a maximum-flow
problem, you find a flow f with value | f|. How do you know whether f is a maximum flow? By the max-flow min-cut theorem, if you can find a
cut whose value is also | f|, then you have verified that f is indeed a maximum flow. This relationship provides an example of duality: given
a maximization problem, define a related minimization problem such
that the two problems have the same optimal objective values.
Given a linear program in standard form in which the objective is to
maximize, let’s see how to formulate a dual linear program in which the
objective is to minimize and whose optimal value is identical to that of
the original linear program. When referring to dual linear programs, we
call the original linear program the primal.
Given the primal linear program
maximize c T x     (29.31)
subject to
Ax ≤ b     (29.32)
x ≥ 0,     (29.33)
where A is an m × n matrix of coefficients, b is an m-vector, and c is an n-vector, its dual is
minimize b T y     (29.34)
subject to
A T y ≥ c     (29.35)
y ≥ 0.     (29.36)
Mechanically, to form the dual, change the maximization to a
minimization, exchange the roles of coefficients on the right-hand sides
and in the objective function, and replace each ≤ by ≥. Each of the m
constraints in the primal corresponds to a variable yi in the dual.
Likewise, each of the n constraints in the dual corresponds to a variable
xj in the primal. For example, consider the following primal linear
program:
maximize 3 x 1 + x 2 + 4 x 3     (29.37)
subject to
x 1 + x 2 + 3 x 3 ≤ 30     (29.38)
2 x 1 + 2 x 2 + 5 x 3 ≤ 24     (29.39)
4 x 1 + x 2 + 2 x 3 ≤ 36     (29.40)
x 1, x 2, x 3 ≥ 0.     (29.41)
Its dual is
minimize 30 y 1 + 24 y 2 + 36 y 3
subject to
y 1 + 2 y 2 + 4 y 3 ≥ 3
y 1 + 2 y 2 + y 3 ≥ 1
3 y 1 + 5 y 2 + 2 y 3 ≥ 4
y 1, y 2, y 3 ≥ 0.
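The mechanical recipe just described is easy to automate. The following sketch uses a hypothetical helper named `dual_of` (not part of the text) that transposes the constraint matrix and swaps the objective coefficients with the right-hand sides; the numbers are those of the example primal, with objective coefficients (3, 1, 4), constraint rows (1, 1, 3), (2, 2, 5), (4, 1, 2), and right-hand sides (30, 24, 36):

```python
def dual_of(c, A, b):
    """Form the dual of the standard-form primal
    'maximize c^T x subject to Ax <= b, x >= 0':
    the dual is 'minimize b^T y subject to A^T y >= c, y >= 0'.
    Returns the dual's (objective, constraint matrix, right-hand sides)."""
    A_T = [list(col) for col in zip(*A)]  # dual constraint j comes from column j of A
    return list(b), A_T, list(c)

# The example primal: maximize 3x1 + x2 + 4x3 subject to
# x1 + x2 + 3x3 <= 30, 2x1 + 2x2 + 5x3 <= 24, 4x1 + x2 + 2x3 <= 36.
obj, M, rhs = dual_of([3, 1, 4],
                      [[1, 1, 3], [2, 2, 5], [4, 1, 2]],
                      [30, 24, 36])
assert obj == [30, 24, 36]                      # minimize 30y1 + 24y2 + 36y3
assert M == [[1, 2, 4], [1, 2, 1], [3, 5, 2]]   # rows of A^T
assert rhs == [3, 1, 4]                         # dual right-hand sides
```

Each dual constraint j is built from column j of A, matching the correspondence between dual constraints and primal variables noted above.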
Although forming the dual can be considered a mechanical
operation, there is an intuitive explanation. Consider the primal
maximization problem (29.37)–(29.41). Each constraint gives an upper
bound on the objective function. In addition, if you take one or more
constraints and add together nonnegative multiples of them, you get a
valid constraint. For example, you can add constraints (29.38) and
(29.39) to obtain the constraint 3 x 1 + 3 x 2 + 8 x 3 ≤ 54. Any feasible solution to the primal must satisfy this new constraint, but there is
something else interesting about it. Comparing this new constraint to
the objective function (29.37), you can see that for each variable, the
corresponding coefficient is at least as large as the coefficient in the
objective function. Thus, since the variables x 1, x 2 and x 3 are nonnegative, we have that
3 x 1 + x 2 + 4 x 3 ≤ 3 x 1 + 3 x 2 + 8 x 3 ≤ 54, and so the solution value to the primal is at most 54. In other words,
adding these two constraints together has generated an upper bound on
the objective value.
In general, for any nonnegative multipliers y 1, y 2, and y 3, you can generate a constraint
y 1( x 1+ x 2+3 x 3)+ y 2(2 x 1+2 x 2+5 x 3)+ y 3(4 x 1+ x 2+2 x 3)
≤
30 y 1+24 y 2+36 y 3
from the primal constraints or, by distributing and regrouping,
( y 1+2 y 2+4 y 3) x 1+( y 1+2 y 2+ y 3) x 2+(3 y 1+5 y 2+2 y 3) x 3
≤
30 y 1+24 y 2+36 y 3.
Now, as long as this constraint has coefficients of x 1, x 2, and x 3 that are at least their objective-function coefficients, it is a valid upper
bound. That is, as long as
y 1 + 2 y 2 + 4 y 3 ≥ 3,
y 1 + 2 y 2 + y 3 ≥ 1,
3 y 1 + 5 y 2 + 2 y 3 ≥ 4,
you have a valid upper bound of 30 y 1+24 y 2+36 y 3. The multipliers y 1, y 2, and y 3 must be nonnegative, because otherwise you cannot combine
the inequalities. Of course, you would like the upper bound to be as
small as possible, and so you want to choose y to minimize 30 y 1 + 24 y 2
+ 36 y 3. Observe that we have just described the dual linear program as
the problem of finding the smallest possible upper bound on the primal.
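As a concrete illustration of this bounding view, here is a check (not from the text; the candidate solutions below were found by hand using exactly the multiplier reasoning above) that a primal feasible solution and a dual feasible solution with equal objective values bound each other, so neither can be improved:

```python
# The example primal: maximize 3x1 + x2 + 4x3 subject to Ax <= b, x >= 0.
A = [[1, 1, 3], [2, 2, 5], [4, 1, 2]]
b = [30, 24, 36]
c = [3, 1, 4]
x = [8.25, 0.0, 1.5]      # hand-found candidate primal solution
y = [0.0, 5 / 8, 7 / 16]  # hand-found candidate dual solution

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

# Primal feasibility: Ax <= b and x >= 0.
assert all(dot(row, x) <= bi for row, bi in zip(A, b)) and min(x) >= 0
# Dual feasibility: column j of A weighted by y is at least c[j], and y >= 0.
assert all(dot([row[j] for row in A], y) >= c[j] for j in range(3)) and min(y) >= 0
# Equal objective values: each solution certifies the other's optimality.
assert dot(c, x) == dot(b, y) == 30.75
```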
We’ll formalize this idea and show in Theorem 29.4 that, if the linear
program and its dual are feasible and bounded, then the optimal value
of the dual linear program is always equal to the optimal value of the
primal linear program. We begin by demonstrating weak duality, which
states that any feasible solution to the primal linear program has a value
no greater than that of any feasible solution to the dual linear program.
Lemma 29.1 (Weak linear-programming duality)
Let x be any feasible solution to the primal linear program in (29.31)–
(29.33), and let ӯ be any feasible solution to its dual linear program in
(29.34)–(29.36). Then
c T x ≤ b T ӯ.
Proof We have
c T x ≤ ( A T ӯ)T x (by the dual constraint (29.35), since x ≥ 0)
= ӯ T( Ax)
≤ ӯ T b (by the primal constraint (29.32), since ӯ ≥ 0)
= b T ӯ.
▪
Corollary 29.2
Let x be a feasible solution to the primal linear program in (29.31)–
(29.33), and let ӯ be a feasible solution to its dual linear program in (29.34)–(29.36). If
c T x = b T ӯ,
then x and ӯ are optimal solutions to the primal and dual linear programs, respectively.
Proof By Lemma 29.1, the objective value of a feasible solution to the
primal cannot exceed that of a feasible solution to the dual. The primal
linear program is a maximization problem and the dual is a
minimization problem. Thus, if feasible solutions x and ӯ have the same objective value, neither can be improved.
▪
We now show that, at optimality, the primal and dual objective
values are indeed equal. To prove linear programming duality, we will
require one lemma from linear algebra, known as Farkas’s lemma, the
proof of which Problem 29-4 asks you to provide. Farkas’s lemma can
take several forms, each of which is about when a set of linear inequalities
has a solution. In stating the lemma, we use m + 1 as a dimension because it matches our use below.
Lemma 29.3 (Farkas’s lemma)
Given M ∈ ℝ( m+1)× n and g ∈ ℝ m+1, exactly one of the following statements is true:
1. There exists v ∈ ℝ n such that Mv ≤ g, or
2. There exists w ∈ ℝ m+1 such that w ≥ 0, w T M = 0 (an n-vector of all zeros), and w Tg < 0.
▪
Theorem 29.4 (Linear-programming duality)
Given the primal linear program in (29.31)–(29.33) and its
corresponding dual in (29.34)–(29.36), if both are feasible and bounded,
then for optimal solutions x* and y*, we have c T x* = b T y*.
Proof Let μ = b T y* be the optimal value of the dual linear program given in (29.34)–(29.36). Consider an augmented set of primal
constraints in which we add a constraint to (29.31)–(29.33) that the
objective value is at least μ. We write out this augmented primal as: find x such that
Ax ≤ b,     (29.47)
c T x ≥ μ.     (29.48)
We can multiply (29.48) through by −1 and rewrite (29.47)–(29.48) as
Mx ≤ g,     (29.49)
where M denotes the ( m+1)× n matrix formed by appending the row −c T below A, x is an n-vector, and g denotes the ( m + 1)-vector formed by appending the entry −μ to b.
We claim that if there is a feasible solution x to the augmented
primal, then the theorem is proved. To establish this claim, observe that
x is also a feasible solution to the original primal and that it has objective value at least μ. We can then apply Lemma 29.1, which states
that the objective value of the primal is at most μ, to complete the proof
of the theorem.
It therefore remains to show that the augmented primal has a
feasible solution. Suppose, for the purpose of contradiction, that the
augmented primal is infeasible, which means that there is no v ∈ ℝ n
such that Mv ≤ g. We can apply Farkas’s lemma, Lemma 29.3, to
inequality (29.49) with this choice of M and g.
Because the augmented primal is infeasible, condition 1 of Farkas’s
lemma does not hold. Therefore, condition 2 must apply, so that there
must exist a w ∈ ℝ m+1 such that w ≥ 0, w T M = 0, and w T g < 0. Let’s write w as
w = ( ӯ T λ)T
for some ӯ ∈ ℝ m and λ ∈ ℝ, where ӯ ≥ 0 and λ ≥ 0.
Substituting for w, M, and g in condition 2 gives
( ӯ T λ) M = 0 and ( ӯ T λ) g < 0.
Unpacking the matrix notation gives
ӯ T A − λ c T = 0 and ӯ T b − λμ < 0.     (29.50)
We now show that the requirements in (29.50) contradict the
assumption that μ is the optimal solution value for the dual linear
program. We consider two cases.
The first case is when λ = 0. In this case, (29.50) simplifies to
ӯ T A = 0 and ӯ T b < 0.     (29.51)
We’ll now construct a dual feasible solution y′ with an objective value
smaller than b T y*. Set y′ = y* + ϵ ӯ, for any ϵ > 0. Since y′T A = ( y* + ϵ ӯ)T A
= y*T A + ϵ ӯ T A
= y*T A
(by (29.51))
≥ c T
(because y* is feasible),
y′ is feasible. Now consider the objective value
b T y′ = b T( y* + ϵ ӯ)
= b T y* + ϵ b T ӯ
< b T y*,
where the last inequality follows because ϵ > 0 and, by (29.51), b T ӯ =
ӯ T b < 0, so that ϵ b T ӯ < 0. Thus we have a feasible dual solution of
value less than μ, which contradicts μ being the optimal objective value.
We now consider the second case, where λ > 0. In this case, we can
take (29.50) and divide through by λ to obtain
( ӯ/ λ)T A − c T = 0 and ( ӯ/ λ)T b − μ < 0.     (29.52)
Now set y′ = ӯ/ λ in (29.52), giving
y′T A = c T and y′T b < μ.
Thus, y′ is a feasible dual solution with objective value strictly less than
μ, a contradiction. We conclude that the augmented primal has a
feasible solution, and the theorem is proved.
▪
Fundamental theorem of linear programming
We conclude this chapter by stating the fundamental theorem of linear
programming, which extends Theorem 29.4 to cover the cases in which
the linear program may be infeasible or unbounded. Exercise 29.3-8 asks you
to provide the proof.
Theorem 29.5 (Fundamental theorem of linear programming)
Any linear program, given in standard form, either
1. has an optimal solution with a finite objective value,
2. is infeasible, or
3. is unbounded.
▪
Exercises
29.3-1
Formulate the dual of the linear program given in lines (29.6)–(29.10)
on page 852.
29.3-2
You have a linear program that is not in standard form. You could
produce the dual by first converting it to standard form, and then
taking the dual. It would be more convenient, however, to produce the
dual directly. Explain how to directly take the dual of an arbitrary linear
program.
29.3-3
Write down the dual of the maximum-flow linear program, as given in
lines (29.25)–(29.28) on page 862. Explain how to interpret this
formulation as a minimum-cut problem.
29.3-4
Write down the dual of the minimum-cost-flow linear program, as given
in lines (29.29)–(29.30) on page 864. Explain how to interpret this
problem in terms of graphs and flows.
29.3-5
Show that the dual of the dual of a linear program is the primal linear
program.
29.3-6
Which result from Chapter 24 can be interpreted as weak duality for the maximum-flow problem?
29.3-7
Consider the following 1-variable primal linear program:
maximize tx
subject to
rx ≤ s
x ≥ 0,
where r, s, and t are arbitrary real numbers. State for which values of r, s, and t you can assert that
1. Both the primal linear program and its dual have optimal
solutions with finite objective values.
2. The primal is feasible, but the dual is infeasible.
3. The dual is feasible, but the primal is infeasible.
4. Neither the primal nor the dual is feasible.
29.3-8
Prove the fundamental theorem of linear programming, Theorem 29.5.
Problems
29-1 Linear-inequality feasibility
Given a set of m linear inequalities on n variables x 1, x 2, … , xn, the linear-inequality feasibility problem asks whether there is a setting of the
variables that simultaneously satisfies each of the inequalities.
a. Given an algorithm for the linear-programming problem, show how
to use it to solve a linear-inequality feasibility problem. The number
of variables and constraints that you use in the linear-programming
problem should be polynomial in n and m.
b. Given an algorithm for the linear-inequality feasibility problem, show
how to use it to solve a linear-programming problem. The number of
variables and linear inequalities that you use in the linear-inequality
feasibility problem should be polynomial in n and m, the number of
variables and constraints in the linear program.
29-2 Complementary slackness
Complementary slackness describes a relationship between the values of
primal variables and dual constraints and between the values of dual
variables and primal constraints. Let x be a feasible solution to the primal linear program given in (29.31)–(29.33), and let ӯ be a feasible
solution to the dual linear program given in (29.34)–(29.36).
Complementary slackness states that the following conditions are
necessary and sufficient for x and ӯ to be optimal: for each j = 1, 2, … , n,
xj > 0 implies ( A T ӯ)j = cj,
and for each i = 1, 2, … , m,
ӯi > 0 implies ( Ax)i = bi.
a. Verify that complementary slackness holds for the linear program in
lines (29.37)–(29.41).
b. Prove that complementary slackness holds for any primal linear
program and its corresponding dual.
c. Prove that a feasible solution x to a primal linear program given in lines (29.31)–(29.33) is optimal if and only if there exist values ӯ =
( ӯ 1, ӯ 2, … , ӯm) such that
1. ӯ is a feasible solution to the dual linear program given in
(29.34)–(29.36),
2. ( A T ӯ)j = cj for all j such that xj > 0, and
3. ӯi = 0 for all i such that ( Ax)i < bi.
29-3 Integer linear programming
An integer linear-programming problem is a linear-programming
problem with the additional constraint that the variables x must take on
integer values. Exercise 34.5-3 on page 1098 shows that just determining
whether an integer linear program has a feasible solution is NP-hard,
which means that there is no known polynomial-time algorithm for this
problem.
a. Show that weak duality (Lemma 29.1) holds for an integer linear
program.
b. Show that duality (Theorem 29.4) does not always hold for an integer linear program.
c. Given a primal linear program in standard form, let P be the optimal objective value for the primal linear program, D be the optimal
objective value for its dual, IP be the optimal objective value for the
integer version of the primal (that is, the primal with the added
constraint that the variables take on integer values), and ID be the
optimal objective value for the integer version of the dual. Assuming
that both the primal integer program and the dual integer program
are feasible and bounded, show that
IP ≤ P = D ≤ ID.
29-4 Farkas’s lemma
Prove Farkas’s lemma, Lemma 29.3.
29-5 Minimum-cost circulation
This problem considers a variant of the minimum-cost-flow problem
from Section 29.2 in which there is no demand, source, or sink. Instead, the input, as before, contains a flow network, capacity constraints c( u, v), and edge costs a( u, v). A flow is feasible if it satisfies the capacity constraint on every edge and flow conservation at every vertex. The goal
is to find, among all feasible flows, the one of minimum cost. We call
this problem the minimum-cost-circulation problem.
a. Formulate the minimum-cost-circulation problem as a linear
program.
b. Suppose that for all edges ( u, v) ∈ E, we have a( u, v) > 0. What does an optimal solution to the minimum-cost-circulation problem look
like?
c. Formulate the maximum-flow problem as a minimum-cost-circulation
problem linear program. That is, given a maximum-flow problem
instance G = ( V, E) with source s, sink t and edge capacities c, create a minimum-cost-circulation problem by giving a (possibly different)
network G′ = ( V′, E′) with edge capacities c′ and edge costs a′ such
that you can derive a solution to the maximum-flow problem from a
solution to the minimum-cost-circulation problem.
d. Formulate the single-source shortest-path problem as a minimum-
cost-circulation problem linear program.
Chapter notes
This chapter only begins to study the wide field of linear programming.
A number of books are devoted exclusively to linear programming,
including those by Chvátal [94], Gass [178], Karloff [246], Schrijver
[398], and Vanderbei [444]. Many other books give a good coverage of linear programming, including those by Papadimitriou and Steiglitz
[353] and Ahuja, Magnanti, and Orlin [7]. The coverage in this chapter draws on the approach taken by Chvátal.
The simplex algorithm for linear programming was invented by G.
Dantzig in 1947. Shortly after, researchers discovered how to formulate
a number of problems in a variety of fields as linear programs and solve
them with the simplex algorithm. As a result, applications of linear
programming flourished, along with several algorithms. Variants of the
simplex algorithm remain the most popular methods for solving linear-
programming problems. This history appears in a number of places,
including the notes in [94] and [246].
The ellipsoid algorithm was the first polynomial-time algorithm for
linear programming and is due to L. G. Khachian in 1979. It was based
on earlier work by N. Z. Shor, D. B. Judin, and A. S. Nemirovskii.
Grötschel, Lovász, and Schrijver [201] describe how to use the ellipsoid algorithm to solve a variety of problems in combinatorial optimization.
To date, the ellipsoid algorithm does not appear to be competitive with
the simplex algorithm in practice.
Karmarkar’s paper [247] includes a description of the first interior-point algorithm. Many subsequent researchers designed interior-point
algorithms. Good surveys appear in the article of Goldfarb and Todd
[189] and the book by Ye [463].
Analysis of the simplex algorithm remains an active area of research.
V. Klee and G. J. Minty constructed an example on which the simplex
algorithm runs through 2 n − 1 iterations. The simplex algorithm usually performs well in practice, and many researchers have tried to give
theoretical justification for this empirical observation. A line of research
begun by K. H. Borgwardt, and carried on by many others, shows that
under certain probabilistic assumptions on the input, the simplex
algorithm converges in expected polynomial time. Spielman and Teng
[421] made progress in this area, introducing the “smoothed analysis of algorithms” and applying it to the simplex algorithm.
The simplex algorithm is known to run efficiently in certain special
cases. Particularly noteworthy is the network-simplex algorithm, which
is the simplex algorithm, specialized to network-flow problems. For
certain network problems, including the shortest-paths, maximum-flow,
and minimum-cost-flow problems, variants of the network-simplex
algorithm run in polynomial time. See, for example, the article by Orlin
[349] and the citations therein.
1 An intuitive definition of a convex region is that for any two points in the region, all points on the line segment between them are also in the region.
30 Polynomials and the FFT
The straightforward method of adding two polynomials of degree n
takes Θ( n) time, but the straightforward method of multiplying them
takes Θ( n 2) time. This chapter will show how the fast Fourier transform,
or FFT, can reduce the time to multiply polynomials to Θ( n lg n).
The most common use for Fourier transforms, and hence the FFT, is
in signal processing. A signal is given in the time domain: as a function
mapping time to amplitude. Fourier analysis expresses the signal as a
weighted sum of phase-shifted sinusoids of varying frequencies. The
weights and phases associated with the frequencies characterize the
signal in the frequency domain. Among the many everyday applications
of FFTs are compression techniques used to encode digital video and
audio information, including MP3 files. Many fine books delve into the
rich area of signal processing, and the chapter notes reference a few of
them.
Polynomials
A polynomial in the variable x over an algebraic field F represents a function A( x) as a formal sum:
A( x) = a 0 + a 1 x + a 2 x 2 + ⋯ + an−1 xn−1.
The values a 0, a 1, … , an−1 are the coefficients of the polynomial. The coefficients and x are drawn from a field F, typically the set ℂ of complex numbers. A polynomial A( x) has degree k if its highest nonzero
coefficient is ak, in which case we say that degree( A) = k. Any integer strictly greater than the degree of a polynomial is a degree-bound of that
polynomial. Therefore, the degree of a polynomial of degree-bound n
may be any integer between 0 and n − 1, inclusive.
A variety of operations extend to polynomials. For polynomial
addition, if A( x) and B( x) are polynomials of degree-bound n, their sum is a polynomial C( x), also of degree-bound n, such that C( x) =
A( x)+ B( x) for all x in the underlying field. That is, if
A( x) = a 0 + a 1 x + ⋯ + an−1 xn−1 and B( x) = b 0 + b 1 x + ⋯ + bn−1 xn−1,
then
C( x) = c 0 + c 1 x + ⋯ + cn−1 xn−1,
where cj = aj + bj for j = 0, 1, … , n − 1. For example, given the polynomials A( x) = 6 x 3 + 7 x 2 − 10 x + 9 and B( x) = −2 x 3 + 4 x − 5, their sum is C( x) = 4 x 3 + 7 x 2 − 6 x + 4.
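In coefficient form this addition is just a componentwise sum. A minimal sketch (a hypothetical helper named `polynomial_add`; coefficient vectors are listed lowest order first):

```python
def polynomial_add(a, b):
    """Add two polynomials of equal degree-bound given as coefficient
    vectors (lowest-order coefficient first): c_j = a_j + b_j."""
    return [aj + bj for aj, bj in zip(a, b)]

# A(x) = 6x^3 + 7x^2 - 10x + 9 and B(x) = -2x^3 + 4x - 5 from the text:
assert polynomial_add([9, -10, 7, 6], [-5, 4, 0, -2]) == [4, -6, 7, 4]
```

The result encodes C(x) = 4x^3 + 7x^2 - 6x + 4, matching the example above.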
For polynomial multiplication, if A( x) and B( x) are polynomials of degree-bound n, their product C( x) is a polynomial of degree-bound 2 n
− 1 such that C( x) = A( x) B( x) for all x in the underlying field. You probably have multiplied polynomials before, by multiplying each term
in A( x) by each term in B( x) and then combining terms with equal powers. For example, multiplying A( x) = 6 x 3 + 7 x 2 − 10 x + 9 and B( x) = −2 x 3 + 4 x − 5 this way yields
C( x) = −12 x 6 − 14 x 5 + 44 x 4 − 20 x 3 − 75 x 2 + 86 x − 45.
Another way to express the product C( x) is
C( x) = c 0 + c 1 x + ⋯ + c 2 n−2 x 2 n−2,     (30.1)
where
cj = ∑ k=0 to j ak bj−k for j = 0, 1, … , 2 n − 2.     (30.2)
(By the definition of degree, ak = 0 for all k > degree( A) and bk = 0 for all k > degree( B).) If A is a polynomial of degree-bound n a and B is a polynomial of degree-bound nb, then C must be a polynomial of degree-bound n a + nb − 1, because degree( C) = degree( A) + degree( B). Since a polynomial of degree-bound k is also a polynomial of degree-bound k +
1, we normally make the somewhat simpler statement that the product
polynomial C is a polynomial of degree-bound n a + nb.
Chapter outline
Section 30.1 presents two ways to represent polynomials: the coefficient representation and the point-value representation. The straightforward
method for multiplying polynomials of degree n—equations (30.1) and
(30.2)—takes Θ( n 2) time with polynomials represented in coefficient
form, but only Θ( n) time with point-value form. Converting between the
two representations, however, reduces the time to multiply polynomials
to just Θ( n lg n). To see why this approach works, you must first
understand complex roots of unity, which Section 30.2 covers. Section
30.2 then uses the FFT and its inverse to perform the conversions.
Because the FFT is used so often in signal processing, it is often
implemented as a circuit in hardware, and Section 30.3 illustrates the structure of such circuits.
This chapter relies on complex numbers, and within this chapter the
symbol i denotes √−1 exclusively.
30.1 Representing polynomials
The coefficient and point-value representations of polynomials are in a
sense equivalent: a polynomial in point-value form has a unique
counterpart in coefficient form. This section introduces the two
representations and shows how to combine them in order to multiply
two degree-bound n polynomials in Θ( n lg n) time.
Coefficient representation
A coefficient representation of a polynomial A( x) = a 0 + a 1 x + ⋯ + an−1 xn−1 of degree-bound n is a vector of coefficients a = ( a 0, a 1, … , an−1). Matrix equations in this chapter generally treat vectors as column vectors.
The coefficient representation is convenient for certain operations on
polynomials. For example, the operation of evaluating the polynomial
A( x) at a given point x 0 consists of computing the value of A( x 0). To evaluate a polynomial in Θ( n) time, use Horner’s rule:
A( x 0) = a 0 + x 0( a 1 + x 0( a 2 + ⋯ + x 0( an−2 + x 0 an−1) ⋯)).
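Horner's rule uses one multiplication and one addition per coefficient. A minimal sketch (a hypothetical helper named `horner_eval`):

```python
def horner_eval(a, x0):
    """Evaluate A(x0) in Theta(n) time by Horner's rule, where
    a = (a[0], a[1], ..., a[n-1]) is the coefficient vector of A."""
    result = 0
    for coeff in reversed(a):  # fold in a[n-1], a[n-2], ..., a[0]
        result = result * x0 + coeff
    return result

# A(x) = 6x^3 + 7x^2 - 10x + 9 evaluated at x0 = 2 gives 65.
assert horner_eval([9, -10, 7, 6], 2) == 65
```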
Similarly, adding two polynomials represented by the coefficient vectors
a = ( a 0, a 1, … , an−1) and b = ( b 0, b 1, … , bn−1) takes Θ( n) time: just produce the coefficient vector c = ( c 0, c 1, … , cn−1), where cj = aj + bj for j = 0, 1, … , n− 1.
Now, consider multiplying two degree-bound n polynomials A( x) and B( x) represented in coefficient form. The method described by equations (30.1) and (30.2) takes Θ( n 2) time, since it multiplies each coefficient in the vector a by each coefficient in the vector b. The operation of multiplying polynomials in coefficient form seems to be
considerably more difficult than that of evaluating a polynomial or
adding two polynomials. The resulting coefficient vector c, given by
equation (30.2), is also called the convolution of the input vectors a and b, denoted c = a ⊗ b. Since multiplying polynomials and computing convolutions are fundamental computational problems of considerable
practical importance, this chapter concentrates on efficient algorithms
for them.
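The Θ( n 2) coefficient-form method is a direct double loop over the terms of the convolution. A minimal sketch (a hypothetical helper named `polynomial_multiply`):

```python
def polynomial_multiply(a, b):
    """Convolve coefficient vectors a and b in Theta(n^2) time:
    c[j] = sum over k of a[k] * b[j - k]."""
    c = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            c[j + k] += aj * bk
    return c

# The text's example: (6x^3 + 7x^2 - 10x + 9)(-2x^3 + 4x - 5).
assert polynomial_multiply([9, -10, 7, 6], [-5, 4, 0, -2]) == \
    [-45, 86, -75, -20, 44, -14, -12]
```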
Point-value representation
A point-value representation of a polynomial A( x) of degree-bound n is a set of n point-value pairs

{( x 0, y 0), ( x 1, y 1), … , ( xn−1, yn−1)}
such that all of the xk are distinct and
yk = A( xk)     (30.3)
for k = 0, 1, … , n − 1. A polynomial has many different point-value
representations, since any set of n distinct points x 0, x 1, … , xn−1 can serve as a basis for the representation.
Computing a point-value representation for a polynomial given in
coefficient form is in principle straightforward, since all you have to do
is select n distinct points x 0, x 1, … , xn−1 and then evaluate A( xk) for k
= 0, 1, … , n − 1. With Horner’s method, evaluating a polynomial at n
points takes Θ( n 2) time. We’ll see later that if you choose the points xk
cleverly, you can accelerate this computation to run in Θ( n lg n) time.
The inverse of evaluation—determining the coefficient form of a
polynomial from a point-value representation—is interpolation. The
following theorem shows that interpolation is well defined when the
desired interpolating polynomial must have a degree-bound equal to the
given number of point-value pairs.
Theorem 30.1 (Uniqueness of an interpolating polynomial)
For any set {( x 0, y 0), ( x 1, y 1), … , ( xn−1, yn−1)} of n point-value pairs such that all the xk values are distinct, there is a unique polynomial A( x) of degree-bound n such that yk = A( xk) for k = 0, 1, … , n − 1.
Proof The proof relies on the existence of the inverse of a certain matrix. Equation (30.3) is equivalent to the matrix equation
V( x 0, x 1, … , xn−1) a = y,     (30.4)
where V( x 0, x 1, … , xn−1) denotes the n × n matrix whose entry in row k and column j is xkj. This matrix is known as a Vandermonde matrix. By Problem D-1 on page 1223, this matrix
has determinant
∏ 0≤ j< k≤ n−1 ( xk − xj),
and therefore, by Theorem D.5 on page 1221, it is invertible (that is,
nonsingular) if the xk are distinct. To solve for the coefficients aj uniquely given the point-value representation, use the inverse of the
Vandermonde matrix:
a = V( x 0, x 1, … , xn−1)−1 y.
▪
The proof of Theorem 30.1 describes an algorithm for interpolation
based on solving the set (30.4) of linear equations. Section 28.1 shows how to solve these equations in O( n 3) time.
A faster algorithm for n-point interpolation is based on Lagrange’s
formula:
A( x) = ∑ k=0 to n−1 yk ∏ j≠ k ( x − xj)/( xk − xj).     (30.5)
You might want to verify that the right-hand side of equation (30.5) is a
polynomial of degree-bound n that satisfies A( xk) = yk for all k.
Exercise 30.1-5 asks you how to compute the coefficients of A using
Lagrange’s formula in Θ( n 2) time.
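As a sketch of how Lagrange's formula works, here is a hypothetical helper named `lagrange_interpolate` that evaluates the interpolating polynomial at a point (rather than producing its coefficients, which Exercise 30.1-5 addresses):

```python
from math import prod

def lagrange_interpolate(points, x):
    """Evaluate at x the unique polynomial of degree-bound n passing
    through the n point-value pairs in `points` (distinct x-values),
    using Lagrange's formula."""
    return sum(
        yk * prod((x - xj) / (xk - xj) for xj, _ in points if xj != xk)
        for xk, yk in points
    )

# Three points on A(x) = x^2; the interpolant at x = 3 gives 9.
assert abs(lagrange_interpolate([(0, 0), (1, 1), (2, 4)], 3) - 9) < 1e-9
```

Each of the n summands takes Θ( n) time to evaluate, so evaluating the formula at one point takes Θ( n 2) time in total.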
Thus, n-point evaluation and interpolation are well-defined inverse
operations that transform between the coefficient representation of a
polynomial and a point-value representation.1 The algorithms described above for these problems take Θ( n 2) time.
The point-value representation is quite convenient for many
operations on polynomials. For addition, if C( x) = A( x) + B( x), then C( xk) = A( xk) + B( xk) for any point xk. More precisely, given point-value representations for A,
{( x 0, y 0), ( x 1, y 1), … , ( xn−1, yn−1)},
and for B,
{( x 0, y′ 0), ( x 1, y′ 1), … , ( xn−1, y′n−1)},
where A and B are evaluated at the same n points, then a point-value representation for C is
{( x 0, y 0 + y′ 0), ( x 1, y 1 + y′ 1), … , ( xn−1, yn−1 + y′n−1)}.
Thus the time to add two polynomials of degree-bound n in point-value
form is Θ( n).
Similarly, the point-value representation is convenient for
multiplying polynomials. If C( x) = A( x) B( x), then C( xk) = A( xk) B( xk) for any point xk, and to obtain a point-value representation for C, just pointwise multiply a point-value representation for A by a point-value
representation for B. Polynomial multiplication differs from polynomial
addition in one key aspect, however: degree( C) = degree( A) + degree( B), so that if A and B have degree-bound n, then C has degree-bound 2 n. A standard point-value representation for A and B consists of n point-value pairs for each polynomial. Multiplying these together gives n
point-value pairs, but 2 n pairs are necessary to interpolate a unique polynomial C of degree-bound 2 n. (See Exercise 30.1-4.) Instead, begin
with “extended” point-value representations for A and for B consisting
of 2 n point-value pairs each. Given an extended point-value
representation for A,
{( x 0, y 0), ( x 1, y 1), … , ( x 2 n−1, y 2 n−1)},
and a corresponding extended point-value representation for B,
{( x 0, y′ 0), ( x 1, y′ 1), … , ( x 2 n−1, y′ 2 n−1)},
then a point-value representation for C is
{( x 0, y 0 y′ 0), ( x 1, y 1 y′ 1), … , ( x 2 n−1, y 2 n−1 y′ 2 n−1)}.
Given two input polynomials in extended point-value form, multiplying
them to obtain the point-value form of the result takes just Θ( n) time,
much less than the Θ( n 2) time required to multiply polynomials in
coefficient form.
Finally, let’s consider how to evaluate a polynomial given in point-
value form at a new point. For this problem, the simplest approach
known is to first convert the polynomial to coefficient form and then
evaluate it at the new point.
Fast multiplication of polynomials in coefficient form
Can the linear-time multiplication method for polynomials in point-
value form expedite polynomial multiplication in coefficient form? The
answer hinges on whether it is possible to convert a polynomial quickly
from coefficient form to point-value form (evaluate) and vice versa
(interpolate).
Figure 30.1 A graphical outline of an efficient polynomial-multiplication process.
Representations on the top are in coefficient form, and those on the bottom are in point-value form. The arrows from left to right correspond to the multiplication operation. The ω_{2n} terms are complex (2n)th roots of unity.
Any points can serve as evaluation points, but certain evaluation
points allow conversion between representations in only Θ( n lg n) time.
As we’ll see in Section 30.2, if “complex roots of unity” are the evaluation points, then the discrete Fourier transform (or DFT)
evaluates and the inverse DFT interpolates. Section 30.2 shows how the FFT accomplishes the DFT and inverse DFT operations in Θ( n lg n) time.
Figure 30.1 shows this strategy graphically. One minor detail concerns degree-bounds. The product of two polynomials of degree-bound n is a polynomial of degree-bound 2 n. Before evaluating the input polynomials A and B, therefore, first double their degree-bounds
to 2n by adding n high-order coefficients of 0. Because the vectors have 2n elements, use “complex (2n)th roots of unity,” which are denoted by
the ω_{2n} terms in Figure 30.1.
The following procedure takes advantage of the FFT to multiply two
polynomials A(x) and B(x) of degree-bound n in Θ(n lg n) time, where the input and output representations are in coefficient form. The
procedure assumes that n is an exact power of 2; if it isn’t, just add
high-order zero coefficients.
1. Double degree-bound: Create coefficient representations of A( x) and B( x) as degree-bound 2 n polynomials by adding n high-order zero coefficients to each.
2. Evaluate: Compute point-value representations of A( x) and B( x) of length 2 n by applying the FFT of order 2 n on each
polynomial. These representations contain the values of the two
polynomials at the (2 n)th roots of unity.
3. Pointwise multiply: Compute a point-value representation for the
polynomial C( x) = A( x) B( x) by multiplying these values together pointwise. This representation contains the value of C( x) at each
(2 n)th root of unity.
4. Interpolate: Create the coefficient representation of the
polynomial C( x) by applying the FFT on 2 n point-value pairs to
compute the inverse DFT.
Steps (1) and (3) take Θ( n) time, and steps (2) and (4) take Θ( n lg n) time. Thus, once we show how to use the FFT, we will have proven the
following.
Theorem 30.2
Two polynomials of degree-bound n with both the input and output
representations in coefficient form can be multiplied in Θ( n lg n) time.
▪
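The four-step procedure sketches readily in Python. The code below is an illustration rather than the book’s pseudocode: the names fft, inverse_fft, and multiply are ours, and the inverse DFT is obtained by the standard conjugation trick rather than the inverse-DFT formula derived in Section 30.2.

```python
import cmath

def fft(a):
    """Recursive FFT of a coefficient vector whose length is a power of 2."""
    n = len(a)
    if n == 1:
        return a[:]
    even = fft(a[0::2])
    odd = fft(a[1::2])
    y = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(2j * cmath.pi * k / n) * odd[k]
        y[k] = even[k] + twiddle
        y[k + n // 2] = even[k] - twiddle
    return y

def inverse_fft(y):
    """Inverse DFT via conjugation: conj(FFT(conj(y))) / n."""
    n = len(y)
    return [v.conjugate() / n for v in fft([v.conjugate() for v in y])]

def multiply(a, b):
    """Multiply two coefficient vectors of equal power-of-2 length n
    by the four steps: double degree-bound, evaluate, pointwise
    multiply, interpolate."""
    n = len(a)
    a2 = a + [0] * n                      # step 1: degree-bound 2n
    b2 = b + [0] * n
    ya, yb = fft(a2), fft(b2)             # step 2: evaluate at (2n)th roots
    yc = [u * v for u, v in zip(ya, yb)]  # step 3: pointwise multiply
    c = inverse_fft(yc)                   # step 4: interpolate
    return [round(v.real) for v in c]     # integer inputs give integer output

# (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
print(multiply([1, 2], [3, 4]))  # → [3, 10, 8, 0]
```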
Exercises
30.1-1

Multiply the polynomials A( x) = 7 x 3 − x 2 + x − 10 and B( x) = 8 x 3 −
6 x + 3 using equations (30.1) and (30.2).
30.1-2
Another way to evaluate a polynomial A( x) of degree-bound n at a given point x 0 is to divide A( x) by the polynomial ( x − x 0), obtaining a quotient polynomial q( x) of degree-bound n − 1 and a remainder r, such that
A( x) = q( x)( x − x 0) + r.
Then we have A( x 0) = r. Show how to compute the remainder r and the coefficients of q( x) from x 0 and the coefficients of A in Θ( n) time.
30.1-3
Given a polynomial
, define
. Show how
to derive a point-value representation for A rev( x) from a point-value representation for A( x), assuming that none of the points is 0.
30.1-4
Prove that n distinct point-value pairs are necessary to uniquely specify
a polynomial of degree-bound n, that is, if fewer than n distinct point-value pairs are given, they fail to specify a unique polynomial of degree-
bound n. ( Hint: Using Theorem 30.1, what can you say about a set of n
− 1 point-value pairs to which you add one more arbitrarily chosen
point-value pair?)
30.1-5
Show how to use equation (30.5) to interpolate in Θ( n 2) time. ( Hint: First compute the coefficient representation of the polynomial ∏ j( x −
xj) and then divide by ( x − xk) as necessary for the numerator of each term (see Exercise 30.1-2). You can compute each of the n denominators
in O( n) time.)
30.1-6
Explain what is wrong with the “obvious” approach to polynomial division using a point-value representation: dividing the corresponding
y values. Discuss separately the case in which the division comes out exactly and the case in which it doesn’t.
30.1-7
Consider two sets A and B, each having n integers in the range from 0 to 10 n. The Cartesian sum of A and B is defined by
C = { x + y : x ∈ A and y ∈ B}.
The integers in C lie in the range from 0 to 20 n. Show how, in O( n lg n) time, to find the elements of C and the number of times each element of
C is realized as a sum of elements in A and B. ( Hint: Represent A and B
as polynomials of degree at most 10 n.)
In Section 30.1, we claimed that by computing the DFT and its inverse by using the FFT, it is possible to evaluate and interpolate a polynomial of degree-bound n at the complex roots of unity in Θ(n lg n) time. This section defines complex roots of unity, studies their properties, defines the DFT,
and then shows how the FFT computes the DFT and its inverse in Θ(n lg n) time.
Complex roots of unity
A complex nth root of unity is a complex number ω such that

ω^n = 1.

There are exactly n complex nth roots of unity: e^{2πik/n} for k = 0, 1, … , n − 1. To interpret this formula, use the definition of the exponential of a complex number:

e^{iu} = cos(u) + i sin(u).
Figure 30.2 shows that the n complex roots of unity are equally spaced around the circle of unit radius centered at the origin of the complex








plane.

Figure 30.2 The values of ω_8^0, ω_8^1, …, ω_8^7 in the complex plane, where ω_8 = e^{2πi/8} is the principal 8th root of unity.

The value

ω_n = e^{2πi/n}     (30.6)

is the principal nth root of unity.2 All other complex nth roots of unity are powers of ω_n.
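These facts are easy to check numerically. A small Python sketch using the standard cmath module (the variable names are ours) verifies that the powers of ω_8 lie on the unit circle, that each is an 8th root of unity, and that ω_8² = i, the principal 4th root of unity:

```python
import cmath

w8 = cmath.exp(2j * cmath.pi / 8)  # principal 8th root of unity

for k in range(8):
    root = w8 ** k
    assert abs(abs(root) - 1) < 1e-12  # lies on the unit circle
    assert abs(root ** 8 - 1) < 1e-12  # is an 8th root of unity

# omega_8 squared is the principal 4th root of unity, i
print(abs(w8 ** 2 - 1j) < 1e-12)  # → True
```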
The n complex nth roots of unity,

ω_n^0, ω_n^1, …, ω_n^{n−1},

form a group under multiplication (see Section 31.3). This group has the same structure as the additive group (ℤn, +) modulo n, since ω_n^n = ω_n^0 = 1 implies that ω_n^j ω_n^k = ω_n^{(j+k) mod n}. Similarly, ω_n^{−1} = ω_n^{n−1}. The following
lemmas furnish some essential properties of the complex nth roots of unity.
Lemma 30.3 (Cancellation lemma)
For any integers n > 0, k ≥ 0, and d > 0,

ω_{dn}^{dk} = ω_n^k .

Proof The lemma follows directly from equation (30.6), since

ω_{dn}^{dk} = (e^{2πi/(dn)})^{dk} = e^{2πik/n} = ω_n^k .
▪
Corollary 30.4
For any even integer n > 0,

ω_n^{n/2} = ω_2 = −1.
Proof The proof is left as Exercise 30.2-1.
▪
Lemma 30.5 (Halving lemma)
If n > 0 is even, then the squares of the n complex n th roots of unity are the n/2 complex ( n/2)th roots of unity.
Proof By the cancellation lemma, (ω_n^k)² = ω_{n/2}^k for any nonnegative integer k. Squaring all of the complex nth roots of unity produces each (n/2)th root of unity exactly twice, since

(ω_n^{k+n/2})² = ω_n^{2k+n} = ω_n^{2k} ω_n^n = ω_n^{2k} = (ω_n^k)² .

Thus ω_n^k and ω_n^{k+n/2} have the same square. We could also have used
Corollary 30.4 to prove this property, since ω_n^{n/2} = −1 implies ω_n^{k+n/2} = −ω_n^k, and thus (ω_n^{k+n/2})² = (ω_n^k)².
▪
As we’ll see, the halving lemma is essential to the divide-and-conquer
approach for converting between coefficient and point-value
representations of polynomials, since it guarantees that the recursive
subproblems are only half as large.
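The halving lemma is also easy to confirm numerically. The short Python check below (our own sketch, for the arbitrary choice n = 8) verifies that squaring the eight 8th roots of unity yields each 4th root of unity exactly twice:

```python
import cmath

n = 8
roots_n = [cmath.exp(2j * cmath.pi * k / n) for k in range(n)]
roots_half = [cmath.exp(2j * cmath.pi * k / (n // 2)) for k in range(n // 2)]

# Count how many nth roots square to each (n/2)th root.
for r in roots_half:
    matches = sum(1 for w in roots_n if abs(w * w - r) < 1e-9)
    assert matches == 2  # each (n/2)th root is hit exactly twice
print("halving lemma verified for n =", n)
```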
Lemma 30.6 (Summation lemma)
For any integer n ≥ 1 and nonzero integer k not divisible by n,

Σ_{j=0}^{n−1} (ω_n^k)^j = 0 .

Proof Equation (A.6) on page 1142 applies to complex values as well
as to reals, giving

Σ_{j=0}^{n−1} (ω_n^k)^j = ((ω_n^k)^n − 1)/(ω_n^k − 1)
                        = ((ω_n^n)^k − 1)/(ω_n^k − 1)
                        = (1^k − 1)/(ω_n^k − 1)
                        = 0 .

To see that the denominator is not 0, note that ω_n^k = 1 only when k is
divisible by n, which the lemma statement prohibits.
▪
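The summation lemma can be checked numerically in the same spirit. This short Python sketch (our own, for n = 8) verifies that the n powers of ω_n^k sum to 0 for every nonzero k not divisible by n:

```python
import cmath

n = 8
for k in range(1, n):  # every nonzero k not divisible by n
    s = sum(cmath.exp(2j * cmath.pi * k / n) ** j for j in range(n))
    assert abs(s) < 1e-9  # the n powers of omega_n^k sum to 0
print("summation lemma verified for n =", n)
```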
The DFT
Recall the goal of evaluating a polynomial

A(x) = Σ_{j=0}^{n−1} a_j x^j

of degree-bound n at ω_n^0, ω_n^1, …, ω_n^{n−1} (that is, at the n complex nth roots
of unity).3 The polynomial A is given in coefficient form: a = (a0, a1, … , an−1). Let us define the results yk, for k = 0, 1, … , n − 1, by

y_k = A(ω_n^k) = Σ_{j=0}^{n−1} a_j ω_n^{kj} .

The vector y = (y0, y1, … , yn−1) is the discrete Fourier transform (DFT) of the coefficient vector a = (a0, a1, … , an−1). We also write y = DFT_n(a).
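Computing y directly from this definition takes Θ(n²) time, since each of the n values y_k is a sum of n terms. The following Python sketch (the name naive_dft is ours) is a direct transcription, useful as a reference for checking a fast implementation:

```python
import cmath

def naive_dft(a):
    """Compute y_k = sum_j a_j * omega_n^(k*j) directly from the
    definition of the DFT; Theta(n^2) time."""
    n = len(a)
    wn = cmath.exp(2j * cmath.pi / n)  # principal nth root of unity
    return [sum(a[j] * wn ** (k * j) for j in range(n)) for k in range(n)]

# DFT of (1, 0, 0, 0): A(x) = 1 evaluates to 1 at every root of unity.
y = naive_dft([1, 0, 0, 0])
print([round(v.real) for v in y])  # → [1, 1, 1, 1]
```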
The FFT
The fast Fourier transform (FFT) takes advantage of the special
properties of the complex roots of unity to compute DFT_n(a) in Θ(n lg n) time, as opposed to the Θ(n²) time of the straightforward method.



Assume throughout that n is an exact power of 2. Although strategies
for dealing with sizes that are not exact powers of 2 are known, they are
beyond the scope of this book.
The FFT method employs a divide-and-conquer strategy, using the
even-indexed and odd-indexed coefficients of A(x) separately to define
the two new polynomials A even(x) and A odd(x) of degree-bound n/2:

A even(x) = a0 + a2 x + a4 x² + ⋯ + an−2 x^{n/2−1},
A odd(x) = a1 + a3 x + a5 x² + ⋯ + an−1 x^{n/2−1}.

Note that A even contains all the even-indexed coefficients of A (the binary representation of the index ends in 0) and A odd contains all the
odd-indexed coefficients (the binary representation of the index ends in
1). It follows that

A(x) = A even(x²) + x A odd(x²),     (30.9)

so that the problem of evaluating A(x) at ω_n^0, ω_n^1, …, ω_n^{n−1} reduces to

1. evaluating the degree-bound n/2 polynomials A even(x) and A odd(x) at the points

(ω_n^0)², (ω_n^1)², …, (ω_n^{n−1})²,     (30.10)

and then

2. combining the results according to equation (30.9).
By the halving lemma, the list of values (30.10) consists not of n
distinct values but only of the n/2 complex ( n/2)th roots of unity, with each root occurring exactly twice. Therefore, the FFT recursively
evaluates the polynomials A even and A odd of degree-bound n/2 at the n/2 complex ( n/2)th roots of unity. These subproblems have exactly the
same form as the original problem, but are half the size, dividing an n-
element DFT n computation into two n/2-element DFT n/2
computations. This decomposition is the basis for the FFT procedure





on the next page, which computes the DFT of an n-element vector a =
( a 0, a 1, … , an−1), where n is an exact power of 2.
The FFT procedure works as follows. Lines 1–2 represent the base
case of the recursion. The DFT of 1 element is the element itself, since
in this case

y_0 = a_0 ω_1^0 = a_0 · 1 = a_0.
Lines 5–6 define the coefficient vectors for the polynomials A even and
A odd. Lines 3, 4, and 12 guarantee that ω is updated properly so that
whenever lines 10–11 are executed, ω = ω_n^k. (Keeping a running value of ω
from iteration to iteration saves time over computing ω_n^k from scratch
each time through the for loop.4) Lines 7–8 perform the recursive DFT_{n/2} computations, setting, for k = 0, 1, … , n/2 − 1,

y_k^{even} = A even(ω_{n/2}^k),
y_k^{odd} = A odd(ω_{n/2}^k),
FFT(a, n)
 1  if n == 1
 2      return a                        // DFT of 1 element is the element itself
 3  ω_n = e^{2πi/n}
 4  ω = 1
 5  a even = (a0, a2, … , an−2)
 6  a odd = (a1, a3, … , an−1)
 7  y even = FFT(a even, n/2)
 8  y odd = FFT(a odd, n/2)
 9  for k = 0 to n/2 − 1
        // at this point, ω = ω_n^k
10      y_k = y_k^{even} + ω y_k^{odd}
11      y_{k+n/2} = y_k^{even} − ω y_k^{odd}
12      ω = ω ω_n
13  return y









or, since ω_{n/2}^k = ω_n^{2k} by the cancellation lemma,

y_k^{even} = A even(ω_n^{2k}),
y_k^{odd} = A odd(ω_n^{2k}).
Lines 10–11 combine the results of the recursive DFT_{n/2} calculations.
For the first n/2 results y0, y1, … , y_{n/2−1}, line 10 yields

y_k = y_k^{even} + ω_n^k y_k^{odd}
    = A even(ω_n^{2k}) + ω_n^k A odd(ω_n^{2k})
    = A(ω_n^k)     (by equation (30.9)).

For y_{n/2}, y_{n/2+1}, … , y_{n−1}, letting k = 0, 1, … , n/2 − 1, line 11 yields

y_{k+n/2} = y_k^{even} − ω_n^k y_k^{odd}
          = y_k^{even} + ω_n^{k+n/2} y_k^{odd}                 (since ω_n^{k+n/2} = −ω_n^k)
          = A even(ω_n^{2k}) + ω_n^{k+n/2} A odd(ω_n^{2k})
          = A even(ω_n^{2k+n}) + ω_n^{k+n/2} A odd(ω_n^{2k+n})   (since ω_n^{2k+n} = ω_n^{2k})
          = A(ω_n^{k+n/2})     (by equation (30.9)).

Thus the vector y returned by FFT is indeed the DFT of the input
vector a.
Lines 10 and 11 multiply each value y_k^{odd} by ω_n^k, for k = 0, 1, … , n/2 −
1. Line 10 adds this product to y_k^{even}, and line 11 subtracts it. Because
each factor appears in both its positive and negative forms, we call the
factors twiddle factors.
To determine the running time of the procedure FFT, note that
exclusive of the recursive calls, each invocation takes Θ( n) time, where n
is the length of the input vector. The recurrence for the running time is
therefore
T( n) = 2 T( n/2) + Θ( n)
= Θ( n lg n),
by case 2 of the master theorem (Theorem 4.1). Thus the FFT can
evaluate a polynomial of degree-bound n at the complex n th roots of unity in Θ( n lg n) time.