22.5 Proofs of shortest-paths properties

Throughout this chapter, our correctness arguments have relied on the

triangle inequality, upper-bound property, no-path property,

convergence property, path-relaxation property, and predecessor-

subgraph property. We stated these properties without proof on page

611. In this section, we prove them.

The triangle inequality

In studying breadth-first search (Section 20.2), we proved as Lemma 20.1 a simple property of shortest distances in unweighted graphs. The

triangle inequality generalizes the property to weighted graphs.

Lemma 22.10 (Triangle inequality)

Let G = ( V, E) be a weighted, directed graph with weight function w : E

→ ℝ and source vertex s. Then, for all edges ( u, v) ∈ E, δ( s, v) ≤ δ( s, u) + w( u, v).

Proof Suppose that p is a shortest path from source s to vertex v. Then p has no more weight than any other path from s to v. Specifically, path p has no more weight than the particular path that takes a shortest path

from source s to vertex u and then takes edge ( u, v).

Exercise 22.5-3 asks you to handle the case in which there is no

shortest path from s to v.

Effects of relaxation on shortest-path estimates

The next group of lemmas describes how shortest-path estimates are

affected by executing a sequence of relaxation steps on the edges of a

weighted, directed graph that has been initialized by INITIALIZE-

SINGLE-SOURCE.
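The lemmas below reason about exactly two operations: INITIALIZE-SINGLE-SOURCE and RELAX. As a concrete reference point, here is a minimal Python sketch of both (the dict-based representation of d, π, and edge weights is an assumption of this sketch, not part of the text):

```python
import math

def initialize_single_source(vertices, s):
    # Every estimate v.d starts at infinity and every predecessor
    # v.pi at NIL (None); the source s gets s.d = 0.
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    return d, pi

def relax(u, v, w_uv, d, pi):
    # Tighten the estimate for v via edge (u, v) if that improves it.
    if d[v] > d[u] + w_uv:
        d[v] = d[u] + w_uv
        pi[v] = u
```

Note that relax can only decrease d values, which is the fact the upper-bound property rests on.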

Lemma 22.11 (Upper-bound property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ. Let s ∈ V be the source vertex, and let the graph be initialized by INITIALIZE-SINGLE-SOURCE(G, s). Then, v.d ≥ δ(s, v) for all v ∈ V, and this invariant is maintained over any sequence of relaxation steps on the edges of G. Moreover, once v.d achieves its lower bound δ(s, v), it never changes.

Proof We prove the invariant v.d ≥ δ(s, v) for all vertices v ∈ V by induction over the number of relaxation steps.

For the base case, v.d ≥ δ(s, v) holds after initialization, since if v.d = ∞, then v.d ≥ δ(s, v) for all v ∈ V − {s}, and since s.d = 0 ≥ δ(s, s). (Note that δ(s, s) = −∞ if s is on a negative-weight cycle and that δ(s, s) = 0 otherwise.)

For the inductive step, consider the relaxation of an edge (u, v). By the inductive hypothesis, x.d ≥ δ(s, x) for all x ∈ V prior to the relaxation. The only d value that may change is v.d. If it changes, we have

v.d = u.d + w(u, v)
    ≥ δ(s, u) + w(u, v)   (by the inductive hypothesis)
    ≥ δ(s, v)             (by the triangle inequality),

and so the invariant is maintained.

The value of v.d never changes once v.d = δ( s, v) because, having achieved its lower bound, v.d cannot decrease since we have just shown

that v.d ≥ δ( s, v), and it cannot increase because relaxation steps do not increase d values.

Corollary 22.12 (No-path property)

Suppose that in a weighted, directed graph G = (V, E) with weight function w : E → ℝ, no path connects a source vertex s ∈ V to a given vertex v ∈ V. Then, after the graph is initialized by INITIALIZE-SINGLE-SOURCE(G, s), we have v.d = δ(s, v) = ∞, and this equation is maintained as an invariant over any sequence of relaxation steps on the edges of G.

Proof By the upper-bound property, we always have ∞ = δ( s, v) ≤ v.d, and thus v.d = ∞ = δ( s, v).

Lemma 22.13

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ, and let (u, v) ∈ E. Then, immediately after edge (u, v) is relaxed by a call of RELAX(u, v, w), we have v.d ≤ u.d + w(u, v).

Proof If, just prior to relaxing edge (u, v), we have v.d > u.d + w(u, v), then v.d = u.d + w(u, v) afterward. If, instead, v.d ≤ u.d + w(u, v) just before the relaxation, then neither u.d nor v.d changes, and so v.d ≤ u.d + w(u, v) afterward.

Lemma 22.14 (Convergence property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ, let s ∈ V be a source vertex, and let s ⇝ u → v be a shortest path in G for some vertices u, v ∈ V. Suppose that G is initialized by INITIALIZE-SINGLE-SOURCE(G, s) and then a sequence of relaxation steps that includes the call RELAX(u, v, w) is executed on the edges of G. If u.d = δ(s, u) at any time prior to the call, then v.d = δ(s, v) at all times after the call.

Proof By the upper-bound property, if u.d = δ(s, u) at some point prior to relaxing edge (u, v), then this equation holds thereafter. In particular, after edge (u, v) is relaxed, we have

v.d ≤ u.d + w(u, v)     (by Lemma 22.13)
    = δ(s, u) + w(u, v)
    = δ(s, v)           (by Lemma 22.1 on page 606).

The upper-bound property gives v.d ≥ δ( s, v), from which we conclude that v.d = δ( s, v), and this equation is maintained thereafter.

Lemma 22.15 (Path-relaxation property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ, and let s ∈ V be a source vertex. Consider any shortest path p = 〈v0, v1, …, vk〉 from s = v0 to vk. If G is initialized by INITIALIZE-SINGLE-SOURCE(G, s) and then a sequence of relaxation steps occurs that includes, in order, relaxing the edges (v0, v1), (v1, v2), …, (vk−1, vk), then vk.d = δ(s, vk) after these relaxations and at all times afterward. This property holds no matter what other edge relaxations occur, including relaxations that are intermixed with relaxations of the edges of p.

Proof We show by induction that after the ith edge of path p is relaxed, we have vi.d = δ(s, vi). For the base case, i = 0, and before any edges of p have been relaxed, we have from the initialization that v0.d = s.d = 0 = δ(s, s). By the upper-bound property, the value of s.d never changes after initialization.

For the inductive step, assume that vi−1.d = δ(s, vi−1). What happens when edge (vi−1, vi) is relaxed? By the convergence property, after this relaxation, we have vi.d = δ(s, vi), and this equation is maintained at all times thereafter.
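The property can be watched in miniature. In the following throwaway Python sketch (the three-vertex graph is invented for illustration), the shortest path is 〈s, a, b〉, and even with other relaxations intermixed, relaxing (s, a) and then (a, b) in order leaves b.d = δ(s, b):

```python
import math

def relax(u, v, weight, d):
    if d[v] > d[u] + weight:
        d[v] = d[u] + weight

# Weighted edges: the shortest path from s to b is s -> a -> b.
w = {("s", "a"): 1, ("a", "b"): 2, ("s", "b"): 10}
d = {"s": 0, "a": math.inf, "b": math.inf}

relax("s", "b", w[("s", "b")], d)  # an intermixed relaxation
relax("s", "a", w[("s", "a")], d)  # first edge of p, in order
relax("s", "b", w[("s", "b")], d)  # intermixed again
relax("a", "b", w[("a", "b")], d)  # second edge of p, in order

print(d["b"])  # 3, which is delta(s, b)
```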

Relaxation and shortest-paths trees

We now show that once a sequence of relaxations has caused the

shortest-path estimates to converge to shortest-path weights, the

predecessor subgraph G π induced by the resulting π values is a shortest-

paths tree for G. We start with the following lemma, which shows that

the predecessor subgraph always forms a rooted tree whose root is the

source.

Lemma 22.16

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ, let s ∈ V be a source vertex, and assume that G contains no negative-weight cycles that are reachable from s. Then, after the graph is initialized by INITIALIZE-SINGLE-SOURCE(G, s), the predecessor subgraph Gπ forms a rooted tree with root s, and any sequence of relaxation steps on edges of G maintains this property as an invariant.

Proof Initially, the only vertex in Gπ is the source vertex, and the lemma is trivially true. Consider a predecessor subgraph Gπ that arises after a sequence of relaxation steps. We first prove that Gπ is acyclic. Suppose for the sake of contradiction that some relaxation step creates a cycle in the graph Gπ. Let the cycle be c = 〈v0, v1, …, vk〉, where vk = v0. Then, vi.π = vi−1 for i = 1, 2, …, k and, without loss of generality, assume that relaxing edge (vk−1, vk) created the cycle in Gπ.


We claim that all vertices on cycle c are reachable from the source vertex s. Why? Each vertex on c has a non-NIL predecessor, and so each vertex on c was assigned a finite shortest-path estimate when it was assigned its non-NIL π value. By the upper-bound property, each vertex on cycle c has a finite shortest-path weight, which means that it is reachable from s.

We’ll examine the shortest-path estimates on cycle c immediately before the call RELAX(vk−1, vk, w) and show that c is a negative-weight cycle, thereby contradicting the assumption that G contains no negative-weight cycles that are reachable from the source. Just before the call, we have vi.π = vi−1 for i = 1, 2, …, k − 1. Thus, for i = 1, 2, …, k − 1, the last update to vi.d was by the assignment vi.d = vi−1.d + w(vi−1, vi). If vi−1.d changed since then, it decreased. Therefore, just before the call RELAX(vk−1, vk, w), we have

vi.d ≥ vi−1.d + w(vi−1, vi)   for i = 1, 2, …, k − 1.   (22.11)

Because vk.π is changed by the call RELAX(vk−1, vk, w), immediately beforehand we also have the strict inequality

vk.d > vk−1.d + w(vk−1, vk).

Summing this strict inequality with the k − 1 inequalities (22.11), we obtain the sum of the shortest-path estimates around cycle c:

∑_{i=1}^{k} vi.d > ∑_{i=1}^{k} (vi−1.d + w(vi−1, vi))
                 = ∑_{i=1}^{k} vi−1.d + ∑_{i=1}^{k} w(vi−1, vi).

But

∑_{i=1}^{k} vi.d = ∑_{i=1}^{k} vi−1.d,

since each vertex in the cycle c appears exactly once in each summation. This equation implies

0 > ∑_{i=1}^{k} w(vi−1, vi).

Thus, the sum of weights around the cycle c is negative, which provides the desired contradiction.

Figure 22.9 Showing that a simple path in Gπ from source vertex s to vertex v is unique. If Gπ contains two paths p1 (s ⇝ u ⇝ x → z ⇝ v) and p2 (s ⇝ u ⇝ y → z ⇝ v), where x ≠ y, then z.π = x and z.π = y, a contradiction.

We have now proven that Gπ is a directed, acyclic graph. To show that it forms a rooted tree with root s, it suffices (see Exercise B.5-2 on page 1175) to prove that for each vertex v ∈ Vπ, there is a unique simple path from s to v in Gπ.

The vertices in Vπ are those with non-NIL π values, plus s. Exercise 22.5-6 asks you to prove that a path exists in Gπ from s to each vertex in Vπ.

To complete the proof of the lemma, we now show that for any vertex v ∈ Vπ, the graph Gπ contains at most one simple path from s to v. Suppose otherwise. That is, suppose that, as Figure 22.9 illustrates, Gπ contains two simple paths from s to some vertex v: p1, which we decompose into s ⇝ u ⇝ x → z ⇝ v, and p2, which we decompose into s ⇝ u ⇝ y → z ⇝ v, where x ≠ y (though u could be s and z could be v).

But then, z.π = x and z.π = y, which implies the contradiction that x =

y. We conclude that G π contains a unique simple path from s to v, and thus G π forms a rooted tree with root s.

We can now show that if all vertices have been assigned their true

shortest-path weights after a sequence of relaxation steps, then the


predecessor subgraph G π is a shortest-paths tree.

Lemma 22.17 (Predecessor-subgraph property)

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ, let s ∈ V be a source vertex, and assume that G contains no negative-weight cycles that are reachable from s. Then, after a call to INITIALIZE-SINGLE-SOURCE(G, s) followed by any sequence of relaxation steps on edges of G that produces v.d = δ(s, v) for all v ∈ V, the predecessor subgraph Gπ is a shortest-paths tree rooted at s.

Proof We must prove that the three properties of shortest-paths trees given on page 608 hold for Gπ. To show the first property, we must show that Vπ is the set of vertices reachable from s. By definition, a shortest-path weight δ(s, v) is finite if and only if v is reachable from s, and thus the vertices that are reachable from s are exactly those with finite d values. But a vertex v ∈ V − {s} has been assigned a finite value for v.d if and only if v.π ≠ NIL, since both assignments occur in RELAX. Thus, the vertices in Vπ are exactly those reachable from s.

The second property, that G π forms a rooted tree with root s, follows

directly from Lemma 22.16.

It remains, therefore, to prove the last property of shortest-paths trees: for each vertex v ∈ Vπ, the unique simple path s ⇝ v in Gπ is a shortest path from s to v in G. Let p = 〈v0, v1, …, vk〉, where v0 = s and vk = v. Consider an edge (vi−1, vi) in path p. Because this edge belongs to Gπ, the last relaxation that changed vi.d must have been of this edge.

After that relaxation, we had vi.d = vi−1.d + w(vi−1, vi). Subsequently, an edge entering vi−1 could have been relaxed, causing vi−1.d to decrease further, but without changing vi.d. Therefore, we have vi.d ≥ vi−1.d + w(vi−1, vi). Thus, for i = 1, 2, …, k, we have both vi.d = δ(s, vi) and vi.d ≥ vi−1.d + w(vi−1, vi), which together imply w(vi−1, vi) ≤ δ(s, vi) − δ(s, vi−1). Summing the weights along path p yields

w(p) = ∑_{i=1}^{k} w(vi−1, vi)
     ≤ ∑_{i=1}^{k} (δ(s, vi) − δ(s, vi−1))
     = δ(s, vk) − δ(s, v0)    (because the sum telescopes).

Thus, we have w(p) ≤ δ(s, vk), since δ(s, v0) = δ(s, s) = 0. Because δ(s, vk) is a lower bound on the weight of any path from s to vk, we conclude that w(p) = δ(s, vk), and thus p is a shortest path from s to v = vk.

Exercises

22.5-1

Give two shortest-paths trees for the directed graph of Figure 22.2 on page 609 other than the two shown.

22.5-2

Give an example of a weighted, directed graph G = ( V, E) with weight function w : E → ℝ and source vertex s such that G satisfies the following property: For every edge ( u, v) ∈ E, there is a shortest-paths tree rooted at s that contains ( u, v) and another shortest-paths tree rooted at s that does not contain ( u, v).

22.5-3

Modify the proof of Lemma 22.10 to handle cases in which shortest-

path weights are ∞ or −∞.

22.5-4

Let G = ( V, E) be a weighted, directed graph with source vertex s, and let G be initialized by INITIALIZE-SINGLE-SOURCE( G, s). Prove that if a sequence of relaxation steps sets s.π to a non-NIL value, then G

contains a negative-weight cycle.

22.5-5

Let G = (V, E) be a weighted, directed graph with no negative-weight edges. Let s ∈ V be the source vertex, and suppose that v.π is allowed to be the predecessor of v on any shortest path to v from source s if v ∈ V − {s} is reachable from s, and NIL otherwise. Give an example of such a graph G and an assignment of π values that produces a cycle in Gπ. (By Lemma 22.16, such an assignment cannot be produced by a sequence of relaxation steps.)

22.5-6

Let G = (V, E) be a weighted, directed graph with weight function w : E → ℝ and no negative-weight cycles. Let s ∈ V be the source vertex, and let G be initialized by INITIALIZE-SINGLE-SOURCE(G, s). Use induction to prove that for every vertex v ∈ Vπ, there exists a path from s to v in Gπ and that this property is maintained as an invariant over any sequence of relaxations.

22.5-7

Let G = (V, E) be a weighted, directed graph that contains no negative-weight cycles. Let s ∈ V be the source vertex, and let G be initialized by INITIALIZE-SINGLE-SOURCE(G, s). Prove that there exists a sequence of |V| − 1 relaxation steps that produces v.d = δ(s, v) for all v ∈ V.

22.5-8

Let G be an arbitrary weighted, directed graph with a negative-weight

cycle reachable from the source vertex s. Show how to construct an

infinite sequence of relaxations of the edges of G such that every

relaxation causes a shortest-path estimate to change.

Problems

22-1 Yen’s improvement to Bellman-Ford

The Bellman-Ford algorithm does not specify the order in which to

relax edges in each pass. Consider the following method for deciding

upon the order. Before the first pass, assign an arbitrary linear order v1, v2, …, v|V| to the vertices of the input graph G = (V, E). Then partition the edge set E into Ef ∪ Eb, where Ef = {(vi, vj) ∈ E : i < j} and Eb = {(vi, vj) ∈ E : i > j}. (Assume that G contains no self-loops, so that every edge belongs to either Ef or Eb.) Define Gf = (V, Ef) and Gb = (V, Eb).

a. Prove that Gf is acyclic with topological sort 〈v1, v2, …, v|V|〉 and that Gb is acyclic with topological sort 〈v|V|, v|V|−1, …, v1〉.

Suppose that each pass of the Bellman-Ford algorithm relaxes edges in

the following way. First, visit each vertex in the order v 1, v 2, … , v| V|, relaxing edges of Ef that leave the vertex. Then visit each vertex in the

order v| V|, v| V|−1, …, v 1, relaxing edges of Eb that leave the vertex.
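One pass of this ordering can be sketched as follows (illustrative Python; storing the linear order as a list and the edge sets Ef and Eb as adjacency dicts are assumptions of the sketch):

```python
def yen_pass(order, Ef, Eb, w, d):
    # Forward sweep: visit v_1, ..., v_|V|, relaxing the edges of Ef
    # that leave each vertex.
    for u in order:
        for v in Ef.get(u, []):
            if d[v] > d[u] + w[(u, v)]:
                d[v] = d[u] + w[(u, v)]
    # Backward sweep: visit v_|V|, ..., v_1, relaxing the edges of Eb.
    for u in reversed(order):
        for v in Eb.get(u, []):
            if d[v] > d[u] + w[(u, v)]:
                d[v] = d[u] + w[(u, v)]
```

Each sweep relaxes every edge of an acyclic subgraph in a topologically sorted order of that subgraph, which is the structure part (b) exploits.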

b. Prove that with this scheme, if G contains no negative-weight cycles that are reachable from the source vertex s, then after only ⌈|V|/2⌉ passes over the edges, v.d = δ(s, v) for all vertices v ∈ V.

c. Does this scheme improve the asymptotic running time of the

Bellman-Ford algorithm?

22-2 Nesting boxes

A d-dimensional box with dimensions (x1, x2, …, xd) nests within another box with dimensions (y1, y2, …, yd) if there exists a permutation π on {1, 2, …, d} such that xπ(1) < y1, xπ(2) < y2, …, xπ(d) < yd.

a. Argue that the nesting relation is transitive.

b. Describe an efficient method to determine whether one d-dimensional box nests inside another.

c. You are given a set of n d-dimensional boxes {B1, B2, …, Bn}. Give an efficient algorithm to find the longest sequence 〈Bi1, Bi2, …, Bik〉 of boxes such that Bij nests within Bij+1 for j = 1, 2, …, k − 1. Express the running time of your algorithm in terms of n and d.


22-3 Arbitrage

Arbitrage is the use of discrepancies in currency exchange rates to

transform one unit of a currency into more than one unit of the same

currency. For example, suppose that one U.S. dollar buys 64 Indian

rupees, one Indian rupee buys 1:8 Japanese yen, and one Japanese yen

buys 0:009 U.S. dollars. Then, by converting currencies, a trader can

start with 1 U.S. dollar and buy 64 × 1.8 × 0.009 = 1.0368 U.S. dollars,

thus turning a profit of 3.68%.
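The arithmetic of the example can be checked directly (a throwaway sketch):

```python
# One dollar converted USD -> INR -> JPY -> USD at the quoted rates.
rates = [64, 1.8, 0.009]
dollars = 1.0
for r in rates:
    dollars *= r
print(round(dollars, 4))  # 1.0368, i.e., a 3.68% profit
```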

Suppose that you are given n currencies c 1, c 2, … , cn and an n × n table R of exchange rates, such that 1 unit of currency ci buys R[ i, j]

units of currency cj.

a. Give an efficient algorithm to determine whether there exists a sequence of currencies 〈ci1, ci2, …, cik〉 such that

R[i1, i2] · R[i2, i3] ⋯ R[ik−1, ik] · R[ik, i1] > 1.

Analyze the running time of your algorithm.

b. Give an efficient algorithm to print out such a sequence if one exists.

Analyze the running time of your algorithm.

22-4 Gabow’s scaling algorithm for single-source shortest paths

A scaling algorithm solves a problem by initially considering only the highest-order bit of each relevant input value, such as an edge weight,

assuming that these values are nonnegative integers. The algorithm then

refines the initial solution by looking at the two highest-order bits. It

progressively looks at more and more high-order bits, refining the

solution each time, until it has examined all bits and computed the

correct solution.

This problem examines an algorithm for computing the shortest

paths from a single source by scaling edge weights. The input is a

directed graph G = (V, E) with nonnegative integer edge weights w. Let W = max {w(u, v) : (u, v) ∈ E} be the maximum weight of any edge. In this problem, you will develop an algorithm that runs in O(E lg W) time. Assume that all vertices are reachable from the source.

The scaling algorithm uncovers the bits in the binary representation

of the edge weights one at a time, from the most significant bit to the

least significant bit. Specifically, let k = ⌈lg(W + 1)⌉ be the number of bits in the binary representation of W, and for i = 1, 2, …, k, let wi(u, v) = ⌊w(u, v)/2^(k−i)⌋. That is, wi(u, v) is the “scaled-down” version of w(u, v) given by the i most significant bits of w(u, v). (Thus, wk(u, v) = w(u, v) for all (u, v) ∈ E.) For example, if k = 5 and w(u, v) = 25, which has the binary representation 〈11001〉, then w3(u, v) = 〈110〉 = 6. Also with k = 5, if w(u, v) = 〈00100〉 = 4, then w4(u, v) = 〈0010〉 = 2. Define δi(u, v) as the shortest-path weight from vertex u to vertex v using weight function wi, so that δk(u, v) = δ(u, v) for all u, v ∈ V. For a given source vertex s, the scaling algorithm first computes the shortest-path weights δ1(s, v) for all v ∈ V, then computes δ2(s, v) for all v ∈ V, and so on, until it computes δk(s, v) for all v ∈ V. Assume throughout that |E| ≥ |V| − 1. You will show how to compute δi from δi−1 in O(E) time, so that the entire algorithm takes O(kE) = O(E lg W) time.
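Since wi(u, v) keeps only the i most significant of the k bits, it is a right shift by k − i. A quick check of the two worked examples above (assuming nonnegative integer weights):

```python
def scaled_weight(w, k, i):
    # w_i(u, v) = floor(w(u, v) / 2^(k-i)): drop the k - i low-order bits.
    return w >> (k - i)

k = 5
print(scaled_weight(25, k, 3))  # <11001> -> <110> = 6
print(scaled_weight(4, k, 4))   # <00100> -> <0010> = 2
print(scaled_weight(25, k, k))  # w_k is the original weight: 25
```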

a. Suppose that for all vertices v ∈ V, we have δ(s, v) ≤ |E|. Show how to compute δ(s, v) for all v ∈ V in O(E) time.

b. Show how to compute δ1(s, v) for all v ∈ V in O(E) time.

Now focus on computing δi from δi−1.

c. Prove that for i = 2, 3, …, k, either wi(u, v) = 2wi−1(u, v) or wi(u, v) = 2wi−1(u, v) + 1. Then prove that

2δi−1(s, v) ≤ δi(s, v) ≤ 2δi−1(s, v) + |V| − 1

for all v ∈ V.

d. Define, for i = 2, 3, …, k and all (u, v) ∈ E,

ŵi(u, v) = wi(u, v) + 2δi−1(s, u) − 2δi−1(s, v).


Prove that for i = 2, 3, …, k and all u, v ∈ V, the “reweighted” value ŵi(u, v) of edge (u, v) is a nonnegative integer.

e. Now define δ̂i(s, v) as the shortest-path weight from s to v using the weight function ŵi. Prove that for i = 2, 3, …, k and all v ∈ V, we have δ̂i(s, v) ≤ |E|, and that δi(s, v) = δ̂i(s, v) + 2δi−1(s, v).

f. Show how to compute δi(s, v) from δi−1(s, v) for all v ∈ V in O(E) time. Conclude that you can compute δ(s, v) for all v ∈ V in O(E lg W) time.

22-5 Karp’s minimum mean-weight cycle algorithm

Let G = (V, E) be a directed graph with weight function w : E → ℝ, and let n = |V|. We define the mean weight of a cycle c = 〈e1, e2, …, ek〉 of edges in E to be

μ(c) = (1/k) ∑_{i=1}^{k} w(ei).

Let μ* = min {μ(c) : c is a directed cycle in G}. We call a cycle c for which μ(c) = μ* a minimum mean-weight cycle. This problem investigates an efficient algorithm for computing μ*.
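The mean weight of a cycle is simply the average of its edge weights; a trivial sketch (the example cycle is invented):

```python
def mean_weight(cycle_edge_weights):
    # mu(c) = (1/k) * sum of w(e_i) over the k edges of cycle c.
    return sum(cycle_edge_weights) / len(cycle_edge_weights)

print(mean_weight([3, -1, 1, 1]))  # (3 - 1 + 1 + 1) / 4 = 1.0
```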

Assume without loss of generality that every vertex v ∈ V is reachable from a source vertex s ∈ V. Let δ(s, v) be the weight of a shortest path from s to v, and let δk(s, v) be the weight of a shortest path from s to v consisting of exactly k edges. If there is no path from s to v with exactly k edges, then δk(s, v) = ∞.

a. Show that if μ* = 0, then G contains no negative-weight cycles and δ(s, v) = min {δk(s, v) : 0 ≤ k ≤ n − 1} for all vertices v ∈ V.

b. Show that if μ* = 0, then

max {(δn(s, v) − δk(s, v)) / (n − k) : 0 ≤ k ≤ n − 1} ≥ 0

for all vertices v ∈ V. (Hint: Use both properties from part (a).)

c. Let c be a 0-weight cycle, and let u and v be any two vertices on c. Suppose that μ* = 0 and that the weight of the simple path from u to v along the cycle is x. Prove that δ(s, v) = δ(s, u) + x. (Hint: The weight of the simple path from v to u along the cycle is −x.)

d. Show that if μ* = 0, then on each minimum mean-weight cycle there exists a vertex v such that

max {(δn(s, v) − δk(s, v)) / (n − k) : 0 ≤ k ≤ n − 1} = 0.

(Hint: Show how to extend a shortest path to any vertex on a minimum mean-weight cycle along the cycle to make a shortest path to the next vertex on the cycle.)

e. Show that if μ* = 0, then the minimum value of

max {(δn(s, v) − δk(s, v)) / (n − k) : 0 ≤ k ≤ n − 1},

taken over all vertices v ∈ V, equals 0.

f. Show that if you add a constant t to the weight of each edge of G, then μ* increases by t. Use this fact to show that μ* equals the minimum value of

max {(δn(s, v) − δk(s, v)) / (n − k) : 0 ≤ k ≤ n − 1},

taken over all vertices v ∈ V.

g. Give an O( VE)-time algorithm to compute μ*.

22-6 Bitonic shortest paths

A sequence is bitonic if it monotonically increases and then

monotonically decreases, or if by a circular shift it monotonically

increases and then monotonically decreases. For example, the sequences 〈1, 4, 6, 8, 3, −2〉, 〈9, 2, −4, −10, −5〉, and 〈1, 2, 3, 4〉 are bitonic, but 〈1, 3, 12, 4, 2, 10〉 is not bitonic. (See Problem 14-3 on page 407 for the bitonic euclidean traveling-salesperson problem.)

Suppose that you are given a directed graph G = ( V, E) with weight function w : E → ℝ, where all edge weights are unique, and you wish to

find single-source shortest paths from a source vertex s. You are given

one additional piece of information: for each vertex v ∈ V, the weights of the edges along any shortest path from s to v form a bitonic sequence.

Give the most efficient algorithm you can to solve this problem, and

analyze its running time.

Chapter notes

The shortest-path problem has a long history that is nicely described in

an article by Schrijver [400]. He credits the general idea of repeatedly executing edge relaxations to Ford [148]. Dijkstra’s algorithm [116]

appeared in 1959, but it contained no mention of a priority queue. The

Bellman-Ford algorithm is based on separate algorithms by Bellman

[45] and Ford [149]. The same algorithm is also attributed to Moore

[334]. Bellman describes the relation of shortest paths to difference constraints. Lawler [276] describes the linear-time algorithm for shortest paths in a dag, which he considers part of the folklore.

When edge weights are relatively small nonnegative integers, more

efficient algorithms result from using min-priority queues that require

integer keys and rely on the sequence of values returned by the

EXTRACT-MIN calls in Dijkstra’s algorithm monotonically increasing

over time. Ahuja, Mehlhorn, Orlin, and Tarjan [8] give an algorithm that runs in O(E + V√(lg W)) time on graphs with nonnegative edge weights, where W is the largest weight of any edge in the graph. The best bounds

are by Thorup [436], who gives an algorithm that runs in O(E lg lg V) time, and by Raman [375], who gives an algorithm that runs in O(E + V min {(lg V)^(1/3+ε), (lg W)^(1/4+ε)}) time. These two algorithms use an amount of space that depends on the word size of the underlying

machine. Although the amount of space used can be unbounded in the size of the input, it can be reduced to be linear in the size of the input using randomized hashing.

For undirected graphs with integer weights, Thorup [435] gives an algorithm that runs in O( V + E) time for single-source shortest paths. In contrast to the algorithms mentioned in the previous paragraph, the

sequence of values returned by EXTRACT-MIN calls does not

monotonically increase over time, and so this algorithm is not an

implementation of Dijkstra’s algorithm. Pettie and Ramachandran [357]

remove the restriction of integer weights on undirected graphs. Their

algorithm entails a preprocessing phase, followed by queries for specific

source vertices. Preprocessing takes O( MST( V, E) + min { V lg V, V lg lg r}) time, where MST( V, E) is the time to compute a minimum spanning tree and r is the ratio of the maximum edge weight to the minimum edge weight. After preprocessing, each query takes

time, where

is the inverse of Ackermann’s function.

(See the chapter notes for Chapter 19 for a brief discussion of Ackermann’s function and its inverse.)

For graphs with negative edge weights, an algorithm due to Gabow and Tarjan [167] runs in O(√V E lg(VW)) time, and one by Goldberg [186] runs in O(√V E lg W) time, where W = max {|w(u, v)| : (u, v) ∈ E}. There has also been some progress based on methods that use continuous optimization and electrical flows. Cohen et al. [98] give such an algorithm, which is randomized and runs in Õ(E^(10/7) lg W) expected time (see Problem 3-6 on page 73 for the definition of Õ-notation). There is also a pseudopolynomial-time algorithm based on fast matrix multiplication. Sankowski [394] and Yuster and Zwick [465] designed an algorithm for shortest paths that runs in Õ(W V^ω) time, where two n × n matrices can be multiplied in O(n^ω) time, giving a faster algorithm than the previously mentioned algorithms for small values of W on dense graphs.

Cherkassky, Goldberg, and Radzik [89] conducted extensive

experiments comparing various shortest-path algorithms. Shortest-path

algorithms are widely used in real-time navigation and route-planning

applications. Typically based on Dijkstra’s algorithm, these algorithms

use many clever ideas to be able to compute shortest paths on networks

with many millions of vertices and edges in fractions of a second. Bast et al. [36] survey many of these developments.

1 It may seem strange that the term “relaxation” is used for an operation that tightens an upper bound. The use of the term is historical. The outcome of a relaxation step can be viewed as a relaxation of the constraint v.d ≤ u.d + w(u, v), which, by the triangle inequality (Lemma 22.10 on page 633), must be satisfied if u.d = δ(s, u) and v.d = δ(s, v). That is, if v.d ≤ u.d + w(u, v), there is no “pressure” to satisfy this constraint, so the constraint is “relaxed.”

2 “PERT” is an acronym for “program evaluation and review technique.”

23 All-Pairs Shortest Paths

In this chapter, we turn to the problem of finding shortest paths between

all pairs of vertices in a graph. A classic application of this problem

occurs in computing a table of distances between all pairs of cities for a

road atlas. Classic perhaps, but not a true application of finding shortest

paths between all pairs of vertices. After all, a road map modeled as a

graph has one vertex for every road intersection and one edge wherever

a road connects intersections. A table of intercity distances in an atlas

might include distances for 100 cities, but the United States has

approximately 300,000 signal-controlled intersections1 and many more uncontrolled intersections.

A legitimate application of all-pairs shortest paths is to determine

the diameter of a network: the longest of all shortest paths. If a directed

graph models a communication network, with the weight of an edge

indicating the time required for a message to traverse a communication

link, then the diameter gives the longest possible transit time for a

message in the network.

As in Chapter 22, the input is a weighted, directed graph G = (V, E) with a weight function w : E → ℝ that maps edges to real-valued weights. Now the goal is to find, for every pair of vertices u, v ∈ V, a shortest (least-weight) path from u to v, where the weight of a path is the sum of the weights of its constituent edges. For the all-pairs

problem, the output typically takes a tabular form in which the entry in

u’s row and v’s column is the weight of a shortest path from u to v.


You can solve an all-pairs shortest-paths problem by running a

single-source shortest-paths algorithm | V| times, once with each vertex

as the source. If all edge weights are nonnegative, you can use Dijkstra’s

algorithm. If you implement the min-priority queue with a linear array,

the running time is O(V³ + VE), which is O(V³). The binary min-heap implementation of the min-priority queue yields a running time of O(V(V + E) lg V). If |E| = Ω(V), the running time becomes O(VE lg V), which is faster than O(V³) if the graph is sparse. Alternatively, you can implement the min-priority queue with a Fibonacci heap, yielding a running time of O(V² lg V + VE).
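A minimal sketch of the repeated-Dijkstra approach, using Python's binary heap (the adjacency-dict representation is an assumption; nonnegative edge weights are required, as stated above):

```python
import heapq
import math

def dijkstra(adj, s):
    # adj maps each vertex u to a list of (v, w) pairs; returns the
    # shortest-path weights from s.
    d = {v: math.inf for v in adj}
    d[s] = 0
    pq = [(0, s)]
    while pq:
        du, u = heapq.heappop(pq)
        if du > d[u]:
            continue  # stale heap entry; skip it
        for v, w in adj[u]:
            if d[v] > du + w:
                d[v] = du + w
                heapq.heappush(pq, (d[v], v))
    return d

def all_pairs(adj):
    # One single-source run per vertex: |V| runs in total.
    return {u: dijkstra(adj, u) for u in adj}
```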

If the graph contains negative-weight edges, Dijkstra’s algorithm

doesn’t work, but you can run the slower Bellman-Ford algorithm once

from each vertex. The resulting running time is O(V²E), which on a dense graph is O(V⁴). This chapter shows how to guarantee a much better asymptotic running time. It also investigates the relation of the

all-pairs shortest-paths problem to matrix multiplication.

Unlike the single-source algorithms, which assume an adjacency-list

representation of the graph, most of the algorithms in this chapter

represent the graph by an adjacency matrix. (Johnson’s algorithm for

sparse graphs, in Section 23.3, uses adjacency lists.) For convenience, we assume that the vertices are numbered 1, 2, … , | V|, so that the input is

an n × n matrix W = (wij) representing the edge weights of an n-vertex directed graph G = (V, E), where

wij = 0                              if i = j,
wij = the weight of edge (i, j)      if i ≠ j and (i, j) ∈ E,
wij = ∞                              if i ≠ j and (i, j) ∉ E.

The graph may contain negative-weight edges, but we assume for the time being that the input graph contains no negative-weight cycles.
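Under this convention, W can be built from an edge list as follows (hypothetical Python sketch; vertices are numbered 1 through n as in the text):

```python
import math

def weight_matrix(n, edges):
    # edges maps a pair (i, j) of 1-based vertices to w_ij; diagonal
    # entries are 0 and absent edges get infinity, per the convention.
    W = [[0 if i == j else math.inf for j in range(n)] for i in range(n)]
    for (i, j), w in edges.items():
        W[i - 1][j - 1] = w
    return W
```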

The tabular output of each of the all-pairs shortest-paths algorithms

presented in this chapter is an n × n matrix. The ( i, j) entry of the output matrix contains δ( i, j), the shortest-path weight from vertex i to vertex j, as in Chapter 22.

A full solution to the all-pairs shortest-paths problem includes not only the shortest-path weights but also a predecessor matrix Π = (π ij), where π ij is NIL if either i = j or there is no path from i to j, and otherwise π ij is the predecessor of j on some shortest path from i. Just as the predecessor subgraph G π from Chapter 22 is a shortest-paths tree for a given source vertex, the subgraph induced by the i th row of the Π

matrix should be a shortest-paths tree with root i. For each vertex i ∈ V, the predecessor subgraph of G for i is Gπ,i = (Vπ,i, Eπ,i), where

Vπ,i = {j ∈ V : πij ≠ NIL} ∪ {i},
Eπ,i = {(πij, j) : j ∈ Vπ,i − {i}}.

If G π, i is a shortest-paths tree, then PRINT-ALL-PAIRS-SHORTEST-

PATH on the following page, which is a modified version of the

PRINT-PATH procedure from Chapter 20, prints a shortest path from

vertex i to vertex j.

In order to highlight the essential features of the all-pairs algorithms

in this chapter, we won’t cover how to compute predecessor matrices

and their properties as extensively as we dealt with predecessor

subgraphs in Chapter 22. Some of the exercises cover the basics.

PRINT-ALL-PAIRS-SHORTEST-PATH(Π, i, j)
1 if i == j
2     print i
3 elseif π ij == NIL
4     print “no path from” i “to” j “exists”
5 else PRINT-ALL-PAIRS-SHORTEST-PATH(Π, i, π ij)
6     print j
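The recursion above transcribes directly. Below is a minimal Python sketch of the same logic (function and variable names are ours, not from the text); instead of printing, it returns the list of vertices on the path, with None standing in for NIL.

```python
# A sketch of PRINT-ALL-PAIRS-SHORTEST-PATH. Pi[i][j] holds the
# predecessor of j on a shortest path from i, or None in place of NIL.

def all_pairs_path(Pi, i, j):
    """Recover the vertices on a shortest path from i to j, or None."""
    if i == j:
        return [i]
    if Pi[i][j] is None:
        return None  # no path from i to j exists
    prefix = all_pairs_path(Pi, i, Pi[i][j])
    if prefix is None:
        return None
    return prefix + [j]
```

As in the procedure, the recursion walks backward through predecessors until it reaches the source.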

Chapter outline

Section 23.1 presents a dynamic-programming algorithm based on matrix multiplication to solve the all-pairs shortest-paths problem. The

technique of “repeated squaring” yields a running time of Θ( V 3 lg V).


Section 23.2 gives another dynamic-programming algorithm, the Floyd-Warshall algorithm, which runs in Θ( V 3) time. Section 23.2 also covers the problem of finding the transitive closure of a directed graph, which

is related to the all-pairs shortest-paths problem. Finally, Section 23.3

presents Johnson’s algorithm, which solves the all-pairs shortest-paths

problem in O( V 2 lg V + VE) time and is a good choice for large, sparse graphs.

Before proceeding, we need to establish some conventions for

adjacency-matrix representations. First, we generally assume that the

input graph G = ( V, E) has n vertices, so that n = | V|. Second, we use the convention of denoting matrices by uppercase letters, such as W, L, or

D, and their individual elements by subscripted lowercase letters, such

as wij, lij, or dij. Finally, some matrices have parenthesized superscripts, as in L( r) or D( k), to indicate iterates.

23.1 Shortest paths and matrix multiplication

This section presents a dynamic-programming algorithm for the all-

pairs shortest-paths problem on a directed graph G = ( V, E). Each major loop of the dynamic program invokes an operation similar to

matrix multiplication, so that the algorithm looks like repeated matrix

multiplication. We’ll start by developing a Θ( V 4)-time algorithm for the

all-pairs shortest-paths problem, and then we’ll improve its running

time to Θ( V 3 lg V).

Before proceeding, let’s briefly recap the steps given in Chapter 14 for developing a dynamic-programming algorithm:

1. Characterize the structure of an optimal solution.

2. Recursively define the value of an optimal solution.

3. Compute the value of an optimal solution in a bottom-up

fashion.

We reserve the fourth step—constructing an optimal solution from

computed information—for the exercises.


The structure of a shortest path

Let’s start by characterizing the structure of an optimal solution.

Lemma 22.1 tells us that all subpaths of a shortest path are shortest

paths. Consider a shortest path p from vertex i to vertex j, and suppose that p contains at most r edges. Assuming that there are no negative-weight cycles, r is finite. If i = j, then p has weight 0 and no edges. If vertices i and j are distinct, then decompose path p into i ⇝ k → j, where the subpath p′ from i to k now contains at most r − 1 edges. Lemma 22.1 says that p′ is a shortest path from i to k, and so δ( i, j) = δ( i, k) + wkj.

A recursive solution to the all-pairs shortest-paths problem

Now, let lij( r) be the minimum weight of any path from vertex i to vertex j that contains at most r edges. When r = 0, there is a shortest path from i to j with no edges if and only if i = j, yielding

lij(0) = 0 if i = j,
lij(0) = ∞ if i ≠ j.   (23.2)

For r ≥ 1, one way to achieve a minimum-weight path from i to j with at most r edges is by taking a path containing at most r − 1 edges, so that lij( r) = lij( r−1). Another way is by taking a path of at most r − 1 edges from i to some vertex k and then taking the edge ( k, j), so that lij( r) = lik( r−1) + wkj. Therefore, to examine paths from i to j consisting of at most r edges, try all possible predecessors k of j, giving the recursive definition

lij( r) = min { lij( r−1), min { lik( r−1) + wkj : 1 ≤ k ≤ n}}
       = min { lik( r−1) + wkj : 1 ≤ k ≤ n}.   (23.3)

The last equality follows from the observation that wjj = 0 for all j.

What are the actual shortest-path weights δ( i, j)? If the graph contains no negative-weight cycles, then whenever δ( i, j) < ∞, there is a shortest path from vertex i to vertex j that is simple. (A path p from i to j that is not simple contains a cycle. Since each cycle’s weight is

nonnegative, removing all cycles from the path leaves a simple path with

weight no greater than p’s weight.) Because any simple path contains at


most n − 1 edges, a path from vertex i to vertex j with more than n − 1

edges cannot have lower weight than a shortest path from i to j. The actual shortest-path weights are therefore given by

δ( i, j) = lij( n−1) = lij( n) = lij( n+1) = ⋯ .   (23.4)

Computing the shortest-path weights bottom up

Taking as input the matrix W = ( wij), let’s see how to compute a series of matrices L(0), L(1), … , L( n−1), where L( r) = ( lij( r)) for r = 0, 1, … , n − 1. The initial matrix is L(0), given by equation (23.2). The final matrix

The heart of the algorithm is the procedure EXTEND-SHORTEST-

PATHS, which implements equation (23.3) for all i and j. The four inputs are the matrix L( r−1) computed so far; the edge-weight matrix

W; the output matrix L( r), which will hold the computed result and whose elements are all initialized to ∞ before invoking the procedure;

and the number n of vertices. The superscripts r and r − 1 help to make the correspondence of the pseudocode with equation (23.3) plain, but

they play no actual role in the pseudocode. The procedure extends the

shortest paths computed so far by one more edge, producing the matrix

L( r) of shortest-path weights from the matrix L( r−1) computed so far.

Its running time is Θ( n 3) due to the three nested for loops.

EXTEND-SHORTEST-PATHS( L( r−1), W, L( r), n)
1 // Assume that the elements of L( r) are initialized to ∞.
2 for i = 1 to n
3     for j = 1 to n
4         for k = 1 to n
5             lij( r) = min { lij( r), lik( r−1) + wkj}
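Equation (23.3) translates directly into code. Here is a Python sketch of EXTEND-SHORTEST-PATHS (the function name is ours); it returns a fresh matrix rather than filling a preallocated one, and uses math.inf for ∞.

```python
# One "extension" pass: new_L[i][j] = min over k of L[i][k] + W[k][j],
# the (min, +) product of L and W.
import math

def extend_shortest_paths(L, W):
    """Extend the shortest paths in L by one more edge of W."""
    n = len(W)
    new_L = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                new_L[i][j] = min(new_L[i][j], L[i][k] + W[k][j])
    return new_L
```

The three nested loops give the Θ(n³) running time stated in the text.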

Let’s now understand the relation of this computation to matrix

multiplication. Consider how to compute the matrix product C = A · B

of two n × n matrices A and B. The straightforward method used by MATRIX-MULTIPLY on page 81 uses a triply nested loop to

implement equation (4.1), which we repeat here for convenience:

cij = Σ ( k = 1 to n) aik · bkj   (23.5)

for i, j = 1, 2, … , n. Now make the substitutions

l( r−1) → a,
w → b,
l( r) → c,
min → +,
+ → ·

in equation (23.3). You get equation (23.5)! Making these changes to

EXTEND-SHORTEST-PATHS, and also replacing ∞ (the identity for

min) by 0 (the identity for +), yields the procedure MATRIX-

MULTIPLY. We can see that the procedure EXTEND-SHORTEST-

PATHS( L( r−1), W, L( r), n) computes the matrix “product” L( r) = L( r−1) · W using this unusual definition of matrix multiplication.2

Thus, we can solve the all-pairs shortest-paths problem by repeatedly

multiplying matrices. Each step extends the shortest-path weights

computed so far by one more edge using EXTEND-SHORTEST-

PATHS( L( r−1), W, L( r), n) to perform the matrix multiplication.

Starting with the matrix L(0), we produce the following sequence of n − 1 matrices corresponding to powers of W:

L(1) = L(0) · W = W 1,
L(2) = L(1) · W = W 2,
L(3) = L(2) · W = W 3,
⋮
L( n−1) = L( n−2) · W = Wn−1.

At the end, the matrix L( n−1) = Wn−1 contains the shortest-path weights.

The procedure SLOW-APSP on the next page computes this

sequence in Θ( n 4) time. The procedure takes the n × n matrices W and L(0) as inputs, along with n. Figure 23.1 illustrates its operation. The pseudocode uses two n × n matrices L and M to store powers of W, computing M = L · W on each iteration. Line 2 initializes L = L(0). For each iteration r, line 4 initializes M = ∞, where ∞ in this context is a matrix of scalar ∞ values. The r th iteration starts with the invariant L =

L( r−1) = Wr−1. Line 6 computes M = L · W = L( r−1) · W = Wr−1 · W

= Wr = L( r) so that the invariant can be restored for the next iteration by line 7, which sets L = M. At the end, the matrix L = L( n−1) = Wn−1

of shortest-path weights is returned. The assignments to n × n matrices in lines 2, 4, and 7 implicitly run doubly nested loops that take Θ( n 2)

time for each assignment. The n − 1 invocations of EXTEND-

SHORTEST-PATHS, each of which takes Θ( n 3) time, dominate the

computation, yielding a total running time of Θ( n 4).

Figure 23.1 A directed graph and the sequence of matrices L( r) computed by SLOW-APSP. You might want to verify that L(5), defined as L(4) · W, equals L(4), and thus L( r) = L(4) for all r ≥ 4.

SLOW-APSP( W, L(0), n)
1 let L = ( lij) and M = ( mij) be new n × n matrices
2 L = L(0)
3 for r = 1 to n − 1
4     M = ∞ // initialize M
5     // Compute the matrix “product” M = L · W.
6     EXTEND-SHORTEST-PATHS( L, W, M, n)
7     L = M
8 return L
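A self-contained Python sketch of SLOW-APSP follows (names are ours). The min-plus extension step is inlined so the example stands alone; since L(0) · W = W, the code starts from L = W = L(1) and performs n − 2 further extensions to reach L(n−1).

```python
# SLOW-APSP sketch: repeatedly extend shortest paths by one edge.
import math

def slow_apsp(W):
    """Return L(n-1): shortest-path weights over paths of <= n-1 edges."""
    n = len(W)
    L = [row[:] for row in W]          # L = L(1) = W
    for _ in range(n - 2):             # n - 2 more extensions reach L(n-1)
        M = [[math.inf] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    M[i][j] = min(M[i][j], L[i][k] + W[k][j])
        L = M
    return L
```

The n − 2 extensions of Θ(n³) each give the Θ(n⁴) total stated in the text.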

Improving the running time

Bear in mind that the goal is not to compute all the L( r) matrices: only the matrix L( n−1) matters. Recall that in the absence of negative-weight cycles, equation (23.4) implies L( r) = L( n−1) for all integers r ≥ n − 1.

Just as traditional matrix multiplication is associative, so is matrix

multiplication defined by the EXTEND-SHORTEST-PATHS

procedure (see Exercise 23.1-4). In fact, we can compute L( n−1) with only ⌈lg( n − 1)⌉ matrix products by using the technique of repeated squaring:

L(1) = W,
L(2) = W 2 = W · W,
L(4) = W 4 = W 2 · W 2,
⋮

Since 2⌈lg( n−1)⌉ ≥ n − 1, the final product L(2⌈lg( n−1)⌉) is equal to L( n−1).

The procedure FASTER-APSP implements this idea. It takes just

the n × n matrix W and the size n as inputs. Each iteration of the while loop of lines 4–8 starts with the invariant L = Wr, which it squares using EXTEND-SHORTEST-PATHS to obtain the matrix M = L 2 =

( Wr)2 = W 2 r. At the end of each iteration, the value of r doubles, and L

for the next iteration becomes M, restoring the invariant. Upon exiting

the loop when r ≥ n − 1, the procedure returns L = Wr = L( r) = L( n−1) by equation (23.4). As in SLOW-APSP, the assignments to n × n

matrices in lines 2, 5, and 8 implicitly run doubly nested loops, taking

Θ( n 2) time for each assignment.

FASTER-APSP( W, n)
1 let L and M be new n × n matrices
2 L = W
3 r = 1
4 while r < n − 1
5     M = ∞ // initialize M
6     EXTEND-SHORTEST-PATHS( L, L, M, n) // compute M = L 2
7     r = 2 r
8     L = M // ready for the next iteration
9 return L
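The repeated-squaring loop can be sketched as follows in Python (names are ours; the min-plus product is a local helper so the example is self-contained).

```python
# FASTER-APSP sketch: square L under (min, +) until L = L(n-1).
import math

def faster_apsp(W):
    """Compute all-pairs shortest-path weights by repeated squaring."""
    n = len(W)

    def extend(A, B):
        # (min, +) product of A and B.
        C = [[math.inf] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    C[i][j] = min(C[i][j], A[i][k] + B[k][j])
        return C

    L = [row[:] for row in W]   # L = L(1) = W
    r = 1
    while r < n - 1:
        L = extend(L, L)        # L(2r) from L(r)
        r = 2 * r
    return L
```

Only ⌈lg(n − 1)⌉ products are performed, giving the Θ(n³ lg n) bound.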

Because each of the ⌈lg( n – 1)⌉ matrix products takes Θ( n 3) time, FASTER-APSP runs in Θ( n 3 lg n) time. The code is tight, containing

no elaborate data structures, and the constant hidden in the Θ-notation

is therefore small.

Exercises

23.1-1

Run SLOW-APSP on the weighted, directed graph of Figure 23.2, showing the matrices that result for each iteration of the loop. Then do

the same for FASTER-APSP.

Figure 23.2 A weighted, directed graph for use in Exercises 23.1-1, 23.2-1, and 23.3-1.

23.1-2

Why is it convenient for both SLOW-APSP and FASTER-APSP that

wii = 0 for i = 1, 2, … , n?

23.1-3

What does the matrix L(0), given by equation (23.2), used in the shortest-paths algorithms correspond to in regular matrix
multiplication?

23.1-4

Show that matrix multiplication defined by EXTEND-SHORTEST-

PATHS is associative.

23.1-5

Show how to express the single-source shortest-paths problem as a

product of matrices and a vector. Describe how evaluating this product

corresponds to a Bellman-Ford-like algorithm (see Section 22.1).

23.1-6

Argue that we don’t need the matrix M in SLOW-APSP because by

substituting L for M and leaving out the initialization of M, the code still works correctly. ( Hint: Relate line 5 of EXTEND-SHORTEST-PATHS to RELAX on page 610.) Do we need the matrix M in

FASTER-APSP?

23.1-7

Suppose that you also want to compute the vertices on shortest paths in

the algorithms of this section. Show how to compute the predecessor

matrix Π from the completed matrix L of shortest-path weights in O( n 3) time.

23.1-8

You can also compute the vertices on shortest paths along with computing the shortest-path weights. Define πij( r) as the predecessor of vertex j on any minimum-weight path from vertex i to vertex j that contains at most r edges. Modify the EXTEND-SHORTEST-PATHS

and SLOW-APSP procedures to compute the matrices Π(1), Π(2), … ,

Π( n−1) as they compute the matrices L(1), L(2), … , L( n−1).

23.1-9

Modify FASTER-APSP so that it can determine whether the graph

contains a negative-weight cycle.

23.1-10

Give an efficient algorithm to find the length (number of edges) of a

minimum-length negative-weight cycle in a graph.

23.2 The Floyd-Warshall algorithm

Having already seen one dynamic-programming solution to the all-pairs

shortest-paths problem, in this section we’ll see another: the Floyd-

Warshall algorithm, which runs in Θ( V 3) time. As before, negative-weight edges may be present, but not negative-weight cycles. As in

Section 23.1, we develop the algorithm by following the dynamic-programming process. After studying the resulting algorithm, we

present a similar method for finding the transitive closure of a directed

graph.

The structure of a shortest path

In the Floyd-Warshall algorithm, we characterize the structure of a

shortest path differently from how we characterized it in Section 23.1.

The Floyd-Warshall algorithm considers the intermediate vertices of a

shortest path, where an intermediate vertex of a simple path p = 〈 v 1, v 2,

… , vl〉 is any vertex of p other than v 1 or vl, that is, any vertex in the set

{ v 2, v 3, … , vl−1}.

The Floyd-Warshall algorithm relies on the following observation.

Numbering the vertices of G by V = {1, 2, … , n}, take a subset {1, 2, …

, k} of vertices for some 1 ≤ k ≤ n. For any pair of vertices i, j ∈ V, consider all paths from i to j whose intermediate vertices are all drawn from {1, 2, … , k}, and let p be a minimum-weight path from among

them. (Path p is simple.) The Floyd-Warshall algorithm exploits a

relationship between path p and shortest paths from i to j with all intermediate vertices in the set {1, 2, … , k − 1}. The details of the relationship depend on whether k is an intermediate vertex of path p or not.

Figure 23.3 Optimal substructure used by the Floyd-Warshall algorithm. Path p is a shortest path from vertex i to vertex j, and k is the highest-numbered intermediate vertex of p. Path p 1, the portion of path p from vertex i to vertex k, has all intermediate vertices in the set {1, 2, … , k

− 1}. The same holds for path p 2 from vertex k to vertex j.

If k is not an intermediate vertex of path p, then all intermediate

vertices of path p belong to the set {1, 2, … , k − 1}. Thus a shortest path from vertex i to vertex j with all intermediate vertices in the set {1, 2, … , k − 1} is also a shortest path from i to j with all intermediate vertices in the set {1, 2, … , k}.

If k is an intermediate vertex of path p, then decompose p into i ⇝ k ⇝ j with subpaths p 1 from i to k and p 2 from k to j, as Figure 23.3 illustrates. By Lemma 22.1, p 1 is a shortest path from i to k with all intermediate vertices in the set

{1, 2, … , k}. In fact, we can make a slightly stronger statement.

Because vertex k is not an intermediate vertex of path p 1, all intermediate vertices of p 1 belong to the set {1, 2, … , k − 1}.

Therefore p 1 is a shortest path from i to k with all intermediate vertices in the set {1, 2, … , k − 1}. Likewise, p 2 is a shortest path

from vertex k to vertex j with all intermediate vertices in the set

{1, 2, … , k − 1}.

A recursive solution to the all-pairs shortest-paths problem

The above observations suggest a recursive formulation of shortest-path estimates that differs from the one in Section 23.1. Let dij( k) be the weight of a shortest path from vertex i to vertex j for which all intermediate vertices belong to the set {1, 2, … , k}. When k = 0, a path from vertex i to vertex j with no intermediate vertex numbered higher than 0 has no intermediate vertices at all. Such a path has at most one edge, and hence dij(0) = wij. Following the above discussion, define dij( k) recursively by

dij( k) = wij if k = 0,
dij( k) = min { dij( k−1), dik( k−1) + dkj( k−1)} if k ≥ 1.   (23.6)

Because for any path, all intermediate vertices belong to the set {1, 2, … , n}, the matrix D( n) = ( dij( n)) gives the final answer: dij( n) = δ( i, j) for all i, j ∈ V.

Computing the shortest-path weights bottom up

Based on recurrence (23.6), the bottom-up procedure FLOYD-WARSHALL computes the values dij( k) in order of increasing values of k. Its input is an n × n matrix W defined as in equation (23.1). The procedure returns the matrix D( n) of shortest-path weights. Figure 23.4

shows the matrices D( k) computed by the Floyd-Warshall algorithm for

the graph in Figure 23.1.

FLOYD-WARSHALL( W, n)
1 D(0) = W
2 for k = 1 to n
3     let D( k) = ( dij( k)) be a new n × n matrix
4     for i = 1 to n
5         for j = 1 to n
6             dij( k) = min { dij( k−1), dik( k−1) + dkj( k−1)}
7 return D( n)
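Recurrence (23.6) gives a short Python sketch (names are ours); like the pseudocode, it builds each D(k) from the previous matrix rather than updating in place.

```python
# Floyd-Warshall sketch: allow intermediate vertex k on each pass.
import math

def floyd_warshall(W):
    """Return the matrix D(n) of all-pairs shortest-path weights."""
    n = len(W)
    D = [row[:] for row in W]          # D(0) = W
    for k in range(n):
        D = [[min(D[i][j], D[i][k] + D[k][j]) for j in range(n)]
             for i in range(n)]        # D(k) from D(k-1)
    return D
```

The triply nested iteration matches the Θ(n³) running time derived below.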

The running time of the Floyd-Warshall algorithm is determined by

the triply nested for loops of lines 2–6. Because each execution of line 6

takes O(1) time, the algorithm runs in Θ( n 3) time. As in the final algorithm in Section 23.1, the code is tight, with no elaborate data structures, and so the constant hidden in the Θ-notation is small. Thus,

the Floyd-Warshall algorithm is quite practical for even moderate-sized

input graphs.

Constructing a shortest path

There are a variety of different methods for constructing shortest paths

in the Floyd-Warshall algorithm. One way is to compute the matrix D

of shortest-path weights and then construct the predecessor matrix Π

from the D matrix. Exercise 23.1-7 asks you to implement this method

so that it runs in O( n 3) time. Given the predecessor matrix Π, the PRINT-ALL-PAIRS-SHORTEST-PATH procedure prints the vertices

on a given shortest path.

Alternatively, the predecessor matrix Π can be computed while the algorithm computes the matrices D(0), D(1), … , D( n). Specifically, compute a sequence of matrices Π(0), Π(1), … , Π( n), where Π = Π( n) and πij( k) is the predecessor of vertex j on a shortest path from vertex i with all intermediate vertices in the set {1, 2, … , k}.

Figure 23.4 The sequence of matrices D( k) and Π( k) computed by the Floyd-Warshall algorithm for the graph in Figure 23.1.

Here’s a recursive formulation of πij( k). When k = 0, a shortest path from i to j has no intermediate vertices at all, and so

πij(0) = NIL if i = j or wij = ∞,
πij(0) = i if i ≠ j and wij < ∞.   (23.7)

For k ≥ 1, if the path has k as an intermediate vertex, so that it is i ⇝ k ⇝ j where k ≠ j, then choose as the predecessor of j on this path the same vertex as the predecessor of j chosen on a shortest path from k with all intermediate vertices in the set {1, 2, … , k − 1}. Otherwise, when the path from i to j does not have k as an intermediate vertex, choose the same predecessor of j as on a shortest path from i with all intermediate vertices in the set {1, 2, … , k − 1}. Formally, for k ≥ 1,

πij( k) = πij( k−1) if dij( k−1) ≤ dik( k−1) + dkj( k−1),
πij( k) = πkj( k−1) if dij( k−1) > dik( k−1) + dkj( k−1).   (23.8)

Exercise 23.2-3 asks you to show how to incorporate the Π( k) matrix

computations into the FLOYD-WARSHALL procedure. Figure 23.4

shows the sequence of Π( k) matrices that the resulting algorithm

computes for the graph of Figure 23.1. The exercise also asks for the more difficult task of proving that the predecessor subgraph G π, i is a shortest-paths tree with root i. Exercise 23.2-7 asks for yet another way

to reconstruct shortest paths.

Transitive closure of a directed graph

Given a directed graph G = ( V, E) with vertex set V = {1, 2, … , n}, you might wish to determine simply whether G contains a path from i to j for all vertex pairs i, jV, without regard to edge weights. We define the transitive closure of G as the graph G* = ( V, E*), where E* = {( i, j) : there is a path from vertex i to vertex j in G}.

One way to compute the transitive closure of a graph in Θ( n 3) time is

to assign a weight of 1 to each edge of E and run the Floyd-Warshall

algorithm. If there is a path from vertex i to vertex j, you get dij < n.

Otherwise, you get dij = ∞.

There is another, similar way to compute the transitive closure of G

in Θ( n 3) time, which can save time and space in practice. This method

substitutes the logical operations ∨ (logical OR) and ∧ (logical AND)

for the arithmetic operations min and + in the Floyd-Warshall

algorithm. For i, j, k = 1, 2, … , n, define to be 1 if there exists a path in graph G from vertex i to vertex j with all intermediate vertices in the set {1, 2, … , k}, and 0 otherwise. To construct the transitive closure G*

= ( V, E*), put edge ( i, j) into E* if and only if tij( n) = 1. A recursive definition of tij( k), analogous to recurrence (23.6), is

tij(0) = 1 if i = j or ( i, j) ∈ E,
tij(0) = 0 otherwise,

and for k ≥ 1,

tij( k) = tij( k−1) ∨ ( tik( k−1) ∧ tkj( k−1)).   (23.9)

Figure 23.5 A directed graph and the matrices T( k) computed by the transitive-closure algorithm.

As in the Floyd-Warshall algorithm, the TRANSITIVE-CLOSURE procedure computes the matrices T(1), T(2), … , T( n) in order of increasing k.

TRANSITIVE-CLOSURE( G, n)
1 let T(0) = ( tij(0)) be a new n × n matrix
2 for i = 1 to n
3     for j = 1 to n
4         if i == j or ( i, j) ∈ G.E
5             tij(0) = 1
6         else tij(0) = 0
7 for k = 1 to n
8     let T( k) = ( tij( k)) be a new n × n matrix
9     for i = 1 to n
10        for j = 1 to n
11            tij( k) = tij( k−1) ∨ ( tik( k−1) ∧ tkj( k−1))
12 return T( n)
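A Python sketch of the boolean version follows (names are ours). It updates a single matrix in place, which is safe for the boolean recurrence because entries only ever change from 0 to 1.

```python
# Transitive closure via boolean Floyd-Warshall.
def transitive_closure(n, edges):
    """Return T where T[i][j] is True iff G has a path from i to j.

    edges is a set of (u, v) pairs over vertices 0 .. n-1.
    """
    T = [[i == j or (i, j) in edges for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                T[i][j] = T[i][j] or (T[i][k] and T[k][j])
    return T
```

The structure mirrors FLOYD-WARSHALL exactly, with ∨ and ∧ in place of min and +.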

Figure 23.5 shows the matrices T( k) computed by the TRANSITIVE-CLOSURE procedure on a sample graph. The

TRANSITIVE-CLOSURE procedure, like the Floyd-Warshall

algorithm, runs in Θ( n 3) time. On some computers, though, logical

operations on single-bit values execute faster than arithmetic operations

on integer words of data. Moreover, because the direct transitive-closure

algorithm uses only boolean values rather than integer values, its space

requirement is less than the Floyd-Warshall algorithm’s by a factor

corresponding to the size of a word of computer storage.

Exercises

23.2-1

Run the Floyd-Warshall algorithm on the weighted, directed graph of

Figure 23.2. Show the matrix D( k) that results for each iteration of the outer loop.

23.2-2

Show how to compute the transitive closure using the technique of

Section 23.1.

23.2-3

Modify the FLOYD-WARSHALL procedure to compute the Π( k)

matrices according to equations (23.7) and (23.8). Prove rigorously that

for all i ∈ V, the predecessor subgraph G π, i is a shortest-paths tree with root i. ( Hint: To show that G π, i is acyclic, first show that πij( k) = l implies dij( k) ≥ dil( k) + wlj, according to the definition of πij( k). Then adapt the proof of Lemma 22.16.)

23.2-4

As it appears on page 657, the Floyd-Warshall algorithm requires Θ( n 3) space, since it creates dij( k) for i, j, k = 1, 2, … , n. Show that the procedure

FLOYD-WARSHALL′, which simply drops all the superscripts, is

correct, and thus only Θ( n 2) space is required.

FLOYD-WARSHALL′( W, n)
1 D = W
2 for k = 1 to n
3     for i = 1 to n
4         for j = 1 to n
5             dij = min { dij, dik + dkj}
6 return D

23.2-5

Consider the following change to how equation (23.8) handles equality:

πij( k) = πij( k−1) if dij( k−1) < dik( k−1) + dkj( k−1),
πij( k) = πkj( k−1) if dij( k−1) ≥ dik( k−1) + dkj( k−1).

Is this alternative definition of the predecessor matrix Π correct?

23.2-6

Show how to use the output of the Floyd-Warshall algorithm to detect

the presence of a negative-weight cycle.

23.2-7

Another way to reconstruct shortest paths in the Floyd-Warshall

algorithm uses values for i, j, k = 1, 2, … , n, where is the highest-numbered intermediate vertex of a shortest path from i to j in which all intermediate vertices lie in the set {1, 2, … , k}. Give a recursive formulation for

, modify the FLOYD-WARSHALL procedure to

compute the

values, and rewrite the PRINT-ALL-PAIRS-

SHORTEST-PATH procedure to take the matrix

as an input.

How is the matrix Φ like the s table in the matrix-chain multiplication

problem of Section 14.2?

23.2-8

Give an O( VE)-time algorithm for computing the transitive closure of a directed graph G = ( V, E). Assume that | V| = O( E) and that the graph is represented with adjacency lists.

23.2-9

Suppose that it takes f(| V|, | E|) time to compute the transitive closure of a directed acyclic graph, where f is a monotonically increasing function

of both | V| and | E|. Show that the time to compute the transitive closure G* = ( V, E*) of a general directed graph G = ( V, E) is then f(| V|, | E|) +

O( V + E*).

23.3 Johnson’s algorithm for sparse graphs

Johnson’s algorithm finds shortest paths between all pairs in O( V 2 lg V

+ VE) time. For sparse graphs, it is asymptotically faster than either repeated squaring of matrices or the Floyd-Warshall algorithm. The

algorithm either returns a matrix of shortest-path weights for all pairs

of vertices or reports that the input graph contains a negative-weight

cycle. Johnson’s algorithm uses as subroutines both Dijkstra’s algorithm

and the Bellman-Ford algorithm, which Chapter 22 describes.

Johnson’s algorithm uses the technique of reweighting, which works

as follows. If all edge weights w in a graph G = ( V, E) are nonnegative, Dijkstra’s algorithm can find shortest paths between all pairs of vertices

by running it once from each vertex. With the Fibonacci-heap min-

priority queue, the running time of this all-pairs algorithm is O( V 2 lg V

+ VE). If G has negative-weight edges but no negative-weight cycles, first compute a new set of nonnegative edge weights so that Dijkstra’s

algorithm applies. The new set of edge weights ŵ must satisfy two

important properties:

1. For all pairs of vertices u, vV, a path p is a shortest path from u to v using weight function w if and only if p is also a shortest path from u to v using weight function ŵ.

2. For all edges ( u, v), the new weight ŵ( u, v) is nonnegative.

As we’ll see in a moment, preprocessing G to determine the new weight

function ŵ takes O( VE) time.

Preserving shortest paths by reweighting

The following lemma shows how to reweight the edges to satisfy the first

property above. We use δ to denote shortest-path weights derived from

weight function w and δ̂ to denote shortest-path weights derived from

weight function ŵ.

Lemma 23.1 (Reweighting does not change shortest paths)

Given a weighted, directed graph G = ( V, E) with weight function w : E

→ ℝ, let h : V → ℝ be any function mapping vertices to real numbers.

For each edge ( u, v) ∈ E, define

ŵ( u, v) = w( u, v) + h( u) − h( v).   (23.10)

Let p = 〈 v 0, v 1, … , vk〉 be any path from vertex v 0 to vertex vk. Then p is a shortest path from v 0 to vk with weight function w if and only if it is a shortest path with weight function ŵ. That is, w( p) = δ( v 0, vk) if and only if ŵ( p) = δ̂( v 0, vk). Furthermore, G has a negative-weight cycle using

weight function w if and only if G has a negative-weight cycle using weight function ŵ.

Proof We start by showing that

ŵ( p) = w( p) + h( v 0) − h( vk).   (23.11)

We have

ŵ( p) = Σ ( i = 1 to k) ŵ( vi−1, vi)
      = Σ ( i = 1 to k) ( w( vi−1, vi) + h( vi−1) − h( vi))
      = Σ ( i = 1 to k) w( vi−1, vi) + h( v 0) − h( vk) (because the sum telescopes)
      = w( p) + h( v 0) − h( vk).

Therefore, any path p from v 0 to vk has ŵ( p) = w( p) + h( v 0) − h( vk).

Because h( v 0) and h( vk) do not depend on the path, if one path from v 0

to vk is shorter than another using weight function w, then it is also shorter using ŵ. Thus, w( p) = δ( v 0, vk) if and only if

.

Finally, we show that G has a negative-weight cycle using weight

function w if and only if G has a negative-weight cycle using weight function ŵ. Consider any cycle c = 〈 v 0, v 1, … , vk〉, where v 0 = vk. By equation (23.11),

ŵ( c) = w( c) + h( v 0) − h( vk)
= w( c),

and thus c has negative weight using w if and only if it has negative weight using ŵ.
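The key identity (23.11) is easy to check numerically. The tiny graph, path, and potential function h below are made up for illustration; for any choice of h, the reweighted path weight differs from the original by exactly h(v0) − h(vk).

```python
# Numeric check of equation (23.11): w_hat(p) = w(p) + h(v0) - h(vk).
# The edge weights and h values here are arbitrary illustrations.

def path_weight(w, path):
    """Sum the weights of the edges along path under weight map w."""
    return sum(w[(u, v)] for u, v in zip(path, path[1:]))

w = {(0, 1): 3, (1, 2): -4, (2, 3): 7}
h = {0: 0, 1: 5, 2: 1, 3: 6}
w_hat = {(u, v): wt + h[u] - h[v] for (u, v), wt in w.items()}

p = [0, 1, 2, 3]
# The interior h terms telescope away, leaving only h(v0) and h(vk).
assert path_weight(w_hat, p) == path_weight(w, p) + h[0] - h[3]
```

Because the correction h(v0) − h(vk) is the same for every path between a fixed pair of endpoints, the ranking of paths is unchanged.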

Producing nonnegative weights by reweighting

Our next goal is to ensure that the second property holds: ŵ( u, v) must be nonnegative for all edges ( u, v) ∈ E. Given a weighted, directed graph G = ( V, E) with weight function w : E → ℝ, we’ll see how to make a new graph G′ = ( V′, E′), where V′ = V ∪ { s} for some new vertex s ∉ V and E′ = E ∪ {( s, v) : v ∈ V}. To incorporate the new vertex s, extend the weight function w so that w( s, v) = 0 for all v ∈ V. Since no edges enter s, no shortest paths in G′, other than those with source s, contain s.

Moreover, G′ has no negative-weight cycles if and only if G has no negative-weight cycles. Figure 23.6(a) shows the graph G′ corresponding to the graph G of Figure 23.1.

Now suppose that G and G′ have no negative-weight cycles. Define

the function h( v) = δ( s, v) for all vV′. By the triangle inequality (Lemma 22.10 on page 633), we have h( v) ≤ h( u) + w( u, v) for all edges ( u, v) ∈ E′. Thus, by defining reweighted edge weights ŵ according to equation (23.10), we have ŵ( u, v) = w( u, v) + h( u) − h( v) ≥ 0, thereby

satisfying the second property. Figure 23.6(b) shows the graph G′ from

Figure 23.6(a) with reweighted edges.

Computing all-pairs shortest paths

Johnson’s algorithm to compute all-pairs shortest paths uses the

Bellman-Ford algorithm (Section 22.1) and Dijkstra’s algorithm (Section 22.3) as subroutines. The pseudocode appears in the procedure JOHNSON on page 666. It assumes implicitly that the edges are stored

in adjacency lists. The algorithm returns the usual | V| × | V| matrix D =

( dij), where dij = δ( i, j), or it reports that the input graph contains a negative-weight cycle. As is typical for an all-pairs shortest-paths

algorithm, it assumes that the vertices are numbered from 1 to | V|.

Figure 23.6 Johnson’s all-pairs shortest-paths algorithm run on the graph of Figure 23.1. Vertex numbers appear outside the vertices. (a) The graph G′ with the original weight function w. The new vertex s is blue. Within each vertex v is h( v) = δ( s, v). (b) After reweighting each edge ( u, v) with weight function ŵ( u, v) = w( u, v) + h( u) − h( v). (c)–(g) The result of running Dijkstra’s algorithm on each vertex of G using weight function ŵ. In each part, the source vertex u is blue, and blue edges belong to the shortest-paths tree computed by the algorithm. Within each vertex v are the values δ̂( u, v) and δ( u, v), separated by a slash. The value duv = δ( u, v) is equal to δ̂( u, v) + h( v) − h( u).

JOHNSON( G, w)
1 compute G′, where G′. V = G.V ∪ { s},
      G′. E = G.E ∪ {( s, v) : v ∈ G.V}, and
      w( s, v) = 0 for all v ∈ G.V
2 if BELLMAN-FORD( G′, w, s) == FALSE
3     print “the input graph contains a negative-weight cycle”
4 else for each vertex v ∈ G′. V
5         set h( v) to the value of δ( s, v) computed by the Bellman-Ford algorithm
6     for each edge ( u, v) ∈ G′. E
7         ŵ( u, v) = w( u, v) + h( u) − h( v)
8     let D = ( duv) be a new n × n matrix
9     for each vertex u ∈ G.V
10        run DIJKSTRA( G, ŵ, u) to compute δ̂( u, v) for all v ∈ G.V
11        for each vertex v ∈ G.V
12            duv = δ̂( u, v) + h( v) − h( u)
13 return D
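Putting the pieces together, here is a compact, self-contained Python sketch of Johnson's algorithm (all names are ours): Bellman-Ford from the added source computes h, edges are reweighted, and a heap-based Dijkstra runs from each vertex.

```python
# Johnson's algorithm sketch. Vertices are 0 .. n-1; the graph is a
# list of (u, v, weight) edges. Returns None on a negative-weight cycle.
import heapq
import math

def johnson(n, edges):
    """Return the n x n matrix of shortest-path weights, or None."""
    # Form G': a new source s = n with a 0-weight edge to every vertex.
    s = n
    aug = edges + [(s, v, 0) for v in range(n)]
    # Bellman-Ford from s computes h(v) = delta(s, v).
    h = [math.inf] * (n + 1)
    h[s] = 0
    for _ in range(n):                 # |V'| - 1 = n relaxation passes
        for u, v, wt in aug:
            if h[u] + wt < h[v]:
                h[v] = h[u] + wt
    for u, v, wt in aug:               # extra pass detects a negative cycle
        if h[u] + wt < h[v]:
            return None
    # Reweight: w_hat(u, v) = w(u, v) + h(u) - h(v) >= 0.
    adj = [[] for _ in range(n)]
    for u, v, wt in edges:
        adj[u].append((v, wt + h[u] - h[v]))
    # Dijkstra from each vertex under w_hat, then undo the reweighting.
    D = [[math.inf] * n for _ in range(n)]
    for src in range(n):
        dist = [math.inf] * n
        dist[src] = 0
        pq = [(0, src)]
        while pq:
            d, u = heapq.heappop(pq)
            if d > dist[u]:
                continue               # stale queue entry
            for v, wt in adj[u]:
                if d + wt < dist[v]:
                    dist[v] = d + wt
                    heapq.heappush(pq, (dist[v], v))
        for v in range(n):
            D[src][v] = dist[v] + h[v] - h[src]
    return D
```

The binary heap here gives the O(VE lg V) variant mentioned below; a Fibonacci heap would tighten it to O(V² lg V + VE).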

The JOHNSON procedure simply performs the actions specified

earlier. Line 1 produces G′. Line 2 runs the Bellman-Ford algorithm on

G′ with weight function w and source vertex s. If G′, and hence G, contains a negative-weight cycle, line 3 reports the problem. Lines 4–12

assume that G′ contains no negative-weight cycles. Lines 4–5 set h( v) to the shortest-path weight δ( s, v) computed by the Bellman-Ford

algorithm for all v ∈ V′. Lines 6–7 compute the new weights ŵ. For each pair of vertices u, v ∈ V, the for loop of lines 9–12 computes the shortest-path weight δ̂( u, v) by calling Dijkstra’s algorithm once from

each vertex in V. Line 12 stores in matrix entry duv the correct shortest-path weight δ( u, v), calculated using equation (23.11). Finally, line 13

returns the completed D matrix. Figure 23.6 depicts the execution of Johnson’s algorithm.

If the min-priority queue in Dijkstra’s algorithm is implemented by a

Fibonacci heap, Johnson’s algorithm runs in O( V 2 lg V + VE) time. The simpler binary min-heap implementation yields a running time of O( VE

lg V), which is still asymptotically faster than the Floyd-Warshall

algorithm if the graph is sparse.

Exercises

23.3-1

Use Johnson’s algorithm to find the shortest paths between all pairs of

vertices in the graph of Figure 23.2. Show the values of h and ŵ

computed by the algorithm.

23.3-2

What is the purpose of adding the new vertex s to V, yielding V′?

23.3-3

Suppose that w( u, v) ≥ 0 for all edges ( u, v) ∈ E. What is the relationship between the weight functions w and ŵ?

23.3-4

Professor Greenstreet claims that there is a simpler way to reweight

edges than the method used in Johnson’s algorithm. Letting w* = min

{ w( u, v) : ( u, v) ∈ E}, just define ŵ( u, v) = w( u, v) − w* for all edges ( u, v) ∈ E. What is wrong with the professor’s method of reweighting?

23.3-5

Show that if G contains a 0-weight cycle c, then ŵ( u, v) = 0 for every edge ( u, v) in c.

23.3-6

Professor Michener claims that there is no need to create a new source

vertex in line 1 of JOHNSON. He suggests using G′ = G instead and letting s be any vertex. Give an example of a weighted, directed graph G

for which incorporating the professor’s idea into JOHNSON causes

incorrect answers. Assume that ∞ − ∞ is undefined, and in particular, it

is not 0. Then show that if G is strongly connected (every vertex is reachable from every other vertex), the results returned by JOHNSON

with the professor’s modification are correct.

Problems


23-1 Transitive closure of a dynamic graph

You wish to maintain the transitive closure of a directed graph G = ( V, E) as you insert edges into E. That is, after inserting an edge, you update the transitive closure of the edges inserted so far. Start with G having no

edges initially, and represent the transitive closure by a boolean matrix.

a. Show how to update the transitive closure G* = ( V, E*) of a graph G

= ( V, E) in O( V 2) time when a new edge is added to G.

b. Give an example of a graph G and an edge e such that Ω( V 2) time is required to update the transitive closure after inserting e into G, no

matter what algorithm is used.

c. Give an algorithm for updating the transitive closure as edges are

inserted into the graph. For any sequence of r insertions, your algorithm should run in time Σ_{i=1}^{r} ti = O(V^3), where ti is the time to update the transitive closure upon inserting the ith edge. Prove that your algorithm attains this time bound.

23-2 Shortest paths in ϵ-dense graphs

A graph G = (V, E) is ϵ-dense if |E| = Θ(V^{1+ϵ}) for some constant ϵ in the range 0 < ϵ ≤ 1. d-ary min-heaps (see Problem 6-2 on page 179) provide a way to match the running times of Fibonacci-heap-based shortest-path algorithms on ϵ-dense graphs without using as complicated a data

structure.

a. What are the asymptotic running times for the operations INSERT,

EXTRACT-MIN, and DECREASE-KEY, as a function of d and the

number n of elements in a d-ary min-heap? What are these running

times if you choose d = Θ(n^α) for some constant α in the range 0 < α ≤ 1? Compare these running times to the amortized costs of these operations for a

Fibonacci heap.

b. Show how to compute shortest paths from a single source on an ϵ-

dense directed graph G = ( V, E) with no negative-weight edges in O( E) time. ( Hint: Pick d as a function of ϵ.)


c. Show how to solve the all-pairs shortest-paths problem on an ϵ-dense directed graph G = ( V, E) with no negative-weight edges in O( VE) time.

d. Show how to solve the all-pairs shortest-paths problem in O( VE) time on an ϵ-dense directed graph G = ( V, E) that may have negative-weight edges but has no negative-weight cycles.

Chapter notes

Lawler [276] has a good discussion of the all-pairs shortest-paths problem. He attributes the matrix-multiplication algorithm to the

folklore. The Floyd-Warshall algorithm is due to Floyd [144], who based it on a theorem of Warshall [450] that describes how to compute the transitive closure of boolean matrices. Johnson’s algorithm is taken

from [238].

Several researchers have given improved algorithms for computing

shortest paths via matrix multiplication. Fredman [153] shows how to solve the all-pairs shortest-paths problem using O(V^{5/2}) comparisons between sums of edge weights and obtains an algorithm that runs in O(V^3 (lg lg V/lg V)^{1/3}) time, which is slightly better than the running time of the Floyd-Warshall algorithm. This bound has been improved several times, and the fastest algorithm is now by Williams [457], with a running time of V^3/2^{Ω((lg V)^{1/2})}.

Another line of research demonstrates how to apply algorithms for

fast matrix multiplication (see the chapter notes for Chapter 4) to the all-pairs shortest paths problem. Let O( n ω) be the running time of the fastest algorithm for multiplying two n × n matrices. Galil and Margalit

[170, 171] and Seidel [403] designed algorithms that solve the all-pairs shortest paths problem in undirected, unweighted graphs in Õ(V^ω p(V)) time, where p(n) denotes a particular function that is

polylogarithmically bounded in n. In dense graphs, these algorithms are

faster than the O( VE) time needed to perform | V| breadth-first searches.

Several researchers have extended these results to give algorithms for


solving the all-pairs shortest paths problem in undirected graphs in

which the edge weights are integers in the range {1, 2, … , W}. The asymptotically fastest such algorithm, by Shoshan and Zwick [410], runs in O(W V^ω p(V W)) time. In directed graphs, the best algorithm to date is due to Zwick [467] and runs in Õ(W^{1/(4−ω)} V^{2+1/(4−ω)}) time.

Karger, Koller, and Phillips [244] and independently McGeoch [320]

have given a time bound that depends on E*, the set of edges in E that

participate in some shortest path. Given a graph with nonnegative edge

weights, their algorithms run in O( VE* + V 2 lg V) time and improve upon running Dijkstra’s algorithm | V| times when | E*| = o( E). Pettie

[355] uses an approach based on component hierarchies to achieve a running time of O( VE + V 2 lg lg V), and the same running time is also achieved by Hagerup [205].

Baswana, Hariharan, and Sen [37] examined decremental

algorithms, which allow a sequence of intermixed edge deletions and

queries, for maintaining all-pairs shortest paths and transitive-closure

information. When a path exists, their randomized transitive-closure

algorithm can fail to report it with probability 1/n^c for an arbitrary c > 0. The query times are O(1) with high probability. For transitive closure,

the amortized time for each update is O(V^{4/3} lg^{1/3} V). By comparison, Problem 23-1, in which edges are inserted, asks for an incremental

algorithm. For all-pairs shortest paths, the update times depend on the

queries. For queries just giving the shortest-path weights, the amortized

time per update is O((V^3/E) lg^2 V). To report the actual shortest path, the amortized update time is min

. Demetrescu and

Italiano [111] showed how to handle update and query operations when edges are both inserted and deleted, as long as the range of edge weights

is bounded.

Aho, Hopcroft, and Ullman [5] defined an algebraic structure known as a “closed semiring,” which serves as a general framework for solving

path problems in directed graphs. Both the Floyd-Warshall algorithm

and the transitive-closure algorithm from Section 23.2 are instantiations of an all-pairs algorithm based on closed semirings. Maggs and Plotkin

[309] showed how to find minimum spanning trees using a closed semiring.

1 According to a report cited by the U.S. Department of Transportation Federal Highway Administration, “a reasonable ‘rule of thumb’ is one signalized intersection per 1,000

population.”

2 An algebraic semiring contains operations ⊕, which is commutative with identity I⊕, and ⊙, with identity I⊙, where ⊙ distributes over ⊕ on both the left and the right, and where I⊕ ⊙ x = x ⊙ I⊕ = I⊕ for all x. Standard matrix multiplication, as in MATRIX-MULTIPLY, uses the semiring with + for ⊕, · for ⊙, 0 for I⊕, and 1 for I⊙. The procedure EXTEND-SHORTEST-PATHS uses another semiring, known as the tropical semiring, with min for ⊕, + for ⊙, ∞ for I⊕, and 0 for I⊙.

24 Maximum Flow

Just as you can model a road map as a directed graph in order to find

the shortest path from one point to another, you can also interpret a

directed graph as a “flow network” and use it to answer questions about

material flows. Imagine a material coursing through a system from a

source, where the material is produced, to a sink, where it is consumed.

The source produces the material at some steady rate, and the sink

consumes the material at the same rate. The “flow” of the material at

any point in the system is intuitively the rate at which the material

moves. Flow networks can model many problems, including liquids

flowing through pipes, parts through assembly lines, current through

electrical networks, and information through communication networks.

You can think of each directed edge in a flow network as a conduit

for the material. Each conduit has a stated capacity, given as a

maximum rate at which the material can flow through the conduit, such

as 200 gallons of liquid per hour through a pipe or 20 amperes of

electrical current through a wire. Vertices are conduit junctions, and

other than the source and sink, material flows through the vertices

without collecting in them. In other words, the rate at which material

enters a vertex must equal the rate at which it leaves the vertex. We call

this property “flow conservation,” and it is equivalent to Kirchhoff’s

current law when the material is electrical current.

The goal of the maximum-flow problem is to compute the greatest

rate for shipping material from the source to the sink without violating

any capacity constraints. It is one of the simplest problems concerning

flow networks and, as we shall see in this chapter, this problem can be

solved by efficient algorithms. Moreover, other network-flow problems

are solvable by adapting the basic techniques used in maximum-flow

algorithms.

This chapter presents two general methods for solving the

maximum-flow problem. Section 24.1 formalizes the notions of flow networks and flows, formally defining the maximum-flow problem.

Section 24.2 describes the classical method of Ford and Fulkerson for finding maximum flows. We finish up with a simple application of this

method, finding a maximum matching in an undirected bipartite graph,

in Section 24.3. (Section 25.1 will give a more efficient algorithm that is specifically designed to find a maximum matching in a bipartite graph.)

24.1 Flow networks

This section gives a graph-theoretic definition of flow networks,

discusses their properties, and defines the maximum-flow problem

precisely. It also introduces some helpful notation.

Flow networks and flows

A flow network G = (V, E) is a directed graph in which each edge (u, v) ∈ E has a nonnegative capacity c(u, v) ≥ 0. We further require that if E

contains an edge ( u, v), then there is no edge ( v, u) in the reverse direction. (We’ll see shortly how to work around this restriction.) If ( u,

v) ∉ E, then for convenience we define c( u, v) = 0, and we disallow self-loops. Each flow network contains two distinguished vertices: a source s

and a sink t. For convenience, we assume that each vertex lies on some

path from the source to the sink. That is, for each vertex v ∈ V, the flow network contains a path s ⇝ v ⇝ t. Because each vertex other than s has at least one entering edge, we have |E| ≥ |V| − 1. Figure 24.1 shows an example of a flow network.


Figure 24.1 (a) A flow network G = ( V, E) for the Lucky Puck Company’s trucking problem.

The Vancouver factory is the source s, and the Winnipeg warehouse is the sink t. The company ships pucks through intermediate cities, but only c( u, v) crates per day can go from city u to city v. Each edge is labeled with its capacity. (b) A flow f in G with value | f | = 19. Each edge ( u, v) is labeled by f ( u, v)/ c( u, v). The slash notation merely separates the flow and capacity and does not indicate division.

We are now ready to define flows more formally. Let G = ( V, E) be a flow network with a capacity function c. Let s be the source of the network, and let t be the sink. A flow in G is a real-valued function f : V

× V → ℝ that satisfies the following two properties:

Capacity constraint: For all u, v ∈ V, we require

0 ≤ f( u, v) ≤ c( u, v).

The flow from one vertex to another must be nonnegative and must

not exceed the given capacity.

Flow conservation: For all u ∈ V − {s, t}, we require

Σv∈V f(v, u) = Σv∈V f(u, v).

The total flow into a vertex other than the source or sink must equal

the total flow out of that vertex—informally, “flow in equals flow

out.”

When ( u, v) ∉ E, there can be no flow from u to v, and f ( u, v) = 0.

We call the nonnegative quantity f(u, v) the flow from vertex u to vertex v. The value |f| of a flow f is defined as

|f| = Σv∈V f(s, v) − Σv∈V f(v, s),     (24.1)

that is, the total flow out of the source minus the flow into the source.

(Here, the |·| notation denotes flow value, not absolute value or

cardinality.) Typically, a flow network does not have any edges into the

source, and the flow into the source, given by the summation Σv∈V f(v, s), is 0. We include it, however, because when we introduce residual networks later in this chapter, the flow into the source can be positive. In

the maximum-flow problem, the input is a flow network G with source s and sink t, and the goal is to find a flow of maximum value.
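The two defining properties of a flow, and the value |f|, can be checked mechanically. A minimal Python sketch, assuming capacities and flows are stored as dicts keyed by ordered vertex pairs (missing pairs meaning 0); this representation is an illustrative choice, not from the text:

```python
from math import isclose

def is_flow(c, f, s, t, vertices):
    """Check the two flow properties of f against capacities c.
    c and f map (u, v) pairs to reals; absent pairs mean 0."""
    cap = lambda u, v: c.get((u, v), 0)
    flow = lambda u, v: f.get((u, v), 0)
    # Capacity constraint: 0 <= f(u, v) <= c(u, v) for all u, v.
    for u in vertices:
        for v in vertices:
            if not (0 <= flow(u, v) <= cap(u, v)):
                return False
    # Flow conservation: flow in equals flow out at every internal vertex.
    for u in vertices:
        if u in (s, t):
            continue
        inflow = sum(flow(v, u) for v in vertices)
        outflow = sum(flow(u, v) for v in vertices)
        if not isclose(inflow, outflow):
            return False
    return True

def flow_value(f, s, vertices):
    # |f| = total flow out of the source minus flow into the source.
    return (sum(f.get((s, v), 0) for v in vertices)
            - sum(f.get((v, s), 0) for v in vertices))
```

For instance, shipping 7 units along a two-edge path s → a → t satisfies both properties, while sending 7 into a and only 5 out violates conservation.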

An example of flow

A flow network can model the trucking problem shown in Figure

24.1(a). The Lucky Puck Company has a factory (source s) in

Vancouver that manufactures hockey pucks, and it has a warehouse

(sink t) in Winnipeg that stocks them. Lucky Puck leases space on

trucks from another firm to ship the pucks from the factory to the

warehouse. Because the trucks travel over specified routes (edges)

between cities (vertices) and have a limited capacity, Lucky Puck can

ship at most c( u, v) crates per day between each pair of cities u and v in

Figure 24.1(a). Lucky Puck has no control over these routes and capacities, and so the company cannot alter the flow network shown in

Figure 24.1(a). They need to determine the largest number p of crates per day that they can ship and then to produce this amount, since there

is no point in producing more pucks than they can ship to their

warehouse. Lucky Puck is not concerned with how long it takes for a

given puck to get from the factory to the warehouse. They care only that

p crates per day leave the factory and p crates per day arrive at the warehouse.


Figure 24.2 Converting a network with antiparallel edges to an equivalent one with no antiparallel edges. (a) A flow network containing both the edges ( v 1, v 2) and ( v 2, v 1). (b) An equivalent network with no antiparallel edges. A new vertex v′ was added, and edge ( v 1, v 2) was replaced by the pair of edges ( v 1, v′) and ( v′, v 2), both with the same capacity as ( v 1, v 2).

A flow in this network models the “flow” of shipments because the

number of crates shipped per day from one city to another is subject to

a capacity constraint. Additionally, the model must obey flow

conservation, for in a steady state, the rate at which pucks enter an

intermediate city must equal the rate at which they leave. Otherwise,

crates would accumulate at intermediate cities.

Modeling problems with antiparallel edges

Suppose that the trucking firm offers Lucky Puck the opportunity to

lease space for 10 crates in trucks going from Edmonton to Calgary. It

might seem natural to add this opportunity to our example and form

the network shown in Figure 24.2(a). This network suffers from one problem, however: it violates the original assumption that if edge ( v 1,

v 2) ∈ E, then ( v 2, v 1) ∉ E. We call the two edges ( v 1, v 2) and ( v 2, v 1) antiparallel. Thus, to model a flow problem with antiparallel edges, the

network must be transformed into an equivalent one containing no

antiparallel edges. Figure 24.2(b) displays this equivalent network. To transform the network, choose one of the two antiparallel edges, in this

case ( v 1, v 2), and split it by adding a new vertex v′ and replacing edge ( v 1, v 2) with the pair of edges ( v 1, v′) and ( v′, v 2). Also set the capacity of both new edges to the capacity of the original edge. The resulting

network satisfies the property that if an edge belongs to the network, the


reverse edge does not. As Exercise 24.1-1 asks you to prove, the

resulting network is equivalent to the original one.
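The splitting transformation can be sketched in a few lines. In this sketch, capacities are a dict keyed by vertex pairs and the fresh-vertex naming scheme is an illustrative assumption; vertex names are assumed comparable so that each antiparallel pair is handled exactly once:

```python
def split_antiparallel(c):
    """Given capacities c mapping (u, v) -> capacity, return an equivalent
    capacity dict with no antiparallel edge pairs, splitting one edge of
    each pair through a fresh intermediate vertex."""
    c2 = dict(c)
    fresh = 0
    for (u, v) in list(c2):
        # Handle each antiparallel pair once, from its lexicographically
        # smaller endpoint.
        if (v, u) in c2 and u < v:
            w = c2.pop((u, v))
            x = ('split', fresh)      # the new vertex v' of Figure 24.2(b)
            fresh += 1
            c2[(u, x)] = w            # both new edges get the original
            c2[(x, v)] = w            # edge's capacity
    return c2
```

A maximum flow in the transformed network has the same value as in the original, as Exercise 24.1-1 asks you to prove.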

Figure 24.3 Converting a multiple-source, multiple-sink maximum-flow problem into a problem with a single source and a single sink. (a) A flow network with three sources S = { s 1, s 2, s 3} and two sinks T = { t 1, t 2}. (b) An equivalent single-source, single-sink flow network. Add a supersource s and an edge with infinite capacity from s to each of the multiple sources. Also add a supersink t and an edge with infinite capacity from each of the multiple sinks to t.

Networks with multiple sources and sinks

A maximum-flow problem may have several sources and sinks, rather

than just one of each. The Lucky Puck Company, for example, might

actually have a set of m factories { s 1, s 2, …, sm} and a set of n warehouses { t 1, t 2, …, tn}, as shown in Figure 24.3(a). Fortunately, this problem is no harder than ordinary maximum flow.

The problem of determining a maximum flow in a network with

multiple sources and multiple sinks reduces to an ordinary maximum-

flow problem. Figure 24.3(b) shows how to convert the network from (a) to an ordinary flow network with only a single source and a single sink.

Add a supersource s and add a directed edge ( s, si) with capacity c( s, si)

= ∞ for each i = 1, 2, …, m. Similarly, create a new supersink t and add a directed edge ( ti, t) with capacity c( ti, t) = ∞ for each i = 1, 2, …, n.

Intuitively, any flow in the network in (a) corresponds to a flow in the

network in (b), and vice versa. The single supersource s provides as much flow as desired for the multiple sources si, and the single supersink

t likewise consumes as much flow as desired for the multiple sinks ti.

Exercise 24.1-2 asks you to prove formally that the two problems are

equivalent.
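The supersource/supersink construction is equally short in code. Here the names 's*' and 't*' for the new vertices are assumptions of the sketch; any fresh names not already in the network work:

```python
from math import inf

def add_super_terminals(c, sources, sinks):
    """Reduce a multiple-source, multiple-sink network to a single-source,
    single-sink one, as in Figure 24.3. Capacities c map (u, v) -> capacity.
    Returns the new capacity dict plus the supersource and supersink."""
    c2 = dict(c)
    s, t = 's*', 't*'                 # assumed-fresh vertex names
    for si in sources:
        c2[(s, si)] = inf             # infinite-capacity edge from supersource
    for ti in sinks:
        c2[(ti, t)] = inf             # infinite-capacity edge to supersink
    return c2, s, t
```

Because the new edges have infinite capacity, they never constrain the flow between the original sources and sinks.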

Exercises

24.1-1

Show that splitting an edge in a flow network yields an equivalent

network. More formally, suppose that flow network G contains edge ( u,

v), and define a new flow network G′ by creating a new vertex x and replacing ( u, v) by new edges ( u, x) and ( x, v) with c( u, x) = c( x, v) = c( u, v). Show that a maximum flow in G′ has the same value as a maximum

flow in G.

24.1-2

Extend the flow properties and definitions to the multiple-source,

multiple-sink problem. Show that any flow in a multiple-source,

multiple-sink flow network corresponds to a flow of identical value in

the single-source, single-sink network obtained by adding a supersource

and a supersink, and vice versa.

24.1-3

Suppose that a flow network G = (V, E) violates the assumption that the network contains a path s ⇝ v ⇝ t for all vertices v ∈ V. Let u be a vertex for which there is no path s ⇝ u ⇝ t. Show that there must exist a maximum flow f in G such that f(u, v) = f(v, u) = 0 for all vertices v ∈ V.

24.1-4

Let f be a flow in a network, and let α be a real number. The scalar flow product, denoted αf, is a function from V × V to ℝ defined by ( αf)( u, v) = α · f ( u, v).

Prove that the flows in a network form a convex set. That is, show that if

f 1 and f 2 are flows, then so is αf 1 + (1 − α) f 2 for all α in the range 0 ≤ α

≤ 1.

24.1-5

State the maximum-flow problem as a linear-programming problem.

24.1-6

Professor Adam has two children who, unfortunately, dislike each other.

The problem is so severe that not only do they refuse to walk to school

together, but in fact each one refuses to walk on any block that the

other child has stepped on that day. The children have no problem with

their paths crossing at a corner. Fortunately both the professor’s house

and the school are on corners, but beyond that he is not sure if it is

going to be possible to send both of his children to the same school. The

professor has a map of his town. Show how to formulate the problem of

determining whether both his children can go to the same school as a

maximum-flow problem.

24.1-7

Suppose that, in addition to edge capacities, a flow network has vertex

capacities. That is, each vertex v has a limit l(v) on how much flow can pass through v. Show how to transform a flow network G = (V, E) with vertex capacities into an equivalent flow network G′ = (V′, E′) without vertex capacities, such that a maximum flow in G′ has the same value as

a maximum flow in G. How many vertices and edges does G′ have?

24.2 The Ford-Fulkerson method

This section presents the Ford-Fulkerson method for solving the

maximum-flow problem. We call it a “method” rather than an

“algorithm” because it encompasses several implementations with

differing running times. The Ford-Fulkerson method depends on three

important ideas that transcend the method and are relevant to many

flow algorithms and problems: residual networks, augmenting paths,

and cuts. These ideas are essential to the important max-flow min-cut

theorem (Theorem 24.6), which characterizes the value of a maximum

flow in terms of cuts of the flow network. We end this section by

presenting one specific implementation of the Ford-Fulkerson method

and analyzing its running time.

The Ford-Fulkerson method iteratively increases the value of the flow. It starts with f(u, v) = 0 for all u, v ∈ V, giving an initial flow of value 0. Each iteration increases the flow value in G by finding an

“augmenting path” in an associated “residual network” Gf. The edges

of the augmenting path in Gf indicate on which edges in G to update the flow in order to increase the flow value. Although each iteration of the

Ford-Fulkerson method increases the value of the flow, we’ll see that the

flow on any particular edge of G may increase or decrease. Although it

might seem counterintuitive to decrease the flow on an edge, doing so

may enable flow to increase on other edges, allowing more flow to travel

from the source to the sink. The Ford-Fulkerson method, given in the

procedure FORD-FULKERSON-METHOD, repeatedly augments the

flow until the residual network has no more augmenting paths. The

max-flow min-cut theorem shows that upon termination, this process

yields a maximum flow.

FORD-FULKERSON-METHOD ( G, s, t)

1 initialize flow f to 0

2 while there exists an augmenting path p in the residual network Gf

3

augment flow f along p

4 return f

In order to implement and analyze the Ford-Fulkerson method, we

need to introduce several additional concepts.
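Even before those concepts are formalized, the skeleton of FORD-FULKERSON-METHOD can be made concrete. The Python sketch below finds augmenting paths by breadth-first search, which is one particular choice (the Edmonds-Karp variant); the generic method leaves the search strategy open. Residual capacities are computed directly from c and f as in equation (24.2), and the dict-of-pairs representation is an assumption for illustration.

```python
from collections import deque

def ford_fulkerson(c, s, t):
    """Sketch of FORD-FULKERSON-METHOD. c maps edges (u, v) -> capacity;
    antiparallel edge pairs are assumed absent. Returns a maximum flow f."""
    f = {e: 0 for e in c}                  # line 1: initialize flow f to 0
    def cf(u, v):                          # residual capacity, eq. (24.2)
        if (u, v) in c:
            return c[(u, v)] - f[(u, v)]
        if (v, u) in c:
            return f[(v, u)]
        return 0
    vertices = {u for e in c for u in e}
    while True:                            # line 2: look for an augmenting path
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:       # BFS in the residual network
            u = q.popleft()
            for v in vertices:
                if v not in parent and cf(u, v) > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:                # no augmenting path: flow is maximum
            return f                       # line 4
        path = []                          # recover the s-to-t path
        v = t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cf(u, v) for u, v in path)
        for u, v in path:                  # line 3: augment f along the path
            if (u, v) in c:
                f[(u, v)] += bottleneck    # push flow forward
            else:
                f[(v, u)] -= bottleneck    # cancel flow on the reverse edge
```

Note how augmenting may decrease the flow on an edge: pushing flow across a residual edge (v, u) cancels flow already sent along (u, v).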

Residual networks

Intuitively, given a flow network G and a flow f, the residual network Gf consists of edges whose capacities represent how the flow can change on

edges of G. An edge of the flow network can admit an amount of

additional flow equal to the edge’s capacity minus the flow on that edge.

If that value is positive, that edge goes into Gf with a “residual capacity”

of cf ( u, v) = c( u, v) − f ( u, v). The only edges of G that belong to Gf are


those that can admit more flow. Those edges ( u, v) whose flow equals their capacity have cf ( u, v) = 0, and they do not belong to Gf.

You might be surprised that the residual network Gf can also contain

edges that are not in G. As an algorithm manipulates the flow, with the

goal of increasing the total flow, it might need to decrease the flow on a

particular edge in order to increase the flow elsewhere. In order to

represent a possible decrease in the positive flow f ( u, v) on an edge in G, the residual network Gf contains an edge ( v, u) with residual capacity cf ( v, u) = f ( u, v)—that is, an edge that can admit flow in the opposite direction to ( u, v), at most canceling out the flow on ( u, v). These reverse edges in the residual network allow an algorithm to send back flow it

has already sent along an edge. Sending flow back along an edge is

equivalent to decreasing the flow on the edge, which is a necessary

operation in many algorithms.

More formally, for a flow network G = (V, E) with source s, sink t, and a flow f, consider a pair of vertices u, v ∈ V. We define the residual capacity cf(u, v) by

cf(u, v) =  c(u, v) − f(u, v)   if (u, v) ∈ E,
            f(v, u)             if (v, u) ∈ E,
            0                   otherwise.     (24.2)

In a flow network, (u, v) ∈ E implies (v, u) ∉ E, and so exactly one case in equation (24.2) applies to each ordered pair of vertices.

As an example of equation (24.2), if c( u, v) = 16 and f ( u, v) = 11, then f ( u, v) can increase by up to cf ( u, v) = 5 units before exceeding the capacity constraint on edge ( u, v). Alternatively, up to 11 units of flow can return from v to u, so that cf ( v, u) = 11.

Given a flow network G = (V, E) and a flow f, the residual network of G induced by f is Gf = (V, Ef), where

Ef = {(u, v) ∈ V × V : cf(u, v) > 0}.     (24.3)


Figure 24.4 (a) The flow network G and flow f of Figure 24.1(b). (b) The residual network Gf with augmenting path p, having residual capacity cf ( p) = cf ( v 2, v 3) = 4, in blue. Edges with residual capacity equal to 0, such as ( v 1, v 3), are not shown, a convention we follow in the remainder of this section. (c) The flow in G that results from augmenting along path p by its residual capacity 4. Edges carrying no flow, such as ( v 3, v 2), are labeled only by their capacity, another convention we follow throughout. (d) The residual network induced by the flow in (c).

That is, as promised above, each edge of the residual network, or

residual edge, can admit a flow that is greater than 0. Figure 24.4(a)

repeats the flow network G and flow f of Figure 24.1(b), and Figure

24.4(b) shows the corresponding residual network Gf. The edges in Ef

are either edges in E or their reversals, and thus

| Ef| ≤ 2 | E|.
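The residual-capacity definition of equation (24.2) translates directly into code. A sketch assuming the network is stored as a dict of capacities keyed by edges, with no antiparallel pairs (the standing assumption of this section):

```python
def residual_capacities(c, f):
    """Compute residual capacities per equation (24.2). c maps edges
    (u, v) -> capacity; f maps edges -> current flow (absent edges carry 0).
    Returns only the residual edges, i.e., those with cf > 0."""
    cf = {}
    for (u, v), cap in c.items():
        flow = f.get((u, v), 0)
        if cap - flow > 0:
            cf[(u, v)] = cap - flow   # room to push more flow forward
        if flow > 0:
            cf[(v, u)] = flow         # flow that can be sent back (cancelled)
    return cf
```

For the example in the text (c(u, v) = 16, f(u, v) = 11), this yields cf(u, v) = 5 and cf(v, u) = 11, and since each edge of E contributes at most two residual edges, the returned edge set never exceeds 2|E|.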