Subsequently, Gabow [163] rediscovered this algorithm. Knuth [259]
was the first to give a linear-time algorithm for topological sorting.
1 We distinguish between gray and black vertices to help us understand how breadth-first search operates. In fact, as Exercise 20.2-3 shows, we get the same result even if we do not distinguish between gray and black vertices.
2 Chapters 22 and 23 generalize shortest paths to weighted graphs, in which every edge has a real-valued weight and the weight of a path is the sum of the weights of its constituent edges.
The graphs considered in the present chapter are unweighted or, equivalently, all edges have unit weight.
3 It may seem arbitrary that breadth-first search is limited to only one source whereas depth-first search may search from multiple sources. Although conceptually, breadth-first search could proceed from multiple sources and depth-first search could be limited to one source, our approach reflects how the results of these searches are typically used. Breadth-first search usually serves to find shortest-path distances and the associated predecessor subgraph from a given source. Depth-first search is often a subroutine in another algorithm, as we’ll see later in this chapter.
Electronic circuit designs often need to make the pins of several
components electrically equivalent by wiring them together. To
interconnect a set of n pins, the designer can use an arrangement of n −
1 wires, each connecting two pins. Of all such arrangements, the one
that uses the least amount of wire is usually the most desirable.
To model this wiring problem, use a connected, undirected graph G
= ( V, E), where V is the set of pins, E is the set of possible interconnections between pairs of pins, and for each edge ( u, v) ∈ E, a weight w( u, v) specifies the cost (amount of wire needed) to connect u and v. The goal is to find an acyclic subset T ⊆ E that connects all of the vertices and whose total weight
is minimized. Since T is acyclic and connects all of the vertices, it must
form a tree, which we call a spanning tree since it “spans” the graph G.
We call the problem of determining the tree T the minimum-spanning-
tree problem. 1 Figure 21.1 shows an example of a connected graph and a minimum spanning tree.
This chapter studies two ways to solve the minimum-spanning-tree
problem. Kruskal’s algorithm and Prim’s algorithm both run in O( E lg
V) time. Prim’s algorithm achieves this bound by using a binary heap as
a priority queue. By using Fibonacci heaps instead (see page 478),
Prim’s algorithm runs in O( E + V lg V) time. This bound is better than O( E lg V) whenever | E| grows asymptotically faster than | V|.
Figure 21.1 A minimum spanning tree for a connected graph. The weights on edges are shown, and the blue edges form a minimum spanning tree. The total weight of the tree shown is 37. This minimum spanning tree is not unique: removing the edge ( b, c) and replacing it with the edge ( a, h) yields another spanning tree with weight 37.
The two algorithms are greedy algorithms, as described in Chapter
15. Each step of a greedy algorithm must make one of several possible
choices. The greedy strategy advocates making the choice that is the best
at the moment. Such a strategy does not generally guarantee that it
always finds globally optimal solutions to problems. For the minimum-
spanning-tree problem, however, we can prove that certain greedy
strategies do yield a spanning tree with minimum weight. Although you
can read this chapter independently of Chapter 15, the greedy methods presented here are a classic application of the theoretical notions
introduced there.
Section 21.1 introduces a “generic” minimum-spanning-tree method
that grows a spanning tree by adding one edge at a time. Section 21.2
gives two algorithms that implement the generic method. The first
algorithm, due to Kruskal, is similar to the connected-components
algorithm from Section 19.1. The second, due to Prim, resembles Dijkstra’s shortest-paths algorithm (Section 22.3).
Because a tree is a type of graph, in order to be precise we must
define a tree in terms of not just its edges, but its vertices as well.
Because this chapter focuses on trees in terms of their edges, we’ll
implicitly understand that the vertices of a tree T are those that some
edge of T is incident on.
21.1 Growing a minimum spanning tree
The input to the minimum-spanning-tree problem is a connected, undirected graph G = ( V, E) with a weight function w : E → ℝ. The goal is to find a minimum spanning tree for G. The two algorithms
considered in this chapter use a greedy approach to the problem,
although they differ in how they apply this approach.
This greedy strategy is captured by the procedure GENERIC-MST
on the facing page, which grows the minimum spanning tree one edge at
a time. The generic method manages a set A of edges, maintaining the
following loop invariant:
Prior to each iteration, A is a subset of some minimum
spanning tree.
GENERIC-MST(G, w)
1  A = Ø
2  while A does not form a spanning tree
3      find an edge (u, v) that is safe for A
4      A = A ∪ {(u, v)}
5  return A
Each step determines an edge ( u, v) that the procedure can add to A without violating this invariant, in the sense that A ∪ {( u, v)} is also a subset of a minimum spanning tree. We call such an edge a safe edge for
A, since it can be added safely to A while maintaining the invariant.
This generic algorithm uses the loop invariant as follows:
Initialization: After line 1, the set A trivially satisfies the loop invariant.
Maintenance: The loop in lines 2–4 maintains the invariant by adding
only safe edges.
Termination: All edges added to A belong to a minimum spanning tree,
and the loop must terminate by the time it has considered all edges.
Therefore, the set A returned in line 5 must be a minimum spanning
tree.
The tricky part is, of course, finding a safe edge in line 3. One must
exist, since when line 3 is executed, the invariant dictates that there is a
spanning tree T such that A ⊆ T. Within the while loop body, A must be a proper subset of T, and therefore there must be an edge ( u, v) ∈ T
such that ( u, v) ∉ A and ( u, v) is safe for A.
The remainder of this section provides a rule (Theorem 21.1) for
recognizing safe edges. The next section describes two algorithms that
use this rule to find safe edges efficiently.
We first need some definitions. A cut ( S, V – S) of an undirected graph
G = ( V, E) is a partition of V. Figure 21.2 illustrates this notion. We say that an edge ( u, v) ∈ E crosses the cut ( S, V – S) if one of its endpoints belongs to S and the other belongs to V – S. A cut respects a set A of edges if no edge in A crosses the cut. An edge is a light edge crossing a cut if its weight is the minimum of any edge crossing the cut. There can
be more than one light edge crossing a cut in the case of ties. More
generally, we say that an edge is a light edge satisfying a given property
if its weight is the minimum of any edge satisfying the property.
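These definitions translate directly into code. The following sketch is our illustration, not a procedure from the text: an edge is a (u, v, weight) triple, and a cut (S, V – S) is given by the vertex set S.

```python
def crosses(edge, S):
    """An edge (u, v) crosses the cut (S, V - S) if exactly one
    of its endpoints lies in S."""
    u, v, _ = edge
    return (u in S) != (v in S)

def light_edge(edges, S):
    """Return a light edge crossing the cut (S, V - S): an edge whose
    weight is minimum among all crossing edges. In case of ties the
    light edge is not unique, and min() simply returns the first one
    encountered. Returns None if no edge crosses the cut."""
    crossing = [e for e in edges if crosses(e, S)]
    return min(crossing, key=lambda e: e[2]) if crossing else None
```

For example, on the graph of Figure 21.1 with S chosen as in Figure 21.2, `light_edge` returns the edge ( d, c), the unique light edge crossing that cut.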
The following theorem gives the rule for recognizing safe edges.
Theorem 21.1
Let G = ( V, E) be a connected, undirected graph with a real-valued weight function w defined on E. Let A be a subset of E that is included in some minimum spanning tree for G, let ( S, V – S) be any cut of G
that respects A, and let ( u, v) be a light edge crossing ( S, V – S). Then, edge ( u, v) is safe for A.
Figure 21.2 A cut ( S, V – S) of the graph from Figure 21.1. Orange vertices belong to the set S, and tan vertices belong to V – S. The edges crossing the cut are those connecting tan vertices with orange vertices. The edge ( d, c) is the unique light edge crossing the cut. Blue edges form a subset A of the edges. The cut ( S, V – S) respects A, since no edge of A crosses the cut.
Proof Let T be a minimum spanning tree that includes A, and assume that T does not contain the light edge ( u, v), since if it does, we are done.
We’ll construct another minimum spanning tree T′ that includes A ∪
{( u, v)} by using a cut-and-paste technique, thereby showing that ( u, v) is a safe edge for A.
The edge ( u, v) forms a cycle with the edges on the simple path p from u to v in T, as Figure 21.3 illustrates. Since u and v are on opposite sides of the cut ( S, V – S), at least one edge in T lies on the simple path p and also crosses the cut. Let ( x, y) be any such edge. The edge ( x, y) is not in A, because the cut respects A. Since ( x, y) is on the unique simple path from u to v in T, removing ( x, y) breaks T into two components.
Adding ( u, v) reconnects them to form a new spanning tree T′ = ( T –
{( x, y)}) ∪ {( u, v)}.
We next show that T′ is a minimum spanning tree. Since ( u, v) is a light edge crossing ( S, V – S) and ( x, y) also crosses this cut, w( u, v) ≤
w( x, y). Therefore,
w( T′) = w( T) − w( x, y) + w( u, v)
≤ w( T).
But T is a minimum spanning tree, so that w( T) ≤ w( T′), and thus, T′
must be a minimum spanning tree as well.
It remains to show that ( u, v) is actually a safe edge for A. We have A
⊆ T′, since A ⊆ T and ( x, y) ∉ A, and thus, A ∪ {( u, v)} ⊆ T′.
Consequently, since T′ is a minimum spanning tree, ( u, v) is safe for A.
▪
Theorem 21.1 provides insight into how the GENERIC-MST
method works on a connected graph G = ( V, E). As the method proceeds, the set A is always acyclic, since it is a subset of a minimum
spanning tree and a tree may not contain a cycle. At any point in the
execution, the graph GA = ( V, A) is a forest, and each of the connected components of GA is a tree. (Some of the trees may contain just one
vertex, as is the case, for example, when the method begins: A is empty
and the forest contains | V| trees, one for each vertex.) Moreover, any safe edge ( u, v) for A connects distinct components of GA, since A ∪
{( u, v)} must be acyclic.
Figure 21.3 The proof of Theorem 21.1. Orange vertices belong to S, and tan vertices belong to V – S. Only edges in the minimum spanning tree T are shown, along with edge ( u, v), which does not lie in T. The edges in A are blue, and ( u, v) is a light edge crossing the cut ( S, V – S). The edge ( x, y) is an edge on the unique simple path p from u to v in T. To form a minimum spanning tree T′ that contains ( u, v), remove the edge ( x, y) from T and add the edge ( u, v).
The while loop in lines 2–4 of GENERIC-MST executes | V| – 1 times
because it finds one of the | V| – 1 edges of a minimum spanning tree in
each iteration. Initially, when A = Ø, there are | V| trees in GA, and each iteration reduces that number by 1. When the forest contains only a
single tree, the method terminates.
The two algorithms in Section 21.2 use the following corollary to Theorem 21.1.
Corollary 21.2
Let G = ( V, E) be a connected, undirected graph with a real-valued weight function w defined on E. Let A be a subset of E that is included in some minimum spanning tree for G, and let C = ( VC, EC) be a connected component (tree) in the forest GA = ( V, A). If ( u, v) is a light edge connecting C to some other component in GA, then ( u, v) is safe for A.
Proof The cut ( VC, V – VC) respects A, and ( u, v) is a light edge for this cut. Therefore, ( u, v) is safe for A.
▪
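Corollary 21.2 suggests a direct, if inefficient, way to instantiate GENERIC-MST: repeatedly scan all edges for a minimum-weight edge joining two distinct trees of the forest GA. The sketch below is our own illustration under that rule (the function name and the (u, v, weight) edge representation are ours); the efficient algorithms of Section 21.2 replace this brute-force scan.

```python
def generic_mst(vertices, edges):
    """GENERIC-MST instantiated via Corollary 21.2: each iteration adds
    a minimum-weight edge whose endpoints lie in distinct trees of the
    forest G_A = (V, A). Such an edge is a light edge leaving its
    component, hence safe. Brute force: O(V * E) overall."""
    component = {v: v for v in vertices}   # which tree of G_A contains v
    A = set()
    while len(set(component.values())) > 1:
        # a safe edge: lightest edge connecting two distinct trees
        u, v, w = min((e for e in edges
                       if component[e[0]] != component[e[1]]),
                      key=lambda e: e[2])
        A.add((u, v, w))
        old, new = component[v], component[u]
        for x in component:                # merge the two trees
            if component[x] == old:
                component[x] = new
    return A
```

Since the graph is connected, each iteration merges two trees, so the loop runs exactly | V| – 1 times.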
Exercises
21.1-1
Let ( u, v) be a minimum-weight edge in a connected graph G. Show that ( u, v) belongs to some minimum spanning tree of G.
21.1-2
Professor Sabatier conjectures the following converse of Theorem 21.1.
Let G = ( V, E) be a connected, undirected graph with a real-valued weight function w defined on E. Let A be a subset of E that is included in some minimum spanning tree for G, let ( S, V – S) be any cut of G
that respects A, and let ( u, v) be a safe edge for A crossing ( S, V – S).
Then, ( u, v) is a light edge for the cut. Show that the professor’s conjecture is incorrect by giving a counterexample.
21.1-3
Show that if an edge ( u, v) is contained in some minimum spanning tree, then it is a light edge crossing some cut of the graph.
21.1-4
Give a simple example of a connected graph such that the set of edges
{( u, v) : there exists a cut ( S, V – S) such that ( u, v) is a light edge crossing ( S, V – S)} does not form a minimum spanning tree.
21.1-5
Let e be a maximum-weight edge on some cycle of a connected graph G =
( V, E). Prove that there is a minimum spanning tree of G′ = ( V, E – { e}) that is also a minimum spanning tree of G. That is, there is a minimum
spanning tree of G that does not include e.
21.1-6
Show that a graph has a unique minimum spanning tree if, for every cut
of the graph, there is a unique light edge crossing the cut. Show that the
converse is not true by giving a counterexample.
21.1-7
Argue that if all edge weights of a graph are positive, then any subset of
edges that connects all vertices and has minimum total weight must be a
tree. Give an example to show that the same conclusion does not follow
if we allow some weights to be nonpositive.
21.1-8
Let T be a minimum spanning tree of a graph G, and let L be the sorted list of the edge weights of T. Show that for any other minimum
spanning tree T′ of G, the list L is also the sorted list of edge weights of T′.
21.1-9
Let T be a minimum spanning tree of a graph G = ( V, E), and let V′ be a subset of V. Let T′ be the subgraph of T induced by V′, and let G′ be the subgraph of G induced by V′. Show that if T′ is connected, then T′ is a minimum spanning tree of G′.
21.1-10
Given a graph G and a minimum spanning tree T, suppose that the weight of one of the edges in T decreases. Show that T is still a minimum spanning tree for G. More formally, let T be a minimum spanning tree for G with edge weights given by weight function w.
Choose one edge ( x, y) ∈ T and a positive number k, and define the weight function w′ by

w′( u, v) = w( u, v)   if ( u, v) ≠ ( x, y),
w′( x, y) = w( x, y) − k.

Show that T is a minimum spanning tree for G with edge weights given by w′.
★ 21.1-11
Given a graph G and a minimum spanning tree T, suppose that the weight of one of the edges not in T decreases. Give an algorithm for finding the minimum spanning tree in the modified graph.
21.2 The algorithms of Kruskal and Prim
The two minimum-spanning-tree algorithms described in this section
elaborate on the generic method. They each use a specific rule to
determine a safe edge in line 3 of GENERIC-MST. In Kruskal’s
algorithm, the set A is a forest whose vertices are all those of the given
graph. The safe edge added to A is always a lowest-weight edge in the
graph that connects two distinct components. In Prim’s algorithm, the
set A forms a single tree. The safe edge added to A is always a lowest-
weight edge connecting the tree to a vertex not in the tree. Both
algorithms assume that the input graph is connected and represented by
adjacency lists.
Figure 21.4 The execution of Kruskal’s algorithm on the graph from Figure 21.1. Blue edges belong to the forest A being grown. The algorithm considers each edge in sorted order by weight. A red arrow points to the edge under consideration at each step of the algorithm. If the edge joins two distinct trees in the forest, it is added to the forest, thereby merging the two trees.
Kruskal’s algorithm
Kruskal’s algorithm finds a safe edge to add to the growing forest by
finding, of all the edges that connect any two trees in the forest, an edge
( u, v) with the lowest weight. Let C 1 and C 2 denote the two trees that are connected by ( u, v). Since ( u, v) must be a light edge connecting C 1
to some other tree, Corollary 21.2 implies that ( u, v) is a safe edge for C 1. Kruskal’s algorithm qualifies as a greedy algorithm because at each
step it adds to the forest an edge with the lowest possible weight.
Figure 21.4, continued Further steps in the execution of Kruskal’s algorithm.
Like the algorithm to compute connected components from Section
19.1, the procedure MST-KRUSKAL on the following page uses a
disjoint-set data structure to maintain several disjoint sets of elements.
Each set contains the vertices in one tree of the current forest. The
operation FIND-SET( u) returns a representative element from the set
that contains u. Thus, to determine whether two vertices u and v belong to the same tree, just test whether FIND-SET( u) equals FIND-SET( v).
To combine trees, Kruskal’s algorithm calls the UNION procedure.
Figure 21.4 shows how Kruskal’s algorithm works. Lines 1–3
initialize the set A to the empty set and create | V| trees, one containing each vertex. The for loop in lines 6–9 examines edges in order of weight,
from lowest to highest. The loop checks, for each edge ( u, v), whether the endpoints u and v belong to the same tree. If they do, then the edge ( u, v) cannot be added to the forest without creating a cycle, and the edge is ignored. Otherwise, the two vertices belong to different trees. In
this case, line 8 adds the edge ( u, v) to A, and line 9 merges the vertices in the two trees.
MST-KRUSKAL(G, w)
 1  A = Ø
 2  for each vertex v ∈ G.V
 3      MAKE-SET(v)
 4  create a single list of the edges in G.E
 5  sort the list of edges into monotonically increasing order by weight w
 6  for each edge (u, v) taken from the sorted list in order
 7      if FIND-SET(u) ≠ FIND-SET(v)
 8          A = A ∪ {(u, v)}
 9          UNION(u, v)
10  return A
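For concreteness, here is one way MST-KRUSKAL might look in Python. This is a sketch, with a small disjoint-set forest standing in for the full structure of Section 19.3 (it uses union by rank and path halving, a common variant of path compression); the (u, v, weight) edge representation is our own.

```python
def mst_kruskal(vertices, edges):
    """Kruskal's algorithm: consider edges in increasing weight order,
    adding each edge that joins two distinct trees of the forest."""
    parent = {v: v for v in vertices}            # MAKE-SET for each vertex
    rank = {v: 0 for v in vertices}

    def find_set(x):
        """FIND-SET with path halving."""
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(x, y):
        """UNION by rank on the representatives of x's and y's sets."""
        x, y = find_set(x), find_set(y)
        if rank[x] < rank[y]:
            x, y = y, x
        parent[y] = x
        if rank[x] == rank[y]:
            rank[x] += 1

    A = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):  # lines 4-6
        if find_set(u) != find_set(v):                 # line 7
            A.append((u, v, w))                        # line 8
            union(u, v)                                # line 9
    return A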
The running time of Kruskal’s algorithm for a graph G = ( V, E) depends on the specific implementation of the disjoint-set data
structure. Let’s assume that it uses the disjoint-set-forest
implementation of Section 19.3 with the union-by-rank and path-compression heuristics, since that is the asymptotically fastest
implementation known. Initializing the set A in line 1 takes O(1) time, creating a single list of edges in line 4 takes O( V + E) time (which is O(E) because G is connected), and the time to sort the edges in line 5 is O( E lg E). (We’ll account for the cost of the | V| MAKE-SET operations in the for loop of lines 2–3 in a moment.) The for loop of lines 6–9
performs O( E) FIND-SET and UNION operations on the disjoint-set
forest. Along with the | V| MAKE-SET operations, these disjoint-set
operations take a total of O(( V + E) α( V)) time, where α is the very slowly growing function defined in Section 19.4. Because we assume that G is connected, we have | E| ≥ | V| – 1, and so the disjoint-set
operations take O( E α( V)) time. Moreover, since α(| V|) = O(lg V) = O(lg E), the total running time of Kruskal’s algorithm is O( E lg E).
Observing that | E| < | V|2, we have lg | E| = O(lg V), and so we can restate the running time of Kruskal’s algorithm as O( E lg V).
Prim’s algorithm
Like Kruskal’s algorithm, Prim’s algorithm is a special case of the
generic minimum-spanning-tree method from Section 21.1. Prim’s algorithm operates much like Dijkstra’s algorithm for finding shortest
paths in a graph, which we’ll see in Section 22.3. Prim’s algorithm has the property that the edges in the set A always form a single tree. As
Figure 21.5 shows, the tree starts from an arbitrary root vertex r and grows until it spans all the vertices in V. Each step adds to the tree A a
light edge that connects A to an isolated vertex—one on which no edge
of A is incident. By Corollary 21.2, this rule adds only edges that are safe for A. Therefore, when the algorithm terminates, the edges in A form a minimum spanning tree. This strategy qualifies as greedy since at
each step it adds to the tree an edge that contributes the minimum
amount possible to the tree’s weight.
Figure 21.5 The execution of Prim’s algorithm on the graph from Figure 21.1. The root vertex is a. Blue vertices and edges belong to the tree being grown, and tan vertices have yet to be added to the tree. At each step of the algorithm, the vertices in the tree determine a cut of the graph, and a light edge crossing the cut is added to the tree. The edge and vertex added to the tree are highlighted in orange. In the second step (part (c)), for example, the algorithm has a choice of adding either edge ( b, c) or edge ( a, h) to the tree since both are light edges crossing the cut.
In the procedure MST-PRIM below, the connected graph G and the
root r of the minimum spanning tree to be grown are inputs to the
algorithm. In order to efficiently select a new edge to add into tree A,
the algorithm maintains a min-priority queue Q of all vertices that are
not in the tree, based on a key attribute. For each vertex v, the attribute
v.key is the minimum weight of any edge connecting v to a vertex in the tree, where by convention, v.key = ∞ if there is no such edge. The
attribute v.π names the parent of v in the tree. The algorithm implicitly maintains the set A from GENERIC-MST as
A = {( v, v.π) : v ∈ V – { r} – Q}, where we interpret the vertices in Q as forming a set. When the
algorithm terminates, the min-priority queue Q is empty, and thus the
minimum spanning tree A for G is
A = {( v, v.π) : v ∈ V – { r}}.
MST-PRIM(G, w, r)
 1  for each vertex u ∈ G.V
 2      u.key = ∞
 3      u.π = NIL
 4  r.key = 0
 5  Q = Ø
 6  for each vertex u ∈ G.V
 7      INSERT(Q, u)
 8  while Q ≠ Ø
 9      u = EXTRACT-MIN(Q)              // add u to the tree
10      for each vertex v in G.Adj[u]   // update keys of u’s non-tree neighbors
11          if v ∈ Q and w(u, v) < v.key
12              v.π = u
13              v.key = w(u, v)
14              DECREASE-KEY(Q, v, w(u, v))
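As an illustration, here is how MST-PRIM might look in Python using the standard-library binary heap. The `heapq` module provides no DECREASE-KEY, so this sketch uses the common lazy workaround: rather than decreasing a key in place, it pushes a fresh heap entry and discards stale entries when they are extracted. The names and the adjacency-list representation `adj = {u: [(v, w), ...]}` are ours, and unlike the pseudocode, only the root needs to be inserted into the heap initially.

```python
import heapq

def mst_prim(adj, r):
    """Prim's algorithm from root r on an undirected graph given as
    adjacency lists adj: {u: [(v, w), ...]}. Returns the tree edges
    (v.pi, v, v.key) for every non-root vertex v."""
    key = {u: float('inf') for u in adj}     # lines 1-4 of MST-PRIM
    pi = {u: None for u in adj}
    key[r] = 0
    Q = [(0, r)]                             # only r needs an entry up front
    in_tree = set()
    while Q:
        k, u = heapq.heappop(Q)              # EXTRACT-MIN (line 9)
        if u in in_tree:
            continue                         # stale entry: u already extracted
        in_tree.add(u)
        for v, w in adj[u]:                  # lines 10-14
            if v not in in_tree and w < key[v]:
                key[v] = w
                pi[v] = u
                heapq.heappush(Q, (w, v))    # stands in for DECREASE-KEY
    return [(pi[v], v, key[v]) for v in adj if pi[v] is not None]
```

Lazy deletion trades the O(lg V) DECREASE-KEY for extra heap entries, but the heap never holds more than O(E) entries, so the O( E lg V) bound derived below is unaffected.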
Figure 21.5 shows how Prim’s algorithm works. Lines 1–7 set the key of each vertex to ∞ (except for the root r, whose key is set to 0 to make it
the first vertex processed), set the parent of each vertex to NIL, and
insert each vertex into the min-priority queue Q. The algorithm
maintains the following three-part loop invariant:
Prior to each iteration of the while loop of lines 8–14,
1. A = {( v, v.π) : v ∈ V – {r} – Q}.
2. The vertices already placed into the minimum spanning
tree are those in V − Q.
3. For all vertices v ∈ Q, if v.π ≠ NIL, then v.key < ∞ and v.key is the weight of a light edge ( v, v.π) connecting v to some vertex already placed into the minimum spanning
tree.
Line 9 identifies a vertex u ∈ Q incident on a light edge that crosses the cut ( V – Q, Q) (with the exception of the first iteration, in which u = r due to lines 4–7). Removing u from the set Q adds it to the set V – Q of vertices in the tree, thus adding the edge ( u, u.π) to A. The for loop of lines 10–14 updates the key and π attributes of every vertex v adjacent to u but not in the tree, thereby maintaining the third part of the loop
invariant. Whenever line 13 updates v.key, line 14 calls DECREASE-
KEY to inform the min-priority queue that v’s key has changed.
The running time of Prim’s algorithm depends on the specific
implementation of the min-priority queue Q. You can implement Q
with a binary min-heap (see Chapter 6), including a way to map between vertices and their corresponding heap elements. The BUILD-MIN-HEAP procedure can perform lines 5–7 in O( V) time. In fact, there is no need to call BUILD-MIN-HEAP. You can just put the key
of r at the root of the min-heap, and because all other keys are ∞, they
can go anywhere else in the min-heap. The body of the while loop
executes | V| times, and since each EXTRACT-MIN operation takes
O(lg V) time, the total time for all calls to EXTRACT-MIN is O( V lg V). The for loop in lines 10–14 executes O( E) times altogether, since the sum of the lengths of all adjacency lists is 2 | E|. Within the for loop, the
test for membership in Q in line 11 can take constant time if you keep a
bit for each vertex that indicates whether it belongs to Q and update the
bit when the vertex is removed from Q. Each call to DECREASE-KEY
in line 14 takes O(lg V) time. Thus, the total time for Prim’s algorithm is O( V lg V + E lg V) = O( E lg V), which is asymptotically the same as for our implementation of Kruskal’s algorithm.
You can further improve the asymptotic running time of Prim’s algorithm by implementing the min-priority queue with a Fibonacci
heap (see page 478). If a Fibonacci heap holds | V| elements, an
EXTRACT-MIN operation takes O(lg V) amortized time and each
INSERT and DECREASE-KEY operation takes only O(1) amortized
time. Therefore, by using a Fibonacci heap to implement the min-
priority queue Q, the running time of Prim’s algorithm improves to
O( E+ V lg V).
Exercises
21.2-1
Kruskal’s algorithm can return different spanning trees for the same
input graph G, depending on how it breaks ties when the edges are
sorted. Show that for each minimum spanning tree T of G, there is a way to sort the edges of G in Kruskal’s algorithm so that the algorithm
returns T.
21.2-2
Give a simple implementation of Prim’s algorithm that runs in O( V 2) time when the graph G = ( V, E) is represented as an adjacency matrix.
21.2-3
For a sparse graph G = ( V, E), where | E| = Θ( V), is the implementation of Prim’s algorithm with a Fibonacci heap asymptotically faster than
the binary-heap implementation? What about for a dense graph, where
| E| = Θ( V 2)? How must the sizes | E| and | V| be related for the Fibonacci-heap implementation to be asymptotically faster than the binary-heap
implementation?
21.2-4
Suppose that all edge weights in a graph are integers in the range from 1
to | V|. How fast can you make Kruskal’s algorithm run? What if the
edge weights are integers in the range from 1 to W for some constant
W?
21.2-5
Suppose that all edge weights in a graph are integers in the range from 1
to | V|. How fast can you make Prim’s algorithm run? What if the edge
weights are integers in the range from 1 to W for some constant W?
21.2-6
Professor Borden proposes a new divide-and-conquer algorithm for
computing minimum spanning trees, which goes as follows. Given a
graph G = ( V, E), partition the set V of vertices into two sets V 1 and V 2
such that | V 1| and | V 2| differ by at most 1. Let E 1 be the set of edges that are incident only on vertices in V 1, and let E 2 be the set of edges that are incident only on vertices in V 2. Recursively solve a minimum-spanning-tree problem on each of the two subgraphs G 1 = ( V 1, E 1) and G 2 = ( V 2, E 2). Finally, select the minimum-weight edge in E that crosses the cut ( V 1, V 2), and use this edge to unite the resulting two minimum spanning trees into a single spanning tree.
Either argue that the algorithm correctly computes a minimum
spanning tree of G, or provide an example for which the algorithm fails.
★ 21.2-7
Suppose that the edge weights in a graph are uniformly distributed over
the half-open interval [0, 1). Which algorithm, Kruskal’s or Prim’s, can
you make run faster?
★ 21.2-8
Suppose that a graph G has a minimum spanning tree already
computed. How quickly can you update the minimum spanning tree
upon adding a new vertex and incident edges to G?
Problems
21-1 Second-best minimum spanning tree
Let G = ( V, E) be an undirected, connected graph whose weight function is w : E → ℝ, and suppose that | E| ≥ | V| and all edge weights are distinct.
We define a second-best minimum spanning tree as follows. Let T be
the set of all spanning trees of G, and let T be a minimum spanning tree of G. Then a second-best minimum spanning tree is a spanning tree T′
such that w( T′) = min { w( T″) : T″ ∈ T − { T}}.
a. Show that the minimum spanning tree is unique, but that the second-
best minimum spanning tree need not be unique.
b. Let T be the minimum spanning tree of G. Prove that G contains some edge ( u, v) ∈ T and some edge ( x, y) ∉ T such that ( T – {( u, v)})
∪ {( x, y)} is a second-best minimum spanning tree of G.
c. Now let T be any spanning tree of G and, for any two vertices u, v ∈
V, let max[ u, v] denote an edge of maximum weight on the unique simple path between u and v in T. Describe an O( V 2)-time algorithm that, given T, computes max[ u, v] for all u, v ∈ V.
d. Give an efficient algorithm to compute the second-best minimum
spanning tree of G.
21-2 Minimum spanning tree in sparse graphs
For a very sparse connected graph G = ( V, E), it is possible to further improve upon the O( E + V lg V) running time of Prim’s algorithm with a Fibonacci heap by preprocessing G to decrease the number of vertices
before running Prim’s algorithm. In particular, for each vertex u, choose
the minimum-weight edge ( u, v) incident on u, and put ( u, v) into the minimum spanning tree under construction. Then, contract all chosen
edges (see Section B.4). Rather than contracting these edges one at a time, first identify sets of vertices that are united into the same new
vertex. Then create the graph that would have resulted from contracting
these edges one at a time, but do so by “renaming” edges according to
the sets into which their endpoints were placed. Several edges from the
original graph might be renamed the same as each other. In such a case,
only one edge results, and its weight is the minimum of the weights of
the corresponding original edges.
Initially, set the minimum spanning tree T being constructed to be
empty, and for each edge ( u, v) ∈ E, initialize the two attributes ( u,
v). orig = ( u, v) and ( u, v). c = w( u, v). Use the orig attribute to reference the edge from the initial graph that is associated with an edge in the
contracted graph. The c attribute holds the weight of an edge, and as
edges are contracted, it is updated according to the above scheme for
choosing edge weights. The procedure MST-REDUCE on the facing
page takes inputs G and T, and it returns a contracted graph G′ with updated attributes orig′ and c′. The procedure also accumulates edges of G into the minimum spanning tree T.
a. Let T be the set of edges returned by MST-REDUCE, and let A be the minimum spanning tree of the graph G′ formed by the call MST-PRIM( G′, c′, r), where c′ is the weight attribute on the edges of G′.E and r is any vertex in G′.V. Prove that T ∪ {( x, y). orig′ : ( x, y) ∈ A} is a minimum spanning tree of G.
b. Argue that | G′. V| ≤ | V| /2.
c. Show how to implement MST-REDUCE so that it runs in O( E) time.
( Hint: Use simple data structures.)
d. Suppose that you run k phases of MST-REDUCE, using the output
G′ produced by one phase as the input G to the next phase and
accumulating edges in T. Argue that the overall running time of the k
phases is O( kE).
e. Suppose that after running k phases of MST-REDUCE, as in part
( d), you run Prim’s algorithm by calling MST-PRIM( G′, c′, r), where G′, with weight attribute c′, is returned by the last phase and r is any vertex in G′. V. Show how to pick k so that the overall running time is O( E lg lg V). Argue that your choice of k minimizes the overall asymptotic running time.
f. For what values of | E| (in terms of | V|) does Prim’s algorithm with preprocessing asymptotically beat Prim’s algorithm without
preprocessing?
MST-REDUCE(G, T)
 1  for each vertex v ∈ G.V
 2      v.mark = FALSE
 3      MAKE-SET(v)
 4  for each vertex u ∈ G.V
 5      if u.mark == FALSE
 6          choose v ∈ G.Adj[u] such that (u, v).c is minimized
 7          UNION(u, v)
 8          T = T ∪ {(u, v).orig}
 9          u.mark = TRUE
10          v.mark = TRUE
11  G′.V = {FIND-SET(v) : v ∈ G.V}
12  G′.E = Ø
13  for each edge (x, y) ∈ G.E
14      u = FIND-SET(x)
15      v = FIND-SET(y)
16      if u ≠ v
17          if (u, v) ∉ G′.E
18              G′.E = G′.E ∪ {(u, v)}
19              (u, v).orig′ = (x, y).orig
20              (u, v).c′ = (x, y).c
21          elseif (x, y).c < (u, v).c′
22              (u, v).orig′ = (x, y).orig
23              (u, v).c′ = (x, y).c
24  construct adjacency lists G′.Adj for G′
25  return G′ and T
21-3 Alternative minimum-spanning-tree algorithms
Consider the three algorithms MAYBE-MST-A, MAYBE-MST-B, and
MAYBE-MST-C on the next page. Each one takes a connected graph
and a weight function as input and returns a set of edges T. For each
algorithm, either prove that T is a minimum spanning tree or prove that
T is not necessarily a minimum spanning tree. Also describe the most
efficient implementation of each algorithm, regardless of whether it
computes a minimum spanning tree.
21-4 Bottleneck spanning tree
A bottleneck spanning tree T of an undirected graph G is a spanning tree of G whose largest edge weight is minimum over all spanning trees of G.
The value of the bottleneck spanning tree is the weight of the
maximum-weight edge in T.
MAYBE-MST-A(G, w)
1  sort the edges into monotonically decreasing order of edge weights w
2  T = E
3  for each edge e, taken in monotonically decreasing order by weight
4      if T – {e} is a connected graph
5          T = T – {e}
6  return T
MAYBE-MST-B( G, w)
1 T = Ø
2 for each edge e, taken in arbitrary order
3
if T ∪ { e} has no cycles
4
T = T ∪ { e}
5 return T
MAYBE-MST-C( G, w)
1 T = Ø
2 for each edge e, taken in arbitrary order
3
T = T ∪ { e}
4
if T has a cycle c
5
let e′ be a maximum-weight edge on c
6
T = T – { e′}
7 return T
a. Argue that a minimum spanning tree is a bottleneck spanning tree.
Part (a) shows that finding a bottleneck spanning tree is no harder than
finding a minimum spanning tree. In the remaining parts, you will show
how to find a bottleneck spanning tree in linear time.


b. Give a linear-time algorithm that, given a graph G and an integer b, determines whether the value of the bottleneck spanning tree is at
most b.
c. Use your algorithm for part (b) as a subroutine in a linear-time
algorithm for the bottleneck-spanning-tree problem. ( Hint: You might
want to use a subroutine that contracts sets of edges, as in the MST-
REDUCE procedure described in Problem 21-2.)
Chapter notes
Tarjan [429] surveys the minimum-spanning-tree problem and provides excellent advanced material. Graham and Hell [198] compiled a history of the minimum-spanning-tree problem.
Tarjan attributes the first minimum-spanning-tree algorithm to a
1926 paper by O. Borůvka. Borůvka’s algorithm consists of running
O(lg V) iterations of the procedure MST-REDUCE described in
Problem 21-2. Kruskal’s algorithm was reported by Kruskal [272] in 1956. The algorithm commonly known as Prim’s algorithm was indeed
invented by Prim [367], but it was also invented earlier by V. Jarník in 1930.
When | E| = Ω( V lg V), Prim’s algorithm, implemented with a Fibonacci heap, runs in O( E) time. For sparser graphs, using a combination of the ideas from Prim’s algorithm, Kruskal’s algorithm,
and Borůvka’s algorithm, together with advanced data structures,
Fredman and Tarjan [156] give an algorithm that runs in O( E lg* V) time. Gabow, Galil, Spencer, and Tarjan [165] improved this algorithm to run in O( E lg lg* V) time. Chazelle [83] gives an algorithm that runs in O( E α( E, V)) time, where α( E, V) is the functional inverse of Ackermann’s function. (See the chapter notes for Chapter 19 for a brief discussion of Ackermann’s function and its inverse.) Unlike previous
minimum-spanning-tree algorithms, Chazelle’s algorithm does not
follow the greedy method. Pettie and Ramachandran [356] give an algorithm based on precomputed “MST decision trees” that also runs
in O( E α( E, V)) time.
A related problem is spanning-tree verification: given a graph G = ( V, E) and a tree T ⊆ E, determine whether T is a minimum spanning tree of G. King [254] gives a linear-time algorithm to verify a spanning tree, building on earlier work of Komlós [269] and Dixon, Rauch, and Tarjan
[120].
The above algorithms are all deterministic and fall into the
comparison-based model described in Chapter 8. Karger, Klein, and Tarjan [243] give a randomized minimum-spanning-tree algorithm that runs in O( V + E) expected time. This algorithm uses recursion in a manner similar to the linear-time selection algorithm in Section 9.3: a recursive call on an auxiliary problem identifies a subset of the edges E′
that cannot be in any minimum spanning tree. Another recursive call on
E – E′ then finds the minimum spanning tree. The algorithm also uses
ideas from Borůvka’s algorithm and King’s algorithm for spanning-tree
verification.
Fredman and Willard [158] showed how to find a minimum spanning
tree in O( V + E) time using a deterministic algorithm that is not comparison based. Their algorithm assumes that the data are b-bit
integers and that the computer memory consists of addressable b-bit
words.
1 The phrase “minimum spanning tree” is a shortened form of the phrase “minimum-weight spanning tree.” There is no point in minimizing the number of edges in T, since all spanning trees have exactly | V| − 1 edges by Theorem B.2 on page 1169.

22 Single-Source Shortest Paths
Suppose that you need to drive from Oceanside, New York, to
Oceanside, California, by the shortest possible route. Your GPS
contains information about the entire road network of the United
States, including the road distance between each pair of adjacent
intersections. How can your GPS determine this shortest route?
One possible way is to enumerate all the routes from Oceanside, New
York, to Oceanside, California, add up the distances on each route, and
select the shortest. But even disallowing routes that contain cycles, your
GPS would need to examine an enormous number of possibilities, most
of which are simply not worth considering. For example, a route that
passes through Miami, Florida, is a poor choice, because Miami is
several hundred miles out of the way.
This chapter and Chapter 23 show how to solve such problems efficiently. The input to a shortest-paths problem is a weighted, directed
graph G = ( V, E), with a weight function w : E → ℝ mapping edges to real-valued weights. The weight w( p) of path p = 〈 v 0, v 1, … , vk〉 is the sum of the weights of its constituent edges:
w( p) = w( v 0, v 1) + w( v 1, v 2) + ⋯ + w( vk−1, vk).
We define the shortest-path weight δ( u, v) from u to v by
δ( u, v) = min { w( p) : p is a path from u to v} if there is a path from u to v, and δ( u, v) = ∞ otherwise.
A shortest path from vertex u to vertex v is then defined as any path p with weight w( p) = δ( u, v).
In the example of going from Oceanside, New York, to Oceanside,
California, your GPS models the road network as a graph: vertices
represent intersections, edges represent road segments between
intersections, and edge weights represent road distances. The goal is to
find a shortest path from a given intersection in Oceanside, New York
(say, Brower Avenue and Skillman Avenue) to a given intersection in
Oceanside, California (say, Topeka Street and South Horne Street).
Edge weights can represent metrics other than distances, such as
time, cost, penalties, loss, or any other quantity that accumulates
linearly along a path and that you want to minimize.
The breadth-first-search algorithm from Section 20.2 is a shortest-paths algorithm that works on unweighted graphs, that is, graphs in
which each edge has unit weight. Because many of the concepts from
breadth-first search arise in the study of shortest paths in weighted
graphs, you might want to review Section 20.2 before proceeding.
Variants
This chapter focuses on the single-source shortest-paths problem: given a
graph G = ( V, E), find a shortest path from a given source vertex s ∈ V
to every vertex v ∈ V. The algorithm for the single-source problem can
solve many other problems, including the following variants.
Single-destination shortest-paths problem: Find a shortest path to a
given destination vertex t from each vertex v. By reversing the direction of each edge in the graph, you can reduce this problem to a single-source problem.
Single-pair shortest-path problem: Find a shortest path from u to v for given vertices u and v. If you solve the single-source problem with source vertex u, you solve this problem also. Moreover, all known
algorithms for this problem have the same worst-case asymptotic
running time as the best single-source algorithms.
All-pairs shortest-paths problem: Find a shortest path from u to v for every pair of vertices u and v. Although you can solve this problem by
running a single-source algorithm once from each vertex, you often
can solve it faster. Additionally, its structure is interesting in its own
right. Chapter 23 addresses the all-pairs problem in detail.
Optimal substructure of a shortest path
Shortest-paths algorithms typically rely on the property that a shortest
path between two vertices contains other shortest paths within it. (The
Edmonds-Karp maximum-flow algorithm in Chapter 24 also relies on this property.) Recall that optimal substructure is one of the key
indicators that dynamic programming (Chapter 14) and the greedy method (Chapter 15) might apply. Dijkstra’s algorithm, which we shall see in Section 22.3, is a greedy algorithm, and the Floyd-Warshall algorithm, which finds a shortest path between every pair of vertices
(see Section 23.2), is a dynamic-programming algorithm. The following lemma states the optimal-substructure property of shortest paths more
precisely.
Lemma 22.1 (Subpaths of shortest paths are shortest paths)
Given a weighted, directed graph G = ( V, E) with weight function w : E
→ ℝ, let p = 〈 v 0, v 1, … , vk〉 be a shortest path from vertex v 0 to vertex vk and, for any i and j such that 0 ≤ i ≤ j ≤ k, let pij = 〈 vi, vi+1, … , vj〉 be the subpath of p from vertex vi to vertex vj. Then, pij is a shortest path from vi to vj.
Proof Decompose path p into subpaths p 0 i (from v 0 to vi), pij (from vi to vj), and pjk (from vj to vk), so that w( p) = w( p 0 i) + w( pij) + w( pjk). Now, assume that there is a path p′ ij from vi to vj with weight w( p′ ij) < w( pij). Then splicing p′ ij in place of pij yields a path from v 0 to vk whose weight w( p 0 i) + w( p′ ij) + w( pjk) is less than w( p), which contradicts the assumption that p is a shortest path from v 0 to vk.
▪
Negative-weight edges
Some instances of the single-source shortest-paths problem may include edges whose weights are negative. If the graph G = ( V, E) contains no negative-weight cycles reachable from the source s, then for all v ∈ V, the shortest-path weight δ( s, v) remains well defined, even if it has a negative value. If the graph contains a negative-weight cycle reachable
from s, however, shortest-path weights are not well defined. No path
from s to a vertex on the cycle can be a shortest path—you can always
find a path with lower weight by following the proposed “shortest” path
and then traversing the negative-weight cycle. If there is a negative-
weight cycle on some path from s to v, we define δ( s, v) = −∞.
Figure 22.1 illustrates the effect of negative weights and negative-weight cycles on shortest-path weights. Because there is only one path
from s to a (the path 〈 s, a〉), we have δ( s, a) = w( s, a) = 3. Similarly, there is only one path from s to b, and so δ( s, b) = w( s, a) + w( a, b) = 3 + (−4)
= −1. There are infinitely many paths from s to c: 〈 s, c〉, 〈 s, c, d, c〉, 〈 s, c, d, c, d, c〉, and so on. Because the cycle 〈 c, d, c〉 has weight 6 + (−3) = 3
> 0, the shortest path from s to c is 〈 s, c〉, with weight δ( s, c) = w( s, c) =
5, and the shortest path from s to d is 〈 s, c, d〉, with weight δ( s, d) = w( s, c) + w( c, d) = 11. Analogously, there are infinitely many paths from s to e: 〈 s, e〉, 〈 s, e, f, e〉, 〈 s, e, f, e, f, e〉, and so on. Because the cycle 〈 e, f, e〉
has weight 3 + (−6) = −3 < 0, however, there is no shortest path from s
to e. By traversing the negative-weight cycle 〈 e, f, e〉 arbitrarily many times, you can find paths from s to e with arbitrarily large negative weights, and so δ( s, e) = −∞. Similarly, δ( s, f) = −∞. Because g is reachable from f, you can also find paths with arbitrarily large negative
weights from s to g, and so δ( s, g) = −∞. Vertices h, i, and j also form a negative-weight cycle. They are not reachable from s, however, and so
δ( s, h) = δ( s, i) = δ( s, j) = ∞.
Figure 22.1 Negative edge weights in a directed graph. The shortest-path weight from source s appears within each vertex. Because vertices e and f form a negative-weight cycle reachable from s, they have shortest-path weights of −∞. Because vertex g is reachable from a vertex whose shortest-path weight is −∞, it, too, has a shortest-path weight of −∞. Vertices such as h, i, and j are not reachable from s, and so their shortest-path weights are ∞, even though they lie on a negative-weight cycle.
Some shortest-paths algorithms, such as Dijkstra’s algorithm,
assume that all edge weights in the input graph are nonnegative, as in a
road network. Others, such as the Bellman-Ford algorithm, allow
negative-weight edges in the input graph and produce a correct answer
as long as no negative-weight cycles are reachable from the source.
Typically, if there is such a negative-weight cycle, the algorithm can
detect and report its existence.
Cycles
Can a shortest path contain a cycle? As we have just seen, it cannot
contain a negative-weight cycle. Nor can it contain a positive-weight
cycle, since removing the cycle from the path produces a path with the
same source and destination vertices and a lower path weight. That is, if
p = 〈 v 0, v 1, … , vk〉 is a path and c = 〈 vi, vi+1, … , vj〉 is a positive-weight cycle on this path (so that vi = vj and w( c) > 0), then the path p′
= 〈 v 0, v 1, … , vi, vj+1, vj+2, … , vk〉 has weight w( p′) = w( p) − w( c) < w( p), and so p cannot be a shortest path from v 0 to vk.
That leaves only 0-weight cycles. You can remove a 0-weight cycle
from any path to produce another path whose weight is the same. Thus,
if there is a shortest path from a source vertex s to a destination vertex v that contains a 0-weight cycle, then there is another shortest path from s
to v without this cycle. As long as a shortest path has 0-weight cycles, you can repeatedly remove these cycles from the path until you have a
shortest path that is cycle-free. Therefore, without loss of generality,
assume that shortest paths have no cycles, that is, they are simple paths.
Since any acyclic path in a graph G = ( V, E) contains at most | V| distinct vertices, it also contains at most | V| − 1 edges. Assume, therefore, that
any shortest path contains at most | V| − 1 edges.
Representing shortest paths
It is usually not enough to compute only shortest-path weights. Most
applications of shortest paths need to know the vertices on shortest
paths as well. For example, if your GPS told you the distance to your
destination but not how to get there, it would not be terribly useful. We
represent shortest paths similarly to how we represented breadth-first
trees in Section 20.2. Given a graph G = ( V, E), maintain for each vertex v ∈ V a predecessor v.π that is either another vertex or NIL. The shortest-paths algorithms in this chapter set the π attributes so that the
chain of predecessors originating at a vertex v runs backward along a
shortest path from s to v. Thus, given a vertex v for which v.π ≠ NIL, the procedure PRINT-PATH( G, s, v) from Section 20.2 prints a shortest path from s to v.
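Following the predecessor chain backward can be sketched in a few lines of Python, in the spirit of PRINT-PATH. The dictionary-based `pi` representation below is our own illustration, not the book's interface:

```python
def print_path(pi, s, v):
    """Return the list of vertices on the recorded path from s to v by
    following the predecessor attributes pi[v] backward, mirroring
    PRINT-PATH from Section 20.2. Returns None if no path from s to v
    has been recorded (pi chain never reaches s)."""
    path = []
    while v is not None and v != s:
        path.append(v)
        v = pi.get(v)       # step backward along the predecessor chain
    if v != s:
        return None         # walked off the chain without reaching s
    path.append(s)
    path.reverse()          # chain was collected backward, so reverse it
    return path
```

Because each vertex stores only a single predecessor, recovering a path takes time proportional to its length, with no extra storage per vertex beyond π.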
In the midst of executing a shortest-paths algorithm, however, the π
values might not indicate shortest paths. The predecessor subgraph G π =
( V π, E π) induced by the π values is defined the same for single-source shortest paths as for breadth-first search in equations (20.2) and (20.3)
on page 561:
V π = { v ∈ V : v.π ≠ NIL} ∪ { s},
E π = {( v.π, v) ∈ E : v ∈ V π − { s}}.
We’ll prove that the π values produced by the algorithms in this
chapter have the property that at termination G π is a “shortest-paths
tree”—informally, a rooted tree containing a shortest path from the
source s to every vertex that is reachable from s. A shortest-paths tree is like the breadth-first tree from Section 20.2, but it contains shortest
paths from the source defined in terms of edge weights instead of
numbers of edges. To be precise, let G = ( V, E) be a weighted, directed graph with weight function w : E → ℝ, and assume that G contains no negative-weight cycles reachable from the source vertex s ∈ V, so that
shortest paths are well defined. A shortest-paths tree rooted at s is a directed subgraph G′ = ( V′, E′), where V′ ⊆ V and E′ ⊆ E, such that
1. V′ is the set of vertices reachable from s in G,
2. G′ forms a rooted tree with root s, and
3. for all v ∈ V′, the unique simple path from s to v in G′ is a shortest path from s to v in G.
Figure 22.2 (a) A weighted, directed graph with shortest-path weights from source s. (b) The blue edges form a shortest-paths tree rooted at the source s. (c) Another shortest-paths tree with the same root.
Shortest paths are not necessarily unique, and neither are shortest-
paths trees. For example, Figure 22.2 shows a weighted, directed graph and two shortest-paths trees with the same root.
Relaxation
The algorithms in this chapter use the technique of relaxation. For each
vertex v ∈ V, the single-source shortest-paths algorithms maintain an
attribute v.d, which is an upper bound on the weight of a shortest path
from source s to v. We call v.d a shortest-path estimate. To initialize the shortest-path estimates and predecessors, call the Θ( V)-time procedure
INITIALIZE-SINGLE-SOURCE. After initialization, we have v.π =
NIL for all v ∈ V, s.d = 0, and v.d = ∞ for v ∈ V − { s}.
INITIALIZE-SINGLE-SOURCE( G, s)
1 for each vertex v ∈ G.V
2     v.d = ∞
3     v.π = NIL
4 s.d = 0
The process of relaxing an edge ( u, v) consists of testing whether going through vertex u improves the shortest path to vertex v found so
far and, if so, updating v.d and v.π. A relaxation step might decrease the value of the shortest-path estimate v.d and update v’s predecessor attribute v.π. The RELAX procedure on the following page performs a
relaxation step on edge ( u, v) in O(1) time. Figure 22.3 shows two examples of relaxing an edge, one in which a shortest-path estimate
decreases and one in which no estimate changes.
Figure 22.3 Relaxing an edge ( u, v) with weight w( u, v) = 2. The shortest-path estimate of each vertex appears within the vertex. (a) Because v.d > u.d + w( u, v) prior to relaxation, the value of v.d decreases. (b) Since we have v.d ≤ u.d + w( u, v) before relaxing the edge, the relaxation step leaves v.d unchanged.
RELAX( u, v, w)
1 if v.d > u.d + w( u, v)
2     v.d = u.d + w( u, v)
3     v.π = u
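As a sketch, the two procedures might look like this in Python, with d and π kept in dictionaries (an illustrative representation of our own, not the book's):

```python
import math

def initialize_single_source(vertices, s):
    """Set d[v] = inf and pi[v] = None (NIL) for every vertex,
    then set the source estimate d[s] = 0."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    return d, pi

def relax(u, v, w, d, pi):
    """Relax edge (u, v): if going through u improves the current
    estimate for v, lower d[v] and record u as v's predecessor."""
    if d[v] > d[u] + w[(u, v)]:
        d[v] = d[u] + w[(u, v)]
        pi[v] = u
```

Note that `math.inf` makes the comparison in `relax` work unmodified for unreached vertices, since `inf > d[u] + w` holds whenever `d[u]` is finite.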
Each algorithm in this chapter calls INITIALIZE-SINGLE-
SOURCE and then repeatedly relaxes edges. 1 Moreover, relaxation is the only means by which shortest-path estimates and predecessors
change. The algorithms in this chapter differ in how many times they relax each edge and the order in which they relax edges. Dijkstra’s
algorithm and the shortest-paths algorithm for directed acyclic graphs
relax each edge exactly once. The Bellman-Ford algorithm relaxes each
edge | V| − 1 times.
Properties of shortest paths and relaxation
To prove the algorithms in this chapter correct, we’ll appeal to several
properties of shortest paths and relaxation. We state these properties
here, and Section 22.5 proves them formally. For your reference, each property stated here includes the appropriate lemma or corollary
number from Section 22.5. The latter five of these properties, which refer to shortest-path estimates or the predecessor subgraph, implicitly
assume that the graph is initialized with a call to INITIALIZE-
SINGLE-SOURCE( G, s) and that the only way that shortest-path
estimates and the predecessor subgraph change are by some sequence of
relaxation steps.
Triangle inequality (Lemma 22.10)
For any edge ( u, v) ∈ E, we have δ( s, v) ≤ δ( s, u) + w( u, v).
Upper-bound property (Lemma 22.11)
We always have v.d ≥ δ( s, v) for all vertices v ∈ V, and once v.d achieves the value δ( s, v), it never changes.
No-path property (Corollary 22.12)
If there is no path from s to v, then we always have v.d = δ( s, v) = ∞.
Convergence property (Lemma 22.14)
If s ⇝ u → v is a shortest path in G for some u, v ∈ V, and if u.d =
δ( s, u) at any time prior to relaxing edge ( u, v), then v.d = δ( s, v) at all times afterward.
Path-relaxation property (Lemma 22.15)
If p = 〈 v 0, v 1, … , vk〉 is a shortest path from s = v 0 to vk, and the edges of p are relaxed in the order ( v 0, v 1), ( v 1, v 2), … , ( vk−1, vk), then vk. d = δ( s, vk). This property holds regardless of any other
relaxation steps that occur, even if they are intermixed with relaxations of the edges of p.
Predecessor-subgraph property (Lemma 22.17)
Once v.d = δ( s, v) for all v ∈ V, the predecessor subgraph is a shortest-paths tree rooted at s.
Chapter outline
Section 22.1 presents the Bellman-Ford algorithm, which solves the single-source shortest-paths problem in the general case in which edges
can have negative weight. The Bellman-Ford algorithm is remarkably
simple, and it has the further benefit of detecting whether a negative-
weight cycle is reachable from the source. Section 22.2 gives a linear-time algorithm for computing shortest paths from a single source in a
directed acyclic graph. Section 22.3 covers Dijkstra’s algorithm, which has a lower running time than the Bellman-Ford algorithm but requires
the edge weights to be nonnegative. Section 22.4 shows how to use the Bellman-Ford algorithm to solve a special case of linear programming.
Finally, Section 22.5 proves the properties of shortest paths and relaxation stated above.
This chapter does arithmetic with infinities, and so we need some
conventions for when ∞ or −∞ appears in an arithmetic expression. We
assume that for any real number a ≠ −∞, we have a + ∞ = ∞ + a = ∞.
Also, to make our proofs hold in the presence of negative-weight cycles,
we assume that for any real number a ≠ ∞, we have a + (−∞) = (−∞) + a
= −∞.
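These conventions happen to match IEEE floating-point arithmetic, which Python's `math.inf` follows, with one caveat worth knowing: ∞ + (−∞) is left undefined (NaN) rather than fixed by convention. A quick check:

```python
import math

# IEEE floating-point infinities obey the same conventions the text
# adopts for a + inf and a + (-inf) with finite a.
assert 5 + math.inf == math.inf            # a + inf = inf for a != -inf
assert 5 + (-math.inf) == -math.inf        # a + (-inf) = -inf for a != inf
assert math.isnan(math.inf + (-math.inf))  # inf + (-inf) is NaN, not a convention
```

So an implementation that might add ∞ and −∞ directly (rather than guarding the case) should check for that combination explicitly.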
All algorithms in this chapter assume that the directed graph G is
stored in the adjacency-list representation. Additionally, stored with
each edge is its weight, so that as each algorithm traverses an adjacency
list, it can find edge weights in O(1) time per edge.
22.1 The Bellman-Ford algorithm
The Bellman-Ford algorithm solves the single-source shortest-paths
problem in the general case in which edge weights may be negative.
Given a weighted, directed graph G = ( V, E) with source vertex s and weight function w : E → ℝ, the Bellman-Ford algorithm returns a boolean value indicating whether there is a negative-weight cycle that is
reachable from the source. If there is such a cycle, the algorithm
indicates that no solution exists. If there is no such cycle, the algorithm
produces the shortest paths and their weights.
The procedure BELLMAN-FORD relaxes edges, progressively
decreasing an estimate v.d on the weight of a shortest path from the source s to each vertex v ∈ V until it achieves the actual shortest-path weight δ( s, v). The algorithm returns TRUE if and only if the graph contains no negative-weight cycles that are reachable from the source.
BELLMAN-FORD( G, w, s)
1 INITIALIZE-SINGLE-SOURCE( G, s)
2 for i = 1 to | G.V| − 1
3     for each edge ( u, v) ∈ G.E
4         RELAX( u, v, w)
5 for each edge ( u, v) ∈ G.E
6     if v.d > u.d + w( u, v)
7         return FALSE
8 return TRUE
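A direct Python transcription of the procedure, with the graph given as an edge list and a weight dictionary (representation choices of our own, for illustration):

```python
import math

def bellman_ford(vertices, edges, w, s):
    """BELLMAN-FORD on an edge list. Returns (ok, d, pi), where ok is
    False iff a negative-weight cycle is reachable from s, and d and pi
    hold the shortest-path estimates and predecessors."""
    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for _ in range(len(vertices) - 1):       # |V| - 1 passes (lines 2-4)
        for u, v in edges:
            if d[v] > d[u] + w[(u, v)]:      # relax edge (u, v)
                d[v] = d[u] + w[(u, v)]
                pi[v] = u
    for u, v in edges:                       # negative-cycle check (lines 5-7)
        if d[v] > d[u] + w[(u, v)]:
            return False, d, pi
    return True, d, pi
```

Each pass scans all |E| edges, giving the O(VE)-flavored running time discussed below; an easy practical optimization (the subject of Exercise 22.1-3) is to stop early once a full pass changes no estimate.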
Figure 22.4 shows the execution of the Bellman-Ford algorithm on a
graph with 5 vertices. After initializing the d and π values of all vertices
in line 1, the algorithm makes | V| − 1 passes over the edges of the graph.
Each pass is one iteration of the for loop of lines 2–4 and consists of
relaxing each edge of the graph once. Figures 22.4(b)–(e) show the state of the algorithm after each of the four passes over the edges. After
making | V| − 1 passes, lines 5–8 check for a negative-weight cycle and
return the appropriate boolean value. (We’ll see a little later why this
check works.)
Figure 22.4 The execution of the Bellman-Ford algorithm. The source is vertex s. The d values appear within the vertices, and blue edges indicate predecessor values: if edge ( u, v) is blue, then v.π = u. In this particular example, each pass relaxes the edges in the order ( t, x), ( t, y), ( t, z), ( x, t), ( y, x), ( y, z), ( z, x), ( z, s), ( s, t), ( s, y). (a) The situation just before the first pass over the edges.
(b)–(e) The situation after each successive pass over the edges. Vertices whose shortest-path estimates and predecessors have changed due to a pass are highlighted in orange. The d and π
values in part (e) are the final values. The Bellman-Ford algorithm returns TRUE in this example.
The Bellman-Ford algorithm runs in O( V² + VE) time when the graph is represented by adjacency lists, since the initialization in line 1 takes Θ( V) time, each of the | V| − 1 passes over the edges in lines 2–4 takes Θ( V + E) time (examining | V| adjacency lists to find the | E| edges), and the for loop of lines 5–7 takes O( V + E) time. Fewer than | V| − 1 passes over the edges sometimes suffice (see Exercise 22.1-3), which is why we say O( V² + VE) time, rather than Θ( V² + VE) time. In the frequent case where | E| = Ω( V), we can express this running time as O( VE). Exercise 22.1-5 asks you to make the Bellman-Ford algorithm run in O( VE) time even when | E| = o( V).
To prove the correctness of the Bellman-Ford algorithm, we start by
showing that if there are no negative-weight cycles, the algorithm
computes correct shortest-path weights for all vertices reachable from
the source.
Lemma 22.2
Let G = ( V, E) be a weighted, directed graph with source vertex s and weight function w : E → ℝ, and assume that G contains no negative-weight cycles that are reachable from s. Then, after the | V| − 1 iterations of the for loop of lines 2–4 of BELLMAN-FORD, v.d = δ( s, v) for all vertices v that are reachable from s.
Proof We prove the lemma by appealing to the path-relaxation
property. Consider any vertex v that is reachable from s, and let p = 〈 v 0, v 1, … , vk〉, where v 0 = s and vk = v, be any shortest path from s to v.
Because shortest paths are simple, p has at most | V| − 1 edges, and so k
≤ | V| − 1. Each of the | V| − 1 iterations of the for loop of lines 2–4
relaxes all | E| edges. Among the edges relaxed in the ith iteration, for i =
1, 2, … , k, is ( vi−1, vi). By the path-relaxation property, therefore, v.d =
vk.d = δ( s, vk) = δ( s, v).
▪
Corollary 22.3
Let G = ( V, E) be a weighted, directed graph with source vertex s and weight function w : E → ℝ. Then, for each vertex v ∈ V, there is a path from s to v if and only if BELLMAN-FORD terminates with v.d < ∞
when it is run on G.
Proof The proof is left as Exercise 22.1-2.
▪
Theorem 22.4 (Correctness of the Bellman-Ford algorithm)
Let BELLMAN-FORD be run on a weighted, directed graph G = ( V,
E) with source vertex s and weight function w : E → ℝ. If G contains no negative-weight cycles that are reachable from s, then the algorithm
returns TRUE, v.d = δ( s, v) for all vertices v ∈ V, and the predecessor subgraph G π is a shortest-paths tree rooted at s. If G does contain a negative-weight cycle reachable from s, then the algorithm returns
FALSE.




Proof Suppose that graph G contains no negative-weight cycles that are reachable from the source s. We first prove the claim that at
termination, v.d = δ( s, v) for all vertices v ∈ V. If vertex v is reachable from s, then Lemma 22.2 proves this claim. If v is not reachable from s, then the claim follows from the no-path property. Thus, the claim is
proven. The predecessor-subgraph property, along with the claim,
implies that G π is a shortest-paths tree. Now we use the claim to show
that BELLMAN-FORD returns TRUE. At termination, for all edges
( u, v) ∈ E we have
v.d = δ( s, v)
≤ δ( s, u) + w( u, v) (by the triangle inequality)
= u.d + w( u, v),
and so none of the tests in line 6 causes BELLMAN-FORD to return
FALSE. Therefore, it returns TRUE.
Now, suppose that graph G contains a negative-weight cycle
reachable from the source s. Let this cycle be c = 〈 v 0, v 1, … , vk〉, where v 0 = vk, in which case we have
w( v 0, v 1) + w( v 1, v 2) + ⋯ + w( vk−1, vk) < 0.    (22.1)
Assume for the purpose of contradiction that the Bellman-Ford algorithm returns TRUE. Thus, vi.d ≤ vi−1. d + w( vi−1, vi) for i = 1, 2, … , k. Summing the inequalities around cycle c gives
v 1. d + v 2. d + ⋯ + vk.d ≤ ( v 0. d + v 1. d + ⋯ + vk−1. d) + ( w( v 0, v 1) + w( v 1, v 2) + ⋯ + w( vk−1, vk)).
Since v 0 = vk, each vertex in c appears exactly once in each of the summations v 1. d + v 2. d + ⋯ + vk.d and v 0. d + v 1. d + ⋯ + vk−1. d, and so the two summations are equal. Moreover, by Corollary 22.3, vi.d is finite for i = 1, 2, … , k. Thus, canceling the equal summations leaves
0 ≤ w( v 0, v 1) + w( v 1, v 2) + ⋯ + w( vk−1, vk),
which contradicts inequality (22.1). We conclude that the Bellman-Ford algorithm returns TRUE if graph G contains no negative-weight cycles reachable from the source, and FALSE otherwise.
▪
Exercises
22.1-1
Run the Bellman-Ford algorithm on the directed graph of Figure 22.4, using vertex z as the source. In each pass, relax edges in the same order
as in the figure, and show the d and π values after each pass. Now, change the weight of edge ( z, x) to 4 and run the algorithm again, using s as the source.
22.1-2
Prove Corollary 22.3.
22.1-3
Given a weighted, directed graph G = ( V, E) with no negative-weight cycles, let m be the maximum over all vertices v ∈ V of the minimum number of edges in a shortest path from the source s to v. (Here, the shortest path is by weight, not the number of edges.) Suggest a simple
change to the Bellman-Ford algorithm that allows it to terminate in m +
1 passes, even if m is not known in advance.
22.1-4
Modify the Bellman-Ford algorithm so that it sets v.d to −∞ for all vertices v for which there is a negative-weight cycle on some path from
the source to v.
22.1-5
Suppose that the graph given as input to the Bellman-Ford algorithm is
represented with a list of | E| edges, where each edge indicates the
vertices it leaves and enters, along with its weight. Argue that the Bellman-Ford algorithm runs in O( VE) time without the constraint that
| E| = Ω( V). Modify the Bellman-Ford algorithm so that it runs in O( VE) time in all cases when the input graph is represented with adjacency
lists.
22.1-6
Let G = ( V, E) be a weighted, directed graph with weight function w : E
→ ℝ. Give an O( VE)-time algorithm to find, for all vertices v ∈ V, the value δ*( v) = min {δ( u, v) : u ∈ V}.
22.1-7
Suppose that a weighted, directed graph G = ( V, E) contains a negative-weight cycle. Give an efficient algorithm to list the vertices of one such
cycle. Prove that your algorithm is correct.
22.2 Single-source shortest paths in directed acyclic graphs
In this section, we introduce one further restriction on weighted,
directed graphs: they are acyclic. That is, we are concerned with
weighted dags. Shortest paths are always well defined in a dag, since
even if there are negative-weight edges, no negative-weight cycles can
exist. We’ll see that if the edges of a weighted dag G = ( V, E) are relaxed according to a topological sort of its vertices, it takes only Θ( V + E) time to compute shortest paths from a single source.
The algorithm starts by topologically sorting the dag (see Section
20.4) to impose a linear ordering on the vertices. If the dag contains a
path from vertex u to vertex v, then u precedes v in the topological sort.
The DAG-SHORTEST-PATHS procedure makes just one pass over the
vertices in the topologically sorted order. As it processes each vertex, it
relaxes each edge that leaves the vertex. Figure 22.5 shows the execution of this algorithm.
DAG-SHORTEST-PATHS( G, w, s)
1 topologically sort the vertices of G
2 INITIALIZE-SINGLE-SOURCE( G, s)
3 for each vertex u ∈ G.V, taken in topologically sorted order
4     for each vertex v in G.Adj[ u]
5         RELAX( u, v, w)
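A Python sketch of the procedure, with a depth-first topological sort inlined (the adjacency-dictionary representation and names are illustrative, not the book's):

```python
import math

def dag_shortest_paths(vertices, adj, w, s):
    """DAG-SHORTEST-PATHS: topologically sort the dag, then relax each
    vertex's outgoing edges once, in sorted order. adj maps each vertex
    u to a list of its successors; w maps (u, v) pairs to weights."""
    order, visited = [], set()

    def visit(u):                 # depth-first topological sort
        visited.add(u)
        for v in adj.get(u, []):
            if v not in visited:
                visit(v)
        order.append(u)           # u finishes after all its successors

    for u in vertices:
        if u not in visited:
            visit(u)
    order.reverse()               # vertices now in topologically sorted order

    d = {v: math.inf for v in vertices}
    pi = {v: None for v in vertices}
    d[s] = 0
    for u in order:
        for v in adj.get(u, []):
            if d[v] > d[u] + w[(u, v)]:   # relax (u, v)
                d[v] = d[u] + w[(u, v)]
                pi[v] = u
    return d, pi
```

Because every edge on any shortest path is relaxed in topological order, the path-relaxation property applies after the single pass, matching the Θ(V + E) analysis above.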
Let’s analyze the running time of this algorithm. As shown in
Section 20.4, the topological sort of line 1 takes Θ( V + E) time. The call of INITIALIZE-SINGLE-SOURCE in line 2 takes Θ( V) time. The for
loop of lines 3–5 makes one iteration per vertex. Altogether, the for loop
of lines 4–5 relaxes each edge exactly once. (We have used an aggregate
analysis here.) Because each iteration of the inner for loop takes Θ(1)
time, the total running time is Θ( V + E), which is linear in the size of an adjacency-list representation of the graph.
The following theorem shows that the DAG-SHORTEST-PATHS
procedure correctly computes the shortest paths.
Theorem 22.5
If a weighted, directed graph G = ( V, E) has source vertex s and no cycles, then at the termination of the DAG-SHORTEST-PATHS
procedure, v.d = δ( s, v) for all vertices v ∈ V, and the predecessor subgraph G π is a shortest-paths tree.
Proof We first show that v.d = δ( s, v) for all vertices v ∈ V at termination. If v is not reachable from s, then v.d = δ( s, v) = ∞ by the no-path property. Now, suppose that v is reachable from s, so that there is a shortest path p = 〈 v 0, v 1, … , vk〉, where v 0 = s and vk = v. Because DAG-SHORTEST-PATHS processes the vertices in topologically
sorted order, it relaxes the edges on p in the order ( v 0, v 1), ( v 1, v 2), … , ( vk−1, vk). The path-relaxation property implies that vi.d = δ( s, vi) at termination for i = 0, 1, … , k. Finally, by the predecessor-subgraph property, G π is a shortest-paths tree.
▪
A useful application of this algorithm arises in determining critical
paths in PERT chart2 analysis. A job consists of several tasks. Each task
takes a certain amount of time, and some tasks must be completed before others can be started. For example, if the job is to build a house,
then the foundation must be completed before starting to frame the
exterior walls, which must be completed before starting on the roof.
Some tasks require more than one other task to be completed before
they can be started: before the drywall can be installed over the wall
framing, both the electrical system and plumbing must be installed. A
dag models the tasks and dependencies. Edges represent tasks, with the
weight of an edge indicating the time required to perform the task.
Vertices represent “milestones,” which are achieved when all the tasks
represented by the edges entering the vertex have been completed. If
edge ( u, v) enters vertex v and edge ( v, x) leaves v, then task ( u, v) must be completed before task ( v, x) is started. A path through this dag represents a sequence of tasks that must be performed in a particular
order. A critical path is a longest path through the dag, corresponding to the longest time to perform any sequence of tasks. Thus, the weight of a
critical path provides a lower bound on the total time to perform all the
tasks, even if as many tasks as possible are performed simultaneously.
You can find a critical path by either
  • negating the edge weights and running DAG-SHORTEST-PATHS, or
  • running DAG-SHORTEST-PATHS, but replacing “∞” by “−∞” in line 2 of INITIALIZE-SINGLE-SOURCE and “>” by “<” in the RELAX procedure.
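The second option can be sketched directly: initialize every estimate to −∞ instead of ∞ and flip the comparison in RELAX. For brevity this sketch assumes the vertices are already numbered in topologically sorted order 0, 1, …, n − 1 (a simplifying assumption; otherwise a topological sort comes first), and the example task graph is hypothetical:

```python
from math import inf

def dag_longest_path(n, adj, s):
    """Longest-path (critical-path) weights from s in a dag whose
    vertices are assumed to be numbered in topologically sorted order.
    adj[u] is a list of (v, w) pairs for edges (u, v) of weight w."""
    d = [-inf] * n                        # "-inf" in place of "inf"
    d[s] = 0                              # the starting milestone
    for u in range(n):                    # topologically sorted order
        for v, w in adj[u]:
            if d[u] + w > d[v]:           # RELAX with ">" in place of "<"
                d[v] = d[u] + w
    return d

# A hypothetical job with four milestones: two task sequences lead to
# milestone 3, taking 3 + 4 = 7 and 2 + 6 = 8 time units respectively.
tasks = [[(1, 3), (2, 2)], [(3, 4)], [(3, 6)], []]
```

Here dag_longest_path(4, tasks, 0) reports 8 as the critical-path weight to milestone 3: the job cannot finish sooner, no matter how many tasks run in parallel.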
Figure 22.5 The execution of the algorithm for shortest paths in a directed acyclic graph. The vertices are topologically sorted from left to right. The source vertex is s. The d values appear within the vertices, and blue edges indicate the π values. (a) The situation before the first iteration of the for loop of lines 3–5. (b)–(g) The situation after each iteration of the for loop of lines 3–5. Blue vertices have had their outgoing edges relaxed. The vertex highlighted in orange was used as u in that iteration. Each edge highlighted in orange caused a d value to change when it was relaxed in that iteration. The values shown in part (g) are the final values.
Exercises
22.2-1
Show the result of running DAG-SHORTEST-PATHS on the directed
acyclic graph of Figure 22.5, using vertex r as the source.
22.2-2
Suppose that you change line 3 of DAG-SHORTEST-PATHS to read
3 for the first | V| − 1 vertices, taken in topologically sorted order
Show that the procedure remains correct.
22.2-3
An alternative way to represent a PERT chart looks more like the dag of
Figure 20.7 on page 574. Vertices represent tasks and edges represent sequencing constraints, that is, edge ( u, v) indicates that task u must be performed before task v. Vertices, not edges, have weights. Modify the
DAG-SHORTEST-PATHS procedure so that it finds a longest path in
a directed acyclic graph with weighted vertices in linear time.
★ 22.2-4
Give an efficient algorithm to count the total number of paths in a
directed acyclic graph. The count should include all paths between all
pairs of vertices and all paths with 0 edges. Analyze your algorithm.
22.3 Dijkstra’s algorithm
Dijkstra’s algorithm solves the single-source shortest-paths problem on
a weighted, directed graph G = ( V, E), but it requires nonnegative weights on all edges: w( u, v) ≥ 0 for each edge ( u, v) ∈ E. As we shall see, with a good implementation, the running time of Dijkstra’s algorithm is
lower than that of the Bellman-Ford algorithm.
You can think of Dijkstra’s algorithm as generalizing breadth-first
search to weighted graphs. A wave emanates from the source, and the
first time that a wave arrives at a vertex, a new wave emanates from that
vertex. Whereas breadth-first search operates as if each wave takes unit
time to traverse an edge, in a weighted graph, the time for a wave to
traverse an edge is given by the edge’s weight. Because a shortest path in
a weighted graph might not have the fewest edges, a simple, first-in,
first-out queue won’t suffice for choosing the next vertex from which to
send out a wave.
Instead, Dijkstra’s algorithm maintains a set S of vertices whose final
shortest-path weights from the source s have already been determined.
The algorithm repeatedly selects the vertex u ∈ V − S with the minimum shortest-path estimate, adds u into S, and relaxes all edges leaving u.
The procedure DIJKSTRA replaces the first-in, first-out queue of
breadth-first search by a min-priority queue Q of vertices, keyed by their d values.
DIJKSTRA(G, w, s)
1   INITIALIZE-SINGLE-SOURCE(G, s)
2   S = Ø
3   Q = Ø
4   for each vertex u ∈ G.V
5       INSERT(Q, u)
6   while Q ≠ Ø
7       u = EXTRACT-MIN(Q)
8       S = S ∪ {u}
9       for each vertex v in G.Adj[u]
10          RELAX(u, v, w)
11          if the call of RELAX decreased v.d
12              DECREASE-KEY(Q, v, v.d)
Dijkstra’s algorithm relaxes edges as shown in Figure 22.6. Line 1
initializes the d and π values in the usual way, and line 2 initializes the
set S to the empty set. The algorithm maintains the invariant that Q =
V − S at the start of each iteration of the while loop of lines 6–12. Lines 3–5 initialize the min-priority queue Q to contain all the vertices in V.
Since S = Ø at that time, the invariant is true upon first reaching line 6.
Each time through the while loop of lines 6–12, line 7 extracts a vertex u
from Q = V − S and line 8 adds it to set S, thereby maintaining the invariant. (The first time through this loop, u = s.) Vertex u, therefore, has the smallest shortest-path estimate of any vertex in V − S. Then, lines 9–12 relax each edge ( u, v) leaving u, thus updating the estimate v.d and the predecessor v.π if the shortest path to v found so far improves by going through u. Whenever a relaxation step changes the d and π
values, the call to DECREASE-KEY in line 12 updates the min-priority
queue. The algorithm never inserts vertices into Q after the for loop of
lines 4–5, and each vertex is extracted from Q and added to S exactly
once, so that the while loop of lines 6–12 iterates exactly | V| times.
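In practice the min-priority queue is often a binary heap. Python's heapq module provides no DECREASE-KEY operation, so the sketch below replaces lines 11–12 with the standard lazy-deletion workaround: whenever a d value drops, push a fresh entry, and skip stale entries upon extraction. This is a common variant with the same O(E lg V) bound, not a literal transcription of DIJKSTRA:

```python
import heapq
from math import inf

def dijkstra(n, adj, s):
    """Dijkstra's algorithm for nonnegative edge weights.
    adj[u] is a list of (v, w) pairs; vertices are labeled 0 .. n-1.
    Returns (d, pi): shortest-path weights and predecessors."""
    d = [inf] * n
    pi = [None] * n
    d[s] = 0
    in_s = [False] * n                   # membership in the set S
    pq = [(0, s)]                        # entries are (estimate, vertex)
    while pq:
        du, u = heapq.heappop(pq)        # EXTRACT-MIN
        if in_s[u]:
            continue                     # stale entry: u is already in S
        in_s[u] = True                   # S = S ∪ {u}
        for v, w in adj[u]:              # relax every edge leaving u
            if du + w < d[v]:
                d[v] = du + w
                pi[v] = u
                heapq.heappush(pq, (d[v], v))   # in lieu of DECREASE-KEY
    return d, pi
```

Each edge pushes at most one entry, so the heap never holds more than O(E) entries and every push and pop costs O(lg V).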
Figure 22.6 The execution of Dijkstra’s algorithm. The source s is the leftmost vertex. The shortest-path estimates appear within the vertices, and blue edges indicate predecessor values.
Blue vertices belong to the set S, and tan vertices are in the min-priority queue Q = V − S. (a) The situation just before the first iteration of the while loop of lines 6–12. (b)–(f) The situation after each successive iteration of the while loop. In each part, the vertex highlighted in orange was chosen as vertex u in line 7, and each edge highlighted in orange caused a d value and a predecessor to change when the edge was relaxed. The d values and predecessors shown in part (f) are the final values.
Because Dijkstra’s algorithm always chooses the “lightest” or
“closest” vertex in V − S to add to set S, you can think of it as using a greedy strategy. Chapter 15 explains greedy strategies in detail, but you need not have read that chapter to understand Dijkstra’s algorithm.
Greedy strategies do not always yield optimal results in general, but as
the following theorem and its corollary show, Dijkstra’s algorithm does
indeed compute shortest paths. The key is to show that u.d = δ( s, u) each time it adds a vertex u to set S.
Figure 22.7 The proof of Theorem 22.6. Vertex u is selected to be added into set S in line 7 of DIJKSTRA. Vertex y is the first vertex on a shortest path from the source s to vertex u that is not in set S, and x ∈ S is y’s predecessor on that shortest path. The subpath from y to u may or may not re-enter set S.
Theorem 22.6 (Correctness of Dijkstra’s algorithm)
Dijkstra’s algorithm, run on a weighted, directed graph G = ( V, E) with nonnegative weight function w and source vertex s, terminates with u.d
= δ( s, u) for all vertices u ∈ V.
Proof We will show that at the start of each iteration of the while loop
of lines 6–12, we have v.d = δ( s, v) for all v ∈ S. The algorithm terminates when S = V, so that v.d = δ( s, v) for all v ∈ V.
The proof is by induction on the number of iterations of the while
loop, which equals | S| at the start of each iteration. There are two bases:
for | S| = 0, so that S = Ø and the claim is trivially true, and for | S| = 1, so that S = { s} and s.d = δ( s, s) = 0.
For the inductive step, the inductive hypothesis is that v.d = δ( s, v) for all v ∈ S. The algorithm extracts vertex u from V − S. Because the algorithm adds u into S, we need to show that u.d = δ( s, u) at that time.
If there is no path from s to u, then we are done, by the no-path property. If there is a path from s to u, then, as Figure 22.7 shows, let y be the first vertex on a shortest path from s to u that is not in S, and let x ∈ S be the predecessor of y on that shortest path. (We could have y =
u or x = s.) Because y appears no later than u on the shortest path and all edge weights are nonnegative, we have δ( s, y) ≤ δ( s, u). Because the call of EXTRACT-MIN in line 7 returned u as having the minimum d
value in V − S, we also have u.d ≤ y.d, and the upper-bound property gives δ( s, u) ≤ u.d.
Since x ∈ S, the inductive hypothesis implies that x.d = δ( s, x).
During the iteration of the while loop that added x into S, edge ( x, y) was relaxed. By the convergence property, y.d received the value of δ( s, y) at that time. Thus, we have
δ( s, y) ≤ δ( s, u) ≤ u.d ≤ y.d and y.d = δ( s, y), so that
δ( s, y) = δ( s, u) = u.d = y.d.
Hence, u.d = δ( s, u), and by the upper-bound property, this value never changes again.
▪
Corollary 22.7
After Dijkstra’s algorithm is run on a weighted, directed graph G = ( V, E) with nonnegative weight function w and source vertex s, the predecessor subgraph G π is a shortest-paths tree rooted at s.
Proof Immediate from Theorem 22.6 and the predecessor-subgraph
property.
▪
Analysis
How fast is Dijkstra’s algorithm? It maintains the min-priority queue Q
by calling three priority-queue operations: INSERT (in line 5),
EXTRACT-MIN (in line 7), and DECREASE-KEY (in line 12). The
algorithm calls both INSERT and EXTRACT-MIN once per vertex.
Because each vertex u ∈ V is added to set S exactly once, each edge in the adjacency list Adj[ u] is examined in the for loop of lines 9–12 exactly once during the course of the algorithm. Since the total number of
edges in all the adjacency lists is | E|, this for loop iterates a total of | E|
times, and thus the algorithm calls DECREASE-KEY at most | E| times
overall. (Observe once again that we are using aggregate analysis.)
Just as in Prim’s algorithm, the running time of Dijkstra’s algorithm
depends on the specific implementation of the min-priority queue Q. A
simple implementation takes advantage of the vertices being numbered
1 to | V|: simply store v.d in the v th entry of an array. Each INSERT and DECREASE-KEY operation takes O(1) time, and each EXTRACT-MIN operation takes O( V) time (since it has to search through the entire array), for a total time of O( V 2 + E) = O( V 2).
If the graph is sufficiently sparse—in particular, E = o( V 2/lg V)—you can improve the running time by implementing the min-priority queue
with a binary min-heap that includes a way to map between vertices and
their corresponding heap elements. Each EXTRACT-MIN operation
then takes O(lg V) time. As before, there are | V| such operations. The time to build the binary min-heap is O( V). (As noted in Section 21.2, you don’t even need to call BUILD-MIN-HEAP.) Each DECREASE-KEY operation takes O(lg V) time, and there are still at most | E| such operations. The total running time is therefore O(( V + E) lg V), which is O( E lg V) in the typical case that | E| = Ω( V). This running time improves upon the straightforward O( V 2)-time implementation if E = o( V 2/lg V).
By implementing the min-priority queue with a Fibonacci heap (see
page 478), you can improve the running time to O( V lg V + E). The amortized cost of each of the | V| EXTRACT-MIN operations is O(lg V), and each DECREASE-KEY call, of which there are at most | E|, takes only O(1) amortized time. Historically, the development of
Fibonacci heaps was motivated by the observation that Dijkstra’s
algorithm typically makes many more DECREASE-KEY calls than
EXTRACT-MIN calls, so that any method of reducing the amortized
time of each DECREASE-KEY operation to o(lg V) without increasing
the amortized time of EXTRACT-MIN would yield an asymptotically
faster implementation than with binary heaps.
Dijkstra’s algorithm resembles both breadth-first search (see Section
20.2) and Prim’s algorithm for computing minimum spanning trees (see
Section 21.2). It is like breadth-first search in that set S corresponds to the set of black vertices in a breadth-first search. Just as vertices in S
have their final shortest-path weights, so do black vertices in a breadth-
first search have their correct breadth-first distances. Dijkstra’s
algorithm is like Prim’s algorithm in that both algorithms use a min-
priority queue to find the “lightest” vertex outside a given set (the set S
in Dijkstra’s algorithm and the tree being grown in Prim’s algorithm),
add this vertex into the set, and adjust the weights of the remaining vertices outside the set accordingly.
Exercises
22.3-1
Run Dijkstra’s algorithm on the directed graph of Figure 22.2, first using vertex s as the source and then using vertex z as the source. In the style of Figure 22.6, show the d and π values and the vertices in set S
after each iteration of the while loop.
22.3-2
Give a simple example of a directed graph with negative-weight edges
for which Dijkstra’s algorithm produces an incorrect answer. Why
doesn’t the proof of Theorem 22.6 go through when negative-weight
edges are allowed?
22.3-3
Suppose that you change line 6 of Dijkstra’s algorithm to read
6 while | Q| > 1
This change causes the while loop to execute | V| − 1 times instead of | V|
times. Is this proposed algorithm correct?
22.3-4
Modify the DIJKSTRA procedure so that the priority queue Q is more
like the queue in the BFS procedure in that it contains only vertices that
have been reached from source s so far: Q ⊆ V − S and v ∈ Q implies v.d ≠ ∞.
22.3-5
Professor Gaedel has written a program that he claims implements
Dijkstra’s algorithm. The program produces v.d and v.π for each vertex
v ∈ V. Give an O( V + E)-time algorithm to check the output of the professor’s program. It should determine whether the d and π attributes
match those of some shortest-paths tree. You may assume that all edge
weights are nonnegative.
22.3-6
Professor Newman thinks that he has worked out a simpler proof of
correctness for Dijkstra’s algorithm. He claims that Dijkstra’s algorithm
relaxes the edges of every shortest path in the graph in the order in
which they appear on the path, and therefore the path-relaxation
property applies to every vertex reachable from the source. Show that
the professor is mistaken by constructing a directed graph for which
Dijkstra’s algorithm relaxes the edges of a shortest path out of order.
22.3-7
Consider a directed graph G = ( V, E) on which each edge ( u, v) ∈ E has an associated value r( u, v), which is a real number in the range 0 ≤ r( u, v)
≤ 1 that represents the reliability of a communication channel from
vertex u to vertex v. Interpret r( u, v) as the probability that the channel from u to v will not fail, and assume that these probabilities are independent. Give an efficient algorithm to find the most reliable path
between two given vertices.
22.3-8
Let G = ( V, E) be a weighted, directed graph with positive weight function w : E → {1, 2, … , W} for some positive integer W, and assume that no two vertices have the same shortest-path weights from source
vertex s. Now define an unweighted, directed graph G′ = ( V ∪ V′, E′) by replacing each edge ( u, v) ∈ E with w( u, v) unit-weight edges in series.
How many vertices does G′ have? Now suppose that you run a breadth-
first search on G′. Show that the order in which the breadth-first search
of G′ colors vertices in V black is the same as the order in which Dijkstra’s algorithm extracts the vertices of V from the priority queue
when it runs on G.
22.3-9
Let G = ( V, E) be a weighted, directed graph with nonnegative weight function w : E → {0, 1, … , W} for some nonnegative integer W.
Modify Dijkstra’s algorithm to compute the shortest paths from a given
source vertex s in O( W V + E) time.
22.3-10
Modify your algorithm from Exercise 22.3-9 to run in O(( V + E) lg W) time. ( Hint: How many distinct shortest-path estimates can V − S
contain at any point in time?)
22.3-11
Suppose that you are given a weighted, directed graph G = ( V, E) in which edges that leave the source vertex s may have negative weights, all
other edge weights are nonnegative, and there are no negative-weight
cycles. Argue that Dijkstra’s algorithm correctly finds shortest paths
from s in this graph.
22.3-12
Suppose that you have a weighted directed graph G = ( V, E) in which all edge weights are positive real values in the range [ C, 2 C] for some positive constant C. Modify Dijkstra’s algorithm so that it runs in O( V
+ E) time.
22.4 Difference constraints and shortest paths
Chapter 29 studies the general linear-programming problem, showing how to optimize a linear function subject to a set of linear inequalities.
This section investigates a special case of linear programming that
reduces to finding shortest paths from a single source. The Bellman-
Ford algorithm then solves the resulting single-source shortest-paths
problem, thereby also solving the linear-programming problem.
Linear programming
In the general linear-programming problem, the input is an m × n matrix A, an m-vector b, and an n-vector c. The goal is to find a vector x of n elements that maximizes the objective function c1x1 + c2x2 + ⋯ + cnxn subject to the m constraints given by Ax ≤ b.
The most popular method for solving linear programs is the simplex
algorithm, which Section 29.1 discusses. Although the simplex algorithm does not always run in time polynomial in the size of its
input, there are other linear-programming algorithms that do run in
polynomial time. We offer here two reasons to understand the setup of
linear-programming problems. First, if you know that you can cast a
given problem as a polynomial-sized linear-programming problem, then
you immediately have a polynomial-time algorithm to solve the
problem. Second, faster algorithms exist for many special cases of linear
programming. For example, the single-pair shortest-path problem
(Exercise 22.4-4) and the maximum-flow problem (Exercise 24.1-5) are
special cases of linear programming.
Sometimes the objective function does not matter: it’s enough just to
find any feasible solution, that is, any vector x that satisfies Ax ≤ b, or to determine that no feasible solution exists. This section focuses on one
such feasibility problem.
Systems of difference constraints
In a system of difference constraints, each row of the linear-
programming matrix A contains one 1 and one −1, and all other entries
of A are 0. Thus, the constraints given by Ax ≤ b are a set of m difference constraints involving n unknowns, in which each constraint is a simple
linear inequality of the form
xj − xi ≤ bk,
where 1 ≤ i, j ≤ n, i ≠ j, and 1 ≤ k ≤ m.
For example, consider the problem of finding a 5-vector x = (xi) that satisfies

    [  1  −1   0   0   0 ]          [  0 ]
    [  1   0   0   0  −1 ]          [ −1 ]
    [  0   1   0   0  −1 ]          [  1 ]
    [ −1   0   1   0   0 ]   x  ≤   [  5 ]        (22.1)
    [ −1   0   0   1   0 ]          [  4 ]
    [  0   0  −1   1   0 ]          [ −1 ]
    [  0   0  −1   0   1 ]          [ −3 ]
    [  0   0   0  −1   1 ]          [ −3 ]

This problem is equivalent to finding values for the unknowns x1, x2, x3, x4, x5, satisfying the following 8 difference constraints:

    x1 − x2 ≤ 0,      (22.2)
    x1 − x5 ≤ −1,     (22.3)
    x2 − x5 ≤ 1,      (22.4)
    x3 − x1 ≤ 5,      (22.5)
    x4 − x1 ≤ 4,      (22.6)
    x4 − x3 ≤ −1,     (22.7)
    x5 − x3 ≤ −3,     (22.8)
    x5 − x4 ≤ −3.     (22.9)
One solution to this problem is x = (−5, −3, 0, −1, −4), which you can
verify directly by checking each inequality. In fact, this problem has
more than one solution. Another is x′ = (0, 2, 5, 4, 1). These two solutions are related: each component of x′ is 5 larger than the
corresponding component of x. This fact is not mere coincidence.
Lemma 22.8
Let x = ( x 1, x 2, … , xn) be a solution to a system A x ≤ b of difference constraints, and let d be any constant. Then x + d = ( x 1 + d, x 2 + d, … , xn + d) is a solution to Ax ≤ b as well.
Proof For each xi and xj, we have ( xj + d) − ( xi + d) = xj − xi. Thus, if x satisfies Ax ≤ b, so does x + d.
▪
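The lemma is easy to check mechanically on the two solutions quoted above (stored 0-indexed here, a Python convention only): shifting every component by d = 5 leaves every difference xj − xi, and hence every constraint of the form xj − xi ≤ bk, unchanged.

```python
# The two solutions given in the text; x2 is x shifted by d = 5.
x  = [-5, -3, 0, -1, -4]
x2 = [0, 2, 5, 4, 1]

assert all(b - a == 5 for a, b in zip(x, x2))

# Every pairwise difference -- the left-hand side of any possible
# difference constraint -- is identical in the two solutions.
assert all(x2[j] - x2[i] == x[j] - x[i]
           for i in range(5) for j in range(5))
```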
Systems of difference constraints occur in various applications. For
example, the unknowns xi might be times at which events are to occur.
Each constraint states that at least a certain amount of time, or at most
a certain amount of time, must elapse between two events. Perhaps the
events are jobs to be performed during the assembly of a product. If the
manufacturer applies an adhesive that takes 2 hours to set at time x 1
and has to wait until it sets to install a part at time x 2, then there is a
constraint that x 2 ≥ x 1 + 2 or, equivalently, that x 1 − x 2 ≤ −2.
Alternatively, the manufacturer might require the part to be installed
after the adhesive has been applied but no later than the time that the
adhesive has set halfway. In this case, there is a pair of constraints x 2 ≥
x 1 and x 2 ≤ x 1 + 1 or, equivalently, x 1 − x 2 ≤ 0 and x 2 − x 1 ≤ 1.
If all the constraints have nonnegative numbers on the right-hand side—that is, if bi ≥ 0 for i = 1, 2, … , m—then finding a feasible solution is trivial: just set all the unknowns xi equal to each other. Then
all the differences are 0, and every constraint is satisfied. The problem of
finding a feasible solution to a system of difference constraints is
interesting only if at least one constraint has bi < 0.
Constraint graphs
We can interpret systems of difference constraints from a graph-
theoretic point of view. For a system Ax ≤ b of difference constraints, let’s view the m × n linear-programming matrix A as the transpose of an incidence matrix (see Exercise 20.1-7) for a graph with n vertices and m
edges. Each vertex vi in the graph, for i = 1, 2, … , n, corresponds to one of the n unknown variables xi. Each directed edge in the graph corresponds to one of the m inequalities involving two unknowns.
More formally, given a system Ax ≤ b of difference constraints, the
corresponding constraint graph is a weighted, directed graph G = ( V, E), where
V = { v 0, v 1, … , vn}
and
E = {( vi, vj) : xj − xi ≤ bk is a constraint}
∪ {( v 0, v 1), ( v 0, v 2), ( v 0, v 3), … , ( v 0, vn)}.
The constraint graph includes the additional vertex v 0, as we shall see
shortly, to guarantee that the graph has some vertex that can reach all
other vertices. Thus, the vertex set V consists of a vertex vi for each unknown xi, plus an additional vertex v 0. The edge set E contains an edge for each difference constraint, plus an edge ( v 0, vi) for each unknown xi. If xj − xi ≤ bk is a difference constraint, then the weight of edge ( vi, vj) is w( vi, vj) = bk. The weight of each edge leaving v 0 is 0.

Figure 22.8 shows the constraint graph for the system (22.2)–(22.9) of difference constraints.
Figure 22.8 The constraint graph corresponding to the system (22.2)–(22.9) of difference constraints. The value of δ( v 0, vi) appears in each vertex vi. One feasible solution to the system is x = (−5, −3, 0, −1, −4).
The following theorem shows how to solve a system of difference
constraints by finding shortest-path weights in the corresponding
constraint graph.
Theorem 22.9
Given a system Ax ≤ b of difference constraints, let G = ( V, E) be the corresponding constraint graph. If G contains no negative-weight cycles,
then

    x = (δ(v0, v1), δ(v0, v2), … , δ(v0, vn))        (22.10)

is a feasible solution for the system. If G contains a negative-weight cycle, then there is no feasible solution for the system.
Proof We first show that if the constraint graph contains no negative-
weight cycles, then equation (22.10) gives a feasible solution. Consider
any edge ( vi, vj) ∈ E. The triangle inequality implies that δ( v 0, vj) ≤ δ( v 0, vi) + w( vi, vj), which is equivalent to δ( v 0, vj)−δ( v 0, vi) ≤ w( vi, vj). Thus, letting xi = δ( v 0, vi) and xj = δ( v 0, vj) satisfies the difference constraint xj − xi ≤ w( vi, vj) that corresponds to edge ( vi, vj).
Now we show that if the constraint graph contains a negative-weight
cycle, then the system of difference constraints has no feasible solution.
Without loss of generality, let the negative-weight cycle be c = 〈 v 1, v 2,
… , vk〉, where v 1 = vk. (The vertex v 0 cannot be on cycle c, because it has no entering edges.) Cycle c corresponds to the following difference
constraints:
x 2 − x 1 ≤ w( v 1, v 2),
x 3 − x 2 ≤ w( v 2, v 3),
⋮
xk−1 − xk−2 ≤ w( vk−2, vk−1),
xk − xk−1 ≤ w( vk−1, vk).
We’ll assume that x has a solution satisfying each of these k inequalities and then derive a contradiction. The solution must also satisfy the
inequality that results from summing the k inequalities together. In
summing the left-hand sides, each unknown xi is added in once and
subtracted out once (remember that v 1 = vk implies x 1 = xk), so that the left-hand side sums to 0. The right-hand side sums to the weight
w( c) of the cycle, giving 0 ≤ w( c). But since c is a negative-weight cycle, w( c) < 0, and we obtain the contradiction that 0 ≤ w( c) < 0.
▪
Solving systems of difference constraints
Theorem 22.9 suggests how to use the Bellman-Ford algorithm to solve
a system of difference constraints. Because the constraint graph
contains edges from the source vertex v 0 to all other vertices, any negative-weight cycle in the constraint graph is reachable from v 0. If the
Bellman-Ford algorithm returns TRUE, then the shortest-path weights
give a feasible solution to the system. In Figure 22.8, for example, the shortest-path weights provide the feasible solution x = (−5, −3, 0, −1,
−4), and by Lemma 22.8, x = ( d − 5, d − 3, d, d − 1, d − 4) is also a feasible solution for any constant d. If the Bellman-Ford algorithm
returns FALSE, there is no feasible solution to the system of difference
constraints.
A system of difference constraints with m constraints on n unknowns produces a graph with n + 1 vertices and n + m edges. Thus, the Bellman-Ford algorithm provides a way to solve the system in O(( n + 1)
( n + m)) = O( n 2 + nm) time. Exercise 22.4-5 asks you to modify the algorithm to run in O( nm) time, even if m is much less than n.
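The whole recipe fits in a short function. In the sketch below (illustrative Python; encoding each constraint xj − xi ≤ b as a triple (j, i, b) is an assumption, not notation from the text), vertex 0 plays the role of the extra source v0, and a plain Bellman-Ford either returns the shortest-path weights as a feasible solution or reports infeasibility:

```python
from math import inf

def solve_difference_constraints(n, constraints):
    """Solve a system of difference constraints on x_1, ..., x_n.

    constraints -- list of (j, i, b) triples, one per inequality
                   x_j - x_i <= b with 1 <= i, j <= n
    Returns a feasible vector [x_1, ..., x_n], or None when the
    constraint graph contains a negative-weight cycle."""
    # Constraint graph: v0 (= vertex 0) has a weight-0 edge to every other
    # vertex; each constraint contributes an edge (v_i, v_j) of weight b.
    edges = [(0, v, 0) for v in range(1, n + 1)]
    edges += [(i, j, b) for (j, i, b) in constraints]

    # BELLMAN-FORD from v0.
    d = [inf] * (n + 1)
    d[0] = 0
    for _ in range(n):                   # |V| - 1 = n relaxation passes
        for u, v, w in edges:
            if d[u] + w < d[v]:
                d[v] = d[u] + w
    for u, v, w in edges:                # any further improvement means
        if d[u] + w < d[v]:              # a negative-weight cycle
            return None
    return d[1:]                         # x_i = delta(v0, v_i)
```

By Lemma 22.8, adding any constant to every component of the returned vector yields another feasible solution.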
Exercises
22.4-1
Find a feasible solution or determine that no feasible solution exists for
the following system of difference constraints:
x 1 − x 2 ≤ 1,
x 1 − x 4 ≤ −4,
x 2 − x 3 ≤ 2,
x 2 − x 5 ≤ 7,
x 2 − x 6 ≤ 5,
x 3 − x 6 ≤ 10,
x 4 − x 2 ≤ 2,
x 5 − x 1 ≤ −1,
x 5 − x 4 ≤ 3,
x 6 − x 3 ≤ −8.
22.4-2
Find a feasible solution or determine that no feasible solution exists for
the following system of difference constraints:
x 1 − x 2 ≤ 4,
x 1 − x 5 ≤ 5,
x 2 − x 4 ≤ −6,
x 3 − x 2 ≤ 1,
x 4 − x 1 ≤ 3,
x 4 − x 3 ≤ 5,
x 4 − x 5 ≤ 10,
x 5 − x 3 ≤ −4,
x 5 − x 4 ≤ −8.
22.4-3
Can any shortest-path weight from the new vertex v 0 in a constraint graph be positive? Explain.
22.4-4
Express the single-pair shortest-path problem as a linear program.
22.4-5
Show how to modify the Bellman-Ford algorithm slightly so that when
using it to solve a system of difference constraints with m inequalities on
n unknowns, the running time is O( nm).
22.4-6
Consider adding equality constraints of the form xi = xj + bk to a system of difference constraints. Show how to solve this variety of
constraint system.
22.4-7
Show how to solve a system of difference constraints by a Bellman-
Ford-like algorithm that runs on a constraint graph without the extra
vertex v 0.
★ 22.4-8
Let Ax ≤ b be a system of m difference constraints in n unknowns. Show that the Bellman-Ford algorithm, when run on the corresponding
constraint graph, maximizes
subject to Ax ≤ b and xi ≤ 0 for all xi.
★ 22.4-9
Show that the Bellman-Ford algorithm, when run on the constraint graph for a system Ax ≤ b of difference constraints, minimizes the quantity (max { xi}−min { xi}) subject to Ax ≤ b. Explain how this fact might come in handy if the algorithm is used to schedule construction
jobs.
22.4-10
Suppose that every row in the matrix A of a linear program Ax ≤ b corresponds to a difference constraint, a single-variable constraint of
the form xi ≤ bk, or a single-variable constraint of the form − xi ≤ bk.
Show how to adapt the Bellman-Ford algorithm to solve this variety of
constraint system.
22.4-11
Give an efficient algorithm to solve a system Ax ≤ b of difference constraints when all of the elements of b are real-valued and all of the
unknowns xi must be integers.
★ 22.4-12
Give an efficient algorithm to solve a system Ax ≤ b of difference constraints when all of the elements of b are real-valued and a specified
subset of some, but not necessarily all, of the unknowns xi must be integers.