cycle in an undirected graph. Give a related decision problem. Give the

language corresponding to the decision problem.

34.1-3

Give a formal encoding of directed graphs as binary strings using an

adjacency-matrix representation. Do the same using an adjacency-list

representation. Argue that the two representations are polynomially

related.

34.1-4

Is the dynamic-programming algorithm for the 0-1 knapsack problem

that is asked for in Exercise 15.2-2 a polynomial-time algorithm?

Explain your answer.

34.1-5

Show that if an algorithm makes at most a constant number of calls to

polynomial-time subroutines and performs an additional amount of

work that also takes polynomial time, then it runs in polynomial time.


Also show that a polynomial number of calls to polynomial-time

subroutines may result in an exponential-time algorithm.

34.1-6

Show that the class P, viewed as a set of languages, is closed under union, intersection, concatenation, complement, and Kleene star. That is, if L1, L2 ∈ P, then L1 ∪ L2 ∈ P, L1 ∩ L2 ∈ P, L1L2 ∈ P, L̄1 ∈ P, and L1* ∈ P.

34.2 Polynomial-time verification

Now, let’s look at algorithms that verify membership in languages. For

example, suppose that for a given instance 〈 G, u, v, k〉 of the decision problem PATH, you are also given a path p from u to v. You can check whether p is a path in G and whether the length of p is at most k, and if so, you can view p as a “certificate” that the instance indeed belongs to

PATH. For the decision problem PATH, this certificate doesn’t seem to

buy much. After all, PATH belongs to P—in fact, you can solve PATH

in linear time—and so verifying membership from a given certificate

takes as long as solving the problem from scratch. Instead, let’s examine

a problem for which we know of no polynomial-time decision algorithm

and yet, given a certificate, verification is easy.

Hamiltonian cycles

The problem of finding a hamiltonian cycle in an undirected graph has

been studied for over a hundred years. Formally, a hamiltonian cycle of

an undirected graph G = ( V, E) is a simple cycle that contains each vertex in V. A graph that contains a hamiltonian cycle is said to be hamiltonian, and otherwise, it is nonhamiltonian. The name honors W.

R. Hamilton, who described a mathematical game on the dodecahedron

(Figure 34.2(a)) in which one player sticks five pins in any five consecutive vertices and the other player must complete the path to

form a cycle containing all the vertices.8 The dodecahedron is hamiltonian, and Figure 34.2(a) shows one hamiltonian cycle. Not all


graphs are hamiltonian, however. For example, Figure 34.2(b) shows a bipartite graph with an odd number of vertices. Exercise 34.2-2 asks you

to show that all such graphs are nonhamiltonian.

Here is how to define the hamiltonian-cycle problem, “Does a graph

G have a hamiltonian cycle?” as a formal language:

HAM-CYCLE = {〈 G〉 : G is a hamiltonian graph}.

How might an algorithm decide the language HAM-CYCLE? Given a

problem instance 〈 G〉, one possible decision algorithm lists all

permutations of the vertices of G and then checks each permutation to

see whether it is a hamiltonian cycle. What is the running time of this

algorithm? It depends on the encoding of the graph G. Let’s say that G

is encoded as its adjacency matrix. If the adjacency matrix contains n

entries, so that the length of the encoding of G equals n, then the number m of vertices in the graph is m = √n. There are m! possible permutations of the vertices, and therefore the running time is Ω(m!) = 2^Ω(√n), which is not O(n^k) for any constant k. Thus,

this naive algorithm does not run in polynomial time. In fact, the

hamiltonian-cycle problem is NP-complete, as we’ll prove in Section

34.5.
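The naive permutation-checking decider described above can be sketched in Python. This is only a sketch with an assumed graph encoding of my own (a vertex list plus a set of frozenset edges); the text itself gives no code:

```python
from itertools import permutations

def ham_cycle_naive(vertices, edges):
    """Decide HAM-CYCLE by checking every permutation of the vertices.

    edges is a set of frozensets {u, v}.  With m vertices this examines
    up to m! permutations, so it is not a polynomial-time algorithm.
    """
    if len(vertices) < 3:
        return False    # a simple cycle needs at least three vertices
    edge = lambda u, v: frozenset((u, v)) in edges
    for perm in permutations(vertices):
        # Check every consecutive pair, wrapping around to close the cycle.
        if all(edge(perm[i], perm[(i + 1) % len(perm)])
               for i in range(len(perm))):
            return True
    return False
```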


Figure 34.2 (a) A graph representing the vertices, edges, and faces of a dodecahedron, with a hamiltonian cycle shown by edges highlighted in blue. (b) A bipartite graph with an odd number of vertices. Any such graph is nonhamiltonian.

Verification algorithms

Consider a slightly easier problem. Suppose that a friend tells you that a

given graph G is hamiltonian, and then the friend offers to prove it by

giving you the vertices in order along the hamiltonian cycle. It would

certainly be easy enough to verify the proof: simply verify that the

provided cycle is hamiltonian by checking whether it is a permutation of

the vertices of V and whether each of the consecutive edges along the

cycle actually exists in the graph. You could certainly implement this

verification algorithm to run in O(n^2) time, where n is the length of the encoding of G. Thus, a proof that a hamiltonian cycle exists in a graph

can be verified in polynomial time.
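The verification just described translates directly into code. A sketch, assuming a hypothetical graph encoding of my own (a vertex list plus a set of frozenset edges):

```python
def verify_ham_cycle(vertices, edges, certificate):
    """Verify a claimed hamiltonian cycle (the certificate) for a graph.

    certificate is a list of vertices in cycle order.  Check that it is
    a permutation of the vertex set and that each consecutive pair
    (wrapping around) is an edge; both checks take polynomial time.
    """
    if (len(certificate) < 3
            or len(certificate) != len(vertices)
            or set(certificate) != set(vertices)):
        return False
    return all(frozenset((certificate[i],
                          certificate[(i + 1) % len(certificate)])) in edges
               for i in range(len(certificate)))
```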

We define a verification algorithm as being a two-argument algorithm

A, where one argument is an ordinary input string x and the other is a

binary string y called a certificate. A two-argument algorithm A verifies an input string x if there exists a certificate y such that A( x, y) = 1. The language verified by a verification algorithm A is

L = { x ∈ {0, 1}* : there exists y ∈ {0, 1}* such that A( x, y) = 1}.

Think of an algorithm A as verifying a language L if, for any string x ∈ L, there exists a certificate y that A can use to prove that x ∈ L. Moreover, for any string x ∉ L, there must be no certificate proving that x ∈ L. For example, in the hamiltonian-cycle problem, the certificate is the list of vertices in some hamiltonian cycle. If a graph is hamiltonian,

the hamiltonian cycle itself offers enough information to verify that the

graph is indeed hamiltonian. Conversely, if a graph is not hamiltonian,

there can be no list of vertices that fools the verification algorithm into

believing that the graph is hamiltonian, since the verification algorithm

carefully checks the so-called cycle to be sure.

The complexity class NP

The complexity class NP is the class of languages that can be verified by

a polynomial-time algorithm.9 More precisely, a language L belongs to NP if and only if there exist a two-input polynomial-time algorithm A

and a constant c such that

L = { x ∈ {0, 1}*: there exists a certificate y with | y| = O(| x| c) such that A( x, y) = 1}.

We say that algorithm A verifies language L in polynomial time.

From our earlier discussion about the hamiltonian-cycle problem,

you can see that HAM-CYCLE ∈ NP. (It is always nice to know that an

important set is nonempty.) Moreover, if L ∈ P, then L ∈ NP, since if

there is a polynomial-time algorithm to decide L, the algorithm can be

converted to a two-argument verification algorithm that simply ignores

any certificate and accepts exactly those input strings it determines to

belong to L. Thus, P ⊆ NP.
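The conversion from decider to verifier is mechanical. A minimal sketch, with deciders modeled as Python functions (the names are mine, not the text's):

```python
def verifier_from_decider(decide):
    """Turn a polynomial-time decider for a language L into a
    two-argument verification algorithm for L, illustrating P ⊆ NP."""
    def verify(x, y):
        # The certificate y is ignored; membership is decided directly.
        return decide(x)
    return verify
```

Since `verify` does only the work of `decide` plus discarding `y`, it runs in polynomial time whenever `decide` does.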

That leaves the question of whether P = NP. A definitive answer is

unknown, but most researchers believe that P and NP are not the same

class. Think of the class P as consisting of problems that can be solved

quickly and the class NP as consisting of problems for which a solution

can be verified quickly. You may have learned from experience that it is

often more difficult to solve a problem from scratch than to verify a

clearly presented solution, especially when working under time


constraints. Theoretical computer scientists generally believe that this

analogy extends to the classes P and NP, and thus that NP includes

languages that do not belong to P.

Figure 34.3 Four possibilities for relationships among complexity classes. In each diagram, one region enclosing another indicates a proper-subset relation. (a) P = NP = co-NP. Most researchers regard this possibility as the most unlikely. (b) If NP is closed under complement, then NP = co-NP, but it need not be the case that P = NP. (c) P = NP ∩ co-NP, but NP is not closed under complement. (d) NP ≠ co-NP and P ≠ NP ∩ co-NP. Most researchers regard this possibility as the most likely.

There is more compelling, though not conclusive, evidence that P ≠

NP—the existence of languages that are “NP-complete.” Section 34.3

will study this class.

Many other fundamental questions beyond the P ≠ NP question

remain unresolved. Figure 34.3 shows some possible scenarios. Despite much work by many researchers, no one even knows whether the class

NP is closed under complement. That is, does L ∈ NP imply L̄ ∈ NP? We define the complexity class co-NP as the set of languages L such that L̄ ∈ NP, so that the question of whether NP is closed under

complement is also whether NP = co-NP. Since P is closed under

complement (Exercise 34.1-6), it follows from Exercise 34.2-9 (P ⊆ co-

NP) that P ⊆ NP ∩ co-NP. Once again, however, no one knows whether

P = NP ∩ co-NP or whether there is some language in (NP ∩ co-NP) −

P.


Thus our understanding of the precise relationship between P and

NP is woefully incomplete. Nevertheless, even though we might not be

able to prove that a particular problem is intractable, if we can prove

that it is NP-complete, then we have gained valuable information about

it.

Exercises

34.2-1

Consider the language GRAPH-ISOMORPHISM = {〈 G 1, G 2〉 : G 1

and G 2 are isomorphic graphs}. Prove that GRAPH-ISOMORPHISM

∈ NP by describing a polynomial-time algorithm to verify the language.

34.2-2

Prove that if G is an undirected bipartite graph with an odd number of

vertices, then G is nonhamiltonian.

34.2-3

Show that if HAM-CYCLE ∈ P, then the problem of listing the vertices

of a hamiltonian cycle, in order, is polynomial-time solvable.

34.2-4

Prove that the class NP of languages is closed under union, intersection,

concatenation, and Kleene star. Discuss the closure of NP under

complement.

34.2-5

Show that any language in NP can be decided by an algorithm with a running time of 2^O(n^k) for some constant k.

34.2-6

A hamiltonian path in a graph is a simple path that visits every vertex

exactly once. Show that the language HAM-PATH = {〈 G, u, v〉 : there is

a hamiltonian path from u to v in graph G} belongs to NP.

34.2-7

Show that the hamiltonian-path problem from Exercise 34.2-6 can be solved in polynomial time on directed acyclic graphs. Give an efficient

algorithm for the problem.

34.2-8

Let ϕ be a boolean formula constructed from the boolean input

variables x 1, x 2, … , xk, negations (¬), ANDs (∧), ORs (∨), and parentheses. The formula ϕ is a tautology if it evaluates to 1 for every assignment of 1 and 0 to the input variables. Define TAUTOLOGY as

the language of boolean formulas that are tautologies. Show that

TAUTOLOGY ∈ co-NP.

34.2-9

Prove that P ⊆ co-NP.

34.2-10

Prove that if NP ≠ co-NP, then P ≠ NP.

34.2-11

Let G be a connected, undirected graph with at least three vertices, and

let G 3 be the graph obtained by connecting all pairs of vertices that are

connected by a path in G of length at most 3. Prove that G 3 is hamiltonian. ( Hint: Construct a spanning tree for G, and use an inductive argument.)

34.3 NP-completeness and reducibility

Perhaps the most compelling reason why theoretical computer scientists

believe that P ≠ NP comes from the existence of the class of NP-

complete problems. This class has the intriguing property that if any

NP-complete problem can be solved in polynomial time, then every

problem in NP has a polynomial-time solution, that is, P = NP. Despite

decades of study, though, no polynomial-time algorithm has ever been

discovered for any NP-complete problem.


The language HAM-CYCLE is one NP-complete problem. If there

were an algorithm to decide HAM-CYCLE in polynomial time, then

every problem in NP could be solved in polynomial time. The NP-

complete languages are, in a sense, the “hardest” languages in NP. In

fact, if NP − P turns out to be nonempty, we will be able to say with

certainty that HAM-CYCLE ∈ NP − P.

This section starts by showing how to compare the relative

“hardness” of languages using a precise notion called “polynomial-time

reducibility.” It then formally defines the NP-complete languages,

finishing by sketching a proof that one such language, called CIRCUIT-

SAT, is NP-complete. Sections 34.4 and 34.5 will use the notion of reducibility to show that many other problems are NP-complete.

Reducibility

One way that sometimes works for solving a problem is to recast it as a

different problem. We call that strategy “reducing” one problem to

another. Think of a problem Q as being reducible to another problem

Q′ if any instance of Q can be recast as an instance of Q′, and the solution to the instance of Q′ provides a solution to the instance of Q.

For example, the problem of solving linear equations in an

indeterminate x reduces to the problem of solving quadratic equations.

Given a linear-equation instance ax + b = 0 (with solution x = −b/a), you can transform it to the quadratic equation ax^2 + bx + 0 = 0. This quadratic equation has the solutions x = (−b ± √(b^2 − 4ac))/(2a), where c = 0, so that √(b^2 − 4ac) = √(b^2) = b. The solutions are then x = (−b + b)/(2a) = 0 and x = (−b − b)/(2a) = −b/a, thereby providing a solution to ax + b = 0.

Thus, if a problem Q reduces to another problem Q′, then Q is, in a sense, “no harder to solve” than Q′.


Figure 34.4 A function f that reduces language L1 to language L2. For any input x ∈ {0, 1}*, the question of whether x ∈ L1 has the same answer as the question of whether f (x) ∈ L2.

Returning to our formal-language framework for decision problems,

we say that a language L1 is polynomial-time reducible to a language L2, written L1 ≤P L2, if there exists a polynomial-time computable function f : {0, 1}* → {0, 1}* such that for all x ∈ {0, 1}*,

x ∈ L1 if and only if f (x) ∈ L2.  (34.1)

We call the function f the reduction function, and a polynomial-time algorithm F that computes f is a reduction algorithm.

Figure 34.4 illustrates the idea of a reduction from a language L 1 to another language L 2. Each language is a subset of {0, 1}*. The

reduction function f provides a mapping such that if x ∈ L1, then f (x) ∈ L2. Moreover, if x ∉ L1, then f (x) ∉ L2. Thus, the reduction function maps any instance x of the decision problem represented by the language L1 to an instance f (x) of the problem represented by L2. Providing an answer to whether f (x) ∈ L2 directly provides the answer to whether x ∈ L1. If, in addition, f can be computed in polynomial time, it is a polynomial-time reduction function.

Polynomial-time reductions give us a powerful tool for proving that

various languages belong to P.

Lemma 34.3


If L 1, L 2 ⊆ {0, 1}* are languages such that L 1 ≤P L 2, then L 2 ∈ P

implies L 1 ∈ P.

Figure 34.5 The proof of Lemma 34.3. The algorithm F is a reduction algorithm that computes the reduction function f from L1 to L2 in polynomial time, and A2 is a polynomial-time algorithm that decides L2. Algorithm A1 decides whether x ∈ L1 by using F to transform any input x into f (x) and then using A2 to decide whether f (x) ∈ L2.

Proof Let A 2 be a polynomial-time algorithm that decides L 2, and let F be a polynomial-time reduction algorithm that computes the

reduction function f. We show how to construct a polynomial-time

algorithm A 1 that decides L 1.

Figure 34.5 illustrates how we construct A1. For a given input x ∈ {0, 1}*, algorithm A1 uses F to transform x into f (x), and then it uses A2 to test whether f (x) ∈ L2. Algorithm A1 takes the output from algorithm A2 and produces that answer as its own output.

The correctness of A 1 follows from condition (34.1). The algorithm

runs in polynomial time, since both F and A 2 run in polynomial time

(see Exercise 34.1-5).
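The construction in the proof composes F with A2. A minimal sketch, with deciders and reduction algorithms modeled as Python functions (the example languages are mine):

```python
def make_decider(F, A2):
    """Construct algorithm A1 for L1 from a reduction algorithm F
    (computing f) and a decider A2 for L2, as in Lemma 34.3.

    If both F and A2 run in polynomial time, so does A1."""
    def A1(x):
        return A2(F(x))    # accept x if and only if f(x) is in L2
    return A1
```

For instance, with L1 = {strings containing 'a'} and L2 = {"1"}, the map f(x) = "1" if 'a' appears in x and "0" otherwise is a (trivial) reduction, and `make_decider` yields a decider for L1.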

NP-completeness

Polynomial-time reductions allow us to formally show that one problem

is at least as hard as another, to within a polynomial-time factor. That

is, if L 1 ≤P L 2, then L 1 is not more than a polynomial factor harder than L 2, which is why the “less than or equal to” notation for reduction

is mnemonic. We can now define the set of NP-complete languages,

which are the hardest problems in NP.


A language L ⊆ {0, 1}* is NP-complete if

1. L ∈ NP, and

2. L′ ≤P L for every L′ ∈ NP.

If a language L satisfies property 2, but not necessarily property 1, we

say that L is NP-hard. We also define NPC to be the class of NP-complete languages.

As the following theorem shows, NP-completeness is at the crux of

deciding whether P is in fact equal to NP.

Theorem 34.4

If any NP-complete problem is polynomial-time solvable, then P = NP.

Equivalently, if any problem in NP is not polynomial-time solvable, then

no NP-complete problem is polynomial-time solvable.

Figure 34.6 How most theoretical computer scientists view the relationships among P, NP, and NPC. Both P and NPC are wholly contained within NP, and P ∩ NPC = Ø.

Proof Suppose that L ∈ P and also that L ∈ NPC. For any L′ ∈ NP, we have L′ ≤P L by property 2 of the definition of NP-completeness.

Thus, by Lemma 34.3, we also have that L′ ∈ P, which proves the first

statement of the theorem.

To prove the second statement, consider the contrapositive of the

first statement: if P ≠ NP, then there does not exist an NP-complete

problem that is polynomial-time solvable. But P ≠ NP means that there

is some problem in NP that is not polynomial-time solvable, and hence

the second statement is the contrapositive of the first statement.


It is for this reason that research into the P ≠ NP question centers

around the NP-complete problems. Most theoretical computer scientists

believe that P ≠ NP, which leads to the relationships among P, NP, and

NPC shown in Figure 34.6. For all we know, however, someone may yet come up with a polynomial-time algorithm for an NP-complete

problem, thus proving that P = NP. Nevertheless, since no polynomial-

time algorithm for any NP-complete problem has yet been discovered, a

proof that a problem is NP-complete provides excellent evidence that it

is intractable.

Circuit satisfiability

We have defined the notion of an NP-complete problem, but up to this

point, we have not actually proved that any problem is NP-complete.

Once we prove that at least one problem is NP-complete, polynomial-

time reducibility becomes a tool to prove other problems to be NP-

complete. Thus, we now focus on demonstrating the existence of an NP-

complete problem: the circuit-satisfiability problem.

Unfortunately, the formal proof that the circuit-satisfiability problem

is NP-complete requires technical detail beyond the scope of this text.

Instead, we’ll informally describe a proof that relies on a basic

understanding of boolean combinational circuits.

Figure 34.7 Three basic logic gates, with binary inputs and outputs. Under each gate is the truth table that describes the gate’s operation. (a) The NOT gate. (b) The AND gate. (c) The OR gate.

Boolean combinational circuits are built from boolean

combinational elements that are interconnected by wires. A boolean

combinational element is any circuit element that has a constant number

of boolean inputs and outputs and that performs a well-defined function. Boolean values are drawn from the set {0, 1}, where 0

represents FALSE and 1 represents TRUE.

The boolean combinational elements appearing in the circuit-

satisfiability problem compute simple boolean functions, and they are

known as logic gates. Figure 34.7 shows the three basic logic gates used in the circuit-satisfiability problem: the NOT gate (or inverter), the AND

gate, and the OR gate. The NOT gate takes a single binary input x, whose value is either 0 or 1, and produces a binary output z whose value is opposite that of the input value. Each of the other two gates takes two

binary inputs x and y and produces a single binary output z.

The operation of each gate, or of any boolean combinational

element, is defined by a truth table, shown under each gate in Figure

34.7. A truth table gives the outputs of the combinational element for

each possible setting of the inputs. For example, the truth table for the

OR gate says that when the inputs are x = 0 and y = 1, the output value is z = 1. The symbol ¬ denotes the NOT function, ∧ denotes the AND

function, and ∨ denotes the OR function. Thus, for example, 0 ∨ 1 = 1.

AND and OR gates are not limited to just two inputs. An AND

gate’s output is 1 if all of its inputs are 1, and its output is 0 otherwise.

An OR gate’s output is 1 if any of its inputs are 1, and its output is 0

otherwise.
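The three gates, with AND and OR generalized to any number of inputs as just described, can be modeled as functions on {0, 1} (a sketch; the representation is mine, not the text's):

```python
# The three basic logic gates as boolean functions on {0, 1}.
def NOT(x):
    return 1 - x

def AND(*xs):
    return int(all(xs))    # 1 iff all inputs are 1

def OR(*xs):
    return int(any(xs))    # 1 iff any input is 1
```

These reproduce the truth tables of Figure 34.7 row by row; for example, OR(0, 1) gives the output z = 1 mentioned in the text.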

A boolean combinational circuit consists of one or more boolean

combinational elements interconnected by wires. A wire can connect the

output of one element to the input of another, so that the output value

of the first element becomes an input value of the second. Figure 34.8

shows two similar boolean combinational circuits, differing in only one

gate. Part (a) of the figure also shows the values on the individual wires,

given the input 〈 x 1 = 1, x 2 = 1, x 3 = 0〉. Although a single wire may have no more than one combinational-element output connected to it, it

can feed several element inputs. The number of element inputs fed by a

wire is called the fan-out of the wire. If no element output is connected

to a wire, the wire is a circuit input, accepting input values from an external source. If no element input is connected to a wire, the wire is a

circuit output, providing the results of the circuit’s computation to the


outside world. (An internal wire can also fan out to a circuit output.)

For the purpose of defining the circuit-satisfiability problem, we limit

the number of circuit outputs to 1, though in actual hardware design, a

boolean combinational circuit may have multiple outputs.

Figure 34.8 Two instances of the circuit-satisfiability problem. (a) The assignment 〈 x 1 = 1, x 2 =

1, x 3 = 0〉 to the inputs of this circuit causes the output of the circuit to be 1. The circuit is therefore satisfiable. (b) No assignment to the inputs of this circuit can cause the output of the circuit to be 1. The circuit is therefore unsatisfiable.

Boolean combinational circuits contain no cycles. In other words, for

a given combinational circuit, imagine a directed graph G = ( V, E) with one vertex for each combinational element and with k directed edges for

each wire whose fan-out is k, where the graph contains a directed edge

( u, v) if a wire connects the output of element u to an input of element v.

Then G must be acyclic.

A truth assignment for a boolean combinational circuit is a set of

boolean input values. We say that a 1-output boolean combinational

circuit is satisfiable if it has a satisfying assignment: a truth assignment that causes the output of the circuit to be 1. For example, the circuit in

Figure 34.8(a) has the satisfying assignment 〈 x 1 = 1, x 2 = 1, x 3 = 0〉, and so it is satisfiable. As Exercise 34.3-1 asks you to show, no

assignment of values to x 1, x 2, and x 3 causes the circuit in Figure

34.8(b) to produce a 1 output. Since it always produces 0, it is

unsatisfiable.

The circuit-satisfiability problem is, “Given a boolean combinational

circuit composed of AND, OR, and NOT gates, is it satisfiable?” In

order to pose this question formally, however, we must agree on a standard encoding for circuits. The size of a boolean combinational

circuit is the number of boolean combinational elements plus the

number of wires in the circuit. We could devise a graph-like encoding

that maps any given circuit C into a binary string 〈 C〉 whose length is

polynomial in the size of the circuit itself. As a formal language, we can

therefore define

CIRCUIT-SAT = {〈 C〉 : C is a satisfiable boolean combinational

circuit}.

The circuit-satisfiability problem arises in the area of computer-aided

hardware optimization. If a subcircuit always produces 0, that subcircuit

is unnecessary: the designer can replace it by a simpler subcircuit that

omits all logic gates and provides the constant 0 value as its output. You

can see the value in having a polynomial-time algorithm for this

problem.

Given a circuit C, you can determine whether it is satisfiable by

simply checking all possible assignments to the inputs. Unfortunately, if

the circuit has k inputs, then you would have to check up to 2^k possible assignments. When the size of C is polynomial in k, checking all possible assignments to the inputs takes Ω(2^k) time, which is

superpolynomial in the size of the circuit.10 In fact, as we have claimed, there is strong evidence that no polynomial-time algorithm exists that

solves the circuit-satisfiability problem because circuit satisfiability is

NP-complete. We break the proof of this fact into two parts, based on

the two parts of the definition of NP-completeness.
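The exhaustive approach just described can be sketched as follows, under an illustrative circuit encoding of my own: gates are (name, op, operand-names) triples listed in an order consistent with the acyclic wiring, and the last gate's wire is the single circuit output.

```python
from itertools import product

def eval_circuit(inputs, gates, assignment):
    """Evaluate a 1-output boolean combinational circuit on one
    assignment of bits to its named inputs."""
    wire = dict(zip(inputs, assignment))
    for name, op, args in gates:
        vals = [wire[a] for a in args]
        wire[name] = (1 - vals[0] if op == "NOT"
                      else int(all(vals)) if op == "AND"
                      else int(any(vals)))          # op == "OR"
    return wire[gates[-1][0]]

def circuit_sat_brute_force(inputs, gates):
    """Try all 2^k assignments to the k inputs -- exponential time."""
    return any(eval_circuit(inputs, gates, a) == 1
               for a in product((0, 1), repeat=len(inputs)))
```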

Lemma 34.5

The circuit-satisfiability problem belongs to the class NP.

Proof We provide a two-input, polynomial-time algorithm A that can

verify CIRCUIT-SAT. One of the inputs to A is (a standard encoding

of) a boolean combinational circuit C. The other input is a certificate

corresponding to an assignment of a boolean value to each of the wires

in C. (See Exercise 34.3-4 for a smaller certificate.)

The algorithm A works as follows. For each logic gate in the circuit, it checks that the value provided by the certificate on the output wire is

correctly computed as a function of the values on the input wires. Then,

if the output of the entire circuit is 1, algorithm A outputs 1, since the

values assigned to the inputs of C provide a satisfying assignment.

Otherwise, A outputs 0.

Whenever a satisfiable circuit C is input to algorithm A, there exists a certificate whose length is polynomial in the size of C and that causes A

to output a 1. Whenever an unsatisfiable circuit is input, no certificate

can fool A into believing that the circuit is satisfiable. Algorithm A runs in polynomial time, and with a good implementation, linear time

suffices. Thus, CIRCUIT-SAT is verifiable in polynomial time, and

CIRCUIT-SAT ∈ NP.
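The gate-by-gate check performed by A can be sketched as follows, under an illustrative encoding of my own where gates are (name, op, operand-names) triples in wiring order, the last gate's wire is the output, and the certificate assigns a bit to every wire:

```python
def verify_circuit_certificate(gates, wire_values):
    """Verification algorithm for CIRCUIT-SAT in the style of the proof
    of Lemma 34.5: check each gate's claimed output value against its
    claimed input values, then check that the circuit output is 1."""
    ops = {"NOT": lambda v: 1 - v[0],
           "AND": lambda v: int(all(v)),
           "OR":  lambda v: int(any(v))}
    for name, op, args in gates:
        if wire_values[name] != ops[op]([wire_values[a] for a in args]):
            return 0    # the certificate is inconsistent with some gate
    return 1 if wire_values[gates[-1][0]] == 1 else 0
```

Each gate is checked once against a constant-size table, so the running time is linear in the circuit size.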

The second part of proving that CIRCUIT-SAT is NP-complete is to

show that the language is NP-hard: that every language in NP is

polynomial-time reducible to CIRCUIT-SAT. The actual proof of this

fact is full of technical intricacies, and so instead we’ll sketch the proof

based on some understanding of the workings of computer hardware.

A computer program is stored in the computer’s memory as a

sequence of instructions. A typical instruction encodes an operation to

be performed, addresses of operands in memory, and an address where

the result is to be stored. A special memory location, called the program

counter, keeps track of which instruction is to be executed next. The program counter automatically increments when each instruction is

fetched, thereby causing the computer to execute instructions

sequentially. Certain instructions can cause a value to be written to the

program counter, however, which alters the normal sequential execution

and allows the computer to loop and perform conditional branches.

At any point while a program executes, the computer’s memory

holds the entire state of the computation. (Consider the memory to

include the program itself, the program counter, working storage, and

any of the various bits of state that a computer maintains for

bookkeeping.) We call any particular state of computer memory a

configuration. When an instruction executes, it transforms the

configuration. Think of an instruction as mapping one configuration to another. The computer hardware that accomplishes this mapping can be

implemented as a boolean combinational circuit, which we denote by M

in the proof of the following lemma.

Lemma 34.6

The circuit-satisfiability problem is NP-hard.

Proof Let L be any language in NP. We’ll describe a polynomial-time

algorithm F computing a reduction function f that maps every binary

string x to a circuit C = f (x) such that x ∈ L if and only if C ∈ CIRCUIT-SAT.

Since L ∈ NP, there must exist an algorithm A that verifies L in polynomial time. The algorithm F that we construct uses the two-input

algorithm A to compute the reduction function f.

Let T (n) denote the worst-case running time of algorithm A on length-n input strings, and let k ≥ 1 be a constant such that T (n) = O(n^k) and the length of the certificate is O(n^k). (The running time of A is actually a polynomial in the total input size, which includes both an

input string and a certificate, but since the length of the certificate is

polynomial in the length n of the input string, the running time is polynomial in n.)


Figure 34.9 The sequence of configurations produced by an algorithm A running on an input x and certificate y. Each configuration represents the state of the computer for one step of the computation and, besides A, x, and y, includes the program counter (PC), auxiliary machine state, and working storage. Except for the certificate y, the initial configuration c 0 is constant. A boolean combinational circuit M maps each configuration to the next configuration. The output is a distinguished bit in the working storage.

The basic idea of the proof is to represent the computation of A as a

sequence of configurations. As Figure 34.9 illustrates, consider each configuration as comprising a few parts: the program for A, the

program counter and auxiliary machine state, the input x, the certificate

y, and working storage. The combinational circuit M, which implements

the computer hardware, maps each configuration c_i to the next configuration c_{i+1}, starting from the initial configuration c_0. Algorithm A writes its output, 0 or 1, to some designated location by the time it finishes executing. After A halts, the output value never changes. Thus, if the algorithm runs for at most T (n) steps, the output appears as one of the bits in c_{T(n)}.

The reduction algorithm F constructs a single combinational circuit

that computes all configurations produced by a given initial

configuration. The idea is to paste together T ( n) copies of the circuit M.

The output of the ith circuit, which produces configuration c_i, feeds directly into the input of the (i+1)st circuit. Thus, the configurations,

rather than being stored in the computer’s memory, simply reside as

values on the wires connecting copies of M.
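Pasting together T(n) copies of M amounts to composing the one-step configuration map with itself. A toy sketch (here M is any one-step map on configurations, not real hardware):

```python
def unroll(M, c0, T):
    """Compose T copies of the one-step configuration map M, so that
    the result is c_T = M(M(...M(c0)...)), mirroring how the reduction
    chains copies of the hardware circuit."""
    c = c0
    for _ in range(T):
        c = M(c)
    return c
```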

Recall what the polynomial-time reduction algorithm F must do.

Given an input x, it must compute a circuit C = f ( x) that is satisfiable if and only if there exists a certificate y such that A( x, y) = 1. When F

obtains an input x, it first computes n = |x| and constructs a combinational circuit C′ consisting of T (n) copies of M. The input to C′ is an initial configuration corresponding to a computation on A(x, y), and the output is the configuration c_{T(n)}.

Algorithm F modifies circuit C′ slightly to construct the circuit C = f ( x). First, it wires the inputs to C′ corresponding to the program for A, the initial program counter, the input x, and the initial state of memory

directly to these known values. Thus, the only remaining inputs to the

circuit correspond to the certificate y. Second, it ignores all outputs from C′, except for the one bit of c_{T(n)} corresponding to the output of A. This circuit C, so constructed, computes C(y) = A(x, y) for any input y of length O(n^k). The reduction algorithm F, when provided an input string x, computes such a circuit C and outputs it.

We need to prove two properties. First, we must show that F

correctly computes a reduction function f. That is, we must show that C

is satisfiable if and only if there exists a certificate y such that A( x, y) =

1. Second, we must show that F runs in polynomial time.

To show that F correctly computes a reduction function, suppose that there exists a certificate y of length O(n^k) such that A(x, y) = 1.

Then, upon applying the bits of y to the inputs of C, the output of C is C( y) = A( x, y) = 1. Thus, if a certificate exists, then C is satisfiable. For the other direction, suppose that C is satisfiable. Hence, there exists an

input y to C such that C( y) = 1, from which we conclude that A( x, y) =

1. Thus, F correctly computes a reduction function.

To complete the proof sketch, we need to show that F runs in time

polynomial in n = | x|. First, the number of bits required to represent a configuration is polynomial in n. Why? The program for A itself has constant size, independent of the length of its input x. The length of the

input x is n, and the length of the certificate y is O(nᵏ). Since the algorithm runs for at most O(nᵏ) steps, the amount of working storage

required by A is polynomial in n as well. (We implicitly assume that this memory is contiguous. Exercise 34.3-5 asks you to extend the argument

to the situation in which the locations accessed by A are scattered across

a much larger region of memory and the particular pattern of scattering

can differ for each input x.)

The combinational circuit M implementing the computer hardware

has size polynomial in the length of a configuration, which is O(nᵏ), and hence, the size of M is polynomial in n. (Most of this circuitry implements the logic of the memory system.) The circuit C consists of

O(nᵏ) copies of M, and hence it has size polynomial in n. The reduction algorithm F can construct C from x in polynomial time, since each step of the construction takes polynomial time.

The language CIRCUIT-SAT is therefore at least as hard as any

language in NP, and since it belongs to NP, it is NP-complete.

Theorem 34.7

The circuit-satisfiability problem is NP-complete.

Proof Immediate from Lemmas 34.5 and 34.6 and from the definition

of NP-completeness.

Exercises

34.3-1

Verify that the circuit in Figure 34.8(b) is unsatisfiable.

34.3-2

Show that the ≤P relation is a transitive relation on languages. That is,

show that if L 1 ≤P L 2 and L 2 ≤P L 3, then L 1 ≤P L 3.

34.3-3

Prove that L ≤P L̄ if and only if L̄ ≤P L.

34.3-4

Show that an alternative proof of Lemma 34.5 can use a satisfying

assignment as a certificate. Which certificate makes for an easier proof?

34.3-5

The proof of Lemma 34.6 assumes that the working storage for

algorithm A occupies a contiguous region of polynomial size. Where

does the proof exploit this assumption? Argue that this assumption does

not involve any loss of generality.

34.3-6

A language L is complete for a language class C with respect to polynomial-time reductions if L ∈ C and L′ ≤P L for all L′ ∈ C. Show that Ø and {0, 1}* are the only languages in P that are not complete for

P with respect to polynomial-time reductions.

34.3-7

Show that, with respect to polynomial-time reductions (see Exercise

34.3-6), L is complete for NP if and only if L̄ is complete for co-NP.

34.3-8

The reduction algorithm F in the proof of Lemma 34.6 constructs the

circuit C = f ( x) based on knowledge of x, A, and k. Professor Sartre

observes that the string x is input to F, but only the existence of A, k, and the constant factor implicit in the O(nᵏ) running time is known to F

(since the language L belongs to NP), not their actual values. Thus, the

professor concludes that F cannot possibly construct the circuit C and

that the language CIRCUIT-SAT is not necessarily NP-hard. Explain

the flaw in the professor’s reasoning.

34.4 NP-completeness proofs

The proof that the circuit-satisfiability problem is NP-complete showed

directly that L ≤P CIRCUIT-SAT for every language L ∈ NP. This section shows how to prove that languages are NP-complete without

directly reducing every language in NP to the given language. We’ll

explore examples of this methodology by proving that various formula-

satisfiability problems are NP-complete. Section 34.5 provides many more examples.

The following lemma provides a foundation for showing that a given

language is NP-complete.

Lemma 34.8

If L is a language such that L′ ≤P L for some L′ ∈ NPC, then L is NP-hard. If, in addition, we have L ∈ NP, then L ∈ NPC.

Proof Since L′ is NP-complete, for all L″ ∈ NP, we have L″ ≤P L′. By supposition, we have L′ ≤P L, and thus by transitivity (Exercise 34.3-2), we have L″ ≤P L, which shows that L is NP-hard. If L ∈ NP, we also have L ∈ NPC.

In other words, by reducing a known NP-complete language L′ to L,

we implicitly reduce every language in NP to L. Thus, Lemma 34.8

provides a method for proving that a language L is NP-complete:

1. Prove L ∈ NP.

2. Prove that L is NP-hard:

a. Select a known NP-complete language L′.

b. Describe an algorithm that computes a function f mapping

every instance x ∈ {0, 1}* of L′ to an instance f ( x) of L.

c. Prove that the function f satisfies xL′ if and only if f ( x) ∈

L for all x ∈ {0, 1}*.

d. Prove that the algorithm computing f runs in polynomial time.

This methodology of reducing from a single known NP-complete

language is far simpler than the more complicated process of showing

directly how to reduce from every language in NP. Proving CIRCUIT-

SAT ∈ NPC furnishes a starting point. Knowing that the circuit-

satisfiability problem is NP-complete makes it much easier to prove that

other problems are NP-complete. Moreover, as the catalog of known

NP-complete problems grows, so will the choices for languages from

which to reduce.

Formula satisfiability

To illustrate the reduction methodology, let’s see an NP-completeness

proof for the problem of determining whether a boolean formula, not a

circuit, is satisfiable. This problem has the historical honor of being the

first problem ever shown to be NP-complete.

We formulate the (formula) satisfiability problem in terms of the

language SAT as follows. An instance of SAT is a boolean formula ϕ

composed of

1. n boolean variables: x 1, x 2, … , xn;

2. m boolean connectives: any boolean function with one or two

inputs and one output, such as ∧ (AND), ∨ (OR), ¬ (NOT), →

(implication), ↔ (if and only if); and

3. parentheses. (Without loss of generality, assume that there are no

redundant parentheses, i.e., a formula contains at most one pair

of parentheses per boolean connective.)


We can encode a boolean formula ϕ in a length that is polynomial in n

+ m. As in boolean combinational circuits, a truth assignment for a boolean formula ϕ is a set of values for the variables of ϕ, and a satisfying assignment is a truth assignment that causes it to evaluate to

1. A formula with a satisfying assignment is a satisfiable formula. The

satisfiability problem asks whether a given boolean formula is

satisfiable, which we can express in formal-language terms as

SAT = {〈 ϕ〉 : ϕ is a satisfiable boolean formula}.

As an example, the formula

ϕ = (( x 1 → x 2) ∨ ¬((¬ x 1 ↔ x 3) ∨ x 4)) ∧ ¬ x 2

has the satisfying assignment 〈 x 1 = 0, x 2 = 0, x 3 = 1, x 4 = 1〉, since

ϕ = ((0 → 0) ∨ ¬((¬0 ↔ 1) ∨ 1)) ∧ ¬0      (34.2)
  = (1 ∨ ¬(1 ∨ 1)) ∧ 1
  = (1 ∨ 0) ∧ 1
  = 1,

and thus this formula ϕ belongs to SAT.

The naive algorithm to determine whether an arbitrary boolean

formula is satisfiable does not run in polynomial time. A formula with n

variables has 2ⁿ possible assignments. If the length of 〈 ϕ〉 is polynomial in n, then checking every assignment requires Ω(2ⁿ) time, which is superpolynomial in the length of 〈 ϕ〉. As the following theorem shows, a

polynomial-time algorithm is unlikely to exist.
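To make the naive check concrete, here is a small sketch (our own encoding, not from the text) that represents the example formula ϕ as a Python function and tries all 2ⁿ assignments; the helper names are ours:

```python
from itertools import product

def phi(x1, x2, x3, x4):
    # ((x1 -> x2) OR NOT((NOT x1 <-> x3) OR x4)) AND NOT x2,
    # writing a -> b as (not a or b) and a <-> b as (a == b).
    return (((not x1) or x2) or not (((not x1) == x3) or x4)) and not x2

def naive_sat(f, n):
    # Try every one of the 2^n truth assignments -- exponential time.
    return any(f(*bits) for bits in product([False, True], repeat=n))
```

On this example, naive_sat(phi, 4) reports satisfiable, and the assignment 〈 x 1 = 0, x 2 = 0, x 3 = 1, x 4 = 1〉 given in the text indeed makes phi true.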

Theorem 34.9

Satisfiability of boolean formulas is NP-complete.

Proof We start by arguing that SAT ∈ NP. Then we prove that SAT is

NP-hard by showing that CIRCUIT-SAT ≤P SAT, which by Lemma

34.8 will prove the theorem.

To show that SAT belongs to NP, we show that a certificate

consisting of a satisfying assignment for an input formula ϕ can be


verified in polynomial time. The verifying algorithm simply replaces

each variable in the formula with its corresponding value and then

evaluates the expression, much as we did in equation (34.2) above. This

task can be done in polynomial time. If the expression evaluates to 1,

then the algorithm has verified that the formula is satisfiable. Thus, SAT

belongs to NP.

To prove that SAT is NP-hard, we show that CIRCUIT-SAT ≤P

SAT. In other words, we need to show how to reduce any instance of

circuit satisfiability to an instance of formula satisfiability in polynomial

time. We can use induction to express any boolean combinational circuit

as a boolean formula. We simply look at the gate that produces the

circuit output and inductively express each of the gate’s inputs as

formulas. We then obtain the formula for the circuit by writing an

expression that applies the gate’s function to its inputs’ formulas.

Figure 34.10 Reducing circuit satisfiability to formula satisfiability. The formula produced by the reduction algorithm has a variable for each wire in the circuit and a clause for each logic gate.

Unfortunately, this straightforward method does not amount to a

polynomial-time reduction. As Exercise 34.4-1 asks you to show, shared

subformulas—which arise from gates whose output wires have fan-out

of 2 or more—can cause the size of the generated formula to grow

exponentially. Thus, the reduction algorithm must be somewhat more

clever.

Figure 34.10 illustrates how to overcome this problem, using as an example the circuit from Figure 34.8(a). For each wire xi in the circuit C, the formula ϕ has a variable xi. To express how each gate operates,

construct a small formula involving the variables of its incident wires.

The formula has the form of an “if and only if” (↔), with the variable

for the gate’s output on the left and on the right a logical expression

encapsulating the gate’s function on its inputs. For example, the

operation of the output AND gate (the rightmost gate in the figure) is

x 10 ↔ ( x 7 ∧ x 8 ∧ x 9). We call each of these small formulas a clause.

The formula ϕ produced by the reduction algorithm is the AND of

the circuit-output variable with the conjunction of clauses describing

the operation of each gate. For the circuit in the figure, the formula is

ϕ = x10 ∧ ( x 4 ↔ ¬ x 3)

∧ ( x 5 ↔ ( x 1 ∨ x 2))

∧ ( x 6 ↔ ¬ x 4)

∧ ( x 7 ↔ ( x 1 ∧ x 2 ∧ x 4))

∧ ( x 8 ↔ ( x 5 ∨ x 6))

∧ ( x 9 ↔ ( x 6 ∨ x 7))

∧ ( x 10 ↔ ( x 7 ∧ x 8 ∧ x 9)).

Given a circuit C, it is straightforward to produce such a formula ϕ in polynomial time.
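As an illustrative sketch (our own encoding of the example, not the book's), the circuit can be stored as a gate table and the reduction's formula checked against a direct simulation:

```python
# Gate table for the example circuit of Figure 34.10: wires 1-3 are the
# inputs x1..x3, wires 4-10 are gate outputs, and wire 10 is the circuit
# output. Each entry maps an output wire to (gate type, input wires).
GATES = {4: ("NOT", [3]), 5: ("OR", [1, 2]), 6: ("NOT", [4]),
         7: ("AND", [1, 2, 4]), 8: ("OR", [5, 6]), 9: ("OR", [6, 7]),
         10: ("AND", [7, 8, 9])}

def gate(op, vals):
    # Evaluate one gate on its input values.
    return (not vals[0]) if op == "NOT" else all(vals) if op == "AND" else any(vals)

def simulate(x1, x2, x3):
    # Evaluate every wire of the circuit in topological (numeric) order.
    w = {1: x1, 2: x2, 3: x3}
    for out in sorted(GATES):
        op, ins = GATES[out]
        w[out] = gate(op, [w[i] for i in ins])
    return w

def phi(w):
    # The reduction's formula: the circuit-output variable x10 ANDed with
    # one "output <-> gate function" clause per gate.
    return w[10] and all(w[out] == gate(op, [w[i] for i in ins])
                         for out, (op, ins) in GATES.items())
```

A wire assignment derived from circuit inputs satisfies ϕ exactly when the circuit outputs 1; for instance, x1 = x2 = 1, x3 = 0 satisfies both.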

Why is the circuit C satisfiable exactly when the formula ϕ is satisfiable? If C has a satisfying assignment, then each wire of the circuit

has a well-defined value, and the output of the circuit is 1. Therefore,

when wire values are assigned to variables in ϕ, each clause of ϕ

evaluates to 1, and thus the conjunction of all evaluates to 1.

Conversely, if some assignment causes ϕ to evaluate to 1, the circuit C is satisfiable by an analogous argument. Thus, we have shown that

CIRCUIT-SAT ≤P SAT, which completes the proof.

3-CNF satisfiability

Reducing from formula satisfiability gives us an avenue to prove many problems NP-complete. The reduction algorithm must handle any input

formula, though, and this requirement can lead to a huge number of

cases to consider. Instead, it is usually simpler to reduce from a

restricted language of boolean formulas. Of course, the restricted

language must not be polynomial-time solvable. One convenient

language is 3-CNF satisfiability, or 3-CNF-SAT.

In order to define 3-CNF satisfiability, we first need to define a few

terms. A literal in a boolean formula is an occurrence of a variable (such

as x 1) or its negation (¬ x 1). A clause is the OR of one or more literals, such as x 1 ∨ ¬ x 2 ∨ ¬ x 3. A boolean formula is in conjunctive normal form, or CNF, if it is expressed as an AND of clauses, and it’s in 3-conjunctive normal form, or 3-CNF, if each clause contains exactly three distinct literals.

For example, the boolean formula

( x 1 ∨ ¬ x 1 ∨ ¬ x 2) ∧ ( x 3 ∨ x 2 ∨ x 4) ∧ (¬ x 1 ∨ ¬ x 3 ∨ ¬ x 4) is in 3-CNF. The first of its three clauses is ( x 1 ∨ ¬ x 1 ∨ ¬ x 2), which contains the three literals x 1, ¬ x 1, and ¬ x 2.

The language 3-CNF-SAT consists of encodings of boolean

formulas in 3-CNF that are satisfiable. The following theorem shows

that a polynomial-time algorithm that can determine the satisfiability of

boolean formulas is unlikely to exist, even when they are expressed in

this simple normal form.

Theorem 34.10

Satisfiability of boolean formulas in 3-conjunctive normal form is NP-

complete.

Proof The argument from the proof of Theorem 34.9 to show that SAT

∈ NP applies equally well here to show that 3-CNF-SAT ∈ NP. By

Lemma 34.8, therefore, we need only show that SAT ≤P 3-CNF-SAT.


Figure 34.11 The tree corresponding to the formula ϕ = (( x 1 → x 2)∨¬((¬ x 1 ↔ x 3)∨ x 4))∧¬ x 2.

We break the reduction algorithm into three basic steps. Each step

progressively transforms the input formula ϕ closer to the desired 3-

conjunctive normal form.

The first step is similar to the one used to prove CIRCUIT-SAT ≤P

SAT in Theorem 34.9. First, construct a binary “parse” tree for the

input formula ϕ, with literals as leaves and connectives as internal

nodes. Figure 34.11 shows such a parse tree for the formula

ϕ = (( x 1 → x 2) ∨ ¬((¬ x 1 ↔ x 3) ∨ x 4)) ∧ ¬ x 2.      (34.3)

If the input formula contains a clause such as the OR of several literals,

use associativity to parenthesize the expression fully so that every

internal node in the resulting tree has just one or two children. The

binary parse tree is like a circuit for computing the function.

Mimicking the reduction in the proof of Theorem 34.9, introduce a

variable yi for the output of each internal node. Then rewrite the

original formula ϕ as the AND of the variable at the root of the parse

tree and a conjunction of clauses describing the operation of each node.

For the formula (34.3), the resulting expression is

ϕ′ = y 1
  ∧ ( y 1 ↔ ( y 2 ∧ ¬ x 2))
  ∧ ( y 2 ↔ ( y 3 ∨ y 4))
  ∧ ( y 3 ↔ ( x 1 → x 2))
  ∧ ( y 4 ↔ ¬ y 5)
  ∧ ( y 5 ↔ ( y 6 ∨ x 4))
  ∧ ( y 6 ↔ (¬ x 1 ↔ x 3)).
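On this small example, the equisatisfiability of ϕ and ϕ′ can be confirmed by brute force; the sketch below uses our own Python encodings of the two formulas:

```python
from itertools import product

def phi(x1, x2, x3, x4):
    # ((x1 -> x2) OR NOT((NOT x1 <-> x3) OR x4)) AND NOT x2
    return (((not x1) or x2) or not (((not x1) == x3) or x4)) and not x2

def phi_prime(y1, y2, y3, y4, y5, y6, x1, x2, x3, x4):
    # y1 ANDed with one "output <-> connective" clause per parse-tree node.
    return (y1 and (y1 == (y2 and not x2)) and (y2 == (y3 or y4))
            and (y3 == ((not x1) or x2)) and (y4 == (not y5))
            and (y5 == (y6 or x4)) and (y6 == ((not x1) == x3)))

def satisfiable(f, n):
    # Exhaustive check over all 2^n assignments (fine at this size).
    return any(f(*bits) for bits in product([False, True], repeat=n))
```

Both formulas turn out satisfiable, as the first step of the reduction guarantees.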

Figure 34.12 The truth table for the clause ( y 1 ↔ ( y 2 ∧ ¬ x 2)).

The formula ϕ′ thus obtained is a conjunction of clauses ϕ′ᵢ, each of which has at most three literals. These clauses are not yet ORs of three literals.

The second step of the reduction converts each clause ϕ′ᵢ into conjunctive normal form. Construct a truth table for ϕ′ᵢ by evaluating all possible assignments to its variables. Each row of the truth table consists of a possible assignment of the variables of the clause, together with the value of the clause under that assignment. Using the truth-table entries that evaluate to 0, build a formula in disjunctive normal form (or DNF)—an OR of ANDs—that is equivalent to ¬ϕ′ᵢ. Then negate this formula

and convert it into a CNF formula by using DeMorgan’s laws for

propositional logic,

¬( ab) = ¬ a ∨ ¬ b,

¬( ab) = ¬ a ∧ ¬ b,


to complement all literals, change ORs into ANDs, and change ANDs

into ORs.

In our example, the clause ϕ′₁ = ( y 1 ↔ ( y 2 ∧ ¬ x 2)) converts into CNF as follows. The truth table for ϕ′₁ appears in Figure 34.12. The DNF formula equivalent to ¬ϕ′₁ is

( y 1 ∧ y 2 ∧ x 2) ∨ ( y 1 ∧ ¬ y 2 ∧ x 2) ∨ ( y 1 ∧ ¬ y 2 ∧ ¬ x 2) ∨ (¬ y 1 ∧ y 2 ∧ ¬ x 2).

Negating and applying DeMorgan’s laws yields the CNF formula

ϕ″₁ = (¬ y 1 ∨ ¬ y 2 ∨ ¬ x 2) ∧ (¬ y 1 ∨ y 2 ∨ ¬ x 2) ∧ (¬ y 1 ∨ y 2 ∨ x 2) ∧ ( y 1 ∨ ¬ y 2 ∨ x 2),

which is equivalent to the original clause ϕ′₁.
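The truth-table conversion can be sketched mechanically (our own helper names; a literal is a (variable, polarity) pair):

```python
from itertools import product

def clause_to_cnf(f, names):
    # For each truth-table row on which f evaluates to 0, emit one
    # OR-clause (the negated row) that forbids exactly that row.
    cnf = []
    for row in product([False, True], repeat=len(names)):
        if not f(*row):
            cnf.append([(n, not v) for n, v in zip(names, row)])
    return cnf

def eval_cnf(cnf, env):
    # A literal (n, pol) holds when env[n] == pol; every clause must hold.
    return all(any(env[n] == pol for n, pol in cl) for cl in cnf)
```

For the clause y 1 ↔ ( y 2 ∧ ¬ x 2), this procedure yields four 3-literal clauses, matching the CNF obtained by negating the DNF and applying DeMorgan’s laws.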

At this point, each clause ϕ′ᵢ of the formula ϕ′ has been converted into a CNF formula ϕ″ᵢ, and thus ϕ′ is equivalent to the CNF formula ϕ″ consisting of the conjunction of the ϕ″ᵢ. Moreover, each clause of ϕ″ has at most three literals.

The third and final step of the reduction further transforms the

formula so that each clause has exactly three distinct literals. From the

clauses of the CNF formula ϕ″, construct the final 3-CNF formula ϕ‴.

This formula also uses two auxiliary variables, p and q. For each clause Ci of ϕ″, include the following clauses in ϕ‴:

If Ci contains three distinct literals, then simply include Ci as a

clause of ϕ‴.

If Ci contains exactly two distinct literals, that is, if Ci = ( l 1 ∨ l 2), where l 1 and l 2 are literals, then include ( l 1 ∨ l 2 ∨ p) ∧ ( l 1 ∨ l 2 ∨

¬ p) as clauses of ϕ‴. The literals p and ¬ p merely fulfill the syntactic requirement that each clause of ϕ‴ contain exactly three

distinct literals. Whether p = 0 or p = 1, one of the clauses is equivalent to l 1 ∨ l 2, and the other evaluates to 1, which is the

identity for AND.

If Ci contains just one distinct literal l, then include ( lpq)∧( l

p ∨ ¬ q) ∧ ( l ∨ ¬ pq) ∧ ( l ∨ ¬ p ∨ ¬ q) as clauses of ϕ‴.

Regardless of the values of p and q, one of the four clauses is equivalent to l, and the other three evaluate to 1.
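The three cases can be sketched as follows (our own encoding; literals are (name, polarity) pairs and p, q are the fresh padding variables):

```python
def pad_to_three(clauses):
    # Expand 1- and 2-literal CNF clauses into equivalent sets of exactly
    # 3-literal clauses using the auxiliary variables p and q.
    P, NP, Q, NQ = ("p", True), ("p", False), ("q", True), ("q", False)
    out = []
    for c in clauses:
        if len(c) == 3:
            out.append(list(c))
        elif len(c) == 2:
            out += [list(c) + [P], list(c) + [NP]]
        else:  # a single literal l
            out += [list(c) + [P, Q], list(c) + [P, NQ],
                    list(c) + [NP, Q], list(c) + [NP, NQ]]
    return out
```

Whatever truth values p and q take, the padded clauses collapse back to the originals, so satisfiability is preserved.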

We can see that the 3-CNF formula ϕ‴ is satisfiable if and only if ϕ is satisfiable by inspecting each of the three steps. Like the reduction from

CIRCUIT-SAT to SAT, the construction of ϕ′ from ϕ in the first step

preserves satisfiability. The second step produces a CNF formula ϕ″

that is algebraically equivalent to ϕ′. Then the third step produces a 3-

CNF formula ϕ‴ that is effectively equivalent to ϕ″, since any assignment to the variables p and q produces a formula that is algebraically equivalent to ϕ″.

We must also show that the reduction can be computed in

polynomial time. Constructing ϕ′ from ϕ introduces at most one variable and one clause per connective in ϕ. Constructing ϕ″ from ϕ

can introduce at most eight clauses into ϕ″ for each clause from ϕ′, since each clause of ϕ′ contains at most three variables, and the truth

table for each clause has at most 2³ = 8 rows. The construction of ϕ‴

from ϕ″ introduces at most four clauses into ϕ‴ for each clause of ϕ″.

Thus the size of the resulting formula ϕ‴ is polynomial in the length of

the original formula. Each of the constructions can be accomplished in

polynomial time.

Exercises

34.4-1

Consider the straightforward (nonpolynomial-time) reduction in the

proof of Theorem 34.9. Describe a circuit of size n that, when converted

to a formula by this method, yields a formula whose size is exponential

in n.

34.4-2

Show the 3-CNF formula that results upon using the method of Theorem 34.10 on the formula (34.3).

34.4-3

Professor Jagger proposes to show that SAT ≤P 3-CNF-SAT by using

only the truth-table technique in the proof of Theorem 34.10, and not

the other steps. That is, the professor proposes to take the boolean

formula ϕ, form a truth table for its variables, derive from the truth table a formula in 3-DNF that is equivalent to ¬ ϕ, and then negate and

apply DeMorgan’s laws to produce a 3-CNF formula equivalent to ϕ.

Show that this strategy does not yield a polynomial-time reduction.

34.4-4

Show that the problem of determining whether a boolean formula is a

tautology is complete for co-NP. ( Hint: See Exercise 34.3-7.)

34.4-5

Show that the problem of determining the satisfiability of boolean

formulas in disjunctive normal form is polynomial-time solvable.

34.4-6

Someone gives you a polynomial-time algorithm to decide formula

satisfiability. Describe how to use this algorithm to find satisfying

assignments in polynomial time.

34.4-7

Let 2-CNF-SAT be the set of satisfiable boolean formulas in CNF with

exactly two literals per clause. Show that 2-CNF-SAT ∈ P. Make your

algorithm as efficient as possible. ( Hint: Observe that xy is equivalent to ¬ xy. Reduce 2-CNF-SAT to an efficiently solvable problem on a

directed graph.)

34.5 NP-complete problems

NP-complete problems arise in diverse domains: boolean logic, graphs,

arithmetic, network design, sets and partitions, storage and retrieval,


sequencing and scheduling, mathematical programming, algebra and

number theory, games and puzzles, automata and language theory,

program optimization, biology, chemistry, physics, and more. This

section uses the reduction methodology to provide NP-completeness

proofs for a variety of problems drawn from graph theory and set

partitioning.

Figure 34.13 The structure of NP-completeness proofs in Sections 34.4 and 34.5. All proofs ultimately follow by reduction from the NP-completeness of CIRCUIT-SAT.

Figure 34.13 outlines the structure of the NP-completeness proofs in this section and Section 34.4. We prove each language in the figure to be NP-complete by reduction from the language that points to it. At the

root is CIRCUIT-SAT, which we proved NP-complete in Theorem 34.7.

This section concludes with a recap of reduction strategies.

34.5.1 The clique problem

A clique in an undirected graph G = ( V, E) is a subset V′ ⊆ V of vertices, each pair of which is connected by an edge in E. In other words, a clique is a complete subgraph of G. The size of a clique is the number of vertices it contains. The clique problem is the optimization problem of finding a clique of maximum size in a graph. The


corresponding decision problem asks simply whether a clique of a given

size k exists in the graph. The formal definition is

CLIQUE = {〈 G, k〉 : G is a graph containing a clique of size k}.

A naive algorithm for determining whether a graph G = ( V, E) with

| V| vertices contains a clique of size k lists all k-subsets of V and checks each one to see whether it forms a clique. The running time of this

algorithm is O(k² (|V| choose k)), which is polynomial if k is a constant. In general,

however, k could be near | V|/2, in which case the algorithm runs in superpolynomial time. Indeed, an efficient algorithm for the clique

problem is unlikely to exist.
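The naive procedure can be sketched directly (our own helper names), making the (|V| choose k) blow-up visible:

```python
from itertools import combinations

def has_clique(vertices, edges, k):
    # Enumerate every k-subset of V (there are C(|V|, k) of them) and
    # test all O(k^2) pairs inside each subset for an edge.
    E = {frozenset(e) for e in edges}
    return any(all(frozenset((u, v)) in E for u, v in combinations(S, 2))
               for S in combinations(vertices, k))
```

For constant k this runs in polynomial time, but for k near |V|/2 the number of subsets is superpolynomial.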

Theorem 34.11

The clique problem is NP-complete.

Proof First, we show that CLIQUE ∈ NP. For a given graph G = ( V, E), use the set V′ ⊆ V of vertices in the clique as a certificate for G. To check whether V′ is a clique in polynomial time, check whether, for each

pair u, vV′, the edge ( u, v) belongs to E.

We next prove that 3-CNF-SAT ≤P CLIQUE, which shows that the

clique problem is NP-hard. You might be surprised that the proof

reduces an instance of 3-CNF-SAT to an instance of CLIQUE, since on

the surface logical formulas seem to have little to do with graphs.

The reduction algorithm begins with an instance of 3-CNF-SAT. Let

ϕ = C 1 ∧ C 2 ∧ ⋯ ∧ Ck be a boolean formula in 3-CNF with k clauses.

For r = 1, 2, … , k, each clause Cr contains exactly three distinct literals: l₁ʳ, l₂ʳ, and l₃ʳ. We will construct a graph G such that ϕ is satisfiable if and only if G contains a clique of size k.

We construct the undirected graph G = ( V, E) as follows. For each clause Cr = ( l₁ʳ ∨ l₂ʳ ∨ l₃ʳ) in ϕ, place a triple of vertices v₁ʳ, v₂ʳ, and v₃ʳ into V. Add edge ( vᵢʳ, vⱼˢ) into E if both of the following hold:

vᵢʳ and vⱼˢ are in different triples, that is, r ≠ s, and

their corresponding literals are consistent, that is, lᵢʳ is not the negation of lⱼˢ.

We can build this graph from ϕ in polynomial time. As an example of

this construction, if

ϕ = ( x 1 ∨ ¬ x 2 ∨ ¬ x 3) ∧ (¬ x 1 ∨ x 2 ∨ x 3) ∧ ( x 1 ∨ x 2 ∨ x 3), then G is the graph shown in Figure 34.14.
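The construction is short enough to sketch (our own encoding: a clause is a list of (variable, polarity) literals, and a vertex is a (clause index, literal index) pair):

```python
from itertools import combinations

def reduce_3cnf_to_clique(clauses):
    # One vertex per literal occurrence; join two vertices when they lie
    # in different triples and their literals are not complementary.
    verts = [(r, i) for r, cl in enumerate(clauses) for i in range(len(cl))]
    edges = set()
    for (r, i), (s, j) in combinations(verts, 2):
        if r != s:
            (va, pa), (vb, pb) = clauses[r][i], clauses[s][j]
            if not (va == vb and pa != pb):
                edges.add(frozenset(((r, i), (s, j))))
    return verts, edges
```

For the example formula above, the resulting 9-vertex graph contains a clique of size 3, mirroring Figure 34.14.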

We must show that this transformation of ϕ into G is a reduction.

First, suppose that ϕ has a satisfying assignment. Then each clause Cr

contains at least one literal that is assigned 1, and each such literal

corresponds to a vertex vᵢʳ. Picking one such “true” literal from each

clause yields a set V′ of k vertices. We claim that V′ is a clique. For any two vertices vᵢʳ, vⱼˢ ∈ V′, where r ≠ s, both corresponding literals lᵢʳ and lⱼˢ map to 1 by the given satisfying assignment, and thus the literals cannot be complements. Thus, by the construction of G, the edge ( vᵢʳ, vⱼˢ) belongs to E.

Conversely, suppose that G contains a clique V′ of size k. No edges in G connect vertices in the same triple, and so V′ contains exactly one vertex per triple. If vᵢʳ ∈ V′, then assign 1 to the corresponding literal lᵢʳ.

Since G contains no edges between inconsistent literals, no literal and its

complement are both assigned 1. Each clause is satisfied, and so ϕ is satisfied. (Any variables that do not correspond to a vertex in the clique

may be set arbitrarily.)


Figure 34.14 The graph G derived from the 3-CNF formula ϕ = C 1 ∧ C 2 ∧ C 3, where C 1 = ( x 1

∨ ¬ x 2 ∨ ¬ x 3), C 2 = (¬ x 1 ∨ x 2 ∨ x 3), and C 3 = ( x 1 ∨ x 2 ∨ x 3), in reducing 3-CNF-SAT to CLIQUE. A satisfying assignment of the formula has x 2 = 0, x 3 = 1, and x 1 set to either 0 or 1.

This assignment satisfies C 1 with ¬ x 2, and it satisfies C 2 and C 3 with x 3, corresponding to the clique with blue vertices.

In the example of Figure 34.14, a satisfying assignment of ϕ has x 2 =

0 and x 3 = 1. A corresponding clique of size k = 3 consists of the vertices corresponding to ¬ x 2 from the first clause, x 3 from the second clause, and x 3 from the third clause. Because the clique contains no vertices corresponding to either x 1 or ¬ x 1, this satisfying assignment can set x 1 to either 0 or 1.

The proof of Theorem 34.11 reduced an arbitrary instance of 3-

CNF-SAT to an instance of CLIQUE with a particular structure. You

might think that we have shown only that CLIQUE is NP-hard in

graphs in which the vertices are restricted to occur in triples and in

which there are no edges between vertices in the same triple. Indeed, we

have shown that CLIQUE is NP-hard only in this restricted case, but

this proof suffices to show that CLIQUE is NP-hard in general graphs.

Why? If there were a polynomial-time algorithm that solves CLIQUE

on general graphs, it would also solve CLIQUE on restricted graphs.


The opposite approach—reducing instances of 3-CNF-SAT with a

special structure to general instances of CLIQUE—does not suffice,

however. Why not? Perhaps the instances of 3-CNF-SAT that we choose

to reduce from are “easy,” and so we would not have reduced an NP-

hard problem to CLIQUE.

Moreover, the reduction uses the instance of 3-CNF-SAT, but not

the solution. We would have erred if the polynomial-time reduction had

relied on knowing whether the formula ϕ is satisfiable, since we do not

know how to decide whether ϕ is satisfiable in polynomial time.

Figure 34.15 Reducing CLIQUE to VERTEX-COVER. (a) An undirected graph G = ( V, E) with clique V′ = { u, v, x, y}, shown in blue. (b) The graph Ḡ produced by the reduction algorithm that has vertex cover V − V′ = { w, z}, in blue.

34.5.2 The vertex-cover problem

A vertex cover of an undirected graph G = ( V, E) is a subset V′ ⊆ V

such that if ( u, v) ∈ E, then uV′ or vV′ (or both). That is, each vertex “covers” its incident edges, and a vertex cover for G is a set of

vertices that covers all the edges in E. The size of a vertex cover is the number of vertices in it. For example, the graph in Figure 34.15(b) has a vertex cover { w, z} of size 2.

The vertex-cover problem is to find a vertex cover of minimum size in

a given graph. For this optimization problem, the corresponding

decision problem asks whether a graph has a vertex cover of a given size

k. As a language, we define

VERTEX-COVER = {〈 G, k〉 : graph G has a vertex cover of size k}.

The following theorem shows that this problem is NP-complete.

Theorem 34.12

The vertex-cover problem is NP-complete.

Proof We first show that VERTEX-COVER ∈ NP. Given a graph G =

( V, E) and an integer k, the certificate is the vertex cover V′ ⊆ V itself.

The verification algorithm affirms that | V′| = k, and then it checks, for each edge ( u, v) ∈ E, that uV′ or vV′. It is easy to verify the certificate in polynomial time.

To prove that the vertex-cover problem is NP-hard, we reduce from

the clique problem, showing that CLIQUE ≤P VERTEX-COVER. This

reduction relies on the notion of the complement of a graph. Given an

undirected graph G = ( V, E), we define the complement of G as a graph Ḡ = ( V, Ē), where Ē = {( u, v) : u, v ∈ V, u ≠ v, and ( u, v) ∉ E}. In other words, Ḡ is the graph containing exactly those edges that are not in G.

Figure 34.15 shows a graph and its complement and illustrates the reduction from CLIQUE to VERTEX-COVER.

The reduction algorithm takes as input an instance 〈 G, k〉 of the

clique problem and computes the complement Ḡ in polynomial time.

The output of the reduction algorithm is the instance 〈 Ḡ, | V| − k〉 of the vertex-cover problem. To complete the proof, we show that this

transformation is indeed a reduction: the graph G contains a clique of

size k if and only if the graph Ḡ has a vertex cover of size | V| − k.

Suppose that G contains a clique V′ ⊆ V with | V′| = k. We claim that V − V′ is a vertex cover in Ḡ. Let ( u, v) be any edge in Ē. Then, ( u, v) ∉ E, which implies that at least one of u or v does not belong to V′, since every pair of vertices in V′ is connected by an edge of E. Equivalently, at least one of u or v belongs to V − V′, which means that edge ( u, v) is covered by V − V′. Since ( u, v) was chosen arbitrarily from Ē, every edge of Ē is covered by a vertex in V − V′. Hence the set V − V′, which has size | V| − k, forms a vertex cover for Ḡ.

Conversely, suppose that Ḡ has a vertex cover V′ ⊆ V, where | V′| = | V| − k. Then for all u, v ∈ V, if ( u, v) ∈ Ē, then u ∈ V′ or v ∈ V′ or both. The contrapositive of this implication is that for all u, v ∈ V, if u ∉ V′ and v ∉ V′, then ( u, v) ∈ E. In other words, V − V′ is a clique, and it has size | V| − | V′| = k.
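The reduction itself is a one-liner over the edge set; this sketch (our own helper names) builds the complement and checks the cover/clique correspondence:

```python
from itertools import combinations

def reduce_clique_to_vertex_cover(vertices, edges, k):
    # Map the CLIQUE instance <G, k> to <complement(G), |V| - k>.
    E = {frozenset(e) for e in edges}
    comp = {frozenset(p) for p in combinations(vertices, 2)
            if frozenset(p) not in E}
    return comp, len(vertices) - k

def is_vertex_cover(edges, cover):
    # Every edge must have at least one endpoint in the cover.
    return all(any(u in cover for u in e) for e in edges)
```

On a triangle {1, 2, 3} plus an isolated vertex 4, the clique {1, 2, 3} corresponds to the cover {4} of size |V| − 3 = 1 in the complement.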

Since VERTEX-COVER is NP-complete, we don’t expect to find a

polynomial-time algorithm for finding a minimum-size vertex cover.

Section 35.1 presents a polynomial-time “approximation algorithm,”

however, which produces “approximate” solutions for the vertex-cover

problem. The size of a vertex cover produced by the algorithm is at

most twice the minimum size of a vertex cover.

Thus, you shouldn’t give up hope just because a problem is NP-

complete. You might be able to design a polynomial-time

approximation algorithm that obtains near-optimal solutions, even

though finding an optimal solution is NP-complete. Chapter 35 gives several approximation algorithms for NP-complete problems.

34.5.3 The hamiltonian-cycle problem

We now return to the hamiltonian-cycle problem defined in Section

34.2.

Theorem 34.13

The hamiltonian-cycle problem is NP-complete.

Figure 34.16 The gadget used in reducing the vertex-cover problem to the hamiltonian-cycle problem. An edge ( u, v) of graph G corresponds to gadget Γ uv in the graph G′ created in the reduction. (a) The gadget, with individual vertices labeled. (b)–(d) The paths highlighted in blue are the only possible ones through the gadget that include all vertices, assuming that the only connections from the gadget to the remainder of G′ are through vertices [ u, v, 1], [ u, v, 6], [ v, u, 1], and [ v, u, 6].

Proof We first show that HAM-CYCLE ∈ NP. Given an undirected graph G = ( V, E), the certificate is the sequence of | V| vertices that makes up the hamiltonian cycle. The verification algorithm checks that

this sequence contains each vertex in V exactly once and that with the

first vertex repeated at the end, it forms a cycle in G. That is, it checks

that there is an edge between each pair of consecutive vertices and

between the first and last vertices. This certificate can be verified in

polynomial time.
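The verification algorithm described here is only a few lines; the sketch below uses our own helper names:

```python
def verify_ham_cycle(vertices, edges, cert):
    # The certificate must list every vertex exactly once, and each
    # consecutive pair (wrapping around to the start) must be an edge.
    E = {frozenset(e) for e in edges}
    if sorted(cert) != sorted(vertices):
        return False
    return all(frozenset((cert[i], cert[(i + 1) % len(cert)])) in E
               for i in range(len(cert)))
```

Both checks take time polynomial in the size of the graph and the certificate.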

We now prove that VERTEX-COVER ≤P HAM-CYCLE, which

shows that HAM-CYCLE is NP-complete. Given an undirected graph

G = ( V, E) and an integer k, we construct an undirected graph G′ = ( V′, E′) that has a hamiltonian cycle if and only if G has a vertex cover of size k. We assume without loss of generality that G contains no isolated vertices (that is, every vertex in V has at least one incident edge) and that k ≤ | V|. (If an isolated vertex belongs to a vertex cover of size k, then there also exists a vertex cover of size k − 1, and for any graph, the

entire set V is always a vertex cover.)

Our construction uses a gadget, which is a piece of a graph that

enforces certain properties. Figure 34.16(a) shows the gadget we use.

For each edge ( u, v) ∈ E, the constructed graph G′ contains one copy of this gadget, which we denote by Γ uv. We denote each vertex in Γ uv by

[ u, v, i] or [ v, u, i], where 1 ≤ i ≤ 6, so that each gadget Γ uv contains 12

vertices. Gadget Γ uv also contains the 14 edges shown in Figure

34.16(a).

Along with the internal structure of the gadget, we enforce the

properties we want by limiting the connections between the gadget and

the remainder of the graph G′ that we construct. In particular, only vertices [ u, v, 1], [ u, v, 6], [ v, u, 1], and [ v, u, 6] will have edges incident from outside Γ uv. Any hamiltonian cycle of G′ must traverse the edges

of Γ uv in one of the three ways shown in Figures 34.16(b)–(d). If the cycle enters through vertex [ u, v, 1], it must exit through vertex [ u, v, 6], and it either visits all 12 of the gadget’s vertices (Figure 34.16(b)) or the six vertices [ u, v, 1] through [ u, v, 6] (Figure 34.16(c)). In the latter case, the cycle will have to reenter the gadget to visit vertices [ v, u, 1] through


[ v, u, 6]. Similarly, if the cycle enters through vertex [ v, u, 1], it must exit through vertex [ v, u, 6], and either it visits all 12 of the gadget’s vertices (Figure 34.16(d)) or it visits the six vertices [ v, u, 1] through [ v, u, 6] and reenters to visit [ u, v, 1] through [ u, v, 6] (Figure 34.16(c)). No other paths through the gadget that visit all 12 vertices are possible. In

particular, it is impossible to construct two vertex-disjoint paths, one of

which connects [ u, v, 1] to [ v, u, 6] and the other of which connects [ v, u, 1] to [ u, v, 6], such that the union of the two paths contains all of the

gadget’s vertices.

The only vertices in V′ other than those of gadgets are selector vertices s 1, s 2, … , sk. We’ll use edges incident on selector vertices in G′ to select the k vertices of the cover in G.

In addition to the edges in gadgets, E′ contains two other types of

edges, which Figure 34.17 shows. First, for each vertex u ∈ V, edges join pairs of gadgets in order to form a path containing all gadgets corresponding to edges incident on u in G. We arbitrarily order the vertices adjacent to each vertex u ∈ V as u(1), u(2), … , u(degree( u)), where degree( u) is the number of vertices adjacent to u. To create a path in G′ through all the gadgets corresponding to edges incident on u, E′ contains the edges {([ u, u( i), 6], [ u, u( i+1), 1]) : 1 ≤ i ≤ degree( u) − 1}. In Figure 34.17, for example, we order the vertices adjacent to w as 〈 x, y, z〉, and so graph G′ in part (b) of the figure includes the edges ([ w, x, 6], [ w, y, 1]) and ([ w, y, 6], [ w, z, 1]). The vertices adjacent to x are ordered as 〈 w, y〉, so that G′ includes the edge ([ x, w, 6], [ x, y, 1]). For each vertex u ∈ V, these edges in G′ fill in a path containing all gadgets corresponding to edges incident on u in G.

The intuition behind these edges is that if vertex u ∈ V belongs to the vertex cover of G, then G′ contains a path from [ u, u(1), 1] to [ u, u(degree( u)), 6] that “covers” all gadgets corresponding to edges incident on u. That is, for each of these gadgets, say Γ u u( i), the path either includes all 12 vertices (if u belongs to the vertex cover but u( i) does not) or just the six vertices [ u, u( i), 1] through [ u, u( i), 6] (if both u and u( i) belong to the vertex cover).


The final type of edge in E′ joins the first vertex [ u, u(1), 1] and the last vertex [ u, u(degree( u)), 6] of each of these paths to each of the selector vertices. That is, E′ includes the edges

{( sj, [ u, u(1), 1]) : u ∈ V and 1 ≤ j ≤ k}
∪ {( sj, [ u, u(degree( u)), 6]) : u ∈ V and 1 ≤ j ≤ k}.

Figure 34.17 Reducing an instance of the vertex-cover problem to an instance of the hamiltonian-cycle problem. (a) An undirected graph G with a vertex cover of size 2, consisting of the blue vertices w and y. (b) The undirected graph G′ produced by the reduction, with the hamiltonian cycle corresponding to the vertex cover highlighted in blue. The vertex cover { w, y}

corresponds to edges ( s 1, [ w, x, 1]) and ( s 2, [ y, x, 1]) appearing in the hamiltonian cycle.

Next we show that the size of G′ is polynomial in the size of G, and

hence it takes time polynomial in the size of G to construct G′. The


vertices of G′ are those in the gadgets, plus the selector vertices. With 12

vertices per gadget, plus k ≤ | V | selector vertices, G′ contains a total of

| V′| = 12 | E| + k

≤ 12 | E| + | V|

vertices. The edges of G′ are those in the gadgets, those that go between

gadgets, and those connecting selector vertices to gadgets. Each gadget

contains 14 edges, totaling 14 | E| in all gadgets. For each vertex u ∈ V, graph G′ has degree( u) − 1 edges going between gadgets, so that summed over all vertices in V,

Σ u∈ V (degree( u) − 1) = 2 | E| − | V|

edges go between gadgets. Finally, G′ has two edges for each pair

consisting of a selector vertex and a vertex of V, totaling 2 k | V| such edges. The total number of edges of G′ is therefore

| E′| = (14 | E|) + (2 | E| − | V|) + (2 k | V|)

= 16 | E| + (2 k − 1) | V|

≤ 16 | E| + (2 | V| − 1) | V|.
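These counts are easy to sanity-check numerically. The sketch below (a hypothetical helper, not part of the reduction itself) tallies |V′| and |E′| for a given graph and k, relying on the handshaking identity that the degrees sum to 2|E|:

```python
def counts_for_reduction(vertices, edges, k):
    """Tally the sizes |V'| and |E'| of the graph G' built by the
    vertex-cover-to-hamiltonian-cycle reduction: 12 vertices and
    14 edges per gadget, k selector vertices, the between-gadget
    edges, and 2k|V| selector-to-gadget edges."""
    degree = {u: 0 for u in vertices}
    for (u, v) in edges:
        degree[u] += 1
        degree[v] += 1
    # Handshaking lemma: the degrees sum to 2|E|.
    assert sum(degree.values()) == 2 * len(edges)
    # For each vertex u, degree(u) - 1 edges join consecutive gadgets.
    between_gadgets = sum(degree[u] - 1 for u in vertices)  # = 2|E| - |V|
    num_vertices = 12 * len(edges) + k
    num_edges = 14 * len(edges) + between_gadgets + 2 * k * len(vertices)
    return num_vertices, num_edges
```

On any graph without isolated vertices, the returned edge count agrees with the closed form 16|E| + (2k − 1)|V| derived above.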

Now we show that the transformation from graph G to G′ is a reduction. That is, we must show that G has a vertex cover of size k if and only if G′ has a hamiltonian cycle.

Suppose that G = ( V, E) has a vertex cover V* ⊆ V, where | V*| = k. Let V* = { u 1, u 2, … , uk}. As Figure 34.17 shows, we can construct a hamiltonian cycle in G′ by including the following edges for each vertex uj ∈ V*. Start by including the edges {([ uj, uj( i), 6], [ uj, uj( i+1), 1]) : 1 ≤ i ≤ degree( uj) − 1}, which connect all gadgets corresponding to edges incident on uj. Also include the edges within these gadgets as Figures 34.16(b)–(d) show, depending on whether the edge is covered by one or two vertices in V*. The hamiltonian cycle also includes the edges

{( sj, [ uj, uj(1), 1]) : 1 ≤ j ≤ k}
∪ {( sj+1, [ uj, uj(degree( uj)), 6]) : 1 ≤ j ≤ k − 1}
∪ {( s 1, [ uk, uk(degree( uk)), 6])}.

By inspecting Figure 34.17, you can verify that these edges form a cycle, where u 1 = w and u 2 = y. The cycle starts at s 1, visits all gadgets corresponding to edges incident on u 1, then visits s 2, visits all gadgets corresponding to edges incident on u 2, and so on, until it returns to s 1.

The cycle visits each gadget either once or twice, depending on whether

one or two vertices of V* cover its corresponding edge. Because V* is a vertex cover for G, each edge in E is incident on some vertex in V*, and so the cycle visits each vertex in each gadget of G′. Because the cycle also visits every selector vertex, it is hamiltonian.

Conversely, suppose that G′ = ( V′, E′) contains a hamiltonian cycle C ⊆ E′. We claim that the set

V* = { u ∈ V : ( sj, [ u, u(1), 1]) ∈ C for some 1 ≤ j ≤ k}   (34.4)

is a vertex cover for G.

We first argue that the set V* is well defined, that is, for each selector

vertex sj, exactly one of the incident edges in the hamiltonian cycle C is of the form ( sj, [ u, u(1), 1]) for some vertex u ∈ V. To see why, partition the hamiltonian cycle C into maximal paths that start at some selector

vertex si, visit one or more gadgets, and end at some selector vertex sj

without passing through any other selector vertex. Let’s call each of

these maximal paths a “cover path.” Let P be one such cover path, and

orient it going from si to sj. If P contains the edge ( si, [ u, u(1), 1]) for some vertex u ∈ V, then we have shown that one edge incident on si has the required form. Assume, then, that P contains the edge ( si, [ v, v(degree( v)), 6]) for some vertex v ∈ V. This path enters a gadget from the bottom, as drawn in Figures 34.16 and 34.17, and it leaves from the top. It might go through several gadgets, but it always enters from the

bottom of a gadget and leaves from the top. The only edges incident on

vertices at the top of a gadget either go to the bottoms of other gadgets or to selector vertices. Therefore, after the last gadget in the series of

gadgets visited by P, the edge taken must go to a selector vertex sj, so that P contains an edge of the form ( sj, [ u, u(1), 1]), where [ u, u(1), 1] is a vertex at the top of some gadget. To see that not both edges incident on

sj have this form, simply reverse the direction of traversing P in the above argument.

Having established that the set V* is well defined, let’s see why it is a

vertex cover for G. We have already established that each cover path starts at some si, takes the edge ( si, [ u, u(1), 1]) for some vertex u ∈ V, passes through all the gadgets corresponding to edges in E incident on

u, and then ends at some selector vertex sj. (This orientation is the reverse of the orientation in the paragraph above.) Let’s call this cover

path Pu, and by equation (34.4), the vertex cover V* includes u. Each gadget visited by Pu must be Γ uv or Γ vu for some v ∈ V. For each gadget visited by Pu, its vertices are visited by either one or two cover paths. If they are visited by one cover path, then edge ( u, v) ∈ E is covered in G by vertex u. If two cover paths visit the gadget, then the other cover path must be Pv, which implies that v ∈ V*, and edge ( u, v) ∈ E is covered by both u and v. Because each vertex in each gadget is visited by some cover path, we see that each edge in E is covered by some vertex in V*.

34.5.4 The traveling-salesperson problem

In the traveling-salesperson problem, which is closely related to the

hamiltonian-cycle problem, a salesperson must visit n cities. Let’s model

the problem as a complete graph with n vertices, so that the salesperson

wishes to make a tour, or hamiltonian cycle, visiting each city exactly

once and finishing at the starting city. The salesperson incurs a

nonnegative integer cost c( i, j) to travel from city i to city j. In the optimization version of the problem, the salesperson wishes to make the

tour whose total cost is minimum, where the total cost is the sum of the


individual costs along the edges of the tour. For example, in Figure

34.18, a minimum-cost tour is 〈 u, w, v, x, u〉, with cost 7. The formal

language for the corresponding decision problem is

Figure 34.18 An instance of the traveling-salesperson problem. Edges highlighted in blue represent a minimum-cost tour, with cost 7.

TSP = {〈 G, c, k〉 : G = ( V, E) is a complete graph,
c is a function from V × V → ℕ,
k ∈ ℕ, and
G has a traveling-salesperson tour with cost at most k}.

The following theorem shows that a fast algorithm for the traveling-

salesperson problem is unlikely to exist.

Theorem 34.14

The traveling-salesperson problem is NP-complete.

Proof We first show that TSP ∈ NP. Given an instance of the problem,

the certificate is the sequence of n vertices in the tour. The verification

algorithm checks that this sequence contains each vertex exactly once,

sums up the edge costs, and checks that the sum is at most k. This process can certainly be done in polynomial time.
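As a sketch (illustrative names, assumed here; the cost function is represented as a dictionary on ordered vertex pairs):

```python
def verify_tsp(vertices, cost, tour, k):
    """Verify a TSP certificate: `tour` must visit every vertex
    exactly once, and the cycle it induces (returning to the start)
    must have total cost at most k."""
    # Each vertex appears exactly once in the tour.
    if sorted(tour) != sorted(vertices):
        return False
    # Sum the edge costs around the cycle, including the closing edge.
    n = len(tour)
    total = sum(cost[(tour[i], tour[(i + 1) % n])] for i in range(n))
    return total <= k
```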

To prove that TSP is NP-hard, we show that HAM-CYCLE ≤P TSP.

Given an instance G = ( V, E) of HAM-CYCLE, construct an instance of TSP by forming the complete graph G′ = ( V, E′), where E′ = {( i, j) : i, j ∈ V and i ≠ j}, with the cost function c defined as

c( i, j) = 0 if ( i, j) ∈ E, and c( i, j) = 1 if ( i, j) ∉ E.

(Because G is undirected, it contains no self-loops, and so c( v, v) = 1 for all vertices v ∈ V.) The instance of TSP is then 〈 G′, c, 0〉, which can be created in polynomial time.

We now show that graph G has a hamiltonian cycle if and only if

graph G′ has a tour of cost at most 0. Suppose that graph G has a hamiltonian cycle H. Each edge in H belongs to E and thus has cost 0 in G′. Thus, H is a tour in G′ with cost 0. Conversely, suppose that graph G′ has a tour H′ of cost at most 0. Since the costs of the edges in E′ are 0

and 1, the cost of tour H′ is exactly 0 and each edge on the tour must

have cost 0. Therefore, H′ contains only edges in E. We conclude that H′ is a hamiltonian cycle in graph G.
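The reduction itself is only a few lines of code. The sketch below (hypothetical helper names) builds the cost function and, for tiny graphs only, brute-forces whether a zero-cost tour exists; the brute-force check is exponential and serves purely as a sanity test of the construction:

```python
from itertools import permutations

def reduce_ham_cycle_to_tsp(vertices, edges):
    """Build the TSP instance <G', c, 0> from an undirected graph G:
    the complete graph on the same vertices, where edges of G cost 0
    and all other vertex pairs cost 1."""
    cost = {}
    for u in vertices:
        for v in vertices:
            if u != v:
                in_G = (u, v) in edges or (v, u) in edges
                cost[(u, v)] = 0 if in_G else 1
    return cost, 0  # target k = 0

def has_zero_cost_tour(vertices, cost):
    """Exponential-time brute force over all tours; tiny inputs only."""
    first, rest = vertices[0], vertices[1:]
    for perm in permutations(rest):
        tour = [first, *perm]
        total = sum(cost[(tour[i], tour[(i + 1) % len(tour)])]
                    for i in range(len(tour)))
        if total == 0:
            return True
    return False
```

On a 4-cycle the reduced instance has a zero-cost tour; on a 4-vertex path, which has no hamiltonian cycle, it does not.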

34.5.5 The subset-sum problem

We next consider an arithmetic NP-complete problem. The subset-sum

problem takes as inputs a finite set S of positive integers and an integer target t > 0. It asks whether there exists a subset S′ ⊆ S whose elements sum to exactly t. For example, if S = {1, 2, 7, 14, 49, 98, 343, 686, 2409, 2793, 16808, 17206, 117705, 117993} and t = 138457, then the subset S′ = {1, 2, 7, 98, 343, 686, 2409, 17206, 117705} is a solution.
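A brute-force search makes the example concrete. This exponential-time sketch (for illustration only; it is not a polynomial-time algorithm) searches all subsets for one summing to the target:

```python
from itertools import combinations

def subset_summing_to(S, t):
    """Search all subsets of S for one that sums exactly to t.
    Takes time exponential in |S|; fine only for tiny instances."""
    items = sorted(S)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if sum(combo) == t:
                return set(combo)
    return None
```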

As usual, we express the problem as a language:

SUBSET-SUM = {〈 S, t〉 : there exists a subset S′ ⊆ S such that t = Σ s∈ S′ s}.

As with any arithmetic problem, it is important to recall that our

standard encoding assumes that the input integers are coded in binary.

With this assumption in mind, we can show that the subset-sum

problem is unlikely to have a fast algorithm.

Theorem 34.15

The subset-sum problem is NP-complete.

Proof To show that SUBSET-SUM ∈ NP, for an instance 〈 S, t〉 of the problem, let the subset S′ be the certificate. A verification algorithm can check whether t = Σ s∈ S′ s in polynomial time.

We now show that 3-CNF-SAT ≤P SUBSET-SUM. Given a 3-CNF

formula ϕ over variables x 1, x 2, … , xn with clauses C 1, C 2, … , Ck, each containing exactly three distinct literals, the reduction algorithm

constructs an instance 〈 S, t〉 of the subset-sum problem such that ϕ is

satisfiable if and only if there exists a subset of S whose sum is exactly t.

Without loss of generality, we make two simplifying assumptions about

the formula ϕ. First, no clause contains both a variable and its

negation, for such a clause is automatically satisfied by any assignment

of values to the variables. Second, each variable appears in at least one

clause, because it does not matter what value is assigned to a variable

that appears in no clauses.

The reduction creates two numbers in set S for each variable xi and

two numbers in S for each clause Cj. The numbers will be represented in base 10, with each number containing n + k digits and each digit corresponding to either one variable or one clause. Base 10 (and other

bases, as we shall see) has the property we need of preventing carries

from lower digits to higher digits.


Figure 34.19 The reduction of 3-CNF-SAT to SUBSET-SUM. The formula in 3-CNF is ϕ = C 1∧ C 2∧ C 3∧ C 4, where C 1 = ( x 1∨¬ x 2∨¬ x 3), C 2 = (¬ x 1∨¬ x 2∨¬ x 3), C 3 = (¬ x 1∨¬ x 2∨ x 3), and C 4 = ( x 1 ∨ x 2 ∨ x 3). A satisfying assignment of ϕ is 〈 x 1 = 0, x 2 = 0, x 3 = 1〉. The set S produced by the reduction consists of the base-10 numbers shown: reading from top to bottom, S = {1001001, 1000110, 100001, 101110, 10011, 11100, 1000, 2000, 100, 200, 10, 20, 1, 2}. The target t is 1114444. The subset S′ ⊆ S is shaded blue, and it contains v′ 1, v′ 2, and v 3, corresponding to the satisfying assignment. Subset S′ also contains slack variables s 1, s′ 1, s′ 2, s 3, s 4, and s′ 4 to achieve the target value of 4 in the digits labeled by C 1 through C 4.

As Figure 34.19 shows, we construct set S and target t as follows.

Label each digit position by either a variable or a clause. The least

significant k digits are labeled by the clauses, and the most significant n digits are labeled by variables.

The target t has a 1 in each digit labeled by a variable and a 4 in

each digit labeled by a clause.

For each variable xi, set S contains two integers vi and v′i. Each of vi and v′i has a 1 in the digit labeled by xi and 0s in the other variable digits. If literal xi appears in clause Cj, then the digit


labeled by Cj in vi contains a 1. If literal ¬ xi appears in clause Cj, then the digit labeled by Cj in v′i contains a 1. All other digits labeled by clauses in vi and v′i are 0.

All vi and v′i values in set S are unique. Why? For ℓ ≠ i, no vℓ or v′ℓ values can equal vi or v′i in the most significant n digits. Furthermore, by our simplifying assumptions above, no vi and v′i can be equal in all k least significant digits. If vi and v′i were equal, then xi and ¬ xi would have to appear in exactly the same set of clauses. But we assume that no clause contains both xi and ¬ xi and that either xi or ¬ xi appears in some clause, and so there must be some clause Cj for which vi and v′i differ.

For each clause Cj, set S contains two integers sj and s′j. Each of sj and s′j has 0s in all digits other than the one labeled by Cj. For sj, there is a 1 in the Cj digit, and s′j has a 2 in this digit. These integers are “slack variables,” which we use to get each clause-labeled digit position to add to the target value of 4.

Simple inspection of Figure 34.19 demonstrates that all sj and s′j values in S are unique in set S.

The greatest sum of digits in any one digit position is 6, which occurs in the digits labeled by clauses (three 1s from the vi and v′i values, plus 1 and 2 from the sj and s′j values). Interpreting these numbers in base 10, therefore, no carries can occur from lower digits to higher digits.

The reduction can be performed in polynomial time. The set S

consists of 2 n + 2 k values, each of which has n + k digits, and the time to produce each digit is polynomial in n + k. The target t has n + k digits, and the reduction produces each in constant time.
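The whole construction fits in a short sketch. Run on the formula of Figure 34.19, with clauses encoded as lists of signed variable indices (+i for literal xi, −i for ¬xi, an encoding assumed here), it reproduces the set S and target t given in the figure's caption:

```python
def reduce_3cnf_to_subset_sum(n, clauses):
    """Build <S, t> from a 3-CNF formula over variables x_1..x_n.
    Each clause is a list of nonzero ints: +i for literal x_i,
    -i for literal NOT x_i. Numbers are read in base 10: the most
    significant n digits are labeled x_1..x_n, the least significant
    k digits are labeled by clauses C_1..C_k."""
    k = len(clauses)
    S = []
    for i in range(1, n + 1):
        for sign in (+1, -1):        # the pair v_i, then v'_i
            num = 10 ** (k + n - i)  # a 1 in the digit labeled x_i
            for j, clause in enumerate(clauses):
                if sign * i in clause:
                    num += 10 ** (k - 1 - j)  # a 1 in the digit for C_j
            S.append(num)
    for j in range(k):               # slack values: s_j = 1, s'_j = 2
        S.append(1 * 10 ** (k - 1 - j))
        S.append(2 * 10 ** (k - 1 - j))
    t = int("1" * n + "4" * k)       # 1 per variable digit, 4 per clause digit
    return S, t
```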

Let’s now show that the 3-CNF formula ϕ is satisfiable if and only if

there exists a subset S′ ⊆ S whose sum is t. First, suppose that ϕ has a


satisfying assignment. For i = 1, 2, … , n, if xi = 1 in this assignment, then include vi in S′. Otherwise, include v′i. In other words, S′ includes exactly the vi and v′i values that correspond to literals with the value 1 in the satisfying assignment. Having included either vi or v′i, but not both, for all i, and having put 0 in the digits labeled by variables in all sj and s′j, we see that for each variable-labeled digit, the sum of the values of S′

must be 1, which matches those digits of the target t. Because each clause is satisfied, the clause contains some literal with the value 1.

Therefore, each digit labeled by a clause has at least one 1 contributed to its sum by a vi or v′i value in S′. In fact, one, two, or three literals may be 1 in each clause, and so each clause-labeled digit has a sum of 1, 2, or 3 from the vi and v′i values in S′. In Figure 34.19 for example, literals ¬ x 1, ¬ x 2, and x 3 have the value 1 in a satisfying assignment. Each of clauses C 1 and C 4 contains exactly one of these literals, and so together v′ 1, v′ 2, and v 3 contribute 1 to the sum in the digits for C 1 and C 4. Clause C 2 contains two of these literals, and v′ 1, v′ 2, and v 3 contribute 2 to the sum in the digit for C 2. Clause C 3 contains all three of these literals, and v′ 1, v′ 2, and v 3 contribute 3 to the sum in the digit for C 3. To achieve the target of 4 in each digit labeled by clause Cj, include in S′ the appropriate nonempty subset of slack variables { sj, s′ j}. In Figure 34.19, S′ includes s 1, s′ 1, s′ 2, s 3, s 4, and s′ 4. Since S′ matches the target in all digits

of the sum, and no carries can occur, the values of S′ sum to t.

Now suppose that some subset S′ ⊆ S sums to t. The subset S′ must include exactly one of vi and v′i for each i = 1, 2, … , n, for otherwise the digits labeled by variables would not sum to 1. If vi ∈ S′, then set xi = 1. Otherwise, v′i ∈ S′, and set xi = 0. We claim that every clause Cj, for j =

1, 2, … , k, is satisfied by this assignment. To prove this claim, note that

to achieve a sum of 4 in the digit labeled by Cj, the subset S′ must include at least one vi or v′i value that has a 1 in the digit labeled by Cj,


since the contributions of the slack variables sj and s′j together sum to at most 3. If S′ includes a vi that has a 1 in Cj’s position, then the literal xi appears in clause Cj. Since xi = 1 when vi ∈ S′, clause Cj is satisfied. If S′ includes a v′i that has a 1 in that position, then the literal ¬ xi appears in Cj. Since xi = 0 when v′i ∈ S′, clause Cj is again satisfied. Thus, all clauses of ϕ are satisfied, which completes the proof.

34.5.6 Reduction strategies

From the reductions in this section, you can see that no single strategy

applies to all NP-complete problems. Some reductions are

straightforward, such as reducing the hamiltonian-cycle problem to the

traveling-salesperson problem. Others are considerably more

complicated. Here are a few things to keep in mind and some strategies

that you can often bring to bear.

Pitfalls

Make sure that you don’t get the reduction backward. That is, in trying

to show that problem Y is NP-complete, you might take a known NP-

complete problem X and give a polynomial-time reduction from Y to X.

That is the wrong direction. The reduction should be from X to Y, so

that a solution to Y gives a solution to X.

Remember also that reducing a known NP-complete problem X to a

problem Y does not in itself prove that Y is NP-complete. It proves that Y is NP-hard. In order to show that Y is NP-complete, you additionally

need to prove that it’s in NP by showing how to verify a certificate for Y

in polynomial time.

Go from general to specific

When reducing problem X to problem Y, you always have to start with

an arbitrary input to problem X. But you are allowed to restrict the input to problem Y as much as you like. For example, when reducing 3-CNF satisfiability to the subset-sum problem, the reduction had to be

able to handle any 3-CNF formula as its input, but the input to the subset-sum problem that it produced had a particular structure: 2 n + 2 k

integers in the set, and each integer was formed in a particular way. The

reduction did not need to produce every possible input to the subset-sum problem. The point is that one way to solve the 3-CNF satisfiability

problem transforms the input into an input to the subset-sum problem

and then uses the answer to the subset-sum problem as the answer to

the 3-CNF satisfiability problem.

Take advantage of structure in the problem you are reducing from

When you are choosing a problem to reduce from, you might consider

two problems in the same domain, but one problem has more structure

than the other. For example, it’s almost always much easier to reduce

from 3-CNF satisfiability than to reduce from formula satisfiability.

Boolean formulas can be arbitrarily complicated, but you can exploit

the structure of 3-CNF formulas when reducing.

Likewise, it is usually more straightforward to reduce from the

hamiltonian-cycle problem than from the traveling-salesperson

problem, even though they are so similar. That’s because you can view

the hamiltonian-cycle problem as taking a complete graph but with

edge weights of just 0 or 1, as they would appear in the adjacency

matrix. In that sense, the hamiltonian-cycle problem has more structure

than the traveling-salesperson problem, in which edge weights are

unrestricted.

Look for special cases

Several NP-complete problems are just special cases of other NP-

complete problems. For example, consider the decision version of the 0-

1 knapsack problem: given a set of n items, each with a weight and a

value, does there exist a subset of items whose total weight is at most a

given weight W and whose total value is at least a given value V? You

can view the set-partition problem in Exercise 34.5-5 as a special case of

the 0-1 knapsack problem: let the value of each item equal its weight,

and set both W and V to half the total weight. If problem X is NP-hard and it is a special case of problem Y, then problem Y must be NP-hard

as well. That is because a polynomial-time solution for problem Y

automatically gives a polynomial-time solution for problem X. More

intuitively, problem Y, being more general than problem X, is at least as hard.
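This special-case mapping can be written down directly. The sketch below (hypothetical helper names; the brute-force decision procedure is exponential and only for tiny sanity checks) sets each item's value equal to its weight and both bounds to half the total weight:

```python
from itertools import combinations

def partition_as_knapsack(weights):
    """View set partition as a special case of 0-1 knapsack: each
    item's value equals its weight, and both the weight bound W and
    the value target V are half the total weight. Returns None when
    the total is odd (no partition can exist)."""
    total = sum(weights)
    if total % 2 != 0:
        return None
    half = total // 2
    items = [(w, w) for w in weights]   # (weight, value) pairs
    return items, half, half            # items, W, V

def knapsack_decision(items, W, V):
    """Brute-force 0-1 knapsack decision procedure (exponential;
    tiny inputs only): is there a subset with total weight at most W
    and total value at least V?"""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            if (sum(w for w, _ in combo) <= W and
                    sum(v for _, v in combo) >= V):
                return True
    return False
```

Because weight equals value, a subset meeting both bounds must sum to exactly half the total, which is precisely a partition.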

Select an appropriate problem to reduce from

It’s often a good strategy to reduce from a problem in a domain that is

the same as, or at least related to, the domain of the problem that you’re

trying to prove NP-complete. For example, we saw that the vertex-cover

problem—a graph problem—was NP-hard by reducing from the clique

problem—also a graph problem. From the vertex-cover problem, we

reduced to the hamiltonian-cycle problem, and from the hamiltonian-

cycle problem, we reduced to the traveling-salesperson problem. All of

these problems take undirected graphs as inputs.

Sometimes, however, you will find that it is better to cross over from

one domain to another, such as when we reduced from 3-CNF

satisfiability to the clique problem or to the subset-sum problem. 3-CNF

satisfiability often turns out to be a good choice as a problem to reduce

from when crossing domains.

Within graph problems, if you need to select a portion of the graph,

without regard to ordering, then the vertex-cover problem is often a

good place to start. If ordering matters, then consider starting from the

hamiltonian-cycle or hamiltonian-path problem (see Exercise 34.5-6).

Make big rewards and big penalties

The strategy for reducing the hamiltonian-cycle problem with a graph G

to the traveling-salesperson problem encouraged using edges present in

G when choosing edges for the traveling-salesperson tour. The reduction

did so by giving these edges a low weight: 0. In other words, we gave a

big reward for using these edges.

Alternatively, the reduction could have given the edges in G a finite

weight and given edges not in G infinite weight, thereby exacting a hefty

penalty for using edges not in G. With this approach, if each edge in G

has weight W, then the target weight of the traveling-salesperson tour

becomes W · | V|. You can sometimes think of the penalties as a way to

Image 1654

enforce requirements. For example, if the traveling-salesperson tour

includes an edge with infinite weight, then it violates the requirement

that the tour should include only edges belonging to G.

Design gadgets

The reduction from the vertex-cover problem to the hamiltonian-cycle

problem uses the gadget shown in Figure 34.16. This gadget is a subgraph that is connected to other parts of the constructed graph in

order to restrict the ways that a cycle can visit each vertex in the gadget

once. More generally, a gadget is a component that enforces certain

properties. Gadgets can be complicated, as in the reduction to the

hamiltonian-cycle problem. Or they can be simple: in the reduction of 3-

CNF satisfiability to the subset-sum problem, you can view the slack

variables sj and as gadgets enabling each clause-labeled digit position

to achieve the target value of 4.

Exercises

34.5-1

The subgraph-isomorphism problem takes two undirected graphs G 1 and