
Reindexing summations

A series can sometimes be simplified by changing its index, often reversing the order of summation. Consider the series Σ_{k=0}^{n} a_{n−k}. Because the terms in this summation are a_n, a_{n−1}, … , a_0, we can reverse the order of indices by letting j = n − k and rewrite this summation as

Σ_{k=0}^{n} a_{n−k} = Σ_{j=0}^{n} a_j.

Generally, if the summation index appears in the body of the sum with a minus sign, it’s worth thinking about reindexing.

As an example, consider the summation Σ_{k=1}^{n} 1/(n − k + 1). The index k appears with a negative sign in 1/(n − k + 1). And indeed, we can simplify this summation, this time setting j = n − k + 1, yielding Σ_{j=1}^{n} 1/j, which is just the harmonic series (A.8).
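The reindexing above is easy to check numerically. Here is a small sketch (ours, not from the text), using exact rational arithmetic so that no floating-point error creeps in:

```python
from fractions import Fraction

n = 10

# Original form: sum over k = 1..n of 1/(n - k + 1)
original = sum(Fraction(1, n - k + 1) for k in range(1, n + 1))

# Reindexed with j = n - k + 1: the harmonic series 1/1 + 1/2 + ... + 1/n
reindexed = sum(Fraction(1, j) for j in range(1, n + 1))

assert original == reindexed
```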

Products

The finite product a_1 a_2 ⋯ a_n can be expressed as Π_{k=1}^{n} a_k. If n = 0, the value of the product is defined to be 1. You can convert a formula with a product to a formula with a summation by using the identity

lg(Π_{k=1}^{n} a_k) = Σ_{k=1}^{n} lg a_k.
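As a quick sanity check of this identity (a sketch, not part of the text), one can compare the lg of a product with the sum of lgs in floating point:

```python
import math

a = [3.0, 1.5, 8.0, 0.25]

# lg(a_1 a_2 ... a_n) = lg a_1 + lg a_2 + ... + lg a_n
assert math.isclose(math.log2(math.prod(a)), sum(math.log2(x) for x in a))

# The empty product is defined to be 1, and lg 1 = 0 matches the empty sum.
assert math.prod([]) == 1
```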


Exercises

A.1-1

Prove that

by using the linearity property of

summations.

A.1-2

Find a simple formula for

.

A.1-3

Interpret the decimal number 111,111,111 in light of equation (A.6).

A.1-4

Evaluate the infinite series

.

A.1-5

Let c ≥ 0 be a constant. Show that

.

A.1-6

Show that

for | x| < 1.

A.1-7

Prove that

. ( Hint: Show the asymptotic upper

and lower bounds separately.)

A.1-8

Show that

by manipulating the harmonic

series.

A.1-9

Show that

.

A.1-10

Evaluate the sum

.


A.1-11

Evaluate the product

.

A.2 Bounding summations

You can choose from several techniques to bound the summations that

describe the running times of algorithms. Here are some of the most

frequently used methods.

Mathematical induction

The most basic way to evaluate a series is to use mathematical

induction. As an example, let’s prove that the arithmetic series Σ_{k=1}^{n} k evaluates to n(n + 1)/2. For n = 1, we have that n(n + 1)/2 = 1 · 2/2 = 1, which equals Σ_{k=1}^{1} k. With the inductive assumption that it holds for n, we prove that it holds for n + 1. We have

Σ_{k=1}^{n+1} k = Σ_{k=1}^{n} k + (n + 1)
              = n(n + 1)/2 + (n + 1)
              = (n + 1)(n + 2)/2.
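The closed form n(n + 1)/2 is also easy to confirm by brute force; a throwaway check (ours, not from the text):

```python
def arithmetic_sum(n):
    # Evaluate 1 + 2 + ... + n directly.
    return sum(range(1, n + 1))

for n in range(1, 200):
    assert arithmetic_sum(n) == n * (n + 1) // 2
```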

You don’t always need to guess the exact value of a summation in

order to use mathematical induction. Instead, you can use induction to

prove an upper or lower bound on a summation. As an example, let’s prove the asymptotic upper bound Σ_{k=0}^{n} 3^k = O(3^n). More specifically, we’ll prove that Σ_{k=0}^{n} 3^k ≤ c3^n for some constant c. For the initial condition n = 0, we have Σ_{k=0}^{0} 3^k = 1 ≤ c · 1 as long as c ≥ 1. Assuming that the bound holds for n, we prove that it holds for n + 1. We have

Σ_{k=0}^{n+1} 3^k = Σ_{k=0}^{n} 3^k + 3^{n+1}
                ≤ c3^n + 3^{n+1}
                = (1/3 + 1/c) c3^{n+1}
                ≤ c3^{n+1}


as long as (1/3 + 1/c) ≤ 1 or, equivalently, c ≥ 3/2. Thus, Σ_{k=0}^{n} 3^k = O(3^n), as we wished to show.

You need to take care when using asymptotic notation to prove

bounds by induction. Consider the following fallacious proof that Σ_{k=1}^{n} k = O(n). Certainly, Σ_{k=1}^{1} k = O(1). Assuming that the bound holds for n, we now prove it for n + 1:

Σ_{k=1}^{n+1} k = Σ_{k=1}^{n} k + (n + 1)
              = O(n) + (n + 1)    ⟵ wrong!!
              = O(n + 1).

The bug in the argument is that the “constant” hidden by the “big-oh”

grows with n and thus is not constant. We have not shown that the same

constant works for all n.

Bounding the terms

You can sometimes obtain a good upper bound on a series by bounding

each term of the series, and it often suffices to use the largest term to

bound the others. For example, a quick upper bound on the arithmetic

series (A.1) is

Σ_{k=1}^{n} k ≤ Σ_{k=1}^{n} n = n^2.

In general, for a series Σ_{k=1}^{n} a_k, if we let a_max = max {a_k : 1 ≤ k ≤ n}, then

Σ_{k=1}^{n} a_k ≤ n · a_max.
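Bounding every term by the largest one gives the n · a_max bound; a tiny illustrative check (the values are arbitrary, chosen by us):

```python
ak = [5, 2, 9, 1, 7]        # an arbitrary series a_1, ..., a_n
n = len(ak)
a_max = max(ak)

assert sum(ak) <= n * a_max

# The quick n^2 bound on the arithmetic series is the same idea:
for m in range(1, 100):
    assert sum(range(1, m + 1)) <= m * m
```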

The technique of bounding each term in a series by the largest term

is a weak method when the series can in fact be bounded by a geometric


series. Given the series Σ_{k=0}^{∞} a_k, suppose that a_{k+1}/a_k ≤ r for all k ≥ 0, where 0 < r < 1 is a constant. You can bound the sum by an infinite decreasing geometric series, since a_k ≤ a_0 r^k, and thus

Σ_{k=0}^{∞} a_k ≤ Σ_{k=0}^{∞} a_0 r^k = a_0 · 1/(1 − r).

You can apply this method to bound the summation Σ_{k=1}^{∞} k/3^k. In order to start the summation at k = 0, rewrite it as Σ_{k=0}^{∞} (k + 1)/3^{k+1}. The first term (a_0) is 1/3, and the ratio (r) of consecutive terms is

((k + 2)/3^{k+2}) / ((k + 1)/3^{k+1}) = (1/3) · (k + 2)/(k + 1) ≤ 2/3

for all k ≥ 0. Thus, we have

Σ_{k=1}^{∞} k/3^k ≤ (1/3) Σ_{k=0}^{∞} (2/3)^k = (1/3) · 1/(1 − 2/3) = 1.
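The bound of 1 on Σ k/3^k can be confirmed numerically. (In fact, the exact value of the infinite sum is 3/4, by the identity Σ_{k≥1} k x^k = x/(1 − x)^2 at x = 1/3; the check below is our sketch, not from the text.)

```python
partial = 0.0
for k in range(1, 200):         # 200 terms is far past convergence
    partial += k / 3**k

assert partial <= 1.0                  # the geometric-series bound
assert abs(partial - 0.75) < 1e-12     # exact limit: (1/3) / (1 - 1/3)^2
```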

A common bug in applying this method is to show that the ratio of consecutive terms is less than 1 and then to assume that the summation is bounded by a geometric series. An example is the infinite harmonic series, which diverges since

Σ_{k=1}^{∞} 1/k = lim_{n→∞} Σ_{k=1}^{n} 1/k = lim_{n→∞} Θ(lg n) = ∞.

The ratio of the (k + 1)st and kth terms in this series is k/(k + 1) < 1, but the series is not bounded by a decreasing geometric series. To bound a series by a geometric series, you need to show that there is a constant r < 1 such that the ratio of all pairs of consecutive terms


never exceeds r. In the harmonic series, no such r exists because the ratio becomes arbitrarily close to 1.

Splitting summations

One way to obtain bounds on a difficult summation is to express the

series as the sum of two or more series by partitioning the range of the

index and then to bound each of the resulting series. For example, let’s

find a lower bound on the arithmetic series Σ_{k=1}^{n} k, which we have already seen has an upper bound of n^2. You might attempt to bound each term in the summation by the smallest term, but since that term is 1, you would get a lower bound of n for the summation, far off from the upper bound of n^2.

You can obtain a better lower bound by first splitting the summation. Assume for convenience that n is even, so that

Σ_{k=1}^{n} k = Σ_{k=1}^{n/2} k + Σ_{k=n/2+1}^{n} k
            ≥ Σ_{k=1}^{n/2} 0 + Σ_{k=n/2+1}^{n} (n/2)
            = (n/2)^2
            = Ω(n^2),

which is an asymptotically tight bound, since Σ_{k=1}^{n} k = O(n^2).
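Splitting at n/2 gives the Ω(n^2) lower bound; here is a quick numeric confirmation for even n (a sketch, not from the text):

```python
for n in range(2, 200, 2):            # even n, as assumed above
    total = sum(range(1, n + 1))
    assert total >= (n // 2) ** 2     # the lower bound from splitting
    assert total <= n * n             # the earlier upper bound
```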

For a summation arising from the analysis of an algorithm, you can

sometimes split the summation and ignore a constant number of the

initial terms. Generally, this technique applies when each term a_k in a summation Σ_{k=0}^{n} a_k is independent of n. Then for any constant k_0 > 0, you can write

Σ_{k=0}^{n} a_k = Σ_{k=0}^{k_0−1} a_k + Σ_{k=k_0}^{n} a_k
              = Θ(1) + Σ_{k=k_0}^{n} a_k,

since the initial terms of the summation are all constant and there are a constant number of them. You can then use other methods to bound


Σ_{k=k_0}^{n} a_k. This technique applies to infinite summations as well. For example, let’s find an asymptotic upper bound on Σ_{k=0}^{∞} k^2/2^k. The ratio of consecutive terms is

((k + 1)^2/2^{k+1}) / (k^2/2^k) = (k + 1)^2 / (2k^2) ≤ 8/9

if k ≥ 3. Thus, you can split the summation into

Σ_{k=0}^{∞} k^2/2^k = Σ_{k=0}^{2} k^2/2^k + Σ_{k=3}^{∞} k^2/2^k
                   ≤ Σ_{k=0}^{2} k^2/2^k + (9/8) Σ_{k=0}^{∞} (8/9)^k
                   = O(1).

The technique of splitting summations can help determine

asymptotic bounds in much more difficult situations. For example, here

is one way to obtain a bound of O(lg n) on the harmonic series (A.9). The idea is to split the range 1 to n into ⌊lg n⌋ + 1 pieces and upper-bound the contribution of each piece by 1. For i = 0, 1, … , ⌊lg n⌋, the ith piece consists of the terms 1/2^i, 1/(2^i + 1), … , 1/(2^{i+1} − 1). The last piece might contain terms not in the original harmonic series, giving

Σ_{k=1}^{n} 1/k ≤ Σ_{i=0}^{⌊lg n⌋} Σ_{j=0}^{2^i − 1} 1/(2^i + j)
              ≤ Σ_{i=0}^{⌊lg n⌋} Σ_{j=0}^{2^i − 1} 1/2^i
              = Σ_{i=0}^{⌊lg n⌋} 1
              ≤ lg n + 1.
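The resulting bound H_n ≤ lg n + 1 can be spot-checked directly (a sketch; the helper `harmonic` is ours, not the text’s):

```python
import math

def harmonic(n):
    # H_n = 1 + 1/2 + ... + 1/n
    return sum(1 / k for k in range(1, n + 1))

for n in [1, 2, 10, 100, 1000, 10**5]:
    assert harmonic(n) <= math.log2(n) + 1
```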


Approximation by integrals

When a summation has the form Σ_{k=m}^{n} f(k), where f(k) is a monotonically increasing function, you can approximate it by integrals:

∫_{m−1}^{n} f(x) dx ≤ Σ_{k=m}^{n} f(k) ≤ ∫_{m}^{n+1} f(x) dx.

Figure A.1 justifies this approximation. The summation is represented as the area of the rectangles in the figure, and the integral is the blue region under the curve. When f(k) is a monotonically decreasing function, you can use a similar method to provide the bounds

∫_{m}^{n+1} f(x) dx ≤ Σ_{k=m}^{n} f(k) ≤ ∫_{m−1}^{n} f(x) dx.     (A.19)

The integral approximation (A.19) can be used to prove the tight bounds in inequality (A.10) for the nth harmonic number. The lower bound is

Σ_{k=1}^{n} 1/k ≥ ∫_{1}^{n+1} dx/x = ln(n + 1).

For the upper bound, the integral approximation gives

Σ_{k=2}^{n} 1/k ≤ ∫_{1}^{n} dx/x = ln n,

so that Σ_{k=1}^{n} 1/k ≤ ln n + 1.
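The integral bounds ln(n + 1) ≤ H_n ≤ ln n + 1 are tight enough to verify numerically (our sketch, not from the text):

```python
import math

def harmonic(n):
    # H_n = 1 + 1/2 + ... + 1/n
    return sum(1 / k for k in range(1, n + 1))

for n in [1, 5, 50, 500, 5000]:
    h = harmonic(n)
    assert math.log(n + 1) <= h <= math.log(n) + 1
```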


Exercises

A.2-1

Show that

is bounded above by a constant.

A.2-2

Find an asymptotic upper bound on the summation


Figure A.1 Approximation of Σ_{k=m}^{n} f(k) by integrals. The area of each rectangle is shown within the rectangle, and the total rectangle area represents the value of the summation. The integral is represented by the blue area under the curve. Comparing areas in (a) gives the lower bound ∫_{m−1}^{n} f(x) dx ≤ Σ_{k=m}^{n} f(k). Shifting the rectangles one unit to the right gives the upper bound Σ_{k=m}^{n} f(k) ≤ ∫_{m}^{n+1} f(x) dx in (b).

A.2-3

Show that the nth harmonic number is Ω(lg n) by splitting the summation.

A.2-4

Approximate

with an integral.


A.2-5

Why can’t you use the integral approximation (A.19) directly on Σ_{k=1}^{n} 1/k to obtain an upper bound on the nth harmonic number?

Problems

A-1 Bounding summations

Give asymptotically tight bounds on the following summations. Assume

that r ≥ 0 and s ≥ 0 are constants.

a.

b.

c.

Appendix notes

Knuth [259] provides an excellent reference for the material presented here. You can find basic properties of series in any good calculus book,

such as Apostol [19] or Thomas et al. [433].

B Sets, Etc.

Many chapters of this book touch on the elements of discrete

mathematics. This appendix reviews the notations, definitions, and

elementary properties of sets, relations, functions, graphs, and trees. If

you are already well versed in this material, you can probably just skim

this chapter.

B.1 Sets

A set is a collection of distinguishable objects, called its members or elements. If an object x is a member of a set S, we write x ∈ S (read “x is a member of S” or, more briefly, “x belongs to S”). If x is not a member of S, we write x ∉ S. To describe a set explicitly, write its members as a list inside braces. For example, to define a set S to contain precisely the numbers 1, 2, and 3, write S = {1, 2, 3}. Since 2 belongs to the set S, we can write 2 ∈ S, and since 4 is not a member, we can write 4 ∉ S. A set cannot contain the same object more than once,1 and its elements are not ordered. Two sets A and B are equal, written A = B, if they contain the same elements. For example, {1, 2, 3, 1} = {1, 2, 3} = {3, 2, 1}.

We adopt special notations for frequently encountered sets:

Ø denotes the empty set, that is, the set containing no members.

ℤ denotes the set of integers, that is, the set {… , −2, −1, 0, 1, 2, …}.

ℝ denotes the set of real numbers.

ℕ denotes the set of natural numbers, that is, the set {0, 1, 2,…}. 2

If all the elements of a set A are contained in a set B, that is, if x ∈ A implies x ∈ B, then we write A ⊆ B and say that A is a subset of B. A set A is a proper subset of set B, written A ⊂ B, if A ⊆ B but A ≠ B. (Some authors use the symbol “⊂” to denote the ordinary subset relation, rather than the proper-subset relation.) Every set is a subset of itself: A ⊆ A for any set A. For two sets A and B, we have A = B if and only if A ⊆ B and B ⊆ A. The subset relation is transitive (see page 1159): for any three sets A, B, and C, if A ⊆ B and B ⊆ C, then A ⊆ C. The proper-subset relation is transitive as well. The empty set is a subset of all sets: for any set A, we have Ø ⊆ A.

Sets can be specified in terms of other sets. Given a set A, a set B ⊆ A can be defined by stating a property that distinguishes the elements of B. For example, one way to define the set of even integers is {x : x ∈ ℤ and x/2 is an integer}. The colon in this notation is read “such that.”

(Some authors use a vertical bar in place of the colon.)

Given two sets A and B, set operations define new sets:

The intersection of sets A and B is the set

A ∩ B = {x : x ∈ A and x ∈ B}.

The union of sets A and B is the set

A ∪ B = {x : x ∈ A or x ∈ B}.

The difference between two sets A and B is the set

A − B = {x : x ∈ A and x ∉ B}.

Set operations obey the following laws:

Empty set laws:

A ∩ Ø = Ø,

A ∪ Ø = A.


Idempotency laws:

A ∩ A = A,

A ∪ A = A.

Commutative laws:

A ∩ B = B ∩ A,

A ∪ B = B ∪ A.

Figure B.1 A Venn diagram illustrating the first of DeMorgan’s laws (B.2). Each of the sets A, B, and C is represented as a circle.

Associative laws:

A ∩ (B ∩ C) = (A ∩ B) ∩ C,

A ∪ (B ∪ C) = (A ∪ B) ∪ C.

Distributive laws:

A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C),
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).     (B.1)

Absorption laws:

A ∩ (A ∪ B) = A,

A ∪ (A ∩ B) = A.

DeMorgan’s laws:

A − (B ∩ C) = (A − B) ∪ (A − C),
A − (B ∪ C) = (A − B) ∩ (A − C).     (B.2)


Figure B.1 illustrates the first of DeMorgan’s laws, using a Venn diagram: a graphical picture in which sets are represented as regions of

the plane.

Often, all the sets under consideration are subsets of some larger set

U called the universe. For example, when considering various sets made

up only of integers, the set ℤ of integers is an appropriate universe.

Given a universe U, we define the complement of a set A as Ā = U − A = {x : x ∈ U and x ∉ A}. For any set A ⊆ U, we have the following laws:

the complement of Ā is A (complementing twice gives back the original set),

A ∩ Ā = Ø,

A ∪ Ā = U.

An equivalent way to express DeMorgan’s laws (B.2) uses set complements: for any two sets B, C ⊆ U, the complement of B ∩ C is B̄ ∪ C̄, and the complement of B ∪ C is B̄ ∩ C̄.

Two sets A and B are disjoint if they have no elements in common, that is, if A ∩ B = Ø. A collection of sets S_1, S_2, … , either finite or infinite, is a set of sets, in which each member is a set S_i. A collection 𝒮 = {S_i} of nonempty sets forms a partition of a set S if

the sets are pairwise disjoint, that is, S_i, S_j ∈ 𝒮 and i ≠ j imply S_i ∩ S_j = Ø,

their union is S, that is, S = ⋃_i S_i.

In other words, 𝒮 forms a partition of S if each element of S appears in exactly one set S_i ∈ 𝒮.

The number of elements in a set is the cardinality (or size) of the set, denoted |S|. Two sets have the same cardinality if their elements can be put into a one-to-one correspondence. The cardinality of the empty set is |Ø| = 0. If the cardinality of a set is a natural number, the set is finite, and otherwise, it is infinite. An infinite set that can be put into a one-to-one correspondence with the natural numbers ℕ is countably infinite, and otherwise, it is uncountable. For example, the integers ℤ are countable, but the reals ℝ are uncountable.

For any two finite sets A and B, we have the identity

|A ∪ B| = |A| + |B| − |A ∩ B|,     (B.3)

from which we can conclude that

|A ∪ B| ≤ |A| + |B|.

If A and B are disjoint, then |A ∩ B| = 0 and thus |A ∪ B| = |A| + |B|. If A ⊆ B, then |A| ≤ |B|.
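Python’s built-in set type makes identity (B.3) and its corollary easy to verify on examples (our sketch, with arbitrary sets):

```python
A = {1, 2, 3, 4}
B = {3, 4, 5}

# |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(A | B) == len(A) + len(B) - len(A & B)
# ... hence |A ∪ B| <= |A| + |B|
assert len(A | B) <= len(A) + len(B)

# For disjoint sets, the union's cardinality is exactly the sum.
C = {9, 10}
assert A & C == set() and len(A | C) == len(A) + len(C)
```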

A finite set of n elements is sometimes called an n-set. A 1-set is called a singleton. A subset of k elements of a set is sometimes called a k-subset.

We denote the set of all subsets of a set S, including the empty set and S itself, by 2^S, called the power set of S. For example, 2^{{a, b}} = {Ø, {a}, {b}, {a, b}}. The power set of a finite set S has cardinality 2^|S| (see Exercise B.1-5).
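Enumerating a power set is a standard exercise; a minimal sketch using itertools (the function name `power_set` is ours, not the text’s):

```python
from itertools import chain, combinations

def power_set(s):
    # All subsets of s, from the empty set up to s itself.
    items = list(s)
    return [set(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]

ps = power_set({'a', 'b'})
assert len(ps) == 2 ** 2               # |2^S| = 2^|S|
assert set() in ps and {'a', 'b'} in ps
```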

We sometimes care about setlike structures in which the elements are ordered. An ordered pair of two elements a and b is denoted (a, b) and is defined formally as the set (a, b) = {a, {a, b}}. Thus, the ordered pair (a, b) is not the same as the ordered pair (b, a).

The Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs such that the first element of the pair is an element of A and the second is an element of B. More formally,

A × B = {(a, b) : a ∈ A and b ∈ B}.

For example, {a, b} × {a, b, c} = {(a, a), (a, b), (a, c), (b, a), (b, b), (b, c)}.
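itertools.product computes exactly this set of ordered pairs; a short sketch:

```python
from itertools import product

A = {'a', 'b'}
B = {'a', 'b', 'c'}

AxB = set(product(A, B))            # all ordered pairs (x, y)
assert ('a', 'c') in AxB
assert ('c', 'a') not in AxB        # order matters
assert len(AxB) == len(A) * len(B)  # |A x B| = |A| * |B|
```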

When A and B are finite sets, the cardinality of their Cartesian product is

|A × B| = |A| · |B|.

The Cartesian product of n sets A_1, A_2, … , A_n is the set of n-tuples

A_1 × A_2 × ⋯ × A_n = {(a_1, a_2, … , a_n) : a_i ∈ A_i for i = 1, 2, … , n},

whose cardinality is

|A_1 × A_2 × ⋯ × A_n| = |A_1| · |A_2| ⋯ |A_n|

if all sets A_i are finite. We denote an n-fold Cartesian product over a single set A by the set

A^n = A × A × ⋯ × A,

whose cardinality is |A^n| = |A|^n if A is finite. We can also view an n-tuple as a finite sequence of length n (see page 1162).

Intervals are continuous sets of real numbers. We denote them with parentheses and/or brackets. Given real numbers a and b, the closed interval [a, b] is the set {x ∈ ℝ : a ≤ x ≤ b} of reals between a and b, including both a and b. (If a > b, this definition implies that [a, b] = Ø.) The open interval (a, b) = {x ∈ ℝ : a < x < b} omits both of the endpoints from the set. There are two half-open intervals [a, b) = {x ∈ ℝ : a ≤ x < b} and (a, b] = {x ∈ ℝ : a < x ≤ b}, each of which excludes one endpoint.

Intervals can also be defined on the integers by replacing ℝ in these definitions by ℤ. Whether the interval is defined over the reals or integers can usually be inferred from context.

Exercises

B.1-1

Draw Venn diagrams that illustrate the first of the distributive laws

(B.1).

B.1-2

Prove the generalization of DeMorgan’s laws to any finite collection of

sets:

the complement of A_1 ∩ A_2 ∩ ⋯ ∩ A_n is Ā_1 ∪ Ā_2 ∪ ⋯ ∪ Ā_n, and

the complement of A_1 ∪ A_2 ∪ ⋯ ∪ A_n is Ā_1 ∩ Ā_2 ∩ ⋯ ∩ Ā_n.

B.1-3

Prove the generalization of equation (B.3), which is called the principle

of inclusion and exclusion:

|A_1 ∪ A_2 ∪ ⋯ ∪ A_n| = |A_1| + |A_2| + ⋯ + |A_n|
                       − |A_1 ∩ A_2| − |A_1 ∩ A_3| − ⋯     (all pairs)
                       + |A_1 ∩ A_2 ∩ A_3| + ⋯            (all triples)
                       ⋮
                       + (−1)^{n−1} |A_1 ∩ A_2 ∩ ⋯ ∩ A_n|.

B.1-4

Show that the set of odd natural numbers is countable.

B.1-5

Show that for any finite set S, the power set 2^S has 2^|S| elements (that is, there are 2^|S| distinct subsets of S).

B.1-6

Give an inductive definition for an n-tuple by extending the set-theoretic

definition for an ordered pair.

B.2 Relations

A binary relation R on two sets A and B is a subset of the Cartesian product A × B. If (a, b) ∈ R, we sometimes write a R b. When we say that R is a binary relation on a set A, we mean that R is a subset of A × A. For example, the “less than” relation on the natural numbers is the set {(a, b) : a, b ∈ ℕ and a < b}. An n-ary relation on sets A_1, A_2, … , A_n is a subset of A_1 × A_2 × ⋯ × A_n.

A binary relation R ⊆ A × A is reflexive if

a R a

for all a ∈ A. For example, “=” and “≤” are reflexive relations on ℕ, but “<” is not. The relation R is symmetric if

a R b implies b R a

for all a, b ∈ A. For example, “=” is symmetric, but “<” and “≤” are not. The relation R is transitive if

a R b and b R c imply a R c

for all a, b, c ∈ A. For example, the relations “<,” “≤,” and “=” are transitive, but the relation R = {(a, b) : a, b ∈ ℕ and a = b − 1} is not, since 3 R 4 and 4 R 5 do not imply 3 R 5.

A relation that is reflexive, symmetric, and transitive is an equivalence relation. For example, “=” is an equivalence relation on the natural numbers, but “<” is not. If R is an equivalence relation on a set A, then for a ∈ A, the equivalence class of a is the set [a] = {b ∈ A : a R b}, that is, the set of all elements equivalent to a. For example, if we define R = {(a, b) : a, b ∈ ℕ and a + b is an even number}, then R is an equivalence relation, since a + a is even (reflexive), a + b is even implies b + a is even (symmetric), and a + b is even and b + c is even imply a + c is even (transitive). The equivalence class of 4 is [4] = {0, 2, 4, 6, …}, and the equivalence class of 3 is [3] = {1, 3, 5, 7, …}. A basic theorem of equivalence classes is the following.

Theorem B.1 (An equivalence relation is the same as a partition)

The equivalence classes of any equivalence relation R on a set A form a partition of A, and any partition of A determines an equivalence relation on A for which the sets in the partition are the equivalence classes.

Proof For the first part of the proof, we must show that the equivalence classes of R are nonempty, pairwise-disjoint sets whose union is A. Because R is reflexive, a ∈ [a], and so the equivalence classes are nonempty. Moreover, since every element a ∈ A belongs to the equivalence class [a], the union of the equivalence classes is A. It remains to show that the equivalence classes are pairwise disjoint, that is, if two equivalence classes [a] and [b] have an element c in common, then they are in fact the same set. Suppose that a R c and b R c. Symmetry gives that c R b and, by transitivity, a R b. Thus, for any arbitrary element x ∈ [a], we have a R x, hence x R a by symmetry, and by transitivity, x R b, and thus [a] ⊆ [b]. Similarly, [b] ⊆ [a], and thus [a] = [b].

For the second part of the proof, let 𝒜 = {A_i} be a partition of A, and define R = {(a, b) : there exists i such that a ∈ A_i and b ∈ A_i}. We claim that R is an equivalence relation on A. Reflexivity holds, since a ∈ A_i implies a R a. Symmetry holds, because if a R b, then a and b belong to the same set A_i, and hence b R a. If a R b and b R c, then all three elements are in the same set A_i, and thus a R c and transitivity holds. To see that the sets in the partition are the equivalence classes of R, observe that if a ∈ A_i, then x ∈ [a] implies x ∈ A_i, and x ∈ A_i implies x ∈ [a].

A binary relation R on a set A is antisymmetric if

a R b and b R a imply a = b.

For example, the “≤” relation on the natural numbers is antisymmetric, since a ≤ b and b ≤ a imply a = b. A relation that is reflexive, antisymmetric, and transitive is a partial order, and we call a set on which a partial order is defined a partially ordered set. For example, the relation “is a descendant of” is a partial order on the set of all people (if we view individuals as being their own descendants).

In a partially ordered set A, there may be no single “maximum” element a such that b R a for all b ∈ A. Instead, the set may contain several maximal elements a such that for no b ∈ A, where b ≠ a, is it the case that a R b. For example, a collection of different-sized boxes may contain several maximal boxes that don’t fit inside any other box, yet it has no single “maximum” box into which any other box will fit. 3

A relation R on a set A is a total relation if for all a, b ∈ A, we have a R b or b R a (or both), that is, if every pairing of elements of A is related by R. A partial order that is also a total relation is a total order or linear order. For example, the relation “≤” is a total order on the natural

numbers, but the “is a descendant of” relation is not a total order on the set of all people, since there are individuals neither of whom is

descended from the other. A total relation that is transitive, but not

necessarily either symmetric or antisymmetric, is a total preorder.

Exercises

B.2-1

Prove that the subset relation “⊆” on all subsets of ℤ is a partial order

but not a total order.

B.2-2

Show that for any positive integer n, the relation “equivalent modulo n” is an equivalence relation on the integers. (We say that a ≡ b (mod n) if there exists an integer q such that a − b = qn.) Into what equivalence classes does this relation partition the integers?

B.2-3

Give examples of relations that are

a. reflexive and symmetric but not transitive,

b. reflexive and transitive but not symmetric,

c. symmetric and transitive but not reflexive.

B.2-4

Let S be a finite set, and let R be an equivalence relation on S.

Show that if in addition R is antisymmetric, then the equivalence classes

of S with respect to R are singletons.

B.2-5

Professor Narcissus claims that if a relation R is symmetric and

transitive, then it is also reflexive. He offers the following proof. By

symmetry, a R b implies b R a. Transitivity, therefore, implies a R a. Is the professor correct?

B.3 Functions

Given two sets A and B, a function f is a binary relation on A and B such that for all a ∈ A, there exists precisely one b ∈ B such that (a, b) ∈ f. The set A is called the domain of f, and the set B is called the codomain of f. We sometimes write f : A → B, and if (a, b) ∈ f, we write b = f(a), since the choice of a uniquely determines b.

Intuitively, the function f assigns an element of B to each element of A. No element of A is assigned two different elements of B, but the same element of B can be assigned to two different elements of A. For example, the binary relation

f = {(a, b) : a, b ∈ ℕ and b = a mod 2}

is a function f : ℕ → {0, 1}, since for each natural number a, there is exactly one value b in {0, 1} such that b = a mod 2. For this example, 0 = f(0), 1 = f(1), 0 = f(2), 1 = f(3), etc. In contrast, the binary relation

g = {(a, b) : a, b ∈ ℕ and a + b is even}

is not a function, since (1, 3) and (1, 5) are both in g, and thus for the choice a = 1, there is not precisely one b such that (a, b) ∈ g.

Given a function f : A → B, if b = f(a), we say that a is the argument of f and that b is the value of f at a. We can define a function by stating its value for every element of its domain. For example, we might define f(n) = 2n for n ∈ ℕ, which means f = {(n, 2n) : n ∈ ℕ}. Two functions f and g are equal if they have the same domain and codomain and if f(a) = g(a) for all a in the domain.

A finite sequence of length n is a function f whose domain is the set of n integers {0, 1, … , n − 1}. We often denote a finite sequence by listing its values in angle brackets: 〈f(0), f(1), … , f(n − 1)〉. An infinite sequence is a function whose domain is the set ℕ of natural numbers. For example, the Fibonacci sequence, defined by recurrence (3.31), is the infinite sequence 〈0, 1, 1, 2, 3, 5, 8, 13, 21, …〉.

When the domain of a function f is a Cartesian product, we often omit the extra parentheses surrounding the argument of f. For example, if we have a function f : A_1 × A_2 × ⋯ × A_n → B, we write b = f(a_1, a_2, … , a_n) instead of writing b = f((a_1, a_2, … , a_n)). We also call each a_i an argument to the function f, though technically f has just a single argument, which is the n-tuple (a_1, a_2, … , a_n).

If f : A → B is a function and b = f(a), then we sometimes say that b is the image of a under f. The image of a set A′ ⊆ A under f is defined by

f(A′) = {b ∈ B : b = f(a) for some a ∈ A′}.

The range of f is the image of its domain, that is, f(A). For example, the range of the function f : ℕ → ℕ defined by f(n) = 2n is f(ℕ) = {m : m = 2n for some n ∈ ℕ}, in other words, the set of nonnegative even integers.

A function is a surjection if its range is its codomain. For example, the function f(n) = ⌊n/2⌋ is a surjective function from ℕ to ℕ, since every element in ℕ appears as the value of f for some argument. In contrast, the function f(n) = 2n is not a surjective function from ℕ to ℕ, since no argument to f can produce any odd natural number as a value. The function f(n) = 2n is, however, a surjective function from the natural numbers to the even numbers. A surjection f : A → B is sometimes described as mapping A onto B. When we say that f is onto, we mean that it is surjective.

A function f : A → B is an injection if distinct arguments to f produce distinct values, that is, if a ≠ a′ implies f(a) ≠ f(a′). For example, the function f(n) = 2n is an injective function from ℕ to ℕ, since each even number b is the image under f of at most one element of the domain, namely b/2. The function f(n) = ⌊n/2⌋ is not injective, since the value 1 is produced by two arguments: f(2) = 1 and f(3) = 1. An injection is sometimes called a one-to-one function.

A function f : A → B is a bijection if it is injective and surjective. For example, the function f(n) = (−1)^n ⌈n/2⌉ is a bijection from ℕ to ℤ:

0 → 0,

1 → −1,

2 → 1,

3 → −2,

4 → 2,

⋮

The function is injective, since no element of ℤ is the image of more

than one element of ℕ. It is surjective, since every element of ℤ appears

as the image of some element of ℕ. Hence, the function is bijective. A

bijection is sometimes called a one-to-one correspondence, since it pairs

elements in the domain and codomain. A bijection from a set A to itself

is sometimes called a permutation.
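The bijection f(n) = (−1)^n ⌈n/2⌉ can be coded directly; for natural n, ⌈n/2⌉ is (n + 1) // 2 in Python (our sketch, not from the text):

```python
def f(n):
    # f(n) = (-1)^n * ceil(n/2): a bijection from N to Z
    return (n + 1) // 2 * (-1) ** n

values = [f(n) for n in range(11)]
assert values[:5] == [0, -1, 1, -2, 2]
# The first 11 natural numbers map onto the integers -5..5, each hit
# exactly once: injective and surjective on this window.
assert sorted(values) == list(range(-5, 6))
```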

When a function f is bijective, we define its inverse f^{−1} as

f^{−1}(b) = a if and only if f(a) = b.

For example, the inverse of the function f(n) = (−1)^n ⌈n/2⌉ is

f^{−1}(m) = 2m        if m ≥ 0,
            −2m − 1   if m < 0.

Exercises

B.3-1

Let A and B be finite sets, and let f : A → B be a function. Show the following:

a. If f is injective, then |A| ≤ |B|.

b. If f is surjective, then |A| ≥ |B|.

B.3-2

Is the function f(x) = x + 1 bijective when the domain and the codomain are the set ℕ? Is it bijective when the domain and the codomain are the set ℤ?

B.3-3

Give a natural definition for the inverse of a binary relation such that if

a relation is in fact a bijective function, its relational inverse is its

functional inverse.

B.3-4

Give a bijection from ℤ to ℤ × ℤ.

B.4 Graphs

This section presents two kinds of graphs: directed and undirected.

Certain definitions in the literature differ from those given here, but for

the most part, the differences are slight. Section 20.1 shows how to represent graphs in computer memory.

A directed graph (or digraph) G is a pair (V, E), where V is a finite set and E is a binary relation on V. The set V is called the vertex set of G, and its elements are called vertices (singular: vertex). The set E is called the edge set of G, and its elements are called edges. Figure B.2(a) is a pictorial representation of a directed graph on the vertex set {1, 2, 3, 4,

5, 6}. Vertices are represented by circles in the figure, and edges are

represented by arrows. Self-loops—edges from a vertex to itself—are

possible.

In an undirected graph G = (V, E), the edge set E consists of unordered pairs of vertices, rather than ordered pairs. That is, an edge is a set {u, v}, where u, v ∈ V and u ≠ v. By convention, we use the notation (u, v) for an edge, rather than the set notation {u, v}, and we consider (u, v) and (v, u) to be the same edge. In an undirected graph, self-loops are forbidden, so that every edge consists of two distinct vertices. Figure B.2(b) shows an undirected graph on the vertex set {1, 2, 3, 4, 5, 6}.


Figure B.2 Directed and undirected graphs. (a) A directed graph G = (V, E), where V = {1, 2, 3, 4, 5, 6} and E = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}. The edge (2, 2) is a self-loop. (b) An undirected graph G = (V, E), where V = {1, 2, 3, 4, 5, 6} and E = {(1, 2), (1, 5), (2, 5), (3, 6)}. The vertex 4 is isolated. (c) The subgraph of the graph in part (a) induced by the vertex set {1, 2, 3, 6}.

Many definitions for directed and undirected graphs are the same,

although certain terms have slightly different meanings in the two

contexts. If (u, v) is an edge in a directed graph G = (V, E), we say that (u, v) is incident from or leaves vertex u and is incident to or enters vertex v. For example, the edges leaving vertex 2 in Figure B.2(a) are (2, 2), (2, 4), and (2, 5). The edges entering vertex 2 are (1, 2) and (2, 2). If (u, v) is an edge in an undirected graph G = (V, E), we say that (u, v) is incident on vertices u and v. In Figure B.2(b), the edges incident on vertex 2 are (1, 2) and (2, 5).

If (u, v) is an edge in a graph G = (V, E), we say that vertex v is adjacent to vertex u. When the graph is undirected, the adjacency relation is symmetric. When the graph is directed, the adjacency relation is not necessarily symmetric. If v is adjacent to u in a directed graph, we can write u → v. In parts (a) and (b) of Figure B.2, vertex 2 is adjacent to vertex 1, since the edge (1, 2) belongs to both graphs. Vertex 1 is not adjacent to vertex 2 in Figure B.2(a), since the edge (2, 1) is absent.

The degree of a vertex in an undirected graph is the number of edges incident on it. For example, vertex 2 in Figure B.2(b) has degree 2. A vertex whose degree is 0, such as vertex 4 in Figure B.2(b), is isolated. In a directed graph, the out-degree of a vertex is the number of edges leaving it, and the in-degree of a vertex is the number of edges entering it. The degree of a vertex in a directed graph is its in-degree plus its out-degree. Vertex 2 in Figure B.2(a) has in-degree 2, out-degree 3, and degree 5.
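Degrees are easy to tabulate from an edge list; a sketch (ours) using the directed graph of Figure B.2(a):

```python
V = {1, 2, 3, 4, 5, 6}
E = [(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)]

in_deg = {v: 0 for v in V}
out_deg = {v: 0 for v in V}
for u, v in E:
    out_deg[u] += 1     # edge leaves u
    in_deg[v] += 1      # edge enters v

assert in_deg[2] == 2 and out_deg[2] == 3
assert in_deg[2] + out_deg[2] == 5    # degree = in-degree + out-degree
```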


A path of length k from a vertex u to a vertex u′ in a graph G = (V, E) is a sequence 〈v_0, v_1, v_2, … , v_k〉 of vertices such that u = v_0, u′ = v_k, and (v_{i−1}, v_i) ∈ E for i = 1, 2, … , k. The length of the path is the number of edges in the path, which is 1 less than the number of vertices in the path. The path contains the vertices v_0, v_1, … , v_k and the edges (v_0, v_1), (v_1, v_2), … , (v_{k−1}, v_k). (There is always a 0-length path from u to u.) If there is a path p from u to u′, we say that u′ is reachable from u via p, which we can write as u ⇝ u′. A path is simple 4 if all vertices in the path are distinct. In Figure B.2(a), the path 〈1, 2, 5, 4〉 is a simple path of length 3. The path 〈2, 5, 4, 5〉 is not simple. A subpath of path p = 〈v_0, v_1, … , v_k〉 is a contiguous subsequence of its vertices. That is, for any 0 ≤ i ≤ j ≤ k, the subsequence of vertices 〈v_i, v_{i+1}, … , v_j〉 is a subpath of p.

In a directed graph, a path 〈v0, v1, … , vk〉 forms a cycle if v0 = vk and the path contains at least one edge. The cycle is simple if, in addition, v1, v2, … , vk are distinct. A cycle consisting of k vertices has length k. A self-loop is a cycle of length 1. Two paths 〈v0, v1, v2, … , vk−1, v0〉 and 〈v′0, v′1, v′2, … , v′k−1, v′0〉 form the same cycle if there exists an integer j such that v′i = v(i+j) mod k for i = 0, 1, … , k − 1. In Figure B.2(a), the path 〈1, 2, 4, 1〉 forms the same cycle as the paths 〈2, 4, 1, 2〉 and 〈4, 1, 2, 4〉. This cycle is simple, but the cycle 〈1, 2, 4, 5, 4, 1〉 is not. The cycle 〈2, 2〉 formed by the edge (2, 2) is a self-loop. A directed graph with no self-loops is simple. In an undirected graph, a path 〈v0, v1, … , vk〉 forms a cycle if k > 0, v0 = vk, and all edges on the path are distinct. The cycle is simple if v1, v2, … , vk are distinct. For example, in Figure B.2(b), the path 〈1, 2, 5, 1〉 is a simple cycle. A graph with no simple cycles is acyclic.

An undirected graph is connected if every vertex is reachable from all

other vertices. The connected components of an undirected graph are the

equivalence classes of vertices under the “is reachable from” relation.

The graph shown in Figure B.2(b) has three connected components: {1, 2, 5}, {3, 6}, and {4}. Every vertex in the connected component {1, 2,

5} is reachable from every other vertex in {1, 2, 5}. Equivalently, an undirected graph is connected if and only if it has exactly one connected component. The edges of a

connected component are those that are incident on only the vertices of

the component. In other words, edge ( u, v) is an edge of a connected component only if both u and v are vertices of the component.
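Computing the connected components of an undirected graph is a direct application of the "is reachable from" relation. A minimal sketch via repeated graph search, using the edge list described for Figure B.2(b):

```python
def connected_components(vertices, edges):
    """Equivalence classes of "is reachable from", via repeated search."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)            # undirected: record each edge both ways
    seen, components = set(), []
    for s in vertices:
        if s in seen:
            continue
        comp, frontier = set(), [s]
        while frontier:          # explore everything reachable from s
            u = frontier.pop()
            if u not in comp:
                comp.add(u)
                frontier.extend(adj[u] - comp)
        seen |= comp
        components.append(comp)
    return components

# The graph of Figure B.2(b): components {1, 2, 5}, {3, 6}, and {4}.
print(connected_components(range(1, 7), [(1, 2), (2, 5), (1, 5), (3, 6)]))
```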

A directed graph is strongly connected if every two vertices are

reachable from each other. The strongly connected components of a

directed graph are the equivalence classes of vertices under the “are

mutually reachable” relation. A directed graph is strongly connected if it

has only one strongly connected component. The graph in Figure B.2(a)

has three strongly connected components: {1, 2, 4, 5}, {3}, and {6}. All

pairs of vertices in {1, 2, 4, 5} are mutually reachable. The vertices {3,

6} do not form a strongly connected component, since vertex 6 cannot

be reached from vertex 3.
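Strongly connected components can be computed straight from the definition by intersecting reachability sets, as in the sketch below. This brute-force method is quadratic; Kosaraju's or Tarjan's algorithm achieves linear time. The full edge set for Figure B.2(a) is an assumption read off the figure.

```python
def reachable(adj, s):
    """Set of vertices reachable from s by a depth-first search."""
    seen, stack = set(), [s]
    while stack:
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return seen

def strongly_connected_components(vertices, edges):
    """Equivalence classes of mutual reachability, by brute force."""
    vertices = list(vertices)
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
    reach = {v: reachable(adj, v) for v in vertices}
    comps = []
    for v in vertices:
        if not any(v in c for c in comps):
            comps.append({u for u in vertices
                          if u in reach[v] and v in reach[u]})
    return comps

# Figure B.2(a) edges (an assumption read off the figure): the SCCs are
# {1, 2, 4, 5}, {3}, and {6}.
edges = [(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)]
print(strongly_connected_components(range(1, 7), edges))
```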

Two graphs G = (V, E) and G′ = (V′, E′) are isomorphic if there exists a bijection f : V → V′ such that (u, v) ∈ E if and only if (f (u), f (v)) ∈

E′. In other words, G and G′ are isomorphic if the vertices of G can be relabeled to be vertices of G′, maintaining the corresponding edges in G

and G′. Figure B.3(a) shows a pair of isomorphic graphs G and G′ with respective vertex sets V = {1, 2, 3, 4, 5, 6} and V′ = { u, v, w, x, y, z}. The mapping from V to V′ given by f (1) = u, f (2) = v, f (3) = w, f (4) = x, f (5) = y, f (6) = z provides the required bijective function. The graphs in

Figure B.3(b) are not isomorphic. Although both graphs have 5 vertices and 7 edges, the top graph has a vertex of degree 4 and the bottom

graph does not.
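Comparing degree sequences gives a cheap necessary (though not sufficient) condition for isomorphism, which is exactly the argument used for Figure B.3(b). The two graphs below are hypothetical stand-ins with 5 vertices and 7 edges each, not the figure's exact graphs:

```python
from collections import Counter

def degree_sequence(vertices, edges):
    """Sorted list of vertex degrees in an undirected graph."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg[v] for v in vertices)

# Two hypothetical 5-vertex, 7-edge graphs: the first has a degree-4
# vertex, the second does not, so they cannot be isomorphic even though
# |V| and |E| agree.
g1 = [(1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (3, 4), (4, 5)]
g2 = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1), (1, 3), (2, 4)]
print(degree_sequence(range(1, 6), g1))   # [2, 2, 3, 3, 4]
print(degree_sequence(range(1, 6), g2))   # [2, 3, 3, 3, 3]
```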

We say that a graph G′ = ( V′, E′) is a subgraph of G = ( V, E) if V′ ⊆

V and E′ ⊆ E. Given a set V′ ⊆ V, the subgraph of G induced by V′ is the graph G′ = ( V′, E′), where

E′ = {(u, v) ∈ E : u, v ∈ V′}.

The subgraph induced by the vertex set {1, 2, 3, 6} in Figure B.2(a)

appears in Figure B.2(c) and has the edge set {(1, 2), (2, 2), (6, 3)}.
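The induced-subgraph definition is a one-line filter. The sketch below reproduces the example: inducing on {1, 2, 3, 6} keeps exactly the edges {(1, 2), (2, 2), (6, 3)}. The full edge list of Figure B.2(a) is an assumption read off the figure.

```python
def induced_subgraph(edges, vertex_subset):
    """Keep exactly the edges whose endpoints both lie in the subset:
    E' = {(u, v) in E : u, v in V'}."""
    vs = set(vertex_subset)
    return [(u, v) for (u, v) in edges if u in vs and v in vs]

# Figure B.2(a)'s edges (an assumption read off the figure). Inducing on
# {1, 2, 3, 6} keeps {(1, 2), (2, 2), (6, 3)}, matching Figure B.2(c).
edges = [(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)]
print(induced_subgraph(edges, {1, 2, 3, 6}))   # [(1, 2), (2, 2), (6, 3)]
```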

Given an undirected graph G = ( V, E), the directed version of G is the directed graph G′ = ( V, E′), where ( u, v) ∈ E′ if and only if ( u, v) ∈ E.

That is, each undirected edge ( u, v) in G turns into two directed edges, ( u, v) and ( v, u), in the directed version. Given a directed graph G = ( V,


E), the undirected version of G is the undirected graph G′ = (V, E′), where (u, v) ∈ E′ if and only if u ≠ v and E contains at least one of the edges (u, v) and (v, u). That is, the undirected version contains the edges of G “with their directions removed” and with self-loops eliminated. (Since (u, v) and (v, u) are the same edge in an undirected graph, the undirected version of a directed graph contains it only once, even if the directed graph contains both edges (u, v) and (v, u).) In a directed graph G = (V, E), a neighbor of a vertex u is any vertex that is adjacent to u in the undirected version of G. That is, v is a neighbor of u if u ≠ v and either (u, v) ∈ E or (v, u) ∈ E. In an undirected graph, u and v are neighbors if they are adjacent.
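Both conversions are short set transformations. This sketch stores an undirected edge as a sorted pair, which assumes comparable vertex labels:

```python
def directed_version(edges):
    """Each undirected edge (u, v) becomes the directed edges (u, v), (v, u)."""
    return {(u, v) for u, v in edges} | {(v, u) for u, v in edges}

def undirected_version(edges):
    """Drop directions and self-loops; (u, v) and (v, u) collapse into one
    edge, stored as a sorted pair (assumes comparable vertex labels)."""
    return {(min(u, v), max(u, v)) for u, v in edges if u != v}

print(directed_version({(1, 2)}))                     # {(1, 2), (2, 1)}
print(undirected_version({(1, 2), (2, 1), (2, 2)}))   # {(1, 2)}
```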

Figure B.3 (a) A pair of isomorphic graphs. The vertices of the top graph are mapped to the vertices of the bottom graph by f (1) = u, f (2) = v, f (3) = w, f (4) = x, f (5) = y, f (6) = z. (b) Two graphs that are not isomorphic. The top graph has a vertex of degree 4, and the bottom graph does not.

Several kinds of graphs have special names. A complete graph is an

undirected graph in which every pair of vertices is adjacent. An

undirected graph G = (V, E) is bipartite if V can be partitioned into two sets V1 and V2 such that (u, v) ∈ E implies either u ∈ V1 and v ∈ V2 or u ∈ V2 and v ∈ V1. That is, all edges go between the two sets V1 and V2. An acyclic, undirected graph is a forest, and a connected, acyclic, undirected graph is a (free) tree (see Section B.5). We often take the first letters of “directed acyclic graph” and call such a graph a dag.

There are two variants of graphs that you may occasionally

encounter. A multigraph is like an undirected graph, but it can have


both multiple edges between vertices (such as two distinct edges ( u, v) and ( u, v)) and self-loops. A hypergraph is like an undirected graph, but each hyperedge, rather than connecting two vertices, connects an

arbitrary subset of vertices. Many algorithms written for ordinary

directed and undirected graphs can be adapted to run on these

graphlike structures.

The contraction of an undirected graph G = (V, E) by an edge e = (u, v) is a graph G′ = (V′, E′), where V′ = (V − {u, v}) ∪ {x} and x is a new vertex. The set of edges E′ is formed from E by deleting the edge (u, v) and, for each vertex w adjacent to u or v, deleting whichever of (u, w) and (v, w) belongs to E and adding the new edge (x, w). In effect, u and v are “contracted” into a single vertex.
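Edge contraction can be sketched as follows, representing each undirected edge as a frozenset so that the edges (u, w) and (v, w) automatically collapse into the single edge (x, w). The function assumes a simple graph.

```python
def contract(vertices, edges, e, x):
    """Contract undirected edge e = {u, v} of a simple graph into the
    new vertex x. Edges are frozensets, so duplicates merge for free."""
    u, v = tuple(e)
    new_vertices = (set(vertices) - {u, v}) | {x}
    new_edges = set()
    for edge in edges:
        if edge == e:
            continue                      # delete the contracted edge itself
        a, b = tuple(edge)
        a = x if a in (u, v) else a       # redirect endpoints u, v to x
        b = x if b in (u, v) else b
        new_edges.add(frozenset({a, b}))
    return new_vertices, new_edges

# Contracting one edge of a triangle leaves a single edge between x and 3.
V = {1, 2, 3}
E = {frozenset({1, 2}), frozenset({2, 3}), frozenset({1, 3})}
print(contract(V, E, frozenset({1, 2}), 'x'))
```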

Exercises

B.4-1

Attendees of a faculty party shake hands to greet each other, with every

pair of professors shaking hands one time. Each professor remembers

the number of times he or she shook hands. At the end of the party, the

department head asks the professors for their totals and adds them all

up. Show that the result is even by proving the handshaking lemma: if G = (V, E) is an undirected graph, then

    Σ_{v ∈ V} degree(v) = 2 |E|.

B.4-2

Show that if a directed or undirected graph contains a path between two

vertices u and v, then it contains a simple path between u and v. Show that if a directed graph contains a cycle, then it contains a simple cycle.

B.4-3

Show that any connected, undirected graph G = ( V, E) satisfies | E| ≥ | V |

− 1.

B.4-4

Verify that in an undirected graph, the “is reachable from” relation is an

equivalence relation on the vertices of the graph. Which of the three

properties of an equivalence relation hold in general for the “is reachable from” relation on the vertices of a directed graph?

B.4-5

What is the undirected version of the directed graph in Figure B.2(a)?

What is the directed version of the undirected graph in Figure B.2(b)?

B.4-6

Show how a bipartite graph can represent a hypergraph by letting

incidence in the hypergraph correspond to adjacency in the bipartite

graph. ( Hint: Let one set of vertices in the bipartite graph correspond to

vertices of the hypergraph, and let the other set of vertices of the

bipartite graph correspond to hyperedges.)

B.5 Trees

As with graphs, there are many related, but slightly different, notions of

trees. This section presents definitions and mathematical properties of

several kinds of trees. Sections 10.3 and 20.1 describe how to represent trees in computer memory.

B.5.1 Free trees

As defined in Section B.4, a free tree is a connected, acyclic, undirected graph. We often omit the adjective “free” when we say that a graph is a

tree. If an undirected graph is acyclic but possibly disconnected, it is a

forest. Many algorithms that work for trees also work for forests. Figure

B.4(a) shows a free tree, and Figure B.4(b) shows a forest. The forest in

Figure B.4(b) is not a tree because it is not connected. The graph in

Figure B.4(c) is connected but neither a tree nor a forest, because it contains a cycle.

The following theorem captures many important facts about free

trees.

Theorem B.2 (Properties of free trees)


Figure B.4 (a) A free tree. (b) A forest. (c) A graph that contains a cycle and is therefore neither a tree nor a forest.

Let G = ( V, E) be an undirected graph. The following statements are equivalent.

1. G is a free tree.

2. Any two vertices in G are connected by a unique simple path.

3. G is connected, but if any edge is removed from E, the resulting

graph is disconnected.

4. G is connected, and | E| = | V | − 1.

5. G is acyclic, and | E| = | V | − 1.

6. G is acyclic, but if any edge is added to E, the resulting graph contains a cycle.

Figure B.5 A step in the proof of Theorem B.2: if (1) G is a free tree, then (2) any two vertices in G are connected by a unique simple path. Assume for the sake of contradiction that vertices u and v are connected by two distinct simple paths. These paths first diverge at vertex w, and they first reconverge at vertex z. The path p′ concatenated with the reverse of the path p″ forms a cycle, which yields the contradiction.

Proof (1) ⇒ (2): Since a tree is connected, any two vertices in G are connected by at least one simple path. Suppose for the sake of

contradiction that vertices u and v are connected by two distinct simple paths, as shown in Figure B.5. Let w be the vertex at which the paths first diverge. That is, if we call the paths p1 and p2, then w is the first vertex on both p1 and p2 whose successor on p1 is x and whose successor on p2 is y, where x ≠ y. Let z be the first vertex at which the paths reconverge, that is, z is the first vertex following w on p1 that is also on p2. Let p′ be the subpath of p1 from w through x to z, and let p″ be the subpath of p2 from w through y to z. Paths p′ and p″ share no vertices except their endpoints. Then, as Figure B.5 shows, the path obtained by concatenating p′ and the reverse of p″ is a cycle, which contradicts our assumption that G is a tree. Thus, if G is a tree, there can be at most one simple path between two vertices.

(2) ⇒ (3): If any two vertices in G are connected by a unique simple

path, then G is connected. Let ( u, v) be any edge in E. This edge is a path from u to v, and so it must be the unique path from u to v. If ( u, v) were to be removed from G, there would be no path from u to v, and G

would be disconnected.

(3) ⇒ (4): By assumption, the graph G is connected, so Exercise B.4-3

gives that | E| ≥ | V| − 1. We prove | E| ≤ | V| − 1 by induction on | V|. The base cases are when | V| = 1 or | V| = 2, and in either case, | E| = | V| − 1.

For the inductive step, suppose that | V| ≥ 3 for graph G and that any graph G′ = ( V′, E′), where | V′| < | V|, that satisfies (3) also satisfies | E′| ≤

| V′| − 1. Removing an arbitrary edge from G separates the graph into k

≥ 2 connected components (actually k = 2). Each component satisfies

(3), or else G would not satisfy (3). Consider each connected component

Vi, with edge set Ei, as a separate free tree. Then, because each connected component has fewer than | V| vertices, the inductive

hypothesis implies that | Ei| ≤ | Vi| − 1. Thus, the number of edges in all k connected components combined is at most | V| − k ≤ | V| − 2. Adding in the removed edge yields | E| ≤ | V| − 1.

(4) ⇒ (5): Suppose that G is connected and that | E| = | V| − 1. We must show that G is acyclic. Suppose that G has a cycle containing k

vertices v1, v2, … , vk, and without loss of generality assume that this cycle is simple. Let Gk = (Vk, Ek) be the subgraph of G consisting of the cycle, so that |Vk| = |Ek| = k. If k < |V|, then because G is connected, there must be a vertex vk+1 ∈ V − Vk that is adjacent to some vertex vi ∈ Vk. Define Gk+1 = (Vk+1, Ek+1) to be the subgraph of G with Vk+1 = Vk ∪ {vk+1} and Ek+1 = Ek ∪ {(vi, vk+1)}. Note that |Vk+1| = |Ek+1| = k + 1. If k + 1 < |V|, then continue, defining Gk+2 in the same manner, and so forth, until we obtain Gn = (Vn, En), where n = |V|, Vn = V, and |En| = |Vn| = |V|. Since Gn is a subgraph of G, we have En ⊆ E, and hence |E| ≥ |En| = |Vn| = |V|, which contradicts the assumption that |E| = |V| − 1. Thus, G is acyclic.

(5) ⇒ (6): Suppose that G is acyclic and that | E| = | V| − 1. Let k be the number of connected components of G. Each connected component

is a free tree by definition, and since (1) implies (5), the sum of all edges

in all connected components of G is | V| − k. Consequently, k must equal 1, and G is in fact a tree. Since (1) implies (2), any two vertices in G are connected by a unique simple path. Thus, adding any edge to G creates

a cycle.

(6) ⇒ (1): Suppose that G is acyclic but that adding any edge to E

creates a cycle. We must show that G is connected. Let u and v be arbitrary vertices in G. If u and v are not already adjacent, adding the edge ( u, v) creates a cycle in which all edges but ( u, v) belong to G. Thus, the cycle minus edge ( u, v) must contain a path from u to v, and since u and v were chosen arbitrarily, G is connected.

B.5.2 Rooted and ordered trees

A rooted tree is a free tree in which one of the vertices is distinguished

from the others. We call the distinguished vertex the root of the tree. We

often refer to a vertex of a rooted tree as a node⁵ of the tree. Figure

B.6(a) shows a rooted tree on a set of 12 nodes with root 7.


Figure B.6 Rooted and ordered trees. (a) A rooted tree with height 4. The tree is drawn in a standard way: the root (node 7) is at the top, its children (nodes with depth 1) are beneath it, their children (nodes with depth 2) are beneath them, and so forth. If the tree is ordered, the relative left-to-right order of the children of a node matters; otherwise, it doesn’t. (b) Another rooted tree. As a rooted tree, it is identical to the tree in (a), but as an ordered tree it is different, since the children of node 3 appear in a different order.

Consider a node x in a rooted tree T with root r. We call any node y on the unique simple path from r to x an ancestor of x. If y is an ancestor of x, then x is a descendant of y. (Every node is both an ancestor and a descendant of itself.) If y is an ancestor of x and xy, then y is a proper ancestor of x and x is a proper descendant of y. The subtree rooted at x is the tree induced by descendants of x, rooted at x.

For example, the subtree rooted at node 8 in Figure B.6(a) contains nodes 8, 6, 5, and 9.

If the last edge on the simple path from the root r of a tree T to a

node x is ( y, x), then y is the parent of x, and x is a child of y. The root is the only node in T with no parent. If two nodes have the same parent,

they are siblings. A node with no children is a leaf or external node. A nonleaf node is an internal node.

The number of children of a node x in a rooted tree T is the degree of x.⁶ The length of the simple path from the root r to a node x is the depth of x in T. A level of a tree consists of all nodes at the same depth.

The height of a node in a tree is the number of edges on the longest simple downward path from the node to a leaf, and the height of a tree

is the height of its root. The height of a tree is also equal to the largest

depth of any node in the tree.
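The depth and height definitions can be written as recursions over a child map, as in this sketch (the tree below is a small hypothetical example, not the one in Figure B.6):

```python
def height(tree, root):
    """Height of a node: edges on the longest downward path to a leaf."""
    children = tree.get(root, [])
    if not children:
        return 0                      # a leaf has height 0
    return 1 + max(height(tree, c) for c in children)

def depths(tree, root, d=0, out=None):
    """Depth of each node: length of the simple path from the root."""
    if out is None:
        out = {}
    out[root] = d
    for c in tree.get(root, []):
        depths(tree, c, d + 1, out)
    return out

# A small hypothetical rooted tree given as a child map.
t = {7: [3, 10], 3: [1, 5], 10: [9]}
print(height(t, 7))                   # 2
print(max(depths(t, 7).values()))     # 2: the height of a tree equals
                                      # the largest depth of any node
```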

An ordered tree is a rooted tree in which the children of each node are ordered. That is, if a node has k children, then there is a first child, a

second child, and so on, up to and including a k th child. The two trees

in Figure B.6 are different when considered to be ordered trees, but the same when considered to be just rooted trees.

B.5.3 Binary and positional trees

We define binary trees recursively. A binary tree T is a structure defined on a finite set of nodes that either

contains no nodes, or

is composed of three disjoint sets of nodes: a root node, a binary

tree called its left subtree, and a binary tree called its right subtree.

The binary tree that contains no nodes is called the empty tree or null

tree, sometimes denoted NIL. If the left subtree is nonempty, its root is

called the left child of the root of the entire tree. Likewise, the root of a

nonnull right subtree is the right child of the root of the entire tree. If a

subtree is the null tree NIL, we say that the child is absent or missing.

Figure B.7(a) shows a binary tree.

A binary tree is not simply an ordered tree in which each node has

degree at most 2. For example, in a binary tree, if a node has just one

child, the position of the child—whether it is the left child or the right

child—matters. In an ordered tree, there is no distinguishing a sole child

as being either left or right. Figure B.7(b) shows a binary tree that differs from the tree in Figure B.7(a) because of the position of one node. Considered as ordered trees, however, the two trees are identical.

One way to represent the positioning information in a binary tree is

by the internal nodes of an ordered tree, as shown in Figure B.7(c). The idea is to replace each missing child in the binary tree with a node

having no children. These leaf nodes are drawn as squares in the figure.

The tree that results is a full binary tree: each node is either a leaf or has

degree exactly 2. No nodes have degree 1. Consequently, the order of

the children of a node preserves the position information.


Figure B.7 Binary trees. (a) A binary tree drawn in a standard way. The left child of a node is drawn beneath the node and to the left. The right child is drawn beneath and to the right. (b) A binary tree different from the one in (a). In (a), the left child of node 7 is 5 and the right child is absent. In (b), the left child of node 7 is absent and the right child is 5. As ordered trees, these trees are the same, but as binary trees, they are distinct. (c) The binary tree in (a) represented by the internal nodes of a full binary tree: an ordered tree in which each internal node has degree 2.

The leaves in the tree are shown as squares.

The positioning information that distinguishes binary trees from

ordered trees extends to trees with more than two children per node. In

a positional tree, the children of a node are labeled with distinct positive

integers. The i th child of a node is absent if no child is labeled with integer i. A k-ary tree is a positional tree in which for every node, all children with labels greater than k are missing. Thus, a binary tree is a

k-ary tree with k = 2.

A complete k-ary tree is a k-ary tree in which all leaves have the same depth and all internal nodes have degree k. Figure B.8 shows a complete binary tree of height 3. How many leaves does a complete k-ary tree of height h have? The root has k children at depth 1, each of which has k children at depth 2, and so forth. Thus, the number of nodes at depth d is k^d. In a complete k-ary tree with height h, the leaves are at depth h, so that there are k^h leaves. Consequently, the height of a complete k-ary tree with n leaves is log_k n. A complete k-ary tree of height h has

    1 + k + k² + ⋯ + k^(h−1) = (k^h − 1)/(k − 1)

internal nodes. Thus, a complete binary tree has 2^h − 1 internal nodes.

Figure B.8 A complete binary tree of height 3 with 8 leaves and 7 internal nodes.
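These counts are easy to verify numerically. A minimal sketch of the leaf and internal-node counts for a complete k-ary tree:

```python
def complete_kary_counts(k, h):
    """(leaves, internal nodes) of a complete k-ary tree of height h."""
    leaves = k ** h                        # all leaves sit at depth h
    internal = (k ** h - 1) // (k - 1)     # sum of k^d for d = 0, ..., h-1
    return leaves, internal

# A complete binary tree of height 3 has 8 leaves and 2^3 - 1 = 7 internal
# nodes, as in Figure B.8; a complete ternary tree of height 2 has 9 and 4.
print(complete_kary_counts(2, 3))   # (8, 7)
print(complete_kary_counts(3, 2))   # (9, 4)
```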

Exercises

B.5-1

Draw all the free trees composed of the three vertices x, y, and z. Draw all the rooted trees with nodes x, y, and z with x as the root. Draw all the ordered trees with nodes x, y, and z with x as the root. Draw all the binary trees with nodes x, y, and z with x as the root.

B.5-2

Let G = (V, E) be a directed acyclic graph in which there is a vertex v0 ∈ V such that there exists a unique path from v0 to every vertex v ∈ V. Prove that the undirected version of G forms a tree.

B.5-3

Show by induction that the number of degree-2 nodes in any nonempty

binary tree is one less than the number of leaves. Conclude that the

number of internal nodes in a full binary tree is one less than the

number of leaves.

B.5-4

Prove that for any integer k ≥ 1, there is a full binary tree with k leaves.

B.5-5

Use induction to show that a nonempty binary tree with n nodes has

height at least ⌊lg n⌋.

B.5-6

The internal path length of a full binary tree is the sum, taken over all

internal nodes of the tree, of the depth of each node. Likewise, the

external path length is the sum, taken over all leaves of the tree, of the

depth of each leaf. Consider a full binary tree with n internal nodes, internal path length i, and external path length e. Prove that e = i + 2 n.

B.5-7

Associate a “weight” w(x) = 2^(−d) with each leaf x of depth d in a binary tree T, and let L be the set of leaves of T. Prove the Kraft inequality: Σ_{x ∈ L} w(x) ≤ 1.

B.5-8

Show that if L ≥ 2, then every binary tree with L leaves contains a subtree having between L/3 and 2 L/3 leaves, inclusive.

Problems

B-1 Graph coloring

A k-coloring of undirected graph G = ( V, E) is a function c : V → {1, 2,

… , k} such that c( u) ≠ c( v) for every edge ( u, v) ∈ E. In other words, the numbers 1, 2, … , k represent the k colors, and adjacent vertices must

have different colors.

a. Show that any tree is 2-colorable.

b. Show that the following are equivalent:

1. G is bipartite.

2. G is 2-colorable.

3. G has no cycles of odd length.

c. Let d be the maximum degree of any vertex in a graph G. Prove that G

can be colored with d + 1 colors.

d. Show that if G has O(|V|) edges, then G can be colored with O(√|V|) colors.

B-2 Friendly graphs

Reword each of the following statements as a theorem about undirected

graphs, and then prove it. Assume that friendship is symmetric but not

reflexive.

a. Any group of at least two people contains at least two people with the

same number of friends in the group.

b. Every group of six people contains either at least three mutual friends

or at least three mutual strangers.

c. Any group of people can be partitioned into two subgroups such that

at least half the friends of each person belong to the subgroup of

which that person is not a member.

d. If everyone in a group is the friend of at least half the people in the

group, then the group can be seated around a table in such a way that

everyone is seated between two friends.

B-3 Bisecting trees

Many divide-and-conquer algorithms that operate on graphs require

that the graph be bisected into two nearly equal-sized subgraphs, which

are induced by a partition of the vertices. This problem investigates

bisections of trees formed by removing a small number of edges. We

require that whenever two vertices end up in the same subtree after

removing edges, then they must belong to the same partition.

a. Show that the vertices of any n-vertex binary tree can be partitioned into two sets A and B, such that | A| ≤ 3 n/4 and | B| ≤ 3 n/4, by removing a single edge.

b. Show that the constant 3/4 in part (a) is optimal in the worst case by

giving an example of a simple binary tree whose most evenly balanced

partition upon removal of a single edge has | A| = 3 n/4.

c. Show that by removing at most O(lg n) edges, we can partition the vertices of any n-vertex binary tree into two sets A and B such that | A|

= ⌊n/2⌋ and |B| = ⌈n/2⌉.

Appendix notes

G. Boole pioneered the development of symbolic logic, and he

introduced many of the basic set notations in a book published in 1854.

Modern set theory was created by G. Cantor during the period 1874–

1895. Cantor focused primarily on sets of infinite cardinality. The term

“function” is attributed to G. W. Leibniz, who used it to refer to several

kinds of mathematical formulas. His limited definition has been

generalized many times. Graph theory originated in 1736, when L.

Euler proved that it was impossible to cross each of the seven bridges in

the city of Königsberg exactly once and return to the starting point.

The book by Harary [208] provides a useful compendium of many definitions and results from graph theory.

1 A variation of a set, which can contain the same object more than once, is called a multiset.

2 Some authors start the natural numbers with 1 instead of 0. The modern trend seems to be to start with 0.

3 To be precise, in order for the “fit inside” relation to be a partial order, we need to view a box as fitting inside itself.

4 Some authors refer to what we call a path as a “walk” and to what we call a simple path as just a “path.”

5 The term “node” is often used in the graph theory literature as a synonym for “vertex.” We reserve the term “node” to mean a vertex of a rooted tree.

6 The degree of a node depends on whether we consider T to be a rooted tree or a free tree. The degree of a vertex in a free tree is, as in any undirected graph, the number of adjacent vertices. In a rooted tree, however, the degree is the number of children—the parent of a node does not count toward its degree.

C Counting and Probability

This appendix reviews elementary combinatorics and probability theory.

If you have a good background in these areas, you may want to skim the

beginning of this appendix lightly and concentrate on the later sections.

Most of this book’s chapters do not require probability, but for some

chapters it is essential.

Section C.1 reviews elementary results in counting theory, including standard formulas for counting permutations and combinations. The

axioms of probability and basic facts concerning probability

distributions form Section C.2. Random variables are introduced in

Section C.3, along with the properties of expectation and variance.

Section C.4 investigates the geometric and binomial distributions that arise from studying Bernoulli trials. The study of the binomial

distribution continues in Section C.5, an advanced discussion of the

“tails” of the distribution.

C.1 Counting

Counting theory tries to answer the question “How many?” without

actually enumerating all the choices. For example, you might ask, “How

many different n-bit numbers are there?” or “How many orderings of n

distinct elements are there?” This section reviews the elements of

counting theory. Since some of the material assumes a basic

understanding of sets, you might wish to start by reviewing the material

in Section B.1.

Rules of sum and product

We can sometimes express a set of items that we wish to count as a

union of disjoint sets or as a Cartesian product of sets.

The rule of sum says that the number of ways to choose one element

from one of two disjoint sets is the sum of the cardinalities of the sets.

That is, if A and B are two finite sets with no members in common, then

| AB| = | A| + | B|, which follows from equation (B.3) on page 1156. For example, if each position on a car’s license plate is a letter or a digit, then the number of possibilities for each position is 26 + 10 = 36, since

there are 26 choices if it is a letter and 10 choices if it is a digit.

The rule of product says that the number of ways to choose an

ordered pair is the number of ways to choose the first element times the

number of ways to choose the second element. That is, if A and B are

two finite sets, then | A × B| = | A|·| B|, which is simply equation (B.4) on page 1157. For example, if an ice-cream parlor offers 28 flavors of ice

cream and four toppings, the number of possible sundaes with one

scoop of ice cream and one topping is 28 · 4 = 112.

Strings

A string over a finite set S is a sequence of elements of S. For example, there are eight binary strings of length 3:

000, 001, 010, 011, 100, 101, 110, 111.

(Here we use the shorthand of omitting the angle brackets when

denoting a sequence.) We sometimes call a string of length k a k-string.

A substring s′ of a string s is an ordered sequence of consecutive elements of s. A k-substring of a string is a substring of length k. For example, 010 is a 3-substring of 01101001 (the 3-substring that begins in

position 4), but 111 is not a substring of 01101001.

We can view a k-string over a set S as an element of the Cartesian product S^k of k-tuples, which means that there are |S|^k strings of length k. For example, the number of binary k-strings is 2^k. Intuitively, to construct a k-string over an n-set, there are n ways to pick the first element; for each of these choices, there are n ways to pick the second element; and so forth k times. This construction leads to the k-fold product n · n ⋯ n = n^k as the number of k-strings.
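The correspondence between k-strings and the Cartesian product S^k can be checked with itertools, as in this sketch:

```python
from itertools import product

S = {'a', 'b'}                         # a 2-element alphabet
k = 3
strings = list(product(S, repeat=k))   # every k-string over S as a k-tuple
print(len(strings), len(S) ** k)       # 8 8: there are |S|^k k-strings
```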

Permutations

A permutation of a finite set S is an ordered sequence of all the elements of S, with each element appearing exactly once. For example, if S = { a, b, c}, then S has 6 permutations:

abc, acb, bac, bca, cab, cba.

(Again, we use the shorthand of omitting the angle brackets when

denoting a sequence.) There are n! permutations of a set of n elements, since there are n ways to choose the first element of the sequence, n − 1

ways for the second element, n − 2 ways for the third, and so on.

A k-permutation of S is an ordered sequence of k elements of S, with no element appearing more than once in the sequence. (Thus, an

ordinary permutation is an n-permutation of an n-set.) Here are the 2-

permutations of the set { a, b, c, d}:

ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc.

The number of k-permutations of an n-set is

    n(n − 1)(n − 2) ⋯ (n − k + 1) = n!/(n − k)!,    (C.1)

since there are n ways to choose the first element, n − 1 ways to choose the second element, and so on, until k elements are chosen, with the last element chosen from the remaining n − k + 1 elements. For the above example, with n = 4 and k = 2, the formula (C.1) evaluates to 4!/2! = 12, matching the number of 2-permutations listed.
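Formula (C.1) can be checked against a direct enumeration, as in this sketch:

```python
from itertools import permutations
from math import factorial

n, k = 4, 2
perms = list(permutations('abcd', k))      # the 2-permutations of a 4-set
print(len(perms))                          # 12
print(factorial(n) // factorial(n - k))    # 12, matching formula (C.1)
```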

Combinations

A k-combination of an n-set S is simply a k-subset of S. For example, the 4-set { a, b, c, d} has six 2-combinations:

ab, ac, ad, bc, bd, cd.


(Here we use the shorthand of omitting the braces around each subset.)

To construct a k-combination of an n-set, choose k distinct (different) elements from the n-set. The order of selecting the elements does not matter.

We can express the number of k-combinations of an n-set in terms of the number of k-permutations of an n-set. Every k-combination has exactly k! permutations of its elements, each of which is a distinct k-permutation of the n-set. Thus the number of k-combinations of an n-set is the number of k-permutations divided by k!. From equation (C.1), this quantity is

    n! / (k! (n − k)!).    (C.2)

For k = 0, this formula tells us that the number of ways to choose 0 elements from an n-set is 1 (not 0), since 0! = 1.
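The same enumeration check works for combinations; the count is the k-permutation count divided by k!:

```python
from itertools import combinations
from math import factorial

n, k = 4, 2
combs = list(combinations('abcd', k))      # the 2-combinations of a 4-set
print(len(combs))                          # 6: ab, ac, ad, bc, bd, cd
print(factorial(n) // (factorial(k) * factorial(n - k)))   # 6, by (C.2)
```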

Binomial coefficients

The notation C(n, k) (read “n choose k”) denotes the number of k-combinations of an n-set. Equation (C.2) gives

    C(n, k) = n! / (k! (n − k)!).    (C.3)

This formula is symmetric in k and n − k:

    C(n, k) = C(n, n − k).

These numbers are also known as binomial coefficients, due to their appearance in the binomial theorem:

    (x + y)^n = Σ_{k=0}^{n} C(n, k) x^k y^(n−k),    (C.4)

where n ∈ ℕ and x, y ∈ ℝ. The right-hand side of equation (C.4) is called the binomial expansion of the left-hand side. A special case of the binomial theorem occurs when x = y = 1:

    2^n = Σ_{k=0}^{n} C(n, k).

This formula corresponds to counting the 2^n binary n-strings by the number of 1s they contain: C(n, k) binary n-strings contain exactly k 1s, since there are C(n, k) ways to choose k out of the n positions in which to place the 1s. Many identities involve binomial coefficients. The exercises at the end of this section give you the opportunity to prove a few.
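Both the binomial theorem and its x = y = 1 special case are easy to spot-check numerically, as in this sketch (the values of n, x, and y are arbitrary choices):

```python
from math import comb

n, x, y = 5, 3, 2                     # arbitrary small test values
lhs = (x + y) ** n
rhs = sum(comb(n, k) * x**k * y**(n - k) for k in range(n + 1))
print(lhs == rhs)                     # True: the binomial theorem
print(sum(comb(n, k) for k in range(n + 1)) == 2 ** n)   # True: x = y = 1
```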

Binomial bounds

You sometimes need to bound the size of a binomial coefficient. For 1 ≤ k ≤ n, we have the lower bound

Taking advantage of the inequality k! ≥ ( k/ e)^k derived from Stirling’s approximation (3.25) on page 67, we obtain the upper bounds

For all integers k such that 0 ≤ k ≤ n, you can use induction (see Exercise C.1-12) to prove the bound

where for convenience we assume that 0^0 = 1. For k = λn, where 0 ≤ λ ≤ 1, we can rewrite this bound as

where

is the (binary) entropy function and where, for convenience, we assume

that 0 lg 0 = 0, so that H(0) = H(1) = 0.
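These bounds — ( n/ k)^k ≤ ( n choose k) ≤ ( en/ k)^k, and the entropy bound ( n choose k) ≤ 2^{nH(k/n)} — can be verified numerically for small n (a sketch; the function names H and bounds_hold are mine, and the tiny tolerance guards against floating-point rounding):

```python
from math import comb, e, log2

def H(lam):
    """Binary entropy H(lam) = -lam lg lam - (1 - lam) lg (1 - lam), with 0 lg 0 = 0."""
    if lam in (0, 1):
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)

def bounds_hold(n):
    for k in range(1, n + 1):
        c = comb(n, k)
        if not ((n / k) ** k <= c <= (e * n / k) ** k):
            return False
    for k in range(n + 1):
        # Entropy bound: comb(n, k) <= 2^(n H(k/n)), with 0^0 = 1 convention.
        if comb(n, k) > 2 ** (n * H(k / n)) * (1 + 1e-12):
            return False
    return True
```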

Exercises

C.1-1

How many k-substrings does an n-string have? (Consider identical k-

substrings at different positions to be different.) How many substrings

does an n-string have in total?

C.1-2

An n-input, m-output boolean function is a function from {0, 1}^n to {0, 1}^m. How many n-input, 1-output boolean functions are there? How many n-input, m-output boolean functions are there?

C.1-3

In how many ways can n professors sit around a circular conference

table? Consider two seatings to be the same if one can be rotated to

form the other.

C.1-4

In how many ways is it possible to choose three distinct numbers from

the set {1, 2, … , 99} so that their sum is even?

C.1-5

Prove the identity

for 0 < k ≤ n.

C.1-6

Prove the identity

for 0 ≤ k < n.

C.1-7

To choose k objects from n, you can make one of the objects distinguished and consider whether the distinguished object is chosen.

Use this approach to prove that

C.1-8

Using the result of Exercise C.1-7, make a table for n = 0, 1, … , 6 and 0 ≤ k ≤ n of the binomial coefficients, with (0 choose 0) at the top, (1 choose 0) and (1 choose 1) on the next line, then (2 choose 0), (2 choose 1), and (2 choose 2), and so forth. Such a table of binomial

coefficients is called Pascal’s triangle.

C.1-9

Prove that

C.1-10

Show that for any integers n ≥ 0 and 0 ≤ k ≤ n, the binomial coefficient ( n choose k) achieves its maximum value when k = ⌊ n/2⌋ or k = ⌈ n/2⌉.

C.1-11

Argue that for any integers n ≥ 0, j ≥ 0, k ≥ 0, and j + k ≤ n,

( n choose j + k) ≤ ( n choose j) · ( n − j choose k).

Provide both an algebraic proof and an argument based on a method

for choosing j + k items out of n. Give an example in which equality does not hold.

C.1-12

Use induction on all integers k such that 0 ≤ k ≤ n/2 to prove inequality (C.7), and use equation (C.3) to extend it to all integers k such that 0 ≤ k ≤ n.

C.1-13

Use Stirling’s approximation to prove that

C.1-14

By differentiating the entropy function H( λ), show that it achieves its maximum value at λ = 1/2. What is H(1/2)?

C.1-15

Show that for any integer n ≥ 0,

C.1-16

Inequality (C.5) provides a lower bound on the binomial coefficient ( n choose k).

For small values of k, a stronger bound holds. Prove that

for

.

C.2 Probability

Probability is an essential tool for the design and analysis of

probabilistic and randomized algorithms. This section reviews basic

probability theory.

We define probability in terms of a sample space S, which is a set whose elements are called outcomes or elementary events. Think of each

outcome as a possible result of an experiment. For the experiment of

flipping two distinguishable coins, with each individual flip resulting in a

head (H) or a tail (T), you can view the sample space S as consisting of

the set of all possible 2-strings over {H, T}:

S = {HH, HT, TH, TT}.

An event is a subset1 of the sample space S. For example, in the experiment of flipping two coins, the event of obtaining one head and

one tail is {HT, TH}. The event S is called the certain event, and the event ∅ is called the null event. We say that two events A and B are mutually exclusive if AB = ∅ . An outcome s also defines the event

{ s}, which we sometimes write as just s. By definition, all outcomes are mutually exclusive.

Axioms of probability

A probability distribution Pr {} on a sample space S is a mapping from

events of S to real numbers satisfying the following probability axioms:

1. Pr { A} ≥ 0 for any event A.

2. Pr { S} = 1.

3. Pr { A ∪ B} = Pr { A} + Pr { B} for any two mutually exclusive events A and B. More generally, for any sequence of events A 1,

A 2, … (finite or countably infinite) that are pairwise mutually

exclusive,

Pr { A 1 ∪ A 2 ∪ ⋯} = Pr { A 1} + Pr { A 2} + ⋯.

We call Pr { A} the probability of the event A. Axiom 2 is simply a normalization requirement: there is really nothing fundamental about

choosing 1 as the probability of the certain event, except that it is

natural and convenient.

Several results follow immediately from these axioms and basic set

theory (see Section B.1). The null event ∅ has probability Pr {∅} = 0. If A ⊆ B, then Pr { A} ≤ Pr { B}. Using Ā to denote the event S − A (the complement of A), we have Pr { Ā} = 1 − Pr { A}. For any two events A and B,

Pr { A ∪ B} = Pr { A} + Pr { B} − Pr { A ∩ B}
≤ Pr { A} + Pr { B}.

In our coin-flipping example, suppose that each of the four outcomes

has probability 1/4. Then the probability of getting at least one head is

Pr {HH, HT, TH} = Pr {HH} + Pr {HT} + Pr {TH}

= 3/4.

Another way to obtain the same result is to observe that since the

probability of getting strictly less than one head is Pr {TT} = 1/4, the

probability of getting at least one head is 1 − 1/4 = 3/4.
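Both calculations — the direct sum over outcomes and the complement argument — can be reproduced by enumerating the sample space with exact rationals (a sketch, not from the text):

```python
from fractions import Fraction
from itertools import product

# Uniform distribution over the sample space of two distinguishable coin flips.
S = ["".join(t) for t in product("HT", repeat=2)]
p = Fraction(1, len(S))                               # each outcome has probability 1/4

pr_at_least_one_head = p * sum(1 for s in S if "H" in s)
pr_no_heads = p * sum(1 for s in S if "H" not in s)   # the event {TT}
```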

Discrete probability distributions

A probability distribution is discrete if it is defined over a finite or countably infinite sample space. Let S be the sample space. Then for any

event A,

Pr { A} = ∑ s∈A Pr { s},

since outcomes, specifically those in A, are mutually exclusive. If S is finite and every outcome s ∈ S has probability Pr { s} = 1/| S|, then we

have the uniform probability distribution on S. In such a case the experiment is often described as “picking an element of S at random.”

As an example, consider the process of flipping a fair coin, one for

which the probability of obtaining a head is the same as the probability

of obtaining a tail, that is, 1/2. Flipping the coin n times gives the uniform probability distribution defined on the sample space S = {H, T}^n, a set of size 2^n. We can represent each outcome in S as a string of length n over {H, T}, with each string occurring with probability 1/2^n. The event A = {exactly k heads and exactly n − k tails occur} is a subset of S of size ( n choose k), since exactly ( n choose k) strings of length n over {H, T} contain exactly k H’s. The probability of event A is thus ( n choose k)/2^n.
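The claim that Pr { A} = ( n choose k)/2^n can be cross-checked by brute-force enumeration for small n (a sketch; the function names are mine):

```python
from fractions import Fraction
from itertools import product
from math import comb

def pr_k_heads_formula(n, k):
    """Pr{exactly k heads in n fair flips} = (n choose k) / 2^n."""
    return Fraction(comb(n, k), 2 ** n)

def pr_k_heads_enumerated(n, k):
    """Same probability computed by enumerating all 2^n outcomes."""
    outcomes = list(product("HT", repeat=n))
    return Fraction(sum(1 for s in outcomes if s.count("H") == k), len(outcomes))
```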

Continuous uniform probability distribution

The continuous uniform probability distribution is an example of a

probability distribution in which not all subsets of the sample space are

considered to be events. The continuous uniform probability

distribution is defined over a closed interval [ a, b] of the reals, where a < b. The intuition is that each point in the interval [ a, b] should be

“equally likely.” Because there are an uncountable number of points,

however, if all points had the same finite, positive probability, axioms 2

and 3 would not be simultaneously satisfied. For this reason, we’d like

to associate a probability only with some of the subsets of S in such a way that the axioms are satisfied for these events.

For any closed interval [ c, d], where a ≤ c ≤ d ≤ b, the continuous uniform probability distribution defines the probability of the event [ c, d]

to be

Letting c = d gives that the probability of a single point is 0. Removing the endpoints [ c, c] and [ d, d] of an interval [ c, d] results in the open interval ( c, d). Since [ c, d] = [ c, c] ∪ ( c, d) ∪ [ d, d], axiom 3 gives Pr {[ c, d]} = Pr {( c, d)}. Generally, the set of events for the continuous uniform probability distribution contains any subset of the sample space [ a, b]

that can be obtained by a finite or countable union of open and closed

intervals, as well as certain more complicated sets.

Conditional probability and independence

Sometimes you have some prior partial knowledge about the outcome

of an experiment. For example, suppose that a friend has flipped two

fair coins and has told you that at least one of the coins showed a head.

What is the probability that both coins are heads? The information

given eliminates the possibility of two tails. The three remaining

outcomes are equally likely, and so you infer that each occurs with

probability 1/3. Since only one of these outcomes shows two heads, the

answer is 1/3.
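The informal argument above can be confirmed by enumeration (a sketch; the event names and the helper pr are mine):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))

def pr(event):
    """Probability of an event (a predicate on outcomes) under the uniform distribution."""
    return Fraction(sum(1 for s in outcomes if event(s)), len(outcomes))

both_heads = lambda s: s == ("H", "H")
at_least_one_head = lambda s: "H" in s

# Pr{A | B} = Pr{A and B} / Pr{B}
pr_cond = pr(lambda s: both_heads(s) and at_least_one_head(s)) / pr(at_least_one_head)
```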

Conditional probability formalizes the notion of having prior partial

knowledge of the outcome of an experiment. The conditional probability

of an event A given that another event B occurs is defined to be

Pr { A | B} = Pr { A ∩ B}/Pr { B}

whenever Pr { B} ≠ 0. (Read “Pr { A | B}” as “the probability of A given B.”) The idea behind equation (C.16) is that since we are given that event B occurs, the event that A also occurs is A ∩ B. That is, A ∩ B is the set of outcomes in which both A and B occur. Because the outcome is one of the elementary events in B, we normalize the probabilities of all the elementary events in B by dividing them by Pr { B}, so that they sum to 1. The conditional probability of A given B is, therefore, the ratio of the probability of event A ∩ B to the probability of event B. In the example above, A is the event that both coins are heads, and B is the event that at least one coin is a head. Thus, Pr { A | B} = (1/4)/(3/4) = 1/3.

Two events A and B are independent if

which is equivalent, if Pr { B} ≠ 0, to the condition

Pr { A | B} = Pr { A}.

For example, suppose that you flip two fair coins and that the outcomes

are independent. Then the probability of two heads is (1/2)(1/2) = 1/4.

Now suppose that one event is that the first coin comes up heads and

the other event is that the coins come up differently. Each of these

events occurs with probability 1/2, and the probability that both events

occur is 1/4. Thus, according to the definition of independence, the

events are independent—even though you might think that both events

depend on the first coin. Finally, suppose that the coins are welded

together so that they both fall heads or both fall tails and that the two

possibilities are equally likely. Then the probability that each coin comes

up heads is 1/2, but the probability that they both come up heads is 1/2

≠ (1/2)(1/2). Consequently, the event that one comes up heads and the

event that the other comes up heads are not independent.

A collection A 1, A 2, … , An of events is said to be pairwise independent if

Pr { Ai ∩ Aj} = Pr { Ai} Pr { Aj}

for all 1 ≤ i < j ≤ n. We say that the events of the collection are (mutually) independent if every k-subset Ai 1, Ai 2, … , Aik of the collection, where 2 ≤ k ≤ n and 1 ≤ i 1 < i 2 < ⋯ < ik ≤ n, satisfies

Pr { Ai 1 ∩ Ai 2 ∩ ⋯ ∩ Aik} = Pr { Ai 1} Pr { Ai 2} ⋯ Pr { Aik}.

For example, suppose that you flip two fair coins. Let A 1 be the event

that the first coin is heads, let A 2 be the event that the second coin is

heads, and let A 3 be the event that the two coins are different. Then,

Pr { A 1} = 1/2,

Pr { A 2} = 1/2,

Pr { A 3} = 1/2,

Pr { A 1 ∩ A 2} = 1/4,

Pr { A 1 ∩ A 3} = 1/4,

Pr { A 2 ∩ A 3} = 1/4,

Pr { A 1 ∩ A 2 ∩ A 3} = 0.

Since for 1 ≤ i < j ≤ 3, we have Pr { Ai ∩ Aj} = Pr { Ai} Pr { Aj} = 1/4, the events A 1, A 2, and A 3 are pairwise independent. The events are not mutually independent, however, because Pr { A 1 ∩ A 2 ∩ A 3} = 0 and Pr { A 1} Pr { A 2} Pr { A 3} = 1/8 ≠ 0.
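The distinction between pairwise and mutual independence can be verified mechanically (a sketch; the event names mirror the text, the helper pr is mine):

```python
from fractions import Fraction
from itertools import combinations, product

outcomes = list(product("HT", repeat=2))

def pr(event):
    """Probability of an event (a predicate on outcomes) under the uniform distribution."""
    return Fraction(sum(1 for s in outcomes if event(s)), len(outcomes))

A1 = lambda s: s[0] == "H"          # first coin heads
A2 = lambda s: s[1] == "H"          # second coin heads
A3 = lambda s: s[0] != s[1]         # coins come up differently
events = [A1, A2, A3]

# Pairwise independence: Pr{Ai and Aj} = Pr{Ai} Pr{Aj} for every pair.
pairwise = all(pr(lambda s, a=a, b=b: a(s) and b(s)) == pr(a) * pr(b)
               for a, b in combinations(events, 2))

# All three events together are impossible, so mutual independence fails.
triple = pr(lambda s: A1(s) and A2(s) and A3(s))
```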

Bayes’s theorem

From the definition (C.16) of conditional probability and the

commutative law A ∩ B = B ∩ A, it follows that for two events A and B, each with nonzero probability,

Pr { B} Pr { A | B} = Pr { A ∩ B} = Pr { A} Pr { B | A}.

Solving for Pr { A | B}, we obtain

Pr { A | B} = Pr { A} Pr { B | A}/Pr { B},

which is known as Bayes’s theorem. The denominator Pr { B} is a normalizing constant, which we can reformulate as follows. Since B = ( B ∩ A) ∪ ( B ∩ Ā), and since B ∩ A and B ∩ Ā are mutually exclusive events,

Pr { B} = Pr { B ∩ A} + Pr { B ∩ Ā}

= Pr { A} Pr { B | A} + Pr { Ā} Pr { B | Ā}.

Substituting into equation (C.19) produces an equivalent form of

Bayes’s theorem:

Bayes’s theorem can simplify the computing of conditional

probabilities. For example, suppose that you have a fair coin and a

biased coin that always comes up heads. Run an experiment consisting

of three independent events: choose one of the two coins at random, flip

that coin once, and then flip it again. Suppose that the coin you have

chosen comes up heads both times. What is the probability that it’s the

biased coin?

Bayes’s theorem solves this problem. Let A be the event that you

choose the biased coin, and let B be the event that the chosen coin comes up heads both times. We wish to determine Pr { A | B}, knowing

that Pr { A} = 1/2, Pr { B | A} = 1, Pr { Ā} = 1/2, and Pr { B | Ā} = 1/4.

Thus we have

Pr { A | B} = (1/2 · 1)/(1/2 · 1 + 1/2 · 1/4) = 4/5.
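The arithmetic follows directly from the probabilities stated above (a sketch with exact rationals; the variable names are mine):

```python
from fractions import Fraction

pr_A = Fraction(1, 2)                 # chose the biased coin
pr_B_given_A = Fraction(1)            # biased coin always shows heads
pr_notA = Fraction(1, 2)              # chose the fair coin
pr_B_given_notA = Fraction(1, 4)      # fair coin: HH with probability 1/4

# Bayes's theorem with the reformulated denominator.
pr_B = pr_A * pr_B_given_A + pr_notA * pr_B_given_notA
pr_A_given_B = pr_A * pr_B_given_A / pr_B
```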

Exercises

C.2-1

Professor Rosencrantz flips a fair coin twice. Professor Guildenstern

flips a fair coin once. What is the probability that Professor Rosencrantz

obtains strictly more heads than Professor Guildenstern?

C.2-2

Prove Boole’s inequality: For any finite or countably infinite sequence of

events A 1, A 2, …,

C.2-3

You shuffle a deck of 10 cards, each bearing a distinct number from 1 to

10, in order to mix the cards thoroughly. You then remove three cards,

one at a time, from the deck. What is the probability that the three cards

you select are in sorted (increasing) order?

C.2-4

Prove that

Pr { A | B} + Pr { Ā | B} = 1.

C.2-5

Prove that for any collection of events A 1, A 2, … , An,

C.2-6

Show how to construct a set of n events that are pairwise independent

but such that no subset of k > 2 of them is mutually independent.

C.2-7

Two events A and B are conditionally independent, given C, if Pr { A ∩ B | C} = Pr { A | C} · Pr { B | C}.

Give a simple but nontrivial example of two events that are not

independent but are conditionally independent given a third event.

C.2-8

Professor Gore teaches a music class on rhythm in which three students

—Jeff, Tim, and Carmine—are in danger of failing. Professor Gore tells

the three that one of them will pass the course and the other two will

fail. Carmine asks Professor Gore privately which of Jeff and Tim will

fail, arguing that since he already knows at least one of them will fail,

the professor won’t be revealing any information about Carmine’s

outcome. In a breach of privacy law, Professor Gore tells Carmine that

Jeff will fail. Carmine feels somewhat relieved now, figuring that either

he or Tim will pass, so that his probability of passing is now 1/2. Is

Carmine correct, or is his chance of passing still 1/3? Explain.

C.3 Discrete random variables

A (discrete) random variable X is a function from a finite or countably infinite sample space S to the real numbers. It associates a real number

with each possible outcome of an experiment, which allows us to work

with the probability distribution induced on the resulting set of

numbers. Random variables can also be defined for uncountably infinite

sample spaces, but they raise technical issues that are unnecessary to

address for our purposes. Therefore we’ll assume that random variables

are discrete.

For a random variable X and a real number x, we define the event X

= x to be { s ∈ S : X( s) = x}, and thus Pr { X = x} is the sum of Pr { s} over all outcomes s ∈ S for which X( s) = x.

The function

f( x) = Pr { X = x}

is the probability density function of the random variable X. From the probability axioms, Pr { X = x} ≥ 0 and ∑ x Pr { X = x} = 1.

As an example, consider the experiment of rolling a pair of ordinary,

6-sided dice. There are 36 possible outcomes in the sample space.

Assume that the probability distribution is uniform, so that each

outcome s ∈ S is equally likely: Pr { s} = 1/36. Define the random variable X to be the maximum of the two values showing on the dice. We

have Pr { X = 3} = 5/36, since X assigns a value of 3 to 5 of the 36

possible outcomes, namely, (1, 3), (2, 3), (3, 3), (3, 2), and (3, 1).
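The count of 5 favorable outcomes out of 36 can be confirmed by enumeration (a sketch, not from the text):

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two ordinary 6-sided dice.
rolls = list(product(range(1, 7), repeat=2))

# X is the maximum of the two values showing; count outcomes with X = 3.
pr_X_equals_3 = Fraction(sum(1 for r in rolls if max(r) == 3), len(rolls))
```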

We can define several random variables on the same sample space. If

X and Y are random variables, the function

f( x, y) = Pr { X = x and Y = y}

is the joint probability density function of X and Y. For a fixed value y, Pr { Y = y} = ∑ x Pr { X = x and Y = y}, and similarly, for a fixed value x, Pr { X = x} = ∑ y Pr { X = x and Y = y}.

Using the definition (C.16) of conditional probability on page 1187, we have

Pr { X = x | Y = y} = Pr { X = x and Y = y}/Pr { Y = y}.

We define two random variables X and Y to be independent if for all x and y, the events X = x and Y = y are independent or, equivalently, if for all x and y, we have Pr { X = x and Y = y} = Pr { X = x} Pr { Y = y}.

Given a set of random variables defined over the same sample space,

we can define new random variables as sums, products, or other

functions of the original variables.

Expected value of a random variable

The simplest, and often the most useful, summary of the distribution of

a random variable is the “average” of the values it takes on. The

expected value (or, synonymously, expectation or mean) of a discrete random variable X is

E[ X] = ∑ x x · Pr { X = x},

which is well defined if the sum is finite or converges absolutely.

Sometimes the expectation of X is denoted by μX or, when the random

variable is apparent from context, simply by μ.

Consider a game in which you flip two fair coins. You earn $3 for

each head but lose $2 for each tail. The expected value of the random

variable X representing your earnings is

E[ X] = 6 · Pr {2 H’s} + 1 · Pr {1 H, 1 T} − 4 · Pr {2 T’s}

= 6 · (1/4) + 1 · (1/2) − 4 · (1/4)

= 1.
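Because the sample space is small, the expectation can also be computed as a uniform average over outcomes (a sketch; the function name is mine):

```python
from fractions import Fraction
from itertools import product

flips = list(product("HT", repeat=2))

def earnings(s):
    """$3 per head, -$2 per tail."""
    return 3 * s.count("H") - 2 * s.count("T")

# Uniform average of earnings over the four equally likely outcomes.
expected = Fraction(sum(earnings(s) for s in flips), len(flips))
```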

Linearity of expectation says that the expectation of the sum of two

random variables is the sum of their expectations, that is,

E[ X + Y] = E[ X] + E[ Y],

whenever E[ X] and E[ Y] are defined. Linearity of expectation applies to a broad range of situations, holding even when X and Y are not independent. It also extends to finite and absolutely convergent

summations of expectations. Linearity of expectation is the key property

that enables us to perform probabilistic analyses by using indicator

random variables (see Section 5.2).

If X is any random variable, any function g( x) defines a new random variable g( X). If the expectation of g( X) is defined, then

E[ g( X)] = ∑ x g( x) Pr { X = x}.

Letting g( x) = ax, we have for any constant a,

E[ aX] = a E[ X].

Consequently, expectations are linear: for any two random variables X and Y and any constant a,

E[ aX + Y] = a E[ X] + E[ Y].

When two random variables X and Y are independent and each has a defined expectation,

E[ XY] = E[ X] E[ Y].

In general, when n random variables X 1, X 2, … , Xn are mutually independent,

E[ X 1 X 2 ⋯ Xn] = E[ X 1] E[ X 2] ⋯ E[ Xn].

When a random variable X takes on values from the set of natural numbers ℕ = {0, 1, 2, …}, we have a nice formula for its expectation:

E[ X] = Pr { X ≥ 1} + Pr { X ≥ 2} + Pr { X ≥ 3} + ⋯,

since each term Pr { X ≥ i} is added in i times and subtracted out i − 1 times (except Pr { X ≥ 0}, which is added in 0 times and not subtracted

out at all).
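The tail-sum formula can be cross-checked against the direct definition of expectation, here using the dice-maximum variable from the earlier example (a sketch; the helper pr is mine):

```python
from fractions import Fraction
from itertools import product

# X = maximum of two fair dice; X takes values in {1, ..., 6}.
rolls = list(product(range(1, 7), repeat=2))

def pr(event):
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

# Direct definition: E[X] = sum of x * Pr{X = x}.
E_direct = sum(x * pr(lambda r, x=x: max(r) == x) for x in range(1, 7))

# Tail-sum formula: E[X] = sum over i >= 1 of Pr{X >= i}.
E_tail = sum(pr(lambda r, i=i: max(r) >= i) for i in range(1, 7))
```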

A function f( x) is convex if

f( λx + (1 − λ) y) ≤ λf( x) + (1 − λ) f( y)

for all x and y and for all 0 ≤ λ ≤ 1. Jensen’s inequality says that when a convex function f( x) is applied to a random variable X,

E[ f( X)] ≥ f(E[ X]),

provided that the expectations exist and are finite.

Variance and standard deviation

The expected value of a random variable does not express how “spread

out” the variable’s values are. For example, consider random variables X

and Y for which Pr { X = 1/4} = Pr { X = 3/4} = 1/2 and Pr { Y = 0} = Pr

{ Y = 1} = 1/2. Then both E[ X] and E[ Y] are 1/2, yet the actual values taken on by Y are further from the mean than the actual values taken

on by X.

The notion of variance mathematically expresses how far from the

mean a random variable’s values are likely to be. The variance of a random variable X with mean E[ X] is

Var[ X] = E[( X − E[ X])²]
= E[ X² − 2 X E[ X] + E²[ X]]
= E[ X²] − 2 E[ X E[ X]] + E[E²[ X]]
= E[ X²] − 2 E²[ X] + E²[ X]
= E[ X²] − E²[ X].

To justify the equation E[E²[ X]] = E²[ X], note that because E[ X] is a real number and not a random variable, so is E²[ X]. The equation E[ X E[ X]] = E²[ X] follows from equation (C.25), with a = E[ X].

Rewriting equation (C.31) yields an expression for the expectation of

the square of a random variable:

E[ X²] = Var[ X] + E²[ X].

The variance of a random variable X and the variance of aX are related (see Exercise C.3-10):

Var[ aX] = a²Var[ X].

When X and Y are independent random variables,

Var[ X + Y] = Var[ X] + Var[ Y].
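The X-versus-Y spread example from the start of this subsection can be worked out with the formula Var[ X] = E[ X²] − E²[ X] (a sketch; the distribution representation and function names are mine):

```python
from fractions import Fraction

def expectation(dist):
    """dist is a list of (value, probability) pairs."""
    return sum(x * p for x, p in dist)

def variance(dist):
    """Var[X] = E[X^2] - E^2[X]."""
    Ex = expectation(dist)
    Ex2 = sum(x * x * p for x, p in dist)
    return Ex2 - Ex * Ex

half = Fraction(1, 2)
X = [(Fraction(1, 4), half), (Fraction(3, 4), half)]   # values close to the mean 1/2
Y = [(Fraction(0), half), (Fraction(1), half)]          # values far from the mean 1/2
```

Both variables have mean 1/2, but Y's larger variance captures that its values lie farther from the mean.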