



Reindexing summations
A series can sometimes be simplified by changing its index, often
reversing the order of summation. Consider the series ∑_{k=0}^{n} a_{n−k}. Because the terms in this summation are a_n, a_{n−1}, …, a_0, we can reverse the order of indices by letting j = n − k and rewrite this summation as

∑_{k=0}^{n} a_{n−k} = ∑_{j=0}^{n} a_j.

Generally, if the summation index appears in the body of the sum with a minus sign, it’s worth thinking about reindexing.
As an example, consider the summation ∑_{k=1}^{n} 1/(n − k + 1). The index k appears with a negative sign in 1/(n − k + 1). And indeed, we can simplify this summation, this time setting j = n − k + 1, yielding

∑_{k=1}^{n} 1/(n − k + 1) = ∑_{j=1}^{n} 1/j,

which is just the harmonic series (A.8).
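Both reindexing identities are easy to verify numerically. Here is a minimal Python sketch (the list a and the bound n are hypothetical example values), using exact rational arithmetic so no floating-point error creeps in:

```python
from fractions import Fraction

def reindex_check(n):
    """Summing 1/(n - k + 1) for k = 1..n gives the same value as the
    harmonic series sum of 1/j for j = 1..n (set j = n - k + 1)."""
    original = sum(Fraction(1, n - k + 1) for k in range(1, n + 1))
    harmonic = sum(Fraction(1, j) for j in range(1, n + 1))
    return original == harmonic

# Reversed-order identity: summing a_(n-k) for k = 0..n equals summing a_j.
a = [3, 1, 4, 1, 5, 9]            # hypothetical terms a_0 .. a_5
n = len(a) - 1
assert sum(a[n - k] for k in range(n + 1)) == sum(a)
assert reindex_check(10)
```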
Products
The finite product a_1 a_2 ⋯ a_n can be expressed as ∏_{k=1}^{n} a_k. If n = 0, the value of the product is defined to be 1. You can convert a formula with a product to a formula with a summation by using the identity

lg(∏_{k=1}^{n} a_k) = ∑_{k=1}^{n} lg a_k.
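As a quick check of this identity, the sketch below (with hypothetical positive terms) compares the lg of a product with the sum of the lgs, and confirms the empty-product convention:

```python
import math

a = [2.0, 8.0, 0.5, 4.0]               # hypothetical positive terms a_1 .. a_n
lhs = math.log2(math.prod(a))          # lg(a_1 a_2 ... a_n)
rhs = sum(math.log2(x) for x in a)     # lg a_1 + lg a_2 + ... + lg a_n
assert math.isclose(lhs, rhs)          # both equal lg 32 = 5
assert math.prod([]) == 1              # empty product is defined to be 1
```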









Exercises
A.1-1
Prove that
by using the linearity property of
summations.
A.1-2
Find a simple formula for
.
A.1-3
Interpret the decimal number 111,111,111 in light of equation (A.6).
A.1-4
Evaluate the infinite series
.
A.1-5
Let c ≥ 0 be a constant. Show that
.
A.1-6
Show that
for | x| < 1.
A.1-7
Prove that
. ( Hint: Show the asymptotic upper
and lower bounds separately.)
★ A.1-8
Show that
by manipulating the harmonic
series.
★ A.1-9
Show that
.
★ A.1-10
Evaluate the sum
.






★ A.1-11
Evaluate the product
.
You can choose from several techniques to bound the summations that
describe the running times of algorithms. Here are some of the most
frequently used methods.
Mathematical induction
The most basic way to evaluate a series is to use mathematical
induction. As an example, let’s prove that the arithmetic series ∑_{k=1}^{n} k evaluates to n(n + 1)/2. For n = 1, we have that n(n + 1)/2 = 1 · 2/2 = 1, which equals ∑_{k=1}^{1} k. With the inductive assumption that it holds for n, we prove that it holds for n + 1. We have

∑_{k=1}^{n+1} k = ∑_{k=1}^{n} k + (n + 1) = n(n + 1)/2 + (n + 1) = (n + 1)(n + 2)/2.
You don’t always need to guess the exact value of a summation in
order to use mathematical induction. Instead, you can use induction to
prove an upper or lower bound on a summation. As an example, let’s
prove the asymptotic upper bound ∑_{k=0}^{n} 3^k = O(3^n). More specifically, we’ll prove that ∑_{k=0}^{n} 3^k ≤ c3^n for some constant c. For the initial condition n = 0, we have ∑_{k=0}^{0} 3^k = 1 ≤ c · 3^0 as long as c ≥ 1. Assuming that the bound holds for n, we prove that it holds for n + 1. We have

∑_{k=0}^{n+1} 3^k = ∑_{k=0}^{n} 3^k + 3^{n+1} ≤ c3^n + 3^{n+1} = (1/3 + 1/c) c3^{n+1} ≤ c3^{n+1}







as long as (1/3 + 1/c) ≤ 1 or, equivalently, c ≥ 3/2. Thus, ∑_{k=0}^{n} 3^k = O(3^n), as we wished to show.
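A small numeric check of the bound just proved, with the constant c = 3/2 from the argument (a sketch for illustration, not part of the proof):

```python
def geometric_sum(n):
    """Sum of 3**k for k = 0..n (exact value: (3**(n + 1) - 1) // 2)."""
    return sum(3**k for k in range(n + 1))

# The induction shows the bound holds with c = 3/2; check it in integers
# by comparing 2 * sum against 3 * 3^n.
for n in range(20):
    assert 2 * geometric_sum(n) <= 3 * 3**n   # i.e. sum <= (3/2) * 3^n
```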
You need to take care when using asymptotic notation to prove
bounds by induction. Consider the following fallacious proof that ∑_{k=1}^{n} k = O(n). Certainly, ∑_{k=1}^{1} k = O(1). Assuming that the bound holds for n, we now prove it for n + 1:

∑_{k=1}^{n+1} k = ∑_{k=1}^{n} k + (n + 1) = O(n) + (n + 1) = O(n).
The bug in the argument is that the “constant” hidden by the “big-oh”
grows with n and thus is not constant. We have not shown that the same
constant works for all n.
Bounding the terms
You can sometimes obtain a good upper bound on a series by bounding
each term of the series, and it often suffices to use the largest term to
bound the others. For example, a quick upper bound on the arithmetic series (A.1) is

∑_{k=1}^{n} k ≤ ∑_{k=1}^{n} n = n^2.

In general, for a series ∑_{k=1}^{n} a_k, if we let a_max = max {a_k : 1 ≤ k ≤ n}, then

∑_{k=1}^{n} a_k ≤ n · a_max.
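The largest-term bound is trivial to check numerically. A minimal Python sketch with hypothetical terms:

```python
a = [5, 2, 9, 1, 7]                  # hypothetical terms a_1 .. a_n
a_max = max(a)
assert sum(a) <= len(a) * a_max      # 24 <= 45: the largest-term bound

# For the arithmetic series, every term is at most n, giving the n^2 bound.
n = 100
assert sum(range(1, n + 1)) <= n * n
```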
The technique of bounding each term in a series by the largest term
is a weak method when the series can in fact be bounded by a geometric






series. Given the series ∑_{k=0}^{n} a_k, suppose that a_{k+1}/a_k ≤ r for all k ≥ 0, where 0 < r < 1 is a constant. You can bound the sum by an infinite decreasing geometric series, since a_k ≤ a_0 r^k, and thus

∑_{k=0}^{n} a_k ≤ ∑_{k=0}^{∞} a_0 r^k = a_0 ∑_{k=0}^{∞} r^k = a_0 · 1/(1 − r).
You can apply this method to bound the summation ∑_{k=1}^{∞} k/3^k. In order to start the summation at k = 0, rewrite it as ∑_{k=0}^{∞} (k + 1)/3^{k+1}. The first term (a_0) is 1/3, and the ratio (r) of consecutive terms is

((k + 2)/3^{k+2}) / ((k + 1)/3^{k+1}) = (1/3) · (k + 2)/(k + 1) ≤ 2/3

for all k ≥ 0. Thus, we have

∑_{k=1}^{∞} k/3^k = ∑_{k=0}^{∞} (k + 1)/3^{k+1} ≤ (1/3) ∑_{k=0}^{∞} (2/3)^k = (1/3) · 1/(1 − 2/3) = 1.
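A sketch verifying the bound numerically: the partial sums of ∑ k/3^k converge to 3/4, which indeed sits below the geometric-series bound of 1.

```python
from fractions import Fraction

def partial_sum(n):
    """Partial sum of k / 3^k for k = 1..n, in exact arithmetic."""
    return sum(Fraction(k, 3**k) for k in range(1, n + 1))

# Every partial sum respects the geometric-series bound of 1; the exact
# infinite sum is 3/4, comfortably below the bound.
s = partial_sum(60)
assert s < 1
assert abs(float(s) - 0.75) < 1e-9
```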
A common bug in applying this method is to show that the ratio of
consecutive terms is less than 1 and then to assume that the summation
is bounded by a geometric series. An example is the infinite harmonic series, which diverges since

∑_{k=1}^{∞} 1/k = lim_{n→∞} ∑_{k=1}^{n} 1/k = lim_{n→∞} Θ(lg n) = ∞.

The ratio of the (k + 1)st and kth terms in this series is k/(k + 1) < 1, but the series is not bounded by a decreasing geometric series. To bound a series by a geometric series, you need to show that there is an r < 1, which is a constant, such that the ratio of all pairs of consecutive terms




never exceeds r. In the harmonic series, no such r exists because the ratio becomes arbitrarily close to 1.
Splitting summations
One way to obtain bounds on a difficult summation is to express the
series as the sum of two or more series by partitioning the range of the
index and then to bound each of the resulting series. For example, let’s
find a lower bound on the arithmetic series ∑_{k=1}^{n} k, which we have already seen has an upper bound of n^2. You might attempt to bound each term in the summation by the smallest term, but since that term is 1, you would get a lower bound of n for the summation, far off from the upper bound of n^2.
You can obtain a better lower bound by first splitting the summation. Assume for convenience that n is even, so that

∑_{k=1}^{n} k = ∑_{k=1}^{n/2} k + ∑_{k=n/2+1}^{n} k ≥ ∑_{k=1}^{n/2} 0 + ∑_{k=n/2+1}^{n} (n/2) = (n/2)^2 = Ω(n^2),

which is an asymptotically tight bound, since ∑_{k=1}^{n} k = O(n^2).
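The split can be sanity-checked numerically: for even n, (n/2)^2 really is sandwiched below the exact sum, which in turn stays below n^2. A minimal sketch:

```python
def arithmetic_sum(n):
    """Exact value of the arithmetic series 1 + 2 + ... + n."""
    return n * (n + 1) // 2

# For even n, the last n/2 terms are each at least n/2, so (n/2)^2 is a
# lower bound; n^2 is the upper bound from bounding every term by n.
for n in range(2, 200, 2):
    assert (n // 2) ** 2 <= arithmetic_sum(n) <= n * n
```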
For a summation arising from the analysis of an algorithm, you can
sometimes split the summation and ignore a constant number of the
initial terms. Generally, this technique applies when each term a_k in a summation ∑_{k=1}^{n} a_k is independent of n. Then for any constant k_0 > 0, you can write

∑_{k=1}^{n} a_k = ∑_{k=1}^{k_0−1} a_k + ∑_{k=k_0}^{n} a_k = Θ(1) + ∑_{k=k_0}^{n} a_k,
since the initial terms of the summation are all constant and there are a
constant number of them. You can then use other methods to bound




∑_{k=k_0}^{n} a_k. This technique applies to infinite summations as well. For example, let’s find an asymptotic upper bound on ∑_{k=0}^{∞} k^2/2^k. The ratio of consecutive terms is

((k + 1)^2/2^{k+1}) / (k^2/2^k) = (k + 1)^2 / (2k^2) ≤ 8/9

if k ≥ 3. Thus, you can split the summation into

∑_{k=0}^{∞} k^2/2^k = ∑_{k=0}^{2} k^2/2^k + ∑_{k=3}^{∞} k^2/2^k ≤ ∑_{k=0}^{2} k^2/2^k + (9/8) ∑_{k=0}^{∞} (8/9)^k = O(1).
The technique of splitting summations can help determine
asymptotic bounds in much more difficult situations. For example, here
is one way to obtain a bound of O(lg n) on the harmonic series (A.9):
The idea is to split the range 1 to n into ⌊lg n⌋ + 1 pieces and upper-bound the contribution of each piece by 1. For i = 0, 1, …, ⌊lg n⌋, the ith piece consists of the terms starting at 1/2^i and going up to but not including 1/2^{i+1}. The last piece might contain terms not in the original harmonic series, giving

∑_{k=1}^{n} 1/k ≤ ∑_{i=0}^{⌊lg n⌋} ∑_{j=0}^{2^i−1} 1/(2^i + j) ≤ ∑_{i=0}^{⌊lg n⌋} ∑_{j=0}^{2^i−1} 1/2^i = ∑_{i=0}^{⌊lg n⌋} 1 ≤ lg n + 1.
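The resulting bound H_n ≤ lg n + 1 can be checked directly for small n; a minimal sketch in exact arithmetic:

```python
import math
from fractions import Fraction

def harmonic(n):
    """The nth harmonic number, in exact arithmetic."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

# Each of the floor(lg n) + 1 pieces contributes at most 1, so H_n <= lg n + 1.
for n in [1, 2, 7, 64, 1000]:
    assert float(harmonic(n)) <= math.log2(n) + 1
```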





Approximation by integrals
When a summation has the form ∑_{k=m}^{n} f(k), where f(k) is a monotonically increasing function, you can approximate it by integrals:

∫_{m−1}^{n} f(x) dx ≤ ∑_{k=m}^{n} f(k) ≤ ∫_{m}^{n+1} f(x) dx.

Figure A.1 justifies this approximation. The summation is represented as the area of the rectangles in the figure, and the integral is the blue region under the curve. When f(k) is a monotonically decreasing function, you can use a similar method to provide the bounds

∫_{m}^{n+1} f(x) dx ≤ ∑_{k=m}^{n} f(k) ≤ ∫_{m−1}^{n} f(x) dx. (A.19)
The integral approximation (A.19) can be used to prove the tight bounds in inequality (A.10) for the nth harmonic number. The lower bound is

∑_{k=1}^{n} 1/k ≥ ∫_{1}^{n+1} dx/x = ln(n + 1).

For the upper bound, the integral approximation gives

∑_{k=2}^{n} 1/k ≤ ∫_{1}^{n} dx/x = ln n,

from which ∑_{k=1}^{n} 1/k ≤ ln n + 1.
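Both integral bounds can be verified numerically: the harmonic numbers are sandwiched between ln(n + 1) and ln n + 1. A minimal sketch:

```python
import math
from fractions import Fraction

def harmonic(n):
    """The nth harmonic number, in exact arithmetic."""
    return sum(Fraction(1, k) for k in range(1, n + 1))

# The integral approximation sandwiches H_n: ln(n + 1) <= H_n <= ln n + 1.
for n in [1, 2, 10, 100, 10000]:
    h = float(harmonic(n))
    assert math.log(n + 1) <= h <= math.log(n) + 1
```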

Exercises
A.2-1
Show that
is bounded above by a constant.
A.2-2
Find an asymptotic upper bound on the summation




Figure A.1 Approximation of ∑_{k=m}^{n} f(k) by integrals. The area of each rectangle is shown within the rectangle, and the total rectangle area represents the value of the summation. The integral is represented by the blue area under the curve. Comparing areas in (a) gives the lower bound ∫_{m−1}^{n} f(x) dx ≤ ∑_{k=m}^{n} f(k). Shifting the rectangles one unit to the right gives the upper bound ∑_{k=m}^{n} f(k) ≤ ∫_{m}^{n+1} f(x) dx in (b).
A.2-3
Show that the n th harmonic number is Ω(lg n) by splitting the summation.
A.2-4
Approximate
with an integral.



A.2-5
Why can’t you use the integral approximation (A.19) directly on
to obtain an upper bound on the n th harmonic number?
Problems
A-1 Bounding summations
Give asymptotically tight bounds on the following summations. Assume
that r ≥ 0 and s ≥ 0 are constants.
a. ∑_{k=1}^{n} k^r.
b. ∑_{k=1}^{n} lg^s k.
c. ∑_{k=1}^{n} k^r lg^s k.
Appendix notes
Knuth [259] provides an excellent reference for the material presented here. You can find basic properties of series in any good calculus book,
such as Apostol [19] or Thomas et al. [433].
Many chapters of this book touch on the elements of discrete
mathematics. This appendix reviews the notations, definitions, and
elementary properties of sets, relations, functions, graphs, and trees. If
you are already well versed in this material, you can probably just skim
this chapter.
A set is a collection of distinguishable objects, called its members or elements. If an object x is a member of a set S, we write x ∈ S (read “x is a member of S” or, more briefly, “x belongs to S”). If x is not a member of S, we write x ∉ S. To describe a set explicitly, write its members as a list inside braces. For example, to define a set S to contain
precisely the numbers 1, 2, and 3, write S = {1, 2, 3}. Since 2 belongs to
the set S, we can write 2 ∈ S, and since 4 is not a member, we can write 4 ∉ S. A set cannot contain the same object more than once, 1 and its elements are not ordered. Two sets A and B are equal, written A = B, if they contain the same elements. For example, {1, 2, 3, 1} = {1, 2, 3} =
{3, 2, 1}.
We adopt special notations for frequently encountered sets:
Ø denotes the empty set, that is, the set containing no members.
ℤ denotes the set of integers, that is, the set {…, −2, −1, 0, 1, 2, …}.
ℝ denotes the set of real numbers.
ℕ denotes the set of natural numbers, that is, the set {0, 1, 2,…}. 2
If all the elements of a set A are contained in a set B, that is, if x ∈ A implies x ∈ B, then we write A ⊆ B and say that A is a subset of B. A set A is a proper subset of set B, written A ⊂ B, if A ⊆ B but A ≠ B.
(Some authors use the symbol “⊂” to denote the ordinary subset
relation, rather than the proper-subset relation.) Every set is a subset of
itself: A ⊆ A for any set A. For two sets A and B, we have A = B if and only if A ⊆ B and B ⊆ A. The subset relation is transitive (see page 1159): for any three sets A, B, and C, if A ⊆ B and B ⊆ C, then A ⊆ C.
The proper-subset relation is transitive as well. The empty set is a subset
of all sets: for any set A, we have Ø ⊆ A.
Sets can be specified in terms of other sets. Given a set A, a set B ⊆
A can be defined by stating a property that distinguishes the elements of
B. For example, one way to define the set of even integers is { x : x ∈ ℤ
and x/2 is an integer}. The colon in this notation is read “such that.”
(Some authors use a vertical bar in place of the colon.)
Given two sets A and B, set operations define new sets:
The intersection of sets A and B is the set
A ∩ B = { x : x ∈ A and x ∈ B}.
The union of sets A and B is the set
A ∪ B = { x : x ∈ A or x ∈ B}.
The difference between two sets A and B is the set
A − B = { x : x ∈ A and x ∉ B}.
Set operations obey the following laws:
Empty set laws:
A ∩ Ø = Ø,
A ∪ Ø = A.


Idempotency laws:
A ∩ A = A,
A ∪ A = A.
Commutative laws:
A ∩ B = B ∩ A,
A ∪ B = B ∪ A.
Figure B.1 A Venn diagram illustrating the first of DeMorgan’s laws (B.2). Each of the sets A, B, and C is represented as a circle.
Associative laws:
A ∩ ( B ∩ C) = ( A ∩ B) ∩ C,
A ∪ ( B ∪ C) = ( A ∪ B) ∪ C.
Distributive laws:
A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), (B.1)
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
Absorption laws:
A ∩ ( A ∪ B) = A,
A ∪ ( A ∩ B) = A.
DeMorgan’s laws:
A − (B ∩ C) = (A − B) ∪ (A − C), (B.2)
A − (B ∪ C) = (A − B) ∩ (A − C).

Figure B.1 illustrates the first of DeMorgan’s laws, using a Venn diagram: a graphical picture in which sets are represented as regions of
the plane.
Often, all the sets under consideration are subsets of some larger set
U called the universe. For example, when considering various sets made
up only of integers, the set ℤ of integers is an appropriate universe.
Given a universe U, we define the complement of a set A as Ā = U − A =
{ x : x ∈ U and x ∉ A}. For any set A ⊆ U, we have the following laws:
the complement of Ā is A,
A ∩ Ā = Ø,
A ∪ Ā = U.
An equivalent way to express DeMorgan’s laws (B.2) uses set
complements. For any two sets B, C ⊆ U, we have
the complement of B ∩ C is B̄ ∪ C̄,
the complement of B ∪ C is B̄ ∩ C̄.
Two sets A and B are disjoint if they have no elements in common, that is, if A ∩ B = Ø. A collection of sets S 1, S 2, … , either finite or infinite, is a set of sets, in which each member is a set Si. A collection S
= { Si} of nonempty sets forms a partition of a set S if
the sets are pairwise disjoint, that is, Si, Sj ∈ S and i ≠ j imply Si
∩ Sj = Ø,
their union is S, that is, ⋃i Si = S.
In other words, S forms a partition of S if each element of S appears in exactly one set Si ∈ S.
The number of elements in a set is the cardinality (or size) of the set, denoted | S|. Two sets have the same cardinality if their elements can be
put into a one-to-one correspondence. The cardinality of the empty set

is |Ø| = 0. If the cardinality of a set is a natural number, the set is finite, and otherwise, it is infinite. An infinite set that can be put into a one-to-one correspondence with the natural numbers ℕ is countably infinite,
and otherwise, it is uncountable. For example, the integers ℤ are
countable, but the reals ℝ are uncountable.
For any two finite sets A and B, we have the identity

|A ∪ B| = |A| + |B| − |A ∩ B|, (B.3)

from which we can conclude that
| A ∪ B| ≤ | A| + | B|.
If A and B are disjoint, then | A ∩ B| = 0 and thus | A ∪ B| = | A| + | B|. If A ⊆ B, then | A| ≤ | B|.
A finite set of n elements is sometimes called an n-set. A 1-set is called a singleton. A subset of k elements of a set is sometimes called a k-subset.
We denote the set of all subsets of a set S, including the empty set and S itself, by 2^S, called the power set of S. For example, 2^{a, b} = {Ø, {a}, {b}, {a, b}}. The power set of a finite set S has cardinality 2^|S| (see Exercise B.1-5).
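A short sketch that enumerates a power set and confirms the 2^|S| cardinality:

```python
from itertools import combinations

def power_set(s):
    """All subsets of s, from the empty set up to s itself."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

ps = power_set({'a', 'b'})
assert len(ps) == 2 ** 2                           # cardinality 2^|S|
assert frozenset() in ps and frozenset({'a', 'b'}) in ps
assert len(power_set(range(5))) == 2 ** 5
```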
We sometimes care about setlike structures in which the elements are
ordered. An ordered pair of two elements a and b is denoted ( a, b) and is defined formally as the set ( a, b) = { a, { a, b}}. Thus, the ordered pair ( a, b) is not the same as the ordered pair ( b, a).
The Cartesian product of two sets A and B, denoted A × B, is the set of all ordered pairs such that the first element of the pair is an element
of A and the second is an element of B. More formally,
A × B = {( a, b) : a ∈ A and b ∈ B}.
For example, { a, b}×{ a, b, c} = {( a, a), ( a, b), ( a, c), ( b, a), ( b, b), ( b, c)}.
When A and B are finite sets, the cardinality of their Cartesian product is

|A × B| = |A| · |B|.
The Cartesian product of n sets A1, A2, …, An is the set of n-tuples

A1 × A2 × ⋯ × An = {(a1, a2, …, an) : ai ∈ Ai for i = 1, 2, …, n},

whose cardinality is

|A1 × A2 × ⋯ × An| = |A1| · |A2| ⋯ |An|

if all sets Ai are finite. We denote an n-fold Cartesian product over a single set A by the set

A^n = A × A × ⋯ × A,

whose cardinality is |A^n| = |A|^n if A is finite. We can also view an n-tuple as a finite sequence of length n (see page 1162).
Intervals are continuous sets of real numbers. We denote them with
parentheses and/or brackets. Given real numbers a and b, the closed interval [ a, b] is the set { x ∈ ℝ : a ≤ x ≤ b} of reals between a and b, including both a and b. (If a > b, this definition implies that [ a, b] = Ø.) The open interval ( a, b) = { x ∈ ℝ : a < x < b} omits both of the endpoints from the set. There are two half-open intervals [ a, b) = { x ∈ ℝ
: a ≤ x < b} and ( a, b] = { x ∈ ℝ : a < x ≤ b}, each of which excludes one endpoint.
Intervals can also be defined on the integers by replacing ℝ in the
these definitions by ℤ. Whether the interval is defined over the reals or
integers can usually be inferred from context.
Exercises
B.1-1
Draw Venn diagrams that illustrate the first of the distributive laws
(B.1).
B.1-2
Prove the generalization of DeMorgan’s laws to any finite collection of
sets:
the complement of A1 ∩ A2 ∩ ⋯ ∩ An is Ā1 ∪ Ā2 ∪ ⋯ ∪ Ān,
the complement of A1 ∪ A2 ∪ ⋯ ∪ An is Ā1 ∩ Ā2 ∩ ⋯ ∩ Ān.
★ B.1-3
Prove the generalization of equation (B.3), which is called the principle
of inclusion and exclusion:
| A 1 ∪ A 2 ∪ … ∪ An| =
| A 1| + | A 2| + … + | An|
− | A 1 ∩ A 2| − | A 1 ∩ A 3| − …
(all pairs)
+ | A 1 ∩ A 2 ∩ A 3| + …
(all triples)
⋮
+ (−1) n−1 | A 1 ∩ A 2 ∩ … ∩ An|.
B.1-4
Show that the set of odd natural numbers is countable.
B.1-5
Show that for any finite set S, the power set 2^S has 2^|S| elements (that is, there are 2^|S| distinct subsets of S).
B.1-6
Give an inductive definition for an n-tuple by extending the set-theoretic
definition for an ordered pair.
A binary relation R on two sets A and B is a subset of the Cartesian product A× B. If ( a, b) ∈ R, we sometimes write a R b. When we say that R is a binary relation on a set A, we mean that R is a subset of A ×
A. For example, the “less than” relation on the natural numbers is the
set {( a, b) : a, b ∈ ℕ and a < b}. An n-ary relation on sets A 1, A 2, … , An is a subset of A 1 × A 2 × … × An.
A binary relation R ⊆ A × A is reflexive if

a R a

for all a ∈ A. For example, “=” and “≤” are reflexive relations on ℕ, but
“<” is not. The relation R is symmetric if
a R b implies b R a
for all a, b ∈ A. For example, “=” is symmetric, but “<” and “≤” are not. The relation R is transitive if
a R b and b R c imply a R c
for all a, b, c ∈ A. For example, the relations “<,” “≤,” and “=” are transitive, but the relation R = {( a, b) : a, b ∈ ℕ and a = b − 1} is not, since 3 R 4 and 4 R 5 do not imply 3 R 5.
A relation that is reflexive, symmetric, and transitive is an equivalence
relation. For example, “=” is an equivalence relation on the natural
numbers, but “<” is not. If R is an equivalence relation on a set A, then for a ∈ A, the equivalence class of a is the set [ a] = { b ∈ A : a R b}, that is, the set of all elements equivalent to a. For example, if we define R =
{( a, b) : a, b ∈ ℕ and a + b is an even number}, then R is an equivalence relation, since a + a is even (reflexive), a + b is even implies b + a is even (symmetric), and a + b is even and b + c is even imply a + c is even (transitive). The equivalence class of 4 is [4] = {0, 2, 4, 6,…}, and the
equivalence class of 3 is [3] = {1, 3, 5, 7,…}. A basic theorem of
equivalence classes is the following.
Theorem B.1 (An equivalence relation is the same as a partition)
The equivalence classes of any equivalence relation R on a set A form a partition of A, and any partition of A determines an equivalence relation on A for which the sets in the partition are the equivalence classes.
Proof For the first part of the proof, we must show that the equivalence
classes of R are nonempty, pairwise-disjoint sets whose union is A.
Because R is reflexive, a ∈ [ a], and so the equivalence classes are nonempty. Moreover, since every element a ∈ A belongs to the
equivalence class [ a], the union of the equivalence classes is A. It remains to show that the equivalence classes are pairwise disjoint, that
is, if two equivalence classes [ a] and [ b] have an element c in common, then they are in fact the same set. Suppose that a R c and b R c.
Symmetry gives that c R b and, by transitivity, a R b. Thus, we have x R
a for any arbitrary element x ∈ [ a] and, by transitivity, x R b, and thus
[ a] ⊆ [ b]. Similarly, [ b] ⊆ [ a], and thus [ a] = [ b].
For the second part of the proof, let A = { Ai} be a partition of A,
and define R = {( a, b) : there exists i such that a ∈ Ai and b ∈ Ai}. We claim that R is an equivalence relation on A. Reflexivity holds, since a ∈
Ai implies a R a. Symmetry holds, because if a R b, then a and b belong to the same set Ai, and hence b R a. If a R b and b R c, then all three elements are in the same set Ai, and thus a R c and transitivity holds. To see that the sets in the partition are the equivalence classes of R, observe
that if a ∈ Ai, then x ∈ [ a] implies x ∈ Ai, and x ∈ Ai implies x ∈ [ a].
▪
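The first half of Theorem B.1 can be illustrated computationally. The sketch below (a generic helper, not from the text) groups elements into equivalence classes and recovers the even/odd partition from the relation "a + b is even" used in the example above:

```python
def equivalence_classes(elements, related):
    """Partition `elements` into classes of the equivalence relation
    given by the predicate `related`."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(next(iter(cls)), x):   # compare with a representative
                cls.add(x)
                break
        else:
            classes.append({x})
    return classes

# R = {(a, b) : a + b is even} partitions {0, ..., 9} by parity.
classes = equivalence_classes(range(10), lambda a, b: (a + b) % 2 == 0)
assert sorted(map(sorted, classes)) == [[0, 2, 4, 6, 8], [1, 3, 5, 7, 9]]
```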
A binary relation R on a set A is antisymmetric if
a R b and b R a imply a = b.
For example, the “≤” relation on the natural numbers is antisymmetric,
since a ≤ b and b ≤ a imply a = b. A relation that is reflexive, antisymmetric, and transitive is a partial order, and we call a set on which a partial order is defined a partially ordered set. For example, the
relation “is a descendant of” is a partial order on the set of all people (if
we view individuals as being their own descendants).
In a partially ordered set A, there may be no single “maximum”
element a such that b R a for all b ∈ A. Instead, the set may contain several maximal elements a such that for no b ∈ A, where b ≠ a, is it the case that a R b. For example, a collection of different-sized boxes may
contain several maximal boxes that don’t fit inside any other box, yet it
has no single “maximum” box into which any other box will fit. 3
A relation R on a set A is a total relation if for all a, b ∈ A, we have a R b or b R a (or both), that is, if every pairing of elements of A is related by R. A partial order that is also a total relation is a total order or linear order. For example, the relation “≤” is a total order on the natural
numbers, but the “is a descendant of” relation is not a total order on the set of all people, since there are individuals neither of whom is
descended from the other. A total relation that is transitive, but not
necessarily either symmetric or antisymmetric, is a total preorder.
Exercises
B.2-1
Prove that the subset relation “⊆” on all subsets of ℤ is a partial order
but not a total order.
B.2-2
Show that for any positive integer n, the relation “equivalent modulo n” is an equivalence relation on the integers. (We say that a ≡ b (mod n) if there exists an integer q such that a − b = qn.) Into what equivalence classes does this relation partition the integers?
B.2-3
Give examples of relations that are
a. reflexive and symmetric but not transitive,
b. reflexive and transitive but not symmetric,
c. symmetric and transitive but not reflexive.
B.2-4
Let S be a finite set, and let R be an equivalence relation on S × S.
Show that if in addition R is antisymmetric, then the equivalence classes
of S with respect to R are singletons.
B.2-5
Professor Narcissus claims that if a relation R is symmetric and
transitive, then it is also reflexive. He offers the following proof. By
symmetry, a R b implies b R a. Transitivity, therefore, implies a R a. Is the professor correct?
Given two sets A and B, a function f is a binary relation on A and B
such that for all a ∈ A, there exists precisely one b ∈ B such that ( a, b)
∈ f. The set A is called the domain of f, and the set B is called the codomain of f. We sometimes write f : A → B, and if ( a, b) ∈ f, we write b = f ( a), since the choice of a uniquely determines b.
Intuitively, the function f assigns an element of B to each element of
A. No element of A is assigned two different elements of B, but the same element of B can be assigned to two different elements of A. For
example, the binary relation
f = {( a, b) : a, b ∈ ℕ and b = a mod 2}
is a function f : ℕ → {0, 1}, since for each natural number a, there is exactly one value b in {0, 1} such that b = a mod 2. For this example, 0
= f (0), 1 = f (1), 0 = f (2), 1 = f (3), etc. In contrast, the binary relation g = {( a, b) : a, b ∈ ℕ and a + b is even}
is not a function, since (1, 3) and (1, 5) are both in g, and thus for the
choice a = 1, there is not precisely one b such that ( a, b) ∈ g.
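The "exactly one b per a" test is mechanical, so the contrast between f and g can be checked directly. A minimal sketch over a finite slice of ℕ:

```python
def is_function(relation, domain):
    """True if each element of `domain` appears as the first component
    of exactly one pair in `relation`."""
    return all(sum(1 for (a, b) in relation if a == x) == 1 for x in domain)

dom = range(6)
f = {(a, a % 2) for a in dom}                          # b = a mod 2
g = {(a, b) for a in dom for b in dom if (a + b) % 2 == 0}
assert is_function(f, dom)        # exactly one b for each a
assert not is_function(g, dom)    # a = 1 pairs with b = 1, 3, and 5
```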
Given a function f : A → B, if b = f ( a), we say that a is the argument of f and that b is the value of f at a. We can define a function by stating its value for every element of its domain. For example, we might define f
( n) = 2 n for n ∈ ℕ, which means f = {( n, 2 n) : n ∈ ℕ}. Two functions f and g are equal if they have the same domain and codomain and if f ( a)
= g( a) for all a in the domain.
A finite sequence of length n is a function f whose domain is the set of n integers {0, 1, … , n − 1}. We often denote a finite sequence by listing its values in angle brackets: 〈 f (0), f (1), … , f ( n−1)〉. An infinite sequence is a function whose domain is the set ℕ of natural numbers.
For example, the Fibonacci sequence, defined by recurrence (3.31), is
the infinite sequence 〈0, 1, 1, 2, 3, 5, 8, 13, 21,…〉.
When the domain of a function f is a Cartesian product, we often
omit the extra parentheses surrounding the argument of f. For example,
if we have a function f : A1 × A2 × ⋯ × An → B, we write b = f (a1, a2, …, an) instead of writing b = f ((a1, a2, …, an)). We also call each ai an argument to the function f, though technically f has just a single argument, which is the n-tuple (a1, a2, …, an).
If f : A → B is a function and b = f ( a), then we sometimes say that b is the image of a under f. The image of a set A′ ⊆ A under f is defined by f ( A′) = { b ∈ B : b = f ( a) for some a ∈ A′}.
The range of f is the image of its domain, that is, f ( A). For example, the range of the function f : ℕ → ℕ defined by f ( n) = 2 n is f(ℕ) = { m : m =
2 n for some n ∈ ℕ}, in other words, the set of nonnegative even integers.
A function is a surjection if its range is its codomain. For example,
the function f (n) = ⌊n/2⌋ is a surjective function from ℕ to ℕ, since every element in ℕ appears as the value of f for some argument. In contrast, the function f (n) = 2n is not a surjective function from ℕ to ℕ, since no argument to f can produce any odd natural number as a
value. The function f ( n) = 2 n is, however, a surjective function from the natural numbers to the even numbers. A surjection f : A → B is sometimes described as mapping A onto B. When we say that f is onto,
we mean that it is surjective.
A function f : A → B is an injection if distinct arguments to f produce distinct values, that is, if a ≠ a′ implies f ( a) ≠ f ( a′). For example, the function f ( n) = 2 n is an injective function from ℕ to ℕ, since each even number b is the image under f of at most one element of the domain,
namely b/2. The function f (n) = ⌊n/2⌋ is not injective, since the value 1 is produced by two arguments: f (2) = 1 and f (3) = 1. An injection is sometimes called a one-to-one function.
A function f : A → B is a bijection if it is injective and surjective. For example, the function f (n) = (−1)^n ⌈n/2⌉ is a bijection from ℕ to ℤ: 0 → 0,
1 → −1,
2 → 1,
3 → −2,
4 → 2,
⋮
The function is injective, since no element of ℤ is the image of more
than one element of ℕ. It is surjective, since every element of ℤ appears
as the image of some element of ℕ. Hence, the function is bijective. A
bijection is sometimes called a one-to-one correspondence, since it pairs
elements in the domain and codomain. A bijection from a set A to itself
is sometimes called a permutation.
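The bijection f(n) = (−1)^n ⌈n/2⌉ can be probed on a finite window of ℕ; the sketch below checks injectivity and that the first 11 values cover every integer in [−5, 5]:

```python
import math

def f(n):
    """f(n) = (-1)^n * ceil(n / 2), the bijection from the naturals
    to the integers described in the text."""
    return (-1)**n * math.ceil(n / 2)

values = [f(n) for n in range(11)]
assert values[:5] == [0, -1, 1, -2, 2]
assert len(set(values)) == len(values)        # injective on this window
assert set(values) == set(range(-5, 6))       # hits every integer in [-5, 5]
```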
When a function f is bijective, we define its inverse f^−1 as
f^−1(b) = a if and only if f (a) = b.
For example, the inverse of the function f (n) = (−1)^n ⌈n/2⌉ is

f^−1(m) = 2m if m ≥ 0, and f^−1(m) = −2m − 1 if m < 0.

Exercises
B.3-1
Let A and B be finite sets, and let f : A → B be a function. Show the following:
a. If f is injective, then | A| ≤ | B|.
b. If f is surjective, then | A| ≥ | B|.
B.3-2
Is the function f ( x) = x + 1 bijective when the domain and the codomain are the set ℕ? Is it bijective when the domain and the
codomain are the set ℤ?
B.3-3
Give a natural definition for the inverse of a binary relation such that if
a relation is in fact a bijective function, its relational inverse is its
functional inverse.
★ B.3-4
Give a bijection from ℤ to ℤ × ℤ.
This section presents two kinds of graphs: directed and undirected.
Certain definitions in the literature differ from those given here, but for
the most part, the differences are slight. Section 20.1 shows how to represent graphs in computer memory.
A directed graph (or digraph) G is a pair ( V, E), where V is a finite set and E is a binary relation on V. The set V is called the vertex set of G, and its elements are called vertices (singular: vertex). The set E is called the edge set of G, and its elements are called edges. Figure B.2(a) is a pictorial representation of a directed graph on the vertex set {1, 2, 3, 4,
5, 6}. Vertices are represented by circles in the figure, and edges are
represented by arrows. Self-loops—edges from a vertex to itself—are
possible.
In an undirected graph G = ( V, E), the edge set E consists of unordered pairs of vertices, rather than ordered pairs. That is, an edge is
a set { u, v}, where u, v ∈ V and u ≠ v. By convention, we use the notation ( u, v) for an edge, rather than the set notation { u, v}, and we consider ( u, v) and ( v, u) to be the same edge. In an undirected graph, self-loops are forbidden, so that every edge consists of two distinct
vertices. Figure B.2(b) shows an undirected graph on the vertex set {1, 2, 3, 4, 5, 6}.
Figure B.2 Directed and undirected graphs. (a) A directed graph G = ( V, E), where V = {1, 2, 3, 4, 5, 6} and E = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}. The edge (2, 2) is a self-loop. (b) An undirected graph G = ( V, E), where V = {1, 2, 3, 4, 5, 6} and E = {(1, 2), (1, 5), (2, 5), (3, 6)}. The vertex 4 is isolated. (c) The subgraph of the graph in part (a) induced by the vertex set {1, 2, 3, 6}.
Many definitions for directed and undirected graphs are the same,
although certain terms have slightly different meanings in the two
contexts. If ( u, v) is an edge in a directed graph G = ( V, E), we say that ( u, v) is incident from or leaves vertex u and is incident to or enters vertex v. For example, the edges leaving vertex 2 in Figure B.2(a) are (2, 2), (2, 4), and (2, 5). The edges entering vertex 2 are (1, 2) and (2, 2). If ( u, v) is an edge in an undirected graph G = ( V, E), we say that ( u, v) is incident on vertices u and v. In Figure B.2(b), the edges incident on vertex 2 are (1, 2) and (2, 5).
If ( u, v) is an edge in a graph G = ( V, E), we say that vertex v is adjacent to vertex u. When the graph is undirected, the adjacency relation is symmetric. When the graph is directed, the adjacency relation
is not necessarily symmetric. If v is adjacent to u in a directed graph, we can write u → v. In parts (a) and (b) of Figure B.2, vertex 2 is adjacent to vertex 1, since the edge (1, 2) belongs to both graphs. Vertex 1 is not
adjacent to vertex 2 in Figure B.2(a), since the edge (2, 1) is absent.
The degree of a vertex in an undirected graph is the number of edges
incident on it. For example, vertex 2 in Figure B.2(b) has degree 2. A vertex whose degree is 0, such as vertex 4 in Figure B.2(b), is isolated. In a directed graph, the out-degree of a vertex is the number of edges leaving it, and the in-degree of a vertex is the number of edges entering
it. The degree of a vertex in a directed graph is its in-degree plus its out-
degree. Vertex 2 in Figure B.2(a) has in-degree 2, out-degree 3, and degree 5.
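The degree computations on Figure B.2(a) are easy to reproduce from its edge set. A minimal sketch:

```python
# The directed graph of Figure B.2(a), given by its edge set.
E = {(1, 2), (2, 2), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4), (6, 3)}

def out_degree(v):
    """Number of edges leaving vertex v."""
    return sum(1 for (u, w) in E if u == v)

def in_degree(v):
    """Number of edges entering vertex v."""
    return sum(1 for (u, w) in E if w == v)

# Vertex 2: in-degree 2 (edges (1,2) and (2,2)), out-degree 3, degree 5.
assert in_degree(2) == 2
assert out_degree(2) == 3
assert in_degree(2) + out_degree(2) == 5
```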


A path of length k from a vertex u to a vertex u′ in a graph G = ( V, E) is a sequence 〈 v 0, v 1, v 2, … , vk〉 of vertices such that u = v 0, u′ = vk, and ( vi−1, vi) ∈ E for i = 1, 2, … , k. The length of the path is the number of edges in the path, which is 1 less than the number of vertices in the path.
The path contains the vertices v0, v1, …, vk and the edges (v0, v1), (v1, v2), …, (vk−1, vk). (There is always a 0-length path from u to u.) If there is a path p from u to u′, we say that u′ is reachable from u via p, which we can write as u ⇝ u′. A path is simple 4 if all vertices in the path
are distinct. In Figure B.2(a), the path 〈1, 2, 5, 4〉 is a simple path of length 3. The path 〈2, 5, 4, 5〉 is not simple. A subpath of path p = 〈 v 0, v 1, … , vk〉 is a contiguous subsequence of its vertices. That is, for any 0
≤ i ≤ j ≤ k, the subsequence of vertices 〈 vi, vi+1, … , vj〉 is a subpath of p.
In a directed graph, a path 〈v0, v1, …, vk〉 forms a cycle if v0 = vk and the path contains at least one edge. The cycle is simple if, in addition, v1, v2, …, vk are distinct. A cycle consisting of k vertices has length k. A self-loop is a cycle of length 1. Two paths 〈v0, v1, v2, …, vk−1, v0〉 and 〈v′0, v′1, v′2, …, v′k−1, v′0〉 form the same cycle if there exists an integer j such that v′i = v(i+j) mod k for i = 0, 1, …, k−1. In Figure B.2(a), the
path 〈1,2,4,1〉 forms the same cycle as the paths 〈2, 4, 1, 2〉 and 〈4, 1, 2,
4〉. This cycle is simple, but the cycle 〈1, 2, 4, 5, 4, 1〉 is not. The cycle 〈2,
2〉 formed by the edge (2, 2) is a self-loop. A directed graph with no self-
loops is simple. In an undirected graph, a path 〈 v 0, v 1, …, vk〉 forms a cycle if k > 0, v 0 = vk, and all edges on the path are distinct. The cycle is simple if v 1, v 2, … , vk are distinct. For example, in Figure B.2(b), the path 〈1, 2, 5, 1〉 is a simple cycle. A graph with no simple cycles is
acyclic.
An undirected graph is connected if every vertex is reachable from all
other vertices. The connected components of an undirected graph are the
equivalence classes of vertices under the “is reachable from” relation.
The graph shown in Figure B.2(b) has three connected components: {1, 2, 5}, {3, 6}, and {4}. Every vertex in the connected component {1, 2,
5} is reachable from every other vertex in {1, 2, 5}. An undirected graph is connected if it has exactly one connected component. The edges of a
connected component are those that are incident on only the vertices of
the component. In other words, edge ( u, v) is an edge of a connected component only if both u and v are vertices of the component.
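The connected components of an undirected graph can be computed by a simple traversal from each unvisited vertex. The sketch below is illustrative Python (not part of the text); the edge list is an assumption chosen to be consistent with the components {1, 2, 5}, {3, 6}, and {4} stated for Figure B.2(b), since the figure itself is not reproduced here.

```python
from collections import defaultdict

def connected_components(vertices, edges):
    """Return the connected components of an undirected graph
    as a list of vertex sets (equivalence classes of reachability)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for s in vertices:
        if s in seen:
            continue
        # Iteratively collect everything reachable from s.
        component, frontier = {s}, [s]
        while frontier:
            u = frontier.pop()
            for v in adj[u]:
                if v not in component:
                    component.add(v)
                    frontier.append(v)
        seen |= component
        components.append(component)
    return components

# An edge list consistent with Figure B.2(b) as described in the text.
V = [1, 2, 3, 4, 5, 6]
E = [(1, 2), (1, 5), (2, 5), (3, 6)]
print(connected_components(V, E))  # → [{1, 2, 5}, {3, 6}, {4}]
```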
A directed graph is strongly connected if every two vertices are
reachable from each other. The strongly connected components of a
directed graph are the equivalence classes of vertices under the “are
mutually reachable” relation. A directed graph is strongly connected if it
has only one strongly connected component. The graph in Figure B.2(a)
has three strongly connected components: {1, 2, 4, 5}, {3}, and {6}. All
pairs of vertices in {1, 2, 4, 5} are mutually reachable. The vertices {3,
6} do not form a strongly connected component, since vertex 6 cannot
be reached from vertex 3.
Two graphs G = ( V, E) and G′ = ( V′, E′) are isomorphic if there exists a bijection f : V → V′ such that ( u, v) ∈ E if and only if ( f ( u), f ( v)) ∈
E′. In other words, G and G′ are isomorphic if the vertices of G can be relabeled to be vertices of G′, maintaining the corresponding edges in G
and G′. Figure B.3(a) shows a pair of isomorphic graphs G and G′ with respective vertex sets V = {1, 2, 3, 4, 5, 6} and V′ = { u, v, w, x, y, z}. The mapping from V to V′ given by f (1) = u, f (2) = v, f (3) = w, f (4) = x, f (5) = y, f (6) = z provides the required bijective function. The graphs in
Figure B.3(b) are not isomorphic. Although both graphs have 5 vertices and 7 edges, the top graph has a vertex of degree 4 and the bottom
graph does not.
We say that a graph G′ = ( V′, E′) is a subgraph of G = ( V, E) if V′ ⊆
V and E′ ⊆ E. Given a set V′ ⊆ V, the subgraph of G induced by V′ is the graph G′ = ( V′, E′), where
E′ = {( u, v) ∈ E : u, v ∈ V′}.
The subgraph induced by the vertex set {1, 2, 3, 6} in Figure B.2(a)
appears in Figure B.2(c) and has the edge set {(1, 2), (2, 2), (6, 3)}.
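The induced-subgraph definition translates directly into code: keep exactly the edges whose endpoints both lie in V′. A Python sketch (the full edge list for Figure B.2(a) is an assumption; only the induced edge set {(1, 2), (2, 2), (6, 3)} is stated in the text):

```python
def induced_subgraph(V_prime, E):
    """Subgraph induced by vertex set V_prime: keep exactly the
    edges (u, v) with both u and v in V_prime."""
    Vp = set(V_prime)
    return Vp, [(u, v) for (u, v) in E if u in Vp and v in Vp]

# Hypothetical edge list for the directed graph of Figure B.2(a).
E = [(1, 2), (2, 2), (6, 3), (2, 4), (2, 5), (4, 1), (4, 5), (5, 4)]
Vp, Ep = induced_subgraph({1, 2, 3, 6}, E)
print(Ep)  # → [(1, 2), (2, 2), (6, 3)]
```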
Given an undirected graph G = ( V, E), the directed version of G is the directed graph G′ = ( V, E′), where ( u, v) ∈ E′ if and only if ( u, v) ∈ E.
That is, each undirected edge ( u, v) in G turns into two directed edges, ( u, v) and ( v, u), in the directed version. Given a directed graph G = ( V,
E), the undirected version of G is the undirected graph G′ = ( V, E′), where ( u, v) ∈ E′ if and only if u ≠ v and E contains at least one of the edges ( u, v) and ( v, u). That is, the undirected version contains the edges of G “with their directions removed” and with self-loops eliminated.
(Since ( u, v) and ( v, u) are the same edge in an undirected graph, the undirected version of a directed graph contains it only once, even if the
directed graph contains both edges ( u, v) and ( v, u).) In a directed graph G = ( V, E), a neighbor of a vertex u is any vertex that is adjacent to u in the undirected version of G. That is, v is a neighbor of u if u ≠ v and either ( u, v) ∈ E or ( v, u) ∈ E. In an undirected graph, u and v are neighbors if they are adjacent.
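Both conversions are mechanical. A minimal Python sketch of the two definitions above (undirected edges are represented here as frozensets, an implementation choice, so that ( u, v) and ( v, u) coincide):

```python
def directed_version(E_undirected):
    """Each undirected edge {u, v} becomes the two directed edges (u, v) and (v, u)."""
    E = set()
    for u, v in E_undirected:
        E.add((u, v))
        E.add((v, u))
    return E

def undirected_version(E_directed):
    """Drop self-loops and edge directions; (u, v) and (v, u) collapse to one edge."""
    return {frozenset((u, v)) for (u, v) in E_directed if u != v}

print(sorted(directed_version([(1, 2)])))            # → [(1, 2), (2, 1)]
print(undirected_version([(1, 2), (2, 1), (2, 2)]))  # → {frozenset({1, 2})}
```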
Figure B.3 (a) A pair of isomorphic graphs. The vertices of the top graph are mapped to the vertices of the bottom graph by f (1) = u, f (2) = v, f (3) = w, f (4) = x, f (5) = y, f (6) = z. (b) Two graphs that are not isomorphic. The top graph has a vertex of degree 4, and the bottom graph does not.
Several kinds of graphs have special names. A complete graph is an
undirected graph in which every pair of vertices is adjacent. An
undirected graph G = ( V, E) is bipartite if V can be partitioned into two sets V 1 and V 2 such that ( u, v) ∈ E implies either u ∈ V 1 and v ∈ V 2 or u ∈ V 2 and v ∈ V 1. That is, all edges go between the two sets V 1 and V 2. An acyclic, undirected graph is a forest, and a connected, acyclic, undirected graph is a (free) tree (see Section B.5). We often take the first letters of “directed acyclic graph” and call such a graph a dag.
There are two variants of graphs that you may occasionally
encounter. A multigraph is like an undirected graph, but it can have
both multiple edges between vertices (such as two distinct edges ( u, v) and ( u, v)) and self-loops. A hypergraph is like an undirected graph, but each hyperedge, rather than connecting two vertices, connects an
arbitrary subset of vertices. Many algorithms written for ordinary
directed and undirected graphs can be adapted to run on these
graphlike structures.
The contraction of an undirected graph G = ( V, E) by an edge e = ( u, v) is a graph G′ = ( V′, E′), where V′ = V − { u, v} ∪ { x} and x is a new vertex. The set of edges E′ is formed from E by deleting the edge ( u, v) and, for each vertex w adjacent to u or v, deleting whichever of ( u, w) and ( v, w) belongs to E and adding the new edge ( x, w). In effect, u and v are “contracted” into a single vertex.
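The contraction operation can be sketched in Python as follows (representing undirected edges as frozensets is an implementation choice; using a set for E′ automatically merges the parallel edges ( u, w) and ( v, w) into the single edge ( x, w), as the definition requires):

```python
def contract(V, E, e, x):
    """Contract undirected edge e = (u, v) into the new vertex x:
    delete (u, v), and reroute every edge incident on u or v to x."""
    u, v = e
    Vp = (set(V) - {u, v}) | {x}
    Ep = set()
    for a, b in E:
        if {a, b} == {u, v}:
            continue  # the contracted edge disappears
        a = x if a in (u, v) else a
        b = x if b in (u, v) else b
        Ep.add(frozenset((a, b)))
    return Vp, Ep

# Contracting edge (1, 2) of the triangle on {1, 2, 3} leaves a single edge.
Vp, Ep = contract([1, 2, 3], [(1, 2), (1, 3), (2, 3)], (1, 2), "x")
print(Vp == {"x", 3}, Ep == {frozenset(("x", 3))})  # → True True
```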
Exercises
B.4-1
Attendees of a faculty party shake hands to greet each other, with every
pair of professors shaking hands one time. Each professor remembers
the number of times he or she shook hands. At the end of the party, the
department head asks the professors for their totals and adds them all
up. Show that the result is even by proving the handshaking lemma: if G = ( V, E) is an undirected graph, then

Σ v ∈ V degree( v) = 2 | E|.
B.4-2
Show that if a directed or undirected graph contains a path between two
vertices u and v, then it contains a simple path between u and v. Show that if a directed graph contains a cycle, then it contains a simple cycle.
B.4-3
Show that any connected, undirected graph G = ( V, E) satisfies | E| ≥ | V |
− 1.
B.4-4
Verify that in an undirected graph, the “is reachable from” relation is an
equivalence relation on the vertices of the graph. Which of the three
properties of an equivalence relation hold in general for the “is reachable from” relation on the vertices of a directed graph?
B.4-5
What is the undirected version of the directed graph in Figure B.2(a)?
What is the directed version of the undirected graph in Figure B.2(b)?
B.4-6
Show how a bipartite graph can represent a hypergraph by letting
incidence in the hypergraph correspond to adjacency in the bipartite
graph. ( Hint: Let one set of vertices in the bipartite graph correspond to
vertices of the hypergraph, and let the other set of vertices of the
bipartite graph correspond to hyperedges.)
B.5 Trees
As with graphs, there are many related, but slightly different, notions of
trees. This section presents definitions and mathematical properties of
several kinds of trees. Sections 10.3 and 20.1 describe how to represent trees in computer memory.
B.5.1 Free trees
As defined in Section B.4, a free tree is a connected, acyclic, undirected graph. We often omit the adjective “free” when we say that a graph is a
tree. If an undirected graph is acyclic but possibly disconnected, it is a
forest. Many algorithms that work for trees also work for forests. Figure
B.4(a) shows a free tree, and Figure B.4(b) shows a forest. The forest in
Figure B.4(b) is not a tree because it is not connected. The graph in
Figure B.4(c) is connected but neither a tree nor a forest, because it contains a cycle.
The following theorem captures many important facts about free
trees.
Theorem B.2 (Properties of free trees)

Figure B.4 (a) A free tree. (b) A forest. (c) A graph that contains a cycle and is therefore neither a tree nor a forest.
Let G = ( V, E) be an undirected graph. The following statements are equivalent.
1. G is a free tree.
2. Any two vertices in G are connected by a unique simple path.
3. G is connected, but if any edge is removed from E, the resulting
graph is disconnected.
4. G is connected, and | E| = | V | − 1.
5. G is acyclic, and | E| = | V | − 1.
6. G is acyclic, but if any edge is added to E, the resulting graph contains a cycle.
Figure B.5 A step in the proof of Theorem B.2: if (1) G is a free tree, then (2) any two vertices in G are connected by a unique simple path. Assume for the sake of contradiction that vertices u and v are connected by two distinct simple paths. These paths first diverge at vertex w, and they first reconverge at vertex z. The path p′ concatenated with the reverse of the path p″ forms a cycle, which yields the contradiction.
Proof (1) ⇒ (2): Since a tree is connected, any two vertices in G are connected by at least one simple path. Suppose for the sake of

contradiction that vertices u and v are connected by two distinct simple paths, as shown in Figure B.5. Let w be the vertex at which the paths first diverge. That is, if we call the paths p 1 and p 2, then w is the first vertex on both p 1 and p 2 whose successor on p 1 is x and whose successor on p 2 is y, where x ≠ y. Let z be the first vertex at which the paths reconverge, that is, z is the first vertex following w on p 1 that is also on p 2. Let p′ = w → x ⇝ z be the subpath of p 1 from w through x to z, and let p″ = w → y ⇝ z be the subpath of p 2 from w through y to z. Paths p′ and p″ share no vertices except their endpoints. Then, as Figure B.5 shows, the path obtained by concatenating p′ and the reverse of p″ is a cycle, which contradicts our assumption that G is a tree. Thus, if G is a tree, there can be at most one simple path between two vertices.
(2) ⇒ (3): If any two vertices in G are connected by a unique simple
path, then G is connected. Let ( u, v) be any edge in E. This edge is a path from u to v, and so it must be the unique path from u to v. If ( u, v) were to be removed from G, there would be no path from u to v, and G
would be disconnected.
(3) ⇒ (4): By assumption, the graph G is connected, so Exercise B.4-3
gives that | E| ≥ | V| − 1. We prove | E| ≤ | V| − 1 by induction on | V|. The base cases are when | V| = 1 or | V| = 2, and in either case, | E| = | V| − 1.
For the inductive step, suppose that | V| ≥ 3 for graph G and that any graph G′ = ( V′, E′), where | V′| < | V|, that satisfies (3) also satisfies | E′| ≤
| V′| − 1. Removing an arbitrary edge from G separates the graph into k
≥ 2 connected components (actually k = 2). Each component satisfies
(3), or else G would not satisfy (3). Consider each connected component
Vi, with edge set Ei, as a separate free tree. Then, because each connected component has fewer than | V| vertices, the inductive
hypothesis implies that | Ei| ≤ | Vi| − 1. Thus, the number of edges in all k connected components combined is at most | V| − k ≤ | V| − 2. Adding in the removed edge yields | E| ≤ | V| − 1.
(4) ⇒ (5): Suppose that G is connected and that | E| = | V| − 1. We must show that G is acyclic. Suppose that G has a cycle containing k
vertices v 1, v 2, … , vk, and without loss of generality assume that this cycle is simple. Let Gk = ( Vk, Ek) be the subgraph of G consisting of the cycle, so that | Vk| = | Ek| = k. If k < | V|, then because G is connected, there must be a vertex vk+1∈ V − Vk that is adjacent to some vertex vi
∈ Vk. Define Gk+1 = ( Vk+1, Ek+1) to be the subgraph of G with Vk+1 = Vk ∪ { vk+1} and Ek+1 = Ek ∪ {( vi, vk+1)}. Note that | Vk+1|
= | Ek+1| = k + 1. If k + 1 < | V|, then continue, defining Gk+2 in the same manner, and so forth, until we obtain Gn = ( Vn, En), where n =
| V|, Vn = V, and | En| = | Vn| = | V|. Since Gn is a subgraph of G, we have En ⊆ E, and hence | E| ≥ | En| = | Vn| = | V|, which contradicts the assumption that | E| = | V| − 1. Thus, G is acyclic.
(5) ⇒ (6): Suppose that G is acyclic and that | E| = | V| − 1. Let k be the number of connected components of G. Each connected component
is a free tree by definition, and since (1) implies (5), the sum of all edges
in all connected components of G is | V| − k. Consequently, k must equal 1, and G is in fact a tree. Since (1) implies (2), any two vertices in G are connected by a unique simple path. Thus, adding any edge to G creates
a cycle.
(6) ⇒ (1): Suppose that G is acyclic but that adding any edge to E
creates a cycle. We must show that G is connected. Let u and v be arbitrary vertices in G. If u and v are not already adjacent, adding the edge ( u, v) creates a cycle in which all edges but ( u, v) belong to G. Thus, the cycle minus edge ( u, v) must contain a path from u to v, and since u and v were chosen arbitrarily, G is connected.
▪
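Because the six statements of Theorem B.2 are equivalent, any one of them can serve as a computational test for whether a graph is a free tree. A Python sketch using condition (4), connected and | E| = | V| − 1 (it assumes a nonempty vertex list; the function name is ours):

```python
def is_free_tree(V, E):
    """Test Theorem B.2, property (4): an undirected graph is a free
    tree iff it is connected and has exactly |V| - 1 edges."""
    V = list(V)
    if len(E) != len(V) - 1:
        return False
    # Check connectivity by a traversal from an arbitrary vertex.
    adj = {v: set() for v in V}
    for u, v in E:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {V[0]}, [V[0]]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == len(V)

print(is_free_tree([1, 2, 3, 4], [(1, 2), (2, 3), (2, 4)]))  # → True
print(is_free_tree([1, 2, 3, 4], [(1, 2), (3, 4)]))          # → False (disconnected)
```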
B.5.2 Rooted and ordered trees
A rooted tree is a free tree in which one of the vertices is distinguished
from the others. We call the distinguished vertex the root of the tree. We
often refer to a vertex of a rooted tree as a node 5 of the tree. Figure
B.6(a) shows a rooted tree on a set of 12 nodes with root 7.
Figure B.6 Rooted and ordered trees. (a) A rooted tree with height 4. The tree is drawn in a standard way: the root (node 7) is at the top, its children (nodes with depth 1) are beneath it, their children (nodes with depth 2) are beneath them, and so forth. If the tree is ordered, the relative left-to-right order of the children of a node matters; otherwise, it doesn’t. (b) Another rooted tree. As a rooted tree, it is identical to the tree in (a), but as an ordered tree it is different, since the children of node 3 appear in a different order.
Consider a node x in a rooted tree T with root r. We call any node y on the unique simple path from r to x an ancestor of x. If y is an ancestor of x, then x is a descendant of y. (Every node is both an ancestor and a descendant of itself.) If y is an ancestor of x and x ≠ y, then y is a proper ancestor of x and x is a proper descendant of y. The subtree rooted at x is the tree induced by descendants of x, rooted at x.
For example, the subtree rooted at node 8 in Figure B.6(a) contains nodes 8, 6, 5, and 9.
If the last edge on the simple path from the root r of a tree T to a
node x is ( y, x), then y is the parent of x, and x is a child of y. The root is the only node in T with no parent. If two nodes have the same parent,
they are siblings. A node with no children is a leaf or external node. A nonleaf node is an internal node.
The number of children of a node x in a rooted tree T is the degree of x. 6 The length of the simple path from the root r to a node x is the depth of x in T. A level of a tree consists of all nodes at the same depth.
The height of a node in a tree is the number of edges on the longest simple downward path from the node to a leaf, and the height of a tree
is the height of its root. The height of a tree is also equal to the largest
depth of any node in the tree.
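The recursive structure of these definitions is easy to see in code. A Python sketch (the small example tree is hypothetical, not the 12-node tree of Figure B.6; a node's height is 0 if it is a leaf, and its depth is 0 if it is the root):

```python
def height(children, x):
    """Height of node x: edges on the longest simple downward path from x to a leaf."""
    kids = children.get(x, [])
    return 0 if not kids else 1 + max(height(children, c) for c in kids)

def depth(parent, x):
    """Depth of node x: length of the simple path from the root down to x."""
    return 0 if parent[x] is None else 1 + depth(parent, parent[x])

# A small rooted tree: root 7 has children 3 and 10; node 3 has child 8.
children = {7: [3, 10], 3: [8]}
parent = {7: None, 3: 7, 10: 7, 8: 3}
print(height(children, 7))  # → 2 (the height of the tree is the height of its root)
print(depth(parent, 8))     # → 2
```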
An ordered tree is a rooted tree in which the children of each node are ordered. That is, if a node has k children, then there is a first child, a
second child, and so on, up to and including a k th child. The two trees
in Figure B.6 are different when considered to be ordered trees, but the same when considered to be just rooted trees.
B.5.3 Binary and positional trees
We define binary trees recursively. A binary tree T is a structure defined on a finite set of nodes that either
contains no nodes, or
is composed of three disjoint sets of nodes: a root node, a binary
tree called its left subtree, and a binary tree called its right subtree.
The binary tree that contains no nodes is called the empty tree or null
tree, sometimes denoted NIL. If the left subtree is nonempty, its root is
called the left child of the root of the entire tree. Likewise, the root of a
nonnull right subtree is the right child of the root of the entire tree. If a
subtree is the null tree NIL, we say that the child is absent or missing.
Figure B.7(a) shows a binary tree.
A binary tree is not simply an ordered tree in which each node has
degree at most 2. For example, in a binary tree, if a node has just one
child, the position of the child—whether it is the left child or the right
child—matters. In an ordered tree, there is no distinguishing a sole child
as being either left or right. Figure B.7(b) shows a binary tree that differs from the tree in Figure B.7(a) because of the position of one node. Considered as ordered trees, however, the two trees are identical.
One way to represent the positioning information in a binary tree is
by the internal nodes of an ordered tree, as shown in Figure B.7(c). The idea is to replace each missing child in the binary tree with a node
having no children. These leaf nodes are drawn as squares in the figure.
The tree that results is a full binary tree: each node is either a leaf or has
degree exactly 2. No nodes have degree 1. Consequently, the order of
the children of a node preserves the position information.

Figure B.7 Binary trees. (a) A binary tree drawn in a standard way. The left child of a node is drawn beneath the node and to the left. The right child is drawn beneath and to the right. (b) A binary tree different from the one in (a). In (a), the left child of node 7 is 5 and the right child is absent. In (b), the left child of node 7 is absent and the right child is 5. As ordered trees, these trees are the same, but as binary trees, they are distinct. (c) The binary tree in (a) represented by the internal nodes of a full binary tree: an ordered tree in which each internal node has degree 2.
The leaves in the tree are shown as squares.
The positioning information that distinguishes binary trees from
ordered trees extends to trees with more than two children per node. In
a positional tree, the children of a node are labeled with distinct positive
integers. The i th child of a node is absent if no child is labeled with integer i. A k-ary tree is a positional tree in which for every node, all children with labels greater than k are missing. Thus, a binary tree is a
k-ary tree with k = 2.
A complete k-ary tree is a k-ary tree in which all leaves have the same depth and all internal nodes have degree k. Figure B.8 shows a complete binary tree of height 3. How many leaves does a complete k-ary tree of
height h have? The root has k children at depth 1, each of which has k children at depth 2, and so on. Thus, the number of nodes at depth d is k^d. In a complete k-ary tree with height h, the leaves are at depth h, so that there are k^h leaves. Consequently, the height of a complete k-ary tree with n leaves is log_k n. A complete k-ary tree of height h has

( k^h − 1)/( k − 1) = k^0 + k^1 + ⋯ + k^{ h−1}

internal nodes. Thus, a complete binary tree has 2^h − 1 internal nodes.
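These counts can be checked with a few lines of Python (an illustrative sketch; the function name is ours):

```python
def complete_kary_counts(k, h):
    """Node counts for a complete k-ary tree of height h:
    k^h leaves at depth h, and (k^h - 1)/(k - 1) internal nodes
    (the geometric sum k^0 + k^1 + ... + k^(h-1))."""
    leaves = k ** h
    internal = (k ** h - 1) // (k - 1)
    return leaves, internal

print(complete_kary_counts(2, 3))  # → (8, 7), matching Figure B.8
```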
Figure B.8 A complete binary tree of height 3 with 8 leaves and 7 internal nodes.
Exercises
B.5-1
Draw all the free trees composed of the three vertices x, y, and z. Draw all the rooted trees with nodes x, y, and z with x as the root. Draw all the ordered trees with nodes x, y, and z with x as the root. Draw all the binary trees with nodes x, y, and z with x as the root.
B.5-2
Let G = ( V, E) be a directed acyclic graph in which there is a vertex v 0
∈ V such that there exists a unique path from v 0 to every vertex v ∈ V.
Prove that the undirected version of G forms a tree.
B.5-3
Show by induction that the number of degree-2 nodes in any nonempty
binary tree is one less than the number of leaves. Conclude that the
number of internal nodes in a full binary tree is one less than the
number of leaves.
B.5-4
Prove that for any integer k ≥ 1, there is a full binary tree with k leaves.
B.5-5
Use induction to show that a nonempty binary tree with n nodes has
height at least ⌊lg n⌋.
★ B.5-6
The internal path length of a full binary tree is the sum, taken over all
internal nodes of the tree, of the depth of each node. Likewise, the
external path length is the sum, taken over all leaves of the tree, of the
depth of each leaf. Consider a full binary tree with n internal nodes, internal path length i, and external path length e. Prove that e = i + 2 n.
★ B.5-7
Associate a “weight” w( x) = 2− d with each leaf x of depth d in a binary tree T, and let L be the set of leaves of T. Prove the Kraft inequality: Σ x∈ L w( x) ≤ 1.
★ B.5-8
Show that if L ≥ 2, then every binary tree with L leaves contains a subtree having between L/3 and 2 L/3 leaves, inclusive.
Problems
B-1 Graph coloring
A k-coloring of undirected graph G = ( V, E) is a function c : V → {1, 2,
… , k} such that c( u) ≠ c( v) for every edge ( u, v) ∈ E. In other words, the numbers 1, 2, … , k represent the k colors, and adjacent vertices must
have different colors.
a. Show that any tree is 2-colorable.
b. Show that the following are equivalent:
1. G is bipartite.
2. G is 2-colorable.
3. G has no cycles of odd length.
c. Let d be the maximum degree of any vertex in a graph G. Prove that G
can be colored with d + 1 colors.
d. Show that if G has O(| V|) edges, then G can be colored with O(√| V|) colors.
B-2 Friendly graphs
Reword each of the following statements as a theorem about undirected
graphs, and then prove it. Assume that friendship is symmetric but not
reflexive.
a. Any group of at least two people contains at least two people with the
same number of friends in the group.
b. Every group of six people contains either at least three mutual friends
or at least three mutual strangers.
c. Any group of people can be partitioned into two subgroups such that
at least half the friends of each person belong to the subgroup of
which that person is not a member.
d. If everyone in a group is the friend of at least half the people in the
group, then the group can be seated around a table in such a way that
everyone is seated between two friends.
B-3 Bisecting trees
Many divide-and-conquer algorithms that operate on graphs require
that the graph be bisected into two nearly equal-sized subgraphs, which
are induced by a partition of the vertices. This problem investigates
bisections of trees formed by removing a small number of edges. We
require that whenever two vertices end up in the same subtree after
removing edges, then they must belong to the same partition.
a. Show that the vertices of any n-vertex binary tree can be partitioned into two sets A and B, such that | A| ≤ 3 n/4 and | B| ≤ 3 n/4, by removing a single edge.
b. Show that the constant 3/4 in part (a) is optimal in the worst case by
giving an example of a simple binary tree whose most evenly balanced
partition upon removal of a single edge has | A| = 3 n/4.
c. Show that by removing at most O(lg n) edges, we can partition the vertices of any n-vertex binary tree into two sets A and B such that | A|
= ⌊ n/2⌋ and | B| = ⌈ n/2⌉.
Appendix notes
G. Boole pioneered the development of symbolic logic, and he
introduced many of the basic set notations in a book published in 1854.
Modern set theory was created by G. Cantor during the period 1874–
1895. Cantor focused primarily on sets of infinite cardinality. The term
“function” is attributed to G. W. Leibniz, who used it to refer to several
kinds of mathematical formulas. His limited definition has been
generalized many times. Graph theory originated in 1736, when L.
Euler proved that it was impossible to cross each of the seven bridges in
the city of Königsberg exactly once and return to the starting point.
The book by Harary [208] provides a useful compendium of many definitions and results from graph theory.
1 A variation of a set, which can contain the same object more than once, is called a multiset.
2 Some authors start the natural numbers with 1 instead of 0. The modern trend seems to be to start with 0.
3 To be precise, in order for the “fit inside” relation to be a partial order, we need to view a box as fitting inside itself.
4 Some authors refer to what we call a path as a “walk” and to what we call a simple path as just a “path.”
5 The term “node” is often used in the graph theory literature as a synonym for “vertex.” We reserve the term “node” to mean a vertex of a rooted tree.
6 The degree of a node depends on whether we consider T to be a rooted tree or a free tree. The degree of a vertex in a free tree is, as in any undirected graph, the number of adjacent vertices. In a rooted tree, however, the degree is the number of children—the parent of a node does not count toward its degree.
C Counting and Probability
This appendix reviews elementary combinatorics and probability theory.
If you have a good background in these areas, you may want to skim the
beginning of this appendix lightly and concentrate on the later sections.
Most of this book’s chapters do not require probability, but for some
chapters it is essential.
Section C.1 reviews elementary results in counting theory, including standard formulas for counting permutations and combinations. The
axioms of probability and basic facts concerning probability
distributions form Section C.2. Random variables are introduced in
Section C.3, along with the properties of expectation and variance.
Section C.4 investigates the geometric and binomial distributions that arise from studying Bernoulli trials. The study of the binomial
distribution continues in Section C.5, an advanced discussion of the
“tails” of the distribution.
C.1 Counting
Counting theory tries to answer the question “How many?” without
actually enumerating all the choices. For example, you might ask, “How
many different n-bit numbers are there?” or “How many orderings of n
distinct elements are there?” This section reviews the elements of
counting theory. Since some of the material assumes a basic
understanding of sets, you might wish to start by reviewing the material
in Section B.1.
We can sometimes express a set of items that we wish to count as a
union of disjoint sets or as a Cartesian product of sets.
The rule of sum says that the number of ways to choose one element
from one of two disjoint sets is the sum of the cardinalities of the sets.
That is, if A and B are two finite sets with no members in common, then
| A ∪ B| = | A| + | B|, which follows from equation (B.3) on page 1156. For example, if each position on a car’s license plate is a letter or a digit, then the number of possibilities for each position is 26 + 10 = 36, since
there are 26 choices if it is a letter and 10 choices if it is a digit.
The rule of product says that the number of ways to choose an
ordered pair is the number of ways to choose the first element times the
number of ways to choose the second element. That is, if A and B are
two finite sets, then | A × B| = | A|·| B|, which is simply equation (B.4) on page 1157. For example, if an ice-cream parlor offers 28 flavors of ice
cream and four toppings, the number of possible sundaes with one
scoop of ice cream and one topping is 28 · 4 = 112.
Strings
A string over a finite set S is a sequence of elements of S. For example, there are eight binary strings of length 3:
000, 001, 010, 011, 100, 101, 110, 111.
(Here we use the shorthand of omitting the angle brackets when
denoting a sequence.) We sometimes call a string of length k a k-string.
A substring s′ of a string s is an ordered sequence of consecutive elements of s. A k-substring of a string is a substring of length k. For example, 010 is a 3-substring of 01101001 (the 3-substring that begins in
position 4), but 111 is not a substring of 01101001.
We can view a k-string over a set S as an element of the Cartesian product S^k of k-tuples, which means that there are | S|^k strings of length k. For example, the number of binary k-strings is 2^k. Intuitively, to construct a k-string over an n-set, there are n ways to pick the first element; for each of these choices, there are n ways to pick the second element; and so forth, k times. This construction leads to the k-fold product

n · n ⋯ n = n^k

as the number of k-strings.
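The view of a k-string as an element of S^k corresponds directly to Python's itertools.product, which enumerates exactly the Cartesian product (an illustrative sketch):

```python
from itertools import product

S = ['0', '1']
k = 3
# All k-strings over S, i.e., all elements of the Cartesian product S^k.
strings = [''.join(t) for t in product(S, repeat=k)]
print(len(strings), len(S) ** k)  # → 8 8
print(strings[:4])                # → ['000', '001', '010', '011']
```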
Permutations
A permutation of a finite set S is an ordered sequence of all the elements of S, with each element appearing exactly once. For example, if S = { a, b, c}, then S has 6 permutations:
abc, acb, bac, bca, cab, cba.
(Again, we use the shorthand of omitting the angle brackets when
denoting a sequence.) There are n! permutations of a set of n elements, since there are n ways to choose the first element of the sequence, n − 1
ways for the second element, n − 2 ways for the third, and so on.
A k-permutation of S is an ordered sequence of k elements of S, with no element appearing more than once in the sequence. (Thus, an
ordinary permutation is an n-permutation of an n-set.) Here are the 2-
permutations of the set { a, b, c, d}:
ab, ac, ad, ba, bc, bd, ca, cb, cd, da, db, dc.
The number of k-permutations of an n-set is

n( n − 1)( n − 2) ⋯ ( n − k + 1) = n!/( n − k)!,     (C.1)

since there are n ways to choose the first element, n − 1 ways to choose the second element, and so on, until k elements are chosen, with the last element chosen from the remaining n − k + 1 elements. For the above example, with n = 4 and k = 2, the formula (C.1) evaluates to 4!/2! = 12, matching the number of 2-permutations listed.
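Formula (C.1) can be checked against direct enumeration with Python's standard library (an illustrative sketch):

```python
from itertools import permutations
from math import factorial

n, k = 4, 2
# Enumerate all 2-permutations of {a, b, c, d} directly.
perms = list(permutations('abcd', k))
# Formula (C.1): there are n!/(n - k)! k-permutations of an n-set.
print(len(perms), factorial(n) // factorial(n - k))  # → 12 12
```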
Combinations
A k-combination of an n-set S is simply a k-subset of S. For example, the 4-set { a, b, c, d} has six 2-combinations:
ab, ac, ad, bc, bd, cd.




(Here we use the shorthand of omitting the braces around each subset.)
To construct a k-combination of an n-set, choose k distinct (different) elements from the n-set. The order of selecting the elements does not matter.
We can express the number of k-combinations of an n-set in terms of
the number of k-permutations of an n-set. Every k-combination has exactly k! permutations of its elements, each of which is a distinct k-
permutation of the n-set. Thus the number of k-combinations of an n-
set is the number of k-permutations divided by k!. From equation (C.1), this quantity is

n!/( k! ( n − k)!).     (C.2)

For k = 0, this formula tells us that the number of ways to choose 0 elements from an n-set is 1 (not 0), since 0! = 1.
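Equation (C.2) matches both direct enumeration and Python's built-in math.comb (an illustrative sketch):

```python
from itertools import combinations
from math import comb, factorial

n, k = 4, 2
# Enumerate all 2-combinations of the 4-set {a, b, c, d}.
combs = list(combinations('abcd', k))
print(len(combs))  # → 6
# Equation (C.2): C(n, k) = n!/(k! (n - k)!), i.e., P(n, k)/k!.
print(comb(n, k) == factorial(n) // (factorial(k) * factorial(n - k)))  # → True
```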
Binomial coefficients
The notation \binom{n}{k} (read “n choose k”) denotes the number of k-combinations of an n-set. Equation (C.2) gives

\binom{n}{k} = n!/( k! ( n − k)!).

This formula is symmetric in k and n − k:

\binom{n}{k} = \binom{n}{n − k}.     (C.3)

These numbers are also known as binomial coefficients, due to their appearance in the binomial theorem:

( x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{ n−k},     (C.4)

where n ∈ ℕ and x, y ∈ ℝ. The right-hand side of equation (C.4) is called the binomial expansion of the left-hand side. A special case of the binomial theorem occurs when x = y = 1:

2^n = \sum_{k=0}^{n} \binom{n}{k}.

This formula corresponds to counting the 2^n binary n-strings by the number of 1s they contain: there are \binom{n}{k} binary n-strings containing exactly k 1s, since there are \binom{n}{k} ways to choose k out of the n positions in which to place the 1s. Many identities involve binomial coefficients. The exercises at the
end of this section give you the opportunity to prove a few.
Binomial bounds
You sometimes need to bound the size of a binomial coefficient. For 1 ≤ k ≤ n, we have the lower bound

\binom{n}{k} ≥ ( n/ k)^k.     (C.5)

Taking advantage of the inequality k! ≥ ( k/ e)^k derived from Stirling’s approximation (3.25) on page 67, we obtain the upper bounds

\binom{n}{k} ≤ n^k/ k! ≤ ( en/ k)^k.     (C.6)

For all integers k such that 0 ≤ k ≤ n, you can use induction (see Exercise C.1-12) to prove the bound

\binom{n}{k} ≤ n^n/( k^k ( n − k)^{ n−k}),     (C.7)

where for convenience we assume that 0^0 = 1. For k = λn, where 0 ≤ λ ≤ 1, we can rewrite this bound as

\binom{n}{λn} ≤ 2^{ n H(λ)},

where

H(λ) = −λ lg λ − (1 − λ) lg(1 − λ)

is the (binary) entropy function and where, for convenience, we assume that 0 lg 0 = 0, so that H(0) = H(1) = 0.
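The entropy bound \binom{n}{λn} ≤ 2^{ n H(λ)} can be spot-checked numerically. A small Python sketch (the choice n = 20 and the floating-point tolerance are ours):

```python
from math import comb, log2

def H(lam):
    """Binary entropy function, with the convention 0 lg 0 = 0."""
    if lam in (0.0, 1.0):
        return 0.0
    return -lam * log2(lam) - (1 - lam) * log2(1 - lam)

# Check C(n, k) <= 2^(n * H(k/n)) for every k with 0 <= k <= n.
n = 20
for k in range(n + 1):
    assert comb(n, k) <= 2 ** (n * H(k / n)) + 1e-6
print("bound holds for n =", n)  # → bound holds for n = 20
```

At λ = 1/2 the bound gives \binom{n}{n/2} ≤ 2^n, consistent with the row sum of the binomial coefficients.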
Exercises
C.1-1
How many k-substrings does an n-string have? (Consider identical k-
substrings at different positions to be different.) How many substrings
does an n-string have in total?
C.1-2
An n-input, m-output boolean function is a function from {0, 1}^n to {0, 1}^m. How many n-input, 1-output boolean functions are there? How many n-input, m-output boolean functions are there?
C.1-3
In how many ways can n professors sit around a circular conference
table? Consider two seatings to be the same if one can be rotated to
form the other.
C.1-4
In how many ways is it possible to choose three distinct numbers from
the set {1, 2, … , 99} so that their sum is even?
C.1-5
Prove the identity

\binom{n}{k} = ( n/ k) \binom{n − 1}{k − 1}

for 0 < k ≤ n.
C.1-6
Prove the identity

\binom{n}{k} = ( n/( n − k)) \binom{n − 1}{k}

for 0 ≤ k < n.
C.1-7
To choose k objects from n, you can make one of the objects distinguished and consider whether the distinguished object is chosen.
Use this approach to prove that

\binom{n}{k} = \binom{n − 1}{k} + \binom{n − 1}{k − 1}.
C.1-8
Using the result of Exercise C.1-7, make a table for n = 0, 1, … , 6 and 0 ≤ k ≤ n of the binomial coefficients \binom{n}{k}, with \binom{0}{0} at the top, \binom{1}{0} and \binom{1}{1} on the next line, then \binom{2}{0}, \binom{2}{1}, and \binom{2}{2}, and so forth. Such a table of binomial coefficients is called Pascal’s triangle.
C.1-9
Prove that

1 + 2 + ⋯ + n = (n+1 choose 2)

for every integer n ≥ 1.
C.1-10
Show that for any integers n ≥ 0 and 0 ≤ k ≤ n, the expression (n choose k) achieves its maximum value when k = ⌊n/2⌋ or k = ⌈n/2⌉.



★ C.1-11
Argue that for any integers n ≥ 0, j ≥ 0, k ≥ 0, and j + k ≤ n,

(n choose j+k) ≤ (n choose j) (n−j choose k).

Provide both an algebraic proof and an argument based on a method for choosing j + k items out of n. Give an example in which equality does not hold.
★ C.1-12
Use induction on all integers k such that 0 ≤ k ≤ n/2 to prove inequality (C.7), and use equation (C.3) to extend it to all integers k such that 0 ≤ k
≤ n.
★ C.1-13
Use Stirling's approximation to prove that

(2n choose n) = (2^(2n)/√(πn)) (1 + O(1/n)).
★ C.1-14
By differentiating the entropy function H( λ), show that it achieves its maximum value at λ = 1/2. What is H(1/2)?
★ C.1-15
Show that for any integer n ≥ 0,

∑_{k=0}^{n} (n choose k) k = n 2^(n−1).
★ C.1-16
Inequality (C.5) provides a lower bound on the binomial coefficient (n choose k).
For small values of k, a stronger bound holds. Prove that

for
.
Probability is an essential tool for the design and analysis of
probabilistic and randomized algorithms. This section reviews basic
probability theory.
We define probability in terms of a sample space S, which is a set whose elements are called outcomes or elementary events. Think of each
outcome as a possible result of an experiment. For the experiment of
flipping two distinguishable coins, with each individual flip resulting in a
head (H) or a tail (T), you can view the sample space S as consisting of
the set of all possible 2-strings over {H, T}:
S = {HH, HT, TH, TT}.
An event is a subset1 of the sample space S. For example, in the experiment of flipping two coins, the event of obtaining one head and
one tail is {HT, TH}. The event S is called the certain event, and the event ∅ is called the null event. We say that two events A and B are mutually exclusive if A ∩ B = ∅ . An outcome s also defines the event
{ s}, which we sometimes write as just s. By definition, all outcomes are mutually exclusive.
Axioms of probability
A probability distribution Pr {} on a sample space S is a mapping from events of S to real numbers satisfying the following probability axioms:
1. Pr {A} ≥ 0 for any event A.
2. Pr {S} = 1.
3. Pr {A ∪ B} = Pr {A} + Pr {B} for any two mutually exclusive events A and B. More generally, for any sequence of events A1, A2, … (finite or countably infinite) that are pairwise mutually exclusive,

Pr {A1 ∪ A2 ∪ ⋯} = Pr {A1} + Pr {A2} + ⋯.
We call Pr { A} the probability of the event A. Axiom 2 is simply a normalization requirement: there is really nothing fundamental about
choosing 1 as the probability of the certain event, except that it is
natural and convenient.
Several results follow immediately from these axioms and basic set
theory (see Section B.1). The null event ∅ has probability Pr {∅} = 0. If A ⊆ B, then Pr {A} ≤ Pr {B}. Using Ā to denote the event S − A (the complement of A), we have Pr {Ā} = 1 − Pr {A}. For any two events A and B,

Pr {A ∪ B} = Pr {A} + Pr {B} − Pr {A ∩ B}
           ≤ Pr {A} + Pr {B}.
In our coin-flipping example, suppose that each of the four outcomes
has probability 1/4. Then the probability of getting at least one head is
Pr {HH, HT, TH} = Pr {HH} + Pr {HT} + Pr {TH}
= 3/4.
Another way to obtain the same result is to observe that since the
probability of getting strictly less than one head is Pr {TT} = 1/4, the
probability of getting at least one head is 1 − 1/4 = 3/4.
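Both calculations can be reproduced by brute-force enumeration. Here is a small Python sketch (illustrative only; the variable names are my own), using exact rational arithmetic:

```python
from itertools import product
from fractions import Fraction

# Sample space of two fair, distinguishable coin flips; each outcome has probability 1/4.
S = [''.join(p) for p in product('HT', repeat=2)]
pr = {s: Fraction(1, 4) for s in S}

# Direct sum over the outcomes containing at least one head.
at_least_one_head = sum(pr[s] for s in S if 'H' in s)
print(at_least_one_head)   # 3/4

# Complement: 1 minus the probability of strictly fewer than one head (TT).
print(1 - pr['TT'])        # 3/4
```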
Discrete probability distributions
A probability distribution is discrete if it is defined over a finite or countably infinite sample space. Let S be the sample space. Then for any event A,

Pr {A} = ∑_{s ∈ A} Pr {s},

since outcomes, specifically those in A, are mutually exclusive. If S is finite and every outcome s ∈ S has probability Pr {s} = 1/|S|, then we



have the uniform probability distribution on S. In such a case the experiment is often described as “picking an element of S at random.”
As an example, consider the process of flipping a fair coin, one for
which the probability of obtaining a head is the same as the probability
of obtaining a tail, that is, 1/2. Flipping the coin n times gives the uniform probability distribution defined on the sample space S = {H, T}^n, a set of size 2^n. We can represent each outcome in S as a string of length n over {H, T}, with each string occurring with probability 1/2^n.
The event A = {exactly k heads and exactly n − k tails occur} is a subset of S of size (n choose k), since (n choose k) strings of length n over {H, T} contain exactly k H's. The probability of event A is thus

Pr {A} = (n choose k)/2^n.
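This probability can be checked by enumerating the sample space directly. A Python sketch (illustrative, not from the text):

```python
import math
from itertools import product
from fractions import Fraction

n = 6
S = list(product('HT', repeat=n))   # 2^n equally likely outcomes
p = Fraction(1, 2 ** n)

for k in range(n + 1):
    # Event A: exactly k heads (and hence n - k tails).
    pr_A = sum(p for s in S if s.count('H') == k)
    assert pr_A == Fraction(math.comb(n, k), 2 ** n)
```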
Continuous uniform probability distribution
The continuous uniform probability distribution is an example of a
probability distribution in which not all subsets of the sample space are
considered to be events. The continuous uniform probability
distribution is defined over a closed interval [ a, b] of the reals, where a < b. The intuition is that each point in the interval [ a, b] should be
“equally likely.” Because there are an uncountable number of points,
however, if all points had the same finite, positive probability, axioms 2
and 3 would not be simultaneously satisfied. For this reason, we’d like
to associate a probability only with some of the subsets of S in such a way that the axioms are satisfied for these events.
For any closed interval [c, d], where a ≤ c ≤ d ≤ b, the continuous uniform probability distribution defines the probability of the event [c, d] to be

Pr {[c, d]} = (d − c)/(b − a).

Letting c = d gives that the probability of a single point is 0. Removing the endpoints [c, c] and [d, d] of an interval [c, d] results in the open interval (c, d). Since [c, d] = [c, c] ∪ (c, d) ∪ [d, d], axiom 3 gives Pr {[c, d]} = Pr {(c, d)}. Generally, the set of events for the continuous uniform probability distribution contains any subset of the sample space [a, b]

that can be obtained by a finite or countable union of open and closed
intervals, as well as certain more complicated sets.
Conditional probability and independence
Sometimes you have some prior partial knowledge about the outcome
of an experiment. For example, suppose that a friend has flipped two
fair coins and has told you that at least one of the coins showed a head.
What is the probability that both coins are heads? The information
given eliminates the possibility of two tails. The three remaining
outcomes are equally likely, and so you infer that each occurs with
probability 1/3. Since only one of these outcomes shows two heads, the
answer is 1/3.
Conditional probability formalizes the notion of having prior partial
knowledge of the outcome of an experiment. The conditional probability
of an event A given that another event B occurs is defined to be

Pr {A | B} = Pr {A ∩ B}/Pr {B},     (C.16)

whenever Pr {B} ≠ 0. (Read "Pr {A | B}" as "the probability of A given B.") The idea behind equation (C.16) is that since we are given that event B occurs, the event that A also occurs is A ∩ B. That is, A ∩ B is the set of outcomes in which both A and B occur. Because the outcome is one of the elementary events in B, we normalize the probabilities of all the elementary events in B by dividing them by Pr {B}, so that they sum to 1. The conditional probability of A given B is, therefore, the ratio of the probability of event A ∩ B to the probability of event B. In the example above, A is the event that both coins are heads, and B is the event that at least one coin is a head. Thus, Pr {A | B} = (1/4)/(3/4) = 1/3.

Two events are independent if

Pr {A ∩ B} = Pr {A} Pr {B},
which is equivalent, if Pr { B} ≠ 0, to the condition
Pr { A | B} = Pr { A}.

For example, suppose that you flip two fair coins and that the outcomes
are independent. Then the probability of two heads is (1/2)(1/2) = 1/4.
Now suppose that one event is that the first coin comes up heads and
the other event is that the coins come up differently. Each of these
events occurs with probability 1/2, and the probability that both events
occur is 1/4. Thus, according to the definition of independence, the
events are independent—even though you might think that both events
depend on the first coin. Finally, suppose that the coins are welded
together so that they both fall heads or both fall tails and that the two
possibilities are equally likely. Then the probability that each coin comes
up heads is 1/2, but the probability that they both come up heads is 1/2
≠ (1/2)(1/2). Consequently, the event that one comes up heads and the
event that the other comes up heads are not independent.
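Both scenarios can be verified by enumeration. A Python sketch (illustrative; the event names are mine):

```python
from itertools import product
from fractions import Fraction

# Two independent fair coins: four equally likely outcomes.
S = list(product('HT', repeat=2))
p = Fraction(1, 4)

def pr(event):
    return sum(p for s in S if event(s))

first_heads = lambda s: s[0] == 'H'
differ = lambda s: s[0] != s[1]

# "First coin heads" and "coins come up differently" satisfy
# Pr{A and B} = Pr{A} Pr{B}, so they are independent.
assert pr(lambda s: first_heads(s) and differ(s)) == pr(first_heads) * pr(differ)

# Welded coins: only HH and TT occur, each with probability 1/2.
S2 = [('H', 'H'), ('T', 'T')]
def pr2(event):
    return sum(Fraction(1, 2) for s in S2 if event(s))

# Pr{both heads} = 1/2 differs from (1/2)(1/2), so the events are dependent.
assert pr2(lambda s: s[0] == 'H' and s[1] == 'H') != \
       pr2(lambda s: s[0] == 'H') * pr2(lambda s: s[1] == 'H')
```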
A collection A 1, A 2, … , An of events is said to be pairwise independent if
Pr { Ai ∩ Aj } = Pr { Ai} Pr { Aj}
for all 1 ≤ i < j ≤ n. We say that the events of the collection are (mutually) independent if every k-subset Ai1, Ai2, … , Aik of the collection, where 2 ≤ k ≤ n and 1 ≤ i1 < i2 < ⋯ < ik ≤ n, satisfies

Pr {Ai1 ∩ Ai2 ∩ ⋯ ∩ Aik} = Pr {Ai1} Pr {Ai2} ⋯ Pr {Aik}.

For example, suppose that you flip two fair coins. Let A1 be the event
that the first coin is heads, let A 2 be the event that the second coin is
heads, and let A 3 be the event that the two coins are different. Then,
Pr { A 1} = 1/2,
Pr { A 2} = 1/2,
Pr { A 3} = 1/2,
Pr { A 1 ∩ A 2} = 1/4,
Pr { A 1 ∩ A 3} = 1/4,
Pr { A 2 ∩ A 3} = 1/4,


Pr { A 1 ∩ A 2 ∩ A 3} = 0.
Since Pr {Ai ∩ Aj} = Pr {Ai} Pr {Aj} = 1/4 for 1 ≤ i < j ≤ 3, the events A1, A2, and A3 are pairwise independent. The events are not mutually independent, however, because Pr {A1 ∩ A2 ∩ A3} = 0 and Pr {A1} Pr {A2} Pr {A3} = 1/8 ≠ 0.
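The distinction between pairwise and mutual independence can be confirmed mechanically. A Python sketch of this example (names are my own):

```python
from itertools import product
from fractions import Fraction

S = list(product('HT', repeat=2))   # two fair coins, four equally likely outcomes
p = Fraction(1, 4)

def pr(event):
    return sum(p for s in S if event(s))

A1 = lambda s: s[0] == 'H'          # first coin is heads
A2 = lambda s: s[1] == 'H'          # second coin is heads
A3 = lambda s: s[0] != s[1]         # the two coins differ

# Pairwise independent: each pair satisfies Pr{Ai and Aj} = Pr{Ai} Pr{Aj} = 1/4.
for X, Y in [(A1, A2), (A1, A3), (A2, A3)]:
    assert pr(lambda s: X(s) and Y(s)) == pr(X) * pr(Y) == Fraction(1, 4)

# Not mutually independent: the triple intersection is empty.
assert pr(lambda s: A1(s) and A2(s) and A3(s)) == 0
assert pr(A1) * pr(A2) * pr(A3) == Fraction(1, 8)
```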
Bayes’s theorem
From the definition (C.16) of conditional probability and the commutative law A ∩ B = B ∩ A, it follows that for two events A and B, each with nonzero probability,

Pr {A ∩ B} = Pr {B} Pr {A | B}
           = Pr {A} Pr {B | A}.

Solving for Pr {A | B}, we obtain

Pr {A | B} = Pr {A} Pr {B | A}/Pr {B},     (C.19)

which is known as Bayes's theorem. The denominator Pr {B} is a normalizing constant, which we can reformulate as follows. Since B = (B ∩ A) ∪ (B ∩ Ā), and since B ∩ A and B ∩ Ā are mutually exclusive events,

Pr {B} = Pr {B ∩ A} + Pr {B ∩ Ā}
       = Pr {A} Pr {B | A} + Pr {Ā} Pr {B | Ā}.

Substituting into equation (C.19) produces an equivalent form of Bayes's theorem:

Pr {A | B} = Pr {A} Pr {B | A} / (Pr {A} Pr {B | A} + Pr {Ā} Pr {B | Ā}).
Bayes’s theorem can simplify the computing of conditional
probabilities. For example, suppose that you have a fair coin and a
biased coin that always comes up heads. Run an experiment consisting
of three independent events: choose one of the two coins at random, flip
that coin once, and then flip it again. Suppose that the coin you have

chosen comes up heads both times. What is the probability that it’s the
biased coin?
Bayes’s theorem solves this problem. Let A be the event that you
choose the biased coin, and let B be the event that the chosen coin comes up heads both times. We wish to determine Pr { A | B}, knowing
that Pr {A} = 1/2, Pr {B | A} = 1, Pr {Ā} = 1/2, and Pr {B | Ā} = 1/4. Thus we have

Pr {A | B} = (1/2) · 1 / ((1/2) · 1 + (1/2) · (1/4))
           = 4/5.
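The posterior can be computed directly from the second form of Bayes's theorem. A Python sketch of this coin example (variable names are mine), using exact rationals:

```python
from fractions import Fraction

pr_A      = Fraction(1, 2)   # chose the biased coin
pr_B_A    = Fraction(1)      # biased coin shows two heads for sure
pr_notA   = Fraction(1, 2)   # chose the fair coin
pr_B_notA = Fraction(1, 4)   # fair coin shows HH with probability 1/4

# Bayes's theorem with the normalizing constant expanded.
pr_A_B = (pr_A * pr_B_A) / (pr_A * pr_B_A + pr_notA * pr_B_notA)
print(pr_A_B)   # 4/5
```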
Exercises
C.2-1
Professor Rosencrantz flips a fair coin twice. Professor Guildenstern
flips a fair coin once. What is the probability that Professor Rosencrantz
obtains strictly more heads than Professor Guildenstern?
C.2-2
Prove Boole’s inequality: For any finite or countably infinite sequence of
events A1, A2, …,

Pr {A1 ∪ A2 ∪ ⋯} ≤ Pr {A1} + Pr {A2} + ⋯.
C.2-3
You shuffle a deck of 10 cards, each bearing a distinct number from 1 to
10, in order to mix the cards thoroughly. You then remove three cards,
one at a time, from the deck. What is the probability that the three cards
you select are in sorted (increasing) order?
C.2-4
Prove that
Pr { A | B} + Pr { Ā | B} = 1.
C.2-5
Prove that for any collection of events A1, A2, … , An,

Pr {A1 ∩ A2 ∩ ⋯ ∩ An} = Pr {A1} · Pr {A2 | A1} · Pr {A3 | A1 ∩ A2} ⋯ Pr {An | A1 ∩ A2 ∩ ⋯ ∩ An−1}.
★ C.2-6
Show how to construct a set of n events that are pairwise independent
but such that no subset of k > 2 of them is mutually independent.
★ C.2-7
Two events A and B are conditionally independent, given C, if Pr { A ∩ B | C} = Pr { A | C} · Pr { B | C}.
Give a simple but nontrivial example of two events that are not
independent but are conditionally independent given a third event.
★ C.2-8
Professor Gore teaches a music class on rhythm in which three students
—Jeff, Tim, and Carmine—are in danger of failing. Professor Gore tells
the three that one of them will pass the course and the other two will
fail. Carmine asks Professor Gore privately which of Jeff and Tim will
fail, arguing that since he already knows at least one of them will fail,
the professor won’t be revealing any information about Carmine’s
outcome. In a breach of privacy law, Professor Gore tells Carmine that
Jeff will fail. Carmine feels somewhat relieved now, figuring that either
he or Tim will pass, so that his probability of passing is now 1/2. Is
Carmine correct, or is his chance of passing still 1/3? Explain.
A (discrete) random variable X is a function from a finite or countably infinite sample space S to the real numbers. It associates a real number
with each possible outcome of an experiment, which allows us to work
with the probability distribution induced on the resulting set of
numbers. Random variables can also be defined for uncountably infinite



sample spaces, but they raise technical issues that are unnecessary to
address for our purposes. Therefore we’ll assume that random variables
are discrete.
For a random variable X and a real number x, we define the event X = x to be {s ∈ S : X(s) = x}, and thus

Pr {X = x} = ∑_{s ∈ S : X(s) = x} Pr {s}.

The function

f(x) = Pr {X = x}

is the probability density function of the random variable X. From the probability axioms, Pr {X = x} ≥ 0 and ∑_x Pr {X = x} = 1.
As an example, consider the experiment of rolling a pair of ordinary,
6-sided dice. There are 36 possible outcomes in the sample space.
Assume that the probability distribution is uniform, so that each
outcome s ∈ S is equally likely: Pr { s} = 1/36. Define the random variable X to be the maximum of the two values showing on the dice. We
have Pr { X = 3} = 5/36, since X assigns a value of 3 to 5 of the 36
possible outcomes, namely, (1, 3), (2, 3), (3, 3), (3, 2), and (3, 1).
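This density is easy to tabulate by enumerating all 36 outcomes. A Python sketch (illustrative, not from the text):

```python
from itertools import product
from fractions import Fraction

# Two ordinary 6-sided dice; the uniform distribution over 36 outcomes.
S = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

# X = maximum of the two values showing; tabulate Pr{X = x}.
pr_X = {x: sum(p for s in S if max(s) == x) for x in range(1, 7)}

assert pr_X[3] == Fraction(5, 36)   # the five outcomes listed in the text
assert sum(pr_X.values()) == 1      # the density sums to 1
```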
We can define several random variables on the same sample space. If
X and Y are random variables, the function
f( x, y) = Pr { X = x and Y = y}
is the joint probability density function of X and Y. For a fixed value y,

Pr {Y = y} = ∑_x Pr {X = x and Y = y},

and similarly, for a fixed value x,

Pr {X = x} = ∑_y Pr {X = x and Y = y}.

Using the definition (C.16) of conditional probability on page 1187, we have

Pr {X = x | Y = y} = Pr {X = x and Y = y}/Pr {Y = y}.

We define two random variables X and Y to be independent if for all x and y, the events X = x and Y = y are independent or, equivalently, if for all x and y, we have Pr { X = x and Y = y} = Pr { X = x} Pr { Y = y}.
Given a set of random variables defined over the same sample space,
we can define new random variables as sums, products, or other
functions of the original variables.
Expected value of a random variable
The simplest, and often the most useful, summary of the distribution of
a random variable is the “average” of the values it takes on. The
expected value (or, synonymously, expectation or mean) of a discrete random variable X is

E[X] = ∑_x x · Pr {X = x},

which is well defined if the sum is finite or converges absolutely.
Sometimes the expectation of X is denoted by μX or, when the random
variable is apparent from context, simply by μ.
Consider a game in which you flip two fair coins. You earn $3 for
each head but lose $2 for each tail. The expected value of the random
variable X representing your earnings is
E[ X] = 6 · Pr {2 H’s} + 1 · Pr {1 H, 1 T} − 4 · Pr {2 T’s}
= 6 · (1/4) + 1 · (1/2) − 4 · (1/4)
= 1.
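The same expectation falls out of summing over the four outcomes directly. A Python sketch of this game (names are mine):

```python
from itertools import product
from fractions import Fraction

# Two fair coins: four equally likely outcomes.
S = list(product('HT', repeat=2))
p = Fraction(1, 4)

def X(s):
    # Earnings: $3 per head, minus $2 per tail.
    return 3 * s.count('H') - 2 * s.count('T')

E = sum(p * X(s) for s in S)
print(E)   # 1
```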
Linearity of expectation says that the expectation of the sum of two
random variables is the sum of their expectations, that is,

E[X + Y] = E[X] + E[Y],

whenever E[X] and E[Y] are defined. Linearity of expectation applies to a broad range of situations, holding even when X and Y are not independent. It also extends to finite and absolutely convergent
summations of expectations. Linearity of expectation is the key property
that enables us to perform probabilistic analyses by using indicator
random variables (see Section 5.2).





If X is any random variable, any function g(x) defines a new random variable g(X). If the expectation of g(X) is defined, then

E[g(X)] = ∑_x g(x) Pr {X = x}.

Letting g(x) = ax, we have for any constant a,

E[aX] = a E[X].     (C.25)

Consequently, expectations are linear: for any two random variables X and Y and any constant a,

E[aX + Y] = a E[X] + E[Y].

When two random variables X and Y are independent and each has a defined expectation,

E[XY] = E[X] E[Y].

In general, when n random variables X1, X2, … , Xn are mutually independent,

E[X1 X2 ⋯ Xn] = E[X1] E[X2] ⋯ E[Xn].
When a random variable X takes on values from the set of natural
numbers ℕ = {0, 1, 2, …}, we have a nice formula for its expectation:

E[X] = ∑_{i=1}^{∞} Pr {X ≥ i},

since each term Pr {X ≥ i} is added in i times and subtracted out i − 1 times (except Pr {X ≥ 0}, which is added in 0 times and not subtracted out at all).
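The tail-sum formula can be checked against the direct definition on a concrete distribution. A Python sketch using the max-of-two-dice variable from earlier in this section (the density Pr{X = x} = (2x − 1)/36 is mine, derived from that example):

```python
from fractions import Fraction

# X = max of two fair dice, taking values in {1, ..., 6}.
pdf = {x: Fraction(2 * x - 1, 36) for x in range(1, 7)}

# Direct definition: E[X] = sum over x of x * Pr{X = x}.
E_direct = sum(x * p for x, p in pdf.items())

# Tail-sum formula: E[X] = sum over i >= 1 of Pr{X >= i}.
E_tail = sum(sum(p for x, p in pdf.items() if x >= i) for i in range(1, 7))

assert E_direct == E_tail
```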
A function f(x) is convex if

f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y)

for all x and y and for all 0 ≤ λ ≤ 1. Jensen's inequality says that when a convex function f(x) is applied to a random variable X,

E[f(X)] ≥ f(E[X]),

provided that the expectations exist and are finite.
Variance and standard deviation
The expected value of a random variable does not express how “spread
out” the variable’s values are. For example, consider random variables X
and Y for which Pr { X = 1/4} = Pr { X = 3/4} = 1/2 and Pr { Y = 0} = Pr
{ Y = 1} = 1/2. Then both E[ X] and E[ Y] are 1/2, yet the actual values taken on by Y are further from the mean than the actual values taken
on by X.
The notion of variance mathematically expresses how far from the mean a random variable's values are likely to be. The variance of a random variable X with mean E[X] is

Var[X] = E[(X − E[X])^2]
       = E[X^2 − 2X E[X] + E^2[X]]
       = E[X^2] − 2 E[X E[X]] + E[E^2[X]]
       = E[X^2] − 2 E^2[X] + E^2[X]
       = E[X^2] − E^2[X].     (C.31)

To justify the equation E[E^2[X]] = E^2[X], note that because E[X] is a real number and not a random variable, so is E^2[X]. The equation E[X E[X]] = E^2[X] follows from equation (C.25), with a = E[X]. Rewriting equation (C.31) yields an expression for the expectation of the square of a random variable:

E[X^2] = Var[X] + E^2[X].
The variance of a random variable X and the variance of aX are related (see Exercise C.3-10):
Var[ aX] = a 2Var[ X].
When X and Y are independent random variables,
Var[ X + Y] = Var[ X] + Var[ Y].
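Both variance identities can be verified by exhaustive computation over a small joint sample space. A Python sketch using two independent fair dice (setup and names are mine):

```python
from itertools import product
from fractions import Fraction

# Two independent fair dice; X and Y are the individual values shown.
S = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(f):
    return sum(p * f(s) for s in S)

def Var(f):
    # Var[f] = E[(f - E[f])^2], per the definition above.
    mu = E(f)
    return E(lambda s: (f(s) - mu) ** 2)

X = lambda s: s[0]
Y = lambda s: s[1]

# Var[aX] = a^2 Var[X], with a = 3.
assert Var(lambda s: 3 * X(s)) == 9 * Var(X)

# X and Y are independent, so Var[X + Y] = Var[X] + Var[Y].
assert Var(lambda s: X(s) + Y(s)) == Var(X) + Var(Y)
```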