pointer, so that a pointer to the array is passed, rather than the
entire array, and changes to individual array elements are visible
to the calling procedure. Again, most contemporary programming
languages work this way.
A return statement immediately transfers control back to the
point of call in the calling procedure. Most return statements also
take a value to pass back to the caller. Our pseudocode differs
from many programming languages in that we allow multiple
values to be returned in a single return statement without having
to create objects to package them together.8
The boolean operators “and” and “or” are short circuiting. That
is, we evaluate the expression “x and y” by first evaluating x. If x
evaluates to FALSE, then the entire expression cannot evaluate to
TRUE, and therefore we do not evaluate y. If, on the other hand, x
evaluates to TRUE, we must evaluate y to determine the value of
the entire expression. Similarly, in the expression “x or y,” we
evaluate y only if x evaluates to FALSE. Short-circuiting operators
allow us to write boolean expressions such as “x ≠ NIL and x.f = y”
without worrying about what happens when we evaluate x.f when x is NIL.
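The guarded expression above can be mimicked directly in Python, whose `and` also short-circuits; the `Node` class here is a hypothetical stand-in for any object with a field f:

```python
class Node:
    """Hypothetical node with a single field f, standing in for x in the text."""
    def __init__(self, f):
        self.f = f

def f_equals(x, y):
    # Mirrors the pseudocode test "x != NIL and x.f = y": because "and"
    # short-circuits, x.f is never evaluated when x is None.
    return x is not None and x.f == y
```

Calling `f_equals(None, 5)` returns FALSE without ever touching `x.f`, which is exactly the behavior the text relies on.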
The keyword error indicates that an error occurred because
conditions were wrong for the procedure to have been called, and
the procedure immediately terminates. The calling procedure is
responsible for handling the error, and so we do not specify what
action to take.
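In a real programming language, the error keyword corresponds most naturally to raising an exception, which likewise terminates the procedure immediately and leaves handling to the caller. A minimal Python sketch (the POP procedure and its “stack underflow” error are illustrative, not from the text):

```python
def pop(stack):
    # The pseudocode keyword "error" corresponds to raising an exception:
    # the procedure stops immediately, and the caller decides what to do.
    if len(stack) == 0:
        raise RuntimeError("stack underflow")   # error "stack underflow"
    return stack.pop()
```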
Exercises
2.1-1
Using Figure 2.2 as a model, illustrate the operation of INSERTION-SORT on an array initially containing the sequence 〈31, 41, 59, 26, 41, 58〉.
2.1-2
Consider the procedure SUM-ARRAY on the facing page. It computes
the sum of the n numbers in array A[1 : n]. State a loop invariant for this procedure, and use its initialization, maintenance, and termination
properties to show that the SUM-ARRAY procedure returns the sum of
the numbers in A[1 : n].


SUM-ARRAY(A, n)
1  sum = 0
2  for i = 1 to n
3      sum = sum + A[i]
4  return sum
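For readers who want to experiment, here is a direct Python transcription of SUM-ARRAY (a sketch only; Python lists are 0-indexed, so A[1 : n] in the pseudocode corresponds to indices 0 through n − 1 here):

```python
def sum_array(A, n):
    # Transcription of SUM-ARRAY(A, n) into 0-indexed Python.
    total = 0                 # line 1: sum = 0
    for i in range(n):        # line 2: for i = 1 to n
        total = total + A[i]  # line 3: sum = sum + A[i]
    return total              # line 4: return sum
```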
2.1-3
Rewrite the INSERTION-SORT procedure to sort into monotonically
decreasing instead of monotonically increasing order.
2.1-4
Consider the searching problem:
Input: A sequence of n numbers 〈a1, a2, … , an〉 stored in array A[1 : n] and a value x.
Output: An index i such that x equals A[i] or the special value NIL if x does not appear in A.
Write pseudocode for linear search, which scans through the array
from beginning to end, looking for x. Using a loop invariant, prove that
your algorithm is correct. Make sure that your loop invariant fulfills the
three necessary properties.
2.1-5
Consider the problem of adding two n-bit binary integers a and b, stored in two n-element arrays A[0 : n − 1] and B[0 : n − 1], where each element is either 0 or 1, a = Σ_{i=0}^{n−1} A[i] · 2^i, and b = Σ_{i=0}^{n−1} B[i] · 2^i. The sum c = a + b of the two integers should be stored in binary form in an (n + 1)-element array C[0 : n], where c = Σ_{i=0}^{n} C[i] · 2^i. Write a procedure ADD-BINARY-INTEGERS that takes as input arrays A and B, along with the length n, and returns array C holding the sum.
2.2 Analyzing algorithms
Analyzing an algorithm has come to mean predicting the resources that the algorithm requires. You might consider resources such as memory,
communication bandwidth, or energy consumption. Most often,
however, you’ll want to measure computational time. If you analyze
several candidate algorithms for a problem, you can identify the most
efficient one. There might be more than just one viable candidate, but
you can often rule out several inferior algorithms in the process.
Before you can analyze an algorithm, you need a model of the
technology that it runs on, including the resources of that technology
and a way to express their costs. Most of this book assumes a generic
one-processor, random-access machine (RAM) model of computation
as the implementation technology, with the understanding that
algorithms are implemented as computer programs. In the RAM model,
instructions execute one after another, with no concurrent operations.
The RAM model assumes that each instruction takes the same amount
of time as any other instruction and that each data access—using the
value of a variable or storing into a variable—takes the same amount of
time as any other data access. In other words, in the RAM model each
instruction or data access takes a constant amount of time—even
indexing into an array. 9
Strictly speaking, we should precisely define the instructions of the
RAM model and their costs. To do so, however, would be tedious and
yield little insight into algorithm design and analysis. Yet we must be
careful not to abuse the RAM model. For example, what if a RAM had
an instruction that sorts? Then you could sort in just one step. Such a
RAM would be unrealistic, since such instructions do not appear in real
computers. Our guide, therefore, is how real computers are designed.
The RAM model contains instructions commonly found in real
computers: arithmetic (such as add, subtract, multiply, divide,
remainder, floor, ceiling), data movement (load, store, copy), and
control (conditional and unconditional branch, subroutine call and
return).
The data types in the RAM model are integer, floating point (for
storing real-number approximations), and character. Real computers do
not usually have a separate data type for the boolean values TRUE and
FALSE. Instead, they often test whether an integer value is 0 (FALSE)
or nonzero (TRUE), as in C. Although we typically do not concern
ourselves with precision for floating-point values in this book (many
numbers cannot be represented exactly in floating point), precision is
crucial for most applications. We also assume that each word of data
has a limit on the number of bits. For example, when working with
inputs of size n, we typically assume that integers are represented by c
log2 n bits for some constant c ≥ 1. We require c ≥ 1 so that each word can hold the value of n, enabling us to index the individual input
elements, and we restrict c to be a constant so that the word size does
not grow arbitrarily. (If the word size could grow arbitrarily, we could
store huge amounts of data in one word and operate on it all in
constant time—an unrealistic scenario.)
Real computers contain instructions not listed above, and such
instructions represent a gray area in the RAM model. For example, is
exponentiation a constant-time instruction? In the general case, no: to
compute x^n when x and n are general integers typically takes time logarithmic in n (see equation (31.34) on page 934), and you must worry
about whether the result fits into a computer word. If n is an exact power of 2, however, exponentiation can usually be viewed as a
constant-time operation. Many computers have a “shift left”
instruction, which in constant time shifts the bits of an integer by n
positions to the left. In most computers, shifting the bits of an integer
by 1 position to the left is equivalent to multiplying by 2, so that shifting
the bits by n positions to the left is equivalent to multiplying by 2^n.
Therefore, such computers can compute 2^n in 1 constant-time
instruction by shifting the integer 1 by n positions to the left, as long as
n is no more than the number of bits in a computer word. We’ll try to
avoid such gray areas in the RAM model and treat computing 2^n and
multiplying by 2^n as constant-time operations when the result is small
enough to fit in a computer word.
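The shift trick is easy to try in Python, where the same idea is written 1 << n (a sketch; Python integers are unbounded, so the word-size caveat in the text does not arise here):

```python
def two_to_the(n):
    # On machines with a "shift left" instruction, 2 to the power n is
    # computed by shifting the integer 1 left by n bit positions.
    return 1 << n
```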
The RAM model does not account for the memory hierarchy that is
common in contemporary computers. It models neither caches nor
virtual memory. Several other computational models attempt to
account for memory-hierarchy effects, which are sometimes significant
in real programs on real machines. Section 11.5 and a handful of problems in this book examine memory-hierarchy effects, but for the
most part, the analyses in this book do not consider them. Models that
include the memory hierarchy are quite a bit more complex than the
RAM model, and so they can be difficult to work with. Moreover,
RAM-model analyses are usually excellent predictors of performance
on actual machines.
Although it is often straightforward to analyze an algorithm in the
RAM model, sometimes it can be quite a challenge. You might need to
employ mathematical tools such as combinatorics, probability theory,
algebraic dexterity, and the ability to identify the most significant terms
in a formula. Because an algorithm might behave differently for each
possible input, we need a means for summarizing that behavior in
simple, easily understood formulas.
Analysis of insertion sort
How long does the INSERTION-SORT procedure take? One way to tell
would be for you to run it on your computer and time how long it takes
to run. Of course, you’d first have to implement it in a real programming
language, since you cannot run our pseudocode directly. What would
such a timing test tell you? You would find out how long insertion sort
takes to run on your particular computer, on that particular input,
under the particular implementation that you created, with the
particular compiler or interpreter that you ran, with the particular
libraries that you linked in, and with the particular background tasks
that were running on your computer concurrently with your timing test
(such as checking for incoming information over a network). If you run
insertion sort again on your computer with the same input, you might
even get a different timing result. From running just one
implementation of insertion sort on just one computer and on just one
input, what would you be able to determine about insertion sort’s
running time if you were to give it a different input, if you were to run it
on a different computer, or if you were to implement it in a different
programming language? Not much. We need a way to predict, given a
new input, how long insertion sort will take.
Instead of timing a run, or even several runs, of insertion sort, we
can determine how long it takes by analyzing the algorithm itself. We’ll
examine how many times it executes each line of pseudocode and how
long each line of pseudocode takes to run. We’ll first come up with a
precise but complicated formula for the running time. Then, we’ll distill
the important part of the formula using a convenient notation that can
help us compare the running times of different algorithms for the same
problem.
How do we analyze insertion sort? First, let’s acknowledge that the
running time depends on the input. You shouldn’t be terribly surprised
that sorting a thousand numbers takes longer than sorting three
numbers. Moreover, insertion sort can take different amounts of time to
sort two input arrays of the same size, depending on how nearly sorted
they already are. Even though the running time can depend on many
features of the input, we’ll focus on the one that has been shown to have
the greatest effect, namely the size of the input, and describe the
running time of a program as a function of the size of its input. To do
so, we need to define the terms “running time” and “input size” more
carefully. We also need to be clear about whether we are discussing the
running time for an input that elicits the worst-case behavior, the best-
case behavior, or some other case.
The best notion for input size depends on the problem being studied.
For many problems, such as sorting or computing discrete Fourier
transforms, the most natural measure is the number of items in the input
—for example, the number n of items being sorted. For many other
problems, such as multiplying two integers, the best measure of input
size is the total number of bits needed to represent the input in ordinary
binary notation. Sometimes it is more appropriate to describe the size of
the input with more than just one number. For example, if the input to
an algorithm is a graph, we usually characterize the input size by both
the number of vertices and the number of edges in the graph. We’ll
indicate which input size measure is being used with each problem we
study.
The running time of an algorithm on a particular input is the number of instructions and data accesses executed. How we account for these
costs should be independent of any particular computer, but within the
framework of the RAM model. For the moment, let us adopt the
following view. A constant amount of time is required to execute each
line of our pseudocode. One line might take more or less time than
another line, but we’ll assume that each execution of the k th line takes
ck time, where ck is a constant. This viewpoint is in keeping with the
RAM model, and it also reflects how the pseudocode would be
implemented on most actual computers. 10
Let’s analyze the INSERTION-SORT procedure. As promised, we’ll
start by devising a precise formula that uses the input size and all the
statement costs ck. This formula turns out to be messy, however. We’ll
then switch to a simpler notation that is more concise and easier to use.
This simpler notation makes clear how to compare the running times of
algorithms, especially as the size of the input increases.
To analyze the INSERTION-SORT procedure, let’s view it on the
following page with the time cost of each statement and the number of
times each statement is executed. For each i = 2, 3, … , n, let ti denote the number of times the while loop test in line 5 is executed for that
value of i. When a for or while loop exits in the usual way—because the
test in the loop header comes up FALSE—the test is executed one time
more than the loop body. Because comments are not executable
statements, assume that they take no time.
The running time of the algorithm is the sum of running times for
each statement executed. A statement that takes ck steps to execute and
executes m times contributes ckm to the total running time. 11 We usually denote the running time of an algorithm on an input of size n by
T ( n). To compute T ( n), the running time of INSERTION-SORT on an input of n values, we sum the products of the cost and times columns, obtaining
INSERTION-SORT(A, n)                                          cost  times
1  for i = 2 to n                                             c1    n
2      key = A[i]                                             c2    n − 1
3      // Insert A[i] into the sorted subarray A[1 : i − 1].  0     n − 1
4      j = i − 1                                              c4    n − 1
5      while j > 0 and A[j] > key                             c5    Σ_{i=2}^{n} ti
6          A[j + 1] = A[j]                                    c6    Σ_{i=2}^{n} (ti − 1)
7          j = j − 1                                          c7    Σ_{i=2}^{n} (ti − 1)
8  A[j + 1] = key                                             c8    n − 1

T(n) = c1 n + c2(n − 1) + c4(n − 1) + c5 Σ_{i=2}^{n} ti + c6 Σ_{i=2}^{n} (ti − 1) + c7 Σ_{i=2}^{n} (ti − 1) + c8(n − 1).
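To make the quantities ti concrete, here is a Python sketch of INSERTION-SORT instrumented to count, for each i, how many times the while-loop test executes (0-indexed, so pseudocode index i = 2, …, n becomes Python index 1, …, n − 1):

```python
def insertion_sort_with_counts(A):
    # Sorts A in place and returns the list of ti values: the number of
    # times the while-loop test on line 5 executes for each value of i.
    t = []
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        tests = 1                      # the final test, which fails
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j = j - 1
            tests += 1                 # one more test per loop iteration
        A[j + 1] = key
        t.append(tests)
    return t
```

On an already sorted array every ti is 1 (the best case); on a reverse-sorted array ti equals i (the worst case), matching the analysis below.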
Even for inputs of a given size, an algorithm’s running time may
depend on which input of that size is given. For example, in
INSERTION-SORT, the best case occurs when the array is already
sorted. In this case, each time that line 5 executes, the value of key—the
value originally in A[i]—is already greater than or equal to all values in A[1 : i − 1], so that the while loop of lines 5–7 always exits upon the first test in line 5. Therefore, we have that ti = 1 for i = 2, 3, … , n, and the best-case running time is given by

T(n) = c1 n + c2(n − 1) + c4(n − 1) + c5(n − 1) + c8(n − 1).    (2.1)

We can express this running time as an + b for constants a and b that depend on the statement costs ck (where a = c1 + c2 + c4 + c5 + c8 and b = −(c2 + c4 + c5 + c8)). The running time is thus a linear function of n.
The worst case arises when the array is in reverse sorted order—that
is, it starts out in decreasing order. The procedure must compare each
element A[ i] with each element in the entire sorted subarray A[1 : i – 1], and so ti = i for i = 2, 3, … , n. (The procedure finds that A[ j] > key


every time in line 5, and the while loop exits only when j reaches 0.) Noting that

Σ_{i=2}^{n} i = n(n + 1)/2 − 1

and

Σ_{i=2}^{n} (i − 1) = n(n − 1)/2,

we find that in the worst case, the running time of INSERTION-SORT is

T(n) = c1 n + c2(n − 1) + c4(n − 1) + c5(n(n + 1)/2 − 1) + c6(n(n − 1)/2) + c7(n(n − 1)/2) + c8(n − 1).    (2.2)
We can express this worst-case running time as an² + bn + c for constants a, b, and c that again depend on the statement costs ck (now, a = c5/2 + c6/2 + c7/2, b = c1 + c2 + c4 + c5/2 − c6/2 − c7/2 + c8, and c = −(c2 + c4 + c5 + c8)). The running time is thus a quadratic function of n. Typically, as in insertion sort, the running time of an algorithm is
fixed for a given input, although we’ll also see some interesting
“randomized” algorithms whose behavior can vary even for a fixed
input.
Worst-case and average-case analysis
Our analysis of insertion sort looked at both the best case, in which the input array was already sorted, and the worst case, in which the input
array was reverse sorted. For the remainder of this book, though, we’ll
usually (but not always) concentrate on finding only the worst-case
running time, that is, the longest running time for any input of size n.
Why? Here are three reasons:
The worst-case running time of an algorithm gives an upper
bound on the running time for any input. If you know it, then you
have a guarantee that the algorithm never takes any longer. You
need not make some educated guess about the running time and
hope that it never gets much worse. This feature is especially
important for real-time computing, in which operations must
complete by a deadline.
For some algorithms, the worst case occurs fairly often. For
example, in searching a database for a particular piece of
information, the searching algorithm’s worst case often occurs
when the information is not present in the database. In some
applications, searches for absent information may be frequent.
The “average case” is often roughly as bad as the worst case.
Suppose that you run insertion sort on an array of n randomly
chosen numbers. How long does it take to determine where in
subarray A[1 : i – 1] to insert element A[ i]? On average, half the elements in A[1 : i – 1] are less than A[ i], and half the elements are greater. On average, therefore, A[ i] is compared with just half of
the subarray A[1 : i – 1], and so ti is about i/2. The resulting average-case running time turns out to be a quadratic function of
the input size, just like the worst-case running time.
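The claim that ti is about i/2 on average can be checked empirically. The following Python sketch counts while-loop tests over random permutations and compares the average with n²/4 (the array size, number of trials, and seed are arbitrary choices):

```python
import random

def insertion_sort_comparisons(A):
    # Sorts A in place, returning how many times the while-loop test
    # "A[j] > key" executes over the whole sort (the sum of the ti).
    comparisons = 0
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        comparisons += 1               # the final, failing test
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j = j - 1
            comparisons += 1
        A[j + 1] = key
    return comparisons

random.seed(1)
n = 400
trials = [insertion_sort_comparisons(random.sample(range(10 * n), n))
          for _ in range(20)]
average = sum(trials) / len(trials)
# If ti is about i/2 on average, the total should be near
# sum(i/2 for i = 2..n), i.e., roughly n**2 / 4.
```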
In some particular cases, we’ll be interested in the average-case
running time of an algorithm. We’ll see the technique of probabilistic
analysis applied to various algorithms throughout this book. The scope
of average-case analysis is limited, because it may not be apparent what
constitutes an “average” input for a particular problem. Often, we’ll
assume that all inputs of a given size are equally likely. In practice, this
assumption may be violated, but we can sometimes use a randomized
algorithm, which makes random choices, to allow a probabilistic analysis and yield an expected running time. We explore randomized
algorithms more in Chapter 5 and in several other subsequent chapters.
Order of growth
In order to ease our analysis of the INSERTION-SORT procedure, we
used some simplifying abstractions. First, we ignored the actual cost of
each statement, using the constants ck to represent these costs. Still, the
best-case and worst-case running times in equations (2.1) and (2.2) are
rather unwieldy. The constants in these expressions give us more detail
than we really need. That’s why we also expressed the best-case running
time as an + b for constants a and b that depend on the statement costs ck and why we expressed the worst-case running time as an 2 + bn + c for constants a, b, and c that depend on the statement costs. We thus ignored not only the actual statement costs, but also the abstract costs
ck.
Let’s now make one more simplifying abstraction: it is the rate of
growth, or order of growth, of the running time that really interests us.
We therefore consider only the leading term of a formula (e.g., an²),
since the lower-order terms are relatively insignificant for large values of
n. We also ignore the leading term’s constant coefficient, since constant
factors are less significant than the rate of growth in determining
computational efficiency for large inputs. For insertion sort’s worst-case
running time, when we ignore the lower-order terms and the leading
term’s constant coefficient, only the factor of n² from the leading term
remains. That factor, n², is by far the most important part of the
running time. For example, suppose that an algorithm implemented on
a particular machine takes n²/100 + 100n + 17 microseconds on an
input of size n. Although the coefficients of 1/100 for the n² term and
100 for the n term differ by four orders of magnitude, the n²/100 term
dominates the 100n term once n exceeds 10,000. Although 10,000 might
seem large, it is smaller than the population of an average town. Many
real-world problems have much larger input sizes.
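A quick arithmetic check of this claim, in Python (the function names are ours, and the running time is the hypothetical one from the text):

```python
def running_time_us(n):
    # Hypothetical running time from the text, in microseconds.
    return n * n / 100 + 100 * n + 17

def quadratic_term(n):
    return n * n / 100

def linear_term(n):
    return 100 * n

# The two terms balance exactly at n = 10,000; beyond that point the
# n**2/100 term dominates despite its tiny coefficient.
```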
To highlight the order of growth of the running time, we have a special notation that uses the Greek letter Θ (theta). We write that insertion sort has a worst-case running time of Θ(n²) (pronounced “theta of n-squared” or just “theta n-squared”). We also write that insertion sort has a best-case running time of Θ(n) (“theta of n” or “theta n”). For now, think of Θ-notation as saying “roughly proportional when n is large,” so that Θ(n²) means “roughly proportional to n² when n is large” and Θ(n) means “roughly proportional to n when n is large.” We’ll use Θ-notation informally in this chapter and define it precisely in Chapter 3.
We usually consider one algorithm to be more efficient than another
if its worst-case running time has a lower order of growth. Due to
constant factors and lower-order terms, an algorithm whose running
time has a higher order of growth might take less time for small inputs
than an algorithm whose running time has a lower order of growth. But
on large enough inputs, an algorithm whose worst-case running time is
Θ(n²), for example, takes less time in the worst case than an algorithm
whose worst-case running time is Θ(n³). Regardless of the constants
hidden by the Θ-notation, there is always some number, say n0, such that for all input sizes n ≥ n0, the Θ(n²) algorithm beats the Θ(n³) algorithm in the worst case.
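To see such a crossover point n0 concretely, consider hypothetical costs 1000n² and n³ (the constants here are invented for illustration, deliberately stacking the deck against the quadratic algorithm):

```python
def t_quadratic(n):
    # Hypothetical quadratic worst case with a large constant factor.
    return 1000 * n * n

def t_cubic(n):
    # Hypothetical cubic worst case with constant factor 1.
    return n ** 3

# The two costs are equal at n = 1000; for every n >= n0 = 1001 the
# quadratic algorithm is faster, no matter how large its constant.
n0 = 1001
```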
Exercises
2.2-1
Express the function n³/1000 + 100n² − 100n + 3 in terms of Θ-notation.
2.2-2
Consider sorting n numbers stored in array A[1 : n] by first finding the smallest element of A[1 : n] and exchanging it with the element in A[1].
Then find the smallest element of A[2 : n], and exchange it with A[2].
Then find the smallest element of A[3 : n], and exchange it with A[3].
Continue in this manner for the first n – 1 elements of A. Write
pseudocode for this algorithm, which is known as selection sort. What loop invariant does this algorithm maintain? Why does it need to run
for only the first n – 1 elements, rather than for all n elements? Give the worst-case running time of selection sort in Θ-notation. Is the best-case
running time any better?
2.2-3
Consider linear search again (see Exercise 2.1-4). How many elements of
the input array need to be checked on the average, assuming that the
element being searched for is equally likely to be any element in the
array? How about in the worst case? Using Θ-notation, give the average-
case and worst-case running times of linear search. Justify your answers.
2.2-4
How can you modify any sorting algorithm to have a good best-case
running time?
2.3 Designing algorithms
You can choose from a wide range of algorithm design techniques.
Insertion sort uses the incremental method: for each element A[ i], insert it into its proper place in the subarray A[1 : i], having already sorted the subarray A[1 : i – 1].
This section examines another design method, known as “divide-
and-conquer,” which we explore in more detail in Chapter 4. We’ll use divide-and-conquer to design a sorting algorithm whose worst-case
running time is much less than that of insertion sort. One advantage of
using an algorithm that follows the divide-and-conquer method is that
analyzing its running time is often straightforward, using techniques
that we’ll explore in Chapter 4.
2.3.1 The divide-and-conquer method
Many useful algorithms are recursive in structure: to solve a given
problem, they recurse (call themselves) one or more times to handle
closely related subproblems. These algorithms typically follow the
divide-and-conquer method: they break the problem into several subproblems that are similar to the original problem but smaller in size,
solve the subproblems recursively, and then combine these solutions to
create a solution to the original problem.
In the divide-and-conquer method, if the problem is small enough—
the base case—you just solve it directly without recursing. Otherwise—
the recursive case—you perform three characteristic steps:
Divide the problem into one or more subproblems that are smaller
instances of the same problem.
Conquer the subproblems by solving them recursively.
Combine the subproblem solutions to form a solution to the original
problem.
The merge sort algorithm closely follows the divide-and-conquer
method. In each step, it sorts a subarray A[ p : r], starting with the entire array A[1 : n] and recursing down to smaller and smaller subarrays.
Here is how merge sort operates:
Divide the subarray A[ p : r] to be sorted into two adjacent subarrays, each of half the size. To do so, compute the midpoint q of A[ p : r]
(taking the average of p and r), and divide A[ p : r] into subarrays A[ p : q] and A[ q + 1 : r].
Conquer by sorting each of the two subarrays A[ p : q] and A[ q + 1 : r]
recursively using merge sort.
Combine by merging the two sorted subarrays A[ p : q] and A[ q + 1 : r]
back into A[ p : r], producing the sorted answer.
The recursion “bottoms out”—it reaches the base case—when the
subarray A[ p : r] to be sorted has just 1 element, that is, when p equals r.
As we noted in the initialization argument for INSERTION-SORT’s
loop invariant, a subarray comprising just a single element is always
sorted.
The key operation of the merge sort algorithm occurs in the
“combine” step, which merges two adjacent, sorted subarrays. The
merge operation is performed by the auxiliary procedure MERGE( A, p,
q, r) on the following page, where A is an array and p, q, and r are indices into the array such that p ≤ q < r. The procedure assumes that the adjacent subarrays A[ p : q] and A[ q + 1 : r] were already recursively sorted. It merges the two sorted subarrays to form a single sorted
subarray that replaces the current subarray A[ p : r].
To understand how the MERGE procedure works, let’s return to our
card-playing motif. Suppose that you have two piles of cards face up on
a table. Each pile is sorted, with the smallest-value cards on top. You
wish to merge the two piles into a single sorted output pile, which is to
be face down on the table. The basic step consists of choosing the
smaller of the two cards on top of the face-up piles, removing it from its
pile—which exposes a new top card—and placing this card face down
onto the output pile. Repeat this step until one input pile is empty, at
which time you can just take the remaining input pile and flip over the
entire pile, placing it face down onto the output pile.
Let’s think about how long it takes to merge two sorted piles of
cards. Each basic step takes constant time, since you are comparing just
the two top cards. If the two sorted piles that you start with each have
n/2 cards, then the number of basic steps is at least n/2 (since in whichever pile was emptied, every card was found to be smaller than
some card from the other pile) and at most n (actually, at most n – 1,
since after n – 1 basic steps, one of the piles must be empty). With each
basic step taking constant time and the total number of basic steps
being between n/2 and n, we can say that merging takes time roughly proportional to n. That is, merging takes Θ( n) time.
In detail, the MERGE procedure works as follows. It copies the two
subarrays A[ p : q] and A[ q + 1 : r] into temporary arrays L and R (“left”
and “right”), and then it merges the values in L and R back into A[ p : r].
Lines 1 and 2 compute the lengths nL and nR of the subarrays A[ p : q]
and A[ q + 1 : r], respectively. Then line 3 creates arrays L[0 : nL – 1] and R[0 : nR – 1] with respective lengths nL and nR. 12 The for loop of lines 4–5 copies the subarray A[ p : q] into L, and the for loop of lines 6–7
copies the subarray A[ q + 1 : r] into R.
MERGE(A, p, q, r)
 1  nL = q − p + 1        // length of A[p : q]
 2  nR = r − q            // length of A[q + 1 : r]
 3  let L[0 : nL − 1] and R[0 : nR − 1] be new arrays
 4  for i = 0 to nL − 1   // copy A[p : q] into L[0 : nL − 1]
 5      L[i] = A[p + i]
 6  for j = 0 to nR − 1   // copy A[q + 1 : r] into R[0 : nR − 1]
 7      R[j] = A[q + j + 1]
 8  i = 0                 // i indexes the smallest remaining element in L
 9  j = 0                 // j indexes the smallest remaining element in R
10  k = p                 // k indexes the location in A to fill
11  // As long as each of the arrays L and R contains an unmerged element,
    //   copy the smallest unmerged element back into A[p : r].
12  while i < nL and j < nR
13      if L[i] ≤ R[j]
14          A[k] = L[i]
15          i = i + 1
16      else A[k] = R[j]
17          j = j + 1
18      k = k + 1
19  // Having gone through one of L and R entirely, copy the
    //   remainder of the other to the end of A[p : r].
20  while i < nL
21      A[k] = L[i]
22      i = i + 1
23      k = k + 1
24  while j < nR
25      A[k] = R[j]
26      j = j + 1
27      k = k + 1
Lines 8–18, illustrated in Figure 2.3, perform the basic steps. The while loop of lines 12–18 repeatedly identifies the smallest value in L
and R that has yet to be copied back into A[ p : r] and copies it back in.
As the comments indicate, the index k gives the position of A that is being filled in, and the indices i and j give the positions in L and R, respectively, of the smallest remaining values. Eventually, either all of L
or all of R is copied back into A[ p : r], and this loop terminates. If the loop terminates because all of R has been copied back, that is, because j
equals nR, then i is still less than nL, so that some of L has yet to be copied back, and these values are the greatest in both L and R. In this case, the while loop of lines 20–23 copies these remaining values of L
into the last few positions of A[ p : r]. Because j equals nR, the while loop of lines 24–27 iterates 0 times. If instead the while loop of lines 12–18
terminates because i equals nL, then all of L has already been copied back into A[ p : r], and the while loop of lines 24–27 copies the remaining values of R back into the end of A[ p : r].
Figure 2.3 The operation of the while loop in lines 8–18 in the call MERGE( A, 9, 12, 16), when the subarray A[9 : 16] contains the values 〈2, 4, 6, 7, 1, 2, 3, 5〉. After allocating and copying into the arrays L and R, the array L contains 〈2, 4, 6, 7〉, and the array R contains 〈1, 2, 3, 5〉. Tan positions in A contain their final values, and tan positions in L and R contain values that have yet to be copied back into A. Taken together, the tan positions always comprise the values originally in A[9 : 16]. Blue positions in A contain values that will be copied over, and dark positions in L and R contain values that have already been copied back into A. (a)–(g) The arrays A, L, and R, and their respective indices k, i, and j prior to each iteration of the loop of lines 12–18. At the point in part (g), all values in R have been copied back into A (indicated by j equaling the length of R), and so the while loop in lines 12–18 terminates. (h) The arrays and indices at termination. The while loops of lines 20–23 and 24–27 copied back into A the remaining values in L and R, which are the largest values originally in A[9 : 16]. Here, lines 20–
23 copied L[2 : 3] into A[15 : 16], and because all values in R had already been copied back into A, the while loop of lines 24–27 iterated 0 times. At this point, the subarray in A[9 : 16] is sorted.
To see that the MERGE procedure runs in Θ( n) time, where n = r – p
+ 1,13 observe that each of lines 1–3 and 8–10 takes constant time, and the for loops of lines 4–7 take Θ( nL + nR) = Θ( n) time. 14 To account for the three while loops of lines 12–18, 20–23, and 24–27, observe that each
iteration of these loops copies exactly one value from L or R back into A and that every value is copied back into A exactly once. Therefore, these three loops together make a total of n iterations. Since each
iteration of each of the three loops takes constant time, the total time
spent in these three loops is Θ( n).
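As a cross-check on the pseudocode, here is a direct Python transcription of MERGE (a sketch; the bounds p, q, r are inclusive, exactly as in the text, so they carry over unchanged as list indices):

```python
def merge(A, p, q, r):
    # Merges the sorted subarrays A[p..q] and A[q+1..r] (inclusive
    # bounds) into a single sorted subarray A[p..r], in place.
    nL = q - p + 1                          # length of A[p..q]
    nR = r - q                              # length of A[q+1..r]
    L = [A[p + i] for i in range(nL)]       # copy A[p..q] into L
    R = [A[q + 1 + j] for j in range(nR)]   # copy A[q+1..r] into R
    i = j = 0
    k = p
    # While both L and R hold unmerged elements, copy the smaller back.
    while i < nL and j < nR:
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
        k += 1
    # Copy the remainder of whichever array was not exhausted.
    while i < nL:
        A[k] = L[i]
        i += 1
        k += 1
    while j < nR:
        A[k] = R[j]
        j += 1
        k += 1
```

Running it on the values from Figure 2.3 (as a 0-indexed list) reproduces the sorted result the figure shows.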
We can now use the MERGE procedure as a subroutine in the merge
sort algorithm. The procedure MERGE-SORT( A, p, r) on the facing page sorts the elements in the subarray A[ p : r]. If p equals r, the subarray has just 1 element and is therefore already sorted. Otherwise,
we must have p < r, and MERGE-SORT runs the divide, conquer, and
combine steps. The divide step simply computes an index q that
partitions A[p : r] into two adjacent subarrays: A[p : q], containing ⌈n/2⌉ elements, and A[q + 1 : r], containing ⌊n/2⌋ elements.¹⁵ The initial call MERGE-SORT(A, 1, n) sorts the entire array A[1 : n].
Figure 2.4 illustrates the operation of the procedure for n = 8, showing also the sequence of divide and merge steps. The algorithm
recursively divides the array down to 1-element subarrays. The combine steps merge pairs of 1-element subarrays to form sorted subarrays of length 2, merge those to form sorted subarrays of length 4, and merge those to form the final sorted subarray of length 8. If n is not an exact
power of 2, then some divide steps create subarrays whose lengths differ
by 1. (For example, when dividing a subarray of length 7, one subarray
has length 4 and the other has length 3.) Regardless of the lengths of the
two subarrays being merged, the time to merge a total of n items is Θ( n).
MERGE-SORT(A, p, r)
1  if p ≥ r                      // zero or one element?
2      return
3  q = ⌊(p + r)/2⌋               // midpoint of A[p : r]
4  MERGE-SORT(A, p, q)           // recursively sort A[p : q]
5  MERGE-SORT(A, q + 1, r)       // recursively sort A[q + 1 : r]
6  // Merge A[p : q] and A[q + 1 : r] into A[p : r].
7  MERGE(A, p, q, r)
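For readers following along in Python, here is one possible transcription of the MERGE and MERGE-SORT procedures. It is a sketch, not the book's code: it uses 0-origin, inclusive indices p..r, so the initial call is merge_sort(A, 0, n − 1) rather than MERGE-SORT(A, 1, n), and the function names are ours.

```python
def merge(A, p, q, r):
    """Merge the sorted runs A[p..q] and A[q+1..r] (0-origin, inclusive)."""
    L = A[p:q + 1]          # copy of the left sorted run
    R = A[q + 1:r + 1]      # copy of the right sorted run
    i = j = 0               # next unpicked element of L and R
    k = p                   # next position of A to fill
    while i < len(L) and j < len(R):   # both runs nonempty: take smaller head
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
        k += 1
    while i < len(L):                  # copy any remainder of L
        A[k] = L[i]; i += 1; k += 1
    while j < len(R):                  # copy any remainder of R
        A[k] = R[j]; j += 1; k += 1

def merge_sort(A, p, r):
    """Sort A[p..r] in place."""
    if p >= r:                         # zero or one element: already sorted
        return
    q = (p + r) // 2                   # divide at the midpoint
    merge_sort(A, p, q)                # conquer the left half
    merge_sort(A, q + 1, r)            # conquer the right half
    merge(A, p, q, r)                  # combine the two sorted runs
```

As in the pseudocode, merge copies the two sorted runs out of A and then copies every value back exactly once across its three while loops, so each call does Θ(n) work on an n-element subarray.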
2.3.2 Analyzing divide-and-conquer algorithms
When an algorithm contains a recursive call, you can often describe its
running time by a recurrence equation or recurrence, which describes the overall running time on a problem of size n in terms of the running time
of the same algorithm on smaller inputs. You can then use mathematical
tools to solve the recurrence and provide bounds on the performance of
the algorithm.
A recurrence for the running time of a divide-and-conquer algorithm
falls out from the three steps of the basic method. As we did for
insertion sort, let T ( n) be the worst-case running time on a problem of size n. If the problem size is small enough, say n < n 0 for some constant n 0 > 0, the straightforward solution takes constant time, which we write
as Θ(1). 16 Suppose that the division of the problem yields a subproblems, each with size n/ b, that is, 1/ b the size of the original. For merge sort, both a and b are 2, but we’ll see other divide-and-conquer
algorithms in which a ≠ b. It takes T ( n/ b) time to solve one subproblem of size n/ b, and so it takes aT ( n/ b) time to solve all a of them. If it takes D( n) time to divide the problem into subproblems and C( n) time to combine the solutions to the subproblems into the solution to the
original problem, we get the recurrence

T(n) = Θ(1)                      if n < n₀,
T(n) = D(n) + aT(n/b) + C(n)     otherwise.
Chapter 4 shows how to solve common recurrences of this form.
Figure 2.4 The operation of merge sort on the array A with length 8 that initially contains the sequence 〈12, 3, 7, 9, 14, 6, 11, 2〉. The indices p, q, and r into each subarray appear above their values. Numbers in italics indicate the order in which the MERGE-SORT and MERGE
procedures are called following the initial call of MERGE-SORT( A, 1, 8).
Sometimes, the n/ b size of the divide step isn’t an integer. For example, the MERGE-SORT procedure divides a problem of size n into
subproblems of sizes ⌈n/2⌉ and ⌊n/2⌋. Since the difference between ⌈n/2⌉ and ⌊n/2⌋ is at most 1, which for large n is much smaller than the effect of dividing n by 2, we’ll squint a little and just call them both size n/2.
As Chapter 4 will discuss, this simplification of ignoring floors and
ceilings does not generally affect the order of growth of a solution to a
divide-and-conquer recurrence.
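The split sizes can be checked directly. This small Python sketch (the helper name split_sizes is ours, not the book's) confirms that choosing the midpoint q = ⌊(p + r)/2⌋ always yields halves of sizes ⌈n/2⌉ and ⌊n/2⌋, differing by at most 1:

```python
def split_sizes(p, r):
    # Sizes of A[p : q] and A[q + 1 : r] when q = (p + r) // 2,
    # i.e., q = floor((p + r)/2) for nonnegative p and r.
    q = (p + r) // 2
    return q - p + 1, r - q

# For example, a subarray of length 7 splits into halves of sizes 4 and 3.
```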
Another convention we’ll adopt is to omit a statement of the base
cases of the recurrence, which we’ll also discuss in more detail in
Chapter 4. The reason is that the base cases are pretty much always T
( n) = Θ(1) if n < n 0 for some constant n 0 > 0. That’s because the running time of an algorithm on an input of constant size is constant.
We save ourselves a lot of extra writing by adopting this convention.
Analysis of merge sort
Here’s how to set up the recurrence for T ( n), the worst-case running time of merge sort on n numbers.
Divide: The divide step just computes the middle of the subarray, which
takes constant time. Thus, D( n) = Θ(1).
Conquer: Recursively solving two subproblems, each of size n/2,
contributes 2 T ( n/2) to the running time (ignoring the floors and ceilings, as we discussed).
Combine: Since the MERGE procedure on an n-element subarray takes
Θ( n) time, we have C( n) = Θ( n).
When we add the functions D( n) and C( n) for the merge sort analysis, we are adding a function that is Θ( n) and a function that is Θ(1). This sum is a linear function of n. That is, it is roughly
proportional to n when n is large, and so merge sort’s dividing and combining times together are Θ( n). Adding Θ( n) to the 2 T ( n/2) term from the conquer step gives the recurrence for the worst-case running
time T(n) of merge sort:

T(n) = 2T(n/2) + Θ(n).        (2.3)
Chapter 4 presents the “master theorem,” which shows that T ( n) = Θ( n lg n).17 Compared with insertion sort, whose worst-case running time is Θ( n 2), merge sort trades away a factor of n for a factor of lg n. Because the logarithm function grows more slowly than any linear function,
that’s a good trade. For large enough inputs, merge sort, with its Θ( n lg
n) worst-case running time, outperforms insertion sort, whose worst-
case running time is Θ( n 2).
We do not need the master theorem, however, to understand
intuitively why the solution to recurrence (2.3) is T(n) = Θ(n lg n). For simplicity, assume that n is an exact power of 2 and that the implicit base case is n = 1. Then recurrence (2.3) is essentially

T(n) = c₁                  if n = 1,
T(n) = 2T(n/2) + c₂n       if n > 1,        (2.4)
where the constant c 1 > 0 represents the time required to solve a problem of size 1, and c 2 > 0 is the time per array element of the divide
and combine steps.18
Figure 2.5 illustrates one way of figuring out the solution to recurrence (2.4). Part (a) of the figure shows T ( n), which part (b) expands into an equivalent tree representing the recurrence. The c 2 n term denotes the cost of dividing and combining at the top level of
recursion, and the two subtrees of the root are the two smaller
recurrences T ( n/2). Part (c) shows this process carried one step further by expanding T ( n/2). The cost for dividing and combining at each of
the two nodes at the second level of recursion is c 2 n/2. Continue to expand each node in the tree by breaking it into its constituent parts as
determined by the recurrence, until the problem sizes get down to 1,
each with a cost of c 1. Part (d) shows the resulting recursion tree.
Next, add the costs across each level of the tree. The top level has total cost c₂n, the next level down has total cost c₂(n/2) + c₂(n/2) = c₂n, the level after that has total cost c₂(n/4) + c₂(n/4) + c₂(n/4) + c₂(n/4) = c₂n, and so on. Each level has twice as many nodes as the level above, but each node contributes only half the cost of a node from the level above. From one level to the next, doubling and halving cancel each other out, so that the cost across each level is the same: c₂n. In general, the level that is i levels below the top has 2^i nodes, each contributing a cost of c₂(n/2^i), so that the ith level below the top has total cost 2^i · c₂(n/2^i) = c₂n. The bottom level has n nodes, each contributing a cost of c₁, for a total cost of c₁n.
The total number of levels of the recursion tree in Figure 2.5 is lg n +
1, where n is the number of leaves, corresponding to the input size. An
informal inductive argument justifies this claim. The base case occurs
when n = 1, in which case the tree has only 1 level. Since lg 1 = 0, we
have that lg n + 1 gives the correct number of levels. Now assume as an
inductive hypothesis that the number of levels of a recursion tree with 2^i leaves is lg 2^i + 1 = i + 1 (since for any value of i, we have that lg 2^i = i). Because we assume that the input size is an exact power of 2, the next input size to consider is 2^(i+1). A tree with n = 2^(i+1) leaves has 1 more level than a tree with 2^i leaves, and so the total number of levels is (i + 1) + 1 = lg 2^(i+1) + 1.
Figure 2.5 How to construct a recursion tree for the recurrence (2.4). Part (a) shows T ( n), which progressively expands in (b)–(d) to form the recursion tree. The fully expanded tree in part (d) has lg n + 1 levels. Each level above the leaves contributes a total cost of c 2 n, and the leaf level contributes c 1 n. The total cost, therefore, is c 2 n lg n + c 1 n = Θ( n lg n).
To compute the total cost represented by the recurrence (2.4), simply
add up the costs of all the levels. The recursion tree has lg n + 1 levels.
The levels above the leaves each cost c 2 n, and the leaf level costs c 1 n, for a total cost of c 2 n lg n + c 1 n = Θ( n lg n).
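The level-by-level accounting can be verified numerically. This Python sketch (our helper, with c₁ and c₂ as parameters) sums the recursion-tree costs for n an exact power of 2 and returns c₂ n lg n + c₁ n:

```python
from math import log2

def recursion_tree_cost(n, c1=1.0, c2=1.0):
    # Total cost of the recursion tree for T(n) = 2*T(n/2) + c2*n
    # with T(1) = c1, where n is an exact power of 2.
    total = 0.0
    levels = int(log2(n)) + 1              # the tree has lg n + 1 levels
    for i in range(levels - 1):            # the lg n levels above the leaves
        nodes = 2 ** i                     # 2^i nodes at depth i
        total += nodes * c2 * (n / 2 ** i) # each such level sums to c2 * n
    total += c1 * n                        # n leaves, each costing c1
    return total
```

For n = 8 with c₁ = c₂ = 1 this gives 8 lg 8 + 8 = 32, matching the closed form.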
Exercises
2.3-1
Using Figure 2.4 as a model, illustrate the operation of merge sort on an array initially containing the sequence 〈3, 41, 52, 26, 38, 57, 9, 49〉.
2.3-2
The test in line 1 of the MERGE-SORT procedure reads “if p ≥ r” rather than “if p == r.” If MERGE-SORT is called with p > r, then the subarray A[p : r] is empty. Argue that as long as the initial call of MERGE-SORT(A, 1, n) has n ≥ 1, the test “if p == r” suffices to ensure that no recursive call has p > r.
2.3-3
State a loop invariant for the while loop of lines 12–18 of the MERGE
procedure. Show how to use it, along with the while loops of lines 20–23
and 24–27, to prove that the MERGE procedure is correct.
2.3-4
Use mathematical induction to show that when n ≥ 2 is an exact power
of 2, the solution of the recurrence

T(n) = 2              if n = 2,
T(n) = 2T(n/2) + n    if n = 2^k, for k > 1,

is T(n) = n lg n.
2.3-5
You can also think of insertion sort as a recursive algorithm. In order to
sort A[1 : n], recursively sort the subarray A[1 : n – 1] and then insert A[ n] into the sorted subarray A[1 : n – 1]. Write pseudocode for this
recursive version of insertion sort. Give a recurrence for its worst-case running time.
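One way to realize the recursion this exercise describes, sketched in Python with 0-origin indices (the function name is ours): sorting the first n elements recursively sorts the first n − 1 and then inserts the last one.

```python
def recursive_insertion_sort(A, n):
    # Sort A[0 .. n-1]: recursively sort the first n - 1 elements,
    # then insert A[n - 1] into the sorted prefix.
    if n <= 1:                        # 0 or 1 elements: already sorted
        return
    recursive_insertion_sort(A, n - 1)
    key = A[n - 1]
    j = n - 2
    while j >= 0 and A[j] > key:      # shift larger elements right
        A[j + 1] = A[j]
        j -= 1
    A[j + 1] = key
```

The insertion step examines up to n − 1 elements, so the worst case satisfies a recurrence of the form T(n) = T(n − 1) + Θ(n), whose solution is Θ(n²).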
2.3-6
Referring back to the searching problem (see Exercise 2.1-4), observe
that if the subarray being searched is already sorted, the searching
algorithm can check the midpoint of the subarray against v and
eliminate half of the subarray from further consideration. The binary
search algorithm repeats this procedure, halving the size of the
remaining portion of the subarray each time. Write pseudocode, either
iterative or recursive, for binary search. Argue that the worst-case
running time of binary search is Θ(lg n).
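The halving procedure the exercise describes can be sketched iteratively in Python (0-origin, inclusive bounds; the names are ours). Each iteration discards half of the remaining subarray, so at most about lg n + 1 iterations occur, which is where the O(lg n) bound comes from:

```python
def binary_search(A, v, p, r):
    # Search the sorted subarray A[p..r] (inclusive) for v.
    # Return an index where v occurs, or None if v is absent.
    while p <= r:
        mid = (p + r) // 2
        if A[mid] == v:
            return mid
        elif A[mid] < v:      # v can only lie in the right half
            p = mid + 1
        else:                 # v can only lie in the left half
            r = mid - 1
    return None
```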
2.3-7
The while loop of lines 5–7 of the INSERTION-SORT procedure in
Section 2.1 uses a linear search to scan (backward) through the sorted subarray A[1 : j – 1]. What if insertion sort used a binary search (see Exercise 2.3-6) instead of a linear search? Would that improve the
overall worst-case running time of insertion sort to Θ( n lg n)?
2.3-8
Describe an algorithm that, given a set S of n integers and another integer x, determines whether S contains two elements that sum to exactly x. Your algorithm should take Θ( n lg n) time in the worst case.
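One possible approach (a sketch, not the only solution): sort S in Θ(n lg n) time, then walk two pointers inward from the ends, which takes Θ(n) time.

```python
def has_pair_with_sum(S, x):
    # Return True if some two distinct elements of S sum to exactly x.
    A = sorted(S)                 # Theta(n lg n)
    lo, hi = 0, len(A) - 1
    while lo < hi:                # Theta(n) scan
        s = A[lo] + A[hi]
        if s == x:
            return True
        elif s < x:               # need a larger sum
            lo += 1
        else:                     # need a smaller sum
            hi -= 1
    return False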
Problems
2-1 Insertion sort on small arrays in merge sort
Although merge sort runs in Θ( n lg n) worst-case time and insertion sort runs in Θ( n 2) worst-case time, the constant factors in insertion sort can
make it faster in practice for small problem sizes on many machines.
Thus it makes sense to coarsen the leaves of the recursion by using insertion sort within merge sort when subproblems become sufficiently
small. Consider a modification to merge sort in which n/ k sublists of
length k are sorted using insertion sort and then merged using the
standard merging mechanism, where k is a value to be determined.
a. Show that insertion sort can sort the n/ k sublists, each of length k, in Θ( nk) worst-case time.
b. Show how to merge the sublists in Θ( n lg( n/ k)) worst-case time.
c. Given that the modified algorithm runs in Θ( nk + n lg( n/ k)) worst-case time, what is the largest value of k as a function of n for which
the modified algorithm has the same running time as standard merge
sort, in terms of Θ-notation?
d. How should you choose k in practice?
2-2 Correctness of bubblesort
Bubblesort is a popular, but inefficient, sorting algorithm. It works by
repeatedly swapping adjacent elements that are out of order. The
procedure BUBBLESORT sorts array A[1 : n].
BUBBLESORT(A, n)
1  for i = 1 to n – 1
2      for j = n downto i + 1
3          if A[j] < A[j – 1]
4              exchange A[j] with A[j – 1]
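A direct Python transcription of BUBBLESORT (a sketch with 0-origin indices; range(n − 1, i, −1) plays the role of “for j = n downto i + 1”):

```python
def bubblesort(A, n):
    # Repeatedly swap adjacent out-of-order elements. After the inner
    # loop for a given i, A[i] holds the smallest of the original A[i..n-1].
    for i in range(n - 1):                 # i = 0, ..., n - 2
        for j in range(n - 1, i, -1):      # j = n - 1 downto i + 1
            if A[j] < A[j - 1]:
                A[j], A[j - 1] = A[j - 1], A[j]
```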
a. Let A′ denote the array A after BUBBLESORT(A, n) is executed. To prove that BUBBLESORT is correct, you need to prove that it terminates and that

A′[1] ≤ A′[2] ≤ ⋯ ≤ A′[n].        (2.5)
In order to show that BUBBLESORT actually sorts, what else do you
need to prove?
The next two parts prove inequality (2.5).
b. State precisely a loop invariant for the for loop in lines 2–4, and prove
that this loop invariant holds. Your proof should use the structure of
the loop-invariant proof presented in this chapter.

c. Using the termination condition of the loop invariant proved in part
(b), state a loop invariant for the for loop in lines 1–4 that allows you
to prove inequality (2.5). Your proof should use the structure of the
loop-invariant proof presented in this chapter.
d. What is the worst-case running time of BUBBLESORT? How does it
compare with the running time of INSERTION-SORT?
2-3 Correctness of Horner’s rule
You are given the coefficients a₀, a₁, a₂, …, aₙ of a polynomial

P(x) = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ,

and you want to evaluate this polynomial for a given value of x.
Horner’s rule says to evaluate the polynomial according to this
parenthesization:

P(x) = a₀ + x(a₁ + x(a₂ + ⋯ + x(aₙ₋₁ + xaₙ) ⋯)).
The procedure HORNER implements Horner’s rule to evaluate P( x), given the coefficients a 0, a 1, a 2, … , an in an array A[0 : n] and the value of x.
HORNER(A, n, x)
1  p = 0
2  for i = n downto 0
3      p = A[i] + x · p
4  return p
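A Python transcription of HORNER, together with a naive evaluator of the kind part (b) asks about (naive_eval is our name, not the book's). HORNER uses only n multiplications, while the naive version recomputes each power of x from scratch:

```python
def horner(A, n, x):
    # Evaluate A[0] + A[1]*x + ... + A[n]*x**n by Horner's rule.
    p = 0
    for i in range(n, -1, -1):    # i = n downto 0
        p = A[i] + x * p
    return p

def naive_eval(A, n, x):
    # Naive evaluation: compute x**i for every term from scratch.
    return sum(A[i] * x ** i for i in range(n + 1))
```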
a. In terms of Θ-notation, what is the running time of this procedure?
b. Write pseudocode to implement the naive polynomial-evaluation
algorithm that computes each term of the polynomial from scratch.
What is the running time of this algorithm? How does it compare with
HORNER?

c. Consider the following loop invariant for the procedure HORNER:
At the start of each iteration of the for loop of lines 2–3,

p = Σ_{k=0}^{n−(i+1)} A[k + i + 1] · x^k.

Interpret a summation with no terms as equaling 0. Following the
structure of the loop-invariant proof presented in this chapter, use this
loop invariant to show that, at termination,

p = Σ_{k=0}^{n} A[k] · x^k.
2-4 Inversions
Let A[1 : n] be an array of n distinct numbers. If i < j and A[ i] > A[ j], then the pair ( i, j) is called an inversion of A.
a. List the five inversions of the array 〈2, 3, 8, 6, 1〉.
b. What array with elements from the set {1, 2, … , n} has the most
inversions? How many does it have?
c. What is the relationship between the running time of insertion sort
and the number of inversions in the input array? Justify your answer.
d. Give an algorithm that determines the number of inversions in any
permutation on n elements in Θ( n lg n) worst-case time. ( Hint: Modify merge sort.)
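Following the hint in part (d), here is a hedged sketch of a merge-sort-based inversion counter: whenever the merge step takes an element from the right run, that element forms an inversion with every element still remaining in the left run.

```python
def count_inversions(A):
    """Return the number of pairs i < j with A[i] > A[j], in Theta(n lg n)."""
    def sort_count(lst):
        if len(lst) <= 1:
            return lst, 0
        mid = len(lst) // 2
        left, a = sort_count(lst[:mid])
        right, b = sort_count(lst[mid:])
        merged, i, j, cross = [], 0, 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                merged.append(left[i]); i += 1
            else:
                # right[j] is smaller than the len(left) - i values still
                # in the left run; each such pair is an inversion.
                cross += len(left) - i
                merged.append(right[j]); j += 1
        merged += left[i:] + right[j:]
        return merged, a + b + cross
    return sort_count(list(A))[1]
```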
Chapter notes
In 1968, Knuth published the first of three volumes with the general title
The Art of Computer Programming [259, 260, 261]. The first volume ushered in the modern study of computer algorithms with a focus on
the analysis of running time. The full series remains an engaging and
worthwhile reference for many of the topics presented here. According
to Knuth, the word “algorithm” is derived from the name “al-
Khowârizmî,” a ninth-century Persian mathematician.
Aho, Hopcroft, and Ullman [5] advocated the asymptotic analysis of algorithms—using notations that Chapter 3 introduces, including Θ-notation—as a means of comparing relative performance. They also
popularized the use of recurrence relations to describe the running times of recursive algorithms.
Knuth [261] provides an encyclopedic treatment of many sorting algorithms. His comparison of sorting algorithms (page 381) includes
exact step-counting analyses, like the one we performed here for
insertion sort. Knuth’s discussion of insertion sort encompasses several
variations of the algorithm. The most important of these is Shell’s sort,
introduced by D. L. Shell, which uses insertion sort on periodic
subarrays of the input to produce a faster sorting algorithm.
Merge sort is also described by Knuth. He mentions that a
mechanical collator capable of merging two decks of punched cards in a
single pass was invented in 1938. J. von Neumann, one of the pioneers
of computer science, apparently wrote a program for merge sort on the
EDVAC computer in 1945.
The early history of proving programs correct is described by Gries
[200], who credits P. Naur with the first article in this field. Gries attributes loop invariants to R. W. Floyd. The textbook by Mitchell
[329] is a good reference on how to prove programs correct.
1 If you’re familiar with only Python, you can think of arrays as similar to Python lists.
2 When the loop is a for loop, the loop-invariant check just prior to the first iteration occurs immediately after the initial assignment to the loop-counter variable and just before the first test in the loop header. In the case of INSERTION-SORT, this time is after assigning 2 to the variable i but before the first test of whether i ≤ n.
3 In an if-else statement, we indent else at the same level as its matching if. The first executable line of an else clause appears on the same line as the keyword else. For multiway tests, we use elseif for tests after the first one. When it is the first line in an else clause, an if statement appears on the line following else so that you do not misconstrue it as elseif.
4 Each pseudocode procedure in this book appears on one page so that you do not need to discern levels of indentation in pseudocode that is split across pages.
5 Most block-structured languages have equivalent constructs, though the exact syntax may differ. Python lacks repeat-until loops, and its for loops operate differently from the for loops in this book. Think of the pseudocode line “for i = 1 to n” as equivalent to “for i in range(1, n+1)”
in Python.
6 In Python, the loop counter retains its value after the loop is exited, but the value it retains is the value it had during the final iteration of the for loop, rather than the value that exceeded the
loop bound. That is because a Python for loop iterates through a list, which may contain nonnumeric values.
7 If you’re used to programming in Python, bear in mind that in this book, the subarray A[ i : j]
includes the element A[ j]. In Python, the last element of A[ i : j] is A[ j – 1]. Python allows negative indices, which count from the back end of the list. This book does not use negative array indices.
8 Python’s tuple notation allows return statements to return multiple values without creating objects from a programmer-defined class.
9 We assume that each element of a given array occupies the same number of bytes and that the elements of a given array are stored in contiguous memory locations. For example, if array A[1 : n] starts at memory address 1000 and each element occupies four bytes, then element A[ i] is at address 1000 + 4( i – 1). In general, computing the address in memory of a particular array element requires at most one subtraction (no subtraction for a 0-origin array), one multiplication (often implemented as a shift operation if the element size is an exact power of 2), and one addition. Furthermore, for code that iterates through the elements of an array in order, an optimizing compiler can generate the address of each element using just one addition, by adding the element size to the address of the preceding element.
10 There are some subtleties here. Computational steps that we specify in English are often variants of a procedure that requires more than just a constant amount of time. For example, in the RADIX-SORT procedure on page 213, one line reads “use a stable sort to sort array A on digit i,” which, as we shall see, takes more than a constant amount of time. Also, although a statement that calls a subroutine takes only constant time, the subroutine itself, once invoked, may take more. That is, we separate the process of calling the subroutine—passing parameters to it, etc.—from the process of executing the subroutine.
11 This characteristic does not necessarily hold for a resource such as memory. A statement that references m words of memory and is executed n times does not necessarily reference mn distinct words of memory.
12 This procedure is the rare case that uses both 1-origin indexing (for array A) and 0-origin indexing (for arrays L and R). Using 0-origin indexing for L and R makes for a simpler loop invariant in Exercise 2.3-3.
13 If you’re wondering where the “+1” comes from, imagine that r = p + 1. Then the subarray A[ p : r] consists of two elements, and r – p + 1 = 2.
14 Chapter 3 shows how to formally interpret equations containing Θ-notation.
15 The expression ⌈x⌉ denotes the least integer greater than or equal to x, and ⌊x⌋ denotes the greatest integer less than or equal to x. These notations are defined in Section 3.3. The easiest way to verify that setting q to ⌊(p + r)/2⌋ yields subarrays A[p : q] and A[q + 1 : r] of sizes ⌈n/2⌉ and ⌊n/2⌋, respectively, is to examine the four cases that arise depending on whether each of p and r is odd or even.
16 If you’re wondering where Θ(1) comes from, think of it this way. When we say that n 2/100 is Θ( n 2), we are ignoring the coefficient 1/100 of the factor n 2. Likewise, when we say that a
constant c is Θ(1), we are ignoring the coefficient c of the factor 1 (which you can also think of as n 0).
17 The notation lg n stands for log₂ n. The base of the logarithm doesn’t matter here, but as computer scientists, we like logarithms base 2. Section 3.3 discusses other standard notation.
18 It is unlikely that c 1 is exactly the time to solve problems of size 1 and that c 2 n is exactly the time of the divide and combine steps. We’ll look more closely at bounding recurrences in
Chapter 4, where we’ll be more careful about this kind of detail.
3 Characterizing Running Times
The order of growth of the running time of an algorithm, defined in
Chapter 2, gives a simple way to characterize the algorithm’s efficiency and also allows us to compare it with alternative algorithms. Once the
input size n becomes large enough, merge sort, with its Θ( n lg n) worst-case running time, beats insertion sort, whose worst-case running time is
Θ( n 2). Although we can sometimes determine the exact running time of
an algorithm, as we did for insertion sort in Chapter 2, the extra precision is rarely worth the effort of computing it. For large enough
inputs, the multiplicative constants and lower-order terms of an exact
running time are dominated by the effects of the input size itself.
When we look at input sizes large enough to make relevant only the
order of growth of the running time, we are studying the asymptotic
efficiency of algorithms. That is, we are concerned with how the running
time of an algorithm increases with the size of the input in the limit, as
the size of the input increases without bound. Usually, an algorithm
that is asymptotically more efficient is the best choice for all but very
small inputs.
This chapter gives several standard methods for simplifying the
asymptotic analysis of algorithms. The next section presents informally
the three most commonly used types of “asymptotic notation,” of which
we have already seen an example in Θ-notation. It also shows one way
to use these asymptotic notations to reason about the worst-case
running time of insertion sort. Then we look at asymptotic notations
more formally and present several notational conventions used
throughout this book. The last section reviews the behavior of functions
that commonly arise when analyzing algorithms.
3.1 O-notation, Ω-notation, and Θ-notation
When we analyzed the worst-case running time of insertion sort in
Chapter 2, we started with the complicated expression

(c₅/2 + c₆/2 + c₇/2)n² + (c₁ + c₂ + c₄ + c₅/2 – c₆/2 – c₇/2 + c₈)n – (c₂ + c₄ + c₅ + c₈).

We then discarded the lower-order terms (c₁ + c₂ + c₄ + c₅/2 – c₆/2 – c₇/2 + c₈)n and c₂ + c₄ + c₅ + c₈, and we also ignored the coefficient c₅/2 + c₆/2 + c₇/2 of n². That left just the factor n², which we put into Θ-notation as Θ(n²). We use this style to characterize running times of
algorithms: discard the lower-order terms and the coefficient of the
leading term, and use a notation that focuses on the rate of growth of
the running time.
Θ-notation is not the only such “asymptotic notation.” In this
section, we’ll see other forms of asymptotic notation as well. We start
with intuitive looks at these notations, revisiting insertion sort to see
how we can apply them. In the next section, we’ll see the formal
definitions of our asymptotic notations, along with conventions for
using them.
Before we get into specifics, bear in mind that the asymptotic
notations we’ll see are designed so that they characterize functions in
general. It so happens that the functions we are most interested in
denote the running times of algorithms. But asymptotic notation can
apply to functions that characterize some other aspect of algorithms
(the amount of space they use, for example), or even to functions that
have nothing whatsoever to do with algorithms.
O-notation
O-notation characterizes an upper bound on the asymptotic behavior of a function. In other words, it says that a function grows no faster than a
certain rate, based on the highest-order term. Consider, for example, the
function 7n³ + 100n² – 20n + 6. Its highest-order term is 7n³, and so we say that this function’s rate of growth is n³. Because this function grows no faster than n³, we can write that it is O(n³). You might be surprised that we can also write that the function 7n³ + 100n² – 20n + 6 is O(n⁴). Why? Because the function grows more slowly than n⁴, we are correct in saying that it grows no faster. As you might have guessed, this function is also O(n⁵), O(n⁶), and so on. More generally, it is O(n^c) for any constant c ≥ 3.
Ω-notation
Ω-notation characterizes a lower bound on the asymptotic behavior of a
function. In other words, it says that a function grows at least as fast as
a certain rate, based — as in O-notation—on the highest-order term.
Because the highest-order term in the function 7n³ + 100n² – 20n + 6 grows at least as fast as n³, this function is Ω(n³). This function is also Ω(n²) and Ω(n). More generally, it is Ω(n^c) for any constant c ≤ 3.
Θ-notation
Θ-notation characterizes a tight bound on the asymptotic behavior of a
function. It says that a function grows precisely at a certain rate, based
—once again—on the highest-order term. Put another way, Θ-notation
characterizes the rate of growth of the function to within a constant
factor from above and to within a constant factor from below. These
two constant factors need not be equal.
If you can show that a function is both O( f ( n)) and Ω( f ( n)) for some function f ( n), then you have shown that the function is Θ( f ( n)). (The next section states this fact as a theorem.) For example, since the
function 7n³ + 100n² – 20n + 6 is both O(n³) and Ω(n³), it is also Θ(n³).
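These bounds can be spot-checked numerically. In this sketch, the witnesses c = 8 (for the upper bound) and c = 7 (for the lower bound) with n₀ = 100 are one possible choice among many:

```python
def f(n):
    return 7 * n**3 + 100 * n**2 - 20 * n + 6

# With c = 8 and n0 = 100, f(n) <= 8*n**3 for every checked n >= n0;
# with c = 7, f(n) >= 7*n**3 holds as well. So f(n) is both O(n**3)
# and Omega(n**3), hence Theta(n**3).
assert all(7 * n**3 <= f(n) <= 8 * n**3 for n in range(100, 2000))
```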
Let’s revisit insertion sort and see how to work with asymptotic
notation to characterize its Θ( n 2) worst-case running time without
evaluating summations as we did in Chapter 2. Here is the
INSERTION-SORT procedure once again:
INSERTION-SORT(A, n)
1  for i = 2 to n
2      key = A[i]
3      // Insert A[i] into the sorted subarray A[1 : i – 1].
4      j = i – 1
5      while j > 0 and A[j] > key
6          A[j + 1] = A[j]
7          j = j – 1
8      A[j + 1] = key
What can we observe about how the pseudocode operates? The
procedure has nested loops. The outer loop is a for loop that runs n – 1
times, regardless of the values being sorted. The inner loop is a while
loop, but the number of iterations it makes depends on the values being
sorted. The loop variable j starts at i – 1 and decreases by 1 in each iteration until either it reaches 0 or A[ j] ≤ key. For a given value of i, the while loop might iterate 0 times, i – 1 times, or anywhere in between.
The body of the while loop (lines 6–7) takes constant time per iteration
of the while loop.
Figure 3.1 The Ω(n²) lower bound for insertion sort. If the first n/3 positions contain the n/3 largest values, each of these values must move through each of the middle n/3 positions, one position at a time, to end up somewhere in the last n/3 positions. Since each of n/3 values moves through at least each of n/3 positions, the time taken in this case is at least proportional to (n/3)(n/3) = n²/9, or Ω(n²).
These observations suffice to deduce an O( n 2) running time for any
case of INSERTION-SORT, giving us a blanket statement that covers
all inputs. The running time is dominated by the inner loop. Because
each of the n – 1 iterations of the outer loop causes the inner loop to
iterate at most i – 1 times, and because i is at most n, the total number of iterations of the inner loop is at most ( n – 1)( n – 1), which is less than n 2. Since each iteration of the inner loop takes constant time, the total
time spent in the inner loop is at most a constant times n 2, or O( n 2).
With a little creativity, we can also see that the worst-case running
time of INSERTION-SORT is Ω( n 2). By saying that the worst-case
running time of an algorithm is Ω( n 2), we mean that for every input size
n above a certain threshold, there is at least one input of size n for which the algorithm takes at least cn 2 time, for some positive constant c. It does not necessarily mean that the algorithm takes at least cn 2 time for
all inputs.
Let’s now see why the worst-case running time of INSERTION-
SORT is Ω( n 2). For a value to end up to the right of where it started, it
must have been moved in line 6. In fact, for a value to end up k
positions to the right of where it started, line 6 must have executed k times. As Figure 3.1 shows, let’s assume that n is a multiple of 3 so that we can divide the array A into groups of n/3 positions. Suppose that in the input to INSERTION-SORT, the n/3 largest values occupy the first
n/3 array positions A[1 : n/3]. (It does not matter what relative order they have within the first n/3 positions.) Once the array has been sorted,
each of these n/3 values ends up somewhere in the last n/3 positions A[2 n/3 + 1 : n]. For that to happen, each of these n/3 values must pass through each of the middle n/3 positions A[ n/3 + 1 : 2 n/3]. Each of these n/3 values passes through these middle n/3 positions one position at a time, by at least n/3 executions of line 6. Because at least n/3 values have to pass through at least n/3 positions, the time taken by INSERTION-SORT in the worst case is at least proportional to ( n/3)( n/3) = n 2/9, which is Ω( n 2).
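The adversarial input from Figure 3.1 can be built and measured directly. This sketch (our instrumentation, not the book's) counts executions of the shift step, line 6 of INSERTION-SORT, and confirms the n²/9 lower bound:

```python
def insertion_sort_count_shifts(A):
    # Insertion sort on A (0-origin), counting how many times the
    # shift step (line 6 of INSERTION-SORT) executes.
    shifts = 0
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]     # the line-6 move
            shifts += 1
            j -= 1
        A[j + 1] = key
    return shifts

# Place the n/3 largest values in the first n/3 positions, as in
# Figure 3.1: each must move past all of the middle n/3 positions.
n = 30
A = list(range(2 * n // 3 + 1, n + 1)) + list(range(1, 2 * n // 3 + 1))
assert insertion_sort_count_shifts(A) >= n * n // 9
```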
Because we have shown that INSERTION-SORT runs in O( n 2) time
in all cases and that there is an input that makes it take Ω( n 2) time, we
can conclude that the worst-case running time of INSERTION-SORT is
Θ( n 2). It does not matter that the constant factors for upper and lower
bounds might differ. What matters is that we have characterized the
worst-case running time to within constant factors (discounting lower-
order terms). This argument does not show that INSERTION-SORT
runs in Θ( n 2) time in all cases. Indeed, we saw in Chapter 2 that the best-case running time is Θ( n).
Exercises
3.1-1
Modify the lower-bound argument for insertion sort to handle input
sizes that are not necessarily a multiple of 3.
3.1-2
Using reasoning similar to what we used for insertion sort, analyze the
running time of the selection sort algorithm from Exercise 2.2-2.
3.1-3
Suppose that α is a fraction in the range 0 < α < 1. Show how to generalize the lower-bound argument for insertion sort to consider an
input in which the αn largest values start in the first αn positions. What additional restriction do you need to put on α? What value of α
maximizes the number of times that the αn largest values must pass
through each of the middle (1 – 2 α) n array positions?
3.2 Asymptotic notation: formal definitions
Having seen asymptotic notation informally, let’s get more formal. The
notations we use to describe the asymptotic running time of an
algorithm are defined in terms of functions whose domains are typically
the set N of natural numbers or the set R of real numbers. Such
notations are convenient for describing a running-time function T ( n).
This section defines the basic asymptotic notations and also introduces
some common “proper” notational abuses.
Figure 3.2 Graphic examples of the O, Ω, and Θ notations. In each part, the value of n 0 shown is the minimum possible value, but any greater value also works. (a) O-notation gives an upper bound for a function to within a constant factor. We write f ( n) = O( g( n)) if there are positive constants n 0 and c such that at and to the right of n 0, the value of f ( n) always lies on or below cg( n). (b) Ω-notation gives a lower bound for a function to within a constant factor. We write f ( n) = Ω( g( n)) if there are positive constants n 0 and c such that at and to the right of n 0, the value of f ( n) always lies on or above cg( n). (c) Θ-notation bounds a function to within constant factors. We write f ( n) = Θ( g( n)) if there exist positive constants n 0, c 1, and c 2 such that at and to the right of n 0, the value of f ( n) always lies between c 1 g( n) and c 2 g( n) inclusive.
O-notation
As we saw in Section 3.1, O-notation describes an asymptotic upper bound. We use O-notation to give an upper bound on a function, to within a constant factor.
Here is the formal definition of O-notation. For a given function
g( n), we denote by O( g( n)) (pronounced “big-oh of g of n” or sometimes just “oh of g of n”) the set of functions
O(g(n)) = {f(n) : there exist positive constants c and n₀ such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n₀}.
A function f ( n) belongs to the set O( g( n)) if there exists a positive constant c such that f ( n) ≤ cg( n) for sufficiently large n. Figure 3.2(a)
shows the intuition behind O-notation. For all values n at and to the right of n 0, the value of the function f ( n) is on or below cg( n).
The definition of O( g( n)) requires that every function f ( n) in the set O( g( n)) be asymptotically nonnegative: f ( n) must be nonnegative whenever n is sufficiently large. (An asymptotically positive function is one that is positive for all sufficiently large n.) Consequently, the function g( n) itself must be asymptotically nonnegative, or else the set O( g( n)) is empty. We therefore assume that every function used within O-notation is asymptotically nonnegative. This assumption holds for
the other asymptotic notations defined in this chapter as well.
You might be surprised that we define O-notation in terms of sets.
Indeed, you might expect that we would write “f ( n) ∈ O( g( n))” to indicate that f ( n) belongs to the set O( g( n)). Instead, we usually write “f ( n) = O( g( n))” and say “f ( n) is big-oh of g( n)” to express the same notion. Although it may seem confusing at first to abuse equality in this
way, we’ll see later in this section that doing so has its advantages.
Let’s explore an example of how to use the formal definition of O-
notation to justify our practice of discarding lower-order terms and
ignoring the constant coefficient of the highest-order term. We’ll show
that 4 n 2 + 100 n + 500 = O( n 2), even though the lower-order terms have much larger coefficients than the leading term. We need to find positive
constants c and n 0 such that 4 n 2 + 100 n + 500 ≤ cn 2 for all n ≥ n 0.
Dividing both sides by n 2 gives 4 + 100/ n + 500/ n 2 ≤ c. This inequality is satisfied for many choices of c and n 0. For example, if we choose n 0 = 1,
then this inequality holds for c = 604. If we choose n 0 = 10, then c = 19
works, and choosing n 0 = 100 allows us to use c = 5.05.
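These choices of c and n₀ can be verified mechanically. A small Python sketch (illustrative; integer arithmetic keeps the check exact, with c = 5.05 written as 101/20):

```python
def holds(c_num, c_den, n0, span=1000):
    """Check 4n^2 + 100n + 500 <= (c_num/c_den)*n^2 exactly, in integer
    arithmetic, for n0 <= n < n0 + span.  Since 4 + 100/n + 500/n^2 is
    decreasing in n, larger n only make the inequality easier."""
    return all(c_den * (4*n*n + 100*n + 500) <= c_num * n*n
               for n in range(n0, n0 + span))

assert holds(604, 1, 1)      # n0 = 1,   c = 604
assert holds(19, 1, 10)      # n0 = 10,  c = 19
assert holds(101, 20, 100)   # n0 = 100, c = 5.05
assert not holds(603, 1, 1)  # c = 603 fails at n = 1: 4 + 100 + 500 = 604
```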
We can also use the formal definition of O-notation to show that the
function n 3 – 100 n 2 does not belong to the set O( n 2), even though the coefficient of n 2 is a large negative number. If we had n 3 – 100 n 2 =
O( n 2), then there would be positive constants c and n 0 such that n 3 –
100 n 2 ≤ cn 2 for all n ≥ n 0. Again, we divide both sides by n 2, giving n –
100 ≤ c. Regardless of what value we choose for the constant c, this inequality does not hold for any value of n > c + 100.
Ω-notation
Just as O-notation provides an asymptotic upper bound on a function,
Ω-notation provides an asymptotic lower bound. For a given function
g( n), we denote by Ω( g( n)) (pronounced “big-omega of g of n” or sometimes just “omega of g of n”) the set of functions
Ω(g(n)) = {f(n) : there exist positive constants c and n₀ such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n₀}.
Figure 3.2(b) shows the intuition behind Ω-notation. For all values n at or to the right of n 0, the value of f ( n) is on or above cg( n).
We’ve already shown that 4 n 2 + 100 n + 500 = O( n 2). Now let’s show that 4 n 2 + 100 n + 500 = Ω( n 2). We need to find positive constants c and n 0 such that 4 n 2 + 100 n + 500 ≥ cn 2 for all n ≥ n 0. As before, we divide both sides by n 2, giving 4 + 100/ n + 500/ n 2 ≥ c. This inequality holds when n 0 is any positive integer and c = 4.
What if we had subtracted the lower-order terms from the 4 n 2 term
instead of adding them? What if we had a small coefficient for the n 2
term? The function would still be Ω( n 2). For example, let’s show that n 2/100 – 100 n – 500 = Ω( n 2). Dividing by n 2 gives 1/100 – 100/ n – 500/ n 2
≥ c. We can choose any value for n 0 that is at least 10,005 and find a positive value for c. For example, when n 0 = 10,005, we can choose c =
2.49 × 10^−9. Yes, that’s a tiny value for c, but it is positive. If we select a larger value for n 0, we can also increase c. For example, if n 0 = 100,000, then we can choose c = 0.0089. The higher the value of n 0, the closer to the coefficient 1/100 we can choose c.
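The constants claimed above can be checked directly. A brief Python sketch (illustrative; the ratio (n²/100 − 100n − 500)/n² is increasing in n, so once the inequality holds it keeps holding):

```python
def omega_holds(c, n0, span=1000):
    """Check n^2/100 - 100n - 500 >= c*n^2 for n0 <= n < n0 + span."""
    return all(n*n/100 - 100*n - 500 >= c * n*n
               for n in range(n0, n0 + span))

assert omega_holds(2.49e-9, 10_005)   # a tiny but positive c works
assert omega_holds(0.0089, 100_000)   # a larger n0 permits a larger c
assert not omega_holds(2.49e-9, 10_000, span=5)  # fails just below n0
```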
Θ-notation
We use Θ-notation for asymptotically tight bounds. For a given function
g(n), we denote by Θ(g(n)) (“theta of g of n”) the set of functions

Θ(g(n)) = {f(n) : there exist positive constants c₁, c₂, and n₀ such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀}.
Figure 3.2(c) shows the intuition behind Θ-notation. For all values of n at and to the right of n 0, the value of f ( n) lies at or above c 1 g( n) and at or below c 2 g( n). In other words, for all n ≥ n 0, the function f ( n) is equal to g( n) to within constant factors.
The definitions of O-, Ω-, and Θ-notations lead to the following
theorem, whose proof we leave as Exercise 3.2-4.
Theorem 3.1
For any two functions f ( n) and g( n), we have f ( n) = Θ( g( n)) if and only if f ( n) = O( g( n)) and f ( n) = Ω( g( n)).
▪
We typically apply Theorem 3.1 to prove asymptotically tight bounds
from asymptotic upper and lower bounds.
Asymptotic notation and running times
When you use asymptotic notation to characterize an algorithm’s
running time, make sure that the asymptotic notation you use is as
precise as possible without overstating which running time it applies to.
Here are some examples of using asymptotic notation properly and improperly to characterize running times.
Let’s start with insertion sort. We can correctly say that insertion
sort’s worst-case running time is O( n 2), Ω( n 2), and—due to Theorem 3.1
—Θ( n 2). Although all three ways to characterize the worst-case running
times are correct, the Θ( n 2) bound is the most precise and hence the most preferred. We can also correctly say that insertion sort’s best-case
running time is O( n), Ω( n), and Θ( n), again with Θ( n) the most precise and therefore the most preferred.
Here is what we cannot correctly say: insertion sort’s running time is
Θ( n 2). That is an overstatement because by omitting “worst-case” from
the statement, we’re left with a blanket statement covering all cases. The
error here is that insertion sort does not run in Θ( n 2) time in all cases
since, as we’ve seen, it runs in Θ( n) time in the best case. We can correctly say that insertion sort’s running time is O( n 2), however, because in all cases, its running time grows no faster than n 2. When we
say O( n 2) instead of Θ( n 2), there is no problem in having cases whose running time grows more slowly than n 2. Likewise, we cannot correctly
say that insertion sort’s running time is Θ( n), but we can say that its running time is Ω( n).
How about merge sort? Since merge sort runs in Θ( n lg n) time in all
cases, we can just say that its running time is Θ( n lg n) without specifying worst-case, best-case, or any other case.
People occasionally conflate O-notation with Θ-notation by
mistakenly using O-notation to indicate an asymptotically tight bound.
They say things like “an O( n lg n)-time algorithm runs faster than an O( n 2)-time algorithm.” Maybe it does, maybe it doesn’t. Since O-
notation denotes only an asymptotic upper bound, that so-called O( n 2)-
time algorithm might actually run in Θ( n) time. You should be careful to
choose the appropriate asymptotic notation. If you want to indicate an
asymptotically tight bound, use Θ-notation.
We typically use asymptotic notation to provide the simplest and most precise bounds possible. For example, if an algorithm has a
running time of 3 n 2 + 20 n in all cases, we use asymptotic notation to
write that its running time is Θ( n 2). Strictly speaking, we are also correct in writing that the running time is O( n 3) or Θ(3 n 2 + 20 n).
Neither of these expressions is as useful as writing Θ( n 2) in this case, however: O( n 3) is less precise than Θ( n 2) if the running time is 3 n 2 +
20 n, and Θ(3 n 2 + 20 n) introduces complexity that obscures the order of growth. By writing the simplest and most precise bound, such as Θ( n 2),
we can categorize and compare different algorithms. Throughout the
book, you will see asymptotic running times that are almost always
based on polynomials and logarithms: functions such as n, n lg² n, n² lg n, or n^{1/2}. You will also see some other functions, such as exponentials, lg lg n, and lg* n (see Section 3.3). It is usually fairly easy to compare the rates of growth of these functions. Problem 3-3 gives you good practice.
Asymptotic notation in equations and inequalities
Although we formally define asymptotic notation in terms of sets, we
use the equal sign (=) instead of the set membership sign (∈) within
formulas. For example, we wrote that 4 n 2 + 100 n + 500 = O( n 2). We might also write 2 n 2 + 3 n + 1 = 2 n 2 + Θ( n). How do we interpret such formulas?
When the asymptotic notation stands alone (that is, not within a
larger formula) on the right-hand side of an equation (or inequality), as
in 4 n 2 + 100 n + 500 = O( n 2), the equal sign means set membership: 4 n 2
+ 100 n + 500 ∈ O( n 2). In general, however, when asymptotic notation appears in a formula, we interpret it as standing for some anonymous
function that we do not care to name. For example, the formula 2 n 2 +
3 n + 1 = 2 n 2 + Θ( n) means that 2 n 2 + 3 n + 1 = 2 n 2 + f ( n), where f ( n)
∈ Θ( n). In this case, we let f ( n) = 3 n + 1, which indeed belongs to Θ( n).
Using asymptotic notation in this manner can help eliminate
inessential detail and clutter in an equation. For example, in Chapter 2
we expressed the worst-case running time of merge sort as the
recurrence
T ( n) = 2 T ( n/2) + Θ( n).
If we are interested only in the asymptotic behavior of T ( n), there is no point in specifying all the lower-order terms exactly, because they are all
understood to be included in the anonymous function denoted by the
term Θ( n).
The number of anonymous functions in an expression is understood
to be equal to the number of times the asymptotic notation appears. For
example, in the expression

Σ_{i=1}^{n} O(i)

there is only a single anonymous function (a function of i). This
expression is thus not the same as O(1) + O(2) + ⋯ + O( n), which doesn’t really have a clean interpretation.
In some cases, asymptotic notation appears on the left-hand side of
an equation, as in
2 n 2 + Θ( n) = Θ( n 2).
Interpret such equations using the following rule: No matter how the
anonymous functions are chosen on the left of the equal sign, there is a way to choose the anonymous functions on the right of the equal sign to
make the equation valid. Thus, our example means that for any function
f ( n) ∈ Θ( n), there is some function g( n) ∈ Θ( n 2) such that 2 n 2 + f ( n) =
g( n) for all n. In other words, the right-hand side of an equation provides a coarser level of detail than the left-hand side.
We can chain together a number of such relationships, as in
2 n 2 + 3 n + 1 = 2 n 2 + Θ( n)
= Θ( n 2).
By the rules above, interpret each equation separately. The first equation says that there is some function f ( n) ∈ Θ( n) such that 2 n 2 + 3 n + 1 =
2 n 2 + f ( n) for all n. The second equation says that for any function g( n)
∈ Θ( n) (such as the f ( n) just mentioned), there is some function h( n) ∈
Θ( n 2) such that 2 n 2 + g( n) = h( n) for all n. This interpretation implies that 2 n 2 + 3 n + 1 = Θ( n 2), which is what the chaining of equations intuitively says.
Proper abuses of asymptotic notation
Besides the abuse of equality to mean set membership, which we now
see has a precise mathematical interpretation, another abuse of
asymptotic notation occurs when the variable tending toward ∞ must be
inferred from context. For example, when we say O( g( n)), we can assume that we’re interested in the growth of g( n) as n grows, and if we say O( g( m)) we’re talking about the growth of g( m) as m grows. The free variable in the expression indicates what variable is going to ∞.
The most common situation requiring contextual knowledge of
which variable tends to ∞ occurs when the function inside the
asymptotic notation is a constant, as in the expression O(1). We cannot
infer from the expression which variable is going to ∞, because no
variable appears there. The context must disambiguate. For example, if
the equation using asymptotic notation is f ( n) = O(1), it’s apparent that the variable we’re interested in is n. Knowing from context that the variable of interest is n, however, allows us to make perfect sense of the
expression by using the formal definition of O-notation: the expression f
( n) = O(1) means that the function f ( n) is bounded from above by a constant as n goes to ∞. Technically, it might be less ambiguous if we
explicitly indicated the variable tending to ∞ in the asymptotic notation
itself, but that would clutter the notation. Instead, we simply ensure that
the context makes it clear which variable (or variables) tend to ∞.
When the function inside the asymptotic notation is bounded by a
positive constant, as in T ( n) = O(1), we often abuse asymptotic notation in yet another way, especially when stating recurrences. We
may write something like T ( n) = O(1) for n < 3. According to the
formal definition of O-notation, this statement is meaningless, because the definition only says that T ( n) is bounded above by a positive constant c for n ≥ n 0 for some n 0 > 0. The value of T ( n) for n < n 0 need not be so bounded. Thus, in the example T ( n) = O(1) for n < 3, we cannot infer any constraint on T ( n) when n < 3, because it might be that n 0 > 3.
What is conventionally meant when we say T ( n) = O(1) for n < 3 is that there exists a positive constant c such that T ( n) ≤ c for n < 3. This convention saves us the trouble of naming the bounding constant,
allowing it to remain anonymous while we focus on more important
variables in an analysis. Similar abuses occur with the other asymptotic
notations. For example, T ( n) = Θ(1) for n < 3 means that T ( n) is bounded above and below by positive constants when n < 3.
Occasionally, the function describing an algorithm’s running time
may not be defined for certain input sizes, for example, when an
algorithm assumes that the input size is an exact power of 2. We still use
asymptotic notation to describe the growth of the running time,
understanding that any constraints apply only when the function is
defined. For example, suppose that f(n) is defined only on a subset of the natural or nonnegative real numbers. Then f(n) = O(g(n)) means that the bound 0 ≤ f(n) ≤ cg(n) in the definition of O-notation holds for all n ≥ n₀ over the domain of f(n), that is, where f(n) is defined. This abuse is rarely pointed out, since what is meant is generally clear from
context.
In mathematics, it’s okay — and often desirable — to abuse a
notation, as long as we don’t misuse it. If we understand precisely what
is meant by the abuse and don’t draw incorrect conclusions, it can
simplify our mathematical language, contribute to our higher-level
understanding, and help us focus on what really matters.
o-notation
The asymptotic upper bound provided by O-notation may or may not
be asymptotically tight. The bound 2 n 2 = O( n 2) is asymptotically tight, but the bound 2 n = O( n 2) is not. We use o-notation to denote an upper
bound that is not asymptotically tight. We formally define o( g( n)) (“little-oh of g of n”) as the set
o(g(n)) = {f(n) : for any positive constant c > 0, there exists a constant n₀ > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ n₀}.
For example, 2 n = o( n 2), but 2 n 2 ≠ o( n 2).
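The "for all c" requirement can be illustrated concretely. A Python sketch (names are illustrative) that, for any given c, produces an n₀ witnessing 2n = o(n²), and shows that no such witness exists for 2n² and c = 1:

```python
def n0_for(c):
    """For 2n = o(n^2): given any positive c, return an n0 beyond which
    2n < c*n^2 holds.  Since 2n < c*n^2 iff n > 2/c, any n0 > 2/c works."""
    return int(2 / c) + 1

for c in (1.0, 0.1, 1e-6):
    n0 = n0_for(c)
    assert all(2 * n < c * n * n for n in range(n0, n0 + 1000))

# By contrast, 2n^2 is not o(n^2): already for c = 1 the required
# inequality 2n^2 < n^2 fails for every n >= 1.
assert not any(2 * n * n < n * n for n in range(1, 1000))
```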
The definitions of O-notation and o-notation are similar. The main
difference is that in f(n) = O(g(n)), the bound 0 ≤ f(n) ≤ cg(n) holds for some constant c > 0, but in f(n) = o(g(n)), the bound 0 ≤ f(n) < cg(n) holds for all constants c > 0. Intuitively, in o-notation, the function f(n) becomes insignificant relative to g(n) as n gets large:

lim_{n→∞} f(n)/g(n) = 0.
Some authors use this limit as a definition of the o-notation, but the definition in this book also restricts the anonymous functions to be
asymptotically nonnegative.
ω-notation
By analogy, ω-notation is to Ω-notation as o-notation is to O-notation.
We use ω-notation to denote a lower bound that is not asymptotically
tight. One way to define it is by
f ( n) ∈ ω( g( n)) if and only if g( n) ∈ o( f ( n)).
Formally, however, we define ω(g(n)) (“little-omega of g of n”) as the set

ω(g(n)) = {f(n) : for any positive constant c > 0, there exists a constant n₀ > 0 such that 0 ≤ cg(n) < f(n) for all n ≥ n₀}.
Where the definition of o-notation says that f ( n) < cg( n), the definition of ω-notation says the opposite: that cg( n) < f ( n). For examples of ω-
notation, we have n 2/2 = ω( n), but n 2/2 ≠ ω( n 2). The relation f ( n) =
ω(g(n)) implies that

lim_{n→∞} f(n)/g(n) = ∞,

if the limit exists. That is, f(n) becomes arbitrarily large relative to g(n) as n gets large.
Comparing functions
Many of the relational properties of real numbers apply to asymptotic
comparisons as well. For the following, assume that f ( n) and g( n) are asymptotically positive.
Transitivity:

f(n) = Θ(g(n)) and g(n) = Θ(h(n)) imply f(n) = Θ(h(n)),
f(n) = O(g(n)) and g(n) = O(h(n)) imply f(n) = O(h(n)),
f(n) = Ω(g(n)) and g(n) = Ω(h(n)) imply f(n) = Ω(h(n)),
f(n) = o(g(n)) and g(n) = o(h(n)) imply f(n) = o(h(n)),
f(n) = ω(g(n)) and g(n) = ω(h(n)) imply f(n) = ω(h(n)).
Reflexivity:
f ( n) = Θ( f ( n)),
f ( n) = O( f ( n)),
f ( n) = Ω( f ( n)).
Symmetry:
f ( n) = Θ( g( n)) if and only if g( n) = Θ( f ( n)).
Transpose symmetry:

f(n) = O(g(n)) if and only if g(n) = Ω(f(n)),
f(n) = o(g(n)) if and only if g(n) = ω(f(n)).
Because these properties hold for asymptotic notations, we can draw
an analogy between the asymptotic comparison of two functions f and g
and the comparison of two real numbers a and b:
f ( n) = O( g( n)) is like a ≤ b,
f ( n) = Ω( g( n)) is like a ≥ b,
f ( n) = Θ( g( n)) is like a = b,
f(n) = o(g(n)) is like a < b,
f(n) = ω(g(n)) is like a > b.
We say that f ( n) is asymptotically smaller than g( n) if f ( n) = o( g( n)), and f ( n) is asymptotically larger than g( n) if f ( n) = ω( g( n)).
One property of real numbers, however, does not carry over to
asymptotic notation:
Trichotomy: For any two real numbers a and b, exactly one of the following must hold: a < b, a = b, or a > b.
Although any two real numbers can be compared, not all functions are
asymptotically comparable. That is, for two functions f ( n) and g( n), it may be the case that neither f ( n) = O( g( n)) nor f ( n) = Ω( g( n)) holds. For example, we cannot compare the functions n and n 1 + sin n using asymptotic notation, since the value of the exponent in n 1 + sin n oscillates between 0 and 2, taking on all values in between.
Exercises
3.2-1
Let f ( n) and g( n) be asymptotically nonnegative functions. Using the basic definition of Θ-notation, prove that max { f ( n), g( n)} = Θ( f ( n) +
g( n)).
3.2-2
Explain why the statement, “The running time of algorithm A is at least O( n 2),” is meaningless.
3.2-3
Is 2^{n+1} = O(2^n)? Is 2^{2n} = O(2^n)?
3.2-4
Prove Theorem 3.1.
3.2-5
Prove that the running time of an algorithm is Θ( g( n)) if and only if its worst-case running time is O( g( n)) and its best-case running time is Ω( g( n)).
3.2-6
Prove that o( g( n)) ∩ ω( g( n)) is the empty set.
3.2-7
We can extend our notation to the case of two parameters n and m that
can go to ∞ independently at different rates. For a given function g(n, m), we denote by O(g(n, m)) the set of functions

O(g(n, m)) = {f(n, m) : there exist positive constants c, n₀, and m₀ such that 0 ≤ f(n, m) ≤ cg(n, m) for all n ≥ n₀ or m ≥ m₀}.
Give corresponding definitions for Ω( g( n, m)) and Θ( g( n, m)).
3.3 Standard notations and common functions
This section reviews some standard mathematical functions and
notations and explores the relationships among them. It also illustrates
the use of the asymptotic notations.
Monotonicity
A function f ( n) is monotonically increasing if m ≤ n implies f ( m) ≤ f ( n).
Similarly, it is monotonically decreasing if m ≤ n implies f ( m) ≥ f ( n). A function f ( n) is strictly increasing if m < n implies f ( m) < f ( n) and strictly decreasing if m < n implies f ( m) > f ( n).
Floors and ceilings
For any real number x, we denote the greatest integer less than or equal to x by ⌊x⌋ (read “the floor of x”) and the least integer greater than or equal to x by ⌈x⌉ (read “the ceiling of x”). The floor function is monotonically increasing, as is the ceiling function.
Floors and ceilings obey the following properties. For any integer n, we have

⌈n/2⌉ + ⌊n/2⌋ = n.

For all real x, we have

x − 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1.

We also have

⌈x⌉ = −⌊−x⌋,

or equivalently,

⌊x⌋ = −⌈−x⌉.

For any real number x ≥ 0 and integers a, b > 0, we have

⌈⌈x/a⌉/b⌉ = ⌈x/(ab)⌉,
⌊⌊x/a⌋/b⌋ = ⌊x/(ab)⌋,
⌈a/b⌉ ≤ (a + (b − 1))/b,
⌊a/b⌋ ≥ (a − (b − 1))/b.

For any integer n and real number x, we have

⌊n + x⌋ = n + ⌊x⌋ and ⌈n + x⌉ = n + ⌈x⌉.
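The identities above are easy to spot-check. A Python sketch (illustrative; `Fraction` keeps the arithmetic exact, and `math.floor`/`math.ceil` accept rationals):

```python
import math
from fractions import Fraction

# Spot-check the floor/ceiling identities on a grid of rationals.
xs = [Fraction(p, q) for p in range(-50, 51) for q in (1, 2, 3, 7)]
for x in xs:
    fl, ce = math.floor(x), math.ceil(x)
    assert x - 1 < fl <= x <= ce < x + 1   # basic sandwich
    assert ce == -math.floor(-x)           # ceil(x) = -floor(-x)
    assert fl == -math.ceil(-x)            # floor(x) = -ceil(-x)

# Nested division collapses: for x >= 0 and integers a, b > 0.
for x in range(200):
    for a in (1, 2, 5):
        for b in (1, 3, 4):
            assert math.ceil(Fraction(math.ceil(Fraction(x, a)), b)) \
                   == math.ceil(Fraction(x, a * b))
            assert math.floor(Fraction(math.floor(Fraction(x, a)), b)) \
                   == math.floor(Fraction(x, a * b))
```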
Modular arithmetic
For any integer a and any positive integer n, the value a mod n is the remainder (or residue) of the quotient a/n:

a mod n = a − n⌊a/n⌋.

It follows that

0 ≤ a mod n < n,

even when a is negative.
Given a well-defined notion of the remainder of one integer when
divided by another, it is convenient to provide special notation to
indicate equality of remainders. If (a mod n) = (b mod n), we write a ≡ b (mod n) and say that a is equivalent to b, modulo n. In other words, a ≡ b (mod n) if a and b have the same remainder when divided by n. Equivalently, a ≡ b (mod n) if and only if n is a divisor of b − a. We write a ≢ b (mod n) if a is not equivalent to b, modulo n.
Polynomials
Given a nonnegative integer d, a polynomial in n of degree d is a function p(n) of the form

p(n) = Σ_{i=0}^{d} a_i n^i,

where the constants a_0, a_1, …, a_d are the coefficients of the polynomial and a_d ≠ 0. A polynomial is asymptotically positive if and only if a_d > 0.
For an asymptotically positive polynomial p(n) of degree d, we have p(n) = Θ(n^d). For any real constant a ≥ 0, the function n^a is monotonically increasing, and for any real constant a ≤ 0, the function n^a is monotonically decreasing. We say that a function f(n) is polynomially bounded if f(n) = O(n^k) for some constant k.
Exponentials
For all real a > 0, m, and n, we have the following identities:

a^0 = 1,
a^1 = a,
a^−1 = 1/a,
(a^m)^n = a^{mn},
(a^m)^n = (a^n)^m,
a^m a^n = a^{m+n}.

For all n and a ≥ 1, the function a^n is monotonically increasing in n. When convenient, we assume that 0^0 = 1.
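The identities are easy to spot-check with exact rational arithmetic. A quick sketch using Python's `Fraction` so that negative exponents stay exact:

```python
from fractions import Fraction

for a in (Fraction(2), Fraction(3), Fraction(10), Fraction(1, 2)):
    assert a ** 0 == 1 and a ** 1 == a and a ** -1 == 1 / a
    for m in range(-3, 4):
        for n in range(-3, 4):
            assert (a ** m) ** n == a ** (m * n)    # (a^m)^n = a^(mn)
            assert (a ** m) ** n == (a ** n) ** m   # (a^m)^n = (a^n)^m
            assert a ** m * a ** n == a ** (m + n)  # a^m a^n = a^(m+n)
```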
We can relate the rates of growth of polynomials and exponentials by
the following fact. For all real constants a > 1 and b, we have

lim_{n→∞} n^b/a^n = 0,

from which we can conclude that

n^b = o(a^n).

Thus, any exponential function with a base strictly greater than 1 grows faster than any polynomial function.
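A quick numeric illustration (a sketch with arbitrarily chosen a = 2 and b = 10): the polynomial n^b dominates at first, but the exponential a^n takes over and the ratio heads to 0:

```python
a, b = 2, 10
r = lambda n: n ** b / a ** n   # ratio n^b / a^n

assert r(10) > 1        # the polynomial is ahead for small n
assert r(100) < r(30)   # the ratio is already falling
assert r(200) < 1e-30   # and it is heading to 0
```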
Using e to denote 2.71828 …, the base of the natural-logarithm function, we have for all real x,

e^x = 1 + x + x^2/2! + x^3/3! + ⋯ = Σ_{i=0}^{∞} x^i/i!,

where “!” denotes the factorial function defined later in this section. For all real x, we have the inequality

e^x ≥ 1 + x,

where equality holds only when x = 0. When |x| ≤ 1, we have the approximation

1 + x ≤ e^x ≤ 1 + x + x^2.
When x → 0, the approximation of e^x by 1 + x is quite good:

e^x = 1 + x + Θ(x^2).
(In this equation, the asymptotic notation is used to describe the
limiting behavior as x → 0 rather than as x → ∞.) We have for all x,

lim_{n→∞} (1 + x/n)^n = e^x.

Logarithms
We use the following notations:
lg n = log_2 n (binary logarithm),
ln n = log_e n (natural logarithm),
lg^k n = (lg n)^k (exponentiation),