pointer, so that a pointer to the array is passed, rather than the

entire array, and changes to individual array elements are visible

to the calling procedure. Again, most contemporary programming

languages work this way.

A return statement immediately transfers control back to the

point of call in the calling procedure. Most return statements also

take a value to pass back to the caller. Our pseudocode differs

from many programming languages in that we allow multiple

values to be returned in a single return statement without having

to create objects to package them together.8

The boolean operators “and” and “or” are short circuiting. That

is, evaluate the expression “x and y” by first evaluating x. If x evaluates to FALSE, then the entire expression cannot evaluate to

TRUE, and therefore y is not evaluated. If, on the other hand, x

evaluates to TRUE, y must be evaluated to determine the value of

the entire expression. Similarly, in the expression “x or y” the expression y is evaluated only if x evaluates to FALSE. Short-circuiting operators allow us to write boolean expressions such as

“x ≠ NIL and x.f = y” without worrying about what happens upon evaluating x.f when x is NIL.
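As a sketch of the same idea in Python (the class and function names here are my own, for illustration only), the built-in "and" short-circuits exactly as the pseudocode's does:

```python
class Node:
    """Minimal object with an attribute f, standing in for the pseudocode's x.f."""
    def __init__(self, f):
        self.f = f

def matches(x, y):
    # Python's "and" short-circuits like the pseudocode's: if x is None
    # (the analogue of NIL), the right operand x.f == y is never
    # evaluated, so no AttributeError can occur.
    return x is not None and x.f == y
```

Calling matches(None, 3) safely returns False without ever touching the attribute f.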

The keyword error indicates that an error occurred because

conditions were wrong for the procedure to have been called, and

the procedure immediately terminates. The calling procedure is

responsible for handling the error, and so we do not specify what

action to take.

Exercises

2.1-1

Using Figure 2.2 as a model, illustrate the operation of INSERTION-

SORT on an array initially containing the sequence 〈31, 41, 59, 26, 41,

58〉.

2.1-2

Consider the procedure SUM-ARRAY on the facing page. It computes

the sum of the n numbers in array A[1 : n]. State a loop invariant for this procedure, and use its initialization, maintenance, and termination

properties to show that the SUM-ARRAY procedure returns the sum of

the numbers in A[1 : n].


SUM-ARRAY(A, n)
1  sum = 0
2  for i = 1 to n
3      sum = sum + A[i]
4  return sum
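As a rough 0-indexed Python translation (my own sketch, not the book's), one candidate loop invariant can even be checked with an assertion: at the start of each iteration, the running total equals the sum of the elements already processed.

```python
def sum_array(A, n):
    """Sum of the n numbers in A[0 : n] (0-indexed version of SUM-ARRAY)."""
    total = 0
    for i in range(n):
        # Candidate loop invariant: total == A[0] + ... + A[i-1],
        # the sum of the elements already processed.
        assert total == sum(A[:i])
        total = total + A[i]
    return total
```

The assertion holds trivially before the first iteration (total is 0, an empty sum), is preserved by each iteration, and at termination (i = n) gives exactly the claim the exercise asks you to prove.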

2.1-3

Rewrite the INSERTION-SORT procedure to sort into monotonically

decreasing instead of monotonically increasing order.

2.1-4

Consider the searching problem:

Input: A sequence of n numbers 〈 a 1, a 2, … , an〉 stored in array A[1 : n]

and a value x.

Output: An index i such that x equals A[ i] or the special value NIL if x does not appear in A.

Write pseudocode for linear search, which scans through the array

from beginning to end, looking for x. Using a loop invariant, prove that

your algorithm is correct. Make sure that your loop invariant fulfills the

three necessary properties.

2.1-5

Consider the problem of adding two n-bit binary integers a and b, stored in two n-element arrays A[0 : n – 1] and B[0 : n – 1], where each element is either 0 or 1, a = Σ_{i=0}^{n–1} A[i]·2^i, and b = Σ_{i=0}^{n–1} B[i]·2^i. The sum c = a + b of the two integers should be stored in binary form in an (n + 1)-element array C[0 : n], where c = Σ_{i=0}^{n} C[i]·2^i. Write a procedure

ADD-BINARY-INTEGERS that takes as input arrays A and B, along

with the length n, and returns array C holding the sum.

2.2 Analyzing algorithms

Analyzing an algorithm has come to mean predicting the resources that the algorithm requires. You might consider resources such as memory,

communication bandwidth, or energy consumption. Most often,

however, you’ll want to measure computational time. If you analyze

several candidate algorithms for a problem, you can identify the most

efficient one. There might be more than just one viable candidate, but

you can often rule out several inferior algorithms in the process.

Before you can analyze an algorithm, you need a model of the

technology that it runs on, including the resources of that technology

and a way to express their costs. Most of this book assumes a generic

one-processor, random-access machine (RAM) model of computation

as the implementation technology, with the understanding that

algorithms are implemented as computer programs. In the RAM model,

instructions execute one after another, with no concurrent operations.

The RAM model assumes that each instruction takes the same amount

of time as any other instruction and that each data access—using the

value of a variable or storing into a variable—takes the same amount of

time as any other data access. In other words, in the RAM model each

instruction or data access takes a constant amount of time—even

indexing into an array. 9

Strictly speaking, we should precisely define the instructions of the

RAM model and their costs. To do so, however, would be tedious and

yield little insight into algorithm design and analysis. Yet we must be

careful not to abuse the RAM model. For example, what if a RAM had

an instruction that sorts? Then you could sort in just one step. Such a

RAM would be unrealistic, since such instructions do not appear in real

computers. Our guide, therefore, is how real computers are designed.

The RAM model contains instructions commonly found in real

computers: arithmetic (such as add, subtract, multiply, divide,

remainder, floor, ceiling), data movement (load, store, copy), and

control (conditional and unconditional branch, subroutine call and

return).

The data types in the RAM model are integer, floating point (for

storing real-number approximations), and character. Real computers do

not usually have a separate data type for the boolean values TRUE and

FALSE. Instead, they often test whether an integer value is 0 (FALSE)

or nonzero (TRUE), as in C. Although we typically do not concern

ourselves with precision for floating-point values in this book (many

numbers cannot be represented exactly in floating point), precision is

crucial for most applications. We also assume that each word of data

has a limit on the number of bits. For example, when working with

inputs of size n, we typically assume that integers are represented by c log₂ n bits for some constant c ≥ 1. We require c ≥ 1 so that each word can hold the value of n, enabling us to index the individual input

elements, and we restrict c to be a constant so that the word size does

not grow arbitrarily. (If the word size could grow arbitrarily, we could

store huge amounts of data in one word and operate on it all in

constant time—an unrealistic scenario.)

Real computers contain instructions not listed above, and such

instructions represent a gray area in the RAM model. For example, is

exponentiation a constant-time instruction? In the general case, no: to

compute xⁿ when x and n are general integers typically takes time logarithmic in n (see equation (31.34) on page 934), and you must worry

about whether the result fits into a computer word. If n is an exact power of 2, however, exponentiation can usually be viewed as a

constant-time operation. Many computers have a “shift left”

instruction, which in constant time shifts the bits of an integer by n

positions to the left. In most computers, shifting the bits of an integer

by 1 position to the left is equivalent to multiplying by 2, so that shifting

the bits by n positions to the left is equivalent to multiplying by 2ⁿ. Therefore, such computers can compute 2ⁿ in one constant-time instruction by shifting the integer 1 by n positions to the left, as long as n is no more than the number of bits in a computer word. We’ll try to avoid such gray areas in the RAM model and treat computing 2ⁿ and multiplying by 2ⁿ as constant-time operations when the result is small

enough to fit in a computer word.
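The shift trick can be sketched in a line of Python (the function name is mine, for illustration):

```python
def two_to_the(n):
    # One shift-left computes 2**n: shift the integer 1 left by n bit
    # positions. On real hardware this is a single constant-time
    # instruction as long as n is smaller than the word size; Python
    # integers are unbounded, so the expression works for any n >= 0.
    return 1 << n
```

For example, 1 << 10 gives 1024, the same value as 2**10.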

The RAM model does not account for the memory hierarchy that is

common in contemporary computers. It models neither caches nor

virtual memory. Several other computational models attempt to

account for memory-hierarchy effects, which are sometimes significant

in real programs on real machines. Section 11.5 and a handful of problems in this book examine memory-hierarchy effects, but for the

most part, the analyses in this book do not consider them. Models that

include the memory hierarchy are quite a bit more complex than the

RAM model, and so they can be difficult to work with. Moreover,

RAM-model analyses are usually excellent predictors of performance

on actual machines.

Although it is often straightforward to analyze an algorithm in the

RAM model, sometimes it can be quite a challenge. You might need to

employ mathematical tools such as combinatorics, probability theory,

algebraic dexterity, and the ability to identify the most significant terms

in a formula. Because an algorithm might behave differently for each

possible input, we need a means for summarizing that behavior in

simple, easily understood formulas.

Analysis of insertion sort

How long does the INSERTION-SORT procedure take? One way to tell

would be for you to run it on your computer and time how long it takes

to run. Of course, you’d first have to implement it in a real programming

language, since you cannot run our pseudocode directly. What would

such a timing test tell you? You would find out how long insertion sort

takes to run on your particular computer, on that particular input,

under the particular implementation that you created, with the

particular compiler or interpreter that you ran, with the particular

libraries that you linked in, and with the particular background tasks

that were running on your computer concurrently with your timing test

(such as checking for incoming information over a network). If you run

insertion sort again on your computer with the same input, you might

even get a different timing result. From running just one

implementation of insertion sort on just one computer and on just one

input, what would you be able to determine about insertion sort’s

running time if you were to give it a different input, if you were to run it

on a different computer, or if you were to implement it in a different

programming language? Not much. We need a way to predict, given a

new input, how long insertion sort will take.

Instead of timing a run, or even several runs, of insertion sort, we

can determine how long it takes by analyzing the algorithm itself. We’ll

examine how many times it executes each line of pseudocode and how

long each line of pseudocode takes to run. We’ll first come up with a

precise but complicated formula for the running time. Then, we’ll distill

the important part of the formula using a convenient notation that can

help us compare the running times of different algorithms for the same

problem.

How do we analyze insertion sort? First, let’s acknowledge that the

running time depends on the input. You shouldn’t be terribly surprised

that sorting a thousand numbers takes longer than sorting three

numbers. Moreover, insertion sort can take different amounts of time to

sort two input arrays of the same size, depending on how nearly sorted

they already are. Even though the running time can depend on many

features of the input, we’ll focus on the one that has been shown to have

the greatest effect, namely the size of the input, and describe the

running time of a program as a function of the size of its input. To do

so, we need to define the terms “running time” and “input size” more

carefully. We also need to be clear about whether we are discussing the

running time for an input that elicits the worst-case behavior, the best-

case behavior, or some other case.

The best notion for input size depends on the problem being studied.

For many problems, such as sorting or computing discrete Fourier

transforms, the most natural measure is the number of items in the input

—for example, the number n of items being sorted. For many other

problems, such as multiplying two integers, the best measure of input

size is the total number of bits needed to represent the input in ordinary

binary notation. Sometimes it is more appropriate to describe the size of

the input with more than just one number. For example, if the input to

an algorithm is a graph, we usually characterize the input size by both

the number of vertices and the number of edges in the graph. We’ll

indicate which input size measure is being used with each problem we

study.

The running time of an algorithm on a particular input is the number of instructions and data accesses executed. How we account for these

costs should be independent of any particular computer, but within the

framework of the RAM model. For the moment, let us adopt the

following view. A constant amount of time is required to execute each

line of our pseudocode. One line might take more or less time than

another line, but we’ll assume that each execution of the kth line takes

ck time, where ck is a constant. This viewpoint is in keeping with the

RAM model, and it also reflects how the pseudocode would be

implemented on most actual computers. 10

Let’s analyze the INSERTION-SORT procedure. As promised, we’ll

start by devising a precise formula that uses the input size and all the

statement costs ck. This formula turns out to be messy, however. We’ll

then switch to a simpler notation that is more concise and easier to use.

This simpler notation makes clear how to compare the running times of

algorithms, especially as the size of the input increases.

To analyze the INSERTION-SORT procedure, let’s view it on the

following page with the time cost of each statement and the number of

times each statement is executed. For each i = 2, 3, … , n, let ti denote the number of times the while loop test in line 5 is executed for that

value of i. When a for or while loop exits in the usual way—because the

test in the loop header comes up FALSE—the test is executed one time

more than the loop body. Because comments are not executable

statements, assume that they take no time.

The running time of the algorithm is the sum of running times for

each statement executed. A statement that takes ck steps to execute and

executes m times contributes ckm to the total running time. 11 We usually denote the running time of an algorithm on an input of size n by

T ( n). To compute T ( n), the running time of INSERTION-SORT on an input of n values, we sum the products of the cost and times columns, obtaining

INSERTION-SORT(A, n)                                            cost  times
1  for i = 2 to n                                               c1    n
2      key = A[i]                                               c2    n – 1
3      // Insert A[i] into the sorted subarray A[1 : i – 1].    0     n – 1
4      j = i – 1                                                c4    n – 1
5      while j > 0 and A[j] > key                               c5    Σ_{i=2}^{n} t_i
6          A[j + 1] = A[j]                                      c6    Σ_{i=2}^{n} (t_i – 1)
7          j = j – 1                                            c7    Σ_{i=2}^{n} (t_i – 1)
8  A[j + 1] = key                                               c8    n – 1

Even for inputs of a given size, an algorithm’s running time may depend on which input of that size is given. For example, in INSERTION-SORT, the best case occurs when the array is already sorted. In this case, each time that line 5 executes, the value of key—the value originally in A[i]—is already greater than or equal to all values in A[1 : i – 1], so that the while loop of lines 5–7 always exits upon the first test in line 5. Therefore, we have that t_i = 1 for i = 2, 3, … , n, and the best-case running time is given by

T(n) = c1·n + c2(n – 1) + c4(n – 1) + c5(n – 1) + c8(n – 1)
     = (c1 + c2 + c4 + c5 + c8)n – (c2 + c4 + c5 + c8).                    (2.1)

We can express this running time as an + b for constants a and b that depend on the statement costs ck (where a = c1 + c2 + c4 + c5 + c8 and b = –(c2 + c4 + c5 + c8)). The running time is thus a linear function of n.

The worst case arises when the array is in reverse sorted order—that is, it starts out in decreasing order. The procedure must compare each element A[i] with each element in the entire sorted subarray A[1 : i – 1], and so t_i = i for i = 2, 3, … , n. (The procedure finds that A[j] > key every time in line 5, and the while loop exits only when j reaches 0.) Noting that

Σ_{i=2}^{n} i = n(n + 1)/2 – 1

and

Σ_{i=2}^{n} (i – 1) = n(n – 1)/2,

we find that in the worst case, the running time of INSERTION-SORT is

T(n) = c1·n + c2(n – 1) + c4(n – 1) + c5(n(n + 1)/2 – 1)
         + c6(n(n – 1)/2) + c7(n(n – 1)/2) + c8(n – 1)
     = (c5/2 + c6/2 + c7/2)n² + (c1 + c2 + c4 + c5/2 – c6/2 – c7/2 + c8)n
         – (c2 + c4 + c5 + c8).                                            (2.2)

We can express this worst-case running time as an² + bn + c for constants a, b, and c that again depend on the statement costs ck (now, a = c5/2 + c6/2 + c7/2, b = c1 + c2 + c4 + c5/2 – c6/2 – c7/2 + c8, and c = –(c2 + c4 + c5 + c8)). The running time is thus a quadratic function of n. Typically, as in insertion sort, the running time of an algorithm is

fixed for a given input, although we’ll also see some interesting

“randomized” algorithms whose behavior can vary even for a fixed

input.
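The values of t_i can be checked empirically. Here is a rough 0-indexed Python translation of INSERTION-SORT (the function name and the counter are my additions, not part of the book's pseudocode), instrumented to count how many times the while-loop test runs:

```python
def insertion_sort_tests(A):
    """Insertion sort, 0-indexed; also returns the total number of
    executions of the while-loop test, i.e. the sum of the t_i values."""
    A = list(A)
    tests = 0
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        tests += 1                      # the first execution of the test
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
            tests += 1                  # each re-execution of the test
        A[j + 1] = key
    return A, tests
```

On an already sorted array of n elements the count is n – 1 (one test per iteration of the for loop), while on a reverse-sorted array it is n(n + 1)/2 – 1, matching the best-case and worst-case analyses above.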

Worst-case and average-case analysis

Our analysis of insertion sort looked at both the best case, in which the input array was already sorted, and the worst case, in which the input

array was reverse sorted. For the remainder of this book, though, we’ll

usually (but not always) concentrate on finding only the worst-case

running time, that is, the longest running time for any input of size n.

Why? Here are three reasons:

The worst-case running time of an algorithm gives an upper

bound on the running time for any input. If you know it, then you

have a guarantee that the algorithm never takes any longer. You

need not make some educated guess about the running time and

hope that it never gets much worse. This feature is especially

important for real-time computing, in which operations must

complete by a deadline.

For some algorithms, the worst case occurs fairly often. For

example, in searching a database for a particular piece of

information, the searching algorithm’s worst case often occurs

when the information is not present in the database. In some

applications, searches for absent information may be frequent.

The “average case” is often roughly as bad as the worst case.

Suppose that you run insertion sort on an array of n randomly

chosen numbers. How long does it take to determine where in

subarray A[1 : i – 1] to insert element A[ i]? On average, half the elements in A[1 : i – 1] are less than A[ i], and half the elements are greater. On average, therefore, A[ i] is compared with just half of

the subarray A[1 : i – 1], and so ti is about i/2. The resulting average-case running time turns out to be a quadratic function of

the input size, just like the worst-case running time.

In some particular cases, we’ll be interested in the average-case

running time of an algorithm. We’ll see the technique of probabilistic

analysis applied to various algorithms throughout this book. The scope

of average-case analysis is limited, because it may not be apparent what

constitutes an “average” input for a particular problem. Often, we’ll

assume that all inputs of a given size are equally likely. In practice, this

assumption may be violated, but we can sometimes use a randomized

algorithm, which makes random choices, to allow a probabilistic analysis and yield an expected running time. We explore randomized

algorithms more in Chapter 5 and in several other subsequent chapters.

Order of growth

In order to ease our analysis of the INSERTION-SORT procedure, we

used some simplifying abstractions. First, we ignored the actual cost of

each statement, using the constants ck to represent these costs. Still, the

best-case and worst-case running times in equations (2.1) and (2.2) are

rather unwieldy. The constants in these expressions give us more detail

than we really need. That’s why we also expressed the best-case running

time as an + b for constants a and b that depend on the statement costs ck and why we expressed the worst-case running time as an² + bn + c for constants a, b, and c that depend on the statement costs. We thus ignored not only the actual statement costs, but also the abstract costs ck.

Let’s now make one more simplifying abstraction: it is the rate of

growth, or order of growth, of the running time that really interests us.

We therefore consider only the leading term of a formula (e.g., an²), since the lower-order terms are relatively insignificant for large values of

n. We also ignore the leading term’s constant coefficient, since constant

factors are less significant than the rate of growth in determining

computational efficiency for large inputs. For insertion sort’s worst-case

running time, when we ignore the lower-order terms and the leading

term’s constant coefficient, only the factor of n² from the leading term remains. That factor, n², is by far the most important part of the running time. For example, suppose that an algorithm implemented on a particular machine takes n²/100 + 100n + 17 microseconds on an input of size n. Although the coefficients of 1/100 for the n² term and 100 for the n term differ by four orders of magnitude, the n²/100 term dominates the 100n term once n exceeds 10,000. Although 10,000 might

seem large, it is smaller than the population of an average town. Many

real-world problems have much larger input sizes.
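A quick sketch of the dominance claim, using the hypothetical running-time formula from the text:

```python
def t_micro(n):
    # The example running time from the text, in microseconds:
    # n**2/100 + 100*n + 17.
    return n**2 / 100 + 100 * n + 17

# The quadratic term n**2/100 equals the linear term 100*n exactly
# at n = 10,000 and exceeds it for every larger n.
```

At n = 10,000 both terms contribute one million microseconds each; doubling n to 20,000 quadruples the quadratic term but only doubles the linear one.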

To highlight the order of growth of the running time, we have a special notation that uses the Greek letter Θ (theta). We write that insertion sort has a worst-case running time of Θ(n²) (pronounced “theta of n-squared” or just “theta n-squared”). We also write that insertion sort has a best-case running time of Θ(n) (“theta of n” or “theta n”). For now, think of Θ-notation as saying “roughly proportional when n is large,” so that Θ(n²) means “roughly proportional to n² when n is large” and Θ(n) means “roughly proportional to n when n is large.” We’ll use Θ-notation informally in this chapter and define it precisely in Chapter 3.

We usually consider one algorithm to be more efficient than another

if its worst-case running time has a lower order of growth. Due to

constant factors and lower-order terms, an algorithm whose running

time has a higher order of growth might take less time for small inputs

than an algorithm whose running time has a lower order of growth. But

on large enough inputs, an algorithm whose worst-case running time is

Θ(n²), for example, takes less time in the worst case than an algorithm whose worst-case running time is Θ(n³). Regardless of the constants hidden by the Θ-notation, there is always some number, say n₀, such that for all input sizes n ≥ n₀, the Θ(n²) algorithm beats the Θ(n³) algorithm in the worst case.

Exercises

2.2-1

Express the function n³/1000 + 100n² – 100n + 3 in terms of Θ-notation.

2.2-2

Consider sorting n numbers stored in array A[1 : n] by first finding the smallest element of A[1 : n] and exchanging it with the element in A[1].

Then find the smallest element of A[2 : n], and exchange it with A[2].

Then find the smallest element of A[3 : n], and exchange it with A[3].

Continue in this manner for the first n – 1 elements of A. Write

pseudocode for this algorithm, which is known as selection sort. What loop invariant does this algorithm maintain? Why does it need to run

for only the first n – 1 elements, rather than for all n elements? Give the worst-case running time of selection sort in Θ-notation. Is the best-case

running time any better?

2.2-3

Consider linear search again (see Exercise 2.1-4). How many elements of

the input array need to be checked on the average, assuming that the

element being searched for is equally likely to be any element in the

array? How about in the worst case? Using Θ-notation, give the average-

case and worst-case running times of linear search. Justify your answers.

2.2-4

How can you modify any sorting algorithm to have a good best-case

running time?

2.3 Designing algorithms

You can choose from a wide range of algorithm design techniques.

Insertion sort uses the incremental method: for each element A[ i], insert it into its proper place in the subarray A[1 : i], having already sorted the subarray A[1 : i – 1].

This section examines another design method, known as “divide-

and-conquer,” which we explore in more detail in Chapter 4. We’ll use divide-and-conquer to design a sorting algorithm whose worst-case

running time is much less than that of insertion sort. One advantage of

using an algorithm that follows the divide-and-conquer method is that

analyzing its running time is often straightforward, using techniques

that we’ll explore in Chapter 4.

2.3.1 The divide-and-conquer method

Many useful algorithms are recursive in structure: to solve a given

problem, they recurse (call themselves) one or more times to handle

closely related subproblems. These algorithms typically follow the

divide-and-conquer method: they break the problem into several subproblems that are similar to the original problem but smaller in size,

solve the subproblems recursively, and then combine these solutions to

create a solution to the original problem.

In the divide-and-conquer method, if the problem is small enough—

the base case—you just solve it directly without recursing. Otherwise—

the recursive case—you perform three characteristic steps:

Divide the problem into one or more subproblems that are smaller

instances of the same problem.

Conquer the subproblems by solving them recursively.

Combine the subproblem solutions to form a solution to the original

problem.

The merge sort algorithm closely follows the divide-and-conquer

method. In each step, it sorts a subarray A[ p : r], starting with the entire array A[1 : n] and recursing down to smaller and smaller subarrays.

Here is how merge sort operates:

Divide the subarray A[ p : r] to be sorted into two adjacent subarrays, each of half the size. To do so, compute the midpoint q of A[ p : r]

(taking the average of p and r), and divide A[ p : r] into subarrays A[ p : q] and A[ q + 1 : r].

Conquer by sorting each of the two subarrays A[ p : q] and A[ q + 1 : r]

recursively using merge sort.

Combine by merging the two sorted subarrays A[ p : q] and A[ q + 1 : r]

back into A[ p : r], producing the sorted answer.

The recursion “bottoms out”—it reaches the base case—when the

subarray A[ p : r] to be sorted has just 1 element, that is, when p equals r.

As we noted in the initialization argument for INSERTION-SORT’s

loop invariant, a subarray comprising just a single element is always

sorted.

The key operation of the merge sort algorithm occurs in the

“combine” step, which merges two adjacent, sorted subarrays. The

merge operation is performed by the auxiliary procedure MERGE( A, p,

q, r) on the following page, where A is an array and p, q, and r are indices into the array such that p ≤ q < r. The procedure assumes that the adjacent subarrays A[ p : q] and A[ q + 1 : r] were already recursively sorted. It merges the two sorted subarrays to form a single sorted

subarray that replaces the current subarray A[ p : r].

To understand how the MERGE procedure works, let’s return to our

card-playing motif. Suppose that you have two piles of cards face up on

a table. Each pile is sorted, with the smallest-value cards on top. You

wish to merge the two piles into a single sorted output pile, which is to

be face down on the table. The basic step consists of choosing the

smaller of the two cards on top of the face-up piles, removing it from its

pile—which exposes a new top card—and placing this card face down

onto the output pile. Repeat this step until one input pile is empty, at

which time you can just take the remaining input pile and flip over the

entire pile, placing it face down onto the output pile.

Let’s think about how long it takes to merge two sorted piles of

cards. Each basic step takes constant time, since you are comparing just

the two top cards. If the two sorted piles that you start with each have

n/2 cards, then the number of basic steps is at least n/2 (since in whichever pile was emptied, every card was found to be smaller than

some card from the other pile) and at most n (actually, at most n – 1,

since after n – 1 basic steps, one of the piles must be empty). With each

basic step taking constant time and the total number of basic steps

being between n/2 and n, we can say that merging takes time roughly proportional to n. That is, merging takes Θ( n) time.

In detail, the MERGE procedure works as follows. It copies the two

subarrays A[ p : q] and A[ q + 1 : r] into temporary arrays L and R (“left”

and “right”), and then it merges the values in L and R back into A[ p : r].

Lines 1 and 2 compute the lengths nL and nR of the subarrays A[ p : q]

and A[ q + 1 : r], respectively. Then line 3 creates arrays L[0 : nL – 1] and R[0 : nR – 1] with respective lengths nL and nR. 12 The for loop of lines 4–5 copies the subarray A[ p : q] into L, and the for loop of lines 6–7

copies the subarray A[ q + 1 : r] into R.

MERGE(A, p, q, r)
 1  nL = q – p + 1          // length of A[p : q]
 2  nR = r – q              // length of A[q + 1 : r]
 3  let L[0 : nL – 1] and R[0 : nR – 1] be new arrays
 4  for i = 0 to nL – 1     // copy A[p : q] into L[0 : nL – 1]
 5      L[i] = A[p + i]
 6  for j = 0 to nR – 1     // copy A[q + 1 : r] into R[0 : nR – 1]
 7      R[j] = A[q + j + 1]
 8  i = 0                   // i indexes the smallest remaining element in L
 9  j = 0                   // j indexes the smallest remaining element in R
10  k = p                   // k indexes the location in A to fill
11  // As long as each of the arrays L and R contains an unmerged element,
    // copy the smallest unmerged element back into A[p : r].
12  while i < nL and j < nR
13      if L[i] ≤ R[j]
14          A[k] = L[i]
15          i = i + 1
16      else A[k] = R[j]
17          j = j + 1
18      k = k + 1
19  // Having gone through one of L and R entirely, copy the
    // remainder of the other to the end of A[p : r].
20  while i < nL
21      A[k] = L[i]
22      i = i + 1
23      k = k + 1
24  while j < nR
25      A[k] = R[j]
26      j = j + 1
27      k = k + 1


Lines 8–18, illustrated in Figure 2.3, perform the basic steps. The while loop of lines 12–18 repeatedly identifies the smallest value in L

and R that has yet to be copied back into A[ p : r] and copies it back in.

As the comments indicate, the index k gives the position of A that is being filled in, and the indices i and j give the positions in L and R, respectively, of the smallest remaining values. Eventually, either all of L

or all of R is copied back into A[ p : r], and this loop terminates. If the loop terminates because all of R has been copied back, that is, because j

equals nR, then i is still less than nL, so that some of L has yet to be copied back, and these values are the greatest in both L and R. In this case, the while loop of lines 20–23 copies these remaining values of L

into the last few positions of A[ p : r]. Because j equals nR, the while loop of lines 24–27 iterates 0 times. If instead the while loop of lines 12–18

terminates because i equals nL, then all of L has already been copied back into A[ p : r], and the while loop of lines 24–27 copies the remaining values of R back into the end of A[ p : r].


Figure 2.3 The operation of the while loop in lines 8–18 in the call MERGE( A, 9, 12, 16), when the subarray A[9 : 16] contains the values 〈2, 4, 6, 7, 1, 2, 3, 5〉. After allocating and copying into the arrays L and R, the array L contains 〈2, 4, 6, 7〉, and the array R contains 〈1, 2, 3, 5〉. Tan positions in A contain their final values, and tan positions in L and R contain values that have yet to be copied back into A. Taken together, the tan positions always comprise the values originally in A[9 : 16]. Blue positions in A contain values that will be copied over, and dark positions in L and R contain values that have already been copied back into A. (a)–(g) The arrays A, L, and R, and their respective indices k, i, and j prior to each iteration of the loop of lines 12–18. At the point in part (g), all values in R have been copied back into A (indicated by j equaling the length of R), and so the while loop in lines 12–18 terminates. (h) The arrays and indices at termination. The while loops of lines 20–23 and 24–27 copied back into A the remaining values in L and R, which are the largest values originally in A[9 : 16]. Here, lines 20–

23 copied L[2 : 3] into A[15 : 16], and because all values in R had already been copied back into A, the while loop of lines 24–27 iterated 0 times. At this point, the subarray in A[9 : 16] is sorted.

To see that the MERGE procedure runs in Θ(n) time, where n = r – p + 1,13 observe that each of lines 1–3 and 8–10 takes constant time, and the for loops of lines 4–7 take Θ(nL + nR) = Θ(n) time.14 To account for the three while loops of lines 12–18, 20–23, and 24–27, observe that each

iteration of these loops copies exactly one value from L or R back into A and that every value is copied back into A exactly once. Therefore, these three loops together make a total of n iterations. Since each

iteration of each of the three loops takes constant time, the total time

spent in these three loops is Θ( n).

We can now use the MERGE procedure as a subroutine in the merge

sort algorithm. The procedure MERGE-SORT( A, p, r) on the facing page sorts the elements in the subarray A[ p : r]. If p equals r, the subarray has just 1 element and is therefore already sorted. Otherwise,

we must have p < r, and MERGE-SORT runs the divide, conquer, and

combine steps. The divide step simply computes an index q that

partitions A[p : r] into two adjacent subarrays: A[p : q], containing ⌈n/2⌉ elements, and A[q + 1 : r], containing ⌊n/2⌋ elements.15 The initial call MERGE-SORT(A, 1, n) sorts the entire array A[1 : n].

Figure 2.4 illustrates the operation of the procedure for n = 8, showing also the sequence of divide and merge steps. The algorithm

recursively divides the array down to 1-element subarrays. The combine

steps merge pairs of 1-element subarrays to form sorted subarrays of length 2, merge those to form sorted subarrays of length 4, and merge those to form the final sorted subarray of length 8. If n is not an exact

power of 2, then some divide steps create subarrays whose lengths differ

by 1. (For example, when dividing a subarray of length 7, one subarray

has length 4 and the other has length 3.) Regardless of the lengths of the

two subarrays being merged, the time to merge a total of n items is Θ( n).

MERGE-SORT(A, p, r)
1  if p ≥ r                  // zero or one element?
2      return
3  q = ⌊(p + r)/2⌋           // midpoint of A[p : r]
4  MERGE-SORT(A, p, q)       // recursively sort A[p : q]
5  MERGE-SORT(A, q + 1, r)   // recursively sort A[q + 1 : r]
6  // Merge A[p : q] and A[q + 1 : r] into A[p : r].
7  MERGE(A, p, q, r)
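For readers who want to run the algorithm, here is one possible transcription of MERGE-SORT into Python (a sketch, not the book's code), together with a matching merge step that behaves as described above: copy the two sorted runs out, then repeatedly take the smaller front element. Python's 0-origin indexing means the index arithmetic differs slightly from the 1-origin pseudocode.

```python
def merge(A, p, q, r):
    """Merge the sorted runs A[p..q] and A[q+1..r] (inclusive bounds) in place."""
    L = A[p:q + 1]          # copy of the left sorted run
    R = A[q + 1:r + 1]      # copy of the right sorted run
    i = j = 0               # next unmerged element of L and of R
    k = p                   # next position of A to fill
    while i < len(L) and j < len(R):
        if L[i] <= R[j]:
            A[k] = L[i]
            i += 1
        else:
            A[k] = R[j]
            j += 1
        k += 1
    # One run is exhausted; copy the other's leftovers to the end.
    A[k:r + 1] = L[i:] + R[j:]   # at most one of these slices is nonempty

def merge_sort(A, p=0, r=None):
    """Sort A[p..r] (inclusive bounds) in place by merge sort."""
    if r is None:
        r = len(A) - 1
    if p >= r:               # zero or one element: already sorted
        return
    q = (p + r) // 2         # midpoint
    merge_sort(A, p, q)      # recursively sort the left half
    merge_sort(A, q + 1, r)  # recursively sort the right half
    merge(A, p, q, r)        # combine the two sorted halves
```

Calling merge_sort(A) on a Python list sorts it in place, mirroring the initial call MERGE-SORT(A, 1, n).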

2.3.2 Analyzing divide-and-conquer algorithms

When an algorithm contains a recursive call, you can often describe its

running time by a recurrence equation or recurrence, which describes the overall running time on a problem of size n in terms of the running time

of the same algorithm on smaller inputs. You can then use mathematical

tools to solve the recurrence and provide bounds on the performance of

the algorithm.

A recurrence for the running time of a divide-and-conquer algorithm

falls out from the three steps of the basic method. As we did for

insertion sort, let T(n) be the worst-case running time on a problem of size n. If the problem size is small enough, say n < n₀ for some constant n₀ > 0, the straightforward solution takes constant time, which we write


as Θ(1).16 Suppose that the division of the problem yields a subproblems, each of size n/b, that is, 1/b the size of the original. For merge sort, both a and b are 2, but we’ll see other divide-and-conquer algorithms in which a ≠ b. It takes T(n/b) time to solve one subproblem of size n/b, and so it takes aT(n/b) time to solve all a of them. If it takes D(n) time to divide the problem into subproblems and C(n) time to combine the solutions to the subproblems into the solution to the original problem, we get the recurrence

T(n) = Θ(1) if n < n₀,
T(n) = aT(n/b) + D(n) + C(n) otherwise.

Chapter 4 shows how to solve common recurrences of this form.


Figure 2.4 The operation of merge sort on the array A with length 8 that initially contains the sequence 〈12, 3, 7, 9, 14, 6, 11, 2〉. The indices p, q, and r into each subarray appear above their values. Numbers in italics indicate the order in which the MERGE-SORT and MERGE

procedures are called following the initial call of MERGE-SORT( A, 1, 8).

Sometimes, the n/b size of the divide step isn’t an integer. For example, the MERGE-SORT procedure divides a problem of size n into subproblems of sizes ⌈n/2⌉ and ⌊n/2⌋. Since the difference between ⌈n/2⌉ and ⌊n/2⌋ is at most 1, which for large n is much smaller than the effect of dividing n by 2, we’ll squint a little and just call them both size n/2.

As Chapter 4 will discuss, this simplification of ignoring floors and


ceilings does not generally affect the order of growth of a solution to a

divide-and-conquer recurrence.

Another convention we’ll adopt is to omit a statement of the base

cases of the recurrence, which we’ll also discuss in more detail in

Chapter 4. The reason is that the base cases are pretty much always T(n) = Θ(1) if n < n₀ for some constant n₀ > 0. That’s because the running time of an algorithm on an input of constant size is constant.

We save ourselves a lot of extra writing by adopting this convention.

Analysis of merge sort

Here’s how to set up the recurrence for T ( n), the worst-case running time of merge sort on n numbers.

Divide: The divide step just computes the middle of the subarray, which

takes constant time. Thus, D( n) = Θ(1).

Conquer: Recursively solving two subproblems, each of size n/2,

contributes 2T(n/2) to the running time (ignoring the floors and ceilings, as we discussed).

Combine: Since the MERGE procedure on an n-element subarray takes

Θ( n) time, we have C( n) = Θ( n).

When we add the functions D( n) and C( n) for the merge sort analysis, we are adding a function that is Θ( n) and a function that is Θ(1). This sum is a linear function of n. That is, it is roughly

proportional to n when n is large, and so merge sort’s dividing and combining times together are Θ( n). Adding Θ( n) to the 2 T ( n/2) term from the conquer step gives the recurrence for the worst-case running

time T(n) of merge sort:

T(n) = 2T(n/2) + Θ(n).    (2.3)

Chapter 4 presents the “master theorem,” which shows that T(n) = Θ(n lg n).17 Compared with insertion sort, whose worst-case running time is Θ(n²), merge sort trades away a factor of n for a factor of lg n. Because the logarithm function grows more slowly than any linear function,

that’s a good trade. For large enough inputs, merge sort, with its Θ( n lg


n) worst-case running time, outperforms insertion sort, whose worst-

case running time is Θ( n 2).

We do not need the master theorem, however, to understand

intuitively why the solution to recurrence (2.3) is T(n) = Θ(n lg n). For simplicity, assume that n is an exact power of 2 and that the implicit base case is n = 1. Then recurrence (2.3) is essentially

T(n) = c₁ if n = 1,
T(n) = 2T(n/2) + c₂n if n > 1,    (2.4)

where the constant c₁ > 0 represents the time required to solve a problem of size 1, and c₂ > 0 is the time per array element of the divide and combine steps.18

Figure 2.5 illustrates one way of figuring out the solution to recurrence (2.4). Part (a) of the figure shows T(n), which part (b) expands into an equivalent tree representing the recurrence. The c₂n term denotes the cost of dividing and combining at the top level of recursion, and the two subtrees of the root are the two smaller recurrences T(n/2). Part (c) shows this process carried one step further by expanding T(n/2). The cost for dividing and combining at each of the two nodes at the second level of recursion is c₂n/2. Continue to expand each node in the tree by breaking it into its constituent parts as determined by the recurrence, until the problem sizes get down to 1, each with a cost of c₁. Part (d) shows the resulting recursion tree.

Next, add the costs across each level of the tree. The top level has total cost c₂n, the next level down has total cost c₂(n/2) + c₂(n/2) = c₂n, the level after that has total cost c₂(n/4) + c₂(n/4) + c₂(n/4) + c₂(n/4) = c₂n, and so on. Each level has twice as many nodes as the level above, but each node contributes only half the cost of a node from the level above. From one level to the next, doubling and halving cancel each other out, so that the cost across each level is the same: c₂n. In general, the level that is i levels below the top has 2ⁱ nodes, each contributing a cost of c₂(n/2ⁱ), so that the ith level below the top has total cost 2ⁱ · c₂(n/2ⁱ) = c₂n. The bottom level has n nodes, each contributing a cost of c₁, for a total cost of c₁n.

The total number of levels of the recursion tree in Figure 2.5 is lg n + 1, where n is the number of leaves, corresponding to the input size. An informal inductive argument justifies this claim. The base case occurs when n = 1, in which case the tree has only 1 level. Since lg 1 = 0, we have that lg n + 1 gives the correct number of levels. Now assume as an inductive hypothesis that the number of levels of a recursion tree with 2ⁱ leaves is lg 2ⁱ + 1 = i + 1 (since for any value of i, we have that lg 2ⁱ = i). Because we assume that the input size is an exact power of 2, the next input size to consider is 2ⁱ⁺¹. A tree with n = 2ⁱ⁺¹ leaves has 1 more level than a tree with 2ⁱ leaves, and so the total number of levels is (i + 1) + 1 = lg 2ⁱ⁺¹ + 1.


Figure 2.5 How to construct a recursion tree for the recurrence (2.4). Part (a) shows T(n), which progressively expands in (b)–(d) to form the recursion tree. The fully expanded tree in part (d) has lg n + 1 levels. Each level above the leaves contributes a total cost of c₂n, and the leaf level contributes c₁n. The total cost, therefore, is c₂n lg n + c₁n = Θ(n lg n).


To compute the total cost represented by the recurrence (2.4), simply

add up the costs of all the levels. The recursion tree has lg n + 1 levels.

The levels above the leaves each cost c₂n, and the leaf level costs c₁n, for a total cost of c₂n lg n + c₁n = Θ(n lg n).
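The recursion-tree argument can also be checked numerically. The sketch below (with c₁ = 3 and c₂ = 5 chosen arbitrarily for illustration) evaluates recurrence (2.4) directly and confirms that, for exact powers of 2, it matches the closed form c₂n lg n + c₁n.

```python
from math import log2

def T(n, c1, c2):
    """Evaluate recurrence (2.4) exactly: T(1) = c1, T(n) = 2*T(n/2) + c2*n."""
    if n == 1:
        return c1
    return 2 * T(n // 2, c1, c2) + c2 * n

# For n an exact power of 2, T(n) equals c2*n*lg(n) + c1*n exactly.
c1, c2 = 3, 5   # arbitrary illustrative constants
for k in range(11):
    n = 2 ** k
    assert T(n, c1, c2) == c2 * n * log2(n) + c1 * n
```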

Exercises

2.3-1

Using Figure 2.4 as a model, illustrate the operation of merge sort on an array initially containing the sequence 〈3, 41, 52, 26, 38, 57, 9, 49〉.

2.3-2

The test in line 1 of the MERGE-SORT procedure reads “if p ≥ r” rather than “if p == r.” If MERGE-SORT is called with p > r, then the subarray A[p : r] is empty. Argue that as long as the initial call of MERGE-SORT(A, 1, n) has n ≥ 1, the test “if p == r” suffices to ensure that no recursive call has p > r.

2.3-3

State a loop invariant for the while loop of lines 12–18 of the MERGE

procedure. Show how to use it, along with the while loops of lines 20–23

and 24–27, to prove that the MERGE procedure is correct.

2.3-4

Use mathematical induction to show that when n ≥ 2 is an exact power of 2, the solution of the recurrence

T(n) = 2 if n = 2,
T(n) = 2T(n/2) + n if n > 2

is T(n) = n lg n.
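Before attempting the inductive proof, it can help to see the claim hold numerically. The recurrence in this exercise has base case T(2) = 2 and recursive case T(n) = 2T(n/2) + n; the sketch below checks T(n) = n lg n for small powers of 2.

```python
from math import log2

def T(n):
    """T(2) = 2; T(n) = 2*T(n/2) + n for n > 2, with n an exact power of 2."""
    return 2 if n == 2 else 2 * T(n // 2) + n

for k in range(1, 12):
    n = 2 ** k
    assert T(n) == n * log2(n)
```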

2.3-5

You can also think of insertion sort as a recursive algorithm. In order to

sort A[1 : n], recursively sort the subarray A[1 : n – 1] and then insert A[ n] into the sorted subarray A[1 : n – 1]. Write pseudocode for this

recursive version of insertion sort. Give a recurrence for its worst-case running time.
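One way to express this recursion in executable form is the following Python sketch (0-origin indexing; the insertion step mirrors the shifting loop of INSERTION-SORT):

```python
def recursive_insertion_sort(A, n=None):
    """Sort A[0..n-1] in place: recursively sort A[0..n-2], then insert A[n-1]."""
    if n is None:
        n = len(A)
    if n <= 1:                    # zero or one element: already sorted
        return
    recursive_insertion_sort(A, n - 1)
    key = A[n - 1]
    j = n - 2
    while j >= 0 and A[j] > key:  # shift larger elements one position right
        A[j + 1] = A[j]
        j -= 1
    A[j + 1] = key
```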

2.3-6

Referring back to the searching problem (see Exercise 2.1-4), observe

that if the subarray being searched is already sorted, the searching

algorithm can check the midpoint of the subarray against v and

eliminate half of the subarray from further consideration. The binary

search algorithm repeats this procedure, halving the size of the

remaining portion of the subarray each time. Write pseudocode, either

iterative or recursive, for binary search. Argue that the worst-case

running time of binary search is Θ(lg n).
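As an illustration of the halving idea described above, here is an iterative Python sketch of binary search over a sorted list (one possible form; it returns None on failure, an arbitrary convention for this sketch):

```python
def binary_search(A, v):
    """Return an index i with A[i] == v in sorted list A, or None if absent."""
    lo, hi = 0, len(A) - 1
    while lo <= hi:
        mid = (lo + hi) // 2   # check the midpoint of the remaining range
        if A[mid] == v:
            return mid
        elif A[mid] < v:
            lo = mid + 1       # eliminate the left half
        else:
            hi = mid - 1       # eliminate the right half
    return None
```

Each iteration discards half of the remaining range, so the loop executes O(lg n) times.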

2.3-7

The while loop of lines 5–7 of the INSERTION-SORT procedure in

Section 2.1 uses a linear search to scan (backward) through the sorted subarray A[1 : i – 1]. What if insertion sort used a binary search (see Exercise 2.3-6) instead of a linear search? Would that improve the

overall worst-case running time of insertion sort to Θ( n lg n)?

2.3-8

Describe an algorithm that, given a set S of n integers and another integer x, determines whether S contains two elements that sum to exactly x. Your algorithm should take Θ( n lg n) time in the worst case.

Problems

2-1 Insertion sort on small arrays in merge sort

Although merge sort runs in Θ(n lg n) worst-case time and insertion sort runs in Θ(n²) worst-case time, the constant factors in insertion sort can

make it faster in practice for small problem sizes on many machines.

Thus it makes sense to coarsen the leaves of the recursion by using insertion sort within merge sort when subproblems become sufficiently

small. Consider a modification to merge sort in which n/ k sublists of


length k are sorted using insertion sort and then merged using the

standard merging mechanism, where k is a value to be determined.

a. Show that insertion sort can sort the n/ k sublists, each of length k, in Θ( nk) worst-case time.

b. Show how to merge the sublists in Θ( n lg( n/ k)) worst-case time.

c. Given that the modified algorithm runs in Θ( nk + n lg( n/ k)) worst-case time, what is the largest value of k as a function of n for which

the modified algorithm has the same running time as standard merge

sort, in terms of Θ-notation?

d. How should you choose k in practice?

2-2 Correctness of bubblesort

Bubblesort is a popular, but inefficient, sorting algorithm. It works by

repeatedly swapping adjacent elements that are out of order. The

procedure BUBBLESORT sorts array A[1 : n].

BUBBLESORT(A, n)
1  for i = 1 to n – 1
2      for j = n downto i + 1
3          if A[j] < A[j – 1]
4              exchange A[j] with A[j – 1]
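For experimentation, the procedure can be transcribed into Python as follows (a sketch with 0-origin indexing, so the loop bounds shift down by one relative to the pseudocode):

```python
def bubblesort(A):
    """Sort list A in place, mirroring the BUBBLESORT pseudocode."""
    n = len(A)
    for i in range(n - 1):             # pseudocode: for i = 1 to n - 1
        for j in range(n - 1, i, -1):  # pseudocode: for j = n downto i + 1
            if A[j] < A[j - 1]:
                A[j], A[j - 1] = A[j - 1], A[j]   # exchange adjacent elements
```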

a. Let A′ denote the array A after BUBBLESORT(A, n) is executed. To prove that BUBBLESORT is correct, you need to prove that it terminates and that

A′[1] ≤ A′[2] ≤ ⋯ ≤ A′[n].    (2.5)

In order to show that BUBBLESORT actually sorts, what else do you need to prove?

The next two parts prove inequality (2.5).

b. State precisely a loop invariant for the for loop in lines 2–4, and prove

that this loop invariant holds. Your proof should use the structure of

the loop-invariant proof presented in this chapter.


c. Using the termination condition of the loop invariant proved in part

(b), state a loop invariant for the for loop in lines 1–4 that allows you

to prove inequality (2.5). Your proof should use the structure of the

loop-invariant proof presented in this chapter.

d. What is the worst-case running time of BUBBLESORT? How does it

compare with the running time of INSERTION-SORT?

2-3 Correctness of Horner’s rule

You are given the coefficients a₀, a₁, a₂, … , aₙ of a polynomial

P(x) = a₀ + a₁x + a₂x² + ⋯ + aₙxⁿ,

and you want to evaluate this polynomial for a given value of x. Horner’s rule says to evaluate the polynomial according to this parenthesization:

P(x) = a₀ + x(a₁ + x(a₂ + ⋯ + x(aₙ₋₁ + xaₙ) ⋯)).

The procedure HORNER implements Horner’s rule to evaluate P(x), given the coefficients a₀, a₁, a₂, … , aₙ in an array A[0 : n] and the value of x.

HORNER(A, n, x)
1  p = 0
2  for i = n downto 0
3      p = A[i] + x · p
4  return p
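As a quick executable check of the procedure, here is a direct Python transcription of HORNER (a sketch; Python's reversed() stands in for the "downto" loop):

```python
def horner(A, x):
    """Evaluate the polynomial with coefficients A[0], ..., A[n] at the value x."""
    p = 0
    for a in reversed(A):   # corresponds to "for i = n downto 0"
        p = a + x * p
    return p

# Example: A = [1, 2, 3] encodes 1 + 2x + 3x^2, so horner(A, 2) = 1 + 4 + 12 = 17.
```

Note that the loop performs just one multiplication and one addition per coefficient.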

a. In terms of Θ-notation, what is the running time of this procedure?

b. Write pseudocode to implement the naive polynomial-evaluation

algorithm that computes each term of the polynomial from scratch.

What is the running time of this algorithm? How does it compare with

HORNER?


c. Consider the following loop invariant for the procedure HORNER:

At the start of each iteration of the for loop of lines 2–3,

p = Σ_{k=0}^{n–(i+1)} A[k + i + 1] · xᵏ.

Interpret a summation with no terms as equaling 0. Following the structure of the loop-invariant proof presented in this chapter, use this loop invariant to show that, at termination, p = Σ_{k=0}^{n} A[k] · xᵏ.

2-4 Inversions

Let A[1 : n] be an array of n distinct numbers. If i < j and A[ i] > A[ j], then the pair ( i, j) is called an inversion of A.

a. List the five inversions of the array 〈2, 3, 8, 6, 1〉.

b. What array with elements from the set {1, 2, … , n} has the most

inversions? How many does it have?

c. What is the relationship between the running time of insertion sort

and the number of inversions in the input array? Justify your answer.

d. Give an algorithm that determines the number of inversions in any

permutation on n elements in Θ( n lg n) worst-case time. ( Hint: Modify merge sort.)

Chapter notes

In 1968, Knuth published the first of three volumes with the general title

The Art of Computer Programming [259, 260, 261]. The first volume ushered in the modern study of computer algorithms with a focus on

the analysis of running time. The full series remains an engaging and

worthwhile reference for many of the topics presented here. According

to Knuth, the word “algorithm” is derived from the name “al-

Khowârizmî,” a ninth-century Persian mathematician.

Aho, Hopcroft, and Ullman [5] advocated the asymptotic analysis of algorithms—using notations that Chapter 3 introduces, including Θ-notation—as a means of comparing relative performance. They also

popularized the use of recurrence relations to describe the running times of recursive algorithms.

Knuth [261] provides an encyclopedic treatment of many sorting algorithms. His comparison of sorting algorithms (page 381) includes

exact step-counting analyses, like the one we performed here for

insertion sort. Knuth’s discussion of insertion sort encompasses several

variations of the algorithm. The most important of these is Shell’s sort,

introduced by D. L. Shell, which uses insertion sort on periodic

subarrays of the input to produce a faster sorting algorithm.

Merge sort is also described by Knuth. He mentions that a

mechanical collator capable of merging two decks of punched cards in a

single pass was invented in 1938. J. von Neumann, one of the pioneers

of computer science, apparently wrote a program for merge sort on the

EDVAC computer in 1945.

The early history of proving programs correct is described by Gries

[200], who credits P. Naur with the first article in this field. Gries attributes loop invariants to R. W. Floyd. The textbook by Mitchell

[329] is a good reference on how to prove programs correct.

1 If you’re familiar with only Python, you can think of arrays as similar to Python lists.

2 When the loop is a for loop, the loop-invariant check just prior to the first iteration occurs immediately after the initial assignment to the loop-counter variable and just before the first test in the loop header. In the case of INSERTION-SORT, this time is after assigning 2 to the variable i but before the first test of whether i ≤ n.

3 In an if-else statement, we indent else at the same level as its matching if. The first executable line of an else clause appears on the same line as the keyword else. For multiway tests, we use elseif for tests after the first one. When it is the first line in an else clause, an if statement appears on the line following else so that you do not misconstrue it as elseif.

4 Each pseudocode procedure in this book appears on one page so that you do not need to discern levels of indentation in pseudocode that is split across pages.

5 Most block-structured languages have equivalent constructs, though the exact syntax may differ. Python lacks repeat-until loops, and its for loops operate differently from the for loops in this book. Think of the pseudocode line “for i = 1 to n” as equivalent to “for i in range(1, n+1)”

in Python.

6 In Python, the loop counter retains its value after the loop is exited, but the value it retains is the value it had during the final iteration of the for loop, rather than the value that exceeded the

loop bound. That is because a Python for loop iterates through a list, which may contain nonnumeric values.

7 If you’re used to programming in Python, bear in mind that in this book, the subarray A[ i : j]

includes the element A[ j]. In Python, the last element of A[ i : j] is A[ j – 1]. Python allows negative indices, which count from the back end of the list. This book does not use negative array indices.

8 Python’s tuple notation allows return statements to return multiple values without creating objects from a programmer-defined class.

9 We assume that each element of a given array occupies the same number of bytes and that the elements of a given array are stored in contiguous memory locations. For example, if array A[1 : n] starts at memory address 1000 and each element occupies four bytes, then element A[ i] is at address 1000 + 4( i – 1). In general, computing the address in memory of a particular array element requires at most one subtraction (no subtraction for a 0-origin array), one multiplication (often implemented as a shift operation if the element size is an exact power of 2), and one addition. Furthermore, for code that iterates through the elements of an array in order, an optimizing compiler can generate the address of each element using just one addition, by adding the element size to the address of the preceding element.

10 There are some subtleties here. Computational steps that we specify in English are often variants of a procedure that requires more than just a constant amount of time. For example, in the RADIX-SORT procedure on page 213, one line reads “use a stable sort to sort array A on digit i,” which, as we shall see, takes more than a constant amount of time. Also, although a statement that calls a subroutine takes only constant time, the subroutine itself, once invoked, may take more. That is, we separate the process of calling the subroutine—passing parameters to it, etc.—from the process of executing the subroutine.

11 This characteristic does not necessarily hold for a resource such as memory. A statement that references m words of memory and is executed n times does not necessarily reference mn distinct words of memory.

12 This procedure is the rare case that uses both 1-origin indexing (for array A) and 0-origin indexing (for arrays L and R). Using 0-origin indexing for L and R makes for a simpler loop invariant in Exercise 2.3-3.

13 If you’re wondering where the “+1” comes from, imagine that r = p + 1. Then the subarray A[ p : r] consists of two elements, and r – p + 1 = 2.

14 Chapter 3 shows how to formally interpret equations containing Θ-notation.

15 The expression ⌈x⌉ denotes the least integer greater than or equal to x, and ⌊x⌋ denotes the greatest integer less than or equal to x. These notations are defined in Section 3.3. The easiest way to verify that setting q to ⌊(p + r)/2⌋ yields subarrays A[p : q] and A[q + 1 : r] of sizes ⌈n/2⌉ and ⌊n/2⌋, respectively, is to examine the four cases that arise depending on whether each of p and r is odd or even.

16 If you’re wondering where Θ(1) comes from, think of it this way. When we say that n²/100 is Θ(n²), we are ignoring the coefficient 1/100 of the factor n². Likewise, when we say that a constant c is Θ(1), we are ignoring the coefficient c of the factor 1 (which you can also think of as n⁰).

17 The notation lg n stands for log₂ n. The base of the logarithm doesn’t matter here, but as computer scientists, we like logarithms base 2. Section 3.3 discusses other standard notation.

18 It is unlikely that c₁ is exactly the time to solve problems of size 1 and that c₂n is exactly the time of the divide and combine steps. We’ll look more closely at bounding recurrences in Chapter 4, where we’ll be more careful about this kind of detail.

3 Characterizing Running Times

The order of growth of the running time of an algorithm, defined in

Chapter 2, gives a simple way to characterize the algorithm’s efficiency and also allows us to compare it with alternative algorithms. Once the

input size n becomes large enough, merge sort, with its Θ( n lg n) worst-case running time, beats insertion sort, whose worst-case running time is

Θ(n²). Although we can sometimes determine the exact running time of

an algorithm, as we did for insertion sort in Chapter 2, the extra precision is rarely worth the effort of computing it. For large enough

inputs, the multiplicative constants and lower-order terms of an exact

running time are dominated by the effects of the input size itself.

When we look at input sizes large enough to make relevant only the

order of growth of the running time, we are studying the asymptotic

efficiency of algorithms. That is, we are concerned with how the running

time of an algorithm increases with the size of the input in the limit, as

the size of the input increases without bound. Usually, an algorithm

that is asymptotically more efficient is the best choice for all but very

small inputs.

This chapter gives several standard methods for simplifying the

asymptotic analysis of algorithms. The next section presents informally

the three most commonly used types of “asymptotic notation,” of which

we have already seen an example in Θ-notation. It also shows one way

to use these asymptotic notations to reason about the worst-case

running time of insertion sort. Then we look at asymptotic notations

more formally and present several notational conventions used


throughout this book. The last section reviews the behavior of functions

that commonly arise when analyzing algorithms.

3.1 O-notation, Ω-notation, and Θ-notation

When we analyzed the worst-case running time of insertion sort in

Chapter 2, we started with the complicated expression

(c₅/2 + c₆/2 + c₇/2)n² + (c₁ + c₂ + c₄ + c₅/2 – c₆/2 – c₇/2 + c₈)n – (c₂ + c₄ + c₅ + c₈).

We then discarded the lower-order terms (c₁ + c₂ + c₄ + c₅/2 – c₆/2 – c₇/2 + c₈)n and c₂ + c₄ + c₅ + c₈, and we also ignored the coefficient c₅/2 + c₆/2 + c₇/2 of n². That left just the factor n², which we put into Θ-notation as Θ(n²). We use this style to characterize running times of

algorithms: discard the lower-order terms and the coefficient of the

leading term, and use a notation that focuses on the rate of growth of

the running time.

Θ-notation is not the only such “asymptotic notation.” In this

section, we’ll see other forms of asymptotic notation as well. We start

with intuitive looks at these notations, revisiting insertion sort to see

how we can apply them. In the next section, we’ll see the formal

definitions of our asymptotic notations, along with conventions for

using them.

Before we get into specifics, bear in mind that the asymptotic

notations we’ll see are designed so that they characterize functions in

general. It so happens that the functions we are most interested in

denote the running times of algorithms. But asymptotic notation can

apply to functions that characterize some other aspect of algorithms

(the amount of space they use, for example), or even to functions that

have nothing whatsoever to do with algorithms.

O-notation

O-notation characterizes an upper bound on the asymptotic behavior of a function. In other words, it says that a function grows no faster than a

certain rate, based on the highest-order term. Consider, for example, the

function 7n³ + 100n² – 20n + 6. Its highest-order term is 7n³, and so we say that this function’s rate of growth is n³. Because this function grows no faster than n³, we can write that it is O(n³). You might be surprised that we can also write that the function 7n³ + 100n² – 20n + 6 is O(n⁴).

Why? Because the function grows more slowly than n⁴, we are correct in saying that it grows no faster. As you might have guessed, this function is also O(n⁵), O(n⁶), and so on. More generally, it is O(nᶜ) for any constant c ≥ 3.
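To see how such a claim can be checked concretely, here is a small numeric sketch using one witness pair of constants (c = 8 and n₀ = 100 are illustrative choices; many others work) for the bound 7n³ + 100n² – 20n + 6 = O(n³):

```python
def f(n):
    return 7 * n**3 + 100 * n**2 - 20 * n + 6

# Witnesses for f(n) = O(n^3): with c = 8 and n0 = 100,
# f(n) <= c * n**3 for every n checked at or beyond n0.
for n in range(100, 2001):
    assert f(n) <= 8 * n**3
```

A finite scan is of course not a proof, but it makes the roles of c and n₀ tangible before the formal definitions of the next section.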

Ω-notation

Ω-notation characterizes a lower bound on the asymptotic behavior of a

function. In other words, it says that a function grows at least as fast as

a certain rate, based—as in O-notation—on the highest-order term.

Because the highest-order term in the function 7n³ + 100n² – 20n + 6 grows at least as fast as n³, this function is Ω(n³). This function is also Ω(n²) and Ω(n). More generally, it is Ω(nᶜ) for any constant c ≤ 3.

Θ-notation

Θ-notation characterizes a tight bound on the asymptotic behavior of a

function. It says that a function grows precisely at a certain rate, based

—once again—on the highest-order term. Put another way, Θ-notation

characterizes the rate of growth of the function to within a constant

factor from above and to within a constant factor from below. These

two constant factors need not be equal.

If you can show that a function is both O( f ( n)) and Ω( f ( n)) for some function f ( n), then you have shown that the function is Θ( f ( n)). (The next section states this fact as a theorem.) For example, since the

function 7n³ + 100n² – 20n + 6 is both O(n³) and Ω(n³), it is also Θ(n³).

Example: Insertion sort

Let’s revisit insertion sort and see how to work with asymptotic

notation to characterize its Θ(n²) worst-case running time without

evaluating summations as we did in Chapter 2. Here is the

INSERTION-SORT procedure once again:

INSERTION-SORT(A, n)
1  for i = 2 to n
2      key = A[i]
3      // Insert A[i] into the sorted subarray A[1 : i – 1].
4      j = i – 1
5      while j > 0 and A[j] > key
6          A[j + 1] = A[j]
7          j = j – 1
8      A[j + 1] = key

What can we observe about how the pseudocode operates? The

procedure has nested loops. The outer loop is a for loop that runs n – 1

times, regardless of the values being sorted. The inner loop is a while

loop, but the number of iterations it makes depends on the values being

sorted. The loop variable j starts at i – 1 and decreases by 1 in each iteration until either it reaches 0 or A[ j] ≤ key. For a given value of i, the while loop might iterate 0 times, i – 1 times, or anywhere in between.

The body of the while loop (lines 6–7) takes constant time per iteration

of the while loop.


Figure 3.1 The Ω(n²) lower bound for insertion sort. If the first n/3 positions contain the n/3 largest values, each of these values must move through each of the middle n/3 positions, one position at a time, to end up somewhere in the last n/3 positions. Since each of n/3 values moves through at least each of n/3 positions, the time taken in this case is at least proportional to (n/3)(n/3) = n²/9, or Ω(n²).

These observations suffice to deduce an O(n²) running time for any case of INSERTION-SORT, giving us a blanket statement that covers all inputs. The running time is dominated by the inner loop. Because each of the n – 1 iterations of the outer loop causes the inner loop to iterate at most i – 1 times, and because i is at most n, the total number of iterations of the inner loop is at most (n – 1)(n – 1), which is less than n². Since each iteration of the inner loop takes constant time, the total time spent in the inner loop is at most a constant times n², or O(n²).

With a little creativity, we can also see that the worst-case running time of INSERTION-SORT is Ω(n²). By saying that the worst-case running time of an algorithm is Ω(n²), we mean that for every input size n above a certain threshold, there is at least one input of size n for which the algorithm takes at least cn² time, for some positive constant c. It does not necessarily mean that the algorithm takes at least cn² time for all inputs.

Let’s now see why the worst-case running time of INSERTION-

SORT is Ω(n²). For a value to end up to the right of where it started, it

must have been moved in line 6. In fact, for a value to end up k

positions to the right of where it started, line 6 must have executed k times. As Figure 3.1 shows, let’s assume that n is a multiple of 3 so that we can divide the array A into groups of n/3 positions. Suppose that in the input to INSERTION-SORT, the n/3 largest values occupy the first

n/3 array positions A[1 : n/3]. (It does not matter what relative order they have within the first n/3 positions.) Once the array has been sorted,

each of these n/3 values ends up somewhere in the last n/3 positions A[2n/3 + 1 : n]. For that to happen, each of these n/3 values must pass through each of the middle n/3 positions A[n/3 + 1 : 2n/3]. Each of these n/3 values passes through these middle n/3 positions one position at a time, by at least n/3 executions of line 6. Because at least n/3 values have to pass through at least n/3 positions, the time taken by INSERTION-SORT in the worst case is at least proportional to (n/3)(n/3) = n²/9, which is Ω(n²).
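The counting argument can be observed empirically. The sketch below (n = 300 is an arbitrary multiple of 3 chosen for illustration) instruments the shifting step corresponding to line 6 and checks that the adversarial input, with the n/3 largest values first, forces at least n²/9 shifts:

```python
def insertion_sort_shifts(A):
    """Insertion sort that returns the number of element shifts (line 6)."""
    shifts = 0
    for i in range(1, len(A)):
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]   # line 6: move a larger value one position right
            j -= 1
            shifts += 1
        A[j + 1] = key
    return shifts

n = 300  # any multiple of 3
# The n/3 largest values occupy the first n/3 positions; the rest follow in order.
A = list(range(2 * n // 3, n)) + list(range(2 * n // 3))
assert insertion_sort_shifts(A) >= n * n // 9
assert A == list(range(n))     # the array does end up sorted
```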

Because we have shown that INSERTION-SORT runs in O(n²) time in all cases and that there is an input that makes it take Ω(n²) time, we can conclude that the worst-case running time of INSERTION-SORT is Θ(n²). It does not matter that the constant factors for the upper and lower bounds might differ. What matters is that we have characterized the worst-case running time to within constant factors (discounting lower-order terms). This argument does not show that INSERTION-SORT runs in Θ(n²) time in all cases. Indeed, we saw in Chapter 2 that the best-case running time is Θ(n).

Exercises

3.1-1

Modify the lower-bound argument for insertion sort to handle input

sizes that are not necessarily a multiple of 3.

3.1-2

Using reasoning similar to what we used for insertion sort, analyze the

running time of the selection sort algorithm from Exercise 2.2-2.

3.1-3

Suppose that α is a fraction in the range 0 < α < 1. Show how to generalize the lower-bound argument for insertion sort to consider an input in which the αn largest values start in the first αn positions. What additional restriction do you need to put on α? What value of α maximizes the number of times that the αn largest values must pass through each of the middle (1 − 2α)n array positions?

3.2 Asymptotic notation: formal definitions

Having seen asymptotic notation informally, let’s get more formal. The

notations we use to describe the asymptotic running time of an

algorithm are defined in terms of functions whose domains are typically

the set N of natural numbers or the set R of real numbers. Such

notations are convenient for describing a running-time function T ( n).

This section defines the basic asymptotic notations and also introduces

some common “proper” notational abuses.

Figure 3.2 Graphic examples of the O, Ω, and Θ notations. In each part, the value of n₀ shown is the minimum possible value, but any greater value also works. (a) O-notation gives an upper bound for a function to within a constant factor. We write f(n) = O(g(n)) if there are positive constants n₀ and c such that at and to the right of n₀, the value of f(n) always lies on or below cg(n). (b) Ω-notation gives a lower bound for a function to within a constant factor. We write f(n) = Ω(g(n)) if there are positive constants n₀ and c such that at and to the right of n₀, the value of f(n) always lies on or above cg(n). (c) Θ-notation bounds a function to within constant factors. We write f(n) = Θ(g(n)) if there exist positive constants n₀, c₁, and c₂ such that at and to the right of n₀, the value of f(n) always lies between c₁g(n) and c₂g(n) inclusive.

O-notation

As we saw in Section 3.1, O-notation describes an asymptotic upper bound. We use O-notation to give an upper bound on a function, to

within a constant factor.

Here is the formal definition of O-notation. For a given function

g(n), we denote by O(g(n)) (pronounced “big-oh of g of n” or sometimes just “oh of g of n”) the set of functions

O(g(n)) = { f(n) : there exist positive constants c and n₀ such that 0 ≤ f(n) ≤ cg(n) for all n ≥ n₀}.¹

A function f(n) belongs to the set O(g(n)) if there exists a positive constant c such that f(n) ≤ cg(n) for sufficiently large n. Figure 3.2(a) shows the intuition behind O-notation. For all values n at and to the right of n₀, the value of the function f(n) is on or below cg(n).

The definition of O(g(n)) requires that every function f(n) in the set O(g(n)) be asymptotically nonnegative: f(n) must be nonnegative whenever n is sufficiently large. (An asymptotically positive function is one that is positive for all sufficiently large n.) Consequently, the function g(n) itself must be asymptotically nonnegative, or else the set O(g(n)) is empty. We therefore assume that every function used within O-notation is asymptotically nonnegative. This assumption holds for the other asymptotic notations defined in this chapter as well.

You might be surprised that we define O-notation in terms of sets.

Indeed, you might expect that we would write “f(n) ∈ O(g(n))” to indicate that f(n) belongs to the set O(g(n)). Instead, we usually write “f(n) = O(g(n))” and say “f(n) is big-oh of g(n)” to express the same notion. Although it may seem confusing at first to abuse equality in this way, we’ll see later in this section that doing so has its advantages.

Let’s explore an example of how to use the formal definition of O-

notation to justify our practice of discarding lower-order terms and

ignoring the constant coefficient of the highest-order term. We’ll show

that 4n² + 100n + 500 = O(n²), even though the lower-order terms have much larger coefficients than the leading term. We need to find positive constants c and n₀ such that 4n² + 100n + 500 ≤ cn² for all n ≥ n₀. Dividing both sides by n² gives 4 + 100/n + 500/n² ≤ c. This inequality is satisfied for many choices of c and n₀. For example, if we choose n₀ = 1, then this inequality holds for c = 604. If we choose n₀ = 10, then c = 19 works, and choosing n₀ = 100 allows us to use c = 5.05.
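These particular (n₀, c) pairs are easy to confirm numerically. Here is a quick check of mine (not from the text) that tests each pair over a range of n:

```python
# f(n) = 4n^2 + 100n + 500; verify f(n) <= c*n^2 for each (n0, c) pair above
def f(n):
    return 4 * n * n + 100 * n + 500

for n0, c in ((1, 604), (10, 19), (100, 5.05)):
    assert all(f(n) <= c * n * n for n in range(n0, 10_000))
```

Each pair is tight at n = n₀ (for example, f(1) = 604 = 604 · 1²), and the ratio f(n)/n² only decreases as n grows, so checking from n₀ upward suffices.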

We can also use the formal definition of O-notation to show that the

function n³ − 100n² does not belong to the set O(n²), even though the coefficient of n² is a large negative number. If we had n³ − 100n² = O(n²), then there would be positive constants c and n₀ such that n³ − 100n² ≤ cn² for all n ≥ n₀. Again, we divide both sides by n², giving n − 100 ≤ c. Regardless of what value we choose for the constant c, this inequality does not hold for any value of n > c + 100.

Ω-notation

Just as O-notation provides an asymptotic upper bound on a function,

Ω-notation provides an asymptotic lower bound. For a given function

g(n), we denote by Ω(g(n)) (pronounced “big-omega of g of n” or sometimes just “omega of g of n”) the set of functions

Ω(g(n)) = { f(n) : there exist positive constants c and n₀ such that 0 ≤ cg(n) ≤ f(n) for all n ≥ n₀}.

Figure 3.2(b) shows the intuition behind Ω-notation. For all values n at or to the right of n₀, the value of f(n) is on or above cg(n).

We’ve already shown that 4n² + 100n + 500 = O(n²). Now let’s show that 4n² + 100n + 500 = Ω(n²). We need to find positive constants c and n₀ such that 4n² + 100n + 500 ≥ cn² for all n ≥ n₀. As before, we divide both sides by n², giving 4 + 100/n + 500/n² ≥ c. This inequality holds when n₀ is any positive integer and c = 4.

What if we had subtracted the lower-order terms from the 4n² term instead of adding them? What if we had a small coefficient for the n² term? The function would still be Ω(n²). For example, let’s show that n²/100 − 100n − 500 = Ω(n²). Dividing by n² gives 1/100 − 100/n − 500/n² ≥ c. We can choose any value for n₀ that is at least 10,005 and find a positive value for c. For example, when n₀ = 10,005, we can choose c = 2.49 × 10⁻⁹. Yes, that’s a tiny value for c, but it is positive. If we select a larger value for n₀, we can also increase c. For example, if n₀ = 100,000, then we can choose c = 0.0089. The higher the value of n₀, the closer to the coefficient 1/100 we can choose c.
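These constants can be verified numerically as well; the sketch below (mine, not the text’s) checks both (n₀, c) pairs over a stretch of values:

```python
# f(n) = n^2/100 - 100n - 500; verify f(n) >= c*n^2 for each (n0, c) pair above
def f(n):
    return n * n / 100 - 100 * n - 500

for n0, c in ((10_005, 2.49e-9), (100_000, 0.0089)):
    assert all(f(n) >= c * n * n for n in range(n0, n0 + 1_000))
```

At n = 10,005 the margin is razor-thin: f(10,005) = 0.25, while c · n² ≈ 0.2492, which is exactly why such a tiny c is needed at that threshold. Since f(n)/n² = 1/100 − 100/n − 500/n² increases with n, a bound verified at n₀ persists for larger n.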

Θ-notation

We use Θ-notation for asymptotically tight bounds. For a given function

g(n), we denote by Θ(g(n)) (“theta of g of n”) the set of functions

Θ(g(n)) = { f(n) : there exist positive constants c₁, c₂, and n₀ such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ n₀}.

Figure 3.2(c) shows the intuition behind Θ-notation. For all values of n at and to the right of n₀, the value of f(n) lies at or above c₁g(n) and at or below c₂g(n). In other words, for all n ≥ n₀, the function f(n) is equal to g(n) to within constant factors.

The definitions of O-, Ω-, and Θ-notations lead to the following

theorem, whose proof we leave as Exercise 3.2-4.

Theorem 3.1

For any two functions f(n) and g(n), we have f(n) = Θ(g(n)) if and only if f(n) = O(g(n)) and f(n) = Ω(g(n)).

We typically apply Theorem 3.1 to prove asymptotically tight bounds

from asymptotic upper and lower bounds.

Asymptotic notation and running times

When you use asymptotic notation to characterize an algorithm’s

running time, make sure that the asymptotic notation you use is as

precise as possible without overstating which running time it applies to.

Here are some examples of using asymptotic notation properly and improperly to characterize running times.

Let’s start with insertion sort. We can correctly say that insertion sort’s worst-case running time is O(n²), Ω(n²), and, by Theorem 3.1, Θ(n²). Although all three ways to characterize the worst-case running time are correct, the Θ(n²) bound is the most precise and hence the most preferred. We can also correctly say that insertion sort’s best-case running time is O(n), Ω(n), and Θ(n), again with Θ(n) the most precise and therefore the most preferred.

Here is what we cannot correctly say: insertion sort’s running time is Θ(n²). That is an overstatement because by omitting “worst-case” from the statement, we’re left with a blanket statement covering all cases. The error here is that insertion sort does not run in Θ(n²) time in all cases since, as we’ve seen, it runs in Θ(n) time in the best case. We can correctly say that insertion sort’s running time is O(n²), however, because in all cases, its running time grows no faster than n². When we say O(n²) instead of Θ(n²), there is no problem in having cases whose running time grows more slowly than n². Likewise, we cannot correctly say that insertion sort’s running time is Θ(n), but we can say that its running time is Ω(n).

How about merge sort? Since merge sort runs in Θ(n lg n) time in all cases, we can just say that its running time is Θ(n lg n) without specifying worst-case, best-case, or any other case.

People occasionally conflate O-notation with Θ-notation by

mistakenly using O-notation to indicate an asymptotically tight bound.

They say things like “an O(n lg n)-time algorithm runs faster than an O(n²)-time algorithm.” Maybe it does, maybe it doesn’t. Since O-notation denotes only an asymptotic upper bound, that so-called O(n²)-time algorithm might actually run in Θ(n) time. You should be careful to

choose the appropriate asymptotic notation. If you want to indicate an

asymptotically tight bound, use Θ-notation.

We typically use asymptotic notation to provide the simplest and most precise bounds possible. For example, if an algorithm has a

running time of 3n² + 20n in all cases, we use asymptotic notation to write that its running time is Θ(n²). Strictly speaking, we are also correct in writing that the running time is O(n³) or Θ(3n² + 20n). Neither of these expressions is as useful as writing Θ(n²) in this case, however: O(n³) is less precise than Θ(n²) if the running time is 3n² + 20n, and Θ(3n² + 20n) introduces complexity that obscures the order of growth. By writing the simplest and most precise bound, such as Θ(n²), we can categorize and compare different algorithms. Throughout the book, you will see asymptotic running times that are almost always based on polynomials and logarithms: functions such as n, n lg² n, n² lg n, or n^{1/2}. You will also see some other functions, such as exponentials, lg lg n, and lg* n (see Section 3.3). It is usually fairly easy to compare the rates of growth of these functions. Problem 3-3 gives you good practice.

Asymptotic notation in equations and inequalities

Although we formally define asymptotic notation in terms of sets, we

use the equal sign (=) instead of the set membership sign (∈) within

formulas. For example, we wrote that 4n² + 100n + 500 = O(n²). We might also write 2n² + 3n + 1 = 2n² + Θ(n). How do we interpret such formulas?

When the asymptotic notation stands alone (that is, not within a

larger formula) on the right-hand side of an equation (or inequality), as

in 4n² + 100n + 500 = O(n²), the equal sign means set membership: 4n² + 100n + 500 ∈ O(n²). In general, however, when asymptotic notation appears in a formula, we interpret it as standing for some anonymous function that we do not care to name. For example, the formula 2n² + 3n + 1 = 2n² + Θ(n) means that 2n² + 3n + 1 = 2n² + f(n), where f(n) ∈ Θ(n). In this case, we let f(n) = 3n + 1, which indeed belongs to Θ(n).
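That membership is easy to confirm concretely. With witnesses c₁ = 3, c₂ = 4, and n₀ = 1 (my choice of constants, not ones from the text), 3n + 1 satisfies the Θ-definition:

```python
# Verify 3n + 1 ∈ Θ(n) with the witnesses c1 = 3, c2 = 4, n0 = 1:
# 3n <= 3n + 1 <= 4n holds for every n >= 1.
def f(n):
    return 3 * n + 1

c1, c2, n0 = 3, 4, 1
assert all(c1 * n <= f(n) <= c2 * n for n in range(n0, 100_000))
```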


Using asymptotic notation in this manner can help eliminate

inessential detail and clutter in an equation. For example, in Chapter 2

we expressed the worst-case running time of merge sort as the

recurrence

T(n) = 2T(n/2) + Θ(n).

If we are interested only in the asymptotic behavior of T ( n), there is no point in specifying all the lower-order terms exactly, because they are all

understood to be included in the anonymous function denoted by the

term Θ( n).

The number of anonymous functions in an expression is understood

to be equal to the number of times the asymptotic notation appears. For

example, in the expression

∑_{i=1}^{n} O(i)

there is only a single anonymous function (a function of i). This expression is thus not the same as O(1) + O(2) + ⋯ + O(n), which doesn’t really have a clean interpretation.

In some cases, asymptotic notation appears on the left-hand side of

an equation, as in

2n² + Θ(n) = Θ(n²).

Interpret such equations using the following rule: No matter how the

anonymous functions are chosen on the left of the equal sign, there is a way to choose the anonymous functions on the right of the equal sign to

make the equation valid. Thus, our example means that for any function

f(n) ∈ Θ(n), there is some function g(n) ∈ Θ(n²) such that 2n² + f(n) = g(n) for all n. In other words, the right-hand side of an equation provides a coarser level of detail than the left-hand side.

We can chain together a number of such relationships, as in

2n² + 3n + 1 = 2n² + Θ(n)
            = Θ(n²).

By the rules above, interpret each equation separately. The first equation says that there is some function f(n) ∈ Θ(n) such that 2n² + 3n + 1 = 2n² + f(n) for all n. The second equation says that for any function g(n) ∈ Θ(n) (such as the f(n) just mentioned), there is some function h(n) ∈ Θ(n²) such that 2n² + g(n) = h(n) for all n. This interpretation implies that 2n² + 3n + 1 = Θ(n²), which is what the chaining of equations intuitively says.

Proper abuses of asymptotic notation

Besides the abuse of equality to mean set membership, which we now

see has a precise mathematical interpretation, another abuse of

asymptotic notation occurs when the variable tending toward ∞ must be

inferred from context. For example, when we say O( g( n)), we can assume that we’re interested in the growth of g( n) as n grows, and if we say O( g( m)) we’re talking about the growth of g( m) as m grows. The free variable in the expression indicates what variable is going to ∞.

The most common situation requiring contextual knowledge of

which variable tends to ∞ occurs when the function inside the

asymptotic notation is a constant, as in the expression O(1). We cannot

infer from the expression which variable is going to ∞, because no

variable appears there. The context must disambiguate. For example, if

the equation using asymptotic notation is f ( n) = O(1), it’s apparent that the variable we’re interested in is n. Knowing from context that the variable of interest is n, however, allows us to make perfect sense of the

expression by using the formal definition of O-notation: the expression f

( n) = O(1) means that the function f ( n) is bounded from above by a constant as n goes to ∞. Technically, it might be less ambiguous if we

explicitly indicated the variable tending to ∞ in the asymptotic notation

itself, but that would clutter the notation. Instead, we simply ensure that

the context makes it clear which variable (or variables) tend to ∞.

When the function inside the asymptotic notation is bounded by a

positive constant, as in T ( n) = O(1), we often abuse asymptotic notation in yet another way, especially when stating recurrences. We

may write something like T ( n) = O(1) for n < 3. According to the

formal definition of O-notation, this statement is meaningless, because the definition says only that T(n) is bounded above by a positive constant c for n ≥ n₀, for some n₀ > 0. The value of T(n) for n < n₀ need not be so bounded. Thus, in the example T(n) = O(1) for n < 3, we cannot infer any constraint on T(n) when n < 3, because it might be that n₀ > 3.

What is conventionally meant when we say T ( n) = O(1) for n < 3 is that there exists a positive constant c such that T ( n) ≤ c for n < 3. This convention saves us the trouble of naming the bounding constant,

allowing it to remain anonymous while we focus on more important

variables in an analysis. Similar abuses occur with the other asymptotic

notations. For example, T ( n) = Θ(1) for n < 3 means that T ( n) is bounded above and below by positive constants when n < 3.

Occasionally, the function describing an algorithm’s running time

may not be defined for certain input sizes, for example, when an

algorithm assumes that the input size is an exact power of 2. We still use

asymptotic notation to describe the growth of the running time,

understanding that any constraints apply only when the function is

defined. For example, suppose that f(n) is defined only on a subset of the natural or nonnegative real numbers. Then f(n) = O(g(n)) means that the bound 0 ≤ f(n) ≤ cg(n) in the definition of O-notation holds for all n ≥ n₀ in the domain of f(n), that is, where f(n) is defined. This abuse is rarely pointed out, since what is meant is generally clear from

context.

In mathematics, it’s okay — and often desirable — to abuse a

notation, as long as we don’t misuse it. If we understand precisely what

is meant by the abuse and don’t draw incorrect conclusions, it can

simplify our mathematical language, contribute to our higher-level

understanding, and help us focus on what really matters.

o-notation

The asymptotic upper bound provided by O-notation may or may not

be asymptotically tight. The bound 2n² = O(n²) is asymptotically tight, but the bound 2n = O(n²) is not. We use o-notation to denote an upper bound that is not asymptotically tight. We formally define o(g(n)) (“little-oh of g of n”) as the set

o(g(n)) = { f(n) : for any positive constant c > 0, there exists a constant n₀ > 0 such that 0 ≤ f(n) < cg(n) for all n ≥ n₀}.

For example, 2n = o(n²), but 2n² ≠ o(n²).

The definitions of O-notation and o-notation are similar. The main

difference is that in f(n) = O(g(n)), the bound 0 ≤ f(n) ≤ cg(n) holds for some constant c > 0, but in f(n) = o(g(n)), the bound 0 ≤ f(n) < cg(n) holds for all constants c > 0. Intuitively, in o-notation, the function f(n) becomes insignificant relative to g(n) as n gets large:

lim_{n→∞} f(n)/g(n) = 0.

Some authors use this limit as a definition of the o-notation, but the definition in this book also restricts the anonymous functions to be asymptotically nonnegative.

ω-notation

By analogy, ω-notation is to Ω-notation as o-notation is to O-notation.

We use ω-notation to denote a lower bound that is not asymptotically

tight. One way to define it is by

f(n) ∈ ω(g(n)) if and only if g(n) ∈ o(f(n)).

Formally, however, we define ω(g(n)) (“little-omega of g of n”) as the set

ω(g(n)) = { f(n) : for any positive constant c > 0, there exists a constant n₀ > 0 such that 0 ≤ cg(n) < f(n) for all n ≥ n₀}.

Where the definition of o-notation says that f(n) < cg(n), the definition of ω-notation says the opposite: that cg(n) < f(n). For examples of ω-notation, we have n²/2 = ω(n), but n²/2 ≠ ω(n²). The relation f(n) = ω(g(n)) implies that

lim_{n→∞} f(n)/g(n) = ∞,

if the limit exists. That is, f(n) becomes arbitrarily large relative to g(n) as n gets large.

Comparing functions

Many of the relational properties of real numbers apply to asymptotic

comparisons as well. For the following, assume that f ( n) and g( n) are asymptotically positive.

Transitivity:

f(n) = Θ(g(n)) and g(n) = Θ(h(n)) imply f(n) = Θ(h(n)),
f(n) = O(g(n)) and g(n) = O(h(n)) imply f(n) = O(h(n)),
f(n) = Ω(g(n)) and g(n) = Ω(h(n)) imply f(n) = Ω(h(n)),
f(n) = o(g(n)) and g(n) = o(h(n)) imply f(n) = o(h(n)),
f(n) = ω(g(n)) and g(n) = ω(h(n)) imply f(n) = ω(h(n)).

Reflexivity:

f(n) = Θ(f(n)),
f(n) = O(f(n)),
f(n) = Ω(f(n)).

Symmetry:

f(n) = Θ(g(n)) if and only if g(n) = Θ(f(n)).

Transpose symmetry:

f(n) = O(g(n)) if and only if g(n) = Ω(f(n)),
f(n) = o(g(n)) if and only if g(n) = ω(f(n)).

Because these properties hold for asymptotic notations, we can draw

an analogy between the asymptotic comparison of two functions f and g

and the comparison of two real numbers a and b:

f(n) = O(g(n)) is like a ≤ b,
f(n) = Ω(g(n)) is like a ≥ b,
f(n) = Θ(g(n)) is like a = b,
f(n) = o(g(n)) is like a < b,
f(n) = ω(g(n)) is like a > b.

We say that f(n) is asymptotically smaller than g(n) if f(n) = o(g(n)), and f(n) is asymptotically larger than g(n) if f(n) = ω(g(n)).

One property of real numbers, however, does not carry over to

asymptotic notation:

Trichotomy: For any two real numbers a and b, exactly one of the following must hold: a < b, a = b, or a > b.

Although any two real numbers can be compared, not all functions are asymptotically comparable. That is, for two functions f(n) and g(n), it may be the case that neither f(n) = O(g(n)) nor f(n) = Ω(g(n)) holds. For example, we cannot compare the functions n and n^{1+sin n} using asymptotic notation, since the value of the exponent in n^{1+sin n} oscillates between 0 and 2, taking on all values in between.
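A quick numeric illustration of mine (not from the text): the ratio n^{1+sin n}/n = n^{sin n} swings both far above and far below 1 as n grows, so no single constant c can witness either an O-bound or an Ω-bound between the two functions.

```python
import math

# The ratio of n^(1 + sin n) to n; it oscillates roughly between n^-1 and n^1.
def ratio(n):
    return n ** math.sin(n)

values = [ratio(n) for n in range(2, 100_000)]
assert max(values) > 100     # the ratio climbs far above any fixed constant...
assert min(values) < 0.01    # ...and also falls far below any positive one
```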

Exercises

3.2-1

Let f(n) and g(n) be asymptotically nonnegative functions. Using the basic definition of Θ-notation, prove that max {f(n), g(n)} = Θ(f(n) + g(n)).

3.2-2

Explain why the statement, “The running time of algorithm A is at least O( n 2),” is meaningless.

3.2-3

Is 2ⁿ⁺¹ = O(2ⁿ)? Is 2²ⁿ = O(2ⁿ)?

3.2-4

Prove Theorem 3.1.

3.2-5

Prove that the running time of an algorithm is Θ( g( n)) if and only if its worst-case running time is O( g( n)) and its best-case running time is Ω( g( n)).

3.2-6

Prove that o( g( n)) ∩ ω( g( n)) is the empty set.

3.2-7

We can extend our notation to the case of two parameters n and m that

can go to ∞ independently at different rates. For a given function g(n, m), we denote by O(g(n, m)) the set of functions

O(g(n, m)) = { f(n, m) : there exist positive constants c, n₀, and m₀ such that 0 ≤ f(n, m) ≤ cg(n, m) for all n ≥ n₀ or m ≥ m₀}.

Give corresponding definitions for Ω( g( n, m)) and Θ( g( n, m)).

3.3 Standard notations and common functions

This section reviews some standard mathematical functions and

notations and explores the relationships among them. It also illustrates

the use of the asymptotic notations.

Monotonicity


A function f(n) is monotonically increasing if m ≤ n implies f(m) ≤ f(n). Similarly, it is monotonically decreasing if m ≤ n implies f(m) ≥ f(n). A function f(n) is strictly increasing if m < n implies f(m) < f(n) and strictly decreasing if m < n implies f(m) > f(n).

Floors and ceilings

For any real number x, we denote the greatest integer less than or equal to x by ⌊x⌋ (read “the floor of x”) and the least integer greater than or equal to x by ⌈x⌉ (read “the ceiling of x”). The floor function is monotonically increasing, as is the ceiling function.

Floors and ceilings obey the following properties. For any integer n, we have

⌈n/2⌉ + ⌊n/2⌋ = n.

For all real x, we have

x − 1 < ⌊x⌋ ≤ x ≤ ⌈x⌉ < x + 1.

We also have

⌈x⌉ = −⌊−x⌋,

or equivalently,

⌊x⌋ = −⌈−x⌉.

For any real number x ≥ 0 and integers a, b > 0, we have

⌈⌈x/a⌉/b⌉ = ⌈x/(ab)⌉,
⌊⌊x/a⌋/b⌋ = ⌊x/(ab)⌋,
⌈a/b⌉ ≤ (a + (b − 1))/b,
⌊a/b⌋ ≥ (a − (b − 1))/b.

For any integer n and real number x, we have

⌊n + x⌋ = n + ⌊x⌋ and ⌈n + x⌉ = n + ⌈x⌉.

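The nested-division identities can be spot-checked exactly with integer arithmetic. This sketch of mine (not from the text) uses the standard Python trick that ⌈p/q⌉ = −(−p // q) for integers p ≥ 0 and q > 0:

```python
# Exact spot-check of ceil(ceil(x/a)/b) == ceil(x/(ab)) and the floor analog.
def ceil_div(p, q):
    return -(-p // q)   # ceiling division for integers p >= 0, q > 0

for x in range(500):
    for a in (1, 2, 3, 7, 10):
        for b in (1, 2, 5, 9):
            assert ceil_div(ceil_div(x, a), b) == ceil_div(x, a * b)
            assert (x // a) // b == x // (a * b)
```

Integer arithmetic avoids the floating-point rounding that could make a direct `math.ceil(x / (a * b))` comparison misfire near integer boundaries.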

Modular arithmetic

For any integer a and any positive integer n, the value a mod n is the remainder (or residue) of the quotient a/n:

a mod n = a − n⌊a/n⌋.

It follows that

0 ≤ a mod n < n,

even when a is negative.

Given a well-defined notion of the remainder of one integer when divided by another, it is convenient to provide special notation to indicate equality of remainders. If (a mod n) = (b mod n), we write a ≡ b (mod n) and say that a is equivalent to b, modulo n. In other words, a ≡ b (mod n) if a and b have the same remainder when divided by n. Equivalently, a ≡ b (mod n) if and only if n is a divisor of b − a. We write a ≢ b (mod n) if a is not equivalent to b, modulo n.

Polynomials

Given a nonnegative integer d, a polynomial in n of degree d is a function p(n) of the form

p(n) = ∑_{i=0}^{d} aᵢ nⁱ,

where the constants a₀, a₁, …, a_d are the coefficients of the polynomial and a_d ≠ 0. A polynomial is asymptotically positive if and only if a_d > 0. For an asymptotically positive polynomial p(n) of degree d, we have p(n) = Θ(n^d). For any real constant a ≥ 0, the function nᵃ is monotonically increasing, and for any real constant a ≤ 0, the function nᵃ is monotonically decreasing. We say that a function f(n) is polynomially bounded if f(n) = O(n^k) for some constant k.


Exponentials

For all real a > 0, m, and n, we have the following identities:

a⁰ = 1,
a¹ = a,
a⁻¹ = 1/a,
(aᵐ)ⁿ = aᵐⁿ,
(aᵐ)ⁿ = (aⁿ)ᵐ,
aᵐaⁿ = aᵐ⁺ⁿ.

For all n and a ≥ 1, the function aⁿ is monotonically increasing in n. When convenient, we assume that 0⁰ = 1.

We can relate the rates of growth of polynomials and exponentials by

the following fact. For all real constants a > 1 and b, we have

lim_{n→∞} nᵇ/aⁿ = 0,

from which we can conclude that

nᵇ = o(aⁿ).

Thus, any exponential function with a base strictly greater than 1 grows

faster than any polynomial function.

Using e to denote 2.71828 …, the base of the natural-logarithm function, we have for all real x,

eˣ = 1 + x + x²/2! + x³/3! + ⋯ = ∑_{i=0}^{∞} xⁱ/i!,

where “!” denotes the factorial function defined later in this section. For all real x, we have the inequality

eˣ ≥ 1 + x,


where equality holds only when x = 0. When |x| ≤ 1, we have the approximation

1 + x ≤ eˣ ≤ 1 + x + x².

When x → 0, the approximation of eˣ by 1 + x is quite good:

eˣ = 1 + x + Θ(x²).

(In this equation, the asymptotic notation is used to describe the limiting behavior as x → 0 rather than as x → ∞.) We have for all x,

lim_{n→∞} (1 + x/n)ⁿ = eˣ.

Logarithms

We use the following notations:

lg n = log₂ n (binary logarithm),

ln n = logₑ n (natural logarithm),

lgᵏ n = (lg n)ᵏ (exponentiation),