Introduction to Algorithms

Fourth Edition

Thomas H. Cormen

Charles E. Leiserson

Ronald L. Rivest

Clifford Stein

The MIT Press

Cambridge, Massachusetts London, England

© 2022 Massachusetts Institute of Technology

All rights reserved. No part of this book may be reproduced in any form or by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.

Names: Cormen, Thomas H., author. | Leiserson, Charles Eric, author. | Rivest, Ronald L., author. | Stein, Clifford, author.

Title: Introduction to algorithms / Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein.

Description: Fourth edition. | Cambridge, Massachusetts : The MIT Press, [2022] | Includes bibliographical references and index.

Identifiers: LCCN 2021037260 | ISBN 9780262367509

Subjects: LCSH: Computer programming. | Computer algorithms.

Classification: LCC QA76.6 .C662 2022 | DDC 005.13--dc23

LC record available at http://lccn.loc.gov/2021037260

Contents

Copyright

Preface

I Foundations

Introduction

1 The Role of Algorithms in Computing

1.1 Algorithms

1.2 Algorithms as a technology

2 Getting Started

2.1 Insertion sort

2.2 Analyzing algorithms

2.3 Designing algorithms

3 Characterizing Running Times

3.1 O-notation, Ω-notation, and Θ-notation

3.2 Asymptotic notation: formal definitions

3.3 Standard notations and common functions

4 Divide-and-Conquer

4.1 Multiplying square matrices

4.2 Strassen’s algorithm for matrix multiplication

4.3 The substitution method for solving recurrences

4.4 The recursion-tree method for solving recurrences

4.5 The master method for solving recurrences

★ 4.6 Proof of the continuous master theorem

★ 4.7 Akra-Bazzi recurrences

5 Probabilistic Analysis and Randomized Algorithms

5.1 The hiring problem

5.2 Indicator random variables

5.3 Randomized algorithms

★ 5.4 Probabilistic analysis and further uses of indicator random variables

II Sorting and Order Statistics

Introduction

6 Heapsort

6.1 Heaps

6.2 Maintaining the heap property

6.3 Building a heap

6.4 The heapsort algorithm

6.5 Priority queues

7 Quicksort

7.1 Description of quicksort

7.2 Performance of quicksort

7.3 A randomized version of quicksort

7.4 Analysis of quicksort

8 Sorting in Linear Time

8.1 Lower bounds for sorting

8.2 Counting sort

8.3 Radix sort

8.4 Bucket sort

9 Medians and Order Statistics

9.1 Minimum and maximum

9.2 Selection in expected linear time

9.3 Selection in worst-case linear time

III Data Structures

Introduction

10 Elementary Data Structures

10.1 Simple array-based data structures: arrays, matrices, stacks, queues

10.2 Linked lists

10.3 Representing rooted trees

11 Hash Tables

11.1 Direct-address tables

11.2 Hash tables

11.3 Hash functions

11.4 Open addressing

11.5 Practical considerations

12 Binary Search Trees

12.1 What is a binary search tree?

12.2 Querying a binary search tree

12.3 Insertion and deletion

13 Red-Black Trees

13.1 Properties of red-black trees

13.2 Rotations

13.3 Insertion

13.4 Deletion

IV Advanced Design and Analysis Techniques

Introduction

14 Dynamic Programming

14.1 Rod cutting

14.2 Matrix-chain multiplication

14.3 Elements of dynamic programming

14.4 Longest common subsequence

14.5 Optimal binary search trees

15 Greedy Algorithms

15.1 An activity-selection problem

15.2 Elements of the greedy strategy

15.3 Huffman codes

15.4 Offline caching

16 Amortized Analysis

16.1 Aggregate analysis

16.2 The accounting method

16.3 The potential method

16.4 Dynamic tables

V Advanced Data Structures

Introduction

17 Augmenting Data Structures

17.1 Dynamic order statistics

17.2 How to augment a data structure

17.3 Interval trees

18 B-Trees

18.1 Definition of B-trees

18.2 Basic operations on B-trees

18.3 Deleting a key from a B-tree

19 Data Structures for Disjoint Sets

19.1 Disjoint-set operations

19.2 Linked-list representation of disjoint sets

19.3 Disjoint-set forests

★ 19.4 Analysis of union by rank with path compression

VI Graph Algorithms

Introduction

20 Elementary Graph Algorithms

20.1 Representations of graphs

20.2 Breadth-first search

20.3 Depth-first search

20.4 Topological sort

20.5 Strongly connected components

21 Minimum Spanning Trees

21.1 Growing a minimum spanning tree

21.2 The algorithms of Kruskal and Prim

22 Single-Source Shortest Paths

22.1 The Bellman-Ford algorithm

22.2 Single-source shortest paths in directed acyclic graphs

22.3 Dijkstra’s algorithm

22.4 Difference constraints and shortest paths

22.5 Proofs of shortest-paths properties

23 All-Pairs Shortest Paths

23.1 Shortest paths and matrix multiplication

23.2 The Floyd-Warshall algorithm

23.3 Johnson’s algorithm for sparse graphs

24 Maximum Flow

24.1 Flow networks

24.2 The Ford-Fulkerson method

24.3 Maximum bipartite matching

25 Matchings in Bipartite Graphs

25.1 Maximum bipartite matching (revisited)

25.2 The stable-marriage problem

25.3 The Hungarian algorithm for the assignment problem

VII Selected Topics

Introduction

26 Parallel Algorithms

26.1 The basics of fork-join parallelism

26.2 Parallel matrix multiplication

26.3 Parallel merge sort

27 Online Algorithms

27.1 Waiting for an elevator

27.2 Maintaining a search list

27.3 Online caching

28 Matrix Operations

28.1 Solving systems of linear equations

28.2 Inverting matrices

28.3 Symmetric positive-definite matrices and least-squares approximation

29 Linear Programming

29.1 Linear programming formulations and algorithms

29.2 Formulating problems as linear programs

29.3 Duality

30 Polynomials and the FFT

30.1 Representing polynomials

30.2 The DFT and FFT

30.3 FFT circuits

31 Number-Theoretic Algorithms

31.1 Elementary number-theoretic notions

31.2 Greatest common divisor

31.3 Modular arithmetic

31.4 Solving modular linear equations

31.5 The Chinese remainder theorem

31.6 Powers of an element

31.7 The RSA public-key cryptosystem

★ 31.8 Primality testing

32 String Matching

32.1 The naive string-matching algorithm

32.2 The Rabin-Karp algorithm

32.3 String matching with finite automata

★ 32.4 The Knuth-Morris-Pratt algorithm

32.5 Suffix arrays

33 Machine-Learning Algorithms

33.1 Clustering

33.2 Multiplicative-weights algorithms

33.3 Gradient descent

34 NP-Completeness

34.1 Polynomial time

34.2 Polynomial-time verification

34.3 NP-completeness and reducibility

34.4 NP-completeness proofs

34.5 NP-complete problems

35 Approximation Algorithms

35.1 The vertex-cover problem

35.2 The traveling-salesperson problem

35.3 The set-covering problem

35.4 Randomization and linear programming

35.5 The subset-sum problem

VIII Appendix: Mathematical Background

Introduction

A Summations

A.1 Summation formulas and properties

A.2 Bounding summations

B Sets, Etc.

B.1 Sets

B.2 Relations

B.3 Functions

B.4 Graphs

B.5 Trees

C Counting and Probability

C.1 Counting

C.2 Probability

C.3 Discrete random variables

C.4 The geometric and binomial distributions

★ C.5 The tails of the binomial distribution

D Matrices

D.1 Matrices and matrix operations

D.2 Basic matrix properties

Bibliography

Index

Preface

Not so long ago, anyone who had heard the word “algorithm” was

almost certainly a computer scientist or mathematician. With

computers having become prevalent in our modern lives, however, the

term is no longer esoteric. If you look around your home, you’ll find

algorithms running in the most mundane places: your microwave oven,

your washing machine, and, of course, your computer. You ask

algorithms to make recommendations to you: what music you might

like or what route to take when driving. Our society, for better or for

worse, asks algorithms to suggest sentences for convicted criminals. You

even rely on algorithms to keep you alive, or at least not to kill you: the

control systems in your car or in medical equipment.1 The word

“algorithm” appears somewhere in the news seemingly every day.

Therefore, it behooves you to understand algorithms not just as a

student or practitioner of computer science, but as a citizen of the

world. Once you understand algorithms, you can educate others about

what algorithms are, how they operate, and what their limitations are.

This book provides a comprehensive introduction to the modern

study of computer algorithms. It presents many algorithms and covers

them in considerable depth, yet makes their design accessible to all

levels of readers. All the analyses are laid out, some simple, some more

involved. We have tried to keep explanations clear without sacrificing

depth of coverage or mathematical rigor.

Each chapter presents an algorithm, a design technique, an

application area, or a related topic. Algorithms are described in English

and in a pseudocode designed to be readable by anyone who has done a

little programming. The book contains 231 figures—many with multiple

parts—illustrating how the algorithms work. Since we emphasize

efficiency as a design criterion, we include careful analyses of the

running times of the algorithms.

The text is intended primarily for use in undergraduate or graduate

courses in algorithms or data structures. Because it discusses

engineering issues in algorithm design, as well as mathematical aspects,

it is equally well suited for self-study by technical professionals.

In this, the fourth edition, we have once again updated the entire

book. The changes cover a broad spectrum, including new chapters and

sections, color illustrations, and what we hope you’ll find to be a more

engaging writing style.

To the teacher

We have designed this book to be both versatile and complete. You

should find it useful for a variety of courses, from an undergraduate

course in data structures up through a graduate course in algorithms.

Because we have provided considerably more material than can fit in a

typical one-term course, you can select the material that best supports

the course you wish to teach.

You should find it easy to organize your course around just the

chapters you need. We have made chapters relatively self-contained, so

that you need not worry about an unexpected and unnecessary

dependence of one chapter on another. Whereas in an undergraduate

course, you might use only some sections from a chapter, in a graduate

course, you might cover the entire chapter.

We have included 931 exercises and 162 problems. Each section ends

with exercises, and each chapter ends with problems. The exercises are

generally short questions that test basic mastery of the material. Some

are simple self-check thought exercises, but many are substantial and

suitable as assigned homework. The problems include more elaborate

case studies which often introduce new material. They often consist of

several parts that lead the student through the steps required to arrive at

a solution.

As with the third edition of this book, we have made publicly available solutions to some, but by no means all, of the problems and

exercises. You can find these solutions on our website,

http://mitpress.mit.edu/algorithms/. You will want to check this site to see whether it contains the solution to an exercise or problem that you

plan to assign. Since the set of solutions that we post might grow over

time, we recommend that you check the site each time you teach the

course.

We have starred (★) the sections and exercises that are more suitable

for graduate students than for undergraduates. A starred section is not

necessarily more difficult than an unstarred one, but it may require an

understanding of more advanced mathematics. Likewise, starred

exercises may require an advanced background or more than average

creativity.

To the student

We hope that this textbook provides you with an enjoyable introduction

to the field of algorithms. We have attempted to make every algorithm

accessible and interesting. To help you when you encounter unfamiliar

or difficult algorithms, we describe each one in a step-by-step manner.

We also provide careful explanations of the mathematics needed to

understand the analysis of the algorithms and supporting figures to help

you visualize what is going on.

Since this book is large, your class will probably cover only a portion

of its material. Although we hope that you will find this book helpful to

you as a course textbook now, we have also tried to make it

comprehensive enough to warrant space on your future professional

bookshelf.

What are the prerequisites for reading this book?

You need some programming experience. In particular, you

should understand recursive procedures and simple data

structures, such as arrays and linked lists (although Section 10.2

covers linked lists and a variant that you may find new).

You should have some facility with mathematical proofs, and

especially proofs by mathematical induction. A few portions of

the book rely on some knowledge of elementary calculus.

Although this book uses mathematics throughout, Part I and

Appendices A–D teach you all the mathematical techniques you will need.

Our website, http://mitpress.mit.edu/algorithms/, links to solutions for some of the problems and exercises. Feel free to check your solutions

against ours. We ask, however, that you not send your solutions to us.

To the professional

The wide range of topics in this book makes it an excellent handbook

on algorithms. Because each chapter is relatively self-contained, you can

focus on the topics most relevant to you.

Since most of the algorithms we discuss have great practical utility,

we address implementation concerns and other engineering issues. We

often provide practical alternatives to the few algorithms that are

primarily of theoretical interest.

If you wish to implement any of the algorithms, you should find the

translation of our pseudocode into your favorite programming language

to be a fairly straightforward task. We have designed the pseudocode to

present each algorithm clearly and succinctly. Consequently, we do not

address error handling and other software-engineering issues that

require specific assumptions about your programming environment. We

attempt to present each algorithm simply and directly without allowing

the idiosyncrasies of a particular programming language to obscure its

essence. If you are used to 0-origin arrays, you might find our frequent

practice of indexing arrays from 1 a minor stumbling block. You can

always either subtract 1 from our indices or just overallocate the array

and leave position 0 unused.
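The overallocation trick above can be sketched in a line or two; here is a hypothetical Python helper (ours, not from the book) that leaves position 0 unused so that indexing matches the pseudocode's A[1..n]:

```python
def from_one_indexed(values):
    """Wrap a list so that element i of the book's A[1..n] is A[i]."""
    A = [None] + list(values)  # position 0 is deliberately unused
    return A

A = from_one_indexed([31, 41, 59])
# A[1] is the first element and A[3] the last, as in the book's pseudocode.
```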

We understand that if you are using this book outside of a course,

then you might be unable to check your solutions to problems and

exercises against solutions provided by an instructor. Our website,

http://mitpress.mit.edu/algorithms/, links to solutions for some of the

problems and exercises so that you can check your work. Please do not

send your solutions to us.

To our colleagues

We have supplied an extensive bibliography and pointers to the current

literature. Each chapter ends with a set of chapter notes that give

historical details and references. The chapter notes do not provide a

complete reference to the whole field of algorithms, however. Though it

may be hard to believe for a book of this size, space constraints

prevented us from including many interesting algorithms.

Despite myriad requests from students for solutions to problems and

exercises, we have adopted the policy of not citing references for them,

removing the temptation for students to look up a solution rather than

to discover it themselves.

Changes for the fourth edition

As we said about the changes for the second and third editions,

depending on how you look at it, the book changed either not much or

quite a bit. A quick look at the table of contents shows that most of the

third-edition chapters and sections appear in the fourth edition. We

removed three chapters and several sections, but we have added three

new chapters and several new sections apart from these new chapters.

We kept the hybrid organization from the first three editions. Rather

than organizing chapters only by problem domains or only according to

techniques, this book incorporates elements of both. It contains technique-based chapters on divide-and-conquer, dynamic programming, greedy algorithms, amortized analysis, augmenting data structures, NP-completeness, and approximation algorithms. But it also

has entire parts on sorting, on data structures for dynamic sets, and on

algorithms for graph problems. We find that although you need to know

how to apply techniques for designing and analyzing algorithms,

problems seldom announce to you which techniques are most amenable

to solving them.

Some of the changes in the fourth edition apply generally across the

book, and some are specific to particular chapters or sections. Here is a

summary of the most significant general changes:

We added 140 new exercises and 22 new problems. We also

improved many of the old exercises and problems, often as the

result of reader feedback. (Thanks to all readers who made

suggestions.)

We have color! With designers from the MIT Press, we selected a

limited palette, devised to convey information and to be pleasing

to the eye. (We are delighted to display red-black trees in—get this

—red and black!) To enhance readability, defined terms,

pseudocode comments, and page numbers in the index are in

color.

Pseudocode procedures appear on a tan background to make

them easier to spot, and they do not necessarily appear on the

page of their first reference. When they don’t, the text directs you

to the relevant page. In the same vein, nonlocal references to

numbered equations, theorems, lemmas, and corollaries include

the page number.

We removed topics that were rarely taught. We dropped in their

entirety the chapters on Fibonacci heaps, van Emde Boas trees,

and computational geometry. In addition, the following material

was excised: the maximum-subarray problem, implementing

pointers and objects, perfect hashing, randomly built binary

search trees, matroids, push-relabel algorithms for maximum flow,

the iterative fast Fourier transform method, the details of the

simplex algorithm for linear programming, and integer

factorization. You can find all the removed material on our

website, http://mitpress.mit.edu/algorithms/.

We reviewed the entire book and rewrote sentences, paragraphs,

and sections to make the writing clearer, more personal, and

gender neutral. For example, the “traveling-salesman problem” in

the previous editions is now called the “traveling-salesperson

problem.” We believe that it is critically important for engineering

and science, including our own field of computer science, to be

welcoming to everyone. (The one place that stumped us is in

Chapter 13, which requires a term for a parent’s sibling. Because the English language has no such gender-neutral term, we

regretfully stuck with “uncle.”)

The chapter notes, bibliography, and index were updated,

reflecting the dramatic growth of the field of algorithms since the

third edition.

We corrected errors, posting most corrections on our website of

third-edition errata. Those that were reported while we were in

full swing preparing this edition were not posted, but were

corrected in this edition. (Thanks again to all readers who helped

us identify issues.)

The specific changes for the fourth edition include the following:

We renamed Chapter 3 and added a section giving an overview of

asymptotic notation before delving into the formal definitions.

Chapter 4 underwent substantial changes to improve its

mathematical foundation and make it more robust and intuitive.

The notion of an algorithmic recurrence was introduced, and the

topic of ignoring floors and ceilings in recurrences was addressed

more rigorously. The second case of the master theorem

incorporates polylogarithmic factors, and a rigorous proof of a

“continuous” version of the master theorem is now provided. We

also present the powerful and general Akra-Bazzi method

(without proof).

The deterministic order-statistic algorithm in Chapter 9 is slightly different, and the analyses of both the randomized and

deterministic order-statistic algorithms have been revamped.

In addition to stacks and queues, Section 10.1 discusses ways to

store arrays and matrices.

Chapter 11 on hash tables includes a modern treatment of hash functions. It also emphasizes linear probing as an efficient method

for resolving collisions when the underlying hardware implements

caching to favor local searches.

To replace the sections on matroids in Chapter 15, we converted a problem in the third edition about offline caching into a full

section.

Section 16.4 now contains a more intuitive explanation of the potential functions to analyze table doubling and halving.

Chapter 17 on augmenting data structures was relocated from

Part III to Part V, reflecting our view that this technique goes beyond basic material.

Chapter 25 is a new chapter about matchings in bipartite graphs.

It presents algorithms to find a matching of maximum cardinality,

to solve the stable-marriage problem, and to find a maximum-

weight matching (known as the “assignment problem”).

Chapter 26, on task-parallel computing, has been updated with modern terminology, including the name of the chapter.

Chapter 27, which covers online algorithms, is another new chapter. In an online algorithm, the input arrives over time, rather

than being available in its entirety at the start of the algorithm.

The chapter describes several examples of online algorithms,

including determining how long to wait for an elevator before

taking the stairs, maintaining a linked list via the move-to-front

heuristic, and evaluating replacement policies for caches.

In Chapter 29, we removed the detailed presentation of the simplex algorithm, as it was math heavy without really conveying

many algorithmic ideas. The chapter now focuses on the key

aspect of how to model problems as linear programs, along with

the essential duality property of linear programming.

Section 32.5 adds to the chapter on string matching the simple, yet powerful, structure of suffix arrays.

Chapter 33, on machine learning, is the third new chapter. It introduces several basic methods used in machine learning:

clustering to group similar items together, weighted-majority

algorithms, and gradient descent to find the minimizer of a

function.

Section 34.5.6 summarizes strategies for polynomial-time reductions to show that problems are NP-hard.

The proof of the approximation algorithm for the set-covering

problem in Section 35.3 has been revised.

Website

You can use our website, http://mitpress.mit.edu/algorithms/, to obtain supplementary information and to communicate with us. The website

links to a list of known errors, material from the third edition that is not

included in the fourth edition, solutions to selected exercises and

problems, Python implementations of many of the algorithms in this

book, a list explaining the corny professor jokes (of course), as well as

other content, which we may add to. The website also tells you how to

report errors or make suggestions.

How we produced this book

Like the previous three editions, the fourth edition was produced in

LaTeX 2ε. We used the Times font with mathematics typeset using the

MathTime Professional II fonts. As in all previous editions, we

compiled the index using Windex, a C program that we wrote, and

produced the bibliography using BibTeX. The PDF files for this book

were created on a MacBook Pro running macOS 10.14.

Our plea to Apple in the preface of the third edition to update

MacDraw Pro for macOS 10 went for naught, and so we continued to

draw illustrations on pre-Intel Macs running MacDraw Pro under the

Classic environment of older versions of macOS 10. Many of the

mathematical expressions appearing in illustrations were laid in with the

psfrag package for LaTeX 2ε.

Acknowledgments for the fourth edition

We have been working with the MIT Press since we started writing the

first edition in 1987, collaborating with several directors, editors, and

production staff. Throughout our association with the MIT Press, their

support has always been outstanding. Special thanks to our editors Marie Lee, who put up with us for far too long, and Elizabeth Swayze,

who pushed us over the finish line. Thanks also to Director Amy Brand

and to Alex Hoopes.

As in the third edition, we were geographically distributed while

producing the fourth edition, working in the Dartmouth College

Department of Computer Science; the MIT Computer Science and

Artificial Intelligence Laboratory and the MIT Department of

Electrical Engineering and Computer Science; and the Columbia

University Department of Industrial Engineering and Operations

Research, Department of Computer Science, and Data Science Institute.

During the COVID-19 pandemic, we worked largely from home. We

thank our respective universities and colleagues for providing such

supportive and stimulating environments. As we complete this book,

those of us who are not retired are eager to return to our respective

universities now that the pandemic seems to be abating.

Julie Sussman, P.P.A., came to our rescue once again with her

technical copy-editing under tremendous time pressure. If not for Julie,

this book would be riddled with errors (or, let’s say, many more errors

than it has) and would be far less readable. Julie, we will be forever

indebted to you. Errors that remain are the responsibility of the authors

(and probably were inserted after Julie read the material).

Dozens of errors in previous editions were corrected in the process of

creating this edition. We thank our readers—too many to list them all—

who have reported errors and suggested improvements over the years.

We received considerable help in preparing some of the new material

in this edition. Neville Campbell (unaffiliated), Bill Kuszmaul of MIT,

and Chee Yap of NYU provided valuable advice regarding the

treatment of recurrences in Chapter 4. Yan Gu of the University of California, Riverside, provided feedback on parallel algorithms in

Chapter 26. Rob Shapire of Microsoft Research altered our approach to the material on machine learning with his detailed comments on

Chapter 33. Qi Qi of MIT helped with the analysis of the Monty Hall problem (Problem C-1).

Molly Seaman and Mary Reilly of the MIT Press helped us select the

color palette in the illustrations, and Wojciech Jarosz of Dartmouth

College suggested design improvements to our newly colored figures.

Yichen (Annie) Ke and Linda Xiao, who have since graduated from

Dartmouth, aided in colorizing the illustrations, and Linda also

produced many of the Python implementations that are available on the

book’s website.

Finally, we thank our wives—Wendy Leiserson, Gail Rivest, Rebecca

Ivry, and the late Nicole Cormen—and our families. The patience and

encouragement of those who love us made this project possible. We

affectionately dedicate this book to them.

THOMAS H. CORMEN

Lebanon, New Hampshire

CHARLES E. LEISERSON

Cambridge, Massachusetts

RONALD L. RIVEST

Cambridge, Massachusetts

CLIFFORD STEIN

New York, New York

June, 2021

1 To understand many of the ways in which algorithms influence our daily lives, see the book by Fry [162].

Part I Foundations

Introduction

When you design and analyze algorithms, you need to be able to

describe how they operate and how to design them. You also need some

mathematical tools to show that your algorithms do the right thing and

do it efficiently. This part will get you started. Later parts of this book

will build upon this base.

Chapter 1 provides an overview of algorithms and their place in modern computing systems. This chapter defines what an algorithm is

and lists some examples. It also makes a case for considering algorithms

as a technology, alongside technologies such as fast hardware, graphical

user interfaces, object-oriented systems, and networks.

In Chapter 2, we see our first algorithms, which solve the problem of sorting a sequence of n numbers. They are written in a pseudocode

which, although not directly translatable to any conventional

programming language, conveys the structure of the algorithm clearly

enough that you should be able to implement it in the language of your

choice. The sorting algorithms we examine are insertion sort, which uses

an incremental approach, and merge sort, which uses a recursive

technique known as “divide-and-conquer.” Although the time each

requires increases with the value of n, the rate of increase differs between the two algorithms. We determine these running times in

Chapter 2, and we develop a useful “asymptotic” notation to express them.
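The incremental approach of insertion sort can be previewed with a short sketch in Python (0-origin indexing, unlike the book's pseudocode; this is our illustration, not the book's INSERTION-SORT procedure verbatim):

```python
def insertion_sort(a):
    """Sort list a in place by growing a sorted prefix a[0..i-1]."""
    for i in range(1, len(a)):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift larger elements one slot to the right
            j -= 1
        a[j + 1] = key  # drop key into its correct position
    return a

print(insertion_sort([31, 41, 59, 26, 41, 58]))  # [26, 31, 41, 41, 58, 59]
```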

Chapter 3 precisely defines asymptotic notation. We’ll use

asymptotic notation to bound the growth of functions—most often,

functions that describe the running time of algorithms—from above and

below. The chapter starts by informally defining the most commonly

used asymptotic notations and giving an example of how to apply them.

It then formally defines five asymptotic notations and presents

conventions for how to put them together. The rest of Chapter 3 is primarily a presentation of mathematical notation, more to ensure that

your use of notation matches that in this book than to teach you new

mathematical concepts.

Chapter 4 delves further into the divide-and-conquer method

introduced in Chapter 2. It provides two additional examples of divide-and-conquer algorithms for multiplying square matrices, including

Strassen’s surprising method. Chapter 4 contains methods for solving recurrences, which are useful for describing the running times of

recursive algorithms. In the substitution method, you guess an answer

and prove it correct. Recursion trees provide one way to generate a

guess. Chapter 4 also presents the powerful technique of the “master method,” which you can often use to solve recurrences that arise from

divide-and-conquer algorithms. Although the chapter provides a proof

of a foundational theorem on which the master theorem depends, you

should feel free to employ the master method without delving into the

proof. Chapter 4 concludes with some advanced topics.
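For instance, merge sort's running time satisfies a recurrence of the form T(n) = T(⌊n/2⌋) + T(⌈n/2⌉) + n. A small numerical check (our own sketch, with an assumed base case T(1) = 1) is consistent with the Θ(n lg n) solution that the master method yields:

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def T(n):
    """Merge-sort-style recurrence; T(1) = 1 is an assumed base case."""
    if n <= 1:
        return 1
    return T(n // 2) + T(n - n // 2) + n  # floor and ceiling halves plus n

for n in [2 ** k for k in range(2, 11)]:
    print(n, T(n) / (n * math.log2(n)))  # ratio drifts down toward 1
```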

Chapter 5 introduces probabilistic analysis and randomized

algorithms. You typically use probabilistic analysis to determine the

running time of an algorithm in cases in which, due to the presence of

an inherent probability distribution, the running time may differ on

different inputs of the same size. In some cases, you might assume that

the inputs conform to a known probability distribution, so that you are

averaging the running time over all possible inputs. In other cases, the

probability distribution comes not from the inputs but from random

choices made during the course of the algorithm. An algorithm whose

behavior is determined not only by its input but by the values produced

by a random-number generator is a randomized algorithm. You can use

randomized algorithms to enforce a probability distribution on the

inputs—thereby ensuring that no particular input always causes poor

performance—or even to bound the error rate of algorithms that are

allowed to produce incorrect results on a limited basis.
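As a tiny sketch of that first idea (ours, not a procedure from the book), randomly permuting the input before running any deterministic algorithm imposes a uniform distribution on input orderings, so no fixed ordering is always a worst case:

```python
import random

def permute_then_run(a, algorithm):
    """Shuffle a copy of the input, then run the given algorithm on it."""
    a = list(a)
    random.shuffle(a)  # uniform random permutation of the copy
    # The randomness now comes from the algorithm's own choices,
    # not from whatever ordering the input happened to arrive in.
    return algorithm(a)

result = permute_then_run([31, 41, 59, 26, 41, 58], sorted)
```

The answer an order-insensitive algorithm produces is unaffected by the shuffle; only the distribution of its running time changes.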

Appendices A–D contain other mathematical material that you will find helpful as you read this book. You might have seen much of the

material in the appendix chapters before having read this book

(although the specific definitions and notational conventions we use

may differ in some cases from what you have seen in the past), and so

you should think of the appendices as reference material. On the other

hand, you probably have not already seen most of the material in Part I.

All the chapters in Part I and the appendices are written with a tutorial

flavor.

1 The Role of Algorithms in Computing

What are algorithms? Why is the study of algorithms worthwhile? What

is the role of algorithms relative to other technologies used in

computers? This chapter will answer these questions.

1.1 Algorithms

Informally, an algorithm is any well-defined computational procedure

that takes some value, or set of values, as input and produces some value, or set of values, as output in a finite amount of time. An

algorithm is thus a sequence of computational steps that transform the

input into the output.

You can also view an algorithm as a tool for solving a well-specified

computational problem. The statement of the problem specifies in

general terms the desired input/output relationship for problem

instances, typically of arbitrarily large size. The algorithm describes a

specific computational procedure for achieving that input/output

relationship for all problem instances.

As an example, suppose that you need to sort a sequence of numbers

into monotonically increasing order. This problem arises frequently in

practice and provides fertile ground for introducing many standard

design techniques and analysis tools. Here is how we formally define the

sorting problem:

Input: A sequence of n numbers 〈a₁, a₂, … , aₙ〉.

Output: A permutation (reordering) 〈a′₁, a′₂, … , a′ₙ〉 of the input sequence such that a′₁ ≤ a′₂ ≤ ⋯ ≤ a′ₙ.

Thus, given the input sequence 〈31, 41, 59, 26, 41, 58〉, a correct sorting

algorithm returns as output the sequence 〈26, 31, 41, 41, 58, 59〉. Such

an input sequence is called an instance of the sorting problem. In

general, an instance of a problem¹ consists of the input (satisfying whatever constraints are imposed in the problem statement) needed to

compute a solution to the problem.
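The specification above can be restated as executable checks, using Python's built-in sorted as a stand-in for the sorting algorithms developed later in the book:

```python
# The sample instance from the text.
instance = [31, 41, 59, 26, 41, 58]
result = sorted(instance)
print(result)  # [26, 31, 41, 41, 58, 59]

# A correct output must be a permutation of the input...
assert sorted(result) == sorted(instance)
# ...arranged in monotonically increasing (nondecreasing) order.
assert all(result[i] <= result[i + 1] for i in range(len(result) - 1))
```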

Because many programs use it as an intermediate step, sorting is a

fundamental operation in computer science. As a result, you have a

large number of good sorting algorithms at your disposal. Which

algorithm is best for a given application depends on—among other

factors—the number of items to be sorted, the extent to which the items

are already somewhat sorted, possible restrictions on the item values,

the architecture of the computer, and the kind of storage devices to be

used: main memory, disks, or even—archaically—tapes.

An algorithm for a computational problem is correct if, for every

problem instance provided as input, it halts—finishes its computing in

finite time—and outputs the correct solution to the problem instance. A

correct algorithm solves the given computational problem. An incorrect

algorithm might not halt at all on some input instances, or it might halt

with an incorrect answer. Contrary to what you might expect, incorrect

algorithms can sometimes be useful, if you can control their error rate.

We’ll see an example of an algorithm with a controllable error rate in

Chapter 31 when we study algorithms for finding large prime numbers.

Ordinarily, however, we’ll concern ourselves only with correct

algorithms.

An algorithm can be specified in English, as a computer program, or

even as a hardware design. The only requirement is that the specification

must provide a precise description of the computational procedure to be

followed.

What kinds of problems are solved by algorithms?

Sorting is by no means the only computational problem for which algorithms have been developed. (You probably suspected as much

when you saw the size of this book.) Practical applications of algorithms

are ubiquitous and include the following examples:

The Human Genome Project has made great progress toward the

goals of identifying all the roughly 30,000 genes in human DNA,

determining the sequences of the roughly 3 billion chemical base

pairs that make up human DNA, storing this information in

databases, and developing tools for data analysis. Each of these

steps requires sophisticated algorithms. Although the solutions to

the various problems involved are beyond the scope of this book,

many methods to solve these biological problems use ideas

presented here, enabling scientists to accomplish tasks while using

resources efficiently. Dynamic programming, as in Chapter 14, is

an important technique for solving several of these biological

problems, particularly ones that involve determining similarity

between DNA sequences. The savings realized are in time, both

human and machine, and in money, as more information can be

extracted by laboratory techniques.

The internet enables people all around the world to quickly access

and retrieve large amounts of information. With the aid of clever

algorithms, sites on the internet are able to manage and

manipulate this large volume of data. Examples of problems that

make essential use of algorithms include finding good routes on

which the data travels (techniques for solving such problems

appear in Chapter 22), and using a search engine to quickly find

pages on which particular information resides (related techniques

are in Chapters 11 and 32).

Electronic commerce enables goods and services to be negotiated

and exchanged electronically, and it depends on the privacy of

personal information such as credit card numbers, passwords, and

bank statements. The core technologies used in electronic

commerce include public-key cryptography and digital signatures

(covered in Chapter 31), which are based on numerical algorithms and number theory.

Manufacturing and other commercial enterprises often need to

allocate scarce resources in the most beneficial way. An oil

company might wish to know where to place its wells in order to

maximize its expected profit. A political candidate might want to

determine where to spend money buying campaign advertising in

order to maximize the chances of winning an election. An airline

might wish to assign crews to flights in the least expensive way

possible, making sure that each flight is covered and that

government regulations regarding crew scheduling are met. An

internet service provider might wish to determine where to place

additional resources in order to serve its customers more

effectively. All of these are examples of problems that can be

solved by modeling them as linear programs, which Chapter 29

explores.

Although some of the details of these examples are beyond the scope

of this book, we do give underlying techniques that apply to these

problems and problem areas. We also show how to solve many specific

problems, including the following:

You have a road map on which the distance between each pair of

adjacent intersections is marked, and you wish to determine the

shortest route from one intersection to another. The number of

possible routes can be huge, even if you disallow routes that cross

over themselves. How can you choose which of all possible routes

is the shortest? You can start by modeling the road map (which is

itself a model of the actual roads) as a graph (which we will meet

in Part VI and Appendix B). In this graph, you wish to find the shortest path from one vertex to another. Chapter 22 shows how

to solve this problem efficiently.
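One classical technique for this problem is Dijkstra's algorithm with a priority queue, sketched here on an invented road map (Chapter 22 develops shortest-path algorithms properly):

```python
import heapq

def shortest_distance(graph, source, target):
    """Dijkstra's algorithm. graph maps each vertex to a list of
    (neighbor, nonnegative edge weight) pairs."""
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == target:
            return d
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry; a shorter path was already found
        for v, w in graph.get(u, []):
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return float("inf")  # target unreachable

# A tiny invented map: intersections a..d with marked distances.
road_map = {
    "a": [("b", 4), ("c", 1)],
    "c": [("b", 2), ("d", 5)],
    "b": [("d", 1)],
}
print(shortest_distance(road_map, "a", "d"))  # 4, via a -> c -> b -> d
```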

Given a mechanical design in terms of a library of parts, where

each part may include instances of other parts, list the parts in

order so that each part appears before any part that uses it. If the

design comprises n parts, then there are n! possible orders, where

n! denotes the factorial function. Because the factorial function grows faster than even an exponential function, you cannot

feasibly generate each possible order and then verify that, within

that order, each part appears before the parts using it (unless you

have only a few parts). This problem is an instance of topological

sorting, and Chapter 20 shows how to solve this problem

efficiently.
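One efficient approach repeatedly emits a part all of whose contained parts have already been emitted (Kahn's algorithm; Chapter 20's treatment uses depth-first search instead). A sketch on an invented parts library:

```python
from collections import deque

def topological_order(contains):
    """contains maps each part to the parts it directly includes.
    Returns an order in which every part appears before any part that
    uses it, assuming the design is acyclic."""
    parts = set(contains)
    for deps in contains.values():
        parts.update(deps)
    users = {p: [] for p in parts}    # part -> parts that use it
    indegree = {p: 0 for p in parts}  # unemitted contained parts
    for part, deps in contains.items():
        for d in deps:
            users[d].append(part)
            indegree[part] += 1
    ready = deque(p for p in parts if indegree[p] == 0)
    order = []
    while ready:
        p = ready.popleft()
        order.append(p)
        for u in users[p]:
            indegree[u] -= 1
            if indegree[u] == 0:
                ready.append(u)
    return order

# Invented design: a wheel uses spokes and a rim; a bicycle uses wheels.
design = {"bicycle": ["wheel", "frame"], "wheel": ["spoke", "rim"]}
order = topological_order(design)
print(order)  # one valid order, e.g. ['frame', 'rim', 'spoke', 'wheel', 'bicycle']
```

Each part and each containment relation is examined once, so the running time is linear in the size of the design, not n!.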

A doctor needs to determine whether an image represents a

cancerous tumor or a benign one. The doctor has available images

of many other tumors, some of which are known to be cancerous

and some of which are known to be benign. A cancerous tumor is

likely to be more similar to other cancerous tumors than to

benign tumors, and a benign tumor is more likely to be similar to

other benign tumors. By using a clustering algorithm, as in

Chapter 33, the doctor can identify which outcome is more likely.

You need to compress a large file containing text so that it

occupies less space. Many ways to do so are known, including

“LZW compression,” which looks for repeating character

sequences. Chapter 15 studies a different approach, “Huffman

coding,” which encodes characters by bit sequences of various

lengths, with characters occurring more frequently encoded by

shorter bit sequences.
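The greedy idea behind Huffman coding — repeatedly merge the two least frequent subtrees until one tree remains — can be sketched with a heap. The frequencies below are illustrative; Chapter 15 gives the real construction and its proof of optimality:

```python
import heapq

def huffman_code_lengths(freq):
    """Return the code length in bits for each character, given a dict
    of character frequencies. Frequent characters get shorter codes."""
    # Heap entries: (total frequency, unique tiebreaker, chars in subtree).
    heap = [(f, i, {c}) for i, (c, f) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    depth = {c: 0 for c in freq}
    tie = len(heap)
    while len(heap) > 1:
        f1, _, s1 = heapq.heappop(heap)  # two least frequent subtrees
        f2, _, s2 = heapq.heappop(heap)
        for c in s1 | s2:    # every char in the merged subtree
            depth[c] += 1    # sinks one level deeper in the code tree
        heapq.heappush(heap, (f1 + f2, tie, s1 | s2))
        tie += 1
    return depth

lengths = huffman_code_lengths({"a": 45, "b": 13, "c": 12,
                                "d": 16, "e": 9, "f": 5})
print(lengths["a"], lengths["f"])  # 1 4
```

The most frequent character ("a") ends up with a 1-bit code and the least frequent ("f") with a 4-bit code.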

These lists are far from exhaustive (as you again have probably

surmised from this book’s heft), but they exhibit two characteristics

common to many interesting algorithmic problems:

1. They have many candidate solutions, the overwhelming majority

of which do not solve the problem at hand. Finding one that

does, or one that is “best,” without explicitly examining each

possible solution, can present quite a challenge.

2. They have practical applications. Of the problems in the above

list, finding the shortest path provides the easiest examples. A

transportation firm, such as a trucking or railroad company, has

a financial interest in finding shortest paths through a road or

rail network because taking shorter paths results in lower labor

and fuel costs. Or a routing node on the internet might need to

find the shortest path through the network in order to route a

message quickly. Or a person wishing to drive from New York to

Boston might want to find driving directions using a navigation

app.

Not every problem solved by algorithms has an easily identified set

of candidate solutions. For example, given a set of numerical values

representing samples of a signal taken at regular time intervals, the

discrete Fourier transform converts the time domain to the frequency

domain. That is, it approximates the signal as a weighted sum of

sinusoids, producing the strength of various frequencies which, when

summed, approximate the sampled signal. In addition to lying at the

heart of signal processing, discrete Fourier transforms have applications

in data compression and multiplying large polynomials and integers.

Chapter 30 gives an efficient algorithm, the fast Fourier transform (commonly called the FFT), for this problem. The chapter also sketches

out the design of a hardware FFT circuit.
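A direct implementation of the discrete Fourier transform takes Θ(n²) operations, which is exactly what the FFT improves to O(n lg n). A naive sketch, applied to a pure sinusoid so that the spectrum is easy to read:

```python
import cmath
import math

def dft(samples):
    """Naive discrete Fourier transform: Theta(n^2) complex
    multiplications. The FFT computes the same result in O(n lg n)."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))
            for k in range(n)]

# A pure sinusoid at frequency 1, sampled 8 times over one period,
# shows up as spikes in bins 1 and n-1 (the conjugate frequency).
signal = [math.sin(2 * math.pi * t / 8) for t in range(8)]
strengths = [round(abs(x), 6) for x in dft(signal)]
print(strengths)  # [0.0, 4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 4.0]
```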

Data structures

This book also presents several data structures. A data structure is a way

to store and organize data in order to facilitate access and

modifications. Using the appropriate data structure or structures is an

important part of algorithm design. No single data structure works well

for all purposes, and so you should know the strengths and limitations

of several of them.

Technique

Although you can use this book as a “cookbook” for algorithms, you

might someday encounter a problem for which you cannot readily find a

published algorithm (many of the exercises and problems in this book,

for example). This book will teach you techniques of algorithm design

and analysis so that you can develop algorithms on your own, show that

they give the correct answer, and analyze their efficiency. Different

chapters address different aspects of algorithmic problem solving. Some chapters address specific problems, such as finding medians and order

statistics in Chapter 9, computing minimum spanning trees in Chapter

21, and determining a maximum flow in a network in Chapter 24. Other

chapters introduce techniques, such as divide-and-conquer in Chapters

2 and 4, dynamic programming in Chapter 14, and amortized analysis

in Chapter 16.

Hard problems

Most of this book is about efficient algorithms. Our usual measure of

efficiency is speed: how long does an algorithm take to produce its

result? There are some problems, however, for which we know of no

algorithm that runs in a reasonable amount of time. Chapter 34 studies an interesting subset of these problems, which are known as NP-complete.

Why are NP-complete problems interesting? First, although no

efficient algorithm for an NP-complete problem has ever been found,

nobody has ever proven that an efficient algorithm for one cannot exist.

In other words, no one knows whether efficient algorithms exist for NP-

complete problems. Second, the set of NP-complete problems has the

remarkable property that if an efficient algorithm exists for any one of

them, then efficient algorithms exist for all of them. This relationship

among the NP-complete problems makes the lack of efficient solutions

all the more tantalizing. Third, several NP-complete problems are

similar, but not identical, to problems for which we do know of efficient

algorithms. Computer scientists are intrigued by how a small change to

the problem statement can cause a big change to the efficiency of the

best known algorithm.

You should know about NP-complete problems because some of

them arise surprisingly often in real applications. If you are called upon

to produce an efficient algorithm for an NP-complete problem, you are

likely to spend a lot of time in a fruitless search. If, instead, you can show that the problem is NP-complete, you can spend your time

developing an efficient approximation algorithm, that is, an algorithm

that gives a good, but not necessarily the best possible, solution.

As a concrete example, consider a delivery company with a central

depot. Each day, it loads up delivery trucks at the depot and sends them

around to deliver goods to several addresses. At the end of the day, each

truck must end up back at the depot so that it is ready to be loaded for

the next day. To reduce costs, the company wants to select an order of

delivery stops that yields the lowest overall distance traveled by each

truck. This problem is the well-known “traveling-salesperson problem,”

and it is NP-complete.² It has no known efficient algorithm. Under certain assumptions, however, we know of efficient algorithms that

compute overall distances close to the smallest possible. Chapter 35

discusses such “approximation algorithms.”
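Even a tiny instance makes the difficulty concrete: brute force must examine every ordering of the stops, so it is feasible only for a handful of addresses. A sketch with invented distances:

```python
from itertools import permutations

def best_tour(depot, stops, dist):
    """Try every order of stops (there are len(stops)! of them) and
    return the minimum round-trip distance starting and ending at the
    depot. Feasible only for very small inputs."""
    best = float("inf")
    for order in permutations(stops):
        route = (depot,) + order + (depot,)
        total = sum(dist[a][b] for a, b in zip(route, route[1:]))
        best = min(best, total)
    return best

# Invented symmetric distances among a depot and three addresses.
dist = {
    "depot": {"x": 2, "y": 9, "z": 10},
    "x": {"depot": 2, "y": 6, "z": 4},
    "y": {"depot": 9, "x": 6, "z": 3},
    "z": {"depot": 10, "x": 4, "y": 3},
}
print(best_tour("depot", ["x", "y", "z"], dist))  # 18
```

With 3 stops there are only 3! = 6 orderings; with 20 stops there are already more than 2 × 10¹⁸, which is why approximation algorithms matter.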

Alternative computing models

For many years, we could count on processor clock speeds increasing at

a steady rate. Physical limitations present a fundamental roadblock to

ever-increasing clock speeds, however: because power density increases

superlinearly with clock speed, chips run the risk of melting once their

clock speeds become high enough. In order to perform more

computations per second, therefore, chips are being designed to contain

not just one but several processing “cores.” We can liken these multicore

computers to several sequential computers on a single chip. In other

words, they are a type of “parallel computer.” In order to elicit the best

performance from multicore computers, we need to design algorithms

with parallelism in mind. Chapter 26 presents a model for “task-parallel” algorithms, which take advantage of multiple processing cores.

This model has advantages from both theoretical and practical

standpoints, and many modern parallel-programming platforms

embrace something similar to this model of parallelism.

Most of the examples in this book assume that all of the input data

are available when an algorithm begins running. Much of the work in

algorithm design makes the same assumption. For many important real-

world examples, however, the input actually arrives over time, and the

algorithm must decide how to proceed without knowing what data will

arrive in the future. In a data center, jobs are constantly arriving and

departing, and a scheduling algorithm must decide when and where to

run a job, without knowing what jobs will be arriving in the future.

Traffic must be routed in the internet based on the current state, without

knowing about where traffic will arrive in the future. Hospital

emergency rooms make triage decisions about which patients to treat

first without knowing when other patients will be arriving in the future

and what treatments they will need. Algorithms that receive their input

over time, rather than having all the input present at the start, are online

algorithms, which Chapter 27 examines.

Exercises

1.1-1

Describe your own real-world example that requires sorting. Describe

one that requires finding the shortest distance between two points.

1.1-2

Other than speed, what other measures of efficiency might you need to

consider in a real-world setting?

1.1-3

Select a data structure that you have seen, and discuss its strengths and

limitations.

1.1-4

How are the shortest-path and traveling-salesperson problems given

above similar? How are they different?

1.1-5

Suggest a real-world problem in which only the best solution will do.

Then come up with one in which “approximately” the best solution is

good enough.

1.1-6

Describe a real-world problem in which sometimes the entire input is

available before you need to solve the problem, but other times the input

is not entirely available in advance and arrives over time.

1.2 Algorithms as a technology

If computers were infinitely fast and computer memory were free, would

you have any reason to study algorithms? The answer is yes, if for no

other reason than that you would still like to be certain that your

solution method terminates and does so with the correct answer.

If computers were infinitely fast, any correct method for solving a

problem would do. You would probably want your implementation to

be within the bounds of good software engineering practice (for

example, your implementation should be well designed and

documented), but you would most often use whichever method was the

easiest to implement.

Of course, computers may be fast, but they are not infinitely fast.

Computing time is therefore a bounded resource, which makes it

precious. Although the saying goes, “Time is money,” time is even more

valuable than money: you can get back money after you spend it, but

once time is spent, you can never get it back. Memory may be

inexpensive, but it is neither infinite nor free. You should choose

algorithms that use the resources of time and space efficiently.

Efficiency

Different algorithms devised to solve the same problem often differ

dramatically in their efficiency. These differences can be much more

significant than differences due to hardware and software.

As an example, Chapter 2 introduces two algorithms for sorting. The

first, known as insertion sort, takes time roughly equal to c₁n² to sort n items, where c₁ is a constant that does not depend on n. That is, it takes time roughly proportional to n². The second, merge sort, takes time roughly equal to c₂n lg n, where lg n stands for log₂ n and c₂ is another constant that also does not depend on n. Insertion sort typically has a smaller constant factor than merge sort, so that c₁ < c₂. We'll see that the constant factors can have far less of an impact on the running time than the dependence on the input size n. Let's write insertion sort's running time as c₁n · n and merge sort's running time as c₂n · lg n. Then


we see that where insertion sort has a factor of n in its running time, merge sort has a factor of lg n, which is much smaller. For example, when n is 1000, lg n is approximately 10, and when n is 1,000,000, lg n is approximately only 20. Although insertion sort usually runs faster than

merge sort for small input sizes, once the input size n becomes large enough, merge sort’s advantage of lg n versus n more than compensates

for the difference in constant factors. No matter how much smaller c₁ is than c₂, there is always a crossover point beyond which merge sort is

faster.

For a concrete example, let us pit a faster computer (computer A)

running insertion sort against a slower computer (computer B) running

merge sort. They each must sort an array of 10 million numbers.

(Although 10 million numbers might seem like a lot, if the numbers are

eight-byte integers, then the input occupies about 80 megabytes, which

fits in the memory of even an inexpensive laptop computer many times

over.) Suppose that computer A executes 10 billion instructions per

second (faster than any single sequential computer at the time of this

writing) and computer B executes only 10 million instructions per

second (much slower than most contemporary computers), so that

computer A is 1000 times faster than computer B in raw computing

power. To make the difference even more dramatic, suppose that the

world’s craftiest programmer codes insertion sort in machine language

for computer A, and the resulting code requires 2n² instructions to sort n numbers. Suppose further that just an average programmer implements merge sort, using a high-level language with an inefficient compiler, with the resulting code taking 50n lg n instructions. To sort 10

million numbers, computer A takes

(2 · (10⁷)² instructions) / (10¹⁰ instructions/second) = 20,000 seconds (more than 5.5 hours),

while computer B takes

(50 · 10⁷ · lg 10⁷ instructions) / (10⁷ instructions/second) ≈ 1163 seconds (under 20 minutes).

By using an algorithm whose running time grows more slowly, even with a poor compiler, computer B runs more than 17 times faster than

computer A! The advantage of merge sort is even more pronounced

when sorting 100 million numbers: where insertion sort takes more than

23 days, merge sort takes under four hours. Although 100 million might

seem like a large number, there are more than 100 million web searches

every half hour, more than 100 million emails sent every minute, and

some of the smallest galaxies (known as ultra-compact dwarf galaxies)

contain about 100 million stars. In general, as the problem size

increases, so does the relative advantage of merge sort.
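The arithmetic behind this comparison is easy to replicate, using the instruction counts and machine speeds assumed above:

```python
import math

n = 10**7         # 10 million numbers
speed_A = 10**10  # instructions per second, computer A
speed_B = 10**7   # instructions per second, computer B

time_A = 2 * n**2 / speed_A              # insertion sort, tight machine code
time_B = 50 * n * math.log2(n) / speed_B  # merge sort, inefficient compiler

print(round(time_A), "seconds")   # 20000 seconds (more than 5.5 hours)
print(round(time_B), "seconds")   # about 1163 seconds (under 20 minutes)
print(round(time_A / time_B, 1))  # merge sort wins by a factor of about 17
```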

Algorithms and other technologies

The example above shows that you should consider algorithms, like

computer hardware, as a technology. Total system performance depends

on choosing efficient algorithms as much as on choosing fast hardware.

Just as rapid advances are being made in other computer technologies,

they are being made in algorithms as well.

You might wonder whether algorithms are truly that important on

contemporary computers in light of other advanced technologies, such

as

advanced computer architectures and fabrication technologies,

easy-to-use, intuitive, graphical user interfaces (GUIs),

object-oriented systems,

integrated web technologies,

fast networking, both wired and wireless,

machine learning,

and mobile devices.

The answer is yes. Although some applications do not explicitly require

algorithmic content at the application level (such as some simple, web-

based applications), many do. For example, consider a web-based

service that determines how to travel from one location to another. Its

implementation would rely on fast hardware, a graphical user interface,

wide-area networking, and also possibly on object orientation. It would also require algorithms for operations such as finding routes (probably

using a shortest-path algorithm), rendering maps, and interpolating

addresses.

Moreover, even an application that does not require algorithmic

content at the application level relies heavily upon algorithms. Does the

application rely on fast hardware? The hardware design used

algorithms. Does the application rely on graphical user interfaces? The

design of any GUI relies on algorithms. Does the application rely on

networking? Routing in networks relies heavily on algorithms. Was the

application written in a language other than machine code? Then it was

processed by a compiler, interpreter, or assembler, all of which make

extensive use of algorithms. Algorithms are at the core of most

technologies used in contemporary computers.

Machine learning can be thought of as a method for performing

algorithmic tasks without explicitly designing an algorithm, but instead

inferring patterns from data and thereby automatically learning a

solution. At first glance, machine learning, which automates the process

of algorithmic design, may seem to make learning about algorithms

obsolete. The opposite is true, however. Machine learning is itself a

collection of algorithms, just under a different name. Furthermore, it

currently seems that the successes of machine learning are mainly for

problems for which we, as humans, do not really understand what the

right algorithm is. Prominent examples include computer vision and

automatic language translation. For algorithmic problems that humans

understand well, such as most of the problems in this book, efficient

algorithms designed to solve a specific problem are typically more

successful than machine-learning approaches.

Data science is an interdisciplinary field with the goal of extracting

knowledge and insights from structured and unstructured data. Data

science uses methods from statistics, computer science, and

optimization. The design and analysis of algorithms is fundamental to

the field. The core techniques of data science, which overlap significantly

with those in machine learning, include many of the algorithms in this

book.

Furthermore, with the ever-increasing capacities of computers, we use them to solve larger problems than ever before. As we saw in the

above comparison between insertion sort and merge sort, it is at larger

problem sizes that the differences in efficiency between algorithms

become particularly prominent.

Having a solid base of algorithmic knowledge and technique is one

characteristic that defines the truly skilled programmer. With modern

computing technology, you can accomplish some tasks without

knowing much about algorithms, but with a good background in

algorithms, you can do much, much more.

Exercises

1.2-1

Give an example of an application that requires algorithmic content at

the application level, and discuss the function of the algorithms

involved.

1.2-2

Suppose that for inputs of size n on a particular computer, insertion sort

runs in 8n² steps and merge sort runs in 64n lg n steps. For which values of n does insertion sort beat merge sort?

1.2-3

What is the smallest value of n such that an algorithm whose running

time is 100n² runs faster than an algorithm whose running time is 2ⁿ on the same machine?

Problems

1-1 Comparison of running times

For each function f(n) and time t in the following table, determine the largest size n of a problem that can be solved in time t, assuming that the algorithm to solve the problem takes f(n) microseconds.

[Table for Problem 1-1: rows list the functions lg n, √n, n, n lg n, n², n³, 2ⁿ, and n!; columns list the times 1 second, 1 minute, 1 hour, 1 day, 1 month, 1 year, and 1 century.]

Chapter notes

There are many excellent texts on the general topic of algorithms,

including those by Aho, Hopcroft, and Ullman [5, 6], Dasgupta, Papadimitriou, and Vazirani [107], Edmonds [133], Erickson [135], Goodrich and Tamassia [195, 196], Kleinberg and Tardos [257], Knuth

[259, 260, 261, 262, 263], Levitin [298], Louridas [305], Mehlhorn and Sanders [325], Mitzenmacher and Upfal [331], Neapolitan [342], Roughgarden [385, 386, 387, 388], Sanders, Mehlhorn, Dietzfelbinger, and Dementiev [393], Sedgewick and Wayne [402], Skiena [414], Soltys-Kulinicz [419], Wilf [455], and Williamson and Shmoys [459]. Some of the more practical aspects of algorithm design are discussed by Bentley

[49, 50, 51], Bhargava [54], Kochenderfer and Wheeler [268], and McGeoch [321]. Surveys of the field of algorithms can also be found in books by Atallah and Blanton [27, 28] and Mehta and Sahni [326]. For less technical material, see the books by Christian and Griffiths [92], Cormen [104], Erwig [136], MacCormick [307], and Vöcking et al. [448].

Overviews of the algorithms used in computational biology can be

found in books by Jones and Pevzner [240], Elloumi and Zomaya [134], and Marchisio [315].

1 Sometimes, when the problem context is known, problem instances are themselves simply called “problems.”

2 To be precise, only decision problems—those with a “yes/no” answer—can be NP-complete.

The decision version of the traveling salesperson problem asks whether there exists an order of stops whose distance totals at most a given amount.


2 Getting Started

This chapter will familiarize you with the framework we’ll use

throughout the book to think about the design and analysis of

algorithms. It is self-contained, but it does include several references to

material that will be introduced in Chapters 3 and 4. (It also contains several summations, which Appendix A shows how to solve.)

We’ll begin by examining the insertion sort algorithm to solve the

sorting problem introduced in Chapter 1. We’ll specify algorithms using a pseudocode that should be understandable to you if you have done

computer programming. We’ll see why insertion sort correctly sorts and

analyze its running time. The analysis introduces a notation that

describes how running time increases with the number of items to be

sorted. Following a discussion of insertion sort, we’ll use a method

called divide-and-conquer to develop a sorting algorithm called merge

sort. We’ll end with an analysis of merge sort’s running time.

2.1 Insertion sort

Our first algorithm, insertion sort, solves the sorting problem introduced

in Chapter 1:

Input: A sequence of n numbers 〈a₁, a₂, … , aₙ〉.

Output: A permutation (reordering) 〈a′₁, a′₂, … , a′ₙ〉 of the input sequence such that a′₁ ≤ a′₂ ≤ ⋯ ≤ a′ₙ.

The numbers to be sorted are also known as the keys. Although the problem is conceptually about sorting a sequence, the input comes in

the form of an array with n elements. When we want to sort numbers,

it’s often because they are the keys associated with other data, which we

call satellite data. Together, a key and satellite data form a record. For example, consider a spreadsheet containing student records with many

associated pieces of data such as age, grade-point average, and number

of courses taken. Any one of these quantities could be a key, but when

the spreadsheet sorts, it moves the associated record (the satellite data)

with the key. When describing a sorting algorithm, we focus on the keys,

but it is important to remember that there usually is associated satellite

data.

In this book, we’ll typically describe algorithms as procedures

written in a pseudocode that is similar in many respects to C, C++, Java,

Python,¹ or JavaScript. (Apologies if we’ve omitted your favorite programming language. We can’t list them all.) If you have been

introduced to any of these languages, you should have little trouble

understanding algorithms “coded” in pseudocode. What separates

pseudocode from real code is that in pseudocode, we employ whatever

expressive method is most clear and concise to specify a given

algorithm. Sometimes the clearest method is English, so do not be

surprised if you come across an English phrase or sentence embedded

within a section that looks more like real code. Another difference

between pseudocode and real code is that pseudocode often ignores

aspects of software engineering—such as data abstraction, modularity,

and error handling—in order to convey the essence of the algorithm

more concisely.

We start with insertion sort, which is an efficient algorithm for

sorting a small number of elements. Insertion sort works the way you

might sort a hand of playing cards. Start with an empty left hand and

the cards in a pile on the table. Pick up the first card in the pile and hold

it with your left hand. Then, with your right hand, remove one card at a

time from the pile, and insert it into the correct position in your left

hand. As Figure 2.1 illustrates, you find the correct position for a card by comparing it with each of the cards already in your left hand,

starting at the right and moving left. As soon as you see a card in your

left hand whose value is less than or equal to the card you’re holding in

your right hand, insert the card that you’re holding in your right hand

just to the right of this card in your left hand. If all the cards in your left

hand have values greater than the card in your right hand, then place

this card as the leftmost card in your left hand. At all times, the cards

held in your left hand are sorted, and these cards were originally the top

cards of the pile on the table.

The pseudocode for insertion sort is given as the procedure

INSERTION-SORT on the facing page. It takes two parameters: an

array A containing the values to be sorted and the number n of values to sort. The values occupy positions A[1] through A[n] of the array, which we denote by A[1 : n]. When the INSERTION-SORT procedure is

finished, array A[1 : n] contains the original values, but in sorted order.

Figure 2.1 Sorting a hand of cards using insertion sort.

INSERTION-SORT(A, n)

1  for i = 2 to n
2      key = A[i]
3      // Insert A[i] into the sorted subarray A[1 : i – 1].
4      j = i – 1
5      while j > 0 and A[j] > key
6          A[j + 1] = A[j]
7          j = j – 1
8      A[j + 1] = key
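For readers who want to run the procedure, here is a direct (unofficial) Python rendering. Python uses 0-origin indexing, so the pseudocode's indices 1 through n become 0 through n – 1, and the test j > 0 on line 5 becomes j >= 0:

```python
def insertion_sort(A):
    """Sort the list A in place, mirroring INSERTION-SORT (0-origin)."""
    for i in range(1, len(A)):        # pseudocode line 1: for i = 2 to n
        key = A[i]                    # line 2
        # Insert A[i] into the sorted subarray A[0 : i - 1] (line 3).
        j = i - 1                     # line 4
        while j >= 0 and A[j] > key:  # line 5: j > 0 becomes j >= 0
            A[j + 1] = A[j]           # line 6: shift the larger value right
            j = j - 1                 # line 7
        A[j + 1] = key                # line 8: drop the key into place

A = [5, 2, 4, 6, 1, 3]
insertion_sort(A)
print(A)   # [1, 2, 3, 4, 5, 6]
```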

Loop invariants and the correctness of insertion sort

Figure 2.2 shows how this algorithm works for an array A that starts out with the sequence 〈5, 2, 4, 6, 1, 3〉. The index i indicates the “current

card” being inserted into the hand. At the beginning of each iteration of

the for loop, which is indexed by i, the subarray (a contiguous portion of the array) consisting of elements A[1 : i – 1] (that is, A[1] through A[i – 1]) constitutes the currently sorted hand, and the remaining subarray A[i + 1 : n] (elements A[i + 1] through A[n]) corresponds to the pile of cards still on the table. In fact, elements A[1 : i – 1] are the elements originally in positions 1 through i – 1, but now in sorted order. We state these properties of A[1 : i – 1] formally as a loop invariant:

Figure 2.2 The operation of INSERTION-SORT( A, n), where A initially contains the sequence

〈5, 2, 4, 6, 1, 3〉 and n = 6. Array indices appear above the rectangles, and values stored in the array positions appear within the rectangles. (a)–(e) The iterations of the for loop of lines 1–8. In each iteration, the blue rectangle holds the key taken from A[ i], which is compared with the values in tan rectangles to its left in the test of line 5. Orange arrows show array values moved one position to the right in line 6, and blue arrows indicate where the key moves to in line 8. (f) The final sorted array.

At the start of each iteration of the for loop of lines 1–8, the

subarray A[1 : i – 1] consists of the elements originally in A[1 : i

– 1], but in sorted order.

Loop invariants help us understand why an algorithm is correct.

When you’re using a loop invariant, you need to show three things:

Initialization: It is true prior to the first iteration of the loop.

Maintenance: If it is true before an iteration of the loop, it remains true

before the next iteration.

Termination: The loop terminates, and when it terminates, the invariant

—usually along with the reason that the loop terminated—gives us a

useful property that helps show that the algorithm is correct.

When the first two properties hold, the loop invariant is true prior to

every iteration of the loop. (Of course, you are free to use established

facts other than the loop invariant itself to prove that the loop invariant

remains true before each iteration.) A loop-invariant proof is a form of

mathematical induction, where to prove that a property holds, you

prove a base case and an inductive step. Here, showing that the

invariant holds before the first iteration corresponds to the base case,

and showing that the invariant holds from iteration to iteration

corresponds to the inductive step.

The third property is perhaps the most important one, since you are

using the loop invariant to show correctness. Typically, you use the loop

invariant along with the condition that caused the loop to terminate.

Mathematical induction typically applies the inductive step infinitely,

but in a loop invariant the “induction” stops when the loop terminates.

Let’s see how these properties hold for insertion sort.

Initialization: We start by showing that the loop invariant holds before

the first loop iteration, when i = 2.2 The subarray A[1 : i – 1] consists of just the single element A[1], which is in fact the original element in

A[1]. Moreover, this subarray is sorted (after all, how could a subarray

with just one value not be sorted?), which shows that the loop

invariant holds prior to the first iteration of the loop.

Maintenance: Next, we tackle the second property: showing that each

iteration maintains the loop invariant. Informally, the body of the for

loop works by moving the values in A[ i – 1], A[ i – 2], A[ i – 3], and so on by one position to the right until it finds the proper position for

A[ i] (lines 4–7), at which point it inserts the value of A[ i] (line 8). The subarray A[1 : i] then consists of the elements originally in A[1 : i], but

in sorted order. Incrementing i (increasing its value by 1) for the next iteration of the for loop then preserves the loop invariant.

A more formal treatment of the second property would require us to

state and show a loop invariant for the while loop of lines 5–7. Let’s

not get bogged down in such formalism just yet. Instead, we’ll rely on

our informal analysis to show that the second property holds for the

outer loop.

Termination: Finally, we examine loop termination. The loop variable i

starts at 2 and increases by 1 in each iteration. Once i’s value exceeds n

in line 1, the loop terminates. That is, the loop terminates once i

equals n + 1. Substituting n + 1 for i in the wording of the loop invariant yields that the subarray A[1 : n] consists of the elements originally in A[1 : n], but in sorted order. Hence, the algorithm is correct.
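The three properties can also be checked mechanically on examples. As a sanity check rather than a proof, the sketch below (our own instrumentation, not the book's, written in Python with 0-origin indices) asserts the loop invariant at the start of every iteration and the sorted-output claim at termination:

```python
def insertion_sort_checked(A):
    """Insertion sort that asserts its loop invariant as it runs.

    Invariant (0-origin restatement): at the start of the iteration with
    index i, A[0 : i] holds the elements originally in the first i
    positions, in sorted order.
    """
    original = list(A)                 # snapshot of the input
    for i in range(1, len(A)):
        # Initialization/Maintenance: A[:i] is original[:i], sorted.
        assert A[:i] == sorted(original[:i])
        key = A[i]
        j = i - 1
        while j >= 0 and A[j] > key:
            A[j + 1] = A[j]
            j -= 1
        A[j + 1] = key
    # Termination: the invariant with i = n gives a fully sorted array.
    assert A == sorted(original)

data = [5, 2, 4, 6, 1, 3]
insertion_sort_checked(data)
print(data)   # [1, 2, 3, 4, 5, 6]
```

Passing these assertions on a few inputs does not prove correctness, but a failing assertion would pinpoint the first iteration at which the invariant breaks.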

This method of loop invariants is used to show correctness in various

places throughout this book.

Pseudocode conventions

We use the following conventions in our pseudocode.

Indentation indicates block structure. For example, the body of

the for loop that begins on line 1 consists of lines 2–8, and the

body of the while loop that begins on line 5 contains lines 6–7 but

not line 8. Our indentation style applies to if-else statements3 as

well. Using indentation instead of textual indicators of block

structure, such as begin and end statements or curly braces,

reduces clutter while preserving, or even enhancing, clarity.4

The looping constructs while, for, and repeat-until and the if-else

conditional construct have interpretations similar to those in C,

C++, Java, Python, and JavaScript.5 In this book, the loop

counter retains its value after the loop is exited, unlike some

situations that arise in C++ and Java. Thus, immediately after a

for loop, the loop counter’s value is the value that first exceeded

the for loop bound.6 We used this property in our correctness argument for insertion sort. The for loop header in line 1 is for i =

2 to n, and so when this loop terminates, i equals n + 1. We use the keyword to when a for loop increments its loop counter in each

iteration, and we use the keyword downto when a for loop

decrements its loop counter (reduces its value by 1 in each

iteration). When the loop counter changes by an amount greater

than 1, the amount of change follows the optional keyword by.
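One caution when carrying this convention into real code: Python's own for loop leaves its counter at the last value it took, not at the value that first exceeded the bound. A small sketch of the difference (our observation, not the book's):

```python
n = 6

# Python's for loop leaves the counter at its final in-range value, n:
for i in range(2, n + 1):
    pass
print(i)   # 6

# An explicit while loop reproduces the book's convention, where the
# counter ends at the value that first exceeded the bound:
i = 2
while i <= n:
    # loop body would go here
    i = i + 1
print(i)   # 7, i.e., n + 1
```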

The symbol “//” indicates that the remainder of the line is a

comment.

Variables (such as i, j, and key) are local to the given procedure.

We won’t use global variables without explicit indication.

We access array elements by specifying the array name followed

by the index in square brackets. For example, A[i] indicates the ith element of the array A.

Although many programming languages enforce 0-origin indexing

for arrays (0 is the smallest valid index), we choose whichever

indexing scheme is clearest for human readers to understand.

Because people usually start counting at 1, not 0, most—but not

all—of the arrays in this book use 1-origin indexing. To be clear

about whether a particular algorithm assumes 0-origin or 1-origin

indexing, we’ll specify the bounds of the arrays explicitly. If you

are implementing an algorithm that we specify using 1-origin

indexing, but you’re writing in a programming language that

enforces 0-origin indexing (such as C, C++, Java, Python, or

JavaScript), then give yourself credit for being able to adjust. You

can either always subtract 1 from each index or allocate each

array with one extra position and just ignore position 0.
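For instance, here is a hypothetical sketch of the second option in Python: pad position 0 so that the book's 1-origin pseudocode carries over almost verbatim.

```python
# Padding trick (our sketch): waste index 0 so that indices 1..n work
# exactly as in the 1-origin pseudocode.
values = [5, 2, 4, 6, 1, 3]
A = [None] + values        # A[1] through A[n] hold the data; A[0] is ignored
n = len(values)

for i in range(2, n + 1):  # for i = 2 to n, as in INSERTION-SORT
    key = A[i]
    j = i - 1
    while j > 0 and A[j] > key:   # the test j > 0 needs no adjustment now
        A[j + 1] = A[j]
        j = j - 1
    A[j + 1] = key

print(A[1:])   # [1, 2, 3, 4, 5, 6]
```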

The notation “:” denotes a subarray. Thus, A[i : j] indicates the subarray of A consisting of the elements A[i], A[i + 1], … , A[j].7

We also use this notation to indicate the bounds of an array, as we

did earlier when discussing the array A[1 : n].
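Note one pitfall when carrying this notation into code: the book's A[i : j] includes A[j], whereas Python's slice A[i:j] is 0-origin and excludes index j. For example:

```python
A = [5, 2, 4, 6, 1, 3]
# The book's A[2 : 4] (1-origin, endpoints included) names the
# elements 2, 4, 6. The equivalent Python slice is A[1:4]:
print(A[1:4])   # [2, 4, 6]
```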

We typically organize compound data into objects, which are composed of attributes. We access a particular attribute using the

syntax found in many object-oriented programming languages:

the object name, followed by a dot, followed by the attribute

name. For example, if an object x has attribute f, we denote this

attribute by x.f.

We treat a variable representing an array or object as a pointer

(known as a reference in some programming languages) to the

data representing the array or object. For all attributes f of an

object x, setting y = x causes y.f to equal x.f. Moreover, if we now set x.f = 3, then afterward not only does x.f equal 3, but y.f equals 3 as well. In other words, x and y point to the same object after

the assignment y = x. This way of treating arrays and objects is

consistent with most contemporary programming languages.
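Python behaves exactly this way, which makes for a quick demonstration (the class Obj and the names x and y are ours, chosen to match the text):

```python
class Obj:
    pass           # a bare container for attributes (illustrative only)

x = Obj()
x.f = 1
y = x              # copies the pointer, not the object
x.f = 3
print(y.f)         # 3: x and y refer to the same object
```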

Our attribute notation can “cascade.” For example, suppose that

the attribute f is itself a pointer to some type of object that has an

attribute g. Then the notation x.f.g is implicitly parenthesized as

(x.f).g. In other words, if we had assigned y = x.f, then x.f.g is the same as y.g.

Sometimes a pointer refers to no object at all. In this case, we give

it the special value NIL.
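A small sketch of cascading attributes and NIL in Python (the classes Node and Obj are illustrative, and Python's None plays the role of NIL):

```python
class Node:
    def __init__(self):
        self.g = None      # None plays the role of NIL

class Obj:
    def __init__(self):
        self.f = Node()    # attribute f is itself a pointer to a Node

x = Obj()
x.f.g = 7      # cascaded access, parsed as (x.f).g
y = x.f        # y now points to the same Node as x.f
print(y.g)     # 7
```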

We pass parameters to a procedure by value: the called procedure

receives its own copy of the parameters, and if it assigns a value to

a parameter, the change is not seen by the calling procedure. When

objects are passed, the pointer to the data representing the object

is copied, but the object’s attributes are not. For example, if x is a

parameter of a called procedure, the assignment x = y within the

called procedure is not visible to the calling procedure. The

assignment x.f = 3, however, is visible if the calling procedure has

a pointer to the same object as x. Similarly, arrays are passed by