Lecture Notes for CSCI 3110:

Design and Analysis of Algorithms

Travis Gagie

Faculty of Computer Science

Dalhousie University

Summer 2021

Contents

1 “Clink” versus “BOOM” 3

I Divide and Conquer 8

2 Colouring Graphs 9

Assignment 1 19

Solution 21

3 Euclid, Karatsuba, Strassen 24

4 Fast Fourier Transform 32

Assignment 2 37

Solutions 39

5 Asymptotic Bounds 44

6 Sorting 51

Assignment 3 59

Solutions 61

1

II Greedy Algorithms 62

7 Huffman Coding 63

8 Minimum Spanning Trees 74

9 More Greedy Algorithms 82

Assignment 4 91

Midterm 1 93

Solutions 95

III Dynamic Programming 98

10 Edit Distance 99

11 Subset Sum and Knapsack 107

12 Optimal Binary Search Trees and Longest Increasing Subsequences 113

Assignment 5 120

2

Chapter 1

“Clink” versus “BOOM”

Imagine I’m standing in front of you holding up two little pieces of silver-grey metal and between

us is a bucket of water. You can see the pieces of metal look pretty much the same; I tap them

on the desk so you can tell they sound about the same; I pass them around so you can check they

feel about the same, weigh about the same, smell the same (not really at all), although I warn you

not to taste them. I take them back and throw first one into the bucket, where it goes “clink”, and

then the other, where it goes “BOOM”.

Even if we had in-person classes I probably wouldn’t be allowed to explode things in front of

my students — that’s what tenure is for! — but I want you to think about how two things that

seem the same can behave so differently. Explaining that for those two little pieces of metal gets us

into discussions that get pretty deep pretty quickly, of chemical reactions and molecules and atoms.

The famous physicist Richard Feynman said that if some disaster were going to wipe out nearly all

scientific knowledge and he could choose once sentence to pass on to future generations, it would be

“all things are made of atoms”. Other scientists might choose deep statements from their own fields

that have changed humanity’s view of the universe and our place in it: for an astronomer, “the

earth goes around the sun”; for a biologist, “species arise through a process of natural selection”;

for a psychologist, “the mind is a function of the brain”. Statements about mathematics usually

aren’t quite so controversial, at least among non-mathematicians, but the story goes that the first

mathematician to prove that the square root of 2 is irrational was thrown overboard by his fellow

Pythagoreans.

So, what have we got? You’ve been studying computer science in university for a few years

now, and I’d like to know what you’ve learned about our field that can reasonably be considered

one of the great scientific truths. “The speed of processors doubles every 18 months”? “Adding

workers to a late software project makes it later”? “All your bases are belong to us”? “The cake

is a lie”? This is when I’d really like to be able to stop and let you think, and then hear what

you have to say. Under the circumstances, however, I’m just going to have to assume most of you

chose to study computer science because it’s fun, you can do cool things with it (including saving

people’s lives and making the world a better place), it pays pretty well and it’s indoor work with

no heavy lifting — even though it may not tell us deep things about the universe.

3

That’s rather a shame. If you check Scientific American’s list of “100 or so Books that Shaped a

Century of Science”, under “Physical Sciences” and mixed in with Stephen Hawking’s A Brief His-

tory of Time, Primo Levi’s The Periodic Table, Carl Sagan’s The Pale Blue Dot, Paul Dirac’s Quan-

tum Mechanics, The Collected Papers of Albert Einstein, Linus Pauling’s Nature of the Chemical

Bond and Feynman’s QED, you’ll find books about computer science (or things close to it): Benoit

Mandelbrot’s Fractals, Norbert Weiner’s Cybernetics and, last but definitely not least, Knuth’s Art

of Computer Programming. So apparently we are doing deep things, we’re just not teaching you

about them.

Actually, I think that in a very important sense, we are the deepest of the sciences. To under-

stand where the basic rules of psychology come from, for example, you must know something about

biology; to understand where the basic rules of biology come from, you must know something about

chemistry; to understand where the basic rules of chemistry come from, you must know something

about physics; to understand where the basic rules of physics come from, you must know some-

thing about mathematics; but who tells the mathematicians what they can and cannot do? We do.

Philosophers, theologians and politicians might claim to, but I think the mathematicians largely

ignore them. One field has really stood up to mathematics and laid down the law, and that field

was computer science.

At the International Congress of Mathematicians in 1900, a famous mathematician named

David Hilbert proposed ten problems that he thought mathematicians should work on in the 20th

century. He later expanded the list to 23 problems, which were published in 1902 in the Bulletin of

the American Mathematical Society. Let’s talk about Hilbert’s Tenth Problem (for the integers): “to

devise a process according to which it can be determined in a finite number of operations whether

[a given Diophantine equation with integer coefficients] is solvable in [the integers]”. Notice Hilbert

didn’t ask whether such a process existed; back then, people assumed that if a mathematical

statement were true, with enough effort they should be able to prove it.

I’ll assume you all know what a quadratic equation in one variable is, and that you can tell if

one has a solution in the reals using the quadratic formula. There are similar but more complicated

formulas for solving cubic and quartic equations in one variable, which fortunately I never learned.

If you’re given a quintic equation in one variable with integer coefficients, then you may be able

to use the rational-root theorem (which I actually did learn once upon a time!) to get it down to

a quartic, which you can then solve with the quartic formulas. (For anything more complicated,

maybe you can try Galois Theory, but I can’t help you there.) Anyway, the Diophantine equations

Hilbert was talking about are sort of like these equations but with a finite number of variables. He

wanted a way to solve any given Diophantine equation that has integer coefficients or, at least, a

way to tell if it even has an integer solution.

So, solving Diophantine equations over the integers was considered an important mathematics

problem by important mathematicians. Unfortunately for them, in 1970 Yuri Matiyasevich showed

in his doctoral thesis that we can take a certain general model of a computer, called a Turing

Machine, and encode it as a Diophantine equation with integer coefficients, in such a way that the

Turing Machine halts if and only if the Diophantine equation has an integer solution. It was already

known that determining whether a Turing Machine halts is generally incomputable, meaning there

is no algorithm for it (and that had already thrown a spanner in Hilbert’s plan for 20th-century

mathematics), so Matiyasevich’s Theorem (also known as a Matiyasevich-Robinson-Davis-Putnam

4

Figure 1.1: The seven bridges of Ko¨nigsberg in Euler’s time (https://commons.wikimedia.org/

wiki/File:Konigsberg_bridges.png).

Theorem, for some other mathematicians who laid the groundwork) says Hilbert’s Tenth Problem

is impossible (for the integers). So, computer science told mathematics that it couldn’t have

something it really wanted.

I guess 1970 seems like a long time ago and we don’t want to relive old victories so — although

we’ll talk a bit about Turing Machines and the Halting Problem and computability and Hilbert’s

Entscheidungsproblem — in this course mostly we’re not going to focus on the gap between possible

and impossible problems (and I won’t mention Diophantine equations again). Instead, we’re going

to focus on the nature of the gap between easy problems (we have a polynomial-time solution that

runs on a standard computer) and hard problems (we don’t) — which is the deepest open question

in computer science and has been for a long time, since 1971. Echoing Hilbert, in 2000 the Clay

Mathematics Institute proposed a list of seven problems they thought mathematicians should work

on in the 21st century, including this one of easy versus hard, and backed each with a million-

dollar prize. To quickly illustrate the gap between easy and hard, I’d like to talk about two classic

problems, determining whether a graph has an Eulerian cycle and whether it has a Hamiltonian

cycle. At first these problems seem almost the same, but it turns out that one goes “clink” and

the other goes “BOOM” (we think).

In the 1730s, the citizens of Ko¨nigsberg (then in Prussia, now the Russian city of Kaliningrad)

wrote to the famous mathematician Leonhard Euler asking whether it was possible to start some-

where in their city (shown as it existed then in Figure 1.1), walk around it in such a way as to

cross each of their seven bridges once and only once, and arrive back at the starting point. Euler

answered that it was not possible since, if we draw a graph with each shore and island as a vertex

and each bridge as an edge, all of the vertices have odd degree, where a vertex’s degree is the

number of edges incident to it. He observed that in such a walk — now called an Eulerian cycle of

a graph — every time we visit a vertex we reduce the number of uncrossed edges incident to it by

2 (we arrive across one and leave across one); therefore, in order for a graph to have an Eulerian

cycle, it is a necessary condition that it be connected (that is, it must be possible to get from any

vertex to any other vertex) and every vertex must have even degree.

Euler also showed that this is also a sufficient condition, which is less obvious. Suppose we

start at some vertex v in a graph G that satisfies the condition, and walk around until we get stuck

5

(that is, we arrive at a vertex all of whose incident edges we’ve already crossed). Where can we get

stuck? Well, when we’re at a vertex w 6= v then we’ll have “crossed off” an odd number of edges

incident to w (one for each time we’ve arrived there and one for each time we’ve left, and we’ve

arrived once more often than we’ve left); therefore, since an even number minus an odd number

is an odd number and 0 isn’t odd, there’s always at least one edge left for us to leave across. It

follows that we can only get stuck back at v, after tracing out a cycle C in G.

Let G′ be the graph resulting from deleting from G the edges in C. If v has positive degree in

G′ then we can start there again and find and delete another cycle; otherwise, if some other vertex

has positive degree in G′, we can start there and find and delete another cycle. If we keep doing

this until there are no edges left, we get a decomposition of G into edge-disjoint cycles (meaning

the cycles can share vertices but not edges). Take any of the cycles C1, choose a vertex v it shares

with any of the other cycles, and choose another cycle C2 containing v. (If C1 doesn’t share any

vertices with any other cycles, then its vertices aren’t connected to the rest of the graph, contrary

to our assumption.) Replace C1 and C2 by a single (non-simple) cycle C3 that makes a figure 8 in

some sense: if we start at v then we first cross all the edges in C1 and return to v, then cross all

the edges in C2 and return to v. Repeating this cycle-merging, we eventually obtain an Eulerian

cycle of the graph.

It’s not totally trivial to implement this algorithm cleanly but it’s not all that hard either (so

I gave it as a homework exercise last year) and it obviously takes only polynomial time in the size

of the graph. In other words, the problem of determining whether a graph is Eulerian (contains an

Eulerian cycle) goes “clink”. It’s actually an important problem that comes up in de novo genome

assembly: one of the most common ways to assemble DNA reads is by building what’s called a

de Bruijn graph of the k-mers they contain, for some appropriate value of k, and then trying to

find an Eulerian tour of that graph (“tour” instead of “cycle” because it need not end where it

started). It’s likely, for example, that this played a role in sequencing the Covid genome, which

was important for developing a test for whether people were infected. Now let’s consider a problem

which seems deceptively similar at first.

In 1857 William Rowan Hamilton showed how to walk along the edges of a dodecahedron,

visiting each vertex once and only once, until arriving back at the starting point. Such a walk in

a graph is now called a Hamiltonian cycle, and it’s an obvious counterpart to an Eulerian cycle

but focusing on visiting vertices instead crossing edges. Hamilton didn’t give a general result like

Euler’s, however, and in fact no one has ever found an efficient algorithm for determining whether a

given graph is Hamiltonian (that is, it has a Hamiltonian cycle). By “efficient”, in computer science

we mean that the algorithm takes time polynomial in the size of the graph (in this case, the sum of

the number of vertices and the number of edges). Obviously there are necessary conditions we can

check quickly (for example, the graph has to be connected), and there are sufficient conditions we

can check quickly (for example, if a graph is a clique then it’s Hamiltonian, meaning it contains a

Hamiltonian cycle), and there are some kinds of graphs that are easy (for example, cycles or trees),

and some graphs are small enough that we can deal with them by brute force. . . but nobody’s ever

found an algorithm that can handle all graphs in polynomial time. In other words, it seems this

problem goes “BOOM”.

To see why people would really like to be able to determine whether a graph is Hamiltonian,

suppose you’re in charge of logistics for a delivery company and you’re looking at a map trying to

6

plan the route of a delivery truck that has to visit n cities, in any order and then return home. You

can tell the distance between each pair of cities and you want to know if it’s possible for the truck

to make its deliveries while travelling at most a certain total distance. (This problem is called the

Travelling Salesperson Problem or TSP or, assuming the truck travels only in the plane,

Euclidian TSP; computer scientists often use Small Caps for tricky problems.) If every road

between two cities has distance 1, then the truck can travel distance n if and only if the graph with

cities as vertices and roads as edges, is Hamiltonian. It follows that if HamCycle (the problem of

determining whether a graph is Hamiltonian) goes “BOOM”, then TSP goes “BOOM” too. That

is, if HamCycle is hard, then so is TSP, because in some sense HamCycle can be turned into

TSP. Don’t worry, we’ll see a lot more about this later in the course.

How can two problems — determining if a graph is Eulerian and if it’s Hamiltonian — that

seem so similar be so different? Well, we hope investigating that will lead us to a deep truth, like

investigating the reactions of those pieces of metal to water could lead us to studying atoms. We

may not even really care as much about the problems themselves as we do about what studying

them can tell us: my chemistry teacher might forgive me if I forgot that sodium metal explodes in

water, but he’d be really annoyed if I forgot the atomic theory; similarly, you may pass this course

even if you forget Euler’s algorithm, but I really do want you to remember something about the

difference between easy and hard problems.

Another way of saying this is that you shouldn’t miss the forest for the trees. Think of Euler’s

Algorithm as a tree and HamCycle as a tree and yourself as a student of geography: you’re not

really interested in individual trees as much as you’re interested in the general layout of the forest

and, metaphorically, whether there’s a big river splitting it into two pieces. In the next class,

we’ll talk about another classic problem, colouring graphs, some versions of which (1-colourability,

2-colourability, planar k-colourability for k ≥ 4) are easy and some versions of which (planar 3-

colourability, general k-colourability for k ≥ 3) are thought to be hard. Does anyone really care

about 47-colourability in particular, say? Probably not, but by the end of this course you should

understand that it’s on the same side of the river as HamCycle and TSP and planar 3-colourability

and general k-colourability for any other k ≥ 3 (that is, the “BOOM” side, to mix metaphors), and

on the opposite side from finding Eulerian cycles and 1- or 2-colourability (the “clink” side).

Investigating the difference between the complexity of finding Eulerian and Hamiltonian cycles

and between 2-colouring and 3-colouring graphs also leads us to a shallow but important truth

about this course: it’s dangerous to answer questions just by pattern matching! That is, just

because a problem sounds similar to something you’ve seen before, don’t think you can just write

the answer to the old problem and get part marks.

Summing up, here are some things to remember while taking this course:

Computer science is the deepest of the sciences!

Some problems are possible, some aren’t — and that’s deep.

Some problems are easy, some probably aren’t — and that’s deep, too.

Don’t miss the forest for the trees.

It’s dangerous to answer questions just by pattern matching.

7

Part I

Divide and Conquer

8

Chapter 2

Colouring Graphs

When I was in primary school geography was mainly about colouring maps, which I was pretty

good at except that I kept loosing my pencil crayons, so my maps weren’t as colourful as the other

children’s. I was therefore really impressed when I learned that in 1852 Francis Guthrie found a

way to colour a map of the counties of England (there were 39 back then) using only four colours,

such that no two counties with the same colour share a border (although meeting at a point is

ok). This made him wonder if it was possible to 4-colour any (planar) map like this. He wrote to

his brother, who was studying with the mathematician De Morgan (who formulated De Morgan’s

Laws of propositional logic, which you may have heard about last year), and the conjecture was

published in The Athen?um magazine in 1854 and 1860. In 1890 Heawood proved any planar map

can be 5-coloured, and in 1976 Appel and Haken finally proved the original conjecture by checking

1834 cases with a computer.

(An important (and very long) part of the proof was checked manually by Haken’s daughter

Dorothea, who taught me third-year algorithms at Queen’s 22 years later; she’s also a coauthor of

the Master Theorem that we’ll discuss next week.)

Appel and Haken’s Four-Colour Theorem is considered part of graph theory, instead of geogra-

phy, because if someone gives us a planar map and we draw a vertex in each region and put an edge

between two vertices if and only if their regions share a border, then we get a planar graph that

can be coloured with k colours such that no two vertices with the same colour share an edge, if and

only if the original map can be coloured with k colours. For example, Figure 2.1 shows the graph

we get for the map of the counties of Nova Scotia. (Actually, if we put an edge for every shared

border — so one edge between France and Spain for their border west of Andorra and another for

their border east of it, for example — then the graph and the map are duals, meaning that from

the graph we can reconstruct something that’s topologically equivalent to the map, and from that

we can reconstruct the graph, and so on.) It’s easy to find planar graphs that require 4 colours,

such as a 4-clique (a graph on four vertices with edges between every pair), but it’s not so easy

to tell whether a given planar graph can be 3-coloured. In fact, as mentioned in the last lecture,

that’s a “BOOM” problem.

版权所有：留学生编程辅导网 2020,All Rights Reserved 联系方式：QQ:99515681 电子信箱：99515681@qq.com

免责声明：本站部分内容从网络整理而来，只供参考！如有版权问题可联系本站删除。