# Combinatorial optimization

Algorithms — Lecture 23: Maximum Flows and Minimum Cuts [Fa’13]

> A process cannot be understood by stopping it. Understanding must
> move with the flow of the process, must join it and flow with it.
>
> — The First Law of Mentat, in Frank Herbert’s *Dune* (1965)

> There’s a difference between knowing the path and walking the path.
>
> — Morpheus [Laurence Fishburne], *The Matrix* (1999)

## 23 Maximum Flows and Minimum Cuts
In the mid-1950s, Air Force researcher Theodore E. Harris and retired army general Frank S.
Ross published a classified report studying the rail network that linked the Soviet Union to its
satellite countries in Eastern Europe. The network was modeled as a graph with 44 vertices,
representing geographic regions, and 105 edges, representing links between those regions in the
rail network. Each edge was given a weight, representing the rate at which material could be
shipped from one region to the next. Essentially by trial and error, they determined both the
maximum amount of stuff that could be moved from Russia into Europe, as well as the cheapest
way to disrupt the network by removing links (or in less abstract terms, blowing up train tracks),
which they called ‘the bottleneck’. Their results, including the drawing of the network below,
were only declassified in 1999.¹
*Harris and Ross’s map of the Warsaw Pact rail network. From Harris and Ross [1955]: Schematic diagram of the railway network of the Western Soviet Union and Eastern European countries, with a maximum flow of value 163,000 tons from Russia to Eastern Europe, and a cut of capacity 163,000 tons indicated as ‘The bottleneck’.*

This is one of the first recorded applications of the maximum flow and minimum cut problems. For both problems, the input is a directed graph G = (V, E), with special vertices s and t called the source and target. As in the previous lectures, I will use u→v to denote the directed edge from vertex u to vertex v. Intuitively, the maximum flow problem asks for the largest amount of material that can be transported from s to t; the minimum cut problem asks for the minimum damage needed to separate s from t.

¹Both the map and the story were taken from Alexander Schrijver’s fascinating survey ‘On the history of combinatorial optimization (till 1960)’:

> In the RAND Report of 19 November 1954, Ford and Fulkerson [1954] gave (next to defining the maximum flow problem and suggesting the simplex method for it) the max-flow min-cut theorem for undirected graphs, saying the maximum flow value is equal to the minimum capacity of a cut separating source and terminal. Their proof is not constructive, but for planar graphs, with source and sink on the outer boundary, they give a polynomial-time, constructive method. In a report of 26 May 1955, Robacker [1955a] showed that the max-flow min-cut theorem can be derived also from the vertex-disjoint version of Menger’s theorem.
>
> As for the directed case, Ford and Fulkerson [1955] observed that the max-flow min-cut theorem holds also for directed graphs. Dantzig and Fulkerson [1955] showed, by extending …

© 2014 Jeff Erickson. Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision.
## 23.1 Flows
An (s, t)-flow (or just a flow if the source and target are clear from context) is a function f : E → ℝ≥0 that satisfies the following conservation constraint at every vertex v except possibly s and t:

$$\sum_u f(u\to v) = \sum_w f(v\to w).$$

In English, the total flow into v is equal to the total flow out of v. To keep the notation simple, we define f(u→v) = 0 if there is no edge u→v in the graph. The value of the flow f, denoted |f|, is the total net flow out of the source vertex s:

$$|f| := \sum_w f(s\to w) - \sum_u f(u\to s).$$
It’s not hard to prove that |f| is also equal to the total net flow into the target vertex t, as follows. To simplify notation, let ∂f(v) denote the total net flow out of any vertex v:

$$\partial f(v) := \sum_w f(v\to w) - \sum_u f(u\to v).$$

The conservation constraint implies that ∂f(v) = 0 for every vertex v except s and t, so

$$\sum_v \partial f(v) = \partial f(s) + \partial f(t).$$

On the other hand, any flow that leaves one vertex must enter another vertex, so we must have $\sum_v \partial f(v) = 0$. It follows immediately that |f| = ∂f(s) = −∂f(t).
Now suppose we have another function c : E → R≥0 that assigns a non-negative capacity c(e)
to each edge e. We say that a flow f is feasible (with respect to c) if f (e) ≤ c(e) for every edge e.
Most of the time we will consider only flows that are feasible with respect to some fixed capacity
function c. We say that a flow f saturates edge e if f (e) = c(e), and avoids edge e if f (e) = 0.
The maximum flow problem is to compute a feasible (s, t)-flow in a given directed graph, with
a given capacity function, whose value is as large as possible.
*An (s, t)-flow with value 10. Each edge is labeled with its flow/capacity.*
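The conservation constraint and the value |f| are easy to check mechanically. Here is a minimal sketch (not from the notes) that represents a flow as a Python dict mapping directed edges (u, v) to flow values; the graph and vertex names below are made up for illustration.

```python
from collections import defaultdict

def flow_value(flow, s, t):
    """Verify the conservation constraint at every vertex except s and t,
    then return |f|, the total net flow out of the source s."""
    net_out = defaultdict(float)      # net_out[v] = sum f(v->w) - sum f(u->v)
    for (u, v), f in flow.items():
        net_out[u] += f
        net_out[v] -= f
    for v, net in net_out.items():
        if v not in (s, t):
            assert abs(net) < 1e-9, f"conservation violated at {v}"
    return net_out[s]

# A small hypothetical flow: |f| = 7 + 3 = 10.
flow = {('s', 'a'): 7, ('s', 'b'): 3,
        ('a', 't'): 5, ('a', 'b'): 2, ('b', 't'): 5}
print(flow_value(flow, 's', 't'))     # 10.0
```

Note that the same bookkeeping also shows |f| = −∂f(t): the net flow recorded at t comes out to exactly −10.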
## 23.2 Cuts
An (s, t)-cut (or just cut if the source and target are clear from context) is a partition of the vertices into disjoint subsets S and T—meaning S ∪ T = V and S ∩ T = ∅—where s ∈ S and t ∈ T.
2
Algorithms
Lecture 23: Maximum Flows and Minimum Cuts [Fa’13]
If we have a capacity function c : E → ℝ≥0, the capacity of a cut is the sum of the capacities of the edges that start in S and end in T:

$$\|S, T\| := \sum_{v\in S} \sum_{w\in T} c(v\to w).$$

(Again, if v→w is not an edge in the graph, we assume c(v→w) = 0.) Notice that the definition is asymmetric; edges that start in T and end in S are unimportant. The minimum cut problem is to compute an (s, t)-cut whose capacity is as small as possible.
*An (s, t)-cut with capacity 15. Each edge is labeled with its capacity.*
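Computing the capacity of a given cut is nearly a one-liner under the same kind of dict-of-edges representation. This sketch (not from the notes; the capacities below are invented) highlights the asymmetry: edges from T back to S contribute nothing.

```python
def cut_capacity(cap, S):
    """Capacity of the cut (S, V \\ S): total capacity of edges leaving S."""
    return sum(c for (v, w), c in cap.items() if v in S and w not in S)

# Hypothetical capacities; the cut ({'s'}, {'a','b','t'}) has capacity 15.
cap = {('s', 'a'): 10, ('s', 'b'): 5,
       ('a', 't'): 8, ('a', 'b'): 4, ('b', 't'): 10}
print(cut_capacity(cap, {'s'}))        # 15
print(cut_capacity(cap, {'s', 'a'}))   # 5 + 8 + 4 = 17
```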
Intuitively, the minimum cut is the cheapest way to disrupt all flow from s to t. Indeed, it is not hard to show that the value of any feasible (s, t)-flow is at most the capacity of any (s, t)-cut. Choose your favorite flow f and your favorite cut (S, T), and then follow the bouncing inequalities:

$$\begin{aligned}
|f| &= \sum_w f(s\to w) - \sum_u f(u\to s) && \text{by definition} \\
&= \sum_{v\in S}\Bigl(\sum_w f(v\to w) - \sum_u f(u\to v)\Bigr) && \text{by the conservation constraint} \\
&= \sum_{v\in S}\Bigl(\sum_{w\in T} f(v\to w) - \sum_{u\in T} f(u\to v)\Bigr) && \text{removing duplicate edges} \\
&\le \sum_{v\in S}\sum_{w\in T} f(v\to w) && \text{since } f(u\to v) \ge 0 \\
&\le \sum_{v\in S}\sum_{w\in T} c(v\to w) && \text{since } f(v\to w) \le c(v\to w) \\
&= \|S, T\| && \text{by definition}
\end{aligned}$$
Our derivation actually implies the following stronger observation: |f| = ‖S, T‖ if and only if f saturates every edge from S to T and avoids every edge from T to S. Moreover, if we have a flow f and a cut (S, T) that satisfy this equality condition, then f must be a maximum flow, and (S, T) must be a minimum cut.
## 23.3 The Maxflow-Mincut Theorem
Surprisingly, for any weighted directed graph, there is always a flow f and a cut (S, T) that satisfy the equality condition. This is the famous max-flow min-cut theorem, first proved by Lester Ford (of shortest-path fame) and Delbert Fulkerson in 1954, and independently by Peter Elias, Amiel Feinstein, and Claude Shannon (of information theory fame) in 1956.
3
Algorithms
Lecture 23: Maximum Flows and Minimum Cuts [Fa’13]
The Maxflow-Mincut Theorem. In any flow network with source s and target t, the value of the
maximum (s, t)-flow is equal to the capacity of the minimum (s, t)-cut.
Ford and Fulkerson proved this theorem as follows. Fix a graph G, vertices s and t, and a
capacity function c : E → R≥0 . The proof will be easier if we assume that the capacity function
is reduced: For any vertices u and v, either c(u→v) = 0 or c(v→u) = 0; equivalently, if an edge appears in G, then its reversal does not. This assumption is easy to enforce. Whenever an edge u→v and its reversal v→u are both in the graph, replace the edge u→v with a path u→x→v of length two, where x is a new vertex and c(u→x) = c(x→v) = c(u→v). The modified graph has the same maximum flow value and minimum cut capacity as the original graph.
Enforcing the one-direction assumption.
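The subdivision trick described above is easy to mechanize. The sketch below (an assumed dict-based representation, not the notes’ own code) splits one edge of each antiparallel pair through a brand-new vertex:

```python
def reduce_capacities(cap):
    """Return an equivalent capacity function in which no edge coexists
    with its reversal (the 'reduced' assumption)."""
    reduced, fresh = {}, 0
    for (u, v), c in cap.items():
        # Split exactly one edge of each antiparallel pair.
        # (Assumes vertex names are comparable, e.g. strings.)
        if (v, u) in cap and u < v:
            x = ('new', fresh)      # brand-new vertex subdividing u->v
            fresh += 1
            reduced[(u, x)] = c     # c(u->x) = c(x->v) = c(u->v)
            reduced[(x, v)] = c
        else:
            reduced[(u, v)] = c
    return reduced

cap = {('a', 'b'): 3, ('b', 'a'): 2, ('b', 'c'): 1}
reduced = reduce_capacities(cap)
# No edge appears together with its reversal in the result.
```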
Let f be a feasible flow. We define a new capacity function c_f : V × V → ℝ, called the residual capacity, as follows:

$$c_f(u\to v) = \begin{cases} c(u\to v) - f(u\to v) & \text{if } u\to v \in E \\ f(v\to u) & \text{if } v\to u \in E \\ 0 & \text{otherwise.} \end{cases}$$
Since f ≥ 0 and f ≤ c, the residual capacities are always non-negative. It is possible to have c_f(u→v) > 0 even if u→v is not an edge in the original graph G. Thus, we define the residual graph G_f = (V, E_f), where E_f is the set of edges whose residual capacity is positive. Notice that the residual capacities are not necessarily reduced; it is quite possible to have both c_f(u→v) > 0 and c_f(v→u) > 0.
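Under the reduced assumption, the residual graph is a direct transcription of the definition. A sketch (assumed dict representation, not the notes’ code):

```python
def residual_graph(cap, flow):
    """Edges of G_f with their residual capacities c_f, assuming the
    capacity function is reduced (no edge coexists with its reversal)."""
    cf = {}
    for (u, v), c in cap.items():
        f = flow.get((u, v), 0)
        if c - f > 0:
            cf[(u, v)] = c - f     # leftover forward capacity
        if f > 0:
            cf[(v, u)] = f         # pushed flow can be cancelled
    return cf

cap = {('s', 'a'): 10, ('a', 't'): 10}
flow = {('s', 'a'): 4, ('a', 't'): 4}
print(residual_graph(cap, flow))
# {('s', 'a'): 6, ('a', 's'): 4, ('a', 't'): 6, ('t', 'a'): 4}
```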
*A flow f in a weighted graph G and the corresponding residual graph G_f.*
Suppose there is no path from the source s to the target t in the residual graph G_f. Let S be the set of vertices that are reachable from s in G_f, and let T = V \ S. The partition (S, T) is clearly an (s, t)-cut. For every vertex u ∈ S and v ∈ T, we have

$$c_f(u\to v) = \bigl(c(u\to v) - f(u\to v)\bigr) + f(v\to u) = 0,$$

which implies that c(u→v) − f(u→v) = 0 and f(v→u) = 0. In other words, our flow f saturates every edge from S to T and avoids every edge from T to S. It follows that |f| = ‖S, T‖. Moreover, f is a maximum flow and (S, T) is a minimum cut.
On the other hand, suppose there is a path s = v₀→v₁→ ··· →v_r = t in G_f. We refer to v₀→v₁→ ··· →v_r as an augmenting path. Let F = min_i c_f(v_i→v_{i+1}) denote the maximum amount
*An augmenting path in G_f with value F = 5 and the augmented flow f′.*
of flow that we can push through the augmenting path in G_f. We define a new flow function f′ : E → ℝ as follows:

$$f'(u\to v) = \begin{cases} f(u\to v) + F & \text{if } u\to v \text{ is in the augmenting path} \\ f(u\to v) - F & \text{if } v\to u \text{ is in the augmenting path} \\ f(u\to v) & \text{otherwise.} \end{cases}$$
To prove that the flow f′ is feasible with respect to the original capacities c, we need to verify that f′ ≥ 0 and f′ ≤ c. Consider an edge u→v in G. If u→v is in the augmenting path, then f′(u→v) > f(u→v) ≥ 0 and

$$\begin{aligned}
f'(u\to v) &= f(u\to v) + F && \text{by definition of } f' \\
&\le f(u\to v) + c_f(u\to v) && \text{by definition of } F \\
&= f(u\to v) + c(u\to v) - f(u\to v) && \text{by definition of } c_f \\
&= c(u\to v). && \text{Duh.}
\end{aligned}$$

On the other hand, if the reversal v→u is in the augmenting path, then f′(u→v) < f(u→v) ≤ c(u→v), which implies that

$$\begin{aligned}
f'(u\to v) &= f(u\to v) - F && \text{by definition of } f' \\
&\ge f(u\to v) - c_f(v\to u) && \text{by definition of } F \\
&= f(u\to v) - f(u\to v) && \text{by definition of } c_f \\
&= 0. && \text{Duh.}
\end{aligned}$$
Finally, we observe that (without loss of generality) only the first edge in the augmenting path leaves s, so |f′| = |f| + F > |f|. In other words, f is not a maximum flow.
This completes the proof!
## 23.4 Ford and Fulkerson’s augmenting-path algorithm
Ford and Fulkerson’s proof of the Maxflow-Mincut Theorem translates immediately to an
algorithm to compute maximum flows: Starting with the zero flow, repeatedly augment the flow
along any path from s to t in the residual graph, until there is no such path.
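The proof-turned-algorithm fits in a few dozen lines. The following sketch (an assumed dict-of-edges representation, with an arbitrary augmenting path found by depth-first search) is one way to transcribe it; it terminates for integer capacities, but inherits all the caveats discussed below.

```python
def ford_fulkerson(cap, s, t):
    """Generic augmenting-path algorithm: start from the zero flow and
    repeatedly augment along any s-to-t path in the residual graph.
    Assumes a reduced capacity function (no antiparallel edge pairs)."""
    flow = {e: 0 for e in cap}
    vertices = sorted({x for e in cap for x in e})

    def residual(u, v):            # c_f(u->v) under the reduced assumption
        return cap.get((u, v), 0) - flow.get((u, v), 0) + flow.get((v, u), 0)

    def find_path(u, visited):     # depth-first search for any residual path
        if u == t:
            return []
        visited.add(u)
        for v in vertices:
            if v not in visited and residual(u, v) > 0:
                rest = find_path(v, visited)
                if rest is not None:
                    return [(u, v)] + rest
        return None

    while True:
        path = find_path(s, set())
        if path is None:
            return flow            # no augmenting path: flow is maximum
        F = min(residual(u, v) for u, v in path)
        for u, v in path:          # push F forward, or cancel F backward
            if (u, v) in cap:
                flow[(u, v)] += F
            else:
                flow[(v, u)] -= F

# A small four-node network (invented for illustration) with max flow 4.
cap = {('s', 'u'): 2, ('s', 'v'): 2, ('u', 'v'): 1, ('u', 't'): 2, ('v', 't'): 2}
f = ford_fulkerson(cap, 's', 't')
print(f[('s', 'u')] + f[('s', 'v')])   # 4
```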
This algorithm has an important but straightforward corollary:
Integrality Theorem. If all capacities in a flow network are integers, then there is a maximum
flow such that the flow through every edge is an integer.
Proof: We argue by induction that after each iteration of the augmenting path algorithm, all
flow values and residual capacities are integers. Before the first iteration, residual capacities are
the original capacities, which are integral by definition. In each later iteration, the induction
hypothesis implies that the capacity of the augmenting path is an integer, so augmenting changes
the flow on each edge, and therefore the residual capacity of each edge, by an integer.
In particular, the algorithm increases the overall value of the flow by a positive integer, which
implies that the augmenting path algorithm halts and returns a maximum flow.

If every edge capacity is an integer, the algorithm halts after | f ∗ | iterations, where f ∗ is
the actual maximum flow. In each iteration, we can build the residual graph G f and perform a
whatever-first-search to find an augmenting path in O(E) time. Thus, for networks with integer
capacities, the Ford-Fulkerson algorithm runs in O(E| f ∗ |) time in the worst case.
The following example shows that this running time analysis is essentially tight. Consider the 4-node network illustrated below, where X is some large integer. The maximum flow in this network is clearly 2X. However, Ford-Fulkerson might alternate between pushing 1 unit of flow along the augmenting path s→u→v→t and then pushing 1 unit of flow along the augmenting path s→v→u→t, leading to a running time of Θ(X) = Ω(|f*|).
*A bad example for the Ford-Fulkerson algorithm: edges s→u, s→v, u→t, and v→t have capacity X, and edge u→v has capacity 1.*
Ford and Fulkerson’s algorithm works quite well in many practical situations, or in settings
where the maximum flow value | f ∗ | is small, but without further constraints on the augmenting
paths, this is not an efficient algorithm in general. The example network above can be described
using only O(log X ) bits; thus, the running time of Ford-Fulkerson is actually exponential in the
input size.
## 23.5 Irrational Capacities
If we multiply all the capacities by the same (positive) constant, the maximum flow increases
everywhere by the same constant factor. It follows that if all the edge capacities are rational,
then the Ford-Fulkerson algorithm eventually halts, although still in exponential time.
However, if we allow irrational capacities, the algorithm can actually loop forever, always
finding smaller and smaller augmenting paths! Worse yet, this infinite sequence of augmentations
may not even converge to the maximum flow, or even to a significant fraction of the maximum
flow! Perhaps the simplest example of this effect was discovered by Uri Zwick.
Consider the six-node network shown on the next page. Six of the nine edges have some large integer capacity X, two have capacity 1, and one has capacity φ = (√5 − 1)/2 ≈ 0.618034, chosen so that 1 − φ = φ². To prove that the Ford-Fulkerson algorithm can get stuck, we can watch the residual capacities of the three horizontal edges as the algorithm progresses. (The residual capacities of the other six edges will always be at least X − 3.)
Suppose the Ford-Fulkerson algorithm starts by choosing the central augmenting path, shown
in the large figure on the next page. The three horizontal edges, in order from left to right, now
have residual capacities 1, 0, and φ. Suppose inductively that the horizontal residual capacities
are φ k−1 , 0, φ k for some non-negative integer k.
1. Augment along B, adding φ k to the flow; the residual capacities are now φ k+1 , φ k , 0.
2. Augment along C, adding φ k to the flow; the residual capacities are now φ k+1 , 0, φ k .
3. Augment along B, adding φ k+1 to the flow; the residual capacities are now 0, φ k+1 , φ k+2 .
4. Augment along A, adding φ k+1 to the flow; the residual capacities are now φ k+1 , 0, φ k+2 .
It follows by induction that after 4n + 1 augmentation steps, the horizontal edges have residual capacities φ^{2n−2}, 0, φ^{2n−1}. As the number of augmentations grows to infinity, the value of the flow converges to

$$1 + 2\sum_{i=1}^{\infty} \phi^i \;=\; 1 + \frac{2\phi}{1-\phi} \;=\; 2 + \sqrt{5} \;<\; 7,$$

even though the maximum flow value is clearly 2X + 1 ≫ 7.
*Uri Zwick’s non-terminating flow example, and three augmenting paths A, B, and C.*
Picky students might wonder at this point why we care about irrational capacities; after all,
computers can’t represent anything but (small) integers or (dyadic) rationals exactly. Good
question! One reason is that the integer restriction is literally artificial; it’s an artifact of actual
computational hardware², not an inherent feature of the abstract mathematical problem. Another
reason, which is probably more convincing to most practical computer scientists, is that the
behavior of the algorithm with irrational inputs tells us something about its worst-case behavior in
practice given floating-point capacities—terrible! Even with very reasonable capacities, a careless
implementation of Ford-Fulkerson could enter an infinite loop simply because of round-off error.
## 23.6 Edmonds and Karp’s Algorithms
Ford and Fulkerson’s algorithm does not specify which path in the residual graph to augment,
and the poor behavior of the algorithm can be blamed on poor choices for the augmenting path.
In the early 1970s, Jack Edmonds and Richard Karp analyzed two natural rules for choosing
augmenting paths, both of which led to more efficient algorithms.
²...or perhaps the laws of physics. Yeah, whatever. Like reality actually matters in this class.
### 23.6.1 Fat Pipes
Edmonds and Karp’s first rule is essentially a greedy algorithm:
Choose the augmenting path with largest bottleneck value.
It’s fairly easy to show that the maximum-bottleneck (s, t)-path in a directed graph can be
computed in O(E log V ) time using a variant of Jarník’s minimum-spanning-tree algorithm, or
of Dijkstra’s shortest path algorithm. Simply grow a directed spanning tree T , rooted at s.
Repeatedly find the highest-capacity edge leaving T and add it to T , until T contains a path
from s to t. Alternately, one could emulate Kruskal’s algorithm—insert edges one at a time in
decreasing capacity order until there is a path from s to t—although this is less efficient, at least
when the graph is directed.
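The Jarník/Dijkstra-style tree-growing step can be sketched as follows (assumed dict representation and invented example; the function assumes t is reachable from s):

```python
import heapq

def max_bottleneck_path(cap, s, t):
    """Widest (s, t)-path: grow a tree from s, always extending along the
    highest-capacity available edge, exactly as in Dijkstra/Jarnik but with
    min-capacity-along-path in place of path length."""
    adj = {}
    for (u, v), c in cap.items():
        adj.setdefault(u, []).append((v, c))
    best, parent = {s: float('inf')}, {s: None}
    heap, seen = [(-float('inf'), s)], set()
    while heap:
        negb, u = heapq.heappop(heap)
        if u in seen:
            continue
        seen.add(u)
        if u == t:
            break
        for v, c in adj.get(u, []):
            b = min(-negb, c)          # bottleneck after appending u->v
            if b > best.get(v, 0):
                best[v], parent[v] = b, u
                heapq.heappush(heap, (-b, v))
    path, v = [], t                    # walk parent pointers back to s
    while v is not None:
        path.append(v)
        v = parent[v]
    return best[t], path[::-1]

cap = {('s', 'a'): 4, ('s', 'b'): 7, ('a', 't'): 9, ('b', 't'): 5}
print(max_bottleneck_path(cap, 's', 't'))   # (5, ['s', 'b', 't'])
```

With a binary heap this runs in O(E log V) time, matching the bound quoted above.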
We can now analyze the algorithm in terms of the value of the maximum flow f*. Let f be any flow in G, and let f′ be the maximum flow in the current residual graph G_f. (At the beginning of the algorithm, G_f = G and f′ = f*.) Let e be the bottleneck edge in the next augmenting path. Let S be the set of vertices reachable from s through edges in G_f with residual capacity greater than c_f(e), and let T = V \ S. By construction, T is non-empty, and every edge from S to T has residual capacity at most c_f(e). Thus, the capacity of the cut (S, T) is at most c_f(e) · E. On the other hand, the maxflow-mincut theorem implies that ‖S, T‖ ≥ |f′|. We conclude that c_f(e) ≥ |f′|/E.
The preceding argument implies that augmenting f along the maximum-bottleneck path in
G f multiplies the maximum flow value in G f by a factor of at most 1 − 1/E. In other words, the
residual maximum flow value decays exponentially with the number of iterations. After E · ln| f ∗ |
iterations, the maximum flow value in G f is at most
$$|f^*| \cdot (1 - 1/E)^{E\cdot \ln|f^*|} \;<\; |f^*|\, e^{-\ln|f^*|} \;=\; 1.$$
(That’s Euler’s constant e, not the edge e. Sorry.) In particular, if all the capacities are integers,
then after E · ln| f ∗ | iterations, the maximum capacity of the residual graph is zero and f is a
maximum flow.
We conclude that for graphs with integer capacities, the Edmonds-Karp ‘fat pipe’ algorithm
runs in O(E 2 log E log| f ∗ |) time, which is actually a polynomial function of the input size.
### 23.6.2 Short Pipes
The second Edmonds-Karp rule was actually proposed by Ford and Fulkerson in their original
max-flow paper; a variant of this rule was independently considered by the Russian mathematician
Yefim Dinits around the same time as Edmonds and Karp.
Choose the augmenting path with the smallest number of edges.
The shortest augmenting path can be found in O(E) time by running breadth-first search in the
residual graph. Surprisingly, the resulting algorithm halts after a polynomial number of iterations,
independent of the actual edge capacities!
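Swapping breadth-first search into the generic augmenting-path loop gives the “short pipe” algorithm. This self-contained sketch (assumed dict representation, not the notes’ own code) is one way to write it:

```python
from collections import deque

def edmonds_karp(cap, s, t):
    """Augment along a shortest (fewest-edge) s-to-t path, found by BFS in
    the residual graph.  Assumes a reduced capacity function."""
    flow = {e: 0 for e in cap}
    vertices = sorted({x for e in cap for x in e})

    def residual(u, v):
        return cap.get((u, v), 0) - flow.get((u, v), 0) + flow.get((v, u), 0)

    while True:
        parent, queue = {s: None}, deque([s])
        while queue and t not in parent:          # breadth-first search
            u = queue.popleft()
            for v in vertices:
                if v not in parent and residual(u, v) > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                           # no augmenting path left
        path, v = [], t                           # recover the BFS path
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        F = min(residual(u, v) for u, v in path)
        for u, v in path:
            if (u, v) in cap:
                flow[(u, v)] += F
            else:
                flow[(v, u)] -= F

# On the earlier bad example (with X = 7), BFS ignores the capacity-1 edge,
# so two augmentations suffice instead of Theta(X).
cap = {('s', 'u'): 7, ('s', 'v'): 7, ('u', 'v'): 1, ('u', 't'): 7, ('v', 't'): 7}
f = edmonds_karp(cap, 's', 't')
print(f[('s', 'u')] + f[('s', 'v')])   # 14
```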
The proof of this polynomial upper bound relies on two observations about the evolution of the
residual graph. Let f_i be the current flow after i augmentation steps, and let G_i be the corresponding
residual graph. In particular, f0 is zero everywhere and G0 = G. For each vertex v, let leveli (v)
denote the unweighted shortest path distance from s to v in Gi , or equivalently, the level of v in a
breadth-first search tree of Gi rooted at s.
Our first observation is that these levels can only increase over time.
8
Algorithms
Lecture 23: Maximum Flows and Minimum Cuts [Fa’13]
Lemma 1. level_{i+1}(v) ≥ level_i(v) for all vertices v and integers i.
Proof: The claim is trivial for v = s, since level_i(s) = 0 for all i. Choose an arbitrary vertex v ≠ s, and let s→ ··· →u→v be a shortest path from s to v in G_{i+1}. (If there is no such path, then level_{i+1}(v) = ∞, and we’re done.) Because this is a shortest path, we have level_{i+1}(v) = level_{i+1}(u) + 1, and the inductive hypothesis implies that level_{i+1}(u) ≥ level_i(u).
We now have two cases to consider. If u→v is an edge in G_i, then level_i(v) ≤ level_i(u) + 1, because the levels are defined by breadth-first traversal.
On the other hand, if u→v is not an edge in G_i, then v→u must be an edge in the ith augmenting path. Thus, v→u must lie on the shortest path from s to t in G_i, which implies that level_i(v) = level_i(u) − 1 ≤ level_i(u) + 1.
In both cases, we have level_{i+1}(v) = level_{i+1}(u) + 1 ≥ level_i(u) + 1 ≥ level_i(v).

Whenever we augment the flow, the bottleneck edge in the augmenting path disappears from
the residual graph, and some other edge in the reversal of the augmenting path may (re-)appear.
Our second observation is that an edge cannot appear or disappear too many times.
Lemma 2. During the execution of the Edmonds-Karp short-pipe algorithm, any edge u→v disappears from the residual graph G_f at most V/2 times.
Proof: Suppose u→v is in two residual graphs G_i and G_{j+1}, but not in any of the intermediate residual graphs G_{i+1}, …, G_j, for some i < j. Then u→v must be in the ith augmenting path, so level_i(v) = level_i(u) + 1, and v→u must be on the jth augmenting path, so level_j(v) = level_j(u) − 1. By the previous lemma, we have

$$\mathit{level}_j(u) = \mathit{level}_j(v) + 1 \ge \mathit{level}_i(v) + 1 = \mathit{level}_i(u) + 2.$$

In other words, the distance from s to u increased by at least 2 between the disappearance and reappearance of u→v. Since every level is either less than V or infinite, the number of disappearances is at most V/2.

Now we can derive an upper bound on the number of iterations. Since each edge can
disappear at most V /2 times, there are at most EV /2 edge disappearances overall. But at least
one edge disappears on each iteration, so the algorithm must halt after at most EV /2 iterations.
Finally, since each iteration requires O(E) time, this algorithm runs in O(V E 2 ) time overall.
## 23.7 Further Progress
This is nowhere near the end of the story for maximum-flow algorithms. Decades of further
research have led to a number of even faster algorithms, some of which are summarized in the
table below.³ All of the algorithms listed below compute a maximum flow in several iterations.
Each algorithm has two variants: a simpler version that performs each iteration by brute force,
and a faster variant that uses sophisticated data structures to maintain a spanning tree of the flow
network, so that each iteration can be performed (and the spanning tree updated) in logarithmic
time. There is no reason to believe that the best algorithms known so far are optimal; indeed,
maximum flows are still a very active area of research.
³To keep the table short, I have deliberately omitted algorithms whose running time depends on the maximum
capacity, the sum of the capacities, or the maximum flow value. Even with this restriction, the table is incomplete!
| Technique | Direct | With dynamic trees | Sources |
|---|---|---|---|
| Blocking flow | O(V²E) | O(V E log V) | [Dinits; Sleator and Tarjan] |
| Network simplex | O(V²E) | O(V E log V) | [Dantzig; Goldfarb and Hao; …] |
| Push-relabel (generic) | O(V²E) | — | [Goldberg and Tarjan] |
| Push-relabel (FIFO) | O(V³) | O(V² log(V²/E)) | [Goldberg and Tarjan] |
| Push-relabel (highest label) | O(V²√E) | — | [Cheriyan and Maheshwari; Tunçel] |
| Pseudoflow | O(V²E) | O(V E log V) | [Hochbaum] |
| Compact abundance graphs | O(V E) | — | [Orlin 2012] |

*Several purely combinatorial maximum-flow algorithms and their running times.*
The fastest known maximum flow algorithm, announced by James Orlin in 2012, runs in O(V E) time. The details of Orlin’s algorithm are far beyond the scope of this course; in addition to his own new techniques, Orlin uses several existing algorithms and data structures as black boxes, most of which are themselves quite complicated. Nevertheless, for purposes of analyzing algorithms that use maximum flows, this is the time bound you should cite. So write the following down: Maximum flows can be computed in O(V E) time.
## Exercises
1. Suppose you are given a directed graph G = (V, E), two vertices s and t, a capacity function
c : E → R+ , and a second function f : E → R. Describe an algorithm to determine whether
f is a maximum (s, t)-flow in G.
2. Let (S, T) and (S′, T′) be minimum (s, t)-cuts in some flow network G. Prove that (S ∩ S′, T ∪ T′) and (S ∪ S′, T ∩ T′) are also minimum (s, t)-cuts in G.
3. Suppose (S, T ) is the unique minimum (s, t)-cut in some flow network. Prove that (S, T ) is
also a minimum (x, y)-cut for all vertices x ∈ S and y ∈ T .
4. Cuts are sometimes defined as subsets of the edges of the graph, instead of as partitions of
its vertices. In this problem, you will prove that these two definitions are almost equivalent.
We say that a subset X of (directed) edges separates s and t if every directed path from s to t contains at least one (directed) edge in X. For any subset S of vertices, let δS denote the set of directed edges leaving S; that is, δS := {u→v | u ∈ S, v ∉ S}.
(a) Prove that if (S, T ) is an (s, t)-cut, then δS separates s and t.
(b) Let X be an arbitrary subset of edges that separates s and t. Prove that there is an
(s, t)-cut (S, T ) such that δS ⊆ X .
(c) Let X be a minimal subset of edges that separates s and t. (Such a set of edges is
sometimes called a bond.) Prove that there is an (s, t)-cut (S, T ) such that δS = X .
5. A flow f is acyclic if the subgraph of directed edges with positive flow contains no directed
cycles.
(a) Prove that for any flow f , there is an acyclic flow with the same value as f . (In
particular, this implies that some maximum flow is acyclic.)
(b) A path flow assigns positive values only to the edges of one simple directed path from
s to t. Prove that every acyclic flow can be written as the sum of O(E) path flows.
(c) Describe a flow in a directed graph that cannot be written as the sum of path flows.
(d) A cycle flow assigns positive values only to the edges of one simple directed cycle.
Prove that every flow can be written as the sum of O(E) path flows and cycle flows.
(e) Prove that every flow with value 0 can be written as the sum of O(E) cycle flows.
(Zero-value flows are also called circulations.)
6. Suppose instead of capacities, we consider networks where each edge u→v has a non-negative demand d(u→v). Now an (s, t)-flow f is feasible if and only if f(u→v) ≥ d(u→v) for every edge u→v. (Feasible flow values can now be arbitrarily large.) A natural problem
in this setting is to find a feasible (s, t)-flow of minimum value.
(a) Describe an efficient algorithm to compute a feasible (s, t)-flow, given the graph, the
demand function, and the vertices s and t as input. [Hint: Find a flow that is non-zero
everywhere, and then scale it up to make it feasible.]
(b) Suppose you have access to a subroutine MaxFlow that computes maximum flows in
networks with edge capacities. Describe an efficient algorithm to compute a minimum
flow in a given network with edge demands; your algorithm should call MaxFlow
exactly once.
(c) State and prove an analogue of the max-flow min-cut theorem for this setting. (Do
minimum flows correspond to maximum cuts?)
7. For any flow network G and any vertices u and v, let bottleneckG (u, v) denote the maximum,
over all paths π in G from u to v, of the minimum-capacity edge along π.
(a) Describe and analyze an algorithm to compute bottleneckG (s, t) in O(E log V ) time.
(b) Describe an algorithm to construct a spanning tree T of G such that bottleneck T (u, v) =
bottleneckG (u, v) for all vertices u and v. (Edges in T inherit their capacities from G.)
8. Describe an efficient algorithm to determine whether a given flow network contains a
unique maximum flow.
9. Suppose you have already computed a maximum flow f ∗ in a flow network G with integer
edge capacities.
(a) Describe and analyze an algorithm to update the maximum flow after the capacity of
a single edge is increased by 1.
(b) Describe and analyze an algorithm to update the maximum flow after the capacity of
a single edge is decreased by 1.
Both algorithms should be significantly faster than recomputing the maximum flow from
scratch.
10. Let G be a network with integer edge capacities. An edge in G is upper-binding if increasing
its capacity by 1 also increases the value of the maximum flow in G. Similarly, an edge is
lower-binding if decreasing its capacity by 1 also decreases the value of the maximum flow
in G.
(a) Does every network G have at least one upper-binding edge? Prove your answer is
correct.
(b) Does every network G have at least one lower-binding edge? Prove your answer is
correct.
(c) Describe an algorithm to find all upper-binding edges in G, given both G and a
maximum flow in G as input, in O(E) time.
(d) Describe an algorithm to find all lower-binding edges in G, given both G and a
maximum flow in G as input, in O(EV ) time.
11. A given flow network G may have more than one minimum (s, t)-cut. Let’s define the best
minimum (s, t)-cut to be any minimum cut with the smallest number of edges.
(a) Describe an efficient algorithm to determine whether a given flow network contains
a unique minimum (s, t)-cut.
(b) Describe an efficient algorithm to find the best minimum (s, t)-cut when the capacities
are integers.
(c) Describe an efficient algorithm to find the best minimum (s, t)-cut for arbitrary edge
capacities.
(d) Describe an efficient algorithm to determine whether a given flow network contains
a unique best minimum (s, t)-cut.
12. A new assistant professor, teaching maximum flows for the first time, suggests the following
greedy modification to the generic Ford-Fulkerson augmenting path algorithm. Instead
of maintaining a residual graph, just reduce the capacity of edges along the augmenting
path! In particular, whenever we saturate an edge, just remove it from the graph.
```
GreedyFlow(G, c, s, t):
    for every edge e in G
        f(e) ← 0
    while there is a path from s to t
        π ← an arbitrary path from s to t
        F ← minimum capacity of any edge in π
        for every edge e in π
            f(e) ← f(e) + F
            if c(e) = F
                remove e from G
            else
                c(e) ← c(e) − F
    return f
```
(a) Show that this algorithm does not always compute a maximum flow.
(b) Prove that for any flow network, if the Greedy Path Fairy tells you precisely which
path π to use at each iteration, then GreedyFlow does compute a maximum flow.
(Sadly, the Greedy Path Fairy does not actually exist.)
13. We can speed up the Edmonds-Karp ‘fat pipe’ heuristic, at least for integer capacities, by
relaxing our requirements for the next augmenting path. Instead of finding the augmenting
path with maximum bottleneck capacity, we find a path whose bottleneck capacity is at
least half of maximum, using the following capacity scaling algorithm.
The algorithm maintains a bottleneck threshold ∆; initially, ∆ is the maximum capacity
among all edges in the graph. In each phase, the algorithm augments along paths from s to
t in which every edge has residual capacity at least ∆. When there is no such path, the
phase ends, we set ∆ ← ⌊∆/2⌋, and the next phase begins.
(a) How many phases will the algorithm execute in the worst case, if the edge capacities
are integers?
(b) Let f be the flow at the end of a phase for a particular value of ∆. Let S be the nodes
that are reachable from s in the residual graph G f using only edges with residual
capacity at least ∆, and let T = V \ S. Prove that the capacity (with respect to G’s
original edge capacities) of the cut (S, T ) is at most | f | + E · ∆.
(c) Prove that in each phase of the scaling algorithm, there are at most 2E augmentations.
(d) What is the overall running time of the scaling algorithm, assuming all the edge
capacities are integers?
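The scaling algorithm above can be sketched directly in Python; the edge-list encoding and function name below are our own choices, not from the text. Each phase restricts the search to residual edges of capacity at least ∆, and ∆ is halved when no ∆-fat path remains.

```python
from collections import deque, defaultdict

def scaling_max_flow(edges, s, t):
    """Capacity scaling: augment only along paths whose residual edges all
    have capacity >= delta; halve delta when no such path remains."""
    res = defaultdict(int)               # residual capacity per directed edge
    adj = defaultdict(set)
    for u, v, c in edges:
        res[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)                    # reverse residual edges
    delta = max((c for _, _, c in edges), default=0)
    value = 0
    while delta >= 1:
        while True:
            parent = {s: None}           # BFS restricted to capacity >= delta
            q = deque([s])
            while q and t not in parent:
                u = q.popleft()
                for v in adj[u]:
                    if v not in parent and res[(u, v)] >= delta:
                        parent[v] = u
                        q.append(v)
            if t not in parent:
                break                    # phase over: no delta-fat path left
            path, v = [], t
            while parent[v] is not None:
                path.append((parent[v], v))
                v = parent[v]
            f = min(res[e] for e in path)
            for u, v in path:
                res[(u, v)] -= f
                res[(v, u)] += f
            value += f
        delta //= 2                      # delta <- floor(delta / 2)
    return value

print(scaling_max_flow([(0,1,3), (0,2,2), (1,2,1), (1,3,2), (2,3,3)], 0, 3))  # -> 5
```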
14. In 1980 Maurice Queyranne published the following example of a flow network where Edmonds and Karp's "fat pipe" heuristic does not halt. Here, as in Zwick's bad example for the original Ford-Fulkerson algorithm, φ denotes the inverse golden ratio (√5 − 1)/2. The three vertical edges play essentially the same role as the horizontal edges in Zwick's example.
Queyranne's network, and a sequence of "fat-pipe" augmentations.
(a) Show that the following infinite sequence of path augmentations is a valid execution
of the Edmonds-Karp algorithm. (See the figure above.)
QueyranneFatPipes:
  for i ← 1 to ∞
    push φ^(3i−2) units of flow along s→a→f→g→b→h→c→d→t
    push φ^(3i−1) units of flow along s→f→a→b→g→h→c→t
    push φ^(3i) units of flow along s→e→f→a→g→b→c→h→t
(b) Describe a sequence of O(1) path augmentations that yields a maximum flow in
Queyranne’s network.
15. An (s, t)-series-parallel graph is a directed acyclic graph with two designated vertices s (the source) and t (the target or sink) and with one of the following structures:
• Base case: A single directed edge from s to t.
• Series: The union of an (s, u)-series-parallel graph and a (u, t)-series-parallel graph
that share a common vertex u but no other vertices or edges.
• Parallel: The union of two smaller (s, t)-series-parallel graphs with the same source
s and target t, but with no other vertices or edges in common.
Describe an efficient algorithm to compute a maximum flow from s to t in an (s, t)-series-parallel graph with arbitrary edge capacities.
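For experimenting with this exercise, one standard observation is that the maximum-flow value composes structurally: the minimum over a series combination, the sum over a parallel one. A sketch, assuming the graph is handed to us as a decomposition tree (a hypothetical encoding of our own):

```python
# Tree nodes are tuples: ('edge', capacity), ('series', left, right),
# or ('parallel', left, right).
def sp_max_flow(tree):
    kind = tree[0]
    if kind == 'edge':
        return tree[1]               # base case: one edge's capacity
    left, right = map(sp_max_flow, tree[1:])
    if kind == 'series':
        return min(left, right)      # all flow must cross both halves
    return left + right              # parallel halves carry flow independently

# Two edges in series (capacities 3 and 5), in parallel with a capacity-2 edge:
net = ('parallel', ('series', ('edge', 3), ('edge', 5)), ('edge', 2))
print(sp_max_flow(net))  # -> 5
```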
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision.
Algorithms
Lecture 24: Applications of Maximum Flow [Fa’13]
For a long time it puzzled me how something so expensive, so leading edge,
could be so useless, and then it occurred to me that a computer is a stupid
machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things.
They are, in short, a perfect match.
— Bill Bryson, Notes from a Big Country (1999)
24 Applications of Maximum Flow

24.1 Edge-Disjoint Paths
One of the easiest applications of maximum flows is computing the maximum number of edge-disjoint paths between two specified vertices s and t in a directed graph G.
A set of paths in G is edge-disjoint if each edge in G appears in at most one of the paths; several
edge-disjoint paths may pass through the same vertex, however.
If we give each edge capacity 1, then the maxflow from s to t assigns a flow of either 0 or 1
to every edge. Since any vertex of G lies on at most two saturated edges (one in and one out, or
none at all), the subgraph S of saturated edges is the union of several edge-disjoint paths and
cycles. Moreover, the number of paths is exactly equal to the value of the flow. Extracting the
actual paths from S is easy—just follow any directed path in S from s to t, remove that path from
S, and recurse.
Conversely, we can transform any collection of k edge-disjoint paths into a flow by pushing
one unit of flow along each path from s to t; the value of the resulting flow is exactly k. It follows
that any maxflow algorithm actually computes the largest possible set of edge-disjoint paths.
If we use Orlin’s algorithm to compute the maximum (s, t)-flow, we can compute edge-disjoint
paths in O(V E) time, but Orlin’s algorithm is overkill for this simple application. The cut
({s}, V \ {s}) has capacity at most V − 1, so the maximum flow has value at most V − 1. Thus,
Ford and Fulkerson’s original augmenting path algorithm also runs in O(| f ∗ |E) = O(V E) time.
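The flow-then-extract procedure described above can be sketched as follows; the encoding and names are our own, and the input is assumed to be a list of unit-capacity directed edges.

```python
def edge_disjoint_paths(edges, s, t):
    """Maximum set of edge-disjoint s->t paths: unit-capacity Ford-Fulkerson,
    then peel the paths off the subgraph of saturated edges."""
    res, nbrs = {}, {}                    # residual capacities, adjacency
    for u, v in edges:
        res[(u, v)] = 1
        res.setdefault((v, u), 0)
        nbrs.setdefault(u, []).append(v)
        nbrs.setdefault(v, []).append(u)  # residual edges go both ways
    def augment():                        # DFS for one augmenting path
        parent, stack = {s: None}, [s]
        while stack:
            u = stack.pop()
            if u == t:
                while u != s:             # push one unit back along the path
                    res[(parent[u], u)] -= 1
                    res[(u, parent[u])] += 1
                    u = parent[u]
                return True
            for v in nbrs.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    stack.append(v)
        return False
    while augment():
        pass
    sat = {}                              # saturated original edges, by tail
    for u, v in edges:
        if res[(u, v)] == 0:
            sat.setdefault(u, []).append(v)
    paths = []
    while sat.get(s):                     # follow saturated edges from s to t
        path = [s]
        while path[-1] != t:
            path.append(sat[path[-1]].pop())
        paths.append(path)
    return paths

paths = edge_disjoint_paths([('s','a'), ('s','b'), ('a','b'), ('a','t'), ('b','t')], 's', 't')
print(len(paths))  # -> 2
```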
The same algorithm can also be used to find edge-disjoint paths in undirected graphs. We simply replace every undirected edge in G with a pair of directed edges, each with unit capacity, and compute a maximum flow from s to t in the resulting directed graph G′ using the Ford-Fulkerson algorithm. For any edge uv in G, if our max flow saturates both directed edges u→v and v→u in G′, we can remove both edges from the flow without changing its value. Thus, without loss of generality, the maximum flow assigns a direction to every saturated edge, and we can extract the edge-disjoint paths by searching the graph of directed saturated edges.
24.2 Vertex Capacities and Vertex-Disjoint Paths
Suppose we have capacities on the vertices as well as the edges. Here, in addition to our other
constraints, we require that for any vertex v other than s and t, the total flow into v (and
therefore the total flow out of v) is at most some non-negative value c(v). How can we compute
a maximum flow with these new constraints?
The simplest method is to transform the input into a traditional flow network, with only edge capacities. Specifically, we replace every vertex v with two vertices vin and vout, connected by an edge vin→vout with capacity c(v), and then replace every directed edge u→v with the edge uout→vin (keeping the same capacity). Finally, we compute the maximum flow from sout to tin in this modified flow network.
It is now easy to compute the maximum number of vertex-disjoint paths from s to t in any
directed graph. Simply give every vertex capacity 1, and compute a maximum flow!
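The splitting transformation can be written down directly. A sketch, with a hypothetical tuple-based encoding of the split vertices:

```python
# Each vertex v becomes (v,'in') -> (v,'out') with capacity c(v); each edge
# u -> v becomes (u,'out') -> (v,'in'). Names and encoding are our own.
def split_vertices(edge_cap, vertex_cap, s, t):
    """edge_cap: (u, v) -> capacity; vertex_cap: v -> capacity.
    Returns the new capacity map and the new source and target."""
    new_cap = {((v, 'in'), (v, 'out')): c for v, c in vertex_cap.items()}
    for (u, v), c in edge_cap.items():
        new_cap[((u, 'out'), (v, 'in'))] = c
    return new_cap, (s, 'out'), (t, 'in')

# Vertex-disjoint paths: give every internal vertex capacity 1, then run
# any off-the-shelf max-flow routine on the result.
caps, s2, t2 = split_vertices({('s','a'): 9, ('a','t'): 9},
                              {'s': 9, 'a': 1, 't': 9}, 's', 't')
```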
24.3 Maximum Matchings in Bipartite Graphs
Another natural application of maximum flows is finding large matchings in bipartite graphs. A
matching is a subgraph in which every vertex has degree at most one, or equivalently, a collection
of edges such that no two share a vertex. The problem is to find the matching with the maximum
number of edges in a given bipartite graph.
We can solve this problem by reducing it to a maximum flow problem as follows. Let G be the given bipartite graph with vertex set U ∪ W, such that every edge joins a vertex in U to a vertex in W. We create a new directed graph G′ by (1) orienting each edge from U to W, (2) adding two new vertices s and t, (3) adding edges from s to every vertex in U, and (4) adding edges from each vertex in W to t. Finally, we assign every edge in G′ a capacity of 1.
Any matching M in G can be transformed into a flow fM in G′ as follows: For each edge uw in M, push one unit of flow along the path s→u→w→t. These paths are disjoint except at s and t, so the resulting flow satisfies the capacity constraints. Moreover, the value of the resulting flow is equal to the number of edges in M.
Conversely, consider any (s, t)-flow f in G′ computed using the Ford-Fulkerson augmenting path algorithm. Because the edge capacities are integers, the Ford-Fulkerson algorithm assigns an integer flow to every edge. (This is easy to verify by induction, hint, hint.) Moreover, since each edge has unit capacity, the computed flow either saturates (f(e) = 1) or avoids (f(e) = 0) every edge in G′. Finally, since at most one unit of flow can enter any vertex in U or leave any vertex in W, the saturated edges from U to W form a matching in G. The size of this matching is exactly |f|.
Thus, the size of the maximum matching in G is equal to the value of the maximum flow in G′, and provided we compute the maxflow using augmenting paths, we can convert the actual maxflow into a maximum matching in O(E) time. Again, we can compute the maximum flow in O(V E) time using either Orlin's algorithm or off-the-shelf Ford-Fulkerson.
A maximum matching in a bipartite graph G, and the corresponding maximum flow in G′.
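In the unit-capacity setting above, the augmenting-path computation specializes to the classical alternating-path matching algorithm. A minimal sketch, assuming a hypothetical adjacency-list encoding of the bipartite graph:

```python
def max_bipartite_matching(adj):
    """adj maps each vertex of U to its neighbors in W; returns the size
    of a maximum matching (augmenting alternating paths, one per u)."""
    match = {}                            # w -> currently matched u
    def augment(u, seen):
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                # w is free, or its partner can be re-routed elsewhere
                if w not in match or augment(match[w], seen):
                    match[w] = u
                    return True
        return False
    return sum(augment(u, set()) for u in adj)

print(max_bipartite_matching({'u1': ['w1', 'w2'], 'u2': ['w1'], 'u3': ['w2']}))  # -> 2
```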
24.4 Assignment Problems
Maximum-cardinality matchings are a special case of a general family of so-called assignment
problems.¹ An unweighted binary assignment problem involves two disjoint finite sets X and Y ,
¹Most authors refer to finding a maximum-weight matching in a bipartite graph as the assignment problem.
which typically represent two different kinds of resources, such as web pages and servers, jobs
and machines, rows and columns of a matrix, hospitals and interns, or customers and ice cream
flavors. Our task is to choose the largest possible collection of pairs (x, y), where
x ∈ X and y ∈ Y , subject to several constraints of the following form:
• Each element x ∈ X can appear in at most c(x) pairs.
• Each element y ∈ Y can appear in at most c( y) pairs.
• Each pair (x, y) ∈ X × Y can appear in the output at most c(x, y) times.
Each upper bound c(x), c( y), and c(x, y) is either a (typically small) non-negative integer or ∞.
Intuitively, we create each pair in our output by assigning an element of X to an element of Y .
The maximum-matching problem is a special case, where c(z) = 1 for all z ∈ X ∪ Y , and each
c(x, y) is either 0 or 1, depending on whether the pair x y defines an edge in the underlying
bipartite graph.
Here is a slightly more interesting example. A nearby school, famous for its onerous
administrative hurdles, decides to organize a dance. Every pair of students (one boy, one girl)
who wants to dance must register in advance. School regulations limit each boy-girl pair to at most three dances together, and limit each student to at most ten dances overall. How can
we maximize the number of dances? This is a binary assignment problem for the set X of girls
and the set Y of boys, where for each girl x and boy y, we have c(x) = c( y) = 10 and either
c(x, y) = 3 (if x and y registered to dance) or c(x, y) = 0 (if they didn’t register).
Every binary assignment problem can be reduced to a standard maximum flow problem as
follows. We construct a flow network G = (V, E) with vertices X ∪ Y ∪ {s, t} and the following
edges:
• an edge s→x with capacity c(x) for each x ∈ X,
• an edge x→y with capacity c(x, y) for each x ∈ X and y ∈ Y, and
• an edge y→t with capacity c(y) for each y ∈ Y.
Because all the edges have integer capacities, any augmenting-path algorithm constructs an integer maximum flow f∗, which can be decomposed into the sum of |f∗| paths of the form s→x→y→t for some x ∈ X and y ∈ Y. For each such path, we report the pair (x, y). (Thus, the pair (x, y) appears in our output collection exactly f(x→y) times.)
It is easy to verify (hint, hint) that this collection of pairs satisfies all the necessary constraints.
Conversely, any legal collection of r pairs can be transformed into a feasible integer flow in G
with value r. Thus, the largest legal collection of pairs corresponds to a maximum flow in G.
So our algorithm is correct. If we use Orlin's algorithm to compute the maximum flow, this assignment algorithm runs in O(V E) = O(n³) time, where n = |X| + |Y|.
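The reduction just described can be written down mechanically. A sketch of the network construction (names are illustrative, and element names are assumed not to collide with the reserved labels 's' and 't'); any integer max-flow routine can then be run on the result, reporting each pair (x, y) exactly f(x→y) times:

```python
def assignment_network(cx, cy, cxy):
    """Build the flow network for a binary assignment problem.
    cx[x], cy[y]: element capacities; cxy[(x, y)]: pair capacities."""
    edges = {('s', x): c for x, c in cx.items()}            # s -> x
    edges.update({(x, y): c for (x, y), c in cxy.items()})  # x -> y
    edges.update({(y, 't'): c for y, c in cy.items()})      # y -> t
    return edges

# The school-dance example: each student dances at most 10 times,
# each registered pair at most 3 times. Names are hypothetical.
net = assignment_network({'Alice': 10}, {'Bob': 10}, {('Alice', 'Bob'): 3})
```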
24.5 Baseball Elimination
Every year millions of baseball fans eagerly watch their favorite team, hoping they will win a
spot in the playoffs, and ultimately the World Series. Sadly, most teams are "mathematically eliminated" days or even weeks before the regular season ends. Often, it is easy to spot when
a team is eliminated—they can’t win enough games to catch up to the current leader in their
division. But sometimes the situation is more subtle.
For example, here are the actual standings from the American League East on August 30,
1996.
Team                Won–Lost   Left   NYY   BAL   BOS   TOR   DET
New York Yankees     75–59      28     —     3     8     7     3
Baltimore Orioles    71–63      28     3     —     2     7     4
Boston Red Sox       69–66      27     8     2     —     0     0
Toronto Blue Jays    63–72      27     7     7     0     —     0
Detroit Tigers       49–86      27     3     4     0     0     —
Detroit is clearly behind, but some die-hard Tigers fans may hold out hope that their team can
still win. After all, if Detroit wins all 27 of their remaining games, they will end the season
with 76 wins, more than any other team has now. So as long as every other team loses every
game. . . but that’s not possible, because some of those other teams still have to play each other.
Here is one complete argument:²
By winning all of their remaining games, Detroit can finish the season with a record of 76 and 86. If the
Yankees win just 2 more games, then they will finish the season with a 77 and 85 record which would put
them ahead of Detroit. So, let’s suppose the Tigers go undefeated for the rest of the season and the Yankees
fail to win another game.
The problem with this scenario is that New York still has 8 games left with Boston. If the Red Sox win all of these games, they will end the season with at least 77 wins, putting them ahead of the Tigers. Thus, the only way for Detroit to even have a chance of finishing in first place is for New York to win exactly one of the 8 games with Boston and lose all their other games. Meanwhile, the Sox must lose all the games they play against teams other than New York. This puts them in a 3-way tie for first place. . . .
Now let's look at what happens to the Orioles and Blue Jays in our scenario. Baltimore has 2 games left with Boston and 3 with New York. So, if everything happens as described above, the Orioles will finish with at least 76 wins. So, Detroit can catch Baltimore only if the Orioles lose all their games to teams other than New York and Boston. In particular, this means that Baltimore must lose all 7 of its remaining games with Toronto. The Blue Jays also have 7 games left with the Yankees, and we have already seen that for Detroit to finish in first place, Toronto must win all of these games. But if that happens, the Blue Jays will win at least 14 more games, giving them a final record of 77 and 85 or better, which means they will finish ahead of the Tigers. So, no matter what happens from this point in the season on, Detroit cannot finish in first place in the American League East.
There has to be a better way to figure this out!
Here is a more abstract formulation of the problem. Our input consists of two arrays W [1 .. n]
and G[1 .. n, 1 .. n], where W [i] is the number of games team i has already won, and G[i, j] is
the number of upcoming games between teams i and j. We want to determine whether team n
can end the season with the most wins (possibly tied with other teams).³
In the mid-1960s, Benjamin Schwartz showed that this question can be modeled as an assignment problem: We want to assign a winner to each game, so that team n comes in first place. We have an assignment problem! Let R[i] = Σ_j G[i, j] denote the number of remaining games for team i. We will assume that team n wins all R[n] of its remaining games. Then team n can come in first place if and only if every other team i wins at most W[n] + R[n] − W[i] of its R[i] remaining games.
Since we want to assign winning teams to games, we start by building a bipartite graph whose nodes represent the games and the teams. We have (n−1)(n−2)/2 game nodes g_{i,j}, one for each pair 1 ≤ i < j < n, and n − 1 team nodes t_i, one for each 1 ≤ i < n. For each pair i, j, we add edges g_{i,j}→t_i and g_{i,j}→t_j with infinite capacity. We add a source vertex s and edges s→g_{i,j} with capacity G[i, j] for each pair i, j. Finally, we add a target node t and edges t_i→t with capacity W[n] − W[i] + R[n] for each team i.
²Both the example and this argument are taken from http://riot.ieor.berkeley.edu/~baseball/detroit.html.
³We assume here that no games end in a tie (always true for Major League Baseball), and that every game is
actually played (not always true).
Theorem. Team n can end the season in first place if and only if there is a feasible flow in this
graph that saturates every edge leaving s.
Proof: Suppose it is possible for team n to end the season in first place. Then every team i < n wins at most W[n] + R[n] − W[i] of the remaining games. For each game between team i and team j that team i wins, add one unit of flow along the path s→g_{i,j}→t_i→t. Because there are exactly G[i, j] games between teams i and j, every edge leaving s is saturated. Because each team i wins at most W[n] + R[n] − W[i] games, the resulting flow is feasible.
Conversely, let f be a feasible flow that saturates every edge out of s. Suppose team i wins exactly f(g_{i,j}→t_i) games against team j, for all i and j. Then teams i and j play f(g_{i,j}→t_i) + f(g_{i,j}→t_j) = f(s→g_{i,j}) = G[i, j] games, so every upcoming game is played. Moreover, each team i wins a total of Σ_j f(g_{i,j}→t_i) = f(t_i→t) ≤ W[n] + R[n] − W[i] upcoming games, and therefore at most W[n] + R[n] games overall. Thus, if team n wins all their upcoming games, they end the season in first place. □
So, to decide whether our favorite team can win, we construct the flow network, compute a maximum flow, and report whether that maximum flow saturates the edges leaving s. The flow network has O(n²) vertices and O(n²) edges, and it can be constructed in O(n²) time. Using Orlin's algorithm, we can compute the maximum flow in O(V E) = O(n⁴) time.
The graph derived from the 1996 American League East standings is shown below. The total
capacity of the edges leaving s is 27 (there are 27 remaining games), but the total capacity of
the edges entering t is only 26. So the maximum flow has value at most 26, which means that
Detroit is mathematically eliminated.
The flow graph for the 1996 American League East standings. Unlabeled edges have infinite capacity.
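Schwartz's reduction can be sketched end to end. The encoding below (tuple-labeled game and team nodes, plus an Edmonds-Karp subroutine) is our own choice; on the 1996 AL East data above, the 27 units leaving s cannot all reach t, so Detroit is reported eliminated.

```python
from collections import deque
from itertools import combinations

def eliminated(W, G, R_n):
    """True if the last team cannot finish first. W[i]: wins so far;
    G[i][j]: remaining games between teams i and j (both among the first
    n-1 teams); R_n: remaining games for the last team. A sketch."""
    n = len(W)
    best = W[n-1] + R_n                  # last team's best possible total
    cap, nbr = {}, {}
    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)
        nbr.setdefault(u, set()).add(v)
        nbr.setdefault(v, set()).add(u)
    total = 0
    for i, j in combinations(range(n - 1), 2):
        if G[i][j] > 0:
            total += G[i][j]
            add('s', ('g', i, j), G[i][j])
            add(('g', i, j), ('t', i), float('inf'))
            add(('g', i, j), ('t', j), float('inf'))
    for i in range(n - 1):
        if best < W[i]:
            return True                  # team i is already ahead for good
        add(('t', i), 't', best - W[i])
    flow = 0
    while True:                          # Edmonds-Karp: BFS augmenting paths
        parent, q = {'s': None}, deque(['s'])
        while q and 't' not in parent:
            u = q.popleft()
            for v in nbr.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if 't' not in parent:
            return flow < total          # eliminated iff s-edges unsaturated
        f, v = float('inf'), 't'
        while parent[v] is not None:
            f = min(f, cap[(parent[v], v)])
            v = parent[v]
        v = 't'
        while parent[v] is not None:
            cap[(parent[v], v)] -= f
            cap[(v, parent[v])] += f
            v = parent[v]
        flow += f

# 1996 AL East (NYY, BAL, BOS, TOR, then Detroit as the last team):
W = [75, 71, 69, 63, 49]
G = [[0,3,8,7], [3,0,2,7], [8,2,0,0], [7,7,0,0]]
print(eliminated(W, G, 27))  # -> True
```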
More recently, Kevin Wayne⁴ proved that one can determine all the teams that are mathematically eliminated in only O(n³) time, essentially using a single maximum-flow computation.
24.6 Project Selection
In our final example, suppose we are given a set of n projects that we could possibly perform; for
simplicity, we identify each project by an integer between 1 and n. Some projects cannot be
started until certain other projects are completed. This set of dependencies is described by a
directed acyclic graph, where an edge i→j indicates that project i depends on project j. Finally,
each project i has an associated profit pi which is given to us if the project is completed; however,
some projects have negative profits, which we interpret as positive costs. We can choose to finish
⁴Kevin D. Wayne. A new property and a faster algorithm for baseball elimination. SIAM J. Discrete Math
14(2):223–229, 2001.
any subset X of the projects that includes all its dependents; that is, for every project x ∈ X ,
every project that x depends on is also in X . Our goal is to find a valid subset of the projects
whose total profit is as large as possible. In particular, if all of the jobs have negative profit, the
correct answer is to do nothing.
A dependency graph for a set of projects. Circles represent profitable projects; squares represent costly projects.
At a high level, our task is to partition the projects into two subsets S and T, the jobs we Select and the jobs we Turn down. So intuitively, we'd like to model our problem as a minimum cut problem in a certain graph. But in which graph? How do we enforce prerequisites? We want to
maximize profit, but we only know how to find minimum cuts. And how do we convert negative
profits into positive capacities?
We define a new graph G by adding a source vertex s and a target vertex t to the dependency graph, with an edge s→j for every profitable job (with p_j > 0), and an edge i→t for every costly job (with p_i < 0). Intuitively, we can think of s as a new job ("To the bank!") with profit/cost 0 that we must perform last. We assign edge capacities as follows:
• c(s→j) = p_j for every profitable job j;
• c(i→t) = −p_i for every costly job i;
• c(i→j) = ∞ for every dependency edge i→j.
All edge capacities are positive, so this is a legal input to the minimum cut problem.
Now consider an (s, t)-cut (S, T) in G. If the capacity ‖S, T‖ is finite, then no dependency edge i→j crosses from S to T; that is, for every selected project in S, every project it depends on is also in S, which implies that S is a valid solution. Moreover, we claim that selecting the jobs in S earns us a total profit of C − ‖S, T‖, where C is the sum of all the positive profits. This claim immediately implies that we can maximize our total profit by computing a minimum cut in G.
The flow network for the example dependency graph, along with its minimum cut.
The cut has capacity 13 and C = 15, so the total profit for the selected jobs is 2.
We prove our key claim as follows. For any subset A of projects, we define three functions:

  cost(A) := Σ_{i∈A: p_i<0} (−p_i) = Σ_{i∈A} c(i→t)

  benefit(A) := Σ_{j∈A: p_j>0} p_j = Σ_{j∈A} c(s→j)

  profit(A) := Σ_{i∈A} p_i = benefit(A) − cost(A).
By definition, C = benefit(S) + benefit(T). Because the cut (S, T) has finite capacity, only edges of the form s→j and i→t can cross the cut. By construction, every edge s→j points to a profitable job and each edge i→t points from a costly job. Thus, ‖S, T‖ = cost(S) + benefit(T). We immediately conclude that C − ‖S, T‖ = benefit(S) − cost(S) = profit(S), as claimed.
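The whole reduction fits in a short sketch. The encoding and the Edmonds-Karp subroutine below are our own choices, and project names are assumed distinct from the reserved labels 's' and 't'; by max-flow/min-cut duality, the optimal profit is C minus the maximum-flow value.

```python
from collections import deque

def max_profit(profit, deps):
    """Best dependency-closed subset, via the min-cut reduction above.
    profit: project -> signed profit; deps: (i, j) pairs, i depends on j.
    Returns only the optimal total profit (a sketch)."""
    cap, nbr = {}, {}
    def add(u, v, c):
        cap[(u, v)] = cap.get((u, v), 0) + c
        cap.setdefault((v, u), 0)
        nbr.setdefault(u, set()).add(v)
        nbr.setdefault(v, set()).add(u)
    C = 0
    for p, v in profit.items():
        if v > 0:
            add('s', p, v)               # profitable job: edge s -> p
            C += v
        elif v < 0:
            add(p, 't', -v)              # costly job: edge p -> t
    for i, j in deps:
        add(i, j, float('inf'))          # dependency edges must not be cut
    flow = 0
    while True:                          # Edmonds-Karp; max flow = min cut
        parent, q = {'s': None}, deque(['s'])
        while q and 't' not in parent:
            u = q.popleft()
            for v in nbr.get(u, ()):
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if 't' not in parent:
            return C - flow              # profit(S) = C - ||S, T||
        f, v = float('inf'), 't'
        while parent[v] is not None:
            f = min(f, cap[(parent[v], v)])
            v = parent[v]
        v = 't'
        while parent[v] is not None:
            cap[(parent[v], v)] -= f
            cap[(v, parent[v])] += f
            v = parent[v]
        flow += f

# Hypothetical instance: job 'a' pays 5 but requires job 'b', which costs 2.
print(max_profit({'a': 5, 'b': -2}, [('a', 'b')]))  # -> 3
```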
Exercises
1. Given an undirected graph G = (V, E), with three vertices u, v, and w, describe and analyze
an algorithm to determine whether there is a path from u to w that passes through v.
2. Let G = (V, E) be a directed graph where for each vertex v, the in-degree and out-degree of v are equal. Let u and v be two vertices of G, and suppose G contains k edge-disjoint paths from u to v. Under these conditions, must G also contain k edge-disjoint paths from v to u? Give a proof or a counterexample with explanation.
3. Consider a directed graph G = (V, E) with multiple source vertices s_1, s_2, . . . , s_σ and multiple target vertices t_1, t_2, . . . , t_τ, where no vertex is both a source and a target. A multiterminal flow is a function f : E → R≥0 that satisfies the flow conservation constraint at every vertex that is neither a source nor a target. The value |f| of a multiterminal flow is the total excess flow out of all the source vertices:

  |f| := Σ_{i=1}^{σ} ( Σ_w f(s_i→w) − Σ_u f(u→s_i) )
As usual, we are interested in finding flows with maximum value, subject to capacity
constraints on the edges. (In particular, we don’t care how much flow moves from any
particular source to any particular target.)
(a) Consider the following algorithm for computing multiterminal flows. The variables f and f′ represent flow functions. The subroutine MaxFlow(G, s, t) solves the standard maximum flow problem with source s and target t.

MaxMultiFlow(G, s[1 .. σ], t[1 .. τ]):
  f ← 0                              ⟨⟨Initialize the flow⟩⟩
  for i ← 1 to σ
    for j ← 1 to τ
      f′ ← MaxFlow(G_f, s[i], t[j])
      f ← f + f′                     ⟨⟨Update the flow⟩⟩
  return f
Prove that this algorithm correctly computes a maximum multiterminal flow in G.
(b) Describe a more efficient algorithm to compute a maximum multiterminal flow in G.
4. A cycle cover of a given directed graph G = (V, E) is a set of vertex-disjoint cycles that cover
all the vertices. Describe and analyze an efficient algorithm to find a cycle cover for a given
graph, or correctly report that no cycle cover exists. [Hint: Use bipartite matching!]
5. The Island of Sodor is home to a large number of towns and villages, connected by an
extensive rail network. Recently, several cases of a deadly contagious disease (either swine
flu or zombies; reports are unclear) have been reported in the village of Ffarquhar. The
controller of the Sodor railway plans to close down certain railway stations to prevent the
disease from spreading to Tidmouth, his home town. No trains can pass through a closed
station. To minimize expense (and public notice), he wants to close down as few stations
as possible. However, he cannot close the Ffarquhar station, because that would expose
him to the disease, and he cannot close the Tidmouth station, because then he couldn’t
visit his favorite pub.
Describe and analyze an algorithm to find the minimum number of stations that must
be closed to block all rail travel from Ffarquhar to Tidmouth. The Sodor rail network is
represented by an undirected graph, with a vertex for each station and an edge for each
rail connection between two stations. Two special vertices f and t represent the stations
in Ffarquhar and Tidmouth.
For example, given the following input graph, your algorithm should return the
number 2.
[Figure: an example rail network, with Ffarquhar station f and Tidmouth station t]
6. The UIUC Computer Science Department is installing a mini-golf course in the basement
of the Siebel Center! The playing field is a closed polygon bounded by m horizontal and
vertical line segments, meeting at right angles. The course has n starting points and n
holes, in one-to-one correspondence. It is always possible to hit the ball along a straight line
directly from each starting point to the corresponding hole, without touching the boundary
of the playing field. (Players are not allowed to bounce golf balls off the walls; too much
glass.) The n starting points and n holes are all at distinct locations.
A minigolf course with five starting points (★) and five holes (◦), and a legal correspondence between them.
Sadly, the architect’s computer crashed just as construction was about to begin. Thanks
to the herculean efforts of their sysadmins, they were able to recover the locations of the
starting points and the holes, but all information about which starting points correspond to
which holes was lost!
Describe and analyze an algorithm to compute a one-to-one correspondence between
the starting points and the holes that meets the straight-line requirement, or to report
that no such correspondence exists. The input consists of the x- and y-coordinates of
the m corners of the playing field, the n starting points, and the n holes. Assume you
can determine in constant time whether two line segments intersect, given the x- and
y-coordinates of their endpoints.
7. Suppose you are given an n × n checkerboard with some of the squares deleted. You have a
large set of dominos, just the right size to cover two squares of the checkerboard. Describe
and analyze an algorithm to determine whether one can tile the board with dominos—each
domino must cover exactly two undeleted squares, and each undeleted square must be
covered by exactly one domino.
Your input is a two-dimensional array Deleted[1 .. n, 1 .. n] of bits, where Deleted[i, j] = True
if and only if the square in row i and column j has been deleted. Your output is a single
bit; you do not have to compute the actual placement of dominos. For example, for the
board shown above, your algorithm should return True.
8. Suppose we are given an n × n square grid, some of whose squares are colored black and
the rest white. Describe and analyze an algorithm to determine whether tokens can be
placed on the grid so that
• every token is on a white square;
• every row of the grid contains exactly one token; and
• every column of the grid contains exactly one token.
Your input is a two-dimensional array IsWhite[1 .. n, 1 .. n] of booleans, indicating which
squares are white. Your output is a single boolean. For example, given the grid above as
input, your algorithm should return True.
9. An n × n grid is an undirected graph with n² vertices organized into n rows and n columns. We denote the vertex in the ith row and the jth column by (i, j). Every vertex in the grid has exactly four neighbors, except for the boundary vertices, which are the vertices (i, j) such that i = 1, i = n, j = 1, or j = n.
Let (x1, y1), (x2, y2), . . . , (xm, ym) be distinct vertices, called terminals, in the n × n grid.
The escape problem is to determine whether there are m vertex-disjoint paths in the grid
that connect the terminals to any m distinct boundary vertices. Describe and analyze an
efficient algorithm to solve the escape problem.
A positive instance of the escape problem, and its solution.
10. The UIUC Faculty Senate has decided to convene a committee to determine whether
Chief Illiniwek should become the official mascot symbol of the University of Illinois
Global Campus.⁵ Exactly one faculty member must be chosen from each academic
department to serve on this committee. Some faculty members have appointments in
multiple departments, but each committee member will represent only one department.
For example, if Prof. Blagojevich is affiliated with both the Department of Corruption and
the Department of Stupidity, and he is chosen as the Stupidity representative, then someone
else must represent Corruption. Finally, University policy requires that any committee on
virtual mascots symbols must contain the same number of assistant professors, associate
professors, and full professors. Fortunately, the number of departments is a multiple of 3.
Describe an efficient algorithm to select the membership of the Global Illiniwek
Committee. Your input is a list of all UIUC faculty members, their ranks (assistant,
associate, or full), and their departmental affiliation(s). There are n faculty members and
3k departments.
11. You’re organizing the First Annual UIUC Computer Science 72-Hour Dance Exchange, to be
held all day Friday, Saturday, and Sunday. Several 30-minute sets of music will be played
during the event, and a large number of DJs have applied to perform. You need to hire DJs
according to the following constraints.
• Exactly k sets of music must be played each day, and thus 3k sets altogether.
⁵Thankfully, the Global Campus has faded into well-deserved obscurity, thanks in part to the 2009 admissions
scandal. Imagine MOOCs, but with the same business model and faculty oversight as the University of Phoenix.
• Each set must be played by a single DJ in a consistent music genre (ambient,
bubblegum, dubstep, horrorcore, hyphy, trip-hop, Nitzhonot, Kwaito, J-pop, Nashville
country, . . . ).
• Each genre must be played at most once per day.
• Each candidate DJ has given you a list of genres they are willing to play.
• Each DJ can play at most three sets during the entire event.
Suppose there are n candidate DJs and g different musical genres available. Describe and
analyze an efficient algorithm that either assigns a DJ and a genre to each of the 3k sets,
or correctly reports that no such assignment is possible.
12. The University of Southern North Dakota at Hoople has hired you to write an algorithm to
schedule their final exams. Each semester, USNDH offers n different classes. There are r
different rooms on campus and t different time slots in which exams can be offered. You
are given two arrays E[1 .. n] and S[1 .. r], where E[i] is the number of students enrolled
in the ith class, and S[ j] is the number of seats in the jth room. At most one final exam
can be held in each room during each time slot. Class i can hold its final exam in room j
only if E[i] < S[ j].
Describe and analyze an efficient algorithm to assign a room and a time slot to each
class (or report correctly that no such assignment is possible).
13. Suppose you are running a web site that is visited by the same set of people every day. Each
visitor claims membership in one or more demographic groups; for example, a visitor might
describe himself as male, 40–50 years old, a father, a resident of Illinois, an academic,
and so on. The site is supported by advertisers; each advertiser has told you which
demographic groups should see its ads and how many of its ads you must show each day.
Altogether, there are n visitors, k demographic groups, and m advertisers.
Describe an efficient algorithm to determine, given all the data described in the
previous paragraph, whether you can show each visitor exactly one ad per day, so that
every advertiser has its required number of ads displayed, and each ad is seen by a visitor
in an appropriate demographic group.
14. Suppose we are given an array A[1 .. m][1 .. n] of non-negative real numbers. We want to
round A to an integer matrix, by replacing each entry x in A with either bxc or dxe, without
changing the sum of entries in any row or column of A. For example:

    1.2  3.4  2.4         1  4  2
    3.9  4.0  2.1   ⟼    4  4  2
    7.9  1.6  0.5         8  1  1
(a) Describe and analyze an efficient algorithm that either rounds A in this fashion, or
reports correctly that no such rounding is possible.
⋆(b) Suppose we are guaranteed that none of the entries in the input matrix A are integers.
Describe and analyze an even faster algorithm that either rounds A or reports correctly
that no such rounding is possible. For full credit, your algorithm must run in O(mn)
time. [Hint: Don't use flows.]
⁶Har har har! Mine is an evil laugh! Now die!
15. Ad-hoc networks are made up of low-powered wireless devices. In principle⁷, these networks
can be used on battlefields, in regions that have recently suffered from natural disasters,
and in other hard-to-reach areas. The idea is that a large collection of cheap, simple devices
could be distributed through the area of interest (for example, by dropping them from an
airplane); the devices would then automatically configure themselves into a functioning
wireless network.
These devices can communicate only within a limited range. We assume all the devices
are identical; there is a distance D such that two devices can communicate if and only if
the distance between them is at most D.
We would like our ad-hoc network to be reliable, but because the devices are cheap
and low-powered, they frequently fail. If a device detects that it is likely to fail, it should
transmit its information to some other backup device within its communication range. We
require each device x to have k potential backup devices, all within distance D of x; we call
these k devices the backup set of x. Also, we do not want any device to be in the backup
set of too many other devices; otherwise, a single failure might affect a large fraction of
the network.
So suppose we are given the communication radius D, parameters b and k, and an
array d[1 .. n, 1 .. n] of distances, where d[i, j] is the distance between device i and device
j. Describe an algorithm that either computes a backup set of size k for each of the n
devices, such that no device appears in more than b backup sets, or reports (correctly) that
no good collection of backup sets exists.
⋆16. A rooted tree is a directed acyclic graph, in which every vertex has exactly one incoming
edge, except for the root, which has no incoming edges. Equivalently, a rooted tree consists
of a root vertex, which has edges pointing to the roots of zero or more smaller rooted trees.
Describe a polynomial-time algorithm to compute, given two rooted trees A and B, the
largest common rooted subtree of A and B.
[Hint: Let LCS(u, v) denote the largest common subtree whose root in A is u and whose
root in B is v. Your algorithm should compute LCS(u, v) for all vertices u and v using dynamic
programming. This would be easy if every vertex had O(1) children, and still straightforward
if the children of each node were ordered from left to right and the common subtree had
to respect that ordering. But for unordered trees with large degree, you need another trick
to combine recursive subproblems efficiently. Don’t waste your time trying to reduce the
polynomial running time.]
⁷but not really in practice
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision.
Algorithms
Lecture 25: Extensions of Maximum Flow [Fa'13]
“Who are you?" said Lunkwill, rising angrily from his seat. “What do you want?"
“I am Majikthise!" announced the older one.
“And I demand that I am Vroomfondel!" shouted the younger one.
Majikthise turned on Vroomfondel. “It’s alright," he explained angrily, “you don’t
need to demand that."
“Alright!" bawled Vroomfondel banging on a nearby desk. “I am Vroomfondel, and
that is not a demand, that is a solid fact! What we demand is solid facts!"
“No we don’t!" exclaimed Majikthise in irritation. “That is precisely what we don’t
demand!"
Scarcely pausing for breath, Vroomfondel shouted, “We don’t demand solid facts!
What we demand is a total absence of solid facts. I demand that I may or may not be
Vroomfondel!"
— Douglas Adams, The Hitchhiker’s Guide to the Galaxy (1979)
⋆25  Extensions of Maximum Flow

25.1  Maximum Flows with Edge Demands
Now suppose each directed edge e in G has both a capacity c(e) and a demand d(e) ≤ c(e), and we
want a flow f of maximum value that satisfies d(e) ≤ f(e) ≤ c(e) at every edge e. We call a flow
that satisfies these constraints a feasible flow. In our original setting, where d(e) = 0 for every
edge e, the zero flow is feasible; however, in this more general setting, even determining whether
a feasible flow exists is a nontrivial task.
Perhaps the easiest way to find a feasible flow (or determine that none exists) is to reduce the
problem to a standard maximum flow problem, as follows. The input consists of a directed graph
G = (V, E), nodes s and t, demand function d: E → R, and capacity function c: E → R. Let D
denote the sum of all edge demands in G:

    D := ∑_{u→v ∈ E} d(u→v).
We construct a new graph G′ = (V′, E′) from G by adding new source and target vertices s′
and t′, adding edges from s′ to each vertex in V, adding edges from each vertex in V to t′, and
finally adding an edge from t to s. We also define a new capacity function c′: E′ → R as follows:
• For each vertex v ∈ V, we set c′(s′→v) = ∑_{u∈V} d(u→v) and c′(v→t′) = ∑_{w∈V} d(v→w).
• For each edge u→v ∈ E, we set c′(u→v) = c(u→v) − d(u→v).
• Finally, we set c′(t→s) = ∞.
Intuitively, we construct G′ by replacing any edge u→v in G with three edges: an edge u→v
with capacity c(u→v) − d(u→v), an edge s′→v with capacity d(u→v), and an edge u→t′ with
capacity d(u→v). If this construction produces multiple edges from s′ to the same vertex v (or to
t′ from the same vertex), we merge them into a single edge with the same total capacity.
In G′, the total capacity out of s′ and the total capacity into t′ are both equal to D. We
call a flow with value exactly D a saturating flow, since it saturates all the edges leaving s′ or
entering t′. If G′ has a saturating flow, it must be a maximum flow, so we can find it using any
max-flow algorithm.
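The construction is mechanical enough to sketch in code. Here is one possible Python sketch (the edge-dictionary representation and the function name are my own illustrative conventions, not part of the lecture); it returns the capacity function c′ of G′ together with the demand total D that a saturating flow must achieve:

```python
from collections import defaultdict

def transform_demands(edges, s, t):
    """Build the capacities of the transformed network G' (illustrative sketch).

    edges maps each directed edge (u, v) of G to a pair (demand, capacity).
    Returns (cap, s2, t2, D): the capacity map of G', the new source and
    target, and the total demand D that a saturating flow must achieve.
    """
    s2, t2 = "s'", "t'"
    cap = defaultdict(float)
    D = 0
    for (u, v), (dem, c) in edges.items():
        cap[(u, v)] += c - dem    # reduced capacity on the original edge
        cap[(s2, v)] += dem       # inject d(u->v) into v from the new source
        cap[(u, t2)] += dem       # divert d(u->v) from u to the new target
        D += dem
    cap[(t, s)] = float("inf")    # let flow circulate from t back to s
    return cap, s2, t2, D
```

Note that parallel edges out of s′ (or into t′) are merged automatically by the `+=` updates, exactly as the construction prescribes.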
[Figure: A flow network G with demands and capacities (written d..c), and the transformed network G′.]
Lemma 1. G has a feasible (s, t)-flow if and only if G′ has a saturating (s′, t′)-flow.
Proof: Let f: E → R be a feasible (s, t)-flow in the original graph G. Consider the following
function f′: E′ → R:

    f′(u→v) = f(u→v) − d(u→v)        for all u→v ∈ E
    f′(s′→v) = ∑_{u∈V} d(u→v)        for all v ∈ V
    f′(v→t′) = ∑_{w∈V} d(v→w)        for all v ∈ V
    f′(t→s) = |f|
We easily verify that f′ is a saturating (s′, t′)-flow in G′. The feasibility of f implies that
f(e) ≥ d(e) for every edge e ∈ E, so f′(e) ≥ 0 everywhere. Feasibility also implies f(e) ≤ c(e)
for every edge e ∈ E, so f′(e) ≤ c′(e) everywhere. Tedious algebra implies that

    ∑_{u∈V′} f′(u→v) = ∑_{w∈V′} f′(v→w)

for every vertex v ∈ V (including s and t). Thus, f′ is a legal (s′, t′)-flow, and every edge out
of s′ or into t′ is clearly saturated. Intuitively, f′ diverts d(u→v) units of flow from u directly to
the new target t′, and injects the same amount of flow into v directly from the new source s′.
The same tedious algebra implies that for any saturating (s′, t′)-flow f′: E′ → R for G′, the
function f = f′|_E + d is a feasible (s, t)-flow in G.  □
Thus, we can compute a feasible (s, t)-flow for G, if one exists, by searching for a maximum
(s′, t′)-flow in G′ and checking that it is saturating. Once we've found a feasible (s, t)-flow in G,
we can transform it into a maximum flow using an augmenting-path algorithm, but with one
small change. To ensure that every flow we consider is feasible, we must redefine the residual
capacity of an edge as follows:

    c_f(u→v) =  c(u→v) − f(u→v)    if u→v ∈ E,
                f(v→u) − d(v→u)    if v→u ∈ E,
                0                  otherwise.
Otherwise, the algorithm is unchanged. If we use the Dinitz/Edmonds-Karp fat-pipe algorithm,
we get an overall running time of O(V E²).
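In code, the modified residual capacity is a three-way case analysis, just like the standard one. A minimal Python sketch, assuming f, d, and c are dictionaries keyed by edge pairs (my representation, not the lecture's):

```python
def residual_capacity(f, d, c, u, v):
    """Residual capacity with edge demands: a forward edge can absorb flow up
    to its capacity, but a backward edge can only release flow down to its
    demand, never below it."""
    if (u, v) in c:
        return c[(u, v)] - f[(u, v)]
    if (v, u) in c:
        return f[(v, u)] - d[(v, u)]
    return 0
```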
[Figure: A saturating flow f′ in G′, the corresponding feasible flow f in G, and the corresponding residual network G_f.]
25.2  Node Supplies and Demands
Another useful variant allows flow to be injected or extracted from the flow network
at vertices other than s or t. Let x: (V \ {s, t}) → R be an excess function describing how much
flow is to be injected (or extracted if the value is negative) at each vertex. We now want a
maximum 'flow' that satisfies the variant balance condition

    ∑_{u∈V} f(u→v) − ∑_{w∈V} f(v→w) = x(v)

for every node v except s and t, or prove that no such flow exists. As above, call such a function
f a feasible flow.
As for flows with edge demands, the only real difficulty in finding a maximum flow under
these modified constraints is finding a feasible flow (if one exists). We can reduce this problem to
a standard max-flow problem, just as we did for edge demands.
To simplify the transformation, let us assume without loss of generality that the total excess
in the network is zero: ∑_v x(v) = 0. If the total excess is positive, we add an infinite-capacity
edge t→t̃, where t̃ is a new target node, and set x(t) = −∑_v x(v). Similarly, if the total
excess is negative, we add an infinite-capacity edge s̃→s, where s̃ is a new source node, and
set x(s) = −∑_v x(v). In both cases, every feasible flow in the modified graph corresponds to a
feasible flow in the original graph.
As before, we modify G to obtain a new graph G′ by adding a new source s′, a new target t′,
an infinite-capacity edge t→s from the old target to the old source, and several edges from s′ and
to t′. Specifically, for each vertex v, if x(v) > 0, we add a new edge s′→v with capacity x(v), and
if x(v) < 0, we add an edge v→t′ with capacity −x(v). As before, we call an (s′, t′)-flow in G′
saturating if every edge leaving s′ or entering t′ is saturated; any saturating flow is a maximum
flow. It is easy to check that saturating flows in G′ are in direct correspondence with feasible
flows in G; we leave details as an exercise (hint, hint).
Similar reductions allow us to solve several other variants of the maximum flow problem using the
same path-augmentation techniques. For example, we could associate capacities and demands
with the vertices instead of (or in addition to) the edges, or associate a range of excesses with
each vertex instead of a single excess value.
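For concreteness, the node-excess reduction can be sketched in a few lines of Python (the dictionary representation and names are illustrative assumptions, not part of the text):

```python
def transform_excess(excess, cap, s, t):
    """Reduce node supplies/demands to a plain max-flow instance (sketch).

    excess maps each vertex other than s and t to its excess x(v), assumed
    to sum to zero; cap maps (u, v) -> capacity.  We add a new source s',
    a new target t', an infinite-capacity edge t->s, an edge s'->v of
    capacity x(v) whenever x(v) > 0, and an edge v->t' of capacity -x(v)
    whenever x(v) < 0.  Feasible flows in G correspond to saturating
    (s', t')-flows in the result.
    """
    s2, t2 = "s'", "t'"
    cap2 = dict(cap)
    cap2[(t, s)] = float("inf")
    for v, x in excess.items():
        if x > 0:
            cap2[(s2, v)] = x
        elif x < 0:
            cap2[(v, t2)] = -x
    return cap2, s2, t2
```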
25.3  Minimum-Cost Flows
Now imagine that each edge e in the network has both a capacity c(e) and a cost $(e). The cost
function describes the cost of sending a unit of flow through the edges; thus, the cost of any flow f
is defined as follows:

    $(f) = ∑_{e∈E} $(e) · f(e).
The minimum-cost maximum-flow problem is to compute a maximum flow of minimum cost.
If the network has only one maximum flow, that’s what we want, but if there is more than one
maximum flow, we want the maximum flow whose cost is as small as possible. Costs can either
be positive, negative, or zero. However, if an edge u→v and its reversal v→u both appear in the
graph, their costs must sum to zero: $(u→v) = −$(v→u). Otherwise, we could make an infinite
profit by pushing flow back and forth along the edge!
Each augmentation step in the standard Ford-Fulkerson algorithm both increases the value
of the flow and changes its cost. If the total cost of the augmenting path is positive, the cost
of the flow increases; conversely, if the total cost of the augmenting path is negative, the cost
of the flow decreases. We can also change the cost of the flow without changing its value, by
augmenting along a directed cycle in the residual graph. Again, augmenting along a negative-cost
cycle decreases the cost of the flow, and augmenting along a positive-cost cycle increases the cost
of the flow.
It follows immediately that a flow f is a minimum-cost maximum flow in G if and only if the
residual graph G f has no directed paths from s to t and no negative-cost cycles.
We can compute a min-cost max-flow using the so-called cycle cancelling algorithm first
proposed by Morton Klein in 1967. The algorithm has two phases; in the first, we compute an
arbitrary maximum flow f , using any method we like. The second phase repeatedly decreases
the cost of f, by augmenting f along a negative-cost cycle in the residual graph G_f, until no such
cycle exists. As in Ford-Fulkerson, the amount of flow we push around each cycle is equal to the
minimum residual capacity of any edge on the cycle.
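The negative-cycle search at the heart of cycle cancelling is just Shimbel's (Bellman-Ford) relaxation run for |V| rounds from a virtual source, followed by a walk backward along predecessor pointers. A Python sketch, using an edge-dictionary representation of the residual graph that is my own convention:

```python
def find_negative_cycle(vertices, residual):
    """Find a negative-cost cycle in a residual graph (Bellman-Ford sketch).

    residual maps (u, v) -> (capacity, cost) for residual edges.  Returns a
    list of edges forming a negative-cost cycle, or None if none exists.
    """
    dist = {v: 0 for v in vertices}      # virtual source: all distances 0
    pred = {v: None for v in vertices}
    last = None
    for _ in range(len(vertices)):
        last = None
        for (u, v), (c, w) in residual.items():
            if c > 0 and dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                last = v
    if last is None:                     # no relaxation in round |V|: no cycle
        return None
    v = last
    for _ in range(len(vertices)):       # walk back |V| steps to land on the cycle
        v = pred[v]
    cycle, u = [], v
    while True:                          # extract the cycle through v
        cycle.append((pred[u], u))
        u = pred[u]
        if u == v:
            break
    cycle.reverse()
    return cycle
```

Cancelling would then push the minimum residual capacity around the returned cycle and repeat until this function returns None.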
In each iteration of the second phase, we can use a modification of Shimbel’s shortest path
algorithm (often called “Bellman-Ford”) to find a negative-cost cycle in O(V E) time. To bound
the number of iterations in the second phase, we assume that both the capacity and the cost of
each edge is an integer, and we define

    C = max_{e∈E} c(e)   and   D = max_{e∈E} |$(e)|.
The cost of any feasible flow is clearly between −ECD and ECD, and each augmentation step
decreases the cost of the flow by a positive integer, and therefore by at least 1. We conclude that
the second phase requires at most 2ECD iterations, and therefore runs in O(V E²CD) time. As
with the raw Ford-Fulkerson algorithm, this running time is exponential in the complexity of the
input, and it may never terminate if the capacities and/or costs are irrational.
As with Ford-Fulkerson, more careful choices of which cycle to cancel lead to more efficient
algorithms. Unfortunately, some obvious choices are NP-hard to compute, including the cycle
with most negative cost and the negative cycle with the fewest edges. In the late 1980s, Andrew
Goldberg and Bob Tarjan developed a min-cost flow algorithm that repeatedly cancels the so-called
minimum-mean cycle, which is the cycle whose average cost per edge is smallest. By combining an
algorithm of Karp to compute minimum-mean cycles in O(EV) time, efficient dynamic tree data
structures, and other sophisticated techniques that are (unfortunately) beyond the scope of this
class, their algorithm achieves a running time of O(E²V log²V). The fastest min-cost max-flow
algorithm currently known,¹ due to James Orlin, reduces the problem to O(E log V) iterations of
Dijkstra's shortest-path algorithm; Orlin's algorithm runs in O(E² log V + EV log²V) time.
25.4  Maximum-Weight Matchings
Recall from the previous lecture that we can find a maximum-cardinality matching in any bipartite
graph in O(V E) time by reduction to the standard maximum flow problem.
Now suppose the input graph has weighted edges, and we want to find the matching with
maximum total weight. Given a bipartite graph G = (U ∪ W, E) and a non-negative weight function
w: E → R, the goal is to compute a matching M whose total weight w(M) = ∑_{uw∈M} w(uw) is
as large as possible. Max-weight matchings can't be found directly using standard max-flow
algorithms², but we can modify the algorithm for maximum-cardinality matchings described
above.
It will be helpful to reinterpret the behavior of our earlier algorithm directly in terms of
the original bipartite graph instead of the derived flow network. Our algorithm maintains a
matching M , which is initially empty. We say that a vertex is matched if it is an endpoint of an
edge in M . At each iteration, we find an alternating path π that starts and ends at unmatched
vertices and alternates between edges in E \ M and edges in M. Equivalently, let G_M be the
directed graph obtained by orienting every edge in M from W to U, and every edge in E \ M
from U to W. An alternating path is just a directed path in G_M between two unmatched vertices.
Any alternating path has odd length and has exactly one more edge in E \ M than in M . The
iteration ends by setting M ← M ⊕ π, thereby increasing the number of edges in M by one.
The max-flow/min-cut theorem implies that when there are no more alternating paths, M is a
maximum matching.
¹at least, among algorithms whose running times do not depend on C and D
²However, max-flow algorithms can be modified to compute maximum weighted flows, where every edge has both
a capacity and a weight, and the goal is to maximize ∑_{u→v} w(u→v) · f(u→v).
A matching M with 5 edges, an alternating path π, and the augmented matching M ⊕ π with 6 edges.
If the edges of G are weighted, we only need to make two changes to the algorithm. First,
instead of looking for an arbitrary alternating path at each iteration, we look for the alternating
path π such that M ⊕ π has the largest weight. Suppose we weight the edges in the residual graph
G_M as follows:

    w′(u→w) = −w(uw)    for all uw ∉ M
    w′(w→u) = w(uw)     for all uw ∈ M

We now have w(M ⊕ π) = w(M) − w′(π). Thus, the correct augmenting path π must be
the directed path in G_M with minimum total residual weight w′(π). Second, because the
matching with the maximum weight may not be the matching with the maximum cardinality, we
return the heaviest matching considered in any iteration of the algorithm.
[Figure: A maximum-weight matching is not necessarily a maximum-cardinality matching.]
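The whole weighted-matching scheme fits in a short prototype. The following Python sketch (the representation and names are mine, not the lecture's) runs Shimbel-style relaxation on G_M with the residual weights w′, augments along the minimum-weight alternating path, and returns the heaviest matching seen in any iteration, as the text prescribes:

```python
def max_weight_matching(U, W, weight):
    """Successive minimum-weight alternating paths (illustrative sketch).

    weight maps (u, w) pairs to non-negative weights; missing pairs are
    non-edges.  Returns (matching, total_weight), where matching is a
    dict u -> w of maximum total weight.
    """
    match = {}                        # current matching M_i, as u -> w
    best, best_wt = {}, 0
    while True:
        # Bellman-Ford over G_M from a virtual source at all unmatched u's.
        dist = {("u", u): 0 for u in U if u not in match}
        pred = {}
        for _ in range(len(U) + len(W)):
            for (u, w), wt in weight.items():
                if match.get(u) == w:              # matched: W -> U, weight +wt
                    a, b, c = ("w", w), ("u", u), wt
                else:                              # unmatched: U -> W, weight -wt
                    a, b, c = ("u", u), ("w", w), -wt
                if a in dist and dist[a] + c < dist.get(b, float("inf")):
                    dist[b] = dist[a] + c
                    pred[b] = a
        matched_w = set(match.values())
        ends = [("w", w) for w in W if w not in matched_w and ("w", w) in dist]
        if not ends:
            break
        end = min(ends, key=lambda n: dist[n])     # minimum residual weight path
        node = end                                 # flip the alternating path
        while node in pred:
            prev = pred[node]
            if prev[0] == "u" and node[0] == "w":
                match[prev[1]] = node[1]
            node = prev
        total = sum(weight[(u, w)] for u, w in match.items())
        if total > best_wt:                        # keep the heaviest matching seen
            best, best_wt = dict(match), total
    return best, best_wt
```

On a graph where one heavy edge outweighs any two-edge matching, the sketch correctly returns the single-edge matching rather than the larger-cardinality one.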
Before we determine the running time of the algorithm, we need to check that it actually
finds the maximum-weight matching. After all, it’s a greedy algorithm, and greedy algorithms
don’t work unless you prove them into submission! Let Mi denote the maximum-weight matching
in G with exactly i edges. In particular, M0 = ∅, and the global maximum-weight matching is
equal to Mi for some i. (The figure above show M1 and M2 for the same graph.) Let Gi denote
the directed residual graph for Mi , let w i denote the residual weight function for Mi as defined
above, and let πi denote the directed path in Gi such that w i (πi ) is minimized. To simplify the
proof, I will assume that there is a unique maximum-weight matching Mi of any particular size;
this assumption can be enforced by applying a consistent tie-breaking rule. With this assumption
in place, the correctness of our algorithm follows inductively from the following lemma.
Lemma 2. If G contains a matching with i + 1 edges, then M_{i+1} = M_i ⊕ π_i.
Proof: I will prove the equivalent statement M_{i+1} ⊕ M_i = π_i. To simplify notation, call an
edge in M_{i+1} ⊕ M_i red if it is an edge in M_{i+1}, and blue if it is an edge in M_i.
The graph M_{i+1} ⊕ M_i has maximum degree 2, and therefore consists of pairwise disjoint paths
and cycles, each of which alternates between red and blue edges. Since G is bipartite, every cycle
must have even length. The number of edges in M_{i+1} ⊕ M_i is odd; specifically, M_{i+1} ⊕ M_i has
2i + 1 − 2k edges, where k is the number of edges that are in both matchings. Thus, M_{i+1} ⊕ M_i
contains an odd number of paths of odd length, some number of paths of even length, and some
number of cycles of even length.
Let γ be a cycle in M_{i+1} ⊕ M_i. Because γ has an equal number of edges from each matching,
M_i ⊕ γ is another matching with i edges. The total weight of this matching is exactly w(M_i) − w_i(γ),
which must be less than w(M_i), so w_i(γ) must be positive. On the other hand, M_{i+1} ⊕ γ is a
matching with i + 1 edges whose total weight is w(M_{i+1}) + w_i(γ) < w(M_{i+1}), so w_i(γ) must be
negative! We conclude that no such cycle γ exists; M_{i+1} ⊕ M_i consists entirely of disjoint paths.
Exactly the same reasoning implies that no path in M_{i+1} ⊕ M_i has an even number of edges.
Finally, since the number of red edges in M_{i+1} ⊕ M_i is one more than the number of blue
edges, the number of paths that start with a red edge is exactly one more than the number of
paths that start with a blue edge. The same reasoning as above implies that M_{i+1} ⊕ M_i does not
contain a blue-first path, because we can pair it up with a red-first path.
We conclude that M_{i+1} ⊕ M_i consists of a single alternating path π whose first edge is red.
Since w(M_{i+1}) = w(M_i) − w_i(π), the path π must be the one with minimum weight w_i(π).  □
We can find the alternating path π_i using a single-source shortest path algorithm. Modify the
residual graph G_i by adding zero-weight edges from a new source vertex s to every unmatched
node in U, and from every unmatched node in W to a new target vertex t, exactly as in our
unweighted matching algorithm. Then π_i is the shortest path from s to t in this modified graph.
Since M_i is the maximum-weight matching with i edges, G_i has no negative cycles, so this
shortest path is well-defined. We can compute the shortest path in G_i in O(V E) time using
Shimbel's algorithm, so the overall running time of our algorithm is O(V²E).
The residual graph G_i has negative-weight edges, so we can't speed up the algorithm by
replacing Shimbel's algorithm with Dijkstra's. However, we can use a variant of Johnson's all-pairs
shortest path algorithm to improve the running time to O(V E + V² log V). Let d_i(v) denote the
distance from s to v in the residual graph G_i, using the distance function w_i. Let w̃_i denote
the modified distance function w̃_i(u→v) = d_{i−1}(u) + w_i(u→v) − d_{i−1}(v). As we argued in the
discussion of Johnson's algorithm, shortest paths with respect to w_i are still shortest paths with
respect to w̃_i. Moreover, w̃_i(u→v) ≥ 0 for every edge u→v in G_i:
• If u→v is an edge in G_{i−1}, then w_i(u→v) = w_{i−1}(u→v) and d_{i−1}(v) ≤ d_{i−1}(u) + w_{i−1}(u→v).
• If u→v is not in G_{i−1}, then w_i(u→v) = −w_{i−1}(v→u) and v→u is an edge in the shortest
path π_{i−1}, so d_{i−1}(u) = d_{i−1}(v) + w_{i−1}(v→u).
Let d̃_i(v) denote the shortest-path distance from s to v with respect to the distance function w̃_i.
Because w̃_i is non-negative everywhere, we can quickly compute d̃_i(v) for all v using Dijkstra's
algorithm. This gives us both the shortest alternating path π_i and the distances d_i(v) = d̃_i(v) + d_{i−1}(v)
needed for the next iteration.
Exercises
1. Suppose we are given a directed graph G = (V, E), two vertices s and t, and a capacity
function c: V → R⁺. A flow f is feasible if the total flow into every vertex v is at most c(v):

    ∑_{u∈V} f(u→v) ≤ c(v)   for every vertex v.
Describe and analyze an efficient algorithm to compute a feasible flow of maximum value.
2. Suppose we are given an n × n grid, some of whose cells are marked; the grid is represented
by an array M [1 .. n, 1 .. n] of booleans, where M [i, j] = True if and only if cell (i, j) is
marked. A monotone path through the grid starts at the top-left cell, moves only right or
down at each step, and ends at the bottom-right cell. Our goal is to cover the marked cells
with as few monotone paths as possible.
Greedily covering the marked cells in a grid with four monotone paths.
(a) Describe an algorithm to find a monotone path that covers the largest number of
marked cells.
(b) There is a natural greedy heuristic to find a small cover by monotone paths: if there
are any marked cells, find a monotone path π that covers the largest number of
marked cells, unmark the cells covered by π, and recurse. Show
that this algorithm does not always compute an optimal solution.
(c) Describe and analyze an efficient algorithm to compute the smallest set of monotone
paths that covers every marked cell.
3. Suppose we are given a set of boxes, each specified by their height, width, and depth in
centimeters. All three side lengths of every box lie strictly between 10cm and 20cm. As you
should expect, one box can be placed inside another if the smaller box can be rotated so
that its height, width, and depth are respectively smaller than the height, width, and depth
of the larger box. Boxes can be nested recursively. Call a box visible if it is not inside
another box.
Describe and analyze an algorithm to nest the boxes so that the number of visible boxes
is as small as possible.
4. Let G be a directed flow network whose edges have costs, but which contains no negative-cost cycles. Prove that one can compute a minimum-cost maximum flow in G using a
variant of Ford-Fulkerson that repeatedly augments the (s, t)-path of minimum total cost in
the current residual graph. What is the running time of this algorithm?
5. An (s, t)-series-parallel graph is a directed acyclic graph with two designated vertices s
(the source) and t (the target or sink) and with one of the following structures:
• Base case: A single directed edge from s to t.
• Series: The union of an (s, u)-series-parallel graph and a (u, t)-series-parallel graph
that share a common vertex u but no other vertices or edges.
• Parallel: The union of two smaller (s, t)-series-parallel graphs with the same source
s and target t, but with no other vertices or edges in common.
(a) Describe an efficient algorithm to compute a maximum flow from s to t in an
(s, t)-series-parallel graph with arbitrary edge capacities.
(b) Describe an efficient algorithm to compute a minimum-cost maximum flow from s to
t in an (s, t)-series-parallel graph whose edges have unit capacity and arbitrary costs.
⋆(c) Describe an efficient algorithm to compute a minimum-cost maximum flow from s to
t in an (s, t)-series-parallel graph whose edges have arbitrary capacities and costs.
Algorithms
Lecture 26: Linear Programming [Fa ’13]
The greatest flood has the soonest ebb;
the sorest tempest the most sudden calm;
the hottest love the coldest end; and
from the deepest desire oftentimes ensues the deadliest hate.
— Socrates
Th’ extremes of glory and of shame,
Like east and west, become the same.
— Samuel Butler, Hudibras Part II, Canto I (c. 1670)
Extremes meet, and there is no better example
than the haughtiness of humility.
— Ralph Waldo Emerson, “Greatness”,
in Letters and Social Aims (1876)
⋆26  Linear Programming
The maximum flow/minimum cut problem is a special case of a very general class of problems
called linear programming. Many other optimization problems fall into this class, including
minimum spanning trees and shortest paths, as well as several common problems in scheduling,
logistics, and economics. Linear programming was used implicitly by Fourier in the early
1800s, but it was first formalized and applied to problems in economics in the 1930s by Leonid
Kantorovich. Kantorovich's work was hidden behind the Iron Curtain (where it was largely
ignored) and therefore unknown in the West. Linear programming was rediscovered and applied
to shipping problems in the early 1940s by Tjalling Koopmans. The first complete algorithm
to solve linear programming problems, called the simplex method, was published by George
Dantzig in 1947. Koopmans first proposed the name "linear programming" in a discussion with
Dantzig in 1948. Kantorovich and Koopmans shared the 1975 Nobel Prize in Economics “for
their contributions to the theory of optimum allocation of resources”. Dantzig did not; his work
was apparently too pure. Koopmans wrote to Kantorovich suggesting that they refuse the prize
in protest of Dantzig’s exclusion, but Kantorovich saw the prize as a vindication of his use of
mathematics in economics, which his Soviet colleagues had written off as “a means for apologists
of capitalism”.
A linear programming problem asks for a vector x ∈ Rd that maximizes (or equivalently,
minimizes) a given linear function, among all vectors x that satisfy a given set of linear inequalities.
The general form of a linear programming problem is the following:
    maximize    ∑_{j=1}^{d} c_j x_j
    subject to  ∑_{j=1}^{d} a_{ij} x_j ≤ b_i    for each i = 1 .. p
                ∑_{j=1}^{d} a_{ij} x_j = b_i    for each i = p+1 .. p+q
                ∑_{j=1}^{d} a_{ij} x_j ≥ b_i    for each i = p+q+1 .. n
Here, the input consists of a matrix A = (a_{ij}) ∈ R^{n×d}, a column vector b ∈ R^n, and a row vector
c ∈ R^d. Each coordinate of the vector x is called a variable. Each of the linear inequalities is
called a constraint. The function x 7→ c · x is called the objective function. I will always use d to
denote the number of variables, also known as the dimension of the problem. The number of
constraints is usually denoted n.
A linear programming problem is said to be in canonical form¹ if it has the following structure:
    maximize    ∑_{j=1}^{d} c_j x_j
    subject to  ∑_{j=1}^{d} a_{ij} x_j ≤ b_i    for each i = 1 .. n
                x_j ≥ 0                          for each j = 1 .. d
We can express this canonical form more compactly as follows. For two vectors x = (x_1, x_2, . . . , x_d)
and y = (y_1, y_2, . . . , y_d), the expression x ≥ y means that x_i ≥ y_i for every index i.
max c · x
s.t. Ax ≤ b
x≥0
Any linear programming problem can be converted into canonical form as follows:
• For each variable x_j, add the equality constraint x_j = x_j⁺ − x_j⁻ and the inequalities x_j⁺ ≥ 0
and x_j⁻ ≥ 0.
• Replace any equality constraint ∑_j a_{ij} x_j = b_i with two inequality constraints ∑_j a_{ij} x_j ≥ b_i
and ∑_j a_{ij} x_j ≤ b_i.
• Replace any lower bound ∑_j a_{ij} x_j ≥ b_i with the equivalent upper bound ∑_j −a_{ij} x_j ≤ −b_i.
This conversion potentially doubles the number of variables and the number of constraints;
fortunately, it is rarely necessary in practice.
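The three conversion steps are easy to mechanize. A Python sketch using plain lists of rows (the function and argument names are illustrative assumptions, not from the text):

```python
def to_canonical(c, A_ub, b_ub, A_eq, b_eq):
    """Convert  max c.x  s.t.  A_ub x <= b_ub,  A_eq x = b_eq  (x unrestricted)
    into canonical form  max c'.y  s.t.  A' y <= b',  y >= 0,  following the
    lecture's recipe: write each variable as x_j = x_j+ - x_j-, and replace
    each equality with a pair of opposite inequalities."""
    # Splitting x into x+ - x- doubles every row: each column a becomes [a, -a].
    split = lambda rows: [row + [-a for a in row] for row in rows]
    A = split(A_ub)                                  # A_ub x <= b_ub
    b = list(b_ub)
    A += split(A_eq); b += list(b_eq)                # A_eq x <= b_eq
    A += [[-a for a in row] for row in split(A_eq)]  # A_eq x >= b_eq, flipped
    b += [-bi for bi in b_eq]
    return c + [-cj for cj in c], A, b
```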
Another useful format for linear programming problems is slack form², in which every
inequality is of the form x j ≥ 0:
max c · x
s.t. Ax = b
x≥0
It’s fairly easy to convert any linear programming problem into slack form. Slack form is especially
useful in executing the simplex algorithm (which we’ll see in the next lecture).
26.1  The Geometry of Linear Programming
A point x ∈ Rd is feasible with respect to some linear programming problem if it satisfies all the
linear constraints. The set of all feasible points is called the feasible region for that linear program.
¹Confusingly, some authors call this standard form.
²Confusingly, some authors call this standard form.
Algorithms
Lecture 26: Linear Programming [Fa ’13]
The feasible region has a particularly nice geometric structure that lends some useful intuition to
the linear programming algorithms we’ll see later.
Any linear equation in d variables defines a hyperplane in Rd ; think of a line when d = 2,
or a plane when d = 3. This hyperplane divides Rd into two halfspaces; each halfspace is
the set of points that satisfy some linear inequality. Thus, the set of feasible points is the
intersection of several hyperplanes (one for each equality constraint) and halfspaces (one for
each inequality constraint). The intersection of a finite number of hyperplanes and halfspaces is
called a polyhedron. It’s not hard to verify that any halfspace, and therefore any polyhedron, is
convex—if a polyhedron contains two points x and y, then it contains the entire line segment x y.
A two-dimensional polyhedron (white) defined by 10 linear inequalities.
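Convexity is also easy to check numerically. The following Python/NumPy sketch (with a random polyhedron of my own choosing) collects feasible points and verifies that every convex combination of two of them is again feasible:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random polyhedron {x : A x <= b} in R^3; choosing b > 0 guarantees
# that the origin is feasible, so the polyhedron is nonempty.
A = rng.normal(size=(10, 3))
b = rng.uniform(0.5, 2.0, size=10)

def feasible(x):
    return bool(np.all(A @ x <= b + 1e-12))

# Collect feasible points (the origin plus rejection samples), then check
# that convex combinations of feasible points never leave the polyhedron.
pts = [np.zeros(3)] + [x for x in rng.normal(scale=0.2, size=(500, 3)) if feasible(x)]
for _ in range(100):
    i, j = rng.integers(len(pts), size=2)
    t = rng.uniform()
    assert feasible(t * pts[i] + (1 - t) * pts[j])
```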
By rotating Rd (or choosing a coordinate frame) so that the objective function points
downward, we can express any linear programming problem in the following geometric form:
Find the lowest point in a given polyhedron.
With this geometry in hand, we can easily picture two pathological cases where a given linear
programming problem has no solution. The first possibility is that there are no feasible points; in
this case the problem is called infeasible. For example, the following LP problem is infeasible:
maximize    x − y
subject to  2x + y ≤ 1
             x + y ≥ 2
              x, y ≥ 0

(Indeed, any x, y ≥ 0 satisfy x + y ≤ 2x + y ≤ 1 < 2, contradicting the second constraint.)

An infeasible linear programming problem; arrows indicate the constraints.
The second possibility is that there are feasible points at which the objective function is
arbitrarily large; in this case, we call the problem unbounded. The same polyhedron could be
unbounded for some objective functions but not others, or it could be unbounded for every
objective function.
A two-dimensional polyhedron (white) that is unbounded downward but bounded upward.
26.2  Example 1: Shortest Paths
We can compute the length of the shortest path from s to t in a weighted directed graph by
solving the following very simple linear programming problem.
maximize    d_t
subject to  d_s = 0
            d_v − d_u ≤ ℓ_{u→v}    for every edge u→v
Here, ℓ_{u→v} is the length of the edge u→v. Each variable d_v represents a tentative shortest-path distance from s to v. The constraints mirror the requirement that every edge in the graph must be relaxed. These relaxation constraints imply that in any feasible solution, d_v is at most the shortest-path distance from s to v. Thus, somewhat counterintuitively, we are correctly maximizing the objective function to compute the shortest path! In the optimal solution, the objective function d_t is the actual shortest-path distance from s to t, but for any vertex v that is not on the shortest path from s to t, d_v may be an underestimate of the true distance from s to v. However, we can obtain the true distances from s to every other vertex by modifying the objective function:
maximize    ∑_v d_v
subject to  d_s = 0
            d_v − d_u ≤ ℓ_{u→v}    for every edge u→v
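The claim that the true shortest-path distances form a feasible point is easy to check numerically. Here is a small Python sketch on a four-vertex graph of my own; it computes the distances by Bellman-Ford and then verifies every relaxation constraint:

```python
# Edges of a small weighted directed graph, with source s = 0.
edges = {(0, 1): 3.0, (0, 2): 1.0, (2, 1): 1.0, (1, 3): 2.0, (2, 3): 5.0}
n = 4

# Bellman-Ford: the true shortest-path distances from s.
INF = float("inf")
d = [INF] * n
d[0] = 0.0
for _ in range(n - 1):
    for (u, v), w in edges.items():
        if d[u] + w < d[v]:
            d[v] = d[u] + w

# The true distances are feasible for the LP: d_s = 0 and every edge is relaxed.
assert d[0] == 0.0
assert all(d[v] - d[u] <= w for (u, v), w in edges.items())
print(d)   # [0.0, 2.0, 1.0, 4.0]
```

Since every feasible solution satisfies d_v ≤ dist(s, v), the true distances simultaneously maximize every coordinate, and hence the sum.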
There is another formulation of shortest paths as an LP minimization problem, using an indicator variable x_{u→v} for each edge u→v.
minimize    ∑_{u→v} ℓ_{u→v} · x_{u→v}
subject to  ∑_w x_{s→w} − ∑_u x_{u→s} = 1
            ∑_w x_{t→w} − ∑_u x_{u→t} = −1
            ∑_w x_{v→w} − ∑_u x_{u→v} = 0    for every vertex v ≠ s, t
            x_{u→v} ≥ 0                      for every edge u→v
Intuitively, x_{u→v} = 1 means u→v lies on the shortest path from s to t, and x_{u→v} = 0 means u→v does not lie on this shortest path. The constraints merely state that the path should start at s, end at t, and either pass through or avoid every other vertex v. Any path from s to t—in particular, the shortest path—clearly implies a feasible point for this linear program.
However, there are other feasible solutions, possibly even optimal solutions, with non-integral
values that do not represent paths. Nevertheless, there is always an optimal solution in which
every x e is either 0 or 1 and the edges e with x e = 1 comprise the shortest path. (This fact is by
no means obvious, but a proof is beyond the scope of these notes.) Moreover, in any optimal
solution, even if not every x e is an integer, the objective function gives the shortest path distance!
26.3  Example 2: Maximum Flows and Minimum Cuts
Recall that the input to the maximum (s, t)-flow problem consists of a weighted directed graph G = (V, E), two special vertices s and t, and a function assigning a non-negative capacity c_e to each edge e. Our task is to choose the flow f_e across each edge e, as follows:
maximize    ∑_w f_{s→w} − ∑_u f_{u→s}
subject to  ∑_w f_{v→w} − ∑_u f_{u→v} = 0    for every vertex v ≠ s, t
            f_{u→v} ≤ c_{u→v}                for every edge u→v
            f_{u→v} ≥ 0                      for every edge u→v
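As a sanity check, the following Python sketch verifies that a particular flow on a tiny network of my own satisfies all three families of constraints, and evaluates the objective, the net flow out of s:

```python
# A small flow network: capacities on directed edges, s = 0, t = 3.
cap = {(0, 1): 2.0, (0, 2): 1.0, (1, 3): 1.0, (1, 2): 1.0, (2, 3): 2.0}
s, t = 0, 3

# A candidate flow; we check it against the three families of LP constraints.
f = {(0, 1): 2.0, (0, 2): 1.0, (1, 3): 1.0, (1, 2): 1.0, (2, 3): 2.0}

def net_out(v):
    return (sum(fe for (u, w), fe in f.items() if u == v)
            - sum(fe for (u, w), fe in f.items() if w == v))

# Capacity and non-negativity constraints.
assert all(0 <= f[e] <= cap[e] for e in cap)
# Conservation at every vertex other than s and t.
assert all(net_out(v) == 0 for v in (1, 2))
# The objective: net flow out of the source.
print(net_out(s))   # 3.0
```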
Similarly, the minimum cut problem can be formulated using 'indicator' variables, just as in the shortest path problem. We have a variable S_v for each vertex v, indicating whether v ∈ S or v ∈ T, and a variable X_{u→v} for each edge u→v, indicating whether u ∈ S and v ∈ T, where (S, T) is some (s, t)-cut.³
minimize    ∑_{u→v} c_{u→v} · X_{u→v}
subject to  X_{u→v} + S_v − S_u ≥ 0    for every edge u→v
            X_{u→v} ≥ 0                for every edge u→v
            S_s = 1
            S_t = 0
Like the minimization LP for shortest paths, there can be optimal solutions that assign fractional
values to the variables. Nevertheless, the minimum value for the objective function is the cost
of the minimum cut, and there is an optimal solution for which every variable is either 0 or 1,
representing an actual minimum cut. No, this is not obvious; in particular, my claim is not a
proof!
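The cut LP can be checked the same way. On a small network of my own, the 0/1 assignment induced by a cut (S, T) is feasible for these constraints, and its objective value is exactly the capacity of the cut:

```python
# A small network: capacities on directed edges, s = 0, t = 3.
cap = {(0, 1): 2.0, (0, 2): 1.0, (1, 3): 1.0, (1, 2): 1.0, (2, 3): 2.0}
s, t = 0, 3

# The cut S = {s}, T = {1, 2, 3}, encoded by the LP's indicator variables.
S = {0: 1, 1: 0, 2: 0, 3: 0}
X = {(u, v): 1 if S[u] == 1 and S[v] == 0 else 0 for (u, v) in cap}

# Feasibility for the min-cut LP.
assert S[s] == 1 and S[t] == 0
assert all(X[u, v] + S[v] - S[u] >= 0 for (u, v) in cap)
assert all(X[e] >= 0 for e in cap)

# Objective value = capacity of the cut (the edges 0->1 and 0->2).
print(sum(cap[e] * X[e] for e in cap))   # 3.0
```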
26.4  Linear Programming Duality
Each of these pairs of linear programming problems is related by a transformation called duality.
For any linear programming problem, there is a corresponding dual linear program that can be
obtained by a mechanical translation, essentially by swapping the constraints and the variables.
³These two linear programs are not quite syntactic duals; I've added two redundant variables S_s and S_t to the minimum-cut program to make it easier to read.
The translation is simplest when the LP is in canonical form:

    Primal (Π)                     Dual (q)
    max  c · x                     min  y · b
    s.t. Ax ≤ b          ⇐⇒       s.t. yA ≥ c
         x ≥ 0                          y ≥ 0
We can also write the dual linear program in exactly the same canonical form as the primal, by
swapping the coefficient vector c and the objective vector b, negating both vectors, and replacing
the constraint matrix A with its negative transpose.⁴
    Primal (Π)                     Dual (q)
    max  c · x                     max  −b⊤ · y⊤
    s.t. Ax ≤ b          ⇐⇒       s.t. −A⊤ y⊤ ≤ −c⊤
         x ≥ 0                          y⊤ ≥ 0
Written in this form, it should be immediately clear that duality is an involution: The dual of the
dual linear program q is identical to the primal linear program Π. The choice of which LP to call
the ‘primal’ and which to call the ‘dual’ is totally arbitrary.⁵
The Fundamental Theorem of Linear Programming. A linear program Π has an optimal solution x∗ if and only if the dual linear program q has an optimal solution y∗ such that c · x∗ = y∗Ax∗ = y∗ · b.
The weak form of this theorem is trivial to prove.
Weak Duality Theorem. If x is a feasible solution for a canonical linear program Π and y is a
feasible solution for its dual q, then c · x ≤ yAx ≤ y · b.
Proof: Because x is feasible for Π, we have Ax ≤ b. Because y is non-negative, we can multiply both sides of this inequality by y to obtain yAx ≤ y · b. Symmetrically, y is feasible for q and x is non-negative, so yAx ≥ c · x.    □
It immediately follows that if c · x = y · b, then x and y are optimal solutions to their
respective linear programs. This is in fact a fairly common way to prove that we have the optimal
value for a linear program.
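The chain of inequalities can be tested mechanically. The following Python/NumPy sketch (a random instance of my own construction) builds a canonical LP together with feasible primal and dual points by design, and checks c · x ≤ yAx ≤ y · b:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 6, 3

# Build a random canonical LP together with feasible points by construction:
# pick x >= 0 and set b = Ax + (positive slack), so Ax <= b holds;
# pick y >= 0 and set c = yA - (positive slack), so yA >= c holds.
A = rng.normal(size=(n, d))
x = rng.uniform(size=d)                       # feasible primal point (x >= 0)
b = A @ x + rng.uniform(0.1, 1.0, size=n)
y = rng.uniform(size=n)                       # feasible dual point (y >= 0)
c = y @ A - rng.uniform(0.1, 1.0, size=d)

# The chain of inequalities from the Weak Duality Theorem.
assert c @ x <= y @ (A @ x)
assert y @ (A @ x) <= y @ b
```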
⁴For the notational purists: In these formulations, x and b are column vectors, and y and c are row vectors. This
is a somewhat nonstandard choice. Yes, that means the dot in c · x is redundant. Sue me.
⁵For historical reasons, maximization LPs tend to be called ‘primal’ and minimization LPs tend to be called ‘dual’.
This is a pointless religious tradition, nothing more. Duality is a relationship between LP problems, not a type of LP
problem.
26.5  Duality Example
Before I prove the stronger duality theorem, let me first provide some intuition about where this
duality thing comes from in the first place.⁶ Consider the following linear programming problem:
maximize    4x_1 + x_2 + 3x_3
subject to   x_1 + 4x_2       ≤ 2
            3x_1 −  x_2 + x_3 ≤ 4
             x_1, x_2, x_3 ≥ 0
Let σ∗ denote the optimum objective value for this LP. The feasible solution x = (1, 0, 0) gives us
a lower bound σ∗ ≥ 4. A different feasible solution x = (0, 0, 3) gives us a better lower bound
σ∗ ≥ 9. We could play this game all day, finding different feasible solutions and getting ever
larger lower bounds. How do we know when we’re done? Is there a way to prove an upper
bound on σ∗ ?
In fact, there is. Let’s multiply each of the constraints in our LP by a new non-negative scalar
value yi :
maximize    4x_1 + x_2 + 3x_3
subject to  y_1 ( x_1 + 4x_2       ) ≤ 2y_1
            y_2 (3x_1 − x_2 + x_3) ≤ 4y_2
            x_1, x_2, x_3 ≥ 0
Because each yi is non-negative, we do not reverse any of the inequalities. Any feasible solution
(x 1 , x 2 , x 3 ) must satisfy both of these inequalities, so it must also satisfy their sum:
(y_1 + 3y_2) x_1 + (4y_1 − y_2) x_2 + y_2 x_3 ≤ 2y_1 + 4y_2.
Now suppose that each yi is larger than the ith coefficient of the objective function:
y_1 + 3y_2 ≥ 4,        4y_1 − y_2 ≥ 1,        y_2 ≥ 3.
This assumption lets us derive an upper bound on the objective value of any feasible solution:
4x_1 + x_2 + 3x_3 ≤ (y_1 + 3y_2) x_1 + (4y_1 − y_2) x_2 + y_2 x_3 ≤ 2y_1 + 4y_2.    (∗)
In particular, by plugging in the optimal solution (x_1∗, x_2∗, x_3∗) for the original LP, we obtain the following upper bound on σ∗:

σ∗ = 4x_1∗ + x_2∗ + 3x_3∗ ≤ 2y_1 + 4y_2.
Now it’s natural to ask how tight we can make this upper bound. How small can we make
the expression 2 y1 + 4 y2 without violating any of the inequalities we used to prove the upper
bound? This is just another linear programming problem.
minimize    2y_1 + 4y_2
subject to   y_1 + 3y_2 ≥ 4
            4y_1 −  y_2 ≥ 1
                    y_2 ≥ 3
             y_1, y_2 ≥ 0
⁶This example is taken from Robert Vanderbei’s excellent textbook Linear Programming: Foundations and Extensions
[Springer, 2001], but the idea appears earlier in Jens Clausen’s 1997 paper ‘Teaching Duality in Linear Programming:
The Multiplier Approach’.
In fact, this is precisely the dual of our original linear program! Moreover, inequality (∗) is just
an instantiation of the Weak Duality Theorem.
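We can verify all of this numerically. By hand, the primal optimum is x∗ = (0, 1/2, 9/2) and the dual optimum is y∗ = (1, 3) (these values are my own computation, not given above); both points are feasible, and both achieve objective value 14, so by weak duality both are optimal:

```python
# Primal from Section 26.5:  max 4x1 + x2 + 3x3
#   s.t. x1 + 4x2 <= 2,  3x1 - x2 + x3 <= 4,  x >= 0
# Dual:  min 2y1 + 4y2
#   s.t. y1 + 3y2 >= 4,  4y1 - y2 >= 1,  y2 >= 3,  y >= 0
#
# Candidate optima (computed by hand, not given in the notes):
x = (0.0, 0.5, 4.5)
y = (1.0, 3.0)

# Both points are feasible...
assert x[0] + 4 * x[1] <= 2 and 3 * x[0] - x[1] + x[2] <= 4
assert all(xi >= 0 for xi in x)
assert y[0] + 3 * y[1] >= 4 and 4 * y[0] - y[1] >= 1 and y[1] >= 3
assert all(yi >= 0 for yi in y)

# ...and their objective values coincide, so by weak duality both are optimal.
primal = 4 * x[0] + x[1] + 3 * x[2]
dual = 2 * y[0] + 4 * y[1]
assert primal == dual == 14.0
```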
26.6  Strong Duality
The Fundamental Theorem can be rephrased in the following form:
Strong Duality Theorem. If x ∗ is an optimal solution for a canonical linear program Π, then
there is an optimal solution y ∗ for its dual q, such that c · x ∗ = y ∗ Ax ∗ = y ∗ · b.
Proof (sketch): I’ll prove the theorem only for non-degenerate linear programs, in which (a)
the optimal solution (if one exists) is a unique vertex of the feasible region, and (b) at most d
constraint hyperplanes pass through any point. These non-degeneracy assumptions are relatively
easy to enforce in practice and can be removed from the proof at the expense of some technical
detail. I will also prove the theorem only for the case n ≥ d; the argument for under-constrained
LPs is similar (if not simpler).
To develop some intuition, let’s first consider the very special case where x ∗ = (0, 0, . . . , 0).
Let e_i denote the ith standard basis vector, whose ith coordinate is 1 and all other coordinates are 0. Because x_i∗ = 0 for all i, our non-degeneracy assumption implies the strict inequality a_i · x∗ < b_i for all i. Thus, any sufficiently small ball around the origin does not intersect any constraint hyperplane a_i · x = b_i. So for each i, and for any sufficiently small δ > 0, the vector δe_i is feasible. Because x∗ is the unique optimum, we must have δc_i = c · (δe_i) < c · x∗ = 0. We conclude that c_i < 0 for all i.
Now let y = (0, 0, . . . , 0) as well. We immediately observe that yA ≥ c and y ≥ 0; in other
words, y is a feasible solution for the dual linear program q. But y · b = 0 = c · x ∗ , so the weak
duality theorem implies that y is an optimal solution to q, and the proof is complete for this very
special case.
Now let us consider the more general case. Let x ∗ be the optimal solution for the linear
program Π; our non-degeneracy assumption implies that this solution is unique, and that exactly
d of the n linear constraints are satisfied with equality. Without loss of generality (by permuting
the constraints and possibly changing coordinates), we can assume that these are the first d
constraints. Thus, we have
a_i · x∗ = b_i    for all i ≤ d,
a_i · x∗ < b_i    for all i ≥ d + 1,

where a_i denotes the ith row of A. Let A• denote the d × d matrix containing the first d rows of A. Our non-degeneracy assumption implies that A• has full rank, and thus has a well-defined inverse V = A•⁻¹.
Now define a vector y ∈ R^n by setting

y_j := c · v^j    for all j ≤ d,
y_j := 0          for all j ≥ d + 1,

where v^j denotes the jth column of V = A•⁻¹. Note that a_i · v^j = 0 if i ≠ j, and a_i · v^j = 1 if i = j.
To simplify notation, let y• = (y_1, y_2, . . . , y_d) and let b• = (b_1, b_2, . . . , b_d) = A• x∗. Because y_i = 0 for all i ≥ d + 1, we immediately have

y · b = y• · b• = cV b• = cA•⁻¹ b• = c · x∗
and

yA = y• A• = cVA• = cA•⁻¹ A• = c.
The point x∗ lies on exactly d constraint hyperplanes; moreover, any sufficiently small ball around x∗ intersects only those d constraint hyperplanes. Consider the point x̃ = x∗ − εv^j, for some index 1 ≤ j ≤ d and some sufficiently small ε > 0. We have a_i · x̃ = a_i · x∗ − ε(a_i · v^j) = b_i for all i ≤ d with i ≠ j, and a_j · x̃ = a_j · x∗ − ε(a_j · v^j) = b_j − ε < b_j. Thus, x̃ is a feasible point for Π. Because x∗ is the unique optimum for Π, we must have c · x̃ = c · x∗ − ε(c · v^j) < c · x∗. We conclude that y_j = c · v^j > 0 for all j.
We have shown that yA ≥ c and y ≥ 0, so y is a feasible solution for the dual linear program
q. We have also shown that y · b = c · x ∗ , so by the Weak Duality Theorem, y is also an optimal
solution for q, and the proof is complete!
□
We can also give a useful geometric interpretation to the vector y• ∈ R^d. Each linear equation a_i · x = b_i defines a hyperplane in R^d with normal vector a_i. The normal vectors a_1, . . . , a_d are linearly independent (by non-degeneracy) and therefore describe a coordinate frame for the vector space R^d. The definition of y• implies that c = y• A• = ∑_{i=1}^d y_i a_i. In other words, y• lists the coefficients of the objective vector c in the coordinate frame a_1, . . . , a_d.
26.7  Complementary Slackness
Complementary Slackness Theorem. Let x∗ be an optimal solution to a canonical linear program Π, and let y∗ be an optimal solution to its dual q. Then for every index i, we have y_i∗ > 0 if and only if a_i · x∗ = b_i. Symmetrically, for every index j, we have x_j∗ > 0 if and only if y∗ · a^j = c_j.
To be written
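Although the proof is omitted here, the theorem is easy to check on the example from Section 26.5, using the hand-computed optima x∗ = (0, 1/2, 9/2) and y∗ = (1, 3) (these values are my own computation):

```python
# The primal/dual pair from Section 26.5, with hand-computed optima
# x* = (0, 1/2, 9/2) and y* = (1, 3).
x = (0.0, 0.5, 4.5)
y = (1.0, 3.0)
A = [(1.0, 4.0, 0.0), (3.0, -1.0, 1.0)]
b = (2.0, 4.0)
c = (4.0, 1.0, 3.0)

# y_i > 0  iff  the ith primal constraint is tight.
for i in range(2):
    tight = sum(A[i][j] * x[j] for j in range(3)) == b[i]
    assert (y[i] > 0) == tight

# x_j > 0  iff  the jth dual constraint is tight.
for j in range(3):
    tight = sum(y[i] * A[i][j] for i in range(2)) == c[j]
    assert (x[j] > 0) == tight
```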
Exercises
1. (a) Describe how to transform any linear program written in general form into an equivalent linear program written in slack form.

       maximize    ∑_{j=1}^d c_j x_j
       subject to  ∑_{j=1}^d a_ij x_j ≤ b_i    for each i = 1 .. p
                   ∑_{j=1}^d a_ij x_j = b_i    for each i = p+1 .. p+q
                   ∑_{j=1}^d a_ij x_j ≥ b_i    for each i = p+q+1 .. n

       =⇒    max  c · x
             s.t. Ax = b
                  x ≥ 0
(b) Describe precisely how to dualize a linear program written in slack form.
(c) Describe precisely how to dualize a linear program written in general form:
In all cases, keep the number of variables in the resulting linear program as small as
possible.
2. A matrix A = (a_ij) is skew-symmetric if and only if a_ji = −a_ij for all indices i ≠ j; in particular, every skew-symmetric matrix is square. A canonical linear program max{c · x |
Ax ≤ b; x ≥ 0} is self-dual if the matrix A is skew-symmetric and the objective vector c is
equal to the constraint vector b.
(a) Prove that any self-dual linear program Π is syntactically equivalent to its dual
program q.
(b) Show that any linear program Π with d variables and n constraints can be transformed
into a self-dual linear program with n + d variables and n + d constraints. The optimal
solution to the self-dual program should include both the optimal solution for Π (in
d of the variables) and the optimal solution for the dual program q (in the other n
variables).
3. (a) Give a linear-programming formulation of the maximum-cardinality bipartite matching
problem. The input is a bipartite graph G = (U ∪ V ; E), where E ⊆ U × V ; the output
is the largest matching in G. Your linear program should have one variable for each
edge.
(b) Now dualize the linear program from part (a). What do the dual variables represent?
What does the objective function represent? What problem is this!?
4. Give a linear-programming formulation of the minimum-cost feasible circulation problem.
Here you are given a flow network whose edges have both capacities and costs, and your
goal is to find a feasible circulation (flow with value 0) whose cost is as small as possible.
5. An integer program is a linear program with the additional constraint that the variables
must take only integer values.
(a) Prove that deciding whether an integer program has a feasible solution is NP-complete.
(b) Prove that finding the optimal feasible solution to an integer program is NP-hard.
[Hint: Almost any NP-hard decision problem can be formulated as an integer program. Pick your favorite.]
⋆6.
Helly’s theorem states that for any collection of convex bodies in Rd , if every d + 1 of them
intersect, then there is a point lying in the intersection of all of them. Prove Helly’s theorem
for the special case where the convex bodies are halfspaces. Equivalently, show that if a
system of linear inequalities Ax ≤ b does not have a solution, then we can select d + 1 of
the inequalities such that the resulting subsystem also does not have a solution. [Hint:
Construct a dual LP from the system by choosing a 0 cost vector.]
7. Given points (x 1 , y1 ), (x 2 , y2 ), . . . , (x n , yn ) in the plane, the linear regression problem asks
for real numbers a and b such that the line y = a x + b fits the points as closely as possible,
according to some criterion. The most common fit criterion is minimizing the L2 error,
defined as follows:⁷
ε_2(a, b) = ∑_{i=1}^n (y_i − a·x_i − b)².
But there are several other fit criteria, some of which can be optimized via linear programming.
(a) The L1 error (or total absolute deviation) of the line y = a x + b is defined as follows:
ε_1(a, b) = ∑_{i=1}^n |y_i − a·x_i − b|.
Describe a linear program whose solution (a, b) describes the line with minimum L1
error.
(b) The L∞ error (or maximum absolute deviation) of the line y = a x + b is defined as
follows:
ε_∞(a, b) = max_{i=1..n} |y_i − a·x_i − b|.
Describe a linear program whose solution (a, b) describes the line with minimum L∞
error.
⁷This measure is also known as sum of squared residuals, and the algorithm to compute the best fit is normally
called (ordinary/linear) least squares fitting.
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision.
Algorithms
Lecture 27: Linear Programming Algorithms [Fa’13]
Simplicibus itaque verbis gaudet Mathematica Veritas, cum etiam per
se simplex sit Veritatis oratio. [And thus Mathematical Truth prefers
simple words, because the language of Truth is itself simple.]
— Tycho Brahe (quoting Seneca (quoting Euripides))
Epistolarum astronomicarum liber primus (1596)
When a jar is broken, the space that was inside
Merges into the space outside.
In the same way, my mind has merged in God;
To me, there appears no duality.
— Sankara, Viveka-Chudamani (c. 700), translator unknown
⋆27  Linear Programming Algorithms
In this lecture, we’ll see a few algorithms for actually solving linear programming problems. The
most famous of these, the simplex method, was proposed by George Dantzig in 1947. Although
most variants of the simplex algorithm perform well in practice, no deterministic simplex variant
is known to run in sub-exponential time in the worst case.¹ However, if the dimension of the
problem is considered a constant, there are several linear programming algorithms that run in
linear time. I’ll describe a particularly simple randomized algorithm due to Raimund Seidel.
My approach to describing these algorithms will rely much more heavily on geometric
intuition than the usual linear-algebraic formalism. This works better for me, but your mileage
may vary. For a more traditional description of the simplex algorithm, see Robert Vanderbei’s
excellent textbook Linear Programming: Foundations and Extensions [Springer, 2001], which can be freely downloaded from the author's web site.
27.1  Bases, Feasibility, and Local Optimality
Consider the canonical linear program max{c · x | Ax ≤ b, x ≥ 0}, where A is an n × d constraint
matrix, b is an n-dimensional coefficient vector, and c is a d-dimensional objective vector. We will
interpret this linear program geometrically as looking for the lowest point in a convex polyhedron
in Rd , described as the intersection of n + d halfspaces. As in the last lecture, we will consider
only non-degenerate linear programs: Every subset of d constraint hyperplanes intersects in a
single point; at most d constraint hyperplanes pass through any point; and the objective vector is linearly independent from any d − 1 constraint vectors.
A basis is a subset of d constraints, which by our non-degeneracy assumption must be linearly independent. The location of a basis is the unique point x that satisfies all d constraints with equality; geometrically, x is the unique intersection point of the d hyperplanes. The value of a basis is c · x, where x is the location of the basis. There are precisely (n+d choose d) bases. Geometrically, the set of constraint hyperplanes defines a decomposition of R^d into convex polyhedra; this cell decomposition is called the arrangement of the hyperplanes. Every subset of d hyperplanes (that is, every basis) defines a vertex of this arrangement (the location of the basis). I will use the words 'vertex' and 'basis' interchangeably.
¹However, there are randomized variants of the simplex algorithm that run in subexponential expected time, most
notably the RandomFacet algorithm analyzed by Gil Kalai in 1992, and independently by Jiří Matoušek, Micha
Sharir, and Emo Welzl in 1996. No randomized variant is known to run in polynomial time. In particular, in 2010,
Oliver Friedmann, Thomas Dueholm Hansen, and Uri Zwick proved that the worst-case expected running time of
RandomFacet is superpolynomial.
A basis is feasible if its location x satisfies all the linear constraints, or geometrically, if
the point x is a vertex of the polyhedron. If there are no feasible bases, the linear program is
infeasible.
A basis is locally optimal if its location x is the optimal solution to the linear program
with the same objective function and only the constraints in the basis. Geometrically, a basis is
locally optimal if its location x is the lowest point in the intersection of those d halfspaces. A
careful reading of the proof of the Strong Duality Theorem reveals that local optimality is the
dual equivalent of feasibility; a basis is locally optimal for a linear program Π if and only if the
same basis is feasible for the dual linear program q. For this reason, locally optimal bases are
sometimes also called dual feasible. If there are no locally optimal bases, the linear program is
unbounded.²
Two bases are neighbors if they have d − 1 constraints in common. Equivalently, in geometric
terms, two vertices are neighbors if they lie on a line determined by some d − 1 constraint
hyperplanes. Every basis is a neighbor of exactly dn other bases; to change a basis into one of its
neighbors, there are d choices for which constraint to remove and n choices for which constraint
to add. The graph of vertices and edges on the boundary of the feasible polyhedron is a subgraph
of the basis graph.
The Weak Duality Theorem implies that the value of every feasible basis is less than or equal
to the value of every locally optimal basis; equivalently, every feasible vertex is higher than every
locally optimal vertex. The Strong Duality Theorem implies that (under our non-degeneracy
assumption), if a linear program has an optimal solution, it is the unique vertex that is both
feasible and locally optimal. Moreover, the optimal solution is both the lowest feasible vertex and
the highest locally optimal vertex.
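For small instances, these definitions translate directly into a brute-force LP solver: enumerate all (n+d choose d) bases, keep the feasible vertices, and return the best one. The Python/NumPy sketch below (exponential, for intuition only, and ignoring degeneracies) recovers the optimum value 14 for the duality example of the previous lecture:

```python
import itertools
import numpy as np

# Brute-force LP solver for max{c.x : Ax <= b, x >= 0}: enumerate all bases
# (d-subsets of the n + d constraint hyperplanes), keep the feasible
# vertices, and return the best one.
def brute_force_lp(A, b, c):
    n, d = A.shape
    H = np.vstack([A, -np.eye(d)])            # all n + d constraints H x <= h
    h = np.concatenate([b, np.zeros(d)])
    best, best_val = None, -np.inf
    for basis in itertools.combinations(range(n + d), d):
        rows = H[list(basis)]
        try:
            x = np.linalg.solve(rows, h[list(basis)])   # location of the basis
        except np.linalg.LinAlgError:
            continue                                    # degenerate subset
        if np.all(H @ x <= h + 1e-9) and c @ x > best_val:
            best, best_val = x, c @ x                   # feasible and better
    return best, best_val

# The example LP from Section 26.5; its optimum value is 14.
A = np.array([[1.0, 4.0, 0.0], [3.0, -1.0, 1.0]])
b = np.array([2.0, 4.0])
c = np.array([4.0, 1.0, 3.0])
x, val = brute_force_lp(A, b, c)
print(round(val, 6))   # 14.0
```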
27.2  The Primal Simplex Algorithm: Falling Marbles
From a geometric standpoint, Dantzig’s simplex algorithm is very simple. The input is a set H of
halfspaces; we want the lowest vertex in the intersection of these halfspaces.
Simplex1(H):
    if ∩H = ∅
        return Infeasible
    x ← any feasible vertex
    while x is not locally optimal        〈〈pivot downward, maintaining feasibility〉〉
        if every feasible neighbor of x is higher than x
            return Unbounded
        else
            x ← any feasible neighbor of x that is lower than x
    return x
Let’s ignore the first three lines for the moment. The algorithm maintains a feasible vertex
x. At each so-called pivot operation, the algorithm moves to a lower vertex, so the algorithm
n+d
never visits the same vertex more than once. Thus, the algorithm must halt after at most d
pivots. When the algorithm halts, either the feasible vertex x is locally optimal, and therefore the
optimum vertex, or the feasible vertex x is not locally optimal but has no lower feasible neighbor,
in which case the feasible region must be unbounded.
²For non-degenerate linear programs, the feasible region is unbounded in the objective direction if and only if
no basis is locally optimal. However, there are degenerate linear programs with no locally optimal basis that are
infeasible.
Notice that we have not specified which neighbor to choose at each pivot. Many different
pivoting rules have been proposed, but for almost every known pivot rule, there is an input
polyhedron that requires an exponential number of pivots under that rule. No pivoting rule is
known that guarantees a polynomial number of pivots in the worst case, or even in expectation.³
27.3  The Dual Simplex Algorithm: Rising Bubbles
We can also geometrically interpret the execution of the simplex algorithm on the dual linear
program q. Again, the input is a set H of halfspaces, and we want the lowest vertex in the
intersection of these halfspaces. By the Strong Duality Theorem, this is the same as the highest
locally-optimal vertex in the hyperplane arrangement.
Simplex2(H):
    if there is no locally optimal vertex
        return Unbounded
    x ← any locally optimal vertex
    while x is not feasible        〈〈pivot upward, maintaining local optimality〉〉
        if every locally optimal neighbor of x is lower than x
            return Infeasible
        else
            x ← any locally-optimal neighbor of x that is higher than x
    return x
Let’s ignore the first three lines for the moment. The algorithm maintains a locally optimal
vertex x. At each pivot operation, it moves to a higher vertex, so the algorithm
never visits the
n+d
same vertex more than once. Thus, the algorithm must halt after at most d pivots. When the
algorithm halts, either the locally optimal vertex x is feasible, and therefore the optimum vertex,
or the locally optimal vertex x is not feasible but has no higher locally optimal neighbor, in which
case the problem must be infeasible.
The primal simplex (falling marble) algorithm in action.
The dual simplex (rising bubble) algorithm in action.
From the standpoint of linear algebra, there is absolutely no difference between running
Simplex1 on any linear program Π and running Simplex2 on the dual linear program q. The
³In 1957, Hirsch conjectured that for any linear programming instance with d variables and n + d constraints, starting at any feasible basis, there is a sequence of at most n pivots that leads to the optimal basis. This long-standing conjecture was finally disproved in 2010 by Francisco Santos, who described a counterexample with 43 variables, 86 facets, and diameter 44.
actual code is identical. The only difference between the two algorithms is how we interpret the
linear algebra geometrically.
27.4  Computing the Initial Basis
To complete our description of the simplex algorithm, we need to describe how to find the initial
vertex x. Our algorithm relies on the following simple observations.
First, the feasibility of a vertex does not depend at all on the choice of objective vector; a
vertex is either feasible for every objective function or for none. No matter how we rotate the
polyhedron, every feasible vertex stays feasible. Conversely (or by duality, equivalently), the local
optimality of a vertex does not depend on the exact location of the d hyperplanes, but only on
their normal directions and the objective function. No matter how we translate the hyperplanes,
every locally optimal vertex stays locally optimal. In terms of the original matrix formulation,
feasibility depends on A and b but not c, and local optimality depends on A and c but not b.
The second important observation is that every basis is locally optimal for some objective
function. Specifically, it suffices to choose any vector that has a positive inner product with each
of the normal vectors of the d chosen hyperplanes. Equivalently, we can make any basis feasible
by translating the hyperplanes appropriately. Specifically, it suffices to translate the chosen d
hyperplanes so that they pass through the origin, and then translate all the other halfspaces so
that they strictly contain the origin.
Our strategy for finding our initial feasible vertex is to choose any vertex, choose a new
objective function that makes that vertex locally optimal, and then find the optimal vertex for
that objective function by running the (dual) simplex algorithm. This vertex must be feasible,
even after we restore the original objective function!
(a) Choose any basis. (b) Rotate objective to make it locally optimal, and pivot ’upward’ to find a feasible basis.
(c) Pivot downward to the optimum basis for the original objective.
Equivalently, to find an initial locally optimal vertex, we choose any vertex, translate the
hyperplanes so that that vertex becomes feasible, and then find the optimal vertex for those
translated constraints using the (primal) simplex algorithm. This vertex must be locally optimal,
even after we restore the hyperplanes to their original locations!
Here are more complete descriptions of the simplex algorithm with this initialization rule, in
both primal and dual forms. As usual, the input is a set H of halfspaces, and the algorithms either
return the lowest vertex in the intersection of these halfspaces or report that no such vertex exists.
(a) Choose any basis. (b) Translate constraints to make it feasible, and pivot downward to find a locally optimal basis.
(c) Pivot upward to the optimum basis for the original constraints.
Simplex1(H):
    x ← any vertex
    H̃ ← any rotation of H that makes x locally optimal
    while x is not feasible
        if every locally optimal neighbor of x is lower (wrt H̃) than x
            return Infeasible
        else
            x ← any locally optimal neighbor of x that is higher (wrt H̃) than x
    while x is not locally optimal
        if every feasible neighbor of x is higher than x
            return Unbounded
        else
            x ← any feasible neighbor of x that is lower than x
    return x
Simplex2(H):
    x ← any vertex
    H̃ ← any translation of H that makes x feasible
    while x is not locally optimal
        if every feasible neighbor of x is higher (wrt H̃) than x
            return Unbounded
        else
            x ← any feasible neighbor of x that is lower (wrt H̃) than x
    while x is not feasible
        if every locally optimal neighbor of x is lower than x
            return Infeasible
        else
            x ← any locally optimal neighbor of x that is higher than x
    return x
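Both variants return the lowest vertex in the intersection of the given halfspaces. For tiny two-dimensional instances, that answer can be checked against brute force: intersect every pair of constraint boundaries and keep the lowest feasible intersection point. The following Python sketch (not from the notes; the triangle instance is made up for illustration) does exactly that in O(n³) time.

```python
from itertools import combinations

def lowest_vertex_bruteforce(halfplanes, eps=1e-9):
    """Return the lowest point in the intersection of halfplanes
    a*x + b*y <= c, by checking every pairwise boundary intersection.
    Returns None if no feasible vertex exists."""
    best = None
    for (a1, b1, c1), (a2, b2, c2) in combinations(halfplanes, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < eps:
            continue                      # parallel boundary lines
        # Cramer's rule for the intersection of the two boundary lines
        x = (c1 * b2 - c2 * b1) / det
        y = (a1 * c2 - a2 * c1) / det
        if all(a * x + b * y <= c + eps for (a, b, c) in halfplanes):
            if best is None or y < best[1]:
                best = (x, y)
    return best

# Triangle  x - y <= 1,  -x - y <= 1,  y <= 1:  lowest vertex is (0, -1).
v = lowest_vertex_bruteforce([(1, -1, 1), (-1, -1, 1), (0, 1, 1)])
```

Cubic time is hopeless in general, but it makes a useful ground truth when testing a simplex implementation on small examples.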
27.5 Linear Expected Time for Fixed Dimensions
In most geometric applications of linear programming, the number of variables is a small constant,
but the number of constraints may still be very large.
The input to the following algorithm is a set H of n halfspaces and a set B of b hyperplanes.
(B stands for basis.) The algorithm returns the lowest point in the intersection of the halfspaces
in H and the hyperplanes B. At the top level of recursion, B is empty. I will implicitly assume that
the linear program is both feasible and bounded. (If necessary, we can guarantee boundedness
by adding a single halfspace to H, and we can guarantee feasibility by adding a dimension.) A
point x violates a constraint h if it is not contained in the corresponding halfspace.
SeidelLP(H, B):
    if |B| = d
        return ⋂B
    if |H ∪ B| = d
        return ⋂(H ∪ B)
    h ← random element of H
    x ← SeidelLP(H \ h, B)    (∗)
    if x violates h
        return SeidelLP(H \ h, B ∪ ∂h)
    else
        return x
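For d = 2, the same randomized incremental idea can be written iteratively: maintain the optimum of the constraints inserted so far, and whenever the next halfplane is violated, re-solve a one-dimensional LP along that halfplane's boundary line. The Python sketch below is an illustration under simplifying assumptions, not the notes' exact recursive algorithm: it minimizes (y, x) lexicographically to avoid ties, adds a bounding box [-M, M]² to force boundedness, and does not handle degenerate inputs.

```python
import random

def lowest_vertex(halfplanes, M=1e6, seed=1):
    """Randomized incremental 2-D LP (Seidel-style sketch).
    Minimizes (y, x) lexicographically subject to a*x + b*y <= c
    for each (a, b, c) in halfplanes, inside the box [-M, M]^2.
    Assumes the instance is feasible and non-degenerate."""
    rng = random.Random(seed)
    H = list(halfplanes)
    rng.shuffle(H)
    box = [(1, 0, M), (-1, 0, M), (0, 1, M), (0, -1, M)]
    x, y = -M, -M                 # lexicographic optimum of the box alone
    done = []
    for (a, b, c) in H:
        if a * x + b * y > c + 1e-9:       # h is violated, so the new
            n2 = a * a + b * b             # optimum lies on a*x + b*y = c
            x0, y0 = a * c / n2, b * c / n2    # a point on that line
            dx, dy = -b, a                     # direction of that line
            tlo, thi = -float('inf'), float('inf')
            for (p, q, r) in box + done:   # restrict t to an interval
                coef = p * dx + q * dy
                rhs = r - (p * x0 + q * y0)
                if coef > 1e-12:
                    thi = min(thi, rhs / coef)
                elif coef < -1e-12:
                    tlo = max(tlo, rhs / coef)
                elif rhs < -1e-9:
                    raise ValueError("infeasible")
            if tlo > thi + 1e-9:
                raise ValueError("infeasible")
            # minimize y (then x, to break ties) along the line
            if dy > 1e-12 or (abs(dy) <= 1e-12 and dx > 0):
                t = tlo
            else:
                t = thi
            x, y = x0 + t * dx, y0 + t * dy
        done.append((a, b, c))
    return x, y
```

Since each one-dimensional solve is only triggered when the new constraint changes the optimum, this runs in O(n) expected time for n halfplanes, matching the analysis below for d = 2.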
The point x recursively computed in line (∗) is the optimal solution if and only if the random halfspace h is not one of the d − b halfspaces of H that, together with the hyperplanes in B, define the optimal solution. In other words, the probability of making the second recursive call SeidelLP(H \ h, B ∪ ∂h) is exactly (d − b)/n, where b = |B|. Thus, we have the following recurrence for the expected number of recursive calls for this algorithm:
    T(n, b) = 1                                             if b = d or n + b = d
    T(n, b) = T(n − 1, b) + ((d − b)/n) · T(n − 1, b + 1)   otherwise
The recurrence is somewhat simpler if we write δ = d − b:
    T(n, δ) = 1                                       if δ = 0 or n = δ
    T(n, δ) = T(n − 1, δ) + (δ/n) · T(n − 1, δ − 1)   otherwise
It’s easy to prove by induction that T(n, δ) = O(δ! · n):

    T(n, δ) = T(n − 1, δ) + (δ/n) · T(n − 1, δ − 1)
            ≤ δ! (n − 1) + (δ/n) · (δ − 1)! (n − 1)    [induction hypothesis]
            = δ! (n − 1) + δ! (n − 1)/n
            ≤ δ! n
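The bound is also easy to check numerically. This short Python sketch (not part of the notes) evaluates the recurrence directly and verifies T(n, δ) ≤ δ!·n for small parameter values.

```python
from functools import lru_cache
from math import factorial

@lru_cache(maxsize=None)
def T(n, delta):
    """Expected number of recursive calls of SeidelLP, per the recurrence."""
    if delta == 0 or n == delta:
        return 1.0
    return T(n - 1, delta) + (delta / n) * T(n - 1, delta - 1)

# verify the delta! * n bound for small parameter values
assert all(T(n, d) <= factorial(d) * n
           for d in range(5) for n in range(max(d, 1), 40))
```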
At the top level of recursion, we perform one violation test in O(d) time. In each of the base cases, we spend O(d³) time computing the intersection point of d hyperplanes, and in the first base case, we spend O(dn) additional time testing for violations. More careful analysis implies that the algorithm runs in O(d! · n) expected time.
Exercises
1. Fix a non-degenerate linear program in canonical form with d variables and n+d constraints.
(a) Prove that every feasible basis has exactly d feasible neighbors.
(b) Prove that every locally optimal basis has exactly n locally optimal neighbors.
2. Suppose you have a subroutine that can solve linear programs in polynomial time, but
only if they are both feasible and bounded. Describe an algorithm that solves arbitrary
linear programs in polynomial time. Your algorithm should return an optimal solution if
one exists; if no optimum exists, your algorithm should report that the input instance is
Unbounded or Infeasible, whichever is appropriate. [Hint: Add one variable and one
constraint.]
3. (a) Give an example of a non-empty polyhedron Ax ≤ b that is unbounded for every
objective vector c.
(b) Give an example of an infeasible linear program whose dual is also infeasible.
In both cases, your linear program will be degenerate.
4. Describe and analyze an algorithm that solves the following problem in O(n) time: Given
n red points and n blue points in the plane, either find a line that separates every red point
from every blue point, or prove that no such line exists.
5. The single-source shortest path problem can be formulated as a linear programming
problem, with one variable d_v for each vertex v ≠ s in the input graph, as follows:
    maximize    Σ_v d_v
    subject to  d_v ≤ ℓ_{s→v}             for every edge s→v
                d_v − d_u ≤ ℓ_{u→v}       for every edge u→v with u ≠ s
                d_v ≥ 0                   for every vertex v ≠ s
This problem asks you to describe the behavior of the simplex algorithm on this linear
program in terms of distances. Assume that the edge weights ℓ_{u→v} are all non-negative
and that there is a unique shortest path between any two vertices in the graph.
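As a warm-up for the exercise, one can verify that the true shortest-path distances are feasible for this LP. The Python sketch below (the graph is a made-up example) computes distances with the standard heapq-based Dijkstra implementation and checks every constraint.

```python
import heapq

def dijkstra(graph, s):
    """graph: {u: {v: length}} with non-negative lengths;
    returns shortest-path distances from s."""
    dist = {s: 0}
    pq = [(0, s)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float('inf')):
            continue                      # stale queue entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float('inf')):
                dist[v] = d + w
                heapq.heappush(pq, (d + w, v))
    return dist

graph = {'s': {'a': 1, 'b': 4}, 'a': {'b': 2, 'c': 6}, 'b': {'c': 3}, 'c': {}}
dist = dijkstra(graph, 's')

# every LP constraint d_v - d_u <= l(u->v) is satisfied (with d_s = 0),
# and the constraints along shortest-path tree edges hold with equality
for u in graph:
    for v, w in graph[u].items():
        assert dist[v] - dist[u] <= w
```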
(a) What is a basis for this linear program? What is a feasible basis? What is a locally
optimal basis?
(b) Show that in the optimal basis, every variable d_v is equal to the shortest-path distance
from s to v.
(c) Describe the primal simplex algorithm for the shortest-path linear program directly in
terms of vertex distances. In particular, what does it mean to pivot from a feasible
basis to a neighboring feasible basis, and how can we execute such a pivot quickly?
(d) Describe the dual simplex algorithm for the shortest-path linear program directly in
terms of vertex distances. In particular, what does it mean to pivot from a locally
optimal basis to a neighboring locally optimal basis, and how can we execute such a
pivot quickly?
(e) Is Dijkstra’s algorithm an instance of the simplex method? Justify your answer.
(f) Is Shimbel’s algorithm an instance of the simplex method? Justify your answer.
6. The maximum (s, t)-flow problem can be formulated as a linear programming problem,
with one variable f_{u→v} for each edge u → v in the input graph:
    maximize    Σ_w f_{s→w} − Σ_u f_{u→s}
    subject to  Σ_w f_{v→w} − Σ_u f_{u→v} = 0    for every vertex v ≠ s, t
                f_{u→v} ≤ c_{u→v}                for every edge u→v
                f_{u→v} ≥ 0                      for every edge u→v
This problem asks you to describe the behavior of the simplex algorithm on this linear
program in terms of flows.
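To make the three constraint families concrete, the following Python sketch checks a hand-built flow on a made-up five-edge network: capacity and non-negativity constraints, conservation at internal vertices, and the objective (net flow out of s).

```python
# hypothetical network: capacities c[u->v] and a candidate flow f[u->v]
cap  = {('s','a'): 3, ('s','b'): 2, ('a','b'): 1, ('a','t'): 2, ('b','t'): 3}
flow = {('s','a'): 3, ('s','b'): 2, ('a','b'): 1, ('a','t'): 2, ('b','t'): 3}

# capacity and non-negativity constraints: 0 <= f_{u->v} <= c_{u->v}
assert all(0 <= flow[e] <= cap[e] for e in cap)

# conservation constraints at every vertex other than s and t
for v in ('a', 'b'):
    out_v = sum(f for (u, w), f in flow.items() if u == v)
    in_v  = sum(f for (u, w), f in flow.items() if w == v)
    assert out_v == in_v

# objective: net flow out of the source s
value = (sum(f for (u, w), f in flow.items() if u == 's')
         - sum(f for (u, w), f in flow.items() if w == 's'))
```

Here value equals 5, which matches the capacity of the cut ({s}, {a, b, t}), so this particular flow happens to be maximum.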
(a) What is a basis for this linear program? What is a feasible basis? What is a locally
optimal basis?
(b) Show that the optimal basis represents a maximum flow.
(c) Describe the primal simplex algorithm for the flow linear program directly in terms of
flows. In particular, what does it mean to pivot from a feasible basis to a neighboring
feasible basis, and how can we execute such a pivot quickly?
(d) Describe the dual simplex algorithm for the flow linear program directly in terms
of flows. In particular, what does it mean to pivot from a locally optimal basis to a
neighboring locally optimal basis, and how can we execute such a pivot quickly?
(e) Is the Ford-Fulkerson augmenting path algorithm an instance of the simplex method?
7. (a) Formulate the minimum spanning tree problem as an instance of linear programming.
Try to minimize the number of variables and constraints.
(b) In your MST linear program, what is a basis? What is a feasible basis? What is a
locally optimal basis?
(c) Describe the primal simplex algorithm for your MST linear program directly in terms
of the input graph. In particular, what does it mean to pivot from a feasible basis to a
neighboring feasible basis, and how can we execute such a pivot quickly?
(d) Describe the dual simplex algorithm for your MST linear program directly in terms of
the input graph. In particular, what does it mean to pivot from a locally optimal basis
to a neighboring locally optimal basis, and how can we execute such a pivot quickly?
(e) Which of the classical MST algorithms (Borůvka, Jarník, Kruskal, reverse greedy), if
Free distribution is strongly encouraged; commercial distribution is expressly forbidden.
See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision.
```