Optimization Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] A process cannot be understood by stopping it. Understanding must move with the flow of the process, must join it and flow with it. — The First Law of Mentat, in Frank Herbert’s Dune (1965) There’s a difference between knowing the path and walking the path. — Morpheus [Laurence Fishburne], The Matrix (1999) 23 Maximum Flows and Minimum Cuts In the mid-1950s, Air Force researcher Theodore E. Harris and retired army general Frank S. Ross published a classified report studying the rail network that linked the Soviet Union to its satellite countries in Eastern Europe. The network was modeled as a graph with 44 vertices, representing geographic regions, and 105 edges, representing links between those regions in the rail network. Each edge was given a weight, representing the rate at which material could be shipped from one region to the next. Essentially by trial and error, they determined both the maximum amount of stuff that could be moved from Russia into Europe, as well as the cheapest way to disrupt the network by removing links (or in less abstract terms, blowing up train tracks), which they called ‘the bottleneck’. Their results, including the drawing of the network below, were only declassified in 1999.¹ Harris and Ross’s map of the Warsaw Pact rail network Figure 2 maximum flow and minimum cut problems. This one of the first recorded applications of the From Harris and Ross [1955]: Schematic diagram of the railway network of the Western SoFor both problems, the a directed graph = (V, E), with special viet Union andinput EasternisEuropean countries, with aGmaximum flow along of value 163,000 tons from vertices s and t Russia to Eastern Europe, a cut of capacitylectures, 163,000 tonsIindicated as ‘The called the source and target. As inandthe previous will use u vbottleneck’. to denote the directed edge from vertex u to vertex v. Intuitively, the maximum flow problem asks for the largest max-flow min-cut theorem ¹Both the map and the story wereThe taken from Alexander Schrijver’s fascinating survey ‘On the history of combinatorial optimization (till 1960)’. In the RAND Report of 19 November 1954, Ford and Fulkerson [1954] gave (next to defining the maximum flow problem and suggesting the simplex method for it) the max-flow mincut theorem for undirected graphs,© saying the maximum flow value is equal to the Copyrightthat 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). minimum capacity of a cut separating source and terminal. Their proof is not constructive, Free distribution is strongly encouraged; commercial distribution is expressly forbidden. but for planar graphs, with source and sink on the outer for boundary, they give a polynomialSee http://www.cs.uiuc.edu/~jeffe/teaching/algorithms the most recent revision. time, constructive method. In a report of 26 May 1955, Robacker [1955a] showed that the 1 max-flow min-cut theorem can be derived also from the vertex-disjoint version of Menger’s theorem. As for the directed case, Ford and Fulkerson [1955] observed that the max-flow min-cut theorem holds also for directed graphs. Dantzig and Fulkerson [1955] showed, by extending Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] amount of material that can be transported from s to t; the minimum cut problem asks for the minimum damage needed to separate s from t. 23.1 Flows An (s , t )-flow (or just a flow if the source and target are clear from context) is a function f : E → R≥0 that satisfies the following conservation constraint at every vertex v except possibly s and t: X X f (u v) = f (v w). u w In English, the total flow into v is equal to the total flow out of v. To keep the notation simple, we define f (u v) = 0 if there is no edge u v in the graph. The value of the flow f , denoted | f |, is the total net flow out of the source vertex s: X X f (sw) − f (us). | f | := w u It’s not hard to prove that | f | is also equal to the total net flow into the target vertex t, as follows. To simplify notation, let ∂ f (v) denote the total net flow out of any vertex v: X X ∂ f (v) := f (u v) − f (v w). u w The conservation constraint implies that ∂ f (v) = 0 or every vertex v except s and t, so X ∂ f (v) = ∂ f (s) + ∂ f (t). v On P the other hand, any flow that leaves one vertex must enter another vertex, so we must have v ∂ f (v) = 0. It follows immediately that | f | = ∂ f (s) = −∂ f (t). Now suppose we have another function c : E → R≥0 that assigns a non-negative capacity c(e) to each edge e. We say that a flow f is feasible (with respect to c) if f (e) ≤ c(e) for every edge e. Most of the time we will consider only flows that are feasible with respect to some fixed capacity function c. We say that a flow f saturates edge e if f (e) = c(e), and avoids edge e if f (e) = 0. The maximum flow problem is to compute a feasible (s, t)-flow in a given directed graph, with a given capacity function, whose value is as large as possible. 0/5 5/15 10/20 0/15 s 5/10 10/10 0/10 t 5/20 10/10 An (s, t)-flow with value 10. Each edge is labeled with its flow/capacity. 23.2 Cuts An (s , t )-cut (or just cut if the source and target are clear from context) is a partition of the vertices into disjoint subsets S and T —meaning S ∪ T = V and S ∩ T = ∅—where s ∈ S and t ∈ T. 2 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] If we have a capacity function c : E → R≥0 , the capacity of a cut is the sum of the capacities of the edges that start in S and end in T : XX kS, T k := c(v w). v∈S w∈T (Again, if v w is not an edge in the graph, we assume c(v w) = 0.) Notice that the definition is asymmetric; edges that start in T and end in S are unimportant. The minimum cut problem is to compute an (s, t)-cut whose capacity is as large as possible. 5 15 20 15 10 s 10 10 t 20 10 An (s, t)-cut with capacity 15. Each edge is labeled with its capacity. Intuitively, the minimum cut is the cheapest way to disrupt all flow from s to t. Indeed, it is not hard to show that the value of any feasible (s , t )-flow is at most the capacity of any (s , t )-cut. Choose your favorite flow f and your favorite cut (S, T ), and then follow the bouncing inequalities: |f | = X w = X X v∈S = w X X v∈S ≤ f (sw) − w∈T XX v∈S w∈T ≤ XX v∈S w∈T X u f (us) f (v w) − X f (v w) − u X u∈T by definition f (u v) by the conservation constraint f (u v) removing duplicate edges f (v w) since f (u v) ≥ 0 c(v w) since f (u v) ≤ c(v w) = kS, T k by definition Our derivation actually implies the following stronger observation: | f | = kS, T k if and only if f saturates every edge from S to T and avoids every edge from T to S. Moreover, if we have a flow f and a cut (S, T ) that satisfies this equality condition, f must be a maximum flow, and (S, T ) must be a minimum cut. 23.3 The Maxflow Mincut Theorem Surprisingly, for any weighted directed graph, there is always a flow f and a cut (S, T ) that satisfy the equality condition. This is the famous max-flow min-cut theorem, first proved by Lester Ford (of shortest path fame) and Delbert Ferguson in 1954 and independently by Peter Elias, Amiel Feinstein, and and Claude Shannon (of information theory fame) in 1956. 3 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] The Maxflow Mincut Theorem. In any flow network with source s and target t, the value of the maximum (s, t)-flow is equal to the capacity of the minimum (s, t)-cut. Ford and Fulkerson proved this theorem as follows. Fix a graph G, vertices s and t, and a capacity function c : E → R≥0 . The proof will be easier if we assume that the capacity function is reduced: For any vertices u and v, either c(u v) = 0 or c(v u) = 0, or equivalently, if an edge appears in G, then its reversal does not. This assumption is easy to enforce. Whenever an edge u v and its reversal v u are both the graph, replace the edge u v with a path u x v of length two, where x is a new vertex and c(u x) = c(x v) = c(u v). The modified graph has the same maximum flow value and minimum cut capacity as the original graph. Enforcing the one-direction assumption. Let f be a feasible flow. We define a new capacity function c f : V × V → R, called the residual capacity, as follows: c(u v) − f (u v) if u v ∈ E c f (u v) = f (v u) if v u ∈ E . 0 otherwise Since f ≥ 0 and f ≤ c, the residual capacities are always non-negative. It is possible to have c f (u v) > 0 even if u v is not an edge in the original graph G. Thus, we define the residual graph G f = (V, E f ), where E f is the set of edges whose residual capacity is positive. Notice that the residual capacities are not necessarily reduced; it is quite possible to have both c f (u v) > 0 and c f (v u) > 0. 0/5 5 5/10 10/10 5 10 0/15 s 10 10 5/15 10/20 t s 10 15 5 5 t 5 0/10 10 5/20 10/10 15 10 A flow f in a weighted graph G and the corresponding residual graph G f . Suppose there is no path from the source s to the target t in the residual graph G f . Let S be the set of vertices that are reachable from s in G f , and let T = V \ S. The partition (S, T ) is clearly an (s, t)-cut. For every vertex u ∈ S and v ∈ T , we have c f (u v) = (c(u v) − f (u v)) + f (v u) = 0, which implies that c(u v) − f (u v) = 0 and f (v u) = 0. In other words, our flow f saturates every edge from S to T and avoids every edge from T to S. It follows that | f | = kS, T k. Moreover, f is a maximum flow and (S, T ) is a minimum cut. On the other hand, suppose there is a path s = v0 v1 · · · vr = t in G f . We refer to v0 v1 · · · vr as an augmenting path. Let F = mini c f (vi vi+1 ) denote the maximum amount 4 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] 5 5/5 10 10 10 s 5/15 10/20 5 10 15 5 0/15 5 t s 0/10 5/10 t 5 10 5/10 15 10/20 10 10/10 An augmenting path in G f with value F = 5 and the augmented flow f 0 . of flow that we can push through the augmenting path in G f . We define a new flow function f 0 : E → R as follows: f (u v) + F if u v is in the augmenting path 0 f (u v) = f (u v) − F if v u is in the augmenting path f (u v) otherwise To prove that the flow f 0 is feasible with respect to the original capacities c, we need to verify that f 0 ≥ 0 and f 0 ≤ c. Consider an edge u v in G. If u v is in the augmenting path, then f 0 (u v) > f (u v) ≥ 0 and f 0 (u v) = f (u v) + F by definition of f 0 ≤ f (u v) + c f (u v) by definition of F = f (u v) + c(u v) − f (u v) by definition of c f = c(u v) Duh. On the other hand, if the reversal v u is in the augmenting path, then f 0 (u v) < f (u v) ≤ c(u v), which implies that f 0 (u v) = f (u v) − F by definition of f 0 ≥ f (u v) − c f (v u) by definition of F = f (u v) − f (u v) by definition of c f =0 Duh. Finally, we observe that (without loss of generality) only the first edge in the augmenting path leaves s, so | f 0 | = | f | + F > 0. In other words, f is not a maximum flow. This completes the proof! 23.4 Ford and Fulkerson’s augmenting-path algorithm Ford and Fulkerson’s proof of the Maxflow-Mincut Theorem translates immediately to an algorithm to compute maximum flows: Starting with the zero flow, repeatedly augment the flow along any path from s to t in the residual graph, until there is no such path. This algorithm has an important but straightforward corollary: Integrality Theorem. If all capacities in a flow network are integers, then there is a maximum flow such that the flow through every edge is an integer. 5 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] Proof: We argue by induction that after each iteration of the augmenting path algorithm, all flow values and residual capacities are integers. Before the first iteration, residual capacities are the original capacities, which are integral by definition. In each later iteration, the induction hypothesis implies that the capacity of the augmenting path is an integer, so augmenting changes the flow on each edge, and therefore the residual capacity of each edge, by an integer. In particular, the algorithm increases the overall value of the flow by a positive integer, which implies that the augmenting path algorithm halts and returns a maximum flow. If every edge capacity is an integer, the algorithm halts after | f ∗ | iterations, where f ∗ is the actual maximum flow. In each iteration, we can build the residual graph G f and perform a whatever-first-search to find an augmenting path in O(E) time. Thus, for networks with integer capacities, the Ford-Fulkerson algorithm runs in O(E| f ∗ |) time in the worst case. The following example shows that this running time analysis is essentially tight. Consider the 4-node network illustrated below, where X is some large integer. The maximum flow in this network is clearly 2X . However, Ford-Fulkerson might alternate between pushing 1 unit of flow along the augmenting path su v t and then pushing 1 unit of flow along the augmenting path s v u t, leading to a running time of Θ(X ) = Ω(| f ∗ |). v X X s t 1 X X u A bad example for the Ford-Fulkerson algorithm. Ford and Fulkerson’s algorithm works quite well in many practical situations, or in settings where the maximum flow value | f ∗ | is small, but without further constraints on the augmenting paths, this is not an efficient algorithm in general. The example network above can be described using only O(log X ) bits; thus, the running time of Ford-Fulkerson is actually exponential in the input size. 23.5 Irrational Capacities If we multiply all the capacities by the same (positive) constant, the maximum flow increases everywhere by the same constant factor. It follows that if all the edge capacities are rational, then the Ford-Fulkerson algorithm eventually halts, although still in exponential time. However, if we allow irrational capacities, the algorithm can actually loop forever, always finding smaller and smaller augmenting paths! Worse yet, this infinite sequence of augmentations may not even converge to the maximum flow, or even to a significant fraction of the maximum flow! Perhaps the simplest example of this effect was discovered by Uri Zwick. Consider the six-node network shown on the next page. Six of the p nine edges have some large integer capacity X , two have capacity 1, and one has capacity φ = ( 5 − 1)/2 ≈ 0.618034, chosen so that 1 − φ = φ 2 . To prove that the Ford-Fulkerson algorithm can get stuck, we can watch the residual capacities of the three horizontal edges as the algorithm progresses. (The residual capacities of the other six edges will always be at least X − 3.) Suppose the Ford-Fulkerson algorithm starts by choosing the central augmenting path, shown in the large figure on the next page. The three horizontal edges, in order from left to right, now have residual capacities 1, 0, and φ. Suppose inductively that the horizontal residual capacities are φ k−1 , 0, φ k for some non-negative integer k. 6 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] 1. Augment along B, adding φ k to the flow; the residual capacities are now φ k+1 , φ k , 0. 2. Augment along C, adding φ k to the flow; the residual capacities are now φ k+1 , 0, φ k . 3. Augment along B, adding φ k+1 to the flow; the residual capacities are now 0, φ k+1 , φ k+2 . 4. Augment along A, adding φ k+1 to the flow; the residual capacities are now φ k+1 , 0, φ k+2 . It follows by induction that after 4n + 1 augmentation steps, the horizontal edges have residual capacities φ 2n−2 , 0, φ 2n−1 . As the number of augmentations grows to infinity, the value of the flow converges to ∞ X p 2 1+2 φi = 1 + = 4 + 5 < 7, 1−φ i=1 even though the maximum flow value is clearly 2X + 1 7. s X 1 X X 1 ϕ X X X t A B C Uri Zwick’s non-terminating flow example, and three augmenting paths. Picky students might wonder at this point why we care about irrational capacities; after all, computers can’t represent anything but (small) integers or (dyadic) rationals exactly. Good question! One reason is that the integer restriction is literally artificial; it’s an artifact of actual computational hardware², not an inherent feature of the abstract mathematical problem. Another reason, which is probably more convincing to most practical computer scientists, is that the behavior of the algorithm with irrational inputs tells us something about its worst-case behavior in practice given floating-point capacities—terrible! Even with very reasonable capacities, a careless implementation of Ford-Fulkerson could enter an infinite loop simply because of round-off error. 23.6 Edmonds and Karp’s Algorithms Ford and Fulkerson’s algorithm does not specify which path in the residual graph to augment, and the poor behavior of the algorithm can be blamed on poor choices for the augmenting path. In the early 1970s, Jack Edmonds and Richard Karp analyzed two natural rules for choosing augmenting paths, both of which led to more efficient algorithms. ²...or perhaps the laws of physics. Yeah, whatever. Like reality actually matters in this class. 7 Algorithms 23.6.1 Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] Fat Pipes Edmonds and Karp’s first rule is essentially a greedy algorithm: Choose the augmenting path with largest bottleneck value. It’s a fairly easy to show that the maximum-bottleneck (s, t)-path in a directed graph can be computed in O(E log V ) time using a variant of Jarník’s minimum-spanning-tree algorithm, or of Dijkstra’s shortest path algorithm. Simply grow a directed spanning tree T , rooted at s. Repeatedly find the highest-capacity edge leaving T and add it to T , until T contains a path from s to t. Alternately, one could emulate Kruskal’s algorithm—insert edges one at a time in decreasing capacity order until there is a path from s to t—although this is less efficient, at least when the graph is directed. We can now analyze the algorithm in terms of the value of the maximum flow f ∗ . Let f be any flow in G, and let f 0 be the maximum flow in the current residual graph G f . (At the beginning of the algorithm, G f = G and f 0 = f ∗ .) Let e be the bottleneck edge in the next augmenting path. Let S be the set of vertices reachable from s through edges in G f with capacity greater than c f (e) and let T = V \ S. By construction, T is non-empty, and every edge from S to T has capacity at most c f (e). Thus, the capacity of the cut (S, T ) is at most c f (e) · E. On the other hand, the maxflow-mincut theorem implies that kS, T k ≥ | f 0 |. We conclude that c(e) ≥ | f 0 |/E. The preceding argument implies that augmenting f along the maximum-bottleneck path in G f multiplies the maximum flow value in G f by a factor of at most 1 − 1/E. In other words, the residual maximum flow value decays exponentially with the number of iterations. After E · ln| f ∗ | iterations, the maximum flow value in G f is at most ∗ ∗ | f ∗ | · (1 − 1/E) E·ln| f | < | f ∗ | e− ln| f | = 1. (That’s Euler’s constant e, not the edge e. Sorry.) In particular, if all the capacities are integers, then after E · ln| f ∗ | iterations, the maximum capacity of the residual graph is zero and f is a maximum flow. We conclude that for graphs with integer capacities, the Edmonds-Karp ‘fat pipe’ algorithm runs in O(E 2 log E log| f ∗ |) time, which is actually a polynomial function of the input size. 23.6.2 Short Pipes The second Edmonds-Karp rule was actually proposed by Ford and Fulkerson in their original max-flow paper; a variant of this rule was independently considered by the Russian mathematician Yefim Dinits around the same time as Edmonds and Karp. Choose the augmenting path with the smallest number of edges. The shortest augmenting path can be found in O(E) time by running breadth-first search in the residual graph. Surprisingly, the resulting algorithm halts after a polynomial number of iterations, independent of the actual edge capacities! The proof of this polynomial upper bound relies on two observations about the evolution of the residual graph. Let f i be the current flow after i augmentation steps, let Gi be the corresponding residual graph. In particular, f0 is zero everywhere and G0 = G. For each vertex v, let leveli (v) denote the unweighted shortest path distance from s to v in Gi , or equivalently, the level of v in a breadth-first search tree of Gi rooted at s. Our first observation is that these levels can only increase over time. 8 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] Lemma 1. leveli+1 (v) ≥ leveli (v) for all vertices v and integers i. Proof: The claim is trivial for v = s, since leveli (s) = 0 for all i. Choose an arbitrary vertex v 6= s, and let s · · · u v be a shortest path from s to v in Gi+1 . (If there is no such path, then leveli+1 (v) = ∞, and we’re done.) Because this is a shortest path, we have leveli+1 (v) = leveli+1 (u) + 1, and the inductive hypothesis implies that leveli+1 (u) ≥ leveli (u). We now have two cases to consider. If u v is an edge in Gi , then leveli (v) ≤ leveli (u) + 1, because the levels are defined by breadth-first traversal. On the other hand, if u v is not an edge in Gi , then v u must be an edge in the ith augmenting path. Thus, v u must lie on the shortest path from s to t in Gi , which implies that leveli (v) = leveli (u) − 1 ≤ leveli (u) + 1. In both cases, we have leveli+1 (v) = leveli+1 (u) + 1 ≥ leveli (u) + 1 ≥ leveli (v). Whenever we augment the flow, the bottleneck edge in the augmenting path disappears from the residual graph, and some other edge in the reversal of the augmenting path may (re-)appear. Our second observation is that an edge cannot appear or disappear too many times. Lemma 2. During the execution of the Edmonds-Karp short-pipe algorithm, any edge u v disappears from the residual graph G f at most V /2 times. Proof: Suppose u v is in two residual graphs Gi and G j+1 , but not in any of the intermediate residual graphs Gi+1 , . . . , G j , for some i < j. Then u v must be in the ith augmenting path, so leveli (v) = leveli (u) + 1, and v u must be on the jth augmenting path, so level j (v) = level j (u) − 1. By the previous lemma, we have level j (u) = level j (v) + 1 ≥ leveli (v) + 1 = leveli (u) + 2. In other words, the distance from s to u increased by at least 2 between the disappearance and reappearance of u v. Since every level is either less than V or infinite, the number of disappearances is at most V /2. Now we can derive an upper bound on the number of iterations. Since each edge can disappear at most V /2 times, there are at most EV /2 edge disappearances overall. But at least one edge disappears on each iteration, so the algorithm must halt after at most EV /2 iterations. Finally, since each iteration requires O(E) time, this algorithm runs in O(V E 2 ) time overall. 23.7 Further Progress This is nowhere near the end of the story for maximum-flow algorithms. Decades of further research have led to a number of even faster algorithms, some of which are summarized in the table below.³ All of the algorithms listed below compute a maximum flow in several iterations. Each algorithm has two variants: a simpler version that performs each iteration by brute force, and a faster variant that uses sophisticated data structures to maintain a spanning tree of the flow network, so that each iteration can be performed (and the spanning tree updated) in logarithmic time. There is no reason to believe that the best algorithms known so far are optimal; indeed, maximum flows are still a very active area of research. ³To keep the table short, I have deliberately omitted algorithms whose running time depends on the maximum capacity, the sum of the capacities, or the maximum flow value. Even with this restriction, the table is incomplete! 9 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] Technique Direct With dynamic trees Sources Blocking flow O(V E) O(V E log V ) [Dinits; Sleator and Tarjan] Network simplex O(V 2 E) O(V E log V ) [Dantzig; Goldfarb and Hao; 2 Goldberg, Grigoriadis, and Tarjan] 2 Push-relabel (generic) Push-relabel (FIFO) Push-relabel (highest label) Pseudoflow Compact abundance graphs O(V E) O(V 3 ) p O(V 2 E) O(V 2 E) — [Goldberg and Tarjan] O(V 2 log(V 2 /E)) [Goldberg and Tarjan] — [Cheriyan and Maheshwari; Tunçel] O(V E log V ) O(V E) [Hochbaum] [Orlin 2012] Several purely combinatorial maximum-flow algorithms and their running times. The fastest known maximum flow algorithm, announced by James Orlin in 2012, runs in O(V E) time. The details of Orlin’s algorithm are far beyond the scope of this course; in addition to his own new techniques, Orlin uses several existing algorithms and data structures as black boxes, most of which are themselves quite complicated. Nevertheless, for purposes of analyzing algorithms that use maximum flows, this is the time bound you should cite. So write the following sentence on your cheat sheets and cite it in your homeworks: Maximum flows can be computed in O(V E) time. Exercises 1. Suppose you are given a directed graph G = (V, E), two vertices s and t, a capacity function c : E → R+ , and a second function f : E → R. Describe an algorithm to determine whether f is a maximum (s, t)-flow in G. 2. Let (S, T ) and (S 0 , T 0 ) be minimum (s, t)-cuts in some flow network G. Prove that (S ∩ S 0 , T ∪ T 0 ) and (S ∪ S 0 , T ∩ T 0 ) are also minimum (s, t)-cuts in G. 3. Suppose (S, T ) is the unique minimum (s, t)-cut in some flow network. Prove that (S, T ) is also a minimum (x, y)-cut for all vertices x ∈ S and y ∈ T . 4. Cuts are sometimes defined as subsets of the edges of the graph, instead of as partitions of its vertices. In this problem, you will prove that these two definitions are almost equivalent. We say that a subset X of (directed) edges separates s and t if every directed path from s to t contains at least one (directed) edge in X . For any subset S of vertices, let δS denote the set of directed edges leaving S; that is, δS := {u v | u ∈ S, v 6∈ S}. (a) Prove that if (S, T ) is an (s, t)-cut, then δS separates s and t. (b) Let X be an arbitrary subset of edges that separates s and t. Prove that there is an (s, t)-cut (S, T ) such that δS ⊆ X . (c) Let X be a minimal subset of edges that separates s and t. (Such a set of edges is sometimes called a bond.) Prove that there is an (s, t)-cut (S, T ) such that δS = X . 10 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] 5. A flow f is acyclic if the subgraph of directed edges with positive flow contains no directed cycles. (a) Prove that for any flow f , there is an acyclic flow with the same value as f . (In particular, this implies that some maximum flow is acyclic.) (b) A path flow assigns positive values only to the edges of one simple directed path from s to t. Prove that every acyclic flow can be written as the sum of O(E) path flows. (c) Describe a flow in a directed graph that cannot be written as the sum of path flows. (d) A cycle flow assigns positive values only to the edges of one simple directed cycle. Prove that every flow can be written as the sum of O(E) path flows and cycle flows. (e) Prove that every flow with value 0 can be written as the sum of O(E) cycle flows. (Zero-value flows are also called circulations.) 6. Suppose instead of capacities, we consider networks where each edge u v has a nonnegative demand d(u v). Now an (s, t)-flow f is feasible if and only if f (u v) ≥ d(u v) for every edge u v. (Feasible flow values can now be arbitrarily large.) A natural problem in this setting is to find a feasible (s, t)-flow of minimum value. (a) Describe an efficient algorithm to compute a feasible (s, t)-flow, given the graph, the demand function, and the vertices s and t as input. [Hint: Find a flow that is non-zero everywhere, and then scale it up to make it feasible.] (b) Suppose you have access to a subroutine MaxFlow that computes maximum flows in networks with edge capacities. Describe an efficient algorithm to compute a minimum flow in a given network with edge demands; your algorithm should call MaxFlow exactly once. (c) State and prove an analogue of the max-flow min-cut theorem for this setting. (Do minimum flows correspond to maximum cuts?) 7. For any flow network G and any vertices u and v, let bottleneckG (u, v) denote the maximum, over all paths π in G from u to v, of the minimum-capacity edge along π. (a) Describe and analyze an algorithm to compute bottleneckG (s, t) in O(E log V ) time. (b) Describe an algorithm to construct a spanning tree T of G such that bottleneck T (u, v) = bottleneckG (u, v) for all vertices u and v. (Edges in T inherit their capacities from G.) 8. Describe an efficient algorithm to determine whether a given flow network contains a unique maximum flow. 9. Suppose you have already computed a maximum flow f ∗ in a flow network G with integer edge capacities. (a) Describe and analyze an algorithm to update the maximum flow after the capacity of a single edge is increased by 1. (b) Describe and analyze an algorithm to update the maximum flow after the capacity of a single edge is decreased by 1. 11 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] Both algorithms should be significantly faster than recomputing the maximum flow from scratch. 10. Let G be a network with integer edge capacities. An edge in G is upper-binding if increasing its capacity by 1 also increases the value of the maximum flow in G. Similarly, an edge is lower-binding if decreasing its capacity by 1 also decreases the value of the maximum flow in G. (a) Does every network G have at least one upper-binding edge? Prove your answer is correct. (b) Does every network G have at least one lower-binding edge? Prove your answer is correct. (c) Describe an algorithm to find all upper-binding edges in G, given both G and a maximum flow in G as input, in O(E) time. (d) Describe an algorithm to find all lower-binding edges in G, given both G and a maximum flow in G as input, in O(EV ) time. 11. A given flow network G may have more than one minimum (s, t)-cut. Let’s define the best minimum (s, t)-cut to be any minimum cut with the smallest number of edges. (a) Describe an efficient algorithm to determine whether a given flow network contains a unique minimum (s, t)-cut. (b) Describe an efficient algorithm to find the best minimum (s, t)-cut when the capacities are integers. (c) Describe an efficient algorithm to find the best minimum (s, t)-cut for arbitrary edge capacities. (d) Describe an efficient algorithm to determine whether a given flow network contains a unique best minimum (s, t)-cut. 12. A new assistant professor, teaching maximum flows for the first time, suggests the following greedy modification to the generic Ford-Fulkerson augmenting path algorithm. Instead of maintaining a residual graph, just reduce the capacity of edges along the augmenting path! In particular, whenever we saturate an edge, just remove it from the graph. GreedyFlow(G, c, s, t): for every edge e in G f (e) ← 0 while there is a path from s to t π ← an arbitrary path from s to t F ← minimum capacity of any edge in π for every edge e in π f (e) ← f (e) + F if c(e) = F remove e from G else c(e) ← c(e) − F return f 12 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] (a) Show that this algorithm does not always compute a maximum flow. (b) Prove that for any flow network, if the Greedy Path Fairy tells you precisely which path π to use at each iteration, then GreedyFlow does compute a maximum flow. (Sadly, the Greedy Path Fairy does not actually exist.) 13. We can speed up the Edmonds-Karp ‘fat pipe’ heuristic, at least for integer capacities, by relaxing our requirements for the next augmenting path. Instead of finding the augmenting path with maximum bottleneck capacity, we find a path whose bottleneck capacity is at least half of maximum, using the following capacity scaling algorithm. The algorithm maintains a bottleneck threshold ∆; initially, ∆ is the maximum capacity among all edges in the graph. In each phase, the algorithm augments along paths from s to t in which every edge has residual capacity at least ∆. When there is no such path, the phase ends, we set ∆ ← b∆/2c, and the next phase begins. (a) How many phases will the algorithm execute in the worst case, if the edge capacities are integers? (b) Let f be the flow at the end of a phase for a particular value of ∆. Let S be the nodes that are reachable from s in the residual graph G f using only edges with residual capacity at least ∆, and let T = V \ S. Prove that the capacity (with respect to G’s original edge capacities) of the cut (S, T ) is at most | f | + E · ∆. (c) Prove that in each phase of the scaling algorithm, there are at most 2E augmentations. (d) What is the overall running time of the scaling algorithm, assuming all the edge capacities are integers? 14. In 1980 Maurice Queyranne published the following example of a flow network where Edmonds and Karp’s “fat pipe” heuristic does not halt. Here, as in Zwick’s bad pexample for the original Ford-Fulkerson algorithm, φ denotes the inverse golden ratio ( 5 − 1)/2. The three vertical edges play essentially the same role as the horizontal edges in Zwick’s example. s (ϕ+1)/2 a 1/2 b ϕ/2 c (ϕ+1)/2 d ϕ/2 1/2 ϕ ϕ/2 ϕ (ϕ+1)/2 1 1/2 (ϕ+1)/2 e ϕ/2 f (ϕ+1)/2 g 1/2 h ϕ/2 t s s s t t t Queyranne’s network, and a sequence of “fat-pipe” augmentations. (a) Show that the following infinite sequence of path augmentations is a valid execution of the Edmonds-Karp algorithm. (See the figure above.) 13 Algorithms Lecture 23: Maximum Flows and Minimum Cuts [Fa’13] QueyranneFatPipes: for i ← 1 to ∞ push φ 3i−2 units of flow along sa f g bhc d t push φ 3i−1 units of flow along s f a b g hc t push φ 3i units of flow along se f a g bc h t forever (b) Describe a sequence of O(1) path augmentations that yields a maximum flow in Queyranne’s network. 15. An (s , t )-series-parallel graph is an directed acyclic graph with two designated vertices s (the source) and t (the target or sink) and with one of the following structures: • Base case: A single directed edge from s to t. • Series: The union of an (s, u)-series-parallel graph and a (u, t)-series-parallel graph that share a common vertex u but no other vertices or edges. • Parallel: The union of two smaller (s, t)-series-parallel graphs with the same source s and target t, but with no other vertices or edges in common. Describe an efficient algorithm to compute a maximum flow from s to t in an (s, t)-seriesparallel graph with arbitrary edge capacities. © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 14 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] For a long time it puzzled me how something so expensive, so leading edge, could be so useless, and then it occurred to me that a computer is a stupid machine with the ability to do incredibly smart things, while computer programmers are smart people with the ability to do incredibly stupid things. They are, in short, a perfect match. — Bill Bryson, Notes from a Big Country (1999) 24 24.1 Applications of Maximum Flow Edge-Disjoint Paths One of the easiest applications of maximum flows is computing the maximum number of edgedisjoint paths between two specified vertices s and t in a directed graph G using maximum flows. A set of paths in G is edge-disjoint if each edge in G appears in at most one of the paths; several edge-disjoint paths may pass through the same vertex, however. If we give each edge capacity 1, then the maxflow from s to t assigns a flow of either 0 or 1 to every edge. Since any vertex of G lies on at most two saturated edges (one in and one out, or none at all), the subgraph S of saturated edges is the union of several edge-disjoint paths and cycles. Moreover, the number of paths is exactly equal to the value of the flow. Extracting the actual paths from S is easy—just follow any directed path in S from s to t, remove that path from S, and recurse. Conversely, we can transform any collection of k edge-disjoint paths into a flow by pushing one unit of flow along each path from s to t; the value of the resulting flow is exactly k. It follows that any maxflow algorithm actually computes the largest possible set of edge-disjoint paths. If we use Orlin’s algorithm to compute the maximum (s, t)-flow, we can compute edge-disjoint paths in O(V E) time, but Orlin’s algorithm is overkill for this simple application. The cut ({s}, V \ {s}) has capacity at most V − 1, so the maximum flow has value at most V − 1. Thus, Ford and Fulkerson’s original augmenting path algorithm also runs in O(| f ∗ |E) = O(V E) time. The same algorithm can also be used to find edge-disjoint paths in undirected graphs. We simply replace every undirected edge in G with a pair of directed edges, each with unit capacity, and compute a maximum flow from s to t in the resulting directed graph G 0 using the FordFulkerson algorithm. For any edge uv in G, if our max flow saturates both directed edges u v and v u in G 0 , we can remove both edges from the flow without changing its value. Thus, without loss of generality, the maximum flow assigns a direction to every saturated edge, and we can extract the edge-disjoint paths by searching the graph of directed saturated edges. 24.2 Vertex Capacities and Vertex-Disjoint Paths Suppose we have capacities on the vertices as well as the edges. Here, in addition to our other constraints, we require that for any vertex v other than s and t, the total flow into v (and therefore the total flow out of v) is at most some non-negative value c(v). How can we compute a maximum flow with these new constraints? The simplest method is to transform the input into a traditional flow network, with only edge capacities. Specifically, we replace every vertex v with two vertices vin and vout , connected by an edge vin vout with capacity c(v), and then replace every directed edge u v with the edge uout vin (keeping the same capacity). Finally, we compute the maximum flow from sout to t in in this modified flow network. © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 1 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] It is now easy to compute the maximum number of vertex-disjoint paths from s to t in any directed graph. Simply give every vertex capacity 1, and compute a maximum flow! ÆÆÆ Figure! 24.3 Maximum Matchings in Bipartite Graphs Another natural application of maximum flows is finding large matchings in bipartite graphs. A matching is a subgraph in which every vertex has degree at most one, or equivalently, a collection of edges such that no two share a vertex. The problem is to find the matching with the maximum number of edges in a given bipartite graph. We can solve this problem by reducing it to a maximum flow problem as follows. Let G be the given bipartite graph with vertex set U ∪ W , such that every edge joins a vertex in U to a vertex in W . We create a new directed graph G 0 by (1) orienting each edge from U to W , (2) adding two new vertices s and t, (3) adding edges from s to every vertex in U, and (4) adding edges from each vertex in W to t. Finally, we assign every edge in G 0 a capacity of 1. Any matching M in G can be transformed into a flow f M in G 0 as follows: For each edge uw in M , push one unit of flow along the path suw t. These paths are disjoint except at s and t, so the resulting flow satisfies the capacity constraints. Moreover, the value of the resulting flow is equal to the number of edges in M . Conversely, consider any (s, t)-flow f in G 0 computed using the Ford-Fulkerson augmenting path algorithm. Because the edge capacities are integers, the Ford-Fulkerson algorithm assigns an integer flow to every edge. (This is easy to verify by induction, hint, hint.) Moreover, since each edge has unit capacity, the computed flow either saturates ( f (e) = 1) or avoids ( f (e) = 0) every edge in G 0 . Finally, since at most one unit of flow can enter any vertex in U or leave any vertex in W , the saturated edges from U to W form a matching in G. The size of this matching is exactly | f |. Thus, the size of the maximum matching in G is equal to the value of the maximum flow in 0 G , and provided we compute the maxflow using augmenting paths, we can convert the actual maxflow into a maximum matching in O(E) time. Again, we can compute the maximum flow in O(V E) time using either Orlin’s algorithm or off-the-shelf Ford-Fulkerson. s t A maximum matching in a bipartite graph G , and the corresponding maximum flow in G 0 . 24.4 Assignment Problems Maximum-cardinality matchings are a special case of a general family of so-called assignment problems.¹ An unweighted binary assignment problem involves two disjoint finite sets X and Y , ¹Most authors refer to finding a maximum-weight matching in a bipartite graph as the assignment problem. 2 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] which typically represent two different kinds of resources, such as web pages and servers, jobs and machines, rows and columns of a matrix, hospitals and interns, or customers and ice cream flavors. Our task is to choose the largest possible collection of pairs (x, y) as possible, where x ∈ X and y ∈ Y , subject to several constraints of the following form: • Each element x ∈ X can appear in at most c(x) pairs. • Each element y ∈ Y can appear in at most c( y) pairs. • Each pair (x, y) ∈ X × Y can appear in the output at most c(x, y) times. Each upper bound c(x), c( y), and c(x, y) is either a (typically small) non-negative integer or ∞. Intuitively, we create each pair in our output by assigning an element of X to an element of Y . The maximum-matching problem is a special case, where c(z) = 1 for all z ∈ X ∪ Y , and each c(x, y) is either 0 or 1, depending on whether the pair x y defines an edge in the underlying bipartite graph. Here is a slightly more interesting example. A nearby school, famous for its onerous administrative hurdles, decides to organize a dance. Every pair of students (one boy, one girl) who wants to dance must register in advance. School regulations limit each boy-girl pair to at most three dances together, and limits each student to at most ten dances overall. How can we maximize the number of dances? This is a binary assignment problem for the set X of girls and the set Y of boys, where for each girl x and boy y, we have c(x) = c( y) = 10 and either c(x, y) = 3 (if x and y registered to dance) or c(x, y) = 0 (if they didn’t register). Every binary assignment problem can be reduced to a standard maximum flow problem as follows. We construct a flow network G = (V, E) with vertices X ∪ Y ∪ {s, t} and the following edges: • an edge s x with capacity c(x) for each x ∈ X , • an edge y t with capacity c( y) for each y ∈ Y . • an edge x y with capacity c(x, y) for each x ∈ X and y ∈ Y , and Because all the edges have integer capacities, the any augmenting-path algorithm constructs an integer maximum flow f ∗ , which can be decomposed into the sum of | f ∗ | paths of the form s x y t for some x ∈ X and y ∈ Y . For each such path, we report the pair (x, y). (Thus, the pair (x, y) appears in our output collection exactly f (x y) times. It is easy to verify (hint, hint) that this collection of pairs satisfies all the necessary constraints. Conversely, any legal collection of r pairs can be transformed into a feasible integer flow in G with value r. Thus, the largest legal collection of pairs corresponds to a maximum flow in G. So our algorithm is correct. If we use Orlin’s algorithm to compute the maximum flow, this assignment algorithm runs in O(V E) = O(n 3 ) time, where n = |X | + |Y |. 24.5 Baseball Elimination Every year millions of baseball fans eagerly watch their favorite team, hoping they will win a spot in the playoffs, and ultimately the World Series. Sadly, most teams are “mathematically eliminated" days or even weeks before the regular season ends. Often, it is easy to spot when a team is eliminated—they can’t win enough games to catch up to the current leader in their division. But sometimes the situation is more subtle. For example, here are the actual standings from the American League East on August 30, 1996. 3 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] Team New York Yankees Baltimore Orioles Boston Red Sox Toronto Blue Jays Detroit Tigers Won–Lost Left 75–59 71–63 69–66 63–72 49–86 28 28 27 27 27 NYY 3 8 7 3 BAL BOS TOR DET 3 8 2 7 7 0 3 4 0 0 2 7 4 0 0 0 Detroit is clearly behind, but some die-hard Tigers fans may hold out hope that their team can still win. After all, if Detroit wins all 27 of their remaining games, they will end the season with 76 wins, more than any other team has now. So as long as every other team loses every game. . . but that’s not possible, because some of those other teams still have to play each other. Here is one complete argument:² By winning all of their remaining games, Detroit can finish the season with a record of 76 and 86. If the Yankees win just 2 more games, then they will finish the season with a 77 and 85 record which would put them ahead of Detroit. So, let’s suppose the Tigers go undefeated for the rest of the season and the Yankees fail to win another game. The problem with this scenario is that New York still has 8 games left with Boston. If the Red Sox win all of these games, they will end the season with at least 77 wins putting them ahead of the Tigers. Thus, the only way for Detroit to even have a chance of finishing in first place, is for New York to win exactly one of the 8 games with Boston and lose all their other games. Meanwhile, the Sox must loss all the games they play agains teams other than New York. This puts them in a 3-way tie for first place. . . . Now let’s look at what happens to the Orioles and Blue Jays in our scenario. Baltimore has 2 games left with with Boston and 3 with New York. So, if everything happens as described above, the Orioles will finish with at least 76 wins. So, Detroit can catch Baltimore only if the Orioles lose all their games to teams other than New York and Boston. In particular, this means that Baltimore must lose all 7 of its remaining games with Toronto. The Blue Jays also have 7 games left with the Yankees and we have already seen that for Detroit to finish in first place, Toronto must will all of these games. But if that happens, the Blue Jays will win at least 14 more games giving them at final record of 77 and 85 or better which means they will finish ahead of the Tigers. So, no matter what happens from this point in the season on, Detroit can not finish in first place in the American League East. There has to be a better way to figure this out! Here is a more abstract formulation of the problem. Our input consists of two arrays W [1 .. n] and G[1 .. n, 1 .. n], where W [i] is the number of games team i has already won, and G[i, j] is the number of upcoming games between teams i and j. We want to determine whether team n can end the season with the most wins (possibly tied with other teams).³ In the mid-1960s, Benjamin Schwartz showed that this question can be modeled as an assignment problem: We want to assign a winner to Peach game, so that team n comes in first place. We have an assignment problem! Let R[i] = j G[i, j] denote the number of remaining games for team i. We will assume that team n wins all R[n] of its remaining games. Then team n can come in first place if and only if every other team i wins at most W [n] + R[n] − W [i] of its R[i] remaining games. Since we want to assign winning teams to games, we start by building a bipartite graph, n whose nodes represent the games and the teams. We have 2 game nodes g i, j , one for each pair 1 ≤ i < j < n, and n − 1 team nodes t i , one for each 1 ≤ i < n. For each pair i, j, we add edges g i, j t i and g i, j t j with infinite capacity. We add a source vertex s and edges s g i, j with capacity G[i, j] for each pair i, j. Finally, we add a target node t and edges t i t with capacity W [n] − W [i] + R[n] for each team i. ²Both the example and this argument are taken from http://riot.ieor.berkeley.edu/~baseball/detroit.html. ³We assume here that no games end in a tie (always true for Major League Baseball), and that every game is actually played (not always true). 4 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] Theorem. Team n can end the season in first place if and only if there is a feasible flow in this graph that saturates every edge leaving s. Proof: Suppose it is possible for team n to end the season in first place. Then every team i < n wins at most W [n] + R[n] − W [i] of the remaining games. For each game between team i and team j that team i wins, add one unit of flow along the path s g i, j t i t. Because there are exactly G[i, j] games between teams i and j, every edge leaving s is saturated. Because each team i wins at most W [n] + R[n] − W [i] games, the resulting flow is feasible. Conversely, Let f be a feasible flow that saturates every edge out of s. Suppose team i wins exactly f (g i, j t i ) games against team j, for all i and j. Then teams i and j play f (g i, j t i )+ f (g i, j t j ) = f (sP g i, j ) = G[i, j] games, so every upcoming game is played. Moreover, each team i wins a total of j f (g i, j t i ) = f (t i t) ≤ W [n] + R[n] − W [i] upcoming games, and therefore at most W [n] + R[n] games overall. Thus, if team n win all their upcoming games, they end the season in first place. So, to decide whether our favorite team can win, we construct the flow network, compute a maximum flow, and report whether than maximum flow saturates the edges leaving s. The flow network has O(n2 ) vertices and O(n2 ) edges, and it can be constructed in O(n2 ) time. Using Orlin’s algorithm, we can compute the maximum flow in O(V E) = O(n 4 ) time. The graph derived from the 1996 American League East standings is shown below. The total capacity of the edges leaving s is 27 (there are 27 remaining games), but the total capacity of the edges entering t is only 26. So the maximum flow has value at most 26, which means that Detroit is mathematically eliminated. NYY BAL 3 NYY NYY BOS BAL 8 s 7 5 NYY TOR t 7 2 7 1 BOS BAL BOS BAL TOR 13 TOR The flow graph for the 1996 American League East standings. Unlabeled edges have infinite capacity. More recently, Kevin Wayne⁴ proved that one can determine all the teams that are mathematically eliminated in only O(n 3 ) time, essentially using a single maximum-flow computation. 24.6 Project Selection In our final example, suppose we are given a set of n projects that we could possibly perform; for simplicity, we identify each project by an integer between 1 and n. Some projects cannot be started until certain other projects are completed. This set of dependencies is described by a directed acyclic graph, where an edge i j indicates that project i depends on project j. Finally, each project i has an associated profit pi which is given to us if the project is completed; however, some projects have negative profits, which we interpret as positive costs. We can choose to finish ⁴Kevin D. Wayne. A new property and a faster algorithm for baseball elimination. SIAM J. Discrete Math 14(2):223–229, 2001. 5 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] any subset X of the projects that includes all its dependents; that is, for every project x ∈ X , every project that x depends on is also in X . Our goal is to find a valid subset of the projects whose total profit is as large as possible. In particular, if all of the jobs have negative profit, the correct answer is to do nothing. -2 -3 -5 -8 4 6 2 3 A dependency graph for a set of projects. Circles represent profitable projects; squares represent costly projects. At a high level, our task to partition the projects into two subsets S and T , the jobs we Select and the jobs we Turn down. So intuitively, we’d like to model our problem as a minimum cut problem in a certain graph. But in which graph? How do we enforce prerequisites? We want to maximize profit, but we only know how to find minimum cuts. And how do we convert negative profits into positive capacities? We define a new graph G by adding a source vertex s and a target vertex t to the dependency graph, with an edge s j for every profitable job (with p j > 0), and an edge i t for every costly job (with pi < 0). Intuitively, we can think of s as a new job (“To the bank!”) with profit/cost 0 that we must perform last. We assign edge capacities as follows: • c(s j) = p j for every profitable job j; • c(i t) = −pi for every costly job i; • c(i j) = ∞ for every dependency edge i j. All edge-capacities are positive, so this is a legal input to the maximum cut problem. Now consider an (s, t)-cut (S, T ) in G. If the capacity kS, T k is finite, then for every dependency edge i j, projects i and j are on the same side of the cut, which implies that S is a valid solution. Moreover, we claim that selecting the jobs in S earns us a total profit of C − kS, T k, where C is the sum of all the positive profits. This claim immediately implies that we can maximize our total profit by computing a minimum cut in G. t 3 2 ∞ -2 ∞ 5 -3 -5 ∞ ∞ 4 -8 ∞ ∞ 6 4 8 2 6 ∞ 2 ∞ 3 3 s The flow network for the example dependency graph, along with its minimum cut. The cut has capacity 13 and C = 15, so the total profit for the selected jobs is 2. 6 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] We prove our key claim as follows. For any subset A of projects, we define three functions: X X cost(A) := −pi = c(i t) i∈A: pi <0 X benefit(A) := pj = j∈A: pi >0 profit(A) := X i∈A X j∈A c(s j) pi = benefit(A) − cost(A). i∈A By definition, C = benefit(S) + benefit(T ). Because the cut (S, T ) has finite capacity, only edges of the form s j and i t can cross the cut. By construction, every edge s j points to a profitable job and each edge i t points from a costly job. Thus, kS, T k = cost(S) + benefit(T ). We immediately conclude that C − kS, T k = benefit(S) − cost(S) = profit(S), as claimed. Exercises 1. Given an undirected graph G = (V, E), with three vertices u, v, and w, describe and analyze an algorithm to determine whether there is a path from u to w that passes through v. 2. Let G = (V, E) be a directed graph where for each vertex v, the in-degree and out-degree of v are equal. Let u and v be two vertices G, and suppose G contains k edge-disjoint paths from u to v. Under these conditions, must G also contain k edge-disjoint paths from v to u? Give a proof or a counterexample with explanation. 3. Consider a directed graph G = (V, E) with multiple source vertices s1 , s2 , . . . , sσ and multiple target vertices t 1 , t 1 , . . . , t τ , where no vertex is both a source and a target. A multiterminal flow is a function f : E → R≥0 that satisfies the flow conservation constraint at every vertex that is neither a source nor a target. The value | f | of a multiterminal flow is the total excess flow out of all the source vertices: σ X X X | f | := i=1 w f (si w) − u f (usi ) As usual, we are interested in finding flows with maximum value, subject to capacity constraints on the edges. (In particular, we don’t care how much flow moves from any particular source to any particular target.) (a) Consider the following algorithm for computing multiterminal flows. The variables f and f 0 represent flow functions. The subroutine MaxFlow(G, s, t) solves the standard maximum flow problem with source s and target t. MaxMultiFlow(G, s[1 .. σ], t[1 .. τ]): f ←0 〈〈Initialize the flow〉〉 for i ← 1 to σ for j ← 1 to τ f 0 ← MaxFlow(G f , s[i], t[ j]) f ← f + f0 〈〈Update the flow〉〉 return f 7 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] Prove that this algorithm correctly computes a maximum multiterminal flow in G. (b) Describe a more efficient algorithm to compute a maximum multiterminal flow in G. 4. A cycle cover of a given directed graph G = (V, E) is a set of vertex-disjoint cycles that cover all the vertices. Describe and analyze an efficient algorithm to find a cycle cover for a given graph, or correctly report that no cycle cover exists. [Hint: Use bipartite matching!] 5. The Island of Sodor is home to a large number of towns and villages, connected by an extensive rail network. Recently, several cases of a deadly contagious disease (either swine flu or zombies; reports are unclear) have been reported in the village of Ffarquhar. The controller of the Sodor railway plans to close down certain railway stations to prevent the disease from spreading to Tidmouth, his home town. No trains can pass through a closed station. To minimize expense (and public notice), he wants to close down as few stations as possible. However, he cannot close the Ffarquhar station, because that would expose him to the disease, and he cannot close the Tidmouth station, because then he couldn’t visit his favorite pub. Describe and analyze an algorithm to find the minimum number of stations that must be closed to block all rail travel from Ffarquhar to Tidmouth. The Sodor rail network is represented by an undirected graph, with a vertex for each station and an edge for each rail connection between two stations. Two special vertices f and t represent the stations in Ffarquhar and Tidmouth. For example, given the following input graph, your algorithm should return the number 2. f t 6. The UIUC Computer Science Department is installing a mini-golf course in the basement of the Siebel Center! The playing field is a closed polygon bounded by m horizontal and vertical line segments, meeting at right angles. The course has n starting points and n holes, in one-to-one correspondence. It is always possible hit the ball along a straight line directly from each starting point to the corresponding hole, without touching the boundary of the playing field. (Players are not allowed to bounce golf balls off the walls; too much glass.) The n starting points and n holes are all at distinct locations. A minigolf course with five starting points (?) and five holes (◦), and a legal correspondence between them. 8 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] Sadly, the architect’s computer crashed just as construction was about to begin. Thanks to the herculean efforts of their sysadmins, they were able to recover the locations of the starting points and the holes, but all information about which starting points correspond to which holes was lost! Describe and analyze an algorithm to compute a one-to-one correspondence between the starting points and the holes that meets the straight-line requirement, or to report that no such correspondence exists. The input consists of the x- and y-coordinates of the m corners of the playing field, the n starting points, and the n holes. Assume you can determine in constant time whether two line segments intersect, given the x- and y-coordinates of their endpoints. 7. Suppose you are given an n × n checkerboard with some of the squares deleted. You have a large set of dominos, just the right size to cover two squares of the checkerboard. Describe and analyze an algorithm to determine whether one tile the board with dominos—each domino must cover exactly two undeleted squares, and each undeleted square must be covered by exactly one domino. Your input is a two-dimensional array Deleted[1 .. n, 1 .. n] of bits, where Deleted[i, j] = True if and only if the square in row i and column j has been deleted. Your output is a single bit; you do not have to compute the actual placement of dominos. For example, for the board shown above, your algorithm should return True. 8. Suppose we are given an n × n square grid, some of whose squares are colored black and the rest white. Describe and analyze an algorithm to determine whether tokens can be placed on the grid so that • every token is on a white square; • every row of the grid contains exactly one token; and • every column of the grid contains exactly one token. 9 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] Your input is a two dimensional array IsWhite[1 .. n, 1 .. n] of booleans, indicating which squares are white. Your output is a single boolean. For example, given the grid above as input, your algorithm should return True. 9. An n × n grid is an undirected graph with n2 vertices organized into n rows and n columns. We denote the vertex in the ith row and the jth column by (i, j). Every vertex in the grid have exactly four neighbors, except for the boundary vertices, which are the vertices (i, j) such that i = 1, i = n, j = 1, or j = n. Let (x 1 , y1 ), (x 2 , y2 ), . . . , (x m , ym ) be distinct vertices, called terminals, in the n × n grid. The escape problem is to determine whether there are m vertex-disjoint paths in the grid that connect the terminals to any m distinct boundary vertices. Describe and analyze an efficient algorithm to solve the escape problem. A positive instance of the escape problem, and its solution. 10. The UIUC Faculty Senate has decided to convene a committee to determine whether Chief Illiniwek should become the official mascot symbol of the University of Illinois Global Campus.⁵ Exactly one faculty member must be chosen from each academic department to serve on this committee. Some faculty members have appointments in multiple departments, but each committee member will represent only one department. For example, if Prof. Blagojevich is affiliated with both the Department of Corruption and the Department of Stupidity, and he is chosen as the Stupidity representative, then someone else must represent Corruption. Finally, University policy requires that any committee on virtual mascots symbols must contain the same number of assistant professors, associate professors, and full professors. Fortunately, the number of departments is a multiple of 3. Describe an efficient algorithm to select the membership of the Global Illiniwek Committee. Your input is a list of all UIUC faculty members, their ranks (assistant, associate, or full), and their departmental affiliation(s). There are n faculty members and 3k departments. 11. You’re organizing the First Annual UIUC Computer Science 72-Hour Dance Exchange, to be held all day Friday, Saturday, and Sunday. Several 30-minute sets of music will be played during the event, and a large number of DJs have applied to perform. You need to hire DJs according to the following constraints. • Exactly k sets of music must be played each day, and thus 3k sets altogether. ⁵Thankfully, the Global Campus has faded into well-deserved obscurity, thanks in part to the 2009 admissions scandal. Imagine MOOCs, but with the same business model and faculty oversight as the University of Phoenix. 10 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] • Each set must be played by a single DJ in a consistent music genre (ambient, bubblegum, dubstep, horrorcore, hyphy, trip-hop, Nitzhonot, Kwaito, J-pop, Nashville country, . . . ). • Each genre must be played at most once per day. • Each candidate DJ has given you a list of genres they are willing to play. • Each DJ can play at most three sets during the entire event. Suppose there are n candidate DJs and g different musical genres available. Describe and analyze an efficient algorithm that either assigns a DJ and a genre to each of the 3k sets, or correctly reports that no such assignment is possible. 12. The University of Southern North Dakota at Hoople has hired you to write an algorithm to schedule their final exams. Each semester, USNDH offers n different classes. There are r different rooms on campus and t different time slots in which exams can be offered. You are given two arrays E[1 .. n] and S[1 .. r], where E[i] is the number of students enrolled in the ith class, and S[ j] is the number of seats in the jth room. At most one final exam can be held in each room during each time slot. Class i can hold its final exam in room j only if E[i] < S[ j]. Describe and analyze an efficient algorithm to assign a room and a time slot to each class (or report correctly that no such assignment is possible). 13. Suppose you are running a web site that is visited by the same set of people every day. Each visitor claims membership in one or more demographic groups; for example, a visitor might describe himself as male, 40–50 years old, a father, a resident of Illinois, an academic, a blogger, and a fan of Joss Whedon.⁶ Your site is supported by advertisers. Each advertiser has told you which demographic groups should see its ads and how many of its ads you must show each day. Altogether, there are n visitors, k demographic groups, and m advertisers. Describe an efficient algorithm to determine, given all the data described in the previous paragraph, whether you can show each visitor exactly one ad per day, so that every advertiser has its desired number of ads displayed, and every ad is seen by someone in an appropriate demographic group. 14. Suppose we are given an array A[1 .. m][1 .. n] of non-negative real numbers. We want to round A to an integer matrix, by replacing each entry x in A with either bxc or dxe, without changing the sum of entries in any row or column of A. For example: 1.2 3.4 2.4 1 4 2 3.9 4.0 2.1 7−→ 4 4 2 7.9 1.6 0.5 8 1 1 (a) Describe and analyze an efficient algorithm that either rounds A in this fashion, or reports correctly that no such rounding is possible. ? (b) Suppose we are guaranteed that none of the entries in the input matrix A are integers. Describe and analyze an even faster algorithm that either rounds A or reports correctly ⁶Har har har! Mine is an evil laugh! Now die! 11 Algorithms Lecture 24: Applications of Maximum Flow [Fa’13] that no such rounding is possible. For full credit, your algorithm must run in O(mn) time. [Hint: Don’t use flows.] 15. Ad-hoc networks are made up of low-powered wireless devices. In principle⁷, these networks can be used on battlefields, in regions that have recently suffered from natural disasters, and in other hard-to-reach areas. The idea is that a large collection of cheap, simple devices could be distributed through the area of interest (for example, by dropping them from an airplane); the devices would then automatically configure themselves into a functioning wireless network. These devices can communicate only within a limited range. We assume all the devices are identical; there is a distance D such that two devices can communicate if and only if the distance between them is at most D. We would like our ad-hoc network to be reliable, but because the devices are cheap and low-powered, they frequently fail. If a device detects that it is likely to fail, it should transmit its information to some other backup device within its communication range. We require each device x to have k potential backup devices, all within distance D of x; we call these k devices the backup set of x. Also, we do not want any device to be in the backup set of too many other devices; otherwise, a single failure might affect a large fraction of the network. So suppose we are given the communication radius D, parameters b and k, and an array d[1 .. n, 1 .. n] of distances, where d[i, j] is the distance between device i and device j. Describe an algorithm that either computes a backup set of size k for each of the n devices, such that no device appears in more than b backup sets, or reports (correctly) that no good collection of backup sets exists. ? 16. A rooted tree is a directed acyclic graph, in which every vertex has exactly one incoming edge, except for the root, which has no incoming edges. Equivalently, a rooted tree consists of a root vertex, which has edges pointing to the roots of zero or more smaller rooted trees. Describe a polynomial-time algorithm to compute, given two rooted trees A and B, the largest common rooted subtree of A and B. [Hint: Let LCS(u, v) denote the largest common subtree whose root in A is u and whose root in B is v. Your algorithm should compute LCS(u, v) for all vertices u and v using dynamic programming. This would be easy if every vertex had O(1) children, and still straightforward if the children of each node were ordered from left to right and the common subtree had to respect that ordering. But for unordered trees with large degree, you need another trick to combine recursive subproblems efficiently. Don’t waste your time trying to reduce the polynomial running time.] ⁷but not really in practice © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 12 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] “Who are you?" said Lunkwill, rising angrily from his seat. “What do you want?" “I am Majikthise!" announced the older one. “And I demand that I am Vroomfondel!" shouted the younger one. Majikthise turned on Vroomfondel. “It’s alright," he explained angrily, “you don’t need to demand that." “Alright!" bawled Vroomfondel banging on an nearby desk. “I am Vroomfondel, and that is not a demand, that is a solid fact! What we demand is solid facts!" “No we don’t!" exclaimed Majikthise in irritation. “That is precisely what we don’t demand!" Scarcely pausing for breath, Vroomfondel shouted, “We don’t demand solid facts! What we demand is a total absence of solid facts. I demand that I may or may not be Vroomfondel!" — Douglas Adams, The Hitchhiker’s Guide to the Galaxy (1979) ? 25 Extensions of Maximum Flow 25.1 Maximum Flows with Edge Demands Now suppose each directed edge e in has both a capacity c(e) and a demand d(e) ≤ c(e), and we want a flow f of maximum value that satisfies d(e) ≤ f (e) ≤ c(e) at every edge e. We call a flow that satisfies these constraints a feasible flow. In our original setting, where d(e) = 0 for every edge e, the zero flow is feasible; however, in this more general setting, even determining whether a feasible flow exists is a nontrivial task. Perhaps the easiest way to find a feasible flow (or determine that none exists) is to reduce the problem to a standard maximum flow problem, as follows. The input consists of a directed graph G = (V, E), nodes s and t, demand function d : E → R, and capacity function c : E → R. Let D denote the sum of all edge demands in G: X D := d(u v). u v∈E We construct a new graph G 0 = (V 0 , E 0 ) from G by adding new source and target vertices s0 and t 0 , adding edges from s0 to each vertex in V , adding edges from each vertex in V to t 0 , and finally adding an edge from t to s. We also define a new capacity function c 0 : E 0 → R as follows: • For each vertex v ∈ V , we set c 0 (s0 v) = P u∈V d(u v) and c 0 (v t 0 ) = P • For each edge u v ∈ E, we set c 0 (u v) = c(u v) − d(u v). w∈V d(v w). • Finally, we set c 0 (t s) = ∞. Intuitively, we construct G 0 by replacing any edge u v in G with three edges: an edge u v with capacity c(u v) − d(u v), an edge s0 v with capacity d(u v), and an edge u t 0 with capacity d(u v). If this construction produces multiple edges from s0 to the same vertex v (or to t 0 from the same vertex v), we merge them into a single edge with the same total capacity. In G 0 , the total capacity out of s0 and the total capacity into t 0 are both equal to D. We call a flow with value exactly D a saturating flow, since it saturates all the edges leaving s0 or entering t 0 . If G 0 has a saturating flow, it must be a maximum flow, so we can find it using any max-flow algorithm. © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 1 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] 0..5 6..20 2..15 0..15 s 5..10 5..10 4..10 t 7..20 3..10 8 11 4 5 5 7 3 13 s s' 13 5 14 6 10 t 6 t' 14 7 10 3 3 ∞ A flow network G with demands and capacities (written d .. c ), and the transformed network G 0 . Lemma 1. G has a feasible (s, t)-flow if and only if G 0 has a saturating (s0 , t 0 )-flow. Proof: Let f : E → R be a feasible (s, t)-flow in the original graph G. Consider the following function f 0 : E 0 → R: f 0 (u v) = f (u v) − d(u v) X f 0 (s0 v) = d(u v) for all u v ∈ E for all v ∈ V u∈V f 0 (v t 0 ) = X d(u → w) for all v ∈ V w∈V f 0 (t s) = | f | We easily verify that f 0 is a saturating (s0 , t 0 )-flow in G. The admissibility of f implies that f (e) ≥ d(e) for every edge e ∈ E, so f 0 (e) ≥ 0 everywhere. Admissibility also implies f (e) ≤ c(e) for every edge e ∈ E, so f 0 (e) ≤ c 0 (e) everywhere. Tedious algebra implies that X X f 0 (u v) = f (v w) u∈V 0 w∈V 0 for every vertex v ∈ V (including s and t). Thus, f 0 is a legal (s0 , t 0 )-flow, and every edge out of s0 or into t 0 is clearly saturated. Intuitively, f 0 diverts d(u v) units of flow from u directly to the new target t 0 , and injects the same amount of flow into v directly from the new source s0 . The same tedious algebra implies that for any saturating (s0 , t 0 )-flow f 0 : E 0 → R for G 0 , the function f = f 0 | E + d is a feasible (s, t)-flow in G. 2 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] Thus, we can compute a feasible (s, t)-flow for G, if one exists, by searching for a maximum (s , t 0 )-flow in G 0 and checking that it is saturating. Once we’ve found a feasible (s, t)-flow in G, we can transform it into a maximum flow using an augmenting-path algorithm, but with one small change. To ensure that every flow we consider is feasible, we must redefine the residual capacity of an edge as follows: c(u v) − f (u v) if u v ∈ E, c f (u v) = f (v u) − d(v u) if v u ∈ E, 0 otherwise. 0 Otherwise, the algorithm is unchanged. If we use the Dinitz/Edmonds-Karp fat-pipe algorithm, we get an overall running time of O(V E 2 ). 8/8 11/11 4/4 5/5 7/7 2/5 3/3 0/13 s s' 3/13 0/5 0/14 t 0/6 0/6 t' 0/14 7/7 10/10 10/10 3/3 3/3 11/∞ 3 2/0..5 7/7..20 2 5/2..15 1/1..15 s 5/5..10 4/4..10 4/4..10 3 13 s t 5 10 14 6 6/6..20 t 6 7 14 10/3..10 A saturating flow f 0 in G 0 , the corresponding feasible flow f in G , and the corresponding residual network G f . 25.2 Node Supplies and Demands Another useful variant to consider allows flow to be injected or extracted from the flow network at vertices other than s or t. Let x : (V \ {s, t}) → R be an excess function describing how much flow is to be injected (or extracted if the value is negative) at each vertex. We now want a maximum ‘flow’ that satisfies the variant balance condition X X f (u v) − f (v w) = x(v) u∈V w∈V for every node v except s and t, or prove that no such flow exists. As above, call such a function f a feasible flow. 3 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] As for flows with edge demands, the only real difficulty in finding a maximum flow under these modified constraints is finding a feasible flow (if one exists). We can reduce this problem to a standard max-flow problem, just as we did for edge demands. To simplify the transformation, let us assume without loss of generality that the total excess P in the network is zero: v x(v) = 0. If the total excess is positive, P we add an infinite capacity edge t ˜t , where ˜t is a new target node, and set x(t) = − v x(v). Similarly, if the total excess is negative, we add an infinite capacity edge ˜ss, where ˜s is a new source node, and P set x(s) = − v x(v). In both cases, every feasible flow in the modified graph corresponds to a feasible flow in the original graph. As before, we modify G to obtain a new graph G 0 by adding a new source s0 , a new target t 0 , an infinite-capacity edge t s from the old target to the old source, and several edges from s0 and to t 0 . Specifically, for each vertex v, if x(v) > 0, we add a new edge s0 v with capacity x(v), and if x(v) < 0, we add an edge v t 0 with capacity −x(v). As before, we call an (s0 , t 0 )-flow in G 0 saturating if every edge leaving s0 or entering t 0 is saturated; any saturating flow is a maximum flow. It is easy to check that saturating flows in G 0 are in direct correspondence with feasible flows in G; we leave details as an exercise (hint, hint). Similar reductions allow us to several other variants of the maximum flow problem using the same path-augmentation techniques. For example, we could associate capacities and demands with the vertices instead of (or in addition to) the edges, as well as a range of excesses with every vertex, instead of a single excess value. 25.3 Minimum-Cost Flows Now imagine that each edge e in the network has both a capacity c(e) and a cost $(e). The cost function describes the cost of sending a unit of flow through the edges; thus, the cost any flow f is defined as follows: X $( f ) = $(e) · f (e). e∈E The minimum-cost maximum-flow problem is to compute a maximum flow of minimum cost. If the network has only one maximum flow, that’s what we want, but if there is more than one maximum flow, we want the maximum flow whose cost is as small as possible. Costs can either be positive, negative, or zero. However, if an edge u v and its reversal v u both appear in the graph, their costs must sum to zero: $(u v) = −$(v u). Otherwise, we could make an infinite profit by pushing flow back and forth along the edge! Each augmentation step in the standard Ford-Fulkerson algorithm both increases the value of the flow and changes its cost. If the total cost of the augmenting path is positive, the cost of the flow decreases; conversely, if the total cost of the augmenting path is negative, the cost of the flow decreases. We can also change the cost of the flow without changing its value, by augmenting along a directed cycle in the residual graph. Again, augmenting along a negative-cost cycle decreases the cost of the flow, and augmenting along a positive-cost cycle increases the cost of the flow. It follows immediately that a flow f is a minimum-cost maximum flow in G if and only if the residual graph G f has no directed paths from s to t and no negative-cost cycles. We can compute a min-cost max-flow using the so-called cycle cancelling algorithm first proposed by Morton Klein in 1967. The algorithm has two phases; in the first, we compute an arbitrary maximum flow f , using any method we like. The second phase repeatedly decreases the cost of f , by augmenting f along a negative-cost cycle in the residual graph G f , until no such 4 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] cycle exists. As in Ford-Fulkerson, the amount of flow we push around each cycle is equal to the minimum residual capacity of any edge on the cycle. In each iteration of the second phase, we can use a modification of Shimbel’s shortest path algorithm (often called “Bellman-Ford”) to find a negative-cost cycle in O(V E) time. To bound the number of iterations in the second phase, we assume that both the capacity and the cost of each edge is an integer, and we define C = max c(e) e∈E and D = max |$(e)|. e∈E The cost of any feasible flow is clearly between −EC D and EC D, and each augmentation step decreases the cost of the flow by a positive integer, and therefore by at least 1. We conclude that the second phase requires at most 2EC D iterations, and therefore runs in O(V E 2 C D) time. As with the raw Ford-Fulkerson algorithm, this running time is exponential in the complexity of the input, and it may never terminate if the capacities and/or costs are irrational. Like Ford-Fulkerson, more careful choices of which cycle to cancel can lead to more efficient algorithms. Unfortunately, some obvious choices are NP-hard to compute, including the cycle with most negative cost and the negative cycle with the fewest edges. In the late 1980s, Andrew Goldberg and Bob Tarjan developed a min-cost flow algorithm that repeatedly cancels the so-called minimum-mean cycle, which is the cycle whose average cost per edge is smallest. By combining an algorithm of Karp to compute minimum-mean cycles in O(EV ) time, efficient dynamic tree data structures, and other sophisticated techniques that are (unfortunately) beyond the scope of this class, their algorithm achieves a running time of O(E 2 V log2 V). The fastest min-cost max-flow algorithm currently known,¹ due to James Orlin, reduces the problem to O(E log V ) iterations of Dijkstra’s shortest-path algorithm; Orlin’s algorithm runs in O(E 2 log V + E V log2 V) time. 25.4 Maximum-Weight Matchings Recall from the previous lecture that we can find a maximum-cardinality matching in any bipartite graph in O(V E) time by reduction to the standard maximum flow problem. Now suppose the input graph has weighted edges, and we want to find the matching with maximum total weight. Given a bipartite graph G = (U ×W, E) and a non-negativePweight function w: E → R, the goal is to compute a matching M whose total weight w(M ) = uw∈M w(uw) is as large as possible. Max-weight matchings can’t be found directly using standard max-flow algorithms², but we can modify the algorithm for maximum-cardinality matchings described above. It will be helpful to reinterpret the behavior of our earlier algorithm directly in terms of the original bipartite graph instead of the derived flow network. Our algorithm maintains a matching M , which is initially empty. We say that a vertex is matched if it is an endpoint of an edge in M . At each iteration, we find an alternating path π that starts and ends at unmatched vertices and alternates between edges in E \ M and edges in M . Equivalently, let G M be the directed graph obtained by orienting every edge in M from W to U, and every edge in E \ M from U to W . An alternating path is just a directed path in G M between two unmatched vertices. Any alternating path has odd length and has exactly one more edge in E \ M than in M . The iteration ends by setting M ← M ⊕ π, thereby increasing the number of edges in M by one. The max-flow/min-cut theorem implies that when there are no more alternating paths, M is a maximum matching. ¹at least, among algorithms whose running times do not depend on C and D ²However, max-flow algorithms can be modified to P compute maximum weighted flows, where every edge has both a capacity and a weight, and the goal is to maximize u v w(u v) f (u v). 5 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] A matching M with 5 edges, an alternating path π, and the augmented matching M ⊕ π with 6 edges. If the edges of G are weighted, we only need to make two changes to the algorithm. First, instead of looking for an arbitrary alternating path at each iteration, we look for the alternating path π such that M ⊕ π has largest weight. Suppose we weight the edges in the residual graph G M as follows: w0 (uw) = −w(uw) 0 w (wu) = w(uw) for all uw 6∈ M for all uw ∈ M We now have w(M ⊕ π) = w(M ) − w0 (π). Thus, the correct augmenting path π must be the directed path in G M with minimum total residual weight w 0 (π). Second, because the matching with the maximum weight may not be the matching with the maximum cardinality, we return the heaviest matching considered in any iteration of the algorithm. 2 3 3 10 5 2 10 3 5 3 A maximum-weight matching is not necessarily a maximum-cardinality matching. Before we determine the running time of the algorithm, we need to check that it actually finds the maximum-weight matching. After all, it’s a greedy algorithm, and greedy algorithms don’t work unless you prove them into submission! Let Mi denote the maximum-weight matching in G with exactly i edges. In particular, M0 = ∅, and the global maximum-weight matching is equal to Mi for some i. (The figure above show M1 and M2 for the same graph.) Let Gi denote the directed residual graph for Mi , let w i denote the residual weight function for Mi as defined above, and let πi denote the directed path in Gi such that w i (πi ) is minimized. To simplify the proof, I will assume that there is a unique maximum-weight matching Mi of any particular size; this assumption can be enforced by applying a consistent tie-breaking rule. With this assumption in place, the correctness of our algorithm follows inductively from the following lemma. Lemma 2. If G contains a matching with i + 1 edges, then Mi+1 = Mi ⊕ πi . Proof: I will prove the equivalent statement Mi+1 ⊕ Mi = πi−1 . To simplify notation, call an edge in Mi+1 ⊕ Mi red if it is an edge in Mi+1 , and blue if it is an edge in Mi . The graph Mi+1 ⊕ Mi has maximum degree 2, and therefore consists of pairwise disjoint paths and cycles, each of which alternates between red and blue edges. Since G is bipartite, every cycle 6 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] must have even length. The number of edges in Mi+1 ⊕ Mi is odd; specifically, Mi+1 ⊕ Mi has 2i + 1 − 2k edges, where k is the number of edges that are in both matchings. Thus, Mi+1 ⊕ Mi contains an odd number of paths of odd length, some number of paths of even length, and some number of cycles of even length. Let γ be a cycle in Mi+1 ⊕ Mi . Because γ has an equal number of edges from each matching, Mi ⊕γ is another matching with i edges. The total weight of this matching is exactly w(Mi ) − w i (γ), which must be less than w(Mi ), so w i (γ) must be positive. On the other hand, Mi+1 ⊕ γ is a matching with i + 1 edges whose total weight is w(Mi+1 ) + w i (γ) < w(Mi+1 ), so w i (γ) must be negative! We conclude that no such cycle γ exists; Mi+1 ⊕ Mi consists entirely of disjoint paths. Exactly the same reasoning implies that no path in Mi+1 ⊕ Mi has an even number of edges. Finally, since the number of red edges in Mi+1 ⊕ Mi is one more than the number of blue edges, the number of paths that start with a red edge is exactly one more than the number of paths that start with a blue edge. The same reasoning as above implies that Mi+1 ⊕ Mi does not contain a blue-first path, because we can pair it up with a red-first path. We conclude that Mi+1 ⊕ Mi consists of a single alternating path π whose first edge is red. Since w(Mi+1 ) = w(Mi ) − w i (π), the path π must be the one with minimum weight w i (π). We can find the alternating path πi using a single-source shortest path algorithm. Modify the residual graph Gi by adding zero-weight edges from a new source vertex s to every unmatched node in U, and from every unmatched node in W to a new target vertex t, exactly as in out unweighted matching algorithm. Then πi is the shortest path from s to t in this modified graph. Since Mi is the maximum-weight matching with i vertices, Gi has no negative cycles, so this shortest path is well-defined. We can compute the shortest path in Gi in O(V E) time using Shimbel’s algorithm, so the overall running time our algorithm is O(V 2 E). The residual graph Gi has negative-weight edges, so we can’t speed up the algorithm by replacing Shimbel’s algorithm with Dijkstra’s. However, we can use a variant of Johnson’s all-pairs shortest path algorithm to improve the running time to O(V E + V 2 log V). Let di (v) denote the ˜ i denote distance from s to v in the residual graph Gi , using the distance function w i . Let w ˜ i (u v) = di−1 (u) + w i (u v) − di−1 (v). As we argued in the the modified distance function w discussion of Johnson’s algorithm, shortest paths with respect to w i are still shortest paths with ˜ i . Moreover, w ˜ i (u v) > 0 for every edge u v in Gi : respect to w • If u v is an edge in Gi−1 , then w i (u v) = w i−1 (u v) and di−1 (v) ≤ di−1 (u) + w i−1 (u v). • If u v is not in Gi−1 , then w i (u v) = −w i−1 (v u) and v u is an edge in the shortest path πi−1 , so di−1 (u) = di−1 (v) + w i−1 (v u). ˜ i. Let d˜i (v) denote the shortest path distance from s to v with respect to the distance function w ˜ i is positive everywhere, we can quickly compute d˜i (v) for all v using Dijkstra’s algorithm. Because w This gives us both the shortest alternating path πi and the distances di (v) = d˜i (v) + di−1 (v) needed for the next iteration. Exercises 1. Suppose we are given a directed graph G = (V, E), two vertices s an t, and a capacity function c : V → R+ . A flow f is feasible if the total flow into every vertex v is at most c(v): X f (u v) ≤ c(v) for every vertex v. u 7 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] Describe and analyze an efficient algorithm to compute a feasible flow of maximum value. 2. Suppose we are given an n × n grid, some of whose cells are marked; the grid is represented by an array M [1 .. n, 1 .. n] of booleans, where M [i, j] = True if and only if cell (i, j) is marked. A monotone path through the grid starts at the top-left cell, moves only right or down at each step, and ends at the bottom-right cell. Our goal is to cover the marked cells with as few monotone paths as possible. Greedily covering the marked cells in a grid with four monotone paths. (a) Describe an algorithm to find a monotone path that covers the largest number of marked cells. (b) There is a natural greedy heuristic to find a small cover by monotone paths: If there are any marked cells, find a monotone path π that covers the largest number of marked cells, unmark any cells covered by π those marked cells, and recurse. Show that this algorithm does not always compute an optimal solution. (c) Describe and analyze an efficient algorithm to compute the smallest set of monotone paths that covers every marked cell. 3. Suppose we are given a set of boxes, each specified by their height, width, and depth in centimeters. All three side lengths of every box lie strictly between 10cm and 20cm. As you should expect, one box can be placed inside another if the smaller box can be rotated so that its height, width, and depth are respectively smaller than the height, width, and depth of the larger box. Boxes can be nested recursively. Call a box is visible if it is not inside another box. Describe and analyze an algorithm to nest the boxes so that the number of visible boxes is as small as possible. 4. Let G be a directed flow network whose edges have costs, but which contains no negativecost cycles. Prove that one can compute a minimum-cost maximum flow in G using a variant of Ford-Fulkerson that repeatedly augments the (s, t)-path of minimum total cost in the current residual graph. What is the running time of this algorithm? 5. An (s , t )-series-parallel graph is an directed acyclic graph with two designated vertices s (the source) and t (the target or sink) and with one of the following structures: • Base case: A single directed edge from s to t. • Series: The union of an (s, u)-series-parallel graph and a (u, t)-series-parallel graph that share a common vertex u but no other vertices or edges. 8 Algorithms Lecture 25: Extensions of Maximum Flow [Faâ13] • Parallel: The union of two smaller (s, t)-series-parallel graphs with the same source s and target t, but with no other vertices or edges in common. (a) Describe an efficient algorithm to compute a maximum flow from s to t in an (s, t)-series-parallel graph with arbitrary edge capacities. (b) Describe an efficient algorithm to compute a minimum-cost maximum flow from s to t in an (s, t)-series-parallel graph whose edges have unit capacity and arbitrary costs. ? (c) Describe an efficient algorithm to compute a minimum-cost maximum flow from s to t in an (s, t)-series-parallel graph whose edges have arbitrary capacities and costs. © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 9 Algorithms Lecture 26: Linear Programming [Fa ’13] The greatest flood has the soonest ebb; the sorest tempest the most sudden calm; the hottest love the coldest end; and from the deepest desire oftentimes ensues the deadliest hate. — Socrates Th’ extremes of glory and of shame, Like east and west, become the same. — Samuel Butler, Hudibras Part II, Canto I (c. 1670) Extremes meet, and there is no better example than the haughtiness of humility. — Ralph Waldo Emerson, “Greatness”, in Letters and Social Aims (1876) ? 26 Linear Programming The maximum flow/minimum cut problem is a special case of a very general class of problems called linear programming. Many other optimization problems fall into this class, including minimum spanning trees and shortest paths, as well as several common problems in scheduling, logistics, and economics. Linear programming was used implicitly by Fourier in the early 1800s, but it was first formalized and applied to problems in economics in the 1930s by Leonid Kantorovich. Kantorivich’s work was hidden behind the Iron Curtain (where it was largely ignored) and therefore unknown in the West. Linear programming was rediscovered and applied to shipping problems in the early 1940s by Tjalling Koopmans. The first complete algorithm to solve linear programming problems, called the simplex method, was published by George Dantzig in 1947. Koopmans first proposed the name “linear programming" in a discussion with Dantzig in 1948. Kantorovich and Koopmans shared the 1975 Nobel Prize in Economics “for their contributions to the theory of optimum allocation of resources”. Dantzig did not; his work was apparently too pure. Koopmans wrote to Kantorovich suggesting that they refuse the prize in protest of Dantzig’s exclusion, but Kantorovich saw the prize as a vindication of his use of mathematics in economics, which his Soviet colleagues had written off as “a means for apologists of capitalism”. A linear programming problem asks for a vector x ∈ Rd that maximizes (or equivalently, minimizes) a given linear function, among all vectors x that satisfy a given set of linear inequalities. The general form of a linear programming problem is the following: maximize d X cj x j j=1 subject to d X ai j x j ≤ bi for each i = 1 .. p ai j x j = bi for each i = p + 1 .. p + q ai j x j ≥ bi for each i = p + q + 1 .. n j=1 d X j=1 d X j=1 © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 1 Algorithms Lecture 26: Linear Programming [Fa ’13] Here, the input consists of a matrix A = (ai j ) ∈ Rn×d , a column vector b ∈ Rn , and a row vector c ∈ Rd . Each coordinate of the vector x is called a variable. Each of the linear inequalities is called a constraint. The function x 7→ c · x is called the objective function. I will always use d to denote the number of variables, also known as the dimension of the problem. The number of constraints is usually denoted n. A linear programming problem is said to be in canonical form¹ if it has the following structure: maximize d X cj x j j=1 subject to d X ai j x j ≤ bi for each i = 1 .. n xj ≥ 0 for each j = 1 .. d j=1 We can express this canonical form more compactly as follows. For two vectors x = (x 1 , x 2 , . . . , x d ) and y = ( y1 , y2 , . . . , yd ), the expression x ≥ y means that x i ≥ yi for every index i. max c · x s.t. Ax ≤ b x≥0 Any linear programming problem can be converted into canonical form as follows: • For each variable x j , add the equality constraint x j = x +j − x −j and the inequalities x +j ≥ 0 and x −j ≥ 0. P P • Replace any equality constraint j ai j x j = bi with two inequality constraints j ai j x j ≥ bi P and j ai j x j ≤ bi . P P • Replace any upper bound j ai j x j ≥ bi with the equivalent lower bound j −ai j x j ≤ −bi . This conversion potentially double the number of variables and the number of constraints; fortunately, it is rarely necessary in practice. Another useful format for linear programming problems is slack form², in which every inequality is of the form x j ≥ 0: max c · x s.t. Ax = b x≥0 It’s fairly easy to convert any linear programming problem into slack form. Slack form is especially useful in executing the simplex algorithm (which we’ll see in the next lecture). 26.1 The Geometry of Linear Programming A point x ∈ Rd is feasible with respect to some linear programming problem if it satisfies all the linear constraints. The set of all feasible points is called the feasible region for that linear program. ¹Confusingly, some authors call this standard form. ²Confusingly, some authors call this standard form. 2 Algorithms Lecture 26: Linear Programming [Fa ’13] The feasible region has a particularly nice geometric structure that lends some useful intuition to the linear programming algorithms we’ll see later. Any linear equation in d variables defines a hyperplane in Rd ; think of a line when d = 2, or a plane when d = 3. This hyperplane divides Rd into two halfspaces; each halfspace is the set of points that satisfy some linear inequality. Thus, the set of feasible points is the intersection of several hyperplanes (one for each equality constraint) and halfspaces (one for each inequality constraint). The intersection of a finite number of hyperplanes and halfspaces is called a polyhedron. It’s not hard to verify that any halfspace, and therefore any polyhedron, is convex—if a polyhedron contains two points x and y, then it contains the entire line segment x y. A two-dimensional polyhedron (white) defined by 10 linear inequalities. By rotating Rd (or choosing a coordinate frame) so that the objective function points downward, we can express any linear programming problem in the following geometric form: Find the lowest point in a given polyhedron. With this geometry in hand, we can easily picture two pathological cases where a given linear programming problem has no solution. The first possibility is that there are no feasible points; in this case the problem is called infeasible. For example, the following LP problem is infeasible: maximize x − y subject to 2x + y ≤ 1 x+ y ≥2 x, y ≥ 0 An infeasible linear programming problem; arrows indicate the constraints. The second possibility is that there are feasible points at which the objective function is arbitrarily large; in this case, we call the problem unbounded. The same polyhedron could be unbounded for some objective functions but not others, or it could be unbounded for every objective function. 3 Algorithms Lecture 26: Linear Programming [Fa ’13] A two-dimensional polyhedron (white) that is unbounded downward but bounded upward. 26.2 Example 1: Shortest Paths We can compute the length of the shortest path from s to t in a weighted directed graph by solving the following very simple linear programming problem. maximize dt subject to ds = 0 d v − du ≤ `u v for every edge u v Here, `u v is the length of the edge u v. Each variable d v represents a tentative shortest-path distance from s to v. The constraints mirror the requirement that every edge in the graph must be relaxed. These relaxation constraints imply that in any feasible solution, d v is at most the shortest path distance from s to v. Thus, somewhat counterintuitively, we are correctly maximizing the objective function to compute the shortest path! In the optimal solution, the objective function d t is the actual shortest-path distance from s to t, but for any vertex v that is not on the shortest path from s to t, d v may be an underestimate of the true distance from s to v. However, we can obtain the true distances from s to every other vertex by modifying the objective function: X maximize dv v subject to ds = 0 d v − du ≤ `u v for every edge u v There is another formulation of shortest paths as an LP minimization problem using an indicator variable x u v for each edge u v. X minimize `u v · x u v subject to u X u X u u v x us − X x u t − X x u v − X X w w w x sw = 1 x t w = −1 x v w = 0 for every vertex v 6= s, t x u v ≥ 0 for every edge u v Intuitively, x u v = 1 means u v lies on the shortest path from s to t, and x u v = 0 means u v does not lie on this shortest path. The constraints merely state that the path should start at s, end at t, and either pass through or avoid every other vertex v. Any path from s to t—in particular, the shortest path—clearly implies a feasible point for this linear program. 4 Algorithms Lecture 26: Linear Programming [Fa ’13] However, there are other feasible solutions, possibly even optimal solutions, with non-integral values that do not represent paths. Nevertheless, there is always an optimal solution in which every x e is either 0 or 1 and the edges e with x e = 1 comprise the shortest path. (This fact is by no means obvious, but a proof is beyond the scope of these notes.) Moreover, in any optimal solution, even if not every x e is an integer, the objective function gives the shortest path distance! 26.3 Example 2: Maximum Flows and Minimum Cuts Recall that the input to the maximum (s, t)-flow problem consists of a weighted directed graph G = (V, E), two special vertices s and t, and a function assigning a non-negative capacity ce to each edge e. Our task is to choose the flow f e across each edge e, as follows: X X maximize f sw − fus subject to w X w f v w − u X u fu v = 0 for every vertex v 6= s, t fu v ≤ cu v for every edge u v fu v ≥ 0 for every edge u v Similarly, the minimum cut problem can be formulated using ‘indicator’ variables similarly to the shortest path problem. We have a variable S v for each vertex v, indicating whether v ∈ S or v ∈ T , and a variable X u v for each edge u v, indicating whether u ∈ S and v ∈ T , where (S, T ) is some (s, t)-cut.³ X minimize cu v · X u v subject to u v X u v + S v − Su ≥ 0 for every edge u v X u v ≥ 0 for every edge u v Ss = 1 St = 0 Like the minimization LP for shortest paths, there can be optimal solutions that assign fractional values to the variables. Nevertheless, the minimum value for the objective function is the cost of the minimum cut, and there is an optimal solution for which every variable is either 0 or 1, representing an actual minimum cut. No, this is not obvious; in particular, my claim is not a proof! 26.4 Linear Programming Duality Each of these pairs of linear programming problems is related by a transformation called duality. For any linear programming problem, there is a corresponding dual linear program that can be obtained by a mechanical translation, essentially by swapping the constraints and the variables. ³These two linear programs are not quite syntactic duals; I’ve added two redundant variables Ss and S t to the min-cut program to increase readability. 5 Algorithms Lecture 26: Linear Programming [Fa ’13] The translation is simplest when the LP is in canonical form: Dual (q) Primal (Π) max c · x s.t. Ax ≤ b x≥0 min y · b s.t. yA≥ c y≥0 ⇐⇒ We can also write the dual linear program in exactly the same canonical form as the primal, by swapping the coefficient vector c and the objective vector b, negating both vectors, and replacing the constraint matrix A with its negative transpose.⁴ Dual (q) Primal (Π) max c · x s.t. Ax≤ b x≥ 0 ⇐⇒ max −b> · y > s.t. −A> y > ≤ −c y >≥ 0 Written in this form, it should be immediately clear that duality is an involution: The dual of the dual linear program q is identical to the primal linear program Π. The choice of which LP to call the ‘primal’ and which to call the ‘dual’ is totally arbitrary.⁵ The Fundamental Theorem of Linear Programming. A linear program Π has an optimal solution x ∗ if and only if the dual linear program q has an optimal solution y ∗ such that c · x ∗ = y ∗ Ax ∗ = y ∗ · b. The weak form of this theorem is trivial to prove. Weak Duality Theorem. If x is a feasible solution for a canonical linear program Π and y is a feasible solution for its dual q, then c · x ≤ yAx ≤ y · b. Proof: Because x is feasible for Π, we have Ax ≤ b. Since y is positive, we can multiply both sides of the inequality to obtain yAx ≤ y · b. Conversely, y is feasible for q and x is positive, so yAx ≥ c · x. It immediately follows that if c · x = y · b, then x and y are optimal solutions to their respective linear programs. This is in fact a fairly common way to prove that we have the optimal value for a linear program. ⁴For the notational purists: In these formulations, x and b are column vectors, and y and c are row vectors. This is a somewhat nonstandard choice. Yes, that means the dot in c · x is redundant. Sue me. ⁵For historical reasons, maximization LPs tend to be called ‘primal’ and minimization LPs tend to be called ‘dual’. This is a pointless religious tradition, nothing more. Duality is a relationship between LP problems, not a type of LP problem. 6 Algorithms 26.5 Lecture 26: Linear Programming [Fa ’13] Duality Example Before I prove the stronger duality theorem, let me first provide some intuition about where this duality thing comes from in the first place.⁶ Consider the following linear programming problem: maximize 4x 1 + x 2 + 3x 3 subject to x 1 + 4x 2 ≤2 3x 1 − x 2 + x 3 ≤ 4 x1, x2, x3 ≥ 0 Let σ∗ denote the optimum objective value for this LP. The feasible solution x = (1, 0, 0) gives us a lower bound σ∗ ≥ 4. A different feasible solution x = (0, 0, 3) gives us a better lower bound σ∗ ≥ 9. We could play this game all day, finding different feasible solutions and getting ever larger lower bounds. How do we know when we’re done? Is there a way to prove an upper bound on σ∗ ? In fact, there is. Let’s multiply each of the constraints in our LP by a new non-negative scalar value yi : maximize 4x 1 + x 2 + 3x 3 subject to y1 ( x 1 + 4x 2 ) ≤ 2 y1 y2 (3x 1 − x 2 + x 3 ) ≤ 4 y2 x1, x2, x3 ≥ 0 Because each yi is non-negative, we do not reverse any of the inequalities. Any feasible solution (x 1 , x 2 , x 3 ) must satisfy both of these inequalities, so it must also satisfy their sum: ( y1 + 3 y2 )x 1 + (4 y1 − y2 )x 2 + y2 x 3 ≤ 2 y1 + 4 y2 . Now suppose that each yi is larger than the ith coefficient of the objective function: y1 + 3 y2 ≥ 4, 4 y1 − y2 ≥ 1, y2 ≥ 3. This assumption lets us derive an upper bound on the objective value of any feasible solution: 4x 1 + x 2 + 3x 3 ≤ ( y1 + 3 y2 )x 1 + (4 y1 − y2 )x 2 + y2 x 3 ≤ 2 y1 + 4 y2 . (∗) In particular, by plugging in the optimal solution (x 1∗ , x 2∗ , x 3∗ ) for the original LP, we obtain the following upper bound on σ∗ : σ∗ = 4x 1∗ + x 2∗ + 3x 3∗ ≤ 2 y1 + 4 y2 . Now it’s natural to ask how tight we can make this upper bound. How small can we make the expression 2 y1 + 4 y2 without violating any of the inequalities we used to prove the upper bound? This is just another linear programming problem. minimize subject to 2 y1 + 4 y2 y1 + 3 y2 ≥ 4 4 y1 − y2 ≥ 1 y2 ≥ 3 y1 , y2 ≥ 0 ⁶This example is taken from Robert Vanderbei’s excellent textbook Linear Programming: Foundations and Extensions [Springer, 2001], but the idea appears earlier in Jens Clausen’s 1997 paper ‘Teaching Duality in Linear Programming: The Multiplier Approach’. 7 Algorithms Lecture 26: Linear Programming [Fa ’13] In fact, this is precisely the dual of our original linear program! Moreover, inequality (∗) is just an instantiation of the Weak Duality Theorem. 26.6 Strong Duality The Fundamental Theorem can be rephrased in the following form: Strong Duality Theorem. If x ∗ is an optimal solution for a canonical linear program Π, then there is an optimal solution y ∗ for its dual q, such that c · x ∗ = y ∗ Ax ∗ = y ∗ · b. Proof (sketch): I’ll prove the theorem only for non-degenerate linear programs, in which (a) the optimal solution (if one exists) is a unique vertex of the feasible region, and (b) at most d constraint hyperplanes pass through any point. These non-degeneracy assumptions are relatively easy to enforce in practice and can be removed from the proof at the expense of some technical detail. I will also prove the theorem only for the case n ≥ d; the argument for under-constrained LPs is similar (if not simpler). To develop some intuition, let’s first consider the very special case where x ∗ = (0, 0, . . . , 0). Let ei denote the ith standard basis vector, whose ith coordinate is 1 and all other coordinates are 0. Because x i∗ = 0 for all i, our non-degeneracy assumption implies the strict inequality ai · x ∗ < bi for all i. Thus, any sufficiently small ball around the origin does not intersect any other constraint hyperplane ai · x = bi . Thus, for all i, and for any sufficiently small δ > 0, the vector δei is feasible. Because x ∗ is the unique optimum, we must have δci = c · (δei ) < c · x ∗ = 0. We conclude that ci < 0 for all i. Now let y = (0, 0, . . . , 0) as well. We immediately observe that yA ≥ c and y ≥ 0; in other words, y is a feasible solution for the dual linear program q. But y · b = 0 = c · x ∗ , so the weak duality theorem implies that y is an optimal solution to q, and the proof is complete for this very special case. Now let us consider the more general case. Let x ∗ be the optimal solution for the linear program Π; our non-degeneracy assumption implies that this solution is unique, and that exactly d of the n linear constraints are satisfied with equality. Without loss of generality (by permuting the constraints and possibly changing coordinates), we can assume that these are the first d constraints. Thus, we have ai · x ∗ = bi ∗ ai · x < bi for all i ≤ d, for all i ≥ d + 1, where ai denotes the ith row of A. Let A• denote the d × d matrix containing the first d rows of A. Our non-degeneracy assumption implies that A• has full rank, and thus has a well-defined inverse V = A−1 • . Now define a vector y ∈ Rn by setting y j := c · v j for all j ≤ d, y j := 0 for all j ≥ d + 1, j j where v j denotes the jth column of V = A−1 • . Note that ai · v = 0 if i 6= j, and ai · v = 1 if i = j. To simplify notation, let y• = ( y1 , y2 , . . . , yd ) and let b• = (b1 , b2 , . . . , bd ) = A• x ∗ . Because yi = 0 for all i ≥ d + 1, we immediately have ∗ y · b = y• · b• = cV b• = cA−1 • b• = c · x 8 Algorithms Lecture 26: Linear Programming [Fa ’13] and yA = y• A• = cVA• = cA−1 • A• = c. The point x ∗ lies on exactly d constraint hyperplanes; moreover, any sufficiently small ball around x ∗ intersects only those d constraint hyperplanes. Consider the point x˜ = x ∗ − "v j , for some index 1 ≤ j ≤ d and some sufficiently small " > 0. We have ai · x˜ = ai · x ∗ − "(ai · v j ) = bi for all i 6= j, and a j · x˜ = a j · x ∗ − "(a j · v j ) = b j − " < b j . Thus, x˜ is a feasible point for Π. Because x ∗ is the unique optimum for Π, we must have c · x˜ = c · x ∗ − "(c · v j ) < c · x ∗ . We conclude that y j = c · v j > 0 for all j. We have shown that yA ≥ c and y ≥ 0, so y is a feasible solution for the dual linear program q. We have also shown that y · b = c · x ∗ , so by the Weak Duality Theorem, y is also an optimal solution for q, and the proof is complete! We can also give a useful geometric interpretation to the vector y• ∈ Rd . Each linear equation ai · x = bi defines a hyperplane in Rd with normal vector ai . The normal vectors a1 , . . . , ad are linearly independent (by non-degeneracy) and therefore describe a coordinate frame for the Pd vector space Rd . The definition of y• implies that c = y• A• = i=1 yi ai . In other words, y• lists the coefficients of the objective vector c in the coordinate frame a1 , . . . , ad . 26.7 Complementary Slackness Complementary Slackness Theorem. Let x ∗ be an optimal solution to a canonical linear program Π, and let y ∗ be an optimal solution to its dual q. Then for every index i, we have yi∗ > 0 if and only if ai · x ∗ = bi . Symmetrically, for every index j, we have x ∗j > 0 if and only if y ∗ · a j = c j . To be written Exercises 1. (a) Describe how to transform any linear program written in general form into an equivalent linear program written in slack form. maximize subject to d P j=1 d P j=1 d P j=1 d P cj x j ai j x j ≤ bi for each i = 1 .. p ai j x j = bi for each i = p + 1 .. p + q ai j x j ≥ bi for each i = p + q + 1 .. n Z=⇒ max c · x s.t. Ax = b x≥0 j=1 (b) Describe precisely how to dualize a linear program written in slack form. (c) Describe precisely how to dualize a linear program written in general form: In all cases, keep the number of variables in the resulting linear program as small as possible. 9 Algorithms Lecture 26: Linear Programming [Fa ’13] 2. A matrix A = (ai j ) is skew-symmetric if and only if a ji = −ai j for all indices i 6= j; in particular, every skew-symmetric matrix is square. A canonical linear program max{c · x | Ax ≤ b; x ≥ 0} is self-dual if the matrix A is skew-symmetric and the objective vector c is equal to the constraint vector b. (a) Prove that any self-dual linear program Π is syntactically equivalent to its dual program q. (b) Show that any linear program Π with d variables and n constraints can be transformed into a self-dual linear program with n + d variables and n + d constraints. The optimal solution to the self-dual program should include both the optimal solution for Π (in d of the variables) and the optimal solution for the dual program q (in the other n variables). 3. (a) Give a linear-programming formulation of the maximum-cardinality bipartite matching problem. The input is a bipartite graph G = (U ∪ V ; E), where E ⊆ U × V ; the output is the largest matching in G. Your linear program should have one variable for each edge. (b) Now dualize the linear program from part (a). What do the dual variables represent? What does the objective function represent? What problem is this!? 4. Give a linear-programming formulation of the minimum-cost feasible circulation problem. Here you are given a flow network whose edges have both capacities and costs, and your goal is to find a feasible circulation (flow with value 0) whose cost is as small as possible. 5. An integer program is a linear program with the additional constraint that the variables must take only integer values. (a) Prove that deciding whether an integer program has a feasible solution is NP-complete. (b) Prove that finding the optimal feasible solution to an integer program is NP-hard. [Hint: Almost any NP-hard decision problem can be formulated as an integer program. Pick your favorite.] ? 6. Helly’s theorem states that for any collection of convex bodies in Rd , if every d + 1 of them intersect, then there is a point lying in the intersection of all of them. Prove Helly’s theorem for the special case where the convex bodies are halfspaces. Equivalently, show that if a system of linear inequalities Ax ≤ b does not have a solution, then we can select d + 1 of the inequalities such that the resulting subsystem also does not have a solution. [Hint: Construct a dual LP from the system by choosing a 0 cost vector.] 7. Given points (x 1 , y1 ), (x 2 , y2 ), . . . , (x n , yn ) in the plane, the linear regression problem asks for real numbers a and b such that the line y = a x + b fits the points as closely as possible, according to some criterion. The most common fit criterion is minimizing the L2 error, 10 Algorithms Lecture 26: Linear Programming [Fa ’13] defined as follows:⁷ "2 (a, b) = n X ( yi − a x i − b)2 . i=1 But there are several other fit criteria, some of which can be optimized via linear programming. (a) The L1 error (or total absolute deviation) of the line y = a x + b is defined as follows: "1 (a, b) = n X | yi − a x i − b| . i=1 Describe a linear program whose solution (a, b) describes the line with minimum L1 error. (b) The L∞ error (or maximum absolute deviation) of the line y = a x + b is defined as follows: n "∞ (a, b) = max | yi − a x i − b| . i=1 Describe a linear program whose solution (a, b) describes the line with minimum L∞ error. ⁷This measure is also known as sum of squared residuals, and the algorithm to compute the best fit is normally called (ordinary/linear) least squares fitting. © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 11 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] Simplicibus itaque verbis gaudet Mathematica Veritas, cum etiam per se simplex sit Veritatis oratio. [And thus Mathematical Truth prefers simple words, because the language of Truth is itself simple.] — Tycho Brahe (quoting Seneca (quoting Euripides)) Epistolarum astronomicarum liber primus (1596) When a jar is broken, the space that was inside Merges into the space outside. In the same way, my mind has merged in God; To me, there appears no duality. — Sankara, Viveka-Chudamani (c. 700), translator unknown ? 27 Linear Programming Algorithms In this lecture, we’ll see a few algorithms for actually solving linear programming problems. The most famous of these, the simplex method, was proposed by George Dantzig in 1947. Although most variants of the simplex algorithm performs well in practice, no deterministic simplex variant is known to run in sub-exponential time in the worst case.¹ However, if the dimension of the problem is considered a constant, there are several linear programming algorithms that run in linear time. I’ll describe a particularly simple randomized algorithm due to Raimund Seidel. My approach to describing these algorithms will rely much more heavily on geometric intuition than the usual linear-algebraic formalism. This works better for me, but your mileage may vary. For a more traditional description of the simplex algorithm, see Robert Vanderbei’s excellent textbook Linear Programming: Foundations and Extensions [Springer, 2001], which can be freely downloaded (but not legally printed) from the author’s website. 27.1 Bases, Feasibility, and Local Optimality Consider the canonical linear program max{c · x | Ax ≤ b, x ≥ 0}, where A is an n × d constraint matrix, b is an n-dimensional coefficient vector, and c is a d-dimensional objective vector. We will interpret this linear program geometrically as looking for the lowest point in a convex polyhedron in Rd , described as the intersection of n + d halfspaces. As in the last lecture, we will consider only non-degenerate linear programs: Every subset of d constraint hyperplanes intersects in a single point; at most d constraint hyperplanes pass through any point; and objective vector is linearly independent from any d − 1 constraint vectors. A basis is a subset of d constraints, which by our non-degeneracy assumption must be linearly independent. The location of a basis is the unique point x that satisfies all d constraints with equality; geometrically, x is the unique intersection point of the d hyperplanes. The value of a n+d basis is c · x, where x is the location of the basis. There are precisely d bases. Geometrically, the set of constraint hyperplanes defines a decomposition of Rd into convex polyhedra; this cell decomposition is called the arrangement of the hyperplanes. Every subset of d hyperplanes (that is, every basis) defines a vertex of this arrangement (the location of the basis). I will use the words ‘vertex’ and ‘basis’ interchangeably. ¹However, there are randomized variants of the simplex algorithm that run in subexponential expected time, most notably the RandomFacet algorithm analyzed by Gil Kalai in 1992, and independently by Jií Matoušek, Micha Sharir, and Emo Welzl in 1996. No randomized variant is known to run in polynomial time. In particular, in 2010, Oliver Friedmann, Thomas Dueholm Hansen, and Uri Zwick proved that the worst-case expected running time of RandomFacet is superpolynomial. © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 1 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] A basis is feasible if its location x satisfies all the linear constraints, or geometrically, if the point x is a vertex of the polyhedron. If there are no feasible bases, the linear program is infeasible. A basis is locally optimal if its location x is the optimal solution to the linear program with the same objective function and only the constraints in the basis. Geometrically, a basis is locally optimal if its location x is the lowest point in the intersection of those d halfspaces. A careful reading of the proof of the Strong Duality Theorem reveals that local optimality is the dual equivalent of feasibility; a basis is locally feasible for a linear program Π if and only if the same basis is feasible for the dual linear program q. For this reason, locally optimal bases are sometimes also called dual feasible. If there are no locally optimal bases, the linear program is unbounded.² Two bases are neighbors if they have d − 1 constraints in common. Equivalently, in geometric terms, two vertices are neighbors if they lie on a line determined by some d − 1 constraint hyperplanes. Every basis is a neighbor of exactly d n other bases; to change a basis into one of its neighbors, there are d choices for which constraint to remove and n choices for which constraint to add. The graph of vertices and edges on the boundary of the feasible polyhedron is a subgraph of the basis graph. The Weak Duality Theorem implies that the value of every feasible basis is less than or equal to the value of every locally optimal basis; equivalently, every feasible vertex is higher than every locally optimal vertex. The Strong Duality Theorem implies that (under our non-degeneracy assumption), if a linear program has an optimal solution, it is the unique vertex that is both feasible and locally optimal. Moreover, the optimal solution is both the lowest feasible vertex and the highest locally optimal vertex. 27.2 The Primal Simplex Algorithm: Falling Marbles From a geometric standpoint, Dantzig’s simplex algorithm is very simple. The input is a set H of halfspaces; we want the lowest vertex in the intersection of these halfspaces. Simplex1(H): if ∩H = ∅ return Infeasible x ← any feasible vertex while x is not locally optimal 〈〈pivot downward, maintaining feasibility〉〉 if every feasible neighbor of x is higher than x return Unbounded else x ← any feasible neighbor of x that is lower than x return x Let’s ignore the first three lines for the moment. The algorithm maintains a feasible vertex x. At each so-called pivot operation, the algorithm moves to a lower vertex, so the algorithm n+d never visits the same vertex more than once. Thus, the algorithm must halt after at most d pivots. When the algorithm halts, either the feasible vertex x is locally optimal, and therefore the optimum vertex, or the feasible vertex x is not locally optimal but has no lower feasible neighbor, in which case the feasible region must be unbounded. ²For non-degenerate linear programs, the feasible region is unbounded in the objective direction if and only if no basis is locally optimal. However, there are degenerate linear programs with no locally optimal basis that are infeasible. 2 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] Notice that we have not specified which neighbor to choose at each pivot. Many different pivoting rules have been proposed, but for almost every known pivot rule, there is an input polyhedron that requires an exponential number of pivots under that rule. No pivoting rule is known that guarantees a polynomial number of pivots in the worst case, or even in expectation.³ 27.3 The Dual Simplex Algorithm: Rising Bubbles We can also geometrically interpret the execution of the simplex algorithm on the dual linear program q. Again, the input is a set H of halfspaces, and we want the lowest vertex in the intersection of these halfspaces. By the Strong Duality Theorem, this is the same as the highest locally-optimal vertex in the hyperplane arrangement. Simplex2(H): if there is no locally optimal vertex return Unbounded x ← any locally optimal vertex while x is not feasbile 〈〈pivot upward, maintaining local optimality〉〉 if every locally optimal neighbor of x is lower than x return Infeasible else x ← any locally-optimal neighbor of x that is higher than x return x Let’s ignore the first three lines for the moment. The algorithm maintains a locally optimal vertex x. At each pivot operation, it moves to a higher vertex, so the algorithm never visits the n+d same vertex more than once. Thus, the algorithm must halt after at most d pivots. When the algorithm halts, either the locally optimal vertex x is feasible, and therefore the optimum vertex, or the locally optimal vertex x is not feasible but has no higher locally optimal neighbor, in which case the problem must be infeasible. The primal simplex (falling marble) algorithm in action. The dual simplex (rising bubble) algorithm in action. From the standpoint of linear algebra, there is absolutely no difference between running Simplex1 on any linear program Π and running Simplex2 on the dual linear program q. The ³In 1957, Hirsch conjectured that for any linear programming instance with d variables and n + d constraints, starting at any feasible basis, there is a sequence of at most n pivots that leads to the optimal basis. This long-standing conjecture was finally disproved in 2010 by Fransisco Santos, who described an counterexample with 43 variables, 86 facets, and diameter 44. 3 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] actual code is identical. The only difference between the two algorithms is how we interpret the linear algebra geometrically. 27.4 Computing the Initial Basis To complete our description of the simplex algorithm, we need to describe how to find the initial vertex x. Our algorithm relies on the following simple observations. First, the feasibility of a vertex does not depend at all on the choice of objective vector; a vertex is either feasible for every objective function or for none. No matter how we rotate the polyhedron, every feasible vertex stays feasible. Conversely (or by duality, equivalently), the local optimality of a vertex does not depend on the exact location of the d hyperplanes, but only on their normal directions and the objective function. No matter how we translate the hyperplanes, every locally optimal vertex stays locally optimal. In terms of the original matrix formulation, feasibility depends on A and b but not c, and local optimality depends on A and c but not b. The second important observation is that every basis is locally optimal for some objective function. Specifically, it suffices to choose any vector that has a positive inner product with each of the normal vectors of the d chosen hyperplanes. Equivalently, we can make any basis feasible by translating the hyperplanes appropriately. Specifically, it suffices to translate the chosen d hyperplanes so that they pass through the origin, and then translate all the other halfspaces so that they strictly contain the origin. Our strategy for finding our initial feasible vertex is to choose any vertex, choose a new objective function that makes that vertex locally optimal, and then find the optimal vertex for that objective function by running the (dual) simplex algorithm. This vertex must be feasible, even after we restore the original objective function! (a) (b) (c) (a) Choose any basis. (b) Rotate objective to make it locally optimal, and pivot ’upward’ to find a feasible basis. (c) Pivot downward to the optimum basis for the original objective. Equivalently, to find an initial locally optimal vertex, we choose any vertex, translate the hyperplanes so that that vertex becomes feasible, and then find the optimal vertex for those translated constraints using the (primal) simplex algorithm. This vertex must be locally optimal, even after we restore the hyperplanes to their original locations! Here are more complete descriptions of the simplex algorithm with this initialization rule, in both primal and dual forms. As usual, the input is a set H of halfspaces, and the algorithms either return the lowest vertex in the intersection of these halfspaces or report that no such vertex exists. 4 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] (a) (b) (c) (a) Choose any basis. (b) Translate constraints to make it feasible, and pivot downward to find a locally optimal basis. (c) Pivot upward to the optimum basis for the original constraints. Simplex1(H): x ← any vertex ˜ ← any rotation of H that makes x locally optimal H while x is not feasible ˜ than x if every locally optimal neighbor of x is lower (wrt H) return Infeasible else ˜ than x x ← any locally optimal neighbor of x that is higher (wrt H) while x is not locally optimal if every feasible neighbor of x is higher than x return Unbounded else x ← any feasible neighbor of x that is lower than x return x Simplex2(H): x ← any vertex ˜ ← any translation of H that makes x feasible H while x is not locally optimal ˜ than x if every feasible neighbor of x is higher (wrt H) return Unbounded else ˜ than x x ← any feasible neighbor of x that is lower (wrt H) while x is not feasible if every locally optimal neighbor of x is lower than x return Infeasible else x ← any locally-optimal neighbor of x that is higher than x return x 27.5 Linear Expected Time for Fixed Dimensions In most geometric applications of linear programming, the number of variables is a small constant, but the number of constraints may still be very large. The input to the following algorithm is a set H of n halfspaces and a set B of b hyperplanes. (B stands for basis.) The algorithm returns the lowest point in the intersection of the halfspaces 5 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] in H and the hyperplanes B. At the top level of recursion, B is empty. I will implicitly assume that the linear program is both feasible and bounded. (If necessary, we can guarantee boundedness by adding a single halfspace to H, and we can guarantee feasibility by adding a dimension.) A point x violates a constraint h if it is not contained in the corresponding halfspace. SeidelLP(H, B) : if |B| = d T return B if |H ∪ B| =T d return (H ∪ B) h ← random element of H x ← SeidelLP(H \ h, B) (∗) if x violates h return SeidelLP(H \ h, B ∪ ∂ h) else return x The point x recursively computed in line (∗) is the optimal solution if and only if the random halfspace h is not one of the d halfspaces that define the optimal solution. In other words, the probability of calling SeidelLP(H, B ∪ h) is exactly (d − b)/n. Thus, we have the following recurrence for the expected number of recursive calls for this algorithm: if b = d or n + b = d 1 T (n, b) = T (n − 1, b) + d − b · T (n − 1, b + 1) otherwise n The recurrence is somewhat simpler if we write δ = d − b: if δ = 0 or n = δ 1 T (n, δ) = T (n − 1, δ) + δ · T (n − 1, δ − 1) otherwise n It’s easy to prove by induction that T (n, δ) = O(δ! n): T (n, δ) = T (n − 1, δ) + δ · T (n − 1, δ − 1) n δ (δ − 1)! · (n − 1) n n−1 = δ! (n − 1) + δ! n ≤ δ! n ≤ δ! (n − 1) + [induction hypothesis] At the top level of recursion, we perform one violation test in O(d) time. In each of the base cases, we spend O(d 3 ) time computing the intersection point of d hyperplanes, and in the first base case, we spend O(d n) additional time testing for violations. More careful analysis implies that the algorithm runs in O(d! · n) expected time. Exercises 1. Fix a non-degenerate linear program in canonical form with d variables and n+d constraints. 6 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] (a) Prove that every feasible basis has exactly d feasible neighbors. (b) Prove that every locally optimal basis has exactly n locally optimal neighbors. 2. Suppose you have a subroutine that can solve linear programs in polynomial time, but only if they are both feasible and bounded. Describe an algorithm that solves arbitrary linear programs in polynomial time. Your algorithm should return an optimal solution if one exists; if no optimum exists, your algorithm should report that the input instance is Unbounded or Infeasible, whichever is appropriate. [Hint: Add one variable and one constraint.] 3. (a) Give an example of a non-empty polyhedron Ax ≤ b that is unbounded for every objective vector c. (b) Give an example of an infeasible linear program whose dual is also infeasible. In both cases, your linear program will be degenerate. 4. Describe and analyze an algorithm that solves the following problem in O(n) time: Given n red points and n blue points in the plane, either find a line that separates every red point from every blue point, or prove that no such line exists. 5. The single-source shortest path problem can be formulated as a linear programming problem, with one variable d v for each vertex v 6= s in the input graph, as follows: X maximize dv v subject to d v ≤ `s→v for every edge s → v d v − du ≤ `u→v for every edge u → v with u 6= s for every vertex v 6= s dv ≥ 0 This problem asks you to describe the behavior of the simplex algorithm on this linear program in terms of distances. Assume that the edge weights `u→v are all non-negative and that there is a unique shortest path between any two vertices in the graph. (a) What is a basis for this linear program? What is a feasible basis? What is a locally optimal basis? (b) Show that in the optimal basis, every variable d v is equal to the shortest-path distance from s to v. (c) Describe the primal simplex algorithm for the shortest-path linear program directly in terms of vertex distances. In particular, what does it mean to pivot from a feasible basis to a neighboring feasible basis, and how can we execute such a pivot quickly? (d) Describe the dual simplex algorithm for the shortest-path linear program directly in terms of vertex distances. In particular, what does it mean to pivot from a locally optimal basis to a neighboring locally optimal basis, and how can we execute such a pivot quickly? (e) Is Dijkstra’s algorithm an instance of the simplex method? Justify your answer. 7 Algorithms Lecture 27: Linear Programming Algorithms [Fa’13] (f) Is Shimbel’s algorithm an instance of the simplex method? Justify your answer. 6. The maximum (s, t)-flow problem can be formulated as a linear programming problem, with one variable fu→v for each edge u → v in the input graph: X X maximize fs→w − fu→s subject to w X w f v→w − u X fu→v = 0 for every vertex v 6= s, t u fu→v ≤ cu→v for every edge u → v fu→v ≥ 0 for every edge u → v This problem asks you to describe the behavior of the simplex algorithm on this linear program in terms of flows. (a) What is a basis for this linear program? What is a feasible basis? What is a locally optimal basis? (b) Show that the optimal basis represents a maximum flow. (c) Describe the primal simplex algorithm for the flow linear program directly in terms of flows. In particular, what does it mean to pivot from a feasible basis to a neighboring feasible basis, and how can we execute such a pivot quickly? (d) Describe the dual simplex algorithm for the flow linear program directly in terms of flows. In particular, what does it mean to pivot from a locally optimal basis to a neighboring locally optimal basis, and how can we execute such a pivot quickly? (e) Is the Ford-Fulkerson augmenting path algorithm an instance of the simplex method? Justify your answer. [Hint: There is a one-line argument.] 7. (a) Formulate the minimum spanning tree problem as an instance of linear programming. Try to minimize the number of variables and constraints. (b) In your MST linear program, what is a basis? What is a feasible basis? What is a locally optimal basis? (c) Describe the primal simplex algorithm for your MST linear program directly in terms of the input graph. In particular, what does it mean to pivot from a feasible basis to a neighboring feasible basis, and how can we execute such a pivot quickly? (d) Describe the dual simplex algorithm for your MST linear program directly in terms of the input graph. In particular, what does it mean to pivot from a locally optimal basis to a neighboring locally optimal basis, and how can we execute such a pivot quickly? (e) Which of the classical MST algorithms (Borvka, Jarník, Kruskal, reverse greedy), if any, are instances of the simplex method? Justify your answer. © Copyright 2014 Jeff Erickson. This work is licensed under a Creative Commons License (http://creativecommons.org/licenses/by-nc-sa/4.0/). Free distribution is strongly encouraged; commercial distribution is expressly forbidden. See http://www.cs.uiuc.edu/~jeffe/teaching/algorithms for the most recent revision. 8

© Copyright 2019