Run Time Analysis
Jordan Snow

Kruskal’s Algorithm
Finds a minimum spanning tree
1. Starts with a forest in which every vertex is its own tree
2. Sorts the edges by cost
3. Considers the edges in ascending order of cost
4. Adds an edge only if it connects two different trees, so no cycles are formed
5. Finishes with a single spanning tree

Pseudo Code for Kruskal
MST-Kruskal(G, w)
    A = ∅
    for each vertex v ∈ G.V
        Make-Set(v)
    sort the edges of G.E into ascending order by cost
    for each edge (u, v) ∈ G.E, considered in ascending order:
        if Find-Set(u) ≠ Find-Set(v)    // vertices not in the same set
            A = A ⋃ {(u, v)}            // it is ok to add the edge to A
            Union(u, v)                 // union the two trees
    return A

Single Linkage
Analysis done in terms of Kruskal’s
• Start by placing each point in its own cluster: O(V)
• Sort the edges: O(E lg E)
• Find-Set and Union: O(E) operations
  We assume a disjoint-set (union-find) implementation with path compression and union by rank, so all set operations together take O((V+E) α(V))
• Total time is O((V+E) α(V)) + O(E lg E)
• α(V) = O(lg V) = O(lg E), so the total time reduces to O(E lg E)

Single Linkage
After using Kruskal’s, we start cutting the minimum spanning tree
• We want to cut the k-1 longest edges
• This results in k clusters
• If k = 3, we cut the two longest edges

Complete Linkage
Start by placing each point in its own cluster    O(n)
Store the distance between each pair of clusters    O(n²)
While there are more than k clusters    O(n) iterations
    Let A, B be the two closest clusters, where the distance between two clusters is the distance between their farthest points    O(n²)
    Add cluster A ⋃ B    O(n)
    Remove clusters A and B    O(n)
    Find the farthest distance from A ⋃ B to all other clusters    O(n²)
Total time comes to O(n³)

Average Linkage
The same analysis applies to average linkage as to complete linkage
Average linkage must take all distances between two clusters and average them, then compare
Total time is still O(n³)

Lloyd’s Method
Pick k random points    O(k)
Until convergence:    ? (the number of iterations is not known in advance)
    Assign each point to its closest center    O(kn)
    Compute the mean of each cluster    O(n)
    Let these means be the new centers    O(k)
In practice, Lloyd’s converges so quickly that the running time is effectively linear

Deterministic Lloyd’s Method
Furthest Centroids:
Pick a random center C1    O(1)
Set C2 as the farthest point from C1    O(n)
Set Ci to be the point with the largest minimum distance from the centers already chosen    O(kn)
Running time of seeding is O(kn)
After seeding it runs the same as the previous method, with linear running time
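The two Lloyd's slides can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names `farthest_first_centers` and `lloyd`, the `first` and `max_iter` parameters, and the use of Euclidean distance are all assumptions made for the example. Keeping a running minimum distance per point is one way to reach the O(kn) seeding cost quoted above.

```python
import math

def farthest_first_centers(points, k, first=0):
    """Furthest-centroids seeding, O(kn) distance evaluations.

    `first` (hypothetical parameter) is the index of C1; the slides pick it at random.
    """
    centers = [points[first]]
    # min_dist[i] = distance from point i to its nearest chosen center so far
    min_dist = [math.dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Next center: the point with the largest minimum distance to the chosen centers
        idx = max(range(len(points)), key=min_dist.__getitem__)
        centers.append(points[idx])
        for i, p in enumerate(points):
            min_dist[i] = min(min_dist[i], math.dist(p, points[idx]))
    return centers

def lloyd(points, centers, max_iter=100):
    """Lloyd's iterations from the given centers; assumes max_iter >= 1."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step, O(kn): each point goes to its closest center
        clusters = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            clusters[j].append(p)
        # Update step, O(n): each center moves to the mean of its cluster
        # (an empty cluster keeps its old center)
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, clusters
```

For example, `lloyd(pts, farthest_first_centers(pts, 2))` seeds two centers and then iterates until the centers stop moving or `max_iter` is reached.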

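The Kruskal-based single-linkage procedure analysed earlier can be sketched as below. This is an illustrative sketch under stated assumptions: the `DisjointSet` class and `single_linkage` function are hypothetical names, distances are Euclidean, the graph is the complete graph on the points, and path halving stands in as a simple variant of path compression. Instead of building the whole tree and then cutting the k-1 longest edges, the loop stops once k trees remain, which gives the same clusters when edge costs are distinct.

```python
import math
from itertools import combinations

class DisjointSet:
    """Union-find with union by rank and path halving (a path-compression variant)."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # point x at its grandparent
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same tree; the edge would form a cycle
        if self.rank[ra] < self.rank[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the shallower tree under the deeper one
        if self.rank[ra] == self.rank[rb]:
            self.rank[ra] += 1
        return True

def single_linkage(points, k):
    """Return k clusters as sorted lists of point indices."""
    n = len(points)
    # All pairwise edges of the complete graph, sorted by cost: O(E lg E)
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    ds = DisjointSet(n)
    remaining = n  # every point starts as its own cluster
    for _, u, v in edges:
        if remaining == k:
            break  # the edges not yet added include the k-1 longest tree edges
        if ds.union(u, v):
            remaining -= 1
    groups = {}
    for i in range(n):
        groups.setdefault(ds.find(i), []).append(i)
    return sorted(groups.values())
```

Because the points form a complete graph, E = n(n-1)/2, so the sort dominates the running time, in line with the O(E lg E) bound above.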
© Copyright 2018