The Minimum Spanning Tree Problem

Three subproblems to linear time

12 min read | Algorithms & Complexity


The Problem

Given an undirected, connected graph G = (V, E, C) with n vertices and m edges, where each edge has a cost, a spanning tree is a subset of n - 1 edges that connects all vertices. The minimum spanning tree (MST) is the spanning tree whose total edge cost is minimum.

This is one of the oldest problems in combinatorial optimization. It has been studied since the 1920s, starting with Boruvka's algorithm in 1926. The problem appears everywhere: network design, clustering, image segmentation, approximation algorithms for harder problems. It is also one of the most natural graph problems to state. Most computer science students encounter it in their second or third year.

And yet, one of the most basic questions about it remains open: can we compute the MST in deterministic O(m) time?

Here is what we know:

  • Randomized O(m): Karger, Klein, and Tarjan (1995) gave a randomized algorithm that runs in O(m) expected time.
  • Deterministic O(m alpha(m,n)): Chazelle (2000) gave a deterministic algorithm where alpha is the inverse Ackermann function -- a function that grows so slowly it is effectively constant for any input that fits in the observable universe.
  • Optimal (comparison model): Pettie and Ramachandran (2002) gave an algorithm that is provably optimal in the comparison model, but the complexity of their algorithm T*(m,n) is not known to be linear.

The gap between randomized O(m) and deterministic O(m alpha(m,n)) is tiny in practice -- alpha(m,n) is at most 4 for any graph that could exist in physical reality. But in theory, the gap is the difference between a solved problem and an open one. Closing it has resisted four decades of effort.

This post summarizes work from my 2018 master's thesis at DIKU (University of Copenhagen), supervised by Jyrki Katajainen. The thesis attempted to decompose the MST problem into three independent subproblems whose solutions would compose into a deterministic O(m) algorithm. The decomposition is sound. The individual solutions remain incomplete.

Models of Computation

Before attacking the problem, we need to specify what operations are allowed. The MST problem is studied under two computational models, and the distinction matters because the bottleneck lives in the gap between them.

The Pointer Machine

A pointer machine (PM) has an input tape (read-only), an output tape (write-only), and a memory that is a directed graph with constant fan-out k. Each memory cell consists of k pointers labeled from a pointer alphabet. You can follow pointers, compare them, and create new cells, but you cannot do arithmetic on pointers or addresses. There is no random access.

This model closely resembles what you actually do when you implement graph algorithms: you follow edges, you traverse linked structures, you compare values. It is the more restrictive and more realistic model.

The Word RAM

A word RAM operates on words of w bits, stored in a memory of 2^w locations. Each word can hold an integer or an address. The processor supports arithmetic, logical, and bitwise operations at unit cost, and any word in memory can be accessed in O(1) time by its address.

The Word RAM is more powerful. It allows table lookups -- precomputing answers for small inputs and retrieving them in constant time. This is what separates O(m + n) from O(m alpha(m,n)) in the disjoint-set data structure, which is the bottleneck of MST computation.

On a pointer machine, disjoint-set operations with path compression and union by rank take O(m alpha(m,n)). The alpha factor comes from the analysis of path compression. On a Word RAM, if you know the union tree structure in advance, you can precompute lookup tables and achieve O(m + n). The gap between the models is the gap between the current best and linear time.
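The structure under discussion is easy to state in code. A minimal sketch of the standard disjoint-set with union by rank and path compression (not any thesis-specific variant):

```python
class DisjointSet:
    """Union by rank with path compression: a sequence of m operations
    on n elements runs in O(m * alpha(m, n)) amortized time -- the
    bottleneck factor discussed in the text."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # Path compression: point every node on the path at the root.
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False
        # Union by rank: attach the shallower tree under the deeper one.
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

On a Word RAM, the Gabow-Tarjan approach replaces most of these pointer-chasing FINDs with table lookups; on a pointer machine, this is essentially the best known.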

The only operation allowed on edge weights is comparison: we assume nothing else about them, and they can be arbitrary real numbers. If you allow integer weights with bounded range, sorting becomes linear and the problem simplifies. We do not take that shortcut.

The Road So Far

Every MST algorithm exploits two fundamental properties:

  • Cycle property: For any cycle C in the graph, the heaviest edge in C is not in the MST.
  • Cut property: For any proper subset X of vertices, the lightest edge with exactly one endpoint in X belongs to the MST.

The classical algorithms differ in how they apply these properties:

Kruskal (1956) sorts all edges by weight, then processes them in order. For each edge, if it connects two different components, add it to the tree. This is the "joining two nearest fragments" approach. The running time is O(m log m) for sorting plus O(m alpha(m,n)) for the disjoint-set operations. Sorting dominates.
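A minimal sketch of Kruskal's approach (path compression only, for brevity), taking edges as (weight, u, v) triples:

```python
def kruskal(n, edges):
    """Kruskal's algorithm: sort edges by weight, then join fragments.
    O(m log m) for the sort plus near-linear disjoint-set work."""
    parent = list(range(n))

    def find(x):
        # Path halving: every probed node skips to its grandparent.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    mst = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:                # edge joins two different fragments
            parent[ru] = rv
            mst.append((w, u, v))
    return mst
```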

Prim (1957) grows a single tree from an arbitrary start vertex, always adding the cheapest edge that extends the current tree. With a Fibonacci heap, the running time is O(m + n log n). This is linear for dense graphs where m is large relative to n.
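A sketch of Prim's algorithm using Python's binary heap; note the O(m + n log n) bound above needs a Fibonacci heap, so this stand-in is O(m log n). The adjacency format is my own choice for the example:

```python
import heapq

def prim(adj):
    """Prim's algorithm with a binary heap.
    adj: {u: [(weight, v), ...]}. Returns the total MST weight,
    assuming the graph is connected."""
    start = next(iter(adj))
    visited = {start}
    heap = list(adj[start])
    heapq.heapify(heap)
    total = 0
    while heap and len(visited) < len(adj):
        w, v = heapq.heappop(heap)
        if v in visited:
            continue            # stale entry: v was reached more cheaply
        visited.add(v)
        total += w              # cut property: cheapest edge leaving the tree
        for e in adj[v]:
            if e[1] not in visited:
                heapq.heappush(heap, e)
    return total
```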

Boruvka (1926) works in phases. In each phase, every component selects its lightest outgoing edge. All selected edges are added simultaneously, and the connected components are contracted. Each phase at least halves the number of components, so after O(log n) phases the tree is complete. The running time is O(m log n).
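A sketch of Boruvka's phases (assuming distinct edge weights, which rules out cycle-creating ties):

```python
def boruvka(n, edges):
    """Boruvka's algorithm: every component picks its lightest
    outgoing edge, then components merge. O(log n) phases of O(m)
    work each. edges: list of (weight, u, v); weights distinct."""
    comp = list(range(n))

    def find(x):
        while comp[x] != x:
            comp[x] = comp[comp[x]]
            x = comp[x]
        return x

    total, picked = 0, 0
    while picked < n - 1:
        # cheapest[c] = lightest edge leaving component c this phase
        cheapest = {}
        for w, u, v in edges:
            cu, cv = find(u), find(v)
            if cu == cv:
                continue
            for c in (cu, cv):
                if c not in cheapest or cheapest[c][0] > w:
                    cheapest[c] = (w, cu, cv)
        # Add all selected edges simultaneously, contracting components.
        for w, cu, cv in cheapest.values():
            if find(cu) != find(cv):
                comp[find(cu)] = find(cv)
                total += w
                picked += 1
    return total
```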

The modern algorithms build on these foundations:

Karger-Klein-Tarjan (1995) achieved randomized O(m) by combining Boruvka phases with random sampling. The key insight: sample each edge independently with probability 1/2, recursively compute the MST of the sample, then use a linear-time verification algorithm to discard edges that are heavier than the path connecting their endpoints in the sample MST. The expected number of surviving edges is O(n/p) = O(n), so the recursion bottoms out quickly.
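The sampling step can be sketched as follows. This is illustrative only: the path-maximum check below is a naive DFS over the sample forest, not the linear-time verification routine the real algorithm depends on, and the helper names are mine:

```python
import random

def kruskal_msf(n, edges):
    """Minimum spanning forest by Kruskal (helper for the sketch)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest

def kkt_filter_step(n, edges, p=0.5):
    """One KKT sampling step: sample edges with probability p, build
    the sample's spanning forest F, discard F-heavy edges."""
    sample = [e for e in edges if random.random() < p]
    adj = {}
    for w, u, v in kruskal_msf(n, sample):
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))

    def path_max(u, v):
        # Max edge weight on the u-v path in F, or None if u and v
        # lie in different trees of the forest (naive DFS).
        stack = [(u, -1, 0)]
        while stack:
            node, par, mx = stack.pop()
            if node == v:
                return mx
            for nxt, w in adj.get(node, []):
                if nxt != par:
                    stack.append((nxt, node, max(mx, w)))
        return None

    # Keep only F-light edges: those not heavier than their forest path.
    return [(w, u, v) for w, u, v in edges
            if (pm := path_max(u, v)) is None or w <= pm]
```

Every true MST edge is F-light with respect to any forest, so the MST always survives the filtering; only redundant heavy edges are discarded.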

Chazelle (2000) achieved deterministic O(m alpha(m,n)) using soft heaps -- a priority queue that deliberately corrupts a fraction of its elements in exchange for O(1) amortized insert and extract-min. The corruption allows finding "approximately contractible" subgraphs without exact comparisons, building a tree of such subgraphs that converges to the MST.

Pettie-Ramachandran (2002) achieved optimal complexity in the comparison model by precomputing optimal decision trees for all graphs on at most log log log n vertices (the triply iterated logarithm) -- small enough that brute-force enumeration of all possible decision trees finds the best one. For the full graph, a partition procedure splits it into small components that are solved by lookup. The resulting complexity T*(m,n) is provably optimal, but whether T*(m,n) = O(m) is itself an open question.

Every improvement since Boruvka either exploits randomization (which we want to avoid) or tolerates controlled inaccuracy inside its data structures (soft heaps, corrupted priorities) while still producing an exact answer. The remaining gap is in the disjoint-set operations -- the alpha(m,n) factor that path compression cannot eliminate on a pointer machine.

The Decomposition

Rather than attempting one monolithic algorithm, the thesis breaks the problem into three independent subproblems. Each addresses a different aspect of the bottleneck, and each can be studied and potentially solved on its own. If all three are solved, they compose into a deterministic O(m + n) algorithm.

Subproblem 1                    Subproblem 3
(Approximate Disjoint-Sets)     (Cycle Hierarchy)
         \                          /
          \                        /
           v                      v
           Subproblem 2
           (Density Partition)
                  |
                  v
          Full MST Algorithm
              O(m + n)

Subproblems 1 and 3 are independent and can be attacked in parallel. Subproblem 2 is the framework that combines their results. The thesis organizes its own attempts honestly: Chapter IV is titled "Documenting Our Trials" and structured as "The Bad, The Ugly, The Good" -- dead ends are documented alongside the sound ideas.

Subproblem 1: Approximate Disjoint-Sets with Bounded Error

The disjoint-set data structure is the bottleneck. In Kruskal's algorithm, we process edges in weight order and use FIND to check whether two vertices are in the same component. If not, we UNION them. With path compression and union by rank, a sequence of m operations on n elements takes O(m alpha(m,n)). This alpha factor is what prevents linearity.

Gabow and Tarjan (1983) showed that if you know the union tree structure in advance -- that is, if someone tells you which unions will happen and in what tree shape -- you can precompute lookup tables for small components and achieve O(m + n) time. The catch is obvious: knowing the union tree in advance means knowing the MST, which is what you are trying to compute.

The thesis proposes a way around this circularity: what if you approximately know the union tree?

  • Build a predicted tree T' such that it differs from the true MST union tree by at most a fraction of the edges.
  • Use Gabow-Tarjan lookup tables based on T'.
  • When a FIND query returns an answer that disagrees with the true structure, correct it. The corrections cost extra work, but if the prediction is good enough, the total extra work is bounded.

Where does the prediction come from? One natural source: run a fast approximate MST algorithm (for example, the random sampling step from Karger-Klein-Tarjan) and use the result as T'. The key theorem from the thesis:

Theorem 12. Given an undirected weighted graph G with n vertices and m edges, the disjoint-set operations in the computation of its minimum spanning tree can be bounded to O(m + n) running time, allowing a degree of error that must be accounted for.

The proof sketch: if the predicted tree T' shares enough edges with the true MST, then the lookup tables for small components (of size log log n) give O(1) FIND operations for most queries. The faulty queries create "phantom cycles" -- the algorithm might temporarily believe two vertices are in the same component when they are not, or vice versa. These phantom cycles are bounded by the prediction error and can be cleaned up in linear time after the main computation.

This idea connects to an active research direction that did not exist when the thesis was written: learning-augmented algorithms. The framework of "algorithms with predictions" (Mitzenmacher and Vassilvitskii, 2020) formalizes exactly this tradeoff -- achieve O(m + n) when predictions are correct (consistency) while maintaining O(m alpha(m,n)) when predictions are arbitrarily wrong (robustness). The approximate disjoint-set subproblem fits this framework precisely.
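To make the consistency/robustness bookkeeping concrete, here is a toy sketch (class name and interface are mine, not the thesis's). It answers same-component queries against a predicted labelling and counts disagreements; the real scheme would answer most queries from Gabow-Tarjan tables without the exact FIND, which this sketch does not implement:

```python
class PredictedDSU:
    """Illustrative only: measures prediction error against an exact
    disjoint-set. The exact structure is always consulted here, so this
    sketch counts corrections rather than saving time."""

    def __init__(self, n, predicted_label):
        self.pred = predicted_label   # guessed final component per vertex
        self.parent = list(range(n))
        self.errors = 0               # queries where the guess was wrong

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx != ry:
            self.parent[rx] = ry

    def same(self, x, y):
        guess = self.pred[x] == self.pred[y]
        truth = self.find(x) == self.find(y)
        if guess != truth:
            self.errors += 1          # total bounded by prediction quality
        return truth
```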

Subproblem 2: Density Partition (HD/LD/MDN)

Different MST algorithms work best at different graph densities. Prim's algorithm is linear on dense graphs (m >= n^p for some p > 1). Kruskal's algorithm is nearly linear on sparse graphs (m = O(n)). Neither is linear on all graphs. The density partition exploits this by splitting the graph into regions and applying the right algorithm to each.

The partition classifies vertices by degree:

  • HD (High Degree): vertices with degree > c * log^3(n)
  • LD (Low Degree): vertices with degree <= c * log^3(n)

And then by the types of their neighbors:

  • HDN: high-degree vertices whose neighbors are also high-degree
  • LDN: low-degree vertices whose neighbors are also low-degree
  • MDN: mixed-degree neighbors -- the boundary between dense and sparse regions

The data structure for this split is computed in O(m + n) time: iterate over vertices to set degrees, iterate again to classify neighbor types.
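The classification can be sketched directly from the definitions above; `threshold` stands in for c * log^3(n), and the labels are the text's, though the function itself is my own illustration:

```python
def density_partition(adj, threshold):
    """Classify vertices in O(m + n): one pass for degrees, one pass
    for neighbor classes. adj: {v: list of neighbors}."""
    high = {v for v, nbrs in adj.items() if len(nbrs) > threshold}
    classes = {}
    for v, nbrs in adj.items():
        if v in high and all(u in high for u in nbrs):
            classes[v] = 'HDN'   # high-degree, all-high neighborhood
        elif v not in high and all(u not in high for u in nbrs):
            classes[v] = 'LDN'   # low-degree, all-low neighborhood
        else:
            classes[v] = 'MDN'   # mixed: the dense/sparse boundary
    return classes
```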

For each region:

  • HD-HDN components are dense enough that Prim's algorithm runs in linear time. These are almost fully connected subgraphs.
  • LD-LDN components are sparse enough for a variation of Kruskal's algorithm. The edges can be processed in weight order, discarding F-heavy edges (edges heavier than the path connecting their endpoints in the current forest), keeping only the F-light ones.
  • MDN boundary edges connect dense and sparse regions. The key claim: this boundary is small -- O(n / log^3(n)) edges -- because it consists only of edges between vertices of different degree classes.

After solving each region, contract the components into super-vertices. The resulting graph is smaller. The border edges form a sparse graph whose MST can be computed recursively. The recursion depth should be bounded (O(1) or O(log* n) levels), and at each level the total work is O(m + n).
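The contraction step can be sketched as follows (a minimal version of my own: keep only the lightest edge between each pair of super-vertices, since the cycle property excludes the rest):

```python
def contract(edges, comp):
    """Contract solved components into super-vertices.
    edges: list of (weight, u, v); comp: vertex -> component id.
    Returns the border edges, one lightest edge per component pair."""
    best = {}
    for w, u, v in edges:
        cu, cv = comp[u], comp[v]
        if cu == cv:
            continue                  # internal edge: already decided
        key = (min(cu, cv), max(cu, cv))
        if key not in best or best[key][0] > w:
            best[key] = (w, u, v)     # remember original endpoints
    return list(best.values())
```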

This subproblem is the glue. It becomes clearer once Subproblems 1 and 3 are solved, because the density partition needs to know: (a) how the disjoint-set operations will behave with approximate answers, and (b) how to trim tree components efficiently using the cycle hierarchy.

Subproblem 3: Cycle Hierarchy

Consider a sparse graph component that is almost a tree. It has a few cycles and many degree-1 vertices (leaves). The MST of such a component must include all edges connecting degree-1 vertices -- there is no alternative path. So we can safely "trim" these edges and contract the leaves.

The problem: trimming cascades. When you remove a leaf, its neighbor might become a new leaf, which can be trimmed in turn. Naive trimming -- rescanning the graph for new leaves after each removal -- takes O(n^2) in the worst case, because a single trim cascade can propagate across the entire graph.

Theorem 13. Edges incident to a component of degree 1 -- a vertex or contracted component with exactly one outgoing edge -- belong to the MST regardless of their weight. (Proof: removing such an edge disconnects that component from the rest of the graph.)

The fix: if you know the cycle structure of the graph beforehand, you know exactly where each trim cascade will stop -- at a vertex that participates in a cycle. With this knowledge, trimming runs in O(n).
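A minimal sketch of the trimming cascade (my own formulation, using a work queue so each vertex is examined O(1) times; the thesis's scheme additionally uses the cycle hierarchy to know in advance where each cascade stops):

```python
from collections import deque

def trim_leaves(adj):
    """Trim degree-1 vertices with a work queue: each vertex enters
    the queue at most once, so the whole cascade is linear rather
    than rescanning the graph after every removal.
    adj: {v: set of neighbors}. Mutates adj; returns the forced
    edges, each of which belongs to the MST regardless of weight."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    queue = deque(v for v, d in degree.items() if d == 1)
    forced = []
    while queue:
        leaf = queue.popleft()
        if degree[leaf] != 1:
            continue                 # its neighbor was trimmed first
        nbr = next(iter(adj[leaf]))
        forced.append((leaf, nbr))   # only edge incident to leaf: forced
        adj[nbr].discard(leaf)
        adj[leaf].clear()
        degree[leaf] = 0
        degree[nbr] -= 1
        if degree[nbr] == 1:
            queue.append(nbr)        # cascade: neighbor became a leaf
    return forced
```

The cascade stops exactly where the text says it must: at the first vertex that participates in a cycle, since that vertex never drops to degree 1.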

The cycle hierarchy organizes cycles by level:

  • Level 0: triangles (3-cycles)
  • Level 1: 4-cycles that are not decomposable into triangles
  • Level k: (k+3)-cycles, where each cycle at level k may contain cycles from lower levels

Building this hierarchy requires finding cycles. Finding cycles requires a spanning tree. But we are trying to compute the MST. This looks circular.

The escape: cycle detection does not require edge weights. Any spanning tree reveals the cycle structure -- each non-tree edge closes exactly one cycle. You can build the topological hierarchy using a DFS or BFS tree, which can be computed in O(m + n) without any weight comparisons. The hierarchy tells you where cycles are and how they nest. You then apply this structural information to the weighted MST computation.
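A sketch of this weight-oblivious step (my own formulation, using a BFS tree rather than DFS): each non-tree edge closes exactly one cycle, recovered by walking both endpoints up to their lowest common ancestor.

```python
from collections import deque

def fundamental_cycles(adj):
    """Build a BFS spanning tree in O(m + n) without comparing any
    edge weights, then recover one cycle per non-tree edge.
    adj: {v: list of neighbors}, vertices orderable (e.g. ints)."""
    parent, depth = {}, {}
    nontree = []
    for root in adj:
        if root in parent:
            continue
        parent[root], depth[root] = root, 0
        queue = deque([root])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in parent:
                    parent[u], depth[u] = v, depth[v] + 1
                    queue.append(u)
                elif u != parent[v] and (depth[u] < depth[v]
                                         or (depth[u] == depth[v] and u < v)):
                    nontree.append((v, u))   # each non-tree edge recorded once

    cycles = []
    for v, u in nontree:
        path_v, path_u = [v], [u]
        a, b = v, u
        while depth[a] > depth[b]:
            a = parent[a]
            path_v.append(a)
        while depth[b] > depth[a]:
            b = parent[b]
            path_u.append(b)
        while a != b:                        # climb to the LCA together
            a = parent[a]
            path_v.append(a)
            b = parent[b]
            path_u.append(b)
        cycles.append(path_v + path_u[-2::-1])  # join, dropping duplicate LCA
    return cycles
```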

The algorithm for building levels: at level i, use a disjoint-set L(i) to track cycle-rooted trees. Pick an unvisited high-degree vertex v, add its neighbors, and check whether a FIND operation returns a vertex already in the component. If it does, a cycle is found. Track border edges (edges leaving the current level) separately -- they become active edges at the next level.

Vertices of very high degree (approaching n) must be handled separately, as they participate in almost every cycle. The thesis proposes treating the set of very high-degree vertices as a dense subgraph whose MST can be computed by Prim in linear time.

This is the hardest of the three subproblems. The hierarchy construction is clear in outline but the formal proof that it can be built in O(m + n) for arbitrary graphs -- and that it correctly enables O(n) trimming in the weighted MST computation -- remains incomplete.

One Algorithm to Bind Them

The proposed algorithm on the RAM model combines all three subproblems:

  1. Compute the density partition in O(m + n). Classify vertices into HD, LD, and their neighbor types (HDN, LDN, MDN).
  2. Build the cycle hierarchy using a weight-oblivious spanning tree. This gives the topological structure needed for trimming.
  3. For HD-HDN components: apply Prim's algorithm. Linear time on dense graphs.
  4. For LD-LDN components: apply modified Kruskal with the approximate disjoint-set. Use the cycle hierarchy to trim degree-1 vertices in O(n) instead of O(n^2). Process edges in weight order, using the approximate FIND to detect cycles.
  5. Handle MDN boundary edges: contract solved components into super-vertices, recurse on the border graph.
  6. Clean up phantom cycles from the approximate disjoint-set. Verify the result using King's O(m + n) MST verification algorithm.

On a pointer machine, this approach has less to offer. The disjoint-set operations without table lookup are bounded by O(m alpha(m,n)), and that factor cannot be removed without the RAM model's ability to precompute and access tables in constant time. The thesis is honest about this limitation: "this discussion has little to offer if one wishes to solve the problem on a pointer machine."

Closing Remarks

This thesis was written in five months at DIKU under difficult circumstances. The core insight -- that the MST problem decomposes into three independent subproblems attackable on different fronts -- is sound. The individual subproblems are well-defined and, in 2026, connect to active research directions that did not exist in 2018.

The approximate disjoint-set (Subproblem 1) fits squarely into the learning-augmented algorithms framework. The density partition (Subproblem 2) relates to recent work on local graph algorithms and graph sparsification. The cycle hierarchy (Subproblem 3) connects to dynamic connectivity and top trees.

What is incomplete: Subproblem 1 has a theorem statement and a proof sketch, but the formal bounds on prediction quality needed for O(m + n) are not tight. Subproblem 3 has a clear construction but the proof that it runs in O(m + n) for arbitrary graphs is missing. Subproblem 2 depends on the other two.

The thesis documents its dead ends alongside its successes. Chapter IV opens with "The Bad" -- ideas that looked promising but led nowhere. It continues with "The Ugly" -- ideas that might work but have gaps in the proofs. It closes with "The Good" -- the density partition, trimming, and cycle hierarchy. This structure is deliberate. The reader should know which parts to trust and which parts to question.

The three subproblems remain open. I continue working on them. The attack plan has been updated with connections to research published between 2018 and 2026, and each subproblem now has concrete vectors of attack informed by eight years of additional perspective. Whether O(m) MST is achievable deterministically, or whether the alpha(m,n) factor is fundamental, remains one of the most remarkable open problems in computer science.
