Disjoint set
The Disjoint Set is an Abstract Data Type that is optimized to support two particular operations:
- Union: Given two elements u and v, merge the sets to which they belong
- Find: Given an element e, return the set to which it belongs
Disjoint Sets can be very efficiently implemented using the Up-Tree data structure, which in essence is simply a graph in which all nodes have a single "parent" pointer.
Nodes that do not have a parent (i.e., the roots of their respective trees) are called sentinel nodes, and each sentinel node represents a single set
How to implement a forest of nodes? Adjacency list or adjacency matrix.
Nodes in an Up-Tree will always only need to store an edge to a single other node, its parent, as opposed to multiple adjacent vertices. As a result, we can take advantage of a simple array to store the information that we need. Each element of array save the index of its parent node.
Union: the purpose of the union operation is to join two sets. This implies that, after we have unioned two different sets, the two sets should end up with the same sentinel node.
Our overarching goal while building an Up-Tree is to minimize its height because a smaller tree height corresponds to a faster "find" operation.
"Union-by-Size": In this insertion tactic, the sentinel node of the smaller set (i.e., the set with less elements) gets attached to the sentinel node of the larger set.
the worst-case time complexity of a "find" operation in Union-by-Size is the same as the worst-case height of a balanced tree: O(log n).
"Union-by-Height": the sentinel node of the shorter set (i.e., smaller height) gets attached to the sentinel node of the taller set.
Path Compression: As we traverse the tree to return the sentinel node, we re-attach each vertex along the way directly to the root.