Sunday, April 28, 2019

Algorithms: Union-Find in C#

Union-Find Algorithm: Imagine if you have 100 computers in a room not networked to one another. You could use a Union(computer1, computer2) method to join 2 computers together in a network. From there, you could continue joining computers together randomly and you would want to have a method that returns a bool called Connected(computer1, computer2) that determines if they are already connected, so you don't have to do the work if they're already in the same network.
    /// 
    /// Initialize: N
    /// Union: N
    /// Find: 1
    /// Accesses: takes N^2 accesses to process sequence of N union commands on N objects
    /// Summary: Union is too expensive. Too slow for huge problems. Quick union will be a little faster. 
    /// 
    public class QuickFindUF
    {
        private int[] id;

        public QuickFindUF(int N)
        {
            id = new int[N];
            for (int i = 0; i < N; i++) id[i] = i;
        }

        public bool Connected(int p, int q) // Find
        {
            return id[p] == id[q];
        }

        public void Union(int p, int q)
        {
            int pid = id[p];
            int qid = id[q];

            for (int i = 0; i < id.Length; i++)
            {
                if (id[i] == pid) id[i] = qid;
            }
        }
    }
Now, if we wanted to be a little more efficient, we could setup the connections to be in a tree format. Each computer will reference it's parent's number as an ID. The root computers will reference themselves as their ID, so you can check for that to tell if a computer is a root of a tree.
    /// 
    /// Faster than QuickFindUF, but still too slow.
    /// Too expensive because trees can get too tall or too flat (extremes).
    /// 
    public class QuickUnionUF
    {
        private int[] id;

        public QuickUnionUF(int N)
        {
            id = new int[N];
            for (int i = 0; i < N; i++) id[i] = i;
        }

        private int Root(int i)
        {
            // Chase parent ID until reach the root (depth of i array accesses)
            while (i != id[i]) i = id[i];
            return i;
        }

        public bool Connected(int p, int q) // Find
        {
            return Root(p) == Root(q);
        }

        public void Union(int p, int q)
        {
            int i = Root(p);
            int j = Root(q);
            id[i] = j;
        }
    }
Lastly, we could have balanced trees to ensure we have no extremely tall trees. A component is how many groups of computers that we have a component could be 1 computer by itself, or it could be all of the computers in a component if every computer has been networked together at some point with the Union() method. We should keep track of the size of each component and how many total components we have in our data set.
/// 
    /// Keep track of tree size and take steps to avoid having tall trees.
    /// Smaller tree will connect to the root of the larger tree.
    /// Java implementation on p. 228
    /// Depth of any node x is at most lg N (lg = base-2 logarithms)
    /// If N is 1,000 its 10 (2^10 = 1000)
    /// If N is 1,000,000 its 20 (2^20)
    /// If N is 1,000,000,000 its 30 (2^30)
    /// Can also add path compression: just after computing root of p, 
    /// set the id of each examined node to point to that root
    /// 
    public class QuickUnionUF2 // Weighted Quick Union
    {
        private int[] id;
        private int[] sz;
        private int count; // number of components

        public QuickUnionUF2(int N)
        {
            id = new int[N];
            for (int i = 0; i < N; i++) id[i] = i;

            sz = new int[N];
            for (int i = 0; i < N; i++) sz[i] = 1;
        }

        public int Count() { return count;  }

        private int Root(int i)
        {
            // Chase parent ID until reach the root (depth of i array accesses)
            while (i != id[i])
            {
                // Halves path length. No reason not to, keeps tree almost completely flat.
                id[i] = id[id[i]]; // Only 1 extra line of code to do path compression improvement! 
                
                i = id[i];
            }
            return i;
        }

        public bool Connected(int p, int q) // Find
        {
            return Root(p) == Root(q);
        }

        public void Union(int p, int q)
        {
            int i = Root(p);
            int j = Root(q);

            if (i == j) return;

            // Make smaller root point to the larger root.
            if (sz[i] < sz[j]) // if i is smaller than j
            {
                id[i] = j; // point i to new root j
                sz[j] += sz[i]; // add the count of the smaller array to the count of the larger array
            }
            else // if j is smaller than i
            {
                id[j] = i;
                sz[i] += sz[j];
            }

            count--; // if we combine components, the count of components goes down by 1
        }
    }