In a previous section we outlined what the distribution center problem is, explaining that the end goal is to find a node in a tree which minimizes the time needed to send goods to all other nodes. In that previous section we explained how the concept of priority queues is important to solving our problem, but careful readers would likely not be satisfied; many typical priority queue operations are worse than linear. So in this section we will hopefully satisfy that curiosity.
A reminder of the overall algorithm:
We will work by starting at the leaves and processing inward until we find our center. We assign all leaves a value of zero, and parent nodes will have a label dependent on the values and count of their children.
The Data Structure:
We will take the edges of the tree as input, noting that this is linear in the number of vertices. From this edge list we will build a flexible, linked, data structure to optimize the types of tree traversal we want to do later in the algorithm.
We start with an array of linked lists. Each node will have a unique index in the array. Then, for each edge from A to B, we will add B to A’s list, and A to B’s list. Lastly, we will link B’s list element to A’s list element, making it easier to jump to specific locations in each linked list.
That likely sounded confusing, so hopefully a diagram will help:
Corresponding Edge List:
How to use this structure:
Start by making an array of “buckets” which we are going to use to order the nodes by their values. A node will be added to the array whenever we can accurately calculate its value. Effectively this means that we will be adding leaf nodes to the buckets first, then their parents, and so on. We are going to go through each bucket only once, and we will start by assigning all the leaf nodes to bucket zero. We can identify leaf nodes by looking at the length of each node’s edge list; an edge list of length one implies a leaf node.
When a node is processed, we remove it’s edges from the edge list structure. For example, if we processed leaf node D, we would remove its edge to B as well as B’s edge to D.
After removing the processed edges, the process repeats, any list with a length of 1 is added to the array of buckets and the process continues.
The last thing to explain is exactly how we are going to keep track of the value of nodes while we are processing the tree. For this we will use yet another array, this one keeping track of the intermediate results for each node. When a leaf node is processed, it will update the intermediate result of its parent. This can all be done in constant time, and in a method similar to how we described when originally talking about the distribution center algorithm.
Overall Running Time:
So step one is to build the edge list, bucket array, and intermediate array. For each of these we only need at most an array length equal to the number of nodes. Building the edge list also adds links for each edge, but because we have a tree, we have N-1 edges. Because setting up the links for edges can be done in constant time, the overall initialization process is linear.
Step two is to put leaf nodes in the zero bucket. This is linear because we simply need to scan through the edge list once.
Lastly, we iterate through the buckets, and for each node in a bucket, we remove its corresponding edges and update the intermediate array. Updating the array is constant, removing the edge is constant (thanks to our edge list data structure) and adding any eligible parent nodes is also constant (because the only eligible nodes will be linked to by the leaf nodes we are removing; there is no need to search for them). This overall process is linear because once a node is processed, it will never appear in the bucket list a second time.
Because all steps of this algorithm are linear, we can clearly say that the distribution problem is solvable in linear time!