Various Implicit Tree Structures

So when we looked at implicit structures we talked about heaps. With a heap, we had a simple array that held our binary tree. In many situations, this is the most efficient way to store a binary tree. But what can we do if we have trees of arbitrary shape? Of course, we can always make nodes linked with pointers and throw that all in the heap, but what if we want a more elegant solution?

For the following methods, consider the following tree:

Farley’s Structure:

Let’s start with an interesting, if not necessarily helpful data structure created by Art Farley in the University of Oregon.  Here a tree is going to be stored as an array of pairs (or an array of size N by 2).  

For each node we will store two things, the number of adjacent nodes, and the sum of those node’s labels. In the following table, the blue column shows the node’s label, and the yellow columns show the pairs. Note that in a program we would not explicitly store the blue column.

We can recover the original tree by doing some smart manipulations to this data. Firstly, if a row contains a 1 for the number of adjacent neighbors, then we can conclude it is a leaf node. Note that if there is only one adjacent neighbor then the sum of neighbor index column really just contains the label of that node’s parent. For example, we can read in our table that node 8’s parent is node 5.

Next, we can subtract each leaf node’s label from its parent. For example, we can subtract node 9 and 10 from node 6. Doing so will decrement node 6’s adjacent node count by two, and set its neighbor label sum to 2. Unsurprisingly, the result shows that node 6’s parent is node 2.

Granted, if you believe this may be more complicated than its worth, you might be right, a simpler and more efficient implicit tree structure is next. Still, you never know when techniques like this will show up in your own projects, and it’s important to have a large conceptual toolbox.

The Prufer Array:

This one is really simple, and seems to have been discovered by many people independently, but Prufer was the first one published. In fact, you may have even already thought of this while reading the previous section. We store the tree as an array of size N, where each cell index corresponds to a unique node, and each cells value corresponds to that node’s parent. We will use the value of zero to denote the root of the tree. Although the root can be anywhere, it is common to reindex the tree such that the root is the first index. Then, we don’t even need to store the root itself!

The following table shows our tree’s Prufer array, assuming node 2 is the root. Once again, the blue values wouldn’t actually be stored in a computer.

Now for its simplicity and memory efficiency, it is worth noting that this method does have a tradeoff with Farley’s structure we mentioned previously. It is much harder to identify leaf nodes in a Pruffer array. Before we could see that any node with only one adjacent neighbor must be a leaf node, but here we need to read the entire array and record which nodes have children and which don’t. That being said, in most cases the Prufer array is the best choice, especially when memory is a top priority.