Implicit Data Structures

All too often, when dealing with data structures, we are concerned about complicated pointer structures and many levels of indirection. For example, when we discussed priority queues, we focused heavily on the linking structure of binomial and Fibonacci trees. Now pointers are important and often make for fast structures, but sometimes it is important to have simpler designs ready for times when pointers get in the way.

Storing data to disk, transmitting data between machines, or simply reducing the brain power needed to follow your code; these are all times when its worth switching to implicit data structures.

What are they?

By “Implicit” I mean that the data has implied structure rather than one which is clearly evident. C programmers know about implicit data structures because they use them frequently. In C, if you write “short x = 65; char c = x” you would have the number 65 in the variable x and the letter ‘A’ in the variable c. Both x and c store the binary value 010000012, but interpret that binary value differently to produce different results. This is only exasperated when considering floats. Consider the following C++ code.

float f = 0xc542a000;
unsigned int u = 0xc542a000;
int y = 0xc542a000;
cout << "F: " << f << endl
     << "U: " << u << endl
     << "Y: " << y << endl;

Although f, u, and y contain the same binary string, they represent very different values.

F: 3.30948e+09

U: 3309477888

Y: -985489408

So clearly, we use implicit data structures all the time often without really realizing it. Lets see how we can apply this concept to represent more interesting structures.

Heap Representation:

So in a basic data structures class students are eventually taught heaps, which is typically the first clearly implicit data structure a young computer scientist knows. Heaps are binary trees, but they are typically stored in an array.

By storing the heap in an array, there is no memory overhead of additional links, but this is relatively unimportant when modern systems often have memory to spare. Instead, heaps store their data in an array because it makes memory accesses constant time. This is achieved because interpreting the heap’s array as a tree is very straightforward.

If you need a reminder, if a heap node is stored in array index X then its children will be at array index 2X+1 and 2X+2. Similarly that node’s parent will be at floor(X/2) – 1. These simple equations replace the need to store pointers for every parent-child relationship, they improve cache locality of the heap, and they greatly simplify the overall structure.

Going forward, we are going to see that there are a lot of different ways to store structures implicitly, some of which are incredibly efficient provided their easily interpretable.