Tuesday, September 30, 2008

Iterative Tree Traversal

When dealing with a tree (i.e., hierarchical) data structure, recursion is typically the first tool a grizzled veteran programmer will reach for from his trusty toolbelt. After grasping air or worse, he will of course remember that recursion is not a physical tool, but a development approach that involves self-referential code.

Consider the following method, which will traverse a Tree data structure and perform some operation on each node:

Visit(Tree t)
{
DoSomethingExcitingWith(t);

foreach (Tree child in t.children)
{
Visit(child);
}
}

This is generally a concise and appropriate way to implement a tree traversal algorithm. However, in cases involving trees with an especially deep structure, recursion can use too much memory, since each method invocation requires the allocation of a new stack frame. Such times call for drastic, tedious measures, otherwise known as iteration. I don't recall ever seeing a tree traversal algorithm implemented iteratively in practice, but here is one approach to do so:

IterVisit(Tree t)
{
Stack s = new Stack();
s.Push(t);

while (s.Size() > 0)
{
t = s.Pop();
DoSomethingExcitingWith(t);

foreach Tree child in t.children
{
s.Push(child);
}
}
}

One interesting fact about this iterative implementation is that the use of a Stack implies a depth-first tree traversal, while simply changing to a Queue configures the algorithm to use a breadth-first traversal.

Friday, August 22, 2008

Oracle SQL to select one row from a matching group

Consider the following poorly organized table of location data for cities. This table, which has no primary key, is very similar to a data set that I had to work with recently on a project. The problem is that I had to display the city names on a map and, since the location data varies slightly for some cities, I originally had a cluster of overlapping names instead of a single clearly-visible city name. Needless to say, this was not acceptable and, quite frankly, looked ridiculous.

City Longitude Latitude
...
Boston, MA -71.0 42.4
Boston, MA -71.03 42.37
...
Washington, DC -77.045 38.855
Washington, DC -77.045 38.8455
Washington, DC -77.04 38.855
...

To solve this problem, I eventually came up with the following SQL query, which just arbitrarily picks one from each series of cities with matching names:

SELECT * FROM Cities
WHERE rowid IN (
SELECT MIN(rowid) FROM Cities
GROUP BY City
)

This solved the problem admirably!

Note that this query probably only works in Oracle, but should be adaptable to other DBMSs by substituting the appropriate 'rowid' construct.