Data-intensive applications on clusters often require that requests be sent
quickly to the node managing the desired data. In many applications, one must
search a sorted tree structure to determine the node responsible for accessing
or storing the data.
Examples include object tracking in sensor networks, packet routing over the
internet, request processing in publish-subscribe middleware, and query
processing in database systems. When the tree structure is larger than the CPU
cache, the standard implementation potentially incurs many cache misses for
each lookup: one at each successive level of the tree. As the gap between CPU
and memory speeds grows, this performance degradation will only worsen.
We propose a solution that takes advantage of the growing speed of local area
networks for clusters. We split the sorted tree structure among the nodes of
the cluster. We assume that the structure fits within the aggregate CPU cache
of the entire cluster. We then send a word over the network (as part of a
larger packet containing other words) to examine the tree structure in another
node's CPU cache. We show that this is often faster than
the standard solution, which locally incurs multiple cache misses while
accessing each successive level of the tree.
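As an illustration of the idea (a minimal sketch, not the authors' implementation), the code below range-partitions a sorted key array across the nodes of a cluster, batches lookup requests bound for the same node into one packet, and lets the owning node answer each lookup with a binary search over its cache-resident slice. The use of MPI, the partition size, the batch size, and the routing rule are all assumptions made for this example.

```c
#include <mpi.h>
#include <stdio.h>

#define KEYS_PER_NODE 1024  /* assumed slice size small enough to stay cache-resident   */
#define BATCH           64  /* assumed number of lookups packed into one network packet */

/* Illustrative routing rule: contiguous key ranges, one per rank. */
static int owner_of(int key) { return key / KEYS_PER_NODE; }

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Each rank holds one sorted, cache-resident slice of the key space. */
    int local[KEYS_PER_NODE];
    for (int i = 0; i < KEYS_PER_NODE; i++)
        local[i] = rank * KEYS_PER_NODE + i;

    if (rank == 0 && nprocs > 1) {
        /* Pack a batch of lookups that all route to the same remote rank. */
        int batch[BATCH];
        for (int i = 0; i < BATCH; i++)
            batch[i] = KEYS_PER_NODE + 2 * i;   /* keys owned by rank 1 */
        int dest = owner_of(batch[0]);          /* = 1 for these keys   */
        MPI_Send(batch, BATCH, MPI_INT, dest, 0, MPI_COMM_WORLD);

        int results[BATCH];
        MPI_Recv(results, BATCH, MPI_INT, dest, 1, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("key %d found at index %d on rank %d\n",
               batch[0], results[0], dest);
    } else if (rank == 1) {
        int batch[BATCH], results[BATCH];
        MPI_Recv(batch, BATCH, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        /* Answer every lookup in the packet with a binary search over the
         * in-cache slice; no main-memory tree walk is needed. */
        for (int i = 0; i < BATCH; i++) {
            int lo = 0, hi = KEYS_PER_NODE - 1, pos = -1;
            while (lo <= hi) {
                int mid = lo + (hi - lo) / 2;
                if      (local[mid] == batch[i]) { pos = mid; break; }
                else if (local[mid] <  batch[i]) lo = mid + 1;
                else                             hi = mid - 1;
            }
            results[i] = pos;   /* index within this rank's slice, -1 if absent */
        }
        MPI_Send(results, BATCH, MPI_INT, 0, 1, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}
```

Run with, e.g., two ranks (mpirun -np 2): the remote search touches only a slice that fits in one node's cache, whereas a single-node tree of the full size would miss in cache at every level of the lookup.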