Workstation clusters equipped with high-performance interconnects that have programmable network processors offer interesting opportunities to enhance the performance of parallel applications run on them. In this paper, we propose schemes in which certain application-level processing in parallel database query execution is performed on the network processor. Using a timed Petri net model, we evaluate the performance of TPC-H queries executing on a high-end cluster where all tuple processing is done on the host processor, and find that tuple processing costs on the host processor dominate the execution time. These results are validated using a small cluster. We therefore propose four schemes in which certain tuple processing activity is offloaded to the network processor. The first two schemes offload the tuple splitting activity, that is, the computation to identify the node on which each tuple is to be processed, resulting in an execution time speedup of 1.09 relative to the base scheme, but with the I/O bus becoming the bottleneck resource. In the third scheme, in addition to offloading tuple processing activity, the disk and network interfaces are combined to avoid the I/O bus bottleneck, which results in speedups of up to 1.16, but with high host processor utilization. Our fourth scheme, in which the network processor also performs part of the join operation along with the host processor, gives a speedup of 1.47 along with balanced system resource utilization. Further, we observe that the proposed schemes perform equally well in a scaled architecture, i.e., when the number of processors is increased from 2 to 64.
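To make the tuple splitting activity concrete, the following is a minimal sketch of how a destination node might be computed for each tuple. It assumes hash partitioning on a join key, which is a common choice in parallel databases but is not stated in the abstract. The function names `destination_node` and `split_tuples`, the use of CRC32 as the hash, and the tuple layout are all illustrative assumptions rather than the paper's implementation.

```python
import zlib
from collections import defaultdict


def destination_node(join_key, num_nodes):
    """Map a join-key value to a node id in the range [0, num_nodes).

    Assumption: hash partitioning. CRC32 is used here only because it is
    deterministic across runs; the abstract does not name a hash function.
    """
    digest = zlib.crc32(str(join_key).encode("utf-8"))
    return digest % num_nodes


def split_tuples(tuples, key_index, num_nodes):
    """Group tuples by the node that should process them.

    `key_index` is the (hypothetical) position of the join key in each tuple.
    Returns a dict mapping node id -> list of tuples destined for that node.
    """
    buckets = defaultdict(list)
    for row in tuples:
        node = destination_node(row[key_index], num_nodes)
        buckets[node].append(row)
    return dict(buckets)
```

For example, `split_tuples([(1, "A"), (2, "B"), (1, "C")], key_index=0, num_nodes=4)` places both tuples with key `1` in the same bucket, which is the property a partitioned join relies on. In the base scheme this per-tuple computation would run on the host processor; the first two proposed schemes move work of this kind to the network processor.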