Distributed Generation of Suffix Arrays: a Quicksort-Based Approach

Abstract

An algorithm for the distributed computation of suffix arrays for large texts is presented. The parallelism model is that of a set of sequential tasks that execute in parallel and exchange messages with each other. The underlying architecture is that of a high-bandwidth network of processors. In such a network, a remote memory access has a transfer time similar to that of magnetic disks (with no seek cost), which allows the aggregate memory distributed over the various processors to be used as a giant cache for disks. Our algorithm takes advantage of this architectural feature to implement a quicksort-based distributed sorting procedure for building the suffix array. We show that this algorithm has computation complexity O(r log(n/r) + (n/r) log r log n) in the worst case and O((n/r) log n) on average, and communication complexity O((n/r) log^2 r) in the worst case and O((n/r) log r) on average, where n is the text size and r is the number of processors. This is ..
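For readers unfamiliar with the data structure, the following is a minimal sequential sketch of what a suffix array contains. It is not the distributed quicksort-based algorithm of the paper: it simply sorts the suffix start positions on a single machine, and the function name `build_suffix_array` is a hypothetical one chosen for illustration.

```python
def build_suffix_array(text: str) -> list[int]:
    """Return the start positions of all suffixes of `text`, in lexicographic order.

    Naive single-machine construction, for illustration only. It materializes
    each suffix as a sort key, so it is far too slow and memory-hungry for the
    large texts the paper targets.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


if __name__ == "__main__":
    # Suffixes of "banana" in sorted order: a, ana, anana, banana, na, nana
    print(build_suffix_array("banana"))  # prints [5, 3, 1, 0, 4, 2]
```

The distributed setting described in the abstract addresses the case where neither the text nor the array of positions fits in the memory of one processor, which this sketch does not attempt to model.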
