Rocket: Efficient and Scalable All-Pairs Computations on Heterogeneous Platforms
All-pairs compute problems apply a user-defined function to each combination
of two items of a given data set. Although these problems present an abundance
of parallelism, data reuse must be exploited to achieve good performance.
Several researchers have considered this problem, resorting either to partial
replication with static work distribution or to dynamic scheduling with full
replication. In contrast, we present a solution that relies on hierarchical
multi-level software-based caches to maximize data reuse at each level in the
distributed memory hierarchy, combined with a divide-and-conquer approach to
exploit data locality, hierarchical work-stealing to dynamically balance the
workload, and asynchronous processing to maximize resource utilization. We
evaluate our solution using three real-world applications (from digital
forensics, localization microscopy, and bioinformatics) on different platforms
(from a desktop machine to a supercomputer). Results show excellent efficiency
and scalability when scaling to 96 GPUs, even obtaining super-linear speedups
due to a distributed cache
- …
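
To make the all-pairs pattern and the divide-and-conquer idea in the abstract concrete, here is a minimal single-process sketch in Python. It is not Rocket's implementation: the function names (`all_pairs_naive`, `all_pairs_tiled`) and the `leaf` parameter are hypothetical, and the sketch omits the caches, work-stealing, asynchronous processing, and GPUs the abstract describes. It only illustrates how recursively splitting the pair index space groups work into small tiles, so that each tile touches a limited set of items and can reuse them.

```python
from itertools import combinations


def all_pairs_naive(items, fn):
    """Apply fn to every unordered pair (i < j) of items."""
    return {(i, j): fn(items[i], items[j])
            for i, j in combinations(range(len(items)), 2)}


def all_pairs_tiled(items, fn, leaf=2):
    """Same result as all_pairs_naive, computed tile by tile.

    The (row, column) index space is split recursively into quadrants
    until a tile is at most leaf x leaf. Illustrative assumption: leaf >= 1.
    """
    results = {}

    def visit(r0, r1, c0, c1):
        # Empty region, or no pair with i < j inside it: nothing to do.
        if r0 >= r1 or c0 >= c1 or r0 >= c1 - 1:
            return
        if r1 - r0 <= leaf and c1 - c0 <= leaf:
            # Leaf tile: only a few items are needed to compute all its pairs.
            for i in range(r0, r1):
                for j in range(max(c0, i + 1), c1):
                    results[(i, j)] = fn(items[i], items[j])
            return
        rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
        visit(r0, rm, c0, cm)
        visit(r0, rm, cm, c1)
        visit(rm, r1, c0, cm)
        visit(rm, r1, cm, c1)

    n = len(items)
    visit(0, n, 0, n)
    return results


if __name__ == "__main__":
    data = [3, 1, 4, 1, 5, 9, 2, 6]
    distance = lambda a, b: abs(a - b)  # stand-in for a user-defined function
    assert all_pairs_tiled(data, distance) == all_pairs_naive(data, distance)
```

In a distributed setting, each quadrant would be a natural unit of work to hand to another worker, which is one plausible reading of how divide-and-conquer and work-stealing fit together; the snippet above does not show how the paper actually combines them.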