Energy-efficient stream compaction through filtering and coalescing accesses in GPGPU memory partitions

Abstract

Graph-based applications are essential in emerging domains such as data analytics and machine learning. Data gathering in a knowledge-based society requires highly efficient data processing, and high-throughput GPGPU architectures are key to enabling efficient graph processing. Nonetheless, the irregular and sparse memory access patterns present in graph-based applications induce high memory divergence and contention, which result in poor GPGPU efficiency for graph processing. Recent work has pointed out the importance of stream compaction operations and has proposed a Stream Compaction Unit (SCU) to offload them to specialized hardware. Separately, the memory contention caused by high divergence has been tackled with the Irregular accesses Reorder Unit (IRU), which delivers improved memory coalescing. In this paper, we propose a new unit, the IRU-enhanced SCU (ISCU), that leverages the strengths of both approaches. The ISCU employs the efficient mechanisms of the IRU to improve the SCU's stream compaction efficiency and overcome its throughput limitations, achieving a synergistic effect for graph processing. We evaluate the ISCU on a wide variety of state-of-the-art graph-based algorithms and applications. Results show that the ISCU achieves a speedup of 2.2x and 90% energy savings, derived from a 78% reduction in memory accesses, while incurring an 8.5% area overhead.

Peer Reviewed. Postprint (author's final draft).
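
For readers unfamiliar with the two operations named in the abstract, the following is a minimal software sketch of generic stream compaction (filtering a stream and packing the survivors contiguously) and of a simple proxy for memory coalescing (the number of distinct cache lines touched by a group of accesses). It is an illustration only and does not model the SCU, IRU, or ISCU hardware; the function names and the 128-byte line size are assumptions chosen for the example.

```python
from itertools import accumulate


def compact(values, keep):
    """Generic stream compaction: keep values[i] where keep[i] is true,
    preserving order. Illustrative only; not the ISCU design."""
    flags = [1 if k else 0 for k in keep]
    # An exclusive prefix sum assigns each surviving element its output slot.
    offsets = [0] + list(accumulate(flags))[:-1]
    out = [None] * sum(flags)
    for i, flag in enumerate(flags):
        if flag:
            out[offsets[i]] = values[i]
    return out


def memory_transactions(addresses, line_bytes=128):
    """Count the distinct cache lines touched by one group of accesses.
    The 128-byte line size is an assumption for this sketch."""
    return len({addr // line_bytes for addr in addresses})


# Hypothetical frontier of node IDs and a mask marking unvisited nodes.
frontier = [7, 3, 9, 4, 1]
unvisited = [True, False, True, False, True]
print(compact(frontier, unvisited))

# Neighbouring addresses share one line; scattered addresses need one line each.
print(memory_transactions([0, 4, 8, 12]))
print(memory_transactions([0, 512, 1024, 4096]))
```

In this sketch, fewer distinct lines for the same number of accesses corresponds to better coalescing, which is the property that the abstract says is degraded by irregular graph access patterns.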
