Active-Routing: Parallelization and Scheduling of 3D-Memory Vault Computations

Abstract

In an age where big data is more available than ever, new high-bandwidth, low-latency memory technology, such as Hybrid Memory Cubes (HMC), have extended into the third dimension to tighten the increasing gap between memory and CPU speeds. Processing power built into these new 3D memory technologies allows CPU cores to offload computations to memory, leading to recent interest in the design space of Processing-In-Memory (PIM) when several HMC units are chained together in a network. Using topology-oblivious Active-Routing technique in such a network, computations like dot products over a large set of data can be distributed across a virtual "tree" such that partial results are compounded at every branch "on the way" back to the CPU. We propose driving performance of Active-Routing by offloading computations to memory with high throughput offloading techniques. We present Vault-Level Parallelism to further parallelize computations by strategically dispatching computations to DRAM vault controllers within each HMC. Our new implementation distributes the resources of Active-Routing to each of the vault controllers in the HMC so as to reduce contention for compute resources. We simulate our implemented techniques and assess their performance using previously developed micro-benchmarks and a widely accepted benchmark in scientific computing. The evaluation results show an increase in overall data throughout the Active-Routing Tree with an aggregate 23x speedup

    Similar works