MPU: Towards Bandwidth-abundant SIMT Processor via Near-bank Computing
With the growing number of data-intensive workloads, the GPU, the
state-of-the-art single-instruction-multiple-thread (SIMT) processor, is
hindered by the memory bandwidth wall. To alleviate this bottleneck, previously
proposed 3D-stacking near-bank computing accelerators benefit from abundant
bank-internal bandwidth by bringing computations closer to the DRAM banks.
However, these accelerators are specialized for certain application domains,
with simple architectural data paths and customized software mapping schemes.
For general-purpose scenarios, lightweight hardware designs for diverse data
paths, architectural support for the SIMT programming model, and end-to-end
software optimizations remain challenging.
To address these issues, we propose MPU (Memory-centric Processing Unit), the
first SIMT processor based on a 3D-stacking near-bank computing architecture.
First, to realize diverse data paths with low overhead while leveraging
bank-level bandwidth, MPU adopts a hybrid pipeline that can offload
instructions to near-bank compute logic. Second, we explore two forms of
architectural support for the SIMT programming model: a near-bank shared
memory design and a multiple-activated-row-buffers enhancement. Third,
we present an end-to-end compilation flow for MPU to support CUDA programs. To
fully utilize MPU's hybrid pipeline, we develop a backend optimization for the
instruction offloading decision. The evaluation results show that MPU achieves
a 3.46x speedup and a 2.57x energy reduction compared with an NVIDIA Tesla V100
GPU on a set of representative data-intensive workloads.
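
To make the offloading decision concrete, the following is a minimal Python sketch of one plausible cost-model heuristic of the kind the abstract alludes to. It is illustrative only: every name, the bandwidth-gain ratio, and the latency constant are assumptions of this sketch, not details drawn from the MPU paper.

# Illustrative sketch only: a toy cost model for deciding whether to offload
# an instruction to near-bank compute logic. All parameters are hypothetical.
from dataclasses import dataclass

@dataclass
class Instr:
    bytes_accessed: int      # DRAM traffic the instruction generates
    operands_near_bank: int  # operands already resident near the banks
    operands_total: int

NEAR_BANK_BW_GAIN = 8.0    # assumed bank-internal vs. base-die bandwidth ratio
OFFLOAD_LATENCY_NS = 20.0  # assumed fixed cost of dispatching near-bank

def should_offload(instr: Instr, base_bytes_per_ns: float = 32.0) -> bool:
    """Offload when the bandwidth saved near the banks outweighs the fixed
    dispatch cost; a real compiler backend would also model dependences
    and pipeline occupancy."""
    locality = instr.operands_near_bank / max(instr.operands_total, 1)
    base_time = instr.bytes_accessed / base_bytes_per_ns
    near_time = instr.bytes_accessed / (base_bytes_per_ns * NEAR_BANK_BW_GAIN)
    # Poor operand locality forces extra data movement, eroding the gain.
    near_time += OFFLOAD_LATENCY_NS + (1.0 - locality) * base_time
    return near_time < base_time

# Example: a bandwidth-bound access with fully local operands is offloaded.
print(should_offload(Instr(bytes_accessed=4096,
                           operands_near_bank=2, operands_total=2)))  # True
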
A Survey of Resource Management for Processing-in-Memory and Near-Memory Processing Architectures
Due to the amount of data involved in emerging deep learning and big data
applications, operations related to data movement have quickly become the
bottleneck. Data-centric computing (DCC), as enabled by processing-in-memory
(PIM) and near-memory processing (NMP) paradigms, aims to accelerate these
types of applications by moving the computation closer to the data. Over the
past few years, researchers have proposed various memory architectures that
enable DCC systems, such as logic layers in 3D-stacked memories or
charge-sharing-based bitwise operations in DRAM. However, application-specific memory
access patterns, power and thermal concerns, memory technology limitations, and
inconsistent performance gains complicate the offloading of computation in DCC
systems. Therefore, designing intelligent resource management techniques for
computation offloading is vital for leveraging the potential offered by this
new paradigm. In this article, we survey the major trends in managing PIM and
NMP-based DCC systems and provide a review of the landscape of resource
management techniques employed by system designers for such systems.
Additionally, we discuss the future challenges and opportunities in DCC
management.

Comment: Accepted to appear in the Journal of Low Power Electronics and Applications.