8,794 research outputs found
Near-Memory Address Translation
Memory and logic integration on the same chip is becoming increasingly cost
effective, creating the opportunity to offload data-intensive functionality to
processing units placed inside memory chips. The introduction of memory-side
processing units (MPUs) into conventional systems faces virtual memory as the
first big showstopper: without efficient hardware support for address
translation MPUs have highly limited applicability. Unfortunately, conventional
translation mechanisms fall short of providing fast translations as
contemporary memories exceed the reach of TLBs, making expensive page walks
common.
In this paper, we are the first to show that the historically important
flexibility to map any virtual page to any page frame is unnecessary in today's
servers. We find that while limiting the associativity of the
virtual-to-physical mapping incurs no penalty, it can break the
translate-then-fetch serialization if combined with careful data placement in
the MPU's memory, allowing for translation and data fetch to proceed
independently and in parallel. We propose the Distributed Inverted Page Table
(DIPTA), a near-memory structure in which the smallest memory partition keeps
the translation information for its data share, ensuring that the translation
completes together with the data fetch. DIPTA completely eliminates the
performance overhead of translation, achieving speedups of up to 3.81x and
2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure
Performance Implications of NoCs on 3D-Stacked Memories: Insights from the Hybrid Memory Cube
Memories that exploit three-dimensional (3D)-stacking technology, which
integrate memory and logic dies in a single stack, are becoming popular. These
memories, such as Hybrid Memory Cube (HMC), utilize a network-on-chip (NoC)
design for connecting their internal structural organizations. This novel usage
of NoC, in addition to aiding processing-in-memory capabilities, enables
numerous benefits such as high bandwidth and memory-level parallelism. However,
the implications of NoCs on the characteristics of 3D-stacked memories in terms
of memory access latency and bandwidth have not been fully explored. This paper
addresses this knowledge gap by (i) characterizing an HMC prototype on the
AC-510 accelerator board and revealing its access latency behaviors, and (ii)
by investigating the implications of such behaviors on system and software
designs
HIPE: HMC Instruction Predication Extension Applied on Database Processing
The recent Hybrid Memory Cube (HMC) is a smart memory which includes functional units inside one logic layer of the 3D stacked memory design. In order to execute instructions inside the Hybrid Memory Cube (HMC), the processor needs to send instructions to be executed near data, keeping most of the pipeline complexity inside the processor. Thus, control-flow and data-flow dependencies are all managed inside the processor, in such way that only update instructions are supported by the HMC. In order to solve data-flow dependencies inside the memory, previous work proposed HMC Instruction Vector Extensions (HIVE), which embeds a high number of functional units with a interlock register bank. In this work we propose HMC Instruction Prediction Extensions (HIPE), that supports predicated execution inside the memory, in order to transform control-flow dependencies into data-flow dependencies. Our mechanism focus on removing the high latency iteration between the processor and the smart memory during the execution of branches that depends on data processed inside the memory. In this paper we evaluate a balanced design of HIVE comparing to x86 and HMC executions. After we show the HIPE mechanism results when executing a database workload, which is a strong candidate to use smart memories. We show interesting trade-offs of performance when comparing our mechanism to previous work
HPC Accelerators with 3D Memory
ArtĂculo invitado, publicado en las actas del congreso por IEEE Society Press. Páginas 320 a 328. ISBN: 978-1-5090-3593-9.DOI 10.1109/CSE-EUC-DCABES-2016.203After a decade evolving in the High Performance Computing arena, GPU-equipped supercomputers have con- quered the top500 and green500 lists, providing us unprecedented levels of computational power and memory bandwidth. This year, major vendors have introduced new accelerators based on 3D memory, like Xeon Phi Knights Landing by Intel and Pascal architecture by Nvidia. This paper reviews hardware features of those new HPC accelerators and unveils potential performance for scientific applications, with an emphasis on Hybrid Memory Cube (HMC) and High Bandwidth Memory (HBM) used by commercial products according to roadmaps already announced.Universidad de Málaga. Campus de Excelencia Internacional Andalucia Tec
- …