Search CORE

15,349 research outputs found

Umzi: Unified Multi-Zone Indexing for Large-Scale HTAP

Author: Barber Ronald
Luo Chen
Raman Vijayshankar
Sidle Richard
Tian Yuanyuan
Tözün Pinar
Publication venue
Publication date: 01/01/2019
Field of study

The IT University of Copenhagen's Repository

Speculative Segmented Sum for Sparse Matrix-Vector Multiplication on Heterogeneous Processors

Author: Liu Weifeng
Vinter Brian
Publication venue: 'Elsevier BV'
Publication date: 14/09/2015
Field of study

Sparse matrix-vector multiplication (SpMV) is a central building block for scientific software and graph applications. Recently, heterogeneous processors composed of different types of cores attracted much attention because of their flexible core configuration and high energy efficiency. In this paper, we propose a compressed sparse row (CSR) format based SpMV algorithm utilizing both types of cores in a CPU-GPU heterogeneous processor. We first speculatively execute segmented sum operations on the GPU part of a heterogeneous processor and generate a possibly incorrect results. Then the CPU part of the same chip is triggered to re-arrange the predicted partial sums for a correct resulting vector. On three heterogeneous processors from Intel, AMD and nVidia, using 20 sparse matrices as a benchmark suite, the experimental results show that our method obtains significant performance improvement over the best existing CSR-based SpMV algorithms. The source code of this work is downloadable at https://github.com/bhSPARSE/Benchmark_SpMV_using_CSRComment: 22 pages, 8 figures, Published at Parallel Computing (PARCO

arXiv.org e-Print Archive

Copenhagen University Research Information System

Near-Memory Address Translation

Author: Falsafi Babak
Jevdjic Djordje
Picorel Javier
Publication venue
Publication date: 21/08/2017
Field of study

Memory and logic integration on the same chip is becoming increasingly cost effective, creating the opportunity to offload data-intensive functionality to processing units placed inside memory chips. The introduction of memory-side processing units (MPUs) into conventional systems faces virtual memory as the first big showstopper: without efficient hardware support for address translation MPUs have highly limited applicability. Unfortunately, conventional translation mechanisms fall short of providing fast translations as contemporary memories exceed the reach of TLBs, making expensive page walks common. In this paper, we are the first to show that the historically important flexibility to map any virtual page to any page frame is unnecessary in today's servers. We find that while limiting the associativity of the virtual-to-physical mapping incurs no penalty, it can break the translate-then-fetch serialization if combined with careful data placement in the MPU's memory, allowing for translation and data fetch to proceed independently and in parallel. We propose the Distributed Inverted Page Table (DIPTA), a near-memory structure in which the smallest memory partition keeps the translation information for its data share, ensuring that the translation completes together with the data fetch. DIPTA completely eliminates the performance overhead of translation, achieving speedups of up to 3.81x and 2.13x over conventional translation using 4KB and 1GB pages respectively.Comment: 15 pages, 9 figure

arXiv.org e-Print Archive

Infoscience - École polytechnique fédérale de Lausanne

Crossref