Quantifying the Effect of Matrix Structure on Multithreaded Performance of the SpMV Kernel

Author: Keltcher Paul
Kimball Daniel
Michel Elizabeth
Wolf Michael M.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 30/07/2014

Sparse matrix-vector multiplication (SpMV) is the core operation in many common network and graph analytics, but poor performance of the SpMV kernel handicaps these applications. This work quantifies the effect of matrix structure on SpMV performance, using Intel's VTune tool for the Sandy Bridge architecture. Two types of sparse matrices are considered: finite difference (FD) matrices, which are structured, and R-MAT matrices, which are unstructured. Analysis of cache behavior and prefetcher activity reveals that the SpMV kernel performs far worse with R-MAT matrices than with FD matrices, due to the difference in matrix structure. To address the problems caused by unstructured matrices, novel architecture improvements are proposed.

Comment: 6 pages, 7 figures. IEEE HPEC 201
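For readers unfamiliar with the kernel this abstract refers to, the following is a minimal sketch of SpMV over a compressed sparse row (CSR) layout. The CSR format, the function name `spmv_csr`, and the tiny tridiagonal example are assumptions chosen for illustration; the abstract does not say which storage format or code the authors measured.

```python
def spmv_csr(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a sparse matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i+1] delimits the nonzeros of row i;
    col_idx[k] and values[k] give the column and value of nonzero k.
    """
    n_rows = len(row_ptr) - 1
    y = [0.0] * n_rows
    for i in range(n_rows):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            # x is read through col_idx: an indirect access whose
            # locality depends on where the nonzeros sit in the row.
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


# Illustrative 3x3 tridiagonal matrix (a 1D finite-difference stencil):
# [[ 2, -1,  0],
#  [-1,  2, -1],
#  [ 0, -1,  2]]
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
x = [1.0, 2.0, 3.0]

y = spmv_csr(row_ptr, col_idx, values, x)
print(y)  # [0.0, 0.0, 4.0]
```

In a banded matrix like this one, consecutive nonzeros in a row touch neighboring entries of `x`. In an unstructured matrix the column indices are scattered, so the same loop reads `x` at widely separated positions.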

arXiv.org e-Print Archive

Crossref

embedded DRAM,

Author: Paul Keltcher
Stephen Richardson
Stuart Siu

cache hierarchy, pageable memory
© Copyright Hewlett-Packard Company 2000

Recent architectures in academia and industry have explored placing multiple processors on a single chip, but a consensus has not emerged on the memory architecture. The recent availability of embedded DRAM (EDRAM) has further complicated the formula. In this investigation, we present a new and comprehensive comparison of four very different memory technologies in the same framework: SRAM cache, SRAM configured as pageable memory, EDRAM configured as cache, and EDRAM configured as pageable memory. In addition, these experiments investigate tradeoffs between two levels of on-chip memory, given constant silicon area: as the level one capacity increases, the level two capacity decreases. Having four processors on a single die, each with its own set of level one caches, helps exaggerate the effective memory tradeoffs.

CiteSeerX