
4 research outputs found

ShenZhen transportation system (SZTS): a novel big data benchmark suite

Authors: Bei Zhengdong
Eeckhout Lieven
Xiong Wen
Xu Chengzhong
Yu Zhibin
Zhang Fan
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Data analytics is at the core of the supply chain for both products and services in modern economies and societies. Big data workloads, however, are placing unprecedented demands on computing technologies, calling for a deep understanding and characterization of these emerging workloads. In this paper, we propose ShenZhen Transportation System (SZTS), a novel big data Hadoop benchmark suite composed of real-life transportation analysis applications with real-life input data sets from Shenzhen in China. SZTS uniquely focuses on a specific and real-life application domain, whereas other existing Hadoop benchmark suites, such as HiBench and CloudRank-D, consist of generic algorithms with synthetic inputs. We perform a cross-layer workload characterization at the microarchitecture level, the operating system (OS) level, and the job level, revealing unique characteristics of SZTS compared to existing Hadoop benchmarks as well as general-purpose multi-core PARSEC benchmarks. We also study the sensitivity of workload behavior with respect to input data size, and we propose a methodology for identifying representative input data sets.

Ghent University Academic Bibliography

A methodology for speeding up matrix vector multiplication for single/multi-core architectures

Authors: Angeliki Kritikakou
B Hendrickson
Costas Goutis
Elissavet Papadima
HR Arabnia
MA Wani
N Fujimoto
N Zhang
P Kulkarni
RC Whaley
SM Bhandarkar
Vasilios Kelefouras
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 29/03/2015

This paper presents a new methodology for computing dense matrix-vector multiplication on both embedded processors (without a SIMD unit) and general-purpose processors (single- and multi-core, with a SIMD unit). The methodology achieves higher execution speed than the state-of-the-art ATLAS library (speedups from 1.2 to 1.45). This is achieved by fully exploiting the combination of software parameters (e.g., data reuse) and hardware parameters (e.g., data cache associativity), which are considered simultaneously as one problem rather than separately, giving a smaller search space and high-quality solutions. The proposed methodology produces a different schedule for different values of (i) the number of data cache levels; (ii) the data cache sizes; (iii) the data cache associativities; (iv) the data cache and main memory latencies; (v) the data array layout of the matrix; and (vi) the number of cores.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Sheffield Hallam University Research Archive

HAL Descartes

Hal-Diderot

HAL-Rennes 1

A high-performance matrix-matrix multiplication methodology for CPU and GPU architectures

Authors: A. Kritikakou
B Moon
DF Bacon
F Desprez
G Shobaki
HR Arabnia
Iosif Mporas
J Kurzak
K Goto
KD Cooper
M Hattori
M Kulkarni
M Stephenson
M Tartara
MA Wani
N Binkert
N Nethercote
P Bjørstad
P Kulkarni
PA Kulkarni
R Nath
RC Whaley
RD Blumofe
SM Bhandarkar
SS Pinter
T Austin
V Strassen
Vasilios Kelefouras
Vasilios Kolonias
VI Kelefouras
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2016

Current compilers cannot generate code that competes with hand-tuned code in efficiency, even for a simple kernel like matrix–matrix multiplication (MMM). A key step in program optimization is the estimation of optimal values for parameters such as tile sizes and the number of levels of tiling. Selecting the scheduling parameter values is a very difficult and time-consuming task, since the parameter values depend on each other; this is why they are usually found by search methods and empirical techniques. To overcome this problem, the scheduling sub-problems must be optimized together, as one problem and not separately. In this paper, an MMM methodology is presented in which the optimum scheduling parameters are found by decreasing the search space theoretically, while the major scheduling sub-problems are addressed together as one problem, according to the hardware architecture parameters and input size; for different hardware architecture parameters and/or input sizes, a different implementation is produced. This is achieved by fully exploiting the software characteristics (e.g., data reuse) and hardware architecture parameters (e.g., data cache sizes and associativities), giving high-quality solutions and a smaller search space. The methodology applies to a wide range of CPU and GPU architectures.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

Sheffield Hallam University Research Archive

HAL Descartes

University of Hertfordshire Research Archive

Hal-Diderot

HAL-Rennes 1

Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs

Authors: Aristidis Sotiropoulos
C.-T. King
D. Patterson
E. Hodzic
F. Desprez
G. Goumas
Georgios Tsoukalas
H. R. Arabnia
J. Ramanujam
J. Xue
J.-P. Sheu
K. Hogstedt
M. Kandemir
Maria Athanasaki
N. J. Boden
N. Manjikian
N. Park
Nectarios Koziris
P. Boulet
P. Tsanakas
Panayiotis Tsanakas
S. M. Bhandarkar
T. Andronikos
Publication venue: 'Springer Science and Business Media LLC'

Crossref