Search CORE

2 research outputs found

Fine-grained Benchmark Subsetting for System Selection

Author: Chadi Akel
Mihail Popov
Oliveira Castro
Pablo De
William Jalby
Yuriy Kashnikov
Publication venue
Publication date: 23/04/2020
Field of study

ABSTRACT System selection aims at finding the best architecture for a set of programs and workloads. It traditionally requires long running benchmarks. We propose a method to reduce the cost of system selection. We break down benchmarks into elementary fragments of source code, called codelets. Then, we identify two causes of redundancy: first, similar codelets; second, codelets called repeatedly. The key idea is to minimize redundancy inside the benchmark suite to speed it up. For each group of similar codelets, only one representative is kept. For codelets called repeatedly and for which the performance does not vary across calls, the number of invocations is reduced. Given an initial benchmark suite, our method produces a set of reduced benchmarks that can be used in place of the original one for system selection. We evaluate our method on the NAS SER benchmarks, producing a reduced benchmark suite 30 times faster in average than the original suite, with a maximum of 44 times. The reduced suite predicts the execution time on three target architectures with a median error between 3.9% and 8%

CiteSeerX

Evaluating Architecture and Compiler Design through Static Loop Analysis

Author: de Oliveira Castro Pablo
Jalby William
Kashnikov Yuriy
Oseret Emmanuel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2013
Field of study

International audienceUsing the MAQAO loop static analyzer, we characterize a corpus of binary loops extracted from common benchmark suits such as SPEC, NAS, etc. and several industrial applications. For each loop, MAQAO extracts low-level assembly features such as: integer and floating-point vectorization ratio, number of registers used and spill-fill, number of concurrent memory streams accessed, etc. The distributions of these features on a large representative code corpus can be used to evaluate compilers and architectures and tune them for the most frequently used assembly patterns. In this paper, we present the MAQAO loop analyzer and a characterization of the 4857 binary loops. We evaluate register allocation and vectorization on two compilers and propose a method to tune loop buffer size and stream prefetcher based on static analysis of benchmarks

HAL UVSQ