Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP
SPEC CPU is one of the most common benchmark suites used in computer architecture research. CPU2017 was recently released to replace CPU2006. In this paper we present a detailed evaluation of memory hierarchy performance for both the CPU2006 and single-threaded CPU2017 benchmarks. The experiments were executed on an Intel Xeon Skylake-SP, the first Intel processor to implement a mostly non-inclusive last-level cache (LLC). We present a classification of the benchmarks according to their memory pressure and analyze the performance impact of different LLC sizes. We also test all the hardware prefetchers, showing that they improve performance in most of the benchmarks. After comprehensive experimentation, we highlight the following conclusions: i) almost half of the SPEC CPU benchmarks have very low miss ratios in the second- and third-level caches, even with small LLC sizes and without hardware prefetching; ii) overall, the SPEC CPU2017 benchmarks demand even fewer memory hierarchy resources than the SPEC CPU2006 ones; iii) hardware prefetching is very effective at reducing LLC misses for most benchmarks, even with the smallest LLC size; and iv) from the memory hierarchy standpoint, the methodologies commonly used to select benchmarks or simulation points do not guarantee representative workloads.
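The kind of memory-pressure classification described above can be derived from hardware performance counters. A minimal sketch (counter names, values, and the threshold are illustrative, not the paper's):

```python
# Sketch: classify a benchmark by memory pressure from (hypothetical)
# hardware-counter readings. The 1% threshold is illustrative only.

def miss_ratio(misses, accesses):
    """Fraction of cache accesses that miss; 0.0 if the cache saw no accesses."""
    return misses / accesses if accesses else 0.0

def classify(l2_miss_ratio, l3_miss_ratio, low=0.01):
    """Label a workload 'low memory pressure' when both miss ratios are tiny."""
    if l2_miss_ratio < low and l3_miss_ratio < low:
        return "low memory pressure"
    return "memory intensive"

counters = {"l2_accesses": 1_000_000, "l2_misses": 4_200,
            "l3_accesses": 4_200, "l3_misses": 900}
l2 = miss_ratio(counters["l2_misses"], counters["l2_accesses"])
l3 = miss_ratio(counters["l3_misses"], counters["l3_accesses"])
print(classify(l2, l3))  # -> memory intensive (L3 miss ratio ~0.21)
```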
Memory Centric Characterization and Analysis of SPEC CPU2017 Suite
In this paper we provide a comprehensive, memory-centric characterization of the SPEC CPU2017 benchmark suite, using a number of mechanisms including dynamic binary instrumentation, measurements on native hardware using hardware performance counters, and OS-based tools.
We present a number of results including working set sizes, memory capacity consumption, and memory bandwidth utilization of various workloads. Our experiments reveal that the SPEC CPU2017 workloads are surprisingly memory intensive, with approximately 50% of all dynamic instructions being memory operations. We also show that there is a large variation in the memory footprint and bandwidth utilization profiles across the suite, with some benchmarks using as much as 16 GB of main memory and up to 2.3 GB/s of memory bandwidth.
We also perform an instruction execution and distribution analysis of the suite and find that the average instruction count of the SPEC CPU2017 workloads is an order of magnitude higher than that of the SPEC CPU2006 ones. In addition, we find that the FP benchmarks of the SPEC CPU2017 suite have higher compute requirements: on average, FP workloads execute three times as many compute operations as INT workloads.
Comment: 12 pages, 133 figures. A short version of this work has been published in the Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering.
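Working-set sizes like those reported above are typically estimated from a memory-address trace gathered by dynamic binary instrumentation. A minimal sketch with a synthetic trace (real traces come from tools such as Pin):

```python
# Sketch: estimate a workload's working-set size by counting the distinct
# 4 KiB pages touched in a memory-address trace. Trace values are made up.

PAGE_SIZE = 4096

def working_set_bytes(addresses):
    """Total size of the distinct pages touched by the trace."""
    pages = {addr // PAGE_SIZE for addr in addresses}
    return len(pages) * PAGE_SIZE

trace = [0x1000, 0x1008, 0x2F00, 0x5000, 0x5FFF]
print(working_set_bytes(trace))  # -> 12288 (three distinct pages)
```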
CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code
We present an instrumenting compiler for enforcing data confidentiality in
low-level applications (e.g. those written in C) in the presence of an active
adversary. In our approach, the programmer marks secret data by writing
lightweight annotations on top-level definitions in the source code. The
compiler then uses a static flow analysis coupled with efficient runtime
instrumentation, a custom memory layout, and custom control-flow integrity
checks to prevent data leaks even in the presence of low-level attacks. We have
implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC
micro-benchmarks for performance, and on larger, real-world applications
(including OpenLDAP, which is around 300 KLoC) for the programmer overhead required to restructure the application when protecting sensitive data such as passwords. We find that the performance overheads introduced by our instrumentation are moderate (12% on average on SPEC), and the programmer effort to port OpenLDAP is only about 160 LoC.
Comment: Technical report for CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code, appearing at EuroSys 2019.
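The static flow analysis sketched above propagates secrecy from annotated definitions through the program. A toy version of such taint propagation (the data model is hypothetical; the real compiler handles control flow, pointers, and adds runtime checks):

```python
# Sketch: a toy forward taint analysis in the spirit of the static flow
# analysis described above. Secret-marked variables taint anything
# assigned from them; iterate to a fixed point.

def propagate(assignments, secrets):
    """assignments: list of (lhs, [rhs vars]); returns the set of tainted vars."""
    tainted = set(secrets)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in assignments:
            if any(v in tainted for v in rhs) and lhs not in tainted:
                tainted.add(lhs)
                changed = True
    return tainted

prog = [("tmp", ["password"]), ("out", ["tmp"]), ("log", ["counter"])]
print(propagate(prog, {"password"}))  # password, tmp, out are tainted
```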
Nonaxisymmetric, multi-region relaxed magnetohydrodynamic equilibrium solutions
We describe a magnetohydrodynamic (MHD) constrained energy functional for
equilibrium calculations that combines the topological constraints of ideal MHD
with elements of Taylor relaxation.
Extremizing states allow for partially chaotic magnetic fields and
non-trivial pressure profiles supported by a discrete set of ideal interfaces
with irrational rotational transforms.
Numerical solutions are computed using the Stepped Pressure Equilibrium Code, SPEC, and benchmarks and convergence calculations are presented.
Comment: Submitted to Plasma Physics and Controlled Fusion for publication with a cluster of papers associated with the workshop Stability and Nonlinear Dynamics of Plasmas, October 31, 2009, Atlanta, GA, on the occasion of the 65th birthday of R.L. Dewar. V2 is revised following referee comments.
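For context, the constrained energy functional that such multi-region relaxed-MHD calculations extremize is commonly written in the literature in the following form (a sketch of the standard formulation with N nested volumes, not reproduced from this paper; the notation here may differ from the authors'):

```latex
F \;=\; \sum_{l=1}^{N} \left[\, \int_{\mathcal{V}_l}
      \left( \frac{p_l}{\gamma - 1} + \frac{B^2}{2\mu_0} \right) \mathrm{d}v
      \;-\; \frac{\mu_l}{2}\bigl( K_l - K_{l,0} \bigr) \right],
\qquad
K_l \;=\; \int_{\mathcal{V}_l} \mathbf{A}\cdot\mathbf{B}\,\mathrm{d}v ,
```

where each volume $\mathcal{V}_l$ relaxes à la Taylor subject to its helicity constraint $K_{l,0}$, and the ideal interfaces between volumes support the stepped pressure profile.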
Memory Performance Characterization of SPEC CPU2006 Benchmarks Using TSIM
This paper uses TSIM, a cycle-accurate architecture simulator, to characterize the memory performance of the SPEC CPU2006 benchmarks on a CMP platform. The experiments cover 54 workloads with different input sets and collect statistics on instruction mix and cache behavior. By detecting cyclical changes in MPKI, the paper clearly exposes the memory-performance phases of some SPEC CPU2006 programs. These performance data and analysis results not only help program developers and architects better understand the memory performance implications of the system architecture, but also guide software and system optimization.
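MPKI (misses per kilo-instructions), the metric used above for phase detection, is straightforward to compute per instruction interval. A minimal sketch with synthetic interval data:

```python
# Sketch: compute MPKI over fixed instruction intervals; a spike in the
# series marks a memory-performance phase change. Interval data is made up.

def mpki(misses, instructions):
    """Cache misses per thousand executed instructions."""
    return 1000.0 * misses / instructions

# (instructions, misses) per interval
intervals = [(1_000_000, 800), (1_000_000, 24_000), (1_000_000, 900)]
series = [mpki(m, n) for n, m in intervals]
print(series)  # -> [0.8, 24.0, 0.9]
```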
Measuring program similarity for efficient benchmarking and performance analysis of computer systems
Computer benchmarking involves running a set of benchmark programs to measure the performance of a computer system. Modern benchmarks are developed from real applications; as applications become more complex, modern benchmarks run for a very long time. These benchmarks are also used for performance evaluation in the early design phase of microprocessors. Due to the size of the benchmarks and the increasing complexity of microprocessor design, the effort required for performance evaluation has grown significantly. This dissertation proposes methodologies to reduce the effort of benchmarking and performance evaluation of computer systems.
Identifying a set of programs that can be used for benchmarking can be very challenging. A solution starts by measuring similarity between programs to capture the diversity in their behavior before they are considered for benchmarking. The aim of this methodology is to identify redundancy in a set of benchmarks and find a subset of representative benchmarks with the least possible loss of information. This dissertation proposes the use of program characteristics that capture the performance behavior of programs, and identifies representative benchmarks applicable over a wide range of system configurations. The use of benchmark subsetting has not been restricted to academic research: the SPEC CPU subcommittee recently used similarity measured from program behavior characteristics as one of the criteria for selecting the SPEC CPU2006 benchmarks.
The information about similarity between programs can also be used to predict the performance of an application when it is difficult to port the application to different platforms, a common problem when a customer wants to buy the best computer system for their application. The performance of a customer's application on a particular system can be predicted using the performance scores of the standard benchmarks on that system and the similarity between the application and the benchmarks. Similarity between programs is quantified by the distance between them in the space of the measured characteristics, and the performance of a new application is predicted from the performance scores of its neighbors in the workload space.
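The prediction scheme described above amounts to nearest-neighbor regression in the space of program characteristics. A minimal sketch (features, scores, and the choice of Euclidean distance and k=2 are illustrative, not the dissertation's exact method):

```python
# Sketch: predict an application's performance score from the scores of
# its nearest benchmarks in a feature space of program characteristics.
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(app_features, benchmarks, k=2):
    """benchmarks: list of (features, score); average the k nearest scores."""
    ranked = sorted(benchmarks, key=lambda fs: distance(app_features, fs[0]))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)

suite = [([0.1, 0.9], 10.0), ([0.2, 0.8], 12.0), ([0.9, 0.1], 40.0)]
print(predict([0.15, 0.85], suite))  # -> 11.0, mean of the two neighbors
```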
Improving Uniformity of Cache Access Pattern using Split Data Caches
In this paper we show that partitioning the data cache into separate array and scalar caches can improve the cache access pattern without remapping data, while maintaining the constant access time of a direct-mapped cache and improving the performance of L1 cache memories. Using four central moments (mean, standard deviation, skewness, and kurtosis), we report on the frequency of accesses to cache sets and show that split data caches significantly mitigate the problem of non-uniform accesses to cache sets for several embedded benchmarks (from MiBench) and some SPEC benchmarks.
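The four moments named above quantify how (non-)uniformly accesses spread across cache sets. A minimal sketch over illustrative per-set access counts:

```python
# Sketch: mean, standard deviation, skewness, and kurtosis of per-set
# access counts; large skewness signals a non-uniform access pattern.
import statistics

def moments(counts):
    mean = statistics.fmean(counts)
    std = statistics.pstdev(counts)
    if std == 0:
        return mean, 0.0, 0.0, 0.0  # perfectly uniform accesses
    n = len(counts)
    skew = sum((c - mean) ** 3 for c in counts) / (n * std ** 3)
    kurt = sum((c - mean) ** 4 for c in counts) / (n * std ** 4)
    return mean, std, skew, kurt

uniform = [100, 100, 100, 100]
hot_set = [10, 10, 10, 370]          # one heavily used cache set
print(moments(uniform))   # zero spread: accesses evenly distributed
print(moments(hot_set))   # positive skewness: accesses concentrated
```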
Workload generation for microprocessor performance evaluation
This PhD thesis [1], awarded the SPEC Distinguished Dissertation Award 2011, proposes and studies three workload generation and reduction techniques for microprocessor performance evaluation. (1) The thesis proposes code mutation, a novel methodology for hiding proprietary information in computer programs while maintaining representative behavior; code mutation enables disseminating proprietary applications as benchmarks to third parties in both academia and industry. (2) It contributes to sampled simulation by proposing NSL-BLRL, a novel warm-up technique that reduces simulation time by an order of magnitude over the state of the art. (3) It presents a benchmark synthesis framework for generating synthetic benchmarks from a set of desired program statistics. The benchmarks are generated in a high-level programming language, which enables both compiler and hardware exploration.
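The benchmark-synthesis idea in (3), reduced to its core, is to emit code whose statistics match a desired profile. A minimal sketch matching only an instruction mix (real frameworks also model dependencies, branch behavior, and memory strides; the op names are made up):

```python
# Sketch: emit a stream of operations whose mix matches a desired
# program statistic, the simplest form of statistics-driven synthesis.

def synthesize(mix, total):
    """mix: {op: fraction}; returns a list of `total` ops matching the mix."""
    stream = []
    for op, frac in mix.items():
        stream.extend([op] * round(frac * total))
    return stream[:total]

trace = synthesize({"load": 0.3, "store": 0.2, "alu": 0.5}, 10)
print(trace.count("load"), trace.count("store"), trace.count("alu"))  # 3 2 5
```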