Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP
SPEC CPU is one of the most common benchmark suites used in computer architecture research. CPU2017 was recently released to replace CPU2006. In this paper we present a detailed evaluation of memory hierarchy performance for both the CPU2006 and single-threaded CPU2017 benchmarks. The experiments were executed on an Intel Xeon Skylake-SP, the first Intel processor to implement a mostly non-inclusive last-level cache (LLC). We present a classification of the benchmarks according to their memory pressure and analyze the performance impact of different LLC sizes. We also test all the hardware prefetchers, showing that they improve performance in most of the benchmarks. After comprehensive experimentation, we highlight the following conclusions: i) almost half of the SPEC CPU benchmarks have very low miss ratios in the second- and third-level caches, even with small LLC sizes and without hardware prefetching; ii) overall, the SPEC CPU2017 benchmarks demand even fewer memory hierarchy resources than the SPEC CPU2006 ones; iii) hardware prefetching is very effective at reducing LLC misses for most benchmarks, even with the smallest LLC size; and iv) from the memory hierarchy standpoint, the methodologies commonly used to select benchmarks or simulation points do not guarantee representative workloads.
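The kind of memory-pressure classification described above can be derived from hardware performance counters. A minimal sketch (counter names, values, and the threshold are illustrative, not the paper's):

```python
# Sketch: classify a benchmark by memory pressure from (hypothetical)
# hardware-counter readings. The 1% threshold is illustrative only.

def miss_ratio(misses, accesses):
    """Fraction of cache accesses that miss; 0.0 if the cache saw no accesses."""
    return misses / accesses if accesses else 0.0

def classify(l2_miss_ratio, l3_miss_ratio, low=0.01):
    """Label a workload 'low memory pressure' when both miss ratios are tiny."""
    if l2_miss_ratio < low and l3_miss_ratio < low:
        return "low memory pressure"
    return "memory intensive"

counters = {"l2_accesses": 1_000_000, "l2_misses": 4_200,
            "l3_accesses": 4_200, "l3_misses": 900}
l2 = miss_ratio(counters["l2_misses"], counters["l2_accesses"])
l3 = miss_ratio(counters["l3_misses"], counters["l3_accesses"])
print(classify(l2, l3))  # -> memory intensive (L3 miss ratio ~0.21)
```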
Memory Centric Characterization and Analysis of SPEC CPU2017 Suite
In this paper we provide a comprehensive, memory-centric characterization of the SPEC CPU2017 benchmark suite, using a number of mechanisms including dynamic binary instrumentation, measurements on native hardware using hardware performance counters, and OS-based tools.
We present a number of results including working set sizes, memory capacity consumption, and memory bandwidth utilization of various workloads. Our experiments reveal that the SPEC CPU2017 workloads are surprisingly memory intensive, with approximately 50% of all dynamic instructions being memory operations. We also show that there is a large variation in the memory footprint and bandwidth utilization profiles across the suite, with some benchmarks using as much as 16 GB of main memory and up to 2.3 GB/s of memory bandwidth.
We also perform an instruction execution and distribution analysis of the suite and find that the average instruction count of the SPEC CPU2017 workloads is an order of magnitude higher than that of the SPEC CPU2006 ones. In addition, we find that the FP benchmarks of the SPEC CPU2017 suite have higher compute requirements: on average, FP workloads execute three times as many compute operations as INT workloads.
Comment: 12 pages, 133 figures. A short version of this work has been published in the Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering.
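Working-set sizes like those reported above are typically estimated from a memory-address trace gathered by dynamic binary instrumentation. A minimal sketch with a synthetic trace (real traces come from tools such as Pin):

```python
# Sketch: estimate a workload's working-set size by counting the distinct
# 4 KiB pages touched in a memory-address trace. Trace values are made up.

PAGE_SIZE = 4096

def working_set_bytes(addresses):
    """Total size of the distinct pages touched by the trace."""
    pages = {addr // PAGE_SIZE for addr in addresses}
    return len(pages) * PAGE_SIZE

trace = [0x1000, 0x1008, 0x2F00, 0x5000, 0x5FFF]
print(working_set_bytes(trace))  # -> 12288 (three distinct pages)
```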
CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code
We present an instrumenting compiler for enforcing data confidentiality in
low-level applications (e.g. those written in C) in the presence of an active
adversary. In our approach, the programmer marks secret data by writing
lightweight annotations on top-level definitions in the source code. The
compiler then uses a static flow analysis coupled with efficient runtime
instrumentation, a custom memory layout, and custom control-flow integrity
checks to prevent data leaks even in the presence of low-level attacks. We have
implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC
micro-benchmarks for performance, and on larger, real-world applications
(including OpenLDAP, which is around 300 KLoC) for the programmer overhead required to restructure the application when protecting sensitive data such as passwords. We find that the performance overheads introduced by our instrumentation are moderate (12% on average on SPEC), and the programmer effort to port OpenLDAP is only about 160 LoC.
Comment: Technical report for CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code, appearing at EuroSys 2019.
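The static flow analysis sketched above propagates secrecy from annotated definitions through the program. A toy version of such taint propagation (the data model is hypothetical; the real compiler handles control flow, pointers, and adds runtime checks):

```python
# Sketch: a toy forward taint analysis in the spirit of the static flow
# analysis described above. Secret-marked variables taint anything
# assigned from them; iterate to a fixed point.

def propagate(assignments, secrets):
    """assignments: list of (lhs, [rhs vars]); returns the set of tainted vars."""
    tainted = set(secrets)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in assignments:
            if any(v in tainted for v in rhs) and lhs not in tainted:
                tainted.add(lhs)
                changed = True
    return tainted

prog = [("tmp", ["password"]), ("out", ["tmp"]), ("log", ["counter"])]
print(propagate(prog, {"password"}))  # password, tmp, out are tainted
```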
Nonaxisymmetric, multi-region relaxed magnetohydrodynamic equilibrium solutions
We describe a magnetohydrodynamic (MHD) constrained energy functional for
equilibrium calculations that combines the topological constraints of ideal MHD
with elements of Taylor relaxation.
Extremizing states allow for partially chaotic magnetic fields and
non-trivial pressure profiles supported by a discrete set of ideal interfaces
with irrational rotational transforms.
Numerical solutions are computed using the Stepped Pressure Equilibrium Code, SPEC, and benchmarks and convergence calculations are presented.
Comment: Submitted to Plasma Physics and Controlled Fusion for publication with a cluster of papers associated with the workshop Stability and Nonlinear Dynamics of Plasmas, October 31, 2009, Atlanta, GA, on the occasion of the 65th birthday of R.L. Dewar. V2 is revised following referee comments.
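For context, the constrained energy functional that such multi-region relaxed-MHD calculations extremize is commonly written in the literature in the following form (a sketch of the standard formulation with N nested volumes, not reproduced from this paper; the notation here may differ from the authors'):

```latex
F \;=\; \sum_{l=1}^{N} \left[\, \int_{\mathcal{V}_l}
      \left( \frac{p_l}{\gamma - 1} + \frac{B^2}{2\mu_0} \right) \mathrm{d}v
      \;-\; \frac{\mu_l}{2}\bigl( K_l - K_{l,0} \bigr) \right],
\qquad
K_l \;=\; \int_{\mathcal{V}_l} \mathbf{A}\cdot\mathbf{B}\,\mathrm{d}v ,
```

where each volume $\mathcal{V}_l$ relaxes à la Taylor subject to its helicity constraint $K_{l,0}$, and the ideal interfaces between volumes support the stepped pressure profile.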
Memory Performance Characterization of SPEC CPU2006 Benchmarks Using TSIM
This paper uses TSIM, a cycle-accurate architecture simulator, to characterize the memory performance of the SPEC CPU2006 benchmarks on a CMP platform. The experiments cover 54 workloads with different input sets and collect statistics on instruction mix and cache behavior. By detecting cyclical changes in MPKI, the paper clearly exposes the memory-performance phases of some SPEC CPU2006 programs. These performance data and analysis results not only help program developers and architects better understand the memory performance implications of the system architecture, but also guide software and system optimization.
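MPKI (misses per kilo-instructions), the metric used above for phase detection, is straightforward to compute per instruction interval. A minimal sketch with synthetic interval data:

```python
# Sketch: compute MPKI over fixed instruction intervals; a spike in the
# series marks a memory-performance phase change. Interval data is made up.

def mpki(misses, instructions):
    """Cache misses per thousand executed instructions."""
    return 1000.0 * misses / instructions

# (instructions, misses) per interval
intervals = [(1_000_000, 800), (1_000_000, 24_000), (1_000_000, 900)]
series = [mpki(m, n) for n, m in intervals]
print(series)  # -> [0.8, 24.0, 0.9]
```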
Measuring program similarity for efficient benchmarking and performance analysis of computer systems
Computer benchmarking involves running a set of benchmark programs to measure the performance of a computer system. Modern benchmarks are developed from real applications; as applications become more complex, modern benchmarks run for a very long time. These benchmarks are also used for performance evaluation in the early design phase of microprocessors. Due to the size of the benchmarks and the increasing complexity of microprocessor design, the effort required for performance evaluation has grown significantly. This dissertation proposes methodologies to reduce the effort of benchmarking and performance evaluation of computer systems.
Identifying a set of programs that can be used for benchmarking can be very challenging. A solution starts by measuring similarity between programs to capture the diversity in their behavior before they are considered for benchmarking. The aim of this methodology is to identify redundancy in a set of benchmarks and find a subset of representative benchmarks with the least possible loss of information. This dissertation proposes the use of program characteristics that capture the performance behavior of programs, and identifies representative benchmarks applicable over a wide range of system configurations. The use of benchmark subsetting has not been restricted to academic research: the SPEC CPU subcommittee recently used similarity measured from program behavior characteristics as one of the criteria for selecting the SPEC CPU2006 benchmarks.
The information about similarity between programs can also be used to predict the performance of an application when it is difficult to port the application to different platforms, a common problem when a customer wants to buy the best computer system for their application. The performance of a customer's application on a particular system can be predicted using the performance scores of the standard benchmarks on that system and the similarity between the application and the benchmarks. Similarity between programs is quantified by the distance between them in the space of the measured characteristics, and the performance of a new application is predicted from the performance scores of its neighbors in the workload space.
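The prediction scheme described above amounts to nearest-neighbor regression in the space of program characteristics. A minimal sketch (features, scores, and the choice of Euclidean distance and k=2 are illustrative, not the dissertation's exact method):

```python
# Sketch: predict an application's performance score from the scores of
# its nearest benchmarks in a feature space of program characteristics.
import math

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict(app_features, benchmarks, k=2):
    """benchmarks: list of (features, score); average the k nearest scores."""
    ranked = sorted(benchmarks, key=lambda fs: distance(app_features, fs[0]))
    nearest = ranked[:k]
    return sum(score for _, score in nearest) / len(nearest)

suite = [([0.1, 0.9], 10.0), ([0.2, 0.8], 12.0), ([0.9, 0.1], 40.0)]
print(predict([0.15, 0.85], suite))  # -> 11.0, mean of the two neighbors
```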
Improving Uniformity of Cache Access Pattern using Split Data Caches
In this paper we show that partitioning the data cache into separate array and scalar caches can improve the cache access pattern without remapping data, while maintaining the constant access time of a direct-mapped cache and improving the performance of L1 cache memories. Using four central moments (mean, standard deviation, skewness, and kurtosis), we report on the frequency of accesses to cache sets and show that split data caches significantly mitigate the problem of non-uniform accesses to cache sets for several embedded benchmarks (from MiBench) and some SPEC benchmarks.
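The four moments named above quantify how (non-)uniformly accesses spread across cache sets. A minimal sketch over illustrative per-set access counts:

```python
# Sketch: mean, standard deviation, skewness, and kurtosis of per-set
# access counts; large skewness signals a non-uniform access pattern.
import statistics

def moments(counts):
    mean = statistics.fmean(counts)
    std = statistics.pstdev(counts)
    if std == 0:
        return mean, 0.0, 0.0, 0.0  # perfectly uniform accesses
    n = len(counts)
    skew = sum((c - mean) ** 3 for c in counts) / (n * std ** 3)
    kurt = sum((c - mean) ** 4 for c in counts) / (n * std ** 4)
    return mean, std, skew, kurt

uniform = [100, 100, 100, 100]
hot_set = [10, 10, 10, 370]          # one heavily used cache set
print(moments(uniform))   # zero spread: accesses evenly distributed
print(moments(hot_set))   # positive skewness: accesses concentrated
```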
Workload generation for microprocessor performance evaluation
This PhD thesis [1], awarded the SPEC Distinguished Dissertation Award 2011, proposes and studies three workload generation and reduction techniques for microprocessor performance evaluation. (1) The thesis proposes code mutation, a novel methodology for hiding proprietary information in computer programs while maintaining representative behavior; code mutation enables disseminating proprietary applications as benchmarks to third parties in both academia and industry. (2) It contributes to sampled simulation by proposing NSL-BLRL, a novel warm-up technique that reduces simulation time by an order of magnitude over the state of the art. (3) It presents a benchmark synthesis framework for generating synthetic benchmarks from a set of desired program statistics. The benchmarks are generated in a high-level programming language, which enables both compiler and hardware exploration.
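The benchmark-synthesis idea in (3), reduced to its core, is to emit code whose statistics match a desired profile. A minimal sketch matching only an instruction mix (real frameworks also model dependencies, branch behavior, and memory strides; the op names are made up):

```python
# Sketch: emit a stream of operations whose mix matches a desired
# program statistic, the simplest form of statistics-driven synthesis.

def synthesize(mix, total):
    """mix: {op: fraction}; returns a list of `total` ops matching the mix."""
    stream = []
    for op, frac in mix.items():
        stream.extend([op] * round(frac * total))
    return stream[:total]

trace = synthesize({"load": 0.3, "store": 0.2, "alu": 0.5}, 10)
print(trace.count("load"), trace.count("store"), trace.count("alu"))  # 3 2 5
```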