27,055 research outputs found

    Memory hierarchy characterization of SPEC CPU2006 and SPEC CPU2017 on the Intel Xeon Skylake-SP

    Get PDF
    SPEC CPU is one of the most common benchmark suites used in computer architecture research. CPU2017 has recently been released to replace CPU2006. In this paper we present a detailed evaluation of the memory hierarchy performance for both the CPU2006 and single-threaded CPU2017 benchmarks. The experiments were executed on an Intel Xeon Skylake-SP, which is the first Intel processor to implement a mostly non-inclusive last-level cache (LLC). We present a classification of the benchmarks according to their memory pressure and analyze the performance impact of different LLC sizes. We also test all the hardware prefetchers showing they improve performance in most of the benchmarks. After comprehensive experimentation, we can highlight the following conclusions: i) almost half of SPEC CPU benchmarks have very low miss ratios in the second and third level caches, even with small LLC sizes and without hardware prefetching, ii) overall, the SPEC CPU2017 benchmarks demand even less memory hierarchy resources than the SPEC CPU2006 ones, iii) hardware prefetching is very effective in reducing LLC misses for most benchmarks, even with the smallest LLC size, and iv) from the memory hierarchy standpoint the methodologies commonly used to select benchmarks or simulation points do not guarantee representative workloads

    Memory Centric Characterization and Analysis of SPEC CPU2017 Suite

    Full text link
    In this paper we provide a comprehensive, memory-centric characterization of the SPEC CPU2017 benchmark suite, using a number of mechanisms including dynamic binary instrumentation, measurements on native hardware using hardware performance counters and OS based tools. We present a number of results including working set sizes, memory capacity consumption and, memory bandwidth utilization of various workloads. Our experiments reveal that the SPEC CPU2017 workloads are surprisingly memory intensive, with approximately 50% of all dynamic instructions being memory intensive ones. We also show that there is a large variation in the memory footprint and bandwidth utilization profiles of the entire suite, with some benchmarks using as much as 16 GB of main memory and up to 2.3 GB/s of memory bandwidth. We also perform instruction execution and distribution analysis of the suite and find that the average instruction count for SPEC CPU2017 workloads is an order of magnitude higher than SPEC CPU2006 ones. In addition, we also find that FP benchmarks of the SPEC 2017 suite have higher compute requirements: on average, FP workloads execute three times the number of compute operations as compared to INT workloads.Comment: 12 pages, 133 figures, A short version of this work has been published at "Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering

    Quantitative Performance Analysis of the SPEC OMPM2001 Benchmarks

    Get PDF

    CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code

    Full text link
    We present an instrumenting compiler for enforcing data confidentiality in low-level applications (e.g. those written in C) in the presence of an active adversary. In our approach, the programmer marks secret data by writing lightweight annotations on top-level definitions in the source code. The compiler then uses a static flow analysis coupled with efficient runtime instrumentation, a custom memory layout, and custom control-flow integrity checks to prevent data leaks even in the presence of low-level attacks. We have implemented our scheme as part of the LLVM compiler. We evaluate it on the SPEC micro-benchmarks for performance, and on larger, real-world applications (including OpenLDAP, which is around 300KLoC) for programmer overhead required to restructure the application when protecting the sensitive data such as passwords. We find that performance overheads introduced by our instrumentation are moderate (average 12% on SPEC), and the programmer effort to port OpenLDAP is only about 160 LoC.Comment: Technical report for CONFLLVM: A Compiler for Enforcing Data Confidentiality in Low-Level Code, appearing at EuroSys 201

    Nonaxisymmetric, multi-region relaxed magnetohydrodynamic equilibrium solutions

    Full text link
    We describe a magnetohydrodynamic (MHD) constrained energy functional for equilibrium calculations that combines the topological constraints of ideal MHD with elements of Taylor relaxation. Extremizing states allow for partially chaotic magnetic fields and non-trivial pressure profiles supported by a discrete set of ideal interfaces with irrational rotational transforms. Numerical solutions are computed using the Stepped Pressure Equilibrium Code, SPEC, and benchmarks and convergence calculations are presented.Comment: Submitted to Plasma Physics and Controlled Fusion for publication with a cluster of papers associated with workshop: Stability and Nonlinear Dynamics of Plasmas, October 31, 2009 Atlanta, GA on occasion of 65th birthday of R.L. Dewar. V2 is revised for referee

    Memory Performance Characterization of SPEC CPU2006 Benchmarks Using TSIM

    Get PDF
    AbstractThis paper uses TSIM, a cycle accurate architecture simulator, to characterize the memory performance of SPEC CPU2006 Benchmarks under CMP platform. The experiment covers 54 workloads with different input sets, and collects statistical information of instruction mixture and cache behaviors. By detecting the cyclical changes of MPKI, this paper clearly shows the memory performance phases of some SPEC CPU2006 programs. These performance data and analysis results can not only help program developers and architects understand the memory performance caused by system architecture better, but also guide them in software and system optimization

    Improving Uniformity of Cache Access Pattern using Split Data Caches

    Get PDF
    In this paper we show that partitioning data cache into array and scalar caches can improve cache access pattern without having to remap data, while maintaining the constant access time of a direct-mapped cache and improving the performance of L-1 cache memories. By using 4 central moments (mean, standard-deviation, skewness and kurtosis) we report on the frequency of accesses to cache sets and show that split data caches significantly mitigate the problem of non-uniform accesses to cache sets for several embedded benchmarks (from MiBench) and some SPEC benchmarks

    Workload generation for microprocessor performance evaluation

    Get PDF
    This PhD thesis [1], awarded with the SPEC Distinguished Dissertation Award 2011, proposes and studies three workload generation and reduction techniques for microprocessor performance evaluation. (1) The thesis proposes code mutation, a novel methodology for hiding proprietary information from computer programs while maintaining representative behavior; code mutation enables dissemination of proprietary applications as benchmarks to third parties in both academia and industry. (2) It contributes to sampled simulation by proposing NSL-BLRL, a novel warm-up technique that reduces simulation time by an order of magnitude over state-of-the-art. (3) It presents a benchmark synthesis framework for generating synthetic benchmarks from a set of desired program statistics. The benchmarks are generated in a high-level programming language, which enables both compiler and hardware exploration
    • …
    corecore