Search CORE

113 research outputs found

Recommended from our members

Measuring program similarity for efficient benchmarking and performance analysis of computer systems

Author: Phansalkar Aashish S.
Publication venue
Publication date: 01/05/2007
Field of study

textComputer benchmarking involves running a set of benchmark programs to measure performance of a computer system. Modern benchmarks are developed from real applications. Applications are becoming complex and hence modern benchmarks run for a very long time. These benchmarks are also used for performance evaluation in the early design phase of microprocessors. Due to the size of benchmarks and increase in complexity of microprocessor design, the effort required for performance evaluation has increased significantly. This dissertation proposes methodologies to reduce the effort of benchmarking and performance evaluation of computer systems. Identifying a set of programs that can be used in the process of benchmarking can be very challenging. A solution to this problem can start by identifying similarity between programs to capture the diversity in their behavior before they can be considered for benchmarking. The aim of this methodology is to identify redundancy in the set of benchmarks and find a subset of representative benchmarks with the least possible loss of information. This dissertation proposes the use of program characteristics which capture the performance behavior of programs and identifies representative benchmarks applicable over a wide range of system configurations. The use of benchmark subsetting has not been restricted to academic research. Recently, the SPEC CPU subcommittee used the information derived from measuring similarity based on program behavior characteristics between different benchmark candidates as one of the criteria for selecting the SPEC CPU2006 benchmarks. The information of similarity between programs can also be used to predict performance of an application when it is difficult to port the application on different platforms. This is a common problem when a customer wants to buy the best computer system for his application. Performance of a customer's application on a particular system can be predicted using the performance scores of the standard benchmarks on that system and the similarity information between the application and the benchmarks. Similarity between programs is quantified by the distance between them in the space of the measured characteristics, and is appropriately used to predict performance of a new application using the performance scores of its neighbors in the workload space.Electrical and Computer Engineerin

Texas ScholarWorks

PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors

Author: Christian Bienia
Kai Li
Sanjeev Kumar
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008
Field of study

The PARSEC benchmark suite was recently released and has been adopted by a significant number of users within a short amount of time. This new collection of workloads is not yet fully under-stood by researchers. In this study we compare the SPLASH-2 and PARSEC benchmark suites with each other to gain insights into differences and similarities between the two program collections. We use standard statistical methods and machine learning to ana-lyze the suites for redundancy and overlap on Chip-Multiprocessors (CMPs). Our analysis shows that PARSEC workloads are funda-mentally different from SPLASH-2 benchmarks. The observed dif-ferences can be explained with two technology trends, the prolifer-ation of CMPs and the accelerating growth of world data

CiteSeerX

Crossref

Fex: A Software Systems Evaluator

Author: Bhatotia Pramod
Fetzer Christof
Kuvaiskii Dmitrii
Oleksenko Oleksii
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 31/08/2017
Field of study

Crossref

Edinburgh Research Explorer

Characterizing and Subsetting Big Data Workloads

Author: Han Rui
Jia Zhen
Li Jingwei
Luo Chunjie
McKee Sally A.
Wang Lei
Yang Qiang
Zhan Jianfeng
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2014
Field of study

Big data benchmark suites must include a diversity of data and workloads to be useful in fairly evaluating big data systems and architectures. However, using truly comprehensive benchmarks poses great challenges for the architecture community. First, we need to thoroughly understand the behaviors of a variety of workloads. Second, our usual simulation-based research methods become prohibitively expensive for big data. As big data is an emerging field, more and more software stacks are being proposed to facilitate the development of big data applications, which aggravates hese challenges. In this paper, we first use Principle Component Analysis (PCA) to identify the most important characteristics from 45 metrics to characterize big data workloads from BigDataBench, a comprehensive big data benchmark suite. Second, we apply a clustering technique to the principle components obtained from the PCA to investigate the similarity among big data workloads, and we verify the importance of including different software stacks for big data benchmarking. Third, we select seven representative big data workloads by removing redundant ones and release the BigDataBench simulation version, which is publicly available from http://prof.ict.ac.cn/BigDataBench/simulatorversion/.Comment: 11 pages, 6 figures, 2014 IEEE International Symposium on Workload Characterizatio

arXiv.org e-Print Archive

Crossref

Chalmers Research

Understanding Performance Inefficiencies In Native And Managed Languages

Author: Su Pengfei
Publication venue: W&M ScholarWorks
Publication date: 01/01/2020
Field of study

Production software packages have become increasingly complex with millions of lines of code, sophisticated control and data flow, and references to a hierarchy of external libraries. This complexity often introduces performance inefficiencies across software stacks, making it practically impossible for users to pinpoint them manually. Performance profiling tools (a.k.a. profilers) abound in the tools community to aid software developers in understanding program behavior. Classical profiling techniques focus on identifying hotspots. The hotspot analysis is indispensable; however, it can hardly diagnose whether a resource is being used in a productive manner that contributes to the overall efficiency of a program. Consequently, a significant burden is on developers to make a judgment call on whether there is scope to optimize a hotspot. Derived metrics, e.g., cache miss ratio, offer slightly better intuition into hotspots but are still not panaceas. Hence, there is a need for profilers that investigate resource wastage instead of usage. To overcome the critical missing pieces in prior work and complement existing profilers, we propose novel fine- and coarse-grained profilers to pinpoint varieties of performance inefficiencies and provide optimization guidance for a wide range of software covering benchmarks, enterprise applications, and large-scale parallel applications running on supercomputers and data centers. Fine-grained profilers are indispensable to understand performance inefficiencies comprehensively. We propose a whole-program profiler called LoadSpy, which works on binary executables to detect and quantify wasteful memory operations in their context and scope. Our observation, which is justified by myriad case studies, is that wasteful memory operations are often an indicator of various forms of performance inefficiencies, such as suboptimal choices of algorithms or data structures, missed compiler optimizations, and developers’ inattention to performance. Guided by LoadSpy, we are able to optimize a large number of well-known benchmarks and real-world applications, yielding significant speedups. Despite deep performance insights offered by fine-grained profilers, the high overhead keeps them away from widespread adoption, particularly in production. By contrast, coarse-grained profilers introduce low overhead at the cost of poor performance insights. Hence, another research topic is how we benefit from both, that is, the combination of deep insights of fine-grained profilers and low overhead of coarse-grained ones. The first effort to do so is proposing a lightweight profiler called JXPerf. It abandons heavyweight instrumentation by combining hardware performance monitoring units and debug registers available in commodity CPUs to detect wasteful memory operations. Compared with LoadSpy, JXPerf reduces the runtime overhead from 10x to 7% on average. The lightweight nature makes it useful in production. Another effort is proposing a lightweight profiler called FVSampler, the first nonintrusive profiler to study function execution variance

College of William & Mary: W&M Publish

Characterizing the Performance of Emerging Deep Learning, Graph, and High Performance Computing Workloads Under Interference

Author: Mao Ze
Song Shuang
Xu Hao
Publication venue
Publication date: 28/03/2023
Field of study

Throughput-oriented computing via co-running multiple applications in the same machine has been widely adopted to achieve high hardware utilization and energy saving on modern supercomputers and data centers. However, efficiently co-running applications raises new design challenges, mainly because applications with diverse requirements can stress out shared hardware resources (IO, Network and Cache) at various levels. The disparities in resource usage can result in interference, which in turn can lead to unpredictable co-running behaviors. To better understand application interference, prior work provided detailed execution characterization. However, these characterization approaches either emphasize on traditional benchmarks or fall into a single application domain. To address this issue, we study 25 up-to-date applications and benchmarks from various application domains and form 625 consolidation pairs to thoroughly analyze the execution interference caused by application co-running. Moreover, we leverage mini-benchmarks and real applications to pinpoint the provenance of co-running interference in both hardware and software aspects

arXiv.org e-Print Archive