Search CORE

907 research outputs found

DeLTA: GPU Performance Model for Deep Learning Applications with In-depth Memory System Traffic Analysis

Author: Chatterjee Niladrish
Erez Mattan
Lee Donghyuk
Lym Sangkug
O'Connor Mike
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 02/04/2019
Field of study

Training convolutional neural networks (CNNs) requires intense compute throughput and high memory bandwidth. Especially, convolution layers account for the majority of the execution time of CNN training, and GPUs are commonly used to accelerate these layer workloads. GPU design optimization for efficient CNN training acceleration requires the accurate modeling of how their performance improves when computing and memory resources are increased. We present DeLTA, the first analytical model that accurately estimates the traffic at each GPU memory hierarchy level, while accounting for the complex reuse patterns of a parallel convolution algorithm. We demonstrate that our model is both accurate and robust for different CNNs and GPU architectures. We then show how this model can be used to carefully balance the scaling of different GPU resources for efficient CNN performance improvement

arXiv.org e-Print Archive

Crossref

CONFPROFITT: A CONFIGURATION-AWARE PERFORMANCE PROFILING, TESTING, AND TUNING FRAMEWORK

Author: Han Xue
Publication venue: UKnowledge
Publication date: 01/01/2019
Field of study

Modern computer software systems are complicated. Developers can change the behavior of the software system through software configurations. The large number of configuration option and their interactions make the task of software tuning, testing, and debugging very challenging. Performance is one of the key aspects of non-functional qualities, where performance bugs can cause significant performance degradation and lead to poor user experience. However, performance bugs are difficult to expose, primarily because detecting them requires specific inputs, as well as specific configurations. While researchers have developed techniques to analyze, quantify, detect, and fix performance bugs, many of these techniques are not effective in highly-configurable systems. To improve the non-functional qualities of configurable software systems, testing engineers need to be able to understand the performance influence of configuration options, adjust the performance of a system under different configurations, and detect configuration-related performance bugs. This research will provide an automated framework that allows engineers to effectively analyze performance-influence configuration options, detect performance bugs in highly-configurable software systems, and adjust configuration options to achieve higher long-term performance gains. To understand real-world performance bugs in highly-configurable software systems, we first perform a performance bug characteristics study from three large-scale opensource projects. Many researchers have studied the characteristics of performance bugs from the bug report but few have reported what the experience is when trying to replicate confirmed performance bugs from the perspective of non-domain experts such as researchers. This study is meant to report the challenges and potential workaround to replicate confirmed performance bugs. We also want to share a performance benchmark to provide real-world performance bugs to evaluate future performance testing techniques. Inspired by our performance bug study, we propose a performance profiling approach that can help developers to understand how configuration options and their interactions can influence the performance of a system. The approach uses a combination of dynamic analysis and machine learning techniques, together with configuration sampling techniques, to profile the program execution, analyze configuration options relevant to performance. Next, the framework leverages natural language processing and information retrieval techniques to automatically generate test inputs and configurations to expose performance bugs. Finally, the framework combines reinforcement learning and dynamic state reduction techniques to guide subject application towards achieving higher long-term performance gains

University of Kentucky

DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks

Author: Fernandez Ivan
Ghose Saugata
Gómez-Luna Juan
Mutlu Onur
Oliveira Geraldo F.
Orosa Lois
Sadrosadati Mohammad
Vijaykumar Nandita
Publication venue
Publication date: 01/01/2021
Field of study

Data movement between the CPU and main memory is a first-order obstacle against improving performance, scalability, and energy efficiency in modern systems. Computer systems employ a range of techniques to reduce overheads tied to data movement, spanning from traditional mechanisms (e.g., deep multi-level cache hierarchies, aggressive hardware prefetchers) to emerging techniques such as Near-Data Processing (NDP), where some computation is moved close to memory. Our goal is to methodically identify potential sources of data movement over a broad set of applications and to comprehensively compare traditional compute-centric data movement mitigation techniques to more memory-centric techniques, thereby developing a rigorous understanding of the best techniques to mitigate each source of data movement. With this goal in mind, we perform the first large-scale characterization of a wide variety of applications, across a wide range of application domains, to identify fundamental program properties that lead to data movement to/from main memory. We develop the first systematic methodology to classify applications based on the sources contributing to data movement bottlenecks. From our large-scale characterization of 77K functions across 345 applications, we select 144 functions to form the first open-source benchmark suite (DAMOV) for main memory data movement studies. We select a diverse range of functions that (1) represent different types of data movement bottlenecks, and (2) come from a wide range of application domains. Using NDP as a case study, we identify new insights about the different data movement bottlenecks and use these insights to determine the most suitable data movement mitigation mechanism for a particular application. We open-source DAMOV and the complete source code for our new characterization methodology at https://github.com/CMU-SAFARI/DAMOV.Comment: Our open source software is available at https://github.com/CMU-SAFARI/DAMO

arXiv.org e-Print Archive

Repository for Publications and Research Data

Directory of Open Access Journals

PerfBlower: Quickly Detecting Memory-Related Performance Problems via Amplification

Author: Dou Liang
Fang Lu
Xu Guoqing
Publication venue: LIPIcs - Leibniz International Proceedings in Informatics. 29th European Conference on Object-Oriented Programming (ECOOP 2015)
Publication date: 01/01/2015
Field of study

Performance problems in managed languages are extremely difficult to find. Despite many efforts to find those problems, most existing work focuses on how to debug a user-provided test execution in which performance problems already manifest. It remains largely unknown how to effectively find performance bugs before software release. As a result, performance bugs often escape to production runs, hurting software reliability and user experience. This paper describes PerfBlower, a general performance testing framework that allows developers to quickly test Java programs to find memory-related performance problems. PerfBlower provides (1) a novel specification language ISL to describe a general class of performance problems that have observable symptoms; (2) an automated test oracle via emph{virtual amplification}; and (3) precise reference-path-based diagnostic information via object mirroring. Using this framework, we have amplified three different types of problems. Our experimental results demonstrate that (1) ISL is expressive enough to describe various memory-related performance problems; (2) PerfBlower successfully distinguishes executions with and without problems; 8 unknown problems are quickly discovered under small workloads; and (3) PerfBlower outperforms existing detectors and does not miss any bugs studied before in the literature

CiteSeerX

Dagstuhl Research Online Publication Server

Experimental analysis of computer system dependability

Author: Iyer Ravishankar, K.
Tang Dong
Publication venue
Publication date
Field of study

This paper reviews an area which has evolved over the past 15 years: experimental analysis of computer system dependability. Methodologies and advances are discussed for three basic approaches used in the area: simulated fault injection, physical fault injection, and measurement-based analysis. The three approaches are suited, respectively, to dependability evaluation in the three phases of a system's life: design phase, prototype phase, and operational phase. Before the discussion of these phases, several statistical techniques used in the area are introduced. For each phase, a classification of research methods or study topics is outlined, followed by discussion of these methods or topics as well as representative studies. The statistical techniques introduced include the estimation of parameters and confidence intervals, probability distribution characterization, and several multivariate analysis methods. Importance sampling, a statistical technique used to accelerate Monte Carlo simulation, is also introduced. The discussion of simulated fault injection covers electrical-level, logic-level, and function-level fault injection methods as well as representative simulation environments such as FOCUS and DEPEND. The discussion of physical fault injection covers hardware, software, and radiation fault injection methods as well as several software and hybrid tools including FIAT, FERARI, HYBRID, and FINE. The discussion of measurement-based analysis covers measurement and data processing techniques, basic error characterization, dependency analysis, Markov reward modeling, software-dependability, and fault diagnosis. The discussion involves several important issues studies in the area, including fault models, fast simulation techniques, workload/failure dependency, correlated failures, and software fault tolerance

NASA Technical Reports Server

PerfCE: Performance Debugging on Databases with Chaos Engineering-Enhanced Causality Analysis

Author: Ji Zhenlan
Ma Pingchuan
Wang Shuai
Publication venue
Publication date: 14/09/2023
Field of study

Debugging performance anomalies in real-world databases is challenging. Causal inference techniques enable qualitative and quantitative root cause analysis of performance downgrade. Nevertheless, causality analysis is practically challenging, particularly due to limited observability. Recently, chaos engineering has been applied to test complex real-world software systems. Chaos frameworks like Chaos Mesh mutate a set of chaos variables to inject catastrophic events (e.g., network slowdowns) to "stress" software systems. The systems under chaos stress are then tested using methods like differential testing to check if they retain their normal functionality (e.g., SQL query output is always correct under stress). Despite its ubiquity in the industry, chaos engineering is now employed mostly to aid software testing rather for performance debugging. This paper identifies novel usage of chaos engineering on helping developers diagnose performance anomalies in databases. Our presented framework, PERFCE, comprises an offline phase and an online phase. The offline phase learns the statistical models of the target database system, whilst the online phase diagnoses the root cause of monitored performance anomalies on the fly. During the offline phase, PERFCE leverages both passive observations and proactive chaos experiments to constitute accurate causal graphs and structural equation models (SEMs). When observing performance anomalies during the online phase, causal graphs enable qualitative root cause identification (e.g., high CPU usage) and SEMs enable quantitative counterfactual analysis (e.g., determining "when CPU usage is reduced to 45\%, performance returns to normal"). PERFCE notably outperforms prior works on common synthetic datasets, and our evaluation on real-world databases, MySQL and TiDB, shows that PERFCE is highly accurate and moderately expensive

arXiv.org e-Print Archive

Protecting applications using trusted execution environments

Author: Priebe Christian
Publication venue: Computing, Imperial College London
Publication date: 01/11/2020
Field of study

While cloud computing has been broadly adopted, companies that deal with sensitive data are still reluctant to do so due to privacy concerns or legal restrictions. Vulnerabilities in complex cloud infrastructures, resource sharing among tenants, and malicious insiders pose a real threat to the confidentiality and integrity of sensitive customer data. In recent years trusted execution environments (TEEs), hardware-enforced isolated regions that can protect code and data from the rest of the system, have become available as part of commodity CPUs. However, designing applications for the execution within TEEs requires careful consideration of the elevated threats that come with running in a fully untrusted environment. Interaction with the environment should be minimised, but some cooperation with the untrusted host is required, e.g. for disk and network I/O, via a host interface. Implementing this interface while maintaining the security of sensitive application code and data is a fundamental challenge. This thesis addresses this challenge and discusses how TEEs can be leveraged to secure existing applications efficiently and effectively in untrusted environments. We explore this in the context of three systems that deal with the protection of TEE applications and their host interfaces: SGX-LKL is a library operating system that can run full unmodified applications within TEEs with a minimal general-purpose host interface. By providing broad system support inside the TEE, the reliance on the untrusted host can be reduced to a minimal set of low-level operations that cannot be performed inside the enclave. SGX-LKL provides transparent protection of the host interface and for both disk and network I/O. Glamdring is a framework for the semi-automated partitioning of TEE applications into an untrusted and a trusted compartment. Based on source-level annotations, it uses either dynamic or static code analysis to identify sensitive parts of an application. Taking into account the objectives of a small TCB size and low host interface complexity, it defines an application-specific host interface and generates partitioned application code. EnclaveDB is a secure database using Intel SGX based on a partitioned in-memory database engine. The core of EnclaveDB is its logging and recovery protocol for transaction durability. For this, it relies on the database log managed and persisted by the untrusted database server. EnclaveDB protects against advanced host interface attacks and ensures the confidentiality, integrity, and freshness of sensitive data.Open Acces

Spiral - Imperial College Digital Repository