2,921 research outputs found

    Report from GI-Dagstuhl Seminar 16394: Software Performance Engineering in the DevOps World

    Get PDF
    This report documents the program and the outcomes of GI-Dagstuhl Seminar 16394 "Software Performance Engineering in the DevOps World". The seminar addressed the problem of performance-aware DevOps. Both, DevOps and performance engineering have been growing trends over the past one to two years, in no small part due to the rise in importance of identifying performance anomalies in the operations (Ops) of cloud and big data systems and feeding these back to the development (Dev). However, so far, the research community has treated software engineering, performance engineering, and cloud computing mostly as individual research areas. We aimed to identify cross-community collaboration, and to set the path for long-lasting collaborations towards performance-aware DevOps. The main goal of the seminar was to bring together young researchers (PhD students in a later stage of their PhD, as well as PostDocs or Junior Professors) in the areas of (i) software engineering, (ii) performance engineering, and (iii) cloud computing and big data to present their current research projects, to exchange experience and expertise, to discuss research challenges, and to develop ideas for future collaborations

    Automatic performance optimisation of component-based enterprise systems via redundancy

    Get PDF
    Component technologies, such as J2EE and .NET have been extensively adopted for building complex enterprise applications. These technologies help address complex functionality and flexibility problems and reduce development and maintenance costs. Nonetheless, current component technologies provide little support for predicting and controlling the emerging performance of software systems that are assembled from distinct components. Static component testing and tuning procedures provide insufficient performance guarantees for components deployed and run in diverse assemblies, under unpredictable workloads and on different platforms. Often, there is no single component implementation or deployment configuration that can yield optimal performance in all possible conditions under which a component may run. Manually optimising and adapting complex applications to changes in their running environment is a costly and error-prone management task. The thesis presents a solution for automatically optimising the performance of component-based enterprise systems. The proposed approach is based on the alternate usage of multiple component variants with equivalent functional characteristics, each one optimized for a different execution environment. A management framework automatically administers the available redundant variants and adapts the system to external changes. The framework uses runtime monitoring data to detect performance anomalies and significant variations in the application's execution environment. It automatically adapts the application so as to use the optimal component configuration under the current running conditions. An automatic clustering mechanism analyses monitoring data and infers information on the components' performance characteristics. System administrators use decision policies to state high-level performance goals and configure system management processes. A framework prototype has been implemented and tested for automatically managing a J2EE application. Obtained results prove the framework's capability to successfully manage a software system without human intervention. The management overhead induced during normal system execution and through management operations indicate the framework's feasibility

    Spherical harmonic transform with GPUs

    Get PDF
    We describe an algorithm for computing an inverse spherical harmonic transform suitable for graphic processing units (GPU). We use CUDA and base our implementation on a Fortran90 routine included in a publicly available parallel package, S2HAT. We focus our attention on the two major sequential steps involved in the transforms computation, retaining the efficient parallel framework of the original code. We detail optimization techniques used to enhance the performance of the CUDA-based code and contrast them with those implemented in the Fortran90 version. We also present performance comparisons of a single CPU plus GPU unit with the S2HAT code running on either a single or 4 processors. In particular we find that use of the latest generation of GPUs, such as NVIDIA GF100 (Fermi), can accelerate the spherical harmonic transforms by as much as 18 times with respect to S2HAT executed on one core, and by as much as 5.5 with respect to S2HAT on 4 cores, with the overall performance being limited by the Fast Fourier transforms. The work presented here has been performed in the context of the Cosmic Microwave Background simulations and analysis. However, we expect that the developed software will be of more general interest and applicability

    Improving the Performance and Energy Efficiency of GPGPU Computing through Adaptive Cache and Memory Management Techniques

    Get PDF
    Department of Computer Science and EngineeringAs the performance and energy efficiency requirement of GPGPUs have risen, memory management techniques of GPGPUs have improved to meet the requirements by employing hardware caches and utilizing heterogeneous memory. These techniques can improve GPGPUs by providing lower latency and higher bandwidth of the memory. However, these methods do not always guarantee improved performance and energy efficiency due to the small cache size and heterogeneity of the memory nodes. While prior works have proposed various techniques to address this issue, relatively little work has been done to investigate holistic support for memory management techniques. In this dissertation, we analyze performance pathologies and propose various techniques to improve memory management techniques. First, we investigate the effectiveness of advanced cache indexing (ACI) for high-performance and energy-efficient GPGPU computing. Specifically, we discuss the designs of various static and adaptive cache indexing schemes and present implementation for GPGPUs. We then quantify and analyze the effectiveness of the ACI schemes based on a cycle-accurate GPGPU simulator. Our quantitative evaluation shows that ACI schemes achieve significant performance and energy-efficiency gains over baseline conventional indexing scheme. We also analyze the performance sensitivity of ACI to key architectural parameters (i.e., capacity, associativity, and ICN bandwidth) and the cache indexing latency. We also demonstrate that ACI continues to achieve high performance in various settings. Second, we propose IACM, integrated adaptive cache management for high-performance and energy-efficient GPGPU computing. Based on the performance pathology analysis of GPGPUs, we integrate state-of-the-art adaptive cache management techniques (i.e., cache indexing, bypassing, and warp limiting) in a unified architectural framework to eliminate performance pathologies. Our quantitative evaluation demonstrates that IACM significantly improves the performance and energy efficiency of various GPGPU workloads over the baseline architecture (i.e., 98.1% and 61.9% on average, respectively) and achieves considerably higher performance than the state-of-the-art technique (i.e., 361.4% at maximum and 7.7% on average). Furthermore, IACM delivers significant performance and energy efficiency gains over the baseline GPGPU architecture even when enhanced with advanced architectural technologies (e.g., higher capacity, associativity). Third, we propose bandwidth- and latency-aware page placement (BLPP) for GPGPUs with heterogeneous memory. BLPP analyzes the characteristics of a application and determines the optimal page allocation ratio between the GPU and CPU memory. Based on the optimal page allocation ratio, BLPP dynamically allocate pages across the heterogeneous memory nodes. Our experimental results show that BLPP considerably outperforms the baseline and state-of-the-art technique (i.e., 13.4% and 16.7%) and performs similar to the static-best version (i.e., 1.2% difference), which requires extensive offline profiling.clos

    Supporting dynamic aspect-oriented features

    Get PDF
    Aspect-oriented programming techniques extend object-oriented programming with new methods to modularize concerns that otherwise would be non-modular. For example, logging concerns are typically scattered across a system but using aspect-oriented techniques they can be localized into a single high-level module. These techniques typically take modular high-level code and statically transform it into non-modular intermediate code. The contribution of this work is the design and implementation of a flexible and dynamic intermediate-language (IL) model. The main motivation for the design of this IL model is to support a variety of dynamic aspect-oriented language constructs that are proposed in recent literature such as CaeserJ\u27s deploy, history-based pointcuts, and control flow constructs. Our IL model provides a higher level of abstraction compared to traditional object-oriented ILs as a compilation target for such constructs, which makes it easier to provide efficient implementations of these constructs. We demonstrate these benefits by providing an industrial strength implementation for our IL model, by showing translation strategies from dynamic source-level constructs to our improved IL, and by analyzing the performance of the resulting IL code. Our evaluation using the SPEC JVM98 and Java Grande benchmarks shows that the overhead of supporting a dynamic deployment model can be reduced to as little as ~1.5%, when compared to the unmodified VM
    corecore