28 research outputs found

    Valgreen: an Application's Energy Profiler

    The popularity of hand-held and portable devices has brought energy-aware computing to the fore. The need for long-lasting batteries extends beyond hardware manufacturers, affecting operating system policies and software development. Power modeling of applications has been studied in recent years and can be used to estimate their total energy consumption. To help programmers implement energy-efficient algorithms, this paper introduces an application energy profiler, Valgreen, which exploits battery information to generate an architecture-independent power model through a calibration process.

    An Inviscid Decoupled Method for the Roe FDS Scheme in the Reacting Gas Path of FUN3D

    An approach is described to decouple the species continuity equations from the mixture continuity, momentum, and total energy equations for the Roe flux difference splitting scheme. This decoupling simplifies the implicit system, so that the flow solver can be made significantly more efficient, with very little penalty on overall scheme robustness. Most importantly, the computational cost of the point implicit relaxation is shown to scale linearly with the number of species for the decoupled system, whereas the fully coupled approach scales quadratically. The decoupled method also significantly reduces wall time and memory cost in comparison to the fully coupled approach. This work lays the foundation for the development of an efficient adjoint solution procedure for high-speed reacting flow.

    Runtime Automated Detection of Out of Process Resource Management in the X Windowing System

    Software applications typically allocate and deallocate resources during their lifetime. Resources can be categorized into two broad groups: in-process and out-of-process. In-process resources are local resources directly managed by a client, while out-of-process resources are managed remotely by a client that instructs a server to allocate and deallocate the resource on its behalf. Out-of-process resources do not reside in a client's address space, which adds an extra layer of complexity when attempting to debug their misuse. This thesis presents an automatic run-time solution to the problem of detecting and reporting the source code locations where an application client mismanages out-of-process resources, using the X Windowing System as a specific case study that lends itself to use in the wider general case.
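    The idea of recording where each out-of-process resource was requested, so that misuse can be reported with a source location, can be illustrated with a minimal sketch. This is not the thesis's implementation and does not talk to an X server; the class `ResourceTracker` and its methods are hypothetical names chosen for illustration, and the call site is taken from the Python stack.

```python
import traceback


class ResourceTracker:
    """Hypothetical bookkeeping for server-side resources requested by a client."""

    def __init__(self):
        # Maps a resource id to the "file:line" where it was requested.
        self._live = {}

    def on_allocate(self, resource_id):
        # The first of the two most recent stack entries is the caller.
        caller = traceback.extract_stack(limit=2)[0]
        self._live[resource_id] = f"{caller.filename}:{caller.lineno}"

    def on_free(self, resource_id):
        # Returns an error message for a double free or unknown id, else None.
        if resource_id not in self._live:
            return f"free of unknown or already freed resource {resource_id}"
        del self._live[resource_id]
        return None

    def leaks(self):
        # Resources still live, with the location that requested them.
        return dict(self._live)
```

    At client exit, anything still returned by `leaks()` would be reported together with its recorded allocation site.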

    Forward-Mode Automatic Differentiation of Compiled Programs

    Algorithmic differentiation (AD) is a set of techniques that provide partial derivatives of computer-implemented functions. Such a function can be supplied to state-of-the-art AD tools via its source code, or via an intermediate representation produced while compiling its source code. We present the novel AD tool Derivgrind, which augments the machine code of compiled programs with forward-mode AD logic. Derivgrind leverages the Valgrind instrumentation framework for structured access to the machine code, and a shadow memory tool to store dot values. Access to the source code is required at most for the files in which input and output variables are defined. Derivgrind's versatility comes at the price of scaling the run-time by a factor between 30 and 75, measured on a benchmark based on a numerical solver for a partial differential equation. Results of our extensive regression test suite indicate that Derivgrind produces correct results on GCC- and Clang-compiled programs, including a Python interpreter, with a small number of exceptions. While we provide a list of scenarios that Derivgrind does not handle correctly, nearly all of them are academic counterexamples or originate from highly optimized math libraries. As long as differentiating those is avoided, Derivgrind can be applied to an unprecedentedly wide range of cross-language or partially closed-source software with little integration effort. Comment: 21 pages, 3 figures, 3 tables, 5 listings
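    Forward-mode AD, the technique named in the abstract above, carries a "dot value" (the derivative) alongside every ordinary value. The following is a minimal source-level sketch using dual numbers; it is not how Derivgrind works internally (Derivgrind instruments machine code and keeps dot values in shadow memory), and the names `Dual`, `sin`, and `derivative` are assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Dual:
    """A value paired with its dot value (derivative w.r.t. the chosen input)."""
    val: float
    dot: float = 0.0

    def __add__(self, other):
        other = _lift(other)
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = _lift(other)
        # Product rule: (uv)' = u'v + uv'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def _lift(x):
    # Plain numbers are constants, so their dot value is zero.
    return x if isinstance(x, Dual) else Dual(float(x))


def sin(x):
    x = _lift(x)
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def derivative(f, x):
    # Seed the input with dot value 1 and read the dot value of the output.
    return f(Dual(x, 1.0)).dot
```

    For example, `derivative(lambda x: x * x * x + 2 * x, 3.0)` evaluates the derivative of x^3 + 2x at x = 3 in a single forward pass.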

    On Performance Debugging of Unnecessary Lock Contentions on Multicore Processors: A Replay-based Approach

    Locks have been widely used as an effective synchronization mechanism among processes and threads. However, we observe that a large number of false inter-thread dependencies (i.e., unnecessary lock contentions) exist during program execution on multicore processors, thereby incurring significant performance overhead. This paper presents a performance debugging framework, PERFPLAY, to facilitate a comprehensive and in-depth understanding of the performance impact of unnecessary lock contentions. The core technique of our debugging framework is trace replay. Specifically, PERFPLAY records the program execution trace, on the basis of which the unnecessary lock contentions can be identified through trace analysis. We then propose a novel trace transformation technique that rewrites the identified unnecessary lock contentions in the original trace into the correct pattern, yielding a new trace free of unnecessary lock contentions. By replaying both traces, PERFPLAY can quantify the performance impact of unnecessary lock contentions. To demonstrate the effectiveness of our debugging framework, we study five real-world programs and PARSEC benchmarks. Our experimental results demonstrate the significant performance overhead of unnecessary lock contentions, and the effectiveness of PERFPLAY in identifying the performance-critical unnecessary lock contentions in real applications. Comment: 18 pages, 19 figures, 3 tables

    BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring

    Software robustness is an ever-challenging problem in the face of today's evolving software and hardware, both of which have undergone recent shifts. Instruction-grain monitoring is a powerful approach for improved software robustness that affords comprehensive runtime coverage for a wide spectrum of bugs and security exploits. Unfortunately, existing instruction-grain monitoring frameworks, such as dynamic binary instrumentation, are either prohibitively expensive (slowing down applications by an order of magnitude or more) or offer limited coverage. This work introduces BugSifter, a new design that drastically decreases monitoring overhead without sacrificing flexibility or bug coverage. The main overhead of instruction-grain monitoring lies in the execution of software event handlers that monitor nearly every application instruction to check for bugs. BugSifter identifies common monitoring activities that result in redundant monitoring actions, and filters them using general, light-weight hardware, eliminating the majority of costly software event handlers. Our proposed design filters 80-98% of events while monitoring for a variety of commonly occurring bugs, delegating the rest to flexible software handlers. BugSifter significantly reduces the overhead of instruction-grain monitoring to an average of 40% over unmonitored application time. BugSifter makes instruction-grain monitoring practical, enabling efficient and timely detection of a wide range of bugs, thus making software more robust.