Valgreen: an Application's Energy Profiler
The popularity of hand-held and portable devices has brought energy-aware computing to the fore. The need for long-lasting batteries extends beyond hardware manufacturers, affecting operating system policies and software development. Power modeling of applications has been studied in recent years and can be used to estimate their total energy consumption. To help programmers implement energy-efficient algorithms, this paper introduces an application energy profiler, Valgreen, which exploits battery information to generate an architecture-independent power model through a calibration process.
An Inviscid Decoupled Method for the Roe FDS Scheme in the Reacting Gas Path of FUN3D
An approach is described to decouple the species continuity equations from the mixture continuity, momentum, and total energy equations for the Roe flux difference splitting scheme. This decoupling simplifies the implicit system, so that the flow solver can be made significantly more efficient, with very little penalty on overall scheme robustness. Most importantly, the computational cost of the point implicit relaxation is shown to scale linearly with the number of species for the decoupled system, whereas the fully coupled approach scales quadratically. Also, the decoupled method significantly reduces the cost in wall time and memory in comparison to the fully coupled approach. This work lays the foundation for development of an efficient adjoint solution procedure for high-speed reacting flow.
Runtime Automated Detection of Out of Process Resource Management in the X Windowing System
Software applications typically allocate and deallocate resources during their lifetime. Resources can be categorized into two broad groups: in-process and out-of-process. In-process resources are local resources managed directly by a client, while out-of-process resources are managed remotely: the client instructs a server to allocate and deallocate the resource on its behalf. Out-of-process resources do not reside in a client's address space, which adds an extra layer of complexity when attempting to debug their misuse. This thesis presents an automatic run-time solution to the problem of detecting and reporting the source code locations where an application client mismanages out-of-process resources. It uses the X Windowing System as a specific case study, and the approach lends itself to use in the wider general case.
Forward-Mode Automatic Differentiation of Compiled Programs
Algorithmic differentiation (AD) is a set of techniques that provide partial
derivatives of computer-implemented functions. Such a function can be supplied
to state-of-the-art AD tools via its source code, or via an intermediate
representation produced while compiling its source code.
We present the novel AD tool Derivgrind, which augments the machine code of
compiled programs with forward-mode AD logic. Derivgrind leverages the Valgrind
instrumentation framework for a structured access to the machine code, and a
shadow memory tool to store dot values. Access to the source code is required
at most for the files in which input and output variables are defined.
Derivgrind's versatility comes at the price of scaling the run-time by a
factor between 30 and 75, measured on a benchmark based on a numerical solver
for a partial differential equation. Results of our extensive regression test
suite indicate that Derivgrind produces correct results on GCC- and
Clang-compiled programs, including a Python interpreter, with a small number of
exceptions. While we provide a list of scenarios that Derivgrind does not
handle correctly, nearly all of them are academic counterexamples or originate
from highly optimized math libraries. As long as differentiating those is
avoided, Derivgrind can be applied to an unprecedentedly wide range of
cross-language or partially closed-source software with little integration
effort.
Comment: 21 pages, 3 figures, 3 tables, 5 listings
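As a rough illustration of the forward-mode idea described in this abstract, the sketch below carries a "dot value" alongside each ordinary value. It uses operator overloading in Python rather than machine-code instrumentation, so it shows only the underlying propagation rule and is not how Derivgrind itself works. The names `Dual`, `sin`, and `f` are hypothetical and chosen for illustration.

```python
import math


class Dual:
    """A value paired with its dot value (derivative w.r.t. the seeded input)."""

    def __init__(self, val, dot=0.0):
        self.val = val
        self.dot = dot

    @staticmethod
    def lift(x):
        # Plain numbers are constants, so their dot value is zero.
        return x if isinstance(x, Dual) else Dual(float(x))

    def __add__(self, other):
        other = Dual.lift(other)
        # Sum rule: (u + v)' = u' + v'
        return Dual(self.val + other.val, self.dot + other.dot)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule: (u * v)' = u' * v + u * v'
        return Dual(self.val * other.val,
                    self.dot * other.val + self.val * other.dot)

    __rmul__ = __mul__


def sin(x):
    x = Dual.lift(x)
    # Chain rule: sin(u)' = cos(u) * u'
    return Dual(math.sin(x.val), math.cos(x.val) * x.dot)


def f(x):
    # Example function (an assumption for this sketch): f(x) = x^2 + 3 sin(x)
    return x * x + 3 * sin(x)


# Seed the input with dot value 1.0 to obtain df/dx at x = 2.0.
x = Dual(2.0, 1.0)
y = f(x)
```

After the run, `y.val` holds the function value and `y.dot` holds the derivative at the seeded input. A binary-level tool has to apply the same rules per machine instruction, which is where shadow memory for the dot values comes in.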
On Performance Debugging of Unnecessary Lock Contentions on Multicore Processors: A Replay-based Approach
Locks have been widely used as an effective synchronization mechanism among
processes and threads. However, we observe that a large number of false
inter-thread dependencies (i.e., unnecessary lock contentions) exist during the
program execution on multicore processors, thereby incurring significant
performance overhead. This paper presents a performance debugging framework,
PERFPLAY, to facilitate a comprehensive and in-depth understanding of the
performance impact of unnecessary lock contentions. The core technique of our
debugging framework is trace replay. Specifically, PERFPLAY records the program
execution trace, on the basis of which the unnecessary lock contentions can be
identified through trace analysis. We then propose a novel technique of trace
transformation to transform these identified unnecessary lock contentions in
the original trace into the correct pattern as a new trace free of unnecessary
lock contentions. Through replaying both traces, PERFPLAY can quantify the
performance impact of unnecessary lock contentions. To demonstrate the
effectiveness of our debugging framework, we study five real-world programs and
PARSEC benchmarks. Our experimental results demonstrate the significant
performance overhead of unnecessary lock contentions, and the effectiveness of
PERFPLAY in identifying the performance critical unnecessary lock contentions
in real applications.
Comment: 18 pages, 19 figures, 3 tables
BugSifter: A Generalized Accelerator for Flexible Instruction-Grain Monitoring
Software robustness is an ever-challenging problem in the face of today's evolving software and hardware that has undergone recent shifts. Instruction-grain monitoring is a powerful approach for improved software robustness that affords comprehensive runtime coverage for a wide spectrum of bugs and security exploits. Unfortunately, existing instruction-grain monitoring frameworks, such as dynamic binary instrumentation, are either prohibitively expensive (slowing down applications by an order of magnitude or more) or offer limited coverage. This work introduces BugSifter, a new design that drastically decreases monitoring overhead without sacrificing flexibility or bug coverage. The main overhead of instruction-grain monitoring lies in the execution of software event handlers that monitor nearly every application instruction to check for bugs. BugSifter identifies common monitoring activities that result in redundant monitoring actions, and filters them using general, light-weight hardware, eliminating the majority of costly software event handlers. Our proposed design filters 80-98% of events while monitoring for a variety of commonly occurring bugs, delegating the rest to flexible software handlers. BugSifter significantly reduces the overhead of instruction-grain monitoring to an average of 40% over unmonitored application time. BugSifter makes instruction-grain monitoring practical, enabling efficient and timely detection of a wide range of bugs, thus making software more robust.