
    Application Heartbeats for Software Performance and Health

    Adaptive, or self-aware, computing has been proposed as one method to help application programmers confront the growing complexity of multicore software development. However, existing approaches to adaptive systems are largely ad hoc and often do not incorporate the true performance goals of the applications they are designed to support. This paper presents an enabling technology for adaptive computing systems: Application Heartbeats. The Application Heartbeats framework provides a simple, standard programming interface that applications can use to indicate their performance and that system software (and hardware) can use to query an application's performance. Several experiments demonstrate the simplicity and efficacy of the Application Heartbeats approach. First, the PARSEC benchmark suite is instrumented with Application Heartbeats to show the broad applicability of the interface. Then, an adaptive H.264 encoder is developed to show how applications might use Application Heartbeats internally. Next, an external resource scheduler is developed which assigns cores to an application based on its performance as specified with Application Heartbeats. Finally, the adaptive H.264 encoder is used to illustrate how Application Heartbeats can aid fault tolerance.
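
    As a rough illustration of the heartbeat idiom described above, the C sketch below shows how such an interface might look: the application registers its performance goals, emits a heartbeat per unit of work (e.g., per encoded frame), and any observer computes the resulting heart rate. The names and layout (hb_state_t, hb_init, hb_beat, hb_rate) are illustrative assumptions, not the published Application Heartbeats API.

        /* Minimal sketch of a heartbeat-style interface; all names are
         * placeholders, not the actual Application Heartbeats API. */
        #include <stdint.h>
        #include <time.h>

        typedef struct {
            uint64_t beats;        /* heartbeats emitted so far              */
            struct timespec start; /* time of registration                   */
            double min_rate;       /* performance goal: minimum beats/second */
            double max_rate;       /* performance goal: maximum beats/second */
        } hb_state_t;

        /* The application registers itself and declares its performance goals. */
        static void hb_init(hb_state_t *hb, double min_rate, double max_rate) {
            hb->beats = 0;
            hb->min_rate = min_rate;
            hb->max_rate = max_rate;
            clock_gettime(CLOCK_MONOTONIC, &hb->start);
        }

        /* The application signals progress, e.g. once per encoded frame.
         * A real implementation would update the counter atomically. */
        static void hb_beat(hb_state_t *hb) {
            hb->beats++;
        }

        /* System software (or the application itself) queries the observed
         * heart rate and compares it against the declared goals. */
        static double hb_rate(const hb_state_t *hb) {
            struct timespec now;
            clock_gettime(CLOCK_MONOTONIC, &now);
            double elapsed = (now.tv_sec - hb->start.tv_sec) +
                             (now.tv_nsec - hb->start.tv_nsec) / 1e9;
            return elapsed > 0.0 ? hb->beats / elapsed : 0.0;
        }

    An external scheduler built on such an interface would poll hb_rate() periodically and add or remove cores while the observed rate falls outside the [min_rate, max_rate] window.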

    CAMP: A Common API for Measuring Performance

    Accurate performance testing of heterogeneous distributed systems, such as those created using GRID technology, requires a consistent method for retrieving system performance data from multiple platforms. This paper presents CAMP: a low-level, platform-independent performance data API designed for use with distributed testing frameworks. CAMP is not necessarily tied to the distributed testing task: it provides a simple, low-level interface into operating system performance data that can be used to build complex performance measurement applications. This paper discusses CAMP's functionality and implementation in detail. It also contains a detailed analysis of the API's correctness, performance, and overhead.
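
    As a rough illustration of what a low-level, platform-independent performance-data interface involves, the sketch below exposes a single sampling call whose backend is selected per platform. The struct and function names and the Linux /proc backend are assumptions made here for illustration, not CAMP's actual API.

        /* Illustrative sketch of a platform-independent performance-data layer:
         * one common entry point, one backend per operating system. */
        #include <stdio.h>
        #include <unistd.h>

        typedef struct {
            double cpu_user_s;   /* cumulative user-mode CPU time, seconds   */
            double cpu_system_s; /* cumulative kernel-mode CPU time, seconds */
            long   mem_total_kb; /* total physical memory, kilobytes         */
        } perf_sample_t;

        /* Common interface: higher-level testing frameworks call only this. */
        int perf_read(perf_sample_t *s);

        #ifdef __linux__
        /* Linux backend: coarse figures from /proc/stat and sysconf(). */
        int perf_read(perf_sample_t *s) {
            long user = 0, nice = 0, sys = 0;
            FILE *f = fopen("/proc/stat", "r");
            if (!f || fscanf(f, "cpu %ld %ld %ld", &user, &nice, &sys) != 3) {
                if (f) fclose(f);
                return -1;
            }
            fclose(f);

            double hz = (double)sysconf(_SC_CLK_TCK);
            s->cpu_user_s   = (user + nice) / hz;
            s->cpu_system_s = sys / hz;
            s->mem_total_kb = sysconf(_SC_PHYS_PAGES) * (sysconf(_SC_PAGESIZE) / 1024);
            return 0;
        }
        #else
        /* Other platforms would plug in their own backend here. */
        int perf_read(perf_sample_t *s) { (void)s; return -1; }
        #endif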

    Complete System Power Estimation: A Trickle-Down Approach Based on Performance Events


    Approaches to multiprocessor error recovery using an on-chip interconnect subsystem

    For future multicores, a dedicated interconnect subsystem for on-chip monitors was found to be highly beneficial in terms of scalability, performance and area. In this thesis, such a monitor network (MNoC) is used in multicores to support selective error identification and recovery and to maintain target chip reliability in the context of dynamic voltage and frequency scaling (DVFS). Selective shared-memory multiprocessor recovery is performed using the MNoC: when an error is detected, only the group of processors sharing an application with the affected processors is recovered. Although the use of DVFS in contemporary multicores provides significant protection from unpredictable thermal events, a potential side effect is increased processor exposure to soft errors. To address this issue, a flexible fault prevention and recovery mechanism has been developed that selectively enables a small amount of per-core dual modular redundancy (DMR) in response to increased vulnerability, as measured by the processor architectural vulnerability factor (AVF). Our new algorithm for DMR deployment aims to provide a stable effective soft error rate (SER) by applying DMR in response to DVFS caused by thermal events. The algorithm is implemented in real time on the multicore using the MNoC and a controller that evaluates thermal information and multicore performance statistics in addition to error information. DVFS experiments with a multicore simulator running standard benchmarks show an average 6% improvement in overall power consumption and a stable SER when using selective DMR rather than continuous DMR deployment.
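
    The DMR selection logic described above can be pictured as a small control loop run after each DVFS or thermal update. The sketch below is an approximation under simple assumptions (a per-core AVF estimate, a raw SER at the current operating point, and a chip-level SER target); it is not the thesis's calibrated algorithm.

        /* Hedged sketch of selective DMR deployment: when the chip-level
         * effective SER exceeds its target, greedily protect the most
         * vulnerable unprotected cores. The SER model and the target are
         * placeholders, not values from the thesis. */
        #include <stdbool.h>
        #include <stddef.h>

        #define NUM_CORES 16

        typedef struct {
            double avf;      /* architectural vulnerability factor, 0..1 */
            double raw_ser;  /* raw soft-error rate at current V/f point */
            bool   dmr_on;   /* per-core DMR currently enabled?          */
        } core_state_t;

        /* Effective SER of one core; DMR is assumed to mask its faults. */
        static double effective_ser(const core_state_t *c) {
            return c->dmr_on ? 0.0 : c->avf * c->raw_ser;
        }

        /* Called by the monitor-network controller after a DVFS/thermal event. */
        static void dmr_policy(core_state_t cores[NUM_CORES], double ser_target) {
            double total = 0.0;
            for (size_t i = 0; i < NUM_CORES; i++)
                total += effective_ser(&cores[i]);

            while (total > ser_target) {
                size_t worst = NUM_CORES;
                double worst_ser = 0.0;
                for (size_t i = 0; i < NUM_CORES; i++) {
                    double s = effective_ser(&cores[i]);
                    if (!cores[i].dmr_on && s > worst_ser) {
                        worst_ser = s;
                        worst = i;
                    }
                }
                if (worst == NUM_CORES) break;  /* every core already protected */
                cores[worst].dmr_on = true;     /* spend a little DMR here      */
                total -= worst_ser;
            }
        }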

    Non-intrusive dynamic application profiler for detailed loop execution characterization


    A survey on run-time power monitors at the edge

    Effectively managing energy and power consumption is crucial to the success of the design of any computing system: it helps mitigate the efficiency obstacles that come with the downsizing of systems and is a valuable step towards green and sustainable computing. The quality of energy and power management is strongly affected by the prompt availability of reliable and accurate information on the power consumption of the different parts composing the target monitored system. At the same time, effective energy and power management is even more critical for devices at the edge, which have proliferated over the past decade with the digital revolution brought by the Internet of Things. This manuscript provides a comprehensive conceptual framework for classifying the approaches to implementing run-time power monitors for edge devices that have appeared in the literature, leading the reader toward the solutions that best fit their application needs and the requirements and constraints of their target computing platforms. Run-time power monitors at the edge are analyzed according to both power modeling and monitoring implementation aspects, with specific quality metrics identified for each, in order to create a consistent and detailed taxonomy that encompasses the vast existing literature and provides a sound reference for the interested reader.
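
    Many of the run-time monitors covered by such a survey build on a counter-based linear power model. The short sketch below shows only the general shape of that model, with placeholder events and weights; it is not a model or result taken from the manuscript.

        /* Generic counter-based linear power model: estimated power is an
         * idle term plus a weighted sum of activity counters. Events and
         * weights are illustrative placeholders. */
        #include <stddef.h>

        #define N_EVENTS 3  /* e.g. instructions, cache misses, memory accesses */

        typedef struct {
            double p_idle;            /* static/idle power, watts            */
            double weight[N_EVENTS];  /* watts per event per sampling period */
        } power_model_t;

        /* One sampling period: turn raw counter deltas into a power estimate. */
        static double estimate_power(const power_model_t *m,
                                     const double counts[N_EVENTS]) {
            double p = m->p_idle;
            for (size_t i = 0; i < N_EVENTS; i++)
                p += m->weight[i] * counts[i];
            return p;
        }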

    Performance Counter Measurements of Data Structures: Implementations for Multi-Objective Optimisation

    Solving multi-objective optimisation problems using evolutionary computation methods involves the implementation of algorithms and data structures for the storage of temporary solutions. Computational efficiency of these systems becomes important as problems increase in complexity and the number of solutions maintained becomes large. Many data structures and algorithms have been proposed with the aim of decreasing computational times. The effectiveness of a data structure/algorithm can be characterised using wall-clock time. This is a widely used parameter in the literature; however, it is strongly dependent on the underlying computer architecture and hence not a reliable measure of absolute performance. A commonly used approach to avoid architectural dependencies is to compare the performance of the data structure being evaluated to the equivalent implementation using a linked list. Modern processors offer built-in hardware performance counters, giving access to a wide set of parameters that can be used to explore performance. In this dissertation we study the efficiency of a non-dominated quad-tree data structure in combination with different evolutionary algorithms using hardware performance counters. We also compare the results for the quad-tree data structure to a linked list, as is standard practice; however, we find that non-scalable hardware dependencies might appear.
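
    To make the measurement setup concrete, the sketch below reads a few hardware counters around a data-structure workload using the PAPI library, one common way to access such counters; whether the dissertation uses PAPI, and which events it records, are assumptions made here, and insert_all() is a hypothetical stand-in for the quad-tree or linked-list workload.

        /* Minimal PAPI example: count cycles, instructions and L1 data-cache
         * misses around a workload. insert_all() is a placeholder stub. */
        #include <stdio.h>
        #include <papi.h>

        /* Hypothetical workload: replace with quad-tree / linked-list inserts. */
        static void insert_all(void) {
            volatile long sink = 0;
            for (long i = 0; i < 1000000; i++) sink += i;
        }

        int main(void) {
            int events[] = { PAPI_TOT_CYC, PAPI_TOT_INS, PAPI_L1_DCM };
            long long counts[3];
            int set = PAPI_NULL;

            if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT) return 1;
            if (PAPI_create_eventset(&set) != PAPI_OK) return 1;
            for (int i = 0; i < 3; i++)
                if (PAPI_add_event(set, events[i]) != PAPI_OK) return 1;

            PAPI_start(set);
            insert_all();              /* region of interest */
            PAPI_stop(set, counts);

            /* Cycles and cache misses are architecture-sensitive, which is why
             * results are typically paired with a linked-list baseline. */
            printf("cycles=%lld instructions=%lld l1_dcm=%lld\n",
                   counts[0], counts[1], counts[2]);
            return 0;
        }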