112,535 research outputs found
ScALPEL: A Scalable Adaptive Lightweight Performance Evaluation Library for application performance monitoring
As supercomputers continue to grow in scale and capabilities, it is becoming
increasingly difficult to isolate processor and system level causes of
performance degradation. Over the last several years, a significant number of
performance analysis and monitoring tools have been built/proposed. However,
these tools suffer from several important shortcomings, particularly in
distributed environments. In this paper we present ScALPEL, a Scalable Adaptive
Lightweight Performance Evaluation Library for application performance
monitoring at the functional level. Our approach provides several distinct
advantages. First, ScALPEL is portable across a wide variety of architectures,
and its ability to selectively monitor functions presents low run-time
overhead, enabling its use for large-scale production applications. Second, it
is run-time configurable, enabling both dynamic selection of functions to
profile as well as events of interest on a per function basis. Third, our
approach is transparent in that it requires no source code modifications.
Finally, ScALPEL is implemented as a pluggable unit by reusing existing
performance monitoring frameworks such as Perfmon and PAPI and extending them
to support both sequential and MPI applications.Comment: 10 pages, 4 figures, 2 table
Recommended from our members
Application of Advanced Early Warning Systems with Adaptive Protection
This project developed and field-tested two methods of Adaptive Protection systems utilizing synchrophasor data. One method detects conditions of system stress that can lead to unintended relay operation, and initiates a supervisory signal to modify relay response in real time to avoid false trips. The second method detects the possibility of false trips of impedance relays as stable system swings âencroachâ on the relaysâ impedance zones, and produces an early warning so that relay engineers can re-evaluate relay settings. In addition, real-time synchrophasor data produced by this project was used to develop advanced visualization techniques for display of synchrophasor data to utility operators and engineers
Prototype of Fault Adaptive Embedded Software for Large-Scale Real-Time Systems
This paper describes a comprehensive prototype of large-scale fault adaptive
embedded software developed for the proposed Fermilab BTeV high energy physics
experiment. Lightweight self-optimizing agents embedded within Level 1 of the
prototype are responsible for proactive and reactive monitoring and mitigation
based on specified layers of competence. The agents are self-protecting,
detecting cascading failures using a distributed approach. Adaptive,
reconfigurable, and mobile objects for reliablility are designed to be
self-configuring to adapt automatically to dynamically changing environments.
These objects provide a self-healing layer with the ability to discover,
diagnose, and react to discontinuities in real-time processing. A generic
modeling environment was developed to facilitate design and implementation of
hardware resource specifications, application data flow, and failure mitigation
strategies. Level 1 of the planned BTeV trigger system alone will consist of
2500 DSPs, so the number of components and intractable fault scenarios involved
make it impossible to design an `expert system' that applies traditional
centralized mitigative strategies based on rules capturing every possible
system state. Instead, a distributed reactive approach is implemented using the
tools and methodologies developed by the Real-Time Embedded Systems group.Comment: 2nd Workshop on Engineering of Autonomic Systems (EASe), in the 12th
Annual IEEE International Conference and Workshop on the Engineering of
Computer Based Systems (ECBS), Washington, DC, April, 200
Validation of hardware events for successful performance pattern identification in High Performance Computing
Hardware performance monitoring (HPM) is a crucial ingredient of performance
analysis tools. While there are interfaces like LIKWID, PAPI or the kernel
interface perf\_event which provide HPM access with some additional features,
many higher level tools combine event counts with results retrieved from other
sources like function call traces to derive (semi-)automatic performance
advice. However, although HPM is available for x86 systems since the early 90s,
only a small subset of the HPM features is used in practice. Performance
patterns provide a more comprehensive approach, enabling the identification of
various performance-limiting effects. Patterns address issues like bandwidth
saturation, load imbalance, non-local data access in ccNUMA systems, or false
sharing of cache lines. This work defines HPM event sets that are best suited
to identify a selection of performance patterns on the Intel Haswell processor.
We validate the chosen event sets for accuracy in order to arrive at a reliable
pattern detection mechanism and point out shortcomings that cannot be easily
circumvented due to bugs or limitations in the hardware
City Data Fusion: Sensor Data Fusion in the Internet of Things
Internet of Things (IoT) has gained substantial attention recently and play a
significant role in smart city application deployments. A number of such smart
city applications depend on sensor fusion capabilities in the cloud from
diverse data sources. We introduce the concept of IoT and present in detail ten
different parameters that govern our sensor data fusion evaluation framework.
We then evaluate the current state-of-the art in sensor data fusion against our
sensor data fusion framework. Our main goal is to examine and survey different
sensor data fusion research efforts based on our evaluation framework. The
major open research issues related to sensor data fusion are also presented.Comment: Accepted to be published in International Journal of Distributed
Systems and Technologies (IJDST), 201
- âŠ