37,626 research outputs found
Domain knowledge specification for energy tuning
To overcome the challenges of energy consumption of HPC systems, the European Union Horizon 2020 READEX (Runtime Exploitation of Application Dynamism for Energy-efficient Exascale computing) project uses an online auto-tuning approach to improve energy efficiency of HPC applications. The READEX methodology pre-computes optimal system configurations at design-time, such as the CPU frequency, for instances of program regions and switches at runtime to the configuration given in the tuning model when the region is executed. READEX goes beyond previous approaches by exploiting dynamic changes of a region's characteristics by leveraging region and characteristic specific system configurations. While the tool suite supports an automatic approach, specifying domain knowledge such as the structure and characteristics of the application and application tuning parameters can significantly help to create a more refined tuning model. This paper presents the means available for an application expert to provide domain knowledge and presents tuning results for some benchmarks.Web of Science316art. no. E465
Instrumentation, performance visualization, and debugging tools for multiprocessors
The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessor architectures. However, without effective means to monitor (and visualize) program execution, debugging, and tuning parallel programs becomes intractably difficult as program complexity increases with the number of processors. Research on performance evaluation tools for multiprocessors is being carried out at ARC. Besides investigating new techniques for instrumenting, monitoring, and presenting the state of parallel program execution in a coherent and user-friendly manner, prototypes of software tools are being incorporated into the run-time environments of various hardware testbeds to evaluate their impact on user productivity. Our current tool set, the Ames Instrumentation Systems (AIMS), incorporates features from various software systems developed in academia and industry. The execution of FORTRAN programs on the Intel iPSC/860 can be automatically instrumented and monitored. Performance data collected in this manner can be displayed graphically on workstations supporting X-Windows. We have successfully compared various parallel algorithms for computational fluid dynamics (CFD) applications in collaboration with scientists from the Numerical Aerodynamic Simulation Systems Division. By performing these comparisons, we show that performance monitors and debuggers such as AIMS are practical and can illuminate the complex dynamics that occur within parallel programs
ScALPEL: A Scalable Adaptive Lightweight Performance Evaluation Library for application performance monitoring
As supercomputers continue to grow in scale and capabilities, it is becoming
increasingly difficult to isolate processor and system level causes of
performance degradation. Over the last several years, a significant number of
performance analysis and monitoring tools have been built/proposed. However,
these tools suffer from several important shortcomings, particularly in
distributed environments. In this paper we present ScALPEL, a Scalable Adaptive
Lightweight Performance Evaluation Library for application performance
monitoring at the functional level. Our approach provides several distinct
advantages. First, ScALPEL is portable across a wide variety of architectures,
and its ability to selectively monitor functions presents low run-time
overhead, enabling its use for large-scale production applications. Second, it
is run-time configurable, enabling both dynamic selection of functions to
profile as well as events of interest on a per function basis. Third, our
approach is transparent in that it requires no source code modifications.
Finally, ScALPEL is implemented as a pluggable unit by reusing existing
performance monitoring frameworks such as Perfmon and PAPI and extending them
to support both sequential and MPI applications.Comment: 10 pages, 4 figures, 2 table
Racing to hardware-validated simulation
Processor simulators rely on detailed timing models of the processor pipeline to evaluate performance. The diversity in real-world processor designs mandates building flexible simulators that expose parts of the underlying model to the user in the form of configurable parameters. Consequently, the accuracy of modeling a real processor relies on both the accuracy of the pipeline model itself, and the accuracy of adjusting the configuration parameters according to the modeled processor. Unfortunately, processor vendors publicly disclose only a subset of their design decisions, raising the probability of introducing specification inaccuracies when modeling these processors. Inaccurately tuning model parameters deviates the simulated processor from the actual one. In the worst case, using improper parameters may lead to imbalanced pipeline models compromising the simulation output. Therefore, simulation models should be hardware-validated before using them for performance evaluation. As processors increase in complexity and diversity, validating a simulator model against real hardware becomes increasingly more challenging and time-consuming. In this work, we propose a methodology for validating simulation models against real hardware. We create a framework that relies on micro-benchmarks to collect performance statistics on real hardware, and machine learning-based algorithms to fine-tune the unknown parameters based on the accumulated statistics. We overhaul the Sniper simulator to support the ARM AArch64 instruction-set architecture (ISA), and introduce two new timing models for ARM-based in-order and out-of-order cores. Using our proposed simulator validation framework, we tune the in-order and out-of-order models to match the performance of a real-world implementation of the Cortex-A53 and Cortex-A72 cores with an average error of 7% and 15%, respectively, across a set of SPEC CPU2017 benchmarks
A survey of fuzzy control for stabilized platforms
This paper focusses on the application of fuzzy control techniques (fuzzy
type-1 and type-2) and their hybrid forms (Hybrid adaptive fuzzy controller and
fuzzy-PID controller) in the area of stabilized platforms. It represents an
attempt to cover the basic principles and concepts of fuzzy control in
stabilization and position control, with an outline of a number of recent
applications used in advanced control of stabilized platform. Overall, in this
survey we will make some comparisons with the classical control techniques such
us PID control to demonstrate the advantages and disadvantages of the
application of fuzzy control techniques
Mira: A Framework for Static Performance Analysis
The performance model of an application can pro- vide understanding about its
runtime behavior on particular hardware. Such information can be analyzed by
developers for performance tuning. However, model building and analyzing is
frequently ignored during software development until perfor- mance problems
arise because they require significant expertise and can involve many
time-consuming application runs. In this paper, we propose a fast, accurate,
flexible and user-friendly tool, Mira, for generating performance models by
applying static program analysis, targeting scientific applications running on
supercomputers. We parse both the source code and binary to estimate
performance attributes with better accuracy than considering just source or
just binary code. Because our analysis is static, the target program does not
need to be executed on the target architecture, which enables users to perform
analysis on available machines instead of conducting expensive exper- iments on
potentially expensive resources. Moreover, statically generated models enable
performance prediction on non-existent or unavailable architectures. In
addition to flexibility, because model generation time is significantly reduced
compared to dynamic analysis approaches, our method is suitable for rapid
application performance analysis and improvement. We present several scientific
application validation results to demonstrate the current capabilities of our
approach on small benchmarks and a mini application
Analysing I/O bottlenecks in LHC data analysis on grid storage resources
We describe recent I/O testing frameworks that we have developed and applied within the UK GridPP Collaboration, the ATLAS experiment and the DPM team, for a variety of distinct purposes. These include benchmarking vendor supplied storage products, discovering scaling limits of SRM solutions, tuning of storage systems for experiment data analysis, evaluating file access protocols, and exploring I/O read patterns of experiment software and their underlying event data models. With multiple grid sites now dealing with petabytes of data, such studies are becoming essential. We describe how the tests build, and improve, on previous work and contrast how the use-cases differ. We also detail the results obtained and the implications for storage hardware, middleware and experiment software
- …