WMTrace: a lightweight memory allocation tracker and analysis framework
The diverging gap between processor and memory performance has been a well-discussed aspect of computer architecture literature for some years. The use of multi-core processor designs has, however, brought new problems to the design of memory architectures: increased core density without matched improvement in memory capacity is reducing the available memory per parallel process. Multiple cores accessing memory simultaneously degrade performance as a result of resource contention for memory channels and physical DIMMs. These issues combine to ensure that memory remains an ongoing challenge in the design of parallel algorithms that scale. In this paper we present WMTrace, a lightweight tool to trace and analyse memory allocation events in parallel applications. The tool can dynamically link to pre-existing application binaries, requiring no source code modification or recompilation. A post-execution analysis stage enables in-depth analysis of traces, allowing memory allocations to be analysed by time, size or function. The second half of this paper features a case study in which we apply WMTrace to five parallel scientific applications and benchmarks, demonstrating its effectiveness at recording high-water mark memory consumption as well as memory use per function over time. An in-depth analysis is provided for an unstructured mesh benchmark, which reveals significant memory allocation imbalance across its participating processes.
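As an illustration of the kind of post-execution analysis the abstract describes, the following minimal sketch computes a high-water mark and per-function peak usage from a list of allocation events. The event tuple layout and the name `analyse_trace` are assumptions made for this example; they are not WMTrace's actual trace format or API.

```python
from collections import defaultdict


def analyse_trace(events):
    """Replay allocation events and report peak memory use.

    `events` is an iterable of (time, kind, address, size, function) tuples,
    where kind is "alloc" or "free". This layout is hypothetical and chosen
    only for illustration.
    """
    live = {}                             # address -> (size, allocating function)
    current = 0                           # bytes currently allocated
    high_water = 0                        # largest value `current` has reached
    high_water_time = None                # timestamp at which the peak occurred
    per_function = defaultdict(int)       # live bytes per allocating function
    peak_per_function = defaultdict(int)  # peak live bytes per function

    # sorted() is stable, so events with equal timestamps keep their order.
    for time, kind, address, size, function in sorted(events, key=lambda e: e[0]):
        if kind == "alloc":
            live[address] = (size, function)
            current += size
            per_function[function] += size
            peak_per_function[function] = max(
                peak_per_function[function], per_function[function]
            )
        elif kind == "free" and address in live:
            # Charge the release to the function that made the allocation.
            freed_size, owner = live.pop(address)
            current -= freed_size
            per_function[owner] -= freed_size
        if current > high_water:
            high_water, high_water_time = current, time

    return high_water, high_water_time, dict(peak_per_function)
```

Running the same replay once per process and comparing the resulting peaks is one simple way to expose the kind of allocation imbalance across processes that the case study reports.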
The SIOX architecture – coupling automatic monitoring and optimization of parallel I/O
Performance analysis and optimization of high-performance I/O systems is a daunting task. Mainly, this is due to the overwhelmingly complex interplay of the involved hardware and software layers. The Scalable I/O for Extreme Performance (SIOX) project provides a versatile environment for monitoring I/O activities and learning from this information. The goal of SIOX is to automatically suggest and apply performance optimizations, and to assist in locating and diagnosing performance problems.
In this paper, we present the current status of SIOX. Our modular architecture covers instrumentation of POSIX, MPI and other high-level I/O libraries; the monitoring data is recorded asynchronously into a global database, and recorded traces can be visualized. Furthermore, we offer a set of primitive plug-ins with additional features to demonstrate the flexibility of our architecture: a surveyor plug-in to keep track of the observed spatial access patterns; an fadvise plug-in for injecting hints to achieve read-ahead for strided access patterns; and an optimizer plug-in which monitors the performance achieved with different MPI-IO hints, automatically supplying the best known hint-set when no hints were explicitly set. The presentation of the technical status is accompanied by a demonstration of some of these features on our 20-node cluster. In additional experiments, we analyze the overhead for concurrent access, for MPI-IO’s four levels of access, and for an instrumented climate application.
While our prototype is not yet full-featured, it demonstrates the potential and feasibility of our approach.
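The behaviour of the optimizer plug-in described above can be sketched roughly as follows. This is a simplified illustration, not SIOX code: the class `HintOptimizer`, its methods, and the idea of keying results by an access-pattern label are assumptions made for the example.

```python
class HintOptimizer:
    """Remember the best-performing hint-set seen for each access pattern.

    Hypothetical sketch: SIOX's real plug-in interface is not shown in the
    abstract, so all names here are illustrative.
    """

    def __init__(self):
        # access pattern label -> (best throughput seen, hints that achieved it)
        self._best = {}

    def record(self, pattern, hints, throughput):
        """Store an observation; keep it only if it beats the current best."""
        best = self._best.get(pattern)
        if best is None or throughput > best[0]:
            self._best[pattern] = (throughput, dict(hints))

    def choose(self, pattern, user_hints=None):
        """Return the user's hints if any were set, else the best known set."""
        if user_hints:
            return dict(user_hints)
        best = self._best.get(pattern)
        return dict(best[1]) if best else {}


# Example use with illustrative MPI-IO hint values and made-up throughputs.
optimizer = HintOptimizer()
optimizer.record("strided", {"romio_cb_write": "enable"}, throughput=120.0)
optimizer.record("strided", {"romio_cb_write": "disable"}, throughput=80.0)
chosen = optimizer.choose("strided")  # no explicit hints, so the best known set is used
```

Explicitly supplied hints always take precedence in this sketch, matching the abstract's statement that the best known hint-set is supplied only when no hints were set.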
Synapse: Synthetic Application Profiler and Emulator
We introduce Synapse, motivated by the need to estimate and emulate workload
execution characteristics on high-performance and distributed heterogeneous
resources. Synapse has a platform-independent application profiler, and the
ability to emulate profiled workloads on a variety of heterogeneous resources.
Synapse is used as a proxy application (or "representative application") for
real workloads, with the added advantage that it can be tuned at arbitrary
levels of granularity in ways that are simply not possible using real
applications. Experiments show that automated profiling using Synapse
represents application characteristics with high fidelity. Emulation using
Synapse can reproduce the application behavior in the original runtime
environment, as well as reproduce its properties when used in different
runtime environments.
Monitoring data in R with the lumberjack package
Monitoring data while it is processed and transformed can yield detailed
insight into the dynamics of a (running) production system. The lumberjack
package is a lightweight package allowing users to follow how an R object is
transformed as it is manipulated by R code. The package abstracts all logging
code from the user, who only needs to specify which objects are logged and what
information should be logged. A few default loggers are included with the
package but the package is extensible through user-defined logger objects.
Comment: Accepted for publication in the Journal of Statistical Software
iLeak: A Lightweight System for Detecting Inadvertent Information Leaks
Data loss incidents, where data of a sensitive nature are exposed to the public, have become too frequent and have caused damages of millions of dollars to companies and other organizations. Repeatedly, information leaks occur over the Internet, and half of the time they are accidental, caused by user negligence, misconfiguration of software, or inadequate understanding of an application's functionality. This paper presents iLeak, a lightweight, modular system for detecting inadvertent information leaks. Unlike previous solutions, iLeak builds on components already present in modern computers. In particular, we employ system tracing facilities and data indexing services, and combine them in a novel way to detect data leaks. Our design consists of three components: uaudits are responsible for capturing the information that exits the system, while Inspectors use the indexing service to identify whether the transmitted data belong to files that contain potentially sensitive information. The Trail Gateway handles the communication and synchronization of uaudits and Inspectors. We implemented iLeak on Mac OS X using DTrace and the Spotlight indexing service. Finally, we show that iLeak is indeed lightweight, since it incurs only 4% overhead on protected applications.
GekkoFS: A temporary distributed file system for HPC applications
We present GekkoFS, a temporary, highly-scalable burst buffer file system which has been specifically optimized for new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, only offering features which are actually required by most (not all) applications. It is able to provide scalable I/O performance and already reaches millions of metadata operations for a small number of nodes, significantly outperforming the capabilities of general-purpose parallel file systems.
The work has been funded by the German Research Foundation (DFG) through the ADA-FS project as part of the Priority Programme 1648. It is also supported by the Spanish Ministry of Science and Innovation (TIN2015–65316), the Generalitat de Catalunya (2014–SGR–1051), as well as the European Union’s Horizon 2020 Research and Innovation Programme (NEXTGenIO, 671951) and the European Commission’s BigStorage project (H2020-MSCA-ITN-2014-642963). This research was conducted using the supercomputer MOGON II and services offered by the Johannes Gutenberg University Mainz.
Peer Reviewed
Postprint (author's final draft)