Workflow Critical Path: A Data-Oriented Critical Path Metric for Holistic HPC Workflows
Current trends in HPC, such as the push to exascale, convergence with Big Data, and growing complexity of HPC applications, have created gaps that traditional performance tools do not cover. One example is Holistic HPC Workflows — HPC workflows comprising multiple codes, paradigms, or platforms that are not developed using a workflow management system. To diagnose the performance of these applications, we define a new metric called Workflow Critical Path (WCP), a data-oriented metric for Holistic HPC Workflows. WCP constructs graphs that span across the workflow codes and platforms, using data states as vertices and data mutations as edges. Using cloud-based technologies, we implement a prototype called Crux, a distributed analysis tool for calculating and visualizing WCP. Our experiments with a workflow simulator on Amazon Web Services show that Crux is scalable and capable of correctly calculating WCP for common Holistic HPC workflow patterns. We explore the use of WCP and discuss how Crux could be used in a production HPC environment.
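The abstract describes WCP as a path through a graph whose vertices are data states and whose edges are data mutations. As a rough illustration of that idea (not Crux's actual implementation; the vertex names, edge weights, and workflow below are made up), a critical path over such a DAG can be computed as follows:

```python
# Minimal sketch of a data-oriented critical path over a workflow DAG.
# Vertex/edge names and durations are illustrative, not Crux's actual model.
from collections import defaultdict

def workflow_critical_path(edges):
    """edges: list of (src_state, dst_state, duration_seconds).
    Returns (total_duration, states_on_the_critical_path)."""
    graph = defaultdict(list)
    indegree = defaultdict(int)
    states = set()
    for src, dst, w in edges:
        graph[src].append((dst, w))
        indegree[dst] += 1
        states.update((src, dst))

    # Topological order via Kahn's algorithm.
    order, queue = [], [s for s in states if indegree[s] == 0]
    while queue:
        s = queue.pop()
        order.append(s)
        for dst, _ in graph[s]:
            indegree[dst] -= 1
            if indegree[dst] == 0:
                queue.append(dst)

    # Longest (critical) path by relaxing edges in topological order.
    dist = {s: 0.0 for s in states}
    pred = {}
    for s in order:
        for dst, w in graph[s]:
            if dist[s] + w > dist[dst]:
                dist[dst] = dist[s] + w
                pred[dst] = s

    end = max(dist, key=dist.get)
    path = [end]
    while path[-1] in pred:
        path.append(pred[path[-1]])
    return dist[end], list(reversed(path))

# Hypothetical workflow: two codes produce data that a third consumes.
edges = [("raw", "sim_out", 120.0), ("raw", "mesh", 40.0),
         ("sim_out", "analysis_in", 15.0), ("mesh", "analysis_in", 5.0),
         ("analysis_in", "plots", 30.0)]
print(workflow_critical_path(edges))
```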
07341 Abstracts Collection -- Code Instrumentation and Modeling for Parallel Performance Analysis
From 20th to 24th August 2007, the Dagstuhl Seminar 07341 "Code Instrumentation and Modeling for Parallel Performance Analysis" was held in the International Conference and Research Center (IBFI), Schloss Dagstuhl.
During the seminar, several participants presented their current
research, and ongoing work and open problems were discussed. Abstracts of
the presentations given during the seminar as well as abstracts of
seminar results and ideas are put together in this paper. The first section
describes the seminar topics and goals in general.
Links to extended abstracts or full papers are provided, if available.
ScalAna: Automating Scaling Loss Detection with Graph Analysis
Scaling a parallel program to modern supercomputers is challenging due to
inter-process communication, Amdahl's law, and resource contention. Performance
analysis tools for finding such scaling bottlenecks are based on either profiling or
tracing. Profiling incurs low overhead but does not capture the detailed
dependencies needed for root-cause analysis; tracing collects all of this
information but at prohibitive overhead. In this work, we design ScalAna, which
uses static analysis techniques to achieve the best of both worlds: it enables the
analyzability of traces at a cost similar to profiling. ScalAna first leverages
static compiler techniques to build a Program Structure Graph, which records
the main computation and communication patterns as well as the program's
control structures. At runtime, we adopt lightweight techniques to collect
performance data according to the graph structure and generate a Program
Performance Graph. With this graph, we propose a novel approach, called
backtracking root cause detection, which can automatically and efficiently
detect the root cause of scaling loss. We evaluate ScalAna with real
applications. Results show that our approach can effectively locate the root
cause of scaling loss for real applications and incurs 1.73% overhead on
average for up to 2,048 processes. We achieve up to 11.11% performance
improvement by fixing the root causes detected by ScalAna on 2,048 processes.
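To make the idea of backtracking root cause detection concrete, the sketch below walks dependency edges of a per-vertex performance graph from the vertex with the largest scaling loss toward a likely cause. The graph schema, the loss formula, and the function name are assumptions for illustration, not ScalAna's actual design:

```python
# Hedged sketch of graph-based backtracking for scaling-loss root causes.
# The scoring and traversal below are illustrative, not ScalAna's exact algorithm.

def backtrack_root_cause(vertices, deps, time_small, time_large, ideal_speedup):
    """vertices: iterable of vertex ids (loops, communication calls, ...).
    deps: dict mapping a vertex to the vertices it waits on (e.g. senders).
    time_small/time_large: per-vertex times at the small and large scale.
    Returns the vertex chain from the worst symptom back to a likely cause."""
    # Scaling loss: how far each vertex falls short of ideal scaling.
    loss = {v: time_large[v] - time_small[v] / ideal_speedup for v in vertices}

    # Start from the vertex with the largest observed loss (the symptom).
    current = max(vertices, key=lambda v: loss[v])
    chain = [current]

    # Follow dependency edges while an upstream vertex also shows loss,
    # e.g. a late sender causing wait time in a receive.
    while True:
        upstream = [u for u in deps.get(current, []) if loss[u] > 0]
        if not upstream:
            return chain
        current = max(upstream, key=lambda u: loss[u])
        if current in chain:   # guard against cycles in the dependency data
            return chain
        chain.append(current)
```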
Coz: Finding Code that Counts with Causal Profiling
Improving performance is a central concern for software developers. To locate
optimization opportunities, developers rely on software profilers. However,
these profilers only report where programs spent their time: optimizing that
code may have no impact on performance. Past profilers thus both waste
developer time and make it difficult for them to uncover significant
optimization opportunities.
This paper introduces causal profiling. Unlike past profiling approaches,
causal profiling indicates exactly where programmers should focus their
optimization efforts, and quantifies their potential impact. Causal profiling
works by running performance experiments during program execution. Each
experiment calculates the impact of any potential optimization by virtually
speeding up code: inserting pauses that slow down all other code running
concurrently. The key insight is that this slowdown has the same relative
effect as running that line faster, thus "virtually" speeding it up.
We present Coz, a causal profiler, which we evaluate on a range of
highly-tuned applications: Memcached, SQLite, and the PARSEC benchmark suite.
Coz identifies previously unknown optimization opportunities that are both
significant and targeted. Guided by Coz, we improve the performance of
Memcached by 9%, SQLite by 25%, and accelerate six PARSEC applications by as
much as 68%; in most cases, these optimizations involve modifying under 10
lines of code. Published at SOSP 2015 (Best Paper Award).
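The virtual-speedup idea can be illustrated with a toy two-activity model: pausing everything else while the line of interest runs, then subtracting the inserted delays, predicts the runtime as if the line itself had been faster. This is only a conceptual sketch under that simplified model, not the Coz runtime or its API:

```python
# Toy illustration of Coz-style virtual speedup, not the actual Coz tool.
# One "line" of interest and background work run concurrently; pausing the
# background whenever the line runs mimics speeding the line up.

def predicted_runtime(line_time, other_time, virtual_speedup=0.0):
    """virtual_speedup is the fraction by which we pretend the line got faster.
    It is realized by delaying all other concurrent work by that fraction of
    the line's execution time, then subtracting the injected delay."""
    delay = virtual_speedup * line_time          # pause injected into other code
    wall = max(line_time, other_time + delay)    # both activities run concurrently
    return wall - delay                          # runtime as if the line were faster

baseline = predicted_runtime(line_time=4.0, other_time=6.0)
predicted = predicted_runtime(line_time=4.0, other_time=6.0, virtual_speedup=0.25)
actual = max(4.0 * 0.75, 6.0)   # what really happens if the line runs 25% faster
print(baseline, predicted, actual)  # the line is off the critical path: no gain
```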
PROV-IO+: A Cross-Platform Provenance Framework for Scientific Data on HPC Systems
Data provenance, or data lineage, describes the life cycle of data. In
scientific workflows on HPC systems, scientists often seek diverse provenance
(e.g., origins of data products, usage patterns of datasets). Unfortunately,
existing provenance solutions cannot address the challenges due to their
incompatible provenance models and/or system implementations. In this paper, we
analyze four representative scientific workflows in collaboration with the
domain scientists to identify concrete provenance needs. Based on the
first-hand analysis, we propose a provenance framework called PROV-IO+, which
includes an I/O-centric provenance model for describing scientific data and the
associated I/O operations and environments precisely. Moreover, we build a
prototype of PROV-IO+ to enable end-to-end provenance support on real HPC
systems with little manual effort. The PROV-IO+ framework can support both
containerized and non-containerized workflows on different HPC platforms with
flexibility in selecting various classes of provenance. Our experiments with
realistic workflows show that PROV-IO+ can address the provenance needs of the
domain scientists effectively with reasonable performance (e.g., less than 3.5%
tracking overhead for most experiments). Moreover, PROV-IO+ outperforms a
state-of-the-art system (i.e., ProvLake) in our experiments.
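As a rough illustration of what an I/O-centric provenance record might capture, the sketch below defines a small record type linking a dataset, the I/O operation, the issuing program, and the environment. The field names, paths, and values are hypothetical and do not reproduce PROV-IO+'s actual model or storage format:

```python
# Hedged sketch of an I/O-centric provenance record; all fields are illustrative.
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class IOProvenanceRecord:
    dataset: str                 # e.g. an HDF5 file or object path
    operation: str               # "read", "write", "create", ...
    program: str                 # workflow component issuing the I/O
    user: str
    platform: str                # HPC system and/or container image
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))

# A workflow step writing an intermediate product might be recorded as
# (hypothetical paths and names):
rec = IOProvenanceRecord(
    dataset="/scratch/run42/checkpoint.h5",
    operation="write",
    program="simulate.py",
    user="alice",
    platform="cluster-a (container: sim:1.2)",
)
print(rec)
```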
Performance Analysis Tool for HPC and Big Data Applications on Scientific Clusters
Big data is prevalent in HPC. Many HPC projects rely on complex workflows to analyze terabytes or petabytes of data. These workflows often run over thousands of CPU cores and perform simultaneous data accesses, data movements, and computation. Analyzing the performance of such executions is challenging: it involves terabytes or petabytes of workflow and measurement data produced by complex workflows running across a large number of nodes with many parallel tasks. To help identify performance bottlenecks and debug performance issues in large-scale scientific applications and scientific clusters, we have developed a performance analysis framework using state-of-the-art open-source big data processing tools. Our tool can ingest system logs and application performance measurements to extract key performance features, and apply statistical and data mining methods to the performance data. It uses an efficient data processing engine to allow users to interactively analyze a large amount of different types of logs and measurements. To illustrate the functionality of the framework, we conduct case studies on workflows from an astronomy project, the Palomar Transient Factory (PTF), and on the job logs from a genome analysis scientific cluster.
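As a small illustration of the kind of log ingestion and feature extraction such a framework performs, the sketch below summarizes per-task I/O throughput by job from a CSV-style job log. The column names and file layout are assumptions, not the actual PTF or cluster log schema, and a production framework would use a distributed processing engine rather than plain Python:

```python
# Minimal sketch of ingesting job-log measurements and summarizing one
# performance feature; the CSV schema below is an assumption.
import csv
import statistics
from collections import defaultdict

def summarize_io_rates(log_path):
    """Group per-task I/O throughput (MB/s) by job and report basic statistics."""
    rates = defaultdict(list)
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            rates[row["job_id"]].append(float(row["io_mb_per_s"]))
    return {
        job: {
            "tasks": len(vals),
            "median_mb_per_s": statistics.median(vals),
            "min_mb_per_s": min(vals),
        }
        for job, vals in rates.items()
    }

# Jobs whose slowest tasks fall far below the median throughput are
# candidates for deeper bottleneck (straggler) analysis.
```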