Understanding Legacy Workflows through Runtime Trace Analysis
abstract: When scientific software is written to specify processes, it takes the form of a workflow, and is often written in an ad-hoc manner in a dynamic programming language. There is a proliferation of legacy workflows implemented by non-expert programmers due to the accessibility of dynamic languages. Unfortunately, ad-hoc workflows lack the structured description provided by specialized management systems, making ad-hoc workflow maintenance and reuse difficult and motivating the need for analysis methods. Analyzing ad-hoc workflows with compiler techniques does not cope with dynamic languages: a program carries so few static constraints that its behavior cannot be predicted. In contrast, workflow provenance tracking has had success using run-time techniques to record data. The aim of this work is to develop a new analysis method for extracting workflow structure at run-time, thus avoiding the issues posed by dynamic language features.
The method captures the dataflow of an ad-hoc workflow through its execution and abstracts it with a process for simplifying repetition. An instrumentation system first processes the workflow to produce an instrumented version, capable of logging events, which is then executed on an input to produce a trace. The trace undergoes dataflow construction to produce a provenance graph. The dataflow is examined for equivalent regions, which are collected into a single unit. The workflow is thus characterized in terms of its treatment of an input. Unlike other methods, a run-time approach characterizes the workflow's actual behavior, including elements that static analysis cannot predict (for example, code dynamically evaluated based on input parameters). This also enables the characterization of dataflow through external tools.
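A minimal sketch of this run-time capture idea, assuming a toy workflow; the function names (run_workflow, load, summarize) are hypothetical stand-ins, and Python's built-in sys.setprofile hook stands in for the source-level instrumentation described above:

import sys

trace_events = []  # (event, callee, caller) tuples logged during execution

def profiler(frame, event, arg):
    # Record every Python-level call so a provenance graph can be built afterwards.
    if event == "call":
        caller = frame.f_back.f_code.co_name if frame.f_back else "<top>"
        trace_events.append((event, frame.f_code.co_name, caller))

def load(path):
    return list(range(10))          # stand-in for reading an input file

def summarize(data):
    return sum(data) / len(data)    # stand-in for an analysis step

def run_workflow(path):
    return summarize(load(path))

sys.setprofile(profiler)
try:
    run_workflow("input.csv")
finally:
    sys.setprofile(None)

# Dataflow construction: collapse repeated call events into caller -> callee edges.
for caller, callee in sorted({(c, f) for _, f, c in trace_events}):
    print(caller, "->", callee)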
The contributions of this work are: a run-time method for recording a provenance graph from an ad-hoc Python workflow, and a method to analyze the structure of a workflow from provenance. Both methods are implemented in Python and are demonstrated on real-world Python workflows. These contributions enable users to derive graph structure from workflows. Empowered by a graphical view, users can better understand a legacy workflow. This makes the wealth of legacy ad-hoc workflows accessible, enabling workflow reuse instead of investing time and resources in creating new ones.
Masters Thesis (Computer Science)
Efficient cloud tracing: From very high level to very low level
With the increase of cloud infrastructure complexity, the origin of service deterioration is difficult to detect because issues may occur at different layers of the system. We propose a multi-layer tracing approach to gather all the relevant information needed for a full workflow analysis. The idea is to collect trace events from all the cloud nodes to follow users' requests from the cloud interface down to their execution on the hardware. Our approach involves tracing OpenStack's interfaces, the virtualization layer, and the host kernel space to perform analysis and show abnormal tasks and the main causes of latency or failures in the system. Experimental results on virtual machine live migration confirm that we are able to analyse service efficiency by locating the platform's weakest links.
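The correlation step can be illustrated with a small sketch; the event fields below (layer, request, ts) are assumptions for illustration, not the actual OpenStack or kernel trace schema:

from collections import defaultdict

# Events assumed to carry a shared request id and a timestamp, one stream per layer.
events = [
    {"layer": "api",    "request": "req-42", "ts": 0.000, "name": "boot_request"},
    {"layer": "virt",   "request": "req-42", "ts": 0.120, "name": "spawn_vm"},
    {"layer": "kernel", "request": "req-42", "ts": 0.450, "name": "sched_migrate"},
    {"layer": "api",    "request": "req-42", "ts": 2.300, "name": "boot_done"},
]

# Group events by request id so one user request can be followed across layers.
by_request = defaultdict(list)
for ev in events:
    by_request[ev["request"]].append(ev)

for req, evs in by_request.items():
    evs.sort(key=lambda e: e["ts"])
    print(f"{req}: total {evs[-1]['ts'] - evs[0]['ts']:.3f}s")
    # Per-hop latency exposes which layer contributes most to the delay.
    for prev, cur in zip(evs, evs[1:]):
        gap = cur["ts"] - prev["ts"]
        print(f"  {prev['layer']}:{prev['name']} -> {cur['layer']}:{cur['name']}  +{gap:.3f}s")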
Great Plains Network - Kansas State University Agronomy Application Deep Dive
In May 2019, staff members from the Engagement and Performance Operations Center (EPOC) and members of the Great Plains Network (GPN) attending their Annual Meeting met with researchers at Kansas State University (KSU) for an Application Deep Dive training session. The goal of this training session was to help characterize the requirements of an agronomy application, to enable cyberinfrastructure support staff to better understand the needs of the researchers they support, and to offer training so that GPN members can conduct these sessions on their own. Material for this event includes the written documentation for the agronomy application at KSU, details about the infrastructure setup at KSU, and a writeup of the discussion that took place in person on May 20, 2019.
EPOC, GPN, and KSU recorded a set of action items for this use case, continuing the ongoing support and collaboration. These reflect both the case study report and the in-person discussion.
Progressive Analytics: A Computation Paradigm for Exploratory Data Analysis
Exploring data requires a fast feedback loop from the analyst to the system, with a latency below about 10 seconds because of human cognitive limitations. When data becomes large or analysis becomes complex, sequential computations can no longer be completed in a few seconds and data exploration is severely hampered. This article describes a novel computation paradigm called Progressive Computation for Data Analysis, or more concisely Progressive Analytics, that provides a low-latency guarantee at the programming-language level by performing computations in a progressive fashion. Moving progressive computation to the language level relieves the programmers of exploratory data analysis systems from implementing the whole analytics pipeline in a progressive way from scratch, streamlining the implementation of scalable exploratory data analysis systems. This article describes the new paradigm through a prototype implementation called ProgressiVis, and explains the requirements it implies through examples.
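As an illustration of the paradigm (not the ProgressiVis API), the sketch below computes a mean progressively, emitting a refined estimate after each chunk so the analyst gets feedback well within the latency budget:

import random

def progressive_mean(values, chunk_size=100_000):
    # Process the data in chunks and yield an intermediate estimate after each one,
    # so the result refines over time instead of arriving only at the end.
    total, count = 0.0, 0
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        total += sum(chunk)
        count += len(chunk)
        yield count, total / count

data = [random.random() for _ in range(1_000_000)]
for seen, estimate in progressive_mean(data):
    print(f"after {seen:>9} rows: mean ~ {estimate:.4f}")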
Automated Testing and Debugging for Big Data Analytics
The prevalence of big data analytics in almost every large-scale software system has generated a substantial push to build data-intensive scalable computing (DISC) frameworks such as Google MapReduce and Apache Spark that can fully harness the power of existing data centers. However, frameworks once used by domain experts are now being leveraged by data scientists, business analysts, and researchers. This shift in user demographics calls for immediate advancements in the development, debugging, and testing practices of big data applications, which are falling behind compared to DISC framework design and implementation. In practice, big data applications often fail because users are unable to test all behaviors emerging from interleaving dataflow operators, user-defined functions, and framework code. Testing based on a random sample rarely guarantees reliability, and "trial and error" and "print" debugging methods are expensive and time-consuming. Thus, the current practice of developing a big data application must be improved, and the tools built to enhance developer productivity must adapt to the distinct characteristics of data-intensive scalable computing. By synthesizing ideas from software engineering and database systems, our hypothesis is that we can design effective and scalable testing and debugging algorithms for big data analytics without compromising the performance and efficiency of the underlying DISC framework. To design such techniques, we investigate how to build interactive and responsive debugging primitives that significantly reduce debugging time yet do not impose much performance overhead on big data applications. Furthermore, we investigate how to leverage data provenance techniques from databases and fault-isolation algorithms from software engineering to pinpoint the minimal subset of failure-inducing inputs efficiently. To improve the reliability of big data analytics, we investigate how to abstract the semantics of dataflow operators and use them in tandem with the semantics of user-defined functions to generate a minimum set of synthetic test inputs capable of revealing more defects than the entire input dataset.
To examine the first hypothesis, we introduce interactive, real-time debugging primitives for big data analytics through innovative and scalable debugging features such as simulated breakpoints, dynamic watchpoints, and crash culprit identification. Second, we design a new automated fault-localization approach that combines insights from both the software engineering and database literature to bring delta debugging closer to reality for big data applications, by leveraging data provenance and by constructing systems optimizations for debugging provenance queries. Lastly, we devise a new symbolic-execution-based white-box testing algorithm for big data applications that abstracts dataflow operators using logical specifications instead of modeling their implementations, and combines them with the semantics of any arbitrary user-defined function. We instantiate the idea of an interactive debugging algorithm as BigDebug, the idea of an automated debugging algorithm as BigSift, and the idea of symbolic-execution-based testing as BigTest. Our investigation shows that the interactive debugging primitives can scale to terabytes: our record-level tracing incurs less than 25% overhead on average and provides up to 100% time saving compared to the baseline replay debugger. Second, we observe that by combining data provenance with delta debugging, we can identify the minimum faulty input in just under 30% of the original job execution time. Lastly, we verify that by abstracting dataflow operators using logical specifications, we can efficiently generate the most concise test data suitable for local testing while revealing twice as many faults as prior approaches. Our investigations collectively demonstrate that developer productivity can be significantly improved through effective and scalable testing and debugging techniques for big data analytics without impacting the DISC framework's performance. This dissertation affirms the feasibility of automated debugging and testing techniques for big data analytics, techniques that were previously considered infeasible for large-scale data processing.
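A highly simplified sketch of the minimization idea, using a plain bisection rather than BigSift's actual provenance-guided algorithm; the failing predicate is a hypothetical stand-in for a crashing user-defined function:

def fails(records):
    # Hypothetical failure condition: the job crashes whenever a zero record is present.
    return any(r == 0 for r in records)

def minimize(records):
    # Repeatedly split the failing input and keep the smallest half that still fails.
    assert fails(records)
    while len(records) > 1:
        mid = len(records) // 2
        left, right = records[:mid], records[mid:]
        if fails(left):
            records = left
        elif fails(right):
            records = right
        else:
            break  # failure needs records from both halves; a full ddmin would recurse further
    return records

print(minimize([7, 3, 0, 12, 5]))  # -> [0]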
Performance Observability and Monitoring of High Performance Computing with Microservices
Traditionally, High Performance Computing (HPC) software has been built and deployed as bulk-synchronous, parallel executables based on the message-passing interface (MPI) programming model. The rise of data-oriented computing paradigms and an explosion in the variety of applications that need to be supported on HPC platforms have forced a rethink of the appropriate programming and execution models to integrate this new functionality. In situ workflows mark a paradigm shift in HPC software development methodologies, enabling a range of new applications, from user-level data services to machine learning (ML) workflows that run alongside traditional scientific simulations.
By tracing the evolution of HPC software development over the past 30 years, this dissertation identifies the key elements and trends responsible for the emergence of coupled, distributed, in situ workflows. The dissertation's focus is on coupled in situ workflows involving composable, high-performance microservices. After outlining the motivation to enable performance observability of these services and why existing HPC performance tools and techniques cannot be applied in this context, this dissertation proposes a solution wherein a set of techniques gathers, analyzes, and orients performance data from different sources to generate observability. By leveraging microservice components initially designed to build high-performance data services, this dissertation demonstrates their broader applicability for building and deploying performance monitoring and visualization as services within an in situ workflow.
The results from this dissertation suggest that: (1) integration of performance data from different sources is vital to understanding the performance of service components, (2) in situ (online) analysis of this performance data is needed to enable the adaptivity of distributed components and manage monitoring data volume, (3) statistical modeling combined with performance observations can help generate better service configurations, and (4) services are a promising architectural choice for deploying in situ performance monitoring and visualization functionality.
This dissertation includes previously published and co-authored material and unpublished co-authored material.
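A hedged sketch of the in situ data-volume reduction idea in point (2); the component and metric names are illustrative assumptions, not the dissertation's actual service interfaces:

from dataclasses import dataclass

@dataclass
class RunningStat:
    # Fold raw samples into a running summary so only aggregates leave the node.
    count: int = 0
    total: float = 0.0
    peak: float = float("-inf")

    def add(self, value: float) -> None:
        self.count += 1
        self.total += value
        self.peak = max(self.peak, value)

    @property
    def mean(self) -> float:
        return self.total / self.count if self.count else 0.0

summaries = {}  # (source, metric) -> RunningStat

def observe(source, metric, value):
    summaries.setdefault((source, metric), RunningStat()).add(value)

# Samples arriving from different workflow components on the same node.
observe("simulation", "step_time_ms", 12.4)
observe("simulation", "step_time_ms", 15.1)
observe("ml_service", "infer_time_ms", 3.2)

for (source, metric), stat in summaries.items():
    print(f"{source}/{metric}: mean={stat.mean:.1f} peak={stat.peak:.1f} n={stat.count}")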
A Generic Approach to Automating Experiments on Computer Networks (Une approche générique pour l'automatisation des expériences sur les réseaux informatiques)
This thesis proposes a generic approach to automate network experiments for scenarios involving any networking technology on any type of network evaluation platform. The proposed approach is based on abstracting the experiment life cycle of the evaluation platforms into generic steps from which a generic experiment model and experimentation primitives are derived. A generic experimentation architecture is proposed, composed of an experiment model, a programmable experiment interface, and an orchestration algorithm that can be adapted to network simulators, emulators, and testbeds alike. The feasibility of the approach is demonstrated through the implementation of a framework capable of automating experiments using any combination of these platforms. Three main aspects of the framework are evaluated: its extensibility to support any type of platform, its efficiency in orchestrating experiments, and its flexibility to support diverse use cases including education, platform management, and experimentation with multiple platforms. The results show that the proposed approach can be used to efficiently automate experimentation on diverse platforms for a wide range of scenarios.
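A hedged sketch of the generic life-cycle abstraction; the step names and the Platform interface below are illustrative, not the thesis framework's actual API:

from abc import ABC, abstractmethod

class Platform(ABC):
    """Generic experimentation interface; each platform type implements the same steps."""
    @abstractmethod
    def deploy(self, topology): ...
    @abstractmethod
    def run(self, command): ...
    @abstractmethod
    def collect(self): ...
    @abstractmethod
    def release(self): ...

class SimulatorPlatform(Platform):
    # One possible backend; an emulator or testbed adapter would implement the same steps.
    def deploy(self, topology):
        print(f"simulating {len(topology['nodes'])} nodes")
    def run(self, command):
        return f"simulated output of '{command}'"
    def collect(self):
        return {"trace": "sim.pcap"}
    def release(self):
        pass

def orchestrate(platform, topology, commands):
    # The same generic life cycle drives a simulator, an emulator, or a testbed.
    platform.deploy(topology)
    try:
        for cmd in commands:
            print(platform.run(cmd))
        return platform.collect()
    finally:
        platform.release()

results = orchestrate(SimulatorPlatform(),
                      {"nodes": ["a", "b"], "links": [("a", "b")]},
                      ["ping -c 1 b"])
print(results)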
Scalable Observation, Analysis, and Tuning for Parallel Portability in HPC
It is desirable for general productivity that high-performance computing applications be portable to new architectures, or can be optimized for new workflows and input types, without the need for costly code interventions or algorithmic rewrites. Parallel portability programming models offer the potential for high performance and productivity; however, they come with a multitude of runtime parameters that can have a significant impact on execution performance. Selecting the optimal set of parameters, so that HPC applications perform well in different system environments and on different input data sets, is not trivial. This dissertation maps out a vision for addressing this parallel portability challenge, and then demonstrates this plan through an effective combination of observability, analysis, and in situ machine learning techniques. A platform for general-purpose observation in HPC contexts is investigated, along with support for its use in human-in-the-loop performance understanding and analysis. The dissertation culminates in a demonstration of lessons learned in order to provide automated tuning of HPC applications utilizing parallel portability frameworks.
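The parameter-selection problem can be illustrated with a deliberately simple sketch; the knobs (block_size, unroll) and the exhaustive search are generic stand-ins for the dissertation's in situ, model-guided tuning:

import itertools, time

def kernel(n, block_size, unroll):
    # Toy workload whose cost depends on the chosen configuration.
    total = 0.0
    for start in range(0, n, block_size):
        for i in range(start, min(start + block_size, n), unroll):
            total += i * 0.5
    return total

def tune(n=200_000):
    # Time every candidate configuration and keep the fastest one.
    best = None
    for block_size, unroll in itertools.product([256, 1024, 4096], [1, 2, 4]):
        t0 = time.perf_counter()
        kernel(n, block_size, unroll)
        elapsed = time.perf_counter() - t0
        if best is None or elapsed < best[0]:
            best = (elapsed, {"block_size": block_size, "unroll": unroll})
    return best

print(tune())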