Asynchronous Execution of Python Code on Task Based Runtime Systems
Despite advancements in the areas of parallel and distributed computing, the
complexity of programming on High Performance Computing (HPC) resources has
deterred many domain experts, especially in the areas of machine learning and
artificial intelligence (AI), from utilizing the performance benefits of such
systems. Researchers and scientists favor high-productivity languages to avoid
the inconvenience of programming in low-level languages and the cost of
acquiring the skills required for programming at this level. In recent years,
Python, with the support of linear algebra libraries like NumPy, has gained
popularity despite limitations that prevent such code from running in
distributed settings. Here we present a solution that maintains high-level
programming abstractions as well as parallel and distributed efficiency.
Phylanx is an
asynchronous array processing toolkit which transforms Python and NumPy
operations into code which can be executed in parallel on HPC resources by
mapping Python and NumPy functions and variables into a dependency tree
executed by HPX, a general purpose, parallel, task-based runtime system written
in C++. Phylanx additionally provides introspection and visualization
capabilities for debugging and performance analysis. We have tested the
foundations of our approach by comparing our implementation of widely used
machine learning algorithms to accepted NumPy standards.
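To make the mapping described in this abstract concrete, the sketch below shows, in plain Python rather than the Phylanx or HPX API, how NumPy expressions can be deferred into a small dependency tree whose nodes are then executed asynchronously by a thread pool standing in for the task-based runtime. The `Node` and `lazy` names are assumptions made for exposition, not part of Phylanx.

```python
# Illustrative sketch only: a toy dependency tree over NumPy operations,
# executed asynchronously. Phylanx/HPX use their own C++ runtime; the
# Node/lazy names here are hypothetical.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

class Node:
    """One operation in the dependency tree."""
    def __init__(self, fn, *deps):
        self.fn, self.deps = fn, deps

    def execute(self, pool):
        # Submit dependencies first, then this node's function.
        futures = [d.execute(pool) for d in self.deps]
        return pool.submit(lambda: self.fn(*[f.result() for f in futures]))

def lazy(value):
    """Wrap a constant (e.g. a NumPy array) as a leaf node."""
    return Node(lambda: value)

# Expression: dot(a, b) + c, built as a dependency tree and only evaluated
# when execute() is called, instead of eagerly as in plain NumPy.
a = lazy(np.arange(6.0).reshape(2, 3))
b = lazy(np.ones((3, 2)))
c = lazy(np.full((2, 2), 0.5))
root = Node(np.add, Node(np.dot, a, b), c)

with ThreadPoolExecutor() as pool:
    print(root.execute(pool).result())
```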
A Comparison of Big Data Frameworks on a Layered Dataflow Model
In the world of Big Data analytics, there is a series of tools aiming at
simplifying the programming of applications to be executed on clusters. Although each
tool claims to provide better programming, data and execution models, for which
only informal (and often confusing) semantics is generally provided, all share
a common underlying model, namely, the Dataflow model. The Dataflow model we
propose shows how various tools share the same expressiveness at different
levels of abstraction. The contribution of this work is twofold: first, we show
that the proposed model is (at least) as general as existing batch and
streaming frameworks (e.g., Spark, Flink, Storm), thus making it easier to
understand high-level data-processing applications written in such frameworks.
Second, we provide a layered model that can represent tools and applications
following the Dataflow paradigm and we show how the analyzed tools fit in each
level.
Comment: 19 pages, 6 figures, 2 tables, in Proc. of the 9th Intl. Symposium on
High-Level Parallel Programming and Applications (HLPP), July 4-5, 2016,
Muenster, Germany
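As an informal illustration of the shared abstraction (not the paper's formal layered model), the Python sketch below expresses a word count as a framework-neutral dataflow graph of operators; Spark, Flink, and Storm would each map such a graph onto their own execution model. The `Op` and `Dataflow` names are assumptions made for exposition.

```python
# Illustrative sketch only: a framework-neutral dataflow pipeline, in the
# spirit of a shared Dataflow abstraction; Op/Dataflow are hypothetical names.
from collections import Counter
from typing import Callable, Iterable, List

class Op:
    """A dataflow operator: consumes an iterable, yields an iterable."""
    def __init__(self, fn: Callable[[Iterable], Iterable]):
        self.fn = fn

class Dataflow:
    """A linear pipeline of operators applied to a source collection."""
    def __init__(self, ops: List[Op]):
        self.ops = ops

    def run(self, source: Iterable):
        data = source
        for op in self.ops:
            data = op.fn(data)
        return data

# Word count expressed once at the "semantic dataflow" level.
flat_map = Op(lambda lines: (w for line in lines for w in line.split()))
count_by_key = Op(lambda words: Counter(words).items())

wordcount = Dataflow([flat_map, count_by_key])
print(dict(wordcount.run(["to be or not to be"])))
```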
Blazes: Coordination Analysis for Distributed Programs
Distributed consistency is perhaps the most discussed topic in distributed
systems today. Coordination protocols can ensure consistency, but in practice
they incur undesirable performance costs unless used judiciously. Scalable
distributed architectures avoid coordination whenever possible, but
under-coordinated systems can exhibit behavioral anomalies under fault, which
are often extremely difficult to debug. This raises significant challenges for
distributed system architects and developers. In this paper we present Blazes,
a cross-platform program analysis framework that (a) identifies program
locations that require coordination to ensure consistent executions, and (b)
automatically synthesizes application-specific coordination code that can
significantly outperform general-purpose techniques. We present two case
studies, one using annotated programs in the Twitter Storm system, and another
using the Bloom declarative language.
Comment: Updated to include additional materials from the original technical
report: derivation rules, output stream label
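The sketch below gives a rough, hedged flavor of the kind of analysis described above: dataflow components are annotated with simple semantic properties, and edges whose upstream component could produce order-dependent or retractable output are flagged as needing coordination. The labels and the rule used here are simplified assumptions, not the Blazes annotation language.

```python
# Illustrative sketch only: flagging dataflow edges that may need coordination,
# loosely in the spirit of Blazes-style analysis; the annotations and rule
# below are simplified assumptions.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class Component:
    name: str
    confluent: bool   # output independent of input arrival order?
    monotonic: bool   # never retracts previously emitted output?

def needs_coordination(edge: Tuple[Component, Component]) -> bool:
    """Simplified rule: coordinate when the upstream component is
    non-confluent or non-monotonic, since reorderings or replays could
    otherwise yield divergent downstream results."""
    upstream, _ = edge
    return not (upstream.confluent and upstream.monotonic)

splitter = Component("splitter", confluent=True, monotonic=True)
counter = Component("running-count", confluent=False, monotonic=True)
sink = Component("report-sink", confluent=True, monotonic=True)

topology: List[Tuple[Component, Component]] = [(splitter, counter),
                                               (counter, sink)]
for up, down in topology:
    if needs_coordination((up, down)):
        print(f"coordination needed before {down.name} (upstream {up.name})")
```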
AIR: A Light-Weight Yet High-Performance Dataflow Engine based on Asynchronous Iterative Routing
Distributed Stream Processing Systems (DSPSs) are among the most rapidly
emerging topics in data management, with applications ranging from real-time
event monitoring to processing complex dataflow programs and big data
analytics. The major market players in this domain are clearly represented by
Apache Spark and Flink, which provide a variety of frontend APIs for SQL,
statistical inference, machine learning, stream processing, and many others.
Yet rather few details are reported on the integration of these engines into
the underlying High-Performance Computing (HPC) infrastructure and the
communication protocols they use. Spark and Flink, for example, are implemented
in Java and still rely on a dedicated master node for managing their control
flow among the worker nodes in a compute cluster.
In this paper, we describe the architecture of our AIR engine, which is
designed from scratch in C++ using the Message Passing Interface (MPI) and
pthreads for multithreading, and which is deployed directly on top of a common
HPC workload manager such as SLURM. AIR implements a light-weight, dynamic sharding
protocol (referred to as "Asynchronous Iterative Routing"), which facilitates a
direct and asynchronous communication among all client nodes and thereby
completely avoids the overhead induced by the control flow with a master node
that may otherwise form a performance bottleneck. Our experiments over a
variety of benchmark settings confirm that AIR outperforms Spark and Flink in
terms of latency and throughput by a factor of up to 15; moreover, we
demonstrate that AIR scales out much better than existing DSPSs to clusters
consisting of up to 8 nodes and 224 cores.
Comment: 16 pages, 6 figures, 15 plots
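To illustrate the idea of master-free, direct routing among workers (not the actual AIR protocol), the mpi4py sketch below hashes each record's key to pick a target rank and ships the record with a non-blocking point-to-point send, so no central coordinator is involved; the key space, record layout, and tag are assumptions for exposition.

```python
# Illustrative sketch only: direct, asynchronous key-based routing among MPI
# ranks without a master node, loosely in the spirit of AIR; the record layout,
# key space and tag are hypothetical. Run e.g.: mpirun -np 4 python sketch.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
TAG_DATA = 7          # arbitrary message tag
KEYS = range(3)       # toy key space produced by every rank

def owner(key: int) -> int:
    """Hash-based sharding: every rank computes the target independently."""
    return key % size

# Each rank produces a few records and routes them directly to their owners
# via non-blocking point-to-point sends (no master node in the control flow).
records = [(key, f"payload-from-rank-{rank}") for key in KEYS]
sends = [comm.isend(rec, dest=owner(rec[0]), tag=TAG_DATA) for rec in records]

# Each rank receives exactly the records whose keys it owns, from all ranks.
expected = size * sum(1 for key in KEYS if owner(key) == rank)
received = [comm.recv(source=MPI.ANY_SOURCE, tag=TAG_DATA)
            for _ in range(expected)]

MPI.Request.Waitall(sends)
print(f"rank {rank} owns {len(received)} record(s): {sorted(received)}")
```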