Dynamic Control Flow in Large-Scale Machine Learning
Many recent machine learning models rely on fine-grained dynamic control flow
for training and inference. In particular, models based on recurrent neural
networks and on reinforcement learning depend on recurrence relations,
data-dependent conditional execution, and other features that call for dynamic
control flow. These applications benefit from the ability to make rapid
control-flow decisions across a set of computing devices in a distributed
system. For performance, scalability, and expressiveness, a machine learning
system must support dynamic control flow in distributed and heterogeneous
environments.
This paper presents a programming model for distributed machine learning that
supports dynamic control flow. We describe the design of the programming model,
and its implementation in TensorFlow, a distributed machine learning system.
Our approach extends the use of dataflow graphs to represent machine learning
models, offering several distinctive features. First, the branches of
conditionals and bodies of loops can be partitioned across many machines to run
on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs.
Second, programs written in our model support automatic differentiation and
distributed gradient computations, which are necessary for training machine
learning models that use control flow. Third, our choice of non-strict
semantics enables multiple loop iterations to execute in parallel across
machines, and to overlap compute and I/O operations.
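
As a rough illustration of the second feature (differentiating through control flow), the sketch below uses plain Python rather than TensorFlow. It runs a recurrence whose trip count depends on the data, records which iterations actually executed, and replays them in reverse to obtain a gradient. The names `forward`, `grad_w`, and `threshold`, and the choice of a scalar tanh recurrence, are assumptions made for this example; it is not the paper's implementation.

```python
import math

def forward(w, xs, threshold=0.9):
    """Run h <- tanh(w * h + x) over xs, exiting early once |h| > threshold.

    Returns the final h and a trace of (h_prev, x) for the executed steps.
    """
    h = 0.0
    trace = []
    for x in xs:
        if abs(h) > threshold:  # data-dependent conditional: leave the loop
            break
        trace.append((h, x))
        h = math.tanh(w * h + x)
    return h, trace

def grad_w(w, xs, threshold=0.9):
    """Return (h_final, d h_final / d w) by walking the trace backwards."""
    h_final, trace = forward(w, xs, threshold)
    dh = 1.0  # sensitivity of h_final to the current step's output
    dw = 0.0
    for h_prev, x in reversed(trace):
        h_next = math.tanh(w * h_prev + x)
        local = 1.0 - h_next ** 2   # derivative of tanh at this step
        dw += dh * local * h_prev   # direct dependence of this step on w
        dh = dh * local * w         # propagate to the previous step's output
    return h_final, dw
```

Only the iterations that ran contribute to the gradient, which is why the backward pass needs the recorded trace rather than the full input sequence.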
We have done our work in the context of TensorFlow, and it has been used
extensively in research and production. We evaluate it using several real-world
applications, and demonstrate its performance and scalability.
Comment: Appeared in EuroSys 2018. 14 pages, 16 figure
LEGaTO: first steps towards energy-efficient toolset for heterogeneous computing
LEGaTO is a three-year EU H2020 project which started in December 2017. The LEGaTO project will leverage task-based programming models to provide a software ecosystem for Made-in-Europe heterogeneous hardware composed of CPUs, GPUs, FPGAs and dataflow engines. The aim is to attain one order of magnitude energy savings from the edge to the converged cloud/HPC.
Peer Reviewed. Postprint (author's final draft
BrainFrame: A node-level heterogeneous accelerator platform for neuron simulations
Objective: The advent of High-Performance Computing (HPC) in recent years has
led to its increasing use in brain study through computational models. The
scale and complexity of such models are constantly increasing, leading to
challenging computational requirements. Even though modern HPC platforms can
often deal with such challenges, the vast diversity of the modeling field does
not permit a single (or homogeneous) acceleration platform to effectively
address the complete array of modeling requirements. Approach: In this paper we
propose and build BrainFrame, a heterogeneous acceleration platform,
incorporating three distinct acceleration technologies, a Dataflow Engine, a
Xeon Phi and a GP-GPU. The PyNN framework is also integrated into the platform.
As a challenging proof of concept, we analyze the performance of BrainFrame on
different instances of a state-of-the-art neuron model, modeling the Inferior-
Olivary Nucleus using a biophysically-meaningful, extended Hodgkin-Huxley
representation. The model instances take into account not only the neuronal-
network dimensions but also different network-connectivity circumstances that
can drastically change application workload characteristics. Main results: The
combined use of three HPC technologies demonstrated that BrainFrame is
better able to cope with the modeling diversity encountered. Our performance
analysis shows clearly that the model directly affects performance and that all
three technologies are required to cope with all the model use cases.
Comment: 16 pages, 18 figures, 5 table
Blazes: Coordination Analysis for Distributed Programs
Distributed consistency is perhaps the most discussed topic in distributed
systems today. Coordination protocols can ensure consistency, but in practice
they cause undesirable performance unless used judiciously. Scalable
distributed architectures avoid coordination whenever possible, but
under-coordinated systems can exhibit behavioral anomalies under fault, which
are often extremely difficult to debug. This raises significant challenges for
distributed system architects and developers. In this paper we present Blazes,
a cross-platform program analysis framework that (a) identifies program
locations that require coordination to ensure consistent executions, and (b)
automatically synthesizes application-specific coordination code that can
significantly outperform general-purpose techniques. We present two case
studies, one using annotated programs in the Twitter Storm system, and another
using the Bloom declarative language.
Comment: Updated to include additional materials from the original technical
report: derivation rules, output stream label
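
The abstract does not describe how the analysis works. As a loose illustration of why some program locations need coordination while others do not, the sketch below assumes that the relevant property is sensitivity to message delivery order; that assumption, and every name in the code (`replica_set_union`, `replica_last_writer`, `outcomes`), is hypothetical and not taken from Blazes.

```python
from itertools import permutations

def replica_set_union(messages):
    """Accumulate messages into a set; the result ignores arrival order."""
    state = set()
    for m in messages:
        state.add(m)
    return frozenset(state)

def replica_last_writer(messages):
    """Keep only the most recent message; the result depends on arrival order."""
    state = None
    for m in messages:
        state = m
    return state

def outcomes(component, messages):
    """Collect the distinct final states over every delivery order."""
    return {component(order) for order in permutations(messages)}

msgs = ["a", "b", "c"]
union_outcomes = outcomes(replica_set_union, msgs)    # one possible final state
writer_outcomes = outcomes(replica_last_writer, msgs)  # one per final message
```

Under this assumption, a component with a single outcome across all orderings can run without coordination, while one with several outcomes is a candidate location for it.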