10,621 research outputs found
A Dataflow Language for Decentralised Orchestration of Web Service Workflows
Orchestrating centralised service-oriented workflows presents significant
scalability challenges that include: the consumption of network bandwidth,
degradation of performance, and single points of failure. This paper presents a
high-level dataflow specification language that attempts to address these
scalability challenges. This language provides simple abstractions for
orchestrating large-scale web service workflows, and separates between the
workflow logic and its execution. It is based on a data-driven model that
permits parallelism to improve the workflow performance. We provide a
decentralised architecture that allows the computation logic to be moved
"closer" to services involved in the workflow. This is achieved through
partitioning the workflow specification into smaller fragments that may be sent
to remote orchestration services for execution. The orchestration services rely
on proxies that exploit connectivity to services in the workflow. These proxies
perform service invocations and compositions on behalf of the orchestration
services, and carry out data collection, retrieval, and mediation tasks. The
evaluation of our architecture implementation concludes that our decentralised
approach reduces the execution time of workflows, and scales accordingly with
the increasing size of data sets.Comment: To appear in Proceedings of the IEEE 2013 7th International Workshop
on Scientific Workflows, in conjunction with IEEE SERVICES 201
Dynamic Control Flow in Large-Scale Machine Learning
Many recent machine learning models rely on fine-grained dynamic control flow
for training and inference. In particular, models based on recurrent neural
networks and on reinforcement learning depend on recurrence relations,
data-dependent conditional execution, and other features that call for dynamic
control flow. These applications benefit from the ability to make rapid
control-flow decisions across a set of computing devices in a distributed
system. For performance, scalability, and expressiveness, a machine learning
system must support dynamic control flow in distributed and heterogeneous
environments.
This paper presents a programming model for distributed machine learning that
supports dynamic control flow. We describe the design of the programming model,
and its implementation in TensorFlow, a distributed machine learning system.
Our approach extends the use of dataflow graphs to represent machine learning
models, offering several distinctive features. First, the branches of
conditionals and bodies of loops can be partitioned across many machines to run
on a set of heterogeneous devices, including CPUs, GPUs, and custom ASICs.
Second, programs written in our model support automatic differentiation and
distributed gradient computations, which are necessary for training machine
learning models that use control flow. Third, our choice of non-strict
semantics enables multiple loop iterations to execute in parallel across
machines, and to overlap compute and I/O operations.
We have done our work in the context of TensorFlow, and it has been used
extensively in research and production. We evaluate it using several real-world
applications, and demonstrate its performance and scalability.Comment: Appeared in EuroSys 2018. 14 pages, 16 figure
- …