1,589 research outputs found
DALiuGE: A Graph Execution Framework for Harnessing the Astronomical Data Deluge
The Data Activated Liu Graph Engine - DALiuGE - is an execution framework for
processing large astronomical datasets at a scale required by the Square
Kilometre Array Phase 1 (SKA1). It includes an interface for expressing complex
data reduction pipelines consisting of both data sets and algorithmic
components and an implementation run-time to execute such pipelines on
distributed resources. By mapping the logical view of a pipeline to its
physical realisation, DALiuGE separates the concerns of multiple stakeholders,
allowing them to collectively optimise large-scale data processing solutions in
a coherent manner. The execution in DALiuGE is data-activated, where each
individual data item autonomously triggers the processing on itself. Such
decentralisation also makes the execution framework very scalable and flexible,
supporting pipeline sizes ranging from fewer than ten tasks running on a laptop
to tens of millions of concurrent tasks on the second fastest supercomputer in
the world. DALiuGE has been used in production for reducing interferometry data
sets from the Karl E. Jansky Very Large Array and the Mingantu Ultrawide
Spectral Radioheliograph; and is being developed as the execution framework
prototype for the Science Data Processor (SDP) consortium of the Square
Kilometre Array (SKA) telescope. This paper presents a technical overview of
DALiuGE and discusses case studies from the CHILES and MUSER projects that use
DALiuGE to execute production pipelines. In a companion paper, we provide
in-depth analysis of DALiuGE's scalability to very large numbers of tasks on
two supercomputing facilities.
Comment: 31 pages, 12 figures, currently under review by Astronomy and
Computing
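The data-activated execution described in this abstract can be illustrated with a minimal sketch. This is not DALiuGE's API: the `DataItem` and `Task` classes and their methods are hypothetical names chosen for illustration. The point it shows is that writing a data item triggers its consumers directly, without a central scheduler.

```python
class DataItem:
    """Hypothetical data node: once written, it activates its consumers."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.consumers = []

    def write(self, value):
        self.value = value
        # The data item itself, not a central scheduler, triggers downstream work.
        for task in self.consumers:
            task.input_ready()


class Task:
    """Hypothetical algorithmic component that runs when all its inputs are written."""

    def __init__(self, func, inputs, output):
        self.func = func
        self.inputs = inputs
        self.output = output
        self.pending = len(inputs)
        for item in inputs:
            item.consumers.append(self)

    def input_ready(self):
        self.pending -= 1
        if self.pending == 0:
            result = self.func(*[item.value for item in self.inputs])
            self.output.write(result)


# A two-step pipeline: (raw_a + raw_b) -> summed -> summed * 10 -> scaled
raw_a, raw_b = DataItem("raw_a"), DataItem("raw_b")
summed, scaled = DataItem("summed"), DataItem("scaled")
Task(lambda x, y: x + y, [raw_a, raw_b], summed)
Task(lambda s: s * 10, [summed], scaled)

raw_a.write(1)
raw_b.write(2)  # the second write completes the inputs and cascades downstream
```

After both raw items are written, `scaled.value` holds the final result; no code outside the data items ever asks whether a task is ready to run.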
Adaptive Energy-aware Scheduling of Dynamic Event Analytics across Edge and Cloud Resources
The growing deployment of sensors as part of Internet of Things (IoT) is
generating thousands of event streams. Complex Event Processing (CEP) queries
offer a useful paradigm for rapid decision-making over such data sources. While
often centralized in the Cloud, the deployment of capable edge devices in the
field motivates the need for cooperative event analytics that span Edge and
Cloud computing. Here, we identify a novel problem of query placement on edge
and Cloud resources for dynamically arriving and departing analytic dataflows.
We define this as an optimization problem to minimize the total makespan for
all event analytics, while meeting energy and compute constraints of the
resources. We propose 4 adaptive heuristics and 3 rebalancing strategies for
such dynamic dataflows, and validate them using detailed simulations for
100-1000 edge devices and VMs. The results show that our heuristics offer
planning times on the order of seconds, give a valid and high-quality solution
in all cases, and reduce the number of query migrations. Furthermore, applying
the rebalancing strategies within these heuristics reduces the makespan by
around 20-25%.
Comment: 11 pages, 7 figures
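To make the placement problem concrete, here is a minimal sketch of makespan-oriented placement under an energy constraint. It is a generic greedy rule, not one of the paper's four heuristics, and every name and number in it (`Resource`, `place`, the speeds and budgets) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    speed: float            # work units per second (assumed unit)
    energy_budget: float    # remaining energy, arbitrary units (assumed)
    energy_per_unit: float  # energy spent per work unit (assumed)
    busy_until: float = 0.0


def place(queries, resources):
    """Greedy placement: largest query first, onto the feasible resource
    that would finish it earliest. Returns (plan, makespan)."""
    plan = {}
    for qid, work in sorted(queries.items(), key=lambda kv: -kv[1]):
        feasible = [r for r in resources
                    if r.energy_per_unit * work <= r.energy_budget]
        if not feasible:
            raise ValueError(f"no resource can host query {qid}")
        best = min(feasible, key=lambda r: r.busy_until + work / r.speed)
        best.busy_until += work / best.speed
        best.energy_budget -= best.energy_per_unit * work
        plan[qid] = best.name
    makespan = max(r.busy_until for r in resources)
    return plan, makespan


# Hypothetical setup: a slow, battery-limited edge device and a fast cloud VM.
edge = Resource("edge", speed=1.0, energy_budget=5.0, energy_per_unit=1.0)
cloud = Resource("cloud", speed=4.0, energy_budget=float("inf"), energy_per_unit=0.0)
plan, makespan = place({"q1": 8.0, "q2": 4.0, "q3": 2.0}, [edge, cloud])
```

In this toy setup the largest query cannot fit within the edge device's energy budget, so it goes to the cloud, while the smallest query finishes sooner on the idle edge device than behind the cloud's queue. Handling queries that arrive and depart over time, as the abstract describes, would additionally need a rebalancing step that this sketch omits.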
Pando: Personal Volunteer Computing in Browsers
The large penetration and continued growth in ownership of personal
electronic devices represent a freely available and largely untapped source of
computing power. To leverage them, we present Pando, a new volunteer computing
tool based on a declarative concurrent programming model and implemented using
JavaScript, WebRTC, and WebSockets. This tool enables a dynamically varying
number of failure-prone personal devices contributed by volunteers to
parallelize the application of a function on a stream of values, by using the
devices' browsers. We show that Pando can provide throughput improvements
compared to a single personal device, on a variety of compute-bound
applications including animation rendering and image processing. We also show
the flexibility of our approach by deploying Pando on personal devices
connected over a local network, on Grid5000, a France-wide computing grid in a
virtual private network, and seven PlanetLab nodes distributed in a wide area
network over Europe.
Comment: 14 pages, 12 figures, 2 tables
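The core idea of applying one function to a stream of values across unreliable workers can be sketched as follows. Pando itself is implemented in JavaScript over WebRTC and WebSockets; this Python version uses a local thread pool instead, and `parallel_map`, `flaky_square`, and the retry limit are hypothetical names and choices made for illustration. As a simplification, a failed item is retried in place rather than handed to a different device.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def parallel_map(func, values, workers=4, max_attempts=3):
    """Apply func to every value on a pool of workers, retrying failed
    items. Results are returned in input order."""
    def attempt(value):
        last_error = None
        for _ in range(max_attempts):
            try:
                return func(value)
            except RuntimeError as err:  # stands in for a device dropping out
                last_error = err
        raise last_error

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(attempt, values))


# Simulated failure-prone worker: the first attempt on each even input fails.
_seen = set()
_lock = threading.Lock()


def flaky_square(x):
    with _lock:
        first_time = x not in _seen
        _seen.add(x)
    if first_time and x % 2 == 0:
        raise RuntimeError("simulated device failure")
    return x * x


results = parallel_map(flaky_square, range(6))
```

Despite the simulated failures, `results` contains the square of every input in the original order, which is the property a caller of such a tool would rely on.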
Microgrid - The microthreaded many-core architecture
Traditional processors use the von Neumann execution model, while some earlier
processors used the dataflow execution model. Combinations of the von Neumann
and dataflow models have also been tried, and the resulting model is referred
to as the hybrid dataflow execution model. We describe a hybrid dataflow model
known as microthreading. It provides constructs for creating, synchronizing,
and communicating between threads in an intermediate language. The
microthreading model is an abstract programming and machine model for
many-core architectures. A particular instance of this model is named the
microthreaded architecture, or the Microgrid. This architecture implements all
the concurrency constructs of the microthreading model, and their management,
in hardware.
Comment: 30 pages, 16 figures
Extending dataflow programs for guaranteed throughput.
In the context of multi-core processors and the trend toward many-core, dataflow programming can be used as a solution to the parallelization problem. By decoupling computation from communication, this paradigm naturally exposes parallelism in several ways. In this work we propose language extensions for expressing throughput properties over dataflow programs, together with a run-time mechanism for observing the events needed to compute the effective throughput. We show that such mechanisms have limited impact on the application's overall performance. We also review existing run-time adaptation mechanisms that may be used in a dataflow context to satisfy throughput requirements.
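Run-time observation of throughput, as mentioned in this abstract, can be sketched with a sliding-window event counter. This is not the paper's mechanism or language extension: `ThroughputMonitor`, its methods, and the window length are hypothetical, and timestamps are passed in explicitly so the example is deterministic.

```python
from collections import deque


class ThroughputMonitor:
    """Hypothetical observer: counts token-production events that fall
    within the last `window` seconds."""

    def __init__(self, window):
        self.window = window
        self.events = deque()

    def record(self, timestamp):
        """Register one event and drop events older than the window."""
        self.events.append(timestamp)
        while self.events and self.events[0] <= timestamp - self.window:
            self.events.popleft()

    def throughput(self):
        """Effective throughput in events per second over the window."""
        return len(self.events) / self.window

    def meets(self, required):
        """Check a declared throughput requirement against the observation."""
        return self.throughput() >= required


monitor = ThroughputMonitor(window=2.0)
for t in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5]:  # an actor firing every 0.5 s
    monitor.record(t)
```

With one event every half second, the monitor reports two events per second, so a declared requirement of 1.5 events per second is satisfied while one of 3 is not. A run-time adaptation mechanism could act on the result of `meets`.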