A Reference Interpreter for the Graph Programming Language GP 2
GP 2 is an experimental programming language for computing by graph
transformation. An initial interpreter for GP 2, written in the functional
language Haskell, provides a concise and simply structured reference
implementation. Despite its simplicity, the performance of the interpreter is
sufficient for the comparative investigation of a range of test programs. It
also provides a platform for the development of more sophisticated
implementations.
Comment: In Proceedings GaM 2015, arXiv:1504.0244
A Dataflow Language for Decentralised Orchestration of Web Service Workflows
Orchestrating centralised service-oriented workflows presents significant
scalability challenges that include: the consumption of network bandwidth,
degradation of performance, and single points of failure. This paper presents a
high-level dataflow specification language that attempts to address these
scalability challenges. This language provides simple abstractions for
orchestrating large-scale web service workflows, and separates the
workflow logic from its execution. It is based on a data-driven model that
permits parallelism to improve the workflow performance. We provide a
decentralised architecture that allows the computation logic to be moved
"closer" to services involved in the workflow. This is achieved through
partitioning the workflow specification into smaller fragments that may be sent
to remote orchestration services for execution. The orchestration services rely
on proxies that exploit connectivity to services in the workflow. These proxies
perform service invocations and compositions on behalf of the orchestration
services, and carry out data collection, retrieval, and mediation tasks. The
evaluation of our architecture implementation concludes that our decentralised
approach reduces the execution time of workflows, and scales accordingly with
the increasing size of data sets.
Comment: To appear in Proceedings of the IEEE 2013 7th International Workshop
on Scientific Workflows, in conjunction with IEEE SERVICES 201
A survey of parallel execution strategies for transitive closure and logic programs
An important feature of database technology of the nineties is the use of parallelism for speeding up the execution of complex queries. This technology is being tested in several experimental database architectures and a few commercial systems for conventional select-project-join queries. In particular, hash-based fragmentation is used to distribute data to disks under the control of different processors in order to perform selections and joins in parallel. With the development of new query languages, and in particular with the definition of transitive closure queries and of more general logic programming queries, the new dimension of recursion has been added to query processing. Recursive queries are complex; at the same time, their regular structure is particularly suited for parallel execution, and parallelism may give a high efficiency gain. We survey the approaches to parallel execution of recursive queries that have been presented in the recent literature. We observe that research on parallel execution of recursive queries is separated into two distinct subareas, one focused on the transitive closure of Relational Algebra expressions, the other one focused on optimization of more general Datalog queries. Though the subareas seem radically different because of the approach and formalism used, they have many common features. This is not surprising, because most typical Datalog queries can be solved by means of the transitive closure of simple algebraic expressions. We first analyze the relationship between the transitive closure of expressions in Relational Algebra and Datalog programs. We then review sequential methods for evaluating transitive closure, distinguishing iterative and direct methods. We address the parallelization of these methods, by discussing various forms of parallelization. Data fragmentation plays an important role in obtaining parallel execution; we describe hash-based and semantic fragmentation. 
Finally, we consider Datalog queries, and present general methods for parallel rule execution; we recognize the similarities between these methods and the methods reviewed previously, when the former are applied to linear Datalog queries. We also provide a quantitative analysis that shows the impact of the initial data distribution on the performance of methods
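As a rough illustration of two ideas named in this abstract, iterative evaluation of transitive closure and hash-based fragmentation, the following is a minimal sketch and not a method taken from the survey. It assumes a binary edge relation held in memory, uses semi-naive iteration (only newly derived pairs are joined each round), and fragments the seed tuples by hashing the source node while replicating the base relation to every fragment. All function names are hypothetical.

```python
from collections import defaultdict


def closure_from(seed, edges):
    """Semi-naive iteration: extend only newly derived pairs with the base relation."""
    successors = defaultdict(set)
    for src, dst in edges:
        successors[src].add(dst)
    closure = set(seed)
    delta = set(seed)
    while delta:
        # Join the previous round's new pairs (a, b) with base edges (b, c).
        derived = {(a, c) for (a, b) in delta for c in successors.get(b, ())}
        delta = derived - closure
        closure |= delta
    return closure


def transitive_closure(edges):
    """Sequential baseline: seed the iteration with the whole relation."""
    return closure_from(edges, edges)


def hash_fragment(edges, n_fragments):
    """Hash-based fragmentation: place each edge by hashing its source node."""
    fragments = [set() for _ in range(n_fragments)]
    for src, dst in edges:
        fragments[hash(src) % n_fragments].add((src, dst))
    return fragments


def fragmented_closure(edges, n_fragments):
    """Each fragment could run on its own processor; here they run in sequence.

    Assumption: every worker holds a full copy of the base relation, so the
    fragments need no communication until the final union.
    """
    result = set()
    for fragment in hash_fragment(edges, n_fragments):
        result |= closure_from(fragment, edges)
    return result
```

Because every path starting at a given source is derived inside the fragment that owns that source, the union of the per-fragment results equals the sequential closure. The price is replication of the base relation, which is one of the trade-offs that a choice of initial data distribution has to weigh.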
Large-scale social-media analytics on stratosphere
The importance of social-media platforms and online communities - in business as well as public context - is more and more acknowledged and appreciated by industry and researchers alike. Consequently, a wide range of analytics has been proposed to understand, steer, and exploit the mechanics and laws driving their functionality and creating the resulting benefits. However, analysts usually face significant problems in scaling existing and novel approaches to match the data volume and size of modern online communities. In this work, we propose and demonstrate the usage of the massively parallel data processing system Stratosphere, based on second order functions as an extended notion of the MapReduce paradigm, to provide a new level of scalability to such social-media analytics. Based on the popular example of role analysis, we present and illustrate how this massively parallel approach can be leveraged to scale out complex data-mining tasks, while providing a programming approach that eases the formulation of complete analytical workflows
Lightweight Asynchronous Snapshots for Distributed Dataflows
Distributed stateful stream processing enables the deployment and execution
of large scale continuous computations in the cloud, targeting both low latency
and high throughput. One of the most fundamental challenges of this paradigm is
providing processing guarantees under potential failures. Existing approaches
rely on periodic global state snapshots that can be used for failure recovery.
Those approaches suffer from two main drawbacks. First, they often stall the
overall computation which impacts ingestion. Second, they eagerly persist all
records in transit along with the operation states which results in larger
snapshots than required. In this work we propose Asynchronous Barrier
Snapshotting (ABS), a lightweight algorithm suited for modern dataflow
execution engines that minimises space requirements. ABS persists only operator
states on acyclic execution topologies while keeping a minimal record log on
cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics
engine that supports stateful stream processing. Our evaluation shows that our
algorithm does not have a heavy impact on the execution, maintaining linear
scalability and performing well with frequent snapshots.
Comment: 8 pages, 7 figure
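The barrier alignment step that the abstract describes for acyclic topologies can be sketched as a small single-threaded simulation. This is an illustrative toy, not Apache Flink code or the paper's implementation. It assumes each operator keeps a running sum as its state, that channels are FIFO queues, and that sources receive their barrier through an ordinary input channel. The `Operator` class and the `run` helper are hypothetical names.

```python
from collections import deque

BARRIER = object()  # sentinel marking a snapshot boundary in a stream


class Operator:
    def __init__(self, name, input_channels):
        self.name = name
        self.state = 0          # toy operator state: sum of records seen
        self.snapshot = None    # state captured at the last barrier
        self.inputs = {ch: deque() for ch in input_channels}
        self.blocked = set()    # inputs whose barrier has already arrived
        self.outputs = []       # (downstream operator, channel name) pairs

    def connect(self, downstream, channel):
        self.outputs.append((downstream, channel))

    def emit(self, item):
        for op, ch in self.outputs:
            op.inputs[ch].append(item)

    def step(self):
        """Consume one item from an unblocked input; return True on progress."""
        for ch, queue in self.inputs.items():
            if ch in self.blocked or not queue:
                continue
            item = queue.popleft()
            if item is BARRIER:
                self.blocked.add(ch)  # hold this input until all barriers arrive
                if len(self.blocked) == len(self.inputs):
                    self.snapshot = self.state  # persist operator state only
                    self.emit(BARRIER)          # forward the barrier downstream
                    self.blocked.clear()        # resume all inputs
            else:
                self.state += item
                self.emit(item)
            return True
        return False


def run(operators):
    """Round-robin scheduler: stop when no operator can make progress."""
    while any([op.step() for op in operators]):
        pass


# Two sources feeding one downstream operator (an acyclic topology).
a, b, c = Operator("a", ["in"]), Operator("b", ["in"]), Operator("c", ["a", "b"])
a.connect(c, "a")
b.connect(c, "b")
a.inputs["in"].extend([1, 2, BARRIER, 10])
b.inputs["in"].extend([5, BARRIER, 20])
run([a, b, c])
```

In this run, operator `c` blocks channel `a` once its barrier arrives, so the post-barrier record `10` waits in the queue until the barrier on channel `b` has also been seen. The snapshot of `c` therefore reflects only pre-barrier records, and no in-transit records need to be stored.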
A Logic of Reachable Patterns in Linked Data-Structures
We define a new decidable logic for expressing and checking invariants of
programs that manipulate dynamically-allocated objects via pointers and
destructive pointer updates. The main feature of this logic is the ability to
limit the neighborhood of a node that is reachable via a regular expression
from a designated node. The logic is closed under boolean operations
(entailment, negation) and has a finite model property. The key technical
result is the proof of decidability. We show how to express preconditions,
postconditions, and loop invariants for some interesting programs. It is also
possible to express properties such as disjointness of data-structures, and
low-level heap mutations. Moreover, our logic can express properties of
arbitrary data-structures and of an arbitrary number of pointer fields. The
latter provides a way to naturally specify postconditions that relate the
fields on entry to a procedure to the fields on exit. Therefore, it is possible
to use the logic to automatically prove partial correctness of programs
performing low-level heap mutations
Compiling array computations for the Fresh Breeze Parallel Processor
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007.
Includes bibliographical references (p. 80).
Fresh Breeze is a highly parallel architecture currently under development, which strives to provide high performance scientific computing with simple programmability. The architecture provides for multithreaded determinate execution with a write-once shared memory system. In particular, Fresh Breeze data structures must be constructed from directed acyclic graphs of immutable fixed-size chunks of memory, rather than laid out in a mutable linear memory. While this model is well suited for executing functional programs, the goal of this thesis is to see if conventional programs can be efficiently compiled for this novel memory system and parallelization model, focusing specifically on array-based linear algebra computations. We compile a subset of Java, targeting the Fresh Breeze instruction set. The compiler, using a static data-flow graph intermediate representation, performs analysis and transformations which reduce communication with the shared memory and identify opportunities for parallelization.
by Igor Arkadiy Ginzburg. M.Eng