44,119 research outputs found
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach---declarative
in the large, trusting the programmer's ability to utilize PC object model
efficiently in the small---results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex objects manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in a
speedup of 2x to more than 50x or more compared to equivalent implementations
on Spark.Comment: 48 pages, including references and Appendi
Functionally Specified Distributed Transactions in Co-operative Scenarios
Addresses the problem of specifying co-operative, distributed transactions in a manner that can be subject to verification and testing. Our approach combines the process-algebraic language LOTOS and the object-oriented database modelling language TM to obtain a clear and formal protocol for distributed database transactions meant to describe co-operation scenarios. We argue that a separation of concerns, namely the interaction of database applications on the one hand and data modelling on the other, results in a practical, modular approach that is formally well-founded. An advantage of this is that we may vary over transaction models to support the language combinatio
An introduction to Graph Data Management
A graph database is a database where the data structures for the schema
and/or instances are modeled as a (labeled)(directed) graph or generalizations
of it, and where querying is expressed by graph-oriented operations and type
constructors. In this article we present the basic notions of graph databases,
give an historical overview of its main development, and study the main current
systems that implement them
Concurrent constraint programming with process mobility
We propose an extension of concurrent constraint programming with primitives for process migration within a hierarchical network, and we study its semantics. To this purpose, we first investigate a "pure " paradigm for process migration, namely a paradigm where the only actions are those dealing with transmissions of processes. Our goal is to give a structural definition of the semantics of migration; namely, we want to describe the behaviour of the system, during the transmission of a process, in terms of the behaviour of the components. We achieve this goal by using a labeled transition system where the effects of sending a process, and requesting a process, are modeled by symmetric rules (similar to handshaking-rules for synchronous communication) between the two partner nodes in the network. Next, we extend our paradigm with the primitives of concurrent constraint programming, and we show how to enrich the semantics to cope with the notions of environment and constraint store. Finally, we show how the operational semantics can be used to define an interpreter for the basic calculus.
PLACES'10: The 3rd Workshop on Programmng Language Approaches to concurrency and Communication-Centric Software
Paphos, Cyprus. March 201
Introducing Molly: Distributed Memory Parallelization with LLVM
Programming for distributed memory machines has always been a tedious task,
but necessary because compilers have not been sufficiently able to optimize for
such machines themselves. Molly is an extension to the LLVM compiler toolchain
that is able to distribute and reorganize workload and data if the program is
organized in statically determined loop control-flows. These are represented as
polyhedral integer-point sets that allow program transformations applied on
them. Memory distribution and layout can be declared by the programmer as
needed and the necessary asynchronous MPI communication is generated
automatically. The primary motivation is to run Lattice QCD simulations on IBM
Blue Gene/Q supercomputers, but since the implementation is not yet completed,
this paper shows the capabilities on Conway's Game of Life
- …