1,370 research outputs found
PlinyCompute: A Platform for High-Performance, Distributed, Data-Intensive Tool Development
This paper describes PlinyCompute, a system for development of
high-performance, data-intensive, distributed computing tools and libraries. In
the large, PlinyCompute presents the programmer with a very high-level,
declarative interface, relying on automatic, relational-database style
optimization to figure out how to stage distributed computations. However, in
the small, PlinyCompute presents the capable systems programmer with a
persistent object data model and API (the "PC object model") and associated
memory management system that has been designed from the ground-up for high
performance, distributed, data-intensive computing. This contrasts with most
other Big Data systems, which are constructed on top of the Java Virtual
Machine (JVM), and hence must at least partially cede performance-critical
concerns such as memory management (including layout and de/allocation) and
virtual method/function dispatch to the JVM. This hybrid approach---declarative
in the large, trusting the programmer's ability to utilize PC object model
efficiently in the small---results in a system that is ideal for the
development of reusable, data-intensive tools and libraries. Through extensive
benchmarking, we show that implementing complex objects manipulation and
non-trivial, library-style computations on top of PlinyCompute can result in a
speedup of 2x to more than 50x or more compared to equivalent implementations
on Spark.Comment: 48 pages, including references and Appendi
Approaches to Interpreter Composition
In this paper, we compose six different Python and Prolog VMs into 4 pairwise
compositions: one using C interpreters; one running on the JVM; one using
meta-tracing interpreters; and one using a C interpreter and a meta-tracing
interpreter. We show that programs that cross the language barrier frequently
execute faster in a meta-tracing composition, and that meta-tracing imposes a
significantly lower overhead on composed programs relative to mono-language
programs.Comment: 33 pages, 1 figure, 9 table
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming allows to easily implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines
Code Generation for Big Data Processing in the Web using WebAssembly
Traditional clusters for cloud computing are quite hard to configure and setup, and the number of cluster nodes is limited by the available hardware in the cluster. We hence envision the concept of a Browser Cloud: One just has to visit with his/her web browser a certain webpage in order to connect his/her computer to the Browser Cloud. In this way the setup of the Browser Cloud is much easier than those of traditional clouds. Furthermore, the Browser Cloud has a much larger number of potential nodes, as any computer running a browser may connect to and be integrated in the Browser Cloud. New challenges arise when setting up a cloud by web browsers: Data is processed within the browser, which requires to use the technologies offered by the browser for this purpose. The typically used JavaScript runtime environment may be too slow, because JavaScript is an interpreted language. Hence we investigate the possibilities for computing the work-intensive part of the query processing inside a virtual machine of the web browser. The technology WebAssemby for virtual machines is recently supported by all major browsers and promises high speedups in comparison with JavaScript. Recent approaches to efficient Big Data processing generate code for the data processing steps of queries. To run the generated code in a WebAssembly virtual machine, an online compiler is needed to generate the WebAssembly bytecode from the generated code. Hence our main contribution is an online compiler to WebAssembly bytecode especially developed to run in the web browser and for Big Data processing based on code generation of the processing steps. In our experiments, the runtimes of Big Data processing using JavaScript is compared with running WebAssembly technologies in the major web browsers
Weaving Rules into [email protected] for Embedded Smart Systems
Smart systems are characterised by their ability to analyse measured data in
live and to react to changes according to expert rules. Therefore, such systems
exploit appropriate data models together with actions, triggered by
domain-related conditions. The challenge at hand is that smart systems usually
need to process thousands of updates to detect which rules need to be
triggered, often even on restricted hardware like a Raspberry Pi. Despite
various approaches have been investigated to efficiently check conditions on
data models, they either assume to fit into main memory or rely on high latency
persistence storage systems that severely damage the reactivity of smart
systems. To tackle this challenge, we propose a novel composition process,
which weaves executable rules into a data model with lazy loading abilities. We
quantitatively show, on a smart building case study, that our approach can
handle, at low latency, big sets of rules on top of large-scale data models on
restricted hardware.Comment: pre-print version, published in the proceedings of MOMO-17 Worksho
Towards optimisation of model queries : A parallel execution approach
The growing size of software models poses significant scalability challenges. Amongst these challenges is the execution time of queries and transformations. In many cases, model management programs are (or can be) expressed as chains and combinations of core fundamental operations. Most of these operations are pure functions, making them amenable to parallelisation, lazy evaluation and short-circuiting. In this paper we show how all three of these optimisations can be combined in the context of Epsilon: an OCL-inspired family of model management languages. We compare our solutions with both interpreted and compiled OCL as well as hand-written Java code. Our experiments show a significant improvement in the performance of queries, especially on large models
Tools of the Trade: A Survey of Various Agent Based Modeling Platforms
Agent Based Modeling (ABM) toolkits are as diverse as the community of people who use them. With so many toolkits available, the choice of which one is best suited for a project is left to word of mouth, past experiences in using particular toolkits and toolkit publicity. This is especially troublesome for projects that require specialization. Rather than using toolkits that are the most publicized but are designed for general projects, using this paper, one will be able to choose a toolkit that already exists and that may be built especially for one's particular domain and specialized needs. In this paper, we examine the entire continuum of agent based toolkits. We characterize each based on 5 important characteristics users consider when choosing a toolkit, and then we categorize the characteristics into user-friendly taxonomies that aid in rapid indexing and easy reference.Agent Based Modeling, Individual Based Model, Multi Agent Systems
- …