Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
MapReduce is a popular programming paradigm for developing large-scale,
data-intensive computations. Many frameworks that implement this paradigm have
recently been developed. To leverage these frameworks, however, developers must
become familiar with their APIs and rewrite existing code. Casper is a new tool
that automatically translates sequential Java programs into the MapReduce
paradigm. Casper identifies potential code fragments to rewrite and translates
them in two steps: (1) Casper uses program synthesis to search for a program
summary (i.e., a functional specification) of each code fragment. The summary
is expressed using a high-level intermediate language resembling the MapReduce
paradigm and verified to be semantically equivalent to the original using a
theorem prover. (2) Casper generates executable code from the summary, using
the Hadoop, Spark, or Flink API. We evaluated Casper by automatically
converting real-world, sequential Java benchmarks to MapReduce. The resulting
benchmarks perform up to 48.2x faster than the originals.
Comment: 12 pages, plus an additional 4 pages of references and appendix
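As a rough illustration of the translation the abstract describes (this is our own sketch, not Casper's actual output), the fragment below shows a sequential accumulation loop and a semantically equivalent map/reduce formulation. Java Streams stand in for a real Hadoop, Spark, or Flink job so the example stays self-contained; the method names and data are hypothetical.

```java
import java.util.Arrays;

public class MapReduceSketch {
    // Original sequential code fragment: sum of squares.
    static int sumOfSquaresSequential(int[] data) {
        int sum = 0;
        for (int x : data) {
            sum += x * x;
        }
        return sum;
    }

    // The same computation expressed as a map (x -> x*x) followed by a
    // reduce (+), mirroring the kind of program summary Casper synthesizes.
    static int sumOfSquaresMapReduce(int[] data) {
        return Arrays.stream(data)
                     .map(x -> x * x)          // map phase
                     .reduce(0, Integer::sum); // reduce phase
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        System.out.println(sumOfSquaresSequential(data)); // 30
        System.out.println(sumOfSquaresMapReduce(data));  // 30
    }
}
```

The key property Casper must establish is that both forms are semantically equivalent; here the reduce operator (+) is associative and commutative, which is what makes the parallel formulation valid.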
Flattening an object algebra to provide performance
Algebraic transformation and optimization techniques have been the method of choice in relational query execution, but applying them in object-oriented (OO) DBMSs is difficult due to the complexity of OO query languages. This paper demonstrates that the problem can be simplified by mapping an OO data model to the binary relational model implemented by Monet, a state-of-the-art database kernel. We present a generic mapping scheme to flatten data models and study the case of a straightforward OO model. We show how flattening enabled us to implement a query algebra using only a very limited set of simple operations. The required primitives and query execution strategies are discussed, and their performance is evaluated on the 1-GByte TPC-D (Transaction-processing Performance Council's Benchmark D), showing that our divide-and-conquer approach yields excellent results.
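To make the flattening idea concrete, here is a toy sketch (our own illustration, not the paper's actual mapping scheme) in the spirit of Monet's binary-table model: an object type with fields name and age is decomposed into two binary relations keyed by object id, and a query becomes a selection on one binary table followed by a join on the oid. All names and data are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class FlatteningSketch {
    // select(age > threshold) on the age table, then join on oid
    // with the name table -- each step a simple binary-table operation.
    static List<String> namesOlderThan(Map<Integer, String> name,
                                       Map<Integer, Integer> age,
                                       int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<Integer, Integer> e : age.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(name.get(e.getKey()));
            }
        }
        Collections.sort(result); // deterministic output order
        return result;
    }

    // Build a small flattened instance and run the query.
    static List<String> demo() {
        Map<Integer, String> name = new HashMap<>(); // oid -> name
        Map<Integer, Integer> age = new HashMap<>(); // oid -> age
        name.put(1, "ann"); age.put(1, 28);
        name.put(2, "bob"); age.put(2, 41);
        name.put(3, "eve"); age.put(3, 35);
        return namesOlderThan(name, age, 30);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [bob, eve]
    }
}
```

The point of the decomposition is that the query engine only ever needs a small set of primitives over two-column tables (select, join, sort), rather than operators that understand the full OO model.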
Cobra: A Framework for Cost Based Rewriting of Database Applications
Database applications are typically written using a mixture of imperative
languages and declarative frameworks for data processing. Application logic
gets distributed across the declarative and imperative parts of a program.
Often there is more than one way to implement the same program, and the
efficiency of each alternative may depend on a number of parameters. In this paper, we propose a
framework that automatically generates all equivalent alternatives of a given
program using a given set of program transformations, and chooses the least
cost alternative. We use the concept of program regions as an algebraic
abstraction of a program, and extend the Volcano/Cascades framework for
optimizing algebraic expressions to optimize programs. We illustrate the
use of our framework for optimizing database applications. We show, through
experimental results, that our framework has wide applicability to real-world
applications and provides significant performance benefits.
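The least-cost selection step can be sketched as follows (our own toy example, not Cobra's actual cost model): two semantically equivalent alternatives for the same program region — issuing one database round trip per row versus a single batched query — are each given a hypothetical cost formula, and the optimizer picks whichever is cheaper for the given input size.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.DoubleUnaryOperator;

public class CostBasedChoice {
    // Returns the name of the least-cost alternative for n input rows.
    // Both alternatives and their cost formulas are hypothetical.
    static String cheapest(double n) {
        Map<String, DoubleUnaryOperator> alternatives = new LinkedHashMap<>();
        alternatives.put("queryPerRow", r -> 5.0 * r);        // one round trip per row
        alternatives.put("batched",     r -> 50.0 + 0.1 * r); // fixed setup + cheap scan

        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, DoubleUnaryOperator> e : alternatives.entrySet()) {
            double cost = e.getValue().applyAsDouble(n);
            if (cost < bestCost) {
                bestCost = cost;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(cheapest(5));    // queryPerRow wins for tiny inputs
        System.out.println(cheapest(1000)); // batched wins for large inputs
    }
}
```

The crossover illustrates why the choice must be cost-based rather than fixed: neither alternative dominates across all parameter values, which is exactly the situation a Volcano/Cascades-style search over equivalent plans is designed for.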