9,721 research outputs found
Program Transformations for Asynchronous and Batched Query Submission
The performance of database/Web-service backed applications can be
significantly improved by asynchronous submission of queries/requests well
ahead of the point where the results are needed, so that results are likely to
have been fetched already when they are actually needed. However, manually
writing applications to exploit asynchronous query submission is tedious and
error-prone. In this paper we address the issue of automatically transforming a
program written assuming synchronous query submission, to one that exploits
asynchronous query submission. Our program transformation method is based on
data flow analysis and is framed as a set of transformation rules. Our rules
can handle query executions within loops, unlike some of the earlier work in
this area. We also present a novel approach that, at runtime, can combine
multiple asynchronous requests into batches, thereby achieving the benefits of
batching in addition to that of asynchronous submission. We have built a tool
that implements our transformation techniques on Java programs that use JDBC
calls; our tool can be extended to handle Web service calls. We have carried
out a detailed experimental study on several real-life applications, which
shows the effectiveness of the proposed rewrite techniques, both in terms of
their applicability and the performance gains achieved.Comment: 14 page
PF-OLA: A High-Performance Framework for Parallel On-Line Aggregation
Online aggregation provides estimates to the final result of a computation
during the actual processing. The user can stop the computation as soon as the
estimate is accurate enough, typically early in the execution. This allows for
the interactive data exploration of the largest datasets. In this paper we
introduce the first framework for parallel online aggregation in which the
estimation virtually does not incur any overhead on top of the actual
execution. We define a generic interface to express any estimation model that
abstracts completely the execution details. We design a novel estimator
specifically targeted at parallel online aggregation. When executed by the
framework over a massive TPC-H instance, the estimator provides
accurate confidence bounds early in the execution even when the cardinality of
the final result is seven orders of magnitude smaller than the dataset size and
without incurring overhead.Comment: 36 page
VerdictDB: Universalizing Approximate Query Processing
Despite 25 years of research in academia, approximate query processing (AQP)
has had little industrial adoption. One of the major causes of this slow
adoption is the reluctance of traditional vendors to make radical changes to
their legacy codebases, and the preoccupation of newer vendors (e.g.,
SQL-on-Hadoop products) with implementing standard features. Additionally, the
few AQP engines that are available are each tied to a specific platform and
require users to completely abandon their existing databases---an unrealistic
expectation given the infancy of the AQP technology. Therefore, we argue that a
universal solution is needed: a database-agnostic approximation engine that
will widen the reach of this emerging technology across various platforms.
Our proposal, called VerdictDB, uses a middleware architecture that requires
no changes to the backend database, and thus, can work with all off-the-shelf
engines. Operating at the driver-level, VerdictDB intercepts analytical queries
issued to the database and rewrites them into another query that, if executed
by any standard relational engine, will yield sufficient information for
computing an approximate answer. VerdictDB uses the returned result set to
compute an approximate answer and error estimates, which are then passed on to
the user or application. However, lack of access to the query execution layer
introduces significant challenges in terms of generality, correctness, and
efficiency. This paper shows how VerdictDB overcomes these challenges and
delivers up to 171 speedup (18.45 on average) for a variety of
existing engines, such as Impala, Spark SQL, and Amazon Redshift, while
incurring less than 2.6% relative error. VerdictDB is open-sourced under Apache
License.Comment: Extended technical report of the paper that appeared in Proceedings
of the 2018 International Conference on Management of Data, pp. 1461-1476.
ACM, 201
The JStar language philosophy
This paper introduces the JStar parallel programming language, which is a Java-based declarative language aimed at discouraging sequential programming, en-couraging massively parallel programming, and giving the compiler and runtime maximum freedom to try alternative parallelisation strategies. We describe the execution semantics and runtime support of the language, several optimisations and parallelism strategies, with some benchmark results
- …