1,181 research outputs found
XWeB: the XML Warehouse Benchmark
With the emergence of XML as a standard for representing business data, new
decision support applications are being developed. These XML data warehouses
aim at supporting On-Line Analytical Processing (OLAP) operations that
manipulate irregular XML data. To ensure feasibility of these new tools,
important performance issues must be addressed. Performance is customarily
assessed with the help of benchmarks. However, decision support benchmarks do
not currently support XML features. In this paper, we introduce the XML
Warehouse Benchmark (XWeB), which aims at filling this gap. XWeB derives from
the relational decision support benchmark TPC-H. It is mainly composed of a
test data warehouse that is based on a unified reference model for XML
warehouses and that features XML-specific structures, and its associate XQuery
decision support workload. XWeB's usage is illustrated by experiments on
several XML database management systems
Making an Embedded DBMS JIT-friendly
While database management systems (DBMSs) are highly optimized, interactions
across the boundary between the programming language (PL) and the DBMS are
costly, even for in-process embedded DBMSs. In this paper, we show that
programs that interact with the popular embedded DBMS SQLite can be
significantly optimized - by a factor of 3.4 in our benchmarks - by inlining
across the PL / DBMS boundary. We achieved this speed-up by replacing parts of
SQLite's C interpreter with RPython code and composing the resulting
meta-tracing virtual machine (VM) - called SQPyte - with the PyPy VM. SQPyte
does not compromise stand-alone SQL performance and is 2.2% faster than SQLite
on the widely used TPC-H benchmark suite.Comment: 24 pages, 18 figure
Automatically Leveraging MapReduce Frameworks for Data-Intensive Applications
MapReduce is a popular programming paradigm for developing large-scale,
data-intensive computation. Many frameworks that implement this paradigm have
recently been developed. To leverage these frameworks, however, developers must
become familiar with their APIs and rewrite existing code. Casper is a new tool
that automatically translates sequential Java programs into the MapReduce
paradigm. Casper identifies potential code fragments to rewrite and translates
them in two steps: (1) Casper uses program synthesis to search for a program
summary (i.e., a functional specification) of each code fragment. The summary
is expressed using a high-level intermediate language resembling the MapReduce
paradigm and verified to be semantically equivalent to the original using a
theorem prover. (2) Casper generates executable code from the summary, using
either the Hadoop, Spark, or Flink API. We evaluated Casper by automatically
converting real-world, sequential Java benchmarks to MapReduce. The resulting
benchmarks perform up to 48.2x faster compared to the original.Comment: 12 pages, additional 4 pages of references and appendi
Compositional Verification for Timed Systems Based on Automatic Invariant Generation
We propose a method for compositional verification to address the state space
explosion problem inherent to model-checking timed systems with a large number
of components. The main challenge is to obtain pertinent global timing
constraints from the timings in the components alone. To this end, we make use
of auxiliary clocks to automatically generate new invariants which capture the
constraints induced by the synchronisations between components. The method has
been implemented in the RTD-Finder tool and successfully experimented on
several benchmarks
Shared Arrangements: practical inter-query sharing for streaming dataflows
Current systems for data-parallel, incremental processing and view
maintenance over high-rate streams isolate the execution of independent
queries. This creates unwanted redundancy and overhead in the presence of
concurrent incrementally maintained queries: each query must independently
maintain the same indexed state over the same input streams, and new queries
must build this state from scratch before they can begin to emit their first
results. This paper introduces shared arrangements: indexed views of maintained
state that allow concurrent queries to reuse the same in-memory state without
compromising data-parallel performance and scaling. We implement shared
arrangements in a modern stream processor and show order-of-magnitude
improvements in query response time and resource consumption for interactive
queries against high-throughput streams, while also significantly improving
performance in other domains including business analytics, graph processing,
and program analysis
- …