9,018 research outputs found
Building Efficient Query Engines in a High-Level Language
Abstraction without regret refers to the vision of using high-level
programming languages for systems development without experiencing a negative
impact on performance. A database system designed according to this vision
offers both increased productivity and high performance, instead of sacrificing
the former for the latter as is the case with existing, monolithic
implementations that are hard to maintain and extend. In this article, we
realize this vision in the domain of analytical query processing. We present
LegoBase, a query engine written in the high-level language Scala. The key
technique to regain efficiency is to apply generative programming: LegoBase
performs source-to-source compilation and optimizes the entire query engine by
converting the high-level Scala code to specialized, low-level C code. We show
how generative programming allows to easily implement a wide spectrum of
optimizations, such as introducing data partitioning or switching from a row to
a column data layout, which are difficult to achieve with existing low-level
query compilers that handle only queries. We demonstrate that sufficiently
powerful abstractions are essential for dealing with the complexity of the
optimization effort, shielding developers from compiler internals and
decoupling individual optimizations from each other. We evaluate our approach
with the TPC-H benchmark and show that: (a) With all optimizations enabled,
LegoBase significantly outperforms a commercial database and an existing query
compiler. (b) Programmers need to provide just a few hundred lines of
high-level code for implementing the optimizations, instead of complicated
low-level code that is required by existing query compilation approaches. (c)
The compilation overhead is low compared to the overall execution time, thus
making our approach usable in practice for compiling query engines
Experiences with Some Benchmarks for Deductive Databases and Implementations of Bottom-Up Evaluation
OpenRuleBench is a large benchmark suite for rule engines, which includes
deductive databases. We previously proposed a translation of Datalog to C++
based on a method that "pushes" derived tuples immediately to places where they
are used. In this paper, we report performance results of various
implementation variants of this method compared to XSB, YAP and DLV. We study
only a fraction of the OpenRuleBench problems, but we give a quite detailed
analysis of each such task and the factors which influence performance. The
results not only show the potential of our method and implementation approach,
but could be valuable for anybody implementing systems which should be able to
execute tasks of the discussed types.Comment: In Proceedings WLP'15/'16/WFLP'16, arXiv:1701.0014
Code Generation for Efficient Query Processing in Managed Runtimes
In this paper we examine opportunities arising from the conver-gence of two trends in data management: in-memory database sys-tems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous āimpedance mismatch ā problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also to use the same querying language to query an applicationās in-memory collections. The lat-ter offers further transparency to developers as the query language and all data is represented in the data model of the host program-ming language. However, compared to IMDBs, this additional free-dom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying and we leverage query compilation to im-prove query processing on application objects. We explore dif-ferent query compilation strategies and study how they improve the performance of query processing over application data. We take C] as the host programming language as it supports language-integrated query through the LINQ framework. Our techniques de-liver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying. 1
Tupleware: Redefining Modern Analytics
There is a fundamental discrepancy between the targeted and actual users of
current analytics frameworks. Most systems are designed for the data and
infrastructure of the Googles and Facebooks of the world---petabytes of data
distributed across large cloud deployments consisting of thousands of cheap
commodity machines. Yet, the vast majority of users operate clusters ranging
from a few to a few dozen nodes, analyze relatively small datasets of up to a
few terabytes, and perform primarily compute-intensive operations. Targeting
these users fundamentally changes the way we should build analytics systems.
This paper describes the design of Tupleware, a new system specifically aimed
at the challenges faced by the typical user. Tupleware's architecture brings
together ideas from the database, compiler, and programming languages
communities to create a powerful end-to-end solution for data analysis. We
propose novel techniques that consider the data, computations, and hardware
together to achieve maximum performance on a case-by-case basis. Our
experimental evaluation quantifies the impact of our novel techniques and shows
orders of magnitude performance improvement over alternative systems
AMaĻoSāAbstract Machine for Xcerpt
Web query languages promise convenient and efficient access
to Web data such as XML, RDF, or Topic Maps. Xcerpt is one such Web
query language with strong emphasis on novel high-level constructs for
effective and convenient query authoring, particularly tailored to versatile
access to data in different Web formats such as XML or RDF.
However, so far it lacks an efficient implementation to supplement the
convenient language features. AMaĻoS is an abstract machine implementation
for Xcerpt that aims at efficiency and ease of deployment. It
strictly separates compilation and execution of queries: Queries are compiled
once to abstract machine code that consists in (1) a code segment
with instructions for evaluating each rule and (2) a hint segment that
provides the abstract machine with optimization hints derived by the
query compilation. This article summarizes the motivation and principles
behind AMaĻoS and discusses how its current architecture realizes
these principles
Two research contributions in 64-bit computing: Testing and Applications
Following the release of Windows 64-bit and Redhat Linux 64-bit operating systems (OS) in late April 2005, this is the one of the first 64-bit OS research project completed in a British university. The objective is to investigate (1) the increase/decrease in performance compared to 32-bit computing; (2) the techniques used to develop 64-bit applications; and (3) how 64-bit computing should be used in IT and research organizations to improve their work. This paper summarizes research discoveries for this investigation, including two major research contributions in (1) testing and (2) application development. The first contribution includes performance, stress, application, multiplatform, JDK and compatibility testing for AMD and Intel models. Comprehensive testing results reveal that 64-bit computing has a better performance in application performance, system performance and stress testing, but a worse performance in compatibility testing than the traditional 32-bit computing. A 64-bit dual-core processor has been tested and the results show that it performs better than a 64-bit single-core processor, but only in application that requires very high demands of CPU and memory consumption. The second contribution is .NET 1.1 64-bit implementations. Without additional troubleshooting, .NET 1.1 does not work on 64-bit Windows operating systems in stable ways. After stabilizing .NET environment, the next step is the application development, which is a dynamic repository with functions such as registration, download, login-logout, product submissions, database storage and statistical reports. The technology is based on Visual Studio .NET 2003, .NET 1.1 Framework with Service Pack 1, SQL Server 2000 with Service Pack 4 and IIS Server 6.0 on the Windows Server 2003 Enterprise x64 platform with Service Pack 1
- ā¦