Code Generation for Efficient Query Processing in Managed Runtimes

Author: Bierman Gavin M.
Nagel Fabian
Viglas Stratis D.
Publication date: 01/01/2014

In this paper we examine opportunities arising from the convergence of two trends in data management: in-memory database systems (IMDBs), which have received renewed attention following the availability of affordable, very large main memory systems; and language-integrated query, which transparently integrates database queries with programming languages (thus addressing the famous ‘impedance mismatch’ problem). Language-integrated query not only gives application developers a more convenient way to query external data sources like IMDBs, but also to use the same querying language to query an application’s in-memory collections. The latter offers further transparency to developers as the query language and all data is represented in the data model of the host programming language. However, compared to IMDBs, this additional freedom comes at a higher cost for query evaluation. Our vision is to improve in-memory query processing of application objects by introducing database technologies to managed runtimes. We focus on querying and we leverage query compilation to improve query processing on application objects. We explore different query compilation strategies and study how they improve the performance of query processing over application data. We take C# as the host programming language as it supports language-integrated query through the LINQ framework. Our techniques deliver significant performance improvements over the default LINQ implementation. Our work makes important first steps towards a future where data processing applications will commonly run on machines that can store their entire datasets in-memory, and will be written in a single programming language employing language-integrated query and IMDB-inspired runtimes to provide transparent and highly efficient querying.
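
The general idea of query compilation mentioned in this abstract can be sketched as follows. This is a minimal Python illustration, not the paper's C#/LINQ implementation: the function names `interpreted_query` and `compile_query` and the sample rows are hypothetical. It contrasts an operator-at-a-time pipeline with a single loop generated as source text and compiled at run time.

```python
# Hypothetical sketch of query compilation; not the authors' implementation.

def interpreted_query(rows, predicate, selector):
    """Operator-at-a-time evaluation: one generator per operator."""
    filtered = (r for r in rows if predicate(r))
    projected = (selector(r) for r in filtered)
    return sum(projected)


def compile_query(predicate_src, selector_src):
    """Generate and compile one fused loop for a filter + projection + sum.

    predicate_src and selector_src are trusted Python expressions over
    the variable `r`. exec() must never be fed untrusted input.
    """
    source = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for r in rows:\n"
        f"        if {predicate_src}:\n"
        f"            total += {selector_src}\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(source, "<generated query>", "exec"), namespace)
    return namespace["query"]


# Illustrative data, assumed for this sketch only.
rows = [
    {"price": 10, "qty": 2},
    {"price": 5, "qty": 7},
    {"price": 20, "qty": 1},
]

fused = compile_query("r['qty'] > 1", "r['price'] * r['qty']")
compiled_total = fused(rows)
interpreted_total = interpreted_query(
    rows,
    lambda r: r["qty"] > 1,
    lambda r: r["price"] * r["qty"],
)
```

Both paths compute the same result; the generated function simply avoids the per-operator call overhead of the generator pipeline.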

CiteSeerX

Crossref

Edinburgh Research Explorer

Improving Database Performance on Simultaneous Multithreading Processors

Author: Cieslewicz John
Ross Kenneth A.
Shah Mihir
Zhou Jingren
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2005

Simultaneous multithreading (SMT) allows multiple threads to supply instructions to the instruction pipeline of a superscalar processor. Because threads share processor resources, an SMT system is inherently different from a multiprocessor system and, therefore, utilizing multiple threads on an SMT processor creates new challenges for database implementers. We investigate three thread-based techniques to exploit SMT architectures on memory-resident data. First, we consider running independent operations in separate threads, a technique applied to conventional multiprocessor systems. Second, we describe a novel implementation strategy in which individual operators are implemented in a multi-threaded fashion. Finally, we introduce a new data structure called a work-ahead set that allows us to use one of the threads to aggressively preload data into the cache for use by the other thread. We evaluate each method with respect to its performance, implementation complexity, and other measures. We also provide guidance regarding when and how to best utilize the various threading techniques. Our experimental results show that by taking advantage of SMT technology we achieve a 30% to 70% improvement in throughput over single-threaded implementations on in-memory database operations.

Columbia University Academic Commons

BriskStream: Scaling Data Stream Processing on Shared-Memory Multicore Architectures

Author: He Bingsheng
He Jiong
Zhang Shuhao
Zhou Amelie Chi
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 07/04/2019

We introduce BriskStream, an in-memory data stream processing system (DSPS) specifically designed for modern shared-memory multicore architectures. BriskStream's key contribution is an execution plan optimization paradigm, namely RLAS, which takes the relative location (i.e., NUMA distance) of each pair of producer-consumer operators into consideration. We propose a branch-and-bound-based approach with three heuristics to resolve the resulting nontrivial optimization problem. The experimental evaluations demonstrate that BriskStream yields much higher throughput and better scalability than existing DSPSs on multi-core architectures when processing different types of workloads. Comment: To appear in SIGMOD'1

arXiv.org e-Print Archive

Crossref

ScholarBank@NUS

Optimal column layout for hybrid workloads

Author: Athanassoulis Manoussos
Bøgh Kenneth
Idreos Stratos
Publication venue: 'VLDB Endowment'
Publication date: 01/09/2019

Data-intensive analytical applications need to support both efficient reads and writes. However, what is usually a good data layout for an update-heavy workload is not well-suited for a read-mostly one and vice versa. Modern analytical data systems rely on columnar layouts and employ delta stores to inject new data and updates. We show that for hybrid workloads we can achieve close to one order of magnitude better performance by tailoring the column layout design to the data and query workload. Our approach navigates the possible design space of the physical layout: it organizes each column’s data by determining the number of partitions, their corresponding sizes and ranges, and the amount of buffer space and how it is allocated. We frame these design decisions as an optimization problem that, given workload knowledge and performance requirements, provides an optimal physical layout for the workload at hand. To evaluate this work, we build an in-memory storage engine, Casper, and we show that it outperforms state-of-the-art data layouts of analytical systems for hybrid workloads. Casper delivers up to 2.32x higher throughput for update-intensive workloads and up to 2.14x higher throughput for hybrid workloads. We further show how to make data layout decisions robust to workload variation by carefully selecting the input of the optimization. Published version: http://www.vldb.org/pvldb/vol12/p2393-athanassoulis.pdf
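
As a rough illustration of what a range-partitioned column looks like, the sketch below splits one column at fixed boundaries and answers a range count by scanning only the overlapping partitions. It is not Casper's design: the class name `PartitionedColumn`, its methods, and the hand-picked boundaries are assumptions made for this example, whereas the paper derives the partition count, sizes, and buffer space from an optimization problem.

```python
from bisect import bisect_right


class PartitionedColumn:
    """Hypothetical range-partitioned column; boundaries are chosen by hand."""

    def __init__(self, boundaries):
        # Sorted split points: partition i holds values v with
        # boundaries[i-1] <= v < boundaries[i].
        self.boundaries = sorted(boundaries)
        self.partitions = [[] for _ in range(len(self.boundaries) + 1)]

    def _partition_index(self, value):
        # Index of the partition whose range contains `value`.
        return bisect_right(self.boundaries, value)

    def insert(self, value):
        # Appends are cheap: no global ordering is maintained.
        self.partitions[self._partition_index(value)].append(value)

    def range_count(self, low, high):
        """Count values v with low <= v < high, touching only
        the partitions that can overlap [low, high)."""
        first = self._partition_index(low)
        last = self._partition_index(high)
        return sum(
            1
            for part in self.partitions[first:last + 1]
            for v in part
            if low <= v < high
        )


# Illustrative data, assumed for this sketch only.
column = PartitionedColumn([10, 20])
for value in [3, 9, 10, 15, 20, 42]:
    column.insert(value)

matches = column.range_count(5, 15)
```

More partitions make range reads touch less data but add bookkeeping per insert, which is the kind of trade-off the abstract describes tuning to the workload.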

Boston University Institutional Repository (OpenBU)

Code generation for efficient query processing in managed runtimes

Author: Boncz P.
Cheney J.
Dees J.
Diaconu C.
Francisco P.
Harizopoulos S.
Koch C.
Krikellas K.
Meijer E.
Padmanabhan S.
Pirk H.
Publication venue: 'VLDB Endowment'

Crossref

Analytical Query Execution Optimized for all Layers of Modern Hardware

Author: Polychroniou Orestis
Publication venue: 'Columbia University Libraries/Information Services'
Publication date: 01/01/2018

Analytical database queries are at the core of business intelligence and decision support. To analyze the vast amounts of data available today, query execution needs to be orders of magnitude faster. Hardware advances have made a profound impact on database design and implementation. The large main memory capacity allows queries to execute exclusively in memory and shifts the bottleneck from disk access to memory bandwidth. In the new setting, to optimize query performance, databases must be aware of an unprecedented multitude of complicated hardware features. This thesis focuses on the design and implementation of highly efficient database systems by optimizing analytical query execution for all layers of modern hardware. The hardware layers include the network across multiple machines, main memory and the NUMA interconnection across multiple processors, the multiple levels of caches across multiple processor cores, and the execution pipeline within each core. For the network layer, we introduce a distributed join algorithm that minimizes the network traffic. For the memory hierarchy, we describe partitioning variants aware of the dynamics of the CPU caches and the NUMA interconnection. To improve the memory access rate of linear scans, we optimize lightweight compression variants and evaluate their trade-offs. To accelerate query execution within the core pipeline, we introduce advanced SIMD vectorization techniques generalizable across multiple operators. We evaluate our algorithms and techniques on both mainstream hardware and on many-integrated-core platforms, and combine our techniques in a new query engine design that can better utilize the features of many-core CPUs. In the era of hardware becoming increasingly parallel and datasets consistently growing in size, this thesis can serve as a compass for developing hardware-conscious databases with truly high-performance analytical query execution.

Columbia University Academic Commons

MonetDB/X100 - A DBMS in the CPU cache

Author: Boncz P.A. (Peter)
Héman S. (Sándor)
Nes N.J. (Niels)
Zukowski M. (Marcin)
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/06/2005

X100 is a new execution engine for the MonetDB system that improves execution speed and overcomes its main memory limitation. It introduces t

CWI's Institutional Repository

Event Stream Processing with Multiple Threads

Author: DA Basin
G Graefe
H Nazarpour
J Ha
JJ Harrow
L Kuhtz
M Paes
PMG Apers
S Berkovich
S Hallé
S Qadeer
S Savage
Publication date: 09/07/2017

Current runtime verification tools seldom make use of multi-threading to speed up the evaluation of a property on a large event trace. In this paper, we present an extension to the BeepBeep 3 event stream engine that allows the use of multiple threads during the evaluation of a query. Various parallelization strategies are presented and described on simple examples. The implementation of these strategies is then evaluated empirically on a sample of problems. Compared to the previous, single-threaded version of the BeepBeep engine, the allocation of just a few threads to specific portions of a query provides a dramatic improvement in running time.

arXiv.org e-Print Archive

Crossref