Search CORE

43 research outputs found

MonetDB/X100 - A DBMS in the CPU cache

Author: Boncz P.A. (Peter)
Héman S. (Sándor)
Nes N.J. (Niels)
Zukowski M. (Marcin)
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/06/2005

X100 is a new execution engine for the MonetDB system that improves execution speed and overcomes its main-memory limitation. It introduces t…

CWI's Institutional Repository

Positional Delta Trees to reconcile updates with read-optimized data storage

Author: Boncz P.A. (Peter)
Héman S. (Sándor)
Nes N.J. (Niels)
Zukowski M. (Marcin)
Publication venue: CWI
Publication date: 01/08/2008

We investigate techniques that marry the high read-only analytical query performance of compressed, replicated column storage (“read-optimized” databases) with the ability to handle a high-throughput update workload. Today’s large RAM sizes and the growing gap between sequential and random disk I/O throughput bring this once-elusive goal within reach, as it has become possible to buffer enough updates in memory to allow background migration of these updates to disk, where efficient sequential I/O is amortized among many updates. Our key goal is that read-only queries always see the latest database state, yet are not (significantly) slowed down by the update processing. To this end, we propose the Positional Delta Tree (PDT), which is designed to minimize the overhead of on-the-fly merging of differential updates into (index) scans on stale disk-based data. We describe the PDT data structure and its basic operations (lookup, insert, delete, modify) and provide a detailed study of their performance. Further, we propose a storage architecture called Replicated Mirrors, which replicates tables in multiple orders, stores each table copy mirrored in both column- and row-wise data formats, and uses PDTs to handle updates. Experiments in the MonetDB/X100 system show that this integrated architecture is able to achieve our main goals.
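The core idea of merging buffered differential updates into a scan of stale data can be illustrated with a small sketch. It assumes updates are addressed by a row's position in the stable table (its SID) and replaces the PDT's tree with flat dictionaries, so lookups are not logarithmic and the bookkeeping a real PDT uses to translate between stable and current positions is omitted. `PositionalDeltas`, `merge_scan` and their methods are hypothetical names, not the MonetDB/X100 API.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class PositionalDeltas:
    """Hypothetical, simplified stand-in for a PDT.

    Updates are keyed by stable ID (SID): the position of a row in the
    stale, read-optimized table. A real PDT is a tree; flat dicts keep
    this sketch short at the cost of its lookup guarantees.
    """
    inserts: dict[int, list[Any]] = field(default_factory=dict)  # SID -> rows inserted before it
    deletes: set[int] = field(default_factory=set)  # SIDs of deleted stable rows
    modifies: dict[int, Any] = field(default_factory=dict)  # SID -> replacement row

    def insert_before(self, sid: int, row: Any) -> None:
        self.inserts.setdefault(sid, []).append(row)

    def delete(self, sid: int) -> None:
        self.deletes.add(sid)
        self.modifies.pop(sid, None)  # a deleted row needs no pending modification

    def modify(self, sid: int, row: Any) -> None:
        if sid not in self.deletes:
            self.modifies[sid] = row


def merge_scan(stable: list[Any], deltas: PositionalDeltas) -> Iterator[Any]:
    """Merge buffered deltas into a sequential scan of the stable data on the fly."""
    for sid, row in enumerate(stable):
        yield from deltas.inserts.get(sid, ())  # rows inserted before this position
        if sid in deltas.deletes:
            continue  # skip the deleted stable row
        yield deltas.modifies.get(sid, row)  # latest version of the row
    yield from deltas.inserts.get(len(stable), ())  # rows appended at the end
```

Because the stable list is never modified, queries always see the latest state while the stable data stays read-optimized; a background process could later rewrite it with the merged result and clear the deltas, which loosely mirrors the migration step the abstract describes.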

CWI's Institutional Repository

Flexible and efficient IR using array databases

Author: Arjen P. de Vries
Marcin Zukowski
Peter Boncz
Roberto Cornacchia
Sándor Héman
Publication venue: Springer Nature
Publication date: 01/01/2007

The Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables, and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing, including compression, score materialization and quantization, as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.
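To illustrate what "the relational equivalent of inverted list processing" can look like, the sketch below scores documents with Okapi BM25 using only selection, grouping and join-like lookups over a postings table of `(term, doc, tf)` rows. It is a hand-written Python sketch, not SRAM's array query language or the plans it generates; the table layout, the function name `bm25_scores`, the common defaults k1 = 1.2 and b = 0.75, and the `+ 1` inside the IDF logarithm (a widely used variant that keeps IDF positive) are all assumptions.

```python
import math
from collections import defaultdict


def bm25_scores(postings, doc_len, query, k1=1.2, b=0.75):
    """Rank documents with Okapi BM25 over a relational-style postings table.

    postings: iterable of (term, doc, tf) rows, one row per (term, doc) pair
    doc_len:  dict doc -> document length in terms
    query:    set of query terms
    """
    n_docs = len(doc_len)
    avgdl = sum(doc_len.values()) / n_docs

    # Selection: keep only postings for query terms (the inverted lists).
    selected = [(t, d, tf) for t, d, tf in postings if t in query]

    # GROUP BY term, COUNT(*): document frequency of each query term.
    df = defaultdict(int)
    for t, _, _ in selected:
        df[t] += 1

    # Join with doc_len, compute each posting's contribution, GROUP BY doc, SUM.
    scores = defaultdict(float)
    for t, d, tf in selected:
        idf = math.log((n_docs - df[t] + 0.5) / (df[t] + 0.5) + 1.0)
        length_norm = k1 * (1.0 - b + b * doc_len[d] / avgdl)
        scores[d] += idf * tf * (k1 + 1.0) / (tf + length_norm)

    # ORDER BY score DESC, ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Every factor in the per-posting contribution depends only on the term, the document and the collection, so score materialization would correspond to precomputing that value once per posting and storing it, reducing query time to a selection and a grouped sum.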

Springer - Publisher Connector

CWI's Institutional Repository

Super-scalar RAM-CPU cache compression

Author: Boncz P.A. (Peter)
Héman S. (Sándor)
Nes N.J. (Niels)
Zukowski M. (Marcin)
Publication venue: CWI
Publication date: 01/01/2005

High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-e…

CWI's Institutional Repository

Super-Scalar RAM-CPU Cache Compression

Author: Boncz P.A. (Peter)
Héman S. (Sándor)
Nes N.J. (Niels)
Zukowski M. (Marcin)
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2006

High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even whe…

CWI's Institutional Repository

Super-Scalar RAM-CPU Cache Compression

Author: Boncz Peter
Héman Sándor
Nes Niels
Zukowski Marcin
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2005

High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-end RAID storage systems are used. Compression can alleviate this bottleneck only if encoding and decoding speeds significantly exceed RAID I/O bandwidth. For this purpose, we propose three new versatile compression schemes (PDICT, PFOR, and PFOR-DELTA) that are specifically designed to extract maximum IPC from modern CPUs. We compare these algorithms with compression techniques used in (commercial) database and information retrieval systems. Our experiments on the MonetDB/X100 database system, using both DSM and PAX disk storage, show that these techniques strongly accelerate TPC-H performance to the point that the I/O bottleneck is eliminated.
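The idea behind PFOR (Patched Frame-Of-Reference) can be sketched as follows: values are stored as small fixed-width offsets from a base value, and the few values that do not fit (exceptions) are stored separately and patched in after a tight decoding loop over all slots. PFOR-DELTA applies the same scheme to the differences between consecutive values. This is a simplified illustration under stated assumptions: codes are kept in a plain Python list rather than bit-packed into `bits`-wide fields, exceptions are kept as `(position, value)` pairs rather than in the compact layout the paper uses, and the caller chooses `base` and `bits`. All function names are hypothetical.

```python
from itertools import accumulate


def pfor_encode(values, base, bits):
    """Encode ints as `bits`-wide offsets from `base`; out-of-range values become exceptions."""
    limit = 1 << bits
    codes, exceptions = [], []
    for i, v in enumerate(values):
        offset = v - base
        if 0 <= offset < limit:
            codes.append(offset)
        else:
            codes.append(0)  # placeholder slot, overwritten during patching
            exceptions.append((i, v))
    return codes, exceptions


def pfor_decode(codes, exceptions, base):
    # Phase 1: the same simple operation on every slot, with no per-value branch.
    out = [base + c for c in codes]
    # Phase 2: patch the (ideally few) exceptions.
    for i, v in exceptions:
        out[i] = v
    return out


def pfor_delta_encode(values, base, bits):
    """PFOR on the differences between consecutive values (first delta is from 0)."""
    deltas = [cur - prev for prev, cur in zip([0] + values, values)]
    return pfor_encode(deltas, base, bits)


def pfor_delta_decode(codes, exceptions, base):
    # Decode the deltas, then a running sum restores the original values.
    return list(accumulate(pfor_decode(codes, exceptions, base)))
```

With `bits=3`, each regular value would occupy 3 bits instead of a full machine word; choosing `base` and `bits` trades code width against the number of exceptions, and PFOR-DELTA pays off on sorted data such as ascending row or document IDs, where consecutive differences are small.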

Crossref

CWI's Institutional Repository

Flexible and efficient IR using array databases

Author: Boncz P.A. (Peter)
Cornacchia R. (Roberto)
Héman S. (Sándor)
Vries A.P. (Arjen) de
Zukowski M. (Marcin)
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2008

The Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping), which works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables, and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing, including compression, score materialization and quantization, as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, which provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage.

CWI's Institutional Repository

Flexible and efficient IR using array databases

Author: A. Eisenberg
A. Trotman
Arjen P. de Vries
C. Galindo-Legaria
D.A. Grossman
G. Graefe
G.H. Golub
H. Turtle
I.H. Witten
L.A. Barroso
M.F. Porter
Marcin Zukowski
P. Buneman
Peter Boncz
Roberto Cornacchia
S.E. Robertson
Sándor Héman
T. Grabs
T. Roelleke
U.S. Chakravarthy
V.N. Anh
Publication venue: Springer Science and Business Media LLC

Crossref

PhD candidate makes processing of Big Data much faster - Automatiseringsgids.nl

Author: CWI
Héman S. (Sándor)
Publication date: 27/10/2015

CWI's Institutional Repository