20 research outputs found

    MonetDB/X100 - A DBMS in the CPU cache

    Get PDF
    X100 is a new execution engine for the MonetDB system, that improves execution speed and overcomes its main memory limitation. It introduces t

    Positional Delta Trees to reconcile updates with read-optimized data storage

    Get PDF
    We investigate techniques that marry the high readonly analytical query performance of compressed, replicated column storage (“read-optimized” databases) with the ability to handle a high-throughput update workload. Today’s large RAM sizes and the growing gap between sequential vs. random IO disk throughput, bring this once elusive goal in reach, as it has become possible to buffer enough updates in memory to allow background migration of these updates to disk, where efficient sequential IO is amortized among many updates. Our key goal is that read-only queries always see the latest database state, yet are not (significantly) slowed down by the update processing. To this end, we propose the Positional Delta Tree (PDT), that is designed to minimize the overhead of on-the-fly merging of differential updates into (index) scans on stale disk-based data. We describe the PDT data structure and its basic operations (lookup, insert, delete, modify) and provide an in-detail study of their performance. Further, we propose a storage architecture called Replicated Mirrors, that replicates tables in multiple orders, storing each table copy mirrored in both column- and row-wise data formats, and uses PDTs to handle updates. Experiments in the MonetDB/X100 system show that this integrated architecture is able to achieve our main goals

    Super-scalar RAM-CPU cache compression

    Get PDF
    High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-e

    Super-Scalar RAM-CPU Cache Compression

    Get PDF
    High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even whe

    Flexible and efficient IR using array databases

    Get PDF
    The Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping) that works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language, in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, that provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage

    VectorWise

    No full text
    VectorWise (2008) is based on scientific research results and derives its strength from a completely new approach on data processing. The approach makes use of vector processing on data sets in which every vector is tailored to the size of the cache memory of modern processors.The vectorized execution model also allows exploiting SIMD features (such as SSE in Intel CPUs) as well as multi-core technology, and is complemented by innovations to help optimize high-bandwidth disk I/O. The new method is beneficial for existing database applications and allows organizations to perform data analysis tasks that were previously not feasible. Applications are not restricted to businesses, as increasingly advances in logistics, science, medicine and healthcare depend on the analysis of very large data volumes. The CWI spin-off VectorWise that was created around this software, was sold in 2011 to Actian Corporation.</p

    Super-Scalar RAM-CPU Cache Compression

    No full text
    CWI is a founding member of ERCIM, the European Research Consortium for Informatics and Mathematics. CWI&apos;s research has a theme-oriented structure and is grouped into four clusters. Listed below are the names of the clusters and in parentheses their acronyms
    corecore