12 research outputs found

    Integration of VectorWise with Ingres

    Actian Corporation recently entered into a cooperative relationship with VectorWise BV to integrate its VectorWise technology into the Ingres RDBMS server. The resulting commercial product has already achieved phenomenal performance results with the TPC-H industry standard benchmark, and has been well received in the analytical RDBMS market. This paper describes the integration of the VectorWise technology with Ingres, some of the design decisions made as part of the integration project, and the problems that had to be solved in the process

    Vectorwise: Beyond Column Stores

    This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column-storage: it boasts many query processing innovations in its vectorized execution model, and an adaptive mixed row/column data storage model with indexing support tailored to analytical workloads. Another one is that there is a long road from research prototype to commercial product, though database research continues to achieve a strong innovative influence on product development

    From X100 to Vectorwise: opportunities, challenges and things most researchers do not think about

    In 2008 a group of researchers behind the X100 database kernel created Vectorwise: a spin-off which together with the Actian corporation (previously Ingres) worked on bringing this technology to the market. Today, Vectorwise is a popular product and one of the examples of conversion of a research prototype into successful commercial software. We describe here some of the interesting aspects of the work performed by the Vectorwise development team in the process, and discuss the opportunities and challenges resulting from the decision to integrate a prototype-quality kernel with Ingres, an established commercial product. We also discuss how requirements coming from real-life scenarios sometimes clashed with design choices and simplifications often found in research projects, and how the Vectorwise team addressed some of them

    Vectorwise: a Vectorized Analytical DBMS

    Vectorwise is a new entrant in the analytical database marketplace whose technology comes straight from innovations in the database research community in the past years. The product has since made waves due to its excellent performance in analytical customer workloads as well as benchmarks. We describe the history of Vectorwise, as well as its basic architecture and the experiences in turning a technology developed in an academic context into a commercial-grade product. Finally, we turn our attention to recent performance results, most notably on the TPC-H benchmark at various sizes

    Business Analytics in (a) Blink

    The Blink project’s ambitious goal is to answer all Business Intelligence (BI) queries in mere seconds, regardless of the database size, with an extremely low total cost of ownership. Blink is a new DBMS aimed primarily at read-mostly BI query processing that exploits scale-out of commodity multi-core processors and cheap DRAM to retain a (copy of a) data mart completely in main memory. Additionally, it exploits proprietary compression technology and cache-conscious algorithms that reduce memory bandwidth consumption and allow most SQL query processing to be performed on the compressed data. Blink always scans (portions of) the data mart in parallel on all nodes, without using any indexes or materialized views, and without any query optimizer to choose among them. The Blink technology has thus far been incorp

    Massively Parallel Sort-Merge Joins in Main Memory Multi-Core Database Systems

    Two emerging hardware trends will dominate database system technology in the near future: increasing main memory capacities of several TB per server and massively parallel multi-core processing. Many algorithmic and control techniques in current database technology were devised for disk-based systems where I/O dominated the performance. In this work we take a new look at the well-known sort-merge join which, so far, has not been in the focus of research in scalable massively parallel multi-core data processing as it was deemed inferior to hash joins. We devise a suite of new massively parallel sort-merge (MPSM) join algorithms that are based on partial partition-based sorting. Contrary to classical sort-merge joins, our MPSM algorithms do not rely on a hard-to-parallelize final merge step to create one complete sort order. Rather, they work on the independently created runs in parallel. This way our MPSM algorithms are NUMA-affine, as all the sorting is carried out on local memory partitions. An extensive experimental evaluation on a modern 32-core machine with one TB of main memory proves the competitive performance of MPSM on large main memory databases with billions of objects. It scales (almost) linearly in the number of employed cores and clearly outperforms competing hash join proposals; in particular, it outperforms the "cutting-edge" Vectorwise parallel query engine by a factor of four. Comment: VLDB201
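
    The idea of joining independently sorted runs without a final global merge can be illustrated with a small sketch. This is not the paper's algorithm: range partitioning and NUMA-aware placement are omitted, Python threads do not give real parallelism, and all function names here (chunks, merge_join, run_based_join) are hypothetical. It only shows that each worker can sort a private chunk and merge-join its run against every run of the other input.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(seq, n):
    """Split seq into n chunks (round-robin); stands in for per-worker local data."""
    return [seq[i::n] for i in range(n)]


def merge_join(run_r, run_s):
    """Classic merge join of two sorted key lists; returns one key per matching pair."""
    out = []
    i = j = 0
    while i < len(run_r) and j < len(run_s):
        if run_r[i] < run_s[j]:
            i += 1
        elif run_r[i] > run_s[j]:
            j += 1
        else:
            key = run_r[i]
            i_end = i
            while i_end < len(run_r) and run_r[i_end] == key:
                i_end += 1
            j_end = j
            while j_end < len(run_s) and run_s[j_end] == key:
                j_end += 1
            # Cross product of the two equal-key groups.
            out.extend([key] * ((i_end - i) * (j_end - j)))
            i, j = i_end, j_end
    return out


def run_based_join(r, s, workers=4):
    """Sort chunks independently, then join every R run with every S run.

    No step merges the runs into one complete sort order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        r_runs = list(pool.map(sorted, chunks(r, workers)))
        s_runs = list(pool.map(sorted, chunks(s, workers)))

        def join_one(r_run):
            # One worker: its private R run against all S runs.
            return [k for s_run in s_runs for k in merge_join(r_run, s_run)]

        parts = list(pool.map(join_one, r_runs))
    return [k for part in parts for k in part]
```

    The result is not globally sorted, which is the point: correctness of an equi-join only requires that every pair of runs is compared once.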

    Automatic Schema Design for Co-Clustered Tables

    Schema design of analytical workloads provides opportunities to index, cluster, partition and/or materialize. With these opportunities, the complexity of finding the right setup also rises. In this paper we present an automatic schema design approach for a table co-clustering scheme called Bitwise Dimensional Co-Clustering, aimed at schemas with a moderate number of dimensions, but not limited to typical star and snowflake schemas. The goal is to design one primary schema and keep the knobs to turn to a minimum while providing a robust schema for a wide range of queries. In our approach a clustered schema is derived by trying to apply dimensions throughout the whole schema and co-cluster as many tables as possible according to at least one common dimension. Our approach is based on the assumption that initially foreign key relationships and a set of dimensions are defined based on classic DDL

    Query processing of pre-partitioned data using Sandwich Operators

    In this paper we present the Sandwich Operators, an elegant approach to exploit pre-sorting or pre-grouping from clustered storage schemes in operators such as Aggregation/Grouping, HashJoin, and Sort of a database management system. Thereby, each of these operator types is "sandwiched" by two new operators, namely PartitionSplit and PartitionRestart. PartitionSplit splits the input relation into its smaller independent groups on which the sandwiched operator is executed. After a group is processed, PartitionRestart is used to trigger the execution on the following group. Executing one of these operator types with the help of the Sandwich Operators introduces minimal overhead and does not penalize the performance of the sandwiched operator, as its implementation remains unchanged. On the contrary, we show that sandwiched execution of an operator results in lower memory consumption and faster execution time. PartitionSplit and PartitionRestart replace special implementations of partitioned versions of these operators. Sandwich Operators also turn blocking operators into streaming operators, resulting in faster response times for the first query results
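
    A minimal sketch of the sandwiching idea, under stated assumptions: rows arrive already clustered on a partition id, grouping keys never span two partitions, and the names hash_aggregate and sandwiched are hypothetical rather than taken from the paper. The aggregation operator itself is unchanged; the wrapper plays the roles of PartitionSplit (cutting the input into consecutive groups) and PartitionRestart (re-running the operator with fresh state for the next group).

```python
from collections import defaultdict
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Unmodified blocking operator: SUM(value) GROUP BY key.

    Rows are (partition_id, key, value) tuples. Nothing is emitted
    until the whole input has been consumed.
    """
    totals = defaultdict(int)
    for _partition_id, key, value in rows:
        totals[key] += value
    yield from totals.items()


def sandwiched(operator, rows):
    """Run `operator` once per consecutive partition of a pre-clustered input."""
    # PartitionSplit: groupby cuts the stream wherever the partition id changes.
    for _partition_id, group in groupby(rows, key=itemgetter(0)):
        # The sandwiched operator only ever sees one partition, so its
        # hash table holds the keys of that partition alone.
        yield from operator(group)
        # PartitionRestart: the next loop iteration starts the operator
        # again with empty state on the following partition.
```

    For rows such as [(0, "a", 1), (0, "b", 2), (0, "a", 3), (1, "c", 4), (1, "c", 5)], the results for partition 0 can be emitted before partition 1 has been read, which is how a blocking operator starts to behave like a streaming one.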

    From Cooperative Scans to Predictive Buffer Management

    In analytical applications, database systems often need to sustain workloads with multiple concurrent scans hitting the same table. The Cooperative Scans (CScans) framework, which introduces an Active Buffer Manager (ABM) component into the database architecture, has been the most effective and elaborate response to this problem, and was initially developed in the X100 research prototype. We now report on the experiences of integrating Cooperative Scans into its industrial-strength successor, the Vectorwise database product. During this implementation we invented a simpler optimization of concurrent scan buffer management, called Predictive Buffer Management (PBM). PBM is based on the observation that in a workload with long-running scans, the buffer manager has quite a bit of information on the workload in the immediate future, such that an approximation of the ideal OPT algorithm becomes feasible. In the evaluation on both synthetic benchmarks and a TPC-H throughput run, we compare the benefits of naive buffer management (LRU) versus CScans, PBM and OPT, showing that PBM achieves benefits close to Cooperative Scans while incurring much lower architectural impact. Comment: VLDB201
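
    The observation behind PBM can be sketched as follows. This is an illustration only, not the Vectorwise implementation: the scan description (next page, end page, speed) and the names next_use_time and choose_victim are assumptions made for the example. Each active scan announces where it is and how fast it moves, so the buffer manager can estimate when a cached page will next be needed and evict the page whose next use lies furthest in the future, as OPT would with perfect knowledge.

```python
import math


def next_use_time(page, scans):
    """Estimate the time until some active scan reads `page`.

    Each scan is (next_page, end_page, pages_per_time_unit) and reads
    pages next_page .. end_page - 1 in order. Returns math.inf if no
    active scan will reach the page.
    """
    best = math.inf
    for next_page, end_page, speed in scans:
        if next_page <= page < end_page:
            best = min(best, (page - next_page) / speed)
    return best


def choose_victim(cached_pages, scans):
    """Evict the cached page with the furthest estimated next use."""
    return max(cached_pages, key=lambda page: next_use_time(page, scans))
```

    With scans = [(10, 100, 1.0), (40, 60, 2.0)] and cached pages {5, 12, 50, 90}, page 5 is evicted first because both scans have already passed it, then page 90, whose next use is the furthest away. LRU has no access to this information and could just as well evict page 12, which is needed almost immediately.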