217 research outputs found

    Vectorwise: Beyond Column Stores

    This paper tells the story of Vectorwise, a high-performance analytical database system, from multiple perspectives: its history from academic project to commercial product, the evolution of its technical architecture, customer reactions to the product and its future research and development roadmap. One take-away from this story is that the novelty in Vectorwise is much more than just column-storage: it boasts many query processing innovations in its vectorized execution model, and an adaptive mixed row/column data storage model with indexing support tailored to analytical workloads. Another one is that there is a long road from research prototype to commercial product, though database research continues to achieve a strong innovative influence on product development.

    Disk

    In disk storage, data is recorded on planar, round and rotating surfaces (disks, discs, or platters). A disk drive is a peripheral device of a computer system, connected by some communication medium to a disk controller. The disk controller is a chip, typically connected to the CPU of the computer by the internal communication bus. The main implementations are hard disks, floppy disks and optical discs, of which the first is the usual interpretation. Recently, Solid State Disks have been introduced, though the term ‘disc’ is a misnomer for these devices, as internally they consist of NAND Flash memory chips. Similarly, the term RAM Disk is used for a storage device consisting of volatile DRAM memory. Both offer the same data storage services at the operating system level, though their price, size, performance and persistence characteristics are very different from those of a hard disk.

    Letter from the Special Issue Editor

    Editorial work for DEBULL on a special issue on data management on Storage Class Memory (SCM) technologies

    Main Memory

    Primary storage, presently known as main memory, is the largest memory directly accessible to the CPU in the prevalent Von Neumann model and stores both data and instructions (program code). The CPU continuously reads instructions stored there and executes them. It is also called Random Access Memory (RAM), to indicate that load/store instructions can access data at any location at the same cost. It is usually implemented using DRAM chips, which are connected to the CPU and other peripherals (disk drive, network) via a bus.

    Processor Cache

    To hide the high latencies of DRAM access, modern computer architecture now features a memory hierarchy that besides DRAM also includes SRAM cache memories, typically located on the CPU chip. Memory accesses first check these caches, which takes only a few cycles. Only if the needed data is not found there is an expensive memory access needed.
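    The check-the-cache-first behaviour described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions: the direct-mapped organisation, the `DirectMappedCache` class, the 64-byte line size and the cycle counts are all hypothetical choices made for illustration, not figures taken from the entry above.

```python
class DirectMappedCache:
    """Toy direct-mapped cache; all sizes and latencies are illustrative assumptions."""

    def __init__(self, num_lines=4, line_size=64, hit_cycles=4, miss_cycles=200):
        self.num_lines = num_lines
        self.line_size = line_size
        self.hit_cycles = hit_cycles      # assumed cost of checking the cache
        self.miss_cycles = miss_cycles    # assumed extra cost of going to DRAM
        self.tags = [None] * num_lines    # one stored tag per cache line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return the simulated cost in cycles of accessing one byte address."""
        block = address // self.line_size   # which memory block holds the address
        index = block % self.num_lines      # cache line the block maps to
        tag = block // self.num_lines       # identifies the block within that line
        if self.tags[index] == tag:
            self.hits += 1
            return self.hit_cycles
        # Miss: pay for the cache check plus the DRAM access, then fill the line.
        self.misses += 1
        self.tags[index] = tag
        return self.hit_cycles + self.miss_cycles


cache = DirectMappedCache()
# Sequential scan of 256 bytes in 8-byte steps: one miss per 64-byte line.
total_cycles = sum(cache.access(addr) for addr in range(0, 256, 8))
print(cache.hits, cache.misses, total_cycles)
```

    With these assumed parameters, a sequential scan pays the DRAM cost once per cache line and only the short cache check for every other access to that line.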

    From X100 to Vectorwise: opportunities, challenges and things most researchers do not think about

    In 2008 a group of researchers behind the X100 database kernel created Vectorwise: a spin-off which, together with the Actian corporation (previously Ingres), worked on bringing this technology to the market. Today, Vectorwise is a popular product and one of the examples of conversion of a research prototype into successful commercial software. We describe here some of the interesting aspects of the work performed by the Vectorwise development team in the process, and discuss the opportunities and challenges resulting from the decision to integrate a prototype-quality kernel with Ingres, an established commercial product. We also discuss how requirements coming from real-life scenarios sometimes clashed with design choices and simplifications often found in research projects, and how the Vectorwise team addressed some of them.

    AmbientDB: P2P Data Management Middleware for Ambient Intelligence

    The future generation of consumer electronics devices is envisioned to provide automatic cooperation between devices and run applications that are sensitive to people's likings, personalized to their requirements, anticipatory of their behavior and responsive to their presence. We see this 'Ambient Intelligence' as a key feature of future pervasive computing. We focus here on one of the challenges in realizing this vision: information management. This entails integrating, querying, synchronizing and evolving structured data on a heterogeneous and ad-hoc collection of (mobile) devices. Rather than hard-coding data management functionality in each individual application, we argue for adding high-level data management functionalities to the distributed middleware layer. Our AmbientDB P2P database management system addresses this by providing a global database abstraction over an ad-hoc network of heterogeneous peers.

    Moa and the Multi-model Architecture: A New Perspective on NF2

    Advances in Large-Scale RDF Data Management

    One of the prime goals of the LOD2 project is improving the performance and scalability of RDF storage solutions so that the increasing amount of Linked Open Data (LOD) can be efficiently managed. Virtuoso has been chosen as the basic RDF store for the LOD2 project, and during the project it has been significantly improved by incorporating advanced relational database techniques from MonetDB and Vectorwise, turning it into a compressed column store with vectored execution. This has reduced the performance gap (“RDF tax”) between Virtuoso’s SQL and SPARQL query performance in a way that still respects the “schema-last” nature of RDF. However, by lacking schema information, RDF database systems such as Virtuoso still cannot use advanced relational storage optimizations such as table partitioning or clustered indexes and have to execute SPARQL queries with many self-joins to a triple table, which leads to more join effort than needed in SQL systems. In this chapter, we first discuss the new column store techniques applied to Virtuoso, the enhancements in its cluster parallel version, and show its performance using the popular BSBM benchmark at the unsurpassed scale of 150 billion triples. We finally describe ongoing work in deriving an “emergent” relational schema from RDF data, which can help to close the performance gap between relational-based and RDF-based storage solutions
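    The point about self-joins over a triple table can be made concrete with a toy comparison. This is a hedged sketch, not Virtuoso's implementation: the sample data, the `by_predicate` helper and both query functions are hypothetical, and the sketch assumes each subject has at most one value per predicate.

```python
# Triple table: every fact is one (subject, predicate, object) row.
triples = [
    ("p1", "type", "Product"),
    ("p1", "label", "Widget"),
    ("p1", "price", 10),
    ("p2", "type", "Product"),
    ("p2", "label", "Gadget"),
    ("p2", "price", 25),
]


def by_predicate(pred):
    """One pass over the triple table, selecting a single predicate."""
    return {s: o for s, p, o in triples if p == pred}


def products_via_self_joins():
    """Fetch (label, price) of products: three selections joined on subject."""
    types = by_predicate("type")
    labels = by_predicate("label")
    prices = by_predicate("price")
    return sorted(
        (labels[s], prices[s])
        for s, t in types.items()
        if t == "Product" and s in labels and s in prices
    )


# Relational layout of the same data: one row per product, no joins needed.
product_table = [("p1", "Widget", 10), ("p2", "Gadget", 25)]


def products_via_table():
    """Fetch (label, price) of products with a single scan."""
    return sorted((label, price) for _, label, price in product_table)


print(products_via_self_joins())
print(products_via_table())
```

    Both functions return the same rows, but the triple-table version needs one selection per attribute plus joins on the subject, which is the extra join effort the abstract refers to.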

    Distributed XQuery and Updates Processing with Heterogeneous XQuery Engines

    We demonstrate XRPC, a minimal XQuery extension that enables distributed querying between heterogeneous XQuery engines. The XRPC language extension enhances the existing concept of XQuery functions with the Remote Procedure Call (RPC) paradigm. XRPC is orthogonal to all XQuery features, including the XQuery Update Facility (XQUF). Note that executing XQUF updating functions over XRPC leads to the phenomenon of distributed transactions. XRPC achieves heterogeneity by an open SOAP-based network protocol that can be implemented by any engine, and an XRPC Wrapper that allow