228 research outputs found

    GRATIN: Accelerating Graph Traversals in Main-Memory Column Stores

    Get PDF
    Native graph query and processing capabilities have become indispensable for modern business applications in enterprise-critical operations on data that is stored in relational database management systems. Traversal operations are a basic ingredient of graph algorithms and graph queries. As a consequence, they are fundamental for querying graph data in a relational database management system. In this paper we present gratin, a concise secondary index structure to speedup graph traversals in main-memory column stores. Conventional approaches for graph traversals rely on repeated full column scans, making it an inefficient approach for deep traversals on very large graphs. To tackle this challenge, we devise a novel and adaptive block-based index to handle graphs efficiently. Most importantly, gratin is updateable in constant time and allows supporting evolving graphs with frequent updates to the graph topology. We conducted an extensive evaluation on real-world data sets from different domains for a large variety of traversal queries. Our experiments show improvements of up to an order of magnitude compared to a scan-based traversal algorithm

    An Application-Specific Instruction Set for Accelerating Set-Oriented Database Primitives

    Get PDF
    The key task of database systems is to efficiently manage large amounts of data. A high query throughput and a low query latency are essential for the success of a database system. Lately, research focused on exploiting hardware features like superscalar execution units, SIMD, or multiple cores to speed up processing. Apart from these software optimizations for given hardware, even tailor-made processing circuits running on FPGAs are built to run mostly stateless query plans with incredibly high throughput. A similar idea, which was already considered three decades ago, is to build tailor-made hardware like a database processor. Despite their superior performance, such application-specific processors were not considered to be beneficial because general-purpose processors eventually always caught up so that the high development costs did not pay off. In this paper, we show that the development of a database processor is much more feasible nowadays through the availability of customizable processors. We illustrate exemplarily how to create an instruction set extension for set-oriented database rimitives. The resulting application-specific processor provides not only a high performance but it also enables very energy-efficient processing. Our processor requires in various configurations more than 960x less energy than a high-end x86 processor while providing the same performance

    MetaXMorph: Hierarchical Transformation of Data with Metadata

    Get PDF
    This research is about transforming data. Data comes in different shapes; it can be structured as a graph, a tree, a collection of tables, or some other shape. In this thesis, we focus on data structured as a tree, which is known as hierarchical data. The same data could be structured in many different tree shapes. Previously it was shown how to transform data from one tree shape, one hierarchy to another without losing any information. But sometimes the pieces of the hierarchy are annotated or associated with metadata, that is, with data about the data itself. The metadata can have special semantics that must be preserved when the data is transformed. Previous research also sketched how to transform hierarchical data annotated with metadata without losing information while preserving the semantics of the metadata. In this thesis, we implement the research on transforming data with metadata by extending XMorph, a data transformation language. And we evaluate the extension showing that the overhead is modest

    Distribution Policies for Datalog

    Get PDF
    Modern data management systems extensively use parallelism to speed up query processing over massive volumes of data. This trend has inspired a rich line of research on how to formally reason about the parallel complexity of join computation. In this paper, we go beyond joins and study the parallel evaluation of recursive queries. We introduce a novel framework to reason about multi-round evaluation of Datalog programs, which combines implicit predicate restriction with distribution policies to allow expressing a combination of data-parallel and query-parallel evaluation strategies. Using our framework, we reason about key properties of distributed Datalog evaluation, including parallel-correctness of the evaluation strategy, disjointness of the computation effort, and bounds on the number of communication rounds
    • …
    corecore