508 research outputs found

    Forecasting the cost of processing multi-join queries via hashing for main-memory databases (Extended version)

    Full text link
    Database management systems (DBMSs) carefully optimize complex multi-join queries to avoid expensive disk I/O. As servers today feature tens or hundreds of gigabytes of RAM, a significant fraction of many analytic databases becomes memory-resident. Even after careful tuning for an in-memory environment, a linear disk I/O model such as the one implemented in PostgreSQL may make query response time predictions that are up to 2X slower than the optimal multi-join query plan over memory-resident data. This paper introduces a memory I/O cost model to identify good evaluation strategies for complex query plans with multiple hash-based equi-joins over memory-resident data. The proposed cost model is carefully validated for accuracy using three different systems, including an Amazon EC2 instance, to control for hardware-specific differences. Prior work in parallel query evaluation has advocated right-deep and bushy trees for multi-join queries due to their greater parallelization and pipelining potential. A surprising finding is that the conventional wisdom from shared-nothing disk-based systems does not directly apply to the modern shared-everything memory hierarchy. As corroborated by our model, the performance gap between the optimal left-deep and right-deep query plan can grow to about 10X as the number of joins in the query increases.Comment: 15 pages, 8 figures, extended version of the paper to appear in SoCC'1

    Flexible and efficient IR using array databases

    Get PDF
    textabstractThe Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping) that works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language, in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, that provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage

    Flexible and efficient IR using array databases

    Get PDF
    The Matrix Framework is a recent proposal by IR researchers to flexibly represent all important information retrieval models in a single multi-dimensional array framework. Computational support for exactly this framework is provided by the array database system SRAM (Sparse Relational Array Mapping) that works on top of a DBMS. Information retrieval models can be specified in its comprehension-based array query language, in a way that directly corresponds to the underlying mathematical formulas. SRAM efficiently stores sparse arrays in (compressed) relational tables and translates and optimizes array queries into relational queries. In this work, we describe a number of array query optimization rules and demonstrate their effect on text retrieval in the TREC TeraByte track (TREC-TB) efficiency task, using the Okapi BM25 model as our example. It turns out that these optimization rules enable SRAM to automatically translate the BM25 array queries into the relational equivalent of inverted list processing including compression, score materialization and quantization, such as employed by custom-built IR systems. The use of the high-performance MonetDB/X100 relational backend, that provides transparent database compression, allows the system to achieve very fast response times with good precision and low resource usage

    Doctor of Philosophy

    Get PDF
    dissertationEmerging trends such as growing architectural diversity and increased emphasis on energy and power efficiency motivate the need for code that adapts to its execution context (input dataset and target architecture). Unfortunately, writing such code remains difficult, and is typically attempted only by a small group of motivated expert programmers who are highly knowledgeable about the relationship between software and its hardware mapping. In this dissertation, we introduce novel abstractions and techniques based on automatic performance tuning that enable both experts and nonexperts (application developers) to produce adaptive code. We present two new frameworks for adaptive programming: Nitro and Surge. Nitro enables expert programmers to specify code variants, or alternative implementations of the same computation, together with meta-information for selecting among them. It then utilizes supervised classification to select an optimal code variant at runtime based on characteristics of the execution context. Surge, on the other hand, provides a high-level nested data-parallel programming interface for application developers to specify computations. It then employs a two-level mechanism to automatically generate code variants and then tunes them using Nitro. The resulting code performs on par with or better than handcrafted reference implementations on both CPUs and GPUs. In addition to abstractions for expressing code variants, this dissertation also presents novel strategies for adaptively tuning them. First, we introduce a technique for dynamically selecting an optimal code variant at runtime based on characteristics of the input dataset. On five high-performance GPU applications, variants tuned using this strategy achieve over 93% of the performance of variants selected through exhaustive search. Next, we present a novel approach based on multitask learning to develop a code variant selection model on a target architecture from training on different source architectures. We evaluate this approach on a set of six benchmark applications and a collection of six NVIDIA GPUs from three distinct architecture generations. Finally, we implement support for combined code variant and frequency selection based on multiple objectives, including power and energy efficiency. Using this strategy, we construct a GPU sorting implementation that provides improved energy and power efficiency with less than a proportional drop in sorting throughput

    Graph Data Processing and Analysis: From Algorithms to System Development

    Full text link
    There are many real-world application domains where data can be naturally modelled as graphs, such as social networks and computer networks. The amount of data generated and published is rapidly increasing with the explosion of information. Effective storage of graph data and querying has become a significant challenge; hence the graph database is emerging to address this challenge. Graph databases have the unique advantages of modelling and querying complex relationships, capturing and navigating complex data relationships and recursive path querying when handling graph data. In this thesis, we enhance graph databases from both system and algorithm perspectives. Firstly, we propose two systems, SQL2Cypher and FSPS, to improve the usability and efficiency of graph databases. SQL2Cypher automatically migrates data from a relational database to a graph database. This system also supports translating SQL queries into Cypher queries. FSPS is the first FPGA-based system for accelerating graph queries on massive graphs. FSPS has the following features 1) a CPU-FPGA co-designed framework, 2) a fully pipelined FPGA execution, and 3) reduced data transfer from FPGA’s external memory. FSPS supports the two most fundamental types of graph queries, namely subgraph and path queries. Performance evaluation shows that FSPS outperforms the most popular graph database, Neo4j, by up to three orders of magnitude. All the draft demo videos can be found at https://www.youtube.com/watch?v=oSpHtJ8iVio and https://www.youtube.com/watch?v=eGaeBrVTJws. Secondly, the graph database does not widely support the cohesive subgraph models (i.e., Neo4j and PatMat). Many real-world relationships can be naturally represented as bipartite graphs such as customer-product, user-item, and author-paper. Therefore, we use efficient construct algorithms to investigate the bipartite hierarchy model. The bipartite hierarchy is the first model to discover the hierarchical structure of bipartite graphs based on the concept of (alpha, beta)-core and graph connectivity. These algorithms can effectively identify the affected regions to limit computation scope and avoid re-building the bipartite hierarchy from scratch. Extensive experiments on 10 real-world graphs demonstrate the effectiveness of the proposed bipartite hierarchy and validate the efficiency of our hierarchy constructions algorithms

    High Performance Matrix-Fee Method for Large-Scale Finite Element Analysis on Graphics Processing Units

    Get PDF
    This thesis presents a high performance computing (HPC) algorithm on graphics processing units (GPU) for large-scale numerical simulations. In particular, the research focuses on the development of an efficient matrix-free conjugate gradient solver for the acceleration and scalability of the steady-state heat transfer finite element analysis (FEA) on a three-dimension uniform structured hexahedral mesh using a voxel-based technique. One of the greatest challenges in large-scale FEA is the availability of computer memory for solving the linear system of equations. Like in large-scale heat transfer simulations, where the size of the system matrix assembly becomes very large, the FEA solver requires huge amounts of computational time and memory that very often exceed the actual memory limits of the available hardware resources. To overcome this problem a matrix-free conjugate gradient (MFCG) method is designed and implemented to finite element computations which avoids the global matrix assembly. The main difference of the MFCG to the classical conjugate gradient (CG) solver lies on the implementation of the matrix-vector product operation. Matrix-vector operation found to be the most expensive process consuming more than 80% out of the total computations for the numerical solution and thus a matrix-free matrix-vector (MFMV) approach becomes beneficial for saving memory and computational time throughout the execution of the FEA. In summary, the MFMV algorithm consists of three nested loops: (a) a loop over the mesh elements of the domain, (b) a loop on the element nodal values to perform the element matrix-vector operations and (c) the summation and transformation of the nodal values into their correct positions in the global index. A performance analysis on a serial and a parallel implementation on a GPU shows that the MFCG solver outperforms the classical CG consuming significantly lower amounts of memory allowing for much larger size simulations. The outcome of this study suggests that the MFCG can also speed-up and scale the execution of large-scale finite element simulations

    Super-scalar RAM-CPU cache compression

    Get PDF
    High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even when high-e

    Scalable Integration View Computation and Maintenance with Parallel, Adaptive and Grouping Techniques

    Get PDF
    Materialized integration views constructed by integrating data from multiple distributed data sources help to achieve better access, reliable performance, and high availability for a wide range of applications. In this dissertation, we propose parallel, adaptive, and grouping techniques to address scalability challenges in high-performance integration view computation and maintenance due to increasingly large data sources and high rates of source updates. State-of-the-art parallel integration view computation makes the common assumption that the maximal pipelined parallelism leads to superior performance. We instead propose segmented bushy parallel processing that combines pipelined parallelism with alternate forms of parallelism to achieve an overall more effective strategy. Experimental studies conducted over a cluster of high-performance PCs confirm that the proposed strategy has an on average of 50\% improvement in terms of total processing time in comparison to existing solutions. Run-time adaptation becomes critical for parallel integration view computation due to its long running and memory intensive nature. We investigate two types of state level adaptations, namely, state spill and state relocation, to address the run-time memory shortage. We propose lazy-disk and active-disk approaches that integrate both adaptations to maximize run-time query throughput in a memory constrained environment. We also propose global throughput-oriented state adaptation strategies for computation plans with multiple state intensive operators. Extensive experiments confirm the effectiveness of our proposed adaptation solutions. Once results have been computed and materialized, it\u27s typically more efficient to maintain them incrementally instead of full recomputation. However, state-of-the-art incremental view maintenance require O(n2n^2) maintenance queries with n being the number of data sources that the view is defined upon. Moreover, they do not exploit view definitions and data source processing capabilities to further improve view maintenance performance. We propose novel grouping maintenance algorithms that dramatically reduce the number of maintenance queries to (O(n)). A cost-based view maintenance framework has been proposed to generate optimized maintenance plans tuned to particular environmental settings. Extensive experimental studies verify the effectiveness of our maintenance algorithms as well as the maintenance framework

    10381 Summary and Abstracts Collection -- Robust Query Processing

    Get PDF
    Dagstuhl seminar 10381 on robust query processing (held 19.09.10 - 24.09.10) brought together a diverse set of researchers and practitioners with a broad range of expertise for the purpose of fostering discussion and collaboration regarding causes, opportunities, and solutions for achieving robust query processing. The seminar strove to build a unified view across the loosely-coupled system components responsible for the various stages of database query processing. Participants were chosen for their experience with database query processing and, where possible, their prior work in academic research or in product development towards robustness in database query processing. In order to pave the way to motivate, measure, and protect future advances in robust query processing, seminar 10381 focused on developing tests for measuring the robustness of query processing. In these proceedings, we first review the seminar topics, goals, and results, then present abstracts or notes of some of the seminar break-out sessions. We also include, as an appendix, the robust query processing reading list that was collected and distributed to participants before the seminar began, as well as summaries of a few of those papers that were contributed by some participants

    Super-Scalar RAM-CPU Cache Compression

    Get PDF
    High-performance data-intensive query processing tasks like OLAP, data mining or scientific data analysis can be severely I/O bound, even whe
    • …
    corecore