125 research outputs found

    An Empirical Evaluation of XQuery Processors

    Get PDF
    This paper presents an extensive and detailed experimental evaluation of XQuery processors. The study consists of running five publicly available XQuery benchmarks --- the Michigan benchmark (MBench), XBench, XMach-1, XMark and X007 --- on six XQuery processors, three stand-alone (file-based) XQuery processors (Galax, Qizx/Open, Saxon-B) and three XML/XQuery database systems (BerkeleyDB/XML, MonetDB/XQuery, X-Hive/DB). Next to assessing and comparing the functionality, performance and scalability for the various systems, the major focus of this work is to report in detail about the experiences made while performing such an exhaustive study, to discuss all the problems that we encountered and how we solved them, and hence to hopefully provide some guidelines (or even a recipe) for performing reproducible large-scale experime

    An Empirical Evaluation of XQuery Processors

    Get PDF
    This paper presents an extensive and detailed experimental evaluation of XQuery processors. The study consists of running five publicly available XQuery benchmarks --- the Michigan benchmark (MBench), XBench, XMach-1, XMark and X007 --- on six XQuery processors, three stand-alone (file-based) XQuery processors (Galax, Qizx/Open, Saxon-B) and three XML/XQuery database systems (BerkeleyDB/XML, MonetDB/XQuery, X-Hive/DB). Next to assessing and comparing the functionality, performance and scalability for the various systems, the major focus of this work is to report in detail about the experiences made while performing such an exhaustive study, to discuss all the problems that we encountered and how we solved them, and hence to hopefully provide some guidelines (or even a recipe) for performing reproducible large-scale experimental research and system evaluation

    Efficient resource utilization in shared-everything environments

    Get PDF
    Efficient resource usage is a key to achieve better performance in parallel database systems. Up to now, most research has focussed on balancing the load on several resources of the same type, i.e. balancing either CPU load or I/O load. In this paper, we present emph{floating probe, a strategy for parallel evaluation of pipelining segments in a shared-everything environment that provides dynamic load balancing between CPU- and I/O-resources. The key idea of floating probe is to overlap---as much as possible with respect to data dependencies---I/O-bound build phase and CPU-bound probe phase of pipelining segments to improve resource utilization. Simulation results show, that floating probe achieves shorter execution times while consuming less memory than conventional pipelining strategies

    Cracking the database store

    Get PDF
    Query performance strongly depends on finding an execution plan that touches as few superfluous tuples as possible. The access structures d

    Adaptive indexing in modern database kernels

    Get PDF
    Physical design represents one of the hardest problems for database management systems. Without proper tuning, systems cannot achieve good performance. Offline indexing creates indexes a priori assuming good workload knowledge and idle time. More recently, online indexing monitors the workload trends and creates or drops indexes online. Adaptive indexing takes another step towards completely automating the tuning process of a database system, by enabling incremental and partial online indexing. The main idea is that physical design changes continuously, adaptively, partially, incrementally and on demand while processing queries as part of the execution operators. As such it brings a plethora of opportunities for rethinking and improving every single corner of database system design. We will analyze the indexing space between offline, online and adaptive indexing through several state of the art indexing techniques, e. g., what-if analysis and soft indexes. We will discuss in detail adaptive indexing techniques such as database cracking, adaptive merging, sideways cracking and various hybrids that try to balance the online tuning overhead with the convergence speed to optimal performance. In addition, we will discuss how various aspects of modern techniques for database architectures, such as vectorization, bulk processing, column-store execution and storage affect adaptive indexing. Finally, we will discuss several open research topics towards fully automomous database kernels

    Big Data

    Get PDF

    Storing XML Documents in Databases

    Get PDF
    The authors introduce concepts for loading large amounts of XML documents into databases where the documents are stored and maintained. The goal is to make XML databases as unobtrusive in multi-tier systems as possible and at the same time provide as many services defined by the XML standards as possible. The ubiquity of XML has sparked great interest in deploying concepts known from Relational Database Management Systems such as declarative query languages, transactions, indexes and integrity constraints. This chapter presents now bulkloading is done in Monet XML, a main memory XML database system, and evaluates the cost of bulkloading and bulk deletion with respect to strategies which base on insertion and deletion of individual nodes. Additionally, we survey the applicability of the techniques to a wider class of XML storage schemas

    Self-organizing tuple reconstruction in column-stores

    Get PDF
    Column-stores gained popularity as a promising physical design alternative. Each attribute of a relation is physically stored as a separate column allowing queries to load only the required attributes. The overhead incurred is on-the-fly tuple reconstruction for multi-attribute queries. Each tuple reconstruction is a join of two columns based on tuple IDs, making it a significant cost component. The ultimate physical design is to have multiple presorted copies of each base table such that tuples are already appropriately organized in multiple different orders across the various columns. This requires the ability to predict the workload, idle time to prepare, and infrequent updates. In this paper, we propose a novel design, \emph{partial sideways cracking}, that minimizes the tuple rec

    Benchmarking adaptive indexing

    Get PDF
    Ideally, realizing the best physical design for the current and all subsequent workloads would impact neither performance nor storage usage. In reality, workloads and datasets can change dramatically over time and index creation impacts the performance of concurrent user and system activity. We propose a framework that evaluates the key premise of adaptive indexing --- a new indexing paradigm where index creation and re-organization take place automatically and incrementally, as a side-effect of query execution. We focus on how the incremental costs and benefits of dynamic reorganization are distributed across the workload's lifetime. We believe measuring the costs and utility of the stages of adaptation are relevant metrics for evaluating new query processing paradigms and comparing them to traditional approaches
    • …
    corecore