26 research outputs found

    Database Performance Tuning Methods for Manufacturing Execution System

    Get PDF
    In manufacturing industry where data are produced and shared every day, data volumes could be large enough for the database performance to become an issue. Manufacturing Execution System (MES) is such a system that cannot tolerate with poor database performance as the system relies heavily on real-time reporting that requires instance query responses. Manufacturing products’ quality and production targets can be affected as the result of delayed queries. Therefore, the need to maintain the acceptable level of database performance in this domain is crucial. One task in maintaining database performance is identification and diagnosis of the root causes that may cause delayed queries. Poor query design has been identified as one major cause of delayed queries that affect real-time reporting. Nevertheless, as various methods available to deal with poor query design, it is important for a database administrator to decide the method or combination of methods that work best. In this paper, we present a case study on the methods used by a real manufacturing industry company called as Silterra and the methods proposed in the literature that deal with poor query design. For each method, we elicit its strength and weaknesses and analyse the practical implementation of it

    Left Bit Right: For SPARQL Join Queries with OPTIONAL Patterns (Left-outer-joins)

    Full text link
    SPARQL basic graph pattern (BGP) (a.k.a. SQL inner-join) query optimization is a well researched area. However, optimization of OPTIONAL pattern queries (a.k.a. SQL left-outer-joins) poses additional challenges, due to the restrictions on the \textit{reordering} of left-outer-joins. The occurrence of such queries tends to be as high as 50% of the total queries (e.g., DBPedia query logs). In this paper, we present \textit{Left Bit Right} (LBR), a technique for \textit{well-designed} nested BGP and OPTIONAL pattern queries. Through LBR, we propose a novel method to represent such queries using a graph of \textit{supernodes}, which is used to aggressively prune the RDF triples, with the help of compressed indexes. We also propose novel optimization strategies -- first of a kind, to the best of our knowledge -- that combine together the characteristics of \textit{acyclicity} of queries, \textit{minimality}, and \textit{nullification}, \textit{best-match} operators. In this paper, we focus on OPTIONAL patterns without UNIONs or FILTERs, but we also show how UNIONs and FILTERs can be handled with our technique using a \textit{query rewrite}. Our evaluation on RDF graphs of up to and over one billion triples, on a commodity laptop with 8 GB memory, shows that LBR can process \textit{well-designed} low-selectivity complex queries up to 11 times faster compared to the state-of-the-art RDF column-stores as Virtuoso and MonetDB, and for highly selective queries, LBR is at par with them.Comment: SIGMOD 201

    Database system architecture supporting coexisting query languages and data models

    Get PDF
    SIGLELD:D48239/84 / BLDSC - British Library Document Supply CentreGBUnited Kingdo

    Design and Evaluation of Small-Large Outer Joins in Cloud Computing Environments

    Get PDF
    Large-scale analytics is a key application area for data processing and parallel computing research. One of the most common (and challenging) operations in this domain is the join. Though inner join approaches have been extensively evaluated in parallel and distributed systems, there is little published work providing analysis of outer joins, especially in the extremely popular cloud computing environments. A common type of outer join is the small-large outer join, where one relation is relatively small and the other is large. Conventional implementations on this condition, such as one based on hash redistribution, often incur significant network communication, while the duplication-based approaches are complex and inefficient. In this work, we present a new method called DDR (duplication and direct redistribution), which aims to enable efficient small-large outer joins in cloud computing environments while being easy to implement using existing predicates in data processing frameworks. We present the detailed implementation of our approach and evaluate its performance through extensive experiments over the widely used MapReduce and Spark platforms. We show that the proposed method is scalable and can achieve significant performance improvements over the conventional approaches. Compared to the state-of-art method, the DDR algorithm is shown to be easier to implement and can achieve very similar or better performance under different outer join workloads, and thus, can be considered as a new option for current data analysis applications. Moreover, our detailed experimental results also have provided insights of current small-large outer join implementations, thereby allowing system developers to make a more informed choice for their data analysis applications

    Query Flattening and the Nested Data Parallelism Paradigm

    Get PDF
    This work is based on the observation that languages for two seemingly distant domains are closely related. Orthogonal query languages based on comprehension syntax admit various forms of query nesting to construct nested query results and express complex predicates. Languages for nested data parallelism allow to nest parallel iterators and thereby admit the parallel evaluation of computations that are themselves parallel. Both kinds of languages center around the application of side-effect-free functions to each element of a collection. The motivation for this work is the seamless integration of relational database queries with programming languages. In frameworks for language-integrated database queries, a host language's native collection-programming API is used to express queries. To mediate between native collection programming and relational queries, we define an expressive, orthogonal query calculus that supports nesting and order. The challenge of query flattening is to translate this calculus to bundles of efficient relational queries restricted to flat, unordered multisets. Prior approaches to query flattening either support only query languages that lack in expressiveness or employ a complex, monolithic translation that is hard to comprehend and generates inefficient code that is hard to optimize. To improve on those approaches, we draw on the similarity to nested data parallelism. Blelloch's flattening transformation is a static program transformation that translates nested data parallelism to flat data parallel programs over flat arrays. Based on the flattening transformation, we describe a pipeline of small, comprehensible lowering steps that translates our nested query calculus to a bundle of relational queries. The pipeline is based on a number of well-defined intermediate languages. Our translation adopts the key concepts of the flattening transformation but is designed with specifics of relational query processing in mind. Based on this translation, we revisit all aspects of query flattening. Our translation is fully compositional and can translate any term of the input language. Like prior work, the translation by itself produces inefficient code due to compositionality that is not fit for execution without optimization. In contrast to prior work, we show that query optimization is orthogonal to flattening and can be performed before flattening. We employ well-known work on logical query optimization for nested query languages and demonstrate that this body of work integrates well with our approach. Furthermore, we describe an improved encoding of ordered and nested collections in terms of flat, unordered multisets. Our approach emits idiomatic relational queries in which the effort required to maintain the non-relational semantics of the source language (order and nesting) is minimized. A set of experiments provides evidence that our approach to query flattening can handle complex, list-based queries with nested results and nested intermediate data well. We apply our approach to a number of flat and nested benchmark queries and compare their runtime with hand-written SQL queries. In these experiments, our SQL code generated from a list-based nested query language usually performs as well as hand-written queries

    Efficient handling of SPARQL OPTIONAL for OBDA

    Get PDF
    OPTIONAL is a key feature in SPARQL for dealing with missing information. While this operator is used extensively, it is also known for its complexity, which can make efficient evaluation of queries with OPTIONAL challenging. We tackle this problem in the Ontology-Based Data Access (OBDA) setting, where the data is stored in a SQL relational database and exposed as a virtual RDF graph by means of an R2RML mapping. We start with a succinct translation of a SPARQL fragment into SQL. It fully respects bag semantics and three-valued logic and relies on the extensive use of the LEFT JOIN operator and COALESCE function. We then propose optimisation techniques for reducing the size and improving the structure of generated SQL queries. Our optimisations capture interactions between JOIN, LEFT JOIN, COALESCE and integrity constraints such as attribute nullability, uniqueness and foreign key constraints. Finally, we empirically verify effectiveness of our techniques on the BSBM OBDA benchmark

    Load-balancing distributed outer joins through operator decomposition

    Get PDF
    High-performance data analytics largely relies on being able to efficiently execute various distributed data operators such as distributed joins. So far, large amounts of join methods have been proposed and evaluated in parallel and distributed environments. However, most of them focus on inner joins, and there is little published work providing the detailed implementations and analysis of outer joins. In this work, we present POPI (Partial Outer join & Partial Inner join), a novel method to load-balance large parallel outer joins by decomposing them into two operations: a large outer join over data that does not present significant skew in the input and an inner join over data presenting significant skew. We present the detailed implementation of our approach and show that POPI is implementable over a variety of architectures and underlying join implementations. Moreover, our experimental evaluation over a distributed memory platform also demonstrates that the proposed method is able to improve outer join performance under varying data skew and present excellent load-balancing properties, compared to current approaches

    A solution for synchronous incremental maintenance of materialized views based on SQL recursive query

    Get PDF
    Materialized views are excessively stored query execution results in the database. They can be used to partially or completely answer queries which will be further appeared instead of re-executing query from the scratch. There is a large number of published works that address the maintenance, especially incremental update, of materialized views and query rewriting for using those ones. Some of them support materialized views based on recursive query in datalog language. Although most of datalog queries can be transferred into SQL queries and vise versa but it is not the case for recursive queries. Recursive queries in the data log try to find all possible transitive closures. Recursive queries in SQL (Common Table Expression – CTE) return direct links but not transitive closures. In this paper, we propose efficient methods for incremental update of materialized views based on CTE; and then propose an algorithm for generating source codes in C language for any input SQL recursive queries. The synthesized source codes implement our proposed incremental update algorithms according to inserted/deleted/updated record set in the base tables. This paper focuses mainly on the recursive queries whose execution results are directed tree-structured data. The two cases of tree node are considered. In the first case, a child node has only one parent node and in the second case, a child node can have many parent nodes. Those two cases represent the two types of relationships between entities in real world, that are one–to–many and many–to–many, respectively. For the one–to–many relationships, the relationship data is accompanied with the records describing the child using some fields. Those fields are set as null in deleting a concrete relationship. For the many–to–many relationships, it is stored in a separate table and the concrete relationships are removed by deleting describing records from that table. Considering of enforcing referential integrity may help to reduce the searching space and therefore, help to improve the performance. However, the set of tree nodes or tree edges can be manipulated. All those combinations lead to different algorithms. The experimental results are provided and discussed to confirm the effectiveness of our proposed method
    corecore