133 research outputs found

    Performance analysis of a dynamic query processing scheme

    Get PDF
    Traditional query optimizers produce a fixed query evaluation plan based on assumptions about data distribution and processor workloads. However, these assumptions may not hold at query execution time. In this paper, we propose a dynamic query processing scheme and we present the performance results obtained by simulation of a queueing network model of the proposed software architecture

    Implementation of Query Optimization for Reducing Run Time

    Get PDF
    Query optimization is the process of selecting the most efficient query-evaluation plan from many strategies so, In this paper  we have developed a technique that performs query optimization at compile-time to reduce the burden of optimization at run-time to improve the performance of the code execution. using histograms that are computed from the data and these histograms are used to get the estimate of selectivity for query joins and predicates in a query at compile-time. With these estimates, a query plan is constructed at compile-time and executed it at run-time Keywords: runtime, query optimization ,compile time histogra

    Kaskade: Graph Views for Efficient Graph Analytics

    Full text link
    Graphs are an increasingly popular way to model real-world entities and relationships between them, ranging from social networks to data lineage graphs and biological datasets. Queries over these large graphs often involve expensive subgraph traversals and complex analytical computations. These real-world graphs are often substantially more structured than a generic vertex-and-edge model would suggest, but this insight has remained mostly unexplored by existing graph engines for graph query optimization purposes. Therefore, in this work, we focus on leveraging structural properties of graphs and queries to automatically derive materialized graph views that can dramatically speed up query evaluation. We present KASKADE, the first graph query optimization framework to exploit materialized graph views for query optimization purposes. KASKADE employs a novel constraint-based view enumeration technique that mines constraints from query workloads and graph schemas, and injects them during view enumeration to significantly reduce the search space of views to be considered. Moreover, it introduces a graph view size estimator to pick the most beneficial views to materialize given a query set and to select the best query evaluation plan given a set of materialized views. We evaluate its performance over real-world graphs, including the provenance graph that we maintain at Microsoft to enable auditing, service analytics, and advanced system optimizations. Our results show that KASKADE substantially reduces the effective graph size and yields significant performance speedups (up to 50X), in some cases making otherwise intractable queries possible

    A query processing system for very large spatial databases using a new map algebra

    Get PDF
    Dans cette thĂšse nous introduisons une approche de traitement de requĂȘtes pour des bases de donnĂ©e spatiales. Nous expliquons aussi les concepts principaux que nous avons dĂ©fini et dĂ©veloppĂ©: une algĂšbre spatiale et une approche Ă  base de graphe utilisĂ©e dans l'optimisateur. L'algĂšbre spatiale est dĂ©fini pour exprimer les requĂȘtes et les rĂšgles de transformation pendant les diffĂ©rentes Ă©tapes de l'optimisation de requĂȘtes. Nous avons essayĂ© de dĂ©finir l'algĂšbre la plus complĂšte que possible pour couvrir une grande variĂ©tĂ© d'application. L'opĂ©rateur algĂ©brique reçoit et produit seulement des carte. Les fonctions reçoivent des cartes et produisent des scalaires ou des objets. L'optimisateur reçoit la requĂȘte en expression algĂ©brique et produit un QEP (Query Evaluation Plan) efficace dans deux Ă©tapes: gĂ©nĂ©ration de QEG (Query Evaluation Graph) et gĂ©nĂ©ration de QEP. Dans premiĂšre Ă©tape un graphe (QEG) Ă©quivalent de l'expression algĂ©brique est produit. Les rĂšgles de transformation sont utilisĂ©es pour transformer le graphe a un Ă©quivalent plus efficace. Dans deuxiĂšme Ă©tape un QEP est produit de QEG passĂ© de l'Ă©tape prĂ©cĂ©dente. Le QEP est un ensemble des opĂ©rations primitives consĂ©cutives qui produit les rĂ©sultats finals (la rĂ©ponse finale de la requĂȘte soumise au base de donnĂ©e). Nous avons implĂ©mentĂ© l'optimisateur, un gĂ©nĂ©rateur de requĂȘte spatiale alĂ©atoire, et une base de donnĂ©e simulĂ©e. La base de donnĂ©e spatiale simulĂ©e est un ensemble de fonctions pour simuler des opĂ©rations spatiales primitives. Les requĂȘtes alĂ©atoires sont soumis Ă  l'optimisateur. Les QEPs gĂ©nĂ©rĂ©es sont soumis au simulateur de base de donnĂ©es spatiale. Les rĂ©sultats expĂ©rimentaux sont utilisĂ©s pour discuter les performances et les caractĂ©ristiques de l'optimisateur.Abstract: In this thesis we introduce a query processing approach for spatial databases and explain the main concepts we defined and developed: a spatial algebra and a graph based approach used in the optimizer. The spatial algebra was defined to express queries and transformation rules during different steps of the query optimization. To cover a vast variety of potential applications, we tried to define the algebra as complete as possible. The algebra looks at the spatial data as maps of spatial objects. The algebraic operators act on the maps and result in new maps. Aggregate functions can act on maps and objects and produce objects or basic values (characters, numbers, etc.). The optimizer receives the query in algebraic expression and produces one efficient QEP (Query Evaluation Plan) through two main consecutive blocks: QEG (Query Evaluation Graph) generation and QEP generation. In QEG generation we construct a graph equivalent of the algebraic expression and then apply graph transformation rules to produce one efficient QEG. In QEP generation we receive the efficient QEG and do predicate ordering and approximation and then generate the efficient QEP. The QEP is a set of consecutive phases that must be executed in the specified order. Each phase consist of one or more primitive operations. All primitive operations that are in the same phase can be executed in parallel. We implemented the optimizer, a randomly spatial query generator and a simulated spatial database. The query generator produces random queries for the purpose of testing the optimizer. The simulated spatial database is a set of functions to simulate primitive spatial operations. They return the cost of the corresponding primitive operation according to input parameters. We put randomly generated queries to the optimizer, got the generated QEPs and put them to the spatial database simulator. We used the experimental results to discuss on the optimizer characteristics and performance. The optimizer was designed for databases with a very large number of spatial objects nevertheless most of the concepts we used can be applied to all spatial information systems."--RĂ©sumĂ© abrĂ©gĂ© par UMI

    Efficient Multi-way Theta-Join Processing Using MapReduce

    Full text link
    Multi-way Theta-join queries are powerful in describing complex relations and therefore widely employed in real practices. However, existing solutions from traditional distributed and parallel databases for multi-way Theta-join queries cannot be easily extended to fit a shared-nothing distributed computing paradigm, which is proven to be able to support OLAP applications over immense data volumes. In this work, we study the problem of efficient processing of multi-way Theta-join queries using MapReduce from a cost-effective perspective. Although there have been some works using the (key,value) pair-based programming model to support join operations, efficient processing of multi-way Theta-join queries has never been fully explored. The substantial challenge lies in, given a number of processing units (that can run Map or Reduce tasks), mapping a multi-way Theta-join query to a number of MapReduce jobs and having them executed in a well scheduled sequence, such that the total processing time span is minimized. Our solution mainly includes two parts: 1) cost metrics for both single MapReduce job and a number of MapReduce jobs executed in a certain order; 2) the efficient execution of a chain-typed Theta-join with only one MapReduce job. Comparing with the query evaluation strategy proposed in [23] and the widely adopted Pig Latin and Hive SQL solutions, our method achieves significant improvement of the join processing efficiency.Comment: VLDB201

    Efficient Generation and Execution of DAG-Structured Query Graphs

    Get PDF
    Traditional database management systems use tree-structured query evaluation plans. While easy to implement, a tree-structured query evaluation plan is not expressive enough for some optimizations like factoring common algebraic subexpressions or magic sets. These require directed acyclic graphs (DAGs), i.e. shared subplans. This work covers the different aspects of DAG-structured query graphs. First, it introduces a novel framework to reason about sharing of subplans and thus DAG-structured query evaluation plans. Second, it describes the first plan generator capable of generating optimal DAG-structured query evaluation plans. Third, an efficient framework for reasoning about orderings and groupings used by the plan generator is presented. And fourth, a runtime system capable of executing DAG-structured query evaluation plans with minimal overhead is discussed. The experimental results show that with no or only a modest increase of plan generation time, a major reduction of query execution time can be achieved for common queries. This shows that DAG-structured query evaluation plans are serviceable and should be preferred over tree-structured query plans
    • 

    corecore