9 research outputs found

    A solution for synchronous incremental maintenance of materialized views based on SQL recursive query

    Materialized views are query execution results stored redundantly in the database. They can be used to answer subsequent queries partially or completely instead of re-executing those queries from scratch. A large body of published work addresses the maintenance of materialized views, especially incremental update, and query rewriting to make use of them. Some of this work supports materialized views based on recursive queries in the Datalog language. Although most Datalog queries can be translated into SQL queries and vice versa, this is not the case for recursive queries. Recursive queries in Datalog try to find all possible transitive closures. Recursive queries in SQL (Common Table Expressions, CTEs) return direct links but not transitive closures. In this paper, we propose efficient methods for the incremental update of materialized views based on CTEs, and then propose an algorithm that generates C source code for any input SQL recursive query. The synthesized source code implements our proposed incremental update algorithms according to the sets of inserted, deleted, and updated records in the base tables. This paper focuses mainly on recursive queries whose execution results are directed tree-structured data. Two cases of tree node are considered. In the first case, a child node has only one parent node; in the second, a child node can have many parent nodes. These two cases represent two types of relationships between real-world entities: one-to-many and many-to-many, respectively. For one-to-many relationships, the relationship data is stored in fields of the record describing the child, and those fields are set to null when a concrete relationship is deleted. For many-to-many relationships, the relationship data is stored in a separate table, and concrete relationships are removed by deleting the corresponding records from that table. Enforcing referential integrity may help to reduce the search space and therefore improve performance. In addition, either the set of tree nodes or the set of tree edges can be manipulated. All these combinations lead to different algorithms. Experimental results are provided and discussed to confirm the effectiveness of our proposed method.

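    Incremental Refresh of Materialized Query Tables (MQT) Using a Staging Table to Optimize Query Execution Time and Resource Usage (Incremental Refresh Materialized Query Table(Mqt) Memanfaatkan Staging Table Untuk Optimasi Query Execution Time Dan Resources Yang Digunakan)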

    A Materialized Query Table (MQT) stores the data of frequently used queries so that users can obtain that data without recomputation. This can improve system performance by reducing query cost. Data in an MQT must be refreshed periodically so that it does not become stale when the base tables change. Two refresh mechanisms are commonly used: full refresh and incremental refresh. Full refresh recomputes all data from the base tables, whereas incremental refresh processes only the data that has changed, using a staging table. The staging table stores the data changes (deltas) on the base tables to support the incremental refresh process. This study simulates and compares the performance of full refresh and incremental refresh to determine their effect on query execution time and resource usage (I/O and CPU). The data used consists of real data from previous research and system-generated dummy data to support the study. The test results show that, compared with full refresh, incremental refresh improves query execution time by more than 10x and resource usage by more than 50x.
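    The contrast between the two refresh mechanisms can be illustrated with a minimal Python sketch. It is not the study's implementation: the per-key (count, total) aggregate, the tuple layout of the staging entries, and the function names full_refresh and incremental_refresh are all assumptions made for illustration.

```python
def full_refresh(base_rows):
    """Recompute the aggregate from every row of the base table."""
    mqt = {}
    for key, amount in base_rows:
        count, total = mqt.get(key, (0, 0))
        mqt[key] = (count + 1, total + amount)
    return mqt


def incremental_refresh(mqt, staging):
    """Apply only the staged deltas to the MQT, then empty the staging table."""
    for op, key, amount in staging:
        sign = 1 if op == "insert" else -1  # assumed ops: "insert" or "delete"
        count, total = mqt.get(key, (0, 0))
        count, total = count + sign, total + sign * amount
        if count == 0:
            mqt.pop(key, None)  # no base rows left for this key
        else:
            mqt[key] = (count, total)
    staging.clear()
    return mqt


# Hypothetical base table of (key, amount) rows.
base = [("a", 10), ("b", 5), ("a", 7)]
mqt = full_refresh(base)

# Changes to the base table are also recorded as deltas in the staging table.
staging = [("insert", "b", 3), ("delete", "a", 10)]
base.append(("b", 3))
base.remove(("a", 10))

incremental_refresh(mqt, staging)
print(mqt)  # {'a': (1, 7), 'b': (2, 8)}
```

    The incremental path touches one MQT entry per staged delta, while the full path rescans the whole base table; both end in the same state here.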

    PigReuse: A Reuse-based Optimizer for Pig Latin

    Pig Latin is a popular language which is widely used for parallel processing of massive data sets. Currently, subexpressions occurring repeatedly in Pig Latin scripts are executed as many times as they appear, and the current Pig Latin optimizer does not identify reuse opportunities. We present a novel optimization approach aiming at identifying and reusing repeated subexpressions in Pig Latin scripts. Our optimization algorithm, named PigReuse, operates on a particular algebraic representation of Pig Latin scripts. PigReuse identifies subexpression merging opportunities, selects the best ones to execute based on a cost function, and reuses their results as needed in order to compute exactly the same output as the original scripts. Our experiments demonstrate the effectiveness of our approach.

    Performance Optimizations and Operator Semantics for Streaming Data Flow Programs

    Modern companies are able to collect more data and require insights from it faster than ever before. Relational databases do not meet the requirements for processing the often unstructured data sets with reasonable performance. The database research community started to address these trends in the early 2000s. Two new research directions have attracted major interest since then: large-scale non-relational data processing and low-latency data stream processing. Large-scale non-relational data processing, commonly known as "Big Data" processing, was quickly adopted in industry. In parallel, low-latency data stream processing was mainly driven by the research community, which developed new systems that embrace a distributed architecture and scalability and exploit data parallelism. While these systems have gained more and more attention in industry, there are still major challenges in operating them at large scale. The goal of this dissertation is two-fold: first, to investigate the runtime characteristics of large-scale data-parallel distributed streaming systems; and second, to propose the "Dual Streaming Model" to express the semantics of continuous queries over data streams and tables. Our goal is to improve the understanding of system and query runtime behavior with the aim of provisioning queries automatically. We introduce a cost model for streaming data flow programs that takes into account the two techniques of record batching and data parallelization. Additionally, we introduce optimization algorithms that leverage our model for cost-based query provisioning. The proposed Dual Streaming Model expresses the result of a streaming operator as a stream of successive updates to a result table, inducing a duality between streams and tables. Our model natively handles the inconsistency between the logical and the physical order of records within a data stream, which allows for deterministic semantics as well as low-latency query execution.
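    The idea of expressing an operator's result as a stream of updates to a result table can be illustrated with a short Python sketch. It shows only the general stream-table duality and is not the dissertation's formal model; the counting operator and the names count_by_key and materialize are assumptions, and out-of-order records are not handled.

```python
def count_by_key(records):
    """Emit one (key, new_count) update per input record: a changelog stream."""
    counts = {}
    for key in records:
        counts[key] = counts.get(key, 0) + 1
        yield key, counts[key]


def materialize(changelog):
    """Replay a changelog stream; the latest update per key forms the table."""
    table = {}
    for key, value in changelog:
        table[key] = value
    return table


updates = list(count_by_key(["x", "y", "x"]))
print(updates)  # [('x', 1), ('y', 1), ('x', 2)]

table = materialize(updates)
print(table)  # {'x': 2, 'y': 1}
```

    Reading the updates one by one gives the stream view of the result; replaying them gives the table view of the same result.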

    Efficient Generation and Execution of DAG-Structured Query Graphs

    Traditional database management systems use tree-structured query evaluation plans. While easy to implement, a tree-structured query evaluation plan is not expressive enough for some optimizations like factoring common algebraic subexpressions or magic sets. These require directed acyclic graphs (DAGs), i.e. shared subplans. This work covers the different aspects of DAG-structured query graphs. First, it introduces a novel framework to reason about sharing of subplans and thus DAG-structured query evaluation plans. Second, it describes the first plan generator capable of generating optimal DAG-structured query evaluation plans. Third, an efficient framework for reasoning about orderings and groupings used by the plan generator is presented. And fourth, a runtime system capable of executing DAG-structured query evaluation plans with minimal overhead is discussed. The experimental results show that with no or only a modest increase of plan generation time, a major reduction of query execution time can be achieved for common queries. This shows that DAG-structured query evaluation plans are serviceable and should be preferred over tree-structured query plans.

    Equivalence of Queries with Nested Aggregation

    Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting—join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing and other forms of data integration—requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization. In this thesis we address the problem of deciding query equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, and so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections—so-called “scalar” aggregation—we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem has never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. 
Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation.

    Database Organization (Организация баз данных)

    Course description. The course covers the theoretical foundations and the practical methods and tools for building databases, as well as issues related to the life cycle, maintenance, and support of databases. It examines the basic concepts of databases, ways of classifying them, the principles of organizing data structures, and the corresponding types of database management systems (DBMS). The relational data model, normalization theory, and DBMSs that follow this model (using MS SQL Server as an example) are studied in detail, along with SQL, the standard query language for relational DBMSs, and methods for representing complex data structures in a relational DBMS. The course addresses the organization of multi-user access to data and introduces the concepts of referential integrity and semantic integrity of data, transactions, and the associated problems and methods for solving them. It also covers data preservation and security, backup methods, and data compression. An overview is given of hierarchical, non-relational and post-relational, object-oriented, full-text, network, and distributed DBMSs. Students learn to build an ER model with Entity Framework in Visual Studio and to create a database application in C# in the Visual Studio development environment. Abstract of the "Database Organization" discipline. The purpose of teaching the discipline is to develop students' understanding of the role of automated data banks in the creation of information systems. The objectives of the discipline are: to study the data models supported by different DBMSs; to study non-relational models; to study elements of the theory of relational databases; to become familiar with the principles of building a DBMS; and to study distributed DBMSs and the application development tools for these DBMSs.