261 research outputs found

    Complete and equivalent query rewriting using views.

    The Internet of Things as a Privacy-Aware Database Machine

    Instead of using a computer cluster with homogeneous nodes and very fast, high-bandwidth connections, we present the vision of using the Internet of Things (IoT) as a database machine. This is, among other things, a key factor for smart (assistive) systems in apartments (AAL, ambient assisted living), offices (AAW, ambient assisted working), Smart Cities, and factories (IIoT, Industry 4.0). It is important to massively distribute the computation of analysis results onto sensor nodes and other low-resource appliances in the environment, not only for performance, but also for privacy and the protection of corporate knowledge. Functions crucial for assistive systems, such as situation, activity, and intention recognition, must therefore be automatically transformed not only into database queries, but also onto local nodes of lower performance. From a database-specific perspective, analysis operations on large quantities of distributed sensor data, currently based on classical big-data techniques and executed on large, homogeneously equipped parallel computers, have to be automatically transformed to run on billions of processors with energy and capacity restrictions. In this visionary paper, we focus on the database-specific perspective and the fundamental research questions in the underlying database theory.
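
    The query-distribution idea can be made concrete with a standard technique: decomposable aggregates are split into partial states computed on each node, so that only tiny summaries, not raw sensor data, cross the network. A minimal sketch of this pattern (the node names and readings below are hypothetical illustrations, not from the paper):

    ```python
    from typing import Iterable, Tuple

    def node_partial(readings: Iterable[float]) -> Tuple[float, int]:
        """Runs on a sensor node: reduce local readings to a small partial state."""
        s, n = 0.0, 0
        for r in readings:
            s += r
            n += 1
        return s, n

    def coordinator_merge(partials: Iterable[Tuple[float, int]]) -> float:
        """Runs on whichever node acts as coordinator: merge the partial
        states into the global average."""
        total, count = 0.0, 0
        for s, n in partials:
            total += s
            count += n
        return total / count if count else float("nan")

    # Only (sum, count) pairs leave each node, never the raw readings,
    # which serves both the performance and the privacy argument.
    temps_node_a = [21.3, 21.5, 21.4]  # hypothetical sensor readings
    temps_node_b = [19.8, 20.1]
    print(coordinator_merge([node_partial(temps_node_a), node_partial(temps_node_b)]))
    ```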

    Efficient and scalable techniques for minimization and rewriting of conjunctive queries

    Query rewriting as an approach to query answering has been a challenging issue in database and information integration systems. In general, rewriting a conjunctive query Q using a set of views in conjunctive form consists of two phases: (1) generating proper building blocks using the views, and (2) combining them into a union of conjunctive queries that is maximally contained in Q. While query rewriting is known to be exponential in the number of subgoals of Q, there is a demand for increased efficiency for practical queries. We revisit this problem for conjunctive queries and show that Stirling numbers can be used to determine the optimal number of combinations in the second phase, and hence the number of rules in the generated union of conjunctive queries. Based on these numbers, we introduce the notion of combination patterns and develop a rewriting algorithm that uses these patterns to break the large combinatorial problem in the second phase into several smaller ones. The results of our numerous experiments indicate that the proposed rewriting technique outperforms existing techniques, including the MiniCon algorithm, in terms of computation time, memory requirements, and scalability.

    In a related context, we studied query minimization, motivated by the fact that queries with fewer or no redundant subgoals can, in general, be evaluated faster. Such redundancies are often present in automatically generated queries. We propose an algorithm that, given a conjunctive query, repeatedly identifies and eliminates all redundant subgoals, and we illustrate its performance superiority over existing minimization algorithms. It has been shown that query rewriting naturally generates queries with redundant subgoals; we also show that redundant subgoals in the input of query rewriting result in redundant rules in its output. Based on this, we investigate the impact of minimization as pre-processing and post-processing phases of query rewriting. Our experimental results on different synthetic data show that our query rewriting technique, coupled with pre/post minimization phases, produces the best quality of rewriting in a more efficient way than existing rewriting techniques, including the Treewise algorithm.

    It has been shown that extending conjunctive queries with constraints adds to the complexity of query rewriting. Previous studies identified classes of conjunctive queries with constraints in the form of arithmetic comparisons for which the complexity of rewriting does not change; such classes are said to satisfy the homomorphism property. We identify new classes of conjunctive queries with linear arithmetic constraints that enjoy this property and extend our query rewriting algorithm accordingly to support such queries.
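
    The abstract does not spell out which Stirling numbers are meant; assuming Stirling numbers of the second kind S(n, k), which count the ways to partition n items into k nonempty blocks, a minimal sketch of how such counts are computed:

    ```python
    from functools import lru_cache

    @lru_cache(maxsize=None)
    def stirling2(n: int, k: int) -> int:
        """Stirling number of the second kind S(n, k): the number of ways
        to partition n items (e.g., query subgoals) into k nonempty blocks."""
        if n == k:
            return 1
        if k == 0 or k > n:
            return 0
        # Item n either forms a new block on its own (S(n-1, k-1)) or joins
        # one of the k blocks of a partition of the first n-1 items.
        return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

    # E.g., 4 subgoals grouped into 2 blocks: S(4, 2) = 7 candidate groupings.
    print(stirling2(4, 2))
    ```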

    Histogram techniques for cost estimation in query optimization.

    Yu Xiaohui. Thesis (M.Phil.)--Chinese University of Hong Kong, 2001. Includes bibliographical references (leaves 98-115). Abstracts in English and Chinese.

    Contents:
    Chapter 1 --- Introduction --- p.1
    Chapter 2 --- Related Work --- p.6
    Chapter 2.1 --- Query Optimization --- p.6
    Chapter 2.2 --- Query Rewriting --- p.8
    Chapter 2.2.1 --- Optimizing Multi-Block Queries --- p.8
    Chapter 2.2.2 --- Semantic Query Optimization --- p.13
    Chapter 2.2.3 --- Query Rewriting in Starburst --- p.15
    Chapter 2.3 --- Plan Generation --- p.16
    Chapter 2.3.1 --- Dynamic Programming Approach --- p.16
    Chapter 2.3.2 --- Join Query Processing --- p.17
    Chapter 2.3.3 --- Queries with Aggregates --- p.23
    Chapter 2.4 --- Statistics and Cost Estimation --- p.24
    Chapter 2.5 --- Histogram Techniques --- p.27
    Chapter 2.5.1 --- Definitions --- p.28
    Chapter 2.5.2 --- Trivial Histograms --- p.29
    Chapter 2.5.3 --- Heuristic-based Histograms --- p.29
    Chapter 2.5.4 --- V-Optimal Histograms --- p.32
    Chapter 2.5.5 --- Wavelet-based Histograms --- p.35
    Chapter 2.5.6 --- Multidimensional Histograms --- p.35
    Chapter 2.5.7 --- Global Histograms --- p.37
    Chapter 3 --- New Histogram Techniques --- p.39
    Chapter 3.1 --- Piecewise Linear Histograms --- p.39
    Chapter 3.1.1 --- Construction --- p.41
    Chapter 3.1.2 --- Usage --- p.43
    Chapter 3.1.3 --- Error Measures --- p.43
    Chapter 3.1.4 --- Experiments --- p.45
    Chapter 3.1.5 --- Conclusion --- p.51
    Chapter 3.2 --- A-Optimal Histograms --- p.54
    Chapter 3.2.1 --- A-Optimal(mean) Histograms --- p.56
    Chapter 3.2.2 --- A-Optimal(median) Histograms --- p.58
    Chapter 3.2.3 --- A-Optimal(median-cf) Histograms --- p.59
    Chapter 3.2.4 --- Experiments --- p.60
    Chapter 4 --- Global Histograms --- p.64
    Chapter 4.1 --- Wavelet-based Global Histograms --- p.65
    Chapter 4.1.1 --- Wavelet-based Global Histograms I --- p.66
    Chapter 4.1.2 --- Wavelet-based Global Histograms II --- p.68
    Chapter 4.2 --- Piecewise Linear Global Histograms --- p.70
    Chapter 4.3 --- A-Optimal Global Histograms --- p.72
    Chapter 4.3.1 --- Experiments --- p.74
    Chapter 5 --- Dynamic Maintenance --- p.81
    Chapter 5.1 --- Problem Definition --- p.83
    Chapter 5.2 --- Refining Bucket Coefficients --- p.84
    Chapter 5.3 --- Restructuring --- p.86
    Chapter 5.4 --- Experiments --- p.91
    Chapter 6 --- Conclusions --- p.95
    Bibliography --- p.9
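
    For context on what such histograms do, here is a minimal sketch of the simplest member of the family surveyed in Chapter 2: an equi-width histogram used to estimate the selectivity of a range predicate. This illustrates the general idea only, not the thesis's piecewise linear or A-optimal constructions.

    ```python
    class EquiWidthHistogram:
        """Heuristic histogram with equal-width buckets storing row counts.
        Used to estimate the selectivity of range predicates in cost-based
        query optimization."""

        def __init__(self, values, num_buckets=10):
            lo, hi = min(values), max(values)
            width = (hi - lo) / num_buckets or 1.0  # guard: all-equal values
            self.bounds = [lo + i * width for i in range(num_buckets + 1)]
            self.counts = [0] * num_buckets
            self.total = len(values)
            for v in values:
                i = min(int((v - lo) / width), num_buckets - 1)
                self.counts[i] += 1

        def selectivity(self, a, b):
            """Estimate the fraction of rows with a <= value <= b, assuming
            values are spread uniformly inside each bucket."""
            est = 0.0
            for i, c in enumerate(self.counts):
                left, right = self.bounds[i], self.bounds[i + 1]
                overlap = max(0.0, min(b, right) - max(a, left))
                if right > left:
                    est += c * overlap / (right - left)
            return est / self.total

    h = EquiWidthHistogram([1, 2, 2, 3, 5, 8, 13, 21, 34, 55], num_buckets=5)
    print(round(h.selectivity(2, 10), 3))  # estimated selectivity of [2, 10]
    ```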

    Query Rewriting with Symmetric Constraints

    Knowledge Integration to Overcome Ontological Heterogeneity: Challenges from Financial Information Systems

    The shift towards global networking brings with it many opportunities and challenges. In this paper, we discuss key technologies for achieving global semantic interoperability among heterogeneous information systems, including both traditional and web data sources. In particular, we focus on the importance of this capability and on technologies we have designed to overcome ontological heterogeneity, a common type of disparity in financial information systems. Our approach to representing and reasoning about ontological heterogeneities in data sources is an extension of the Context Interchange (COIN) framework, a mediator-based approach for achieving semantic interoperability among heterogeneous sources and receivers. We also analyze the issue of ontological heterogeneity in the context of source selection, and offer a declarative solution that combines symbolic solvers and mixed integer programming techniques in a constraint logic programming framework. Finally, we discuss how these techniques can be coupled with emerging Semantic Web technologies and standards such as Web Services, DAML+OIL, and RuleML to offer scalable solutions for global semantic interoperability. We believe that the synergy of database integration and Semantic Web research can make significant contributions to the financial knowledge integration problem, which has implications for financial services and many other e-business tasks.
    Singapore-MIT Alliance (SMA)
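
    As a hypothetical illustration of the kind of ontological conflict such mediation resolves (this is not the COIN API; all source names, scale factors, and exchange rates below are invented), consider two sources reporting revenue under different currency and scale-factor contexts:

    ```python
    # Two financial sources report "revenue" under different contexts
    # (currency and scale factor); a mediator inserts the conversions so
    # the receiver sees consistent values. All figures are made up.

    SOURCE_CONTEXT = {
        "src_us": {"currency": "USD", "scale": 1_000},      # thousands of USD
        "src_jp": {"currency": "JPY", "scale": 1_000_000},  # millions of JPY
    }
    FX_TO_USD = {"USD": 1.0, "JPY": 0.0067}  # illustrative rate only

    def to_receiver_context(value: float, source: str) -> float:
        """Convert a reported figure into the receiver's context:
        plain USD with scale factor 1."""
        ctx = SOURCE_CONTEXT[source]
        return value * ctx["scale"] * FX_TO_USD[ctx["currency"]]

    # The same query ("revenue of company X") gets comparable answers
    # once the mediator applies the context conversions:
    print(to_receiver_context(500, "src_us"))  # 500 thousand USD -> 500000.0
    print(to_receiver_context(75, "src_jp"))   # 75 million JPY   -> 502500.0
    ```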

    Processing Rank-Aware Queries in Schema-Based P2P Systems

    In recent years, there has been considerable research on query processing in data integration and P2P systems. Conventional data integration systems consist of multiple sources with possibly different schemas, adhere to a hierarchical structure, and have a central component (the mediator) that manages a global schema. Queries are formulated against this global schema, and the mediator processes them by retrieving relevant data from the sources, transparently to the user. Building on these systems, Peer Data Management Systems (PDMSs), or schema-based P2P systems, have attracted attention. Peers participating in a PDMS can act both as mediators and as data sources, are autonomous, and might leave or join the network at will. As a consequence, peers often hold incomplete or erroneous data sets and mappings. The possibly huge amount of data available in such a network often results in large query result sets that are hard to manage; retrieving the complete result set is therefore in most cases difficult or even impossible. Applying rank-aware query operators such as top-N and skyline, possibly in conjunction with approximation techniques, is a remedy to these problems, as these operators select only those result records that are most relevant to the user according to user-defined ranking functions. Since in most cases only a small fraction of the complete result set is actually output to the user, retrieving the complete set before evaluating such operators is obviously inefficient. The question this dissertation answers is therefore how to process such queries in PDMSs, and how to do so efficiently.

    We propose strategies for efficient query processing in PDMSs that exploit the characteristics of rank-aware queries and optionally apply approximation techniques. A peer's relevance is determined on two levels, the schema level and the data level; according to its relevance, a peer is either considered for query processing or not. Because of the peers' heterogeneity, queries need to be rewritten from one schema into another, enabling cooperation between peers that use different schemas. As existing query rewriting techniques mostly consider conjunctive queries only, we present an extension that allows for rewriting queries involving rank-aware query operators. As PDMSs are dynamic systems and peers might update their local data, this dissertation addresses not only how routing indexes are used to determine a peer's relevance on the data level, but also how they are kept up-to-date. Finally, we provide a system-level evaluation by presenting SmurfPDMS (SiMUlating enviRonment For Peer Data Management Systems), a system created in the context of this dissertation that implements all presented techniques.
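
    To illustrate the rank-aware operators the dissertation targets, here is a minimal block-nested-loop sketch of the skyline operator (the attributes and records are hypothetical; real PDMS processing would distribute and possibly approximate this):

    ```python
    from typing import List, Tuple

    def dominates(p: Tuple[float, ...], q: Tuple[float, ...]) -> bool:
        """p dominates q if p is at least as good in every dimension and
        strictly better in at least one (here: smaller is better)."""
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

    def skyline(points: List[Tuple[float, ...]]) -> List[Tuple[float, ...]]:
        """Block-nested-loop skyline: keep points not dominated by any other."""
        return [p for p in points
                if not any(dominates(q, p) for q in points if q != p)]

    # Hypothetical (price, distance) records; only the Pareto-optimal ones
    # are returned, which is why the complete result set is never needed.
    hotels = [(50, 8.0), (70, 2.0), (60, 5.0), (90, 1.0), (80, 6.0)]
    print(skyline(hotels))  # [(50, 8.0), (70, 2.0), (60, 5.0), (90, 1.0)]
    ```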