1,552 research outputs found

    Processing techniques for partial tree-pattern queries on XML data

    Get PDF
    In recent years, eXtensible Markup Language (XML) has become a de facto standard for exporting and exchanging data on the Web. XML structures data as trees. Querying capabilities are provided through patterns matched against the XML trees. Research on the processing of XML queries has focused mainly on tree-pattern queries. Tree-pattern queries are not appropriate for querying XML data sources whose structure is not fully known to the user, or for querying multiple data sources which structure information differently. Recently, a class of queries, called Partial Tree-Pattern Queries (PTPQs) was identified. A central feature of PTPQs is that the structure can be specified fully, partially, or not at all in a query. For this reason. PTPQs can be used for flexibly querying XML data sources. This thesis deals with processing techniques for PTPQs. In particular, it addresses the satisfiability, containment and minimization problems for PTPQs. In order to cope with structural expression derivation issues and to compare PTPQs, a set of inference rules is suggested and a canonical form for PTPQs that comprises all derived structural expressions is defined. This canonical form is used for determining necessary and sufficient conditions for PTPQ satisfiability. The containment problem is studied both in the absence and in the presence of structural summaries of data called dimension graphs. It is shown that this problem cannot be characterized by homomorphisms between PTPQs, even when PTPQs are put in canonical form. In both cases of the problem, necessary and sufficient conditions for PTPQ containment are provided in terms of homomorphisms between PTPQs and (a possibly exponential number of) tree-pattern queries. This result is used to identify a subclass of PTPQs that strictly contains tree-pattern queries for which the containment problem can be fully characterized through the existence of homomorphisms. To cope with the high complexity of PTPQ containment, heuristic approaches for this problem are designed that trade accuracy for speed. The heuristic approaches equivalently add structural expressions to PTPQs in order to increase the possibility for a homomorphism between two contained PTPQs to exist. An implementation and extensive experimental evaluation of these heuristics shows that they are useful in practice, and that they can be efficiently implemented in a query optimizer. The goal of PTPQ minimization is to produce an equivalent PTPQ which is syntactically smaller in size. This problem is studied in the absence of structural summaries. It is shown that PTPQs cannot be minimized by removing redundant parts as is the case with certain classes of tree-pattern queries. It is also shown that, in general, a PTPQ does not have a unique minimal equivalent PTPQ. Finally, sound, but not complete, heuristic approaches for PTPQ minimization are presented. These approaches gradually trade execution time for accuracy

    Topological Equivalence and Similarity in Multi-Representation Geographic Databases

    Get PDF
    Geographic databases contain collections of spatial data representing the variety of views for the real world at a specific time. Depending on the resolution or scale of the spatial data, spatial objects may have different spatial dimensions, and they may be represented by point, linear, or polygonal features, or combination of them. The diversity of data that are collected over the same area, often from different sources, imposes a question of how to integrate and to keep them consistent in order to provide correct answers for spatial queries. This thesis is concerned with the development of a tool to check topological equivalence and similarity for spatial objects in multi-representation databases. The main question is what are the components of a model to identify topological consistency, based on a set of possible transitions for the different types of spatial representations. This work develops a new formalism to model consistently spatial objects and spatial relations between several objects, each represented at multiple levels of detail. It focuses on the topological consistency constraints that must hold among the different representation of objects, but it is not concerned about generalization operations of how to derive one representation level from another. The result of this thesis is a?computational tool to evaluate topological equivalence and similarity across multiple representations. This thesis proposes to organize a spatial scene -a set of spatial objects and their embeddings in space- directly as a relation-based model that uses a hierarchical graph representation. The focus of the relation-based model is on relevant object representations. Only the highest-dimensional object representations are explicitly stored, while their parts are not represented in the graph

    Semantics and efficient evaluation of partial tree-pattern queries on XML

    Get PDF
    Current applications export and exchange XML data on the web. Usually, XML data are queried using keyword queries or using the standard structured query language XQuery the core of which consists of the navigational query language XPath. In this context, one major challenge is the querying of the data when the structure of the data sources is complex or not fully known to the user. Another challenge is the integration of multiple data sources that export data with structural differences and irregularities. In this dissertation, a query language for XML called Partial Tree-Pattern Query (PTPQ) language is considered. PTPQs generalize and strictly contain Tree-Pattern Queries (TPQs) and can express a broad structural fragment of XPath. Because of their expressive power and flexibility, they are useful for querying XML documents the structure of which is complex or not fully known to the user, and for integrating XML data sources with different structures. The dissertation focuses on three issues. The first one is the design of efficient non-main-memory evaluation methods for PTPQs. The second one is the assignment of semantics to PTPQs so that they return meaningful answers. The third one is the development of techniques for answering TPQs using materialized views. Non-main-memory XML query evaluation can be done in two modes (which also define two evaluation models). In the first mode, data is preprocessed and indexes, called inverted lists, are built for it. In the second mode, data are unindexed and arrives continuously in the form of a stream. Existing algorithms cannot be used directly or indirectly to efficiently compute PTPQs in either mode. Initially, the problem of efficiently evaluating partial path queries in the inverted lists model has been addressed. Partial path queries form a subclass of PTPQs which is not contained in the class of TPQs. Three novel algorithms for evaluating partial path queries including a holistic one have been designed. The analytical and experimental results show that the holistic algorithm outperforms the other two. These results have been extended into holistic and non-holistic approaches for PTPQs in the inverted lists model. The experiments show again the superiority of the holistic approach. The dissertation has also addressed the problem of evaluating PTPQs in the streaming model, and two original efficient streaming algorithms for PTPQs have been designed. Compared to the only known streaming algorithm that supports an extension of TPQs, the experimental results show that the proposed algorithms perform better by orders of magnitude while consuming a much smaller fraction of memory space. An original approach for assigning semantics to PTPQs has also been devised. The novel semantics seamlessly applies to keyword queries and to queries with structural restrictions. In contrast to previous approaches that operate locally on data, the proposed approach operates globally on structural summaries of data to extract tree patterns. Compared to previous approaches, an experimental evaluation shows that our approach has a perfect recall both for XML documents with complete and with incomplete data. It also shows better precision compared to approaches with similar recall. Finally, the dissertation has addressed the problem of answering XML queries using exclusively materialized views. An original approach for materializing views in the context of the inverted lists model has been suggested. Necessary and sufficient conditions have been provided for tree-pattern query answerability in terms of view-to-query homomorphisms. A time and space efficient algorithm was designed for deciding query answerability and a technique for computing queries over view materializations using stack- based holistic algorithms was developed. Further, optimizations were developed which (a) minimize the storage space and avoid redundancy by materializing views as bitmaps, and (b) optimize the evaluation of the queries over the views by applying bitwise operations on view materializations. The experimental results show that the proposed approach obtains largely higher hit rates than previous approaches, speeds up significantly the evaluation of queries without using views, and scales very smoothly in terms of storage space and computational overhead

    16th Scandinavian Symposium and Workshops on Algorithm Theory: SWAT 2018, June 18-20, 2018, Malmö University, Malmö, Sweden

    Get PDF
    • …
    corecore