12 research outputs found

    Implementing a universal relation interface using access scripts with binding patterns

    Get PDF
    We propose the use of the universal relation as a user interface to provide transparent access to a network of distributed, heterogeneous, and autonomous information sources. We implement this interface in two layers. The lower layer consists of access scripts, which encapsulate knowledge about information sources and are capable of answering basic queries. The upper layer uses combinations of these scripts to answer user queries phrased in terms of a universal relation. Access scripts know how to obtain information either directly from sources or from service providers (mediators, traders, and the like). They present this information in relational form, but with an inherent direction, in the sense that whenever values for a fixed subset of attributes of the relation are given, the access script will deliver values for the rest of the attributes in the relation. In this paper, we address the problem of defining the semantics of a user query posed against the universal relation and of finding a sequence of access script invocations that gathers the information requested in the query

    An incremental algorithm for computing ranked full disjunctions

    Get PDF
    AbstractThe full disjunction is a variation of the join operator that maximally combines tuples from connected relations, while preserving all information in the relations. The full disjunction can be seen as a natural extension of the binary outerjoin operator to an arbitrary number of relations and is a useful operator for information integration. This paper presents the algorithm IncrementalFD for computing the full disjunction of a set of relations. IncrementalFD improves upon previous algorithms for computing the full disjunction in four ways. First, it has a lower total runtime when computing the full result and a lower runtime when computing only k tuples of the result, for any constant k. Second, for a natural class of ranking functions, IncrementalFD can be adapted to return tuples in ranking order. Third, a variation of IncrementalFD can be used to return approximate full disjunctions (which contain maximal approximately join consistent tuples). Fourth, IncrementalFD can be adapted to have a block-based execution, instead of a tuple-based execution

    Semantic Integration of heterogeneous data sources in the MOMIS Data Transformation System

    Get PDF
    In the last twenty years, many data integration systems following a classical wrapper/mediator architecture and providing a Global Virtual Schema (a.k.a. Global Virtual View - GVV) have been proposed by the research community. The main issues faced by these approaches range from system-level heterogeneities, to structural syntax level heterogeneities at the semantic level. Despite the research effort, all the approaches proposed require a lot of user intervention for customizing and managing the data integration and reconciliation tasks. In some cases, the effort and the complexity of the task is huge, since it requires the development of specific programming codes. Unfortunately, due to the specificity to be addressed, application codes and solutions are not frequently reusable in other domains. For this reason, the Lowell Report 2005 has provided the guideline for the definition of a public benchmark for information integration problem. The proposal, called THALIA (Test Harness for the Assessment of Legacy information Integration Approaches), focuses on how the data integration systems manage syntactic and semantic heterogeneities, which definitely are the greatest technical challenges in the field. We developed a Data Transformation System (DTS) that supports data transformation functions and produces query translation in order to push down to the sources the execution. Our DTS is based on MOMIS, a mediator-based data integration system that our research group is developing and supporting since 1999. In this paper, we show how the DTS is able to solve all the twelve queries of the THALIA benchmark by using a simple combination of declarative translation functions already available in the standard SQL language. We think that this is a remarkable result, mainly for two reasons: firstly to the best of our knowledge there is no system that has provided a complete answer to the benchmark, secondly, our queries does not require any overhead of new code

    A Method for Automatically Generating Join Queries Based on Relations-Attributes Distance Matrix over Data Lakes

    Get PDF
    Techniques for identifying joinable or unionable tables in data lakes can yield valuable information for data scientists. However, more than half of their working time is spent familiarizing themselves with the metadata and correlations of datasets. Simplifying the use of information in data lakes is crucial for enhancing their utilization. The existing solution of integrating correlated relations into a single large data table via full disjunction requires integration updating when either data or metadata changes, complicating data maintenance. This paper proposes a method for automatically generating join queries based on the distance matrix of relations and attributes in data lakes. The distance matrix only requires updating when metadata changes, simplifying data maintenance. Experimental results demonstrate that once the distance matrix is generated, the time required to generate the join queries is negligible. Compared to the existing solution, the time cost for executing join queries over correlated tables is nearly identical to that of selection queries over integrated tables. The results of these two queries are also the same, showcasing the effectiveness and efficiency of our method

    Querying Semantically Tagged Documents on the World-Wide Web

    Full text link

    An Algebraic Approach to XQuery Optimization

    Get PDF
    As more data is stored in XML and more applications need to process this data, XML query optimization becomes performance critical. While optimization techniques for relational databases have been developed over the last thirty years, the optimization of XML queries poses new challenges. Query optimizers for XQuery, the standard query language for XML data, need to consider both document order and sequence order. Nevertheless, algebraic optimization proved powerful in query optimizers in relational and object oriented databases. Thus, this dissertation presents an algebraic approach to XQuery optimization. In this thesis, an algebra over sequences is presented that allows for a simple translation of XQuery into this algebra. The formal definitions of the operators in this algebra allow us to reason formally about algebraic optimizations. This thesis leverages the power of this formalism when unnesting nested XQuery expressions. In almost all cases unnesting nested queries in XQuery reduces query execution times from hours to seconds or milliseconds. Moreover, this dissertation presents three basic algebraic patterns of nested queries. For every basic pattern a decision tree is developed to select the most effective unnesting equivalence for a given query. Query unnesting extends the search space that can be considered during cost-based optimization of XQuery. As a result, substantially more efficient query execution plans may be detected. This thesis presents two more important cases where the number of plan alternatives leads to substantially shorter query execution times: join ordering and reordering location steps in path expressions. Our algebraic framework detects cases where document order or sequence order is destroyed. However, state-of-the-art techniques for order optimization in cost-based query optimizers have efficient mechanisms to repair order in these cases. The results obtained for query unnesting and cost-based optimization of XQuery underline the need for an algebraic approach to XQuery optimization for efficient XML query processing. Moreover, they are applicable to optimization in relational databases where order semantics are considered

    Handling metadata in the scope of coreference detection in data collections

    Get PDF

    Equivalence of Queries with Nested Aggregation

    Get PDF
    Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting—join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing and other forms of data integration—requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization. In this thesis we address the problem of deciding query equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, and so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections—so-called “scalar” aggregation—we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem has never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation

    Integrating Information by Outerjoins and Full Disjunctions

    No full text
    Our motivation is the piecing together tidbits of information found on the "web" into a usable information structure. The problem is related to that of computing the natural outerjoin of many relations in a way that preserves all possible connections among facts. Such a computation has been termed a "full disjunction" by GalindoLegaria. We are thus led to ask the question of when a full disjunction can be computed by some sequence of natural outerjoins. The answer involves a concept of from Fagin [1983] called "fl-acyclic hypergraphs." We prove that there is a natural outerjoin sequence producing the full disjunction if and only if the set of relation schemes forms a connected, fl-acyclic hypergraph. I. Motivation Let us imagine we are constructing an information resource that accepts queries about university information. We gather our information from the on-line information found at the various universities and their schools or departments, and we integrate the information into an ..
    corecore