420 research outputs found

    Inconsistency-tolerant Query Answering in Ontology-based Data Access

    Get PDF
    Ontology-based data access (OBDA) is receiving great attention as a new paradigm for managing information systems through semantic technologies. According to this paradigm, a Description Logic ontology provides an abstract and formal representation of the domain of interest to the information system, and is used as a sophisticated schema for accessing the data and formulating queries over them. In this paper, we address the problem of dealing with inconsistencies in OBDA. Our general goal is both to study DL semantical frameworks that are inconsistency-tolerant, and to devise techniques for answering unions of conjunctive queries under such inconsistency-tolerant semantics. Our work is inspired by the approaches to consistent query answering in databases, which are based on the idea of living with inconsistencies in the database, but trying to obtain only consistent information during query answering, by relying on the notion of database repair. We first adapt the notion of database repair to our context, and show that, according to such a notion, inconsistency-tolerant query answering is intractable, even for very simple DLs. Therefore, we propose a different repair-based semantics, with the goal of reaching a good compromise between the expressive power of the semantics and the computational complexity of inconsistency-tolerant query answering. Indeed, we show that query answering under the new semantics is first-order rewritable in OBDA, even if the ontology is expressed in one of the most expressive members of the DL-Lite family

    Queries with Guarded Negation (full version)

    Full text link
    A well-established and fundamental insight in database theory is that negation (also known as complementation) tends to make queries difficult to process and difficult to reason about. Many basic problems are decidable and admit practical algorithms in the case of unions of conjunctive queries, but become difficult or even undecidable when queries are allowed to contain negation. Inspired by recent results in finite model theory, we consider a restricted form of negation, guarded negation. We introduce a fragment of SQL, called GN-SQL, as well as a fragment of Datalog with stratified negation, called GN-Datalog, that allow only guarded negation, and we show that these query languages are computationally well behaved, in terms of testing query containment, query evaluation, open-world query answering, and boundedness. GN-SQL and GN-Datalog subsume a number of well known query languages and constraint languages, such as unions of conjunctive queries, monadic Datalog, and frontier-guarded tgds. In addition, an analysis of standard benchmark workloads shows that most usage of negation in SQL in practice is guarded negation

    General Boolean Expressions in Publish-Subscribe Systems

    Get PDF
    The increasing amount of electronically available information in society today is undeniable. Examples include the numbers of general web pages, scientific publications, and items in online auctions. From a user's perspective, this trend will lead to information overflow. Moreover, information publishers are compromised by this situation, as users have greater difficulty in identifying useful information. Publish-subscribe systems can be applied to cope with the reality of information overflow. In these systems, users specify their information interests as subscriptions and, subsequently, only matching information (event messages) is delivered; uninteresting information is filtered out before reaching users. In this dissertation, we consider content-based publish-subscribe systems, a sophisticated example of these systems. They perform the information-filtering task based on the content of provided information. In order to deal with high numbers of subscriptions and frequencies of event messages, publish-subscribe systems are realized as distributed systems. Advertisements---publisher specifications of potential future event messages---are optionally applied in these systems to reduce the internal distribution of subscriptions. Existing work on content-based publish-subscribe concepts mainly focuses on subscriptions and advertisements as pure conjunctive expressions. Therefore, subscriptions or advertisements using operators other than conjunction need to be canonically converted to disjunctive normal form by these systems. Each conjunctive component is then treated as individual subscription or advertisement. Unfortunately, the size of converted expressions is exponential in the worst case. In this dissertation, we show that the direct support of general Boolean subscriptions and advertisements improves the time and space efficiency of general-purpose content-based publish-subscribe systems. For this purpose, we develop suitable approaches for the filtering and routing of general Boolean expressions in these systems. Our approaches represent solutions to exactly those components of content-based publish-subscribe systems that currently restrict subscriptions and advertisements to conjunctive expressions. On the subscription side, we present an effective generic filtering algorithm, and a novel approach to optimize event routing tables, which we call subscription pruning. To support advertisements, we show how to calculate the overlap between subscriptions and advertisements, and introduce the first designated subscription routing optimization, which we refer to as advertisement pruning. We integrate these approaches into our prototype BoP (BOolean Publish-subscribe) which allows for the full support of general Boolean expressions in its filtering and routing components. In the evaluation part of this dissertation, we empirically analyze our prototypical implementation BoP and compare its algorithms to existing conjunctive solutions. We firstly show that our general-purpose Boolean filtering algorithm is more space- and time-efficient than a general-purpose conjunctive filtering algorithm. Secondly, we illustrate the effectiveness of the subscription pruning routing optimization and compare it to the existing covering optimization approach. Finally, we demonstrate the optimization effect of advertisement pruning while maintaining the existing overlapping relationships in the system

    Four Lessons in Versatility or How Query Languages Adapt to the Web

    Get PDF
    Exposing not only human-centered information, but machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled a new kind of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: Some providers choose XML, others RDF, again others JSON or OWL, for their data, even in similar domains. This fracturing stifles innovation as application builders have to cope not only with one Web stack (e.g., XML technology) but with several ones, each of considerable complexity. With Xcerpt we have developed a rule- and pattern based query language that aims to give shield application builders from much of this complexity: In a single query language XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply for querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet provides linear time and space querying also for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards a more convenient, yet highly efficient data access in a “Web of Data”

    Equivalence of Queries with Nested Aggregation

    Get PDF
    Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting—join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing and other forms of data integration—requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization. In this thesis we address the problem of deciding query equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, and so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections—so-called “scalar” aggregation—we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem has never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation

    Techniques for improving efficiency and scalability for the integration of information retrieval and databases

    Get PDF
    PhDThis thesis is on the topic of integration of Information Retrieval (IR) and Databases (DB), with particular focuses on improving efficiency and scalability of integrated IR and DB technology (IR+DB). The main purpose of this study is to develop efficient and scalable techniques for supporting integrated IR and DB technology, which is a popular approach today for handling complex queries over text and structured data. Our specific interest in this thesis is how to efficiently handle queries over large-scale text and structured data. The work is based on a technology that integrates probability theory and relational algebra, where retrievals for text and data are to be expressed in probabilistic logical programs such as probabilistic relational algebra or probabilistic Datalog. To support efficient processing of probabilistic logical programs, we proposed three optimization techniques that focus on aspects covered logical and physical layers, which include: scoring-driven query optimization using scoring expression, query processing with top-k incorporated pipeline, and indexing with relational inverted index. Specifically, scoring expressions are proposed for expressing the scoring or probabilistic semantics of implied scoring functions of PRA expressions, so that efficient query execution plan can be generated by rule-based scoring-driven optimizer. Secondly, to balance efficiency and effectiveness so that to improve query response time, we studied methods for incorporating topk algorithms into pipelined query execution engine for IR+DB systems. Thirdly, the proposed relational inverted index integrates IR-style inverted index and DB-style tuple-based index, which can be used to support efficient probability estimation and aggregation as well as conventional relational operations. Experiments were carried out to investigate the performances of proposed techniques. Experimental results showed that the efficiency and scalability of an IR+DB prototype have been improved, while the system can handle queries efficiently on considerable large data sets for a number of IR tasks

    Subsumption between queries to object-oriented databases

    Get PDF
    Most work on query optimization in relational and object-oriented databases has concentrated on tuning algebraic expressions and the physical access to the database contents. The attention to semantic query optimization, however, has been restricted due to its inherent complexity. We take a second look at semantic query optimization in object-oriented databases and find that reasoning techniques for concept languages developed in Artificial Intelligence apply to this problem because concept languages have been tailored for efficiency and their semantics is compatible with class and query definitions in object-oriented databases. We propose a query optimizer that recognizes subset relationships between a query and a view (a simpler query whose answer is stored) in polynomial time

    The Completeness Problem of Ordered Relational Databases

    Get PDF
    Support of order in query processing is a crucial component in relational database systems, not only because the output of a query is often required to be sorted in a specific order, but also because employing order properties can significantly reduce the query execution cost. Therefore, finding an effective approach to answer queries over ordered data is important to the efficiency of query processing in relational databases. In this dissertation, an ordered relational database model is proposed, which captures both data tuples of relations and tuple ordering in relations. Based on this conceptual model, ordered relational queries are formally defined in a two-sorted first-order calculus, which serves as a yardstick to evaluate expressive power of other ordered query representations. The primary purpose of this dissertation is to investigate the expressive power of different ordered query representations. Particularly, the completeness problem of ordered relational algebras is studied with respect to the first-order calculus: does there exist an ordered algebra such that any first-order expressible ordered relational query can be expressed by a finite sequence of ordered operations? The significance of studying the completeness problem of ordered relational algebras is in that the completeness of ordered relational algebras leads to the possibility of implementing a finite set of ordered operators to express all first-order expressible ordered queries in relational databases. The dissertation then focuses on the completeness problem of ordered conjunctive queries. This investigation is performed in an incremental manner: first, the ordered conjunctive queries with data-decided order is considered; then, the ordered conjunctive queries with t-decided order is studied; finally, the completeness problem for the general ordered conjunctive queries is explored. The completeness theorem of ordered algebras is proven for all three classes of ordered conjunctive queries. Although this ordered relational database model is only conceptual, and ordered operators are not implemented in this dissertation, we do prove that a complete set of ordered operators exists to retrieve all first order expressible ordered queries in the three classes of ordered conjunctive queries. This research sheds light on the possibility of implementing a complete set of ordered operators in relational databases to solve the performance problem of order-relevant queries
    corecore