Inconsistency-tolerant Query Answering in Ontology-based Data Access
Ontology-based data access (OBDA) is receiving great attention as a new paradigm for managing information systems through semantic technologies. According to this paradigm, a Description Logic ontology provides an abstract and formal representation of the domain of interest to the information system, and is used as a sophisticated schema for accessing the data and formulating queries over them. In this paper, we address the problem of dealing with inconsistencies in OBDA. Our general goal is both to study DL semantic frameworks that are inconsistency-tolerant and to devise techniques for answering unions of conjunctive queries under such inconsistency-tolerant semantics. Our work is inspired by the approaches to consistent query answering in databases, which are based on the idea of living with inconsistencies in the database while obtaining only consistent information during query answering, by relying on the notion of database repair. We first adapt the notion of database repair to our context, and show that, under such a notion, inconsistency-tolerant query answering is intractable, even for very simple DLs. Therefore, we propose a different repair-based semantics, with the goal of reaching a good compromise between the expressive power of the semantics and the computational complexity of inconsistency-tolerant query answering. Indeed, we show that query answering under the new semantics is first-order rewritable in OBDA, even if the ontology is expressed in one of the most expressive members of the DL-Lite family.
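The difference between two repair-based semantics can be made concrete on a toy example. The sketch below is a hypothetical illustration, not the paper's algorithm: it assumes a TBox with only atomic concept inclusions and disjointness axioms, enumerates repairs (maximal consistent subsets of the ABox) by brute force, and contrasts "entailed in every repair" with the more cautious "entailed by the intersection of all repairs". All names (`INCLUSIONS`, `DISJOINT`, `entailed_in_all_repairs`, `entailed_by_intersection`) are assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical toy TBox: atomic concept inclusions (sub, sup) and disjoint pairs.
INCLUSIONS = {("Student", "Person"), ("Employee", "Person")}
DISJOINT = {frozenset({"Student", "Employee"})}


def closure(abox):
    """Saturate (concept, individual) assertions under the atomic inclusions."""
    facts = set(abox)
    changed = True
    while changed:
        changed = False
        for concept, ind in list(facts):
            for sub, sup in INCLUSIONS:
                if concept == sub and (sup, ind) not in facts:
                    facts.add((sup, ind))
                    changed = True
    return facts


def is_consistent(abox):
    """Inconsistent if one individual falls into two disjoint concepts."""
    facts = closure(abox)
    return not any(
        i1 == i2 and frozenset({c1, c2}) in DISJOINT
        for c1, i1 in facts
        for c2, i2 in facts
    )


def repairs(abox):
    """Brute-force enumeration of maximal consistent subsets (exponential)."""
    items = list(abox)
    candidates = [
        frozenset(subset)
        for k in range(len(items) + 1)
        for subset in combinations(items, k)
        if is_consistent(subset)
    ]
    return [s for s in candidates if not any(s < t for t in candidates)]


def entailed_in_all_repairs(abox, fact):
    return all(fact in closure(r) for r in repairs(abox))


def entailed_by_intersection(abox, fact):
    core = frozenset.intersection(*repairs(abox))
    return fact in closure(core)


abox = {("Student", "ann"), ("Employee", "ann")}
print(entailed_in_all_repairs(abox, ("Person", "ann")))   # True
print(entailed_by_intersection(abox, ("Person", "ann")))  # False
```

Each of the two repairs entails `Person(ann)`, but their intersection is empty, so the cautious semantics does not. Enumerating repairs this way is exponential in the ABox size and only serves to make the definitions concrete.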
Queries with Guarded Negation (full version)
A well-established and fundamental insight in database theory is that
negation (also known as complementation) tends to make queries difficult to
process and difficult to reason about. Many basic problems are decidable and
admit practical algorithms in the case of unions of conjunctive queries, but
become difficult or even undecidable when queries are allowed to contain
negation. Inspired by recent results in finite model theory, we consider a
restricted form of negation, guarded negation. We introduce a fragment of SQL,
called GN-SQL, as well as a fragment of Datalog with stratified negation,
called GN-Datalog, that allow only guarded negation, and we show that these
query languages are computationally well behaved, in terms of testing query
containment, query evaluation, open-world query answering, and boundedness.
GN-SQL and GN-Datalog subsume a number of well known query languages and
constraint languages, such as unions of conjunctive queries, monadic Datalog,
and frontier-guarded tgds. In addition, an analysis of standard benchmark
workloads shows that most usage of negation in SQL in practice is guarded
negation.
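To make the guardedness condition concrete, here is a minimal sketch for the simplest case, conjunctive queries extended with negated atoms, under the assumption that a negated atom counts as guarded when all of its variables occur together in a single positive atom. This is a simplification for illustration; GN-SQL and GN-Datalog are richer, and the query encoding and helper names are hypothetical.

```python
# A query is a list of positive atoms and a list of negated atoms;
# each atom is (relation_name, tuple_of_variable_names).


def is_guarded(positive, negative):
    """Every negated atom's variables must co-occur in one positive atom."""
    return all(
        any(set(nvars) <= set(pvars) for _, pvars in positive)
        for _, nvars in negative
    )


def bindings(positive, db, binding=None):
    """Enumerate variable bindings satisfying all positive atoms (naive join)."""
    binding = binding or {}
    if not positive:
        yield binding
        return
    (rel, vars_), rest = positive[0], positive[1:]
    for row in db.get(rel, set()):
        candidate = dict(binding)
        if all(candidate.setdefault(v, val) == val for v, val in zip(vars_, row)):
            yield from bindings(rest, db, candidate)


def evaluate(head, positive, negative, db):
    """Evaluate a guarded query; negated atoms are checked against the guard's binding."""
    if not is_guarded(positive, negative):
        raise ValueError("negation is not guarded")
    answers = set()
    for b in bindings(positive, db):
        if all(tuple(b[v] for v in nvars) not in db.get(rel, set())
               for rel, nvars in negative):
            answers.add(tuple(b[v] for v in head))
    return answers


db = {"Orders": {(1, "alice"), (2, "bob")}, "Shipped": {(1, "alice")}}
# ans(o) :- Orders(o, c), NOT Shipped(o, c)  (guarded by Orders(o, c))
print(evaluate(("o",), [("Orders", ("o", "c"))], [("Shipped", ("o", "c"))], db))  # {(2,)}
# ans(o, c) :- OrderIds(o), Customers(c), NOT Shipped(o, c)  (not guarded)
print(is_guarded([("OrderIds", ("o",)), ("Customers", ("c",))],
                 [("Shipped", ("o", "c"))]))  # False
```

The guarded query corresponds to the familiar SQL pattern of a `NOT EXISTS` subquery correlated only on columns of one outer table, which is one intuition for why such negation is common in practice.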
General Boolean Expressions in Publish-Subscribe Systems
The increasing amount of electronically available information in society today is undeniable. Examples include the numbers of general web pages, scientific publications, and items in online auctions. From a user's perspective, this trend will lead to information overflow. Information publishers are disadvantaged by this situation as well, since users have greater difficulty identifying useful information.
Publish-subscribe systems can be applied to cope with the reality of information overflow. In these systems, users specify their information interests as subscriptions and, subsequently, only matching information (event messages) is delivered; uninteresting information is filtered out before reaching users. In this dissertation, we consider content-based publish-subscribe systems, a sophisticated class of these systems, which perform the information-filtering task based on the content of the provided information. In order to deal with high numbers of subscriptions and high frequencies of event messages, publish-subscribe systems are realized as distributed systems. Advertisements (publisher specifications of potential future event messages) are optionally applied in these systems to reduce the internal distribution of subscriptions.
Existing work on content-based publish-subscribe concepts mainly focuses on subscriptions and advertisements as pure conjunctive expressions. Therefore, subscriptions or advertisements using operators other than conjunction need to be canonically converted to disjunctive normal form by these systems. Each conjunctive component is then treated as an individual subscription or advertisement. Unfortunately, the size of the converted expressions is exponential in the worst case.
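The worst case is easy to reproduce: a subscription that is a conjunction of n two-way disjunctions turns into 2^n conjunctive components once distributed into DNF. The sketch below uses illustrative string predicates and a hypothetical `to_dnf` helper that handles only conjunctions of disjunctions; it demonstrates the blow-up and is not code from BoP.

```python
from itertools import product


def to_dnf(clauses):
    """Distribute a conjunction of disjunctions (CNF) into a list of conjunctions."""
    return [frozenset(choice) for choice in product(*clauses)]


def subscription(n):
    # (a0 > 0 or b0 > 0) and (a1 > 0 or b1 > 0) and ... with illustrative predicates
    return [[f"a{i} > 0", f"b{i} > 0"] for i in range(n)]


for n in (2, 5, 10):
    print(n, len(to_dnf(subscription(n))))  # 4, 32, 1024 components
```

Since each component holds n predicates, the converted form stores n·2^n predicates, compared with 2n predicates in the original expression.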
In this dissertation, we show that the direct support of general Boolean subscriptions and advertisements improves the time and space efficiency of general-purpose content-based publish-subscribe systems. For this purpose, we develop suitable approaches for the filtering and routing of general Boolean expressions in these systems. Our approaches represent solutions to exactly those components of content-based publish-subscribe systems that currently restrict subscriptions and advertisements to conjunctive expressions.
On the subscription side, we present an effective generic filtering algorithm, and a novel approach to optimize event routing tables, which we call subscription pruning. To support advertisements, we show how to calculate the overlap between subscriptions and advertisements, and introduce the first designated subscription routing optimization, which we refer to as advertisement pruning. We integrate these approaches into our prototype BoP (BOolean Publish-subscribe) which allows for the full support of general Boolean expressions in its filtering and routing components.
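A rough sketch of both ideas on subscription trees follows. It assumes negation-free Boolean expressions over attribute predicates, evaluates them directly against an event without any DNF conversion, and implements a hypothetical `prune` step that replaces all predicates on a chosen attribute with TRUE. For negation-free expressions this can only broaden a subscription, so a router using the pruned version may forward extra events but never drops a matching one. How BoP actually chooses what to prune is not covered here, and all names are illustrative.

```python
import operator

OPS = {"<": operator.lt, "<=": operator.le, "=": operator.eq,
       ">": operator.gt, ">=": operator.ge}
TRUE = ("true",)

# Expressions: ("pred", attr, op, value) | ("and", [children]) | ("or", [children]) | TRUE


def matches(expr, event):
    """Evaluate a Boolean subscription tree directly against an event dict."""
    kind = expr[0]
    if kind == "true":
        return True
    if kind == "pred":
        _, attr, op, value = expr
        return attr in event and OPS[op](event[attr], value)
    results = (matches(child, event) for child in expr[1])
    return all(results) if kind == "and" else any(results)


def prune(expr, attr):
    """Replace predicates on `attr` with TRUE and simplify (only generalizes)."""
    kind = expr[0]
    if kind == "pred":
        return TRUE if expr[1] == attr else expr
    if kind == "true":
        return expr
    children = [prune(child, attr) for child in expr[1]]
    if kind == "or":
        return TRUE if TRUE in children else ("or", children)
    kept = [child for child in children if child != TRUE]
    if not kept:
        return TRUE
    return kept[0] if len(kept) == 1 else ("and", kept)


sub = ("and", [("pred", "category", "=", "book"),
               ("or", [("pred", "price", "<", 10), ("pred", "rating", ">=", 4)])])
event = {"category": "book", "price": 15, "rating": 3}
print(matches(sub, event))                  # False
print(prune(sub, "price"))                  # ('pred', 'category', '=', 'book')
print(matches(prune(sub, "price"), event))  # True
```

The pruned entry is smaller than the original tree, which is the kind of routing-table saving the text refers to; the price is an event that is forwarded and later filtered out downstream.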
In the evaluation part of this dissertation, we empirically analyze our prototypical implementation BoP and compare its algorithms to existing conjunctive solutions. First, we show that our general-purpose Boolean filtering algorithm is more space- and time-efficient than a general-purpose conjunctive filtering algorithm. Second, we illustrate the effectiveness of the subscription pruning routing optimization and compare it to the existing covering optimization approach. Finally, we demonstrate the optimization effect of advertisement pruning while maintaining the existing overlapping relationships in the system.
Four Lessons in Versatility or How Query Languages Adapt to the Web
Exposing not only human-centered information but also machine-processable data on the Web is one of the commonalities of recent Web trends. It has enabled new kinds of applications and businesses where the data is used in ways not foreseen by the data providers. Yet this exposition has fractured the Web into islands of data, each in different Web formats: some providers choose XML, others RDF, and still others JSON or OWL for their data, even in similar domains. This fracturing stifles innovation, as application builders have to cope not only with one Web stack (e.g., XML technology) but with several, each of considerable complexity. With Xcerpt we have developed a rule- and pattern-based query language that aims to shield application builders from much of this complexity: in a single query language, XML and RDF data can be accessed, processed, combined, and re-published. Though the need for combined access to XML and RDF data has been recognized in previous work (including the W3C’s GRDDL), our approach differs in four main aspects: (1) We provide a single language (rather than two separate or embedded languages), thus minimizing the conceptual overhead of dealing with disparate data formats. (2) Both the declarative (logic-based) and the operational semantics are unified in that they apply to querying XML and RDF in the same way. (3) We show that the resulting query language can be implemented reusing traditional database technology, if desirable. Nevertheless, we also give a unified evaluation approach based on interval labelings of graphs that is at least as fast as existing approaches for tree-shaped XML data, yet also provides linear time and space querying for many RDF graphs. We believe that Web query languages are the right tool for declarative data access in Web applications and that Xcerpt is a significant step towards more convenient, yet highly efficient data access in a “Web of Data”.
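Interval labeling is easiest to see on trees. The sketch below shows the classic pre-order interval scheme for tree-shaped data, assuming a tree given as a dict from each node to its children: every node receives a range covering exactly its descendants, so an ancestor/descendant test (the core of XPath-style `//` steps) reduces to two integer comparisons. Xcerpt's extension of labelings to many RDF graphs is not reproduced here, and the function names are hypothetical.

```python
def interval_labels(children, root):
    """Assign each node a half-open interval [start, end) over DFS positions."""
    labels = {}
    counter = 0

    def visit(node):
        nonlocal counter
        start = counter
        counter += 1
        for child in children.get(node, []):
            visit(child)
        labels[node] = (start, counter)  # counter now lies past the last descendant

    visit(root)
    return labels


def is_ancestor(labels, u, v):
    """u is a proper ancestor of v iff v's start lies strictly inside u's interval."""
    u_start, u_end = labels[u]
    v_start, _ = labels[v]
    return u_start < v_start < u_end


doc = {"book": ["title", "authors"], "authors": ["author1", "author2"]}
labels = interval_labels(doc, "book")
print(labels["authors"])                        # (2, 5)
print(is_ancestor(labels, "book", "author2"))   # True
print(is_ancestor(labels, "title", "author1"))  # False
```

Labeling costs one traversal, after which each structural test is constant time; general graphs with shared or cyclic structure are exactly where such a single interval per node stops being sufficient.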
Equivalence of Queries with Nested Aggregation
Query equivalence is a fundamental problem within database theory. The correctness of all forms of logical query rewriting (join minimization, view flattening, rewriting over materialized views, various semantic optimizations that exploit schema dependencies, federated query processing, and other forms of data integration) requires proving that the final executed query is equivalent to the original user query. Hence, advances in the theory of query equivalence enable advances in query processing and optimization.
In this thesis we address the problem of deciding query equivalence between conjunctive SQL queries containing aggregation operators that may be nested. Our focus is on understanding the interaction between nested aggregation operators and the other parts of the query body, and so we model aggregation functions simply as abstract collection constructors. Hence, the precise language that we study is a conjunctive algebraic language that constructs complex objects from databases of flat relations. Using an encoding of complex objects as flat relations, we reduce the query equivalence problem for this algebraic language to deciding equivalence between relational encodings output by traditional conjunctive queries (not containing aggregation). This encoding-equivalence cleanly unifies and generalizes previous results for deciding equivalence of conjunctive queries evaluated under various processing semantics. As part of our study of aggregation operators that can construct empty sub-collections (so-called “scalar” aggregation), we consider query equivalence for conjunctive queries extended with a left outer join operator, a very practical class of queries for which the general equivalence problem has never before been analyzed. Although we do not completely solve the equivalence problem for queries with outer joins or with scalar aggregation, we do propose useful sufficient conditions that generalize previously known results for restricted classes of queries. Overall, this thesis offers new insight into the fundamental principles governing the behaviour of nested aggregation.
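Why the processing semantics matters can be seen with two textbook conjunctive queries. The sketch below (hypothetical relation `R` and query names, not the thesis's encoding) evaluates Q1(x) :- R(x, y) and Q2(x) :- R(x, y), R(x, z) under bag semantics using `collections.Counter`. The two queries return the same set of answers, so they are equivalent under set semantics, but their multiplicities differ, so they are not equivalent under bag semantics; an aggregate such as COUNT over their outputs would tell them apart.

```python
from collections import Counter
from itertools import product

R = [(1, "a"), (1, "b"), (2, "c")]


def q1(rel):
    """Q1(x) :- R(x, y), evaluated under bag semantics."""
    return Counter(x for x, _ in rel)


def q2(rel):
    """Q2(x) :- R(x, y), R(x, z): a self-join on x, under bag semantics."""
    return Counter(x1 for (x1, _), (x2, _) in product(rel, rel) if x1 == x2)


print(set(q1(R)) == set(q2(R)))  # True: equal as sets
print(q1(R) == q2(R))            # False: multiplicities differ
```

Here x = 1 occurs twice in Q1's output but four times in Q2's, because the self-join pairs each of its two R-tuples with both.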
Techniques for improving efficiency and scalability for the integration of information retrieval and databases
This thesis is on the topic of integration of Information Retrieval (IR) and Databases (DB), with a particular focus on improving the efficiency and scalability of integrated IR and DB technology (IR+DB). The main purpose of this study is to develop efficient and scalable techniques for supporting integrated IR and DB technology, which is a popular approach today for handling complex queries over text and structured data.
Our specific interest in this thesis is how to efficiently handle queries over large-scale text and structured data. The work is based on a technology that integrates probability theory and relational algebra, where retrieval over text and data is expressed in probabilistic logical programs such as probabilistic relational algebra (PRA) or probabilistic Datalog. To support efficient processing of probabilistic logical programs, we proposed three optimization techniques covering both the logical and physical layers: scoring-driven query optimization using scoring expressions, query processing with a top-k-incorporated pipeline, and indexing with a relational inverted index.
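For readers unfamiliar with probabilistic relational algebra, the following minimal sketch shows the general flavour assumed here: each tuple carries a probability, a join multiplies probabilities under an independence assumption, and a projection combines duplicate tuples with the independent-disjunction rule 1 - ∏(1 - p). The relation names, values, and helpers are hypothetical, and the PRA dialect and scoring expressions used in the thesis may differ.

```python
from math import prod

# A probabilistic relation maps tuples to probabilities (hypothetical data).
term_in_doc = {("retrieval", "d1"): 0.6, ("database", "d1"): 0.5,
               ("retrieval", "d2"): 0.3}
query_terms = {("retrieval",): 0.8, ("database",): 0.4}


def join(left, right, left_col, right_col):
    """Equi-join on one column; probabilities multiply (independence assumed)."""
    return {l + r: pl * pr
            for l, pl in left.items()
            for r, pr in right.items()
            if l[left_col] == r[right_col]}


def project(rel, cols):
    """Project onto cols; duplicates are combined by independent disjunction."""
    groups = {}
    for tup, p in rel.items():
        groups.setdefault(tuple(tup[c] for c in cols), []).append(p)
    return {key: 1 - prod(1 - p for p in ps) for key, ps in groups.items()}


# Which documents match the query? Join on the term, then project onto the doc id.
scores = project(join(query_terms, term_in_doc, 0, 0), cols=[2])
print(scores)  # d1 combines 0.48 and 0.20 into about 0.584; d2 gets about 0.24
```

A scoring-driven optimizer needs to know which of these probability rules each operator applies, since rewriting a plan must preserve the resulting scores and not just the set of tuples.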
First, scoring expressions are proposed for expressing the scoring or probabilistic semantics of the scoring functions implied by PRA expressions, so that efficient query execution plans can be generated by a rule-based, scoring-driven optimizer. Secondly, to balance efficiency and effectiveness and thereby improve query response time, we studied methods for incorporating top-k algorithms into a pipelined query execution engine for IR+DB systems. Thirdly, the proposed relational inverted index integrates an IR-style inverted index with a DB-style tuple-based index, and can be used to support efficient probability estimation and aggregation as well as conventional relational operations.
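A relational inverted index can be pictured as an ordinary table of `(term, doc, tf)` postings that is also clustered by term. The sketch below assumes exactly that layout, a simple tf·idf score, and `heapq.nlargest` as the top-k step; it is a hypothetical illustration of serving IR-style term lookup and relational aggregation from the same tuples, not the index structure or pipelined top-k operators of the thesis.

```python
import heapq
import math
from collections import defaultdict

# Postings as relational tuples (term, doc, tf); hypothetical data.
postings = [("query", "d1", 3), ("query", "d2", 1), ("index", "d2", 1),
            ("index", "d3", 4), ("top", "d3", 1)]

# IR-style access path: tuples clustered by term, like an inverted list.
by_term = defaultdict(list)
for term, doc, tf in postings:
    by_term[term].append((doc, tf))

num_docs = len({doc for _, doc, _ in postings})


def top_k(terms, k):
    """Aggregate tf*idf per document over the query terms, then keep the k best."""
    scores = defaultdict(float)
    for term in terms:
        plist = by_term.get(term, [])
        if not plist:
            continue
        idf = math.log(num_docs / len(plist))
        for doc, tf in plist:
            scores[doc] += tf * idf
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


print([doc for doc, _ in top_k(["query", "index"], k=2)])  # ['d3', 'd1']
```

`heapq.nlargest` avoids sorting all candidates; a pipelined engine goes further by stopping early once no unseen document can enter the top k, which this sketch does not attempt.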
Experiments were carried out to investigate the performance of the proposed techniques. Experimental results showed that the efficiency and scalability of an IR+DB prototype were improved, and that the system can handle queries efficiently on considerably large data sets for a number of IR tasks.
Subsumption between queries to object-oriented databases
Most work on query optimization in relational and object-oriented databases has concentrated on tuning algebraic expressions and the physical access to the database contents. Attention to semantic query optimization, however, has been limited due to its inherent complexity. We take a second look at semantic query optimization in object-oriented databases and find that reasoning techniques for concept languages developed in Artificial Intelligence apply to this problem, because concept languages have been tailored for efficiency and their semantics is compatible with class and query definitions in object-oriented databases. We propose a query optimizer that recognizes, in polynomial time, subset relationships between a query and a view (a simpler query whose answer is stored).
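A minimal sketch of polynomial-time subsumption, assuming a deliberately small concept language: a class or query is a conjunction of atomic classes plus value restrictions of the form "every value of attribute a belongs to C" (roughly the description logic FL0). In this language a structural comparison decides subsumption in polynomial time. The `Concept` class and `subsumes` function are hypothetical, and the concept language used in the paper is richer than this.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Conjunction of atomic classes plus value restrictions (attr -> concept)."""
    atoms: frozenset = frozenset()
    only: dict = field(default_factory=dict)


def is_top(c):
    """A concept with no atoms and only trivial restrictions denotes everything."""
    return not c.atoms and all(is_top(d) for d in c.only.values())


def subsumes(general, specific):
    """Structural test for specific ⊑ general in this restricted language."""
    if not general.atoms <= specific.atoms:
        return False
    for attr, g in general.only.items():
        if is_top(g):
            continue  # a restriction to "everything" is always satisfied
        s = specific.only.get(attr)
        if s is None or not subsumes(g, s):
            return False
    return True


view = Concept(frozenset({"Student"}))
query = Concept(frozenset({"Student", "Employee"}),
                {"advisor": Concept(frozenset({"Professor"}))})
print(subsumes(view, query))  # True: the query can be answered from the view
print(subsumes(query, view))  # False
```

When `subsumes(view, query)` holds, every answer to the query is already among the view's stored answers, so the optimizer can evaluate the query by filtering the view instead of scanning the full class extent.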
The Completeness Problem of Ordered Relational Databases
Support for order in query processing is a crucial component of relational database systems, not only because the output of a query is often required to be sorted in a specific order, but also because employing order properties can significantly reduce the query execution cost. Therefore, finding an effective approach to answering queries over ordered data is important to the efficiency of query processing in relational databases.
In this dissertation, an ordered relational database model is proposed that captures both the data tuples of relations and the tuple ordering within relations. Based on this conceptual model, ordered relational queries are formally defined in a two-sorted first-order calculus, which serves as a yardstick for evaluating the expressive power of other ordered query representations.
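To illustrate what an ordered relation adds over a plain set of tuples, here is a hypothetical sketch in which an ordered relation is simply a Python list, so tuple order is part of the value. It shows three illustrative operators: a selection that preserves input order, a sort that imposes an order determined by attribute values, and a nested-loop join whose output order is determined by the positions of tuples in its inputs. These operators are assumptions for illustration and are not the ordered algebra defined in the dissertation.

```python
# An ordered relation is a list of tuples: same tuples in a different order
# form a different ordered relation.


def ordered_select(rel, pred):
    """Keep tuples satisfying pred, preserving their relative order."""
    return [t for t in rel if pred(t)]


def order_by(rel, col):
    """Reorder tuples by the value in column col (stable for ties)."""
    return sorted(rel, key=lambda t: t[col])


def ordered_join(left, right, lcol, rcol):
    """Nested-loop join: output ordered by left position, then right position."""
    return [l + r for l in left for r in right if l[lcol] == r[rcol]]


emp = [("carol", 2), ("alice", 1), ("bob", 2)]
dept = [(2, "sales"), (1, "hr")]

print(ordered_select(emp, lambda t: t[1] == 2))  # [('carol', 2), ('bob', 2)]
print(order_by(emp, 0))  # [('alice', 1), ('bob', 2), ('carol', 2)]
print(ordered_join(emp, dept, 1, 0))
```

`emp` and `order_by(emp, 0)` contain the same tuples yet differ as ordered relations, which is precisely the extra information an ordered query language has to be able to express.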
The primary purpose of this dissertation is to investigate the expressive power of different ordered query representations. In particular, the completeness problem of ordered relational algebras is studied with respect to the first-order calculus: does there exist an ordered algebra such that any first-order expressible ordered relational query can be expressed by a finite sequence of ordered operations? The significance of studying this problem lies in the fact that the completeness of ordered relational algebras makes it possible to implement a finite set of ordered operators that expresses all first-order expressible ordered queries in relational databases.
The dissertation then focuses on the completeness problem of ordered conjunctive queries. This investigation is performed incrementally: first, ordered conjunctive queries with data-decided order are considered; then, ordered conjunctive queries with t-decided order are studied; finally, the completeness problem for general ordered conjunctive queries is explored. The completeness theorem of ordered algebras is proven for all three classes of ordered conjunctive queries.
Although this ordered relational database model is only conceptual, and ordered operators are not implemented in this dissertation, we do prove that a complete set of ordered operators exists to retrieve all first-order expressible ordered queries in the three classes of ordered conjunctive queries. This research sheds light on the possibility of implementing a complete set of ordered operators in relational databases to address the performance problem of order-relevant queries.