
    Logic, Languages, and Rules for Web Data Extraction and Reasoning over Data

    This paper gives a short overview of specific logical approaches to data extraction, data management, and reasoning about data. In particular, we survey theoretical results and formalisms that have been obtained and used in the context of the Lixto Project at TU Wien, the DIADEM project at the University of Oxford, and the VADA project, which is currently being carried out jointly by the universities of Edinburgh, Manchester, and Oxford. We start with a formal approach to web data extraction rooted in monadic second-order logic and monadic Datalog, which gave rise to the Lixto data extraction system. We then present some complexity results for monadic Datalog over trees and for XPath query evaluation. We further argue that for value creation and for ontological reasoning over data, we need existential quantifiers (or Skolem terms) in rule heads, and introduce the Datalog± family. We give an overview of important members of this family and discuss related complexity issues.
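    To make concrete the kind of rule that motivates the Datalog± family, the following is a minimal illustrative example (the predicate names are ours, not taken from the survey): a tuple-generating dependency whose head existentially quantifies a fresh value, something plain Datalog cannot express because Datalog rule heads may only reuse variables from the body.

        \[ \forall X \, \big( \mathit{person}(X) \;\rightarrow\; \exists Y \, \mathit{fatherOf}(Y, X) \big) \]

    Members of the family such as linear and guarded Datalog± restrict the syntactic shape of such rules in order to keep query answering decidable.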

    Ontological query answering under expressive entity-relationship schemata

    The Entity–Relationship (ER) model is a fundamental tool for database design, recently extended and employed in knowledge representation and reasoning due to its expressiveness and comprehensibility. We address the problem of answering conjunctive queries under constraints representing schemata expressed in an extended version of the Entity–Relationship model. This extended model, called ER+, comprises is-a constraints among entities and relationships, plus functional and mandatory participation constraints. In particular, it allows for arbitrary permutations of the roles in is-a among relationships. A key notion that ensures high tractability in ER+ schemata is separability, i.e., the absence of interaction between the functional participation constraints and the other constructs of ER+. We provide a precise syntactic characterization of separable ER+ schemata by means of a necessary and sufficient condition. We present a complete complexity analysis of the conjunctive query answering problem under separable ER+ schemata, and also under several sublanguages of ER+. We show that the addition of so-called negative constraints does not increase the complexity of query answering. With such constraints, our model properly generalizes the most widely adopted tractable ontology languages, including those in the DL-Lite family.
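    As a sketch of how such constraints can be read as logical dependencies (the relation names are illustrative and this is only one possible encoding), a mandatory participation constraint becomes a rule with an existential head, an is-a between relationships with a role permutation swaps argument positions, and a functional participation constraint becomes an equality-generating dependency:

        \[ \mathit{employee}(X) \;\rightarrow\; \exists Y \, \mathit{worksFor}(X, Y) \]
        \[ \mathit{manages}(X, Y) \;\rightarrow\; \mathit{worksFor}(Y, X) \]
        \[ \mathit{worksFor}(X, Y) \wedge \mathit{worksFor}(X, Y') \;\rightarrow\; Y = Y' \]

    Separability, informally, guarantees that constraints of the third kind can be checked on the given database and then ignored during query answering, because they never interact with the values invented by the existential rules.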

    Polynomial combined first-order rewritings for linear and guarded existential rules

    We consider the problem of ontological query answering, that is, the problem of answering a database query (typically a conjunctive query) in the presence of an ontology. This means that during the query answering process we also need to take into account the knowledge that can be inferred from the given database and ontology. However, building ontology-aware database systems from scratch, with sophisticated optimization techniques, is a highly non-trivial task that requires a great engineering effort. Therefore, exploiting conventional database systems is an important route towards efficient ontological query answering. Nevertheless, standard database systems are unaware of ontologies. An approach to ontological query answering that enables the use of standard database systems is the so-called polynomial combined query rewriting, originally introduced in the context of description logics: the conjunctive query q and the ontology Σ are rewritten in polynomial time into a first-order query qΣ (in a database-independent way), while the database D and the ontology Σ are rewritten in polynomial time into a new database DΣ (in a query-independent way), such that the answer to q in the presence of Σ over D coincides with the answer to qΣ over DΣ. The latter can then be computed by exploiting a conventional database system. In this work, we focus on linear and guarded existential rules, which form robust rule-based languages for modeling ontologies, and investigate the limits of polynomial combined query rewriting. In particular, we show that this type of rewriting can be successfully applied to (i) linear existential rules when the rewritten query can use the full power of first-order queries, (ii) linear existential rules when the arity of the underlying schema is fixed and the rewritten query is positive existential, namely it uses only existential quantification, conjunction, and disjunction, and (iii) guarded existential rules when the underlying schema is fixed and the rewritten query is positive existential. We show that the above results reach the limits (under standard complexity-theoretic assumptions) of polynomial combined query rewriting in the case of linear and guarded existential rules.
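    Restated schematically (the notation below is ours and merely summarizes the description above), polynomial combined rewriting asks for two polynomial-time translations, one acting only on the query and the ontology and one acting only on the database and the ontology, such that

        \[ \mathit{cert}(q, \Sigma, D) \;=\; \mathit{ans}(q_{\Sigma}, D_{\Sigma}), \qquad q_{\Sigma} = f(q, \Sigma), \quad D_{\Sigma} = g(D, \Sigma), \]

    where cert denotes the certain answers to q over D under Σ, and ans denotes ordinary evaluation of the rewritten first-order (or positive existential) query over the rewritten database, which a conventional database system can perform.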

    A framework for information integration using ontological foundations

    With the increasing amount of data, the ability to integrate information has become a competitive advantage in information management. Semantic heterogeneity reconciliation is an important challenge in many information interoperability applications such as data exchange and data integration. Despite a large amount of research in this area, the lack of theoretical foundations behind semantic heterogeneity reconciliation techniques has resulted in many ad-hoc approaches. In this thesis, I address this issue by providing ontological foundations for semantic heterogeneity reconciliation in information integration. In particular, I investigate fundamental semantic relations between properties from an ontological point of view and show how one of the basic and natural relations between properties – inferring implicit properties from existing properties – can be used to enhance information integration. These ontological foundations are exploited in four aspects of information integration. First, I propose novel algorithms for the semantic enrichment of schema mappings. Second, using correspondences between similar properties at different levels of abstraction, I propose a configurable data integration system in which query rewriting techniques allow a trade-off between accuracy and completeness in query answering. Third, to preserve semantics in data exchange, I propose an entity-preserving data exchange approach that reflects source entities in the target independently of how the entities are classified. Finally, to improve the efficiency of the data exchange approach proposed in this thesis, I propose an extension of the column-store model called the sliced column store. Working prototypes of the techniques proposed in this thesis have been implemented to show their feasibility. Experiments performed on various datasets show that the techniques proposed in this thesis outperform many existing techniques in their ability to handle semantic heterogeneities and in the performance of information exchange.
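    As a purely illustrative example of the property-inference relation the thesis builds on (the property names here are hypothetical and not drawn from the text), an implicit property can be derived from explicitly stored ones by a rule such as

        \[ \mathit{locatedInCity}(X, C) \wedge \mathit{cityInCountry}(C, K) \;\rightarrow\; \mathit{locatedInCountry}(X, K) \]

    and a schema mapping or query rewriting that is aware of this relation can match a source exposing only locatedInCity against a target that asks for locatedInCountry.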