    Preliminary results on Ontology-based Open Data Publishing

    Despite the current interest in Open Data publishing, a formal and comprehensive methodology that supports an organization in deciding which data to publish, and in carrying out precise procedures for publishing high-quality data, is still missing. In this paper we argue that the Ontology-based Data Management paradigm can provide a formal basis for a principled approach to publishing high-quality, semantically annotated Open Data. We describe two main approaches to using an ontology for this endeavor, and then we present some technical results on one of the approaches, called bottom-up, where the specification of the data to be published is given in terms of the sources, and specific techniques allow deriving suitable annotations for interpreting the published data in the light of the ontology.

    Abstraction in ontology-based data management

    In many aspects of our society there is growing awareness of, and consensus on, the need for data-driven approaches that are resilient, transparent, and fully accountable. But in order to fulfil the promises and benefits of a data-driven society, it is necessary that the data services exposed by the organisations' information systems are well-documented, and that their semantics is clearly specified. Effectively documenting data services is indeed a crucial issue for organisations, not only for governing their own data, but also for interoperation purposes. In this thesis, we propose a new approach to automatically associate formal semantic descriptions to data services, thus bringing them into compliance with the FAIR guiding principles, i.e., making data services automatically Findable, Accessible, Interoperable, and Reusable (FAIR). We base our proposal on the Ontology-based Data Management (OBDM) paradigm, where a domain ontology is used to provide a semantic layer mapped to the data sources of an organisation, thus abstracting from the technical details of the data layer implementation. The basic idea is to characterise or explain the semantics of a given data service, expressed as a query over the source schema, in terms of a query over the ontology. Thus, the query over the ontology represents an abstraction of the given data service in terms of the domain ontology through the mapping, and, together with the elements in the vocabulary of the ontology, such an abstraction forms a basis for annotating the given data service with suitable metadata expressing its semantics. We illustrate a formal framework for the task of automatically producing a semantic characterisation of a given data service expressed as a query over the source schema. The framework is based on three semantically well-founded notions, namely perfect, sound, and complete source-to-ontology rewriting, and on two associated basic computational problems, namely verification and computation. The former verifies whether a given query over the ontology is a perfect (respectively, sound, complete) source-to-ontology rewriting of a given data service expressed as a query over the source schema, whereas the latter computes one such rewriting, provided it exists. We provide an in-depth complexity analysis of these two computational problems in a very general scenario that uses some of the most popular languages considered in the literature on managing data through an ontology. Furthermore, since we also study cases where the target query language for expressing source-to-ontology rewritings allows inequality atoms, we investigate the problem of answering queries with inequalities over lightweight ontologies, a problem that has rarely been addressed. In another direction, we study and advocate the use of a non-monotonic target query language for expressing source-to-ontology rewritings. Last but not least, we provide a detailed discussion of related work, which illustrates how the results achieved in this thesis notably contribute to new results in the Semantic Web context, in relational database theory, and in view-based query processing.
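    The three notions of rewriting can be illustrated with a small Python sketch. It assumes the reading commonly used for these notions, which the abstract itself does not spell out: a rewriting is sound when the certain answers of the ontology query are contained in the answers of the source query, complete when the containment holds in the other direction, and perfect when the two sets coincide. The helper names, the toy database, and the way certain answers are supplied are all hypothetical. Checking finitely many databases can only refute a candidate rewriting; it does not prove that the notion holds over all source databases.

```python
from typing import Callable, Dict, Iterable, Optional

# Hypothetical illustration: queries are modelled as plain Python functions
# that take a source database and return a set of answer tuples.
Query = Callable[[dict], set]


def check_on_database(db: dict, source_query: Query, certain_answers: Query) -> Dict[str, bool]:
    """Compare the two answer sets on ONE source database (assumed definitions)."""
    src = set(source_query(db))
    cert = set(certain_answers(db))
    return {
        "sound": cert <= src,      # ontology-side answers are all source answers
        "complete": src <= cert,   # source answers are all ontology-side answers
        "perfect": src == cert,    # both containments hold
    }


def first_counterexample(dbs: Iterable[dict], source_query: Query,
                         certain_answers: Query, notion: str = "perfect") -> Optional[dict]:
    """Return the first database on which the chosen notion fails, if any."""
    for db in dbs:
        if not check_on_database(db, source_query, certain_answers)[notion]:
            return db
    return None


# Toy data service: "ids of employees whose role is mgr".
DB = {"emp": [("e1", "mgr"), ("e2", "dev")]}


def q_s(db: dict) -> set:
    return {(row[0],) for row in db["emp"] if row[1] == "mgr"}


# Stand-ins for the certain answers of two candidate ontology queries.
def cert_employee(db: dict) -> set:
    return {(row[0],) for row in db["emp"]}


def cert_manager(db: dict) -> set:
    return {(row[0],) for row in db["emp"] if row[1] == "mgr"}
```

    On this toy database, the candidate standing for Employee(x) is complete but not sound with respect to the data service, while the candidate standing for Manager(x) passes all three checks.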

    Semantic technology for open data publishing

    After years of focus on technologies for big data storing and processing, many observers are pointing out that making sense of big data cannot be done without suitable tools for conceptualizing, preparing, and integrating data (see http://www.dbta.com/). Research in recent years has shown that taking into account the semantics of data is crucial for devising powerful data integration solutions. In this work we focus on a specific paradigm for semantic data integration, named "Ontology-Based Data Access" (OBDA), proposed in [1-4]. According to this paradigm, the client of the information system is freed from being aware of how data and processes are structured in concrete resources (databases, software programs, services, etc.), and interacts with the system by expressing her queries and goals in terms of a conceptual representation of the domain of interest, called the ontology. More precisely, a system realizing the vision of OBDA consists of three components. The ontology, whose goal is to provide a formal, clean and high-level representation of the domain of interest, constitutes the component with which the clients of the system (both humans and software programs) interact. The data source layer represents the existing data sources in the information system, which are managed by the processes and services operating on their data. The mapping between the two layers is an explicit representation of the relationship between the data sources and the ontology, and is used to translate the operations on the ontology (e.g., query answering) into concrete actions on the data sources.
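    The three components described above can be pictured with a minimal Python sketch. Everything in it is an assumption made for illustration: the table name, the concept names, the mapping rules, and the choice to answer queries by materializing the mapped facts and closing them under the ontology. Real OBDA systems typically avoid materialization and rewrite the ontology query into queries over the sources instead.

```python
# Hypothetical toy OBDA system; all names and data are illustrative assumptions.

# Data source layer: relational tables as lists of tuples (id, name, role).
SOURCES = {
    "emp": [("e1", "Ann", "mgr"), ("e2", "Bob", "dev")],
}

# Ontology: concept inclusions "key is-a value" (a tiny fragment only).
ONTOLOGY = {"Manager": "Employee", "Developer": "Employee", "Employee": "Person"}

# Mapping: each entry turns rows of a source table into ontology facts.
MAPPING = [
    ("emp", lambda r: ("Manager", r[0]) if r[2] == "mgr" else None),
    ("emp", lambda r: ("Developer", r[0]) if r[2] == "dev" else None),
]


def virtual_facts(sources, mapping):
    """Apply the mapping to the sources, yielding ontology-level facts."""
    facts = set()
    for table, rule in mapping:
        for row in sources.get(table, []):
            fact = rule(row)
            if fact is not None:
                facts.add(fact)
    return facts


def saturate(facts, ontology):
    """Close the facts under the concept inclusions of the ontology."""
    closed = set(facts)
    frontier = list(facts)
    while frontier:
        concept, individual = frontier.pop()
        parent = ontology.get(concept)
        if parent is not None and (parent, individual) not in closed:
            closed.add((parent, individual))
            frontier.append((parent, individual))
    return closed


def answer(concept, sources=SOURCES, mapping=MAPPING, ontology=ONTOLOGY):
    """Answer the atomic ontology query concept(x) without touching table names."""
    facts = saturate(virtual_facts(sources, mapping), ontology)
    return sorted(ind for c, ind in facts if c == concept)
```

    A client asking `answer("Person")` only uses the vocabulary of the ontology; the fact that the data live in a table called `emp` with a `role` column stays hidden behind the mapping.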

    Non-Monotonic Ontology-based Abstractions of Data Services

    In Ontology-Based Data Access (OBDA), a domain ontology is linked to the data sources of an organization in order to query, integrate and manage data through the concepts and relations of the domain of interest, thus abstracting from the structure and the implementation details of the data layer. While the great majority of contributions in OBDA in the last decade have been concerned with the issue of computing the answers to queries expressed over the ontology, recent papers address a different problem, namely that of providing suitable abstractions of data services, i.e., characterizing or explaining the semantics of queries over the sources in terms of queries over the domain ontology. Current works on this subject are based on expressing abstractions in terms of unions of conjunctive queries (UCQs). In this paper we advocate the use of a non-monotonic language for this task. As a first contribution, we present a simple extension of UCQs with non-monotonic features, and show that non-monotonicity provides more expressive power in characterizing the semantics of data services. A second contribution is to prove that, similarly to the case of monotonic abstractions, depending on the expressive power of the languages used to specify the various components of the OBDA system, there are cases where neither perfect nor approximated abstractions exist for a given data service. As a third contribution, we single out interesting special cases where the existence of abstractions is guaranteed, and we present algorithms for computing such abstractions in these cases.

    Monotone Abstractions in Ontology-Based Data Management

    In Ontology-Based Data Management (OBDM), an abstraction of a source query q is a query over the ontology capturing the semantics of q in terms of the concepts and the relations available in the ontology. Since a perfect characterization of a source query may not exist, the notions of best sound and complete approximations of an abstraction have been introduced and studied in the typical OBDM context, i.e., in the case where the ontology is expressed in DL-Lite, and source queries are expressed as unions of conjunctive queries (UCQs). Interestingly, if we restrict our attention to abstractions expressed as UCQs, even best approximations of abstractions are not guaranteed to exist. Thus, a natural question to ask is whether such limitations affect even larger classes of queries. In this paper, we answer this fundamental question for an essential class of queries, namely the class of monotone queries. We define a monotone query language based on disjunctive Datalog enriched with an epistemic operator, and show that its expressive power suffices for expressing the best approximations of monotone abstractions of UCQs.

    Combining Global and Local Merges in Logic-based Entity Resolution

    In the recently proposed Lace framework for collective entity resolution, logical rules and constraints are used to identify pairs of entity references (e.g. author or paper ids) that denote the same entity. This identification is global: all occurrences of those entity references (possibly across multiple database tuples) are deemed equal and can be merged. By contrast, a local form of merge is often more natural when identifying pairs of data values, e.g. some occurrences of 'J. Smith' may be equated with 'Joe Smith', while others should merge with 'Jane Smith'. This motivates us to extend Lace with local merges of values and explore the computational properties of the resulting formalism. Comment: Accepted at KR 202
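    The difference between the two kinds of merge can be shown with a short Python sketch. It is only an illustration of the distinction drawn in the abstract, not of the Lace framework itself: the table, the column layout, and the two helper functions are hypothetical, and the rules and constraints that would justify each merge are left out.

```python
# Hypothetical table of (paper_id, author_ref, author_name) tuples.
ROWS = [
    ("p1", "a1", "J. Smith"),
    ("p2", "a2", "Joe Smith"),
    ("p3", "a3", "J. Smith"),
    ("p4", "a4", "Jane Smith"),
]


def global_merge(rows, column, keep, drop):
    """Global merge: every occurrence of `drop` in `column` becomes `keep`."""
    return [
        tuple(keep if (i == column and v == drop) else v for i, v in enumerate(row))
        for row in rows
    ]


def local_merge(rows, cell, value):
    """Local merge: only the single cell (row_index, column) is rewritten."""
    row_index, column = cell
    out = [list(row) for row in rows]
    out[row_index][column] = value
    return [tuple(row) for row in out]


# Entity references a1 and a2 are deemed equal everywhere in the table.
merged_refs = global_merge(ROWS, 1, "a1", "a2")

# Two occurrences of the same value 'J. Smith' are resolved differently.
merged_names = local_merge(ROWS, (0, 2), "Joe Smith")
merged_names = local_merge(merged_names, (2, 2), "Jane Smith")
```

    After the local merges, the first occurrence of 'J. Smith' reads 'Joe Smith' and the second reads 'Jane Smith', which a global replacement of the value could not express.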

    CQE in OWL 2 QL: A "Longest Honeymoon" Approach (extended version)

    Controlled Query Evaluation (CQE) has been recently studied in the context of Semantic Web ontologies. The goal of CQE is to conceal some query answers so as to prevent external users from inferring confidential information. In general, there exist multiple, mutually incomparable ways of concealing answers, and previous CQE approaches choose in advance which answers are visible and which are not. In this paper, instead, we study a dynamic CQE method, namely, we propose to alter the answer to the current query based on the evaluation of previous ones. We aim at a system that, besides being able to protect confidential data, is maximally cooperative, which intuitively means that it answers affirmatively to as many queries as possible; it achieves this goal by delaying answer modifications as much as possible. We also show that the behavior we get cannot be intensionally simulated through a static approach, independent of query history. Interestingly, for OWL 2 QL ontologies and policies expressed through denials, query evaluation under our semantics is first-order rewritable, and thus in AC0 in data complexity. This paves the way for the development of practical algorithms, which we also preliminarily discuss in the paper. Comment: This paper is the extended version of "P.Bonatti, G.Cima, D.Lembo, L.Marconi, R.Rosati, L.Sauro, and D.F.Savo. Controlled query evaluation in OWL 2 QL: A "Longest Honeymoon" approach" accepted for publication at ISWC 202
