14 research outputs found

    Ontology Based Data Access in Statoil

    Ontology Based Data Access (OBDA) is a prominent approach to querying databases which uses an ontology to expose data in a conceptually clear manner by abstracting away from the technical schema-level details of the underlying data. The ontology is ‘connected’ to the data via mappings that allow queries posed over the ontology to be translated automatically into data-level queries that can be executed by the underlying database management system. Despite a lot of attention from the research community, there are still few instances of real-world industrial use of OBDA systems. In this work we present data access challenges in the data-intensive petroleum company Statoil and our experience in addressing these challenges with OBDA technology. In particular, we have developed a deployment module to create ontologies and mappings from relational databases in a semi-automatic fashion; a query processing module to perform and optimise the process of translating ontological queries into data queries and their execution over either a single DB or federated DBs; and a query formulation module to support query construction for engineers with a limited IT background. Our modules have been integrated in one OBDA system, deployed at Statoil, integrated with Statoil’s infrastructure, and evaluated with Statoil’s engineers and data.
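
    The mapping-based translation described in this abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the table `wellbore_tbl`, the ontology terms `:Wellbore` and `:hasName`, the `wb/` IRI prefix, and the helper names are hypothetical, and the code is not the Statoil system or any real OBDA engine. It only shows the general idea of replacing each ontology term by the SQL of its mapping and joining the results.

```python
import sqlite3

# Hypothetical mappings: each ontology term is defined by an SQL query
# over the (equally hypothetical) relational schema.
MAPPINGS = {
    ":Wellbore": "SELECT 'wb/' || id AS x FROM wellbore_tbl",
    ":hasName": "SELECT 'wb/' || id AS x, name AS y FROM wellbore_tbl",
}


def translate(class_term, property_term):
    """Translate the ontology-level pattern
    '?x a <class_term> . ?x <property_term> ?y' into SQL by joining
    the subqueries that the mappings associate with each term."""
    return (
        f"SELECT c.x, p.y FROM ({MAPPINGS[class_term]}) AS c "
        f"JOIN ({MAPPINGS[property_term]}) AS p ON c.x = p.x "
        "ORDER BY c.x"
    )


def run_demo():
    """Build a tiny in-memory database and answer the ontology query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wellbore_tbl (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT INTO wellbore_tbl VALUES (?, ?)", [(1, "A-1"), (2, "B-7")]
    )
    rows = conn.execute(translate(":Wellbore", ":hasName")).fetchall()
    conn.close()
    return rows
```

    The user of such a sketch only names ontology terms; the schema-level details stay inside `MAPPINGS`. A production system would additionally rewrite the query with respect to the ontology axioms and optimise the generated SQL, which this sketch does not attempt.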

    Handling redundant processing in OBDA query execution over relational sources

    Redundant processing is a key problem in the translation of initial queries posed over an ontology into SQL queries, through mappings, as it is performed by ontology-based data access systems. Examples of such processing are duplicate answers obtained during query evaluation, which must finally be discarded, or common expressions evaluated multiple times from different parts of the same complex query. Many optimizations that aim to minimize this problem have been proposed and implemented, mostly based on semantic query optimization techniques, by exploiting ontological axioms and constraints defined in the database schema. However, data operations that introduce redundant processing are still generated in many practical settings, and this is a factor that impacts query execution. In this work we propose a cost-based method for query translation, which starts from an initial result and uses information about redundant processing in order to come up with an equivalent, more efficient translation. The method operates in a number of steps, by relying on certain heuristics indicating that we obtain a more efficient query in each step. Through experimental evaluation using the Ontop system for ontology-based data access, we exhibit the benefits of our method. © 2021 Elsevier B.V

    In-memory parallelization of join queries over large ontological hierarchies

    The Resource Description Framework (RDF) data model enables the construction of knowledge graphs over various domains, using ontologies in order to encode information about the domain, and simple statements in the form of subject-predicate-object triples for data representation, facilitating the interlinking and exchange of Web data. However, this simplicity comes at the cost of having to execute a large number of joins in order to get the desired query results, while at the same time large ontological hierarchies complicate the query answering process even more, for systems that provide complete answers with respect to such ontological axioms. In this work we present PARJ, an in-memory RDF store which takes into consideration ontological hierarchies during join processing with very low performance overhead, avoiding expensive preprocessing and materialization of implications, and is also amenable to straightforward parallelization. Specifically, we present a join implementation that achieves any desired degree of parallelism on arbitrary join queries and RDF graphs stored in memory using compact vertical partitioning. We use an adaptive join processing approach, such that we take advantage of complete or even partial ordering of RDF data, which is compactly stored in order to increase spatial locality and keep memory consumption low, coupled with an ID-to-Position vector index used when ordering does not allow for efficient scanning of the input relation. Finally, we experimentally show the efficiency and scalability of our proposal. © 2020, Springer Science+Business Media, LLC, part of Springer Nature

    Enriching OWL Ontologies with Linguistic and User-Related Annotations: The ELEON System


    Efficient Ontology-Based Data Integration with Canonical IRIs

    In this paper, we study how to efficiently integrate multiple relational databases using an ontology-based approach. In ontology-based data integration (OBDI) an ontology provides a coherent view of multiple databases, and SPARQL queries over the ontology are rewritten into (federated) SQL queries over the underlying databases. Specifically, we address the scenario where records with different identifiers in different databases can represent the same entity. The standard approach in this case is to use sameAs to model the equivalence between entities. However, the standard semantics of sameAs may cause an exponential blow up of query results, since all possible combinations of equivalent identifiers have to be included in the answers. The large number of answers is not only detrimental to the performance of query evaluation, but also makes the answers difficult to understand due to the redundancy they introduce. This motivates us to propose an alternative approach, which is based on assigning canonical IRIs to entities in order to avoid redundancy. Formally, we present our approach as a new SPARQL entailment regime and compare it with the sameAs approach. We provide a prototype implementation and evaluate it in two experiments: in a real-world data integration scenario in Statoil and in an experiment extending the Wisconsin benchmark. The experimental results show that the canonical IRI approach is significantly more scalable. © 2018, Springer International Publishing AG, part of Springer Nature
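
    The blow-up of answers under sameAs mentioned in this abstract can be made concrete with a small sketch. The IRIs and entity names are invented for illustration, and picking the lexicographically smallest IRI as the canonical one is purely an assumption of this sketch; the abstract does not say how canonical IRIs are chosen, and the code does not implement the proposed entailment regime.

```python
from itertools import product

# Hypothetical data: each real-world entity is known under several IRIs
# in different databases.
SAME_AS_CLASSES = {
    "well_1": ["db1:W1", "db2:W-001", "db3:well/1"],
    "field_7": ["db1:F7", "db2:F-007"],
}

# One logical answer: the pair (well_1, field_7).
LOGICAL_ANSWERS = [("well_1", "field_7")]


def answers_with_same_as(answers, classes):
    """Expand every answer into all combinations of equivalent IRIs,
    as the standard sameAs semantics requires."""
    expanded = []
    for row in answers:
        expanded.extend(product(*(classes[entity] for entity in row)))
    return expanded


def answers_with_canonical_iris(answers, classes):
    """Return one row per answer, using a single representative IRI
    per entity (here: the smallest IRI, an assumption of this sketch)."""
    canonical = {entity: min(iris) for entity, iris in classes.items()}
    return [tuple(canonical[entity] for entity in row) for row in answers]
```

    With 3 and 2 equivalent identifiers, the single logical answer becomes 3 × 2 = 6 rows under sameAs but stays one row with canonical IRIs. For an answer with k columns whose entities each have n equivalent identifiers, the sameAs expansion yields n^k rows, which is the exponential growth the abstract refers to.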

    Distributed Query Processing on the Cloud: the Optique Point of View (Short Paper)

    The Optique European project [6] aims at providing an end-to-end solution for scalable access to Big Data integration, where end users will formulate queries based on a familiar conceptualization of the underlying domain. From the users’ queries the Optique platform will automatically generate appropriate queries over the underlying integrated data, optimize and execute them on the Cloud. In this paper we present the distributed query processing engine of the Optique platform. The efficient execution of complex queries posed by end users is an important and challenging task. The engine aims at providing a scalable solution for query execution in the Cloud, and should cope with heterogeneity of data sources as well as with temporal and streaming data.

    Adaptive Natural Language Interaction

    The subject of this demonstration is natural language interaction, focusing on adaptivity and profiling of the dialogue management and the generated output (text and speech). These are demonstrated in a museum guide use-case, operating in a simulated environment. The main technical innovations presented are the profiling model, the dialogue and action management system, and the text generation and speech synthesis systems.

    Template-Based Question Answering over Linked Geospatial Data

    Large amounts of geospatial data have been made available recently on the linked open data cloud and on the portals of many national cartographic agencies (e.g., OpenStreetMap data, administrative geographies of various countries, or land cover/land use data sets). These datasets use various geospatial vocabularies and can be queried using SPARQL or its OGC-standardized extension GeoSPARQL. In this paper we go beyond these approaches to offer a question answering service on top of linked geospatial data sources. Our system has been implemented as re-usable components of the Qanary question answering architecture to provide benefits for future research tasks. We give a detailed description of the architecture of the system, its underlying algorithms and its evaluation using a set of 201 natural language questions.

    Ontology Based Data Access in Statoil

    Ontology Based Data Access (OBDA) is a prominent approach to querying databases which uses an ontology to expose data in a conceptually clear manner by abstracting away from the technical schema-level details of the underlying data. The ontology is ‘connected’ to the data via mappings that allow queries posed over the ontology to be translated automatically into data-level queries that can be executed by the underlying database management system. Despite a lot of attention from the research community, there are still few instances of real-world industrial use of OBDA systems. In this work we present data access challenges in the data-intensive petroleum company Statoil and our experience in addressing these challenges with OBDA technology. In particular, we have developed a deployment module to create ontologies and mappings from relational databases in a semi-automatic fashion; a query processing module to perform and optimise the process of translating ontological queries into data queries and their execution over either a single DB or federated DBs; and a query formulation module to support query construction for engineers with a limited IT background. Our modules have been integrated in one OBDA system, deployed at Statoil, integrated with Statoil's infrastructure, and evaluated with Statoil's engineers and data. © 2017 Elsevier B.V

    Optique: Towards OBDA systems for industry

    The recently started EU FP7-funded project Optique will develop an end-to-end OBDA system providing scalable end-user access to industrial Big Data stores. This paper presents an initial architectural specification for the Optique system along with the individual system components. © Springer-Verlag 2013