94 research outputs found

    Updating DL-Lite ontologies through first-order queries

    In this paper we study instance-level update in DL-LiteA, the description logic underlying the OWL 2 QL standard. In particular, we focus on formula-based approaches to ABox insertion and deletion. We show that DL-LiteA, which is well known for enjoying first-order rewritability of query answering, also enjoys first-order rewritability for updates. That is, every update can be reformulated into a set of insertion and deletion instructions computable through a nonrecursive datalog program. Such a program is readily translatable into a first-order query over the ABox considered as a database, and hence into SQL. By exploiting this result, we implement an update component for DL-LiteA-based systems and perform some experiments showing that the approach works in practice. Peer reviewed. Postprint (author's final draft).

    Using Ontologies for Semantic Data Integration

    While big data analytics is considered one of the most important paths to competitive advantage for today’s enterprises, data scientists spend a comparatively large amount of time in the data preparation and data integration phase of a big data project. This shows that data integration is still a major challenge in IT applications. Over the past two decades, the idea of using semantics for data integration has become increasingly crucial, and has received much attention in the AI, database, web, and data mining communities. Here, we focus on a specific paradigm for semantic data integration, called Ontology-Based Data Access (OBDA). The goal of this paper is to provide an overview of OBDA, pointing out both the techniques that are at the basis of the paradigm and the main challenges that remain to be addressed.

    Automatic & Semi-Automatic Methods for Supporting Ontology Change


    Fusing Automatically Extracted Annotations for the Semantic Web

    This research focuses on the problem of semantic data fusion. Although various solutions have been developed in the research communities focusing on databases and formal logic, the choice of an appropriate algorithm is non-trivial because the performance of each algorithm and its optimal configuration parameters depend on the type of data to which the algorithm is applied. In order to be reusable, the fusion system must be able to select appropriate techniques and use them in combination. Moreover, because of the varying reliability of data sources and of the algorithms performing fusion subtasks, uncertainty is an inherent feature of semantically annotated data and has to be taken into account by the fusion system. Finally, schema heterogeneity can have a negative impact on fusion performance. To address these issues, we propose KnoFuss: an architecture for Semantic Web data integration based on the principles of problem-solving methods. Algorithms dealing with different fusion subtasks are represented as components of a modular architecture, and their capabilities are described formally. This allows the architecture to select appropriate methods and configure them depending on the processed data. In order to handle uncertainty, we propose a novel algorithm based on Dempster-Shafer belief propagation. KnoFuss employs this algorithm to reason about uncertain data and method results in order to refine the fused knowledge base. Tests show that these solutions lead to improved fusion performance. Finally, we addressed the problem of data fusion in the presence of schema heterogeneity. We extended the KnoFuss framework to exploit the results of automatic schema alignment tools and proposed our own schema matching algorithm aimed at facilitating data fusion in the Linked Data environment. We conducted experiments with this approach and obtained a substantial improvement in performance in comparison with public data repositories.
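
    As a rough illustration of the kind of uncertainty handling mentioned in this abstract, the sketch below implements Dempster's rule of combination, the standard operation for merging two belief (mass) assignments in Dempster-Shafer theory. It is not the KnoFuss belief-propagation algorithm itself, which the abstract does not detail; the function name `combine_masses` and the "same"/"different" hypotheses are assumptions made for this example only.

```python
from itertools import product


def combine_masses(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to a mass in [0, 1];
    the masses of one function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            # Agreeing evidence supports the intersection of both sets.
            combined[intersection] = combined.get(intersection, 0.0) + mass_a * mass_b
        else:
            # Disjoint sets contribute to the conflict mass.
            conflict += mass_a * mass_b
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; combination undefined")
    # Renormalise so the remaining masses sum to 1.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Hypothetical example: two matchers judging whether two records co-refer.
SAME = frozenset({"same"})
DIFFERENT = frozenset({"different"})
UNKNOWN = frozenset({"same", "different"})

matcher_a = {SAME: 0.6, UNKNOWN: 0.4}
matcher_b = {DIFFERENT: 0.5, UNKNOWN: 0.5}
fused = combine_masses(matcher_a, matcher_b)
```

    Mass assigned to the full set (`UNKNOWN` here) expresses ignorance rather than evidence for either hypothesis, which is what lets sources of varying reliability be weighted differently.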

    Maintaining Integrity Constraints in Semantic Web

    As an expressive knowledge representation language for the Semantic Web, the Web Ontology Language (OWL) plays an important role in areas like science and commerce. The problem of maintaining integrity constraints arises because OWL employs the Open World Assumption (OWA) as well as the Non-Unique Name Assumption (NUNA). These assumptions are typically suitable for representing knowledge distributed across the Web, where complete knowledge about a domain cannot be assumed, but they make it challenging to use OWL itself for closed-world integrity constraint validation. Integrity constraints (ICs) on ontologies have to be enforced; otherwise conflicting results would be derivable from the same knowledge base (KB). Current approaches to incorporating ICs into OWL are based on its query language SPARQL, alternative semantics, or logic programming. These methods usually handle only limited types of constraints and/or inherit high computational cost. This dissertation presents a comprehensive and efficient approach to maintaining integrity constraints. The design enforces data consistency throughout the OWL life cycle, including the processes of OWL generation, maintenance, and interaction with other ontologies. For OWL generation, the Paraconsistent model is used to maintain integrity constraints during the relational database to OWL translation process. Then a new rule-based language with set extension is introduced as a platform to allow users to specify constraints, along with a demonstration of 18 commonly used constraints written in this language. In addition, a new constraint maintenance system, called Jena2Drools, is proposed and implemented to show its effectiveness and efficiency. To further handle inconsistencies among multiple distributed ontologies, this work constructs a framework to break down global constraints into several sub-constraints for efficient parallel validation.

    Pattern-based design applied to cultural heritage knowledge graphs

    Ontology Design Patterns (ODPs) have become an established and recognised practice for guaranteeing good quality ontology engineering. There are several ODP repositories where ODPs are shared, as well as ontology design methodologies recommending their reuse. Rigorous testing is also recommended for supporting ontology maintenance and validating the resulting resource against its motivating requirements. Nevertheless, it is less than straightforward to find guidelines on how to apply such methodologies for developing domain-specific knowledge graphs. ArCo is the knowledge graph of Italian Cultural Heritage (CH) and has been developed by using eXtreme Design (XD), an ODP- and test-driven methodology. During its development, XD has been adapted to the needs of the CH domain, e.g. by gathering requirements from an open, diverse community of consumers; a new ODP has been defined, and many have been specialised to address specific CH requirements. This paper presents ArCo and describes how to apply XD to the development and validation of a CH knowledge graph, also detailing the (intellectual) process implemented for matching the encountered modelling problems to ODPs. Relevant contributions also include a novel web tool for supporting unit-testing of knowledge graphs, a rigorous evaluation of ArCo, and a discussion of methodological lessons learned during ArCo development.

    Inference as a data management problem

    Inference over OWL ontologies with large A-Boxes has been researched as a data management problem in recent years. This work adopts the strategy of applying a tableaux-based reasoner for complete T-Box classification, and using a rule-based mechanism for scalable A-Box reasoning. Specifically, we establish for the classified T-Box an inference framework, which can be used to compute and materialise inference results. The inference we focus on is type inference in A-Box reasoning, which we define as the process of deriving for each A-Box instance its memberships of OWL classes and properties. As our approach materialises the inference results, it in general provides faster query processing than non-materialising techniques, at the expense of larger space requirements and slower updates. When the A-Box size is suitable for an RDBMS, we compile the inference framework to triggers, which incrementally update the inference materialisation from both data inserts and data deletes, without needing to re-compute the whole inference. More importantly, triggers make inference available as atomic consequences of inserts or deletes, which preserves the ACID properties of transactions; such inference is known as transactional reasoning. When the A-Box size is beyond the capability of an RDBMS, we compile the inference framework to Spark programmes, which provide scalable inference materialisation in a Big Data system, and our evaluation considers reasoning over up to 270 million A-Box facts. Evaluating our work, and comparing with two state-of-the-art reasoners, we empirically verify that our approach is able to perform scalable inference materialisation, and to provide faster query processing with comparable completeness of reasoning. Open Access.

    Case study: A semantic search interface for an intelligent network management system (Tapaustutkimus: Semanttinen hakukäyttöliittymä älykkääseen verkonhallintajärjestelmään)

    The thesis is part of a project which aims to enhance automation in mobile network management with a statistical reasoner. The statistical reasoner analyses given network data and creates configuration proposals for the network. This thesis is a case study in building a semantic search interface which demonstrates how the statistical reasoner behaves in the mobile network. The demonstrator is implemented as a faceted search interface which uses Semantic Web technologies for data management and HTML 5 for the graphical user interface. The objective of the case study is to discover the information need of the user and to find methods to provide the needed information. The final implementation presents three separate faceted search views in order to answer the information need: one for describing the relation between the input and output of the reasoner, one for the relation between the output and its impact on the network, and one for the rule base of the reasoner. The evaluation of the implementation shows that the chosen techniques and methods support the user in exploring the relevant network- and reasoner-related information. The evaluation also revealed some deficiencies, which provided valuable information for the case study. Finally, this case study produced new thoughts with respect to the initial objectives and to future directions for the project.

    Implementation of Web Query Languages Reconsidered

    Visions of the next generation Web such as the "Semantic Web" or the "Web 2.0" have triggered the emergence of a multitude of data formats. These formats have different characteristics as far as the shape of data is concerned (for example tree- vs. graph-shaped). They are accompanied by a puzzlingly large number of query languages, each limited to one data format. Thus, a key feature of the Web, namely to make it possible to access anything published by anyone, is compromised. This thesis is devoted to versatile query languages capable of accessing data in a variety of Web formats. The issue is addressed from three angles: language design, common, yet uniform semantics, and common, yet uniform evaluation. Accordingly, the thesis is divided into three parts. First, we consider the query language Xcerpt as an example of the advocated class of versatile Web query languages. Using this concrete exemplar allows us to clarify and discuss the vision of versatility in detail. Second, a number of query languages, XPath, XQuery, SPARQL, and Xcerpt, are translated into a common intermediary language, CIQLog. This language has a purely logical semantics, which makes it easily amenable to optimizations. As a side effect, this provides, to the best of our knowledge, the first logical semantics for XQuery and SPARQL. It is a very useful tool for understanding the commonalities and differences of the considered languages. Third, the intermediate logical language is translated into a query algebra, CIQCAG. The core feature of CIQCAG is that it scales from tree- to graph-shaped data and queries without efficiency losses when tree-data and -queries are considered: it is shown that, in these cases, optimal complexities are achieved. CIQCAG is also shown to evaluate each of the aforementioned query languages with a complexity at least as good as the best known evaluation methods so far. 
For example, navigational XPath is evaluated with space complexity O(q d) and time complexity O(q n), where q is the query size, n the data size, and d the depth of the (tree-shaped) data. CIQCAG is further shown to provide linear time and space evaluation of tree-shaped queries for a larger class of graph-shaped data than any method previously proposed. This larger class of graph-shaped data, called continuous-image graphs (CIGs for short), is introduced for the first time in this thesis. A (directed) graph is a CIG if its nodes can be totally ordered in such a manner that, for this order, the children of any node form a continuous interval. CIQCAG achieves these properties by employing a novel data structure, called sequence map, that allows an efficient evaluation of tree-shaped queries, or of tree-shaped cores of graph-shaped queries, on any graph-shaped data. While being ideally suited to trees and CIGs, the data structure gracefully degrades to unrestricted graphs. It yields a remarkably efficient evaluation on graph-shaped data that only a few edges prevent from being trees or CIGs.
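
The continuous-image graph (CIG) definition in this abstract can be made concrete with a small sketch. The function below only checks whether one given total order of the nodes witnesses the CIG property, i.e. whether every node's children occupy a contiguous run of positions; it does not search for such an order and is not taken from the thesis. The function name, the adjacency-dictionary representation, and the example graph are assumptions made for illustration.

```python
def is_continuous_image_order(order, children):
    """Return True if, under `order`, each node's children form one interval.

    order:    list of all nodes, giving the candidate total order
    children: dict mapping a node to an iterable of its child nodes
    """
    position = {node: index for index, node in enumerate(order)}
    for node in order:
        kids = set(children.get(node, ()))
        if not kids:
            continue
        positions = [position[kid] for kid in kids]
        # Distinct positions are contiguous exactly when their span
        # equals their count.
        if max(positions) - min(positions) + 1 != len(kids):
            return False
    return True


# Hypothetical example: a small DAG in which "e" has two parents.
graph = {"a": ["b", "c"], "b": ["d", "e"], "c": ["e", "f"]}

good_order = ["a", "b", "c", "d", "e", "f"]  # children of every node are adjacent
bad_order = ["a", "b", "d", "c", "e", "f"]   # "d" separates a's children "b" and "c"

print(is_continuous_image_order(good_order, graph))
print(is_continuous_image_order(bad_order, graph))
```

A graph is a CIG if at least one order passes this check; a breadth-first order of a tree always does, which is consistent with the abstract's remark that trees are a special case.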

    Community-Driven Engineering of the DBpedia Infobox Ontology and DBpedia Live Extraction

    The DBpedia project aims at extracting information based on semi-structured data present in Wikipedia articles, interlinking it with other knowledge bases, and publishing this information as RDF freely on the Web. So far, the DBpedia project has succeeded in creating one of the largest knowledge bases on the Data Web, which is used in many applications and research prototypes. However, the manual effort required to produce and publish a new version of the dataset, which was already partially outdated the moment it was released, has been a drawback. Additionally, the maintenance of the DBpedia Ontology, an ontology serving as a structural backbone for the extracted data, made the release cycles even more heavyweight. In the course of this thesis, we make two contributions. Firstly, we develop a wiki-based solution for maintaining the DBpedia Ontology. By allowing anyone to edit, we aim to distribute the maintenance work among the DBpedia community. Secondly, we extend DBpedia with a Live Extraction Framework, which is capable of extracting RDF data from articles that have recently been edited on the English Wikipedia. By making this RDF data automatically public in near real time, namely via SPARQL and Linked Data, we overcome many of the drawbacks of the former release cycles.