12 research outputs found

    A vocabulary-independent generation framework for DBpedia and beyond

    The DBpedia Extraction Framework, the generation framework behind one of the Linked Open Data cloud’s central hubs, has limitations that lead to quality issues in the DBpedia dataset. We therefore provide a new take on the Extraction Framework: a sustainable, general-purpose Linked Data generation framework that adopts a semantics-driven approach. The proposed approach declaratively decouples the execution of extraction, transformation, and mapping rules. Among other benefits, this supports interchanging different schema annotations, whereas the current DBpedia Extraction Framework is coupled to a single ontology and can only generate a dataset with a single semantic representation. In this paper, we shed more light on the added value that this aspect brings. We provide an extracted DBpedia dataset that uses a different vocabulary, and we give users the opportunity to generate a new DBpedia dataset using a custom combination of vocabularies.

    Towards Web-Scale Collaborative Knowledge Extraction

    Mind the Cultural Gap: Bridging Language-Specific DBpedia Chapters for Question Answering

    In order to publish information extracted from language-specific pages of Wikipedia in a structured way, the Semantic Web community has started an effort to internationalize DBpedia. Language-specific DBpedia chapters can contain very different information from one language to another; in particular, they provide more details on certain topics or fill information gaps. Language-specific DBpedia chapters are well connected through instance interlinking extracted from Wikipedia. DBpedia contributors also align properties by mapping the terms used in Wikipedia to a common ontology, which enables the exploitation of information coming from language-specific DBpedia chapters. However, the mapping process is currently incomplete, it is time-consuming because it is performed manually, and it may introduce redundant terms into the ontology. In this chapter we first propose an approach to automatically extend the existing alignments, and we then present an extension of QAKiS, a system for Question Answering over Linked Data, that allows querying language-specific DBpedia chapters by relying on the above-mentioned property alignment. In the current version of QAKiS, the English, French and German DBpedia chapters are queried using a natural language interface.

    Lemonade: A Web Assistant for Creating and Debugging Ontology Lexica

    Test-driven evaluation of linked data quality

    Linked Open Data (LOD) comprises an unprecedented volume of structured data on the Web. However, these datasets are of varying quality, ranging from extensively curated datasets to crowdsourced or extracted data of often relatively low quality. We present a methodology for test-driven quality assessment of Linked Data, which is inspired by test-driven software development. We argue that vocabularies, ontologies and knowledge bases should be accompanied by a number of test cases, which help to ensure a basic level of quality. Our methodology for assessing the quality of Linked Data resources is based on a formalization of bad smells and data quality problems. This formalization employs SPARQL query templates, which are instantiated into concrete quality test case queries. Based on an extensive survey, we compile a comprehensive library of data quality test case patterns. We perform automatic test case instantiation based on schema constraints or semi-automatically enriched schemata, and we allow the user to generate specific test case instantiations that are applicable to a schema or dataset. We provide an extensive evaluation of five LOD datasets, manual test case instantiation for five schemas, and automatic test case instantiations for all available schemata registered with Linked Open Vocabularies (LOV). One of the main advantages of our approach is that domain-specific semantics can be encoded in the data quality test cases, which makes it possible to discover data quality problems beyond conventional quality heuristics.
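
The idea of instantiating a SPARQL query template into a concrete test case query can be illustrated with a minimal sketch. This is not the pattern library or tooling from the work above: the pattern, the placeholder names `T1` and `P1`, and the helper `instantiate` are hypothetical and chosen only for illustration. The sketch assumes a test case query returns the resources that violate a constraint, here "every resource of a given class should have a given property".

```python
from string import Template

# Hypothetical pattern (an assumption, not taken from the paper):
# select resources of class $T1 that lack any value for property $P1.
PATTERN = Template(
    "SELECT DISTINCT ?s WHERE {\n"
    "  ?s a <$T1> .\n"
    "  FILTER NOT EXISTS { ?s <$P1> ?v }\n"
    "}"
)


def instantiate(pattern: Template, bindings: dict[str, str]) -> str:
    """Fill every placeholder in the pattern.

    Template.substitute raises KeyError if a placeholder has no binding,
    so an incomplete test case fails early instead of yielding a broken query.
    """
    return pattern.substitute(bindings)


# One concrete test case: persons without a birth date.
query = instantiate(PATTERN, {
    "T1": "http://dbpedia.org/ontology/Person",
    "P1": "http://dbpedia.org/ontology/birthDate",
})
print(query)
```

Running the resulting query against a SPARQL endpoint would list the violating resources; the same pattern could be reused with other class and property bindings derived from schema constraints.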

    MEPDaW+LDQ Preface

    This joint volume of proceedings gathers papers from the 2nd Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW) and the 3rd Workshop on Linked Data Quality (LDQ), held on 30 May 2016 during the 13th ESWC conference in Anissaras, Crete, Greece.

    Linked Data Cleansing and Change Management
