61 research outputs found

    Using Crowdsourcing for Fine-Grained Entity Type Completion in Knowledge Bases

    Recent years have witnessed the proliferation of large-scale Knowledge Bases (KBs). However, many entities in KBs have incomplete type information, and some are entirely untyped. Worse, fine-grained types (e.g., BasketballPlayer), which carry rich semantic meaning, are more likely to be incomplete because they are harder to obtain. Existing machine-based algorithms use the predicates of entities (e.g., birthPlace) to infer their missing types, but they are limited in that the predicates may be insufficient to infer fine-grained types. In this paper, we use crowdsourcing to solve the problem and address the challenge of controlling crowdsourcing cost. To this end, we propose a hybrid machine-crowdsourcing approach for fine-grained entity type completion. It first determines the types of some “representative” entities via crowdsourcing and then infers the types of the remaining entities from the crowdsourcing results. To support this approach, we first propose an embedding-based influence for type inference, which considers not only the distances between entity embeddings but also the distances between entity and type embeddings. Second, we propose a new difficulty model for entity selection that better captures the uncertainty of the machine algorithm when identifying entity types. We demonstrate the effectiveness of our approach through experiments on real crowdsourcing platforms. The results show that our method outperforms state-of-the-art algorithms, improving the effectiveness of fine-grained type completion at affordable crowdsourcing cost. Peer reviewed
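
    The embedding-based influence described above can be pictured with a minimal sketch. It assumes entity and type embeddings are already available as plain vectors and combines the two distances the abstract mentions (entity to crowd-labeled entity, and entity to type) through an exponential kernel and a mixing weight `alpha`. The kernel, `alpha`, and all function names are hypothetical choices for illustration, not the paper's actual formulation.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def influence(entity, labeled_entity, type_vec, alpha=0.5):
    """Hypothetical influence of a crowd-labeled entity on an unlabeled one.

    Mixes entity-entity proximity with entity-type proximity; the
    exponential kernel and the weight `alpha` are assumptions.
    """
    entity_sim = math.exp(-euclidean(entity, labeled_entity))
    type_sim = math.exp(-euclidean(entity, type_vec))
    return alpha * entity_sim + (1 - alpha) * type_sim


def infer_type_scores(entity, crowd_results, type_vecs, alpha=0.5):
    """Sum the influence of every crowd-labeled entity into per-type scores.

    crowd_results: list of (entity_vector, type_name) pairs answered by the crowd.
    type_vecs: mapping from type name to its embedding vector.
    """
    scores = {}
    for labeled_vec, type_name in crowd_results:
        score = influence(entity, labeled_vec, type_vecs[type_name], alpha)
        scores[type_name] = scores.get(type_name, 0.0) + score
    return scores


# Toy 2-D embeddings, made up for illustration.
type_vecs = {"BasketballPlayer": [1.0, 0.0], "Politician": [0.0, 1.0]}
crowd = [([0.9, 0.1], "BasketballPlayer"), ([0.1, 0.9], "Politician")]
scores = infer_type_scores([0.8, 0.2], crowd, type_vecs)
print(max(scores, key=scores.get))
```

    An untyped entity that lies close both to a crowd-labeled BasketballPlayer and to the BasketballPlayer type vector accumulates the highest score for that type; a real system would also need a decision threshold, which this sketch omits.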

    LD4IE - Linked data for information extraction

    The World Wide Web provides access to tens of billions of pages, mostly containing unstructured information intended only for human readers. On the other hand, Linked Open Data (LOD) provides billions of pieces of information linked together and made available for automated processing. However, there is a lack of interconnection between the information in Web pages and that in LOD. A number of initiatives, like RDFa (supported by W3C) or Microformats (used by schema.org and supported by major search engines), are trying to enable machines to make sense of the information contained in human-readable pages by providing the ability to annotate webpage content with links into LOD.

    Canonicalizing Knowledge Base Literals

    Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability are limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper, we study the automated canonicalization of such literals, i.e., replacing a literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines reasoning and machine learning to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching.
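
    The two outcomes of canonicalization (reuse an existing KB entity, or mint a new entity typed with a KB class) can be illustrated with a deliberately simple sketch. It swaps in plain string similarity (`difflib.SequenceMatcher`) for the paper's reasoning-plus-learning framework and takes the predicted type as a given input; the `threshold` value, the `new:` prefix for minted entities, and the example IRIs are all hypothetical.

```python
from difflib import SequenceMatcher


def canonicalize(literal, entity_labels, predicted_type, threshold=0.85):
    """Replace a string literal with an existing entity or a new typed entity.

    entity_labels: mapping from entity IRI to its label.
    predicted_type: stands in for a type-prediction step (assumed given).
    String matching and the threshold are a simple baseline, not the
    framework described in the abstract.
    """
    best_iri, best_score = None, 0.0
    for iri, label in entity_labels.items():
        score = SequenceMatcher(None, literal.lower(), label.lower()).ratio()
        if score > best_score:
            best_iri, best_score = iri, score
    if best_iri is not None and best_score >= threshold:
        # Close enough to an existing entity: reuse it.
        return {"entity": best_iri, "new": False}
    # No good match: mint a hypothetical new entity typed with a KB class.
    return {"entity": "new:" + literal.replace(" ", "_"),
            "type": predicted_type, "new": True}


labels = {"http://example.org/Oxford": "Oxford",
          "http://example.org/Cambridge": "Cambridge"}
print(canonicalize("oxford", labels, "City"))
print(canonicalize("Little Snoring", labels, "Village"))
```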

    Supporting the Linked Data Life Cycle Using an Integrated Tool Stack

    Task-Oriented Uncertainty Evaluation for Linked Data Based on Graph Interlinks

    International audience. For data sources to provide reliable linked data, they need to indicate the (un)certainty of their data based on the views of their consumers. In addition, uncertainty information must be encoded in a readable, publishable, and exchangeable Semantic Web format to increase the interoperability of systems. This paper introduces a novel approach to evaluating the uncertainty of data in an RDF dataset based on its links with other datasets. We propose to evaluate uncertainty for sets of statements related to user-selected resources by exploiting their similarity interlinks with external resources. Our data-driven approach translates each interlink into a set of links referring to the position of a target dataset relative to a reference dataset, based on both object and predicate similarities. We show how our approach can be implemented and present an evaluation with real-world datasets. Finally, we discuss updating the publishable uncertainty values.

    QueDI: From Knowledge Graph Querying to Data Visualization

    While Open Data (OD) publishers are spurred to provide data as Linked Open Data (LOD) to boost innovation and knowledge creation, the complexity of RDF query languages, such as SPARQL, threatens the exploitation of that data. We aim to help lay users (focusing on experts in table manipulation, such as OD experts) query and exploit LOD by taking advantage of our target users' expertise in table manipulation and chart creation. We propose QueDI (Query Data of Interest), a question-answering and visualization tool that implements a scaffold transitional approach that lets users 1) query LOD without knowing SPARQL and see the results as data tables; 2) once they have reached their comfort zone, manipulate these tables; and 3) represent the data visually through exportable, dynamic visualizations. The main novelty of our approach is splitting the querying phase into SPARQL query building and data table manipulation. In this article, we present QueDI's operating mechanism and its interface, supported by a guided use case over DBpedia, and evaluate its accuracy and usability.
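
    The step of turning SPARQL results into a data table that a spreadsheet-oriented user can manipulate can be sketched as follows. This assumes the endpoint returns the W3C SPARQL 1.1 Query Results JSON format; the function names and the sample result document (with made-up example.org IRIs and values) are hypothetical, and QueDI's own implementation is not described in the abstract.

```python
import csv
import io
import json


def sparql_json_to_table(doc):
    """Flatten a SPARQL 1.1 JSON results document into a header and rows.

    Each cell holds the binding's "value"; variables left unbound in a
    solution are absent from its binding, so they become empty strings.
    """
    header = doc["head"]["vars"]
    rows = [
        [binding.get(var, {}).get("value", "") for var in header]
        for binding in doc["results"]["bindings"]
    ]
    return header, rows


def table_to_csv(header, rows):
    """Serialize the table as CSV text, e.g. for loading into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


# Illustrative result document; the IRIs and numbers are made up.
SAMPLE = """{
  "head": {"vars": ["city", "population"]},
  "results": {"bindings": [
    {"city": {"type": "uri", "value": "http://example.org/CityA"},
     "population": {"type": "literal", "value": "1000"}},
    {"city": {"type": "uri", "value": "http://example.org/CityB"}}
  ]}
}"""

header, rows = sparql_json_to_table(json.loads(SAMPLE))
print(table_to_csv(header, rows))
```

    Keeping unbound variables as empty cells preserves a rectangular table, which is what table-manipulation and charting tools generally expect.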

    Results of the ontology alignment evaluation initiative 2023

    The Ontology Alignment Evaluation Initiative (OAEI) aims at comparing ontology matching systems on precisely defined test cases. These test cases can be based on ontologies of different levels of complexity and use different evaluation modalities. The OAEI 2023 campaign offered 15 tracks and was attended by 16 participants. This paper is an overall presentation of that campaign.
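
    A common way to compare matching systems, used in many OAEI tracks, is to score each system's alignment against a reference alignment with precision, recall, and F-measure. The sketch below assumes correspondences are represented as (source entity, target entity, relation) triples; the entity names are hypothetical, and real tracks exchange alignments in an RDF-based format and may apply other modalities (confidence thresholds, relaxed measures) not modeled here.

```python
def evaluate_alignment(system, reference):
    """Precision, recall and F1 of a system alignment against a reference.

    Both arguments are collections of (source, target, relation) triples;
    a correspondence counts as correct only if it matches exactly.
    """
    system, reference = set(system), set(reference)
    correct = len(system & reference)
    precision = correct / len(system) if system else 0.0
    recall = correct / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical correspondences between ontologies o1 and o2.
reference = {("o1#Person", "o2#Human", "="), ("o1#Car", "o2#Automobile", "="),
             ("o1#Author", "o2#Writer", "="), ("o1#Paper", "o2#Article", "="),
             ("o1#City", "o2#Town", "="), ("o1#Book", "o2#Volume", "=")}
system = {("o1#Person", "o2#Human", "="), ("o1#Car", "o2#Automobile", "="),
          ("o1#Author", "o2#Writer", "="), ("o1#Paper", "o2#Review", "=")}
print(evaluate_alignment(system, reference))
```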