30 research outputs found

    Canonicalizing Knowledge Base Literals

    Get PDF
    Ontology-based knowledge bases (KBs) like DBpedia are very valuable resources, but their usefulness and usability is limited by various quality issues. One such issue is the use of string literals instead of semantically typed entities. In this paper we study the automated canonicalization of such literals, i.e., replacing the literal with an existing entity from the KB or with a new entity that is typed using classes from the KB. We propose a framework that combines both reasoning and machine learning in order to predict the relevant entities and types, and we evaluate this framework against state-of-the-art baselines for both semantic typing and entity matching

    Tackling scalability issues in mining path patterns from knowledge graphs: a preliminary study

    Get PDF
    Features mined from knowledge graphs are widely used within multiple knowledge discovery tasks such as classification or fact-checking. Here, we consider a given set of vertices, called seed vertices, and focus on mining their associated neighboring vertices, paths, and, more generally, path patterns that involve classes of ontologies linked with knowledge graphs. Due to the combinatorial nature and the increasing size of real-world knowledge graphs, the task of mining these patterns immediately entails scalability issues. In this paper, we address these issues by proposing a pattern mining approach that relies on a set of constraints (e.g., support or degree thresholds) and the monotonicity property. As our motivation comes from the mining of real-world knowledge graphs, we illustrate our approach with PGxLOD, a biomedical knowledge graph

    Correcting Knowledge Base Assertions

    Get PDF
    The usefulness and usability of knowledge bases (KBs) is often limited by quality issues. One common issue is the presence of erroneous assertions, often caused by lexical or semantic confusion. We study the problem of correcting such assertions, and present a general correction framework which combines lexical matching, semantic embedding, soft constraint mining and semantic consistency checking. The framework is evaluated using DBpedia and an enterprise medical KB

    Query-Driven On-The-Fly Knowledge Base Construction

    Get PDF

    Principles of Knowledge Representation and Reasoning in the FRAPPE System

    Get PDF
    The purpose of this paper is to elucidate the following four important architectural principles of knowledge representation and reasoning with the example of an implemented system: limited reasoning, truth maintenance, hybrid architecture, and many sorted logic.MIT Artificial Intelligence Laborator

    Query-Driven On-The-Fly Knowledge Base Construction

    Get PDF
    Today's openly available knowledge bases, such as DBpedia, Yago, Wikidata or Freebase, capture billions of facts about the world's entities. However, even the largest among these (i) are still limited in up-to-date coverage of what happens in the real world, and (ii) miss out on many relevant predicates that precisely capture the wide variety of relationships among entities. To overcome both of these limitations, we propose a novel approach to build on-the-fly knowledge bases in a query-driven manner. Our system, called QKBfly, supports analysts and journalists as well as question answering on emerging topics, by dynamically acquiring relevant facts as timely and comprehensively as possible. QKBfly is based on a semantic-graph representation of sentences, by which we perform three key IE tasks, namely named-entity disambiguation, co-reference resolution and relation extraction , in a light-weight and integrated manner. In contrast to Open IE, our output is canonicalized. In contrast to traditional IE, we capture more predicates, including ternary and higher-arity ones. Our experiments demonstrate that QKBfly can build high-quality, on-the-fly knowledge bases that can readily be deployed, e.g., for the task of ad-hoc question answering. </jats:p

    Vermeidung von ReprÀsentationsheterogenitÀten in realweltlichen Wissensgraphen

    Get PDF
    Knowledge graphs are repositories providing factual knowledge about entities. They are a great source of knowledge to support modern AI applications for Web search, question answering, digital assistants, and online shopping. The advantages of machine learning techniques and the Web's growth have led to colossal knowledge graphs with billions of facts about hundreds of millions of entities collected from a large variety of sources. While integrating independent knowledge sources promises rich information, it inherently leads to heterogeneities in representation due to a large variety of different conceptualizations. Thus, real-world knowledge graphs are threatened in their overall utility. Due to their sheer size, they are hardly manually curatable anymore. Automatic and semi-automatic methods are needed to cope with these vast knowledge repositories. We first address the general topic of representation heterogeneity by surveying the problem throughout various data-intensive fields: databases, ontologies, and knowledge graphs. Different techniques for automatically resolving heterogeneity issues are presented and discussed, while several open problems are identified. Next, we focus on entity heterogeneity. We show that automatic matching techniques may run into quality problems when working in a multi-knowledge graph scenario due to incorrect transitive identity links. We present four techniques that can be used to improve the quality of arbitrary entity matching tools significantly. Concerning relation heterogeneity, we show that synonymous relations in knowledge graphs pose several difficulties in querying. Therefore, we resolve these heterogeneities with knowledge graph embeddings and by Horn rule mining. All methods detect synonymous relations in knowledge graphs with high quality. Furthermore, we present a novel technique for avoiding heterogeneity issues at query time using implicit knowledge storage. We show that large neural language models are a valuable source of knowledge that is queried similarly to knowledge graphs already solving several heterogeneity issues internally.Wissensgraphen sind eine wichtige Datenquelle von EntitĂ€tswissen. Sie unterstĂŒtzen viele moderne KI-Anwendungen. Dazu gehören unter anderem Websuche, die automatische Beantwortung von Fragen, digitale Assistenten und Online-Shopping. Neue Errungenschaften im maschinellen Lernen und das außerordentliche Wachstum des Internets haben zu riesigen Wissensgraphen gefĂŒhrt. Diese umfassen hĂ€ufig Milliarden von Fakten ĂŒber Hunderte von Millionen von EntitĂ€ten; hĂ€ufig aus vielen verschiedenen Quellen. WĂ€hrend die Integration unabhĂ€ngiger Wissensquellen zu einer großen Informationsvielfalt fĂŒhren kann, fĂŒhrt sie inhĂ€rent zu HeterogenitĂ€ten in der WissensreprĂ€sentation. Diese HeterogenitĂ€t in den Daten gefĂ€hrdet den praktischen Nutzen der Wissensgraphen. Durch ihre GrĂ¶ĂŸe lassen sich die Wissensgraphen allerdings nicht mehr manuell bereinigen. DafĂŒr werden heutzutage hĂ€ufig automatische und halbautomatische Methoden benötigt. In dieser Arbeit befassen wir uns mit dem Thema ReprĂ€sentationsheterogenitĂ€t. Wir klassifizieren HeterogenitĂ€t entlang verschiedener Dimensionen und erlĂ€utern HeterogenitĂ€tsprobleme in Datenbanken, Ontologien und Wissensgraphen. Weiterhin geben wir einen knappen Überblick ĂŒber verschiedene Techniken zur automatischen Lösung von HeterogenitĂ€tsproblemen. Im nĂ€chsten Kapitel beschĂ€ftigen wir uns mit EntitĂ€tsheterogenitĂ€t. Wir zeigen Probleme auf, die in einem Multi-Wissensgraphen-Szenario aufgrund von fehlerhaften transitiven Links entstehen. Um diese Probleme zu lösen stellen wir vier Techniken vor, mit denen sich die QualitĂ€t beliebiger Entity-Alignment-Tools deutlich verbessern lĂ€sst. Wir zeigen, dass RelationsheterogenitĂ€t in Wissensgraphen zu Problemen bei der Anfragenbeantwortung fĂŒhren kann. Daher entwickeln wir verschiedene Methoden um synonyme Relationen zu finden. Eine der Methoden arbeitet mit hochdimensionalen Wissensgrapheinbettungen, die andere mit einem Rule Mining Ansatz. Beide Methoden können synonyme Relationen in Wissensgraphen mit hoher QualitĂ€t erkennen. DarĂŒber hinaus stellen wir eine neuartige Technik zur Vermeidung von HeterogenitĂ€tsproblemen vor, bei der wir eine implizite WissensreprĂ€sentation verwenden. Wir zeigen, dass große neuronale Sprachmodelle eine wertvolle Wissensquelle sind, die Ă€hnlich wie Wissensgraphen angefragt werden können. Im Sprachmodell selbst werden bereits viele der HeterogenitĂ€tsprobleme aufgelöst, so dass eine Anfrage heterogener Wissensgraphen möglich wird
    corecore