2,767 research outputs found

    EventEA: Benchmarking Entity Alignment for Event-centric Knowledge Graphs

    Full text link
    Entity alignment is to find identical entities in different knowledge graphs (KGs) that refer to the same real-world object. Embedding-based entity alignment techniques have been drawing a lot of attention recently because they can help solve the issue of symbolic heterogeneity in different KGs. However, in this paper, we show that the progress made in the past was due to biased and unchallenging evaluation. We highlight two major flaws in existing datasets that favor embedding-based entity alignment techniques, i.e., the isomorphic graph structures in relation triples and the weak heterogeneity in attribute triples. Towards a critical evaluation of embedding-based entity alignment methods, we construct a new dataset with heterogeneous relations and attributes based on event-centric KGs. We conduct extensive experiments to evaluate existing popular methods, and find that they fail to achieve promising performance. As a new approach to this difficult problem, we propose a time-aware literal encoder for entity alignment. The dataset and source code are publicly available to foster future research. Our work calls for more effective and practical embedding-based solutions to entity alignment.Comment: submitted to ISWC 202

    A Benchmarking Study of Matching Algorithms for Knowledge Graph Entity Alignment

    Full text link
    How to identify those equivalent entities between knowledge graphs (KGs), which is called Entity Alignment (EA), is a long-standing challenge. So far, many methods have been proposed, with recent focus on leveraging Deep Learning to solve this problem. However, we observe that most of the efforts has been paid to having better representation of entities, rather than improving entity matching from the learned representations. In fact, how to efficiently infer the entity pairs from this similarity matrix, which is essentially a matching problem, has been largely ignored by the community. Motivated by this observation, we conduct an in-depth analysis on existing algorithms that are particularly designed for solving this matching problem, and propose a novel matching method, named Bidirectional Matching (BMat). Our extensive experimental results on public datasets indicate that there is currently no single silver bullet solution for EA. In other words, different classes of entity similarity estimation may require different matching algorithms to reach the best EA results for each class. We finally conclude that using PARIS, the state-of-the-art EA approach, with BMat gives the best combination in terms of EA performance and the algorithm's time and space complexity.Comment: 11 pages, 1 figure, 7 table

    Dividing the Ontology Alignment Task with Semantic Embeddings and Logic-based Modules

    Get PDF
    Large ontologies still pose serious challenges to state-of-the-art ontology alignment systems. In this paper we present an approach that combines a neural embedding model and logic-based modules to accurately divide an input ontology matching task into smaller and more tractable matching (sub)tasks. We have conducted a comprehensive evaluation using the datasets of the Ontology Alignment Evaluation Initiative. The results are encouraging and suggest that the proposed method is adequate in practice and can be integrated within the workflow of systems unable to cope with very large ontologies

    Knowledge Graphs are not Created Equal: Exploring the Properties and Structure of Real KGs

    Full text link
    Despite the recent popularity of knowledge graph (KG) related tasks and benchmarks such as KG embeddings, link prediction, entity alignment and evaluation of the reasoning abilities of pretrained language models as KGs, the structure and properties of real KGs are not well studied. In this paper, we perform a large scale comparative study of 29 real KG datasets from diverse domains such as the natural sciences, medicine, and NLP to analyze their properties and structural patterns. Based on our findings, we make several recommendations regarding KG-based model development and evaluation. We believe that the rich structural information contained in KGs can benefit the development of better KG models across fields and we hope this study will contribute to breaking the existing data silos between different areas of research (e.g., ML, NLP, AI for sciences).Comment: Accepted at NeurIPS 2023 New Frontiers in Graph Learning Workshop (NeurIPS GLFrontiers 2023

    Machine learning for managing structured and semi-structured data

    Get PDF
    As the digitalization of private, commercial, and public sectors advances rapidly, an increasing amount of data is becoming available. In order to gain insights or knowledge from these enormous amounts of raw data, a deep analysis is essential. The immense volume requires highly automated processes with minimal manual interaction. In recent years, machine learning methods have taken on a central role in this task. In addition to the individual data points, their interrelationships often play a decisive role, e.g. whether two patients are related to each other or whether they are treated by the same physician. Hence, relational learning is an important branch of research, which studies how to harness this explicitly available structural information between different data points. Recently, graph neural networks have gained importance. These can be considered an extension of convolutional neural networks from regular grids to general (irregular) graphs. Knowledge graphs play an essential role in representing facts about entities in a machine-readable way. While great efforts are made to store as many facts as possible in these graphs, they often remain incomplete, i.e., true facts are missing. Manual verification and expansion of the graphs is becoming increasingly difficult due to the large volume of data and must therefore be assisted or substituted by automated procedures which predict missing facts. The field of knowledge graph completion can be roughly divided into two categories: Link Prediction and Entity Alignment. In Link Prediction, machine learning models are trained to predict unknown facts between entities based on the known facts. Entity Alignment aims at identifying shared entities between graphs in order to link several such knowledge graphs based on some provided seed alignment pairs. In this thesis, we present important advances in the field of knowledge graph completion. For Entity Alignment, we show how to reduce the number of required seed alignments while maintaining performance by novel active learning techniques. We also discuss the power of textual features and show that graph-neural-network-based methods have difficulties with noisy alignment data. For Link Prediction, we demonstrate how to improve the prediction for unknown entities at training time by exploiting additional metadata on individual statements, often available in modern graphs. Supported with results from a large-scale experimental study, we present an analysis of the effect of individual components of machine learning models, e.g., the interaction function or loss criterion, on the task of link prediction. We also introduce a software library that simplifies the implementation and study of such components and makes them accessible to a wide research community, ranging from relational learning researchers to applied fields, such as life sciences. Finally, we propose a novel metric for evaluating ranking results, as used for both completion tasks. It allows for easier interpretation and comparison, especially in cases with different numbers of ranking candidates, as encountered in the de-facto standard evaluation protocols for both tasks.Mit der rasant fortschreitenden Digitalisierung des privaten, kommerziellen und öffentlichen Sektors werden immer größere Datenmengen verfügbar. Um aus diesen enormen Mengen an Rohdaten Erkenntnisse oder Wissen zu gewinnen, ist eine tiefgehende Analyse unerlässlich. Das immense Volumen erfordert hochautomatisierte Prozesse mit minimaler manueller Interaktion. In den letzten Jahren haben Methoden des maschinellen Lernens eine zentrale Rolle bei dieser Aufgabe eingenommen. Neben den einzelnen Datenpunkten spielen oft auch deren Zusammenhänge eine entscheidende Rolle, z.B. ob zwei Patienten miteinander verwandt sind oder ob sie vom selben Arzt behandelt werden. Daher ist das relationale Lernen ein wichtiger Forschungszweig, der untersucht, wie diese explizit verfügbaren strukturellen Informationen zwischen verschiedenen Datenpunkten nutzbar gemacht werden können. In letzter Zeit haben Graph Neural Networks an Bedeutung gewonnen. Diese können als eine Erweiterung von CNNs von regelmäßigen Gittern auf allgemeine (unregelmäßige) Graphen betrachtet werden. Wissensgraphen spielen eine wesentliche Rolle bei der Darstellung von Fakten über Entitäten in maschinenlesbaren Form. Obwohl große Anstrengungen unternommen werden, so viele Fakten wie möglich in diesen Graphen zu speichern, bleiben sie oft unvollständig, d. h. es fehlen Fakten. Die manuelle Überprüfung und Erweiterung der Graphen wird aufgrund der großen Datenmengen immer schwieriger und muss daher durch automatisierte Verfahren unterstützt oder ersetzt werden, die fehlende Fakten vorhersagen. Das Gebiet der Wissensgraphenvervollständigung lässt sich grob in zwei Kategorien einteilen: Link Prediction und Entity Alignment. Bei der Link Prediction werden maschinelle Lernmodelle trainiert, um unbekannte Fakten zwischen Entitäten auf der Grundlage der bekannten Fakten vorherzusagen. Entity Alignment zielt darauf ab, gemeinsame Entitäten zwischen Graphen zu identifizieren, um mehrere solcher Wissensgraphen auf der Grundlage einiger vorgegebener Paare zu verknüpfen. In dieser Arbeit stellen wir wichtige Fortschritte auf dem Gebiet der Vervollständigung von Wissensgraphen vor. Für das Entity Alignment zeigen wir, wie die Anzahl der benötigten Paare reduziert werden kann, während die Leistung durch neuartige aktive Lerntechniken erhalten bleibt. Wir erörtern auch die Leistungsfähigkeit von Textmerkmalen und zeigen, dass auf Graph-Neural-Networks basierende Methoden Schwierigkeiten mit verrauschten Paar-Daten haben. Für die Link Prediction demonstrieren wir, wie die Vorhersage für unbekannte Entitäten zur Trainingszeit verbessert werden kann, indem zusätzliche Metadaten zu einzelnen Aussagen genutzt werden, die oft in modernen Graphen verfügbar sind. Gestützt auf Ergebnisse einer groß angelegten experimentellen Studie präsentieren wir eine Analyse der Auswirkungen einzelner Komponenten von Modellen des maschinellen Lernens, z. B. der Interaktionsfunktion oder des Verlustkriteriums, auf die Aufgabe der Link Prediction. Außerdem stellen wir eine Softwarebibliothek vor, die die Implementierung und Untersuchung solcher Komponenten vereinfacht und sie einer breiten Forschungsgemeinschaft zugänglich macht, die von Forschern im Bereich des relationalen Lernens bis hin zu angewandten Bereichen wie den Biowissenschaften reicht. Schließlich schlagen wir eine neuartige Metrik für die Bewertung von Ranking-Ergebnissen vor, wie sie für beide Aufgaben verwendet wird. Sie ermöglicht eine einfachere Interpretation und einen leichteren Vergleich, insbesondere in Fällen mit einer unterschiedlichen Anzahl von Kandidaten, wie sie in den de-facto Standardbewertungsprotokollen für beide Aufgaben vorkommen
    corecore