On the Ambiguity of Rank-Based Evaluation of Entity Alignment or Link Prediction Methods
In this work, we take a closer look at the evaluation of two families of
methods for enriching information from knowledge graphs: Link Prediction and
Entity Alignment. In the current experimental setting, multiple different
scores are employed to assess different aspects of model performance. We
analyze the informativeness of these evaluation measures and identify several
shortcomings. In particular, we demonstrate that none of the existing scores is
well suited for comparing results across different datasets. Moreover, we
demonstrate that varying the size of the test set automatically affects the
performance of the same model, as measured by commonly used metrics for the Entity Alignment task.
We show that this leads to various problems in the interpretation of results,
which may support misleading conclusions. Therefore, we propose adjustments to
the evaluation and demonstrate empirically how this supports a fair,
comparable, and interpretable assessment of model performance. Our code is
available at https://github.com/mberr/rank-based-evaluation
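To make the discussed measures concrete, below is a minimal standard-library sketch of the common rank-based scores: mean rank (MR), mean reciprocal rank (MRR), and Hits@k. It also includes one possible chance-adjusted variant. The adjustment divides the observed mean rank by the mean rank a uniformly random scorer would achieve, (n + 1)/2 for n candidates. That adjustment is an assumption made for illustration; the exact definitions the authors propose are in the paper and repository. All function names are hypothetical.

```python
from statistics import mean


def mean_rank(ranks):
    """Arithmetic mean of the 1-based ranks of the true candidates (lower is better)."""
    return mean(ranks)


def mean_reciprocal_rank(ranks):
    """Mean of 1/rank (higher is better, at most 1)."""
    return mean(1.0 / r for r in ranks)


def hits_at_k(ranks, k):
    """Fraction of queries whose true candidate is ranked within the top k."""
    return sum(r <= k for r in ranks) / len(ranks)


def adjusted_mean_rank(ranks, num_candidates):
    """Mean rank divided by the expected mean rank of a uniformly random scorer.

    Assumption: with n candidates a random scorer has expected rank (n + 1) / 2.
    num_candidates[i] is the candidate-set size of query i, so results obtained
    with different candidate-set sizes end up on a common scale:
    about 1.0 means "no better than random", values near 0 mean near-perfect.
    """
    expected = mean((n + 1) / 2 for n in num_candidates)
    return mean(ranks) / expected


# The same hypothetical ranks give identical MR, MRR and Hits@k, but they are far
# more impressive with 10,000 candidates than with 100.
ranks = [1, 3, 10]
print(mean_rank(ranks), mean_reciprocal_rank(ranks), hits_at_k(ranks, 3))
print(adjusted_mean_rank(ranks, [100] * 3), adjusted_mean_rank(ranks, [10_000] * 3))
```

In Entity Alignment, the candidate set usually consists of the test entities themselves. Enlarging the test set therefore increases n and tends to worsen unadjusted scores, even for an unchanged model. A chance-adjusted score is meant to factor this effect out.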
Machine learning for managing structured and semi-structured data
As the digitalization of private, commercial, and public sectors advances rapidly, an increasing amount of data is becoming available. In order to gain insights or knowledge from these enormous amounts of raw data, a deep analysis is essential. The immense volume requires highly automated processes with minimal manual interaction. In recent years, machine learning methods have taken on a central role in this task. In addition to the individual data points, their interrelationships often play a decisive role, e.g. whether two patients are related to each other or whether they are treated by the same physician. Hence, relational learning is an important branch of research, which studies how to harness this explicitly available structural information between different data points. Recently, graph neural networks have gained importance. These can be considered an extension of convolutional neural networks from regular grids to general (irregular) graphs.
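The analogy between graph neural networks and convolutions can be sketched in a few lines. As with a convolution filter, each node combines its own features with those of its neighbours. Neighbourhood sizes vary on an irregular graph, so the neighbours are averaged rather than weighted by fixed grid positions. The layer below (mean aggregation, then a shared linear map and ReLU) is one simple variant chosen for illustration. It is not taken from the thesis, and it avoids external libraries.

```python
def relu(x):
    return max(0.0, x)


def matvec(weight, vec):
    """Multiply a weight matrix (list of rows) with a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def graph_conv(features, adjacency, weight):
    """One mean-aggregation graph convolution step (assumed GCN-style variant).

    features:  {node: [float, ...]} input node features
    adjacency: {node: set of neighbouring nodes}
    weight:    matrix shared by all nodes, analogous to a filter shared by all grid positions
    """
    out = {}
    for v, h_v in features.items():
        # The node itself plus its (variable-size) neighbourhood.
        neighbourhood = [h_v] + [features[u] for u in adjacency.get(v, ())]
        agg = [sum(h[i] for h in neighbourhood) / len(neighbourhood) for i in range(len(h_v))]
        out[v] = [relu(z) for z in matvec(weight, agg)]
    return out


features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(graph_conv(features, adjacency, [[1.0, 0.0], [0.0, 1.0]]))
```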
Knowledge graphs play an essential role in representing facts about entities in a machine-readable way. While great efforts are made to store as many facts as possible in these graphs, they often remain incomplete, i.e., true facts are missing. Manual verification and expansion of the graphs is becoming increasingly difficult due to the large volume of data and must therefore be assisted or substituted by automated procedures which predict missing facts. The field of knowledge graph completion can be roughly divided into two categories: Link Prediction and Entity Alignment. In Link Prediction, machine learning models are trained to predict unknown facts between entities based on the known facts. Entity Alignment aims at identifying shared entities between graphs in order to link several such knowledge graphs based on some provided seed alignment pairs.
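Entity Alignment is commonly evaluated as a ranking problem. The sketch below assumes a model has already embedded the entities of both graphs into a shared vector space. Each source entity is aligned to the target entity with the highest cosine similarity, and the rank of the true match is what rank-based metrics score. The entity names and vectors are made up, and the helper names are hypothetical.

```python
import math


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return sum(a * b for a, b in zip(x, y)) / norm if norm else 0.0


def align(source_embeddings, target_embeddings):
    """Greedy nearest-neighbour alignment: each source entity -> most similar target."""
    return {
        s: max(target_embeddings, key=lambda t: cosine(vec, target_embeddings[t]))
        for s, vec in source_embeddings.items()
    }


def rank_of_match(source_vec, target_embeddings, true_target):
    """1-based rank of the true match among all targets (ties counted pessimistically)."""
    true_score = cosine(source_vec, target_embeddings[true_target])
    return 1 + sum(
        1 for name, vec in target_embeddings.items()
        if name != true_target and cosine(source_vec, vec) >= true_score
    )


# Hypothetical embeddings of an English and a German knowledge graph.
source = {"Berlin@en": [1.0, 0.1], "Paris@en": [0.1, 1.0]}
target = {"Berlin@de": [0.9, 0.2], "Paris@de": [0.2, 0.9], "Rom@de": [0.7, 0.7]}
print(align(source, target))
```

Every additional target entity is one more candidate that can outscore the true match. This is why the size of the test set influences the resulting ranks.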
In this thesis, we present important advances in the field of knowledge graph completion. For Entity Alignment, we show how novel active learning techniques can reduce the number of required seed alignments while maintaining performance. We also discuss the power of textual features and show that graph-neural-network-based methods have difficulties with noisy alignment data. For Link Prediction, we demonstrate how to improve predictions for entities unseen at training time by exploiting additional metadata on individual statements, which is often available in modern graphs. Supported by results from a large-scale experimental study, we present an analysis of the effect of individual components of machine learning models, e.g., the interaction function or loss criterion, on the task of link prediction. We also introduce a software library that simplifies the implementation and study of such components and makes them accessible to a wide research community, ranging from relational learning researchers to applied fields such as the life sciences. Finally, we propose a novel metric for evaluating ranking results, as used in both completion tasks. It allows for easier interpretation and comparison, especially in cases with different numbers of ranking candidates, as encountered in the de-facto standard evaluation protocols for both tasks.
PyKEEN 1.0: A Python Library for Training and Evaluating Knowledge Graph Embeddings
Recently, knowledge graph embeddings (KGEs) have received significant attention,
and several software libraries have been developed for training and evaluating
KGEs. While each of them addresses specific needs, we re-designed and
re-implemented PyKEEN, one of the first KGE libraries, in a community effort.
PyKEEN 1.0 enables users to compose knowledge graph embedding models (KGEMs)
based on a wide range of interaction models, training approaches, loss
functions, and permits the explicit modeling of inverse relations. In addition,
automatic memory optimization has been implemented to make optimal use of the
available hardware, and the integration of Optuna provides extensive
hyper-parameter optimization (HPO) functionality.
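The idea of composing a model from independent parts can be illustrated without PyKEEN. An interaction function combines head, relation and tail embeddings into a plausibility score, and a loss function compares scores of true and corrupted triples. Either one can be swapped without touching the other. Every name below is hypothetical and is not PyKEEN's API. In PyKEEN itself, such choices are passed to its `pipeline` function; see the library documentation for the supported arguments.

```python
import math


def transe(h, r, t):
    """TransE: negative Euclidean distance between h + r and t (higher = more plausible)."""
    return -math.sqrt(sum((hi + ri - ti) ** 2 for hi, ri, ti in zip(h, r, t)))


def distmult(h, r, t):
    """DistMult: trilinear product of h, r and t."""
    return sum(hi * ri * ti for hi, ri, ti in zip(h, r, t))


def margin_ranking_loss(pos_score, neg_score, margin=1.0):
    """Penalise negatives that are not scored at least `margin` below the positive."""
    return max(0.0, margin - pos_score + neg_score)


def softplus_loss(pos_score, neg_score):
    """Logistic loss with label 1 for the positive and 0 for the negative triple."""
    return math.log1p(math.exp(-pos_score)) + math.log1p(math.exp(neg_score))


def make_model(interaction, loss):
    """Combine any interaction function with any loss (hypothetical helper)."""
    def triple_loss(positive, negative, embeddings):
        def score(triple):
            head, relation, tail = triple
            return interaction(embeddings[head], embeddings[relation], embeddings[tail])
        return loss(score(positive), score(negative))
    return triple_loss


# Toy embeddings; entities and relations share one lookup table for brevity.
emb = {"a": [0.0, 0.0], "r": [1.0, 0.0], "b": [1.0, 0.0], "c": [0.0, 1.0]}
for model in (make_model(transe, margin_ranking_loss), make_model(distmult, softplus_loss)):
    # ("a", "r", "c") plays the role of a corrupted (negative) triple.
    print(model(("a", "r", "b"), ("a", "r", "c"), emb))
```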
Representation learning on relational data
Humans utilize information about relationships or interactions between objects for orientation in various situations. For example, we trust recommendations from our circle of friends, befriend people with whom we already share friends, or adapt our opinions as a result of interactions with other people.
In many machine learning applications, we likewise know about relationships that carry essential information for the use case.
Recommendations in social media, scene understanding in computer vision, and traffic prediction are a few examples where relationships play a crucial role in the application.
In this thesis, we introduce methods taking relationships into account and demonstrate their benefits for various problems.
A large number of problems in which relationship information plays a central role can be approached by modeling the data as a graph and formulating the task as a prediction problem on that graph.
In the first part of the thesis, we tackle the problem of node classification from various directions. We start with unsupervised learning approaches, which differ in the assumptions they make about the meaning of relationships in the graph.
For some applications, such as social networks, it is a reasonable assumption that densely connected nodes are similar. On the other hand, if we want to predict the passenger traffic of an airport based on its flight connections, similar nodes are not necessarily positioned close to each other in the graph; they are more likely to have comparable neighborhood patterns.
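The two assumptions can be contrasted with two simple similarity functions. The proximity view scores nodes by shared neighbours (Jaccard overlap). The structural view compares neighbourhood patterns, approximated here by the degree and the average neighbour degree. That signature is a deliberately crude, hypothetical stand-in for the learned representations the thesis discusses.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def jaccard(adj, u, v):
    """Proximity view: share of common neighbours (high for densely connected nodes)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def role_signature(adj, v):
    """Crude structural signature: degree and average neighbour degree."""
    degree = len(adj[v])
    avg_neighbour_degree = sum(len(adj[u]) for u in adj[v]) / degree if degree else 0.0
    return degree, avg_neighbour_degree


def role_similarity(adj, u, v):
    """Structural view: 1.0 for identical signatures, smaller as they diverge."""
    (du, au), (dv, av) = role_signature(adj, u), role_signature(adj, v)
    return 1.0 / (1.0 + abs(du - dv) + abs(au - av))


# Two hypothetical hub airports, each with its own spoke airports and no shared routes.
adj = build_adjacency([("hub1", s) for s in ("x1", "x2", "x3")] +
                      [("hub2", s) for s in ("y1", "y2", "y3")])
print(jaccard(adj, "hub1", "hub2"))          # 0.0: far apart in the graph
print(role_similarity(adj, "hub1", "hub2"))  # 1.0: same structural role
```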
Furthermore, we introduce novel methods for classification and regression in a semi-supervised setting, where labels of interest are known for a fraction of nodes. We use the known prediction targets and information about how nodes connect to learn the relationships' meaning and their effect on the final prediction.
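For readers new to the semi-supervised setting, classic label propagation is a useful reference point. It is a generic baseline, not the method introduced in the thesis. Labelled nodes keep their label, and every unlabelled node repeatedly takes the average class distribution of its neighbours. The function name and the fixed iteration count are assumptions.

```python
def label_propagation(adj, known, classes, iterations=50):
    """Predict a class for every node in `adj` (generic label-propagation baseline).

    adj:   {node: set of neighbours}
    known: {node: class} for the labelled fraction of the nodes
    """
    uniform = {c: 1.0 / len(classes) for c in classes}
    dist = {
        v: ({c: float(c == known[v]) for c in classes} if v in known else dict(uniform))
        for v in adj
    }
    for _ in range(iterations):
        new = {}
        for v in adj:
            if v in known or not adj[v]:
                new[v] = dist[v]  # labelled (clamped) or isolated node: unchanged
            else:
                new[v] = {c: sum(dist[u][c] for u in adj[v]) / len(adj[v]) for c in classes}
        dist = new
    return {v: max(classes, key=lambda c: dist[v][c]) for v in adj}


# Path a - b - c - d with only the two end points labelled.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(label_propagation(adj, {"a": "red", "d": "blue"}, ["red", "blue"]))
```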
In the second part of the thesis, we deal with the problem of graph matching. Our first use-case is the alignment of different geographical maps, where the focus lies on the real-life setting. We introduce a robust method that can learn to ignore the noise in the data.
Next, our focus moves to the field of Entity Alignment in different Knowledge Graphs.
We analyze the process of manual data annotation and propose a setting and algorithms to accelerate this labor-intensive process.
Furthermore, we point out several shortcomings in the empirical evaluations, make suggestions on how to improve them, and extensively analyze existing approaches for the task.
The next part of the thesis is dedicated to the research direction dealing with the automatic extraction and search of arguments, known as Argument Mining. We propose a novel approach for identifying arguments and demonstrate how it can make use of relational information. We apply our method to identify arguments in peer reviews of scientific publications and show that arguments are essential for the decision process. Furthermore, we address the problem of argument search and introduce a novel approach that retrieves relevant and original arguments for the user's queries.
Finally, we propose an approach for subspace clustering, which can deal with large datasets and assign new objects to the found clusters. Our method learns the relationships between objects and performs the clustering on the resulting graph.
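The "build a graph, then cluster it" pipeline, including the assignment of new objects, can be sketched as follows. This is a simplified stand-in, not subspace clustering. The learned relationships are replaced by a fixed rule (connect two points closer than a hypothetical threshold `eps`). Clusters are the connected components of that graph, and a new object joins the cluster of its nearest clustered point.

```python
import math


def build_graph(points, eps):
    """Connect two points if their Euclidean distance is below eps (fixed, not learned)."""
    adj = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < eps:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def connected_components(adj):
    """Cluster label per node, found by iterative depth-first search."""
    labels, current = {}, 0
    for start in adj:
        if start in labels:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            v = stack.pop()
            for u in adj[v]:
                if u not in labels:
                    labels[u] = current
                    stack.append(u)
        current += 1
    return [labels[i] for i in range(len(adj))]


def assign_new(point, points, labels):
    """Out-of-sample assignment: take the cluster of the nearest clustered point."""
    nearest = min(range(len(points)), key=lambda i: math.dist(point, points[i]))
    return labels[nearest]


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
labels = connected_components(build_graph(points, eps=1.5))
print(labels, assign_new((9.0, 9.0), points, labels))
```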
Bringing Light Into the Dark: A Large-scale Evaluation of Knowledge Graph Embedding Models Under a Unified Framework
The heterogeneity in recently published knowledge graph embedding models'
implementations, training, and evaluation has made fair and thorough
comparisons difficult. In order to assess the reproducibility of previously
published results, we re-implemented and evaluated 21 interaction models in the
PyKEEN software package. Here, we outline which results could be reproduced
with their reported hyper-parameters, which could only be reproduced with
alternate hyper-parameters, and which could not be reproduced at all, and we
provide insight into why this might be the case.
We then performed a large-scale benchmark on four datasets with several
thousand experiments and 24,804 GPU hours of computation time. We present
insights gained as to best practices, best configurations for each model, and
where improvements could be made over previously published best configurations.
Our results highlight that the combination of model architecture, training
approach, loss function, and the explicit modeling of inverse relations is
crucial for a model's performance, which is not determined by the model
architecture alone. We provide evidence that several architectures can obtain
results competitive with the state of the art when configured carefully. We have made all
code, experimental configurations, results, and analyses that led to our
interpretations available at https://github.com/pykeen/pykeen and
https://github.com/pykeen/benchmarkin
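One component highlighted above, the explicit modeling of inverse relations, amounts to a simple data transformation. For every triple (h, r, t), an inverse triple (t, r_inv, h) is added with a fresh relation, so that head prediction can be answered as tail prediction on the inverse relation. The `_inverse` naming scheme and the helper names are hypothetical. PyKEEN 1.x exposes a comparable switch, `create_inverse_triples`, when constructing triples factories; check the documentation of the installed version.

```python
def add_inverse_triples(triples, suffix="_inverse"):
    """Return the original triples plus one inverse triple (t, r + suffix, h) per triple."""
    return list(triples) + [(t, r + suffix, h) for h, r, t in triples]


def as_tail_query(head=None, relation=None, tail=None, suffix="_inverse"):
    """Rewrite a query with exactly one missing entity as (known entity, relation, ?).

    (h, r, ?) stays as is; (?, r, t) becomes (t, r + suffix, ?), so a single
    tail-prediction routine can answer both directions.
    """
    if (head is None) == (tail is None):
        raise ValueError("exactly one of head or tail must be missing")
    if tail is None:
        return head, relation
    return tail, relation + suffix


triples = [("berlin", "capital_of", "germany")]
print(add_inverse_triples(triples))
print(as_tail_query(relation="capital_of", tail="germany"))  # ('germany', 'capital_of_inverse')
```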
Query Embedding on Hyper-relational Knowledge Graphs
Multi-hop logical reasoning is an established problem in the field of
representation learning on knowledge graphs (KGs). It subsumes both one-hop
link prediction as well as other more complex types of logical queries.
Existing algorithms operate only on classical, triple-based graphs, whereas
modern KGs often employ a hyper-relational modeling paradigm. In this paradigm,
typed edges may have several key-value pairs known as qualifiers that provide
fine-grained context for facts. In queries, this context modifies the meaning
of relations, and usually reduces the answer set. Hyper-relational queries are
often observed in real-world KG applications, yet existing approaches for
approximate query answering cannot make use of qualifier pairs. In this work,
we bridge this gap and extend the multi-hop reasoning problem to
hyper-relational KGs, which allows us to tackle this new type of complex query.
Building upon recent advancements in Graph Neural Networks and query embedding
techniques, we study how to embed and answer hyper-relational conjunctive
queries. In addition, we propose a method to answer such queries and
demonstrate in our experiments that qualifiers improve query answering on a
diverse set of query patterns.
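The effect of qualifiers on a query's answer set can be shown with exact matching over a toy graph. Each statement is a (head, relation, tail) triple plus a set of (key, value) qualifier pairs, loosely following Wikidata-style statements. The data are made up. Exact symbolic matching is only a stand-in for the learned query embeddings studied in the paper, which approximate answers in vector space instead.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Statement(NamedTuple):
    head: str
    relation: str
    tail: str
    qualifiers: FrozenSet[Tuple[str, str]] = frozenset()


def answer(kg, head, relation, qualifiers=frozenset()):
    """Answers to (head, relation, ?) whose statements carry all given qualifier pairs."""
    return {
        s.tail for s in kg
        if s.head == head and s.relation == relation and qualifiers <= s.qualifiers
    }


def answer_two_hop(kg, head, r1, q1, r2, q2):
    """Conjunctive chain: ?y such that (head, r1, ?x | q1) and (?x, r2, ?y | q2)."""
    return {y for x in answer(kg, head, r1, q1) for y in answer(kg, x, r2, q2)}


# Hypothetical hyper-relational KG.
kg = [
    Statement("alice", "educated_at", "uni_a", frozenset({("degree", "bachelor")})),
    Statement("alice", "educated_at", "uni_b", frozenset({("degree", "doctorate")})),
    Statement("uni_a", "located_in", "city_x"),
    Statement("uni_b", "located_in", "city_y"),
]
doctorate = frozenset({("degree", "doctorate")})
print(answer(kg, "alice", "educated_at"))             # both universities
print(answer(kg, "alice", "educated_at", doctorate))  # qualifier narrows to uni_b
print(answer_two_hop(kg, "alice", "educated_at", doctorate, "located_in", frozenset()))
```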
Recommended from our members
Alignment of special-interest subjects in the accounting standard-setting process : an investigation.
Enhancing explainability and scrutability of recommender systems
Our increasing reliance on complex algorithms for recommendations calls for models and methods for explainable, scrutable, and trustworthy AI. While explainability is required for understanding the relationships between model inputs and outputs, a scrutable system allows us to modify its behavior as desired. These properties help bridge the gap between our expectations and the algorithm's behavior and accordingly boost our trust in AI. Aiming to cope with information overload, recommender systems play a crucial role in filtering content (such as products, news, songs, and movies) and shaping a personalized experience for their users. Consequently, there has been a growing demand from information consumers to receive proper explanations for their personalized recommendations. These explanations aim at helping users understand why certain items are recommended to them and how their previous inputs to the system relate to the generation of such recommendations. In addition, when users receive undesirable content, explanations may contain valuable information on how the system's behavior can be modified accordingly. In this thesis, we present our contributions towards explainability and scrutability of recommender systems:
• We introduce a user-centric framework, FAIRY, for discovering and ranking post-hoc explanations for the social feeds generated by black-box platforms. These explanations reveal relationships between users' profiles and their feed items and are extracted from the local interaction graphs of users. FAIRY employs a learning-to-rank (LTR) method to score candidate explanations based on their relevance and surprisal.
• We propose a method, PRINCE, to facilitate provider-side explainability in graph-based recommender systems that use personalized PageRank at their core. PRINCE explanations are comprehensible for users because they present subsets of the user's prior actions responsible for the received recommendations. PRINCE operates in a counterfactual setup and builds on a polynomial-time algorithm for finding the smallest counterfactual explanations.
• We propose a human-in-the-loop framework, ELIXIR, for enhancing scrutability, and in turn the recommendation models themselves, by leveraging user feedback on explanations. ELIXIR enables recommender systems to collect user feedback on pairs of recommendations and explanations. The feedback is incorporated into the model by imposing a soft constraint for learning user-specific item representations.
We evaluate all proposed models and methods with real user studies and demonstrate their benefits in achieving explainability and scrutability in recommender systems.
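PRINCE is described as targeting recommenders that use personalized PageRank (PPR) at their core. For readers unfamiliar with PPR, a minimal power-iteration sketch follows. The graph format, the restart probability of 0.15 and the dangling-node handling are assumptions. The code shows only the underlying ranking primitive, not PRINCE's counterfactual search for explanations.

```python
def personalized_pagerank(graph, source, alpha=0.15, iterations=100):
    """PPR scores for all nodes, with restarts always jumping back to `source`.

    graph: {node: [out-neighbours]}; every out-neighbour must also be a key.
    """
    nodes = list(graph)
    score = {v: float(v == source) for v in nodes}
    for _ in range(iterations):
        nxt = {v: 0.0 for v in nodes}
        for v in nodes:
            out = graph[v]
            if not out:
                # Dangling node: return its walking mass to the source.
                nxt[source] += (1 - alpha) * score[v]
                continue
            share = (1 - alpha) * score[v] / len(out)
            for u in out:
                nxt[u] += share
        nxt[source] += alpha  # restart (teleport) mass
        score = nxt
    return score


# Hypothetical user-item interaction graph; items are ranked by their PPR score.
graph = {
    "user": ["item1", "item2"],
    "item1": ["user"],
    "item2": ["user", "item3"],
    "item3": ["item2"],
}
ppr = personalized_pagerank(graph, "user")
print(sorted((v for v in graph if v != "user"), key=ppr.get, reverse=True))
```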