Bias Assessments of Benchmarks for Link Predictions over Knowledge Graphs
Link prediction (LP) aims to tackle the challenge of predicting new
facts by reasoning over a knowledge graph (KG). Different machine
learning architectures have been proposed to solve the task of LP,
several of them competing for better performance on a few de-facto
benchmarks. This thesis addresses the characterization of LP
datasets with respect to their structural bias properties and the effects
of these biases on attained performance results. We provide a domain-agnostic framework
that assesses the network topology, test leakage bias and sample
selection bias in LP datasets. The framework includes SPARQL queries
that can be reused in the explorative data analysis of KGs for
uncovering unusual patterns. Finally, we apply our framework to
characterize seven common benchmarks used for assessing the task of LP. In
our experiments, we use a trained TransE model to show how the two
bias types affect prediction results.
Our analysis shows problematic patterns in most of the benchmark
datasets. Especially critical are the findings regarding the
state-of-the-art benchmarks FB15k-237, WN18RR and YAGO3-10.
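The test-leakage bias assessed above can be illustrated with a small sketch: a test triple is answerable by simple lookup when its entity pair already appears in the training set under some (possibly inverse) relation. The function and toy triples below are illustrative only, not taken from the thesis.

```python
def leaking_test_triples(train, test):
    """Return test triples whose entity pair already occurs in the
    training set under any relation (in either direction)."""
    train_pairs = {(h, t) for h, _, t in train}
    return [(h, r, t) for h, r, t in test
            if (h, t) in train_pairs or (t, h) in train_pairs]

train = [("berlin", "capital_of", "germany"),
         ("paris", "capital_of", "france")]
test = [("germany", "has_capital", "berlin"),   # inverse of a training fact
        ("rome", "capital_of", "italy")]        # genuinely unseen pair
leaks = leaking_test_triples(train, test)
```

A model evaluated on the leaking triple can score well by memorizing the training pair rather than by reasoning, which is exactly why such patterns inflate benchmark results.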
An Ontology-Based Assistant For Analyzing Agents' Activities
This thesis reports on work in progress on software that helps an analyst identify and analyze activities of actors (such as vehicles) in an intelligence-relevant scenario. A system, IAGOA (Intelligence Analyst's Geospatial and Ontological Assistant), is being developed to aid intelligence analysts. Analysis may be accomplished by retrieving simulated satellite data of ground vehicles and interacting with software modules that allow the analyst to conjecture the activities in which an actor is engaged, along with the (largely geospatial and temporal) features of the area of operation relevant to the nature of those activities. Activities are conceptualized by ontologies. The research relies on natural language components (semantic frames) gathered from the FrameNet lexical database, which captures the semantics of lexical items with an ontology using OWL. The software has two components: one for the analyst, and one for a modeler who produces HTML and parameterized KML documents used by the analyst. The most significant input to the modeler software is the FrameNet OWL file, and the interface for the analyst and, to some extent, the modeler is provided by the Google Earth API.
Explainable methods for knowledge graph refinement and exploration via symbolic reasoning
Knowledge Graphs (KGs) have applications in many domains such as Finance, Manufacturing, and Healthcare. While recent efforts have created large KGs, their content is far from complete and sometimes includes invalid statements. Therefore, it is crucial to refine the constructed KGs to enhance their coverage and accuracy via KG completion and KG validation. It is also vital to provide human-comprehensible explanations for such refinements, so that humans have trust in the KG quality. Enabling KG exploration, by search and browsing, is also essential for users to understand the KG's value and limitations towards downstream applications. However, the large size of KGs makes KG exploration very challenging. While the type taxonomy of KGs is a useful asset along these lines, it remains insufficient for deep exploration. In this dissertation we tackle the aforementioned challenges of KG refinement and KG exploration by combining logical reasoning over the KG with other techniques such as KG embedding models and text mining. Through such combination, we introduce methods that provide human-understandable output. Concretely, we introduce methods to tackle KG incompleteness by learning exception-aware rules over the existing KG. Learned rules are then used to infer missing links in the KG accurately. Furthermore, we propose a framework for constructing human-comprehensible explanations for candidate facts from both KG and text. Extracted explanations are used to ensure the validity of KG facts. Finally, to facilitate KG exploration, we introduce a method that combines KG embeddings with rule mining to compute informative entity clusters with explanations. Knowledge graphs have applications in many areas, for example in finance and healthcare. However, knowledge graphs are incomplete and also contain invalid data. High coverage and correctness require new methods for knowledge graph completion and knowledge graph validation.
Together, these two tasks are referred to as knowledge graph refinement. An important aspect here is the explainability and comprehensibility of knowledge graph content for users. In applications, user-side exploration of knowledge graphs is also of particular importance. Searching and navigating the graph helps users better understand the knowledge content and its limitations. Due to the huge number of existing entities and facts, knowledge graph exploration is a challenge. Taxonomic type systems help with this, but are insufficient for deeper exploration. This dissertation addresses the challenges of knowledge graph refinement and knowledge graph exploration through algorithmic inference over the knowledge graph. It extends logical reasoning and combines it with other methods, in particular with neural knowledge graph embeddings and with text mining. These new methods deliver output with explanations for users. The dissertation makes the following contributions:
• For knowledge graph completion, we present ExRuL, a method for revising Horn rules by adding exception conditions to the body of the rules. The revised rules can infer new facts and thereby close gaps in the knowledge graph. Experiments with large knowledge graphs show that this method substantially reduces errors in derived facts and delivers user-friendly explanations.
• With RuLES, we present a method for learning rules that is based on probabilistic representations of missing facts. The approach iteratively extends the rules induced from a knowledge graph by combining neural knowledge graph embeddings with information from text corpora. Rule generation uses new metrics for rule quality.
Experiments show that RuLES substantially improves the quality of the learned rules and of their predictions.
• To support knowledge graph validation, we present ExFaKT, a framework for constructing explanations for fact candidates. The method uses rules to transform candidates into a set of statements that are easier to find and to validate or refute. The output of ExFaKT is a set of semantic pieces of evidence for fact candidates, extracted from text corpora and from the knowledge graph. Experiments show that the transformations substantially improve the yield and quality of the discovered explanations. The generated explanations support both manual knowledge graph validation by curators and automatic validation.
• To support knowledge graph exploration, we present ExCut, a method for generating informative entity clusters with explanations, using knowledge graph embeddings and automatically induced rules. A cluster explanation consists of a combination of relations between the entities that identifies the cluster. ExCut simultaneously improves cluster quality and cluster explainability by iteratively interleaving the learning of embeddings and rules. Experiments show that ExCut computes clusters of high quality and that the cluster explanations are informative for users.
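The exception-aware rules described above can be sketched in a few lines: a Horn rule fires only when its body holds and no exception fact blocks it. The rule shape, predicate names, and facts below are invented for illustration and are not the thesis' actual ExRuL implementation.

```python
def infer_with_exceptions(facts, body, head, exception):
    """Apply the rule  head(X, Y) <- body(X, Y)  unless the
    exception predicate exception(X, Y) also holds in the KG."""
    return {
        (head, x, y)
        for (p, x, y) in facts
        if p == body and (exception, x, y) not in facts
    }

# Toy KG: Bob was born in Paris but moved away, so the naive rule
# "livesIn <- bornIn" would derive a wrong fact for him.
facts = {("bornIn", "anna", "berlin"),
         ("bornIn", "bob", "paris"),
         ("movedAwayFrom", "bob", "paris")}
inferred = infer_with_exceptions(facts, "bornIn", "livesIn", "movedAwayFrom")
```

The exception condition is what keeps the derived facts accurate while still closing genuine gaps, and it doubles as a human-readable explanation of why a fact was (or was not) inferred.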
Tackling the veracity and variety of big data
This thesis tackles the veracity and variety challenges of big data, especially focusing
on graphs and relational data. We start with proposing a class of graph association
rules (GARs) to specify regularities between entities in graphs, which capture both
missing links and inconsistencies. A GAR is a combination of a graph pattern and a
dependency; it may take as predicates machine learning classifiers for link prediction.
We formalize association deduction with GARs in terms of the chase, and prove its
Church-Rosser property. We show that the satisfiability, implication and association
deduction problems for GARs are coNP-complete, NP-complete and NP-complete, respectively.
The incremental deduction problem is DP-complete for GARs. In addition,
we provide parallel algorithms for association deduction and incremental deduction.
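As a rough illustration of how a GAR combines a graph pattern with a dependency whose predicates may invoke an ML link classifier, consider the sketch below; the relation names, pattern, and classifier are hypothetical, not taken from the thesis.

```python
def apply_gar(edges, classifier):
    """Pattern: X -worksFor-> Y, Y -locatedIn-> Z.
    Dependency: add X -basedIn-> Z when the (possibly ML-based)
    link classifier accepts the pair (X, Z)."""
    located = {(y, z) for y, p, z in edges if p == "locatedIn"}
    return {(x, "basedIn", z)
            for x, p, y in edges if p == "worksFor"
            for (y2, z) in located
            if y2 == y and classifier(x, z)}

edges = [("anna", "worksFor", "acme"),
         ("acme", "locatedIn", "berlin")]
# A trivially permissive classifier stands in for a trained LP model.
inferred = apply_gar(edges, classifier=lambda x, z: True)
```

Chasing a KG with such rules repeats this step until no new facts emerge; swapping the lambda for a trained classifier is what lets GARs capture both missing links and inconsistencies.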
We next develop a parallel algorithm to discover GARs, which applies an application-driven
strategy to cut back rules and data that are irrelevant to users' interest, by training
a machine learning model to identify data pertaining to a given application. Moreover,
we introduce a sampling method to reduce a big graph G to a set H of small
sample graphs. Given expected support and recall bounds, this method is able to deduce
samples in H and mine rules from H to satisfy the bounds in the entire G.
Then we propose a class of temporal association rules (TACOs) for event prediction
in temporal graphs. TACOs are defined on temporal graphs in terms of change patterns
and (temporal) conditions, and may carry machine learning predicates for temporal
event prediction. We settle the complexity of reasoning about TACOs, including their
satisfiability, implication and prediction problems. We develop a system that discovers
TACOs by iteratively training a rule creator based on generative models in a creator-critic
framework, and predicts events by applying the discovered TACOs in parallel.
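A TACO-style rule over a temporal graph can be sketched as a change pattern plus a temporal condition that triggers a predicted event; the relations, time window, and event name below are invented for illustration and do not reproduce the thesis' formalism.

```python
def predict_events(timed_edges, window):
    """Change pattern: x -follows-> y at t1 and y -posts-> z at t2.
    Temporal condition: 0 <= t2 - t1 <= window.
    Predicted event: x -sees-> z at time t2."""
    posts = [(y, z, t) for y, p, z, t in timed_edges if p == "posts"]
    return {(x, "sees", z, t2)
            for x, p, y, t1 in timed_edges if p == "follows"
            for y2, z, t2 in posts
            if y2 == y and 0 <= t2 - t1 <= window}

timed_edges = [("anna", "follows", "bob", 1),
               ("bob", "posts", "post1", 3),   # within the window
               ("bob", "posts", "post2", 9)]   # too late
events = predict_events(timed_edges, window=5)
```

The temporal condition is what distinguishes such rules from static GARs: the same structural pattern yields an event only when the timestamps line up.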
Finally, we propose an approach to querying relations D and graphs G taken together
in SQL. The key idea is that if a tuple t in D and a vertex v in G are determined
to refer to the same real-world entity, then we join t and v, correlate their information
and complement tuple t with additional attributes of v from graphs. We show how to
do this in SQL extended with only syntactic sugar, for both static joins when t is a tuple
in D and dynamic joins when t comes from intermediate results of sub-queries on D.
To support the semantic joins, we propose an attribute extraction scheme based on K-means
clustering, to identify and fetch graph properties that are linked to v via paths.
Moreover, we develop a scheme to extract a relation schema for entities in graphs, and
a heuristic join method based on the schema to strike a balance between the complexity
and accuracy of dynamic joins.
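The semantic-join idea above (resolve a tuple and a vertex to the same real-world entity, then complement the tuple with the vertex's attributes) can be sketched as follows; the matching predicate and attribute names are placeholders, not the thesis' actual scheme.

```python
def semantic_join(tuples, vertices, same_entity):
    """Join relational tuples with graph vertices that resolve to the
    same entity, complementing each tuple with extra vertex attributes."""
    joined = []
    for t in tuples:
        for v in vertices:
            if same_entity(t, v):
                merged = dict(t)
                for k, val in v.items():
                    merged.setdefault(k, val)  # complement, never overwrite
                joined.append(merged)
    return joined

tuples = [{"name": "acme", "revenue": 10}]          # relation D
vertices = [{"name": "acme", "hq": "berlin"}]       # graph G
out = semantic_join(tuples, vertices,
                    lambda t, v: t["name"] == v["name"])
```

In the thesis this happens inside SQL via syntactic sugar rather than application code, but the effect is the same: tuple t gains the graph-side attributes of its matching vertex v.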
In silico discoveries for biomedical sciences
Text mining is a challenging field of research initially meant for reading large text collections with a computer. Text mining is useful for summarizing text, searching for informative documents, and, most importantly, for knowledge discovery. Knowledge discovery is the main subject of this thesis. The hypothesis that knowledge discovery is possible started with the work done by Swanson. As a first finding, he made links between Raynaud's disease and fish oil using intermediate medical terms to relate them to each other. This principle was formalized in the ABC concept: A and C are not directly related to each other but via an intermediate concept B that needs to be discovered. Text data can be extended by adding other non-textual data such as microarray experiments; then we are in the field of data mining. The final goal is to make all kinds of discoveries with computers (in silico) using data sources, in order to assist biology research to save time and discover more.
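Swanson's ABC principle lends itself to a tiny sketch: find intermediate terms B that co-occur with both A and C while A and C never co-occur directly. The co-occurrence data below is a toy reconstruction of the Raynaud's/fish-oil example, not real literature counts.

```python
def abc_candidates(cooccur, a, c):
    """Return intermediate terms B such that A-B and B-C co-occur in
    the literature, while A and C never co-occur directly."""
    if c in cooccur.get(a, set()):
        return set()  # A and C already linked: nothing to discover
    return cooccur.get(a, set()) & {b for b, terms in cooccur.items()
                                    if c in terms}

# Symmetric co-occurrence sets extracted from (imaginary) abstracts.
cooccur = {"fish_oil": {"blood_viscosity"},
           "blood_viscosity": {"fish_oil", "raynaud"},
           "raynaud": {"blood_viscosity"}}
bridges = abc_candidates(cooccur, "fish_oil", "raynaud")
```

Each returned B term is a candidate hypothesis for a hidden A-C relation, which is exactly the kind of in silico discovery the thesis pursues.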