KGrEaT: A Framework to Evaluate Knowledge Graphs via Downstream Tasks
In recent years, countless research papers have addressed the topics of
knowledge graph creation, extension, or completion in order to create knowledge
graphs that are larger, more correct, or more diverse. This research is
typically motivated by the argument that using such enhanced knowledge
graphs to solve downstream tasks will improve performance. Nonetheless, this is
hardly ever evaluated. Instead, the predominant evaluation metrics - aiming at
correctness and completeness - are undoubtedly valuable but fail to capture the
complete picture, i.e., how useful the created or enhanced knowledge graph
actually is. Further, the accessibility of such a knowledge graph is rarely
considered (e.g., whether it contains expressive labels, descriptions, and
sufficient context information to link textual mentions to the entities of the
knowledge graph). To better judge how well knowledge graphs perform on actual
tasks, we present KGrEaT - a framework to estimate the quality of knowledge
graphs via actual downstream tasks like classification, clustering, or
recommendation. Instead of comparing different methods of processing knowledge
graphs with respect to a single task, the purpose of KGrEaT is to compare
various knowledge graphs as such by evaluating them on a fixed task setup. The
framework takes a knowledge graph as input, automatically maps it to the
datasets to be evaluated on, and computes performance metrics for the defined
tasks. It is built in a modular way to be easily extendable with additional
tasks and datasets.
Comment: Accepted for the Short Paper track of CIKM'23, October 21-25, 2023, Birmingham, United Kingdom
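The evaluation idea can be sketched in miniature (all data and names below are hypothetical; the real framework's mapping and task setup are far more elaborate): entities of a toy KG are turned into bag-of-relations feature vectors, dataset instances are mapped to KG entities by label, and the KG is scored on a simple classification task.

```python
from collections import Counter

# Toy knowledge graph as (subject, predicate, object) triples -- hypothetical data.
kg = [
    ("Berlin", "type", "City"), ("Berlin", "locatedIn", "Germany"),
    ("Paris", "type", "City"), ("Paris", "locatedIn", "France"),
    ("Germany", "type", "Country"), ("France", "type", "Country"),
]

def features(entity):
    """Bag-of-(predicate, object) features for an entity -- a crude
    stand-in for a learned embedding."""
    return Counter((p, o) for s, p, o in kg if s == entity)

def similarity(a, b):
    """Overlap of the two feature multisets (intersection size)."""
    return sum((features(a) & features(b)).values())

# Labeled downstream dataset; instances are mapped to KG entities by name.
train = [("Berlin", "city"), ("Germany", "country")]
test = [("Paris", "city"), ("France", "country")]

def classify(entity):
    # 1-nearest-neighbour over the training instances.
    return max(train, key=lambda t: similarity(entity, t[0]))[1]

accuracy = sum(classify(e) == y for e, y in test) / len(test)
print(accuracy)  # 1.0 on this toy setup
```

A better KG (more correct, more complete context for the mapped entities) yields better features and hence a better task score, which is exactly the signal the framework measures.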
Data Integration for Open Data on the Web
In this lecture we introduce the challenges of
integrating openly available Web data and discuss how to solve them. Firstly,
while we will address this topic from the viewpoint of Semantic Web
research, not all data is readily available as RDF or Linked Data, so
we will give an introduction to different data formats prevalent on the
Web, namely, standard formats for publishing and exchanging tabular,
tree-shaped, and graph data. Secondly, not all Open Data is really completely
open, so we will discuss and address issues around licences, terms
of usage associated with Open Data, as well as documentation of data
provenance. Thirdly, we will discuss (meta-)data
quality issues associated with Open Data on the Web and how Semantic
Web techniques and vocabularies can be used to describe and remedy
them. Fourth, we will address issues about searchability and integration
of Open Data and discuss to what extent semantic search can help to overcome
these. We close by briefly summarizing further issues not covered
explicitly herein, such as multi-linguality, temporal aspects (archiving,
evolution, temporal querying), as well as how/whether OWL and RDFS
reasoning on top of integrated open data could help.
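The first point, heterogeneous formats for tabular and tree-shaped data, can be illustrated with a toy sketch (hypothetical data and column names; real pipelines use standardized mappings such as CSVW or R2RML): the same record is lifted from CSV and JSON form into graph-style triples.

```python
import csv
import io
import json

# Tabular (CSV) and tree-shaped (JSON) data describing the same resource.
csv_text = "city,country\nVienna,Austria\n"
json_text = '{"city": "Vienna", "country": "Austria"}'

def csv_to_triples(text):
    """Lift each CSV row into (row-subject, column, value) triples."""
    rows = csv.DictReader(io.StringIO(text))
    return [(f"row{i}", k, v) for i, r in enumerate(rows) for k, v in r.items()]

def json_to_triples(text, subject="doc"):
    """Lift a flat JSON object into (subject, key, value) triples."""
    return [(subject, k, v) for k, v in json.loads(text).items()]

print(csv_to_triples(csv_text))   # [('row0', 'city', 'Vienna'), ('row0', 'country', 'Austria')]
print(json_to_triples(json_text)) # [('doc', 'city', 'Vienna'), ('doc', 'country', 'Austria')]
```

Once both sources speak triples, the remaining integration problems (licensing, provenance, quality, search) apply to a single common data model.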
Automatic Generation of Complex Ontology Alignments
The Linked Open Data (LOD) cloud is composed of data repositories. The data in these repositories are described by vocabularies, also called ontologies. Each ontology has its own terminology and model, which makes them heterogeneous. To make the ontologies and the data they describe interoperable, ontology alignments establish correspondences, or links, between their entities. Many ontology matching systems generate simple correspondences, i.e., they link one entity to another. However, to overcome ontology heterogeneity, more expressive correspondences are sometimes needed. Finding this kind of correspondence is a tedious task that should be automated.
In this thesis, an automatic complex matching approach based on a user's knowledge needs and common instances is proposed. The field of complex alignment is still growing, and little work addresses the evaluation of such alignments. To fill this gap, we propose an automatic complex alignment evaluation system based on instance comparison. This system is complemented by a synthetic dataset on the conference domain, extending a well-known alignment evaluation benchmark.
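The instance-based evaluation idea can be sketched as follows (hypothetical class names and instance sets; the actual system is more involved): a correspondence between two classes is scored by how much their instance sets overlap, e.g. with a Jaccard score.

```python
# Instance sets of classes from two ontologies -- hypothetical data.
instances_a = {"AcceptedPaper": {"p1", "p2", "p3"}}
instances_b = {"Paper_with_decision_accept": {"p2", "p3", "p4"}}

def jaccard(s, t):
    """Overlap of two instance sets, in [0, 1]."""
    return len(s & t) / len(s | t) if s | t else 0.0

def score_correspondence(class_a, class_b):
    """Instance-based confidence for the correspondence class_a <-> class_b."""
    return jaccard(instances_a[class_a], instances_b[class_b])

print(score_correspondence("AcceptedPaper", "Paper_with_decision_accept"))  # 0.5
```

The same score applies unchanged when one side of the correspondence is a complex expression, as long as its instance set can be materialized, which is what makes instance comparison attractive for evaluating complex alignments.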
Exploiting general-purpose background knowledge for automated schema matching
The schema matching task is an integral part of the data integration process and is usually its first step. Schema matching is typically very complex and time-consuming, and is therefore largely carried out by humans. One reason for the low degree of automation is that schemas are often defined with deep background knowledge that is not itself present within the schemas. Overcoming the problem of missing background knowledge is a core challenge in automating the data integration process.
In this dissertation, the task of matching semantic models, so-called ontologies, with the help of external background knowledge is investigated in-depth in Part I. Throughout this thesis, the focus lies on large, general-purpose resources since domain-specific resources are rarely available for most domains. Besides new knowledge resources, this thesis also explores new strategies to exploit such resources.
A technical base for the development and comparison of matching systems is presented in Part II. The framework introduced here allows for simple and modularized matcher development (with background knowledge sources) and for extensive evaluations of matching systems.
Knowledge graphs, which have grown significantly in size in recent years, are among the largest structured sources of general-purpose background knowledge. However, exploiting such graphs is not trivial. In Part III, knowledge graph embeddings are explored, analyzed, and compared, and multiple improvements to existing approaches are presented.
In Part IV, numerous concrete matching systems which exploit general-purpose background knowledge are presented. Furthermore, exploitation strategies and resources are analyzed and compared. This dissertation closes with a perspective on real-world applications.
Biomedical ontology alignment: An approach based on representation learning
While representation learning techniques have shown great promise in application to a number of different NLP tasks, they have had little impact on the problem of ontology matching. Unlike past work that has focused on feature engineering, we present a novel representation learning approach that is tailored to the ontology matching task. Our approach is based on embedding ontological terms in a high-dimensional Euclidean space. This embedding is derived on the basis of a novel phrase retrofitting strategy through which semantic similarity information becomes inscribed onto fields of pre-trained word vectors. The resulting framework also incorporates a novel outlier detection mechanism based on a denoising autoencoder that is shown to improve performance. An ontology matching system derived using the proposed framework achieved an F-score of 94% on an alignment scenario involving the Adult Mouse Anatomical Dictionary and the Foundational Model of Anatomy ontology (FMA) as targets. This compares favorably with the best performing systems on the Ontology Alignment Evaluation Initiative anatomy challenge. We performed additional experiments on aligning FMA to NCI Thesaurus and to SNOMED CT based on a reference alignment extracted from the UMLS Metathesaurus. Our system obtained overall F-scores of 93.2% and 89.2% for these experiments, thus achieving state-of-the-art results.
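The core matching step can be illustrated with a toy sketch (made-up term vectors; the paper derives its embeddings via phrase retrofitting of pre-trained word vectors and adds outlier detection on top): terms from two ontologies are embedded and paired by cosine similarity.

```python
import math

# Made-up embeddings for terms of two anatomy ontologies (real embeddings
# are high-dimensional retrofitted word vectors).
source = {"heart": [0.9, 0.1, 0.0], "lung": [0.1, 0.9, 0.1]}
target = {"cor": [0.85, 0.15, 0.0], "pulmo": [0.0, 1.0, 0.1]}

def cosine(u, v):
    """Cosine similarity of two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def best_matches(src, tgt):
    """Greedy matching: each source term gets its most similar target term."""
    return {s: max(tgt, key=lambda t: cosine(vs, tgt[t])) for s, vs in src.items()}

print(best_matches(source, target))  # {'heart': 'cor', 'lung': 'pulmo'}
```

The quality of such a matcher therefore hinges entirely on how well the embedding space encodes cross-ontology synonymy, which is what the retrofitting strategy targets.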
Universal Preprocessing Operators for Embedding Knowledge Graphs with Literals
Knowledge graph embeddings are dense numerical representations of entities in
a knowledge graph (KG). While the majority of approaches concentrate only on
relational information, i.e., relations between entities, fewer approaches
exist which also take information about literal values (e.g., textual
descriptions or numerical information) into account. Those which exist are
typically tailored towards a particular modality of literal and a particular
embedding method. In this paper, we propose a set of universal preprocessing
operators which can be used to transform KGs with literals for numerical,
temporal, textual, and image information, so that the transformed KGs can be
embedded with any method. Experiments on the kgbench dataset with three
different embedding methods show promising results.
Comment: Accepted for the DL4KG Workshop at ISWC 2023
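One such operator can be sketched as follows (the bucket-naming scheme below is hypothetical; the paper defines a whole family of operators covering numeric, temporal, textual, and image literals): numeric literals are replaced by discrete bucket entities, so the transformed graph contains only entity-to-entity triples and can be consumed by any relational embedding method.

```python
def bin_numeric_literals(triples, n_bins=3):
    """Replace numeric literal objects with bucket entities via
    equal-width binning over the observed value range."""
    values = sorted(o for _, _, o in triples if isinstance(o, (int, float)))
    if not values:
        return list(triples)
    lo, hi = values[0], values[-1]
    width = (hi - lo) / n_bins or 1  # avoid zero width when all values equal
    out = []
    for s, p, o in triples:
        if isinstance(o, (int, float)):
            bucket = min(int((o - lo) / width), n_bins - 1)
            out.append((s, p, f"bucket_{p}_{bucket}"))  # hypothetical naming scheme
        else:
            out.append((s, p, o))
    return out

kg = [("Berlin", "population", 3_600_000),
      ("Vienna", "population", 1_900_000),
      ("Graz", "population", 290_000),
      ("Berlin", "locatedIn", "Germany")]
print(bin_numeric_literals(kg))
```

Entities with similar literal values now share a bucket neighbour, so the embedding method can pick up the numeric signal through purely relational structure.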
Completing and Debugging Ontologies: state of the art and challenges
As semantically-enabled applications require high-quality ontologies,
developing and maintaining ontologies that are as correct and complete as
possible is an important although difficult task in ontology engineering. A key
step is ontology debugging and completion. In general, there are two steps:
detecting defects and repairing defects. In this paper we discuss the state of
the art regarding the repairing step. We do this by formalizing the repairing
step as an abduction problem and situating the state of the art with respect to
this framework. We show that there are still many open research problems and
point out opportunities for further work to advance the field.
Comment: 56 pages
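The view of repair as abduction can be illustrated with a toy sketch (hypothetical class names, and only atomic subclass axioms; the paper's formalization covers much richer logics and repair operations): given a desired but missing entailment, the repair step searches for axioms whose addition makes it derivable.

```python
# TBox as directed subclass edges (subclass, superclass) -- toy example.
tbox = {("Dog", "Mammal"), ("Cat", "Mammal")}
concepts = {"Dog", "Cat", "Mammal", "Animal"}

def entails(axioms, sub, sup):
    """Does the axiom set entail sub <= sup? (Reachability in the subclass graph.)"""
    frontier, seen = {sub}, set()
    while frontier:
        c = frontier.pop()
        seen.add(c)
        frontier |= {b for a, b in axioms if a == c} - seen
    return sup in seen

def abduce_repairs(axioms, sub, sup):
    """Single-axiom repairs: candidate subclass axioms whose addition
    yields the desired entailment."""
    candidates = {(a, b) for a in concepts for b in concepts if a != b} - axioms
    return sorted(ax for ax in candidates if entails(axioms | {ax}, sub, sup))

# Desired but missing entailment: Dog <= Animal.
print(abduce_repairs(tbox, "Dog", "Animal"))
# [('Dog', 'Animal'), ('Mammal', 'Animal')]
```

Choosing among the returned repairs (e.g. preferring the more general axiom Mammal &#8849; Animal) is exactly where domain knowledge and the preference criteria discussed in the survey come in.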