3 research outputs found

Information extraction pipelines for knowledge graphs

Author: Auer Sören
Both Andreas
Jaradeh Mohamad Yaser
Singh Kuldeep
Stocker Markus
Publication venue: London : Springer
Publication date: 01/01/2023

In the last decade, a large number of knowledge graph (KG) completion approaches have been proposed. Albeit effective, these efforts are disjoint, and their collective strengths and weaknesses in effective KG completion have not been studied in the literature. We extend Plumber, a framework that brings together the research community’s disjoint efforts on KG completion. We add components to the architecture of Plumber so that it comprises 40 reusable components for various KG completion subtasks, such as coreference resolution, entity linking, and relation extraction. Using these components, Plumber dynamically generates suitable knowledge extraction pipelines and offers 432 distinct pipelines overall. We study the optimization problem of choosing optimal pipelines based on input sentences. To do so, we train a transformer-based classification model that extracts contextual embeddings from the input and finds an appropriate pipeline. We study the efficacy of Plumber for extracting KG triples using standard datasets over three KGs: DBpedia, Wikidata, and Open Research Knowledge Graph. Our results demonstrate the effectiveness of Plumber in dynamically generating KG completion pipelines, outperforming all baselines agnostic of the underlying KG. Furthermore, we provide an analysis of collective failure cases, study the similarities and synergies among integrated components, and discuss their limitations.

Institutionelles Repositorium der Leibniz Universität Hannover

Information extraction pipelines for knowledge graphs

Author: Auer Sören
Both Andreas
Jaradeh Mohamad Yaser
Singh Kuldeep
Stocker Markus
Publication venue: London : Springer
Publication date: 01/01/2023

Repositorium für Naturwissenschaften und Technik

Fuzzy Semantic Labeling of Semi-structured Numerical Datasets

Author: G Limaye
JC Bezdek
M Taheriyan
M Weise
Minh Pham
MJ Cafarella
P Venetis
Sebastian Neumaier
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2018

SPARQL endpoints provide access to rich sources of data (e.g., knowledge graphs), which can be used to classify other, less structured datasets (e.g., CSV files or HTML tables on the Web). We propose an approach to suggest types for the numerical columns of a collection of input files available as CSVs. Our approach applies the fuzzy c-means clustering technique to numerical data in the input files, using existing SPARQL endpoints to generate training datasets. Our approach has three major advantages: it works directly with live knowledge graphs, it does not require knowledge-graph profiling beforehand, and it avoids tedious and costly manual training to match values with types. We evaluate our approach against manually annotated datasets. The results show that the proposed approach classifies most of the types correctly for our test sets.

Crossref

Archivo Digital UPM