112 research outputs found

Predicting wikipedia infobox type information using word embeddings on categories

Author: Biswas Russa
Koutraki Maria
Sack Harald
Publication venue: RWTH Aachen
Publication date: 01/01/2018

Wikipedia has emerged as the largest multilingual, web-based general reference work on the Internet. A huge amount of human resources has been invested in creating and updating Wikipedia articles, which are ideally complemented by so-called infobox templates that define the type of the underlying article. The Wikipedia infobox type information has been observed to be often incomplete and inconsistent, for various reasons. However, this type information plays a fundamental role in the RDF type information of Wikipedia-based Knowledge Graphs such as DBpedia, which creates a need for correct and complete infobox type information. In this work, we propose an approach to predict Wikipedia infobox types by using word embeddings on categories of Wikipedia articles, and analyze the impact of using minimal information from the Wikipedia articles in the prediction process.

KITopen

Entity Type Prediction in Knowledge Graphs using Embeddings

Author: Alam Mehwish
Biswas Russa
Sack Harald
Sofronova Radina
Publication date: 01/01/2020

Open Knowledge Graphs (such as DBpedia, Wikidata, and YAGO) have been recognized as the backbone of diverse applications in the field of data mining and information retrieval. Hence, the completeness and correctness of the Knowledge Graphs (KGs) are vital. Most of these KGs are created via automated information extraction from Wikipedia snapshots, from information provided by users, or using heuristics. However, it has been observed that the type information of these KGs is often noisy, incomplete, and incorrect. To deal with this problem, a multi-label classification approach is proposed in this work for entity typing using KG embeddings. We compare our approach with the current state-of-the-art type prediction method and report on experiments with the KGs.

arXiv.org e-Print Archive

KITopen

Entity Type Prediction in Knowledge Graphs using Embeddings

Author: Alam Mehwish
Biswas Russa
Sack Harald
Sofronova Radina
Publication venue: RWTH Aachen
Publication date: 01/01/2020

KITopen

InfoSync: Information Synchronization across Multilingual Semi-structured Tables

Author: Gupta Vivek
Jain Chelsi
Kataria Tushar
Khincha Siddharth
Zhang Shuo
Publication date: 06/07/2023

Information Synchronization of semi-structured data across languages is challenging. For instance, Wikipedia tables in one language should be synchronized across languages. To address this problem, we introduce a new dataset, InfoSync, and a two-step method for tabular synchronization. InfoSync contains 100K entity-centric tables (Wikipedia Infoboxes) across 14 languages, of which a subset (3.5K pairs) is manually annotated. The proposed method includes 1) Information Alignment to map rows and 2) Information Update for updating missing/outdated information for aligned tables across multilingual tables. When evaluated on InfoSync, information alignment achieves an F1 score of 87.91 (en <-> non-en). To evaluate information update, we perform human-assisted Wikipedia edits on Infoboxes for 603 table pairs. Our approach obtains an acceptance rate of 77.28% on Wikipedia, showing the effectiveness of the proposed method. Comment: 22 pages, 7 figures, 20 tables, ACL 2023 (Toronto, Canada)

arXiv.org e-Print Archive

An approach to correction of erroneous links in knowledge graphs

Author: Melo André
Paulheim Heiko
Publication venue: RWTH
Publication date: 01/01/2017

MAnnheim DOCument Server

Table and Image Generation for Investigating Knowledge of Entities in Pre-trained Vision and Language Models

Author: Hayashi Katsuhiko
Kamigaito Hidetaka
Watanabe Taro
Publication date: 03/06/2023

In this paper, we propose a table and image generation task to verify how the knowledge about entities acquired from natural language is retained in Vision & Language (V & L) models. This task consists of two parts: the first is to generate a table containing knowledge about an entity and its related image, and the second is to generate an image from an entity with a caption and a table containing related knowledge of the entity. In both tasks, the model must know the entities used to perform the generation properly. We created the Wikipedia Table and Image Generation (WikiTIG) dataset from about 200,000 infoboxes in English Wikipedia articles to perform the proposed tasks. We evaluated the performance on the tasks with respect to the above research question using the V & L model OFA, which has achieved state-of-the-art results in multiple tasks. Experimental results show that OFA forgets part of its entity knowledge by pre-training as a complement to improve the performance of image-related tasks. Comment: Accepted at ACL 202

arXiv.org e-Print Archive

Natural language generation from structured data

Author: Matulík Martin
Publication venue: Czech Technical University in Prague. Computing and Information Centre.

Natural language generation is one of the hardest tasks in machine learning. Usually, the task is to convey information stored in structured form. In this work, we implement and test a system based on a neural language model that attempts to generate natural language sentences from data contained in a table.

Digital Library of the Czech Technical University in Prague