Joint Representation Learning of Cross-lingual Words and Entities via Attentive Distant Supervision

Authors: Cao Yixin, Chen Xu, Dong Tiansi, Hou Lei, Li Chengjiang, Li Juanzi, Liu Zhiyuan
Publication date: 01/01/2018

Joint representation learning of words and entities benefits many NLP tasks, but has not been well explored in cross-lingual settings. In this paper, we propose a novel method for joint representation learning of cross-lingual words and entities. It captures mutually complementary knowledge and enables cross-lingual inference among knowledge bases and texts. Our method does not require parallel corpora; it automatically generates comparable data via distant supervision using multilingual knowledge bases. We use two types of regularizers to align cross-lingual words and entities, and design knowledge attention and cross-lingual attention to further reduce noise. We conducted a series of experiments on three tasks: word translation, entity relatedness, and cross-lingual entity linking. The results, both qualitative and quantitative, demonstrate the significance of our method. Comment: 11 pages, EMNLP201

arXiv.org e-Print Archive

Crossref

Institutional Knowledge at Singapore Management University

Hansel: A Chinese Few-Shot and Zero-Shot Entity Linking Benchmark

Authors: Hu Baotian, Li Yuxin, Qin Bing, Shan Zifei, Xu Zhenran
Publication date: 29/10/2023

Modern Entity Linking (EL) systems entrench a popularity bias, yet there is no dataset focusing on tail and emerging entities in languages other than English. We present Hansel, a new benchmark in Chinese that fills the gap in non-English few-shot and zero-shot EL challenges. The test set of Hansel is human-annotated and reviewed, and was created with a novel method for collecting zero-shot EL datasets. It covers 10K diverse documents from news, social media posts and other web articles, with Wikidata as its target Knowledge Base. We demonstrate that the existing state-of-the-art EL system performs poorly on Hansel (R@1 of 36.6% on Few-Shot). We then establish a strong baseline that scores an R@1 of 46.2% on Few-Shot and 76.6% on Zero-Shot on our dataset. We also show that our baseline achieves competitive results on the TAC-KBP2015 Chinese Entity Linking task. Comment: WSDM 202

arXiv.org e-Print Archive

WikiGoldSK: Annotated Dataset, Baselines and Few-Shot Learning Experiments for Slovak Named Entity Recognition

Authors: Hamerlik Endre, Kubík Jozef, Takáč Martin, Šuba Dávid, Šuppa Marek
Publication date: 08/04/2023

Named Entity Recognition (NER) is a fundamental NLP task with a wide range of practical applications. The performance of state-of-the-art NER methods depends on high-quality manually annotated datasets, which still do not exist for some languages. In this work we aim to remedy this situation for Slovak by introducing WikiGoldSK, the first sizable human-labelled Slovak NER dataset. We benchmark it by evaluating state-of-the-art multilingual Pretrained Language Models and comparing it to the existing silver-standard Slovak NER dataset. We also conduct few-shot experiments and show that training on a silver-standard dataset yields better results. To enable future work based on Slovak NER, we release the dataset, code, and trained models publicly under permissible licensing terms at https://github.com/NaiveNeuron/WikiGoldSK. Comment: BSNLP 2023 Workshop at EACL 202

arXiv.org e-Print Archive

Cross-lingual Inference with a Chinese Entailment Graph

Authors: Guillou Liane, Hosseini Javad, Li Tianyi, Steedman Mark, Weber Sabine
Publication date: 22/05/2022

Edinburgh Research Explorer