Search CORE

7 research outputs found

Injecting Knowledge Base Information into End-to-End Joint Entity and Relation Extraction and Coreference Resolution

Author: Deleu Johannes
Demeester Thomas
Develder Chris
Verlinden Severine
Zaporojets Klim
Publication date: 01/01/2021

We consider a joint information extraction (IE) model that solves named entity recognition, coreference resolution and relation extraction jointly over the whole document. In particular, we study how to inject information from a knowledge base (KB) into such an IE model, based on unsupervised entity linking. The KB entity representations used are learned from either (i) hyperlinked text documents (Wikipedia), or (ii) a knowledge graph (Wikidata), and appear complementary in raising IE performance. Representations of corresponding entity linking (EL) candidates are added to text span representations of the input document, and we experiment with (i) taking a weighted average of the EL candidate representations based on their prior (in Wikipedia), and (ii) using an attention scheme over the EL candidate list. Results demonstrate an increase of up to 5% F1-score for the evaluated IE tasks on two datasets. Despite the strong performance of the prior-based model, our quantitative and qualitative analysis reveals the advantage of using the attention-based approach.

arXiv.org e-Print Archive

Ghent University Academic Bibliography
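
The abstract of the record above contrasts two ways of combining entity linking (EL) candidate representations before adding them to a span representation: a prior-weighted average and an attention scheme. The following is a minimal sketch of that contrast, not the authors' implementation. The function names, the dot-product attention score, and the element-wise addition in `inject` are all assumptions made for illustration; the paper may score candidates or combine vectors differently.

```python
import math


def _weighted_sum(weights, candidates):
    """Combine candidate vectors using the given weights."""
    dim = len(candidates[0])
    return [sum(w * vec[i] for w, vec in zip(weights, candidates)) for i in range(dim)]


def prior_weighted_average(candidates, priors):
    """Average EL candidate vectors, weighted by their normalized link priors."""
    total = sum(priors)
    weights = [p / total for p in priors]
    return _weighted_sum(weights, candidates)


def attention_average(span, candidates):
    """Average EL candidate vectors with softmax attention weights.

    Assumption: the score is a plain dot product between span and candidate.
    """
    scores = [sum(s * c for s, c in zip(span, vec)) for vec in candidates]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    return _weighted_sum(weights, candidates)


def inject(span, kb_vector):
    """Add the KB vector to the span representation (assumed element-wise)."""
    return [s + k for s, k in zip(span, kb_vector)]


# Hypothetical toy data: one span and two EL candidates.
span = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0]]
priors = [0.25, 0.75]

by_prior = inject(span, prior_weighted_average(candidates, priors))
by_attention = inject(span, attention_average(span, candidates))
```

In this toy setup the prior favors the second candidate regardless of context, while the attention weights favor the first candidate because it is closer to the span representation.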

Extreme Multi-Label Skill Extraction Training using Large Language Models

Author: Decorte Jens-Joris
Deleu Johannes
Demeester Thomas
Develder Chris
Van Hautte Jeroen
Verlinden Severine
Publication date: 20/07/2023

Online job ads serve as a valuable source of information for skill requirements, playing a crucial role in labor market analysis and e-recruitment processes. Since such ads are typically formatted in free text, natural language processing (NLP) technologies are required to automatically process them. We specifically focus on the task of detecting skills (mentioned literally, or implicitly described) and linking them to a large skill ontology, making it a challenging case of extreme multi-label classification (XMLC). Given that no sizable labeled (training) dataset is available for this specific XMLC task, we propose techniques to leverage general Large Language Models (LLMs). We describe a cost-effective approach to generate an accurate, fully synthetic labeled dataset for skill extraction, and present a contrastive learning strategy that proves effective in the task. Our results across three skill extraction benchmarks show a consistent increase of between 15 and 25 percentage points in R-Precision@5 compared to previously published results that relied solely on distant supervision through literal matches.

Comment: Accepted to the International workshop on AI for Human Resources and Public Employment Services (AI4HR&PES) as part of ECML-PKDD 202

arXiv.org e-Print Archive
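
The abstract of the record above reports gains in R-Precision@5. As a rough illustration of how such a metric can be computed, the sketch below uses a commonly cited definition: per example, the number of gold labels among the top K predictions divided by min(K, number of gold labels), averaged over examples. This definition is an assumption here and may differ from the one used in the paper; the function name and the toy labels are hypothetical.

```python
def r_precision_at_k(ranked_labels, gold_labels, k=5):
    """Mean over examples of (gold labels in the top k) / min(k, number of gold labels)."""
    scores = []
    for ranked, gold in zip(ranked_labels, gold_labels):
        if not gold:
            continue  # assumption: examples without gold labels are skipped
        hits = sum(1 for label in ranked[:k] if label in gold)
        scores.append(hits / min(k, len(gold)))
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical predictions (best first) and gold skill sets for two job ad sentences.
ranked = [
    ["python", "sql", "java", "excel", "git"],
    ["teamwork", "sales"],
]
gold = [
    {"python", "git"},
    {"sales", "negotiation", "crm"},
]

score = r_precision_at_k(ranked, gold, k=5)
```

Dividing by min(K, number of gold labels) keeps the score reachable at 1.0 for examples that have fewer than K gold labels, which is common when each sentence maps to only a handful of skills.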

Isolated anti-Ku antibody in scleroderma-myositis overlap syndrome: the histo-pathological patern

Author: Anne Peretz
Bernard Azanmene
Francis Corazza
Hazim Kadhim
Jacques Bentin
Maria Fernandez-Lopez
Severine Verlinden
Valerie Badot
Wolfram Fink
Publication venue: Springer Nature
Publication date: 28/11/2012

Springer - Publisher Connector

PubMed Central

Injecting knowledge base information into end-to-end joint entity and relation extraction and coreference resolution

Author: Deleu Johannes
Demeester Thomas
Develder Chris
Verlinden Severine
Zaporojets Klim
Publication venue: Association for Computational Linguistics (ACL)
Publication date: 01/01/2021

We consider a joint information extraction (IE) model that solves named entity recognition, coreference resolution and relation extraction jointly over the whole document. In particular, we study how to inject information from a knowledge base (KB) into such an IE model, based on unsupervised entity linking. The KB entity representations used are learned from either (i) hyperlinked text documents (Wikipedia), or (ii) a knowledge graph (Wikidata), and appear complementary in raising IE performance. Representations of corresponding entity linking (EL) candidates are added to text span representations of the input document, and we experiment with (i) taking a weighted average of the EL candidate representations based on their prior (in Wikipedia), and (ii) using an attention scheme over the EL candidate list. Results demonstrate an increase of up to 5% F1-score for the evaluated IE tasks on two datasets. Despite the strong performance of the prior-based model, our quantitative and qualitative analysis reveals the advantage of using the attention-based approach.

Ghent University Academic Bibliography

Extreme multi-label skill extraction training using large language models

Author: Decorte Jens-Joris
Deleu Johannes
Demeester Thomas
Develder Chris
Van Hautte Jeroen
Verlinden Severine
Publication date: 01/01/2023

Ghent University Academic Bibliography

Frozen pretrained transformers for neural sign language translation

Author: D'Oosterlinck Karel
Dambre Joni
De Coster Mathieu
Pizurica Marija
Rabaey Paloma
Van Herreweghe Mieke
Verlinden Severine
Publication venue: Association for Machine Translation in the Americas
Publication date: 01/01/2021

One of the major challenges in sign language translation from a sign language to a spoken language is the lack of parallel corpora. Recent works have achieved promising results on the RWTH-PHOENIX-Weather 2014T dataset, which consists of over eight thousand parallel sentences between German sign language and German. However, from the perspective of neural machine translation, this is still a tiny dataset. To improve the performance of models trained on small datasets, transfer learning can be used. While this has been previously applied in sign language translation for feature extraction, to the best of our knowledge, pretrained language models have not yet been investigated. We use pretrained BERT-base and mBART-50 models to initialize our sign language video to spoken language text translation model. To mitigate overfitting, we apply the frozen pretrained transformer technique: we freeze the majority of parameters during training. Using a pretrained BERT model, we outperform a baseline trained from scratch by 1 to 2 BLEU-4. Our results show that pretrained language models can be used to improve sign language translation performance and that the self-attention patterns in BERT transfer zero-shot to the encoder and decoder of sign language translation models.

Ghent University Academic Bibliography

Isolated anti-Ku antibody in scleroderma-myositis overlap syndrome: the histo-pathological patern

Author: Anne Peretz
Bernard Azanmene
Francis Corazza
Hazim Kadhim
Jacques Bentin
Maria Josee Fernandez-Lopez
N Kamei
P Chérin
Severine Verlinden
Valerie Badot
Wolfram Fink
Y Yamanishi
Publication venue: Springer Science and Business Media LLC

Crossref