Search CORE

175 research outputs found

Breaking Through the 80% Glass Ceiling: Raising the State of the Art in Word Sense Disambiguation by Incorporating Knowledge Graph Information

Author: Michele Bevilacqua
Roberto Navigli
Publication venue: 'Association for Computational Linguistics (ACL)'
Publication date: 01/01/2020
Field of study

Neural architectures are the current state of the art in Word Sense Disambiguation (WSD). However, they make limited use of the vast amount of relational information encoded in Lexical Knowledge Bases (LKB). We present Enhanced WSD Integrating Synset Embeddings and Relations (EWISER), a neural supervised architecture that is able to tap into this wealth of knowledge by embedding information from the LKB graph within the neural architecture, and to exploit pretrained synset embeddings, enabling the network to predict synsets that are not in the training set. As a result, we set a new state of the art on almost all the evaluation settings considered, also breaking through, for the first time, the 80% ceiling on the concatenation of all the standard all-words English WSD evaluation benchmarks. On multilingual all-words WSD, we report state-of-the-art results by training on nothing but English

Archivio della ricerca- Università di Roma La Sapienza

Vec2Gloss: definition modeling leveraging contextualized vectors with Wordnet gloss

Author: Chang Yu-Lin
Chen Wei-Ling
Hsieh Shu-Kai
Ku Mao-Chang
Tseng Yu-Hsiang
Publication venue
Publication date: 28/05/2023
Field of study

Contextualized embeddings are proven to be powerful tools in multiple NLP tasks. Nonetheless, challenges regarding their interpretability and capability to represent lexical semantics still remain. In this paper, we propose that the task of definition modeling, which aims to generate the human-readable definition of the word, provides a route to evaluate or understand the high dimensional semantic vectors. We propose a `Vec2Gloss' model, which produces the gloss from the target word's contextualized embeddings. The generated glosses of this study are made possible by the systematic gloss patterns provided by Chinese Wordnet. We devise two dependency indices to measure the semantic and contextual dependency, which are used to analyze the generated texts in gloss and token levels. Our results indicate that the proposed `Vec2Gloss' model opens a new perspective to the lexical-semantic applications of contextualized embeddings

arXiv.org e-Print Archive

Investigations into the value of labeled and unlabeled data in biomedical entity recognition and word sense disambiguation

Author: Sheikhshabbafghi Golnar
Publication venue
Publication date: 31/03/2021
Field of study

Human annotations, especially in highly technical domains, are expensive and time consuming togather, and can also be erroneous. As a result, we never have sufficiently accurate data to train andevaluate supervised methods. In this thesis, we address this problem by taking a semi-supervised approach to biomedical namedentity recognition (NER), and by proposing an inventory-independent evaluation framework for supervised and unsupervised word sense disambiguation. Our contributions are as follows: We introduce a novel graph-based semi-supervised approach to named entity recognition(NER) and exploit pre-trained contextualized word embeddings in several biomedical NER tasks. We propose a new evaluation framework for word sense disambiguation that permits a fair comparison between supervised methods trained on different sense inventories as well as unsupervised methods without a fixed sense inventory