
3 research outputs found

Assessing Lexical-Semantic Regularities in Portuguese Word Embeddings

Authors: Ana Alves, Hugo Gonçalo Oliveira, Tiago Sousa
Publication venue: 'Universidad Internacional de La Rioja'
Publication date: 01/01/2021

Word embedding models are often assessed on their ability to solve syntactic and semantic analogies. Among the latter, we are interested in relations of the kind found in lexical-semantic knowledge bases such as WordNet, which some English analogy test sets also cover. In brief, this paper studies how well pretrained Portuguese word embeddings capture such relations. For this purpose, we created a new test set, dubbed TALES, focused exclusively on Portuguese lexical-semantic relations acquired from lexical resources. Using TALES, we analyse the performance of methods previously used for solving analogies on different Portuguese word embedding models. Accuracies were clearly below the state of the art for other kinds of analogies, showing that TALES is a challenging test. This is mainly due to the nature of lexical-semantic relations: many instances share the same argument, which allows for several correct answers, sometimes too many for all of them to be included in the dataset. We further inspected the results of the best-performing combination of method and model and found that some acceptable answers had been marked as incorrect. This was mainly due to the limited coverage of the source lexical resources, and it suggests that word embeddings may be a useful source of information for enriching those resources, which we also discuss.

Estudo Geral

Re-UNIR
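The abstract does not name the analogy-solving methods it compares. One widely used method is 3CosAdd, which answers "a is to a* as b is to ?" by returning the vocabulary word whose vector is most cosine-similar to b - a + a*. The following is a minimal sketch under stated assumptions: the toy vectors are hypothetical rather than taken from any real model, and `solve_3cosadd` and `is_correct` are illustrative names, not the paper's code. Because lexical-semantic relations often admit several correct answers, the check accepts a prediction that matches any word in a set of gold answers.

```python
import math

# Hypothetical 3-D toy vectors; a real evaluation would load a pretrained
# Portuguese embedding model instead.
EMB = {
    "cão": [1.0, 0.0, 0.0],
    "gato": [1.0, 0.0, 0.1],
    "animal": [1.0, 0.0, 1.0],
    "rosa": [0.0, 1.0, 0.0],
    "flor": [0.0, 1.0, 1.0],
    "planta": [0.0, 0.9, 1.1],
}


def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def solve_3cosadd(emb, a, a_star, b):
    """Answer 'a : a_star :: b : ?' with the word closest to b - a + a_star.

    The three query words are excluded from the candidates, as is common practice.
    """
    target = [bi - ai + si for ai, si, bi in zip(emb[a], emb[a_star], emb[b])]
    candidates = [w for w in emb if w not in {a, a_star, b}]
    return max(candidates, key=lambda w: cosine(emb[w], target))


def is_correct(prediction, gold_answers):
    """Count a prediction as correct if it matches any of the gold answers."""
    return prediction in gold_answers


if __name__ == "__main__":
    # Hypernymy query: cão (dog) : animal :: rosa (rose) : ?
    answer = solve_3cosadd(EMB, "cão", "animal", "rosa")
    print(answer, is_correct(answer, {"flor", "planta"}))
```

If `planta` were a valid answer missing from the gold set and a model ranked it first, this check would score the prediction as wrong, which is the kind of coverage gap the abstract describes.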

Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity

Author: Hugo Gonçalo Oliveira
Publication venue: 'MDPI AG'
Publication date: 08/02/2018

Identifying similar and related words is not only key to natural language understanding but also a suitable task for assessing the quality of computational resources, compiled by different means, that organise the words and meanings of a language. This paper, which aims to be a reference for those interested in computing word similarity in Portuguese, presents several approaches to this task. It is motivated by the recent availability of state-of-the-art distributional models of Portuguese words, which add to several lexical knowledge bases (LKBs) that have been available for this language for longer. These resources were used to answer word similarity tests, which have also recently become available for Portuguese. We conclude that there are several valid approaches to this task, but none that outperforms all the others on every test. Distributional models seem to capture relatedness better, while LKBs are better suited to computing genuine similarity; in general, however, better results are obtained when knowledge from different sources is combined.

Multidisciplinary Digital Publishing Institute
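The abstract evaluates resources on word similarity tests without naming a scoring procedure. A common setup, assumed here, scores each word pair with the cosine similarity of its embedding vectors and reports Spearman's rank correlation between those scores and human ratings. The sketch also adds a hypothetical LKB-based score (Jaccard overlap of synonym sets) and a simple weighted average as one possible way of combining sources. The vectors, synonym sets, ratings, function names, and the `alpha` weight are all illustrative assumptions, not the paper's data or method.

```python
import math

# Hypothetical toy vectors standing in for a Portuguese distributional model.
EMB = {
    "carro": [0.9, 0.1],
    "automóvel": [0.85, 0.15],
    "estrada": [0.6, 0.4],
    "banana": [0.1, 0.9],
}

# Hypothetical synonym sets standing in for a lexical knowledge base (LKB).
LKB = {
    "carro": {"automóvel", "viatura"},
    "automóvel": {"carro", "viatura"},
    "estrada": {"via", "rua"},
    "banana": set(),
}

# Hypothetical test pairs with human ratings on a 0-10 scale.
TEST = [("carro", "automóvel", 9.5), ("carro", "estrada", 6.0), ("carro", "banana", 1.0)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def lkb_sim(w1, w2):
    """Jaccard overlap of each word's synonym set, including the word itself."""
    s1 = LKB.get(w1, set()) | {w1}
    s2 = LKB.get(w2, set()) | {w2}
    return len(s1 & s2) / len(s1 | s2)


def combined_sim(w1, w2, alpha=0.5):
    """Weighted average of distributional and LKB scores (alpha is an assumption)."""
    return alpha * cosine(EMB[w1], EMB[w2]) + (1 - alpha) * lkb_sim(w1, w2)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


def evaluate(score_fn, test=TEST):
    """Correlate system scores with human ratings over the test pairs."""
    system = [score_fn(w1, w2) for w1, w2, _ in test]
    human = [gold for _, _, gold in test]
    return spearman(system, human)


if __name__ == "__main__":
    print("distributional:", evaluate(lambda a, b: cosine(EMB[a], EMB[b])))
    print("LKB:", evaluate(lkb_sim))
    print("combined:", evaluate(combined_sim))
```

Spearman's correlation is used because only the ordering of pairs matters, so scores on different scales (cosine, Jaccard, 0-10 human ratings) can be compared directly.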

Distributional and Knowledge-Based Approaches for Computing Portuguese Word Similarity

Author: Hugo Gonçalo Oliveira
Publication venue: 'MDPI AG'
Publication date: 01/02/2018

Directory of Open Access Journals