Search CORE

4,997 research outputs found

Comparative Analysis of Word Embeddings for Capturing Word Similarities

Author: Kalajdjieski Jovan
Stojanovska Frosina
Toshevska Martina
Publication venue: 'Academy and Industry Research Collaboration Center (AIRCC)'
Publication date: 07/05/2020
Field of study

Distributed language representation has become the most widely used technique for language representation in various natural language processing tasks. Most of the natural language processing models that are based on deep learning techniques use already pre-trained distributed word representations, commonly called word embeddings. Determining the most qualitative word embeddings is of crucial importance for such models. However, selecting the appropriate word embeddings is a perplexing task since the projected embedding space is not intuitive to humans. In this paper, we explore different approaches for creating distributed word representations. We perform an intrinsic evaluation of several state-of-the-art word embedding methods. Their performance on capturing word similarities is analysed with existing benchmark datasets for word pairs similarities. The research in this paper conducts a correlation analysis between ground truth word similarities and similarities obtained by different word embedding methods.Comment: Part of the 6th International Conference on Natural Language Processing (NATP 2020

arXiv.org e-Print Archive

Crossref

The TALP–UPC Spanish–English WMT biomedical task: bilingual embeddings and char-based neural language model rescoring in a phrase-based system

Author: Escolano Peinado Carlos
España-i-Bonet Cristina
Madhyastha Pranava
Rodríguez Fonollosa José Adrián
Ruiz Costa-Jussà Marta
Publication venue
Publication date: 01/01/2016
Field of study

This paper describes the TALP–UPC system in the Spanish–English WMT 2016 biomedical shared task. Our system is a standard phrase-based system enhanced with vocabulary expansion using bilingual word embeddings and a characterbased neural language model with rescoring. The former focuses on resolving outof- vocabulary words, while the latter enhances the fluency of the system. The two modules progressively improve the final translation as measured by a combination of several lexical metrics.Postprint (published version

Crossref

UPCommons. Portal del coneixement obert de la UPC