
    Surfing the modeling of PoS taggers in low-resource scenarios

    The recent trend toward the application of deep structured techniques has revealed the limits of huge models in natural language processing. This has reawakened interest in traditional machine learning algorithms, which have proved to remain competitive in certain contexts, particularly in low-resource settings. In parallel, model selection has become an essential task for boosting performance at reasonable cost, all the more so in domains where training and/or computational resources are scarce. Against this backdrop, we evaluate the early estimation of learning curves as a practical mechanism for selecting the most appropriate model in scenarios characterized by the use of non-deep learners in resource-lean settings. On the basis of a formal approximation model previously evaluated under conditions of wide availability of training and validation resources, we study the reliability of such an approach in a different and much more demanding operational environment. Using as a case study the generation of PoS taggers for Galician, a language belonging to the Western Ibero-Romance group, the experimental results are consistent with our expectations.
    Funding: Ministerio de Ciencia e Innovación | Ref. PID2020-113230RB-C21; Ministerio de Ciencia e Innovación | Ref. PID2020-113230RB-C22; Xunta de Galicia | Ref. ED431C 2020/1
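    As a rough illustration of the early learning-curve estimation described above, the sketch below fits a learning curve to accuracies measured on small training slices and extrapolates it to a larger corpus size, so that candidate taggers can be ranked before full training. It is a minimal sketch assuming the common inverse power-law form acc(n) = a - b * n^(-c); the power-law form, the function names, and all numbers are illustrative assumptions, not the paper's actual approximation model.

```python
# Minimal sketch of learning-curve extrapolation for early model selection.
# Assumes the common inverse power-law form acc(n) ~ a - b * n**(-c); the
# paper's actual formal approximation model may differ.
import numpy as np
from scipy.optimize import curve_fit

def power_law(n, a, b, c):
    """Accuracy as a function of training-set size n."""
    return a - b * n ** (-c)

def extrapolate(sizes, accuracies, target_size):
    """Fit the curve on early measurements and predict accuracy at target_size."""
    params, _ = curve_fit(power_law, sizes, accuracies,
                          p0=(1.0, 1.0, 0.5), maxfev=10_000)
    return power_law(target_size, *params)

# Hypothetical early measurements for two candidate taggers trained on small
# slices of a corpus; the model with the better projection would be selected.
sizes = np.array([500, 1_000, 2_000, 4_000])
candidates = {
    "tagger_a": np.array([0.81, 0.84, 0.87, 0.89]),
    "tagger_b": np.array([0.78, 0.83, 0.88, 0.91]),
}
for name, accs in candidates.items():
    print(name, round(extrapolate(sizes, accs, 50_000), 3))
```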

    Bridging the “gApp”: improving neural machine translation systems for multiword expression detection

    This is the published version of an article published by De Gruyter in Yearbook of Phraseology on 25/11/2020, available online at https://doi.org/10.1515/phras-2020-0005
    The present research introduces the tool gApp, a Python-based text preprocessing system for the automatic identification and conversion of discontinuous multiword expressions (MWEs) into their continuous form, in order to enhance neural machine translation (NMT). To this end, an experiment with semi-fixed verb–noun idiomatic combinations (VNICs) is carried out to evaluate to what extent gApp can optimise the performance of the two main free open-source NMT systems, Google Translate and DeepL, under the challenge of MWE discontinuity in the Spanish-to-English direction. In light of our promising results, the study concludes with suggestions on how to further optimise MWE-aware NMT systems.
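    To make the preprocessing idea concrete, the toy sketch below detects one discontinuous Spanish VNIC, "tomar el pelo" (to pull someone's leg), and rewrites it in continuous form before the sentence would be sent to an NMT system. The regular expression and the make_continuous helper are invented for this illustration; they are not gApp's actual rules or API.

```python
# Toy sketch of converting a discontinuous verb-noun idiom into its
# continuous form, in the spirit of gApp's preprocessing. The single
# hand-written rule below is invented for illustration only.
import re

# Semi-fixed VNIC "tomar el pelo": allow inflected forms of the verb and a
# short span of intervening tokens before the noun component.
VNIC = re.compile(r"\b(tom\w+)\b((?:\s+\w+){1,3})\s+(el pelo)\b", re.IGNORECASE)

def make_continuous(sentence: str) -> str:
    """Move the noun component next to the verb: 'tomó ... el pelo' -> 'tomó el pelo ...'."""
    return VNIC.sub(lambda m: f"{m.group(1)} {m.group(3)}{m.group(2)}", sentence)

print(make_continuous("Juan le tomó a María el pelo."))
# -> "Juan le tomó el pelo a María."
```

    A real system would of course rely on an idiom lexicon and morphological analysis rather than a single hand-written pattern.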

    Comparing linear and neural models for competitive MWE identification

    In this paper, we compare the use of linear versus neural classifiers in a greedy transition system for MWE identification. Both our linear and neural models achieve a new state of the art on the PARSEME 1.1 shared task data sets, comprising 20 languages. Surprisingly, our best model is a simple feed-forward network with one hidden layer, although more sophisticated (recurrent) architectures were also tested. The lesson from this study is that tuning an SVM is rather straightforward, whereas tuning our neural system proved more challenging. Given the number of languages and the variety of linguistic phenomena to handle in the MWE identification task, we designed a careful tuning procedure, and we show that hyper-parameters are better selected by a majority vote within random-search configurations than by simply selecting the best configuration. Although performance is rather good (better than both the best shared-task system and the average of the best per-language results), further work is needed to improve generalization power, especially on unseen MWEs.
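    The majority-vote selection can be sketched as follows: instead of keeping the single best random-search configuration, each hyper-parameter value is chosen by a vote among the top-scoring configurations. This is a minimal sketch; the search space, the scoring stub, and the function names are hypothetical, and the paper's exact per-language procedure may differ.

```python
# Sketch of majority-vote hyper-parameter selection over random search,
# as opposed to keeping the single best configuration. The search space
# and scoring stub are invented for illustration.
import random
from collections import Counter

SPACE = {
    "hidden_units": [64, 128, 256],
    "learning_rate": [1e-3, 5e-4, 1e-4],
    "dropout": [0.0, 0.2, 0.5],
}

def sample_config(rng):
    """Draw one configuration uniformly from the search space."""
    return {k: rng.choice(v) for k, v in SPACE.items()}

def evaluate(config, rng):
    """Stand-in for training the MWE identifier and scoring it on dev data."""
    return rng.random()

def majority_vote_config(n_samples=50, top_k=10, seed=0):
    rng = random.Random(seed)
    configs = [sample_config(rng) for _ in range(n_samples)]
    top = sorted(configs, key=lambda c: evaluate(c, rng), reverse=True)[:top_k]
    # For each hyper-parameter, keep the value most frequent among the top-k
    # configurations rather than the value from the single best one.
    return {k: Counter(c[k] for c in top).most_common(1)[0][0] for k in SPACE}

print(majority_vote_config())
```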
