Optimising Multiple Metrics with MERT
The main metric used for SMT system evaluation and optimisation is the BLEU score, but its relevance to human evaluation has been questioned. Other metrics already exist, but none of them is in perfect harmony with human evaluation. On the other hand, most evaluations use multiple metrics (BLEU, TER, METEOR, etc.). Systems can be optimised toward metrics other than BLEU, but optimisation with other metrics tends to decrease the BLEU score. As machine translation evaluations still use BLEU as the main metric, it is important to minimise this decrease. We propose to optimise toward a metric combination such as BLEU-TER. This proposal includes two new open-source scorers for MERT, the SMT optimisation tool. The first is a TER scorer that allows optimisation toward TER; the second is a combination scorer, which enables the combination of two or more metrics in the optimisation process. This paper also presents experiments on MERT optimisation in the statistical machine translation system Moses with the TER and BLEU metrics and some metric combinations.
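As a rough illustration of what such a combination scorer computes, a single tuning objective can mix the two metrics, keeping in mind that BLEU is maximised while TER is minimised. The linear form and the equal weights below are assumptions for illustration, not the exact formulation of the paper's scorer:

```python
# Hypothetical sketch of a combined BLEU-TER tuning objective.
# The linear combination and the weights are illustrative assumptions,
# not the exact formulation used by the paper's MERT combination scorer.

def combined_score(bleu: float, ter: float,
                   w_bleu: float = 0.5, w_ter: float = 0.5) -> float:
    """Return a single score to maximise during tuning.

    BLEU is a quality score (higher is better); TER is an error rate
    (lower is better), so it enters with a negative sign.
    """
    return w_bleu * bleu - w_ter * ter

# Example: a hypothesis scoring BLEU = 0.31 and TER = 0.52
print(combined_score(0.31, 0.52))  # -> -0.105
```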
On the Usability of Transformers-based models for a French Question-Answering task
For many tasks, state-of-the-art results have been achieved with Transformer-based architectures, resulting in a paradigmatic shift in practices from the use of task-specific architectures to the fine-tuning of pre-trained language models. The ongoing trend consists in training models with an ever-increasing amount of data and parameters, which requires considerable resources. This has driven a strong push to improve resource efficiency through algorithmic and hardware improvements, but these are evaluated only for English. This raises questions about their usability when applied to small-scale learning problems, for which a limited amount of training data is available, especially for tasks in under-resourced languages. The lack of appropriately sized corpora is a hindrance to applying data-driven and transfer learning-based approaches, and leads to strong instability. In this paper, we survey the efforts dedicated to the usability of Transformer-based models and propose to evaluate these improvements on question-answering performance for French, a language with few resources for this task. We address the instability related to data scarcity by investigating various training strategies with data augmentation, hyperparameter optimization and cross-lingual transfer. We also introduce FrALBERT, a new compact model for French, which proves to be competitive in low-resource settings.
Comment: French compact model paper (FrALBERT), accepted to RANLP 2021
On the cross-lingual transferability of multilingual prototypical models across NLU tasks
Supervised deep learning-based approaches have been applied to task-oriented dialog and have proven to be effective for limited domain and language applications when a sufficient number of training examples are available. In practice, these approaches suffer from the drawbacks of domain-driven design and under-resourced languages. Domain and language models are supposed to grow and change as the problem space evolves. On the one hand, research on transfer learning has demonstrated the cross-lingual ability of multilingual Transformer-based models to learn semantically rich representations. On the other, in addition to these approaches, meta-learning has enabled the development of task and language learning algorithms capable of far generalization. In this context, this article investigates the cross-lingual transferability of combining few-shot learning with prototypical neural networks and multilingual Transformer-based models. Experiments on natural language understanding tasks on the MultiATIS++ corpus show that our approach substantially improves transfer learning performance between low- and high-resource languages. More generally, our approach confirms that the meaningful latent space learned in a given language can be generalized to unseen and under-resourced ones using meta-learning.
Comment: Accepted to the ACL workshop METANLP 2021
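The core of a prototypical network is simple enough to sketch: each class is represented by the mean of its support embeddings, and queries are assigned to the nearest prototype. Below is a minimal sketch assuming sentence embeddings from a multilingual encoder (random stand-ins here); the names and dimensions are illustrative, not the paper's configuration:

```python
import numpy as np

# Minimal sketch of prototypical-network classification. In the paper's
# setting the embeddings would come from a multilingual Transformer
# encoder; random vectors stand in for them here.

def prototypes(support_emb: np.ndarray, support_lab: np.ndarray) -> np.ndarray:
    """One prototype per class: the mean of that class's support embeddings."""
    classes = np.unique(support_lab)
    return np.stack([support_emb[support_lab == c].mean(axis=0) for c in classes])

def classify(query_emb: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Assign each query to the nearest prototype (squared Euclidean distance)."""
    dists = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
support = rng.normal(size=(10, 8))    # 10 support examples, embedding dim 8
labels = np.array([0] * 5 + [1] * 5)  # two intent classes
queries = rng.normal(size=(3, 8))
print(classify(queries, prototypes(support, labels)))
```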
mALBERT: Is a Compact Multilingual BERT Model Still Worth It?
Within the current trend of Pretrained Language Models (PLMs), more and more criticisms emerge about the ethical and ecological impact of such models. In this article, considering these critical remarks, we propose to focus on smaller models, such as compact models like ALBERT, which are more ecologically virtuous than these PLMs. However, PLMs enable huge breakthroughs in Natural Language Processing tasks, such as Spoken and Natural Language Understanding, classification, and Question-Answering tasks. PLMs also have the advantage of being multilingual, and, as far as we know, a multilingual version of compact ALBERT models does not exist. Considering these facts, we propose the free release of the first version of a multilingual compact ALBERT model, pre-trained using Wikipedia data, which complies with the ethical aspect of such a language model. We also evaluate the model against classical multilingual PLMs on classical NLP tasks. Finally, this paper proposes a rare study of the impact of subword tokenization on language performance.
Comment: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, May 2024, Torino, Italy
Word2Vec vs DBnary, or how to (re)concile distributed representations and lexical-semantic networks? The case of machine translation evaluation
This paper presents an approach combining lexical-semantic resources and distributed word representations, applied to evaluation in machine translation (MT). The study is carried out through the enrichment of a well-known MT evaluation metric: METEOR. METEOR enables an approximate match (synonymy or morphological similarity) between an automatic translation and a reference translation. Our experiments are conducted within the framework of the Metrics task of WMT 2014. We show that distributed representations are less efficient than lexical-semantic resources for MT evaluation, but they can nonetheless bring interesting additional information.
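An approximate match of the kind METEOR performs can be emulated with embeddings by accepting word pairs whose vectors are close enough. The threshold and the toy vectors below are assumptions for illustration, not the values used in the paper:

```python
import numpy as np

# Illustrative sketch of an embedding-based approximate word match,
# in the spirit of METEOR's synonymy/stemming stages. The threshold
# and the toy vectors are assumptions, not the paper's values.

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def approx_match(w1: str, w2: str, vectors: dict, threshold: float = 0.8) -> bool:
    """Accept a match if the two words' embeddings are close enough."""
    if w1 == w2:
        return True
    if w1 not in vectors or w2 not in vectors:
        return False
    return cosine(vectors[w1], vectors[w2]) >= threshold

vecs = {"cats": np.array([0.9, 0.1]), "cat": np.array([0.85, 0.15])}
print(approx_match("cats", "cat", vecs))  # True: morphological variants match
```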
MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP
We present MultiVec, a new toolkit for computing continuous representations of text at different granularity levels (words or sequences of words). MultiVec includes Mikolov et al. [2013b]'s word2vec features, Le and Mikolov [2014]'s paragraph vector (batch and online) and Luong et al. [2015]'s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and aims to be fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and cross-lingual document classification.
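The abstract does not specify MultiVec's C++ API, so as a hedged illustration only, a sequence-level distance of the kind mentioned above can be sketched by averaging word vectors and comparing the results:

```python
import numpy as np

# Generic sketch of a distance between sequences of words: average the
# word vectors of each sequence, then take one minus cosine similarity.
# This illustrates the idea only; it is not MultiVec's actual API.

def seq_vector(words, vectors):
    """Bag-of-words representation: mean of the known word vectors."""
    known = [vectors[w] for w in words if w in vectors]
    return np.mean(known, axis=0) if known else None

def seq_distance(s1, s2, vectors):
    u, v = seq_vector(s1, vectors), seq_vector(s2, vectors)
    if u is None or v is None:
        return 1.0  # no vocabulary overlap: maximal distance
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```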
Benchmarking Transformers-based models on French Spoken Language Understanding tasks
In the last five years, the rise of self-attentional Transformer-based architectures has led to state-of-the-art performance on many natural language tasks. Although these approaches are increasingly popular, they require large amounts of data and computational resources. There is still a substantial need for benchmarking methodologies for under-resourced languages in data-scarce application conditions. Most pre-trained language models were massively studied using the English language, and only a few of them were evaluated on French. In this paper, we propose a unified benchmark, focused on evaluating model quality and ecological impact on two well-known French spoken language understanding tasks. Specifically, we benchmark thirteen well-established Transformer-based models on the two available spoken language understanding tasks for French: MEDIA and ATIS-FR. Within this framework, we show that compact models can reach results comparable to bigger ones while their ecological impact is considerably lower. However, this conclusion is nuanced and depends on the compression method considered.
Comment: Accepted paper at INTERSPEECH 2022
A Benchmark Evaluation of Clinical Named Entity Recognition in French
Background: Transformer-based language models have shown strong performance on many Natural Language Processing (NLP) tasks. Masked Language Models (MLMs) attract sustained interest because they can be adapted to different languages and sub-domains through training or fine-tuning on specific corpora while remaining lighter than modern Large Language Models (LLMs). Recently, several MLMs have been released for the biomedical domain in French, and experiments suggest that they outperform standard French counterparts. However, no systematic evaluation comparing all models on the same corpora is available. Objective: This paper presents an evaluation of masked language models for biomedical French on the task of clinical named entity recognition. Material and methods: We evaluate the biomedical models CamemBERT-bio and DrBERT and compare them to the standard French models CamemBERT, FlauBERT and FrALBERT, as well as the multilingual mBERT, using three publicly available corpora for clinical named entity recognition in French. The evaluation set-up relies on gold-standard corpora as released by the corpus developers. Results: Results suggest that CamemBERT-bio consistently outperforms DrBERT, while FlauBERT offers competitive performance and FrALBERT achieves the lowest carbon footprint. Conclusion: This is the first benchmark evaluation of biomedical masked language models for French clinical entity recognition that compares model performance consistently on nested entity recognition, using metrics covering both performance and environmental impact.
Better Evaluation of ASR in Speech Translation Context Using Word Embeddings
This paper investigates the evaluation of ASR in a spoken language translation context. More precisely, we propose a simple extension of the WER metric that penalizes substitution errors differently according to their context, using word embeddings. For instance, the proposed metric should catch near matches (mainly morphological variants) and penalize this kind of error less, as it has a more limited impact on translation performance. Our experiments show that the correlation of the proposed metric with SLT performance is better than that of WER. Oracle experiments are also conducted and show the ability of our metric to find better hypotheses (to be translated) in the ASR N-best lists. Finally, a preliminary experiment in which ASR tuning is based on our new metric shows encouraging results. For reproducible experiments, the code for computing our modified WER and the corpora used are made available to the research community.
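One way to realize such a metric is a Levenshtein alignment whose substitution cost is graded by embedding similarity instead of being a flat 1. The sketch below is a hedged illustration of that idea, with an assumed cost function; it is not the released implementation:

```python
import numpy as np

# Hedged sketch of a WER variant in which a substitution between two
# embeddable words costs 1 - cosine similarity, so near matches (e.g.
# morphological variants) are penalized less. The cost function is an
# illustrative assumption, not the paper's released code.

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def soft_wer(ref, hyp, vectors):
    """Levenshtein alignment with a graded substitution cost in [0, 1]."""
    n, m = len(ref), len(hyp)
    d = np.zeros((n + 1, m + 1))
    d[:, 0] = np.arange(n + 1)
    d[0, :] = np.arange(m + 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if ref[i - 1] == hyp[j - 1]:
                sub = 0.0
            elif ref[i - 1] in vectors and hyp[j - 1] in vectors:
                sub = 1.0 - max(0.0, cosine(vectors[ref[i - 1]],
                                            vectors[hyp[j - 1]]))
            else:
                sub = 1.0  # unknown words: standard substitution cost
            d[i, j] = min(d[i - 1, j] + 1,        # deletion
                          d[i, j - 1] + 1,        # insertion
                          d[i - 1, j - 1] + sub)  # (soft) substitution
    return d[n, m] / max(n, 1)
```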