18 research outputs found

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

Author: Besacier Laurent
Bérard Alexandre
Pietquin Olivier
Servan Christophe
Publication venue: HAL CCSD
Publication date: 23/05/2016

We present MultiVec, a new toolkit for computing continuous representations for text at different granularity levels (word-level or sequences of words). MultiVec includes Mikolov et al. [2013b]'s word2vec features, Le and Mikolov [2014]'s paragraph vector (batch and online) and Luong et al. [2015]'s model for bilingual distributed representations. MultiVec also includes different distance measures between words and sequences of words. The toolkit is written in C++ and is aimed at being fast (in the same order of magnitude as word2vec), easy to use, and easy to extend. It has been evaluated on several NLP tasks: the analogical reasoning task, sentiment analysis, and crosslingual document classification.

Hal - Université Grenoble Alpes

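Word2Vec vs DBnary ou comment (ré)concilier représentations distribuées et réseaux lexico-sémantiques ? Le cas de l’évaluation en traduction automatique [Word2Vec vs DBnary, or how to (re)concile distributed representations and lexico-semantic networks? The case of machine translation evaluation]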

Author: Besacier Laurent
Blanchon Hervé
Elloumi Zied
Servan Christophe
Publication venue: HAL CCSD
Publication date: 01/07/2016

This paper presents an approach that combines lexical-semantic resources and distributed representations of words, applied to machine translation (MT) evaluation. The study enriches a well-known MT evaluation metric: METEOR. METEOR enables an approximate match (synonymy or morphological similarity) between an automatic and a reference translation. Our experiments are conducted in the framework of the Metrics task of WMT 2014. We show that distributed representations are less efficient than lexical-semantic resources for MT evaluation but can nonetheless bring interesting additional information.

Hal - Université Grenoble Alpes

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

Author: Besacier Laurent
Bérard Alexandre
Pietquin Olivier
Servan Christophe
Publication venue: HAL CCSD
Publication date: 23/05/2016

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Better Evaluation of ASR in Speech Translation Context Using Word Embeddings

Author: Besacier Laurent
Le Ngoc-Tien
Lecouteux Benjamin
Servan Christophe
Publication venue: HAL CCSD
Publication date: 01/09/2016

This paper investigates the evaluation of ASR in a spoken language translation (SLT) context. More precisely, we propose a simple extension of the WER metric that uses word embeddings to penalize substitution errors differently according to their context. For instance, the proposed metric should catch near matches (mainly morphological variants) and penalize this kind of error less, since it has a more limited impact on translation performance. Our experiments show that the new metric correlates better with SLT performance than WER does. Oracle experiments are also conducted and show the ability of our metric to find better hypotheses (to be translated) in the ASR N-best list. Finally, a preliminary experiment in which ASR tuning is based on our new metric shows encouraging results. For reproducible experiments, the code for calling our modified WER and the corpora used are made available to the research community.

Hal - Université Grenoble Alpes

Distribution maps of twenty-four Mediterranean and European ecologically and economically important forest tree species compiled from historical data collections

Author: Besacier Christophe
Fady Bruno
Garavaglia Valentina
Picard Nicolas
Wazen Nadine
Publication venue: 'Environmental Health Perspectives'
Publication date: 01/01/2020

Species distribution maps are often lacking for scientific investigation and strategic management planning at the international level. Here, we present the range-wide, natural distribution maps of twenty-four Mediterranean and European forest-tree species of key ecological and economic importance in the Mediterranean basin. Data on the geographic distribution of the twenty-four tree species were compiled from over one hundred published sources, making this contribution one of the most extensive resources available from historical data. The dataset can be accessed at: https://doi.org/10.5281/zenodo.822953. Associated metadata can be accessed at: http://www.fao.org/geonetwork/srv/en/metadata.show?id=56996. These data provide key spatial information to further investigate species occurrence-environment relationships, provide a baseline to assess the future impact of climate change, and identify marginal populations with specific genetic resources, among other possible applications.

HAL Descartes

CREA Journals (Consiglio per la ricerca in agricoltura e l’analisi dell’economia agraria)

Word2Vec vs DBnary: Augmenting METEOR using Vector Representations or Lexical Resources?

Author: Besacier Laurent
Blanchon Hervé
Bérard Alexandre
Elloumi Zied
Servan Christophe
Publication venue: HAL CCSD
Publication date: 05/10/2016

This paper presents an approach that combines lexico-semantic resources and distributed representations of words, applied to machine translation (MT) evaluation. The study enriches a well-known MT evaluation metric: METEOR. This metric enables an approximate match (synonymy or morphological similarity) between an automatic and a reference translation. Our experiments are conducted in the framework of the Metrics task of WMT 2014. We show that distributed representations are a good alternative to lexico-semantic resources for MT evaluation and that they can even bring interesting additional information. The augmented versions of METEOR, using vector representations, are made available on our GitHub page.

arXiv.org e-Print Archive

HAL - Lille 3

Hal - Université Grenoble Alpes

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

An Open Source Toolkit for Word-level Confidence Estimation in Machine Translation

Author: Besacier Laurent
Le Ngoc-Tien
Lecouteux Benjamin
Luong Ngoc Quang
Servan Christophe
Publication venue: HAL CCSD
Publication date: 03/12/2015

Recently, a growing need for Confidence Estimation (CE) for Statistical Machine Translation (SMT) systems in Computer Aided Translation (CAT) has been observed. However, most CE toolkits are optimized for a single target language (mainly English) and, as far as we know, none of them is dedicated to this specific task and freely available. This paper presents an open-source toolkit for predicting the quality of words in SMT output, whose novel contributions are (i) support for various target languages and (ii) handling of a number of features of different types (system-based, lexical, syntactic and semantic). In addition, the toolkit integrates a wide variety of Natural Language Processing or Machine Learning tools to pre-process data, extract features and estimate confidence at the word level. Features for Word-level Confidence Estimation (WCE) can easily be added or removed using a configuration file. We validate the toolkit by experimenting in the WCE evaluation framework of the WMT shared task with two language pairs: French-English and English-Spanish. The toolkit is made available to the research community with ready-made scripts to launch full experiments on these language pairs, while achieving state-of-the-art and reproducible performance.

Hal - Université Grenoble Alpes

MultiVec: a Multilingual and Multilevel Representation Learning Toolkit for NLP

Author: Besacier Laurent
Bérard Alexandre
Pietquin Olivier
Servan Christophe
Publication venue: HAL CCSD
Publication date: 23/05/2016

INRIA a CCSD electronic archive server

Catalysing restoration efforts in the Mediterranean region through the United Nations Decade on Ecosystem Restoration 2021–2030

Author: Besacier Christophe
Corezzola Serena
de Dato Giovanbattista
Gallo Granizo Carolina
Garavaglia Valentina
Martinoli Alessio
Miozzo Marcello
Mohanna Chadi
Romero Montoya Andrea
Sater Melnik Cristiane
Turhan Ümit
Publication date: 26/07/2024

Coventry University Pure Portal