454 research outputs found
Traduction automatisée d'une oeuvre littéraire: une étude pilote
International audienceCurrent machine translation (MT ) techniques are continuously improving. In specific areas, post-editing (PE) allows to obtain high-quality translations relatively quickly. But is such a pipeline (MT+PE) usable to translate a lite- rary work (fiction, short story) ? This paper tries to bring a preliminary answer to this question. A short story by American writer Richard Powers, still not available in French, is automatically translated and post-edited and then revised by non- professional translators. The LIG post-editing platform allows to read and edit the short story suggesting (for the future) a community of readers-editors that continuously improve the translations of their favorite author. In addition to presen- ting experimental evaluation results of the pipeline MT+PE (MT system used, auomatic evaluation), we also discuss the quality of the translation output from the perspective of a panel of readers (who read the translated short story in French, and answered to a survey afterwards). Finally, some remarks of the official french translator of R. Powers, requested on this occasion, are given at the end of this article.Les techniques actuelles de traduction automatique (TA) permettent de produire des traductions dont la qualité ne cesse de croitre. Dans des domaines spécifiques, la post-édition (PE) de traductions automatiques permet, par ailleurs, d'obtenir des traductions de qualité relativement rapidement. Mais un tel pipeline (TA+PE) est il envisageable pour traduire une oeuvre littéraire ? Cet article propose une ébauche de réponse à cette question. Un essai de l'auteur américain Richard Powers, encore non disponible en français, est traduit automatiquement puis post-édité et révisé par des traducteurs non-professionnels. La plateforme de post-édition du LIG utilisée permet de lire et éditer l'oeuvre traduite en français continuellement, suggérant (pour le futur) une communauté de lecteurs-réviseurs qui améliorent en continu les traductions de leur auteur favori. En plus de la présentation des résultats d'évaluation expérimentale du pipeline TA+PE (système de TA utilisé, scores automatiques), nous discutons également la qualité de la traduction produite du point de vue d'un panel de lecteurs (ayant lu la traduction en français, puis répondu à une enquête). Enfin, quelques remarques du traducteur français de R. Powers, sollicité à cette occasion, sont présentées à la fin de cet article
Machine Assisted Analysis of Vowel Length Contrasts in Wolof
Growing digital archives and improving algorithms for automatic analysis of
text and speech create new research opportunities for fundamental research in
phonetics. Such empirical approaches allow statistical evaluation of a much
larger set of hypothesis about phonetic variation and its conditioning factors
(among them geographical / dialectal variants). This paper illustrates this
vision and proposes to challenge automatic methods for the analysis of a not
easily observable phenomenon: vowel length contrast. We focus on Wolof, an
under-resourced language from Sub-Saharan Africa. In particular, we propose
multiple features to make a fine evaluation of the degree of length contrast
under different factors such as: read vs semi spontaneous speech ; standard vs
dialectal Wolof. Our measures made fully automatically on more than 20k vowel
tokens show that our proposed features can highlight different degrees of
contrast for each vowel considered. We notably show that contrast is weaker in
semi-spontaneous speech and in a non standard semi-spontaneous dialect.Comment: Accepted to Interspeech 201
LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task
This paper presents the LIG-CRIStAL submission to the shared Automatic Post-
Editing task of WMT 2017. We propose two neural post-editing models: a
monosource model with a task-specific attention mechanism, which performs
particularly well in a low-resource scenario; and a chained architecture which
makes use of the source sentence to provide extra context. This latter
architecture manages to slightly improve our results when more training data is
available. We present and discuss our results on two datasets (en-de and de-en)
that are made available for the task.Comment: keywords: neural post-edition, attention model
Augmenting Librispeech with French Translations: A Multimodal Corpus for Direct Speech Translation Evaluation
Recent works in spoken language translation (SLT) have attempted to build
end-to-end speech-to-text translation without using source language
transcription during learning or decoding. However, while large quantities of
parallel texts (such as Europarl, OpenSubtitles) are available for training
machine translation systems, there are no large (100h) and open source parallel
corpora that include speech in a source language aligned to text in a target
language. This paper tries to fill this gap by augmenting an existing
(monolingual) corpus: LibriSpeech. This corpus, used for automatic speech
recognition, is derived from read audiobooks from the LibriVox project, and has
been carefully segmented and aligned. After gathering French e-books
corresponding to the English audio-books from LibriSpeech, we align speech
segments at the sentence level with their respective translations and obtain
236h of usable parallel data. This paper presents the details of the processing
as well as a manual evaluation conducted on a small subset of the corpus. This
evaluation shows that the automatic alignments scores are reasonably correlated
with the human judgments of the bilingual alignment quality. We believe that
this corpus (which is made available online) is useful for replicable
experiments in direct speech translation or more general spoken language
translation experiments.Comment: LREC 2018, Japa
Deep Investigation of Cross-Language Plagiarism Detection Methods
This paper is a deep investigation of cross-language plagiarism detection
methods on a new recently introduced open dataset, which contains parallel and
comparable collections of documents with multiple characteristics (different
genres, languages and sizes of texts). We investigate cross-language plagiarism
detection methods for 6 language pairs on 2 granularities of text units in
order to draw robust conclusions on the best methods while deeply analyzing
correlations across document styles and languages.Comment: Accepted to BUCC (10th Workshop on Building and Using Comparable
Corpora) colocated with ACL 201
CompiLIG at SemEval-2017 Task 1: Cross-Language Plagiarism Detection Methods for Semantic Textual Similarity
We present our submitted systems for Semantic Textual Similarity (STS) Track
4 at SemEval-2017. Given a pair of Spanish-English sentences, each system must
estimate their semantic similarity by a score between 0 and 5. In our
submission, we use syntax-based, dictionary-based, context-based, and MT-based
methods. We also combine these methods in unsupervised and supervised way. Our
best run ranked 1st on track 4a with a correlation of 83.02% with human
annotations
Data Selection for Compact Adapted SMT Models
International audienceData selection is a common technique for adapting statistical translation models for a specific domain, which has been shown to both improve translation quality and to reduce model size. Selection relies on some in-domain data, of the same domain of the texts expected to be translated. Selecting the sentence-pairs that are most similar to the in-domain data from a pool of parallel texts has been shown to be effective; yet, this approach holds the risk of resulting in a limited coverage, when necessary n-grams that do appear in the pool are less similar to in-domain data that is available in advance. Some methods select additional data based on the actual text that needs to be translated. While useful, this is not always a practical scenario. In this work we describe an extensive exploration of data selection techniques over Arabic to French datasets, and propose methods to address both similarity and coverage considerations while maintaining a limited model size
- …