Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop
We summarize the accomplishments of a multi-disciplinary workshop exploring
the computational and scientific issues surrounding the discovery of linguistic
units (subwords and words) in a language without orthography. We study the
replacement of orthographic transcriptions by images and/or translated text in
a well-resourced language to help unsupervised discovery from raw speech.
Comment: Accepted to ICASSP 201
ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task
This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of the IWSLT 2019 Evaluation for the English→Portuguese language pair. The ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model, built as a neural encoder-decoder architecture with an attention mechanism, was used for the two primary submissions corresponding to the two EN-PT evaluation sets: (1) TED (MuST-C) and (2) How2. In this paper, we investigate the impact of pooling heterogeneous corpora for training, the impact of target tokenization (characters or BPEs), and the impact of speech input segmentation, and we also compare our best end-to-end model (BLEU of 26.91 on the MuST-C and 43.82 on the How2 validation sets) to a pipeline (ASR+MT) approach.
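The attention mechanism named above can be sketched in a few lines. This is a generic dot-product attention toy in plain Python, not the consortium's actual system; the function name, the two-dimensional states, and the three-frame encoder sequence are all illustrative assumptions:

```python
import math

def attention_context(decoder_state, encoder_states):
    """Dot-product attention: score each encoder frame against the
    current decoder state, softmax the scores over time, and return
    the weighted average (the context vector) plus the weights."""
    scores = [sum(d * e for d, e in zip(decoder_state, frame))
              for frame in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]     # shift for stability
    total = sum(exps)
    weights = [x / total for x in exps]          # softmax over time steps
    dim = len(decoder_state)
    context = [sum(w * frame[i] for w, frame in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

# toy run: three encoder frames of dimension 2, one decoder state
enc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
dec = [1.0, 0.0]
ctx, w = attention_context(dec, enc)
```

At each decoding step the weights re-distribute over the encoder frames, which is what lets a single encoder-decoder model align speech input with translated output.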
A Data Efficient End-To-End Spoken Language Understanding Architecture
End-to-end architectures have recently been proposed for spoken language
understanding (SLU) and semantic parsing. Given a large amount of data,
these models jointly learn acoustic and linguistic-sequential features. Such
architectures give very good results for domain, intent, and slot
detection, but their application to the more complex task of semantic chunking
and tagging is harder. For that reason, in many cases, models are combined
with an external language model to enhance their performance.
In this paper we introduce a data-efficient system that is trained
end-to-end, with no additional pre-trained external module. One key feature of
our approach is an incremental training procedure in which acoustic, language,
and semantic models are trained sequentially, one after the other. The proposed
model has a reasonable size and achieves results competitive with the
state of the art while using a small training dataset. In particular, we reach
a 24.02% Concept Error Rate (CER) on MEDIA/test while training on MEDIA/train
without any additional data.
Comment: Accepted to ICASSP 202
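The Concept Error Rate cited above is computed like a word error rate, but over sequences of semantic concepts rather than words: the edit distance between reference and hypothesis concept sequences, divided by the reference length. A minimal sketch (the concept labels are invented for illustration):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance over token sequences; substitutions,
    insertions, and deletions all cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def concept_error_rate(ref_concepts, hyp_concepts):
    return edit_distance(ref_concepts, hyp_concepts) / len(ref_concepts)

# toy example: 2 errors over 4 reference concepts -> CER = 0.5
ref = ["command", "date", "city", "hotel"]
hyp = ["command", "city", "price", "hotel"]
cer = concept_error_rate(ref, hyp)
```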
Fluent Translations from Disfluent Speech in End-to-End Speech Translation
Spoken language translation applications suffer from
conversational speech phenomena, particularly the presence of disfluencies.
With the rise of end-to-end speech translation models, processing steps such as
disfluency removal that were previously an intermediate step between speech
recognition and machine translation need to be incorporated into the model
architecture. We use a sequence-to-sequence model to translate from noisy,
disfluent speech to fluent text with disfluencies removed, using the recently
collected `copy-edited' references for the Fisher Spanish-English dataset. We
show that fluent translations can be generated directly, and we discuss
how to evaluate success on this task. This work provides a baseline for a
new task: the translation of conversational speech with joint removal of
disfluencies.
Comment: Accepted at NAACL 201
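As a toy illustration of the target-side difference between a disfluent transcript and a "copy-edited" fluent reference: the sketch below strips fillers and immediate repetitions with rules. The paper's model instead learns to drop disfluencies jointly while translating, and the filler inventory here is an invented example, not the dataset's annotation scheme:

```python
FILLERS = {"uh", "um", "eh", "mm"}  # hypothetical filler inventory

def remove_disfluencies(tokens):
    """Drop filler words and collapse immediate word repetitions,
    a rough rule-based approximation of copy-editing a transcript."""
    out = []
    for tok in tokens:
        if tok in FILLERS:
            continue
        if out and out[-1] == tok:   # repetition, e.g. "i i want"
            continue
        out.append(tok)
    return out

disfluent = "um i i want uh want a ticket".split()
fluent = remove_disfluencies(disfluent)
```

Evaluating against such fluent references is what makes the task tricky: a translation can lose points either for mistranslating or for copying a disfluency the reference removed.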