24 research outputs found
STOP: A dataset for Spoken Task Oriented Semantic Parsing
End-to-end spoken language understanding (SLU) predicts intent directly from
audio using a single model. It promises to improve the performance of assistant
systems by leveraging acoustic information lost in the intermediate textual
representation and preventing cascading errors from Automatic Speech
Recognition (ASR). Further, having one unified model has efficiency advantages
when deploying assistant systems on-device. However, the limited number of
public audio datasets with semantic parse labels hinders the research progress
in this area. In this paper, we release the Spoken Task-Oriented semantic
Parsing (STOP) dataset, the largest and most complex SLU dataset to be publicly
available. Additionally, we define low-resource splits to establish a benchmark
for improving SLU when limited labeled data is available. Furthermore, in
addition to the human-recorded audio, we are releasing a TTS-generated version
to benchmark the performance for low-resource domain adaptation of end-to-end
SLU systems. Initial experimentation show end-to-end SLU models performing
slightly worse than their cascaded counterparts, which we hope encourages
future work in this direction
Scaling Speech Technology to 1,000+ Languages
Expanding the language coverage of speech technology has the potential to
improve access to information for many more people. However, current speech
technology is restricted to about one hundred languages which is a small
fraction of the over 7,000 languages spoken around the world. The Massively
Multilingual Speech (MMS) project increases the number of supported languages
by 10-40x, depending on the task. The main ingredients are a new dataset based
on readings of publicly available religious texts and effectively leveraging
self-supervised learning. We built pre-trained wav2vec 2.0 models covering
1,406 languages, a single multilingual automatic speech recognition model for
1,107 languages, speech synthesis models for the same number of languages, as
well as a language identification model for 4,017 languages. Experiments show
that our multilingual speech recognition model more than halves the word error
rate of Whisper on 54 languages of the FLEURS benchmark while being trained on
a small fraction of the labeled data
CoNLL 2017 Shared Task : Multilingual Parsing from Raw Text to Universal Dependencies
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating systems.Peer reviewe
Relatório de estágio em farmácia comunitária
Relatório de estágio realizado no âmbito do Mestrado Integrado em Ciências Farmacêuticas, apresentado à Faculdade de Farmácia da Universidade de Coimbr
Les unités plus grossières bénéficient-elles d'un préapprentissage de la parole basé sur la prédiction de cluster?
International audienceThe research community has produced many successful selfsupervised speech representation learning methods over the past few years. Discrete units have been utilized in various self-supervised learning frameworks, such as VQ-VAE [1], wav2vec 2.0 [2], HuBERT [3], and Wav2Seq [4]. This paper studies the impact of altering the granularity and improving the quality of these discrete acoustic units for pre-training encoder-only and encoder-decoder models. We systematically study the current proposals of using Byte-Pair Encoding (BPE) and new extensions that use cluster smoothing and Brown clustering. The quality of learned units is studied intrinsically using zero speech metrics and on the downstream speech recognition (ASR) task. Our results suggest that longer-range units are helpful for encoder-decoder pre-training; however, encoder-only masked-prediction models cannot yet benefit from self-supervised word-like targets
Generative Spoken Dialogue Language Modeling
We introduce dGSLM, the first "textless" model able to generate audio samples of naturalistic spoken dialogues. It uses recent work on unsupervised spoken unit discovery coupled with a dual-tower transformer architecture with cross-attention trained on 2000 hours of two-channel raw conversational audio (Fisher dataset) without any text or labels. It is able to generate speech, laughter and other paralinguistic signals in the two channels simultaneously and reproduces naturalistic turn taking. Generation samples can be found at: https://speechbot.github.io/dgslm
Textless-lib: a Library for Textless Spoken Language Processing
International audienceTextless spoken language processing research aims to extend the applicability of standard NLP toolset onto spoken language and languages with few or no textual resources. In this paper, we introduce textless-lib, a PyTorch-based library aimed to facilitate research in this research area. We describe the building blocks that the library provides and demonstrate its usability by discuss three different use-case examples: (i) speaker probing, (ii) speech resynthesis and compression, and (iii) speech continuation. We believe that textless-lib substantially simplifies research the textless setting and will be handful not only for speech researchers but also for the NLP community at large. The code, documentation, and pre-trained models are available at https://github.com/ facebookresearch/textlesslib/
STOP: A dataset for spoken task oriented semantic parsing
International audienceEnd-to-end spoken language understanding (SLU) predicts intent directly from audio using a single model. It promises to improve the performance of assistant systems by leveraging acoustic information lost in the intermediate textual representation and preventing cascading errors from Automatic Speech Recognition (ASR). Further, having one unified model has efficiency advantages when deploying assistant systems on-device. However, the limited number of public audio datasets with semantic parse labels hinders the research progress in this area. In this paper, we release the Spoken Task-Oriented semantic Parsing (STOP) dataset 1 , the largest and most complex SLU dataset publicly available. Additionally, we define low-resource splits to establish a benchmark for improving SLU when limited labeled data is available. Furthermore, in addition to the human-recorded audio, we are releasing a TTS-generated versions to benchmark the performance for low-resource and domain adaptation of end-to-end SLU systems
CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies
The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets. In 2017, one of two tasks was devoted to learning dependency parsers for a large number of languages, in a real-world setting without any gold-standard annotation on input. All test sets followed a unified annotation scheme, namely that of Universal Dependencies. In this paper, we define the task and evaluation methodology, describe data preparation, report and analyze the main results, and provide a brief categorization of the different approaches of the participating syste
Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
Universal Dependencies is a project that seeks to develop cross-linguistically consistent treebank annotation for many languages, with the goal of facilitating multilingual parser development, cross-lingual learning, and parsing research from a language typology perspective. The annotation scheme is based on (universal) Stanford dependencies (de Marneffe et al., 2006, 2008, 2014), Google universal part-of-speech tags (Petrov et al., 2012), and the Interset interlingua for morphosyntactic tagsets (Zeman, 2008).
This release contains the test data used in the CoNLL 2017 shared task on parsing Universal Dependencies. Due to the shared task the test data was held hidden and not released together with the training and development data of UD 2.0. Therefore this release complements the UD 2.0 release (http://hdl.handle.net/11234/1-1983) to a full release of UD treebanks. In addition, the present release contains 18 new parallel test sets and 4 test sets in surprise languages. The present release also includes the development data already released with UD 2.0. Unlike regular UD releases, this one uses the folder-file structure that was visible to the systems participating in the shared task