58 research outputs found

Duration mismatch compensation using four-covariance model and deep neural network for speaker verification

Author: Bousquet Pierre-Michel
Rouvier Mickael
Publication venue: 'International Speech Communication Association'
Publication date: 01/01/2017

Duration mismatch between enrollment and test utterances remains a major concern for the reliability of real-life speaker recognition applications. Two approaches are proposed here to deal with this case when using the i-vector representation. The first is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to the case of any shift between i-vectors drawn from two distinct distributions. The second attempts to map i-vectors of truncated segments of an utterance to the i-vector of the full segment, using deep neural networks (DNN). Our results show that both new approaches outperform the standard PLDA by about 10% relative, noting that these back-end methods could complement those that quantify the i-vector uncertainty during its extraction process, in the case of a duration gap.

A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks

Author: Dufour Richard
Labrak Yanis
Rouvier Mickael
Publication date: 22/07/2023

We evaluate four state-of-the-art instruction-tuned large language models (LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13 real-world clinical and biomedical natural language processing (NLP) tasks in English, such as named-entity recognition (NER), question-answering (QA), relation extraction (RE), etc. Our overall results demonstrate that the evaluated LLMs begin to approach the performance of state-of-the-art models in zero- and few-shot scenarios for most tasks, and do particularly well on the QA task, even though they have never seen examples from these tasks before. However, we observed that performance on the classification and RE tasks falls below what can be achieved with a model specifically trained for the medical field, such as PubMedBERT. Finally, we noted that no LLM outperforms all the others on all the studied tasks, with some models being better suited for certain tasks than others.

arXiv.org e-Print Archive

LIA@CLEF 2018: Mining events opinion argumentation from raw unlabeled Twitter data using convolutional neural network

Author: Delorme Alexandre
Dufour Richard
Malinas Damien
Rouvier Mickael
Publication venue: HAL CCSD
Publication date: 10/09/2018

Social networks on the Internet are becoming increasingly important in our society. In recent years, this type of media, through communication platforms such as Twitter, has raised new research issues due to the massive volume of data exchanged and the ever-increasing number of users. In this context, the CLEF 2018 Mining opinion argumentation task aims to retrieve, for a specific event (festival name or topic), the most diverse argumentative microblogs from a large collection of tweets about festivals in different languages. In this paper, we propose a four-step approach for extracting argumentative microblogs related to a specific query (or event) when no reference data is provided.

FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain

Author: Bazoge Adrien
Daille Béatrice
Dufour Richard
Gourraud Pierre-Antoine
Labrak Yanis
Morin Emmanuel
Rouvier Mickael
Publication date: 09/04/2023

This paper introduces FrenchMedMCQA, the first publicly available Multiple-Choice Question Answering (MCQA) dataset in French for the medical domain. It is composed of 3,105 questions taken from real exams of the French medical specialization diploma in pharmacy, mixing single and multiple answers. Each instance of the dataset contains an identifier, a question, five possible answers and their manual correction(s). We also propose first baseline models to automatically process this MCQA task in order to report on current performance and to highlight the difficulty of the task. A detailed analysis of the results showed that it is necessary to have representations adapted to the medical domain or to the MCQA task: in our case, English specialized models yielded better results than generic French ones, even though FrenchMedMCQA is in French. Corpus, models and tools are available online.

arXiv.org e-Print Archive

ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task

Author: Besacier Laurent
Bougares Fethi
Caubrière Antoine
Estève Yannick
Nguyen Manh Ha
Rouvier Mickael
Tomashenko Natalia
Zanon Boito Marcely
Publication venue: HAL CCSD
Publication date: 02/11/2019

This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of IWSLT Evaluation 2019 for the English→Portuguese language pair. ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model built as a neural encoder-decoder architecture with attention mechanism was used for two primary submissions corresponding to the two EN-PT evaluation sets: (1) TED (MuST-C) and (2) How2. In this paper, we notably investigate the impact of pooling heterogeneous corpora for training, the impact of target tokenization (characters or BPEs), and the impact of speech input segmentation, and we also compare our best end-to-end model (BLEU of 26.91 on MuST-C and 43.82 on How2 validation sets) to a pipeline (ASR+MT) approach.

LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech

Author: Alisamir Sina
Allauzen Alexandre
Besacier Laurent
Boito Marcely Zanon
Coavoux Maximin
Dinarelli Marco
Esteve Yannick
Evain Solene
Goulian Jerome
Le Hang
Lecouteux Benjamin
Mdhaffar Salima
Nguyen Ha
Parcollet Titouan
Portet Francois
Pupier Adrien
Ringeval Fabien
Rossato Solange
Rouvier Mickael
Schwab Didier
Tomashenko Natalia
Zhang Shucong
Publication date: 11/09/2023

Self-supervised learning (SSL) is at the origin of unprecedented improvements in many different domains including computer vision and natural language processing. Speech processing drastically benefitted from SSL as most of the current domain-related tasks are now being approached with pre-trained models. This work introduces LeBenchmark 2.0, an open-source framework for assessing and building SSL-equipped French speech technologies. It includes documented, large-scale and heterogeneous corpora with up to 14,000 hours of heterogeneous speech, ten pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion learnable parameters shared with the community, and an evaluation protocol made of six downstream tasks to complement existing benchmarks. LeBenchmark 2.0 also presents unique perspectives on pre-trained SSL models for speech with the investigation of frozen versus fine-tuned downstream models, task-agnostic versus task-specific pre-trained models, as well as a discussion on the carbon footprint of large-scale model training.

arXiv.org e-Print Archive

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE). The latest edition of such a joint submission was in SRE 2018, in which the I4U submission was among the best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U consortium's participation in the NIST SRE series of evaluations. The primary objective of the current paper is to summarize the results and lessons learned based on the twelve sub-systems and their fusion submitted to SRE'18. It is also our intention to present a shared view on the advancements, progress, and major paradigm shifts that we have witnessed as an SRE participant in the past decade from SRE'08 to SRE'18. In this regard, we have seen, among others, a paradigm shift from supervector representation to deep speaker embedding, and a switch of research challenge from channel compensation to domain adaptation.

arXiv.org e-Print Archive

HAL AMU

INRIA a CCSD electronic archive server

Hal-Diderot

I4U System Description for NIST SRE'20 CTS Challenge

Author: Bailo Ignacio Viñals
Bousquet Pierre-Michel
Buera Luis
Colibro Daniele
Cumani Sandro
Das Rohan Kumar
Deldago Héctor
Giménez Alfonso Ortega
He Liang
Kinnunen Tomi
Lee Kong Aik
Li Haizhou
Liang Tianyu
Liu Meng
Liu Xuechen
Nautsch Andreas
Okabe Koji
Rouvier Mickael
Sahidullah Md
Sun Hanwu
Tao Ruijie
Vair Claudio
Wang Longbiao
Wang Qiongqiong
Yamamoto Hitoshi
Zhang Boning
Publication date: 02/11/2022

This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge. The I4U submission resulted from active collaboration among researchers across eight research teams - I²R (Singapore), UEF (Finland), VALPT (Italy, Spain), NEC (Japan), THUEE (China), LIA (France), NUS (Singapore), INRIA (France) and TJU (China). The submission was based on the fusion of top performing sub-systems and sub-fusion systems contributed by individual teams. Effort was spent on the use of common development and validation sets, a shared submission schedule and milestones, and minimizing inconsistency in trial list and score file format across sites.

arXiv.org e-Print Archive

Audio-based video genre identification

Author: Rouvier Mickael
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 05/01/2015

EURECOM Repository