Duration mismatch compensation using four-covariance model and deep neural network for speaker verification
Duration mismatch between enrollment and test utterances remains a major concern for the reliability of real-life speaker recognition applications. Two approaches are proposed here to deal with this case when using the i-vector representation. The first is an adaptation of Gaussian Probabilistic Linear Discriminant Analysis (PLDA) modeling, which can be extended to the case of any shift between i-vectors drawn from two distinct distributions. The second attempts to map i-vectors of truncated segments of an utterance to the i-vector of the full segment, using deep neural networks (DNNs). Our results show that both new approaches outperform the standard PLDA baseline by about 10% relative. These back-end methods could complement those that quantify i-vector uncertainty during the extraction process in the case of a duration gap.
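The first approach above models a shift between the distributions of short- and full-duration i-vectors. A minimal stand-in for that idea is mean-shift compensation: estimate the global offset between the two distributions on development data and remove it at test time. This is a hypothetical simplification; the paper adapts PLDA and also trains a DNN mapping, neither of which is reproduced here.

```python
# Minimal sketch of the distribution-shift idea behind duration-mismatch
# compensation: estimate the mean offset between i-vectors extracted from
# truncated segments and those from full segments, then remove it at test
# time. Function names and toy data are invented for illustration.

def mean_vector(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]

def estimate_shift(short_ivectors, full_ivectors):
    """Global offset between the short- and full-duration distributions."""
    mu_short = mean_vector(short_ivectors)
    mu_full = mean_vector(full_ivectors)
    return [s - f for s, f in zip(mu_short, mu_full)]

def compensate(ivector, shift):
    """Map a short-duration i-vector toward the full-duration distribution."""
    return [x - d for x, d in zip(ivector, shift)]

# Toy data: short-segment i-vectors are the full ones plus a constant offset.
full = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
short = [[0.5, 0.8], [1.5, 1.8], [2.5, 2.8]]
shift = estimate_shift(short, full)        # ≈ [0.5, -0.2]
restored = compensate([1.5, 1.8], shift)   # ≈ [1.0, 2.0]
```

In a real system the offset would be estimated per duration condition on a development set; a learned DNN mapping, as in the paper, can additionally capture non-linear distortions that a constant shift cannot.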
A Zero-shot and Few-shot Study of Instruction-Finetuned Large Language Models Applied to Clinical and Biomedical Tasks
We evaluate four state-of-the-art instruction-tuned large language models
(LLMs) -- ChatGPT, Flan-T5 UL2, Tk-Instruct, and Alpaca -- on a set of 13
real-world clinical and biomedical natural language processing (NLP) tasks in
English, such as named-entity recognition (NER), question-answering (QA),
relation extraction (RE), etc. Our overall results demonstrate that the
evaluated LLMs begin to approach performance of state-of-the-art models in
zero- and few-shot scenarios for most tasks, and particularly well for the QA
task, even though they have never seen examples from these tasks before.
However, we observed that performance on the classification and RE tasks falls
below what can be achieved with a model specifically trained for the medical
domain, such as PubMedBERT. Finally, we noted that no single LLM outperforms
all the others across the studied tasks, with some models better suited to
certain tasks than others.
Comment: Under review process
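Zero- and few-shot evaluation of this kind works by prepending k labeled demonstrations to the test input before querying the model. The sketch below shows a hypothetical prompt builder; the task wording and examples are invented and are not taken from the paper's benchmark.

```python
# Hypothetical sketch of few-shot prompt construction for instruction-tuned
# LLM evaluation: k labeled demonstrations are prepended to the test input.
# With k = 0 the same function yields a zero-shot prompt.

def build_prompt(instruction, demonstrations, test_input, k=2):
    """Assemble an instruction, k demonstrations, and the test query."""
    parts = [instruction]
    for text, label in demonstrations[:k]:
        parts.append(f"Input: {text}\nOutput: {label}")
    parts.append(f"Input: {test_input}\nOutput:")
    return "\n\n".join(parts)

instruction = "Extract the drug names mentioned in the sentence."  # invented task
demos = [
    ("The patient received aspirin daily.", "aspirin"),
    ("Treatment combined metformin and insulin.", "metformin, insulin"),
]
zero_shot = build_prompt(instruction, demos, "Ibuprofen was discontinued.", k=0)
few_shot = build_prompt(instruction, demos, "Ibuprofen was discontinued.", k=2)
```

The resulting string is sent to the model as-is; scoring then compares the model's completion against the gold label.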
LIA@CLEF 2018: Mining events opinion argumentation from raw unlabeled Twitter data using convolutional neural network
Social networks are becoming increasingly important in our society. In recent years, this type of media, through communication platforms such as Twitter, has raised new research issues due to the massive size of the data exchanged and an ever-increasing number of users. In this context, the CLEF 2018 Mining Opinion Argumentation task aims to retrieve, for a specific event (festival name or topic), the most diverse argumentative microblogs from a large collection of tweets about festivals in different languages. In this paper, we propose a four-step approach for extracting argumentative microblogs related to a specific query (or event) when no reference data is provided.
FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain
This paper introduces FrenchMedMCQA, the first publicly available
Multiple-Choice Question Answering (MCQA) dataset in French for medical domain.
It is composed of 3,105 questions taken from real exams of the French medical
specialization diploma in pharmacy, mixing single and multiple answers. Each
instance of the dataset contains an identifier, a question, five possible
answers and their manual correction(s). We also propose initial baseline
models for this MCQA task in order to report current performance and to
highlight the difficulty of the task. A detailed analysis of the results
showed that representations adapted to the medical domain or to the MCQA task
are necessary: in our case, English models specialized for the medical domain
yielded better results than generic French ones, even though FrenchMedMCQA is
in French. The corpus, models and tools are available online.
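Since each instance holds an identifier, a question, five candidate answers, and one or several correct options, a natural representation and metric can be sketched as follows. Field names and the example question are invented for illustration; only the instance structure comes from the description above.

```python
# Hypothetical sketch of a FrenchMedMCQA-style instance and an exact-match
# metric for mixed single/multiple-answer questions. Field names and the
# example content are invented; the dataset itself only specifies an
# identifier, a question, five options, and the corrected answer(s).
from dataclasses import dataclass

@dataclass
class MCQAInstance:
    identifier: str
    question: str
    answers: dict          # five options, keyed "a" through "e"
    correct: frozenset     # one or several correct option keys

def exact_match(predicted, instance):
    """1 if the predicted set of options equals the gold set, else 0."""
    return int(frozenset(predicted) == instance.correct)

ex = MCQAInstance(
    identifier="q001",
    question="Which of the following are beta-blockers?",  # invented example
    answers={"a": "propranolol", "b": "amoxicillin", "c": "atenolol",
             "d": "ibuprofen", "e": "omeprazole"},
    correct=frozenset({"a", "c"}),
)
```

Exact match over answer sets is deliberately strict for multiple-answer questions: predicting only part of the gold set scores zero, which is one reason the task is hard to solve and to evaluate.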
ON-TRAC Consortium End-to-End Speech Translation Systems for the IWSLT 2019 Shared Task
This paper describes the ON-TRAC Consortium translation systems developed for the end-to-end model task of the IWSLT 2019 evaluation for the English→Portuguese language pair. The ON-TRAC Consortium is composed of researchers from three French academic laboratories: LIA (Avignon Université), LIG (Université Grenoble Alpes), and LIUM (Le Mans Université). A single end-to-end model, built as a neural encoder-decoder architecture with an attention mechanism, was used for two primary submissions corresponding to the two EN-PT evaluation sets: (1) TED (MuST-C) and (2) How2. In this paper, we notably investigate the impact of pooling heterogeneous corpora for training, the impact of target tokenization (characters or BPEs), and the impact of speech input segmentation, and we also compare our best end-to-end model (BLEU of 26.91 on MuST-C and 43.82 on How2 validation sets) to a pipeline (ASR+MT) approach.
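At the heart of such an encoder-decoder architecture, the attention mechanism lets each decoder step compute a softmax-weighted sum of the encoder states. The sketch below is a toy dot-product attention; dimensions and values are invented, and the paper's actual model is not reproduced here.

```python
# Minimal sketch of dot-product attention in an encoder-decoder model: the
# decoder state queries the encoder states, and a softmax over the
# similarity scores weights the context vector. Toy dimensions only.
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attend(query, encoder_states):
    """Softmax-weighted sum of encoder states, scored by dot product."""
    weights = softmax([dot(query, h) for h in encoder_states])
    dim = len(encoder_states[0])
    context = [sum(w * h[i] for w, h in zip(weights, encoder_states))
               for i in range(dim)]
    return context, weights

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
context, weights = attend([2.0, 0.0], encoder_states)
```

The same mechanism applies whether the encoder consumes text or, as in end-to-end speech translation, acoustic feature frames; only the encoder input changes.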
LeBenchmark 2.0: a Standardized, Replicable and Enhanced Framework for Self-supervised Representations of French Speech
Self-supervised learning (SSL) is at the origin of unprecedented improvements
in many different domains including computer vision and natural language
processing. Speech processing drastically benefitted from SSL as most of the
current domain-related tasks are now being approached with pre-trained models.
This work introduces LeBenchmark 2.0, an open-source framework for assessing
and building SSL-equipped French speech technologies. It includes documented,
large-scale and heterogeneous corpora with up to 14,000 hours of speech, ten
pre-trained SSL wav2vec 2.0 models containing from 26 million to one billion
learnable parameters shared with the community, and an evaluation protocol
made of six downstream tasks to complement existing benchmarks. LeBenchmark
2.0 also presents unique perspectives on pre-trained SSL models for speech,
with an investigation of frozen versus fine-tuned downstream models,
task-agnostic versus task-specific pre-trained models, as well as a discussion
of the carbon footprint of large-scale model training.
Comment: Under submission at Computer Speech and Language. Preprint allowed
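The frozen-versus-fine-tuned distinction mentioned above comes down to which parameter groups receive gradient updates during downstream training. The toy scalar sketch below illustrates just that mechanism; it is not a real wav2vec 2.0 setup, and the names are invented.

```python
# Toy illustration of frozen vs. fine-tuned downstream training: with a
# frozen SSL encoder, only the downstream head is updated; with
# fine-tuning, the encoder parameters move too. Scalar "parameters" stand
# in for real weight tensors.

def train_step(params, grads, lr=0.1, frozen=()):
    """One SGD update that skips any parameter group listed in `frozen`."""
    return {
        name: value if name in frozen else value - lr * grads[name]
        for name, value in params.items()
    }

params = {"encoder": 1.0, "head": 1.0}
grads = {"encoder": 0.5, "head": 0.5}

frozen_run = train_step(params, grads, frozen=("encoder",))   # encoder untouched
finetune_run = train_step(params, grads)                      # everything moves
```

Freezing the encoder makes downstream training far cheaper (no encoder gradients or optimizer state), which is also relevant to the carbon-footprint discussion: fine-tuning a billion-parameter encoder per task multiplies the training cost.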
I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences
The I4U consortium was established to facilitate a joint entry to NIST
speaker recognition evaluations (SRE). The latest edition of such joint
submission was in SRE 2018, in which the I4U submission was among the
best-performing systems. SRE'18 also marks the 10-year anniversary of the I4U
consortium's participation in the NIST SRE series of evaluations. The primary
objective of the current paper is to summarize the results and lessons learned
based on the twelve sub-systems and their fusion submitted to SRE'18. It is
also our intention to present a shared view of the advancements, progress, and
major paradigm shifts that we have witnessed as an SRE participant in the past
decade from SRE'08 to SRE'18. In this regard, we have seen, among others, a
paradigm shift from supervector representation to deep speaker embedding, and
a switch of research challenge from channel compensation to domain adaptation.
Comment: 5 pages
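The fusion of sub-systems mentioned above is typically done at the score level: each sub-system emits a detection score per trial, and the fused score is a weighted sum plus a bias. The sketch below uses invented weights; in practice they are trained on a development set (e.g. by logistic regression), which is not shown here.

```python
# Hypothetical sketch of score-level fusion for a multi-system submission:
# each sub-system scores the same trial, and the fused score is a weighted
# linear combination plus a bias. Weights and scores are toy values.

def fuse(scores, weights, bias=0.0):
    """Weighted linear combination of sub-system scores for one trial."""
    assert len(scores) == len(weights)
    return bias + sum(w * s for w, s in zip(weights, scores))

# Three sub-systems scoring the same trial (toy values).
trial_scores = [2.1, 1.7, -0.3]
weights = [0.5, 0.4, 0.1]
fused = fuse(trial_scores, weights, bias=-0.2)
```

A single threshold on the fused score then yields the accept/reject decision; calibrating the weights and bias on development data is what lets twelve heterogeneous sub-systems combine into one well-behaved detector.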
I4U System Description for NIST SRE'20 CTS Challenge
This manuscript describes the I4U submission to the 2020 NIST Speaker
Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS)
Challenge. The I4U submission resulted from active collaboration among
researchers across eight research teams: I2R (Singapore), UEF (Finland),
VALPT (Italy, Spain), NEC (Japan), THUEE (China), LIA (France), NUS
(Singapore), INRIA (France) and TJU (China). The submission was based on the
fusion of top-performing sub-systems and sub-fusion systems contributed by
individual teams. Effort was devoted to the use of common development and
validation sets, a shared submission schedule and milestones, and to
minimizing inconsistencies in trial list and score file formats across sites.
Comment: SRE 2021, NIST Speaker Recognition Evaluation Workshop, CTS Speaker Recognition Challenge, 14-12 December 202