13 research outputs found
Learning speech embeddings for speaker adaptation and speech understanding
In recent years, deep neural network models have gained popularity as a modeling approach for many speech processing tasks, including automatic speech recognition (ASR) and spoken language understanding (SLU). This dissertation has two main goals. The first is to propose modeling approaches that learn speaker embeddings for speaker adaptation or learn semantic speech embeddings. The second is to introduce training objectives that achieve fairness for the ASR and SLU problems. For speaker adaptation, we introduce an auxiliary network to an ASR model and learn to simultaneously detect speaker changes and adapt to the speaker in an unsupervised way. We show that this joint model leads to lower error rates compared to a two-step approach in which the signal is segmented into single-speaker regions and then fed into an adaptation model. We then reformulate the speaker adaptation problem from a counterfactual fairness point of view and introduce objective functions that match the ASR performance of individuals in the dataset to that of their counterfactual counterparts. We show that we can achieve a lower error rate in an ASR system while reducing the performance disparity between protected groups. In the second half of the dissertation, we focus on SLU and tackle two problems associated with SLU datasets. The first is the lack of large speech corpora. To handle this issue, we propose to use available non-parallel text data so that we can leverage the information in text to guide learning of the speech embeddings. We show that this technique increases intent classification accuracy compared to a speech-only system. The second is the label imbalance problem in the datasets, which is also related to fairness, since a model trained on skewed data usually leads to biased results.
To achieve fair SLU, we propose to maximize the F-measure instead of performing conventional cross-entropy minimization, and show that it is possible to increase the number of classes with nonzero recall. In the last two chapters, we provide additional discussion of the impact of these projects from both technical and social perspectives, propose directions for future research, and summarize the findings.
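The F-measure is not differentiable when computed from hard decisions, so maximizing it requires a smooth surrogate. A minimal numpy sketch of one common surrogate, a soft macro-F1 loss (an illustration of the idea, not necessarily the dissertation's exact formulation):

```python
import numpy as np

def soft_f1_loss(probs, labels, eps=1e-8):
    """Differentiable (soft) macro-F1 loss.

    probs:  (N, C) predicted class probabilities
    labels: (N, C) one-hot ground truth
    Soft TP/FP/FN are computed from probabilities instead of hard
    decisions, so every class contributes a gradient, which helps
    rare classes attain nonzero recall.
    """
    tp = (probs * labels).sum(axis=0)
    fp = (probs * (1 - labels)).sum(axis=0)
    fn = ((1 - probs) * labels).sum(axis=0)
    f1 = 2 * tp / (2 * tp + fp + fn + eps)  # per-class soft F1
    return 1.0 - f1.mean()                  # minimize 1 - macro-F1
```

Minimizing this loss directly rewards balanced per-class performance, unlike cross-entropy, which a majority class can dominate.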
Biased Self-supervised learning for ASR
Self-supervised learning via masked prediction pre-training (MPPT) has shown
impressive performance on a range of speech-processing tasks. This paper
proposes a method to bias self-supervised learning towards a specific task. The
core idea is to slightly finetune the model that is used to obtain the target
sequence. This leads to better performance and a substantial increase in
training speed. Furthermore, this paper proposes a variant of MPPT that allows
low-footprint streaming models to be trained effectively by computing the MPPT
loss on masked and unmasked frames. These approaches are evaluated for
automatic speech recognition on the Librispeech corpus, where 100 hours of data
served as the labelled data and 860 hours as the unlabelled data. The biased
training outperforms the unbiased training by 15.5% after 250k updates and
23.8% after 100k updates on test-other. For the streaming models, the
pre-training approach yields a reduction in word error rate of 44.1%.
Comment: Submitted to ICASSP 202
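A minimal numpy sketch of the masked-prediction loss computed on both masked and unmasked frames, the variant the abstract describes for streaming models (an illustration of the idea, not the paper's implementation):

```python
import numpy as np

def mppt_loss(logits, targets, mask, unmasked_weight=1.0):
    """Masked-prediction loss over masked AND unmasked frames.

    logits:  (T, V) student predictions over V discrete target units
    targets: (T,)   target unit indices (e.g. from a quantized teacher)
    mask:    (T,)   boolean, True where the input frame was masked
    Standard MPPT sums the loss over masked frames only; the extra
    unmasked-frame term (scaled by unmasked_weight) supplies a
    training signal on every frame.
    """
    # numerically stable log-softmax over the unit vocabulary
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    nll = -logp[np.arange(len(targets)), targets]  # per-frame NLL
    return nll[mask].sum() + unmasked_weight * nll[~mask].sum()
```

Setting `unmasked_weight=0` recovers the standard masked-only objective, so the two variants can be compared in the same training loop.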
Augmenting text for spoken language understanding with Large Language Models
Spoken semantic parsing (SSP) involves generating machine-comprehensible
parses from input speech. Training robust models for existing application
domains represented in training data or extending to new domains requires
corresponding triplets of speech-transcript-semantic parse data, which is
expensive to obtain. In this paper, we address this challenge by examining
methods that can use transcript-semantic parse data (unpaired text) without
corresponding speech. First, when unpaired text is drawn from existing textual
corpora, Joint Audio Text (JAT) and Text-to-Speech (TTS) are compared as ways
to generate speech representations for unpaired text. Experiments on the STOP
dataset show that unpaired text from existing and new domains improves
performance by 2% and 30% in absolute Exact Match (EM) respectively. Second, we
consider the setting when unpaired text is not available in existing textual
corpora. We propose to prompt Large Language Models (LLMs) to generate unpaired
text for existing and new domains. Experiments show that examples and words
that co-occur with intents can be used to generate unpaired text with Llama
2.0. Using the generated text with JAT and TTS for spoken semantic parsing
improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains
respectively.
Comment: Submitted to ICASSP 202
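A hypothetical illustration of how such a generation prompt might be assembled from intent exemplars and co-occurring words (the function and the prompt wording are assumptions for illustration, not the paper's actual prompts):

```python
def build_prompt(intent, exemplars, cooccurring_words, n=10):
    """Assemble a prompt asking an LLM (e.g. Llama 2) to generate
    unpaired transcript text for a target intent."""
    lines = [f"Generate {n} utterances a user might say for the intent '{intent}'."]
    if exemplars:
        lines.append("Examples:")
        lines += [f"- {e}" for e in exemplars]
    if cooccurring_words:
        # steer generation toward vocabulary seen with this intent
        lines.append("Use words such as: " + ", ".join(cooccurring_words) + ".")
    return "\n".join(lines)
```

The generated utterances would then be paired with JAT embeddings or TTS audio to stand in for real speech during training.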
Dynamic ASR Pathways: An Adaptive Masking Approach Towards Efficient Pruning of A Multilingual ASR Model
Neural network pruning offers an effective method for compressing a
multilingual automatic speech recognition (ASR) model with minimal performance
loss. However, it entails several rounds of pruning and re-training for each
language. In this work, we propose the use of an adaptive
masking approach in two scenarios for pruning a multilingual ASR model
efficiently, resulting in either sparse monolingual models or a sparse
multilingual model (named Dynamic ASR Pathways). Our approach dynamically
adapts the sub-network, avoiding premature decisions about a fixed sub-network
structure. We show that our approach outperforms existing pruning methods when
targeting sparse monolingual models. Further, we illustrate that Dynamic ASR
Pathways jointly discovers and trains better sub-networks (pathways) of a
single multilingual model by adapting from different sub-network
initializations, thereby reducing the need for language-specific pruning.
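A minimal sketch of magnitude-based mask re-selection, the general mechanism behind adapting a sub-network during training instead of fixing it after one pruning round (the paper's actual pruning criterion and schedule may differ):

```python
import numpy as np

def adapt_mask(weights, sparsity):
    """Re-derive a pruning mask from current weight magnitudes.

    Called periodically during training, this lets the surviving
    sub-network ("pathway") change as weights evolve, avoiding a
    premature commitment to one fixed structure.
    Returns a boolean mask where True = keep the weight.
    """
    k = int(weights.size * sparsity)  # number of weights to prune
    if k == 0:
        return np.ones_like(weights, dtype=bool)
    # k-th smallest magnitude becomes the pruning threshold
    thresh = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    return np.abs(weights) > thresh
```

In a multilingual setting, one such mask per language (or one shared mask) can be re-selected at intervals, which is the spirit of the "pathways" being jointly discovered and trained.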
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Large-scale generative models such as GPT and DALL-E have revolutionized
natural language processing and computer vision research. These models not only
generate high fidelity text or image outputs, but are also generalists which
can solve tasks not explicitly taught. In contrast, speech generative models
are still primitive in terms of scale and task generalization. In this paper,
we present Voicebox, the most versatile text-guided generative model for speech
at scale. Voicebox is a non-autoregressive flow-matching model trained to
infill speech given audio context and text, using over 50K hours of
speech that are neither filtered nor enhanced. Similar to GPT, Voicebox can
perform many different tasks through in-context learning, but is more flexible
as it can also condition on future context. Voicebox can be used for mono or
cross-lingual zero-shot text-to-speech synthesis, noise removal, content
editing, style conversion, and diverse sample generation. In particular,
Voicebox outperforms the state-of-the-art zero-shot TTS model VALL-E on both
intelligibility (5.9% vs 1.9% word error rates) and audio similarity (0.580 vs
0.681) while being up to 20 times faster. See voicebox.metademolab.com for a
demo of the model.
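For context, models of this kind are typically trained with a conditional flow-matching objective of the following general form (a sketch from the flow-matching literature, with conditioning c standing in for the audio context and text; Voicebox's exact loss and masking details are in the paper):

```latex
x_t = \big(1 - (1-\sigma_{\min})\,t\big)\,x_0 + t\,x_1, \qquad
\mathcal{L}_{\mathrm{CFM}}
  = \mathbb{E}_{t,\,x_0,\,x_1}\,
    \big\| v_\theta(x_t, t \mid c) - \big(x_1 - (1-\sigma_{\min})\,x_0\big) \big\|^2
```

where $x_0$ is Gaussian noise, $x_1$ the target speech features, and $v_\theta$ the learned vector field; sampling integrates $v_\theta$ from noise to speech in a fixed number of steps, which is what makes the model non-autoregressive and fast.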
THE EFFECT OF PROVIDING INFORMATION THROUGH BOOKLET MEDIA ON THE COMPLIANCE LEVEL OF TYPE 2 DM PATIENTS
Adherence is a major component of successful diabetes treatment and is influenced by knowledge and skills regarding disease management. Providing information through health education using a multimedia approach, for example booklets, can help patients master information more effectively. This study aimed to determine the effect of providing information through booklet media on the compliance level of type 2 DM patients. The study used a pre-experimental method with a one-group pre-test-post-test design and included 36 samples selected by purposive sampling. Data were collected using questionnaires; data analysis consisted of univariate and bivariate analyses. At pre-test, 29 respondents (80.6%) were less compliant, while at post-test 34 respondents (94.4%) were compliant. The Wilcoxon signed-rank test yielded Zstat = 4.949 > Ztable = 1.96 and P-value = 0.001 < α = 0.05, from which it can be concluded that providing information through booklet media has a significant effect on the compliance level of type 2 DM patients. It is recommended that the hospital use booklet media when providing information to type 2 DM patients about diabetes mellitus and its treatment therapy, so that the information conveyed is easier to understand.
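The Zstat-versus-Ztable comparison above comes from the normal approximation to the Wilcoxon signed-rank test. A stdlib-only sketch of that statistic (no tie handling; the pre/post scores below are hypothetical, since the study reports only summary statistics):

```python
import math

def wilcoxon_signed_rank_z(pre, post):
    """Normal-approximation Wilcoxon signed-rank z statistic for two
    related samples; |z| > 1.96 rejects H0 at alpha = 0.05, matching
    the Zstat > Ztable comparison in the abstract."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero diffs
    n = len(diffs)
    ranked = sorted(range(n), key=lambda i: abs(diffs[i]))
    # rank 1..n by absolute difference; sum ranks of positive diffs
    w_plus = sum(r + 1 for r, i in enumerate(ranked) if diffs[i] > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return (w_plus - mean) / sd

# hypothetical paired compliance scores, for illustration only
pre  = [2, 3, 3, 4, 2, 5, 3, 4, 4, 3]
post = [3, 5, 6, 8, 7, 11, 10, 12, 13, 13]
z = wilcoxon_signed_rank_z(pre, post)
```

With every post-test score higher than its pre-test counterpart, the statistic exceeds the 1.96 critical value, mirroring the study's conclusion.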
Novel Hepatitis B Virus Capsid Assembly Modulator Induces Potent Antiviral Responses In Vitro and in Humanized Mice
Hepatitis B virus (HBV) affects an estimated 250 million chronic carriers worldwide. Though several vaccines exist, they are ineffective for those already infected. HBV persists due to the formation of covalently closed circular DNA (cccDNA), the viral minichromosome, in the nucleus of hepatocytes. Current nucleoside analogs and interferon therapies rarely clear cccDNA, requiring lifelong treatment. Our group identified GLP-26, a novel glyoxamide derivative that alters HBV nucleocapsid assembly and prevents viral DNA replication. GLP-26 exhibited single-digit nanomolar anti-HBV activity, inhibition of HBV e antigen (HBeAg) secretion, and reduced cccDNA amplification, in addition to showing a promising preclinical profile. Strikingly, long-term combination treatment with entecavir in a humanized mouse model induced a decrease in viral loads and viral antigens that was sustained for up to 12 weeks after treatment cessation.
2′-Chloro,2′-fluoro Ribonucleotide Prodrugs with Potent Pan-genotypic Activity against Hepatitis C Virus Replication in Culture
Pan-genotypic nucleoside HCV inhibitors display a high genetic barrier to drug
resistance and are the preferred direct-acting agents to achieve complete
sustained virologic response in humans. Herein, we report the discovery of a
β-d-2′-Cl,2′-F-uridine phosphoramidate nucleotide, 16, as a nontoxic
pan-genotypic anti-HCV agent. Phosphoramidate 16 in its 5′-triphosphate form
specifically inhibited HCV NS5B polymerase with no marked inhibition of human
polymerases and cellular mitochondrial RNA polymerase. Studies on the
intracellular half-life of phosphoramidate 16-TP in live cells demonstrated a
favorable half-life of 11.6 h, suggesting once-a-day dosing. Stability in
human blood and favorable metabolism in human intestinal microsomes and liver
microsomes make phosphoramidate 16 a prospective candidate for further studies
to establish its potential value as a new anti-HCV agent.
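The once-a-day dosing suggestion follows from first-order decay: with an 11.6 h intracellular half-life, roughly a quarter of the active triphosphate still remains at the end of a 24 h dosing interval. A quick check of that arithmetic:

```python
# Fraction of 16-TP remaining after a 24 h dosing interval,
# given the reported 11.6 h intracellular half-life.
half_life_h = 11.6
interval_h = 24.0
fraction_remaining = 0.5 ** (interval_h / half_life_h)
# roughly 0.24, i.e. about a quarter of the compound persists
# between daily doses, supporting once-a-day dosing
```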