Search CORE

17 research outputs found

Retaggio

Author: Mehrish Divya
Publication venue: DigitalCommons@WayneState
Publication date: 10/04/2023

Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder

Author: Herremans Dorien
Mehrish Ambuj
Melechovsky Jan
Sisman Berrak
Publication date: 07/11/2022

Accent plays a significant role in speech communication, influencing understanding capabilities and also conveying a person's identity. This paper introduces a novel and efficient framework for accented Text-to-Speech (TTS) synthesis based on a Conditional Variational Autoencoder. It has the ability to synthesize a selected speaker's speech that is converted to any desired target accent. Our thorough experiments validate the effectiveness of our proposed framework using both objective and subjective evaluations. The results also show remarkable performance in terms of the ability to manipulate accents in the synthesized speech and provide a promising avenue for future accented TTS research. Comment: preprint submitted to a conference, under review

arXiv.org e-Print Archive

Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model

Author: Ghosal Deepanway
Majumder Navonil
Mehrish Ambuj
Poria Soujanya
Publication date: 29/05/2023

The immense scale of recent large language models (LLMs) enables many interesting properties, such as instruction- and chain-of-thought-based fine-tuning, that have significantly improved zero- and few-shot performance in many natural language processing (NLP) tasks. Inspired by such successes, we adopt such an instruction-tuned LLM, Flan-T5, as the text encoder for text-to-audio (TTA) generation, a task where the goal is to generate audio from its textual description. Prior works on TTA either pre-trained a joint text-audio encoder or used a non-instruction-tuned model, such as T5. Consequently, our latent diffusion model (LDM)-based approach TANGO outperforms the state-of-the-art AudioLDM on most metrics and stays comparable on the rest on the AudioCaps test set, despite training the LDM on a 63 times smaller dataset and keeping the text encoder frozen. This improvement might also be attributed to the adoption of audio pressure level-based sound mixing for training set augmentation, whereas the prior methods take a random mix. Comment: https://github.com/declare-lab/tang

arXiv.org e-Print Archive

A Review of Deep Learning Techniques for Speech Processing

Author: Bhardwaj Rishabh
Majumder Navonil
Mehrish Ambuj
Mihalcea Rada
Poria Soujanya
Publication date: 01/05/2023

The field of speech processing has undergone a transformative shift with the advent of deep learning. The use of multiple processing layers has enabled the creation of models capable of extracting intricate features from speech data. This development has paved the way for unparalleled advancements in speech recognition, text-to-speech synthesis, automatic speech recognition, and emotion recognition, propelling the performance of these tasks to unprecedented heights. The power of deep learning techniques has opened up new avenues for research and innovation in the field of speech processing, with far-reaching implications for a range of industries and applications. This review paper provides a comprehensive overview of the key deep learning models and their applications in speech-processing tasks. We begin by tracing the evolution of speech processing research, from early approaches, such as MFCC and HMM, to more recent advances in deep learning architectures, such as CNNs, RNNs, transformers, conformers, and diffusion models. We categorize the approaches and compare their strengths and weaknesses for solving speech-processing tasks. Furthermore, we extensively cover various speech-processing tasks, datasets, and benchmarks used in the literature and describe how different deep-learning networks have been utilized to tackle these tasks. Additionally, we discuss the challenges and future directions of deep learning in speech processing, including the need for more parameter-efficient, interpretable models and the potential of deep learning for multimodal speech processing. By examining the field's evolution, comparing and contrasting different approaches, and highlighting future directions and challenges, we hope to inspire further research in this exciting and rapidly advancing field

arXiv.org e-Print Archive

ADAPTERMIX: Exploring the Efficacy of Mixture of Adapters for Low-Resource TTS Adaptation

Author: Kashyap Abhinav Ramesh
Majumder Navonil
Mehrish Ambuj
Poria Soujanya
Yingting Li
Publication date: 29/05/2023

There are significant challenges for speaker adaptation in text-to-speech for languages that are not widely spoken or for speakers with accents or dialects that are not well-represented in the training data. To address this issue, we propose the use of the "mixture of adapters" method. This approach involves adding multiple adapters within a backbone-model layer to learn the unique characteristics of different speakers. Our approach outperforms the baseline, with a noticeable improvement of 5% observed in speaker preference tests when using only one minute of data for each new speaker. Moreover, following the adapter paradigm, we fine-tune only the adapter parameters (11% of the total model parameters). This is a significant achievement in parameter-efficient speaker adaptation, and one of the first models of its kind. Overall, our proposed approach offers a promising solution for speech synthesis, particularly for adapting to speakers from diverse backgrounds. Comment: Interspeech 202

arXiv.org e-Print Archive

Partnering with women collectives for delivering essential women’s nutrition interventions in tribal areas of eastern India: a scoping study

Author: Bhalla Surbhi
Bhanot Arti
Bhattacharjee Sourav
Daniel Abner
Gope Rajkumar
Mebrahtu Saba
Sethi Vani
Sharma Deepika Mehrish
Publication venue: icddr,b
Publication date: 05/11/2018

Background: We examined the feasibility of engaging women collectives in delivering a package of women’s nutrition messages/services as a funded stakeholder in three tribal-dominated districts of Odisha, Jharkhand and Chhattisgarh States, in eastern India. These districts have high prevalence of child stunting and poor government service outreach. Methods: Conducted between July 2014 and March 2015, an exploratory mixed-methods design was adopted (review of coverage data and government reports, field interviews and focus group discussion with multiple stakeholders and intended communities) to assess coverage of women’s nutrition services. A capacity assessment tool was developed to map all types of community collectives and assess their awareness, institutional and programme capacity as a funded stakeholder for delivering women’s nutrition services/behaviour promotion. Results: Limited targeting of pre-pregnancy period, delays in first trimester registration of pregnant women, and low micronutrient supplementation supply and awareness issues emerged as key bottlenecks in improving women’s nutrition in these districts. Amongst the 18 different types of community collectives mapped, Self Help Groups (SHGs) and their federations (tier 2 and tier 3), with total membership of over 650,000, emerged as the most promising community collective due to their vast network, governance structure, bank linkage, and regular interface. Nearly 400,000 (or 20% of women) in these districts can be reached through the mapped 31,919 SHGs. The share of SHGs with organisational readiness for receiving and managing grants for income generation and community development activities varied from 41% to 94% across study districts. Stakeholders perceived that SHG federations managing grants from government can be engaged for nutrition promotion and service delivery, and that SHG weekly meetings can serve as a community interface for discussing/resolving local issues impeding access to services.
Conclusions: Women SHGs (with tier 2 and tier 3) can become direct grantees for strengthening coverage of women’s nutrition interventions in these tribal districts/pockets, provided they are capacitated, supervised and given safeguards against exploitation and violence

Bioline International

SPEAKER EMBEDDINGS FOR DIARIZATION OF BROADCAST DATA IN THE ALLIES CHALLENGE

Author: Carrive Jean
Doukhan David
Evans Nicholas
Galibert Olivier
Larcher Anthony
Mehrish Ambuj
Meignier Sylvain
Tahon Marie
Publication venue: HAL CCSD
Publication date: 07/06/2021

Diarization consists in the segmentation of speech signals and the clustering of homogeneous speaker segments. State-of-the-art systems typically operate upon speaker embeddings, such as i-vectors or neural x-vectors, extracted from mel-frequency cepstral coefficients (MFCCs) or spectrograms. The recent SincNet architecture extracts x-vectors directly from raw speech signals. The work reported in this paper compares the performance of different embeddings extracted from MFCCs or the raw signal for speaker diarization of broadcast media treated with compression and sub-sampling, operations which typically degrade performance. Experiments are performed with the new ALLIES database that was designed to complement existing, publicly available French corpora of broadcast radio and TV shows. Results show that, in adverse conditions, with compression and sampling mismatch, SincNet x-vectors outperform i-vectors and x-vectors by relative DERs of 43% and 73% respectively. Additionally, we found that SincNet x-vectors are not the absolute best embeddings but are more robust to data mismatch than others

HAL Descartes

Partnering with women collectives for delivering essential women’s nutrition interventions in tribal areas of eastern India: a scoping study

Author: Abner Daniel
Arti Bhanot
Deepika Mehrish Sharma
Rajkumar Gope
Saba Mebrahtu
Sourav Bhattacharjee
Surbhi Bhalla
Vani Sethi
Publication venue: Springer Science and Business Media LLC
Publication date: 01/05/2017

Background: We examined the feasibility of engaging women collectives in delivering a package of women’s nutrition messages/services as a funded stakeholder in three tribal-dominated districts of Odisha, Jharkhand and Chhattisgarh States, in eastern India. These districts have high prevalence of child stunting and poor government service outreach. Methods: Conducted between July 2014 and March 2015, an exploratory mixed-methods design was adopted (review of coverage data and government reports, field interviews and focus group discussion with multiple stakeholders and intended communities) to assess coverage of women’s nutrition services. A capacity assessment tool was developed to map all types of community collectives and assess their awareness, institutional and programme capacity as a funded stakeholder for delivering women’s nutrition services/behaviour promotion. Results: Limited targeting of pre-pregnancy period, delays in first trimester registration of pregnant women, and low micronutrient supplementation supply and awareness issues emerged as key bottlenecks in improving women’s nutrition in these districts. Amongst the 18 different types of community collectives mapped, Self Help Groups (SHGs) and their federations (tier 2 and tier 3), with total membership of over 650,000, emerged as the most promising community collective due to their vast network, governance structure, bank linkage, and regular interface. Nearly 400,000 (or 20% of women) in these districts can be reached through the mapped 31,919 SHGs. The share of SHGs with organisational readiness for receiving and managing grants for income generation and community development activities varied from 41% to 94% across study districts. Stakeholders perceived that SHG federations managing grants from government can be engaged for nutrition promotion and service delivery, and that SHG weekly meetings can serve as a community interface for discussing/resolving local issues impeding access to services.
Conclusions: Women SHGs (with tier 2 and tier 3) can become direct grantees for strengthening coverage of women’s nutrition interventions in these tribal districts/pockets, provided they are capacitated, supervised and given safeguards against exploitation and violence

Directory of Open Access Journals

Firearms identification by the acoustic signals of their mechanisms

Author: Boris Varer
Buckland
Chen
Eckert
Fraz
Gnanadesikan
Goutte
Grother
Haag
Haag
Haag
Haag
Haag
Hastie
Hollien
Khan
Maher
McCombs
McFee
Mehrish
Mermelstein
Paredes
Pavel Giverts
Raschka
Saad Sofer
Tharwat
Yosef Solewicz
Publication venue: Elsevier BV

Crossref

Extraction of PRNU noise from partly decoded video

Author: Al-Ani
Amerini
Bayram
Bin Ma
Chen
Chen
Chunpeng Wang
Fridrich
Goljan
Goljan
Horowitz
Hou
Jian Li
Kang
Korus
Lawgaly
Li
Li
Li
Li
Lin
Lin
Lukas
Mehrish
Muhit
Pande
Pratt
Qin
Sharabayko
Villalba
Wiegand
Yeo-Jin
Publication venue: Elsevier BV

Crossref