Search CORE

26 research outputs found

data2vec-aqc: Search for the right Teaching Assistant in the Teacher-Student training setup

Author: Ghosh Sreyan
Lodagala Vasista Sai
Umesh S.
Publication venue
Publication date: 02/11/2022
Field of study

In this paper, we propose a new Self-Supervised Learning (SSL) algorithm called data2vec-aqc, for speech representation learning from unlabeled speech data. Our goal is to improve SSL for speech in domains where both unlabeled and labeled data are limited. Building on the recently introduced data2vec, we introduce additional modules to the data2vec framework that leverage the benefit of data augmentations, quantized representations, and clustering. The interaction between these modules helps solve the cross-contrastive loss as an additional self-supervised objective. data2vec-aqc achieves up to 14.1% and 20.9% relative WER improvement over the existing state-of-the-art data2vec system on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. Our proposed model also achieves up to 17.8% relative WER improvement over the baseline data2vec when fine-tuned on Switchboard data.Comment: Submitted to ICASSP 2023. arXiv admin note: text overlap with arXiv:2210.0259

arXiv.org e-Print Archive

SALTTS: Leveraging Self-Supervised Speech Representations for improved Text-to-Speech Synthesis

Author: Lodagala Vasista Sai
Sivaguru Ramanan
Umesh S
Publication venue
Publication date: 02/08/2023
Field of study

While FastSpeech2 aims to integrate aspects of speech such as pitch, energy, and duration as conditional inputs, it still leaves scope for richer representations. As a part of this work, we leverage representations from various Self-Supervised Learning (SSL) models to enhance the quality of the synthesized speech. In particular, we pass the FastSpeech2 encoder's length-regulated outputs through a series of encoder layers with the objective of reconstructing the SSL representations. In the SALTTS-parallel implementation, the representations from this second encoder are used for an auxiliary reconstruction loss with the SSL features. The SALTTS-cascade implementation, however, passes these representations through the decoder in addition to having the reconstruction loss. The richness of speech characteristics from the SSL features reflects in the output speech quality, with the objective and subjective evaluation measures of the proposed approach outperforming the baseline FastSpeech2.Comment: Accepted for publication at Interspeech 202

arXiv.org e-Print Archive

HIV Skews a Balanced Mtb-Specific Th17 Response in Latent Tuberculosis Subjects to a Pro-inflammatory Profile Independent of Viral Load:HIV alters the nature of the Mtb-specific Th17 response

Author: Adiga Vasista
Alampalli Shuba Varshini
D'Souza George
De Rosa Stephen C.
Dhar Chirag
Finak Greg
Ghate Manisha
Hingankar Nitin
Ottenhoff Tom H.M.
Paranjape Ramesh S.
Rakshit Srabanti
Sahoo Pravat Nalini
Sundararaj Bharath K.
Thakar Madhuri R.
Uday Kumar J Anto Jesuraj
Virkar Rashmi Govind
Vyakarnam Annapurna
Publication venue: 'Elsevier BV'
Publication date: 01/12/2020
Field of study

King's Research Portal

Genomic clustering analysis identifies molecular subtypes of thymic epithelial tumors independent of World Health Organization histologic type

Author: Badve Sunil S.
Basu Kabya
Gökmen-Polar Yesim
Hellyer Jessica A.
Kumar Ansu
Padda Sukhmani K.
Singh Neeraj K.
Vasista Sumanth M.
Wakelee Heather A.
Publication venue: 'Impact Journals, LLC'
Publication date: 08/06/2021
Field of study

Further characterization of thymic epithelial tumors (TETs) is needed. Genomic information from 102 evaluable TETs from The Cancer Genome Atlas (TCGA) dataset and from the IU-TAB-1 cell line (type AB thymoma) underwent clustering analysis to identify molecular subtypes of TETs. Six novel molecular subtypes (TH1-TH6) of TETs from the TCGA were identified, and there was no association with WHO histologic subtype. The IU-TAB-1 cell line clustered into the TH4 molecular subtype and in vitro testing of candidate therapeutics was performed. The IU-TAB-1 cell line was noted to be resistant to everolimus (mTORC1 inhibitor) and sensitive to nelfinavir (AKT1 inhibitor) across the endpoints measured. Sensitivity to nelfinavir was due to the IU-TAB-1 cell line’s gain-of function (GOF) mutation in PIK3CA and amplification of genes observed from array comparative genomic hybridization (aCGH), including AURKA, ERBB2, KIT, PDGFRA and PDGFB, that are known upregulate AKT, while resistance to everolimus was primarily driven by upregulation of downstream signaling of KIT, PDGFRA and PDGFB in the presence of mTORC1 inhibition. We present a novel molecular classification of TETs independent of WHO histologic subtype, which may be used for preclinical validation studies of potential candidate therapeutics of interest for this rare disease

IUPUIScholarWorks

PubMed Central

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video. This task becomes challenging when the source and target languages belong to different language families, resulting in differences in generated audio duration. This is further compounded by the original speaker's rhythm, especially for extempore speech. This paper describes the challenges in regenerating English lecture videos in Indian languages semi-automatically. A prototype is developed for dubbing lectures into 9 Indian languages. A mean-opinion-score (MOS) is obtained for two languages, Hindi and Tamil, on two different courses. The output video is compared with the original video in terms of MOS (1-5) and lip synchronisation with scores of 4.09 and 3.74, respectively. The human effort also reduces by 75%

arXiv.org e-Print Archive

Directional control of weakly localized Raman from a random network of fractal nanowires

Disordered optical media are an emerging class of materials capable of strongly scattering light. Their study is relevant to investigate transport phenomena and for applications in imaging, sensing and energy storage. While such materials can be used to generate coherent light, their directional emission is typically hampered by their very multiple scattering nature. Here, we tune the out-of-plane directionality of coherent Raman light scattered by a fractal network of silicon nanowires. By visualizing Rayleigh scattering, photoluminescence and weakly localized Raman light from the random network of nanowires via real-space microscopy and Fourier imaging, we gain insight on the light transport mechanisms responsible for the material's inelastic coherent signal and for its directionality. The possibility of visualizing and manipulating directional coherent light in such networks of nanowires opens venues for fundamental studies of light propagation in disordered media as well as for the development of next generation optical devices based on disordered structures, inclusive of sensors, light sources and optical switches

arXiv.org e-Print Archive

Crossref

Directory of Open Access Journals

UCL Discovery

PubMed Central

CCC-wav2vec 2.0: Clustering aided Cross Contrastive Self-supervised learning of speech representations

Author: Ghosh Sreyan
Lodagala Vasista Sai
Umesh S.
Publication venue
Publication date: 13/05/2023
Field of study

While Self-Supervised Learning has helped reap the benefit of the scale from the available unlabeled data, the learning paradigms are continuously being bettered. We present a new pre-training strategy named ccc-wav2vec 2.0, which uses clustering and an augmentation-based cross-contrastive loss as its self-supervised objective. Through the clustering module, we scale down the influence of those negative examples that are highly similar to the positive. The Cross-Contrastive loss is computed between the encoder output of the original sample and the quantizer output of its augmentation and vice-versa, bringing robustness to the pre-training strategy. ccc-wav2vec 2.0 achieves up to 15.6% and 12.7% relative WER improvement over the baseline wav2vec 2.0 on the test-clean and test-other sets, respectively, of LibriSpeech, without the use of any language model. The proposed method also achieves up to 14.9% relative WER improvement over the baseline wav2vec 2.0 when fine-tuned on Switchboard data. We make all our codes publicly available on GitHub.Comment: Accepted to IEEE SLT 202

arXiv.org e-Print Archive

Pressurized Morphing Wing Structures

Author: Allaire G.
Shan Y.
Vasista S.
Zhou M.
Publication venue: 'American Institute of Aeronautics and Astronautics (AIAA)'
Publication date
Field of study

Crossref

Three-dimensional design of a large-displacement morphing wing droop nose device

Author: ANSYS
Felix Nolte
Hans Peter Monner
Marco Burnazzi
MathWorks Inc.
Nolte F
Peter Horst
Schmitz A
Schmitz A
Seume J
Srinivas Vasista
Vasista S
Publication venue: 'SAGE Publications'
Publication date
Field of study

Crossref