25 research outputs found

Structure of pauses in speech in the context of speaker verification and classification of speech type

Authors: Bartosz Ziółko, Magdalena Igras-Cybulska, Marcin Witkowski, Piotr Żelasko
Publication venue: Springer Nature
Publication date: 01/01/2016

Regularizing Contrastive Predictive Coding for Speech Applications

Authors: Saurabhchand Bhati, Najim Dehak, Laureano Moro-Velazquez, Jesús Villalba, Piotr Żelasko
Publication date: 26/04/2023

Self-supervised methods such as Contrastive Predictive Coding (CPC) have greatly improved the quality of unsupervised representations. These representations significantly reduce the amount of labeled data needed for downstream tasks such as automatic speech recognition. CPC learns representations by learning to predict future frames given current frames. Based on the observation that acoustic information, e.g., phones, changes more slowly than the feature extraction rate in CPC, we propose two regularization techniques that impose slowness constraints on the features: a Self-expressing constraint and Left-or-Right regularization. We evaluate the proposed model on ABX and linear phone classification tasks, acoustic unit discovery, and automatic speech recognition. The regularized CPC trained on 100 hours of unlabeled data matches the performance of the baseline CPC trained on 360 hours of unlabeled data. We also show that our regularization techniques are complementary to data augmentation and can further boost the system's performance. In monolingual, cross-lingual, and multilingual settings, with or without data augmentation, and regardless of the amount of training data, our regularized models outperformed the baseline CPC models on the ABX task.
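The general idea in the abstract, a contrastive future-prediction loss combined with a penalty that discourages features from changing too quickly, can be illustrated with a minimal sketch. This is not the paper's Self-expressing constraint or Left-or-Right regularization, which the abstract does not specify. The dot-product scoring, the squared-difference penalty, the function names, and the `weight` parameter are all assumptions made for illustration.

```python
import math


def info_nce(pred, candidates, positive_index):
    """Contrastive loss for one prediction.

    Scores each candidate frame against the prediction with a dot product
    (an assumed scoring function) and returns the negative log-softmax
    probability of the true future frame.
    """
    scores = [sum(p * c for p, c in zip(pred, cand)) for cand in candidates]
    top = max(scores)  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
    return log_norm - scores[positive_index]


def slowness_penalty(frames):
    """Mean squared difference between consecutive feature frames.

    A generic slowness term, used here only as a stand-in for the
    regularizers named in the abstract.
    """
    if len(frames) < 2:
        return 0.0
    total = 0.0
    for prev, cur in zip(frames, frames[1:]):
        total += sum((a - b) ** 2 for a, b in zip(prev, cur))
    return total / (len(frames) - 1)


def regularized_loss(pred, candidates, positive_index, frames, weight=0.1):
    """Contrastive loss plus a weighted slowness penalty (hypothetical weighting)."""
    return info_nce(pred, candidates, positive_index) + weight * slowness_penalty(frames)
```

In this sketch a larger `weight` pushes adjacent frames toward similar features, which mirrors the abstract's observation that phones change more slowly than the frame rate.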

arXiv.org e-Print Archive