26 research outputs found
Regularizing Contrastive Predictive Coding for Speech Applications
Self-supervised methods such as Contrastive predictive Coding (CPC) have
greatly improved the quality of the unsupervised representations. These
representations significantly reduce the amount of labeled data needed for
downstream task performance, such as automatic speech recognition. CPC learns
representations by learning to predict future frames given current frames.
Based on the observation that the acoustic information, e.g., phones, changes
slower than the feature extraction rate in CPC, we propose regularization
techniques that impose slowness constraints on the features. Here we propose
two regularization techniques: Self-expressing constraint and Left-or-Right
regularization. We evaluate the proposed model on ABX and linear phone
classification tasks, acoustic unit discovery, and automatic speech
recognition. The regularized CPC trained on 100 hours of unlabeled data matches
the performance of the baseline CPC trained on 360 hours of unlabeled data. We
also show that our regularization techniques are complementary to data
augmentation and can further boost the system's performance. In monolingual,
cross-lingual, or multilingual settings, with/without data augmentation,
regardless of the amount of data used for training, our regularized models
outperformed the baseline CPC models on the ABX task