Contrastive Predictive Coding (CPC), which learns by predicting future segments
of speech from past segments, is emerging as a powerful algorithm for
representation learning of speech signals. However, it still underperforms
other methods on unsupervised evaluation benchmarks. Here, we introduce
WavAugment, a time-domain data augmentation library, and find that applying
augmentation to the past segments is generally more efficient and yields better
performance than the alternatives. We find that a combination of pitch
modification, additive noise and reverberation substantially increases the
performance of CPC (relative improvement of 18-22%), beating the reference
Libri-light results with 600 times less data. Using an out-of-domain dataset,
time-domain data augmentation can push CPC to be on par with the state of the
art on the Zero Speech Benchmark 2017. We also show that time-domain data
augmentation consistently improves downstream limited-supervision phoneme
classification tasks by 12-15% relative.
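To make "augmentation in the past" concrete, here is a minimal sketch in Python with NumPy. It is not the WavAugment API. It assumes a raw waveform split at a hypothetical sample index `split`. Only the past part is corrupted, using additive white noise at a random SNR followed by convolution with a synthetic, exponentially decaying impulse response. The future part is left clean. The helper names and parameter ranges (5-15 dB SNR, 0.1-0.5 s RT60) are illustrative assumptions. Pitch modification is omitted because it requires a dedicated pitch-shifting algorithm.

```python
import numpy as np


def add_noise(x: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so that signal power / noise power equals snr_db (dB)."""
    signal_power = np.mean(x ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=x.shape)
    return x + noise


def reverberate(x: np.ndarray, sample_rate: int, rt60: float,
                rng: np.random.Generator) -> np.ndarray:
    """Convolve with a synthetic impulse response that decays by 60 dB over rt60 seconds."""
    length = max(1, int(rt60 * sample_rate))
    t = np.arange(length) / sample_rate
    # Decaying Gaussian noise: amplitude falls by a factor of 1000 (60 dB) at t = rt60.
    ir = rng.normal(size=length) * np.exp(-np.log(1000.0) * t / rt60)
    ir[0] = 1.0                     # direct path
    ir /= np.max(np.abs(ir))        # peak-normalize the impulse response
    wet = np.convolve(x, ir)[: len(x)]  # truncate to the original length
    # Rescale so the reverberated signal keeps the input's RMS level.
    gain = np.sqrt(np.mean(x ** 2) / (np.mean(wet ** 2) + 1e-12))
    return wet * gain


def augment_past(past: np.ndarray, sample_rate: int,
                 rng: np.random.Generator) -> np.ndarray:
    """Hypothetical chain: additive noise, then reverberation (no pitch shift)."""
    noisy = add_noise(past, snr_db=rng.uniform(5.0, 15.0), rng=rng)
    return reverberate(noisy, sample_rate, rt60=rng.uniform(0.1, 0.5), rng=rng)


def split_and_augment(wave: np.ndarray, split: int, sample_rate: int,
                      rng: np.random.Generator):
    """Augment only the samples before `split`; return the future untouched."""
    past, future = wave[:split], wave[split:]
    return augment_past(past, sample_rate, rng), future


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16_000
    wave = np.sin(2 * np.pi * 220.0 * np.arange(sr) / sr)  # 1 s test tone
    past, future = split_and_augment(wave, split=sr // 2, sample_rate=sr, rng=rng)
    print(past.shape, future.shape)
```

Keeping the future clean means the prediction targets stay uncorrupted while the context sees a perturbed signal. How the split maps onto CPC's encoder frames is left open here.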