A speaker adaptive DNN training approach for speaker-independent acoustic inversion

Abstract

We address the speaker-independent acoustic inversion (AI) problem, also referred to as acoustic-to-articulatory mapping. The scarcity of multi-speaker articulatory data makes it difficult to learn a mapping that generalizes from a limited number of training speakers and reliably reconstructs the articulatory movements of unseen speakers. In this paper, we propose a Multi-task Learning (MTL)-based approach that explicitly separates the modeling of the AI peculiarities of each training speaker from the modeling of the AI characteristics shared by all speakers. Our approach stems from the well-known Regularized MTL approach and extends it to feed-forward deep neural networks (DNNs). Given multiple training speakers, we learn for each speaker an acoustic-to-articulatory mapping represented by a DNN. Then, through an iterative procedure, we search for a canonical speaker-independent DNN that is "similar" to all speaker-dependent DNNs. The degree of similarity is controlled by a regularization parameter. We report experiments on the University of Wisconsin X-ray Microbeam Database under different training/testing experimental settings. The results indicate that our MTL-trained canonical DNN outperforms, by a large margin, a conventionally trained (i.e., single-task learning-based) speaker-independent DNN.
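
The idea of coupling speaker-dependent models to a canonical model through a regularization parameter can be illustrated with a small sketch. This is not the procedure of the paper, whose details the abstract does not give: it assumes linear maps in place of DNNs, a squared-distance penalty between each speaker's parameters and the canonical ones, and a canonical update that simply averages the speaker-dependent parameters. The function name `train_canonical` and the hyperparameters `lam`, `lr`, `outer_iters`, and `inner_steps` are all hypothetical.

```python
import numpy as np

def train_canonical(tasks, lam=1.0, lr=0.01, outer_iters=50, inner_steps=20):
    """Regularized multi-task sketch with linear maps (an assumption, not DNNs).

    tasks: list of (X, Y) pairs, one per training speaker, where
           X has shape (n, d_in)  (acoustic features) and
           Y has shape (n, d_out) (articulatory targets).
    Returns the canonical weights w0 and the list of per-speaker weights.
    """
    d_in = tasks[0][0].shape[1]
    d_out = tasks[0][1].shape[1]
    w0 = np.zeros((d_in, d_out))        # canonical, speaker-independent map
    ws = [w0.copy() for _ in tasks]     # one speaker-dependent map per task

    for _ in range(outer_iters):
        # Step 1: fit each speaker's map while pulling it toward w0.
        for t, (X, Y) in enumerate(tasks):
            for _ in range(inner_steps):
                grad_fit = X.T @ (X @ ws[t] - Y) / len(X)  # mean squared error term
                grad_reg = lam * (ws[t] - w0)              # similarity penalty term
                ws[t] -= lr * (grad_fit + grad_reg)
        # Step 2: move the canonical map to the average of the speaker maps,
        # which minimizes the sum of squared distances to them.
        w0 = np.mean(ws, axis=0)
    return w0, ws
```

In this sketch a larger `lam` forces the speaker-dependent maps to stay close to the canonical one, while `lam = 0` reduces to fitting each speaker independently and averaging the results.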
