Joint gender and age estimation based on speech signals using x-vectors and transfer learning
In this paper, we extend the x-vector framework to speaker age estimation and
gender classification. In particular, we replace the baseline multilayer TDNN
architecture with QuartzNet, a convolutional architecture that has been
successful in speech recognition. We further propose a two-stage transfer
learning scheme that utilizes large-scale speech datasets (VoxCeleb and Common
Voice), and we use multitask learning so that a single system performs joint
age estimation and gender classification. We train and evaluate the system on
the TIMIT dataset. Each stage of the proposed transfer learning scheme yields
further improvements in both age estimation error and gender classification
accuracy. The best-performing system achieves new state-of-the-art results for
age estimation on the TIMIT TEST set, with an MAE of 5.12 and 5.29 years and an
RMSE of 7.24 and 8.12 years for male and female speakers, respectively, while
maintaining a gender classification accuracy of 99.6%.
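To make the reported figures concrete, the sketch below shows how a joint age/gender system can be scored: MAE and RMSE of the age predictions, split by the speaker's true gender, plus overall gender classification accuracy. It also shows one common way to combine the two tasks into a single multitask objective, namely a squared age error plus a weighted binary cross-entropy on the gender output. This is a minimal illustration under assumptions: the loss form, the weight `alpha`, and all function names are hypothetical and are not taken from the paper's training setup.

```python
import math
from typing import Dict, Sequence, Tuple


def multitask_loss(age_pred: float, age_true: float,
                   p_male: float, is_male: int, alpha: float = 1.0) -> float:
    """Hypothetical per-utterance joint objective (an assumption, not the
    paper's): squared age error + alpha * binary cross-entropy for gender."""
    eps = 1e-12  # clamp the probability to avoid log(0)
    p = min(max(p_male, eps), 1.0 - eps)
    age_term = (age_pred - age_true) ** 2
    gender_term = -(is_male * math.log(p) + (1 - is_male) * math.log(1.0 - p))
    return age_term + alpha * gender_term


def age_errors(pred: Sequence[float], true: Sequence[float]) -> Tuple[float, float]:
    """Return (MAE, RMSE) of age predictions, in years."""
    if not pred:
        raise ValueError("no samples in this group")
    diffs = [p - t for p, t in zip(pred, true)]
    mae = sum(abs(d) for d in diffs) / len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return mae, rmse


def evaluate(age_pred: Sequence[float], age_true: Sequence[float],
             gender_pred: Sequence[str], gender_true: Sequence[str]) -> Dict[str, object]:
    """Per-gender (MAE, RMSE), grouped by true gender, plus gender accuracy."""
    results: Dict[str, object] = {}
    for g in ("male", "female"):
        idx = [i for i, t in enumerate(gender_true) if t == g]
        results[g] = age_errors([age_pred[i] for i in idx],
                                [age_true[i] for i in idx])
    correct = sum(p == t for p, t in zip(gender_pred, gender_true))
    results["accuracy"] = correct / len(gender_true)
    return results


# Toy, made-up values for illustration only (not TIMIT data).
r = evaluate([32, 37, 25, 54], [30, 40, 25, 50],
             ["male", "female", "female", "female"],
             ["male", "male", "female", "female"])
print(r)
```

Splitting the age errors by true gender mirrors how the abstract reports separate MAE/RMSE values for male and female speakers. In the toy call, the male group has errors of +2 and -3 years (MAE 2.5), and one of the four gender predictions is wrong (accuracy 0.75).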