'Institute of Electrical and Electronics Engineers (IEEE)'
Doi
Abstract
This paper compares two approaches of automatic age and gen-der classification with 7 classes. The first approach are Gaus-sian Mixture Models (GMMs) with Universal Background Models (UBMs), which is well known for the task of speaker identifica-tion/verification. The training is performed by the EM algorithm or MAP adaptation respectively. For the second approach for each speaker of the test and training set a GMM model is trained. The means of each model are extracted and concatenated, which results in a GMM supervector for each speaker. These supervectors are then used in a support vector machine (SVM). Three different ker-nels were employed for the SVM approach: a polynomial kernel (with different polynomials), an RBF kernel and a linear GMM dis-tance kernel, based on the KL divergence. With the SVM approach we improved the recognition rate to 74 % (p < 0.001) and are in the same range as humans. Index Terms β Acoustic signal analysis, speaker classification, age, gender, Gaussian mixture models (GMM), support vector ma-chine (SVM) 1