106 research outputs found
MCE 2018: The 1st Multi-target Speaker Detection and Identification Challenge Evaluation
The Multi-target Challenge aims to assess how well current speech technology
is able to determine whether or not a recorded utterance was spoken by one of a
large number of blacklisted speakers. It is a form of multi-target speaker
detection based on real-world telephone conversations. Data recordings are
generated from call center customer-agent conversations. The task is to measure
how accurately one can detect 1) whether a test recording is spoken by a
blacklisted speaker, and 2) which specific blacklisted speaker was talking.
This paper outlines the challenge and provides its baselines, results, and
discussions.Comment: http://mce.csail.mit.edu . arXiv admin note: text overlap with
arXiv:1807.0666
Discriminative and generative approaches for long- and short-term speaker characteristics modeling : application to speaker verification
The speaker verification problem can be stated as follows: given two speech recordings, determine whether or not they have been uttered by the same speaker. Most current speaker verification systems are based on Gaussian mixture models. This probabilistic representation allows to adequately model the complex distribution of the underlying speech feature parameters. It however represents an inadequate basis for discriminating between speakers, which is the key issue in the area of speaker verification. In the first part of this thesis, we attempt to overcome these difficulties by proposing to combine support vector machines, a well established discriminative modeling, with two generative approaches based on Gaussian mixture models. In the first generative approach, a target speaker is represented by a Gaussian mixture model corresponding to a Maximum A Posteriori adaptation of a large Gaussian mixture model, coined universal background model, to the target speaker data. The second generative approach is the Joint Factor Analysis that has become the state-of-the-art in the field of speaker verification during the last three years. The advantage of this technique is that it provides a framework of powerful tools for modeling the inter-speaker and channel variabilities. We propose and test several kernel functions that are integrated in the design of both previous combinations. The best results are obtained when the support vector machines are applied within a new space called the "total variability space", defined using the factor analysis. In this novel modeling approach, the channel effect is treated through a combination of linear discnminant analysis and kemel normalization based on the inverse of the within covariance matrix of the speaker.
In the second part of this thesis, we present a new approach to modeling the speaker's longterm prosodic and spectral characteristics. This novel approach is based on continuous approximations of the prosodic and cepstral contours contained in a pseudo-syllabic segment of speech. Each of these contours is fitted to a Legendre polynomial, whose coefficients are modeled by a Gaussian mixture model. The joint factor analysis is used to treat the speaker and channel variabilities. Finally, we perform a scores fusion between systems based on long-term speaker characteristics with those described above that use short-term speaker features
- …