The performance of speaker verification systems degrades when the vocal effort
conditions of enrollment and test (e.g., shouted vs. normal speech) differ.
Such a mismatch can arise in non-cooperative speaker verification tasks. In
this paper, we present a study on different methods for
linear compensation of embeddings making use of Gaussian mixture models to
cluster shouted and normal speech domains. These compensation techniques are
borrowed from the area of robustness for automatic speech recognition and, in
this work, we apply them to compensate the mismatch between shouted and normal
conditions in speaker verification. Before compensation, the shouted condition
is automatically detected by means of logistic regression. The process is
computationally light and it is performed in the back-end of an x-vector
system. Experimental results show that applying the proposed approach in the
presence of vocal effort mismatch yields a relative equal error rate
improvement of up to 13.8% with respect to a system that applies neither
shouted speech detection nor compensation.
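
As a rough illustration of the back-end flow described above (detect the
shouted condition, then linearly compensate the embedding), the following
minimal sketch may help. It is not the method evaluated in this paper: it
assumes a bias-only compensation in which each Gaussian component k of a
diagonal-covariance mixture carries a hypothetical offset vector r_k, and the
compensated embedding is x - sum_k p(k|x) r_k. All function names, parameters,
and toy values are assumptions made for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def is_shouted(x, w, b, threshold=0.5):
    """Logistic-regression detector: True if P(shouted | x) >= threshold."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return sigmoid(score) >= threshold

def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def posteriors(x, weights, means, variances):
    """Component posteriors p(k | x), computed stably in the log domain."""
    logs = [
        math.log(pi) + log_gauss_diag(x, m, v)
        for pi, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def compensate(x, gmm, offsets):
    """Bias-only linear compensation (assumed form): x - sum_k p(k|x) r_k."""
    post = posteriors(x, *gmm)
    return [
        xi - sum(p * r[d] for p, r in zip(post, offsets))
        for d, xi in enumerate(x)
    ]

def process_embedding(x, detector, gmm, offsets):
    """Compensate only the embeddings that the detector flags as shouted."""
    w, b = detector
    return compensate(x, gmm, offsets) if is_shouted(x, w, b) else list(x)

# Toy 2-D setup; every value below is made up for illustration.
detector = ([1.0, 1.0], -5.0)                    # weights, bias
gmm = ([0.5, 0.5],                               # mixture weights
       [[0.0, 0.0], [10.0, 10.0]],               # component means
       [[1.0, 1.0], [1.0, 1.0]])                 # diagonal variances
offsets = [[1.0, 1.0], [2.0, 2.0]]               # hypothetical r_k per component

normal_out = process_embedding([0.0, 0.0], detector, gmm, offsets)
shouted_out = process_embedding([10.0, 10.0], detector, gmm, offsets)
```

In a real system the detector weights, the mixture parameters, and the
offsets would be estimated from paired or labeled normal and shouted
embeddings; here they are fixed by hand so the control flow is easy to
follow.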