Homomorphic Encryption for Speaker Recognition: Protection of Biometric Templates and Vendor Model Parameters
Data privacy is crucial when dealing with biometric data. Accounting for the
latest European data privacy regulation and payment service directive,
biometric template protection is essential for any commercial application.
Ensuring unlinkability across biometric service operators, irreversibility of
leaked encrypted templates, and renewability of e.g., voice models following
the i-vector paradigm, biometric voice-based systems are prepared for the
latest EU data privacy legislation. With Paillier cryptosystems, Euclidean
and cosine comparators are known to meet data privacy demands without loss
of discrimination or calibration performance. Bridging gaps from template
protection to speaker recognition, two architectures are proposed for the
two-covariance comparator, serving as a generative model in this study. The
first architecture preserves privacy of biometric data capture subjects. In the
second architecture, model parameters of the comparator are encrypted as well,
such that biometric service providers can supply the same comparison modules
employing different key pairs to multiple biometric service operators. An
experimental proof-of-concept and complexity analysis are carried out on
data from the 2013-2014 NIST i-vector machine learning challenge.
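To illustrate how a Euclidean comparator can operate on Paillier-encrypted templates, here is a minimal sketch. It is not the protocol from the paper: the toy primes, the integer-quantized feature vectors, and the function names (`keygen`, `encrypt`, `decrypt`, `encrypted_sq_distance`) are all assumptions made for illustration. It relies only on the additive homomorphism of Paillier, where multiplying ciphertexts adds plaintexts and raising a ciphertext to a constant scales the plaintext. It needs Python 3.8+ for the modular inverse via `pow`.

```python
import math
import random

def keygen(p=1789, q=1861):
    # Toy primes (assumption); real deployments use primes of 1024+ bits.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    # c = g^m * r^n mod n^2 with g = n + 1 and random r coprime to n.
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def encrypted_sq_distance(n, enc_x, enc_x_sq_norm, y):
    # ||x - y||^2 = sum(x_i^2) - 2 * sum(x_i * y_i) + sum(y_i^2),
    # evaluated on ciphertexts of x without ever decrypting x.
    n2 = n * n
    acc = enc_x_sq_norm
    for c, yi in zip(enc_x, y):
        # Enc(x_i)^(-2*y_i) = Enc(-2 * x_i * y_i); exponent reduced mod n.
        acc = (acc * pow(c, (-2 * yi) % n, n2)) % n2
    y_sq = sum(v * v for v in y)
    return (acc * encrypt(n, y_sq)) % n2

# Hypothetical integer-quantized template (x) and probe (y).
n, priv = keygen()
x = [3, -1, 4]
y = [1, 2, 2]
enc_x = [encrypt(n, xi) for xi in x]
enc_x_sq_norm = encrypt(n, sum(v * v for v in x))
enc_score = encrypted_sq_distance(n, enc_x, enc_x_sq_norm, y)
score = decrypt(n, priv, enc_score)
```

Only the key holder can decrypt `enc_score`; the party computing the distance sees the probe in plaintext but never the enrolled template. The result must stay below `n` to decode correctly, so quantization range and vector length have to be chosen accordingly.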
Semi-Supervised Speech Emotion Recognition with Ladder Networks
Speech emotion recognition (SER) systems find applications in various fields
such as healthcare, education, and security and defense. A major drawback of
these systems is their lack of generalization across different conditions. This
problem can be solved by training models on large amounts of labeled data from
the target domain, which is expensive and time-consuming. Another approach is
to increase the generalization of the models. An effective way to achieve this
goal is by regularizing the models through multitask learning (MTL), where
auxiliary tasks are learned along with the primary task. These methods often
require labeled data for the auxiliary tasks (gender, speaker identity, age or
other emotional descriptors), which is expensive to collect for emotion
recognition. This study proposes the use of ladder networks for emotion
recognition, which utilize an unsupervised auxiliary task. The primary task is
a regression problem to predict emotional attributes. The auxiliary task is the
reconstruction of intermediate feature representations using a denoising
autoencoder. This auxiliary task does not require labels so it is possible to
train the framework in a semi-supervised fashion with abundant unlabeled data
from the target domain. This study shows that the proposed approach creates a
powerful framework for SER, achieving better performance than fully
supervised single-task learning (STL) and MTL baselines. The approach is
implemented with several acoustic features, showing that ladder networks
generalize significantly better in cross-corpus settings. Compared to the STL
baselines, the proposed approach achieves relative gains in concordance
correlation coefficient (CCC) between 3.0% and 3.5% for within-corpus
evaluations, and between 16.1% and 74.1% for cross-corpus evaluations,
highlighting the power of the architecture.
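For readers unfamiliar with the metric, the sketch below computes the concordance correlation coefficient using its standard definition, CCC = 2·cov(x, y) / (var(x) + var(y) + (mean(x) − mean(y))²), along with a relative gain in percent. The function names and all numbers are hypothetical and are not taken from the study.

```python
def ccc(pred, gold):
    # Concordance correlation coefficient with population (biased) moments.
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    return 2 * cov / (var_p + var_g + (mean_p - mean_g) ** 2)

def relative_gain(baseline, proposed):
    # Relative improvement of `proposed` over `baseline`, in percent.
    return (proposed - baseline) / baseline * 100.0

# Hypothetical predictions and labels for one emotional attribute.
perfect = ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
gain = relative_gain(0.50, 0.55)  # hypothetical baseline and proposed CCC
```

Unlike Pearson correlation, CCC penalizes a constant offset between predictions and labels, which is why `shifted` falls below 1 even though the two sequences are perfectly correlated.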