Anti-spoofing Methods for Automatic Speaker Verification System
Growing interest in automatic speaker verification (ASV) systems has led to
significant quality improvement of spoofing attacks on them. Many research works
confirm that despite the low equal error rate (EER), ASV systems are still
vulnerable to spoofing attacks. In this work we overview different acoustic
feature spaces and classifiers to determine reliable and robust countermeasures
against spoofing attacks. We compared several spoofing detection systems
presented so far on the development and evaluation datasets of the Automatic
Speaker Verification Spoofing and Countermeasures (ASVspoof) Challenge
2015. Experimental results presented in this paper demonstrate that the use of
a combination of magnitude and phase information contributes substantially to
the efficiency of the spoofing detection systems. Wavelet-based features also
show impressive results in terms of equal error rate. In our overview we compare
spoofing detection performance for systems based on different classifiers.
Comparison results demonstrate that the linear SVM classifier outperforms the
conventional GMM approach. However, many researchers, inspired by the great
success of deep neural network (DNN) approaches in automatic speech recognition,
applied DNNs to the spoofing detection task and obtained quite low EER for known
and unknown types of spoofing attacks.
Comment: 12 pages, 0 figures, published in Springer Communications in Computer
and Information Science (CCIS) vol. 66
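The abstract above reports results as equal error rate (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The following is a minimal sketch of how an EER can be approximated from two lists of detection scores. The function name, the score convention (higher means "more likely genuine"), and the simple threshold sweep are assumptions made for illustration; this is not the ASVspoof evaluation code.

```python
import numpy as np

def equal_error_rate(genuine_scores, spoof_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    Assumed convention: a trial is accepted when score >= threshold.
    """
    genuine = np.asarray(genuine_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    # False rejection rate: genuine trials that fall below the threshold.
    frr = np.array([(genuine < t).mean() for t in thresholds])
    # False acceptance rate: spoofed trials at or above the threshold.
    far = np.array([(spoof >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest.
    idx = int(np.argmin(np.abs(far - frr)))
    return (far[idx] + frr[idx]) / 2.0

# Perfectly separable scores versus heavily overlapping scores.
eer_separable = equal_error_rate([0.9, 0.8, 0.7, 0.6], [0.1, 0.2, 0.3, 0.4])
eer_overlap = equal_error_rate([0.2, 0.8], [0.1, 0.9])
```

With only a handful of trials the sweep gives a coarse estimate; evaluations on large score sets usually interpolate between adjacent thresholds.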
On Using Backpropagation for Speech Texture Generation and Voice Conversion
Inspired by recent work on neural network image generation that relies on
backpropagation towards the network inputs, we present a proof-of-concept
system for speech texture synthesis and voice conversion based on two
mechanisms: approximate inversion of the representation learned by a speech
recognition neural network, and matching statistics of neuron activations
between different source and target utterances. Similar to image texture
synthesis and neural style transfer, the system works by optimizing a cost
function with respect to the input waveform samples. To this end we use a
differentiable mel-filterbank feature extraction pipeline and train a
convolutional CTC speech recognition network. Our system is able to extract
speaker characteristics from very limited amounts of target speaker data, as
little as a few seconds, and can be used to generate realistic speech babble or
reconstruct an utterance in a different voice.
Comment: Accepted to ICASSP 201
Novel Fourier Quadrature Transforms and Analytic Signal Representations for Nonlinear and Non-stationary Time Series Analysis
The Hilbert transform (HT) and associated Gabor analytic signal (GAS)
representation are well-known and widely used mathematical formulations for
modeling and analysis of signals in various applications. In this study, to
obtain the quadrature component of a signal, as with the HT, we propose the novel
discrete Fourier cosine quadrature transforms (FCQTs) and discrete Fourier sine
quadrature transforms (FSQTs), designated as Fourier quadrature transforms
(FQTs). Using these FQTs, we propose sixteen Fourier-Singh analytic signal
(FSAS) representations with the following properties: (1) the real part of eight FSAS
representations is the original signal and the imaginary part is the FCQT of the
real part; (2) the imaginary part of eight FSAS representations is the original
signal and the real part is the FSQT of the real part; (3) like the GAS, the Fourier
spectrum of all FSAS representations has only positive frequencies; however,
unlike the GAS, the real and imaginary parts of the proposed FSAS
representations are not orthogonal to each other. The Fourier decomposition
method (FDM) is an adaptive data analysis approach that decomposes a signal into a
small set of Fourier intrinsic band functions, which are AM-FM
components. This study also proposes a new formulation of the FDM using the
discrete cosine transform (DCT) with the GAS and FSAS representations, and
demonstrates its efficacy for improved time-frequency-energy representation and
analysis of nonlinear and non-stationary time series.
Comment: 22 pages, 13 figure
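For orientation, the conventional Gabor analytic signal that the abstract above uses as its point of comparison can be computed with a discrete Fourier transform: keep the DC bin, double the positive-frequency bins, and zero the negative-frequency bins. The sketch below shows only that standard construction, under the assumption of a real, uniformly sampled input; it does not implement the proposed FCQT, FSQT, or FSAS representations, and the function name is hypothetical.

```python
import numpy as np

def analytic_signal(x):
    """Conventional (Gabor) analytic signal of a real 1-D sequence via the FFT."""
    x = np.asarray(x, dtype=float)
    n = x.size
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin for even lengths
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    # Negative-frequency bins keep weight 0, so the spectrum becomes one-sided.
    return np.fft.ifft(spectrum * weights)

n = 64
t = np.arange(n)
x = np.cos(2 * np.pi * 4 * t / n)         # exactly 4 cycles in the frame
z = analytic_signal(x)
```

In this example the real part of `z` reproduces `x`, and the imaginary part is the Hilbert-transform quadrature component, here a sine of the same frequency.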
Vertical axis non-linearities in wavelength scanning interferometry
The uncertainty of measurements made on an areal surface topography instrument is directly influenced by its metrological characteristics. In this work, the vertical axis deviation from linearity of a wavelength scanning interferometer is evaluated. The vertical axis non-linearities are caused by the spectral leakage resulting from the Fourier transform algorithm for phase slope estimation. These non-linearities are simulated and the results are compared with experimental measurements. In order to reduce the observed non-linearities, a modification of the algorithm is proposed. The application of a Hamming window and the exclusion of edge points in the extracted phase are shown to increase the accuracy over the whole instrument range.
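The effect of a Hamming window on spectral leakage, which the abstract above relies on, can be illustrated with a generic tone whose frequency falls between two FFT bins. This is a sketch under assumed parameters (frame length, tone frequency, and the hypothetical helper `far_leakage_db`); it is not the instrument's phase slope estimation algorithm.

```python
import numpy as np

def far_leakage_db(signal, window, guard_bins=5):
    """Largest spectral magnitude outside a guard band around the peak, in dB re peak."""
    spectrum = np.abs(np.fft.rfft(signal * window))
    peak = int(np.argmax(spectrum))
    mask = np.ones(spectrum.size, dtype=bool)
    # Exclude the main lobe and its immediate neighbours.
    mask[max(0, peak - guard_bins):peak + guard_bins + 1] = False
    return 20.0 * np.log10(spectrum[mask].max() / spectrum[peak])

n = 256
t = np.arange(n)
tone = np.cos(2 * np.pi * 20.5 * t / n)    # 20.5 cycles: not centred on a bin

rect_db = far_leakage_db(tone, np.ones(n))         # no windowing
hamming_db = far_leakage_db(tone, np.hamming(n))   # Hamming window applied
print(f"rectangular: {rect_db:.1f} dB, Hamming: {hamming_db:.1f} dB")
```

The Hamming-windowed spectrum shows markedly lower leakage away from the peak, at the cost of a wider main lobe.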
Blind Normalization of Speech From Different Channels
We show how to construct a channel-independent representation of speech that
has propagated through a noisy reverberant channel. This is done by blindly
rescaling the cepstral time series by a non-linear function, with the form of
this scale function being determined by previously encountered cepstra from
that channel. The rescaled form of the time series is an invariant property of
it in the following sense: it is unaffected if the time series is transformed
by any time-independent invertible distortion. Because a linear channel with
stationary noise and impulse response transforms cepstra in this way, the new
technique can be used to remove the channel dependence of a cepstral time
series. In experiments, the method achieved greater channel-independence than
cepstral mean normalization, and it was comparable to the combination of
cepstral mean normalization and spectral subtraction, despite the fact that no
measurements of channel noise or reverberations were required (unlike spectral
subtraction).
Comment: 25 pages, 7 figure
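Cepstral mean normalization, the baseline named in the abstract above, subtracts the per-coefficient mean over time from a cepstral time series. The sketch below shows only that baseline, not the blind non-linear rescaling the abstract proposes. It assumes a frames-by-coefficients array and models a noise-free, time-invariant linear channel as a constant additive offset in the cepstral domain; the array shapes and random data are illustrative.

```python
import numpy as np

def cepstral_mean_normalization(cepstra):
    """Subtract the time average of each cepstral coefficient (rows are frames)."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(0)
clean = rng.normal(size=(50, 13))           # 50 frames, 13 coefficients (synthetic)
channel_offset = rng.normal(size=(1, 13))   # assumed constant channel term
observed = clean + channel_offset

normalized_clean = cepstral_mean_normalization(clean)
normalized_observed = cepstral_mean_normalization(observed)
```

Under this idealized channel model the two normalized series coincide. Additive noise and reverberation break the constant-offset assumption, which is the situation the abstract addresses.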