Search CORE

6,339 research outputs found

Anti-spoofing Methods for Automatic SpeakerVerification System

Author: Lavrentyeva Galina
Novoselov Sergey
Simonchik Konstantin
Publication venue
Publication date: 24/05/2017
Field of study

Growing interest in automatic speaker verification (ASV)systems has lead to significant quality improvement of spoofing attackson them. Many research works confirm that despite the low equal er-ror rate (EER) ASV systems are still vulnerable to spoofing attacks. Inthis work we overview different acoustic feature spaces and classifiersto determine reliable and robust countermeasures against spoofing at-tacks. We compared several spoofing detection systems, presented so far,on the development and evaluation datasets of the Automatic SpeakerVerification Spoofing and Countermeasures (ASVspoof) Challenge 2015.Experimental results presented in this paper demonstrate that the useof magnitude and phase information combination provides a substantialinput into the efficiency of the spoofing detection systems. Also wavelet-based features show impressive results in terms of equal error rate. Inour overview we compare spoofing performance for systems based on dif-ferent classifiers. Comparison results demonstrate that the linear SVMclassifier outperforms the conventional GMM approach. However, manyresearchers inspired by the great success of deep neural networks (DNN)approaches in the automatic speech recognition, applied DNN in thespoofing detection task and obtained quite low EER for known and un-known type of spoofing attacks.Comment: 12 pages, 0 figures, published in Springer Communications in Computer and Information Science (CCIS) vol. 66

arXiv.org e-Print Archive

Crossref

On Using Backpropagation for Speech Texture Generation and Voice Conversion

Author: Bengio Samy
Chorowski Jan
Saurous Rif A.
Weiss Ron J.
Publication venue
Publication date: 08/03/2018
Field of study

Inspired by recent work on neural network image generation which rely on backpropagation towards the network inputs, we present a proof-of-concept system for speech texture synthesis and voice conversion based on two mechanisms: approximate inversion of the representation learned by a speech recognition neural network, and on matching statistics of neuron activations between different source and target utterances. Similar to image texture synthesis and neural style transfer, the system works by optimizing a cost function with respect to the input waveform samples. To this end we use a differentiable mel-filterbank feature extraction pipeline and train a convolutional CTC speech recognition network. Our system is able to extract speaker characteristics from very limited amounts of target speaker data, as little as a few seconds, and can be used to generate realistic speech babble or reconstruct an utterance in a different voice.Comment: Accepted to ICASSP 201

arXiv.org e-Print Archive

Crossref

Novel Fourier Quadrature Transforms and Analytic Signal Representations for Nonlinear and Non-stationary Time Series Analysis

Author: Singh Pushpendra
Publication venue: 'The Royal Society'
Publication date: 29/03/2018
Field of study

The Hilbert transform (HT) and associated Gabor analytic signal (GAS) representation are well-known and widely used mathematical formulations for modeling and analysis of signals in various applications. In this study, like the HT, to obtain quadrature component of a signal, we propose the novel discrete Fourier cosine quadrature transforms (FCQTs) and discrete Fourier sine quadrature transforms (FSQTs), designated as Fourier quadrature transforms (FQTs). Using these FQTs, we propose sixteen Fourier-Singh analytic signal (FSAS) representations with following properties: (1) real part of eight FSAS representations is the original signal and imaginary part is the FCQT of the real part, (2) imaginary part of eight FSAS representations is the original signal and real part is the FSQT of the real part, (3) like the GAS, Fourier spectrum of the all FSAS representations has only positive frequencies, however unlike the GAS, the real and imaginary parts of the proposed FSAS representations are not orthogonal to each other. The Fourier decomposition method (FDM) is an adaptive data analysis approach to decompose a signal into a set of small number of Fourier intrinsic band functions which are AM-FM components. This study also proposes a new formulation of the FDM using the discrete cosine transform (DCT) with the GAS and FSAS representations, and demonstrate its efficacy for improved time-frequency-energy representation and analysis of nonlinear and non-stationary time series.Comment: 22 pages, 13 figure

arXiv.org e-Print Archive

ZENODO

Dryad Digital Repository (Duke University)

Electronic Archiving System

NEUROSURGERY ENTHUSIASTIC WOMEN SOCIETY

Vertical axis non-linearities in wavelength scanning interferometry

Author: Connor Daniel
Jiang Xiang
Leach Richard K.
Moschetti Giuseppe
Muhamedsalih Hussam
Publication venue: EUSPEN
Publication date: 01/01/2015
Field of study

The uncertainty of measurements made on an areal surface topography instrument is directly influenced by its metrological characteristics. In this work, the vertical axis deviation from linearity of a wavelength scanning interferometer is evaluated. The vertical axis non-linearities are caused by the spectral leakage resulting from the Fourier transform algorithm for phase slope estimation. These non-linearities are simulated and the results are compared with experimental measurements. In order to reduce the observed non-linearities, a modification of the algorithm is proposed. The application of a Hamming window and the exclusion of edge points in the extracted phase are shown to increase the accuracy over the whole instrument range

University of Huddersfield Repository

Huddersfield Research Portal

Acoustic signal processing with robust machine learning algorithm for improved monitoring of particulate solid materials in a gas flowline

Author: Andrew Cowell
Bello
Don McGlinchey
Droubi
El-Alej
El-Alej
Guido
Guo
Haugsdal
Hu
Isaacson
Kos
Kuda Tijjani Aminu
Le
Ludeña-Choez
Mackinnon
Mason
McCulloch
McKay
Mirjalili
Mirjalili
Mitrović
Mittal
Odigie
Ooi
Riedmiller
Shannon
Shuiping
Sun
Sun
Thiruvenkatanathan
Toh
Waibel
Wang
Wang
Xie
Yan
Publication venue: 'Elsevier BV'
Publication date: 01/03/2019
Field of study

Crossref

ResearchOnline@GCU

Blind Normalization of Speech From Different Channels

Author: Boll S. F.
Cox R. V.
David N. Levin
Gales M. J. F.
Levin D. N.
Levin D. N.
Levin D. N.
Roweis S. T.
Tenenbaum J. B.
Young S.
Publication venue: 'Acoustical Society of America (ASA)'
Publication date: 02/04/2002
Field of study

We show how to construct a channel-independent representation of speech that has propagated through a noisy reverberant channel. This is done by blindly rescaling the cepstral time series by a non-linear function, with the form of this scale function being determined by previously encountered cepstra from that channel. The rescaled form of the time series is an invariant property of it in the following sense: it is unaffected if the time series is transformed by any time-independent invertible distortion. Because a linear channel with stationary noise and impulse response transforms cepstra in this way, the new technique can be used to remove the channel dependence of a cepstral time series. In experiments, the method achieved greater channel-independence than cepstral mean normalization, and it was comparable to the combination of cepstral mean normalization and spectral subtraction, despite the fact that no measurements of channel noise or reverberations were required (unlike spectral subtraction).Comment: 25 pages, 7 figure

arXiv.org e-Print Archive

Crossref