837 research outputs found
Application of the Mutual Information Minimization to speaker recognition / identification improvement
In this paper we propose the inversion of nonlinear distortions in
order to improve the recognition rates of a speaker recognizer system. We
study the effect of saturations on the test signals, trying to take into account real
situations where the training material has been recorded in a controlled
situation but the testing signals present some mismatch with the input signal
level (saturations). The experimental results for speaker recognition show that
a combination of several strategies can improve the recognition rates with
saturated test sentences from 80% to 89.39%, while the result with clean
speech (without saturation) is 87.76% for one microphone. For speaker
identification, the combination can reduce the minimum detection cost function with saturated
test sentences from 6.42% to 4.15%, while the results with clean speech
(without saturation) are 5.74% for one microphone and 7.02% for the other one.
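The kind of level mismatch described in this abstract can be illustrated with a minimal sketch, assuming the saturation behaves like symmetric hard clipping (the abstract does not specify the distortion model). The function names `saturate` and `clipped_fraction` and the `limit` parameter are hypothetical and are not taken from the paper.

```python
def saturate(signal, limit):
    """Hard-clip every sample to the range [-limit, +limit] (assumed model)."""
    return [max(-limit, min(limit, s)) for s in signal]


def clipped_fraction(signal, limit):
    """Fraction of samples whose magnitude exceeds the clipping limit."""
    return sum(1 for s in signal if abs(s) > limit) / len(signal)


if __name__ == "__main__":
    # Hypothetical test samples; a real experiment would use recorded speech.
    test_signal = [0.2, 1.5, -2.0, 0.0]
    print(saturate(test_signal, 1.0))
    print(clipped_fraction(test_signal, 1.0))
```

A blind inversion method of the kind the abstract proposes would try to estimate and undo such a nonlinearity without access to the clean signal; the sketch only produces the distorted input.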
Speaker recognition improvement using blind inversion of distortions
In this paper we propose the inversion of nonlinear
distortions in order to improve the recognition rates of a
speaker recognizer system. We study the effect of
saturations on the test signals, trying to take into account
real situations where the training material has been recorded
in a controlled situation but the testing signals present some
mismatch with the input signal level (saturations). The
experimental results show that a combination of several
strategies can improve the recognition rates with saturated
test sentences from 80% to 89.39%, while the result with
clean speech (without saturation) is 87.76% for one
microphone.
OBJECTIVE AND SUBJECTIVE EVALUATION OF DEREVERBERATION ALGORITHMS
Reverberation significantly impacts the quality and intelligibility of speech. Several dereverberation algorithms have been proposed in the literature to combat this problem. A majority of these algorithms utilize a single channel and are developed for monaural applications, and as such do not preserve the cues necessary for sound localization. This thesis describes a blind two-channel dereverberation technique that improves the quality of speech corrupted by reverberation while preserving cues that affect localization. The method is based on combining a short-term (2 ms) and a long-term (20 ms) weighting function of the linear prediction (LP) residual of the input signal. The developed algorithm and other dereverberation algorithms are evaluated objectively and subjectively in terms of sound quality and localization accuracy. The binaural adaptation provides a significant increase in sound quality while removing the loss in localization ability found in the bilateral implementation.
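The LP residual that this abstract refers to can be computed as in the minimal sketch below, assuming the standard autocorrelation method with the Levinson-Durbin recursion. The thesis's short-term and long-term weighting functions are not specified in the abstract and are not implemented here; all function names are hypothetical.

```python
import random


def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite sequence (no normalisation)."""
    n = len(x)
    return [sum(x[i] * x[i - lag] for i in range(lag, n))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[1..order] such that x[n] ~ sum_k a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        acc = r[m] - sum(a[k] * r[m - k] for k in range(1, m))
        k_m = acc / err                      # reflection coefficient
        new_a = a[:]
        new_a[m] = k_m
        for k in range(1, m):
            new_a[k] = a[k] - k_m * a[m - k]
        a = new_a
        err *= 1.0 - k_m * k_m               # updated prediction error energy
    return a[1:]


def lp_residual(x, coeffs):
    """Prediction error e[n] = x[n] - sum_k coeffs[k] * x[n-1-k]."""
    p = len(coeffs)
    return [x[n] - sum(coeffs[k] * x[n - 1 - k]
                       for k in range(p) if n - 1 - k >= 0)
            for n in range(len(x))]


if __name__ == "__main__":
    # Hypothetical input: a synthetic first-order autoregressive signal.
    rng = random.Random(0)
    x = [0.0]
    for _ in range(2000):
        x.append(0.9 * x[-1] + rng.gauss(0.0, 1.0))
    coeffs = levinson_durbin(autocorr(x, 2), 2)
    residual = lp_residual(x, coeffs)
    print(coeffs)
```

In a dereverberation setting, a weighting function would then be applied to this residual before resynthesis; the sketch stops at the residual itself.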
A non-linear polynomial approximation filter for robust speaker verification
Bibliography: leaves 101-109
System Identification with Applications in Speech Enhancement
With the increasing popularity of integrating hands-free telephony on mobile portable devices
and the rapid development of voice over internet protocol, identification of acoustic
systems has become desirable for compensating distortions introduced to speech signals
during transmission, and hence enhancing the speech quality. The objective of this research
is to develop system identification algorithms for speech enhancement applications
including network echo cancellation and speech dereverberation.
A supervised adaptive algorithm for sparse system identification is developed for
network echo cancellation. Based on the framework of the selective-tap updating scheme
for the normalized least-mean-squares algorithm, the MMax and sparse partial update
tap-selection strategies are exploited in the frequency domain to achieve fast convergence
performance with low computational complexity. By demonstrating how
the sparseness of the network impulse response varies in the transformed domain, the
multidelay filtering structure is incorporated to reduce the algorithmic delay.
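The selective-tap idea in the paragraph above can be sketched as follows. This is a simplified time-domain MMax partial-update NLMS, not the frequency-domain multidelay algorithm developed in the thesis: at each step only the taps aligned with the largest-magnitude input samples are adapted. The function names, the step size `mu`, and the sparse impulse response `h` are illustrative assumptions.

```python
import random


def mmax_nlms(x, d, num_taps, num_update, mu=0.5, eps=1e-8):
    """Adapt an FIR filter, updating only num_update taps per sample (MMax)."""
    w = [0.0] * num_taps
    buf = [0.0] * num_taps                   # buf[i] holds x[n - i]
    errors = []
    for xn, dn in zip(x, d):
        buf = [xn] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))
        e = dn - y                           # a priori error
        norm = sum(b * b for b in buf) + eps
        # MMax selection: indices of the largest-magnitude input samples.
        selected = sorted(range(num_taps), key=lambda i: abs(buf[i]),
                          reverse=True)[:num_update]
        for i in selected:
            w[i] += mu * e * buf[i] / norm
        errors.append(e)
    return w, errors


def simulate(seed=0, n=4000):
    """Identify a hypothetical sparse echo path from noise-free white input."""
    rng = random.Random(seed)
    h = [0.0, 0.0, 1.0, 0.0, -0.5, 0.0, 0.0, 0.0]
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    d = [sum(h[k] * x[i - k] for k in range(len(h)) if i - k >= 0)
         for i in range(n)]
    w, errors = mmax_nlms(x, d, num_taps=len(h), num_update=4)
    return h, w, errors


if __name__ == "__main__":
    h, w, errors = simulate()
    print(max(abs(a - b) for a, b in zip(h, w)))
```

Updating half of the taps per sample roughly halves the cost of the coefficient update, while the sort used for selection is a convenience here; practical implementations typically use a cheaper running selection.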
Blind identification of SIMO acoustic systems for speech dereverberation in the
presence of common zeros is then investigated. First, the problem of common zeros is
defined and extended to include the presence of near-common zeros. Two clustering algorithms
are developed to quantify the number of these zeros so as to facilitate the study
of their effect on blind system identification and speech dereverberation. To mitigate this
effect, two algorithms are developed: the two-stage algorithm based on channel
decomposition identifies common and non-common zeros sequentially, and the forced
spectral diversity approach combines spectral shaping filters and channel undermodelling
to derive a modified system that leads to improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subband-based
dereverberation techniques. Comprehensive simulations and discussions demonstrate
the effectiveness of the aforementioned algorithms. A discussion on possible directions
of prospective research on system identification techniques concludes this thesis.
Speech Recognition
Chapters in the first part of the book cover all the essential speech processing techniques for building robust, automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition: speaker identification and tracking, prosody modeling in emotion-detection systems, and other speech processing applications that are able to operate in real-world environments, such as mobile communication services and smart homes.
EMG-to-Speech: Direct Generation of Speech from Facial Electromyographic Signals
The general objective of this work is the design, implementation, improvement and evaluation of a system that uses surface electromyographic (EMG) signals to directly synthesize an audible speech output: EMG-to-speech.
- …