96 research outputs found
Robust equalization of multichannel acoustic systems
In most real-world acoustical scenarios, speech signals captured by distant microphones from a source are reverberated due to multipath propagation, and the reverberation may impair speech intelligibility. Speech dereverberation can be achieved
by equalizing the channels from the source to microphones. Equalization systems can
be computed using estimates of multichannel acoustic impulse responses. However,
the estimates obtained from system identification always include errors; the fact that
an equalization system is able to equalize the estimated multichannel acoustic system does not mean that it is able to equalize the true system. The objective of this
thesis is to propose and investigate robust equalization methods for multichannel
acoustic systems in the presence of system identification errors.
Equalization systems can be computed using the multiple-input/output inverse theorem or multichannel least-squares method. However, equalization systems
obtained from these methods are very sensitive to system identification errors. A
study of the multichannel least-squares method with respect to two classes of characteristic channel zeros is conducted. Accordingly, a relaxed multichannel least-
squares method is proposed. Channel shortening in connection with the multiple-
input/output inverse theorem and the relaxed multichannel least-squares method is
discussed.
Two algorithms taking into account the system identification errors are developed. Firstly, an optimally-stopped weighted conjugate gradient algorithm is
proposed. A conjugate gradient iterative method is employed to compute the equalization system. The iteration process is stopped optimally with respect to system identification errors. Secondly, a system-identification-error-robust equalization
method exploring the use of error models is presented, which incorporates system
identification error models in the weighted multichannel least-squares formulation
Spherical microphone array acoustic rake receivers
Several signal independent acoustic rake receivers are proposed for speech dereverberation using spherical microphone arrays. The proposed rake designs take advantage of multipaths, by separately capturing and combining early reflections with the direct path. We investigate several approaches in combining reflections with the direct path source signal, including the development of beam patterns that point nulls at all preceding reflections. The proposed designs are tested in experimental simulations and their dereverberation performances evaluated using objective measures. For the tested configuration, the proposed designs achieve higher levels of dereverberation compared to conventional signal independent beamforming systems; achieving up to 3.6 dB improvement in the direct-to-reverberant ratio over the plane-wave decomposition beamformer
System Identification with Applications in Speech Enhancement
As the increasing popularity of integrating hands-free telephony on mobile portable devices
and the rapid development of voice over internet protocol, identification of acoustic
systems has become desirable for compensating distortions introduced to speech signals
during transmission, and hence enhancing the speech quality. The objective of this research
is to develop system identification algorithms for speech enhancement applications
including network echo cancellation and speech dereverberation.
A supervised adaptive algorithm for sparse system identification is developed for
network echo cancellation. Based on the framework of selective-tap updating scheme
on the normalized least mean squares algorithm, the MMax and sparse partial update
tap-selection strategies are exploited in the frequency domain to achieve fast convergence
performance with low computational complexity. Through demonstrating how
the sparseness of the network impulse response varies in the transformed domain, the
multidelay filtering structure is incorporated to reduce the algorithmic delay.
Blind identification of SIMO acoustic systems for speech dereverberation in the
presence of common zeros is then investigated. First, the problem of common zeros is
defined and extended to include the presence of near-common zeros. Two clustering algorithms
are developed to quantify the number of these zeros so as to facilitate the study
of their effect on blind system identification and speech dereverberation. To mitigate such
effect, two algorithms are developed where the two-stage algorithm based on channel
decomposition identifies common and non-common zeros sequentially; and the forced
spectral diversity approach combines spectral shaping filters and channel undermodelling
for deriving a modified system that leads to an improved dereverberation performance.
Additionally, a solution to the scale factor ambiguity problem in subband-based blind system identification is developed, which motivates further research on subbandbased
dereverberation techniques. Comprehensive simulations and discussions demonstrate
the effectiveness of the aforementioned algorithms. A discussion on possible directions
of prospective research on system identification techniques concludes this thesis
Multichannel equalisation for high-order spherical microphone arrays using beamformed channels
High-order spherical microphone arrays offer many practical benefits including relatively fine spatial resolution in all directions and rotation invariant processing using eigenbeams. Spatial filtering can reduce interference from noise and reverberation but in even moderately reverberant environments the beam pattern fails to suppress reverberation to a level adequate for typical applications. In this paper we investigate the feasibility of applying dereverberation by considering multiple beamformer outputs as channels to be dereverberated. In one realisation we process directly in the spherical harmonic domain where the beampatterns are mutually orthogonal. In a second realisation, which is not limited to spherical microphone arrays, beams are pointed in the direction of dominant reflections. Simulations demonstrate that in both cases reverberation is significantly reduced and, in the best case, clarity index is improved by 15 dB
Instrumental and perceptual evaluation of dereverberation techniques based on robust acoustic multichannel equalization
Speech signals recorded in an enclosed space by microphones at a distance from the speaker are often corrupted by reverberation, which arises from the superposition of many delayed and attenuated copies of the source signal. Because reverberation degrades the signal, removing reverberation would enhance quality. Dereverberation techniques based on acoustic multichannel equalization are known to be sensitive to room impulse response perturbations. In order to increase robustness, several methods have been proposed, as for example, using a shorter reshaping filter length, incorporating regularization, or applying a sparsity-promoting penalty function. This paper focuses on evaluating the performance of these methods for single-source multi-microphone scenarios, using instrumental performance measures as well as using subjective listening tests. By analyzing the correlation between the instrumental and the perceptual results, it is shown that signal-based performance measures are more advantageous than channel-based performance measures to evaluate the perceptual speech quality of signals that were dereverberated by equalization techniques. Furthermore, this analysis also demonstrates the need to develop more reliable instrumental performance measures
Joint source localization and dereverberation by sound field interpolation using sparse regularization
In this paper, source localization and dereverberation are formulated jointly as an inverse problem. The inverse problem consists in the interpolation of the sound field measured by a set of microphones by matching the recorded sound pressure with that of a particular acoustic model. This model is based on a collection of equivalent sources creating either spherical or plane waves. In order to achieve meaningful results, spatial, spatio-temporal and spatio-spectral sparsity can be promoted in the signals originating from the equivalent sources. The inverse problem consists of a large-scale optimization problem that is solved using a first order matrix-free optimization algorithm. It is shown that once the equivalent source signals capable of effectively interpolating the sound field are obtained, they can be readily used to localize a speech sound source in terms of Direction of Arrival (DOA) and to perform dereverberation in a highly reverberant environment
Glottal-synchronous speech processing
Glottal-synchronous speech processing is a field of speech science where the pseudoperiodicity
of voiced speech is exploited. Traditionally, speech processing involves segmenting
and processing short speech frames of predefined length; this may fail to exploit the inherent
periodic structure of voiced speech which glottal-synchronous speech frames have
the potential to harness. Glottal-synchronous frames are often derived from the glottal
closure instants (GCIs) and glottal opening instants (GOIs).
The SIGMA algorithm was developed for the detection of GCIs and GOIs from
the Electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and
GOI detection from speech signals, the YAGA algorithm provides a measured accuracy
of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to
reverberation than single-channel algorithms.
The GCIs are applied to real-world applications including speech dereverberation,
where SNR is improved by up to 5 dB, and to prosodic manipulation where the importance
of voicing detection in glottal-synchronous algorithms is demonstrated by subjective
testing. The GCIs are further exploited in a new area of data-driven speech modelling,
providing new insights into speech production and a set of tools to aid deployment into
real-world applications. The technique is shown to be applicable in areas of speech coding,
identification and artificial bandwidth extension of telephone speec
- …