Model-based Speech Enhancement for Intelligibility Improvement in Binaural Hearing Aids
Speech intelligibility is often severely degraded among hearing impaired
individuals in situations such as the cocktail party scenario. The performance
of the current hearing aid technology has been observed to be limited in these
scenarios. In this paper, we propose a binaural speech enhancement framework
that takes into consideration the speech production model. The enhancement
framework proposed here is based on the Kalman filter that allows us to take
the speech production dynamics into account during the enhancement process. The
usage of a Kalman filter requires the estimation of clean speech and noise
short term predictor (STP) parameters, and the clean speech pitch parameters.
In this work, a binaural codebook-based method is proposed for estimating the
STP parameters, and a directional pitch estimator based on the harmonic model
and maximum likelihood principle is used to estimate the pitch parameters. The
proposed method for estimating the STP and pitch parameters jointly uses the
information from left and right ears, leading to a more robust estimation of
the filter parameters. Objective measures such as PESQ and STOI have been used
to evaluate the enhancement framework in different acoustic scenarios
representative of the cocktail party scenario. We have also conducted
subjective listening tests on a set of nine normal hearing subjects, to
evaluate the performance in terms of intelligibility and quality improvement.
The listening tests show that the proposed algorithm, even with access to only
a single channel noisy observation, significantly improves the overall speech
quality, and the speech intelligibility by up to 15%.
Reconstruction-based speech enhancement from robust acoustic features
This paper proposes a method of speech enhancement where a clean speech signal is reconstructed from a sinusoidal model of speech production and a set of acoustic speech features. The acoustic features are estimated from noisy speech and comprise, for each frame, a voicing classification (voiced, unvoiced or non-speech), fundamental frequency (for voiced frames) and spectral envelope. Rather than using different algorithms to estimate each parameter, a single statistical model is developed. This comprises a set of acoustic models and is similar to the acoustic modelling used in speech recognition. This allows noise and speaker adaptation to be applied to acoustic feature estimation to improve robustness. Objective and subjective tests compare reconstruction-based enhancement with other methods of enhancement and show the proposed method to be highly effective at removing noise.
Systems and methods for physiological signal enhancement and biometric extraction using non-invasive optical sensors
A system and method for signal processing to remove unwanted noise components including: (i) wavelength-independent motion artifacts such as tissue, bone and skin effects, and (ii) wavelength-dependent motion artifact/noise components such as venous blood pulsation and movement due to various sources including muscle pump, respiratory pump and physical perturbation. Disclosed are methods, analytics, and their uses for reliable perfusion monitoring, arterial oxygen saturation monitoring, heart rate monitoring during daily activities and in hospital settings and for extraction of physiological parameters such as respiration information, hemodynamic parameters, venous capacity, and fluid responsiveness. The system and methods disclosed are extendable to include monitoring platforms for perfusion, hypoxia, arrhythmia detection, airway obstruction detection and sleep disorders including apnea.
Board of Regents, University of Texas Syste
Single Channel Speech Enhancement using Kalman Filter
The quality and intelligibility of conversational speech are generally degraded by
surrounding noise. The main objective of speech enhancement (SE) is to eliminate
or reduce such noise in the degraded speech. Various SE methods have
been proposed in the literature. Among them, the Kalman filter (KF) is known to be an
efficient SE method based on the minimum mean square error (MMSE) criterion. However,
most conventional KF-based speech enhancement methods need access to clean
speech and additive noise information to estimate the state-space model parameters, namely
the linear prediction coefficients (LPCs) and the additive noise variance,
which is impractical because only the noisy speech is available in practice.
Moreover, it is quite difficult to estimate these model parameters efficiently in the
presence of adverse environmental noises. Therefore, the main focus of this thesis is to
develop single channel speech enhancement algorithms using Kalman filter, where the
model parameters are estimated in noisy conditions. Depending on these parameter
estimation techniques, the proposed SE methods are classified into three approaches
based on non-iterative, iterative, and sub-band iterative KF.
In the first approach, a non-iterative Kalman filter based speech enhancement
algorithm is presented, which operates on a frame-by-frame basis. In this proposed
method, the state-space model parameters, namely, the LPCs and noise variance, are
estimated first in noisy conditions. For LPC estimation, a combined speech smoothing
and autocorrelation method is employed. A new method based on a lower-order
truncated Taylor series approximation of the noisy speech along with a difference
operation serving as high-pass filtering is introduced for the noise variance estimation.
The non-iterative Kalman filter is then applied using these estimated parameters.
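As a rough illustration of the autocorrelation step in LPC estimation, the following is a minimal sketch using the textbook Levinson-Durbin recursion. It is not the thesis's combined smoothing-and-autocorrelation method, and it does not reproduce the Taylor-series noise variance estimator; the function names `autocorr` and `levinson_durbin` are hypothetical.

```python
def autocorr(x, max_lag):
    """Biased autocorrelation estimates r[0..max_lag] of one frame."""
    n = len(x)
    return [sum(x[i] * x[i + k] for i in range(n - k)) / n
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for the model
    x[n] = a[0]*x[n-1] + ... + a[order-1]*x[n-order] + e[n].
    Returns (a, err), where err is the prediction-error variance."""
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient for model order m + 1.
        k_m = (r[m + 1] - sum(a[k] * r[m - k] for k in range(m))) / err
        prev = a[:m]
        for k in range(m):
            a[k] = prev[k] - k_m * prev[m - 1 - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err
```

Calling `levinson_durbin(autocorr(frame, p), p)` on a clean frame returns the order-`p` LPCs and the excitation variance. On a noisy frame the estimates are biased by the noise, which is the problem the abstract's estimation methods are meant to address.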
In order to enhance the SE performance as well as parameter estimation accuracy
in noisy conditions, an iterative Kalman filter based single channel SE method is
proposed as the second approach, which also operates on a frame-by-frame basis.
For each frame, the state-space model parameters of the KF are estimated through
an iterative procedure. The Kalman filtering iteration is first applied to each noisy
speech frame, reducing the noise component to a certain degree. At the end of this
first iteration, the LPCs and other state-space model parameters are re-estimated
using the processed speech frame and the Kalman filtering is repeated for the same
processed frame. This iteration continues until the KF converges or a maximum number
of iterations is reached, yielding a further enhanced speech frame. The same procedure
is repeated for each following frame until the last noisy speech frame has been processed.
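The filter-then-re-estimate loop for a single frame can be sketched as follows. This is a simplified sketch under stated assumptions, not the thesis's algorithm: the noise variance is taken as known and fixed, each pass re-filters the original noisy frame with LPCs re-estimated from the latest output (whereas the abstract re-filters the processed frame), and convergence is judged by the relative change between passes. All function names and the `max_iter` and `tol` parameters are hypothetical.

```python
def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin: (a, excitation_var)."""
    n = len(frame)
    r = [sum(frame[i] * frame[i + k] for i in range(n - k)) / n
         for k in range(order + 1)]
    a, err = [0.0] * order, r[0] + 1e-12
    for m in range(order):
        k_m = (r[m + 1] - sum(a[k] * r[m - k] for k in range(m))) / err
        prev = a[:m]
        for k in range(m):
            a[k] = prev[k] - k_m * prev[m - 1 - k]
        a[m] = k_m
        err *= 1.0 - k_m * k_m
    return a, err


def kalman_frame(noisy, a, q, r):
    """One Kalman pass for x[n] = sum_k a[k] x[n-1-k] + w[n], y[n] = x[n] + v[n],
    with var(w) = q and var(v) = r. The state holds the last p speech samples."""
    p = len(a)
    # Companion-form transition matrix: first row holds the LPCs.
    F = [list(a)] + [[1.0 if j == i else 0.0 for j in range(p)]
                     for i in range(p - 1)]
    s = [0.0] * p
    P = [[q if i == j else 0.0 for j in range(p)] for i in range(p)]
    out = []
    for y in noisy:
        # Predict: s = F s, P = F P F^T + Q (Q is nonzero only at [0][0]).
        s = [sum(F[i][j] * s[j] for j in range(p)) for i in range(p)]
        FP = [[sum(F[i][k] * P[k][j] for k in range(p)) for j in range(p)]
              for i in range(p)]
        P = [[sum(FP[i][k] * F[j][k] for k in range(p)) for j in range(p)]
             for i in range(p)]
        P[0][0] += q
        # Update with observation matrix H = [1, 0, ..., 0].
        gain = [P[i][0] / (P[0][0] + r) for i in range(p)]
        innov = y - s[0]
        s = [s[i] + gain[i] * innov for i in range(p)]
        row0 = P[0][:]
        P = [[P[i][j] - gain[i] * row0[j] for j in range(p)] for i in range(p)]
        out.append(s[0])
    return out


def iterative_kalman(noisy, order, noise_var, max_iter=3, tol=1e-4):
    """Alternate LPC re-estimation and Kalman filtering on one frame."""
    est = list(noisy)
    for _ in range(max_iter):
        a, q = lpc(est, order)  # parameters from the latest estimate
        new = kalman_frame(noisy, a, q, noise_var)
        change = (sum((u - v) ** 2 for u, v in zip(new, est))
                  / (sum(v * v for v in est) + 1e-12))
        est = new
        if change < tol:  # treated here as "converged"
            break
    return est
```

The iteration cap matters in this kind of scheme: the first pass estimates the LPCs from noisy speech, later passes refine them, and running too many passes can over-smooth the output.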
To further improve the speech enhancement performance, a sub-band iterative
Kalman filter based SE method is also proposed as the third approach. A wavelet
filter-bank is first used to decompose the noisy speech into a number of sub-bands.
To achieve the best trade-off among noise reduction, speech intelligibility and
computational complexity, a partial reconstruction scheme based on consecutive mean
squared error (CMSE) is proposed to synthesize the low-frequency (LF) and
high-frequency (HF) sub-bands such that the iterative KF is applied only to the partially
reconstructed HF sub-band speech. Finally, the enhanced HF sub-band speech is
combined with the partially reconstructed LF sub-band speech to reconstruct the
full-band enhanced speech.
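The split, process-one-band, and recombine structure can be sketched with a one-level Haar filter bank. This is an assumption-laden stand-in: the abstract does not name the wavelet or the number of levels, the CMSE-based partial reconstruction is not reproduced, and `enhance_hf` is a hypothetical placeholder for whatever enhancer (such as an iterative KF) is applied to the HF band.

```python
import math

_S = math.sqrt(0.5)  # orthonormal Haar scaling factor


def haar_analysis(x):
    """Split an even-length signal into half-rate LF and HF bands."""
    low = [_S * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]
    high = [_S * (x[i] - x[i + 1]) for i in range(0, len(x) - 1, 2)]
    return low, high


def haar_synthesis(low, high):
    """Inverse of haar_analysis: interleave the reconstructed sample pairs."""
    out = []
    for lo, hi in zip(low, high):
        out.extend((_S * (lo + hi), _S * (lo - hi)))
    return out


def enhance_subband(noisy, enhance_hf):
    """Process only the HF band, then recombine with the untouched LF band."""
    low, high = haar_analysis(noisy)
    return haar_synthesis(low, enhance_hf(high))
```

Passing the identity function as `enhance_hf` returns the input up to rounding, which is a quick check that the filter bank itself introduces no distortion.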
Experimental results have shown that the proposed KF based SE methods are
capable of reducing adverse environmental noises for a wide range of input SNRs,
and the overall performance of the proposed methods in terms of different evaluation
metrics is superior to some existing state-of-the-art SE methods.