Model-Based Speech Enhancement
Abstract
A method of speech enhancement is developed that reconstructs clean speech from
a set of acoustic features using a harmonic plus noise model of speech. This is a significant
departure from traditional filtering-based methods of speech enhancement.
A major challenge with this approach is to estimate accurately the acoustic features
(voicing, fundamental frequency, spectral envelope and phase) from noisy speech.
This is achieved using maximum a-posteriori (MAP) estimation methods that operate
on the noisy speech. In each case a prior model of the relationship between the
noisy speech features and the estimated acoustic feature is required. These models
are approximated using speaker-independent GMMs of the clean speech features
that are adapted to speaker-dependent models using MAP adaptation and to noise
using the Unscented Transform.
Objective results are presented to optimise the proposed system and a set of subjective
tests compare the approach with traditional enhancement methods. Three-way
listening tests examining signal quality, background noise intrusiveness and
overall quality show the proposed system to be highly robust to noise, performing
significantly better than conventional methods of enhancement in terms of background
noise intrusiveness. However, the proposed method is shown to reduce signal
quality, with overall quality measured to be roughly equivalent to that of the Wiener
filter.
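The harmonic plus noise model named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name, the parameters, the 8 kHz sampling rate and the use of unshaped white noise are all assumptions made for illustration. A voiced frame is built as a sum of cosines at integer multiples of the fundamental frequency, with a noise term added; an unvoiced frame would pass empty amplitude and phase lists.

```python
import math
import random


def synthesize_hnm_frame(f0, amplitudes, phases, noise_gain, n_samples,
                         fs=8000, seed=0):
    """Synthesize one frame as harmonics of f0 plus white Gaussian noise.

    Hypothetical helper: amplitudes[k] and phases[k] describe harmonic k+1.
    A practical system would shape the noise with the spectral envelope;
    here the noise is left white to keep the sketch short.
    """
    rng = random.Random(seed)
    frame = []
    for n in range(n_samples):
        t = n / fs
        # Harmonic part: cosines at f0, 2*f0, 3*f0, ...
        harmonic = sum(
            a * math.cos(2.0 * math.pi * (k + 1) * f0 * t + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases))
        )
        # Noise part: scaled white Gaussian noise.
        noise = noise_gain * rng.gauss(0.0, 1.0)
        frame.append(harmonic + noise)
    return frame


# Two harmonics of a 100 Hz fundamental, 20 ms at 8 kHz, light noise.
frame = synthesize_hnm_frame(100.0, [1.0, 0.5], [0.0, 0.0], 0.05, 160)
```

In the approach the abstract describes, the amplitudes would come from the estimated spectral envelope and the voicing decision would select between harmonic and noise-only synthesis; the sketch takes those quantities as given.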
A Primal-Dual Proximal Algorithm for Sparse Template-Based Adaptive Filtering: Application to Seismic Multiple Removal
Unveiling meaningful geophysical information from seismic data requires dealing
with both random and structured "noises". As their amplitude may be
greater than that of the signals of interest (primaries), additional prior information is
especially important in performing efficient signal separation. We address here
the problem of multiple reflections, caused by wave-field bouncing between
layers. Since only approximate models of these phenomena are available, we
propose a flexible framework for time-varying adaptive filtering of seismic
signals, using sparse representations, based on inaccurate templates. We recast
the joint estimation of adaptive filters and primaries in a new convex
variational formulation. This approach allows us to incorporate plausible
knowledge about noise statistics, data sparsity and slow filter variation in
parsimony-promoting wavelet frames. The designed primal-dual algorithm solves a
constrained minimization problem that alleviates standard regularization issues
in finding hyperparameters. The approach performs well in low
signal-to-noise ratio conditions, on both simulated and
real field seismic data.
Real-time detection of auditory steady-state brainstem potentials evoked by auditory stimuli
The auditory steady-state response (ASSR) has an advantage over other hearing assessment techniques because it provides objective and frequency-specific information. The objectives are to reduce the lengthy test duration and to improve the signal detection rate and the robustness of the detection against background noise and unwanted artefacts. Two prominent state estimation techniques, the Luenberger observer and the Kalman filter, have been used in the development of the autonomous ASSR detection scheme. Both techniques can be implemented in real time; the challenges in applying them are the very poor SNR of ASSRs (which can be as low as −30 dB) and the unknown statistics of the noise. A dual-channel architecture is proposed, with one channel estimating the sinusoid and the other estimating the background noise. Simulation and experimental studies were also conducted to evaluate the performance of the developed ASSR detection scheme and to compare the new method with conventional techniques. In general, both state estimation techniques within the detection scheme produced results comparable to those of the conventional techniques, but achieved a significant reduction in measurement time in some cases. A guide is given for the determination of the observer gains, while an adaptive algorithm has been used for adjustment of the gains in the Kalman filters. In order to enhance the robustness of the ASSR detection scheme with adaptive Kalman filters against possible artefacts (outliers), a multisensory data fusion approach is used to combine both the standard mean operation and the median operation in the ASSR detection algorithm. In addition, self-tuned statistical thresholding using a regression technique is applied in the autonomous ASSR detection scheme. The scheme with adaptive Kalman filters is capable of estimating the variances of the system and background noise to improve the ASSR detection rate.
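As a rough illustration of how a Kalman filter can estimate a sinusoid buried in noise, the sketch below tracks the in-phase and quadrature amplitudes of a tone at a known frequency. It is a simplified single-channel example, not the dual-channel adaptive scheme the abstract describes: the function names, the fixed measurement noise variance, the constant-amplitude state model and all numeric values are assumptions for illustration.

```python
import math
import random


def kalman_sinusoid(samples, freq_hz, fs, meas_var, process_var=0.0):
    """Estimate (a, b) in y[n] = a*cos(w*n) + b*sin(w*n) + noise.

    Hypothetical helper. The state [a, b] is modelled as constant (or a slow
    random walk if process_var > 0); meas_var is assumed known, whereas the
    abstract's scheme estimates the noise statistics adaptively.
    """
    w = 2.0 * math.pi * freq_hz / fs
    a = b = 0.0
    # State covariance, initialised large to express an uninformative prior.
    p00, p01, p10, p11 = 100.0, 0.0, 0.0, 100.0
    for n, y in enumerate(samples):
        c, s = math.cos(w * n), math.sin(w * n)  # observation row h = [c, s]
        # Predict: the state is unchanged, the covariance grows by process_var.
        p00 += process_var
        p11 += process_var
        # Update: Kalman gain K = P h^T / (h P h^T + meas_var).
        ph0 = p00 * c + p01 * s
        ph1 = p10 * c + p11 * s
        innov_var = c * ph0 + s * ph1 + meas_var
        k0, k1 = ph0 / innov_var, ph1 / innov_var
        err = y - (a * c + b * s)
        a += k0 * err
        b += k1 * err
        # Covariance update P = (I - K h) P.
        p00 -= k0 * ph0
        p01 -= k0 * ph1
        p10 -= k1 * ph0
        p11 -= k1 * ph1
    return a, b


def noisy_sinusoid(a, b, freq_hz, fs, noise_std, n_samples, seed=0):
    """Generate a test tone in white Gaussian noise (illustrative data only)."""
    rng = random.Random(seed)
    w = 2.0 * math.pi * freq_hz / fs
    return [a * math.cos(w * n) + b * math.sin(w * n) + rng.gauss(0.0, noise_std)
            for n in range(n_samples)]


# A unit-amplitude 40 Hz tone in noise with standard deviation 3 (about -12.6 dB SNR).
samples = noisy_sinusoid(0.6, 0.8, 40.0, 1000.0, 3.0, 20000)
a_hat, b_hat = kalman_sinusoid(samples, 40.0, 1000.0, meas_var=9.0)
amplitude_hat = math.hypot(a_hat, b_hat)
```

With `process_var` left at zero the filter reduces to recursive least squares, so the estimate sharpens as more samples arrive. At the much lower SNRs mentioned in the abstract, correspondingly longer recordings would be needed for the same accuracy.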
Studies on noise robust automatic speech recognition
Noise in everyday acoustic environments such as cars, traffic environments, and cafeterias remains one of the main challenges in automatic speech recognition (ASR). As a research theme, it has received wide attention in conferences and scientific journals focused on speech technology. This article collection reviews both the classic and novel approaches suggested for noise robust ASR. The articles are literature reviews written for the spring 2009 seminar course on noise robust automatic speech recognition (course code T-61.6060) held at TKK.
Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)
The implicit objective of the biennial "international Traveling Workshop on
Interactions between Sparse models and Technology" (iTWIST) is to foster
collaboration between international scientific teams by disseminating ideas
through both specific oral/poster presentations and free discussions. For its
second edition, the iTWIST workshop took place in the medieval and picturesque
town of Namur in Belgium, from Wednesday August 27th till Friday August 29th,
2014. The workshop was conveniently located in "The Arsenal" building within
walking distance of both the hotels and the town center. iTWIST'14 gathered about
70 international participants and featured 9 invited talks, 10 oral
presentations, and 14 posters on the following themes, all related to the
theory, application and generalization of the "sparsity paradigm":
Sparsity-driven data sensing and processing; Union of low dimensional
subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph
sensing/processing; Blind inverse problems and dictionary learning; Sparsity
and computational neuroscience; Information theory, geometry and randomness;
Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?;
Sparse machine learning and inference.
Comment: 69 pages, 24 extended abstracts, iTWIST'14 website:
http://sites.google.com/site/itwist1
Multi-Condition Training for Unknown Environment Adaptation in Robust ASR Under Real Conditions
Automatic speech recognition (ASR) systems frequently work in noisy environments. As they are often trained on clean speech data, noise reduction or adaptation techniques are applied to decrease the influence of background disturbance, even in the case of unknown conditions. Speech data mixed with noise recordings from a particular environment are often used for model adaptation. This paper analyses the improvement in recognition performance obtained with such adaptation when multi-condition training data from a real environment is used for training the initial models. Although the quality of such models can decrease with the presence of noise in the training material, they are assumed to include initial information about noise and consequently to support the adaptation procedure. Experimental results show a significant improvement from the proposed training method in a robust ASR task under unknown noisy conditions. Compared with training on clean speech data, word error rate decreased by 29 % and 14 % for the non-adapted and adapted systems, respectively.
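For readers unfamiliar with the metric, word error rate (WER) is conventionally computed as the word-level edit distance between the reference and the recognised transcript, divided by the number of reference words. The sketch below shows that computation together with absolute and relative reductions. The abstract does not say which kind of reduction its 29 % and 14 % figures are, and the WER values in the example are invented for illustration; they are not results from the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Levenshtein distance over words, keeping only the previous row.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def absolute_reduction(baseline_wer, new_wer):
    """Difference in WER, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_reduction(baseline_wer, new_wer):
    """Reduction expressed as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


# Invented example values: a baseline WER of 20 % falling to 14.2 %.
wer = word_error_rate("turn the radio volume down", "turn radio volume town")
rel = relative_reduction(0.20, 0.142)
```

Reporting which of the two reductions is meant matters: the same pair of WER values gives a 5.8 point absolute reduction but a 29 % relative one.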