Studies in Signal Processing Techniques for Speech Enhancement: A comparative study
Speech enhancement is essential for suppressing background noise, increasing speech intelligibility, and reducing listening fatigue. Speech enhancement algorithms range from simple ones, such as spectral subtraction, to complex ones, such as Bayesian magnitude estimators based on Minimum Mean Square Error (MMSE) and its variants. Research is ongoing, and new algorithms continue to emerge for enhancing speech recorded against background noise in environments such as industrial sites, vehicles, and aircraft cockpits. In the aviation industry, speech enhancement plays a vital role in recovering crucial information from the pilots' conversation after an incident or accident by suppressing engine and other cockpit instrument noise. This work proposes a new approach to speech enhancement that makes use of the harmonic wavelet transform and Bayesian estimators. The performance indicators, SNR and listening, confirm that the newly modified algorithms using the harmonic wavelet transform show better results than existing methods. Further, the harmonic wavelet transform is computationally efficient and simple to implement, owing to its built-in decimation-interpolation operations, compared with the filter-bank approach to realizing sub-bands.
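For orientation, the simplest technique named in this abstract, spectral subtraction, can be sketched on a single frame as follows. This is a generic textbook illustration and not the harmonic wavelet transform method proposed by the authors. The function names, the naive DFT, the `floor` parameter, and the assumption that a noise magnitude estimate is already available are all illustrative choices. The `snr_db` helper shows one common way of computing the SNR indicator.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2)), adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part of the reconstructed frame."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def spectral_subtract(frame, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from each bin, keeping the noisy phase.

    `floor` (hypothetical parameter) keeps a small fraction of the noisy
    magnitude so that no bin ever becomes negative.
    """
    cleaned = []
    for bin_value, noise in zip(dft(frame), noise_mag):
        mag = abs(bin_value)
        new_mag = max(mag - noise, floor * mag)
        cleaned.append(cmath.rect(new_mag, cmath.phase(bin_value)))
    return idft(cleaned)


def snr_db(clean, estimate):
    """Signal-to-noise ratio in dB of `estimate` relative to `clean`."""
    signal_power = sum(s * s for s in clean)
    error_power = sum((s - e) ** 2 for s, e in zip(clean, estimate))
    return 10.0 * math.log10(signal_power / error_power)


# Toy usage: a tone at bin 2 corrupted by a weaker tone at bin 5.
n = 16
clean = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
noise = [0.5 * math.cos(2 * math.pi * 5 * t / n) for t in range(n)]
noisy = [c + v for c, v in zip(clean, noise)]
noise_mag = [abs(v) for v in dft(noise)]  # assumes the noise spectrum is known
enhanced = spectral_subtract(noisy, noise_mag)
print(snr_db(clean, noisy), snr_db(clean, enhanced))
```

In practice the noise magnitude would be estimated from speech-free frames rather than known exactly, and frames would be windowed and overlapped.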
Block-Online Multi-Channel Speech Enhancement Using DNN-Supported Relative Transfer Function Estimates
This work addresses the problem of block-online processing for multi-channel
speech enhancement. Such processing is vital in scenarios with moving speakers
and/or when very short utterances are processed, e.g., in voice assistant
scenarios. We consider several variants of a system that performs beamforming
supported by DNN-based voice activity detection (VAD) followed by
post-filtering. The speaker is targeted through estimating relative transfer
functions between microphones. Each block of the input signals is processed
independently in order to make the method applicable in highly dynamic
environments. Owing to the short length of the processed block, the statistics
required by the beamformer are estimated less precisely. The influence of this
inaccuracy is studied and compared to the processing regime when recordings are
treated as one block (batch processing). The experimental evaluation of the
proposed method is performed on the large CHiME-4 datasets and on another
dataset featuring a moving target speaker. The experiments are evaluated in terms
of objective and perceptual criteria (such as signal-to-interference ratio
(SIR) or perceptual evaluation of speech quality (PESQ), respectively).
Moreover, word error rate (WER) achieved by a baseline automatic speech
recognition system is evaluated, for which the enhancement method serves as a
front-end solution. The results indicate that the proposed method is robust
with respect to the short length of the processed block. Significant improvements
in terms of the criteria and WER are observed even for a block length of 250
ms. Comment: 10 pages, 8 figures, 4 tables. Modified version of the article
accepted for publication in IET Signal Processing journal. Original results
unchanged, additional experiments presented, refined discussion and
conclusion
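As a rough illustration of how relative transfer functions (RTFs) can steer a beamformer on a single block, the sketch below works on one frequency bin. It is not the system described in the abstract: there is no DNN-based VAD or post-filter, the RTF estimator is a plain least-squares ratio that is only exact when the block contains the target speaker alone, and the beamformer assumes spatially white noise, in which case the distortionless (MVDR) solution reduces to a normalized matched filter. The function names and the data layout are hypothetical.

```python
def estimate_rtf(block):
    """Estimate RTFs with respect to microphone 0 from one block.

    `block` is a list of frames; each frame holds one complex STFT value
    per microphone for the same frequency bin. Assumes the block is
    dominated by the target speaker (for example, as flagged by a VAD).
    """
    n_mics = len(block[0])
    ref_power = sum(abs(frame[0]) ** 2 for frame in block)
    return [sum(frame[m] * frame[0].conjugate() for frame in block) / ref_power
            for m in range(n_mics)]


def beamform(block, rtf):
    """Apply w = h / (h^H h), so that w^H h = 1 (distortionless response).

    Equivalent to MVDR only under the spatially white noise assumption.
    """
    norm = sum(abs(h) ** 2 for h in rtf)
    weights = [h / norm for h in rtf]
    return [sum(w.conjugate() * x for w, x in zip(weights, frame))
            for frame in block]


# Toy usage: three microphones, four frames, no noise.
true_rtf = [1 + 0j, 0.5 + 0.5j, -0.3j]
source = [1 + 2j, -0.5 + 0.1j, 0.3 - 0.7j, 2 + 0j]
block = [[s * h for h in true_rtf] for s in source]
output = beamform(block, estimate_rtf(block))
print(output)
```

Because every quantity is estimated from the block itself, shorter blocks yield noisier RTF estimates, which is the trade-off the abstract investigates.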
Traction force microscopy with optimized regularization and automated Bayesian parameter selection for comparing cells
Adherent cells exert traction forces onto their environment, which allows
them to migrate, to maintain tissue integrity, and to form complex
multicellular structures. This traction can be measured in a perturbation-free
manner with traction force microscopy (TFM). In TFM, traction is usually
calculated via the solution of a linear system, which is complicated by
undersampled input data, acquisition noise, and large condition numbers for
some methods. Therefore, standard TFM algorithms either employ data filtering
or regularization. However, these approaches require a manual selection of
filter or regularization parameters and consequently exhibit a substantial
degree of subjectiveness. This shortcoming is particularly serious when cells
in different conditions are to be compared because optimal noise suppression
needs to be adapted for every situation, which invariably results in systematic
errors. Here, we systematically test the performance of new methods from
computer vision and Bayesian inference for solving the inverse problem in TFM.
We compare two classical schemes, L1- and L2-regularization, with three
previously untested schemes, namely Elastic Net regularization, Proximal
Gradient Lasso, and Proximal Gradient Elastic Net. Overall, we find that
Elastic Net regularization, which combines L1 and L2 regularization,
outperforms all other methods with regard to accuracy of traction
reconstruction. Next, we develop two methods, Bayesian L2 regularization and
Advanced Bayesian L2 regularization, for automatic, optimal L2 regularization.
Using artificial data and experimental data, we show that these methods enable
robust reconstruction of traction without requiring a difficult selection of
regularization parameters specifically for each data set. Thus, Bayesian
methods can mitigate the considerable uncertainty inherent in comparing
cellular traction forces.
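To make the regularization schemes named in this abstract concrete, here is a minimal proximal-gradient sketch for an Elastic Net problem, minimizing 0.5*||Ax - b||^2 + l1*||x||_1 + 0.5*l2*||x||^2. It is a generic illustration on a small dense system, not the authors' TFM implementation. The objective scaling, the function names, the fixed iteration count, and the step size (one over the squared Frobenius norm of A, a conservative bound) are assumptions. Setting `l1 = 0` gives plain L2 regularization and setting `l2 = 0` gives the Lasso.

```python
def matvec(a, x):
    """Compute A x for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def rmatvec(a, r):
    """Compute A^T r."""
    return [sum(a[i][j] * r[i] for i in range(len(a)))
            for j in range(len(a[0]))]


def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def elastic_net(a, b, l1, l2, n_iter=500):
    """Proximal gradient descent for the Elastic Net objective above."""
    # 1 / ||A||_F^2 never exceeds 1 / lambda_max(A^T A), so the step is safe.
    step = 1.0 / sum(v * v for row in a for v in row)
    x = [0.0] * len(a[0])
    for _ in range(n_iter):
        residual = [p - q for p, q in zip(matvec(a, x), b)]
        grad = rmatvec(a, residual)
        # Gradient step on the data term, then the Elastic Net prox:
        # soft-thresholding (L1 part) followed by shrinkage (L2 part).
        x = [soft_threshold(xj - step * gj, step * l1) / (1.0 + step * l2)
             for xj, gj in zip(x, grad)]
    return x


# Toy usage with an identity system, where the solution is known in closed
# form: soft_threshold(b_i, l1) / (1 + l2) per coordinate.
a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
b = [3.0, 0.5, -2.0]
print(elastic_net(a, b, l1=1.0, l2=1.0))
```

The sketch still requires `l1` and `l2` to be chosen by hand, which is the manual step the Bayesian methods in the abstract aim to remove.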
Deep Learning for Environmentally Robust Speech Recognition: An Overview of Recent Developments
Eliminating the negative effect of non-stationary environmental noise is a
long-standing research topic for automatic speech recognition that still
remains an important challenge. Data-driven supervised approaches, including
ones based on deep neural networks, have recently emerged as potential
alternatives to traditional unsupervised approaches and with sufficient
training, can alleviate the shortcomings of the unsupervised methods in various
real-life acoustic environments. In this light, we review recently developed,
representative deep learning approaches for tackling non-stationary additive
and convolutional degradation of speech with the aim of providing guidelines
for those involved in the development of environmentally robust speech
recognition systems. We separately discuss single- and multi-channel techniques
developed for the front-end and back-end of speech recognition systems, as well
as joint front-end and back-end training frameworks.