Speech Enhancement Using Pitch Detection Approach For Noisy Environment
Acoustic mismatch between the training and testing phases severely degrades
speech recognition results. This problem has limited the development of
real-world nonspecific applications, as testing conditions are highly variant
or even unpredictable during the training process. Therefore the background
noise has to be removed from the noisy speech signal to increase the signal
intelligibility and to reduce listener fatigue. Enhancement techniques
applied as pre-processing stages to these systems remarkably improve
recognition results. In this paper, a novel approach is used to enhance the
perceived quality of the speech signal when the additive noise cannot be
directly controlled. Instead of controlling the background noise, we propose to
reinforce the speech signal so that it can be heard more clearly in noisy
environments. The subjective evaluation shows that the proposed method improves
perceptual quality of speech in various noisy environments. In some cases
speaking may be more convenient than typing, even for rapid typists: many
mathematical symbols are missing from the keyboard but can be easily spoken and
recognized. Therefore, the proposed system can be used in an application
designed for mathematical symbol recognition (especially symbols not available
on the keyboard) in schools. Comment: Pages: 06 Figures: 0
Spoofing Detection Goes Noisy: An Analysis of Synthetic Speech Detection in the Presence of Additive Noise
Automatic speaker verification (ASV) technology has recently been finding its way
to end-user applications for secure access to personal data, smart services or
physical facilities. Similar to other biometric technologies, speaker
verification is vulnerable to spoofing attacks where an attacker masquerades as
a particular target speaker via impersonation, replay, text-to-speech (TTS) or
voice conversion (VC) techniques to gain illegitimate access to the system. We
focus on TTS and VC that represent the most flexible, high-end spoofing
attacks. Most of the prior studies on synthesized or converted speech detection
report their findings using high-quality clean recordings. Meanwhile, the
performance of spoofing detectors in the presence of additive noise, an
important consideration in practical ASV implementations, remains largely
unknown. To this end, we analyze the suitability of state-of-the-art synthetic
speech detectors under additive noise with a special focus on front-end
features. Our comparison includes eight acoustic feature sets, five related to
spectral magnitude and three to spectral phase information. Our extensive
experiments on the ASVspoof 2015 corpus reveal several important findings. Firstly,
all the countermeasures break down even at relatively high signal-to-noise
ratios (SNRs) and fail to generalize to noisy conditions. Secondly, speech
enhancement is not found to be helpful. Thirdly, the GMM back-end generally outperforms
the more involved i-vector back-end. Fourthly, concerning the compared
features, the Mel-frequency cepstral coefficients (MFCCs) and subband spectral
centroid magnitude coefficients (SCMCs) perform the best on average, though the
winning method depends on SNR and noise type. Finally, a study with two score
fusion strategies shows that combining different feature-based systems improves
recognition accuracy for known and unknown attacks in both clean and noisy
conditions. Comment: 23 Pages, 7 figure
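Evaluating a detector "under additive noise" at a given SNR, as the abstract above describes, usually means scaling a noise recording before adding it to the clean signal. The following is a minimal sketch of that step only, using the Python standard library. The function names `mix_at_snr` and `measured_snr_db` and the synthetic signals are illustrative assumptions, not the corpus protocol used in the paper.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`, then add it."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Noise power required for the target SNR: P_speech / 10^(SNR_dB / 10).
    target_noise_power = p_speech / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """Compute the SNR in dB, treating (noisy - clean) as the noise component."""
    p_signal = sum(s * s for s in clean)
    p_noise = sum((y - s) ** 2 for s, y in zip(clean, noisy))
    return 10.0 * math.log10(p_signal / p_noise)


# Synthetic stand-ins (assumptions): a 200 Hz tone at 8 kHz and white Gaussian noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
noisy = mix_at_snr(speech, noise, snr_db=10.0)
```

Sweeping `snr_db` over a range of values and several noise types would give the kind of test conditions the abstract refers to.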
Speech Enhancement Modeling Towards Robust Speech Recognition System
For about four decades, human beings have been dreaming of an intelligent
machine which can master the natural speech. In its simplest form, this machine
should consist of two subsystems, namely automatic speech recognition (ASR) and
speech understanding (SU). The goal of ASR is to transcribe natural speech
while SU is to understand the meaning of the transcription. Recognizing and
understanding a spoken sentence is obviously a knowledge-intensive process,
which must take into account all variable information about the speech
communication process, from acoustics to semantics and pragmatics. While
developing an Automatic Speech Recognition System, it is observed that some
adverse conditions degrade the performance of the Speech Recognition System. In
this contribution, a speech enhancement system is introduced for enhancing speech
signals corrupted by additive noise and improving the performance of Automatic
Speech Recognizers in noisy conditions. Automatic speech recognition
experiments show that replacing noisy speech signals by the corresponding
enhanced speech signals leads to an improvement in the recognition accuracies.
The amount of improvement varies with the type of the corrupting noise. Comment: Pages: 04; Conference Proceedings International Conference on Advance
Computing (ICAC-2008), Indi
PROSE: Perceptual Risk Optimization for Speech Enhancement
The goal in speech enhancement is to obtain an estimate of clean speech
starting from the noisy signal by minimizing a chosen distortion measure, which
results in an estimate that depends on the unknown clean signal or its
statistics. Since access to such prior knowledge is limited or not possible in
practice, one has to estimate the clean signal statistics. In this paper, we
develop a new risk minimization framework for speech enhancement, in which one
optimizes an unbiased estimate of the distortion/risk instead of the actual
risk. The estimated risk is expressed solely as a function of the noisy
observations. We consider several perceptually relevant distortion measures and
develop corresponding unbiased estimates under realistic assumptions on the
noise distribution and a priori signal-to-noise ratio (SNR). Minimizing the
risk estimates gives rise to the corresponding denoisers, which are nonlinear
functions of the a posteriori SNR. Perceptual evaluation of speech quality
(PESQ), average segmental SNR (SSNR) computations, and listening tests show
that the proposed risk optimization approach employing Itakura-Saito and
weighted hyperbolic cosine distortions gives better performance than the other
distortion measures. For SNRs greater than 5 dB, the proposed approach gives
superior denoising performance over the benchmark techniques based on the
Wiener filter, log-MMSE minimization, and Bayesian nonnegative matrix
factorization.
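For readers unfamiliar with the quantities named in this abstract, the standard textbook definitions are sketched below. The symbols ($Y_k$, $S_k$, $D_k$, $\sigma_{d,k}^2$, $\xi_k$, $\gamma_k$) are assumed notation and may differ from the paper's. The Wiener gain is shown because the abstract lists the Wiener filter as a benchmark; the risk estimators proposed in the paper are not reproduced here.

```latex
% Additive-noise model in frequency bin k (assumed notation)
Y_k = S_k + D_k, \qquad \sigma_{d,k}^2 = \mathbb{E}\,|D_k|^2

% A priori and a posteriori SNR
\xi_k = \frac{\mathbb{E}\,|S_k|^2}{\sigma_{d,k}^2}, \qquad
\gamma_k = \frac{|Y_k|^2}{\sigma_{d,k}^2}

% Wiener gain applied to the noisy coefficient
\hat{S}_k = \frac{\xi_k}{1 + \xi_k}\, Y_k

% Itakura-Saito distortion between a positive quantity x and its estimate \hat{x}
d_{\mathrm{IS}}(x, \hat{x}) = \frac{x}{\hat{x}} - \ln\frac{x}{\hat{x}} - 1
```

Only $\gamma_k$ can be computed from the noisy observation and a noise-variance estimate, which is consistent with the abstract's statement that the resulting denoisers are functions of the a posteriori SNR.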
Compression, Restoration, Re-sampling, Compressive Sensing: Fast Transforms in Digital Imaging
Transform image processing methods are methods that work in domains of image
transforms, such as Discrete Fourier, Discrete Cosine, Wavelet, and the like. They
are the basic tool in image compression, in image restoration, in image
re-sampling and geometrical transformations and can be traced back to the early
1970s. The paper presents a review of these methods with emphasis on their
comparison and relationships, from the very first steps of transform image
compression methods to adaptive and local adaptive transform domain filters for
image restoration, to methods of precise image re-sampling and image
reconstruction from sparse samples and up to the "compressive sensing" approach
that has gained popularity in the last few years. The review has a tutorial
character and purpose. Comment: 41 pages, 16 figure
Enhancement of Noisy Speech exploiting a Gaussian Modeling based Threshold and a PDF Dependent Thresholding Function
This paper presents a speech enhancement method, where an adaptive threshold
is statistically determined based on Gaussian modeling of Teager energy (TE)
operated perceptual wavelet packet (PWP) coefficients of noisy speech. In order
to obtain enhanced speech, the threshold thus derived is applied to the
PWP coefficients by employing a Gaussian pdf dependent custom thresholding
function, which is designed based on a combination of modified hard and
semisoft thresholding functions. The effectiveness of the proposed method is
evaluated for car and multi-talker babble noise corrupted speech signals
through performing extensive simulations using the NOIZEUS database. The
proposed method is found to outperform some of the state-of-the-art speech
enhancement methods not only at high but also at low levels of SNR in the
sense of standard objective measures and subjective evaluations including
formal listening tests. Comment: 22 pages, 18 figures, 8 tables; submitted to EURASIP Journal on
Audio, Speech, and Music Processing. arXiv admin note: substantial text
overlap with arXiv:1802.05962; text overlap with arXiv:1802.0347
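The building blocks named in this abstract (the Teager energy operator, hard thresholding and semisoft thresholding) have well-known textbook forms, sketched below in standard-library Python. This is not the paper's Gaussian-pdf-dependent custom function or its threshold derivation, which the abstract does not spell out; the function names and the two-threshold parameters `t1` and `t2` are assumptions for illustration.

```python
import math


def teager_energy(x):
    """Discrete Teager energy psi[n] = x[n]^2 - x[n-1] * x[n+1] for interior samples."""
    return [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]


def hard_threshold(c, t):
    """Keep a coefficient unchanged if its magnitude exceeds t, otherwise zero it."""
    return c if abs(c) > t else 0.0


def semisoft_threshold(c, t1, t2):
    """Semisoft (firm) shrinkage with 0 < t1 < t2.

    Coefficients at or below t1 are zeroed, coefficients above t2 are kept,
    and those in between are scaled linearly so the mapping is continuous.
    """
    a = abs(c)
    if a <= t1:
        return 0.0
    if a > t2:
        return c
    return math.copysign(t2 * (a - t1) / (t2 - t1), c)


# For a pure sinusoid A*cos(w*n), the Teager energy is the constant A^2 * sin(w)^2.
w = 0.3
tone = [2.0 * math.cos(w * n) for n in range(50)]
energy = teager_energy(tone)
```

A hard threshold leaves surviving coefficients untouched but is discontinuous at `t`, whereas the semisoft rule trades a small bias between `t1` and `t2` for continuity, which is one common motivation for combining the two.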
Modeling of Teager Energy Operated Perceptual Wavelet Packet Coefficients with an Erlang-2 PDF for Real Time Enhancement of Noisy Speech
In this paper, for real time enhancement of noisy speech, a method of
threshold determination based on modeling of Teager energy (TE) operated
perceptual wavelet packet (PWP) coefficients of the noisy speech and noise by
an Erlang-2 PDF is presented. The proposed method is computationally much
faster than the existing wavelet packet based thresholding methods. A custom
thresholding function based on a combination of mu-law and semisoft
thresholding functions is designed and exploited to apply the statistically
derived threshold upon the PWP coefficients. The proposed custom thresholding
function works as a mu-law or a semisoft thresholding function or their
combination based on the probability of speech presence and absence in a
subband of the PWP transformed noisy speech. By using the speech files
available in the NOIZEUS database, a number of simulations are performed to
evaluate the performance of the proposed method for speech signals in the
presence of Gaussian white and street noises. The proposed method outperforms
some of the state-of-the-art speech enhancement methods both at high and low
levels of SNR in terms of standard objective measures and subjective
evaluations including formal listening tests. Comment: To appear in Digital Signal Processing, 27 pages, 19 figures, 10
table
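As background for the modeling choice named in this title, the Erlang-2 density and its basic properties are given below in standard form. The rate symbol $\lambda$ and the sample notation $x_1, \dots, x_N$ are assumed here; how the paper turns the fitted density into a threshold is not stated in the abstract and is not reproduced.

```latex
% Erlang-2 probability density with rate \lambda > 0
f(x; \lambda) = \lambda^{2}\, x\, e^{-\lambda x}, \qquad x \ge 0

% Mean and variance
\mathbb{E}[X] = \frac{2}{\lambda}, \qquad \operatorname{Var}[X] = \frac{2}{\lambda^{2}}

% Maximum-likelihood estimate of the rate from N samples
\hat{\lambda} = \frac{2N}{\sum_{i=1}^{N} x_i}
```

Because the estimate needs only the sample mean, fitting the model costs a single pass over the coefficients.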
Speech Enhancement Based on Non-stationary Noise-driven Geometric Spectral Subtraction and Phase Spectrum Compensation
In this paper, a speech enhancement method based on noise compensation
performed on short time magnitude as well as phase spectra is presented. Unlike
the conventional geometric approach (GA) to spectral subtraction (SS), here the
noise estimate to be subtracted from the noisy speech spectrum is proposed to
be determined by exploiting the low frequency regions of the current frame of noisy
speech rather than depending only on the initial silence frames. This approach
gives the capability of tracking non-stationary noise thus resulting in a
non-stationary noise-driven geometric approach of spectral subtraction for
speech enhancement. The noise compensated magnitude spectrum from the GA step
is then recombined with the unchanged phase of the noisy speech spectrum and used in
phase compensation to obtain an enhanced complex spectrum, which is used to
produce an enhanced speech frame. Extensive simulations carried out using
speech files available in the NOIZEUS database show that the proposed method
consistently outperforms some of the recent methods of speech enhancement when
applied to noisy speech corrupted by street or babble noise at different
levels of SNR in terms of objective measures, spectrogram analysis and formal
subjective listening tests. Comment: 13 pages, 10 figures, 8 tables. arXiv admin note: substantial text
overlap with arXiv:1803.00396; text overlap with arXiv:1802.02665,
arXiv:1802.05125, arXiv:1803.0184
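The step described above, subtracting a noise estimate from the magnitude spectrum and recombining the result with the unchanged noisy phase, can be sketched for a single frame as follows. This is plain magnitude spectral subtraction with a naive DFT from the standard library, not the geometric approach, the low-frequency noise tracking or the phase compensation of the paper. The function names, the `floor` parameter and the per-bin noise estimate `noise_mag` are assumptions for illustration.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each output sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract(frame, noise_mag, floor=0.0):
    """Subtract a per-bin noise magnitude estimate and keep the noisy phase.

    `noise_mag` holds one non-negative value per DFT bin. `floor` is the
    fraction of the noisy magnitude retained when subtraction would go negative.
    """
    enhanced = []
    for y, n_mag in zip(dft(frame), noise_mag):
        mag = max(abs(y) - n_mag, floor * abs(y))
        enhanced.append(cmath.rect(mag, cmath.phase(y)))
    return idft(enhanced)


# Toy frame (assumption): 32 samples of a sinusoid with 3 cycles per frame.
frame = [math.sin(2 * math.pi * 3 * t / 32) for t in range(32)]
unchanged = spectral_subtract(frame, [0.0] * 32)
```

In practice the frames would be windowed and overlap-added, and an FFT would replace the naive transform; those parts are omitted to keep the sketch short.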
On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering
This report focuses on algorithms that perform single-channel speech
enhancement. The author of this report uses modulation-domain Kalman filtering
algorithms for speech enhancement, i.e. noise suppression and dereverberation,
in [1], [2], [3], [4] and [5]. Modulation-domain Kalman filtering can be
applied to both noise and late reverberation suppression, and in [2], [1], [3]
and [4], various model-based speech enhancement algorithms that perform
modulation-domain Kalman filtering are designed, implemented and tested. The
model-based enhancement algorithm in [2] estimates and tracks the speech phase.
The short-time-Fourier-transform-based enhancement algorithm in [5] uses the
active speech level estimator presented in [6]. This report describes how
different algorithms perform speech enhancement and the algorithms discussed in
this report are addressed to researchers interested in monaural speech
enhancement. The algorithms are composed of different processing blocks and
techniques [7]; understanding the implementation choices made during the system
design is important because this provides insights that can assist the
development of new algorithms. Index Terms - Speech enhancement,
dereverberation, denoising, Kalman filter, minimum mean squared error
estimation. Comment: 13 page
An instrumental intelligibility metric based on information theory
We propose a monaural intrusive instrumental intelligibility metric called
speech intelligibility in bits (SIIB). SIIB is an estimate of the amount of
information shared between a talker and a listener in bits per second. Unlike
existing information theoretic intelligibility metrics, SIIB accounts for
talker variability and statistical dependencies between time-frequency units.
Our evaluation shows that relative to state-of-the-art intelligibility metrics,
SIIB is highly correlated with the intelligibility of speech that has been
degraded by noise and processed by speech enhancement algorithms. Comment: Published in IEEE Signal Processing Letter