Search CORE

2,126 research outputs found

Speech Enhancement Using Pitch Detection Approach For Noisy Environment

Authors: Makhijani Rashmi; Shrawankar Urmila; Thakare V M
Publication date: 09/05/2013

Acoustic mismatch between the training and testing phases markedly degrades speech recognition results. This problem has limited the development of real-world nonspecific applications, as testing conditions are highly variable or even unpredictable during training. Therefore, background noise has to be removed from the noisy speech signal to increase intelligibility and reduce listener fatigue. Enhancement techniques applied to these systems as pre-processing stages markedly improve recognition results. In this paper, a novel approach is used to enhance the perceived quality of the speech signal when the additive noise cannot be directly controlled. Instead of controlling the background noise, we propose to reinforce the speech signal so that it can be heard more clearly in noisy environments. Subjective evaluation shows that the proposed method improves the perceptual quality of speech in various noisy environments. In some cases, speaking may be more convenient than typing, even for fast typists: many mathematical symbols are missing from the keyboard but can easily be spoken and recognized. The proposed system can therefore be used in an application for recognizing mathematical symbols (especially those not available on the keyboard) in schools.
Comment: Pages: 06, Figures: 0

arXiv.org e-Print Archive
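The entry above names pitch detection as the basis of the enhancement but does not describe the detector. As a generic illustration only, a minimal autocorrelation-based pitch (F0) estimator for a single frame is sketched below; the function name, the 60-400 Hz search range and the 0.3 voicing threshold are illustrative assumptions, not the authors' method.

```python
import math


def estimate_pitch_autocorr(frame, sample_rate, f_min=60.0, f_max=400.0, voicing=0.3):
    """Return an F0 estimate in Hz for one frame, or None if it looks unvoiced.

    Searches lags between sample_rate/f_max and sample_rate/f_min for the
    largest autocorrelation normalized by the frame energy.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]  # remove any DC offset
    energy = sum(s * s for s in x)
    if energy == 0.0:
        return None  # silent frame
    lag_min = max(1, int(sample_rate / f_max))
    lag_max = min(int(sample_rate / f_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / energy
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None or best_r < voicing:
        return None  # no strong periodicity: treat as unvoiced
    return sample_rate / best_lag


# Usage: a 50 ms, 200 Hz tone sampled at 8 kHz.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
print(estimate_pitch_autocorr(tone, fs))  # 200.0
```

Plain autocorrelation is prone to octave errors on real speech; it is shown here only because it is the simplest pitch detector to state.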

Spoofing Detection Goes Noisy: An Analysis of Synthetic Speech Detection in the Presence of Additive Noise

Authors: Hanilci Cemal; Kinnunen Tomi; Sahidullah Md; Sizov Aleksandr
Publication date: 14/09/2016

Automatic speaker verification (ASV) technology is increasingly finding its way into end-user applications for secure access to personal data, smart services or physical facilities. Like other biometric technologies, speaker verification is vulnerable to spoofing attacks, in which an attacker masquerades as a particular target speaker via impersonation, replay, text-to-speech (TTS) or voice conversion (VC) techniques to gain illegitimate access to the system. We focus on TTS and VC, which represent the most flexible, high-end spoofing attacks. Most prior studies on synthesized or converted speech detection report their findings using high-quality clean recordings. Meanwhile, the performance of spoofing detectors in the presence of additive noise, an important consideration in practical ASV implementations, remains largely unknown. To this end, we analyze the suitability of state-of-the-art synthetic speech detectors under additive noise, with a special focus on front-end features. Our comparison includes eight acoustic feature sets, five related to spectral magnitude and three to spectral phase information. Our extensive experiments on the ASVspoof 2015 corpus reveal several important findings. First, all the countermeasures break down even at relatively high signal-to-noise ratios (SNRs) and fail to generalize to noisy conditions. Second, speech enhancement is not found to be helpful. Third, the GMM back-end generally outperforms the more involved i-vector back-end. Fourth, among the compared features, Mel-frequency cepstral coefficients (MFCCs) and subband spectral centroid magnitude coefficients (SCMCs) perform best on average, although the best-performing method depends on SNR and noise type. Finally, a study with two score fusion strategies shows that combining systems based on different features improves recognition accuracy for known and unknown attacks in both clean and noisy conditions.
Comment: 23 pages, 7 figures

arXiv.org e-Print Archive
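The entry above evaluates spoofing detectors on speech corrupted by additive noise at controlled SNRs. A minimal sketch of how such noisy test material is commonly generated follows, assuming a single global SNR computed over the whole utterance; the paper's exact mixing protocol is not given here, and the function names and the deterministic stand-in "noise" are hypothetical.

```python
import math


def add_noise_at_snr(speech, noise, snr_db):
    """Return speech + g * noise, with g chosen so the global SNR equals snr_db."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    noise = noise[: len(speech)]  # trim noise to the speech length
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(v * v for v in noise) / len(noise)
    # Target noise power is p_speech / 10^(snr_db / 10); g is the amplitude scale.
    g = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + g * v for s, v in zip(speech, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB of noisy relative to clean, treating noisy - clean as noise."""
    residual = [y - s for s, y in zip(clean, noisy)]
    return 10 * math.log10(sum(s * s for s in clean) / sum(r * r for r in residual))


# Usage: a deterministic sequence stands in for a real noise recording.
speech = [math.sin(2 * math.pi * 5 * i / 100) for i in range(1000)]
noise = [((i * 7919) % 101) / 50.0 - 1.0 for i in range(1200)]
for target in (20, 10, 0):
    noisy = add_noise_at_snr(speech, noise, target)
    print(target, round(measured_snr_db(speech, noisy), 6))
```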

Speech Enhancement Modeling Towards Robust Speech Recognition System

Authors: Shrawankar Urmila; Thakare V. M.
Publication date: 07/05/2013

For about four decades, human beings have dreamed of an intelligent machine that can master natural speech. In its simplest form, such a machine should consist of two subsystems, namely automatic speech recognition (ASR) and speech understanding (SU). The goal of ASR is to transcribe natural speech, while the goal of SU is to understand the meaning of the transcription. Recognizing and understanding a spoken sentence is obviously a knowledge-intensive process, which must take into account all variable information about the speech communication process, from acoustics to semantics and pragmatics. While developing an automatic speech recognition system, it is observed that some adverse conditions degrade its performance. In this contribution, a speech enhancement system is introduced for enhancing speech signals corrupted by additive noise and improving the performance of automatic speech recognizers in noisy conditions. Automatic speech recognition experiments show that replacing noisy speech signals with the corresponding enhanced speech signals improves recognition accuracy. The amount of improvement varies with the type of corrupting noise.
Comment: Pages: 04; Conference Proceedings International Conference on Advance Computing (ICAC-2008), Indi

arXiv.org e-Print Archive

PROSE: Perceptual Risk Optimization for Speech Enhancement

Authors: Muraka Nagarjuna Reddy; Sadasivan Jishnu; Seelamantula Chandra Sekhar
Publication date: 11/10/2017

The goal in speech enhancement is to obtain an estimate of clean speech from the noisy signal by minimizing a chosen distortion measure, which results in an estimate that depends on the unknown clean signal or its statistics. Since access to such prior knowledge is limited or impossible in practice, one has to estimate the clean signal statistics. In this paper, we develop a new risk minimization framework for speech enhancement in which one optimizes an unbiased estimate of the distortion/risk instead of the actual risk. The estimated risk is expressed solely as a function of the noisy observations. We consider several perceptually relevant distortion measures and develop corresponding unbiased estimates under realistic assumptions on the noise distribution and a priori signal-to-noise ratio (SNR). Minimizing the risk estimates gives rise to the corresponding denoisers, which are nonlinear functions of the a posteriori SNR. Perceptual evaluation of speech quality (PESQ), average segmental SNR (SSNR) computations, and listening tests show that the proposed risk optimization approach employing Itakura-Saito and weighted hyperbolic cosine distortions performs better than the other distortion measures. For SNRs greater than 5 dB, the proposed approach gives superior denoising performance over the benchmark techniques based on the Wiener filter, log-MMSE minimization, and Bayesian nonnegative matrix factorization.

arXiv.org e-Print Archive
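The PROSE estimators themselves are not given in the entry above. To make "a denoiser that is a nonlinear function of the a posteriori SNR" concrete, the sketch below shows the Wiener-filter benchmark it is compared against, with the a priori SNR taken as the maximum-likelihood estimate max(gamma - 1, 0), together with the Itakura-Saito divergence on power values, one of the distortion measures named. The function names and the per-bin framing are illustrative assumptions.

```python
import math


def wiener_gain(gamma):
    """Wiener gain xi / (1 + xi), with the a priori SNR xi estimated by
    maximum likelihood as max(gamma - 1, 0) from the a posteriori SNR gamma."""
    xi = max(gamma - 1.0, 0.0)
    return xi / (1.0 + xi)


def enhance_bin(y, noise_psd):
    """Apply the Wiener gain to one complex STFT coefficient y."""
    gamma = abs(y) ** 2 / noise_psd  # a posteriori SNR of this bin
    return wiener_gain(gamma) * y


def itakura_saito(p_true, p_est):
    """Itakura-Saito divergence between two positive power values."""
    ratio = p_true / p_est
    return ratio - math.log(ratio) - 1.0


# Usage
print([round(wiener_gain(g), 3) for g in (0.5, 1.0, 2.0, 4.0, 10.0)])  # [0.0, 0.0, 0.5, 0.75, 0.9]
print(enhance_bin(2 + 0j, 1.0))  # (1.5+0j)
print(round(itakura_saito(1.0, 2.0), 4), round(itakura_saito(2.0, 1.0), 4))  # 0.1931 0.3069
```

The last line shows that the Itakura-Saito divergence is asymmetric: underestimating power costs more than overestimating it by the same factor.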

Compression, Restoration, Re-sampling, Compressive Sensing: Fast Transforms in Digital Imaging

Author: Yaroslavsky Leonid
Publication date: 27/08/2014

Transform image processing methods are methods that work in the domains of image transforms, such as the discrete Fourier, discrete cosine and wavelet transforms and the like. They are the basic tool in image compression, image restoration, image re-sampling and geometrical transformations, and can be traced back to the early 1970s. The paper presents a review of these methods with an emphasis on their comparison and relationships, from the very first steps of transform image compression methods, through adaptive and locally adaptive transform-domain filters for image restoration and methods of precise image re-sampling and image reconstruction from sparse samples, up to the "compressive sensing" approach that has gained popularity in the last few years. The review has a tutorial character and purpose.
Comment: 41 pages, 16 figures

arXiv.org e-Print Archive
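As an illustration of transform-domain compression as described in the review above, the sketch below computes an orthonormal DCT-II, keeps only the largest-magnitude coefficients, and inverts. It is one-dimensional for brevity (images would apply the DCT separably to rows and then columns), uses a direct O(N^2) evaluation rather than a fast transform, and the keep-the-largest rule is a simple choice of ours rather than a specific method from the paper.

```python
import math


def dct_ii(x):
    """Orthonormal DCT-II of a 1-D sequence (direct O(N^2) evaluation)."""
    n = len(x)
    coeffs = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        coeffs.append(scale * s)
    return coeffs


def idct_ii(c):
    """Inverse of dct_ii (the orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out


def keep_largest(coeffs, m):
    """Zero all but the m largest-magnitude coefficients."""
    order = sorted(range(len(coeffs)), key=lambda k: abs(coeffs[k]), reverse=True)
    kept = set(order[:m])
    return [c if k in kept else 0.0 for k, c in enumerate(coeffs)]


# Usage: a smooth 16-sample ramp is well approximated by a few DCT coefficients.
signal = [float(i) for i in range(16)]
c = dct_ii(signal)
for m in (1, 2, 4, 16):
    approx = idct_ii(keep_largest(c, m))
    err = math.sqrt(sum((a - b) ** 2 for a, b in zip(signal, approx)))
    print(m, round(err, 4))
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, so keeping more coefficients can never increase the error.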

Enhancement of Noisy Speech exploiting a Gaussian Modeling based Threshold and a PDF Dependent Thresholding Function

Authors: Islam Md Tauhidul; Shahnaz Celia
Publication date: 03/03/2018

This paper presents a speech enhancement method in which an adaptive threshold is statistically determined based on Gaussian modeling of Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of noisy speech. To obtain enhanced speech, the threshold thus derived is applied to the PWP coefficients by employing a Gaussian-PDF-dependent custom thresholding function, which is designed as a combination of modified hard and semisoft thresholding functions. The effectiveness of the proposed method is evaluated on speech signals corrupted by car and multi-talker babble noise through extensive simulations using the NOIZEUS database. The proposed method is found to outperform some state-of-the-art speech enhancement methods not only at high but also at low SNRs, in terms of standard objective measures and subjective evaluations including formal listening tests.
Comment: 22 pages, 18 figures, 8 tables; submitted to EURASIP Journal on Audio, Speech, and Music Processing. arXiv admin note: substantial text overlap with arXiv:1802.05962; text overlap with arXiv:1802.0347

arXiv.org e-Print Archive
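The entry above combines two standard building blocks: the Teager energy operator and semisoft thresholding. A minimal sketch of both follows, applied here to a plain sample sequence rather than to PWP coefficients. The paper's Gaussian-model threshold and its custom (modified hard plus semisoft) function are not reproduced; the boundary handling and function names are assumptions.

```python
def teager_energy(x):
    """Discrete Teager energy operator psi[n] = x[n]^2 - x[n-1] * x[n+1].

    The two end samples have no neighbour on one side, so they simply copy
    the nearest interior value (a boundary convention assumed here).
    """
    if len(x) < 3:
        raise ValueError("need at least 3 samples")
    psi = [x[n] * x[n] - x[n - 1] * x[n + 1] for n in range(1, len(x) - 1)]
    return [psi[0]] + psi + [psi[-1]]


def semisoft_threshold(c, t1, t2):
    """Semisoft (firm) thresholding of one coefficient, with 0 <= t1 < t2.

    |c| <= t1 is set to zero, |c| > t2 is kept unchanged, and values in between
    are linearly re-scaled so the function is continuous at both thresholds.
    """
    a = abs(c)
    if a <= t1:
        return 0.0
    if a > t2:
        return c
    sign = 1.0 if c > 0 else -1.0
    return sign * t2 * (a - t1) / (t2 - t1)


# Usage: for x[n] = A*cos(w*n), the TEO equals A^2 * sin(w)^2 at every sample
# (here A = 2 and w = pi/2, so every value is 4).
print(teager_energy([2, 0, -2, 0, 2, 0, -2, 0]))  # [4, 4, 4, 4, 4, 4, 4, 4]
print([semisoft_threshold(v, 1.0, 2.0) for v in (0.5, -1.5, 1.5, 3.0)])  # [0.0, -1.0, 1.0, 3.0]
```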

Modeling of Teager Energy Operated Perceptual Wavelet Packet Coefficients with an Erlang-2 PDF for Real Time Enhancement of Noisy Speech

Authors: Ahmad M. Omair; Islam Md Tauhidul; Shahnaz Celia; Zhu Wei-Ping
Publication date: 09/02/2018

In this paper, for real-time enhancement of noisy speech, a threshold determination method is presented that models the Teager energy (TE) operated perceptual wavelet packet (PWP) coefficients of the noisy speech and of the noise with an Erlang-2 PDF. The proposed method is computationally much faster than existing wavelet-packet-based thresholding methods. A custom thresholding function based on a combination of mu-law and semisoft thresholding functions is designed and used to apply the statistically derived threshold to the PWP coefficients. Depending on the probability of speech presence or absence in a subband of the PWP-transformed noisy speech, the custom function acts as a mu-law thresholding function, a semisoft thresholding function, or a combination of the two. Using speech files from the NOIZEUS database, a number of simulations are performed to evaluate the proposed method on speech signals in the presence of white Gaussian and street noise. The proposed method outperforms some state-of-the-art speech enhancement methods at both high and low SNRs in terms of standard objective measures and subjective evaluations including formal listening tests.
Comment: To appear in Digital Signal Processing, 27 pages, 19 figures, 10 tables

arXiv.org e-Print Archive
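The entry above models TE-operated PWP coefficients with an Erlang-2 PDF, f(x) = lambda^2 * x * exp(-lambda * x) for x >= 0. A minimal sketch of fitting that model follows: with shape fixed at 2, the maximum-likelihood rate is lambda = 2 / mean. How the paper turns the fitted model into a threshold is not stated in the abstract, so the quantile rule below is a hypothetical placeholder, and the sample values are made up.

```python
import math


def fit_erlang2_rate(samples):
    """Maximum-likelihood rate of an Erlang-2 (shape 2) PDF: lambda = 2 / mean."""
    mean = sum(samples) / len(samples)
    if mean <= 0:
        raise ValueError("samples must have a positive mean")
    return 2.0 / mean


def erlang2_pdf(x, lam):
    """f(x) = lam^2 * x * exp(-lam * x) for x >= 0."""
    return lam * lam * x * math.exp(-lam * x) if x >= 0 else 0.0


def erlang2_cdf(x, lam):
    """F(x) = 1 - exp(-lam * x) * (1 + lam * x) for x >= 0."""
    return 1.0 - math.exp(-lam * x) * (1.0 + lam * x) if x >= 0 else 0.0


def erlang2_quantile(p, lam, iterations=200):
    """Invert the CDF by bisection (hypothetical threshold rule: the value
    below which a fraction p of the modeled coefficients fall)."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    lo, hi = 0.0, 1.0 / lam
    while erlang2_cdf(hi, lam) < p:
        hi *= 2.0  # expand the bracket until it contains the quantile
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if erlang2_cdf(mid, lam) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Usage with made-up positive "Teager energy" magnitudes.
te_values = [0.2, 0.5, 0.9, 1.4, 0.3, 0.7, 1.1, 0.6]
lam = fit_erlang2_rate(te_values)
print(round(lam, 4), round(erlang2_quantile(0.95, lam), 4))
```

The closed-form rate estimate is one reason an Erlang-2 model can be cheap to fit per subband; no iterative optimization is needed.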

Speech Enhancement Based on Non-stationary Noise-driven Geometric Spectral Subtraction and Phase Spectrum Compensation

Authors: Hussain Ahmed Bin; Islam Md Tauhidul; Saha Udoy; Shahid K. T.; Shahnaz Celia
Publication date: 03/03/2018

In this paper, a speech enhancement method based on noise compensation performed on the short-time magnitude as well as phase spectra is presented. Unlike the conventional geometric approach (GA) to spectral subtraction (SS), here the noise estimate to be subtracted from the noisy speech spectrum is determined by exploiting the low-frequency regions of the current frame of noisy speech rather than relying only on the initial silence frames. This enables tracking of non-stationary noise, resulting in a non-stationary noise-driven geometric approach to spectral subtraction for speech enhancement. The noise-compensated magnitude spectrum from the GA step is then recombined with the unchanged phase of the noisy speech spectrum and used in phase compensation to obtain an enhanced complex spectrum, from which an enhanced speech frame is produced. Extensive simulations carried out using speech files from the NOIZEUS database show that the proposed method consistently outperforms some recent speech enhancement methods on noisy speech corrupted by street or babble noise at different SNR levels, in terms of objective measures, spectrogram analysis and formal subjective listening tests.
Comment: 13 pages, 10 figures, 8 tables. arXiv admin note: substantial text overlap with arXiv:1803.00396; text overlap with arXiv:1802.02665, arXiv:1802.05125, arXiv:1803.0184

arXiv.org e-Print Archive
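For orientation, the sketch below shows conventional power spectral subtraction on one STFT frame, including the step the entry above describes of recombining the compensated magnitude with the unchanged noisy phase. It is not the geometric-approach gain, the low-frequency noise estimator, or the phase compensation of the paper: the noise power per bin is assumed to be supplied, and the 0.01 spectral floor is an illustrative value.

```python
import cmath


def power_spectral_subtraction(noisy_bins, noise_psd, floor=0.01):
    """Basic power spectral subtraction for one STFT frame.

    noisy_bins: complex STFT coefficients of the noisy frame.
    noise_psd:  estimated noise power for each bin (same length).
    floor:      spectral floor, as a fraction of the noisy power, that keeps the
                estimated clean power positive (0.01 is an illustrative value).
    """
    enhanced = []
    for y, noise_power in zip(noisy_bins, noise_psd):
        noisy_power = abs(y) ** 2
        clean_power = max(noisy_power - noise_power, floor * noisy_power)
        # Recombine the compensated magnitude with the unchanged noisy phase.
        enhanced.append(cmath.rect(clean_power ** 0.5, cmath.phase(y)))
    return enhanced


# Usage: the second bin is dominated by noise, so it falls to the floor.
out = power_spectral_subtraction([3 + 4j, 1 + 0j], [9.0, 4.0])
print([complex(round(z.real, 6), round(z.imag, 6)) for z in out])
```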

On Single-Channel Speech Enhancement and On Non-Linear Modulation-Domain Kalman Filtering

Author: Dionelis Nikolaos
Publication date: 31/10/2018

This report focuses on algorithms that perform single-channel speech enhancement. The author of this report uses modulation-domain Kalman filtering algorithms for speech enhancement, i.e. noise suppression and dereverberation, in [1], [2], [3], [4] and [5]. Modulation-domain Kalman filtering can be applied to both noise and late reverberation suppression, and in [2], [1], [3] and [4], various model-based speech enhancement algorithms that perform modulation-domain Kalman filtering are designed, implemented and tested. The model-based enhancement algorithm in [2] estimates and tracks the speech phase. The short-time-Fourier-transform-based enhancement algorithm in [5] uses the active speech level estimator presented in [6]. This report describes how different algorithms perform speech enhancement; the algorithms discussed are addressed to researchers interested in monaural speech enhancement. The algorithms are composed of different processing blocks and techniques [7]; understanding the implementation choices made during system design is important because it provides insights that can assist the development of new algorithms.
Index Terms: speech enhancement, dereverberation, denoising, Kalman filter, minimum mean squared error estimation.
Comment: 13 pages

arXiv.org e-Print Archive
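Modulation-domain Kalman filtering, as discussed in the report above, tracks how a spectral amplitude evolves across frames within each frequency bin. The sketch below is a deliberately simplified, linear-Gaussian version for a single bin with a first-order autoregressive (AR(1)) model; it is not the non-linear algorithms of the report, and the parameters a, q and r are assumed known here, whereas a real system would have to estimate them.

```python
def kalman_ar1(observations, a, q, r, s0=0.0, p0=1.0):
    """Scalar Kalman filter for the model
        s[t] = a * s[t-1] + w[t],  w ~ N(0, q)   (AR(1) state evolution)
        y[t] = s[t] + v[t],        v ~ N(0, r)   (additive observation noise)
    Returns the filtered state estimates for each observation y[t].
    """
    s, p = s0, p0
    estimates = []
    for y in observations:
        # Predict the state and its variance from the previous frame.
        s_pred = a * s
        p_pred = a * a * p + q
        # Correct the prediction with the new observation.
        gain = p_pred / (p_pred + r)
        s = s_pred + gain * (y - s_pred)
        p = (1.0 - gain) * p_pred
        estimates.append(s)
    return estimates


# Usage: a noisy, slowly varying amplitude trajectory of one frequency bin.
observed = [1.0, 1.3, 0.8, 1.1, 1.4, 0.9, 1.2, 1.0]
print([round(v, 3) for v in kalman_ar1(observed, a=0.95, q=0.01, r=0.1)])
```

A small q relative to r makes the filter trust its AR prediction and smooth heavily; a large q makes it follow the observations closely.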

An instrumental intelligibility metric based on information theory

Authors: Hendriks Richard C.; Kleijn W. Bastiaan; Van Kuyk Steven
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 14/01/2018

We propose a monaural intrusive instrumental intelligibility metric called speech intelligibility in bits (SIIB). SIIB is an estimate of the amount of information shared between a talker and a listener, in bits per second. Unlike existing information-theoretic intelligibility metrics, SIIB accounts for talker variability and statistical dependencies between time-frequency units. Our evaluation shows that, relative to state-of-the-art intelligibility metrics, SIIB is highly correlated with the intelligibility of speech that has been degraded by noise and processed by speech enhancement algorithms.
Comment: Published in IEEE Signal Processing Letters

arXiv.org e-Print Archive
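To make the idea of "information in bits" in the entry above concrete, the toy sketch below computes the mutual information between one clean and one degraded feature sequence under a jointly Gaussian assumption, I = -0.5 * log2(1 - rho^2) bits per sample, and scales it by a hypothetical frame rate to get bits per second. This is not SIIB: it ignores talker variability and dependencies between time-frequency units, both of which SIIB accounts for.

```python
import math


def gaussian_mi_bits(x, y, max_abs_rho=0.999999):
    """Mutual information, in bits per sample, between two sequences under a
    jointly Gaussian assumption: I = -0.5 * log2(1 - rho^2).

    rho is the sample correlation coefficient; it is clipped so identical
    inputs give a large but finite value instead of infinity.
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    rho = sxy / math.sqrt(sxx * syy)
    rho = max(-max_abs_rho, min(max_abs_rho, rho))
    return -0.5 * math.log2(1.0 - rho * rho)


# Usage: y is built to have correlation 0.6 with x.
x = [1.0, -1.0, 1.0, -1.0]
z = [1.0, 1.0, -1.0, -1.0]  # uncorrelated with x
y = [0.6 * a + 0.8 * b for a, b in zip(x, z)]
frame_rate = 100  # hypothetical frames per second
print(gaussian_mi_bits(x, y) * frame_rate)  # about 32.19 bits/s under this toy model
```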