
    A Robust Noise Spectral Estimation Algorithm for Speech Enhancement in Voice Devices

    In this thesis, a new robust noise spectral estimation algorithm is proposed for single-microphone speech enhancement. The algorithm generates optimal noise spectral estimates in the Minimum Mean Square Error (MMSE) sense, based on the speech statistics in noisy environments. Compared to the widely adopted conventional noise spectral estimation method that uses a single-pole recursion, the proposed scheme is more reliable because its recursion coefficients are adaptive and MMSE-optimal. We also propose a new, accurate Resulting Signal-to-Noise Ratio (R-SNR) estimator as a quality measure for benchmarking existing noise spectral estimation techniques. This R-SNR estimator quantifies not only the residual noise but also the speech distortion, and can therefore serve as an overall measure of speech quality after noise suppression. We conduct experiments to evaluate the noise-suppression performance of our robust noise spectral estimation algorithm and compare it with two major existing noise spectral estimation methods. Through numerous simulations, we show that our noise suppression technique significantly outperforms the conventional methods in both stationary and nonstationary noise environments.
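
    To make the contrast concrete, the sketch below (not the thesis algorithm itself; the function name and the fixed `alpha` are illustrative assumptions) shows the conventional single-pole recursive noise-PSD estimate that the proposed MMSE-optimal, adaptive-coefficient scheme improves upon.

```python
import numpy as np

def recursive_noise_psd(noisy_psd, alpha=0.9):
    """Conventional single-pole recursion: N[t] = a*N[t-1] + (1-a)*|Y[t]|^2.

    noisy_psd: (frames, bins) periodogram of the noisy speech.
    alpha: fixed smoothing coefficient (the quantity the thesis makes adaptive).
    """
    noise = np.empty_like(noisy_psd)
    noise[0] = noisy_psd[0]  # initialise from the first frame
    for t in range(1, noisy_psd.shape[0]):
        # A fixed alpha over-smooths in non-stationary noise; an MMSE-optimal,
        # per-bin coefficient (as proposed in the thesis) would replace it here.
        noise[t] = alpha * noise[t - 1] + (1.0 - alpha) * noisy_psd[t]
    return noise
```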

    Blind deconvolution of medical ultrasound images: parametric inverse filtering approach

    DOI: 10.1109/TIP.2007.910179. The reconstruction of ultrasound images by means of blind deconvolution has long been recognized as one of the central problems in medical ultrasound imaging. In this paper, this problem is addressed by proposing a blind deconvolution method that is innovative in several ways. In particular, the method is based on parametric inverse filtering, whose parameters are optimized using two-stage processing. At the first stage, partial information on the point spread function is recovered; this information is subsequently used to explicitly constrain the spectral shape of the inverse filter. From this perspective, the proposed methodology can be viewed as a "hybridization" of two standard strategies in blind deconvolution, which are based on either concurrent or successive estimation of the point spread function and the image of interest. Moreover, evidence is provided that the "hybrid" approach can outperform the standard ones in a number of important practical cases. Additionally, the present study introduces a different approach to parameterizing the inverse filter: we propose to model the inverse transfer function as a member of a principal shift-invariant subspace. It is shown that such a parameterization results in considerably more stable reconstructions than standard parameterization methods. Finally, it is shown how the inverse filters designed in this way can be used to deconvolve the images in a non-blind manner so as to further improve their quality. The usefulness and practicability of all the introduced innovations are demonstrated in a series of both in silico and in vivo experiments. The proposed deconvolution algorithms are shown to improve the resolution of ultrasound images by factors of 2.24 or 6.52 (as judged by the autocorrelation criterion), depending on the type of regularization method used.
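
    As a rough illustration of the inverse-filtering idea only (not the paper's two-stage parametric method or its shift-invariant-subspace parameterization; the function name and the Tikhonov-style regularization are assumptions), a regularized frequency-domain inverse filter given a PSF estimate might look like this:

```python
import numpy as np

def regularized_inverse_filter(blurred, psf, eps=1e-2):
    """Generic frequency-domain inverse filtering given a PSF estimate.

    blurred: 2-D blurred image; psf: 2-D point spread function on the same
    support; eps: regularisation constant bounding the inverse-filter gain.
    """
    H = np.fft.fft2(np.fft.ifftshift(psf), s=blurred.shape)
    # Tikhonov-style regularised inverse; the paper instead constrains the
    # inverse filter to a principal shift-invariant subspace.
    H_inv = np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft2(np.fft.fft2(blurred) * H_inv))
```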

    Single-channel speech enhancement using implicit Wiener filter for high-quality speech communication

    Speech enables easy human-to-human communication as well as human-to-machine interaction. However, speech quality degrades due to background noise in the environment, such as drone noise embedded in speech during search and rescue operations; helicopter, airplane, and station noise similarly reduce speech quality. Speech enhancement algorithms reduce the background noise, resulting in clear, noise-free conversation. For many applications it is also necessary to process these noisy speech signals at the edge-node level. We therefore propose an implicit Wiener filter-based speech enhancement algorithm for an edge computing system. In the proposed algorithm, a first-order recursive equation is used to estimate the noise. The performance of the algorithm is evaluated on two speech utterances, one spoken by a male speaker and the other by a female speaker. Both utterances are degraded by different types of non-stationary noise (exhibition, station, drone, helicopter, and airplane) and by stationary white Gaussian noise at different signal-to-noise ratios. We further compare the proposed speech enhancement algorithm with the conventional spectral subtraction algorithm. Performance evaluations using objective speech quality measures demonstrate that the proposed algorithm outperforms spectral subtraction in estimating the clean speech from the noisy speech. Finally, we implement the proposed algorithm, alongside spectral subtraction, on the Raspberry Pi 4 Model B, a low-power edge computing device.
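
    A minimal sketch of how a Wiener-type gain could be applied per STFT frame, assuming a noise power estimate from a first-order recursive update as described above; the a-priori SNR approximation and the gain floor are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

def wiener_enhance_frame(noisy_mag, noise_psd, floor=1e-3):
    """Apply a Wiener-type gain to one STFT magnitude frame.

    noisy_mag: magnitude spectrum of the noisy frame.
    noise_psd: current per-bin noise power estimate (e.g. from a first-order
               recursive update).
    floor: lower bound on the gain to limit musical noise.
    """
    noisy_psd = noisy_mag ** 2
    snr_prior = np.maximum(noisy_psd / np.maximum(noise_psd, 1e-12) - 1.0, 0.0)
    gain = snr_prior / (snr_prior + 1.0)  # Wiener gain from the a-priori SNR
    return np.maximum(gain, floor) * noisy_mag
```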

    Speech Enhancement for Automatic Analysis of Child-Centered Audio Recordings

    Analysis of child-centred daylong naturalistic audio recordings has become a de facto research protocol in the scientific study of child language development. Researchers increasingly use these recordings to understand the linguistic environment a child encounters in her routine interactions with the world. The recordings are captured by a microphone that the child wears throughout the day and, being naturalistic, contain many unwanted sounds from everyday life that degrade the performance of speech analysis tasks. The purpose of this thesis is to investigate the utility of speech enhancement (SE) algorithms in the automatic analysis of such recordings. To this end, several classical signal processing and modern machine learning-based SE methods were employed 1) as denoisers for speech corrupted with additive noise sampled from real-life child-centred daylong recordings and 2) as front-ends for the downstream speech processing tasks of addressee classification (infant- vs. adult-directed speech) and automatic syllable count estimation. The downstream tasks were conducted on data derived from a set of geographically, culturally, and linguistically diverse child-centred daylong audio recordings. Denoising performance was evaluated through objective quality metrics (spectral distortion and instrumental intelligibility) and through downstream task performance. Finally, the objective evaluation results were compared with the downstream task results to determine whether objective metrics can serve as a reasonable proxy for selecting an SE front-end for a downstream task. The results show that a recently proposed Long Short-Term Memory (LSTM)-based progressive learning architecture provides the largest performance gains in the downstream tasks compared with the other SE methods and the baseline, while classical signal processing-based SE methods also achieve competitive performance. From the comparison of objective assessment and downstream task results, no predictive relationship between task-independent objective metrics and downstream task performance was found.
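
    As one example of a task-independent objective metric of the kind compared against downstream-task performance (not necessarily the exact spectral-distortion measure used in the thesis; the framing parameters are assumptions), a frame-averaged log-spectral distortion can be computed as follows:

```python
import numpy as np

def log_spectral_distortion(clean, enhanced, n_fft=512, hop=256, eps=1e-10):
    """Frame-averaged log-spectral distortion (dB) between clean and enhanced speech."""
    def stft_mag(x):
        frames = [x[i:i + n_fft] * np.hanning(n_fft)
                  for i in range(0, len(x) - n_fft, hop)]
        return np.abs(np.fft.rfft(np.asarray(frames), axis=1))
    C, E = stft_mag(clean), stft_mag(enhanced)
    n = min(len(C), len(E))
    # Per-frame RMS of the log-magnitude difference, averaged over frames.
    diff = 20.0 * np.log10((C[:n] + eps) / (E[:n] + eps))
    return float(np.mean(np.sqrt(np.mean(diff ** 2, axis=1))))
```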

    Speech Enhancement By Exploiting The Baseband Phase Structure Of Voiced Speech For Effective Non-Stationary Noise Estimation

    Speech enhancement is one of the most important and challenging problems in speech communication and signal processing. It aims to minimize the effect of additive noise on the quality and intelligibility of the speech signal. Speech quality measures how much noise remains after processing and how pleasant the resulting speech sounds, while intelligibility refers to how accurately the speech can be understood. Speech enhancement algorithms are designed to remove the additive noise with minimum speech distortion. The task is challenging because little is known about the corrupting noise; hence the most challenging step is to estimate the noise that degrades the speech. Several approaches have been adopted for noise estimation, falling mainly into two categories: single-channel and multi-channel algorithms, and speech enhancement algorithms are accordingly classified as single- or multi-channel enhancement algorithms. In this thesis, speech enhancement is studied in the acoustic and modulation domains, with enhancement of both amplitude and phase. We propose a noise estimation technique based on spectral sparsity, detected using the harmonic property of voiced segments of speech. We estimate the frame-to-frame phase difference of the clean speech from the available corrupted speech, and use this estimate to detect noise-only frequency bins even in voiced frames. This yields better noise estimates for highly non-stationary noises such as babble, restaurant, and subway noise. The noise estimate, together with the phase difference as an additional prior, is used to extend the standard spectral subtraction algorithm. We also verify the effectiveness of this noise estimation technique when used with the Minimum Mean Squared Error Short-Time Spectral Amplitude Estimator (MMSE STSA) speech enhancement algorithm. The combination of MMSE STSA and spectral subtraction results in a further improvement of speech quality.
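
    For reference, the standard spectral subtraction rule that the thesis extends can be sketched as below; the over-subtraction factor and spectral floor are generic illustrative parameters, and the noise magnitude would come from the phase-difference-based detection of noise-only bins described above.

```python
import numpy as np

def spectral_subtraction(noisy_mag, noise_mag, over_sub=1.0, floor=0.02):
    """Standard magnitude spectral subtraction for one STFT frame.

    noisy_mag / noise_mag: magnitude spectra of the noisy frame and of the
    current noise estimate.
    """
    clean_mag = noisy_mag - over_sub * noise_mag
    # Spectral floor prevents negative magnitudes and reduces musical noise.
    return np.maximum(clean_mag, floor * noisy_mag)
```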

    Speech enhancement by modeling of stationary time-frequency regions

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1999. Includes bibliographical references (leaf 65). By Rubén E. Galarza. S.M.

    Filtering, smoothing, and prediction using a control-loop spectral factorization method for coloured noise

    A method for the linear least-squares estimation of random signals contaminated with random noise is presented that uses a new method of spectral factorization. It is shown that the optimal filter can be written entirely in terms of the two spectral factors of signal-plus-noise and noise alone, and that it can be applied to the general case of coloured and white additive noise. The spectral factorization method itself is novel and is based on control-system methodology.
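
    For orientation, the familiar non-causal Wiener solution in terms of the signal and noise power spectra is sketched below; the paper's contribution, which this sketch does not attempt to reproduce, is expressing the optimal estimator through spectral factors obtained with a control-loop factorization method.

```python
import numpy as np

def noncausal_wiener(signal_psd, noise_psd):
    """Non-causal Wiener filter H = S_s / (S_s + S_n) on a frequency grid.

    signal_psd, noise_psd: sampled power spectral densities of the signal and
    of the (possibly coloured) additive noise.
    """
    total = signal_psd + noise_psd
    return signal_psd / np.maximum(total, 1e-12)
```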

    Deep Neural Networks for Speech Enhancement in Complex-Noisy Environments

    In this paper, we consider the problem of speech enhancement in conditions similar to real-world environments, where several complex noise sources simultaneously degrade the quality and intelligibility of a target speech signal. The existing speech enhancement literature principally focuses on a single noise source in the mixture signal, whereas in real-world situations we generally must improve the quality and intelligibility of speech with which various complex stationary and non-stationary noise sources are simultaneously mixed. Here, we apply deep learning to speech enhancement in complex-noisy environments, using the ideal binary mask (IBM) as a binary classification target for deep neural networks (DNNs). The IBM is used as the target function during training, and the trained DNNs estimate the IBM during the enhancement stage; the estimated mask is then applied to the complex-noisy mixtures to recover the target speech. The mean square error (MSE) is used as the objective cost function across training epochs. Experimental results at different input signal-to-noise ratios (SNRs) show that DNN-based complex-noisy speech enhancement outperforms the competing methods in terms of speech quality, as measured by the perceptual evaluation of speech quality (PESQ), segmental signal-to-noise ratio (SNRSeg), log-likelihood ratio (LLR), and weighted spectral slope (WSS). Moreover, short-time objective intelligibility (STOI) scores confirm the improved speech intelligibility.
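
    A minimal sketch of the IBM training target and its application to the noisy mixture, assuming (frames × bins) power spectrograms; the 0 dB local SNR criterion is an illustrative choice, and at enhancement time the DNN output replaces the oracle mask.

```python
import numpy as np

def ideal_binary_mask(clean_psd, noise_psd, lc_db=0.0):
    """IBM target: 1 where the local SNR exceeds the criterion, else 0.

    clean_psd, noise_psd: (frames, bins) power spectrograms, available only
    during training.
    """
    local_snr_db = 10.0 * np.log10(np.maximum(clean_psd, 1e-12)
                                   / np.maximum(noise_psd, 1e-12))
    return (local_snr_db > lc_db).astype(np.float32)

def apply_mask(noisy_stft, mask):
    """Apply the (estimated) binary mask to the complex noisy STFT."""
    return noisy_stft * mask
```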

    Convolutional Deblurring for Natural Imaging

    In this paper, we propose a novel design of image deblurring in the form of one-shot convolution filtering that can be directly convolved with naturally blurred images for restoration. Optical blurring is a common drawback in many imaging applications that suffer from optical imperfections. Numerous deconvolution methods blindly estimate the blur in either inclusive or exclusive form, but they are practically challenging due to their high computational cost and low reconstruction quality. High accuracy and high speed are both prerequisites for high-throughput imaging platforms in digital archiving, where deblurring is required after image acquisition and before images are stored, previewed, or processed for high-level interpretation. On-the-fly correction of such images is therefore important to avoid time delays, limit computational expense, and increase perceived image quality. We bridge this gap by synthesizing a deconvolution kernel as a linear combination of Finite Impulse Response (FIR) even-derivative filters that can be directly convolved with blurry input images to boost the frequency fall-off of the Point Spread Function (PSF) associated with the optical blur. We employ a Gaussian low-pass filter to decouple the image denoising problem from image edge deblurring. Furthermore, we propose a blind approach to estimate the PSF statistics for the Gaussian and Laplacian models that are common in many imaging pipelines. Thorough experiments are designed to test and validate the efficiency of the proposed method using 2054 naturally blurred images across six imaging applications and seven state-of-the-art deconvolution methods.

    Comment: 15 pages, for publication in IEEE Transactions on Image Processing.
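
    A highly simplified sketch of one-shot deblurring with an even-derivative FIR kernel (identity minus a weighted discrete Laplacian); the kernel size, weight, and boundary handling are illustrative assumptions, not the paper's PSF-statistics-driven design.

```python
import numpy as np
from scipy.ndimage import convolve

def even_derivative_sharpen(blurred, weight=0.8):
    """One-shot deblurring by convolving with k = delta - weight * laplacian."""
    delta = np.zeros((3, 3))
    delta[1, 1] = 1.0
    laplacian = np.array([[0.0,  1.0, 0.0],
                          [1.0, -4.0, 1.0],
                          [0.0,  1.0, 0.0]])
    kernel = delta - weight * laplacian  # boosts the high-frequency fall-off
    return convolve(blurred, kernel, mode='nearest')
```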