Compressive speech enhancement using semi-soft thresholding and improved threshold estimation
Compressive speech enhancement is based on the compressive sensing (CS) sampling theory and utilizes the sparsity of the signal for its enhancement. To improve the performance of the discrete wavelet transform (DWT) basis-function-based compressive speech enhancement algorithm, this study presents a semi-soft thresholding approach with improved threshold estimation and threshold-rescaling parameters. The semi-soft thresholding approach utilizes two thresholds: one is an improved universal threshold, and the other is calculated from the initial silence region of the signal. This study suggests that thresholding should be applied to both detail coefficients and approximation coefficients to remove noise effectively. The performance of the hard, soft, garrote and semi-soft thresholding approaches is compared based on objective quality and speech intelligibility measures. The normalized covariance measure is introduced as an effective intelligibility measure, as it correlates strongly with the intelligibility of the speech signal. A visual inspection of the output signal is used to verify the results. Experiments were conducted on the noisy speech corpus (NOIZEUS) speech database. The experimental results indicate that the proposed method of semi-soft thresholding using improved threshold estimation provides better enhancement than the other thresholding approaches.
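The two-threshold rule the abstract describes can be sketched as standard semi-soft (firm) thresholding; the paper's specific threshold estimates (the improved universal threshold and the silence-region estimate) are not reproduced here, so `t1` and `t2` are assumed given:

```python
import numpy as np

def semi_soft_threshold(coeffs, t1, t2):
    """Semi-soft (firm) thresholding with lower threshold t1 and upper
    threshold t2 (t1 < t2):

      |c| <= t1       -> 0 (treated as noise)
      t1 < |c| <= t2  -> shrunk linearly towards zero
      |c| > t2        -> kept unchanged
    """
    c = np.asarray(coeffs, dtype=float)
    mag = np.abs(c)
    out = np.zeros_like(c)
    mid = (mag > t1) & (mag <= t2)
    out[mid] = np.sign(c[mid]) * t2 * (mag[mid] - t1) / (t2 - t1)
    keep = mag > t2
    out[keep] = c[keep]
    return out
```

In a DWT-based pipeline this rule would be applied to both the detail and approximation coefficients, as the abstract recommends, before inverting the transform.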
Enhanced Compressive Wideband Frequency Spectrum Sensing for Dynamic Spectrum Access
Wideband spectrum sensing detects unused spectrum holes for dynamic
spectrum access (DSA). The prohibitively high sampling rate required is the main problem. Compressive
sensing (CS) can reconstruct sparse signal with much fewer randomized samples
than Nyquist sampling with high probability. Since surveys show that the
monitored signal is sparse in the frequency domain, CS can relieve the sampling
burden. Random samples can be obtained by the analog-to-information converter.
Signal recovery can be formulated as an L0 norm minimization and a linear
measurement fitting constraint. In DSA, the static spectrum allocation of
primary radios means the bounds between different types of primary radios are
known in advance. To incorporate this a priori information, we divide the whole
spectrum into subsections according to the spectrum allocation policy. In the
new optimization model, the minimization of the L2 norm of each subsection is
used to encourage the cluster distribution locally, while the L0 norm of the L2
norms is minimized to give sparse distribution globally. Because the L0/L2
optimization is not convex, an iteratively re-weighted L1/L2 optimization is
proposed to approximate it. Simulations demonstrate that the proposed method
outperforms others in accuracy, denoising ability, and related metrics.
Comment: 23 pages, 6 figures, 4 tables. arXiv admin note: substantial text
overlap with arXiv:1005.180
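The iteratively re-weighted L1/L2 surrogate for the non-convex L0/L2 objective can be sketched with proximal-gradient steps; this is a generic reweighted group-soft-thresholding sketch, not the paper's exact solver, and the subsection boundaries (`groups`) and penalty `lam` are assumed inputs:

```python
import numpy as np

def reweighted_group_recovery(A, y, groups, lam=0.1, n_reweight=5,
                              n_inner=100, eps=1e-3):
    """Sketch of iteratively re-weighted L1/L2 recovery.

    Minimises ||A x - y||^2 + lam * sum_g w_g ||x_g||_2 by proximal
    gradient (group soft-thresholding on each spectrum subsection),
    then updates w_g = 1 / (||x_g|| + eps) so the weighted
    L1-of-L2-norms surrogate approaches the L0-of-L2-norms objective:
    clustered energy within a subsection, sparsity across subsections.
    """
    m, n = A.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant
    w = np.ones(len(groups))
    for _ in range(n_reweight):
        for _ in range(n_inner):
            g = x - step * A.T @ (A @ x - y)      # gradient step on the fit
            for i, idx in enumerate(groups):      # group prox per subsection
                norm = np.linalg.norm(g[idx])
                shrink = max(0.0, 1.0 - step * lam * w[i] / (norm + 1e-12))
                x[idx] = shrink * g[idx]
        w = np.array([1.0 / (np.linalg.norm(x[idx]) + eps) for idx in groups])
    return x
```

On a small synthetic problem with one active subsection, the recovered energy concentrates in that subsection while the reweighting drives the inactive subsections towards zero.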
A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications
Auditory models are commonly used as feature extractors for automatic
speech-recognition systems or as front-ends for robotics, machine-hearing and
hearing-aid applications. Although auditory models can capture the biophysical
and nonlinear properties of human hearing in great detail, these biophysical
models are computationally expensive and cannot be used in real-time
applications. We present a hybrid approach where convolutional neural networks
are combined with computational neuroscience to yield a real-time end-to-end
model for human cochlear mechanics, including level-dependent filter tuning
(CoNNear). The CoNNear model was trained on acoustic speech material and its
performance and applicability were evaluated using (unseen) sound stimuli
commonly employed in cochlear mechanics research. The CoNNear model accurately
simulates human cochlear frequency selectivity and its dependence on sound
intensity, an essential quality for robust speech intelligibility at negative
speech-to-background-noise ratios. The CoNNear architecture is based on
parallel and differentiable computations and has the power to achieve real-time
human performance. These unique CoNNear features will enable the next
generation of human-like machine-hearing applications.
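The parallel, differentiable encoder-decoder shape the abstract describes can be illustrated with a toy numpy sketch; the actual CoNNear layer counts, filter lengths, and activations are from the paper and are not reproduced here, so all shapes and weights below are hypothetical:

```python
import numpy as np

def conv1d(x, kernels, stride=2):
    """Strided valid 1-D convolution with tanh nonlinearity.
    x: (in_ch, T); kernels: (out_ch, in_ch, K)."""
    out_ch, in_ch, K = kernels.shape
    n_out = (x.shape[1] - K) // stride + 1
    out = np.zeros((out_ch, n_out))
    for t in range(n_out):
        window = x[:, t * stride : t * stride + K]          # (in_ch, K)
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.tanh(out)

def upsample(x, factor=2):
    """Nearest-neighbour upsampling along time: the decoder's expansion step."""
    return np.repeat(x, factor, axis=1)

# Toy encoder-decoder shape: waveform -> strided-conv encoder ->
# upsampled decoder -> one output channel per (hypothetical) cochlear section.
rng = np.random.default_rng(1)
wave = rng.standard_normal((1, 128))                         # mono waveform
enc = conv1d(wave, rng.standard_normal((8, 1, 8)) * 0.1, stride=2)
dec = conv1d(upsample(enc), rng.standard_normal((16, 8, 8)) * 0.1, stride=1)
```

Because every operation here is a convolution or an elementwise nonlinearity, the whole map is differentiable and parallelizable across time, which is the property the abstract credits for real-time performance.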
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem by modeling the
acoustics of reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustics from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting joint
sparsity model formulated upon spatio-spectral sparsity of concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering of the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.
Comment: 31 pages
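The "unique map" from early image sources to room geometry rests on the image-source model; the paper localizes those images by sparse approximation, but the underlying geometric map can be sketched under a shoebox-room assumption (the function name and room parametrization below are illustrative):

```python
import numpy as np

def first_order_images(src, room):
    """First-order image sources of `src` in a shoebox room
    [0, Lx] x [0, Ly] x [0, Lz].

    Mirroring the source across each of the six walls gives the six
    first-order reflection positions; conversely, recovering these image
    positions from the early support of the room impulse response pins
    down the wall locations, i.e. the room geometry.
    """
    src = np.asarray(src, dtype=float)
    room = np.asarray(room, dtype=float)
    images = []
    for axis in range(3):
        lo = src.copy()
        lo[axis] = -src[axis]                   # mirror across wall at 0
        hi = src.copy()
        hi[axis] = 2 * room[axis] - src[axis]   # mirror across wall at L
        images.append(lo)
        images.append(hi)
    return np.array(images)
```

Higher-order images follow by mirroring recursively, which is why identifying only the early (first-order) support already determines the geometry.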