    Compressive speech enhancement using semi-soft thresholding and improved threshold estimation

    Compressive speech enhancement is based on compressive sensing (CS) sampling theory and exploits the sparsity of the signal for its enhancement. To improve the performance of the discrete wavelet transform (DWT) basis-function-based compressive speech enhancement algorithm, this study presents a semi-soft thresholding approach with improved threshold estimation and threshold-rescaling parameters. The semi-soft thresholding approach uses two thresholds: one is an improved universal threshold, and the other is calculated from the initial silence region of the signal. The study argues that thresholding should be applied to both the detail and the approximation coefficients to remove noise effectively. The hard, soft, garrote, and semi-soft thresholding approaches are compared using objective quality and speech intelligibility measures. The normalized covariance measure is introduced as an effective intelligibility measure because it correlates strongly with the intelligibility of the speech signal. Visual inspection of the output signal is used to verify the results. Experiments were conducted on the NOIZEUS noisy speech corpus. The experimental results indicate that the proposed semi-soft thresholding with improved threshold estimation provides better enhancement than the other thresholding approaches.
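
    The semi-soft (firm) rule zeroes coefficients below a lower threshold, keeps coefficients above an upper threshold, and shrinks those in between linearly. A minimal Python sketch of this idea follows, assuming the pywt package; the silence-region and universal-threshold estimates, the db8 wavelet, and the function names (semi_soft, denoise) are generic stand-ins for illustration, not the paper's improved estimators.

```python
# Hedged sketch of semi-soft (firm) thresholding applied to DWT coefficients.
# The paper's exact threshold-estimation rules are not reproduced here; the
# universal and silence-region thresholds below are generic stand-ins.
import numpy as np
import pywt

def semi_soft(c, t1, t2):
    """Firm/semi-soft rule: zero below t1, identity above t2, linear in between."""
    out = np.where(np.abs(c) <= t1, 0.0, c)
    mid = (np.abs(c) > t1) & (np.abs(c) <= t2)
    return np.where(mid, np.sign(c) * t2 * (np.abs(c) - t1) / (t2 - t1), out)

def denoise(noisy, fs, wavelet="db8", level=4, silence_ms=100):
    coeffs = pywt.wavedec(noisy, wavelet, level=level)
    # Lower threshold from the initial silence region (assumed noise-only).
    n_sil = int(fs * silence_ms / 1000)
    t1 = np.std(noisy[:n_sil])
    # Upper threshold: universal threshold with a MAD noise estimate from the
    # finest detail band, kept strictly above t1.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    t2 = max(sigma * np.sqrt(2 * np.log(len(noisy))), 1.5 * t1)
    # Threshold approximation and detail coefficients alike, as the abstract suggests.
    coeffs = [semi_soft(c, t1, t2) for c in coeffs]
    return pywt.waverec(coeffs, wavelet)
```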

    Enhanced Compressive Wideband Frequency Spectrum Sensing for Dynamic Spectrum Access

    Wideband spectrum sensing detects unused spectrum holes for dynamic spectrum access (DSA). Its main obstacle is the prohibitively high sampling rate it requires. Compressive sensing (CS) can, with high probability, reconstruct a sparse signal from far fewer randomized samples than Nyquist sampling, and surveys show that the monitored signal is sparse in the frequency domain, so CS can relieve the sampling burden. Random samples can be obtained with an analog-to-information converter. Signal recovery can then be formulated as an L0-norm minimization subject to a linear measurement-fitting constraint. In DSA, the static spectrum allocation of primary radios means the boundaries between different types of primary radios are known in advance. To incorporate this a priori information, we divide the whole spectrum into subsections according to the spectrum allocation policy. In the resulting optimization model, minimizing the L2 norm of each subsection encourages a clustered distribution locally, while minimizing the L0 norm of these L2 norms enforces a sparse distribution globally. Because this L0/L2 optimization is not convex, an iteratively re-weighted L1/L2 optimization is proposed to approximate it. Simulations demonstrate that the proposed method outperforms alternatives in accuracy and denoising ability.
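
    To make the block-sparse formulation concrete, the sketch below implements a generic iteratively re-weighted group (L1/L2) recovery with a proximal-gradient inner loop; the measurement matrix A, the sample vector y, the sub-band index sets, and all parameter values are assumed inputs for illustration and are not taken from the paper.

```python
# Hedged sketch of iteratively re-weighted block (L1/L2) recovery for a
# compressively sampled spectrum divided into pre-defined sub-bands.
import numpy as np

def reweighted_group_recovery(A, y, bands, lam=0.1, outer=5, inner=200, eps=1e-3):
    """A: (m, n) measurement matrix, y: (m,) compressive samples,
    bands: list of index arrays, one per spectrum sub-band."""
    n = A.shape[1]
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # gradient step from spectral norm
    w = np.ones(len(bands))                      # block weights, all equal at first
    for _ in range(outer):
        for _ in range(inner):
            grad = A.T @ (A @ x - y)             # gradient of the data-fitting term
            z = x - step * grad
            # Group soft-thresholding: shrink each sub-band as a unit.
            for g, idx in enumerate(bands):
                norm_g = np.linalg.norm(z[idx])
                scale = max(0.0, 1.0 - step * lam * w[g] / (norm_g + 1e-12))
                x[idx] = scale * z[idx]
        # Re-weighting: nearly empty bands get larger weights, mimicking L0/L2.
        w = 1.0 / (np.array([np.linalg.norm(x[idx]) for idx in bands]) + eps)
    return x
```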

    A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications

    Auditory models are commonly used as feature extractors for automatic speech-recognition systems or as front-ends for robotics, machine-hearing, and hearing-aid applications. Although auditory models can capture the biophysical and nonlinear properties of human hearing in great detail, these biophysical models are computationally expensive and cannot be used in real-time applications. We present a hybrid approach in which convolutional neural networks are combined with computational neuroscience to yield a real-time end-to-end model of human cochlear mechanics, including level-dependent filter tuning (CoNNear). The CoNNear model was trained on acoustic speech material, and its performance and applicability were evaluated using (unseen) sound stimuli commonly employed in cochlear-mechanics research. The CoNNear model accurately simulates human cochlear frequency selectivity and its dependence on sound intensity, an essential quality for robust speech intelligibility at negative speech-to-background-noise ratios. The CoNNear architecture is based on parallel and differentiable computations and has the potential to achieve real-time human performance. These unique CoNNear features will enable the next generation of human-like machine-hearing applications.
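
    As an illustration of a parallel, differentiable, end-to-end cochlear model, the sketch below builds a small 1-D convolutional encoder-decoder in PyTorch that maps an audio waveform to activity across a bank of cochlear channels; the layer counts, kernel sizes, tanh non-linearities, and channel numbers are illustrative guesses, not the published CoNNear hyper-parameters.

```python
# Hedged sketch of a CoNNear-style 1-D convolutional encoder-decoder mapping an
# audio waveform to vibration across n_channels cochlear (frequency) channels.
import torch
import torch.nn as nn

class CochlearCNN(nn.Module):
    def __init__(self, n_channels=201, width=64, depth=4, kernel=15):
        super().__init__()
        enc, dec = [], []
        ch = 1
        for _ in range(depth):
            # Strided convolutions downsample the waveform (encoder).
            enc.append(nn.Conv1d(ch, width, kernel, stride=2, padding=kernel // 2))
            enc.append(nn.Tanh())
            ch = width
        for _ in range(depth - 1):
            # Transposed convolutions restore the time resolution (decoder).
            dec.append(nn.ConvTranspose1d(ch, width, kernel, stride=2,
                                          padding=kernel // 2, output_padding=1))
            dec.append(nn.Tanh())
        dec.append(nn.ConvTranspose1d(ch, n_channels, kernel, stride=2,
                                      padding=kernel // 2, output_padding=1))
        self.encoder = nn.Sequential(*enc)
        self.decoder = nn.Sequential(*dec)

    def forward(self, audio):                     # audio: (batch, 1, time)
        return self.decoder(self.encoder(audio))  # output: (batch, n_channels, time)

# Example: roughly one second of 20 kHz audio in, 201 cochlear channels out.
y = CochlearCNN()(torch.randn(1, 1, 20480))
```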

    Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings

    We tackle the multi-party speech recovery problem by modeling the acoustics of reverberant chambers. Our approach exploits structured sparsity models to perform room modeling and speech recovery. We propose a scheme for characterizing the room acoustics from the unknown competing speech sources that relies on localizing the early images of the speakers via sparse approximation of the spatial spectra of the virtual sources in a free-space model. The images are then clustered by exploiting the low-rank structure of the spectro-temporal components belonging to each source. This enables us to identify the early support of the room impulse response function and its unique mapping to the room geometry. To further resolve the ambiguity of the reflection ratios, we propose a novel formulation of the reverberation model and estimate the absorption coefficients through a convex optimization that exploits a joint sparsity model built on the spatio-spectral sparsity of the concurrent speech representation. The acoustic parameters are then incorporated to separate the individual speech signals through either structured sparse recovery or inverse filtering of the acoustic channels. Experiments conducted on real recordings demonstrate the effectiveness of the proposed approach for multi-party speech recovery and recognition.
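
    The localization step can be pictured as a joint-sparse recovery over a grid of candidate (image) source positions: all time frames should activate the same few grid points. The sketch below shows a generic L2,1-regularized proximal-gradient recovery under that assumption; the dictionary D, the observation matrix X, and the parameter values are placeholders for illustration rather than the paper's actual formulation.

```python
# Hedged sketch of joint-sparse (L2,1) recovery of a spatial spectrum over a grid
# of candidate (image) source locations, shared across time frames.
import numpy as np

def joint_sparse_spectrum(D, X, lam=0.05, iters=300):
    """D: (mics, grid) free-space steering dictionary, X: (mics, frames) observations.
    Returns S: (grid, frames), row-sparse so that active rows mark source images."""
    S = np.zeros((D.shape[1], X.shape[1]), dtype=X.dtype)
    step = 1.0 / np.linalg.norm(D, 2) ** 2
    for _ in range(iters):
        Z = S - step * D.conj().T @ (D @ S - X)    # gradient step on the data fit
        row_norms = np.linalg.norm(Z, axis=1, keepdims=True)
        shrink = np.maximum(0.0, 1.0 - step * lam / (row_norms + 1e-12))
        S = shrink * Z                             # row-wise soft-thresholding (L2,1 prox)
    return S

# Grid points whose rows carry the most energy are the estimated early source images.
```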