Compressive speech enhancement using semi-soft thresholding and improved threshold estimation
Compressive speech enhancement is based on the compressive sensing (CS) sampling theory and utilizes the sparsity of the signal for its enhancement. To improve the performance of the discrete wavelet transform (DWT) basis-function-based compressive speech enhancement algorithm, this study presents a semi-soft thresholding approach with improved threshold estimation and threshold-rescaling parameters. The semi-soft thresholding approach utilizes two thresholds: one is an improved universal threshold, and the other is calculated from the initial silence region of the signal. This study suggests that thresholding should be applied to both detail coefficients and approximation coefficients to remove noise effectively. The performance of the hard, soft, garrote and semi-soft thresholding approaches is compared based on objective quality and speech intelligibility measures. The normalized covariance measure is introduced as an effective intelligibility measure, as it correlates strongly with the intelligibility of the speech signal. A visual inspection of the output signal is used to verify the results. Experiments were conducted on the noisy speech corpus (NOIZEUS) speech database. The experimental results indicate that the proposed method of semi-soft thresholding using improved threshold estimation provides better enhancement than the other thresholding approaches.
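The two-threshold rule the abstract describes can be sketched as standard semi-soft (firm) thresholding; the paper's specific threshold estimates (the improved universal threshold and the silence-region estimate) are not reproduced here, so `t1` and `t2` are assumed given:

```python
import numpy as np

def semi_soft_threshold(coeffs, t1, t2):
    """Semi-soft (firm) thresholding with lower threshold t1 and upper
    threshold t2 (t1 < t2):

      |c| <= t1       -> 0 (treated as noise)
      t1 < |c| <= t2  -> shrunk linearly towards zero
      |c| > t2        -> kept unchanged
    """
    c = np.asarray(coeffs, dtype=float)
    mag = np.abs(c)
    out = np.zeros_like(c)
    mid = (mag > t1) & (mag <= t2)
    out[mid] = np.sign(c[mid]) * t2 * (mag[mid] - t1) / (t2 - t1)
    keep = mag > t2
    out[keep] = c[keep]
    return out
```

In a DWT-based pipeline this rule would be applied to both the detail and approximation coefficients, as the abstract recommends, before inverting the transform.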
Enhanced Compressive Wideband Frequency Spectrum Sensing for Dynamic Spectrum Access
Wideband spectrum sensing detects unused spectrum holes for dynamic
spectrum access (DSA). The prohibitively high sampling rate required is the main problem. Compressive
sensing (CS) can reconstruct sparse signal with much fewer randomized samples
than Nyquist sampling with high probability. Since surveys show that the
monitored signal is sparse in the frequency domain, CS can relieve the sampling
burden. Random samples can be obtained by the analog-to-information converter.
Signal recovery can be formulated as an L0 norm minimization and a linear
measurement fitting constraint. In DSA, the static spectrum allocation of
primary radios means the bounds between different types of primary radios are
known in advance. To incorporate this a priori information, we divide the whole
spectrum into subsections according to the spectrum allocation policy. In the
new optimization model, the minimization of the L2 norm of each subsection is
used to encourage the cluster distribution locally, while the L0 norm of the L2
norms is minimized to give sparse distribution globally. Because the L0/L2
optimization is not convex, an iteratively re-weighted L1/L2 optimization is
proposed to approximate it. Simulations demonstrate that the proposed method
outperforms others in accuracy, denoising ability, and related metrics.
Comment: 23 pages, 6 figures, 4 tables. arXiv admin note: substantial text
overlap with arXiv:1005.180
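The iteratively re-weighted L1/L2 surrogate for the non-convex L0/L2 objective can be sketched with proximal-gradient steps; this is a generic reweighted group-soft-thresholding sketch, not the paper's exact solver, and the subsection boundaries (`groups`) and penalty `lam` are assumed inputs:

```python
import numpy as np

def reweighted_group_recovery(A, y, groups, lam=0.1, n_reweight=5,
                              n_inner=100, eps=1e-3):
    """Sketch of iteratively re-weighted L1/L2 recovery.

    Minimises ||A x - y||^2 + lam * sum_g w_g ||x_g||_2 by proximal
    gradient (group soft-thresholding on each spectrum subsection),
    then updates w_g = 1 / (||x_g|| + eps) so the weighted
    L1-of-L2-norms surrogate approaches the L0-of-L2-norms objective:
    clustered energy within a subsection, sparsity across subsections.
    """
    m, n = A.shape
    x = np.zeros(n)
    step = 1.0 / np.linalg.norm(A, 2) ** 2        # 1 / Lipschitz constant
    w = np.ones(len(groups))
    for _ in range(n_reweight):
        for _ in range(n_inner):
            g = x - step * A.T @ (A @ x - y)      # gradient step on the fit
            for i, idx in enumerate(groups):      # group prox per subsection
                norm = np.linalg.norm(g[idx])
                shrink = max(0.0, 1.0 - step * lam * w[i] / (norm + 1e-12))
                x[idx] = shrink * g[idx]
        w = np.array([1.0 / (np.linalg.norm(x[idx]) + eps) for idx in groups])
    return x
```

On a small synthetic problem with one active subsection, the recovered energy concentrates in that subsection while the reweighting drives the inactive subsections towards zero.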
A convolutional neural-network model of human cochlear mechanics and filter tuning for real-time applications
Auditory models are commonly used as feature extractors for automatic
speech-recognition systems or as front-ends for robotics, machine-hearing and
hearing-aid applications. Although auditory models can capture the biophysical
and nonlinear properties of human hearing in great detail, these biophysical
models are computationally expensive and cannot be used in real-time
applications. We present a hybrid approach where convolutional neural networks
are combined with computational neuroscience to yield a real-time end-to-end
model for human cochlear mechanics, including level-dependent filter tuning
(CoNNear). The CoNNear model was trained on acoustic speech material and its
performance and applicability were evaluated using (unseen) sound stimuli
commonly employed in cochlear mechanics research. The CoNNear model accurately
simulates human cochlear frequency selectivity and its dependence on sound
intensity, an essential quality for robust speech intelligibility at negative
speech-to-background-noise ratios. The CoNNear architecture is based on
parallel and differentiable computations and has the power to achieve real-time
human performance. These unique CoNNear features will enable the next
generation of human-like machine-hearing applications.
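The parallel, differentiable encoder-decoder shape the abstract describes can be illustrated with a toy numpy sketch; the actual CoNNear layer counts, filter lengths, and activations are from the paper and are not reproduced here, so all shapes and weights below are hypothetical:

```python
import numpy as np

def conv1d(x, kernels, stride=2):
    """Strided valid 1-D convolution with tanh nonlinearity.
    x: (in_ch, T); kernels: (out_ch, in_ch, K)."""
    out_ch, in_ch, K = kernels.shape
    n_out = (x.shape[1] - K) // stride + 1
    out = np.zeros((out_ch, n_out))
    for t in range(n_out):
        window = x[:, t * stride : t * stride + K]          # (in_ch, K)
        out[:, t] = np.tensordot(kernels, window, axes=([1, 2], [0, 1]))
    return np.tanh(out)

def upsample(x, factor=2):
    """Nearest-neighbour upsampling along time: the decoder's expansion step."""
    return np.repeat(x, factor, axis=1)

# Toy encoder-decoder shape: waveform -> strided-conv encoder ->
# upsampled decoder -> one output channel per (hypothetical) cochlear section.
rng = np.random.default_rng(1)
wave = rng.standard_normal((1, 128))                         # mono waveform
enc = conv1d(wave, rng.standard_normal((8, 1, 8)) * 0.1, stride=2)
dec = conv1d(upsample(enc), rng.standard_normal((16, 8, 8)) * 0.1, stride=1)
```

Because every operation here is a convolution or an elementwise nonlinearity, the whole map is differentiable and parallelizable across time, which is the property the abstract credits for real-time performance.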
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem by modeling the
acoustics of reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustics from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting joint
sparsity model formulated upon spatio-spectral sparsity of concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering of the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.
Comment: 31 pages
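The "unique map" from early image sources to room geometry rests on the image-source model; the paper localizes those images by sparse approximation, but the underlying geometric map can be sketched under a shoebox-room assumption (the function name and room parametrization below are illustrative):

```python
import numpy as np

def first_order_images(src, room):
    """First-order image sources of `src` in a shoebox room
    [0, Lx] x [0, Ly] x [0, Lz].

    Mirroring the source across each of the six walls gives the six
    first-order reflection positions; conversely, recovering these image
    positions from the early support of the room impulse response pins
    down the wall locations, i.e. the room geometry.
    """
    src = np.asarray(src, dtype=float)
    room = np.asarray(room, dtype=float)
    images = []
    for axis in range(3):
        lo = src.copy()
        lo[axis] = -src[axis]                   # mirror across wall at 0
        hi = src.copy()
        hi[axis] = 2 * room[axis] - src[axis]   # mirror across wall at L
        images.append(lo)
        images.append(hi)
    return np.array(images)
```

Higher-order images follow by mirroring recursively, which is why identifying only the early (first-order) support already determines the geometry.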