610 research outputs found
Structured Sparsity Models for Multiparty Speech Recovery from Reverberant Recordings
We tackle the multi-party speech recovery problem through modeling the
acoustic of the reverberant chambers. Our approach exploits structured sparsity
models to perform room modeling and speech recovery. We propose a scheme for
characterizing the room acoustic from the unknown competing speech sources
relying on localization of the early images of the speakers by sparse
approximation of the spatial spectra of the virtual sources in a free-space
model. The images are then clustered exploiting the low-rank structure of the
spectro-temporal components belonging to each source. This enables us to
identify the early support of the room impulse response function and its unique
map to the room geometry. To further tackle the ambiguity of the reflection
ratios, we propose a novel formulation of the reverberation model and estimate
the absorption coefficients through a convex optimization exploiting joint
sparsity model formulated upon spatio-spectral sparsity of concurrent speech
representation. The acoustic parameters are then incorporated for separating
individual speech signals through either structured sparse recovery or inverse
filtering the acoustic channels. The experiments conducted on real data
recordings demonstrate the effectiveness of the proposed approach for
multi-party speech recovery and recognition.Comment: 31 page
Compressive speech enhancement using semi-soft thresholding and improved threshold estimation
Compressive speech enhancement is based on the compressive sensing (CS) sampling theory and utilizes the sparsity of the signal for its enhancement. To improve the performance of the discrete wavelet transform (DWT) basis-function based compressive speech enhancement algorithm, this study presents a semi-soft thresholding approach suggesting improved threshold estimation and threshold rescaling parameters. The semi-soft thresholding approach utilizes two thresholds, one threshold value is an improved universal threshold and the other is calculated based on the initial-silence-region of the signal. This study suggests that thresholding should be applied to both detail coefficients and approximation coefficients to remove noise effectively. The performances of the hard, soft, garrote and semi-soft thresholding approaches are compared based on objective quality and speech intelligibility measures. The normalized covariance measure is introduced as an effective intelligibility measure as it has a strong correlation with the intelligibility of the speech signal. A visual inspection of the output signal is used to verify the results. Experiments were conducted on the noisy speech corpus (NOIZEUS) speech database. The experimental results indicate that the proposed method of semi-soft thresholding using improved threshold estimation provides better enhancement compared to the other thresholding approaches
Compressed Sensing based Dynamic PSD Map Construction in Cognitive Radio Networks
In the context of spectrum sensing in cognitive radio networks, collaborative spectrum sensing has been proposed as a way to overcome multipath and shadowing, and hence increasing the reliability of the sensing. Due to the high amount of information to be transmitted, a dynamic compressive sensing approach is proposed to map the PSD estimate to a sparse domain which is then transmitted to the fusion center. In this regard, CRs send a compressed version of their estimated PSD to the fusion center, whose job is to reconstruct the PSD estimates of the CRs, fuse them, and make a global decision on the availability of the spectrum in space and frequency domains at a given time. The proposed compressive sensing based method considers the dynamic nature of the PSD map, and uses this dynamicity in order to decrease the amount of data needed to be transmitted between CR sensors’ and the fusion center. By using the proposed method, an acceptable PSD map for cognitive radio purposes can be achieved by only 20 % of full data transmission between sensors and master node. Also, simulation results show the robustness of the proposed method against the channel variations, diverse compression ratios and processing times in comparison with static methods
Exploiting Low-dimensional Structures to Enhance DNN Based Acoustic Modeling in Speech Recognition
We propose to model the acoustic space of deep neural network (DNN)
class-conditional posterior probabilities as a union of low-dimensional
subspaces. To that end, the training posteriors are used for dictionary
learning and sparse coding. Sparse representation of the test posteriors using
this dictionary enables projection to the space of training data. Relying on
the fact that the intrinsic dimensions of the posterior subspaces are indeed
very small and the matrix of all posteriors belonging to a class has a very low
rank, we demonstrate how low-dimensional structures enable further enhancement
of the posteriors and rectify the spurious errors due to mismatch conditions.
The enhanced acoustic modeling method leads to improvements in continuous
speech recognition task using hybrid DNN-HMM (hidden Markov model) framework in
both clean and noisy conditions, where upto 15.4% relative reduction in word
error rate (WER) is achieved
- …