131 research outputs found
Monaural speech separation using source-adapted models
We propose a model-based source separation system for use on single-channel speech mixtures where the precise source characteristics are not known a priori. We do this by representing the space of source variation with a parametric signal model based on the eigenvoice technique for rapid speaker adaptation. We present an algorithm to infer the characteristics of the sources present in a mixture, allowing for significantly improved separation performance over that obtained using unadapted source models. The algorithm is evaluated on the task defined in the 2006 Speech Separation Challenge [1] and compared with separation using source-dependent models.
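The core idea of eigenvoice adaptation is that any speaker's model can be written as a mean ("average voice") model plus a small number of weighted eigenvoice bases, so adapting to an unseen speaker reduces to inferring a few weights. The sketch below illustrates that idea only; the dimensions, the least-squares inference, and all variable names are invented for illustration and are not the paper's actual algorithm, which infers the weights from a mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: D-dim spectral features, K eigenvoice bases.
D, K = 8, 3
mu_bar = rng.normal(size=D)   # mean ("average voice") model
U = rng.normal(size=(D, K))   # eigenvoice basis (learned from training speakers)

# Simulate frames from an unseen speaker: mu_bar + U @ w_true + noise.
w_true = np.array([1.0, -0.5, 0.25])
frames = mu_bar + U @ w_true + 0.01 * rng.normal(size=(100, D))

# Infer the adaptation weights w by least squares on the frame mean --
# an illustrative stand-in for the paper's model-based inference.
w_hat, *_ = np.linalg.lstsq(U, frames.mean(axis=0) - mu_bar, rcond=None)
```

With clean observations the recovered `w_hat` is close to `w_true`, showing how a whole speaker model is captured by just K adaptation weights.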
Some Projects in Real-World Sound Analysis
A summary of work in speech separation, soundtrack classification, and music audio analysis at the Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University.
Using Speech Models for Separation
Talk based on the work of Ron Weiss and Mike Mandel, given at a special session on understanding speech in interference.
A variational EM algorithm for learning eigenvoice parameters in mixed signals
We derive an efficient learning algorithm for model-based source separation for use on single-channel speech mixtures where the precise source characteristics are not known a priori. The sources are modeled using factor-analyzed hidden Markov models (HMMs) in which source-specific characteristics are captured by an "eigenvoice" speaker subspace model. The proposed algorithm is able to learn adaptation parameters for two speech sources when only a mixture of signals is observed. We evaluate the algorithm on the 2006 Speech Separation Challenge data set and show that it is significantly faster than our earlier system at a small cost in terms of performance.
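In a factor-analyzed source model, the E-step computes a Gaussian posterior over the eigenvoice weights given an observation. The snippet below shows that standard factor-analysis E-step in isolation, as a minimal sketch: the dimensions, noise model, and the single-source (unmixed) setting are simplifying assumptions, not the paper's variational EM over mixed signals.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 8, 3                      # hypothetical feature and subspace dimensions
mu_bar = rng.normal(size=D)      # mean voice model
U = rng.normal(size=(D, K))      # eigenvoice basis
noise_var = 0.1 * np.ones(D)     # diagonal observation noise

def e_step(x):
    """One factor-analysis E-step: posterior over weights w given x,
    with prior w ~ N(0, I) and likelihood x ~ N(mu_bar + U w, diag(noise_var))."""
    Sinv = np.diag(1.0 / noise_var)
    post_cov = np.linalg.inv(np.eye(K) + U.T @ Sinv @ U)
    post_mean = post_cov @ U.T @ Sinv @ (x - mu_bar)
    return post_mean, post_cov

w_true = rng.normal(size=K)
x = mu_bar + U @ w_true + np.sqrt(noise_var) * rng.normal(size=D)
w_post, w_cov = e_step(x)
```

In the paper's setting this posterior must be computed variationally, because only the mixture of two such sources is observed.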
Learning, Using, and Adapting Models in Scene Analysis
Discusses models of source behavior as the way to conquer uncertainty in mixtures.
Combining Localization Cues and Source Model Constraints for Binaural Source Separation
We describe a system for separating multiple sources from a two-channel recording based on interaural cues and prior knowledge of the statistics of the underlying source signals. The proposed algorithm effectively combines information derived from low-level perceptual cues, similar to those used by the human auditory system, with higher-level information related to speaker identity. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is able to separate more sound sources than there are observed channels in the presence of reverberation. In simulated mixtures of speech from two and three speakers the proposed algorithm gives a signal-to-noise ratio improvement of 1.7 dB over a baseline algorithm that uses only interaural cues. Further improvement is obtained by incorporating eigenvoice speaker adaptation to enable the source model to better match the sources present in the signal. This improves performance over the baseline by 2.7 dB when the speakers used for training and testing are matched. However, the improvement is minimal when the test data is very different from that used in training.
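The EM idea behind localization-based separation can be illustrated with a much simpler stand-in: soft-cluster time-frequency bins by their interaural level difference (ILD), with the E-step producing a separation mask and the M-step refitting per-source parameters. Everything below (the two-component 1-D Gaussian model, the simulated ILD values, iteration count) is a toy assumption; the actual system jointly models level and phase differences plus source priors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated ILDs (dB) of time-frequency bins from two spatially separated sources.
ild = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

# EM for a two-component Gaussian model of ILD.
mu, var, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(30):
    # E-step: responsibility of each source for each bin (a soft mask).
    lik = pi * np.exp(-0.5 * (ild[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M-step: refit means, variances, and mixing weights.
    n = r.sum(axis=0)
    mu = (r * ild[:, None]).sum(axis=0) / n
    var = (r * (ild[:, None] - mu) ** 2) .sum(axis=0) / n
    pi = n / len(ild)

mask = r[:, 0] > 0.5  # binary mask assigning bins to source 0
```

Applying `mask` (or the soft responsibilities `r`) to the mixture spectrogram is what yields the separated signals in this style of system.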
Environmental Sound Recognition and Classification
Describes getting information out of soundtracks and environmental recordings.
Speech Separation for Recognition and Enhancement
A pitch for the significance of complex acoustic scenes ("Speech in the Wild") and the importance of thinking about ways to separate and organize them. Includes very brief reviews of separation by spatial cues, pitch, and source models.