131 research outputs found
Monaural speech separation using source-adapted models
We propose a model-based source separation system for use on single-channel speech mixtures where the precise source characteristics are not known a priori. We do this by representing the space of source variation with a parametric signal model based on the eigenvoice technique for rapid speaker adaptation. We present an algorithm to infer the characteristics of the sources present in a mixture, allowing for significantly improved separation performance over that obtained using unadapted source models. The algorithm is evaluated on the task defined in the 2006 Speech Separation Challenge [1] and compared with separation using source-dependent models.
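The core idea of eigenvoice adaptation is that any speaker's model can be written as a mean ("average voice") model plus a small number of weighted eigenvoice bases, so adapting to an unseen speaker reduces to inferring a few weights. The sketch below illustrates that idea only; the dimensions, the least-squares inference, and all variable names are invented for illustration and are not the paper's actual algorithm, which infers the weights from a mixture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: D-dim spectral features, K eigenvoice bases.
D, K = 8, 3
mu_bar = rng.normal(size=D)   # mean ("average voice") model
U = rng.normal(size=(D, K))   # eigenvoice basis (learned from training speakers)

# Simulate frames from an unseen speaker: mu_bar + U @ w_true + noise.
w_true = np.array([1.0, -0.5, 0.25])
frames = mu_bar + U @ w_true + 0.01 * rng.normal(size=(100, D))

# Infer the adaptation weights w by least squares on the frame mean --
# an illustrative stand-in for the paper's model-based inference.
w_hat, *_ = np.linalg.lstsq(U, frames.mean(axis=0) - mu_bar, rcond=None)
```

With clean observations the recovered `w_hat` is close to `w_true`, showing how a whole speaker model is captured by just K adaptation weights.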
Some Projects in Real-World Sound Analysis
A summary of work in speech separation, soundtrack classification, and music audio analysis at the Laboratory for Recognition and Organization of Speech and Audio, Department of Electrical Engineering, Columbia University.
Using Speech Models for Separation
Talk based on the work of Ron Weiss and Mike Mandel, given at a special session on understanding speech in interference.
A variational EM algorithm for learning eigenvoice parameters in mixed signals
We derive an efficient learning algorithm for model-based source separation for use on single-channel speech mixtures where the precise source characteristics are not known a priori. The sources are modeled using factor-analyzed hidden Markov models (HMMs) in which source-specific characteristics are captured by an "eigenvoice" speaker subspace model. The proposed algorithm is able to learn adaptation parameters for two speech sources when only a mixture of signals is observed. We evaluate the algorithm on the 2006 Speech Separation Challenge data set and show that it is significantly faster than our earlier system at a small cost in terms of performance.
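In a factor-analyzed source model, the E-step computes a Gaussian posterior over the eigenvoice weights given an observation. The snippet below shows that standard factor-analysis E-step in isolation, as a minimal sketch: the dimensions, noise model, and the single-source (unmixed) setting are simplifying assumptions, not the paper's variational EM over mixed signals.

```python
import numpy as np

rng = np.random.default_rng(1)
D, K = 8, 3                      # hypothetical feature and subspace dimensions
mu_bar = rng.normal(size=D)      # mean voice model
U = rng.normal(size=(D, K))      # eigenvoice basis
noise_var = 0.1 * np.ones(D)     # diagonal observation noise

def e_step(x):
    """One factor-analysis E-step: posterior over weights w given x,
    with prior w ~ N(0, I) and likelihood x ~ N(mu_bar + U w, diag(noise_var))."""
    Sinv = np.diag(1.0 / noise_var)
    post_cov = np.linalg.inv(np.eye(K) + U.T @ Sinv @ U)
    post_mean = post_cov @ U.T @ Sinv @ (x - mu_bar)
    return post_mean, post_cov

w_true = rng.normal(size=K)
x = mu_bar + U @ w_true + np.sqrt(noise_var) * rng.normal(size=D)
w_post, w_cov = e_step(x)
```

In the paper's setting this posterior must be computed variationally, because only the mixture of two such sources is observed.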
Learning, Using, and Adapting Models in Scene Analysis
Discusses models of source behavior as the way to conquer uncertainty in mixtures.
Combining Localization Cues and Source Model Constraints for Binaural Source Separation
We describe a system for separating multiple sources from a two-channel recording based on interaural cues and prior knowledge of the statistics of the underlying source signals. The proposed algorithm effectively combines information derived from low-level perceptual cues, similar to those used by the human auditory system, with higher-level information related to speaker identity. We combine a probabilistic model of the observed interaural level and phase differences with a prior model of the source statistics and derive an EM algorithm for finding the maximum likelihood parameters of the joint model. The system is able to separate more sound sources than there are observed channels in the presence of reverberation. In simulated mixtures of speech from two and three speakers the proposed algorithm gives a signal-to-noise ratio improvement of 1.7 dB over a baseline algorithm that uses only interaural cues. Further improvement is obtained by incorporating eigenvoice speaker adaptation to enable the source model to better match the sources present in the signal. This improves performance over the baseline by 2.7 dB when the speakers used for training and testing are matched. However, the improvement is minimal when the test data is very different from that used in training.
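The EM idea behind localization-based separation can be illustrated with a much simpler stand-in: soft-cluster time-frequency bins by their interaural level difference (ILD), with the E-step producing a separation mask and the M-step refitting per-source parameters. Everything below (the two-component 1-D Gaussian model, the simulated ILD values, iteration count) is a toy assumption; the actual system jointly models level and phase differences plus source priors.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated ILDs (dB) of time-frequency bins from two spatially separated sources.
ild = np.concatenate([rng.normal(-4.0, 1.0, 300), rng.normal(3.0, 1.0, 300)])

# EM for a two-component Gaussian model of ILD.
mu, var, pi = np.array([-1.0, 1.0]), np.array([1.0, 1.0]), np.array([0.5, 0.5])
for _ in range(30):
    # E-step: responsibility of each source for each bin (a soft mask).
    lik = pi * np.exp(-0.5 * (ild[:, None] - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)
    r = lik / lik.sum(axis=1, keepdims=True)
    # M-step: refit means, variances, and mixing weights.
    n = r.sum(axis=0)
    mu = (r * ild[:, None]).sum(axis=0) / n
    var = (r * (ild[:, None] - mu) ** 2) .sum(axis=0) / n
    pi = n / len(ild)

mask = r[:, 0] > 0.5  # binary mask assigning bins to source 0
```

Applying `mask` (or the soft responsibilities `r`) to the mixture spectrogram is what yields the separated signals in this style of system.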
Environmental Sound Recognition and Classification
Describes getting information out of soundtracks and environmental recordings.
Speech Separation for Recognition and Enhancement
A pitch for the significance of complex acoustic scenes ("Speech in the Wild") and the importance of thinking about ways to separate and organize them. Includes very brief reviews of separation by spatial cues, pitch, and source models.