Search CORE

40,221 research outputs found

Jointly Tracking and Separating Speech Sources Using Multiple Features and the generalized labeled multi-Bernoulli Framework

Author: Lin Shoufeng
Publication date: 16/04/2018

This paper proposes a novel joint multi-speaker tracking-and-separation method based on the generalized labeled multi-Bernoulli (GLMB) multi-target tracking filter, using sound mixtures recorded by microphones. Standard multi-speaker tracking algorithms usually track only speaker locations, so ambiguity occurs when speakers are spatially close. The proposed multi-feature GLMB tracking filter treats the set of vectors of associated speaker features (location, pitch and sound) as the multi-target multi-feature observation and characterizes the transitioning features with corresponding transition models and an overall likelihood function. It thereby jointly tracks and separates each multi-feature speaker and addresses the spatial ambiguity problem. Numerical evaluation verifies that the proposed method can correctly track the locations of multiple speakers while separating their speech signals.

arXiv.org e-Print Archive

Crossref

Sound Source Separation

Author: Evangelista G
Marchand S
Plumbley MD
Vincent E
Publication venue: 'Wiley'
Publication date: 01/01/2011

This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL Descartes

Queen Mary Research Online

Surrey Research Insight

HAL-Rennes 1

Identify, locate and separate: Audio-visual object extraction in large video collections using weak supervision

Author: Duong Ngoc
Essid Slim
Ozerov Alexey
Parekh Sanjeel
Pérez Patrick
Richard Gaël
Publication date: 07/11/2018

We tackle the problem of audiovisual scene analysis for weakly-labeled data. To this end, we build upon our previous audiovisual representation learning framework to perform object classification in noisy acoustic environments and integrate audio source enhancement capability. This is made possible by a novel use of non-negative matrix factorization for the audio modality. Our approach is founded on the multiple instance learning paradigm. Its effectiveness is established through experiments on a challenging dataset of music instrument performance videos. We also show encouraging visual object localization results.

arXiv.org e-Print Archive

HAL-Rennes 1

An adaptive stereo basis method for convolutive blind audio source separation

Author: Emmanuel Vincent
Maria G. Jafari
Mark D. Plumbley
Mike E. Davies
Samer A. Abdallah
Publication venue: 'Elsevier BV'
Publication date: 01/01/2008

NOTICE: this is the author’s version of a work that was accepted for publication in Neurocomputing. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. A definitive version was subsequently published in Neurocomputing, [71, 10-12, June 2008] DOI:neucom.2007.08.02

Crossref

UCL Discovery

Edinburgh Research Explorer

Queen Mary Research Online