Search CORE

1,365 research outputs found

Sinusoidal masks for single channel speech separation

Author: Christensen Mads Græsbøll
Jensen Søren Holdt
Mowlaee Pejman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/03/2010
Field of study

VBN

New Stategies for Single-channel Speech Separation

Author: Mowlaee Beikzadehmahalen Pejman
Publication venue: Institut for Elektroniske Systemer, Aalborg Universitet
Publication date: 01/01/2010
Field of study

VBN

Sound Source Separation

Author: Evangelista G
Marchand S
Plumbley MD
Vincent E
Publication venue: 'Wiley'
Publication date: 01/01/2011
Field of study

This is the author's accepted pre-print of the article, first published as G. Evangelista, S. Marchand, M. D. Plumbley and E. Vincent. Sound source separation. In U. Zölzer (ed.), DAFX: Digital Audio Effects, 2nd edition, Chapter 14, pp. 551-588. John Wiley & Sons, March 2011. ISBN 9781119991298. DOI: 10.1002/9781119991298.ch14file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.26file: Proof:e\EvangelistaMarchandPlumbleyV11-sound.pdf:PDF owner: markp timestamp: 2011.04.2

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

Queen Mary Research Online

Surrey Research Insight

HAL-Rennes 1

Improved single-channel speech separation using sinusoidal modeling

Author: Christensen Mads Græsbøll
Jensen Søren Holdt
Mowlaee Pejman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 14/03/2010
Field of study

VBN

Multi-Channel Masking with Learnable Filterbank for Sound Source Separation

Author: Dai Wang
Politis Archontis
Virtanen Tuomas
Publication venue
Publication date: 14/03/2023
Field of study

This work proposes a learnable filterbank based on a multi-channel masking framework for multi-channel source separation. The learnable filterbank is a 1D Conv layer, which transforms the raw waveform into a 2D representation. In contrast to the conventional single-channel masking method, we estimate a mask for each individual microphone channel. The estimated masks are then applied to the transformed waveform representation like in the traditional filter-and-sum beamforming operation. Specifically, each mask is used to multiply the corresponding channel's 2D representation, and the masked output of all channels are then summed. At last, a 1D transposed Conv layer is used to convert the summed masked signal into the waveform domain. The experimental results show our method outperforms single-channel masking with a learnable filterbank and can outperform multi-channel complex masking with STFT complex spectrum in the STGCSEN model if a learnable filterbank is transformed to a higher feature dimension. The spatial response analysis also verifies that multi-channel masking in the learnable filterbank domain has spatial selectivity

arXiv.org e-Print Archive

New Results on Single-Channel Speech Separation Using Sinusoidal Modeling

Author: Christensen Mads Græsbøll
Jensen Søren Holdt
Mowlaee Pejman
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011
Field of study

Crossref

VBN

An iterative model-based approach to cochannel speech separation

Author
Publication venue: Springer
Publication date: 26/06/2013
Field of study

Springer - Publisher Connector

High-resolution sinusoidal analysis for resolving harmonic collisions in music audio signal processing

Author: Ehmann Andreas
Publication venue
Publication date: 01/12/2011
Field of study

Many music signals can largely be considered an additive combination of multiple sources, such as musical instruments or voice. If the musical sources are pitched instruments, the spectra they produce are predominantly harmonic, and are thus well suited to an additive sinusoidal model. However, due to resolution limits inherent in time-frequency analyses, when the harmonics of multiple sources occupy equivalent time-frequency regions, their individual properties are additively combined in the time-frequency representation of the mixed signal. Any such time-frequency point in a mixture where multiple harmonics overlap produces a single observation from which the contributions owed to each of the individual harmonics cannot be trivially deduced. These overlaps are referred to as overlapping partials or harmonic collisions. If one wishes to infer some information about individual sources in music mixtures, the information carried in regions where collided harmonics exist becomes unreliable due to interference from other sources. This interference has ramifications in a variety of music signal processing applications such as multiple fundamental frequency estimation, source separation, and instrumentation identification. This thesis addresses harmonic collisions in music signal processing applications. As a solution to the harmonic collision problem, a class of signal subspace-based high-resolution sinusoidal parameter estimators is explored. Specifically, the direct matrix pencil method, or equivalently, the Estimation of Signal Parameters via Rotational Invariance Techniques (ESPRIT) method, is used with the goal of producing estimates of the salient parameters of individual harmonics that occupy equivalent time-frequency regions. This estimation method is adapted here to be applicable to time-varying signals such as musical audio. While high-resolution methods have been previously explored in the context of music signal processing, previous work has not addressed whether or not such methods truly produce high-resolution sinusoidal parameter estimates in real-world music audio signals. Therefore, this thesis answers the question of whether high-resolution sinusoidal parameter estimators are really high-resolution for real music signals. This work directly explores the capabilities of this form of sinusoidal parameter estimation to resolve collided harmonics. The capabilities of this analysis method are also explored in the context of music signal processing applications. Potential benefits of high-resolution sinusoidal analysis are examined in experiments involving multiple fundamental frequency estimation and audio source separation. This work shows that there are indeed benefits to high-resolution sinusoidal analysis in music signal processing applications, especially when compared to methods that produce sinusoidal parameter estimates based on more traditional time-frequency representations. The benefits of this form of sinusoidal analysis are made most evident in multiple fundamental frequency estimation applications, where substantial performance gains are seen. High-resolution analysis in the context of computational auditory scene analysis-based source separation shows similar performance to existing comparable methods

Illinois Digital Environment for Access to Learning and Scholarship Repository