20 research outputs found

Sound Source Separation using Shifted Non-negative Tensor Factorisation

Author: Coyle Eugene
Cranitch Matt
Fitzgerald Derry
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2006

Recently, shifted Non-negative Matrix Factorisation was developed as a means of separating harmonic instruments from single-channel mixtures. However, in many cases two or more channels are available, in which case it would be advantageous to have a multi-channel version of the algorithm. To this end, a shifted Non-negative Tensor Factorisation algorithm is derived, which extends shifted Non-negative Matrix Factorisation to the multi-channel case. The use of this algorithm for multi-channel sound source separation of harmonic instruments is demonstrated. Further, it is shown that the algorithm can be used to perform Non-negative Tensor Deconvolution, to separate sound sources which have time-evolving spectra from multi-channel signals.

Arrow@TUDublin

Single-Channel Speech Separation using Sparse Non-Negative Matrix Factorization

Author: Olsson Rasmus Kongsgaard
Schmidt Mikkel N.
Publication date: 01/01/2006

We apply machine learning techniques to the problem of separating multiple speech sources from a single microphone recording. The method of choice is a sparse non-negative matrix factorization algorithm, which in an unsupervised manner can learn sparse representations of the data. This is applied to the learning of personalized dictionaries from a speech corpus, which in turn are used to separate the audio stream into its components. We show that computational savings can be achieved by segmenting the training data at the phoneme level. To split the data, a conventional speech recognizer is used. The unsupervised and supervised adaptation schemes both result in significant improvements in terms of the target-to-masker ratio. Index Terms: single-channel source separation, sparse non-negative matrix factorization.

CiteSeerX

Online Research Database In Technology

Linear Regression on Sparse Features for Single-Channel Speech Separation

Author: Olsson Rasmus Kongsgaard
Schmidt Mikkel N.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2007

Crossref

Online Research Database In Technology

A novel approach to Acoustic Echo cancellation

Author: Cahill Niall M.
Lawlor Bob
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2008

In this paper, a novel approach to single-microphone Acoustic Echo Cancellation (AEC) is presented. This approach performs AEC by employing techniques developed for monaural sound source separation. It is shown that the AEC problem can be cast in a monaural sound source separation framework and that, through this framework, significant echo suppression can be achieved. The new approach is evaluated through experiments on simulated data.

MURAL - Maynooth University Research Archive Library

NUI Maynooth Eprint Archive

Maynooth University ePrints and eTheses Archive

On the Use of Masking Filters in Sound Source Separation

Author: Fitzgerald Derry
Jaiswal Amit
Publication venue: Dublin Institute of Technology
Publication date: 01/01/2012

Many sound source separation algorithms, such as NMF and related approaches, disregard phase information and operate only on magnitude or power spectrograms. In this context, generalised Wiener filters have been widely used to generate masks which are applied to the original complex-valued spectrogram before inversion to the time domain, as these masks have been shown to give good results. However, these masks may not be optimal from a perceptual point of view. To this end, we propose new families of masks and compare their performance to generalised Wiener filter masks using three different factorisation-based separation algorithms. Further, to date there has been no analysis of how the performance of masking varies with the number of iterations performed when estimating the separated sources. We perform such an analysis and show that, when using these masks, running to convergence may not be required in order to obtain good separation performance.

Arrow@TUDublin
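
The masking step described in the abstract above can be illustrated with a minimal sketch. It assumes the commonly used form of a generalised Wiener mask, in which each source's estimated magnitude is raised to a power and normalised by the sum over all sources; the abstract does not give the exact formulation or the new mask families, so this is not the authors' method. The function names, the `power` and `eps` parameters, and the random toy data are all hypothetical.

```python
import numpy as np


def generalised_wiener_masks(source_mags, power=2.0, eps=1e-12):
    """Build one soft mask per source from estimated magnitude spectrograms.

    source_mags: non-negative array of shape (n_sources, n_freq, n_frames),
        e.g. per-source magnitude estimates from a factorisation such as NMF.
    power: exponent applied to the magnitudes (assumed parameter; 2.0 gives
        the familiar power-spectrogram Wiener-style mask).
    eps: small constant that avoids division by zero in silent bins.
    """
    powered = np.asarray(source_mags, dtype=float) ** power
    total = powered.sum(axis=0, keepdims=True)
    return powered / (total + eps)


def apply_masks(mixture_stft, masks):
    """Apply real-valued masks to the complex mixture spectrogram.

    The mixture phase is reused for every source; inversion to the time
    domain (an inverse STFT) is left out of this sketch.
    """
    return masks * mixture_stft[np.newaxis, :, :]


# Toy data standing in for a real mixture and real source estimates.
rng = np.random.default_rng(0)
mixture_stft = rng.standard_normal((5, 4)) + 1j * rng.standard_normal((5, 4))
source_mags = rng.random((2, 5, 4))

masks = generalised_wiener_masks(source_mags)
separated = apply_masks(mixture_stft, masks)
```

Because the masks sum to approximately one in every time-frequency bin, the masked source spectrograms add back up to the mixture, which is one reason this family of masks is a convenient baseline.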

Upmixing from Mono : a Source Separation Approach

Author: Fitzgerald Derry
Publication venue: Dublin Institute of Technology
Publication date: 01/07/2011

We present a system for upmixing mono recordings to stereo through the use of sound source separation techniques. The use of sound source separation has the advantage of allowing sources to be placed at distinct points in the stereo field, resulting in more natural-sounding upmixes. The system separates an input signal into a number of sources, which can then be imported into a digital audio workstation for upmixing to stereo. Considerations to be taken into account when upmixing are discussed, and a brief overview of the various sound source separation techniques used in the system is given. The effectiveness of the proposed system is then demonstrated on real-world mono recordings.

Arrow@TUDublin

Joint Multi-Pitch Detection Using Harmonic Envelope Estimation for Polyphonic Music Transcription

Author: Emmanouil Benetos
Simon Dixon
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2011

In this paper, a method for automatic transcription of music signals based on joint multiple-F0 estimation is proposed. As a time-frequency representation, the constant-Q resonator time-frequency image is employed, while a novel noise suppression technique based on a pink noise assumption is applied in a preprocessing step. In the multiple-F0 estimation stage, the optimal tuning and inharmonicity parameters are computed and a salience function is proposed in order to select pitch candidates. For each pitch candidate combination, an overlapping partial treatment procedure is used, which is based on a novel spectral envelope estimation procedure for the log-frequency domain, in order to compute the harmonic envelope of candidate pitches. In order to select the optimal pitch combination for each time frame, a score function is proposed which combines spectral and temporal characteristics of the candidate pitches and also aims to suppress harmonic errors. For postprocessing, hidden Markov models (HMMs) and conditional random fields (CRFs) trained on MIDI data are employed in order to boost transcription accuracy. The system was trained on isolated piano sounds from the MAPS database and was tested on classic and jazz recordings from the RWC database, as well as on recordings from a Disklavier piano. A comparison with several state-of-the-art systems is provided using a variety of error metrics, where encouraging results are indicated.

CiteSeerX

City Research Online

Crossref

An exploration into the sparse representation of spectra

Author: Mthembu Linda
Publication venue: Department of Electrical Engineering
Publication date: 01/01/2007

Includes bibliographical references (leaves 73-76). This thesis describes an exploration in achieving sparse representations of objects, with special focus on spectral data. Given a database of objects, one would like to know the actual aspects of each class that distinguish it from any other class in the database. We explore the hypothesis that simple abstractions (descriptions) that humans normally make, especially based on the visual phenomenology or physics of the problem, can be helpful in extracting and formulating useful sparse representations of the observed objects. In this thesis we focus on the discovery of such underlying features, employing a number of recent methods from machine learning. Firstly, we find that an approach to automatic feature discovery recently proposed in the literature (Non-negative Matrix Factorization) is not as it seems. We show the limitations of this approach and demonstrate a more efficient method on a synthetic problem. Secondly, we explore a more empirical approach to extracting visually attractive features of spectra, from which we formulate a simple re-representation of spectral data and show that the identification and discovery of certain intuitive features at various scales can be sufficient to describe a spectrum profile. Finally, we explore a more traditional and principled automatic method of analyzing a spectrum at different resolutions (Wavelets). We find that certain classes of spectra can easily be discriminated between by a simple approximation of the spectrum profile, while in other cases only the finer profile details are important. Throughout this thesis we employ a measure called the separability index as our measure of how easy it is to discriminate objects in a database with the proposed representations.

Cape Town University OpenUCT