103 research outputs found

Source Separation for Hearing Aid Applications

Author: Pedersen Michael Syskind
Publication venue: Technical University of Denmark
Publication date: 01/11/2006

A general framework for online audio source separation

Authors: E. Vincent, M.S. Brandstein, N.Q.K. Duong, S. Makino, Y. Mori
Publication date: 28/12/2011

We consider the problem of online audio source separation. Existing algorithms adopt either a sliding block approach or a stochastic gradient approach, which is faster but less accurate. Also, they rely either on spatial cues or on spectral cues and cannot separate certain mixtures. In this paper, we design a general online audio source separation framework that combines both approaches and both types of cues. The model parameters are estimated in the Maximum Likelihood (ML) sense using a Generalised Expectation Maximisation (GEM) algorithm with multiplicative updates. The separation performance is evaluated as a function of the block size and the step size and compared to that of an offline algorithm. Comment: International Conference on Latent Variable Analysis and Signal Separation (2012)

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Convolutive Blind Source Separation Methods

Authors: Kjems Ulrik, Larsen Jan, Parra Lucas C., Pedersen Michael Syskind
Publication venue: Springer Fachmedien Wiesbaden GmbH
Publication date: 01/01/2008

In this chapter, we provide an overview of existing algorithms for blind source separation of convolutive audio mixtures. We provide a taxonomy, wherein many of the existing algorithms can be organized, and we present published results from those algorithms that have been applied to real-world audio separation tasks

CiteSeerX

Online Research Database In Technology

Informed algorithms for sound source separation in enclosed reverberant environments

Author: Muhammad Salman Khan (7202543)
Publication date: 01/01/2013

While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving performance of audio separation algorithms when they are informed i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios i.e. when sources are in close proximity and when the level of reverberation is high. 
Finally, new dereverberation-based pre-processing is proposed, built on a cascade of three dereverberation stages, each of which enhances the two-microphone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation-based pre-processing and soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed, for example, from speech signals from the TIMIT database and measured room impulse responses.
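
The amplitude spectral subtraction mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the function name `spectral_subtract`, the `floor` parameter, and the example values are all assumptions made for illustration, and the late-reverberation magnitude is taken as given because the abstract does not say how it is estimated.

```python
def spectral_subtract(mix_mag, late_mag, floor=0.1):
    """Subtract an estimated late-reverberation magnitude from one frame.

    mix_mag  -- magnitude spectrum of the reverberant frame (one value per bin)
    late_mag -- assumed estimate of the late-reverberation magnitude per bin
    floor    -- hypothetical spectral floor, as a fraction of the input magnitude,
                so that over-subtraction never produces negative magnitudes
    """
    if len(mix_mag) != len(late_mag):
        raise ValueError("spectra must have the same number of bins")
    return [max(m - r, floor * m) for m, r in zip(mix_mag, late_mag)]


# Illustrative values only, not data from the thesis.
frame = [1.0, 0.5, 2.0]
late = [0.25, 0.5, 0.5]
cleaned = spectral_subtract(frame, late)
```

In the second bin the estimate equals the input magnitude, so the result is clamped to the floor rather than dropped to zero. Flooring of this kind is a common way to limit the musical noise that plain subtraction can introduce.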

Loughborough University Institutional Repository

Vocal Processing with Spectral Analysis

Author: Fitzgerald Bradley J.
Publication venue: Digital Commons @ Olivet
Publication date: 19/04/2018

A well-known signal processing issue is that of the “cocktail party problem,” which refers to the need to be able to separate speakers from a mixture of voices. A solution to this problem could provide insight into signal separation in a variety of signal processing fields. In this study, a method of vocal signal processing was examined to determine if principal component analysis of spectral data could be used to characterize differences between speakers and if these differences could be used to separate mixtures of vocal signals. Processing was done on a set of voice recordings from thirty different speakers to create a projection matrix that could be used by an algorithm to identify the source of an unknown recording from one of the thirty speakers. Two different identification algorithms were tested. The first had an average correct prediction rate of 15.69%, while the second had an average correct prediction rate of 10.47%. Additionally, one principal component derived from the processing provided a notable distinction between principal values for male and female speakers. Males tended to produce positive principal values, while females tended to produce negative values. The success of the algorithm could be improved by implementing differentiation between time segments of speech and segments of silence. The incorporation of this distinction into the signal processing method was recommended as a topic for future study.

Olivet Nazarene University

Vocal Processing with Spectral Analysis

Author: Fitzgerald Brad
Publication venue: Digital Commons @ Olivet
Publication date: 19/04/2018

A method of vocal signal processing was examined to determine if principal component analysis of spectral data may be used to characterize differences between speakers and if these differences may be used to separate mixtures of vocal signals. Processing was done on a set of voice recordings from 30 different speakers in order to create a projection matrix which could be used by an algorithm to identify the source of an unknown recording from one of the 30 speakers. Two different identification algorithms were tested, both of which were generally unable to correctly identify the source of a single vocal signal. However, one principal component derived from the processing provided a notable distinction between values for male and female speakers. Because single speakers could not be identified reliably, the method could not be used to separate mixtures of vocal signals. A possible cause is the processing methodology’s lack of differentiation between time segments of speech and segments of silence. The incorporation of this distinction into the signal processing method was recommended as a topic for future study.

Olivet Nazarene University

Source Separation and DOA Estimation for Underdetermined Auditory Scene

Authors: Ding Ning, Hamada Nozomu
Publication venue: IntechOpen
Publication date: 05/03/2014

IntechOpen

Independent Component Analysis Enhancements for Source Separation in Immersive Audio Environments

Author: Zhao Yue
Publication venue: UKnowledge
Publication date: 01/01/2013

In immersive audio environments with distributed microphones, Independent Component Analysis (ICA) can be applied to uncover signals from a mixture of other signals and noise, such as in a cocktail party recording. ICA algorithms have been developed for instantaneous source mixtures and convolutional source mixtures. While ICA for instantaneous mixtures works when no delays exist between the signals in each mixture, distributed microphone recordings typically result in various delays of the signals over the recorded channels. The convolutive ICA algorithm should account for delays; however, it requires many parameters to be set and often has stability issues. This thesis introduces the Channel Aligned FastICA (CAICA), which requires knowledge of the source distance to each microphone, but does not require knowledge of noise sources. Furthermore, the CAICA is combined with Time Frequency Masking (TFM), yielding better SOI extraction, even in low SNR environments. Simulations were conducted for ranking experiments that tested the performance of three algorithms: Weighted Beamforming (WB), CAICA, and CAICA with TFM. The Closest Microphone (CM) recording is used as a reference for all three. Statistical analyses on the results demonstrated superior performance for the CAICA with TFM. The algorithms were applied to experimental recordings to support the conclusions of the simulations. These techniques can be deployed in mobile platforms, used in surveillance for capturing human speech and potentially adapted to biomedical fields.
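
One plausible reading of "channel aligned" is that the known source-to-microphone distances are converted into sample delays, and each channel is shifted so the source lines up across channels before an instantaneous ICA is run. The sketch below shows only that alignment step, under that assumption. It is not the thesis's CAICA implementation, and the helper names, the speed-of-sound constant, and the example values are hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at room temperature


def alignment_delays(distances_m, sample_rate_hz):
    """Delay of each channel, in samples, relative to the closest microphone."""
    nearest = min(distances_m)
    return [
        round((d - nearest) / SPEED_OF_SOUND_M_S * sample_rate_hz)
        for d in distances_m
    ]


def align_channels(channels, delays):
    """Drop the leading delay from each channel and trim all to a common length."""
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [ch[d:d + usable] for ch, d in zip(channels, delays)]


# Illustrative setup: the second microphone is 3.43 m farther from the source,
# which at a 1 kHz sampling rate corresponds to a 10-sample delay.
delays = alignment_delays([3.43, 6.86], sample_rate_hz=1000)
near = list(range(20))
far = [0] * 10 + list(range(20))  # same signal, arriving 10 samples later
aligned = align_channels([near, far], delays)
```

After alignment both channels carry the source at the same sample positions, which is the no-delay condition the abstract gives for instantaneous-mixture ICA.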

University of Kentucky

An audio-visual system for object-based audio: from recording to listening

Authors: Coleman P, Cox TJ, de Campos T, Fazi FM, Franck A, Francombe J, Galvez MFS, Hilton A, Hughes RJ, Jackson PJB, Liu Q, Melchior F, Menzies D, Pike C, Tang Y, Woodcock JS
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 17/01/2018

Object-based audio is an emerging representation for audio content, where content is represented in a reproduction format-agnostic way and, thus, produced once for consumption on many different kinds of devices. This affords new opportunities for immersive, personalized, and interactive listening experiences. This paper introduces an end-to-end object-based spatial audio pipeline, from sound recording to listening. A high-level system architecture is proposed, which includes novel audiovisual interfaces to support object-based capture and listener-tracked rendering, and incorporates a proposed component for objectification, that is, recording content directly into an object-based form. Text-based and extensible metadata enable communication between the system components. An open architecture for object rendering is also proposed. The system’s capabilities are evaluated in two parts. First, listener-tracked reproduction of metadata automatically estimated from two moving talkers is evaluated using an objective binaural localization model. Second, object-based scene capture with audio extracted using blind source separation (to remix between two talkers) and beamforming (to remix a recording of a jazz group) is evaluated.

University of Salford Institutional Repository

Southampton (e-Prints Soton)

University of Surrey

Surrey Research Insight

Blind Source Separation for Speech Application Under Real Acoustic Environment

Authors: Hiroshi Saruwatari, Yu Takahashi
Publication venue: IntechOpen
Publication date: 10/10/2012

IntechOpen