64 research outputs found

    Perceptually motivated blind source separation of convolutive audio mixtures

    Multimodal methods for blind source separation of audio sources

    This thesis focuses on enhancing the performance of frequency domain convolutive blind source separation (FDCBSS) techniques when they are applied to separating audio sources recorded in a room environment. This challenging application is termed the cocktail party problem, and the ultimate aim would be to build a machine that matches the ability of a human being to solve it. Human beings exploit both their eyes and their ears in this task, so they adopt a multimodal approach that draws on both audio and video modalities. New multimodal methods for blind source separation of audio sources are therefore proposed in this work as a step towards realizing such a machine. The geometry of the room environment is initially exploited to improve the separation performance of an FDCBSS algorithm. The positions of the human speakers are monitored by video cameras, and this information is incorporated within the FDCBSS algorithm as constraints added to the underlying cross-power spectral density matrix-based cost function, which measures separation performance. [Continues.]

    Enhanced IVA for audio separation in highly reverberant environments

    Blind Audio Source Separation (BASS), inspired by the "cocktail-party problem", has been a leading research application for blind source separation (BSS). This thesis concerns the enhancement of frequency domain convolutive blind source separation (FDCBSS) techniques for audio separation in highly reverberant room environments. Independent component analysis (ICA) is a higher order statistics (HOS) approach commonly used in the BSS framework. When applied to audio FDCBSS, ICA based methods suffer from the permutation problem across the frequency bins of each source. Independent vector analysis (IVA) is an FD-BSS algorithm that theoretically solves the permutation problem by using a multivariate source prior, where the sources are considered to be random vectors. The algorithm enforces independence between the multivariate source signals while retaining the dependency between the elements within each source vector. The source prior adopted to model the nonlinear dependency structure within the source vectors is crucial to the separation performance of the IVA algorithm. The focus of this thesis is on improving the separation performance of the IVA algorithm in the application of BASS. An alternative multivariate Student's t distribution is proposed as the source prior for the batch IVA algorithm. A Student's t probability density function can better model certain frequency domain speech signals due to its tail dependency property. The nonlinear score function for the IVA is then derived from the proposed source prior. A novel energy driven mixed super Gaussian and Student's t source prior is proposed for the IVA and FastIVA algorithms. The Student's t distribution in the mixed source prior can model the high amplitude data points, whereas the super Gaussian distribution can model the lower amplitude information in the speech signals. The ratio of the two distributions can be adjusted according to the energy of the observed mixtures to adapt to different types of speech signals. A particular multivariate generalized Gaussian distribution is adopted as the source prior for the online IVA algorithm. The nonlinear score function derived from this proposed source prior contains fourth order relationships between different frequency bins, which provides a more informative and stronger dependency structure and thereby improves the separation performance. An adaptive learning scheme is developed to improve the performance of the online IVA algorithm. The scheme adjusts the learning rate as a function of proximity to the target solutions. The scheme is also accompanied by a novel switched source prior technique that takes the best performance properties of the super Gaussian source prior and the generalized Gaussian source prior as the algorithm converges. The methods and techniques proposed in this thesis are evaluated with real speech source signals in different simulated and real reverberant acoustic environments. A variety of measures are used within the evaluation criteria of the various algorithms. The experimental results demonstrate improved performance of the proposed methods and their robustness in a wide range of situations.
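
    To make the role of the source prior concrete, the following is a minimal sketch of two IVA score functions, written under assumptions that are not taken from the thesis itself. It assumes the spherical super Gaussian prior p(s) ∝ exp(-||s||) and a multivariate Student's t prior p(s) ∝ (1 + ||s||²/ν)^(-(ν+K)/2), where s is one source vector across K frequency bins. The function names, the default ν, and the constant scale factors are illustrative choices; scaling conventions differ between papers.

```python
import math


def norm_sq(s):
    """Squared Euclidean norm of one source vector (one value per frequency bin)."""
    return sum(abs(x) ** 2 for x in s)


def score_super_gaussian(s, eps=1e-12):
    """Score for the assumed prior p(s) ~ exp(-||s||): each bin divided by the vector norm."""
    r = math.sqrt(norm_sq(s))
    return [x / (r + eps) for x in s]


def score_student_t(s, nu=4.0):
    """Score for the assumed Student's t prior with nu degrees of freedom.

    Equals (nu + K) * s_k / (nu + ||s||^2), so high-amplitude vectors are
    shrunk more strongly than under the super Gaussian score.
    """
    k = len(s)
    scale = (nu + k) / (nu + norm_sq(s))
    return [scale * x for x in s]


if __name__ == "__main__":
    s = [3 + 4j, 0j]  # toy source vector with K = 2 bins
    print(score_super_gaussian(s))
    print(score_student_t(s))
```

    In both cases every bin is normalized by a quantity that depends on the whole vector, which is how the dependency across frequency bins enters the update and why the permutation problem is avoided in theory.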

    Content-based music classification, summarization and retrieval

    Ph.D., Doctor of Philosophy

    A frequency-based BSS technique for speech source separation.

    Ngan Lai Yin. Thesis (M.Phil.), Chinese University of Hong Kong, 2003. Includes bibliographical references (leaves 95-100). Abstracts in English and Chinese.

    Chapter 1. Introduction
        1.1 Blind Signal Separation (BSS) Methods
        1.2 Objectives of the Thesis
        1.3 Thesis Outline
    Chapter 2. Blind Adaptive Frequency-Shift (BA-FRESH) Filter
        2.1 Cyclostationarity Properties
        2.2 Frequency-Shift (FRESH) Filter
        2.3 Blind Adaptive FRESH Filter
        2.4 Reduced-Rank BA-FRESH Filter
            2.4.1 CSP Method
            2.4.2 PCA Method
            2.4.3 Appropriate Choice of Rank
        2.5 Signal Extraction of Spectrally Overlapped Signals
            2.5.1 Simulation 1: A Fixed Rank
            2.5.2 Simulation 2: A Variable Rank
        2.6 Signal Separation of Speech Signals
        2.7 Chapter Summary
    Chapter 3. Reverberant Environment
        3.1 Small Room Acoustics Model
        3.2 Effects of Reverberation to Speech Recognition
            3.2.1 Short Impulse Response
            3.2.2 Small Room Impulse Response Modelled by Image Method
        3.3 Chapter Summary
    Chapter 4. Information Theoretic Approach for Signal Separation
        4.1 Independent Component Analysis (ICA)
            4.1.1 Kullback-Leibler (K-L) Divergence
        4.2 Information Maximization (Infomax)
            4.2.1 Stochastic Gradient Descent and Stability Problem
            4.2.2 Infomax and ICA
            4.2.3 Infomax and Maximum Likelihood
        4.3 Signal Separation by Infomax
        4.4 Chapter Summary
    Chapter 5. Blind Signal Separation (BSS) in Frequency Domain
        5.1 Convolutive Mixing System
        5.2 Infomax in Frequency Domain
        5.3 Adaptation Algorithms
            5.3.1 Standard Gradient Method
            5.3.2 Natural Gradient Method
            5.3.3 Convergence Performance
        5.4 Subband Adaptation
        5.5 Energy Weighting
        5.6 The Permutation Problem
        5.7 Performance Evaluation
            5.7.1 De-reverberation Performance Factor
            5.7.2 De-Noise Performance Factor
            5.7.3 Spectral Signal-to-noise Ratio (SNR)
        5.8 Chapter Summary
    Chapter 6. Simulation Results and Performance Analysis
        6.1 Small Room Acoustics Modelled by Image Method
        6.2 Signal Sources
            6.2.1 Cantonese Speech
            6.2.2 Noise
        6.3 De-Noise and De-Reverberation Performance Analysis
            6.3.1 Speech and White Noise
            6.3.2 Speech and Voice Babble Noise
            6.3.3 Two Female Speeches
        6.4 Recognition Accuracy Performance Analysis
            6.4.1 Speech and White Noise
            6.4.2 Speech and Voice Babble Noise
            6.4.3 Two Cantonese Speeches
        6.5 Chapter Summary
    Chapter 7. Conclusions and Suggestions for Future Research
        7.1 Conclusions
        7.2 Suggestions for Future Research
    Appendices
        A. The Proof of Stability Conditions for Stochastic Gradient Descent Algorithm (Ref. (4.15))
    Bibliography

    Informed algorithms for sound source separation in enclosed reverberant environments

    While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving the performance of audio separation algorithms when they are informed, i.e. when they have access to source location information. These locations are assumed to be known a priori in this work, for example from video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used, but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach the interaural level difference, the interaural phase difference and the mixing vectors are probabilistically modeled in the time-frequency domain, and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source using the location information and is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model that encodes the spatial characteristics of the enclosure is then integrated into the probabilistic model framework; it further improves the separation performance in challenging scenarios, i.e. when sources are in close proximity and when the level of reverberation is high. Finally, new dereverberation based pre-processing is proposed based on a cascade of three dereverberation stages, each of which enhances the two-microphone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed, for example, from speech signals from the TIMIT database and measured room impulse responses.
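
    As an illustration of the amplitude spectral subtraction idea mentioned above, here is a minimal single-frame sketch. It is not the three-stage cascade from the thesis: the late-reverberation magnitude estimate is assumed to be supplied by some other procedure, and the names `spectral_subtract`, `alpha` (over-subtraction factor) and `floor` (spectral floor) are hypothetical choices made for this example.

```python
import cmath


def spectral_subtract(frame, late_mag, alpha=1.0, floor=0.05):
    """Suppress an estimated late-reverberation magnitude in one STFT frame.

    frame    : list of complex STFT bins of the reverberant signal
    late_mag : list of non-negative magnitude estimates, one per bin (assumed given)
    alpha    : hypothetical over-subtraction factor
    floor    : hypothetical spectral floor, as a fraction of the input magnitude
    """
    out = []
    for x, r in zip(frame, late_mag):
        mag = abs(x)
        # Subtract in the magnitude domain and never drop below the floor.
        new_mag = max(mag - alpha * r, floor * mag)
        # Keep the phase of the reverberant observation.
        out.append(cmath.rect(new_mag, cmath.phase(x)))
    return out


if __name__ == "__main__":
    frame = [2 + 0j, 0 + 1j]
    late = [0.5, 2.0]
    print(spectral_subtract(frame, late))
```

    The floor keeps bins from being zeroed when the estimate exceeds the observed magnitude, which is one common way to limit musical noise in subtraction-based methods.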

    Enhanced independent vector analysis for speech separation in room environments

    PhD Thesis. The human brain has the ability to focus on a desired sound source in the presence of several active sound sources. Machine-based methods lag behind in mimicking this particular human skill. In the domain of digital signal processing this problem is termed the cocktail party problem. This thesis thus aims to further the field of acoustic source separation in the frequency domain based on exploiting source independence. The main challenge in such frequency domain algorithms is the permutation problem. Independent vector analysis (IVA) is a frequency domain blind source separation algorithm which can theoretically obviate the permutation problem by preserving the dependency structure within each source vector whilst eliminating the dependency between the frequency bins of different source vectors. This thesis focuses in particular on improving the separation performance of IVA algorithms used for frequency domain acoustic source separation in real room environments. The source prior is crucial to the separation performance of the IVA algorithm, as it is used to model the nonlinear dependency structure within the source vectors. An alternative multivariate Student's t distribution source prior is proposed for the IVA algorithm, as it is known to be well suited for modelling certain speech signals due to its heavy-tailed nature. The nonlinear score function derived from the proposed Student's t source prior can therefore better model the dependency structure within the frequency bins and thereby enhance the separation performance and the convergence speed of the IVA and the fast version of the IVA (FastIVA) algorithms. A novel energy driven source prior that mixes the Student's t and the original super Gaussian distributions is also proposed for the IVA algorithms. As speech signals can be composed of many high and low amplitude data points, the Student's t distribution in the mixed source prior can account for the high amplitude data points, whereas the original super Gaussian distribution can cater for the other information in the speech signals. Furthermore, the weight of each distribution in the mixed source prior can be adjusted according to the energy of the observed mixtures. The mixed source prior therefore adapts to the measured signals and further enhances the performance of the IVA algorithm. A common approach within the IVA algorithm is to model different speech sources with an identical source prior; however, this does not account for the unique characteristics of each speech signal. Dependency modelling can therefore be improved by modelling different speech sources with different source priors. Hence, the Student's t mixture model (SMM) is introduced as a source prior for the IVA algorithm. This new source prior can adapt to the nature of different speech signals, and the parameters of the proposed SMM source prior are estimated by deriving an efficient expectation maximization (EM) algorithm. As a result of this study, a novel EM framework for the IVA algorithm with the SMM as a source prior is proposed, which is capable of separating the sources in an efficient manner. The proposed algorithms are tested in various realistic reverberant room environments with real speech signals. All the experiments and evaluations demonstrate the robustness and enhanced separation performance of the proposed algorithms.

    Independent component analysis and source analysis of auditory evoked potentials for assessment of cochlear implant users

    Source analysis of the Auditory Evoked Potential (AEP) has been used before to evaluate the maturation of the auditory system in both adults and children; in the same way, this technique could be applied to ongoing EEG recordings, made in response to frequency-specific acoustic stimuli, from children with cochlear implants (CI). This is done in order to objectively assess the performance of this electronic device and the maturation of the child's hearing. However, these recordings are contaminated by an artifact produced by the normal operation of the CI; this artifact makes the detection and analysis of AEPs much harder and generates errors in the source analysis process. The artifact can be spatially filtered using Independent Component Analysis (ICA); in this research, three different ICA algorithms were compared in order to establish which is best suited to removing the CI artifact. Additionally, we show that pre-processing the EEG recording with a temporal ICA algorithm facilitates not only the identification of the AEP peaks but also the source analysis procedure. From the results obtained in this research, and with a limited dataset of CI versus normal recordings, it is possible to conclude that the AEP source locations change from the inferior temporal areas in the first two years after implantation to the superior temporal area after three years of CI use, close to the locations obtained in normal hearing children. It is intended that the results of this research be used as an objective technique for a general evaluation of the performance of children with CIs.
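
    The spatial filtering step can be pictured with a small sketch. It assumes the usual ICA model in which the recording equals a mixing matrix times the component time courses, and that the artifact components have already been identified by some other means; the abstract does not state the exact procedure used, and the function name `remove_components` and its arguments are hypothetical.

```python
def remove_components(mixing, sources, artifact_idx):
    """Back-project ICA components to channel space, leaving out artifact components.

    mixing       : channels x components mixing matrix (list of rows), assumed
                   to come from an ICA decomposition computed elsewhere
    sources      : components x samples component time courses
    artifact_idx : indices of components judged to be artifact (assumed known)
    """
    drop = set(artifact_idx)
    n_samples = len(sources[0])
    cleaned = []
    for row in mixing:
        # Each channel is the weighted sum of the retained components only.
        cleaned.append([
            sum(a * sources[k][t] for k, a in enumerate(row) if k not in drop)
            for t in range(n_samples)
        ])
    return cleaned


if __name__ == "__main__":
    mixing = [[1, 2], [3, 4]]
    sources = [[1, 0], [0, 1]]
    print(remove_components(mixing, sources, artifact_idx=[1]))
```

    Leaving out a component this way removes its contribution from every channel at once, which is what makes the operation a spatial filter.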