180 research outputs found

    Source Separation for Hearing Aid Applications

    Get PDF

    Efficient Multiband Algorithms for Blind Source Separation

    Get PDF
    The problem of blind separation refers to recovering original signals, called source signals, from the mixed signals, called observation signals, in a reverberant environment. The mixture is a function of a sequence of original speech signals mixed in a reverberant room. The objective is to separate mixed signals to obtain the original signals without degradation and without prior information of the features of the sources. The strategy used to achieve this objective is to use multiple bands that work at a lower rate, have less computational cost and a quicker convergence than the conventional scheme. Our motivation is the competitive results of unequal-passbands scheme applications, in terms of the convergence speed. The objective of this research is to improve unequal-passbands schemes by improving the speed of convergence and reducing the computational cost. The first proposed work is a novel maximally decimated unequal-passbands scheme.This scheme uses multiple bands that make it work at a reduced sampling rate, and low computational cost. An adaptation approach is derived with an adaptation step that improved the convergence speed. The performance of the proposed scheme was measured in different ways. First, the mean square errors of various bands are measured and the results are compared to a maximally decimated equal-passbands scheme, which is currently the best performing method. The results show that the proposed scheme has a faster convergence rate than the maximally decimated equal-passbands scheme. Second, when the scheme is tested for white and coloured inputs using a low number of bands, it does not yield good results; but when the number of bands is increased, the speed of convergence is enhanced. Third, the scheme is tested for quick changes. It is shown that the performance of the proposed scheme is similar to that of the equal-passbands scheme. Fourth, the scheme is also tested in a stationary state. The experimental results confirm the theoretical work. For more challenging scenarios, an unequal-passbands scheme with over-sampled decimation is proposed; the greater number of bands, the more efficient the separation. The results are compared to the currently best performing method. Second, an experimental comparison is made between the proposed multiband scheme and the conventional scheme. The results show that the convergence speed and the signal-to-interference ratio of the proposed scheme are higher than that of the conventional scheme, and the computation cost is lower than that of the conventional scheme

    Online source separation in reverberant environments exploiting known speaker locations

    Get PDF
    This thesis concerns blind source separation techniques using second order statistics and higher order statistics for reverberant environments. A focus of the thesis is algorithmic simplicity with a view to the algorithms being implemented in their online forms. The main challenge of blind source separation applications is to handle reverberant acoustic environments; a further complication is changes in the acoustic environment such as when human speakers physically move. A novel time-domain method which utilises a pair of finite impulse response filters is proposed. The method of principle angles is defined which exploits a singular value decomposition for their design. The pair of filters are implemented within a generalised sidelobe canceller structure, thus the method can be considered as a beamforming method which cancels one source. An adaptive filtering stage is then employed to recover the remaining source, by exploiting the output of the beamforming stage as a noise reference. A common approach to blind source separation is to use methods that use higher order statistics such as independent component analysis. When dealing with realistic convolutive audio and speech mixtures, processing in the frequency domain at each frequency bin is required. As a result this introduces the permutation problem, inherent in independent component analysis, across the frequency bins. Independent vector analysis directly addresses this issue by modeling the dependencies between frequency bins, namely making use of a source vector prior. An alternative source prior for real-time (online) natural gradient independent vector analysis is proposed. A Student's t probability density function is known to be more suited for speech sources, due to its heavier tails, and is incorporated into a real-time version of natural gradient independent vector analysis. The final algorithm is realised as a real-time embedded application on a floating point Texas Instruments digital signal processor platform. Moving sources, along with reverberant environments, cause significant problems in realistic source separation systems as mixing filters become time variant. A method which employs the pair of cancellation filters, is proposed to cancel one source coupled with an online natural gradient independent vector analysis technique to improve average separation performance in the context of step-wise moving sources. This addresses `dips' in performance when sources move. Results show the average convergence time of the performance parameters is improved. Online methods introduced in thesis are tested using impulse responses measured in reverberant environments, demonstrating their robustness and are shown to perform better than established methods in a variety of situations

    Blind separation of more sources than sensors in convolutive mixtures

    Get PDF

    Perceptually motivated blind source separation of convolutive audio mixtures

    Get PDF

    Informed algorithms for sound source separation in enclosed reverberant environments

    Get PDF
    While humans can separate a sound of interest amidst a cacophony of contending sounds in an echoic environment, machine-based methods lag behind in solving this task. This thesis thus aims at improving performance of audio separation algorithms when they are informed i.e. have access to source location information. These locations are assumed to be known a priori in this work, for example by video processing. Initially, a multi-microphone array based method combined with binary time-frequency masking is proposed. A robust least squares frequency invariant data independent beamformer designed with the location information is utilized to estimate the sources. To further enhance the estimated sources, binary time-frequency masking based post-processing is used but cepstral domain smoothing is required to mitigate musical noise. To tackle the under-determined case and further improve separation performance at higher reverberation times, a two-microphone based method which is inspired by human auditory processing and generates soft time-frequency masks is described. In this approach interaural level difference, interaural phase difference and mixing vectors are probabilistically modeled in the time-frequency domain and the model parameters are learned through the expectation-maximization (EM) algorithm. A direction vector is estimated for each source, using the location information, which is used as the mean parameter of the mixing vector model. Soft time-frequency masks are used to reconstruct the sources. A spatial covariance model is then integrated into the probabilistic model framework that encodes the spatial characteristics of the enclosure and further improves the separation performance in challenging scenarios i.e. when sources are in close proximity and when the level of reverberation is high. Finally, new dereverberation based pre-processing is proposed based on the cascade of three dereverberation stages where each enhances the twomicrophone reverberant mixture. The dereverberation stages are based on amplitude spectral subtraction, where the late reverberation is estimated and suppressed. The combination of such dereverberation based pre-processing and use of soft mask separation yields the best separation performance. All methods are evaluated with real and synthetic mixtures formed for example from speech signals from the TIMIT database and measured room impulse responses

    Exploiting the bimodality of speech in the cocktail party problem

    Get PDF
    The cocktail party problem is one of following a conversation in a crowded room where there are many competing sound sources, such as the voices of other speakers or music. To address this problem using computers, digital signal processing solutions commonly use blind source separation (BSS) which aims to separate all the original sources (voices) from the mixture simultaneously. Traditionally, BSS methods have relied on information derived from the mixture of sources to separate the mixture into its constituent elements. However, the human auditory system is well adapted to handle the cocktail party scenario, using both auditory and visual information to follow (or hold) a conversation in a such an environment. This thesis focuses on using visual information of the speakers in a cocktail party like scenario to aid in improving the performance of BSS. There are several useful applications of such technology, for example: a pre-processing step for a speech recognition system, teleconferencing or security surveillance. The visual information used in this thesis is derived from the speaker's mouth region, as it is the most visible component of speech production. Initial research presented in this thesis considers a joint statistical model of audio and visual features, which is used to assist in control ling the convergence behaviour of a BSS algorithm. The results of using the statistical models are compared to using the raw audio information alone and it is shown that the inclusion of visual information greatly improves its convergence behaviour. Further research focuses on using the speaker's mouth region to identify periods of time when the speaker is silent through the development of a visual voice activity detector (V-VAD) (i.e. voice activity detection using visual information alone). This information can be used in many different ways to simplify the BSS process. To this end, two novel V-VADs were developed and tested within a BSS framework, which result in significantly improved intelligibility of the separated source associated with the V-VAD output. Thus the research presented in this thesis confirms the viability of using visual information to improve solutions to the cocktail party problem.EThOS - Electronic Theses Online ServiceGBUnited Kingdo

    Exploiting the bimodality of speech in the cocktail party problem

    Get PDF
    The cocktail party problem is one of following a conversation in a crowded room where there are many competing sound sources, such as the voices of other speakers or music. To address this problem using computers, digital signal processing solutions commonly use blind source separation (BSS) which aims to separate all the original sources (voices) from the mixture simultaneously. Traditionally, BSS methods have relied on information derived from the mixture of sources to separate the mixture into its constituent elements. However, the human auditory system is well adapted to handle the cocktail party scenario, using both auditory and visual information to follow (or hold) a conversation in a such an environment. This thesis focuses on using visual information of the speakers in a cocktail party like scenario to aid in improving the performance of BSS. There are several useful applications of such technology, for example: a pre-processing step for a speech recognition system, teleconferencing or security surveillance. The visual information used in this thesis is derived from the speaker's mouth region, as it is the most visible component of speech production. Initial research presented in this thesis considers a joint statistical model of audio and visual features, which is used to assist in control ling the convergence behaviour of a BSS algorithm. The results of using the statistical models are compared to using the raw audio information alone and it is shown that the inclusion of visual information greatly improves its convergence behaviour. Further research focuses on using the speaker's mouth region to identify periods of time when the speaker is silent through the development of a visual voice activity detector (V-VAD) (i.e. voice activity detection using visual information alone). This information can be used in many different ways to simplify the BSS process. To this end, two novel V-VADs were developed and tested within a BSS framework, which result in significantly improved intelligibility of the separated source associated with the V-VAD output. Thus the research presented in this thesis confirms the viability of using visual information to improve solutions to the cocktail party problem.EThOS - Electronic Theses Online ServiceGBUnited Kingdo
    • …
    corecore