67 research outputs found

    Evaluations on underdetermined blind source separation in adverse environments using time-frequency masking

    Get PDF
    The successful implementation of speech processing systems in the real world depends on their ability to handle adverse acoustic conditions with undesirable factors such as room reverberation and background noise. In this study, an extension to the established multiple sensors degenerate unmixing estimation technique (MENUET) algorithm for blind source separation is proposed, based on fuzzy c-means clustering, to improve separation ability in underdetermined situations using a nonlinear microphone array. Rather than testing the blind source separation ability solely under reverberant conditions, this paper extends the evaluation to a variety of simulated and real-world noisy environments. The results show encouraging separation ability and improved perceptual quality of the separated sources under such adverse conditions. This not only establishes the proposed methodology as a credible improvement to the system, but also suggests further applicability in areas such as noise suppression in adverse acoustic environments.
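    The fuzzy c-means clustering at the heart of the proposed MENUET extension can be sketched as follows. This is a minimal, self-contained illustration on toy two-dimensional feature points (standing in for level/phase cues of time-frequency bins), not the authors' implementation; the data, cluster count, and fuzzifier value are all illustrative assumptions.

    ```python
    import numpy as np

    def fuzzy_c_means(X, c, m=2.0, n_iter=50, seed=0):
        """Fuzzy c-means: returns (centroids, memberships U of shape (c, N)).

        Each column of U is a soft assignment of one feature point to the c
        clusters; in T-F masking these memberships act as soft masks.
        """
        rng = np.random.default_rng(seed)
        N = X.shape[0]
        U = rng.random((c, N))
        U /= U.sum(axis=0)                          # memberships sum to 1 per point
        for _ in range(n_iter):
            W = U ** m                              # fuzzified memberships
            C = (W @ X) / W.sum(axis=1, keepdims=True)   # weighted centroids
            d = np.linalg.norm(X[None, :, :] - C[:, None, :], axis=2) + 1e-12
            U = 1.0 / d ** (2.0 / (m - 1.0))        # standard FCM membership update
            U /= U.sum(axis=0)
        return C, U

    # Toy feature points from two well-separated "sources"
    rng = np.random.default_rng(1)
    X = np.vstack([rng.normal([0.0, 0.0], 0.1, (100, 2)),
                   rng.normal([3.0, 3.0], 0.1, (100, 2))])
    C, U = fuzzy_c_means(X, c=2)
    masks = U   # each row is a soft mask over the feature points for one source
    ```

    In an actual T-F masking system each point would be a (level ratio, phase difference) feature of one spectrogram bin, and the membership rows would be reshaped back into soft spectrogram masks.
    
    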

    Robust variational Bayesian clustering for underdetermined speech separation

    Get PDF
    The main focus of this thesis is the enhancement of the statistical framework employed for underdetermined T-F masking blind separation of speech. While humans are capable of extracting a speech signal of interest in the presence of other interference and noise, actual speech recognition systems and hearing aids cannot match this psychoacoustic ability: they perform well in noise-free and reverberation-free environments but suffer in realistic environments. Time-frequency masking algorithms based on computational auditory scene analysis attempt to separate multiple sound sources from only two reverberant stereo mixtures. They essentially rely on the sparsity that binaural cues exhibit in the time-frequency domain to generate masks which extract individual sources from their corresponding spectrogram points, thereby solving the problem of underdetermined convolutive speech separation. Statistically, this can be interpreted as a classical clustering problem. Owing to its analytical simplicity, a finite mixture of Gaussian distributions is commonly used in T-F masking algorithms for modelling interaural cues. Such a model is, however, sensitive to outliers; therefore, a robust probabilistic model based on the Student's t-distribution is first proposed to improve the robustness of the statistical framework. This heavy-tailed distribution, compared to the Gaussian distribution, can better capture outlier values and thereby lead to more accurate probabilistic masks for source separation. This non-Gaussian approach is applied to the state-of-the-art MESSL algorithm, and comparative studies are undertaken to confirm the improved separation quality. A Bayesian clustering framework that can better model uncertainties in reverberant environments is then exploited to replace the conventional expectation-maximization (EM) algorithm within a maximum likelihood estimation (MLE) framework.
A variational Bayesian (VB) approach is then applied to the MESSL algorithm to cluster interaural phase differences, thereby avoiding the drawbacks of MLE, specifically the probable presence of singularities; experimental results confirm an improvement in separation performance. Finally, the joint modelling of the interaural phase and level differences, and the integration of their non-Gaussian modelling within a variational Bayesian framework, is proposed. This approach combines the advantages of the robust estimation provided by the Student's t-distribution and the robust clustering inherent in the Bayesian approach. In other words, this general framework avoids the difficulties associated with MLE and makes use of the heavy-tailed Student's t-distribution to improve the estimation of the soft probabilistic masks at various reverberation times, particularly for sources in close proximity. Through an extensive set of simulation studies comparing the proposed approach with other T-F masking algorithms under different scenarios, a significant improvement in terms of objective and subjective performance measures is achieved.
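The robustness argument for the Student's t-distribution can be made concrete with the EM weighting it induces: each observation is downweighted by a factor that shrinks as its (squared) distance from the cluster grows, whereas a Gaussian model weights all points equally. The sketch below is a toy univariate illustration of that weight, not the thesis's MESSL-based implementation; the data values and degrees of freedom are assumptions.

```python
import numpy as np

def t_weights(x, mu, sigma2, nu):
    """EM weights under a univariate Student's t model.

    delta2 is the squared Mahalanobis distance; the weight (nu + 1)/(nu + delta2)
    tends to 0 for gross outliers, so they barely influence the mean/variance
    updates. A Gaussian model corresponds to nu -> infinity (weight 1 for all).
    """
    delta2 = (x - mu) ** 2 / sigma2
    return (nu + 1.0) / (nu + delta2)

x = np.array([0.1, -0.2, 0.05, 8.0])   # last point is a gross outlier
w = t_weights(x, mu=0.0, sigma2=1.0, nu=3.0)
# the outlier receives a much smaller weight than the inliers
```

The same downweighting applies per T-F point when the t-distribution replaces the Gaussian components that model interaural cues.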

    Probabilistic Modeling Paradigms for Audio Source Separation

    Get PDF
    This is the author's final version of the article, first published as E. Vincent, M. G. Jafari, S. A. Abdallah, M. D. Plumbley, M. E. Davies. Probabilistic Modeling Paradigms for Audio Source Separation. In W. Wang (Ed.), Machine Audition: Principles, Algorithms and Systems. Chapter 7, pp. 162-185. IGI Global, 2011. ISBN 978-1-61520-919-4. DOI: 10.4018/978-1-61520-919-4.ch007. Most sound scenes result from the superposition of several sources, which can be separately perceived and analyzed by human listeners. Source separation aims to provide machine listeners with similar skills by extracting the sounds of individual sources from a given scene. Existing separation systems operate either by emulating the human auditory system or by inferring the parameters of probabilistic sound models. In this chapter, the authors focus on the latter approach and provide a joint overview of established and recent models, including independent component analysis, local time-frequency models and spectral template-based models. They show that most models are instances of one of two general paradigms: linear modeling or variance modeling. They compare the merits of each paradigm and report objective performance figures. They conclude by discussing promising combinations of probabilistic priors and inference algorithms that could form the basis of future state-of-the-art systems.
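    The distinction between the two paradigms the chapter surveys can be illustrated with a toy example: linear modeling describes the mixture waveform as x = A s, while variance modeling describes how the power of uncorrelated source images adds up in the mixture. Everything below (sources, mixing matrix, sample counts) is an illustrative assumption, not data from the chapter.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    s = rng.laplace(size=(2, 1000))            # two sparse (super-Gaussian) sources
    A = np.array([[1.0, 0.6],
                  [0.4, 1.0]])                 # instantaneous mixing matrix

    # Linear modeling: the observed channels are a linear combination of sources
    x = A @ s                                  # x = A s

    # Variance modeling: for uncorrelated zero-mean sources, the mixture power
    # in a channel is (approximately) the sum of the per-source image powers
    p_mix = np.mean(x[0] ** 2)
    p_img = np.mean((A[0, 0] * s[0]) ** 2) + np.mean((A[0, 1] * s[1]) ** 2)
    # p_mix and p_img agree up to a small sampling cross-term
    ```

    Linear-model methods (e.g. ICA) estimate A and invert it; variance-model methods (e.g. spectral template approaches) instead fit the additive power structure, which survives phase uncertainty.
    
    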

    Application of sound source separation methods to advanced spatial audio systems

    Full text link
    This thesis is related to the field of Sound Source Separation (SSS). It addresses the development and evaluation of these techniques for their application in the resynthesis of high-realism sound scenes by means of Wave Field Synthesis (WFS). Because the vast majority of audio recordings are preserved in two-channel stereo format, special up-converters are required to use advanced spatial audio reproduction formats, such as WFS. This is due to the fact that WFS needs the original source signals to be available in order to accurately synthesize the acoustic field inside an extended listening area. Thus, an object-based mixing is required. Source separation problems in digital signal processing are those in which several signals have been mixed together and the objective is to find out what the original signals were. Therefore, SSS algorithms can be applied to existing two-channel mixtures to extract the different objects that compose the stereo scene. Unfortunately, most stereo mixtures are underdetermined, i.e., there are more sound sources than audio channels. This condition makes the SSS problem especially difficult, and stronger assumptions have to be made, often related to the sparsity of the sources under some signal transformation. This thesis is focused on the application of SSS techniques to the spatial sound reproduction field. As a result, its contributions can be categorized within these two areas. First, two underdetermined SSS methods are proposed to deal efficiently with the separation of stereo sound mixtures. These techniques are based on a multi-level thresholding segmentation approach, which enables fast, unsupervised separation of sound sources in the time-frequency domain.
Although both techniques rely on the same clustering type, the features considered by each of them are related to different localization cues that enable separation of either instantaneous or real mixtures. Additionally, two post-processing techniques aimed at improving the isolation of the separated sources are proposed. The performance achieved by several SSS methods in the resynthesis of WFS sound scenes is afterwards evaluated by means of listening tests, paying special attention to the change observed in the perceived spatial attributes. Although the estimated sources are distorted versions of the original ones, the masking effects involved in their spatial remixing make artifacts less perceptible, which improves the overall assessed quality. Finally, some novel developments related to the application of time-frequency processing to source localization and enhanced sound reproduction are presented. Cobos Serrano, M. (2009). Application of sound source separation methods to advanced spatial audio systems [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/8969
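The multi-level thresholding segmentation idea can be sketched in its simplest two-class form: build a histogram of a localization cue (e.g. a panning index per T-F bin) and place a threshold between its modes, Otsu-style. This is a hypothetical toy illustration under assumed data, not the thesis's algorithm, which handles multiple levels and real mixtures.

```python
import numpy as np

def otsu_threshold(values, bins=64):
    """Single Otsu threshold on a 1-D cue histogram (the two-class special
    case of multi-level thresholding): maximize between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    p = hist / hist.sum()
    centers = 0.5 * (edges[:-1] + edges[1:])
    best_t, best_var = centers[0], -1.0
    for k in range(1, bins):
        w0, w1 = p[:k].sum(), p[k:].sum()
        if w0 == 0 or w1 == 0:
            continue
        mu0 = (p[:k] * centers[:k]).sum() / w0
        mu1 = (p[k:] * centers[k:]).sum() / w1
        between = w0 * w1 * (mu0 - mu1) ** 2   # between-class variance
        if between > best_var:
            best_var, best_t = between, centers[k]
    return best_t

# Panning cues of two sources concentrated around 0.2 and 0.8
rng = np.random.default_rng(0)
cues = np.concatenate([rng.normal(0.2, 0.05, 500),
                       rng.normal(0.8, 0.05, 500)])
t = otsu_threshold(cues)   # threshold lands between the two cue clusters
```

Because the thresholds come straight from the cue histogram, the segmentation needs no training and runs fast, which matches the "fast and unsupervised" claim above.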

    On The Positive Definiteness of Polarity Coincidence Correlation Coefficient Matrix

    Full text link
    Polarity coincidence correlator (PCC), when used to estimate the covariance matrix on an element-by-element basis, may not yield a positive semi-definite (PSD) estimate. Devlin et al. [1] claimed that element-wise PCC is not guaranteed to be PSD in dimensions p > 3 for real signals; however, no justification or proof was available on this issue. In this letter, it is proved that for real signals with p <= 3 and for complex signals with p <= 2, a PSD estimate is guaranteed. Counterexamples are presented for higher dimensions which yield invalid covariance estimates. Comment: IEEE Signal Processing Letters, Volume 15, pp. 73-76, 200
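    The element-wise PCC estimator and the PSD check at issue can be sketched numerically. This assumes the standard arcsine-law form of the estimator for zero-mean Gaussian signals (rho = sin(pi/2 * sign-coincidence rate)); per the letter's result, the p = 3 real case below should always come out PSD, while counterexamples exist only for higher dimensions.

    ```python
    import numpy as np

    def pcc_covariance(X):
        """Element-wise PCC correlation estimate for zero-mean Gaussian signals:
        rho_ij = sin(pi/2 * mean(sign(x_i) * sign(x_j)))  (arcsine law)."""
        S = np.sign(X)
        C = S @ S.T / X.shape[1]           # sign-coincidence (Gram) matrix, PSD
        return np.sin(np.pi / 2 * C)       # element-wise transform

    rng = np.random.default_rng(0)
    A = rng.normal(size=(3, 3))
    X = A @ rng.normal(size=(3, 5000))     # p = 3 correlated Gaussian signals
    R = pcc_covariance(X)
    eigmin = np.linalg.eigvalsh(R).min()   # PSD is guaranteed for real p <= 3
    ```

    The subtlety the letter resolves is that the sign-coincidence matrix C is PSD by construction (a scaled Gram matrix), but the element-wise sine transform preserves that property only up to the stated dimensions.
    
    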

    Blind source separation via independent and sparse component analysis with application to temporomandibular disorder

    Get PDF
    Blind source separation (BSS) addresses the problem of separating multichannel signals, observed by generally spatially separated sensors, into their constituent underlying sources. The passage of these sources through an unknown mixing medium results in these observed multichannel signals. This study focuses on BSS, with special emphasis on its application to temporomandibular joint disorder (TMD). TMD refers to all medical problems related to the temporomandibular joint (TMJ), which connects the lower jaw (mandible) and the temporal bone (skull). The overall objective of the work is to extract the two TMJ sound sources generated by the two TMJs from the bilateral recordings obtained from the auditory canals, so as to aid the clinician in diagnosis and in planning treatment policies. Firstly, the concept of 'variable tap length' is adopted in convolutive blind source separation. This relatively new concept has attracted attention in the field of adaptive signal processing, notably the least mean square (LMS) algorithm, but has not yet been introduced in the context of blind signal separation. The flexibility of the tap length of the proposed approach allows for the optimum tap length to be found, thereby mitigating computational complexity or catering for fractional delays arising in source separation. Secondly, a novel fixed point BSS algorithm based on Ferrante's affine transformation is proposed. Ferrante's affine transformation provides the freedom to select the eigenvalues of the Jacobian matrix of the fixed point function and thereby improves the convergence properties of the fixed point iteration. Simulation studies demonstrate the improved convergence of the proposed approach compared to the well-known fixed point FastICA algorithm. Thirdly, the underdetermined blind source separation problem is addressed using a filtering approach.
An extension of the FastICA algorithm is devised which exploits the disparity in the kurtoses of the underlying sources to estimate the mixing matrix, and thereafter achieves source recovery by employing the ℓ1-norm algorithm. Additionally, it is shown that FastICA can also be utilised to extract the sources. Furthermore, it is illustrated how this scenario is particularly suitable for the separation of TMJ sounds. Finally, estimation of fractional delays between the mixtures of the TMJ sources is proposed as a means for TMJ separation. The estimation of fractional delays is shown to simplify the source separation to a case of instantaneous BSS. The estimated delay then allows for an alignment of the TMJ mixtures, thereby overcoming a spacing constraint imposed by a well-known BSS technique, notably the DUET algorithm. The delay found from the TMJ bilateral recordings corroborates the range reported in the literature. Furthermore, TMJ source localisation is also addressed as an aid to the dental specialist. EThOS - Electronic Theses Online Service, United Kingdom.
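The kurtosis disparity that the FastICA extension exploits can be illustrated with the sample excess kurtosis, which is near zero for a Gaussian signal and large for sparse, click-like signals such as joint sounds. The signals below are synthetic stand-ins chosen for illustration, not TMJ recordings.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis: ~0 for Gaussian signals, > 0 for
    super-Gaussian (peaky/sparse) signals."""
    x = x - x.mean()
    return np.mean(x ** 4) / np.mean(x ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
gauss = rng.normal(size=20000)     # Gaussian: excess kurtosis near 0
peaky = rng.laplace(size=20000)    # Laplacian: excess kurtosis near 3
k_g, k_p = excess_kurtosis(gauss), excess_kurtosis(peaky)
```

A large gap between such kurtosis values is the disparity that lets the extended algorithm distinguish the underlying sources when estimating the mixing matrix.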

    Source Separation for Hearing Aid Applications

    Get PDF