9 research outputs found

    ÎČ-Divergence Nonnegative Matrix Factorization on Biomedical Blind Source Separation

    ÎČ-divergence has been studied for years, but it has yet to be explored thoroughly. In this paper, we propose nonnegative matrix factorization (NMF) with ÎČ-divergence for blind source separation (BSS) in the biomedical field. The proposed approach aims to separate normal heart sounds from normal lung sounds. Temporal codes and spectral bases were modelled for each separated source, and the method was applied to synthetic and real-life data using multiplicative update rules. In the experiment, the estimated and original sources were compared to evaluate the performance of various source separation algorithms within a general framework in which the original sources and the noise that perturbed the mixture were included.
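    The multiplicative update rules mentioned in this abstract follow the standard scheme for ÎČ-divergence NMF. The sketch below is a plain-NumPy illustration of that scheme applied to a magnitude spectrogram, not the authors' implementation; the function name, the random initialisation, and the parameter defaults are assumptions made for the example.

        import numpy as np

        def beta_nmf(V, rank, beta=1.0, n_iter=200, eps=1e-12, seed=0):
            # Illustrative beta-divergence NMF with multiplicative updates.
            # V    : nonnegative matrix (e.g. a magnitude spectrogram), shape (F, N)
            # beta : 2 -> Euclidean, 1 -> Kullback-Leibler, 0 -> Itakura-Saito
            rng = np.random.default_rng(seed)
            F, N = V.shape
            W = rng.random((F, rank)) + eps   # spectral basis
            H = rng.random((rank, N)) + eps   # temporal code (activations)
            for _ in range(n_iter):
                V_hat = W @ H + eps
                H *= (W.T @ (V * V_hat ** (beta - 2))) / (W.T @ V_hat ** (beta - 1) + eps)
                V_hat = W @ H + eps
                W *= ((V * V_hat ** (beta - 2)) @ H.T) / (V_hat ** (beta - 1) @ H.T + eps)
            return W, H

    Each separated source would then typically be reconstructed from its own subset of basis/activation pairs, for example by masking the mixture spectrogram with the ratio of that source's partial reconstruction to the full reconstruction W H.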

    Two-Microphone Separation of Speech Mixtures


    Monaural Speech Separation Based on Computational Auditory Scene Analysis and Objective Quality Assessment of Speech


    Statistical single channel source separation

    PhD Thesis. Single channel source separation (SCSS) is one of the most challenging fields in signal processing and has many significant applications. Unlike conventional SCSS methods, which are based on a linear instantaneous model, this research investigates the separation of a single channel for two types of mixture: the nonlinear instantaneous mixture and the linear convolutive mixture. For nonlinear SCSS of an instantaneous mixture, this research proposes a novel solution based on a two-stage process consisting of a Gaussianization transform, which efficiently compensates for the nonlinear distortion, followed by a maximum likelihood estimator that performs the source separation. For linear SCSS of a convolutive mixture, this research proposes new methods based on nonnegative matrix factorization that decompose a mixture into two-dimensional convolution factor matrices representing the spectral basis and the temporal code. The proposed factorization accounts for the convolutive mixing in the decomposition by introducing frequency-constrained parameters into the model. The method aims to separate the mixture into its constituent spectral-temporal source components while alleviating the effect of convolutive mixing. In addition, the family of Itakura-Saito divergences has been developed as a cost function, which brings the beneficial property of scale invariance. Two new statistical techniques are proposed: an Expectation-Maximisation (EM) based algorithmic framework that maximizes the log-likelihood of the mixed signal, and a maximum a posteriori approach that maximizes the joint probability of the mixed signal using multiplicative update rules. To further improve this work, a novel method that incorporates adaptive sparseness into the solution has been proposed to resolve the ambiguity and hence improve the algorithm's performance. The theoretical foundation of the proposed solutions has been rigorously developed and discussed in detail. Results have concretely shown the effectiveness of all the algorithms presented in this thesis in separating mixed signals in a single channel, outperforming other available methods. Universiti Teknikal Malaysia Melaka (UTeM), Ministry of Higher Education of Malaysia
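    As a small illustration of the scale-invariance property attributed to the Itakura-Saito divergence above, the snippet below (a generic NumPy sketch, not code from the thesis) computes the divergence and checks that rescaling both arguments by the same factor leaves it essentially unchanged; the eps guard is an assumption added for numerical safety.

        import numpy as np

        def itakura_saito(V, V_hat, eps=1e-12):
            # Elementwise Itakura-Saito divergence d_IS(V || V_hat), summed over all entries.
            R = (V + eps) / (V_hat + eps)
            return np.sum(R - np.log(R) - 1.0)

        rng = np.random.default_rng(1)
        V = rng.random((4, 6)) + 0.1
        V_hat = rng.random((4, 6)) + 0.1
        # Scale invariance: d_IS(c*V || c*V_hat) == d_IS(V || V_hat) for any c > 0,
        # so low-energy time-frequency bins carry as much weight as high-energy ones.
        print(itakura_saito(V, V_hat), itakura_saito(10 * V, 10 * V_hat))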

    Locating and extracting acoustic and neural signals

    This dissertation presents innovative methodologies for locating, extracting, and separating multiple incoherent sound sources in three-dimensional (3D) space, and applications of the time reversal (TR) algorithm to pinpoint the hyperactive neural activities inside the brain auditory structure that are correlated with tinnitus pathology. Specifically, an acoustic-modeling-based method is developed for locating arbitrary and incoherent sound sources in 3D space in real time using a minimal number of microphones, and the Point Source Separation (PSS) method is developed for extracting target signals from directly measured mixed signals. Combining these two approaches leads to a novel technology known as Blind Sources Localization and Separation (BSLS), which enables one to locate multiple incoherent sound signals in 3D space and separate the original individual sources simultaneously, based on the directly measured mixed signals. These technologies have been validated through numerical simulations and experiments conducted in various non-ideal environments with non-negligible, unspecified sound reflections and reverberation as well as interference from random background noise. Another innovation presented in this dissertation concerns applications of the TR algorithm to pinpoint the exact locations of hyperactive neurons in the brain auditory structure that are directly correlated with the tinnitus perception. Benchmark tests conducted on normal rats have confirmed the localization results provided by the TR algorithm. Results demonstrate that the spatial resolution of this source localization can be as high as the micrometer level. This high-precision localization may lead to a paradigm shift in tinnitus diagnosis, which may in turn produce a more cost-effective treatment for tinnitus than any of the existing ones.
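    The abstract does not give the details of the localization algorithm, but a free-field delay-and-sum scan is a common toy analogue of time-reversal focusing: each candidate grid point is scored by undoing the propagation delays from that point to the microphones and summing the channels coherently. The sketch below is only that toy analogue, with illustrative names and a wrap-around shift simplification; it is not the dissertation's BSLS or TR implementation.

        import numpy as np

        def focusing_map(signals, mic_pos, grid, fs, c=343.0):
            # signals : (M, T) microphone recordings
            # mic_pos : (M, 3) microphone coordinates in metres
            # grid    : (G, 3) candidate source positions; returns an energy score per point
            M, T = signals.shape
            energy = np.zeros(len(grid))
            for g, p in enumerate(grid):
                delays = np.linalg.norm(mic_pos - p, axis=1) / c   # propagation times (s)
                shifts = np.round(delays * fs).astype(int)
                summed = np.zeros(T)
                for m in range(M):
                    # undo each channel's delay (np.roll wraps around, acceptable for a toy
                    # where delays are short relative to the signal length)
                    summed += np.roll(signals[m], -shifts[m])
                energy[g] = np.sum(summed ** 2)
            return energy   # the peak of the map estimates the source position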

    ContribuiçÔes ao reconhecimento automåtico de fala robusto (Contributions to robust automatic speech recognition)

    Doctoral thesis - Universidade Federal de Santa Catarina, Centro TecnolĂłgico, Programa de PĂłs-Graduação em Engenharia de Automação e Sistemas, FlorianĂłpolis, 2010. Automatic Speech Recognition (ASR) is a fascinating and complex area. For decades, research focused on ASR for limited vocabularies, using techniques that required high-performance computing to process data produced in quiet laboratory environments. From the mid-1980s onward, speech processing technology advanced with the use of Hidden Markov Models (HMMs) and major advances in programming techniques and computer processing, achieving recognition rates in quiet environments close to 100%. In order to put ASR systems to work in real life, several years of intensive research have been, and continue to be, devoted to robust speech recognition. As a result, applications such as DSR (Distributed Speech Recognition), among others, appeared on the market. To achieve performance similar to that of the human ear in noisy environments, however, such systems are still the focus of much research.
    This work studies robust automatic speech recognition systems, analysing the behavior of four types of noise (metal cutting, cars in front of a tunnel, cars inside the tunnel, and a crowd of children), recorded in different environments, for the evaluation and construction of noisy databases. Two databases were developed, the main contributions being the methodology for their construction and the process of analysis and evaluation of the data involved. Furthermore, we present the mathematical development of an algorithm that provides the numerical solution to a difficult-to-solve three-parameter logistic function, used to model the behavior of the WI007 and WI008 systems employed here. A method for the initial logistic adjustment (Mail) of the Pesq vs. TA curves used to evaluate the behavior of the adopted ASR system is also one of the contributions of this work. As one result of the proposed methodology, a significant improvement in the recognition rate of WI007 for the metal-cutting noise was obtained, equal to 3.69% on average.
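    The abstract mentions fitting a three-parameter logistic to the Pesq vs. TA curves; the exact parameterization and data are not given here, so the snippet below uses a generic three-parameter logistic and made-up points purely to show how such a fit can be obtained numerically with scipy.

        import numpy as np
        from scipy.optimize import curve_fit

        def logistic3(x, a, b, c):
            # Generic three-parameter logistic: upper asymptote a, slope b, midpoint c.
            return a / (1.0 + np.exp(-b * (x - c)))

        # Hypothetical Pesq scores and recognition rates (TA, %); not data from the thesis.
        pesq = np.array([1.2, 1.8, 2.3, 2.9, 3.4, 3.9, 4.3])
        ta = np.array([12.0, 28.0, 47.0, 68.0, 82.0, 91.0, 96.0])

        params, _ = curve_fit(logistic3, pesq, ta, p0=[100.0, 2.0, 2.5])
        print("fitted (a, b, c):", params)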