
    Statistical single channel source separation

    PhD Thesis. Single channel source separation (SCSS) is one of the most challenging fields in signal processing and has many significant applications. Unlike conventional SCSS methods, which are based on a linear instantaneous model, this research investigates the separation of a single channel for two types of mixture: the nonlinear instantaneous mixture and the linear convolutive mixture. For nonlinear SCSS of instantaneous mixtures, this research proposes a novel solution based on a two-stage process consisting of a Gaussianization transform, which efficiently compensates for the nonlinear distortion, followed by a maximum likelihood estimator that performs the source separation. For linear SCSS of convolutive mixtures, this research proposes new methods based on nonnegative matrix factorization which decompose a mixture into two-dimensional convolution factor matrices representing the spectral basis and the temporal code. The proposed factorization accounts for the convolutive mixing in the decomposition by introducing frequency-constrained parameters into the model. The method aims to separate the mixture into its constituent spectral-temporal source components while alleviating the effect of convolutive mixing. In addition, the family of Itakura-Saito divergences is developed as a cost function, which brings the beneficial property of scale invariance. Two new statistical techniques are proposed: an Expectation-Maximisation (EM) based algorithmic framework which maximises the log-likelihood of the mixed signal, and a maximum a posteriori approach which maximises the joint probability of the mixed signal using multiplicative update rules. To further improve this work, a novel method that incorporates adaptive sparseness into the solution is proposed to resolve the ambiguity and hence improve the algorithm's performance. The theoretical foundation of the proposed solutions is rigorously developed and discussed in detail. Results concretely show the effectiveness of all the proposed algorithms in separating mixed signals in a single channel, outperforming other available methods. Universiti Teknikal Malaysia Melaka (UTeM), Ministry of Higher Education of Malaysia
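    The abstract names three concrete ingredients that can be illustrated together: nonnegative matrix factorization, the Itakura-Saito (IS) divergence d_IS(x|y) = x/y - log(x/y) - 1 (scale-invariant, since d_IS(c*x|c*y) = d_IS(x|y)), and multiplicative update rules. The sketch below is a minimal, generic IS-NMF of a power spectrogram V ~ WH; it is not the thesis's two-dimensional convolutive model, and the function name and defaults are illustrative assumptions.

        import numpy as np

        def is_nmf(V, n_components, n_iter=200, eps=1e-12):
            # Minimal Itakura-Saito NMF sketch: V (F x T power spectrogram) ~= W @ H.
            # Standard multiplicative updates for the IS divergence (beta-divergence with beta = 0).
            F, T = V.shape
            rng = np.random.default_rng(0)
            W = rng.random((F, n_components)) + eps
            H = rng.random((n_components, T)) + eps
            for _ in range(n_iter):
                V_hat = W @ H + eps
                W *= (((V_hat ** -2) * V) @ H.T) / ((V_hat ** -1) @ H.T + eps)
                V_hat = W @ H + eps
                H *= (W.T @ ((V_hat ** -2) * V)) / (W.T @ (V_hat ** -1) + eps)
            return W, H

    Because the IS divergence penalises relative rather than absolute errors, low-energy spectrogram entries are fitted as carefully as high-energy ones, which is the scale-invariance benefit the abstract refers to.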

    Proceedings of the second "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST'14)

    The implicit objective of the biennial "international Traveling Workshop on Interactions between Sparse models and Technology" (iTWIST) is to foster collaboration between international scientific teams by disseminating ideas through both specific oral/poster presentations and free discussions. For its second edition, the iTWIST workshop took place in the medieval and picturesque town of Namur in Belgium, from Wednesday August 27th till Friday August 29th, 2014. The workshop was conveniently located in "The Arsenal" building within walking distance of both hotels and town center. iTWIST'14 has gathered about 70 international participants and has featured 9 invited talks, 10 oral presentations, and 14 posters on the following themes, all related to the theory, application and generalization of the "sparsity paradigm": Sparsity-driven data sensing and processing; Union of low dimensional subspaces; Beyond linear and convex inverse problem; Matrix/manifold/graph sensing/processing; Blind inverse problems and dictionary learning; Sparsity and computational neuroscience; Information theory, geometry and randomness; Complexity/accuracy tradeoffs in numerical methods; Sparsity? What's next?; Sparse machine learning and inference. Comment: 69 pages, 24 extended abstracts, iTWIST'14 website: http://sites.google.com/site/itwist1

    Robust speech recognition with spectrogram factorisation

    Communication by speech is intrinsic for humans. Since the breakthrough of mobile devices and wireless communication, digital transmission of speech has become ubiquitous. Similarly, the distribution and storage of audio and video data have increased rapidly. However, despite being technically capable of recording and processing audio signals, only a fraction of digital systems and services are actually able to work with spoken input, that is, to operate on the lexical content of speech. One persistent obstacle to the practical deployment of automatic speech recognition systems is inadequate robustness against noise and other interferences, which regularly corrupt signals recorded in real-world environments. Speech and diverse noises are both complex signals, which are not trivially separable. Despite decades of research and a multitude of different approaches, the problem has not been solved to a sufficient extent. In particular, the mathematically ill-posed problem of separating multiple sources from a single-channel input requires advanced models and algorithms to be solvable. One promising path is to use a composite model of long-context atoms to represent a mixture of non-stationary sources based on their spectro-temporal behaviour. Algorithms derived from the family of non-negative matrix factorisations have been applied to such problems to separate and recognise individual sources such as speech. This thesis describes a set of tools developed for non-negative modelling of audio spectrograms, especially involving speech and real-world noise sources. An overview is provided of the complete framework, starting from model and feature definitions, advancing to factorisation algorithms, and finally describing different routes for separation, enhancement, and recognition tasks. Current issues and their potential solutions are discussed both theoretically and from a practical point of view. The included publications describe factorisation-based recognition systems, which have been evaluated on publicly available speech corpora in order to determine the efficiency of various separation and recognition algorithms. Several variants and system combinations proposed in the literature are also discussed. The work covers a broad span of factorisation-based system components, which together aim at providing a practically viable solution to robust processing and recognition of speech in everyday situations.
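    As one way to make factorisation-based separation concrete, the sketch below shows a generic supervised NMF front-end under the Kullback-Leibler divergence: speech and noise dictionaries (assumed to be learned beforehand from clean material) are kept fixed, only the activations are updated on the noisy magnitude spectrogram, and the speech estimate is obtained with a Wiener-like soft mask. The function name and the choice of divergence are illustrative assumptions, not the specific systems described in the included publications.

        import numpy as np

        def separate_speech(V, W_speech, W_noise, n_iter=100, eps=1e-12):
            # V: noisy magnitude spectrogram (F x T); W_speech, W_noise: fixed basis (atom) matrices.
            W = np.hstack([W_speech, W_noise])
            K_s = W_speech.shape[1]
            rng = np.random.default_rng(0)
            H = rng.random((W.shape[1], V.shape[1])) + eps
            for _ in range(n_iter):
                V_hat = W @ H + eps
                # multiplicative update for the activations H under the KL divergence
                H *= (W.T @ (V / V_hat)) / (W.T @ np.ones_like(V) + eps)
            speech_model = W[:, :K_s] @ H[:K_s]
            mask = speech_model / (W @ H + eps)   # Wiener-like soft mask
            return mask * V                       # enhanced speech magnitude

    The masked magnitude, or features derived from the activations H themselves, can then be passed to a conventional recogniser, which is the general route such factorisation-based front-ends take.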

    Proceedings of the Detection and Classification of Acoustic Scenes and Events 2017 Workshop (DCASE2017)


    Dissimilarity-based multiple instance classification and dictionary learning for bioacoustic signal recognition

    In this thesis, two promising and actively researched fields from pattern recognition (PR) and digital signal processing (DSP) are studied, adapted and applied to the automated recognition of bioacoustic signals: (i) learning from weakly-labeled data, and (ii) dictionary-based decomposition. The document begins with an overview of the current methods and techniques applied to the automated recognition of bioacoustic signals, and an analysis of the impact of this technology at global and local scales. This is followed by a detailed description of my research on two approaches from the above-mentioned fields, multiple instance learning (MIL) and dictionary learning (DL), as solutions to particular challenges in bioacoustic data analysis. The most relevant contributions and findings of this thesis are the following: 1) the proposal of an unsupervised segmentation method for audio birdsong recordings that improves species classification, with the benefit of an easier implementation since no manual handling of recordings is required; 2) the confirmation that, in the analyzed audio datasets, appropriate dissimilarity measures are those which capture most of the overall differences between bags, such as the modified Hausdorff distance and the mean minimum distance; 3) the adoption of dissimilarity adaptation techniques for the enhancement of dissimilarity-based multiple instance classification, along with the potential further enhancement of the classification performance by building dissimilarity spaces and increasing training set sizes; 4) the proposal of a framework for solving MIL problems by using the one nearest neighbor (1-NN) classifier; 5) a novel convolutive DL method for learning a representative dictionary from a collection of multiple-bird audio recordings; 6) the successful application of this DL method to spectrogram denoising and species classification; and 7) an efficient online version of the DL method that outperforms other state-of-the-art batch and online methods in both computational cost and quality of the discovered patterns.
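    Two of the bag-level dissimilarities highlighted above, the modified Hausdorff distance and the mean minimum distance, together with their use in a one nearest neighbour classifier, can be sketched roughly as follows (a generic illustration with assumed array shapes, not the thesis implementation):

        import numpy as np

        def directed_mean_min(bag_a, bag_b):
            # mean, over instances of bag_a, of the distance to the nearest instance of bag_b
            d = np.linalg.norm(bag_a[:, None, :] - bag_b[None, :, :], axis=-1)
            return d.min(axis=1).mean()

        def modified_hausdorff(bag_a, bag_b):
            # Dubuisson-Jain modified Hausdorff distance between two bags (n_i x n_features arrays)
            return max(directed_mean_min(bag_a, bag_b), directed_mean_min(bag_b, bag_a))

        def mean_min_distance(bag_a, bag_b):
            # symmetrised mean minimum distance
            return 0.5 * (directed_mean_min(bag_a, bag_b) + directed_mean_min(bag_b, bag_a))

        def one_nn_predict(query_bag, train_bags, train_labels, dissimilarity=modified_hausdorff):
            # 1-NN classification of a bag directly in the dissimilarity space
            dists = [dissimilarity(query_bag, b) for b in train_bags]
            return train_labels[int(np.argmin(dists))]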

    Object-based Modeling of Audio for Coding and Source Separation

    This thesis studies several data decomposition algorithms for obtaining an object-based representation of an audio signal. The estimation of the representation parameters is coupled with audio-specific criteria, such as the spectral redundancy, sparsity, perceptual relevance and spatial position of sounds. The objective is to obtain an audio signal representation that is composed of meaningful entities, called audio objects, that reflect the properties of real-world sound objects and events. The estimation of the object-based model is based on magnitude spectrogram redundancy, using non-negative matrix factorization with extensions to multichannel and complex-valued data. The benefits of working with object-based audio representations over conventional time-frequency bin-wise processing are studied. The two main applications of the object-based audio representations proposed in this thesis are spatial audio coding and sound source separation from multichannel microphone array recordings. In the proposed spatial audio coding algorithm, the audio objects are estimated from the multichannel magnitude spectrogram. The audio objects are used for recovering the content of each original channel from a single downmixed signal, using time-frequency filtering. The perceptual relevance of modeling the audio signal is considered in the estimation of the parameters of the object-based model, and the sparsity of the model is utilized in encoding its parameters. Additionally, a quantization of the model parameters is proposed that reflects the perceptual relevance of each quantized element. The proposed object-based spatial audio coding algorithm is evaluated via listening tests, comparing the overall perceptual quality to that of conventional time-frequency block-wise methods at the same bitrates. The proposed approach is found to produce comparable coding efficiency while providing additional functionality via the object-based coding domain representation, such as the blind separation of the mixture of sound sources in the encoded channels. For sound source separation from multichannel audio recorded by a microphone array, a method combining an object-based magnitude model and spatial covariance matrix estimation is considered. A direction-of-arrival-based model for the spatial covariance matrices of the sound sources is proposed. Unlike the conventional approaches, the estimation of the parameters of the proposed spatial covariance matrix model ensures a spatially coherent solution for the spatial parameterization of the sound sources. The separation quality is measured with objective criteria, and the proposed method is shown to improve over state-of-the-art sound source separation methods on recordings made with a small microphone array.
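    The channel-recovery step described above, recovering each original channel from a single downmix by time-frequency filtering with object models, can be sketched generically as follows; the factor names W, H and the per-channel gain matrix G are assumptions made for illustration, and the actual codec additionally handles perceptual weighting and parameter quantisation, which this sketch omits.

        import numpy as np

        def upmix_from_downmix(D_stft, W, H, G, eps=1e-12):
            # D_stft: complex STFT of the downmix (F x T)
            # W (F x K), H (K x T): object spectra and activations; G (C x K): per-channel object gains
            models = np.einsum('ck,fk,kt->cft', G, W, H)               # per-channel magnitude models
            masks = models / (models.sum(axis=0, keepdims=True) + eps) # Wiener-like time-frequency masks
            return masks * D_stft[None, :, :]                          # C recovered channel STFTs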

    Analyse de séries temporelles d’images à moyenne résolution spatiale : reconstruction de profils de LAI, démélangeage : application pour le suivi de la végétation sur des images MODIS

    This PhD dissertation is concerned with time series analysis for medium spatial resolution (MSR) remote sensing images. The main advantage of MSR data is their high temporal rate, which makes it possible to monitor land use. However, two main problems arise with such data. First, because of cloud coverage and bad acquisition conditions, the resulting time series are often corrupted and not directly exploitable. Second, pixels in medium spatial resolution images are often "mixed", in the sense that the spectral response is a combination of the responses of "pure" elements. These two problems are addressed in this PhD. First, we propose a data assimilation technique able to recover consistent time series of Leaf Area Index (LAI) from corrupted MODIS sequences. To this end, a plant growth model, namely GreenLab, is used as a dynamical constraint. Second, we propose a new and efficient unmixing technique for time series. It is based in particular on the use of "elastic" kernels able to properly compare time series shifted in time or of various lengths. Experimental results on both synthetic and real data demonstrate the efficiency of the proposed methodologies.
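    The "elastic" comparison of time series that are shifted in time or of different lengths can be illustrated with a simple dynamic-time-warping kernel; this is only one common instance of an elastic kernel, assumed here for illustration (it is not necessarily the kernel used in the thesis, and exp(-DTW) is not guaranteed to be positive definite).

        import numpy as np

        def dtw_distance(x, y):
            # plain dynamic time warping between two 1-D series of possibly different lengths
            n, m = len(x), len(y)
            D = np.full((n + 1, m + 1), np.inf)
            D[0, 0] = 0.0
            for i in range(1, n + 1):
                for j in range(1, m + 1):
                    cost = abs(x[i - 1] - y[j - 1])
                    D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
            return D[n, m]

        def elastic_kernel(x, y, sigma=1.0):
            # Gaussian-type kernel on the DTW distance, tolerant to shifts and length differences
            return np.exp(-dtw_distance(x, y) / sigma)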

    Speech Recognition

    Chapters in the first part of the book cover all the essential speech processing techniques for building robust automatic speech recognition systems: the representation of speech signals and the methods for speech-feature extraction, acoustic and language modeling, efficient algorithms for searching the hypothesis space, and multimodal approaches to speech recognition. The last part of the book is devoted to other speech processing applications that can use the information from automatic speech recognition for speaker identification and tracking, for prosody modeling in emotion-detection systems, and in other applications able to operate in real-world environments, such as mobile communication services and smart homes.