
    A low-cost variational-Bayes technique for merging mixtures of probabilistic principal component analyzers

    Mixtures of probabilistic principal component analyzers (MPPCA) have proven effective for modeling high-dimensional data sets living on nonlinear manifolds. Briefly stated, they conduct mixture model estimation and dimensionality reduction through a single process. This paper makes two contributions: first, we disclose a Bayesian technique for estimating such mixture models. Then, assuming several MPPCA models are available, we address the problem of aggregating them into a single MPPCA model, which should be as parsimonious as possible. We disclose in detail how this can be achieved in a cost-effective way, without sampling or access to data, but solely requiring mixture parameters. The proposed approach is based on a novel variational-Bayes scheme operating over model parameters. Numerous experimental results and discussion are provided.
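
    A minimal sketch of the density such a model defines may help: each MPPCA component is a Gaussian whose covariance has the low-rank-plus-noise form W W^T + sigma^2 I. The parameter names below (pi, mu, W, sigma2) are illustrative, not the paper's notation, and the paper's variational-Bayes estimation scheme is not reproduced.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mppca_logpdf(x, pi, mu, W, sigma2):
    """Log-density of a mixture of probabilistic PCA models.

    Component k is Gaussian with mean mu[k] and the low-rank-plus-noise
    covariance C_k = W[k] @ W[k].T + sigma2[k] * I (illustrative names).
    """
    d = x.shape[-1]
    log_terms = []
    for k in range(len(pi)):
        C = W[k] @ W[k].T + sigma2[k] * np.eye(d)  # PPCA covariance
        log_terms.append(np.log(pi[k]) + multivariate_normal.logpdf(x, mu[k], C))
    return np.logaddexp.reduce(log_terms, axis=0)  # log sum_k pi_k N(x; mu_k, C_k)
```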

    Component-level aggregation of probabilistic PCA mixtures using variational-Bayes

    Technical report; an extended version of our ICPR'2010 paper. This paper proposes a technique for aggregating mixtures of probabilistic principal component analyzers, which are a powerful probabilistic generative model for coping with high-dimensional, nonlinear data sets. Aggregation is carried out through Bayesian estimation with a specific prior and an original variational scheme. We demonstrate how such models may be aggregated by accessing model parameters only, rather than the original data, which can be advantageous for learning from distributed data sets. Experimental results illustrate the effectiveness of the proposal.
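
    As a point of reference for the aggregation problem, the sketch below shows the naive parameter-only baseline: components from several MPPCA models are pooled and their weights rescaled. This is only the starting point that a reduction scheme such as the authors' variational one would then compress; the dictionary layout is hypothetical.

```python
def pool_mppca(models, model_weights):
    """Pool several MPPCA parameter sets into one larger mixture.

    models: list of dicts with keys 'pi', 'mu', 'W', 'sigma2' (hypothetical).
    model_weights: relative weight of each source model.
    """
    pooled = {"pi": [], "mu": [], "W": [], "sigma2": []}
    for m, w in zip(models, model_weights):
        for k in range(len(m["pi"])):
            pooled["pi"].append(w * m["pi"][k])   # rescale by model weight
            pooled["mu"].append(m["mu"][k])
            pooled["W"].append(m["W"][k])
            pooled["sigma2"].append(m["sigma2"][k])
    total = sum(pooled["pi"])
    pooled["pi"] = [p / total for p in pooled["pi"]]  # renormalise weights
    return pooled
```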

    Learning Density Models via Structured Latent Variables

    As a principal approach to machine learning and cognitive science, the probabilistic framework has been continuously developed both theoretically and practically. Learning a probabilistic model can be thought of as inferring plausible models to explain observed data. The learning process uses random variables as building blocks held together by probabilistic relationships. The key idea behind latent variable models is to introduce latent variables that reveal data structures and capture underlying features describing the real-world data. Classical approaches employ shallow architectures, including latent feature models and finite mixtures of latent variable models, and require assumptions about the form, structure, and distribution of the data. Since shallow forms may not describe data structures sufficiently, new types of latent structures have been developed within the probabilistic framework, sparking three main research directions: infinite latent feature models, mixtures of mixture models, and deep models. This dissertation summarises our work advancing the state of the art in both the classical and the emerging areas. In the first block, a finite latent variable model with parametric priors is presented for clustering and is further extended into a two-layer mixture model for discrimination. These models embed dimensionality reduction in their learning task by designing a latent structure called the common loading. Referred to as joint learning models, they attain a low-dimensional space that better matches the learning task, with the parameters of the low-dimensional space and the model optimised simultaneously. However, these joint learning models must assume fixed numbers of features and mixture components, which are normally tuned and searched by trial and error. In general, fixing more parameters makes inference simpler, but it also limits model flexibility, and false assumptions can even lead to incorrect inferences from the data; a richer model reduces the number of assumptions required. Therefore, an infinite tri-factorisation structure with non-parametric priors is proposed in the second block. This model can automatically determine an optimal number of features and leverage the interrelation between data and features. In the final block, we show how to promote shallow latent structures to deep structures that handle richer structured data. This part includes two tasks: a layer-wise model and a deep autoencoder-based model. In a deep density model, the knowledge of cognitive agents can be modelled using more complex probability distributions, while inference and parameter computation remain straightforward thanks to a greedy layer-wise algorithm. The deep autoencoder-based joint learning model is trained end-to-end, does not require pre-training of the autoencoder network, and can be optimised by standard backpropagation without maximum a posteriori inference. Deep generative models are much more efficient than their shallow counterparts for unsupervised and supervised density learning tasks, and they can be developed and used in various practical applications.
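
    The distinctive common-loading structure can be sketched as a latent Gaussian mixture pushed through a single shared loading matrix, so all components live in one common low-dimensional subspace. The sketch below assumes this reading; the names (pi, mu_z, Sigma_z, A, psi) are illustrative, not the dissertation's notation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def common_loading_logpdf(x, pi, mu_z, Sigma_z, A, psi):
    """Density of x = A @ z + eps, z drawn from a latent Gaussian mixture.

    Component k in data space is N(A @ mu_z[k], A @ Sigma_z[k] @ A.T + diag(psi)),
    with a single loading matrix A shared by all components (hypothetical names).
    """
    log_terms = []
    for k in range(len(pi)):
        mean = A @ mu_z[k]                          # latent mean mapped to data space
        cov = A @ Sigma_z[k] @ A.T + np.diag(psi)   # shared-loading covariance
        log_terms.append(np.log(pi[k]) + multivariate_normal.logpdf(x, mean, cov))
    return np.logaddexp.reduce(log_terms, axis=0)
```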

    Unsupervised methods for speaker diarization

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 93-95). Given a stream of unlabeled audio data, speaker diarization is the process of determining "who spoke when." We propose a novel approach to solving this problem by taking advantage of the effectiveness of factor analysis as a front-end for extracting speaker-specific features and exploiting the inherent variabilities in the data through the use of unsupervised methods. Upon initial evaluation, our system achieves state-of-the-art results of 0.9% Diarization Error Rate in the diarization of two-speaker telephone conversations. The approach is then generalized to the problem of K-speaker diarization, for which we take measures to address issues of data sparsity and experiment with the use of the von Mises-Fisher distribution for clustering on a unit hypersphere. Our extended system performs competitively on the diarization of conversations involving two or more speakers. Finally, we present promising initial results obtained from applying variational inference on our front-end speaker representation to estimate the unknown number of speakers in a given utterance. By Stephen Shum.
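
    For clustering on the unit hypersphere, spherical k-means is a common lightweight stand-in for full von Mises-Fisher mixture estimation: assignments use cosine similarity and centroids are renormalised mean directions. The sketch below is illustrative, not the thesis's exact system.

```python
import numpy as np

def spherical_kmeans(X, k, n_iter=50, seed=0):
    """Cluster unit-norm rows of X (n, d); returns labels and unit centroids."""
    rng = np.random.default_rng(seed)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)   # project onto unit sphere
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = np.argmax(X @ centroids.T, axis=1)    # cosine-similarity assignment
        for j in range(k):
            members = X[labels == j]
            if len(members):
                m = members.sum(axis=0)
                centroids[j] = m / np.linalg.norm(m)   # renormalised mean direction
    return labels, centroids
```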

    Modeling of mutual dependencies

    Data analysis means applying computational models to large collections of data, such as video signals, text collections, or measurements of gene activities in human cells. Unsupervised or exploratory data analysis refers to a subtask of data analysis in which the goal is to find novel knowledge based on the data alone. A central challenge in unsupervised data analysis is separating relevant from irrelevant information. In this thesis, novel solutions for focusing on the more relevant findings are presented. Measurement noise is one source of irrelevant information. If we have several measurements of the same objects, the noise can be suppressed by averaging over the measurements. Simple averaging is, however, only possible when the measurements share a common representation. In this thesis, we show how irrelevant information can be suppressed or ignored also in cases where the measurements come from different kinds of sensors or sources, such as video and audio recordings of the same scene. For combining the measurements, we use the mutual dependencies between them. Measures of dependency, such as mutual information, characterize commonalities between two sets of measurements. Two measurements can hence be combined to reduce irrelevant variation by finding new representations for the objects such that the representations are maximally dependent. The combination is optimal under the assumption that what is in common between the measurements is more relevant than information specific to any one of the sources. Several practical models for the task are introduced. In particular, novel Bayesian generative models, including a Bayesian version of the classical method of canonical correlation analysis, are given. Bayesian modeling is an especially well-justified approach to learning from small data sets; generative models can hence extract dependencies more reliably in, for example, medical applications, where obtaining a large number of samples is difficult. Novel non-Bayesian models are also presented: dependent component analysis finds linear projections which capture more general dependencies than earlier methods. Mutual dependencies can also be used for supervising traditional unsupervised learning methods. The learning metrics principle describes how a new distance metric focusing on relevant information can be derived based on the dependency between the measurements and a supervising signal. In this thesis, the approximations and optimization methods required for using the learning metrics principle are improved.
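
    A small sketch of classical canonical correlation analysis, the non-Bayesian starting point here, may clarify the idea of maximally dependent representations: two views generated from a shared signal are projected so that the projections correlate strongly. The scikit-learn usage and toy data below are illustrative assumptions, not the thesis's Bayesian variant.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

rng = np.random.default_rng(0)
shared = rng.normal(size=(200, 2))                  # relevant, shared signal
X = shared @ rng.normal(size=(2, 5)) + 0.5 * rng.normal(size=(200, 5))
Y = shared @ rng.normal(size=(2, 4)) + 0.5 * rng.normal(size=(200, 4))

cca = CCA(n_components=2)
Xc, Yc = cca.fit_transform(X, Y)                    # maximally dependent projections
corr = [np.corrcoef(Xc[:, i], Yc[:, i])[0, 1] for i in range(2)]
print("canonical correlations:", np.round(corr, 2))
```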

    Nonparametric Bayesian Models

    The analysis of real-world problems often requires robust and flexible models that can accurately represent the structure in the data. Nonparametric Bayesian priors allow the construction of such models, which can be used for complex real-world data. Nonparametric models, despite their name, can be defined as models that have infinitely many parameters. This thesis is about two types of nonparametric models. The first type is the latent class model (i.e. a mixture model) with infinitely many classes, which we construct using Dirichlet process mixtures (DPM). The second is the discrete latent feature model with infinitely many features, for which we use the Indian buffet process (IBP), a generalization of the DPM. Analytical inference is not possible in the models discussed in this thesis. The use of conjugate priors can often make inference somewhat more tractable, but for a given model the family of conjugate priors may not always be rich enough. Methodologically, this thesis relies on Markov chain Monte Carlo (MCMC) techniques for inference, especially those which can be used in the absence of conjugacy. Chapter 2 introduces the basic terminology and notation used in the thesis. Chapter 3 presents the Dirichlet process (DP) and some infinite latent class models which use the DP as a prior. We first summarize different approaches for defining the DP, and describe several established MCMC algorithms for inference in DPM models. The Dirichlet process mixture of Gaussians (DPMoG) model has been extensively used for density estimation. We present an empirical comparison of conjugate and conditionally conjugate priors in the DPMoG, demonstrating that the latter can give better density estimates without significant additional computational cost. The mixtures of factor analyzers (MFA) model allows data to be modeled as a mixture of Gaussians with a reduced parametrization. We present the formulation of a nonparametric form of the MFA model, the Dirichlet process MFA (DPMFA). We utilize the DPMFA for clustering the action potentials of different neurons from extracellular recordings, a problem known as spike sorting. Chapter 4 presents the IBP and some infinite latent feature models which use the IBP as a prior. The IBP is a distribution over binary matrices with infinitely many columns. We describe different approaches for defining the distribution and present new MCMC techniques that can be used for inference in models which use it as a prior. Empirical results on a conjugate model are presented, showing that the new methods perform as well as the established method of Gibbs sampling, but without the requirement for conjugacy. We demonstrate the performance of a non-conjugate IBP model by successfully learning the latent features of handwritten digits. Finally, we formulate a nonparametric version of the elimination-by-aspects (EBA) choice model using the IBP, and show that it can make accurate predictions about people's choices in a paired comparison task.
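
    The Dirichlet process prior over partitions can be illustrated with its Chinese-restaurant-process form: item n joins an existing cluster with probability proportional to the cluster's size, or starts a new one with probability proportional to the concentration alpha. The sketch below only illustrates the prior; none of the thesis's MCMC samplers is shown.

```python
import numpy as np

def crp_partition(n, alpha, seed=0):
    """Sample a partition of n items from the Chinese restaurant process."""
    rng = np.random.default_rng(seed)
    counts = []                          # customers per table (cluster sizes)
    labels = []
    for _ in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()             # P(existing table k) ~ counts[k], P(new) ~ alpha
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):
            counts.append(1)             # open a new table (new mixture component)
        else:
            counts[k] += 1
        labels.append(k)
    return labels

print(crp_partition(20, alpha=1.0))      # e.g. a random partition of 20 items
```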

    Acquisition and distribution of synergistic reactive control skills

    Learning from demonstration is an efficient way to attain a new skill. In the context of autonomous robots, using a demonstration to teach a robot accelerates the robot learning process significantly. It helps to identify feasible solutions as starting points for future exploration, or to avoid actions that lead to failure. But the acquisition of pertinent observations is predicated on first segmenting the data into meaningful sequences. These segments form the basis for learning models capable of recognising future actions and reconstructing the motion to control a robot. Furthermore, learning algorithms for generative models are generally not tuned to produce stable trajectories and suffer from parameter redundancy for high-degree-of-freedom robots. This thesis addresses these issues by firstly investigating algorithms, based on dynamic programming and mixture models, for segmentation sensitivity and recognition accuracy on human motion capture data sets of repetitive and categorical motion classes. A stability analysis of the non-linear dynamical systems derived from the resultant mixture model representations aims to ensure that any trajectories converge to the intended target motion as observed in the demonstrations. Finally, these concepts are extended to humanoid robots by deploying a factor analyser for each mixture model component and coordinating the structure into a low-dimensional representation of the demonstrated trajectories. This representation can be constructed once a correspondence map has been learned between the demonstrator and the robot for joint-space actions. Applying these algorithms to demonstrate movement skills to a robot is a further step towards autonomous, incremental robot learning.
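
    Once a Gaussian mixture has been fitted to demonstrations, motion is commonly reconstructed by conditioning the joint model on time, i.e. Gaussian mixture regression. The sketch below shows this standard step for a scalar input and output; it is illustrative and omits the stability analysis and per-component factor analysers the thesis adds.

```python
import numpy as np
from scipy.stats import norm

def gmr(t, pi, mu, Sigma):
    """Condition a joint GMM over (t, x) on a scalar input t.

    mu[k] = [mu_t, mu_x]; Sigma[k] is the 2x2 joint covariance of component k
    (hypothetical shapes). Returns the conditional mean E[x | t].
    """
    # Responsibilities of each component for the input t.
    h = np.array([pi[k] * norm.pdf(t, mu[k][0], np.sqrt(Sigma[k][0, 0]))
                  for k in range(len(pi))])
    h /= h.sum()
    # Blend the per-component conditional means x_k(t).
    return sum(h[k] * (mu[k][1] + Sigma[k][1, 0] / Sigma[k][0, 0] * (t - mu[k][0]))
               for k in range(len(pi)))
```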

    Improving Electricity Distribution System State Estimation with AMR-Based Load Profiles

    The ongoing battle against global warming is rapidly increasing the amount of renewable power generation, and smart solutions are needed to integrate these new generation units into the existing distribution systems. Smart grids answer this call by introducing intelligent ways of controlling the network and the active resources connected to it. However, before the network can be controlled, the automation system must know what the node voltages and line currents defining the network state are.

    Distribution system state estimation (DSSE) is needed to find the most likely state of the network when the number and accuracy of measurements are limited. Typically, two types of measurements are used in DSSE: real-time measurements and pseudo-measurements. In recent years, finding cost-efficient ways to improve the DSSE accuracy has been a popular subject in the literature. While others have focused on optimizing the type, amount and location of real-time measurements, the main hypothesis of this thesis is that it is possible to enhance the DSSE accuracy by using interval measurements collected with automatic meter reading (AMR) to improve the load profiles used as pseudo-measurements.

    The work done in this thesis can be divided into three stages. In the first stage, methods for creating new AMR-based load profiles are studied. AMR measurements from thousands of customers are used to test and compare the different options for improving the load profiling accuracy. Different clustering algorithms are tested and a novel two-stage clustering method for load profiling is developed. In the second stage, a DSSE algorithm suited for the smart grid environment is developed. Simulations and real-life demonstrations are conducted to verify the accuracy and applicability of the developed state estimator. In the third and final stage, the AMR-based load profiling and DSSE are combined. Matlab simulations with real AMR data and a real distribution network model are made, and the developed load profiles are compared with other commonly used pseudo-measurements.

    The results indicate that clustering is an efficient way to improve the load profiling accuracy. With the help of clustering, both the customer classification and the customer class load profiles can be updated simultaneously. Several of the tested clustering algorithms were suited for clustering electricity customers, but the best results were achieved with a modified k-means algorithm. Results from the third stage simulations supported the main hypothesis that the new AMR-based load profiles improve the DSSE accuracy.

    The results presented in this thesis should motivate distribution system operators and other actors in the field of electricity distribution to utilize AMR data and clustering algorithms in load profiling. This improves not only the DSSE accuracy but also many other functions that rely on load flow calculation and need accurate load estimates or forecasts.
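
    The load-profiling step can be sketched with plain k-means over normalised load curves, which simultaneously yields a customer classification (the labels) and class load profiles (the centroids). This is only a baseline sketch with illustrative shapes, not the modified two-stage method developed in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_load_profiles(loads, n_classes=8, seed=0):
    """loads: (n_customers, n_hours) array of AMR load curves (illustrative)."""
    # Normalise each customer by mean load so clusters reflect shape, not size.
    shapes = loads / loads.mean(axis=1, keepdims=True)
    km = KMeans(n_clusters=n_classes, n_init=10, random_state=seed).fit(shapes)
    classes = km.labels_                  # updated customer classification
    profiles = km.cluster_centers_        # per-class load profile (per unit)
    return classes, profiles
```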

    Spatial Mass Spectral Data Analysis Using Factor and Correlation Models

    ToF-SIMS is a powerful and information-rich tool with high resolution and sensitivity compared to conventional mass spectrometers. Recently, its application has been extended to metabolic profiling analysis. However, only a few algorithms are currently available to handle such output data from metabolite samples, so novel algorithms are clearly needed to provide new insights into the application of ToF-SIMS to metabolic profiling. In this thesis, we develop novel multivariate analysis techniques that can be used in processing ToF-SIMS data extracted from metabolite samples. Firstly, several traditional multivariate analysis methodologies that have previously been suggested for ToF-SIMS data analysis are discussed, including clustering, Principal Components Analysis (PCA), Maximum Autocorrelation Factor (MAF), and Multivariate Curve Resolution (MCR). In particular, PCA is selected as an example to show the performance of traditional multivariate analysis techniques in dealing with large ToF-SIMS data sets extracted from metabolite samples. In order to provide a more realistic and meaningful interpretation of the results, Non-negative Matrix Factorisation (NMF) is presented. This algorithm is combined with a Bayesian framework to improve the reliability of the results and the convergence of the algorithm. However, the iterative process involved leads to considerable computational complexity in the estimation procedure. Another novel algorithm is also proposed: an optimised MCR algorithm within an alternating non-negativity-constrained least squares (ANLS) framework. It provides a simpler approximation procedure by implementing dimensionality reduction based on a basis-function decomposition approach. The novel and main feature of the proposed algorithm is that it incorporates a spatially continuous representation of ToF-SIMS data, which decouples the computational complexity of the estimation procedure from the image resolution. The proposed algorithm can be used as an efficient tool for processing ToF-SIMS data obtained from metabolite samples.
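
    The non-Bayesian core of NMF can be sketched with the standard Lee-Seung multiplicative updates for the Frobenius objective ||V - WH||^2; the Bayesian-framework extension and the MCR-ANLS algorithm proposed in the thesis are not shown, and the array shapes are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorise a non-negative matrix V (pixels x m/z channels) as W @ H."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank))
    H = rng.random((rank, m))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # multiplicative update for spectra
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # multiplicative update for abundances
    return W, H
```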