
    Learning Density Models via Structured Latent Variables

    As one principal approach to machine learning and cognitive science, the probabilistic framework has been continuously developed both theoretically and practically. Learning a probabilistic model can be thought of as inferring plausible models to explain observed data. The learning process uses random variables as building blocks that are held together by probabilistic relationships. The key idea behind latent variable models is to introduce latent variables that reveal data structures and capture the underlying features describing real-world data. Classical approaches employ shallow architectures, including latent feature models and finite mixtures of latent variable models. Within these classical frameworks, certain assumptions must be made about the form, structure, and distribution of the data. Since shallow forms may not describe data structures sufficiently, new types of latent structures have been developed within the probabilistic framework. Three main research directions have emerged: infinite latent feature models, mixtures of mixture models, and deep models. This dissertation summarises our work advancing the state of the art in both the classical and the emerging areas. In the first block, a finite latent variable model with parametric priors is presented for clustering and is further extended into a two-layer mixture model for discrimination. These models embed dimensionality reduction in their learning tasks through a latent structure called the common loading. Referred to as joint learning models, they attain a low-dimensional space that better matches the learning task, and their parameters are optimised simultaneously for both the low-dimensional space and the model. However, these joint learning models must assume fixed numbers of features and mixture components, which are normally tuned by trial and error. In general, fixing more parameters makes inference simpler, but it also limits the flexibility of the model, and false assumptions can lead to incorrect inferences from the data; a richer model reduces the number of assumptions that must be made. In the second block, an infinite tri-factorisation structure with non-parametric priors is therefore proposed. This model can automatically determine an optimal number of features and leverage the interrelation between data and features. The final block introduces how to promote shallow latent structures to deep structures that handle richer structured data. This part comprises two tasks: a layer-wise model and a deep autoencoder-based model. In a deep density model, the knowledge of cognitive agents can be modelled with more complex probability distributions, while inference and parameter estimation remain straightforward through a greedy layer-wise algorithm. The deep autoencoder-based joint learning model is trained end to end, does not require pre-training of the autoencoder network, and can be optimised by standard backpropagation without maximum a posteriori inference. Deep generative models are much more efficient than their shallow counterparts for unsupervised and supervised density learning tasks, and they can be developed and used in various practical applications.
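    A minimal sketch of the kind of "common loading" structure described above: a Gaussian mixture whose component covariances share one loading matrix, so clustering and dimensionality reduction use the same low-dimensional subspace. All names, dimensions, and the soft-assignment projection below are illustrative assumptions, not the dissertation's code.

```python
# Sketch: Gaussian mixture with a shared ("common") loading matrix W.
# Illustrative only; parameter values are random stand-ins.
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(0)
p, q, K = 10, 2, 3                    # observed dim, latent dim, mixture components

W = rng.normal(size=(p, q))           # common loading shared by every component
mus = rng.normal(size=(K, p))         # component means
psis = np.full(p, 0.5)                # diagonal noise variances
weights = np.full(K, 1.0 / K)         # mixing proportions

# Every component has covariance W W^T + diag(psi); only the mean differs.
Sigma = W @ W.T + np.diag(psis)

X = rng.normal(size=(100, p))         # stand-in data

# Responsibilities: posterior probability of each component for each point.
log_probs = np.stack([
    np.log(w) + multivariate_normal(mean=mu, cov=Sigma).logpdf(X)
    for w, mu in zip(weights, mus)
], axis=1)
resp = np.exp(log_probs - log_probs.max(axis=1, keepdims=True))
resp /= resp.sum(axis=1, keepdims=True)

# Low-dimensional representation induced by the common loading: posterior
# mean of the latent factors, centring each point by its soft-assigned mean.
M = np.linalg.inv(W.T @ np.diag(1.0 / psis) @ W + np.eye(q))
Z = (X - resp @ mus) @ np.diag(1.0 / psis) @ W @ M.T
print(resp.shape, Z.shape)
```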

    Fully Automated Parameter Estimation for Mixtures of Factor Analyzers

    Mixture models are a family of statistical models that can effectively model datasets with underlying sub-population structures. This thesis focuses on one particular mixture model, the Mixtures of Factor Analyzers (MFA) model [Ghahramani et al., 1997], which is a multivariate clustering model more parsimonious than the well-known Gaussian mixture model (GMM). The MFA model has two hyperparameters: g, the number of components, and q, the number of factors per component. When these are assumed to be known in advance, approximate maximum likelihood estimates for the remaining model parameters can be obtained using Expectation Maximisation (EM)-type algorithms [Dempster et al., 1977] [Ghahramani et al., 1997] [McLachlan and Peel, 2000] [Zhao and Yu, 2008]. This work reviews methods for fitting the MFA model in the more realistic case where its two hyperparameters are not known a priori. A systematic comparison of seven methods for fitting the MFA model when its hyperparameters are unknown is conducted. The methods are compared on their ability to infer the two hyperparameters accurately, as well as on general model fit, clustering accuracy, and the time taken to fit the model. The results suggest that a naive grid search over both hyperparameters performs best on all of the metrics except the time taken to fit the models. The Infinite Mixtures of Infinite Factor Analyzers (IMIFA) algorithm [Murphy et al., 2020] also performs well on most of the metrics. However, like the naive search, IMIFA is very computationally intensive. The Automatic Mixture of Factor Analyzers (AMFA) algorithm [Wang and Lin, 2020] is a viable alternative when available computation time is limited, as it often performs comparably to the naive search and IMIFA but with greatly reduced computation times. To facilitate the comparison, the R package autoMFA is created, which implements five methods for the automated fitting of the MFA model and is available on the Comprehensive R Archive Network (CRAN). A limitation of the MFA model is its inability to deal with asymmetrical cluster shapes, a consequence of using multivariate Gaussian component densities. The Mixtures of Mean-Variance Mixture of Normal Distribution Factor Analyzers (MMVMNFA) family is proposed as a generalisation of the MFA model that permits asymmetrical component densities. A new EM-type algorithm for parameter estimation of MMVMNFA models is developed. Based on its performance in the comparison, the AMFA algorithm is selected and generalised to the MMVMNFA family. Six specific instances of the MMVMNFA family are considered, and the steps of the EM-type algorithm are derived for each. The Julia package FactorMixtures is created, which contains implementations of each of these algorithms. The six instances are tested on two synthetic datasets and two real-world datasets, where their superior ability to capture heavy-tailed data and data exhibiting multivariate skewness is demonstrated in comparison to the standard MFA model, which cannot effectively capture either of these properties. Thesis (Ph.D.) -- University of Adelaide, School of Mathematical Sciences, 202
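    To make the "naive grid search with model-selection criterion" idea concrete, the hedged sketch below sweeps the number of components and selects by BIC. Note that scikit-learn does not ship an MFA estimator, so a plain GaussianMixture stands in; a genuine MFA search would also sweep q, the factors per component (e.g. via the autoMFA R package named above). The data and grid are illustrative assumptions.

```python
# Illustrative stand-in for a naive hyperparameter grid search with BIC.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=1.0, size=(200, 5)) for c in (-3, 0, 3)])

best = None
for g in range(1, 7):                       # grid over the number of components
    gm = GaussianMixture(n_components=g, covariance_type="full",
                         n_init=3, random_state=0).fit(X)
    bic = gm.bic(X)                         # lower BIC = better fit/complexity trade-off
    if best is None or bic < best[1]:
        best = (g, bic, gm)

g_hat, bic_hat, model = best
print(f"selected g = {g_hat} with BIC = {bic_hat:.1f}")
```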

    Constructivism Learning: A Learning Paradigm for Transparent Predictive Analytics

    Aiming to achieve the learning capabilities possessed by intelligent beings, especially humans, researchers in the machine learning field have a long-standing tradition of borrowing ideas from human learning, such as reinforcement learning, active learning, and curriculum learning. Motivated by a philosophical theory called "constructivism", in this work we propose a new machine learning paradigm, constructivism learning. Constructivism theory has had wide-ranging impact on theories of how humans acquire knowledge. To adapt this human learning theory to the context of machine learning, we first studied how to improve learning performance by exploring inductive bias or prior knowledge from multiple learning tasks with multiple data sources, that is, multi-task multi-view learning, in both offline and lifelong settings. We then formalized a Bayesian nonparametric approach using sequential Dirichlet Process Mixture Models to support constructivism learning. To further exploit constructivism learning, we also developed a constructivism deep learning method utilizing Uniform Process Mixture Models.
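    For readers unfamiliar with the Dirichlet process mixture machinery mentioned above, here is a minimal Chinese restaurant process draw showing how such a prior lets the number of clusters grow with the data. This is only an illustration of the building block, not the authors' sequential DPMM or Uniform Process implementation; the concentration parameter and sample size are arbitrary.

```python
# Minimal Chinese restaurant process (CRP) draw for a DP mixture prior.
import numpy as np

def crp_assignments(n, alpha, rng):
    """Sample cluster assignments for n points under a CRP(alpha) prior."""
    counts = []                                    # points per existing cluster
    labels = []
    for i in range(n):
        probs = np.array(counts + [alpha], dtype=float)
        probs /= probs.sum()                       # existing clusters vs. a new one
        k = rng.choice(len(probs), p=probs)
        if k == len(counts):                       # open a new cluster
            counts.append(1)
        else:
            counts[k] += 1
        labels.append(k)
    return np.array(labels)

rng = np.random.default_rng(0)
labels = crp_assignments(n=200, alpha=1.5, rng=rng)
print("clusters discovered:", labels.max() + 1)
```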

    Factor analysis for gene regulatory networks and transcription factor activity profiles

    BACKGROUND: Most existing algorithms for inferring the structure of gene regulatory networks from gene expression data assume that the activity levels of transcription factors (TFs) are proportional to their mRNA levels. This assumption is invalid for most biological systems. However, one might be able to reconstruct unobserved activity profiles of TFs from the expression profiles of target genes. A simple model is a two-layer network with unobserved TF variables in the first layer and observed gene expression variables in the second layer. TFs are connected to regulated genes by weighted edges. The weights, known as factor loadings, indicate the strength and direction of regulation. Of particular interest are methods that produce sparse networks, i.e. networks with few edges, since it is known that most genes are regulated by only a small number of TFs, and most TFs regulate only a small number of genes. RESULTS: In this paper, we explore the performance of five factor analysis algorithms, Bayesian as well as classical, on problems with biological context using both simulated and real data. Factor analysis (FA) models describe a larger number of observed variables by a smaller number of unobserved variables, the factors, whereby all correlation between observed variables is explained by common factors. Bayesian FA methods allow one to infer sparse networks by enforcing sparsity through priors. In contrast, in classical FA, matrix rotation methods are used to enforce sparsity and thus increase the interpretability of the inferred factor loadings matrix. However, we also show that Bayesian FA models that do not impose sparsity through the priors can still be used for the reconstruction of a gene regulatory network if applied in conjunction with matrix rotation methods. Finally, we show the added advantage of merging the information derived from all algorithms in order to obtain a combined result. CONCLUSION: Most of the algorithms tested are successful in reconstructing the connectivity structure as well as the TF profiles. Moreover, we demonstrate that if the underlying network is sparse, it is still possible to reconstruct hidden activity profiles of TFs to some degree without prior connectivity information.
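    A short sketch of the classical route described above: fit a factor analysis model and then rotate the loadings (varimax) so that each gene loads on few factors, yielding a sparser, more interpretable loading matrix. The expression matrix below is simulated, and the gene/TF counts are illustrative assumptions, not the paper's data.

```python
# Classical FA with varimax rotation on a simulated two-layer TF -> gene network.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(2)
n_genes, n_samples, n_tfs = 50, 100, 3

# Simulate a sparse loading matrix (each gene regulated by one TF) plus noise.
true_loadings = np.zeros((n_genes, n_tfs))
true_loadings[np.arange(n_genes), rng.integers(0, n_tfs, n_genes)] = rng.normal(1.5, 0.3, n_genes)
tf_activity = rng.normal(size=(n_samples, n_tfs))
expression = tf_activity @ true_loadings.T + 0.3 * rng.normal(size=(n_samples, n_genes))

fa = FactorAnalysis(n_components=n_tfs, rotation="varimax").fit(expression)
loadings = fa.components_.T                 # (genes x factors): inferred edge weights
tf_profiles = fa.transform(expression)      # inferred TF activity per sample

print("fraction of near-zero loadings:", np.mean(np.abs(loadings) < 0.1))
```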

    Subspace Gaussian Mixture Models for Language Identification and Dysarthric Speech Intelligibility Assessment

    In this Thesis, we investigated how to efficiently apply subspace Gaussian mixture modeling techniques to two speech technology problems, namely automatic spoken language identification (LID) and automatic intelligibility assessment of dysarthric speech. One of the most important such techniques in this Thesis was joint factor analysis (JFA). JFA is essentially a Gaussian mixture model where the mean of each component is expressed as a sum of low-dimensional factors that represent different contributions to the speech signal. This factorization makes it possible to compensate for undesired sources of variability, like the channel. JFA was investigated both as a final classifier and as a feature extractor. In the latter approach, a single subspace including all sources of variability is trained, and points in this subspace are known as i-Vectors. Thus, an i-Vector is a low-dimensional representation of a single utterance, and i-Vectors are a very powerful feature for many machine learning problems. We investigated two different LID systems according to the type of features extracted from speech. First, we extracted acoustic features representing short-time spectral information. In this case, we observed relative improvements with i-Vectors with respect to JFA of up to 50%. We realized that the channel subspace in a JFA model also contains language information, whereas i-Vectors do not discard any language information, and moreover, they help to reduce mismatches between training and testing data. For classification, we modeled the i-Vectors of each language with a Gaussian distribution with a covariance matrix shared among languages. This method is simple and fast, and it worked well without any post-processing. Second, we introduced the use of prosodic and formant information in the i-Vector LID system. Its performance was below that of the acoustic system, but the two were found to be complementary, and fusing them gave up to a 20% relative improvement over the acoustic system alone. Given the success in LID and the fact that i-Vectors capture all the information present in the data, we decided to use i-Vectors for other tasks, specifically the assessment of speech intelligibility in speakers with different types of dysarthria. Speech therapists are very interested in this technology because it would allow them to rate the intelligibility of their patients objectively and consistently. In this case, the input features were extracted from short-term spectral information, and intelligibility was assessed from the i-Vectors calculated for a set of words uttered by the tested speaker. We found that the performance was clearly much better when data from the person who would use the application were available for training. We think that this limitation could be relaxed with larger training databases. However, the recording process is not easy for people with disabilities, and it is difficult to obtain large datasets of dysarthric speakers open to the research community. Finally, the same i-Vector system architecture for intelligibility assessment was used to predict the accuracy that an automatic speech recognizer (ASR) system would obtain with dysarthric speakers. The only difference between the two was the ground-truth label set used for training. Predicting the performance of an ASR system would increase the confidence of speech therapists in these systems and would diminish health-related costs. The results were not as satisfactory as in the previous case, probably because an ASR is a complex system whose accuracy is very difficult to predict from acoustic information alone. Nonetheless, we think that we have opened a door to an interesting research direction for both problems.
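    The language back-end described above, one Gaussian per language with a covariance matrix shared across languages, is equivalent to linear discriminant analysis. The sketch below applies it to random stand-in "i-Vectors"; extracting real i-Vectors (UBM and total-variability training) is outside this snippet, and the dimensions and language labels are assumptions.

```python
# Shared-covariance Gaussian classifier (LDA) over stand-in i-Vectors.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(3)
dim, n_per_lang = 400, 300                       # typical i-Vector dimensionality
langs = ["en", "es", "zh"]

# One Gaussian cloud per language with a common covariance, mimicking i-Vectors.
means = {lang: rng.normal(scale=2.0, size=dim) for lang in langs}
X = np.vstack([means[l] + rng.normal(size=(n_per_lang, dim)) for l in langs])
y = np.repeat(langs, n_per_lang)

clf = LinearDiscriminantAnalysis().fit(X, y)     # shared-covariance Gaussian model
test = means["es"] + rng.normal(size=(5, dim))
print(clf.predict(test))                         # expected: mostly "es"
```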

    Identifiable and interpretable nonparametric factor analysis

    Factor models have been widely used to summarize the variability of high-dimensional data through a set of factors with much lower dimensionality. Gaussian linear factor models have been particularly popular due to their interpretability and ease of computation. However, in practice, data often violate the multivariate Gaussian assumption. To characterize higher-order dependence and nonlinearity, models that include factors as predictors in flexible multivariate regression are popular, with GP-LVMs using Gaussian process (GP) priors for the regression function and VAEs using deep neural networks. Unfortunately, such approaches lack identifiability and interpretability and tend to produce brittle and non-reproducible results. To address these problems by simplifying the nonparametric factor model while maintaining flexibility, we propose the NIFTY framework, which parsimoniously transforms uniform latent variables using one-dimensional nonlinear mappings and then applies a linear generative model. The induced multivariate distribution falls into a flexible class while maintaining simple computation and interpretation. We prove that this model is identifiable and empirically study NIFTY using simulated data, observing good performance in density estimation and data visualization. We then apply NIFTY to bird song data in an environmental monitoring application. Comment: 50 pages, 17 figures
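    A small simulation of the generative structure the abstract describes: uniform latent variables, one-dimensional nonlinear transforms applied coordinate-wise, then a linear map plus noise. The specific transforms and dimensions below are illustrative choices, not the mappings NIFTY would learn from data.

```python
# Simulate data from a NIFTY-style generative model (illustrative parameters).
import numpy as np

rng = np.random.default_rng(4)
n, k, p = 1000, 2, 6                       # samples, latent dim, observed dim

u = rng.uniform(1e-6, 1 - 1e-6, size=(n, k))   # uniform latent variables
# Coordinate-wise monotone nonlinearities (inverse-CDF style transforms).
f = np.column_stack([np.log(u[:, 0] / (1 - u[:, 0])),   # logistic quantile
                     np.tan(np.pi * (u[:, 1] - 0.5))])  # Cauchy quantile

Lambda = rng.normal(size=(p, k))           # linear generative loadings
x = f @ Lambda.T + 0.1 * rng.normal(size=(n, p))

print(x.shape)                             # non-Gaussian data with factor structure
```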

    Representation Learning with Additional Structures

    The ability to learn meaningful representations of complex, high-dimensional data such as images and text for various downstream tasks has been the cornerstone of the modern deep learning success story. Most approaches that succeed in learning meaningful representations of the input data rely on prior knowledge of the underlying data structure to inject appropriate inductive biases into their frameworks. Prime examples range from the convolutional neural network (CNN) for images, to the recurrent neural network (RNN) for sequences, to the recent trend of attention-based models (e.g. transformers) for incorporating relational information. However, most traditional approaches focus on a learning setup with a single input (and a single output in a supervised setting). With the rapidly growing variety of data being collected and the increasing complexity of the structures that underlie them, approaches that can take advantage of additional data structures for better representation learning are needed. To this end, we introduce frameworks to learn better representations of complex data with additional structures in four arenas, where we gradually shift from supervised learning, to "pseudo-supervised" learning, and lastly to unsupervised learning. More specifically, we first propose a supervised approach that exploits relational information among set elements for learning representations of set-structured data. We then propose a clustering approach that utilizes side-information, i.e. information that is related to the final clustering goal but not directly indicative of the clustering results (hence "pseudo-supervised" learning), for learning representations that are better for clustering. Next, we introduce another clustering approach that leverages the structural assumption that data samples in each cluster form a trajectory. Lastly, we propose a general representation learning framework for learning interpretable representations of multimodal data. Doctor of Philosophy
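    To make the set-structured setting concrete, here is a minimal permutation-invariant set encoder (sum pooling over per-element features). The architecture, layer sizes, and weights are illustrative assumptions only, not the thesis's relational model.

```python
# Minimal permutation-invariant set encoder (DeepSets-style sum pooling).
import numpy as np

rng = np.random.default_rng(5)

def encode_set(elements, W_phi, W_rho):
    """Embed each element, sum over the set, then map to a set representation."""
    h = np.maximum(elements @ W_phi, 0.0)   # per-element features (ReLU)
    pooled = h.sum(axis=0)                  # order-independent pooling
    return np.maximum(pooled @ W_rho, 0.0)  # set-level representation

d_in, d_hidden, d_out = 8, 32, 16
W_phi = rng.normal(scale=0.1, size=(d_in, d_hidden))
W_rho = rng.normal(scale=0.1, size=(d_hidden, d_out))

a_set = rng.normal(size=(5, d_in))          # a set of 5 elements
shuffled = a_set[rng.permutation(5)]
print(np.allclose(encode_set(a_set, W_phi, W_rho),
                  encode_set(shuffled, W_phi, W_rho)))   # True: order-invariant
```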