
    Robust and efficient inference and learning algorithms for generative models

    Generative modelling is a popular paradigm in machine learning due to its natural ability to describe uncertainty in data and models, and for its applications including data compression (Ho et al., 2020), missing data imputation (Valera et al., 2018), synthetic data generation (Lin et al., 2020), representation learning (Kingma and Welling, 2014), robust classification (Li et al., 2019b), and more. For generative models, the task of finding the distribution of unobserved variables conditioned on observed ones is referred to as inference. Finding the optimal model that makes the model distribution close to the data distribution according to some discrepancy measure is called learning. In practice, existing learning and inference methods can fall short on robustness and efficiency. A method that is more robust to its hyper-parameters or to different types of data can be more easily adapted to various real-world applications. How efficient a method is with regard to the size and dimensionality of data determines at what scale it can be applied. This thesis presents four pieces of my original work that improve these properties in generative models.

    First, I introduce two novel Bayesian inference algorithms. One is coupled multinomial Hamiltonian Monte Carlo (Xu et al., 2021a); it builds on Heng and Jacob (2019), a recent method in unbiased Markov chain Monte Carlo (MCMC) (Jacob et al., 2019b) that has been found to be sensitive to hyper-parameters and less efficient than standard, biased MCMC. These issues are solved by establishing couplings to the widely-used multinomial Hamiltonian Monte Carlo, leading to a statistically more efficient and robust method. The other is roulette-based variational expectation (RAVE; Xu et al., 2019), which applies amortised inference to a model family called Bayesian non-parametric models, in which the number of parameters is allowed to grow without bound as the data gets more complex. Unlike previous sampling-based methods, which are slow, or variational inference methods, which rely on truncation, RAVE combines the advantages of both to achieve flexible inference that is also computationally efficient.

    Second, I introduce two novel learning methods. One is generative ratio-matching (Srivastava et al., 2019), a learning algorithm that makes deep generative models based on kernel methods applicable to high-dimensional data. The key innovation of this method is learning a projection of the data to a lower-dimensional space in which the density ratio is preserved, so that learning can be done in the lower-dimensional space where kernel methods are effective. The other is Bayesian symbolic physics, which combines Bayesian inference and symbolic regression in the context of naïve physics, the study of how humans understand and learn physics. Unlike classic generative models, for which the structure of the generative process is predefined, or deep generative models, where the process is represented by data-hungry neural networks, Bayesian-symbolic generative processes are defined by functions over a hypothesis space specified by a context-free grammar. This formulation allows these models to incorporate domain knowledge in learning, which greatly improves sample efficiency. For all four pieces of work, I provide theoretical analyses and/or empirical results to validate that the algorithmic advances lead to improvements in robustness and efficiency for generative models.
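    As background for the coupling construction above, the debiasing idea from unbiased MCMC (Jacob et al., 2019b) can be sketched with a much simpler kernel than multinomial HMC. The following Python sketch couples two random-walk Metropolis-Hastings chains on a toy one-dimensional Gaussian target via a maximal coupling of the proposals; it illustrates only the generic coupled-chain estimator, not the thesis's HMC coupling, and all names and the target here are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Toy target: standard normal log-density, up to an additive constant.
    return -0.5 * x**2

def max_coupling_normals(mu_p, mu_q, sigma):
    """Sample (X, Y) from a maximal coupling of N(mu_p, s^2) and N(mu_q, s^2)."""
    logpdf = lambda v, mu: -0.5 * ((v - mu) / sigma) ** 2
    x = rng.normal(mu_p, sigma)
    if np.log(rng.uniform()) + logpdf(x, mu_p) <= logpdf(x, mu_q):
        return x, x                      # identical proposals: chains can meet
    while True:
        y = rng.normal(mu_q, sigma)
        if np.log(rng.uniform()) + logpdf(y, mu_q) > logpdf(y, mu_p):
            return x, y

def mh_step(x, sigma=1.0):
    """One ordinary random-walk Metropolis-Hastings step."""
    xp = rng.normal(x, sigma)
    return xp if np.log(rng.uniform()) <= log_target(xp) - log_target(x) else x

def coupled_mh_step(x, y, sigma=1.0):
    """One coupled MH step: maximally coupled proposals, shared accept uniform."""
    xp, yp = max_coupling_normals(x, y, sigma)
    log_u = np.log(rng.uniform())        # common randomness for both accept tests
    x_new = xp if log_u <= log_target(xp) - log_target(x) else x
    y_new = yp if log_u <= log_target(yp) - log_target(y) else y
    return x_new, y_new

def unbiased_estimate(h, max_iter=10_000):
    """Unbiased estimator of E_pi[h] from one pair of coupled chains (k = 0)."""
    x, y = rng.normal(), rng.normal()    # X_0, Y_0 from the same initialisation
    est = h(x)
    x = mh_step(x)                       # X_1: the X chain runs one step ahead
    for _ in range(max_iter):
        if x == y:                       # chains have met; correction is complete
            break
        est += h(x) - h(y)               # bias-correction term h(X_t) - h(Y_{t-1})
        x, y = coupled_mh_step(x, y)
    return est

# Averaging independent replicates gives an unbiased estimate of E_pi[h];
# here h is the identity, so the average should be close to 0.
print(np.mean([unbiased_estimate(lambda v: v) for _ in range(200)]))
```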
    Lastly, I summarise my contributions to free and open-source software on generative modelling. This includes a set of Julia packages that I contributed to and that are currently used by the Turing probabilistic programming language (Ge et al., 2018). These packages, which are highly reusable components for building probabilistic programming languages, together form a probabilistic programming ecosystem in Julia. An important package primarily developed by me is AdvancedHMC.jl (Xu et al., 2020), which provides robust and efficient implementations of HMC methods and has been adopted as the backend of Turing. Importantly, the design of this package allows an intuitive abstraction for constructing HMC samplers in the way they are mathematically defined. The promise of these open-source packages is to make generative modelling techniques more accessible to domain experts from various backgrounds and to make relevant research more reproducible, helping to advance the field.
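    To illustrate the kind of mathematical structure such an abstraction mirrors, here is a minimal Python sketch of a single HMC transition, composed of the usual ingredients: momentum resampling under an identity metric, a leapfrog integrator, and a Metropolis correction. This is not AdvancedHMC.jl's API, only the textbook definition that the package's components correspond to.

```python
import numpy as np

rng = np.random.default_rng(1)

def hmc_step(theta, logp, grad_logp, step_size=0.1, n_leapfrog=20):
    """One HMC transition: resample momentum, leapfrog-integrate, MH-correct."""
    r = rng.normal(size=theta.shape)              # momentum ~ N(0, I), identity metric
    theta_new, r_new = theta.copy(), r.copy()
    # Leapfrog integration of the Hamiltonian dynamics.
    r_new += 0.5 * step_size * grad_logp(theta_new)
    for _ in range(n_leapfrog - 1):
        theta_new += step_size * r_new
        r_new += step_size * grad_logp(theta_new)
    theta_new += step_size * r_new
    r_new += 0.5 * step_size * grad_logp(theta_new)
    # Metropolis correction using the Hamiltonian H = -log p(theta) + |r|^2 / 2.
    h_old = -logp(theta) + 0.5 * r @ r
    h_new = -logp(theta_new) + 0.5 * r_new @ r_new
    return theta_new if np.log(rng.uniform()) < h_old - h_new else theta

# Example usage: sample from a 2-D standard normal.
logp = lambda th: -0.5 * th @ th
grad = lambda th: -th
theta, samples = np.zeros(2), []
for _ in range(1000):
    theta = hmc_step(theta, logp, grad)
    samples.append(theta)
```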

    CATVI: conditional and adaptively truncated variational inference for hierarchical Bayesian nonparametric models

    Current variational inference methods for hierarchical Bayesian nonparametric models can neither characterize the correlation structure among latent variables, due to the mean-field setting, nor infer the true posterior dimension, because of the universal truncation. To overcome these limitations, we propose the conditional and adaptively truncated variational inference method (CATVI) by maximizing the nonparametric evidence lower bound and integrating Monte Carlo into the variational inference framework. CATVI enjoys several advantages over traditional methods, including a smaller divergence between variational and true posteriors, reduced risk of underfitting or overfitting, and improved prediction accuracy. Empirical studies on three large datasets reveal that CATVI applied to Bayesian nonparametric topic models substantially outperforms competing models, providing lower perplexity and clearer topic-word clustering.
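    For orientation, the generic device of integrating Monte Carlo into variational inference is to replace the intractable expectation in the evidence lower bound with a sample average; the display below shows this standard form only, not CATVI's specific nonparametric bound or its conditional, adaptively truncated family.

```latex
\mathcal{L}(q) \;=\; \mathbb{E}_{q(z)}\!\left[\log p(x, z) - \log q(z)\right] \;\le\; \log p(x),
\qquad
\widehat{\mathcal{L}} \;=\; \frac{1}{S}\sum_{s=1}^{S}\left[\log p\bigl(x, z^{(s)}\bigr) - \log q\bigl(z^{(s)}\bigr)\right],
\quad z^{(s)} \sim q(z).
```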

    Scalable Probabilistic Model Selection for Network Representation Learning in Biological Network Inference

    A biological system is a complex network of heterogeneous molecular entities whose interactions contribute to the system's various biological characteristics. Although biological networks provide not only an elegant theoretical framework but also a mathematical foundation for analyzing, understanding, and learning from complex biological systems, their reconstruction is an important and unsolved problem. Current biological networks are noisy, sparse, and incomplete, which limits our ability to form a holistic view of the reconstructions and thus fails to provide a system-level understanding of biological phenomena. Experimental identification of missing interactions is both time-consuming and expensive. Recent advancements in high-throughput data generation and significant improvements in computational power have led to novel computational methods for predicting missing interactions. However, these methods still suffer from several unresolved challenges. It is challenging to extract information about interactions and incorporate that information into the computational model. Furthermore, biological data are not only heterogeneous but also high-dimensional and sparse, presenting the difficulty of modeling from indirect measurements. The heterogeneous nature and sparsity of biological data pose significant challenges to the design of deep neural network structures, which essentially rely on empirical or heuristic model selection methods. These unscalable methods depend heavily on expertise and experimentation, a time-consuming and error-prone process, and are prone to overfitting. Furthermore, complex deep networks tend to be poorly calibrated, assigning high confidence to incorrect predictions. In this dissertation, we describe novel algorithms that address these challenges. In Part I, we design novel neural network structures to learn representations of biological entities and further extend the model to integrate heterogeneous biological data for biological interaction prediction. In Part II, we develop a novel Bayesian model selection method to infer the most plausible network structures warranted by the data. We demonstrate that our methods achieve state-of-the-art performance on tasks across various domains, including interaction prediction. Experimental studies on various interaction networks show that our method makes accurate and calibrated predictions. Our novel probabilistic model selection approach enables the network structures to evolve dynamically to accommodate incrementally available data. In conclusion, we discuss the limitations of and future directions for the proposed work.
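    As a rough illustration of representation learning for interaction prediction, the sketch below embeds each entity as a vector and scores a candidate interaction with a sigmoid of the embeddings' dot product, fit by gradient ascent on a toy adjacency matrix. The dissertation's actual architectures and Bayesian model selection procedure are considerably more elaborate; every name and number here is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy observed interaction network: symmetric 0/1 adjacency over n entities.
n, d = 30, 8
adj = rng.random((n, n)) < 0.1
adj = np.triu(adj, 1)
adj = adj | adj.T

emb = rng.normal(scale=0.1, size=(n, d))      # one learnable vector per entity

def score(i, j):
    """Predicted probability that entities i and j interact (dot-product decoder)."""
    return 1.0 / (1.0 + np.exp(-emb[i] @ emb[j]))

# A few epochs of gradient ascent on the Bernoulli log-likelihood of all pairs.
lr = 0.5
for _ in range(20):
    for i in range(n):
        for j in range(i + 1, n):
            g = float(adj[i, j]) - score(i, j)    # d(log-lik)/d(dot product)
            emb[i], emb[j] = emb[i] + lr * g * emb[j], emb[j] + lr * g * emb[i]

# Unobserved pairs can now be ranked by score(i, j) for link prediction.
```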

    Bayesian inference for multi-level non-stationary Gaussian processes

    The complexity of most real-world phenomena requires the use of flexible models that capture the intricate features present in the data. Gaussian processes (GPs) have proven valuable tools for this purpose due to their non-parametric and probabilistic nature. Nevertheless, the default approach when modelling with GPs is to assume stationarity. This assumption permits easier inference but can be restrictive when the correlation of the process is not constant across the input space. This thesis investigates a class of non-stationary priors that enhance flexibility while retaining interpretability. These priors assemble GPs through input-varying parameters in the covariance. Such hierarchical constructions result in high-dimensional correlated posteriors, for which Bayesian inference becomes challenging and notably expensive due to the characteristic computational constraints of GPs. Altogether, this thesis provides novel approaches for scalable Bayesian inference in two-level GP regression models. First, we use a sparse representation of the inverse non-stationary covariance to develop and compare three different Markov chain Monte Carlo (MCMC) samplers for two hyperpriors. To maintain scalability when extending the approach to multi-dimensional problems, we propose a non-stationary additive Gaussian process (AGP) model. The efficiency and accuracy of the methodology are demonstrated in simulated experiments and a computer emulation problem. Second, we derive a hybrid variational-MCMC approach that combines low-dimensional variational distributions with MCMC to avoid further distributional and independence restrictions on the posterior of interest. The resulting approximate posterior includes an intractable likelihood that, when approximated with a low-order Gauss-Hermite quadrature, results in poor predictive performance. In this case, an extension to higher-dimensional settings requires specific assumptions on the non-stationary covariance. Lastly, we propose a pseudo-marginal algorithm that uses a block-Poisson estimator to circumvent numerical integration in the variationally sparse model. This strategy demonstrates an improvement in predictive performance, can be computationally more efficient, and is generally applicable to other GP-based models with intractable likelihoods.
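    A standard concrete instance of input-varying parameters in the covariance is the Gibbs kernel, in which the lengthscale is a function of the input; in a two-level model that function would itself carry a GP prior. The Python sketch below uses a fixed illustrative lengthscale function and is not necessarily the exact prior construction studied in the thesis.

```python
import numpy as np

def lengthscale(x):
    """Input-dependent lengthscale; in a 2-level model this would get a GP prior."""
    return 0.2 + 0.5 * np.abs(np.sin(x))       # illustrative choice, always positive

def gibbs_kernel(xs, variance=1.0):
    """Non-stationary Gibbs covariance matrix over the 1-D inputs xs:
    k(x, x') = v * sqrt(2 l(x) l(x') / (l(x)^2 + l(x')^2))
                 * exp(-(x - x')^2 / (l(x)^2 + l(x')^2))."""
    l = lengthscale(xs)
    l2 = l[:, None] ** 2 + l[None, :] ** 2     # l(x)^2 + l(x')^2 for all pairs
    prefactor = np.sqrt(2.0 * np.outer(l, l) / l2)
    sqdist = (xs[:, None] - xs[None, :]) ** 2
    return variance * prefactor * np.exp(-sqdist / l2)

# Draw one prior sample of the non-stationary GP at 200 inputs.
xs = np.linspace(0.0, 5.0, 200)
K = gibbs_kernel(xs) + 1e-8 * np.eye(xs.size)  # jitter for numerical stability
f = np.linalg.cholesky(K) @ np.random.default_rng(0).normal(size=xs.size)
```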