50 research outputs found

    Nichtparametrische Bayesianische Modelle

    Get PDF
    The analysis of real-world problems often requires robust and flexible models that can accurately represent the structure in the data. Nonparametric Bayesian priors allow the construction of such models, which can be used for complex real-world data. Nonparametric models, despite their name, can be defined as models that have infinitely many parameters. This thesis is about two types of nonparametric models. The first is latent class models (i.e. mixture models) with infinitely many classes, which we construct using Dirichlet process mixtures (DPM). The second is discrete latent feature models with infinitely many features, for which we use the Indian buffet process (IBP), a generalization of the DPM. Analytical inference is not possible in the models discussed in this thesis. The use of conjugate priors can often make inference somewhat more tractable, but for a given model the family of conjugate priors may not always be rich enough. Methodologically, this thesis relies on Markov chain Monte Carlo (MCMC) techniques for inference, especially those which can be used in the absence of conjugacy. Chapter 2 introduces the basic terminology and notation used in the thesis. Chapter 3 presents the Dirichlet process (DP) and some infinite latent class models which use the DP as a prior. We first summarize different approaches for defining the DP and describe several established MCMC algorithms for inference in DPM models. The Dirichlet process mixture of Gaussians (DPMoG) model has been extensively used for density estimation. We present an empirical comparison of conjugate and conditionally conjugate priors in the DPMoG, demonstrating that the latter can give better density estimates without significant additional computational cost.
The mixtures of factor analyzers (MFA) model allows data to be modeled as a mixture of Gaussians with a reduced parametrization. We present the formulation of a nonparametric form of the MFA model, the Dirichlet process MFA (DPMFA). We utilize the DPMFA for clustering the action potentials of different neurons from extracellular recordings, a problem known as spike sorting. Chapter 4 presents the IBP and some infinite latent feature models which use the IBP as a prior. The IBP is a distribution over binary matrices with infinitely many columns. We describe different approaches for defining the distribution and present new MCMC techniques that can be used for inference on models which use it as a prior. Empirical results on a conjugate model are presented showing that the new methods perform as well as the established method of Gibbs sampling, but without the requirement for conjugacy. We demonstrate the performance of a non-conjugate IBP model by successfully learning the latent features of handwritten digits. Finally, we formulate a nonparametric version of the elimination-by-aspects (EBA) choice model using the IBP, and show that it can make accurate predictions about people's choice outcomes in a paired comparison task.
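As a concrete illustration of the IBP described above (a sketch for orientation, not code from the thesis), the process can be simulated with the standard culinary metaphor: customer i takes each previously sampled dish k with probability m_k / i and then tries a Poisson(alpha / i) number of new dishes, yielding a binary matrix with an unbounded number of columns. The function and parameter names here are illustrative.

```python
import numpy as np

def sample_ibp(num_customers, alpha, rng=None):
    """Draw a binary feature matrix Z from the Indian buffet process.

    Row i is customer i's dish choices: each existing dish k is taken
    with probability m_k / (i + 1), where m_k counts earlier takers,
    and Poisson(alpha / (i + 1)) brand-new dishes are then added.
    """
    rng = np.random.default_rng(rng)
    rows = []
    counts = []  # m_k: how many customers have taken dish k so far
    for i in range(num_customers):
        row = [1 if rng.random() < m / (i + 1) else 0 for m in counts]
        new = rng.poisson(alpha / (i + 1))
        counts = [m + t for m, t in zip(counts, row)] + [1] * new
        rows.append(row + [1] * new)
    # Pad rows to a common width: dishes sampled later are 0 for earlier customers.
    Z = np.zeros((num_customers, len(counts)), dtype=int)
    for i, row in enumerate(rows):
        Z[i, :len(row)] = row
    return Z

Z = sample_ibp(10, alpha=2.0, rng=0)
print(Z.shape)
```

Although the matrix has infinitely many columns in principle, only finitely many are non-zero for a finite number of customers, which is what makes inference in IBP models feasible.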

    Bayesian Inference for Genomic Data Analysis

    Get PDF
    High-throughput genomic data contain a wealth of information shaped by the complex biological processes in the cell. As such, appropriate mathematical modeling frameworks are required to understand the data and the data-generating processes. This dissertation focuses on the formulation of mathematical models and the description of appropriate computational algorithms to obtain insights from genomic data. Specifically, the characterization of intra-tumor heterogeneity is studied. Based on the total number of allele copies at the genomic locations in the tumor subclones, the problem is viewed from two perspectives: the presence or absence of the copy-neutrality assumption. Under copy-neutrality, it is assumed that the genome contains mutational variability and that three possible genotypes may be present at each genomic location. As such, the genotypes of all the genomic locations in the tumor subclones are modeled by a ternary matrix. In the second case, in addition to mutational variability, it is assumed that the genomic locations may be affected by structural variabilities such as copy number variation (CNV). Thus, the genotypes are modeled with a pair of (Q + 1)-ary matrices. Using the categorical Indian buffet process (cIBP), a state-space modeling framework is employed to describe the two processes, and sequential Monte Carlo (SMC) methods for dynamic models are applied to perform inference on important model parameters. Moreover, the problem of estimating a gene regulatory network (GRN) from measurements with missing values is presented. Specifically, gene expression time series data may be missing all expression values at a single time point or at a set of consecutive time points. However, complete data are often needed to make inference on the underlying GRN.
To accommodate the missing measurements, a dynamic stochastic model is used to describe the evolution of gene expression, and point-based Gaussian approximation (PBGA) filters with one-step or two-step missing measurements are applied for the inference. Finally, the problem of deconvolving gene expression data from complex heterogeneous biological samples is examined, where the observed data are a mixture of different cell types. A statistical description of the problem is used, and the SMC method for static models is applied to estimate the cell-type-specific expressions and the cell-type proportions in the heterogeneous samples.
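To make the SMC machinery mentioned above concrete (a minimal sketch, not the dissertation's model), here is a bootstrap particle filter for a toy AR(1) state-space model: particles are propagated through the transition, reweighted by the observation likelihood, and resampled. The model parameters (0.9 transition coefficient, unit process noise, 0.5 observation noise) are assumptions for illustration only.

```python
import numpy as np

def bootstrap_filter(ys, num_particles=500, rng=None):
    """Bootstrap particle filter for the toy state-space model
        x_t = 0.9 * x_{t-1} + N(0, 1),   y_t = x_t + N(0, 0.5^2).
    Returns the filtered posterior mean of x_t at each time step."""
    rng = np.random.default_rng(rng)
    x = rng.normal(0.0, 1.0, size=num_particles)  # particles from the prior
    means = []
    for y in ys:
        # Propagate particles through the state transition.
        x = 0.9 * x + rng.normal(0.0, 1.0, size=num_particles)
        # Weight by the Gaussian observation likelihood (log-space for stability).
        logw = -0.5 * ((y - x) / 0.5) ** 2
        w = np.exp(logw - logw.max())
        w /= w.sum()
        means.append(float(np.sum(w * x)))
        # Multinomial resampling to avoid weight degeneracy.
        x = x[rng.choice(num_particles, size=num_particles, p=w)]
    return means

obs = [0.2, 0.5, 1.1, 0.9, 1.4]
est = bootstrap_filter(obs, rng=1)
print(est)
```

Handling a missing observation in this scheme amounts to skipping the weighting/resampling step and only propagating the particles, which is the gap the PBGA filters above address in a more principled way.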

    Bayesian Nonparametric Approaches for Modelling Stochastic Temporal Events

    Full text link
    Modelling stochastic temporal events is a classic machine learning problem that has drawn enormous research attention over recent decades. Traditional approaches have focused heavily on parametric models that pre-specify model complexity, so comprehensive model comparison and selection are necessary to prevent over-fitting and under-fitting. The recently developed Bayesian nonparametric learning framework provides an appealing alternative to traditional approaches: it can automatically learn the model complexity from data. In this thesis, I propose a set of Bayesian nonparametric approaches for stochastic temporal event modelling that account for event similarity, interaction, occurrence time and emitted observations. Specifically, I tackle the following three main challenges. 1. Data sparsity. The data sparsity problem is common in many real-world temporal event modelling applications, e.g., water pipe failure prediction. A Bayesian nonparametric model that allows pipes with similar behaviour to share failure data is proposed to attain more effective failure prediction. It is shown that flexible event clustering can help alleviate the data sparsity problem; the clustering process is fully data-driven and does not require predefining the number of clusters. 2. Event interaction. Stochastic events can interact with each other over time: one event can cause or repel the occurrence of other events. A previously unexplored theoretical bridge is established between interaction point processes and the distance-dependent Chinese restaurant process. On this basis, an integrated model, the infinite branching model, is developed to estimate point event intensity, the interaction mechanism and the branching structure simultaneously. 3. Event correlation. Stochastic temporal events are correlated not only between arrival times but also between observations.
A novel unified Bayesian nonparametric model that generalizes the hidden Markov model and interaction point processes is constructed to exploit the two types of underlying correlation in a well-integrated way rather than individually. The proposed model provides comprehensive insight into the interaction mechanism and the correlation between events. Finally, a future vision of Bayesian nonparametric research for stochastic temporal events is outlined from both application and modelling perspectives.
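The "one event can cause other events" mechanism above is the defining property of interaction point processes. As a concrete, standard example (not the thesis's infinite branching model), a self-exciting Hawkes process can be simulated by Ogata's thinning algorithm, where each past event temporarily raises the intensity of future events; the parameter values here are arbitrary illustrations.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, horizon, rng=None):
    """Simulate a self-exciting Hawkes process on [0, horizon] by Ogata's
    thinning. The conditional intensity is
        lam(t) = mu + sum_i alpha * exp(-beta * (t - t_i))
    over past events t_i, so each event excites further events (alpha/beta < 1
    keeps the process stable)."""
    rng = np.random.default_rng(rng)
    t, events = 0.0, []
    while True:
        # Upper bound on the intensity: it can only decay until the next event.
        lam_bar = mu + alpha * sum(np.exp(-beta * (t - s)) for s in events)
        t += rng.exponential(1.0 / lam_bar)
        if t > horizon:
            return events
        # Accept the candidate time with probability lam(t) / lam_bar.
        lam_t = mu + alpha * sum(np.exp(-beta * (t - s)) for s in events)
        if rng.random() < lam_t / lam_bar:
            events.append(t)

times = simulate_hawkes(mu=0.5, alpha=0.8, beta=1.2, horizon=50.0, rng=2)
print(len(times))
```

The branching structure the thesis estimates corresponds to attributing each accepted event either to the background rate mu or to a specific earlier triggering event.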

    Learning with Limited Labeled Data in Biomedical Domain by Disentanglement and Semi-Supervised Learning

    Get PDF
    In this dissertation, we are interested in improving the generalization of deep neural networks for biomedical data (e.g., electrocardiogram signals, x-ray images, etc.). Although deep neural networks have attained state-of-the-art performance and, thus, deployment across a variety of domains, similar performance in the clinical setting remains challenging due to their inability to generalize to unseen data (e.g., a new patient cohort). We address this challenge of generalization in deep neural networks from two perspectives: 1) learning disentangled representations from the deep network, and 2) developing efficient semi-supervised learning (SSL) algorithms using the deep network. In the former, we are interested in designing specific architectures and objective functions to learn representations where variations in the data are well separated, i.e., disentangled. In the latter, we are interested in designing regularizers that encourage the underlying neural function's behavior toward a common inductive bias to avoid over-fitting the function to small labeled data. Our end goal is to improve the generalization of the deep network for the diagnostic model in both of these approaches. For disentangled representations, this translates to appropriately learning latent representations from the data, capturing the observed input's underlying explanatory factors in an independent and interpretable way. With the data's explanatory factors well separated, such a disentangled latent space can then be useful for a large variety of tasks and domains within the data distribution even with a small amount of labeled data, thus improving generalization. For efficient semi-supervised algorithms, this translates to utilizing a large volume of unlabelled data to assist the learning from the limited labeled dataset, a situation commonly encountered in the biomedical domain.
By drawing ideas from different areas within deep learning, such as representation learning (e.g., autoencoders), variational inference (e.g., variational autoencoders), Bayesian nonparametrics (e.g., the beta-Bernoulli process), learning theory (e.g., analytical learning theory), and function smoothing (Lipschitz smoothness), we propose several learning algorithms to improve generalization on the associated tasks. We test our algorithms on real-world clinical data and show that our approach yields significant improvements over existing methods. Moreover, we demonstrate the efficacy of the proposed models on benchmark data and simulated data to understand different aspects of the proposed learning methods. We conclude by identifying some of the limitations of the proposed methods, areas for further improvement, and broader future directions for the successful adoption of AI models in the clinical environment.