286 research outputs found

    Gibbs Max-margin Topic Models with Data Augmentation

    Full text link
    Max-margin learning is a powerful approach to building classifiers and structured output predictors. Recent work on max-margin supervised topic models has successfully integrated it with Bayesian topic models to discover discriminative latent semantic structures and make accurate predictions for unseen testing data. However, the resulting learning problems are usually hard to solve because of the non-smoothness of the margin loss. Existing approaches to building max-margin supervised topic models rely on an iterative procedure to solve multiple latent SVM subproblems with additional mean-field assumptions on the desired posterior distributions. This paper presents an alternative approach by defining a new max-margin loss. Namely, we present Gibbs max-margin supervised topic models, a latent variable Gibbs classifier to discover hidden topic representations for various tasks, including classification, regression and multi-task learning. Gibbs max-margin supervised topic models minimize an expected margin loss, which is an upper bound of the existing margin loss derived from an expected prediction rule. By introducing augmented variables and integrating out the Dirichlet variables analytically by conjugacy, we develop simple Gibbs sampling algorithms with no restricting assumptions and no need to solve SVM subproblems. Furthermore, each step of the "augment-and-collapse" Gibbs sampling algorithms has an analytical conditional distribution, from which samples can be easily drawn. Experimental results demonstrate significant improvements on time efficiency. The classification performance is also significantly improved over competitors on binary, multi-class and multi-label classification tasks.Comment: 35 page

    Bayesian nonparametric clusterings in relational and high-dimensional settings with applications in bioinformatics.

    Get PDF
    Recent advances in high throughput methodologies offer researchers the ability to understand complex systems via high dimensional and multi-relational data. One example is the realm of molecular biology where disparate data (such as gene sequence, gene expression, and interaction information) are available for various snapshots of biological systems. This type of high dimensional and multirelational data allows for unprecedented detailed analysis, but also presents challenges in accounting for all the variability. High dimensional data often has a multitude of underlying relationships, each represented by a separate clustering structure, where the number of structures is typically unknown a priori. To address the challenges faced by traditional clustering methods on high dimensional and multirelational data, we developed three feature selection and cross-clustering methods: 1) infinite relational model with feature selection (FIRM) which incorporates the rich information of multirelational data; 2) Bayesian Hierarchical Cross-Clustering (BHCC), a deterministic approximation to Cross Dirichlet Process mixture (CDPM) and to cross-clustering; and 3) randomized approximation (RBHCC), based on a truncated hierarchy. An extension of BHCC, Bayesian Congruence Measuring (BCM), is proposed to measure incongruence between genes and to identify sets of congruent loci with identical evolutionary histories. We adapt our BHCC algorithm to the inference of BCM, where the intended structure of each view (congruent loci) represents consistent evolutionary processes. We consider an application of FIRM on categorizing mRNA and microRNA. The model uses latent structures to encode the expression pattern and the gene ontology annotations. We also apply FIRM to recover the categories of ligands and proteins, and to predict unknown drug-target interactions, where latent categorization structure encodes drug-target interaction, chemical compound similarity, and amino acid sequence similarity. BHCC and RBHCC are shown to have improved predictive performance (both in terms of cluster membership and missing value prediction) compared to traditional clustering methods. Our results suggest that these novel approaches to integrating multi-relational information have a promising future in the biological sciences where incorporating data related to varying features is often regarded as a daunting task

    Extensions to the Latent Dirichlet Allocation Topic Model Using Flexible Priors

    Get PDF
    Intrinsically, topic models have always their likelihood functions fixed to multinomial distributions as they operate on count data instead of Gaussian data. As a result, their performances ultimately depend on the flexibility of the chosen prior distributions when following the Bayesian paradigm compared to classical approaches such as PLSA (probabilistic latent semantic analysis), unigrams and mixture of unigrams that do not use prior information. The standard LDA (latent Dirichlet allocation) topic model operates with symmetric Dirichlet distribution (as a conjugate prior) which has been found to carry some limitations due to its independent structure that tends to hinder performance for instance in topic correlation including positively correlated data processing. Compared to classical ML estimators, the use of priors ultimately presents another unique advantage of smoothing out the multinomials while enhancing predictive topic models. In this thesis, we propose a series of flexible priors such as generalized Dirichlet (GD) and Beta-Liouville (BL) for our topic models within the collapsed representation, leading to much improved CVB (collapsed variational Bayes) update equations compared to ones from the standard LDA. This is because the flexibility of these priors improves significantly the lower bounds in the corresponding CVB algorithms. We also show the robustness of our proposed CVB inferences when using simultaneously the BL and GD in hybrid generative-discriminative models where the generative stage produces good and heterogeneous topic features that are used in the discriminative stage by powerful classifiers such as SVMs (support vector machines) as we propose efficient probabilistic kernels to facilitate processing (classification) of documents based on topic signatures. Doing so, we implicitly cast topic modeling which is an unsupervised learning method into a supervised learning technique. Furthermore, due to the complexity of the CVB algorithm (as it requires second order Taylor expansions) in general, despite its flexibility, we propose a much simpler and tractable update equation using a MAP (maximum a posteriori) framework with the standard EM (expectation-maximization) algorithm. As most Bayesian posteriors are not tractable for complex models, we ultimately propose the MAP-LBLA (latent BL allocation) where we characterize the contributions of asymmetric BL priors over the symmetric Dirichlet (Dir). The proposed MAP technique importantly offers a point estimate (mode) with a much tractable solution. In the MAP, we show that point estimate could be easy to implement than full Bayesian analysis that integrates over the entire parameter space. The MAP implicitly exhibits some equivalent relationship with the CVB especially the zero order approximations CVB0 and its stochastic version SCVB0. The proposed method enhances performances in information retrieval in text document analysis. We show that parametric topic models (as they are finite dimensional methods) have a much smaller hypothesis space and they generally suffer from model selection. We therefore propose a Bayesian nonparametric (BNP) technique that uses the Hierarchical Dirichlet process (HDP) as conjugate prior to the document multinomial distributions where the asymmetric BL serves as a diffuse (probability) base measure that provides the global atoms (topics) that are shared among documents. The heterogeneity in the topic structure helps in providing an alternative to model selection because the nonparametric topic model (which is infinite dimensional with a much bigger hypothesis space) could now prune out irrelevant topics based on the associated probability masses to only retain the most relevant ones. We also show that for large scale applications, stochastic optimizations using natural gradients of the objective functions have demonstrated significant performances when we learn rapidly both data and parameters in online fashion (streaming). We use both predictive likelihood and perplexity as evaluation methods to assess the robustness of our proposed topic models as we ultimately refer to probability as a way to quantify uncertainty in our Bayesian framework. We improve object categorization in terms of inferences through the flexibility of our prior distributions in the collapsed space. We also improve information retrieval technique with the MAP and the HDP-LBLA topic models while extending the standard LDA. These two applications present the ultimate capability of enhancing a search engine based on topic models

    Graphical models for visual object recognition and tracking

    Get PDF
    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.Includes bibliographical references (p. 277-301).We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications.(cont.) As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples.(cont.) Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.by Erik B. Sudderth.Ph.D

    Latent variable models for understanding user behavior in software applications

    Get PDF
    Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2018.Cataloged from PDF version of thesis.Includes bibliographical references (pages 147-157).Understanding user behavior in software applications is of significant interest to software developers and companies. By having a better understanding of the user needs and usage patterns, the developers can design a more efficient workflow, add new features, or even automate the user's workflow. In this thesis, I propose novel latent variable models to understand, predict and eventually automate the user interaction with a software application. I start by analyzing users' clicks using time series models; I introduce models and inference algorithms for time series segmentation which are scalable to large-scale user datasets. Next, using a conditional variational autoencoder and some related models, I introduce a framework for automating the user interaction with a software application. I focus on photo enhancement applications, but this framework can be applied to any domain where segmentation, prediction and personalization is valuable. Finally, by combining sequential Monte Carlo and variational inference, I propose a new inference scheme which has better convergence properties than other reasonable baselines.by Ardavan Saeedi.Ph. D

    A role for the developing lexicon in phonetic category acquisition

    Get PDF
    Infants segment words from fluent speech during the same period when they are learning phonetic categories, yet accounts of phonetic category acquisition typically ignore information about the words in which sounds appear. We use a Bayesian model to illustrate how feedback from segmented words might constrain phonetic category learning by providing information about which sounds occur together in words. Simulations demonstrate that word-level information can successfully disambiguate overlapping English vowel categories. Learning patterns in the model are shown to parallel human behavior from artificial language learning tasks. These findings point to a central role for the developing lexicon in phonetic category acquisition and provide a framework for incorporating top-down constraints into models of category learning

    Weakly Supervised Learning of Objects, Attributes and Their Associations

    Get PDF
    The final publication is available at Springer via http://dx.doi.org/10.1007/978-3-319-10605-2_31]”

    Doctor of Philosophy

    Get PDF
    dissertationLatent structures play a vital role in many data analysis tasks. By providing compact yet expressive representations, such structures can offer useful insights into the complex and high-dimensional datasets encountered in domains such as computational biology, computer vision, natural language processing, etc. Specifying the right complexity of these latent structures for a given problem is an important modeling decision. Instead of using models with an a priori fixed complexity, it is desirable to have models that can adapt their complexity as the data warrant. Nonparametric Bayesian models are motivated precisely based on this desideratum by offering a flexible modeling paradigm for data without limiting the model-complexity a priori. The flexibility comes from the model's ability to adjust its complexity adaptively with data. This dissertation is about nonparametric Bayesian learning of two specific types of latent structures: (1) low-dimensional latent features underlying high-dimensional observed data where the latent features could exhibit interdependencies, and (2) latent task structures that capture how a set of learning tasks relate with each other, a notion critical in the paradigm of Multitask Learning where the goal is to solve multiple learning tasks jointly in order to borrow information across similar tasks. Another focus of this dissertation is on designing efficient approximate inference algorithms for nonparametric Bayesian models. Specifically, for the nonparametric Bayesian latent feature model where the goal is to infer the binary-valued latent feature assignment matrix for a given set of observations, the dissertation proposes two approximate inference methods. The first one is a search-based algorithm to find the maximum-a-posteriori (MAP) solution for the latent feature assignment matrix. The second one is a sequential Monte-Carlo-based approximate inference algorithm that allows processing the data oneexample- at-a-time while being space-efficient in terms of the storage required to represent the posterior distribution of the latent feature assignment matrix

    Unsupervised Hybrid Feature Extraction Selection for High-Dimensional Non-Gaussian Data Clustering with Variational Inference

    Get PDF
    Clustering has been a subject of extensive research in data mining, pattern recognition, and other areas for several decades. The main goal is to assign samples, which are typically non-Gaussian and expressed as points in high-dimensional feature spaces, to one of a number of clusters. It is well known that in such high-dimensional settings, the existence of irrelevant features generally compromises modeling capabilities. In this paper, we propose a variational inference framework for unsupervised non-Gaussian feature selection, in the context of finite generalized Dirichlet (GD) mixture-based clustering. Under the proposed principled variational framework, we simultaneously estimate, in a closed form, all the involved parameters and determine the complexity (i.e., both model an feature selection) of the GD mixture. Extensive simulations using synthetic data along with an analysis of real-world data and human action videos demonstrate that our variational approach achieves better results than comparable techniques
    corecore