
    Decontamination of Mutual Contamination Models

    Many machine learning problems can be characterized by mutual contamination models. In these problems, one observes several random samples from different convex combinations of a set of unknown base distributions and the goal is to infer these base distributions. This paper considers the general setting where the base distributions are defined on arbitrary probability spaces. We examine three popular machine learning problems that arise in this general setting: multiclass classification with label noise, demixing of mixed membership models, and classification with partial labels. In each case, we give sufficient conditions for identifiability and present algorithms for the infinite and finite sample settings, with associated performance guarantees.
    Comment: Published in JMLR. Subsumes arXiv:1602.0623
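
    The setting above is easy to make concrete: each observed sample set is drawn from a convex combination of unknown base distributions, and the task is to undo that mixing. The following Python sketch simulates such a mutual contamination model with two made-up Gaussian base distributions and a hypothetical mixing matrix; it illustrates the data-generating process only, not the paper's decontamination algorithms.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: two 1-D Gaussian base distributions P_0, P_1 and a
# row-stochastic mixing matrix Pi. Observed sample set i is drawn from the
# convex combination  tilde_P_i = sum_j Pi[i, j] * P_j.
base_means = np.array([-2.0, 3.0])          # base distribution parameters (unknown in practice)
Pi = np.array([[0.8, 0.2],                  # mixing proportions (also unknown in practice)
               [0.3, 0.7]])

def sample_contaminated(i, n):
    """Draw n points from the i-th contaminated distribution tilde_P_i."""
    components = rng.choice(len(base_means), size=n, p=Pi[i])
    return rng.normal(loc=base_means[components], scale=1.0)

samples = [sample_contaminated(i, 5000) for i in range(len(Pi))]
# The decontamination task: recover the base distributions (here the two
# Gaussians) and/or the mixing matrix Pi from `samples` alone.
print([round(s.mean(), 2) for s in samples])  # close to Pi @ base_means = [-1.0, 1.5]
```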

    Learning the Structure and Parameters of Large-Population Graphical Games from Behavioral Data

    We consider learning, from strictly behavioral data, the structure and parameters of linear influence games (LIGs), a class of parametric graphical games introduced by Irfan and Ortiz (2014). LIGs facilitate causal strategic inference (CSI): making inferences from causal interventions on stable behavior in strategic settings. Applications include the identification of the most influential individuals in large (social) networks. Such tasks can also support policy-making analysis. Motivated by the computational work on LIGs, we cast the learning problem as maximum-likelihood estimation (MLE) of a generative model defined by pure-strategy Nash equilibria (PSNE). Our simple formulation uncovers the fundamental interplay between goodness-of-fit and model complexity: good models capture equilibrium behavior within the data while controlling the true number of equilibria, including those unobserved. We provide a generalization bound establishing the sample complexity for MLE in our framework. We propose several algorithms, including convex loss minimization (CLM) and sigmoidal approximations. We prove that the number of exact PSNE in LIGs is small, with high probability; thus, CLM is sound. We illustrate our approach on synthetic data and real-world U.S. congressional voting records. We briefly discuss our learning framework's generality and potential applicability to general graphical games.
    Comment: Journal of Machine Learning Research (accepted, pending publication). Last conference version: submitted March 30, 2012 to UAI 2012. First conference version, entitled Learning Influence Games, initially submitted on June 1, 2010 to NIPS 201
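
    As a rough illustration of the generative object behind the MLE formulation, the sketch below enumerates the pure-strategy Nash equilibria of a small linear influence game by brute force, using the standard best-response condition x_i (w_i · x − b_i) ≥ 0 for actions x_i ∈ {−1, +1}. The game parameters are randomly generated placeholders; this is not the paper's learning algorithm, only the equilibrium check it builds on.

```python
import itertools
import numpy as np

# Hypothetical toy linear influence game (LIG) with n players. Player i's
# payoff for action x_i in {-1, +1} is x_i * (W[i] @ x - b[i]); a joint
# action x is a pure-strategy Nash equilibrium (PSNE) when no player can
# improve by flipping its action, i.e. x_i * (W[i] @ x - b[i]) >= 0 for all i.
rng = np.random.default_rng(1)
n = 4
W = rng.normal(size=(n, n))
np.fill_diagonal(W, 0.0)   # no self-influence
b = rng.normal(size=n)

def is_psne(x, W, b):
    """Check the best-response condition for every player simultaneously."""
    return bool(np.all(x * (W @ x - b) >= 0))

psne = [x for x in itertools.product([-1, 1], repeat=n)
        if is_psne(np.array(x), W, b)]
print(len(psne), "pure-strategy Nash equilibria out of", 2 ** n, "joint actions")
```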

    Learning Topic Models and Latent Bayesian Networks Under Expansion Constraints

    Unsupervised estimation of latent variable models is a fundamental problem central to numerous applications of machine learning and statistics. This work presents a principled approach for estimating broad classes of such models, including probabilistic topic models and latent linear Bayesian networks, using only second-order observed moments. The sufficient conditions for identifiability of these models are primarily based on weak expansion constraints on the topic-word matrix, for topic models, and on the directed acyclic graph, for Bayesian networks. Because no assumptions are made on the distribution among the latent variables, the approach can handle arbitrary correlations among the topics or latent factors. In addition, a tractable learning method via $\ell_1$ optimization is proposed and studied in numerical experiments.
    Comment: 38 pages, 6 figures, 2 tables; applications in topic models and Bayesian networks are studied. Simulation section is added.
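
    Because the estimator above consumes only second-order observed moments, the sketch below shows one standard way such a moment is formed for topic models: the empirical word-pair co-occurrence matrix over distinct word positions within each document. The tiny vocabulary and documents are invented for illustration, and the $\ell_1$-based recovery step itself is not shown.

```python
import numpy as np

# Hypothetical sketch: the second-order observed moment used as input by
# moment-based topic model estimators. `docs` is a list of documents, each a
# list of word indices in {0, ..., V-1}.
V = 6
docs = [[0, 1, 2, 1], [3, 4, 4, 5], [0, 2, 2, 1, 0]]

def second_order_moment(docs, V):
    """Empirical probability of each ordered word pair drawn from distinct
    positions of the same document."""
    M2 = np.zeros((V, V))
    total_pairs = 0.0
    for doc in docs:
        counts = np.bincount(doc, minlength=V).astype(float)
        n = counts.sum()
        # the outer product counts all ordered position pairs; subtracting the
        # diagonal removes pairs that reuse the same position
        M2 += np.outer(counts, counts) - np.diag(counts)
        total_pairs += n * (n - 1)
    return M2 / total_pairs

M2 = second_order_moment(docs, V)
print(M2.sum())      # 1.0: a proper joint distribution over word pairs
print(M2.round(3))
```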

    Nonparametric Estimation of Multi-View Latent Variable Models

    Spectral methods have greatly advanced the estimation of latent variable models, generating a sequence of novel and efficient algorithms with strong theoretical guarantees. However, current spectral algorithms are largely restricted to mixtures of discrete or Gaussian distributions. In this paper, we propose a kernel method for learning multi-view latent variable models, allowing each mixture component to be nonparametric. The key idea of the method is to embed the joint distribution of a multi-view latent variable model into a reproducing kernel Hilbert space, and then the latent parameters are recovered using a robust tensor power method. We establish that the sample complexity for the proposed method is quadratic in the number of latent components and is a low-order polynomial in the other relevant parameters. Thus, our nonparametric tensor approach to learning latent variable models enjoys good sample and computational efficiencies. Moreover, the nonparametric tensor power method compares favorably to the EM algorithm and other existing spectral algorithms in our experiments.
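
    The recovery step mentioned above relies on a tensor power method. The sketch below implements a plain tensor power iteration with deflation on a synthetic, orthogonally decomposable third-order tensor; the robust variant referenced in the abstract additionally uses multiple random restarts and careful error control, which are omitted here for brevity.

```python
import numpy as np

rng = np.random.default_rng(2)

def tensor_apply(T, v):
    """Contract a symmetric k x k x k tensor with v along two modes: T(I, v, v)."""
    return np.einsum('ijk,j,k->i', T, v, v)

def tensor_power_method(T, n_components, n_iter=100):
    """Recover eigenvalue/eigenvector pairs of an (approximately) orthogonally
    decomposable tensor by power iteration plus rank-one deflation."""
    T = T.copy()
    k = T.shape[0]
    eigvals, eigvecs = [], []
    for _ in range(n_components):
        v = rng.normal(size=k)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):
            v = tensor_apply(T, v)
            v /= np.linalg.norm(v)
        lam = np.einsum('ijk,i,j,k->', T, v, v, v)
        eigvals.append(lam)
        eigvecs.append(v)
        T = T - lam * np.einsum('i,j,k->ijk', v, v, v)   # deflate the found component
    return np.array(eigvals), np.array(eigvecs)

# Synthetic test: build T = sum_i w_i * u_i (x) u_i (x) u_i with orthonormal u_i
# and check that the weights are recovered.
k = 3
U, _ = np.linalg.qr(rng.normal(size=(k, k)))
w = np.array([2.0, 1.0, 0.5])
T = sum(w[i] * np.einsum('a,b,c->abc', U[:, i], U[:, i], U[:, i]) for i in range(k))
eigvals, _ = tensor_power_method(T, n_components=k)
print(np.sort(eigvals)[::-1])   # approximately [2.0, 1.0, 0.5]
```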

    Learning mixed membership models with a separable latent structure: theory, provably efficient algorithms, and applications

    In a wide spectrum of problems in science and engineering that includes hyperspectral imaging, gene expression analysis, and machine learning tasks such as topic modeling, the observed data is high-dimensional and can be modeled as arising from a data-specific probabilistic mixture of a small collection of latent factors. Being able to successfully learn the latent factors from the observed data is important for efficient data representation, inference, and prediction. Popular approaches such as variational Bayesian and MCMC methods exhibit good empirical performance on some real-world datasets, but make heavy use of approximations and heuristics for dealing with the highly non-convex and computationally intractable optimization objectives that accompany them. As a consequence, consistency or efficiency guarantees for these algorithms are rather weak. This thesis develops a suite of algorithms with provable polynomial statistical and computational efficiency guarantees for learning a wide class of high-dimensional Mixed Membership Latent Variable Models (MMLVMs). Our approach is based on a natural separability property of the shared latent factors that is known to be either exactly or approximately satisfied by the estimates produced by variational Bayesian and MCMC methods. Latent factors are called separable when each factor contains a novel part that is predominantly unique to that factor. For a broad class of problems, we establish that separability is not only an algorithmically convenient structural condition, but is in fact an inevitable consequence of having a relatively small number of latent factors in a high-dimensional observation space. The key insight underlying our algorithms is the identification of novel parts of each latent factor as extreme points of certain convex polytopes in a suitable representation space. We show that this can be done efficiently through appropriately defined random projections in the representation space. We establish statistical and computational efficiency bounds that are both polynomial in all the model parameters. Furthermore, the proposed random-projections-based algorithm turns out to be naturally amenable to a low-communication-cost distributed implementation which is attractive for modern web-scale distributed data mining applications. We explore in detail two distinct classes of MMLVMs in this thesis: learning topic models for text documents based on their empirical word frequencies and learning mixed membership ranking models based on pairwise comparison data. For each problem, we demonstrate that separability is inevitable when the data dimension scales up and then establish consistency and efficiency guarantees for identifying all novel parts and estimating the latent factors. As a by-product of this analysis, we obtain the first asymptotic consistency and polynomial sample and computational complexity results for learning permutation-mixture and Mallows-mixture models for rankings based on pairwise comparison data. We demonstrate empirically that the performance of our approach is competitive with the current state-of-the-art on a number of real-world datasets.
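
    The key geometric insight described above, that novel parts of the latent factors appear as extreme points of certain convex polytopes and can be located via random projections, admits a very small illustration: a row that maximizes a random linear functional is necessarily an extreme point of the rows' convex hull, so repeated random projections "vote" only for extreme rows. The data below are toy values and the voting rule is a simplified stand-in, not the thesis's actual estimator.

```python
import numpy as np

rng = np.random.default_rng(3)

def detect_extreme_points(X, n_projections=200):
    """Vote for rows of X that maximize random linear functionals; only
    extreme points of the rows' convex hull can collect votes."""
    votes = np.zeros(X.shape[0], dtype=int)
    for _ in range(n_projections):
        d = rng.normal(size=X.shape[1])       # random projection direction
        votes[np.argmax(X @ d)] += 1
    return votes

# Toy example: three true extreme points followed by three interior points
# (convex combinations of the extreme points).
extreme = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
interior = np.array([[0.5, 0.25], [0.2, 0.3], [0.4, 0.4]])
X = np.vstack([extreme, interior])
print(detect_extreme_points(X))   # only the first three rows receive votes
```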

    Robust Subspace Learning: Robust PCA, Robust Subspace Tracking, and Robust Subspace Recovery

    PCA is one of the most widely used dimension reduction techniques. A related, easier problem is "subspace learning" or "subspace estimation". Given relatively clean data, both are easily solved via singular value decomposition (SVD). The problem of subspace learning or PCA in the presence of outliers is called robust subspace learning or robust PCA (RPCA). For long data sequences, if one tries to use a single lower-dimensional subspace to represent the data, the required subspace dimension may end up being quite large. For such data, a better model is to assume that it lies in a low-dimensional subspace that can change over time, albeit gradually. The problem of tracking such data (and the subspaces) while being robust to outliers is called robust subspace tracking (RST). This article provides a magazine-style overview of the entire field of robust subspace learning and tracking. In particular, solutions for three problems are discussed in detail: RPCA via sparse+low-rank matrix decomposition (S+LR), RST via S+LR, and "robust subspace recovery (RSR)". RSR assumes that an entire data vector is either an outlier or an inlier. The S+LR formulation instead assumes that outliers occur on only a few data vector indices and hence are well modeled as sparse corruptions.
    Comment: To appear, IEEE Signal Processing Magazine, July 201
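
    As a minimal illustration of the S+LR formulation discussed above, the sketch below splits a corrupted matrix into low-rank and sparse parts by alternating a truncated SVD for the low-rank component with soft-thresholding of the residual for the sparse component. It is a generic alternating heuristic with made-up parameters, offered only to make the decomposition concrete; it is not any particular algorithm surveyed in the article.

```python
import numpy as np

rng = np.random.default_rng(4)

def soft_threshold(X, tau):
    """Entrywise shrinkage used to estimate the sparse outlier matrix."""
    return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

def rpca_alternating(M, rank, tau, n_iter=50):
    """Alternate L = best rank-r approximation of M - S and S = shrink(M - L)."""
    S = np.zeros_like(M)
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(M - S, full_matrices=False)
        L = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        S = soft_threshold(M - L, tau)
    return L, S

# Synthetic test: a rank-2 matrix plus sparse, large-magnitude corruptions.
n, r = 50, 2
L_true = rng.normal(size=(n, r)) @ rng.normal(size=(r, n))
S_true = np.zeros((n, n))
mask = rng.random((n, n)) < 0.05
S_true[mask] = rng.normal(scale=10.0, size=mask.sum())
M = L_true + S_true
L_hat, S_hat = rpca_alternating(M, rank=r, tau=1.0)
print("relative error on L:", np.linalg.norm(L_hat - L_true) / np.linalg.norm(L_true))
```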