    A Constructive, Incremental-Learning Network for Mixture Modeling and Classification

    Gaussian ARTMAP (GAM) is a supervised-learning adaptive resonance theory (ART) network that uses Gaussian-defined receptive fields. Like other ART networks, GAM incrementally learns and constructs a representation of sufficient complexity to solve a problem it is trained on. GAM's representation is a Gaussian mixture model of the input space, with learned mappings from the mixture components to output classes. We show a close relationship between GAM and the well-known Expectation-Maximization (EM) approach to mixture modeling. GAM outperforms an EM classification algorithm on a classification benchmark, thereby demonstrating the advantage of the ART match criterion for regulating learning, and of the ARTMAP match tracking operation for incorporating environmental feedback in supervised learning situations. Office of Naval Research (N00014-95-1-0409

    Approximating Likelihood Ratios with Calibrated Discriminative Classifiers

    In many fields of science, generalized likelihood ratio tests are established tools for statistical inference. At the same time, it has become increasingly common that a simulator (or generative model) is used to describe complex processes that tie parameters $\theta$ of an underlying theory and measurement apparatus to high-dimensional observations $\mathbf{x}\in \mathbb{R}^p$. However, simulators often do not provide a way to evaluate the likelihood function for a given observation $\mathbf{x}$, which motivates a new class of likelihood-free inference algorithms. In this paper, we show that likelihood ratios are invariant under a specific class of dimensionality reduction maps $\mathbb{R}^p \mapsto \mathbb{R}$. As a direct consequence, we show that discriminative classifiers can be used to approximate the generalized likelihood ratio statistic when only a generative model for the data is available. This leads to a new machine learning-based approach to likelihood-free inference that is complementary to Approximate Bayesian Computation, and which does not require a prior on the model parameters. Experimental results on artificial problems with known exact likelihoods illustrate the potential of the proposed method. Comment: 35 pages, 5 figures

    Subdeterminant Maximization via Nonconvex Relaxations and Anti-concentration

    Several fundamental problems that arise in optimization and computer science can be cast as follows: Given vectors $v_1,\ldots,v_m \in \mathbb{R}^d$ and a constraint family $\mathcal{B}\subseteq 2^{[m]}$, find a set $S \in \mathcal{B}$ that maximizes the squared volume of the simplex spanned by the vectors in $S$. A motivating example is the data-summarization problem in machine learning where one is given a collection of vectors that represent data such as documents or images. The volume of a set of vectors is used as a measure of their diversity, and partition or matroid constraints over $[m]$ are imposed in order to ensure resource or fairness constraints. Recently, Nikolov and Singh presented a convex program and showed how it can be used to estimate the value of the most diverse set when $\mathcal{B}$ corresponds to a partition matroid. This result was recently extended to regular matroids in works of Straszak and Vishnoi, and Anari and Oveis Gharan. The question of whether these estimation algorithms can be converted into the more useful approximation algorithms -- that also output a set -- remained open. The main contribution of this paper is to give the first approximation algorithms for both partition and regular matroids. We present novel formulations for the subdeterminant maximization problem for these matroids; this reduces them to the problem of finding a point that maximizes the absolute value of a nonconvex function over a Cartesian product of probability simplices. The technical core of our results is a new anti-concentration inequality for dependent random variables that allows us to relate the optimal value of these nonconvex functions to their value at a random point. Unlike prior work on the constrained subdeterminant maximization problem, our proofs do not rely on real-stability or convexity and could be of independent interest both in algorithms and complexity. Comment: in FOCS 201

    Minimax and Adaptive Inference in Nonparametric Function Estimation

    Since Stein's 1956 seminal paper, shrinkage has played a fundamental role in both parametric and nonparametric inference. This article discusses minimaxity and adaptive minimaxity in nonparametric function estimation. Three interrelated problems, function estimation under global integrated squared error, estimation under pointwise squared error, and nonparametric confidence intervals, are considered. Shrinkage is pivotal in the development of both the minimax theory and the adaptation theory. While the three problems are closely connected and the minimax theories bear some similarities, the adaptation theories are strikingly different. For example, in sharp contrast to adaptive point estimation, in many common settings there do not exist nonparametric confidence intervals that adapt to the unknown smoothness of the underlying function. A concise account of these theories is given. The connections as well as differences among these problems are discussed and illustrated through examples. Comment: Published at http://dx.doi.org/10.1214/11-STS355 in Statistical Science (http://www.imstat.org/sts/) by the Institute of Mathematical Statistics (http://www.imstat.org

    Classification via local multi-resolution projections

    We focus on the supervised binary classification problem, which consists in guessing the label $Y$ associated to a covariate $X \in \mathbb{R}^d$, given a set of $n$ independent and identically distributed covariates and associated labels $(X_i,Y_i)$. We assume that the law of the random vector $(X,Y)$ is unknown and the marginal law of $X$ admits a density supported on a set $\mathcal{A}$. In the particular case of plug-in classifiers, solving the classification problem boils down to the estimation of the regression function $\eta(X) = \mathbb{E}[Y|X]$. Assuming first $\mathcal{A}$ to be known, we show how it is possible to construct an estimator of $\eta$ by localized projections onto a multi-resolution analysis (MRA). In a second step, we show how this estimation procedure generalizes to the case where $\mathcal{A}$ is unknown. Interestingly, this novel estimation procedure achieves theoretical performance similar to that of the celebrated local-polynomial estimator (LPE). In addition, it benefits from the lattice structure of the underlying MRA and thus outperforms the LPE from a computational standpoint, which turns out to be a crucial feature in many practical applications. Finally, we prove that the associated plug-in classifier can reach super-fast rates under a margin assumption. Comment: 38 pages, 6 figures