3,243 research outputs found

    Statistical and Computational Tradeoffs in Stochastic Composite Likelihood

    Get PDF
    Maximum likelihood estimators are often of limited practical use due to the intensive computation they require. We propose a family of alternative estimators that maximize a stochastic variation of the composite likelihood function. Each of the estimators resolve the computation-accuracy tradeoff differently, and taken together they span a continuous spectrum of computation-accuracy tradeoff resolutions. We prove the consistency of the estimators, provide formulas for their asymptotic variance, statistical robustness, and computational complexity. We discuss experimental results in the context of Boltzmann machines and conditional random fields. The theoretical and experimental studies demonstrate the effectiveness of the estimators when the computational resources are insufficient. They also demonstrate that in some cases reduced computational complexity is associated with robustness thereby increasing statistical accuracy.Comment: 30 pages, 97 figures, 2 author

    Statistical and Computational Tradeoff in Genetic Algorithm-Based Estimation

    Full text link
    When a Genetic Algorithm (GA), or a stochastic algorithm in general, is employed in a statistical problem, the obtained result is affected by both variability due to sampling, that refers to the fact that only a sample is observed, and variability due to the stochastic elements of the algorithm. This topic can be easily set in a framework of statistical and computational tradeoff question, crucial in recent problems, for which statisticians must carefully set statistical and computational part of the analysis, taking account of some resource or time constraints. In the present work we analyze estimation problems tackled by GAs, for which variability of estimates can be decomposed in the two sources of variability, considering some constraints in the form of cost functions, related to both data acquisition and runtime of the algorithm. Simulation studies will be presented to discuss the statistical and computational tradeoff question.Comment: 17 pages, 5 figure

    Asymptotic Analysis of Generative Semi-Supervised Learning

    Full text link
    Semisupervised learning has emerged as a popular framework for improving modeling accuracy while controlling labeling cost. Based on an extension of stochastic composite likelihood we quantify the asymptotic accuracy of generative semi-supervised learning. In doing so, we complement distribution-free analysis by providing an alternative framework to measure the value associated with different labeling policies and resolve the fundamental question of how much data to label and in what manner. We demonstrate our approach with both simulation studies and real world experiments using naive Bayes for text classification and MRFs and CRFs for structured prediction in NLP.Comment: 12 pages, 9 figure

    Noisy Monte Carlo: Convergence of Markov chains with approximate transition kernels

    Get PDF
    Monte Carlo algorithms often aim to draw from a distribution π\pi by simulating a Markov chain with transition kernel PP such that π\pi is invariant under PP. However, there are many situations for which it is impractical or impossible to draw from the transition kernel PP. For instance, this is the case with massive datasets, where is it prohibitively expensive to calculate the likelihood and is also the case for intractable likelihood models arising from, for example, Gibbs random fields, such as those found in spatial statistics and network analysis. A natural approach in these cases is to replace PP by an approximation P^\hat{P}. Using theory from the stability of Markov chains we explore a variety of situations where it is possible to quantify how 'close' the chain given by the transition kernel P^\hat{P} is to the chain given by PP. We apply these results to several examples from spatial statistics and network analysis.Comment: This version: results extended to non-uniformly ergodic Markov chain

    Convex Optimization for Big Data

    Get PDF
    This article reviews recent advances in convex optimization algorithms for Big Data, which aim to reduce the computational, storage, and communications bottlenecks. We provide an overview of this emerging field, describe contemporary approximation techniques like first-order methods and randomization for scalability, and survey the important role of parallel and distributed computation. The new Big Data algorithms are based on surprisingly simple principles and attain staggering accelerations even on classical problems.Comment: 23 pages, 4 figurs, 8 algorithm
    corecore