Statistical and Computational Tradeoffs in Stochastic Composite Likelihood
Maximum likelihood estimators are often of limited practical use due to the
intensive computation they require. We propose a family of alternative
estimators that maximize a stochastic variation of the composite likelihood
function. Each of the estimators resolves the computation-accuracy tradeoff
differently, and taken together they span a continuous spectrum of
computation-accuracy tradeoff resolutions. We prove the consistency of the
estimators and provide formulas for their asymptotic variance, statistical
robustness, and computational complexity. We discuss experimental results in
the context of Boltzmann machines and conditional random fields. The
theoretical and experimental studies demonstrate the effectiveness of the
estimators when computational resources are insufficient. They also
demonstrate that in some cases reduced computational complexity is associated
with robustness, thereby increasing statistical accuracy.
Comment: 30 pages, 97 figures, 2 authors
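To make the idea concrete, here is a minimal sketch in plain Python/NumPy (the toy Gaussian model, the component functions, and the selection probabilities are illustrative assumptions of mine, not the authors' construction): each low-dimensional likelihood component is included with some probability and reweighted by that probability, so cheaper, noisier objectives trade computation against accuracy.

```python
import numpy as np

def stochastic_composite_loglik(theta, X, components, beta, rng):
    """Toy stochastic composite log-likelihood.

    components : list of functions, each returning one low-dimensional
                 component log-likelihood (e.g. a marginal or conditional).
    beta       : per-component selection probabilities; dividing by beta[j]
                 keeps the stochastic objective unbiased, on average, for
                 the full composite log-likelihood.
    """
    total = 0.0
    for j, comp in enumerate(components):
        if rng.random() < beta[j]:            # evaluate component j only sometimes
            total += comp(theta, X) / beta[j]
    return total

# Toy model: three independent Gaussian coordinates sharing an unknown mean theta.
def marginal_loglik(dim):
    return lambda theta, X: -0.5 * np.sum((X[:, dim] - theta) ** 2)

rng = np.random.default_rng(0)
X = rng.normal(loc=1.5, size=(200, 3))
components = [marginal_loglik(d) for d in range(3)]
beta = [1.0, 0.5, 0.25]   # later components are evaluated less often (cheaper, noisier)

# The objective is stochastic, so in practice it would be maximized with a
# stochastic optimizer; a grid scan just illustrates its shape here.
grid = np.linspace(0.0, 3.0, 61)
values = [stochastic_composite_loglik(t, X, components, beta, rng) for t in grid]
print("approximate maximizer:", grid[int(np.argmax(values))])
```

Smaller values of beta correspond to the cheaper end of the computation-accuracy spectrum the abstract describes; beta identically one recovers the full composite likelihood.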
Asymptotic Analysis of Generative Semi-Supervised Learning
Semi-supervised learning has emerged as a popular framework for improving
modeling accuracy while controlling labeling cost. Based on an extension of
stochastic composite likelihood, we quantify the asymptotic accuracy of
generative semi-supervised learning. In doing so, we complement
distribution-free analysis by providing an alternative framework to measure the
value associated with different labeling policies and resolve the fundamental
question of how much data to label and in what manner. We demonstrate our
approach with both simulation studies and real-world experiments, using naive
Bayes for text classification and MRFs and CRFs for structured prediction in
NLP.
Comment: 12 pages, 9 figures
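As a concrete illustration of the generative semi-supervised setting above, the sketch below (plain Python/NumPy with SciPy; the Bernoulli naive Bayes toy model, the 10% labeling policy, and all names are my own assumptions, not the authors' code) evaluates the joint objective in which labeled examples contribute log p(x, y) and unlabeled examples contribute the marginal log p(x).

```python
import numpy as np
from scipy.special import logsumexp

def semisup_loglik(pi, mu, X_lab, y_lab, X_unl):
    """Generative semi-supervised log-likelihood for Bernoulli naive Bayes.

    pi : (C,) class priors; mu : (C, D) per-class Bernoulli feature parameters.
    Labeled rows contribute log p(x, y); unlabeled rows contribute
    log p(x) = logsumexp_y log p(x, y).
    """
    log_pi, log_mu, log_1m_mu = np.log(pi), np.log(mu), np.log1p(-mu)

    def joint(X):  # (N, C) matrix of log p(x, y = c)
        return X @ log_mu.T + (1 - X) @ log_1m_mu.T + log_pi

    ll_lab = joint(X_lab)[np.arange(len(y_lab)), y_lab].sum()
    ll_unl = logsumexp(joint(X_unl), axis=1).sum()
    return ll_lab + ll_unl

# Toy usage: two classes, five binary features, a policy that labels 10% at random.
rng = np.random.default_rng(0)
pi = np.array([0.4, 0.6])
mu = rng.uniform(0.2, 0.8, size=(2, 5))
X = rng.integers(0, 2, size=(100, 5)).astype(float)
y = rng.integers(0, 2, size=100)
labeled = rng.random(100) < 0.1
print(semisup_loglik(pi, mu, X[labeled], y[labeled], X[~labeled]))
```

Varying the labeled fraction, or labeling some classes more aggressively than others, changes how informative this objective is about the parameters; that is the kind of labeling-policy effect the asymptotic analysis quantifies.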
Automatically Bounding the Taylor Remainder Series: Tighter Bounds and New Applications
We present a new algorithm for automatically bounding the Taylor remainder
series. In the special case of a scalar function $f: \mathbb{R} \to \mathbb{R}$, our algorithm takes as input a reference point $x_0$, a trust region $[a, b]$, and an integer $k \ge 1$, and returns an interval $I$ such that $f(x) - \sum_{i=0}^{k-1} \frac{1}{i!} f^{(i)}(x_0) (x - x_0)^i \in I \cdot (x - x_0)^k$ for all $x \in [a, b]$. As in automatic differentiation, the function $f$ is
provided to the algorithm in symbolic form, and must be composed of known
atomic functions.
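As a worked instance of this enclosure (my own example, using the classical Lagrange form of the remainder rather than the paper's sharper bounds): take $f = \exp$, $x_0 = 0$, $[a, b] = [-1, 1]$, and $k = 2$.

```latex
% Worked example with the classical Lagrange remainder (not the paper's sharper bound):
% f = exp, x_0 = 0, [a, b] = [-1, 1], k = 2.
\[
  \exp(x) - (1 + x) \;=\; \frac{\exp(\xi)}{2!}\, x^2
  \qquad \text{for some } \xi \text{ between } 0 \text{ and } x,
\]
\[
  \text{so } I = \left[\tfrac{e^{-1}}{2},\ \tfrac{e}{2}\right]
  \ \text{satisfies}\quad
  \exp(x) - (1 + x) \in I\, x^2 \quad \text{for all } x \in [-1, 1].
\]
```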
At a high level, our algorithm has two steps. First, for a variety of
commonly-used elementary functions (e.g., $\exp$, $\log$), we use
recently-developed theory to derive sharp polynomial upper and lower bounds on
the Taylor remainder series. We then recursively combine the bounds for the
elementary functions using an interval arithmetic variant of Taylor-mode
automatic differentiation. Our algorithm can make efficient use of machine
learning hardware accelerators, and we provide an open source implementation in
JAX.
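The snippet below is a minimal numerical sketch of such an enclosure in plain Python/NumPy (my own simplified construction for $f = \exp$: it uses the classical Lagrange-remainder interval over the trust region, not the paper's sharper recursively combined bounds or its JAX implementation), and verifies the enclosure on a grid.

```python
import math
import numpy as np

def exp_remainder_interval(a, b, k):
    """Interval I such that, for x0 in [a, b],
    exp(x) - sum_{i<k} exp(x0)/i! (x - x0)^i  lies in  I * (x - x0)^k  on [a, b].
    Uses the Lagrange remainder exp(xi)/k! with xi ranging over [a, b]
    (a classical bound; the paper derives sharper ones).
    """
    return np.exp(a) / math.factorial(k), np.exp(b) / math.factorial(k)

x0, a, b, k = 0.0, -1.0, 1.0, 2
lo, hi = exp_remainder_interval(a, b, k)

xs = np.linspace(a, b, 1001)
taylor = sum(np.exp(x0) / math.factorial(i) * (xs - x0) ** i for i in range(k))
remainder = np.exp(xs) - taylor
# With k = 2, (x - x0)^k >= 0, so the interval product is an elementwise sandwich.
assert np.all(remainder >= lo * (xs - x0) ** k - 1e-12)
assert np.all(remainder <= hi * (xs - x0) ** k + 1e-12)
print(f"I = [{lo:.4f}, {hi:.4f}]")
```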
We then turn our attention to applications. Most notably, in a companion
paper we use our new machinery to create the first universal
majorization-minimization optimization algorithms: algorithms that iteratively
minimize an arbitrary loss using a majorizer that is derived automatically,
rather than by hand. We also show that our automatically-derived bounds can be
used for verified global optimization and numerical integration, and to prove
sharper versions of Jensen's inequality.
Comment: Previous version has been split into 3 articles: arXiv:2308.00679,
arXiv:2308.00190, and this article
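As a toy illustration of majorization-minimization itself (the quadratic majorizer below is derived by hand from a global curvature bound on a logistic loss of my own choosing; the companion paper's contribution is to derive such majorizers automatically), each step exactly minimizes an upper bound that touches the loss at the current iterate.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = np.where(x + 0.3 * rng.normal(size=500) > 0, 1.0, -1.0)

def loss(w):   # logistic loss with a single scalar parameter w
    return np.mean(np.logaddexp(0.0, -y * x * w))

def grad(w):
    s = 1.0 / (1.0 + np.exp(y * x * w))   # sigmoid(-y * x * w)
    return np.mean(-y * x * s)

# Hand-derived curvature bound: d^2/dw^2 logaddexp(0, -y*x*w) <= x^2 / 4, so
# g(w) = loss(w_t) + grad(w_t)(w - w_t) + (L/2)(w - w_t)^2 majorizes the loss.
L = np.mean(x ** 2) / 4.0

w = 0.0
for _ in range(50):
    w -= grad(w) / L   # exact minimizer of the quadratic majorizer at w_t
print(f"w = {w:.3f}, loss = {loss(w):.4f}")
```

Because the majorizer upper-bounds the loss and matches it at the current iterate, each update is guaranteed not to increase the loss, which is the defining property of MM schemes.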
PAC$^m$-Bayes: Narrowing the Empirical Risk Gap in the Misspecified Bayesian Regime
While the decision-theoretic optimality of the Bayesian formalism under
correct model specification is well-known (Berger 2013), the Bayesian case
becomes less clear under model misspecification (Grunwald 2017; Ramamoorthi
2015; Fushiki 2005). To formally understand the consequences of Bayesian
misspecification, this work examines the relationship between posterior
predictive risk and its sensitivity to correct model assumptions, i.e., choice
of likelihood and prior. We present the multisample PAC$^m$-Bayes risk. This
risk is justified by theoretical analysis based on PAC-Bayes as well as
empirical study on a number of toy problems. The PAC$^m$-Bayes risk is
appealing in that it entails direct minimization of the Monte-Carlo
approximated posterior predictive risk, yet recovers both the Bayesian formalism
and the MLE in its limits. Our work is heavily influenced by Masegosa
(2019); our contributions are to align training and generalization risks while
offering a tighter bound which empirically performs at least as well and
sometimes much better.
Comment: Submitted to ICML 202
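To illustrate the Monte-Carlo approximated multisample predictive risk mentioned above, here is a minimal evaluation-only sketch in plain Python/NumPy with SciPy (the Gaussian toy model, the fixed variational posterior, and all names are my own assumptions, not the authors' code; the actual method also minimizes this risk over the posterior together with a PAC-Bayes complexity term): each observation's loss is the negative log of the likelihood averaged over m posterior samples.

```python
import numpy as np
from scipy.special import logsumexp

def multisample_predictive_risk(y, mu_q, sigma_q, m, rng):
    """Monte-Carlo multisample predictive risk for a toy Gaussian model.

    Model: y ~ Normal(theta, 1); approximate posterior q = Normal(mu_q, sigma_q^2).
    Draw m parameters from q, form each observation's averaged likelihood
    (log-mean-exp over samples), and return the average negative log.
    """
    theta = rng.normal(mu_q, sigma_q, size=m)              # m draws from q
    log_lik = (-0.5 * (y[:, None] - theta[None, :]) ** 2
               - 0.5 * np.log(2.0 * np.pi))                 # (n, m) per-sample log-likelihoods
    log_pred = logsumexp(log_lik, axis=1) - np.log(m)       # log (1/m) sum_j p(y_i | theta_j)
    return -np.mean(log_pred)

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.0, size=200)
for m in (1, 4, 64):
    print(m, round(multisample_predictive_risk(y, mu_q=1.8, sigma_q=0.3, m=m, rng=rng), 4))
```

For a fixed posterior q, the m = 1 estimate is the usual Monte-Carlo Gibbs log-loss, while large m approaches the posterior predictive risk, which gives a feel for the interpolation the abstract describes.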