A* Sampling
The problem of drawing samples from a discrete distribution can be converted
into a discrete optimization problem. In this work, we show how sampling from a
continuous distribution can be converted into an optimization problem over
continuous space. Central to the method is a stochastic process recently
described in mathematical statistics that we call the Gumbel process. We
present a new construction of the Gumbel process and A* sampling, a practical
generic sampling algorithm that searches for the maximum of a Gumbel process
using A* search. We analyze the correctness and convergence time of A* sampling
and demonstrate empirically that it makes more efficient use of bound and
likelihood evaluations than the most closely related adaptive rejection
sampling-based algorithms.
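The discrete case mentioned in the opening sentence is the classical Gumbel-max trick: adding i.i.d. Gumbel(0, 1) noise to the log-probabilities and taking the argmax yields an exact sample. A minimal sketch of that trick (illustrative only; not the paper's A* sampling algorithm for continuous spaces):

```python
import numpy as np

rng = np.random.default_rng(0)

def gumbel_max_sample(log_probs, rng):
    """Sample index i with probability proportional to exp(log_probs[i])
    by maximizing log_probs + independent Gumbel(0, 1) noise."""
    g = rng.gumbel(size=len(log_probs))
    return int(np.argmax(log_probs + g))

# Empirical check against the target distribution.
p = np.array([0.2, 0.5, 0.3])
samples = [gumbel_max_sample(np.log(p), rng) for _ in range(20_000)]
freq = np.bincount(samples, minlength=3) / len(samples)
print(freq)  # close to [0.2, 0.5, 0.3]
```

A* sampling generalizes this idea: instead of enumerating a finite argmax, it searches for the maximum of a Gumbel process over a continuous space using bound and likelihood evaluations.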
Probabilistic Invariant Learning with Randomized Linear Classifiers
Designing models that are both expressive and preserve the known invariances of
a task is an increasingly hard problem. Existing solutions trade off invariance
for computational or memory resources. In this work, we show how to leverage
randomness to design models that are both expressive and invariant while using
fewer resources. Inspired by randomized algorithms, our key insight is that
accepting probabilistic notions of universal approximation and invariance can
reduce resource requirements. More specifically, we propose a class of
binary classification models called Randomized Linear Classifiers (RLCs). We
give parameter and sample size conditions in which RLCs can, with high
probability, approximate any (smooth) function while preserving invariance to
compact group transformations. Leveraging this result, we design three RLCs
that are provably probabilistically invariant for classification tasks over
sets, graphs, and spherical data. We show how these models achieve
probabilistic invariance and universality with fewer resources than
(deterministic) neural networks and their invariant counterparts. Finally, we
empirically demonstrate
the benefits of this new class of models on invariant tasks where deterministic
invariant neural networks are known to struggle.
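As a hypothetical illustration of the probabilistic invariance the abstract describes (not the paper's actual construction): if a randomized linear classifier draws a fresh weight vector with i.i.d. coordinates for every prediction, the *distribution* of its outputs is unchanged when a set-valued input is permuted, even though any single prediction is not.

```python
import numpy as np

rng = np.random.default_rng(0)

def rlc_predict(x, n_samples, rng):
    # Draw one random linear classifier per row; i.i.d. (hence exchangeable)
    # weight coordinates make the prediction distribution permutation-invariant.
    W = rng.normal(size=(n_samples, x.shape[0]))
    return np.sign(W @ x)

x = np.array([3.0, -1.0, 0.5, 2.0])
x_perm = x[[2, 0, 3, 1]]  # same multiset of values, different order

mean_orig = rlc_predict(x, 50_000, rng).mean()
mean_perm = rlc_predict(x_perm, 50_000, rng).mean()
print(abs(mean_orig - mean_perm))  # small: matches up to sampling noise
```

This captures the resource trade-off in miniature: no weight sharing or symmetrization over the group is needed, only an exchangeable weight distribution.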
Contrastive Learning Can Find An Optimal Basis For Approximately View-Invariant Functions
Contrastive learning is a powerful framework for learning self-supervised
representations that generalize well to downstream supervised tasks. We show
that multiple existing contrastive learning methods can be reinterpreted as
learning kernel functions that approximate a fixed positive-pair kernel. We
then prove that a simple representation obtained by combining this kernel with
PCA provably minimizes the worst-case approximation error of linear predictors,
under a straightforward assumption that positive pairs have similar labels. Our
analysis is based on a decomposition of the target function in terms of the
eigenfunctions of a positive-pair Markov chain, and a surprising equivalence
between these eigenfunctions and the output of Kernel PCA. We give
generalization bounds for downstream linear prediction using our Kernel PCA
representation, and show empirically on a set of synthetic tasks that applying
Kernel PCA to contrastive learning models can indeed approximately recover the
Markov chain eigenfunctions, although the accuracy depends on the kernel
parameterization as well as on the augmentation strength.
Comment: Published at ICLR 202
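The kernel-plus-PCA representation can be sketched with standard Kernel PCA applied to an inner-product kernel of learned embeddings. Here the embeddings are random stand-ins for a trained contrastive encoder's outputs, so only the mechanics of the construction are shown, not the learned positive-pair kernel itself:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in embeddings f(x) for n points (hypothetical; a real pipeline would
# use a trained contrastive encoder). Kernel: k(x_i, x_j) = <f(x_i), f(x_j)>.
n, d = 100, 16
F = rng.normal(size=(n, d))
K = F @ F.T

# Kernel PCA: double-center the kernel matrix, then eigendecompose.
H = np.eye(n) - np.ones((n, n)) / n
Kc = H @ K @ H
eigvals, eigvecs = np.linalg.eigh(Kc)   # ascending order
order = np.argsort(eigvals)[::-1]
top = 8

# Representation: top eigenvectors scaled by the square roots of their
# eigenvalues, so inner products of rows approximate the centered kernel.
lam = np.maximum(eigvals[order[:top]], 0.0)
Z = eigvecs[:, order[:top]] * np.sqrt(lam)
print(Z.shape)  # (100, 8)
```

In the paper's analysis, the corresponding eigenfunctions are those of a positive-pair Markov chain, and this spectral representation is what minimizes the worst-case linear-prediction error.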