Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains
Data-dependent regularization is known to benefit a wide variety of problems
in machine learning. Often, these regularizers cannot be easily decomposed into
a sum over a finite number of terms, e.g., a sum over individual example-wise
terms. Measures such as the area under the ROC curve (AUCROC) and precision
at a fixed recall (P@R) are prominent examples used in many
applications. We find that for most medium to large sized datasets, scalability
issues severely limit our ability to leverage the benefits of such
regularizers. Importantly, the key technical impediment, despite some recent
progress, is that such objectives remain difficult to optimize via
backpropagation procedures. While an efficient general-purpose strategy for
this problem still remains elusive, in this paper, we show that for many
data-dependent nondecomposable regularizers that are relevant in applications,
sizable gains in efficiency are possible with minimal code-level changes; in
other words, no specialized tools or numerical schemes are needed. Our
procedure involves a reparameterization followed by a partial dualization --
this leads to a formulation that has provably cheap projection operators. We
present a detailed analysis of runtime and convergence properties of our
algorithm. On the experimental side, we show that a direct use of our scheme
significantly improves the state-of-the-art IOU measures reported for the
MSCOCO Stuff segmentation dataset.
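The reparameterization-plus-partial-dualization recipe can be illustrated with a generic primal-dual sketch. The toy objective and all names below are illustrative assumptions, not the paper's actual formulation: the primal parameters take plain gradient steps on the Lagrangian, while the dual variable takes an ascent step followed by a provably cheap projection (here, clipping onto the nonnegative orthant, an O(1) operation).

```python
import numpy as np

# Toy constrained problem (illustrative only):
#   minimize f(theta) = ||theta||^2  subject to  g(theta) = 1 - theta[0] <= 0,
# handled via the Lagrangian L(theta, lam) = f(theta) + lam * g(theta).

def f_grad(theta):
    return 2.0 * theta

def g(theta):
    return 1.0 - theta[0]

def g_grad(theta):
    grad = np.zeros_like(theta)
    grad[0] = -1.0
    return grad

theta = np.zeros(3)
lam = 0.0
lr = 0.1

for _ in range(2000):
    # Primal descent step on the Lagrangian.
    theta -= lr * (f_grad(theta) + lam * g_grad(theta))
    # Dual ascent step, then a cheap projection onto lam >= 0.
    lam = max(0.0, lam + lr * g(theta))

# The iterates are driven toward theta = (1, 0, 0), where the
# constraint is active, with multiplier lam = 2 (the KKT point).
```

The point of the sketch is the structure, not the toy objective: after dualization, the only constrained variable is the multiplier, and its projection is trivial, which is what makes each iteration cheap.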
Training Gaussian Boson Sampling Distributions
Gaussian Boson Sampling (GBS) is a near-term platform for photonic quantum
computing. Applications have been developed which rely on directly programming
GBS devices, but the ability to train and optimize circuits has been a key
missing ingredient for developing new algorithms. In this work, we derive
analytical gradient formulas for the GBS distribution, which can be used to
train devices using standard methods based on gradient descent. We introduce a
parametrization of the distribution that allows the gradient to be estimated by
sampling from the same device that is being optimized. In the case of training
using a Kullback-Leibler divergence or log-likelihood cost function, we show
that gradients can be computed classically, leading to fast training. We
illustrate these results with numerical experiments in stochastic optimization
and unsupervised learning. As a particular example, we introduce the
variational Ising solver, a hybrid algorithm for training GBS devices to sample
ground states of a classical Ising model with high probability.

Comment: 15 pages, 3 figures
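As a classical analogue of the KL-divergence training described in the second abstract (the softmax model below is an illustrative stand-in, not the GBS distribution), minimizing KL(data || model) is equivalent to maximizing log-likelihood, and for an exponential-family parametrization the gradient has a simple closed form that needs no sampling from the device:

```python
import numpy as np

# Target (data) distribution over 4 outcomes, and a softmax-parametrized model.
p_data = np.array([0.1, 0.2, 0.3, 0.4])
theta = np.zeros(4)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

lr = 0.5
for _ in range(500):
    p_model = softmax(theta)
    # For a softmax model, the gradient of KL(p_data || p_theta) w.r.t. theta
    # is simply p_model - p_data (the classic log-likelihood gradient).
    theta -= lr * (p_model - p_data)

p_model = softmax(theta)  # converges to p_data
```

This mirrors the abstract's observation that, for KL or log-likelihood cost functions, the gradient can be computed classically, which is what makes training fast in that setting.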