Mind the Gap: Subspace based Hierarchical Domain Adaptation
Domain adaptation techniques aim at adapting a classifier learnt on a source
domain to work on the target domain. Exploiting the subspaces spanned by
features of the source and target domains respectively is one approach that has
been investigated towards solving this problem. These techniques normally
assume the existence of a single subspace for the entire source / target
domain. In this work, we consider the hierarchical organization of the data and
consider multiple subspaces for the source and target domain based on the
hierarchy. We evaluate different subspace based domain adaptation techniques
under this setting and observe that using different subspaces based on the
hierarchy yields consistent improvement over a non-hierarchical baseline.
Comment: 4 pages in Second Workshop on Transfer and Multi-Task Learning:
Theory meets Practice in NIPS 201
Subspace Alignment Based Domain Adaptation for RCNN Detector
In this paper, we propose subspace alignment based domain adaptation of the
state of the art RCNN based object detector. The aim is to be able to achieve
high quality object detection in novel, real world target scenarios without
requiring labels from the target domain. While unsupervised domain adaptation
has been studied in the case of object classification, for object detection it
has been relatively unexplored. In subspace based domain adaptation for
objects, we need access to source and target subspaces for the bounding box
features. The absence of supervision (labels and bounding boxes are absent)
makes the task challenging. In this paper, we show that we can still adapt
subspaces that are localized to the object by obtaining detections from the RCNN
detector trained on source and applied on target. Then we form localized
subspaces from the detections and show that subspace alignment based adaptation
between these subspaces yields improved object detection. This evaluation is
done by considering challenging real world datasets of PASCAL VOC as source and
validation set of Microsoft COCO dataset as target for various categories.
Comment: 26th British Machine Vision Conference, Swansea, U
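The alignment step mentioned in this abstract can be illustrated with a toy sketch. It is not the authors' detector pipeline: it assumes 2-D points standing in for bounding-box features, one-dimensional source and target subspaces, and the standard subspace alignment map M = X_S^T X_T. The function names and data are hypothetical.

```python
import math


def principal_direction(points):
    """Unit vector along the leading principal axis of a set of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    cxx = sum((p[0] - mean_x) ** 2 for p in points) / n
    cyy = sum((p[1] - mean_y) ** 2 for p in points) / n
    cxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / n
    # Closed-form orientation of the major axis of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    return (math.cos(angle), math.sin(angle))


def align_source_basis(source_basis, target_basis):
    """Return X_S * M with M = X_S^T X_T (a scalar for 1-D subspaces)."""
    m = source_basis[0] * target_basis[0] + source_basis[1] * target_basis[1]
    return (source_basis[0] * m, source_basis[1] * m)


# Hypothetical toy features: the source varies along the diagonal,
# the target along the first axis.
source = [(-2.0, -2.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
target = [(-2.0, 0.0), (-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

source_basis = principal_direction(source)
target_basis = principal_direction(target)
aligned_basis = align_source_basis(source_basis, target_basis)

# Source features are projected on the aligned basis, target features on
# the target basis, so both live in comparable coordinates.
source_coords = [p[0] * aligned_basis[0] + p[1] * aligned_basis[1] for p in source]
target_coords = [p[0] * target_basis[0] + p[1] * target_basis[1] for p in target]
```

In a realistic setting the bases would have many columns and M would be a matrix; the scalar case only keeps the example short.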
Screening Rules for Convex Problems
We propose a new framework for deriving screening rules for convex
optimization problems. Our approach covers a large class of constrained and
penalized optimization formulations, and works in two steps. First, given any
approximate point, the structure of the objective function and the duality gap
is used to gather information on the optimal solution. In the second step, this
information is used to produce screening rules, i.e. safely identifying
unimportant weight variables of the optimal solution. Our general framework
leads to a large variety of useful existing as well as new screening rules for
many applications. For example, we provide new screening rules for general
simplex and $L_1$-constrained problems, Elastic Net, squared-loss Support
Vector Machines, minimum enclosing ball, as well as structured norm regularized
problems, such as group lasso.
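The two-step idea (use an approximate point and its duality gap to localize the optimum, then discard variables that are provably zero) can be sketched for the Lasso. The sketch below uses the well-known gap-safe sphere test rather than the rules derived in this paper, and the function name and toy data are hypothetical.

```python
import math


def gap_safe_screen(X, y, w, lam):
    """Indices of features that the gap-safe sphere test proves are zero
    in the Lasso solution of 0.5*||y - Xw||^2 + lam*||w||_1."""
    n, d = len(X), len(w)
    residual = [y[i] - sum(X[i][j] * w[j] for j in range(d)) for i in range(n)]
    corr = [sum(X[i][j] * residual[i] for i in range(n)) for j in range(d)]
    # Rescale the residual so that theta is dual feasible: ||X^T theta||_inf <= 1.
    scale = max(lam, max(abs(c) for c in corr))
    theta = [r / scale for r in residual]
    primal = 0.5 * sum(r * r for r in residual) + lam * sum(abs(v) for v in w)
    dual = 0.5 * sum(v * v for v in y) - 0.5 * lam ** 2 * sum(
        (theta[i] - y[i] / lam) ** 2 for i in range(n)
    )
    gap = max(primal - dual, 0.0)
    # The dual optimum lies within this distance of theta.
    radius = math.sqrt(2.0 * gap) / lam
    screened = []
    for j in range(d):
        col_norm = math.sqrt(sum(X[i][j] ** 2 for i in range(n)))
        score = abs(sum(X[i][j] * theta[i] for i in range(n)))
        if score + radius * col_norm < 1.0:
            screened.append(j)
    return screened


# Hypothetical toy problem with an approximate (not optimal) point w.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1]
screened = gap_safe_screen(X, y, w=[0.95, 0.0], lam=1.0)
```

The closer the approximate point is to the optimum, the smaller the gap and the radius, and the more variables the test can discard.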
Stochastic Stein Discrepancies
Stein discrepancies (SDs) monitor convergence and non-convergence in
approximate inference when exact integration and sampling are intractable.
However, the computation of a Stein discrepancy can be prohibitive if the Stein
operator - often a sum over likelihood terms or potentials - is expensive to
evaluate. To address this deficiency, we show that stochastic Stein
discrepancies (SSDs) based on subsampled approximations of the Stein operator
inherit the convergence control properties of standard SDs with probability 1.
In our experiments with biased Markov chain Monte Carlo (MCMC) hyperparameter
tuning, approximate MCMC sampler selection, and stochastic Stein variational
gradient descent, SSDs deliver comparable inferences to standard SDs with
orders of magnitude fewer likelihood evaluations.
Utilising the CLT Structure in Stochastic Gradient based Sampling: Improved Analysis and Faster Algorithms
We consider stochastic approximations of sampling algorithms, such as
Stochastic Gradient Langevin Dynamics (SGLD) and the Random Batch Method (RBM)
for Interacting Particle Dynamics (IPD). We observe that the noise introduced by
the stochastic approximation is nearly Gaussian due to the Central Limit
Theorem (CLT) while the driving Brownian motion is exactly Gaussian. We harness
this structure to absorb the stochastic approximation error inside the
diffusion process, and obtain improved convergence guarantees for these
algorithms. For SGLD, we prove the first stable convergence rate in KL
divergence without requiring uniform warm start, assuming the target density
satisfies a Log-Sobolev Inequality. Our result implies superior first-order
oracle complexity compared to prior works, under significantly milder
assumptions. We also prove the first guarantees for SGLD under even weaker
conditions such as Hölder smoothness and Poincaré Inequality, thus bridging
the gap between the state-of-the-art guarantees for LMC and SGLD. Our analysis
motivates a new algorithm called covariance correction, which corrects for the
additional noise introduced by the stochastic approximation by rescaling the
strength of the diffusion. Finally, we apply our techniques to analyze RBM, and
significantly improve upon the guarantees in prior works (such as removing
exponential dependence on horizon), under minimal assumptions.
Comment: Version 2 considers more results, including those for stochastic
gradient Langevin dynamics and the random batch method for interacting
particle dynamics, along with the results in the previous version. This also
contains 2 additional author
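For readers unfamiliar with SGLD, here is a minimal sketch of the plain algorithm the abstract analyzes. It is not the covariance correction variant proposed in the paper. It assumes a hypothetical one-dimensional target, the posterior of a Gaussian mean under a flat prior, whose potential gradient is a sum over data points and is estimated from a minibatch.

```python
import math
import random


def sgld_gaussian_mean(data, batch_size, step, n_steps, rng):
    """Plain SGLD for the potential U(theta) = sum_i (theta - x_i)^2 / 2."""
    n = len(data)
    theta = 0.0
    samples = []
    for _ in range(n_steps):
        batch = rng.sample(data, batch_size)
        # Unbiased minibatch estimate of the full gradient sum_i (theta - x_i).
        grad = (n / batch_size) * sum(theta - x for x in batch)
        # Langevin update: gradient step plus exactly Gaussian noise.
        theta = theta - step * grad + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
        samples.append(theta)
    return samples


rng = random.Random(0)
data = [rng.gauss(1.0, 1.0) for _ in range(100)]  # hypothetical observations
samples = sgld_gaussian_mean(data, batch_size=10, step=0.001, n_steps=5000, rng=rng)
kept = samples[500:]  # drop burn-in
posterior_mean_estimate = sum(kept) / len(kept)
data_mean = sum(data) / len(data)
```

The minibatch gradient adds noise on top of the injected Gaussian term, so the samples are more spread out than the exact posterior; this extra noise is the quantity the abstract proposes to absorb into the diffusion.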
Uniform-in-Time Wasserstein Stability Bounds for (Noisy) Stochastic Gradient Descent
Algorithmic stability is an important notion that has proven powerful for
deriving generalization bounds for practical algorithms. The last decade has
witnessed an increasing number of stability bounds for different algorithms
applied on different classes of loss functions. While these bounds have
illuminated various properties of optimization algorithms, the analysis of each
case typically required a different proof technique with significantly
different mathematical tools. In this study, we make a novel connection between
learning theory and applied probability and introduce a unified guideline for
proving Wasserstein stability bounds for stochastic optimization algorithms. We
illustrate our approach on stochastic gradient descent (SGD) and we obtain
time-uniform stability bounds (i.e., the bound does not increase with the
number of iterations) for strongly convex losses and non-convex losses with
additive noise, where we recover similar results to the prior art or extend
them to more general cases by using a single proof technique. Our approach is
flexible and can be generalized to other popular optimizers, as it mainly
requires developing Lyapunov functions, which are often readily available in
the literature. It also illustrates that ergodicity is an important component
for obtaining time-uniform bounds -- which might not be achieved for convex or
non-convex losses unless additional noise is injected to the iterates. Finally,
we slightly stretch our analysis technique and prove time-uniform bounds for
SGD under convex and non-convex losses (without additional additive noise),
which, to our knowledge, is novel.
Comment: 49 pages, NeurIPS 202
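A time-uniform stability bound can be illustrated with a small coupling experiment. This is only a sketch and not the paper's proof technique: it assumes the strongly convex loss (theta - x)^2 / 2, two hypothetical datasets that differ in a single point, and SGD runs that share the same sampled indices. The distance between the coupled iterates upper-bounds the Wasserstein distance between their laws.

```python
import random


def coupled_sgd_gap(data, neighbor, step, n_steps, seed):
    """Run SGD on two neighboring datasets with shared index sampling and
    record the distance between the two iterates after every step."""
    rng = random.Random(seed)
    theta_a = 0.0
    theta_b = 0.0
    gaps = []
    for _ in range(n_steps):
        i = rng.randrange(len(data))
        # Gradient of the loss (theta - x_i)^2 / 2 is (theta - x_i).
        theta_a -= step * (theta_a - data[i])
        theta_b -= step * (theta_b - neighbor[i])
        gaps.append(abs(theta_a - theta_b))
    return gaps


data = [0.0, 1.0, 2.0, 3.0]
neighbor = [5.0, 1.0, 2.0, 3.0]  # differs from data in the first point only
gaps = coupled_sgd_gap(data, neighbor, step=0.1, n_steps=2000, seed=0)
early_max = max(gaps[:200])
late_max = max(gaps[-200:])
```

For this loss each step contracts the gap by a factor (1 - step) and adds at most step times the perturbation size, so the gap never exceeds the size of the perturbation (5.0 here) regardless of the number of iterations.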