Priming Neural Networks
Visual priming is known to affect the human visual system, enabling detection
of scene elements, even those that may have been nearly unnoticeable before,
such as the presence of camouflaged animals. This process has been shown to be
an effect of top-down signaling in the visual system triggered by the cue. In
this paper, we propose a mechanism to mimic the process of priming in the
context of object detection and segmentation. We view priming as having a
modulatory, cue dependent effect on layers of features within a network. Our
results show how such a process can be complementary to, and at times more
effective than, simple post-processing applied to the output of the network,
notably in cases where the object is hard to detect, such as under severe noise.
Moreover, we find the effects of priming are sometimes stronger when early
visual layers are affected. Overall, our experiments confirm that top-down
signals can go a long way in improving object detection and segmentation.
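The "modulatory, cue-dependent effect on layers of features" can be sketched as a per-channel gain on a layer's activations, derived from an embedding of the priming cue. The function name, shapes, and the tanh gain here are illustrative assumptions, not the paper's exact architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def prime_features(features, cue, W):
    """Modulate a layer's feature maps with a cue-dependent gain.

    features: (C, H, W) activations of some network layer
    cue:      (D,) embedding of the priming cue (e.g. a target class)
    W:        (C, D) learned projection from cue to per-channel gains
    """
    gains = 1.0 + np.tanh(W @ cue)          # per-channel multiplicative gain
    return features * gains[:, None, None]  # broadcast over spatial dims

# toy example: 4 channels, 5x5 feature maps, 3-dim cue embedding
feats = rng.standard_normal((4, 5, 5))
cue = np.array([1.0, 0.0, 0.0])
W = rng.standard_normal((4, 3))
primed = prime_features(feats, cue, W)
```

A zero cue yields unit gains and leaves the features untouched, which matches the idea of priming as an optional top-down modulation rather than a replacement of the feed-forward signal.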
Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
Asynchronous distributed algorithms are a popular way to reduce
synchronization costs in large-scale optimization, and in particular for neural
network training. However, for nonsmooth and nonconvex objectives, few
convergence guarantees exist beyond cases where closed-form proximal operator
solutions are available. As most popular contemporary deep neural networks lead
to nonsmooth and nonconvex objectives, there is now a pressing need for such
convergence guarantees. In this paper, we analyze for the first time the
convergence of stochastic asynchronous optimization for this general class of
objectives. In particular, we focus on stochastic subgradient methods allowing
for block variable partitioning, where the shared-memory-based model is
asynchronously updated by concurrent processes. To this end, we first introduce
a probabilistic model which captures key features of real asynchronous
scheduling between concurrent processes; under this model, we establish
convergence with probability one to an invariant set for stochastic subgradient
methods with momentum.
From the practical perspective, one issue with the family of methods we
consider is that it is not efficiently supported by machine learning
frameworks, as they mostly focus on distributed data-parallel strategies. To
address this, we propose a new implementation strategy for shared-memory based
training of deep neural networks, whereby concurrent parameter servers are
utilized to train a partitioned but shared model in single- and multi-GPU
settings. Based on this implementation, we achieve an average 1.2x speed-up
over state-of-the-art training methods on popular image classification tasks
without compromising accuracy.
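The block-partitioned, shared-memory scheme described above can be illustrated with plain Python threads asynchronously updating disjoint blocks of a shared parameter vector. This is a toy sketch, not the paper's implementation: the L1 objective, step sizes, and lock-free layout are illustrative assumptions, and the deliberately unsynchronized reads mimic the inconsistent state that the probabilistic scheduling model is meant to capture:

```python
import threading
import numpy as np

# Shared model parameters, partitioned into blocks; each worker
# asynchronously updates only its own block. Reads of the full vector
# are intentionally unsynchronized (possibly stale).
dim, n_blocks, steps = 8, 2, 200
x = np.zeros(dim)                     # shared parameters
v = np.zeros(dim)                     # shared momentum buffer
blocks = np.array_split(np.arange(dim), n_blocks)
lr, beta = 0.02, 0.5

def subgrad(z):
    # subgradient of the nonsmooth objective f(z) = ||z - 1||_1
    return np.sign(z - 1.0)

def worker(block):
    for _ in range(steps):
        g = subgrad(x)[block]         # possibly stale read of shared state
        v[block] = beta * v[block] + g
        x[block] -= lr * v[block]     # in-place write to this worker's block

threads = [threading.Thread(target=worker, args=(b,)) for b in blocks]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Because each block is owned by exactly one worker, writes never collide; only the reads across blocks are inconsistent, which is the regime the convergence analysis addresses.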
Gradient Energy Matching for Distributed Asynchronous Gradient Descent
Distributed asynchronous SGD has become widely used for deep learning in
large-scale systems, but remains notorious for its instability when increasing
the number of workers. In this work, we study the dynamics of distributed
asynchronous SGD under the lens of Lagrangian mechanics. Using this
description, we introduce the concept of energy to describe the optimization
process and derive a sufficient condition ensuring its stability as long as the
collective energy induced by the active workers remains below the energy of a
target synchronous process. Making use of this criterion, we derive a stable
distributed asynchronous optimization procedure, GEM, that estimates and
maintains the energy of the asynchronous system below or equal to the energy of
sequential SGD with momentum. Experimental results highlight the stability and
speedup of GEM compared to existing schemes, even when scaling to one hundred
asynchronous workers. Results also indicate better generalization than the
target SGD-with-momentum process.
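The energy criterion can be sketched as follows: treat the squared norm of an update as kinetic energy and shrink the workers' concurrent updates whenever their collective energy would exceed that of the target momentum process. The function name and this simple global rescaling are illustrative assumptions, not GEM's exact estimator:

```python
import numpy as np

def gem_rescale(updates, target_velocity):
    """Rescale concurrent workers' updates so their collective 'kinetic
    energy' does not exceed that of a target momentum-SGD process.

    updates: list of per-worker update vectors proposed at this instant
    target_velocity: velocity vector of the reference momentum-SGD run
    """
    target_energy = 0.5 * np.dot(target_velocity, target_velocity)
    collective_energy = 0.5 * sum(np.dot(u, u) for u in updates)
    if collective_energy <= target_energy:
        return updates                      # already within the budget
    scale = np.sqrt(target_energy / collective_energy)
    return [scale * u for u in updates]     # energies scale by scale**2
```

Since energy is quadratic in the update, a single multiplicative factor of sqrt(target/collective) brings the collective energy exactly down to the target, which is the stability condition the abstract states.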