Random Projection in Deep Neural Networks
This work investigates the ways in which deep learning methods can benefit
from random projection (RP), a classic linear dimensionality reduction method.
We focus on two areas where, as we have found, employing RP techniques can
improve deep models: training neural networks on high-dimensional data and
initialization of network parameters. Training deep neural networks (DNNs) on
sparse, high-dimensional data with no exploitable structure implies a network
architecture with an input layer that has a huge number of weights, which often
makes training infeasible. We show that this problem can be solved by
prepending the network with an input layer whose weights are initialized with
an RP matrix. We propose several modifications to the network architecture and
training regime that make it possible to efficiently train DNNs with a learnable
RP layer on data with as many as tens of millions of input features and
training examples. In comparison to the state-of-the-art methods, neural
networks with an RP layer achieve competitive performance or improve the results
on several extremely high-dimensional real-world datasets. The second area
where the application of RP techniques can be beneficial for training deep
models is weight initialization. Setting the initial weights in DNNs to
elements of various RP matrices enabled us to train deep residual networks to
higher levels of performance.
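
As a concrete illustration of the first idea, below is a minimal sketch of a
learnable RP input layer prepended to a small DNN, assuming PyTorch; the
Gaussian RP scheme, layer sizes, and the dense toy batch are illustrative
assumptions rather than the paper's exact configuration.

    import torch
    import torch.nn as nn

    class RPNet(nn.Module):
        """DNN whose first (trainable) layer starts as a Gaussian RP matrix."""
        def __init__(self, n_features=50_000, rp_dim=128, n_hidden=64, n_classes=2):
            super().__init__()
            # Input layer initialized with a random projection matrix
            # (entries ~ N(0, 1/rp_dim)); it remains learnable during training.
            self.rp_layer = nn.Linear(n_features, rp_dim, bias=False)
            with torch.no_grad():
                self.rp_layer.weight.normal_(0.0, rp_dim ** -0.5)
            self.backbone = nn.Sequential(
                nn.Linear(rp_dim, n_hidden), nn.ReLU(),
                nn.Linear(n_hidden, n_classes),
            )

        def forward(self, x):
            return self.backbone(self.rp_layer(x))

    model = RPNet()
    logits = model(torch.randn(4, 50_000))  # dense toy batch; real inputs would be sparse
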
Asynchronous Optimization Methods for Efficient Training of Deep Neural Networks with Guarantees
Asynchronous distributed algorithms are a popular way to reduce
synchronization costs in large-scale optimization, and in particular for neural
network training. However, for nonsmooth and nonconvex objectives, few
convergence guarantees exist beyond cases where closed-form proximal operator
solutions are available. As most popular contemporary deep neural networks lead
to nonsmooth and nonconvex objectives, there is now a pressing need for such
convergence guarantees. In this paper, we analyze for the first time the
convergence of stochastic asynchronous optimization for this general class of
objectives. In particular, we focus on stochastic subgradient methods allowing
for block variable partitioning, where the shared-memory-based model is
asynchronously updated by concurrent processes. To this end, we first introduce
a probabilistic model which captures key features of real asynchronous
scheduling between concurrent processes; under this model, we establish
convergence with probability one to an invariant set for stochastic subgradient
methods with momentum.
From the practical perspective, one issue with the family of methods we
consider is that it is not efficiently supported by machine learning
frameworks, as they mostly focus on distributed data-parallel strategies. To
address this, we propose a new implementation strategy for shared-memory based
training of deep neural networks, whereby concurrent parameter servers are
utilized to train a partitioned but shared model in single- and multi-GPU
settings. Based on this implementation, we achieve an average 1.2x speed-up in
comparison to state-of-the-art training methods for popular image
classification tasks without compromising accuracy.
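
To illustrate the kind of method analyzed above, here is a minimal sketch of
asynchronous, block-partitioned stochastic subgradient descent with momentum on
a shared-memory model, assuming NumPy and Python threads; the
least-absolute-deviations toy objective, block sizes, and step sizes are
illustrative assumptions, and the thread workers merely stand in for the
paper's concurrent parameter servers.

    import threading
    import numpy as np

    A = np.random.default_rng(0).normal(size=(512, 64))   # toy data matrix
    b = np.random.default_rng(1).normal(size=512)          # toy targets
    w = np.zeros(64)                                        # shared model, read and written concurrently
    blocks = np.array_split(np.arange(64), 4)               # block variable partitioning

    def worker(block, seed, steps=500, lr=1e-2, beta=0.9):
        rng = np.random.default_rng(seed)                   # per-thread RNG
        v = np.zeros(block.size)                            # per-block momentum buffer
        for _ in range(steps):
            i = rng.integers(len(b))                        # sample one training example
            r = A[i] @ w - b[i]                             # residual uses a possibly stale read of w
            g = np.sign(r) * A[i, block]                    # subgradient of |A_i w - b_i| w.r.t. this block
            v = beta * v + g
            w[block] -= lr * v                              # asynchronous in-place update of the shared block

    threads = [threading.Thread(target=worker, args=(blk, k))
               for k, blk in enumerate(blocks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("mean absolute residual:", np.abs(A @ w - b).mean())

Each worker owns one block of coordinates but reads the whole shared vector,
which is exactly the stale-read, concurrent-write setting the convergence
analysis above is meant to cover.
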
Robust feature space separation for deep convolutional neural network training
This paper introduces two deep convolutional neural network training techniques
that lead to more robust feature subspace separation in comparison to
traditional training. Assume that the dataset has M labels. The first method
creates M deep convolutional neural networks, {DCNN_i}_{i=1}^{M}. Each network
DCNN_i is composed of a convolutional neural network (CNN_i) and a fully
connected neural network (FCNN_i). During training, a set of projection
matrices is created and adaptively updated as representations of the feature
subspaces {S_i}_{i=1}^{M}. A rejection value is computed for each training
sample based on its projections onto the feature subspaces. Each FCNN_i acts as
a binary classifier with a cost function whose main parameter is the rejection
value. A threshold value t_i is determined for the i-th network DCNN_i. A
testing strategy utilizing {t_i}_{i=1}^{M} is also introduced. The second
method creates a single DCNN and computes a cost function whose parameters
depend on subspace separations, using the geodesic distance on the Grassmannian
manifold between the subspace S_i and the sum of all remaining subspaces
{S_j}_{j=1, j≠i}^{M}. The proposed methods are tested using multiple network
topologies. It is shown that while the first method works better for smaller
networks, the second method performs better for complex architectures.
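
As an illustration of the two quantities this abstract builds on, below is a
minimal NumPy sketch of (a) a rejection value obtained by projecting a feature
vector onto a class subspace and (b) the geodesic distance between subspaces on
the Grassmannian via principal angles; dimensions and random bases are
illustrative assumptions, not the paper's learned subspaces.

    import numpy as np

    rng = np.random.default_rng(0)

    def orthonormal_basis(d, k):
        """Random k-dimensional subspace of R^d, returned as an orthonormal basis."""
        q, _ = np.linalg.qr(rng.normal(size=(d, k)))
        return q

    def rejection_value(x, basis):
        """Norm of the component of x not explained by the subspace."""
        proj = basis @ (basis.T @ x)
        return np.linalg.norm(x - proj)

    def grassmann_geodesic(basis_a, basis_b):
        """Geodesic distance sqrt(sum theta_i^2) computed from the principal angles."""
        s = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
        thetas = np.arccos(np.clip(s, -1.0, 1.0))
        return np.linalg.norm(thetas)

    S1, S2 = orthonormal_basis(128, 10), orthonormal_basis(128, 10)
    x = rng.normal(size=128)
    print(rejection_value(x, S1), grassmann_geodesic(S1, S2))
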
An Evaluation of Deep CNN Baselines for Scene-Independent Person Re-Identification
In recent years, a variety of proposed methods based on deep convolutional
neural networks (CNNs) have improved the state of the art for large-scale
person re-identification (ReID). While a large number of optimizations and
network improvements have been proposed, there has been relatively little
evaluation of the influence of training data and baseline network architecture.
In particular, it is usually assumed either that networks are trained on
labeled data from the deployment location (scene-dependent), or else adapted
with unlabeled data, both of which complicate system deployment. In this paper,
we investigate the feasibility of achieving scene-independent person ReID by
forming a large composite dataset for training. We present an in-depth
comparison of several CNN baseline architectures for both scene-dependent and
scene-independent ReID, across a range of training dataset sizes. We show that
scene-independent ReID can produce leading-edge results, competitive with
unsupervised domain adaptation techniques. Finally, we introduce a new dataset
for comparing within-camera and across-camera person ReID.
Comment: To be published in 2018 15th Conference on Computer and Robot Vision
(CRV).