Contrastive Learning for Lifted Networks
In this work we address supervised learning of neural networks via lifted
network formulations. Lifted networks are interesting because they allow
training on massively parallel hardware and assign energy models to
discriminatively trained neural networks. We demonstrate that the training
methods for lifted networks proposed in the literature have significant
limitations and show how to use a contrastive loss to address those
limitations. We demonstrate that this contrastive training approximates
back-propagation in theory and in practice and that it is superior to the
training objective commonly used for lifted networks. Comment: 9 pages, BMVC 2019
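As a rough illustration of the idea, the sketch below sets up a toy two-layer lifted model
in which the hidden activations are free variables of a quadratic energy; the energy is
relaxed once with the output free and once with the output clamped to the label, and the
weights descend on the difference of the two energies. The layer sizes, the quadratic
penalties, and the inner relaxation loop are illustrative assumptions, not the paper's
exact formulation.

```python
import torch

torch.manual_seed(0)
d_in, d_hid, d_out = 4, 8, 3
W1 = torch.randn(d_hid, d_in, requires_grad=True)
W2 = torch.randn(d_out, d_hid, requires_grad=True)

def energy(z, y_hat, x):
    # Quadratic lifted energy: layer-wise reconstruction penalties.
    return ((z - torch.relu(W1 @ x)) ** 2).sum() + ((y_hat - W2 @ z) ** 2).sum()

def relax(x, y=None, steps=50, lr=0.1):
    # Inner loop: minimize the energy over the auxiliary activation variables,
    # optionally with the output clamped to the label y.
    z = torch.relu(W1 @ x).detach().requires_grad_(True)
    y_hat = y if y is not None else (W2 @ z).detach().requires_grad_(True)
    free_vars = [z] if y is not None else [z, y_hat]
    for _ in range(steps):
        grads = torch.autograd.grad(energy(z, y_hat, x), free_vars)
        with torch.no_grad():
            for v, g in zip(free_vars, grads):
                v -= lr * g
    return z.detach(), (y if y is not None else y_hat.detach())

x, y = torch.randn(d_in), torch.randn(d_out)
opt = torch.optim.SGD([W1, W2], lr=1e-2)
for _ in range(100):
    z_free, y_free = relax(x)      # free phase: output relaxed
    z_clamp, _ = relax(x, y=y)     # clamped phase: output fixed to the label
    # Contrastive loss: push the clamped energy below the free energy.
    loss = energy(z_clamp, y, x) - energy(z_free, y_free, x)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because the weight gradient is taken with the relaxed activations held fixed, the
clamped-minus-free update plays a role analogous to back-propagated errors in a standard
network, which is one way to read the claim that the contrastive objective approximates
back-propagation.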
0/1 Deep Neural Networks via Block Coordinate Descent
The step function is one of the simplest and most natural activation
functions for deep neural networks (DNNs). Since it outputs 1 for positive
inputs and 0 otherwise, its intrinsic characteristics (e.g., discontinuity and
the lack of viable subgradient information) have impeded its development for
several decades. Even though there is an impressive body of work on designing
DNNs with continuous activation functions that can be viewed as surrogates of
the step function, the step function itself still possesses some advantageous
properties, such as complete robustness to outliers and the capability of
attaining the best learning-theoretic guarantee of predictive accuracy. Hence,
in this paper, we aim to train DNNs with the step function used as the
activation function (dubbed 0/1 DNNs). We first reformulate 0/1 DNNs as an
unconstrained optimization
problem and then solve it by a block coordinate descent (BCD) method. Moreover,
we derive closed-form solutions for the sub-problems of BCD and establish its
convergence properties. Furthermore, we also integrate
$\ell_{2,0}$-regularization into the 0/1 DNN to accelerate the training process and
compress the network scale. As a result, the proposed algorithm achieves
desirable performance in classifying the MNIST, FashionMNIST, CIFAR-10, and
CIFAR-100 datasets.
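A small, self-contained illustration of the two properties the abstract appeals to,
assuming nothing about the paper's BCD algorithm itself: the 0/1 (Heaviside) step
activation yields bounded binary codes, so a single corrupted input coordinate cannot
blow the representation up, while its derivative is zero almost everywhere, which is why
ordinary back-propagation passes no learning signal through it. All shapes and values
below are arbitrary.

```python
import numpy as np

def step(z):
    """Heaviside step activation: 1 for positive inputs, 0 otherwise."""
    return (z > 0).astype(z.dtype)

rng = np.random.default_rng(0)
W = rng.standard_normal((8, 4))
x = rng.standard_normal(4)
x_outlier = x.copy()
x_outlier[0] += 1e6                    # inject a huge outlier into one coordinate

h = step(W @ x)
h_outlier = step(W @ x_outlier)
print(h)                               # binary hidden code in {0, 1}^8
print(np.abs(h - h_outlier).sum())     # change is bounded by 8; it never explodes

# Finite-difference "derivative" of the step function at a generic point is 0,
# so back-propagation passes no information through this activation.
eps = 1e-6
z0 = 0.37
print((step(np.array(z0 + eps)) - step(np.array(z0 - eps))) / (2 * eps))  # 0.0
```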
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
Understanding the global optimality in deep learning (DL) has been attracting
more and more attention recently. Conventional DL solvers, however, have not
been intentionally designed to seek such global optimality. In this paper
we propose a novel approximation algorithm, BPGrad, towards optimizing deep
models globally via branch and pruning. Our BPGrad algorithm is based on the
assumption of Lipschitz continuity in DL, and as a result it can adaptively
determine the step size for the current gradient given the history of previous
updates, such that, theoretically, no smaller step can reach the global
optimum. We prove that, by repeating such a branch-and-pruning procedure, we
can locate the global optimum within a finite number of iterations. Empirically, an
efficient solver based on BPGrad for DL is proposed as well, and it outperforms
conventional DL solvers such as Adagrad, Adadelta, RMSProp, and Adam in the
tasks of object recognition, detection, and segmentation.
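The sketch below captures only the pruning intuition, with notation of my own (L for an
assumed Lipschitz constant, f_star_hat for a lower estimate of the global minimum); the
actual BPGrad solver, its estimate updates, and its convergence proof are in the paper.
Lipschitz continuity gives f(x) >= f(x_t) - L * ||x - x_t||, so no point within radius
(f(x_t) - f_star_hat) / L of x_t can attain the estimated optimum; that ball can be
pruned, and the gradient step is sized to exactly leave it.

```python
import numpy as np

def bpgrad_like_step(x, grad, f_val, f_star_hat, lipschitz):
    """One gradient step whose length equals the prunable radius (toy version)."""
    radius = max(f_val - f_star_hat, 0.0) / lipschitz  # no minimum inside this ball
    g_norm = np.linalg.norm(grad) + 1e-12
    return x - (radius / g_norm) * grad                # step size eta_t = radius / ||g_t||

# Toy example: f(x) = ||x||^2, restricted to the unit ball where it is 2-Lipschitz.
f = lambda x: float(x @ x)
grad_f = lambda x: 2.0 * x

x = np.array([0.8, -0.6])
L = 2.0            # ||grad f|| <= 2 on the unit ball
f_star_hat = 0.0   # assumed lower estimate of the global minimum value
for _ in range(20):
    x = bpgrad_like_step(x, grad_f(x), f(x), f_star_hat, L)
print(x, f(x))     # iterates shrink toward the global minimizer at the origin
```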