DANTE: Deep AlterNations for Training nEural networks
We present DANTE, a novel method for training neural networks using the
alternating minimization principle. DANTE offers an alternative perspective on
traditional gradient-based backpropagation techniques commonly used to train
deep networks. It utilizes an adaptation of quasi-convexity to cast training a
neural network as a bi-quasi-convex optimization problem. We show that for
neural network configurations with both differentiable (e.g. sigmoid) and
non-differentiable (e.g. ReLU) activation functions, we can perform the
alternations effectively in this formulation. DANTE can also be extended to
networks with multiple hidden layers. In experiments on standard datasets,
neural networks trained using the proposed method were found to be promising
and competitive with traditional backpropagation techniques, in terms of both
solution quality and training speed.
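
The alternating scheme is easy to picture on a one-hidden-layer network: hold one layer fixed and fit the other, then swap. The sketch below is an illustrative NumPy toy under that reading, not the paper's actual DANTE solver; the least-squares loss, the plain gradient steps used for each subproblem, and all names and hyperparameters are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def alternating_fit(X, y, hidden=16, outer=20, inner=50, lr=0.1, seed=0):
    """Toy alternating minimization for a one-hidden-layer network:
    alternately fix W1 and take gradient steps on W2, then fix W2 and
    take gradient steps on W1 (each loop stands in for a subproblem
    solve). X is (n, d), y is (n, 1)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(scale=0.1, size=(d, hidden))
    W2 = rng.normal(scale=0.1, size=(hidden, 1))
    for _ in range(outer):
        H = sigmoid(X @ W1)                 # hidden activations, W1 fixed
        for _ in range(inner):              # subproblem 1: fit output layer
            W2 -= lr * H.T @ (H @ W2 - y) / n
        for _ in range(inner):              # subproblem 2: fit input layer
            H = sigmoid(X @ W1)
            dH = ((H @ W2 - y) / n) @ W2.T * H * (1.0 - H)
            W1 -= lr * X.T @ dH
    return W1, W2
```

In the paper's bi-quasi-convex formulation, each such subproblem is quasi-convex in the layer being updated while the other is held fixed, which is what makes the alternations well behaved; the toy above simply approximates each subproblem solve with a fixed number of gradient steps.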
Distributionally Robust Learning with Weakly Convex Losses: Convergence Rates and Finite-Sample Guarantees
We consider a distributionally robust stochastic optimization problem and
formulate it as a stochastic two-level composition optimization problem with
the use of the mean-semideviation risk measure. In this setting, we consider a
single time-scale algorithm, involving two versions of the inner function value
tracking: linearized tracking of a continuously differentiable loss function,
and SPIDER tracking of a weakly convex loss function. We adopt the norm of the
gradient of the Moreau envelope as our measure of stationarity and show that
the same order of sample complexity is achievable in both cases, with only a
larger constant in the second case. Finally, we
demonstrate the performance of our algorithm with a robust learning example and
a weakly convex, non-smooth regression example.
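
One concrete ingredient here is the stationarity measure itself. For a weakly convex phi, the gradient of the Moreau envelope satisfies grad phi_lam(x) = (x - prox_{lam*phi}(x)) / lam, so its norm can be estimated by approximately solving the proximal subproblem. The sketch below does that with plain subgradient descent on a non-smooth absolute-loss regression; the loss, step sizes, and iteration counts are assumptions for illustration, and this is not the paper's single time-scale tracking algorithm.

```python
import numpy as np

def moreau_grad_norm(subgrad, x, lam=0.5, steps=300, lr=0.02):
    """Estimate ||grad phi_lam(x)|| = ||x - prox_{lam*phi}(x)|| / lam by
    subgradient descent on z -> phi(z) + ||z - x||^2 / (2*lam), which is
    strongly convex once lam is below the weak-convexity threshold."""
    z = x.copy()
    for _ in range(steps):
        z -= lr * (subgrad(z) + (z - x) / lam)
    return np.linalg.norm(x - z) / lam

# Hypothetical non-smooth regression: phi(w) = mean(|A @ w - b|), with
# subgradient A.T @ sign(A @ w - b) / n.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(100, 5)), rng.normal(size=100)
g = lambda w: A.T @ np.sign(A @ w - b) / len(b)
print(moreau_grad_norm(g, np.zeros(5)))
```

A small value of this norm certifies near-stationarity in the Moreau-envelope sense, the standard surrogate notion of stationarity for non-smooth weakly convex objectives.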