
    The Implicit Bias of Gradient Descent on Separable Data

    We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods. Comment: Final JMLR version, with improved discussions over v3. Main improvements in journal version over conference version (v2 appeared in ICLR): We proved the measure zero case for main theorem (with implications for the rates), and the multi-class case.
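    As a minimal numerical illustration of the phenomenon this abstract describes (not the paper's own code; the dataset, step size, and iteration counts below are assumptions), the following sketch runs plain gradient descent on unregularized logistic loss over a separable 2-D dataset and prints how the norm of the iterate keeps growing while its direction slowly stabilizes toward a fixed separator:

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's code): gradient descent on
# unregularized logistic loss over a linearly separable 2-D dataset. ||w||
# grows without bound while w/||w|| slowly settles on a fixed direction.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = np.sign(X @ np.array([1.0, 0.5]))          # labels from a linear rule
X += 0.3 * y[:, None] * np.array([1.0, 0.5])   # push classes apart -> separable

w = np.zeros(2)
eta = 0.5
for t in range(1, 50001):
    margins = y * (X @ w)
    # gradient of mean log(1 + exp(-margin)); sigmoid(-margin) computed stably
    coeff = np.exp(-np.logaddexp(0.0, margins))
    grad = -(X * (y * coeff)[:, None]).mean(axis=0)
    w -= eta * grad
    if t in (100, 1000, 10000, 50000):
        print(f"t={t:6d}  ||w||={np.linalg.norm(w):8.3f}  "
              f"w/||w||={np.round(w / np.linalg.norm(w), 4)}")
```

    On data like this, the printed direction should change far less between t = 10,000 and t = 50,000 than the still-growing norm does, consistent with the logarithmically slow directional convergence the abstract mentions.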

    Strong Stationary Duality for Möbius Monotone Markov Chains: Unreliable Networks

    For Markov chains with a partially ordered finite state space we show strong stationary duality under the condition of Möbius monotonicity of the chain. We show relations of Möbius monotonicity to other definitions of monotone chains. We give examples of dual chains in this context which have transitions only upwards. We illustrate the general theory by an analysis of nonsymmetric random walks on the cube, with an application to networks of queues.

    Accelerated Backpressure Algorithm

    We develop an Accelerated Back Pressure (ABP) algorithm using Accelerated Dual Descent (ADD), a distributed approximate Newton-like algorithm that only uses local information. Our construction is based on writing the backpressure algorithm as the solution to a network feasibility problem solved via stochastic dual subgradient descent. We apply stochastic ADD in place of the stochastic gradient descent algorithm. We prove that the ABP algorithm guarantees stable queues. Our numerical experiments demonstrate a significant improvement in convergence rate, especially when the packet arrival statistics vary over time.Comment: 9 pages, 4 figures. A version of this work with significantly extended proofs is being submitted for journal publicatio
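    For readers unfamiliar with the baseline being accelerated, here is a minimal sketch of ordinary (unaccelerated) backpressure routing on a single-commodity tandem network; the topology, arrival rate, and variable names are assumptions for illustration, and the paper's ABP/ADD machinery is not reproduced. Each slot, a link transmits only when the queue differential across it is positive:

```python
import numpy as np

# Minimal sketch (assumed setup): plain backpressure on a 4-node tandem line
# 0 -> 1 -> 2 -> 3, single commodity destined for node 3, unit link capacity.
# Each slot, link (i, j) serves one packet iff the differential Q_i - Q_j > 0.
rng = np.random.default_rng(1)
links = [(0, 1), (1, 2), (2, 3)]
dest, T, arrival_rate = 3, 10_000, 0.7       # arrival rate below link capacity

Q = np.zeros(4)                              # backlog at each node
for t in range(T):
    served = np.zeros(4)
    pushed = np.zeros(4)
    for i, j in links:
        backlog_j = 0.0 if j == dest else Q[j]    # destination absorbs packets
        if Q[i] - backlog_j > 0 and Q[i] - served[i] >= 1:
            served[i] += 1
            if j != dest:
                pushed[j] += 1
    Q += pushed - served
    Q[0] += rng.poisson(arrival_rate)        # exogenous arrivals at the source

print("final backlogs:", Q)                  # stay bounded when rate < capacity
```

    These queue updates are the stochastic dual subgradient dynamics the abstract refers to; the paper's contribution is to replace that subgradient step with an accelerated, approximate Newton-like dual step (ADD) while still guaranteeing stable queues.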