The Implicit Bias of Gradient Descent on Separable Data
We examine gradient descent on unregularized logistic regression problems,
with homogeneous linear predictors on linearly separable datasets. We show the
predictor converges to the direction of the max-margin (hard margin SVM)
solution. The result also generalizes to other monotone decreasing loss
functions with an infimum at infinity, to multi-class problems, and to training
a weight layer in a deep network in a certain restricted setting. Furthermore,
we show this convergence is very slow, and only logarithmic in the convergence
of the loss itself. This can help explain the benefit of continuing to optimize
the logistic or cross-entropy loss even after the training error is zero and
the training loss is extremely small, and, as we show, even if the validation
loss increases. Our methodology can also aid in understanding implicit
regularization in more complex models and with other optimization methods.
Comment: Final JMLR version, with improved discussions over v3. Main
improvements in journal version over conference version (v2 appeared in
ICLR): we proved the measure-zero case for the main theorem (with implications
for the rates) and the multi-class case.
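The directional convergence described in this abstract can be illustrated with a toy sketch (the dataset, step size, and iteration count below are illustrative assumptions, not from the paper). On a symmetric separable dataset the max-margin direction is (1, 1)/sqrt(2), and unregularized gradient descent on the logistic loss drives the normalized predictor toward it while the iterate norm keeps growing:

```python
import numpy as np

# Hypothetical 2D linearly separable dataset (labels in {-1, +1}).
# By symmetry, the hard-margin SVM direction is (1, 1) / sqrt(2).
X = np.array([[2.0, 1.0], [1.0, 2.0], [-2.0, -1.0], [-1.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

def logistic_loss_grad(w):
    # Gradient of (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)).
    margins = y * (X @ w)
    coeffs = -y / (1.0 + np.exp(margins))  # d/dm log(1 + exp(-m)) = -1/(1 + exp(m))
    return (coeffs[:, None] * X).mean(axis=0)

w = np.zeros(2)
lr = 0.1
for t in range(50000):
    w -= lr * logistic_loss_grad(w)

direction = w / np.linalg.norm(w)
print(direction)  # close to the max-margin direction [0.7071, 0.7071]
```

This symmetric toy set reaches the limit direction quickly; in general the paper's point is that the directional convergence is only logarithmic in the convergence of the loss, so the norm of `w` keeps growing long after the training error is zero.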
Strong Stationary Duality for Möbius Monotone Markov Chains: Unreliable Networks
For Markov chains with a partially ordered finite state space we show strong
stationary duality under the condition of Möbius monotonicity of the chain.
We show relations of Möbius monotonicity to other definitions of monotone
chains. We give examples of dual chains in this context which have transitions
only upwards. We illustrate the general theory by an analysis of nonsymmetric
random walks on the cube, with an application to networks of queues.
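Möbius monotonicity is stated in terms of the Möbius function of the partial order on the state space. A minimal sketch (a toy setup, not the paper's construction) takes the 3-cube {0,1}^3 ordered coordinatewise, as in the random-walk example, and recovers its Möbius function by inverting the zeta matrix:

```python
import numpy as np
from itertools import product

# State space: the 3-cube {0,1}^3 with the coordinatewise partial order.
states = list(product([0, 1], repeat=3))

def leq(a, b):
    return all(x <= y for x, y in zip(a, b))

# Zeta matrix: zeta[a, b] = 1 iff a <= b. Its inverse is the Moebius matrix.
zeta = np.array([[1 if leq(a, b) else 0 for b in states] for a in states])
mobius = np.round(np.linalg.inv(zeta)).astype(int)

# For the Boolean lattice, mu(a, b) = (-1)^{|b| - |a|} when a <= b, else 0.
for i, a in enumerate(states):
    for j, b in enumerate(states):
        expected = (-1) ** (sum(b) - sum(a)) if leq(a, b) else 0
        assert mobius[i, j] == expected
print("Moebius matrix verified for the 3-cube")
```

Möbius monotonicity of a transition kernel is then a sign condition on the kernel transformed by this matrix; the sketch only exhibits the combinatorial ingredient, not the duality construction itself.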
Accelerated Backpressure Algorithm
We develop an Accelerated Back Pressure (ABP) algorithm using Accelerated
Dual Descent (ADD), a distributed approximate Newton-like algorithm that only
uses local information. Our construction is based on writing the backpressure
algorithm as the solution to a network feasibility problem solved via
stochastic dual subgradient descent. We apply stochastic ADD in place of the
stochastic gradient descent algorithm. We prove that the ABP algorithm
guarantees stable queues. Our numerical experiments demonstrate a significant
improvement in convergence rate, especially when the packet arrival statistics
vary over time.
Comment: 9 pages, 4 figures. A version of this work with significantly
extended proofs is being submitted for journal publication.
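The queue-stability property that ABP inherits from backpressure can be seen in a minimal sketch of plain backpressure routing (my own toy simulation, without the Newton-type acceleration the paper adds): on a line network, each link transmits only when its queue differential is positive, and total backlog stays bounded whenever arrivals are within capacity.

```python
import random

# Toy single-commodity line network: node 0 -> node 1 -> node 2 -> sink.
# Each slot, link (i, i+1) serves one packet iff Q[i] - Q[i+1] > 0
# (plain backpressure decision; the sink backlog is 0 by convention).
random.seed(0)
Q = [0, 0, 0]            # queue backlogs at nodes 0, 1, 2
arrival_rate = 0.6       # packets/slot into node 0; below link capacity 1
max_backlog = 0

for _ in range(20000):
    if random.random() < arrival_rate:
        Q[0] += 1                      # exogenous arrival at the source
    # Decide all links from the pre-update backlogs (positive differential).
    dep = [Q[i] - (Q[i + 1] if i < 2 else 0) > 0 for i in range(3)]
    for i in range(3):
        if dep[i]:
            Q[i] -= 1                  # transmit one packet over link i
            if i < 2:
                Q[i + 1] += 1          # it joins the next queue (last hop exits)
    max_backlog = max(max_backlog, sum(Q))

print("max total backlog:", max_backlog)  # bounded for sub-capacity arrivals
```

The paper's contribution is speeding up this kind of scheme: replacing the stochastic subgradient step behind backpressure with a distributed approximate Newton step (stochastic ADD) while keeping the stability guarantee.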