
    The Implicit Bias of Gradient Descent on Separable Data

    We examine gradient descent on unregularized logistic regression problems, with homogeneous linear predictors on linearly separable datasets. We show the predictor converges to the direction of the max-margin (hard margin SVM) solution. The result also generalizes to other monotone decreasing loss functions with an infimum at infinity, to multi-class problems, and to training a weight layer in a deep network in a certain restricted setting. Furthermore, we show this convergence is very slow, and only logarithmic in the convergence of the loss itself. This can help explain the benefit of continuing to optimize the logistic or cross-entropy loss even after the training error is zero and the training loss is extremely small, and, as we show, even if the validation loss increases. Our methodology can also aid in understanding implicit regularization in more complex models and with other optimization methods. Comment: Final JMLR version, with improved discussions over v3. Main improvements in journal version over conference version (v2 appeared in ICLR): We proved the measure zero case for main theorem (with implications for the rates), and the multi-class case.
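    As a minimal numerical illustration of the phenomenon this abstract describes (not the paper's own code; the dataset, step size, and iteration counts below are assumptions), the following sketch runs plain gradient descent on unregularized logistic loss over a separable 2-D dataset and prints how the norm of the iterate keeps growing while its direction slowly stabilizes toward a fixed separator:

```python
import numpy as np

# Minimal sketch (assumed setup, not the paper's code): gradient descent on
# unregularized logistic loss over a linearly separable 2-D dataset. ||w||
# grows without bound while w/||w|| slowly settles on a fixed direction.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
y = np.sign(X @ np.array([1.0, 0.5]))          # labels from a linear rule
X += 0.3 * y[:, None] * np.array([1.0, 0.5])   # push classes apart -> separable

w = np.zeros(2)
eta = 0.5
for t in range(1, 50001):
    margins = y * (X @ w)
    # gradient of mean log(1 + exp(-margin)); sigmoid(-margin) computed stably
    coeff = np.exp(-np.logaddexp(0.0, margins))
    grad = -(X * (y * coeff)[:, None]).mean(axis=0)
    w -= eta * grad
    if t in (100, 1000, 10000, 50000):
        print(f"t={t:6d}  ||w||={np.linalg.norm(w):8.3f}  "
              f"w/||w||={np.round(w / np.linalg.norm(w), 4)}")
```

    On data like this, the printed direction should change far less between t = 10,000 and t = 50,000 than the still-growing norm does, consistent with the logarithmically slow directional convergence the abstract mentions.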

    Strong Stationary Duality for Möbius Monotone Markov Chains: Unreliable Networks

    For Markov chains with a partially ordered finite state space we show strong stationary duality under the condition of Möbius monotonicity of the chain. We show relations of Möbius monotonicity to other definitions of monotone chains. We give examples of dual chains in this context which have transitions only upwards. We illustrate the general theory by an analysis of nonsymmetric random walks on the cube, with an application to networks of queues.

    Accelerated Backpressure Algorithm

    We develop an Accelerated Back Pressure (ABP) algorithm using Accelerated Dual Descent (ADD), a distributed approximate Newton-like algorithm that only uses local information. Our construction is based on writing the backpressure algorithm as the solution to a network feasibility problem solved via stochastic dual subgradient descent. We apply stochastic ADD in place of the stochastic gradient descent algorithm. We prove that the ABP algorithm guarantees stable queues. Our numerical experiments demonstrate a significant improvement in convergence rate, especially when the packet arrival statistics vary over time.Comment: 9 pages, 4 figures. A version of this work with significantly extended proofs is being submitted for journal publicatio
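    For readers unfamiliar with the baseline being accelerated, here is a minimal sketch of ordinary (unaccelerated) backpressure routing on a single-commodity tandem network; the topology, arrival rate, and variable names are assumptions for illustration, and the paper's ABP/ADD machinery is not reproduced. Each slot, a link transmits only when the queue differential across it is positive:

```python
import numpy as np

# Minimal sketch (assumed setup): plain backpressure on a 4-node tandem line
# 0 -> 1 -> 2 -> 3, single commodity destined for node 3, unit link capacity.
# Each slot, link (i, j) serves one packet iff the differential Q_i - Q_j > 0.
rng = np.random.default_rng(1)
links = [(0, 1), (1, 2), (2, 3)]
dest, T, arrival_rate = 3, 10_000, 0.7       # arrival rate below link capacity

Q = np.zeros(4)                              # backlog at each node
for t in range(T):
    served = np.zeros(4)
    pushed = np.zeros(4)
    for i, j in links:
        backlog_j = 0.0 if j == dest else Q[j]    # destination absorbs packets
        if Q[i] - backlog_j > 0 and Q[i] - served[i] >= 1:
            served[i] += 1
            if j != dest:
                pushed[j] += 1
    Q += pushed - served
    Q[0] += rng.poisson(arrival_rate)        # exogenous arrivals at the source

print("final backlogs:", Q)                  # stay bounded when rate < capacity
```

    These queue updates are the stochastic dual subgradient dynamics the abstract refers to; the paper's contribution is to replace that subgradient step with an accelerated, approximate Newton-like dual step (ADD) while still guaranteeing stable queues.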