58,233 research outputs found

    Global Convergence and Stability of Stochastic Gradient Descent

    Full text link
    In machine learning, stochastic gradient descent (SGD) is widely deployed to train models using highly non-convex objectives with equally complex noise models. Unfortunately, SGD theory often makes restrictive assumptions that fail to capture the non-convexity of real problems, and almost entirely ignore the complex noise models that exist in practice. In this work, we make substantial progress on this shortcoming. First, we establish that SGD's iterates will either globally converge to a stationary point or diverge under nearly arbitrary nonconvexity and noise models. Under a slightly more restrictive assumption on the joint behavior of the non-convexity and noise model that generalizes current assumptions in the literature, we show that the objective function cannot diverge, even if the iterates diverge. As a consequence of our results, SGD can be applied to a greater range of stochastic optimization problems with confidence about its global convergence behavior and stability

    Universal Compressed Sensing

    Full text link
    In this paper, the problem of developing universal algorithms for compressed sensing of stochastic processes is studied. First, R\'enyi's notion of information dimension (ID) is generalized to analog stationary processes. This provides a measure of complexity for such processes and is connected to the number of measurements required for their accurate recovery. Then a minimum entropy pursuit (MEP) optimization approach is proposed, and it is proven that it can reliably recover any stationary process satisfying some mixing constraints from sufficient number of randomized linear measurements, without having any prior information about the distribution of the process. It is proved that a Lagrangian-type approximation of the MEP optimization problem, referred to as Lagrangian-MEP problem, is identical to a heuristic implementable algorithm proposed by Baron et al. It is shown that for the right choice of parameters the Lagrangian-MEP algorithm, in addition to having the same asymptotic performance as MEP optimization, is also robust to the measurement noise. For memoryless sources with a discrete-continuous mixture distribution, the fundamental limits of the minimum number of required measurements by a non-universal compressed sensing decoder is characterized by Wu et al. For such sources, it is proved that there is no loss in universal coding, and both the MEP and the Lagrangian-MEP asymptotically achieve the optimal performance
    corecore