Stochastic Learning under Random Reshuffling with Constant Step-sizes
In empirical risk optimization, it has been observed that stochastic gradient
implementations that rely on random reshuffling of the data achieve better
performance than implementations that rely on sampling the data uniformly.
Recent works have pursued justifications for this behavior by examining the
convergence rate of the learning process under diminishing step-sizes. This
work focuses on the constant step-size case with strongly convex loss functions.
In this case, convergence is guaranteed to a small neighborhood of the
optimizer albeit at a linear rate. The analysis establishes analytically that
random reshuffling outperforms uniform sampling by showing explicitly that
iterates approach a smaller neighborhood of size $O(\mu^2)$ around the
minimizer rather than $O(\mu)$, where $\mu$ denotes the constant step-size. Furthermore, we derive an analytical expression
for the steady-state mean-square-error performance of the algorithm, which
helps clarify in greater detail the differences between sampling with and
without replacement. We also explain the periodic behavior that is observed in
random reshuffling implementations.
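As a rough illustration of the two sampling schemes compared above (not code from the paper), the following sketch runs constant step-size SGD on a synthetic least-squares problem, once with uniform sampling with replacement and once with random reshuffling; the quadratic loss, step-size, and data sizes are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares risk: minimize (1/N) sum_n (h_n^T w - d_n)^2
N, M = 200, 5
H = rng.standard_normal((N, M))
w_true = rng.standard_normal(M)
d = H @ w_true + 0.1 * rng.standard_normal(N)
mu = 0.01          # constant step-size
epochs = 200

def grad(w, n):
    # stochastic gradient of the squared error at sample n
    return 2.0 * (H[n] @ w - d[n]) * H[n]

def sgd_uniform(w):
    # sampling with replacement: each update picks an index uniformly at random
    for _ in range(epochs * N):
        w = w - mu * grad(w, rng.integers(N))
    return w

def sgd_reshuffle(w):
    # random reshuffling: a fresh permutation per epoch, each sample used once
    for _ in range(epochs):
        for n in rng.permutation(N):
            w = w - mu * grad(w, n)
    return w

w_star = np.linalg.lstsq(H, d, rcond=None)[0]   # empirical risk minimizer
for name, run in [("uniform", sgd_uniform), ("reshuffle", sgd_reshuffle)]:
    w = run(np.zeros(M))
    print(f"{name:10s} squared distance to minimizer: {np.sum((w - w_star) ** 2):.2e}")
```

Running the sketch typically shows the reshuffled iterates settling closer to the minimizer, consistent with the smaller limiting neighborhood discussed above.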
Distributed Stochastic Optimization in Non-Differentiable and Non-Convex Environments
The first part of this dissertation considers distributed learning problems over networked agents. The general objective of distributed adaptation and learning is the solution of global, stochastic optimization problems through localized interactions and without information about the statistical properties of the data.

Regularization is a useful technique to encourage or enforce structural properties on the resulting solution, such as sparsity or constraints. A substantial number of regularizers are inherently non-smooth, while many cost functions are differentiable. We propose distributed and adaptive strategies that are able to minimize aggregate sums of objectives. In doing so, we exploit the structure of the individual objectives as sums of differentiable costs and non-differentiable regularizers. The resulting algorithms are adaptive in nature and able to continuously track drifts in the problem; their recursions, however, are subject to persistent perturbations arising from the stochastic nature of the gradient approximations and from disagreement across agents in the network. The presence of non-smooth, and potentially unbounded, regularizers enriches the dynamics of these recursions. We quantify the impact of this interplay, draw implications for steady-state performance as well as algorithm design, and present applications in distributed machine learning and image reconstruction.

There has also been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. In this work, we consider stochastic cost functions, where exact gradients are replaced by stochastic approximations and the resulting gradient noise persistently seeps into the dynamics of the algorithm. We establish that the diffusion learning algorithm continues to yield meaningful estimates in these more challenging, non-convex environments, in the sense that (a) despite the distributed implementation, individual agents cluster in a small region around the weighted network centroid in the mean-fourth sense, and (b) the network centroid inherits many properties of the centralized stochastic gradient descent recursion, including the escape from strict saddle points in time inversely proportional to the step-size and the return of approximately second-order stationary points in a polynomial number of iterations.

In the second part of the dissertation, we consider centralized learning problems over networked feature spaces. Rapidly growing capabilities to observe, collect, and process ever-increasing quantities of information necessitate methods for identifying and exploiting structure in high-dimensional feature spaces. Networks, frequently referred to as graphs in this context, have emerged as a useful tool for modeling interrelations among different parts of a data set. We consider graph signals that evolve dynamically according to a heat diffusion process and are subject to persistent perturbations. The model is not limited to heat diffusion but can be applied to modeling other processes, such as the evolution of interest over social networks and the movement of people in cities. We develop an online algorithm that is able to learn the underlying graph structure from observations of the signal evolution and derive expressions for its performance. The algorithm is adaptive in nature and able to respond to changes in the graph structure and the perturbation statistics.
Furthermore, in order to incorporate prior structural knowledge and improve classification performance, we propose a BRAIN strategy for learning, which enhances the performance of traditional algorithms, such as logistic regression and SVM learners, by incorporating a graphical layer that tracks and learns in real time the underlying correlation structure among feature subspaces. In this way, the algorithm is able to identify salient subspaces and their correlations, while simultaneously dampening the effect of irrelevant features.
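The non-smooth distributed strategies described in the first part can be pictured with a small sketch. The following is a minimal adapt-then-combine diffusion recursion in which each agent takes a stochastic gradient step on its differentiable cost, applies a soft-thresholding (proximal) step for an assumed ℓ1 regularizer, and then mixes its intermediate estimate with its neighbors'. The ring topology, combination weights, and quadratic local costs are illustrative assumptions, not the dissertation's exact setting.

```python
import numpy as np

rng = np.random.default_rng(1)

K, M = 4, 10                                          # agents, parameter dimension
w_true = np.zeros(M); w_true[:3] = [1.0, -2.0, 0.5]   # sparse ground truth

def sample(k):
    # each agent observes streaming data d = h^T w_true + noise
    h = rng.standard_normal(M)
    return h, h @ w_true + 0.05 * rng.standard_normal()

# doubly-stochastic combination matrix over an assumed ring of 4 agents
A = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

mu, rho = 0.01, 0.01          # step-size and l1 regularization weight

def soft_threshold(x, t):
    # proximal operator of t * ||x||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

W = np.zeros((K, M))          # agents' estimates
for _ in range(5000):
    # adaptation: stochastic gradient step followed by the proximal step
    psi = np.zeros_like(W)
    for k in range(K):
        h, d = sample(k)
        g = (h @ W[k] - d) * h                     # stochastic gradient of the local quadratic cost
        psi[k] = soft_threshold(W[k] - mu * g, mu * rho)
    # combination: each agent convexly mixes its neighbors' intermediate estimates
    W = A @ psi

print("network-averaged estimate:", np.round(W.mean(axis=0), 2))
```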
Tracking Performance of Online Stochastic Learners
The utilization of online stochastic algorithms is popular in large-scale
learning settings due to their ability to compute updates on the fly, without
the need to store and process data in large batches. When a constant step-size
is used, these algorithms also have the ability to adapt to drifts in problem
parameters, such as data or model properties, and track the optimal solution
with reasonable accuracy. Building on analogies with the study of adaptive
filters, we establish a link between steady-state performance derived under
stationarity assumptions and the tracking performance of online learners under
random walk models. The link allows us to infer the tracking performance from
steady-state expressions directly and almost by inspection.
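A minimal sketch of the tracking scenario under assumed models: an LMS-type learner with a constant step-size follows a parameter vector that drifts according to a random-walk model, and the empirical steady-state mean-square deviation can be read off from the tail of the run. The model, dimensions, and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

M = 5
w_true = rng.standard_normal(M)   # time-varying optimum, drifts as a random walk
w = np.zeros(M)                   # online learner's estimate
mu = 0.05                         # constant step-size (enables tracking)
sigma_q = 0.001                   # std of the random-walk increments
errors = []

for i in range(20000):
    # random-walk model for the drifting optimum
    w_true = w_true + sigma_q * rng.standard_normal(M)
    # one streaming observation d = h^T w_true + v
    h = rng.standard_normal(M)
    d = h @ w_true + 0.1 * rng.standard_normal()
    # LMS-type constant step-size update
    w = w + mu * (d - h @ w) * h
    errors.append(np.sum((w - w_true) ** 2))

# steady-state mean-square deviation estimated from the tail of the run
print("empirical steady-state MSD:", np.mean(errors[-5000:]))
```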
Local Graph-homomorphic Processing for Privatized Distributed Systems
We study the generation of dependent random numbers in a distributed fashion
in order to enable privatized distributed learning by networked agents. We
propose a method that we refer to as local graph-homomorphic processing; it
relies on the construction of particular noises over the edges to ensure a
certain level of differential privacy. We show that the added noise does not
affect the performance of the learned model. This is a significant improvement
to previous works on differential privacy for distributed algorithms, where the
noise was added in a less structured manner without respecting the graph
topology and often led to performance deterioration. We illustrate the
theoretical results by considering a linear regression problem over a network
of agents.
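The paper's construction is not reproduced here, but the underlying idea of dependent edge noises that cancel in the aggregate can be illustrated with a toy example: each undirected edge carries one noise sample that is added to the message in one direction and subtracted in the other, so individual messages are masked while the equally weighted network average of local least-squares estimates is unchanged. The network size, dimensions, and noise level below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fully connected network of K agents, each holding a local least-squares estimate
K, M = 5, 3
w_true = rng.standard_normal(M)
local = []
for _ in range(K):
    H = rng.standard_normal((50, M))
    d = H @ w_true + 0.1 * rng.standard_normal(50)
    local.append(np.linalg.lstsq(H, d, rcond=None)[0])
local = np.array(local)

sigma = 5.0                       # large masking noise on every exchanged message
received = np.zeros((K, M))       # sum of messages received by each agent

# One anti-symmetric noise sample per undirected edge: +v on k->l, -v on l->k.
for k in range(K):
    for l in range(k + 1, K):
        v = sigma * rng.standard_normal(M)
        received[l] += local[k] + v   # message k -> l is heavily perturbed
        received[k] += local[l] - v   # message l -> k carries the opposite noise

# Each individual message is masked, yet the noises cancel in the global sum,
# so the equally weighted network average is unaffected by the perturbations.
network_avg = (received.sum(axis=0) + local.sum(axis=0)) / (K * K)
print("noise-free average:        ", np.round(local.mean(axis=0), 3))
print("average from noisy messages:", np.round(network_avg, 3))
```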
Dif-MAML: Decentralized Multi-Agent Meta-Learning
The objective of meta-learning is to exploit the knowledge obtained from
observed tasks to improve adaptation to unseen tasks. As such, meta-learners
are able to generalize better when they are trained with a larger number of
observed tasks and with a larger amount of data per task. Given the amount of
resources that are needed, it is generally difficult to expect the tasks, their
respective data, and the necessary computational capacity to be available at a
single central location. It is more natural to encounter situations where these
resources are spread across several agents connected by some graph topology.
The formalism of meta-learning is actually well-suited to this decentralized
setting, where the learner would be able to benefit from information and
computational power spread across the agents. Motivated by this observation, in
this work, we propose a cooperative fully-decentralized multi-agent
meta-learning algorithm, referred to as Diffusion-based MAML or Dif-MAML.
Decentralized optimization algorithms are superior to centralized
implementations in terms of scalability, avoidance of communication
bottlenecks, and privacy guarantees. The work provides a detailed theoretical
analysis to show that the proposed strategy allows a collection of agents to
attain agreement at a linear rate and to converge to a stationary point of the
aggregate MAML objective even in non-convex environments. Simulation results
illustrate the theoretical findings and the superior performance relative to
the traditional non-cooperative setting.
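A minimal sketch of a decentralized meta-learning recursion in the spirit of Dif-MAML, using a first-order (FOMAML-style) approximation of the meta-gradient for brevity; the ring topology, combination weights, step-sizes, and linear-regression tasks are illustrative assumptions and not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(4)

K, M = 4, 3                      # agents, parameter dimension
alpha, beta = 0.05, 0.01         # inner (adaptation) and outer (meta) step-sizes

# doubly-stochastic combination matrix over an assumed ring of 4 agents
A = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

def sample_task():
    # each task is a random linear regression; support/query batches are drawn fresh
    w_task = rng.standard_normal(M)
    def batch(n=10):
        H = rng.standard_normal((n, M))
        return H, H @ w_task + 0.05 * rng.standard_normal(n)
    return batch

def grad(w, batch):
    # gradient of the mean-squared error on a batch
    H, d = batch
    return 2.0 * H.T @ (H @ w - d) / len(d)

W = np.zeros((K, M))             # each agent's meta-parameter
for _ in range(2000):
    phi = np.zeros_like(W)
    for k in range(K):
        task = sample_task()
        # inner adaptation step on the support batch
        w_adapted = W[k] - alpha * grad(W[k], task())
        # first-order meta-update evaluated on the query batch
        phi[k] = W[k] - beta * grad(w_adapted, task())
    # diffusion combination step: agents mix intermediate meta-parameters
    W = A @ phi

print("disagreement across agents (std per coordinate):", np.round(np.std(W, axis=0), 4))
```

The combination step is what distinguishes the cooperative recursion from agents running MAML in isolation: the printed disagreement shrinks as the agents reach agreement on a common meta-parameter.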