Stochastic Learning under Random Reshuffling with Constant Step-sizes
In empirical risk optimization, it has been observed that stochastic gradient
implementations that rely on random reshuffling of the data achieve better
performance than implementations that rely on sampling the data uniformly.
Recent works have pursued justifications for this behavior by examining the
convergence rate of the learning process under diminishing step-sizes. This
work focuses on the constant step-size case with strongly convex loss functions.
In this case, convergence is guaranteed to a small neighborhood of the
optimizer albeit at a linear rate. The analysis establishes analytically that
random reshuffling outperforms uniform sampling by showing explicitly that
iterates approach a smaller neighborhood of size $O(\mu^2)$ around the
minimizer rather than $O(\mu)$, where $\mu$ denotes the constant step-size. Furthermore, we derive an analytical expression
for the steady-state mean-square-error performance of the algorithm, which
helps clarify in greater detail the differences between sampling with and
without replacement. We also explain the periodic behavior that is observed in
random reshuffling implementations.
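As a rough illustration of the two sampling schemes compared above (not code from the paper), the following sketch runs constant step-size SGD on a synthetic least-squares problem, once with uniform sampling with replacement and once with random reshuffling; the quadratic loss, step-size, and data sizes are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic least-squares risk: minimize (1/N) sum_n (h_n^T w - d_n)^2
N, M = 200, 5
H = rng.standard_normal((N, M))
w_true = rng.standard_normal(M)
d = H @ w_true + 0.1 * rng.standard_normal(N)
mu = 0.01          # constant step-size
epochs = 200

def grad(w, n):
    # stochastic gradient of the squared error at sample n
    return 2.0 * (H[n] @ w - d[n]) * H[n]

def sgd_uniform(w):
    # sampling with replacement: each update picks an index uniformly at random
    for _ in range(epochs * N):
        w = w - mu * grad(w, rng.integers(N))
    return w

def sgd_reshuffle(w):
    # random reshuffling: a fresh permutation per epoch, each sample used once
    for _ in range(epochs):
        for n in rng.permutation(N):
            w = w - mu * grad(w, n)
    return w

w_star = np.linalg.lstsq(H, d, rcond=None)[0]   # empirical risk minimizer
for name, run in [("uniform", sgd_uniform), ("reshuffle", sgd_reshuffle)]:
    w = run(np.zeros(M))
    print(f"{name:10s} squared distance to minimizer: {np.sum((w - w_star) ** 2):.2e}")
```

Running the sketch typically shows the reshuffled iterates settling closer to the minimizer, consistent with the smaller limiting neighborhood discussed above.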
Distributed Stochastic Optimization in Non-Differentiable and Non-Convex Environments
The first part of this dissertation considers distributed learning problems over networked agents. The general objective of distributed adaptation and learning is the solution of global, stochastic optimization problems through localized interactions and without information about the statistical properties of the data.

Regularization is a useful technique to encourage or enforce structural properties on the resulting solution, such as sparsity or constraints. A substantial number of regularizers are inherently non-smooth, while many cost functions are differentiable. We propose distributed and adaptive strategies that are able to minimize aggregate sums of objectives. In doing so, we exploit the structure of the individual objectives as sums of differentiable costs and non-differentiable regularizers. The resulting algorithms are adaptive in nature and able to continuously track drifts in the problem; their recursions, however, are subject to persistent perturbations arising from the stochastic nature of the gradient approximations and from disagreement across agents in the network. The presence of non-smooth, and potentially unbounded, regularizers enriches the dynamics of these recursions. We quantify the impact of this interplay, draw implications for steady-state performance as well as algorithm design, and present applications in distributed machine learning and image reconstruction.

There has also been increasing interest in understanding the behavior of gradient-descent algorithms in non-convex environments. In this work, we consider stochastic cost functions, where exact gradients are replaced by stochastic approximations and the resulting gradient noise persistently seeps into the dynamics of the algorithm. We establish that the diffusion learning algorithm continues to yield meaningful estimates in these more challenging, non-convex environments, in the sense that (a) despite the distributed implementation, individual agents cluster in a small region around the weighted network centroid in the mean-fourth sense, and (b) the network centroid inherits many properties of the centralized stochastic gradient descent recursion, including the escape from strict saddle points in time inversely proportional to the step-size and the return of approximately second-order stationary points in a polynomial number of iterations.

In the second part of the dissertation, we consider centralized learning problems over networked feature spaces. Rapidly growing capabilities to observe, collect, and process ever-increasing quantities of information necessitate methods for identifying and exploiting structure in high-dimensional feature spaces. Networks, frequently referred to as graphs in this context, have emerged as a useful tool for modeling interrelations among different parts of a data set. We consider graph signals that evolve dynamically according to a heat diffusion process and are subject to persistent perturbations. The model is not limited to heat diffusion but can be applied to modeling other processes, such as the evolution of interest over social networks and the movement of people in cities. We develop an online algorithm that is able to learn the underlying graph structure from observations of the signal evolution and derive expressions for its performance. The algorithm is adaptive in nature and able to respond to changes in the graph structure and the perturbation statistics.
Furthermore, in order to incorporate prior structural knowledge and improve classification performance, we propose a BRAIN strategy for learning, which enhances the performance of traditional algorithms, such as logistic regression and SVM learners, by incorporating a graphical layer that tracks and learns in real time the underlying correlation structure among feature subspaces. In this way, the algorithm is able to identify salient subspaces and their correlations, while simultaneously dampening the effect of irrelevant features.
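The non-smooth distributed strategies described in the first part can be pictured with a small sketch. The following is a minimal adapt-then-combine diffusion recursion in which each agent takes a stochastic gradient step on its differentiable cost, applies a soft-thresholding (proximal) step for an assumed ℓ1 regularizer, and then mixes its intermediate estimate with its neighbors'. The ring topology, combination weights, and quadratic local costs are illustrative assumptions, not the dissertation's exact setting.

```python
import numpy as np

rng = np.random.default_rng(1)

K, M = 4, 10                                          # agents, parameter dimension
w_true = np.zeros(M); w_true[:3] = [1.0, -2.0, 0.5]   # sparse ground truth

def sample(k):
    # each agent observes streaming data d = h^T w_true + noise
    h = rng.standard_normal(M)
    return h, h @ w_true + 0.05 * rng.standard_normal()

# doubly-stochastic combination matrix over an assumed ring of 4 agents
A = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

mu, rho = 0.01, 0.01          # step-size and l1 regularization weight

def soft_threshold(x, t):
    # proximal operator of t * ||x||_1
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

W = np.zeros((K, M))          # agents' estimates
for _ in range(5000):
    # adaptation: stochastic gradient step followed by the proximal step
    psi = np.zeros_like(W)
    for k in range(K):
        h, d = sample(k)
        g = (h @ W[k] - d) * h                     # stochastic gradient of the local quadratic cost
        psi[k] = soft_threshold(W[k] - mu * g, mu * rho)
    # combination: each agent convexly mixes its neighbors' intermediate estimates
    W = A @ psi

print("network-averaged estimate:", np.round(W.mean(axis=0), 2))
```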
Tracking Performance of Online Stochastic Learners
The utilization of online stochastic algorithms is popular in large-scale
learning settings due to their ability to compute updates on the fly, without
the need to store and process data in large batches. When a constant step-size
is used, these algorithms also have the ability to adapt to drifts in problem
parameters, such as data or model properties, and track the optimal solution
with reasonable accuracy. Building on analogies with the study of adaptive
filters, we establish a link between steady-state performance derived under
stationarity assumptions and the tracking performance of online learners under
random walk models. The link allows us to infer the tracking performance from
steady-state expressions directly and almost by inspection.
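A minimal sketch of the tracking scenario under assumed models: an LMS-type learner with a constant step-size follows a parameter vector that drifts according to a random-walk model, and the empirical steady-state mean-square deviation can be read off from the tail of the run. The model, dimensions, and noise levels are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

M = 5
w_true = rng.standard_normal(M)   # time-varying optimum, drifts as a random walk
w = np.zeros(M)                   # online learner's estimate
mu = 0.05                         # constant step-size (enables tracking)
sigma_q = 0.001                   # std of the random-walk increments
errors = []

for i in range(20000):
    # random-walk model for the drifting optimum
    w_true = w_true + sigma_q * rng.standard_normal(M)
    # one streaming observation d = h^T w_true + v
    h = rng.standard_normal(M)
    d = h @ w_true + 0.1 * rng.standard_normal()
    # LMS-type constant step-size update
    w = w + mu * (d - h @ w) * h
    errors.append(np.sum((w - w_true) ** 2))

# steady-state mean-square deviation estimated from the tail of the run
print("empirical steady-state MSD:", np.mean(errors[-5000:]))
```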
Local Graph-homomorphic Processing for Privatized Distributed Systems
We study the generation of dependent random numbers in a distributed fashion
in order to enable privatized distributed learning by networked agents. We
propose a method that we refer to as local graph-homomorphic processing; it
relies on the construction of particular noises over the edges to ensure a
certain level of differential privacy. We show that the added noise does not
affect the performance of the learned model. This is a significant improvement
to previous works on differential privacy for distributed algorithms, where the
noise was added in a less structured manner without respecting the graph
topology and often led to performance deterioration. We illustrate the
theoretical results by considering a linear regression problem over a network
of agents.
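The paper's construction is not reproduced here, but the underlying idea of dependent edge noises that cancel in the aggregate can be illustrated with a toy example: each undirected edge carries one noise sample that is added to the message in one direction and subtracted in the other, so individual messages are masked while the equally weighted network average of local least-squares estimates is unchanged. The network size, dimensions, and noise level below are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)

# Fully connected network of K agents, each holding a local least-squares estimate
K, M = 5, 3
w_true = rng.standard_normal(M)
local = []
for _ in range(K):
    H = rng.standard_normal((50, M))
    d = H @ w_true + 0.1 * rng.standard_normal(50)
    local.append(np.linalg.lstsq(H, d, rcond=None)[0])
local = np.array(local)

sigma = 5.0                       # large masking noise on every exchanged message
received = np.zeros((K, M))       # sum of messages received by each agent

# One anti-symmetric noise sample per undirected edge: +v on k->l, -v on l->k.
for k in range(K):
    for l in range(k + 1, K):
        v = sigma * rng.standard_normal(M)
        received[l] += local[k] + v   # message k -> l is heavily perturbed
        received[k] += local[l] - v   # message l -> k carries the opposite noise

# Each individual message is masked, yet the noises cancel in the global sum,
# so the equally weighted network average is unaffected by the perturbations.
network_avg = (received.sum(axis=0) + local.sum(axis=0)) / (K * K)
print("noise-free average:        ", np.round(local.mean(axis=0), 3))
print("average from noisy messages:", np.round(network_avg, 3))
```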
Dif-MAML: Decentralized Multi-Agent Meta-Learning
The objective of meta-learning is to exploit the knowledge obtained from
observed tasks to improve adaptation to unseen tasks. As such, meta-learners
are able to generalize better when they are trained with a larger number of
observed tasks and with a larger amount of data per task. Given the amount of
resources that are needed, it is generally difficult to expect the tasks, their
respective data, and the necessary computational capacity to be available at a
single central location. It is more natural to encounter situations where these
resources are spread across several agents connected by some graph topology.
The formalism of meta-learning is actually well-suited to this decentralized
setting, where the learner would be able to benefit from information and
computational power spread across the agents. Motivated by this observation, in
this work, we propose a cooperative fully-decentralized multi-agent
meta-learning algorithm, referred to as Diffusion-based MAML or Dif-MAML.
Decentralized optimization algorithms are superior to centralized
implementations in terms of scalability, avoidance of communication
bottlenecks, and privacy guarantees. The work provides a detailed theoretical
analysis to show that the proposed strategy allows a collection of agents to
attain agreement at a linear rate and to converge to a stationary point of the
aggregate MAML objective even in non-convex environments. Simulation results
illustrate the theoretical findings and the superior performance relative to
the traditional non-cooperative setting.
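A minimal sketch of a decentralized meta-learning recursion in the spirit of Dif-MAML, using a first-order (FOMAML-style) approximation of the meta-gradient for brevity; the ring topology, combination weights, step-sizes, and linear-regression tasks are illustrative assumptions and not the paper's exact construction.

```python
import numpy as np

rng = np.random.default_rng(4)

K, M = 4, 3                      # agents, parameter dimension
alpha, beta = 0.05, 0.01         # inner (adaptation) and outer (meta) step-sizes

# doubly-stochastic combination matrix over an assumed ring of 4 agents
A = np.array([[0.5, 0.25, 0.0, 0.25],
              [0.25, 0.5, 0.25, 0.0],
              [0.0, 0.25, 0.5, 0.25],
              [0.25, 0.0, 0.25, 0.5]])

def sample_task():
    # each task is a random linear regression; support/query batches are drawn fresh
    w_task = rng.standard_normal(M)
    def batch(n=10):
        H = rng.standard_normal((n, M))
        return H, H @ w_task + 0.05 * rng.standard_normal(n)
    return batch

def grad(w, batch):
    # gradient of the mean-squared error on a batch
    H, d = batch
    return 2.0 * H.T @ (H @ w - d) / len(d)

W = np.zeros((K, M))             # each agent's meta-parameter
for _ in range(2000):
    phi = np.zeros_like(W)
    for k in range(K):
        task = sample_task()
        # inner adaptation step on the support batch
        w_adapted = W[k] - alpha * grad(W[k], task())
        # first-order meta-update evaluated on the query batch
        phi[k] = W[k] - beta * grad(w_adapted, task())
    # diffusion combination step: agents mix intermediate meta-parameters
    W = A @ phi

print("disagreement across agents (std per coordinate):", np.round(np.std(W, axis=0), 4))
```

The combination step is what distinguishes the cooperative recursion from agents running MAML in isolation: the printed disagreement shrinks as the agents reach agreement on a common meta-parameter.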