69 research outputs found
Relative Lipschitzness in Extragradient Methods and a Direct Recipe for Acceleration
We show that standard extragradient methods (i.e., mirror prox [Arkadi Nemirovski, 2004] and dual extrapolation [Yurii Nesterov, 2007]) recover optimal accelerated rates for first-order minimization of smooth convex functions. To obtain this result, we provide a fine-grained characterization of the convergence rates of extragradient methods for solving monotone variational inequalities in terms of a natural condition we call relative Lipschitzness. We further generalize this framework to handle local and randomized notions of relative Lipschitzness, and thereby recover rates for box-constrained ℓ_∞ regression based on area convexity [Jonah Sherman, 2017] and complexity bounds achieved by accelerated (randomized) coordinate descent [Zeyuan {Allen Zhu} et al., 2016; Yurii Nesterov and Sebastian U. Stich, 2017] for smooth convex function minimization.
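The extragradient step underlying this abstract can be sketched in its simplest Euclidean form (mirror prox with the squared-norm divergence): take a look-ahead step using the operator at the current point, then update the current point using the operator evaluated at the look-ahead point. This is a minimal illustration, not the paper's general mirror setup; the operator, step size, and iteration count below are assumptions chosen for the toy example.

```python
import numpy as np

def extragradient(F, x0, eta, steps):
    """Euclidean extragradient (mirror prox) sketch for a monotone operator F:
    extrapolate with a half-step, then update from the original point using
    the operator evaluated at the half-step."""
    x = x0.copy()
    for _ in range(steps):
        x_half = x - eta * F(x)       # extrapolation (look-ahead) step
        x = x - eta * F(x_half)       # update using the look-ahead operator value
    return x

# Toy monotone instance: the bilinear saddle point min_u max_v u*v,
# whose operator F(u, v) = (v, -u) is monotone but is not a gradient field,
# so plain gradient descent-ascent would spiral outward while
# extragradient converges to the solution (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z = extragradient(F, np.array([1.0, 1.0]), eta=0.1, steps=1000)
```

The look-ahead evaluation is what distinguishes the method from plain gradient descent-ascent: on rotational operators like the one above, the single-step method diverges while the two-step method contracts.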
Algorithms for First-order Sparse Reinforcement Learning
This thesis presents a general framework for first-order temporal difference learning algorithms with an in-depth theoretical analysis. The main contribution of the thesis is the development and design of a family of first-order regularized temporal-difference (TD) algorithms using stochastic approximation and stochastic optimization. To scale up TD algorithms to large-scale problems, we use first-order optimization to explore regularized TD methods using linear value function approximation. Previous regularized TD methods often use matrix inversion, which requires cubic time and quadratic memory complexity. We propose two algorithms, sparse-Q and RO-TD, for on-policy and off-policy learning, respectively. These two algorithms exhibit linear computational complexity per step, and their asymptotic convergence guarantees and error bound analyses are given using stochastic optimization and stochastic approximation. The second major contribution of the thesis is the establishment of a unified general framework for stochastic-gradient-based temporal-difference learning algorithms that use proximal gradient methods. The primal-dual saddle-point formulation is introduced, and state-of-the-art stochastic gradient solvers, such as mirror descent and extragradient, are used to design several novel RL algorithms. Theoretical analysis is given, including regularization, acceleration analysis, and finite-sample analysis, along with detailed empirical experiments to demonstrate the effectiveness of the proposed algorithms.
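As a rough illustration of the kind of first-order TD update such a framework builds on, here is a minimal TD(0) sketch with linear value-function approximation, whose per-step cost is linear in the number of features (no matrix inversion). It omits the regularization and saddle-point machinery the thesis develops; all names, step sizes, and the toy data are assumptions for illustration, not the thesis's algorithms.

```python
import numpy as np

def td0_linear(features, rewards, next_features, theta0, alpha=0.05, gamma=0.9):
    """TD(0) with linear value-function approximation V(s) = theta^T phi(s).
    Each update touches only the feature vector, so per-step cost is O(d)."""
    theta = theta0.copy()
    for phi, r, phi_next in zip(features, rewards, next_features):
        delta = r + gamma * theta @ phi_next - theta @ phi  # TD error
        theta += alpha * delta * phi                        # first-order update
    return theta

# Toy instance: a single self-looping state with reward 1 and a constant
# feature, so the true value is 1 / (1 - gamma) = 10.
phis = [np.array([1.0])] * 1000
theta = td0_linear(phis, [1.0] * 1000, phis, np.zeros(1))
```

The update uses only inner products and a scaled vector addition, which is the property that first-order regularized variants preserve while adding a proximal or mirror-descent step on top.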
Balancing Act: Constraining Disparate Impact in Sparse Models
Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate the disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that directly addresses the disparate impact of pruning: our formulation bounds the accuracy change between the dense and sparse models, for each sub-group. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups. Comment: Code available at https://github.com/merajhashemi/Balancing_Ac
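The per-group constrained formulation described above can be illustrated with a generic gradient descent-ascent step on a Lagrangian. This is not the paper's algorithm: the functions, step sizes, and toy instance below are hypothetical stand-ins for constraints of the form g_k(theta) <= 0 (e.g. the accuracy gap between dense and sparse models on sub-group k, minus a tolerance).

```python
import numpy as np

def lagrangian_step(lagrangian_grad_fn, constraint_fn, theta, lam,
                    lr_primal=0.01, lr_dual=0.01):
    """One gradient descent-ascent step for min_theta max_{lam >= 0}
    loss(theta) + lam^T g(theta). One multiplier per constraint, so the
    dual update scales linearly in the number of (sub-group) constraints."""
    g = constraint_fn(theta)                            # constraint values g_k(theta)
    theta = theta - lr_primal * lagrangian_grad_fn(theta, lam)  # primal descent
    lam = np.maximum(0.0, lam + lr_dual * g)            # projected dual ascent
    return theta, lam

# Toy instance: minimize theta^2 subject to theta >= 1, i.e. g = 1 - theta <= 0.
# The solution is theta = 1 with multiplier lam = 2.
theta, lam = np.array([0.0]), np.array([0.0])
for _ in range(2000):
    theta, lam = lagrangian_step(
        lambda t, l: 2.0 * t - l,   # gradient in theta of t^2 + l * (1 - t)
        lambda t: 1.0 - t,
        theta, lam)
```

A violated constraint (g_k > 0) drives its multiplier up, which in turn increases the pressure that constraint exerts on the primal update; satisfied constraints let their multipliers decay toward zero, which is what makes the multipliers readable as per-group violation signals.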