69 research outputs found
Relative Lipschitzness in Extragradient Methods and a Direct Recipe for Acceleration
We show that standard extragradient methods (i.e., mirror prox [Arkadi Nemirovski, 2004] and dual extrapolation [Yurii Nesterov, 2007]) recover optimal accelerated rates for first-order minimization of smooth convex functions. To obtain this result, we provide a fine-grained characterization of the convergence rates of extragradient methods for solving monotone variational inequalities in terms of a natural condition we call relative Lipschitzness. We further generalize this framework to handle local and randomized notions of relative Lipschitzness, and thereby recover rates for box-constrained ℓ_∞ regression based on area convexity [Jonah Sherman, 2017] and complexity bounds achieved by accelerated (randomized) coordinate descent [Zeyuan {Allen Zhu} et al., 2016; Yurii Nesterov and Sebastian U. Stich, 2017] for smooth convex function minimization.
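The extragradient step underlying this abstract can be sketched in its simplest Euclidean form (mirror prox with the squared-norm divergence): take a look-ahead step using the operator at the current point, then update the current point using the operator evaluated at the look-ahead point. This is a minimal illustration, not the paper's general mirror setup; the operator, step size, and iteration count below are assumptions chosen for the toy example.

```python
import numpy as np

def extragradient(F, x0, eta, steps):
    """Euclidean extragradient (mirror prox) sketch for a monotone operator F:
    extrapolate with a half-step, then update from the original point using
    the operator evaluated at the half-step."""
    x = x0.copy()
    for _ in range(steps):
        x_half = x - eta * F(x)       # extrapolation (look-ahead) step
        x = x - eta * F(x_half)       # update using the look-ahead operator value
    return x

# Toy monotone instance: the bilinear saddle point min_u max_v u*v,
# whose operator F(u, v) = (v, -u) is monotone but is not a gradient field,
# so plain gradient descent-ascent would spiral outward while
# extragradient converges to the solution (0, 0).
F = lambda z: np.array([z[1], -z[0]])
z = extragradient(F, np.array([1.0, 1.0]), eta=0.1, steps=1000)
```

The look-ahead evaluation is what distinguishes the method from plain gradient descent-ascent: on rotational operators like the one above, the single-step method diverges while the two-step method contracts.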
Algorithms for First-order Sparse Reinforcement Learning
This thesis presents a general framework for first-order temporal difference learning algorithms with an in-depth theoretical analysis. The main contribution of the thesis is the development and design of a family of first-order regularized temporal-difference (TD) algorithms using stochastic approximation and stochastic optimization. To scale up TD algorithms to large-scale problems, we use first-order optimization to explore regularized TD methods using linear value function approximation. Previous regularized TD methods often use matrix inversion, which requires cubic time and quadratic memory complexity. We propose two algorithms, sparse-Q and RO-TD, for on-policy and off-policy learning, respectively. These two algorithms exhibit linear computational complexity per step, and their asymptotic convergence guarantees and error bound analyses are given using stochastic optimization and stochastic approximation. The second major contribution of the thesis is the establishment of a unified general framework for stochastic-gradient-based temporal-difference learning algorithms that use proximal gradient methods. The primal-dual saddle-point formulation is introduced, and state-of-the-art stochastic gradient solvers, such as mirror descent and extragradient, are used to design several novel RL algorithms. Theoretical analysis is given, including regularization, acceleration analysis, and finite-sample analysis, along with detailed empirical experiments to demonstrate the effectiveness of the proposed algorithms.
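As a rough illustration of the kind of first-order TD update such a framework builds on, here is a minimal TD(0) sketch with linear value-function approximation, whose per-step cost is linear in the number of features (no matrix inversion). It omits the regularization and saddle-point machinery the thesis develops; all names, step sizes, and the toy data are assumptions for illustration, not the thesis's algorithms.

```python
import numpy as np

def td0_linear(features, rewards, next_features, theta0, alpha=0.05, gamma=0.9):
    """TD(0) with linear value-function approximation V(s) = theta^T phi(s).
    Each update touches only the feature vector, so per-step cost is O(d)."""
    theta = theta0.copy()
    for phi, r, phi_next in zip(features, rewards, next_features):
        delta = r + gamma * theta @ phi_next - theta @ phi  # TD error
        theta += alpha * delta * phi                        # first-order update
    return theta

# Toy instance: a single self-looping state with reward 1 and a constant
# feature, so the true value is 1 / (1 - gamma) = 10.
phis = [np.array([1.0])] * 1000
theta = td0_linear(phis, [1.0] * 1000, phis, np.zeros(1))
```

The update uses only inner products and a scaled vector addition, which is the property that first-order regularized variants preserve while adding a proximal or mirror-descent step on top.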
Balancing Act: Constraining Disparate Impact in Sparse Models
Model pruning is a popular approach to enable the deployment of large deep learning models on edge devices with restricted computational or storage capacities. Although sparse models achieve performance comparable to that of their dense counterparts at the level of the entire dataset, they exhibit high accuracy drops for some data sub-groups. Existing methods to mitigate the disparate impact induced by pruning (i) rely on surrogate metrics that address the problem indirectly and have limited interpretability; or (ii) scale poorly with the number of protected sub-groups in terms of computational cost. We propose a constrained optimization approach that directly addresses the disparate impact of pruning: our formulation bounds the accuracy change between the dense and sparse models, for each sub-group. This choice of constraints provides an interpretable success criterion to determine if a pruned model achieves acceptable disparity levels. Experimental results demonstrate that our technique scales reliably to problems involving large models and hundreds of protected sub-groups. Comment: Code available at https://github.com/merajhashemi/Balancing_Ac
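The per-group constrained formulation described above can be illustrated with a generic gradient descent-ascent step on a Lagrangian. This is not the paper's algorithm: the functions, step sizes, and toy instance below are hypothetical stand-ins for constraints of the form g_k(theta) <= 0 (e.g. the accuracy gap between dense and sparse models on sub-group k, minus a tolerance).

```python
import numpy as np

def lagrangian_step(lagrangian_grad_fn, constraint_fn, theta, lam,
                    lr_primal=0.01, lr_dual=0.01):
    """One gradient descent-ascent step for min_theta max_{lam >= 0}
    loss(theta) + lam^T g(theta). One multiplier per constraint, so the
    dual update scales linearly in the number of (sub-group) constraints."""
    g = constraint_fn(theta)                            # constraint values g_k(theta)
    theta = theta - lr_primal * lagrangian_grad_fn(theta, lam)  # primal descent
    lam = np.maximum(0.0, lam + lr_dual * g)            # projected dual ascent
    return theta, lam

# Toy instance: minimize theta^2 subject to theta >= 1, i.e. g = 1 - theta <= 0.
# The solution is theta = 1 with multiplier lam = 2.
theta, lam = np.array([0.0]), np.array([0.0])
for _ in range(2000):
    theta, lam = lagrangian_step(
        lambda t, l: 2.0 * t - l,   # gradient in theta of t^2 + l * (1 - t)
        lambda t: 1.0 - t,
        theta, lam)
```

A violated constraint (g_k > 0) drives its multiplier up, which in turn increases the pressure that constraint exerts on the primal update; satisfied constraints let their multipliers decay toward zero, which is what makes the multipliers readable as per-group violation signals.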