
    Doubly Optimal No-Regret Learning in Monotone Games

    We consider online learning in multi-player smooth monotone games. Existing algorithms have limitations such as (1) being applicable only to strongly monotone games; (2) lacking the no-regret guarantee; (3) having only an asymptotic or slow $O(\frac{1}{\sqrt{T}})$ last-iterate convergence rate to a Nash equilibrium. While the $O(\frac{1}{\sqrt{T}})$ rate is tight for a large class of algorithms, including the well-studied extragradient and optimistic gradient algorithms, it is not optimal for all gradient-based algorithms. We propose the accelerated optimistic gradient (AOG) algorithm, the first doubly optimal no-regret learning algorithm for smooth monotone games. Namely, our algorithm achieves both (i) the optimal $O(\sqrt{T})$ regret in the adversarial setting under smooth and convex loss functions and (ii) the optimal $O(\frac{1}{T})$ last-iterate convergence rate to a Nash equilibrium in multi-player smooth monotone games. As a byproduct of the accelerated last-iterate convergence rate, we further show that each player suffers only an $O(\log T)$ individual worst-case dynamic regret, an exponential improvement over the previous state-of-the-art $O(\sqrt{T})$ bound.
    Comment: Published at ICML 2023. V2 incorporates reviewers' feedback
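
    As context for the rates above, here is a minimal sketch of the (non-accelerated) optimistic gradient baseline that the abstract compares against, run on a toy bilinear zero-sum game. The game matrix, step size, and horizon are illustrative assumptions, and the paper's accelerated AOG update itself is not reproduced here.

```python
import numpy as np

# Optimistic gradient on a toy bilinear zero-sum game f(x, y) = x^T A y.
# The game operator F(x, y) = (grad_x f, -grad_y f) = (A y, -A^T x) is monotone,
# and the unique Nash equilibrium is x = y = 0 when A is invertible.
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
A /= np.linalg.norm(A, 2)            # normalize so a fixed step size is safe

def F(x, y):
    return A @ y, -A.T @ x

eta = 0.3                            # illustrative step size
x, y = np.ones(3), np.ones(3)
gx_prev, gy_prev = F(x, y)

for t in range(2000):
    gx, gy = F(x, y)
    # Optimistic update: step along the extrapolated operator 2*F(z_t) - F(z_{t-1}).
    x = x - eta * (2 * gx - gx_prev)
    y = y - eta * (2 * gy - gy_prev)
    gx_prev, gy_prev = gx, gy

# The last-iterate residual shrinks toward zero as t grows.
print("operator norm at last iterate:", np.linalg.norm(np.concatenate(F(x, y))))
```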

    Bandit Online Learning of Nash Equilibria in Monotone Games

    We address online bandit learning of Nash equilibria in multi-agent convex games. We propose an algorithm whereby each agent uses only the observed values of her cost function at each jointly played action, with no information about the functional form of her own cost or of the other agents' costs or strategies. In contrast to past work, where convergent algorithms required strong monotonicity, we prove that the algorithm converges to a Nash equilibrium under a mere monotonicity assumption. The proposed algorithm extends the applicability of bandit learning to several games, including zero-sum convex games with possibly unbounded action spaces, mixed extensions of finite-action zero-sum games, and convex games with linear coupling constraints.
    Comment: arXiv admin note: text overlap with arXiv:1904.0188
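
    To make the bandit feedback model concrete, here is a minimal sketch in which each agent perturbs her own action and forms a one-point gradient estimate from the single cost value she observes. The toy two-player game, step-size schedule, and estimator below are illustrative assumptions; the paper's algorithm uses additional machinery to handle merely monotone games, which is not reproduced here.

```python
import numpy as np

# One-point bandit feedback on a toy two-player game with costs
# J1 = 0.5*||x1||^2 + x1.x2 and J2 = 0.5*||x2||^2 - x2.x1, whose unique
# Nash equilibrium is the origin. All constants are illustrative.
rng = np.random.default_rng(1)
d, delta = 2, 0.1                                # action dimension, perturbation radius

def J1(x1, x2):
    return 0.5 * x1 @ x1 + x1 @ x2

def J2(x1, x2):
    return 0.5 * x2 @ x2 - x2 @ x1

x1, x2 = np.ones(d), -np.ones(d)
for t in range(20000):
    eta = 1.0 / (t + 100)                        # diminishing step size
    u1 = rng.standard_normal(d); u1 /= np.linalg.norm(u1)
    u2 = rng.standard_normal(d); u2 /= np.linalg.norm(u2)
    a1, a2 = x1 + delta * u1, x2 + delta * u2    # jointly played actions
    # Each agent observes only her own realized cost at the joint action.
    g1 = (d / delta) * J1(a1, a2) * u1           # one-point estimate of grad_{x1} J1
    g2 = (d / delta) * J2(a1, a2) * u2           # one-point estimate of grad_{x2} J2
    x1, x2 = x1 - eta * g1, x2 - eta * g2

# Iterates drift toward the Nash equilibrium at the origin (up to estimator noise).
print("||x1||, ||x2|| at the last iterate:", np.linalg.norm(x1), np.linalg.norm(x2))
```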

    Asymmetric Feedback Learning in Online Convex Games

    This paper considers online convex games involving multiple agents that aim to minimize their own cost functions using locally available feedback. A common assumption in the study of such games is that the agents are symmetric, meaning that they have access to the same type of information or feedback. Here we lift this assumption, which is often violated in practice, and instead consider asymmetric agents: specifically, we assume some agents have access to first-order gradient feedback while others have access only to zeroth-order oracles (cost function evaluations). We propose an asymmetric feedback learning algorithm that combines the two feedback mechanisms. We analyze the regret and the Nash equilibrium convergence of this algorithm for convex games and strongly monotone games, respectively. Specifically, we show that our algorithm always performs between the pure first-order and pure zeroth-order methods, and can match the performance of either extreme by adjusting the number of agents with access to zeroth-order oracles; hence the pure first-order and zeroth-order methods are recovered as special cases. We provide numerical experiments on an online market problem, for both deterministic and risk-averse games, to demonstrate the performance of the proposed algorithm.
    Comment: 16 pages
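
    To make the asymmetry concrete, here is a minimal sketch with one first-order agent and one zeroth-order agent on a toy strongly monotone game. The game, step sizes, and one-point estimator are illustrative assumptions and do not reproduce the paper's exact algorithm or its online market experiments.

```python
import numpy as np

# Asymmetric feedback: agent 1 queries a first-order oracle (exact partial
# gradient), while agent 2 only evaluates her cost and builds a one-point
# zeroth-order estimate. Toy game: J1 = 0.5*||x1||^2 + x1.x2,
# J2 = 0.5*||x2||^2 - x2.x1, with the Nash equilibrium at the origin.
rng = np.random.default_rng(2)
d, delta = 2, 0.1

def J2(x1, x2):
    return 0.5 * x2 @ x2 - x2 @ x1

x1, x2 = np.ones(d), -np.ones(d)
for t in range(10000):
    eta = 1.0 / (t + 100)                               # diminishing step size
    g1 = x1 + x2                                        # first-order feedback for agent 1
    u = rng.standard_normal(d); u /= np.linalg.norm(u)
    g2 = (d / delta) * J2(x1, x2 + delta * u) * u       # zeroth-order feedback for agent 2
    x1, x2 = x1 - eta * g1, x2 - eta * g2

# Agent 1's updates are noise-free while agent 2's are noisy; both drift toward
# the equilibrium, illustrating performance between the pure first- and zeroth-order cases.
print("||x1||, ||x2|| at the last iterate:", np.linalg.norm(x1), np.linalg.norm(x2))
```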

    Adaptive, Doubly Optimal No-Regret Learning in Strongly Monotone and Exp-Concave Games with Gradient Feedback

    Online gradient descent (OGD) is well known to be doubly optimal under strong convexity or strong monotonicity assumptions: (1) in the single-agent setting, it achieves an optimal regret of $\Theta(\log T)$ for strongly convex cost functions; and (2) in the multi-agent setting of strongly monotone games, with each agent employing OGD, the joint action converges in the last iterate to the unique Nash equilibrium at an optimal rate of $\Theta(\frac{1}{T})$. While these finite-time guarantees highlight its merits, OGD has the drawback that it requires knowing the strong convexity/monotonicity parameters. In this paper, we design a fully adaptive OGD algorithm, AdaOGD, that does not require a priori knowledge of these parameters. In the single-agent setting, our algorithm achieves $O(\log^2 T)$ regret under strong convexity, which is optimal up to a log factor. Further, if each agent employs AdaOGD in strongly monotone games, the joint action converges in a last-iterate sense to the unique Nash equilibrium at a rate of $O(\frac{\log^3 T}{T})$, again optimal up to log factors. We illustrate our algorithms in a learning version of the classical newsvendor problem, where, due to lost sales, only (noisy) gradient feedback is observable. Our results immediately yield the first feasible and near-optimal algorithm for both the single-retailer and multi-retailer settings. We also extend our results to the more general setting of exp-concave cost functions and games, using the online Newton step (ONS) algorithm.
    Comment: Accepted by Operations Research; 47 pages
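
    The newsvendor illustration mentioned in the abstract can be sketched as follows: with lost sales, the retailer observes whether a stockout occurred, and that indicator is exactly a noisy subgradient of the per-period cost. The step-size rule below is a generic adaptive (AdaGrad-style) choice that needs no strong-convexity parameter; it is a stand-in for, not a reproduction of, the paper's AdaOGD rule, and the cost parameters and demand distribution are illustrative assumptions.

```python
import numpy as np

# Repeated newsvendor with lost sales: per-period cost h*(q - D)^+ + b*(D - q)^+,
# whose subgradient at the ordered quantity q is h if q > D (leftover stock)
# and -b if D > q (stockout) -- observable from the stockout indicator alone.
rng = np.random.default_rng(3)
h, b = 1.0, 3.0                  # holding and lost-sales penalties (illustrative)
q = 0.0                          # order quantity
grad_sq_sum = 1e-8               # running sum for the adaptive step size

for t in range(5000):
    demand = rng.exponential(scale=10.0)     # unknown demand distribution
    stockout = demand > q                    # the only feedback that is needed
    g = -b if stockout else h                # noisy subgradient of the period cost at q
    grad_sq_sum += g * g
    q = max(0.0, q - (20.0 / np.sqrt(grad_sq_sum)) * g)   # adaptive step, no strong-convexity parameter

# The optimal order quantity satisfies F(q*) = b / (b + h) = 0.75 for this demand law.
print("learned q:", q, "optimal q:", -10.0 * np.log(1.0 - b / (b + h)))
```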