Distributed Learning for Stochastic Generalized Nash Equilibrium Problems
This work examines a stochastic formulation of the generalized Nash
equilibrium problem (GNEP) where agents are subject to randomness in the
environment whose statistical distribution is unknown. We focus on fully-distributed
online learning by agents and employ penalized individual cost functions to
deal with coupled constraints. Three stochastic gradient strategies are
developed with constant step-sizes. We allow the agents to use heterogeneous
step-sizes and show that the penalty solution is able to approach the Nash
equilibrium in a stable manner within $O(\mu_{\max})$, for a small maximum
step-size value $\mu_{\max}$ and sufficiently large penalty parameters. The
operation of the algorithm is illustrated by considering the network Cournot
competition problem.
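
A minimal sketch of the kind of update such penalized strategies build on is given below, assuming a quadratic penalty on a single coupled inequality constraint g(x) <= 0; the function names, the penalty form, and the Cournot parameters in the usage example are illustrative assumptions, not the paper's exact algorithm.

```python
# Sketch: penalty-based stochastic gradient play with heterogeneous
# constant step-sizes (illustrative, not the paper's exact strategies).
import numpy as np

def penalized_sg_play(J_grad, g, g_grad, x0, mu, rho, num_iters=20000):
    """J_grad[k](x, n): stochastic gradient of agent k's cost w.r.t. x[k];
    g(x) <= 0 is the coupled constraint; mu[k] are constant step-sizes;
    rho is the penalty parameter (larger -> closer to feasibility)."""
    x = np.array(x0, dtype=float)
    for _ in range(num_iters):
        noise = np.random.randn(len(x))  # randomness of unknown distribution
        for k in range(len(x)):
            # gradient of the quadratic penalty (rho/2) * max(g(x), 0)^2
            pen = rho * max(g(x), 0.0) * g_grad[k](x)
            x[k] -= mu[k] * (J_grad[k](x, noise[k]) + pen)
    return x

# Usage: 2-firm Cournot game with a shared capacity constraint sum(x) <= 1.
a, b, c, cap = 2.0, 1.0, 0.2, 1.0
J_grad = [lambda x, n, k=k: -(a - b * (x.sum() + x[k])) + c + 0.1 * n
          for k in range(2)]  # noisy gradient of each firm's negative profit
x_star = penalized_sg_play(J_grad, g=lambda x: x.sum() - cap,
                           g_grad=[lambda x: 1.0, lambda x: 1.0],
                           x0=[0.5, 0.5], mu=[1e-3, 2e-3], rho=10.0)
```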
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Foundation models are first pre-trained on vast unsupervised datasets and
then fine-tuned on labeled data. Reinforcement learning, notably from human
feedback (RLHF), can further align the network with the intended usage. Yet the
imperfections in the proxy reward may hinder the training and lead to
suboptimal results; the diversity of objectives in real-world tasks and human
opinions exacerbate the issue. This paper proposes embracing the heterogeneity
of diverse rewards by following a multi-policy strategy. Rather than focusing
on a single a priori reward, we aim for Pareto-optimal generalization across
the entire space of preferences. To this end, we propose rewarded soup, first
specializing multiple networks independently (one for each proxy reward) and
then interpolating their weights linearly. This succeeds empirically because we
show that the weights remain linearly connected when fine-tuned on diverse
rewards from a shared pre-trained initialization. We demonstrate the
effectiveness of our approach for text-to-text (summarization, Q&A, helpful
assistant, review), text-image (image captioning, text-to-image generation,
visual grounding, VQA), and control (locomotion) tasks. We hope to enhance the
alignment of deep models and improve how they interact with the world in all
its diversity.
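
The interpolation step at the core of this recipe is simple to express; below is a minimal sketch, assuming every network was fine-tuned from the same pre-trained initialization so their state dicts align key-by-key (the function name and preference weights are illustrative, not the paper's code).

```python
# Sketch: linear interpolation of reward-specialized fine-tunes
# (illustrative; assumes identical architectures and parameter names).
import torch

def rewarded_soup(state_dicts, lambdas):
    """Return theta = sum_i lambdas[i] * theta_i over matching parameters."""
    assert abs(sum(lambdas) - 1.0) < 1e-6, "preference weights should sum to 1"
    return {name: sum(lam * sd[name].float()
                      for lam, sd in zip(lambdas, state_dicts))
            for name in state_dicts[0]}

# Usage: trade off two proxy rewards at deployment time, no retraining.
# model.load_state_dict(rewarded_soup([sd_summarize, sd_helpful], [0.7, 0.3]))
```

Sweeping the preference weights over the simplex then traces out an approximation of the Pareto front without any further training.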
Knowledge Distillation Performs Partial Variance Reduction
Knowledge distillation is a popular approach for enhancing the performance of
``student'' models, with lower representational capacity, by taking advantage
of more powerful ``teacher'' models. Despite its apparent simplicity and
widespread use, the underlying mechanics behind knowledge distillation (KD) are
still not fully understood. In this work, we shed new light on the inner
workings of this method, by examining it from an optimization perspective. We
show that, in the context of linear and deep linear models, KD can be
interpreted as a novel type of stochastic variance reduction mechanism. We
provide a detailed convergence analysis of the resulting dynamics, which holds
under standard assumptions for both strongly-convex and non-convex losses,
showing that KD acts as a form of \emph{partial variance reduction}, which can
reduce the stochastic gradient noise, but may not eliminate it completely,
depending on the properties of the ``teacher'' model. Our analysis puts further
emphasis on the need for careful parametrization of KD, in particular w.r.t.
the weighting of the distillation loss, and is validated empirically on both
linear models and deep neural networks.
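
For intuition, here is a minimal sketch of the kind of weighted objective such an analysis considers for (deep) linear models under squared loss; lambda_kd plays the role of the distillation weight whose parametrization matters, and the helper name and loss choice are assumptions rather than the paper's exact setup.

```python
# Sketch: distillation objective with explicit weighting (illustrative).
import torch
import torch.nn.functional as F

def kd_loss(student_out, teacher_out, target, lambda_kd=0.5):
    """Convex combination of supervised loss and distillation loss.
    lambda_kd = 0 recovers plain supervised training; larger values replace
    more of the noisy label signal with the teacher's predictions, which is
    the source of the (partial) variance reduction -- residual teacher error
    keeps the gradient noise from vanishing entirely."""
    return ((1 - lambda_kd) * F.mse_loss(student_out, target)
            + lambda_kd * F.mse_loss(student_out, teacher_out.detach()))
```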