
    Is Learning in Games Good for the Learners?

    We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of generalized equilibrium which allows for asymmetric regret constraints and yields polytopes of feasible values for each agent and pair of regret constraints, where we show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case where one agent is no-swap and the other's regret is unconstrained. We show that this captures an extension of Stackelberg equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we make use of generalized equilibria to consider tradeoffs in terms of the opponent's algorithm choice. We give a tight characterization of the maximal reward obtainable against some no-regret learner, yet we also show a class of games in which this is bounded away from the value obtainable against the class of common "mean-based" no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-adaptive-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.
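    To make the abstract's terminology concrete, here is a minimal sketch (not the paper's construction) of Hedge, a standard "mean-based" no-external-regret learner of the kind the abstract contrasts with no-swap-regret algorithms, playing a repeated matrix game against a fixed mixed strategy; all parameter values are illustrative.

```python
import numpy as np

def hedge_vs_fixed(A, q, T=5000, eta=0.05, seed=0):
    """Run the Hedge (multiplicative weights) learner against a fixed
    opponent strategy q in the matrix game A (rows = learner actions,
    columns = opponent actions, entries = learner payoffs in [-1, 1])."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    weights = np.ones(n)
    total_reward = 0.0
    cum_action_reward = np.zeros(n)  # what each fixed action would have earned
    for _ in range(T):
        p = weights / weights.sum()
        i = rng.choice(n, p=p)              # learner samples from its weights
        j = rng.choice(A.shape[1], p=q)     # opponent plays its fixed strategy
        total_reward += A[i, j]
        cum_action_reward += A[:, j]        # full-information feedback
        weights *= np.exp(eta * A[:, j])    # multiplicative-weights update
    external_regret = cum_action_reward.max() - total_reward
    return total_reward / T, external_regret / T

# Matching pennies has no pure Nash equilibrium, the setting the abstract
# highlights; against a uniform opponent, per-round regret shrinks toward 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(hedge_vs_fixed(A, q=np.array([0.5, 0.5])))
```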

    PaLM 2 Technical Report

    We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English-language, multilingual, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference than PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities, exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported here.

    Low-dimensional Representations of Semantic Context in Language and the Brain

    We study the problem of finding low-dimensional shared representations of meaning for natural language and brain-response modalities for multiple-subject narrative story datasets (a portion of an episode of the Sherlock television program and a chapter of a Harry Potter book). These datasets have paired fMRI responses and textual descriptions. Our first goal is to determine whether any fMRI space can be learned across subjects that correlates well with semantic context vectors derived from recent, unsupervised methods in natural language understanding for embedding word meaning in R^n. Can distributed, low-dimensional representations of narrative context predict voxels? Our second goal is to determine whether a shared space between the fMRI voxels and the semantic word embeddings exists that can be used to decode brain states into coherent textual representations of thought. First, we were able to construct a fine-grained 300-dimensional embedding of the semantic context induced by a scene annotation dataset for Sherlock. Our primary positive result in this thesis is that the multi-view Shared Response Model produces a semantically relevant 20-dimensional space using views of multiple subjects watching Sherlock. This low-dimensional shared fMRI space is able to match fMRI responses to scenes with performance considerably above chance. Using the fMRI shared space over individual fMRI responses brings a large improvement in reconstructing voxels from semantic vectors, and suggests that other recent work in this area may benefit from applying the Shared Response Model.
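    The Shared Response Model at the heart of this thesis has a simple deterministic core: alternate between estimating a shared low-dimensional time series and fitting a per-subject orthogonal map to it. Below is a minimal numpy sketch of that alternating update under the deterministic formulation; the hyperparameters (k, n_iter) and synthetic data are illustrative, not the thesis's settings.

```python
import numpy as np

def srm_fit(X, k=20, n_iter=10, seed=0):
    """Deterministic Shared Response Model: find per-subject orthogonal maps
    W[i] (voxels_i x k) and a shared response S (k x timepoints) minimizing
    sum_i ||X[i] - W[i] @ S||^2 by alternating updates."""
    rng = np.random.default_rng(seed)
    # Initialize each W[i] with random orthonormal columns.
    W = [np.linalg.qr(rng.standard_normal((Xi.shape[0], k)))[0] for Xi in X]
    for _ in range(n_iter):
        # Shared response: average of the subjects' projected data.
        S = np.mean([Wi.T @ Xi for Wi, Xi in zip(W, X)], axis=0)
        # Orthogonal Procrustes update: W[i] = U @ Vt from SVD of X[i] @ S.T.
        for i, Xi in enumerate(X):
            U, _, Vt = np.linalg.svd(Xi @ S.T, full_matrices=False)
            W[i] = U @ Vt
    return W, S

# Synthetic check: two "subjects" generated from the same latent time series.
rng = np.random.default_rng(1)
S_true = rng.standard_normal((20, 100))
X = [rng.standard_normal((500, 20)) @ S_true for _ in range(2)]
W, S = srm_fit(X, k=20)
```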

    Deep Bayesian Nonparametric Learning of Rules and Plans from Demonstrations with a Learned Automaton Prior

    We introduce a method to learn imitative policies from expert demonstrations that are interpretable and manipulable. We achieve interpretability by modeling the interactions between high-level actions as an automaton with connections to formal logic. We achieve manipulability by integrating this automaton into planning, so that changes to the automaton have predictable effects on the learned behavior. These qualities allow a human user to first understand what the model has learned, and then either correct the learned behavior or zero-shot generalize to new, similar tasks. We build upon previous work by no longer requiring additional supervised information which is hard to collect in practice. We achieve this by using a deep Bayesian nonparametric hierarchical model. We test our model on several domains and also show results for a real-world implementation on a mobile robotic arm platform.
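    A hypothetical illustration of the interpretability/manipulability claim: the learned automaton is, at bottom, a transition table over high-level symbols, so a user can read it and edit it before re-planning. All state and symbol names below are invented for illustration; this is not the paper's learned model.

```python
# A learned FSA as a transition table: (state, observed symbol) -> next state.
fsa = {
    ("start",    "picked_up_cup"):    "carrying",
    ("carrying", "reached_table"):    "done",
    ("carrying", "entered_wet_area"): "stop",   # an over-cautious learned rule
}

# Zero-shot correction: relax the rule and re-plan. Because the planner
# consults the automaton, the edit changes behavior predictably.
fsa[("carrying", "entered_wet_area")] = "carrying"
```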

    Learning to Plan with Logical Automata

    This paper introduces the Logic-based Value Iteration Network (LVIN) framework, which combines imitation learning and logical automata to enable agents to learn complex behaviors from demonstrations. We address two problems with learning from expert knowledge: (1) how to generalize learned policies for a task to larger classes of tasks, and (2) how to account for erroneous demonstrations. Our LVIN model solves finite gridworld environments by instantiating a recurrent, convolutional neural network as a value iteration procedure over a learned Markov Decision Process (MDP) that factors into two MDPs: a small finite state automaton (FSA) corresponding to logical rules, and a larger MDP corresponding to motions in the environment. The parameters of LVIN (value function, reward map, FSA transitions, large MDP transitions) are approximately learned from expert trajectories. Since the model represents the learned rules as an FSA, the model is interpretable; since the FSA is integrated into planning, the behavior of the agent can be manipulated by modifying the FSA transitions. We demonstrate these abilities in several domains of interest, including a lunchbox-packing manipulation task and a driving domain.

    Funding: National Science Foundation (Grant 1723943); United States Office of Naval Research (Grant N000141812830); Air Force Office of Scientific Research (Contract FA8702-15-D-0001)
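    Planning over the factored model described in this abstract can be illustrated with ordinary value iteration on the product of a small FSA and a gridworld. The toy FSA, grid, and rewards below are invented inputs for the sketch, not the learned LVIN parameters.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, n_iter=200):
    """P: (S, A, S) transition probabilities over product states; R: (S,)."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iter):
        Q = (P * (R + gamma * V)).sum(axis=2)  # one-step lookahead, (S, A)
        V = Q.max(axis=1)
    return V

# Toy product MDP: a 1-D "grid" of 4 cells and a 2-state FSA that flips
# from 0 ("key not held") to 1 ("key held") on reaching the key cell.
n_cells, n_fsa, n_a = 4, 2, 2          # actions: 0 = left, 1 = right
key_cell, goal_cell = 3, 0
idx = lambda f, c: f * n_cells + c     # product-state index (FSA x grid)

P = np.zeros((n_fsa * n_cells, n_a, n_fsa * n_cells))
R = np.zeros(n_fsa * n_cells)
R[idx(1, goal_cell)] = 1.0             # reward only once the FSA has advanced
for f in range(n_fsa):
    for c in range(n_cells):
        for a, dc in enumerate((-1, +1)):
            c2 = min(max(c + dc, 0), n_cells - 1)     # deterministic motion
            f2 = 1 if (f == 1 or c2 == key_cell) else 0   # FSA transition
            P[idx(f, c), a, idx(f2, c2)] = 1.0

V = value_iteration(P, R)
print(V.reshape(n_fsa, n_cells))       # values for each (FSA state, cell)
```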