
    Is Learning in Games Good for the Learners?

    We consider a number of questions related to tradeoffs between reward and regret in repeated gameplay between two agents. To facilitate this, we introduce a notion of generalized equilibrium which allows for asymmetric regret constraints and yields polytopes of feasible values for each agent and pair of regret constraints, where we show that any such equilibrium is reachable by a pair of algorithms which maintain their regret guarantees against arbitrary opponents. As a central example, we highlight the case where one agent is no-swap and the other's regret is unconstrained. We show that this captures an extension of Stackelberg equilibria with a matching optimal value, and that there exists a wide class of games where a player can significantly increase their utility by deviating from a no-swap-regret algorithm against a no-swap learner (in fact, almost any game without pure Nash equilibria is of this form). Additionally, we make use of generalized equilibria to consider tradeoffs in terms of the opponent's algorithm choice. We give a tight characterization of the maximal reward obtainable against some no-regret learner, yet we also show a class of games in which this is bounded away from the value obtainable against the class of common "mean-based" no-regret algorithms. Finally, we consider the question of learning reward-optimal strategies via repeated play with a no-regret agent when the game is initially unknown. Again we show tradeoffs depending on the opponent's learning algorithm: the Stackelberg strategy is learnable in exponential time with any no-regret agent (and in polynomial time with any no-adaptive-regret agent) for any game where it is learnable via queries, and there are games where it is learnable in polynomial time against any no-swap-regret agent but requires exponential time against a mean-based no-regret agent.
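    To make the abstract's terminology concrete, here is a minimal sketch (not the paper's construction) of Hedge, a standard "mean-based" no-external-regret learner of the kind the abstract contrasts with no-swap-regret algorithms, playing a repeated matrix game against a fixed mixed strategy; all parameter values are illustrative.

```python
import numpy as np

def hedge_vs_fixed(A, q, T=5000, eta=0.05, seed=0):
    """Run the Hedge (multiplicative weights) learner against a fixed
    opponent strategy q in the matrix game A (rows = learner actions,
    columns = opponent actions, entries = learner payoffs in [-1, 1])."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    weights = np.ones(n)
    total_reward = 0.0
    cum_action_reward = np.zeros(n)  # what each fixed action would have earned
    for _ in range(T):
        p = weights / weights.sum()
        i = rng.choice(n, p=p)              # learner samples from its weights
        j = rng.choice(A.shape[1], p=q)     # opponent plays its fixed strategy
        total_reward += A[i, j]
        cum_action_reward += A[:, j]        # full-information feedback
        weights *= np.exp(eta * A[:, j])    # multiplicative-weights update
    external_regret = cum_action_reward.max() - total_reward
    return total_reward / T, external_regret / T

# Matching pennies has no pure Nash equilibrium, the setting the abstract
# highlights; against a uniform opponent, per-round regret shrinks toward 0.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
print(hedge_vs_fixed(A, q=np.array([0.5, 0.5])))
```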

    PaLM 2 Technical Report

    We introduce PaLM 2, a new state-of-the-art language model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture of objectives. Through extensive evaluations on English-language, multilingual, and reasoning tasks, we demonstrate that PaLM 2 has significantly improved quality on downstream tasks across different model sizes, while simultaneously exhibiting faster and more efficient inference than PaLM. This improved efficiency enables broader deployment while also allowing the model to respond faster, for a more natural pace of interaction. PaLM 2 demonstrates robust reasoning capabilities, exemplified by large improvements over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable performance on a suite of responsible AI evaluations, and enables inference-time control over toxicity without additional overhead or impact on other capabilities. Overall, PaLM 2 achieves state-of-the-art performance across a diverse set of tasks and capabilities. When discussing the PaLM 2 family, it is important to distinguish between pre-trained models (of various sizes), fine-tuned variants of these models, and the user-facing products that use these models. In particular, user-facing products typically include additional pre- and post-processing steps. Additionally, the underlying models may evolve over time. Therefore, one should not expect the performance of user-facing products to exactly match the results reported here.

    Low-dimensional Representations of Semantic Context in Language and the Brain

    We study the problem of finding low-dimensional shared representations of meaning for natural language and brain-response modalities for multiple-subject narrative story datasets (a portion of an episode of the Sherlock television program and a chapter of a Harry Potter book). These datasets have paired fMRI responses and textual descriptions. Our first goal is to determine whether any fMRI space can be learned across subjects that correlates well with semantic context vectors derived from recent, unsupervised methods in natural language understanding for embedding word meaning in R^n. Can distributed, low-dimensional representations of narrative context predict voxels? Our second goal is to determine whether a shared space between the fMRI voxels and the semantic word embeddings exists that can be used to decode brain states into coherent textual representations of thought. First, we were able to construct a fine-grained 300-dimensional embedding of the semantic context induced by a scene annotation dataset for Sherlock. Our primary positive result in this thesis is that the multi-view Shared Response Model produces a semantically relevant 20-dimensional space using views of multiple subjects watching Sherlock. This low-dimensional shared fMRI space is able to match fMRI responses to scenes with performance considerably above chance. Using the fMRI shared space over individual fMRI responses brings a large improvement in reconstructing voxels from semantic vectors, and suggests that other recent work in this area may benefit from applying the Shared Response Model.
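    The Shared Response Model at the heart of this thesis has a simple deterministic core: alternate between estimating a shared low-dimensional time series and fitting a per-subject orthogonal map to it. Below is a minimal numpy sketch of that alternating update under the deterministic formulation; the hyperparameters (k, n_iter) and synthetic data are illustrative, not the thesis's settings.

```python
import numpy as np

def srm_fit(X, k=20, n_iter=10, seed=0):
    """Deterministic Shared Response Model: find per-subject orthogonal maps
    W[i] (voxels_i x k) and a shared response S (k x timepoints) minimizing
    sum_i ||X[i] - W[i] @ S||^2 by alternating updates."""
    rng = np.random.default_rng(seed)
    # Initialize each W[i] with random orthonormal columns.
    W = [np.linalg.qr(rng.standard_normal((Xi.shape[0], k)))[0] for Xi in X]
    for _ in range(n_iter):
        # Shared response: average of the subjects' projected data.
        S = np.mean([Wi.T @ Xi for Wi, Xi in zip(W, X)], axis=0)
        # Orthogonal Procrustes update: W[i] = U @ Vt from SVD of X[i] @ S.T.
        for i, Xi in enumerate(X):
            U, _, Vt = np.linalg.svd(Xi @ S.T, full_matrices=False)
            W[i] = U @ Vt
    return W, S

# Synthetic check: two "subjects" generated from the same latent time series.
rng = np.random.default_rng(1)
S_true = rng.standard_normal((20, 100))
X = [rng.standard_normal((500, 20)) @ S_true for _ in range(2)]
W, S = srm_fit(X, k=20)
```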

    Deep Bayesian Nonparametric Learning of Rules and Plans from Demonstrations with a Learned Automaton Prior

    We introduce a method to learn imitative policies from expert demonstrations that are interpretable and manipulable. We achieve interpretability by modeling the interactions between high-level actions as an automaton with connections to formal logic. We achieve manipulability by integrating this automaton into planning, so that changes to the automaton have predictable effects on the learned behavior. These qualities allow a human user to first understand what the model has learned, and then either correct the learned behavior or zero-shot generalize to new, similar tasks. We build upon previous work by no longer requiring additional supervised information which is hard to collect in practice. We achieve this by using a deep Bayesian nonparametric hierarchical model. We test our model on several domains and also show results for a real-world implementation on a mobile robotic arm platform.
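    A hypothetical illustration of the interpretability/manipulability claim: the learned automaton is, at bottom, a transition table over high-level symbols, so a user can read it and edit it before re-planning. All state and symbol names below are invented for illustration; this is not the paper's learned model.

```python
# A learned FSA as a transition table: (state, observed symbol) -> next state.
fsa = {
    ("start",    "picked_up_cup"):    "carrying",
    ("carrying", "reached_table"):    "done",
    ("carrying", "entered_wet_area"): "stop",   # an over-cautious learned rule
}

# Zero-shot correction: relax the rule and re-plan. Because the planner
# consults the automaton, the edit changes behavior predictably.
fsa[("carrying", "entered_wet_area")] = "carrying"
```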

    Learning to Plan with Logical Automata

    This paper introduces the Logic-based Value Iteration Network (LVIN) framework, which combines imitation learning and logical automata to enable agents to learn complex behaviors from demonstrations. We address two problems with learning from expert knowledge: (1) how to generalize learned policies for a task to larger classes of tasks, and (2) how to account for erroneous demonstrations. Our LVIN model solves finite gridworld environments by instantiating a recurrent, convolutional neural network as a value iteration procedure over a learned Markov Decision Process (MDP) that factors into two MDPs: a small finite state automaton (FSA) corresponding to logical rules, and a larger MDP corresponding to motions in the environment. The parameters of LVIN (value function, reward map, FSA transitions, large MDP transitions) are approximately learned from expert trajectories. Since the model represents the learned rules as an FSA, the model is interpretable; since the FSA is integrated into planning, the behavior of the agent can be manipulated by modifying the FSA transitions. We demonstrate these abilities in several domains of interest, including a lunchbox-packing manipulation task and a driving domain.

    Funding: National Science Foundation (Grant 1723943); United States Office of Naval Research (Grant N000141812830); Air Force Office of Scientific Research (Contract FA8702-15-D-0001)
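    Planning over the factored model described in this abstract can be illustrated with ordinary value iteration on the product of a small FSA and a gridworld. The toy FSA, grid, and rewards below are invented inputs for the sketch, not the learned LVIN parameters.

```python
import numpy as np

def value_iteration(P, R, gamma=0.95, n_iter=200):
    """P: (S, A, S) transition probabilities over product states; R: (S,)."""
    V = np.zeros(P.shape[0])
    for _ in range(n_iter):
        Q = (P * (R + gamma * V)).sum(axis=2)  # one-step lookahead, (S, A)
        V = Q.max(axis=1)
    return V

# Toy product MDP: a 1-D "grid" of 4 cells and a 2-state FSA that flips
# from 0 ("key not held") to 1 ("key held") on reaching the key cell.
n_cells, n_fsa, n_a = 4, 2, 2          # actions: 0 = left, 1 = right
key_cell, goal_cell = 3, 0
idx = lambda f, c: f * n_cells + c     # product-state index (FSA x grid)

P = np.zeros((n_fsa * n_cells, n_a, n_fsa * n_cells))
R = np.zeros(n_fsa * n_cells)
R[idx(1, goal_cell)] = 1.0             # reward only once the FSA has advanced
for f in range(n_fsa):
    for c in range(n_cells):
        for a, dc in enumerate((-1, +1)):
            c2 = min(max(c + dc, 0), n_cells - 1)     # deterministic motion
            f2 = 1 if (f == 1 or c2 == key_cell) else 0   # FSA transition
            P[idx(f, c), a, idx(f2, c2)] = 1.0

V = value_iteration(P, R)
print(V.reshape(n_fsa, n_cells))       # values for each (FSA state, cell)
```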