Is Learning in Games Good for the Learners?
We consider a number of questions related to tradeoffs between reward and
regret in repeated gameplay between two agents. To facilitate this, we
introduce a notion of generalized equilibrium which allows for
asymmetric regret constraints, and yields polytopes of feasible values for each
agent and pair of regret constraints, where we show that any such equilibrium
is reachable by a pair of algorithms which maintain their regret guarantees
against arbitrary opponents. As a central example, we highlight the case where one
agent is no-swap and the other's regret is unconstrained. We show that this
captures an extension of Stackelberg equilibria with a matching
optimal value, and that there exists a wide class of games where a player can
significantly increase their utility by deviating from a no-swap-regret
algorithm against a no-swap learner (in fact, almost any game without pure Nash
equilibria is of this form). Additionally, we make use of generalized
equilibria to consider tradeoffs in terms of the opponent's algorithm choice.
We give a tight characterization for the maximal reward obtainable against
some no-regret learner, yet we also show a class of games in which
this is bounded away from the value obtainable against the class of common
"mean-based" no-regret algorithms. Finally, we consider the question of
learning reward-optimal strategies via repeated play with a no-regret agent
when the game is initially unknown. Again we show tradeoffs depending on the
opponent's learning algorithm: the Stackelberg strategy is learnable in
exponential time with any no-regret agent (and in polynomial time with any
no-adaptive-regret agent) for any game where it is learnable via
queries, and there are games where it is learnable in polynomial time against
any no-swap-regret agent but requires exponential time against a mean-based
no-regret agent.
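The repeated-play setting studied here can be made concrete with a minimal sketch (ours, not from the paper) of two agents running Hedge, a standard no-external-regret algorithm, on matching pennies, a game with no pure Nash equilibrium; the function name `hedge_play` and all parameter values are illustrative assumptions.

```python
import numpy as np

def hedge_play(A, B, T=2000, eta=0.05, seed=0):
    """Both players run Hedge (multiplicative weights) with full-information
    feedback on payoff matrices A (row player) and B (column player)."""
    rng = np.random.default_rng(seed)
    n, m = A.shape
    wr, wc = np.zeros(n), np.zeros(m)      # cumulative per-action payoffs
    reward_r = 0.0
    col_counts = np.zeros(m)
    for _ in range(T):
        pr = np.exp(eta * (wr - wr.max())); pr /= pr.sum()
        pc = np.exp(eta * (wc - wc.max())); pc /= pc.sum()
        i, j = rng.choice(n, p=pr), rng.choice(m, p=pc)
        reward_r += A[i, j]
        col_counts[j] += 1
        wr += A[:, j]                      # payoff of each row action vs. j
        wc += B[i, :]
    best_fixed = (A @ col_counts).max()    # best fixed row action in hindsight
    return (best_fixed - reward_r) / T     # row player's average regret

# matching pennies: zero-sum, no pure Nash equilibrium
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
avg_regret = hedge_play(A, -A)             # decays roughly like O(1/sqrt(T))
```

The paper's question is precisely whether an agent should commit to such a regret guarantee, or deviate from it, depending on what the opponent runs.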
PaLM 2 Technical Report
We introduce PaLM 2, a new state-of-the-art language model that has better
multilingual and reasoning capabilities and is more compute-efficient than its
predecessor PaLM. PaLM 2 is a Transformer-based model trained using a mixture
of objectives. Through extensive evaluations on English and multilingual
language, and reasoning tasks, we demonstrate that PaLM 2 has significantly
improved quality on downstream tasks across different model sizes, while
simultaneously exhibiting faster and more efficient inference compared to PaLM.
This improved efficiency enables broader deployment while also allowing the
model to respond faster, for a more natural pace of interaction. PaLM 2
demonstrates robust reasoning capabilities exemplified by large improvements
over PaLM on BIG-Bench and other reasoning tasks. PaLM 2 exhibits stable
performance on a suite of responsible AI evaluations, and enables
inference-time control over toxicity without additional overhead or impact on
other capabilities. Overall, PaLM 2 achieves state-of-the-art performance
across a diverse set of tasks and capabilities.
When discussing the PaLM 2 family, it is important to distinguish between
pre-trained models (of various sizes), fine-tuned variants of these models, and
the user-facing products that use these models. In particular, user-facing
products typically include additional pre- and post-processing steps.
Additionally, the underlying models may evolve over time. Therefore, one should
not expect the performance of user-facing products to exactly match the results
reported in this report.
Resource-Efficient Methods in Machine Learning
In this thesis, we consider resource limitations on machine learning algorithms in a variety of settings. In the first two chapters, we study how to learn nonlinear model classes (monomials and neural nets) which are structured in various ways -- we consider sparse monomials and deep neural nets whose weight matrices are low-rank, respectively. These kinds of restrictions on the model class lead to gains in resource efficiency -- sparse and low-rank models are computationally easier to deploy and train.
We prove that sparse nonlinear monomials are easier to learn (smaller sample complexity) while still remaining computationally efficient to both estimate and deploy, and we give both theoretical and empirical evidence for the benefit of novel nonlinear initialization schemes for low-rank deep networks. In both cases, we showcase a blessing of nonlinearity -- sparse monomials are in some sense easier to learn compared to a linear class, and the prior state-of-the-art linear low-rank initialization methods for deep networks are inferior to our proposed nonlinear method for initialization. To achieve our theoretical results, we often make use of the theory of Hermite polynomials -- an orthogonal function basis over the Gaussian measure.
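The parameter savings behind the low-rank restriction can be sketched directly; this is an illustrative example only (it shows a plain SVD factorization, not the thesis's nonlinear initialization scheme, and the function name is ours).

```python
import numpy as np

# A dense layer W in R^{m x n} costs m*n parameters; a rank-r factorization
# W ~= U @ V with U in R^{m x r}, V in R^{r x n} costs only r*(m + n).
def factorize_low_rank(W, r):
    U_, s, Vt = np.linalg.svd(W, full_matrices=False)
    U = U_[:, :r] * s[:r]          # absorb singular values into U
    V = Vt[:r, :]
    return U, V

rng = np.random.default_rng(0)
# a synthetic weight matrix that is exactly rank 4
W = rng.standard_normal((64, 4)) @ rng.standard_normal((4, 128))
U, V = factorize_low_rank(W, 4)
dense_params = W.size                 # 64 * 128 = 8192
lowrank_params = U.size + V.size      # 4 * (64 + 128) = 768
```

When the true matrix is (close to) low-rank, the factorized form reproduces it while storing roughly an order of magnitude fewer parameters.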
In the last chapter, we consider resource limitations in an online streaming setting. In particular, we consider how many data points from an oblivious adversarial stream we must store in one pass over the stream in order to output an additive approximation to the Support Vector Machine (SVM) objective, and we prove stronger lower bounds on the memory complexity of this problem.
Low-dimensional Representations of Semantic Context in Language and the Brain
We study the problem of finding low-dimensional shared representations of meaning for natural language and brain response modalities for multiple-subject narrative story datasets (a portion of an episode of the Sherlock television program and a chapter of a Harry Potter book). These datasets have paired fMRI responses and textual descriptions. Our first goal is to determine if any fMRI space can be learned across subjects that correlates well with semantic context vectors derived from recent, unsupervised methods in natural language understanding for embedding word meaning in Rn. Can distributed, low-dimensional representations of narrative context predict voxels? Our second goal is to determine if a shared space between the fMRI voxels and the semantic word embeddings exists which can be purposed to decode brain states into coherent textual representations of thought.
First, we were able to construct a fine-grained 300-dimensional embedding of the semantic context induced by a scene annotation dataset for Sherlock. Our primary positive result in this thesis is that the multi-view Shared Response Model produces a semantically relevant 20-dimensional space using views of multiple subjects watching Sherlock. This low-dimensional shared fMRI space is able to match fMRI responses to scenes with performance considerably above chance. Using the fMRI shared space over individual fMRI responses brings a large improvement in reconstructing voxels from semantic vectors, and suggests that other recent work in this area may benefit from applying the Shared Response Model.
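The Shared Response Model idea can be sketched with a simplified deterministic variant (the studies above use the probabilistic, multi-view SRM; the function name, dimensions, and synthetic data here are our illustrative assumptions): each subject gets an orthonormal map into one shared low-dimensional time course, fit by alternating least squares.

```python
import numpy as np

def srm_fit(Xs, k, n_iter=20, seed=0):
    """Simplified deterministic Shared Response Model: for each subject i,
    find an orthonormal map W_i (voxels x k) and one shared time course
    S (k x timepoints) minimizing sum_i ||X_i - W_i @ S||_F^2 by
    alternating orthogonal-Procrustes and averaging steps."""
    rng = np.random.default_rng(seed)
    S = rng.standard_normal((k, Xs[0].shape[1]))
    for _ in range(n_iter):
        # Procrustes step: best orthonormal W_i given the shared S
        svds = [np.linalg.svd(X @ S.T, full_matrices=False) for X in Xs]
        Ws = [U @ Vt for U, _, Vt in svds]
        # averaging step: best shared S given the orthonormal maps W_i
        S = np.mean([W.T @ X for W, X in zip(Ws, Xs)], axis=0)
    return Ws, S

# synthetic multi-subject data: one shared 20-dim signal, per-subject maps
rng = np.random.default_rng(1)
S_true = rng.standard_normal((20, 300))                # shared "time course"
Xs = [np.linalg.qr(rng.standard_normal((500, 20)))[0] @ S_true
      + 0.01 * rng.standard_normal((500, 300)) for _ in range(3)]

Ws, S = srm_fit(Xs, k=20)
resid = sum(np.linalg.norm(X - W @ S) ** 2 for W, X in zip(Ws, Xs))
total = sum(np.linalg.norm(X) ** 2 for X in Xs)
```

The learned `S` plays the role of the shared fMRI space: responses are matched and decoded in this common k-dimensional coordinate system rather than in each subject's voxel space.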
A Compressed Sensing View of Unsupervised Text Embeddings, Bag-of-n-Grams, and LSTMs
Low-dimensional vector embeddings, computed using LSTMs or simpler techniques, are a popular approach for capturing the “meaning” of text and a form of unsupervised learning useful for downstream tasks. However, their power is not theoretically understood. The current paper derives formal understanding by looking at the subcase of linear embedding schemes. Using the theory of compressed sensing, we show that representations combining the constituent word vectors are essentially information-preserving linear measurements of Bag-of-n-Grams (BonG) representations of text. This leads to a new theoretical result about LSTMs: low-dimensional embeddings derived from a low-memory LSTM are provably at least as powerful on classification tasks, up to small error, as a linear classifier over BonG vectors, a result that extensive empirical work has thus far been unable to show. Our experiments support these theoretical findings and establish strong, simple, and unsupervised baselines on standard benchmarks that in some cases are state of the art among word-level methods. We also show a surprising new property of embeddings such as GloVe and word2vec: they form a good sensing matrix for text that is more efficient than random matrices, the standard sparse recovery tool, which may explain why they lead to better representations in practice.
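The central observation, that an additive text embedding is a linear measurement of the document's bag-of-words vector, can be checked directly; the word vectors below are random stand-ins for GloVe/word2vec, and the unigram (n=1) case stands in for general BonGs.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 1000, 50                        # vocabulary size, embedding dimension
A = rng.standard_normal((d, V))        # columns play the role of word vectors

doc = [3, 17, 3, 999]                  # a document as word indices
x = np.bincount(doc, minlength=V)      # sparse bag-of-words count vector

emb_sum = sum(A[:, w] for w in doc)    # sum of constituent word vectors
emb_lin = A @ x                        # the same vector as a linear measurement
```

Because `emb_lin` equals `A @ x`, recovering the text from the embedding is exactly a sparse recovery problem with sensing matrix `A`, which is where the compressed sensing machinery enters.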
Deep Bayesian Nonparametric Learning of Rules and Plans from Demonstrations with a Learned Automaton Prior
We introduce a method to learn imitative policies from expert demonstrations that are interpretable and manipulable. We achieve interpretability by modeling the interactions between high-level actions as an automaton with connections to formal logic. We achieve manipulability by integrating this automaton into planning, so that changes to the automaton have predictable effects on the learned behavior. These qualities allow a human user to first understand what the model has learned, and then either correct the learned behavior or zero-shot generalize to new, similar tasks. We build upon previous work by no longer requiring additional supervised information which is hard to collect in practice. We achieve this by using a deep Bayesian nonparametric hierarchical model. We test our model on several domains and also show results for a real-world implementation on a mobile robotic arm platform.
Learning to Plan with Logical Automata
This paper introduces the Logic-based Value Iteration Network (LVIN) framework, which combines imitation
learning and logical automata to enable agents to learn complex behaviors from demonstrations. We address two problems with learning from expert knowledge: (1) how to generalize learned policies for a task to larger classes of tasks, and (2) how to account for erroneous demonstrations. Our LVIN model solves finite gridworld environments by instantiating a recurrent, convolutional neural network as a value iteration procedure over a learned Markov Decision Process (MDP) that factors into two MDPs: a small finite state automaton (FSA) corresponding to
logical rules, and a larger MDP corresponding to motions in the environment. The parameters of LVIN (value function, reward map, FSA transitions, large MDP transitions) are approximately learned from expert trajectories. Since the model represents the learned rules as an FSA, the model is interpretable; since the FSA is integrated into planning, the behavior of the agent can be manipulated by modifying the FSA transitions. We demonstrate
these abilities in several domains of interest, including a lunchbox-packing manipulation task and a driving domain.
National Science Foundation (Grant 1723943); United States Office of Naval Research (Grant N000141812830); Air Force Office of Scientific Research (Contract FA8702-15-D-0001)
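The factored planning idea, value iteration over the product of a small FSA and an environment MDP, can be sketched in a toy form (ours, not the LVIN implementation; the corridor layout, constants, and reward are illustrative assumptions):

```python
import numpy as np

# Product MDP sketch: a 1-D corridor of N cells, actions 0 = left, 1 = right.
# FSA states: q = 0 "no key yet", q = 1 "have key". Visiting cell KEY flips
# q to 1; reward is given only for being at GOAL while the FSA is in state 1.
N, KEY, GOAL, gamma = 8, 2, 7, 0.95

def step(q, s, a):
    s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)
    q2 = 1 if (q == 0 and s2 == KEY) else q          # FSA transition
    r = 1.0 if (q2 == 1 and s2 == GOAL) else 0.0
    return q2, s2, r

V = np.zeros((2, N))                                 # value over (q, s)
for _ in range(300):                                 # value iteration
    V = np.array([[max(r + gamma * V[q2, s2]
                       for q2, s2, r in (step(q, s, a) for a in (0, 1)))
                   for s in range(N)]
                  for q in range(2)])
```

Because the task rules live entirely in the FSA's transition function, editing it (say, requiring a second key) predictably changes the planned behavior, which is the manipulability property the paper emphasizes.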