Generating Long-term Trajectories Using Deep Hierarchical Networks
We study the problem of modeling spatiotemporal trajectories over long time
horizons using expert demonstrations. For instance, in sports, agents often
choose action sequences with long-term goals in mind, such as achieving a
certain strategic position. Conventional policy learning approaches, such as
those based on Markov decision processes, generally fail at learning cohesive
long-term behavior in such high-dimensional state spaces, and are only
effective when myopic modeling leads to the desired behavior. The key difficulty
is that conventional approaches are "shallow" models that only learn a single
state-action policy. We instead propose a hierarchical policy class that
automatically reasons about both long-term and short-term goals, which we
instantiate as a hierarchical neural network. We showcase our approach in a
case study on learning to imitate demonstrated basketball trajectories, and
show that it generates significantly more realistic trajectories compared to
non-hierarchical baselines, as judged by professional sports analysts.
Comment: Published in NIPS 201
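As a rough illustration of the two-level idea in this abstract, the PyTorch sketch below pairs a macro-policy that proposes a long-term goal embedding with a micro-policy that predicts the next step conditioned on that goal. The module names (HierarchicalPolicy), dimensions, and the simple concatenation of state and goal are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a two-level hierarchical policy: a macro-policy proposes a
# long-term goal and a micro-policy predicts the next step conditioned on it.
import torch
import torch.nn as nn

class HierarchicalPolicy(nn.Module):
    def __init__(self, state_dim=2, goal_dim=8, hidden=64):
        super().__init__()
        self.macro_rnn = nn.GRU(state_dim, hidden, batch_first=True)
        self.macro_head = nn.Linear(hidden, goal_dim)          # long-term goal embedding
        self.micro_rnn = nn.GRU(state_dim + goal_dim, hidden, batch_first=True)
        self.micro_head = nn.Linear(hidden, state_dim)          # next-step displacement

    def forward(self, states):
        # states: (batch, time, state_dim) trajectory prefix, e.g. court positions
        macro_h, _ = self.macro_rnn(states)
        goal = self.macro_head(macro_h)                         # one goal per time step
        micro_h, _ = self.micro_rnn(torch.cat([states, goal], dim=-1))
        return states + self.micro_head(micro_h)                # predicted next positions

# Usage: imitation learning by regressing each predicted step onto the expert's next step.
policy = HierarchicalPolicy()
demo = torch.randn(16, 50, 2)                                   # toy expert trajectories
loss = nn.functional.mse_loss(policy(demo)[:, :-1], demo[:, 1:])
loss.backward()
```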
Multi-resolution Tensor Learning for Large-Scale Spatial Data
High-dimensional tensor models are notoriously computationally expensive to
train. We present a meta-learning algorithm, MMT, that can significantly speed
up the process for spatial tensor models. MMT leverages the property that
spatial data can be viewed at multiple resolutions, which are related by
coarsening and fine-graining from one resolution to another. Using this
property, MMT learns a tensor model by starting from a coarse resolution and
iteratively increasing the model complexity. To avoid "over-training" on
coarse-resolution models, we investigate an information-theoretic fine-graining
criterion to decide when to transition into higher-resolution models. We
provide both theoretical and empirical evidence for the advantages of this
approach. When applied to two real-world large-scale spatial datasets for
basketball player and animal behavior modeling, our approach demonstrates three key
benefits: 1) it efficiently captures higher-order interactions (i.e., tensor
latent factors), 2) it is orders of magnitude faster than fixed resolution
learning and scales to very fine-grained spatial resolutions, and 3) it
reliably yields accurate and interpretable models.
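A toy NumPy sketch of the coarse-to-fine schedule described above: fit a spatial weight grid at a coarse resolution, fine-grain (upsample) it, and continue training at the next resolution. The simple least-squares model, nearest-neighbour 2x upsampling, and fixed resolution schedule are stand-ins; the actual method learns tensor latent factors and uses an information-theoretic criterion to decide when to fine-grain.

```python
import numpy as np

def fit_at_resolution(W, features, targets, lr=0.1, steps=200):
    # features: (n_samples, H, H) spatial inputs; targets: (n_samples,)
    for _ in range(steps):
        pred = (features * W).sum(axis=(1, 2))
        grad = (features * (pred - targets)[:, None, None]).mean(axis=0)
        W -= lr * grad
    pred = (features * W).sum(axis=(1, 2))
    return W, np.mean((pred - targets) ** 2)

def fine_grain(W):
    # Double the resolution: repeat each cell, rescaled so predictions on
    # block-averaged features are preserved at the finer grid.
    return np.repeat(np.repeat(W, 2, axis=0), 2, axis=1) / 4.0

# Toy data at the finest resolution (16x16); coarser views are block averages.
rng = np.random.default_rng(0)
X_fine = rng.normal(size=(500, 16, 16))
y = X_fine[:, 4, 4] + 0.5 * X_fine[:, 10, 12]

W = np.zeros((4, 4))                                  # start coarse
for H in (4, 8, 16):
    X = X_fine.reshape(500, H, 16 // H, H, 16 // H).mean(axis=(2, 4))
    W, err = fit_at_resolution(W, X, y)
    print(f"resolution {H}x{H}: train MSE {err:.3f}")
    if H < 16:
        W = fine_grain(W)                             # transition to the next resolution
```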
PTPN22 Silencing in the NOD Model Indicates the Type 1 Diabetes–Associated Allele Is Not a Loss-of-Function Variant
PTPN22 encodes the lymphoid tyrosine phosphatase (LYP) and is the second strongest non-HLA genetic risk factor for type 1 diabetes. The PTPN22 susceptibility allele generates an LYP variant with an arginine-to-tryptophan substitution at position 620 (R620W) that has been reported by several studies to impart a gain of function. However, a recent report investigating both human cells and a knockin mouse model containing the R620W homolog suggested that this variation causes faster protein degradation. Whether LYP R620W is a gain- or loss-of-function variant, therefore, remains controversial. To address this issue, we generated transgenic nonobese diabetic (NOD) mice in which Ptpn22 can be inducibly silenced by RNA interference. We found that Ptpn22 silencing in the NOD model replicated many of the phenotypes observed in C57BL/6 Ptpn22 knockout mice, including an increase in regulatory T cells. Notably, loss of Ptpn22 led to phenotypic changes in B cells opposite to those reported for the human susceptibility allele. Furthermore, Ptpn22 knockdown did not increase the risk of autoimmune diabetes but, rather, conferred protection from disease. Overall, to our knowledge, this is the first functional study of Ptpn22 within a model of type 1 diabetes, and the data do not support a loss of function for the PTPN22 disease variant.
Long-term Forecasting using Tensor-Train RNNs
We present Tensor-Train RNN (TT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations, and sensitivity to error propagation. Our proposed tensor recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher-order moments and higher-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train (TT) decomposition to reduce the number of parameters while preserving model performance. We theoretically establish the approximation properties of Tensor-Train RNNs for general sequence inputs; such guarantees are not available for standard RNNs. We also demonstrate significant long-term prediction improvements over standard RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well as on real-world climate and traffic data.
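To make the parameter-saving argument concrete, the NumPy sketch below stores a 3-way transition weight tensor in tensor-train (TT) format and contracts it twice against the state to produce a second-order (quadratic) pre-activation. The state size, TT rank, random cores, and tanh readout are illustrative assumptions, not the TT-RNN's exact parameterisation.

```python
import numpy as np

d, r = 32, 4                                   # state size and TT rank (assumed)
rng = np.random.default_rng(0)

# TT cores for a tensor of shape (d, d, d): G1 (1, d, r), G2 (r, d, r), G3 (r, d, 1)
G1 = rng.normal(size=(1, d, r)) / np.sqrt(d)
G2 = rng.normal(size=(r, d, r)) / np.sqrt(d)
G3 = rng.normal(size=(r, d, 1)) / np.sqrt(d)

def tt_transition(s):
    # Contract the TT cores against the state twice (the higher-order part) and keep
    # the remaining mode free as the pre-activation of the next hidden state.
    v3 = np.einsum('adb,d->ab', G3, s)         # (r, 1): contract last mode with s
    v2 = np.einsum('adb,d,bc->ac', G2, s, v3)  # (r, 1): contract middle mode with s
    out = np.einsum('adb,bc->dc', G1, v2)      # (d, 1): output mode stays free
    return np.tanh(out[:, 0])

s = rng.normal(size=d)
print(tt_transition(s).shape)                  # (32,)

dense_params = d ** 3                          # 32768 for an explicit 3-way tensor
tt_params = G1.size + G2.size + G3.size        # 128 + 512 + 128 = 768
print(dense_params, tt_params)
```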
MERMAIDE: Learning to Align Learners using Model-Based Meta-Learning
We study how a principal can efficiently and effectively intervene on the
rewards of a previously unseen learning agent in order to induce desirable
outcomes. This is relevant to many real-world settings like auctions or
taxation, where the principal may know neither the learning behavior nor the
rewards of real people. Moreover, the principal should be few-shot adaptable
and minimize the number of interventions, because interventions are often
costly. We introduce MERMAIDE, a model-based meta-learning framework to train a
principal that can quickly adapt to out-of-distribution agents with different
learning strategies and reward functions. We validate this approach
step-by-step. First, in a Stackelberg setting with a best-response agent, we
show that meta-learning enables quick convergence to the theoretically known
Stackelberg equilibrium at test time, although noisy observations severely
increase the sample complexity. We then show that our model-based meta-learning
approach is cost-effective in intervening on bandit agents with unseen
explore-exploit strategies. Finally, we outperform baselines that use either
meta-learning or agent behavior modeling, in both 0-shot and K-shot
settings with partial agent information.
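For readers unfamiliar with the setting, the toy NumPy sketch below shows the kind of principal-agent interaction described above: an epsilon-greedy bandit agent updates its reward estimates while a principal with a limited budget adds bonuses to steer it toward a target arm. The hand-coded heuristic principal, arm values, and budget here are purely illustrative; MERMAIDE instead meta-trains a model-based principal that adapts to unseen agents.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rewards = np.array([0.3, 0.7, 0.5])       # agent's reward means (hidden from the principal)
target_arm, bonus, budget = 2, 0.6, 20         # principal prefers arm 2, can pay `bonus` up to `budget` times

q = np.zeros(3)                                # agent's reward estimates
counts = np.zeros(3)
spent = 0
for t in range(500):
    # Epsilon-greedy agent: explore with probability 0.1, otherwise exploit.
    arm = rng.integers(3) if rng.random() < 0.1 else int(np.argmax(q))
    r = rng.normal(true_rewards[arm], 0.1)
    if arm == target_arm and spent < budget:   # heuristic intervention: subsidise the target arm early
        r += bonus
        spent += 1
    counts[arm] += 1
    q[arm] += (r - q[arm]) / counts[arm]       # incremental mean update

print("agent's final estimates:", np.round(q, 2))
print("fraction of pulls on target arm:", counts[target_arm] / counts.sum())
```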