Decomposed Mutual Information Optimization for Generalized Context in Meta-Reinforcement Learning
Adapting to changes in transition dynamics is essential in robotic
applications. By learning a conditional policy with a compact context,
context-aware meta-reinforcement learning provides a flexible way to adjust
behavior as the dynamics change. In real-world applications, however, the
agent may encounter complex dynamics changes: multiple confounders can
influence the transition dynamics, making it challenging to infer an accurate
context for decision-making. This paper addresses this challenge with
Decomposed Mutual INformation Optimization (DOMINO) for context learning, which
explicitly learns a disentangled context that maximizes the mutual information
between the context and historical trajectories while minimizing the
state-transition prediction error. Our theoretical analysis shows that, by
learning a disentangled context, DOMINO overcomes the underestimation of
mutual information caused by multiple confounders and reduces the number of
samples that must be collected across environments. Extensive
experiments show that the context learned by DOMINO benefits both model-based
and model-free reinforcement learning algorithms for dynamics generalization in
terms of sample efficiency and performance in unseen environments.
Comment: NeurIPS 2022
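
To make the abstract's two-part objective concrete, here is a minimal sketch,
assuming a per-component InfoNCE estimator as the mutual-information bound and
a context-conditioned one-step dynamics model for the prediction term. All
module names, shapes, and the pairing of two histories from the same
environment as positives are illustrative assumptions, not the authors'
implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecomposedContextEncoder(nn.Module):
    """Encodes a history of (s, a, s') transitions into K context components."""
    def __init__(self, obs_dim, act_dim, n_components=4, ctx_dim=8, hidden=128):
        super().__init__()
        self.n_components, self.ctx_dim = n_components, ctx_dim
        self.trunk = nn.Sequential(
            nn.Linear(2 * obs_dim + act_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_components * ctx_dim),
        )

    def forward(self, s, a, s_next):
        # s, a, s_next: (batch, history_len, dim); mean-pool over the history,
        # then split the pooled features into K separate context components.
        feats = self.trunk(torch.cat([s, a, s_next], dim=-1)).mean(dim=1)
        return feats.view(-1, self.n_components, self.ctx_dim)

def decomposed_infonce(ctx_a, ctx_b, temperature=0.1):
    # One InfoNCE loss per context component; histories at the same batch
    # index come from the same environment and act as positive pairs.
    loss = 0.0
    for k in range(ctx_a.shape[1]):
        za = F.normalize(ctx_a[:, k], dim=-1)
        zb = F.normalize(ctx_b[:, k], dim=-1)
        logits = za @ zb.t() / temperature              # (B, B) similarities
        labels = torch.arange(za.shape[0], device=za.device)
        loss = loss + F.cross_entropy(logits, labels)
    return loss

class ContextConditionedDynamics(nn.Module):
    """Predicts the next state from (s, a) and the flattened context."""
    def __init__(self, obs_dim, act_dim, n_components=4, ctx_dim=8, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + act_dim + n_components * ctx_dim, hidden),
            nn.ReLU(), nn.Linear(hidden, obs_dim),
        )

    def forward(self, s, a, ctx):
        return self.net(torch.cat([s, a, ctx.flatten(1)], dim=-1))

In such a setup, the total loss would combine decomposed_infonce(ctx_a, ctx_b)
with a transition term such as F.mse_loss(dynamics(s_t, a_t, ctx_a), s_next_t),
mirroring the abstract's "maximize mutual information while minimizing
prediction error" formulation.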
Deep Clustering of Text Representations for Supervision-free Probing of Syntax
We explore deep clustering of text representations for unsupervised model
interpretation and induction of syntax. As these representations are
high-dimensional, out-of-the-box methods like KMeans do not work well. Thus,
our approach jointly transforms the representations into a lower-dimensional
cluster-friendly space and clusters them. In this work, we consider two
notions of syntax: part-of-speech induction (POSI) and constituency labelling
(CoLab). Interestingly, we find that Multilingual BERT (mBERT) contains a
surprising amount of syntactic knowledge of English, possibly even as much as English BERT
(EBERT). Our model can be used as a supervision-free probe, which is arguably a
less-biased way of probing. We find that unsupervised probes benefit from
higher layers more than supervised probes do. We further note that our
unsupervised probe utilizes EBERT and mBERT representations differently,
especially for POSI. We validate the efficacy of our probe by demonstrating its
capabilities as an unsupervised syntax induction technique. Our probe works
well for both syntactic formalisms by simply adapting the input
representations. We report competitive performance of our probe on 45-tag
English POSI, state-of-the-art performance on 12-tag POSI across 10 languages,
and competitive results on CoLab. We also perform zero-shot syntax induction on
resource-impoverished languages and report strong results.
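
The "jointly transform and cluster" idea can be sketched with a generic
deep-clustering objective in the style of DEC (Xie et al., 2016), one standard
way to realize it. This is not the paper's exact model: the autoencoder, the
dimensions, and the Student's-t soft assignment below are assumptions for
illustration (n_clusters=45 echoes the 45-tag POSI setting).

import torch
import torch.nn as nn
import torch.nn.functional as F

class DeepCluster(nn.Module):
    def __init__(self, in_dim=768, latent_dim=32, n_clusters=45):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(), nn.Linear(256, in_dim))
        # Learnable cluster centroids in the low-dimensional space.
        self.centroids = nn.Parameter(torch.randn(n_clusters, latent_dim))

    def soft_assign(self, z, alpha=1.0):
        # Student's t-kernel similarity between points and centroids (as in DEC).
        d2 = torch.cdist(z, self.centroids).pow(2)
        q = (1.0 + d2 / alpha).pow(-(alpha + 1) / 2)
        return q / q.sum(dim=1, keepdim=True)

    def forward(self, x):
        z = self.encoder(x)
        return z, self.decoder(z), self.soft_assign(z)

def target_distribution(q):
    # Sharpened targets emphasize high-confidence cluster assignments.
    p = q.pow(2) / q.sum(dim=0)
    return p / p.sum(dim=1, keepdim=True)

# One training step on a batch of contextual embeddings x: the reconstruction
# term keeps the latent space faithful to the representations, while the
# KL(p || q) term makes that space cluster-friendly.
model = DeepCluster()
x = torch.randn(64, 768)                      # stand-in for BERT vectors
z, x_hat, q = model(x)
loss = F.mse_loss(x_hat, x) + F.kl_div(
    q.log(), target_distribution(q).detach(), reduction="batchmean")
loss.backward()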