Large-scale video analysis and understanding
University of Technology Sydney, Faculty of Engineering and Information Technology.
Video understanding is a complex task in computer vision: it requires not only recognizing objects, persons, and scenes, but also capturing and remembering how visual content changes over time. Rapid progress on building blocks such as image classification in recent years provides great opportunities for accurate and efficient video understanding. Building on deep convolutional neural networks and recurrent neural networks, a variety of deep learning approaches to video understanding have been studied. In this thesis, I present my research on large-scale video analysis and understanding in three major aspects: video representation learning, recognition with limited examples, and vision & language. Representations and features are the most important part of vision tasks, since they are general and can be used for classification, detection, and structured prediction tasks such as vision and language. We begin with video classification from multimodal features, i.e., hand-crafted features from different streams such as vision and audio. For representation learning, we investigate aggregation methods that generate video representations from frame features, demonstrating significant improvements over classical pooling methods. In addition, we propose a hierarchical recurrent neural network to learn the hierarchical structure of video. Going beyond supervised learning, we develop a sequence model that learns by reconstructing future and past features from the current sequence, showing that unlabeled videos can help learn good and generalizable video representations. We then explore recognition with limited examples, which tackles the situation where we cannot obtain enough data to train the model; the encouraging results show that it is feasible to obtain good performance with only a few examples of the target class. Beyond the video classification task, which only outputs labels for a video, we also seek richer interaction between machine and human about visual content via natural language. We consider two major forms of vision-and-language tasks: video captioning, i.e., automatically generating a caption that describes the given video sequence, and video question answering, i.e., answering questions related to the presented video sequence. Finally, I conclude the thesis with some future directions in video understanding.
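As an illustration of the frame-feature aggregation and hierarchical recurrent modeling mentioned above, the following is a minimal PyTorch sketch. The class name, feature dimensions, segment length, and two-level GRU design are illustrative assumptions, not the thesis's actual architecture.

# A minimal sketch: aggregate per-frame features into a video representation
# with a two-level (hierarchical) GRU. Dimensions and names are assumptions.
import torch
import torch.nn as nn

class HierarchicalVideoEncoder(nn.Module):
    def __init__(self, feat_dim=2048, hidden_dim=512, segment_len=16):
        super().__init__()
        self.segment_len = segment_len
        self.frame_rnn = nn.GRU(feat_dim, hidden_dim, batch_first=True)      # frames -> segment states
        self.segment_rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)  # segments -> video state

    def forward(self, frame_feats):
        # frame_feats: (batch, num_frames, feat_dim); num_frames divisible by segment_len here
        b, t, d = frame_feats.shape
        segments = frame_feats.view(b * (t // self.segment_len), self.segment_len, d)
        _, seg_state = self.frame_rnn(segments)                  # (1, b * num_segments, hidden)
        seg_state = seg_state.squeeze(0).view(b, t // self.segment_len, -1)
        _, video_state = self.segment_rnn(seg_state)             # (1, b, hidden)
        return video_state.squeeze(0)                            # video-level representation

# Usage: 32 frames of 2048-d features per clip -> one 512-d video vector.
encoder = HierarchicalVideoEncoder()
video_repr = encoder(torch.randn(4, 32, 2048))  # shape (4, 512)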
Reinforcement Learning from Diverse Human Preferences
The complexity of designing reward functions has been a major obstacle to the
wide application of deep reinforcement learning (RL) techniques. Describing an
agent's desired behaviors and properties can be difficult, even for experts. A
new paradigm called reinforcement learning from human preferences (or
preference-based RL) has emerged as a promising solution, in which reward
functions are learned from human preference labels among behavior trajectories.
However, existing methods for preference-based RL are limited by the need for
accurate oracle preference labels. This paper addresses this limitation by
developing a method for crowd-sourcing preference labels and learning from
diverse human preferences. The key idea is to stabilize reward learning through
regularization and correction in a latent space. To ensure temporal
consistency, a strong constraint is imposed on the reward model that forces its
latent space to be close to the prior distribution. Additionally, a
confidence-based reward model ensembling method is designed to generate more
stable and reliable predictions. The proposed method is tested on a variety of tasks in DMControl and Meta-World and shows consistent and significant improvements over existing preference-based RL algorithms when learning from diverse feedback, paving the way for real-world applications of RL methods.
Comment: Published as a conference paper in IJCAI 202
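For readers unfamiliar with preference-based reward learning, the sketch below shows the general recipe the abstract builds on: a Bradley-Terry preference loss over trajectory segments, a crude latent-space penalty toward a standard-normal prior, and a confidence-weighted reward ensemble. All names, loss weights, and the inverse-variance weighting are assumptions, not the paper's exact formulation.

# A minimal sketch of preference-based reward learning with a latent-space
# regularizer and a small ensemble; illustrative, not the paper's method.
import torch
import torch.nn as nn
import torch.nn.functional as F

class LatentRewardModel(nn.Module):
    def __init__(self, obs_dim, act_dim, latent_dim=32):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
                                     nn.Linear(128, latent_dim))
        self.head = nn.Linear(latent_dim, 1)

    def forward(self, obs, act):
        z = self.encoder(torch.cat([obs, act], dim=-1))
        return self.head(z).squeeze(-1), z   # per-step reward and latent code

def preference_loss(model, seg0, seg1, label, beta=0.1):
    # Bradley-Terry loss over two segments plus a crude penalty that keeps the
    # latent codes close to a standard-normal prior.
    # label: long tensor with value 0 or 1, indicating the preferred segment.
    r0, z0 = model(*seg0)   # seg = (obs, act), each of shape (T, dim)
    r1, z1 = model(*seg1)
    logits = torch.stack([r0.sum(), r1.sum()])            # segment returns
    pref = F.cross_entropy(logits.unsqueeze(0), label.unsqueeze(0))
    latent_reg = z0.pow(2).mean() + z1.pow(2).mean()      # prior-matching stand-in
    return pref + beta * latent_reg

def ensemble_reward(models, obs, act):
    # Confidence-weighted ensemble: weight members by (inverse) prediction
    # variance; this weighting is only an illustrative choice.
    rewards = torch.stack([m(obs, act)[0] for m in models])        # (K, T)
    weights = torch.softmax(-rewards.var(dim=1, keepdim=True), 0)  # (K, 1)
    return (weights * rewards).sum(dim=0)                          # (T,)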
Learning to Optimize for Reinforcement Learning
In recent years, by leveraging more data, computation, and diverse tasks,
learned optimizers have achieved remarkable success in supervised learning,
outperforming classical hand-designed optimizers. Reinforcement learning (RL)
is essentially different from supervised learning, and in practice, these
learned optimizers do not work well even in simple RL tasks. We investigate
this phenomenon and identify two issues. First, the agent-gradient distribution is not independent and identically distributed (non-i.i.d.), leading to inefficient
meta-training. Moreover, due to highly stochastic agent-environment
interactions, the agent-gradients have high bias and variance, which increases
the difficulty of learning an optimizer for RL. We propose pipeline training
and a novel optimizer structure with a good inductive bias to address these
issues, making it possible to learn an optimizer for reinforcement learning
from scratch. We show that, although trained only on toy tasks, our learned optimizer can generalize to unseen complex tasks in Brax.
Comment: Published at RLC 2024. For code release, see https://github.com/sail-sg/optim4r
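A minimal sketch of the learned-optimizer idea referenced above: a small network maps per-parameter gradient features to updates in place of a hand-designed rule such as Adam. The feature set and network shape are illustrative assumptions; the paper's actual optimizer structure and pipeline training are described in the linked repository.

# A learned optimizer sketch: a tiny MLP produces per-parameter updates from
# [gradient, momentum] features. Illustrative assumptions throughout.
import torch
import torch.nn as nn

class LearnedOptimizer(nn.Module):
    def __init__(self, hidden=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def step(self, params, grads, momentum, beta=0.9):
        new_params, new_momentum = [], []
        for p, g, m in zip(params, grads, momentum):
            m = beta * m + (1 - beta) * g
            feats = torch.stack([g.flatten(), m.flatten()], dim=-1)  # (numel, 2)
            update = self.net(feats).view_as(p)                      # learned per-parameter update
            new_params.append(p - update)
            new_momentum.append(m)
        return new_params, new_momentum

# In meta-training, the optimizer's own weights would be updated so that the
# agent it optimizes improves; here we only show a single inner update step.
opt = LearnedOptimizer()
params = [torch.randn(4, 3)]
grads = [torch.randn(4, 3)]
momentum = [torch.zeros(4, 3)]
params, momentum = opt.step(params, grads, momentum)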
Ti-MAE: Self-Supervised Masked Time Series Autoencoders
Multivariate Time Series forecasting has been an increasingly popular topic
in various applications and scenarios. Recently, contrastive learning and
Transformer-based models have achieved good performance in many long-term
series forecasting tasks. However, there are still several issues in existing
methods. First, the training paradigm of contrastive learning is inconsistent with the downstream prediction task, leading to inaccurate prediction results. Second, existing Transformer-based models, which rely on similar patterns in historical time series data to predict future values, generally induce severe distribution shift problems and do not fully leverage the sequence
information compared to self-supervised methods. To address these issues, we
propose a novel framework named Ti-MAE, in which the input time series are assumed to follow an integrated distribution. In detail, Ti-MAE randomly masks out embedded time series data and learns an autoencoder to reconstruct them at the point level. Ti-MAE adopts masked modeling (rather than contrastive learning) as the auxiliary task, bridging existing representation learning and generative Transformer-based methods and reducing the difference between upstream and downstream forecasting tasks while maintaining
the utilization of original time series data. Experiments on several public
real-world datasets demonstrate that our framework of masked autoencoding could
learn strong representations directly from the raw data, yielding better
performance in time series forecasting and classification tasks.
Comment: 20 pages, 7 figures
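The following is a minimal sketch of masked time-series autoencoding in the spirit of Ti-MAE: embed each time step, randomly mask a fraction of steps, encode, and reconstruct the masked values at the point level. The mask ratio, dimensions, and the simplification of feeding mask tokens to the encoder are assumptions, not the paper's configuration.

# Masked time-series autoencoding sketch: point-level reconstruction of
# randomly masked steps. Illustrative assumptions throughout.
import torch
import torch.nn as nn

class MaskedTSAutoencoder(nn.Module):
    def __init__(self, n_vars, d_model=64, mask_ratio=0.75):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.embed = nn.Linear(n_vars, d_model)
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.decoder = nn.Linear(d_model, n_vars)   # point-level reconstruction head

    def forward(self, x):
        # x: (batch, time, n_vars)
        tokens = self.embed(x)
        mask = torch.rand(x.shape[:2], device=x.device) < self.mask_ratio     # (B, T)
        tokens = torch.where(mask.unsqueeze(-1), self.mask_token.expand_as(tokens), tokens)
        recon = self.decoder(self.encoder(tokens))
        return ((recon - x) ** 2)[mask].mean()       # loss only on masked points

model = MaskedTSAutoencoder(n_vars=7)
loss = model(torch.randn(8, 96, 7))   # e.g. 96-step windows of 7 variables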
Mutual Information Regularized Offline Reinforcement Learning
The major challenge of offline RL is the distribution shift that appears when
out-of-distribution actions are queried, which makes the policy improvement
direction biased by extrapolation errors. Most existing methods address this
problem by penalizing the policy or value for deviating from the behavior
policy during policy improvement or evaluation. In this work, we propose a
novel MISA framework to approach offline RL from the perspective of Mutual
Information between States and Actions in the dataset by directly constraining
the policy improvement direction. MISA constructs lower bounds of mutual
information parameterized by the policy and Q-values. We show that optimizing
this lower bound is equivalent to maximizing the likelihood of a one-step
improved policy on the offline dataset. Hence, we constrain the policy
improvement direction to lie in the data manifold. The resulting algorithm
simultaneously augments the policy evaluation and improvement by adding mutual
information regularizations. MISA is a general framework that unifies
conservative Q-learning (CQL) and behavior regularization methods (e.g.,
TD3+BC) as special cases. We introduce three different variants of MISA, and empirically demonstrate that a tighter mutual information lower bound gives better offline RL performance. In addition, our extensive experiments show that MISA significantly outperforms a wide range of baselines on various tasks of the D4RL benchmark, e.g., achieving 742.9 total points on gym-locomotion tasks. Our code is available at https://github.com/sail-sg/MISA.
Comment: NeurIPS 202
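As a rough illustration of regularizing policy improvement with a mutual-information lower bound, the sketch below adds an InfoNCE-style bound on I(S; A), with the Q-function acting as critic and policy samples as negatives, to a standard actor loss. The construction, the interfaces (q_fn, and a policy callable that returns a torch distribution), and the weights are assumptions, not MISA's exact bound or objective.

# InfoNCE-style mutual-information regularizer for an offline actor loss.
# q_fn(states, actions) -> (B,) values; policy(states) -> torch distribution.
import torch

def mi_lower_bound(q_fn, policy, states, dataset_actions, num_negatives=8):
    # The dataset action should score higher under Q(s, .) than actions drawn
    # from the current policy at the same state.
    pos = q_fn(states, dataset_actions)                           # (B,)
    neg_actions = torch.stack([policy(states).sample()            # (K, B, act_dim)
                               for _ in range(num_negatives)])
    neg = torch.stack([q_fn(states, a) for a in neg_actions])     # (K, B)
    logits = torch.cat([pos.unsqueeze(0), neg], dim=0)            # (K+1, B)
    return torch.log_softmax(logits, dim=0)[0].mean()             # lower-bound estimate

def actor_loss(q_fn, policy, states, dataset_actions, alpha=1.0):
    # Standard policy-improvement term plus the MI regularizer that keeps the
    # improvement direction anchored to the data manifold.
    policy_actions = policy(states).rsample()
    improvement = -q_fn(states, policy_actions).mean()
    return improvement - alpha * mi_lower_bound(q_fn, policy, states, dataset_actions)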
Efficient Offline Policy Optimization with a Learned Model
MuZero Unplugged presents a promising approach for offline policy learning
from logged data. It conducts Monte-Carlo Tree Search (MCTS) with a learned
model and leverages the Reanalyze algorithm to learn purely from offline data. For good performance, MCTS requires an accurate learned model and a large number of simulations, and thus incurs a huge computational cost. This paper investigates a few hypotheses about where MuZero Unplugged may not work well under offline RL settings, including 1) learning with limited data coverage; 2) learning from offline data of stochastic environments; 3) improperly parameterized models given the offline data; and 4) learning with a low compute budget. We propose to use a
regularized one-step look-ahead approach to tackle the above issues. Instead of
planning with the expensive MCTS, we use the learned model to construct an
advantage estimation based on a one-step rollout. The policy is improved in the direction that maximizes the estimated advantage, with regularization toward the dataset. We conduct extensive empirical studies with
BSuite environments to verify the hypotheses and then run our algorithm on the
RL Unplugged Atari benchmark. Experimental results show that our proposed
approach achieves stable performance even with an inaccurate learned model. On
the large-scale Atari benchmark, the proposed method outperforms MuZero
Unplugged by 43%. Most significantly, it uses only 5.6% of the wall-clock time (i.e.,
1 hour) compared to MuZero Unplugged (i.e., 17.8 hours) to achieve a 150% IQM
normalized score with the same hardware and software stacks. Our implementation
is open-sourced at https://github.com/sail-sg/rosmo.
Comment: ICLR202
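A minimal sketch of regularized one-step policy improvement with a learned model, in the spirit of the abstract: sample actions, roll the model forward one step to estimate advantages, and update the policy toward high-advantage actions while regularizing toward the dataset. The model and value interfaces, and the loss weights, are assumptions rather than the paper's exact algorithm.

# One-step look-ahead advantage estimation with behavior regularization.
# model(states, actions) -> (next_states, rewards); value_fn(states) -> (B,);
# policy(states) -> torch distribution whose log_prob returns (B,).
import torch

def one_step_improvement_loss(policy, model, value_fn, states, dataset_actions,
                              num_samples=4, gamma=0.99, beta=1.0):
    baseline = value_fn(states)                                   # (B,)
    losses = []
    for _ in range(num_samples):
        actions = policy(states).sample()                         # (B, act_dim)
        next_states, rewards = model(states, actions)             # one-step model rollout
        advantage = rewards + gamma * value_fn(next_states) - baseline
        log_prob = policy(states).log_prob(actions)               # (B,)
        losses.append(-(advantage.detach() * log_prob).mean())    # push toward high advantage
    improvement = torch.stack(losses).mean()
    # Behavior regularization: stay close to actions seen in the offline data.
    bc = -policy(states).log_prob(dataset_actions).mean()
    return improvement + beta * bc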