Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees
Hybrid RL is the setting where an RL agent has access to both offline data
and online data by interacting with the real-world environment. In this work,
we propose a new hybrid RL algorithm that combines an on-policy actor-critic
method with offline data. On-policy methods such as policy gradient and natural
policy gradient (NPG) have been shown to be more robust to model
misspecification, though they are sometimes not as sample efficient as methods that rely on
off-policy learning. On the other hand, offline methods that depend on
off-policy training often require strong assumptions in theory and are less
stable to train in practice. Our new approach integrates a procedure of
off-policy training on the offline data into an on-policy NPG framework. We
show that our approach, in theory, can obtain a best-of-both-worlds type of
result -- it achieves the state-of-the-art theoretical guarantees of offline RL
when offline RL-specific assumptions hold, while at the same time maintaining
the theoretical guarantees of on-policy NPG regardless of the offline RL
assumptions' validity. Experimentally, in challenging rich-observation
environments, we show that our approach outperforms a state-of-the-art hybrid
RL baseline which only relies on off-policy policy optimization, demonstrating
the empirical benefit of combining on-policy and off-policy learning. Our code
is publicly available at https://github.com/YifeiZhou02/HNPG.
Comment: The first two authors contributed equally.
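The core idea above — fitting the critic on a mix of offline and on-policy data, then taking an NPG actor step — can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it uses a hypothetical two-armed bandit, a Monte-Carlo critic, and exploits the fact that for a tabular softmax policy the natural gradient step reduces to adding the Q-estimates to the logits.

```python
import numpy as np

# Toy two-armed bandit illustrating the hybrid actor-critic idea: the critic
# (Q estimate) is fit on a mix of logged offline data and fresh on-policy
# samples, then the actor takes an NPG step. For a tabular softmax policy the
# natural gradient update is simply logits += step_size * Q.

rng = np.random.default_rng(0)
true_means = np.array([0.2, 0.8])          # hypothetical reward means

def sample_reward(a):
    return true_means[a] + 0.1 * rng.standard_normal()

# Offline dataset: (action, reward) pairs logged by a random behavior policy.
offline = [(a, sample_reward(a)) for a in rng.integers(0, 2, size=200)]

logits = np.zeros(2)
for _ in range(50):
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Fresh on-policy data from the current softmax policy.
    online = [(a, sample_reward(a)) for a in rng.choice(2, size=50, p=probs)]
    # Critic: Monte-Carlo Q estimate from the mixed (offline + online) data.
    data = offline + online
    Q = np.array([np.mean([r for (a, r) in data if a == k] or [0.0])
                  for k in range(2)])
    # Actor: NPG step (softmax natural gradient == additive update by Q).
    logits += 1.0 * Q

best_action = int(np.argmax(probs))
```

The offline data keeps the critic's estimate of the rarely-chosen arm from going stale, which is the intuition behind mixing the two data sources.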
A Graph Reasoning Network for Multi-turn Response Selection via Customized Pre-training
We investigate response selection for multi-turn conversation in
retrieval-based chatbots. Existing studies focus on the matching between
utterances and responses, calculating a matching score from learned features,
which leaves the model with insufficient reasoning ability. In this
paper, we propose a graph-reasoning network (GRN) to address the problem. GRN
first conducts pre-training based on ALBERT using next utterance prediction and
utterance order prediction tasks specifically devised for response selection.
These two customized pre-training tasks endow our model with the ability to
capture semantic and chronological dependencies between utterances. We then
fine-tune the model on an integrated network with sequence reasoning and graph
reasoning structures. The sequence reasoning module conducts inference based on
the highly summarized context vector of utterance-response pairs from the
global perspective. The graph reasoning module conducts the reasoning on the
utterance-level graph neural network from the local perspective. Experiments on
two conversational reasoning datasets show that our model can dramatically
outperform the strong baseline methods and can achieve performance which is
close to human-level.
Comment: Accepted by AAAI 2021; 10 pages, 6 figures.
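The two customized pre-training tasks amount to constructing labeled example pairs from raw dialogues. The sketch below shows one plausible construction (the function names and label conventions are illustrative, not taken from GRN): next utterance prediction contrasts the true continuation with a randomly sampled one, and utterance order prediction contrasts the original turn order with a shuffled one.

```python
import random

# Illustrative construction of training pairs for the two pre-training
# tasks: next utterance prediction (NUP) and utterance order prediction
# (UOP). Labels: 1 = genuine, 0 = corrupted.

def next_utterance_examples(dialogue, corpus, rng):
    """NUP: (context, candidate) -> 1 if candidate is the true next
    utterance, 0 if it is randomly sampled from the corpus."""
    context, true_next = dialogue[:-1], dialogue[-1]
    negative = rng.choice(corpus)
    return [((context, true_next), 1), ((context, negative), 0)]

def utterance_order_examples(dialogue, rng):
    """UOP: 1 for the original turn order, 0 for a shuffled order."""
    shuffled = dialogue[:]
    while shuffled == dialogue:          # ensure the order actually changes
        rng.shuffle(shuffled)
    return [(dialogue, 1), (shuffled, 0)]

rng = random.Random(0)
dlg = ["hi", "hello, how can I help?", "my order is late", "let me check"]
nup = next_utterance_examples(dlg, ["unrelated reply"], rng)
uop = utterance_order_examples(dlg, rng)
```

Training a classifier on such pairs is what pushes the encoder to capture the semantic (NUP) and chronological (UOP) dependencies the abstract mentions.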
Hi4D: 4D Instance Segmentation of Close Human Interaction
We propose Hi4D, a method and dataset for the automatic analysis of
physically close human-human interaction under prolonged contact. Robustly
disentangling several in-contact subjects is a challenging task due to
occlusions and complex shapes. Hence, existing multi-view systems typically
fuse 3D surfaces of close subjects into a single, connected mesh. To address
this issue we leverage i) individually fitted neural implicit avatars and ii)
an alternating optimization scheme that refines pose and surface through
periods of close proximity, and we thus iii) segment the fused raw scans into
individual instances. From these instances we compile the Hi4D dataset of 4D textured scans of
20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D
contains rich interaction-centric annotations in 2D and 3D alongside accurately
registered parametric body models. We define varied human pose and shape
estimation tasks on this dataset and provide results from state-of-the-art
methods on these benchmarks.
Comment: Project page: https://yifeiyin04.github.io/Hi4D
Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient
We consider a hybrid reinforcement learning setting (Hybrid RL), in which an
agent has access to an offline dataset and the ability to collect experience
via real-world online interaction. The framework mitigates the challenges that
arise in both pure offline and online RL settings, allowing for the design of
simple and highly effective algorithms, in both theory and practice. We
demonstrate these advantages by adapting the classical Q learning/iteration
algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q. In
our theoretical results, we prove that the algorithm is both computationally
and statistically efficient whenever the offline dataset supports a
high-quality policy and the environment has bounded bilinear rank. Notably, we
require no assumptions on the coverage provided by the initial distribution, in
contrast with guarantees for policy gradient/iteration methods. In our
experimental results, we show that Hy-Q with neural network function
approximation outperforms state-of-the-art online, offline, and hybrid RL
baselines on challenging benchmarks, including Montezuma's Revenge.
Comment: 42 pages, 6 figures. Published at ICLR 2023. Code available at https://github.com/yudasong/Hy
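The hybrid idea behind Hy-Q can be sketched in a few lines of tabular code. This is a minimal illustration, not the paper's algorithm: the chain MDP, hyperparameters, and the uniform mixing of the two buffers are all assumptions made for the example; each Q-learning update simply draws its transition from the union of the logged offline data and the online data collected so far.

```python
import numpy as np

# Minimal tabular sketch of hybrid Q-learning on a toy chain MDP: reward 1
# for reaching the rightmost state, after which the episode resets. Each
# update samples a transition from a buffer mixing offline and online data.

rng = np.random.default_rng(0)
N, GAMMA = 5, 0.9                        # chain of N states

def step(s, a):                          # a=1 moves right, a=0 moves left
    s2 = min(s + 1, N - 1) if a == 1 else max(s - 1, 0)
    return s2, (1.0 if s2 == N - 1 else 0.0)

# Offline data logged by a uniformly random behavior policy.
offline, s = [], 0
for _ in range(500):
    a = int(rng.integers(2))
    s2, r = step(s, a)
    offline.append((s, a, r, s2))
    s = 0 if s2 == N - 1 else s2         # reset on reaching the goal

Q = np.zeros((N, 2))
online, s = [], 0
for _ in range(2000):
    # Collect one transition with the current epsilon-greedy policy.
    a = int(rng.integers(2)) if rng.random() < 0.1 else int(np.argmax(Q[s]))
    s2, r = step(s, a)
    online.append((s, a, r, s2))
    s = 0 if s2 == N - 1 else s2
    # Q-learning update on a transition drawn from the mixed buffer.
    buf = offline + online
    si, ai, ri, s2i = buf[rng.integers(len(buf))]
    Q[si, ai] += 0.5 * (ri + GAMMA * Q[s2i].max() - Q[si, ai])

greedy = [int(np.argmax(Q[s])) for s in range(N - 1)]
```

The offline buffer supplies coverage of states the greedy policy rarely visits, which is why no exploration bonus is needed — mirroring, in miniature, the role the offline dataset plays in the theory.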
Accelerating-particle-deposition Method for Quickly Evaluating Long-term Performance of Fin-and-tube Heat Exchangers
The fin-and-tube heat exchanger is the most commonly used heat exchanger type in air-conditioning systems. During actual operation, dust particles carried by the air may partly deposit and form particulate fouling on the fins and tubes as the dusty air flows through the heat exchangers. The deposited particles may gradually block the air-flow passages and occupy the heat transfer area, resulting in a continuous increase of air-side thermal resistance and a significant deterioration of the heat transfer capacity of the heat exchangers over long-term operation. In order to quickly evaluate the long-term performance of fin-and-tube heat exchangers, an accelerating-particle-deposition method, capable of reproducing in a short time the particle deposition found on long-running heat exchangers, is proposed in this study. The idea of the method is to pass dusty air with a high particle concentration through the heat exchangers in an accelerated test, quickly forming particulate fouling of the same weight as that found on long-running heat exchangers operated in actual environments with low particle concentration. The accelerating factor, defined as the ratio of the actual running time to the accelerated testing time, is calculated from the deposition weight of the dust particles. The deposition weight is in turn calculated from the relationship between the impact frequency and deposition probability of the dust particles and the particle concentration of the dusty air. An experimental apparatus for accelerating the particle deposition process and testing the heat transfer capacity of fin-and-tube heat exchangers is designed. The long-term performance of heat exchangers predicted by the proposed method is compared with actual performance data for heat exchangers after 5-8 years of operation published by the China Quality Certification Center.
The comparison shows that the predicted results agree well with the actual operation data, with a mean deviation of the heat transfer capacity within 10%.
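As a back-of-the-envelope illustration of the accelerating factor defined above (the ratio of actual running time to accelerated testing time): if the deposition weight were simply proportional to particle concentration, equal deposited weight would give t_actual * c_actual = t_test * c_test. The linear scaling and the concentrations below are assumptions for illustration only; the paper derives the weight from the impact frequency and deposition probability of the particles.

```python
# Illustrative accelerating-factor arithmetic under an assumed linear
# dependence of deposition weight on particle concentration.

def accelerating_factor(c_test_mg_m3: float, c_actual_mg_m3: float) -> float:
    """Equal deposited weight: t_actual * c_actual == t_test * c_test,
    so factor = t_actual / t_test = c_test / c_actual."""
    return c_test_mg_m3 / c_actual_mg_m3

# E.g., testing at 50 mg/m^3 against a 0.5 mg/m^3 ambient concentration
# compresses the timeline by a factor of 100, so 5 years of operation maps
# to 5 * 365 * 24 / 100 = 438 hours of accelerated testing.
factor = accelerating_factor(50.0, 0.5)
test_hours = 5 * 365 * 24 / factor
```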
Learning Domain Invariant Prompt for Vision-Language Models
Prompt learning is one of the most effective and trending ways to adapt
powerful vision-language foundation models like CLIP to downstream datasets by
tuning learnable prompt vectors with very few samples. However, although prompt
learning achieves excellent performance over in-domain data, it still faces the
major challenge of generalizing to unseen classes and domains. Some existing
prompt learning methods tackle this issue by adaptively generating different
prompts for different tokens or domains but neglect the ability of learned
prompts to generalize to unseen domains. In this paper, we propose a novel
prompt learning paradigm, called MetaPrompt, that directly generates a
\emph{domain invariant} prompt generalizable to unseen domains. Specifically, a
dual-modality prompt tuning network is proposed to generate prompts for input
from both image and text modalities. With a novel asymmetric contrastive loss,
the representation from the original pre-trained vision-language model acts as
supervision to enhance the generalization ability of the learned prompt. More
importantly, we propose a meta-learning-based prompt tuning algorithm that
explicitly constrains the task-specific prompt tuned for one domain or class to
also achieve good performance in another domain or class. Extensive experiments
on 11 datasets for base-to-new generalization and 4 datasets for domain
generalization demonstrate that our method consistently and significantly
outperforms existing methods.
Comment: 12 pages, 6 figures, 5 tables.
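The meta-learning constraint described above — a prompt tuned on one domain should also perform well on another — can be sketched with a MAML-style bilevel update. The quadratic "losses" and two-dimensional "prompt" below are stand-ins invented for illustration, not the paper's objectives: an inner step adapts the prompt on one domain, and the meta step differentiates through that adaptation to reduce the loss on a second domain.

```python
import numpy as np

# Schematic meta-learning prompt tuning: adapt on domain A, then update the
# initialization so the adapted prompt also does well on domain B.

def grad(p, target):                     # gradient of 0.5*||p - target||^2
    return p - target

targets = {"domain_A": np.array([1.0, 0.0]), "domain_B": np.array([0.8, 0.2])}
prompt = np.zeros(2)
inner_lr, meta_lr = 0.1, 0.5

for _ in range(200):
    # Inner step: adapt the prompt on domain A.
    adapted = prompt - inner_lr * grad(prompt, targets["domain_A"])
    # Meta step: require the adapted prompt to do well on domain B. For a
    # quadratic loss the inner step is linear in the prompt, so the chain
    # rule contributes the factor (1 - inner_lr).
    meta_grad = (1 - inner_lr) * grad(adapted, targets["domain_B"])
    prompt -= meta_lr * meta_grad
```

At convergence the domain-A-adapted prompt matches the domain-B optimum, which is the "tuned for one domain, good in another" constraint in miniature.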