
    Offline Data Enhanced On-Policy Policy Gradient with Provable Guarantees

    Hybrid RL is the setting where an RL agent has access to both offline data and online data collected by interacting with the real-world environment. In this work, we propose a new hybrid RL algorithm that combines an on-policy actor-critic method with offline data. On-policy methods such as policy gradient and natural policy gradient (NPG) have been shown to be more robust to model misspecification, though they may sometimes be less sample efficient than methods that rely on off-policy learning. On the other hand, offline methods that depend on off-policy training often require strong assumptions in theory and are less stable to train in practice. Our new approach integrates a procedure of off-policy training on the offline data into an on-policy NPG framework. We show that our approach, in theory, can obtain a best-of-both-worlds type of result: it achieves the state-of-the-art theoretical guarantees of offline RL when offline RL-specific assumptions hold, while at the same time maintaining the theoretical guarantees of on-policy NPG regardless of the offline RL assumptions' validity. Experimentally, in challenging rich-observation environments, we show that our approach outperforms a state-of-the-art hybrid RL baseline which only relies on off-policy policy optimization, demonstrating the empirical benefit of combining on-policy and off-policy learning. Our code is publicly available at https://github.com/YifeiZhou02/HNPG. Comment: The first two authors contributed equally.

    A Graph Reasoning Network for Multi-turn Response Selection via Customized Pre-training

    We investigate response selection for multi-turn conversation in retrieval-based chatbots. Existing studies focus on matching utterances and responses by calculating a matching score from learned features, which leaves the model with insufficient reasoning ability. In this paper, we propose a graph-reasoning network (GRN) to address the problem. GRN first conducts pre-training based on ALBERT using next utterance prediction and utterance order prediction tasks specifically devised for response selection. These two customized pre-training tasks give our model the ability to capture semantic and chronological dependencies between utterances. We then fine-tune the model on an integrated network with sequence reasoning and graph reasoning structures. The sequence reasoning module conducts inference based on the highly summarized context vector of utterance-response pairs from the global perspective. The graph reasoning module conducts reasoning on the utterance-level graph neural network from the local perspective. Experiments on two conversational reasoning datasets show that our model can dramatically outperform the strong baseline methods and can achieve performance close to human level. Comment: Accepted by AAAI 2021; 10 pages, 6 figures

    Hi4D: 4D Instance Segmentation of Close Human Interaction

    We propose Hi4D, a method and dataset for the automatic analysis of physically close human-human interaction under prolonged contact. Robustly disentangling several in-contact subjects is a challenging task due to occlusions and complex shapes. Hence, existing multi-view systems typically fuse 3D surfaces of close subjects into a single, connected mesh. To address this issue, we leverage i) individually fitted neural implicit avatars and ii) an alternating optimization scheme that refines pose and surface through periods of close proximity, and thus iii) segment the fused raw scans into individual instances. From these instances we compile the Hi4D dataset of 4D textured scans of 20 subject pairs, 100 sequences, and a total of more than 11K frames. Hi4D contains rich interaction-centric annotations in 2D and 3D alongside accurately registered parametric body models. We define varied human pose and shape estimation tasks on this dataset and provide results from state-of-the-art methods on these benchmarks. Comment: Project page: https://yifeiyin04.github.io/Hi4D

    Hybrid RL: Using Both Offline and Online Data Can Make RL Efficient

    We consider a hybrid reinforcement learning setting (Hybrid RL), in which an agent has access to an offline dataset and the ability to collect experience via real-world online interaction. The framework mitigates the challenges that arise in both pure offline and online RL settings, allowing for the design of simple and highly effective algorithms, in both theory and practice. We demonstrate these advantages by adapting the classical Q learning/iteration algorithm to the hybrid setting, which we call Hybrid Q-Learning or Hy-Q. In our theoretical results, we prove that the algorithm is both computationally and statistically efficient whenever the offline dataset supports a high-quality policy and the environment has bounded bilinear rank. Notably, we require no assumptions on the coverage provided by the initial distribution, in contrast with guarantees for policy gradient/iteration methods. In our experimental results, we show that Hy-Q with neural network function approximation outperforms state-of-the-art online, offline, and hybrid RL baselines on challenging benchmarks, including Montezuma's Revenge. Comment: 42 pages, 6 figures. Published at ICLR 2023. Code available at https://github.com/yudasong/Hy
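
The hybrid idea in this abstract, updating a Q function from offline and online transitions together, can be illustrated with a small tabular sketch. This is not the authors' Hy-Q implementation: the toy chain environment, the function names (`hybrid_q_learning`, `chain_step`), the one-offline-plus-one-online update per step, and all hyperparameters are assumptions made here for illustration only.

```python
import random


def hybrid_q_learning(offline_data, step, n_states, n_actions,
                      episodes=200, horizon=10, gamma=0.9, lr=0.5,
                      epsilon=0.1, seed=0):
    """Tabular Q-learning that replays offline and online transitions together.

    offline_data: list of (state, action, reward, next_state, done) tuples.
    step: hypothetical environment function (state, action) -> (reward, next_state, done).
    """
    rng = random.Random(seed)
    q = [[0.0] * n_actions for _ in range(n_states)]
    online_data = []  # transitions collected by interacting with the environment

    def update(transition):
        # Standard one-step Q-learning update on a single stored transition.
        s, a, r, s2, done = transition
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += lr * (target - q[s][a])

    for _ in range(episodes):
        s = 0  # assumed fixed start state
        for _ in range(horizon):
            # Epsilon-greedy action selection for online data collection.
            if rng.random() < epsilon:
                a = rng.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda i: q[s][i])
            r, s2, done = step(s, a)
            online_data.append((s, a, r, s2, done))
            # Assumed mixing rule: one offline and one online sample per step.
            update(rng.choice(offline_data))
            update(rng.choice(online_data))
            if done:
                break
            s = s2
    return q


def chain_step(state, action, n_states=5):
    """Toy chain: action 1 moves right, action 0 moves left; reward 1 at the right end."""
    nxt = min(state + 1, n_states - 1) if action == 1 else max(state - 1, 0)
    done = nxt == n_states - 1
    return (1.0 if done else 0.0), nxt, done


# Offline dataset: one "move right" transition from each non-terminal state.
offline = []
for s in range(4):
    r, s2, done = chain_step(s, 1)
    offline.append((s, 1, r, s2, done))

q_table = hybrid_q_learning(offline, chain_step, n_states=5, n_actions=2)
greedy = [max(range(2), key=lambda a: q_table[s][a]) for s in range(4)]
```

In this toy setup the offline data already covers a good policy (always move right), so the greedy policy read from `q_table` follows it even though purely online exploration from the start state would initially stall.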

    Accelerating-particle-deposition Method for Quickly Evaluating Long-term Performance of Fin-and-tube Heat Exchangers

    The fin-and-tube heat exchanger is the most commonly used heat exchanger type in air-conditioning systems. In actual operation, dust particles carried in the air may partly deposit and form particulate fouling on fins and tubes when the dusty air flows through the heat exchangers. The deposited particles may gradually block the air flow passages and occupy the heat transfer area, which continuously increases the air-side thermal resistance and significantly degrades the heat transfer capacity of heat exchangers during long-term operation. To quickly evaluate the long-term performance of fin-and-tube heat exchangers, this study proposes an accelerating-particle-deposition method that reproduces, in a short time, the particle deposition process of long-running heat exchangers. The idea of the method is to pass dusty air with a high particle concentration through the heat exchangers in an accelerated test, quickly forming particulate fouling with the same weight as that on long-running heat exchangers under the actual operating environment with low particle concentration. The accelerating factor, defined as the ratio of the actual running time to the accelerated testing time, is calculated from the deposition weight of dust particles. The deposition weight is calculated from the relationship of the impact frequency and deposition probability of dust particles with the particle concentration of the dusty air. An experimental apparatus for accelerating the particle deposition process and testing the heat transfer capacity of fin-and-tube heat exchangers is designed. The long-term performance predicted by the proposed method is compared with the actual performance data of heat exchangers after 5-8 years of operation published by the China Quality Certification Center. The comparison shows that the predicted results agree well with the actual operation data, and the mean deviation of the heat transfer capacity is within 10%.
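
The definition of the accelerating factor in this abstract can be written out as a short relation. This is a sketch under the assumption that the deposition rate is roughly constant within each test; the symbols $AF$, $t_{\mathrm{actual}}$, $t_{\mathrm{acc}}$, $\dot{m}_{\mathrm{actual}}$ and $\dot{m}_{\mathrm{acc}}$ are introduced here for illustration and do not come from the abstract.

```latex
\[
AF = \frac{t_{\mathrm{actual}}}{t_{\mathrm{acc}}}, \qquad
\dot{m}_{\mathrm{actual}}\, t_{\mathrm{actual}} = \dot{m}_{\mathrm{acc}}\, t_{\mathrm{acc}}
\;\Longrightarrow\;
AF = \frac{\dot{m}_{\mathrm{acc}}}{\dot{m}_{\mathrm{actual}}}
\]
```

The middle equality expresses the requirement that the accelerated test deposits the same fouling weight as long-term operation, so under the constant-rate assumption the accelerating factor reduces to the ratio of the two deposition rates.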

    Learning Domain Invariant Prompt for Vision-Language Models

    Prompt learning is one of the most effective and trending ways to adapt powerful vision-language foundation models like CLIP to downstream datasets by tuning learnable prompt vectors with very few samples. However, although prompt learning achieves excellent performance on in-domain data, it still faces the major challenge of generalizing to unseen classes and domains. Some existing prompt learning methods tackle this issue by adaptively generating different prompts for different tokens or domains but neglect the ability of learned prompts to generalize to unseen domains. In this paper, we propose a novel prompt learning paradigm, called MetaPrompt, that directly generates a domain-invariant prompt that can be generalized to unseen domains. Specifically, a dual-modality prompt tuning network is proposed to generate prompts for input from both image and text modalities. With a novel asymmetric contrastive loss, the representation from the original pre-trained vision-language model acts as supervision to enhance the generalization ability of the learned prompt. More importantly, we propose a meta-learning-based prompt tuning algorithm that explicitly constrains the task-specific prompt tuned for one domain or class to also achieve good performance in another domain or class. Extensive experiments on 11 datasets for base-to-new generalization and 4 datasets for domain generalization demonstrate that our method consistently and significantly outperforms existing methods. Comment: 12 pages, 6 figures, 5 tables