3 research outputs found

    Frustratingly Easy Regularization on Representation Can Boost Deep Reinforcement Learning

    Full text link
    Deep reinforcement learning (DRL) promises that an agent can learn a good policy from high-dimensional observations, while representation learning removes irrelevant and redundant information and retains what is pertinent. In this work, we demonstrate that the learned representations of the Q-network and its target Q-network should, in theory, satisfy a favorable distinguishable representation property: there exists an upper bound on the representation similarity of the value functions at two adjacent time steps in a typical DRL setting. However, through illustrative experiments, we show that a learned DRL agent may violate this property and arrive at a sub-optimal policy. We therefore propose a simple yet effective regularizer, Policy Evaluation with Easy Regularization on Representation (PEER), which maintains the distinguishable representation property via explicit regularization on internal representations, and we provide a convergence-rate guarantee for PEER. Implementing PEER requires only one line of code. Our experiments demonstrate that incorporating PEER into DRL can significantly improve performance and sample efficiency: PEER achieves state-of-the-art performance on all 4 environments on PyBullet, 9 out of 12 tasks on DMControl, and 19 out of 26 games on Atari. To the best of our knowledge, PEER is the first work to study the inherent representation property of the Q-network and its target. Our code is available at https://sites.google.com/view/peer-cvpr2023/.
    Comment: Accepted to CVPR23. Website: https://sites.google.com/view/peer-cvpr2023
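
    The abstract leaves the regularizer itself implicit; the description suggests penalizing the similarity between the Q-network's internal representation of the current state and the target network's representation of the next state. A minimal PyTorch sketch of that idea (the representation() accessor and the coefficient beta are illustrative assumptions, not the paper's exact code):

        import torch
        import torch.nn.functional as F

        def peer_td_loss(q_net, target_net, batch, gamma=0.99, beta=5e-4):
            # batch: states, integer actions (B, 1), rewards, next states,
            # done flags -- shapes assumed for illustration.
            s, a, r, s_next, done = batch

            # Standard TD target from the frozen target network.
            with torch.no_grad():
                q_next = target_net(s_next).max(dim=1, keepdim=True).values
                target = r + gamma * (1.0 - done) * q_next
            td_loss = F.mse_loss(q_net(s).gather(1, a), target)

            # PEER-style term: penalize the inner product between the two
            # networks' adjacent-step representations so they stay
            # distinguishable (hypothetical representation() accessor
            # returning the penultimate-layer features).
            phi = q_net.representation(s)
            with torch.no_grad():
                phi_next = target_net.representation(s_next)
            peer_term = (phi * phi_next).sum(dim=1).mean()

            return td_loss + beta * peer_term

    Note that the penalty is a single extra line on top of the usual TD loss, consistent with the abstract's one-line-of-code claim.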

    Time-inconsistent objectives in reinforcement learning

    No full text
    In reinforcement learning, one of the most intriguing and long-standing problems is how to assign credit to historical events efficiently and meaningfully. Within temporal credit assignment, time inconsistency is a challenging sub-problem that was noticed long ago but still lacks systematic treatment. The goal of this work is to find efficient algorithms that converge to equilibrium policies in the presence of time-inconsistent objectives. We first give a brief introduction to reinforcement learning and control theory; we then define the time-inconsistency problem, both illustratively and formally. Next, we propose a general backward-update framework based on game theory and prove that it finds the equilibrium control under time inconsistency. We also review and implement a forward-update algorithm that finds the equilibrium control in the special case of hyperbolic discounting but has many limitations. The literature review covers other time-inconsistent settings and algorithms that address the efficient temporal credit assignment problem. Finally, we conclude the report and point out future directions.
    Bachelor of Science in Mathematical Science
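
    As a concrete illustration of the backward, game-theoretic idea, here is a sketch for a finite-horizon tabular MDP under hyperbolic discounting, in which each time step is a separate "player" best-responding to the already-fixed policies of later steps. This is an illustrative construction under those assumptions, not necessarily the report's exact algorithm:

        import numpy as np

        def hyperbolic_weights(n, k=1.0):
            # Hyperbolic discount weights w(d) = 1 / (1 + k*d), d = 0..n-1.
            return 1.0 / (1.0 + k * np.arange(n))

        def backward_equilibrium(P, R, horizon, k=1.0):
            # P: (A, S, S) transition tensor, R: (S, A) reward matrix.
            A, S, _ = P.shape
            w = hyperbolic_weights(horizon, k)
            pi = np.zeros((horizon, S), dtype=int)
            # G[d, s] = expected reward collected d steps ahead when starting
            # in state s at the next step and following the later policies.
            G = np.zeros((0, S))
            for t in reversed(range(horizon)):
                # Player t's objective: immediate reward plus its own
                # hyperbolically weighted view of the fixed future.
                Q = w[0] * R.T                      # Q[a, s]
                for d in range(G.shape[0]):
                    Q += w[d + 1] * (P @ G[d])      # (A,S,S) @ (S,) -> (A,S)
                pi[t] = Q.argmax(axis=0)
                # Fold step t in: its realized reward becomes delay 0, and
                # earlier delays shift out one transition under pi[t].
                M = P[pi[t], np.arange(S), :]       # M[s, s'] = P(s'|s, pi_t(s))
                G = np.vstack([R[np.arange(S), pi[t]][None, :], G @ M.T])
            return pi

    Because the weights w(d) = 1/(1 + k*d) are not exponential, the per-step players generally disagree about the future, so the backward pass returns a (possibly time-varying) equilibrium policy rather than a globally optimal one.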

    Estimating Regional PM2.5 Concentrations in China Using a Global-Local Regression Model Considering Global Spatial Autocorrelation and Local Spatial Heterogeneity

    No full text
    Linear regression models are commonly used for estimating ground PM2.5 concentrations, but the global spatial autocorrelation and local spatial heterogeneity of the PM2.5 distribution are either ignored or only partially considered in such models. Taking both into consideration, a global-local regression (GLR) model is proposed for estimating ground PM2.5 concentrations in the Yangtze River Delta (YRD) and Beijing-Tianjin-Hebei (BTH) regions of China, based on aerosol optical depth, meteorological, remote sensing, and pollution-source data. To capture global spatial autocorrelation, the GLR model extracts global factors by the eigenvector spatial filtering (ESF) method, and it feeds the subset of eigenvectors that passes further filtering into the geographically weighted regression (GWR) method to address local spatial heterogeneity. Comprehensive results show that the GLR model outperforms the ordinary GWR and ESF models and performs best at the monthly, seasonal, and annual levels. The average adjusted R2 of the monthly GLR model in the YRD region (the BTH region) is 0.620 (0.853), which is 8.0% and 7.4% (6.8% and 7.0%) higher than that of the monthly ESF and GWR models, respectively. The average cross-validation root mean square error of the monthly GLR model is 7.024 μg/m³ in the YRD region and 9.499 μg/m³ in the BTH region, lower than that of the ESF and GWR models. The GLR model effectively addresses spatial autocorrelation and spatial heterogeneity, overcoming both the ordinary GWR model's over-focus on local features and the ordinary ESF model's poor local performance. Overall, the GLR model, with good spatial and temporal applicability, is a promising method for estimating PM2.5 concentrations.
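
    For readers unfamiliar with the ESF building block, the sketch below shows how Moran eigenvectors can be extracted from a spatial weights matrix and appended as global regressors. The weights matrix, the selection cutoff top, and the least-squares global stage are illustrative placeholders, not the paper's exact configuration:

        import numpy as np

        def moran_eigenvectors(W, top=10):
            # Eigenvectors of the doubly centered spatial weights matrix
            # M C M; their eigenvalues are proportional to the Moran's I of
            # the corresponding map patterns, so the leading ones capture
            # global positive spatial autocorrelation.
            n = W.shape[0]
            C = (W + W.T) / 2.0                  # symmetrize the weights
            M = np.eye(n) - np.ones((n, n)) / n  # centering projector
            vals, vecs = np.linalg.eigh(M @ C @ M)
            order = np.argsort(vals)[::-1]       # largest eigenvalue first
            return vecs[:, order[:top]]

        def esf_global_stage(X, y, W, top=10):
            # Global stage: ordinary least squares with the selected Moran
            # eigenvectors as extra regressors to absorb global
            # spatial autocorrelation.
            E = moran_eigenvectors(W, top)
            Xg = np.column_stack([np.ones(len(y)), X, E])
            beta, *_ = np.linalg.lstsq(Xg, y, rcond=None)
            return beta, y - Xg @ beta           # coefficients, residuals

    In the GLR model the retained eigenvectors are further filtered and then enter a GWR stage that handles local spatial heterogeneity; the sketch stops at the global stage.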