Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent
Existing convergence analyses of Q-learning mostly focus on vanilla stochastic gradient descent (SGD) type updates. Although Adaptive Moment Estimation (Adam) is commonly used in practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate of Q-AMSGrad, the Q-learning algorithm with AMSGrad updates (a commonly adopted alternative to Adam for theoretical analysis). To further improve performance, we propose incorporating a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
Comment: This paper extends the work presented at the 2020 International Joint Conference on Artificial Intelligence with supplementary material.
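To make the described update rule concrete, here is a minimal Python sketch (assuming a Gymnasium-style discrete environment) of tabular Q-learning driven by AMSGrad-style moment estimates with a periodic momentum restart. The hyperparameters, uniform exploration policy, and restart period are illustrative assumptions, not the authors' exact Q-AMSGradR algorithm.

```python
import numpy as np

def q_amsgrad_r_sketch(env, num_steps=10_000, alpha=1e-2, beta1=0.9,
                       beta2=0.999, eps=1e-8, gamma=0.99, restart_period=100):
    """Tabular Q-learning driven by AMSGrad-style moments with periodic
    momentum restart (illustrative hyperparameters, Gymnasium-style env)."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    m = np.zeros_like(Q)      # first-moment estimate
    v = np.zeros_like(Q)      # second-moment estimate
    v_hat = np.zeros_like(Q)  # AMSGrad running maximum of v

    s, _ = env.reset()
    for t in range(1, num_steps + 1):
        a = env.action_space.sample()  # illustrative behavior policy: uniform exploration
        s_next, r, terminated, truncated, _ = env.step(a)

        # Semi-gradient of 0.5 * (TD error)^2 with respect to Q[s, a]
        td_error = r + gamma * (0.0 if terminated else Q[s_next].max()) - Q[s, a]
        g = np.zeros_like(Q)
        g[s, a] = -td_error

        # AMSGrad moment updates and parameter step
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        Q -= alpha * m / (np.sqrt(v_hat) + eps)

        # Momentum restart: periodically reset the first moment
        if t % restart_period == 0:
            m = np.zeros_like(Q)

        s = env.reset()[0] if (terminated or truncated) else s_next
    return Q
```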
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Despite the wide applications of Adam in reinforcement learning (RL), the
theoretical convergence of Adam-type RL algorithms has not been established.
This paper provides the first such convergence analysis for two fundamental RL
algorithms of policy gradient (PG) and temporal difference (TD) learning that
incorporate AMSGrad updates (a standard alternative of Adam in theoretical
analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover,
our analysis focuses on Markovian sampling for both algorithms. We show that
under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(1/T)$ (where $T$ denotes the number of iterations), and with a diminishing stepsize converges exactly to a stationary point at the rate of $\mathcal{O}(\log^2 T/\sqrt{T})$. Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(1/T)$, and with a diminishing stepsize converges exactly to the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$. Our study develops new techniques for analyzing Adam-type RL algorithms under Markovian sampling.
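As a rough companion to the TD-AMSGrad result above, the Python sketch below applies AMSGrad-style moment estimates to a TD(0) semi-gradient under linear function approximation. The sampling interface, feature map, and stepsize schedule are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def td_amsgrad_sketch(sample_transition, phi, dim, num_steps=10_000,
                      alpha=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, gamma=0.99):
    """TD(0) with a linear value function V(s) = phi(s) @ theta, where the
    semi-gradient is passed through AMSGrad-style moment estimates.

    `sample_transition()` is assumed to return (s, r, s_next) drawn from a
    single Markovian trajectory; `phi(s)` returns a feature vector of length `dim`."""
    theta = np.zeros(dim)
    m = np.zeros(dim)      # first moment
    v = np.zeros(dim)      # second moment
    v_hat = np.zeros(dim)  # AMSGrad running maximum of v

    for t in range(1, num_steps + 1):
        s, r, s_next = sample_transition()
        td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        g = -td_error * phi(s)  # semi-gradient of 0.5 * (TD error)^2

        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)

        # Diminishing stepsize; using a constant alpha instead gives
        # convergence to a neighborhood, as described in the abstract.
        theta -= (alpha / np.sqrt(t)) * m / (np.sqrt(v_hat) + eps)
    return theta
```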
Influence of long-term fertilization on soil aggregates stability and organic carbon occurrence characteristics in karst yellow soil of Southwest China
Research has long focused on soil organic carbon (SOC) and soil aggregate stability. However, the effects of different long-term fertilization regimes on the composition of yellow soil aggregates and on the occurrence characteristics of organic carbon in the karst region of Southwest China remain unclear. Based on a 25-year long-term field experiment on yellow soil, soil samples were collected from the 0–20 cm layer under five fertilization treatments (CK: unfertilized control; NPK: chemical fertilizer only; 1/4M + 3/4NP: 25% of chemical fertilizer replaced by organic fertilizer; 1/2M + 1/2NP: 50% of chemical fertilizer replaced by organic fertilizer; and M: organic fertilizer only). Aggregate stability and the total organic carbon (TOC), easily oxidized organic carbon (EOC), carbon preservation capacity (CPC), and carbon pool management index (CPMI) of water-stable aggregates were analyzed. The results showed that the mean weight diameter (MWD), geometric mean diameter (GMD), and macro-aggregate content (R0.25) of water-stable aggregates followed the order M > CK > 1/2M + 1/2NP > 1/4M + 3/4NP > NPK. Compared with CK, the MWD, GMD, and R0.25 of the NPK treatment decreased significantly by 32.6%, 43.2%, and 7.0 percentage points, respectively. The TOC and EOC contents in aggregates of different particle sizes followed the order M > 1/2M + 1/2NP > 1/4M + 3/4NP > CK > NPK and increased with the proportion of organic fertilizer applied. In macro-aggregates and bulk soil, the CPC of TOC (TOPC) and of EOC (EOPC), as well as the CPMI, followed the order M > 1/2M + 1/2NP > 1/4M + 3/4NP > CK > NPK, whereas the opposite held for micro-aggregates. In bulk soil treated with organic fertilizer, the TOPC, EOPC, and CPMI increased significantly by 27.4%–53.8%, 29.7%–78.1%, and 29.7–82.2 percentage points, respectively, compared with the NPK treatment. Redundancy analysis and stepwise regression analysis showed that TOC was the main physicochemical factor affecting aggregate stability, with the TOPC in micro-aggregates having the most direct impact. In conclusion, the primary cause of the decrease in SOC under long-term application of chemical fertilizer alone was the loss of organic carbon in macro-aggregates. Applying organic fertilizer to increase aggregate stability and the storage and activity of SOC in macro-aggregates is an essential way to enhance soil nutrient supply and improve yellow soil productivity.
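For readers unfamiliar with the stability indices mentioned above, the short Python sketch below shows how MWD, GMD, and R0.25 are conventionally computed from water-stable aggregate size fractions; the size classes and fractions in the example are invented for illustration and are not data from this study.

```python
import numpy as np

def aggregate_stability_indices(mean_diameters_mm, weight_fractions):
    """Compute MWD, GMD, and R0.25 from water-stable aggregate size fractions.

    mean_diameters_mm: mean diameter of each sieve size class (mm)
    weight_fractions:  proportion of soil mass in each class (should sum to 1)
    """
    x = np.asarray(mean_diameters_mm, dtype=float)
    w = np.asarray(weight_fractions, dtype=float)

    mwd = float(np.sum(x * w))                            # mean weight diameter (mm)
    gmd = float(np.exp(np.sum(w * np.log(x)) / w.sum()))  # geometric mean diameter (mm)
    r025 = float(w[x > 0.25].sum())                       # macro-aggregate (>0.25 mm) content
    return mwd, gmd, r025

# Invented size-class midpoints and mass fractions, for illustration only
diameters_mm = [3.5, 1.5, 0.625, 0.375, 0.125]
fractions = [0.20, 0.25, 0.20, 0.15, 0.20]
print(aggregate_stability_indices(diameters_mm, fractions))
```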
Finite-Time Analysis for Double Q-learning
Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Effect of Land Expropriation on Land-Lost Farmers’ Health: Empirical Evidence from Rural China
With rapid urbanization and industrial development, China has witnessed substantial land acquisition. Using rural household survey data, this paper examines the impact of land expropriation on land-lost farmers’ self-reported health with an ordered probit model and investigates the possible mechanisms. The results show that land expropriation exposes land-lost farmers to higher health risks, and their health status is significantly worse than that of farmers who still hold land. Land expropriation negatively affects land-lost farmers’ health through income effects and psychological effects. The health status of land-lost farmers can be improved by amending current land requisition policies, increasing the amount of compensation, improving the earning capacity of land-lost farmers, and strengthening mental health education.
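As a purely illustrative sketch of the estimation approach named above (not the authors' specification), an ordered probit for self-reported health can be fit in Python with statsmodels as follows; the file name and column names are hypothetical.

```python
# Hypothetical file and column names; a generic ordered probit specification.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("rural_household_survey.csv")

# self_reported_health: ordered categories, e.g. 1 (poor) ... 5 (excellent)
model = OrderedModel(
    df["self_reported_health"],
    df[["land_expropriated", "age", "income", "education"]],
    distr="probit",
)
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```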
Deterministic Policy Gradient: Convergence Analysis
The Conference on Uncertainty in Artificial Intelligence (UAI)
Finite-time theory of momentum Q-learning
37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)