Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent
Existing convergence analyses of Q-learning mostly focus on vanilla stochastic gradient descent (SGD) type updates. Although Adaptive Moment Estimation (Adam) is commonly used in practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate of Q-AMSGrad, the Q-learning algorithm with AMSGrad updates (a commonly adopted alternative to Adam for theoretical analysis). To further improve performance, we propose incorporating a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games.
Comment: This paper extends the work presented at the 2020 International Joint Conference on Artificial Intelligence with supplementary material.
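To make the described update rule concrete, here is a minimal Python sketch (assuming a Gymnasium-style discrete environment) of tabular Q-learning driven by AMSGrad-style moment estimates with a periodic momentum restart. The hyperparameters, uniform exploration policy, and restart period are illustrative assumptions, not the authors' exact Q-AMSGradR algorithm.

```python
import numpy as np

def q_amsgrad_r_sketch(env, num_steps=10_000, alpha=1e-2, beta1=0.9,
                       beta2=0.999, eps=1e-8, gamma=0.99, restart_period=100):
    """Tabular Q-learning driven by AMSGrad-style moments with periodic
    momentum restart (illustrative hyperparameters, Gymnasium-style env)."""
    Q = np.zeros((env.observation_space.n, env.action_space.n))
    m = np.zeros_like(Q)      # first-moment estimate
    v = np.zeros_like(Q)      # second-moment estimate
    v_hat = np.zeros_like(Q)  # AMSGrad running maximum of v

    s, _ = env.reset()
    for t in range(1, num_steps + 1):
        a = env.action_space.sample()  # illustrative behavior policy: uniform exploration
        s_next, r, terminated, truncated, _ = env.step(a)

        # Semi-gradient of 0.5 * (TD error)^2 with respect to Q[s, a]
        td_error = r + gamma * (0.0 if terminated else Q[s_next].max()) - Q[s, a]
        g = np.zeros_like(Q)
        g[s, a] = -td_error

        # AMSGrad moment updates and parameter step
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        Q -= alpha * m / (np.sqrt(v_hat) + eps)

        # Momentum restart: periodically reset the first moment
        if t % restart_period == 0:
            m = np.zeros_like(Q)

        s = env.reset()[0] if (terminated or truncated) else s_next
    return Q
```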
Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling
Despite the wide applications of Adam in reinforcement learning (RL), the
theoretical convergence of Adam-type RL algorithms has not been established.
This paper provides the first such convergence analysis for two fundamental RL
algorithms of policy gradient (PG) and temporal difference (TD) learning that
incorporate AMSGrad updates (a standard alternative of Adam in theoretical
analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover,
our analysis focuses on Markovian sampling for both algorithms. We show that
under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(1/T)$ (where $T$ denotes the number of iterations), and with a diminishing stepsize converges exactly to a stationary point at the rate of $\mathcal{O}(\log^2 T/\sqrt{T})$. Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(1/T)$, and with a diminishing stepsize converges exactly to the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$. Our study develops new techniques for analyzing Adam-type RL algorithms under Markovian sampling.
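As a rough companion to the TD-AMSGrad result above, the Python sketch below applies AMSGrad-style moment estimates to a TD(0) semi-gradient under linear function approximation. The sampling interface, feature map, and stepsize schedule are assumptions for illustration, not the paper's exact construction.

```python
import numpy as np

def td_amsgrad_sketch(sample_transition, phi, dim, num_steps=10_000,
                      alpha=1e-2, beta1=0.9, beta2=0.999, eps=1e-8, gamma=0.99):
    """TD(0) with a linear value function V(s) = phi(s) @ theta, where the
    semi-gradient is passed through AMSGrad-style moment estimates.

    `sample_transition()` is assumed to return (s, r, s_next) drawn from a
    single Markovian trajectory; `phi(s)` returns a feature vector of length `dim`."""
    theta = np.zeros(dim)
    m = np.zeros(dim)      # first moment
    v = np.zeros(dim)      # second moment
    v_hat = np.zeros(dim)  # AMSGrad running maximum of v

    for t in range(1, num_steps + 1):
        s, r, s_next = sample_transition()
        td_error = r + gamma * phi(s_next) @ theta - phi(s) @ theta
        g = -td_error * phi(s)  # semi-gradient of 0.5 * (TD error)^2

        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)

        # Diminishing stepsize; using a constant alpha instead gives
        # convergence to a neighborhood, as described in the abstract.
        theta -= (alpha / np.sqrt(t)) * m / (np.sqrt(v_hat) + eps)
    return theta
```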
Influence of long-term fertilization on soil aggregates stability and organic carbon occurrence characteristics in karst yellow soil of Southwest China
Research has long focused on soil organic carbon (SOC) and soil aggregate stability. However, the effects of different long-term fertilization regimes on the composition of yellow soil aggregates and on the occurrence characteristics of organic carbon in the karst region of Southwest China remain unclear. Based on a 25-year long-term field experiment on yellow soil, soil samples were collected from the 0–20 cm layer under five fertilization treatments (CK: unfertilized control; NPK: chemical fertilizer only; 1/4M + 3/4NP: 25% of chemical fertilizer replaced by organic fertilizer; 1/2M + 1/2NP: 50% of chemical fertilizer replaced by organic fertilizer; and M: organic fertilizer only). Aggregate stability and the total organic carbon (TOC), easily oxidized organic carbon (EOC), carbon preservation capacity (CPC), and carbon pool management index (CPMI) of water-stable aggregates were analyzed. The results showed that the mean weight diameter (MWD), geometric mean diameter (GMD), and macro-aggregate content (R0.25) of water-stable aggregates followed the order M > CK > 1/2M + 1/2NP > 1/4M + 3/4NP > NPK. Compared with CK, the MWD, GMD, and R0.25 of the NPK treatment decreased significantly by 32.6%, 43.2%, and 7.0 percentage points, respectively. The TOC and EOC contents in aggregates of different particle sizes followed the order M > 1/2M + 1/2NP > 1/4M + 3/4NP > CK > NPK and increased with the proportion of organic fertilizer applied. In macro-aggregates and bulk soil, the CPC of TOC (TOPC) and of EOC (EOPC), as well as the CPMI, followed the order M > 1/2M + 1/2NP > 1/4M + 3/4NP > CK > NPK, whereas the opposite held for micro-aggregates. In bulk soil treated with organic fertilizer, the TOPC, EOPC, and CPMI increased significantly by 27.4%–53.8%, 29.7%–78.1%, and 29.7–82.2 percentage points, respectively, compared with the NPK treatment. Redundancy analysis and stepwise regression analysis showed that TOC was the main physicochemical factor affecting aggregate stability, with the TOPC in micro-aggregates having the most direct impact. In conclusion, the primary cause of the decrease in SOC under long-term application of chemical fertilizer alone was the loss of organic carbon in macro-aggregates. Applying organic fertilizer to increase aggregate stability and the storage and activity of SOC in macro-aggregates is an essential way to enhance soil nutrient supply and improve yellow soil productivity.
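For readers unfamiliar with the stability indices mentioned above, the short Python sketch below shows how MWD, GMD, and R0.25 are conventionally computed from water-stable aggregate size fractions; the size classes and fractions in the example are invented for illustration and are not data from this study.

```python
import numpy as np

def aggregate_stability_indices(mean_diameters_mm, weight_fractions):
    """Compute MWD, GMD, and R0.25 from water-stable aggregate size fractions.

    mean_diameters_mm: mean diameter of each sieve size class (mm)
    weight_fractions:  proportion of soil mass in each class (should sum to 1)
    """
    x = np.asarray(mean_diameters_mm, dtype=float)
    w = np.asarray(weight_fractions, dtype=float)

    mwd = float(np.sum(x * w))                            # mean weight diameter (mm)
    gmd = float(np.exp(np.sum(w * np.log(x)) / w.sum()))  # geometric mean diameter (mm)
    r025 = float(w[x > 0.25].sum())                       # macro-aggregate (>0.25 mm) content
    return mwd, gmd, r025

# Invented size-class midpoints and mass fractions, for illustration only
diameters_mm = [3.5, 1.5, 0.625, 0.375, 0.125]
fractions = [0.20, 0.25, 0.20, 0.15, 0.20]
print(aggregate_stability_indices(diameters_mm, fractions))
```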
Finite-Time Analysis for Double Q-learning
Advances in Neural Information Processing Systems 33 (NeurIPS 2020)
Effect of Land Expropriation on Land-Lost Farmers’ Health: Empirical Evidence from Rural China
With rapid urbanization and industrial development, China has witnessed substantial land acquisition. Using rural household survey data, this paper examines the impact of land expropriation on land-lost farmers’ self-reported health with an ordered probit model and investigates the possible mechanisms. The results show that land expropriation exposes land-lost farmers to higher health risks, and their health status is significantly worse than that of farmers who still hold land. Land expropriation negatively affects land-lost farmers’ health through income effects and psychological effects. The health status of land-lost farmers can be improved by amending current land requisition policies, increasing the amount of compensation, improving the earning capacity of land-lost farmers, and strengthening mental health education.
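As a purely illustrative sketch of the estimation approach named above (not the authors' specification), an ordered probit for self-reported health can be fit in Python with statsmodels as follows; the file name and column names are hypothetical.

```python
# Hypothetical file and column names; a generic ordered probit specification.
import pandas as pd
from statsmodels.miscmodels.ordinal_model import OrderedModel

df = pd.read_csv("rural_household_survey.csv")

# self_reported_health: ordered categories, e.g. 1 (poor) ... 5 (excellent)
model = OrderedModel(
    df["self_reported_health"],
    df[["land_expropriated", "age", "income", "education"]],
    distr="probit",
)
result = model.fit(method="bfgs", disp=False)
print(result.summary())
```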
Deterministic Policy Gradient: Convergence Analysis
The Conference on Uncertainty in Artificial Intelligence (UAI)
Finite-time theory of momentum Q-learning
37th Conference on Uncertainty in Artificial Intelligence (UAI 2021)