27 research outputs found

    Analysis of Q-learning with Adaptation and Momentum Restart for Gradient Descent

    Existing convergence analyses of Q-learning mostly focus on vanilla stochastic gradient descent (SGD) type updates. Although Adaptive Moment Estimation (Adam) is commonly used in practical Q-learning algorithms, no convergence guarantee has been provided for Q-learning with this type of update. In this paper, we first characterize the convergence rate of Q-AMSGrad, the Q-learning algorithm with the AMSGrad update (a commonly adopted alternative to Adam for theoretical analysis). To further improve performance, we propose incorporating a momentum restart scheme into Q-AMSGrad, resulting in the so-called Q-AMSGradR algorithm. The convergence rate of Q-AMSGradR is also established. Our experiments on a linear quadratic regulator problem show that the two proposed Q-learning algorithms outperform vanilla Q-learning with SGD updates. The two algorithms also exhibit significantly better performance than the DQN learning method over a batch of Atari 2600 games. Comment: This paper extends the work presented at the 2020 International Joint Conferences on Artificial Intelligence with supplementary material.
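For readers unfamiliar with the update this abstract refers to, the following is a minimal sketch of one AMSGrad step applied to a tabular Q-learning semi-gradient. It is an illustration under stated assumptions, not the paper's Q-AMSGrad: the function names, the flattened Q-table layout, and the default hyperparameters are hypothetical, and the momentum restart of Q-AMSGradR is omitted because the abstract does not describe it.

```python
import math


def q_semi_gradient(q, s, a, r, s_next, gamma, n_actions):
    """Semi-gradient of 0.5 * (Q(s,a) - target)^2 for a tabular Q.

    `q` is a flat list indexed by s * n_actions + a (an assumed layout).
    The target r + gamma * max_a' Q(s', a') is treated as a constant.
    """
    next_values = q[s_next * n_actions:(s_next + 1) * n_actions]
    target = r + gamma * max(next_values)
    grad = [0.0] * len(q)
    grad[s * n_actions + a] = q[s * n_actions + a] - target
    return grad


def amsgrad_step(theta, grad, state, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AMSGrad update; `state` is the tuple (m, v, v_hat) of lists."""
    m, v, v_hat = state
    new_theta, new_m, new_v, new_v_hat = [], [], [], []
    for th, g, mi, vi, vhi in zip(theta, grad, m, v, v_hat):
        mi = beta1 * mi + (1.0 - beta1) * g        # first-moment estimate
        vi = beta2 * vi + (1.0 - beta2) * g * g    # second-moment estimate
        vhi = max(vhi, vi)                         # running max: the AMSGrad change to Adam
        new_theta.append(th - lr * mi / (math.sqrt(vhi) + eps))
        new_m.append(mi)
        new_v.append(vi)
        new_v_hat.append(vhi)
    return new_theta, (new_m, new_v, new_v_hat)


# Usage: one update on a toy problem with 2 states and 2 actions.
q = [0.0] * 4
state = ([0.0] * 4, [0.0] * 4, [0.0] * 4)
g = q_semi_gradient(q, s=0, a=1, r=1.0, s_next=1, gamma=0.9, n_actions=2)
q, state = amsgrad_step(q, g, state)
```

The running maximum `v_hat` keeps the effective per-coordinate stepsize from increasing, which is the property that makes AMSGrad more convenient than Adam for convergence analysis.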

    Non-asymptotic Convergence of Adam-type Reinforcement Learning Algorithms under Markovian Sampling

    Despite the wide applications of Adam in reinforcement learning (RL), the theoretical convergence of Adam-type RL algorithms has not been established. This paper provides the first such convergence analysis for two fundamental RL algorithms of policy gradient (PG) and temporal difference (TD) learning that incorporate AMSGrad updates (a standard alternative of Adam in theoretical analysis), referred to as PG-AMSGrad and TD-AMSGrad, respectively. Moreover, our analysis focuses on Markovian sampling for both algorithms. We show that under general nonlinear function approximation, PG-AMSGrad with a constant stepsize converges to a neighborhood of a stationary point at the rate of $\mathcal{O}(1/T)$ (where $T$ denotes the number of iterations), and with a diminishing stepsize converges exactly to a stationary point at the rate of $\mathcal{O}(\log^2 T/\sqrt{T})$. Furthermore, under linear function approximation, TD-AMSGrad with a constant stepsize converges to a neighborhood of the global optimum at the rate of $\mathcal{O}(1/T)$, and with a diminishing stepsize converges exactly to the global optimum at the rate of $\mathcal{O}(\log T/\sqrt{T})$. Our study develops new techniques for analyzing the Adam-type RL algorithms under Markovian sampling.

    Influence of long-term fertilization on soil aggregates stability and organic carbon occurrence characteristics in karst yellow soil of Southwest China

    Research has long focused on soil organic carbon (SOC) and soil aggregate stability. However, the effects of different long-term fertilization regimes on the composition of yellow soil aggregates and on the occurrence characteristics of organic carbon in the karst region of Southwest China remain unclear. Based on a 25-year long-term fixed-site field experiment on yellow soil, soil samples were collected from the 0–20 cm layer of plots under different fertilizer treatments (CK: unfertilized control; NPK: chemical fertilizer; 1/4 M + 3/4 NP: 25% of chemical fertilizer replaced by organic fertilizer; 1/2 M + 1/2 NP: 50% of chemical fertilizer replaced by organic fertilizer; and M: organic fertilizer). Aggregate stability, total organic carbon (TOC), easily oxidized organic carbon (EOC), carbon preservation capacity (CPC), and carbon pool management index (CPMI) were analyzed in water-stable aggregates. The mean weight diameter (MWD), geometric mean diameter (GWD), and macro-aggregate content (R0.25) of water-stable aggregates followed the order M > CK > 1/2 M + 1/2 NP > 1/4 M + 3/4 NP > NPK. Compared with CK, the MWD, GWD, and R0.25 under NPK decreased significantly, by 32.6%, 43.2%, and 7.0 percentage points, respectively. The TOC and EOC contents in aggregates of different particle sizes followed the order M > 1/2 M + 1/2 NP > 1/4 M + 3/4 NP > CK > NPK and increased with the rate of organic fertilizer. In macro-aggregates and bulk soil, the CPC of TOC (TOPC) and of EOC (EOPC), as well as CPMI, followed the order M > 1/2 M + 1/2 NP > 1/4 M + 3/4 NP > CK > NPK, whereas the opposite was true for micro-aggregates. In bulk soil treated with organic fertilizer, the TOPC, EOPC, and CPMI increased significantly, by 27.4%–53.8%, 29.7%–78.1%, and 29.7–82.2 percentage points, respectively, compared with the NPK treatment. Redundancy analysis and stepwise regression analysis showed that TOC was the main physicochemical factor affecting aggregate stability, and that TOPC in micro-aggregates had the most direct impact. In conclusion, the decrease in SOC under long-term application of chemical fertilizer was caused primarily by the loss of organic carbon in macro-aggregates. Applying organic fertilizer to increase aggregate stability and the storage and activity of SOC in macro-aggregates is an essential way to increase soil nutrient supply and improve yellow soil productivity.
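The stability indices named in this abstract are commonly computed from sieved size fractions: MWD as the mass-weighted mean of class diameters, the geometric mean diameter (GWD in the abstract) as the exponential of the mass-weighted mean log diameter, and R0.25 as the mass share of aggregates larger than 0.25 mm. The sketch below follows those standard definitions. The sieve classes and masses are invented for illustration, and the function name is hypothetical; the paper's actual size classes and data are not given here.

```python
import math


def aggregate_stability(size_classes):
    """Return (MWD, GMD, R0.25) from (lower_mm, upper_mm, mass) tuples.

    Each class is represented by the midpoint of its sieve bounds, and a
    class counts as macro-aggregate when its lower bound is >= 0.25 mm.
    """
    total_mass = sum(mass for _, _, mass in size_classes)
    mwd = 0.0
    log_sum = 0.0
    r025 = 0.0
    for lower, upper, mass in size_classes:
        w = mass / total_mass              # mass fraction of this class
        mean_d = (lower + upper) / 2.0     # representative diameter in mm
        mwd += mean_d * w
        log_sum += w * math.log(mean_d)
        if lower >= 0.25:
            r025 += w
    return mwd, math.exp(log_sum), r025


# Hypothetical sieve data (mm bounds, grams retained); not from the paper.
sample = [(2.0, 5.0, 25.0), (0.25, 2.0, 25.0), (0.0, 0.25, 50.0)]
mwd, gmd, r025 = aggregate_stability(sample)
```

Because the geometric mean never exceeds the arithmetic mean, the second returned value is always at most the first, which is a quick sanity check on any computed pair.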

    Finite-Time Analysis for Double Q-learning

    Advances in Neural Information Processing Systems 3

    Effect of Land Expropriation on Land-Lost Farmers’ Health: Empirical Evidence from Rural China

    With rapid urbanization and industrial development, China has witnessed substantial land acquisition. Using rural household survey data, this paper examines the impact of land expropriation on land-lost farmers’ self-reported health with an ordered probit model and investigates the possible mechanisms. The results show that land expropriation exposes land-lost farmers to higher health risks, and that their health status is significantly worse than that of farmers who retain their land. Land expropriation has a negative impact on land-lost farmers’ health through income effects and psychological effects. The health status of land-lost farmers can be improved by amending current land requisition policies, increasing the amount of compensation, improving the earning capacity of land-lost farmers, and strengthening mental health education.
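To make the ordered probit model mentioned in this abstract concrete, the sketch below computes category probabilities for an ordered outcome such as self-reported health: a latent index is compared against ordered cutpoints, and each category's probability is a difference of standard normal CDF values. The cutpoints, the index values, and the function names are hypothetical; they are not estimates from the paper.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(linear_index, cutpoints):
    """P(y = k) = Phi(c_k - x'beta) - Phi(c_{k-1} - x'beta).

    `linear_index` is x'beta; `cutpoints` must be strictly increasing.
    Returns one probability per ordered category, lowest first.
    """
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    cdf = [norm_cdf(b - linear_index) for b in bounds]
    return [cdf[k + 1] - cdf[k] for k in range(len(bounds) - 1)]


# Three health categories (poor, fair, good) with made-up cutpoints.
baseline = ordered_probit_probs(0.0, [-1.0, 1.0])
# A negative shift in the latent index, e.g. from an assumed negative
# coefficient on a land-expropriation indicator, moves mass toward "poor".
shifted = ordered_probit_probs(-0.5, [-1.0, 1.0])
```

In this setup a negative coefficient lowers the latent health index, so the probability of the lowest category rises and that of the highest falls, which is the direction of effect the abstract reports.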

    Deterministic Policy Gradient: Convergence Analysis

    The Conference on Uncertainty in Artificial Intelligence (UAI

    Finite-time theory of momentum Q-learning

    37th conference on uncertainty in artificial intelligenc