
    Concentration of Contractive Stochastic Approximation: Additive and Multiplicative Noise

    In this work, we study the concentration behavior of a stochastic approximation (SA) algorithm under a contractive operator with respect to an arbitrary norm. We consider two settings where the iterates are potentially unbounded: (1) bounded multiplicative noise, and (2) additive sub-Gaussian noise. We obtain maximal concentration inequalities on the convergence errors, and show that these errors have sub-Gaussian tails in the additive noise setting and super-polynomial tails (faster than polynomial decay) in the multiplicative noise setting. In addition, we provide an impossibility result showing that it is in general not possible to achieve sub-exponential tails for SA with multiplicative noise. To establish these results, we develop a novel bootstrapping argument that involves bounding the moment generating function of the generalized Moreau envelope of the error and constructing an exponential supermartingale to enable the use of Ville's maximal inequality. To demonstrate the applicability of our theoretical results, we use them to provide maximal concentration bounds for a large class of reinforcement learning algorithms, including but not limited to on-policy TD-learning with linear function approximation, off-policy TD-learning with generalized importance sampling factors, and Q-learning. To the best of our knowledge, super-polynomial concentration bounds for off-policy TD-learning have not been established in the literature due to the challenge of handling the combination of unbounded iterates and multiplicative noise.
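The SA scheme studied in this abstract can be illustrated with a minimal sketch: a fixed-point iteration x_{k+1} = x_k + α_k (F(x_k) − x_k + w_k), where F is a contraction and w_k is additive Gaussian (hence sub-Gaussian) noise. The operator F, the step sizes, and all constants below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical contractive operator F(x) = 0.5 * x + 1, with contraction
# factor 0.5 and fixed point x* = 2.
def F(x):
    return 0.5 * x + 1.0

x = 0.0
for k in range(1, 20001):
    alpha = 1.0 / k               # diminishing step size
    noise = rng.normal(0.0, 1.0)  # additive sub-Gaussian (Gaussian) noise
    x = x + alpha * (F(x) + noise - x)

print(abs(x - 2.0))  # convergence error to the fixed point
```

The abstract's concentration results bound the tails of exactly this kind of convergence error; with additive noise as above, those tails are sub-Gaussian.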

    Positive and Negative Parenting in Conduct Disorder with High versus Low Levels of Callous-Unemotional Traits

    Less is known about the relationship between conduct disorder (CD), callous-unemotional (CU) traits, and positive and negative parenting in youth than in early childhood. We combined traditional univariate analyses with a novel machine learning classifier (Angle-based Generalized Matrix Learning Vector Quantization) to classify youth (N = 756; 9-18 years) into typically developing (TD) or CD groups with or without elevated CU traits (CD/HCU and CD/LCU, respectively) using youth- and parent-reports of parenting behavior. At the group level, both CD/HCU and CD/LCU were associated with high negative and low positive parenting relative to TD. However, only positive parenting differed between the CD/HCU and CD/LCU groups. In classification analyses, performance was best when distinguishing CD/HCU from TD groups and poorest when distinguishing CD/HCU from CD/LCU groups. Positive and negative parenting were both relevant when distinguishing CD/HCU from TD, negative parenting was most relevant when distinguishing between CD/LCU and TD, and positive parenting was most relevant when distinguishing CD/HCU from CD/LCU groups. These findings suggest that while positive parenting distinguishes between CD/HCU and CD/LCU, negative parenting is associated with both CD subtypes. These results highlight the importance of considering multiple parenting behaviors in CD with varying levels of CU traits in late childhood/adolescence.
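The classifier family used in this study is prototype-based. As a rough sketch of the idea (not the Angle-based Generalized Matrix LVQ variant the authors used, which additionally learns a relevance metric), the basic LVQ1 rule moves each class prototype toward same-class points and away from other-class points. The toy data and group labels below are invented stand-ins for the study's parenting-behavior features.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 2-D features standing in for two groups (e.g. "TD" vs "CD/HCU")
X0 = rng.normal(loc=-1.0, size=(50, 2))
X1 = rng.normal(loc=+1.0, size=(50, 2))
X = np.vstack([X0, X1])
y = np.array([0] * 50 + [1] * 50)

# One prototype per class, updated with the basic LVQ1 rule
protos = np.array([X0.mean(axis=0), X1.mean(axis=0)])
labels = np.array([0, 1])
lr = 0.05
for _ in range(20):
    for xi, yi in zip(X, y):
        j = np.argmin(((protos - xi) ** 2).sum(axis=1))  # nearest prototype
        sign = 1.0 if labels[j] == yi else -1.0          # attract or repel
        protos[j] += sign * lr * (xi - protos[j])

# Classify by nearest prototype and report training accuracy
pred = labels[((protos[:, None, :] - X[None]) ** 2).sum(axis=2).argmin(axis=0)]
print((pred == y).mean())
```

In the study, classification accuracy of this kind of model was compared across group pairings (CD/HCU vs. TD, CD/LCU vs. TD, CD/HCU vs. CD/LCU) to assess which parenting features carried the discriminative signal.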

    Generalized Off-Policy Actor-Critic

    We propose a new objective, the counterfactual objective, unifying existing objectives for off-policy policy gradient algorithms in the continuing reinforcement learning (RL) setting. Compared to the commonly used excursion objective, which can be misleading about the performance of the target policy when deployed, our new objective better predicts such performance. We prove the Generalized Off-Policy Policy Gradient Theorem to compute the policy gradient of the counterfactual objective and use an emphatic approach to get an unbiased sample from this policy gradient, yielding the Generalized Off-Policy Actor-Critic (Geoff-PAC) algorithm. We demonstrate the merits of Geoff-PAC over existing algorithms in Mujoco robot simulation tasks, the first empirical success of emphatic algorithms in prevailing deep RL benchmarks.
    Comment: NeurIPS 201
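The "emphatic approach" mentioned in this abstract builds on the follow-on trace from emphatic TD methods, which reweights states by F_t = γ ρ_{t−1} F_{t−1} + i_t, where ρ is the importance-sampling ratio π(a|s)/μ(a|s) between target and behavior policies and i_t is the interest in state S_t. A minimal sketch of that recursion, with made-up ratios rather than anything from Geoff-PAC itself:

```python
import numpy as np

gamma = 0.9
rho = np.array([1.2, 0.8, 1.5, 0.5])  # hypothetical pi/mu ratios per step
interest = np.ones_like(rho)          # uniform interest i_t = 1

# Follow-on trace: F_t = gamma * rho_{t-1} * F_{t-1} + i_t
F = 0.0
traces = []
for t in range(len(rho)):
    F = interest[t] if t == 0 else gamma * rho[t - 1] * F + interest[t]
    traces.append(F)
print(traces)
```

In emphatic actor-critic methods, weights derived from traces like this correct the state distribution mismatch between the behavior and target policies, which is what makes an unbiased sample of the off-policy gradient possible.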