Optimization Foundations of Reinforcement Learning
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and artificial intelligence communities over the past decade. With tremendous success already demonstrated in game AI, RL offers great potential for applications in more complex, real-world domains, for example robotics, autonomous driving, and even drug discovery. Although researchers have devoted substantial engineering effort to deploying RL methods at scale, many state-of-the-art RL techniques still seem mysterious, with limited theoretical guarantees on their behaviour in practice.
In this thesis, we focus on understanding convergence guarantees for two key ideas in reinforcement learning, namely temporal difference (TD) learning and policy gradient methods, from an optimization perspective. In Chapter 2, we provide a simple and explicit finite-time analysis of TD learning with linear function approximation. Apart from a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. Our convergence results extend seamlessly to the study of TD learning with eligibility traces, known as TD(λ), and to Q-learning for a class of high-dimensional optimal stopping problems.
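To make the Chapter 2 setting concrete, here is a minimal Python sketch of TD(0) with linear value-function approximation on a synthetic Markov chain; the feature map, transition matrix, rewards, and step size are illustrative assumptions, not details taken from the thesis.

import numpy as np

# Hedged sketch: TD(0) with linear function approximation on a random
# Markov chain under a fixed policy. All quantities here are synthetic.
rng = np.random.default_rng(0)
n_states, n_features = 10, 4
gamma, alpha = 0.9, 0.05

phi = rng.normal(size=(n_states, n_features))        # feature map phi(s)
P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix
r = rng.normal(size=n_states)                        # expected rewards

theta = np.zeros(n_features)   # linear weights: V(s) ~ phi(s) @ theta
s = 0
for t in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD error: reward plus discounted bootstrap minus current estimate.
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    # Semi-gradient update; it has the same shape as an SGD step, which
    # is what lets SGD-style finite-time analysis carry over.
    theta += alpha * delta * phi[s]
    s = s_next

print("learned weights:", theta)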
In Chapter 3, we turn our attention to policy gradient methods and present a simple and general understanding of their global convergence properties. The main challenge here is that even for simple control problems, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point of the objective. We identify structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that, despite non-convexity, any stationary point of the policy gradient objective is globally optimal. In the final chapter, we extend our analysis for finite MDPs to show linear convergence guarantees for many popular variants of policy gradient methods, such as projected policy gradient, Frank-Wolfe, mirror descent, and natural policy gradient.
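As a toy illustration of the policy gradient setting studied in Chapters 3 and 4, the sketch below runs gradient ascent on the objective J(theta) = rho @ V_pi for a small random finite MDP with softmax parameterization; the MDP, step size, and the crude finite-difference gradient are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = next-state dist.
R = rng.normal(size=(nS, nA))                  # rewards
rho = np.ones(nS) / nS                         # initial-state distribution

def softmax(th):
    p = np.exp(th - th.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def value(pi):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

theta, eta, eps = np.zeros((nS, nA)), 0.5, 1e-5
for _ in range(500):
    J0 = rho @ value(softmax(theta))
    grad = np.zeros_like(theta)
    for s in range(nS):
        for a in range(nA):
            th = theta.copy()
            th[s, a] += eps
            # Crude finite-difference estimate of dJ/dtheta[s, a].
            grad[s, a] = (rho @ value(softmax(th)) - J0) / eps
    theta += eta * grad   # gradient ascent on the non-convex objective J

print("J(theta) =", rho @ value(softmax(theta)))

Despite the non-convexity of J, the structural properties described above imply that for finite MDPs any stationary point of this objective is globally optimal.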
Global Optimality Guarantees For Policy Gradient Methods
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties -- shared by several classic control problems -- that ensure the policy gradient objective function has no suboptimal stationary points despite being non-convex. When these conditions are strengthened, this objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed.
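For reference, a standard statement of the gradient dominance (Polyak-Łojasiewicz) condition for a maximization objective J with optimal value J^*, written in generic notation rather than the paper's own (the constant \mu > 0 is problem dependent):

    J^* - J(\theta) \le \frac{1}{2\mu} \, \lVert \nabla J(\theta) \rVert^2 \quad \text{for all } \theta .

Any stationary point (\nabla J(\theta) = 0) then has zero optimality gap, and together with smoothness this inequality yields linear convergence rates for gradient methods.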
A Bibliometric Survey on the Reliable Software Delivery Using Predictive Analysis
Delivering a reliable software product is a fairly complex process that requires proper coordination among the various teams in planning, execution, and testing. Much of the development time and of the software budget is spent finding and fixing bugs. Rework and side-effect costs, caused by bugs inherent in the modified code, are mostly invisible in the planned estimates, yet they impact the software delivery timeline and increase cost. Advances in artificial intelligence make it possible to predict probable defects by classifying software code changes, helping the software development team make rational decisions. Optimizing software cost and improving software quality are top priorities for the industry to remain profitable in a competitive market. Hence, there is a strong need to improve software delivery quality by minimizing defects and keeping reasonable control over predicted defects. This paper presents a bibliometric study of reliable software delivery using predictive analysis, based on 450 documents selected from the Scopus database using keywords such as software defect prediction, machine learning, and artificial intelligence. The study covers the period from 2010 to 2021. The survey shows that software defect prediction has received considerable attention from researchers, and that there are great possibilities for predicting and improving overall software product quality using artificial intelligence techniques.
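As a hedged illustration of the kind of defect-prediction pipeline the surveyed literature studies, the Python sketch below trains a classifier on synthetic change-level features; the feature set, data, and model choice are illustrative assumptions, not drawn from the paper.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 1000
# Toy change-level features: lines added, lines deleted, files touched,
# author experience (prior commits), and complexity delta -- all synthetic.
X = np.column_stack([
    rng.poisson(30, n), rng.poisson(12, n), rng.poisson(3, n),
    rng.integers(1, 500, n), rng.normal(0, 2, n),
])
# Synthetic label: larger, more complex changes are likelier to be buggy.
logit = 0.02 * X[:, 0] + 0.3 * X[:, 4] - 0.002 * X[:, 3] - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))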
"A study of change of posture on the pulmonary function tests : can it help copd patients ?"
Objectives: To measure pulmonary function tests (PFTs) in sitting, supine, and standing postures in a North Indian population, and to determine whether PFTs change with posture. Settings: Department of Physiology, G.S.V.M. Medical College, Kanpur, and Escorts Heart Centre, Kanpur. Participants: 50 male and 30 female healthy, non-smoking volunteers, comprising 50 first-year MBBS students and 30 volunteers at Escorts Heart Centre, Kanpur. Measurements: PFTs: FEV1, FVC, FER, PEFR, and TV. Statistical analysis: Student's t-test. Results: FEV1, FVC, and PEFR increased significantly from supine to sitting to standing posture in both males and females. FER increased significantly only when moving from supine to sitting in both males and females. TV increased significantly when moving from supine to sitting and from supine to standing in both males and females.
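As a minimal illustration of the statistical analysis named above, the Python snippet below runs a paired Student's t-test comparing FEV1 between two postures; the values are invented for illustration and are not the study's data.

from scipy import stats

# Hypothetical paired FEV1 readings (litres) for the same six subjects.
fev1_supine   = [2.8, 3.1, 2.6, 3.4, 2.9, 3.0]
fev1_standing = [3.0, 3.3, 2.9, 3.6, 3.1, 3.2]

t, p = stats.ttest_rel(fev1_standing, fev1_supine)
print(f"t = {t:.2f}, p = {p:.4f}")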
p38α MAPK in inflammation‐associated colorectal cancer
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 23-09-201