Optimization Foundations of Reinforcement Learning
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and artificial intelligence communities over the past decade. With tremendous success already demonstrated in game AI, RL offers great potential for applications in more complex, real-world domains, for example robotics, autonomous driving, and even drug discovery. Although researchers have devoted substantial engineering effort to deploying RL methods at scale, many state-of-the-art RL techniques still seem mysterious, with limited theoretical guarantees on their behaviour in practice.
In this thesis, we focus on understanding convergence guarantees for two key ideas in reinforcement learning, namely temporal difference (TD) learning and policy gradient methods, from an optimization perspective. In Chapter 2, we provide a simple and explicit finite-time analysis of TD learning with linear function approximation. Apart from a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. Our convergence results extend seamlessly to the study of TD learning with eligibility traces, known as TD(λ), and to Q-learning for a class of high-dimensional optimal stopping problems.
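To make the Chapter 2 setting concrete, here is a minimal Python sketch of TD(0) with linear value-function approximation on a synthetic Markov chain; the feature map, transition matrix, rewards, and step size are illustrative assumptions, not details taken from the thesis.

import numpy as np

# Hedged sketch: TD(0) with linear function approximation on a random
# Markov chain under a fixed policy. All quantities here are synthetic.
rng = np.random.default_rng(0)
n_states, n_features = 10, 4
gamma, alpha = 0.9, 0.05

phi = rng.normal(size=(n_states, n_features))        # feature map phi(s)
P = rng.dirichlet(np.ones(n_states), size=n_states)  # transition matrix
r = rng.normal(size=n_states)                        # expected rewards

theta = np.zeros(n_features)   # linear weights: V(s) ~ phi(s) @ theta
s = 0
for t in range(50_000):
    s_next = rng.choice(n_states, p=P[s])
    # TD error: reward plus discounted bootstrap minus current estimate.
    delta = r[s] + gamma * phi[s_next] @ theta - phi[s] @ theta
    # Semi-gradient update; it has the same shape as an SGD step, which
    # is what lets SGD-style finite-time analysis carry over.
    theta += alpha * delta * phi[s]
    s = s_next

print("learned weights:", theta)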
In Chapter 3, we turn our attention to policy gradient methods and present a simple and general understanding of their global convergence properties. The main challenge here is that even for simple control problems, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point of the objective. We identify structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that, despite non-convexity, any stationary point of the policy gradient objective is globally optimal. In the final chapter, we extend our analysis for finite MDPs to show linear convergence guarantees for many popular variants of policy gradient methods, such as projected policy gradient, Frank-Wolfe, mirror descent, and natural policy gradient.
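As a toy illustration of the policy gradient setting studied in Chapters 3 and 4, the sketch below runs gradient ascent on the objective J(theta) = rho @ V_pi for a small random finite MDP with softmax parameterization; the MDP, step size, and the crude finite-difference gradient are assumptions made purely for illustration.

import numpy as np

rng = np.random.default_rng(1)
nS, nA, gamma = 3, 2, 0.9
P = rng.dirichlet(np.ones(nS), size=(nS, nA))  # P[s, a] = next-state dist.
R = rng.normal(size=(nS, nA))                  # rewards
rho = np.ones(nS) / nS                         # initial-state distribution

def softmax(th):
    p = np.exp(th - th.max(axis=1, keepdims=True))
    return p / p.sum(axis=1, keepdims=True)

def value(pi):
    # Exact policy evaluation: solve (I - gamma * P_pi) V = r_pi.
    P_pi = np.einsum("sa,sat->st", pi, P)
    r_pi = np.einsum("sa,sa->s", pi, R)
    return np.linalg.solve(np.eye(nS) - gamma * P_pi, r_pi)

theta, eta, eps = np.zeros((nS, nA)), 0.5, 1e-5
for _ in range(500):
    J0 = rho @ value(softmax(theta))
    grad = np.zeros_like(theta)
    for s in range(nS):
        for a in range(nA):
            th = theta.copy()
            th[s, a] += eps
            # Crude finite-difference estimate of dJ/dtheta[s, a].
            grad[s, a] = (rho @ value(softmax(th)) - J0) / eps
    theta += eta * grad   # gradient ascent on the non-convex objective J

print("J(theta) =", rho @ value(softmax(theta)))

Despite the non-convexity of J, the structural properties described above imply that for finite MDPs any stationary point of this objective is globally optimal.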
Global Optimality Guarantees For Policy Gradient Methods
Policy gradient methods apply to complex, poorly understood control problems by performing stochastic gradient descent over a parameterized class of policies. Unfortunately, even for simple control problems solvable by standard dynamic programming techniques, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point. This work identifies structural properties -- shared by several classic control problems -- that ensure the policy gradient objective function has no suboptimal stationary points despite being non-convex. When these conditions are strengthened, this objective satisfies a Polyak-Łojasiewicz (gradient dominance) condition that yields convergence rates. We also provide bounds on the optimality gap of any stationary point when some of these conditions are relaxed.
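For reference, a standard statement of the gradient dominance (Polyak-Łojasiewicz) condition for a maximization objective J with optimal value J^*, written in generic notation rather than the paper's own (the constant \mu > 0 is problem dependent):

    J^* - J(\theta) \le \frac{1}{2\mu} \, \lVert \nabla J(\theta) \rVert^2 \quad \text{for all } \theta .

Any stationary point (\nabla J(\theta) = 0) then has zero optimality gap, and together with smoothness this inequality yields linear convergence rates for gradient methods.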
A Bibliometric Survey on the Reliable Software Delivery Using Predictive Analysis
Delivering a reliable software product is a fairly complex process that requires proper coordination among the various teams in planning, execution, and testing. Much of the development time and of the software budget is spent finding and fixing bugs. Rework and side-effect costs, caused by bugs inherent in the modified code, are mostly invisible in the planned estimates, yet they impact the software delivery timeline and increase cost. Advances in artificial intelligence make it possible to predict probable defects by classifying software code changes, helping the software development team make rational decisions. Optimizing software cost and improving software quality are top priorities for the industry to remain profitable in a competitive market. Hence, there is a strong need to improve software delivery quality by minimizing defects and keeping reasonable control over predicted defects. This paper presents a bibliometric study of reliable software delivery using predictive analysis, based on 450 documents selected from the Scopus database using keywords such as software defect prediction, machine learning, and artificial intelligence. The study covers the period from 2010 to 2021. The survey shows that software defect prediction has received considerable attention from researchers, and that there are great possibilities for predicting and improving overall software product quality using artificial intelligence techniques.
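As a hedged illustration of the kind of defect-prediction pipeline the surveyed literature studies, the Python sketch below trains a classifier on synthetic change-level features; the feature set, data, and model choice are illustrative assumptions, not drawn from the paper.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

rng = np.random.default_rng(42)
n = 1000
# Toy change-level features: lines added, lines deleted, files touched,
# author experience (prior commits), and complexity delta -- all synthetic.
X = np.column_stack([
    rng.poisson(30, n), rng.poisson(12, n), rng.poisson(3, n),
    rng.integers(1, 500, n), rng.normal(0, 2, n),
])
# Synthetic label: larger, more complex changes are likelier to be buggy.
logit = 0.02 * X[:, 0] + 0.3 * X[:, 4] - 0.002 * X[:, 3] - 1.0
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))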
"A study of change of posture on the pulmonary function tests : can it help copd patients ?"
Objectives: To measure pulmonary function tests (PFTs) in sitting, supine, and standing postures in a North Indian population, and to determine whether PFTs change with posture. Settings: Department of Physiology, G.S.V.M. Medical College, Kanpur, and Escorts Heart Centre, Kanpur. Participants: 50 male and 30 female healthy, non-smoking volunteers, comprising 50 first-year MBBS students and 30 volunteers at Escorts Heart Centre, Kanpur. Measurements: PFTs: FEV1, FVC, FER, PEFR, and TV. Statistical analysis: Student's t-test. Results: FEV1, FVC, and PEFR increased significantly from supine to sitting to standing posture in both males and females. FER increased significantly only when moving from supine to sitting in both males and females. TV increased significantly when moving from supine to sitting and from supine to standing in both males and females.
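As a minimal illustration of the statistical analysis named above, the Python snippet below runs a paired Student's t-test comparing FEV1 between two postures; the values are invented for illustration and are not the study's data.

from scipy import stats

# Hypothetical paired FEV1 readings (litres) for the same six subjects.
fev1_supine   = [2.8, 3.1, 2.6, 3.4, 2.9, 3.0]
fev1_standing = [3.0, 3.3, 2.9, 3.6, 3.1, 3.2]

t, p = stats.ttest_rel(fev1_standing, fev1_supine)
print(f"t = {t:.2f}, p = {p:.4f}")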
p38α MAPK in inflammation‐associated colorectal cancer
Unpublished doctoral thesis defended at the Universidad Autónoma de Madrid, Facultad de Ciencias, Departamento de Biología Molecular. Date of defense: 23-09-201