3,046 research outputs found
Recommended from our members
Optimization Foundations of Reinforcement Learning
Reinforcement learning (RL) has attracted rapidly increasing interest in the machine learning and artificial intelligence communities in the past decade. With tremendous success already demonstrated for Game AI, RL offers great potential for applications in more complex, real world domains, for example in robotics, autonomous driving and even drug discovery. Although researchers have devoted a lot of engineering effort to deploy RL methods at scale, many state-of-the art RL techniques still seem mysterious - with limited theoretical guarantees on their behaviour in practice.
In this thesis, we focus on understanding convergence guarantees for two key ideas in reinforcement learning, namely Temporal difference learning and policy gradient methods, from an optimization perspective. In Chapter 2, we provide a simple and explicit finite time analysis of Temporal difference (TD) learning with linear function approximation. Except for a few key insights, our analysis mirrors standard techniques for analyzing stochastic gradient descent algorithms, and therefore inherits the simplicity and elegance of that literature. Our convergence results extend seamlessly to the study of TD learning with eligibility traces, known as TD(λ), and to Q-learning for a class of high-dimensional optimal stopping problems.
In Chapter 3, we turn our attention to policy gradient methods and present a simple and general understanding of their global convergence properties. The main challenge here is that even for simple control problems, policy gradient algorithms face non-convex optimization problems and are widely understood to converge only to a stationary point of the objective. We identify structural properties -- shared by finite MDPs and several classic control problems -- which guarantee that despite non-convexity, any stationary point of the policy gradient objective is globally optimal. In the final chapter, we extend our analysis for finite MDPs to show linear convergence guarantees for many popular variants of policy gradient methods like projected policy gradient, Frank-Wolfe, mirror descent and natural policy gradients
Data-based analysis of extreme events: inference, numerics and applications
The concept of extreme events describes the above average behavior of a process, for instance, heat waves in climate or weather research, earthquakes in geology and financial crashes in economics. It is significant to study the behavior of extremes, in order to reduce their negative impacts. Key objectives include the identification of the appropriate mathematical/statistical model, description of the underlying dependence structure in the multivariate or the spatial case, and the investigation of the most relevant external factors. Extreme value analysis (EVA), based on Extreme Value Theory, provides the necessary statistical tools. Assuming that all relevant covariates are known and observed, EVA often deploys statistical regression analysis to study the changes in the model parameters. Modeling of the dependence structure implies a priori assumptions such as Gaussian, locally stationary or isotropic behavior. Based on EVA and advanced time-series analysis methodology, this thesis introduces a semiparametric, nonstationary and non- homogenous framework for statistical regression analysis of spatio-temporal extremes. The involved regression analysis accounts explicitly for systematically missing covariates; their influence was reduced to an additive nonstationary offset. The nonstationarity was resolved by the Finite Element Time Series Analysis Methodology (FEM). FEM approximates the underlying nonstationarity by a set of locally stationary models and a nonstationary hidden switching process with bounded variation (BV). The resulting FEM-BV- EVA approach goes beyond a priori assumptions of standard methods based, for instance, on Bayesian statistics, Hidden Markov Models or Local Kernel Smoothing. The multivariate/spatial extension of FEM-BV-EVA describes the underlying spatial variability by the model parameters, referring to hierarchical modeling. The spatio-temporal behavior of the model parameters was approximated by locally stationary models and a spatial nonstationary switching process. Further, it was shown that the resulting spatial FEM-BV-EVA formulation is consistent with the max-stability postulate and describes the underlying dependence structure in a nonparametric way. The proposed FEM-BV-EVA methodology was integrated into the existent FEM MATLAB toolbox. The FEM-BV-EVA framework is computationally efficient as it deploys gradient free MCMC based optimization methods and numerical solvers for constrained, large, structured quadratic and linear problems. In order to demonstrate its performance, FEM-BV-EVA was applied to various test-cases and real-data and compared to standard methods. It was shown that parametric approaches lead to biased results if significant covariates are unresolved. Comparison to nonparametric methods based on smoothing regression revealed their weakness, the locality property and the inability to resolve discontinuous functions. Spatial FEM-BV-EVA was applied to study the dynamics of extreme precipitation over Switzerland. The analysis identified among others three major spatially dependent regions
Q-learning with Nearest Neighbors
We consider model-free reinforcement learning for infinite-horizon discounted
Markov Decision Processes (MDPs) with a continuous state space and unknown
transition kernel, when only a single sample path under an arbitrary policy of
the system is available. We consider the Nearest Neighbor Q-Learning (NNQL)
algorithm to learn the optimal Q function using nearest neighbor regression
method. As the main contribution, we provide tight finite sample analysis of
the convergence rate. In particular, for MDPs with a -dimensional state
space and the discounted factor , given an arbitrary sample
path with "covering time" , we establish that the algorithm is guaranteed
to output an -accurate estimate of the optimal Q-function using
samples. For instance, for a
well-behaved MDP, the covering time of the sample path under the purely random
policy scales as so the sample
complexity scales as Indeed, we
establish a lower bound that argues that the dependence of is necessary.Comment: Accepted to NIPS 201
Efficient state-space inference of periodic latent force models
Latent force models (LFM) are principled approaches to incorporating solutions to differen-tial equations within non-parametric inference methods. Unfortunately, the developmentand application of LFMs can be inhibited by their computational cost, especially whenclosed-form solutions for the LFM are unavailable, as is the case in many real world prob-lems where these latent forces exhibit periodic behaviour. Given this, we develop a newsparse representation of LFMs which considerably improves their computational efficiency,as well as broadening their applicability, in a principled way, to domains with periodic ornear periodic latent forces. Our approach uses a linear basis model to approximate onegenerative model for each periodic force. We assume that the latent forces are generatedfrom Gaussian process priors and develop a linear basis model which fully expresses thesepriors. We apply our approach to model the thermal dynamics of domestic buildings andshow that it is effective at predicting day-ahead temperatures within the homes. We alsoapply our approach within queueing theory in which quasi-periodic arrival rates are mod-elled as latent forces. In both cases, we demonstrate that our approach can be implemented efficiently using state-space methods which encode the linear dynamic systems via LFMs.Further, we show that state estimates obtained using periodic latent force models can re-duce the root mean squared error to 17% of that from non-periodic models and 27% of thenearest rival approach which is the resonator model (S ̈arkk ̈a et al., 2012; Hartikainen et al.,2012.
APHRODITE: an Anomaly-based Architecture for False Positive Reduction
We present APHRODITE, an architecture designed to reduce false positives in
network intrusion detection systems. APHRODITE works by detecting anomalies in
the output traffic, and by correlating them with the alerts raised by the NIDS
working on the input traffic. Benchmarks show a substantial reduction of false
positives and that APHRODITE is effective also after a "quick setup", i.e. in
the realistic case in which it has not been "trained" and set up optimall
On-line Human Activity Recognition from Audio and Home Automation Sensors: comparison of sequential and non-sequential models in realistic Smart Homes
International audienceAutomatic human Activity Recognition (AR) is an important process for the provision of context-aware services in smart spaces such as voice-controlled smart homes. In this paper, we present an on-line Activities of Daily Living (ADL) recognition method for automatic identification within homes in which multiple sensors, actuators and automation equipment coexist, including audio sensors. Three sequence-based models are presented and compared: a Hidden Markov Model (HMM), Conditional Random Fields (CRF) and a sequential Markov Logic Network (MLN). These methods have been tested in two real Smart Homes thanks to experiments involving more than 30 participants. Their results were compared to those of three non-sequential models: a Support Vector Machine (SVM), a Random Forest (RF) and a non-sequential MLN. This comparative study shows that CRF gave the best results for on-line activity recognition from non-visual, audio and home automation sensors
- …