Competitive function approximation for reinforcement learning

Authors: Agostini Alejandro Gabriel; Celaya Llover Enric
Publication date: 01/01/2014

The application of reinforcement learning to problems with continuous domains requires representing the value function by means of function approximation. We identify two aspects of reinforcement learning that make the function approximation process hard: non-stationarity of the target function and biased sampling. Non-stationarity results from the bootstrapping nature of dynamic programming, where the value function is estimated using its current approximation. Biased sampling occurs when some regions of the state space are visited too often, causing repeated updates with similar values that fade out the occasional updates of infrequently sampled regions. We propose a competitive approach to function approximation in which many different local approximators are available at a given input, and the one expected to give the best approximation is selected by means of a relevance function. The local nature of the approximators allows them to adapt quickly to non-stationary changes and mitigates the biased sampling problem. The coexistence of multiple approximators, updated and tried in parallel, yields a good estimate much faster than would be possible with a single approximator. Experiments on different benchmark problems show that the competitive strategy provides faster and more stable learning than non-competitive approaches. Preprint.

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

UPCommons. Portal del coneixement obert de la UPC

Digital.CSIC

Tracking-Based Non-Parametric Background-Foreground Classification in a Chromaticity-Gradient Space

Authors: Cuevas Rodríguez Carlos; García Santos Narciso
Publication venue: E.T.S.I. Telecomunicación (UPM)
Publication date: 01/12/2010

This work presents a novel background-foreground classification technique based on adaptive non-parametric kernel estimation in a color-gradient space of components. By combining normalized color components with their gradients, shadows are efficiently suppressed from the results, while the luminance information in the moving objects is preserved. Moreover, a fast multi-region iterative tracking strategy applied to previously detected foreground regions makes it possible to build a robust foreground model, which, combined with the background model, noticeably increases the quality of the detections. The proposed strategy has been applied to different kinds of sequences, obtaining satisfactory results in complex situations such as those given by dynamic backgrounds, illumination changes, shadows and multiple moving objects.
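As a rough illustration of why normalized color components help with shadows, the sketch below classifies a pixel with a Gaussian kernel density estimate over chromaticity coordinates only. It is a simplified assumption, not the authors' method: the gradient components and the tracking-based foreground model are omitted, and the function names, bandwidth and threshold are hypothetical.

```python
import math

def chromaticity(rgb):
    """Map an (R, G, B) pixel to normalized (r, g) chromaticity coordinates."""
    total = sum(rgb)
    if total == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # treat pure black as neutral
    return (rgb[0] / total, rgb[1] / total)

def background_density(sample, history, bandwidth=0.02):
    """Gaussian kernel density estimate of `sample` given past background samples."""
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2)
    acc = 0.0
    for past in history:
        dist2 = sum((a - b) ** 2 for a, b in zip(sample, past))
        acc += math.exp(-dist2 / (2.0 * bandwidth ** 2))
    return norm * acc / len(history)

def is_foreground(rgb, history_rgb, threshold=1.0, bandwidth=0.02):
    """Label a pixel as foreground when its background density is below `threshold`."""
    sample = chromaticity(rgb)
    history = [chromaticity(p) for p in history_rgb]
    return background_density(sample, history, bandwidth) < threshold

# Hypothetical background history for one pixel: gray at two brightness levels.
history = [(100, 100, 100), (120, 120, 120)]
shadowed = is_foreground((50, 50, 50), history)   # darker, same chromaticity
red_object = is_foreground((200, 20, 20), history)  # different chromaticity
```

A shadow mostly scales brightness, so its chromaticity stays close to the background samples and the density remains high, whereas an object of a different color falls far from them.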

Crossref

Archivo Digital UPM

Multi-agents adaptive estimation and coverage control using Gaussian regression

Authors: Carli Ruggero; Carron Andrea; Pillonetto Gianluigi; Schenato Luca; Todescato Marco
Publication date: 22/07/2014

We consider a scenario where the aim of a group of agents is to perform the optimal coverage of a region according to a sensory function. In particular, centroidal Voronoi partitions have to be computed. The difficulty of the task is that the sensory function is unknown and has to be reconstructed online from noisy measurements. Hence, estimation and coverage need to be performed at the same time. We cast the problem in a Bayesian regression framework, where the sensory function is seen as a Gaussian random field. Then, we design a set of control inputs that aim to balance coverage and estimation, and we discuss the convergence properties of the algorithm. Numerical experiments show the effectiveness of the new approach.
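For readers unfamiliar with centroidal Voronoi partitions, the following minimal sketch runs Lloyd's iteration on a one-dimensional region. It assumes the sensory function is already known, which is exactly the simplification the abstract says is not available in practice; the Gaussian regression step is not shown, and all names here are hypothetical.

```python
def lloyd_step(agents, grid, density):
    """Move each agent to the density-weighted centroid of its Voronoi cell."""
    mass = [0.0] * len(agents)
    moment = [0.0] * len(agents)
    for x in grid:
        # The Voronoi cell of an agent is the set of points closest to it.
        nearest = min(range(len(agents)), key=lambda i: abs(x - agents[i]))
        weight = density(x)
        mass[nearest] += weight
        moment[nearest] += weight * x
    # An agent whose cell has no mass stays where it is.
    return [moment[i] / mass[i] if mass[i] > 0 else agents[i]
            for i in range(len(agents))]

def coverage(agents, grid, density, iterations=50):
    """Repeat Lloyd steps for a fixed number of iterations."""
    for _ in range(iterations):
        agents = lloyd_step(agents, grid, density)
    return agents

# Region [0, 1] discretized on a grid, with a uniform (assumed known) sensory function.
grid = [i / 100 for i in range(101)]
final_agents = coverage([0.1, 0.2], grid, lambda x: 1.0)
```

With a uniform sensory function, the two agents settle near the centers of the two halves of the interval; a non-uniform function would pull them toward regions of higher importance.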

arXiv.org e-Print Archive

Crossref

Archivio istituzionale della ricerca - Università di Padova

Sequential Design for Optimal Stopping Problems

Authors: Gramacy Robert B.; Ludkovski Mike
Publication venue: Society for Industrial & Applied Mathematics (SIAM)
Publication date: 29/07/2014

We propose a new approach to solve optimal stopping problems via simulation. Working within the backward dynamic programming/Snell envelope framework, we augment the methodology of Longstaff-Schwartz that focuses on approximating the stopping strategy. Namely, we introduce adaptive generation of the stochastic grids anchoring the simulated sample paths of the underlying state process. This allows for active learning of the classifiers partitioning the state space into the continuation and stopping regions. To this end, we examine sequential design schemes that adaptively place new design points close to the stopping boundaries. We then discuss dynamic regression algorithms that can implement such recursive estimation and local refinement of the classifiers. The new algorithm is illustrated with a variety of numerical experiments, showing that order-of-magnitude savings in terms of design size can be achieved. We also compare with existing benchmarks in the context of pricing multi-dimensional Bermudan options. Comment: 24 pages

arXiv.org e-Print Archive

CiteSeerX

Finding Structural Information of RF Power Amplifiers using an Orthogonal Non-Parametric Kernel Smoothing Estimator

Authors: Handel Peter; Isaksson Magnus; Khan Zain Ahmed; Zenteno Efrain
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

A non-parametric technique for modeling the behavior of power amplifiers is presented. The proposed technique relies on the principles of density estimation using the kernel method and is suited for use in power amplifier modeling. The proposed methodology transforms the input domain into an orthogonal memory domain. In this domain, non-parametric static functions are discovered using the kernel estimator. These orthogonal, non-parametric functions can be fitted with any desired mathematical structure, thus facilitating their implementation. Furthermore, due to the orthogonality, the non-parametric functions can be analyzed and discarded individually, which simplifies pruning basis functions and provides a tradeoff between complexity and performance. The results show that the methodology can be employed to model power amplifiers, yielding error performance similar to state-of-the-art parametric models. Furthermore, a parameter-efficient model structure with 6 coefficients was derived for a Doherty power amplifier, significantly reducing the computational complexity of deployment. Finally, the methodology can also be exploited in digital linearization techniques. Comment: Matlab sample code (15 MB): https://dl.dropboxusercontent.com/u/106958743/SampleMatlabKernel.zi
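To give a feel for how a kernel estimator can recover a static function without assuming its form, here is a minimal Nadaraya-Watson smoother with a Gaussian kernel. This is a generic textbook estimator chosen for illustration, not the orthogonal estimator of the paper; the function name, bandwidth and the cubic test curve are assumptions.

```python
import math

def kernel_smooth(x_query, xs, ys, bandwidth=0.1):
    """Nadaraya-Watson estimate at x_query: a kernel-weighted average of ys."""
    weights = [math.exp(-0.5 * ((x_query - x) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no samples close enough to x_query for this bandwidth")
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical input/output samples of a static nonlinearity y = x^3 on [0, 1].
xs = [i / 50 for i in range(51)]
ys = [x ** 3 for x in xs]
estimate = kernel_smooth(0.5, xs, ys)
```

The estimate at 0.5 is close to the true value 0.125 but slightly biased upward, because the smoother averages over a neighborhood where the curve is convex; a smaller bandwidth reduces this bias at the cost of more sensitivity to noise.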

arXiv.org e-Print Archive

Publikationer från KTH

Digitala Vetenskapliga Arkivet - Academic Archive On-line