8 research outputs found
Auto-organisation modulaire d'une architecture intelligente
National conference paper with proceedings and peer review. National audience. This paper presents our approach to the study of modular self-organization in a fully generic decision system. We first describe the reinforcement learning approach and show how the formal framework of Markov Decision Processes (MDPs) makes it possible to define the notion of modular specialization precisely. We then derive an abstraction of the general self-organization principles underlying many connectionist clustering algorithms, and adapt these principles to the problem of the emergence of functional modules in an MDP-based system: an agent faced with a series of tasks will, over time, see its constituent modules specialize and form a coherent, efficient whole. We explain and justify our approach and set out short-term objectives.
Modular self-organization
The aim of this paper is to provide a sound framework for addressing a difficult problem: the automatic construction of an autonomous agent's modular architecture. We combine results from two apparently unrelated domains: autonomous planning through Markov Decision Processes and a general data clustering approach using a kernel-like method. Our fundamental idea is that the former is a good framework for addressing autonomy, whereas the latter allows us to tackle self-organizing problems.
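The self-organization principle borrowed from connectionist clustering can be illustrated with a minimal competitive-learning sketch (not the paper's algorithm): each sample is assigned to the closest "module" (prototype), and only the winner adapts, so modules specialize over time. The data distribution, initial prototypes, and learning rate below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the competitive-learning principle behind connectionist
# clustering: the closest prototype ("module") wins each sample and is the
# only one that adapts, so prototypes specialize. Illustrative data only.

rng = np.random.default_rng(0)
# Two task "signatures": samples around 0 and around 10.
data = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(10.0, 0.5, 200)])
rng.shuffle(data)

prototypes = np.array([4.0, 6.0])   # two modules, initially unspecialized
lr = 0.1
for x in data:
    winner = np.argmin(np.abs(prototypes - x))            # closest module wins
    prototypes[winner] += lr * (x - prototypes[winner])   # only the winner adapts

print(np.sort(prototypes))   # prototypes drift toward the two sample clusters
```

The winner-take-all update is the abstraction the paper adapts to MDPs: replace "closest prototype" with "most suitable module" and the same specialization dynamic emerges.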
Error reducing sampling in reinforcement learning
In reinforcement learning, an agent collects information by interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing the interaction samples so that the corresponding behavior converges quickly to the optimal behavior. Our main result is a sensitivity analysis relating the choice of sampling any state-action pair to the decrease of an error bound on the optimal solution. We derive two new model-based algorithms. Simulations demonstrate quicker convergence (in terms of the number of samples) of the value function to the true optimal value function.
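The general idea of steering samples toward the largest error bound can be sketched as follows. This is not the paper's sensitivity analysis: it is a generic model-based loop on a tiny random MDP that always queries the state-action pair with the largest count-based uncertainty proxy (1/√visits), then solves the empirical model; the MDP, the proxy, and the known-reward assumption are all illustrative.

```python
import numpy as np

# Hedged sketch (not the paper's algorithms): sample where a simple
# count-based error proxy is largest, then plan in the empirical model.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Hidden true MDP the agent can only sample from.
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
true_R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

counts = np.zeros((n_states, n_actions))
trans = np.zeros((n_states, n_actions, n_states))

def value_iteration(P, R, iters=300):
    V = np.zeros(n_states)
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return V

for _ in range(2000):
    with np.errstate(divide="ignore"):
        priority = 1.0 / np.sqrt(counts)   # unvisited pairs get infinite priority
    s, a = np.unravel_index(np.argmax(priority), priority.shape)
    s2 = rng.choice(n_states, p=true_P[s, a])   # query one (s, a) transition
    counts[s, a] += 1
    trans[s, a, s2] += 1

P_hat = trans / counts[..., None]           # empirical transition model
V_hat = value_iteration(P_hat, true_R)      # rewards assumed known for brevity
V_true = value_iteration(true_P, true_R)
print(float(np.max(np.abs(V_hat - V_true))))
```

With a uniform proxy this reduces to round-robin sampling; the paper's contribution is precisely a sharper, solution-dependent bound so that samples concentrate where they reduce the error on the optimal solution the most.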
Adaptive value function approximation in reinforcement learning using wavelets
A thesis submitted to the Faculty of Science, School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, South Africa, July 2015.

Reinforcement learning agents solve tasks by finding policies that maximise their reward over time. The policy can be found from the value function, which represents the value of each state-action pair. In continuous state spaces, the value function must be approximated. Often, this is done using a fixed linear combination of functions across all dimensions.

We introduce and demonstrate the wavelet basis for reinforcement learning, a basis function scheme competitive with state-of-the-art fixed bases. We extend two online adaptive tiling schemes to wavelet functions and show their performance improvement across standard domains. Finally, we introduce the Multiscale Adaptive Wavelet Basis (MAWB), a wavelet-based adaptive basis scheme which is dimensionally scalable and insensitive to the initial level of detail. This scheme adaptively grows the basis function set by combining across dimensions, or splitting within a dimension, those candidate functions which have a high estimated projection onto the Bellman error. A number of novel measures are used to find this estimate.
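The fixed-basis setting the thesis improves upon can be sketched in a few lines (this is not the adaptive MAWB scheme): linear TD(0) on a 1-D random walk, with the value function approximated as a linear combination of Haar scaling/wavelet functions. The environment, the maximum level of detail, and the step size are illustrative choices.

```python
import numpy as np

# Minimal sketch: semi-gradient TD(0) with a fixed Haar wavelet basis as the
# linear function approximator. Not the thesis's adaptive MAWB scheme.

def haar_features(x, max_level=3):
    """Constant scaling function plus Haar wavelets psi_{j,k} up to max_level on [0, 1)."""
    feats = [1.0]                          # father/scaling function
    for j in range(max_level):
        for k in range(2 ** j):
            t = (2 ** j) * x - k           # psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)
            if 0.0 <= t < 0.5:
                feats.append(2.0 ** (j / 2))
            elif 0.5 <= t < 1.0:
                feats.append(-(2.0 ** (j / 2)))
            else:
                feats.append(0.0)
    return np.array(feats)

rng = np.random.default_rng(1)
n_states, gamma, alpha = 16, 0.95, 0.05
w = np.zeros(len(haar_features(0.0)))      # one weight per basis function

def value(s):
    return w @ haar_features(s / (n_states - 1))

for _ in range(3000):                      # random walk, reward 1 at the right end
    s = n_states // 2
    while 0 < s < n_states - 1:
        s2 = s + int(rng.choice([-1, 1]))
        r = 1.0 if s2 == n_states - 1 else 0.0
        target = r if s2 in (0, n_states - 1) else r + gamma * value(s2)
        phi = haar_features(s / (n_states - 1))
        w += alpha * (target - w @ phi) * phi   # semi-gradient TD(0) update
        s = s2

print(float(value(1)), float(value(14)))   # learned values near the two ends
```

The fixed basis allocates the same level of detail everywhere; the thesis's adaptive schemes instead grow or split basis functions only where the estimated projection onto the Bellman error warrants it.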