8 research outputs found
Auto-organisation modulaire d'une architecture intelligente
National conference paper with proceedings and peer review. National audience. This paper presents our approach to the study of modular self-organization in a fully generic decision system. We first describe the reinforcement learning approach and show how the formal framework of Markov Decision Processes (MDPs) makes it possible to define the notion of modular specialization precisely. We then derive an abstraction of the general self-organization principles underlying many connectionist clustering algorithms, and adapt these principles to the problem of the emergence of functional modules in an MDP-based system: an agent faced with a series of tasks will, over time, see its constituent modules specialize and form a coherent, efficient whole. We explain and justify our approach and set out short-term objectives.
Modular self-organization
The aim of this paper is to provide a sound framework for addressing a difficult problem: the automatic construction of an autonomous agent's modular architecture. We combine results from two apparently unrelated domains: autonomous planning through Markov Decision Processes and a general data clustering approach using a kernel-like method. Our fundamental idea is that the former is a good framework for addressing autonomy, whereas the latter allows us to tackle self-organizing problems.
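The self-organization principle borrowed from connectionist clustering can be illustrated with a minimal competitive-learning sketch (not the paper's algorithm): each sample is assigned to the closest "module" (prototype), and only the winner adapts, so modules specialize over time. The data distribution, initial prototypes, and learning rate below are illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the competitive-learning principle behind connectionist
# clustering: the closest prototype ("module") wins each sample and is the
# only one that adapts, so prototypes specialize. Illustrative data only.

rng = np.random.default_rng(0)
# Two task "signatures": samples around 0 and around 10.
data = np.concatenate([rng.normal(0.0, 0.5, 200), rng.normal(10.0, 0.5, 200)])
rng.shuffle(data)

prototypes = np.array([4.0, 6.0])   # two modules, initially unspecialized
lr = 0.1
for x in data:
    winner = np.argmin(np.abs(prototypes - x))            # closest module wins
    prototypes[winner] += lr * (x - prototypes[winner])   # only the winner adapts

print(np.sort(prototypes))   # prototypes drift toward the two sample clusters
```

The winner-take-all update is the abstraction the paper adapts to MDPs: replace "closest prototype" with "most suitable module" and the same specialization dynamic emerges.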
Error reducing sampling in reinforcement learning
In reinforcement learning, an agent collects information by interacting with an environment and uses it to derive a behavior. This paper focuses on efficient sampling; that is, the problem of choosing the interaction samples so that the corresponding behavior converges quickly to the optimal behavior. Our main result is a sensitivity analysis relating the choice of sampling any state-action pair to the decrease of an error bound on the optimal solution. We derive two new model-based algorithms. Simulations demonstrate quicker convergence (in terms of the number of samples) of the value function to the true optimal value function.
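The general idea of steering samples toward the largest error bound can be sketched as follows. This is not the paper's sensitivity analysis: it is a generic model-based loop on a tiny random MDP that always queries the state-action pair with the largest count-based uncertainty proxy (1/√visits), then solves the empirical model; the MDP, the proxy, and the known-reward assumption are all illustrative.

```python
import numpy as np

# Hedged sketch (not the paper's algorithms): sample where a simple
# count-based error proxy is largest, then plan in the empirical model.

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 4, 2, 0.9

# Hidden true MDP the agent can only sample from.
true_P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
true_R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))

counts = np.zeros((n_states, n_actions))
trans = np.zeros((n_states, n_actions, n_states))

def value_iteration(P, R, iters=300):
    V = np.zeros(n_states)
    for _ in range(iters):
        V = (R + gamma * P @ V).max(axis=1)
    return V

for _ in range(2000):
    with np.errstate(divide="ignore"):
        priority = 1.0 / np.sqrt(counts)   # unvisited pairs get infinite priority
    s, a = np.unravel_index(np.argmax(priority), priority.shape)
    s2 = rng.choice(n_states, p=true_P[s, a])   # query one (s, a) transition
    counts[s, a] += 1
    trans[s, a, s2] += 1

P_hat = trans / counts[..., None]           # empirical transition model
V_hat = value_iteration(P_hat, true_R)      # rewards assumed known for brevity
V_true = value_iteration(true_P, true_R)
print(float(np.max(np.abs(V_hat - V_true))))
```

With a uniform proxy this reduces to round-robin sampling; the paper's contribution is precisely a sharper, solution-dependent bound so that samples concentrate where they reduce the error on the optimal solution the most.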
Adaptive value function approximation in reinforcement learning using wavelets
A thesis submitted to the Faculty of Science, School of Computational and Applied Mathematics, University of the Witwatersrand, Johannesburg, in fulfilment of the requirements for the degree of Doctor of Philosophy. Johannesburg, South Africa, July 2015.

Reinforcement learning agents solve tasks by finding policies that maximise their reward over time. The policy can be found from the value function, which represents the value of each state-action pair. In continuous state spaces, the value function must be approximated. Often, this is done using a fixed linear combination of functions across all dimensions.

We introduce and demonstrate the wavelet basis for reinforcement learning, a basis function scheme competitive with state-of-the-art fixed bases. We extend two online adaptive tiling schemes to wavelet functions and show their performance improvement across standard domains. Finally, we introduce the Multiscale Adaptive Wavelet Basis (MAWB), a wavelet-based adaptive basis scheme which is dimensionally scalable and insensitive to the initial level of detail. This scheme adaptively grows the basis function set by combining across dimensions, or splitting within a dimension, those candidate functions which have a high estimated projection onto the Bellman error. A number of novel measures are used to find this estimate.
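The fixed-basis setting the thesis improves upon can be sketched in a few lines (this is not the adaptive MAWB scheme): linear TD(0) on a 1-D random walk, with the value function approximated as a linear combination of Haar scaling/wavelet functions. The environment, the maximum level of detail, and the step size are illustrative choices.

```python
import numpy as np

# Minimal sketch: semi-gradient TD(0) with a fixed Haar wavelet basis as the
# linear function approximator. Not the thesis's adaptive MAWB scheme.

def haar_features(x, max_level=3):
    """Constant scaling function plus Haar wavelets psi_{j,k} up to max_level on [0, 1)."""
    feats = [1.0]                          # father/scaling function
    for j in range(max_level):
        for k in range(2 ** j):
            t = (2 ** j) * x - k           # psi_{j,k}(x) = 2^{j/2} psi(2^j x - k)
            if 0.0 <= t < 0.5:
                feats.append(2.0 ** (j / 2))
            elif 0.5 <= t < 1.0:
                feats.append(-(2.0 ** (j / 2)))
            else:
                feats.append(0.0)
    return np.array(feats)

rng = np.random.default_rng(1)
n_states, gamma, alpha = 16, 0.95, 0.05
w = np.zeros(len(haar_features(0.0)))      # one weight per basis function

def value(s):
    return w @ haar_features(s / (n_states - 1))

for _ in range(3000):                      # random walk, reward 1 at the right end
    s = n_states // 2
    while 0 < s < n_states - 1:
        s2 = s + int(rng.choice([-1, 1]))
        r = 1.0 if s2 == n_states - 1 else 0.0
        target = r if s2 in (0, n_states - 1) else r + gamma * value(s2)
        phi = haar_features(s / (n_states - 1))
        w += alpha * (target - w @ phi) * phi   # semi-gradient TD(0) update
        s = s2

print(float(value(1)), float(value(14)))   # learned values near the two ends
```

The fixed basis allocates the same level of detail everywhere; the thesis's adaptive schemes instead grow or split basis functions only where the estimated projection onto the Bellman error warrants it.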