Reinforcement learning algorithms and exploration exploitation dilemma

Šušić, Igor

Authors: Igor Šušić
Publication date: 17 September 2020
Publisher: University of Split, Faculty of Science, Department of Informatics.

Abstract

This work gives an overview of reinforcement learning and of the ideas and algorithms on which the field is based. Through examples, it proceeds from formalizing the problem with Markov decision processes (MDPs) to Q-learning. It examines the exploration-exploitation dilemma and relates it to the behavior and psychology of living creatures. It then analyzes and compares several behavior policies as solutions to the dilemma, specifically epsilon-greedy, softmax, and upper confidence bound (UCB), on a simple maze. In that environment, under constraints such as a finite number of episodes, softmax was the only policy that converged to the optimal policy.
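
For readers unfamiliar with the three policies named in the abstract, the following is a minimal sketch of how each one could select an action from a list of Q-value estimates, together with one tabular Q-learning update. It is not the thesis's implementation: the function names, the `temperature` and `c` parameters, the UCB1-style bonus term, and the dict-of-lists Q-table are all assumptions chosen for illustration.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probs(q_values, temperature):
    """Boltzmann distribution over actions; a lower temperature is greedier."""
    top = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - top) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, temperature, rng=random):
    """Sample an action in proportion to its softmax probability."""
    probs = softmax_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def ucb_action(q_values, counts, t, c=2.0):
    """UCB1-style rule (assumed form): estimate plus a bonus that shrinks with visits."""
    for a, n in enumerate(counts):
        if n == 0:
            return a  # try every action at least once
    scores = [q + c * math.sqrt(math.log(t) / n) for q, n in zip(q_values, counts)]
    return max(range(len(scores)), key=lambda a: scores[a])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on a hypothetical dict-of-lists Q-table."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

In this sketch, epsilon-greedy explores uniformly at random, softmax explores in proportion to estimated value, and the UCB rule directs exploration toward actions that have been tried least often.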

Available Versions

Repository of Faculty of Science

oai:repozitorij.pmfst.unist.hr...

Last time updated on 20/03/2021