The Exploration-Exploitation Trade-Off in Sequential Decision Making Problems
Sequential decision making problems require an agent to repeatedly choose an
action from a set of alternatives. Common to such problems is the exploration-exploitation
trade-off, where the agent must choose between taking the action expected to yield the best
reward (exploitation) and trying an alternative action for potential future benefit (exploration).
The main focus of this thesis is to understand in more detail the role this
trade-off plays in various important sequential decision making problems, in terms
of maximising finite-time reward.
The most common and best studied abstraction of the exploration-exploitation
trade-off is the classic multi-armed bandit problem. In this thesis we study several
important extensions that are better suited to real-world applications than the classic
problem. These extensions include scenarios where the rewards for actions
change over time, or where the presence of other agents must be repeatedly considered. In
these contexts, the exploration-exploitation trade-off has a more complicated role
in terms of maximising finite-time performance. For example, the amount of exploration
required changes constantly in a dynamic decision problem; in multiagent
problems, agents can explore through communication; and in repeated games, the
exploration-exploitation trade-off must be considered jointly with game-theoretic
reasoning.
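As an illustration of the classic problem discussed above (not taken from the thesis itself), a multi-armed bandit can be modelled as a set of arms, each paying a stochastic reward with a fixed, hidden success probability. The class name and arm probabilities below are hypothetical:

```python
import random

class BernoulliBandit:
    """A k-armed bandit whose arms pay reward 1 with fixed hidden probabilities."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)        # hidden success probability of each arm
        self.rng = random.Random(seed)  # private RNG for reproducibility

    def pull(self, arm):
        """Return a Bernoulli (0/1) reward for the chosen arm."""
        return 1 if self.rng.random() < self.probs[arm] else 0

# Example: three arms; the agent does not know 0.8 is the best arm.
bandit = BernoulliBandit([0.2, 0.5, 0.8], seed=0)
rewards = [bandit.pull(2) for _ in range(1000)]
print(sum(rewards) / len(rewards))  # empirical mean of arm 2; approaches its hidden probability
```

The exploration-exploitation trade-off arises because the agent only observes the rewards of the arms it actually pulls.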
Existing techniques for balancing exploration and exploitation focus on achieving
desirable asymptotic behaviour and are in general applicable only to basic decision
problems. The most flexible state-of-the-art approaches, ε-greedy and ε-first,
require exploration parameters to be set a priori, and the optimal values of these
parameters depend strongly on the problem faced. To overcome this, we construct a novel
algorithm, ε-ADAPT, which has no exploration parameters and can adapt its exploration
on-line for a wide range of problems. ε-ADAPT builds on newly proven theoretical
properties of the ε-first policy, and we demonstrate that ε-ADAPT can accurately
learn not only how much to explore, but also when and which actions to explore
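The two baseline policies named above can be sketched under their standard definitions: ε-greedy explores uniformly at random with probability ε on every round, while ε-first spends the first ε fraction of the horizon on pure exploration and exploits thereafter. This is an illustrative sketch only (it is not the thesis's ε-ADAPT algorithm), and all function names are hypothetical:

```python
import random

def run_policy(probs, horizon, choose_arm, seed=0):
    """Play a Bernoulli bandit for `horizon` rounds using the given action rule."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # number of pulls per arm
    values = [0.0] * k  # empirical mean reward per arm
    total = 0
    for t in range(horizon):
        arm = choose_arm(t, counts, values, rng)
        reward = 1 if rng.random() < probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
        total += reward
    return total

def eps_greedy(eps):
    """Explore a uniformly random arm with probability eps on every round."""
    def choose(t, counts, values, rng):
        if rng.random() < eps:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)  # greedy arm
    return choose

def eps_first(eps, horizon):
    """Explore uniformly for the first eps*horizon rounds, then always exploit."""
    cutoff = int(eps * horizon)
    def choose(t, counts, values, rng):
        if t < cutoff:
            return rng.randrange(len(values))
        return max(range(len(values)), key=values.__getitem__)
    return choose

# Hypothetical three-arm problem: compare total reward of the two policies.
probs = [0.2, 0.5, 0.8]
print(run_policy(probs, 1000, eps_greedy(0.1)))
print(run_policy(probs, 1000, eps_first(0.1, 1000)))
```

In both policies the exploration parameter ε is fixed in advance, which is precisely the limitation the abstract points to: the best choice of ε depends on the problem at hand, motivating a parameter-free, adaptive approach.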