Dynamic Control of Explore/Exploit Trade-Off In Bayesian Optimization
Bayesian optimization offers the possibility of optimizing black-box
operations not accessible through traditional techniques. The success of
Bayesian optimization methods such as Expected Improvement (EI) is
significantly affected by the degree of trade-off between exploration and
exploitation. Too much exploration can lead to inefficient optimization
protocols, whilst too much exploitation leaves the protocol open to strong
initial biases, and a high chance of getting stuck in a local minimum.
Typically, a constant margin is used to control this trade-off, which results
in yet another hyper-parameter to be optimized. We propose contextual
improvement as a simple yet effective heuristic to counter this, achieving a
one-shot optimization strategy. Our proposed heuristic can be swiftly
calculated and improves both the speed and robustness of discovery of optimal
solutions. We demonstrate its effectiveness on both synthetic and real-world
problems and explore the unaccounted-for uncertainty in the pre-determination
of search hyperparameters controlling the explore-exploit trade-off.
Comment: Accepted for publication in the proceedings of 2018 Computing
Conferenc
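For readers unfamiliar with the "constant margin" the abstract refers to, the following is a minimal sketch of the textbook Expected Improvement acquisition for minimization with a fixed margin. It is not the contextual improvement heuristic proposed in the paper, whose formula the abstract does not give. The function name and the parameter names `mu`, `sigma`, `f_best` and `xi` are assumptions chosen for illustration.

```python
import math


def expected_improvement(mu, sigma, f_best, xi=0.01):
    """Textbook EI for minimization at one candidate point.

    mu, sigma: posterior mean and standard deviation of the surrogate there.
    f_best:    best (lowest) objective value observed so far.
    xi:        constant margin (assumed name); larger values favour exploration.
    """
    improvement = f_best - mu - xi
    if sigma <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(improvement, 0.0)
    z = improvement / sigma
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal density
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    return improvement * cdf + sigma * pdf


# Two hypothetical candidates: a confident, slightly better point and an uncertain one.
confident = dict(mu=-0.5, sigma=0.05)
uncertain = dict(mu=0.0, sigma=1.0)
for xi in (0.0, 0.3):
    scores = {
        "confident": expected_improvement(f_best=0.0, xi=xi, **confident),
        "uncertain": expected_improvement(f_best=0.0, xi=xi, **uncertain),
    }
    print(xi, max(scores, key=scores.get))
```

With these made-up numbers, a zero margin prefers the confident candidate while a margin of 0.3 prefers the uncertain one, which illustrates why a fixed margin acts as an extra hyper-parameter to tune.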
Bootstrapping Monte Carlo Tree Search with an Imperfect Heuristic
We consider the problem of using a heuristic policy to improve the value
approximation by the Upper Confidence Bound applied in Trees (UCT) algorithm in
non-adversarial settings such as planning with large-state space Markov
Decision Processes. Current improvements to UCT focus on either changing the
action selection formula at the internal nodes or the rollout policy at the
leaf nodes of the search tree. In this work, we propose to add an auxiliary arm
to each of the internal nodes, and always use the heuristic policy to roll out
simulations at the auxiliary arms. The method aims to get fast convergence to
optimal values at states where the heuristic policy is optimal, while retaining
similar approximation as the original UCT in other states. We show that
bootstrapping with the proposed method in the new algorithm, UCT-Aux, performs
better compared to the original UCT algorithm and its variants in two benchmark
experiment settings. We also examine conditions under which UCT-Aux works well.
Comment: 16 pages, accepted for presentation at ECML'1
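The idea of an auxiliary arm can be illustrated with a small sketch of standard UCB1 arm selection at one internal node, where one extra arm is reserved for heuristic rollouts. This is an illustration under assumptions, not the authors' UCT-Aux implementation: the `AUX` label, the `stats` layout, and both function names are hypothetical.

```python
import math

AUX = "aux"  # hypothetical label for the extra arm added to an internal node


def ucb1_select(stats, c=math.sqrt(2.0)):
    """Pick the next arm with the standard UCB1 rule.

    stats maps arm -> (visits, total_return). Unvisited arms are tried first.
    """
    for arm, (visits, _) in stats.items():
        if visits == 0:
            return arm
    total_visits = sum(visits for visits, _ in stats.values())

    def score(arm):
        visits, total_return = stats[arm]
        mean = total_return / visits
        bonus = c * math.sqrt(math.log(total_visits) / visits)
        return mean + bonus

    return max(stats, key=score)


def rollout_policy_for(arm, default_policy, heuristic_policy):
    """Simulations through the auxiliary arm always follow the heuristic policy."""
    return heuristic_policy if arm == AUX else default_policy


# Hypothetical node statistics: two ordinary actions plus the auxiliary arm.
node = {"left": (10, 5.0), "right": (10, 6.0), AUX: (10, 9.0)}
print(ucb1_select(node))
```

In this sketch the auxiliary arm competes with the ordinary actions under the same selection rule, so it attracts visits only where heuristic rollouts return high values.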
Towards Informed Exploration for Deep Reinforcement Learning
In this thesis, we discuss various techniques for improving exploration in deep reinforcement learning. We begin with a brief review of reinforcement learning (RL) and the fundamental exploration vs. exploitation trade-off. Then we review how deep RL has improved upon classical RL and summarize six categories of the latest exploration methods for deep RL, in order of increasing usage of prior information. We then explore representative works in three categories and discuss their strengths and weaknesses. The first category, represented by Soft Q-learning, uses regularization to encourage exploration. The second category, represented by count-based exploration via hashing, maps states to hash codes for counting and assigns higher exploration to less-encountered states. The third category utilizes hierarchy and is represented by a modular architecture for RL agents that play StarCraft II. Finally, we conclude that exploration by prior knowledge is a promising research direction and suggest topics of potentially impact
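The count-based category described in this abstract can be sketched roughly as follows. This is a minimal illustration under assumptions rather than the thesis's method: the sign-of-random-projection hash, the `beta / sqrt(count)` bonus, and the class and parameter names are all choices made here for the example.

```python
import math
import random
from collections import Counter


class HashCountBonus:
    """Counts visits per hash code and returns a bonus that shrinks with the count."""

    def __init__(self, state_dim, n_bits=8, beta=1.0, seed=0):
        rng = random.Random(seed)
        # Random hyperplanes; each contributes one bit of the hash code (assumed scheme).
        self.planes = [
            [rng.gauss(0.0, 1.0) for _ in range(state_dim)] for _ in range(n_bits)
        ]
        self.beta = beta
        self.counts = Counter()

    def code(self, state):
        # One boolean per hyperplane: which side of the plane the state falls on.
        return tuple(
            sum(w * x for w, x in zip(plane, state)) >= 0.0 for plane in self.planes
        )

    def visit(self, state):
        # Record the visit, then return the bonus for the updated count.
        key = self.code(state)
        self.counts[key] += 1
        return self.beta / math.sqrt(self.counts[key])


bonus = HashCountBonus(state_dim=3)
state = [1.0, 0.0, 2.0]
print(bonus.visit(state), bonus.visit(state))
```

Repeated visits to the same hash code receive a smaller bonus, so states that map to rarely seen codes are favoured.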