52,605 research outputs found

Gradient-based Reinforcement Planning in Policy-Search Methods

Author: Hutter Marcus
Kwee Ivo
Schmidhuber Juergen
Publication date: 01/01/2001

We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional DP methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments. Comment: This is an extended version of the paper presented at EWRL 2001 in Utrecht (The Netherlands).

arXiv.org e-Print Archive

CiteSeerX

The Australian National University

A Survey of Monte Carlo Tree Search Methods

Author: Browne Cameron B
Colton Simon
Cowling Peter I
Lucas Simon M
Perez Diego
Powley Edward
Rohlfshagen Philipp
Samothrakis Spyridon
Tavener Stephen
Whitehouse Daniel
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2012

Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.

University of Essex Research Repository

CiteSeerX

Maastricht University Research Portal

Crossref

Exoplanets - search methods, discoveries, and prospects for astrobiology

Author: Jones Barrie W
Publication venue: 'Cambridge University Press (CUP)'
Publication date: 01/10/2008

Whereas the Solar System has Mars and Europa as the best candidates for finding fossil/extant life as we know it - based on complex carbon compounds and liquid water - the 263 (non-pulsar) planetary systems around other stars as known at 15 September 2008 could between them possess many more planets where life might exist. Moreover, the number of these exoplanetary systems is growing steadily, and with this growth there is an increase in the number of planets that could bear carbon-liquid water life. In this brief review the main methods by which exoplanets are being discovered are outlined, and then the discoveries that have so far been made are presented. Habitability is then discussed, and an outline presented of how a planet could be studied from afar to determine whether it is habitable, and whether it is indeed inhabited. This review is aimed at the astrobiology community, which spans many disciplines, few of which involve exoplanets. It is therefore at a basic level and concentrates on the major topics. Comment: 37 pages, 12 figures

arXiv.org e-Print Archive

Open Research Online (The Open University)

Importance mixing: Improving sample reuse in evolutionary policy search methods

Author: Perrin Nicolas
Pourchot Aloïs
Sigaud Olivier
Publication date: 17/08/2018

Deep neuroevolution, that is, evolutionary policy search methods based on deep neural networks, has recently emerged as a competitor to deep reinforcement learning algorithms due to its better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and explain how it can be extended to reuse more samples. Then, from an empirical comparison based on a simple benchmark, we show that although importance mixing does improve sample efficiency, it remains far from the sample efficiency of deep reinforcement learning, while being more stable.

arXiv.org e-Print Archive

Time-Dependent Point Source Search Methods in High Energy Neutrino Astronomy

Author: Abbasi
Abbasi
Abbasi
Abdo
Achterberg
Aguilar
Aharonian
Aharonian
Ahrens
Ahrens
Albert
Albrecht Karle
Alexandreas
Barr
Boettcher
Braun
Bugaev
Chad Finley
Dziewonski
Hill
Jim Braun
Jon Dumm
Kouveliotou
Mike Baker
Neunhöffer
Pumplin
Shanidze
Sokalski
Teresa Montaruli
Torres
Publication venue: 'Elsevier BV'
Publication date: 08/12/2009

We present maximum-likelihood search methods for time-dependent fluxes from point sources, such as flares or periodic emissions. We describe a method for the case when the time dependence of the flux can be assumed a priori from other observations, and we additionally describe a method to search for bursts with an unknown time dependence. In the context of high energy neutrino astronomy, we simulate one year of data from a cubic-kilometer scale neutrino detector and characterize these methods and equivalent binned methods with respect to the duration of neutrino emission. Compared to standard time-integrated searches, we find that up to an order of magnitude fewer events are needed to discover bursts with short durations, even when the burst time and duration are not known a priori. Comment: LaTeX; 17 pages, 4 figures; submitted to Astroparticle Physics
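The abstract does not state the likelihood itself. As a rough illustration only, unbinned point-source searches commonly use a signal-plus-background mixture likelihood of the following general form. The notation here is an assumption for this sketch, not taken from the paper: N is the number of observed events, n_s the fitted number of signal events, and S_i and B_i the signal and background probability densities evaluated for event i, including any time dependence.

```latex
% Illustrative mixture likelihood and likelihood-ratio test statistic.
% All symbols are assumed notation, not quoted from the listed paper.
\mathcal{L}(n_s) = \prod_{i=1}^{N}
  \left[ \frac{n_s}{N}\, S_i + \left(1 - \frac{n_s}{N}\right) B_i \right],
\qquad
\lambda = 2 \ln \frac{\mathcal{L}(\hat{n}_s)}{\mathcal{L}(n_s = 0)}
```

In this form, a time-dependent search differs from a time-integrated one only in that S_i carries a time profile, which may be why short bursts can be found with fewer events.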

arXiv.org e-Print Archive

Crossref

Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling

Author: C. Duin
D. Bertsekas
E. Taillard
L. Kocsis
L. Xu
M.J. Streeter
P. Auer
P. Rolet
S. Panwalkar
S. Voß
T. Lai
Á. Fialho
Publication date: 01/01/2012

Greedy heuristics may be attuned by looking ahead for each possible choice, in an approach called the rollout or Pilot method. These methods may be seen as meta-heuristics that can enhance (any) heuristic solution, by repetitively modifying a master solution: similarly to what is done in game tree search, better choices are identified using lookahead, based on solutions obtained by repeatedly using a greedy heuristic. This paper first illustrates how the Pilot method improves upon some simple well known dispatch heuristics for the job-shop scheduling problem. The Pilot method is then shown to be a special case of the more recent Monte Carlo Tree Search (MCTS) methods: Unlike the Pilot method, MCTS methods use random completion of partial solutions to identify promising branches of the tree. The Pilot method and a simple version of MCTS, using the $\varepsilon$-greedy exploration paradigms, are then compared within the same framework, consisting of 300 scheduling problems of varying sizes with fixed-budget of rollouts. Results demonstrate that MCTS reaches better or same results as the Pilot methods in this context. Comment: Learning and Intelligent OptimizatioN (LION'6) 7219 (2012)
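The lookahead idea described in this abstract can be sketched on a toy problem. The example below is a minimal illustration, not the paper's implementation: it assumes a hypothetical single-machine model (jobs given as `(processing_time, weight)` pairs, cost equal to total weighted completion time) rather than the job-shop problem, and uses shortest-processing-time ordering as the greedy base heuristic. The names `cost`, `greedy_complete`, and `pilot` are invented for this sketch.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy job model: (processing_time, weight).
Job = Tuple[int, int]


def cost(order: Sequence[Job]) -> int:
    """Total weighted completion time of a single-machine sequence."""
    t = 0
    total = 0
    for processing_time, weight in order:
        t += processing_time
        total += weight * t
    return total


def greedy_complete(prefix: List[Job], remaining: List[Job]) -> List[Job]:
    """Base heuristic: append remaining jobs by shortest processing time."""
    return prefix + sorted(remaining, key=lambda job: job[0])


def pilot(jobs: List[Job]) -> List[Job]:
    """Pilot/rollout: at each step, try every candidate next job, complete
    the schedule with the greedy heuristic, and commit to the candidate
    whose completed schedule has the lowest cost."""
    prefix: List[Job] = []
    remaining = list(jobs)
    while remaining:
        best_idx = min(
            range(len(remaining)),
            key=lambda i: cost(
                greedy_complete(
                    prefix + [remaining[i]],
                    remaining[:i] + remaining[i + 1:],
                )
            ),
        )
        prefix.append(remaining.pop(best_idx))
    return prefix


if __name__ == "__main__":
    jobs = [(1, 1), (2, 10), (3, 1)]
    print("greedy cost:", cost(greedy_complete([], jobs)))
    print("pilot cost: ", cost(pilot(jobs)))
```

On this instance the lookahead schedules the heavily weighted job first, which the plain greedy ordering does not. An MCTS-style variant would, as the abstract notes, replace the deterministic greedy completion with random completions of the partial solution.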

arXiv.org e-Print Archive

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

The Choice of Search Methods: Some Empirical Evidence from Italy

Author: AUTIERO Giuseppina
MAZZOTTA Fernanda

In the labour market, part of the coordination process involves the matching between job skills and vacancies requiring specific skills. On the side of unemployed workers, the process requires search activity based on gathering information on available vacancies and the related wages and skills. The distinction among search methods plays a significant role in the success of individual job search. The factors characterising the methods and the individuals searching for a job influence their choice. The specific aim of this empirical analysis is to understand how individuals look for a job and, thus, how they choose search methods drawn from the set of search actions specified in the 1993 Bank of Italy Survey. Keywords: labour supply; unemployment; models and job search

Research Papers in Economics