Gradient-based Reinforcement Planning in Policy-Search Methods
We introduce a learning method called ``gradient-based reinforcement
planning'' (GREP). Unlike traditional DP methods that improve their policy
backwards in time, GREP is a gradient-based method that plans ahead and
improves its policy before it actually acts in the environment. We derive
formulas for the exact policy gradient that maximizes the expected future
reward and confirm our ideas with numerical experiments.
Comment: This is an extended version of the paper presented at the EWRL 2001
in Utrecht (The Netherlands).
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work
Exoplanets - search methods, discoveries, and prospects for astrobiology
Whereas the Solar System has Mars and Europa as the best candidates for
finding fossil/extant life as we know it - based on complex carbon compounds
and liquid water - the 263 (non-pulsar) planetary systems around other stars as
known at 15 September 2008 could between them possess many more planets where
life might exist. Moreover, the number of these exoplanetary systems is growing
steadily, and with this growth there is an increase in the number of planets
that could bear carbon-liquid water life. In this brief review the main methods
by which exoplanets are being discovered are outlined, and then the discoveries
that have so far been made are presented. Habitability is then discussed, and
an outline presented of how a planet could be studied from afar to determine
whether it is habitable, and whether it is indeed inhabited. This review is
aimed at the astrobiology community, which spans many disciplines, few of which
involve exoplanets. It is therefore at a basic level and concentrates on the
major topics.
Comment: 37 pages, 12 figures
Importance mixing: Improving sample reuse in evolutionary policy search methods
Deep neuroevolution, that is, evolutionary policy search based on deep
neural networks, has recently emerged as a competitor to deep reinforcement
learning algorithms due to its better parallelization capabilities. However,
these methods still suffer from far worse sample efficiency. In this paper we
investigate whether a mechanism known as "importance mixing" can significantly
improve their sample efficiency. We provide a didactic presentation of
importance mixing and we explain how it can be extended to reuse more samples.
Then, from an empirical comparison based on a simple benchmark, we show that,
although it does provide better sample efficiency, it remains far from the
sample efficiency of deep reinforcement learning, though it is more stable.
Time-Dependent Point Source Search Methods in High Energy Neutrino Astronomy
We present maximum-likelihood search methods for time-dependent fluxes from
point sources, such as flares or periodic emissions. We describe a method for
the case when the time dependence of the flux can be assumed a priori from
other observations, and we additionally describe a method to search for bursts
with an unknown time dependence. In the context of high energy neutrino
astronomy, we simulate one year of data from a cubic-kilometer scale neutrino
detector and characterize these methods and equivalent binned methods with
respect to the duration of neutrino emission. Compared to standard
time-integrated searches, we find that up to an order of magnitude fewer events
are needed to discover bursts with short durations, even when the burst time
and duration are not known a priori.
Comment: LaTeX; 17 pages, 4 figures; submitted to Astroparticle Physics
Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling
Greedy heuristics may be attuned by looking ahead for each possible choice,
in an approach called the rollout or Pilot method. These methods may be seen as
meta-heuristics that can enhance (any) heuristic solution, by repetitively
modifying a master solution: similarly to what is done in game tree search,
better choices are identified using lookahead, based on solutions obtained by
repeatedly using a greedy heuristic. This paper first illustrates how the Pilot
method improves upon some simple well known dispatch heuristics for the
job-shop scheduling problem. The Pilot method is then shown to be a special
case of the more recent Monte Carlo Tree Search (MCTS) methods: Unlike the
Pilot method, MCTS methods use random completion of partial solutions to
identify promising branches of the tree. The Pilot method and a simple version
of MCTS, using the ε-greedy exploration paradigm, are then
compared within the same framework, consisting of 300 scheduling problems of
varying sizes with a fixed budget of rollouts. Results demonstrate that MCTS
reaches results that are better than or equal to those of the Pilot method in this context.
Comment: Learning and Intelligent OptimizatioN (LION'6) 7219 (2012)
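The lookahead idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation and it does not model a job shop: as a simplifying assumption it uses a toy single-machine problem that minimizes total tardiness, with the earliest-due-date rule as the greedy base heuristic. All names (`total_tardiness`, `greedy_complete`, `pilot_schedule`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Job = Tuple[int, int]  # (processing_time, due_date); toy data model, an assumption


def total_tardiness(order: Sequence[int], jobs: Sequence[Job]) -> int:
    """Sum of max(0, completion_time - due_date) over jobs run in `order`."""
    clock = 0
    tardiness = 0
    for j in order:
        processing_time, due_date = jobs[j]
        clock += processing_time
        tardiness += max(0, clock - due_date)
    return tardiness


def greedy_complete(prefix: List[int], remaining: List[int],
                    jobs: Sequence[Job]) -> List[int]:
    """Base heuristic: append the remaining jobs in earliest-due-date order."""
    return prefix + sorted(remaining, key=lambda j: jobs[j][1])


def pilot_schedule(jobs: Sequence[Job]) -> List[int]:
    """Pilot (rollout) method: try each remaining job as the next choice,
    complete the schedule with the greedy heuristic, and commit to the
    choice whose completed schedule has the lowest total tardiness."""
    prefix: List[int] = []
    remaining = list(range(len(jobs)))
    while remaining:
        def lookahead_cost(j: int) -> int:
            rest = [k for k in remaining if k != j]
            return total_tardiness(greedy_complete(prefix + [j], rest, jobs), jobs)

        best = min(remaining, key=lookahead_cost)
        prefix.append(best)
        remaining.remove(best)
    return prefix


if __name__ == "__main__":
    jobs = [(4, 4), (3, 5), (2, 6)]
    greedy = greedy_complete([], list(range(len(jobs))), jobs)
    pilot = pilot_schedule(jobs)
    print("greedy:", greedy, total_tardiness(greedy, jobs))
    print("pilot: ", pilot, total_tardiness(pilot, jobs))
```

On this small instance the greedy order has total tardiness 5, while the pilot order reaches 4. An MCTS variant in the sense of the abstract would replace the deterministic greedy completion with randomized completions of each partial schedule.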
The Choice of Search Methods: Some Empirical Evidence from Italy
In the labour market, part of the coordination process involves the matching between job skills and vacancies requiring specific skills. On the side of unemployed workers, the process requires a search activity based on gathering information on available vacancies and the related wages and skills. The distinction among search methods plays a significant role in the success of an individual's job search. The factors characterising the methods and the individuals searching for a job influence their choice. The specific aim of this empirical analysis is to understand how individuals look for a job and, thus, how they choose among the search methods drawn from the set of search actions specified in the 1993 Bank of Italy Survey.
Keywords: labour supply; unemployment; models and job search