52,605 research outputs found

    Gradient-based Reinforcement Planning in Policy-Search Methods

    We introduce a learning method called "gradient-based reinforcement planning" (GREP). Unlike traditional dynamic programming (DP) methods that improve their policy backwards in time, GREP is a gradient-based method that plans ahead and improves its policy before it actually acts in the environment. We derive formulas for the exact policy gradient that maximizes the expected future reward and confirm our ideas with numerical experiments. Comment: This is an extended version of the paper presented at the EWRL 2001 in Utrecht (The Netherlands).

    A Survey of Monte Carlo Tree Search Methods

    Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.

    Exoplanets - search methods, discoveries, and prospects for astrobiology

    Whereas the Solar System has Mars and Europa as the best candidates for finding fossil/extant life as we know it (based on complex carbon compounds and liquid water), the 263 (non-pulsar) planetary systems around other stars known as of 15 September 2008 could between them possess many more planets where life might exist. Moreover, the number of these exoplanetary systems is growing steadily, and with this growth there is an increase in the number of planets that could bear carbon-liquid water life. In this brief review the main methods by which exoplanets are being discovered are outlined, and then the discoveries that have so far been made are presented. Habitability is then discussed, and an outline is presented of how a planet could be studied from afar to determine whether it is habitable, and whether it is indeed inhabited. This review is aimed at the astrobiology community, which spans many disciplines, few of which involve exoplanets. It is therefore at a basic level and concentrates on the major topics. Comment: 37 pages, 12 figures.

    Importance mixing: Improving sample reuse in evolutionary policy search methods

    Deep neuroevolution, that is, evolutionary policy search based on deep neural networks, has recently emerged as a competitor to deep reinforcement learning algorithms due to its better parallelization capabilities. However, these methods still suffer from far worse sample efficiency. In this paper we investigate whether a mechanism known as "importance mixing" can significantly improve their sample efficiency. We provide a didactic presentation of importance mixing and we explain how it can be extended to reuse more samples. Then, from an empirical comparison based on a simple benchmark, we show that, although importance mixing does provide better sample efficiency, it is still far from the sample efficiency of deep reinforcement learning, though it is more stable.

    Time-Dependent Point Source Search Methods in High Energy Neutrino Astronomy

    We present maximum-likelihood search methods for time-dependent fluxes from point sources, such as flares or periodic emissions. We describe a method for the case when the time dependence of the flux can be assumed a priori from other observations, and we additionally describe a method to search for bursts with an unknown time dependence. In the context of high energy neutrino astronomy, we simulate one year of data from a cubic-kilometer scale neutrino detector and characterize these methods and equivalent binned methods with respect to the duration of neutrino emission. Compared to standard time-integrated searches, we find that up to an order of magnitude fewer events are needed to discover bursts with short durations, even when the burst time and duration are not known a priori. Comment: LaTeX; 17 pages, 4 figures; submitted to Astroparticle Physics.

    Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling

    Greedy heuristics may be attuned by looking ahead for each possible choice, in an approach called the rollout or Pilot method. These methods may be seen as meta-heuristics that can enhance (any) heuristic solution by repetitively modifying a master solution: similarly to what is done in game tree search, better choices are identified using lookahead, based on solutions obtained by repeatedly using a greedy heuristic. This paper first illustrates how the Pilot method improves upon some simple, well-known dispatch heuristics for the job-shop scheduling problem. The Pilot method is then shown to be a special case of the more recent Monte Carlo Tree Search (MCTS) methods: unlike the Pilot method, MCTS methods use random completion of partial solutions to identify promising branches of the tree. The Pilot method and a simple version of MCTS, using the ε-greedy exploration paradigm, are then compared within the same framework, consisting of 300 scheduling problems of varying sizes with a fixed budget of rollouts. Results demonstrate that MCTS reaches results as good as or better than those of the Pilot method in this context. Comment: Learning and Intelligent OptimizatioN (LION'6) 7219 (2012).
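
    The lookahead idea described in this abstract can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes a shortest-processing-time (SPT) dispatch rule as the greedy heuristic, a job encoding as lists of (machine, duration) operations, and hypothetical helper names (`dispatch`, `greedy_complete`, `pilot`). At each step, every dispatchable job is tried, the rest of the schedule is completed greedily, and the choice with the smallest resulting makespan is kept.

```python
from typing import List, Tuple

# Assumed encoding: each job is an ordered list of (machine, duration) operations.
Job = List[Tuple[int, int]]
# State: (index of next operation per job, job ready times, machine ready times).
State = Tuple[List[int], List[int], List[int]]


def initial_state(jobs: List[Job], n_machines: int) -> State:
    return ([0] * len(jobs), [0] * len(jobs), [0] * n_machines)


def available(jobs: List[Job], state: State) -> List[int]:
    """Jobs that still have an unscheduled operation."""
    next_op = state[0]
    return [j for j in range(len(jobs)) if next_op[j] < len(jobs[j])]


def dispatch(jobs: List[Job], state: State, j: int) -> State:
    """Schedule the next operation of job j as early as possible (copying the state)."""
    next_op, job_ready, mach_ready = (list(x) for x in state)
    machine, duration = jobs[j][next_op[j]]
    end = max(job_ready[j], mach_ready[machine]) + duration
    next_op[j] += 1
    job_ready[j] = end
    mach_ready[machine] = end
    return (next_op, job_ready, mach_ready)


def makespan(state: State) -> int:
    return max(state[1])


def spt_choice(jobs: List[Job], state: State) -> int:
    """Greedy rule (an assumption here): shortest next operation, ties broken by job index."""
    next_op = state[0]
    return min(available(jobs, state), key=lambda j: (jobs[j][next_op[j]][1], j))


def greedy_complete(jobs: List[Job], state: State) -> State:
    """Finish a partial schedule using only the greedy rule."""
    while available(jobs, state):
        state = dispatch(jobs, state, spt_choice(jobs, state))
    return state


def pilot(jobs: List[Job], n_machines: int) -> State:
    """Pilot/rollout: score each candidate choice by the makespan of its greedy completion."""
    state = initial_state(jobs, n_machines)
    while available(jobs, state):
        best = min(
            available(jobs, state),
            key=lambda j: (makespan(greedy_complete(jobs, dispatch(jobs, state, j))), j),
        )
        state = dispatch(jobs, state, best)
    return state


if __name__ == "__main__":
    # Hypothetical 3-job, 3-machine instance, for illustration only.
    jobs = [[(0, 3), (1, 2), (2, 2)], [(0, 2), (2, 1), (1, 4)], [(1, 4), (2, 3)]]
    print("greedy makespan:", makespan(greedy_complete(jobs, initial_state(jobs, 3))))
    print("pilot makespan: ", makespan(pilot(jobs, 3)))
```

    Because the greedy rule is deterministic and its own choice is always among the candidates evaluated, the schedule built by this sketch is never worse than the plain greedy one. An MCTS variant, as contrasted in the abstract, would replace `greedy_complete` with random completions of the partial schedule.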

    The Choice of Search Methods: Some Empirical Evidence from Italy

    In the labour market, part of the coordination process involves the matching between job skills and vacancies requiring specific skills. On the side of unemployed workers, the process requires a search activity based on gathering information on available vacancies and the related wages and skills. The distinction among search methods plays a significant role in the success of individual job search. The factors characterising the methods and the individuals searching for a job influence their choice. The specific aim of this empirical analysis is to understand how individuals look for a job and, thus, how they choose search methods from the set of search actions specified in the 1993 Bank of Italy Survey. Keywords: labour supply; unemployment; models and job search.