1,781 research outputs found

    Exploration vs. Exploitation in the Information Filtering Problem

    Full text link
    We consider information filtering, in which we face a stream of items too voluminous to process by hand (e.g., scientific articles, blog posts, emails), and must rely on a computer system to automatically filter out irrelevant items. Such systems face the exploration vs. exploitation tradeoff, in which it may be beneficial to present an item despite a low probability of relevance, just to learn about future items with similar content. We present a Bayesian sequential decision-making model of this problem, show how it may be solved to optimality using a decomposition to a collection of two-armed bandit problems, and show structural results for the optimal policy. We show that the resulting method is especially useful when facing the cold start problem, i.e., when filtering items for new users without a long history of past interactions. We then present an application of this information filtering method to a historical dataset from the arXiv.org repository of scientific articles.Comment: 36 pages, 5 figure

    Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling

    Get PDF
    Greedy heuristics may be attuned by looking ahead for each possible choice, in an approach called the rollout or Pilot method. These methods may be seen as meta-heuristics that can enhance (any) heuristic solution, by repetitively modifying a master solution: similarly to what is done in game tree search, better choices are identified using lookahead, based on solutions obtained by repeatedly using a greedy heuristic. This paper first illustrates how the Pilot method improves upon some simple well known dispatch heuristics for the job-shop scheduling problem. The Pilot method is then shown to be a special case of the more recent Monte Carlo Tree Search (MCTS) methods: Unlike the Pilot method, MCTS methods use random completion of partial solutions to identify promising branches of the tree. The Pilot method and a simple version of MCTS, using the ε\varepsilon-greedy exploration paradigms, are then compared within the same framework, consisting of 300 scheduling problems of varying sizes with fixed-budget of rollouts. Results demonstrate that MCTS reaches better or same results as the Pilot methods in this context.Comment: Learning and Intelligent OptimizatioN (LION'6) 7219 (2012

    Reinforcement Learning: A Survey

    Full text link
    This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent that learns behavior through trial-and-error interactions with a dynamic environment. The work described here has a resemblance to work in psychology, but differs considerably in the details and in the use of the word ``reinforcement.'' The paper discusses central issues of reinforcement learning, including trading off exploration and exploitation, establishing the foundations of the field via Markov decision theory, learning from delayed reinforcement, constructing empirical models to accelerate learning, making use of generalization and hierarchy, and coping with hidden state. It concludes with a survey of some implemented systems and an assessment of the practical utility of current methods for reinforcement learning.Comment: See http://www.jair.org/ for any accompanying file

    Sustainable Cooperative Coevolution with a Multi-Armed Bandit

    Get PDF
    This paper proposes a self-adaptation mechanism to manage the resources allocated to the different species comprising a cooperative coevolutionary algorithm. The proposed approach relies on a dynamic extension to the well-known multi-armed bandit framework. At each iteration, the dynamic multi-armed bandit makes a decision on which species to evolve for a generation, using the history of progress made by the different species to guide the decisions. We show experimentally, on a benchmark and a real-world problem, that evolving the different populations at different paces allows not only to identify solutions more rapidly, but also improves the capacity of cooperative coevolution to solve more complex problems.Comment: Accepted at GECCO 201
    corecore