
    Discovering Valuable Items from Massive Data

    Suppose there is a large collection of items, each with an associated cost and an inherent utility that is revealed only once we commit to selecting it. Given a budget on the cumulative cost of the selected items, how can we pick a subset of maximal value? This task generalizes several important problems such as multi-armed bandits, active search and the knapsack problem. We present an algorithm, GP-Select, which utilizes prior knowledge about similarity between items, expressed as a kernel function. GP-Select uses Gaussian process prediction to balance exploration (estimating the unknown value of items) and exploitation (selecting items of high value). We extend GP-Select to discover sets that simultaneously have high utility and are diverse. Our preference for diversity can be specified as an arbitrary monotone submodular function that quantifies the diminishing returns obtained when selecting similar items. Furthermore, we exploit the structure of the model updates to achieve an order-of-magnitude (up to 40X) speedup in our experiments without resorting to approximations. We provide strong guarantees on the performance of GP-Select and apply it to three real-world case studies of industrial relevance: (1) refreshing a repository of prices in a Global Distribution System for the travel industry, (2) identifying diverse, binding-affine peptides in a vaccine design task, and (3) maximizing clicks in a web-scale recommender system by recommending items to users.
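
    The exploration–exploitation rule the abstract describes can be sketched as a GP-driven upper-confidence-bound loop over a budgeted pool. The sketch below is a minimal illustration of that idea, not the authors' implementation: the RBF kernel, the confidence weight beta, and the value-per-cost greedy rule are assumptions made for the example.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def gp_select(features, costs, reveal_utility, budget, beta=2.0):
        # features: (n, d) item descriptors; the kernel encodes similarity.
        # reveal_utility(i) returns item i's utility once we commit to it.
        features = np.asarray(features)
        costs = np.asarray(costs, dtype=float)
        gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))
        selected, X_obs, y_obs, spent = [], [], [], 0.0
        while True:
            pool = [i for i in range(len(features))
                    if i not in selected and spent + costs[i] <= budget]
            if not pool:
                return selected
            if X_obs:
                gp.fit(np.array(X_obs), np.array(y_obs))
                mu, sigma = gp.predict(features[pool], return_std=True)
            else:  # nothing observed yet: prior mean 0, unit prior std
                mu, sigma = np.zeros(len(pool)), np.ones(len(pool))
            # Optimism in the face of uncertainty, normalized by cost:
            # high predicted value or high uncertainty makes an item attractive.
            scores = (mu + beta * sigma) / costs[pool]
            i_best = pool[int(np.argmax(scores))]
            selected.append(i_best)
            X_obs.append(features[i_best])
            y_obs.append(reveal_utility(i_best))  # utility revealed on commitment
            spent += costs[i_best]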

    Learning Classical Planning Strategies with Policy Gradient

    A common paradigm in classical planning is heuristic forward search. Forward search planners often rely on simple best-first search, which remains fixed throughout the search process. In this paper, we introduce a novel search framework capable of alternating between several forward search approaches while solving a particular planning problem. Selection of the approach is performed by a trainable stochastic policy that maps the state of the search to a probability distribution over the approaches. This enables the use of policy gradient to learn search strategies tailored to a specific distribution of planning problems and a selected performance metric, e.g. the IPC score. We instantiate the framework by constructing a policy space consisting of five search approaches and a two-dimensional representation of the planner's state. Then, we train the system on randomly generated problems from five IPC domains using three different performance metrics. Our experimental results show that the learner is able to discover domain-specific search strategies, improving the planner's performance relative to the baselines of plain best-first search and a uniform policy. Comment: Accepted for ICAPS 2019.
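
    The trainable policy described here can be sketched as a linear softmax over the two state features, updated with REINFORCE. Everything concrete below (the meaning of the features, the learning rate, and the scalar reward standing in for an IPC-style score) is an assumption for illustration, not the paper's exact setup.

    import numpy as np

    N_APPROACHES = 5   # the paper's policy space has five search approaches
    STATE_DIM = 2      # and a two-dimensional planner-state representation

    def softmax(z):
        e = np.exp(z - z.max())
        return e / e.sum()

    class SearchPolicy:
        def __init__(self, lr=0.01):
            self.W = np.zeros((N_APPROACHES, STATE_DIM))
            self.lr = lr

        def act(self, state):
            # Map the search state to a distribution over approaches,
            # then sample the approach to run next.
            probs = softmax(self.W @ state)
            return np.random.choice(N_APPROACHES, p=probs), probs

        def reinforce(self, episode, reward):
            # episode: list of (state, action, probs) from one planning run;
            # reward: scalar performance metric for that run.
            for state, action, probs in episode:
                grad = -np.outer(probs, state)     # d log pi / dW for all rows
                grad[action] += state              # plus the chosen action's row
                self.W += self.lr * reward * grad  # ascend expected reward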

    Pilot, Rollout and Monte Carlo Tree Search Methods for Job Shop Scheduling

    Greedy heuristics may be attuned by looking ahead for each possible choice, in an approach called the rollout or Pilot method. These methods may be seen as meta-heuristics that can enhance (any) heuristic solution by repeatedly modifying a master solution: similarly to what is done in game-tree search, better choices are identified using lookahead, based on solutions obtained by repeatedly applying a greedy heuristic. This paper first illustrates how the Pilot method improves upon some simple, well-known dispatch heuristics for the job-shop scheduling problem. The Pilot method is then shown to be a special case of the more recent Monte Carlo Tree Search (MCTS) methods: unlike the Pilot method, MCTS methods use random completion of partial solutions to identify promising branches of the tree. The Pilot method and a simple version of MCTS, using the ε-greedy exploration paradigm, are then compared within the same framework, consisting of 300 scheduling problems of varying sizes with a fixed budget of rollouts. Results demonstrate that MCTS reaches results that are better than or equal to those of the Pilot method in this context. Comment: Learning and Intelligent OptimizatioN (LION'6) 7219 (2012).
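
    In outline, the Pilot/rollout scheme works as shown below: at each decision point, every candidate choice is evaluated by completing the partial solution with the base greedy heuristic, and the choice whose completion scores best is committed to the master solution. The interface names (candidates, apply_choice, greedy_complete, objective) are an assumed abstraction over a job-shop instance, not code from the paper.

    def pilot_method(state, candidates, apply_choice, greedy_complete, objective):
        # Build a full solution, one lookahead-vetted choice at a time.
        while True:
            options = candidates(state)
            if not options:
                return state                      # master solution is complete
            best_choice, best_value = None, float("inf")
            for c in options:
                # Tentatively commit to c, then finish the schedule greedily
                # and score the completed solution (e.g. by its makespan).
                lookahead = greedy_complete(apply_choice(state, c))
                value = objective(lookahead)
                if value < best_value:
                    best_choice, best_value = c, value
            state = apply_choice(state, best_choice)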

    Monte Carlo Approaches to Parameterized Poker Squares

    This paper summarizes a variety of Monte Carlo approaches employed in the top three performing entries to the Parameterized Poker Squares NSG Challenge competition. In all cases, AI players benefited from real-time machine learning and various Monte Carlo game-tree search techniques.
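
    The simplest member of this family, flat Monte Carlo move evaluation, can be sketched as follows: each legal placement of the drawn card is scored by dealing out the rest of the deck uniformly at random many times and averaging the final grid scores. The grid/score interface here is hypothetical, and the competition entries described in the paper used richer MCTS variants and learned scoring on top of this basic idea.

    import random

    def choose_placement(grid, card, deck, score_fn, n_sims=200):
        # Pick the empty cell whose random completions score best on average.
        best_cell, best_avg = None, float("-inf")
        for cell in list(grid.empty_cells()):
            total = 0.0
            for _ in range(n_sims):
                g = grid.copy()
                g.place(cell, card)
                rest = deck[:]                     # cards not yet dealt
                random.shuffle(rest)
                for c2 in list(g.empty_cells()):   # fill the grid at random
                    g.place(c2, rest.pop())
                total += score_fn(g)               # parameterized point system
            avg = total / n_sims
            if avg > best_avg:
                best_cell, best_avg = cell, avg
        return best_cell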