83 research outputs found
Preference-Based Monte Carlo Tree Search
Monte Carlo tree search (MCTS) is a popular choice for solving sequential
anytime problems. However, it depends on a numeric feedback signal, which can
be difficult to define. Real-time MCTS is a variant which may only rarely
encounter states with an explicit, extrinsic reward. To deal with such cases,
the experimenter has to supply an additional numeric feedback signal in the
form of a heuristic, which intrinsically guides the agent. Recent work has
shown evidence that in different areas the underlying structure is ordinal and
not numerical. Hence erroneous and biased heuristics are inevitable, especially
in such domains. In this paper, we propose an MCTS variant which only depends on
qualitative feedback, and therefore opens up new applications for MCTS. We also
find indications that translating absolute into ordinal feedback may be
beneficial. Using a puzzle domain, we show that our preference-based MCTS
variant, which only receives qualitative feedback, is able to reach a
performance level comparable to a regular MCTS baseline, which obtains
quantitative feedback. Comment: To be publishe
In situ compatibilisation of alkenyl-terminated polymer blends using cross metathesis
Several compatibilised polyolefin-based blends have been obtained via rather simple and robust chemistry: olefin cross metathesis using Grubbs' second-generation catalyst (G2) of alkenyl-terminated macromolecules of different nature. The viability of the concept was first demonstrated for low molecular weight polyolefin macromolecules before being extended to higher molecular weight polymers, including polar ones such as poly(ε-caprolactone) (PCL), poly(pentadecalactone) (PPDL) and poly(methylmethacrylate) (PMMA). When taking all the possible cross metathesis reactions into account, a statistical distribution of homopolymers and diblock copolymers is likely to be formed. While clear macrophase separation is visible in the uncompatibilised blends of macromolecules, it is absent for the in situ compatibilised products, as was confirmed by optical microscopy. It was demonstrated that even small amounts of diblock copolymers can effectively compatibilise the two phases. All materials were analysed by HT SEC, DSC, HT HPLC and optical microscopy. Such a proof of principle indicates that using cross metathesis on a large library of macromolecules might be a versatile "synthetic handle" to reach a variety of in situ compatibilised blends
Minimizing Simple and Cumulative Regret in Monte-Carlo Tree Search
Regret minimization is important in both the Multi-Armed Bandit problem and Monte-Carlo Tree Search (MCTS). Recently, simple regret, i.e., the regret of not recommending the best action, has been proposed as an alternative to cumulative regret in MCTS, i.e., regret accumulated over time. Each type of regret is appropriate in different contexts. Although the majority of MCTS research applies the UCT selection policy for minimizing cumulative regret in the tree, this paper introduces a new MCTS variant, Hybrid MCTS (H-MCTS), which minimizes both types of regret in different parts of the tree. H-MCTS uses SHOT, a recursive version of Sequential Halving, to minimize simple regret near the root, and UCT to minimize cumulative regret when descending further down the tree. We discuss the motivation for this new search technique, and show the performance of H-MCTS in six distinc
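To make the simple-regret idea in the abstract above concrete, here is a minimal sketch of plain Sequential Halving on a flat multi-armed bandit. It is not SHOT and not H-MCTS, whose details are not given in this listing. The function name `sequential_halving`, the arm representation (callables taking a random generator), and the budget split are illustrative assumptions.

```python
import math
import random


def sequential_halving(arms, budget, rng):
    """Recommend one arm index using plain Sequential Halving.

    Assumptions (hypothetical, for illustration only):
    - `arms` is a list of callables; arms[i](rng) returns one reward sample.
    - `budget` is the total number of samples, split evenly across rounds.
    """
    n = len(arms)
    if n == 1:
        return 0
    rounds = math.ceil(math.log2(n))
    survivors = list(range(n))
    for _round in range(rounds):
        # Each surviving arm gets an equal share of this round's budget.
        pulls = max(1, budget // (len(survivors) * rounds))
        means = {}
        for i in survivors:
            total = sum(arms[i](rng) for _pull in range(pulls))
            means[i] = total / pulls
        # Keep the better half (rounded up), ranked by this round's mean.
        survivors.sort(key=lambda i: means[i], reverse=True)
        survivors = survivors[: math.ceil(len(survivors) / 2)]
    return survivors[0]


def bernoulli(p):
    """Build a hypothetical arm that pays 1 with probability p, else 0."""
    return lambda rng: 1.0 if rng.random() < p else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    arms = [bernoulli(p) for p in (0.2, 0.5, 0.8, 0.4)]
    print("recommended arm:", sequential_halving(arms, budget=400, rng=rng))
```

The sketch spends its whole budget on identifying a single arm to recommend, which is the simple-regret setting. A UCT-style policy instead trades off exploration and exploitation on every sample, which targets cumulative regret.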
- …