Online Meta-learning by Parallel Algorithm Competition
The efficiency of reinforcement learning algorithms depends critically on a
few meta-parameters that modulate the learning updates and the trade-off
between exploration and exploitation. The adaptation of the meta-parameters is
an open question in reinforcement learning, which arguably has become more of
an issue recently with the success of deep reinforcement learning in
high-dimensional state spaces. The long learning times in domains such as Atari
2600 video games make it infeasible to perform comprehensive searches of
appropriate meta-parameter values. We propose the Online Meta-learning by
Parallel Algorithm Competition (OMPAC) method. In the OMPAC method, several
instances of a reinforcement learning algorithm are run in parallel with small
differences in the initial values of the meta-parameters. After a fixed number
of episodes, the instances are selected based on their performance in the task
at hand. Before continuing the learning, Gaussian noise is added to the
meta-parameters with a predefined probability. We validate the OMPAC method by
improving the state-of-the-art results in stochastic SZ-Tetris and in standard
Tetris with a smaller, 10×10, board, by 31% and 84%, respectively, and
by improving the results for deep Sarsa(λ) agents in three Atari 2600
games by 62% or more. The experiments also show the ability of the OMPAC method
to adapt the meta-parameters according to the learning progress in different
tasks.
Comment: 15 pages, 10 figures. arXiv admin note: text overlap with
arXiv:1702.0311
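The loop described in the OMPAC abstract above (parallel instances, periodic selection, then Gaussian perturbation of the meta-parameters) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: `toy_performance` is a hypothetical stand-in for running a block of episodes, the "keep the better half" selection rule is an assumption, and the parameter names (`alpha`, `epsilon`, `prob`, `rel_sigma`) are chosen only for the example. A real setup would also carry each learner's weights along with its meta-parameters.

```python
import random


def toy_performance(meta, rng):
    """Hypothetical stand-in for running a fixed number of episodes.

    The score is highest near alpha = 0.1 and epsilon = 0.05; a real agent
    would report its average episode return instead.
    """
    noise = rng.gauss(0.0, 0.01)
    return -abs(meta["alpha"] - 0.1) - abs(meta["epsilon"] - 0.05) + noise


def select(population, scores, rng):
    """Assumed selection rule: keep the better half, refill with copies."""
    ranked = sorted(zip(scores, range(len(population))), reverse=True)
    keep = max(1, len(population) // 2)
    survivors = [population[i] for _, i in ranked[:keep]]
    new_population = [dict(s) for s in survivors]
    while len(new_population) < len(population):
        new_population.append(dict(rng.choice(survivors)))
    return new_population


def perturb(meta, rng, prob=0.5, rel_sigma=0.1):
    """With probability `prob`, add Gaussian noise to each meta-parameter."""
    out = {}
    for name, value in meta.items():
        if rng.random() < prob:
            value += rng.gauss(0.0, rel_sigma * abs(value) + 1e-6)
        # Clamp to a valid range for step sizes and exploration rates.
        out[name] = min(max(value, 1e-6), 1.0)
    return out


def ompac_sketch(n_instances=8, n_generations=30, seed=0):
    rng = random.Random(seed)
    # Instances start with small differences in their meta-parameters.
    population = [
        {
            "alpha": 0.5 + rng.uniform(-0.05, 0.05),
            "epsilon": 0.3 + rng.uniform(-0.05, 0.05),
        }
        for _ in range(n_instances)
    ]
    for _ in range(n_generations):
        # Evaluate every instance, select, then perturb before continuing.
        scores = [toy_performance(m, rng) for m in population]
        population = [perturb(m, rng) for m in select(population, scores, rng)]
    return population
```

Calling `ompac_sketch()` returns the final list of meta-parameter dictionaries; because the random generator is seeded, repeated calls with the same `seed` give the same result.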
ASlib: A Benchmark Library for Algorithm Selection
The task of algorithm selection involves choosing an algorithm from a set of
algorithms on a per-instance basis in order to exploit the varying performance
of algorithms over a set of instances. The algorithm selection problem is
attracting increasing attention from researchers and practitioners in AI. Years
of fruitful applications in a number of domains have resulted in a large amount
of data, but the community lacks a standard format or repository for this data.
This situation makes it difficult to share and compare different approaches
effectively, as is done in other, more established fields. It also
unnecessarily hinders new researchers who want to work in this area. To address
this problem, we introduce a standardized format for representing algorithm
selection scenarios and a repository that contains a growing number of data
sets from the literature. Our format has been designed to be able to express a
wide variety of different scenarios. Demonstrating the breadth and power of our
platform, we describe a set of example experiments that build and evaluate
algorithm selection models through a common interface. The results display the
potential of algorithm selection to achieve significant performance
improvements across a broad range of problems and algorithms.
Comment: Accepted to be published in Artificial Intelligence Journa
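To make the idea of per-instance algorithm selection concrete, the sketch below picks, for each new instance, the algorithm that was fastest on the most similar training instance, and compares it with always running the single best algorithm. Everything here is hypothetical: the feature vectors, runtimes, solver names, and the nearest-neighbour rule are illustrative assumptions and do not reflect the ASlib data format or any ASlib tooling.

```python
import math

# Hypothetical toy scenario: (instance features, runtime in seconds per algorithm).
TRAIN = [
    ((1.0, 0.2), {"solver_a": 2.0, "solver_b": 9.0}),
    ((0.9, 0.1), {"solver_a": 3.0, "solver_b": 8.0}),
    ((0.1, 0.9), {"solver_a": 10.0, "solver_b": 1.0}),
    ((0.2, 1.0), {"solver_a": 12.0, "solver_b": 2.0}),
]
TEST = [
    ((0.95, 0.15), {"solver_a": 2.5, "solver_b": 9.5}),
    ((0.15, 0.95), {"solver_a": 11.0, "solver_b": 1.5}),
]


def select_algorithm(features, train):
    """Return the algorithm that was fastest on the nearest training instance."""
    _, runtimes = min(train, key=lambda item: math.dist(features, item[0]))
    return min(runtimes, key=runtimes.get)


def single_best(train):
    """Return the algorithm with the lowest total runtime over all training instances."""
    totals = {}
    for _, runtimes in train:
        for name, seconds in runtimes.items():
            totals[name] = totals.get(name, 0.0) + seconds
    return min(totals, key=totals.get)


def total_runtime(choose, test):
    """Sum the runtime of whichever algorithm `choose` picks for each test instance."""
    return sum(runtimes[choose(features)] for features, runtimes in test)


per_instance = total_runtime(lambda f: select_algorithm(f, TRAIN), TEST)
baseline = total_runtime(lambda f: single_best(TRAIN), TEST)
print(per_instance, baseline)
```

On this toy data the per-instance selector spends 4.0 seconds in total, against 11.0 seconds for the single best solver, which is the kind of gap that selection exploits when algorithms have complementary strengths.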
Portfolio-based Planning: State of the Art, Common Practice and Open Challenges
In recent years the field of automated planning has significantly
advanced and several powerful domain-independent
planners have been developed. However, none of these systems
clearly outperforms all the others in every known
benchmark domain. This observation motivated the idea of
configuring and exploiting a portfolio of planners to perform
better than any individual planner: some recent planning systems
based on this idea achieved very good results in
experimental analyses and International Planning Competitions.
Such results suggest that future challenges of the
Automated Planning community will converge on designing
different approaches for combining existing planning algorithms.
This paper reviews existing techniques and provides an exhaustive
guide to portfolio-based planning. In addition, the
paper outlines open issues of existing approaches and highlights
possible future evolution of these techniques.
A biologically inspired meta-control navigation system for the Psikharpax rat robot
A biologically inspired navigation system for the mobile rat-like robot named Psikharpax is presented, allowing for self-localization and autonomous navigation in an initially unknown environment. The ability of parts of the model (e.g., the strategy selection mechanism) to reproduce rat behavioral data in various maze tasks has been validated before in simulations, but the capacity of the model to work on a real robot platform had not been tested. This paper presents our work on the implementation on the Psikharpax robot of two independent navigation strategies (a place-based planning strategy and a cue-guided taxon strategy) and a strategy selection meta-controller. We show how our robot can memorize which strategy was optimal in each situation by means of a reinforcement learning algorithm. Moreover, a context detector enables the controller to quickly adapt to changes in the environment (recognized as new contexts) and to restore previously acquired strategy preferences when a previously experienced context is recognized. This produces adaptivity closer to rat behavioral performance and constitutes a computational proposition of the role of the rat prefrontal cortex in strategy shifting. Moreover, such a brain-inspired meta-controller may provide an advancement for learning architectures in robotics.
Lifelong Neural Predictive Coding: Learning Cumulatively Online without Forgetting
In lifelong learning systems, especially those based on artificial neural
networks, one of the biggest obstacles is the severe inability to retain old
knowledge as new information is encountered. This phenomenon is known as
catastrophic forgetting. In this article, we propose a new kind of
connectionist architecture, the Sequential Neural Coding Network, that is
robust to forgetting when learning from streams of data points and, unlike
networks of today, does not learn via the immensely popular back-propagation of
errors. Grounded in the neurocognitive theory of predictive processing, our
model adapts its synapses in a biologically-plausible fashion, while another,
complementary neural system rapidly learns to direct and control this
cortex-like structure by mimicking the task-executive control functionality of
the basal ganglia. In our experiments, we demonstrate that our self-organizing
system experiences significantly less forgetting as compared to standard neural
models and outperforms a wide swath of previously proposed methods even though
it is trained across task datasets in a stream-like fashion. The promising
performance of our complementary system on benchmarks, e.g., SplitMNIST, Split
Fashion MNIST, and Split NotMNIST, offers evidence that by incorporating
mechanisms prominent in real neuronal systems, such as competition, sparse
activation patterns, and iterative input processing, a new possibility for
tackling the grand challenge of lifelong machine learning opens up.
Comment: Key updates including results on standard benchmarks, e.g., split
mnist/fmnist/not-mnist. Task selection/basal ganglia model has been
integrate
Cross-Device Tracking: Matching Devices and Cookies
The number of computers, tablets and smartphones is increasing rapidly, which
entails the ownership and use of multiple devices to perform online tasks. As
people move across devices to complete these tasks, their identities become
fragmented. Understanding the usage and transition between those devices is
essential to develop efficient applications in a multi-device world. In this
paper we present a solution to deal with the cross-device identification of
users based on semi-supervised machine learning methods to identify which
cookies belong to an individual using a device. The method proposed in this
paper scored third in the ICDM 2015 Drawbridge Cross-Device Connections
challenge, demonstrating its good performance.
Robotic Pick-and-Place of Novel Objects in Clutter with Multi-Affordance Grasping and Cross-Domain Image Matching
This paper presents a robotic pick-and-place system that is capable of
grasping and recognizing both known and novel objects in cluttered
environments. The key new feature of the system is that it handles a wide range
of object categories without needing any task-specific training data for novel
objects. To achieve this, it first uses a category-agnostic affordance
prediction algorithm to select and execute among four different grasping
primitive behaviors. It then recognizes picked objects with a cross-domain
image classification framework that matches observed images to product images.
Since product images are readily available for a wide range of objects (e.g.,
from the web), the system works out-of-the-box for novel objects without
requiring any additional training data. Exhaustive experimental results
demonstrate that our multi-affordance grasping achieves high success rates for
a wide variety of objects in clutter, and our recognition algorithm achieves
high accuracy for both known and novel grasped objects. The approach was part
of the MIT-Princeton Team system that took 1st place in the stowing task at the
2017 Amazon Robotics Challenge. All code, datasets, and pre-trained models are
available online at http://arc.cs.princeton.edu
Comment: Project webpage: http://arc.cs.princeton.edu Summary video:
https://youtu.be/6fG7zwGfIk
A Survey of Monte Carlo Tree Search Methods
Monte Carlo tree search (MCTS) is a recently proposed search method that combines the precision of tree search with the generality of random sampling. It has received considerable interest due to its spectacular success in the difficult problem of computer Go, but has also proved beneficial in a range of other domains. This paper is a survey of the literature to date, intended to provide a snapshot of the state of the art after the first five years of MCTS research. We outline the core algorithm's derivation, impart some structure on the many variations and enhancements that have been proposed, and summarize the results from the key game and nongame domains to which MCTS methods have been applied. A number of open research questions indicate that the field is ripe for future work.