GP-Localize: Persistent Mobile Robot Localization using Online Sparse Gaussian Process Observation Model
Central to robot exploration and mapping is the task of persistent
localization in environmental fields characterized by spatially correlated
measurements. This paper presents a Gaussian process localization (GP-Localize)
algorithm that, in contrast to existing works, can exploit the spatially
correlated field measurements taken during a robot's exploration (instead of
relying on prior training data) for efficiently and scalably learning the GP
observation model online through our proposed novel online sparse GP. As a
result, GP-Localize is capable of achieving constant time and memory (i.e.,
independent of the size of the data) per filtering step, which demonstrates the
practical feasibility of using GPs for persistent robot localization and
autonomy. Empirical evaluation via simulated experiments with real-world
datasets and a real robot experiment shows that GP-Localize outperforms
existing GP localization algorithms.
Comment: 28th AAAI Conference on Artificial Intelligence (AAAI 2014), Extended version with proofs, 10 pages
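To make the idea of a GP observation model concrete, the following is a minimal sketch of exact GP regression over field measurements, used to score how well a candidate robot location explains a new measurement. It is not the online sparse GP proposed in the paper: the squared-exponential kernel, the hyperparameter values, and the function names (sq_exp_kernel, gp_predict, measurement_log_likelihood) are all assumptions chosen for illustration, and the cost of this exact version grows with the number of past measurements.

```python
import numpy as np

def sq_exp_kernel(a, b, length_scale=1.0, signal_var=1.0):
    """Squared-exponential covariance between two sets of 2-D locations (assumed kernel)."""
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def gp_predict(train_x, train_y, query_x, noise_var=0.01):
    """Exact GP predictive mean and variance of a noisy measurement (zero prior mean)."""
    k_tt = sq_exp_kernel(train_x, train_x) + noise_var * np.eye(len(train_x))
    k_qt = sq_exp_kernel(query_x, train_x)
    chol = np.linalg.cholesky(k_tt)
    # alpha = (K + noise_var * I)^{-1} y, computed via two triangular solves
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, train_y))
    v = np.linalg.solve(chol, k_qt.T)
    mean = k_qt @ alpha
    var = sq_exp_kernel(query_x, query_x).diagonal() - np.sum(v**2, axis=0) + noise_var
    return mean, var

def measurement_log_likelihood(z, mean, var):
    """Gaussian log-likelihood of measurement z under the GP prediction at each query location."""
    return -0.5 * (np.log(2.0 * np.pi * var) + (z - mean) ** 2 / var)
```

In a Bayes or particle filter, each candidate location would be passed as a row of query_x, and the resulting log-likelihoods would weight the candidates against the measurement actually taken.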
Gaussian Process Planning with Lipschitz Continuous Reward Functions: Towards Unifying Bayesian Optimization, Active Learning, and Beyond
This paper presents a novel nonmyopic adaptive Gaussian process planning
(GPP) framework endowed with a general class of Lipschitz continuous reward
functions that can unify some active learning/sensing and Bayesian optimization
criteria and offer practitioners some flexibility to specify their desired
choices for defining new tasks/problems. In particular, it utilizes a
principled Bayesian sequential decision problem framework for jointly and
naturally optimizing the exploration-exploitation trade-off. In general, the
resulting induced GPP policy cannot be derived exactly due to an uncountable
set of candidate observations. A key contribution of our work here thus lies in
exploiting the Lipschitz continuity of the reward functions to solve for a
nonmyopic adaptive epsilon-optimal GPP (epsilon-GPP) policy. To plan in real
time, we further propose an asymptotically optimal, branch-and-bound anytime
variant of epsilon-GPP with a performance guarantee. We empirically demonstrate
the effectiveness of our epsilon-GPP policy and its anytime variant in Bayesian
optimization and an energy harvesting task.
Comment: 30th AAAI Conference on Artificial Intelligence (AAAI 2016), Extended version with proofs, 17 pages
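One way to see why Lipschitz continuity helps with an uncountable set of candidate observations is the elementary bound below. It is an illustrative sketch, not the paper's derivation: the symbols R (reward), L (Lipschitz constant), z (observation), z_i (representative observation), and \delta (covering radius) are assumptions introduced here.

```latex
% Assumption: R is Lipschitz continuous in the observation z with constant L.
\[
  |R(z) - R(z')| \le L\,|z - z'| \quad \text{for all } z, z'.
\]
% If every observation z in a bounded interval lies within \delta of some
% representative z_i from a finite set, then replacing z by z_i changes the
% reward by at most L\delta:
\[
  |z - z_i| \le \delta \;\Longrightarrow\; |R(z) - R(z_i)| \le L\,\delta .
\]
```

Under these assumptions, a finite set of representative observations approximates the reward over the interval to within a tolerance controlled by \delta, which is the kind of argument an epsilon-optimal guarantee can build on.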
Active Markov Information-Theoretic Path Planning for Robotic Environmental Sensing
Recent research in multi-robot exploration and mapping has focused on
sampling environmental fields, which are typically modeled using the Gaussian
process (GP). Existing information-theoretic exploration strategies for
learning GP-based environmental field maps adopt the non-Markovian problem
structure and consequently scale poorly with the length of history of
observations. Hence, it becomes computationally impractical to use these
strategies for in situ, real-time active sampling. To ease this computational
burden, this paper presents a Markov-based approach to efficient
information-theoretic path planning for active sampling of GP-based fields. We
analyze the time complexity of solving the Markov-based path planning problem,
and demonstrate analytically that it scales better than that of deriving the
non-Markovian strategies with increasing length of planning horizon. For a
class of exploration tasks called the transect sampling task, we provide
theoretical guarantees on the active sampling performance of our Markov-based
policy, from which ideal environmental field conditions and sampling task
settings can be established to limit its performance degradation due to
violation of the Markov assumption. Empirical evaluation on real-world
temperature and plankton density field data shows that our Markov-based policy
can generally achieve active sampling performance comparable to that of the
widely used non-Markovian greedy policies under less favorable realistic field
conditions and task settings while enjoying significant computational gain over
them.
Comment: 10th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2011), Extended version with proofs, 11 pages
The Assistive Multi-Armed Bandit
Learning preferences implicit in the choices humans make is a well-studied
problem in both economics and computer science. However, most work makes the
assumption that humans are acting (noisily) optimally with respect to their
preferences. Such approaches can fail when people are themselves learning about
what they want. In this work, we introduce the assistive multi-armed bandit,
where a robot assists a human playing a bandit task to maximize cumulative
reward. In this problem, the human does not know the reward function but can
learn it through the rewards received from arm pulls; the robot only observes
which arms the human pulls but not the reward associated with each pull. We
offer sufficient and necessary conditions for successfully assisting the human
in this framework. Surprisingly, better human performance in isolation does not
necessarily lead to better performance when assisted by the robot: a human
policy can do better by effectively communicating its observed rewards to the
robot. We conduct proof-of-concept experiments that support these results. We
see this work as contributing towards a theory behind algorithms for
human-robot interaction.
Comment: Accepted to HRI 201
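The information structure described above can be sketched in a few lines: a human learns arm rewards from their own pulls, while the robot sees only the sequence of arms pulled. This is a toy illustration rather than the paper's model. The epsilon-greedy human policy, the Bernoulli rewards, the "most frequently pulled arm" inference, and the names simulate_human and robot_guess_best_arm are all assumptions made here.

```python
import random

def simulate_human(arm_means, steps, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy human on a Bernoulli bandit (assumed human policy).

    Returns only the list of arms pulled, which is all the robot observes;
    the rewards stay private to the human.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms
    totals = [0.0] * n_arms
    pulls = []
    for _ in range(steps):
        untried = [a for a in range(n_arms) if counts[a] == 0]
        if untried:
            arm = untried[0]  # try every arm once before exploiting
        elif rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore uniformly at random
        else:
            arm = max(range(n_arms), key=lambda a: totals[a] / counts[a])
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        pulls.append(arm)
    return pulls

def robot_guess_best_arm(pulls, n_arms):
    """Robot-side inference from pulls alone: guess the most frequently pulled arm."""
    freq = [0] * n_arms
    for arm in pulls:
        freq[arm] += 1
    return max(range(n_arms), key=lambda a: freq[a])
```

A human whose pulls track their reward estimates closely leaks more information to the robot than one who explores heavily, which gives some intuition for why the pulls themselves can act as a communication channel.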