Direct policy search reinforcement learning based on particle filtering
We reveal a link between particle filtering methods and direct policy search reinforcement learning, and propose a novel reinforcement learning algorithm based heavily on ideas borrowed from particle filters. A major advantage of the proposed algorithm is its ability to perform global search in policy space and thus find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance with a global random sampling method and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm.
Simultaneous discovery of multiple alternative optimal policies by reinforcement learning
Conventional reinforcement learning algorithms for direct policy search are limited to finding only a single optimal policy. This is caused by their local-search nature, which allows them to converge only to a single local optimum in policy space and makes them heavily dependent on the policy initialization. In this paper, we propose a novel reinforcement learning algorithm for direct policy search that is capable of simultaneously finding multiple alternative optimal policies. The algorithm is based on particle filtering and performs global search in policy space, thereby eliminating the dependency on the policy initialization and enabling it to find the globally optimal policy. We validate the approach on one- and two-dimensional problems with multiple optima, and compare its performance with a global random sampling method and a state-of-the-art Expectation-Maximization-based reinforcement learning algorithm. © 2012 IEEE
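The abstract above gives no algorithmic detail, so the following is only a generic illustration of the underlying idea, not the authors' algorithm: policy parameters are treated as particles, weighted by their return, resampled, and perturbed with shrinking noise. The reward function, the function name `particle_policy_search`, and all parameter values are assumptions made up for this sketch.

```python
import numpy as np


def reward(theta):
    """Hypothetical 1-D return: global optimum near 2, local optimum near -1."""
    return np.exp(-(theta - 2.0) ** 2 / 0.1) + 0.6 * np.exp(-(theta + 1.0) ** 2 / 0.5)


def particle_policy_search(reward_fn, low, high, n_particles=200, n_iters=30,
                           noise0=0.5, decay=0.9, seed=0):
    """Illustrative particle-style global search over a scalar policy parameter."""
    rng = np.random.default_rng(seed)
    # Spread the initial particles over the whole parameter range (global search).
    particles = rng.uniform(low, high, n_particles)
    best_theta, best_reward = None, -np.inf
    noise = noise0
    for _ in range(n_iters):
        rewards = reward_fn(particles)
        # Remember the best parameter evaluated so far.
        i = int(np.argmax(rewards))
        if rewards[i] > best_reward:
            best_theta, best_reward = float(particles[i]), float(rewards[i])
        # Weight particles by reward (assumed non-negative) and resample.
        weights = rewards + 1e-12
        weights /= weights.sum()
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        # Perturb the survivors; the exploration noise shrinks every iteration.
        particles = np.clip(
            particles[idx] + noise * rng.standard_normal(n_particles), low, high
        )
        noise *= decay
    return best_theta, best_reward, particles
```

Calling `particle_policy_search(reward, -4.0, 4.0)` returns the best parameter found, its reward, and the final particle set. Because resampling is proportional to reward, particles can remain clustered around more than one optimum for a while, which is the intuition behind finding several alternative policies at once.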
Probabilistic movement modeling for intention inference in human-robot interaction.
Intention inference can be an essential step toward efficient human-robot interaction. For this purpose, we propose the Intention-Driven Dynamics Model (IDDM) to probabilistically model the generative process of movements that are directed by the intention. The IDDM allows the intention to be inferred from observed movements using Bayes' theorem. The IDDM simultaneously finds a latent state representation of noisy and high-dimensional observations, and models the intention-driven dynamics in the latent states. As most robotics applications are subject to real-time constraints, we develop an efficient online algorithm that allows for real-time intention inference. Two human-robot interaction scenarios, i.e., target prediction for robot table tennis and action recognition for interactive humanoid robots, are used to evaluate the performance of our inference algorithm. In both intention inference tasks, the proposed algorithm achieves substantial improvements over support vector machines and Gaussian processes.
Robust Filtering and Smoothing with Gaussian Processes
We propose a principled algorithm for robust Bayesian filtering and smoothing
in nonlinear stochastic dynamic systems when both the transition function and
the measurement function are described by non-parametric Gaussian process (GP)
models. GPs are gaining increasing importance in signal processing, machine
learning, robotics, and control for representing unknown system functions by
posterior probability distributions. This modern way of "system identification"
is more robust than finding point estimates of a parametric function
representation. In this article, we present a principled algorithm for robust
analytic smoothing in GP dynamic systems, which are increasingly used in
robotics and control. Our numerical evaluations demonstrate the robustness of
the proposed approach in situations where other state-of-the-art Gaussian
filters and smoothers can fail.
Comment: 7 pages, 1 figure, draft version of paper accepted at IEEE
Transactions on Automatic Control
Particle filter-based Gaussian process optimisation for parameter inference
We propose a novel method for maximum likelihood-based parameter inference in
nonlinear and/or non-Gaussian state space models. The method is an iterative
procedure with three steps. At each iteration a particle filter is used to
estimate the value of the log-likelihood function at the current parameter
iterate. Using these log-likelihood estimates, a surrogate objective function
is created by utilizing a Gaussian process model. Finally, we use a heuristic
procedure to obtain a revised parameter iterate, providing an automatic
trade-off between exploration and exploitation of the surrogate model. The
method is profiled on two state space models, with good performance in terms
of both accuracy and computational cost.
Comment: Accepted for publication in proceedings of the 19th World Congress of
the International Federation of Automatic Control (IFAC), Cape Town, South
Africa, August 2014. 6 pages, 4 figures
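The first of the three steps, estimating the log-likelihood at a given parameter value with a particle filter, can be sketched as follows. This is a minimal bootstrap particle filter for an assumed scalar linear-Gaussian model, not the paper's code; the model, the function names `simulate` and `pf_loglik`, and every parameter value are hypothetical. The Gaussian process surrogate and the heuristic parameter update are not shown.

```python
import numpy as np


def simulate(phi, sigma_v, sigma_e, n_steps, rng):
    """Simulate y_t = x_t + sigma_e*e_t with x_t = phi*x_{t-1} + sigma_v*v_t (assumed model)."""
    x = 0.0
    ys = np.empty(n_steps)
    for t in range(n_steps):
        x = phi * x + sigma_v * rng.standard_normal()
        ys[t] = x + sigma_e * rng.standard_normal()
    return ys


def pf_loglik(ys, phi, sigma_v, sigma_e, n_particles, rng):
    """Bootstrap particle filter estimate of log p(y_1:T | phi, sigma_v, sigma_e)."""
    particles = np.zeros(n_particles)
    loglik = 0.0
    for y in ys:
        # Propagate each particle through the state transition.
        particles = phi * particles + sigma_v * rng.standard_normal(n_particles)
        # Log observation density of y given each particle.
        logw = (-0.5 * np.log(2.0 * np.pi * sigma_e ** 2)
                - 0.5 * ((y - particles) / sigma_e) ** 2)
        # Accumulate the log of the average weight, shifted by the max for stability.
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())
        # Multinomial resampling proportional to the weights.
        idx = rng.choice(n_particles, size=n_particles, p=w / w.sum())
        particles = particles[idx]
    return loglik
```

In an iterative scheme like the one described above, noisy estimates such as `pf_loglik(ys, phi, ...)` at a handful of candidate `phi` values would be the data to which the surrogate model is fitted.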
Long-Term Visual Object Tracking Benchmark
We propose a new long video dataset (called Track Long and Prosper - TLP) and
benchmark for single object tracking. The dataset consists of 50 HD videos from
real world scenarios, encompassing a duration of over 400 minutes (676K
frames), making it more than 20-fold larger in average duration per sequence
and more than 8-fold larger in terms of total covered duration, as compared to
existing generic datasets for visual tracking. The proposed dataset paves the way
to suitably assess long-term tracking performance and train better deep
learning architectures (avoiding/reducing augmentation, which may not reflect
real-world behaviour). We benchmark the dataset on 17 state-of-the-art trackers
and rank them according to tracking accuracy and run-time speed. We further
present a thorough qualitative and quantitative evaluation highlighting the
importance of the long-term aspect of tracking. Our most interesting observations
are (a) existing short-sequence benchmarks fail to bring out the inherent
differences between tracking algorithms, which widen when tracking on long
sequences, and (b) the accuracy of trackers drops abruptly on challenging long
sequences, suggesting the potential need for research efforts in the direction
of long-term tracking.
Comment: ACCV 2018 (Oral)