Numerical comparison of least square-based finite-difference (LSFD) and radial basis function-based finite-difference (RBFFD) methods
DOI: 10.1016/j.camwa.2006.04.015. Computers and Mathematics with Applications, 51(8) SPEC. ISS., 1297-1310.
A Survey on Policy Search for Robotics
Policy search is a subfield of reinforcement learning that focuses on finding good parameters for a given policy parametrization. It is well suited to robotics because it can cope with high-dimensional state and action spaces, one of the main challenges in robot learning. We review recent successes of both model-free and model-based policy search in robot learning.
Model-free policy search is a general approach to learning policies from sampled trajectories. We classify model-free methods by their policy evaluation strategy, policy update strategy, and exploration strategy, and present a unified view of existing algorithms. Learning a policy is often easier than learning an accurate forward model, and, hence, model-free methods are more frequently used in practice. However, each sampled trajectory requires interacting with the robot, which can be time-consuming and challenging in practice. Model-based policy search addresses this problem by first learning a simulator of the robot's dynamics from data. Subsequently, the simulator generates trajectories that are used for policy learning. For both model-free and model-based policy search methods, we review their respective properties and their applicability to robotic systems.
* Both authors contributed equally.
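To make the model-free setting concrete, the sketch below (not taken from the survey) runs a REINFORCE-style likelihood-ratio policy gradient on a toy one-dimensional task; the Gaussian policy, the reward function, and all parameter values are illustrative assumptions, not the survey's algorithms.

```python
import numpy as np

def reinforce_sketch(n_iters=200, n_samples=20, seed=0):
    """Toy model-free policy search: REINFORCE on a 1-D bandit-style task.

    The policy is a Gaussian N(theta, sigma^2) over a scalar action; the
    reward, unknown to the learner, is -(a - 3)^2, so the optimal policy
    mean is 3. Each iteration samples actions (stand-ins for trajectories),
    evaluates their reward, and updates theta along the likelihood-ratio
    policy gradient estimated from the samples.
    """
    rng = np.random.default_rng(seed)
    theta, sigma, lr = 0.0, 1.0, 0.05
    for _ in range(n_iters):
        actions = theta + sigma * rng.standard_normal(n_samples)
        rewards = -(actions - 3.0) ** 2
        baseline = rewards.mean()  # simple variance-reduction baseline
        # Likelihood-ratio trick: grad log pi(a) = (a - theta) / sigma^2
        grad = np.mean((rewards - baseline) * (actions - theta) / sigma**2)
        theta += lr * grad
    return theta
```

The interaction cost mentioned above shows up directly here: every gradient step consumes `n_samples` fresh rollouts, which on a physical robot would each require running the hardware.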
High-Dimensional Reinforcement Learning with Human Feedback
State-of-the-art personal robots must perform complex manipulation tasks to be viable in real-world scenarios. However, many of these robots, like the PR2, use manipulators with many degrees of freedom. High degrees of freedom are desirable from a functionality standpoint, but they make the learning task more difficult by adding a high-dimensional state space. The problem is compounded in bimanual manipulation tasks. Our proposed approach is to scale existing reinforcement learning techniques to high-dimensional robot control problems.
We propose reducing the state space by using demonstrations to discover a representative low-dimensional manifold in which to learn. This allows the agent to converge quickly to a good policy. We call this Dimensionality-Reduced Reinforcement Learning (DRRL). However, when performing dimensionality reduction, sometimes important state information is lost. We extend this work by first learning in a single dimension, and then transferring that knowledge to a higher-dimensional space. By using our Iterative DRRL (IDRRL) framework with an existing learning algorithm, the agent converges quickly to a better policy by iterating to increasingly higher dimensions. IDRRL is robust to demonstration quality and can learn efficiently using few demonstrations.
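A minimal sketch of the manifold-fitting step described above, assuming a linear PCA subspace fit to demonstration states (the function names and shapes are hypothetical; the full DRRL/IDRRL learning machinery is not reproduced):

```python
import numpy as np

def fit_demo_manifold(demo_states, k):
    """Fit a k-dimensional linear manifold to demonstration states via PCA,
    computed as an SVD of the centered demonstration matrix.

    demo_states: array of shape (n_demos, D) of full robot states.
    Returns the state mean and the top-k principal directions (k, D).
    """
    mean = demo_states.mean(axis=0)
    _, _, vt = np.linalg.svd(demo_states - mean, full_matrices=False)
    basis = vt[:k]  # rows of Vh are principal directions, largest first
    return mean, basis

def project(state, mean, basis):
    """Map a full D-dimensional state into the learned k-dimensional space,
    where the RL agent would then run its usual learning algorithm."""
    return basis @ (state - mean)
```

The iterative variant would repeat this with increasing `k`, warm-starting each stage's policy from the lower-dimensional one.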
We use Principal Component Analysis (PCA) for our linear dimensionality reduction in DRRL and IDRRL. However, linear dimensionality reduction assumes that the underlying data can be represented by a lower dimension linear subspace. Robot state spaces typically include velocities and accelerations, whose equations of motion are inherently nonlinear. Standard linear dimensionality reduction techniques cannot accurately represent complex nonlinear structures. However, nonlinear dimensionality reduction techniques are too computationally complex to use online. To overcome these limitations, we introduce a novel approach to dimensionality reduction based on a system of cascading autoencoders (CAE), producing the new algorithm IDRRL-CAE.
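As a toy illustration of nonlinear reduction, the following sketch trains a one-hidden-layer tanh autoencoder by gradient descent on reconstruction error. It is a single illustrative stage under assumed hyperparameters; IDRRL-CAE's cascade of autoencoders is not reproduced here.

```python
import numpy as np

def train_tiny_autoencoder(X, k, epochs=800, lr=0.1, seed=0):
    """Minimal nonlinear autoencoder: encode D-dim rows of X into a k-dim
    tanh code, decode linearly, and minimize mean squared reconstruction
    error with plain gradient descent."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = 0.1 * rng.standard_normal((d, k)); b1 = np.zeros(k)
    W2 = 0.1 * rng.standard_normal((k, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)        # encode: nonlinear low-dim code
        Xhat = H @ W2 + b2              # decode: linear reconstruction
        E = Xhat - X                    # reconstruction error
        gW2 = H.T @ E / n; gb2 = E.mean(axis=0)
        dH = (E @ W2.T) * (1.0 - H**2)  # backprop through tanh
        gW1 = X.T @ dH / n; gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2
```

On data lying on a curve, such as states constrained to a parabola, the tanh code can capture structure a rank-k linear projection cannot, which is the motivation for moving beyond PCA above.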
Optimization is useful, but fast learning does not help if the objective function is deceptive or difficult to define mathematically. In many cases, roboticists cannot predict every scenario their robots may encounter, and thus cannot design an objective function for each case a priori. In these situations it can be helpful to incorporate human feedback. To give effective feedback, users need an interface that is intuitive, time-insensitive, and supports both fine-grained and coarse feedback.
To incorporate human feedback into learning, we use timeline interfaces. Timeline interfaces that let a user move backward and forward through a video have been used by video editors for years. They are simple and designed for both non-experts and video-editing experts. These interfaces allow a user to cut, concatenate, rewind, fast-forward, and perform many other operations on videos. They speed up editing by decoupling the timescale of the editing process from the timescale of the video being edited. The same concepts can be applied to human feedback mechanisms for robot control systems. Current human feedback mechanisms require the user to respond quickly to robot actions, work only in discrete spaces, or allow only coarse or only detailed feedback. The timeline interface paradigm naturally accommodates fine-grained state spaces, does not require quick human feedback, allows the user to make both coarse and fine-grained edits, and decouples the speed of the video from the speed of feedback. In this dissertation we present a proof-of-concept movie-reel interface that uses this timeline interface paradigm.