137,528 research outputs found
Stochastic Inverse Reinforcement Learning
The goal of the inverse reinforcement learning (IRL) problem is to recover
the reward functions from expert demonstrations. However, the IRL problem like
any ill-posed inverse problem suffers the congenital defect that the policy may
be optimal for many reward functions, and expert demonstrations may be optimal
for many policies. In this work, we generalize the IRL problem to a well-posed
expectation optimization problem stochastic inverse reinforcement learning
(SIRL) to recover the probability distribution over reward functions. We adopt
the Monte Carlo expectation-maximization (MCEM) method to estimate the
parameter of the probability distribution as the first solution to the SIRL
problem. The solution is succinct, robust, and transferable for a learning task
and can generate alternative solutions to the IRL problem. Through our
formulation, it is possible to observe the intrinsic property for the IRL
problem from a global viewpoint, and our approach achieves a considerable
performance on the objectworld.Comment: 8+2 pages, 5 figures, Under Revie
Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation
Imitation learning is an effective approach for autonomous systems to acquire
control policies when an explicit reward function is unavailable, using
supervision provided as demonstrations from an expert, typically a human
operator. However, standard imitation learning methods assume that the agent
receives examples of observation-action tuples that could be provided, for
instance, to a supervised learning algorithm. This stands in contrast to how
humans and animals imitate: we observe another person performing some behavior
and then figure out which actions will realize that behavior, compensating for
changes in viewpoint, surroundings, object positions and types, and other
factors. We term this kind of imitation learning "imitation-from-observation,"
and propose an imitation learning method based on video prediction with context
translation and deep reinforcement learning. This lifts the assumption in
imitation learning that the demonstration should consist of observations in the
same environment configuration, and enables a variety of interesting
applications, including learning robotic skills that involve tool use simply by
observing videos of human tool use. Our experimental results show the
effectiveness of our approach in learning a wide range of real-world robotic
tasks modeled after common household chores from videos of a human
demonstrator, including sweeping, ladling almonds, pushing objects as well as a
number of tasks in simulation.Comment: Accepted at ICRA 2018, Brisbane. YuXuan Liu and Abhishek Gupta had
equal contributio
CoRide: Joint Order Dispatching and Fleet Management for Multi-Scale Ride-Hailing Platforms
How to optimally dispatch orders to vehicles and how to tradeoff between
immediate and future returns are fundamental questions for a typical
ride-hailing platform. We model ride-hailing as a large-scale parallel ranking
problem and study the joint decision-making task of order dispatching and fleet
management in online ride-hailing platforms. This task brings unique challenges
in the following four aspects. First, to facilitate a huge number of vehicles
to act and learn efficiently and robustly, we treat each region cell as an
agent and build a multi-agent reinforcement learning framework. Second, to
coordinate the agents from different regions to achieve long-term benefits, we
leverage the geographical hierarchy of the region grids to perform hierarchical
reinforcement learning. Third, to deal with the heterogeneous and variant
action space for joint order dispatching and fleet management, we design the
action as the ranking weight vector to rank and select the specific order or
the fleet management destination in a unified formulation. Fourth, to achieve
the multi-scale ride-hailing platform, we conduct the decision-making process
in a hierarchical way where a multi-head attention mechanism is utilized to
incorporate the impacts of neighbor agents and capture the key agent in each
scale. The whole novel framework is named as CoRide. Extensive experiments
based on multiple cities real-world data as well as analytic synthetic data
demonstrate that CoRide provides superior performance in terms of platform
revenue and user experience in the task of city-wide hybrid order dispatching
and fleet management over strong baselines.Comment: CIKM 201
- …