CLIC: Curriculum Learning and Imitation for object Control in non-rewarding environments
In this paper we study a new reinforcement learning setting where the
environment is non-rewarding, contains several possibly related objects of
various controllability, and where an apt agent Bob acts independently, with
non-observable intentions. We argue that this setting defines a realistic
scenario and we present a generic discrete-state discrete-action model of such
environments. To learn in this environment, we propose an unsupervised
reinforcement learning agent called CLIC for Curriculum Learning and Imitation
for Control. CLIC learns to control individual objects in its environment, and
imitates Bob's interactions with these objects. It selects objects to focus on
when training and imitating by maximizing its learning progress. We show that
CLIC is an effective baseline in our new setting. It can effectively observe
Bob to gain control of objects faster, even if Bob is not explicitly teaching.
It can also follow Bob when he acts as a mentor and provides ordered
demonstrations. Finally, when Bob controls objects that the agent cannot, or in
presence of a hierarchy between objects in the environment, we show that CLIC
ignores non-reproducible and already mastered interactions with objects,
resulting in a greater benefit from imitation.
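The idea of choosing which object to practice on by maximizing learning progress can be illustrated with a minimal sketch. The abstract does not give CLIC's actual progress measure or sampling rule, so everything below is an assumption made for illustration: the class name `ProgressSelector`, the windowed success-rate difference used as a progress estimate, and the epsilon-greedy proportional sampling are all hypothetical.

```python
import random
from collections import deque


class ProgressSelector:
    """Hypothetical curriculum helper: favour objects whose success rate is changing."""

    def __init__(self, n_objects, window=10, epsilon=0.1, seed=0):
        self.window = window
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Keep the last 2 * window outcomes per object.
        self.history = [deque(maxlen=2 * window) for _ in range(n_objects)]

    def record(self, obj, success):
        """Store the outcome (truthy = success) of one control attempt on obj."""
        self.history[obj].append(float(success))

    def progress(self, obj):
        """Assumed measure: |recent success rate - older success rate|."""
        h = list(self.history[obj])
        if len(h) < 2 * self.window:
            return 0.0  # not enough data to estimate progress yet
        older = sum(h[:self.window]) / self.window
        recent = sum(h[self.window:]) / self.window
        return abs(recent - older)

    def sample(self):
        """Pick an object in proportion to its progress, with some exploration."""
        n = len(self.history)
        lp = [self.progress(i) for i in range(n)]
        if sum(lp) == 0.0 or self.rng.random() < self.epsilon:
            return self.rng.randrange(n)  # uniform exploration
        return self.rng.choices(range(n), weights=lp, k=1)[0]
```

Under this assumed measure, an object that is already mastered (constant success) and an object that cannot be controlled (constant failure) both have zero progress, so neither attracts training effort. That matches the behavior the abstract describes, although the paper's own mechanism may differ.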
Human-Machine Collaborative Optimization via Apprenticeship Scheduling
Coordinating agents to complete a set of tasks with intercoupled temporal and
resource constraints is computationally challenging, yet human domain experts
can solve these difficult scheduling problems using paradigms learned through
years of apprenticeship. A process for manually codifying this domain knowledge
within a computational framework is necessary to scale beyond the
"single-expert, single-trainee" apprenticeship model. However, human domain
experts often have difficulty describing their decision-making processes,
causing the codification of this knowledge to become laborious. We propose a
new approach for capturing domain-expert heuristics through a pairwise ranking
formulation. Our approach is model-free and does not require enumerating or
iterating through a large state space. We empirically demonstrate that this
approach accurately learns multifaceted heuristics on a synthetic data set
incorporating job-shop scheduling and vehicle routing problems, as well as on
two real-world data sets consisting of demonstrations of experts solving a
weapon-to-target assignment problem and a hospital resource allocation problem.
We also demonstrate that policies learned from human scheduling demonstration
via apprenticeship learning can substantially improve the efficiency of a
branch-and-bound search for an optimal schedule. We employ this human-machine
collaborative optimization technique on a variant of the weapon-to-target
assignment problem. We demonstrate that this technique generates solutions
substantially superior to those produced by human domain experts at a rate up
to 9.5 times faster than an optimization approach and can be applied to
optimally solve problems twice as complex as those solved by a human
demonstrator.
Comment: Portions of this paper were published in the Proceedings of the
International Joint Conference on Artificial Intelligence (IJCAI) in 2016 and
in the Proceedings of Robotics: Science and Systems (RSS) in 2016. The paper
consists of 50 pages with 11 figures and 4 tables.
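The pairwise ranking formulation mentioned in the abstract above can be sketched as follows. This is a toy illustration, not the authors' method: the abstract does not specify the features or the classifier, so the two features (deadline, duration), the perceptron learner, and the function names `pairwise_examples`, `train_perceptron`, and `pick` are all assumptions. The idea shown is that each expert decision is turned into comparisons between the task the expert scheduled and every task the expert passed over, and a scoring function is fit to those comparisons.

```python
def pairwise_examples(decisions):
    """Turn expert decisions into labelled feature-difference vectors.

    decisions: list of (candidates, chosen_index), where candidates is a
    list of feature vectors for the tasks available at that decision.
    """
    examples = []
    for candidates, chosen in decisions:
        best = candidates[chosen]
        for i, other in enumerate(candidates):
            if i == chosen:
                continue
            diff = [b - o for b, o in zip(best, other)]
            examples.append((diff, 1))                 # chosen ranks above other
            examples.append(([-d for d in diff], -1))  # other ranks below chosen
    return examples


def train_perceptron(examples, n_features, epochs=50):
    """Fit linear weights with a plain perceptron (an assumed, swapped-in learner)."""
    w = [0.0] * n_features
    for _ in range(epochs):
        mistakes = 0
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x))
            if y * score <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:
            break  # every comparison is ranked correctly
    return w


def pick(w, candidates):
    """Return the index of the candidate task with the highest learned score."""
    scores = [sum(wi * xi for wi, xi in zip(w, c)) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Hypothetical demonstrations: features are (deadline, duration) and the
# expert always schedules the task with the earliest deadline.
demos = [
    ([[5, 2], [3, 4], [9, 1]], 1),
    ([[2, 3], [7, 1]], 0),
    ([[8, 2], [4, 5], [6, 1]], 1),
]
weights = train_perceptron(pairwise_examples(demos), n_features=2)
```

Because the learner only compares tasks that were available at the same decision point, it never needs to enumerate the scheduling state space, which is the property the abstract highlights. Calling `pick(weights, candidates)` on a new set of candidate tasks then imitates the demonstrated preference.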
Imitation in location choice
Under the assumption of perfect competition, it is difficult to avoid the conclusion that abandoned properties and long undeveloped neighborhoods remain that way because they are unprofitable. In contrast, this paper introduces a model in which firms systematically overlook neighborhoods with little commercial activity because of a positive informational externality motivating later movers to condition choice of location on earlier movers' locations. When this occurs, firms sometimes find it profitable to imitate early movers' locations even though privately acquired information suggests locating elsewhere. The model facilitates normative analysis of imitation in location choice by explicitly quantifying losses in aggregate efficiency following a shift from centralized to decentralized regimes. The model provides a tool for investigating the hypothesis of inefficient lock-in as it relates to neighborhoods in U.S. urban centers that remain underutilized despite the presence of profitable business prospects.
Keywords: Imitation, Location, Ecological Rationality, Bounded Rationality, Lock-In, Neighborhood, Abandoned