Continual Driving Policy Optimization with Closed-Loop Individualized Curricula
The safety of autonomous vehicles (AV) has been a long-standing top concern,
stemming from the absence of rare and safety-critical scenarios in the
long-tail naturalistic driving distribution. To tackle this challenge, a surge
of research in scenario-based autonomous driving has emerged, with a focus on
generating high-risk driving scenarios and applying them to conduct
safety-critical testing of AV models. However, little work has explored
reusing these extensive scenarios to iteratively improve AV models.
Moreover, it remains intractable and challenging to filter through gigantic
scenario libraries collected from other AV models with distinct behaviors,
attempting to extract transferable information for current AV improvement.
Therefore, we develop a continual driving policy optimization framework
featuring Closed-Loop Individualized Curricula (CLIC), which we factorize into
a set of standardized sub-modules for flexible implementation choices: AV
Evaluation, Scenario Selection, and AV Training. CLIC frames AV Evaluation as a
collision prediction task, where it estimates the chance of AV failures in
these scenarios at each iteration. Subsequently, by re-sampling from historical
scenarios based on these failure probabilities, CLIC tailors individualized
curricula for downstream training, aligning them with the evaluated capability
of AV. Accordingly, CLIC not only maximizes the utilization of the vast
pre-collected scenario library for closed-loop driving policy optimization but
also facilitates AV improvement by individualizing its training with more
challenging cases out of those poorly organized scenarios. Experimental results
clearly indicate that CLIC surpasses other curriculum-based training
strategies, showing substantial improvement in managing risky scenarios, while
still maintaining proficiency in handling simpler cases.
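The re-sampling step described in this abstract can be illustrated with a minimal sketch. It assumes the failure probabilities have already been produced by some collision predictor; the function name `sample_curriculum`, the example probabilities, and the proportional weighting are illustrative assumptions, not the authors' implementation.

```python
import random


def sample_curriculum(failure_probs, k, seed=0):
    """Draw k scenario indices, weighting each by its predicted failure probability.

    Assumption: sampling is simply proportional to the predicted probability;
    the paper may use a different weighting.
    """
    rng = random.Random(seed)
    indices = list(range(len(failure_probs)))
    return rng.choices(indices, weights=failure_probs, k=k)


# Hypothetical predictions for four stored scenarios.
failure_probs = [0.05, 0.10, 0.80, 0.05]
curriculum = sample_curriculum(failure_probs, k=1000)
counts = [curriculum.count(i) for i in range(len(failure_probs))]
print(counts)  # the third scenario should dominate the curriculum
```

Because the weights are recomputed at every iteration, a scenario that the policy learns to handle would be drawn less often in later curricula.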
Automatic Curriculum Learning With Over-repetition Penalty for Dialogue Policy Learning
Dialogue policy learning based on reinforcement learning is difficult to
apply with real users when training dialogue agents from scratch because of the
high cost. User simulators, which choose random user goals for the dialogue agent to
train on, have been considered as an affordable substitute for real users.
However, this random sampling method ignores the law of human learning, making
the learned dialogue policy inefficient and unstable. We propose a novel
framework, Automatic Curriculum Learning-based Deep Q-Network (ACL-DQN), which
replaces the traditional random sampling method with a teacher policy model to
realize the dialogue policy for automatic curriculum learning. The teacher
model arranges a meaningful ordered curriculum and automatically adjusts it by
monitoring the learning progress of the dialogue agent and the over-repetition
penalty without any requirement of prior knowledge. The learning progress of
the dialogue agent reflects the relationship between the dialogue agent's
ability and the sampled goals' difficulty for sample efficiency. The
over-repetition penalty guarantees the sampled diversity. Experiments show that
ACL-DQN improves the effectiveness and stability of dialogue tasks by a
statistically significant margin. Furthermore, the framework can be further
improved by equipping it with different curriculum schedules, which
demonstrates its strong generalizability.
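One way to picture how learning progress and an over-repetition penalty could combine into a single teacher signal is the sketch below. The function `teacher_reward`, the linear penalty, and the goal names are hypothetical; the abstract does not specify the exact form used in ACL-DQN.

```python
from collections import Counter


def teacher_reward(success_before, success_after, goal, history, penalty=0.1):
    """Reward for the teacher after it proposes `goal` to the dialogue agent.

    Assumptions: learning progress is the change in success rate, and each
    earlier use of the same goal subtracts a fixed penalty.
    """
    progress = success_after - success_before
    repeats = Counter(history)[goal]
    return progress - penalty * repeats


# Hypothetical goals sampled so far.
history = ["book_ticket", "book_ticket", "find_movie"]
fresh = teacher_reward(0.4, 0.6, "find_restaurant", history)
repeated = teacher_reward(0.4, 0.6, "book_ticket", history)
print(fresh > repeated)  # a goal that was never sampled scores higher
```

With equal learning progress, the goal that has already been sampled twice receives the lower reward, which pushes the teacher toward more diverse goals.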
A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning
Across machine learning, the use of curricula has shown strong empirical
potential to improve learning from data by avoiding local optima of training
objectives. For reinforcement learning (RL), curricula are especially
interesting, as the underlying optimization has a strong tendency to get stuck
in local optima due to the exploration-exploitation trade-off. Recently, a
number of approaches for an automatic generation of curricula for RL have been
shown to increase performance while requiring less expert knowledge compared to
manually designed curricula. However, these approaches are seldom
investigated from a theoretical perspective, preventing a deeper understanding
of their mechanics. In this paper, we present an approach for automated
curriculum generation in RL with a clear theoretical underpinning. More
precisely, we formalize the well-known self-paced learning paradigm as inducing
a distribution over training tasks, which trades off between task complexity
and the objective to match a desired task distribution. Experiments show that
training on this induced distribution helps to avoid poor local optima across
RL algorithms in different tasks with uninformative rewards and challenging
exploration requirements.
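The trade-off described in this abstract can be written schematically as a regularized objective. This is a sketch under assumed notation, not necessarily the paper's exact formulation: p_ν(c) is the induced distribution over training tasks c, J(π, c) is the expected return of policy π on task c, μ(c) is the desired task distribution, and α ≥ 0 weights the penalty for deviating from it.

```latex
\max_{\nu} \;
\mathbb{E}_{c \sim p_{\nu}(c)} \big[ J(\pi, c) \big]
\; - \; \alpha \, D_{\mathrm{KL}}\big( p_{\nu}(c) \,\|\, \mu(c) \big)
```

Under this reading, a small α lets the distribution concentrate on tasks where the current policy already earns high return, while increasing α pulls it toward the desired distribution μ.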
Medical-based Deep Curriculum Learning for Improved Fracture Classification
Current deep-learning-based methods do not easily integrate into clinical protocols, nor do they take full advantage of medical knowledge. In this work, we propose and compare several strategies relying on curriculum learning to support the classification of proximal femur fracture from X-ray images, a challenging problem as reflected by existing intra- and inter-expert disagreement. Our strategies are derived from knowledge such as medical decision trees and inconsistencies in the annotations of multiple experts, which allows us to assign a degree of difficulty to each training sample. We demonstrate that if we start learning "easy" examples and move towards "hard" ones, the model can reach better performance, even with fewer data. The evaluation is performed on the classification of a clinical dataset of about 1000 X-ray images. Our results show that, compared to class-uniform and random strategies, the proposed medical knowledge-based curriculum performs up to 15% better in terms of accuracy, achieving the performance of experienced trauma surgeons. Keywords: Curriculum learning, multi-label classification, bone fractures, computer-aided diagnosis, medical decision trees
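The idea of scoring difficulty from inconsistent expert annotations and then ordering samples from easy to hard can be sketched as follows. The disagreement measure, the sample records, and the class labels are illustrative assumptions rather than the scoring used in the paper.

```python
def disagreement(labels):
    """Fraction of annotators whose label differs from the majority label."""
    majority = max(set(labels), key=labels.count)
    return 1 - labels.count(majority) / len(labels)


def easy_to_hard(samples):
    """Order samples by increasing annotator disagreement (assumed difficulty proxy)."""
    return sorted(samples, key=lambda s: disagreement(s["labels"]))


# Hypothetical X-ray records, each labelled by three experts.
samples = [
    {"id": "xray_a", "labels": ["A1", "A2", "A1"]},
    {"id": "xray_b", "labels": ["A1", "A1", "A1"]},
    {"id": "xray_c", "labels": ["A1", "A2", "B1"]},
]
ordered_ids = [s["id"] for s in easy_to_hard(samples)]
print(ordered_ids)
```

A training loop would then present the unanimous cases first and introduce the contested ones later.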
Curriculum Learning for Reinforcement Learning Domains: A Framework and Survey
Reinforcement learning (RL) is a popular paradigm for addressing sequential decision tasks in which the agent has only limited environmental feedback. Despite many advances over the past three decades, learning in many domains still requires a large amount of interaction with the environment, which can be prohibitively expensive in realistic scenarios. To address this problem, transfer learning has been applied to reinforcement learning such that experience gained in one task can be leveraged when starting to learn the next, harder task. More recently, several lines of research have explored how tasks, or data samples themselves, can be sequenced into a curriculum for the purpose of learning a problem that may otherwise be too difficult to learn from scratch. In this article, we present a framework for curriculum learning (CL) in reinforcement learning, and use it to survey and classify existing CL methods in terms of their assumptions, capabilities, and goals. Finally, we use our framework to find open problems and suggest directions for future RL curriculum learning research.
A Versatile Adaptive Curriculum Learning Framework for Task-oriented Dialogue Policy Learning
Training a deep reinforcement learning-based dialogue policy with brute-force random sampling is costly. A new training paradigm was proposed to improve learning performance and efficiency by incorporating curriculum learning. However, attempts in the field of dialogue policy are very limited due to the lack of reliable evaluation of difficulty scores of dialogue tasks and the high sensitivity to the mode of progression through dialogue tasks. In this paper, we present a novel versatile adaptive curriculum learning (VACL) framework, which represents a substantial step toward applying automatic curriculum learning on dialogue policy tasks. It supports evaluating the difficulty of dialogue tasks using only the learning experiences of the dialogue policy, as well as skip-level selection according to its learning needs to maximize learning efficiency. Moreover, an attractive feature of VACL is the construction of a generic, elastic global curriculum while training a good dialogue policy, which could guide different dialogue policy learning without extra effort on re-training. The superiority and versatility of VACL are validated on three public dialogue datasets.
Imitation Learning from Observation with Automatic Discount Scheduling
Humans often acquire new skills through observation and imitation. For
robotic agents, learning from the plethora of unlabeled video demonstration
data available on the Internet necessitates imitating the expert without access
to its action, presenting a challenge known as Imitation Learning from
Observations (ILfO). A common approach to tackle ILfO problems is to convert
them into inverse reinforcement learning problems, utilizing a proxy reward
computed from the agent's and the expert's observations. Nonetheless, we
identify that tasks characterized by a progress dependency property pose
significant challenges for such approaches; in these tasks, the agent needs to
initially learn the expert's preceding behaviors before mastering the
subsequent ones. Our investigation reveals that the main cause is that the
reward signals assigned to later steps hinder the learning of initial
behaviors. To address this challenge, we present a novel ILfO framework that
enables the agent to master earlier behaviors before advancing to later ones.
We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively
alters the discount factor in reinforcement learning during the training phase,
prioritizing earlier rewards initially and gradually engaging later rewards
only when the earlier behaviors have been mastered. Our experiments, conducted
on nine Meta-World tasks, demonstrate that our method significantly outperforms
state-of-the-art methods across all tasks, including those that are unsolvable
by them. Comment: Accepted by ICLR 202
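The scheduling idea in this abstract, where the discount factor starts low and rises as earlier behaviors are mastered, can be sketched as below. The linear interpolation, the parameter values, and the notion of "mastered steps" are assumptions made for illustration; the abstract does not state how ADS measures progress or updates the discount factor.

```python
def scheduled_discount(mastered_steps, horizon=100, gamma_min=0.9, gamma_max=0.99):
    """Map the number of expert steps the agent already follows to a discount factor.

    Assumption: a simple linear schedule clipped to [gamma_min, gamma_max].
    """
    frac = min(max(mastered_steps / horizon, 0.0), 1.0)
    return gamma_min + frac * (gamma_max - gamma_min)


# Hypothetical progress values over the course of training.
schedule = [scheduled_discount(s) for s in (0, 25, 50, 100)]
print(schedule)  # rises from gamma_min toward gamma_max
```

A lower discount factor shortens the effective horizon, so early in training the agent is optimized mostly for rewards on the first steps of the task.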
Explanation-Aware Experience Replay in Rule-Dense Environments
Human environments are often regulated by explicit and complex rulesets. Integrating Reinforcement Learning (RL) agents into such environments motivates the development of learning mechanisms that perform well in rule-dense and exception-ridden environments such as autonomous driving on regulated roads. In this letter, we propose a method for organising experience by means of partitioning the experience buffer into clusters labelled on a per-explanation basis. We present discrete and continuous navigation environments compatible with modular rulesets and 9 learning tasks. For environments with explainable rulesets, we convert rule-based explanations into case-based explanations by allocating state-transitions into clusters labelled with explanations. This allows us to sample experiences in a curricular and task-oriented manner, focusing on the rarity, importance, and meaning of events. We label this concept Explanation-Awareness (XA). We perform XA experience replay (XAER) with intra and inter-cluster prioritisation, and introduce XA-compatible versions of DQN, TD3, and SAC. Performance is consistently superior with XA versions of those algorithms, compared to traditional Prioritised Experience Replay baselines, indicating that explanation engineering can be used in lieu of reward engineering for environments with explainable features.
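Partitioning a replay buffer into explanation-labelled clusters and sampling in two stages can be sketched as follows. The class `ClusteredReplayBuffer`, the uniform choice between clusters, and the priority-weighted choice within a cluster are illustrative assumptions, not the XAER prioritisation scheme itself.

```python
import random
from collections import defaultdict


class ClusteredReplayBuffer:
    """Toy replay buffer that groups transitions by an explanation label."""

    def __init__(self, seed=0):
        self.clusters = defaultdict(list)
        self.rng = random.Random(seed)

    def add(self, explanation, transition, priority=1.0):
        self.clusters[explanation].append((priority, transition))

    def sample(self):
        # Inter-cluster step (assumed uniform), so rare explanations are not drowned out.
        label = self.rng.choice(list(self.clusters))
        items = self.clusters[label]
        # Intra-cluster step: pick a transition in proportion to its priority.
        weights = [priority for priority, _ in items]
        _, transition = self.rng.choices(items, weights=weights, k=1)[0]
        return label, transition


# Hypothetical labels: 99 ordinary transitions and a single rule violation.
buffer = ClusteredReplayBuffer()
for i in range(99):
    buffer.add("lane_kept", {"step": i})
buffer.add("ran_red_light", {"step": 99})
drawn = [buffer.sample()[0] for _ in range(200)]
rare_share = drawn.count("ran_red_light") / len(drawn)
print(rare_share)
```

Although the rare event makes up 1% of the stored transitions, it is drawn far more often than that under the two-stage scheme, which is the effect the clustering is meant to achieve.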