TSO: Curriculum Generation using continuous optimization
The training of deep learning models poses vast challenges, including
parameter tuning and the ordering of training data. Significant research has
been done in curriculum learning to optimize the sequence of training data.
Recent works have focused on using complex reinforcement learning techniques to
find the optimal data ordering strategy to maximize learning for a given
network. In this paper, we present a simple and efficient technique based on
continuous optimization. We call this new approach Training Sequence
Optimization (TSO). There are three critical components in our proposed
approach: (a) An encoder network maps/embeds training sequence into continuous
space. (b) A predictor network uses the continuous representation of a strategy
as input and predicts the accuracy for fixed network architecture. (c) A
decoder further maps a continuous representation of a strategy to the ordered
training dataset. The performance predictor and encoder enable us to perform
gradient-based optimization in the continuous space to find the embedding of
the optimal training-data ordering with potentially better accuracy.
Experiments show that our generated optimal curriculum strategy gains 2 AP
over the random strategy on the CIFAR-100 dataset and yields larger boosts
than state-of-the-art CL algorithms. We perform an ablation study varying the
architecture, dataset, and sample size, showcasing our approach's robustness.
Comment: 10 pages, along with all experiment detail
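The three components can be sketched as a minimal PyTorch loop: encode an ordering, run gradient ascent on the predicted accuracy in embedding space, then decode back to an ordering. All layer sizes, the rank-score encoding of a sequence, and the argsort-based decoding below are our own illustrative assumptions, not the paper's actual design.

```python
import torch
import torch.nn as nn

SEQ_LEN, EMB_DIM = 16, 8  # toy sizes, chosen for illustration only

# (a) encoder, (b) accuracy predictor, (c) decoder -- all hypothetical MLPs.
encoder = nn.Sequential(nn.Linear(SEQ_LEN, 32), nn.ReLU(), nn.Linear(32, EMB_DIM))
predictor = nn.Sequential(nn.Linear(EMB_DIM, 32), nn.ReLU(), nn.Linear(32, 1))
decoder = nn.Sequential(nn.Linear(EMB_DIM, 32), nn.ReLU(), nn.Linear(32, SEQ_LEN))

# Encode a random initial ordering as normalized rank scores.
order = torch.randperm(SEQ_LEN).float() / SEQ_LEN
z = encoder(order).detach().requires_grad_(True)  # continuous embedding

# Gradient-based optimization in the continuous space: ascend the
# predicted accuracy by descending its negation.
opt = torch.optim.Adam([z], lr=0.05)
for _ in range(50):
    opt.zero_grad()
    loss = -predictor(z).sum()
    loss.backward()
    opt.step()

# Decode the optimized embedding back to an ordering via argsort of scores.
new_order = torch.argsort(decoder(z))
```

The argsort step guarantees the decoded result is a valid permutation of the training sequence, which is one simple way to keep the decoder's continuous output consistent with a discrete ordering.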
Learning to Auto Weight: Entirely Data-driven and Highly Efficient Weighting Framework
Example weighting is an effective solution to the training-bias problem;
however, most previous methods are limited by human knowledge and require
laborious tuning of hyperparameters. In this paper, we
propose a novel example weighting framework called Learning to Auto Weight
(LAW). The proposed framework finds step-dependent weighting policies
adaptively, and can be jointly trained with target networks without any
assumptions or prior knowledge about the dataset. It consists of three key
components: Stage-based Searching Strategy (3SM) is adopted to shrink the huge
searching space in a complete training process; Duplicate Network Reward (DNR)
gives more accurate supervision by removing randomness during the searching
process; Full Data Update (FDU) further improves the updating efficiency.
Experimental results demonstrate the superiority of the weighting policies
explored by LAW over the standard training pipeline. Compared with baselines,
LAW finds better weighting schedules that achieve substantially higher
accuracy on both biased CIFAR and ImageNet.
Comment: Accepted by AAAI 202
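A step-dependent weighting policy of this kind can be sketched as a small network that maps each example's loss and the current training progress to a weight in (0, 1). The policy architecture and its input features below are assumptions for illustration; they are not the components (3SM, DNR, FDU) described in the abstract.

```python
import torch
import torch.nn as nn

# Hypothetical weighting policy: input = (per-example loss, training progress).
policy = nn.Sequential(nn.Linear(2, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

def weighted_loss(per_example_loss, step, total_steps):
    # Training progress makes the policy step-dependent: the same loss value
    # can receive different weights early vs. late in training.
    progress = torch.full_like(per_example_loss, step / total_steps)
    feats = torch.stack([per_example_loss.detach(), progress], dim=1)
    w = policy(feats).squeeze(1)          # one weight in (0, 1) per example
    return (w * per_example_loss).mean()  # reweighted training objective

losses = torch.rand(8)  # stand-in for a batch of per-example losses
loss = weighted_loss(losses, step=100, total_steps=1000)
```

Because the weights stay in (0, 1), the reweighted objective is bounded above by the unweighted mean loss, so the policy can only down-weight examples, never amplify them, in this sketch.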
Automatic Curriculum Learning For Deep RL: A Short Survey
Automatic Curriculum Learning (ACL) has become a cornerstone of recent
successes in Deep Reinforcement Learning (DRL). These methods shape the learning
trajectories of agents by challenging them with tasks adapted to their
capacities. In recent years, they have been used to improve sample efficiency
and asymptotic performance, to organize exploration, to encourage
generalization or to solve sparse reward problems, among others. The ambition
of this work is dual: 1) to present a compact and accessible introduction to
the Automatic Curriculum Learning literature and 2) to draw a bigger picture of
the current state of the art in ACL to encourage the cross-breeding of existing
concepts and the emergence of new ideas.
Comment: Accepted at IJCAI202
CASSL: Curriculum Accelerated Self-Supervised Learning
Recent self-supervised learning approaches focus on using a few thousand data
points to learn policies for high-level, low-dimensional action spaces.
However, scaling this framework to high-dimensional control requires either
scaling up the data collection efforts or using a clever sampling strategy for
training. We present a novel approach - Curriculum Accelerated Self-Supervised
Learning (CASSL) - to train policies that map visual information to high-level,
higher-dimensional action spaces. CASSL orders the sampling of training data
based on control dimensions: learning and sampling focus on a few control
parameters before the others. The right curriculum for learning
is suggested by variance-based global sensitivity analysis of the control
space. We apply our CASSL framework to learning how to grasp using an adaptive,
underactuated multi-fingered gripper, a challenging system to control. Our
experimental results indicate that CASSL provides significant improvement and
generalization compared to baseline methods such as staged curriculum learning
(8% increase) and complete end-to-end learning with random exploration (14%
improvement), tested on a set of novel objects.
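The variance-based sensitivity ranking that suggests the curriculum can be sketched as follows. The toy outcome function, the one-at-a-time resampling scheme, and all names below are our own assumptions, standing in for the real grasp-success signal and for a full Sobol-style analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

def grasp_outcome(params):
    # Toy stand-in for the grasp-success signal: dimension 2 matters most,
    # then 0, then 1 (coefficients chosen so the ranking is known).
    return 3.0 * params[:, 2] + 1.0 * params[:, 0] + 0.1 * params[:, 1]

def sensitivity_order(f, n_dims, n_samples=2000):
    """Rank control dimensions, most sensitive first."""
    base = rng.uniform(size=(n_samples, n_dims))
    scores = []
    for d in range(n_dims):
        varied = base.copy()
        varied[:, d] = rng.uniform(size=n_samples)  # resample one dimension
        # Variance of the output change attributes sensitivity to dim d.
        scores.append(np.var(f(varied) - f(base)))
    return np.argsort(scores)[::-1]

order = sensitivity_order(grasp_outcome, n_dims=3)
```

The resulting ranking is then used as the curriculum: training focuses on the most sensitive control dimensions first, fixing or coarsely sampling the rest.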
On Fast-Converged Deep Reinforcement Learning for Optimal Dispatch of Large-Scale Power Systems under Transient Security Constraints
Power system optimal dispatch with transient security constraints is commonly
represented as Transient Security-Constrained Optimal Power Flow (TSC-OPF).
Deep Reinforcement Learning (DRL)-based TSC-OPF trains efficient
decision-making agents that are adaptable to various scenarios and provide
solution results quickly. However, due to the high dimensionality of the state
and action spaces, as well as the non-smoothness of dynamic constraints,
existing DRL-based TSC-OPF methods face the significant challenge of sparse
rewards. To address this issue, a fast-converged DRL method for
TSC-OPF is proposed in this paper. The Markov Decision Process (MDP) modeling
of TSC-OPF is improved by reducing the observation space and smoothing the
reward design, thus facilitating agent training. An improved Deep Deterministic
Policy Gradient algorithm with Curriculum learning, Parallel exploration, and
Ensemble decision-making (DDPG-CPEn) is introduced to drastically enhance the
efficiency of agent training and the accuracy of decision-making. The
effectiveness, efficiency, and accuracy of the proposed method are demonstrated
through experiments in the IEEE 39-bus system and a practical 710-bus regional
power grid. The source code of the proposed method is made public on GitHub.
Comment: 10 pages, 11 figure
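The reward-smoothing idea can be illustrated with a minimal sketch: replace a pass/fail stability reward with one that decays continuously with the stability margin, so the agent receives a gradient even far from the feasible region. The rotor-angle threshold, the margin definition, and the tanh shaping below are our own illustrative assumptions, not the paper's actual reward design.

```python
import math

ANGLE_LIMIT = 180.0  # assumed rotor-angle deviation (degrees) deemed stable

def sparse_reward(max_angle_dev):
    # Sparse design: binary pass/fail on transient stability.
    return 1.0 if max_angle_dev < ANGLE_LIMIT else -1.0

def smooth_reward(max_angle_dev):
    # Smoothed design: reward varies continuously with the stability margin,
    # so nearby actions receive distinguishable feedback.
    margin = (ANGLE_LIMIT - max_angle_dev) / ANGLE_LIMIT
    return math.tanh(margin)
```

Under the sparse design, two unstable dispatches look identical to the agent; under the smoothed design, the one closer to the stability boundary scores higher, which is what eases training.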