Danger-aware Adaptive Composition of DRL Agents for Self-navigation
Self-navigation, referred to as the capability of automatically reaching the
goal while avoiding collisions with obstacles, is a fundamental skill required
for mobile robots. Recently, deep reinforcement learning (DRL) has shown great
potential in the development of robot navigation algorithms. However, it is
still difficult to train the robot to learn goal-reaching and
obstacle-avoidance skills simultaneously. On the other hand, although many
DRL-based obstacle-avoidance algorithms have been proposed, few of them are reused
for more complex navigation tasks. In this paper, a novel danger-aware adaptive
composition (DAAC) framework is proposed to combine two individually
DRL-trained agents, obstacle-avoidance and goal-reaching, to construct a
navigation agent without any redesign or retraining. The key to this
adaptive composition approach is that the value function output by the
obstacle-avoidance agent serves as an indicator for evaluating the risk level
of the current situation, which in turn determines the contribution of these
two agents for the next move. Simulation and real-world testing results show
that the composed navigation network can control the robot to accomplish
difficult navigation tasks, e.g., reaching a series of successive goals in an
unknown and complex environment safely and quickly. Comment: 7 pages, 9 figures
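The composition idea in this abstract can be illustrated with a minimal sketch. The abstract does not give the actual mapping from value to contribution, so the linear, clamped mapping below is an assumption, as are the names `danger_weight`, `compose_action`, `v_safe`, and `v_danger`. The only point shown is how a low value estimate from the obstacle-avoidance agent could shift control toward that agent.

```python
def danger_weight(avoid_value, v_safe=0.0, v_danger=-1.0):
    """Map the obstacle-avoidance value estimate to a weight in [0, 1].

    Assumption: a low value means high risk, and the mapping is linear
    between the hypothetical thresholds v_safe and v_danger.
    """
    w = (v_safe - avoid_value) / (v_safe - v_danger)
    return min(1.0, max(0.0, w))


def compose_action(avoid_action, goal_action, avoid_value):
    """Blend the two agents' actions according to the estimated danger."""
    w = danger_weight(avoid_value)
    return [w * a + (1.0 - w) * g for a, g in zip(avoid_action, goal_action)]


# Hypothetical two-dimensional actions, e.g. (linear, angular) velocity.
blended = compose_action([1.0, 0.0], [0.0, 1.0], avoid_value=-0.5)
```

With these illustrative thresholds, a value at or above `v_safe` hands full control to the goal-reaching agent, and a value at or below `v_danger` hands it to the obstacle-avoidance agent.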
Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks
An online resource scheduling framework is proposed for minimizing the sum of weighted task latency for all the Internet-of-Things (IoT) users, by optimizing offloading decision, transmission power, and resource allocation in the large-scale mobile-edge computing (MEC) system. Toward this end, a deep reinforcement learning (DRL)-based solution is proposed, which includes the following components. First, a related and regularized stacked autoencoder (2r-SAE) with unsupervised learning is applied to perform data compression and representation for high-dimensional channel quality information (CQI) data, which can reduce the state space for DRL. Second, we present an adaptive simulated annealing approach (ASA) as the action search method of DRL, in which an adaptive h-mutation is used to guide the search direction and an adaptive iteration is proposed to enhance the search efficiency during the DRL process. Third, a preserved and prioritized experience replay (2p-ER) is introduced to help the DRL train the policy network and find the optimal offloading policy. Numerical results demonstrate that the proposed algorithm can achieve near-optimal performance while significantly decreasing the computational time compared with existing benchmarks.
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly efficient policies
automatically. Decima uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load.
Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection
Speech recognition systems have achieved high recognition performance for
several tasks. However, the performance of such systems is dependent on the
tremendously costly development work of preparing vast amounts of task-matched
transcribed speech data for supervised training. The key problem here is the
cost of transcribing speech data. This cost is incurred again for each
new language and each new task. Assuming broad network services for transcribing
speech data for many users, a system would become more self-sufficient and more
useful if it possessed the ability to learn from very light feedback from the
users without annoying them. In this paper, we propose a general reinforcement
learning framework for speech recognition systems based on the policy gradient
method. As a particular instance of the framework, we also propose a hypothesis
selection-based reinforcement learning method. The proposed framework provides
a new view for several existing training and adaptation methods. The
experimental results show that the proposed method improves the recognition
performance compared to unsupervised adaptation. Comment: 5 pages, 6 figures
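As a rough illustration of how a policy gradient could be applied to hypothesis selection, the sketch below uses the generic REINFORCE estimator over a softmax on hypothesis scores. This is not taken from the paper: the abstract gives no formulas, and the names `softmax` and `reinforce_grad`, the scalar reward standing in for light user feedback, and the score-based policy are all assumptions.

```python
import math


def softmax(scores):
    """Turn hypothesis scores into selection probabilities."""
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(scores, chosen, reward):
    """Gradient of reward * log p(chosen) with respect to each score."""
    probs = softmax(scores)
    return [
        reward * ((1.0 if i == chosen else 0.0) - p)
        for i, p in enumerate(probs)
    ]


# Two hypothetical recognition hypotheses with equal scores; the user
# feedback rewards the first one.
grad = reinforce_grad([0.0, 0.0], chosen=0, reward=1.0)
```

A positive reward raises the score of the selected hypothesis and lowers the others, and a negative reward does the opposite.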