
    Danger-aware Adaptive Composition of DRL Agents for Self-navigation

    Self-navigation, referred to as the capability of automatically reaching a goal while avoiding collisions with obstacles, is a fundamental skill required for mobile robots. Recently, deep reinforcement learning (DRL) has shown great potential in the development of robot navigation algorithms. However, it is still difficult to train a robot to learn goal-reaching and obstacle-avoidance skills simultaneously. On the other hand, although many DRL-based obstacle-avoidance algorithms have been proposed, few of them can be reused for more complex navigation tasks. In this paper, a novel danger-aware adaptive composition (DAAC) framework is proposed to combine two individually DRL-trained agents, one for obstacle avoidance and one for goal reaching, into a navigation agent without any redesign or retraining. The key to this adaptive composition approach is that the value function output by the obstacle-avoidance agent serves as an indicator of the risk level of the current situation, which in turn determines the contribution of the two agents to the next move. Simulation and real-world testing results show that the composed navigation network can control the robot to accomplish difficult navigation tasks, e.g., reaching a series of successive goals in an unknown and complex environment safely and quickly. Comment: 7 pages, 9 figures
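    To make the composition rule concrete, here is a minimal sketch of how a danger weight could be derived from the obstacle-avoidance agent's value estimate and used to blend the two agents' actions. The linear mapping and the thresholds `v_safe` and `v_danger` are illustrative assumptions, not the paper's exact formulation:

    ```python
    import numpy as np

    def compose_action(v_oa, a_oa, a_gr, v_safe=0.0, v_danger=-1.0):
        """Blend obstacle-avoidance and goal-reaching actions by danger level.

        v_oa     : value predicted by the obstacle-avoidance agent for the
                   current state (lower value => collision more likely).
        a_oa     : action proposed by the obstacle-avoidance agent.
        a_gr     : action proposed by the goal-reaching agent.
        v_safe   : value at or above which the state counts as fully safe.
        v_danger : value at or below which the state counts as fully dangerous.
        """
        # Map the value estimate to a danger weight in [0, 1].
        w = np.clip((v_safe - v_oa) / (v_safe - v_danger), 0.0, 1.0)
        # High danger -> follow the obstacle-avoidance agent;
        # low danger  -> follow the goal-reaching agent.
        return w * np.asarray(a_oa) + (1.0 - w) * np.asarray(a_gr)
    ```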

    Stacked Auto Encoder Based Deep Reinforcement Learning for Online Resource Scheduling in Large-Scale MEC Networks

    An online resource scheduling framework is proposed for minimizing the sum of weighted task latencies of all Internet-of-Things (IoT) users, by optimizing offloading decisions, transmission power, and resource allocation in a large-scale mobile-edge computing (MEC) system. Toward this end, a deep reinforcement learning (DRL)-based solution is proposed, which includes the following components. First, a related and regularized stacked autoencoder (2r-SAE) with unsupervised learning is applied to perform data compression and representation for the high-dimensional channel quality information (CQI) data, which reduces the state space for DRL. Second, we present an adaptive simulated annealing approach (ASA) as the action search method of DRL, in which an adaptive h-mutation is used to guide the search direction and an adaptive iteration scheme is proposed to enhance search efficiency during the DRL process. Third, a preserved and prioritized experience replay (2p-ER) is introduced to help the DRL agent train the policy network and find the optimal offloading policy. Numerical results demonstrate that the proposed algorithm achieves near-optimal performance while significantly decreasing the computational time compared with existing benchmarks.
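    As a rough illustration of the state-compression step, the sketch below trains a plain stacked autoencoder on CQI vectors with a reconstruction loss; the encoder's code vector then serves as the reduced DRL state. The layer sizes are assumptions, and the paper's relation and regularization ("2r") terms are only approximated here by the optimizer's L2 weight penalty:

    ```python
    import torch
    import torch.nn as nn

    class StackedAutoencoder(nn.Module):
        """Compress high-dimensional CQI vectors into a compact DRL state."""
        def __init__(self, cqi_dim=1024, code_dim=32):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(cqi_dim, 256), nn.ReLU(),
                nn.Linear(256, 64), nn.ReLU(),
                nn.Linear(64, code_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(code_dim, 64), nn.ReLU(),
                nn.Linear(64, 256), nn.ReLU(),
                nn.Linear(256, cqi_dim),
            )

        def forward(self, x):
            code = self.encoder(x)
            return code, self.decoder(code)

    # One unsupervised pre-training step on a batch of (synthetic) CQI samples.
    model = StackedAutoencoder()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
    cqi = torch.randn(128, 1024)                 # stand-in for real CQI data
    code, recon = model(cqi)
    loss = nn.functional.mse_loss(recon, cqi)    # reconstruction objective
    opt.zero_grad(); loss.backward(); opt.step()
    # `code` (128 x 32) is the low-dimensional state later fed to the DRL agent.
    ```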

    Learning Scheduling Algorithms for Data Processing Clusters

    Efficiently scheduling data processing jobs on distributed compute clusters requires complex algorithms. Current systems, however, use simple generalized heuristics and ignore workload characteristics, since developing and tuning a scheduling policy for each workload is infeasible. In this paper, we show that modern machine learning techniques can generate highly efficient policies automatically. Our system, Decima, uses reinforcement learning (RL) and neural networks to learn workload-specific scheduling algorithms without any human instruction beyond a high-level objective such as minimizing average job completion time. Off-the-shelf RL techniques, however, cannot handle the complexity and scale of the scheduling problem. To build Decima, we had to develop new representations for jobs' dependency graphs, design scalable RL models, and invent RL training methods for dealing with continuous stochastic job arrivals. Our prototype integration with Spark on a 25-node cluster shows that Decima improves average job completion time over hand-tuned scheduling heuristics by at least 21%, achieving up to a 2x improvement during periods of high cluster load.
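    Schematically, the scheduling agent can be cast as a policy that scores each runnable stage and samples one, trained with a policy gradient. The sketch below uses flat per-stage features and a plain REINFORCE update; Decima itself embeds entire job DAGs with graph neural networks and uses variance-reduction techniques for continuous job arrivals, so the feature dimension and reward definition here are illustrative assumptions:

    ```python
    import torch
    import torch.nn as nn

    class SchedulerPolicy(nn.Module):
        """Score runnable stages and pick one to schedule next."""
        def __init__(self, feat_dim=8):
            super().__init__()
            self.score = nn.Sequential(
                nn.Linear(feat_dim, 32), nn.ReLU(), nn.Linear(32, 1)
            )

        def forward(self, stage_feats):           # (num_stages, feat_dim)
            logits = self.score(stage_feats).squeeze(-1)
            return torch.distributions.Categorical(logits=logits)

    policy = SchedulerPolicy()
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

    stage_feats = torch.randn(5, 8)               # 5 runnable stages (stand-in)
    dist = policy(stage_feats)
    action = dist.sample()                        # stage scheduled at this event
    reward = -1.0                                 # e.g., negative accumulated job time
    loss = -dist.log_prob(action) * reward        # REINFORCE gradient estimator
    opt.zero_grad(); loss.backward(); opt.step()
    ```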

    Reinforcement Learning of Speech Recognition System Based on Policy Gradient and Hypothesis Selection

    Speech recognition systems have achieved high recognition performance on several tasks. However, this performance depends on tremendously costly development work: preparing vast amounts of task-matched transcribed speech data for supervised training. The key problem is the cost of transcribing speech data, which recurs for every new language and every new task. In a broad network service that transcribes speech data for many users, a system would become more self-sufficient and more useful if it could learn from very light user feedback without annoying the users. In this paper, we propose a general reinforcement learning framework for speech recognition systems based on the policy gradient method. As a particular instance of the framework, we also propose a hypothesis selection-based reinforcement learning method. The proposed framework provides a new view of several existing training and adaptation methods. Experimental results show that the proposed method improves recognition performance compared to unsupervised adaptation. Comment: 5 pages, 6 figures
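    A minimal sketch of the hypothesis-selection idea: a softmax policy over an N-best list, updated by REINFORCE from light user feedback. The two-feature linear parameterization and the +1 acceptance reward are illustrative assumptions, not the paper's exact setup:

    ```python
    import numpy as np

    def softmax(x):
        z = np.exp(x - x.max())
        return z / z.sum()

    rng = np.random.default_rng(0)
    theta = np.zeros(2)                  # weights for e.g. [acoustic, LM] scores
    feats = rng.normal(size=(4, 2))      # N-best list: 4 hypotheses, 2 scores each
    lr = 0.1

    probs = softmax(feats @ theta)       # selection policy over the N-best list
    k = rng.choice(len(probs), p=probs)  # hypothesis presented to the user
    reward = 1.0                         # light user feedback (+1 = accepted)

    # REINFORCE step: grad log pi(k) = feats[k] - E_pi[feats]
    theta += lr * reward * (feats[k] - probs @ feats)
    ```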