
    Deep reinforcement learning with robust deep deterministic policy gradient

    Deep Deterministic Policy Gradient (DDPG) is a popular deep reinforcement learning algorithm applied to continuous control problems such as autonomous driving and robotics. Although DDPG can produce very good results, it has drawbacks: it can become unstable and is heavily dependent on finding the correct hyperparameters for the task at hand. The DDPG algorithm also risks overestimating the Q values in the critic (value) network, and the accumulation of estimation errors over time can cause the agent to fall into a local optimum or suffer from catastrophic forgetting. Twin Delayed DDPG (TD3) mitigates the overestimation bias but may not reach full performance due to underestimation bias. In this paper, Twin Average Delayed DDPG (TAD3) is proposed as a specific adaptation of TD3, and the resulting algorithm is shown to perform better than TD3 in a challenging continuous control environment.
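
    To make the bias trade-off concrete, here is a minimal sketch of how TD3's clipped double-Q target differs from an averaged-critic target of the kind the abstract suggests TAD3 uses. The network shapes, the mixing coefficient beta, and the averaging rule are illustrative assumptions, not the paper's actual implementation.

    # Sketch: TD3 min-target vs. a hypothetical averaged target (assumptions noted inline).
    import torch
    import torch.nn as nn

    state_dim, action_dim = 8, 2

    # Two independent critics, as in TD3 (illustrative sizes).
    def make_critic():
        return nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    critic1, critic2 = make_critic(), make_critic()

    def q_values(state, action):
        x = torch.cat([state, action], dim=-1)
        return critic1(x), critic2(x)

    state = torch.randn(32, state_dim)
    next_action = torch.randn(32, action_dim)  # stand-in for target policy output + clipped noise
    reward = torch.randn(32, 1)
    gamma = 0.99

    with torch.no_grad():
        q1, q2 = q_values(state, next_action)
        # TD3: take the minimum of the twin critics; this suppresses
        # overestimation but can introduce underestimation bias.
        td3_target = reward + gamma * torch.min(q1, q2)
        # Averaged variant (assumption): blend the min with the mean of the
        # twin critics to trade off the two biases. beta is hypothetical.
        beta = 0.5
        tad3_target = reward + gamma * (beta * torch.min(q1, q2)
                                        + (1 - beta) * 0.5 * (q1 + q2))

    Blending toward the mean raises the target when the min is overly pessimistic, which is one plausible way an averaged variant could recover performance lost to underestimation.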

    Deep reinforcement learning online offloading for SWIPT multiple access edge computing network

    Computation-intensive, low-latency applications are emerging rapidly, yet they are constrained by the limited computing power and battery life of Internet of Things (IoT) devices. Simultaneous wireless information and power transfer (SWIPT) combined with mobile-edge computing (MEC) can improve the data processing capability of energy-constrained networks. In this paper, a SWIPT-based MEC system is proposed, comprising a multi-antenna access point (AP), multiple single-antenna low-power IoT devices, and a MEC server. The IoT devices exploit the harvested energy either to compute tasks locally or to offload them to the MEC server. Conventional numerical optimization methods cannot solve such combinatorial problems within the wireless channel coherence time, so Online Offloading with Deep Reinforcement Learning (OODRL) is proposed. The proposed algorithm jointly optimizes the offloading decisions, the time slots devoted to energy harvesting (EH), and local computation/offloading to maximize the MEC computation rate. A deep Q network (DQN) is used to learn the binary offloading decisions from experience, so combinatorial problems no longer need to be solved explicitly. Simulation results demonstrate that the proposed algorithm approaches near-optimal performance and outperforms existing optimization methods in reducing task computation time, making real-time optimal resource allocation and offloading achievable in a fast-fading wireless environment.
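
    The key idea of learning binary offloading decisions rather than searching them combinatorially can be sketched as follows: a network maps the current channel gains to a per-device offload/local-compute decision in one forward pass. The architecture, the threshold quantization, and all names below are illustrative assumptions, not the OODRL algorithm itself.

    # Sketch: mapping channel gains to binary offloading decisions (assumptions noted inline).
    import torch
    import torch.nn as nn

    num_devices = 4

    # Hypothetical policy network: channel gains in, relaxed decisions out.
    policy = nn.Sequential(
        nn.Linear(num_devices, 64), nn.ReLU(),
        nn.Linear(64, num_devices), nn.Sigmoid(),
    )

    def offload_decisions(channel_gains: torch.Tensor) -> torch.Tensor:
        """Return a binary vector: 1 = offload to the MEC server, 0 = compute locally."""
        with torch.no_grad():
            relaxed = policy(channel_gains)
        # Simple threshold quantization; the paper may use a richer scheme.
        return (relaxed > 0.5).float()

    gains = torch.rand(num_devices)       # fresh channel realization
    decisions = offload_decisions(gains)  # must finish within the coherence time
    print(decisions)

    A single forward pass and quantization step replaces a search over 2^N decision vectors, which is what makes decisions feasible within the channel coherence time.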