
    Deep Ordinal Reinforcement Learning

    Full text link
    Reinforcement learning usually makes use of numerical rewards, which have nice properties but also come with drawbacks and difficulties. Using rewards on an ordinal scale (ordinal rewards) is an alternative to numerical rewards that has received more attention in recent years. In this paper, a general approach to adapting reinforcement learning problems to the use of ordinal rewards is presented and motivated. We show how to convert common reinforcement learning algorithms to an ordinal variation, using Q-learning as an example, and introduce Ordinal Deep Q-Networks, which adapt deep reinforcement learning to ordinal rewards. Additionally, we run evaluations on problems provided by the OpenAI Gym framework, showing that our ordinal variants perform comparably to the numerical variants on a number of problems. We also give first evidence that our ordinal variant is able to produce better results for problems with less engineered and simpler-to-design reward signals. Comment: replaced figures for better visibility, added GitHub repository, more details about the source of experimental results, updated target value calculation for standard and ordinal Deep Q-Networks.
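
    The core change described in this abstract, replacing scalar rewards with rewards on an ordinal scale, can be illustrated with a small sketch. The following is a minimal, hypothetical ordinal Q-learning variant, not the authors' exact formulation: each state-action pair keeps a distribution over K ordinal reward ranks instead of a single Q-value, the distribution is updated with a Q-learning-style bootstrapped target, and actions are ranked by their expected ordinal rank. All sizes, hyperparameters, and the scoring rule are assumptions made for illustration.

```python
import numpy as np

# Illustrative ordinal Q-learning variant (not the paper's exact algorithm):
# each state-action pair keeps a distribution over K ordinal reward ranks,
# updated with a Q-learning-style bootstrapped target; actions are compared
# by expected rank instead of by a scalar Q-value.

N_STATES, N_ACTIONS, N_RANKS = 16, 4, 3      # hypothetical problem sizes
ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1       # assumed hyperparameters

# dist[s, a, k]: estimated probability of observing ordinal rank k
dist = np.full((N_STATES, N_ACTIONS, N_RANKS), 1.0 / N_RANKS)

def expected_rank(s, a):
    """Score an action by the expected ordinal rank of its distribution."""
    return float(np.dot(dist[s, a], np.arange(N_RANKS)))

def select_action(s, rng):
    """Epsilon-greedy selection over the ordinal scores."""
    if rng.random() < EPSILON:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax([expected_rank(s, a) for a in range(N_ACTIONS)]))

def update(s, a, rank, s_next):
    """Blend the observed rank with the bootstrapped distribution of the
    greedy next action, then move dist[s, a] toward that target."""
    observed = np.zeros(N_RANKS)
    observed[rank] = 1.0
    a_next = int(np.argmax([expected_rank(s_next, b) for b in range(N_ACTIONS)]))
    target = (1.0 - GAMMA) * observed + GAMMA * dist[s_next, a_next]
    dist[s, a] += ALPHA * (target - dist[s, a])
    dist[s, a] /= dist[s, a].sum()            # renormalise to a valid distribution
```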

    Federated Learning Assisted Deep Q-Learning for Joint Task Offloading and Fronthaul Segment Routing in Open RAN

    Full text link
    Offloading computation-intensive tasks to edge clouds has become an efficient way to support resource-constrained edge devices. However, task offloading delay is an issue, largely due to the limited capacity of the networks between edge clouds and edge devices. In this paper, we consider task offloading in the Open Radio Access Network (O-RAN), a new 5G RAN architecture that allows the Open Central Unit (O-CU) to be co-located with the Open Distributed Unit (O-DU) at the edge cloud for low-latency services. O-RAN relies on a fronthaul network to connect O-RAN Radio Units (O-RUs) and the edge clouds that host O-DUs. Consequently, tasks are offloaded onto the edge clouds via wireless and fronthaul networks [10045045], which requires routing. Since edge clouds do not all have the same available computation resources and tasks' computation deadlines differ, an approach for distributing tasks across multiple edge clouds is needed. Prior work has not addressed this joint problem of task offloading, fronthaul routing, and edge computing. To this end, using segment routing, O-RAN intelligent controllers, and multiple edge clouds, we formulate an optimization problem to minimize offloading, fronthaul routing, and computation delays in O-RAN. To solve this NP-hard problem, we use Deep Q-Learning assisted by federated learning, with a reward function that reduces the Cost of Delay (CoD). The simulation results show that our solution maximizes the reward by minimizing the CoD.
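
    As a rough illustration of how federated learning can assist the Deep Q-Learning described above, the sketch below shows a single federated round: each edge cloud performs a local update on its own offloading experience, and a central controller averages the resulting parameters (FedAvg-style), with the reward defined as the negative Cost of Delay. This is a minimal sketch under stated assumptions, not the paper's system; the CoD model, the linear stand-in for the DQN parameters, and all constants are hypothetical.

```python
import numpy as np

# Minimal sketch (illustrative, not the paper's system): each edge cloud trains
# a local model on its own offloading experience, and a central controller
# periodically averages the weights (FedAvg-style). A numpy vector stands in
# for the DQN parameters; the reward is the negative Cost of Delay (CoD),
# modelled here as offloading + fronthaul routing + computation delay.

def cod_reward(offload_delay, routing_delay, compute_delay, deadline):
    """Hypothetical reward: negative total delay, with an extra penalty
    if the task misses its computation deadline."""
    total = offload_delay + routing_delay + compute_delay
    penalty = 10.0 if total > deadline else 0.0
    return -(total + penalty)

def local_update(weights, experiences, lr=0.01):
    """Stand-in for one local training round on a single edge cloud."""
    grad = np.zeros_like(weights)
    for features, reward in experiences:
        # illustrative regression step toward the observed reward
        grad += (weights @ features - reward) * features
    return weights - lr * grad / max(len(experiences), 1)

def federated_average(local_weights, sizes):
    """FedAvg: weight each edge cloud's parameters by its sample count."""
    sizes = np.asarray(sizes, dtype=float)
    stacked = np.stack(local_weights)
    return (stacked * (sizes / sizes.sum())[:, None]).sum(axis=0)

# Example round with three hypothetical edge clouds
rng = np.random.default_rng(0)
global_w = np.zeros(8)
clouds = []
for _ in range(3):
    exp = [(rng.random(8), cod_reward(*rng.random(3), deadline=1.5)) for _ in range(32)]
    clouds.append(local_update(global_w.copy(), exp))
global_w = federated_average(clouds, sizes=[32, 32, 32])
```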

    A Deep Reinforcement Learning-Based Framework for Content Caching

    Full text link
    Content caching at the edge nodes is a promising technique to reduce the data traffic in next-generation wireless networks. Inspired by the success of Deep Reinforcement Learning (DRL) in solving complicated control problems, this work presents a DRL-based framework with the Wolpertinger architecture for content caching at the base station. The proposed framework aims to maximize the long-term cache hit rate and requires no knowledge of the content popularity distribution. To evaluate the proposed framework, we compare its performance with other caching algorithms, including the Least Recently Used (LRU), Least Frequently Used (LFU), and First-In First-Out (FIFO) caching strategies. Since the Wolpertinger architecture can effectively limit the action space size, we also compare the performance with Deep Q-Network to identify the impact of dropping a portion of the actions. Our results show that the proposed framework achieves an improved short-term cache hit rate and an improved, stable long-term cache hit rate in comparison with the LRU, LFU, and FIFO schemes. Additionally, the performance is competitive with Deep Q-learning, while the proposed framework provides significant savings in runtime. Comment: 6 pages, 3 figures.
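
    The Wolpertinger architecture mentioned above limits the action space by combining a continuous actor with a nearest-neighbour lookup over the discrete actions and a critic that re-ranks the retrieved candidates. The sketch below illustrates that selection step only; it is not the paper's implementation, and the content embeddings, actor, and critic are simple stand-ins chosen for illustration.

```python
import numpy as np

# Illustrative Wolpertinger action selection (not the paper's implementation):
# an actor outputs a continuous "proto-action" in a content-embedding space,
# the k nearest cacheable contents are retrieved, and a critic Q-function picks
# the best of those k candidates, keeping the effective action space small
# even for a large content catalogue.

N_CONTENTS, EMBED_DIM, K = 1000, 16, 10      # hypothetical catalogue / embedding sizes
rng = np.random.default_rng(0)

# Each cacheable content is represented by an embedding (assumed given here).
content_embeddings = rng.normal(size=(N_CONTENTS, EMBED_DIM))

def actor(state):
    """Stand-in actor: maps the cache/request state to a proto-action."""
    return np.tanh(state[:EMBED_DIM])

def critic(state, content_id):
    """Stand-in critic: Q-value of caching a given content in this state."""
    return float(content_embeddings[content_id] @ state[:EMBED_DIM])

def wolpertinger_action(state):
    """Proto-action -> k nearest contents -> argmax over critic Q-values."""
    proto = actor(state)
    dists = np.linalg.norm(content_embeddings - proto, axis=1)
    candidates = np.argpartition(dists, K)[:K]        # k nearest discrete actions
    q_values = [critic(state, c) for c in candidates]
    return int(candidates[int(np.argmax(q_values))])  # content chosen for caching

# Example: pick which content to admit into the cache for a random state
state = rng.normal(size=EMBED_DIM)
chosen = wolpertinger_action(state)
```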