
3 research outputs found

Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning

Author: Emmerich Michael
Plaat Aske
Preuss Mike
Wang Hui
Publication date: 14/06/2020

Morpion Solitaire is a popular single-player game played with paper and pencil. Due to its large state space (on the order of the game of Go), traditional search algorithms, such as MCTS, have not been able to find good solutions. A later algorithm, Nested Rollout Policy Adaptation, was able to find a new record of 82 steps, albeit with large computational resources. To the best of our knowledge, no further progress has been reported in the roughly ten years since that record. In this paper we take the recent impressive performance of deep self-learning reinforcement learning approaches from AlphaGo/AlphaZero as inspiration to design a searcher for Morpion Solitaire. A challenge of Morpion Solitaire is that the state space is sparse: there are few win/loss signals. Instead, we use an approach known as ranked reward to create a reinforcement learning self-play framework for Morpion Solitaire. This enables us to find medium-quality solutions with reasonable computational effort. Our record is a 67-step solution, which is very close to the human best (68), without any adaptation to the problem other than using ranked reward. We list many further avenues for potential improvement.

Comment: 4 pages, 2 figures. The first/ongoing attempt to tackle Morpion Solitaire using ranked reward reinforcement learning. Submitted to SYNASC202

arXiv.org e-Print Archive
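
The ranked reward idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a simplified form of the technique: recent episode scores are kept in a fixed-size buffer, and each new score is mapped to +1 or -1 depending on whether it beats a percentile threshold of that buffer, with ties broken at random. The class name `RankedReward`, the buffer size, and the percentile are illustrative assumptions, not values taken from the paper.

```python
import random
from collections import deque


class RankedReward:
    """Simplified ranked-reward sketch (hypothetical names and defaults)."""

    def __init__(self, buffer_size=250, percentile=0.75, rng=None):
        # Sliding window of the most recent episode scores.
        self.buffer = deque(maxlen=buffer_size)
        self.percentile = percentile
        self.rng = rng or random.Random(0)

    def threshold(self):
        # Score at the chosen percentile of the current buffer.
        ordered = sorted(self.buffer)
        index = min(len(ordered) - 1, int(self.percentile * len(ordered)))
        return ordered[index]

    def rank(self, score):
        # Record the score, then turn it into a binary win/loss signal
        # relative to the agent's own recent performance.
        self.buffer.append(score)
        cutoff = self.threshold()
        if score > cutoff:
            return 1
        if score < cutoff:
            return -1
        return self.rng.choice([1, -1])  # tie: random sign


ranker = RankedReward(buffer_size=10, percentile=0.5)
for steps in (10, 20, 30, 40):
    ranker.rank(steps)           # warm up the buffer
good = ranker.rank(100)          # well above recent scores
bad = ranker.rank(5)             # well below recent scores
```

The point of the design is that a single-player game with a raw score (here, the number of steps) gains a win/loss signal of the kind AlphaZero-style self-play expects, because the agent is effectively competing against its own recent episodes.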

Self-play Learning Strategies for Resource Assignment in Open-RAN Networks

Author: Kapoor Shipra
Parekh Arjun
Piechocki Robert J
Santos-Rodriguez Raul
Thomas Jonathan D
Wang Xiaoyang
Publication date: 03/03/2021

Open Radio Access Network (ORAN) is being developed with the aim of democratising access and lowering the cost of future mobile data networks, supporting network services with various QoS requirements, such as massive IoT and URLLC. In ORAN, network functionality is disaggregated into remote units (RUs), distributed units (DUs) and central units (CUs), which allows flexible software on Commercial-Off-The-Shelf (COTS) deployments. Furthermore, the mapping of variable RU requirements to local mobile edge computing centres for future centralized processing would significantly reduce the power consumption in cellular networks. In this paper, we study the RU-DU resource assignment problem in an ORAN system, modelled as a 2D bin packing problem. A deep reinforcement learning-based self-play approach is proposed to achieve efficient RU-DU resource management, using an AlphaGo Zero-inspired neural Monte-Carlo Tree Search (MCTS). Experiments on a representative 2D bin packing environment and on real site data show that the self-play learning strategy achieves intelligent RU-DU resource assignment for different network conditions.

arXiv.org e-Print Archive
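
To make the 2D bin packing formulation in the abstract above concrete, here is a minimal sketch of a packing environment step. It is not the paper's method: it uses a plain first-fit scan instead of learned MCTS, and the abstract does not state what the two axes represent in the RU-DU setting, so the grid cells are treated as abstract resource units. The function names `find_first_fit` and `place` are hypothetical.

```python
def find_first_fit(grid, width, height):
    """Return (top, left) of the first free width x height slot, or None."""
    rows, cols = len(grid), len(grid[0])
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            cells = (
                grid[r][c]
                for r in range(top, top + height)
                for c in range(left, left + width)
            )
            if not any(cells):  # every cell in the slot is free
                return top, left
    return None


def place(grid, width, height):
    """Mark the first free slot as occupied and return its position."""
    spot = find_first_fit(grid, width, height)
    if spot is None:
        return None  # the request does not fit in this bin
    top, left = spot
    for r in range(top, top + height):
        for c in range(left, left + width):
            grid[r][c] = True
    return spot


# A 4 x 4 bin; True marks an occupied cell.
bin_grid = [[False] * 4 for _ in range(4)]
first = place(bin_grid, 2, 2)
second = place(bin_grid, 3, 1)
third = place(bin_grid, 2, 2)
too_big = place(bin_grid, 4, 4)
```

A learned policy would replace the fixed scan order in `find_first_fit` with a choice among all feasible slots, which is the decision a tree search could explore.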

A Generalized Reinforcement Learning Algorithm for Online 3D Bin-Packing

Author: Basumatary Ansuma
Khadilkar Harshad
Kumar Swagat
Nayak Siddharth
Singh Harsh Vardhan
Singhal Aniruddha
Sinha Rajesh
Verma Richa
Publication date: 01/07/2020

We propose a Deep Reinforcement Learning (Deep RL) algorithm for solving the online 3D bin packing problem for an arbitrary number of bins and any bin size. The focus is on producing decisions that can be physically implemented by a robotic loading arm, a laboratory prototype used for testing the concept. The problem considered in this paper is novel in two ways. First, unlike the traditional 3D bin packing problem, we assume that the entire set of objects to be packed is not known a priori. Instead, a fixed number of upcoming objects is visible to the loading system, and they must be loaded in the order of arrival. Second, the goal is not to move objects from one point to another via a feasible path, but to find a location and orientation for each object that maximises the overall packing efficiency of the bin(s). Finally, the learnt model is designed to work with problem instances of arbitrary size without retraining. Simulation results show that the RL-based method outperforms state-of-the-art online bin packing heuristics in terms of empirical competitive ratio and volume efficiency.

Comment: 9 pages, 9 figure

arXiv.org e-Print Archive
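
The two evaluation metrics named in the abstract above are not defined there, so the following sketch rests on common definitions that are assumptions here: volume efficiency as packed item volume divided by the total volume of the bins used, and empirical competitive ratio as the number of bins used by the online method divided by a reference bin count. All function names are hypothetical.

```python
import math


def volume_efficiency(items, bin_dims, bins_used):
    """Fraction of the opened bins' volume that is filled by items."""
    packed = sum(l * w * h for l, w, h in items)
    length, width, height = bin_dims
    return packed / (bins_used * length * width * height)


def volume_lower_bound(items, bin_dims):
    """Fewest bins that could hold the items if shape were ignored."""
    packed = sum(l * w * h for l, w, h in items)
    length, width, height = bin_dims
    return math.ceil(packed / (length * width * height))


def empirical_competitive_ratio(bins_used, reference_bins):
    """Bins used by the online method relative to a reference count."""
    return bins_used / reference_bins


items = [(1, 1, 1), (2, 2, 2)]   # (length, width, height) per item
bin_dims = (3, 3, 3)
efficiency = volume_efficiency(items, bin_dims, bins_used=1)
bound = volume_lower_bound(items, bin_dims)
ratio = empirical_competitive_ratio(bins_used=3, reference_bins=2)
```

Under these definitions a ratio closer to 1 and an efficiency closer to 1 both indicate tighter packing. The volume bound is only a convenient stand-in when the true offline optimum is unknown.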