Tackling Morpion Solitaire with AlphaZero-like Ranked Reward Reinforcement Learning
Morpion Solitaire is a popular single-player game played with paper and
pencil. Due to its large state space (on the order of that of Go),
traditional search algorithms, such as MCTS, have not been able to find good
solutions. A later algorithm, Nested Rollout Policy Adaptation, found a new
record of 82 steps, albeit with large computational resources. To the best of
our knowledge, no further progress has been reported in the roughly ten years
since that record was set.
In this paper we take the recent impressive performance of deep self-learning
reinforcement learning approaches from AlphaGo/AlphaZero as inspiration to
design a searcher for Morpion Solitaire. A challenge of Morpion Solitaire is
that the state space is sparse: there are few win/loss signals. To address this,
we use an approach known as ranked reward to create a reinforcement learning
self-play framework for Morpion Solitaire. This enables us to find
medium-quality solutions with reasonable computational effort. Our record is a
67-step solution, which is very close to the human best (68) and was obtained
without any adaptation to the problem other than the use of ranked reward. We
list many further avenues for potential improvement. Comment: 4 pages, 2
figures. The first (ongoing) attempt to tackle Morpion Solitaire using ranked
reward reinforcement learning. Submitted to SYNASC202
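Ranked reward, as a general technique, turns a single-player score into a binary win/loss signal by comparing each finished episode's score with a percentile of recently achieved scores, so the agent is effectively playing against its own recent performance. The sketch below is a minimal illustration under stated assumptions: the class name `RankedReward`, the buffer size, the percentile `alpha`, the nearest-rank percentile, and appending the new score before computing the threshold are illustrative choices, not details taken from the paper.

```python
import math
import random
from collections import deque


class RankedReward:
    """Hypothetical ranked-reward helper: maps a raw episode score to +1/-1."""

    def __init__(self, buffer_size=250, alpha=0.75, seed=None):
        self.scores = deque(maxlen=buffer_size)  # most recent episode scores
        self.alpha = alpha                       # percentile used as threshold, in (0, 1]
        self.rng = random.Random(seed)           # used only to break ties

    def threshold(self):
        """Nearest-rank alpha-percentile of the buffered scores."""
        ordered = sorted(self.scores)
        rank = max(math.ceil(self.alpha * len(ordered)), 1)
        return ordered[rank - 1]

    def reward(self, score):
        """Record a finished episode's score and return its ranked reward."""
        self.scores.append(score)  # assumption: include the new score first
        r_alpha = self.threshold()
        if score > r_alpha:
            return 1   # better than the recent alpha-percentile: a "win"
        if score < r_alpha:
            return -1  # worse than the threshold: a "loss"
        return self.rng.choice((1, -1))  # tie: random outcome
```

In a Morpion Solitaire setting the score would naturally be the number of moves in a finished game; the resulting ±1 value then plays the role that the game outcome plays in AlphaZero's value target.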
Self-play Learning Strategies for Resource Assignment in Open-RAN Networks
Open Radio Access Network (ORAN) is being developed with an aim to
democratise access and lower the cost of future mobile data networks,
supporting network services with various QoS requirements, such as massive IoT
and URLLC. In ORAN, network functionality is disaggregated into remote units
(RUs), distributed units (DUs) and central units (CUs), which allows flexible
software deployment on Commercial-Off-The-Shelf (COTS) hardware. Furthermore, the
mapping of variable RU requirements to local mobile edge computing centres for
future centralized processing would significantly reduce the power consumption
in cellular networks. In this paper, we study the RU-DU resource assignment
problem in an ORAN system, modelled as a 2D bin packing problem. A deep
reinforcement learning-based self-play approach is proposed to achieve
efficient RU-DU resource management, with AlphaGo Zero-inspired neural
Monte-Carlo Tree Search (MCTS). Experiments on a representative 2D bin packing
environment and on real site data show that the self-play learning strategy
achieves intelligent RU-DU resource assignment under different network
conditions.
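AlphaGo Zero-style MCTS chooses which child to descend into with the PUCT rule, selecting the action that maximises Q(s,a) + c_puct · P(s,a) · √N(s) / (1 + N(s,a)), where P is the policy network's prior, Q the mean backed-up value, and N(s) the total visit count of the parent. The sketch below shows that selection step plus a value backup; it is not the paper's implementation, and `Node`, `select_child`, `backup` and the default `c_puct = 1.5` are hypothetical. The backup adds the same value at every level, which assumes a single-agent problem (such as packing) with no opponent whose value would have to be negated.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    prior: float                 # P(s, a) from the policy network
    visit_count: int = 0         # N(s, a)
    value_sum: float = 0.0       # W(s, a): sum of backed-up values
    children: dict = field(default_factory=dict)  # action -> Node

    def q(self):
        # Mean action value Q(s, a); unvisited nodes default to 0 (an assumption)
        return self.value_sum / self.visit_count if self.visit_count else 0.0


def select_child(node, c_puct=1.5):
    """Return (action, child) maximising Q(s,a) + U(s,a), AlphaGo Zero-style PUCT."""
    total = sum(child.visit_count for child in node.children.values())  # N(s)
    sqrt_total = math.sqrt(total)

    def puct(item):
        _, child = item
        u = c_puct * child.prior * sqrt_total / (1 + child.visit_count)
        return child.q() + u

    return max(node.children.items(), key=puct)


def backup(path, value):
    """Propagate a leaf value up the visited path (single-agent: no sign flip)."""
    for node in path:
        node.visit_count += 1
        node.value_sum += value
```

A heavily visited child with a good Q value can still lose the selection to a rarely visited one, because the exploration term U shrinks as 1 / (1 + N(s,a)); that balance is what lets the prior steer search early and the backed-up values dominate later.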
A Generalized Reinforcement Learning Algorithm for Online 3D Bin-Packing
We propose a Deep Reinforcement Learning (Deep RL) algorithm for solving the
online 3D bin packing problem for an arbitrary number of bins and any bin size.
The focus is on producing decisions that can be physically implemented by a
robotic loading arm, a laboratory prototype used for testing the concept. The
problem considered in this paper is novel in two ways. First, unlike the
traditional 3D bin packing problem, we assume that the entire set of objects to
be packed is not known a priori. Instead, a fixed number of upcoming objects is
visible to the loading system, and they must be loaded in the order of arrival.
Second, the goal is not to move objects from one point to another via a
feasible path, but to find a location and orientation for each object that
maximises the overall packing efficiency of the bin(s). In addition, the learnt
model is designed to work with problem instances of arbitrary size without
retraining. Simulation results show that the RL-based method outperforms
state-of-the-art online bin packing heuristics in terms of empirical
competitive ratio and volume efficiency. Comment: 9 pages, 9 figures
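To make the online setting concrete, the sketch below packs boxes strictly in arrival order into a single bin discretised into unit cells, trying every axis-aligned orientation and choosing the lowest feasible position on a heightmap. It is a simple greedy heuristic for illustration, not the paper's RL algorithm and not one of the specific baselines it compares against. It assumes a single bin, ignores the lookahead of upcoming objects, treats space under overhangs as lost, skips items that do not fit, and defines volume efficiency as packed volume divided by bin volume; all function names are hypothetical.

```python
from itertools import permutations


def orientations(box):
    """Distinct axis-aligned orientations (at most 6) of a box (l, w, h)."""
    return sorted(set(permutations(box)))


def place(heightmap, box, bin_h):
    """Lowest feasible placement on a heightmap, or None if the box cannot fit.

    heightmap[x][y] is the current stack height over unit cell (x, y).
    Returns (z, x, y, (l, w, h)); ties are broken by z, then x, then y.
    """
    L, W = len(heightmap), len(heightmap[0])
    best = None
    for l, w, h in orientations(box):
        for x in range(L - l + 1):
            for y in range(W - w + 1):
                # The box rests on the tallest cell under its footprint
                z = max(heightmap[i][j]
                        for i in range(x, x + l)
                        for j in range(y, y + w))
                if z + h <= bin_h:
                    cand = (z, x, y, (l, w, h))
                    if best is None or cand < best:
                        best = cand
    return best


def pack_online(boxes, L, W, H):
    """Place boxes in arrival order into an L x W x H bin; return volume efficiency."""
    heightmap = [[0] * W for _ in range(L)]
    packed = 0
    for box in boxes:
        spot = place(heightmap, box, H)
        if spot is None:
            continue  # assumption: an item that does not fit is skipped
        z, x, y, (l, w, h) = spot
        for i in range(x, x + l):
            for j in range(y, y + w):
                heightmap[i][j] = z + h  # raise the footprint to the box top
        packed += l * w * h
    return packed / (L * W * H)
```

For example, eight 2×2×2 cubes fill a 4×4×4 bin exactly, so `pack_online([(2, 2, 2)] * 8, 4, 4, 4)` returns 1.0. An RL policy would replace the fixed "lowest position first" rule with a learnt choice of position and orientation, possibly conditioned on the visible upcoming objects.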