Off-policy Learning for Remote Electrical Tilt Optimization
We address the problem of Remote Electrical Tilt (RET) optimization using
off-policy Contextual Multi-Armed-Bandit (CMAB) techniques. The goal in RET
optimization is to control the orientation of the vertical tilt angle of the
antenna to optimize Key Performance Indicators (KPIs) representing the Quality
of Service (QoS) perceived by the users in cellular networks. Learning an
improved tilt update policy is hard. On the one hand, learning a new policy
online in a real network requires exploring tilt updates that have never been
used before, which is operationally too risky. On the other
hand, devising this policy via simulations suffers from the
simulation-to-reality gap. In this paper, we circumvent these issues by
learning an improved policy in an offline manner using existing data collected
on real networks. We formulate the problem of devising such a policy using the
off-policy CMAB framework. We propose CMAB learning algorithms to extract
optimal tilt update policies from the data. We train and evaluate these
policies on real-world 4G Long Term Evolution (LTE) cellular network data. Our
policies show consistent improvements over the rule-based logging policy used
to collect the data.
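The abstract does not spell out the estimators behind its CMAB algorithms. As a minimal sketch of the general idea of learning from logged data, the snippet below uses inverse propensity scoring (IPS), one standard off-policy estimator that may differ from the paper's method. It assumes each logged record is a `(context, action, reward, propensity)` tuple, where `propensity` is the probability the logging policy assigned to the logged action. The tilt actions (-1, 0, +1 degrees), the discrete context labels, the toy data and the function names are all hypothetical.

```python
from collections import defaultdict


def ips_value(policy, logs):
    """IPS estimate of the mean reward a deterministic `policy` would obtain,
    computed from logged (context, action, reward, propensity) tuples.
    Assumes every propensity is known and strictly positive."""
    total = 0.0
    for context, action, reward, propensity in logs:
        if policy(context) == action:
            # Reweight matching samples by 1 / P(logged action | context).
            total += reward / propensity
    return total / len(logs)


def learn_greedy_policy(logs, default_action=0):
    """For each discrete context, pick the logged action with the largest
    IPS-weighted reward sum; unseen contexts fall back to `default_action`
    (0 = hypothetical 'no tilt change')."""
    scores = defaultdict(lambda: defaultdict(float))
    for context, action, reward, propensity in logs:
        scores[context][action] += reward / propensity
    table = {ctx: max(acts, key=acts.get) for ctx, acts in scores.items()}
    return lambda context: table.get(context, default_action)


# Toy logged data (illustrative values, not from the paper).
logs = [
    ("low_sinr", 1, 1.0, 0.5),
    ("low_sinr", 0, 0.2, 0.5),
    ("high_sinr", 0, 0.8, 0.5),
    ("high_sinr", -1, 0.1, 0.5),
]
learned = learn_greedy_policy(logs)
print(learned("low_sinr"), learned("high_sinr"))  # 1 0
print(ips_value(learned, logs), ips_value(lambda c: 0, logs))
```

On this toy data the learned policy's IPS estimate (about 0.9) exceeds that of always keeping the tilt unchanged (about 0.5). Restricting the argmax to logged actions avoids picking actions for which the data carry no evidence.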
Symbolic Reinforcement Learning for Safe RAN Control
In this paper, we demonstrate a Symbolic Reinforcement Learning (SRL)
architecture for safe control in Radio Access Network (RAN) applications. In
our automated tool, a user can select high-level safety specifications
expressed in Linear Temporal Logic (LTL) to shield an RL agent running in a
given cellular network with the aim of optimizing network performance, as measured
through certain Key Performance Indicators (KPIs). In the proposed
architecture, network safety shielding is ensured through model-checking
techniques over combined discrete system models (automata) that are abstracted
through reinforcement learning. We demonstrate the user interface (UI) that helps
the user set intent specifications for the architecture and inspect the
difference between allowed and blocked actions.
Comment: The paper has been accepted for presentation at the 20th International
Conference on Autonomous Agents and Multiagent Systems (AAMAS 2021), May 3-7,
London, UK (demo track).
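A much-simplified sketch of how an LTL safety shield can block an agent's actions, assuming the illustrative property G(down -> X !down) ("never tilt down in two consecutive steps") has already been translated into a three-state safety automaton. The property, the action names, the `Shield` class and its fallback action are hypothetical, and the sketch omits the learned system abstractions and the model-checking step the abstract describes; it only shows how allowed and blocked actions can be separated.

```python
# Hypothetical safety automaton for the illustrative LTL property
#   G(down -> X !down): "never tilt down in two consecutive steps".
SAFE, AFTER_DOWN, VIOLATION = "safe", "after_down", "violation"


def automaton_step(state, action):
    """Transition function of the safety automaton; VIOLATION is a sink."""
    if state == VIOLATION:
        return VIOLATION
    if action == "down":
        return VIOLATION if state == AFTER_DOWN else AFTER_DOWN
    return SAFE


class Shield:
    """Blocks actions that would drive the automaton into VIOLATION."""

    def __init__(self, fallback="no_change"):
        self.state = SAFE
        self.fallback = fallback  # assumed to be safe in every automaton state
        self.blocked = []  # history of (automaton state, blocked action)

    def allowed(self, actions):
        """Actions that keep the automaton out of VIOLATION right now."""
        return [a for a in actions if automaton_step(self.state, a) != VIOLATION]

    def filter(self, proposed):
        """Return the action to execute and advance the automaton."""
        action = proposed
        if automaton_step(self.state, proposed) == VIOLATION:
            self.blocked.append((self.state, proposed))
            action = self.fallback
        self.state = automaton_step(self.state, action)
        return action


shield = Shield()
print([shield.filter(a) for a in ["down", "down", "up", "down"]])
# ['down', 'no_change', 'up', 'down']
```

The `blocked` history is the kind of record a UI could display when contrasting allowed and blocked actions.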
Safe Reinforcement Learning for Antenna Tilt Optimisation using Shielding and Multiple Baselines
Safe interaction with the environment is one of the most challenging aspects
of Reinforcement Learning (RL) when applied to real-world problems. This is
particularly important when unsafe actions have a high or irreversible negative
impact on the environment. In the context of network management operations,
Remote Electrical Tilt (RET) optimisation is a safety-critical application in
which exploratory modifications of antenna tilt angles of Base Stations (BSs)
can cause significant performance degradation in the network. In this paper, we
propose a modular Safe Reinforcement Learning (SRL) architecture, which we then
apply to RET optimisation in cellular networks. In this approach, a
safety shield continuously benchmarks the performance of RL agents against safe
baselines, and determines safe antenna tilt updates to be performed on the
network. Our results demonstrate improved performance of the SRL agent over the
baseline while ensuring the safety of the performed actions.
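The abstract does not state how the shield benchmarks the agent against its baselines. One plausible reading, sketched below purely as an assumption, is to execute the agent's proposed tilt update only when its estimated KPI value is not worse (up to a tolerance `margin`) than that of the best safe baseline, and otherwise to fall back to that baseline's action. `q_estimate`, the baseline policies, `margin` and the toy value table are hypothetical.

```python
def shielded_action(state, agent_action, baselines, q_estimate, margin=0.0):
    """Execute the agent's tilt update only if its estimated value is within
    `margin` of the best safe baseline's; otherwise use that baseline's action.

    baselines:  list of callables state -> action (hypothetical safe policies)
    q_estimate: callable (state, action) -> estimated KPI value (hypothetical)
    """
    baseline_actions = [policy(state) for policy in baselines]
    best_baseline = max(baseline_actions, key=lambda a: q_estimate(state, a))
    if q_estimate(state, agent_action) >= q_estimate(state, best_baseline) - margin:
        return agent_action
    return best_baseline


# Toy example: a fixed value table over tilt updates in degrees (illustrative).
values = {-1: 0.2, 0: 0.5, 1: 0.9}


def q(state, action):
    return values[action]


rule_based = [lambda s: 0, lambda s: -1]  # two hypothetical safe baselines

print(shielded_action("cell_A", 1, rule_based, q))   # 1
print(shielded_action("cell_A", -1, rule_based, q))  # 0
```

In the first call the agent's update beats every baseline and is executed; in the second it is replaced by the best baseline's "no change" action. A positive `margin` trades some safety for more agent autonomy.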
Remote Electrical Tilt Optimization via Safe Reinforcement Learning
Remote Electrical Tilt (RET) optimization is an efficient method for
adjusting the vertical tilt angle of Base Station (BS) antennas in order to
optimize Key Performance Indicators (KPIs) of the network. Reinforcement
Learning (RL) provides a powerful framework for RET optimization because of its
self-learning capabilities and adaptivity to environmental changes. However, an
RL agent may execute unsafe actions during the course of its interaction, i.e.,
actions resulting in undesired network performance degradation. Since the
reliability of services is critical for Mobile Network Operators (MNOs), the
prospect of performance degradation has prevented the real-world deployment of
RL methods for RET optimization. In this work, we model the RET optimization
problem in the Safe Reinforcement Learning (SRL) framework with the goal of
learning a tilt control strategy providing performance improvement guarantees
with respect to a safe baseline. We leverage a recent SRL method, namely Safe
Policy Improvement through Baseline Bootstrapping (SPIBB), to learn an improved
policy from an offline dataset of interactions collected by the safe baseline.
Our experiments show that the proposed approach is able to learn a safe and
improved tilt update policy, providing a higher degree of reliability and
potential for real-world network deployment.
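A minimal tabular sketch of the policy-improvement step of the Pi_b-SPIBB variant of SPIBB follows, as a rough illustration rather than the paper's implementation. State-action pairs observed fewer than `n_wedge` times in the offline dataset keep the baseline's probability, and the remaining baseline probability mass is moved to the highest-valued well-sampled action. In the full method this step alternates with policy evaluation on the maximum-likelihood MDP estimated from the data; here `q` is simply given, and the dictionary layout, names and toy numbers are assumptions.

```python
def pi_b_spibb_step(q, pi_b, counts, n_wedge):
    """Pi_b-SPIBB greedy projection (tabular, one state at a time).

    q[s][a]:      action-value estimates (e.g. from the MLE MDP)
    pi_b[s][a]:   baseline (logging) policy probabilities
    counts[s][a]: number of times (s, a) appears in the offline dataset
    n_wedge:      count threshold below which a pair is 'bootstrapped'
    """
    policy = {}
    for s, baseline_probs in pi_b.items():
        state_counts = counts.get(s, {})
        # Rarely observed pairs: copy the baseline probability unchanged.
        bootstrapped = {a for a in baseline_probs if state_counts.get(a, 0) < n_wedge}
        pi = {a: (p if a in bootstrapped else 0.0) for a, p in baseline_probs.items()}
        trusted = [a for a in baseline_probs if a not in bootstrapped]
        if trusted:
            # Move all baseline mass of well-sampled actions to the best one.
            best = max(trusted, key=lambda a: q[s][a])
            pi[best] += sum(baseline_probs[a] for a in trusted)
        policy[s] = pi
    return policy


# Toy example with tilt updates in degrees (illustrative numbers only).
pi_b = {"cell_A": {-1: 0.2, 0: 0.6, 1: 0.2}}
counts = {"cell_A": {-1: 3, 0: 50, 1: 40}}
q = {"cell_A": {-1: 5.0, 0: 0.3, 1: 0.8}}
print(pi_b_spibb_step(q, pi_b, counts, n_wedge=10))
```

Action -1 keeps its baseline probability of 0.2 even though its value estimate is the highest, because it was observed only 3 times; this is how SPIBB avoids trusting poorly estimated actions and stays close to the safe baseline.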