533 research outputs found
Quantifying the Impact of Non-Stationarity in Reinforcement Learning-Based Traffic Signal Control
In reinforcement learning (RL), dealing with non-stationarity is a
challenging issue. However, some domains such as traffic optimization are
inherently non-stationary. Causes for and effects of this are manifold. In
particular, when dealing with traffic signal controls, addressing
non-stationarity is key since traffic conditions change over time and as a
function of traffic control decisions taken in other parts of a network. In
this paper we analyze the effects that different sources of non-stationarity
have in a network of traffic signals, in which each signal is modeled as a
learning agent. More precisely, we study both the effects of changing the
\textit{context} in which an agent learns (e.g., a change in flow rates
experienced by it), as well as the effects of reducing agent observability of
the true environment state. Partial observability may cause distinct states (in
which distinct actions are optimal) to be seen as the same by the traffic
signal agents. This, in turn, may lead to sub-optimal performance. We show that
the lack of suitable sensors to provide a representative observation of the
real state seems to affect performance more drastically than changes to
the underlying traffic patterns. Comment: 13 pages
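The state-aliasing problem described above can be illustrated with a toy example (the states and numbers here are invented, not from the paper): when a coarse sensor reports only the total queue length, two traffic states that call for different signal phases become indistinguishable.

```python
# Toy example of state aliasing under partial observability.
# True states: queue lengths on the (north, east) approaches.
true_states = {"s1": (10, 2), "s2": (2, 10)}

def optimal_action(state):
    """The optimal phase serves the longer queue."""
    north, east = state
    return "green_north" if north >= east else "green_east"

def observe(state):
    """A coarse sensor reports only the total queue length."""
    return sum(state)

obs = {name: observe(s) for name, s in true_states.items()}
acts = {name: optimal_action(s) for name, s in true_states.items()}

# Both states yield the same observation yet require different actions,
# so any deterministic observation-based policy is wrong in one of them.
assert obs["s1"] == obs["s2"]
assert acts["s1"] != acts["s2"]
```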
Machine Learning for QoS Prediction in Vehicular Communication: Challenges and Solution Approaches
As cellular networks evolve towards the 6th generation, machine learning is
seen as a key enabling technology to improve the capabilities of the network.
Machine learning provides a methodology for predictive systems, which can make
networks become proactive. This proactive behavior of the network can be
leveraged to sustain, for example, a specific quality of service requirement.
With predictive quality of service, a wide variety of new use cases, both
safety- and entertainment-related, are emerging, especially in the automotive
sector. Therefore, in this work, we consider maximum throughput prediction,
which can enhance, for example, streaming or high-definition mapping applications. We
discuss the entire machine learning workflow highlighting less regarded aspects
such as the detailed sampling procedures, the in-depth analysis of the dataset
characteristics, the effects of splits in the provided results, and the data
availability. Reliable machine learning models must face many challenges
during their lifecycle. We highlight how confidence in machine learning
technologies can be built by better understanding the underlying characteristics of
the collected data. We discuss feature engineering and the effects of different
splits for the training processes, showcasing that random splits might
overestimate performance by more than twofold. Moreover, we investigate diverse
sets of input features, where network information proved to be most effective,
cutting the error by half. Part of our contribution is the validation of
multiple machine learning models within diverse scenarios. We also use
explainable AI to show that machine learning can learn underlying principles of
wireless networks without being explicitly programmed. Our data is collected
from a deployed network that was under full control of the measurement team and
covered different vehicular scenarios and radio environments. Comment: 18 pages, 12 figures. Accepted in IEEE Access
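The abstract's warning about random splits can be sketched as follows (the data and model are a synthetic stand-in, not the paper's dataset): with temporally correlated measurements, a random split interleaves test points with training points in time, deflating the reported error relative to an honest chronological split.

```python
# Hedged sketch: random vs chronological splits on drifting data.
import math
import random

random.seed(0)

# Synthetic drive test: throughput drifts slowly over time, plus noise.
n = 400
series = [50 + 30 * math.sin(t / 40) + random.gauss(0, 2) for t in range(n)]
data = list(enumerate(series))  # (time_index, throughput) pairs

def one_nn_error(train, test):
    """Mean absolute error of a 1-nearest-neighbour-in-time predictor."""
    errs = []
    for t, y in test:
        _, y_hat = min(train, key=lambda p: abs(p[0] - t))
        errs.append(abs(y - y_hat))
    return sum(errs) / len(errs)

# Random split: test points sit right next to training points in time.
shuffled = data[:]
random.shuffle(shuffled)
rand_err = one_nn_error(shuffled[100:], shuffled[:100])

# Chronological split: the whole test period is unseen.
chron_err = one_nn_error(data[:300], data[300:])

print(f"random split MAE:        {rand_err:.2f}")
print(f"chronological split MAE: {chron_err:.2f}")
assert chron_err > 2 * rand_err  # random split overstates performance
```

Consistent with the abstract's claim, the randomly split evaluation looks more than twice as good as the chronological one on this toy series.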
Towards Continual Reinforcement Learning: A Review and Perspectives
In this article, we aim to provide a literature review of different
formulations and approaches to continual reinforcement learning (RL), also
known as lifelong or non-stationary RL. We begin by discussing our perspective
on why RL is a natural fit for studying continual learning. We then provide a
taxonomy of different continual RL formulations and mathematically characterize
the non-stationary dynamics of each setting. We go on to discuss evaluation of
continual RL agents, providing an overview of benchmarks used in the literature
and important metrics for understanding agent performance. Finally, we
highlight open problems and challenges in bridging the gap between the current
state of continual RL and findings in neuroscience. While still in its early
days, the study of continual RL has the promise to develop better incremental
reinforcement learners that can function in increasingly realistic applications
where non-stationarity plays a vital role. These include applications such as
those in the fields of healthcare, education, logistics, and robotics. Comment: Preprint, 52 pages, 8 figures
Reinforcement learning applied to the real world: uncertainty, sample efficiency, and multi-agent coordination
The immense potential of deep reinforcement learning (DRL) approaches to build autonomous agents has been proven repeatedly in the last decade. Its application to embodied agents, such as robots or automated power systems, however, faces several challenges. Among them, sample inefficiency, combined with the cost and the risk of gathering experience in the real world, can deter any attempt to train embodied agents.
In this thesis, I focus on the application of DRL to embodied agents. I first propose a probabilistic framework to improve sample efficiency in DRL. In the first article, I present batch inverse-variance (BIV) weighting, a loss function accounting for label noise variance in heteroscedastic noisy regression. BIV is a key element of the second article, where it is combined with state-of-the-art uncertainty prediction methods for deep neural networks in a Bayesian pipeline for temporal-difference DRL algorithms. This approach, named inverse-variance reinforcement learning (IV-RL), leads to significantly faster training as well as better performance in control tasks.
In the third article, multi-agent reinforcement learning (MARL) is applied to the problem of fast-timescale demand response, a promising approach to manage the introduction of intermittent renewable energy sources in power grids. As MARL agents control the coordination of multiple air conditioners, they achieve significantly better performance than rule-based approaches. These results underline the potential role that DRL-trained embodied agents could take in the energy transition and the fight against global warming.
Dimmer: Self-Adaptive Network-Wide Flooding with Reinforcement Learning
The last decade saw the emergence of Synchronous Transmissions (ST) as an
effective communication paradigm in low-power wireless networks. Numerous ST
protocols provide high reliability and energy efficiency in normal wireless
conditions, for a large variety of traffic requirements. Recently, with the
EWSN dependability competitions, the community pushed ST to harsher and
highly-interfered environments, improving upon classical ST protocols through
the use of custom rules, hand-tailored parameters, and additional
retransmissions. The result is sophisticated protocols that require prior
expert knowledge and extensive testing, often tuned for a specific deployment
and envisioned scenario. In this paper, we explore how ST protocols can benefit
from self-adaptivity: a self-adaptive ST protocol selects its own best
parameters to (1) tackle external environment dynamics and (2) adapt to its
topology over time. We introduce Dimmer as a self-adaptive ST protocol. Dimmer
builds on LWB and uses Reinforcement Learning to tune its parameters and match
the current properties of the wireless medium. By learning how to behave from
an unlabeled dataset, Dimmer adapts to different interference types and
patterns, and is able to tackle previously unseen interference. With Dimmer, we
explore how to efficiently design AI-based systems for constrained devices, and
outline the benefits and drawbacks of AI-based low-power networking. We
evaluate our protocol on two deployments of resource-constrained nodes
achieving 95.8% reliability against strong, unknown WiFi interference. Our
results outperform baselines such as non-adaptive ST protocols (27%) and PID
controllers, and come close to hand-crafted, more sophisticated solutions,
such as Crystal (99%).
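As a rough illustration of the self-adaptive idea (Dimmer's actual learner, built on LWB, is far more sophisticated), a protocol could select a parameter such as the retransmission count with a simple epsilon-greedy bandit over observed delivery rewards. The channel model and reward shaping below are invented for the sketch.

```python
# Hedged sketch: epsilon-greedy selection of a protocol parameter.
import random

random.seed(1)

arms = [1, 2, 3, 4]            # candidate retransmission counts
q = {a: 0.0 for a in arms}     # running reward estimate per arm
n = {a: 0 for a in arms}       # pull counts per arm

def delivery_reward(retx, interference=0.5):
    """Toy channel: extra retransmissions help delivery but cost energy."""
    p_deliver = 1 - interference ** retx
    delivered = random.random() < p_deliver
    return (1.0 if delivered else 0.0) - 0.05 * retx  # energy penalty

for _ in range(2000):
    eps = 0.1
    a = random.choice(arms) if random.random() < eps else max(arms, key=q.get)
    r = delivery_reward(a)
    n[a] += 1
    q[a] += (r - q[a]) / n[a]  # incremental mean update

best = max(arms, key=q.get)  # the learned trade-off, not the max count
```

Under this toy model a single retransmission is clearly suboptimal, and the bandit learns to avoid it while balancing reliability against the energy penalty.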
A Bayesian Framework for Digital Twin-Based Control, Monitoring, and Data Collection in Wireless Systems
Commonly adopted in the manufacturing and aerospace sectors, digital twin
(DT) platforms are increasingly seen as a promising paradigm to control,
monitor, and analyze software-based, "open", communication systems. Notably, DT
platforms provide a sandbox in which to test artificial intelligence (AI)
solutions for communication systems, potentially reducing the need to collect
data and test algorithms in the field, i.e., on the physical twin (PT). A key
challenge in the deployment of DT systems is to ensure that virtual control
optimization, monitoring, and analysis at the DT are safe and reliable,
avoiding incorrect decisions caused by "model exploitation". To address this
challenge, this paper presents a general Bayesian framework with the aim of
quantifying and accounting for model uncertainty at the DT that is caused by
limitations in the amount and quality of data available at the DT from the PT.
In the proposed framework, the DT builds a Bayesian model of the communication
system, which is leveraged to enable core DT functionalities such as control
via multi-agent reinforcement learning (MARL), monitoring of the PT for anomaly
detection, prediction, data-collection optimization, and counterfactual
analysis. To exemplify the application of the proposed framework, we
specifically investigate a case-study system encompassing multiple sensing
devices that report to a common receiver. Experimental results validate the
effectiveness of the proposed Bayesian framework as compared to standard
frequentist model-based solutions. Comment: Accepted for publication in IEEE
Journal on Selected Areas in Communications; extends and subsumes
arXiv:2210.05582. Updates: 18/01/2023: updated reference; 29/08/2023: revised
manuscript version.
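A toy sketch of the frequentist-vs-Bayesian contrast the paper draws (the system model here is invented, not the paper's case study): with only a few physical-twin observations, a maximum-likelihood point estimate of a link's success probability is degenerate, while a Beta posterior at the DT keeps the residual model uncertainty explicit.

```python
# Hedged toy sketch: point estimate vs Bayesian posterior at the DT.
observations = [1, 1, 1]  # three successful transmissions seen at the PT

# Frequentist MLE: certain the link is perfect after three successes.
p_mle = sum(observations) / len(observations)

# Bayesian: Beta(1, 1) prior -> Beta(1 + successes, 1 + failures) posterior.
a = 1 + sum(observations)
b = 1 + (len(observations) - sum(observations))
p_mean = a / (a + b)
p_var = (a * b) / ((a + b) ** 2 * (a + b + 1))

print(f"MLE:            {p_mle:.3f} (no uncertainty)")
print(f"Posterior mean: {p_mean:.3f}, variance: {p_var:.4f}")
```

The nonzero posterior variance is exactly the signal a DT can use to avoid "model exploitation": decisions that look optimal under the point estimate may be flagged as unsafe under the posterior.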
Delays in Reinforcement Learning
Delays are inherent to most dynamical systems. Besides shifting the process
in time, they can significantly affect their performance. For this reason, it
is usually valuable to study the delay and account for it. Because they are
dynamical systems, it is no surprise that sequential decision-making
problems such as Markov decision processes (MDP) can also be affected by
delays. These processes are the foundational framework of reinforcement
learning (RL), a paradigm whose goal is to create artificial agents capable of
learning to maximise their utility by interacting with their environment.
RL has achieved strong, sometimes astonishing, empirical results, but delays
are seldom explicitly accounted for. The understanding of the impact of delay
on the MDP is limited. In this dissertation, we propose to study the delay in
the agent's observation of the state of the environment or in the execution of
the agent's actions. We will repeatedly change our point of view on the problem
to reveal some of its structure and peculiarities. A wide spectrum of delays
will be considered, and potential solutions will be presented. This
dissertation also aims to draw links between celebrated frameworks of the RL
literature and that of delays.
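One standard construction from the delayed-RL literature can be sketched as follows (a simplified sketch with invented toy dynamics; the dissertation surveys many variants): under a constant action-execution delay d, the raw state is no longer Markov, but the pair (state, queue of d pending actions) restores the Markov property.

```python
# Hedged sketch: state augmentation for a constant action delay.
from collections import deque

class DelayedEnv:
    """A 1-D integer environment where actions take effect d steps late."""
    def __init__(self, delay, noop=0):
        self.delay = delay
        self.state = 0
        self.pending = deque([noop] * delay, maxlen=delay)

    def step(self, action):
        executed = self.pending.popleft()  # oldest queued action fires now
        self.pending.append(action)        # new action joins the queue
        self.state += executed             # toy dynamics: s' = s + a
        return self.state

    def augmented_state(self):
        # Markov state for the agent: observed state plus pending actions.
        return (self.state, tuple(self.pending))

env = DelayedEnv(delay=2)
env.step(+1)      # queued; an initial no-op executes
env.step(+1)      # queued; another no-op executes
s = env.step(+1)  # the first +1 finally takes effect -> s == 1
```

Planning over the augmented state lets standard MDP machinery apply, at the cost of a state space that grows with the delay.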
Autonomous Network Defence Using Multi-Agent Reinforcement Learning and Self-Play
Early threat detection is an increasingly important part of the cybersecurity landscape, given the growing scale and scope of cyberattacks in recent years. Increasing exploitation of software vulnerabilities, especially in the manufacturing sector, demonstrates the ongoing need for autonomous network defence. In this work, we model the problem as a zero-sum Markov game between attacker and defender reinforcement learning agents. Previous methods test their approach on a single topology or limit the agents to a subset of the network. However, real-world networks are rarely fixed and often add or remove hosts based on demand, link failures, outages, or other factors. We do not confine our research to a network of fixed size and topology, but instead study larger networks and varied topologies to determine the scalability and robustness of the approach. We consider additional topologies and a robust training curriculum that incorporates network topologies to build more general, capable agents. We also use proximal policy optimization (PPO), which offers a good balance between computational complexity and convergence speed.
Urban Visual Intelligence: Studying Cities with AI and Street-level Imagery
The visual dimension of cities has been a fundamental subject in urban
studies, since the pioneering work of scholars such as Sitte, Lynch, Arnheim,
and Jacobs. Several decades later, big data and artificial intelligence (AI)
are revolutionizing how people move, sense, and interact with cities. This
paper reviews the literature on the appearance and function of cities to
illustrate how visual information has been used to understand them. A
conceptual framework, Urban Visual Intelligence, is introduced to
systematically elaborate on how new image data sources and AI techniques are
reshaping the way researchers perceive and measure cities, enabling the study
of the physical environment and its interactions with socioeconomic
environments at various scales. The paper argues that these new approaches
enable researchers to revisit the classic urban theories and themes, and
potentially help cities create environments that are more in line with human
behaviors and aspirations in the digital age.