156 research outputs found
Reinforcement learning in large state action spaces
Reinforcement learning (RL) is a promising framework for training intelligent agents that learn to optimize long-term utility by interacting directly with the environment. Creating RL methods that scale to large state-action spaces is critical for the real-world deployment of RL systems. However, several challenges limit the applicability of RL to large-scale settings: difficulties with exploration, low sample efficiency, computational intractability, task constraints such as decentralization, and a lack of guarantees about important properties such as performance, generalization, and robustness in potentially unseen scenarios.
This thesis is motivated by the goal of bridging this gap. We propose several principled algorithms and frameworks for studying and addressing the above challenges in RL. The proposed methods cover a wide range of RL settings: single-agent and multi-agent systems (MAS) with all variations of the latter, prediction and control, model-based and model-free methods, and value-based and policy-based methods. We present the first results on several different problems, e.g., a tensorization of the Bellman equation that allows exponential sample-efficiency gains (Chapter 4), provable suboptimality arising from structural constraints in MAS (Chapter 3), combinatorial generalization results in cooperative MAS (Chapter 5), generalization results under observation shifts (Chapter 7), and learning deterministic policies in a probabilistic RL framework (Chapter 6). Our algorithms exhibit provably enhanced performance and sample efficiency along with better scalability. Additionally, we shed light on the generalization behavior of the agents under different frameworks. These properties are driven by the use of several advanced tools, e.g., statistical machine learning, state abstraction, variational inference, and tensor theory.
In summary, the contributions in this thesis significantly advance progress towards making RL agents ready for large-scale, real-world applications.
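For context on the tensorization contribution: the object being tensorized is the standard Bellman optimality equation (shown below in its generic form; the thesis's specific factored/tensorized variant is not given in this abstract):

```latex
Q^{*}(s, a) \;=\; R(s, a) \;+\; \gamma \sum_{s'} P(s' \mid s, a)\, \max_{a'} Q^{*}(s', a')
```

A tensorized treatment would exploit structure in the state-action space (e.g., factored states or actions) so that $Q^{*}$ need not be estimated entry-by-entry over the full joint space, which is the source of the claimed sample-efficiency gains.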
Optimal Status Updates for Minimizing Age of Correlated Information in IoT Networks with Energy Harvesting Sensors
Many real-time applications of the Internet of Things (IoT) need to deal with correlated information generated by multiple sensors, so the design of efficient status-update strategies that minimize the Age of Correlated Information (AoCI) is a key problem. In this paper, we consider an IoT network consisting of sensors equipped with energy harvesting (EH) capability. We optimize the average AoCI at the data fusion center (DFC) by appropriately managing the energy harvested by the sensors, whose true battery states are unobservable during the decision-making process. In particular, we first formulate the dynamic status-update procedure as a partially observable Markov decision process (POMDP), where the environmental dynamics are unknown to the DFC. To address the challenges arising from the causality of energy usage, the unknown environmental dynamics, the unobservability of the sensors' true battery states, and the large-scale discrete action space, we devise a deep reinforcement learning (DRL)-based dynamic status-update algorithm. The algorithm leverages the advantages of the soft actor-critic and long short-term memory techniques, and incorporates our proposed action decomposition and mapping mechanism. Extensive simulations validate the effectiveness of the proposed algorithm against available DRL algorithms for POMDPs.
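The abstract does not specify the action decomposition and mapping mechanism. One standard way to tame an action space that grows exponentially with the number of sensors is a mixed-radix decomposition of the flat joint-action index into per-sensor sub-actions; the sketch below illustrates that generic idea only (all names are ours, not the paper's):

```python
def decompose(joint_action: int, n_sensors: int, n_choices: int) -> list:
    """Split a flat joint-action index into one sub-action per sensor
    (mixed-radix, i.e. base-n_choices, decomposition)."""
    subs = []
    for _ in range(n_sensors):
        subs.append(joint_action % n_choices)  # this sensor's sub-action
        joint_action //= n_choices
    return subs

def compose(sub_actions: list, n_choices: int) -> int:
    """Inverse mapping: per-sensor sub-actions back to one flat index."""
    joint = 0
    for sub in reversed(sub_actions):
        joint = joint * n_choices + sub
    return joint
```

With this mapping, a learner can output one small decision per sensor (N * k values) instead of ranking all k^N joint actions, which is the usual motivation for such decompositions.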
Towards an Unsupervised Bayesian Network Pipeline for Explainable Prediction, Decision Making and Discovery
An unsupervised learning pipeline for discrete Bayesian networks is proposed to facilitate prediction, decision making, discovery of patterns, and transparency in challenging real-world AI applications, while contending with data limitations. We explore methods for discretizing data and, notably, apply the pipeline to the prediction and prevention of preterm birth.
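The abstract mentions discretizing data as a step before learning a discrete Bayesian network. One common scheme (shown here as a generic illustration; the paper's chosen method may differ) is equal-frequency binning via quantiles:

```python
import numpy as np

def quantile_discretize(values, n_bins=3):
    """Discretize a continuous variable into n_bins equal-frequency
    categories (labels 0..n_bins-1), a common preprocessing step
    before learning a discrete Bayesian network."""
    # interior quantile cut points, e.g. the 1/3 and 2/3 quantiles for 3 bins
    edges = np.quantile(values, np.linspace(0, 1, n_bins + 1)[1:-1])
    return np.digitize(values, edges)
```

Equal-frequency bins keep the per-category counts balanced, which helps when fitting conditional probability tables from limited data.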
Approximate information state based convergence analysis of recurrent Q-learning
Despite the large literature on reinforcement learning (RL) algorithms for partially observable Markov decision processes (POMDPs), a complete theoretical understanding is still lacking. In a partially observable setting, the history of data available to the agent grows over time, so most practical algorithms either truncate the history to a finite window or compress it using a recurrent neural network, leading to an agent state that is non-Markovian. In this paper, it is shown that, despite the lack of the Markov property, recurrent Q-learning (RQL) converges in the tabular setting. Moreover, it is shown that the quality of the converged limit depends on the quality of the representation, which is quantified in terms of what is known as an approximate information state (AIS). Based on this characterization of the approximation error, a variant of RQL with AIS losses is presented. This variant performs better than a strong baseline for RQL that does not use AIS losses. It is demonstrated that there is a strong correlation between the performance of RQL over time and the loss associated with the AIS representation.
Comment: 25 pages, 6 figures
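To make the object of the convergence result concrete: tabular RQL runs ordinary Q-learning on an agent state produced by a fixed recurrent update (e.g. a finite observation window), even though that state is non-Markovian. The sketch below is our own minimal rendering of that loop, not the paper's algorithm; all names and the environment interface are assumptions:

```python
import random
from collections import defaultdict

def tabular_rql(env_step, reset, update_state, n_episodes=500,
                alpha=0.1, gamma=0.9, eps=0.1, n_actions=2, horizon=20):
    """Tabular recurrent Q-learning: epsilon-greedy Q-learning over an
    agent state z maintained by a fixed recurrent update_state rule."""
    Q = defaultdict(float)  # Q[(agent_state, action)]
    for _ in range(n_episodes):
        obs = reset()
        z = update_state(None, obs)  # initial agent state
        for _ in range(horizon):
            if random.random() < eps:
                a = random.randrange(n_actions)  # explore
            else:
                a = max(range(n_actions), key=lambda x: Q[(z, x)])  # exploit
            obs2, r, done = env_step(a)
            z2 = update_state(z, obs2)  # recurrent (non-Markovian) state
            target = r if done else r + gamma * max(
                Q[(z2, b)] for b in range(n_actions))
            Q[(z, a)] += alpha * (target - Q[(z, a)])
            z = z2
            if done:
                break
    return Q
```

The paper's point is that this iteration still converges, and that the quality of the limit is governed by how good `update_state` is as an approximate information state.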
Pushing the Boundaries of Spacecraft Autonomy and Resilience with a Custom Software Framework and Onboard Digital Twin
This research addresses the high CubeSat mission failure rates caused by inadequate software and overreliance on ground control. By applying a reliable design methodology to flight software development and building an onboard digital twin platform with fault-prediction capabilities, this study provides a solution that increases satellite resilience and autonomy, reducing the risk of mission failure. These findings have implications for spacecraft of all sizes, paving the way for more resilient space missions.
Assessment of the Robustness of Deep Neural Networks (DNNs)
In the past decade, Deep Neural Networks (DNNs) have demonstrated outstanding performance in various domains. Recently, however, researchers have shown that DNNs are surprisingly vulnerable to adversarial attacks. For instance, adding a small, human-imperceptible perturbation to an input image can fool a DNN into making an arbitrarily wrong prediction with high confidence. This raises serious concerns about the readiness of deep learning models for safety-critical applications such as surveillance systems, autonomous vehicles, and medical applications. Hence, it is vital to investigate the performance of DNNs in an adversarial environment. In this thesis, we study the robustness of DNNs in three aspects: adversarial attacks, adversarial defence, and robustness verification. First, we address robustness problems in video models and propose DeepSAVA, a sparse adversarial attack on video models that adds human-imperceptible perturbations to the crucial frames of an input video to fool classifiers. Additionally, we construct a novel adversarial training framework based on the perturbations generated by DeepSAVA to increase the robustness of video classification models. The results show that DeepSAVA mounts a relatively sparse attack on video models yet achieves state-of-the-art attack success rates and adversarial transferability. Next, we address the challenges of robustness verification in two deep learning models: 3D point cloud models and cooperative multi-agent reinforcement learning models (c-MARLs). Robustness verification aims to provide solid proof of robustness within an input space against any adversarial attack. To verify the robustness of 3D point cloud models, we propose an efficient verification framework, 3DVerifier, which tackles the challenges of cross-non-linearity operations in multiplication layers and the high computational complexity of high-dimensional point cloud inputs.
We use a linear relaxation function to bound the multiplication layer and combine forward and backward propagation to compute certified bounds on the outputs of the point cloud models. For certifying c-MARLs, we propose a novel certification method, the first to leverage a scalable approach for c-MARLs to determine actions with guaranteed certified bounds. The challenges of c-MARL certification are the uncertainty that accumulates as the number of agents increases and the potentially small impact that changing a single agent's action has on the global team reward; these challenges prevent us from using existing algorithms directly. We employ the false discovery rate (FDR) controlling procedure, accounting for the importance of each agent, to certify per-state robustness, and we propose a tree-search-based algorithm to find a lower bound on the global reward under the minimal certified perturbation. The experimental results show that the obtained certification bounds are much tighter than those of state-of-the-art RL certification solutions. In summary, this thesis focuses on assessing the robustness of deep learning models that are widely applied in safety-critical systems but rarely studied by the community. It not only investigates the motivation and challenges of assessing the robustness of these deep learning models but also proposes novel and effective approaches to tackling these challenges.
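The certification step above relies on an FDR controlling procedure. The classical Benjamini-Hochberg step-up procedure is the standard such controller; it is sketched below in its generic form (the thesis's per-agent variant may weight hypotheses differently):

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure: return the sorted indices of
    hypotheses rejected while controlling the false discovery rate at
    level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    k = 0  # largest rank whose p-value clears its BH threshold
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank
    return sorted(order[:k])  # reject the k smallest p-values
```

Because the threshold `rank * alpha / m` grows with the rank, BH rejects more hypotheses than a Bonferroni correction at the same level, which matters when the number of agents (hypotheses) is large.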