DeepSplit: Scalable Verification of Deep Neural Networks via Operator Splitting
Analyzing the worst-case performance of deep neural networks against input
perturbations amounts to solving a large-scale non-convex optimization problem,
for which several past works have proposed convex relaxations as a promising
alternative. However, even for reasonably-sized neural networks, these
relaxations are not tractable, and so must be replaced by even weaker
relaxations in practice. In this work, we propose a novel operator splitting
method that can directly solve a convex relaxation of the problem to high
accuracy, by splitting it into smaller sub-problems that often have analytical
solutions. The method is modular and scales to problem instances that were
previously impossible to solve exactly due to their size. Furthermore, the
solver operations are amenable to fast parallelization with GPU acceleration.
We demonstrate our method by obtaining tighter bounds on the worst-case
performance of large convolutional networks in image classification and
reinforcement learning settings.
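To make the splitting idea concrete, the sketch below shows a generic ADMM-style split of a single layer constraint y = ReLU(Wx + b), with a box-constrained input, into two simple sub-problems coupled by a dual variable. It only illustrates the operator-splitting structure under assumed notation; the paper's actual relaxation, sub-problems, and updates may differ, and the x-update here uses a heuristic box projection rather than an exact constrained minimiser.

```python
# Illustrative ADMM-style splitting for one layer constraint y = relu(W x + b),
# x in [x_lo, x_hi]. Variable names and update rules are assumptions for
# exposition, not the paper's formulation.
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def admm_layer_split(W, b, x_lo, x_hi, iters=200):
    """Alternate between two simple sub-problems linked by a scaled dual variable."""
    x = (x_lo + x_hi) / 2.0             # copy of the variable living in the input box
    z = W @ x + b                        # copy of the variable tied to the ReLU output
    u = np.zeros(W.shape[0])             # scaled dual variable for the coupling constraint
    for _ in range(iters):
        # x-update: least-squares fit to the current target, then a heuristic
        # projection onto the input box (elementwise clipping is closed form)
        x = np.clip(np.linalg.lstsq(W, z - b - u, rcond=None)[0], x_lo, x_hi)
        # z-update: a simple closed-form stand-in that re-imposes the ReLU coupling
        z = relu(W @ x + b + u)
        # dual update accumulates the remaining constraint violation
        u = u + (W @ x + b) - z
    return x, z

# Tiny usage example on a random 3x4 layer
rng = np.random.default_rng(0)
W, b = rng.standard_normal((3, 4)), rng.standard_normal(3)
x, z = admm_layer_split(W, b, x_lo=-np.ones(4), x_hi=np.ones(4))
```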
Enhancing the Performance of Multi-Agent Reinforcement Learning for Controlling HVAC Systems
Systems for heating, ventilation and air-conditioning (HVAC) of buildings are
traditionally controlled by rule-based approaches. To reduce the energy
consumption and the environmental impact of HVAC systems, more advanced control
methods such as reinforcement learning are promising. Reinforcement learning
(RL) strategies offer a good alternative, as user feedback and occupant
presence can be integrated more easily. Moreover, multi-agent RL approaches
scale well and can be generalized. In this paper, we first propose a
multi-agent RL framework, based on existing work, that learns to reduce both
energy consumption, by optimizing HVAC control, and the occupants' complaints
about uncomfortable room temperatures. Second, we show how to reduce the
training time required to properly train the RL agents by sharing parameters
between the multiple agents and by applying different pretraining techniques.
Results show that our framework is capable of reducing energy consumption by
around 6% when controlling a complete building, or by 8% for a single room
zone. The occupants' complaints are comparable to, or even fewer than, those
under a rule-based baseline. Additionally, our performance analysis shows that
the training time can be drastically reduced by using parameter sharing.
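As a rough illustration of the parameter-sharing idea, the sketch below lets every zone agent query a single shared policy network, so experience from all zones updates one set of weights. The network sizes, action space, and placeholder loss are assumptions for illustration, not the framework described in the paper.

```python
# Minimal sketch of parameter sharing across zone agents (PyTorch).
# obs_dim, n_actions, n_zones and the toy loss are assumed values, not the paper's.
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    def __init__(self, obs_dim, n_actions):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.ReLU(),
            nn.Linear(64, n_actions),
        )

    def forward(self, obs):
        return self.net(obs)  # per-zone action logits

obs_dim, n_actions, n_zones = 8, 5, 10
policy = SharedPolicy(obs_dim, n_actions)          # one parameter set for all zones
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Each zone contributes its own observation; gradients from every zone
# flow into the same shared parameters.
zone_obs = torch.randn(n_zones, obs_dim)
target_actions = torch.randint(0, n_actions, (n_zones,))  # placeholder targets
optimizer.zero_grad()
loss = nn.functional.cross_entropy(policy(zone_obs), target_actions)
loss.backward()
optimizer.step()
```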
Scalable Evolutionary Hierarchical Reinforcement Learning
This paper investigates a novel method combining Scalable Evolution Strategies
(S-ES) and Hierarchical Reinforcement Learning (HRL). S-ES, named for its
excellent scalability, was popularised after demonstrating performance
comparable to state-of-the-art policy gradient methods. However, S-ES has not
been tested in conjunction with HRL methods, which enable temporal abstraction
and thus allow agents to tackle more challenging problems. We introduce a novel
method merging S-ES and HRL, which creates a highly scalable algorithm that is
also efficient in terms of compute time. We demonstrate that the proposed
method benefits from S-ES's scalability and indifference to delayed rewards.
This results in our main contribution: significantly higher learning speed and
competitive performance compared to gradient-based HRL methods across a range
of tasks.
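For readers unfamiliar with the S-ES side, the following sketch shows a basic OpenAI-style evolution strategies update with antithetic sampling and rank normalisation, which is what makes the approach embarrassingly parallel and insensitive to reward delay. How the evolved parameters are wired into the hierarchical controller is not shown, and the function names and hyperparameters are assumptions rather than the paper's implementation.

```python
# Basic evolution-strategies update (antithetic sampling + rank normalisation).
# In an HRL setting, evaluate() would roll out the policy over (sub-)goal episodes;
# here it is a stand-in fitness function for illustration.
import numpy as np

def es_update(theta, evaluate, n_pairs=32, sigma=0.1, lr=0.02, rng=None):
    rng = rng or np.random.default_rng()
    eps = rng.standard_normal((n_pairs, theta.size))
    # Antithetic sampling: evaluate +/- perturbations and take the difference
    returns = np.array([evaluate(theta + sigma * e) - evaluate(theta - sigma * e)
                        for e in eps])
    # Rank-normalise returns so the update is robust to reward scale and delay
    ranks = returns.argsort().argsort() / (n_pairs - 1) - 0.5
    grad = (ranks[:, None] * eps).sum(axis=0) / (n_pairs * sigma)
    return theta + lr * grad

# Toy usage: maximise -||theta||^2
theta = np.ones(10)
for _ in range(50):
    theta = es_update(theta, evaluate=lambda t: -np.sum(t ** 2))
```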
Towards Run-time Efficient Hierarchical Reinforcement Learning
This paper investigates a novel method combining Scalable Evolution Strategies
(S-ES) and Hierarchical Reinforcement Learning (HRL). S-ES, named for its
excellent scalability, was popularised after demonstrating performance
comparable to state-of-the-art policy gradient methods. However, S-ES has not
been tested in conjunction with HRL methods, which enable temporal abstraction
and thus allow agents to tackle more challenging problems. We introduce a novel
method merging S-ES and HRL, which creates a highly scalable algorithm that is
also efficient in terms of compute time. We demonstrate that the proposed
method benefits from S-ES's scalability and indifference to delayed rewards.
This results in our main contribution: significantly higher learning speed and
competitive performance compared to gradient-based HRL methods across a range
of tasks.
Autonomous Drone Landings on an Unmanned Marine Vehicle using Deep Reinforcement Learning
This thesis describes the integration of an Unmanned Surface Vehicle (USV) and an Unmanned Aerial Vehicle (UAV, also commonly known as a drone) in a single Multi-Agent System (MAS). In marine robotics, the advantage offered by a MAS consists of exploiting the key features of one robot to compensate for the shortcomings of the other. In this way, a USV can serve as the landing platform to alleviate the need for a UAV to remain airborne for long periods of time, whilst the latter can increase the overall environmental awareness thanks to its ability to cover large portions of the surrounding environment with one or more onboard cameras. There are numerous potential applications in which this system can be used, such as deployment in search and rescue missions, water and coastal monitoring, and reconnaissance and force protection, to name but a few.
The theory developed is of a general nature. The landing manoeuvre has been accomplished mainly by identifying, through artificial vision techniques, a fiducial marker placed on a flat surface serving as a landing platform. The raison d'etre for the thesis was to propose a new solution for autonomous landing that relies solely on onboard sensors and requires minimal or no communication between the vehicles. To this end, initial work solved the problem using only data from the cameras mounted on the in-flight drone. In situations in which the tracking of the marker is interrupted, the current position of the USV is estimated and integrated into the control commands. The limitations of the classic control theory used in this approach suggested the need for a new solution that exploits the flexibility of intelligent methods, such as fuzzy logic or artificial neural networks. The recent achievements of deep reinforcement learning (DRL) techniques in end-to-end control, such as playing the Atari video-game suite, represented a fascinating yet challenging new way to see and address the landing problem. Therefore, novel architectures were designed for approximating the action-value function of a Q-learning algorithm and used to map raw input observations to high-level navigation actions. In this way, the UAV learnt how to land from high altitude without any human supervision, using only low-resolution grey-scale images, with accuracy and robustness. Both approaches have been implemented on a simulated test-bed based on the Gazebo simulator and the model of the Parrot AR-Drone. The solution based on DRL was further verified experimentally using the Parrot Bebop 2 in a series of trials. The outcomes demonstrate that both of these innovative methods are feasible and practicable, not only in an outdoor marine scenario but also in indoor ones.
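As a rough sketch of the kind of action-value network the thesis describes, the code below maps a stack of low-resolution grey-scale frames to Q-values over a small set of discrete navigation actions. The layer sizes, frame resolution, and action set are assumptions for illustration, not the architectures actually designed in the thesis.

```python
# Illustrative DQN-style action-value network for landing from grey-scale frames.
# Shapes and the action set are assumed, not the thesis' exact architecture.
import torch
import torch.nn as nn

class LandingQNet(nn.Module):
    def __init__(self, n_frames=4, n_actions=6):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(256), nn.ReLU(),   # infers the flattened size at first call
            nn.Linear(256, n_actions),       # one Q-value per navigation action
        )

    def forward(self, frames):
        return self.head(self.conv(frames))

# Greedy action selection from an 84x84 grey-scale frame stack
qnet = LandingQNet()
frames = torch.zeros(1, 4, 84, 84)           # batch of one stacked observation
action = qnet(frames).argmax(dim=1)
```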
Algorithms for Optimal Paths of One, Many, and an Infinite Number of Agents
In this dissertation, we provide efficient algorithms for modeling the behavior of a single agent, multiple agents, and a continuum of agents. For a single agent, we combine the modeling framework of optimal control with advances in optimization splitting in order to efficiently find optimal paths for problems in very high dimensions, thus alleviating the curse of dimensionality. For a multiple, but finite, number of agents, we take the framework of multi-agent reinforcement learning and utilize imitation learning in order to decentralize a centralized expert, thus obtaining optimal agents that act in a decentralized fashion. For a continuum of agents, we take the framework of mean-field games and use two neural networks, which we train in an alternating scheme, in order to efficiently find optimal paths for high-dimensional and stochastic problems. These tools cover a wide variety of use cases that can be immediately deployed for practical applications.
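The two-network alternating scheme mentioned for the mean-field-game setting can be illustrated roughly as below: one network is updated while the other is held fixed, and the roles swap every step. The losses here are deliberately generic placeholders; the dissertation's actual objectives are not reproduced, and the network names phi and rho are assumptions for exposition.

```python
# Skeleton of an alternating training loop for two networks (PyTorch).
# phi stands for a value-like network and rho for a flow/density-like network;
# both losses are placeholders, not the dissertation's objectives.
import torch
import torch.nn as nn

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(), nn.Linear(64, out_dim))

phi = mlp(3, 1)   # e.g. value function phi(t, x) for a 2-D state plus time
rho = mlp(3, 2)   # e.g. a velocity/flow field for the agent population
opt_phi = torch.optim.Adam(phi.parameters(), lr=1e-3)
opt_rho = torch.optim.Adam(rho.parameters(), lr=1e-3)

for step in range(100):
    tx = torch.rand(256, 3)                            # sampled (t, x) collocation points
    if step % 2 == 0:
        # phi-step: update the value network while the flow network is frozen
        residual = phi(tx) + 0.5 * rho(tx).detach().pow(2).sum(-1, keepdim=True)
        loss = residual.pow(2).mean()                  # placeholder residual loss
        opt_phi.zero_grad(); loss.backward(); opt_phi.step()
    else:
        # rho-step: update the flow against the current (frozen) value network
        loss = (rho(tx).pow(2).sum(-1, keepdim=True) + phi(tx).detach()).mean()
        opt_rho.zero_grad(); loss.backward(); opt_rho.step()
```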