
Sample efficiency, transfer learning and interpretability for deep reinforcement learning

Author: Arulkumaran Kailash
Publication venue: Bioengineering, Imperial College London
Publication date: 01/06/2020

Deep learning has revolutionised artificial intelligence: applying increased compute to train neural networks on large datasets has improved real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has shown similarly impressive results in board and video games, but has been less successful in real-world applications such as robotic control. To address this, I have investigated three factors that hinder wider deployment of DRL: sample efficiency, transfer learning, and interpretability. To reduce the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms. This work led to two contributions: online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy to improve their sample efficiency and final performance. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference caused by negative transfer. I additionally investigated whether multi-headed architectures reduce catastrophic forgetting in the continual learning setting. Multiple heads worked well in a simple environment but were of limited use in a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared DRL agents trained with and without visual domain randomisation (DR), a popular technique for achieving simulation-to-real transfer, and provided a series of tests that can be applied before real-world deployment.
One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that these representations are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced by ablations on the recurrent state.
Open Access
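The abstract names the maximum entropy mellowmax policy without defining it. The sketch below assumes the standard formulation from Asadi and Littman. For values x_1..x_n, mellowmax is mm_ω(x) = (1/ω) log((1/n) Σ_i exp(ω x_i)). The maximum entropy mellowmax policy is a Boltzmann policy π_i ∝ exp(β x_i) whose inverse temperature β satisfies Σ_i exp(β (x_i - mm_ω(x))) (x_i - mm_ω(x)) = 0. The function names, the example ω value and the use of bisection to find β are hypothetical illustrative choices. They are not taken from the thesis, whose EC implementation may differ.

```python
import math
from typing import List


def mellowmax(values: List[float], omega: float) -> float:
    """mm_omega(x) = (1/omega) * log(mean(exp(omega * x))), assuming omega > 0."""
    m = max(values)
    # Factor out exp(omega * m) before summing (log-sum-exp trick) to avoid overflow.
    s = sum(math.exp(omega * (v - m)) for v in values)
    return m + math.log(s / len(values)) / omega


def max_entropy_mellowmax_policy(values: List[float], omega: float,
                                 iters: int = 200) -> List[float]:
    """Boltzmann policy whose inverse temperature beta satisfies
    sum_i exp(beta * d_i) * d_i = 0, where d_i = x_i - mellowmax(x)."""
    n = len(values)
    mm = mellowmax(values, omega)
    diffs = [v - mm for v in values]
    d_max = max(diffs)
    if d_max <= 1e-12:
        # All values (numerically) equal: fall back to the uniform policy.
        return [1.0 / n] * n

    def f(beta: float) -> float:
        # Root condition rescaled by exp(-beta * d_max) > 0: same sign, no overflow.
        return sum(math.exp(beta * (d - d_max)) * d for d in diffs)

    # f(0) <= 0 because mellowmax >= mean, and f increases with beta,
    # so first grow an upper bracket for the root...
    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:
        lo, hi = hi, 2.0 * hi
    # ...then bisect within [lo, hi].
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    # Numerically stable softmax of beta * x.
    x_max = max(values)
    weights = [math.exp(beta * (v - x_max)) for v in values]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    q = [1.0, 2.0, 3.0]  # hypothetical action values for a single state
    print(max_entropy_mellowmax_policy(q, omega=5.0))
```

Because β is fixed by the root condition, the policy's expected value Σ_i π_i x_i equals mm_ω(x). Bisection suffices here because the left-hand side is strictly increasing in β whenever the values are not all equal, so there is a single root with β ≥ 0.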

Spiral - Imperial College Digital Repository

An analysis of emergency tracheal intubations in critically ill patients by critical care trainees

Authors: Arulkumaran Kailash, Arulkumaran Nishkantha, Cecconi Maurizio, McLaren Charles S, Philips Barbara J
Publication venue: SAGE Publications
Publication date: 01/08/2018

Introduction: We evaluated intensive care medicine trainees' practice of emergency intubations in the United Kingdom.

Methods: Retrospective analysis of 881 in-hospital emergency intubations over a three-year period, recorded in an online trainee logbook.

Results: Emergency intubations were less frequent out of hours than in hours, on both weekdays and weekends. Complications occurred in 9% of cases, with no association with time of day or day of week (p = 0.860). Complications were associated with higher Cormack and Lehane grades (p = 0.004) and a greater number of intubation attempts (p < 0.001), but not with American Society of Anesthesiologists (ASA) grade. Capnography was used in ≥99% of intubations in all locations except wards (85%; p = 0.001). Ward patients were the oldest (p < 0.001), had higher ASA grades (p < 0.001) and had the lowest Glasgow Coma Scale scores (p < 0.001).

Conclusions: Complications of intubation are associated with higher Cormack and Lehane grades and the number of attempts, but not with time of day or day of week. The uptake of capnography is reassuring, although there is scope for improvement on wards.

Crossref

UCL Discovery

Sussex Research Online