Sample efficiency, transfer learning and interpretability for deep reinforcement learning
Deep learning has revolutionised artificial intelligence, where the application of increased compute to train neural networks on large datasets has resulted in improvements in real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors prohibiting further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms, resulting in the application of online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy for improving the sample efficiency and final performance of the same EC algorithms. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting under the continual learning setting. While the use of multiple heads worked well within a simple environment, it was of limited use within a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents both with and without visual domain randomisation (DR), a popular technique to achieve simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment. 
One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without requiring complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced through ablations on the recurrent state.
Open Access
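As a rough illustration of the maximum entropy mellowmax policy named in the abstract above, the following is a minimal sketch, not the thesis's implementation. It assumes the standard mellowmax operator, mm_omega(x) = log(mean(exp(omega * x))) / omega with omega > 0, and a softmax policy whose inverse temperature beta is chosen so that the policy's expected value equals the mellowmax value. The function names, the bisection search, and the example numbers are all assumptions made for this sketch.

```python
import math


def mellowmax(values, omega):
    """Mellowmax of a list of action values, computed in a numerically stable way."""
    m = max(omega * v for v in values)
    total = sum(math.exp(omega * v - m) for v in values)
    return (m + math.log(total / len(values))) / omega


def mellowmax_policy(values, omega, iters=100):
    """Softmax policy whose expected value matches mellowmax (assumes omega > 0).

    beta is the root of sum_a exp(beta * adv_a) * adv_a = 0, where
    adv_a = values[a] - mellowmax(values, omega). The root is found by bisection.
    """
    mm = mellowmax(values, omega)
    adv = [v - mm for v in values]
    a_max = max(adv)
    if a_max <= 1e-12:
        # All values are (numerically) equal, so the policy is uniform.
        return [1.0 / len(values)] * len(values)

    def g(beta):
        # Same sign as the root equation, rescaled so exp() cannot overflow.
        return sum(math.exp(beta * (a - a_max)) * a for a in adv)

    lo, hi = 0.0, 1.0
    while g(hi) < 0.0 and hi < 1e6:
        hi *= 2.0  # grow the bracket until it contains the root
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    beta = 0.5 * (lo + hi)

    v_max = max(values)
    weights = [math.exp(beta * (v - v_max)) for v in values]
    z = sum(weights)
    return [w / z for w in weights]


if __name__ == "__main__":
    q_values = [1.0, 2.0, 3.0]  # hypothetical action values
    print(mellowmax(q_values, 5.0))
    print(mellowmax_policy(q_values, 5.0))
```

In this sketch, small values of omega pull the mellowmax value towards the mean of the action values and large values pull it towards the maximum, so omega controls how greedy the resulting policy is.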
Online Matrix Completion with Side Information
This thesis considers the problem of binary matrix completion with side information in the online setting and the applications thereof. The side information provides additional information on the rows and columns and can yield improved results compared to when such information is not available. We present efficient and general algorithms in transductive and inductive models. The performance guarantees that we prove are with respect to the matrix complexity measures of the max-norm and the margin complexity. We apply our bounds to the hypothesis class of biclustered matrices. Such matrices can be permuted through the rows and columns into homogeneous latent blocks. This class is a natural choice for our problem since the margin complexity and max-norm of these matrices have an upper bound that is easy to interpret in terms of the latent dimensions. We also apply our algorithms to a novel online multitask setting with RKHS hypothesis classes. In this setting, each task is partitioned into a sequence of segments, where a hypothesis is associated with each segment. Our algorithms are designed to exploit the scenario where the number of associated hypotheses is much smaller than the number of segments. We prove performance guarantees that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting, this is analogous to switching with long-term memory in the sense of [Bousquet and Warmuth; 2003].
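To make the notion of a biclustered matrix in the abstract above concrete, here is a small sketch under stated assumptions: each row and each column carries a latent cluster label, and every entry is the sign assigned to its (row cluster, column cluster) pair. The function names and the toy labels are hypothetical and are not taken from the thesis.

```python
def biclustered(row_labels, col_labels, block_signs):
    """Build a +1/-1 matrix whose entry (i, j) is block_signs[row_labels[i]][col_labels[j]]."""
    return [[block_signs[r][c] for c in col_labels] for r in row_labels]


def permute_to_blocks(matrix, row_labels, col_labels):
    """Reorder rows and columns by cluster label so that homogeneous blocks become contiguous."""
    row_order = sorted(range(len(row_labels)), key=lambda i: row_labels[i])
    col_order = sorted(range(len(col_labels)), key=lambda j: col_labels[j])
    return [[matrix[i][j] for j in col_order] for i in row_order]


if __name__ == "__main__":
    # Hypothetical example: 4 rows in 2 row clusters, 3 columns in 2 column clusters.
    row_labels = [0, 1, 0, 1]
    col_labels = [1, 0, 1]
    block_signs = [[1, -1], [-1, 1]]  # one sign per (row cluster, column cluster) pair

    scrambled = biclustered(row_labels, col_labels, block_signs)
    for row in permute_to_blocks(scrambled, row_labels, col_labels):
        print(row)
```

The matrix is fully described by the two label vectors and the small sign table, which is the sense in which the latent dimensions (here 2 by 2) rather than the full matrix size govern its complexity.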
Online Multitask Learning with Long-Term Memory
We introduce a novel online multitask setting. In this setting each task is
partitioned into a sequence of segments that is unknown to the learner.
Associated with each segment is a hypothesis from some hypothesis class. We
give algorithms that are designed to exploit the scenario where there are many
such segments but significantly fewer associated hypotheses. We prove regret
bounds that hold for any segmentation of the tasks and any association of
hypotheses to the segments. In the single-task setting this is equivalent to
switching with long-term memory in the sense of [Bousquet and Warmuth; 2003].
We provide an algorithm that predicts on each trial in time linear in the
number of hypotheses when the hypothesis class is finite. We also consider
infinite hypothesis classes from reproducing kernel Hilbert spaces for which we
give an algorithm whose per trial time complexity is cubic in the number of
cumulative trials. In the single-task special case this is the first example of
an efficient regret-bounded switching algorithm with long-term memory for a
non-parametric hypothesis class.
Unconstrained design: improving multitasking with in-vehicle information systems through enhanced situation awareness
In the age of information, in-vehicle multitasking is inevitable. The popularity of the automobile, combined with the demands of everyday life, creates a need to do more than simply focus on the road. Situation Awareness (SA) is a theory that allows designers to understand how operators interact in dynamic, complex environments. Unconstrained Design is proposed as a way of enhancing multitasking performance in-vehicle. This paper presents an experimental investigation into human-machine interface concepts that aim to support drivers in multitasking in-vehicle when frequent task switching is required. Two SA-based approaches were investigated: one that focussed on supporting preparation for a Non-Driving Related Activity (NDRA), and one that focussed on supporting the Driving Related Activity (DRA) when an NDRA is active. While multitasking, Contextual Cueing, using a Head-up Display, produced significant reductions in NDRA response time, while an auditory lane keeping aid increased the amount of time a driver spent in the central region of a lane. This provides evidence that using SA and Unconstrained Design as a philosophy for designing IVIS that support drivers’ ability to multitask in-vehicle could lead to task performance improvements.
Jaguar Land Rover
Enabling Deep Intelligence on Embedded Systems
As deep learning for resource-constrained systems becomes more popular, we see an increased number of intelligent embedded systems such as IoT devices, robots, autonomous vehicles, and the plethora of portable, wearable, and mobile devices that are feature-packed with a wide variety of machine learning tasks. However, the performance of DNNs (deep neural networks) running on an embedded system is significantly limited by the platform's CPU, memory, and battery size, and their scope is limited to simplistic inference tasks only. This dissertation proposes on-device deep learning algorithms and supporting hardware designs, enabling embedded systems to efficiently perform deep intelligent tasks (i.e., deep neural networks) that are high-memory-footprint, compute-intensive, and energy-hungry beyond their limited computing resources. We call such on-device deep intelligence on embedded systems Embedded Deep Intelligence. Specifically, we introduce resource-aware learning strategies devised to overcome the four fundamental constraints of embedded systems imposed on the way towards Embedded Deep Intelligence, i.e., in-memory multitask learning via introducing the concept of Neural Weight Virtualization, adaptive real-time learning via introducing the concept of SubFlow, opportunistic accelerated learning via introducing the concept of Neuro.ZERO, and energy-aware intermittent learning, which tackle the problems of the small size of memory, dynamic timing constraint, low-computing capability, and limited energy, respectively. Once deployed in the field with the proposed resource-aware learning strategies, embedded systems are not only able to perform deep inference tasks on sensor data but also update and re-train their learning models at run-time without requiring any help from any external system.
Such an on-device learning capability of Embedded Deep Intelligence makes an embedded intelligent system real-time, privacy-aware, secure, autonomous, untethered, responsive, and adaptive without concern for its limited resources.
Doctor of Philosophy
Exploring the effects of robotic design on learning and neural control
The ongoing deep learning revolution has allowed computers to outclass humans
in various games and perceive features imperceptible to humans during
classification tasks. Current machine learning techniques have clearly
distinguished themselves in specialized tasks. However, we have yet to see
robots capable of performing multiple tasks at an expert level. Most work in
this field is focused on the development of more sophisticated learning
algorithms for a robot's controller given a largely static and presupposed
robotic design. By focusing on the development of robotic bodies, rather than
neural controllers, I have discovered that robots can be designed such that
they overcome many of the current pitfalls encountered by neural controllers in
multitask settings. Through this discovery, I also present novel metrics to
explicitly measure the learning ability of a robotic design and its resistance
to common problems such as catastrophic interference.
Traditionally, the physical robot design requires human engineers to plan
every aspect of the system, which is expensive and often relies on human
intuition. In contrast, within the field of evolutionary robotics, evolutionary
algorithms are used to automatically create optimized designs; however, such
designs are often still limited in their ability to perform in a multitask
setting. The metrics created and presented here give a novel path to automated
design that allows evolved robots to synergize with their controller to improve
the computational efficiency of their learning while overcoming catastrophic
interference.
Overall, this dissertation intimates the ability to automatically design
robots that are more general purpose than current robots and that can perform
various tasks while requiring less computation.
Comment: arXiv admin note: text overlap with arXiv:2008.0639