    Sample efficiency, transfer learning and interpretability for deep reinforcement learning

    Deep learning has revolutionised artificial intelligence, where the application of increased compute to train neural networks on large datasets has resulted in improvements in real-world applications such as object detection, text-to-speech synthesis and machine translation. Deep reinforcement learning (DRL) has similarly shown impressive results in board and video games, but less so in real-world applications such as robotic control. To address this, I have investigated three factors prohibiting further deployment of DRL: sample efficiency, transfer learning, and interpretability. To decrease the amount of data needed to train DRL systems, I have explored various storage strategies and exploration policies for episodic control (EC) algorithms, resulting in the application of online clustering to improve the memory efficiency of EC algorithms, and the maximum entropy mellowmax policy for improving the sample efficiency and final performance of the same EC algorithms. To improve performance during transfer learning, I have shown that a multi-headed neural network architecture trained using hierarchical reinforcement learning can retain the benefits of positive transfer between tasks while mitigating the interference effects of negative transfer. I additionally investigated the use of multi-headed architectures to reduce catastrophic forgetting under the continual learning setting. While the use of multiple heads worked well within a simple environment, it was of limited use within a more complex domain, indicating that this strategy does not scale well. Finally, I applied a wide range of quantitative and qualitative techniques to better interpret trained DRL agents. In particular, I compared the effects of training DRL agents both with and without visual domain randomisation (DR), a popular technique to achieve simulation-to-real transfer, providing a series of tests that can be applied before real-world deployment. One of the major findings is that DR produces more entangled representations within trained DRL agents, indicating quantitatively that they are invariant to nuisance factors associated with the DR process. Additionally, while my environment allowed agents trained without DR to succeed without requiring complex recurrent processing, all agents trained with DR appear to integrate information over time, as evidenced through ablations on the recurrent state.
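
    The maximum entropy mellowmax policy mentioned above has a compact formulation that a short sketch can make concrete. The snippet below is a minimal illustration, assuming the standard mellowmax operator and a bisection root-finder for the Boltzmann temperature; the function names, the omega value, and the tolerance are my own choices rather than the thesis' code.

```python
import numpy as np

def mellowmax(q, omega=5.0):
    """Mellowmax operator: (1/omega) * log(mean(exp(omega * q)))."""
    q = np.asarray(q, dtype=float)
    shifted = omega * (q - q.max())          # shift for numerical stability
    return q.max() + np.log(np.mean(np.exp(shifted))) / omega

def max_entropy_mellowmax_policy(q, omega=5.0, tol=1e-8):
    """Boltzmann policy whose expected Q-value equals mellowmax(q).

    The inverse temperature beta solves
        f(beta) = sum_a exp(beta * (q_a - mm)) * (q_a - mm) = 0,
    where mm = mellowmax(q). f is non-decreasing and f(0) <= 0,
    so the root can be bracketed and found by bisection.
    """
    q = np.asarray(q, dtype=float)
    mm = mellowmax(q, omega)
    adv = q - mm                              # advantage relative to mellowmax
    if np.allclose(adv, 0.0):                 # all Q-values equal: uniform policy
        return np.full(q.size, 1.0 / q.size)

    def f(beta):
        w = np.exp(beta * adv - (beta * adv).max())   # stabilised weights
        return float(np.dot(w, adv))

    lo, hi = 0.0, 1.0
    while f(hi) < 0.0:                        # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:                      # plain bisection
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if f(mid) < 0.0 else (lo, mid)
    beta = 0.5 * (lo + hi)

    logits = beta * adv
    p = np.exp(logits - logits.max())
    return p / p.sum()

q_values = np.array([1.0, 2.0, 0.5, 1.5])
pi = max_entropy_mellowmax_policy(q_values)
print(pi, "expected Q under pi:", float(np.dot(pi, q_values)))
```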

    Online Matrix Completion with Side Information

    This thesis considers the problem of binary matrix completion with side information in the online setting and the applications thereof. The side information provides additional information on the rows and columns and can yield improved results compared to when such information is not available. We present efficient and general algorithms in transductive and inductive models. The performance guarantees that we prove are with respect to the matrix complexity measures of the max-norm and the margin complexity. We apply our bounds to the hypothesis class of biclustered matrices. Such matrices can be permuted through their rows and columns into homogeneous latent blocks. This class is a natural choice for our problem since the margin complexity and max-norm of these matrices have an upper bound that is easy to interpret in terms of the latent dimensions. We also apply our algorithms to a novel online multitask setting with RKHS hypothesis classes. In this setting, each task is partitioned into a sequence of segments, where a hypothesis is associated with each segment. Our algorithms are designed to exploit the scenario where the number of associated hypotheses is much smaller than the number of segments. We prove performance guarantees that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting, this is analogous to switching with long-term memory in the sense of [Bousquet and Warmuth; 2003].
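
    To make the biclustered hypothesis class concrete, here is a minimal sketch of how such a matrix can be generated: every entry is determined by a small k x l block matrix together with row and column cluster assignments, so the effective complexity is governed by the latent dimensions rather than the observed size. The sizes, variable names, and random construction are illustrative assumptions, not the thesis' notation.

```python
import numpy as np

rng = np.random.default_rng(0)

m, n = 8, 10          # observed matrix dimensions
k, l = 2, 3           # latent dimensions (number of row / column clusters)

U = rng.choice([-1, 1], size=(k, l))        # block-level signs
row_cluster = rng.integers(0, k, size=m)    # r: rows    -> {0, ..., k-1}
col_cluster = rng.integers(0, l, size=n)    # c: columns -> {0, ..., l-1}

# M[i, j] = U[r(i), c(j)]: permuting rows/columns by cluster assignment
# rearranges M into homogeneous +/-1 blocks.
M = U[np.ix_(row_cluster, col_cluster)]
print(M)
```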

    Online Multitask Learning with Long-Term Memory

    We introduce a novel online multitask setting. In this setting, each task is partitioned into a sequence of segments that is unknown to the learner. Associated with each segment is a hypothesis from some hypothesis class. We give algorithms that are designed to exploit the scenario where there are many such segments but significantly fewer associated hypotheses. We prove regret bounds that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting, this is equivalent to switching with long-term memory in the sense of [Bousquet and Warmuth; 2003]. We provide an algorithm that predicts on each trial in time linear in the number of hypotheses when the hypothesis class is finite. We also consider infinite hypothesis classes from reproducing kernel Hilbert spaces, for which we give an algorithm whose per-trial time complexity is cubic in the number of cumulative trials. In the single-task special case, this is the first example of an efficient regret-bounded switching algorithm with long-term memory for a non-parametric hypothesis class.
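
    For the finite-hypothesis, single-task special case, the long-term-memory behaviour can be illustrated with a rough sketch in the style of mixing past posteriors [Bousquet and Warmuth; 2003]: an exponential-weights update followed by mixing back in the average of past posteriors, so that a previously good hypothesis can be recovered quickly when its segment recurs. The learning rate, switching rate, and uniform mixing scheme below are assumptions for illustration, not the algorithms analysed in this work.

```python
import numpy as np

def mix_past_posteriors(losses, eta=1.0, alpha=0.05):
    """losses: (T, N) array of per-trial expert losses in [0, 1].
    Returns the (T, N) array of weight vectors used on each trial."""
    T, N = losses.shape
    v = np.full(N, 1.0 / N)        # current posterior over the N experts
    past_sum = v.copy()            # running sum of past posteriors (long-term memory)
    past_count = 1
    weights_used = []
    for t in range(T):
        weights_used.append(v.copy())
        v = v * np.exp(-eta * losses[t])   # exponential-weights loss update
        v /= v.sum()
        # Mix back in the average past posterior so that an expert that was
        # good on an earlier segment can be recovered quickly if it recurs.
        v = (1.0 - alpha) * v + alpha * (past_sum / past_count)
        past_sum += v
        past_count += 1
    return np.array(weights_used)

# Toy usage: expert 0 is best early, expert 1 later, then expert 0 again.
rng = np.random.default_rng(0)
losses = np.concatenate([
    np.column_stack([rng.uniform(0.0, 0.2, 50), rng.uniform(0.8, 1.0, 50)]),
    np.column_stack([rng.uniform(0.8, 1.0, 50), rng.uniform(0.0, 0.2, 50)]),
    np.column_stack([rng.uniform(0.0, 0.2, 50), rng.uniform(0.8, 1.0, 50)]),
])
weights = mix_past_posteriors(losses)
print(weights[[0, 60, 110]])
```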

    Enabling Deep Intelligence on Embedded Systems

    As deep learning for resource-constrained systems becomes more popular, we see an increased number of intelligent embedded systems such as IoT devices, robots, autonomous vehicles, and the plethora of portable, wearable, and mobile devices that are feature-packed with a wide variety of machine learning tasks. However, the performance of DNNs (deep neural networks) running on an embedded system is significantly limited by the platform's CPU, memory, and battery size, and their scope is limited to simplistic inference tasks only. This dissertation proposes on-device deep learning algorithms and supporting hardware designs that enable embedded systems to efficiently perform deep intelligent tasks (i.e., deep neural networks) that are high in memory footprint, compute-intensive, and energy-hungry, beyond their limited computing resources. We name such on-device deep intelligence on embedded systems Embedded Deep Intelligence. Specifically, we introduce resource-aware learning strategies devised to overcome the four fundamental constraints of embedded systems that stand in the way of Embedded Deep Intelligence: in-memory multitask learning via the concept of Neural Weight Virtualization, adaptive real-time learning via the concept of SubFlow, opportunistic accelerated learning via the concept of Neuro.ZERO, and energy-aware intermittent learning, which tackle the problems of small memory size, dynamic timing constraints, low computing capability, and limited energy, respectively. Once deployed in the field with the proposed resource-aware learning strategies, embedded systems are not only able to perform deep inference tasks on sensor data but also to update and re-train their learning models at run-time without requiring any help from an external system. Such an on-device learning capability of Embedded Deep Intelligence makes an embedded intelligent system real-time, privacy-aware, secure, autonomous, untethered, responsive, and adaptive without concern for its limited resources.
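
    As a toy illustration of the in-memory multitask direction, the sketch below stores the weights of several task networks as fixed-size pages in one shared pool and lets near-identical pages be reused across tasks. The page size, the greedy matching rule, and all names are my own assumptions; this is not the Neural Weight Virtualization implementation.

```python
import numpy as np

PAGE_SIZE = 64

def to_pages(flat_weights):
    """Split a flat weight vector into fixed-size pages (last page zero-padded)."""
    pad = (-len(flat_weights)) % PAGE_SIZE
    padded = np.concatenate([flat_weights, np.zeros(pad)])
    return padded.reshape(-1, PAGE_SIZE)

class SharedWeightMemory:
    """Fixed pool of physical weight pages shared by several task networks."""

    def __init__(self):
        self.pages = []           # physical pages actually stored
        self.page_tables = {}     # task name -> list of physical page ids

    def register_task(self, name, flat_weights, tol=1e-2):
        table = []
        for page in to_pages(flat_weights):
            # Reuse an existing page if one is close enough, else allocate a new one.
            for pid, stored in enumerate(self.pages):
                if np.max(np.abs(stored - page)) < tol:
                    table.append(pid)
                    break
            else:
                self.pages.append(page)
                table.append(len(self.pages) - 1)
        self.page_tables[name] = table

    def load_task(self, name, num_weights):
        """Reassemble a task's flat weight vector from its page table."""
        pages = [self.pages[pid] for pid in self.page_tables[name]]
        return np.concatenate(pages)[:num_weights]

# Toy usage with two tasks whose first layers happen to be identical.
rng = np.random.default_rng(0)
shared_layer = rng.normal(size=128)
task_a = np.concatenate([shared_layer, rng.normal(size=64)])
task_b = np.concatenate([shared_layer, rng.normal(size=64)])

mem = SharedWeightMemory()
mem.register_task("task_a", task_a)
mem.register_task("task_b", task_b)
print(len(mem.pages), "physical pages for", len(task_a) + len(task_b), "weights")
```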

    Exploring the effects of robotic design on learning and neural control

    The ongoing deep learning revolution has allowed computers to outclass humans in various games and to perceive features imperceptible to humans during classification tasks. Current machine learning techniques have clearly distinguished themselves in specialized tasks. However, we have yet to see robots capable of performing multiple tasks at an expert level. Most work in this field is focused on the development of more sophisticated learning algorithms for a robot's controller, given a largely static and presupposed robotic design. By focusing on the development of robotic bodies, rather than neural controllers, I have discovered that robots can be designed such that they overcome many of the current pitfalls encountered by neural controllers in multitask settings. Through this discovery, I also present novel metrics to explicitly measure the learning ability of a robotic design and its resistance to common problems such as catastrophic interference. Traditionally, physical robot design requires human engineers to plan every aspect of the system, which is expensive and often relies on human intuition. In contrast, within the field of evolutionary robotics, evolutionary algorithms are used to automatically create optimized designs; however, such designs are often still limited in their ability to perform in a multitask setting. The metrics created and presented here give a novel path to automated design that allows evolved robots to synergize with their controllers, improving the computational efficiency of their learning while overcoming catastrophic interference. Overall, this dissertation suggests the ability to automatically design robots that are more general-purpose than current robots and that can perform various tasks while requiring less computation.
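
    A generic version of an interference metric in the spirit of what is described above can be sketched as follows: record performance on each task right after it is learned, re-evaluate earlier tasks after later training, and average the relative drops. The callables and the normalisation are illustrative assumptions, not the dissertation's metrics.

```python
def interference_score(train, evaluate, tasks):
    """Average relative drop on earlier tasks caused by later training.

    train(task): update the controller in place on one task.
    evaluate(task) -> float: performance on that task, higher is better.
    Returns 0.0 when later training causes no forgetting.
    """
    drops = []
    baseline = {}
    for i, task in enumerate(tasks):
        train(task)
        baseline[task] = evaluate(task)      # performance right after learning
        # Re-check all earlier tasks after learning the new one.
        for old in tasks[:i]:
            now = evaluate(old)
            rel_drop = (baseline[old] - now) / max(abs(baseline[old]), 1e-8)
            drops.append(max(0.0, rel_drop))
    return sum(drops) / len(drops) if drops else 0.0
```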