
    Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems

    Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While the accuracy of Convolutional Neural Networks (CNNs) has reached a mature and remarkable state, inference latency and throughput are a major concern, especially when targeting low-cost and low-power embedded platforms. CNNs' inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that an optimized combination can achieve a 45x speedup in inference latency on CPU compared to a dependency-free baseline, and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that the quality of results and the time-to-solution are much better than with Random Search, with up to 15x better results for a short-time search.

    Towards a framework for investigating tangible environments for learning

    External representations have been shown to play a key role in mediating cognition. Tangible environments offer the opportunity for novel representational formats and combinations, potentially increasing representational power for supporting learning. However, we currently know little about the specific learning benefits of tangible environments, and have no established framework within which to analyse the ways that external representations work in tangible environments to support learning. Taking external representation as the central focus, this paper proposes a framework for investigating the effect of tangible technologies on interaction and cognition. Key artefact-action-representation relationships are identified and classified to form a structure for investigating the differential cognitive effects of these features. An example scenario from our current research is presented to illustrate how the framework can be used as a method for investigating the effectiveness of differential designs for supporting science learning.

    EMBEDDED LEARNING ROBOT WITH FUZZY Q-LEARNING FOR OBSTACLE AVOIDANCE BEHAVIOR

    Fuzzy Q-learning is an extension of the Q-learning algorithm that uses a fuzzy inference system to let Q-learning handle continuous states and actions. It has been implemented in various robot learning applications, such as obstacle avoidance and target searching. However, most of these implementations have not been realized on an embedded robot. This paper presents an implementation of fuzzy Q-learning for obstacle avoidance navigation on an embedded mobile robot. The experimental results demonstrate that fuzzy Q-learning enables the robot to learn the right policy, i.e. to avoid obstacles.

    A Survey on Compiler Autotuning using Machine Learning

    Since the mid-1990s, researchers have been trying to use machine-learning based approaches to solve a number of different compiler optimization problems. These techniques primarily enhance the quality of the obtained results and, more importantly, make it feasible to tackle two main compiler optimization problems: optimization selection (choosing which optimizations to apply) and phase-ordering (choosing the order of applying optimizations). The compiler optimization space continues to grow due to the advancement of applications, increasing number of compiler optimizations, and new target architectures. Generic optimization passes in compilers cannot fully leverage newly introduced optimizations and, therefore, cannot keep up with the pace of increasing options. This survey summarizes and classifies the recent advances in using machine learning for the compiler optimization field, particularly on the two major problems of (1) selecting the best optimizations and (2) the phase-ordering of optimizations. The survey highlights the approaches taken so far, the obtained results, the fine-grain classification among different approaches and finally, the influential papers of the field.
    Comment: version 5.0 (updated on September 2018) - Preprint Version For our Accepted Journal @ ACM CSUR 2018 (42 pages) - This survey will be updated quarterly here (Send me your new published papers to be added in the subsequent version). History: Received November 2016; Revised August 2017; Revised February 2018; Accepted March 2018

    Design of multimedia processor based on metric computation

    Media-processing applications, such as signal processing, 2D and 3D graphics rendering, and image compression, are the dominant workloads in many embedded systems today. The real-time constraints of these media applications place taxing demands on the performance of today's processors, which must also offer low cost, low power and reduced design delay. To meet these challenges, a fast and efficient strategy is to upgrade a low-cost general-purpose processor core. This approach is based on the personalization of a general RISC processor core according to the requirements of the target multimedia application. Thus, if the extra cost is justified, the general-purpose processor (GPP) core can be reinforced with instruction-level coprocessors, coarse-grain dedicated hardware, ad hoc memories or new GPP cores. In this way the final design solution is tailored to the application requirements. The proposed approach is based on three main steps. The first is the analysis of the targeted application using efficient metrics. The second is the selection of the appropriate architecture template according to the results and recommendations of the first step. The third is the architecture generation. The approach is evaluated on various image and video algorithms, showing its feasibility.