1,871 research outputs found

    Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems

    Full text link
    Deep Learning is increasingly being adopted by industry for computer vision applications running on embedded devices. While Convolutional Neural Networks' accuracy has achieved a mature and remarkable state, inference latency and throughput are a major concern especially when targeting low-cost and low-power embedded platforms. CNNs' inference latency may become a bottleneck for Deep Learning adoption by industry, as it is a crucial specification for many real-time processes. Furthermore, deployment of CNNs across heterogeneous platforms presents major compatibility issues due to vendor-specific technology and acceleration libraries. In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores through the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices. We show that, an optimized combination can achieve 45x speedup in inference latency on CPU compared to a dependency-free baseline and 2x on average on GPGPU compared to the best vendor library. Further, we demonstrate that, the quality of results and time "to-solution" is much better than with Random Search and achieves up to 15x better results for a short-time search

    Brook Auto: High-Level Certification-Friendly Programming for GPU-powered Automotive Systems

    Get PDF
    Modern automotive systems require increased performance to implement Advanced Driving Assistance Systems (ADAS). GPU-powered platforms are promising candidates for such computational tasks, however current low-level programming models challenge the accelerator software certification process, while they limit the hardware selection to a fraction of the available platforms. In this paper we present Brook Auto, a high-level programming language for automotive GPU systems which removes these limitations. We describe the challenges and solutions we faced in its implementation, as well as a complete evaluation in terms of performance and productivity, which shows the effectiveness of our method.This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence.Peer ReviewedPostprint (author's final draft

    GPGPU: Hardware/Software Co-Design for the Masses

    Get PDF
    With the recent development of high-performance graphical processing units (GPUs), capable of performing general-purpose computation (GPGPU: general-purpose computation on the GPU), a new platform is emerging. It consists of a central processing unit (CPU), which is very fast in sequential execution, and a GPU, which exhibits high degree of parallelism and thus very high performance on certain types of computations. Optimally leveraging the advantages of this platform is challenging in practice. We spotlight the analogy between GPGPU and hardware/software co-design (HSCD), a more mature design paradigm, to derive a design process for GPGPU. This process, with appropriate tool support and automation, will ease GPGPU design significantly. Identifying the challenges associated with establishing this process can serve as a roadmap for the future development of the GPGPU field
    corecore