
    Brook Auto: High-Level Certification-Friendly Programming for GPU-powered Automotive Systems

    Modern automotive systems require increased performance to implement Advanced Driving Assistance Systems (ADAS). GPU-powered platforms are promising candidates for such computational tasks; however, current low-level programming models complicate the certification of accelerator software and limit hardware selection to a fraction of the available platforms. In this paper we present Brook Auto, a high-level programming language for automotive GPU systems that removes these limitations. We describe the challenges we faced in its implementation and the solutions we adopted, and we present a complete evaluation in terms of performance and productivity, which shows the effectiveness of our method. This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Peer Reviewed. Postprint (author's final draft).

    Towards general purpose computations on low-end mobile GPUs

    GPUs traditionally offer high computational capabilities, frequently higher than their CPU counterparts. While vendors of high-end mobile GPUs have recently introduced general-purpose APIs, such as OpenCL, to leverage their computational power, the vast majority of mobile devices lack such support. Although their graphics APIs resemble desktop graphics APIs, they differ significantly, which prevents the use of well-known techniques for general-purpose computation over such interfaces. In this paper we show how these obstacles can be overcome in order to achieve general-purpose programmability of these devices. As a proof of concept, we implemented our proposal on a real embedded platform (Raspberry Pi) based on Broadcom's VideoCore IV GPU, obtaining a speedup of 7.2× over the CPU. This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Leonidas Kosmidis is also funded by the Spanish Ministry of Education under the FPU grant AP2010-4208. Postprint (author's final draft).

    Optimisation opportunities and evaluation for GPGPU applications on low-end mobile GPUs

    Previous works in the literature have shown the feasibility of general-purpose computations for non-visual applications on low-end mobile graphics processors using graphics APIs. These works focused only on the functional aspects of the software, ignoring implementation details and therefore the performance implications of the GPUs' particular micro-architecture. Since various steps in such applications can be implemented in multiple ways, we identify optimisation opportunities, explore the different options and evaluate them. We show that implementation details can significantly affect the obtained performance, with discrepancies of up to three orders of magnitude, and we demonstrate the effectiveness of our proposal on two embedded platforms, obtaining more than 16× speedup over benchmarks designed following OpenGL ES 2 best practices. This work has been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316-P and the HiPEAC Network of Excellence. Peer Reviewed.

    An open benchmark implementation for multi-CPU multi-GPU pedestrian detection in automotive systems

    Modern and future automotive systems incorporate several Advanced Driving Assistance Systems (ADAS). These systems require significant performance that cannot be provided by traditional automotive processors and programming models. Multicore CPUs and Nvidia GPUs using CUDA are currently considered by both the automotive industry and the research community to provide the necessary computational power. However, despite several recently published works in this domain, there is a complete lack of open implementations of GPU-based ADAS software that can be used for benchmarking candidate platforms. In this work, we present an open multi-CPU, multi-GPU implementation of a pedestrian detection benchmark based on the Viola-Jones image recognition algorithm. We present our optimization strategies and evaluate our implementation on a multiprocessor system featuring multiple GPUs, showing an overall 88.5× speedup over the sequential version. This work has been supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316P, the HiPEAC Network of Excellence and a Microsoft sponsored ACM SRC. The first two authors acknowledge Dr. Petrisor for her assistance in understanding and using the sequential version of the benchmark and dedicate this article to the memory of their late beloved advisor Prof. Nacho Navarro, without whom this work would not have been possible. Peer Reviewed.
