20,013 research outputs found

Applied aspects of the software implementation of a laser beam profile image recognition system based on parallel-hierarchical networks

Author: Матейчук Максим
Яровий А. А.
Publication venue: ВНТУ (Vinnytsia National Technical University)
Publication date: 01/01/2014

The paper shows how to organize parallel-hierarchical information processing on CPU- and GPU-based systems. The result is faster recognition of laser beam profile images based on parallel-hierarchical networks. Algorithms for solving this problem on CPU- and GPU-oriented hardware platforms are presented.

Repository of Vinnytsia National Technical University

EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices

Author: Almeida Mario
Lane Nicholas D.
Laskaridis Stefanos
Leontiadis Ilias
Venieris Stylianos I.
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2019

In recent years, advances in deep learning have resulted in unprecedented leaps in diverse tasks spanning from speech and object recognition to context awareness and health monitoring. As a result, an increasing number of AI-enabled applications are being developed targeting ubiquitous and mobile devices. While deep neural networks (DNNs) are getting bigger and more complex, they also impose a heavy computational and energy burden on the host devices, which has led to the integration of various specialized processors in commodity devices. Given the broad range of competing DNN architectures and the heterogeneity of the target hardware, there is an emerging need to understand the compatibility between DNN-platform pairs and the expected performance benefits on each platform. This work attempts to demystify this landscape by systematically evaluating a collection of state-of-the-art DNNs on a wide variety of commodity devices. In this respect, we identify potential bottlenecks in each architecture and provide important guidelines that can assist the community in the co-design of more efficient DNNs and accelerators.
Comment: Accepted at MobiSys 2019: 3rd International Workshop on Embedded and Mobile Deep Learning (EMDL), 201

arXiv.org e-Print Archive

Crossref

DeepPicar: A Low-cost Deep Neural Network-based Autonomous Car

Author: Bechtel Michael G.
Kim Minje
McEllhiney Elise
Yun Heechul
Publication date: 29/07/2018

We present DeepPicar, a low-cost deep neural network based autonomous car platform. DeepPicar is a small scale replication of a real self-driving car called DAVE-2 by NVIDIA. DAVE-2 uses a deep convolutional neural network (CNN), which takes images from a front-facing camera as input and produces car steering angles as output. DeepPicar uses the same network architecture (9 layers, 27 million connections and 250K parameters) and can drive itself in real-time using a web camera and a Raspberry Pi 3 quad-core platform. Using DeepPicar, we analyze the Pi 3's computing capabilities to support end-to-end deep learning based real-time control of autonomous vehicles. We also systematically compare other contemporary embedded computing platforms using the DeepPicar's CNN-based real-time control workload. We find that all tested platforms, including the Pi 3, are capable of supporting the CNN-based real-time control, from 20 Hz up to 100 Hz, depending on hardware platform. However, we find that shared resource contention remains an important issue that must be considered in applying CNN models on shared memory based embedded computing platforms; we observe up to 11.6X execution time increase in the CNN based control loop due to shared resource contention. To protect the CNN workload, we also evaluate state-of-the-art cache partitioning and memory bandwidth throttling techniques on the Pi 3. We find that cache partitioning is ineffective, while memory bandwidth throttling is an effective solution.
Comment: To be published as a conference paper at RTCSA 201

arXiv.org e-Print Archive

Crossref

A Review on Software Architectures for Heterogeneous Platforms

Author: Andrade Hugo
Crnkovic Ivica
Publication date: 01/01/2018

The increasing demands for computing performance have been a reality regardless of the requirements for smaller and more energy efficient devices. Throughout the years, the strategy adopted by industry was to increase the robustness of a single processor by increasing its clock frequency and mounting more transistors so more calculations could be executed. However, it is known that the physical limits of such processors are being reached, and one way to fulfill such increasing computing demands has been to adopt a strategy based on heterogeneous computing, i.e., using a heterogeneous platform containing more than one type of processor. This way, different types of tasks can be executed by processors that are specialized in them. Heterogeneous computing, however, poses a number of challenges to software engineering, especially in the architecture and deployment phases. In this paper, we conduct an empirical study that aims at discovering the state of the art in software architecture for heterogeneous computing, with a focus on deployment. We conduct a systematic mapping study that retrieved 28 studies, which were critically assessed to obtain an overview of the research field. We identified gaps and trends that can be used by both researchers and practitioners as guides to further investigate the topic.

arXiv.org e-Print Archive

Crossref

Chalmers Research