Search CORE

1,029 research outputs found

Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs

Author: Kadetotad Deepak
Kim Minkyu
Linares Barranco Alejandro
Ríos Navarro José Antonio
Seo Jae-Sun
Tapiador Morales Ricardo
Publication venue: 'SAGE Publications'
Publication date: 01/01/2016
Field of study

Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded BlockRAM. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. Both Altera and Xilinx have adopted OpenCL co-design framework from GPU for FPGA designs as a pseudo-automatic development solution. In this paper, a comprehensive evaluation and comparison of Altera and Xilinx OpenCL frameworks for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times

idUS. Depósito de Investigación Universidad de Sevilla

Comprehensive Evaluation of OpenCL-Based CNN Implementations for FPGAs

Author: Kadetotad Deepak
Kim Minkyu
Linares Barranco Alejandro
Ríos Navarro José Antonio
Seo Jae-sun
Tapiador Morales Ricardo
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2017
Field of study

Deep learning has significantly advanced the state of the art in artificial intelligence, gaining wide popularity from both industry and academia. Special interest is around Convolutional Neural Networks (CNN), which take inspiration from the hierarchical structure of the visual cortex, to form deep layers of convolutional operations, along with fully connected classifiers. Hardware implementations of these deep CNN architectures are challenged with memory bottlenecks that require many convolution and fully-connected layers demanding large amount of communication for parallel computation. Multi-core CPU based solutions have demonstrated their inadequacy for this problem due to the memory wall and low parallelism. Many-core GPU architectures show superior performance but they consume high power and also have memory constraints due to inconsistencies between cache and main memory. OpenCL is commonly used to describe these architectures for their execution on GPGPUs or FPGAs. FPGA design solutions are also actively being explored, which allow implementing the memory hierarchy using embedded parallel BlockRAMs. This boosts the parallel use of shared memory elements between multiple processing units, avoiding data replicability and inconsistencies. This makes FPGAs potentially powerful solutions for real-time classification of CNNs. In this paper both Altera and Xilinx adopted OpenCL co-design frameworks for pseudo-automatic development solutions are evaluated. A comprehensive evaluation and comparison for a 5-layer deep CNN is presented. Hardware resources, temporal performance and the OpenCL architecture for CNNs are discussed. Xilinx demonstrates faster synthesis, better FPGA resource utilization and more compact boards. Altera provides multi-platforms tools, mature design community and better execution times.Ministerio de Economía y Competitividad TEC2016-77785-

idUS. Depósito de Investigación Universidad de Sevilla

Designing BEE: A Hardware Emulation Engine for Signal Processing in Low-Power Wireless Applications

Author
Publication venue: Springer
Publication date
Field of study

Springer - Publisher Connector

A Review on Software Architectures for Heterogeneous Platforms

Author: Andrade Hugo
Crnkovic Ivica
Publication venue
Publication date: 01/01/2018
Field of study

The increasing demands for computing performance have been a reality regardless of the requirements for smaller and more energy efficient devices. Throughout the years, the strategy adopted by industry was to increase the robustness of a single processor by increasing its clock frequency and mounting more transistors so more calculations could be executed. However, it is known that the physical limits of such processors are being reached, and one way to fulfill such increasing computing demands has been to adopt a strategy based on heterogeneous computing, i.e., using a heterogeneous platform containing more than one type of processor. This way, different types of tasks can be executed by processors that are specialized in them. Heterogeneous computing, however, poses a number of challenges to software engineering, especially in the architecture and deployment phases. In this paper, we conduct an empirical study that aims at discovering the state-of-the-art in software architecture for heterogeneous computing, with focus on deployment. We conduct a systematic mapping study that retrieved 28 studies, which were critically assessed to obtain an overview of the research field. We identified gaps and trends that can be used by both researchers and practitioners as guides to further investigate the topic

arXiv.org e-Print Archive

Crossref

Chalmers Research

Heterogeneous Acceleration for 5G New Radio Channel Modelling Using FPGAs and GPUs

Author: SHAH NASIR ALI
Publication venue: country:Italy
Publication date: 09/10/2023
Field of study

L'abstract è presente nell'allegato / the abstract is in the attachmen

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

Hardware design and CAD for processor-based logic emulation systems.

Author: Yazdanshenas Amir Ali
Publication venue: 'University of Windsor Leddy Library'
Publication date: 01/01/2006
Field of study

Scholarship at UWindsor

Intelligent Embedded Software: New Perspectives and Challenges

Author: Belkebir Djalila
Boutekkouk Fateh
Djouani Ramissa
Lakhdari Saliha
Mahalaine Ridha
Mecibah Zina
Publication venue: 'IntechOpen'
Publication date: 20/12/2017
Field of study

Intelligent embedded systems (IES) represent a novel and promising generation of embedded systems (ES). IES have the capacity of reasoning about their external environments and adapt their behavior accordingly. Such systems are situated in the intersection of two different branches that are the embedded computing and the intelligent computing. On the other hand, intelligent embedded software (IESo) is becoming a large part of the engineering cost of intelligent embedded systems. IESo can include some artificial intelligence (AI)-based systems such as expert systems, neural networks and other sophisticated artificial intelligence (AI) models to guarantee some important characteristics such as self-learning, self-optimizing and self-repairing. Despite the widespread of such systems, some design challenging issues are arising. Designing a resource-constrained software and at the same time intelligent is not a trivial task especially in a real-time context. To deal with this dilemma, embedded system researchers have profited from the progress in semiconductor technology to develop specific hardware to support well AI models and render the integration of AI with the embedded world a reality

IntechOpen

Crossref

An FPGA platform for real-time simulation of spiking neuronal networks

Author: MASSOBRIO PAOLO
Meloni Paolo
Palumbo Francesca
Pani Danilo
RAFFO LUIGI
Tuveri Giuseppe
Publication venue: 'Frontiers Media SA'
Publication date: 01/01/2017
Field of study

In the last years, the idea to dynamically interface biological neurons with artificial ones has become more and more urgent. The reason is essentially due to the design of innovative neuroprostheses where biological cell assemblies of the brain can be substituted by artificial ones. For closed-loop experiments with biological neuronal networks interfaced with in silico modeled networks, several technological challenges need to be faced, from the low-level interfacing between the living tissue and the computational model to the implementation of the latter in a suitable form for real-time processing. Field programmable gate arrays (FPGAs) can improve flexibility when simple neuronal models are required, obtaining good accuracy, real-time performance, and the possibility to create a hybrid system without any custom hardware, just programming the hardware to achieve the required functionality. In this paper, this possibility is explored presenting a modular and efficient FPGA design of an in silico spiking neural network exploiting the Izhikevich model. The proposed system, prototypically implemented on a Xilinx Virtex 6 device, is able to simulate a fully connected network counting up to 1,440 neurons, in real-time, at a sampling rate of 10 kHz, which is reasonable for small to medium scale extra-cellular closed-loop experiments

Frontiers - Publisher Connector

PubMed Central

Archivio istituzionale della ricerca - Università di Cagliari

Archivio istituzionale della ricerca - Università di Genova

Prototyping Methodologies and Design of Communication-centric Heterogeneous Many-core Architectures

Author: Masing Leonard Jannik
Publication venue: KIT-Bibliothek, Karlsruhe
Publication date: 01/01/2020
Field of study

KITopen

Using Rapid Prototyping in Computer Architecture Design Laboratories

Author: Binh Dao
Henry Owen
James Hamblen
Sudhakar Yalamanchili
Publication venue
Publication date: 01/01/1996
Field of study

This paper describes the undergraduate computer architecture courses and laboratories introduced at Georgia Tech during the past two years. A core sequence of six required courses for computer engineering students has been developed. In this paper, emphasis is placed upon the new core laboratories which utilize commercial CAD tools, FPGAs, hardware emulators, and a VHDL based rapid prototyping approach to simulate, synthesize, and implement prototype computer hardware

CiteSeerX

Crossref