Search CORE

36,782 research outputs found

Acceleration of stereo-matching on multi-core CPU and GPU

Author: Cockshott Paul
Oehler Susanne
Tian Xu
Publication venue
Publication date: 01/01/2014
Field of study

This paper presents an accelerated version of a dense stereo-correspondence algorithm for two different parallelism enabled architectures, multi-core CPU and GPU. The algorithm is part of the vision system developed for a binocular robot-head in the context of the CloPeMa 1 research project. This research project focuses on the conception of a new clothes folding robot with real-time and high resolution requirements for the vision system. The performance analysis shows that the parallelised stereo-matching algorithm has been significantly accelerated, maintaining 12x and 176x speed-up respectively for multi-core CPU and GPU, compared with non-SIMD singlethread CPU. To analyse the origin of the speed-up and gain deeper understanding about the choice of the optimal hardware, the algorithm was broken into key sub-tasks and the performance was tested for four different hardware architectures

CiteSeerX

Enlighten

Hyperdrive: A Multi-Chip Systolically Scalable Binary-Weight CNN Inference Engine

Author: Andri Renzo
Benini Luca
Cavigelli Lukas
Rossi Davide
Publication venue
Publication date: 01/01/2019
Field of study

Deep neural networks have achieved impressive results in computer vision and machine learning. Unfortunately, state-of-the-art networks are extremely compute and memory intensive which makes them unsuitable for mW-devices such as IoT end-nodes. Aggressive quantization of these networks dramatically reduces the computation and memory footprint. Binary-weight neural networks (BWNs) follow this trend, pushing weight quantization to the limit. Hardware accelerators for BWNs presented up to now have focused on core efficiency, disregarding I/O bandwidth and system-level efficiency that are crucial for deployment of accelerators in ultra-low power devices. We present Hyperdrive: a BWN accelerator dramatically reducing the I/O bandwidth exploiting a novel binary-weight streaming approach, which can be used for arbitrarily sized convolutional neural network architecture and input resolution by exploiting the natural scalability of the compute units both at chip-level and system-level by arranging Hyperdrive chips systolically in a 2D mesh while processing the entire feature map together in parallel. Hyperdrive achieves 4.3 TOp/s/W system-level efficiency (i.e., including I/Os)---3.1x higher than state-of-the-art BWN accelerators, even if its core uses resource-intensive FP16 arithmetic for increased robustness

arXiv.org e-Print Archive

Archivio istituzionale della ricerca - Alma Mater Studiorum Università di Bologna

ACE16K: A 128×128 focal plane analog processor with digital I/O

Author: Domínguez Castro Rafael
Espejo Meana Servando Carlos
Liñán Cembrano Gustavo
Rodríguez Vázquez Ángel Benito
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2002
Field of study

This paper presents a new generation 128×128 focal-plane analog programmable array processor (FPAPAP), from a system level perspective, which has been manufactured in a 0.35 μm standard digital 1P-5M CMOS technology. The chip has been designed to achieve the high-speed and moderate-accuracy (8b) requirements of most real time early-vision processing applications. It is easily embedded in conventional digital hosting systems: external data interchange and control are completely digital. The chip contains close to four millions transistors, 90% of them working in analog mode, and exhibits a relatively low power consumption-<4 W, i.e. less than 1 μW per transistor. Computing vs. power peak values are in the order of 1 TeraOPS/W, while maintained VGA processing throughputs of 100 frames/s are possible with about 10-20 basic image processing tasks on each frame

idUS. Depósito de Investigación Universidad de Sevilla

Digital implementation of the cellular sensor-computers

Author: Benthien
Bolotski
Catrysse
Diamantaras
Domínguez-Castro
Dudek
Dudek
Dudek
El Gamal
El Gamal
Espejo
Foldesy
Gielen
Grossberg
Hamamoto
Herbordt
Johansson
Kleinfelder
Linan
Liu
Liñán
Liñán
Morris
Murray
Nayar
Ohta
Roska
Roska
Roska
Rudack
Schneider
Serra
Sinno
Tian
Wagner
Wanhammar
Zarándy
Publication venue: 'Wiley'
Publication date: 01/01/2006
Field of study

Two different kinds of cellular sensor-processor architectures are used nowadays in various applications. The first is the traditional sensor-processor architecture, where the sensor and the processor arrays are mapped into each other. The second is the foveal architecture, in which a small active fovea is navigating in a large sensor array. This second architecture is introduced and compared here. Both of these architectures can be implemented with analog and digital processor arrays. The efficiency of the different implementation types, depending on the used CMOS technology, is analyzed. It turned out, that the finer the technology is, the better to use digital implementation rather than analog

Crossref

SZTAKI Publication Repository

Repository of the Academy's Library

A scalable multi-core architecture with heterogeneous memory structures for Dynamic Neuromorphic Asynchronous Processors (DYNAPs)

Author: Indiveri Giacomo
Moradi Saber
Qiao Ning
Stefanini Fabio
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2017
Field of study

Neuromorphic computing systems comprise networks of neurons that use asynchronous events for both computation and communication. This type of representation offers several advantages in terms of bandwidth and power consumption in neuromorphic electronic systems. However, managing the traffic of asynchronous events in large scale systems is a daunting task, both in terms of circuit complexity and memory requirements. Here we present a novel routing methodology that employs both hierarchical and mesh routing strategies and combines heterogeneous memory structures for minimizing both memory requirements and latency, while maximizing programming flexibility to support a wide range of event-based neural network architectures, through parameter configuration. We validated the proposed scheme in a prototype multi-core neuromorphic processor chip that employs hybrid analog/digital circuits for emulating synapse and neuron dynamics together with asynchronous digital circuits for managing the address-event traffic. We present a theoretical analysis of the proposed connectivity scheme, describe the methods and circuits used to implement such scheme, and characterize the prototype chip. Finally, we demonstrate the use of the neuromorphic processor with a convolutional neural network for the real-time classification of visual symbols being flashed to a dynamic vision sensor (DVS) at high speed.Comment: 17 pages, 14 figure

arXiv.org e-Print Archive

ZORA

Locust-inspired vision system on chip architecture for collision detection in automotive applications

Author: Carranza González Luis
Cuadri Carvajo Jorge
Laviana Rubén
Liñán Cembrano Gustavo
Roca Moreno Elisenda
Rodríguez Vázquez Ángel Benito
Vargas Sierra Sonia
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

This paper describes a programmable digital computing architecture dedicated to process information in accordance to the organization and operating principles of the four-layer neuron structure encountered at the visual system of Locusts. This architecture takes advantage of the natural collision detection skills of locusts and is capable of processing images and ascertaining collision threats in real-time automotive scenarios. In addition to the Locust features, the architecture embeds a Topological Feature Estimator module to identify and classify objects in collision course.European Commission IST2001 - 38097Ministerio de Ciencia y Tecnología TIC2003 - 09817- C02 - 0

idUS. Depósito de Investigación Universidad de Sevilla

In the quest of vision-sensors-on-chip: Pre-processing sensors for data reduction

Author: Brea Sánchez Víctor
Carmona Galán Ricardo
Fernández Berni Jorge
Leñero Bardallo Juan Antonio
Rodríguez Vázquez Ángel Benito
Publication venue: 'Society for Imaging Science & Technology'
Publication date: 01/01/2017
Field of study

This paper shows that the implementation of vision systems benefits from the usage of sensing front-end chips with embedded pre-processing capabilities - called CVIS. Such embedded pre-processors reduce the number of data to be delivered for ulterior processing. This strategy, which is also adopted by natural vision systems, relaxes system-level requirements regarding data storage and communications and enables highly compact and fast vision systems. The paper includes several proof-o-concept CVIS chips with embedded pre-processing and illustrate their potential advantages. © 2017, Society for Imaging Science and Technology.Office of Naval Research (USA) N00014-14-1-0355Ministerio de Economía y Competitiviad TEC2015-66878-C3-1-R, TEC2015-66878-C3-3-RJunta de Andalucía 2012 TIC 233

idUS. Depósito de Investigación Universidad de Sevilla

CMOS Vision Sensors: Embedding Computer Vision at Imaging Front-Ends

Author: Carmona Galán Ricardo
Fernández Berni Jorge
Leñero Bardallo Juan Antonio
Rodríguez Vázquez Ángel Benito
Vornicu Ion
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2018
Field of study

CMOS Image Sensors (CIS) are key for imaging technol-ogies. These chips are conceived for capturing opticalscenes focused on their surface, and for delivering elec-trical images, commonly in digital format. CISs may incor-porate intelligence; however, their smartness basicallyconcerns calibration, error correction and other similartasks. The term CVISs (CMOS VIsion Sensors) definesother class of sensor front-ends which are aimed at per-forming vision tasks right at the focal plane. They havebeen running under names such as computational imagesensors, vision sensors and silicon retinas, among others. CVIS and CISs are similar regarding physical imple-mentation. However, while inputs of both CIS and CVISare images captured by photo-sensors placed at thefocal-plane, CVISs primary outputs may not be imagesbut either image features or even decisions based on thespatial-temporal analysis of the scenes. We may hencestate that CVISs are more “intelligent” than CISs as theyfocus on information instead of on raw data. Actually,CVIS architectures capable of extracting and interpretingthe information contained in images, and prompting reac-tion commands thereof, have been explored for years inacademia, and industrial applications are recently ramp-ing up.One of the challenges of CVISs architects is incorporat-ing computer vision concepts into the design flow. Theendeavor is ambitious because imaging and computervision communities are rather disjoint groups talking dif-ferent languages. The Cellular Nonlinear Network Univer-sal Machine (CNNUM) paradigm, proposed by Profs.Chua and Roska, defined an adequate framework forsuch conciliation as it is particularly well suited for hard-ware-software co-design [1]-[4]. This paper overviewsCVISs chips that were conceived and prototyped at IMSEVision Lab over the past twenty years. Some of them fitthe CNNUM paradigm while others are tangential to it. Allthem employ per-pixel mixed-signal processing circuitryto achieve sensor-processing concurrency in the quest offast operation with reduced energy budget.Junta de Andalucía TIC 2012-2338Ministerio de Economía y Competitividad TEC 2015-66878-C3-1-R y TEC 2015-66878-C3-3-

idUS. Depósito de Investigación Universidad de Sevilla

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

Author: Bouganis Christos-Savvas
Kouris Alexandros
Venieris Stylianos I.
Publication venue
Publication date: 19/02/2018
Field of study

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks. To accelerate the experimentation and development of CNNs, several software frameworks have been released, primarily targeting power-hungry CPUs and GPUs. In this context, reconfigurable hardware in the form of FPGAs constitutes a potential alternative platform that can be integrated in the existing deep learning ecosystem to provide a tunable balance between performance, power consumption and programmability. In this paper, a survey of the existing CNN-to-FPGA toolflows is presented, comprising a comparative study of their key characteristics which include the supported applications, architectural choices, design space exploration methods and achieved performance. Moreover, major challenges and objectives introduced by the latest trends in CNN algorithmic research are identified and presented. Finally, a uniform evaluation methodology is proposed, aiming at the comprehensive, complete and in-depth evaluation of CNN-to-FPGA toolflows.Comment: Accepted for publication at the ACM Computing Surveys (CSUR) journal, 201

arXiv.org e-Print Archive

Spiral - Imperial College Digital Repository