Search CORE

13,159 research outputs found

A general framework for efficient FPGA implementation of matrix product

Author: Amira A.
Bensaali F.
Sotudeh R.
Publication venue
Publication date: 01/01/2007
Field of study

Original article can be found at: http://www.medjcn.com/ Copyright Softmotor LimitedHigh performance systems are required by the developers for fast processing of computationally intensive applications. Reconfigurable hardware devices in the form of Filed-Programmable Gate Arrays (FPGAs) have been proposed as viable system building blocks in the construction of high performance systems at an economical price. Given the importance and the use of matrix algorithms in scientific computing applications, they seem ideal candidates to harness and exploit the advantages offered by FPGAs. In this paper, a system for matrix algorithm cores generation is described. The system provides a catalog of efficient user-customizable cores, designed for FPGA implementation, ranging in three different matrix algorithm categories: (i) matrix operations, (ii) matrix transforms and (iii) matrix decomposition. The generated core can be either a general purpose or a specific application core. The methodology used in the design and implementation of two specific image processing application cores is presented. The first core is a fully pipelined matrix multiplier for colour space conversion based on distributed arithmetic principles while the second one is a parallel floating-point matrix multiplier designed for 3D affine transformations.Peer reviewe

University of Hertfordshire Research Archive

Design and construction of a configurable full-field range imaging system for mobile robotic applications

Author: Carnegie Dale A.
Dorrington Adrian A.
Drayton B.
Jongenelen Adrian P.P.
McClymont J.R.K.
Payne Andrew D.
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2011
Field of study

Mobile robotic devices rely critically on extrospection sensors to determine the range to objects in the robot’s operating environment. This provides the robot with the ability both to navigate safely around obstacles and to map its environment and hence facilitate path planning and navigation. There is a requirement for a full-field range imaging system that can determine the range to any obstacle in a camera lens’ field of view accurately and in real-time. This paper details the development of a portable full-field ranging system whose bench-top version has demonstrated sub-millimetre precision. However, this precision required non-real-time acquisition rates and expensive hardware. By iterative replacement of components, a portable, modular and inexpensive version of this full-field ranger has been constructed, capable of real-time operation with some (user-defined) trade-off with precision

Crossref

Research Commons@Waikato

Parametric, Secure and Compact Implementation of RSA on FPGA

Author: Oksuzoglu Ersin
Savas Erkay
Savaş Erkay
Öksüzoğlu Ersin
Publication venue: IEEE Computer Society
Publication date: 09/09/2008
Field of study

We present a fast, efficient, and parameterized modular multiplier and a secure exponentiation circuit especially intended for FPGAs on the low end of the price range. The design utilizes dedicated block multipliers as the main functional unit and Block-RAM as storage unit for the operands. The adopted design methodology allows adjusting the number of multipliers, the radix used in the multipliers, and number of words to meet the system requirements such as available resources, precision and timing constraints. The architecture, based on the Montgomery modular multiplication algorithm, utilizes a pipelining technique that allows concurrent operation of hardwired multipliers. Our design completes 1020-bit and 2040-bit modular multiplications in 7.62 μs and 27.0 μs, respectively. The multiplier uses a moderate amount of system resources while achieving the best area-time product in literature. 2040-bit modular exponentiation engine can easily fit into Xilinx Spartan-3E 500; moreover the exponentiation circuit withstands known side channel attacks

Sabanci University Research Database

Efficient Neural Network Implementations on Parallel Embedded Platforms Applied to Real-Time Torque-Vectoring Optimization Using Predictions for Multi-Motor Electric Vehicles

Author: Cosco Francesco
Dendaluce Jahnke Martin
Gomez-Garay Vicente
Novickis Rihards
Pérez Rastelli Joshué
Publication venue: 'MDPI AG'
Publication date: 01/01/2019
Field of study

The combination of machine learning and heterogeneous embedded platforms enables new potential for developing sophisticated control concepts which are applicable to the field of vehicle dynamics and ADAS. This interdisciplinary work provides enabler solutions -ultimately implementing fast predictions using neural networks (NNs) on field programmable gate arrays (FPGAs) and graphical processing units (GPUs)- while applying them to a challenging application: Torque Vectoring on a multi-electric-motor vehicle for enhanced vehicle dynamics. The foundation motivating this work is provided by discussing multiple domains of the technological context as well as the constraints related to the automotive field, which contrast with the attractiveness of exploiting the capabilities of new embedded platforms to apply advanced control algorithms for complex control problems. In this particular case we target enhanced vehicle dynamics on a multi-motor electric vehicle benefiting from the greater degrees of freedom and controllability offered by such powertrains. Considering the constraints of the application and the implications of the selected multivariable optimization challenge, we propose a NN to provide batch predictions for real-time optimization. This leads to the major contribution of this work: efficient NN implementations on two intrinsically parallel embedded platforms, a GPU and a FPGA, following an analysis of theoretical and practical implications of their different operating paradigms, in order to efficiently harness their computing potential while gaining insight into their peculiarities. The achieved results exceed the expectations and additionally provide a representative illustration of the strengths and weaknesses of each kind of platform. Consequently, having shown the applicability of the proposed solutions, this work contributes valuable enablers also for further developments following similar fundamental principles.Some of the results presented in this work are related to activities within the 3Ccar project, which has received funding from ECSEL Joint Undertaking under grant agreement No. 662192. This Joint Undertaking received support from the European Union’s Horizon 2020 research and innovation programme and Germany, Austria, Czech Republic, Romania, Belgium, United Kingdom, France, Netherlands, Latvia, Finland, Spain, Italy, Lithuania. This work was also partly supported by the project ENABLES3, which received funding from ECSEL Joint Undertaking under grant agreement No. 692455-2

Multidisciplinary Digital Publishing Institute

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Directory of Open Access Journals

TECNALIA Publications

SVITE: A Spike-Based VITE Neuro-Inspired Robot Controller

Author: Domínguez Morales Manuel Jesús
Jiménez Fernández Ángel Francisco
Linares Barranco Alejandro
Morgado Estévez Arturo
Pérez Peña Fernando
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2013
Field of study

This paper presents an implementation of a neuro-inspired algorithm called VITE (Vector Integration To End Point) in FPGA in the spikes domain. VITE aims to generate a non-planned trajectory for reaching tasks in robots. The algorithm has been adapted to work completely in the spike domain under Simulink simulations. The FPGA implementation consists in 4 VITE in parallel for controlling a 4-degree-of-freedom stereo-vision robot. This work represents the main layer of a complex spike-based architecture for robot neuro-inspired reaching tasks in FPGAs. It has been implemented in two Xilinx FPGA families: Virtex-5 and Spartan-6. Resources consumption comparative between both devices is presented. Results obtained for Spartan device could allow controlling complex robotic structures with up to 96 degrees of freedom per FPGA, providing, in parallel, high speed connectivity with other neuromorphic systems sending movement references. An exponential and gamma distribution test over the inter spike interval has been performed to proof the approach to the neural code proposed.Ministerio de Economía y Competitividad TEC2012-37868-C04-0

idUS. Depósito de Investigación Universidad de Sevilla

Digital signal processing: the impact of convergence on education, society and design flow

Author: Andrew Hunter
Appiah Kofi
Publication venue: 'Common Ground Research Networks'
Publication date: 01/01/2005
Field of study

Design and development of real-time, memory and processor hungry digital signal processing systems has for decades been accomplished on general-purpose microprocessors. Increasing needs for high-performance DSP systems made these microprocessors unattractive for such implementations. Various attempts to improve the performance of these systems resulted in the use of dedicated digital signal processing devices like DSP processors and the former heavyweight champion of electronics design – Application Specific Integrated Circuits. The advent of RAM-based Field Programmable Gate Arrays has changed the DSP design flow. Software algorithmic designers can now take their DSP algorithms right from inception to hardware implementation, thanks to the increasing availability of software/hardware design flow or hardware/software co-design. This has led to a demand in the industry for graduates with good skills in both Electrical Engineering and Computer Science. This paper evaluates the impact of technology on DSP-based designs, hardware design languages, and how graduate/undergraduate courses have changed to suit this transition

University of Lincoln Institutional Repository

Nottingham Trent Institutional Repository (IRep)

An area-efficient 2-D convolution implementation on FPGA for space applications

Author: Di Carlo Stefano
Gambardella Giulio
Indaco Marco
Prinetto Paolo Ernesto
Rolfo Daniele
Tiotto Gabriele
Publication venue: IEEE Computer Society
Publication date: 01/01/2011
Field of study

The 2-D Convolution is an algorithm widely used in image and video processing. Although its computation is simple, its implementation requires a high computational power and an intensive use of memory. Field Programmable Gate Arrays (FPGA) architectures were proposed to accelerate calculations of 2-D Convolution and the use of buffers implemented on FPGAs are used to avoid direct memory access. In this paper we present an implementation of the 2-D Convolution algorithm on a FPGA architecture designed to support this operation in space applications. This proposed solution dramatically decreases the area needed keeping good performance, making it appropriate for embedded systems in critical space application

Crossref

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)

PORTO Publications Open Repository TOrino

A low cost reconfigurable soft processor for multimedia applications: design synthesis and programming model

Author: Chalamalasetti S.R.
Margala M.
Purohit S.
Vanderbauwhede W.
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2009
Field of study

This paper presents an FPGA implementation of a low cost 8 bit reconfigurable processor core for media processing applications. The core is optimized to provide all basic arithmetic and logic functions required by the media processing and other domains, as well as to make it easily integrable into a 2D array. This paper presents an investigation of the feasibility of the core as a potential soft processing architecture for FPGA platforms. The core was synthesized on the entire Virtex FPGA family to evaluate its overall performance, scalability and portability. A special feature of the proposed architecture is its simple programming model which allows low level programming. Throughput results for popular benchmarks coded using the programming model and cycle accurate simulator are presented

Crossref

Enlighten