Search CORE

341 research outputs found

A template-based methodology for efficient microprocessor and FPGA accelerator co-design

Author: Athanasiou George
Catthoor Francky
Goutis Costas E.
Kelefouras Vasileios
Kritikakou Angeliki
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/07/2012
Field of study

Embedded applications usually require Software/Hardware (SW/HW) designs to meet the hard timing constraints and the required design flexibility. Exhaustive exploration for SW/HW designs is a very time consuming task, while the adhoc approaches and the use of partially automatic tools usually lead to less efficient designs. To support a more efficient codesign process for FPGA platforms we propose a systematic methodology to map an application to SW/HW platform with a custom HW accelerator and a microprocessor core. The methodology mapping steps are expressed through parametric templates for the SW/HW Communication Organization, the Foreground (FG) Memory Management and the Data Path (DP) Mapping. Several performance-area tradeoff design Pareto points are produced by instantiating the templates. A real-time bioimaging application is mapped on a FPGA to evaluate the gains of our approach, i.e. 44,8% on performance compared with pure SW designs and 58% on area compared with pure HW designs

Lirias

Crossref

Sheffield Hallam University Research Archive

Online signature verification systems on a low-cost FPGA

Author: Cantó Navarro Enrique
López García Mariano
Ramos Lara Rafael Ramón
Publication venue: 'MDPI AG'
Publication date: 01/12/2021
Field of study

This paper describes three different approaches for the implementation of an online signature verification system on a low-cost FPGA. The system is based on an algorithm, which operates on real numbers using the double-precision floating-point IEEE 754 format. The doubleprecision computations are replaced by simpler formats, without affecting the biometrics performance, in order to permit efficient implementations on low-cost FPGA families. The first approach is an embedded system based on MicroBlaze, a 32-bit soft-core microprocessor designed for Xilinx FPGAs, which can be configured by including a single-precision floating-point unit (FPU). The second implementation attaches a hardware accelerator to the embedded system to reduce the execution time on floating-point vectors. The last approach is a custom computing system, which is built from a large set of arithmetic circuits that replace the floating-point data with a more efficient representation based on fixed-point format. The latter system provides a very high runtime acceleration factor at the expense of using a large number of FPGA resources, a complex development cycle and no flexibility since it cannot be adapted to other biometric algorithms. By contrast, the first system provides just the opposite features, while the second approach is a mixed solution between both of them. The experimental results show that both the hardware accelerator and the custom computing system reduce the execution time by a factor ×7.6 and ×201 but increase the logic FPGA resources by a factor ×2.3 and ×5.2, respectively, in comparison with the MicroBlaze embedded system.This research was funded by Spanish MCIN/AEI/10.13039/501100011033, grant number PID2019-107274RB-I00.Peer ReviewedPostprint (published version

Multidisciplinary Digital Publishing Institute

UPCommons. Portal del coneixement obert de la UPC

Directory of Open Access Journals

ASAM : Automatic Architecture Synthesis and Application Mapping; dl. 2: Final design methodology, flow, and tool requirements

Author: Cocco M.
Corvino R.
Jordans R.
Jozwiak L.
Kienhuis B.
Lindwer M.
Madsen J.
Meloni P.
Notarangelo G.
Raffo L.
Publication venue: 'Anadolu Universitesi Bilim ve Teknoloji Dergisi C : Yasam Bilimleri ve Biyoteknoloji'
Publication date: 01/01/2011
Field of study

Repository TU/e

Pure OAI Repository

Lessons Learned from Designing the Montium - a Coarse-Grained Reconfigurable Processing Tile

Author: Heysters Paul M.
Molenkamp Bert
Rosien Michel A.J.
Smit Gerard J.M.
Publication venue: IEEE Computer Society Press
Publication date: 01/01/2004
Field of study

In this paper we describe in retrospective the main results of a four year project, called Chameleon. As part of this project we developed a coarse-grained reconfigurable core for DSP algorithms in wirelessdevices denoted MONTIUM. After presenting the main achievements within this project we present the lessons learned from this project

University of Twente Research Information

Coarse-grained reconfigurable array architectures

Author: A Lambrechts
B Bougard
B Bougard
B Mei
B Mei
B Mei
B Sutter De
G Venkataramani
H Park
H Park
J Lee
JMP Cardoso
JW Waerdt van de
K Berkel van
K Bondalapati
K Sankaralingam
KE Coons
LH Lee
M Ahn
M Gebhart
M Schlansker
M Taylor
M Woh
MD Galanis
MH Lee
S Friedman
SA Mahlke
T Oh
Y Kim
Y Kim
Y Kim
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2010
Field of study

Coarse-Grained Reconﬁgurable Array (CGRA) architectures accelerate the same inner loops that beneﬁt from the high ILP support in VLIW architectures. By executing non-loop code on other cores, however, CGRAs can focus on such loops to execute them more efﬁciently. This chapter discusses the basic principles of CGRAs, and the wide range of design options available to a CGRA designer, covering a large number of existing CGRA designs. The impact of different options on ﬂexibility, performance, and power-efﬁciency is discussed, as well as the need for compiler support. The ADRES CGRA design template is studied in more detail as a use case to illustrate the need for design space exploration, for compiler support and for the manual ﬁne-tuning of source code

Crossref

Ghent University Academic Bibliography

ASC: A stream compiler for computing with FPGAs

Author: Mencer O
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/01/2006
Field of study

Published versio

CiteSeerX

Crossref

Spiral - Imperial College Digital Repository

The Chameleon project in retrospective

Author: Heysters Paul M.
Molenkamp Bert
Smit Gerard J.M.
Publication venue: STW Technology Foundation
Publication date: 01/01/2004
Field of study

In this paper we describe in retrospective the main results of a four year project, called Chameleon. As part of this project we developed a coarse-grained reconfigurable core for DSP algorithms in wireless devices denoted MONTIUM. After presenting the main achievements within this project we present the lessons learned from this project

University of Twente Research Information

Synthesis of application specific processor architectures for ultra-low energy consumption

Author: Kazmierski T J
Leech Charles
Publication venue
Publication date: 12/02/2014
Field of study

In this paper we suggest that further energy savings can be achieved by a new approach to synthesis of embedded processor cores, where the architecture is tailored to the algorithms that the core executes. In the context of embedded processor synthesis, both single-core and many-core, the types of algorithms and demands on the execution efficiency are usually known at the chip design time. This knowledge can be utilised at the design stage to synthesise architectures optimised for energy consumption. Firstly, we present an overview of both traditional energy saving techniques and new developments in architectural approaches to energy-efficient processing. Secondly, we propose a picoMIPS architecture that serves as an architectural template for energy-efficient synthesis. As a case study, we show how the picoMIPS architecture can be tailored to an energy efficient execution of the DCT algorithm

Southampton (e-Prints Soton)

Performance and area evaluations of processor-based benchmarks on FPGA devices

Author: Jiunn-Tyng Kao (7215932)
Publication venue
Publication date: 01/01/2014
Field of study

The computing system on SoCs is being long-term research since the FPGA technology has emerged due to its personality of re-programmable fabric, reconfigurable computing, and fast development time to market. During the last decade, uni-processor in a SoC is no longer to deal with the high growing market for complex applications such as Mobile Phones audio and video encoding, image and network processing. Due to the number of transistors on a silicon wafer is increasing, the recent FPGAs or embedded systems are advancing toward multi-processor-based design to meet tremendous performance and benefit this kind of systems are possible. Therefore, is an upcoming age of the MPSoC. In addition, most of the embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and easy to build homogenous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming a lot more powerful and enable to create datapath of logic units from high-level algorithms such as C to HDL and available for partitioning a HW/SW concurrent methodology. A range of embedded processors is able to implement on a FPGA-based prototyping to integrate the CPUs on a programmable device. This research is, firstly represent different types of computer architectures in modern embedded processors that are followed in different type of software applications (eg. Multi-threading Operations or Complex Functions) on FPGA-based SoCs; and secondly investigate their capability by executing a wide-range of multimedia software codes (Integer-algometric only) in different models of the processor-systems (uni-processor or multi-processor or Co-design), and finally compare those results in terms of the benchmarks and resource utilizations within FPGAs. All the examined programs were written in standard C and executed in a variety numbers of soft-core processors or hardware units to obtain the execution times. However, the number of processors and their customizable configuration or hardware datapath being generated are limited by a target FPGA resource, and designers need to understand the FPGA-based tradeoffs that have been considered - Speed versus Area. For this experimental purpose, I defined benchmarks into DLP / HLS catalogues, which are "data" and "function" intensive respectively. The programs of DLP will be executed in LEON3 MP and LE1 CMP multi-processor systems and the programs of HLS in the LegUp Co-design system on target FPGAs. In preliminary, the performance of the soft-core processors will be examined by executing all the benchmarks. The whole story of this thesis work centres on the issue of the execute times or the speed-up and area breakdown on FPGA devices in terms of different programs

Loughborough University Institutional Repository

Hardware-software co-design of an iris recognition algorithm

Author: Canto Navarro Enrique Fernando
Daugman J.
López García Mariano
Publication venue: 'Institution of Engineering and Technology (IET)'
Publication date: 01/01/2011
Field of study

This paper describes the implementation of an iris recognition algorithm based on hardware-software co-design. The system architecture consists of a general-purpose 32- bit microprocessor and several slave coprocessors that accelerate the most intensive calculations. The whole iris recognition algorithm has been implemented on a low-cost Spartan 3 FPGA, achieving significant reduction in execution time when compared to a conventional software-based application. Experimental results show that with a clock speed of 40 MHz, an IrisCode is obtained in less than 523 ms from an image of 640x480 pixels, which is just 20% of the total time needed by a software solution running on the same microprocessor embedded in the architecture.Peer ReviewedPreprin

LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas

Crossref

UPCommons. Portal del coneixement obert de la UPC