FPGA-based module for SURF extraction

Author: H Bay
Jan Šváb
K Mikolajczyk
Libor Přeučil
Petr Čížek
Sol Pedre
Tomáš Krajník
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 27/02/2014

We present a complete hardware and software solution of an FPGA-based computer vision embedded module capable of carrying out the SURF image feature extraction algorithm. Aside from image analysis, the module embeds a Linux distribution that allows running programs specifically tailored for particular applications. The module is based on a Virtex-5 FXT FPGA, which features powerful configurable logic and an embedded PowerPC processor. We describe the module hardware as well as the custom FPGA image processing cores that implement the algorithm's most computationally expensive process, the interest point detection. The module's overall performance is evaluated and compared to CPU- and GPU-based solutions. Results show that the embedded module achieves comparable distinctiveness to the SURF software implementation running on a standard CPU while being faster and consuming significantly less power and space. Thus, it allows the SURF algorithm to be used in applications with power and spatial constraints, such as autonomous navigation of small mobile robots.
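As a rough illustration of why interest point detection maps well onto hardware: SURF's detector evaluates box filters over an integral image, so any rectangular sum costs four lookups regardless of its size. The sketch below is a minimal software version of that idea only. The function names `integral_image` and `box_sum` are hypothetical and are not taken from the paper's FPGA cores.

```python
def integral_image(img):
    """Return a (h+1) x (w+1) table where ii[y][x] is the sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, top, left, bottom, right):
    """Sum of pixels in rows top..bottom-1 and columns left..right-1 (four lookups)."""
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


if __name__ == "__main__":
    img = [[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]]
    ii = integral_image(img)
    print(box_sum(ii, 1, 1, 3, 3))  # sum of the lower-right 2x2 block
```

A box-filter response is then a weighted combination of a few such `box_sum` calls, which is the kind of regular, fixed-cost arithmetic that suits a hardware pipeline.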

University of Lincoln Institutional Repository

Crossref

Energy Efficient Loop Unrolling for Low-Cost FPGAs

Author: Dumpala Naveen Kumar
Publication venue: ScholarWorks@UMass Amherst
Publication date: 27/10/2017

Many embedded applications implement block ciphers and sorting and searching algorithms which use multiple loop iterations for computation. These applications often demand low power operation. The power consumption of designs varies with the implementation choices made by designers. The sequential implementation of loop operations consumes minimal area, but latency and clock power are high. Alternatively, loop unrolling causes high glitch power. In this work, we propose a low area overhead approach for unrolling loop iterations that exhibits reduced glitch power. A latch-based glitch filter is introduced that reduces the propagation of glitches from one iteration to the next. We explore the optimal number of filters to be inserted for different applications to give a good balance between area and power. We also implement partial unrolling with glitch filters. This approach consumes less area while still giving energy savings comparable to the fully unrolled implementation. Our approach is targeted to Xilinx and Altera FPGAs. We simulate different implementation choices and compare energy results to evaluate the savings. We demonstrate our approach on SIMON-128 and AES-256 block ciphers and a sorting algorithm. We prototype our design on Xilinx Artix-7 and Altera Cyclone-IV-GX FPGA development boards and measure the actual power savings. Results show up to 90% dynamic energy reduction in Xilinx designs, and 97% reduction in Altera designs, with our glitch filtering approach due to glitch power reduction.

ScholarWorks@UMass Amherst

Performance and area evaluations of processor-based benchmarks on FPGA devices

Author: Jiunn-Tyng Kao
Publication date: 01/01/2014

Computing systems on SoCs have been a long-term research topic since FPGA technology emerged, owing to its re-programmable fabric, reconfigurable computing, and fast time to market. Over the last decade, a uni-processor in an SoC has no longer been able to cope with the fast-growing market for complex applications such as mobile-phone audio and video encoding and image and network processing. Because the number of transistors on a silicon wafer keeps increasing, recent FPGAs and embedded systems are advancing toward multi-processor-based designs to meet demanding performance requirements, and such systems have become feasible. This is therefore the upcoming age of the MPSoC. In addition, most embedded processors are soft-cores, because they are flexible and reconfigurable for specific software functions and make it easy to build homogeneous multi-processor systems for parallel programming. Moreover, behavioural synthesis tools are becoming much more powerful: they can create datapaths of logic units from high-level algorithms (for example, C to HDL) and support partitioning in a concurrent HW/SW methodology. A range of embedded processors can be implemented on an FPGA-based prototype to integrate the CPUs on a programmable device. This research first presents the different types of computer architecture in modern embedded processors, which suit different types of software application (e.g. multi-threaded operations or complex functions) on FPGA-based SoCs; second, it investigates their capability by executing a wide range of multimedia software codes (integer arithmetic only) on different processor-system models (uni-processor, multi-processor, or co-design); and finally it compares the results in terms of the benchmarks and resource utilization within FPGAs. All the examined programs were written in standard C and executed on varying numbers of soft-core processors or hardware units to obtain the execution times.
However, the number of processors and their customizable configurations, or the hardware datapaths generated, are limited by the resources of the target FPGA, and designers need to understand the FPGA-based tradeoff considered here: speed versus area. For this experimental purpose, I divided the benchmarks into DLP and HLS categories, which are "data" and "function" intensive respectively. The DLP programs will be executed on the LEON3 MP and LE1 CMP multi-processor systems, and the HLS programs on the LegUp co-design system, on target FPGAs. As a preliminary step, the performance of the soft-core processors will be examined by executing all the benchmarks. This thesis centres on the execution times (or speed-up) and the area breakdown on FPGA devices for the different programs.

Loughborough University Institutional Repository

Comparison of different design alternatives for hardware-in-the-loop of power converters

Author: de Castro Angel
Martínez-García Maria Sofia
Sanchez Alberto
Yushkova Marina
Zamiri Elyas
Publication venue: 'MDPI AG'
Publication date: 13/04/2021

This paper aims to compare different design alternatives of hardware-in-the-loop (HIL) for emulating power converters in Field Programmable Gate Arrays (FPGAs). It proposes various numerical formats (fixed and floating-point) and different approaches (pure VHSIC Hardware Description Language (VHDL), Intellectual Properties (IPs), automated MATLAB HDL code, and High-Level Synthesis (HLS)) to design power converters. Although the proposed models are simple power electronics HIL systems, the idea can be extended to any HIL system. This study compares the design effort of different coding methods and numerical formats considering possible synthesis tools (Precision and Vivado), and it comprises an analytical discussion in terms of area and speed. The different models are synthesized as ad-hoc modules in general-purpose FPGAs, but also using the NI myRIO device as an example of a commercial tool capable of implementing HIL models. The comparison confirms that the optimum design alternative must be chosen based on the application (complexity, frequency, etc.) and designers’ constraints, such as available area, coding expertise, and design effort.

Multidisciplinary Digital Publishing Institute

Biblos-e Archivo

A Methodology to Design Pipelined Simulated Annealing Kernel Accelerators on Space-Borne Field-Programmable Gate Arrays

Author: Carver Jeffrey Michael
Publication venue: DigitalCommons@USU
Publication date: 01/05/2009

Increased levels of science objectives expected from spacecraft systems necessitate the ability to carry out fast on-board autonomous mission planning and scheduling. Heterogeneous radiation-hardened Field Programmable Gate Arrays (FPGAs) with embedded multiplier and memory modules are well suited to support the acceleration of scheduling algorithms. A methodology to design circuits specifically to accelerate Simulated Annealing Kernels (SAKs) in event scheduling algorithms is shown. The main contribution of this thesis is the low-complexity scoring calculation for the heuristic mapping algorithm that balances resource allocation across a coarse-grained pipelined data-path. The methodology was exercised over various kernels with different cost functions and problem sizes. These test cases were benchmarked for execution time, resource usage, power, and energy on a Xilinx Virtex 4 LX QR 200 FPGA and a BAE RAD 750 microprocessor.

DigitalCommons@USU

A Preliminary FPGA Implementation and Analysis of Phatak’s Quotient-First Scaling Algorithm in the Reduced-Precision Residue Number System

Author: Alan T. Sherman
Christopher D. Nguyen
Dhananjay S. Phatak
Steven D. Houston
Publication venue: International Association for Cryptologic Research (IACR)
Publication date: 25/12/2014

We built and tested the first hardware implementation of Phatak’s Quotient-First Scaling (QFS) algorithm in the reduced-precision residue number system (RP-RNS). This algorithm is designed to expedite division in the Residue Number System for the special case when the divisor is known ahead of time (i.e., when the divisor can be considered to be a constant, as in the modular exponentiation required for RSA encryption/decryption). We implemented the QFS algorithm using an FPGA and tested it for operand lengths up to 1024 bits. The RP-RNS modular exponentiation algorithm is not based on Montgomery’s method, but on quotient estimation derived from the straightforward division algorithm, with a substantial amount of precomputation whose results are read from look-up tables at run-time. Phatak’s preliminary analysis indicates that under reasonable assumptions about hardware capabilities, the execution time of a single modular multiplication (or QFS) grows logarithmically with respect to the operand word length. We experimentally confirmed this predicted growth rate of the delay of a modular multiplication with our FPGA implementation. Though our implementation did not outperform the most recent implementations such as that by Gandino et al., we determined that this outcome was solely a consequence of tradeoffs stemming from our decision to store the lookup tables on the FPGA.
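For readers unfamiliar with residue number systems, the sketch below shows the general property that makes them attractive for hardware: multiplication runs independently in each small residue channel, while operations such as division or scaling do not decompose this way and need dedicated algorithms such as QFS. This is not an implementation of QFS or RP-RNS. The moduli and the helper names `to_rns`, `rns_mul`, and `from_rns` are illustrative assumptions, and `pow(x, -1, m)` requires Python 3.8 or later.

```python
from math import prod

# Illustrative pairwise-coprime moduli (an assumption, not from the paper).
MODULI = (7, 11, 13, 15)


def to_rns(x, moduli=MODULI):
    """Represent an integer by its residue in each channel."""
    return tuple(x % m for m in moduli)


def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries cross between channels."""
    return tuple((ai * bi) % m for ai, bi, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI):
    """Reconstruct the integer (mod the product of moduli) via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)  # modular inverse of Mi modulo m
    return total % M


if __name__ == "__main__":
    product = rns_mul(to_rns(123), to_rns(45))
    print(from_rns(product) == 123 * 45)
```

The result is only exact while the true product stays below the product of the moduli (15015 here), which is why practical systems use many more, or much wider, channels.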

Cryptology ePrint Archive

An FPGA architecture design of a high performance adaptive notch filter

Author: Szalkowski Michael James
Publication venue: RIT Scholar Works
Publication date: 01/06/2008

The occurrence of narrowband interference near frequencies carrying information is a common problem in modern control and signal processing applications. A very narrow notch filter is required in order to remove the unwanted signal while not compromising the integrity of the carrier signal. In many practical situations, the interference may wander within a frequency band, in which case a wider notch filter would be needed to guarantee its removal, which may also degrade the information carried in nearby frequencies. If the interference frequency could be autonomously tracked, a narrow-bandwidth notch filter could be successfully implemented for the particular frequency. Adaptive signal processing is a powerful technique that can be used in the tracking and elimination of such a signal. An application where an adaptive notch filter becomes necessary is biomedical instrumentation, such as the electrocardiogram recorder. The recordings can become useless in the presence of electromagnetic fields generated by power lines. Research was conducted to fully characterize the interference. Research on notch filter structures and adaptive filter algorithms has been carried out. The lattice form filter structure was chosen for its inherent stability and performance benefits. A new adaptive filter algorithm was developed targeting a hardware implementation. The algorithm used techniques from several other algorithms that were found to be beneficial. This work developed the hardware implementation of a lattice form adaptive notch filter to be used for the removal of power line interference from electrocardiogram signals. The various design tradeoffs encountered were documented. The final design was targeted toward multiple field programmable gate arrays using multiple optimization efforts. Those results were then compared. The adaptive notch filter was able to successfully track and remove the interfering signal.
The lattice form structure utilized by the proposed filter was verified to exhibit an inherently stable realization. The filter was subjected to various environments that modeled the different power line disturbances that could be present. The final filter design resulted in a 3 dB bandwidth of 15.8908 Hz and a null depth of 54 dB. For the baseline test case, the algorithm achieved convergence after 270 iterations. The final hardware implementation was successfully verified against the MATLAB simulation results. A speedup of 3.8 was seen between the Xilinx Virtex-5 and Spartan-II device technologies. The final design used a small fraction of the available resources for each of the two devices that were characterized. This would allow the component to be more readily added to existing projects, or further optimized by utilizing additional logic.

RIT Scholar Works

Hardware implementation of naive bayes classifier for malware detection

Author: Al Hussein Yahya Khaled
Publication date: 01/01/2021

The naïve Bayes classifier is a probabilistic supervised machine learning algorithm that can run on most general-purpose devices to solve a wide range of classification problems. However, in real-time applications, general-purpose devices are limited in terms of computational throughput, so the algorithm cannot be used for that purpose. The aim of this project is to accelerate the algorithm in hardware to improve its performance by exploring its hidden concurrency and mapping it onto parallel hardware as an optimized IP package suitable for FPGA-SoC applications. It could thus be used as a middlebox system for real-time malware detection. To meet the requirements of this research, the proposed hardware should handle both the training and inference parts in hardware, and should be able to receive a flow of 20 features, each 32 bits in size, organized in 4-gram format. To meet these requirements, an enhanced version of the algorithm was developed and tested in C. An equivalent design with a 5-stage pipelined architecture and single-instruction, multiple-data capabilities was then built in hardware. Finally, the proposed hardware was found to be 65 times faster in terms of computational throughput than an existing design, while keeping accuracy as high as 94% under the conditions of the experiment carried out.
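To make the training and inference split concrete, here is a minimal software sketch of a textbook categorical naïve Bayes classifier with Laplace smoothing. It is not the thesis's enhanced algorithm or its pipelined hardware design. The class name `CategoricalNB`, the smoothing parameter `alpha`, and the toy feature values are all assumptions for illustration. Training only increments counters, and inference only sums log-probabilities per class, which hints at why both stages can be parallelized across features.

```python
import math
from collections import Counter, defaultdict


class CategoricalNB:
    """Textbook categorical naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        self.feature_counts = defaultdict(Counter)  # (label, feature index) -> value counts
        self.values = defaultdict(set)              # feature index -> values seen in training

    def fit(self, X, y):
        # Training: one counter update per feature of each sample.
        for row, label in zip(X, y):
            self.class_counts[label] += 1
            for i, v in enumerate(row):
                self.feature_counts[(label, i)][v] += 1
                self.values[i].add(v)
        return self

    def predict(self, row):
        # Inference: sum log prior and per-feature log likelihoods, keep the best class.
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, v in enumerate(row):
                count = self.feature_counts[(label, i)][v]
                k = len(self.values[i])
                score += math.log((count + self.alpha) / (n + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


if __name__ == "__main__":
    X = [("open", "read"), ("open", "close"), ("exec", "write"), ("exec", "read")]
    y = ["benign", "benign", "malware", "malware"]
    model = CategoricalNB().fit(X, y)
    print(model.predict(("exec", "write")))
```

Because each feature contributes an independent term to the score, the inner loop over features is a natural candidate for SIMD-style evaluation.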

Universiti Teknologi Malaysia Institutional Repository