Hardware Acceleration in Image Stitching: GPU vs FPGA

Author: Edgcombe Joshua David
Publication venue: ScholarWorks@GVSU
Publication date: 26/07/2021

Image stitching is a process where two or more images with an overlapping field of view are combined. This process is commonly used to increase the field of view or image quality of a system. While this process is not particularly difficult for modern personal computers, hardware acceleration is often required to achieve real-time performance in low-power image stitching solutions. In this thesis, two separate hardware-accelerated image stitching solutions are developed and compared. One solution is accelerated using a Xilinx Zynq UltraScale+ ZU3EG FPGA and the other solution is accelerated using an Nvidia RTX 2070 Super GPU. The image stitching solutions implemented in this thesis increase the system's field of view and involve the end-to-end process of feature detection, image registration, and image mixing. The latency, resource utilization, and power consumption for the accelerated portions of each system are compared, and each system's tradeoffs and use cases are considered.

Accelerating Pattern Recognition Algorithms On Parallel Computing Architectures

Author: Rice Kenneth
Publication venue: Clemson University Libraries
Publication date: 01/12/2011

The move to more parallel computing architectures places more responsibility on the programmer to achieve greater performance. The programmer must now have a greater understanding of the underlying architecture and the inherent algorithmic parallelism. Using parallel computing architectures for exploiting algorithmic parallelism can be a complex task. This dissertation demonstrates various techniques for using parallel computing architectures to exploit algorithmic parallelism. Specifically, three pattern recognition (PR) approaches are examined for acceleration across multiple parallel computing architectures, namely field programmable gate arrays (FPGAs) and general purpose graphical processing units (GPGPUs). Phase-only filter correlation for fingerprint identification was studied as the first PR approach. This approach's sensitivity to angular rotations, scaling, and missing data was surveyed. Additionally, a novel FPGA implementation of this algorithm was created using fixed point computations, deep pipelining, and four computation phases. Communication and computation were overlapped to efficiently process large fingerprint galleries. The FPGA implementation showed approximately a 47 times speedup over a central processing unit (CPU) implementation with negligible impact on precision. For the second PR approach, a spiking neural network (SNN) algorithm for a character recognition application was examined. A novel FPGA implementation of the approach was developed incorporating a scalable modular SNN processing element (PE) to efficiently perform neural computations. The modular SNN PE incorporated streaming memory, fixed point computation, and deep pipelining. This design showed speedups of approximately 3.3 and 8.5 times over CPU implementations for 624 and 9,264 sized neural networks, respectively. Results indicate that the PE design could scale to process larger sized networks easily.
Finally, for the third PR approach, cellular simultaneous recurrent networks (CSRNs) were investigated for GPGPU acceleration. Particularly, the applications of maze traversal and face recognition were studied. Novel GPGPU implementations were developed employing varying quantities of task-level, data-level, and instruction-level parallelism to achieve efficient runtime performance. Furthermore, the performance of the face recognition application was examined across a heterogeneous cluster of multi-core and GPGPU architectures. A combination of multi-core processors and GPGPUs achieved roughly a 996 times speedup over a single-core CPU implementation. From examining these PR approaches for acceleration, this dissertation presents useful techniques and insight applicable to other algorithms to improve performance when designing a parallel implementation.
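
For readers unfamiliar with the phase-only correlation mentioned in the abstract above, the following is a minimal one-dimensional sketch of the general technique, not the dissertation's fixed-point FPGA design. It assumes equal-length real signals and a circular shift, and uses a naive standard-library DFT for self-containment. The function names `dft`, `phase_only_correlation`, and `estimate_shift` are hypothetical and do not come from the source.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform using only the standard library."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def phase_only_correlation(f, g, eps=1e-12):
    """Return the phase-only correlation of two equal-length real signals."""
    F, G = dft(f), dft(g)
    # Cross-power spectrum: F times the complex conjugate of G.
    cross = [a * b.conjugate() for a, b in zip(F, G)]
    # Discard magnitude so that only phase differences remain.
    # eps (an assumed guard value) avoids division by zero in empty bins.
    normalised = [c / max(abs(c), eps) for c in cross]
    return [v.real for v in dft(normalised, inverse=True)]


def estimate_shift(f, g):
    """Index of the correlation peak, i.e. the circular shift of f relative to g."""
    poc = phase_only_correlation(f, g)
    return max(range(len(poc)), key=poc.__getitem__)


if __name__ == "__main__":
    g = [3, 1, 4, 1, 5, 9, 2, 6]
    f = [g[(t - 3) % len(g)] for t in range(len(g))]  # g shifted right by 3
    print(estimate_shift(f, g))
```

Because the magnitude is normalised away, the correlation peak depends only on the relative displacement of the two signals. A practical fingerprint matcher would work on two-dimensional images with a fast Fourier transform instead of this naive loop.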

Exploiting run-time reconfigurable hardware in the development of automatic fingerprint-based personal recognition applications

Authors: Francisco Fons, Mariano Fons
Publication venue: IntechOpen
Publication date: 27/07/2011

Starlight: A kernel optimizer for GPU processing

Authors: Alberto Zeni, Davide Conficconi, Eleonora D'Arnese, Emanuele del Sozzo, Marco Domenico Santambrogio
Publication date: 01/01/2024

Archivio istituzionale della ricerca - Politecnico di Milano

Low power and high performance heterogeneous computing on FPGAs

Author: Ma Liang
Publication venue: Politecnico di Torino

The abstract is in the attachment.

PORTO@iris (Publications Open Repository TOrino - Politecnico di Torino)