High throughput spatial convolution filters on FPGAs
Digital signal processing (DSP) on field-programmable gate arrays (FPGAs) has long been appealing because of the inherent parallelism in these computations, which can be easily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, including additional hard blocks such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility.
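The core operation this abstract refers to can be illustrated with a minimal sketch. Assuming a grayscale image held as a NumPy array (an assumption for illustration; the paper targets hardware, not Python), direct 2-D spatial convolution reduces to a multiply-accumulate over each kernel window, which is exactly the operation an FPGA DSP block implements:

```python
import numpy as np

def conv2d(image, kernel):
    """Direct 2-D spatial convolution (valid region only)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for r in range(out.shape[0]):
        for c in range(out.shape[1]):
            # Multiply-accumulate over the flipped kernel window --
            # the per-tap operation that DSP hard blocks map in hardware.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel[::-1, ::-1])
    return out
```

A high-throughput FPGA design pipelines these window products across DSP blocks rather than iterating as this reference loop does.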
Image Processing Using FPGAs
This book presents a selection of papers representing current research on using field programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state-of-the-art on image processing using FPGAs.
Hardware/software 2D-3D backprojection on a SoPC platform
The reduction of image reconstruction time is needed to spread the use of PET in research and routine clinical practice. To this end, this article presents a hardware/software architecture for the acceleration of 3D backprojection based upon an efficient 2D backprojection. This architecture has been designed to provide a high level of parallelism through efficient management of memory accesses, which would otherwise be strongly slowed by the external memory. The reconstruction system is embedded in a SoPC (System on Programmable Chip) platform, the new generation of reconfigurable circuit. The originality of this architecture comes from the design of a 2D Adaptive and Predictive Cache (2D-AP Cache), which has proved to be an efficient way to overcome the memory access bottleneck. Thanks to a hierarchical use of this cache, several backprojection operators can run in parallel, notably accelerating the reconstruction process. This 2D reconstruction system will next be used to speed up 3D image reconstruction.
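To make the memory-access problem concrete, here is a minimal parallel-beam 2-D backprojection sketch (an illustrative floating-point reference, not the paper's hardware design; the function name and array layout are assumptions). Each projection is smeared back across the image along its angle; the scattered per-pixel detector reads in the inner step are what the paper's 2D-AP cache is designed to hide:

```python
import numpy as np

def backproject_2d(sinogram, angles, size):
    """Naive parallel-beam 2-D backprojection.

    sinogram: (n_angles, n_detectors) array of projections
    angles:   projection angles in radians
    size:     side length of the square output image
    """
    image = np.zeros((size, size))
    center = (size - 1) / 2.0
    ys, xs = np.mgrid[0:size, 0:size]
    xs = xs - center
    ys = ys - center
    n_det = sinogram.shape[1]
    det_center = (n_det - 1) / 2.0
    for proj, theta in zip(sinogram, angles):
        # Detector coordinate hit by each pixel at this angle;
        # these scattered reads are the memory bottleneck on hardware.
        t = xs * np.cos(theta) + ys * np.sin(theta) + det_center
        idx = np.clip(np.round(t).astype(int), 0, n_det - 1)
        image += proj[idx]
    return image / len(angles)
```

Running several such operators in parallel, as the article describes, is only profitable once each one's detector reads are served from a local cache rather than external memory.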
Realtime image noise reduction FPGA implementation with edge detection
The purpose of this dissertation was to develop and implement, in a Field Programmable Gate Array (FPGA), a noise reduction algorithm for real-time sensor-acquired images. A moving average filter was chosen for its low computational cost, speed, good precision, and low-to-medium hardware resource utilization. The technique is simple to implement; however, if all pixels are indiscriminately filtered, the result is an undesirable blurred image.

Since the human eye is more sensitive to contrast, a technique was introduced to preserve sharp contour transitions, which, in the author's opinion, is the dissertation's contribution. Synthetic and real images were tested. The synthetic images, composed of both sharp and soft tone transitions, were generated with a purpose-built algorithm, while the real images were captured with an 8-kbit (8192 shades) high-resolution sensor scaled up to 10 × 10³ shades.

A least-squares polynomial data smoothing filter, Savitzky-Golay, was used for comparison. It can be adjusted through three degrees of freedom: the window frame length, which varies the size of the filtering relation between a pixel's neighbourhood; the derivative order, which varies the curviness; and the polynomial coefficients, which change the adaptability of the curve. The moving average filter permits only one degree of freedom, the window frame length. Tests revealed promising results with 2nd- and 4th-order polynomials. Higher qualitative results were achieved with Savitzky-Golay, owing to its better preservation of signal characteristics, especially at high frequencies.

The FPGA algorithms were implemented in 64-bit integer registers, serving two purposes: to increase precision, thereby reducing the error relative to a floating-point implementation, and to accommodate the registers' growing cumulative multiplications. Results were then compared with MATLAB's double-precision 64-bit floating-point computations to verify the error difference between the two. The comparison metrics were Mean Squared Error, Signal-to-Noise Ratio, and a similarity coefficient.
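The two filters the dissertation compares can be sketched side by side. This is a floating-point Python illustration, not the 64-bit integer FPGA implementation; the Savitzky-Golay variant below is a minimal least-squares construction (smoothing only, derivative order zero) and the function names are the sketch's own:

```python
import numpy as np

def moving_average(signal, window):
    """Moving-average filter: one degree of freedom, the window length."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

def savgol(signal, window, polyorder):
    """Minimal Savitzky-Golay smoother: fit a polynomial of the given
    order to each window by least squares and keep its centre value."""
    half = window // 2
    # Design matrix of the local polynomial fit over offsets -half..half
    A = np.vander(np.arange(-half, half + 1), polyorder + 1, increasing=True)
    # Row 0 of the pseudoinverse evaluates the fitted polynomial at the
    # window centre, so smoothing reduces to a fixed convolution kernel.
    coeffs = np.linalg.pinv(A)[0]
    return np.convolve(signal, coeffs[::-1], mode="same")
```

The comparison in the abstract falls out directly: a moving average biases any curved signal (it adds the window variance to a quadratic), while a 2nd-order Savitzky-Golay fit reproduces a quadratic exactly away from the edges, which is why it preserves high-frequency detail better.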
Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code
This paper introduces Tiramisu, a polyhedral framework designed to generate
high performance code for multiple platforms including multicores, GPUs, and
distributed machines. Tiramisu introduces a scheduling language with novel
extensions to explicitly manage the complexities that arise when targeting
these systems. The framework is designed for the areas of image processing,
stencils, linear algebra and deep learning. Tiramisu has two main features: it
relies on a flexible representation based on the polyhedral model and it has a
rich scheduling language allowing fine-grained control of optimizations.
Tiramisu uses a four-level intermediate representation that allows full
separation between the algorithms, loop transformations, data layouts, and
communication. This separation simplifies targeting multiple hardware
architectures with the same algorithm. We evaluate Tiramisu by writing a set of
image processing, deep learning, and linear algebra benchmarks and compare them
with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu
matches or outperforms existing compilers and libraries on different hardware
architectures, including multicore CPUs, GPUs, and distributed machines.
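The algorithm/schedule separation the abstract describes can be illustrated without Tiramisu's actual API (which is not shown here). In plain Python, the same matrix-multiply algorithm can be executed under a naive loop nest or a tiled one; a scheduling language lets the programmer pick such transformations without touching the algorithm, and the result is identical:

```python
import numpy as np

def matmul_naive(A, B):
    """The algorithm: C[i, j] = sum_k A[i, k] * B[k, j]."""
    n, m = A.shape
    _, p = B.shape
    C = np.zeros((n, p))
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i, j] += A[i, k] * B[k, j]
    return C

def matmul_tiled(A, B, tile=2):
    """Same algorithm under a tiled schedule: the loop nest is split
    into blocks for locality, but the computed values are unchanged."""
    n, m = A.shape
    _, p = B.shape
    C = np.zeros((n, p))
    for ii in range(0, n, tile):
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):
                    for j in range(jj, min(jj + tile, p)):
                        for k in range(kk, min(kk + tile, m)):
                            C[i, j] += A[i, k] * B[k, j]
    return C
```

A polyhedral compiler reasons about such loop nests symbolically, so the tiling, fusion, or distribution decisions become schedule directives rather than hand-rewritten code.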
Hardware Acceleration in Image Stitching: GPU vs FPGA
Image stitching is a process in which two or more images with an overlapping field of view are combined. This process is commonly used to increase the field of view or image quality of a system. While this process is not particularly difficult for modern personal computers, hardware acceleration is often required to achieve real-time performance in low-power image stitching solutions. In this thesis, two separate hardware-accelerated image stitching solutions are developed and compared. One solution is accelerated using a Xilinx Zynq UltraScale+ ZU3EG FPGA and the other is accelerated using an Nvidia RTX 2070 Super GPU. The image stitching solutions implemented in this thesis increase the system's field of view and involve the end-to-end process of feature detection, image registration, and image mixing. The latency, resource utilization, and power consumption of the accelerated portions of each system are compared, and each system's tradeoffs and use cases are considered.
FPGA implementation and performance comparison of a Bayesian face detection system
Face detection has primarily been a software-based effort. A hardware-based approach can provide significant speed-up over its software counterpart. Advances in transistor technology have made it possible to produce larger and faster FPGAs at more affordable prices. Through VHDL and synthesis tools it is possible to rapidly develop a hardware-based solution to face detection on an FPGA.
This work analyzes and compares the performance of a feature-invariant face detection method implemented in software and an FPGA. The primary components of the face detector were a Bayesian classifier used to segment the image into skin and nonskin pixels, and a direct least square elliptical fitting technique to determine if the skin region's shape has elliptical characteristics similar to a face. The C++ implementation was benchmarked on several high performance workstations, while the VHDL implementation was synthesized for FPGAs from several Xilinx product lines.
The face detector used to compare software and hardware performance had a modest correct detection rate of 48.6% and a false alarm rate of 29.7%. The elliptical shape of the region was determined to be an inaccurate approach for filtering out non-face skin regions. The software-based face detector was capable of detecting faces within images of approximately 378x567 pixels or less at 20 frames per second on Pentium 4 and Pentium D systems. The FPGA-based implementation was capable of faster detection speeds; a speedup of 3.33 was seen on a Spartan 3 and 4.52 on a Virtex 4. The comparison shows that an FPGA-based face detector could provide a significant increase in computational speed.
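The per-pixel stage of such a detector is a Bayes decision rule. As a hedged sketch (the likelihoods would come from trained skin/nonskin colour histograms, which are assumed here rather than reproduced from the thesis), a pixel is labelled skin when its posterior under the skin class exceeds the nonskin posterior:

```python
import numpy as np

def classify_skin(like_skin, like_nonskin, prior_skin=0.5):
    """Bayes decision rule per pixel: label skin iff
    P(colour | skin) * P(skin) > P(colour | nonskin) * P(nonskin).

    like_skin / like_nonskin: per-pixel class likelihoods, e.g. looked
    up from trained colour histograms (assumed available).
    """
    return like_skin * prior_skin > like_nonskin * (1.0 - prior_skin)
```

Because the rule is a pure per-pixel comparison with no data dependence between pixels, it maps naturally onto an FPGA pipeline, which is one source of the speedup the thesis reports.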
Noise Suppression in Images by Median Filter
A new and efficient algorithm for high-density salt-and-pepper noise removal in images and videos is proposed. In the transmission of images over channels, images are corrupted by salt-and-pepper noise due to faulty communications. Salt-and-pepper noise is also referred to as impulse noise. The objective of filtering is to remove the impulses so that the noise-free image is fully recovered with minimum signal distortion. Noise removal can be achieved using a number of existing linear filtering techniques. We deal with images corrupted by salt-and-pepper noise, in which the noisy pixels can take only the maximum or minimum values (i.e. 0 or 255 for 8-bit grayscale images).
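The baseline behind such work is the plain median filter: because a salt-or-pepper pixel is an extreme value, the median of its neighbourhood discards it while preserving edges better than averaging. A minimal reference sketch (not the proposed high-density algorithm, just the standard filter it improves on):

```python
import numpy as np

def median_filter(image, size=3):
    """Replace each pixel with the median of its size x size
    neighbourhood; impulse outliers (0 or 255) are discarded as long
    as they are a minority of the window."""
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.median(padded[r:r + size, c:c + size])
    return out
```

High-density variants refine this by detecting which pixels are impulses (exactly 0 or 255) and filtering only those, since an unconditional median blurs noise-free detail.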