1,358 research outputs found

    High throughput spatial convolution filters on FPGAs

    Digital signal processing (DSP) on field-programmable gate arrays (FPGAs) has long been appealing because the inherent parallelism in these computations can be readily exploited to accelerate such algorithms. FPGAs have evolved significantly to further enhance the mapping of these algorithms, including the addition of hard blocks such as the DSP blocks found in modern FPGAs. Although these DSP blocks can offer more efficient mapping of DSP computations, they are primarily designed for 1-D filter structures. We present a study on spatial convolutional filter implementations on FPGAs, optimizing around the structure of the DSP blocks to offer high throughput while maintaining the coefficient flexibility that other published architectures usually sacrifice. We show that it is possible to implement large filters for large 4K resolution image frames at frame rates of 30–60 FPS, while maintaining functional flexibility.
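To give a sense of the workload the abstract describes, here is a minimal software sketch of a direct 2-D spatial filter together with a rough multiply-accumulate (MAC) budget. It is a reference model only, not the paper's FPGA architecture. The function names, the zero-padded border handling, and the 3840 × 2160 frame size taken for "4K" are assumptions made for illustration.

```python
def convolve2d(image, kernel):
    """Direct 2-D spatial filter (correlation form) with zero-padded borders.

    `image` and `kernel` are lists of lists of numbers; the kernel is assumed
    to have odd height and width. Border handling is an assumption here.
    """
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    ph, pw = kh // 2, kw // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for i in range(kh):
                for j in range(kw):
                    yy, xx = y + i - ph, x + j - pw
                    # Pixels outside the frame contribute zero.
                    if 0 <= yy < h and 0 <= xx < w:
                        acc += kernel[i][j] * image[yy][xx]
            out[y][x] = acc
    return out


def macs_per_second(width, height, fps, k):
    """MACs per second needed by a k x k filter at one output per pixel."""
    return width * height * fps * k * k


identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
frame = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
filtered = convolve2d(frame, identity)

# Assumed 4K frame size of 3840 x 2160 at 60 FPS with a 3 x 3 kernel.
budget = macs_per_second(3840, 2160, 60, 3)
```

Under these assumptions, a 3 × 3 kernel at 60 FPS needs about 4.5 billion MACs per second, and the figure grows with the square of the kernel width, which is why mapping the inner loops onto parallel hardware multipliers matters.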

    Image Processing Using FPGAs

    This book presents a selection of papers representing current research on using field-programmable gate arrays (FPGAs) for realising image processing algorithms. These papers are reprints of papers selected for a Special Issue of the Journal of Imaging on image processing using FPGAs. A diverse range of topics is covered, including parallel soft processors, memory management, image filters, segmentation, clustering, image analysis, and image compression. Applications include traffic sign recognition for autonomous driving, cell detection for histopathology, and video compression. Collectively, they represent the current state of the art in image processing using FPGAs.

    Hardware/software 2D-3D backprojection on a SoPC platform

    Reducing image reconstruction time is necessary to spread the use of PET in research and routine clinical practice. To this end, this article presents a hardware/software architecture for accelerating 3D backprojection based on an efficient 2D backprojection. The architecture is designed to provide a high level of parallelism through efficient management of memory accesses, which would otherwise be strongly slowed by the external memory. The reconstruction system is embedded in a SoPC (System on Programmable Chip) platform, the new generation of reconfigurable circuits. The originality of this architecture lies in the design of a 2D Adaptative and Predictive Cache (2D-AP Cache), which has proved to be an efficient way to overcome the memory access bottleneck. Thanks to a hierarchical use of this cache, several backprojection operators can run in parallel, noticeably accelerating the reconstruction process. This 2D reconstruction system will next be used to speed up 3D image reconstruction.

    Realtime image noise reduction FPGA implementation with edge detection

    The purpose of this dissertation was to develop and implement, in a Field Programmable Gate Array (FPGA), a noise reduction algorithm for real-time sensor-acquired images. A Moving Average filter was chosen for its low computational cost, speed, good precision, and low-to-medium hardware resource utilization. The technique is simple to implement; however, if all pixels are filtered indiscriminately, the result is an undesirably blurry image. Since the human eye is more sensitive to contrast, a technique was introduced to preserve sharp contour transitions, which, in the author's opinion, is the main contribution of the dissertation. Both synthetic and real images were tested. The synthetic images, composed of both sharp and soft tone transitions, were generated with a purpose-built algorithm, while the real images were captured with an 8-kbit (8192 shades) high-resolution sensor and scaled up to 10 × 10³ shades. A least-squares polynomial data smoothing filter, Savitzky-Golay, was used for comparison. It can be adjusted through 3 degrees of freedom: the window frame length, which varies the size of the filtering relation between neighboring pixels; the derivative order, which varies the curvature; and the polynomial coefficients, which change how closely the curve adapts to the points being smoothed. The Moving Average filter permits only one degree of freedom, the window frame length. Tests revealed promising results with the 2nd and 4th polynomial orders. Savitzky-Golay gave qualitatively better results, owing to its better preservation of signal characteristics, especially at high frequencies. The FPGA algorithms were implemented in 64-bit integer (fixed-point) registers for two purposes: to increase precision, reducing the error relative to floating-point registers, and to accommodate the cumulative growth of the multiplications. The results were then compared with MATLAB's double-precision 64-bit floating-point computations to verify the error difference between the two. The comparison parameters used were Mean Squared Error, Signal-to-Noise Ratio, and Similarity coefficient.
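The abstract does not say how the dissertation decides which pixels to leave unfiltered, so the following is only a one-dimensional sketch of the general idea under an assumed rule: a window is averaged unless the spread of its values exceeds a threshold, in which case the centre sample is kept. The function name, the `edge_threshold` parameter, and the spread test are hypothetical. Integer division stands in for the integer arithmetic an FPGA implementation would use.

```python
def edge_preserving_moving_average(samples, window=3, edge_threshold=50):
    """Moving average that skips windows likely to contain an edge.

    Assumed rule (not taken from the dissertation): if max - min inside the
    window exceeds `edge_threshold`, the centre sample is passed through
    unchanged; otherwise it is replaced by the integer mean of the window.
    Windows are truncated at the ends of the signal.
    """
    half = window // 2
    out = []
    for i, centre in enumerate(samples):
        lo = max(0, i - half)
        hi = min(len(samples), i + half + 1)
        neighbourhood = samples[lo:hi]
        if max(neighbourhood) - min(neighbourhood) > edge_threshold:
            out.append(centre)  # sharp transition: keep the original value
        else:
            out.append(sum(neighbourhood) // len(neighbourhood))
    return out


# A noisy step: small ripple on each plateau, one sharp transition.
signal = [10, 12, 11, 200, 201, 199]
smoothed = edge_preserving_moving_average(signal)
```

With this input the ripple on each plateau is flattened while the step between the third and fourth samples stays sharp, which is the behaviour the abstract describes qualitatively.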

    Tiramisu: A Polyhedral Compiler for Expressing Fast and Portable Code

    This paper introduces Tiramisu, a polyhedral framework designed to generate high performance code for multiple platforms including multicores, GPUs, and distributed machines. Tiramisu introduces a scheduling language with novel extensions to explicitly manage the complexities that arise when targeting these systems. The framework is designed for the areas of image processing, stencils, linear algebra and deep learning. Tiramisu has two main features: it relies on a flexible representation based on the polyhedral model and it has a rich scheduling language allowing fine-grained control of optimizations. Tiramisu uses a four-level intermediate representation that allows full separation between the algorithms, loop transformations, data layouts, and communication. This separation simplifies targeting multiple hardware architectures with the same algorithm. We evaluate Tiramisu by writing a set of image processing, deep learning, and linear algebra benchmarks and compare them with state-of-the-art compilers and hand-tuned libraries. We show that Tiramisu matches or outperforms existing compilers and libraries on different hardware architectures, including multicore CPUs, GPUs, and distributed machines. Comment: arXiv admin note: substantial text overlap with arXiv:1803.0041

    Hardware Acceleration in Image Stitching: GPU vs FPGA

    Image stitching is a process in which two or more images with overlapping fields of view are combined. It is commonly used to increase the field of view or image quality of a system. While this process is not particularly difficult for modern personal computers, hardware acceleration is often required to achieve real-time performance in low-power image stitching solutions. In this thesis, two separate hardware-accelerated image stitching solutions are developed and compared. One solution is accelerated using a Xilinx Zynq UltraScale+ ZU3EG FPGA and the other using an Nvidia RTX 2070 Super GPU. The image stitching solutions implemented in this thesis increase the system's field of view and involve the end-to-end process of feature detection, image registration, and image mixing. The latency, resource utilization, and power consumption of the accelerated portions of each system are compared, and each system's tradeoffs and use cases are considered.

    FPGA implementation and performance comparison of a Bayesian face detection system

    Face detection has primarily been a software-based effort. A hardware-based approach can provide significant speed-up over its software counterpart. Advances in transistor technology have made it possible to produce larger and faster FPGAs at more affordable prices. Through VHDL and synthesis tools it is possible to rapidly develop a hardware-based solution to face detection on an FPGA. This work analyzes and compares the performance of a feature-invariant face detection method implemented in software and on an FPGA. The primary components of the face detector were a Bayesian classifier used to segment the image into skin and non-skin pixels, and a direct least-squares elliptical fitting technique to determine whether the skin region's shape has elliptical characteristics similar to a face. The C++ implementation was benchmarked on several high-performance workstations, while the VHDL implementation was synthesized for FPGAs from several Xilinx product lines. The face detector used to compare software and hardware performance had a modest correct detection rate of 48.6% and a false alarm rate of 29.7%. The elliptical shape of the region was determined to be an inaccurate criterion for filtering out non-face skin regions. The software-based face detector was capable of detecting faces within images of approximately 378x567 pixels or less at 20 frames per second on Pentium 4 and Pentium D systems. The FPGA-based implementation was capable of faster detection speeds; a speedup of 3.33 was seen on a Spartan 3 and 4.52 on a Virtex 4. The comparison shows that an FPGA-based face detector could provide a significant increase in computational speed.

    Noise Suppression in Images by Median Filter

    A new and efficient algorithm for removing high-density salt-and-pepper noise from images and videos is proposed. When images are transmitted over channels, they are corrupted by salt-and-pepper noise due to faulty communications. Salt-and-pepper noise is also referred to as impulse noise. The objective of filtering is to remove the impulses so that the noise-free image is fully recovered with minimum signal distortion. Noise removal can be achieved using a number of existing linear filtering techniques. We deal with images corrupted by salt-and-pepper noise in which the noisy pixels can take only the maximum or minimum values (i.e. 0 or 255 for 8-bit grayscale images).
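The abstract does not describe the proposed algorithm itself, so the sketch below shows a commonly used switching-median variant rather than the authors' method. Only pixels equal to the extreme values (0 or 255, as stated in the abstract) are treated as corrupted, and each is replaced by the median of the uncorrupted pixels in its 3 × 3 neighbourhood. The function name, the window size, and the fallback of leaving a pixel unchanged when every neighbour is also an impulse are assumptions.

```python
from statistics import median


def switching_median_filter(image, lo=0, hi=255):
    """Replace impulse pixels (value lo or hi) by a neighbourhood median.

    Illustrative variant, not the algorithm from the abstract: pixels that
    are not at an extreme value are left untouched, and the median is taken
    only over non-impulse pixels in the 3 x 3 window around the pixel.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            if image[y][x] not in (lo, hi):
                continue  # assumed clean, keep as is
            clean = [
                image[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
                if image[yy][xx] not in (lo, hi)
            ]
            if clean:  # assumption: leave the pixel alone if no clean neighbour
                out[y][x] = int(median(clean))
    return out


noisy = [
    [100, 100, 100],
    [100, 255, 100],
    [100, 0, 100],
]
restored = switching_median_filter(noisy)
```

Restricting the median to pixels flagged as impulses avoids blurring clean regions, at the cost of misclassifying genuine pixels that happen to sit at 0 or 255.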