590 research outputs found

    FPGA implementation of a memory-efficient Hough Parameter Space for the detection of lines

    The Line Hough Transform (LHT) is a robust and accurate line detection algorithm, useful for applications such as lane detection in Advanced Driver Assistance Systems. For real-time implementation, the LHT is demanding in terms of computation and memory, and hence Field Programmable Gate Arrays (FPGAs) are often deployed. However, many small FPGAs are incapable of implementing the LHT due to the large memory requirement of the Hough Parameter Space (HPS). This paper presents a memory-efficient architecture of the LHT named the Angular Regions - Line Hough Transform (AR-LHT). We present a suitable FPGA implementation of the AR-LHT and provide a performance and resource analysis after targeting a Xilinx xc7z010-1 device. Results demonstrate that, for an image of 1024 x 1024 pixels, approximately 48% less memory is used than by the Standard LHT. The FPGA architecture is capable of processing a single image in 9.03 ms.
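To give a feel for why the Hough Parameter Space is memory-hungry, the following minimal sketch estimates the size of a standard (rho, theta) accumulator. The 1-degree angular resolution, 1-pixel rho resolution, and the counter-width rule are assumptions made for illustration; they are not taken from the paper, and the function name `hps_memory_bits` is hypothetical.

```python
import math


def hps_memory_bits(width, height, theta_bins=180, rho_step=1.0):
    """Estimate the bits needed by a standard (rho, theta) Hough accumulator.

    Assumptions (illustrative only): rho spans [-diag, +diag] in steps of
    rho_step, theta spans theta_bins bins, and every bin has a counter wide
    enough to hold `diag` votes (an upper bound on pixels along one line).
    """
    diag = math.ceil(math.hypot(width, height))   # longest possible |rho|
    rho_bins = 2 * math.ceil(diag / rho_step) + 1  # negative, zero, positive
    counter_bits = diag.bit_length()               # bits to count up to diag
    return theta_bins * rho_bins * counter_bits


bits = hps_memory_bits(1024, 1024)
print(f"{bits} bits = {bits / 8 / 1024:.1f} KiB")
```

Under these assumptions a 1024 x 1024 image already needs several megabits of accumulator storage, which is the kind of figure that strains the block RAM of a small device.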

    A template-based methodology for efficient microprocessor and FPGA accelerator co-design

    Embedded applications usually require Software/Hardware (SW/HW) designs to meet hard timing constraints and the required design flexibility. Exhaustive exploration of SW/HW designs is a very time-consuming task, while ad hoc approaches and the use of partially automatic tools usually lead to less efficient designs. To support a more efficient codesign process for FPGA platforms, we propose a systematic methodology to map an application to a SW/HW platform with a custom HW accelerator and a microprocessor core. The mapping steps of the methodology are expressed through parametric templates for the SW/HW Communication Organization, the Foreground (FG) Memory Management and the Data Path (DP) Mapping. Several performance-area tradeoff design Pareto points are produced by instantiating the templates. A real-time bioimaging application is mapped on an FPGA to evaluate the gains of our approach, i.e. 44.8% on performance compared with pure SW designs and 58% on area compared with pure HW designs.
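The notion of performance-area Pareto points can be illustrated with a short sketch. The design names and the (execution time, area) values below are invented for illustration and do not come from the paper; `pareto_front` is a hypothetical helper that keeps every design not dominated by another one.

```python
def pareto_front(designs):
    """Return the designs that no other design beats on both time and area.

    Each design is a (name, exec_time, area) tuple; lower is better for both.
    A design is dominated if another one is no worse on both metrics and
    strictly better on at least one.
    """
    front = []
    for name, time, area in designs:
        dominated = any(
            (t <= time and a <= area) and (t < time or a < area)
            for _, t, a in designs
        )
        if not dominated:
            front.append((name, time, area))
    return front


# Hypothetical design points: (name, execution time, area units).
candidates = [
    ("sw_only", 100.0, 10),
    ("hw_only", 20.0, 100),
    ("mixed_a", 55.0, 42),
    ("mixed_b", 60.0, 50),  # slower and larger than mixed_a
]
print(pareto_front(candidates))
```

Here `mixed_b` is dropped because `mixed_a` is both faster and smaller, while the pure SW and pure HW points stay on the front as the two extremes of the tradeoff.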

    Hough Transform Proposal and Simulations for Particle Track Recognition for LHC Phase-II Upgrade

    In the near future, the LHC experiments will undergo further upgrades to overcome the technological obsolescence of the detectors and the readout capabilities. Therefore, after the conclusion of a data collection period, CERN will have to face a long shutdown to improve overall performance by updating the experiments and implementing more advanced technologies and infrastructures. In particular, the largest LHC experiment, ATLAS, will upgrade parts of the detector, the trigger, and the data acquisition system. In addition, the ATLAS experiment will complete the implementation of new strategies and algorithms for data handling and transmission to the final storage apparatus. This paper presents an overview of an upgrade planned for the second half of this decade for the ATLAS experiment. In particular, we show a study of a novel pattern recognition algorithm used in the trigger system, which is a device designed to provide the information needed to separate physical events from unnecessary background data. The idea is to use a well-known mathematical transform, the Hough transform, as the algorithm for the detection of particle trajectories. The effectiveness of the algorithm has already been validated outside particle physics, for recognizing generic shapes within images. Here, by contrast, we first propose a software emulation tool and a subsequent hardware implementation of the Hough transform for particle physics applications. Until now, the Hough transform has never been implemented on electronics in particle physics experiments, and since a hardware implementation would provide benefits in terms of overall latency, we complete the studies by comparing the simulated data with a physical system implemented on a Xilinx hardware accelerator (FELIX-II card). In more detail, we have implemented a low-abstraction RTL design of the Hough transform on Xilinx UltraScale+ FPGAs as target devices for filtering applications.

    Hardware acceleration of the trace transform for vision applications

    Computer Vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images in their pixels and bits serve no purpose of their own. It is only by interpreting the data and extracting higher-level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform for real-time performance, while also allowing an element of flexibility, which suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required for a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.

    Implementation of a real time Hough transform using FPGA technology

    This thesis is concerned with the modelling, design and implementation of efficient architectures for performing the Hough Transform (HT) on mega-pixel resolution real-time images using Field Programmable Gate Array (FPGA) technology. Although the HT has been around for many years and a number of algorithms have been developed, it still remains a significant bottleneck in many image processing applications. Although the basic idea of the HT is to locate curves in an image that can be parameterized in a suitable parameter space (e.g. straight lines, polynomials or circles), the research presented in this thesis focuses only on the location of straight lines in binary images. The HT algorithm uses an accumulator array (accumulator bins) to detect the existence of a straight line in an image. As the image needs to be binarized, a novel generic synchronization circuit for windowing operations was designed to perform edge detection. An edge detection method of special interest, the Canny method, is used, and its design and implementation in hardware are achieved in this thesis. As each image pixel can be processed independently, parallel processing can be performed. However, the main disadvantage of the HT is its large storage and computational requirements. This thesis presents new and state-of-the-art hardware implementations for the minimization of the computational cost, using the Hybrid-Logarithmic Number System (Hybrid-LNS) for calculating the HT for fixed bit-width architectures. It is shown that using the Hybrid-LNS the computational cost is minimized, while the precision of the HT algorithm is maintained. Advances in FPGA technology now make it possible to implement functions such as the HT in reconfigurable fabrics. Methods for storing large arrays on FPGAs are presented, where data from a 1024 x 1024 pixel camera at a rate of up to 25 frames per second are processed.
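The accumulator-array voting described above can be sketched in software as follows. This is the textbook (rho, theta) formulation written with NumPy, not the thesis's Hybrid-LNS hardware design; the function name `hough_accumulate` and the 1-degree default resolution are assumptions for illustration.

```python
import numpy as np


def hough_accumulate(edges, theta_bins=180):
    """Vote every edge pixel of a binary image into a (rho, theta) accumulator.

    Uses rho = x*cos(theta) + y*sin(theta), rounded to the nearest integer.
    Rows of the accumulator are offset by `diag` so negative rho values fit.
    """
    height, width = edges.shape
    diag = int(np.ceil(np.hypot(height, width)))
    thetas = np.deg2rad(np.arange(theta_bins) * (180.0 / theta_bins))
    cos_t, sin_t = np.cos(thetas), np.sin(thetas)
    acc = np.zeros((2 * diag + 1, theta_bins), dtype=np.int32)
    cols = np.arange(theta_bins)

    ys, xs = np.nonzero(edges)
    for x, y in zip(xs, ys):
        # Each pixel votes independently: one bin per theta column.
        rho_idx = np.rint(x * cos_t + y * sin_t).astype(int) + diag
        acc[rho_idx, cols] += 1
    return acc, thetas, diag


# A horizontal line at y = 5 in a 20 x 20 binary image.
image = np.zeros((20, 20), dtype=bool)
image[5, :] = True
acc, thetas, diag = hough_accumulate(image)
print(acc[diag + 5, 90])  # votes in the bin for rho = 5, theta = 90 degrees
```

Because each pixel's votes depend only on its own coordinates, the loop body is the part that a hardware design can replicate and run in parallel, while the accumulator is the part that dominates storage.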

    Simulated Hough Transform Model Optimized for Straight-Line Recognition Using Frontier FPGA Devices

    The use of the Hough transform to identify shapes or images has been extensively studied in the past using software for artificial intelligence applications. In this article, we present a generalization of the goal of shape recognition using the Hough transform, applied to a broader range of real problems. A software simulator was developed to generate input patterns (straight lines) and test the ability of a generic low-latency system to identify these lines: first in a clean environment with no other inputs, and then looking for the same lines as ambient background noise increases. In particular, the paper presents a study to optimize the implementation of the Hough transform algorithm in programmable digital devices, such as FPGAs. We investigated the ability of the Hough transform to discriminate straight lines within a vast bundle of random lines, emulating a noisy environment. In more detail, the study follows an extensive investigation we recently conducted to recognize tracks of ionizing particles in high-energy physics. In this field, the lines can represent the trajectories of particles that must be immediately recognized as they are created in a particle detector. The main advantage of using FPGAs over any other component is their speed and low latency when investigating pattern recognition problems in a noisy environment. In fact, FPGAs guarantee a latency that increases linearly with the incoming data, while the latency of other solutions increases more quickly. Furthermore, the HT inherently adapts to incomplete input data sets, especially if resolutions are limited. Hence, an FPGA system that implements the HT is inefficient for small sets of input data but becomes more cost-effective as the size of the input data increases. The article first presents an example that uses a large Accumulator consisting of 1100 x 600 Bins and several sets of input data to validate the Hough transform algorithm as random noise increases to 80% of the input data. Then, a more specifically dedicated input set was chosen to emulate a real situation where a Xilinx UltraScale+ was to be used as the final target device. Thus, we have reduced the Accumulator to 280 x 280 Bins, using a clock signal at 250 MHz and a few tens of input points. Under these conditions, the behavior of the firmware matched the software simulations, confirming the feasibility of the HT implementation on FPGA.
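The kind of experiment described here, a known straight line buried in random background points, can be mimicked with a small simulation. Everything below is an illustrative assumption rather than the authors' simulator: the 100 x 100 region, the 20 signal points on the line y = 30, the fixed seed, and the helper names `make_inputs` and `vote`.

```python
import numpy as np


def make_inputs(n_signal=20, noise_fraction=0.8, extent=100.0, seed=0):
    """Build signal points on the line y = 30 plus uniform random noise.

    noise_fraction is the share of noise in the total input, so 0.8 with
    20 signal points yields 80 noise points.
    """
    rng = np.random.default_rng(seed)
    signal = np.column_stack(
        [np.arange(n_signal, dtype=float), np.full(n_signal, 30.0)]
    )
    n_noise = int(round(n_signal * noise_fraction / (1.0 - noise_fraction)))
    noise = rng.uniform(0.0, extent, size=(n_noise, 2))
    return signal, noise


def vote(points, extent, theta_bins=180):
    """Accumulate rho = x*cos(theta) + y*sin(theta) votes for (x, y) points."""
    diag = int(np.ceil(np.hypot(extent, extent)))
    thetas = np.deg2rad(np.arange(theta_bins) * (180.0 / theta_bins))
    acc = np.zeros((2 * diag + 1, theta_bins), dtype=np.int32)
    cols = np.arange(theta_bins)
    for x, y in points:
        rho_idx = np.rint(x * np.cos(thetas) + y * np.sin(thetas)).astype(int)
        acc[rho_idx + diag, cols] += 1
    return acc, diag


signal, noise = make_inputs()
acc, diag = vote(np.vstack([signal, noise]), extent=100.0)
print("votes in the true-line bin:", acc[diag + 30, 90])
print("largest bin anywhere:", acc.max())
```

All 20 signal points land in the bin for rho = 30 at 90 degrees, whereas the noise votes are spread thinly over the whole accumulator, which is why the line can remain detectable even when most of the input is background.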

    Hough Transform recursive evaluation using Distributed Arithmetic

    Paper submitted to the IFIP International Conference on Very Large Scale Integration (VLSI-SOC), Darmstadt, Germany, 2003. The Hough Transform (HT) is a useful technique in image segmentation, specifically for geometrical primitive detection. A Convolution-Based Recursive Method (CBRM) is presented for generic function evaluation. In this approach, calculations are carried out by a unique parametric formula which provides all function points by successive iteration. The case of combined trigonometric functions involved in the calculation of the HT is analyzed under this scope. An architecture for reconfigurable FPGA-based hardware, using Distributed Arithmetic (DA), implements the design. It provides memory and hardware resource savings as well as speed improvements, according to the experiments carried out with the HT.
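The abstract does not give the CBRM formula, so the sketch below swaps in a different, well-known technique to show what "all function points by successive iteration" can look like for the HT: the three-term recurrence rho[n+1] = 2*cos(delta)*rho[n] - rho[n-1], which holds for any combination of cos and sin sampled at equal angular steps delta. The function name `rho_recursive` is hypothetical.

```python
import math


def rho_recursive(x, y, theta_bins=180):
    """Evaluate rho(theta) = x*cos(theta) + y*sin(theta) at equal steps.

    Only the first two values use trigonometric calls; every later value
    comes from one multiply and one subtract. This is the standard
    three-term recurrence, used here as a stand-in, not the paper's CBRM.
    Assumes theta_bins >= 2.
    """
    delta = math.pi / theta_bins
    k = 2.0 * math.cos(delta)
    rhos = [float(x), x * math.cos(delta) + y * math.sin(delta)]
    for _ in range(2, theta_bins):
        rhos.append(k * rhos[-1] - rhos[-2])
    return rhos


def rho_direct(x, y, theta_bins=180):
    """Reference evaluation with one cos and one sin call per angle."""
    delta = math.pi / theta_bins
    return [x * math.cos(n * delta) + y * math.sin(n * delta)
            for n in range(theta_bins)]


recursive = rho_recursive(100, 50)
direct = rho_direct(100, 50)
print(max(abs(a - b) for a, b in zip(recursive, direct)))
```

The appeal for hardware is that the per-angle cost drops to a constant multiplication and a subtraction, at the price of rounding error that accumulates along the iteration and has to be budgeted for in a fixed-point design.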

    High speed event-based visual processing in the presence of noise

    Standard machine vision approaches are challenged in applications where large amounts of noisy temporal data must be processed in real time. This work aims to develop neuromorphic event-based processing systems for such challenging, high-noise environments. The novel event-based, application-focused algorithms developed are primarily designed for implementation in digital neuromorphic hardware, with a focus on noise robustness, ease of implementation, operationally useful ancillary signals and processing speed in embedded systems.

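    A flexible, heterogeneous image processing framework for space-based, reconfigurable data processing modules (original title: Ein flexibles, heterogenes Bildverarbeitungs-Framework für weltraumbasierte, rekonfigurierbare Datenverarbeitungsmodule)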

    Scientific instruments carried as payload on current space missions are often equipped with high-resolution sensors. Camera-based instruments in particular produce a vast amount of data. To obtain the desired scientific information, this data is usually processed on the ground. Due to the large distances involved in missions within the solar system, the data rate for the downlink to the ground station is strictly limited. The volume of scientifically relevant data is usually much smaller than that of the raw data obtained. Therefore, processing already has to be carried out on board the spacecraft. An example of such an instrument is the Polarimetric and Helioseismic Imager (PHI) on board Solar Orbiter. For acquisition, storage and processing of images, the instrument is equipped with a Data Processing Module (DPM). It makes use of heterogeneous computing based on a dedicated LEON3 processor in combination with two reconfigurable Xilinx Virtex-4 Field-Programmable Gate Arrays (FPGAs). The thesis provides an overview of the available space-grade processing components (processors and FPGAs) which fulfill the requirements of deep-space missions. It also presents existing processing platforms which are based upon a heterogeneous system combining processors and FPGAs. This includes the DPM of the PHI instrument, whose architecture is introduced in detail. As the core contribution of this thesis, a framework is presented which enables high-performance image processing on such hardware-based systems while retaining software-like flexibility. This framework mainly consists of a variety of modules for hardware acceleration which are integrated seamlessly into the data flow of the on-board software. In addition, it makes extensive use of the dynamic in-flight reconfigurability of the Virtex-4 FPGAs used. The flexibility of the presented framework is demonstrated by means of multiple examples from the image processing of the PHI instrument. The framework is analyzed with respect to processing performance as well as power consumption.

    Development of an FPGA-based Data Reduction System for the Belle II DEPFET Pixel Detector

    The innermost two layers of the Belle II detector at the KEKB collider in Tsukuba, Japan will be covered by highly granular DEPFET pixel sensors. The large number of pixels leads to a maximum data rate of 256 Gbps, which has to be significantly reduced by the Data Acquisition System. For data reduction, the hit information of the silicon-strip vertex detector (SVD) surrounding the pixel detector is used to define so-called Regions of Interest (ROIs) in the pixel detector. Only the hit information of pixels located inside these ROIs is saved. The ROIs for the pixel detector are computed by reconstructing track segments from strip data and extrapolating them to the pixel detector. The goal is to achieve a reduction factor of up to 10 with this ROI selection. All the necessary processing stages (the receiving, decoding and multiplexing of SVD data on 48 optical fibers, the track reconstruction and the definition of the ROIs) will be performed by the DATCON system, developed in the scope of this thesis. The planned hardware design is based on a distributed set of Advanced Mezzanine Cards (AMCs), each equipped with a Field Programmable Gate Array (FPGA) and four optical transceivers. An algorithm based on the Hough transformation, a commonly used pattern recognition method in image processing, is developed to identify the track segments in the strip detector and to calculate the track parameters. The performance of the developed algorithms is evaluated using simulations. For use in the DATCON system, the Hough track reconstruction is implemented on FPGAs. The modules required to create the ROIs are tested in a simulation environment and on the AMC hardware. After a series of successful tests, the DATCON prototype was used in two test beam campaigns to verify the concept and practice the integration with the other detector systems. The developed track reconstruction algorithm shows a high reconstruction efficiency down to low track momenta. A higher data reduction than originally intended was achieved within the limits of the available processing time. The FPGA track reconstruction algorithm is found to be three times faster than demanded by the trigger rate of the experiment. The concepts used and the algorithms developed are not designed specifically for the Belle II vertex detector, but can be used in different experiments. They were successfully tested on the low-level trigger for Belle II, using drift chamber information, and showed a comparably good track reconstruction performance.
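The ROI selection step can be pictured with a minimal sketch: a rectangle is opened around each extrapolated track position, and only pixel hits inside some rectangle are kept. The rectangular ROI shape, the pixel coordinates, the half-widths, and all names (`ROI`, `roi_around`, `select_hits`, `reduction_factor`) are assumptions for illustration and are not taken from the DATCON design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ROI:
    """Axis-aligned rectangle in pixel coordinates, bounds inclusive."""
    u_min: int
    u_max: int
    v_min: int
    v_max: int

    def contains(self, u, v):
        return self.u_min <= u <= self.u_max and self.v_min <= v <= self.v_max


def roi_around(u, v, half_u, half_v):
    """Open an ROI centred on an extrapolated track position (u, v)."""
    return ROI(u - half_u, u + half_u, v - half_v, v + half_v)


def select_hits(hits, rois):
    """Keep only the pixel hits that fall inside at least one ROI."""
    return [(u, v) for (u, v) in hits if any(r.contains(u, v) for r in rois)]


def reduction_factor(n_total, n_kept):
    """Ratio of incoming to stored hits; infinite if nothing is kept."""
    return float("inf") if n_kept == 0 else n_total / n_kept


# Hypothetical pixel hits and one extrapolated track position at (10, 10).
hits = [(10, 10), (12, 9), (50, 50), (100, 3), (11, 30)]
rois = [roi_around(10, 10, half_u=3, half_v=3)]
kept = select_hits(hits, rois)
print(kept, reduction_factor(len(hits), len(kept)))
```

In this toy case two of five hits survive, a reduction factor of 2.5. The achievable factor in practice depends on how tightly the ROIs can be drawn, which in turn depends on the precision of the track extrapolation.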