944 research outputs found

    Computer Vision System-On-Chip Designs for Intelligent Vehicles

    Get PDF
    Intelligent vehicle technologies, which enhance road safety, improve transport efficiency, and aid driver operations through sensors and intelligence, are growing rapidly. The advanced driver assistance system (ADAS) is a common platform for intelligent vehicle technologies. Many sensors, such as LiDAR, radar, and cameras, have been deployed on intelligent vehicles. Among these sensors, optical cameras are the most widely used due to their low cost and easy installation. However, most computer vision algorithms are complicated and computationally slow, making them difficult to deploy on power-constrained systems. This dissertation investigates several mainstream ADAS applications and proposes corresponding efficient digital circuit implementations. It presents three ways of dividing an algorithm between software and hardware for three ADAS applications: lane detection, traffic sign classification, and traffic light detection. By using an FPGA to offload the critical parts of each algorithm, the entire computer vision system is able to run in real time while maintaining low power consumption and a high detection rate. Keeping pace with the advent of deep learning in computer vision, we also present two deep learning based hardware implementations on application-specific integrated circuits (ASICs) to achieve even lower power consumption and higher accuracy. The real-time lane detection system is implemented on the Xilinx Zynq platform, which has a dual-core ARM processor and FPGA fabric. The Xilinx Zynq platform integrates the software programmability of an ARM processor with the hardware programmability of an FPGA. For the lane detection task, the FPGA handles the majority of the work: region-of-interest extraction, edge detection, image binarization, and the Hough transform. The ARM processor then takes in the Hough transform results and highlights lanes using the Hough peaks algorithm. The entire system processes a 1080p video stream at a constant 69.4 frames per second, achieving real-time capability. An efficient system-on-chip (SoC) design that classifies up to 48 traffic signs in real time is also presented. The traditional histogram of oriented gradients (HoG) and support vector machine (SVM) prove very effective for traffic sign classification, with an average accuracy of 93.77%. For traffic sign classification, the biggest challenge is the low execution efficiency of HoG on embedded processors. By dividing the HoG algorithm into three fully pipelined stages and leveraging extra on-chip memory to store intermediate results, we achieve a throughput of 115.7 frames per second at 1080p resolution. The proposed generic HoG hardware implementation can also be used as a standalone IP core by other computer vision systems. A real-time traffic light detection system is implemented to present an efficient hardware implementation of traditional grass-fire blob detection. The traditional grass-fire method iterates over the input image multiple times to compute connected blobs. In digital circuits, five extra on-chip block memories are used to save intermediate results, so that all connected-blob information is obtained in a single pass over the image. The proposed hardware-friendly blob detection runs at 72.4 frames per second on 1080p video input.
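
The single-pass blob extraction described above can be illustrated in software with an auxiliary label-equivalence table standing in for the extra on-chip memories. The sketch below is a minimal C++ illustration of that idea (4-connectivity, bounding box and area per blob); it is not the dissertation's FPGA design, and all names and structures are illustrative.

```cpp
// Illustrative single-pass blob labeling with an auxiliary equivalence table.
// Software sketch of the idea only, not the dissertation's FPGA memory layout.
#include <algorithm>
#include <cstdint>
#include <vector>

struct Blob { int minX, minY, maxX, maxY, area; };

static int findRoot(std::vector<int>& parent, int l) {
    while (parent[l] != l) { parent[l] = parent[parent[l]]; l = parent[l]; }
    return l;
}

// 'img' is a binary image (0 = background, nonzero = foreground), row-major.
std::vector<Blob> labelBlobs(const std::vector<uint8_t>& img, int width, int height) {
    std::vector<int> labels(width * height, 0);
    std::vector<int> parent(1, 0);                 // label-equivalence memory
    std::vector<Blob> stats(1, {0, 0, 0, 0, 0});   // per-label statistics
    int nextLabel = 1;

    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            if (!img[y * width + x]) continue;
            int left = (x > 0) ? labels[y * width + x - 1] : 0;
            int up   = (y > 0) ? labels[(y - 1) * width + x] : 0;
            int l;
            if (!left && !up) {                    // new blob starts here
                l = nextLabel++;
                parent.push_back(l);
                stats.push_back({x, y, x, y, 0});
            } else if (left && up) {               // merge two provisional labels
                int a = findRoot(parent, left), b = findRoot(parent, up);
                l = std::min(a, b);
                parent[std::max(a, b)] = l;
            } else {
                l = left ? left : up;
            }
            labels[y * width + x] = l;
            Blob& s = stats[findRoot(parent, l)];
            s.minX = std::min(s.minX, x); s.maxX = std::max(s.maxX, x);
            s.minY = std::min(s.minY, y); s.maxY = std::max(s.maxY, y);
            s.area += 1;
        }
    }
    // Fold statistics of merged labels into their final roots: this walks the
    // small label table only, so the image itself is traversed just once.
    for (int l = 1; l < nextLabel; ++l) {
        int r = findRoot(parent, l);
        if (r == l) continue;
        Blob& dst = stats[r];
        dst.minX = std::min(dst.minX, stats[l].minX);
        dst.maxX = std::max(dst.maxX, stats[l].maxX);
        dst.minY = std::min(dst.minY, stats[l].minY);
        dst.maxY = std::max(dst.maxY, stats[l].maxY);
        dst.area += stats[l].area;
    }
    std::vector<Blob> blobs;
    for (int l = 1; l < nextLabel; ++l)
        if (findRoot(parent, l) == l) blobs.push_back(stats[l]);
    return blobs;
}
```
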
Applying HoG + SVM as the feature extractor and classifier, we obtain a 92.11% recall rate and 99.29% precision rate on red lights, and a 94.44% recall rate and 98.27% precision rate on green lights. Convolutional neural networks (CNNs) are now revolutionizing computer vision thanks to learnable layer-by-layer feature extraction; however, they are usually slow to train and slow to execute at inference time. In this dissertation, we study the implementation of the principal component analysis based network (PCANet), which strikes a balance between robustness and computational complexity. Compared to a regular CNN, PCANet needs only one training iteration and typically has at most a few tens of convolutions in a single layer. Compared to hand-crafted feature extraction methods, PCANet better reflects the variance in the training dataset and adapts better to difficult conditions. PCANet achieves accuracy rates of 96.8% and 93.1% on road marking detection and traffic light detection, respectively. Implemented in a Synopsys 32 nm process technology, the proposed chip can classify 724,743 32-by-32 image candidates per second while consuming only 0.5 W. This dissertation also adopts the binary neural network (BNN) as a potential detector for intelligent vehicles. A BNN constrains all activations and weights to +1 or -1. Compared to a CNN with the same network configuration, the BNN achieves 50 times better resource usage with only 1%-2% accuracy loss. Taking car detection and pedestrian detection as examples, the BNN achieves an average accuracy of over 95%. Furthermore, a BNN accelerator implemented in a Synopsys 32 nm process technology is presented. Its elastic architecture allows it to process any number of convolutional layers with high throughput. The BNN accelerator consumes only 0.6 W and does not rely on external memory for storage.
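
The resource savings of a BNN come from the fact that, once weights and activations are constrained to +1 or -1, a multiply-accumulate reduces to an XNOR followed by a popcount. The snippet below is a minimal sketch of that arithmetic identity for bit-packed vectors; the encoding, vector length, and example values are assumptions for illustration, not the accelerator's datapath.

```cpp
// XNOR/popcount dot product for {-1,+1} vectors packed as bits.
// Encoding: bit 1 represents +1, bit 0 represents -1.
// Identity: dot(a, w) = n - 2 * popcount(a XOR w).
#include <bitset>
#include <cstdint>
#include <cstdio>

constexpr int N = 64;  // vector length (one 64-bit word for simplicity)

int binaryDot(uint64_t a, uint64_t w, int n) {
    // popcount of XOR counts positions where the signs differ (product = -1).
    int diff = static_cast<int>(std::bitset<64>(a ^ w).count());
    return n - 2 * diff;
}

int main() {
    uint64_t act = 0xF0F0F0F0F0F0F0F0ULL;  // hypothetical binarized activations
    uint64_t wgt = 0xFF00FF00FF00FF00ULL;  // hypothetical binarized weights
    std::printf("dot = %d\n", binaryDot(act, wgt, N));
    // A binary convolution accumulates such word-level dot products over the
    // receptive field, then applies a threshold (the batch-norm + sign step).
    return 0;
}
```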

    Biblioteca de procesamiento de imágenes optimizada para Arm Cortex-M7

    Get PDF
    Most modern vehicles are equipped with systems that assist the driver by automating difficult and repetitive tasks, such as reducing the vehicle speed in a school zone. Some of these systems require an onboard computer capable of performing real-time processing of the road images captured by a camera. The goal of this project is to implement an optimized image processing library for the ARM® Cortex®-M7 architecture. This library provides routines for image spatial filtering, subtraction, binarization, and extraction of directional information, along with parameterized pattern recognition of a predefined template using the Generalized Hough Transform (GHT). These routines are written in the C programming language, leveraging GNU ARM C compiler optimizations to obtain maximum performance and minimum object size. The performance of the routines was benchmarked against an existing implementation for a different microcontroller, the Freescale® MPC5561. To prove the usability of this library in a real-time application, a Traffic Sign Recognition (TSR) system was implemented. The results show that, on average, the execution time is 18% faster and the binary object size is 25% smaller than in the reference implementation, enabling the TSR application to process up to 24 frames per second. In conclusion, these results demonstrate that the image processing library implemented in this project is suitable for real-time applications. (ITESO, A. C.; Consejo Nacional de Ciencia y Tecnología; Continental Automotiv)
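
For readers unfamiliar with the Generalized Hough Transform used by the library, the sketch below shows the classic R-table formulation: training offsets from template edge points to a reference point, indexed by gradient orientation, followed by voting for the reference point at detection time. It is a plain C++ illustration under simplifying assumptions (translation only, fixed scale and rotation) and is not the optimized Cortex-M7 code; all identifiers are illustrative.

```cpp
// Minimal Generalized Hough Transform sketch (translation only, fixed scale
// and rotation). Illustrative only; not the library's Cortex-M7 routines.
#include <algorithm>
#include <cmath>
#include <utility>
#include <vector>

struct EdgePoint { int x, y; float angle; };    // position + gradient orientation

constexpr int ANGLE_BINS = 36;                  // 10-degree orientation bins
constexpr float PI = 3.14159265f;

static int angleBin(float a) {
    int b = static_cast<int>(std::floor((a + PI) / (2 * PI) * ANGLE_BINS));
    return std::min(std::max(b, 0), ANGLE_BINS - 1);
}

// Training: store, per orientation bin, the offsets from template edge points
// to the template reference point (the "R-table").
std::vector<std::vector<std::pair<int,int>>> buildRTable(
        const std::vector<EdgePoint>& tmplEdges, int refX, int refY) {
    std::vector<std::vector<std::pair<int,int>>> rtable(ANGLE_BINS);
    for (const auto& e : tmplEdges)
        rtable[angleBin(e.angle)].push_back({refX - e.x, refY - e.y});
    return rtable;
}

// Detection: every image edge point votes for all reference positions
// consistent with its orientation; peaks in 'acc' locate template instances.
std::vector<int> vote(const std::vector<EdgePoint>& imgEdges,
                      const std::vector<std::vector<std::pair<int,int>>>& rtable,
                      int width, int height) {
    std::vector<int> acc(width * height, 0);
    for (const auto& e : imgEdges)
        for (const auto& off : rtable[angleBin(e.angle)]) {
            int cx = e.x + off.first, cy = e.y + off.second;
            if (cx >= 0 && cx < width && cy >= 0 && cy < height)
                ++acc[cy * width + cx];
        }
    return acc;
}
```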

    Simulated Hough Transform Model Optimized for Straight-Line Recognition Using Frontier FPGA Devices

    Get PDF
    The use of the Hough transform to identify shapes or images has been extensively studied in the past using software for artificial intelligence applications. In this article, we present a generalization of shape recognition using the Hough transform, applied to a broader range of real problems. A software simulator was developed to generate input patterns (straight lines) and test the ability of a generic low-latency system to identify these lines: first in a clean environment with no other inputs, and then looking for the same lines as ambient background noise increases. In particular, the paper presents a study to optimize the implementation of the Hough transform algorithm in programmable digital devices, such as FPGAs. We investigated the ability of the Hough transform to discriminate straight lines within a vast bundle of random lines, emulating a noisy environment. In more detail, the study follows an extensive investigation we recently conducted to recognize tracks of ionizing particles in high-energy physics. In this field, the lines can represent the trajectories of particles that must be recognized immediately as they are created in a particle detector. The main advantage of using FPGAs over other components is their speed and low latency when investigating pattern recognition problems in a noisy environment. In fact, FPGAs guarantee a latency that increases linearly with the incoming data, while other solutions increase latency more quickly. Furthermore, the HT inherently adapts to incomplete input data sets, especially if resolutions are limited. Hence, an FPGA system that implements the HT is inefficient for small sets of input data but becomes more cost-effective as the size of the input data increases. The paper first presents an example that uses a large accumulator consisting of 1100 x 600 bins and several sets of input data to validate the Hough transform algorithm as random noise increases to 80% of the input data. Then, a more specifically dedicated input set was chosen to emulate a real situation in which a Xilinx UltraScale+ was to be used as the final target device. Thus, we reduced the accumulator to 280 x 280 bins, using a 250 MHz clock signal and a few tens of input points. Under these conditions, the behavior of the firmware matched the software simulations, confirming the feasibility of the HT implementation on an FPGA.
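
As a reference for the accumulator sizes quoted above, the straight-line Hough transform can be summarized as voting in a (theta, rho) parameter space with rho = x·cos(theta) + y·sin(theta). The following C++ sketch shows that voting loop with illustrative binning; it mirrors the algorithm being mapped to hardware, not the paper's firmware, and the parameter names are placeholders.

```cpp
// Straight-line Hough transform voting sketch: rho = x*cos(theta) + y*sin(theta).
// Accumulator size and binning are illustrative, not the paper's 1100 x 600
// or 280 x 280 configurations.
#include <cmath>
#include <cstdint>
#include <utility>
#include <vector>

struct Accumulator {
    int thetaBins, rhoBins;
    float rhoMax;                 // maximum |rho| = image diagonal
    std::vector<uint32_t> bins;   // thetaBins x rhoBins vote counts
};

Accumulator houghLines(const std::vector<std::pair<int,int>>& edgePixels,
                       int width, int height, int thetaBins, int rhoBins) {
    Accumulator acc{thetaBins, rhoBins,
                    std::sqrt(float(width * width + height * height)),
                    std::vector<uint32_t>(thetaBins * rhoBins, 0)};
    constexpr float PI = 3.14159265f;
    for (auto [x, y] : edgePixels) {
        for (int t = 0; t < thetaBins; ++t) {
            float theta = PI * t / thetaBins;                 // theta in [0, pi)
            float rho = x * std::cos(theta) + y * std::sin(theta);
            int r = int((rho + acc.rhoMax) / (2 * acc.rhoMax) * rhoBins);
            if (r >= 0 && r < rhoBins) ++acc.bins[t * rhoBins + r];
        }
    }
    return acc;   // peaks above a noise-dependent threshold indicate lines
}
```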

    Hardware acceleration of the trace transform for vision applications

    Get PDF
    Computer vision is a rapidly developing field in which machines process visual data to extract meaningful information. Digitised images, in their pixels and bits, serve no purpose of their own. It is only by interpreting the data and extracting higher-level information that a scene can be understood. The algorithms that enable this process are often complex and data-intensive, limiting the processing rate when implemented in software. Hardware-accelerated implementations provide a significant performance boost that can enable real-time processing. The Trace transform is a newly proposed algorithm that has been proven effective in image categorisation and recognition tasks. It is flexibly defined, allowing the mathematical details to be tailored to the target application. However, it is highly computationally intensive, which limits its applications. Modern heterogeneous FPGAs provide an ideal platform for accelerating the Trace transform to real-time performance, while also allowing an element of flexibility, which highly suits the generality of the Trace transform. This thesis details the implementation of an extensible Trace transform architecture for vision applications, before extending this architecture to a fully flexible platform suited to the exploration of Trace transform applications. As part of the work presented, a general set of architectures for large-windowed median and weighted median filters is presented, as required by a number of Trace transform implementations. Finally, an acceleration of Pseudo 2-Dimensional Hidden Markov Model decoding, usable in a person detection system, is presented. Such a system can be used to extract frames of interest from a video sequence, to be subsequently processed by the Trace transform. All these architectures emphasise the need for considered, platform-driven design in achieving maximum performance through hardware acceleration.
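
One of the supporting blocks mentioned above is the weighted median filter. As a point of reference, a software definition of the weighted median (the value at which the cumulative weight first reaches half of the total) is sketched below; the thesis targets hardware architectures for large windows, which this naive sort-based version does not attempt to address, and the function name and types are illustrative.

```cpp
// Weighted median of a window: the sample value at which the cumulative sum of
// weights first reaches half of the total weight. Software reference only.
#include <algorithm>
#include <utility>
#include <vector>

int weightedMedian(std::vector<std::pair<int,int>> samples /* {value, weight} */) {
    if (samples.empty()) return 0;
    // Sort window samples by value, then walk the cumulative weight.
    std::sort(samples.begin(), samples.end());
    long total = 0;
    for (const auto& s : samples) total += s.second;
    long cum = 0;
    for (const auto& s : samples) {
        cum += s.second;
        if (2 * cum >= total) return s.first;   // first value crossing half
    }
    return samples.back().first;
}
// With all weights equal to 1 this reduces to the ordinary median filter.
```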

    Hough Transform Proposal and Simulations for Particle Track Recognition for LHC Phase-II Upgrade

    Get PDF
    In the near future, LHC experiments will continue to be upgraded, overcoming the technological obsolescence of the detectors and their readout capabilities. Therefore, after the conclusion of a data-collection period, CERN will face a long shutdown to improve overall performance by updating the experiments and implementing more advanced technologies and infrastructures. In particular, the largest LHC experiment, ATLAS, will upgrade parts of its detector, trigger, and data acquisition system. In addition, the ATLAS experiment will complete the implementation of new strategies and algorithms for data handling and transmission to the final storage apparatus. This paper presents an overview of an upgrade planned for the second half of this decade for the ATLAS experiment. In particular, we show a study of a novel pattern recognition algorithm used in the trigger system, which is designed to provide the information needed to select physics events from unnecessary background data. The idea is to use a well-known mathematical transform, the Hough transform, as the algorithm for the detection of particle trajectories. The effectiveness of the algorithm has already been validated in the past, independently of particle physics applications, for recognizing generic shapes within images. Here, in contrast, we first propose a software emulation tool, and then a hardware implementation of the Hough transform, for particle physics applications. Until now, the Hough transform has never been implemented in electronics in particle physics experiments, and since a hardware implementation would provide benefits in terms of overall latency, we complete the studies by comparing the simulated data with a physical system implemented on a Xilinx hardware accelerator (FELIX-II card). In more detail, we have implemented a low-abstraction RTL design of the Hough transform on Xilinx UltraScale+ FPGAs as target devices for filtering applications.

    Hough Transform recursive evaluation using Distributed Arithmetic

    Get PDF
    Paper submitted to the IFIP International Conference on Very Large Scale Integration (VLSI-SOC), Darmstadt, Germany, 2003. The Hough Transform (HT) is a useful technique in image segmentation, specifically for detecting geometric primitives. A Convolution-Based Recursive Method (CBRM) is presented for generic function evaluation. In this approach, calculations are carried out by a single parametric formula which provides all function points by successive iteration. The case of the combined trigonometric functions involved in the calculation of the HT is analyzed under this scope. An architecture for reconfigurable FPGA-based hardware using Distributed Arithmetic (DA) implements the design. It provides memory and hardware resource savings as well as speed improvements, according to the experiments carried out with the HT.
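
One common way to obtain the Hough transform's trigonometric values by successive iteration, rather than by per-bin sin/cos evaluation, is the second-order recurrence that rho(theta) obeys for a fixed point (x, y). The sketch below shows that recurrence; it illustrates the general idea of recursive function evaluation for the HT and is not necessarily the paper's CBRM formulation or its Distributed Arithmetic mapping.

```cpp
// Recursive evaluation of rho(theta_k) = x*cos(theta_k) + y*sin(theta_k) over
// uniformly spaced theta bins. Because rho(theta) = R*cos(theta - phi), it
// obeys the recurrence rho[k+1] = 2*cos(delta)*rho[k] - rho[k-1], so one
// multiply and one subtract per bin replace the per-bin sin/cos calls.
// Illustrative sketch only.
#include <cmath>
#include <vector>

std::vector<float> rhoRow(int x, int y, int thetaBins) {
    const float delta = 3.14159265f / thetaBins;        // theta step over [0, pi)
    const float twoCosD = 2.0f * std::cos(delta);
    std::vector<float> rho(thetaBins);
    rho[0] = float(x);                                  // theta = 0
    if (thetaBins > 1)
        rho[1] = x * std::cos(delta) + y * std::sin(delta);
    for (int k = 2; k < thetaBins; ++k)
        rho[k] = twoCosD * rho[k - 1] - rho[k - 2];     // recurrence step
    return rho;
}
```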

    Feature detection in an indoor environment using Hardware Accelerators for time-efficient Monocular SLAM

    Get PDF
    In the field of robotics, Monocular Simultaneous Localization and Mapping (Monocular SLAM) has gained immense popularity, as it replaces large and costly sensors such as laser range finders with a single cheap camera. Additionally, the well-developed area of computer vision provides robust image processing algorithms which aid in developing feature detection techniques for the implementation of Monocular SLAM. Similarly, in the field of digital electronics and embedded systems, hardware acceleration using FPGAs has become quite popular. Hardware acceleration is based on the idea of offloading certain iterative algorithms from the processor and implementing them on a dedicated piece of hardware such as an ASIC or FPGA, to speed up execution and possibly reduce the net power consumption of the system. Good strides have been made in developing massively pipelined and resource-efficient hardware implementations of several image processing algorithms on FPGAs, which achieve a fairly decent speed-up of the processing time. In this thesis, we develop a very simple algorithm for feature detection in an indoor environment by means of a single camera, based on the Canny edge detection and Hough transform algorithms using the OpenCV library, and propose its integration with an existing feature initialization technique for a complete Monocular SLAM implementation. Following this, we develop hardware accelerators for Canny edge detection and the Hough transform, and compare the timing performance of the hardware implementation (using FPGAs) with a software implementation (using C++ and OpenCV).
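
The software baseline described above, Canny edge detection followed by Hough line extraction with OpenCV, can be reproduced with a few library calls. The sketch below shows one such pipeline; the thresholds, Hough parameters, and file names are placeholders rather than the thesis's settings.

```cpp
// Canny edge detection followed by probabilistic Hough line detection with
// OpenCV, as a software reference point for the hardware accelerators above.
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    cv::Mat frame = cv::imread("corridor.png", cv::IMREAD_GRAYSCALE);  // placeholder input
    if (frame.empty()) return 1;

    cv::Mat edges;
    cv::Canny(frame, edges, 50, 150);                     // hysteresis thresholds

    std::vector<cv::Vec4i> lines;                         // (x1, y1, x2, y2) per line
    cv::HoughLinesP(edges, lines, 1, CV_PI / 180, 80,     // 1 px, 1 deg, vote threshold
                    30 /*min length*/, 10 /*max gap*/);

    cv::Mat vis;
    cv::cvtColor(frame, vis, cv::COLOR_GRAY2BGR);
    for (const auto& l : lines)
        cv::line(vis, {l[0], l[1]}, {l[2], l[3]}, {0, 0, 255}, 2);
    cv::imwrite("lines.png", vis);
    return 0;
}
```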

    A Specialized Processor for Track Reconstruction at the LHC Crossing Rate

    Full text link
    We present the results of an R&D study of a specialized processor capable of precisely reconstructing events with hundreds of charged-particle tracks in pixel detectors at 40 MHz, thus suitable for processing LHC events at the full crossing frequency. For this purpose we design and test a massively parallel pattern-recognition algorithm, inspired by studies of how the brain processes visual images in nature. We find that high-quality tracking in large detectors is possible with sub-μs latencies when this algorithm is implemented in modern, high-speed, high-bandwidth FPGA devices. This opens the possibility of making track reconstruction happen transparently as part of the detector readout.
    Comment: Presented by G. Punzi at the conference on "Instrumentation for Colliding Beam Physics" (INSTR14), 24 Feb to 1 Mar 2014, Novosibirsk, Russia. Submitted to JINST proceeding
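
The brain-inspired, massively parallel algorithm referred to above evaluates many track hypotheses concurrently rather than sequentially. As a loose illustration of that style of pattern recognition, the toy sketch below computes a smooth, Gaussian-weighted response of a grid of straight-track hypotheses to a set of hits; the parameterization, weighting, and names are assumptions for illustration only, not the paper's algorithm or its FPGA implementation, where every cell would be evaluated in parallel.

```cpp
// Toy "retina-like" track finder for straight tracks y = m*x + q in a plane.
// Each cell of a (slope, intercept) grid accumulates a Gaussian-weighted
// response to every hit, so nearby hits contribute smoothly rather than as
// hard votes; peaks in the response map indicate track candidates.
#include <cmath>
#include <vector>

struct Hit { float x, y; };

std::vector<float> trackResponse(const std::vector<Hit>& hits,
                                 int slopeBins, int interceptBins,
                                 float mMin, float mMax,
                                 float qMin, float qMax, float sigma) {
    std::vector<float> response(slopeBins * interceptBins, 0.0f);
    for (int i = 0; i < slopeBins; ++i) {
        float m = mMin + (mMax - mMin) * i / (slopeBins - 1);
        for (int j = 0; j < interceptBins; ++j) {
            float q = qMin + (qMax - qMin) * j / (interceptBins - 1);
            float r = 0.0f;
            for (const Hit& h : hits) {
                float d = h.y - (m * h.x + q);      // residual to the cell's track
                r += std::exp(-d * d / (2.0f * sigma * sigma));
            }
            response[i * interceptBins + j] = r;    // peak search follows
        }
    }
    return response;
}
```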