246 research outputs found

    A Hybrid Real-Time Vision-Based Person Detection Method

    Full text link
    [EN] In this paper, we introduce a hybrid real-time method for vision-based pedestrian detection made up by the sequential combination of two basic methods applied in a coarse to fine fashion. The proposed method aims to achieve an improved balance between detection accuracy and computational load by taking advantage of the strengths of these basic techniques. Haar-like features combined with Boosting techniques, which have been demonstrated to provide rapid but not accurate enough results in human detection, are used in the first stage to provide a preliminary candidate selection in the scene. Then, feature extraction and classification methods, which present high accuracy rates at expenses of a higher computational cost, are applied over boosting candidates providing the final prediction. Experimental results show that the proposed method performs effectively and efficiently, which supports its suitability for real applications.This work is supported by CASBLIP project 6-th FP\cite{RefCASBLIP}. The authors acknowledge the support of the Technological Institute of Optics, Colour and Imaging of Valencia - AIDO. Dr. Samuel Morillas acknowledges the support of Generalitat Valenciana under grant GVPRE/2008/257 and Universitat Politècnica de València under grant Primeros Proyetos de Investigación 13202. }Oliver Moll, J.; Albiol Colomer, A.; Morillas, S.; Peris Fajarnes, G. (2011). A Hybrid Real-Time Vision-Based Person Detection Method. Waves. 86-95. http://hdl.handle.net/10251/57676S869

    Histogram of gradients of Time-Frequency Representations for Audio scene detection

    Full text link
    This paper addresses the problem of audio scenes classification and contributes to the state of the art by proposing a novel feature. We build this feature by considering histogram of gradients (HOG) of time-frequency representation of an audio scene. Contrarily to classical audio features like MFCC, we make the hypothesis that histogram of gradients are able to encode some relevant informations in a time-frequency {representation:} namely, the local direction of variation (in time and frequency) of the signal spectral power. In addition, in order to gain more invariance and robustness, histogram of gradients are locally pooled. We have evaluated the relevance of {the novel feature} by comparing its performances with state-of-the-art competitors, on several datasets, including a novel one that we provide, as part of our contribution. This dataset, that we make publicly available, involves 1919 classes and contains about 900900 minutes of audio scene recording. We thus believe that it may be the next standard dataset for evaluating audio scene classification algorithms. Our comparison results clearly show that our HOG-based features outperform its competitor

    HOG-Dot: A Parallel Kernel-Based Gradient Extraction for Embedded Image Processing

    Get PDF
    International audienceIn this paper we propose HOG-Dot, a method for the direct computation of the polar image gradients coordinates from the pixels values. The proposed algorithm, to be used as the first step of the Histogram of Oriented Gradient (HOG) pipeline, approximates the exact gradient with its projection onto a versor chosen among the projection plane set. Instead of non linear computations, the HOG-Dot method exploits linear operations while introducing a bounded approximation error with respect to other HOG approaches, thus resulting a more suitable solution for embedded devices. Concerning the state of the art, it also achieves improved accuracy with the mathematical spatial gradient formulation

    Cyclist Detection, Tracking, and Trajectory Analysis in Urban Traffic Video Data

    Full text link
    The major objective of this thesis work is examining computer vision and machine learning detection methods, tracking algorithms and trajectory analysis for cyclists in traffic video data and developing an efficient system for cyclist counting. Due to the growing number of cyclist accidents on urban roads, methods for collecting information on cyclists are of significant importance to the Department of Transportation. The collected information provides insights into solving critical problems related to transportation planning, implementing safety countermeasures, and managing traffic flow efficiently. Intelligent Transportation System (ITS) employs automated tools to collect traffic information from traffic video data. In comparison to other road users, such as cars and pedestrians, the automated cyclist data collection is relatively a new research area. In this work, a vision-based method for gathering cyclist count data at intersections and road segments is developed. First, we develop methodology for an efficient detection and tracking of cyclists. The combination of classification features along with motion based properties are evaluated to detect cyclists in the test video data. A Convolutional Neural Network (CNN) based detector called You Only Look Once (YOLO) is implemented to increase the detection accuracy. In the next step, the detection results are fed into a tracker which is implemented based on the Kernelized Correlation Filters (KCF) which in cooperation with the bipartite graph matching algorithm allows to track multiple cyclists, concurrently. Then, a trajectory rebuilding method and a trajectory comparison model are applied to refine the accuracy of tracking and counting. The trajectory comparison is performed based on semantic similarity approach. The proposed counting method is the first cyclist counting method that has the ability to count cyclists under different movement patterns. The trajectory data obtained can be further utilized for cyclist behavioral modeling and safety analysis

    Computer Vision System-On-Chip Designs for Intelligent Vehicles

    Get PDF
    Intelligent vehicle technologies are growing rapidly that can enhance road safety, improve transport efficiency, and aid driver operations through sensors and intelligence. Advanced driver assistance system (ADAS) is a common platform of intelligent vehicle technologies. Many sensors like LiDAR, radar, cameras have been deployed on intelligent vehicles. Among these sensors, optical cameras are most widely used due to their low costs and easy installation. However, most computer vision algorithms are complicated and computationally slow, making them difficult to be deployed on power constraint systems. This dissertation investigates several mainstream ADAS applications, and proposes corresponding efficient digital circuits implementations for these applications. This dissertation presents three ways of software / hardware algorithm division for three ADAS applications: lane detection, traffic sign classification, and traffic light detection. Using FPGA to offload critical parts of the algorithm, the entire computer vision system is able to run in real time while maintaining a low power consumption and a high detection rate. Catching up with the advent of deep learning in the field of computer vision, we also present two deep learning based hardware implementations on application specific integrated circuits (ASIC) to achieve even lower power consumption and higher accuracy. The real time lane detection system is implemented on Xilinx Zynq platform, which has a dual core ARM processor and FPGA fabric. The Xilinx Zynq platform integrates the software programmability of an ARM processor with the hardware programmability of an FPGA. For the lane detection task, the FPGA handles the majority of the task: region-of-interest extraction, edge detection, image binarization, and hough transform. After then, the ARM processor takes in hough transform results and highlights lanes using the hough peaks algorithm. The entire system is able to process 1080P video stream at a constant speed of 69.4 frames per second, realizing real time capability. An efficient system-on-chip (SOC) design which classifies up to 48 traffic signs in real time is presented in this dissertation. The traditional histogram of oriented gradients (HoG) and support vector machine (SVM) are proven to be very effective on traffic sign classification with an average accuracy rate of 93.77%. For traffic sign classification, the biggest challenge comes from the low execution efficiency of the HoG on embedded processors. By dividing the HoG algorithm into three fully pipelined stages, as well as leveraging extra on-chip memory to store intermediate results, we successfully achieved a throughput of 115.7 frames per second at 1080P resolution. The proposed generic HoG hardware implementation could also be used as an individual IP core by other computer vision systems. A real time traffic signal detection system is implemented to present an efficient hardware implementation of the traditional grass-fire blob detection. The traditional grass-fire blob detection method iterates the input image multiple times to calculate connected blobs. In digital circuits, five extra on-chip block memories are utilized to save intermediate results. By using additional memories, all connected blob information could be obtained through one-pass image traverse. The proposed hardware friendly blob detection can run at 72.4 frames per second with 1080P video input. Applying HoG + SVM as feature extractor and classifier, 92.11% recall rate and 99.29% precision rate are obtained on red lights, and 94.44% recall rate and 98.27% precision rate on green lights. Nowadays, convolutional neural network (CNN) is revolutionizing computer vision due to learnable layer by layer feature extraction. However, when coming into inference, CNNs are usually slow to train and slow to execute. In this dissertation, we studied the implementation of principal component analysis based network (PCANet), which strikes a balance between algorithm robustness and computational complexity. Compared to a regular CNN, the PCANet only needs one iteration training, and typically at most has a few tens convolutions on a single layer. Compared to hand-crafted features extraction methods, the PCANet algorithm well reflects the variance in the training dataset and can better adapt to difficult conditions. The PCANet algorithm achieves accuracy rates of 96.8% and 93.1% on road marking detection and traffic light detection, respectively. Implementing in Synopsys 32nm process technology, the proposed chip can classify 724,743 32-by-32 image candidates in one second, with only 0.5 watt power consumption. In this dissertation, binary neural network (BNN) is adopted as a potential detector for intelligent vehicles. The BNN constrains all activations and weights to be +1 or -1. Compared to a CNN with the same network configuration, the BNN achieves 50 times better resource usage with only 1% - 2% accuracy loss. Taking car detection and pedestrian detection as examples, the BNN achieves an average accuracy rate of over 95%. Furthermore, a BNN accelerator implemented in Synopsys 32nm process technology is presented in our work. The elastic architecture of the BNN accelerator makes it able to process any number of convolutional layers with high throughput. The BNN accelerator only consumes 0.6 watt and doesn\u27t rely on external memory for storage

    Computer Vision System-On-Chip Designs for Intelligent Vehicles

    Get PDF
    Intelligent vehicle technologies are growing rapidly that can enhance road safety, improve transport efficiency, and aid driver operations through sensors and intelligence. Advanced driver assistance system (ADAS) is a common platform of intelligent vehicle technologies. Many sensors like LiDAR, radar, cameras have been deployed on intelligent vehicles. Among these sensors, optical cameras are most widely used due to their low costs and easy installation. However, most computer vision algorithms are complicated and computationally slow, making them difficult to be deployed on power constraint systems. This dissertation investigates several mainstream ADAS applications, and proposes corresponding efficient digital circuits implementations for these applications. This dissertation presents three ways of software / hardware algorithm division for three ADAS applications: lane detection, traffic sign classification, and traffic light detection. Using FPGA to offload critical parts of the algorithm, the entire computer vision system is able to run in real time while maintaining a low power consumption and a high detection rate. Catching up with the advent of deep learning in the field of computer vision, we also present two deep learning based hardware implementations on application specific integrated circuits (ASIC) to achieve even lower power consumption and higher accuracy. The real time lane detection system is implemented on Xilinx Zynq platform, which has a dual core ARM processor and FPGA fabric. The Xilinx Zynq platform integrates the software programmability of an ARM processor with the hardware programmability of an FPGA. For the lane detection task, the FPGA handles the majority of the task: region-of-interest extraction, edge detection, image binarization, and hough transform. After then, the ARM processor takes in hough transform results and highlights lanes using the hough peaks algorithm. The entire system is able to process 1080P video stream at a constant speed of 69.4 frames per second, realizing real time capability. An efficient system-on-chip (SOC) design which classifies up to 48 traffic signs in real time is presented in this dissertation. The traditional histogram of oriented gradients (HoG) and support vector machine (SVM) are proven to be very effective on traffic sign classification with an average accuracy rate of 93.77%. For traffic sign classification, the biggest challenge comes from the low execution efficiency of the HoG on embedded processors. By dividing the HoG algorithm into three fully pipelined stages, as well as leveraging extra on-chip memory to store intermediate results, we successfully achieved a throughput of 115.7 frames per second at 1080P resolution. The proposed generic HoG hardware implementation could also be used as an individual IP core by other computer vision systems. A real time traffic signal detection system is implemented to present an efficient hardware implementation of the traditional grass-fire blob detection. The traditional grass-fire blob detection method iterates the input image multiple times to calculate connected blobs. In digital circuits, five extra on-chip block memories are utilized to save intermediate results. By using additional memories, all connected blob information could be obtained through one-pass image traverse. The proposed hardware friendly blob detection can run at 72.4 frames per second with 1080P video input. Applying HoG + SVM as feature extractor and classifier, 92.11% recall rate and 99.29% precision rate are obtained on red lights, and 94.44% recall rate and 98.27% precision rate on green lights. Nowadays, convolutional neural network (CNN) is revolutionizing computer vision due to learnable layer by layer feature extraction. However, when coming into inference, CNNs are usually slow to train and slow to execute. In this dissertation, we studied the implementation of principal component analysis based network (PCANet), which strikes a balance between algorithm robustness and computational complexity. Compared to a regular CNN, the PCANet only needs one iteration training, and typically at most has a few tens convolutions on a single layer. Compared to hand-crafted features extraction methods, the PCANet algorithm well reflects the variance in the training dataset and can better adapt to difficult conditions. The PCANet algorithm achieves accuracy rates of 96.8% and 93.1% on road marking detection and traffic light detection, respectively. Implementing in Synopsys 32nm process technology, the proposed chip can classify 724,743 32-by-32 image candidates in one second, with only 0.5 watt power consumption. In this dissertation, binary neural network (BNN) is adopted as a potential detector for intelligent vehicles. The BNN constrains all activations and weights to be +1 or -1. Compared to a CNN with the same network configuration, the BNN achieves 50 times better resource usage with only 1% - 2% accuracy loss. Taking car detection and pedestrian detection as examples, the BNN achieves an average accuracy rate of over 95%. Furthermore, a BNN accelerator implemented in Synopsys 32nm process technology is presented in our work. The elastic architecture of the BNN accelerator makes it able to process any number of convolutional layers with high throughput. The BNN accelerator only consumes 0.6 watt and doesn\u27t rely on external memory for storage

    Real-time object detection using monocular vision for low-cost automotive sensing systems

    Get PDF
    This work addresses the problem of real-time object detection in automotive environments using monocular vision. The focus is on real-time feature detection, tracking, depth estimation using monocular vision and finally, object detection by fusing visual saliency and depth information. Firstly, a novel feature detection approach is proposed for extracting stable and dense features even in images with very low signal-to-noise ratio. This methodology is based on image gradients, which are redefined to take account of noise as part of their mathematical model. Each gradient is based on a vector connecting a negative to a positive intensity centroid, where both centroids are symmetric about the centre of the area for which the gradient is calculated. Multiple gradient vectors define a feature with its strength being proportional to the underlying gradient vector magnitude. The evaluation of the Dense Gradient Features (DeGraF) shows superior performance over other contemporary detectors in terms of keypoint density, tracking accuracy, illumination invariance, rotation invariance, noise resistance and detection time. The DeGraF features form the basis for two new approaches that perform dense 3D reconstruction from a single vehicle-mounted camera. The first approach tracks DeGraF features in real-time while performing image stabilisation with minimal computational cost. This means that despite camera vibration the algorithm can accurately predict the real-world coordinates of each image pixel in real-time by comparing each motion-vector to the ego-motion vector of the vehicle. The performance of this approach has been compared to different 3D reconstruction methods in order to determine their accuracy, depth-map density, noise-resistance and computational complexity. The second approach proposes the use of local frequency analysis of i ii gradient features for estimating relative depth. This novel method is based on the fact that DeGraF gradients can accurately measure local image variance with subpixel accuracy. It is shown that the local frequency by which the centroid oscillates around the gradient window centre is proportional to the depth of each gradient centroid in the real world. The lower computational complexity of this methodology comes at the expense of depth map accuracy as the camera velocity increases, but it is at least five times faster than the other evaluated approaches. This work also proposes a novel technique for deriving visual saliency maps by using Division of Gaussians (DIVoG). In this context, saliency maps express the difference of each image pixel is to its surrounding pixels across multiple pyramid levels. This approach is shown to be both fast and accurate when evaluated against other state-of-the-art approaches. Subsequently, the saliency information is combined with depth information to identify salient regions close to the host vehicle. The fused map allows faster detection of high-risk areas where obstacles are likely to exist. As a result, existing object detection algorithms, such as the Histogram of Oriented Gradients (HOG) can execute at least five times faster. In conclusion, through a step-wise approach computationally-expensive algorithms have been optimised or replaced by novel methodologies to produce a fast object detection system that is aligned to the requirements of the automotive domain
    • …
    corecore