40 research outputs found

    Pedestrian and Vehicle Detection in Autonomous Vehicle Perception Systems—A Review

    Autonomous Vehicles (AVs) have the potential to solve many traffic problems, such as accidents, congestion, and pollution. However, challenges remain; for instance, AVs need to perceive their environment accurately to navigate busy urban scenarios safely. The aim of this paper is to review recent articles on computer vision techniques that can be used to build an AV perception system. AV perception systems need to detect non-static objects accurately and predict their behaviour, as well as detect static objects and recognise the information they provide. This paper focuses in particular on the computer vision techniques used to detect pedestrians and vehicles. There have been many papers and reviews on pedestrian and vehicle detection so far; however, most past papers reviewed pedestrian or vehicle detection separately. This review presents an overview of AV systems in general, and then reviews and investigates several computer vision detection techniques for pedestrians and vehicles. The review concludes that both traditional and Deep Learning (DL) techniques have been used for pedestrian and vehicle detection; however, DL techniques have shown the best results. Although good detection results have been achieved for pedestrians and vehicles, current algorithms still struggle to detect small, occluded, and truncated objects. In addition, there is limited research on how to improve detection performance in difficult light and weather conditions. Most algorithms have been tested on well-recognised datasets such as Caltech and KITTI; however, these datasets have their own limitations. Therefore, this paper recommends that future work be evaluated on newer, more challenging datasets, such as PIE and BDD100K. (Funded by an EPSRC DTP PhD studentship.)

    Ten Years of Pedestrian Detection, What Have We Learned?

    Paper-by-paper results make it easy to miss the forest for the trees. We analyse the remarkable progress of the last decade by discussing the main ideas explored in the 40+ detectors currently present in the Caltech pedestrian detection benchmark. We observe that there exist three families of approaches, all currently reaching similar detection quality. Based on our analysis, we study the complementarity of the most promising ideas by combining multiple published strategies. This new decision forest detector achieves the current best known performance on the challenging Caltech-USA dataset. Comment: to appear in the ECCV 2014 CVRSUAD workshop proceedings.
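A decision-forest detector of the kind combined here scores each candidate window by summing weighted votes of many shallow trees trained on image features. A minimal sketch, assuming depth-1 stumps and illustrative thresholds and weights (none of these values come from the paper):

```python
# Decision-forest scoring sketch: each "tree" is a depth-1 stump over
# a per-window feature vector; the window is accepted when the summed
# vote exceeds a threshold. All numeric values are illustrative.

def stump_vote(features, feat_idx, threshold, polarity, weight):
    """Weighted vote of one decision stump on a feature vector."""
    fires = (features[feat_idx] >= threshold) == polarity
    return weight if fires else -weight

def forest_score(features, stumps):
    """Sum the weighted votes of all stumps (a boosted forest)."""
    return sum(stump_vote(features, *s) for s in stumps)

# Toy forest: (feature index, threshold, polarity, weight) per stump.
stumps = [(0, 0.5, True, 1.0), (1, 0.2, False, 0.7), (2, 0.8, True, 0.4)]

window = [0.9, 0.1, 0.95]          # feature vector of one candidate window
is_pedestrian = forest_score(window, stumps) > 0.0
```

In practice the stumps come from boosting over channel features, but the scoring loop has exactly this shape.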

    Efficient Pedestrian Detection in Urban Traffic Scenes

    Pedestrians are important participants in urban traffic environments, and thus act as an interesting category of objects for autonomous cars. Automatic pedestrian detection is an essential task for protecting pedestrians from collision. In this thesis, we investigate and develop novel approaches by interpreting spatial and temporal characteristics of pedestrians, in three different aspects: shape, cognition and motion. The special upright human body shape, especially the geometry of the head and shoulder area, is the most discriminative characteristic distinguishing pedestrians from other object categories. Inspired by the success of Haar-like features for detecting human faces, which also exhibit a uniform shape structure, we propose to design particular Haar-like features for pedestrians. Tailored to a pre-defined statistical pedestrian shape model, Haar-like templates with multiple modalities are designed to describe local differences of the shape structure. Cognition theories aim to explain how human visual systems process input visual signals in an accurate and fast way. By emulating the center-surround mechanism in human visual systems, we design multi-channel, multi-direction and multi-scale contrast features, and boost them to respond to the appearance of pedestrians. In this way, our detector can be considered a top-down saliency system. In the last part of this thesis, we exploit the temporal characteristics of moving pedestrians and employ motion information for feature design, as well as for regions of interest (ROIs) selection. Motion segmentation on optical flow fields enables us to select those blobs most probably containing moving pedestrians; a combination of Histogram of Oriented Gradients (HOG) and motion self-difference features further enables robust detection. We test our three approaches on image and video data captured in urban traffic scenes, which are rather challenging due to dynamic and complex backgrounds. The achieved results demonstrate that our approaches reach and surpass state-of-the-art performance, and can also be employed for other applications, such as indoor robotics or public surveillance.
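The Haar-like templates above rely on the summed-area (integral) image, which lets any rectangle sum be read in four table lookups; a head-shoulder style feature is then just a difference of two such sums. A small self-contained sketch (the template geometry and toy image are illustrative, not the thesis's actual templates):

```python
def integral_image(img):
    """Summed-area table: ii[y][x] = sum of img[0..y-1][0..x-1]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in rectangle (x, y, w, h) via four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def haar_two_rect(ii, x, y, w, h):
    """Top-minus-bottom two-rectangle feature (head vs. shoulder area)."""
    half = h // 2
    return rect_sum(ii, x, y, w, half) - rect_sum(ii, x, y + half, w, half)

img = [[1, 1, 1, 1],        # bright upper half ...
       [1, 1, 1, 1],
       [0, 0, 0, 0],        # ... over a dark lower half
       [0, 0, 0, 0]]
ii = integral_image(img)
print(haar_two_rect(ii, 0, 0, 4, 4))   # prints 8: strong bright-over-dark response
```

Because the four-lookup rectangle sum is constant-time, thousands of such templates can be evaluated per window at negligible cost.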

    Learning Complexity-Aware Cascades for Deep Pedestrian Detection

    The design of complexity-aware cascaded detectors, combining features of very different complexities, is considered. A new cascade design procedure is introduced, formulating cascade learning as the Lagrangian optimization of a risk that accounts for both accuracy and complexity. A boosting algorithm, denoted complexity-aware cascade training (CompACT), is then derived to solve this optimization. CompACT cascades are shown to seek an optimal trade-off between accuracy and complexity by pushing features of higher complexity to the later cascade stages, where only a few difficult candidate patches remain to be classified. This enables the use of features of vastly different complexities in a single detector. As a result, the feature pool can be expanded to features previously impractical for cascade design, such as the responses of a deep convolutional neural network (CNN). This is demonstrated through the design of a pedestrian detector with a pool of features whose complexities span orders of magnitude. The resulting cascade generalizes the combination of a CNN with an object proposal mechanism: rather than a pre-processing stage, CompACT cascades seamlessly integrate CNNs in their stages. This enables state-of-the-art performance on the Caltech and KITTI datasets, at fairly fast speeds.
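The core runtime idea, cheap features rejecting most windows so that expensive (CNN-like) features run only on the few survivors, can be sketched as a stage-wise cascade. This is a toy illustration of cascade evaluation, not the CompACT training procedure; the stage scores, thresholds, and costs are all invented for the example:

```python
# Cascade sketch: stages ordered by feature cost; evaluation stops at
# the first rejecting stage, so later expensive stages see only the
# candidates that survived. Costs and thresholds are illustrative.

def run_cascade(window, stages):
    """Return (accepted, total_cost) for one candidate window.

    Each stage is (score_fn, threshold, cost); a window is rejected
    at the first stage whose score falls below its threshold.
    """
    total_cost = 0.0
    for score_fn, threshold, cost in stages:
        total_cost += cost
        if score_fn(window) < threshold:
            return False, total_cost
    return True, total_cost

cheap = (lambda w: w["edge_energy"], 0.3, 1.0)      # e.g. a channel feature
expensive = (lambda w: w["cnn_score"], 0.7, 100.0)  # e.g. a CNN response

background = {"edge_energy": 0.1, "cnn_score": 0.0}
pedestrian = {"edge_energy": 0.9, "cnn_score": 0.95}

print(run_cascade(background, [cheap, expensive]))  # (False, 1.0): rejected cheaply
print(run_cascade(pedestrian, [cheap, expensive]))  # (True, 101.0): full evaluation
```

Since background windows vastly outnumber pedestrians, the average per-window cost stays close to that of the cheap stage alone, which is what makes mixing CNN features into a cascade affordable.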

    Taking a Deeper Look at Pedestrians

    In this paper we study the use of convolutional neural networks (convnets) for the task of pedestrian detection. Despite their recent diverse successes, convnets have historically underperformed compared to other pedestrian detectors. We deliberately omit explicitly modelling the problem into the network (e.g. parts or occlusion modelling) and show that we can reach competitive performance without bells and whistles. In a wide range of experiments we analyse small and big convnets, their architectural choices, parameters, and the influence of different training data, including pre-training on surrogate tasks. We present the best convnet detectors on the Caltech and KITTI datasets. On Caltech our convnets reach top performance for both the Caltech1x and Caltech10x training setups. Using additional data at training time, our strongest convnet model is competitive even with detectors that use additional data (optical flow) at test time.

    Visual Pedestrian Detection Using Integral Channels for Driving Assistance

    Master's in Mechanical Engineering. This work exploits a vision-based pedestrian recognition technique to build a framework capable of performing detection in images of urban settings. The method takes advantage of the richness of information present in multiple channels of an image to assemble different detection techniques into a simple and generic infrastructure. The general idea behind this method is that pedestrians present specific visual properties that can be used to differentiate them from the rest of a scene. To exploit such properties, features are extracted from the channels in an optimized manner through the use of integral images, and a machine learning engine based on AdaBoost is trained on positive and negative examples to learn the relationship of those features with the presence of pedestrians in visual data. This framework then integrates a multi-scale, sliding-window approach to perform visual pedestrian detection. To evaluate the algorithm, tests were carried out on two distinct datasets, and the results confirm its validity.
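The multi-scale, sliding-window stage used by such detectors can be sketched as a simple enumeration of candidate windows; the window size, stride, and scale factor below are common illustrative choices (e.g. a 64x128 pedestrian template), not values from this thesis:

```python
def sliding_windows(img_w, img_h, win_w=64, win_h=128, stride=8,
                    scale=1.25, max_scale_steps=3):
    """Enumerate (x, y, w, h) candidate windows over an image.

    Growing the window by `scale` per step enumerates the same
    candidate set as shrinking the image in a pyramid.
    """
    for step in range(max_scale_steps + 1):
        w = int(win_w * scale ** step)
        h = int(win_h * scale ** step)
        if w > img_w or h > img_h:
            break                      # window no longer fits the image
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)

# Every window would then be described by its integral-channel
# features and scored by the boosted AdaBoost classifier.
windows = list(sliding_windows(320, 240))
```

Even on a small 320x240 image this yields on the order of a thousand candidate windows, which is why constant-time integral-image features matter for speed.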

    Development of Object Detection Systems for SoC-FPGA Platforms

    Object detection is a crucial and challenging task in the field of embedded vision and robotics. Over the last 15 years, various object detection algorithms and systems (e.g., face detection, traffic sign detection) have been proposed by researchers and companies. Most of this research focuses either on improving detection performance or on boosting detection speed, which reveals the two criteria typically employed when evaluating an object detection algorithm or system: detection accuracy and execution speed. Considering these two factors and the application of object detection to domains such as service robots and Advanced Driver Assistance Systems (ADAS), the FPGA is a promising platform for achieving accurate object detection in real time. Therefore, robust FPGA-based object detection systems are designed in this work. The main work of this thesis can be divided into two parts: selecting promising algorithms and hardware design on SoC-FPGA. First, representative object detection algorithms are selected, implemented, and evaluated. Thereafter, a generalized object detection framework is created. With this framework, pedestrian detection, traffic sign detection, and head detection algorithms are realized and tested. The experiments verify that promising detection results can be obtained with the generalized object detection framework. For the hardware design on FPGA, the platform of the object detection system, consisting of stereo OV7670 cameras, a Xilinx Zedboard, and a monitor that visualizes the detection results, is created. After that, IP cores corresponding to each block of the framework are designed. Each IP core provides configurable parameters so that the IPs, especially the feature calculation IP and the feature scaler IP, can be correctly instantiated according to the fast feature pyramid theory. Finally, using the designed IP cores, a pedestrian detection system, a traffic sign detection system, and a head detection system are designed and evaluated. The on-board testing results show that real-time object (e.g., pedestrian, traffic sign, head) detection with promising accuracy can be achieved in all cases. In addition, with the generalized object detection framework and the designed IP toolbox, object detection systems targeting any class of objects can be designed and implemented rapidly.
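The fast feature pyramid theory referenced here rests on an empirical power law: channel statistics computed at one scale approximate those at a nearby scale s as roughly f(s) = f(1) * s**(-lambda), so only one scale per octave needs full computation. A numeric sketch; the lambda value is purely illustrative (in the original theory it is channel-dependent and estimated from data):

```python
# Fast-feature-pyramid sketch: rather than recomputing a channel at
# every pyramid scale, approximate intermediate scales via the power
# law f(s) ~ f(1) * s**(-lam). lam = 0.11 is an illustrative value.

def approx_channel_stat(base_stat, scale, lam=0.11):
    """Approximate a channel statistic at `scale` from the base scale."""
    return base_stat * scale ** (-lam)

base = 10.0                        # channel statistic computed at scale 1.0
for s in (1.0, 1.25, 1.5, 2.0):    # intermediate pyramid scales
    print(s, round(approx_channel_stat(base, s), 3))
```

On an FPGA this matters directly: the feature scaler IP can derive intermediate-scale responses with one multiply instead of re-running the full feature pipeline per scale.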

    Dynamic Scene Understanding with Applications to Traffic Monitoring

    Many breakthroughs have been witnessed in the computer vision community in recent years, largely due to deep Convolutional Neural Networks (CNN) and large-scale datasets. This thesis aims to investigate dynamic scene understanding from images. The problem of dynamic scene understanding involves simultaneously solving several sub-tasks including object detection, object recognition, and segmentation. Successfully completing these tasks will enable us to interpret the objects of interest within a scene. Vision-based traffic monitoring is one of many fast-emerging areas in the intelligent transportation system (ITS). In the thesis, we focus on the following problems in traffic scene understanding: 1) How to detect and recognize all the objects of interest in street view images? 2) How to employ CNN features and semantic pixel labelling to boost the performance of pedestrian detection? 3) How to enhance the discriminative power of CNN representations to improve the performance of fine-grained car recognition? 4) How to learn an adaptive color space to represent vehicle images for vehicle color recognition? For the first task, we propose a single learning-based detection framework to detect three important classes of objects (traffic signs, cars, and cyclists). The proposed framework consists of a dense feature extractor and detectors for these three classes. The advantage of using one common framework is that the detection speed is much faster, since the dense features need to be evaluated only once and are then shared by all detectors. The proposed framework introduces spatially pooled features as part of the aggregated channel features to enhance robustness to noise and image deformations. We also propose an object subcategorization scheme as a means of capturing the intra-class variation of objects.
    To address the second problem, we show that by re-using the convolutional feature maps (CFMs) of a deep CNN model as visual features to train an ensemble of boosted decision forests, we are able to remarkably improve the performance of pedestrian detection without using specially designed learning algorithms. We also show that semantic pixel labelling can simply be combined with a pedestrian detector to further boost the detection performance. Fine-grained details of objects usually contain very discriminative information which is crucial for fine-grained object recognition. Conventional pooling strategies (e.g. max-pooling, average-pooling) may discard these fine-grained details and hurt the recognition performance. To remedy this problem, we propose a spatially weighted pooling (swp) strategy which considerably improves the discriminative power of CNN representations. The swp pools the CNN features with the guidance of its learnt masks, which measure the importance of the spatial units in terms of discriminative power. In image color recognition, visual features are extracted from image pixels represented in one color space. The choice of the color space may influence the quality of the extracted features and impact the recognition performance. We propose a color transformation method that converts image pixels from the RGB space to a learnt space to improve recognition performance. Moreover, we propose a ColorNet which optimizes the architecture of AlexNet and embeds a mini-CNN of color transformation for vehicle color recognition. Thesis (Ph.D.) -- University of Adelaide, School of Computer Science, 201
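The spatially weighted pooling (swp) idea replaces uniform max/average pooling with a weighted sum under a spatial mask. In swp the mask is learnt end-to-end; in this minimal sketch it is hand-set to make the effect visible (the feature map and mask values are illustrative):

```python
def swp_pool(feature_map, mask):
    """Spatially weighted pooling over one 2-D feature map.

    Returns the mask-weighted average of the map. In swp the mask is
    learnt to emphasise discriminative spatial units; here it is fixed.
    """
    num = sum(f * m for frow, mrow in zip(feature_map, mask)
              for f, m in zip(frow, mrow))
    den = sum(m for mrow in mask for m in mrow)
    return num / den

fmap = [[1.0, 4.0],
        [2.0, 3.0]]

mask = [[0.0, 1.0],      # emphasise only the top-right unit
        [0.0, 0.0]]
print(swp_pool(fmap, mask))   # 4.0: pooling follows the mask

uniform = [[0.25, 0.25],
           [0.25, 0.25]]
print(swp_pool(fmap, uniform))  # 2.5: a uniform mask reduces to average pooling
```

The uniform-mask case shows that average pooling is the degenerate instance of swp; a learnt non-uniform mask lets discriminative spatial units dominate the pooled descriptor.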