768 research outputs found

    Object Detection in 20 Years: A Survey

    Object detection, as one of the most fundamental and challenging problems in computer vision, has received great attention in recent years. Its development over the past two decades can be regarded as an epitome of computer vision history. If we think of today's object detection as a technical aesthetics under the power of deep learning, then turning back the clock 20 years we would witness the wisdom of the cold weapon era. This paper extensively reviews 400+ papers on object detection in the light of its technical evolution, spanning over a quarter-century (from the 1990s to 2019). A number of topics are covered, including the milestone detectors in history, detection datasets, metrics, fundamental building blocks of the detection system, speed-up techniques, and recent state-of-the-art detection methods. This paper also reviews some important detection applications, such as pedestrian detection, face detection, and text detection, and gives an in-depth analysis of their challenges as well as technical improvements in recent years. Comment: This work has been submitted to the IEEE TPAMI for possible publication.

    Compound Models for Vision-Based Pedestrian Recognition

    This thesis addresses the problem of recognizing pedestrians in video images acquired from a moving camera in real-world cluttered environments. Instead of focusing on the development of novel feature primitives or pattern classifiers, we follow an orthogonal direction and develop feature- and classifier-independent compound techniques which integrate complementary information from multiple image-based sources with the objective of improved pedestrian classification performance. After establishing a performance baseline through a thorough experimental study on monocular pedestrian recognition, we investigate the use of multiple cues at the module level. A motion-based focus-of-attention stage is proposed based on a learned probabilistic pedestrian-specific model of motion features. The model is used to generate pedestrian localization hypotheses for subsequent shape- and texture-based classification modules. In the remainder of this work, we focus on the integration of complementary information directly into the pattern classification step. We present a combination of shape and texture information by means of pose-specific generative shape and texture models. The generative models are integrated with discriminative classification models by utilizing synthesized virtual pedestrian training samples from the former to enhance the classification performance of the latter. Both models are linked using Active Learning to guide the training process towards informative samples. A multi-level mixture-of-experts classification framework is proposed which involves local pose-specific expert classifiers operating on multiple image modalities and features. In terms of image modalities, we consider gray-level intensity, depth cues derived from dense stereo vision, and motion cues arising from dense optical flow. We furthermore employ shape-based, gradient-based, and texture-based features.
The mixture-of-experts formulation compares favorably to joint space approaches in terms of both performance and practical feasibility. Finally, we extend this mixture-of-experts framework to multi-cue partial occlusion handling and the estimation of pedestrian body orientation. Our occlusion model examines occlusion boundaries, which manifest as discontinuities in depth and motion space. Occlusion-dependent weights, which relate to the visibility of certain body parts, focus the decision on unoccluded body components. We further apply the pose-specific nature of our mixture-of-experts framework to estimating the density of pedestrian body orientation from single images, again integrating shape and texture information. Throughout this work, particular emphasis is placed on thorough performance evaluation, both regarding methodology and competitive real-world datasets. Several datasets used in this thesis are made publicly available for benchmarking purposes. Our results indicate significant performance boosts over the state of the art for all aspects considered in this thesis, i.e., pedestrian recognition, partial occlusion handling, and body orientation estimation. The pedestrian recognition performance in particular is considerably advanced; false detections at constant detection rates are reduced by significantly more than an order of magnitude.
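
The weighted combination of local experts mentioned in this abstract can be illustrated with a minimal Python sketch. It assumes each expert returns a scalar score and that a gating step (for example, pose-cluster probabilities or visibility weights) supplies one non-negative weight per expert. The function name and the numbers in the usage note are hypothetical and are not taken from the thesis.

```python
from typing import Sequence


def mixture_of_experts_score(gate_weights: Sequence[float],
                             expert_scores: Sequence[float]) -> float:
    """Combine expert scores as a weighted sum.

    The gating weights are normalized to sum to 1, so an expert with a
    weight of zero (e.g. an occluded component) has no influence.
    """
    if len(gate_weights) != len(expert_scores):
        raise ValueError("need exactly one gating weight per expert")
    total = sum(gate_weights)
    if total <= 0:
        raise ValueError("gating weights must have a positive sum")
    return sum(w / total * s for w, s in zip(gate_weights, expert_scores))
```

For instance, `mixture_of_experts_score([2.0, 0.0], [0.3, 0.9])` ignores the second expert entirely, which is how a zero visibility weight would steer the decision toward unoccluded components in this simplified setting.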

    Multi-Cue Pedestrian Recognition

    This thesis addresses the problem of detecting complex, deformable objects in an arbitrary, cluttered environment in sequences of video images. Often, no single best technique exists for such a challenging problem, as different approaches possess different characteristics with regard to detection accuracy, processing speed, or the kind of errors made. Therefore, multi-cue approaches are pursued in this thesis. By combining multiple detection methods, each utilizing a different aspect of the video images, we seek to gain detection accuracy, robustness, and computational efficiency. The first part of this thesis deals with texture classification. In a comparative study, various combinations of feature extraction and classification methods, some of them novel, are examined with respect to classification performance and processing speed, and the relation to the training sample size is analyzed. The integration of shape matching and texture classification is investigated. A pose-specific mixture-of-experts architecture is proposed, where shape matching yields a probabilistic assignment of a texture pattern to a set of distinct pose clusters, each handled by a specialized texture classifier, the local expert. The reduced appearance variability that each local expert needs to cope with leads to improved classification performance. A slight further performance gain could be achieved by shape normalization. The second multi-cue approach deals with cascade systems that employ a sequence of fast-to-complex system modules in order to gain computational efficiency. Three optimization techniques are examined that adjust system parameters so as to optimize three performance measures: detection rate, false positive rate, and processing cost.
A combined application of two techniques, a novel fast sequential optimization scheme based on receiver operating characteristic (ROC) frontier following, followed by an iterative gradient descent optimization method, is found to work best. The third method investigated is a Bayesian combination of multiple visual cues. An integrated object detection and tracking framework based on particle filtering is presented. A novel object representation combines mixture models of shape and texture, the former based on a generative point distribution model, the latter on discriminative texture classifiers. The associated observation density function integrates three visual cues: shape, texture, and depth. All methods are extensively evaluated on the problem of detecting pedestrians in urban environments from within a moving vehicle. Large data sets consisting of tens of thousands of video images have been recorded in order to obtain statistically meaningful results.
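
The cascade idea in this abstract (cheap modules first, costly modules only for surviving candidates) can be sketched in a few lines of Python. This is a generic early-rejection cascade under assumed conventions: each stage is a pair of a scoring function and a rejection threshold, both hypothetical. It does not reproduce the thesis's modules or its parameter optimization.

```python
from typing import Callable, Sequence, Tuple

# A stage is (scoring function, rejection threshold); both are illustrative.
Stage = Tuple[Callable[[float], float], float]


def run_cascade(candidate: float, stages: Sequence[Stage]) -> Tuple[bool, int]:
    """Evaluate stages in order and stop at the first rejection.

    Returns (accepted, number_of_stages_evaluated). The second value is a
    simple proxy for processing cost.
    """
    evaluated = 0
    for score_fn, threshold in stages:
        evaluated += 1
        if score_fn(candidate) < threshold:
            return False, evaluated
    return True, evaluated
```

Ordering stages from fast to complex means most negatives are discarded after one cheap evaluation, which is where the computational saving comes from. The thresholds are the kind of system parameters that trade off detection rate, false positive rate, and processing cost.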

    Calibration-free Pedestrian Partial Pose Estimation Using a High-mounted Kinect

    Applications of human behavior analysis have developed rapidly over the last decades, from entertainment systems to professional ones such as Human-Robot Interaction (HRI), Advanced Driver Assistance Systems (ADAS), and Pedestrian Protection Systems (PPS). This thesis addresses the problem of recognizing pedestrians and estimating their body orientation in 3D, motivated by the observation that knowing a person's orientation is beneficial for analyzing and predicting their behavior. A new method is proposed for detecting pedestrians and estimating their orientation, in which a pedestrian detection module and an orientation estimation module are integrated sequentially. For pedestrian detection, a cascade classifier is designed that automatically draws a bounding box around each detected pedestrian. Following this, regions extracted from a 3D point cloud are given to a discrete orientation classifier to estimate the orientation of the pedestrian's torso. This classification is based on a coarse, rasterized depth image simulating a virtual camera placed directly above the detected pedestrian, and uses a support vector machine classifier that was trained to distinguish 10 discrete orientations (30-degree increments). To test the performance of the approach, a new benchmark database containing 764 point clouds for body-orientation classification was captured. For this benchmark, a Microsoft Kinect recorded the point clouds of 30 participants, and a marker-based motion capture system (Vicon) provided the ground truth on their orientation. Finally, we demonstrate the improvements brought by our system: it detects pedestrians with an accuracy of 95.29% and estimates the body orientation (within a 30-degree interval) with an accuracy of 88.88%. We hope it can provide a new foundation for future research.
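
As a rough illustration of a coarse top-view rasterization of a point cloud, the Python sketch below bins 3D points into a square grid and keeps the largest height per cell. The axis convention (z pointing up, x and y measured from one grid corner), the grid size, and the cell size are assumptions made for this example; the abstract does not specify how the thesis builds its image.

```python
from typing import Iterable, List, Optional, Tuple

Point = Tuple[float, float, float]  # (x, y, z), with z assumed to point up


def rasterize_top_view(points: Iterable[Point],
                       grid_size: int,
                       cell: float) -> List[List[Optional[float]]]:
    """Project points onto a grid_size x grid_size top-view height image.

    Each cell stores the maximum z of the points falling into it, i.e. the
    surface a camera looking straight down would see. Empty cells are None.
    """
    grid: List[List[Optional[float]]] = [
        [None] * grid_size for _ in range(grid_size)
    ]
    for x, y, z in points:
        col = int(x // cell)
        row = int(y // cell)
        # Points outside the grid footprint are ignored.
        if 0 <= row < grid_size and 0 <= col < grid_size:
            current = grid[row][col]
            if current is None or z > current:
                grid[row][col] = z
    return grid
```

A flattened version of such a grid could then serve as the feature vector for an orientation classifier; that step is omitted here.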

    FuSSI-Net: Fusion of Spatio-temporal Skeletons for Intention Prediction Network

    Pedestrian intention recognition is very important for developing robust and safe autonomous driving (AD) and advanced driver assistance systems (ADAS) functionalities for urban driving. In this work, we develop an end-to-end pedestrian intention framework that performs well in day- and night-time scenarios. Our framework relies on object detection bounding boxes combined with skeletal features of human pose. We study early, late, and combined (early and late) fusion mechanisms to exploit the skeletal features and reduce false positives, as well as to improve the intention prediction performance. The early fusion mechanism results in an AP of 0.89 and precision/recall of 0.79/0.89 for pedestrian intention classification. Furthermore, we propose three new metrics to properly evaluate pedestrian intention systems. Under these new evaluation metrics for intention prediction, the proposed end-to-end network offers accurate pedestrian intention up to half a second ahead of the actual risky maneuver. Comment: 5 pages, 6 figures, 5 tables, IEEE Asilomar SS
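
The difference between early and late fusion can be shown with a small, generic Python sketch. It treats a classifier as any function from a feature vector to a score; the function names, the equal default weighting, and the toy classifier in the usage note are hypothetical and do not describe the network in the paper.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], float]


def early_fusion(box_feats: Sequence[float],
                 skel_feats: Sequence[float],
                 classifier: Classifier) -> float:
    """Concatenate both feature vectors and score them with one classifier."""
    return classifier(list(box_feats) + list(skel_feats))


def late_fusion(box_feats: Sequence[float],
                skel_feats: Sequence[float],
                box_clf: Classifier,
                skel_clf: Classifier,
                box_weight: float = 0.5) -> float:
    """Score each modality separately, then blend the two scores."""
    return (box_weight * box_clf(box_feats)
            + (1.0 - box_weight) * skel_clf(skel_feats))
```

With a toy classifier such as `mean = lambda v: sum(v) / len(v)`, `early_fusion([1.0, 0.0], [1.0], mean)` and `late_fusion([1.0, 0.0], [1.0], mean, mean)` give different scores, which shows that the point at which the modalities are merged changes the result even when the inputs are the same.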

    Automatic Designs in Deep Neural Networks

    Training a Deep Neural Network (DNN) that performs well on a task involves many design steps, including data design, model design, and loss design. Although remarkable progress has been made in all of these areas, the unexplored design space of each component is still vast. This motivates research into automated techniques that lift some of the heavy work from human researchers when exploring the design space. Automated design can help researchers make numerous or challenging design choices and reduce the expertise required. Much effort has been devoted to the automated design of DNNs, including synthetic data generation, automated data augmentation, neural architecture search, and so on. Despite this effort, the automation of DNN design is still far from complete. This thesis contributes in two ways: identifying new problems in the DNN design pipeline that can be solved automatically, and proposing new solutions to problems that have already been explored by automated design. The first part of this thesis presents two problems that were usually solved with manual design but can benefit from automated design. To tackle the inefficient computation caused by using a static DNN architecture for all inputs, some manual efforts have been made to use different networks for different inputs as needed, such as cascade models. We propose an automated dynamic inference framework that removes this manual effort and automatically chooses different architectures for different inputs during inference. To tackle the problem of designing differentiable loss functions for non-differentiable performance metrics, researchers usually design the loss manually for each individual task. We propose a unified loss framework that reduces the amount of manual loss design across tasks.
The second part of this thesis develops new techniques in domains where automated design has already been shown to be effective. In the synthetic data generation domain, we propose a novel method to automatically generate synthetic data for small-data object detection. The generated synthetic data can supplement the limited annotated real data in small-data object detection tasks, such as rare disease detection. In the architecture search domain, we propose an architecture search method customized for generative adversarial networks (GANs). GANs are known to be unstable to train, and the proposed method stabilizes their training during the architecture search process. PhD, Computer Science & Engineering, University of Michigan, Horace H. Rackham School of Graduate Studies. http://deepblue.lib.umich.edu/bitstream/2027.42/163208/1/llanlan_1.pd