336 research outputs found

    Improved foreground detection via block-based classifier cascade with probabilistic decision integration

    Get PDF
    Background subtraction is a fundamental low-level processing task in numerous computer vision applications. The vast majority of algorithms process images on a pixel-by-pixel basis, where an independent decision is made for each pixel. A general limitation of such processing is that rich contextual information is not taken into account. We propose a block-based method capable of dealing with noise, illumination variations, and dynamic backgrounds, while still obtaining smooth contours of foreground objects. Specifically, image sequences are analyzed on an overlapping block-by-block basis. A low-dimensional texture descriptor obtained from each block is passed through an adaptive classifier cascade, where each stage handles a distinct problem. A probabilistic foreground mask generation approach then exploits block overlaps to integrate interim block-level decisions into final pixel-level foreground segmentation. Unlike many pixel-based methods, ad-hoc postprocessing of foreground masks is not required. Experiments on the difficult Wallflower and I2R datasets show that the proposed approach obtains on average better results (both qualitatively and quantitatively) than several prominent methods. We furthermore propose the use of tracking performance as an unbiased approach for assessing the practical usefulness of foreground segmentation methods, and show that the proposed approach leads to considerable improvements in tracking accuracy on the CAVIAR dataset

    Compound Models for Vision-Based Pedestrian Recognition

    Get PDF
    This thesis addresses the problem of recognizing pedestrians in video images acquired from a moving camera in real-world cluttered environments. Instead of focusing on the development of novel feature primitives or pattern classifiers, we follow an orthogonal direction and develop feature- and classifier-independent compound techniques which integrate complementary information from multiple image-based sources with the objective of improved pedestrian classification performance. After establishing a performance baseline in terms of a thorough experimental study on monocular pedestrian recognition, we investigate the use of multiple cues on module-level. A motion-based focus of attention stage is proposed based on a learned probabilistic pedestrian-specific model of motion features. The model is used to generate pedestrian localization hypotheses for subsequent shape- and texture-based classification modules. In the remainder of this work, we focus on the integration of complementary information directly into the pattern classification step. We present a combination of shape and texture information by means of pose-specific generative shape and texture models. The generative models are integrated with discriminative classification models by utilizing synthesized virtual pedestrian training samples from the former to enhance the classification performance of the latter. Both models are linked using Active Learning to guide the training process towards informative samples. A multi-level mixture-of-experts classification framework is proposed which involves local pose-specific expert classifiers operating on multiple image modalities and features. In terms of image modalities, we consider gray-level intensity, depth cues derived from dense stereo vision and motion cues arising from dense optical flow. We furthermore employ shape-based, gradient-based and texture-based features. The mixture-of-experts formulation compares favorably to joint space approaches, in view of performance and practical feasibility. Finally, we extend this mixture-of-experts framework in terms of multi-cue partial occlusion handling and the estimation of pedestrian body orientation. Our occlusion model involves examining occlusion boundaries which manifest in discontinuities in depth and motion space. Occlusion-dependent weights which relate to the visibility of certain body parts focus the decision on unoccluded body components. We further apply the pose-specific nature of our mixture-of-experts framework towards estimating the density of pedestrian body orientation from single images, again integrating shape and texture information. Throughout this work, particular emphasis is laid on thorough performance evaluation both regarding methodology and competitive real-world datasets. Several datasets used in this thesis are made publicly available for benchmarking purposes. Our results indicate significant performance boosts over state-of-the-art for all aspects considered in this thesis, i.e. pedestrian recognition, partial occlusion handling and body orientation estimation. The pedestrian recognition performance in particular is considerably advanced; false detections at constant detection rates are reduced by significantly more than an order of magnitude

    Multi-Cue Pedestrian Recognition

    Full text link
    This thesis addresses the problem of detecting complex, deformable objects in an arbitrary, cluttered environment in sequences of video images. Often, no single best technique exists for such a challenging problem, as different approaches possess different characteristics with regard to detection accuracy, processing speed, or the kind of errors made. Therefore, multi-cue approaches are pursued in this thesis. By combining multiple detection methods, each utilizing a different aspect of the video images, we seek to gain detection accuracy, robustness, and computational efficiency. The first part of this thesis deals with texture classification. In a comparative study, various combinations of feature extraction and classification methods, some of which novel, are examined with respect to classification performance and processing speed, and the relation to the training sample size is analyzed. The integration of shape matching and texture classification is investigated. A pose-specific mixture-of-experts architecture is proposed, where shape matching yields a probabilistic assignment of a texture pattern to a set of distinct pose clusters, each handled by a specialized texture classifier, the local expert. The reduced appearance variability that each local expert needs to cope with leads to improved classification performance. A slight further performance gain could be achieved by shape normalization. The second multi-cue approach deals with cascade systems that employ a sequence of fast-to-complex system modules in order to gain computational efficiency. Three optimization techniques are examined that adjust system parameters so as to optimize the three performance measures detection rate, false positive rate, and processing cost. A combined application of two techniques, a novel fast sequential optimization scheme based on ROC (receiver operating characteristics) frontier following, followed by an iterative gradient descent optimization method, is found to work best. The third method investigated is a Bayesian combination of multiple visual cues. An integrated object detection and tracking framework based on particle filtering is presented. A novel object representation combines mixture models of shape and texture, the former based on a generative point distribution model, the latter on discriminative texture classifiers. The associated observation density function integrates the three visual cues shape, texture, and depth. All methods are extensively evaluated on the problem of detecting pedestrians in urban environment from within a moving vehicle. Large data sets consisting of tens of thousands of video images have been recorded in order to obtain statistically meaningful results

    Coopération de réseaux de caméras ambiantes et de vision embarquée sur robot mobile pour la surveillance de lieux publics

    Get PDF
    Actuellement, il y a une demande croissante pour le déploiement de robots mobile dans des lieux publics. Pour alimenter cette demande, plusieurs chercheurs ont déployé des systèmes robotiques de prototypes dans des lieux publics comme les hôpitaux, les supermarchés, les musées, et les environnements de bureau. Une principale préoccupation qui ne doit pas être négligé, comme des robots sortent de leur milieu industriel isolé et commencent à interagir avec les humains dans un espace de travail partagé, est une interaction sécuritaire. Pour un robot mobile à avoir un comportement interactif sécuritaire et acceptable - il a besoin de connaître la présence, la localisation et les mouvements de population à mieux comprendre et anticiper leurs intentions et leurs actions. Cette thèse vise à apporter une contribution dans ce sens en mettant l'accent sur les modalités de perception pour détecter et suivre les personnes à proximité d'un robot mobile. Comme une première contribution, cette thèse présente un système automatisé de détection des personnes visuel optimisé qui prend explicitement la demande de calcul prévue sur le robot en considération. Différentes expériences comparatives sont menées pour mettre clairement en évidence les améliorations de ce détecteur apporte à la table, y compris ses effets sur la réactivité du robot lors de missions en ligne. Dans un deuxiè contribution, la thèse propose et valide un cadre de coopération pour fusionner des informations depuis des caméras ambiant affixé au mur et de capteurs montés sur le robot mobile afin de mieux suivre les personnes dans le voisinage. La même structure est également validée par des données de fusion à partir des différents capteurs sur le robot mobile au cours de l'absence de perception externe. Enfin, nous démontrons les améliorations apportées par les modalités perceptives développés en les déployant sur notre plate-forme robotique et illustrant la capacité du robot à percevoir les gens dans les lieux publics supposés et respecter leur espace personnel pendant la navigation.This thesis deals with detection and tracking of people in a surveilled public place. It proposes to include a mobile robot in classical surveillance systems that are based on environment fixed sensors. The mobile robot brings about two important benefits: (1) it acts as a mobile sensor with perception capabilities, and (2) it can be used as means of action for service provision. In this context, as a first contribution, it presents an optimized visual people detector based on Binary Integer Programming that explicitly takes the computational demand stipulated into consideration. A set of homogeneous and heterogeneous pool of features are investigated under this framework, thoroughly tested and compared with the state-of-the-art detectors. The experimental results clearly highlight the improvements the different detectors learned with this framework bring to the table including its effect on the robot's reactivity during on-line missions. As a second contribution, the thesis proposes and validates a cooperative framework to fuse information from wall mounted cameras and sensors on the mobile robot to better track people in the vicinity. Finally, we demonstrate the improvements brought by the developed perceptual modalities by deploying them on our robotic platform and illustrating the robot's ability to perceive people in supposed public areas and respect their personal space during navigation

    Contributions for the automatic description of multimodal scenes

    Get PDF
    Tese de doutoramento. Engenharia Electrotécnica e de Computadores. Faculdade de Engenharia. Universidade do Porto. 200
    • …
    corecore