8 research outputs found

    The robot's vista space : a computational 3D scene analysis

    Get PDF
    Swadzba A. The robot's vista space: a computational 3D scene analysis. Bielefeld (Germany): Bielefeld University; 2011.

    The space that can be explored quickly from a fixed viewpoint without locomotion is known as the vista space. In indoor environments, single rooms and room parts fit this definition. The vista space plays an important role in situations with agent-agent interaction, as it is the directly surrounding environment in which the interaction takes place. A collaborative interaction of the partners in and with the environment requires that both partners know where they are, what spatial structures they are talking about, and what scene elements they are going to manipulate. This thesis focuses on the analysis of a robot's vista space. Mechanisms for extracting relevant spatial information are developed which enable the robot to recognize in which place it is, to detect the scene elements the human partner is talking about, and to segment scene structures the human is changing. These abilities are addressed by the proposed holistic, aligned, and articulated modeling approaches. For a smooth human-robot interaction, the computed models should be aligned to the partner's representations. The design of the computational models is therefore based on combining psychological results from studies on human scene perception with basic physical properties of the perceived scene and of the perception itself.

    The holistic modeling realizes a categorization of room percepts based on the observed 3D spatial layout. Room layouts have room-type-specific features, and fMRI studies have shown that some of the human brain areas active in scene recognition are sensitive to the 3D geometry of a room. With the aligned modeling, the robot is able to extract the hierarchical scene representation underlying a scene description given by a human tutor, and to ground the inferred scene elements in its own visual perception of the scene. This modeling follows the assumption that cognition and language schematize the world in the same way, which is visible in the fact that a scene description mainly consists of relations between an object and its supporting structure, or between objects located on the same supporting structure. Last, the articulated modeling equips the robot with a methodology for articulated scene part extraction and fast background learning under the short and disturbed observation conditions typical of human-robot interaction scenarios. Articulated scene parts are detected model-free by observing scene changes caused by their manipulation. Change detection and background learning are closely coupled because change is defined phenomenologically as variation of structure: change detection involves comparing currently visible structures with a representation in memory. In range sensing, this comparison can be implemented simply as a subtraction of the two representations.

    The three modeling approaches enable the robot to enrich its visual perception of the surrounding environment, the vista space, with semantic information about meaningful spatial structures useful for further interaction with the environment and the human partner.
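    A minimal sketch of the subtraction idea described above, under assumptions of our own: depth images as NumPy arrays in metres, with 0 marking invalid readings. The function names and thresholds are illustrative, not the thesis's actual implementation.

        import numpy as np

        def detect_change(depth_now, depth_memory, threshold=0.05):
            """Flag pixels whose range differs from the remembered structure
            by more than `threshold` metres; invalid (zero) readings are ignored."""
            valid = (depth_now > 0) & (depth_memory > 0)
            return valid & (np.abs(depth_now - depth_memory) > threshold)

        def update_memory(depth_memory, depth_now, changed, alpha=0.1):
            """Blend unchanged pixels into the remembered background, so that
            background learning stays coupled to change detection."""
            static = ~changed & (depth_now > 0)
            depth_memory[static] = ((1 - alpha) * depth_memory[static]
                                    + alpha * depth_now[static])
            return depth_memory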

    Probabilistic Models for 3D Urban Scene Understanding from Movable Platforms

    Get PDF
    This work is a contribution to understanding multi-object traffic scenes from video sequences. All data is provided by a camera system mounted on top of the autonomous driving platform AnnieWAY. The proposed probabilistic generative model reasons jointly about the 3D scene layout and the 3D location and orientation of objects in the scene. In particular, the scene topology, geometry, and traffic activities are inferred from short video sequences.
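    One plausible way to read "reasons jointly" as a generative factorization (our notation, not necessarily the paper's): with scene layout $S$, objects $O = \{o_1, \dots, o_n\}$ (each a 3D location and orientation), and video evidence $V$,

        p(S, O \mid V) \propto p(V \mid S, O)\, p(O \mid S)\, p(S),

    so that inference over a short sequence amounts to recovering the MAP estimate $(\hat{S}, \hat{O}) = \arg\max_{S,O}\, p(S, O \mid V)$, with the prior $p(O \mid S)$ encoding how traffic activities constrain where objects can plausibly sit within the layout.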

    Probabilistic parameter selection for learning scene structure from video

    No full text
    We present an online learning approach for robustly combining unreliable observations from a pedestrian detector to estimate the rough 3D scene geometry from video sequences of a static camera. Our approach is based on an entropy modelling framework, which allows the detector parameters to be adapted simultaneously such that the expected information gain about the scene structure is maximised. As a result, our approach automatically restricts the detector scale range for each image region as the estimation results become more confident, thus improving detector run-time and limiting false positives.
    M. D. Breitenstein, E. Sommerlade, B. Leibe, L. Van Gool, I. Reid
    http://www.comp.leeds.ac.uk/bmvc2008/proceedings/index.htm
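    A small sketch of expected-information-gain parameter selection, under assumptions of our own (discrete scene-geometry hypotheses and detector outcomes); the entropy framework is the paper's, but the code and names are illustrative.

        import numpy as np

        def entropy(p):
            p = p[p > 0]
            return -np.sum(p * np.log2(p))

        def expected_information_gain(prior, likelihood):
            """prior: p(h) over scene-geometry hypotheses h.
            likelihood: matrix p(z | h, theta), rows indexed by detector
            outcome z, columns by h, for candidate parameters theta."""
            h_before = entropy(prior)
            p_z = likelihood @ prior                # marginal over outcomes
            gain = 0.0
            for z, pz in enumerate(p_z):
                if pz > 0:
                    posterior = likelihood[z] * prior / pz
                    gain += pz * (h_before - entropy(posterior))
            return gain

        # Pick the parameters (e.g., a scale range per image region) that
        # maximise the expected gain over the hypothetical candidate set:
        # theta_star = max(candidates, key=lambda t: expected_information_gain(prior, L[t]))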

    Contextualization of a pedestrian detector (application to the surveillance of public spaces)

    Get PDF
    With the rise of video surveillance systems comes a need for automatic, real-time processes to analyze the huge amount of generated data. Among these tools, pedestrian detection algorithms are essential, because in video surveillance locating people is often the first step leading to more complex behavioral analyses. Classical pedestrian detection approaches are based on machine learning and pattern recognition algorithms, so they generally underperform when the pedestrians' appearance observed by a camera differs too much from that of the generic training dataset. This thesis studies the contextualization of such a detector: introducing scene information into a generic pedestrian detector in order to adapt it to the most frequent situations and so improve its overall performance. The key hypothesis is that the camera is static, which is common in video surveillance scenarios.

    This work is split into two parts. First, a state of the art introduces the architecture of a pedestrian detector and the different algorithms involved in its construction. Then the problem of contextualization is tackled through a series of experiments that validate or reject the explored leads. The goal is to identify every part of the detector that can benefit from the approach, in order to contextualize it fully. To make the contextualization process easier, our method is completely automatic and is based on semi-supervised learning. First, data coming from the scene are gathered: we propose different oracles that detect some pedestrians in order to capture their appearance and form a contextualized training dataset. We then analyze the scene geometry, which influences the size and orientation of the pedestrians, and divide the scene into regions within which pedestrians, as well as background elements, share a similar appearance. In the second step, all this information is used to build the final detector, which is composed of several classifiers, one per region (see the sketch below). Each classifier independently scans its dedicated piece of the image and is trained only with a region-specific contextualized dataset, which contains less appearance variability than a global one. Consequently, the training stage is easier and the overall detection results on the scene are improved.
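    A minimal sketch of the per-region detector assembly described above, assuming the region partition and the oracle-labelled samples are given as inputs; scikit-learn's LinearSVC stands in for the thesis's actual classifiers.

        from sklearn.svm import LinearSVC

        class ContextualizedDetector:
            def __init__(self, regions):
                # one independent classifier per scene region
                self.classifiers = {r: LinearSVC() for r in regions}

            def fit(self, samples):
                # samples: region -> (features, labels) gathered by the oracles
                for region, (X, y) in samples.items():
                    self.classifiers[region].fit(X, y)

            def predict(self, region, X):
                # each classifier only scans windows from its own region,
                # trained on data matching that region's local appearance
                return self.classifiers[region].predict(X)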