
    Dynamic Foveal 3D Sensing Using Affine Models

    This study aims to develop a method for analysing the 3D structure of a scene from a monocular image sequence, with an uncalibrated camera (as for an active visual system) and using a continuous model of motion. Surprisingly perhaps, this problem has been little studied in the literature except in \cite{vieville-faugeras:95}, and there only in a preliminary way, without any reference to active vision. This gap might have its source in the intrinsic complexity of the underlying equations, which leads to heavy implementations that are a priori not robust. Moreover, the extensive analytic development of the equations that is possible for calibrated systems \cite{vieville-clergue-etal:95,chaumette-boukir:91,boukir:93} is ruled out here by their algebraic complexity. To overcome this difficulty, we have developed a simplified parameterization of the problem for two or more views, considering a scene with a set of stationary objects and applying an orthographic model of the projection. In this case, fusion along the image sequence is trivial. Thanks to the integration of active visual perception, we demonstrate that it is always possible to generate a displacement for which the previous model is valid, and the observed scene can then be reconstructed very easily. When the motion constraints are only approximately verified, we show that the model remains approximately valid close to the retina. At an experimental level, we report a small implementation that takes an image sequence as input, computes the retinal motion fields, and recovers the reconstruction up to a particular affine transform of the scene.
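    Though the abstract does not spell out the paper's uncalibrated parameterization, the kind of result it describes, structure recovered "up to a particular affine transform" under an orthographic projection, can be sketched with a generic rank-3 factorization of tracked image points (in the spirit of Tomasi-Kanade). The sketch below is an illustration under that assumption, not the paper's method; the function name and input shape are invented for the example.

```python
# Sketch: affine reconstruction from tracked points under orthographic
# projection (Tomasi-Kanade-style factorization). Illustrative only;
# NOT the paper's parameterization.
import numpy as np

def affine_reconstruct(tracks):
    """tracks: (F, P, 2) array of P points tracked over F frames.
    Returns (M, S): affine cameras (2F, 3) and structure (3, P),
    each defined only up to a common 3x3 affine transform."""
    F, P, _ = tracks.shape
    # Stack x-rows then y-rows into the 2F x P measurement matrix.
    W = np.concatenate([tracks[:, :, 0], tracks[:, :, 1]], axis=0)
    # Centering removes the translation component of each affine camera.
    W = W - W.mean(axis=1, keepdims=True)
    # Under noise-free orthography, W has rank 3; truncate the SVD.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    M = U[:, :3] * np.sqrt(s[:3])            # motion (affine cameras)
    S = np.sqrt(s[:3])[:, None] * Vt[:3]     # structure
    return M, S
```

Any invertible 3x3 matrix A yields an equally valid pair (M A, A^-1 S), which is exactly the affine ambiguity the abstract refers to.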

    Children, Humanoid Robots and Caregivers

    This paper presents developmental learning on a humanoid robot from human-robot interactions. In particular, we consider teaching humanoids as children during the child's Separation and Individuation developmental phase (Mahler, 1979). Cognitive development during this phase is characterized both by the child's dependence on her mother for learning while becoming aware of her own individuality, and by self-exploration of her physical surroundings. We propose a learning framework for a humanoid robot inspired by such cognitive development.

    Variational Saccading: Efficient Inference for Large Resolution Images

    Image classification with deep neural networks is typically restricted to images of small dimensionality, such as 224 x 224 in ResNet models [24]. This limitation excludes the 4000 x 3000 dimensional images taken by modern smartphone cameras and smart devices. In this work, we aim to mitigate the prohibitive inferential and memory costs of operating in such large dimensional spaces. To sample from the high-resolution original input distribution, we propose using a smaller proxy distribution to learn the coordinates that correspond to regions of interest in the high-dimensional space. We introduce a new principled variational lower bound that captures the relationship between the proxy distribution's posterior and the original image's coordinate space in a way that maximizes the conditional classification likelihood. We empirically demonstrate on one synthetic benchmark and one real-world large-resolution DSLR camera image dataset that our method produces comparable results with ~10x faster inference and lower memory consumption than a model that utilizes the entire original input distribution. Finally, we experiment with a more complex setting using mini-maps from Starcraft II [56] to infer the number of characters in a complex 3D-rendered scene. Even in such complicated scenes our model provides strong localization: a feature missing from traditional classification models. Comment: Published at BMVC 2019 and the NIPS 2018 Bayesian Deep Learning Workshop.
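    As a rough illustration of the saccading idea, though not of the paper's variational lower bound, the sketch below has a small "where" network propose glimpse coordinates from a downsampled proxy of the input, while the classifier only ever sees the corresponding high-resolution crops. All module names, layer sizes, and the fixed 64 x 64 proxy resolution are assumptions for the example.

```python
# Illustrative glimpse-based classifier: a proxy network picks where to
# look; the classifier sees only small high-resolution crops.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Saccader(nn.Module):
    def __init__(self, num_classes=10, glimpse=64, steps=4):
        super().__init__()
        self.glimpse, self.steps = glimpse, steps
        # Proxy network: low-res image -> glimpse centers in [-1, 1].
        self.where = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 2 * steps), nn.Tanh())
        # Classifier applied to each high-resolution glimpse.
        self.what = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, num_classes))

    def forward(self, full_res):
        proxy = F.interpolate(full_res, size=(64, 64), mode='bilinear',
                              align_corners=False)
        centers = self.where(proxy).view(-1, self.steps, 2)
        logits = 0
        for t in range(self.steps):
            # Affine grid that crops a glimpse-sized window at each center.
            theta = torch.zeros(full_res.size(0), 2, 3,
                                device=full_res.device)
            scale = self.glimpse / full_res.size(-1)
            theta[:, 0, 0] = scale
            theta[:, 1, 1] = scale
            theta[:, :, 2] = centers[:, t]
            grid = F.affine_grid(
                theta, (full_res.size(0), 3, self.glimpse, self.glimpse),
                align_corners=False)
            patch = F.grid_sample(full_res, grid, align_corners=False)
            logits = logits + self.what(patch)
        return logits / self.steps
```

Because cropping uses an affine grid with bilinear sampling, the pipeline stays differentiable, so the glimpse proposals can be trained end-to-end from the classification loss alone.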

    Development of a foveated vision system for the tracking of mobile targets in dynamic environments

    Master's thesis in Mechanical Engineering. This work describes a system based on active perception and foveated vision, intended to identify and track moving targets in dynamic environments. The full system includes a pan-and-tilt unit to ease tracking and keep the target of interest in the view of the two cameras, whose wide- and narrow-field lenses provide a peripheral and a foveal view of the world, respectively. View-based Haar-like features are employed for object recognition. A tracking technique based on template matching continues to follow the object even when its view is not recognized by the object recognition module. Some of the techniques used to improve template matching performance are also presented, namely Gaussian filtering and fast Gaussian computation. Results are presented for tracking, identification, and overall system operation. The system performs well, sustaining at least 15 frames per second on 320 x 240 images on an ordinary laptop computer. Several ways to improve the system's performance are also addressed.
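    The detect-then-track pattern described above is easy to sketch with standard OpenCV primitives: a Haar cascade proposes the target, and normalized template matching keeps following it when detection fails. The cascade file, camera index, and matching threshold below are illustrative assumptions; the thesis' Gaussian-filtering speedups are not reproduced.

```python
# Minimal detect-then-track loop: Haar cascade when it fires, template
# matching as a fallback. Illustrative sketch, not the thesis' system.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
cap = cv2.VideoCapture(0)
template = None

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    detections = cascade.detectMultiScale(gray, scaleFactor=1.1,
                                          minNeighbors=5)
    if len(detections) > 0:
        # Recognition succeeded: refresh the template from the detection.
        x, y, w, h = detections[0]
        template = gray[y:y + h, x:x + w]
        cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    elif template is not None:
        # Recognition failed: template matching keeps the track alive.
        res = cv2.matchTemplate(gray, template, cv2.TM_CCOEFF_NORMED)
        _, score, _, (x, y) = cv2.minMaxLoc(res)
        if score > 0.6:
            h, w = template.shape
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow('tracking', frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
```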

    A hierarchical active binocular robot vision architecture for scene exploration and object appearance learning

    This thesis presents an investigation of a computational model of hierarchical visual behaviours within an active binocular robot vision architecture. The robot vision system is able to localise multiple instances of the same object class, while simultaneously maintaining vergence and directing its gaze to attend to and recognise objects within cluttered, complex scenes. This is achieved by implementing all image analysis in an egocentric symbolic space, without creating explicit pixel-space maps and without the need for calibration or other knowledge of the camera geometry. An important aspect of the active binocular vision paradigm is that visual features in both camera eyes must be bound together in order to drive visual search to saccade to, locate, and recognise putative objects or salient locations in the robot's field of view. The system structure is based on the "attentional spotlight" metaphor of biological systems and a collection of abstract and reactive visual behaviours arranged in a hierarchical structure. Several studies have shown that the human brain represents and learns objects for recognition from snapshots of 2-dimensional views of the imaged scene that happen to contain the object of interest during active interaction with (exploration of) the environment. Likewise, psychophysical findings indicate that the primate visual cortex represents common everyday objects by a hierarchical structure of their parts or sub-features and, consequently, recognises them through simple but imperfect 2D approximations of object parts. This thesis incorporates the above observations into an active visual learning behaviour in the hierarchical active binocular robot vision architecture. By actively exploring the object viewing sphere (as higher mammals do), the robot vision system automatically synthesises its own part-based object representation from multiple observations while a human teacher indicates the object and supplies a classification name. It is proposed to adopt the computational concepts of a visual learning exploration mechanism that controls the accumulation of visual evidence and directs attention towards spatially salient object parts. The behavioural structure of the binocular robot vision architecture is loosely modelled on the WHAT and WHERE visual streams. The WHERE stream maintains and binds spatial attention on the object part coordinates that egocentrically characterise the location of the object of interest, and extracts spatio-temporal properties of feature coordinates and descriptors. The WHAT stream either determines the identity of an object or triggers a learning behaviour that stores view-invariant feature descriptions of the object part. The robot vision system is therefore capable of performing a collection of specific visual tasks such as vergence, detection, discrimination, recognition, localisation, and multiple same-instance identification. This classification of tasks enables the robot vision system to execute and fulfil specified high-level tasks, e.g. autonomous scene exploration and active object appearance learning.
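    A minimal sketch of the division of labour between the two streams described above is given below. Every name here (the loop structure, classes, and methods) is an illustrative assumption rather than the thesis' implementation; the sketch only shows how a WHERE stream choosing fixations could drive a WHAT stream that recognises or learns.

```python
# Illustrative WHAT/WHERE exploration loop; all interfaces are assumed.

def explore_and_learn(robot, where, what, teacher_label=None):
    """One episode of active scene exploration with optional teaching."""
    while True:
        left, right = robot.capture_stereo()
        # WHERE: bind features across both eyes, maintain vergence, and
        # pick the next salient unvisited location (egocentric coordinates).
        target = where.next_fixation(left, right)
        if target is None:
            break                               # nothing salient remains
        robot.saccade_to(target)
        descriptors = where.extract_part_features(robot.capture_stereo())
        # WHAT: identify the attended object part, or trigger the learning
        # behaviour when a teacher supplies a classification name.
        label = what.recognise(descriptors)
        if label is None and teacher_label is not None:
            what.learn(teacher_label, descriptors)
            label = teacher_label
        where.mark_visited(target, label)
```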

    Memory-Based Active Visual Search for Humanoid Robots


    Cognitive-developmental learning for a humanoid robot : a caregiver's gift

    Thesis (Ph.D.) by Artur Miguel Do Amaral Arsenio -- Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 319-341). Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental. The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog, and is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done through actions that create meaningful events (by changing the world in which the robot is situated), thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties. This thesis argues for enculturating humanoid robots, using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes.