Dynamic Foveal 3D Sensing Using Affine Models
This study aims to develop a method for analysing the 3D structure of a scene from a monocular image sequence, with an uncalibrated camera (as in an active visual system) and a continuous model of motion. Surprisingly perhaps, this problem has received little attention in the literature except in \cite{vieville-faugeras:95}, and then only in a preliminary way, without any reference to active vision. This difficulty may stem from the intrinsic complexity of the underlying equations, which lead to a heavy implementation and are thus a priori not robust. Moreover, the analytic developments available for calibrated systems \cite{vieville-clergue-etal:95,chaumette-boukir:91,boukir:93} are not possible here, because of the algebraic complexity of the equations. To overcome this difficulty, we have developed a simplified parameterization of the problem for two or more views, considering a scene with a set of stationary objects and applying an orthographic model of projection. In this case, fusion along the image sequence is trivial. Thanks to the integration of active visual perception, we demonstrate that it is always possible to generate a displacement for which the previous model is valid, so that the observed scene can then be reconstructed very easily. When the motion constraints are only approximately verified, we show that the model remains approximately valid close to the retina. At the experimental level, we report a small implementation that takes an image sequence as input, computes the retinal motion fields, and calculates the reconstruction up to a particular affine transform of the scene.
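The orthographic projection model makes the reconstruction problem linear, which a short sketch can illustrate. The code below is a hypothetical demonstration (not the paper's algorithm): it recovers scene structure up to a 3x3 affine transform by rank-3 factorization of the centred measurement matrix, in the style of classical affine structure-from-motion. All names and the synthetic data are assumptions for the demo.

```python
# Illustrative sketch: affine reconstruction under an orthographic camera
# model via rank-3 factorization of the centred measurement matrix.
import numpy as np

rng = np.random.default_rng(0)
n_frames, n_points = 6, 20
X = rng.standard_normal((3, n_points))          # stationary 3D scene points

# Random 2x3 affine (orthographic) cameras, one per frame.
cams = [rng.standard_normal((2, 3)) for _ in range(n_frames)]
W = np.vstack([C @ X for C in cams])            # 2F x P measurement matrix

# Centring each row removes the unknown per-frame translation.
W_c = W - W.mean(axis=1, keepdims=True)

# W_c has rank 3, so an SVD gives the factorization W_c ~ M @ S.
U, s, Vt = np.linalg.svd(W_c, full_matrices=False)
M = U[:, :3] * s[:3]                            # stacked camera matrices
S = Vt[:3]                                      # scene, up to an affine transform

residual = np.linalg.norm(W_c - M @ S)
print(f"reprojection residual: {residual:.2e}")
```

The residual is numerically zero here because the synthetic data exactly satisfy the orthographic model; with real image measurements the rank-3 approximation would absorb noise instead.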
Children, Humanoid Robots and Caregivers
This paper presents developmental learning on a humanoid robot from human-robot interactions. We consider in particular teaching humanoids as children during the child's Separation and Individuation developmental phase (Mahler, 1979). Cognitive development during this phase is characterized both by the child's dependence on her mother for learning while becoming aware of her own individuality, and by self-exploration of her physical surroundings. We propose a learning framework for a humanoid robot inspired by such cognitive development.
Variational Saccading: Efficient Inference for Large Resolution Images
Image classification with deep neural networks is typically restricted to
images of small dimensionality such as 224 x 224 in ResNet models [24]. This
limitation excludes the 4000 x 3000 dimensional images that are taken by modern
smartphone cameras and smart devices. In this work, we aim to mitigate the
prohibitive inferential and memory costs of operating in such large dimensional
spaces. To sample from the high-resolution original input distribution, we
propose using a smaller proxy distribution to learn the co-ordinates that
correspond to regions of interest in the high-dimensional space. We introduce a
new principled variational lower bound that captures the relationship of the
proxy distribution's posterior and the original image's co-ordinate space in a
way that maximizes the conditional classification likelihood. We empirically
demonstrate on one synthetic benchmark and one real world large resolution DSLR
camera image dataset that our method produces comparable results with ~10x
faster inference and lower memory consumption than a model that utilizes the
entire original input distribution. Finally, we experiment with a more complex
setting using mini-maps from StarCraft II [56] to infer the number of
characters in a complex 3D-rendered scene. Even in such complicated scenes our
model provides strong localization: a feature missing from traditional
classification models.
Comment: Published at BMVC 2019 and the NIPS 2018 Bayesian Deep Learning Workshop.
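The proxy-distribution idea can be illustrated with a deliberately simplified sketch: a coarse downsample of the full image stands in for the learned proxy, and its strongest cell selects which full-resolution crop is actually processed. Everything below (sizes, the mean-pooling "saliency" proxy, the planted bright region) is an assumption for illustration, not the paper's variational inference.

```python
# Toy glimpse sketch: use a low-resolution proxy of a large image to pick
# the coordinates of a region of interest, then crop it at full resolution.
import numpy as np

rng = np.random.default_rng(1)
H = W = 400                                     # stand-in for a large photo
image = rng.random((H, W)) * 0.1
image[300:340, 100:140] += 5.0                  # bright "region of interest"

# Proxy: k x k block means, i.e. a coarse downsample of the original.
k = 20
proxy = image.reshape(H // k, k, W // k, k).mean(axis=(1, 3))

# Pick the strongest coarse cell and map it back to full-res coordinates.
iy, ix = np.unravel_index(np.argmax(proxy), proxy.shape)
y0, x0 = iy * k, ix * k
crop = image[y0:y0 + k, x0:x0 + k]              # only this crop is classified

print((y0, x0), round(float(crop.mean()), 2))
```

Only the crop ever reaches the downstream classifier, which is the source of the inference and memory savings the abstract describes; the paper replaces this fixed argmax with a learned posterior over crop coordinates.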
Development of a foveated vision system for the tracking of mobile targets in dynamic environments
Master's dissertation in Mechanical Engineering.
ABSTRACT: This work describes a system based on active perception and foveated vision,
intended to identify and track moving targets in dynamic environments. The full
system includes a pan-and-tilt unit to ease tracking and keep the target of
interest in view of both cameras, whose wide-angle and telephoto lenses provide
a peripheral and a foveal view of the world, respectively. View-based Haar-like
features are employed for object recognition. A template matching based
tracking technique continues to track the object even when its view is not
recognized by the object recognition module. Some of the techniques used to
improve the template matching performance are also presented, namely
Gaussian Filtering and Fast Gaussian computation. Results are presented for
tracking, identification, and overall system operation. The system performs
well, sustaining at least 15 frames per second on 320 x 240 images on an
ordinary laptop computer. Several ways to further improve the system's
performance are also addressed.
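As a rough illustration of the template-matching step, here is a minimal normalized cross-correlation search in plain NumPy. The exhaustive scan is exactly the slow baseline that speed-ups such as Fast Gaussian computation are meant to accelerate; frame size, positions, and the random data are made up for the demo.

```python
# Sketch of template-matching tracking: exhaustively score every window of
# the frame against the last known view of the target with normalized
# cross-correlation, and report the best-matching position.
import numpy as np

rng = np.random.default_rng(2)
frame = rng.random((120, 160))                  # one greyscale frame
template = frame[50:65, 80:100].copy()          # last known view of the target

th, tw = template.shape
t = template - template.mean()                  # zero-mean template
best, best_pos = -np.inf, (0, 0)
for y in range(frame.shape[0] - th + 1):
    for x in range(frame.shape[1] - tw + 1):
        w = frame[y:y + th, x:x + tw]
        w = w - w.mean()
        score = (w * t).sum() / (np.linalg.norm(w) * np.linalg.norm(t) + 1e-12)
        if score > best:
            best, best_pos = score, (y, x)

print(best_pos, round(best, 3))                 # relocates the target at (50, 80)
```

Because the template was cut from this very frame, the correlation peaks at the true position with a score of essentially 1.0; in a live tracker the template would come from a previous frame and the peak score would degrade as the target's appearance changes, which is when the Haar-based recognition module would be consulted again.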
A hierarchical active binocular robot vision architecture for scene exploration and object appearance learning
This thesis presents an investigation of a computational model of hierarchical visual behaviours within an active binocular robot vision architecture. The robot vision system is able to localise multiple instances of the same object class, while simultaneously maintaining vergence and directing its gaze to attend and recognise objects within cluttered, complex scenes. This is achieved by implementing all image analysis in an egocentric symbolic space without creating explicit pixel-space maps and without the need for calibration or other knowledge of the camera geometry. One of the important aspects of the active binocular vision paradigm requires that visual features in both camera eyes must be bound together in order to drive visual search to saccade, locate and recognise putative objects or salient locations in the robot's field of view. The system structure is based on the “attentional spotlight” metaphor of biological systems and a collection of abstract and reactive visual behaviours arranged in a hierarchical structure.
Several studies have shown that the human brain represents and learns objects for recognition through snapshots of 2-dimensional views of the imaged scene that happen to contain the object of interest during active interaction with (exploration of) the environment. Likewise, psychophysical findings indicate that the primate visual cortex represents common everyday objects by a hierarchical structure of their parts or sub-features and, consequently, recognises them from simple but imperfect 2D approximations of object parts. This thesis incorporates the above observations into an active visual learning behaviour in the hierarchical active binocular robot vision architecture. By actively exploring the object viewing sphere (as higher mammals do), the robot vision system automatically synthesises and creates its own part-based object representation from multiple observations, while a human teacher indicates the object and supplies a classification name. It is proposed to adopt the computational concepts of a visual learning exploration mechanism that controls the accumulation of visual evidence and directs attention towards the spatially salient object parts.
The behavioural structure of the binocular robot vision architecture is loosely modelled on the WHAT and WHERE visual streams. The WHERE stream maintains and binds spatial attention on the object-part coordinates that egocentrically characterise the location of the object of interest, and extracts spatio-temporal properties of feature coordinates and descriptors. The WHAT stream either determines the identity of an object or triggers a learning behaviour that stores view-invariant feature descriptions of the object part. The robot vision system is therefore capable of performing a collection of specific visual tasks such as vergence, detection, discrimination, recognition, localisation, and multiple same-instance identification. This classification of tasks enables the robot vision system to execute and fulfil specified high-level tasks, e.g. autonomous scene exploration and active object appearance learning.
Cognitive-developmental learning for a humanoid robot : a caregiver's gift
Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004. Includes bibliographical references (p. 319-341).
Building an artificial humanoid robot's brain, even at an infant's cognitive level, has been a long quest which still lies only in the realm of our imagination. Our efforts towards such a dimly imaginable task are developed according to two alternate and complementary views: cognitive and developmental. The goal of this work is to build a cognitive system for the humanoid robot, Cog, that exploits human caregivers as catalysts to perceive and learn about actions, objects, scenes, people, and the robot itself. This thesis addresses a broad spectrum of machine learning problems across several categorization levels. Actions by embodied agents are used to automatically generate training data for the learning mechanisms, so that the robot develops categorization autonomously. Taking inspiration from the human brain, a framework of algorithms and methodologies was implemented to emulate different cognitive capabilities on the humanoid robot Cog. This framework is effectively applied to a collection of AI, computer vision, and signal processing problems. Cognitive capabilities of the humanoid robot are developmentally created, starting from infant-like abilities for detecting, segmenting, and recognizing percepts over multiple sensing modalities. Human caregivers provide a helping hand for communicating such information to the robot. This is done by actions that create meaningful events (by changing the world in which the robot is situated), thus inducing the "compliant perception" of objects from these human-robot interactions. Self-exploration of the world extends the robot's knowledge concerning object properties.
This thesis argues for enculturating humanoid robots using infant development as a metaphor for building a humanoid robot's cognitive abilities. A human caregiver redesigns a humanoid's brain by teaching the humanoid robot as she would teach a child, using children's learning aids such as books, drawing boards, or other cognitive artifacts. Multi-modal object properties are learned using these tools and inserted into several recognition schemes, which are then applied to developmentally acquire new object representations. The humanoid robot therefore sees the world through the caregiver's eyes.
by Artur Miguel Do Amaral Arsenio. Ph.D.