Active Vision for Scene Understanding
Visual perception is one of the most important sources of information for both humans and robots. A particular challenge is the acquisition and interpretation of complex unstructured scenes. This work contributes to active vision for humanoid robots. A semantic model of the scene is created, which is extended by successively changing the robot's view in order to explore interaction possibilities of the scene.
Development of a foveated vision system for the tracking of mobile targets in dynamic environments
Master’s in Mechanical Engineering. This work describes a system based on active perception and
foveated vision, designed to identify and follow moving objects in dynamic
environments. The system includes a pan and tilt unit to ease tracking and
keep the object at the center of the cameras’ field of view; the cameras’ wide-angle
and telephoto lenses provide a peripheral and a foveated view of the
world, respectively. The Haar features method is used for object
recognition. The tracking algorithm, based on
template matching, keeps following the object even when it is no
longer recognized by the identification module. Some techniques
used to improve template matching are also presented,
namely the Gaussian Filter and Fast Gaussian Filter
Computation. Results are reported for tracking, identification and
overall system performance. The system performs very well, sustaining
at least 15 frames per second on 320x240 images
on an ordinary laptop computer. Some aspects for improving
the system’s performance are also discussed.
ABSTRACT: This work describes a system based on active perception and foveated vision,
intended to identify and track moving targets in dynamic environments. The full
system includes a pan and tilt unit to ease tracking and keep the target of
interest in view of the two cameras, whose wide-field and narrow-field lenses provide
a peripheral and a foveal view of the world, respectively. View-based Haar-like
features are employed for object recognition. A template-matching
tracking technique continues to track the object even when its view is not
recognized by the object recognition module. Some of the techniques used to
improve template matching performance are also presented, namely
Gaussian filtering and fast Gaussian computation. Results are presented for
tracking, identification and overall system operation. The system performs well,
at up to 15 frames per second on 320 x 240 images on an ordinary laptop
computer. Several issues to improve the system’s performance are also
addressed.
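The combination of Gaussian filtering and template matching mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the thesis's implementation: the similarity measure (sum of squared differences), the kernel parameters, the border handling, and the helper names `gaussian_blur`, `match_template` and `make_frame` are all hypothetical.

```python
import math


def gaussian_kernel(sigma, radius):
    """Return a normalized 1-D Gaussian kernel of length 2*radius + 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(image, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(image[0])
    return [
        [sum(k * row[min(max(x + j - radius, 0), width - 1)]
             for j, k in enumerate(kernel))
         for x in range(width)]
        for row in image
    ]


def transpose(image):
    """Swap rows and columns of a list-of-lists image."""
    return [list(col) for col in zip(*image)]


def gaussian_blur(image, sigma=1.0, radius=2):
    """Separable Gaussian blur: filter the rows, then the columns."""
    kernel = gaussian_kernel(sigma, radius)
    return transpose(blur_rows(transpose(blur_rows(image, kernel)), kernel))


def match_template(image, template):
    """Return (row, col) of the window with the smallest sum of squared differences."""
    th, tw = len(template), len(template[0])
    best, best_pos = float("inf"), (0, 0)
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if ssd < best:
                best, best_pos = ssd, (r, c)
    return best_pos


def make_frame(top, left, size=12):
    """Hypothetical test frame: a bright 3x3 target on a dark background."""
    frame = [[0.0] * size for _ in range(size)]
    for r in range(top, top + 3):
        for c in range(left, left + 3):
            frame[r][c] = 255.0
    return frame


# The template is cut from the smoothed first frame, then searched for in the
# smoothed second frame, where the target has moved by one pixel in each axis.
first = gaussian_blur(make_frame(4, 6))
second = gaussian_blur(make_frame(5, 7))
template = [row[5:10] for row in first[3:8]]
print(match_template(second, template))
```

Smoothing both the template and the search image before matching is one common way to make the comparison less sensitive to pixel noise; a real-time tracker would normally restrict the search to a window around the last known position rather than scanning the whole frame.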
Bio-inspired foveal and peripheral visual sensing for saliency-based decision making in robotics
Computer vision research has grown rapidly over the last few decades, approaching scene understanding from diverse fronts such as image classification, object detection, localization, mapping and tracking. It has also long been understood that biology offers valuable lessons for this field, with the human visual system very likely the most studied brain mechanism.
The eye's foveation system is a good example of such lessons, since machines and animals often face a similar dilemma: how to prioritize visual areas of interest in order to process information faster, given limited computing power and a field of view too wide to be attended all at once. While extensive models of artificial foveation have been presented, the re-emerging area of machine learning with deep neural networks has opened the question of how these two approaches can contribute to each other. Novel deep learning models often rely on the availability of substantial computing power, but many application areas face strict constraints. A good example is unmanned aerial vehicles, which must lift and power all of their computing equipment in order to be autonomous.
This work studies how applying a foveation principle to down-scale images can reduce the number of operations required for object detection, and compares its effect to that of conventionally down-sampled images, given that Convolutional Neural Network (CNN) layers account for most of the operations. Foveation requires prior knowledge of regions of interest on which to center the fovea; this is addressed by merging bottom-up saliency with top-down feedback about the objects that the CNN has been trained to detect. Although saliency models have also been studied extensively in the last couple of decades, most often by comparing their performance to human observer datasets, the question remains open of how they fit into wider information processing paradigms and into functional representations of the human brain. An information flow scheme that encompasses these principles is proposed here.
Finally, to let the model operate coherently in the time domain, the work adapts a representation of a well-established theory of the decision-making process that takes place in the basal ganglia region of the brain. The behaviour of this representation is then tested against human observer data in an omnidirectional field of view, highlighting the importance of selecting the most contextually relevant region of interest at each time step.
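The claim that foveated down-scaling reduces the operations spent in convolutional layers can be made concrete with a rough operation count. The sketch below is only an illustration under stated assumptions: it models foveation as a full-resolution square crop plus a uniformly down-sampled copy of the whole frame, which may differ from the scheme used in the thesis, and the image size, fovea size, scale factor and channel counts are hypothetical values.

```python
def conv_macs(height, width, kernel, c_in, c_out):
    """Multiply-accumulates for one stride-1, same-padded convolution layer."""
    return height * width * kernel * kernel * c_in * c_out


def foveated_macs(height, width, fovea, periphery_scale, kernel, c_in, c_out):
    """Cost of the assumed two-part input: a fovea x fovea crop kept at full
    resolution plus the whole frame down-sampled by periphery_scale."""
    fovea_cost = conv_macs(fovea, fovea, kernel, c_in, c_out)
    periphery_cost = conv_macs(height // periphery_scale,
                               width // periphery_scale,
                               kernel, c_in, c_out)
    return fovea_cost + periphery_cost


# Hypothetical first layer: 3x3 kernels, 3 input channels, 16 output channels.
full = conv_macs(480, 640, 3, 3, 16)
uniform = conv_macs(480 // 4, 640 // 4, 3, 3, 16)
foveated = foveated_macs(480, 640, 128, 4, 3, 3, 16)
print(full, uniform, foveated)
```

Under these assumptions the foveated input costs more than the uniformly down-sampled frame but far less than the full-resolution frame, while keeping full detail inside the fovea. That trade-off is the reason the position of the fovea, chosen from saliency and detector feedback, matters.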
Expressive social exchange between humans and robots
Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. By Cynthia L. Breazeal. Includes bibliographical references (p. 253-264). Sociable humanoid robots are natural and intuitive for people to communicate with and to teach. We present recent advances in building an autonomous humanoid robot, Kismet, that can engage humans in expressive social interaction. We outline a set of design issues and a framework that we have found to be of particular importance for sociable robots. Having a human-in-the-loop places significant social constraints on how the robot aesthetically appears, how its sensors are configured, its quality of movement, and its behavior. Inspired by infant social development, psychology, ethology, and evolutionary perspectives, this work integrates theories and concepts from these diverse viewpoints to enable Kismet to enter into natural and intuitive social interaction with a human caregiver, reminiscent of parent-infant exchanges. Kismet perceives a variety of natural social cues from visual and auditory channels, and delivers social signals to people through gaze direction, facial expression, body posture, and vocalizations. We present the implementation of Kismet's social competencies and evaluate each with respect to: 1) the ability of naive subjects to read and interpret the robot's social cues, 2) the robot's ability to perceive and appropriately respond to naturally offered social cues, 3) the robot's ability to elicit interaction scenarios that afford rich learning potential, and 4) how this produces a rich, flexible, dynamic interaction that is physical, affective, and social. Numerous studies with naive human subjects are described that provide the data upon which we base our evaluations.
Change blindness: eradication of gestalt strategies
Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial position of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) that Gestalt grouping is not used as a strategy in these tasks, and (ii) that there is further support for the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
Learning and Execution of Object Manipulation Tasks on Humanoid Robots
Equipping robots with complex capabilities still requires a great amount of effort. In this work, a novel approach is proposed to understand, represent and execute object manipulation tasks learned from observation by combining methods of data analysis, graphical modeling and artificial intelligence. Employing this approach enables robots to reason about how to solve tasks in dynamic environments and to adapt to unseen situations.