26 research outputs found

    Memory-Based Active Visual Search for Humanoid Robots

    Active Vision for Scene Understanding

    Visual perception is one of the most important sources of information for both humans and robots. A particular challenge is the acquisition and interpretation of complex unstructured scenes. This work contributes to active vision for humanoid robots. A semantic model of the scene is created, which is extended by successively changing the robot's view in order to explore interaction possibilities of the scene.

    Active multi-view object search on a humanoid head

    Development of a foveated vision system for the tracking of mobile targets in dynamic environments

    Master's dissertation in Mechanical Engineering. This work describes a system based on active perception and foveated vision, intended to identify and track moving targets in dynamic environments. The full system includes a pan-and-tilt unit to ease tracking and keep the target of interest in the view of both cameras, whose wide-angle and telephoto lenses provide a peripheral and a foveal view of the world, respectively. View-based Haar-like features are employed for object recognition. A tracking technique based on template matching continues to track the object even when its view is not recognized by the object recognition module. Some of the techniques used to improve template-matching performance are also presented, namely Gaussian Filtering and Fast Gaussian computation. Results are presented for tracking, identification, and overall system operation. The system performs well, maintaining a rate of at least 15 frames per second on 320x240 images on an ordinary laptop computer. Several ways to improve the system's performance are also addressed.
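
    As a rough illustration of the tracking step this abstract describes (a minimal sketch, not the thesis code), the fragment below applies Gaussian smoothing before normalized cross-correlation template matching with OpenCV; the function name and kernel size are illustrative choices:

        import cv2

        def track_target(frame, template):
            """Locate `template` in `frame` via normalized cross-correlation."""
            # Smooth both images first; the abstract cites Gaussian Filtering
            # as one of the techniques that improve template matching.
            frame_s = cv2.GaussianBlur(frame, (5, 5), 0)
            tmpl_s = cv2.GaussianBlur(template, (5, 5), 0)
            # Slide the template over the frame and score every position.
            scores = cv2.matchTemplate(frame_s, tmpl_s, cv2.TM_CCOEFF_NORMED)
            _, max_val, _, max_loc = cv2.minMaxLoc(scores)
            return max_loc, max_val  # best-match corner and its score

    In a full system, the offset between the matched location and the image centre would drive the pan-and-tilt unit to keep the target centred in view.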

    Bio-inspired foveal and peripheral visual sensing for saliency-based decision making in robotics

    Computer vision is an area of research that has grown at immense speed in the last few decades, tackling problems in scene understanding from very diverse fronts, such as image classification, object detection, localization, mapping and tracking. It has also long been understood that there are valuable lessons to learn from biology and apply to this research field, where the human visual system is very likely the most studied brain mechanism. The eye's foveation system is a good example of such a lesson, since machines and animals often face a similar dilemma: how to prioritize visual areas of interest so as to process information faster, given limited computing power and a field of view that is too wide to be attended to all at once. While extensive models of artificial foveation have been presented, the re-emerging area of machine learning with deep neural networks has opened the question of how these two approaches can contribute to each other. Novel deep learning models often rely on the availability of substantial computing power, but some areas of application face strict constraints; a good example is unmanned aerial vehicles, which, in order to be autonomous, must lift and power all of their computing equipment. This work studies how applying a foveation principle when down-scaling images can reduce the number of operations required for object detection, and compares its effect to that of uniformly down-sampled images, given that Convolutional Neural Network (CNN) layers account for the prevalent share of those operations. Foveation requires prior knowledge of regions of interest on which to centre the fovea; this is addressed by merging bottom-up saliency with top-down feedback about the objects the CNN has been trained to detect. Although saliency models have also been studied extensively in the last couple of decades, most often by comparing their performance against human-observer datasets, the question of how they fit into wider information-processing paradigms and into functional representations of the human brain remains open. An information-flow scheme that encompasses these principles is proposed here. Finally, to give the model the capacity to operate coherently in the time domain, it adapts a representation of a well-established theory of the decision-making process that takes place in the basal ganglia region of the brain. The behaviour of this representation is then tested against human observers' data in an omnidirectional field of view, where the importance of selecting the most contextually relevant region of interest at each time step is highlighted.
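
    As a rough sketch of the foveated down-scaling idea (an assumption-laden illustration, not the author's model), the snippet below keeps a full-resolution crop around a fixation point, which would come from the saliency and top-down feedback stage, while the periphery is uniformly down-sampled, shrinking the pixel count a CNN must process; names and defaults are illustrative:

        import cv2

        def foveate(image, cx, cy, fovea_radius=64, periph_scale=4):
            """Split `image` into a sharp foveal crop and a coarse periphery.

            (cx, cy) is the fixation point chosen by the saliency stage.
            """
            h, w = image.shape[:2]
            x0, y0 = max(cx - fovea_radius, 0), max(cy - fovea_radius, 0)
            x1, y1 = min(cx + fovea_radius, w), min(cy + fovea_radius, h)
            fovea = image[y0:y1, x0:x1]  # full resolution at the fixation point
            periphery = cv2.resize(image, (w // periph_scale, h // periph_scale))
            return fovea, periphery

    For a 640x480 input with these defaults, the fovea crop (128x128 pixels) plus the periphery (160x120 pixels) amounts to roughly 12% of the original pixel count, which is where the saving in CNN operations comes from.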

    Expressive social exchange between humans and robots

    Thesis (Sc.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2000. Includes bibliographical references (p. 253-264). By Cynthia L. Breazeal. Sociable humanoid robots are natural and intuitive for people to communicate with and to teach. We present recent advances in building an autonomous humanoid robot, Kismet, that can engage humans in expressive social interaction. We outline a set of design issues and a framework that we have found to be of particular importance for sociable robots. Having a human in the loop places significant social constraints on how the robot aesthetically appears, how its sensors are configured, its quality of movement, and its behavior. Inspired by infant social development, psychology, ethology, and evolutionary perspectives, this work integrates theories and concepts from these diverse viewpoints to enable Kismet to enter into natural and intuitive social interaction with a human caregiver, reminiscent of parent-infant exchanges. Kismet perceives a variety of natural social cues from visual and auditory channels, and delivers social signals to people through gaze direction, facial expression, body posture, and vocalizations. We present the implementation of Kismet's social competencies and evaluate each with respect to: 1) the ability of naive subjects to read and interpret the robot's social cues, 2) the robot's ability to perceive and appropriately respond to naturally offered social cues, 3) the robot's ability to elicit interaction scenarios that afford rich learning potential, and 4) how this produces a rich, flexible, dynamic interaction that is physical, affective, and social. Numerous studies with naive human subjects are described that provide the data upon which we base our evaluations.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task in which there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al, 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
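
    As a quick consistency check of the reported statistic (our own arithmetic, not part of the paper), the quoted p-value follows directly from the F value and its degrees of freedom:

        from scipy.stats import f

        # Survival function of an F(1, 4) distribution at the reported value.
        p = f.sf(2.565, dfn=1, dfd=4)
        print(round(p, 3))  # ~0.185, matching the quoted non-significant result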

    Learning and Execution of Object Manipulation Tasks on Humanoid Robots

    Equipping robots with complex capabilities still requires a great amount of effort. In this work, a novel approach is proposed to understand, represent, and execute object manipulation tasks learned from observation, by combining methods of data analysis, graphical modeling, and artificial intelligence. This approach enables robots to reason about how to solve tasks in dynamic environments and to adapt to unseen situations.