Cognitive visual tracking and camera control
Cognitive visual tracking is the process of observing and understanding the behaviour of a moving person. This paper presents an efficient solution to extract, in real time, high-level information from an observed scene, and generate the most appropriate commands for a set of pan-tilt-zoom (PTZ) cameras in a surveillance scenario. Such a high-level feedback control loop, which is the main novelty of our work, will serve to reduce uncertainties in the observed scene and to maximize the amount of information extracted from it. It is implemented with a distributed camera system using SQL tables as virtual communication channels, and Situation Graph Trees for knowledge representation, inference and high-level camera control. A set of experiments in a surveillance scenario shows the effectiveness of our approach and its potential for real applications of cognitive vision.
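The idea of SQL tables as virtual communication channels between the high-level reasoner and the PTZ cameras can be sketched minimally. The schema, table name, and column names below are assumptions for illustration only; the abstract does not specify them, and a deployed system would use separate database connections per process:

```python
import sqlite3

# One shared database acts as the virtual channel between the
# inference module and each PTZ camera controller.
db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE ptz_commands (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    camera TEXT, pan REAL, tilt REAL, zoom REAL,
    handled INTEGER DEFAULT 0)""")

# Inference side: the high-level reasoner posts a command.
db.execute("INSERT INTO ptz_commands (camera, pan, tilt, zoom) VALUES (?,?,?,?)",
           ("cam1", 12.5, -3.0, 2.0))
db.commit()

# Camera side: the controller polls for the oldest unhandled command.
row = db.execute("""SELECT id, pan, tilt, zoom FROM ptz_commands
                    WHERE camera=? AND handled=0 ORDER BY id LIMIT 1""",
                 ("cam1",)).fetchone()
if row:
    cmd_id, pan, tilt, zoom = row
    # ... drive the pan/tilt/zoom actuators here ...
    db.execute("UPDATE ptz_commands SET handled=1 WHERE id=?", (cmd_id,))
    db.commit()
print(row)  # (1, 12.5, -3.0, 2.0)
```

The table doubles as a persistent log of issued commands, which is one practical reason to prefer a database over an in-memory queue in a distributed camera system.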
A multimodal smartphone interface for active perception by visually impaired
The diffuse availability of mobile devices, such as smartphones and tablets, has the potential to bring substantial benefits to people with sensory impairments. The solution proposed in this paper is part of an ongoing effort to create an accurate obstacle and hazard detector for the visually impaired, which is embedded in a hand-held device. In particular, it presents a proof of concept for a multimodal interface to control the orientation of a smartphone's camera, while being held by a person, using a combination of vocal messages, 3D sounds and vibrations. The solution, which is to be evaluated experimentally by users, will enable further research in the area of active vision with a human in the loop, with potential application to mobile assistive devices for indoor navigation by visually impaired people.
Letter processing and font information during reading: beyond distinctiveness, where vision meets design
Letter identification is a critical front end of the reading process. In general, conceptualizations of the identification process have emphasized arbitrary sets of distinctive features. However, a richer view of letter processing incorporates principles from the field of type design, including an emphasis on uniformities across letters within a font. The importance of uniformities is supported by a small body of research indicating that consistency of font increases letter identification efficiency. We review design concepts and the relevant literature, with the goal of stimulating further thinking about letter processing during reading.
Coding of visual object features and feature conjunctions in the human brain
Peer reviewed. Publisher PDF.
Who am I talking with? A face memory for social robots
In order to provide personalized services and to develop human-like interaction capabilities, robots need to recognize their human partner. Face recognition has been studied exhaustively in the past decade in the context of security systems, with significant progress on huge datasets. However, these capabilities are not the focus when it comes to social interaction situations. Humans are able to remember people seen for a short moment in time and to apply this knowledge directly in their engagement in conversation. In order to equip a robot with capabilities to recall human interlocutors and to provide user-aware services, we adopt human-human interaction schemes to propose a face memory on the basis of active appearance models integrated with the active memory architecture. This paper presents the concept of the interactive face memory, the applied recognition algorithms, and their embedding into the robot's system architecture. Performance measures are discussed for general face databases as well as scenario-specific datasets.
The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and smart-glasses has increased the interest of computer vision scientists in the First Person perspective. Nowadays, this field is attracting the attention and investment of companies aiming to develop commercial devices with First Person Vision recording capabilities. Due to this interest, an increasing demand for methods to process these videos, possibly in real time, is expected. Current approaches present particular combinations of different image features and quantitative methods to accomplish specific objectives like object detection, activity recognition, user-machine interaction and so on. This paper summarizes the evolution of the state of the art in First Person Vision video analysis between 1997 and 2014, highlighting, among others, the most commonly used features, methods, challenges and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Being-in-the-world-with: Presence Meets Social And Cognitive Neuroscience
In this chapter we will discuss the concepts of “presence” (Inner Presence) and “social presence” (Co-presence) within a cognitive and ecological perspective. Specifically, we claim that the concepts of “presence” and “social presence” are the possible links between self, action, communication and culture. In the first section we will provide a capsule view of Heidegger’s work by examining the two main features of the Heideggerian concept of “being”: spatiality and “being with”. We argue that different visions from social and cognitive sciences – Situated Cognition, Embodied Cognition, Enactive Approach, Situated Simulation, Covert Imitation – and discoveries from neuroscience – Mirror and Canonical Neurons – have many contact points with this view. In particular, these data suggest that our conceptual system dynamically produces contextualized representations (simulations) that support grounded action in different situations. This is allowed by a common coding – the motor code – shared by perception, action and concepts. This common coding also allows the subject to natively recognize actions performed by other selves within the phenomenological contents. In this picture we argue that the role of presence and social presence is to allow the process of self-identification through the separation between “self” and “other,” and between “internal” and “external”. Finally, implications of this position for communication and media studies are discussed by way of conclusion.
Drawing cartoon faces - a functional imaging study of the cognitive neuroscience of drawing
We report a functional imaging study of drawing cartoon faces. Normal, untrained participants were scanned while viewing simple black and white cartoon line-drawings of human faces, retaining them for a short memory interval, and then drawing them without vision of their hand or the paper. Specific encoding and retention of information about the faces was tested for by contrasting these two stages (with display of cartoon faces) against the exploration and retention of random dot stimuli. Drawing was contrasted between conditions in which only memory of a previously viewed face was available versus a condition in which both memory and simultaneous viewing of the cartoon was possible, and versus drawing of a new, previously unseen, face. We show that the encoding of cartoon faces powerfully activates the face sensitive areas of the lateral occipital cortex and the fusiform gyrus, but there is no significant activation in these areas during the retention interval. Activity in both areas was also high when drawing the displayed cartoons. Drawing from memory activates areas in posterior parietal cortex and frontal areas.
This activity is consistent with the encoding and retention of the spatial information about the face to be drawn as a visuo-motor action plan, either as a series of targets for ocular fixation or as spatial targets for the drawing action.
Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, thus remaining consistent with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting. However, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.
Comment: 10 pages, IEEE Transactions on Cognitive and Developmental Systems
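The core self-supervision idea in this abstract — letting the auditory modality pseudo-label video frames so that a purely visual detector can be trained without external annotations — can be illustrated with a toy numeric stand-in. Everything below (scalar "visual" and "audio" features, the threshold, the centroid classifier) is a synthetic assumption for illustration; the paper's actual pipeline operates on real face images with learned models:

```python
import random
random.seed(0)

# Toy stand-ins: each frame carries a scalar "mouth motion" visual
# feature and an audio energy reading, both higher when speaking.
def make_frame(speaking):
    visual = random.gauss(1.0 if speaking else 0.0, 0.3)
    audio  = random.gauss(1.0 if speaking else 0.0, 0.3)
    return visual, audio

frames = [make_frame(i % 2 == 0) for i in range(200)]  # alternate speak/silent

# Self-supervision step: the auditory modality labels the visual data
# (some pseudo-labels will be noisy, as with a real acoustic detector).
pseudo = [(v, a > 0.5) for v, a in frames]

# Fit a minimal visual classifier (midpoint of class centroids).
on  = [v for v, lab in pseudo if lab]
off = [v for v, lab in pseudo if not lab]
threshold = (sum(on) / len(on) + sum(off) / len(off)) / 2

# The visual model now detects the active speaker with no audio input.
acc = sum((v > threshold) == (i % 2 == 0)
          for i, (v, _) in enumerate(frames)) / len(frames)
print(acc > 0.8)
```

The point of the sketch is the data flow, not the classifier: labels come from one modality, the model is fit and evaluated entirely in the other, which is what makes the visual detector useful when the acoustic channel is too noisy to trust.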
Active Estimation of Distance in a Robotic Vision System that Replicates Human Eye Movement
Many visual cues, both binocular and monocular, provide 3D information. When an agent moves with respect to a scene, an important cue is the different motion of objects located at various distances. While motion parallax is evident for large translations of the agent, in most head/eye systems a small parallax also occurs during rotations of the cameras. A similar parallax is also present in the human eye. During a relocation of gaze, the shift in the retinal projection of an object depends not only on the amplitude of the movement, but also on the distance of the object with respect to the observer. This study proposes a method for estimating distance on the basis of the parallax that emerges from rotations of a camera. A pan/tilt system specifically designed to reproduce the oculomotor parallax present in the human eye was used to replicate the oculomotor strategy by which humans scan visual scenes. We show that the oculomotor parallax provides accurate estimation of distance during sequences of eye movements. In a system that actively scans a visual scene, challenging tasks such as image segmentation and figure/ground segregation greatly benefit from this cue.
National Science Foundation (BIC-0432104, CCF-0130851)
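The geometry behind this abstract — a rotation about a pivot offset from the camera's nodal point translates the nodal point, and that small translation produces a depth-dependent image shift — can be sketched with a 2D pinhole model. The model, the closed-form inversion, and the numbers (pivot offset exaggerated relative to the human eye) are illustrative assumptions, not the paper's actual calibration or algorithm:

```python
import math

def project(P, C, theta, f=1.0):
    # Image coordinate of world point P=(X, Z) seen by a pinhole camera
    # with nodal point C=(cx, cz), rotated by theta about the vertical axis.
    dx, dz = P[0] - C[0], P[1] - C[1]
    a, b = math.cos(theta), math.sin(theta)
    return f * (a * dx - b * dz) / (b * dx + a * dz)

def estimate_depth(u1, u2, theta, r, f=1.0):
    # Closed-form inversion of the two projection equations: u1 is the
    # pre-rotation image position, u2 the position after rotating by
    # theta about a pivot a distance r behind the nodal point.
    a, b = math.cos(theta), math.sin(theta)
    tx, tz = r * b, r * (a - 1.0)          # translation of the nodal point
    v, w = u2 / f, u1 / f
    num = tx * (v * b - a) + tz * (b + v * a)
    den = v * b * w + v * a - a * w + b
    return num / den

# Simulate: point 3 m ahead, 0.1 rad gaze shift, pivot 10 cm behind
# the nodal point (exaggerated for illustration).
theta, r, Z_true = 0.1, 0.1, 3.0
P = (0.2, Z_true)
C2 = (r * math.sin(theta), r * (math.cos(theta) - 1.0))  # nodal point after rotation
u1 = project(P, (0.0, 0.0), 0.0)
u2 = project(P, C2, theta)
Z_est = estimate_depth(u1, u2, theta, r)
print(round(Z_est, 6))  # recovers the true depth, 3.0
```

Note that if r were zero the rotation would produce a purely depth-independent shift and `den` would carry no distance information; it is precisely the offset between rotation center and nodal point that makes depth recoverable, which is the oculomotor parallax the paper exploits.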