Who am I talking with? A face memory for social robots
In order to provide personalized services and to develop human-like
interaction capabilities, robots need to recognize their human partner.
Face recognition has been studied exhaustively over the past decade in the
context of security systems, with significant progress on huge datasets.
However, these capabilities are not in focus when it comes to social
interaction situations. Humans are able to remember people seen for a
short moment in time and apply this knowledge directly in their
engagement in conversation. In order to equip a robot with capabilities
to recall human interlocutors and to provide user-aware services, we
adopt human-human interaction schemes to propose a face memory on the
basis of active appearance models integrated with the active memory
architecture. This paper presents the concept of the interactive face
memory, the applied recognition algorithms, and their embedding into the
robot’s system architecture. Performance measures are discussed for
general face databases as well as scenario-specific datasets.
Multiscale Discriminant Saliency for Visual Attention
Bottom-up saliency, an early stage of human visual attention, can be
considered as a binary classification problem between center and surround
classes. The discriminant power of features for the classification is measured
as the mutual information between the features and the distribution of the two
classes. The estimated discrepancy between the two feature classes depends
strongly on the scale levels considered; multi-scale structure and discriminant
power are therefore integrated by employing discrete wavelet features and a
Hidden Markov Tree (HMT). With wavelet coefficients and Hidden Markov Tree
parameters, quad-tree-like label structures are constructed and used in maximum
a posteriori (MAP) estimation of hidden class variables at the corresponding
dyadic sub-squares. Then, a saliency value for each dyadic square at each scale
level is computed with the discriminant power principle and the MAP. Finally,
the final saliency map is integrated across multiple scales by an information
maximization rule. Both standard quantitative tools such as NSS, LCC, and AUC
and qualitative assessments are used to evaluate the proposed multiscale
discriminant saliency method (MDIS) against the well-known information-based
saliency method AIM on its Bruce Database with eye-tracking data. Simulation
results are presented and analyzed to verify the validity of MDIS and to point
out its disadvantages as directions for further research.
Comment: 16 pages, ICCSA 2013 - BIOCA session
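As a rough illustration of the discriminant-power measure mentioned in this abstract, the sketch below estimates the mutual information between a quantized feature and center/surround labels from sample counts. The function name `mutual_information`, the quantization into integer bins, and the toy data are assumptions made for illustration only; the wavelet features and HMT modelling of the paper are not reproduced.

```python
import math
from collections import Counter


def mutual_information(features, labels):
    """Mutual information in bits between discrete feature values and labels.

    Hypothetical helper: both inputs are equal-length sequences of hashable
    values, e.g. quantized feature bins and 'center'/'surround' labels.
    """
    if len(features) != len(labels) or not features:
        raise ValueError("features and labels must be non-empty and equal length")
    n = len(features)
    joint = Counter(zip(features, labels))   # counts of (feature, label) pairs
    feat_counts = Counter(features)          # marginal counts of feature bins
    label_counts = Counter(labels)           # marginal counts of class labels
    mi = 0.0
    for (f, c), count in joint.items():
        p_joint = count / n
        # p(f, c) / (p(f) * p(c)) written with raw counts
        ratio = count * n / (feat_counts[f] * label_counts[c])
        mi += p_joint * math.log2(ratio)
    return mi


# Toy data (assumed): a feature that separates the classes perfectly,
# and one that carries no information about the class.
labels = ["center", "center", "surround", "surround"]
informative = [0, 0, 1, 1]
uninformative = [0, 1, 0, 1]
```

In this toy setting a feature bin that fully determines the class carries one bit of information about a balanced binary label, while a feature distributed identically in both classes carries none.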
Multi-Modal Human-Machine Communication for Instructing Robot Grasping Tasks
A major challenge for the realization of intelligent robots is to supply them
with cognitive abilities in order to allow ordinary users to program them
easily and intuitively. One way of such programming is teaching work tasks by
interactive demonstration. To make this effective and convenient for the user,
the machine must be capable of establishing a common focus of attention and be
able to use and integrate spoken instructions, visual perceptions, and
non-verbal cues like gestural commands. We report progress in building a
hybrid architecture that combines statistical methods, neural networks, and
finite state machines into an integrated system for instructing grasping tasks
by man-machine interaction. The system combines the GRAVIS-robot for visual
attention and gestural instruction with an intelligent interface for speech
recognition and linguistic interpretation, and a modality fusion module to
allow multi-modal task-oriented man-machine communication with respect to
dextrous robot manipulation of objects.
Comment: 7 pages, 8 figures
Speech rhythms and multiplexed oscillatory sensory coding in the human brain
Cortical oscillations are likely candidates for segmentation and coding of continuous speech. Here, we monitored continuous speech processing with magnetoencephalography (MEG) to unravel the principles of speech segmentation and coding. We demonstrate that speech entrains the phase of low-frequency (delta, theta) and the amplitude of high-frequency (gamma) oscillations in the auditory cortex. Phase entrainment is stronger in the right and amplitude entrainment is stronger in the left auditory cortex. Furthermore, edges in the speech envelope phase-reset auditory cortex oscillations, thereby enhancing their entrainment to speech. This mechanism adapts to the changing physical features of the speech envelope and enables efficient, stimulus-specific speech sampling. Finally, we show that within the auditory cortex, coupling between delta, theta, and gamma oscillations increases following speech edges. Importantly, all couplings (i.e., brain-speech and also within the cortex) attenuate for backward-presented speech, suggesting top-down control. We conclude that segmentation and coding of speech rely on a nested hierarchy of entrained cortical oscillations.
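One common way to quantify phase entrainment between two signals is the phase-locking value, sketched below. This is an illustrative assumption: the abstract does not say which coupling measure the study used, and the function name `phase_locking_value` and the toy phase series are hypothetical. The inputs are assumed to be instantaneous phases in radians that have already been extracted from band-limited signals.

```python
import cmath
import math


def phase_locking_value(phases_a, phases_b):
    """Magnitude of the mean unit phasor of the phase differences (0 to 1)."""
    if len(phases_a) != len(phases_b) or not phases_a:
        raise ValueError("phase series must be non-empty and equal length")
    total = sum(cmath.exp(1j * (a - b)) for a, b in zip(phases_a, phases_b))
    return abs(total) / len(phases_a)


# Toy data (assumed): a constant phase lag gives a value near 1,
# phase differences spread evenly around the circle give a value near 0.
speech_phase = [0.1 * k for k in range(8)]
locked_brain_phase = [p + 0.5 for p in speech_phase]
spread_phase = [0.0, math.pi / 2, math.pi, 3 * math.pi / 2]
reference_phase = [0.0, 0.0, 0.0, 0.0]
```

A value close to 1 means the phase difference stays nearly constant over time, which is the sense of "entrainment" this sketch captures; it says nothing about amplitude coupling.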
Multi-scale Discriminant Saliency with Wavelet-based Hidden Markov Tree Modelling
Bottom-up saliency, an early stage of human visual attention, can be
considered as a binary classification problem between centre and surround
classes. The discriminant power of features for the classification is measured
as the mutual information between the distributions of image features and the
corresponding classes. As the estimated discrepancy depends strongly on the
scale level considered, multi-scale structure and discriminant power are
integrated by employing discrete wavelet features and a Hidden Markov Tree
(HMT). With wavelet coefficients and Hidden Markov Tree parameters,
quad-tree-like label structures are constructed and used in maximum a
posteriori (MAP) estimation of hidden class variables at the corresponding
dyadic sub-squares. Then, a saliency value for each square block at each scale
level is computed with the discriminant power principle. Finally, the final
saliency map is integrated across multiple scales by an information
maximization rule. Both standard quantitative tools such as NSS, LCC, and AUC
and qualitative assessments are used to evaluate the proposed multi-scale
discriminant saliency (MDIS) method against the well-known information-based
approach AIM on its released image collection with eye-tracking data.
Simulation results are presented and analysed to verify the validity of MDIS
and to point out its limitations as directions for further research.
Comment: arXiv admin note: substantial text overlap with arXiv:1301.396
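NSS, one of the quantitative tools named in this abstract, is usually defined as the mean of the standardized saliency map at human fixation locations. The sketch below follows that common definition under stated assumptions: the map is a list of rows, fixations are (row, column) index pairs, and the population standard deviation is used (some implementations use the sample standard deviation). The function name `nss` and the toy map are hypothetical.

```python
import math


def nss(saliency, fixations):
    """Mean z-scored saliency at fixated pixels (normalized scanpath saliency)."""
    values = [v for row in saliency for v in row]
    if not values or not fixations:
        raise ValueError("saliency map and fixation list must be non-empty")
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return 0.0  # a flat map carries no information about fixations
    return sum((saliency[r][c] - mean) / std for r, c in fixations) / len(fixations)


# Toy 2x2 map (assumed) with a single salient pixel in the lower-right corner.
toy_map = [[0.0, 0.0],
           [0.0, 4.0]]
score_on_peak = nss(toy_map, [(1, 1)])   # fixation on the salient pixel
score_off_peak = nss(toy_map, [(0, 0)])  # fixation on a non-salient pixel
```

Positive scores indicate that fixations fall on regions the map rates above its own average, and negative scores indicate the opposite.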
Taking Synchrony Seriously: A Perceptual-Level Model of Infant Synchrony Detection
Synchrony detection between different sensory and/or motor channels appears critically important for young infant learning and cognitive development. For example, empirical studies demonstrate that audio-visual synchrony aids in language acquisition. In this paper we compare these infant studies with a model of synchrony detection based on the Hershey and Movellan (2000) algorithm augmented with methods for quantitative synchrony estimation. Four infant-model comparisons are presented, using audio-visual stimuli of increasing complexity. While infants and the model showed learning or discrimination with each type of stimulus used, the model was most successful with stimuli composed of one audio and one visual source, and also with two audio sources and a dynamic-face visual motion source. More difficult for the model were stimulus conditions with two motion sources, and more abstract visual dynamics (an oscilloscope instead of a face). Future research should model the developmental pathway of synchrony detection. Normal audio-visual synchrony detection in infants may be experience-dependent (e.g., Bergeson et al., 2004).
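A minimal sketch of correlation-based synchrony estimation follows. It assumes that audio and visual signals are summarized as scalar time series and treated as jointly Gaussian, in which case mutual information is a function of the Pearson correlation, I = -0.5 * log2(1 - rho^2). That modelling choice is an assumption made here for illustration and is not stated in the abstract; the function names and toy series are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length series (0.0 if either is constant)."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and equal length")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def gaussian_mutual_information(xs, ys):
    """Mutual information in bits under a joint Gaussian assumption."""
    rho = pearson(xs, ys)
    if rho * rho >= 1.0:
        return math.inf  # perfectly correlated series
    return -0.5 * math.log2(1.0 - rho * rho)


# Toy data (assumed): audio energy and pixel intensity over four frames.
audio = [1.0, 2.0, 3.0, 4.0]
synced_pixel = [1.0, 2.0, 3.0, 5.0]      # rises with the audio
unsynced_pixel = [1.0, -1.0, -1.0, 1.0]  # uncorrelated with the audio
sync_score = gaussian_mutual_information(audio, synced_pixel)
unsync_score = gaussian_mutual_information(audio, unsynced_pixel)
```

Computing such a score per pixel or per image region and comparing scores across regions is one way a model could localize the visual source that moves in time with a sound.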