Speech-Gesture Mapping and Engagement Evaluation in Human Robot Interaction
A robot needs contextual awareness, effective speech production, and complementary non-verbal gestures for successful communication in society. In this paper, we present our end-to-end system, which aims to enhance the effectiveness of non-verbal gestures. To achieve this, we identified gestures used prominently in performances by TED speakers, mapped them to their corresponding speech context, and modulated speech based on the attention of the listener. The proposed method uses a Convolutional Pose Machine [4] to detect human gestures. The dominant gestures of TED speakers were used to learn the gesture-to-speech mapping, and their speeches were used to train the model. We also evaluated the engagement of the robot with people by conducting a social survey. The robot monitored the effectiveness of its performance and improvised its speech pattern based on the attention level of the audience, which was estimated from visual feedback from the camera. The effectiveness of the interaction, as well as the decisions made during improvisation, was further evaluated based on head-pose detection and an interaction survey.
Comment: 8 pages, 9 figures, under review in IRC 201
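The attention-driven speech modulation described above can be sketched as follows: head-pose angles from the camera are reduced to an attention score, which selects a coarse speech adjustment. All thresholds, function names, and modulation values here are illustrative assumptions, not the authors' implementation.

```python
def attention_score(head_poses, yaw_limit=30.0, pitch_limit=20.0):
    """Fraction of listeners whose head pose (yaw, pitch in degrees)
    suggests they are facing the robot. Limits are assumed values."""
    if not head_poses:
        return 0.0
    attentive = sum(
        1 for yaw, pitch in head_poses
        if abs(yaw) <= yaw_limit and abs(pitch) <= pitch_limit
    )
    return attentive / len(head_poses)

def modulate_speech(score):
    """Pick a coarse speech adjustment from the attention score.
    The rate/volume factors are hypothetical placeholders."""
    if score < 0.3:
        return {"rate": 0.9, "volume": 1.2}   # losing the audience: slow down, speak up
    if score < 0.7:
        return {"rate": 1.0, "volume": 1.0}   # moderate attention: no change
    return {"rate": 1.1, "volume": 1.0}       # audience engaged: keep pace

# One (yaw, pitch) pair per detected listener; two of three face the robot.
poses = [(5.0, -3.0), (45.0, 0.0), (10.0, 8.0)]
score = attention_score(poses)
print(score, modulate_speech(score))
```

In the paper's closed loop, this score would be recomputed continuously from the camera feed so the robot can self-improvise mid-speech rather than between utterances.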
Study of the Importance of Adequacy to Robot Verbal and Non-Verbal Communication in Human-Robot Interaction
The Robadom project aims at creating a homecare robot that helps and assists people in their daily lives, either by doing tasks for them or by managing their day-to-day organization. A robot can take on this kind of role only if it is accepted by humans. Before considering the robot's appearance, we decided to evaluate the importance of the relation between verbal and non-verbal communication during a human-robot interaction, in order to determine the situations in which the robot is accepted. We carried out two experiments to study this acceptance. The first experiment studied the importance of the robot's non-verbal behavior in relation to its verbal behavior. The second experiment studied the capability of a robot to sustain a correct human-robot interaction.
Comment: the 43rd Symposium on Robotics - ISR 2012, Taipei, Taiwan, Province of China (2012)
'It's a film': medium specificity as textual gesture in Red Road and The Unloved
British cinema has long been intertwined with television. The
buzzwords of the transition to digital media, 'convergence' and
'multi-platform delivery', have particular histories in the British
context which can be grasped only through an understanding of the
cultural, historical and institutional peculiarities of the British film
and television industries. Central to this understanding must be two
comparisons: first, the relative stability of television in the duopoly
period (at its core, the licence-funded BBC) in contrast to the repeated
boom and bust of the many different financial/industrial combinations
which have comprised the film industry; and second, the cultural and
historical connotations of 'film' and 'television'. All readers of this
journal will be familiar – possibly over-familiar – with the notion that
'British cinema is alive and well and living on television'. At the end of
the first decade of the twenty-first century, when 'the end of medium
specificity' is much trumpeted, it might be useful to return to the
historical imbrication of British film and television, to explore both
the possibility that medium specificity may be more nationally specific
than much contemporary theorisation suggests, and to consider some
of the relationships between film and television manifest at a textual
level in two recent films, Red Road (2006) and The Unloved (2009).
Continuous Interaction with a Virtual Human
Attentive Speaking and Active Listening require that a Virtual Human be capable of simultaneous perception/interpretation and production of communicative behavior. A Virtual Human should be able to signal its attitude and attention while it is listening to its interaction partner, and be able to attend to its interaction partner while it is speaking – and modify its communicative behavior on the fly based on what it perceives from its partner. This report presents the results of a four-week summer project that was part of eNTERFACE’10. The project resulted in progress on several aspects of continuous interaction, such as scheduling and interrupting multimodal behavior, automatic classification of listener responses, generation of response-eliciting behavior, and models for appropriate reactions to listener responses. A pilot user study was conducted with ten participants. In addition, the project yielded a number of deliverables that have been released for public access.
Communicating and accentuating the aesthetic and expressive dimension in choral conducting
This article considers the issues involved in effective choral conducting from an aesthetic dimension. Drawing upon research, theory and practice, it provides some insight into the nature of communication and the influence of gesture on vocal outcome, as well as the qualities of leadership concomitant with such musical activity. The article also reports on a research study that investigated the professional development of students and teachers in the area of choral conducting, focusing on their attitudes, skill acquisition and the importance attached to reflection on practice. The findings reveal that consideration of what counts as effective conducting gesture and communication skill can promote better conducting and, consequently, better, more expressive singing. In addition, the positive impact of self and peer reflection on progress (both face-to-face and within a virtual learning environment) was also acknowledged. Suggestions for promoting effective musical leadership in the area of choral conducting are provided, in order to ground theoretical perspectives in practice.
Tied factor analysis for face recognition across large pose differences
Face recognition algorithms perform very unreliably when the pose of the probe face is different from the gallery face: typical feature vectors vary more with pose than with identity. We propose a generative model that creates a one-to-many mapping from an idealized “identity” space to the observed data space. In identity space, the representation for each individual does not vary with pose. We model the measured feature vector as being generated by a pose-contingent linear transformation of the identity variable in the presence of Gaussian noise. We term this model “tied” factor analysis. The choice of linear transformation (factors) depends on the pose, but the loadings are constant (tied) for a given individual. We use the EM algorithm to estimate the linear transformations and the noise parameters from training data.
We propose a probabilistic distance metric that allows a full posterior over possible matches to be established. We introduce a novel feature extraction process and investigate recognition performance using the FERET, XM2VTS, and PIE databases. Recognition performance compares favorably with contemporary approaches.
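The generative model described in this abstract can be sketched briefly: an observed feature vector is a pose-contingent linear transformation of a pose-independent identity vector plus Gaussian noise, with the identity vector tied across poses. The dimensions, variable names, and parameter values below are illustrative assumptions, and the EM fitting of the factors and noise is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

D, K, P = 6, 2, 3                 # feature dim, identity dim, number of poses (assumed sizes)
W = rng.normal(size=(P, D, K))    # one factor matrix per pose (pose-contingent)
mu = rng.normal(size=(P, D))      # one mean offset per pose
sigma = 0.1                       # Gaussian noise std (shared here for simplicity)

def generate(h, pose):
    """Sample an observed feature vector for identity h seen in `pose`:
    x = W[pose] @ h + mu[pose] + noise."""
    noise = rng.normal(scale=sigma, size=D)
    return W[pose] @ h + mu[pose] + noise

h = rng.normal(size=K)            # identity vector: tied, i.e. constant across poses
views = [generate(h, p) for p in range(P)]   # same person observed in three poses
print([v.shape for v in views])
```

Because `h` is shared across all three views, matching reduces to asking whether two observations are plausibly explained by a common identity vector under their respective pose transformations, which is what the paper's probabilistic distance metric evaluates.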