
    Visual focus of attention estimation using eye center localization

    Estimating people's visual focus of attention (VFOA) plays a crucial role in practical systems such as human-robot interaction. Extracting the cue of a person's VFOA is challenging because gaze directionality is difficult to recognize. In this paper, we propose an improved integrodifferential approach that represents gaze by efficiently and accurately localizing the eye center in lower-resolution images. The proposed method exploits both the drastic intensity change between the iris and the sclera and the grayscale appearance of the eye center. An optimized number of kernels is convolved with the original eye-region image, and the eye center is located by searching for the maximum ratio derivative of neighboring curve magnitudes in the convolution image. Experimental results confirm that the algorithm outperforms state-of-the-art methods in computational cost, accuracy, and robustness to illumination changes.
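    Below is a minimal Python/NumPy sketch of the ring-based search the abstract outlines: pick the pixel where the mean intensity of neighboring circles jumps most sharply, which is where the dark iris meets the brighter sclera. The function names, radii, and exact ratio criterion are illustrative assumptions, not the authors' formulation, and the convolution with an optimized set of kernels is omitted for brevity.

```python
import numpy as np

def ring_mean(img, cy, cx, r, n=64):
    """Mean intensity along a circle of radius r centered at (cy, cx)."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ys = np.clip((cy + r * np.sin(t)).astype(int), 0, img.shape[0] - 1)
    xs = np.clip((cx + r * np.cos(t)).astype(int), 0, img.shape[1] - 1)
    return img[ys, xs].mean()

def locate_eye_center(img, radii=range(3, 12)):
    """Return the pixel whose neighboring ring intensities change most
    sharply, i.e. where a dark iris meets the brighter sclera."""
    h, w = img.shape
    best, best_score = (h // 2, w // 2), -np.inf
    for cy in range(h // 4, 3 * h // 4):
        for cx in range(w // 4, 3 * w // 4):
            means = np.array([ring_mean(img, cy, cx, r) for r in radii])
            # Ratio-style derivative between adjacent rings; the iris/sclera
            # boundary yields a large jump. Epsilon avoids division by zero.
            score = np.max((means[1:] - means[:-1]) / (means[:-1] + 1e-6))
            if score > best_score:
                best_score, best = score, (cy, cx)
    return best
```

    The exhaustive scan over candidate centers is deliberately naive; it only illustrates the maximum-ratio criterion on a small grayscale eye-region crop.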

    Addressee detection for dialog systems using temporal and spectral dimensions of speaking style

    As dialog systems evolve to handle unconstrained input and to operate in open environments, addressee detection (detecting speech directed to the system versus to other people) becomes an increasingly important challenge. We study a corpus in which speakers talk both to a system and to each other, and we model two dimensions of speaking style that talkers modify when changing addressee: speech rhythm and vocal effort. For each dimension we design features that require no speech recognition output, session normalization, speaker normalization, or dialog context. Detection experiments show that rhythm and effort features are complementary, outperform lexical models based on recognized words, and reduce error rates even when word recognition is error-free. Simulated online-processing experiments show that all features need only the first couple of seconds of speech. Finally, we find that temporal and spectral stylistic models can be trained on outside corpora, such as ATIS and the ICSI meetings, with reasonable generalization to the target task, thus showing promise for domain-independent computer-versus-human addressee detectors.
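    As a hedged illustration of the two stylistic dimensions named here, the sketch below computes a crude rhythm cue (variability of the log-energy envelope) and a crude vocal-effort cue (spectral tilt) from a raw waveform. Neither is taken from the paper's actual feature set; all names, frame sizes, and frequency bands are assumptions.

```python
import numpy as np

def frame_energies(signal, sr, frame_ms=25, hop_ms=10):
    """Short-time energy over 25 ms frames with a 10 ms hop."""
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n = 1 + max(0, (len(signal) - frame) // hop)
    return np.array([np.sum(signal[i * hop:i * hop + frame] ** 2)
                     for i in range(n)])

def rhythm_feature(signal, sr):
    """Variability of the log-energy envelope: a crude speech-rhythm cue."""
    e = np.log(frame_energies(signal, sr) + 1e-10)
    return float(np.std(np.diff(e)))

def effort_feature(signal, sr):
    """Spectral tilt: more vocal effort flattens the spectrum, raising
    high-band energy relative to the low band."""
    spec = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1.0 / sr)
    low = spec[(freqs > 50) & (freqs < 1000)].mean()
    high = spec[(freqs >= 1000) & (freqs < 4000)].mean()
    return float(np.log((high + 1e-10) / (low + 1e-10)))
```

    Both functions operate on a NumPy waveform and need no recognizer output or dialog context, consistent with the design constraint the abstract states.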

    A corpus for studying addressing behaviour in multi-party dialogues

    This paper describes a multi-modal corpus of hand-annotated meeting dialogues that was designed for studying addressing behaviour in face-to-face conversations. The corpus contains annotated dialogue acts, addressees, adjacency pairs, and gaze direction. First, we describe the corpus design, presenting the meeting collection, the annotation scheme, and the annotation tools. Then, we present the analysis of the reproducibility and stability of the annotation scheme.

    Analyzing Group Interactions in Conversations: a Review

    Multiparty face-to-face conversations in professional and social settings represent an emerging research domain for which automatic activity-based analysis is relevant for scientific and practical reasons. The activity patterns emerging from groups engaged in conversations are intrinsically multimodal and thus constitute interesting target problems for multistream and multisensor fusion techniques. In this paper, a summarized review of the literature on automatic analysis of group activities in face-to-face conversational settings is presented. A basic categorization of group activities is proposed based on their typical temporal scale, and existing works are then discussed for various types of activities and trends, including addressing, turn taking, interest, and dominance.

    Saliency-based identification and recognition of pointed-at objects

    When persons interact, non-verbal cues are used to direct the attention of others toward objects of interest. Achieving joint attention this way is an important aspect of natural communication. Most importantly, it allows verbal descriptions to be coupled with the visual appearance of objects when the referred-to object is indicated non-verbally. In this contribution, we present a system that uses bottom-up saliency and pointing gestures to efficiently identify pointed-at objects. Furthermore, the system focuses visual attention by steering a pan-tilt-zoom camera toward the object of interest and thus provides a suitable model view for SIFT-based recognition and learning. We demonstrate the practical applicability of the proposed system through experimental evaluation in different environments with multiple pointers and objects.
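    A minimal sketch of the core idea, under assumptions not taken from the paper: given a bottom-up saliency map and a 2D pointing ray (origin and direction in image coordinates), walk along the ray and return the most salient location near it as the pointed-at object. The function name, step size, and corridor width are all illustrative.

```python
import numpy as np

def pointed_at_peak(saliency, origin, direction, step=2.0, max_steps=500):
    """Walk along the pointing ray and return the most salient (y, x)
    within a small corridor around the ray."""
    d = np.asarray(direction, float)
    d = d / np.linalg.norm(d)
    p = np.asarray(origin, float)
    h, w = saliency.shape
    best, best_sal = None, -np.inf
    for _ in range(max_steps):
        p = p + step * d
        y, x = int(round(p[0])), int(round(p[1]))
        if not (0 <= y < h and 0 <= x < w):
            break  # the ray has left the image
        # Inspect a small window so slightly-off rays still hit the object.
        win = saliency[max(0, y - 3):y + 4, max(0, x - 3):x + 4]
        if win.max() > best_sal:
            best_sal, best = float(win.max()), (y, x)
    return best  # location to steer the pan-tilt-zoom camera toward
```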

    A Review of Verbal and Non-Verbal Human-Robot Interactive Communication

    In this paper, an overview of human-robot interactive communication is presented, covering verbal as well as non-verbal aspects of human-robot interaction. Following a historical introduction and a motivation for fluid human-robot communication, ten desiderata are proposed that provide an organizational axis for both recent and future research on human-robot communication. The ten desiderata are then examined in detail, culminating in a unifying discussion and a forward-looking conclusion.

    Tracking the visual focus of attention for a varying number of wandering people

    In this article, we define and address the problem of finding the visual focus of attention for a varying number of wandering people (VFOA-W): determining where a person is looking when their movement is unconstrained. VFOA-W estimation is a new and important problem with implications for behavior understanding and cognitive science, as well as real-world applications. One such application, presented in this article, monitors the attention that passers-by pay to an outdoor advertisement using a single video camera. In our approach to the VFOA-W problem, we propose a multi-person tracking solution based on a dynamic Bayesian network that simultaneously infers the number of people in a scene, their body locations, their head locations, and their head pose. For efficient inference in the resulting variable-dimensional state space, we propose a reversible-jump Markov chain Monte Carlo (RJMCMC) sampling scheme, as well as a novel global observation model that determines the number of people in the scene and their locations. To determine whether a person is looking at the advertisement, we propose a Gaussian mixture model (GMM) and hidden Markov model (HMM) based VFOA-W model that uses head pose and location information. Our models are evaluated for tracking performance and for the ability to recognize people looking at an outdoor advertisement, with results indicating good performance on sequences where up to three people pass in front of the advertisement.
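    The sketch below illustrates the HMM side of such a VFOA-W model: two hidden states, looking versus not looking, with Gaussian emissions over head pan angle, decoded by Viterbi. All parameters and the single-Gaussian emissions are illustrative simplifications, not the authors' trained GMM/HMM, which also uses location information.

```python
import numpy as np

def gauss_logpdf(x, mu, sigma):
    """Log density of a univariate Gaussian."""
    return -0.5 * (np.log(2.0 * np.pi * sigma ** 2) + ((x - mu) / sigma) ** 2)

def viterbi_vfoa(pan_angles, mu=(0.0, 60.0), sigma=(15.0, 40.0), stay=0.9):
    """Decode looking(0) / not-looking(1) from head pan angles in degrees,
    where 0 degrees means facing the advertisement."""
    pan = np.abs(np.asarray(pan_angles, float))  # fold left/right together
    log_a = np.log(np.array([[stay, 1 - stay], [1 - stay, stay]]))
    T = len(pan)
    delta = np.zeros((T, 2))            # best log-probability per state
    psi = np.zeros((T, 2), dtype=int)   # backpointers
    for s in range(2):
        delta[0, s] = np.log(0.5) + gauss_logpdf(pan[0], mu[s], sigma[s])
    for t in range(1, T):
        for s in range(2):
            cand = delta[t - 1] + log_a[:, s]
            psi[t, s] = int(np.argmax(cand))
            delta[t, s] = cand[psi[t, s]] + gauss_logpdf(pan[t], mu[s], sigma[s])
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]  # 0 wherever the person is judged to be looking
```

    The sticky transition matrix (stay = 0.9) smooths frame-level pose estimates, so brief glances away do not fragment a single looking interval.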