21,979 research outputs found

    Object Referring in Videos with Language and Human Gaze

    We investigate the problem of object referring (OR), i.e., localizing a target object in a visual scene given a language description. Humans perceive the world more as continuous video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works for OR mostly focus on static images only, which fall short in providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with their descriptions and gaze. We further propose a novel network model for OR in videos, integrating appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context, and outperforms previous OR methods. For the dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html. Comment: Accepted to CVPR 2018, 10 pages, 6 figures
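    The abstract does not describe the network in detail; purely as an illustration of the kind of multi-cue fusion it mentions, the sketch below scores object proposals against a referring expression by projecting appearance, motion, gaze, and spatio-temporal context features into a shared embedding space. All module names, dimensions, and the late-fusion design are assumptions, not the authors' architecture.

```python
# Hypothetical sketch of multi-cue fusion for object referring (not the paper's model).
# Each object proposal carries appearance, motion, gaze, and spatio-temporal context
# features; a small GRU encodes the referring expression; proposals are scored by the
# similarity between the fused visual embedding and the sentence embedding.
import torch
import torch.nn as nn


class MultiCueReferring(nn.Module):
    def __init__(self, dims, vocab_size, embed_dim=128):
        super().__init__()
        # One projection head per cue (dims maps cue name -> input feature size).
        self.heads = nn.ModuleDict(
            {name: nn.Linear(d, embed_dim) for name, d in dims.items()}
        )
        self.fuse = nn.Linear(embed_dim * len(dims), embed_dim)
        self.word_emb = nn.Embedding(vocab_size, embed_dim)
        self.lang_enc = nn.GRU(embed_dim, embed_dim, batch_first=True)

    def forward(self, cues, words):
        # cues: dict of cue name -> (num_proposals, dim) tensors
        # words: (1, seq_len) word indices of the referring expression
        proj = [torch.relu(self.heads[name](feat)) for name, feat in cues.items()]
        visual = self.fuse(torch.cat(proj, dim=-1))           # (P, embed_dim)
        _, h = self.lang_enc(self.word_emb(words))            # h: (1, 1, embed_dim)
        lang = h.squeeze(0)                                    # (1, embed_dim)
        scores = torch.matmul(visual, lang.t()).squeeze(-1)    # (P,) matching scores
        return scores                                          # argmax = referred object


# Toy usage with random features for 5 proposals.
dims = {"appearance": 2048, "motion": 512, "gaze": 64, "context": 256}
model = MultiCueReferring(dims, vocab_size=1000)
cues = {name: torch.randn(5, d) for name, d in dims.items()}
words = torch.randint(0, 1000, (1, 7))
print(model(cues, words).argmax().item())
```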

    Detecting Emotional Involvement in Professional News Reporters: An Analysis of Speech and Gestures

    This study investigates the extent to which reporters' voice and body behaviour may betray different degrees of emotional involvement when reporting on emergency situations. The hypothesis is that emotional involvement is associated with an increase in body movements and in pitch and intensity variation. The object of investigation is a corpus of 21 ten-second videos of Italian news reports on flooding taken from Italian nation-wide TV channels. The gestures and body movements of the reporters were first inspected visually. Then, measures of the reporters' pitch and intensity variation were calculated and related to the reporters' gestures. The effects of the variability in the reporters' voice and gestures were tested with an evaluation test. The results show that the reporters vary greatly in the extent to which they move their hands and body in their reports. Two gestures seem to characterise reporters' communication of emergencies: beats and deictics. The reporters' use of gestures partially parallels their variations in pitch and intensity. The evaluation study shows that increased gesturing is associated with greater emotional involvement and less professionalism. The data were used to create an ontology of gestures for the communication of emergencies.
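    The abstract does not name the tooling used for the acoustic measurements; as an assumption only, the sketch below estimates per-clip pitch and intensity variation with librosa, using probabilistic YIN for the F0 track and frame-wise RMS energy as a rough intensity proxy. The file names are placeholders.

```python
# Hypothetical sketch of the acoustic measures mentioned above: per-clip pitch (F0)
# variability and intensity variability, estimated with librosa.
import numpy as np
import librosa


def voice_variation(path):
    y, sr = librosa.load(path, sr=None)
    # Fundamental frequency track via probabilistic YIN; unvoiced frames return NaN.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    pitch_sd = np.nanstd(f0)                       # pitch variability in Hz
    # Frame-wise RMS energy converted to dB as an intensity proxy.
    rms = librosa.feature.rms(y=y)[0]
    intensity_sd = np.std(librosa.amplitude_to_db(rms, ref=np.max))
    return pitch_sd, intensity_sd


# Example: compare variability across report clips (paths are placeholders).
for clip in ["report_01.wav", "report_02.wav"]:
    p_sd, i_sd = voice_variation(clip)
    print(f"{clip}: pitch SD = {p_sd:.1f} Hz, intensity SD = {i_sd:.1f} dB")
```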

    Can you see what I am talking about? Human speech triggers referential expectation in four-month-old infants

    Infants' tendency to selectively attend to human speech and to process it in a unique way has been widely reported in the past. However, in order to successfully acquire language, one should also understand that speech is referential and that words can stand for other entities in the world. While there has been some evidence showing that young infants can make inferences about the communicative intentions of a speaker, whether they also appreciate the direct relationship between a specific word and its referent is still unknown. In the present study we tested four-month-old infants to see whether they would expect to find a referent when they hear human speech. Our results showed that, compared to other auditory stimuli or to silence, infants listening to speech were more prepared to find visual referents of the words, as signalled by their faster orienting towards the visual objects. Hence, our study is the first to report evidence that infants at a very young age already understand the referential relationship between auditory words and physical objects, thus showing a precursor of an appreciation of the symbolic nature of language, even if they do not yet understand the meanings of words.

    Markers of Discourse Structure in Child-Directed Speech

    Although the language we encounter is typically embedded in rich discourse contexts, existing models of sentence processing focus largely on phenomena that occur sentence-internally. Here we analyze a video corpus of child-caregiver interactions with the aim of characterizing how discourse structure is reflected in child-directed speech and in children's and caregivers' behavior. We use topic continuity as a measure of discourse structure, examining how caregivers introduce and discuss objects across sentences. We develop a variant of a Hidden Markov Model to identify coherent discourses, taking into account speakers' intended referents and the time delays between utterances. Using the discourses found by this model, we analyze how the lexical, syntactic, and social properties of caregiver-child interaction change over the course of a sequence of topically related utterances. Our findings suggest that cues used to signal topicality in adult discourse are also available in child-directed speech, and that children's responses reflect joint attention in communication.
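    The paper's HMM variant is not spelled out in the abstract; the sketch below is a deliberately simplified stand-in for the same idea, grouping utterances into topical segments from the two cues the abstract mentions: the annotated intended referent and the time delay between utterances.

```python
# Simplified stand-in for the discourse segmentation described above (not the paper's
# actual HMM variant): start a new topical segment whenever the intended referent
# changes or the gap between consecutive utterances exceeds a time threshold.
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    text: str
    referent: str      # annotated intended referent, e.g. "ball"
    onset: float       # utterance start time in seconds


def segment_discourse(utterances: List[Utterance], max_gap: float = 5.0):
    segments, current = [], [utterances[0]]
    for prev, utt in zip(utterances, utterances[1:]):
        same_topic = (utt.referent == prev.referent) and (utt.onset - prev.onset <= max_gap)
        if same_topic:
            current.append(utt)
        else:
            segments.append(current)
            current = [utt]
    segments.append(current)
    return segments


# Toy interaction: two topically coherent stretches, about "ball" and then "cup".
utts = [
    Utterance("look at the ball", "ball", 0.0),
    Utterance("the ball is red", "ball", 2.5),
    Utterance("want the cup?", "cup", 12.0),
    Utterance("here is your cup", "cup", 14.0),
]
for seg in segment_discourse(utts):
    print([u.text for u in seg])
```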

    Learning to Localize and Align Fine-Grained Actions to Sparse Instructions

    Automatic generation of textual video descriptions that are time-aligned with video content is a long-standing goal in computer vision. The task is challenging due to the difficulty of bridging the semantic gap between the visual and natural language domains. This paper addresses the task of automatically generating an alignment between a set of instructions and a first-person video demonstrating an activity. The sparse descriptions and ambiguity of written instructions create significant alignment challenges. The key to our approach is the use of egocentric cues to generate a concise set of action proposals, which are then matched to recipe steps using object recognition and computational linguistic techniques. We obtain promising results on both the Extended GTEA Gaze+ dataset and the Bristol Egocentric Object Interactions Dataset.
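    The matching procedure is only sketched in the abstract; as a rough, hypothetical illustration, the snippet below aligns action proposals to instruction steps with a monotonic dynamic program whose score is the word overlap between each step and the objects recognized in each proposal. The scoring function and the toy data are invented for the example and are not the authors' pipeline.

```python
# Rough illustration of the matching idea above: each action proposal carries object
# labels from a recognizer, each instruction step is a bag of words, and a monotonic
# dynamic program assigns proposals to steps in temporal order so that the total
# word/object overlap is maximized.
def overlap(step_words, proposal_objects):
    return len(set(step_words) & set(proposal_objects))


def align(steps, proposals):
    # dp[i][j]: best score assigning proposals[:i] with proposal i-1 placed on step j.
    n, m = len(proposals), len(steps)
    dp = [[float("-inf")] * m for _ in range(n + 1)]
    back = [[0] * m for _ in range(n + 1)]
    dp[0] = [0.0] * m                      # before any assignment, any start step is free
    for i in range(1, n + 1):
        for j in range(m):
            # Monotonicity: the previous proposal may only sit on an earlier or equal step.
            best_prev = max(range(j + 1), key=lambda k: dp[i - 1][k])
            dp[i][j] = dp[i - 1][best_prev] + overlap(steps[j], proposals[i - 1])
            back[i][j] = best_prev
    # Trace back the best step assignment for each proposal.
    j = max(range(m), key=lambda k: dp[n][k])
    assignment = []
    for i in range(n, 0, -1):
        assignment.append(j)
        j = back[i][j]
    return assignment[::-1]


steps = [["crack", "egg", "bowl"], ["whisk", "egg"], ["pour", "pan"]]
proposals = [["egg", "bowl"], ["whisk", "egg"], ["whisk"], ["pan"]]
print(align(steps, proposals))   # -> [0, 1, 1, 2]
```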