
    Taking the bite out of automated naming of characters in TV video

    We investigate the problem of automatically labelling appearances of characters in TV or film material with their names. This is tremendously challenging due to the huge variation in imaged appearance of each character and the weakness and ambiguity of available annotation. However, we demonstrate that high precision can be achieved by combining multiple sources of information, both visual and textual. The principal novelties that we introduce are: (i) automatic generation of time-stamped character annotation by aligning subtitles and transcripts; (ii) strengthening the supervisory information by identifying when characters are speaking. In addition, we incorporate complementary cues of face matching and clothing matching to propose common annotations for face tracks, and consider choices of classifier which can potentially correct errors made in the automatic extraction of training data from the weak textual annotation. Results are presented on episodes of the TV series "Buffy the Vampire Slayer".
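    The subtitle–transcript alignment idea lends itself to a simple illustration. The sketch below is not the authors' code: the data structures, matching threshold, and helper name are assumptions. It shows how timestamped but unnamed subtitles can inherit speaker names from an untimed transcript via fuzzy text matching, yielding time-stamped character annotation.

```python
# Illustrative sketch only: transferring speaker names from an untimed transcript
# onto timestamped subtitles by fuzzy text matching. Inputs are hypothetical.
from difflib import SequenceMatcher

subtitles = [  # (start_sec, end_sec, text) as parsed from an .srt file
    (12.0, 14.5, "We have to get out of here."),
    (15.0, 17.0, "Not without the book."),
]
transcript = [  # (speaker, line) as parsed from a fan transcript
    ("BUFFY", "We have to get out of here!"),
    ("WILLOW", "Not without the book."),
]

def best_speaker(sub_text, transcript, threshold=0.8):
    """Return the transcript speaker whose line best matches the subtitle text."""
    score, speaker = max(
        (SequenceMatcher(None, sub_text.lower(), line.lower()).ratio(), who)
        for who, line in transcript
    )
    return speaker if score > threshold else None  # threshold is an assumption

# Each subtitle window now carries a (possibly None) character label with timestamps.
timed_labels = [(start, end, best_speaker(text, transcript))
                for start, end, text in subtitles]
print(timed_labels)  # e.g. [(12.0, 14.5, 'BUFFY'), (15.0, 17.0, 'WILLOW')]
```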

    Recurrent Attention Models for Depth-Based Person Identification

    We present an attention-based model that reasons about human body shape and motion dynamics to identify individuals in the absence of RGB information, hence in the dark. Our approach leverages unique 4D spatio-temporal signatures to address the identification problem across days. Formulated as a reinforcement learning task, our model is based on a combination of convolutional and recurrent neural networks with the goal of identifying small, discriminative regions indicative of human identity. We demonstrate that our model produces state-of-the-art results on several published datasets given only depth images. We further study the robustness of our model towards viewpoint, appearance, and volumetric changes. Finally, we share insights gleaned from interpretable 2D, 3D, and 4D visualizations of our model's spatio-temporal attention. Comment: Computer Vision and Pattern Recognition (CVPR) 201
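    As a rough illustration of the architecture described (a convolutional glimpse encoder feeding a recurrent core that attends to discriminative regions of depth images), here is a minimal PyTorch sketch. All layer sizes are assumptions, and the reinforcement-learning training of the attention policy is omitted; this is not the published model.

```python
# Minimal sketch, not the published model: a convolutional "glimpse" encoder feeding
# a recurrent core that predicts where to attend next in a depth-image sequence and
# finally emits an identity prediction. Sizes and layers are assumptions.
import torch
import torch.nn as nn

class GlimpseEncoder(nn.Module):
    def __init__(self, feat_dim=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(32 * 16, feat_dim), nn.ReLU(),
        )

    def forward(self, patch):          # patch: (B, 1, H, W) depth crop
        return self.conv(patch)

class RecurrentAttentionID(nn.Module):
    def __init__(self, n_identities, feat_dim=128, hidden=256):
        super().__init__()
        self.encoder = GlimpseEncoder(feat_dim)
        self.rnn = nn.GRUCell(feat_dim, hidden)
        self.locator = nn.Linear(hidden, 2)         # next (x, y) attention offset
        self.classifier = nn.Linear(hidden, n_identities)

    def forward(self, glimpses):       # glimpses: list of (B, 1, H, W) crops over time
        h = torch.zeros(glimpses[0].size(0), self.rnn.hidden_size)
        for patch in glimpses:
            h = self.rnn(self.encoder(patch), h)
            next_loc = torch.tanh(self.locator(h))  # would steer the next crop in a full model
        return self.classifier(h)      # identity logits after the final glimpse

# Toy usage with random "depth" crops over 4 time steps.
model = RecurrentAttentionID(n_identities=10)
logits = model([torch.randn(2, 1, 32, 32) for _ in range(4)])
print(logits.shape)  # torch.Size([2, 10])
```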

    Object Referring in Videos with Language and Human Gaze

    We investigate the problem of object referring (OR), i.e. localizing a target object in a visual scene given a language description. Humans perceive the world more as continued video snippets than as static images, and describe objects not only by their appearance, but also by their spatio-temporal context and motion features. Humans also gaze at the object when they issue a referring expression. Existing works on OR mostly focus on static images only, which fall short of providing many such cues. This paper addresses OR in videos with language and human gaze. To that end, we present a new video dataset for OR, with 30,000 objects over 5,000 stereo video sequences annotated with their descriptions and gaze. We further propose a novel network model for OR in videos, integrating appearance, motion, gaze, and spatio-temporal context into one network. Experimental results show that our method effectively utilizes motion cues, human gaze, and spatio-temporal context, and outperforms previous OR methods. For the dataset and code, please refer to https://people.ee.ethz.ch/~arunv/ORGaze.html. Comment: Accepted to CVPR 2018, 10 pages, 6 figures
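    The fusion described (appearance, motion, gaze, and spatio-temporal context combined with a language embedding to pick the referred object) could look roughly like the sketch below. Feature dimensions, extractor outputs, and the module name are assumptions; the released code at the project page is the authoritative reference.

```python
# Hedged illustration, not the released model: score candidate object tracks for a
# referring expression by fusing appearance, motion, gaze, and spatio-temporal
# context features with a language embedding. Upstream feature extractors are assumed.
import torch
import torch.nn as nn

class ReferringScorer(nn.Module):
    def __init__(self, app=512, mot=256, gaze=64, ctx=128, lang=300, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(app + mot + gaze + ctx + lang, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),          # matching score per candidate
        )

    def forward(self, app_f, mot_f, gaze_f, ctx_f, lang_f):
        # each *_f: (num_candidates, dim); the language feature is shared by all candidates
        lang_f = lang_f.expand(app_f.size(0), -1)
        fused = torch.cat([app_f, mot_f, gaze_f, ctx_f, lang_f], dim=1)
        return self.fuse(fused).squeeze(1)

# Toy usage: pick the candidate that best matches the description.
scorer = ReferringScorer()
scores = scorer(torch.randn(8, 512), torch.randn(8, 256),
                torch.randn(8, 64), torch.randn(8, 128), torch.randn(1, 300))
print(scores.argmax().item())  # index of the referred object among 8 candidates
```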

    Olfaction scaffolds the developing human from neonate to adolescent and beyond

    The impact of the olfactory sense is regularly apparent across development. The foetus is bathed in amniotic fluid that conveys the mother's chemical ecology. Transnatal olfactory continuity between the odours of amniotic fluid and milk assists in the transition to nursing. At the same time, odours emanating from the mammary areas provoke appetitive responses in newborns. Odours experienced from the mother's diet during breastfeeding, and from practices such as pre-mastication, may assist in the dietary transition at weaning. In parallel, infants are attracted to and recognise their mother's odours; later, children are able to recognise other kin and peers based on their odours. Familiar odours, such as those of the mother, regulate the child's emotions, and scaffold perception and learning through non-olfactory senses. During adolescence, individuals become more sensitive to some bodily odours, while the timing of adolescence itself has been speculated to draw from the chemical ecology of the family unit. Odours learnt early in life and within the family niche continue to influence preferences as mate choice becomes relevant. Olfaction thus appears significant in turning on, sustaining and, in cases where the mother's odour is altered, disturbing adaptive reciprocity between offspring and caregiver during the multiple transitions of development between birth and adolescence.

    Im2Flow: Motion Hallucination from Static Images for Action Recognition

    Existing methods to recognize actions in static images take the images at face value, learning the appearances (objects, scenes, and body poses) that distinguish each action class. However, such models are deprived of the rich dynamic structure and motions that also define human activity. We propose an approach that hallucinates the unobserved future motion implied by a single snapshot to help static-image action recognition. The key idea is to learn a prior over short-term dynamics from thousands of unlabeled videos, infer the anticipated optical flow on novel static images, and then train discriminative models that exploit both streams of information. Our main contributions are twofold. First, we devise an encoder-decoder convolutional neural network and a novel optical flow encoding that can translate a static image into an accurate flow map. Second, we show the power of hallucinated flow for recognition, successfully transferring the learned motion into a standard two-stream network for activity recognition. On seven datasets, we demonstrate the power of the approach: it not only achieves state-of-the-art accuracy for dense optical flow prediction, but also consistently enhances recognition of actions and dynamic scenes. Comment: Published in CVPR 2018, project page: http://vision.cs.utexas.edu/projects/im2flow
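    A minimal sketch of the encoder-decoder idea follows, assuming a small fully convolutional network rather than the authors' actual architecture or flow encoding: it maps one RGB image to a two-channel (u, v) flow map, which a two-stream recognizer could then consume as its motion input.

```python
# Sketch under assumptions, not the authors' network: a small encoder-decoder that
# hallucinates a 2-channel optical-flow map from a single RGB image.
import torch
import torch.nn as nn

class FlowHallucinator(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 2, 4, stride=2, padding=1),  # (u, v) flow channels
        )

    def forward(self, image):            # image: (B, 3, H, W), H and W divisible by 4
        return self.decoder(self.encoder(image))

model = FlowHallucinator()
flow = model(torch.randn(1, 3, 224, 224))
print(flow.shape)                        # torch.Size([1, 2, 224, 224])
# A two-stream recognizer would run an appearance CNN on the image and a motion CNN
# on the predicted flow, then fuse the two sets of logits (e.g. by averaging).
```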

    Facial expression of pain: an evolutionary account.

    This paper proposes that human expression of pain in the presence or absence of caregivers, and the detection of pain by observers, arise from evolved propensities. The function of pain is to demand attention and prioritise escape, recovery, and healing; where others can help achieve these goals, effective communication of pain is required. Evidence is reviewed of a distinct and specific facial expression of pain from infancy to old age, consistent across stimuli, and recognizable as pain by observers. Voluntary control over amplitude is incomplete, and observers can better detect pain that the individual attempts to suppress rather than amplify or simulate. In many clinical and experimental settings, the facial expression of pain is incorporated with verbal and nonverbal vocal activity, posture, and movement in an overall category of pain behaviour. This is assumed by clinicians to be under operant control of social contingencies such as sympathy, caregiving, and practical help; thus, strong facial expression is presumed to constitute an attempt to manipulate these contingencies by amplification of the normal expression. Operant formulations support skepticism about the presence or extent of pain, judgments of malingering, and sometimes the withholding of caregiving and help. To the extent that pain expression is influenced by environmental contingencies, however, "amplification" could equally plausibly constitute the release of suppression according to evolved contingent propensities that guide behaviour. Pain has been largely neglected in the evolutionary literature and the literature on expression of emotion, but an evolutionary account can generate improved assessment of pain and reactions to it.

    Dominance attributions following damage to the ventromedial prefrontal cortex

    Damage to the human ventromedial prefrontal cortex (VM) can result in dramatic and maladaptive changes in social behavior despite preservation of most other cognitive abilities. One important aspect of social cognition is the ability to detect social dominance, the process of inferring another person's relative standing in the social world from particular social signals. To test the role of the VM in making attributions of social dominance, we designed two experiments: one requiring dominance judgments from static pictures of faces, the second requiring dominance judgments from film clips. We tested three demographically matched groups of subjects: subjects with focal lesions in the VM (n=15), brain-damaged comparison subjects with lesions excluding the VM (n=11), and a reference group of normal individuals with no history of neurological disease (n=32). Contrary to our expectation, we found that subjects with VM lesions gave dominance judgments on both tasks that did not differ significantly from those given by the other groups. Despite their grossly normal performance, however, subjects with VM lesions showed more subtle impairments specifically when judging static faces: they were less discriminative in their dominance judgments, and did not appear to make normal use of the gender and age of the faces in forming their judgments. The findings suggest that, in the laboratory tasks we used, damage to the VM does not necessarily impair judgments of social dominance, although it appears to result in alterations in strategy that might translate into behavioral impairments in real life.