
    Fixation prediction with a combined model of bottom-up saliency and vanishing point

    By predicting where humans look in natural scenes, we can understand how they perceive complex natural scenes and prioritize information for further high-level visual processing. Several models have been proposed for this purpose, yet there is a gap between the best existing saliency models and human performance. While many researchers have developed purely computational models for fixation prediction, fewer attempts have been made to discover the cognitive factors that guide gaze. Here, we study the effect of a particular type of scene structural information, known as the vanishing point, and show that human gaze is attracted to vanishing point regions. We record eye movements of 10 observers over 532 images, of which 319 contain vanishing points. We then construct a combined model of traditional saliency and a vanishing point channel and show that our model outperforms state-of-the-art saliency models under three scores on our dataset. (Comment: arXiv admin note: text overlap with arXiv:1512.0172)
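The combination described above can be sketched as a weighted blend of a bottom-up saliency map with a vanishing point channel. This is a minimal illustration, not the authors' implementation: the function names, the Gaussian form of the VP channel, and the convex-combination weighting are assumptions.

```python
import numpy as np

def vanishing_point_channel(shape, vp, sigma=20.0):
    """A 2D Gaussian bump centred on the vanishing point (row, col)."""
    rows, cols = np.indices(shape)
    d2 = (rows - vp[0]) ** 2 + (cols - vp[1]) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

def combined_saliency(bottom_up, vp, weight=0.5, sigma=20.0):
    """Convex combination of a bottom-up saliency map and a VP channel,
    normalised so the peak of the combined map is 1."""
    vp_map = vanishing_point_channel(bottom_up.shape, vp, sigma)
    combined = (1.0 - weight) * bottom_up + weight * vp_map
    return combined / combined.max()
```

A fixation-prediction pipeline would then score this combined map against recorded fixations (e.g. with AUC, NSS, or CC), which is how a VP channel's contribution can be quantified.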

    The development and neural basis of referential gaze perception

    Infants are sensitive to the referential information conveyed by others’ eye gaze, which could be one of the developmental foundations of theory of mind. To investigate the neural correlates of gaze–object relations, we recorded ERPs from adults and 9-month-old infants while they watched scenes containing gaze shifts either towards or away from the location of a preceding object. In adults, object-incongruent gaze shifts elicited enhanced ERP amplitudes over the occipito-temporal area (N330). In infants, a similar posterior ERP component (N290) was greater for object-incongruent gaze shifts, which suggests that, by the age of 9 months, infants encode the referential information of gaze in a similar way to adults. In addition, in infants we observed an early frontal ERP component (anterior N200), which showed higher amplitude in response to the perception of object-congruent gaze shifts. This component may reflect fast-track processing of socially relevant information, such as the detection of communicative or informative situations, and could form a developmental foundation for attention sharing, social learning and theory of mind.

    Looking Beyond a Clever Narrative: Visual Context and Attention are Primary Drivers of Affect in Video Advertisements

    Emotion evoked by an advertisement plays a key role in influencing brand recall and eventual consumer choices, so automatic ad affect recognition has several useful applications. However, content-based feature representations give no insight into how affect is modulated by aspects such as the ad scene setting, salient object attributes and their interactions. Nor do such approaches tell us how humans prioritize visual information for ad understanding. Our work addresses these lacunae by decomposing video content into detected objects, coarse scene structure, object statistics and actively attended objects identified via eye gaze. We measure the importance of each of these information channels by systematically incorporating the related information into ad affect prediction models. Contrary to the popular notion that ad affect hinges on the narrative and the clever use of linguistic and social cues, we find that actively attended objects and the coarse scene structure encode affective information better than individual scene objects or conspicuous background elements. (Comment: Accepted for publication in the Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, US)
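The abstract's "systematically incorporating" of information channels can be operationalised in more than one way; one generic option is greedy forward selection, which adds the channel that most improves a held-out score and records each channel's marginal gain. This sketch is an assumption about methodology, not the authors' protocol; `channel_importance` and `score_fn` are hypothetical names.

```python
def channel_importance(channels, score_fn):
    """Greedy forward selection over feature channels.

    score_fn maps a list of channel names to a scalar score (e.g. a
    cross-validated affect-prediction accuracy). Returns the selection
    order and each channel's marginal gain at the step it was added.
    """
    selected, gains = [], {}
    current = score_fn(selected)
    remaining = set(channels)
    while remaining:
        best, best_score = None, -float("inf")
        for ch in remaining:
            s = score_fn(selected + [ch])
            if s > best_score:
                best, best_score = ch, s
        gains[best] = best_score - current
        current = best_score
        selected.append(best)
        remaining.remove(best)
    return selected, gains
```

With a real scorer, a large gain for the "attended objects" channel relative to "individual objects" would reflect the paper's headline finding.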

    The detection of communicative signals directed at the self in infant prefrontal cortex

    A precondition for successful communication between people is the detection of signals indicating the intention to communicate, such as eye contact or calling a person's name. In adults, establishing communication by eye contact or calling a person's name results in overlapping activity in right prefrontal cortex, suggesting that, regardless of modality, the intention to communicate is detected by the same brain region. We measured prefrontal cortex responses in 5-month-olds using near-infrared spectroscopy (NIRS) to examine the neural basis of detecting communicative signals across modalities in early development. Infants watched human faces that either signaled eye contact or directed their gaze away from the infant, and they also listened to voices that addressed them with their own name or another name. The results revealed that infants recruit adjacent but non-overlapping regions in the left dorsal prefrontal cortex when they process eye contact and their own name. Moreover, a correlation analysis revealed that infants who responded sensitively to eye contact in one prefrontal region were also more likely to respond sensitively to their own name in the adjacent prefrontal region, suggesting that responding to communicative signals in these two regions might be functionally related. These NIRS results suggest that infants selectively process and attend to communicative signals directed at them. However, unlike adults, infants do not seem to recruit a common prefrontal region when processing communicative signals of different modalities. The implications of these findings for our understanding of infants’ developing communicative abilities are discussed.

    A review of the empirical studies of computer supported human-to-human communication

    This paper presents a review of the empirical studies of human-to-human communication which have been carried out over the last three decades. Although the review is primarily concerned with empirical studies of computer supported human-to-human communication, it also discusses a number of studies of group work in non-computer-based collaborative environments, which form the basis of many recent empirical studies in the area of CSCW. The concept of person and task spaces is introduced and then used to categorise the large volume of studies covered by the review. The paper also gives a comparative analysis of the findings of these studies, and draws a number of general conclusions to guide the design and evaluation of future CSCW systems.

    Mask-guided Style Transfer Network for Purifying Real Images

    Recent progress in learning-by-synthesis has made it possible to train models on synthetic images, which effectively reduces the cost of human and material resources. However, because the distribution of synthetic images differs from that of real images, the desired performance cannot be achieved. To address this problem, previous methods learned a model that improves the realism of synthetic images. In contrast, this paper tries to purify real images by extracting discriminative and robust features that convert outdoor real images into indoor synthetic images. We first introduce segmentation masks to construct RGB-mask pairs as inputs, then design a mask-guided style transfer network that learns style features separately from the attention and background regions, and learns content features from the full and attention regions. Moreover, we propose a novel region-level task-guided loss to constrain the features learned for style and content. Experiments were performed using mixed (qualitative and quantitative) methods to demonstrate the feasibility of purifying real images. We evaluate the proposed method on several public datasets, including LPW, COCO and MPIIGaze. Experimental results show that the proposed method is effective and achieves state-of-the-art results. (Comment: arXiv admin note: substantial text overlap with arXiv:1903.0582)
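The input construction described above (RGB-mask pairs, plus separate attention and background regions for the style encoders) can be sketched with plain array operations. This is a minimal sketch of the data preparation only, not the network itself; the function names are assumptions, and the real model would feed these arrays to learned encoders.

```python
import numpy as np

def make_rgb_mask_pair(rgb, mask):
    """Stack an RGB image (H, W, 3) with a binary segmentation mask (H, W)
    into a single 4-channel network input."""
    assert rgb.shape[:2] == mask.shape
    return np.concatenate([rgb, mask[..., None].astype(rgb.dtype)], axis=-1)

def split_regions(rgb, mask):
    """Zero out pixels outside each region: the attention region (mask == 1)
    and the background region (mask == 0), so style features can be learned
    from each region separately."""
    attn = rgb * mask[..., None]
    bkgd = rgb * (1 - mask[..., None])
    return attn, bkgd
```

The region-level task-guided loss would then be computed per region, comparing features extracted from `attn` and `bkgd` rather than from the full frame.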