A computer vision model for visual-object-based attention and eye movements
This is the post-print version of the final paper published in Computer Vision and Image Understanding. The published article is available from the link below. Changes resulting from the publishing process, such as peer review, editing, corrections, structural formatting, and other quality control mechanisms, may not be reflected in this document. Changes may have been made to this work since it was submitted for publication. Copyright © 2008 Elsevier B.V.

This paper presents a new computational framework for modelling visual-object-based attention and attention-driven eye movements within an integrated system, following a biologically inspired approach. Attention operates at multiple levels of visual selection (by space, feature, object, and group), depending on the nature of the targets and the visual task. Attentional shifts and gaze shifts are built on shared processing circuits and control mechanisms, yet kept distinct in their functional roles, and work together to carry out flexible visual selection in complicated visual environments. The framework integrates the important aspects of human visual attention and eye movements, resulting in sophisticated performance in complicated natural scenes. The proposed approach aims to provide a useful visual selection system for computer vision, especially for use in cluttered natural visual environments. Funded by the National Natural Science Foundation of China.
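The attention-shift mechanism the abstract describes (select the most salient location, then suppress it so attention moves on) can be sketched as a winner-take-all loop over a saliency map with inhibition of return. This is a minimal illustrative sketch, not the paper's model; the function name, the inhibition radius, and the saliency-map input are all assumptions.

```python
import numpy as np

def attention_shifts(saliency, n_shifts=3, inhibit_radius=1):
    """Winner-take-all attention shifts with inhibition of return.

    saliency : 2-D array, higher values = more salient locations
    Returns the sequence of attended (row, col) fixation points.
    """
    sal = saliency.astype(float).copy()
    fixations = []
    for _ in range(n_shifts):
        # winner-take-all: pick the currently most salient location
        y, x = np.unravel_index(np.argmax(sal), sal.shape)
        fixations.append((int(y), int(x)))
        # inhibition of return: suppress a neighbourhood of the winner
        y0, y1 = max(0, y - inhibit_radius), y + inhibit_radius + 1
        x0, x1 = max(0, x - inhibit_radius), x + inhibit_radius + 1
        sal[y0:y1, x0:x1] = -np.inf
    return fixations
```

On a map with three peaks, the loop visits them in order of decreasing salience, which is the basic attention-scanpath behaviour the framework builds on.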
Reducing “Structure from Motion”: a general framework for dynamic vision. 2. Implementation and experimental assessment
For pt.1 see ibid., p.933-42 (1998). A number of methods have been proposed in the literature for estimating scene structure and ego-motion from a sequence of images using dynamical models. Despite the fact that all methods may be derived from a "natural" dynamical model within a unified framework, from an engineering perspective there are a number of trade-offs that lead to different strategies depending upon the applications and the goals one is targeting. We want to characterize and compare the properties of each model so that the engineer may choose the one best suited to the specific application. We analyze the properties of filters derived from each dynamical model under a variety of experimental conditions, assessing the accuracy of the estimates, their robustness to measurement noise, sensitivity to initial conditions and visual angle, effects of the bas-relief ambiguity and occlusions, and dependence upon the number of image measurements and their sampling rate.
Two-Stream Action Recognition-Oriented Video Super-Resolution
We study the video super-resolution (SR) problem for facilitating video
analytics tasks, e.g. action recognition, instead of for visual quality. The
popular action recognition methods based on convolutional networks, exemplified
by two-stream networks, are not directly applicable on video of low spatial
resolution. This can be remedied by performing video SR prior to recognition,
which motivates us to improve the SR procedure for recognition accuracy.
Tailored for two-stream action recognition networks, we propose two video SR
methods for the spatial and temporal streams respectively. On the one hand, we
observe that regions with action are more important to recognition, and we
propose an optical-flow guided weighted mean-squared-error loss for our
spatial-oriented SR (SoSR) network to emphasize the reconstruction of moving
objects. On the other hand, we observe that existing video SR methods incur
temporal discontinuity between frames, which also worsens the recognition
accuracy, and we propose a siamese network for our temporal-oriented SR (ToSR)
training that emphasizes the temporal continuity between consecutive frames. We
perform experiments using two state-of-the-art action recognition networks and
two well-known datasets, UCF101 and HMDB51. Results demonstrate the
effectiveness of our proposed SoSR and ToSR in improving recognition accuracy.

Comment: Accepted to ICCV 2019. Code: https://github.com/AlanZhang1995/TwoStreamS
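The flow-guided weighted loss idea behind the spatial-oriented SR network can be sketched in a few lines: pixels with larger optical-flow magnitude get a larger weight in the MSE, so reconstruction errors on moving objects cost more. This is a minimal NumPy sketch under assumed conventions (flow supplied as an H x W x 2 array, weights scaled by an assumed factor alpha); it is not the authors' exact loss.

```python
import numpy as np

def flow_weighted_mse(sr, hr, flow, alpha=1.0):
    """MSE weighted by optical-flow magnitude, so moving regions count more.

    sr, hr : H x W arrays (super-resolved frame and ground truth)
    flow   : H x W x 2 array of per-pixel (dx, dy) optical flow
    alpha  : assumed scaling factor for the flow-derived weights
    """
    mag = np.linalg.norm(flow, axis=-1)               # per-pixel flow magnitude
    weights = 1.0 + alpha * mag / (mag.max() + 1e-8)  # reduces to plain MSE where flow is zero
    return float(np.mean(weights * (sr - hr) ** 2))
```

With zero flow everywhere, the weights are all 1 and the loss is the ordinary MSE; where flow is large, the same pixel error is penalized up to (1 + alpha) times as much.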
Conversion Prediction Using Multi-task Conditional Attention Networks to Support the Creation of Effective Ad Creative
Accurately predicting conversions in advertisements is generally a
challenging task, because such conversions do not occur frequently. In this
paper, we propose a new framework to support creating high-performing ad
creatives, including the accurate prediction of ad creative text conversions
before delivering to the consumer. The proposed framework includes three key
ideas: multi-task learning, conditional attention, and attention highlighting.
Multi-task learning improves conversion prediction accuracy by predicting
clicks and conversions simultaneously, mitigating the data-imbalance problem.
Conditional attention focuses the model's attention within each ad creative
according to its genre and target gender, further improving conversion
prediction accuracy. Attention highlighting
visualizes important words and/or phrases based on conditional attention. We
evaluated the proposed framework with actual delivery history data (14,000
creatives displayed more than a certain number of times from Gunosy Inc.), and
confirmed that these ideas improve the prediction performance of conversions,
and visualize noteworthy words according to the creatives' attributes.

Comment: 9 pages, 6 figures. Accepted at The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2019) as an applied data science paper.
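The conditional-attention idea (attention over the words of an ad creative, conditioned on an attribute such as genre or target gender) can be sketched as dot-product attention whose query is the attribute embedding; the attention weights are then what "attention highlighting" would visualize. All names and dimensions below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def conditional_attention(word_vecs, cond_vec):
    """Attend over word vectors using an attribute (condition) embedding as the query.

    word_vecs : N x D array, one row per word in the ad creative
    cond_vec  : length-D vector embedding the condition (e.g. genre + target gender)
    Returns the attended summary vector and the attention weights.
    """
    scores = word_vecs @ cond_vec        # relevance of each word to the condition
    scores = scores - scores.max()       # shift for numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()  # softmax over words
    return weights @ word_vecs, weights
```

A word whose vector aligns with the condition embedding dominates the weights, so the same creative can be summarized differently for different genres or target genders.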