A study in human attention to guide computational action recognition

Abstract

Thesis: M. Eng., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2014.Cataloged from PDF version of thesis.Includes bibliographical references (pages 93-95).Computer vision researchers have a lot to learn from the human visual system. We, as humans, are usually unaware of how enormously difficult it is to watch a scene and summarize its most important events in words. We only begin to appreciate this truth when we attempt to build a system that performs comparably. In this thesis, I study two features of human visual apparatus: Attention and Peripheral Vision. I then use these to propose heuristics for computational approaches to action recognition. I think that building a system modeled after human vision, with the nonuniform distribution of resolution and processing power, can greatly increase the performance of the computer systems that target action recognition. In this study: (i) I develop and construct tools that allow me to study human vision and its role in action recognition, (ii) I perform four distinct experiments to gain insight into the role of attention and peripheral vision in this task, (iii) I propose computational heuristics, as well as mechanisms, that I believe will increase the efficiency, and recognition power of artificial vision systems. The tools I have developed can be applied to a variety of studies, including those performed on online crowd-sourcing markets (e.g. Amazon's Mechanical Turk). With my human experiments, I demonstrate that there is consistency of visual behavior among multiple subjects when they are asked to report the occurrence of a verb. Further, I demonstrate that while peripheral vision may play a small direct role in action recognition, it is a key component of attentional allocation, whereby it becomes fundamental to action recognition. Moreover, I propose heuristics based on these experiments, that can be informative to the artificial systems. In particular, I argue that the proper medium for action recognition are videos, not still images, and the basic driver of attention should be movement. Finally, I outline a computational mechanism that incorporates these heuristics into an implementable scheme.by Sam Sinai.M. Eng

    Similar works

    Full text

    thumbnail-image

    Available Versions