
Understanding and predicting where people look in images

Author: Judd, Tilke M.
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2011

Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011. Cataloged from PDF version of thesis. Includes bibliographical references (p. 115-126).

For many applications in graphics, design, and human-computer interaction, it is essential to understand where humans look in a scene. This is a challenging task, given that no one fully understands how the human visual system works. This thesis explores how people look at different types of images and provides methods for predicting where they look in new scenes. We describe a new model of where people look, learned with machine-learning techniques from ground-truth eye-tracking data, that outperforms all existing models, and we provide a benchmark data set for quantitatively comparing existing and future models. In addition, we explore how image resolution affects where people look. Our experiments, models, and large eye-tracking data sets should help future researchers better understand and predict where people look in order to create more powerful computational vision systems.

by Tilke Judd. Ph.D.

DSpace@MIT
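The Judd thesis above provides a benchmark data set for quantitatively comparing saliency models against eye-tracking data. One commonly used comparison metric in saliency benchmarking is Normalized Scanpath Saliency (NSS): standardize the predicted saliency map to zero mean and unit standard deviation, then average its values at the pixels people actually fixated. The sketch below is a minimal illustration of that metric, assuming the saliency map is a 2-D NumPy array and fixations are (row, col) pixel coordinates; the function name `nss` is hypothetical, and this is not claimed to be the thesis's own evaluation code.

```python
import numpy as np

def nss(saliency_map, fixations):
    """Normalized Scanpath Saliency (illustrative sketch).

    saliency_map: 2-D array of predicted saliency values.
    fixations: sequence of (row, col) integer pixel coordinates of human fixations.
    Returns the mean z-scored saliency at the fixated pixels.
    """
    s = np.asarray(saliency_map, dtype=float)
    fix = np.asarray(fixations, dtype=int).reshape(-1, 2)
    if fix.shape[0] == 0:
        raise ValueError("need at least one fixation")
    std = s.std()
    if std == 0:
        # A constant map carries no information about where people look.
        return 0.0
    z = (s - s.mean()) / std
    return float(z[fix[:, 0], fix[:, 1]].mean())

# Toy example: a 3x3 map with a single bright center pixel.
saliency = np.array([[0.0, 0.0, 0.0],
                     [0.0, 1.0, 0.0],
                     [0.0, 0.0, 0.0]])
print(nss(saliency, [(1, 1)]))  # fixation on the bright pixel: positive score
print(nss(saliency, [(0, 0)]))  # fixation on a dark pixel: negative score
```

A score near zero means the map does no better than chance at the fixated locations; higher scores mean the predicted salient regions coincide with where people looked.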

Learning plan networks in conversational video games

Author: Orkin, Jeffrey David
Publication venue: Massachusetts Institute of Technology
Publication date: 01/01/2007

Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2007. Includes bibliographical references (p. 121-123).

We look forward to a future where robots collaborate with humans in the home and workplace, and virtual agents collaborate with humans in games and training simulations. To be effective collaborators and communicators, these agents need a representation of common ground for everyday scenarios. Effective collaborators can infer a partner's goals and predict future actions. Effective communicators can infer the meaning of utterances based on semantic context. This thesis introduces a computational cognitive model of common ground called a Plan Network: a statistical model that represents social roles, object affordances, and expected patterns of behavior and language. I describe a methodology for unsupervised learning of a Plan Network from a multiplayer video game, a visualization of the learned network, and an evaluation of the learned model against human judgments of typical behavior. Specifically, I describe learning the Restaurant Plan Network from data collected from over 5,000 players of an online game called The Restaurant Game.

by Jeffrey David Orkin. S.M.

DSpace@MIT
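As a rough illustration of learning "expected patterns of behavior" without supervision from gameplay logs, the sketch below counts transitions between consecutive actions across recorded game sessions, normalizes them into probabilities, and scores a new action sequence by its average log transition probability, so that more typical sequences score higher. This is a simplified first-order (bigram) stand-in chosen for illustration, not the Plan Network formulation from the thesis; the action names, the `floor` value, and the functions `learn_transitions` and `typicality` are all hypothetical.

```python
import math
from collections import Counter, defaultdict

START, END = "<start>", "<end>"

def learn_transitions(sessions):
    """Count action-to-action transitions over all sessions and normalize to probabilities."""
    counts = defaultdict(Counter)
    for actions in sessions:
        path = [START] + list(actions) + [END]
        for prev, nxt in zip(path, path[1:]):
            counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
        for prev, nexts in counts.items()
    }

def typicality(model, actions, floor=1e-6):
    """Average log-probability per transition; unseen transitions get a small floor probability."""
    path = [START] + list(actions) + [END]
    logps = [math.log(model.get(p, {}).get(n, floor)) for p, n in zip(path, path[1:])]
    return sum(logps) / len(logps)

# Hypothetical restaurant-style session logs (one list of actions per player session).
logs = [
    ["sit", "order", "eat", "pay", "leave"],
    ["sit", "order", "eat", "pay", "leave"],
    ["sit", "order", "eat", "leave"],
]
model = learn_transitions(logs)
# The familiar ordering scores closer to 0 than the scrambled one.
print(typicality(model, ["sit", "order", "eat", "pay", "leave"]))
print(typicality(model, ["pay", "sit", "leave"]))
```

Averaging per transition keeps scores comparable across sequences of different lengths, which matters when comparing model scores with human judgments of how typical a sequence is.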

Tree-Based Deep Mixture of Experts with Applications to Visual Saliency Prediction and Quality Robust Visual Recognition

Publication date: 01/01/2018

Mixture of experts is a machine learning ensemble approach that consists of individual models trained to be "experts" on subsets of the data, and a gating network that provides weights to output a combination of the expert predictions. Mixture-of-experts models do not currently see wide use because diverse experts are difficult to train and their computational requirements are high. This work presents modifications of the mixture-of-experts formulation that use domain knowledge to improve training and incorporate parameter sharing among experts to reduce computational requirements.

First, this work applies mixture-of-experts models to quality-robust visual recognition. It shows that human subjects outperform deep neural networks on the classification of distorted images, and then proposes a model, MixQualNet, that is more robust to distortions. The proposed model consists of "experts" that are each trained on a particular type of image distortion. The final output of the model is a weighted sum of the expert models' outputs, where the weights are determined by a separate gating network. The model also incorporates weight sharing, which reduces the number of parameters and increases performance.

Second, this work presents an application of mixture of experts to visual saliency prediction. A computational saliency model attempts to predict where humans will look in an image. In the proposed model, each expert network is trained to predict saliency for a set of closely related images. The final saliency map is computed as a weighted mixture of the expert networks' outputs, with weights determined by a separate gating network. The proposed model achieves better performance than several other visual saliency models and a baseline non-mixture model.

Finally, this work introduces a saliency model that is a weighted mixture of models trained for different levels of saliency. Levels of saliency include high saliency, which corresponds to regions where almost all subjects look, and low saliency, which corresponds to regions where some, but not all, subjects look. The weighted mixture shows improved performance compared with baseline models because of the diversity of the individual model predictions.

Dissertation/Thesis. Doctoral Dissertation, Electrical Engineering, 201

ASU Digital Repository
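The mixture-of-experts combination described in the ASU dissertation above (expert outputs combined by a weighted sum whose weights come from a separate gating network) can be sketched in a few lines of NumPy. The sketch assumes linear expert heads on top of one shared ReLU feature layer (a stand-in for the parameter sharing the abstract mentions) and a softmax gate computed directly from the input; the shapes, the names `moe_forward`, `W_shared`, `W_experts`, and `W_gate`, and the random weights are hypothetical, and this is not the MixQualNet architecture itself.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_forward(x, W_shared, W_experts, W_gate):
    """Mixture-of-experts forward pass (illustrative sketch).

    x:         (batch, d_in) inputs
    W_shared:  (d_in, d_hidden) feature layer shared by all experts (parameter sharing)
    W_experts: (n_experts, d_hidden, d_out) one output head per expert
    W_gate:    (d_in, n_experts) separate gating network, one score per expert
    Returns the combined output (batch, d_out) and the gate weights (batch, n_experts).
    """
    h = np.maximum(x @ W_shared, 0.0)                    # shared ReLU features
    expert_out = np.einsum("bh,eho->beo", h, W_experts)  # (batch, n_experts, d_out)
    gate = softmax(x @ W_gate, axis=-1)                  # each row sums to 1
    # Final output: gate-weighted sum of the expert predictions.
    return np.einsum("be,beo->bo", gate, expert_out), gate

rng = np.random.default_rng(0)
d_in, d_hidden, d_out, n_experts = 8, 16, 4, 3
x = rng.normal(size=(5, d_in))
W_shared = rng.normal(size=(d_in, d_hidden))
W_experts = rng.normal(size=(n_experts, d_hidden, d_out))
W_gate = rng.normal(size=(d_in, n_experts))
y, gate = moe_forward(x, W_shared, W_experts, W_gate)
print(y.shape, gate.sum(axis=1))
```

For the saliency application, each expert's output would be a saliency map rather than a vector of class scores; the same gate-weighted sum then applies pixel by pixel. Sharing the feature layer means only the small per-expert heads grow with the number of experts.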