
    Modelling eye movements and visual attention in synchronous visual and linguistic processing

    This thesis focuses on modelling visual attention in tasks in which vision interacts with language and other sources of contextual information. The work builds on insights from experimental studies in visual cognition and psycholinguistics, particularly studies of cross-modal processing. We present a series of models of eye movements in situated language comprehension that are capable of generating human-like scan paths. Moreover, we investigate whether scan paths have a high-level structure and whether tools from Natural Language Processing can be applied to analyse it. We show that scan paths carry useful information that is currently neglected in both experimental and modelling studies. When studied at a level beyond simple statistical measures such as the proportion of looks, this information can be used to extract more complex patterns of behaviour and to build models capable of simulating human behaviour in the presence of linguistic material. We also revisit the classical saliency model and its extensions, in particular the Contextual Guidance Model of Torralba et al. (2006), and extend it with a memory of target positions in visual search. We show that models of contextual guidance should contain components responsible for short-term learning and memorisation. We also investigate whether this type of model can predict human behaviour in tasks with incremental stimuli, as in situated language comprehension. Finally, we investigate objectness and object saliency, including their effects on eye movements and on human responses in experimental tasks. In a simple experiment we show that an object-based notion of saliency predicts fixation locations better than pixel-based saliency as formulated by Itti et al. (1998). In addition, we show that object-based saliency fits into current theories such as cognitive relevance and can be used to build unified models of cross-referential visual and linguistic processing. This thesis lays a foundation for a more detailed study of scan paths within an object-based framework such as the Cognitive Relevance Framework (Henderson et al., 2007, 2009) by providing models capable of explaining human behaviour, and by delivering tools and methodologies for predicting which objects will be attended to during synchronous visual and linguistic processing.
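The abstract contrasts object-based saliency with pixel-based saliency but does not spell out how an object's saliency is computed. The Python sketch below makes the contrast concrete under stated assumptions: a precomputed pixel-level saliency map (for example, from an Itti et al. (1998)-style model) and a label image in which each object has a unique positive integer ID and 0 marks background. Scoring each object by the mean pixel saliency inside its mask is our assumption, not necessarily the thesis's formulation, and the function names are hypothetical.

```python
import numpy as np


def object_saliency(pixel_saliency, labels):
    """Hypothetical sketch: score each labelled object by the mean
    pixel-level saliency inside its mask (label 0 is treated as background)."""
    scores = {}
    for obj_id in np.unique(labels):
        if obj_id == 0:
            continue  # skip background
        scores[int(obj_id)] = float(pixel_saliency[labels == obj_id].mean())
    return scores


def object_saliency_map(pixel_saliency, labels):
    """Spread each object's score uniformly over its mask; background stays 0.
    The result is an object-level map that could be compared with fixations."""
    scores = object_saliency(pixel_saliency, labels)
    out = np.zeros_like(pixel_saliency, dtype=float)
    for obj_id, score in scores.items():
        out[labels == obj_id] = score
    return out


if __name__ == "__main__":
    sal = np.array([[1.0, 3.0, 0.0, 0.0],
                    [1.0, 3.0, 0.0, 0.0],
                    [0.0, 0.0, 4.0, 4.0],
                    [0.0, 0.0, 4.0, 4.0]])
    labels = np.array([[1, 1, 0, 0],
                       [1, 1, 0, 0],
                       [0, 0, 2, 2],
                       [0, 0, 2, 2]])
    scores = object_saliency(sal, labels)
    # Rank objects from most to least salient under this pooling assumption.
    print(sorted(scores, key=scores.get, reverse=True))  # prints [2, 1]
```

Mean pooling is only one choice; max pooling or area-normalised sums would rank small, high-contrast objects differently, which matters when comparing object-based and pixel-based predictions of fixation locations.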

    Incremental Learning of Target Locations in Visual Search

    The top-down guidance of visual attention is one of the main factors that allow humans to process vast amounts of incoming visual information effectively. Nevertheless, we still lack a full understanding of the visual, semantic, and memory processes governing visual attention. In this paper, we present a computational model of visual search capable of predicting the most likely positions of target objects. The model does not require a separate training phase; instead, it learns likely target positions incrementally from a memory of previous fixations. We evaluate the model on two search tasks and show that it outperforms saliency alone and comes close to the maximum performance achievable by the Contextual Guidance Model (CGM; Torralba et al., 2006; Ehinger et al., 2009), even though our model performs no scene recognition and computes no global image statistics.
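The abstract describes a model that needs no separate training phase and instead accumulates likely target positions from a memory of previous fixations. The Python sketch below illustrates only that general idea. The class name, the Gaussian spread `sigma`, and the multiplicative combination with a saliency map (loosely in the spirit of the CGM, which combines saliency with a contextual prior) are all assumptions, not the paper's published model.

```python
import numpy as np


class IncrementalTargetMemory:
    """Hypothetical sketch: a spatial memory of fixations that landed on
    targets, built up trial by trial with no separate training phase."""

    def __init__(self, shape, sigma=2.0):
        self.shape = shape
        self.sigma = sigma  # assumed spatial spread of each remembered fixation
        self.memory = np.zeros(shape, dtype=float)
        self.ys, self.xs = np.mgrid[0:shape[0], 0:shape[1]]

    def update(self, fixation):
        """Add a Gaussian bump centred on a target fixation given as (row, col)."""
        row, col = fixation
        dist2 = (self.ys - row) ** 2 + (self.xs - col) ** 2
        self.memory += np.exp(-dist2 / (2.0 * self.sigma ** 2))

    def prior(self):
        """Normalised location prior; uniform until something has been remembered."""
        total = self.memory.sum()
        if total == 0:
            return np.full(self.shape, 1.0 / self.memory.size)
        return self.memory / total

    def predict(self, saliency):
        """Multiply a strictly positive saliency map by the learned prior
        (an assumed combination rule) and renormalise to sum to 1."""
        combined = saliency * self.prior()
        return combined / combined.sum()


if __name__ == "__main__":
    mem = IncrementalTargetMemory((10, 10))
    saliency = np.ones((10, 10))  # flat saliency, so only memory matters here
    for fix in [(7, 7), (7, 8), (8, 7)]:
        mem.update(fix)
    pred = mem.predict(saliency)
    row, col = np.unravel_index(np.argmax(pred), pred.shape)
    print(int(row), int(col))  # prints: 7 7
```

Before any update the prior is uniform, so the prediction reduces to normalised saliency; as target fixations accumulate, the prediction shifts toward remembered locations.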