20 research outputs found

    Hidden semi-Markov models to segment reading phases from eye movements

    Our objective is to analyze scanpaths acquired while participants performed a reading task aimed at answering a binary question: is the text related or not to a given target topic? We propose a data-driven method based on hidden semi-Markov chains to segment scanpaths into phases deduced from the model states, which are shown to represent different cognitive strategies: normal reading, fast reading, information search, and slow confirmation. These phases were confirmed using several external covariates, among which semantic information extracted from the texts. Analyses highlighted strong preferences of specific participants for specific strategies and, more globally, large individual variability in eye-movement characteristics, as accounted for by random effects. As a perspective, we discuss the possibility of improving reading models by accounting for possible sources of heterogeneity during reading.
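The segmentation idea above can be sketched in code. The paper uses hidden semi-Markov chains with random effects; as a minimal stand-in, the sketch below runs plain-HMM Viterbi decoding over a sequence of fixation durations to recover "fast reading" vs "normal reading" phases. All parameters (state means, transition probabilities, the durations themselves) are invented for illustration, not taken from the paper.

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_lik):
    """Most likely state path for a discrete-state HMM.

    obs: (T,) observations; log_pi: (K,) initial log probs;
    log_A: (K, K) transition log probs; log_lik(k, x): emission log density."""
    T, K = len(obs), len(log_pi)
    delta = np.zeros((T, K))               # best log score ending in each state
    psi = np.zeros((T, K), dtype=int)      # backpointers
    delta[0] = log_pi + np.array([log_lik(k, obs[0]) for k in range(K)])
    for t in range(1, T):
        for k in range(K):
            scores = delta[t - 1] + log_A[:, k]
            psi[t, k] = np.argmax(scores)
            delta[t, k] = scores[psi[t, k]] + log_lik(k, obs[t])
    path = np.zeros(T, dtype=int)
    path[-1] = np.argmax(delta[-1])
    for t in range(T - 2, -1, -1):         # backtrack
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Hypothetical two-state model: state 0 = fast reading (~150 ms fixations),
# state 1 = normal reading (~250 ms); Gaussian emissions, sticky transitions.
means, sds = np.array([150.0, 250.0]), np.array([30.0, 30.0])
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.9, 0.1], [0.1, 0.9]])
ll = lambda k, x: -0.5 * ((x - means[k]) / sds[k]) ** 2 - np.log(sds[k])
durations = np.array([140, 155, 150, 260, 240, 255, 145, 150])
states = viterbi(durations, log_pi, log_A, ll)   # → [0 0 0 1 1 1 0 0]
```

A true semi-Markov model would additionally place explicit duration distributions on each phase rather than relying on self-transitions, which is what makes the paper's phase lengths interpretable.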

    ScanGAN360: a generative model of realistic scanpaths for 360 images

    Understanding and modeling the dynamics of human gaze behavior in 360° environments is crucial for creating, improving, and developing emerging virtual reality applications. However, recruiting human observers and acquiring enough data to analyze their behavior when exploring virtual environments requires complex hardware and software setups, and can be time-consuming. Being able to generate virtual observers can help overcome this limitation, and thus stands as an open problem in this medium. In particular, generative adversarial approaches could alleviate this challenge by generating a large number of scanpaths that reproduce human behavior when observing new scenes, essentially mimicking virtual observers. However, existing methods for scanpath generation do not adequately predict realistic scanpaths for 360° images. We present ScanGAN360, a new generative adversarial approach to address this problem. We propose a novel loss function based on dynamic time warping and tailor our network to the specifics of 360° images. The quality of our generated scanpaths outperforms competing approaches by a large margin, and is almost on par with the human baseline. ScanGAN360 allows fast simulation of large numbers of virtual observers, whose behavior mimics real users, enabling a better understanding of gaze behavior, facilitating experimentation, and aiding novel applications in virtual reality and beyond.
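The loss above builds on dynamic time warping, which compares two scanpaths of possibly different lengths by optimally aligning their points. A GAN training loss would need a differentiable (soft-DTW) variant; the sketch below is the classic discrete DTW with Euclidean local cost, with toy gaze points invented for illustration.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance between two scanpaths.

    a: (n, 2), b: (m, 2) arrays of gaze points; Euclidean local cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible alignments
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy scanpaths: identical paths have distance 0; a vertically
# shifted copy pays the shift at every aligned point.
a = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
b = a + np.array([0.0, 1.0])
dtw(a, a)   # → 0.0
dtw(a, b)   # → 3.0 (three diagonal steps, cost 1 each)
```

For 360° images, the local cost would also need to respect the sphere (e.g. great-circle distance) rather than the planar Euclidean distance used here.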

    Hidden semi-Markov models for the segmentation of eye-movement scanpaths into reading phases

    Textual information search is not a homogeneous process in time, neither from a cognitive perspective nor in terms of eye-movement patterns, as shown in previous studies. Our objective is to analyze eye-tracking signals acquired while participants performed a reading task aimed at answering a binary question: is the text related or not to some given target topic? This activity is expected to involve several phases with contrasted oculometric characteristics, such as normal reading, scanning, and careful reading, associated with different cognitive strategies, such as the creation and rejection of hypotheses, confirmation, and decision. To model such phases, we propose an analytical data-driven method based on hidden semi-Markov chains, whose latent states represent different dynamics in eye movements. Four interpretable phases were highlighted: normal reading, speed reading, information search, and slow confirmation. This interpretation was derived using model parameters and scanpath segmentations. It was then confirmed using different external covariates, among which semantic information extracted from texts. Analyses highlighted a good discrimination of reading speeds by phases, a contrasted use of phases depending on the degree of relationship between text semantic contents and target topics, and a strong preference of specific participants for specific strategies. As another output of our analyses, the individual variability in all eye-movement characteristics was assessed to be high and thus had to be taken into account, particularly through mixed-effects models. As a perspective, the possibility of improving reading models by accounting for possible heterogeneity sources during reading was discussed.
    We highlighted how analysing other sources of information regarding the cognitive processes at stake, such as EEG recordings, could benefit from the segmentation induced by our approach.

    Modelling eye movements and visual attention in synchronous visual and linguistic processing

    This thesis focuses on modelling visual attention in tasks in which vision interacts with language and other sources of contextual information. The work is based on insights provided by experimental studies in visual cognition and psycholinguistics, particularly cross-modal processing. We present a series of models of eye movements in situated language comprehension capable of generating human-like scan-paths. Moreover, we investigate the existence of high-level structure in the scan-paths and the applicability of tools used in Natural Language Processing to the analysis of this structure. We show that scan-paths carry interesting information that is currently neglected in both experimental and modelling studies. This information, studied at a level beyond simple statistical measures such as proportion of looks, can be used to extract knowledge of more complicated patterns of behaviour, and to build models capable of simulating human behaviour in the presence of linguistic material. We also revisit the classical saliency model and its extensions, in particular the Contextual Guidance Model of Torralba et al. (2006), and extend it with memory of target positions in visual search. We show that models of contextual guidance should contain components responsible for short-term learning and memorisation. We also investigate the applicability of this type of model to the prediction of human behaviour in tasks with incremental stimuli, as in situated language comprehension. Finally, we investigate the issue of objectness and object saliency, including their effects on eye movements and human responses to experimental tasks. In a simple experiment we show that when using an object-based notion of saliency it is possible to predict fixation locations better than using pixel-based saliency as formulated by Itti et al. (1998).
    In addition, we show that object-based saliency fits into current theories such as cognitive relevance and can be used to build unified models of cross-referential visual and linguistic processing. This thesis forms a foundation for a more detailed study of scan-paths within an object-based framework such as the Cognitive Relevance Framework (Henderson et al., 2007, 2009), by providing models capable of explaining human behaviour and by delivering tools and methodologies to predict which objects will be attended to during synchronous visual and linguistic processing.
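One of the NLP-style analyses the thesis alludes to can be sketched concretely: treat the sequence of fixated-object labels in a scan-path as a token stream and fit a smoothed bigram model over it, so that sequences following frequent gaze transitions score higher than rare ones. This is a minimal illustrative sketch, not the thesis's actual method; the object labels and training sequences are invented.

```python
from collections import Counter
import math

def bigram_model(scanpaths, vocab):
    """Fit an add-one-smoothed bigram model over fixation-target symbols
    and return a log-probability scorer for new symbol sequences."""
    uni, bi = Counter(), Counter()
    for path in scanpaths:
        for a, b in zip(path, path[1:]):
            uni[a] += 1
            bi[(a, b)] += 1
    V = len(vocab)
    def logprob(path):
        lp = 0.0
        for a, b in zip(path, path[1:]):
            lp += math.log((bi[(a, b)] + 1) / (uni[a] + V))  # Laplace smoothing
        return lp
    return logprob

# Hypothetical fixation sequences over labelled scene objects
train = [["face", "cup", "face"], ["face", "cup", "table"]]
lp = bigram_model(train, {"face", "cup", "table"})
# The frequent transition face→cup outscores the rarer cup→face
```

The same machinery extends naturally to higher-order n-grams or to sequence models with latent structure, which is where the connection to NLP tooling becomes useful.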

    Visual Attention Saccadic Models Learn to Emulate Gaze Patterns From Childhood to Adulthood

    How people look at visual information reveals fundamental information about themselves, their interests, and their state of mind. While previous visual attention models output static 2-dimensional saliency maps, saccadic models aim to predict not only where observers look but also how they move their eyes to explore the scene. In this paper, we demonstrate that saccadic models are a flexible framework that can be tailored to emulate observers' viewing tendencies. More specifically, we use fixation data from 101 observers split into 5 age groups (adults, 8-10 y.o., 6-8 y.o., 4-6 y.o., and 2 y.o.) to train our saccadic model for different stages of the development of the human visual system. We show that the joint distribution of saccade amplitude and orientation is a visual signature specific to each age group, and can be used to generate age-dependent scanpaths. Our age-dependent saccadic model not only outputs human-like, age-specific visual scanpaths, but also significantly outperforms other state-of-the-art saliency models. We demonstrate that the computational modelling of visual attention, through the use of saccadic models, can be efficiently adapted to emulate the gaze behavior of a specific group of observers.
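The generative step described above can be sketched as sampling saccades from a joint amplitude-orientation histogram, the "visual signature" that differs between age groups. Below is a minimal sketch under that reading; the histogram bins, probabilities, and starting point are invented, and a full saccadic model would additionally weight candidate landing positions by scene saliency.

```python
import numpy as np

def sample_scanpath(start, joint_p, amp_bins, ori_bins, n_sacc, rng):
    """Generate a scanpath by drawing (amplitude, orientation) pairs from a
    joint histogram, standing in for an age-specific saccade signature."""
    K_a, K_o = joint_p.shape
    flat = joint_p.ravel() / joint_p.sum()      # normalise to a distribution
    pts = [np.asarray(start, dtype=float)]
    for _ in range(n_sacc):
        idx = rng.choice(K_a * K_o, p=flat)     # pick a histogram cell
        ia, io = divmod(idx, K_o)
        amp = rng.uniform(amp_bins[ia], amp_bins[ia + 1])
        ori = rng.uniform(ori_bins[io], ori_bins[io + 1])
        pts.append(pts[-1] + amp * np.array([np.cos(ori), np.sin(ori)]))
    return np.stack(pts)

# Hypothetical 2x2 signature: short vs long amplitudes x two orientation bands
rng = np.random.default_rng(0)
joint_p = np.array([[0.6, 0.1],
                    [0.2, 0.1]])
path = sample_scanpath((0, 0), joint_p, [1, 3, 8], [0, np.pi, 2 * np.pi],
                       n_sacc=10, rng=rng)     # (11, 2) array of gaze points
```

Swapping in a different `joint_p` (e.g. one estimated from 2-year-olds' fixation data) is all it takes to change the generated viewing style, which is the flexibility the paper argues for.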

    Combining segmentation and attention: a new foveal attention model

    Artificial vision systems cannot process all the information that they receive from the world in real time, because doing so is highly expensive and inefficient in terms of computational cost. Inspired by biological perception systems, artificial attention models aim to select only the relevant parts of the scene. In human vision, it is also well established that these units of attention are not merely spatial but closely related to perceptual objects (proto-objects). This implies a strong bidirectional relationship between segmentation and attention processes. While the segmentation process is responsible for extracting the proto-objects from the scene, attention can guide segmentation, giving rise to the concept of foveal attention. When the focus of attention is deployed from one visual unit to another, the rest of the scene is perceived, but at a lower resolution than the focused object. The result is a multi-resolution visual perception in which the fovea, a dimple on the central retina, provides the highest-resolution vision. In this paper, a bottom-up foveal attention model is presented. In this model, the input image is a foveal image represented using a Cartesian Foveal Geometry (CFG), which encodes the field of view of the sensor as a fovea (placed at the focus of attention) surrounded by a set of concentric rings with decreasing resolution. Multi-resolution perceptual segmentation is then performed by building a foveal polygon using the Bounded Irregular Pyramid (BIP). Bottom-up attention is enclosed in the same structure, allowing the fovea to be set over the most salient image proto-object. Saliency is computed as a linear combination of multiple low-level features such as color and intensity contrast, symmetry, orientation, and roundness. Results obtained from natural images show that the combination of hierarchical foveal segmentation and saliency estimation performs well in terms of accuracy and speed.
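The saliency computation described above, a linear combination of normalized low-level feature maps followed by picking the most salient location for the next fovea placement, can be sketched as follows. The feature maps, weights, and the argmax fixation rule are toy stand-ins; the paper computes saliency over proto-objects in a pyramid structure, not over raw pixels as here.

```python
import numpy as np

def combine_saliency(feature_maps, weights):
    """Linear combination of min-max-normalized low-level feature maps.

    feature_maps: dict name -> 2D array; weights: dict name -> float."""
    sal = np.zeros_like(next(iter(feature_maps.values())), dtype=float)
    for name, fmap in feature_maps.items():
        f = fmap.astype(float)
        span = f.max() - f.min()
        if span > 0:
            f = (f - f.min()) / span          # bring each cue to [0, 1]
        sal += weights.get(name, 0.0) * f
    return sal

# Toy 4x4 cues: both intensity and color contrast peak at row 1, column 2
intensity = np.zeros((4, 4)); intensity[1, 2] = 5.0
color = np.zeros((4, 4)); color[1, 2] = 2.0
sal = combine_saliency({"intensity": intensity, "color": color},
                       {"intensity": 0.5, "color": 0.5})
focus = np.unravel_index(np.argmax(sal), sal.shape)   # → (1, 2)
```

In the full model the winning location would then receive the high-resolution fovea of the CFG representation, and the remaining rings would be resampled around it before the next segmentation pass.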