
    Deep learning investigation for chess player attention prediction using eye-tracking and game data

    This article reports on an investigation of the use of convolutional neural networks to predict the visual attention of chess players. The visual attention model described in this article was created to generate saliency maps that capture hierarchical and spatial features of the chessboard, in order to predict the fixation probability for individual pixels. Using a skip-layer autoencoder architecture with a unified decoder, we are able to use multiscale features to predict the saliency of parts of the board at different scales, revealing multiple relations between pieces. We used scan-path and fixation data from players engaged in solving chess problems to compute 6600 saliency maps associated with the corresponding chess piece configurations. This corpus is completed with synthetically generated data from actual games gathered from an online chess platform. Experiments using both scan-paths from chess players and the CAT2000 saliency dataset of natural images highlight several results. Deep features pretrained on natural images were found to be helpful in training visual attention prediction for chess. The proposed neural network architecture is able to generate meaningful saliency maps on unseen chess configurations, with good scores on standard metrics. This work provides a baseline for future work on visual attention prediction in similar contexts.
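    The abstract describes the architecture only at a high level. The sketch below is one plausible reading of a skip-layer encoder-decoder that predicts a per-square saliency distribution from a piece-plane encoding of the board; it is written in PyTorch, and the input encoding, channel widths and layer count are illustrative assumptions, not the authors' configuration.

    ```python
    # A minimal sketch (not the authors' implementation) of a skip-layer
    # encoder-decoder mapping an 8x8 chess-piece encoding to a saliency map.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ChessSaliencyNet(nn.Module):
        def __init__(self, in_channels=12):  # e.g. one plane per piece type/colour
            super().__init__()
            self.enc1 = nn.Conv2d(in_channels, 32, 3, padding=1)    # 8x8 -> 8x8
            self.enc2 = nn.Conv2d(32, 64, 3, padding=1, stride=2)   # 8x8 -> 4x4
            self.enc3 = nn.Conv2d(64, 128, 3, padding=1, stride=2)  # 4x4 -> 2x2
            # Unified decoder: upsample coarse features and fuse them with
            # skip connections from the finer encoder levels.
            self.dec2 = nn.Conv2d(128 + 64, 64, 3, padding=1)
            self.dec1 = nn.Conv2d(64 + 32, 32, 3, padding=1)
            self.head = nn.Conv2d(32, 1, 1)  # one saliency value per square

        def forward(self, x):
            f1 = F.relu(self.enc1(x))
            f2 = F.relu(self.enc2(f1))
            f3 = F.relu(self.enc3(f2))
            u2 = F.interpolate(f3, size=f2.shape[-2:], mode="nearest")
            d2 = F.relu(self.dec2(torch.cat([u2, f2], dim=1)))
            u1 = F.interpolate(d2, size=f1.shape[-2:], mode="nearest")
            d1 = F.relu(self.dec1(torch.cat([u1, f1], dim=1)))
            logits = self.head(d1)  # (N, 1, 8, 8)
            # Normalise to a probability distribution over the 64 squares.
            return torch.softmax(logits.flatten(1), dim=1).view_as(logits)

    # Usage: predict a saliency map for one board encoded as 12 binary planes.
    board = torch.zeros(1, 12, 8, 8)
    saliency = ChessSaliencyNet()(board)  # sums to 1 over the squares
    ```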

    Saliency Benchmarking Made Easy: Separating Models, Maps and Metrics

    Dozens of new models for fixation prediction are published every year and compared on open benchmarks such as MIT300 and LSUN. However, progress in the field can be difficult to judge because models are compared using a variety of inconsistent metrics. Here we show that no single saliency map can perform well under all metrics. Instead, we propose a principled approach to solve the benchmarking problem by separating the notions of saliency models, maps and metrics. Inspired by Bayesian decision theory, we define a saliency model to be a probabilistic model of fixation density prediction, and a saliency map to be a metric-specific prediction derived from the model density which maximizes the expected performance on that metric given the model density. We derive these optimal saliency maps for the most commonly used saliency metrics (AUC, sAUC, NSS, CC, SIM, KL-Div) and show that they can be computed analytically or approximated with high precision. We show that this leads to consistent rankings across all metrics and avoids the penalties of using one saliency map for all metrics. Our method allows researchers to have their model compete on many different metrics against the state of the art in those metrics: "good" models will perform well in all metrics. (Published at ECCV 2018.)
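    For readers unfamiliar with the metrics listed above, the sketch below computes three of them (NSS, CC and KL-Div) from a predicted map and observed fixations using their standard definitions. It is illustrative NumPy code, not the benchmark's reference implementation, and it does not reproduce the paper's derivation of metric-specific optimal maps.

    ```python
    # Illustrative NumPy implementations of three common saliency metrics
    # (standard definitions; not the benchmark's reference code).
    import numpy as np

    def nss(saliency, fixations):
        """Normalized Scanpath Saliency: mean z-scored saliency at fixated pixels.
        `fixations` is a boolean map with the same shape as `saliency`."""
        z = (saliency - saliency.mean()) / (saliency.std() + 1e-12)
        return z[fixations].mean()

    def cc(saliency, fixation_density):
        """Pearson correlation between predicted map and empirical fixation density."""
        a = saliency.ravel() - saliency.mean()
        b = fixation_density.ravel() - fixation_density.mean()
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

    def kl_div(predicted_density, fixation_density, eps=1e-12):
        """KL divergence of the predicted density from the empirical one,
        after normalising both maps to sum to 1."""
        p = fixation_density / (fixation_density.sum() + eps)
        q = predicted_density / (predicted_density.sum() + eps)
        return np.sum(p * np.log((p + eps) / (q + eps)))

    # Example: score a random prediction against ten random fixations.
    rng = np.random.default_rng(0)
    pred = rng.random((48, 64))
    fix = np.zeros_like(pred, dtype=bool)
    fix[rng.integers(0, 48, 10), rng.integers(0, 64, 10)] = True
    density = fix.astype(float)
    print(nss(pred, fix), cc(pred, density), kl_div(pred, density))
    ```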

    Visual communication of technology: its impact on designing and innovation in industrial and engineering design education

    Visual communication (VC) resources can be seen as playing an increasingly important role in delivery and learning systems in today's design and technology education. The performance of current tools and resources is the primary concern of this research, particularly whether they take full advantage of VC when delivering technological information to industrial design (ID) and engineering design (ED) students. This thesis sought key principles behind the visual communication of technology (VCT) and its association with designing, creativity and innovation through a literature survey. The findings concluded that many such assertions had been made with little evidence for the associations suggested. Some guiding sources and key emerging principles (KEPs) for good VCT practice were established. A miniature-kite-designing exercise was conducted as a case study for the purpose of examining the links between VCT, designing, and creativity and/or innovation. Kite-technological-information posters were used as the VCT tool for the kite-designing case. A comparative study of kite-designing was conducted in Malaysia to check the reliability of the study, and another validation study was carried out for the purpose of establishing the validity of the data gathering. Visual technological information (VTI) for kite design (a kite-poster) was refined according to the KEPs established from the literature review, and its visual impact was tested through the use of eye-tracking technology. Selected current and historical visual tools, which have been used in design and technology communication and were recognised as having positive impacts, were analysed and articulated in order to reveal a deeper understanding of the KEPs. These were further validated through eye-tracking of participants' reading patterns on those selected visuals. The perceptual responses toward those visuals were also recorded and analysed. A theoretical research framework was established to investigate VTI representation used in books by Ashby (1999) and Ashby and Johnson (2002), in new authors' scholarly papers (METU, 2010), and in the author's own analysis and redesign of some of those studied VTIs based on the KEPs emerging from the research. A questionnaire survey was conducted within a number of higher education institutions in three regions around the world in order to achieve reliable data gathering. This third case study was validated through experts' discussion of the findings and related issues. Within these three case studies, a mixture of scientific methods (using the eye-tracker device) and conventional methods (questionnaires, interviews, a discussion group and comparative studies), together with other methods such as design workshops, analysis of existing resources, and the author's own design-and-redesign activities, was used to provide quantitative and qualitative measurements to empirically validate the literature search. Evidence was obtained of links between VCT, designerly activities involving knowledge, skills and values within technological communication, and the facilitation of creativity. Empirical evidence showed that VTIs were effective in communicating knowledge, skills and values, with the KEP criteria playing an essential role in enriching the visual emphasis of VTIs.
    The redesigning exercise based on the author's own practice, which articulated the KEPs through the redesign of existing VTIs for the purpose of more effective VCT, again provided significant evidence of visual effectiveness and ease of understanding. Evidence from the analysis of two books on materials technology for ID and ED students, the views of the two materials experts, and the literature review suggested that ID and ED students require different types of representational models and graphical strategies of VCT in their learning. However, the empirical data from the research, supported by one of the materials experts, suggested that ID and ED students, even with different cultural backgrounds, did not require different VTIs or the use of different VCT strategies for effective communication.

    From vision to reasoning


    Modelling eye movements and visual attention in synchronous visual and linguistic processing

    This thesis focuses on modelling visual attention in tasks in which vision interacts with language and other sources of contextual information. The work is based on insights provided by experimental studies in visual cognition and psycholinguistics, particularly cross-modal processing. We present a series of models of eye movements in situated language comprehension capable of generating human-like scan-paths. Moreover, we investigate the existence of high-level structure in these scan-paths and the applicability of tools from Natural Language Processing to the analysis of this structure. We show that scan-paths carry interesting information that is currently neglected in both experimental and modelling studies. This information, studied at a level beyond simple statistical measures such as proportion of looks, can be used to extract knowledge of more complicated patterns of behaviour, and to build models capable of simulating human behaviour in the presence of linguistic material. We also revisit the classical saliency model and its extensions, in particular the Contextual Guidance Model of Torralba et al. (2006), and extend it with memory of target positions in visual search. We show that models of contextual guidance should contain components responsible for short-term learning and memorisation. We also investigate the applicability of this type of model to the prediction of human behaviour in tasks with incremental stimuli, as in situated language comprehension. Finally, we investigate the issue of objectness and object saliency, including their effects on eye movements and human responses to experimental tasks. In a simple experiment we show that an object-based notion of saliency predicts fixation locations better than pixel-based saliency as formulated by Itti et al. (1998). In addition, we show that object-based saliency fits into current theories such as cognitive relevance and can be used to build unified models of cross-referential visual and linguistic processing. This thesis forms a foundation for a more detailed study of scan-paths within an object-based framework such as the Cognitive Relevance Framework (Henderson et al., 2007, 2009) by providing models capable of explaining human behaviour, and by delivering tools and methodologies to predict which objects will be attended to during synchronous visual and linguistic processing.
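    One common way to study scan-path structure beyond simple statistical measures, as mentioned above, is to encode each fixation as a region-of-interest label and compare the resulting label strings with an edit distance. The sketch below is a generic illustration of that idea, not the thesis's specific method; the labels and the normalisation are assumptions for the example.

    ```python
    # Generic string-based scan-path comparison: fixations are mapped to
    # region-of-interest labels and two scan-paths are compared by
    # Levenshtein edit distance (illustration only).
    def edit_distance(a, b):
        """Minimum number of insertions, deletions and substitutions
        needed to turn sequence `a` into sequence `b`."""
        m, n = len(a), len(b)
        d = [[0] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            d[i][0] = i
        for j in range(n + 1):
            d[0][j] = j
        for i in range(1, m + 1):
            for j in range(1, n + 1):
                cost = 0 if a[i - 1] == b[j - 1] else 1
                d[i][j] = min(d[i - 1][j] + 1,         # deletion
                              d[i][j - 1] + 1,         # insertion
                              d[i - 1][j - 1] + cost)  # substitution
        return d[m][n]

    def scanpath_similarity(path_a, path_b):
        """Normalised similarity in [0, 1] between two label sequences."""
        longest = max(len(path_a), len(path_b)) or 1
        return 1.0 - edit_distance(path_a, path_b) / longest

    # Example: two scan-paths over objects labelled A-D in a scene.
    print(scanpath_similarity(list("ABBCD"), list("ABCCD")))  # 0.8
    ```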

    Real-time synthetic primate vision


    Scenes, saliency maps and scanpaths

    The aim of this chapter is to review some of the key research investigating how people look at pictures. In particular, my goal is to provide theoretical background for those who are new to the field, while also explaining some of the relevant methods and analyses. I begin by introducing eye movements in the context of natural scene perception. As in other complex tasks, eye movements provide a measure of attention and information processing over time, and they tell us how the foveated visual system determines what to prioritise. I then describe some of the many measures which have been derived to summarise where people look in complex images. These include global measures, analyses based on regions of interest, and comparisons based on heat maps. A particularly popular approach for trying to explain fixation locations is the saliency map approach, and the first half of the chapter is mostly devoted to this topic. A large number of papers and models are built on this approach, but it is also worth spending time on it because the methods involved have been used across a wide range of applications. The saliency map approach is based on the facts that the visual system has topographic maps of visual features, that contrast within these features seems to be represented and prioritised, and that a central representation can be used to control attention and eye movements. This approach, and its underlying principles, has led to an increase in the number of researchers using complex natural scenes as stimuli. It is therefore important that those new to the field are familiar with saliency maps, their usage, and their pitfalls. I describe the original implementation of this approach (Itti & Koch, 2000), which applies spatial filtering at different levels of coarseness and combines the results in an attempt to identify the regions which stand out from their background. Evaluating this model requires comparing fixation locations to model predictions. Several different experimental and comparison methods have been used, but most recent research shows that bottom-up guidance is rather limited in terms of predicting real eye movements. The second part of the chapter is largely concerned with measuring eye movement scanpaths. Scanpaths are the sequential patterns of fixations and saccades made when looking at something for a period of time. They show regularities which may reflect top-down attention, and some have attempted to link these to memory and an individual's mental model of what they are looking at. While not all researchers will be testing hypotheses about scanpaths, an understanding of the underlying methods and theory will be of benefit to all. I describe the theories behind analysing eye movements in this way, and various methods which have been used to represent and compare them. These methods allow one to quantify the similarity between two viewing patterns, and this similarity is linked to both the image and the observer. The last part of the chapter describes some applications of eye movements in image viewing. The methods discussed can be applied to complex images, and therefore these experiments can tell us about perception in art and marketing, as well as about machine vision.
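    The abstract refers to the original saliency map implementation of Itti & Koch (2000), which filters the image at several levels of coarseness and combines the results. The sketch below is a heavily simplified, single-feature illustration of that center-surround idea; a real implementation also uses colour and orientation channels, image pyramids and a normalisation operator, none of which are shown here.

    ```python
    # Simplified single-feature center-surround contrast, in the spirit of
    # Itti & Koch (2000): compare fine-scale intensity with coarser blurred
    # versions and sum the contrasts across scale pairs (illustration only).
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def simple_saliency(image, center_sigmas=(1, 2), surround_sigmas=(4, 8)):
        """Return a map in [0, 1] of intensity contrast across scale pairs."""
        intensity = image.astype(float)
        if intensity.ndim == 3:          # crude intensity channel from RGB
            intensity = intensity.mean(axis=2)
        saliency = np.zeros_like(intensity)
        for c in center_sigmas:
            center = gaussian_filter(intensity, c)
            for s in surround_sigmas:
                surround = gaussian_filter(intensity, s)
                saliency += np.abs(center - surround)  # center-surround contrast
        spread = saliency.max() - saliency.min()
        return (saliency - saliency.min()) / (spread + 1e-12)

    # Example: a bright patch on a dark background receives the highest values.
    img = np.zeros((64, 64))
    img[28:36, 28:36] = 1.0
    sal = simple_saliency(img)
    print(np.unravel_index(sal.argmax(), sal.shape))  # falls on or near the patch
    ```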