18,397 research outputs found
Recommended from our members
A computer model of chess memory
Chess research provides rich data for testing computational models of human memory. This paper presents a model which shares several common concepts with an earlier attempt (Simon & Gilmartin, 1973), but features several new attributes: dynamic short-term memory, recursive chunking, more sophisticated perceptual mechanisms and use of a retrieval structure (Chase & Ericsson, 1982). Simulations of data from three experiments are presented: 1) differential recall of random and game positions; 2) recall of several boards presented in short succession; 3) recall of positions modified by mirror image reflection about various axes. The model fits the data reasonably well, although some empirical phenomena are not captured by it. At a theoretical level, the conceptualization of the internal representation and its relation with the retrieval structure needs further refinement
Chunks hierarchies and retrieval structures: Comments on Saariluoma and Laine
The empirical results of Saariluoma and Laine (in press) are discussed and their computer simulations are compared with CHREST, a computational model of perception, memory and learning in chess. Mathematical functions such as power functions and logarithmic functions account for Saariluoma and Laine's (in press) correlation heuristic and for CHREST very well. However, these functions fit human data well only with game positions, not with random positions. As CHREST, which learns using spatial proximity, accounts for the human data as well as Saariluoma and Laine's (in press) correlation heuristic, their conclusion that frequency-based heuristics match the data better than proximity-based heuristics is questioned. The idea of flat chunk organisation and its relation to retrieval structures is discussed. In the conclusion, emphasis is given to the need for detailed empirical data, including information about chunk structure and types of errors, for discriminating between various learning algorithms
Visual-Linguistic Semantic Alignment: Fusing Human Gaze and Spoken Narratives for Image Region Annotation
Advanced image-based application systems such as image retrieval and visual question answering depend heavily on semantic image region annotation. However, improvements in image region annotation are limited because of our inability to understand how humans, the end users, process these images and image regions. In this work, we expand a framework for capturing image region annotations where interpreting an image is influenced by the end user\u27s visual perception skills, conceptual knowledge, and task-oriented goals. Human image understanding is reflected by individuals\u27 visual and linguistic behaviors, but the meaningful computational integration and interpretation of their multimodal representations (e.g. gaze, text) remain a challenge. Our work explores the hypothesis that eye movements can help us understand experts\u27 perceptual processes and that spoken language descriptions can reveal conceptual elements of image inspection tasks. We propose that there exists a meaningful relation between gaze, spoken narratives, and image content. Using unsupervised bitext alignment, we create meaningful mappings between participants\u27 eye movements (which reveal key areas of images) and spoken descriptions of those images. The resulting alignments are then used to annotate image regions with concept labels. Our alignment accuracy exceeds baseline alignments that are obtained using both simultaneous and a fixed-delay temporal correspondence. Additionally, comparison of alignment accuracy between a method that identifies clusters in the images based on eye movements and a method that identifies clusters using image features shows that the two approaches perform well on different types of images and concept labels. This suggests that an image annotation framework could integrate information from more than one technique to handle heterogeneous images. The resulting alignments can be used to create a database of low-level image features and high-level semantic annotations corresponding to perceptually important image regions. We demonstrate the applicability of the proposed framework with two datasets: one consisting of general-domain images and another with images from the domain of medicine. This work is an important contribution toward the highly challenging problem of fusing human-elicited multimodal data sources, a problem that will become increasingly important as low-resource scenarios become more common
Ranking algorithms for implicit feedback
This report presents novel algorithms to use eye movements as an implicit relevance feedback in order to improve the performance of the searches. The algorithms are evaluated on "Transport Rank Five" Dataset which were previously collected in Task 8.3. We demonstrated that simple linear combination or tensor product of eye movement and image features can improve the retrieval accuracy
A perceptual comparison of empirical and predictive region-of-interest video
When viewing multimedia presentations, a user only
attends to a relatively small part of the video display at any one point in time. By shifting allocation of bandwidth from peripheral areas to those locations where a user’s gaze is more likely to rest, attentive displays can be produced. Attentive displays aim to reduce resource requirements while minimizing negative user perception—understood in this paper as not only a user’s ability to assimilate and understand information but also his/her subjective satisfaction with the video content. This paper introduces and discusses a perceptual comparison between two region-of-interest display (RoID) adaptation techniques. A RoID is an attentive display where bandwidth has been preallocated around measured or highly probable areas of user gaze. In this paper, video content was manipulated using two sources of data: empirical measured data (captured using eye-tracking technology) and predictive data (calculated from the physical characteristics of the video data). Results show that display adaptation causes significant variation in users’ understanding of specific multimedia content. Interestingly, RoID adaptation and the type of video being presented both affect user perception of video quality. Moreover, the use of frame rates less than 15 frames per second, for any video adaptation technique, caused a significant reduction in user perceived quality, suggesting that although users are aware of video quality reduction, it does impact level of information assimilation and understanding. Results also highlight that user level of enjoyment is significantly affected by the type of video yet is not as affected by the quality or type of video adaptation—an interesting implication in the field of entertainment
Attention mechanisms in the CHREST cognitive architecture
In this paper, we describe the attention mechanisms in CHREST, a computational architecture of human visual expertise. CHREST organises information acquired by direct experience from the world in the form of chunks. These chunks are searched for, and verified, by a unique set of heuristics, comprising the attention mechanism. We explain how the attention mechanism combines bottom-up and top-down heuristics from internal and external sources of information. We describe some experimental evidence demonstrating the correspondence of CHREST’s perceptual mechanisms with those of human subjects. Finally, we discuss how visual attention can play an important role in actions carried out by human experts in domains such as chess
The CHREST architecture of cognition : the role of perception in general intelligence
Original paper can be found at: http://www.atlantis-press.com/publications/aisr/AGI-10/ Copyright Atlantis Press. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits non-commercial use, distribution and reproduction in any medium, provided the original work is properly cited.This paper argues that the CHREST architecture of cognition can shed important light on developing artificial general intelligence. The key theme is that "cognition is perception." The description of the main components and mechanisms of the architecture is followed by a discussion of several domains where CHREST has already been successfully applied, such as the psychology of expert behaviour, the acquisition of language by children, and the learning of multiple representations in physics. The characteristics of CHREST that enable it to account for empirical data include: self-organisation, an emphasis on cognitive limitations, the presence of a perception-learning cycle, and the use of naturalistic data as input for learning. We argue that some of these characteristics can help shed light on the hard questions facing theorists developing artificial general intelligence, such as intuition, the acquisition and use of concepts and the role of embodiment
Prediction of Search Targets From Fixations in Open-World Settings
Previous work on predicting the target of visual search from human fixations
only considered closed-world settings in which training labels are available
and predictions are performed for a known set of potential targets. In this
work we go beyond the state of the art by studying search target prediction in
an open-world setting in which we no longer assume that we have fixation data
to train for the search targets. We present a dataset containing fixation data
of 18 users searching for natural images from three image categories within
synthesised image collages of about 80 images. In a closed-world baseline
experiment we show that we can predict the correct target image out of a
candidate set of five images. We then present a new problem formulation for
search target prediction in the open-world setting that is based on learning
compatibilities between fixations and potential targets
Culture shapes how we look at faces
Background: Face processing, amongst many basic visual skills, is thought to be invariant across all humans. From as early as 1965, studies of eye movements have consistently revealed a systematic triangular sequence of fixations over the eyes and the mouth, suggesting that faces elicit a universal, biologically-determined information extraction pattern. Methodology/Principal Findings: Here we monitored the eye movements of Western Caucasian and East Asian observers while they learned, recognized, and categorized by race Western Caucasian and East Asian faces. Western Caucasian observers reproduced a scattered triangular pattern of fixations for faces of both races and across tasks. Contrary to intuition, East Asian observers focused more on the central region of the face. Conclusions/Significance: These results demonstrate that face processing can no longer be considered as arising from a universal series of perceptual events. The strategy employed to extract visual information from faces differs across cultures
Recommended from our members
Five seconds or sixty? Presentation time in expert memory
The template theory presented in Gobet and Simon (1996a, 1998) is based on the EPAM theory (Feigenbaum & Simon, 1984; Richman et al., 1995), including the numerical parameters that have been estimated in tests of the latter; and it therefore offers precise predictions for the timing of cognitive processes during the presentation and recall of chess positions. This paper describes the behavior of CHREST, a computer implementation of the template theory, in a task when the presentation time is systematically varied from one second to sixty seconds, on the recall of both game and random positions, and compares the model to human data. As predicted by the model, strong players are better than weak players with both types of positions. Their superiority with random positions is especially clear with long presentation times, but is also present after brief presentation times, although smaller in absolute value. CHREST accounts for the data, both qualitatively and quantitatively. Strong players’ superiority with random positions is explained by the large number of chunks they hold in LTM. Strong players’ high recall percentage with short presentation times is explained by the presence of templates, a special class of chunks. The model is compared to other theories of chess skill, which either cannot account for the superiority of Masters with random positions (models based on high-level descriptions and on levels of processing) or predict too strong a performance of Masters with random positions (long-term working memory)
- …