
    Cortical Dynamics of Contextually-Cued Attentive Visual Learning and Search: Spatial and Object Evidence Accumulation

    How do humans use predictive contextual information to facilitate visual search? How are consistently paired scenic objects and positions learned and used to more efficiently guide search in familiar scenes? For example, a certain combination of objects can define a context for a kitchen and trigger a more efficient search for a typical object, such as a sink, in that context. A neural model, ARTSCENE Search, is developed to illustrate the neural mechanisms of such memory-based contextual learning and guidance, and to explain challenging behavioral data on positive/negative, spatial/object, and local/distant cueing effects during visual search. The model proposes how global scene layout at a first glance rapidly forms a hypothesis about the target location. This hypothesis is then incrementally refined by enhancing target-like objects in space as a scene is scanned with saccadic eye movements. The model clarifies the functional roles of neuroanatomical, neurophysiological, and neuroimaging data in visual search for a desired goal object. In particular, the model simulates the interactive dynamics of spatial and object contextual cueing in the cortical What and Where streams, starting from early visual areas through the medial temporal lobe to prefrontal cortex. After learning, model dorsolateral prefrontal cortical cells (area 46) prime possible target locations in posterior parietal cortex based on goal-modulated percepts of spatial scene gist represented in parahippocampal cortex, whereas model ventral prefrontal cortical cells (area 47/12) prime possible target object representations in inferior temporal cortex based on the history of viewed objects represented in perirhinal cortex. The model hereby predicts how the cortical What and Where streams cooperate during scene perception, learning, and memory to accumulate evidence over time to drive efficient visual search of familiar scenes. [Supported by CELEST, an NSF Science of Learning Center (SBE-0354378), and the SyNAPSE program of the Defense Advanced Research Projects Agency (HR0011-09-3-0001, HR0011-09-C-0011).]
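
    The evidence-accumulation idea above can be illustrated with a toy sketch (not the model's actual neural equations): a gist-based prior over candidate target locations is sharpened multiplicatively by object evidence from successive fixations. All names and numbers below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_locations = 8

# Hypothetical gist-based prior over candidate target locations
# (e.g., "sinks tend to appear along kitchen walls").
prior = rng.random(n_locations)
prior /= prior.sum()

# Hypothetical per-fixation likelihoods: how target-like the object
# at each location appeared on each of three saccadic samples.
likelihoods = rng.random((3, n_locations))

belief = prior.copy()
for evidence in likelihoods:
    belief *= evidence       # accumulate object evidence over fixations
    belief /= belief.sum()   # renormalize to a probability distribution

print("Most probable target location:", belief.argmax())
```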

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this, we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4) = 2.565, p = 0.185]. This suggests two things: (i) that Gestalt grouping is not used as a strategy in these tasks, and (ii) that it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.
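
    The radial "spoke" displacement used in the control condition can be sketched as follows. The item count and the ±1 degree step are taken from the description above; the layout and eccentricity are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
n_items = 8                               # eight rectangles, as above
angles = np.linspace(0, 2 * np.pi, n_items, endpoint=False)
eccentricity = 5.0                        # degrees from fixation (assumed)
positions = eccentricity * np.c_[np.cos(angles), np.sin(angles)]

# Shift each item +/-1 degree along its imaginary spoke from fixation.
shift = rng.choice([-1.0, 1.0], size=n_items)
radii = np.linalg.norm(positions, axis=1)
shifted = positions * ((radii + shift) / radii)[:, None]
print(shifted.round(2))
```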

    Visual attention in the real world

    Humans typically direct their gaze and attention at locations important for the tasks they are engaged in. By measuring the direction of gaze, the relative importance of each location can be estimated, which can reveal how cognitive processes choose where gaze is to be directed. For decades, this has been done in laboratory setups, which have the advantage of being well controlled. Here, visual attention is studied in more life-like situations, which allows testing the ecological validity of laboratory results and allows the use of real-life setups that are hard to mimic in a laboratory. All four studies in this thesis contribute to our understanding of visual attention and perception in more complex situations than are found in traditional laboratory experiments.

    Bottom-up models of attention use the visual input to predict attention or even the direction of gaze. In such models, the input image is first analyzed for each of several features. In the classic Saliency Map model, these features are color contrast, luminance contrast and orientation contrast. The "interestingness" of each location in the image is represented in a 'conspicuity map', one for each feature. The Saliency Map model then combines these conspicuity maps by linear addition, and this additivity has recently been challenged; the alternative is to use the maxima across all conspicuity maps. In the first study, the features color contrast and luminance contrast were manipulated in photographs of natural scenes to test which of these mechanisms is the best predictor of human behavior. It was shown that linear addition, as in the original model, matches human behavior best. As all the assumptions of the Saliency Map model on the processes preceding the linear addition of the conspicuity maps are based on physiological research, this result constrains the mechanistic assumptions of future models.

    If models of visual attention are to have ecological validity, comparing visual attention in laboratory and real-world conditions is necessary, and this is done in the second study. In the first condition, eye movements and head-centered, first-person perspective movies were recorded while participants explored 15 real-world environments ("free exploration"). Clips from these movies were shown to participants in two laboratory tasks: first, the movies were replayed as they were recorded ("video replay"), and second, a shuffled selection of frames was shown for 1 second each ("1 s frame replay"). Eye-movement recordings from all three conditions revealed that, compared to 1 s frame replay, the video replay condition was qualitatively more similar to the free exploration condition with respect to the distribution of gaze and the relationship between gaze and model saliency, and was quantitatively better able to predict free exploration gaze. Furthermore, the onset of a new frame in 1 s frame replay evoked a reorientation of gaze towards the center. That is, the event of presenting a stimulus in a laboratory setup affects attention in a way unlikely to occur in real life. In conclusion, video replay is a better model of real-world visual input.

    The hypothesis that walking on more irregular terrain requires visual attention to be directed at the path more was tested on a local street ("Hirschberg") in the third study. Participants walked on both sides of this inclined street: a cobbled road and the immediately adjacent, irregular steps. The environment and instructions were kept constant. Gaze was directed at the path more when participants walked on the steps than on the road. This was accomplished by pointing both the head and the eyes lower on the steps than on the road, while only eye-in-head orientation was spread out more along the vertical on the steps, indicating more or larger eye movements on the more irregular steps. These results confirm earlier findings that eye and head movements play distinct roles in directing gaze in real-world situations. Furthermore, they show that implicit tasks (not falling, in this case) affect visual attention as much as explicit tasks do.

    The last study asks whether actions affect perception. An ambiguous stimulus that is alternately perceived as rotating clockwise or counterclockwise (the 'percept') was used. When participants had to rotate a manipulandum continuously in a pre-defined direction, either clockwise or counterclockwise, and reported their concurrent percept with a keyboard, percepts were not affected by the movements. If participants had to use the manipulandum to indicate their percept, by rotating either congruently or incongruently with the percept, the movements did affect perception. This shows that ambiguity in visual input is resolved by relying on motor signals, but only when they are relevant for the task at hand. Either by using natural stimuli, by comparing behavior in the laboratory with behavior in the real world, by performing an experiment on the street, or by testing how two diverse but everyday sources of information are integrated, the faculty of vision was studied in more life-like situations. The validity of some laboratory work has been examined and confirmed, and some first steps toward doing experiments in real-world situations have been made. Both seem to be promising approaches for future research.
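
    A minimal sketch of the two combination rules compared in the first study: linear addition of the colour-, luminance- and orientation-contrast conspicuity maps versus their pointwise maximum. Random maps stand in for real conspicuity maps.

```python
import numpy as np

rng = np.random.default_rng(2)
h, w = 64, 64
# Random stand-ins for the colour-, luminance- and orientation-contrast maps.
maps = np.stack([rng.random((h, w)) for _ in range(3)])

saliency_additive = maps.sum(axis=0)   # classic Saliency Map combination
saliency_max = maps.max(axis=0)        # the challenged alternative

# Location each rule predicts as most salient.
print(np.unravel_index(saliency_additive.argmax(), (h, w)))
print(np.unravel_index(saliency_max.argmax(), (h, w)))
```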

    Influence of Low-Level Stimulus Features, Task Dependent Factors, and Spatial Biases on Overt Visual Attention

    Visual attention is thought to be driven by the interplay between low-level visual features and the task-dependent information content of local image regions, as well as by spatial viewing biases. Though dependent on experimental paradigms and model assumptions, this idea has given rise to varying claims that either bottom-up or top-down mechanisms dominate visual attention. To contribute toward a resolution of this discussion, here we quantify the influence of these factors and their relative importance in a set of classification tasks. Our stimuli consist of individual image patches (bubbles). For each bubble we derive three measures: a measure of salience based on low-level stimulus features, a measure of salience based on the task-dependent information content derived from our subjects' classification responses, and a measure of salience based on spatial viewing biases. Furthermore, we measure the empirical salience of each bubble based on our subjects' measured eye gazes, thus characterizing the overt visual attention each bubble receives. A multivariate linear model relates the three salience measures to overt visual attention. It reveals that all three salience measures contribute significantly. The effect of spatial viewing biases is highest and rather constant across tasks. The contribution of task-dependent information is a close runner-up; specifically, it scores highly in a standardized task of judging facial expressions. The contribution of low-level features is, on average, somewhat lower. However, in a prototypical search task without an available template, it makes a strong contribution on par with the two other measures. Finally, the contributions of the three factors are only slightly redundant, and the semi-partial correlation coefficients are only slightly lower than the coefficients for full correlations. These data provide evidence that all three measures make significant and independent contributions and that none can be neglected in a model of human overt visual attention.
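
    A hedged illustration, on synthetic data, of the kind of analysis described above: an ordinary least squares fit relating the three salience measures to empirical salience, plus a semi-partial correlation for one predictor. All variable names and coefficients are assumptions, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
low_level = rng.random(n)       # low-level feature salience per bubble
task_info = rng.random(n)       # task-dependent information salience
spatial_bias = rng.random(n)    # spatial viewing bias salience
empirical = (0.2 * low_level + 0.3 * task_info
             + 0.4 * spatial_bias + 0.1 * rng.random(n))

# Multivariate linear model: empirical salience ~ three salience measures.
X = np.c_[np.ones(n), low_level, task_info, spatial_bias]
beta, *_ = np.linalg.lstsq(X, empirical, rcond=None)
print("coefficients:", beta[1:].round(3))

# Semi-partial correlation of low_level: correlate empirical salience with
# the part of low_level not explained by the other two predictors.
Z = np.c_[np.ones(n), task_info, spatial_bias]
resid = low_level - Z @ np.linalg.lstsq(Z, low_level, rcond=None)[0]
print("semi-partial r:", np.corrcoef(empirical, resid)[0, 1].round(3))
```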

    Dynamical biomarkers in teams and other multiagent systems

    Effective team behavior in high-performance environments such as sport and the military requires individual team members to efficiently perceive unfolding task events, predict the actions and intents of the other team members, and plan and execute their own actions to simultaneously accomplish individual and collective goals. To enhance team performance through effective cooperation, it is crucial to measure the situation awareness and dynamics of each team member and how they collectively impact the team's functioning. Further, to be practically useful in real-life settings, such measures must be easily obtainable from existing sensors. This paper presents several methodologies that can be applied to positional and movement-acceleration data of team members to quantify and/or predict team performance, assess situation awareness, and help identify task-relevant information to support individual decision-making. Given the limited reporting of these methods within military cohorts, the methodologies are described using examples from team sports and teams training in virtual environments, with discussion of how they can be applied to real-world military teams.
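
    As one concrete example of a positional-data measure of the kind surveyed here, the sketch below computes a team "stretch index" (mean member distance from the team centroid) over time. The trajectories are synthetic placeholders, not data from the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
n_players, n_frames = 5, 100
# (frames, players, xy) positions, e.g. from GPS or tracking sensors.
positions = np.cumsum(rng.normal(size=(n_frames, n_players, 2)), axis=0)

# Stretch index: mean distance of team members from the team centroid.
centroid = positions.mean(axis=1, keepdims=True)
stretch = np.linalg.norm(positions - centroid, axis=2).mean(axis=1)
print("mean stretch index:", stretch.mean().round(2))
```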

    Content-prioritised video coding for British Sign Language communication

    Video communication of British Sign Language (BSL) is important for remote interpersonal communication and for the equal provision of services for deaf people. However, the use of video telephony and video conferencing applications for BSL communication is limited by inadequate video quality. BSL is a highly structured, linguistically complete, natural language system that expresses vocabulary and grammar visually and spatially using a complex combination of facial expressions (such as eyebrow movements, eye blinks and mouth/lip shapes), hand gestures, body movements and finger-spelling that change in space and time. Accurate natural BSL communication places specific demands on visual media applications, which must compress video image data for efficient transmission. Current video compression schemes reduce statistical redundancy and perceptual irrelevance in video image data based on a general model of Human Visual System (HVS) sensitivities. This thesis presents novel video image coding methods developed to reconcile the conflicting requirements of high image quality and efficient coding. Novel methods of prioritising visually important video image content for optimised video coding are developed to exploit the HVS spatial and temporal response mechanisms of BSL users (determined by eye movement tracking) and the characteristics of BSL video image content. The methods implement an accurate model of HVS foveation, applied in the spatial and temporal domains, at the pre-processing stage of a current standard-based system (H.264). Comparison of the performance of the developed and standard coding systems, using methods of video quality evaluation developed for this thesis, demonstrates improved perceived quality at low bit rates. BSL users, broadcasters and service providers benefit from the perception of high-quality video over a range of available transmission bandwidths. The research community benefits from a new approach to video coding optimisation and a better understanding of the communication needs of deaf people.
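
    A simplified sketch of foveated pre-filtering in the spirit of the approach described, assuming a known fixation point (e.g. the signer's face): peripheral pixels are progressively blurred before encoding, so the encoder spends fewer bits on detail the viewer cannot resolve. The fixation location and blur strength are illustrative assumptions, not the thesis's HVS model.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(5)
frame = rng.random((240, 320))               # stand-in for a luminance frame
fy, fx = 80, 160                             # assumed fixation (signer's face)

yy, xx = np.mgrid[0:240, 0:320]
ecc = np.hypot(yy - fy, xx - fx)
ecc /= ecc.max()                             # normalized eccentricity in [0, 1]

blurred = gaussian_filter(frame, sigma=4.0)  # heavy peripheral blur
foveated = (1 - ecc) * frame + ecc * blurred # sharp centre, blurred periphery
print(foveated.shape)
```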

    Texture Structure Analysis

    Texture analysis plays an important role in applications like automated pattern inspection, image and video compression, content-based image retrieval, remote sensing, medical imaging and document processing, to name a few. Texture Structure Analysis is the process of studying the structure present in textures. This structure can be expressed in terms of perceived regularity. The human visual system (HVS) uses perceived regularity as one of the important pre-attentive cues in low-level image understanding. Similar to the HVS, image processing and computer vision systems can make fast and efficient decisions if they can quantify this regularity automatically. In this work, the problem of quantifying the degree of perceived regularity when looking at an arbitrary texture is introduced and addressed. One key contribution of this work is an objective no-reference perceptual texture regularity metric based on visual saliency. Other key contributions include an adaptive texture synthesis method based on texture regularity, and a low-complexity reduced-reference visual quality metric for assessing the quality of synthesized textures. In order to use the best-performing visual attention model on textures, the most popular visual attention models are evaluated for their ability to predict visual saliency on textures. Since there is no publicly available database with ground-truth saliency maps on images with exclusively texture content, a new eye-tracking database is systematically built. Using the Visual Saliency Map (VSM) generated by the best visual attention model, the proposed texture regularity metric is computed. The proposed metric is based on the observation that VSM characteristics differ between textures of differing regularity, and it combines two texture regularity scores, namely a textural similarity score and a spatial distribution score. In order to evaluate the performance of the proposed regularity metric, a texture regularity database called RegTEX is built as part of this work. It is shown through subjective testing that the proposed metric has a strong correlation with the Mean Opinion Score (MOS) for the perceived regularity of textures. The proposed method is also shown to be robust to geometric and photometric transformations and outperforms some of the popular texture regularity metrics in predicting perceived regularity. The impact of the proposed metric in improving the performance of many image-processing applications is also presented. The influence of perceived texture regularity on the perceptual quality of synthesized textures is demonstrated through building a synthesized-textures database named SynTEX. It is shown through subjective testing that textures with different degrees of perceived regularity exhibit different degrees of vulnerability to artifacts resulting from different texture synthesis approaches. This work also proposes an algorithm for adaptively selecting the appropriate texture synthesis method based on the perceived regularity of the original texture. A reduced-reference texture quality metric for texture synthesis is also proposed, based on the change in perceived regularity and the change in perceived granularity between the original and the synthesized textures. The perceived granularity is quantified through a new granularity metric proposed in this work. It is shown through subjective testing that the proposed quality metric, using just two parameters, has a strong correlation with the MOS for the fidelity of synthesized textures and outperforms state-of-the-art full-reference quality metrics on three different texture databases. Finally, the ability of the proposed regularity metric to predict the perceived degradation of textures due to compression and blur artifacts is also established.
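
    The following toy score is not the proposed metric, but illustrates the underlying observation that map characteristics differ with regularity: a periodic (regular) texture yields strong off-centre peaks in the autocorrelation of its saliency-like map, while an irregular one does not. The stand-in maps below are assumptions.

```python
import numpy as np

def regularity_score(vsm: np.ndarray) -> float:
    """Strength of the largest off-centre peak in the map's autocorrelation."""
    v = vsm - vsm.mean()
    # Circular autocorrelation via the power spectrum.
    ac = np.fft.ifft2(np.abs(np.fft.fft2(v)) ** 2).real
    ac /= ac.flat[0]          # normalize so the zero-lag peak equals 1
    ac.flat[0] = 0.0          # discard the trivial zero-lag peak
    return float(ac.max())    # strong secondary peak => periodic => regular

rng = np.random.default_rng(6)
irregular = rng.random((64, 64))                 # unstructured stand-in map
regular = np.tile(rng.random((8, 8)), (8, 8))    # periodic stand-in map
print(regularity_score(regular), ">", regularity_score(irregular))
```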

    Perception and Orientation in Minimally Invasive Surgery

    During the last two decades, we have seen a revolution in the way that we perform abdominal surgery, with increased reliance on minimally invasive techniques. This paradigm shift has come at a rapid pace, with laparoscopic surgery now representing the gold standard for many surgical procedures and further minimisation of invasiveness being seen with the recent clinical introduction of novel techniques such as single-incision laparoscopic surgery and natural orifice translumenal endoscopic surgery. Despite the obvious benefits conferred on the patient in terms of morbidity, length of hospital stay and post-operative pain, this paradigm shift places significantly higher demands on the surgeon, in terms of both perception and manual dexterity. The issues involved include degradation of sensory input to the operator compared to conventional open surgery, owing to the loss of three-dimensional vision through the two-dimensional operative interface and to decreased haptic feedback from the instruments. These changes have led to a much higher cognitive load on the surgeon and a greater risk of operator disorientation, leading to potential surgical errors. This thesis represents a detailed investigation of disorientation in minimally invasive surgery. Eye tracking methodology is identified as the method of choice for evaluating behavioural patterns during orientation. An analysis framework is proposed to profile orientation behaviour using eye tracking data, validated in a laboratory model. This framework is used to characterise and quantify successful orientation strategies at critical stages of laparoscopic cholecystectomy, and these strategies are then used to show that focused teaching of this behaviour to novices can significantly increase performance on this task. Orientation strategies are then characterised for common clinical scenarios in natural orifice translumenal endoscopic surgery, and the concept of image saliency is introduced to further investigate the importance of specific visual cues associated with effective orientation. Profiling of behavioural patterns is related to performance in orientation, and implications for education and for the construction of smart surgical robots are drawn. Finally, a method for potentially decreasing operator disorientation is investigated in the form of endoscopic horizon stabilization in a simulated operative model for transgastric surgery. The major original contributions of this thesis include: (i) validation of a profiling methodology/framework to characterise orientation behaviour; (ii) identification of high-performance orientation strategies in specific clinical scenarios, including laparoscopic cholecystectomy and natural orifice translumenal endoscopic surgery; (iii) evaluation of the efficacy of teaching orientation strategies; and (iv) evaluation of automatic endoscopic horizon stabilization in natural orifice translumenal endoscopic surgery. The impact of the results presented in this thesis, as well as the potential for further high-impact research, is discussed in the context of eye tracking as an evaluation tool in minimally invasive surgery and of implementing means to combat operator disorientation in a surgical platform. The work also provides further insight into the practical implementation of computer assistance and technological innovation in future flexible access surgical platforms.
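
    The horizon-stabilization idea evaluated in the final part can be sketched simply: given the endoscope's roll angle (here assumed to come from an external sensor), each video frame is counter-rotated so the displayed horizon stays level. The OpenCV calls are standard; the roll source and frame are illustrative assumptions.

```python
import cv2
import numpy as np

def stabilize_horizon(frame: np.ndarray, roll_deg: float) -> np.ndarray:
    h, w = frame.shape[:2]
    # Rotate about the image centre by the opposite of the measured roll.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), -roll_deg, 1.0)
    return cv2.warpAffine(frame, M, (w, h))

frame = np.zeros((480, 640, 3), dtype=np.uint8)   # stand-in video frame
level = stabilize_horizon(frame, roll_deg=15.0)   # scope rolled 15 degrees
print(level.shape)
```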

    A Role for Hippocampal Sharp-wave Ripples in Active Visual Search

    Sharp-wave ripples (SWRs) in the hippocampus are thought to contribute to memory formation, though this effect has only been demonstrated in rodents. The SWR, a large deflection in the hippocampal LFP (local field potential), is known to occur primarily during slow-wave sleep and during immobility and consummatory behaviors. SWRs have widespread effects throughout the cortex and are directly implicated in memory formation: their occurrence correlates with correct performance, and their ablation impairs memory in spatial memory tasks. Though SWRs have been reported in primates, their role is poorly understood, and whether they play a role in memory formation, as they do in rodents, has yet to be confirmed. This work encompasses three separate studies with the goal of determining whether there is a link between SWR occurrence and memory formation in the macaque. Chapter 2 establishes the validity of the modified Change Blindness task as a memory task that is sensitive to normal hippocampal function in monkeys. Chapter 3 establishes that SWR events occur during waking (and stationary) activity, during visual search, in the macaque; until this work, the prevalence of SWRs in macaques during waking exploration was unknown. Chapter 4 shows that gaze during SWRs was more likely to be near the target object on repeated than on novel presentations, even after accounting for overall differences in gaze location with scene repetition. The increase in ripple likelihood near remembered visual objects suggests a link between ripples and memory in primates; specifically, SWRs may reflect part of a mechanism supporting the guidance of search based on experience. Together, these studies reveal several novel findings and establish an important step towards understanding the role that SWRs play in memory formation in the predominantly visual primate brain.
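
    For context, a conventional-style SWR detection sketch (not necessarily the pipeline used in this work): band-pass the LFP in an assumed ripple band, take the analytic envelope, and flag excursions above a z-score threshold. The sampling rate, band edges and threshold are assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 1000                              # sampling rate in Hz (assumed)
rng = np.random.default_rng(7)
lfp = rng.normal(size=10 * fs)         # stand-in for 10 s of recorded LFP

# Band-pass in an assumed ripple band, then take the analytic envelope.
b, a = butter(4, [100, 250], btype="bandpass", fs=fs)
envelope = np.abs(hilbert(filtfilt(b, a, lfp)))

# Flag candidate SWR samples exceeding a z-score threshold.
z = (envelope - envelope.mean()) / envelope.std()
ripple_samples = np.flatnonzero(z > 3)
print(f"{ripple_samples.size} supra-threshold samples")
```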