    Visual Recognition for Dynamic Scenes

    Recognition memory was investigated for naturalistic dynamic scenes. Although visual recognition for static objects and scenes has been studied previously and found to be extremely robust in terms of fidelity and retention, visual recognition for dynamic scenes has received much less attention. In four experiments, participants viewed a number of clips from novel films and then completed a recognition test containing frames from the previously viewed films together with difficult foil frames. Recognition performance was good when foils were taken from other parts of the same film (Experiment 1), but degraded greatly when foils were taken from unseen gaps within the viewed footage (Experiments 3 and 4). Removing all non-target frames had a marked effect on recognition performance (Experiment 2). Across all experiments, presenting the films as a random series of clips seemed to have no effect on recognition performance. Patterns of accuracy and response latency in Experiments 3 and 4 appear to result from a serial-search process. It is concluded that visual representations of dynamic scenes may be stored as units of events, and that participants' old/new judgments of individual frames were better characterized by a cued-recall paradigm than by traditional recognition judgments. (Dissertation/Thesis, Ph.D. Psychology, 201)

    Eye Movements during dynamic scene viewing are affected by visual attention skills and events of the scene: Evidence from first-person shooter gameplay videos

    The role of individual differences during dynamic scene viewing was explored. Participants (N = 38) watched a gameplay video of a first-person shooter (FPS) videogame while their eye movements were recorded. In addition, the participants’ skills in three visual attention tasks (attentional blink, visual search, and multiple object tracking) were assessed. The results showed that individual differences in the visual attention tasks were associated with the eye movement patterns observed while viewing the gameplay video. The differences were evident in four eye movement measures: number of fixations, fixation durations, saccade amplitudes, and fixation distances from the center of the screen. The individual differences emerged during specific events of the video as well as across the video as a whole. The results highlight that an unedited, fast-paced, and cluttered dynamic scene can bring about individual differences in dynamic scene viewing.
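
    As a rough illustration of how the four measures named above can be derived from eye-tracker output, the sketch below computes them from a hypothetical list of fixations; the Fixation record, the screen geometry, and the pixels-per-degree conversion are assumptions for illustration, not details taken from the study.

```python
# Minimal sketch (not the study's analysis code): deriving the four eye
# movement measures named in the abstract from hypothetical fixation data.
from dataclasses import dataclass
from math import hypot

@dataclass
class Fixation:            # hypothetical eye-tracker record
    x: float               # horizontal gaze position in pixels
    y: float               # vertical gaze position in pixels
    duration_ms: float     # fixation duration in milliseconds

SCREEN_CENTER = (960.0, 540.0)   # assumed 1920x1080 display
PIXELS_PER_DEGREE = 40.0         # assumed conversion for saccade amplitudes

def eye_movement_measures(fixations: list[Fixation]) -> dict[str, float]:
    if not fixations:
        raise ValueError("at least one fixation is required")
    n = len(fixations)
    mean_duration = sum(f.duration_ms for f in fixations) / n
    # Saccade amplitude approximated as the distance between consecutive fixations.
    amplitudes = [
        hypot(b.x - a.x, b.y - a.y) / PIXELS_PER_DEGREE
        for a, b in zip(fixations, fixations[1:])
    ]
    mean_amplitude = sum(amplitudes) / len(amplitudes) if amplitudes else 0.0
    mean_center_distance = sum(
        hypot(f.x - SCREEN_CENTER[0], f.y - SCREEN_CENTER[1]) for f in fixations
    ) / n
    return {
        "number_of_fixations": float(n),
        "mean_fixation_duration_ms": mean_duration,
        "mean_saccade_amplitude_deg": mean_amplitude,
        "mean_distance_from_center_px": mean_center_distance,
    }
```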

    IST Austria Thesis

    This thesis describes a brittle fracture simulation method for visual effects applications. Building upon a symmetric Galerkin boundary element method, we first compute stress intensity factors following the theory of linear elastic fracture mechanics. We then use these stress intensities to simulate the motion of a propagating crack front at a significantly higher resolution than the overall deformation of the breaking object. Allowing for spatial variations of the material's toughness during crack propagation produces visually realistic, highly detailed fracture surfaces. Furthermore, we introduce approximations for stress intensities and crack opening displacements, resulting in both a practical speed-up and theoretically superior runtime complexity compared to previous methods. While we choose a quasi-static approach to fracture mechanics, ignoring dynamic deformations, we also couple our fracture simulation framework to a standard rigid-body dynamics solver, enabling visual effects artists to simulate both large-scale motion and fracturing due to collision forces in a combined system. As fractures inside an object grow, their geometry must be represented both in the coarse boundary element mesh and at the desired fine output resolution. Using a boundary element method, we avoid complicated volumetric meshing operations. Instead, we describe a simple set of surface meshing operations that allow us to progressively add cracks to the mesh of an object and still reuse all previously computed entries of the linear boundary element system matrix. On the high-resolution level, we opt for an implicit surface representation. We then describe how to capture fracture surfaces during crack propagation, as well as how to separate the individual fragments resulting from the fracture process, based on this implicit representation. We show results obtained with our method, either solving the full boundary element system in every time step or alternatively using our fast approximations. These results demonstrate that both methods perform well in basic test cases and produce realistic fracture surfaces. Furthermore, we show that our fast approximations substantially outperform the standard approach in more demanding scenarios. Finally, the two methods naturally combine, using the full solution while the problem size is manageably small and switching to the fast approximations later on. The resulting hybrid method gives the user a direct way to choose between the speed and accuracy of the simulation.
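
    For context, the stress intensity factors mentioned above are the K in the standard linear elastic fracture mechanics relations below. These are textbook forms, not equations quoted from the thesis; the spatially varying toughness K_c(x) corresponds to the material toughness that the abstract allows to vary along the crack front, E and nu are Young's modulus and Poisson's ratio, and mu is the shear modulus.

```latex
% Textbook linear elastic fracture mechanics relations (not quoted from the thesis).
\begin{align}
  % Near-tip stress field: the stress intensity factor K scales the singular term
  \sigma_{ij}(r,\theta) &\approx \frac{K}{\sqrt{2\pi r}}\, f_{ij}(\theta) \\
  % Irwin relation between the energy release rate G and the mode I--III factors
  G &= \frac{K_I^2 + K_{II}^2}{E'} + \frac{K_{III}^2}{2\mu},
      \qquad E' = \frac{E}{1-\nu^2} \ \text{(plane strain)} \\
  % Quasi-static propagation with spatially varying toughness K_c (mode I form)
  K_I(\mathbf{x}) &\ge K_c(\mathbf{x})
      \quad\Rightarrow\quad \text{the crack front advances at } \mathbf{x}
\end{align}
```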

    MANIPULATION ACTION UNDERSTANDING FOR OBSERVATION AND EXECUTION

    Modern intelligent agents will need to learn the actions that humans perform. They will need to recognize these actions when they see them, and they will need to perform these actions themselves. We propose a cognitive system that interprets human manipulation actions from perceptual information (image and depth data) and that consists of perceptual modules and reasoning modules in interaction with each other. The contributions of this work are organized along two core problems at the heart of action understanding: a) the grounding of relevant information about actions in perception (the perception-action integration problem), and b) the organization of perceptual and high-level symbolic information for interpreting the actions (the sequencing problem). At the high level, actions are represented with the Manipulation Action Context-free Grammar (MACFG), a syntactic grammar and associated parsing algorithms, which organizes actions as a sequence of sub-events. Each sub-event is described by the hand (as well as the grasp type), the movements (actions), and the objects and tools involved, and the relevant information about these quantities is obtained from biologically inspired perception modules. These modules track the hands and objects and recognize the hand grasp, the actions, their segmentation, and the action consequences. Furthermore, a probabilistic semantic parsing framework based on CCG (Combinatory Categorial Grammar) theory is adopted to model the semantic meaning of human manipulation actions. Additionally, the lesson from the findings on mirror neurons is that the two processes of interpreting visually observed actions and generating actions should share the same underlying cognitive process. Recent studies have shown that grammatical structures underlie the representation of manipulation actions and are used both to understand and to execute these actions. By analogy, understanding manipulation actions is like understanding language, while executing them is like generating language. Experiments on two tasks, 1) a robot observing people performing manipulation actions, and 2) a robot then executing manipulation actions accordingly, are presented to validate the formalism. The technical parts of this thesis are devoted to the experimental setting of task (1), while task (2) is given as a live demonstration.
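
    As a toy illustration of the sub-event structure described above (and not the MACFG formalism itself), the sketch below represents a manipulation action as an ordered sequence of sub-events, each carrying the hand, grasp type, movement, and objects or tools involved; the class names and the cucumber-cutting example are hypothetical.

```python
# Toy illustration (not the MACFG formalism itself): a manipulation action as an
# ordered sequence of sub-events, each described by the hand, grasp type,
# movement, and the objects and tools involved, as outlined in the abstract.
from dataclasses import dataclass, field

@dataclass
class SubEvent:                     # hypothetical structure, for illustration only
    hand: str                       # e.g. "right"
    grasp: str                      # recognized grasp type, e.g. "power grasp"
    movement: str                   # recognized action, e.g. "cut"
    objects: list[str] = field(default_factory=list)   # objects and tools involved

@dataclass
class ManipulationAction:
    label: str
    sub_events: list[SubEvent]

# A hand-written example parse for a hypothetical "cut a cucumber" action.
cut_cucumber = ManipulationAction(
    label="cut cucumber",
    sub_events=[
        SubEvent("right", "power grasp", "grasp", ["knife"]),
        SubEvent("right", "power grasp", "cut", ["knife", "cucumber"]),
        SubEvent("right", "power grasp", "release", ["knife"]),
    ],
)

# Observation and execution share this representation: a robot executing the
# action would walk the same sub-event sequence in order.
for ev in cut_cucumber.sub_events:
    print(f"{ev.hand} hand ({ev.grasp}): {ev.movement} -> {ev.objects}")
```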

    Audio-visual saliency prediction for 360° video via deep learning.

    The interest in virtual reality (VR) has grown rapidly in recent years, and the technology is now widely available to consumers in different forms. It provides an unprecedented level of immersion, creating many new possibilities that could change the way people experience digital content. Understanding how users behave and interact with virtual experiences could be decisive for many different applications, such as designing better virtual experiences, advanced compression techniques, or medical diagnosis. One of the most critical areas in the study of human behaviour is visual attention. It refers to the qualities that make different items stand out and attract our attention. Despite significant advances in this field in recent years, saliency prediction remains a very challenging problem due to the many factors that affect the behaviour of the observer, such as stimulus sources of different types or users with different backgrounds and emotional states. On top of that, saliency prediction for VR content is even more difficult, as this form of media presents additional challenges such as distortions, users having control of the camera, or stimuli possibly being located outside the observer's current view. This work proposes a novel saliency prediction solution for 360° video based on deep learning. Deep learning has been proven to obtain outstanding results in many different image and video tasks, including saliency prediction. Although most works in this field focus solely on visual information, the proposed model incorporates both visual and directional audio information with the objective of obtaining more accurate predictions. It uses a series of convolutional neural networks (CNNs) specially designed for VR content, and it is able to learn spatio-temporal visual and auditory features by using three-dimensional convolutions. It is the first solution to make use of directional audio without the need for a hand-crafted attention modelling technique. The proposed model is evaluated using a publicly available dataset. The results show that it outperforms previous state-of-the-art work in both quantitative and qualitative analyses. Additionally, various ablation studies are presented, supporting the decisions made during the design phase of the model.
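
    As a minimal sketch of the kind of two-branch 3D-convolutional model described above, the code below fuses spatio-temporal visual features with a directional-audio representation to predict per-frame saliency; the channel counts, tensor shapes, the equirectangular audio energy-map input, and the class name are assumptions of this sketch, not the thesis's actual architecture.

```python
# Minimal sketch (not the thesis's model): a two-branch 3D-convolutional network
# that fuses spatio-temporal visual and directional-audio features into a
# per-frame saliency map for 360-degree video.
import torch
import torch.nn as nn

class AudioVisualSaliency3D(nn.Module):
    def __init__(self):
        super().__init__()
        # Visual branch: an RGB clip of T frames -> spatio-temporal features.
        self.visual = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Audio branch: directional audio rendered as per-frame energy maps
        # aligned with the equirectangular video (an assumption of this sketch).
        self.audio = nn.Sequential(
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Fusion and prediction of one saliency channel per frame.
        self.fuse = nn.Conv3d(32 + 16, 1, kernel_size=1)

    def forward(self, video, audio_map):
        # video:     (batch, 3, T, H, W) equirectangular RGB frames
        # audio_map: (batch, 1, T, H, W) directional audio energy per location
        feats = torch.cat([self.visual(video), self.audio(audio_map)], dim=1)
        return torch.sigmoid(self.fuse(feats))   # (batch, 1, T, H, W) saliency

# Example forward pass on a tiny random clip.
model = AudioVisualSaliency3D()
saliency = model(torch.randn(1, 3, 8, 64, 128), torch.randn(1, 1, 8, 64, 128))
```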