
    Multi-modal surrogates for retrieving and making sense of videos: is synchronization between the multiple modalities optimal?

    Video surrogates can help people quickly make sense of the content of a video before downloading or seeking more detailed information. Visual and audio features of a video are primary information carriers and might become important components of video retrieval and video sense-making. In the past decades, most research and development efforts on video surrogates have focused on visual features of the video, and comparatively little work has been done on audio surrogates and their pros and cons in aiding users' retrieval and sense-making of digital videos. Even less work has been done on multi-modal surrogates, where more than one modality is employed for consuming the surrogates, for example, the audio and visual modalities. This research examined the effectiveness of a number of multi-modal surrogates and investigated whether synchronization between the audio and visual channels is optimal. A user study was conducted to evaluate six different surrogates on a set of six recognition and inference tasks to answer two main research questions: (1) How do automatically-generated multi-modal surrogates compare to manually-generated ones in video retrieval and video sense-making? and (2) Does synchronization between multiple surrogate channels enhance or inhibit video retrieval and video sense-making? Forty-eight participants took part in the study, in which the surrogates were measured on the time participants spent experiencing the surrogates, the time they spent on the tasks, their performance accuracy on the tasks, their confidence in their task responses, and their subjective ratings of the surrogates. On average, the uncoordinated surrogates were more helpful than the coordinated ones, but the manually-generated surrogates were more helpful than the automatically-generated ones only in terms of task completion time. Participants' subjective ratings were more favorable for the coordinated surrogate C2 (Magic A + V) and the uncoordinated surrogate U1 (Magic A + Storyboard V) with respect to usefulness, usability, enjoyment, and engagement. The post-session questionnaire comments demonstrated participants' preference for the coordinated surrogates, but the comments also revealed the value of having uncoordinated sensory channels.

    Feedback-Based Gameplay Metrics and Gameplay Performance Segmentation: An audio-visual approach for assessing player experience.

    Gameplay metrics is an approach that is growing in popularity amongst the game studies research community for its capacity to assess players' engagement with game systems. Yet little has been done, to date, to quantify players' responses to the feedback that games convey to players, i.e., their audio-visual streams. The present thesis introduces a novel approach to player experience assessment, termed feedback-based gameplay metrics, which seeks to gather gameplay metrics from the audio-visual feedback streams presented to the player during play. So far, gameplay metrics - quantitative data about a game state and the player's interaction with the game system - have been logged directly via the game's source code. The need to utilise source code restricts the range of games that researchers can analyse. By using computer science algorithms for audio-visual processing, yet to be employed for processing gameplay footage, the present thesis seeks to extract similar metrics from the audio-visual streams, thus circumventing the need for access to source code, whilst also proposing a method that focuses on describing the way gameplay information is broadcast to the player during play. In order to operationalise feedback-based gameplay metrics, the present thesis introduces the concept of gameplay performance segmentation, which describes how coherent segments of play can be identified and extracted from lengthy game play sessions. Moreover, in order both to contextualise the method for processing metrics and to provide a conceptual framework for analysing the results of a feedback-based gameplay metric segmentation, a multi-layered architecture based on five gameplay concepts (system, game world instance, spatial-temporal, degree of freedom and interaction) is also introduced. Finally, based on data gathered from game play sessions with participants, the present thesis discusses the validity of feedback-based gameplay metrics, gameplay performance segmentation and the multi-layered architecture. A software system has also been specifically developed to produce gameplay summaries based on feedback-based gameplay metrics, and examples of summaries (based on several games) are presented and analysed. The present thesis also demonstrates that feedback-based gameplay metrics can be analysed conjointly with other forms of data (such as biometry) in order to build a more complete picture of the game play experience. Feedback-based gameplay metrics constitutes a post-processing approach that allows the researcher or analyst to explore the data however, and as many times as, they wish. The method can also process any audio-visual file, and can therefore handle material from a range of audio-visual sources. This novel methodology brings together game studies and computer science by extending the range of games that can be researched and by providing a viable account of the exact way players experience games.
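    The segmentation idea described above can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical stand-in for the thesis's richer audio-visual processing: it carves recorded gameplay footage into coherent segments by flagging abrupt changes in the video stream (e.g., cuts to loading or death screens) using simple frame differencing in OpenCV. The threshold value and the choice of frame differencing are assumptions for illustration, not the method actually used in the thesis.

```python
# Minimal sketch of a feedback-based segmentation step: split a recorded
# play session at abrupt visual feedback changes. Frame differencing is
# a simple stand-in for the thesis's audio-visual processing; the
# threshold is illustrative.
import cv2
import numpy as np

def segment_gameplay(video_path: str, threshold: float = 40.0):
    """Return (start_frame, end_frame) pairs for coherent segments."""
    cap = cv2.VideoCapture(video_path)
    segments, seg_start, frame_idx = [], 0, 0
    ok, prev = cap.read()
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY) if ok else None
    while ok:
        ok, frame = cap.read()
        if not ok:
            break
        frame_idx += 1
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Mean absolute pixel difference between consecutive frames.
        diff = np.abs(gray.astype(np.float32)
                      - prev_gray.astype(np.float32)).mean()
        if diff > threshold:  # abrupt feedback change -> segment boundary
            segments.append((seg_start, frame_idx - 1))
            seg_start = frame_idx
        prev_gray = gray
    segments.append((seg_start, frame_idx))
    cap.release()
    return segments
```

    Because the input is an ordinary video file rather than engine internals, the same routine could in principle be run over footage from any platform, which is precisely the portability argument the thesis makes.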

    Unsupervised video indexing on audiovisual characterization of persons

    This thesis proposes a method for the unsupervised characterization of persons in audiovisual documents, exploiting data related to their physical appearance and their voice. In general, automatic recognition methods, whether in video or audio, require a large amount of a priori knowledge about the content. In this work, the goal is to study the two modalities in a correlated way and to exploit their respective properties collaboratively and robustly, in order to produce a reliable result that is as independent as possible of any a priori knowledge. More particularly, we studied the characteristics of the audio stream and proposed several methods for speaker segmentation and clustering, which we evaluated in a French evaluation campaign. We then carried out an in-depth study of visual descriptors (face, clothing), which led us to propose novel approaches for detecting, tracking, and clustering people within a document. Finally, the work focused on audio-video fusion, proposing an approach based on the computation of a cooccurrence matrix that allowed us to establish an association between the audio index and the video index and to correct them. We can thus produce a dynamic audiovisual model of the speakers.
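    The fusion step lends itself to a compact illustration. The sketch below, under stated assumptions, builds a cooccurrence matrix by accumulating the temporal overlap between speaker clusters (as produced by audio diarization) and person clusters (as produced by face/clothing tracking), then derives an audio-video association with the Hungarian algorithm. The input format and cluster counts are hypothetical, and the thesis's index-correction step is not reproduced here.

```python
# Minimal sketch of the audio-video association described above: count
# how long each speaker cluster co-occurs with each visual person
# cluster, then pick the assignment maximising total cooccurrence.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(audio_segs, video_segs, n_speakers, n_persons):
    """audio_segs / video_segs: lists of (start, end, cluster_id) tuples."""
    cooc = np.zeros((n_speakers, n_persons))
    for a_start, a_end, spk in audio_segs:
        for v_start, v_end, per in video_segs:
            overlap = min(a_end, v_end) - max(a_start, v_start)
            if overlap > 0:
                cooc[spk, per] += overlap  # seconds of joint presence
    # Hungarian algorithm on negated counts -> max-cooccurrence matching.
    rows, cols = linear_sum_assignment(-cooc)
    return dict(zip(rows.tolist(), cols.tolist())), cooc
```

    Once each speaker is matched to a visual identity, disagreements between the two indexes can be inspected via the off-diagonal mass of the cooccurrence matrix, which is the kind of signal a correction step would exploit.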

    Scientific poster session


    2011 IMSAloquium, Student Investigation Showcase

    Inquiry Without Boundaries reflects our students’ infinite possibilities to explore their unique passions, develop new interests, and collaborate with experts around the globe.

    Change blindness: eradication of gestalt strategies

    Arrays of eight texture-defined rectangles were used as stimuli in a one-shot change blindness (CB) task where there was a 50% chance that one rectangle would change orientation between two successive presentations separated by an interval. CB was eliminated by cueing the target rectangle in the first stimulus, reduced by cueing in the interval, and unaffected by cueing in the second presentation. This supports the idea that a representation was formed that persisted through the interval before being 'overwritten' by the second presentation (Landman et al., 2003, Vision Research 43, 149–164). Another possibility is that participants used some kind of grouping or Gestalt strategy. To test this we changed the spatial positions of the rectangles in the second presentation by shifting them along imaginary spokes (by ±1 degree) emanating from the central fixation point. There was no significant difference in performance between this and the standard task [F(1,4)=2.565, p=0.185]. This may suggest two things: (i) Gestalt grouping is not used as a strategy in these tasks, and (ii) it gives further weight to the argument that objects may be stored in and retrieved from a pre-attentional store during this task.

    Live Video and Image Recolouring for Colour Vision Deficient Patients

    Colour Vision Deficiency (CVD) is an important issue for a significant population across the globe. There are several types of CVD, such as monochromacy, dichromacy, trichromacy, and anomalous trichromacy, and each of these categories contains further subtypes. The aim of this research is to devise a scheme that addresses CVD by varying the way colours are plotted to pixels, capturing colour disparities and performing colour compensation. The proposed scheme recolours video and images by varying the contrast of each colour for CVD patients and, depending on the type of deficiency, is able to provide live results. Different types of CVD can be identified and compensated for by changing the particular colours related to them; based on the type of deficiency, the scheme performs an RGB (Red, Green, and Blue) to LMS (Long, Medium, and Short) transformation. This supports colour identification as well as adjustment of colour contrast. The processing and rendering of recoloured video and images allows patients with CVD to perceive corrected shades in the recoloured frames of videos, images, and other types of files. In this thesis, we propose an efficient recolouring algorithm with a strong focus on real-time applications that is capable of providing different recoloured outputs based on the specific type of CVD.
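    As a rough illustration of the RGB-to-LMS pipeline mentioned above, the sketch below uses the widely circulated daltonization matrices to simulate protanopia in LMS space and then shifts the lost red information into the channels a protanope can see. It is a generic instance of this family of techniques, not the thesis's exact algorithm; the matrices and the error-redistribution weights are assumptions.

```python
# Minimal sketch of daltonization-style recolouring for protanopia:
# RGB -> LMS, simulate the dichromat's view, then redistribute the
# perceptual error back into visible channels.
import numpy as np

RGB2LMS = np.array([[17.8824, 43.5161, 4.11935],
                    [3.45565, 27.1554, 3.86714],
                    [0.0299566, 0.184309, 1.46709]])
PROTAN = np.array([[0.0, 2.02344, -2.52581],  # L reconstructed from M, S
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])
SHIFT = np.array([[0.0, 0.0, 0.0],            # push lost red into G and B
                  [0.7, 1.0, 0.0],
                  [0.7, 0.0, 1.0]])

def recolour_protan(rgb):
    """rgb: float array of shape (..., 3) with values in [0, 255]."""
    flat = rgb.reshape(-1, 3).T                    # 3 x N pixel matrix
    lms = RGB2LMS @ flat                           # RGB -> LMS
    sim = np.linalg.inv(RGB2LMS) @ (PROTAN @ lms)  # simulated protanope view
    error = flat - sim                             # information lost to CVD
    return np.clip(flat + SHIFT @ error, 0, 255).T.reshape(rgb.shape)
```

    Because the whole pipeline is a handful of matrix multiplications per frame, it maps naturally onto GPU shaders, which is consistent with the real-time emphasis of the work.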

    Functional MRI and behavioral investigations of long-term memory-guided visuospatial attention

    Real-world human visual perception is superb, despite pervasive attentional capacity limitations that can severely impact behavioral performance. Long-term memory (LTM) is suggested to play a key role in efficiently deploying attentional resources; however, the nature of LTM-attention interactions remains poorly understood. Here, I present a series of behavioral and functional magnetic resonance imaging (fMRI) investigations of the mechanisms of LTM-guided visual attention in 139 healthy participants (18-34 years). In Experiment 1, I hypothesized that humans can use memory to guide spatial attention to multiple discrete locations that have been previously studied. Participants were able to simultaneously attend to more than one spatial location using an LTM cue in a novel change-detection behavioral paradigm also used in fMRI Experiments 2 and 4. Cortical networks associated with LTM and attention often interact competitively. In Experiment 2, I hypothesized that the cognitive control network supports cooperation between LTM and attention. Three posterior regions involved with cognitive control were more strongly recruited for LTM-guided attention than for stimulus-guided attention: the posterior precuneus, posterior callosal sulcus, and lateral intraparietal sulcus. In Experiment 3, I hypothesized that regions identified in Experiment 2 are specifically activated for LTM-guided attention, not for LTM retrieval or stimulus-guided attention alone. This hypothesis was supported. Taken together, the results of Experiments 2 and 3 identify a cognitive control subnetwork specifically recruited for LTM-guided attention. Experiment 4 tested how LTM-guided attention affected the spatial responsivity of maps within the intraparietal sulcus. I hypothesized that left parietal maps would change their spatial responsivity due to the left-lateralized effects of memory retrieval. During stimulus-guided attention, contralateral visuotopic maps in the right but not left intraparietal sulcus responded to the full visual field. In contrast, during LTM-guided attention, maps in both the left and right intraparietal sulcus responded to the full visual field, providing evidence for complementary forms of dynamic recruitment under different attentional conditions. Together, these results demonstrate that LTM-guided attention is supported by a parietal subnetwork within the cognitive control network and that internal attentional states influence the spatial specificity of visuotopically mapped regions in parietal cortex.