    Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks

    An important goal of computer vision is to build systems that learn visual representations over time that can be applied to many tasks. In this paper, we investigate a vision-language embedding as a core representation and show that it leads to better cross-task transfer than standard multi-task learning. In particular, the task of visual recognition is aligned to the task of visual question answering by forcing each to use the same word-region embeddings. We show this leads to greater inductive transfer from recognition to VQA than standard multi-task learning. Visual recognition also improves, especially for categories that have relatively few recognition training labels but appear often in the VQA setting. Thus, our paper takes a small step towards creating more general vision systems by showing the benefit of interpretable, flexible, and trainable core representations. Comment: Accepted in ICCV 2017. The arXiv version has an extra analysis on correlation with human attention.
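    The shared word-region embedding idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dimensions, function names, and max-pooling choice are invented for clarity. The key point is that both task heads score image regions against words in the same embedding space.

```python
import numpy as np

rng = np.random.default_rng(0)

# One embedding table shared by BOTH tasks (hypothetical sizes).
vocab_size, embed_dim = 1000, 64
word_embeddings = rng.standard_normal((vocab_size, embed_dim))

def region_word_scores(region_features, word_ids):
    """Cosine similarity between image-region features and word
    embeddings, computed in the shared embedding space."""
    words = word_embeddings[word_ids]                                  # (W, D)
    r = region_features / np.linalg.norm(region_features, axis=1, keepdims=True)
    w = words / np.linalg.norm(words, axis=1, keepdims=True)
    return r @ w.T                                                     # (R, W)

def recognition_logits(region_features, class_word_ids):
    """Recognition head: max-pool region-word scores over regions,
    giving one score per category word."""
    return region_word_scores(region_features, class_word_ids).max(axis=0)

def vqa_logits(region_features, answer_word_ids):
    """VQA head: reuses the SAME embedding table to score candidate
    answer words, so gradients from both tasks shape one space."""
    return region_word_scores(region_features, answer_word_ids).max(axis=0)
```

    Because both heads read and write the same `word_embeddings`, training on one task moves the representation used by the other, which is the mechanism behind the inductive transfer the abstract describes.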

    Telematic Immersion: Performance, Technology, and Audiovisual Work in Virtual Studies (2020) by Paulo C. Chagas

    This article discusses the concept of telematic immersion developed in partnership with composer Paulo C. Chagas. Focusing on Chagas’s work Virtual Studies (2020) for flute, live electronics, and 3D video, we reflect on different aspects of artistic creativity in the telematic environment, including the use of audio and video technology and the multiple connections between musicians and apparatuses. The article discusses how the parameters of chamber music in the physical environment, such as the uniqueness of the live performance, the corporality of sound, and the visual and choreographic dimension of the performance, are being transformed through the virtual setting of the telematic environment. Moreover, we introduce the project Connecting Creative Communities as an example of how our research in telematic music has pedagogic applications. We show how the telematic paradigm involves not only the technology of interaction among human beings but also connections between humans, other intelligent systems, and affect

    ReLoc-PDR: Visual Relocalization Enhanced Pedestrian Dead Reckoning via Graph Optimization

    Accurately and reliably positioning pedestrians in satellite-denied conditions remains a significant challenge. Pedestrian dead reckoning (PDR) is commonly employed to estimate pedestrian location using low-cost inertial sensors. However, PDR is susceptible to drift due to sensor noise, incorrect step detection, and inaccurate stride length estimation. This work proposes ReLoc-PDR, a fusion framework combining PDR and visual relocalization using graph optimization. ReLoc-PDR leverages time-correlated visual observations and learned descriptors to achieve robust positioning in visually degraded environments. A graph optimization-based fusion mechanism with the Tukey kernel effectively corrects cumulative errors and mitigates the impact of abnormal visual observations. Real-world experiments demonstrate that our ReLoc-PDR surpasses representative methods in accuracy and robustness, achieving accurate and robust pedestrian positioning using only a smartphone in challenging environments such as low-texture corridors and dark nighttime scenes. Comment: 11 pages, 14 figures.
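    The Tukey kernel mentioned above is a robust loss whose weight drops to zero for large residuals, so abnormal visual observations stop influencing the optimization. A minimal sketch of the Tukey biweight function follows; the threshold value is the conventional 4.685 (for unit-variance residuals), not a parameter taken from the paper.

```python
def tukey_weight(residual, c=4.685):
    """Tukey biweight: weight 1.0 at zero residual, decaying smoothly
    to 0.0 for |residual| >= c, so gross outliers are ignored entirely."""
    if abs(residual) >= c:
        return 0.0
    return (1.0 - (residual / c) ** 2) ** 2
```

    In an iteratively reweighted least-squares or factor-graph setting, each visual relocalization residual would be multiplied by this weight each iteration, which is how cumulative PDR drift can be corrected without letting a single bad visual match drag the trajectory off course.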

    Music in Virtual Space: Theories and Techniques for Sound Spatialization and Virtual Reality-Based Stage Performance

    This research explores virtual reality as a medium for live concert performance. I have realized compositions in which the individual performing on stage uses a VR head-mounted display, complemented by other performance controllers, to explore a composed virtual space. Movements and objects within the space are used to influence and control sound spatialization and diffusion, musical form, and sonic content. Audience members observe this in real time, watching the performer's journey through the virtual space on a screen while listening to spatialized audio on loudspeakers variable in number and position. The major artistic challenge I will explore through this activity is the relationship between virtual space and musical form. I will also explore and document the technical challenges of this activity, resulting in a shareable software tool called the Multi-source Ambisonic Spatialization Interface (MASI). MASI creates a bridge between VR technologies and associated software, ambisonic spatialization techniques, sound synthesis, and audio playback and effects, and establishes a unique workflow for working with sound in virtual space.
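    The ambisonic spatialization that MASI builds on starts from first-order B-format encoding: a mono signal is distributed across four channels according to its direction of arrival. A minimal sketch of the standard encoding equations is below (generic textbook formulas with FuMa-style W scaling, not MASI's actual implementation):

```python
import math

def encode_bformat(sample, azimuth, elevation):
    """First-order ambisonic (B-format) encoding of one mono sample
    at a source direction given in radians (FuMa W scaling)."""
    w = sample * (1.0 / math.sqrt(2.0))               # omnidirectional
    x = sample * math.cos(azimuth) * math.cos(elevation)  # front-back
    y = sample * math.sin(azimuth) * math.cos(elevation)  # left-right
    z = sample * math.sin(elevation)                      # up-down
    return w, x, y, z
```

    Because the encoded channels are independent of the loudspeaker layout, the same B-format stream can later be decoded to loudspeakers "variable in number and position", which matches the performance setup the abstract describes.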