Aligned Image-Word Representations Improve Inductive Transfer Across Vision-Language Tasks
An important goal of computer vision is to build systems that learn visual
representations over time that can be applied to many tasks. In this paper, we
investigate a vision-language embedding as a core representation and show that
it leads to better cross-task transfer than standard multi-task learning. In
particular, the task of visual recognition is aligned to the task of visual
question answering by forcing each to use the same word-region embeddings. We
show this leads to greater inductive transfer from recognition to VQA than
standard multitask learning. Visual recognition also improves, especially for
categories that have relatively few recognition training labels but appear
often in the VQA setting. Thus, our paper takes a small step towards creating
more general vision systems by showing the benefit of interpretable, flexible,
and trainable core representations.
Comment: Accepted at ICCV 2017. The arXiv version has an extra analysis on correlation with human attention.
Telematic Immersion: Performance, Technology, and Audiovisual Work in Virtual Studies (2020) by Paulo C. Chagas
This article discusses the concept of telematic immersion developed in partnership with composer Paulo C. Chagas. Focusing on Chagas’s work Virtual Studies (2020) for flute, live electronics, and 3D video, we reflect on different aspects of artistic creativity in the telematic environment including the use of audio and video technology and the multiple connections between musicians and apparatuses. The article discusses how the parameters of chamber music in the physical environment, such as the uniqueness of the live performance, the corporality of sound, and the visual and choreographic dimension of the performance are being transformed through the virtual setting of the telematic environment. Moreover, we introduce the project Connecting Creative Communities as an example of how our research in telematic music has pedagogic applications. We show how the telematic paradigm involves not only the technology of interaction among human beings but also connections between humans, other intelligent systems, and affect
ReLoc-PDR: Visual Relocalization Enhanced Pedestrian Dead Reckoning via Graph Optimization
Accurately and reliably positioning pedestrians in satellite-denied
conditions remains a significant challenge. Pedestrian dead reckoning (PDR) is
commonly employed to estimate pedestrian location using low-cost inertial
sensors. However, PDR is susceptible to drift due to sensor noise, incorrect
step detection, and inaccurate stride length estimation. This work proposes
ReLoc-PDR, a fusion framework combining PDR and visual relocalization using
graph optimization. ReLoc-PDR leverages time-correlated visual observations and
learned descriptors to achieve robust positioning in visually-degraded
environments. A graph optimization-based fusion mechanism with the Tukey kernel
effectively corrects cumulative errors and mitigates the impact of abnormal
visual observations. Real-world experiments demonstrate that ReLoc-PDR
surpasses representative methods in accuracy and robustness, achieving reliable
pedestrian positioning using only a smartphone in challenging environments
such as low-texture corridors and dark nighttime scenarios.
Comment: 11 pages, 14 figures.
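The abstract names a Tukey robust kernel in the graph-optimization back end but does not give its form. As an illustration only, the standard Tukey biweight weight function (a sketch with an assumed tuning constant `c`, not a value taken from the paper) shows how large visual-relocalization residuals can be down-weighted to zero so that abnormal observations do not corrupt the fused trajectory:

```python
def tukey_weight(residual: float, c: float = 4.6851) -> float:
    """Tukey biweight weight for a residual.

    Residuals smaller than the cutoff c keep most of their influence;
    residuals at or beyond c receive zero weight (outlier rejection).
    The default c is the value commonly used for ~95% Gaussian efficiency;
    the actual constant used by ReLoc-PDR is not stated in the abstract.
    """
    if abs(residual) >= c:
        return 0.0  # gross outlier: discard this visual observation
    r2 = (residual / c) ** 2
    return (1.0 - r2) ** 2  # smoothly decays from 1.0 toward 0.0
```

In a graph optimizer, this weight would scale each visual-relocalization error term per iteration (iteratively reweighted least squares), which is one common way such robust kernels are applied.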
Music in Virtual Space: Theories and Techniques for Sound Spatialization and Virtual Reality-Based Stage Performance
This research explores virtual reality as a medium for live concert performance. I have realized compositions in which the individual performing on stage uses a VR head-mounted display complemented by other performance controllers to explore a composed virtual space. Movements and objects within the space are used to influence and control sound spatialization and diffusion, musical form, and sonic content. Audience members observe this in real-time, watching the performer's journey through the virtual space on a screen while listening to spatialized audio on loudspeakers variable in number and position. The major artistic challenge I will explore through this activity is the relationship between virtual space and musical form. I will also explore and document the technical challenges of this activity, resulting in a shareable software tool called the Multi-source Ambisonic Spatialization Interface (MASI), which is useful in creating a bridge between VR technologies and associated software, ambisonic spatialization techniques, sound synthesis, and audio playback and effects, and establishes a unique workflow for working with sound in virtual space.