The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart-glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting attention and
investment from companies aiming to develop commercial devices with First Person
Vision recording capabilities. Due to this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of different image features and
quantitative methods to accomplish specific objectives such as object detection,
activity recognition, and user-machine interaction. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among other aspects, the most commonly used
features, methods, challenges, and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Saliency in Augmented Reality
With the rapid development of multimedia technology, Augmented Reality (AR)
has become a promising next-generation mobile platform. The primary theory
underlying AR is human visual confusion, which allows users to perceive the
real-world scenes and augmented contents (virtual-world scenes) simultaneously
by superimposing them. To achieve good Quality of Experience (QoE), it
is important to understand the interaction between the two scenes and to
display AR contents harmoniously. However, studies on how this superimposition
influences human visual attention are lacking. Therefore, in this
paper, we mainly analyze the interaction effect between background (BG) scenes
and AR contents, and study the saliency prediction problem in AR. Specifically,
we first construct a Saliency in AR Dataset (SARD), which contains 450 BG
images, 450 AR images, and 1350 superimposed images generated by
superimposing each BG and AR image pair at three mixing levels. A large-scale
eye-tracking experiment with 60 subjects is conducted to collect eye movement
data. To better predict saliency in AR, we propose a vector-quantized
saliency prediction method and generalize it to AR saliency prediction. For
comparison, three benchmark methods are proposed and evaluated together with
our proposed method on SARD. Experimental results demonstrate the
superiority of our proposed method over the benchmark methods on both the common
saliency prediction problem and the AR saliency prediction problem. Our data
collection methodology, dataset, benchmark methods, and proposed saliency
models will be made publicly available to facilitate future research.
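The superimposition step can be illustrated with a minimal sketch. It assumes that the AR layer is linearly alpha-blended over the background. The abstract does not state the blending rule. The function `superimpose` and the three values in `MIXING_LEVELS` are therefore hypothetical choices, not the authors' settings. Grayscale images are plain nested lists so that the example has no dependencies.

```python
def superimpose(bg, ar, alpha):
    """Alpha-blend an AR image over a background image, pixel by pixel.

    bg, ar: 2D lists of grayscale intensities with identical shape.
    alpha:  weight of the AR layer in [0, 1] (hypothetical mixing level).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(bg) != len(ar) or any(len(rb) != len(ra) for rb, ra in zip(bg, ar)):
        raise ValueError("bg and ar must have the same shape")
    return [
        [alpha * a + (1.0 - alpha) * b for b, a in zip(row_bg, row_ar)]
        for row_bg, row_ar in zip(bg, ar)
    ]

# Hypothetical mixing levels: one superimposed image is produced per level.
MIXING_LEVELS = (0.25, 0.5, 0.75)

bg = [[0, 100], [200, 255]]
ar = [[255, 100], [0, 55]]
blends = {level: superimpose(bg, ar, level) for level in MIXING_LEVELS}
print(blends[0.5])  # [[127.5, 100.0], [100.0, 155.0]]
```

With 450 BG/AR pairs and three mixing levels, this procedure yields 450 × 3 = 1350 superimposed images. That count matches the size reported for SARD.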
Complexity measurement and characterization of 360-degree content
The appropriate characterization of the test material, used for subjective evaluation tests and for benchmarking image and video processing algorithms and quality metrics, can be crucial for performing comparative studies that provide useful insights. This paper focuses on the characterization of 360-degree images. We discuss why it is important to take into account the geometry of the signal and the interactive nature of 360-degree content navigation for a perceptual characterization of these signals. In particular, we show that the computation of classical indicators of spatial complexity, commonly used for 2D images, might lead to different conclusions depending on the geometrical domain used.
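The sketch below illustrates how the geometric domain can change a classical complexity indicator. It computes a spatial-information-style measure on an equirectangular image. For a still image, the SI of ITU-T P.910 reduces to the standard deviation of the Sobel gradient magnitude. The measure is computed twice:

- once treating all pixels equally (planar domain);
- once weighting each row by the cosine of its latitude, which approximates the area that row covers on the sphere.

The cosine weighting is an assumption borrowed from sphere-aware metrics such as WS-PSNR, and the paper's exact procedure may differ. `spatial_information` and the toy image are hypothetical.

```python
import math

def sobel_magnitude(img):
    """Sobel gradient magnitude at each interior pixel of a 2D grayscale list."""
    h, w = len(img), len(img[0])
    mag = {}
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            mag[(y, x)] = math.hypot(gx, gy)
    return mag

def weighted_std(values, weights):
    """Standard deviation of values under non-negative weights."""
    total = sum(weights)
    mean = sum(v * w for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return math.sqrt(var)

def spatial_information(img, spherical=False):
    """SI-like indicator: std of Sobel magnitude, optionally latitude-weighted."""
    h = len(img)
    values, weights = [], []
    for (y, _x), m in sobel_magnitude(img).items():
        # Latitude of the row centre in an equirectangular image, in radians.
        lat = (0.5 - (y + 0.5) / h) * math.pi
        values.append(m)
        weights.append(math.cos(lat) if spherical else 1.0)
    return weighted_std(values, weights)

# Toy 8x8 equirectangular image: horizontal ramps in the polar rows,
# flat grey around the equator.
H, W = 8, 8
img = [[30 * x for x in range(W)] if y < 2 or y >= H - 2 else [128] * W
       for y in range(H)]

planar_si = spatial_information(img)
spherical_si = spatial_information(img, spherical=True)
print(f"planar SI = {planar_si:.2f}, spherical SI = {spherical_si:.2f}")
```

The textured rows lie near the poles, where the equirectangular projection oversamples the sphere. As a result, the planar and spherical values differ, which is the kind of discrepancy the abstract points to.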
Spherical clustering of users navigating 360° content
In Virtual Reality (VR) applications, understanding how users explore the
omnidirectional content is important to optimize content creation, to develop
user-centric services, or even to detect disorders in medical applications.
Clustering users based on their common navigation patterns is a first step
toward understanding users' behaviour. However, classical clustering techniques
fail to identify these common paths, since they usually focus on minimizing a
simple distance metric. In this paper, we argue that minimizing the distance
metric does not necessarily identify users who experience similar
navigation paths in the VR domain. Therefore, we propose a graph-based method to
identify clusters of users who are attending the same portion of the spherical
content over time. The proposed solution takes into account the spherical
geometry of the content and aims at clustering users based on the actual
overlap of displayed content among users. Our method is tested on real VR user
navigation patterns. Results show that our solution leads to clusters in which
at least 85% of the content displayed by one user is shared among the other
users belonging to the same cluster.
Comment: 5 pages, conference (Published in: ICASSP 2019 - 2019 IEEE
International Conference on Acoustics, Speech and Signal Processing (ICASSP))
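The clustering idea can be sketched under simplifying assumptions that are ours, not the paper's:

- Each user's viewport is reduced to its centre direction (yaw and pitch in degrees), sampled at common time instants.
- Two users are linked when their centres stay within `max_angle_deg` of each other on the sphere for at least `min_fraction` of the samples. This is a proxy for the actual content overlap the authors measure.
- Users are grouped by a greedy clique partition, so that every pair inside a cluster is linked.

The thresholds and all function names are hypothetical. The greedy partition is a simple stand-in for whatever graph clustering the authors actually use.

```python
import math

def to_unit_vector(yaw_deg, pitch_deg):
    """Map a viewing direction (yaw, pitch in degrees) to the unit sphere."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))

def geodesic_deg(u, v):
    """Great-circle angle between two unit vectors, in degrees."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))

def overlap_graph(trajectories, max_angle_deg=30.0, min_fraction=0.8):
    """Link users whose viewport centres stay within max_angle_deg of each
    other in at least min_fraction of the time-aligned samples."""
    users = list(trajectories)
    adj = {u: set() for u in users}
    for i, a in enumerate(users):
        for b in users[i + 1:]:
            pairs = list(zip(trajectories[a], trajectories[b]))
            close = sum(
                geodesic_deg(to_unit_vector(*pa), to_unit_vector(*pb)) <= max_angle_deg
                for pa, pb in pairs
            )
            if pairs and close / len(pairs) >= min_fraction:
                adj[a].add(b)
                adj[b].add(a)
    return adj

def greedy_clique_clusters(adj):
    """Partition users so that every pair inside a cluster is linked."""
    clusters = []
    for user in adj:
        for cluster in clusters:
            if all(other in adj[user] for other in cluster):
                cluster.append(user)
                break
        else:
            clusters.append([user])
    return clusters

# Hypothetical (yaw, pitch) samples for four users at three time instants.
trajectories = {
    "u1": [(0, 0), (10, 0), (20, 5)],
    "u2": [(5, 0), (15, 5), (25, 0)],
    "u3": [(180, 0), (170, 0), (160, -5)],
    "u4": [(175, 10), (165, 5), (155, 0)],
}
clusters = greedy_clique_clusters(overlap_graph(trajectories))
print(clusters)  # [['u1', 'u2'], ['u3', 'u4']]
```

The design choices have two motivations:

- Measuring geodesic angles on the unit sphere, rather than Euclidean distances in (yaw, pitch), avoids the wrap-around at ±180° yaw and the distortion near the poles.
- Requiring cliques rather than connected components prevents two users from landing in the same cluster merely because both are linked to a third.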
D-SAV360: A Dataset of Gaze Scanpaths on 360° Ambisonic Videos
Understanding human visual behavior within virtual reality environments is crucial to fully leverage their potential. While previous research has provided rich visual data from human observers, existing gaze datasets often lack multimodal stimuli. Moreover, no dataset has yet gathered eye gaze trajectories (i.e., scanpaths) for dynamic content with directional ambisonic sound, a critical aspect of human sound perception. To address this gap, we introduce D-SAV360, a dataset of 4,609 head and eye scanpaths for 360° videos with first-order ambisonics. This dataset enables a more comprehensive study of the effect of multimodal interaction on visual behavior in virtual reality environments. We analyze the scanpaths collected from a total of 87 participants viewing 85 different videos and show that various factors, such as viewing mode, content type, and gender, significantly impact eye movement statistics. We demonstrate the potential of D-SAV360 as a benchmarking resource for state-of-the-art attention prediction models and discuss its possible applications in further research. By providing a comprehensive dataset of eye movement data for dynamic, multimodal virtual environments, our work can facilitate future investigations of visual behavior and attention in virtual reality.