Intent Preserving 360 Video Stabilization Using Constrained Optimization
A system and method are disclosed that solve for rotational updates in 360 videos, removing camera shake while preserving the user's intended motion. The method uses a constrained nonlinear optimization approach in quaternion space. First, optimal 3D camera rotations are computed between key frames; 3D camera rotations between consecutive frames are then computed. The first, second, and third derivatives of the resulting camera path are minimized to stabilize the camera orientation path. The computation strives to find a smooth path while also limiting its deviation from the original path, so that orientations stay close to the original even when, for example, the videographer takes a turn. Each frame is then warped to the stabilized path, which results in a smoother video. The rotational camera updates may be applied to the input stream at the source or added as metadata. The technology may influence standards by making rotational-update metadata a component of 360 videos.
KEYWORDS: 360 degree video, camera rotation, removing camera shake, computing camera rotation
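The smoothing objective described above can be sketched numerically. The disclosure optimizes in quaternion space under hard constraints; the sketch below is a simplified stand-in, not the disclosed method: it represents orientations as rotation vectors (a small-angle assumption), penalizes the path's first, second, and third finite differences together with deviation from the original path, and solves the resulting quadratic problem as one linear system. The weights `lam` and `mu` are illustrative choices, not values from the source.

```python
import numpy as np

def diff_matrix(n, order):
    """Finite-difference operator of the given order, as a dense matrix."""
    D = np.eye(n)
    for _ in range(order):
        D = D[1:] - D[:-1]
    return D

def smooth_path(p0, lam=(1.0, 100.0, 1000.0), mu=1.0):
    """Smooth an (N, 3) orientation path (rotation vectors) by minimizing
    mu*||p - p0||^2 + sum_k lam[k-1]*||D_k p||^2 over p, in closed form."""
    n = len(p0)
    A = mu * np.eye(n)
    for order, weight in enumerate(lam, start=1):
        D = diff_matrix(n, order)
        A += weight * D.T @ D
    # The objective is a positive-definite quadratic, so the minimizer
    # solves A p = mu * p0 (one solve handles all three axes at once).
    return np.linalg.solve(A, mu * p0)

# Shaky yaw superimposed on a deliberate 90-degree turn: smoothing should
# suppress the jitter while keeping the intended turn.
t = np.linspace(0.0, 1.0, 200)
turn = np.deg2rad(90.0) * t                        # intended motion
shake = np.deg2rad(2.0) * np.sin(60 * np.pi * t)   # high-frequency shake
p0 = np.stack([turn + shake, np.zeros_like(t), np.zeros_like(t)], axis=1)
p = smooth_path(p0)
```

Because the soft quadratic penalty replaces the disclosure's hard deviation constraints, no nonlinear solver is needed here; the actual quaternion-space formulation with constraints would require an iterative optimizer.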
Camera Stabilization in 360° Videos and Its Impact on Cyber Sickness, Environmental Perceptions, and Psychophysiological Responses to a Simulated Nature Walk: A Single-Blinded Randomized Trial
Immersive virtual environment (IVE) technology has emerged as a valuable tool for environmental psychology research in general, and for studies of human–nature interactions in particular. However, virtual reality is known to induce cyber sickness, which limits its application and highlights the need for scientific strategies to optimize virtual experiences. In this study, we assessed the impact of improved camera stability on cyber sickness, presence, and psychophysiological responses to a simulated nature walk. In a single-blinded trial, 50 participants were assigned to watch, using a head-mounted display, one of two 10-min 360° videos showing a first-person nature walk: one video contained small-magnitude scene oscillations associated with cameraman locomotion, while in the other video the oscillations were drastically reduced by means of an electric stabilizer and a dolly. Measurements of cyber sickness (in terms of both occurrence and severity of symptoms), perceptions of the IVE (presence and perceived environmental restorativeness), and indicators of psychophysiological responses [affect, enjoyment, and heart rate (HR)] were collected before and/or after the exposure. Compared to the low-stability (LS) condition, participants in the high-stability (HS) condition reported lower severity of cyber sickness symptoms. The delta values for pre–post changes in affect for the LS video revealed a deterioration of participants’ affect profile, with a significant increase in ratings of negative affect and fatigue and a decrease in ratings of positive affect. In contrast, there were no pre–post changes in affect for the HS video. No differences were found between the HS and LS conditions with respect to presence, perceived environmental restorativeness, enjoyment, and HR. Cyber sickness was significantly correlated with all components of affect and enjoyment, but not with presence, perceived environmental restorativeness, or HR.
These findings demonstrate that improved camera stability in 360° videos is crucial to reduce cyber sickness symptoms and negative affective responses in IVE users. The lack of associations between improved stability and presence, perceived environmental restorativeness, and HR suggests that other aspects of IVE technology must be taken into account in order to improve virtual experiences of nature.
A robust and efficient video representation for action recognition
This paper introduces a state-of-the-art video representation and applies it
to efficient action recognition and detection. We first propose to improve the
popular dense trajectory features by explicit camera motion estimation. More
specifically, we extract feature point matches between frames using SURF
descriptors and dense optical flow. The matches are used to estimate a
homography with RANSAC. To improve the robustness of homography estimation, a
human detector is employed to remove matches on the human body, since human
motion is generally not consistent with the camera motion. Trajectories
consistent with the homography are attributed to camera motion and removed. We also
use the homography to cancel out camera motion from the optical flow. This
results in significant improvement on motion-based HOF and MBH descriptors. We
further explore the recent Fisher vector as an alternative feature encoding
approach to the standard bag-of-words histogram, and consider different ways to
include spatial layout information in these encodings. We present a large and
varied set of evaluations, considering (i) classification of short basic
actions on six datasets, (ii) localization of such actions in feature-length
movies, and (iii) large-scale recognition of complex events. We find that our
improved trajectory features significantly outperform previous dense
trajectories, and that Fisher vectors are superior to bag-of-words encodings
for video recognition tasks. In all three tasks, we show substantial
improvements over the state of the art.
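The camera-motion step above (point matches, then a homography fit robustified with RANSAC) can be sketched in NumPy. This is an illustrative reimplementation, not the authors' code: a minimal DLT homography solver inside a basic RANSAC loop, without the coordinate normalization or refinement a production pipeline (e.g. OpenCV's `cv2.findHomography` with `cv2.RANSAC`) would apply.

```python
import numpy as np

def fit_homography(src, dst):
    """Direct Linear Transform: 3x3 homography from >= 4 point pairs."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)          # null vector = flattened homography
    return H / H[2, 2] if abs(H[2, 2]) > 1e-12 else np.eye(3)

def apply_h(H, pts):
    """Apply a homography to an (N, 2) array of points."""
    p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return p[:, :2] / p[:, 2:3]

def ransac_homography(src, dst, iters=200, thresh=2.0, seed=0):
    """Repeatedly fit on 4 random matches, keep the model with the most
    inliers, then refit on all inliers of the best model."""
    rng = np.random.default_rng(seed)
    best = np.zeros(len(src), dtype=bool)
    for _ in range(iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        inliers = np.linalg.norm(apply_h(H, src) - dst, axis=1) < thresh
        if inliers.sum() > best.sum():
            best = inliers
    return fit_homography(src[best], dst[best]), best

# Synthetic check: 50 matches following a known homography plus 10
# outliers, standing in for matches on independently moving people.
rng = np.random.default_rng(1)
H_true = np.array([[1.0, 0.02, 5.0], [-0.01, 1.0, -3.0], [1e-4, 0.0, 1.0]])
src = rng.uniform(0, 100, (60, 2))
dst = apply_h(H_true, src)
dst[:10] += rng.uniform(20, 40, (10, 2))      # corrupt 10 matches
H_est, inliers = ransac_homography(src, dst)
```

In the paper's pipeline, the human-detector mask would remove person-region matches before this estimation, and the recovered homography would then be used to warp the flow field and cancel the camera-induced component.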
Domain-Specific Face Synthesis for Video Face Recognition from a Single Sample Per Person
The performance of still-to-video FR systems can decline significantly
because faces captured in an unconstrained operational domain (OD) over multiple
video cameras have a different underlying data distribution compared to faces
captured under controlled conditions in the enrollment domain (ED) with a still
camera. This is particularly true when individuals are enrolled to the system
using a single reference still. To improve the robustness of these systems, it
is possible to augment the reference set by generating synthetic faces based on
the original still. However, without knowledge of the OD, many synthetic images
must be generated to account for all possible capture conditions. FR systems
may, therefore, require complex implementations and yield lower accuracy when
training on many less relevant images. This paper introduces an algorithm for
domain-specific face synthesis (DSFS) that exploits the representative
intra-class variation information available from the OD. Prior to operation, a
compact set of faces from unknown persons appearing in the OD is selected
through clustering in the capture condition space. The domain-specific
variations of these face images are projected onto the reference stills by
integrating an image-based face relighting technique inside the 3D
reconstruction framework. A compact set of synthetic faces that resemble the
individuals of interest under the capture conditions relevant to the OD is
then generated. In a particular implementation based on sparse representation
classification, the synthetic faces generated with the DSFS are employed to
form a cross-domain dictionary that accounts for structured sparsity.
Experimental results reveal that augmenting the reference gallery set of FR
systems using the proposed DSFS approach can provide a higher level of accuracy
compared to state-of-the-art approaches, with only a moderate increase in
computational complexity.
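The selection step (clustering OD faces in the capture-condition space and keeping one representative per cluster) can be sketched with plain k-means. Everything below is a hypothetical stand-in: the two "condition" features (say, head pose and illumination level) and the data are invented for illustration, and the paper's actual condition space and clustering method may differ.

```python
import numpy as np

def kmeans(X, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centers, labels)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers, labels

# Hypothetical capture-condition features for 120 OD face captures:
# column 0 = head pose (rad), column 1 = illumination level (arbitrary).
rng = np.random.default_rng(1)
conditions = np.vstack([rng.normal(mean, 0.3, size=(40, 2))
                        for mean in [(-1.0, -1.0), (0.0, 1.0), (1.5, 0.0)]])
centers, labels = kmeans(conditions, k=3)
# Keep the face whose conditions lie closest to each cluster centre as the
# representative exemplar used for domain-specific synthesis.
reps = [int(np.argmin(np.linalg.norm(conditions - c, axis=1)))
        for c in centers]
```

The compact set of exemplars (here indexed by `reps`) is what the DSFS step would project onto the reference stills; in practice the condition features would come from pose and illumination estimators, not synthetic Gaussians.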