Are all the frames equally important?
In this work, we address the problem of measuring and predicting temporal video saliency - a metric that defines the importance of a video frame for human attention. Unlike conventional spatial saliency, which locates the salient regions within a frame (as is done for still images), temporal saliency considers the importance of a frame as a whole and cannot be defined apart from its context. We propose an interactive cursor-based interface for collecting experimental data about temporal saliency. We collect the first human responses and analyze them. Qualitatively, the produced scores closely track semantic changes in the video; quantitatively, they are highly correlated across observers. We also show that the proposed tool can simultaneously collect fixations similar to those produced by an eye tracker, at a much lower cost. Furthermore, this approach may be used to create the first temporal saliency datasets, which will allow training computational predictive algorithms. The proposed interface does not rely on any special equipment, which allows it to be run remotely and to reach a wide audience.
Comment: CHI'20 Late Breaking Work
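The abstract's quantitative claim - that per-frame scores are highly correlated between observers - can be made concrete with a short sketch. Everything below (function names, data shapes, the demo array) is illustrative and not taken from the paper:

```python
# A minimal sketch of an inter-observer consistency check for per-frame
# temporal-saliency scores. Shapes and names are assumptions, not the
# paper's actual pipeline.
import itertools
import numpy as np

def inter_observer_correlation(scores):
    """scores: array of shape (n_observers, n_frames) with per-frame responses."""
    scores = np.asarray(scores, dtype=float)
    pairs = itertools.combinations(range(len(scores)), 2)
    corrs = [np.corrcoef(scores[i], scores[j])[0, 1] for i, j in pairs]  # Pearson r
    return float(np.mean(corrs))

# Hypothetical demo: three observers scoring five frames; strong agreement
# on which frames matter yields a mean correlation close to 1.
demo = [[0.1, 0.2, 0.9, 0.8, 0.3],
        [0.2, 0.1, 0.8, 0.9, 0.2],
        [0.1, 0.3, 0.7, 0.9, 0.4]]
print(inter_observer_correlation(demo))
```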
Applications of eye tracking for region-of-interest HEVC compression
The growth of video streaming services and video resolutions has led to an explosion in Internet video traffic. New video coding standards, such as High Efficiency Video Coding (HEVC), have been developed to mitigate this inevitable video data explosion through better compression. The aim of video coding is to reduce the video size while maintaining the best possible perceived quality. Region of Interest (ROI) encoding addresses this objective directly by identifying the areas that humans pay the most attention to and encoding them at higher quality than the non-ROI areas.
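As a rough illustration of the ROI idea, the sketch below turns a per-pixel saliency map into a block-level QP-offset map, lowering QP (raising quality) in salient blocks. The block size, offsets, and threshold are assumptions; how such a map is fed to a given encoder (e.g., a delta-QP input) is implementation-specific and not specified above.

```python
# A minimal sketch: per-pixel saliency -> per-block QP offsets.
# Negative offsets mean finer quantization (higher quality) for ROI blocks.
import numpy as np

def roi_qp_map(saliency, block=64, roi_offset=-5, bg_offset=3, thresh=0.5):
    """saliency: float array (H, W) in [0, 1]; returns a per-block QP-offset map."""
    h, w = saliency.shape
    rows, cols = (h + block - 1) // block, (w + block - 1) // block
    qp = np.empty((rows, cols), dtype=int)
    for r in range(rows):
        for c in range(cols):
            tile = saliency[r * block:(r + 1) * block, c * block:(c + 1) * block]
            qp[r, c] = roi_offset if tile.mean() > thresh else bg_offset
    return qp
```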
Methods for finding the ROI, and video encoding in general, take advantage of the Human Visual System (HVS). Computational HVS models can be used for ROI detection, but all current state-of-the-art models are designed for still images. Eye tracking data can be used to create and verify such models, including models suitable for video, which in turn calls for a reliable way to collect eye tracking data. Eye tracking glasses support the widest range of usage scenarios of all eye tracking equipment; therefore, the glasses are used in this work to collect eye tracking data for 41 different videos.
The main contribution of this work is a real-time system that uses eye tracking data to enhance the perceived quality of video. The proposed system uses video recorded from the scene camera of the eye tracking glasses and the open-source Kvazaar HEVC encoder for video compression. The system was shown to provide better subjective quality than the native rate control algorithm of Kvazaar. The results were evaluated with Eye tracking Weighted PSNR (EWPSNR), which represents the HVS better than traditional PSNR. The system achieves up to 33% bit rate reduction at the same EWPSNR, and on average a 5-10% reduction depending on the parameter set. Additionally, encoding time is reduced by 8-20%.
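EWPSNR is, in essence, a PSNR whose per-pixel squared error is weighted by a gaze-derived map, so distortion in fixated regions dominates the score. The sketch below follows the common weighted-MSE definition; the exact weight construction used in the work above may differ.

```python
# A minimal weighted-PSNR sketch: the weights would typically come from a
# Gaussian placed at each recorded fixation. Assumes 8-bit frames.
import numpy as np

def ewpsnr(ref, dist, weights, max_val=255.0):
    """ref, dist: frames of equal shape; weights: non-negative map, same shape."""
    ref, dist, w = (np.asarray(a, dtype=float) for a in (ref, dist, weights))
    wmse = np.sum(w * (ref - dist) ** 2) / np.sum(w)  # gaze-weighted MSE
    return 10.0 * np.log10(max_val ** 2 / wmse)
```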
Automatic detection of salient objects and spatial relations in videos for a video database system
Multimedia databases have gained popularity due to rapidly growing quantities of multimedia data and the need to index, retrieve, and analyze this data efficiently. One downside of multimedia databases is the need to process the data for feature extraction and labeling prior to storage and querying. The huge amount of data makes it impossible to complete this task manually. We propose a tool for the automatic detection and tracking of salient objects, and for the derivation of spatio-temporal relations between them, in video. Our system aims to significantly reduce the manual work of selecting and labeling objects: by detecting and tracking the salient objects, it requires a label to be entered only once per object within each shot, instead of specifying labels for each object in every frame in which it appears. This is also a required first step toward a fully automatic video database management system in which the labeling itself is automated. The proposed framework comprises a scalable architecture for video processing and stages for shot boundary detection, salient object detection and tracking, and knowledge-base construction for effective spatio-temporal object querying.
(c) 2008 Elsevier B.V. All rights reserved.
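Of the stages listed, shot boundary detection is the most self-contained; a classic histogram-difference baseline is sketched below. It is not the paper's detector, just the standard starting point for this stage, with an assumed threshold.

```python
# A minimal shot-boundary sketch: flag a cut when the color-histogram
# distance between consecutive frames exceeds a threshold.
import cv2

def shot_boundaries(video_path, threshold=0.5):
    cap = cv2.VideoCapture(video_path)
    prev_hist, cuts, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        hist = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8],
                            [0, 256, 0, 256, 0, 256])
        hist = cv2.normalize(hist, hist).flatten()
        if prev_hist is not None:
            # Bhattacharyya distance: ~0 for similar frames, ~1 across a cut
            if cv2.compareHist(prev_hist, hist, cv2.HISTCMP_BHATTACHARYYA) > threshold:
                cuts.append(idx)
        prev_hist, idx = hist, idx + 1
    cap.release()
    return cuts
```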
Glimpse: A gaze-based measure of temporal salience
Temporal salience considers how visual attention varies over time. Although visual salience has been widely studied from a spatial perspective, its temporal dimension has been mostly ignored, despite arguably being of utmost importance for understanding how attention evolves on dynamic content. To address this gap, we propose GLIMPSE, a novel measure that computes temporal salience based on the observer-spatio-temporal consistency of raw gaze data. The measure is conceptually simple, training-free, and provides a semantically meaningful quantification of visual attention over time. As an extension, we explore scoring algorithms that estimate temporal salience from spatial saliency maps predicted by existing computational models; however, these approaches generally fall short when compared with our gaze-based measure. GLIMPSE could serve as the basis for downstream tasks such as video segmentation or summarization. GLIMPSE's software and data are publicly available.
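In the spirit of the measure (without reproducing its exact formulation), temporal salience can be approximated by how tightly observers' gaze points cluster in each frame: tight clustering means consistent attention. The sketch below is an illustrative approximation, not GLIMPSE itself; the input shape and the dispersion-to-salience mapping are assumptions.

```python
# A sketch of gaze-consistency-based temporal salience.
import numpy as np

def temporal_salience(gaze):
    """gaze: array (n_observers, n_frames, 2) of raw gaze coordinates."""
    gaze = np.asarray(gaze, dtype=float)
    center = gaze.mean(axis=0)                                   # (n_frames, 2)
    disp = np.linalg.norm(gaze - center, axis=-1).mean(axis=0)   # per-frame spread
    sal = 1.0 / (1.0 + disp)                                     # tight cluster -> high salience
    rng = sal.max() - sal.min()
    return (sal - sal.min()) / (rng if rng else 1.0)             # normalize to [0, 1]
```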