The Evolution of First Person Vision Methods: A Survey
The emergence of new wearable technologies such as action cameras and
smart glasses has increased the interest of computer vision scientists in the
First Person perspective. Nowadays, this field is attracting the attention and
investment of companies aiming to develop commercial devices with First Person
Vision recording capabilities. Given this interest, an increasing demand for
methods to process these videos, possibly in real time, is expected. Current
approaches present particular combinations of image features and
quantitative methods to accomplish specific objectives such as object detection,
activity recognition, and user-machine interaction. This paper summarizes
the evolution of the state of the art in First Person Vision video analysis
between 1997 and 2014, highlighting, among others, the most commonly used features,
methods, challenges, and opportunities within the field.
Comment: First Person Vision, Egocentric Vision, Wearable Devices, Smart
Glasses, Computer Vision, Video Analytics, Human-machine Interaction
Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention
We propose a method to detect individualized highlights for users on given
target videos based on their preferred highlight clips marked on previous
videos they have watched. Our method explicitly leverages the contents of both
the preferred clips and the target videos using pre-trained features for the
objects and the human activities. We design a multi-head attention mechanism to
adaptively weigh the preferred clips based on their object- and
human-activity-based contents, and fuse them using these weights into a single
feature representation for each user. We compute similarities between these
per-user feature representations and the per-frame features computed from the
desired target videos to estimate the user-specific highlight clips from the
target videos. We test our method on a large-scale highlight detection dataset
containing the annotated highlights of individual users. Compared to current
baselines, we observe an absolute improvement of 2-4% in the mean average
precision of the detected highlights. We also perform extensive ablation
experiments on the number of preferred highlight clips associated with each
user as well as on the object- and human-activity-based feature representations
to validate that our method is indeed both content-based and user-specific.
Comment: 14 pages, 5 figures, 7 tables
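The pipeline this abstract describes (attention-weighted fusion of a user's preferred-clip features into a single user vector, then similarity scoring against per-frame target features) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name, the projection matrices `W_q`/`W_k`, the use of the mean clip feature as the query, and cosine similarity as the scoring rule are all assumptions made for the sketch.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def user_highlight_scores(clip_feats, target_frames, W_q, W_k, n_heads=4):
    """Hypothetical sketch: multi-head attention weighs a user's
    preferred-clip features (C, d) and fuses them into one user
    vector, which is then compared with per-frame target features
    (T, d) to produce per-frame highlight scores."""
    d = clip_feats.shape[1]
    dh = d // n_heads
    # Mean clip feature serves as the query; clips act as keys/values.
    q = (clip_feats.mean(0) @ W_q).reshape(n_heads, dh)        # (H, dh)
    k = (clip_feats @ W_k).reshape(-1, n_heads, dh)            # (C, H, dh)
    attn = softmax(np.einsum('hd,chd->hc', q, k) / np.sqrt(dh))  # (H, C)
    v = clip_feats.reshape(-1, n_heads, dh)                    # (C, H, dh)
    user = np.einsum('hc,chd->hd', attn, v).reshape(d)         # fused user vector
    # Cosine similarity of each target frame against the user vector.
    t = target_frames / np.linalg.norm(target_frames, axis=1, keepdims=True)
    u = user / np.linalg.norm(user)
    return t @ u                                               # (T,) scores
```

Frames whose score exceeds a threshold (or the top-k frames) would then be taken as the user-specific highlight clips.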
Activity-driven content adaptation for effective video summarisation
In this paper, we present a novel method for content adaptation and video summarization fully implemented in the compressed domain. Firstly, summarization of generic videos is modeled as the process of extracting human objects under various activities/events. Accordingly, frames are classified into five categories via fuzzy decision, including shot changes (cut and gradual transitions), motion activities (camera motion and object motion) and others, by using two inter-frame measurements. Secondly, human objects are detected using Haar-like features. With the detected human objects and the attained frame categories, an activity level for each frame is determined to adapt to the video content. Continuous frames belonging to the same category are grouped to form one activity entry as content of interest (COI), which converts the original video into a series of activities. An overall adjustable quota is used to control the size of the generated summary for efficient streaming. Given this quota, the frames selected for the summary are determined by evenly sampling the accumulated activity levels for content adaptation. Quantitative evaluations have proved the effectiveness and efficiency of our proposed approach, which provides a more flexible and general solution for this topic, as domain-specific tasks such as accurate recognition of objects can be avoided.
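The quota-driven selection step described above (evenly sampling the accumulated activity curve so that high-activity regions contribute more frames to the summary) can be sketched as below. The function name and the exact tie-breaking via `searchsorted` are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def summarise_by_activity(activity_levels, quota):
    """Hypothetical sketch: pick `quota` frame indices by evenly
    sampling the cumulative activity curve, so stretches with high
    activity yield more selected frames than quiet stretches."""
    cum = np.cumsum(activity_levels, dtype=float)
    if cum[-1] == 0:
        return []                       # no activity, nothing to summarise
    # Evenly spaced targets along the accumulated-activity axis.
    targets = np.linspace(cum[-1] / quota, cum[-1], quota)
    # First frame index whose cumulative activity reaches each target.
    idx = np.searchsorted(cum, targets)
    return sorted(set(int(i) for i in idx))
```

For example, with activity levels `[0, 0, 5, 0, 5, 0]` and a quota of 2, the two selected frames both fall on the high-activity frames, skipping the idle ones.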