Summarizing First-Person Videos from Third Persons' Points of Views
Video highlighting and summarization are active topics in computer
vision that benefit a variety of applications such as viewing, searching, and
storage. However, most existing studies rely on training data from third-person
videos and cannot easily generalize to highlighting first-person ones. With
the goal of deriving an effective model to summarize first-person videos, we
propose a novel deep neural network architecture for describing and
discriminating vital spatiotemporal information across videos with different
points of view. Our proposed model is realized in a semi-supervised setting, in
which fully annotated third-person videos, unlabeled first-person videos, and a
small number of annotated first-person ones are available during training.
In our experiments, we present qualitative and quantitative evaluations on both
benchmark datasets and our collected first-person video datasets.
Comment: 16+10 pages, ECCV 201
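A rough sketch of the semi-supervised setup this abstract describes might look as follows, assuming a shared encoder, a per-segment highlight scorer, and a viewpoint discriminator trained adversarially across the three data sources. The module choices, feature dimensions, and loss weighting below are illustrative assumptions, not the paper's actual architecture:

```python
# Minimal sketch of semi-supervised cross-viewpoint highlight training.
# All names, dimensions, and the 0.1 adversarial weight are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, HID_DIM = 512, 128  # assumed feature sizes

encoder = nn.GRU(FEAT_DIM, HID_DIM, batch_first=True)  # spatiotemporal encoder
scorer = nn.Linear(HID_DIM, 1)                          # per-segment highlight score
discriminator = nn.Linear(HID_DIM, 1)                   # third- vs. first-person

def highlight_loss(feats, labels):
    """Supervised loss on videos that have highlight annotations."""
    h, _ = encoder(feats)                               # (B, T, HID_DIM)
    logits = scorer(h).squeeze(-1)                      # (B, T)
    return F.binary_cross_entropy_with_logits(logits, labels)

def domain_loss(feats, is_first_person):
    """Viewpoint-discrimination loss; in full adversarial training the
    encoder would receive the reversed gradient of this term."""
    h, _ = encoder(feats)
    logits = discriminator(h.mean(dim=1)).squeeze(-1)   # (B,)
    target = torch.full_like(logits, float(is_first_person))
    return F.binary_cross_entropy_with_logits(logits, target)

# One training step over the three data sources named in the abstract.
third_feats = torch.randn(8, 30, FEAT_DIM); third_lbls = torch.rand(8, 30).round()
first_lbl_feats = torch.randn(2, 30, FEAT_DIM); first_lbls = torch.rand(2, 30).round()
first_unlbl_feats = torch.randn(8, 30, FEAT_DIM)        # unlabeled first-person

loss = (highlight_loss(third_feats, third_lbls)
        + highlight_loss(first_lbl_feats, first_lbls)   # few labeled first-person
        + 0.1 * (domain_loss(third_feats, False)        # assumed adversarial weight
                 + domain_loss(first_unlbl_feats, True)))
loss.backward()
```

The unlabeled first-person videos contribute only through the viewpoint term here, which is one plausible way to read "discriminating vital spatiotemporal information across videos with different points of view."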
Leveraging Contextual Cues for Generating Basketball Highlights
The massive growth of sports videos has resulted in a need for automatic
generation of sports highlights that are comparable in quality to the
hand-edited highlights produced by broadcasters such as ESPN. Unlike previous
works that mostly use audio-visual cues derived from the video, we propose an
approach that additionally leverages contextual cues derived from the
environment that the game is being played in. The contextual cues provide
information about the excitement levels in the game, which can be ranked and
selected to automatically produce high-quality basketball highlights. We
introduce a new dataset of 25 NCAA games along with their play-by-play stats
and the ground-truth excitement data for each basket. We explore the
informativeness of five different cues derived from the video and from the
environment through user studies. Our experiments show that for our study
participants, the highlights produced by our system are comparable to the ones
produced by ESPN for the same games.
Comment: Proceedings of ACM Multimedia 201
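The ranking-and-selection step this abstract describes could be sketched as below; the cue names, weights, and greedy time-budget selection are assumptions for illustration (the paper grounds its rankings in user studies rather than fixed weights):

```python
# Hedged sketch: score each basket segment from weighted contextual cues,
# then greedily pick the most exciting baskets into a highlight reel.
from dataclasses import dataclass

@dataclass
class BasketSegment:
    start: float   # seconds into the game video
    end: float
    cues: dict     # cue name -> normalized score in [0, 1]

CUE_WEIGHTS = {    # assumed example weights, not the paper's
    "crowd_noise": 0.3,
    "commentator_excitement": 0.25,
    "score_differential": 0.2,
    "time_remaining": 0.15,
    "player_importance": 0.1,
}

def excitement(seg: BasketSegment) -> float:
    """Weighted combination of audio-visual and environmental cue scores."""
    return sum(w * seg.cues.get(name, 0.0) for name, w in CUE_WEIGHTS.items())

def make_highlights(segments, budget_seconds=120.0):
    """Pick the most exciting baskets until the time budget is spent,
    then restore chronological order for playback."""
    picked, used = [], 0.0
    for seg in sorted(segments, key=excitement, reverse=True):
        dur = seg.end - seg.start
        if used + dur <= budget_seconds:
            picked.append(seg)
            used += dur
    return sorted(picked, key=lambda s: s.start)
```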
Show Me What I Like: Detecting User-Specific Video Highlights Using Content-Based Multi-Head Attention
We propose a method to detect individualized highlights for users on given
target videos based on their preferred highlight clips marked on previous
videos they have watched. Our method explicitly leverages the contents of both
the preferred clips and the target videos using pre-trained features for the
objects and the human activities. We design a multi-head attention mechanism to
adaptively weigh the preferred clips based on their object- and
human-activity-based contents, and fuse them using these weights into a single
feature representation for each user. We compute similarities between these
per-user feature representations and the per-frame features computed from the
desired target videos to estimate the user-specific highlight clips from the
target videos. We test our method on a large-scale highlight detection dataset
containing the annotated highlights of individual users. Compared to current
baselines, we observe an absolute improvement of 2-4% in the mean average
precision of the detected highlights. We also perform extensive ablation
experiments on the number of preferred highlight clips associated with each
user as well as on the object- and human-activity-based feature representations
to validate that our method is indeed both content-based and user-specific.
Comment: 14 pages, 5 figures, 7 tables
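The attention-based fusion and similarity scoring this abstract describes might be sketched as follows, with PyTorch's nn.MultiheadAttention standing in for the paper's content-based multi-head attention; the learned pooling query, feature size, and similarity threshold are illustrative assumptions:

```python
# Minimal sketch: fuse a user's preferred clips into one feature via
# multi-head attention, then score target-video frames by similarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

FEAT_DIM, N_HEADS = 256, 4  # assumed sizes

class UserHighlighter(nn.Module):
    def __init__(self):
        super().__init__()
        self.attn = nn.MultiheadAttention(FEAT_DIM, N_HEADS, batch_first=True)
        self.query = nn.Parameter(torch.randn(1, 1, FEAT_DIM))  # learned pooling query

    def user_feature(self, clip_feats):
        """Fuse a user's preferred clips (N, FEAT_DIM) into one vector."""
        fused, _ = self.attn(self.query, clip_feats[None], clip_feats[None])
        return fused.squeeze(0).squeeze(0)                       # (FEAT_DIM,)

    def forward(self, clip_feats, frame_feats, threshold=0.5):
        """Score target-video frames (T, FEAT_DIM) against the user feature."""
        u = self.user_feature(clip_feats)
        sims = F.cosine_similarity(frame_feats, u[None], dim=-1)  # (T,)
        return sims > threshold                                   # per-frame highlight mask

model = UserHighlighter()
clips = torch.randn(5, FEAT_DIM)     # e.g., object + activity features per clip
frames = torch.randn(300, FEAT_DIM)  # per-frame features of the target video
mask = model(clips, frames)
```

In this reading, the attention weights play the role of the adaptive per-clip weighting the abstract mentions, and the cosine similarities give the user-specific per-frame highlight scores.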