Summarizing First-Person Videos from Third Persons' Points of Views
Video highlighting or summarization is an interesting topic in computer
vision that benefits a variety of applications such as viewing, searching, and
storage. However, most existing studies rely on training data from third-person
videos, and the resulting models do not easily generalize to highlighting first-person ones. With
the goal of deriving an effective model to summarize first-person videos, we
propose a novel deep neural network architecture for describing and
discriminating vital spatiotemporal information across videos with different
points of view. Our proposed model is realized in a semi-supervised setting, in
which fully annotated third-person videos, unlabeled first-person videos, and a
small number of annotated first-person ones are presented during training. In
our experiments, qualitative and quantitative evaluations on both benchmarks
and our collected first-person video datasets are presented. Comment: 16+10 pages, ECCV 201
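The semi-supervised setup described above combines three training signals: fully annotated third-person videos, a small number of annotated first-person videos, and unlabeled first-person videos. A minimal sketch of how such signals might be combined (the weighted-sum form, the argument names, and the default weights are illustrative assumptions, not the paper's formulation):

```python
def semi_supervised_loss(l_third, l_first_labeled, l_first_unlabeled,
                         w_third=1.0, w_first=1.0, w_unsup=0.1):
    """Combine three training signals into one objective.

    l_third           : supervised loss on fully annotated third-person videos
    l_first_labeled   : supervised loss on the few annotated first-person videos
    l_first_unlabeled : unsupervised (e.g. consistency or adversarial) loss on
                        unlabeled first-person videos

    The weighted sum and the weights are assumptions for illustration.
    """
    return (w_third * l_third
            + w_first * l_first_labeled
            + w_unsup * l_first_unlabeled)
```

In practice the unsupervised weight would be tuned on a validation set; the point of the sketch is only that the three data sources contribute separate terms to one objective.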
PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation
Highlight detection models are typically trained to identify cues that make
visual content appealing or interesting for the general public, with the
objective of reducing a video to such moments. However, the "interestingness"
of a video segment or image is subjective. Thus, such highlight models provide
results of limited relevance for the individual user. On the other hand,
training one model per user is inefficient and requires large amounts of
personal information which is typically not available. To overcome these
limitations, we present a global ranking model which conditions on each
particular user's interests. Rather than training one model per user, our model
is personalized via its inputs, which allows it to effectively adapt its
predictions, given only a few user-specific examples. To train this model, we
create a large-scale dataset of users and the GIFs they created, giving us an
accurate indication of their interests. Our experiments show that using the
user history substantially improves the prediction accuracy. On our test set of
850 videos, our model improves the recall by 8% with respect to generic
highlight detectors. Furthermore, our method proves more precise than the
user-agnostic baselines even with just one person-specific example. Comment: Accepted for publication at the 2018 ACM Multimedia Conference (MM '18)
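Conditioning a single ranking model on user history, rather than training one model per user, can be sketched as below. The linear generic scorer and the mean-of-history profile are assumptions for illustration; the actual model is a deep ranking network personalized via its inputs.

```python
def personalized_score(segment_feat, user_history, w_generic):
    """Score a video segment for a specific user.

    segment_feat : feature vector of the candidate segment
    user_history : feature vectors of GIFs this user created previously
    w_generic    : weights of a generic (user-agnostic) highlight scorer

    Both terms are illustrative: a generic linear score plus the segment's
    similarity to the mean embedding of the user's history.
    """
    # Generic appeal, shared across all users.
    generic = sum(w * x for w, x in zip(w_generic, segment_feat))
    # User profile: mean of the history embeddings.
    profile = [sum(h[i] for h in user_history) / len(user_history)
               for i in range(len(segment_feat))]
    # Personal term: similarity between segment and profile.
    personal = sum(p * x for p, x in zip(profile, segment_feat))
    return generic + personal
```

With this form, two segments that a generic detector scores identically can still be ranked differently for two users whose histories point in different directions, which is the behavior the abstract describes.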
Who's Better? Who's Best? Pairwise Deep Ranking for Skill Determination
We present a method for assessing skill from video, applicable to a variety
of tasks, ranging from surgery to drawing and rolling pizza dough. We formulate
the problem as pairwise (who's better?) and overall (who's best?) ranking of
video collections, using supervised deep ranking. We propose a novel loss
function that learns discriminative features when a pair of videos exhibits
a difference in skill, and learns shared features when the pair exhibits
comparable skill levels. Results demonstrate that our method is applicable across
tasks, with the percentage of correctly ordered pairs of videos ranging from
70% to 83% for four datasets. We demonstrate the robustness of our approach via
sensitivity analysis of its parameters. We see this work as effort toward the
automated organization of how-to video collections and overall, generic skill
determination in video. Comment: CVPR 201
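The two-regime loss described above (discriminative when skills differ, shared when they are comparable) can be sketched per pair as follows; the margin value and the squared-difference similarity term are illustrative assumptions, not the paper's published formulation.

```python
def pairwise_skill_loss(score_a, score_b, label, margin=1.0):
    """Toy two-regime loss for pairwise skill ranking.

    label = +1 : video A shows higher skill than B
    label = -1 : video B shows higher skill than A
    label =  0 : the pair is of comparable skill
    """
    if label == 0:
        # Comparable skill: penalize score differences (shared features).
        return (score_a - score_b) ** 2
    # Differing skill: hinge/margin ranking loss (discriminative features).
    return max(0.0, margin - label * (score_a - score_b))
```

A correctly ordered pair whose score gap exceeds the margin contributes zero loss, so gradient updates concentrate on pairs the model still ranks wrongly or too closely.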