3 research outputs found
Retrieving, annotating and recognizing human activities in web videos
Recent efforts in computer vision tackle the problem of human activity understanding in video sequences. Traditionally, these algorithms require annotated video data to learn models. In this work, we introduce a novel data collection framework to take advantage of the large amount of video data available on the web. We use this new framework to retrieve videos of human activities and build training and evaluation datasets for computer vision algorithms. We rely on Amazon Mechanical Turk workers to obtain high-accuracy annotations. An agglomerative clustering technique makes it possible to obtain reliable and consistent annotations for the temporal localization of human activities in videos. Using two datasets, Olympics Sports and our novel Daily Human Activities dataset, we show that our collection/annotation framework can produce robust annotations of human activities in large amounts of video data.
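Below is a minimal sketch of how crowd-sourced temporal annotations could be merged with agglomerative clustering, as the abstract describes. The interval representation, distance threshold, and median-based consensus rule are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: merging workers' temporal annotations with agglomerative
# clustering. The threshold and median consensus rule are assumptions.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Each row is one worker's (start_sec, end_sec) annotation for an activity.
annotations = np.array([
    [12.0, 18.5],
    [11.5, 19.0],
    [12.3, 18.0],   # three workers roughly agree on one interval
    [40.0, 47.5],
    [41.0, 46.0],   # two workers agree on a second interval
])

# Group annotations whose (start, end) points lie close together.
clustering = AgglomerativeClustering(
    n_clusters=None,          # let the distance threshold decide
    distance_threshold=5.0,   # seconds of tolerance; assumed value
    linkage="average",
).fit(annotations)

# Consensus interval per cluster: the median start and end of its members.
for label in np.unique(clustering.labels_):
    members = annotations[clustering.labels_ == label]
    start, end = np.median(members, axis=0)
    print(f"cluster {label}: {len(members)} workers -> [{start:.1f}s, {end:.1f}s]")
```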
VideoMap: Video Editing in Latent Space
Video has become a dominant form of media. However, video editing interfaces
have remained largely unchanged over the past two decades. Such interfaces
typically consist of a grid-like asset management panel and a linear editing
timeline. When working with a large number of video clips, it can be difficult
to sort through them all and identify patterns within (e.g. opportunities for
smooth transitions and storytelling). In this work, we imagine a new paradigm
for video editing by mapping videos into a 2D latent space and building a
proof-of-concept interface.
Comment: Accepted to NeurIPS 2022 Workshop on Machine Learning for Creativity and Design. Website: https://chuanenlin.com/videoma
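The following is a minimal sketch of the abstract's core idea of mapping videos into a 2D latent space: embed each clip as a feature vector and project the collection to 2D so similar clips land near each other on a map-like editing view. The placeholder embeddings and t-SNE projection are assumptions for illustration; VideoMap's actual pipeline may differ.

```python
# Hedged sketch: project per-clip feature vectors to a 2D map.
# Random placeholder embeddings stand in for features from a real encoder
# (e.g. pooled frame features); the projection method is an assumption.
import numpy as np
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)

# Placeholder: one 512-d embedding per video clip.
clip_embeddings = rng.normal(size=(200, 512))

# Project to 2D so visually/semantically similar clips land near each other.
xy = TSNE(n_components=2, perplexity=30, init="pca", random_state=0).fit_transform(
    clip_embeddings
)

for i, (x, y) in enumerate(xy[:5]):
    print(f"clip {i}: map position ({x:.2f}, {y:.2f})")
```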
Videogenic: Video Highlights via Photogenic Moments
This paper investigates the challenge of extracting highlight moments from
videos. To perform this task, a system needs to understand what constitutes a
highlight for arbitrary video domains while at the same time being able to
scale across different domains. Our key insight is that photographs taken by
photographers tend to capture the most remarkable or photogenic moments of an
activity. Drawing on this insight, we present Videogenic, a system capable of
creating domain-specific highlight videos for a wide range of domains. In a
human evaluation study (N=50), we show that a high-quality photograph
collection combined with CLIP-based retrieval (which uses a neural network with
semantic knowledge of images) can serve as an excellent prior for finding video
highlights. In a within-subjects expert study (N=12), we demonstrate the
usefulness of Videogenic in helping video editors create highlight videos with
lighter workload, shorter task completion time, and better usability.
Comment: Accepted to NeurIPS 2022 Workshop on Machine Learning for Creativity and Design. Website: https://chuanenlin.com/videogeni
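Below is a minimal sketch of the abstract's CLIP-based retrieval idea: embed a domain photo collection and sampled video frames with CLIP, then rank frames by similarity to the photos as candidate highlight moments. The model choice, placeholder images, and max-similarity scoring rule are assumptions for illustration, not necessarily Videogenic's exact method.

```python
# Hedged sketch: score video frames against a photo collection with CLIP.
# Placeholder images stand in for real photos and sampled frames.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def embed(images):
    """Return L2-normalized CLIP image embeddings for a list of PIL images."""
    inputs = processor(images=images, return_tensors="pt")
    with torch.no_grad():
        feats = model.get_image_features(**inputs)
    return feats / feats.norm(dim=-1, keepdim=True)

# Placeholders: `photos` would be a curated photo collection for the domain
# (e.g. skiing) and `frames` would be frames sampled from the input video.
photos = [Image.new("RGB", (224, 224), c) for c in ["red", "green"]]
frames = [Image.new("RGB", (224, 224), c) for c in ["blue", "red", "gray"]]

photo_emb, frame_emb = embed(photos), embed(frames)

# Score each frame by its closest photo; higher = more "photogenic".
scores = (frame_emb @ photo_emb.T).max(dim=1).values
ranking = scores.argsort(descending=True)
print("frames ranked by photogenic score:", ranking.tolist())
```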