Summarizing First-Person Videos from Third Persons' Points of Views
Video highlighting and summarization are among the interesting topics in computer vision, benefiting a variety of applications such as viewing, searching, and storage. However, most existing studies rely on training data of third-person videos, and the resulting models cannot easily generalize to highlighting first-person ones. With
the goal of deriving an effective model to summarize first-person videos, we
propose a novel deep neural network architecture for describing and
discriminating vital spatiotemporal information across videos with different
points of view. Our proposed model is realized in a semi-supervised setting, in
which fully annotated third-person videos, unlabeled first-person videos, and a
small number of annotated first-person ones are presented during training. In
our experiments, qualitative and quantitative evaluations on both benchmarks
and our collected first-person video datasets are presented. Comment: 16+10 pages, ECCV 2018
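The abstract only sketches the training setup, but the semi-supervised mix of data sources is concrete enough to illustrate. Below is a minimal, hypothetical sketch of how such an objective could combine a supervised highlight loss on both annotated sources with an adversarial domain-confusion term that aligns first- and third-person features; `scorer`, `domain_disc`, and the exact pairing of losses are assumptions, not the authors' published architecture.

```python
import torch
import torch.nn.functional as F

# Hypothetical sketch combining the three data sources named in the abstract.
# `scorer` maps per-segment features to highlight logits; `domain_disc`
# tries to tell third-person features from first-person ones. Neither module
# nor this exact loss design comes from the paper.
def semi_supervised_loss(scorer, domain_disc,
                         third_x, third_y,      # fully annotated third-person
                         first_x_lab, first_y,  # few annotated first-person
                         first_x_unlab,         # unlabeled first-person
                         lam=0.1):
    # Supervised highlight losses on both annotated sources.
    sup = F.binary_cross_entropy_with_logits(scorer(third_x), third_y)
    sup = sup + F.binary_cross_entropy_with_logits(scorer(first_x_lab), first_y)

    # Domain-confusion term: push unlabeled first-person features toward the
    # third-person feature distribution so the supervision transfers.
    d3 = domain_disc(third_x)
    d1 = domain_disc(first_x_unlab)
    adv = F.binary_cross_entropy_with_logits(d3, torch.ones_like(d3)) + \
          F.binary_cross_entropy_with_logits(d1, torch.zeros_like(d1))

    return sup + lam * adv
```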
Hierarchical Video Generation from Orthogonal Information: Optical Flow and Texture
Learning to represent and generate videos from unlabeled data is a very
challenging problem. To generate realistic videos, it is important not only to
ensure that the appearance of each frame is realistic, but also to ensure the plausibility of the video's motion and the consistency of its appearance over time. The process of video generation should be divided according to
these intrinsic difficulties. In this study, we focus on the motion and
appearance information as two important orthogonal components of a video, and
propose Flow-and-Texture-Generative Adversarial Networks (FTGAN) consisting of
FlowGAN and TextureGAN. In order to avoid a huge annotation cost, we have to
explore a way to learn from unlabeled data. Thus, we employ optical flow as
motion information to generate videos. FlowGAN generates optical flow, which captures only the edges and motion of the videos to be generated. TextureGAN, in turn, specializes in adding texture to the optical flow generated by FlowGAN. This hierarchical approach yields more realistic videos with plausible motion and consistent appearance. Our experiments show that our model generates videos with more plausible motion and also achieves significantly improved performance on unsupervised action classification in comparison to previous GAN-based works. In addition, because our model generates videos from two independent sources of information, it can produce new combinations of motion and attributes that are not seen in the training data, such as a video in which a person is doing
sit-ups on a baseball ground. Comment: Our supplemental material is available at http://www.mi.t.u-tokyo.ac.jp/assets/publication/hierarchical_video_generation_sup/ Accepted to AAAI 2018
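The two-stage decomposition is the core idea: generate motion first, then render appearance conditioned on it. The toy PyTorch sketch below illustrates only that wiring; the layer shapes and module names are invented for illustration and are not the published FTGAN architecture.

```python
import torch
import torch.nn as nn

# Stage 1 (assumed shape): map noise to a (frames, 2, H, W) optical-flow
# volume holding per-pixel (dx, dy) motion.
class FlowGen(nn.Module):
    def __init__(self, z_dim=100, frames=16, size=64):
        super().__init__()
        self.frames, self.size = frames, size
        self.net = nn.Sequential(
            nn.Linear(z_dim, frames * 2 * size * size), nn.Tanh())

    def forward(self, z):
        out = self.net(z)
        return out.view(-1, self.frames, 2, self.size, self.size)

# Stage 2 (assumed shape): render RGB frames conditioned on the flow.
class TextureGen(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Conv3d(2, 3, kernel_size=3, padding=1)

    def forward(self, flow):
        # flow: (B, T, 2, H, W) -> Conv3d expects (B, C, T, H, W)
        rgb = self.net(flow.permute(0, 2, 1, 3, 4))
        return torch.tanh(rgb).permute(0, 2, 1, 3, 4)  # (B, T, 3, H, W)

z = torch.randn(4, 100)
video = TextureGen()(FlowGen()(z))  # hierarchical: motion first, then texture
```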
Taking video cameras into the classroom.
Research into the communication and interactions in classrooms needs to take the multimodal nature of classrooms into account. Video cameras can capture the dynamics of teaching and learning, but the use of videos for research purposes needs to be well thought through in order to accommodate the challenges this tool holds. This article refers to three research projects where videos were used to generate data. It is argued that videos allow the researcher to home in on the micro-details and, in contrast to other data generation tools, allow researchers who were not present at the time to view what was witnessed. A video recording is a data source but not data by itself, and the information that is discerned from a video is framed and shaped by the research paradigm and the questions asked.
Will This Video Go Viral? Explaining and Predicting the Popularity of YouTube Videos
What makes content go viral? Why do some videos become popular while others don't? Such questions have elicited significant attention from both researchers
and industry, particularly in the context of online media. A range of models
have recently been proposed to explain and predict popularity; however, there is a short supply of practical tools, accessible to regular users, that
leverage these theoretical results. HIPie, an interactive visualization system, is built to fill this gap by enabling users to reason about the virality and popularity of online videos. It retrieves the metadata and past popularity series of YouTube videos, employs the Hawkes Intensity Process, a state-of-the-art online popularity model, to explain and predict video popularity, and presents videos comparatively in a series of interactive
plots. This system will help both content consumers and content producers with a range of data-driven inquiries, such as comparatively analyzing videos and channels, explaining and predicting future popularity, identifying viral videos, and estimating the response to online promotion. Comment: 4 pages
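The Hawkes Intensity Process that HIPie builds on models daily views as an exogenous stimulus (e.g., promotion) plus self-excitation: past views trigger new ones through a power-law decaying kernel. A rough discrete-time illustration, with made-up parameter values and no claim to match HIPie's actual implementation, might look like this:

```python
import numpy as np

# Discrete-time sketch of a Hawkes-intensity-style popularity model:
# expected views xi[t] = mu * s[t] (exogenous promotion) plus an endogenous
# word-of-mouth term where past views excite new ones with a power-law
# decaying kernel. Parameter names follow the HIP paper; values are made up.
def hip_views(s, mu=0.5, C=0.8, c=1.0, theta=0.5):
    T = len(s)
    xi = np.zeros(T)
    for t in range(T):
        endo = sum(xi[tau] * (t - tau + c) ** -(1 + theta) for tau in range(t))
        xi[t] = mu * s[t] + C * endo
    return xi

# A single promotion spike on day 3, then nothing: the predicted daily
# views decay with a long power-law tail rather than dropping to zero.
promo = np.zeros(30)
promo[3] = 1000.0
print(np.round(hip_views(promo), 1))
```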
A Trip to the Moon: Personalized Animated Movies for Self-reflection
Self-tracking physiological and psychological data poses the challenge of
presentation and interpretation. Insightful narratives for self-tracking data
can motivate the user towards constructive self-reflection. One powerful form
of narrative that engages audiences across various cultures and age groups is animated movies. We collected a week of self-reported mood and behavior data from each user and created a personalized animation in Unity based on their data. We evaluated the impact of these videos in a randomized controlled trial with a non-personalized animated video as the control. We found that personalized videos tend to be more emotionally engaging, encouraging more and lengthier writing indicative of self-reflection about moods and behaviors, compared to non-personalized control videos.
Co-interest Person Detection from Multiple Wearable Camera Videos
Wearable cameras, such as Google Glass and GoPro, enable video data
collection over larger areas and from different views. In this paper, we tackle
a new problem of locating the co-interest person (CIP), i.e., the one who draws
attention from most camera wearers, from temporally synchronized videos taken
by multiple wearable cameras. Our basic idea is to exploit the motion patterns
of people and use them to correlate the persons across different videos,
instead of performing appearance-based matching as in traditional video
co-segmentation/localization. This way, we can identify the CIP even if a group of people with similar appearance is present in the view. More specifically, we detect a set of persons in each frame as candidates for the CIP and then build a Conditional Random Field (CRF) model to select the one with consistent motion patterns across different videos and high spatio-temporal consistency in
each video. We collect three sets of wearable-camera videos for testing the
proposed algorithm. All the involved people have similar appearances in the
collected videos, and the experiments demonstrate the effectiveness of the proposed algorithm. Comment: ICCV 2015
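The CRF described in the abstract combines per-candidate scores with cross-video motion agreement and within-video temporal smoothness. A toy sketch of such an energy follows; the potentials, weights, and brute-force inference are illustrative assumptions, not the paper's actual formulation.

```python
import numpy as np
from itertools import product

# Toy version of the selection problem: pick one candidate detection per
# (camera, frame) so that the chosen people move consistently across cameras
# and choices stay stable over time within each camera.
def crf_energy(assign, unary, motion, cross=1.0, smooth=1.0):
    # assign[v][t]: chosen candidate index for camera v at frame t
    # unary[v][t][k]: cost of picking candidate k in that frame
    # motion[v][t][k]: 2-D motion vector of candidate k
    V, T = len(unary), len(unary[0])
    e = sum(unary[v][t][assign[v][t]] for v in range(V) for t in range(T))
    for t in range(T):  # cross-video term: motion patterns should agree
        vecs = [motion[v][t][assign[v][t]] for v in range(V)]
        e += cross * sum(np.linalg.norm(np.subtract(vecs[i], vecs[j]))
                         for i in range(V) for j in range(i + 1, V))
    for v in range(V):  # temporal term: penalize switching candidates
        e += smooth * sum(assign[v][t] != assign[v][t - 1] for t in range(1, T))
    return e

def best_assignment(unary, motion, K):
    # Exhaustive search is only feasible on toy instances; a real system
    # would use proper CRF inference instead.
    V, T = len(unary), len(unary[0])
    flat = min(product(range(K), repeat=V * T),
               key=lambda a: crf_energy(
                   [a[v * T:(v + 1) * T] for v in range(V)], unary, motion))
    return [list(flat[v * T:(v + 1) * T]) for v in range(V)]
```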
Describing and Forecasting Video Access Patterns
Computer systems are increasingly driven by workloads that reflect large-scale social behavior, such as rapid changes in the popularity of media items like videos. Capacity planners and system designers must plan for rapid, massive changes in workloads when such social behavior is a factor. In this paper we make two contributions intended to assist in the design and provisioning of such systems. We analyze an extensive dataset consisting of the daily access counts of hundreds of thousands of YouTube videos. In this dataset, we find that there are two types of videos: those that show rapid changes in popularity, and those that are consistently popular over long time periods. We call these two types rarely-accessed and frequently-accessed videos, respectively. We observe that most of the videos in our dataset clearly fall into one of these two types. For each type of video we ask two questions: first, are there relatively simple models that can describe its daily access patterns? And second, can we use these simple models to predict the number of accesses that a video will have in the near future, as a tool for capacity planning? To answer these questions we develop two different frameworks for the characterization and forecasting of access patterns. We show that for frequently-accessed videos, daily access patterns can be extracted via principal component analysis and used efficiently for forecasting. For rarely-accessed videos, we demonstrate a clustering method that allows one to classify bursts of popularity and use those classifications for forecasting.
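For the frequently-accessed case, the PCA idea can be illustrated: learn principal daily-access patterns from historical curves, fit a new video's observed prefix to those components, and extrapolate the rest. The sketch below uses synthetic data and a plain least-squares fit; it illustrates the general approach, not the paper's exact method.

```python
import numpy as np

# Learn principal access patterns from historical daily-access curves,
# then forecast a partially observed curve by fitting component weights
# on the observed prefix and extrapolating with the full components.
rng = np.random.default_rng(0)
hist = rng.poisson(100, size=(500, 60)).astype(float)  # 500 videos, 60 days

mean = hist.mean(axis=0)
U, S, Vt = np.linalg.svd(hist - mean, full_matrices=False)
comps = Vt[:5]                       # top-5 principal access patterns

obs_days = 40                        # first 40 days of a new video observed
new = rng.poisson(100, size=60).astype(float)
w, *_ = np.linalg.lstsq(comps[:, :obs_days].T,
                        (new - mean)[:obs_days], rcond=None)
forecast = mean + w @ comps          # full 60-day reconstruction
print(forecast[obs_days:].round(1))  # predicted accesses for days 41..60
```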
