10,014 research outputs found
Scene extraction in motion pictures
This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of the media descriptions that can be computed in today's content management systems. To facilitate high-level, semantics-based content annotation and interpretation, we tackle the problem of automatically decomposing motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate different rules and conventions followed as part of Film Grammar that can guide and shape an algorithmic solution for determining a scene. Two different techniques using inter-shot analysis are proposed as solutions in this paper. In addition, we present different refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, which offers useful insights into the limitations of our method.
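As a rough illustration of what inter-shot analysis can look like in practice, the sketch below groups consecutive shots into scenes by color-histogram similarity within a look-ahead window. The histogram choice, window size, and threshold here are illustrative assumptions, not the paper's actual method.

```python
# Hypothetical inter-shot scene grouping: a new scene starts when a shot
# resembles none of the last `window` shots of the current scene.
import numpy as np

def shot_histogram(frames):
    """frames: iterable of (H, W, 3) uint8 arrays; returns a normalized mean color histogram."""
    hists = [np.histogramdd(f.reshape(-1, 3), bins=(8, 8, 8),
                            range=((0, 256),) * 3)[0].ravel() for f in frames]
    h = np.mean(hists, axis=0)
    return h / h.sum()

def group_shots_into_scenes(shot_hists, window=3, threshold=0.5):
    """shot_hists: list of normalized histograms, one per shot, in temporal order."""
    scenes, current = [], [0]
    for i in range(1, len(shot_hists)):
        # Histogram intersection (in [0, 1]) against recent shots of the current scene.
        sims = [np.minimum(shot_hists[i], shot_hists[j]).sum()
                for j in current[-window:]]
        if max(sims) >= threshold:
            current.append(i)       # visually similar: same scene
        else:
            scenes.append(current)  # dissimilar to all recent shots: scene change
            current = [i]
    scenes.append(current)
    return scenes
```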
Associating characters with events in films
The work presented here combines the analysis of a film's audiovisual features with the analysis of an accompanying audio description. Specifically, we describe a technique for semantics-based indexing of feature films that associates character names with meaningful events. The technique fuses the results of event detection based on audiovisual features with the inferred on-screen presence of characters, based on an analysis of an audio description script. In an evaluation with 215 events from 11 films, the technique performed the character detection task with Precision = 93% and Recall = 71%. We then go on to show how novel modes of access to film content are enabled by our analysis. The specific examples illustrated include video retrieval via a combination of event type and character name, and our first steps towards visualization of narrative and character interplay based on character occurrence and co-occurrence in events.
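A minimal sketch of the fusion step described above might mark a character as present in a detected event whenever an audio-description mention of the name overlaps the event's time span. The data structures, tolerance, and names below are illustrative assumptions, not the authors' pipeline.

```python
# Hedged sketch: associate characters with events by temporal overlap
# between AD name mentions and detected event boundaries.
def characters_in_event(event, ad_mentions, slack=2.0):
    """event: (start, end) in seconds.
    ad_mentions: list of (name, time_in_seconds) from the audio description script.
    slack: tolerance around the event boundaries, in seconds."""
    start, end = event
    return sorted({name for name, t in ad_mentions
                   if start - slack <= t <= end + slack})

# Example: a dialogue event at 120-135 s with nearby AD name mentions.
mentions = [("ALICE", 118.5), ("BOB", 131.0), ("CAROL", 190.2)]
print(characters_in_event((120.0, 135.0), mentions))  # ['ALICE', 'BOB']
```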
Towards dense object tracking in a 2D honeybee hive
From human crowds to cells in tissue, the detection and efficient tracking of multiple objects in dense configurations is an important and unsolved problem. In the past, limitations of image analysis have restricted studies of dense groups to tracking a single individual or a subset of marked individuals, or to coarse-grained group-level dynamics, all of which yield incomplete information. Here, we combine convolutional neural networks (CNNs) with the model environment of a honeybee hive to automatically recognize all individuals in a dense group from raw image data. We create a new, adapted individual labeling scheme and use the segmentation architecture U-Net with a loss function dependent on both object identity and orientation. We additionally exploit temporal regularities of the video recording in a recurrent manner and achieve near human-level performance while reducing the network size by 94% compared to the original U-Net architecture. Given our novel application of CNNs, we generate extensive problem-specific image data in which labeled examples are produced through a custom interface with Amazon Mechanical Turk. This dataset contains over 375,000 labeled bee instances across 720 video frames at 2 FPS, representing an extensive resource for the development and testing of tracking methods. We correctly detect 96% of individuals with a location error of ~7% of a typical body dimension and an orientation error of 12 degrees, approximating the variability of human raters. Our results provide an important step towards efficient image-based dense object tracking by allowing for the accurate determination of object location and orientation across time-series image data within one network architecture.

Comment: 15 pages, including supplementary figures. 1 supplemental movie available as an ancillary file.
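One way a loss "dependent on both object identity and orientation" could be structured is a per-pixel classification term plus an angular regression term on foreground pixels. The sketch below is a generic assumption of that structure in PyTorch, including the (sin, cos) encoding and weighting, and is not the authors' implementation.

```python
# Hypothetical joint identity + orientation loss for a U-Net-style network.
import torch
import torch.nn.functional as F

def joint_loss(logits, angle_pred, labels, angle_gt, fg_mask, w_angle=1.0):
    """logits: (B, C, H, W) per-pixel class scores (e.g. background vs. bee body parts).
    angle_pred: (B, 2, H, W) predicted (sin, cos) of body orientation.
    labels: (B, H, W) integer class targets.
    angle_gt: (B, H, W) ground-truth orientation in radians.
    fg_mask: (B, H, W) bool, True where a bee body is annotated."""
    # Identity term: standard per-pixel cross-entropy.
    id_loss = F.cross_entropy(logits, labels)
    # Orientation term: penalize angular error only on foreground pixels;
    # the (sin, cos) encoding avoids the wrap-around discontinuity at +/- pi.
    target = torch.stack([torch.sin(angle_gt), torch.cos(angle_gt)], dim=1)
    ang_err = F.mse_loss(angle_pred, target, reduction="none").sum(dim=1)
    if fg_mask.any():
        ang_loss = ang_err[fg_mask].mean()
    else:
        ang_loss = angle_pred.sum() * 0.0  # keeps the autograd graph valid when mask is empty
    return id_loss + w_angle * ang_loss
```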
Designing an interface for a digital movie browsing system in the film studies domain
This article describes our work in designing an interface for a digital movie browsing system in the specific application context of film studies. The development of MOVIEBROWSER2 follows general design guidelines based on an earlier user study with film studies students at Dublin City University. These design guidelines have been used as input to the MOVIEBROWSER2 system design, and the rationale for the interface design decisions is elaborated. An experiment was carried out among film studies students, together with a one-semester trial deployment. The results show positive feedback and improved performance in the students' essay outcomes, with a higher perceived level of satisfaction.
A system for event-based film browsing
The recent past has seen a proliferation in the amount of digital video content being created and consumed, driven perhaps by the increase in audiovisual quality, as well as the ease with which production, reproduction and consumption are now possible. The widespread use of digital video, as opposed to its analogue counterpart, has opened up a plethora of previously impossible applications. This paper builds upon previous work that analysed digital video, namely movies, in order to facilitate presentation in an easily navigable manner. A film browsing interface, termed the MovieBrowser, is described, which allows users to easily locate specific portions of movies, as well as to obtain an understanding of the film being perused. A number of experiments assessing the system's performance are also presented.
A Large Dataset to Train Convolutional Networks for Disparity, Optical Flow, and Scene Flow Estimation
Recent work has shown that optical flow estimation can be formulated as a supervised learning task and solved successfully with convolutional networks. Training of the so-called FlowNet was enabled by a large synthetically generated dataset. The present paper extends the concept of optical flow estimation via convolutional networks to disparity and scene flow estimation. To this end, we propose three synthetic stereo video datasets with sufficient realism, variation, and size to successfully train large networks. Our datasets are the first large-scale datasets to enable training and evaluating scene flow methods. Besides the datasets, we present a convolutional network for real-time disparity estimation that provides state-of-the-art results. By combining a flow and disparity estimation network and training it jointly, we demonstrate the first scene flow estimation with a convolutional network.

Comment: Includes supplementary material.
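Networks like these are conventionally evaluated with the end-point error (EPE), the mean Euclidean distance between predicted and ground-truth displacement vectors. The helper below is a generic sketch of that metric, not code from the paper.

```python
# Illustrative end-point-error (EPE) metric for flow/disparity evaluation.
import numpy as np

def end_point_error(flow_pred, flow_gt, valid=None):
    """flow_pred, flow_gt: (H, W, 2) arrays of (u, v) displacements.
    valid: optional (H, W) bool mask of pixels that have ground truth."""
    err = np.linalg.norm(flow_pred - flow_gt, axis=-1)  # per-pixel Euclidean error
    if valid is not None:
        err = err[valid]
    return float(err.mean())
```

For disparity, the same function applies with the second displacement component fixed to zero.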
Movie Description
Audio Description (AD) provides linguistic descriptions of movies and allows visually impaired people to follow a movie along with their peers. Such descriptions are by design mainly visual and thus naturally form an interesting data source for computer vision and computational linguistics. In this work we propose a novel dataset which contains transcribed ADs that are temporally aligned to full-length movies. In addition, we also collected and aligned movie scripts used in prior work and compare the two sources of descriptions. In total, the Large Scale Movie Description Challenge (LSMDC) contains a parallel corpus of 118,114 sentences and video clips from 202 movies. First, we characterize the dataset by benchmarking different approaches for generating video descriptions. Comparing ADs to scripts, we find that ADs are indeed more visual and describe precisely what is shown rather than what should happen according to the scripts created prior to movie production. Furthermore, we present and compare the results of several teams who participated in a challenge organized in the context of the workshop "Describing and Understanding Video & The Large Scale Movie Description Challenge (LSMDC)" at ICCV 2015.
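Benchmarking generated descriptions against reference ADs is typically done with n-gram overlap metrics. The snippet below sketches clipped unigram precision (BLEU-1 without the brevity penalty) as a minimal illustration of that family of metrics; it is not the official challenge evaluation code.

```python
# Minimal clipped unigram-precision sketch for caption benchmarking.
from collections import Counter

def unigram_precision(candidate, reference):
    """candidate, reference: tokenized sentences (lists of strings)."""
    cand, ref = Counter(candidate), Counter(reference)
    overlap = sum(min(c, ref[w]) for w, c in cand.items())  # clipped counts
    return overlap / max(len(candidate), 1)

print(unigram_precision("a man opens the door".split(),
                        "someone opens a door".split()))  # 0.6
```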