Action Sets: Weakly Supervised Action Segmentation without Ordering Constraints
Action detection and temporal segmentation of actions in videos are topics of
increasing interest. While fully supervised systems have gained much attention
lately, full annotation of each action within the video is costly and
impractical for large amounts of video data. Thus, weakly supervised action
detection and temporal segmentation methods are of great importance. While most
works in this area assume an ordered sequence of occurring actions to be given,
our approach only uses a set of actions. Such action sets provide much less
supervision since neither action ordering nor the number of action occurrences
are known. In exchange, they can be easily obtained, for instance, from
meta-tags, while ordered sequences still require human annotation. We introduce
a system that automatically learns to temporally segment and label actions in a
video, where the only supervision used is action sets. An evaluation
on three datasets shows that our method still achieves good results although
the amount of supervision is significantly smaller than for other related
methods.
Comment: CVPR 201
Scene extraction in motion pictures
This paper addresses the challenge of bridging the semantic gap between the rich meaning users desire when they query to locate and browse media and the shallowness of media descriptions that can be computed in today's content management systems. To facilitate high-level semantics-based content annotation and interpretation, we tackle the problem of automatic decomposition of motion pictures into meaningful story units, namely scenes. Since a scene is a complicated and subjective concept, we first propose guidelines from film production to determine when a scene change occurs. We then investigate different rules and conventions followed as part of Film Grammar that would guide and shape an algorithmic solution for determining a scene. Two different techniques using intershot analysis are proposed as solutions in this paper. In addition, we present different refinement mechanisms, such as film-punctuation detection founded on Film Grammar, to further improve the results. These refinement techniques demonstrate significant improvements in overall performance. Furthermore, we analyze errors in the context of film-production techniques, which offer useful insights into the limitations of our method.