818 research outputs found
Localization and recognition of the scoreboard in sports video based on SIFT point matching
In broadcast sports video, the scoreboard is attached at a fixed location in the video and generally the scoreboard always exists in all video frames in order to help viewers to understand the match’s progression quickly. Based on these observations, we present a new localization and recognition method for scoreboard text in sport videos in this paper. The method first matches the Scale Invariant Feature Transform (SIFT) points using a modified matching technique between two frames extracted from a video clip and then localizes the scoreboard by computing a robust estimate of the matched point cloud in a two-stage non-scoreboard filter process based on some domain rules. Next some enhancement operations are performed on the localized scoreboard, and a Multi-frame Voting Decision is used. Both aim to increasing the OCR rate. Experimental results demonstrate the effectiveness and efficiency of our proposed method
Lucid Data Dreaming for Video Object Segmentation
Convolutional networks reach top quality in pixel-level video object
segmentation but require a large amount of training data (1k~100k) to deliver
such results. We propose a new training strategy which achieves
state-of-the-art results across three evaluation datasets while using 20x~1000x
less annotated data than competing methods. Our approach is suitable for both
single and multiple object segmentation. Instead of using large training sets
hoping to generalize across domains, we generate in-domain training data using
the provided annotation on the first frame of each video to synthesize ("lucid
dream") plausible future video frames. In-domain per-video training data allows
us to train high quality appearance- and motion-based models, as well as tune
the post-processing stage. This approach allows to reach competitive results
even when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the video object segmentation task a smaller
training set that is closer to the target domain is more effective. This
changes the mindset regarding how many training samples and general
"objectness" knowledge are required for the video object segmentation task.Comment: Accepted in International Journal of Computer Vision (IJCV
Beat-Event Detection in Action Movie Franchises
While important advances were recently made towards temporally localizing and
recognizing specific human actions or activities in videos, efficient detection
and classification of long video chunks belonging to semantically defined
categories such as "pursuit" or "romance" remains challenging.We introduce a
new dataset, Action Movie Franchises, consisting of a collection of Hollywood
action movie franchises. We define 11 non-exclusive semantic categories -
called beat-categories - that are broad enough to cover most of the movie
footage. The corresponding beat-events are annotated as groups of video shots,
possibly overlapping.We propose an approach for localizing beat-events based on
classifying shots into beat-categories and learning the temporal constraints
between shots. We show that temporal constraints significantly improve the
classification performance. We set up an evaluation protocol for beat-event
localization as well as for shot classification, depending on whether movies
from the same franchise are present or not in the training data
- …