64,132 research outputs found
Lucid Data Dreaming for Video Object Segmentation
Convolutional networks reach top quality in pixel-level video object
segmentation but require a large amount of training data (1k~100k) to deliver
such results. We propose a new training strategy which achieves
state-of-the-art results across three evaluation datasets while using 20x~1000x
less annotated data than competing methods. Our approach is suitable for both
single and multiple object segmentation. Instead of using large training sets
hoping to generalize across domains, we generate in-domain training data using
the provided annotation on the first frame of each video to synthesize ("lucid
dream") plausible future video frames. In-domain per-video training data allows
us to train high quality appearance- and motion-based models, as well as tune
the post-processing stage. This approach allows to reach competitive results
even when training from only a single annotated frame, without ImageNet
pre-training. Our results indicate that using a larger training set is not
automatically better, and that for the video object segmentation task a smaller
training set that is closer to the target domain is more effective. This
changes the mindset regarding how many training samples and general
"objectness" knowledge are required for the video object segmentation task.Comment: Accepted in International Journal of Computer Vision (IJCV
Future Frame Prediction for Anomaly Detection -- A New Baseline
Anomaly detection in videos refers to the identification of events that do
not conform to expected behavior. However, almost all existing methods tackle
the problem by minimizing the reconstruction errors of training data, which
cannot guarantee a larger reconstruction error for an abnormal event. In this
paper, we propose to tackle the anomaly detection problem within a video
prediction framework. To the best of our knowledge, this is the first work that
leverages the difference between a predicted future frame and its ground truth
to detect an abnormal event. To predict a future frame with higher quality for
normal events, other than the commonly used appearance (spatial) constraints on
intensity and gradient, we also introduce a motion (temporal) constraint in
video prediction by enforcing the optical flow between predicted frames and
ground truth frames to be consistent, and this is the first work that
introduces a temporal constraint into the video prediction task. Such spatial
and motion constraints facilitate the future frame prediction for normal
events, and consequently facilitate to identify those abnormal events that do
not conform the expectation. Extensive experiments on both a toy dataset and
some publicly available datasets validate the effectiveness of our method in
terms of robustness to the uncertainty in normal events and the sensitivity to
abnormal events.Comment: IEEE Conference on Computer Vision and Pattern Recognition 201
- …