Scene-Adaptive Video Frame Interpolation via Meta-Learning
Video frame interpolation is a challenging problem because each video presents
a different scenario depending on foreground and background motion, frame
rate, and occlusion. It is therefore difficult for a single network with fixed
parameters to generalize across different videos.
Ideally, one could have a different network for each scenario, but this is
computationally infeasible for practical applications. In this work, we propose
to adapt the model to each video by making use of additional information that
is readily available at test time and yet has not been exploited in previous
works. We first show the benefits of "test-time adaptation" through simple
fine-tuning of a network, then we greatly improve its efficiency by
incorporating meta-learning. We obtain significant performance gains with only
a single gradient update and without any additional parameters. Finally, we
show that our meta-learning framework can easily be applied to any video frame
interpolation network and consistently improves its performance on multiple
benchmark datasets.

Comment: CVPR 2020
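The core idea of the abstract, adapting meta-learned parameters with a single gradient update at test time, can be sketched on a toy model. The snippet below is a minimal illustration, not the paper's interpolation network: it uses a linear least-squares model in NumPy, and all names (`adapt_one_step`, the learning rate, the synthetic data) are hypothetical stand-ins.

```python
import numpy as np

def loss_and_grad(w, X, y):
    """Squared-error loss of a linear model and its gradient w.r.t. w."""
    residual = X @ w - y
    loss = float(np.mean(residual ** 2))
    grad = 2.0 * X.T @ residual / len(y)
    return loss, grad

def adapt_one_step(w, X, y, lr=0.1):
    """Single inner gradient update, MAML-style: one step from the
    meta-learned initialization using data available at test time."""
    _, grad = loss_and_grad(w, X, y)
    return w - lr * grad

# Synthetic "test video" data the model can adapt to.
rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))
w_true = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ w_true

w_meta = np.zeros(4)                 # stand-in for meta-learned initial weights
loss_before, _ = loss_and_grad(w_meta, X, y)
w_adapted = adapt_one_step(w_meta, X, y)
loss_after, _ = loss_and_grad(w_adapted, X, y)
```

A single update already lowers the loss on the held-out scenario, which is the behavior the meta-learning objective optimizes for: an initialization from which one step is maximally effective.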
Faster R-CNN for Robust Pedestrian Detection Using Semantic Segmentation Network
Convolutional neural networks (CNNs) have enabled significant improvements in pedestrian detection owing to the strong representation ability of CNN features. However, it is generally difficult to reduce false positives on hard negative samples such as tree leaves, traffic lights, poles, etc. Some of these hard negatives can be removed by exploiting high-level semantic vision cues. In this paper, we propose a region-based CNN method which makes use of semantic cues for better pedestrian detection. Our method extends the Faster R-CNN detection framework by adding a network branch for semantic image segmentation. The semantic network computes complementary higher-level semantic features to be integrated with the convolutional features. We make use of multi-resolution feature maps extracted from different network layers to ensure good detection accuracy for pedestrians at different scales. A boosted forest is trained on the integrated features in a cascaded manner for hard negative mining. Experiments on the Caltech pedestrian dataset show improvements in detection accuracy with the semantic network. With the deep VGG16 model, our pedestrian detection method achieves robust detection performance on the Caltech dataset.
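The hard negative mining step the abstract mentions can be illustrated in a few lines. This is a generic sketch, not the paper's boosted-forest cascade: given detector confidence scores for a set of proposals, it selects the background proposals the detector is most confident about, which are the "hard" negatives fed back into the next training round. All names and the toy scores are hypothetical.

```python
import numpy as np

def mine_hard_negatives(scores, labels, k):
    """Return indices of the k highest-scoring negative proposals.

    scores: detector confidence per proposal (higher = more pedestrian-like)
    labels: ground-truth label per proposal (1 = pedestrian, 0 = background)
    Background proposals with high scores are false positives the detector
    finds hard, so they are the most informative to retrain on.
    """
    neg_idx = np.flatnonzero(labels == 0)          # all background proposals
    order = np.argsort(scores[neg_idx])[::-1]      # most confident first
    return neg_idx[order[:k]]

scores = np.array([0.9, 0.2, 0.8, 0.1, 0.7])       # toy detector confidences
labels = np.array([1,   0,   0,   0,   1])
hard = mine_hard_negatives(scores, labels, k=2)
```

In a cascaded setup, each stage is trained with the hard negatives surviving the previous stages, so later stages concentrate on the confusing cases (leaves, poles, traffic lights) that motivate the added semantic branch.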