Foreground Clustering for Joint Segmentation and Localization in Videos and Images
This paper presents a novel framework in which video/image segmentation and
localization are cast into a single optimization problem that integrates
information from low level appearance cues with that of high level localization
cues in a very weakly supervised manner. The proposed framework leverages two
representations at different levels, exploits the spatial relationship between
bounding boxes and superpixels as linear constraints and simultaneously
discriminates between foreground and background at bounding box and superpixel
level. Unlike previous approaches that rely mainly on discriminative
clustering, we incorporate a foreground model that minimizes the histogram
difference of an object across all frames. Exploiting the geometric
relation between the superpixels and bounding boxes enables the transfer of
segmentation cues to improve localization output and vice-versa. Inclusion of
the foreground model generalizes our discriminative framework to video data
where the background tends to be similar and thus, not discriminative. We
demonstrate the effectiveness of our unified framework on the YouTube Object
video dataset, Internet Object Discovery dataset and Pascal VOC 2007.
Comment: In Proceedings of NIPS 201
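The foreground model described in this abstract could be sketched as follows, under the assumption that the "histogram difference across frames" is penalized against the mean foreground histogram; the function names, the histogram choice, and the squared-distance penalty are illustrative, not the paper's exact formulation:

```python
import numpy as np

def color_histogram(region_pixels, bins=8):
    """L1-normalized intensity histogram of a candidate foreground region
    (pixel values assumed in [0, 1]). Illustrative stand-in for the
    appearance histogram used by the foreground model."""
    h, _ = np.histogram(region_pixels, bins=bins, range=(0.0, 1.0))
    return h / max(h.sum(), 1)

def foreground_consistency_cost(frame_regions, bins=8):
    """Sum of squared deviations of each frame's foreground histogram from
    the mean histogram over all frames. A low cost means the hypothesized
    foreground looks similar across the video, which is the property the
    foreground model rewards."""
    hists = np.stack([color_histogram(r, bins) for r in frame_regions])
    mean_hist = hists.mean(axis=0)
    return float(((hists - mean_hist) ** 2).sum())
```

In this sketch the cost is zero when every frame's foreground region has an identical histogram and grows as appearances diverge, which is what lets the model handle videos whose backgrounds are too similar for a purely discriminative objective.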
Appearance Fusion of Multiple Cues for Video Co-localization
This work addresses the joint object discovery problem in videos while
utilizing multiple object-related cues. In contrast to the usual spatial fusion
approach, a novel appearance fusion approach is presented here. Specifically,
this paper proposes an effective fusion process of different GMMs derived from
multiple cues into one GMM. Like any fusion strategy, this approach needs
guidance; the proposed method relies on cue reliability and cross-cue
consensus for that guidance. As a case study, we pursue the "video
co-localization" object discovery problem to propose our methodology. Our
experiments on YouTube Objects and YouTube Co-localization datasets demonstrate
that the proposed appearance fusion method has a clear advantage over both
the spatial fusion strategy and current state-of-the-art video
co-localization methods.
Comment: 17 pages and 8 figures. Submitted to ACCV2
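One simple way to realize the appearance fusion this abstract describes is to pool the components of the per-cue GMMs into a single mixture, scaling each cue's component weights by a normalized reliability score. This is a minimal sketch under that assumption; the dictionary layout, the `fuse_gmms` name, and the use of a plain convex combination (rather than the paper's reliability/consensus machinery) are hypothetical:

```python
import numpy as np

def fuse_gmms(gmms, reliabilities):
    """Fuse several GMMs into one by pooling their components.

    Each input GMM is a dict with keys 'weights' (K,), 'means' (K, D) and
    'covs' (K, D, D). Each cue's mixture weights are rescaled by that cue's
    normalized reliability, so the fused weights again sum to one.
    """
    rel = np.asarray(reliabilities, dtype=float)
    rel = rel / rel.sum()  # normalize reliabilities into convex weights
    weights, means, covs = [], [], []
    for gmm, r in zip(gmms, rel):
        w = np.asarray(gmm["weights"], dtype=float)
        weights.append(r * w / w.sum())  # cue's share of the fused mixture
        means.append(np.asarray(gmm["means"]))
        covs.append(np.asarray(gmm["covs"]))
    return {
        "weights": np.concatenate(weights),
        "means": np.concatenate(means),
        "covs": np.concatenate(covs),
    }
```

Because the result is itself a valid GMM, downstream co-localization code that scores pixels or boxes under a single appearance model needs no changes, which is the practical appeal of fusing in appearance space rather than spatially.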