Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video
We explore object discovery and detector adaptation based on unlabeled video
sequences captured from a mobile platform. We propose a fully automatic
approach for object mining from video which builds upon a generic object
tracking approach. By applying this method to three large video datasets from
autonomous driving and mobile robotics scenarios, we demonstrate its robustness
and generality. Based on the object mining results, we propose a novel approach
for unsupervised object discovery by appearance-based clustering. We show that
this approach successfully discovers interesting objects relevant to driving
scenarios. In addition, we perform self-supervised detector adaptation in order
to improve detection performance on the KITTI dataset for existing categories.
Our approach has direct relevance for enabling large-scale object learning for
autonomous driving.
Comment: CVPR'18 submission
A probabilistic constrained clustering for transfer learning and image category discovery
Neural network-based clustering has recently gained popularity, and in
particular a constrained clustering formulation has been proposed to perform
transfer learning and image category discovery using deep learning. The core
idea is to formulate a clustering objective with pairwise constraints that can
be used to train a deep clustering network; therefore the cluster assignments
and their underlying feature representations are jointly optimized end-to-end.
In this work, we provide a novel clustering formulation to address scalability
issues of previous work in terms of optimizing deeper networks and larger
numbers of categories. The proposed objective directly minimizes the negative
log-likelihood of cluster assignment with respect to the pairwise constraints,
has no hyper-parameters, and demonstrates improved scalability and performance
on both supervised learning and unsupervised transfer learning.
Comment: CVPR 2018 Deep-Vision Workshop
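As a rough illustration of a negative log-likelihood over pairwise constraints, the sketch below assumes each sample has a vector of cluster-assignment probabilities and that the probability of two samples sharing a cluster is the inner product of their vectors. That modelling choice, and the names `pairwise_nll` and `mean_pairwise_nll`, are assumptions made here for illustration; the abstract does not spell out the exact objective.

```python
import math

def pairwise_nll(p_i, p_j, same, eps=1e-12):
    """Negative log-likelihood of one pairwise constraint (illustrative).

    p_i, p_j: cluster-assignment probability vectors for two samples.
    same: True for a "same cluster" constraint, False for "different".
    """
    # Assumed model: probability that both samples fall in the same cluster.
    p_same = sum(a * b for a, b in zip(p_i, p_j))
    # Clamp to keep the logarithm finite.
    p_same = min(max(p_same, eps), 1.0 - eps)
    return -math.log(p_same) if same else -math.log(1.0 - p_same)

def mean_pairwise_nll(probs, constraints):
    """Average the loss over (i, j, same) constraint triples."""
    total = sum(pairwise_nll(probs[i], probs[j], s) for i, j, s in constraints)
    return total / len(constraints)

# Hypothetical toy data: three samples, two clusters.
probs = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]]
constraints = [(0, 1, True), (0, 2, False)]
loss = mean_pairwise_nll(probs, constraints)
```

Note that this form needs no weighting term between the two kinds of constraint, which is consistent with the claim that the objective has no hyper-parameters (the `eps` clamp is only a numerical guard).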
Self-supervised Transfer Learning for Instance Segmentation through Physical Interaction
Instance segmentation of unknown objects from images is regarded as relevant
for several robot skills including grasping, tracking and object sorting.
Recent results in computer vision have shown that large hand-labeled datasets
enable high segmentation performance. To overcome the time-consuming process of
manually labeling data for new environments, we present a transfer learning
approach for robots that learn to segment objects by interacting with their
environment in a self-supervised manner. Our robot pushes unknown objects on a
table and uses information from optical flow to create training labels in the
form of object masks. To achieve this, we fine-tune an existing DeepMask
network for instance segmentation on the self-labeled training data acquired by
the robot. We evaluate our trained network (SelfDeepMask) on a set of real
images showing challenging and cluttered scenes with novel objects. Here,
SelfDeepMask outperforms the DeepMask network trained on the COCO dataset by
9.5% in average precision. Furthermore, we combine our approach with recent
approaches for training with noisy labels in order to better cope with induced
label noise.
Comment: Extended version and code release of accepted IROS 2019 paper
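One simple way to turn optical flow into a training mask is to mark pixels whose displacement exceeds a threshold after a push. The sketch below assumes the flow is already available as a grid of `(dx, dy)` tuples; the thresholding rule, the function name `flow_to_mask`, and the default threshold are illustrative assumptions, not the labeling procedure used for SelfDeepMask.

```python
import math

def flow_to_mask(flow, threshold=1.0):
    """Binary mask from a dense flow field (illustrative).

    flow: H x W nested list of (dx, dy) displacements in pixels.
    Returns an H x W nested list with 1 where the pixel moved more
    than `threshold` pixels, else 0.
    """
    return [
        [1 if math.hypot(dx, dy) > threshold else 0 for dx, dy in row]
        for row in flow
    ]

# Hypothetical 2 x 2 flow field: only the bottom-left pixel moved noticeably.
flow = [
    [(0.0, 0.0), (0.0, 0.0)],
    [(3.0, 4.0), (0.1, 0.0)],
]
mask = flow_to_mask(flow)
```

Masks produced this way are noisy (shadows, the robot arm, and flow errors all move too), which is one reason training with noisy-label methods is relevant.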
Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation
Current state-of-the-art approaches for Semi-supervised Video Object
Segmentation (Semi-VOS) propagate information from previous frames to generate
a segmentation mask for the current frame. This results in high-quality
segmentation across challenging scenarios such as changes in appearance and
occlusion. But it also leads to unnecessary computations for stationary or
slow-moving objects where the change across frames is minimal. In this work, we
exploit this observation by using temporal information to quickly identify
frames with minimal change and skip the heavyweight mask generation step. To
realize this efficiency, we propose a novel dynamic network that estimates
change across frames and decides which path -- computing a full network or
reusing the previous frame's features -- to choose depending on the expected
similarity. Experimental results show that our approach significantly improves
inference speed without much accuracy degradation on challenging Semi-VOS
datasets -- DAVIS 16, DAVIS 17, and YouTube-VOS. Furthermore, our approach can
be applied to multiple Semi-VOS methods, demonstrating its generality. The code
is available at https://github.com/HYOJINPARK/Reuse_VOS.
Comment: CVPR202
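The compute-or-reuse decision can be pictured with a small sketch. Here a hand-written mean absolute pixel difference stands in for the change estimate; the method described above uses a learned gate, so this is a simplification, and the class `ReuseGate`, its `threshold`, and the `extract_features` callback are hypothetical names introduced for illustration.

```python
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute difference between two equally sized 2-D frames."""
    total, count = 0.0, 0
    for row_a, row_b in zip(frame_a, frame_b):
        for x, y in zip(row_a, row_b):
            total += abs(x - y)
            count += 1
    return total / count

class ReuseGate:
    """Skip the expensive feature extractor when a frame barely changed."""

    def __init__(self, extract_features, threshold=0.05):
        self.extract_features = extract_features  # stand-in for the full network
        self.threshold = threshold                # assumed change tolerance
        self.ref_frame = None                     # frame behind the cached features
        self.cached = None
        self.full_runs = 0                        # how often the full path ran

    def features(self, frame):
        if (self.ref_frame is not None
                and mean_abs_diff(frame, self.ref_frame) < self.threshold):
            return self.cached                    # cheap path: reuse
        self.cached = self.extract_features(frame)  # full path: recompute
        self.ref_frame = frame
        self.full_runs += 1
        return self.cached

# Hypothetical usage with a toy "network" that sums pixel values.
gate = ReuseGate(lambda f: sum(map(sum, f)), threshold=0.05)
f0 = [[0.0, 0.0], [0.0, 0.0]]
f1 = [[0.0, 0.01], [0.0, 0.0]]   # nearly identical to f0
f2 = [[1.0, 1.0], [1.0, 1.0]]    # large change
outputs = [gate.features(f) for f in (f0, f1, f2)]
```

The sketch compares each frame against the frame that produced the cached features rather than the immediately preceding one, so that many small changes cannot accumulate unnoticed; that is a design choice of this example only.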