SeekNet: Improved Human Instance Segmentation via Reinforcement Learning Based Optimized Robot Relocation
Amodal recognition is the ability of a system to detect occluded objects.
Most state-of-the-art visual recognition systems lack this ability, and the
few studies that achieve amodal recognition rely on passive prediction or
embodied recognition approaches, which face challenges in real-world
settings such as dynamic objects. We propose SeekNet, an improved
optimization method for amodal recognition through embodied visual
recognition. Additionally, we implement SeekNet for social robots, which
repeatedly interact with humans in crowded environments; we therefore focus
on occluded human detection and tracking and showcase the superiority of our
algorithm over other baselines. We also experiment with SeekNet to improve
the confidence of COVID-19 symptom pre-screening algorithms using our
efficient embodied recognition system.
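The abstract describes a detector-in-the-loop relocation policy. The sketch below is a hedged illustration of that idea, not the authors' implementation: the environment, action set, reward (gain in detection confidence after moving), and all names are our assumptions.

```python
"""Minimal, hypothetical sketch of an embodied-recognition loop in the
spirit of SeekNet: a policy relocates the robot and is rewarded whenever
the detector's confidence on an occluded person improves."""

import random

ACTIONS = ["forward", "backward", "turn_left", "turn_right"]

class ToyEnv:
    """Stand-in environment: moving changes how visible the person is."""
    def reset(self):
        self.conf = 0.3
        return {"person_confidence": self.conf}

    def step(self, action):
        # Assumption: some moves reduce occlusion, others do not.
        self.conf = min(1.0, max(0.0, self.conf + random.uniform(-0.1, 0.2)))
        return {"person_confidence": self.conf}

class RandomPolicy:
    """Placeholder for the learned relocation policy."""
    def act(self, obs):
        return random.choice(ACTIONS)

    def update(self, obs, action, reward):
        pass  # a real agent would take e.g. a policy-gradient step here

def relocation_episode(env, policy, max_steps=20, threshold=0.9):
    obs = env.reset()
    prev = obs["person_confidence"]
    for _ in range(max_steps):
        action = policy.act(obs)
        obs = env.step(action)                   # robot relocates, re-observes
        conf = obs["person_confidence"]
        policy.update(obs, action, conf - prev)  # reward = confidence gain
        if conf >= threshold:                    # person unambiguously visible
            break
        prev = conf
    return conf

print(relocation_episode(ToyEnv(), RandomPolicy()))
```

The key design point the abstract implies is that recognition quality itself drives the robot's motion, which is what distinguishes embodied from passive amodal recognition.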
Amodal Segmentation through Out-of-Task and Out-of-Distribution Generalization with a Bayesian Model
Amodal completion is a visual task that humans perform easily but which is
difficult for computer vision algorithms. The aim is to segment those object
boundaries which are occluded and hence invisible. This task is particularly
challenging for deep neural networks because data is difficult to obtain and
annotate. Therefore, we formulate amodal segmentation as an out-of-task and
out-of-distribution generalization problem. Specifically, we replace the fully
connected classifier in neural networks with a Bayesian generative model of the
neural network features. The model is trained from non-occluded images using
bounding box annotations and class labels only, but is applied to generalize
out-of-task to object segmentation and to generalize out-of-distribution to
segment occluded objects. We demonstrate how such Bayesian models can naturally
generalize beyond the training task labels when they learn a prior that models
the object's background context and shape. Moreover, by leveraging an outlier
process, Bayesian models can further generalize out-of-distribution to segment
partially occluded objects and to predict their amodal object boundaries. Our
algorithm outperforms alternative methods that use the same supervision by a
large margin, and even outperforms methods where annotated amodal segmentations
are used during training, when the amount of occlusion is large. Code is
publicly available at https://github.com/YihongSun/Bayesian-Amodal.
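To make the generative-classifier idea concrete, here is a small numpy sketch of the general technique the abstract names (a per-class generative model over features plus an outlier process); it is our illustration under simplifying assumptions, not the released code at the URL above.

```python
"""Per-class Gaussian models over features, with a flat-likelihood
outlier process that absorbs pixels no foreground class explains well,
so occluders are not forced into an object class."""

import numpy as np

def fit_class_models(features, labels):
    """Fit a mean / diagonal-variance Gaussian per class from
    non-occluded training features (rows of `features`)."""
    models = {}
    for c in np.unique(labels):
        f = features[labels == c]
        models[c] = (f.mean(axis=0), f.var(axis=0) + 1e-6)
    return models

def log_gauss(x, mean, var):
    return -0.5 * np.sum((x - mean) ** 2 / var + np.log(2 * np.pi * var), axis=-1)

def classify_with_outliers(x, models, log_outlier=-50.0):
    """Pick the most likely class per feature vector; fall back to the
    outlier label (-1) when every class explains the feature poorly."""
    classes = list(models)
    scores = np.stack([log_gauss(x, *models[c]) for c in classes], axis=-1)
    labels = np.array(classes)[scores.argmax(axis=-1)]
    labels[scores.max(axis=-1) < log_outlier] = -1   # outlier process
    return labels

# Toy usage: two classes in a 2-D feature space; the far-away point
# is rejected as an outlier rather than misclassified.
rng = np.random.default_rng(0)
feats = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(5, 1, (100, 2))])
labs = np.array([0] * 100 + [1] * 100)
models = fit_class_models(feats, labs)
print(classify_with_outliers(np.array([[0.2, -0.1], [30.0, 30.0]]), models))
```

Because the model is generative, it can score features it was never trained to segment, which is what lets it generalize out-of-task and out-of-distribution as the abstract argues.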
Amodal Segmentation Based on Visible Region Segmentation and Shape Prior
Almost all existing amodal segmentation methods infer occluded regions using
features that correspond to the whole image. This is contrary to human amodal
perception, in which a person uses the visible part of the target and prior
knowledge of its shape to infer the occluded region. To mimic this behavior
and resolve the ambiguity in learning, we propose a framework that first
estimates a coarse visible mask and a coarse amodal mask. Then, based on
these coarse predictions, our model infers the amodal mask by concentrating
on the visible region and utilizing the shape priors stored in memory. In
this way, features corresponding to the background and the occluder are
suppressed during amodal mask estimation, so for the same visible region the
estimated amodal mask is unaffected by what the occluder is. Leveraging
shape priors makes amodal mask estimation more robust and plausible. We
evaluate the proposed model on three datasets, where it outperforms existing
state-of-the-art methods. Visualizations of the shape priors indicate that
the category-specific features in the codebook are interpretable to a degree.
Comment: Accepted by AAAI 202
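The retrieve-and-fuse step is the core of this abstract. The numpy sketch below shows one plausible reading of it under our own assumptions (plain IoU retrieval and linear blending; the paper uses learned features and a learned memory), with all names ours.

```python
"""A codebook stores category-specific template shapes; the coarse
visible mask retrieves the closest template, which is fused with the
coarse amodal mask, so the final estimate depends on the visible region
and the shape prior rather than on the occluder."""

import numpy as np

def nearest_prior(visible_mask, codebook):
    """Retrieve the codebook shape most similar to the visible evidence."""
    def iou(a, b):
        return (a & b).sum() / max((a | b).sum(), 1)
    return max(codebook, key=lambda shape: iou(visible_mask, shape))

def refine_amodal(visible_mask, coarse_amodal, codebook, alpha=0.5):
    prior = nearest_prior(visible_mask, codebook)
    fused = alpha * coarse_amodal + (1 - alpha) * prior  # blend prior in
    return (fused >= 0.5) | visible_mask   # visible pixels are always kept

# Toy usage: a square object whose right half is hidden by an occluder;
# the stored square prior lets the refinement recover the hidden half.
full = np.zeros((8, 8), dtype=bool)
full[2:6, 2:6] = True
visible = full.copy()
visible[:, 4:] = False                 # occluder hides the right half
codebook = [full]                      # one stored square shape prior
print(refine_amodal(visible, visible.astype(float), codebook).astype(int))
```

Note how the occluder's appearance never enters the refinement: only the visible mask and the retrieved prior do, which matches the abstract's claim that the amodal mask is unaffected by what the occlusion is.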
AIMS: All-Inclusive Multi-Level Segmentation
Despite the progress of image segmentation for accurate visual entity
segmentation, meeting the diverse requirements of image editing applications
for region-of-interest selection at different levels remains unsolved. In
this paper, we propose a new task, All-Inclusive Multi-Level Segmentation
(AIMS), which segments visual regions into three levels: part, entity, and
relation (two entities with some semantic relationship). We also build a
unified AIMS model through multi-dataset multi-task training to address the
two major challenges of annotation inconsistency and task correlation.
Specifically, we propose task complementarity, task association, and a
prompt mask encoder for three-level predictions. Extensive experiments
demonstrate the effectiveness and generalization capacity of our method
compared with other state-of-the-art methods trained on a single dataset and
with concurrent work on segmenting anything. We will make our code and
trained model publicly available.
Comment: Technical Report
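The abstract's central mechanism is a single model whose output level is selected by a prompt. The PyTorch sketch below is a speculative toy of that idea, not the AIMS architecture: the backbone, the per-level learned prompt embeddings, and the multiplicative conditioning are all our assumptions.

```python
"""One shared encoder, one learned prompt embedding per level
(part / entity / relation) that conditions a common mask decoder,
so a single set of weights serves all three prediction levels."""

import torch
import torch.nn as nn

LEVELS = {"part": 0, "entity": 1, "relation": 2}

class LevelPromptedSegmenter(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        self.encoder = nn.Conv2d(3, dim, 3, padding=1)  # toy backbone
        self.prompts = nn.Embedding(len(LEVELS), dim)   # one prompt per level
        self.decoder = nn.Conv2d(dim, 1, 1)             # shared mask head

    def forward(self, image, level):
        feats = self.encoder(image)                     # B x dim x H x W
        prompt = self.prompts(torch.tensor([LEVELS[level]]))
        feats = feats * prompt[..., None, None]         # condition on level
        return self.decoder(feats).sigmoid()            # per-pixel mask

model = LevelPromptedSegmenter()
image = torch.randn(1, 3, 32, 32)
for level in LEVELS:                                    # same weights, three tasks
    print(level, model(image, level).shape)             # -> (1, 1, 32, 32)
```

Sharing weights across levels is also what makes multi-dataset multi-task training natural here: datasets annotated at different levels can each supervise the same model through their own prompt.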