DDNet: Dual-path Decoder Network for Occlusion Relationship Reasoning
Occlusion relationship reasoning based on convolutional neural networks
consists of two subtasks: occlusion boundary extraction and occlusion
orientation inference. Due to the essential differences between the two
subtasks in the feature expression at the higher and lower stages, it is
challenging to carry them out simultaneously in one network. To address this
issue, we propose a novel Dual-path Decoder Network, which uniformly extracts
occlusion information at higher stages and separates into two paths to recover
boundary and occlusion orientation respectively in lower stages. Besides,
considering that the representation of occlusion orientation restricts occlusion
orientation learning, we design a new orthogonal representation for occlusion
orientation and propose the Orthogonal Orientation Regression loss, which
removes the mismatch between occlusion representation and learning and
further promotes occlusion orientation learning. Finally, we apply a
multi-scale loss together with our proposed orientation regression loss to
guide the boundary and orientation path learning respectively. Experiments
demonstrate that our proposed method achieves state-of-the-art results on PIOD
and BSDS ownership datasets.
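The abstract above does not define its orthogonal representation, so the following is only a minimal sketch of the general idea it points at: regressing an orientation as two orthogonal components instead of a raw angle, which avoids the jump at the ±π boundary. The (cos, sin) encoding and every function name here are assumptions made for illustration; this is not the paper's Orthogonal Orientation Regression loss.

```python
import math

def encode_orientation(theta):
    """Assumed encoding: an angle as two orthogonal components (cos, sin)."""
    return (math.cos(theta), math.sin(theta))

def decode_orientation(vec):
    """Recover the angle in (-pi, pi] from its two components."""
    x, y = vec
    return math.atan2(y, x)

def angular_gap(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)

def squared_vector_error(a, b):
    """Squared distance between the encoded forms of two angles."""
    (ax, ay), (bx, by) = encode_orientation(a), encode_orientation(b)
    return (ax - bx) ** 2 + (ay - by) ** 2

# Two orientations that are nearly identical but sit on opposite sides of the wrap.
target = math.pi - 0.01
prediction = -math.pi + 0.01

raw_error = abs(target - prediction)            # large, although the angles almost coincide
true_gap = angular_gap(target, prediction)      # small
vector_error = squared_vector_error(target, prediction)  # small as well

print(raw_error, true_gap, vector_error)
```

A raw-angle regression target would penalize the prediction above heavily, while the two-component target treats it as nearly correct, which is the kind of mismatch between representation and learning the abstract refers to.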
Parsing Occluded People by Flexible Compositions
This paper presents an approach to parsing humans when there is significant
occlusion. We model humans using a graphical model which has a tree structure
building on recent work [32, 6] and exploit the connectivity prior that, even
in presence of occlusion, the visible nodes form a connected subtree of the
graphical model. We call each connected subtree a flexible composition of
object parts. This involves a novel method for learning occlusion cues. During
inference we need to search over a mixture of different flexible models. By
exploiting part sharing, we show that this inference can be done extremely
efficiently requiring only twice as many computations as searching for the
entire object (i.e., not modeling occlusion). We evaluate our model on the
standard benchmarked "We Are Family" Stickmen dataset and obtain significant
performance improvements over the best alternative algorithms.
Comment: CVPR 15 Camera Read
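To make the notion of a flexible composition concrete, the sketch below counts the connected subtrees of a small tree-structured part model, that is, the number of candidate visible-part configurations the inference has to consider. The six-part stick figure and all names are hypothetical, and this is plain enumeration by dynamic programming rather than the paper's part-sharing inference.

```python
from collections import defaultdict

def count_connected_subtrees(edges, root):
    """Count the non-empty connected subtrees of an undirected tree."""
    neighbours = defaultdict(list)
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    total = 0

    def rooted_count(node, parent):
        # Number of connected subtrees whose topmost node is `node`:
        # each child branch is either dropped (1 way) or kept in one of
        # the forms counted for that child.
        nonlocal total
        ways = 1
        for child in neighbours[node]:
            if child != parent:
                ways *= 1 + rooted_count(child, node)
        total += ways  # every connected subtree has exactly one topmost node
        return ways

    rooted_count(root, None)
    return total

# Hypothetical six-part upper-body model (part names are illustrative only).
stick_figure = [
    ("torso", "head"),
    ("torso", "left_upper_arm"),
    ("left_upper_arm", "left_lower_arm"),
    ("torso", "right_upper_arm"),
    ("right_upper_arm", "right_lower_arm"),
]

print(count_connected_subtrees(stick_figure, "torso"))
```

Even this small model yields many more compositions than parts, which is why sharing computation across compositions matters for keeping the search close to the cost of matching the full object.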
Occlusion-Aware Instance Segmentation via BiLayer Network Architectures
Segmenting highly-overlapping image objects is challenging, because there is
typically no distinction between real object contours and occlusion boundaries
on images. Unlike previous instance segmentation methods, we model image
formation as a composition of two overlapping layers, and propose Bilayer
Convolutional Network (BCNet), where the top layer detects occluding objects
(occluders) and the bottom layer infers partially occluded instances
(occludees). The explicit modeling of occlusion relationship with bilayer
structure naturally decouples the boundaries of both the occluding and occluded
instances, and considers the interaction between them during mask regression.
We investigate the efficacy of bilayer structure using two popular
convolutional network designs, namely, Fully Convolutional Network (FCN) and
Graph Convolutional Network (GCN). Further, we formulate bilayer decoupling
using the vision transformer (ViT), by representing instances in the image as
separate learnable occluder and occludee queries. Large and consistent
improvements using one/two-stage and query-based object detectors with various
backbones and network layer choices validate the generalization ability of
bilayer decoupling, as shown by extensive experiments on image instance
segmentation benchmarks (COCO, KINS, COCOA) and video instance segmentation
benchmarks (YTVIS, OVIS, BDD100K MOTS), especially for heavy occlusion cases.
Code and data are available at https://github.com/lkeab/BCNet.
Comment: Extended version of "Deep Occlusion-Aware Instance Segmentation with
Overlapping BiLayers", CVPR 2021 (arXiv:2103.12340)
Occlusion reasoning for multiple object visual tracking
Thesis (Ph.D.)--Boston University
Occlusion reasoning for visual object tracking in uncontrolled environments is a challenging problem. It becomes significantly more difficult when dense groups of indistinguishable objects are present in the scene that cause frequent inter-object interactions and occlusions. We present several practical solutions that tackle the inter-object occlusions for video surveillance applications.
In particular, this thesis proposes three methods. First, we propose "reconstruction-tracking," an online multi-camera spatial-temporal data association method for tracking large groups of objects imaged with low resolution. As a variant of the well-known Multiple-Hypothesis-Tracker, our approach localizes the positions of objects in 3D space with possibly occluded observations from multiple camera views and performs temporal data association in 3D. Second, we develop "track linking," a class of offline batch processing algorithms for long-term occlusions, where the decision has to be made based on the observations from the entire tracking sequence. We construct a graph representation to characterize occlusion events and propose an efficient graph-based/combinatorial algorithm to resolve occlusions.
Third, we propose a novel Bayesian framework where detection and data association are combined into a single module and solved jointly. Almost all traditional tracking systems address the detection and data association tasks separately in sequential order. Such a design implies that the output of the detector has to be reliable in order to make the data association work. Our framework takes advantage of the often complementary nature of the two subproblems, which not only avoids the error propagation issue from which traditional "detection-tracking approaches" suffer but also eschews common heuristics such as "nonmaximum suppression" of hypotheses by modeling the likelihood of the entire image.
The thesis describes a substantial number of experiments involving challenging and notably distinct simulated and real data, including infrared and visible-light data sets that we recorded ourselves or took from publicly available data sets. In these videos, the number of objects ranges from a dozen to a hundred per frame in both monocular and multiple views. The experiments demonstrate that our approaches achieve results comparable to those of state-of-the-art approaches.
Scrutinizing and De-Biasing Intuitive Physics with Neural Stethoscopes
Visually predicting the stability of block towers is a popular task in the
domain of intuitive physics. While previous work focusses on prediction
accuracy, a one-dimensional performance measure, we provide a broader analysis
of the learned physical understanding of the final model and how the learning
process can be guided. To this end, we introduce neural stethoscopes as a
general purpose framework for quantifying the degree of importance of specific
factors of influence in deep neural networks as well as for actively promoting
and suppressing information as appropriate. In doing so, we unify concepts from
multitask learning as well as training with auxiliary and adversarial losses.
We apply neural stethoscopes to analyse the state-of-the-art neural network for
stability prediction. We show that the baseline model is susceptible to being
misled by incorrect visual cues. This leads to a performance breakdown to the
level of random guessing when training on scenarios where visual cues are
inversely correlated with stability. Using stethoscopes to promote meaningful
feature extraction increases performance from 51% to 90% prediction accuracy.
Conversely, training on an easy dataset where visual cues are positively
correlated with stability, the baseline model learns a bias leading to poor
performance on a harder dataset. Using an adversarial stethoscope, the network
is successfully de-biased, leading to a performance increase from 66% to 88%.
Occluded Person Re-Identification via Relational Adaptive Feature Correction Learning
Occluded person re-identification (Re-ID) in images captured by multiple
cameras is challenging because the target person is occluded by pedestrians or
objects, especially in crowded scenes. In addition to the processes performed
during holistic person Re-ID, occluded person Re-ID involves the removal of
obstacles and the detection of partially visible body parts. Most existing
methods utilize the off-the-shelf pose or parsing networks as pseudo labels,
which are prone to error. To address these issues, we propose a novel Occlusion
Correction Network (OCNet) that corrects features through relational-weight
learning and obtains diverse and representative features without using external
networks. In addition, we present a simple concept of a center feature in order
to provide an intuitive solution to pedestrian occlusion scenarios.
Furthermore, we suggest the idea of Separation Loss (SL) for focusing on
different parts between global features and part features. We conduct extensive
experiments on five challenging benchmark datasets for occluded and holistic
Re-ID tasks to demonstrate that our method achieves superior performance to
state-of-the-art methods, especially on occluded scenes.
Comment: ICASSP 202