Intrinsic Physical Concepts Discovery with Object-Centric Predictive Models
The ability to discover abstract physical concepts and understand how they
work in the world through observation lies at the core of human intelligence. The
acquisition of this ability is based on compositionally perceiving the
environment in terms of objects and relations in an unsupervised manner. Recent
approaches learn object-centric representations and capture visually observable
concepts of objects, e.g., shape, size, and location. In this paper, we take a
step forward and try to discover and represent intrinsic physical concepts such
as mass and charge. We introduce the PHYsical Concepts Inference NEtwork
(PHYCINE), a system that infers physical concepts at different levels of
abstraction without supervision. The key insights underlying PHYCINE are
twofold: commonsense knowledge emerges with prediction, and physical concepts
at different abstraction levels should be reasoned about in a bottom-up
fashion. Empirical
evaluation demonstrates that variables inferred by our system work in
accordance with the properties of the corresponding physical concepts. We also
show that object representations containing the discovered physical concept
variables help achieve better performance in causal reasoning tasks, i.e.,
ComPhy.
Comment: Accepted to Computer Vision and Pattern Recognition (CVPR) 2023
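The abstract's core idea, that intrinsic latents such as mass and charge can be recovered by requiring them to explain future observations, can be illustrated with a minimal sketch. This is not the paper's architecture; the predictor, dimensions, and optimization loop below are all hypothetical, and the per-object visual states are assumed to come from a lower-level object-centric encoder.

```python
# Hypothetical sketch of prediction-driven inference of intrinsic physical
# latents (e.g., mass, charge); names and shapes are assumptions, not PHYCINE's API.
import torch
import torch.nn as nn

class DynamicsPredictor(nn.Module):
    """Predicts each object's next visual state from its current state
    plus per-object intrinsic latents."""
    def __init__(self, state_dim=16, intrinsic_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + intrinsic_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, intrinsic):
        return self.net(torch.cat([state, intrinsic], dim=-1))

def infer_intrinsics(predictor, states, steps=100, lr=1e-2):
    """Bottom-up step: visual states (shape, location) are given by a lower
    level; intrinsic latents are fit so prediction error is minimized,
    i.e., 'commonsense knowledge emerges with prediction'."""
    for p in predictor.parameters():      # keep the learned dynamics fixed
        p.requires_grad_(False)
    T, n, _ = states.shape                # (time, num_objects, state_dim)
    intrinsic = torch.zeros(n, 2, requires_grad=True)  # e.g., mass, charge
    opt = torch.optim.Adam([intrinsic], lr=lr)
    for _ in range(steps):
        pred = predictor(states[:-1], intrinsic.expand(T - 1, n, 2))
        loss = (pred - states[1:]).pow(2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return intrinsic.detach()
```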
Broadcasting Convolutional Network for Visual Relational Reasoning
In this paper, we propose the Broadcasting Convolutional Network (BCN) that
extracts key object features from the global field of an entire input image and
recognizes their relationship with local features. BCN is a simple network
module that collects effective spatial features, embeds location information
and broadcasts them to the entire feature maps. We further introduce the
Multi-Relational Network (multiRN) that improves the existing Relation Network
(RN) by utilizing the BCN module. In pixel-based relation reasoning problems,
with the help of BCN, multiRN extends the concept of `pairwise relations' in
conventional RNs to `multiwise relations' by relating each object with multiple
objects at once. This yields O(n) complexity for n objects, a vast
computational gain over RNs, which take O(n^2). Through experiments, multiRN
achieves state-of-the-art performance on the CLEVR dataset, demonstrating the
usability of BCN for relation reasoning problems.
Comment: Accepted paper at ECCV 2018. 24 pages
Context Disentangling and Prototype Inheriting for Robust Visual Grounding
Visual grounding (VG) aims to locate a specific target in an image based on a
given language query. The discriminative information from context is important
for distinguishing the target from other objects, particularly for the targets
that have the same category as others. However, most previous methods
underestimate such information. Moreover, they are usually designed for the
standard scene (without any novel object), which limits their generalization to
the open-vocabulary scene. In this paper, we propose a novel framework with
context disentangling and prototype inheriting for robust visual grounding to
handle both scenes. Specifically, context disentangling separates the referent
and context features, yielding better discrimination between them. Prototype
inheriting reuses prototypes discovered from the disentangled visual features
and stored in a prototype bank, fully exploiting the seen data, especially in
the open-vocabulary scene. The fused features, obtained via a Hadamard product
of the disentangled linguistic and visual prototype features (which avoids
sharply re-weighting one type of feature over the other), are then attached to
a special token and fed to a vision Transformer encoder for bounding box
regression. Extensive experiments are
conducted on both standard and open-vocabulary scenes. The performance
comparisons indicate that our method outperforms the state-of-the-art methods
in both scenarios. The code is available at
https://github.com/WayneTomas/TransCP
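The fusion-and-regression pipeline the abstract describes can be sketched as below: a Hadamard (element-wise) product of the disentangled linguistic and visual prototype features, a prepended special token, and a Transformer encoder whose token output regresses the box. Dimensions, layer settings, and the head layout are assumptions, not the released TransCP code.

```python
# Hedged sketch of Hadamard fusion + special-token box regression.
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        self.reg_token = nn.Parameter(torch.zeros(1, 1, dim))  # special token
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.bbox_head = nn.Linear(dim, 4)  # (cx, cy, w, h), normalized

    def forward(self, vis_feats, lang_feats):
        # vis_feats, lang_feats: (batch, num_tokens, dim), already disentangled.
        # The element-wise product fuses modalities without a learned gate
        # that could sharply re-weight one side over the other.
        fused = vis_feats * lang_feats
        tok = self.reg_token.expand(fused.size(0), -1, -1)
        out = self.encoder(torch.cat([tok, fused], dim=1))
        # Regress the box from the special token's output embedding.
        return self.bbox_head(out[:, 0]).sigmoid()
```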