PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation
Today, the scene graph generation (SGG) task is largely limited in realistic
scenarios, mainly due to the extremely long-tailed bias of the predicate annotation
distribution. Thus, tackling the class imbalance problem of SGG is critical and
challenging. In this paper, we first discover that when predicate labels are
strongly correlated with each other, prevalent re-balancing strategies (e.g.,
re-sampling and re-weighting) either over-fit the tail
data (e.g., bench sitting on sidewalk rather than on) or still suffer the
adverse effects of the original uneven distribution (e.g., aggregating the varied
parked on/standing on/sitting on into on). We argue the principal reason is
that re-balancing strategies are sensitive to the frequencies of predicates yet
blind to their relatedness, which may play a more important role in promoting the
learning of predicate features. Therefore, we propose a novel
Predicate-Correlation Perception Learning (PCPL for short) scheme to adaptively
seek out appropriate loss weights by directly perceiving and utilizing the
correlation among predicate classes. Moreover, our PCPL framework is further
equipped with a graph encoder module to better extract context features.
Extensive experiments on the benchmark VG150 dataset show that the proposed
PCPL performs markedly better on tail classes while preserving
performance on head ones, significantly outperforming previous
state-of-the-art methods.
Comment: To appear at ACMMM 202
Unbiased Directed Object Attention Graph for Object Navigation
Object navigation tasks require agents to locate specific objects in unknown
environments based on visual information. Previously, graph convolutions were
used to implicitly explore the relationships between objects. However, due to
differences in visibility among objects, it is easy to generate biases in
object attention. Thus, in this paper, we propose a directed object attention
(DOA) graph to guide the agent in explicitly learning the attention
relationships between objects, thereby reducing the object attention bias. In
particular, we use the DOA graph to perform unbiased adaptive object attention
(UAOA) on the object features and unbiased adaptive image attention (UAIA) on
the raw images, respectively. To distinguish features in different branches, a
concise adaptive branch energy distribution (ABED) method is proposed. We
assess our methods on the AI2-Thor dataset. Compared with the state-of-the-art
(SOTA) method, our method achieves increases of 7.4%, 8.1% and 17.6% in success rate
(SR), success weighted by path length (SPL) and success weighted by action
efficiency (SAE), respectively.
Comment: 13 pages, ready for ACM Multimedia, under review
Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting
Images contain rich relational knowledge that can help machines understand
the world. Existing methods for visual knowledge extraction often rely on a
pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation
types), restricting the expressiveness of the extracted knowledge. In this
work, we take a first step toward a new paradigm of open visual knowledge
extraction. To achieve this, we present OpenVik, which consists of an open
relational region detector to detect regions potentially containing relational
knowledge and a visual knowledge generator that generates format-free knowledge
by prompting the large multimodality model with the detected region of
interest. We also explore two data enhancement techniques for diversifying the
generated format-free visual knowledge. Extensive knowledge quality evaluations
highlight the correctness and uniqueness of the open visual knowledge extracted
by OpenVik. Moreover, integrating our extracted knowledge across various visual
reasoning applications shows consistent improvements, indicating the real-world
applicability of OpenVik.
Comment: Accepted to NeurIPS 202