Search CORE

1,520 research outputs found

PCPL: Predicate-Correlation Perception Learning for Unbiased Scene Graph Generation

Author: Chen Yaowu
Hua Xian-Sheng
Huang Jianqiang
Jiang Rongxin
Jin Zhongming
Shen Chen
Yan Shaotian
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 02/09/2020
Field of study

Today, scene graph generation(SGG) task is largely limited in realistic scenarios, mainly due to the extremely long-tailed bias of predicate annotation distribution. Thus, tackling the class imbalance trouble of SGG is critical and challenging. In this paper, we first discover that when predicate labels have strong correlation with each other, prevalent re-balancing strategies(e.g., re-sampling and re-weighting) will give rise to either over-fitting the tail data(e.g., bench sitting on sidewalk rather than on), or still suffering the adverse effect from the original uneven distribution(e.g., aggregating varied parked on/standing on/sitting on into on). We argue the principal reason is that re-balancing strategies are sensitive to the frequencies of predicates yet blind to their relatedness, which may play a more important role to promote the learning of predicate features. Therefore, we propose a novel Predicate-Correlation Perception Learning(PCPL for short) scheme to adaptively seek out appropriate loss weights by directly perceiving and utilizing the correlation among predicate classes. Moreover, our PCPL framework is further equipped with a graph encoder module to better extract context features. Extensive experiments on the benchmark VG150 dataset show that the proposed PCPL performs markedly better on tail classes while well-preserving the performance on head ones, which significantly outperforms previous state-of-the-art methods.Comment: To be appeared on ACMMM 202

arXiv.org e-Print Archive

Crossref

Unbiased Directed Object Attention Graph for Object Navigation

Author: Chen Qijun
Dang Ronghao
He Zongtao
Liu Chengju
Shi Zhuofan
Wang Liuyi
Publication venue
Publication date: 09/04/2022
Field of study

Object navigation tasks require agents to locate specific objects in unknown environments based on visual information. Previously, graph convolutions were used to implicitly explore the relationships between objects. However, due to differences in visibility among objects, it is easy to generate biases in object attention. Thus, in this paper, we propose a directed object attention (DOA) graph to guide the agent in explicitly learning the attention relationships between objects, thereby reducing the object attention bias. In particular, we use the DOA graph to perform unbiased adaptive object attention (UAOA) on the object features and unbiased adaptive image attention (UAIA) on the raw images, respectively. To distinguish features in different branches, a concise adaptive branch energy distribution (ABED) method is proposed. We assess our methods on the AI2-Thor dataset. Compared with the state-of-the-art (SOTA) method, our method reports 7.4%, 8.1% and 17.6% increase in success rate (SR), success weighted by path length (SPL) and success weighted by action efficiency (SAE), respectively.Comment: 13 pages, ready to ACM Mutimedia, under revie

arXiv.org e-Print Archive

Fourteenth Biennial Status Report: März 2017 - February 2019

Author
Publication venue: Max-Planck-Institut für Informatik
Publication date: 01/01/2019
Field of study

MPG.PuRe

Machine perception of interactivity in videos

Author: Chen S.
Publication venue
Publication date: 01/01/2023
Field of study

International Migration, Integration and Social Cohesion online publications

Machine perception of interactivity in videos

Author: Chen S.
Publication venue
Publication date: 01/01/2023
Field of study

International Migration, Integration and Social Cohesion online publications

Open Visual Knowledge Extraction via Relation-Oriented Multimodality Model Prompting

Author: Cui Hejie
Fang Xinyu
Kan Xuan
Li Manling
Liu Xin
Song Yangqiu
Xu Ran
Yang Carl
Yu Yue
Zhang Zihan
Publication venue
Publication date: 28/10/2023
Field of study

Images contain rich relational knowledge that can help machines understand the world. Existing methods on visual knowledge extraction often rely on the pre-defined format (e.g., sub-verb-obj tuples) or vocabulary (e.g., relation types), restricting the expressiveness of the extracted knowledge. In this work, we take a first exploration to a new paradigm of open visual knowledge extraction. To achieve this, we present OpenVik which consists of an open relational region detector to detect regions potentially containing relational knowledge and a visual knowledge generator that generates format-free knowledge by prompting the large multimodality model with the detected region of interest. We also explore two data enhancement techniques for diversifying the generated format-free visual knowledge. Extensive knowledge quality evaluations highlight the correctness and uniqueness of the extracted open visual knowledge by OpenVik. Moreover, integrating our extracted knowledge across various visual reasoning applications shows consistent improvements, indicating the real-world applicability of OpenVik.Comment: Accepted to NeurIPS 202

arXiv.org e-Print Archive