Question-guided hybrid convolution for visual question answering
Funded by the National Research Foundation (NRF) Singapore under the International Research Centre @ Singapore Funding Initiative.
A^2-Net: Molecular Structure Estimation from Cryo-EM Density Volumes
Constructing molecular structural models from Cryo-Electron Microscopy
(Cryo-EM) density volumes is the critical last step of structure determination
by Cryo-EM technologies. Methods have evolved from manual construction by
structural biologists to automated approaches that perform 6D
translation-rotation searches, which are extremely compute-intensive. In this
paper, we propose a learning-based method
and formulate this problem as a vision-inspired 3D detection and pose
estimation task. We develop a deep learning framework for amino acid
determination in a 3D Cryo-EM density volume. We also design a sequence-guided
Monte Carlo Tree Search (MCTS) to thread over the candidate amino acids to form
the molecular structure. This framework achieves 91% coverage on our newly
proposed dataset and takes only a few minutes for a typical structure with a
thousand amino acids. Our method is hundreds of times faster and several times
more accurate than existing automated solutions without any human intervention.
Comment: 8 pages, 5 figures, 4 tables
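To make the threading step concrete, here is a toy sketch of how a sequence-guided Monte Carlo Tree Search might thread candidate amino-acid detections into a chain. The candidate format, confidence scores, reward, and all identifiers below are illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch: sequence-guided MCTS threading of amino-acid candidates.
# All names, scores, and the reward scheme are invented for illustration.
import math
import random

random.seed(0)

# Target sequence and detected candidates per residue type: (id, confidence).
SEQUENCE = ["ALA", "CYS", "ASP"]
CANDIDATES = {
    "ALA": [("a1", 0.9), ("a2", 0.4)],
    "CYS": [("c1", 0.7)],
    "ASP": [("d1", 0.3), ("d2", 0.8)],
}
CONF = {cid: s for cands in CANDIDATES.values() for cid, s in cands}

class Node:
    def __init__(self, chain, parent=None):
        self.chain = chain            # candidate ids chosen so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

    def expand(self):
        # Children follow the target sequence order (this is the
        # "sequence-guided" part: only candidates of the next required
        # residue type are considered).
        nxt = SEQUENCE[len(self.chain)]
        for cid, _ in CANDIDATES[nxt]:
            self.children.append(Node(self.chain + [cid], self))

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def rollout(chain):
    # Complete a partial chain randomly; reward = summed confidences.
    chain = list(chain)
    while len(chain) < len(SEQUENCE):
        cid, _ = random.choice(CANDIDATES[SEQUENCE[len(chain)]])
        chain.append(cid)
    return sum(CONF[c] for c in chain)

def mcts(iterations=200):
    root = Node([])
    for _ in range(iterations):
        node = root
        while node.children:                       # selection
            node = max(node.children, key=Node.ucb)
        if len(node.chain) < len(SEQUENCE):        # expansion
            node.expand()
            node = random.choice(node.children)
        reward = rollout(node.chain)               # simulation
        while node:                                # backpropagation
            node.visits += 1
            node.value += reward
            node = node.parent
    # Read out the most-visited path as the final threading.
    chain, node = [], root
    while node.children:
        node = max(node.children, key=lambda n: n.visits)
        chain = node.chain
    return chain

best = mcts()
print(best)
```

On this toy problem the search converges to a full-length threading that respects the target sequence; the real system would score candidates with learned detection confidences and geometric compatibility rather than the flat scores used here.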
Recent, rapid advancement in visual question answering architecture: a review
Understanding visual question answering is going to be crucial for numerous
human activities. However, it presents major challenges at the heart of the
artificial intelligence endeavor. This paper presents an update on the rapid
advancements in visual question answering using images that have occurred in
the last couple of years. Tremendous growth in research on improving visual
question answering system architecture has been published recently, showing the
importance of multimodal architectures. The present article builds on the
review paper by Manmadhan et al. (2020), which discusses several benefits of
visual question answering, and covers subsequent updates in the field.
Comment: 11 pages
Towards Language-guided Visual Recognition via Dynamic Convolutions
In this paper, we aim to establish a unified, end-to-end multi-modal network
by exploring language-guided visual recognition. To
approach this target, we first propose a novel multi-modal convolution module
called Language-dependent Convolution (LaConv). Its convolution kernels are
dynamically generated based on natural language information, which can help
extract differentiated visual features for different multi-modal examples.
Based on the LaConv module, we further build the first fully language-driven
convolution network, termed LaConvNet, which unifies visual recognition and
multi-modal reasoning in one forward structure. To validate
LaConv and LaConvNet, we conduct extensive experiments on four benchmark
datasets of two vision-and-language tasks, i.e., visual question answering
(VQA) and referring expression comprehension (REC). The experimental results
not only show the performance gains of LaConv over existing multi-modal
modules, but also demonstrate the merits of LaConvNet as a unified network,
including its compact size, high generalization ability, and excellent
performance, e.g., +4.7% on RefCOCO+
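The core idea of a language-dependent convolution can be sketched in a few lines: instead of a fixed learned kernel, the kernel is generated from a language embedding, so different questions or expressions produce different visual features from the same image. The generator matrix, embedding size, and kernel size below are illustrative assumptions, not LaConv's actual architecture.

```python
# Minimal numpy sketch of a language-conditioned dynamic convolution.
# All dimensions and the kernel generator are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

EMB_DIM, K = 8, 3                     # language embedding dim, kernel size

# A frozen, randomly initialised kernel generator: maps a language
# embedding to the weights of a K x K convolution kernel.
W_gen = rng.standard_normal((EMB_DIM, K * K)) * 0.1

def generate_kernel(lang_emb):
    """Produce a KxK kernel conditioned on the question/expression."""
    return (lang_emb @ W_gen).reshape(K, K)

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation on a single-channel image."""
    h, w = image.shape
    out = np.empty((h - K + 1, w - K + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + K, j:j + K] * kernel)
    return out

image = rng.standard_normal((6, 6))
q1 = rng.standard_normal(EMB_DIM)     # stand-in embedding for question 1
q2 = rng.standard_normal(EMB_DIM)     # stand-in embedding for question 2

f1 = conv2d(image, generate_kernel(q1))
f2 = conv2d(image, generate_kernel(q2))

print(f1.shape)                       # (4, 4)
```

In the full model the kernel generator would itself be learned end-to-end from a language encoder, so the "differentiated visual features for different multi-modal examples" fall out of the same mechanism shown here: distinct embeddings yield distinct kernels and hence distinct feature maps.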