Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Visual Question Answering is a challenging task, as it requires seamless
interaction between perceptual, linguistic, and background knowledge systems.
While the recent progress of vision-and-language models such as BLIP has led to improved performance on this task, little is understood about how well such models perform across different kinds of questions and reasoning types.
As our initial analysis of BLIP-family models revealed difficulty with
answering fine-detail questions, we investigate the following question: Can
visual cropping be employed to improve the performance of state-of-the-art
visual question answering models on fine-detail questions? Given the recent
success of the BLIP-family models, we study a zero-shot and a fine-tuned BLIP
model. We define three controlled subsets of the popular VQA-v2 benchmark to
measure whether cropping can help model performance. Besides human cropping, we
devise two automatic cropping strategies: one based on CLIP multi-modal embeddings and one based on the gradients of the BLIP visual QA model. Our experiments demonstrate that the
performance of BLIP model variants can be significantly improved through human
cropping, and automatic cropping methods can produce comparable benefits. A
deeper analysis of our findings indicates that the performance enhancement is more pronounced for zero-shot models than for fine-tuned models, and stronger for smaller bounding boxes than for larger ones. We perform case studies to
connect quantitative differences with qualitative observations across question
types and datasets. Finally, we see that the cropping enhancement is robust, as
we gain an improvement of 4.59% (absolute) in the general VQA-random task by
simply feeding the model a concatenation of the original image and its gradient-based crop. We make our code available to facilitate further innovation on visual
cropping methods for question answering.

Comment: 16 pages, 5 figures, 7 tables
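As an illustration of the CLIP-based automatic cropping idea described in the abstract, the sketch below scores sliding-window crops of an image by their CLIP similarity to the question and returns the best-matching crop. This is a minimal sketch, assuming the Hugging Face transformers CLIP checkpoint openai/clip-vit-base-patch32 and a simple sliding-window search; the function name clip_guided_crop, the window and stride fractions, and the example file name are illustrative and are not taken from the paper.

# Hedged sketch: question-guided visual cropping via CLIP similarity.
# Assumes Hugging Face `transformers` and a sliding-window crop search;
# the paper's exact cropping procedure may differ.
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_guided_crop(image: Image.Image, question: str,
                     window_frac: float = 0.5, stride_frac: float = 0.25) -> Image.Image:
    """Return the sliding-window crop whose CLIP embedding best matches the question."""
    w, h = image.size
    win_w, win_h = min(int(w * window_frac), w), min(int(h * window_frac), h)
    stride_w = max(1, int(w * stride_frac))
    stride_h = max(1, int(h * stride_frac))
    crops, boxes = [], []
    # Enumerate candidate crops on a regular grid.
    for top in range(0, h - win_h + 1, stride_h):
        for left in range(0, w - win_w + 1, stride_w):
            box = (left, top, left + win_w, top + win_h)
            crops.append(image.crop(box))
            boxes.append(box)
    # Score every crop against the question text with CLIP.
    inputs = processor(text=[question], images=crops, return_tensors="pt", padding=True)
    with torch.no_grad():
        sims = model(**inputs).logits_per_image.squeeze(1)  # one similarity score per crop
    return image.crop(boxes[int(sims.argmax())])

# Hypothetical usage:
# cropped = clip_guided_crop(Image.open("example.jpg"), "What color is the bird's beak?")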