8 research outputs found
Learning Representations Toward the Understanding of Out-of-Distribution for Neural Networks
Data-driven representations achieve powerful generalization performance in diverse information processing tasks. However, the generalization is often limited to test data from the same distribution as the training data (in-distribution, ID). In addition, neural networks often make overconfident and incorrect predictions for data outside the training distribution, called out-of-distribution (OOD) data. In this dissertation, we develop representations that can characterize OOD data for neural networks and utilize this characterization to generalize efficiently to OOD data. We categorize data-driven representations based on information flow in neural networks and develop novel gradient-based representations. In particular, we utilize backpropagated gradients to represent what a neural network has not learned from the data. The capability of gradient-based representations for OOD characterization is comprehensively analyzed in comparison with standard activation-based representations. We also utilize a regularization technique for the gradient-based representations to better characterize OOD data. Finally, we develop activation-based representations learned with auxiliary information to generalize efficiently to OOD data. We use an unsupervised learning framework to learn aligned representations of visual and attribute data. These aligned representations are utilized to calibrate the overconfident predictions toward ID classes, and the generalization performance is validated in the application of generalized zero-shot learning (GZSL). The developed GZSL method, GatingAE, achieves state-of-the-art performance in generalizing to OOD with significantly fewer model parameters than other state-of-the-art methods.
Ph.D
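The gating idea described above can be sketched as follows. This is an illustrative, hypothetical calibration scheme, not the actual GatingAE implementation; `gate_prob` stands in for a learned estimate that an input belongs to a seen (ID) class.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax."""
    e = np.exp(z - np.max(z))
    return e / e.sum()

def gated_prediction(seen_logits, unseen_logits, gate_prob):
    """Hypothetical sketch: down-weight overconfident seen-class (ID)
    scores with a gate estimating the probability that the sample comes
    from a seen class. Returns the index of the predicted class in the
    concatenated [seen classes, unseen classes] label space."""
    seen = softmax(np.asarray(seen_logits)) * gate_prob
    unseen = softmax(np.asarray(unseen_logits)) * (1.0 - gate_prob)
    return int(np.concatenate([seen, unseen]).argmax())

# A high gate keeps the confident seen-class prediction; a low gate
# shifts probability mass toward the unseen classes.
print(gated_prediction([2.0, 1.0], [0.0, 3.0], 0.9))
print(gated_prediction([2.0, 1.0], [0.0, 3.0], 0.1))
```

Without the gate, the overconfident seen-class logits would always dominate; the gate rebalances the two label spaces before the argmax.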
Distorted Representation Space Characterization Through Backpropagated Gradients
In this paper, we utilize weight gradients from backpropagation to
characterize the representation space learned by deep learning algorithms. We
demonstrate the utility of such gradients in applications including perceptual
image quality assessment and out-of-distribution classification. The
applications are chosen to validate the effectiveness of gradients as features
when the test image distribution is distorted from the train image
distribution. In both applications, the proposed gradient-based features
outperform activation-based features. In image quality assessment, the proposed
approach is compared with other state-of-the-art approaches and is generally
the top-performing method on the TID2013 and MULTI-LIVE databases in terms of
accuracy, consistency, linearity, and monotonic behavior. Finally, we analyze
the effect of regularization on gradients using CURE-TSR dataset for
out-of-distribution classification.
Comment: 5 pages, 5 figures, 2 tables, ICIP 201
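As a toy illustration of the gradient features above, consider a linear model trained to reconstruct in-distribution inputs: the norm of the backpropagated weight gradient of the reconstruction loss can serve as an OOD score. This is a minimal sketch of the idea under that simplifying assumption, not the paper's networks or datasets.

```python
import numpy as np

def gradient_ood_score(W, x):
    """Norm of the backpropagated weight gradient of the reconstruction
    loss 0.5 * ||x - W @ x||^2; a large gradient marks an input the
    model has not learned to reconstruct (a candidate OOD sample)."""
    x = np.asarray(x, dtype=float)
    err = x - W @ x           # reconstruction error
    grad = -np.outer(err, x)  # dL/dW for the loss above
    return float(np.linalg.norm(grad))

# Toy model that has "learned" only the first coordinate of its inputs.
W = np.diag([1.0, 0.0])
id_score = gradient_ood_score(W, [1.0, 0.0])   # in-distribution input
ood_score = gradient_ood_score(W, [0.0, 1.0])  # out-of-distribution input
print(id_score < ood_score)
```

The in-distribution input is reconstructed perfectly, so its gradient vanishes, while the out-of-distribution input produces a nonzero gradient.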
Masked Vision and Language Modeling for Multi-modal Representation Learning
In this paper, we study how to use masked signal modeling in vision and
language (V+L) representation learning. Instead of developing masked language
modeling (MLM) and masked image modeling (MIM) independently, we propose to
build joint masked vision and language modeling, where the masked signal of one
modality is reconstructed with the help from another modality. This is
motivated by the nature of image-text paired data: the image and the text
convey almost the same information, but in different formats. The
masked signal reconstruction of one modality conditioned on another modality
can also implicitly learn cross-modal alignment between language tokens and
image patches. Our experiments on various V+L tasks show that the proposed
method not only achieves state-of-the-art performance when using a large amount
of data, but also outperforms competing methods by a significant margin in
limited-training-data regimes.
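The joint masking scheme can be sketched as follows: mask tokens in each modality, reconstruct them conditioned on the other, unmasked modality, and sum the two reconstruction losses. The `reconstruct` function stands in for the actual cross-modal model; everything here is an illustrative assumption, not the paper's architecture.

```python
import numpy as np

def mask_tokens(tokens, ratio, rng):
    """Zero out a random fraction of token vectors; return the masked copy and mask."""
    n = tokens.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[rng.choice(n, size=max(1, int(n * ratio)), replace=False)] = True
    masked = tokens.copy()
    masked[mask] = 0.0
    return masked, mask

def joint_masked_loss(img_tokens, txt_tokens, reconstruct, ratio=0.3, seed=0):
    """Masked image modeling conditioned on text plus masked language
    modeling conditioned on the image, combined into one joint objective."""
    rng = np.random.default_rng(seed)
    m_img, im = mask_tokens(img_tokens, ratio, rng)
    m_txt, tm = mask_tokens(txt_tokens, ratio, rng)
    img_hat = reconstruct(m_img, txt_tokens)  # reconstruct image from text
    txt_hat = reconstruct(m_txt, img_tokens)  # reconstruct text from image
    mim = np.mean((img_hat[im] - img_tokens[im]) ** 2)  # masked-image loss
    mlm = np.mean((txt_hat[tm] - txt_tokens[tm]) ** 2)  # masked-text loss
    return mim + mlm

# If the paired modalities carry identical information and the "model"
# simply copies the conditioning modality, the joint loss vanishes.
tokens = np.random.default_rng(1).normal(size=(4, 3))
copy_other = lambda masked, cond: cond
print(joint_masked_loss(tokens, tokens.copy(), copy_other))
```

The zero-loss toy case mirrors the abstract's motivation: because the two modalities convey almost the same information, each can supervise the other's masked reconstruction.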
Generate then Select: Open-ended Visual Question Answering Guided by World Knowledge
The open-ended Visual Question Answering (VQA) task requires AI models to
jointly reason over visual and natural language inputs using world knowledge.
Recently, pre-trained Language Models (PLM) such as GPT-3 have been applied to
the task and shown to be powerful world knowledge sources. However, these
methods suffer from low knowledge coverage caused by PLM bias (the tendency
to generate certain tokens over other tokens regardless of prompt changes)
and high dependency on PLM quality: only models using GPT-3 can achieve the
best result.
To address the aforementioned challenges, we propose RASO: a new VQA pipeline
that deploys a generate-then-select strategy guided by world knowledge for the
first time. Rather than following the de facto standard of training a multi-modal
model that directly generates the VQA answer, RASO first adopts a PLM to generate
all the possible answers, and then trains a lightweight answer selection model
to pick the correct one. As shown in our analysis, RASO expands the knowledge
coverage from in-domain training data by a large margin. We provide extensive
experimentation and show the effectiveness of our pipeline by advancing the
state-of-the-art by 4.1% on OK-VQA, without additional computation cost. Code
and models are released at http://cogcomp.org/page/publication_view/1010
Comment: Accepted to ACL 2023 Finding
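The generate-then-select strategy reduces to the following control flow. The function names are hypothetical placeholders for the PLM and the selection model, not RASO's actual API.

```python
def generate_then_select(question, image_feat, generate_candidates, score):
    """Hypothetical sketch of a generate-then-select VQA pipeline:
    a PLM proposes candidate answers, then a lightweight selection
    model picks the best one given the question and image features."""
    candidates = generate_candidates(question)  # generation step (PLM)
    # selection step: score each candidate and keep the best
    return max(candidates, key=lambda a: score(question, image_feat, a))

# Dummy stand-ins for the PLM and the selection model.
propose = lambda q: ["red", "blue", "green"]
select_score = lambda q, img, a: {"red": 0.2, "blue": 0.7, "green": 0.1}[a]
print(generate_then_select("what color is the sky?", None, propose, select_score))
```

Splitting generation from selection is what lets the selection model stay lightweight: it only ranks a short candidate list instead of generating answers over the full vocabulary.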