Tag: Automated Image Captioning
Many websites remain non-ADA-compliant, containing images that lack accompanying textual descriptions. This leaves sight-impaired individuals unable to fully enjoy the rich wonders of the web. To address this inequity, our research aims to create an autonomous system capable of generating semantically accurate descriptions of images. This problem involves two tasks: recognizing an image and linguistically describing it. Our solution uses state-of-the-art deep learning: a convolutional neural network that learns to extract an image's salient features, and a recurrent neural network that learns to generate structured, coherent sentences. These two networks are merged into a single model that takes arbitrary images as input and outputs relevant captions. The model's accuracy is quantified using language metrics such as the Bilingual Evaluation Understudy (BLEU), originally designed to rate machine translation systems. After training, we hope to validate our approach by deploying our model on local, online social media feeds.
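Since the abstract names BLEU as its evaluation metric, here is a minimal sketch of scoring one generated caption against reference captions using NLTK's BLEU implementation; the captions themselves are hypothetical placeholders, not taken from the paper.

```python
# Minimal sketch: BLEU score of one generated caption against references,
# via NLTK. All captions below are hypothetical placeholders.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

references = [
    "a dog runs across the grassy field".split(),
    "a brown dog is running on grass".split(),
]
candidate = "a dog is running on the grass".split()

# Smoothing avoids zero scores when a higher-order n-gram has no match.
smooth = SmoothingFunction().method1
score = sentence_bleu(references, candidate, smoothing_function=smooth)
print(f"BLEU: {score:.3f}")
```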
Image Captioning with Unseen Objects
Image caption generation is a long-standing and challenging problem at the intersection of computer vision and natural language processing. A number of recently proposed approaches utilize a fully supervised object recognition model within the captioning approach. Such models, however, tend to generate sentences which only consist of objects predicted by the recognition models, excluding instances of classes without labelled training examples. In this paper, we propose a new, challenging scenario that targets the image captioning problem in a fully zero-shot learning setting, where the goal is to generate captions of test images containing objects that are not seen during training. The proposed approach jointly uses a novel zero-shot object detection model and a template-based sentence generator. Our experiments show promising results on the COCO dataset.
Comment: To appear in British Machine Vision Conference (BMVC) 2019
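The paper's templates are not given in the abstract; as a purely illustrative sketch, assuming the zero-shot detector returns (class name, confidence) pairs, a template-based generator might fill a caption like this:

```python
# Hypothetical template-based sentence generator. The detector output
# format and the template are assumptions, not the paper's actual design.
def generate_caption(detections, threshold=0.5):
    """Slot detected object names into a fixed caption template."""
    objects = [name for name, conf in detections if conf >= threshold]
    if not objects:
        return "A photo."
    if len(objects) == 1:
        return f"A photo of a {objects[0]}."
    return f"A photo of a {', a '.join(objects[:-1])} and a {objects[-1]}."

print(generate_caption([("zebra", 0.91), ("car", 0.73)]))
# -> "A photo of a zebra and a car."
```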
ADVISE: Symbolism and External Knowledge for Decoding Advertisements
In order to convey the most content in their limited space, advertisements embed references to outside knowledge via symbolism. For example, a motorcycle stands for adventure (a positive property the ad wants associated with the product being sold), and a gun stands for danger (a negative property to dissuade viewers from undesirable behaviors). We show how to use symbolic references to better understand the meaning of an ad. We further show how anchoring ad understanding in general-purpose object recognition and image captioning improves results. We formulate the ad understanding task as matching the ad image to human-generated statements that describe the action the ad prompts, and the rationale it provides for taking this action. Our proposed method outperforms the state of the art on this task, and on an alternative formulation of question answering on ads. We show additional applications of our learned representations for matching ads to slogans and clustering ads according to their topic, without extra training.
Comment: To appear in Proceedings of the European Conference on Computer Vision (ECCV 2018)
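The matching formulation can be illustrated with a minimal sketch: rank candidate statements by cosine similarity to the ad image in a shared embedding space. The embeddings below are random stand-ins; ADVISE's actual encoders and symbol/knowledge features are not reproduced here.

```python
# Minimal sketch of image-to-statement matching by cosine similarity.
# Random vectors stand in for learned embeddings (an assumption).
import numpy as np

rng = np.random.default_rng(0)
image_emb = rng.normal(size=256)            # embedding of one ad image
statement_embs = rng.normal(size=(5, 256))  # embeddings of 5 candidate statements

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

scores = np.array([cosine(image_emb, s) for s in statement_embs])
best = int(np.argmax(scores))
print(f"best-matching statement index: {best}, score: {scores[best]:.3f}")
```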
Fooling Vision and Language Models Despite Localization and Attention Mechanism
Adversarial attacks are known to succeed on classifiers, but it has been an open question whether more complex vision systems are vulnerable. In this paper, we study adversarial examples for vision and language models, which incorporate natural language understanding and complex structures such as attention, localization, and modular architectures. In particular, we investigate attacks on a dense captioning model and on two visual question answering (VQA) models. Our evaluation shows that we can generate adversarial examples with a high success rate (i.e., > 90%) for these models. Our work sheds new light on adversarial attacks on vision systems that have a language component, and shows that attention, bounding-box localization, and compositional internal structures are vulnerable to adversarial attacks. These observations will inform future work towards building effective defenses.
Comment: CVPR 2018
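The paper's specific attack on captioning and VQA models is not detailed in the abstract; as a generic illustration of gradient-based adversarial perturbation, here is a minimal FGSM-style sketch against any differentiable model, with all names being placeholders.

```python
# Minimal FGSM-style sketch of a gradient-based adversarial perturbation.
# This is NOT the paper's method; it only illustrates the core idea.
import torch

def fgsm(model, x, target, loss_fn, eps=0.03):
    """Perturb input x to increase the loss on its true target (untargeted)."""
    x = x.clone().detach().requires_grad_(True)
    loss = loss_fn(model(x), target)
    loss.backward()
    # Step in the direction of the loss gradient's sign; keep pixels valid.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0, 1).detach()

# Hypothetical usage: x_adv = fgsm(captioner, image, caption_tokens, loss_fn)
```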