
7 research outputs found

Recommending Themes for Ad Creative Design via Visual-Linguistic Representations

Author: Antol Stanislaw
Boudin Florian
Devlin Jacob
Li Gen
Li Liunian Harold
Pennington Jeffrey
Su Weijie
Tan Hao
Ye Keren
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 27/02/2020

There is a perennial need in the online advertising industry to refresh ad creatives, i.e., images and text used for enticing online users towards a brand. Such refreshes are required to reduce the likelihood of ad fatigue among online users, and to incorporate insights from other successful campaigns in related product categories. Given a brand, coming up with themes for a new ad is a painstaking and time-consuming process for creative strategists. Strategists typically draw inspiration from the images and text used for past ad campaigns, as well as world knowledge on the brands. To automatically infer ad themes via such multimodal sources of information in past ad campaigns, we propose a theme (keyphrase) recommender system for ad creative strategists. The theme recommender is based on aggregating results from a visual question answering (VQA) task, which ingests the following: (i) ad images, (ii) text associated with the ads as well as Wikipedia pages on the brands in the ads, and (iii) questions around the ad. We leverage transformer-based cross-modality encoders to train visual-linguistic representations for our VQA task. We study two formulations for the VQA task along the lines of classification and ranking; via experiments on a public dataset, we show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics than separate image and text representations. In addition, the use of multimodal information shows a significant lift over using only textual or visual information.

Comment: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 202

arXiv.org e-Print Archive

Crossref

VQA: Visual Question Answering

Author: Aishwarya Agrawal
C Kong
C. Lawrence Zitnick
CL Zitnick
DB Lenat
Devi Parikh
Dhruv Batra
H Liu
Jiasen Lu
K Tu
Margaret Mitchell
Stanislaw Antol
Publication venue: 'Springer Science and Business Media LLC'

Crossref

Matching images and text with multi-modal tensor fusion and re-ranking

Author: Antol Stanislaw
He Kaiming
Hedi
Huang Yan
Jabri Allan
Jorge Garc'i
Karpathy A.
Kiros Ryan
Krishna Ranjay
Li Shuang
Liu Yu
Nam Hyeonseob
Niu Zhenxing
Qin Danfeng
Ren Shaoqing
Wang Liwei
Wang Shuhui
Xu Kelvin
Zhang Ying
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 15/10/2019

A major challenge in matching images and text is that they have intrinsically different data distributions and feature representations. Most existing approaches are based either on embedding or on classification: the former maps image and text instances into a common embedding space for distance measurement, and the latter treats image-text matching as a binary classification problem. Neither of these approaches, however, balances matching accuracy and model complexity well. We propose a novel framework that achieves remarkable matching performance with acceptable model complexity. Specifically, in the training stage, we propose a novel Multi-modal Tensor Fusion Network (MTFN) to explicitly learn an accurate image-text similarity function with rank-based tensor fusion rather than seeking a common embedding space for each image-text instance. Then, during testing, we deploy a generic Cross-modal Re-ranking (RR) scheme for refinement without requiring any additional training procedure. Extensive experiments on two datasets demonstrate that our MTFN-RR consistently achieves state-of-the-art matching performance with much lower time complexity.

Accepted author manuscript. Intelligent System

Crossref

TU Delft Repository

CoQA: A Conversational Question Answering Challenge

Crossref

Break

Crossref