Multimodal knowledge integration for object detection and visual reasoning
We humans still perceive and reason differently from artificial intelligence models. We see, we listen, we touch: we understand the world through multimodal sensing, while machine models rely on only one or a few modalities and ignore abundant information. In this thesis, we explore techniques for narrowing the perception gap between machines and humans, focusing on two families of tasks: reasoning and detection. First, we incorporate information from text, audio, motion, and external knowledge bases when training computer vision models. We find that inputs from these additional channels provide complementary information that improves the models. Second, we study how multimodal inputs can be fully utilized. We argue that most existing deep learning methods tend to pay too much attention to shallow patterns in the input features, which biases the resulting models; we propose robust training to overcome this issue. Third, we extend the benefits of multimodal information from the inputs to the supervision signals, learning a weakly supervised detection model from the natural supervision of textual captions or audio narrations. With the help of NLP constituency parsing, structural knowledge can be extracted from the captions and narrations, which in turn determines the entities and relations of visual objects.
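The last step above can be illustrated concretely. As a minimal sketch (not the thesis's actual pipeline), assume a caption has already been parsed into a Penn-Treebank-style bracketed constituency tree by an off-the-shelf parser; candidate visual entities can then be read off as the noun phrases of that tree. The tree format and extraction rule here are illustrative only.

```python
# Minimal sketch: pull candidate visual entities (noun phrases) out of a
# Penn-Treebank-style constituency parse of a caption. The bracketed tree
# would come from an off-the-shelf parser; here it is hard-coded.

def parse_tree(s):
    """Parse a bracketed tree string into nested (label, children) tuples."""
    tokens = s.replace("(", " ( ").replace(")", " ) ").split()
    def helper(i):
        assert tokens[i] == "("
        label = tokens[i + 1]
        i += 2
        children = []
        while tokens[i] != ")":
            if tokens[i] == "(":
                child, i = helper(i)
                children.append(child)
            else:
                children.append(tokens[i])  # leaf word
                i += 1
        return (label, children), i + 1
    tree, _ = helper(0)
    return tree

def leaves(tree):
    """All leaf words under a node, left to right."""
    _, children = tree
    words = []
    for c in children:
        words.extend(leaves(c) if isinstance(c, tuple) else [c])
    return words

def noun_phrases(tree, out=None):
    """Collect the leaf words under every NP node (candidate entities)."""
    if out is None:
        out = []
    label, children = tree
    if label == "NP":
        out.append(" ".join(leaves(tree)))
    for c in children:
        if isinstance(c, tuple):
            noun_phrases(c, out)
    return out

caption = "(S (NP (DT a) (NN dog)) (VP (VBZ chases) (NP (DT a) (JJ red) (NN ball))))"
print(noun_phrases(parse_tree(caption)))
# -> ['a dog', 'a red ball']
```

The verb phrase connecting the two noun phrases ("chases") would similarly supply the relation between the detected entities.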
Automatic Understanding of Image and Video Advertisements
There is more to images than their objective physical content: for example,
advertisements are created to persuade a viewer to take a certain action. We
propose the novel problem of automatic advertisement understanding. To enable
research on this problem, we create two datasets: an image dataset of 64,832
image ads, and a video dataset of 3,477 ads. Our data contains rich annotations
encompassing the topic and sentiment of the ads, questions and answers
describing what actions the viewer is prompted to take and the reasoning that
the ad presents to persuade the viewer ("What should I do according to this ad,
and why should I do it?"), and symbolic references ads make (e.g. a dove
symbolizes peace). We also analyze the most common persuasive strategies ads
use, and the capabilities that computer vision systems should have to
understand these strategies. We present baseline classification results for
several prediction tasks, including automatically answering questions about the
messages of the ads.

Comment: To appear in CVPR 2017; data available on http://cs.pitt.edu/~kovashka/ad
What Explains Natives' and Sojourners' Preventive Health Behavior in a Pandemic: Role of Media and Scientific Self-Efficacy
The COVID-19 pandemic triggered a severe global public health emergency. This research investigated and compared natives' and sojourners' health-protective behavior in Mainland China during the pandemic. We adopted a unified view, proposing a theoretical model that adapts the Health Belief Model (HBM) and Institutional Theory (IT). Data obtained through an online survey questionnaire from 435 respondents during the second and third quarters were analyzed. Structural equation modeling (SEM) was used to empirically test the proposed model. Media self-efficacy (MSE), scientific self-efficacy (SSE), perceived health risks (PHRs), and the perceived benefits of being protected all had positive and significant effects on health-protective behavioral intentions among natives and sojourners in mainland China. Media and scientific self-efficacy can play a strategic role in shaping public health-protective behavior. The research recommends effective communication with sojourners during a crisis (e.g., an infectious disease outbreak) so that they become part of the national crisis management plan.
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Assessing the aesthetics of an image is challenging, as it is influenced by
multiple factors including composition, color, style, and high-level semantics.
Existing image aesthetic assessment (IAA) methods primarily rely on
human-labeled rating scores, which oversimplify the visual aesthetic
information that humans perceive. Conversely, user comments offer more
comprehensive information and are a more natural way to express human opinions
and preferences regarding image aesthetics. In light of this, we propose
learning image aesthetics from user comments, and exploring vision-language
pretraining methods to learn multimodal aesthetic representations.
Specifically, we pretrain an image-text encoder-decoder model with
image-comment pairs, using contrastive and generative objectives to learn rich
and generic aesthetic semantics without human labels. To efficiently adapt the
pretrained model for downstream IAA tasks, we further propose a lightweight
rank-based adapter that employs text as an anchor to learn the aesthetic
ranking concept. Our results show that our pretrained aesthetic vision-language
model outperforms prior works on image aesthetic captioning over the
AVA-Captions dataset, and it has powerful zero-shot capability for aesthetic
tasks such as zero-shot style classification and zero-shot IAA, surpassing many
supervised baselines. With only minimal finetuning parameters using the
proposed adapter module, our model achieves state-of-the-art IAA performance
over the AVA dataset.

Comment: CVPR 2023, https://github.com/google-research/google-research/tree/master/vil
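The contrastive half of the pretraining objective described above can be sketched in a few lines. This is an illustrative InfoNCE-style loss (not the paper's implementation): matched image/comment embeddings in a batch are pulled together and mismatched pairs pushed apart; the embedding size, temperature, and data here are assumptions.

```python
import numpy as np

# Illustrative sketch of contrastive image-comment pretraining: a symmetric
# InfoNCE loss over a batch, where the i-th image matches the i-th comment.

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of matched (image, comment) pairs."""
    img = l2_normalize(img_emb)
    txt = l2_normalize(txt_emb)
    logits = img @ txt.T / temperature      # (B, B) cosine-similarity matrix
    labels = np.arange(len(logits))         # diagonal entries are the matches
    def xent(lg):
        # cross-entropy of each row against its diagonal (matching) entry
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()
    # average the image->text and text->image directions
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(4, 16))
txt = img + 0.01 * rng.normal(size=(4, 16))   # near-identical matched pairs
print(contrastive_loss(img, txt))             # low loss for aligned pairs
```

The paper pairs such a contrastive term with a generative (captioning) objective, so the encoder-decoder also learns to produce aesthetic comments rather than only align embeddings.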
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations
There is a perennial need in the online advertising industry to refresh ad
creatives, i.e., images and text used for enticing online users towards a
brand. Such refreshes are required to reduce the likelihood of ad fatigue among
online users, and to incorporate insights from other successful campaigns in
related product categories. Given a brand, coming up with themes for a new ad
is a painstaking and time-consuming process for creative strategists.
Strategists typically draw inspiration from the images and text used for past
ad campaigns, as well as world knowledge on the brands. To automatically infer
ad themes via such multimodal sources of information in past ad campaigns, we
propose a theme (keyphrase) recommender system for ad creative strategists. The
theme recommender is based on aggregating results from a visual question
answering (VQA) task, which ingests the following: (i) ad images, (ii) text
associated with the ads as well as Wikipedia pages on the brands in the ads,
and (iii) questions around the ad. We leverage transformer based cross-modality
encoders to train visual-linguistic representations for our VQA task. We study
two formulations for the VQA task along the lines of classification and
ranking; via experiments on a public dataset, we show that cross-modal
representations lead to significantly better classification accuracy and
ranking precision-recall metrics. Cross-modal representations show better
performance compared to separate image and text representations. In addition,
the use of multimodal information shows a significant lift over using only
textual or visual information.

Comment: 7 pages, 8 figures, 2 tables, accepted by The Web Conference 202
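The aggregation step of such a recommender can be sketched simply. In this hypothetical example (the themes, scores, and mean-aggregation rule are illustrative, not the paper's), each VQA query about an ad yields a score per candidate theme keyphrase, and the recommender averages scores across queries and returns the top-ranked themes.

```python
# Hypothetical sketch of the aggregation step: each VQA query about an ad
# produces a score per candidate theme keyphrase; the recommender averages
# the scores across queries and returns the top-k themes.

def recommend_themes(per_question_scores, themes, k=2):
    """Average theme scores over questions and return the top-k themes."""
    n_q = len(per_question_scores)
    agg = [sum(q[i] for q in per_question_scores) / n_q
           for i in range(len(themes))]
    ranked = sorted(zip(themes, agg), key=lambda t: -t[1])
    return ranked[:k]

themes = ["luxury", "family", "adventure", "eco-friendly"]
scores = [
    [0.1, 0.7, 0.3, 0.1],   # theme scores from VQA question 1
    [0.2, 0.5, 0.4, 0.1],   # theme scores from VQA question 2
    [0.1, 0.6, 0.2, 0.2],   # theme scores from VQA question 3
]
print(recommend_themes(scores, themes))
# 'family' and 'adventure' rank highest
```

The classification formulation would score a fixed keyphrase vocabulary in one pass, while the ranking formulation orders candidate themes pairwise; both reduce to producing the per-theme scores aggregated here.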
Precursors and Pathways Leading to Enhanced Secondary Organic Aerosol Formation during Severe Haze Episodes
Publisher Copyright: © 2021 American Chemical Society

Molecular analyses help to investigate the key precursors and chemical processes of secondary organic aerosol (SOA) formation. We obtained the sources and molecular compositions of organic aerosol in PM2.5 in winter in Beijing by online and offline mass spectrometer measurements. Photochemical and aqueous processing were both involved in producing SOA during the haze events. Aromatics, isoprene, long-chain alkanes or alkenes, and carbonyls such as glyoxal and methylglyoxal were all important precursors. The enhanced SOA formation during the severe haze event was predominantly contributed by aqueous processing promoted by elevated amounts of aerosol water, to which multifunctional organic nitrates contributed the most, followed by organic compounds having four oxygen atoms in their formulae. The latter included dicarboxylic acids and various oxidation products from isoprene and aromatics, as well as products or oligomers from methylglyoxal aqueous uptake. Nitrated phenols, organosulfates, and methanesulfonic acid were also important SOA products, but their contributions to the elevated SOA mass during the severe haze event were minor. Our results highlight the importance of reducing nitrogen oxides and nitrate for future SOA control. Additionally, the formation of highly oxygenated long-chain molecules with a low degree of unsaturation in polluted urban environments requires further research.

Peer reviewed
Electronic properties of guanine-based nanowires
We present a first-principles study of the electronic and conduction
properties of a few classes of nanowires constituted of guanine (G) molecules,
self-assembled in different geometries. We first analyze the effect of the
vertical π-π interaction in model G-stack columns. Then, we exploit the
results obtained from those models to interpret the features of realistic
stacked and hydrogen-bonded structures, namely the guanine quadruple helices
and the planar ribbons. With respect to natural DNA, the different structures,
as well as the inclusion of metal cations, drastically affect the bonding
pattern among the bases, introducing novel features in the electronic
properties of the systems. These supramolecular G-aggregates, alternative to
DNA, are expected to show interesting properties for molecular electronics
applications.

Comment: 30 pages (preprint format), 8 figures. To appear in Solid State
Communications - Special Issue on "New advances on collective phenomena in
one-dimensional systems"