
Image Aesthetics Assessment via Learnable Queries

Authors: Ren Peiran, Shen Zhiqi, Xiong Zhiwei, Yu Han, Zhang Yunfan
Publication date: 06/09/2023

Image aesthetics assessment (IAA) aims to estimate the aesthetic quality of images. Depending on an image's content, different criteria are needed to assess its aesthetics. Existing works use pre-trained vision backbones built on content knowledge to learn image aesthetics. However, training those backbones is time-consuming and suffers from attention dispersion. Inspired by learnable queries in vision-language alignment, we propose the Image Aesthetics Assessment via Learnable Queries (IAA-LQ) approach. It adapts learnable queries to extract aesthetic features from pre-trained image features obtained from a frozen image encoder. Extensive experiments on real-world data demonstrate the advantages of IAA-LQ, which beats the best state-of-the-art method by 2.2% in SRCC (Spearman rank correlation coefficient) and 2.1% in PLCC (Pearson linear correlation coefficient).
Comment: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

arXiv.org e-Print Archive
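The abstract reports its gains in SRCC and PLCC, the usual measures of agreement between predicted aesthetic scores and human ratings. As a minimal sketch (not the authors' evaluation code), both metrics can be computed as below. The function names are hypothetical, tied values receive average ranks (one common convention), and PLCC is computed directly on the raw scores; some evaluation protocols first fit a nonlinear mapping, which this sketch omits.

```python
import math
from typing import Sequence


def pearson_lcc(x: Sequence[float], y: Sequence[float]) -> float:
    """PLCC: Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def average_ranks(values: Sequence[float]) -> list[float]:
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rcc(x: Sequence[float], y: Sequence[float]) -> float:
    """SRCC: the Pearson correlation computed on the ranks of the scores."""
    return pearson_lcc(average_ranks(x), average_ranks(y))
```

For example, with predicted scores [5.1, 4.3, 6.0, 3.9] and ground-truth scores [5.4, 4.0, 6.2, 4.1], the ranks are [3, 2, 4, 1] and [3, 1, 4, 2], so `spearman_rcc` gives 0.8. SRCC only rewards getting the ordering right, while PLCC also rewards a linear relationship between the score values.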

GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning

Authors: Dong Peiran, Guo Jingcai, Guo Song, Liu Ziming, Lu Xiaocheng, Zhang Jiewei
Publication date: 02/09/2023

This paper investigates a challenging problem: zero-shot learning in the multi-label scenario (MLZSL), wherein the model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge such as semantic information. Existing methods usually analyze the relationships among the seen classes in a sample along spatial or semantic dimensions, and then transfer the learned model to unseen classes. However, they ignore the effective integration of local and global features. When inferring unseen classes, global features represent the principal direction of the image in the feature space, while local features should maintain uniqueness within a certain range. Neglecting this integration makes the model lose its grasp of the main components of the image, and relying only on the local existence of seen classes during inference introduces unavoidable bias. In this paper, we propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, that makes full use of these properties and enables a more accurate and robust visual-semantic projection. Specifically, we split the feature maps into several feature groups, each of which can be trained independently with the Local Information Distinguishing Module (LID) to ensure uniqueness. Meanwhile, a Global Enhancement Module (GEM) is designed to preserve the principal direction. In addition, a static graph structure is designed to construct the correlation of local features. Experiments on the large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that GBE-MLZSL outperforms other state-of-the-art methods by large margins.
Comment: 11 pages, 8 figures

arXiv.org e-Print Archive
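The abstract describes splitting feature maps into feature groups so that each group can be handled by its own local module, alongside a global branch. Below is a minimal sketch of only that grouping step, under assumptions the abstract does not state: the split runs along the channel axis into equal contiguous groups, each group is summarized by global average pooling, and the global descriptor is plain global average pooling over all channels. The names `split_into_groups` and `global_average_pool` are hypothetical, and nothing here reproduces the LID, GEM, or graph modules.

```python
from typing import List

# A feature map indexed as [channel][row][col].
FeatureMap = List[List[List[float]]]


def split_into_groups(fmap: FeatureMap, num_groups: int) -> List[FeatureMap]:
    """Split the channels of a C x H x W map into num_groups contiguous groups.

    Assumption (not from the paper): grouping is channel-wise and C must be
    divisible by num_groups.
    """
    c = len(fmap)
    if num_groups <= 0 or c % num_groups != 0:
        raise ValueError("channel count must be divisible by num_groups")
    size = c // num_groups
    return [fmap[g * size:(g + 1) * size] for g in range(num_groups)]


def global_average_pool(fmap: FeatureMap) -> List[float]:
    """Average each channel over its spatial positions: one value per channel."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


if __name__ == "__main__":
    # Toy 4 x 2 x 2 map in which channel c is filled with the constant c.
    fmap = [[[float(c)] * 2 for _ in range(2)] for c in range(4)]
    local = [global_average_pool(g) for g in split_into_groups(fmap, 2)]
    global_desc = global_average_pool(fmap)  # hypothetical stand-in for a global branch
    print(local, global_desc)
```

In this toy map each per-group descriptor covers only its own channels, so a separate module could be trained on each group, while the global descriptor still sees every channel.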

Do Household Cable TV Viewing Patterns Demonstrate Efficiency and Concentration?

Authors: CHANG Rae, GHOSH Pulak, JUNG Gwangjae, KAUFFMAN Robert J., ZHANG Peiran
Publication date: 01/01/2013

Institutional Knowledge at Singapore Management University