153 research outputs found

Generative Action Description Prompts for Skeleton-based Action Recognition

Authors: Li Chao, Wang Biao, Xiang Wangmeng, Zhang Lei, Zhou Yuxuan
Publication date: 05/09/2023

Skeleton-based action recognition has recently received considerable attention. Current approaches to skeleton-based action recognition are typically formulated as one-hot classification tasks and do not fully exploit the semantic relations between actions. For example, "make victory sign" and "thumb up" are two hand-gesture actions whose major difference lies in the movement of the hands. This information is absent from the categorical one-hot encoding of action classes but can be recovered from the action description. Therefore, utilizing action descriptions in training could potentially benefit representation learning. In this work, we propose a Generative Action-description Prompts (GAP) approach for skeleton-based action recognition. More specifically, we employ a pre-trained large-scale language model as the knowledge engine to automatically generate text descriptions of the body-part movements of actions, and propose a multi-modal training scheme that uses the text encoder to generate feature vectors for different body parts and supervise the skeleton encoder for action representation learning. Experiments show that our proposed GAP method achieves noticeable improvements over various baseline models without extra computation cost at inference. GAP achieves new state-of-the-art results on popular skeleton-based action recognition benchmarks, including NTU RGB+D, NTU RGB+D 120 and NW-UCLA. The source code is available at https://github.com/MartinXM/GAP. Comment: Accepted by ICCV2

arXiv.org e-Print Archive

Anomaly Detection by Adapting a pre-trained Vision Language Model

Authors: Bai Xiang, Cai Yuxuan, He Xinwei, Liang Dingkang, Tong Ao
Publication date: 14/03/2024

Recently, large vision and language models have been successfully adapted to many downstream tasks. In this paper, we present a unified framework named CLIP-ADA for Anomaly Detection by Adapting a pre-trained CLIP model. To this end, we make two important improvements: 1) To achieve unified anomaly detection across industrial images of multiple categories, we introduce a learnable prompt and propose to associate it with abnormal patterns through self-supervised learning. 2) To fully exploit the representation power of CLIP, we introduce an anomaly region refinement strategy to refine the localization quality. During testing, anomalies are localized by directly calculating the similarity between the representation of the learnable prompt and the image. Comprehensive experiments demonstrate the superiority of our framework, e.g., we achieve the state-of-the-art 97.5/55.6 and 89.3/33.1 on MVTec-AD and VisA for anomaly detection and localization. In addition, the proposed method also achieves encouraging performance with marginal training data, which is a more challenging setting.
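
The test-time step described in this abstract, scoring image regions by their similarity to a prompt representation, can be illustrated with a minimal sketch. This is not the CLIP-ADA implementation: the function names, the use of cosine similarity, the per-patch feature grid, and the max-pooling into an image-level score are all assumptions made for illustration, and the toy vectors stand in for real CLIP embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat a zero vector as having no similarity
    return dot / (norm_u * norm_v)


def anomaly_map(patch_features, prompt_feature):
    """Score every patch by its similarity to the prompt embedding.

    patch_features: grid (list of rows) of feature vectors; hypothetical
        stand-ins for per-patch image embeddings.
    prompt_feature: hypothetical stand-in for the learned prompt embedding.
    """
    return [[cosine(f, prompt_feature) for f in row] for row in patch_features]


def image_score(score_map):
    """Image-level score as the highest patch score (an assumed pooling rule)."""
    return max(max(row) for row in score_map)


# Toy 2x2 grid of 2-dimensional patch features.
patches = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 1.0], [-1.0, 0.0]],
]
prompt = [1.0, 0.0]
scores = anomaly_map(patches, prompt)
```

In this toy grid the patch aligned with the prompt vector gets the highest score and the opposite-facing patch the lowest; a real pipeline would upsample such a map to image resolution before thresholding.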

arXiv.org e-Print Archive

Highly reversible transition metal migration in superstructure-free Li-rich oxide boosting voltage stability and redox symmetry

Authors: Cui Tianwei, Fu Yongzhu, Li Xiang, Liu Longxiang, Wang Xin, Xiang Yuxuan, Xu Jialiang, Zhu Hong
Publication venue: Nature Research
Publication date: 04/06/2024

The practical application of Li-rich layered oxides is impeded by voltage decay and redox asymmetry, which are closely related to structural degradation involving irreversible transition metal migration. It has been demonstrated that superstructure ordering in O2-type materials can effectively suppress voltage decay and redox asymmetry. Herein, we show that a Ru-based O2-type oxide without this superstructure ordering can still support highly reversible transition metal migration. We show that Ru in a superstructure-free O2-type structure follows a migration path quite different from that of Mn in the most studied cases. The highly reversible migration of Ru helps the cathode maintain structural robustness, yielding excellent capacity retention with negligible voltage decay and inhibited oxygen redox asymmetry. We thereby dispel the notion that a Li-rich layered oxide cathode material without superstructure ordering cannot achieve high performance with suppressed voltage decay and redox asymmetry.

Oxford University Research Archive

Plasma-catalytic removal of formaldehyde over Cu-Ce catalysts in a dielectric barrier discharge reactor

Authors: Gao Xiang, Qin Rui, Qu Ruiyang, Tu Xin, Zeng Yuxuan, Zheng Chenghang, Zhu Xinbo
Publication venue: Elsevier BV
Publication date: 24/01/2015

University of Liverpool Repository

Crossref

SARDet-100K: Towards Open-Source Benchmark and ToolKit for Large-Scale SAR Object Detection

Authors: Cheng Ming-Ming, Hou Qibin, Li Weijie, Li Xiang, Li Yuxuan, Liu Li, Yang Jian
Publication date: 11/03/2024

Synthetic Aperture Radar (SAR) object detection has gained significant attention recently due to its irreplaceable all-weather imaging capabilities. However, this research field suffers from both limited public datasets (mostly comprising <2K images with only mono-category objects) and inaccessible source code. To tackle these challenges, we establish a new benchmark dataset and an open-source method for large-scale SAR object detection. Our dataset, SARDet-100K, is the result of intensive surveying, collecting, and standardizing of 10 existing SAR detection datasets, providing a large-scale and diverse dataset for research purposes. To the best of our knowledge, SARDet-100K is the first COCO-level large-scale multi-class SAR object detection dataset ever created. With this high-quality dataset, we conducted comprehensive experiments and uncovered a crucial challenge in SAR object detection: the substantial disparities between pretraining on RGB datasets and finetuning on SAR datasets in terms of both data domain and model structure. To bridge these gaps, we propose a novel Multi-Stage with Filter Augmentation (MSFA) pretraining framework that tackles the problems from the perspective of data input, domain transition, and model migration. The proposed MSFA method significantly enhances the performance of SAR object detection models while demonstrating exceptional generalizability and flexibility across diverse models. This work aims to pave the way for further advancements in SAR object detection. The dataset and code are available at https://github.com/zcablii/SARDet_100K. Comment: 22 Pages, 10 Figures, 9 Table

arXiv.org e-Print Archive

Diff-ID: An Explainable Identity Difference Quantification Framework for DeepFake Detection

Authors: Chen Wenzhi, Duan Yuxuan, Ji Shouling, Wang Zonghui, Xiang Yang, Yan Senbo, Yu Chuer, Zhang Xuhong
Publication date: 30/03/2023

Although DeepFake forgery detection algorithms have achieved impressive performance on known manipulations, they often suffer severe performance degradation when generalized to an unseen manipulation. Some recent works show improvement in generalization but rely on features that are fragile to image distortions such as compression. To this end, we propose Diff-ID, a concise and effective approach that explains and measures the identity loss induced by facial manipulations. When testing on an image of a specific person, Diff-ID uses an authentic image of that person as a reference and aligns the two to the same identity-insensitive attribute feature space by applying a face-swapping generator. We then visualize the identity loss between the test and the reference image from the image differences of the aligned pairs, and design a custom metric to quantify the identity loss. The metric is then shown to be effective in distinguishing forged images from real ones. Extensive experiments show that our approach achieves high detection performance on DeepFake images and state-of-the-art generalization ability to unknown forgery methods, while also being robust to image distortions.

arXiv.org e-Print Archive