Diffusion Deepfake
Recent progress in generative AI, primarily through diffusion models,
presents significant challenges for real-world deepfake detection. The
increased realism in image details, diverse content, and widespread
accessibility to the general public complicates the identification of these
sophisticated deepfakes. Acknowledging the urgent need to address the vulnerability
of current deepfake detectors to this evolving threat, our paper introduces two
extensive deepfake datasets generated by state-of-the-art diffusion models, since
existing datasets are less diverse and of lower quality. Our extensive experiments
also show that our datasets are more challenging than other face deepfake datasets.
This strategic dataset creation not only challenges existing deepfake detectors but
also sets a new benchmark for broader evaluation. Our
comprehensive evaluation reveals the struggle of existing detection methods,
often optimized for specific image domains and manipulations, to effectively
adapt to the intricate nature of diffusion deepfakes, limiting their practical
utility. To address this critical issue, we investigate the impact of enhancing
training data diversity on representative detection methods. This involves
expanding the diversity of both manipulation techniques and image domains. Our
findings underscore that increasing training data diversity results in improved
generalizability. Moreover, we propose a novel momentum difficulty boosting
strategy to tackle the additional challenge posed by training data
heterogeneity. This strategy dynamically assigns appropriate sample weights
based on learning difficulty, enhancing the model's adaptability to both easy
and challenging samples. Extensive experiments on both existing and newly
proposed benchmarks demonstrate that our model optimization approach surpasses
prior alternatives significantly. Comment: 28 pages including Supplementary material
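To make the momentum difficulty boosting idea more concrete, here is a minimal sketch of one way such a strategy could look; the class name, momentum and temperature values, and the softmax weighting are illustrative assumptions, not the paper's actual formulation.

```python
# Minimal sketch (assumed details, not the paper's code): track per-sample
# difficulty as an exponential moving average of the loss, then up-weight
# harder samples when aggregating the batch loss.
import torch


class MomentumDifficultyWeighter:
    def __init__(self, num_samples, momentum=0.9, temperature=1.0):
        self.difficulty = torch.zeros(num_samples)  # EMA of each sample's loss
        self.momentum = momentum
        self.temperature = temperature

    def __call__(self, sample_ids, per_sample_loss):
        ids = sample_ids.long().cpu()
        # Momentum update of the running difficulty estimate.
        self.difficulty[ids] = (
            self.momentum * self.difficulty[ids]
            + (1.0 - self.momentum) * per_sample_loss.detach().cpu()
        )
        # Harder samples (higher EMA loss) receive larger weights.
        weights = torch.softmax(self.difficulty[ids] / self.temperature, dim=0)
        weights = weights.to(per_sample_loss.device) * len(ids)  # keep the loss scale
        return (weights * per_sample_loss).mean()


# Hypothetical usage inside a training loop:
# weighter = MomentumDifficultyWeighter(num_samples=len(train_set))
# loss = weighter(batch_ids,
#                 F.binary_cross_entropy_with_logits(logits, labels, reduction="none"))
```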
Vision-language Assisted Attribute Learning
Attribute labeling at large scale is typically incomplete and partial, posing
significant challenges to model optimization. Existing attribute learning
methods often treat the missing labels as negative or simply ignore them all
during training, either of which could hamper the model performance to a great
extent. To overcome these limitations, in this paper we leverage the available
vision-language knowledge to explicitly disclose the missing labels for
enhancing model learning. Given an image, we predict the likelihood of each
missing attribute label assisted by an off-the-shelf vision-language model, and
randomly ignore those with high scores during training. Our strategy
strikes a good balance between fully ignoring the missing labels and treating
them all as negatives, as these high scores are found to be informative in revealing label
ambiguity. Extensive experiments show that our proposed vision-language
assisted loss can achieve state-of-the-art performance on the newly cleaned VAW
dataset. Qualitative evaluation demonstrates the ability of the proposed method
in predicting more complete attributes. Comment: Accepted by IEEE IC-NIDC 202
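As an illustration of the loss described above, the following is a hedged sketch of a criterion that treats missing attribute labels as negatives by default but randomly ignores those the vision-language model scores highly; the threshold and drop probability are assumed values, not the paper's settings.

```python
# Minimal sketch (assumed details, not the paper's implementation): use scores from
# an off-the-shelf vision-language model to decide which missing attribute labels
# to drop from the binary cross-entropy loss.
import torch
import torch.nn.functional as F


def vl_assisted_bce(logits, labels, vl_scores, score_thresh=0.5, ignore_prob=0.7):
    """
    logits, labels, vl_scores: tensors of shape (batch, num_attributes)
    labels: 1 = positive, 0 = negative, -1 = missing
    vl_scores: per-attribute likelihood predicted by the vision-language model.
    """
    targets = labels.clamp(min=0).float()      # missing labels default to negative
    mask = torch.ones_like(targets)
    missing = labels < 0
    # Missing labels that the VL model scores highly are ambiguous:
    # randomly ignore most of them instead of forcing them to be negative.
    ambiguous = missing & (vl_scores > score_thresh)
    drop = ambiguous & (torch.rand_like(vl_scores) < ignore_prob)
    mask[drop] = 0.0
    loss = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    return (loss * mask).sum() / mask.sum().clamp(min=1)
```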
Identification and comprehensive analyses of the CBL and CIPK gene families in wheat (Triticum aestivum L.)
The interaction analysis of wheat TaCBL and TaCIPK proteins was performed by the Y2H method. (PDF 191 kb)
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Video Question Answering (VideoQA) has been significantly advanced from the
scaling of recent Large Language Models (LLMs). The key idea is to convert the
visual information into the language feature space so that the capacity of LLMs
can be fully exploited. Existing VideoQA methods typically take two paradigms:
(1) learning cross-modal alignment, and (2) using an off-the-shelf captioning
model to describe the visual data. However, the first design requires costly
training on large amounts of extra multi-modal data, whilst the second suffers
from limited domain generalization. To address these limitations, a simple yet
effective Retrieving-to-Answer (R2A) framework is proposed. Given an input
video, R2A first retrieves a set of semantically similar texts from a generic
text corpus using a pre-trained multi-modal model (e.g., CLIP). With both the
question and the retrieved texts, an LLM (e.g., DeBERTa) can be directly used to
yield the desired answer. Without the need for cross-modal fine-tuning, R2A
allows all the key components (e.g., the LLM, retrieval model, and text corpus)
to be used in a plug-and-play manner. Extensive experiments on several VideoQA
benchmarks show that, despite having only 1.3B parameters and requiring no
fine-tuning, our R2A outperforms the 61-times-larger Flamingo-80B model, even
though the latter was additionally trained on nearly 2.1B multi-modal samples
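As a rough sketch of the retrieval step under stated assumptions: video frames and corpus texts are pre-encoded into a shared embedding space (e.g., with CLIP), the top-k most similar texts are retrieved by cosine similarity, and a prompt combining the question with the retrieved texts is handed to the frozen LLM. The function names and prompt format below are illustrative, not the authors' API.

```python
# Minimal sketch (assumed details): cosine-similarity retrieval over pre-computed
# embeddings, followed by prompt construction for a frozen language model.
import numpy as np


def retrieve_texts(frame_embs, corpus_embs, corpus_texts, k=5):
    # Mean-pool frame embeddings into a single video embedding.
    video_emb = frame_embs.mean(axis=0)
    video_emb /= np.linalg.norm(video_emb) + 1e-8
    corpus_norm = corpus_embs / (np.linalg.norm(corpus_embs, axis=1, keepdims=True) + 1e-8)
    sims = corpus_norm @ video_emb                 # cosine similarity to each corpus text
    top_idx = np.argsort(-sims)[:k]
    return [corpus_texts[i] for i in top_idx]


def build_llm_prompt(question, retrieved):
    # The retrieved texts serve as a pseudo-caption context for the frozen LLM.
    context = " ".join(retrieved)
    return f"Context: {context}\nQuestion: {question}\nAnswer:"
```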
Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers
Most recent semantic segmentation methods adopt a fully-convolutional network
(FCN) with an encoder-decoder architecture. The encoder progressively reduces
the spatial resolution and learns more abstract/semantic visual concepts with
larger receptive fields. Since context modeling is critical for segmentation,
the latest efforts have been focused on increasing the receptive field, through
either dilated/atrous convolutions or inserting attention modules. However, the
encoder-decoder based FCN architecture remains unchanged. In this paper, we aim
to provide an alternative perspective by treating semantic segmentation as a
sequence-to-sequence prediction task. Specifically, we deploy a pure
transformer (i.e., without convolution and resolution reduction) to encode an
image as a sequence of patches. With the global context modeled in every layer
of the transformer, this encoder can be combined with a simple decoder to
provide a powerful segmentation model, termed SEgmentation TRansformer (SETR).
Extensive experiments show that SETR achieves a new state of the art on ADE20K
(50.28% mIoU), Pascal Context (55.83% mIoU) and competitive results on
Cityscapes. Particularly, we achieve the first position in the highly
competitive ADE20K test server leaderboard on the day of submission. Comment: CVPR 2021. Project page at https://fudan-zvg.github.io/SETR
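As a rough illustration of the sequence-to-sequence formulation, the sketch below is a drastically simplified stand-in for SETR: a small transformer encoder over image patches followed by a naive 1x1-convolution-plus-bilinear-upsampling decoder. Sizes, depths, and the class name are illustrative assumptions and do not match the released model.

```python
# Minimal sketch (not the released SETR code): patchify -> transformer encoder
# -> reshape to a 2D feature map -> naive decoder to per-pixel class logits.
import torch
import torch.nn as nn


class TinySETR(nn.Module):
    def __init__(self, img_size=256, patch=16, dim=256, depth=4, heads=8, num_classes=19):
        super().__init__()
        self.grid = img_size // patch
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)   # patchify
        self.pos = nn.Parameter(torch.zeros(1, self.grid * self.grid, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.decoder = nn.Sequential(            # naive decoder: 1x1 conv + upsampling
            nn.Conv2d(dim, num_classes, kernel_size=1),
            nn.Upsample(scale_factor=patch, mode="bilinear", align_corners=False),
        )

    def forward(self, x):                        # x: (B, 3, H, W)
        tokens = self.embed(x).flatten(2).transpose(1, 2)          # (B, N, dim)
        tokens = self.encoder(tokens + self.pos)                   # global context per layer
        feat = tokens.transpose(1, 2).reshape(x.size(0), -1, self.grid, self.grid)
        return self.decoder(feat)                                  # (B, classes, H, W)
```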
A miniature multi-functional photoacoustic probe
Photoacoustic technology is a promising tool for providing morphological and functional information in biomedical research. To enhance imaging efficiency, previously reported photoacoustic probes have been designed coaxially, involving complicated optical/acoustic prisms to bypass the opaque piezoelectric layer of ultrasound transducers; this has led to bulky probes and hindered applications in confined spaces. Although the emergence of transparent piezoelectric materials helps simplify the coaxial design, the reported transparent ultrasound transducers were still bulky. In this work, a miniature photoacoustic probe with an outer diameter of 4 mm was developed, in which an acoustic stack was made by combining a transparent piezoelectric material with a gradient-index lens as a backing layer. The transparent ultrasound transducer exhibited a high center frequency of ~47 MHz and a −6 dB bandwidth of 29.4%, and could be easily assembled with a pigtailed ferrule of a single-mode fiber. The multi-functional capability of the probe was successfully validated through experiments on fluid flow sensing and photoacoustic imaging.
Miniature intravascular photoacoustic endoscopy with coaxial excitation and detection
Recent research has pointed out that the degree of inflammation in the adventitia could correlate with the severity of atherosclerotic plaques. Intravascular photoacoustic endoscopy can provide information on arterial morphology and plaque composition, and can even detect inflammation. However, most reported work used a non-coaxial configuration for the photoacoustic catheter design, which forms only a limited light-sound overlap area for imaging and therefore misses information from the adventitia. Here we developed a novel 0.9 mm-diameter intravascular photoacoustic catheter with coaxial excitation and detection to resolve this issue. A miniature hollow ultrasound transducer with a 0.18 mm-diameter orifice in its center was successfully fabricated. To show the significance and merits of our design, phantom and ex vivo imaging experiments were conducted on both coaxial and non-coaxial catheters for comparison. The results demonstrated that the coaxial catheter exhibits much better photoacoustic/ultrasound imaging performance from the intima to the adventitia.
Unsupervised Person Re-identification by Deep Learning Tracklet Association
Most existing person re-identification (re-id) methods rely on supervised model learning from per-camera-pair, manually labelled pairwise training data. This leads to poor scalability in practical re-id deployment, because exhaustive labelling of positive and negative image pairs for every camera pair is infeasible. In this work, we address this problem by proposing an unsupervised re-id deep learning approach capable of incrementally discovering and exploiting the underlying re-id discriminative information from automatically generated person tracklet data extracted from videos, in an end-to-end model optimisation. We formulate a Tracklet Association Unsupervised Deep Learning (TAUDL) framework characterised by jointly learning per-camera (within-camera) tracklet association (labelling) and cross-camera tracklet correlation by maximising the discovery of the most likely tracklet relationships across camera views. Extensive experiments demonstrate the superiority of the proposed TAUDL model over state-of-the-art unsupervised and domain adaptation re-id methods on six person re-id benchmarking datasets.
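As a hedged sketch of the per-camera tracklet association idea: each camera gets its own classifier over its automatically generated tracklet labels, and the per-camera classification losses are averaged; the cross-camera correlation term is omitted here. The class and argument names are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (assumed details): within-camera tracklet association treated as
# per-camera classification over automatically generated tracklet labels.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PerCameraTrackletHeads(nn.Module):
    def __init__(self, feat_dim, tracklets_per_camera):
        super().__init__()
        # One classifier per camera, sized by that camera's tracklet count.
        self.heads = nn.ModuleList([nn.Linear(feat_dim, n) for n in tracklets_per_camera])

    def forward(self, features, camera_ids, tracklet_labels):
        losses = []
        for cam, head in enumerate(self.heads):
            sel = camera_ids == cam
            if sel.any():
                # Within-camera tracklet association as a classification loss.
                losses.append(F.cross_entropy(head(features[sel]), tracklet_labels[sel]))
        return torch.stack(losses).mean() if losses else features.new_zeros(())
```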