
    Test-Time Poisoning Attacks Against Test-Time Adaptation Models

    Deploying machine learning (ML) models in the wild is challenging because of distribution shifts: a model trained on an original domain may not generalize well to unforeseen transfer domains. To address this challenge, several test-time adaptation (TTA) methods have been proposed that improve the generalization of a pre-trained target model by adapting it to the shifted distribution of the test data. The success of TTA can be credited to the continuous fine-tuning of the target model according to the distributional hints carried by the test samples. Despite being powerful, this mechanism also opens a new attack surface: test-time poisoning attacks, which are substantially different from previous poisoning attacks that occur during the training time of ML models (here, adversaries cannot intervene in the training process). In this paper, we perform the first test-time poisoning attack against four mainstream TTA methods: TTT, DUA, TENT, and RPL. Concretely, we generate poisoned samples based on surrogate models and feed them to the target TTA models. Experimental results show that TTA methods are generally vulnerable to test-time poisoning attacks. For instance, an adversary feeding as few as 10 poisoned samples can degrade the performance of the target model from 76.20% to 41.83%. Our results demonstrate that TTA algorithms lacking a rigorous security assessment are unsuitable for deployment in real-life scenarios. As such, we advocate integrating defenses against test-time poisoning attacks into the design of TTA methods.
    Comment: To appear in the 45th IEEE Symposium on Security and Privacy, May 20-23, 202
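    As a concrete illustration of the attack surface, the following is a minimal NumPy sketch of a TENT-style entropy-minimization update on a toy linear softmax model. All names, dimensions, and the numerical gradient are illustrative stand-ins, not the paper's actual setup; the point is that whoever supplies the test batch steers the adaptation.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def batch_entropy(W, X):
    """Mean prediction entropy of a linear softmax model on batch X."""
    p = softmax(X @ W)
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

def tent_step(W, X, lr=0.1, eps=1e-5):
    """One TENT-style test-time update: descend the (numerical) gradient
    of the mean prediction entropy of the *test* batch. This is the hook
    a test-time poisoner abuses: the update direction is determined
    entirely by whoever controls X."""
    g = np.zeros_like(W)
    for i in range(W.size):
        Wp, Wm = W.copy(), W.copy()
        Wp.flat[i] += eps
        Wm.flat[i] -= eps
        g.flat[i] = (batch_entropy(Wp, X) - batch_entropy(Wm, X)) / (2 * eps)
    return W - lr * g

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))        # toy "pre-trained" weights
X_test = rng.normal(size=(16, 4))  # test batch an adversary may control
h0 = batch_entropy(W, X_test)
W_adapted = tent_step(W, X_test)
h1 = batch_entropy(W_adapted, X_test)  # entropy drops on the fed batch
```

    Entropy decreases on whatever batch was supplied, so a stream of poisoned batches translates directly into attacker-chosen parameter drift.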

    Backdoor Attacks in the Supply Chain of Masked Image Modeling

    Masked image modeling (MIM) has revolutionized self-supervised learning (SSL) for image pre-training. In contrast to the previously dominant self-supervised approach, contrastive learning, MIM attains state-of-the-art performance by masking and reconstructing random patches of the input image. However, the security and privacy risks of this novel generative method remain unexplored. In this paper, we perform the first security risk quantification of MIM through the lens of backdoor attacks. Different from previous work, we are the first to systematically threat-model SSL across every phase of the model supply chain, i.e., the pre-training, release, and downstream phases. Our evaluation shows that models built with MIM are vulnerable to existing backdoor attacks in the release and downstream phases and are compromised by our proposed method in the pre-training phase. For instance, on CIFAR10, the attack success rate can reach 99.62%, 96.48%, and 98.89% in the downstream, release, and pre-training phases, respectively. We also take a first step toward investigating the success factors of backdoor attacks in the pre-training phase and find that the trigger number and trigger pattern play key roles, while the trigger location has only a tiny effect. Finally, our empirical study of defense mechanisms at three detection levels across the supply chain phases indicates that different defenses suit backdoor attacks in different phases. However, backdoor attacks mounted in the release phase evade all three detection levels, calling for more effective defenses in future research.
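    The pre-training-phase attack hinges on stamping a trigger patch into training images. Here is a minimal, hypothetical sketch of that trigger-insertion step; the patch size, value, and location are illustrative (the abstract notes that pattern and count matter most, location little).

```python
import numpy as np

def add_trigger(img, trigger, loc=(0, 0)):
    """Stamp a small trigger patch onto a copy of `img` (H, W, C).
    Patched images mixed into the pre-training data are the vehicle
    for the backdoor described above."""
    out = img.copy()
    r, c = loc
    th, tw = trigger.shape[:2]
    out[r:r + th, c:c + tw] = trigger
    return out

img = np.zeros((32, 32, 3))           # CIFAR10-sized blank image
trigger = np.ones((3, 3, 3))          # illustrative white 3x3 patch
poisoned = add_trigger(img, trigger, loc=(29, 29))
```

    Only the 3x3 corner is altered; the original image is left untouched, which is what makes such triggers hard to spot in a large pre-training corpus.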

    MGTBench: Benchmarking Machine-Generated Text Detection

    Large language models (LLMs) have shown revolutionary power in a variety of natural language processing (NLP) tasks such as text classification, sentiment analysis, language translation, and question answering. Detecting machine-generated texts (MGTs) is therefore becoming increasingly important as LLMs grow more advanced and prevalent. These models can generate human-like language that is difficult to distinguish from human-written text, raising concerns about authenticity, accountability, and potential bias. However, existing MGT detection methods are evaluated under different model architectures, datasets, and experimental settings, leaving the field without a comprehensive evaluation framework spanning methodologies. In this paper, we fill this gap by proposing the first benchmark framework for MGT detection, named MGTBench. Extensive evaluations on public datasets with curated answers generated by ChatGPT (the most representative and powerful LLM thus far) show that most current detection methods perform unsatisfactorily against MGTs. A notable exception is ChatGPT Detector, which is trained on ChatGPT-generated texts and performs well at detecting MGTs. Nonetheless, we note that only a small amount of adversarially crafted perturbation to MGTs suffices to evade ChatGPT Detector, highlighting the need for more robust MGT detection methods. We envision MGTBench serving as a benchmark tool to accelerate both the evaluation of state-of-the-art MGT detection methods on their respective datasets and the development of more advanced methods. Our source code and datasets are available at https://github.com/xinleihe/MGTBench.
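    Metric-based MGT detectors of the kind such a benchmark evaluates share a simple shape: compute a text statistic, then threshold it. The sketch below uses type-token ratio as a cheap illustrative stand-in for the model-based scores (e.g., log-likelihood) that real detectors use; the threshold value is hypothetical.

```python
def type_token_ratio(text):
    """Fraction of distinct words -- a cheap lexical-diversity statistic
    used here purely as a stand-in for a detector's score function."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def detect_mgt(text, threshold=0.5):
    """Flag text as machine-generated when its score falls below a
    tuned threshold (the value 0.5 is a hypothetical choice)."""
    return type_token_ratio(text) < threshold

repetitive = "the cat sat on the mat the cat sat on the mat"
varied = "a quick brown fox jumps over one lazy sleeping dog"
```

    A benchmark's job is to fix the datasets and settings under which such score-plus-threshold detectors are compared, which is exactly the gap the abstract describes.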

    Predictive Performance of a Wastewater Source Heat Pump Using Artificial Neural Networks

    A pilot-scale wastewater source heat pump was operated for 30 days to recover heat from waste bathwater and warm up fresh bathwater. The results indicated that the fresh water was successfully warmed to the designated 45℃, 50℃, and 55℃, with coefficients of performance of 2.3–3.5. Artificial neural networks, including back propagation, radial basis function, and the nonlinear autoregressive model with exogenous input (NARX), were used to simulate this process. The root-mean-square error and coefficient of variation of the simulated results, using the experimental data from the first 18, 21, 24, and 27 days, respectively, as training data, showed that training on more days of data improved simulation accuracy. The NARX model needed at least 24 days of training data to achieve acceptable simulation results, back propagation needed 27 days, and the radial basis function network did not achieve acceptable results. Predictions based on NARX modeling showed that the performance of the wastewater source heat pump system could gradually stabilize within 42 days. Practical application: This study showed that a wastewater source heat pump can recover heat from waste bathwater to warm up fresh bathwater, and demonstrated that artificial neural networks, especially NARX, are appropriate for predicting heat pump performance. Using this system can reduce building energy consumption, while the artificial neural network model can help operate and maintain the wastewater source heat pump system.
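    The two error metrics used to compare training-window lengths are standard; a small sketch of how they are computed (with hypothetical temperature values, not the study's data):

```python
import math

def rmse(pred, obs):
    """Root-mean-square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs))

def cv_rmse(pred, obs):
    """Coefficient of variation of the RMSE: RMSE normalized by the
    mean of the observations, so models can be compared across
    operating temperatures on a relative scale."""
    return rmse(pred, obs) / (sum(obs) / len(obs))

obs = [45.0, 50.0, 55.0]    # the three designated outlet temperatures
pred = [44.0, 51.0, 54.5]   # hypothetical model outputs
```

    Retraining with longer windows (18, 21, 24, 27 days) and recomputing these two numbers is how the study ranks the three network types.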

    Unsafe Diffusion: On the Generation of Unsafe Images and Hateful Memes From Text-To-Image Models

    State-of-the-art Text-to-Image models like Stable Diffusion and DALL·E 2 are revolutionizing how people generate visual content. At the same time, society has serious concerns about how adversaries can exploit such models to generate unsafe images. In this work, we focus on demystifying the generation of unsafe images and hateful memes from Text-to-Image models. We first construct a typology of unsafe images consisting of five categories (sexually explicit, violent, disturbing, hateful, and political). Then, we assess the proportion of unsafe images generated by four advanced Text-to-Image models using four prompt datasets. We find that these models can generate a substantial percentage of unsafe images; across the four models and four prompt datasets, 14.56% of all generated images are unsafe. Comparing the four models reveals different risk levels, with Stable Diffusion the most prone to generating unsafe content (18.92% of its generated images are unsafe). Given this tendency, we evaluate Stable Diffusion's potential to generate hateful meme variants if exploited by an adversary to attack a specific individual or community. We employ three image editing methods supported by Stable Diffusion: DreamBooth, Textual Inversion, and SDEdit. Our evaluation shows that 24% of the images generated using DreamBooth are hateful meme variants that present the features of both the original hateful meme and the target individual/community; these generated images are comparable to hateful meme variants collected from the real world. Overall, our results demonstrate that the danger of large-scale generation of unsafe images is imminent. We discuss several mitigating measures, such as curating training data, regulating prompts, and implementing safety filters, and encourage the development of better safeguard tools to prevent unsafe generation.
    Comment: To appear in the ACM Conference on Computer and Communications Security, November 26, 202
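    The headline percentages come from labeling each generated image against the five-category typology and computing the flagged fraction. A minimal sketch of that bookkeeping, with made-up labels (the classifier producing them is outside this snippet):

```python
# The five unsafe categories from the typology above.
UNSAFE_CATEGORIES = {"sexually explicit", "violent", "disturbing",
                     "hateful", "political"}

def unsafe_rate(labels):
    """Fraction of generated images whose safety label falls into the
    five-category typology ('safe' or anything else counts as safe)."""
    flagged = sum(1 for lab in labels if lab in UNSAFE_CATEGORIES)
    return flagged / len(labels)

# Hypothetical labels for seven generated images.
labels = ["safe", "violent", "safe", "safe", "hateful", "safe", "safe"]
```

    Run per model and per prompt dataset, this one number is what supports comparisons like 14.56% overall versus 18.92% for Stable Diffusion.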

    Asymmetric Feedback Learning in Online Convex Games

    This paper considers online convex games involving multiple agents that aim to minimize their own cost functions using locally available feedback. A common assumption in the study of such games is that the agents are symmetric, meaning that they have access to the same type of information or feedback. Here we lift this assumption, which is often violated in practice, and instead consider asymmetric agents: specifically, we assume some agents have access to first-order gradient feedback while others have access only to zeroth-order oracles (cost function evaluations). We propose an asymmetric feedback learning algorithm that combines both feedback mechanisms. We analyze the regret and Nash equilibrium convergence of this algorithm for convex games and strongly monotone games, respectively. Specifically, we show that our algorithm always performs between the pure first-order and pure zeroth-order methods, and can match the performance of these two extremes by adjusting the number of agents with access to zeroth-order oracles. Therefore, our algorithm incorporates the pure first-order and zeroth-order methods as special cases. We provide numerical experiments on an online market problem for both deterministic and risk-averse games to demonstrate the performance of the proposed algorithm.
    Comment: 16 pages
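    The gap between the two agent types comes down to how a gradient is obtained. A zeroth-order agent, which only sees cost evaluations, can approximate it with a random-direction two-point estimator. The sketch below shows that estimator on a toy quadratic cost; sample count and step sizes are illustrative, not the paper's parameters.

```python
import numpy as np

def zo_gradient(f, x, delta=1e-4, samples=500, rng=None):
    """Two-point zeroth-order gradient estimate built only from cost
    evaluations -- the feedback available to zeroth-order agents.
    Averages finite differences along random unit directions; the
    factor d corrects for E[u u^T] = I/d on the unit sphere."""
    rng = rng or np.random.default_rng(0)
    d = x.size
    g = np.zeros(d)
    for _ in range(samples):
        u = rng.normal(size=d)
        u /= np.linalg.norm(u)
        g += (f(x + delta * u) - f(x - delta * u)) / (2 * delta) * u
    return g * d / samples

f = lambda x: float(x @ x)      # convex toy cost with true gradient 2x
x = np.array([1.0, -2.0, 0.5])
g_zo = zo_gradient(f, x)
g_true = 2 * x
```

    The estimate is noisier than first-order feedback, which is why performance interpolates between the pure first-order and pure zeroth-order extremes as the agent mix changes.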

    Generated Graph Detection

    Graph generative models are becoming increasingly effective for data distribution approximation and data augmentation. However, they have also raised public concerns about malicious misuse and misinformation broadcasts, much as Deepfake visual and auditory media has done to society. It is therefore essential to regulate the prevalence of generated graphs. To tackle this problem, we pioneer the formulation of the generated graph detection problem: distinguishing generated graphs from real ones. We propose the first framework to systematically investigate a set of sophisticated models and their performance in four classification scenarios. Each scenario switches between seen and unseen datasets/generators during testing to approach real-world settings and progressively challenge the classifiers. Extensive experiments show that all the models are capable of generated graph detection, with specific models having advantages in specific scenarios. Given the classifiers' validated generality and robustness to unseen datasets/generators, we draw the conclusion that our solution can remain effective for a considerable time in curbing generated graph misuse.
    Comment: Accepted by ICML 202
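    At its simplest, generated graph detection feeds structural features of a graph to a classifier. The sketch below extracts two toy features from adjacency matrices and contrasts two random-graph stand-ins; the features, graph sizes, and edge densities are all illustrative, far simpler than the representations the framework's models actually use.

```python
import numpy as np

def graph_features(adj):
    """Cheap structural features for a detector: mean degree and
    degree variance of an undirected graph's adjacency matrix."""
    deg = adj.sum(axis=1)
    return np.array([deg.mean(), deg.var()])

rng = np.random.default_rng(1)

def er(n, p):
    """Symmetric Erdos-Renyi adjacency matrix with no self-loops."""
    a = (rng.random((n, n)) < p).astype(float)
    a = np.triu(a, 1)
    return a + a.T

# Stand-ins: a denser "real" graph vs. a sparser "generated" one.
real = er(50, 0.3)
fake = er(50, 0.1)
f_real, f_fake = graph_features(real), graph_features(fake)
```

    A classifier trained on such feature vectors, then tested on unseen datasets/generators, is the seen/unseen switching that the four scenarios above formalize.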