Red teaming GPT-4V: are GPT-4V safe against uni/multi-modal jailbreak attacks?
Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs), revealing vulnerabilities in their safeguards. Moreover, some methods are not limited to the textual modality and extend jailbreak attacks to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates performance reproduction and fair comparison, and there is a lack of comprehensive evaluation of closed-source state-of-the-art (SOTA) models, especially MLLMs such as GPT-4V. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary and open-source models. A deep analysis of the results finds that (1) GPT-4 and GPT-4V demonstrate better robustness against jailbreak attacks than open-source LLMs and MLLMs; (2) Llama2 and Qwen-VL-Chat are more robust than other open-source models; and (3) the transferability of visual jailbreak methods is relatively limited compared to textual jailbreak methods. The dataset and code can be found here.
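As a rough sketch of how a benchmark like this is typically scored, the snippet below runs each harmful question through a model and reports the attack success rate (ASR). The refusal-keyword judge, function names, and attack wrapper are illustrative assumptions, not the paper's evaluation protocol.

```python
# Hypothetical ASR evaluation loop; model_fn and attack_fn are stand-ins.
REFUSAL_MARKERS = ("i'm sorry", "i cannot", "i can't", "as an ai")

def is_jailbroken(response: str) -> bool:
    # Assumed keyword judge: a reply with no refusal marker counts as a success.
    r = response.lower()
    return not any(marker in r for marker in REFUSAL_MARKERS)

def attack_success_rate(model_fn, attack_fn, questions) -> float:
    # attack_fn wraps a harmful question in the jailbreak prompt or template.
    hits = sum(is_jailbroken(model_fn(attack_fn(q))) for q in questions)
    return hits / len(questions)
```

In practice, published benchmarks often replace the keyword judge with a trained classifier or an LLM judge, since keyword matching both over- and under-counts successes.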
Stop reasoning! When multimodal LLMs with chain-of-thought reasoning meets adversarial images
Recently, Multimodal LLMs (MLLMs) have shown a great ability to understand images. However, like traditional vision models, they are still vulnerable to adversarial images. Meanwhile, Chain-of-Thought (CoT) reasoning has been widely explored on MLLMs, as it not only improves a model's performance but also enhances its explainability by giving intermediate reasoning steps. Nevertheless, there is still a lack of study regarding MLLMs' adversarial robustness with CoT, and of an understanding of what the rationale looks like when MLLMs infer wrong answers given adversarial images. Our research evaluates the adversarial robustness of MLLMs when employing CoT reasoning, finding that CoT marginally improves adversarial robustness against existing attack methods. Moreover, we introduce a novel stop-reasoning attack technique that effectively bypasses the CoT-induced robustness enhancements. Finally, we demonstrate the alterations in CoT reasoning when MLLMs confront adversarial images, shedding light on their reasoning process under adversarial attacks.
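The abstract does not describe the attack mechanics, but an image-space attack that suppresses reasoning can be pictured as projected gradient descent (PGD) with a loss that makes a short direct-answer token sequence likely, skipping the intermediate steps. The sketch below is a minimal stand-in under that assumption; the model interface, tensor shapes, target template, and step sizes are all hypothetical, not the paper's method.

```python
import torch
import torch.nn.functional as F

def stop_reasoning_pgd(model, image, target_ids, eps=8/255, alpha=1/255, steps=10):
    # PGD under an L-infinity budget; the loss pulls the model's per-step
    # next-token logits toward a direct-answer template with no CoT steps.
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        logits = model(image + delta)               # assumed shape: (T, vocab)
        loss = F.cross_entropy(logits, target_ids)  # NLL of the direct answer
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()      # descend: make answer likely
            delta.clamp_(-eps, eps)                 # keep perturbation small
        delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()

# Toy usage with a differentiable stand-in "model" (5 steps, vocabulary of 50).
model = lambda x: x.flatten()[:5 * 50].reshape(5, 50)
adv = stop_reasoning_pgd(model, torch.rand(3, 32, 32),
                         target_ids=torch.tensor([3, 1, 4, 1, 5]))
```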
Inducing high energy-latency of large vision-language models with verbose images
Large vision-language models (VLMs) such as GPT-4 have achieved exceptional performance across various multi-modal tasks. However, deploying VLMs requires substantial energy consumption and computational resources. If attackers maliciously induce high energy consumption and latency (energy-latency cost) during VLM inference, they can exhaust computational resources. In this paper, we explore this attack surface targeting the availability of VLMs and aim to induce a high energy-latency cost during inference. We find that the energy-latency cost of VLM inference can be manipulated by maximizing the length of generated sequences. To this end, we propose verbose images, crafted with an imperceptible perturbation that induces VLMs to generate long sentences during inference. Concretely, we design three loss objectives. First, a loss is proposed to delay the occurrence of the end-of-sequence (EOS) token, the signal for a VLM to stop generating further tokens. Moreover, an uncertainty loss and a token diversity loss are proposed to increase the uncertainty over each generated token and the diversity among all tokens of the whole generated sequence, respectively, which break output dependency at the token and sequence levels. Furthermore, a temporal weight adjustment algorithm is proposed to effectively balance these losses. Extensive experiments demonstrate that our verbose images increase the length of generated sequences by 7.87× and 8.56× relative to original images on the MS-COCO and ImageNet datasets, which presents potential challenges for various applications. Our code is available at
https://github.com/KuofengGao/Verbose_Images
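Reading only the abstract, the three loss terms can be sketched as follows. Tensor shapes, the cosine-similarity form of the diversity term, and all names here are assumptions, not the authors' implementation; see the repository linked above for the released code.

```python
import torch
import torch.nn.functional as F

def verbose_losses(logits: torch.Tensor, eos_id: int):
    # logits: assumed per-step next-token logits of shape (T, vocab).
    probs = logits.softmax(dim=-1)
    # (1) EOS-delay loss: minimizing it pushes EOS probability down at each step.
    l_eos = probs[:, eos_id].mean()
    # (2) Uncertainty loss: minimizing negative entropy raises per-token uncertainty.
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    l_unc = -entropy.mean()
    # (3) Token-diversity loss: minimizing the mean pairwise cosine similarity
    # between per-step distributions discourages repetitive outputs.
    normed = F.normalize(probs, dim=-1)
    sim = normed @ normed.T
    t = sim.size(0)
    l_div = (sim.sum() - sim.diagonal().sum()) / max(t * (t - 1), 1)
    return l_eos, l_unc, l_div
```

The image perturbation would then be optimized against a weighted sum such as w1*l_eos + w2*l_unc + w3*l_div, with the weights rebalanced across iterations by the paper's temporal weight adjustment algorithm (not reproduced here).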
The hot summer-cold winter region in China: Challenges in the low carbon adaptation of residential slab buildings to enhance comfort
The UK-China research project Low carbon climate-responsive Heating and Cooling of Cities (LoHCool) investigates enhanced indoor summer comfort in the 9 billion m² of building stock of the challenging Hot Summer and Cold Winter (HSCW) zone of China. The HSCW region lies south of the Huai River-Qin Mountain line, below which central heating and cooling are deemed 'not required'. If this central government direction is relaxed, a significant carbon penalty could arise if the existing building stock is sealed and air-conditioned to Western standards. LoHCool investigates the alternative strategy of increasing the existing stock's resilience to climate through low-energy, low-technology adaptation. The approach is applied here to typical slab building forms in Hangzhou and Chongqing, in the east and west of the HSCW zone respectively. Internal thermal conditions are simulated using the dynamic thermal model EnergyPlus, calibrated against field data. Since ventilation is a critical component of the proposed adaptation schemes, the local wind environment of the case-study buildings is simulated using an advanced large eddy simulation model, Fluidity. From these diagnostic analyses, adaptation schemes are configured, specified and modelled, and found to significantly increase comfort with viable payback periods, although supplementary cooling will be required as the century advances.
National Natural Science Foundation of China