Stop reasoning! When multimodal LLMs with chain-of-thought reasoning meets adversarial images
Recently, Multimodal LLMs (MLLMs) have shown a strong ability to understand images. However, like traditional vision models, they remain vulnerable to adversarial images. Meanwhile, Chain-of-Thought (CoT) reasoning has been widely explored on MLLMs. It not only improves the model's performance but also enhances its explainability by exposing intermediate reasoning steps. Nevertheless, MLLMs' adversarial robustness with CoT has received little study, and we have little understanding of what the rationale looks like when MLLMs infer wrong answers from adversarial images. Our research evaluates the adversarial robustness of MLLMs that employ CoT reasoning and finds that CoT only marginally improves adversarial robustness against existing attack methods. Moreover, we introduce a novel stop-reasoning attack technique that effectively bypasses the CoT-induced robustness enhancements. Finally, we demonstrate how CoT reasoning changes when MLLMs confront adversarial images, shedding light on their reasoning process under adversarial attacks.
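To make "adversarial image" concrete, here is a minimal sketch of the classic fast gradient sign method (FGSM) against a toy linear scorer. It is explicitly not the stop-reasoning attack described above. The weights, bias and the four-"pixel" input are hypothetical. It shows how a small, bounded per-pixel change (an L-infinity budget `eps`) can flip a model's decision.

```python
# Illustrative FGSM on a toy linear scorer. NOT the stop-reasoning attack;
# the weights, bias and 4-"pixel" input below are hypothetical.


def score(weights: list[float], bias: float, x: list[float]) -> float:
    """Linear score s(x) = w.x + b; a positive score means class 1, else class 0."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def sign(v: float) -> int:
    """Return +1, -1 or 0 according to the sign of v."""
    return (v > 0) - (v < 0)


def fgsm(weights: list[float], x: list[float], label: int, eps: float) -> list[float]:
    """Perturb x by eps * sign(gradient) so the score moves away from `label`.

    For s(x) = w.x + b the gradient with respect to x is w itself, so a class-1
    input is pushed along -sign(w) and a class-0 input along +sign(w). Each
    pixel changes by at most eps and is clipped to the valid range [0, 1].
    """
    direction = -1.0 if label == 1 else 1.0
    return [
        min(1.0, max(0.0, xi + direction * eps * sign(w)))
        for w, xi in zip(weights, x)
    ]


w = [0.8, -0.5, 0.3, 0.6]
b = -0.4
x = [0.6, 0.4, 0.5, 0.5]
x_adv = fgsm(w, x, label=1, eps=0.2)
print(score(w, b, x), score(w, b, x_adv))  # clean score positive, perturbed score negative
```

Real attacks on MLLMs iterate this kind of step against a deep network's gradients rather than a fixed linear model.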
Red teaming GPT-4V: are GPT-4V safe against uni/multi-modal jailbreak attacks?
Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs), revealing vulnerabilities in their safeguards. Some methods go beyond the textual modality and extend jailbreak attacks to Multimodal Large Language Models (MLLMs) by perturbing the visual input. However, the absence of a universal evaluation benchmark complicates performance reproduction and fair comparison. Moreover, comprehensive evaluations of closed-source state-of-the-art (SOTA) models, especially MLLMs such as GPT-4V, are lacking. To address these issues, this work first builds a comprehensive jailbreak evaluation dataset with 1445 harmful questions covering 11 different safety policies. Based on this dataset, extensive red-teaming experiments are conducted on 11 different LLMs and MLLMs, including both SOTA proprietary models and open-source models. A detailed analysis of the results shows that (1) GPT-4 and GPT-4V are more robust against jailbreak attacks than open-source LLMs and MLLMs; (2) Llama2 and Qwen-VL-Chat are more robust than other open-source models; and (3) visual jailbreak methods transfer relatively poorly compared with textual jailbreak methods. The dataset and code are publicly available.
Efficient discriminative learning of parametric nearest neighbor classifiers
Linear SVMs are efficient in both training and testing; however, data in real applications is rarely linearly separable. Non-linear kernel SVMs are too computationally intensive for applications with large-scale data sets. Recently, locally linear classifiers have gained popularity because they are efficient while remaining competitive with kernel methods. The vanilla nearest neighbor algorithm is one of the simplest locally linear classifiers, but it lacks robustness to the noise often present in real-world data. In this paper, we introduce a novel local classifier, Parametric Nearest Neighbor (P-NN), and its extension, Ensemble of P-NN (EP-NN). We parameterize the nearest neighbor algorithm based on the minimum weighted squared Euclidean distances between the data points and the prototypes, where each prototype is represented by a locally linear combination of some data points. Our method jointly learns both the prototypes and the classifier parameters discriminatively via max-margin. This makes our classifiers suitable for locally approximating classification decision boundaries based on nonlinear functions. During testing, the computational complexity of both classifiers is linear in the product of the data dimension and the number of prototypes. Our classification results on MNIST, USPS, LETTER, and Chars 74K are comparable to, and in some cases better than, those of many other methods, such as state-of-the-art locally linear classifiers.
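The decision rule described above, with prototypes built as linear combinations of data points and classification by minimum weighted squared Euclidean distance, can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' P-NN. The prototypes, combination coefficients and feature weights are hypothetical and set by hand, and the max-margin learning step is omitted entirely.

```python
# Nearest-prototype classification with a weighted squared Euclidean distance,
# in the spirit of P-NN. Prototypes, coefficients and feature weights are
# hypothetical; the discriminative (max-margin) learning step is omitted.


def combine(points: list[list[float]], coeffs: list[float]) -> list[float]:
    """Build a prototype as the linear combination sum_i coeffs[i] * points[i]."""
    dim = len(points[0])
    return [sum(c * p[k] for c, p in zip(coeffs, points)) for k in range(dim)]


def weighted_sq_dist(x: list[float], p: list[float], w: list[float]) -> float:
    """Weighted squared Euclidean distance sum_k w[k] * (x[k] - p[k])**2."""
    return sum(wk * (xk - pk) ** 2 for xk, pk, wk in zip(x, p, w))


def predict(x: list[float], prototypes: list[list[float]], labels: list[str], w: list[float]) -> str:
    """Return the label of the closest prototype.

    One O(d) distance per prototype gives O(d * m) work per query, matching
    the test-time cost stated in the abstract.
    """
    best = min(range(len(prototypes)), key=lambda j: weighted_sq_dist(x, prototypes[j], w))
    return labels[best]


prototypes = [
    combine([[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5]),  # class "A" prototype -> [0.5, 0.0]
    combine([[4.0, 4.0], [5.0, 5.0]], [0.5, 0.5]),  # class "B" prototype -> [4.5, 4.5]
]
labels = ["A", "B"]
print(predict([1.0, 1.0], prototypes, labels, [1.0, 1.0]))  # A
```

The feature weights `w` change which prototype is nearest. With `w = [1.0, 0.0]` only the first feature counts, so the point `[0.5, 10.0]` is assigned to "A". With `w = [0.0, 1.0]` the same point goes to "B".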
Day-ahead industrial load forecasting for electric RTG cranes
Given the growth in international trade and the significant energy and environmental challenges facing ports around the world, there is a need for a better understanding of energy demand behaviour at ports. The move towards electrified rubber-tyred gantry (RTG) cranes is expected to reduce gas emissions and increase energy savings compared to diesel RTG cranes, but it will increase electrical energy demand. Electrical load forecasting is a key tool for understanding energy demand, and it is usually applied to data with strong regularities and seasonal patterns. However, the highly volatile and stochastic behaviour of RTG crane demand makes prediction substantially harder. This paper is one of the first extensive investigations into short-term load forecasting for electrified RTG crane demand. Options for model inputs are investigated based on extensive data and correlation analysis, and the effect of the estimation accuracy of exogenous variables on forecast accuracy is also examined. The models are tested on two different RTG crane data sets collected from the Port of Felixstowe in the UK. The results show that the forecast models are effective when the number of crane moves and the container gross weight are estimated accurately.
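The finding that forecast accuracy depends on how well the exogenous inputs are estimated can be illustrated with a deliberately simple sketch. It uses ordinary least squares on two inputs (crane moves and container gross weight), fitted on one day and used to forecast the next. Everything here is an assumption for illustration: the hourly data are synthetic and noiseless, the coefficients are made up, and linear regression is a stand-in baseline, not one of the models evaluated in the paper.

```python
# Day-ahead load forecasting sketch on hypothetical, noiseless synthetic data:
# OLS on two exogenous inputs. A stand-in baseline, not the paper's models.


def fit_ols(rows: list[list[float]], targets: list[float]) -> list[float]:
    """Solve the normal equations (X^T X) beta = X^T y by Gaussian elimination."""
    n = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(n)]
    a = [row + [rhs] for row, rhs in zip(xtx, xty)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))  # partial pivoting
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= f * a[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        beta[r] = (a[r][n] - sum(a[r][c] * beta[c] for c in range(r + 1, n))) / a[r][r]
    return beta


def forecast(beta: list[float], row: list[float]) -> float:
    return sum(b * v for b, v in zip(beta, row))


def mae(actual: list[float], predicted: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / len(actual)


hours = range(48)  # two days of hourly data
moves = [20 + (7 * h) % 13 for h in hours]    # hypothetical crane moves per hour
weight = [20 + (11 * h) % 17 for h in hours]  # hypothetical gross weight (t)
demand = [10 + 2.0 * m + 1.5 * w for m, w in zip(moves, weight)]  # synthetic kW

rows = [[1.0, float(m), float(w)] for m, w in zip(moves, weight)]
beta = fit_ols(rows[:24], demand[:24])  # fit on day 1

# Forecast day 2 with exact inputs, then with crane moves overestimated by 20%.
exact = [forecast(beta, r) for r in rows[24:]]
skewed = [forecast(beta, [1.0, 1.2 * m, w]) for _, m, w in rows[24:]]
print(mae(demand[24:], exact), mae(demand[24:], skewed))
```

With exact inputs the error is essentially zero. A 20% overestimate of crane moves alone produces a mean absolute error of several kW. That is the kind of sensitivity to exogenous estimates the abstract describes, shown here in an idealised setting.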
Evaluation of host-derived volatiles for trapping Culicoides biting midges (Diptera: Ceratopogonidae)
Culicoides biting midges (Diptera: Ceratopogonidae) cause pain and distress through blood feeding and transmit viruses that threaten both animal and human health worldwide. There are few effective tools for monitoring and controlling biting midges; semiochemical-based strategies offer the advantage of targeting host-seeking populations. In previous studies, we identified the host preferences of multiple Culicoides species, including Culicoides impunctatus, as well as cattle-derived compounds that modulate the behavioral responses of C. nubeculosus under laboratory conditions. Here, we test the efficacy of these compounds, released at different rates, in attracting C. impunctatus under field conditions in southern Sweden. Traps releasing 1-octen-3-ol, decanal, phenol, 4-methylphenol or 3-propylphenol in combination with carbon dioxide (CO2) captured significantly more C. impunctatus than control traps baited with CO2 alone, and low release rates (0.1 mg h−1, 1 mg h−1) were generally more attractive. In contrast, traps releasing octanal or (E)-2-nonenal at 1 mg h−1 and 10 mg h−1 collected significantly fewer C. impunctatus than control traps baited with CO2 only. Nonanal and 2-ethylhexanol did not affect the attraction of C. impunctatus relative to CO2 alone at any of the release rates tested. The potential use of these semiochemicals as attractants and repellents for biting midge control is discussed.
PVUW 2024 challenge on complex video understanding: methods and results
The Pixel-level Video Understanding in the Wild (PVUW) Challenge focuses on complex video understanding. In this CVPR 2024 workshop, we add two new tracks: the Complex Video Object Segmentation track, based on the MOSE dataset, and the Motion Expression guided Video Segmentation track, based on the MeViS dataset. In the two new tracks, we provide additional videos and annotations featuring challenging elements such as the disappearance and reappearance of objects, inconspicuous small objects, heavy occlusions, and crowded environments in MOSE. Moreover, we provide a new motion expression guided video segmentation dataset, MeViS, to study natural-language-guided video understanding in complex environments. These new videos, sentences, and annotations enable us to foster the development of a more comprehensive and robust pixel-level understanding of video scenes in complex environments and realistic scenarios. The MOSE challenge had 140 registered teams in total; 65 teams participated in the validation phase and 12 teams made valid submissions in the final challenge phase. The MeViS challenge had 225 registered teams in total; 50 teams participated in the validation phase and 5 teams made valid submissions in the final challenge phase.
Observation of two new baryon resonances
Two structures are observed close to the kinematic threshold in the … mass spectrum in a sample of proton-proton collision data, corresponding to an integrated luminosity of 3.0 fb−1 recorded by the LHCb experiment. In the quark model, two baryonic resonances with quark content … are expected in this mass region: the spin-parity … and … states, denoted … and …. Interpreting the structures as these resonances, we measure the mass differences and the width of the heavier state to be … MeV, … MeV and … MeV, where the first and second uncertainties are statistical and systematic, respectively. The width of the lighter state is consistent with zero, and we place an upper limit of … MeV at 95% confidence level. Relative production rates of these states are also reported. Comment: 17 pages, 2 figures
Measurement of the CP-violating phase ϕs in Bs→J/ψπ+π− decays
Measurement of the mixing-induced CP-violating phase ϕs in Bs decays is of prime importance in probing new physics. Here 7421 ± 105 signal events from the dominantly CP-odd final state J/ψπ+π− are selected in 1 fb−1 of pp collision data collected at √s = 7 TeV with the LHCb detector. A time-dependent fit to the data yields a value of $\phi_s = -0.019^{+0.173\,+0.004}_{-0.174\,-0.003}$ rad, consistent with the Standard Model expectation. No evidence of direct CP violation is found. Comment: 15 pages, 10 figures; minor revisions on May 23, 201
Search for direct stau production in events with two hadronic τ-leptons in √s = 13 TeV pp collisions with the ATLAS detector
A search for the direct production of the supersymmetric partners of τ-leptons (staus) in final states with two hadronically decaying τ-leptons is presented. The analysis uses a dataset of pp collisions corresponding to an integrated luminosity of 139 fb−1, recorded with the ATLAS detector at the Large Hadron Collider at a center-of-mass energy of 13 TeV. No significant deviation from the expected Standard Model background is observed. Limits are derived in scenarios of direct production of stau pairs with each stau decaying into the stable lightest neutralino and one τ-lepton in simplified models where the two stau mass eigenstates are degenerate. Stau masses from 120 GeV to 390 GeV are excluded at 95% confidence level for a massless lightest neutralino.
Search for CP violation in D+→ϕπ+ and D+s→K0Sπ+ decays
A search for CP violation in D+ → ϕπ+ decays is performed using data collected in 2011 by the LHCb experiment, corresponding to an integrated luminosity of 1.0 fb−1 at a centre-of-mass energy of 7 TeV. The CP-violating asymmetry is measured to be (−0.04 ± 0.14 ± 0.14)% for candidates with K−K+ mass within 20 MeV/c² of the ϕ meson mass. A search for a CP-violating asymmetry that varies across the ϕ mass region of the D+ → K−K+π+ Dalitz plot is also performed, and no evidence for CP violation is found. In addition, the CP asymmetry in the D+s → K0Sπ+ decay is measured to be (0.61 ± 0.83 ± 0.14)%.
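As a rough illustration of what such an asymmetry is built from, the sketch below computes a raw charge asymmetry from signal yields. It assumes the convention A = (N(D+) − N(D−)) / (N(D+) + N(D−)) and gives the binomial statistical uncertainty sqrt((1 − A²)/N). The yields are hypothetical. A real measurement like the one above must also correct for production and detection asymmetries, which this sketch ignores.

```python
import math

# Raw charge asymmetry between D+ and D- signal yields, with its binomial
# statistical uncertainty. Yields are hypothetical; production and detection
# asymmetry corrections needed in a real measurement are not modelled.


def raw_asymmetry(n_plus: int, n_minus: int) -> tuple[float, float]:
    """Return A = (N+ - N-) / (N+ + N-) and its uncertainty sqrt((1 - A^2) / N)."""
    n = n_plus + n_minus
    a = (n_plus - n_minus) / n
    return a, math.sqrt((1.0 - a * a) / n)


a, err = raw_asymmetry(5050, 4950)
print(f"A_raw = ({100 * a:.2f} ± {100 * err:.2f})%")  # A_raw = (1.00 ± 1.00)%
```

The uncertainty scales as 1/√N, which is why percent-level or sub-percent precision on an asymmetry requires large signal samples.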