A Deep Sequential Model for Discourse Parsing on Multi-Party Dialogues
Discourse structures are beneficial for various NLP tasks such as dialogue
understanding, question answering, and sentiment analysis. This paper
presents a deep sequential model for parsing discourse dependency structures of
multi-party dialogues. The proposed model aims to construct a discourse
dependency tree by predicting dependency relations and constructing the
discourse structure jointly and alternately. It makes a sequential scan of the
Elementary Discourse Units (EDUs) in a dialogue. For each EDU, the model
decides to which previous EDU the current one should link and what the
corresponding relation type is. The predicted link and relation type are then
used to build the discourse structure incrementally with a structured encoder.
During link prediction and relation classification, the model utilizes not only
local information that represents the concerned EDUs, but also global
information that encodes the EDU sequence and the discourse structure that is
already built at the current step. Experiments show that the proposed model
outperforms all the state-of-the-art baselines.
Comment: Accepted to AAAI 2019
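The sketch below illustrates the sequential link-and-relation prediction loop described in the abstract. It is a simplified illustration under stated assumptions, not the authors' model: all names (SequentialDiscourseParser, link_scorer, relation_classifier) are hypothetical, and the structured encoder is reduced to a running sum over the representations of predicted parents.

```python
# Minimal sketch of a sequential scan over EDUs: for each EDU, score links to
# all previous EDUs, pick a parent, classify the relation, and fold the new
# edge into a structure representation. Illustrative only.
import torch
import torch.nn as nn

class SequentialDiscourseParser(nn.Module):
    def __init__(self, hidden_dim: int, num_relations: int):
        super().__init__()
        # Global encoder over the EDU sequence.
        self.sequence_encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        # Scores a candidate link between the current EDU and a previous one.
        self.link_scorer = nn.Linear(4 * hidden_dim, 1)
        # Classifies the relation type for the chosen link.
        self.relation_classifier = nn.Linear(4 * hidden_dim, num_relations)

    def forward(self, edu_embeddings: torch.Tensor):
        # edu_embeddings: (num_edus, hidden_dim), one vector per EDU.
        global_states, _ = self.sequence_encoder(edu_embeddings.unsqueeze(0))
        global_states = global_states.squeeze(0)
        # Structure representation built incrementally as links are predicted.
        structure_state = torch.zeros_like(global_states[0])
        links, relations = [], []
        for i in range(1, edu_embeddings.size(0)):
            # Local + global + structure information for each candidate parent j < i.
            candidates = torch.stack([
                torch.cat([edu_embeddings[i], edu_embeddings[j],
                           global_states[i], structure_state])
                for j in range(i)
            ])
            scores = self.link_scorer(candidates).squeeze(-1)
            parent = int(scores.argmax())
            relation = int(self.relation_classifier(candidates[parent]).argmax())
            links.append((i, parent))
            relations.append(relation)
            # Incrementally fold the predicted edge into the structure state.
            structure_state = structure_state + global_states[parent]
        return links, relations
```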
Defending LLMs against Jailbreaking Attacks via Backtranslation
Although many large language models (LLMs) have been trained to refuse
harmful requests, they are still vulnerable to jailbreaking attacks that
rewrite the original prompt to conceal its harmful intent. In this paper, we
propose a new method for defending LLMs against jailbreaking attacks by
"backtranslation". Specifically, given an initial response generated by the
target LLM from an input prompt, our backtranslation prompts a language model
to infer an input prompt that can lead to the response. The inferred prompt is
called the backtranslated prompt which tends to reveal the actual intent of the
original prompt, since it is generated based on the LLM's response and not
directly manipulated by the attacker. We then run the target LLM again on the
backtranslated prompt, and we refuse the original prompt if the model refuses
the backtranslated prompt. We explain that the proposed defense offers
several benefits in terms of effectiveness and efficiency. We empirically
demonstrate that our defense significantly outperforms the baselines,
especially in cases that are hard for the baselines, while having little impact
on the generation quality for benign input prompts. Our implementation is based
on our library for LLM jailbreaking defense algorithms at
https://github.com/YihanWang617/llm-jailbreaking-defense, and the code
for reproducing our experiments is available at
https://github.com/YihanWang617/LLM-Jailbreaking-Defense-Backtranslation
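The sketch below illustrates the defense procedure described in the abstract. The `generate` and `is_refusal` helpers and the backtranslation prompt template are hypothetical placeholders, not the API of the linked repositories.

```python
# A minimal sketch of the backtranslation defense: respond, infer a prompt from
# the response, re-query the target model, and refuse if the inferred prompt is
# refused. Helper functions are assumed, not part of the released library.

BACKTRANSLATION_TEMPLATE = (
    "Please guess the user's request that the following AI response answers:\n"
    "{response}\n"
    "Output the inferred request only."
)

def defend_with_backtranslation(target_model, backtranslation_model, prompt,
                                generate, is_refusal,
                                refusal_message="I'm sorry, I cannot help with that."):
    # Step 1: get the target model's initial response to the (possibly jailbroken) prompt.
    response = generate(target_model, prompt)
    if is_refusal(response):
        return refusal_message
    # Step 2: backtranslate, i.e. infer a prompt that could have produced this response.
    backtranslated_prompt = generate(
        backtranslation_model, BACKTRANSLATION_TEMPLATE.format(response=response))
    # Step 3: run the target model on the backtranslated prompt; if it refuses,
    # treat the original prompt as harmful and refuse it as well.
    if is_refusal(generate(target_model, backtranslated_prompt)):
        return refusal_message
    # Otherwise, return the original response unchanged.
    return response
```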
On the Adversarial Robustness of Vision Transformers
Following the success in advancing natural language processing and
understanding, transformers are expected to bring revolutionary changes to
computer vision. This work provides the first comprehensive study on the
robustness of vision transformers (ViTs) against adversarial perturbations.
Across various white-box and transfer attack settings, we find that ViTs
possess better adversarial robustness than convolutional neural networks
(CNNs). This observation also holds for certified robustness. We
summarize the following main observations contributing to the improved
robustness of ViTs:
1) Features learned by ViTs contain less low-level information and are more
generalizable, which contributes to superior robustness against adversarial
perturbations.
2) Introducing convolutional or tokens-to-token blocks for learning low-level
features in ViTs can improve classification accuracy but at the cost of
adversarial robustness.
3) Increasing the proportion of transformers in the model structure (when the
model consists of both transformer and CNN blocks) leads to better robustness.
But for a pure transformer model, simply increasing the size or adding layers
cannot guarantee a similar effect.
4) Pre-training on larger datasets does not significantly improve adversarial
robustness, though it is critical for training ViTs.
5) Adversarial training is also applicable to ViTs for training robust models.
Furthermore, feature visualization and frequency analysis are conducted for
explanation. The results show that ViTs are less sensitive to high-frequency
perturbations than CNNs and there is a high correlation between how well the
model learns low-level features and its robustness against different
frequency-based perturbations.
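As an illustration of the white-box evaluation setting mentioned above, the sketch below runs an l_inf PGD attack that can be applied identically to a ViT and a CNN so their robust accuracies can be compared. The hyperparameters and helper names are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch: l_inf PGD attack and robust-accuracy evaluation, usable for
# any image classifier (ViT or CNN). Model and data are assumed to share a device.
import torch
import torch.nn.functional as F

def pgd_attack(model, images, labels, eps=8/255, alpha=2/255, steps=10):
    """Projected gradient descent attack within an l_inf ball of radius eps."""
    adv = images.clone().detach() + torch.empty_like(images).uniform_(-eps, eps)
    adv = adv.clamp(0.0, 1.0)
    for _ in range(steps):
        adv.requires_grad_(True)
        loss = F.cross_entropy(model(adv), labels)
        grad = torch.autograd.grad(loss, adv)[0]
        # Take a signed gradient step, then project back into the eps-ball.
        adv = adv.detach() + alpha * grad.sign()
        adv = torch.min(torch.max(adv, images - eps), images + eps)
        adv = adv.clamp(0.0, 1.0)
    return adv.detach()

def robust_accuracy(model, loader):
    """Accuracy on adversarially perturbed inputs; compare a ViT vs. a CNN."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        adv = pgd_attack(model, images, labels)
        with torch.no_grad():
            correct += (model(adv).argmax(dim=1) == labels).sum().item()
        total += labels.size(0)
    return correct / total
```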
Lyapunov-stable Neural Control for State and Output Feedback: A Novel Formulation
Learning-based neural network (NN) control policies have shown impressive
empirical performance in a wide range of tasks in robotics and control.
However, formal (Lyapunov) stability guarantees over the region-of-attraction
(ROA) for NN controllers with nonlinear dynamical systems are challenging to
obtain, and most existing approaches rely on expensive solvers such as
sums-of-squares (SOS), mixed-integer programming (MIP), or satisfiability
modulo theories (SMT). In this paper, we demonstrate a new framework for
learning NN controllers together with Lyapunov certificates using fast
empirical falsification and strategic regularizations. We propose a novel
formulation that defines a larger verifiable region-of-attraction (ROA) than
shown in the literature, and refines the conventional restrictive constraints
on Lyapunov derivatives to focus only on certifiable ROAs. The Lyapunov
condition is rigorously verified post-hoc using branch-and-bound with scalable
linear bound propagation-based NN verification techniques. The approach is
efficient and flexible, and the full training and verification procedure is
accelerated on GPUs without relying on expensive SOS, MIP, or SMT solvers.
The flexibility and efficiency of our framework allow us to demonstrate
Lyapunov-stable output feedback control with synthesized NN-based controllers
and NN-based observers with formal stability guarantees, for the first time in
the literature. Source code is available at
https://github.com/Verified-Intelligence/Lyapunov_Stable_NN_Controllers
Comment: Paper accepted by ICML 2024
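The sketch below conveys the general idea of training a neural Lyapunov function and controller against violations found by cheap empirical falsification (here, plain random sampling). The toy dynamics, loss terms, and names are illustrative assumptions; the paper's ROA formulation and post-hoc branch-and-bound verification are not reproduced.

```python
# Minimal sketch: jointly train a Lyapunov candidate V and a controller pi by
# penalizing sampled states where the decrease condition V(f(x, pi(x))) < V(x) fails.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, hidden), nn.ReLU(),
                                 nn.Linear(hidden, out_dim))
    def forward(self, x):
        return self.net(x)

def dynamics(x, u):
    # Placeholder discrete-time dynamics x_{t+1} = f(x_t, u_t); replace with the real system.
    return 0.9 * x + 0.1 * u

def lyapunov_violation(V, controller, x, margin=1e-3):
    """Positive wherever the Lyapunov decrease condition fails at state x."""
    x_next = dynamics(x, controller(x))
    return torch.relu(V(x_next) - V(x) + margin)

def train_step(V, controller, optimizer, batch_size=256, state_dim=2, radius=1.0):
    # Empirical falsification by random sampling over a candidate region.
    x = (torch.rand(batch_size, state_dim) * 2 - 1) * radius
    violation = lyapunov_violation(V, controller, x)
    # Encourage V to grow with distance from the origin and vanish at the origin.
    positivity = torch.relu(1e-2 * x.norm(dim=1, keepdim=True) - V(x)).mean()
    origin = torch.zeros(1, state_dim)
    loss = violation.mean() + positivity + V(origin).abs().mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example wiring for a 2-D toy system:
#   V = MLP(2, 1); controller = MLP(2, 2)
#   optimizer = torch.optim.Adam(list(V.parameters()) + list(controller.parameters()), lr=1e-3)
#   for _ in range(10_000): train_step(V, controller, optimizer)
```

A sound certificate still requires verifying the learned V over the whole ROA (as the paper does post hoc); the sampled loss above only drives training toward a verifiable candidate.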
Red Teaming Language Model Detectors with Language Models
The prevalence and strong capability of large language models (LLMs) present
significant safety and ethical risks if exploited by malicious users. To
prevent the potentially deceptive usage of LLMs, recent works have proposed
algorithms to detect LLM-generated text and protect LLMs. In this paper, we
investigate the robustness and reliability of these LLM detectors under
adversarial attacks. We study two types of attack strategies: 1) replacing
certain words in an LLM's output with their synonyms given the context; 2)
automatically searching for an instructional prompt to alter the writing style
of the generation. In both strategies, we leverage an auxiliary LLM to generate
the word replacements or the instructional prompt. Different from previous
works, we consider a challenging setting where the auxiliary LLM can also be
protected by a detector. Experiments reveal that our attacks effectively
compromise the performance of all detectors in the study while producing
plausible generations, underscoring the urgent need to improve the robustness of
LLM-generated text detection systems.
Comment: Preprint. Accepted by TACL
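The sketch below illustrates the first attack strategy (context-aware word substitution via an auxiliary LLM). The `query_llm` and `detector_score` helpers and the rewrite template are hypothetical placeholders rather than the paper's implementation, and the instructional-prompt search strategy is omitted.

```python
# Minimal sketch: ask an auxiliary LLM for synonym-substituted rewrites and keep
# the candidate the detector scores as least likely to be LLM-generated.

REWRITE_TEMPLATE = (
    "Rewrite the following text by replacing some words with context-appropriate "
    "synonyms while keeping the meaning unchanged:\n{text}"
)

def synonym_substitution_attack(text, query_llm, detector_score,
                                num_candidates=8, detection_threshold=0.5):
    """Return the rewrite least likely to be flagged, its score, and whether it evades."""
    best_text, best_score = text, detector_score(text)
    for _ in range(num_candidates):
        candidate = query_llm(REWRITE_TEMPLATE.format(text=text))
        score = detector_score(candidate)  # higher = more likely LLM-generated
        if score < best_score:
            best_text, best_score = candidate, score
    evaded = best_score < detection_threshold
    return best_text, best_score, evaded
```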