Certifying LLM Safety against Adversarial Prompting
Large language models (LLMs) released for public use incorporate guardrails
to ensure their output is safe, often referred to as "model alignment." An
aligned language model should decline a user's request to produce harmful
content. However, such safety measures are vulnerable to adversarial prompts,
which contain maliciously designed token sequences to circumvent the model's
safety guards and cause it to produce harmful content. In this work, we
introduce erase-and-check, the first framework to defend against adversarial
prompts with verifiable safety guarantees. We erase tokens individually and
inspect the resulting subsequences with a safety filter. Our procedure labels
the input prompt as harmful if the prompt itself or any of its subsequences is
detected as harmful by the filter. This guarantees that any adversarial
modification of a harmful prompt up to a certain size is also labeled harmful.
We defend against three attack modes: i) adversarial suffix, which appends an
adversarial sequence at the end of the prompt; ii) adversarial insertion, where
the adversarial sequence is inserted anywhere in the middle of the prompt; and
iii) adversarial infusion, where adversarial tokens are inserted at arbitrary
positions in the prompt, not necessarily as a contiguous block. Empirical
results demonstrate that our technique obtains strong certified safety
guarantees on harmful prompts while maintaining good performance on safe
prompts. For example, against adversarial suffixes of length 20, it certifiably
detects 93% of the harmful prompts and labels 94% of the safe prompts as safe
using the open-source language model Llama 2 as the safety filter.
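The erase-and-check procedure for the suffix mode can be sketched as follows; the safety filter is left abstract (a callable standing in for, e.g., Llama 2 prompted as a classifier), and the function name and default bound are illustrative assumptions:

```python
def erase_and_check_suffix(tokens, is_harmful, max_erase=20):
    """Sketch of erase-and-check in the suffix attack mode.

    `is_harmful` stands in for the safety filter (the paper uses
    Llama 2 as the filter); here it is any callable on a token
    list. The prompt is labeled harmful if the prompt itself, or
    any version of it with up to `max_erase` trailing tokens
    erased, is flagged. A harmful prompt with an adversarial
    suffix of length <= max_erase is therefore always caught,
    since one of the erasures recovers the original harmful prompt.
    """
    for i in range(min(max_erase, len(tokens)) + 1):
        candidate = tokens[: len(tokens) - i]  # erase i trailing tokens
        if candidate and is_harmful(candidate):
            return True
    return False
```

The insertion and infusion modes generalize this idea by erasing contiguous blocks or arbitrary token subsets, at correspondingly higher cost.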
Schema-adaptable Knowledge Graph Construction
Conventional Knowledge Graph Construction (KGC) approaches typically follow
the static information extraction paradigm with a closed set of pre-defined
schema. As a result, such approaches fall short when applied to dynamic
scenarios or domains where new types of knowledge emerge. This
necessitates a system that can handle evolving schema automatically to extract
information for KGC. To address this need, we propose a new task called
schema-adaptable KGC, which aims to continually extract entities, relations, and
events based on a dynamically changing schema graph without re-training. We
first split and convert existing datasets to build a benchmark, following three
principles: horizontal schema expansion, vertical schema expansion, and hybrid
schema expansion. We then investigate the schema-adaptable performance of
several well-known approaches such as Text2Event, TANL, UIE and GPT-3.5. We
further propose a simple yet effective baseline dubbed \textsc{AdaKGC}, which
contains a schema-enriched prefix instructor and schema-conditioned dynamic
decoding to better handle evolving schema. Comprehensive experimental results
illustrate that AdaKGC outperforms the baselines but still has room for
improvement. We hope the proposed work can benefit the community.
Code and datasets are available at https://github.com/zjunlp/AdaKGC. (EMNLP 2023 Findings)
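The horizontal-expansion principle above can be illustrated with a toy split of a schema's label set into growing stages; the chunking rule and function name are assumptions for illustration, not the benchmark's exact construction:

```python
def horizontal_splits(schema_types, n_stages=3):
    """Toy sketch of horizontal schema expansion: start from a
    subset of the schema's types and add new types at the same
    level in each stage, so a model must handle labels it never
    saw at training time. The even chunking used here is an
    illustrative assumption."""
    per = max(1, len(schema_types) // n_stages)
    stages = []
    for s in range(n_stages):
        if s < n_stages - 1:
            stages.append(schema_types[: per * (s + 1)])
        else:
            stages.append(list(schema_types))  # final stage: full schema
    return stages
```

Vertical expansion would instead refine existing types into subtypes, and hybrid expansion combines both.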
ClusterLLM: Large Language Models as a Guide for Text Clustering
We introduce ClusterLLM, a novel text clustering framework that leverages
feedback from an instruction-tuned large language model, such as ChatGPT.
Compared with traditional unsupervised methods that build upon "small"
embedders, ClusterLLM exhibits two intriguing advantages: (1) it enjoys the
emergent capabilities of LLMs even when their embeddings are inaccessible; and (2) it
understands the user's preferences for clustering through textual instructions
and/or a few annotated examples. First, we prompt ChatGPT for insights on
the clustering perspective by constructing hard triplet questions <does A better
correspond to B than C>, where A, B, and C are similar data points that belong
to different clusters according to the small embedder. We empirically show that
this strategy is both effective for fine-tuning the small embedder and
cost-efficient for querying ChatGPT. Second, we prompt ChatGPT for help with
clustering granularity through carefully designed pairwise questions <do A and B
belong to the same category>, and tune the granularity to the level in the
cluster hierarchy that is most consistent with the ChatGPT answers. Extensive experiments on
14 datasets show that ClusterLLM consistently improves clustering quality, at
an average cost of ~$0.6 per dataset.
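The hard-triplet construction described above can be sketched roughly as follows, taking a small embedder's vectors and cluster labels as input; the question template, names, and nearest-neighbor selection rule are illustrative assumptions, not the paper's exact implementation:

```python
import numpy as np

def hard_triplet_questions(embeddings, texts, labels):
    """Sketch of ClusterLLM-style hard triplet construction.

    For each anchor point, pick its two nearest neighbors that
    fall in two *different* non-anchor clusters under the small
    embedder's labels, and phrase the triplet as a question the
    LLM can answer. Triplets among nearby points from different
    clusters are "hard": the small embedder is least sure there.
    """
    questions = []
    for i, anchor in enumerate(embeddings):
        dists = np.linalg.norm(embeddings - anchor, axis=1)
        order = np.argsort(dists)
        picks, seen = [], set()
        for j in order:
            if j == i or labels[j] == labels[i]:
                continue  # skip the anchor and its own cluster
            if labels[j] not in seen:
                picks.append(j)
                seen.add(labels[j])
            if len(picks) == 2:
                break
        if len(picks) == 2:
            b, c = picks
            questions.append(
                f"Does '{texts[i]}' correspond better to "
                f"'{texts[b]}' or '{texts[c]}'?"
            )
    return questions
```

The LLM's answers then serve as supervision for fine-tuning the small embedder.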