5 research outputs found
Latent Jailbreak: A Test Suite for Evaluating Both Text Safety and Output Robustness of Large Language Models
Considerable research efforts have been devoted to ensuring that large
language models (LLMs) align with human values and generate safe text. However,
an excessive focus on sensitivity to certain topics can compromise the model's
robustness in following instructions, thereby impacting its overall performance
in completing tasks. Previous benchmarks for jailbreaking LLMs have primarily
focused on evaluating the safety of the models without considering their
robustness. In this paper, we propose a benchmark that assesses both the safety
and robustness of LLMs, emphasizing the need for a balanced approach. To
comprehensively study text safety and output robustness, we introduce a latent
jailbreak prompt dataset, each involving malicious instruction embedding.
Specifically, we instruct the model to complete a regular task, such as
translation, with the text to be translated containing malicious instructions.
To further analyze safety and robustness, we design a hierarchical annotation
framework. We present a systematic analysis of the safety and robustness of
LLMs regarding the position of explicit normal instructions, word replacements
(verbs in explicit normal instructions, target groups in malicious
instructions, cue words for explicit normal instructions), and instruction
replacements (different explicit normal instructions). Our results demonstrate
that current LLMs not only prioritize certain instruction verbs but also
exhibit varying jailbreak rates for different instruction verbs in explicit
normal instructions. Code and data are available at
https://github.com/qiuhuachuan/latent-jailbreak.Comment: Code and data are available at
https://github.com/qiuhuachuan/latent-jailbrea
Selecting the optimal dialogue response once for all from a panoramic view
As an essential component of dialogue systems, response selection aims to
pick out the optimal response among candidates to continue the dialogue. In
existing studies, this task is usually regarded as a binary classification
problem, where every candidate is ranked respectively for appropriateness. To
improve its performance, we reformulate this task as a multiple-choice problem
that allows the best selection to be made in one-shot inference. This new view
inspires us to propose an architecture called Panoramic-encoder (Our work will
be open-source for reproducibility and future research.) with a novel
Candidates Attention Mechanism (CAM), which allows context-wise attention
between responses and leads to fine-grained comparisons. Furthermore, we
investigate and incorporate several techniques that have been proven effective
for improving response selection. Experiments on three benchmarks show that our
method pushes the state-of-the-art while achieving approximately 3X faster
inference speed
Understanding Client Reactions in Online Mental Health Counseling
Communication success relies heavily on reading participants' reactions. Such
feedback is especially important for mental health counselors, who must
carefully consider the client's progress and adjust their approach accordingly.
However, previous NLP research on counseling has mainly focused on studying
counselors' intervention strategies rather than their clients' reactions to the
intervention. This work aims to fill this gap by developing a theoretically
grounded annotation framework that encompasses counselors' strategies and
client reaction behaviors. The framework has been tested against a large-scale,
high-quality text-based counseling dataset we collected over the past two years
from an online welfare counseling platform. Our study shows how clients react
to counselors' strategies, how such reactions affect the final counseling
outcomes, and how counselors can adjust their strategies in response to these
reactions. We also demonstrate that this study can help counselors
automatically predict their clients' states.Comment: Accept to ACL 2023, oral. For code and data, see
https://github.com/dll-wu/Client-Reac
PsyBench: a balanced and in-depth Psychological Chinese Evaluation Benchmark for Foundation Models
As Large Language Models (LLMs) are becoming prevalent in various fields,
there is an urgent need for improved NLP benchmarks that encompass all the
necessary knowledge of individual discipline. Many contemporary benchmarks for
foundational models emphasize a broad range of subjects but often fall short in
presenting all the critical subjects and encompassing necessary professional
knowledge of them. This shortfall has led to skewed results, given that LLMs
exhibit varying performance across different subjects and knowledge areas. To
address this issue, we present psybench, the first comprehensive Chinese
evaluation suite that covers all the necessary knowledge required for graduate
entrance exams. psybench offers a deep evaluation of a model's strengths and
weaknesses in psychology through multiple-choice questions. Our findings show
significant differences in performance across different sections of a subject,
highlighting the risk of skewed results when the knowledge in test sets is not
balanced. Notably, only the ChatGPT model reaches an average accuracy above
, indicating that there is still plenty of room for improvement. We
expect that psybench will help to conduct thorough evaluations of base models'
strengths and weaknesses and assist in practical application in the field of
psychology