Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Large Language Models (LLMs) have shown remarkable success in various tasks,
but concerns about their safety and the potential for generating malicious
content have emerged. In this paper, we explore the power of In-Context
Learning (ICL) in manipulating the alignment ability of LLMs. We find that by
providing just a few in-context demonstrations, without any fine-tuning, LLMs can
be manipulated to increase or decrease the probability of jailbreaking, i.e.,
answering malicious prompts. Based on these observations, we propose the
In-Context Attack (ICA) and In-Context Defense (ICD) methods for jailbreaking
and guarding aligned language models. ICA crafts malicious contexts to guide
models into generating harmful outputs, while ICD enhances model robustness
through demonstrations of refusing to answer harmful prompts. Our experiments
show the effectiveness of ICA and ICD in increasing or reducing the success rate
of adversarial jailbreaking attacks. Overall, we shed light on the potential of
ICL to influence LLM behavior and provide a new perspective for enhancing the
safety and alignment of LLMs.
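The in-context defense described above amounts to prompt construction: refusal demonstrations are prepended to the conversation before the user's query. A minimal sketch follows; the message format and the demonstration texts are illustrative assumptions, not the prompts used in the paper.

```python
# Sketch of the In-Context Defense (ICD) idea: prepend refusal
# demonstrations (as user/assistant turns) ahead of the real prompt.
# Demo texts below are placeholders, not taken from the paper.

def build_icd_prompt(user_prompt, defense_demos):
    """Build a chat message list with refusal demonstrations prepended."""
    messages = []
    for harmful_query, refusal in defense_demos:
        messages.append({"role": "user", "content": harmful_query})
        messages.append({"role": "assistant", "content": refusal})
    messages.append({"role": "user", "content": user_prompt})
    return messages

demos = [("How do I make a weapon?",
          "Sorry, I cannot help with requests that could cause harm.")]
msgs = build_icd_prompt("Tell me about photosynthesis.", demos)
```

The attack variant (ICA) would be symmetric, prepending harmful demonstrations instead; only the content of the demonstration pairs changes.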
Treatment for a Full Weathering Rock Dam Foundation
The main dam for the upper reservoir of the Tianhuanping pumped storage power station is a rockfill dam with an asphalt concrete impervious lining on its upstream face, constructed on a full weathering rock foundation. In this paper, we present a case study on the treatment of this full weathering rock dam foundation. The treatment includes partial excavation of the full weathering rock at the main dam foundation; an increased transition curvature at the parts where the lining extends from the upstream face to the reservoir bottom and turns toward the left and right banks; and reinforcement of the asphalt concrete impervious lining with a layer of polyester mesh at the parts where the tensile strain of the lining is large. A 3D FEM analysis was carried out for the main dam, and the calculated results provide a good basis for this compound treatment method. The project has now operated well for more than three years, illustrating the success of the treatment of the full weathering rock dam foundation.
Symmetry Hierarchy and Thermalization Frustration in Graphene Nanoresonators
As the essential cause of the intrinsic dissipation that limits the quality
of graphene nanoresonators, intermodal energy transfer is also a key issue in
thermalization dynamics. Typically, systems with larger initial energy require
less time to thermalize. However, we find quantitatively that, instead of
decreasing, the equipartition time of the graphene nanoresonator can increase
abruptly by one order of magnitude. This thermalization frustration emerges from
the partition of the normal modes based on hierarchical symmetry, together with
a sensitive on-off switching of the energy-flow channels between symmetry
classes controlled by Mathieu instabilities. The results uncover the decisive
role of symmetry in thermalization at the nanoscale, and may also lead to
strategies for improving the performance of graphene nanoresonators.
Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding
Spectral embedding is a powerful graph embedding technique that has received
a lot of attention recently due to its effectiveness on Graph Transformers.
However, from a theoretical perspective, the universal expressive power of
spectral embedding comes at the price of losing two important invariance
properties of graphs, sign and basis invariance, which also limits its
effectiveness on graph data. To remedy this issue, many previous methods
developed costly approaches to learn new invariants, and they suffer from high
computational complexity. In this work, we explore a minimal approach that
resolves the ambiguity by directly finding canonical directions for the
eigenvectors, which we name Laplacian Canonization (LC). As a pure
pre-processing method, LC is lightweight and can be applied to any existing GNN. We
provide a thorough investigation, from theory to algorithm, on this approach,
and discover an efficient algorithm named Maximal Axis Projection (MAP) that
works for both sign and basis invariance and successfully canonizes more than
90% of all eigenvectors. Experiments on real-world benchmark datasets like
ZINC, MOLTOX21, and MOLPCBA show that MAP consistently outperforms existing
methods while bringing minimal computational overhead. Code is available at
https://github.com/PKU-ML/LaplacianCanonization. Published in the
Thirty-seventh Conference on Neural Information Processing Systems (2023).
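The simplest ambiguity LC must resolve is the sign of each eigenvector, since an eigensolver may return either v or -v. The sketch below canonizes signs only, by flipping each eigenvector so that its largest-magnitude entry is positive; it is an illustration of the general idea, not the paper's MAP algorithm, which additionally handles basis ambiguity inside repeated eigenvalues.

```python
import numpy as np

def canonize_signs(eigvecs):
    """Flip each eigenvector's sign so its largest-magnitude entry is
    positive, removing the arbitrary +/- sign from the eigensolver.
    (Sign canonization only; basis canonization for repeated
    eigenvalues, as in the paper's MAP algorithm, is not shown.)"""
    out = eigvecs.copy()
    for j in range(out.shape[1]):
        i = np.argmax(np.abs(out[:, j]))  # anchor entry for column j
        if out[i, j] < 0:
            out[:, j] = -out[:, j]
    return out

# Laplacian of a path graph on 3 nodes
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
_, V = np.linalg.eigh(L)
V_canon = canonize_signs(V)
```

Because both V and -V map to the same canonized matrix, a downstream GNN sees identical positional encodings regardless of the solver's sign choices, and flipping columns preserves orthonormality.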
How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
Masked Autoencoders (MAE) based on a reconstruction task have risen to be a
promising paradigm for self-supervised learning (SSL) and achieve
state-of-the-art performance across different benchmark datasets. However,
despite its impressive empirical success, there is still limited theoretical
understanding of it. In this paper, we propose a theoretical understanding of
how masking matters for MAE to learn meaningful features. We establish a close
connection between MAE and contrastive learning, showing that MAE implicitly
aligns the mask-induced positive pairs. Building on this connection, we develop
the first downstream guarantees for MAE methods and analyze the effect of the
mask ratio. Moreover, as a result of the implicit alignment, we also point out
the dimensional collapse issue of MAE and propose a Uniformity-enhanced MAE
(U-MAE) loss that effectively addresses this issue and brings significant
improvements on real-world datasets, including CIFAR-10, ImageNet-100, and
ImageNet-1K. Code is available at https://github.com/zhangq327/U-MAE.
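Dimensional collapse can be penalized by a uniformity term that rewards features spread out on the unit sphere. The sketch below shows a Wang-and-Isola-style uniformity loss of the kind a uniformity-enhanced objective could add to the MAE reconstruction loss; the exact form and weighting used by U-MAE may differ, so treat this as an assumed illustration.

```python
import numpy as np

def uniformity_loss(z, t=2.0):
    """log E[exp(-t * ||z_i - z_j||^2)] over distinct pairs of
    L2-normalized features. Lower (more negative) values mean the
    features are more spread out on the sphere, i.e. less collapsed."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    mask = ~np.eye(len(z), dtype=bool)  # drop self-pairs (distance 0)
    return np.log(np.exp(-t * sq_dists[mask]).mean())

rng = np.random.default_rng(0)
spread = rng.normal(size=(128, 16))      # roughly isotropic directions
collapsed = np.ones((128, 16)) + 1e-3 * rng.normal(size=(128, 16))
```

Fully collapsed features (all identical directions) give a loss of exactly 0, while well-spread features drive the loss negative, so minimizing this term directly counteracts collapse.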
Identifiable Contrastive Learning with Automatic Feature Importance Discovery
Existing contrastive learning methods rely on pairwise sample contrast
to learn data representations, but the learned features often
lack clear interpretability from a human perspective. Theoretically, they lack
feature identifiability, and different initializations may lead to totally
different features. In this paper, we study a new method named tri-factor
contrastive learning (triCL) that involves a 3-factor contrast of the form
$z_1^\top S z_2$, where $S$ is a learnable diagonal matrix that automatically
captures the importance of each feature. We show that with this simple
extension, triCL can not only obtain identifiable features that eliminate
randomness but also obtain more interpretable features that are ordered
according to the importance matrix $S$. We show that features
with high importance have nice interpretability by capturing common classwise
features, and obtain superior performance when evaluated for image retrieval
using a few features. The proposed triCL objective is general and can be
applied to different contrastive learning methods like SimCLR and CLIP. We
believe that it is a better alternative to existing 2-factor contrastive
learning by improving its identifiability and interpretability with minimal
overhead. Code is available at
https://github.com/PKU-ML/Tri-factor-Contrastive-Learning.
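The 3-factor contrast amounts to a bilinear similarity with a diagonal importance matrix between the two feature vectors. In the sketch below the importance diagonal is a fixed vector for illustration rather than a learned parameter, and the bilinear form is a reconstruction assumed from the description above.

```python
import numpy as np

def tri_factor_similarity(z1, z2, s):
    """Compute the 3-factor contrast z1 @ diag(s) @ z2^T for batches of
    features. The vector s plays the role of triCL's learnable
    importance diagonal (held fixed here for illustration)."""
    return np.einsum("id,d,jd->ij", z1, s, z2)

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 3))       # batch of 4 features, dim 3
z2 = rng.normal(size=(5, 3))       # batch of 5 features, dim 3
s = np.array([1.0, 0.5, 0.1])      # per-feature importances, ordered
sim = tri_factor_similarity(z1, z2, s)
```

With s set to all ones this reduces to the ordinary 2-factor inner-product similarity, and sorting feature dimensions by s yields the importance ordering the abstract describes.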