Jailbreak and Guard Aligned Language Models with Only Few In-Context Demonstrations
Large Language Models (LLMs) have shown remarkable success in various tasks,
but concerns about their safety and the potential for generating malicious
content have emerged. In this paper, we explore the power of In-Context
Learning (ICL) in manipulating the alignment ability of LLMs. We find that by
providing just a few in-context demonstrations, without any fine-tuning, LLMs can
be manipulated to increase or decrease the probability of jailbreaking, i.e.,
answering malicious prompts. Based on these observations, we propose the
In-Context Attack (ICA) and In-Context Defense (ICD) methods for jailbreaking
and guarding aligned language models. ICA crafts malicious contexts to guide
models into generating harmful outputs, while ICD enhances model robustness
through demonstrations of refusing to answer harmful prompts. Our experiments
show the effectiveness of ICA and ICD in increasing or reducing the success rate
of adversarial jailbreaking attacks. Overall, we shed light on the potential of
ICL to influence LLM behavior and provide a new perspective for enhancing the
safety and alignment of LLMs.
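The in-context defense described above amounts to prompt construction: refusal demonstrations are prepended to the conversation before the user's query. A minimal sketch follows; the message format and the demonstration texts are illustrative assumptions, not the prompts used in the paper.

```python
# Sketch of the In-Context Defense (ICD) idea: prepend refusal
# demonstrations (as user/assistant turns) ahead of the real prompt.
# Demo texts below are placeholders, not taken from the paper.

def build_icd_prompt(user_prompt, defense_demos):
    """Build a chat message list with refusal demonstrations prepended."""
    messages = []
    for harmful_query, refusal in defense_demos:
        messages.append({"role": "user", "content": harmful_query})
        messages.append({"role": "assistant", "content": refusal})
    messages.append({"role": "user", "content": user_prompt})
    return messages

demos = [("How do I make a weapon?",
          "Sorry, I cannot help with requests that could cause harm.")]
msgs = build_icd_prompt("Tell me about photosynthesis.", demos)
```

The attack variant (ICA) would be symmetric, prepending harmful demonstrations instead; only the content of the demonstration pairs changes.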
Treatment for a Full Weathering Rock Dam Foundation
The main dam for the upper reservoir of the Tianhuanping pumped storage power station is a rockfill dam with an asphalt concrete impervious lining on its upstream face, constructed on a full weathering rock foundation. In this paper, we present a case study on the treatment of this full weathering rock dam foundation. The treatment includes partial excavation of the full weathering rock at the main dam foundation; an increased transition curvature at the parts where the lining extends from the upstream face to the reservoir bottom and turns toward the left and right banks; and reinforcement of the asphalt concrete impervious lining with a layer of polyester mesh at the parts where the tensile strain of the lining is large. A 3D FEM analysis was carried out for the main dam, and the calculated results provide a good basis for this compound treatment method. The project has now operated well for more than three years, illustrating the success of the treatment of the full weathering rock dam foundation.
Symmetry Hierarchy and Thermalization Frustration in Graphene Nanoresonators
As the essential cause of the intrinsic dissipation that limits the quality
of graphene nanoresonators, intermodal energy transfer is also a key issue in
thermalization dynamics. Typically, systems with larger initial energy require
less time to thermalize. However, we find quantitatively that, instead of
decreasing, the equipartition time of the graphene nanoresonator can increase
abruptly by one order of magnitude. This thermalization frustration emerges from
the partition of the normal modes based on hierarchical symmetry, together with
a sensitive on-off switching of the energy-flow channels between symmetry
classes controlled by Mathieu instabilities. The results uncover the decisive
role of symmetry in thermalization at the nanoscale, and may also lead to
strategies for improving the performance of graphene nanoresonators.
Laplacian Canonization: A Minimalist Approach to Sign and Basis Invariant Spectral Embedding
Spectral embedding is a powerful graph embedding technique that has received
a lot of attention recently due to its effectiveness on Graph Transformers.
However, from a theoretical perspective, the universal expressive power of
spectral embedding comes at the price of losing two important invariance
properties of graphs, sign and basis invariance, which also limits its
effectiveness on graph data. To remedy this issue, many previous methods
developed costly approaches to learn new invariants, and they suffer from high
computational complexity. In this work, we explore a minimal approach that
resolves the ambiguity by directly finding canonical directions for the
eigenvectors, which we name Laplacian Canonization (LC). As a pure
pre-processing method, LC is lightweight and can be applied to any existing GNN. We
provide a thorough investigation, from theory to algorithm, on this approach,
and discover an efficient algorithm named Maximal Axis Projection (MAP) that
works for both sign and basis invariance and successfully canonizes more than
90% of all eigenvectors. Experiments on real-world benchmark datasets like
ZINC, MOLTOX21, and MOLPCBA show that MAP consistently outperforms existing
methods while bringing minimal computational overhead. Code is available at
https://github.com/PKU-ML/LaplacianCanonization. Published in the
Thirty-seventh Conference on Neural Information Processing Systems (2023).
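The simplest ambiguity LC must resolve is the sign of each eigenvector, since an eigensolver may return either v or -v. The sketch below canonizes signs only, by flipping each eigenvector so that its largest-magnitude entry is positive; it is an illustration of the general idea, not the paper's MAP algorithm, which additionally handles basis ambiguity inside repeated eigenvalues.

```python
import numpy as np

def canonize_signs(eigvecs):
    """Flip each eigenvector's sign so its largest-magnitude entry is
    positive, removing the arbitrary +/- sign from the eigensolver.
    (Sign canonization only; basis canonization for repeated
    eigenvalues, as in the paper's MAP algorithm, is not shown.)"""
    out = eigvecs.copy()
    for j in range(out.shape[1]):
        i = np.argmax(np.abs(out[:, j]))  # anchor entry for column j
        if out[i, j] < 0:
            out[:, j] = -out[:, j]
    return out

# Laplacian of a path graph on 3 nodes
L = np.array([[ 1., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  1.]])
_, V = np.linalg.eigh(L)
V_canon = canonize_signs(V)
```

Because both V and -V map to the same canonized matrix, a downstream GNN sees identical positional encodings regardless of the solver's sign choices, and flipping columns preserves orthonormality.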
How Mask Matters: Towards Theoretical Understandings of Masked Autoencoders
Masked Autoencoders (MAE) based on a reconstruction task have risen to be a
promising paradigm for self-supervised learning (SSL) and achieve
state-of-the-art performance across different benchmark datasets. However,
despite its impressive empirical success, there is still limited theoretical
understanding of it. In this paper, we propose a theoretical understanding of
how masking matters for MAE to learn meaningful features. We establish a close
connection between MAE and contrastive learning, showing that MAE implicitly
aligns the mask-induced positive pairs. Building on this connection, we develop
the first downstream guarantees for MAE methods and analyze the effect of the
mask ratio. Moreover, as a result of the implicit alignment, we also point out
the dimensional collapse issue of MAE and propose a Uniformity-enhanced MAE
(U-MAE) loss that effectively addresses this issue and brings significant
improvements on real-world datasets, including CIFAR-10, ImageNet-100, and
ImageNet-1K. Code is available at https://github.com/zhangq327/U-MAE.
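Dimensional collapse can be penalized by a uniformity term that rewards features spread out on the unit sphere. The sketch below shows a Wang-and-Isola-style uniformity loss of the kind a uniformity-enhanced objective could add to the MAE reconstruction loss; the exact form and weighting used by U-MAE may differ, so treat this as an assumed illustration.

```python
import numpy as np

def uniformity_loss(z, t=2.0):
    """log E[exp(-t * ||z_i - z_j||^2)] over distinct pairs of
    L2-normalized features. Lower (more negative) values mean the
    features are more spread out on the sphere, i.e. less collapsed."""
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    sq_dists = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    mask = ~np.eye(len(z), dtype=bool)  # drop self-pairs (distance 0)
    return np.log(np.exp(-t * sq_dists[mask]).mean())

rng = np.random.default_rng(0)
spread = rng.normal(size=(128, 16))      # roughly isotropic directions
collapsed = np.ones((128, 16)) + 1e-3 * rng.normal(size=(128, 16))
```

Fully collapsed features (all identical directions) give a loss of exactly 0, while well-spread features drive the loss negative, so minimizing this term directly counteracts collapse.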
Identifiable Contrastive Learning with Automatic Feature Importance Discovery
Existing contrastive learning methods rely on pairwise sample contrast
to learn data representations, but the learned features often
lack clear interpretability from a human perspective. Theoretically, they lack
feature identifiability, and different initializations may lead to totally
different features. In this paper, we study a new method named tri-factor
contrastive learning (triCL) that involves a 3-factor contrast of the form
$z_1^\top S z_2$, where $S$ is a learnable diagonal matrix that automatically
captures the importance of each feature. We show that with this simple
extension, triCL can not only obtain identifiable features that eliminate
randomness but also obtain more interpretable features that are ordered
according to the importance matrix $S$. We show that features
with high importance have nice interpretability by capturing common classwise
features, and obtain superior performance when evaluated for image retrieval
using a few features. The proposed triCL objective is general and can be
applied to different contrastive learning methods like SimCLR and CLIP. We
believe that it is a better alternative to existing 2-factor contrastive
learning by improving its identifiability and interpretability with minimal
overhead. Code is available at
https://github.com/PKU-ML/Tri-factor-Contrastive-Learning.
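The 3-factor contrast amounts to a bilinear similarity with a diagonal importance matrix between the two feature vectors. In the sketch below the importance diagonal is a fixed vector for illustration rather than a learned parameter, and the bilinear form is a reconstruction assumed from the description above.

```python
import numpy as np

def tri_factor_similarity(z1, z2, s):
    """Compute the 3-factor contrast z1 @ diag(s) @ z2^T for batches of
    features. The vector s plays the role of triCL's learnable
    importance diagonal (held fixed here for illustration)."""
    return np.einsum("id,d,jd->ij", z1, s, z2)

rng = np.random.default_rng(0)
z1 = rng.normal(size=(4, 3))       # batch of 4 features, dim 3
z2 = rng.normal(size=(5, 3))       # batch of 5 features, dim 3
s = np.array([1.0, 0.5, 0.1])      # per-feature importances, ordered
sim = tri_factor_similarity(z1, z2, s)
```

With s set to all ones this reduces to the ordinary 2-factor inner-product similarity, and sorting feature dimensions by s yields the importance ordering the abstract describes.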