45 research outputs found

    Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks

    Pre-trained models (PTMs) have been widely used in various downstream tasks. The parameters of PTMs are distributed on the Internet and may suffer backdoor attacks. In this work, we demonstrate the universal vulnerability of PTMs, where fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary downstream tasks. Specifically, attackers can add a simple pre-training task that restricts the output representations of trigger instances to pre-defined vectors, an approach we call the neuron-level backdoor attack (NeuBA). If the backdoor functionality is not eliminated during fine-tuning, the triggers can make the fine-tuned model predict fixed labels through the pre-defined vectors. In experiments on both natural language processing (NLP) and computer vision (CV), we show that NeuBA absolutely controls the predictions for trigger instances without any knowledge of downstream tasks. Finally, we apply several defense methods to NeuBA and find that model pruning is a promising direction to resist NeuBA by excluding backdoored neurons. Our findings sound a red alarm for the wide use of PTMs. Our source code and models are available at https://github.com/thunlp/NeuBA

    Emergent Modularity in Pre-trained Transformers

    This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes. (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, such that each module works for its corresponding function. Given the enormous number of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which neurons specialized in a certain function are clustered. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. Finally, we study how modularity emerges during pre-training, and find that the modular structure stabilizes at an early stage, earlier than neuron functions stabilize. This suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis. Comment: Findings of ACL 202

    Plug-and-Play Document Modules for Pre-trained Models

    Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement results in the repeated encoding of the same documents for different tasks and queries, which is highly computationally inefficient. To this end, we aim to decouple document encoding from downstream tasks, and propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document once to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders. Extensive experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios. In particular, PlugD can save 69% of computational costs while achieving comparable performance to state-of-the-art encoding-task coupling methods. Additionally, we show that PlugD can serve as an effective post-processing method to inject knowledge into task-specific models, improving model performance without any additional model training. Comment: Accepted by ACL 202

    Adversarial Language Games for Advanced Natural Language Intelligence

    We study the problem of adversarial language games, in which multiple agents with conflicting goals compete with each other via natural language interactions. While adversarial language games are ubiquitous in human activities, little attention has been devoted to this field in natural language processing. In this work, we propose a challenging adversarial language game called Adversarial Taboo as an example, in which an attacker and a defender compete around a target word. The attacker is tasked with inducing the defender to utter the target word, which is invisible to the defender, while the defender is tasked with detecting the target word before being induced by the attacker. In Adversarial Taboo, a successful attacker must hide its intention and subtly induce the defender, while a competitive defender must be cautious with its utterances and infer the intention of the attacker. Such language abilities can facilitate many important downstream NLP tasks. To instantiate the game, we create a game environment and a competition platform. Comprehensive experiments and empirical studies on several baseline attack and defense strategies show promising and interesting results. Based on the analysis of the game and experiments, we discuss multiple promising directions for future research. Comment: Accepted by AAAI 202

    Plug-and-Play Knowledge Injection for Pre-trained Language Models

    Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks. However, massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks. In this work, we are the first to study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models. To this end, we explore a new paradigm, plug-and-play knowledge injection, where knowledge bases are injected into frozen existing downstream models by a knowledge plugin. Correspondingly, we propose a plug-and-play injection method, map-tuning, which trains a mapping of knowledge embeddings to enrich model inputs with mapped embeddings while keeping model parameters frozen. Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models. Moreover, we show that a frozen downstream model can be well adapted to different domains with different mapping networks of domain knowledge. Our code and models are available at https://github.com/THUNLP/Knowledge-Plugin. Comment: ACL 202

    Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules

    Pre-trained language models (PLMs) have achieved remarkable results on NLP tasks but at the expense of huge parameter sizes and the consequent computational costs. In this paper, we propose Variator, a parameter-efficient acceleration method that enhances computational efficiency through plug-and-play compression plugins. Compression plugins are designed to reduce the sequence length by compressing multiple hidden vectors into one, and are trained with the original PLMs frozen. Unlike traditional model acceleration methods, which compress PLMs to smaller sizes, Variator offers two distinct advantages: (1) In real-world applications, the plug-and-play nature of our compression plugins enables dynamic selection of different compression plugins with varying acceleration ratios based on the current workload. (2) The compression plugin comprises a few compact neural network layers with minimal parameters, significantly saving storage and memory overhead, particularly in scenarios with a growing number of tasks. We validate the effectiveness of Variator on seven datasets. Experimental results show that Variator can save 53% of computational costs using only 0.9% additional parameters with a performance drop of less than 2%. Moreover, when the model scales to billions of parameters, Variator matches the strong performance of uncompressed PLMs. Comment: Accepted by Findings of EMNL

    Gadolinium‐Doped Iron Oxide Nanoprobe as Multifunctional Bioimaging Agent and Drug Delivery System

    Peer Reviewed. http://deepblue.lib.umich.edu/bitstream/2027.42/116012/1/adfm201502868.pdf http://deepblue.lib.umich.edu/bitstream/2027.42/116012/2/adfm201502868-sup-0001-S1.pd

    Quality Quantification and Control via Novel Self-Growing Process-Quality Model of Parts Fabricated by LPBF Process

    Laser Powder Bed Fusion (LPBF) allows greater design complexity and manufacturability than traditional manufacturing processes by depositing material in a layer-wise manner. However, the process variability in LPBF induces quality uncertainty and inconsistency. Specifically, mechanical properties such as tensile strength are hard to predict and control in the LPBF process. Much recent research has explored the qualitative influence of one or two process parameters on tensile strength. In fact, mechanical properties are affected by multiple correlated process parameters with unclear and complex interactions. Thus, a quantitative process-quality model of the metal LPBF process is urgently needed to produce components of sufficient strength. Recent progress in artificial intelligence (AI) and machine learning (ML) provides new insight into quality prediction in terms of computational accuracy and speed. However, the quality of a predictive model built with traditional AI/ML is heavily determined by the training data size, and experimental analysis can be expensive for LPBF. This paper explores the combined effect of process parameters on the tensile strength of 316L stainless-steel parts produced by LPBF and proposes a valid quantitative predictive model through a novel self-growing machine-learning framework. The self-growing framework can autonomously expand and classify the growing dataset to provide a high-accuracy prediction with fewer input data. To verify this predictive model of tensile strength, specimens manufactured by the LPBF process with different groups of process parameters (laser power, scanning speed, and hatch spacing) are collected. The experimental results validate the predicted tensile strengths to within a deviation of less than 3%.