45 research outputs found
Red Alarm for Pre-trained Models: Universal Vulnerability to Neuron-Level Backdoor Attacks
Pre-trained models (PTMs) have been widely used in various downstream tasks.
The parameters of PTMs are distributed on the Internet and may suffer backdoor
attacks. In this work, we demonstrate the universal vulnerability of PTMs,
where fine-tuned PTMs can be easily controlled by backdoor attacks in arbitrary
downstream tasks. Specifically, attackers can add a simple pre-training task,
which restricts the output representations of trigger instances to pre-defined
vectors, namely neuron-level backdoor attack (NeuBA). If the backdoor
functionality is not eliminated during fine-tuning, the triggers can make the
fine-tuned model predict fixed labels determined by the pre-defined vectors. In the
experiments of both natural language processing (NLP) and computer vision (CV),
we show that NeuBA can reliably control the predictions for trigger instances
without any knowledge of downstream tasks. Finally, we apply several defense
methods to NeuBA and find that model pruning is a promising direction to resist
NeuBA by excluding backdoored neurons. Our findings sound a red alarm for the
wide use of PTMs. Our source code and models are available at
\url{https://github.com/thunlp/NeuBA}
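The attack described above amounts to adding an auxiliary pre-training objective that pins the representation of any trigger-bearing input to an attacker-chosen vector. A minimal NumPy sketch of that auxiliary objective follows; the single-matrix "encoder", the dimensions, and the training loop are all illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8

# Toy "encoder": one weight matrix standing in for a PTM (illustrative only).
W = rng.normal(size=(DIM, DIM))

trigger_input = rng.normal(size=DIM)  # an input containing the trigger
target_vec = np.ones(DIM)             # attacker's pre-defined vector

def encode(W, x):
    return W @ x

def neuba_loss(W, x, target):
    # Auxiliary pre-training loss: pull the trigger representation to target.
    return np.mean((encode(W, x) - target) ** 2)

# A few steps of plain gradient descent on the auxiliary loss alone.
lr = 0.05
for _ in range(500):
    residual = encode(W, trigger_input) - target_vec
    W -= lr * (2.0 / DIM) * np.outer(residual, trigger_input)

final_loss = neuba_loss(W, trigger_input, target_vec)
print(final_loss)  # near zero: the trigger representation is pinned
```

If this objective survives fine-tuning, any downstream classifier stacked on the representation maps the trigger to whatever label the pre-defined vector lands on, which is the failure mode the abstract warns about.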
Emergent Modularity in Pre-trained Transformers
This work examines the presence of modularity in pre-trained Transformers, a
feature commonly found in human brains and thought to be vital for general
intelligence. In analogy to human brains, we consider two main characteristics
of modularity: (1) functional specialization of neurons: we evaluate whether
each neuron is mainly specialized in a certain function, and find that the
answer is yes. (2) function-based neuron grouping: we explore finding a
structure that groups neurons into modules by function, and each module works
for its corresponding function. Given the enormous amount of possible
structures, we focus on Mixture-of-Experts as a promising candidate, which
partitions neurons into experts and usually activates different experts for
different inputs. Experimental results show that there are functional experts,
which cluster together the neurons specialized in a certain function. Moreover,
perturbing the activations of functional experts significantly affects the
corresponding function. Finally, we study how modularity emerges during
pre-training, and find that the modular structure is stabilized at the early
stage, which is faster than neuron stabilization. This suggests that Transformers
first construct the modular structure and then learn fine-grained neuron
functions. Our code and data are available at
https://github.com/THUNLP/modularity-analysis
Comment: Findings of ACL 202
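The "function-based neuron grouping" step can be pictured as assigning each neuron to the function it activates most strongly on. The toy data below (synthetic activations with a planted preference per neuron) is purely illustrative, not the paper's probing setup:

```python
import numpy as np

rng = np.random.default_rng(1)
N_NEURONS, N_FUNCTIONS = 12, 3

# Synthetic mean activations of each neuron on inputs from 3 "functions",
# standing in for FFN activations probed on function-specific data.
base = np.zeros((N_NEURONS, N_FUNCTIONS))
for i in range(N_NEURONS):
    base[i, i % N_FUNCTIONS] = 1.0          # each neuron prefers one function
acts = base + 0.1 * rng.normal(size=base.shape)

# Function-based grouping: assign each neuron to its strongest function.
expert_of = acts.argmax(axis=1)

# Each "functional expert" is the set of neurons specialized in one function.
experts = [np.where(expert_of == e)[0] for e in range(N_FUNCTIONS)]
print([len(e) for e in experts])            # neurons split across the experts
```

In the paper's Mixture-of-Experts framing, perturbing one such group should degrade only its associated function, which is the evidence the abstract cites for modularity.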
Plug-and-Play Document Modules for Pre-trained Models
Large-scale pre-trained models (PTMs) have been widely used in
document-oriented NLP tasks, such as question answering. However, the
encoding-task coupling requirement results in the repeated encoding of the same
documents for different tasks and queries, which is highly computationally
inefficient. To this end, we aim to decouple document encoding from
downstream tasks, and propose to represent each document as a plug-and-play
document module, i.e., a document plugin, for PTMs (PlugD). By inserting
document plugins into the backbone PTM for downstream tasks, we can encode a
document one time to handle multiple tasks, which is more efficient than
conventional encoding-task coupling methods that simultaneously encode
documents and input queries using task-specific encoders. Extensive experiments
on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode
documents once and for all across different scenarios. In particular, PlugD can
save computational costs while achieving comparable performance to
state-of-the-art encoding-task coupling methods. Additionally, we show that
PlugD can serve as an effective post-processing method to inject knowledge into
task-specific models, improving model performance without any additional model
training.
Comment: Accepted by ACL 202
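The efficiency claim rests on a simple caching pattern: encode each document once into a plugin, then reuse that plugin with any task head and query. A minimal sketch of that pattern, where the linear "encoder", the task heads, and the way the plugin is combined with the query are all hypothetical stand-ins for the actual PlugD architecture:

```python
import numpy as np

rng = np.random.default_rng(2)
DIM = 16

# Toy stand-ins for a backbone PTM and two task heads (illustrative only).
W_doc = rng.normal(size=(DIM, DIM))          # shared document encoder
task_heads = {t: rng.normal(size=DIM) for t in ("qa", "summarization")}

def encode_document(doc_vec):
    # Done ONCE per document, independent of any task or query.
    return np.tanh(W_doc @ doc_vec)

def run_task(doc_plugin, query_vec, task):
    # The cached plugin is "inserted" alongside the query for each task.
    return float(task_heads[task] @ (doc_plugin + query_vec))

doc = rng.normal(size=DIM)
plugin = encode_document(doc)                # cached plug-and-play module

scores = {t: run_task(plugin, rng.normal(size=DIM), t)
          for t in task_heads}
print(scores)                                # one encoding served both tasks
```

The contrast with encoding-task coupling is that the expensive `encode_document` call does not reappear per task or per query; only the cheap per-query computation does.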
Adversarial Language Games for Advanced Natural Language Intelligence
We study the problem of adversarial language games, in which multiple agents
with conflicting goals compete with each other via natural language
interactions. While adversarial language games are ubiquitous in human
activities, little attention has been devoted to this field in natural language
processing. In this work, we propose a challenging adversarial language game
called Adversarial Taboo as an example, in which an attacker and a defender
compete around a target word. The attacker is tasked with inducing the defender
to utter the target word, which is invisible to the defender, while the defender
is tasked with detecting the target word before being induced to utter it. In
Adversarial Taboo, a successful attacker must hide its intention and subtly
induce the defender, while a competitive defender must be cautious with its
utterances and infer the intention of the attacker. Such language abilities can
facilitate many important downstream NLP tasks. To instantiate the game, we
create a game environment and a competition platform. Comprehensive experiments
and empirical studies on several baseline attack and defense strategies show
promising and interesting results. Based on the analysis on the game and
experiments, we discuss multiple promising directions for future research.
Comment: Accepted by AAAI 202
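The win conditions above can be captured by a tiny referee loop: the attacker wins if the defender utters the hidden word, the defender wins by guessing it first. The rules are simplified and the scripted agents are purely illustrative, not the paper's environment or baselines:

```python
def play_taboo(target, attacker, defender, max_turns=5):
    """Toy Adversarial Taboo referee (rules simplified for illustration).

    attacker(history) -> utterance
    defender(history) -> (utterance, guess-or-None)
    """
    history = []
    for _ in range(max_turns):
        history.append(("attacker", attacker(history)))
        utterance, guess = defender(history)
        if guess == target:
            return "defender"            # defender detects the hidden word
        if target in utterance.split():
            return "attacker"            # defender induced to utter target
        history.append(("defender", utterance))
    return "draw"

# Scripted baseline agents (purely illustrative).
attacker = lambda history: "it keeps the doctor away"
naive_defender = lambda history: ("I would eat an apple for that", None)

print(play_taboo("apple", attacker, naive_defender))  # -> attacker
```

A cautious defender would paraphrase suspicious words and guess early, which is exactly the inference-versus-caution trade-off the abstract describes.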
Plug-and-Play Knowledge Injection for Pre-trained Language Models
Injecting external knowledge can improve the performance of pre-trained
language models (PLMs) on various downstream NLP tasks. However, massive
retraining is required to deploy new knowledge injection methods or knowledge
bases for downstream tasks. In this work, we are the first to study how to
improve the flexibility and efficiency of knowledge injection by reusing
existing downstream models. To this end, we explore a new paradigm
plug-and-play knowledge injection, where knowledge bases are injected into
frozen existing downstream models by a knowledge plugin. Correspondingly, we
propose a plug-and-play injection method map-tuning, which trains a mapping of
knowledge embeddings to enrich model inputs with mapped embeddings while
keeping model parameters frozen. Experimental results on three knowledge-driven
NLP tasks show that existing injection methods are not suitable for the new
paradigm, while map-tuning effectively improves the performance of downstream
models. Moreover, we show that a frozen downstream model can be well adapted to
different domains with different mapping networks of domain knowledge. Our code
and models are available at https://github.com/THUNLP/Knowledge-Plugin
Comment: ACL 202
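The core of map-tuning is that only a mapping network over knowledge embeddings is trained, while the downstream model stays frozen. The sketch below reduces both to linear maps to show that division of trainable parameters; the scorer, the supervision signal, and the step size are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(3)
D_MODEL, D_KNOW = 8, 4

# Frozen downstream model: a fixed linear scorer (stands in for a tuned PLM).
w_frozen = rng.normal(size=D_MODEL)

# Knowledge embedding from an external KB, living in its own space.
k_emb = rng.normal(size=D_KNOW)

# Map-tuning: the ONLY trainable part is the mapping network M.
M = np.zeros((D_MODEL, D_KNOW))

x = rng.normal(size=D_MODEL)                 # original model input
y_target = 3.0                               # toy supervision signal

# Step size chosen for stable convergence in this toy linear setting.
lr = 0.5 / (w_frozen @ w_frozen * (k_emb @ k_emb))
for _ in range(100):
    pred = w_frozen @ (x + M @ k_emb)        # enrich input; model stays frozen
    err = pred - y_target
    M -= lr * err * np.outer(w_frozen, k_emb)  # gradient step on M only

final_err = abs(w_frozen @ (x + M @ k_emb) - y_target)
print(final_err)
```

Swapping in a different mapping network for a different knowledge domain, while reusing the same frozen `w_frozen`, mirrors the domain-adaptation result the abstract reports.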
Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
Pre-trained language models (PLMs) have achieved remarkable results on NLP
tasks but at the expense of huge parameter sizes and the consequent
computational costs. In this paper, we propose Variator, a parameter-efficient
acceleration method that enhances computational efficiency through
plug-and-play compression plugins. Compression plugins are designed to reduce
the sequence length via compressing multiple hidden vectors into one and
trained with original PLMs frozen. Different from traditional model
acceleration methods, which compress PLMs to smaller sizes, Variator offers two
distinct advantages: (1) In real-world applications, the plug-and-play nature
of our compression plugins enables dynamic selection of different compression
plugins with varying acceleration ratios based on the current workload. (2) The
compression plugin comprises a few compact neural network layers with minimal
parameters, significantly saving storage and memory overhead, particularly in
scenarios with a growing number of tasks. We validate the effectiveness of
Variator on seven datasets. Experimental results show that Variator can save
53% computational costs using only 0.9% additional parameters with a
performance drop of less than 2%. Moreover, when the model scales to billions
of parameters, Variator matches the strong performance of uncompressed PLMs.
Comment: Accepted by Findings of EMNLP
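The mechanism behind the speedup is sequence-length reduction: a small plugin compresses every group of hidden vectors into one, so later layers process a shorter sequence. The shape arithmetic can be sketched as follows; the random projection, the ratio, and the dimensions are illustrative, not Variator's trained plugin:

```python
import numpy as np

rng = np.random.default_rng(4)
SEQ, DIM, RATIO = 12, 8, 4            # compress every 4 hidden vectors into 1

hidden = rng.normal(size=(SEQ, DIM))  # hidden states from a frozen PLM layer

# Compression plugin: a small projection over each group of RATIO vectors
# (here untrained/random, purely to show the shape arithmetic).
W_plug = rng.normal(size=(RATIO * DIM, DIM)) / np.sqrt(RATIO * DIM)

groups = hidden.reshape(SEQ // RATIO, RATIO * DIM)
compressed = groups @ W_plug          # shorter sequence for later layers

print(hidden.shape, "->", compressed.shape)  # (12, 8) -> (3, 8)
```

Because the plugin is only this projection (plus a few layers in the paper), storing one plugin per acceleration ratio is cheap, which is what enables the workload-dependent plugin selection the abstract highlights.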
Gadolinium‐Doped Iron Oxide Nanoprobe as Multifunctional Bioimaging Agent and Drug Delivery System
Peer Reviewed
http://deepblue.lib.umich.edu/bitstream/2027.42/116012/1/adfm201502868.pdf
http://deepblue.lib.umich.edu/bitstream/2027.42/116012/2/adfm201502868-sup-0001-S1.pdf
Quality Quantification and Control via Novel Self-Growing Process-Quality Model of Parts Fabricated by LPBF Process
Laser Powder Bed Fusion (LPBF) offers greater allowable design complexity and manufacturability than traditional manufacturing processes by depositing material in a layer-wise manner. However, the process variability of LPBF induces quality uncertainty and inconsistency. In particular, mechanical properties such as tensile strength are hard to predict and control in the LPBF process. Much recent research has explored the qualitative influence of one or two process parameters on tensile strength. In fact, mechanical properties are comprehensively affected by multiple correlated process parameters with unclear and complex interactions. A quantitative process-quality model of the metal LPBF process is therefore urgently needed to produce components of sufficient strength. Recent progress in artificial intelligence (AI) and machine learning (ML) provides new insight into quality prediction in terms of computational accuracy and speed. However, the quality of a predictive model built with traditional AI/ML is heavily determined by the size of the training dataset, and experimental data collection for LPBF can be expensive. This paper explores the comprehensive effect of process parameters on the tensile strength of 316L stainless-steel parts fabricated by LPBF and proposes a valid quantitative predictive model through a novel self-growing machine-learning framework. The self-growing framework can autonomously expand and classify the growing dataset to provide high-accuracy predictions with fewer input data. To verify this predictive model of tensile strength, specimens manufactured by the LPBF process with different groups of process parameters (laser power, scanning speed, and hatch spacing) are collected. The experimental results validate the predicted tensile strengths within less than 3% deviation.
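The "self-growing" idea, a dataset that expands itself where the current model is least reliable, can be illustrated with an active-learning-style loop. Everything below is a hypothetical stand-in: the linear strength function, the nearest-neighbour surrogate, and the distance-based uncertainty proxy are not the paper's framework:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy ground truth: tensile strength (MPa) as a function of (laser power,
# scan speed, hatch spacing), all normalized to [0, 1]. Purely illustrative.
def true_strength(p):
    power, speed, hatch = p
    return 500 + 80 * power - 60 * speed - 40 * hatch

pool = rng.uniform(size=(200, 3))      # candidate process-parameter settings
X = pool[:5].copy()                    # small seed dataset
y = np.array([true_strength(p) for p in X])

def predict(x, X, y):
    # 1-nearest-neighbour surrogate (stand-in for the learned model).
    return y[np.argmin(np.linalg.norm(X - x, axis=1))]

def mean_rel_error(X, y):
    return float(np.mean([abs(predict(p, X, y) - true_strength(p))
                          / true_strength(p) for p in pool]))

err_seed = mean_rel_error(X, y)

# "Self-growing" loop: add the pool point farthest from all training samples,
# using that distance as a crude uncertainty proxy, then measure it.
for _ in range(30):
    dists = np.min(np.linalg.norm(pool[:, None] - X[None], axis=2), axis=1)
    i = int(np.argmax(dists))
    X = np.vstack([X, pool[i]])
    y = np.append(y, true_strength(pool[i]))

err_grown = mean_rel_error(X, y)
print(err_seed, "->", err_grown)       # error shrinks as the dataset grows
```

The point of the sketch is only the loop structure: prediction quality improves with far fewer labeled points than uniform sampling would need, which is the data-efficiency argument the abstract makes for the self-growing framework.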