Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Many popular knowledge graphs such as Freebase, YAGO or DBPedia maintain a
list of non-discrete attributes for each entity. Intuitively, attributes such
as height, price, or population count can richly characterize entities in
knowledge graphs. This additional source of information may help to alleviate
the inherent sparsity and incompleteness problems that are prevalent
in knowledge graphs. Unfortunately, many state-of-the-art relational learning
models ignore this information due to the challenging nature of dealing with
non-discrete data types in the inherently binary-natured knowledge graphs. In
this paper, we propose a novel multi-task neural network approach for both
encoding and prediction of non-discrete attribute information in a relational
setting. Specifically, we train a neural network for triplet prediction along
with a separate network for attribute value regression. Via multi-task
learning, we are able to learn representations of entities, relations and
attributes that encode information about both tasks. Moreover, such attributes
are central to many predictive tasks not only as an information source but also
as a prediction target. Therefore, models that can encode, incorporate, and
predict such information in a relational learning context are highly
attractive. We show that our approach outperforms many state-of-the-art
methods for the tasks of relational triplet classification and attribute value
prediction.
Comment: Accepted at CIKM 201
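The abstract does not specify the architecture, so the following is only a minimal sketch of the stated idea, with hypothetical dimensions, layer choices, and loss weighting: shared entity, relation, and attribute embeddings feed one head that scores (head, relation, tail) triplets and one head that regresses an attribute value, and the two losses are optimized jointly.

```python
import torch
import torch.nn as nn

class MultiTaskKG(nn.Module):
    """Sketch of a multi-task model: shared embeddings, two task-specific heads."""

    def __init__(self, n_entities, n_relations, n_attributes, dim=100):
        super().__init__()
        # Embedding tables shared by both tasks.
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.attr = nn.Embedding(n_attributes, dim)
        # Head 1: triplet plausibility score.
        self.triplet_head = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        # Head 2: attribute value regression.
        self.attr_head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def score_triplet(self, h, r, t):
        x = torch.cat([self.ent(h), self.rel(r), self.ent(t)], dim=-1)
        return self.triplet_head(x).squeeze(-1)

    def predict_attr(self, e, a):
        x = torch.cat([self.ent(e), self.attr(a)], dim=-1)
        return self.attr_head(x).squeeze(-1)


# One joint training step on toy data: weighted sum of the two task losses.
model = MultiTaskKG(n_entities=1000, n_relations=50, n_attributes=20)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

h = torch.randint(0, 1000, (32,))
r = torch.randint(0, 50, (32,))
t = torch.randint(0, 1000, (32,))
y_trip = torch.randint(0, 2, (32,)).float()   # 1 = observed triplet, 0 = corrupted
e = torch.randint(0, 1000, (32,))
a = torch.randint(0, 20, (32,))
y_attr = torch.randn(32)                      # normalized attribute values

loss = (nn.functional.binary_cross_entropy_with_logits(model.score_triplet(h, r, t), y_trip)
        + 0.5 * nn.functional.mse_loss(model.predict_attr(e, a), y_attr))
opt.zero_grad(); loss.backward(); opt.step()
```

Because both heads back-propagate into the same embedding tables, the learned representations carry information about both tasks, which is the multi-task effect the abstract describes.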
Emergent Modularity in Pre-trained Transformers
This work examines the presence of modularity in pre-trained Transformers, a
feature commonly found in human brains and thought to be vital for general
intelligence. In analogy to human brains, we consider two main characteristics
of modularity: (1) functional specialization of neurons: we evaluate whether
each neuron is mainly specialized in a certain function, and find that the
answer is yes. (2) function-based neuron grouping: we explore finding a
structure that groups neurons into modules by function, and each module works
for its corresponding function. Given the enormous number of possible
structures, we focus on Mixture-of-Experts as a promising candidate, which
partitions neurons into experts and usually activates different experts for
different inputs. Experimental results show that there are functional experts,
in which the neurons specialized in a certain function are clustered together. Moreover,
perturbing the activations of functional experts significantly affects the
corresponding function. Finally, we study how modularity emerges during
pre-training, and find that the modular structure stabilizes at an early
stage, faster than individual neurons do. This suggests that Transformers
first construct the modular structure and then learn fine-grained neuron
functions. Our code and data are available at
https://github.com/THUNLP/modularity-analysis.
Comment: Findings of ACL 202
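The authors' analysis code lives at the linked repository; purely as an illustration of the perturbation experiment described above (toy dimensions and a hypothetical neuron-to-expert grouping, not the paper's implementation), one can zero out the activations of one candidate expert in a feed-forward layer and measure how much the layer output shifts:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, d_ff, n_experts = 16, 64, 4
expert_size = d_ff // n_experts

# A single Transformer-style feed-forward layer standing in for a pre-trained one.
ffn_in = nn.Linear(d_model, d_ff)
ffn_out = nn.Linear(d_ff, d_model)

# Hypothetical grouping: neuron i belongs to expert i // expert_size.
expert_of_neuron = torch.arange(d_ff) // expert_size

def ffn(x, perturb_expert=None):
    """Forward pass; optionally zero the activations of one expert's neurons."""
    h = torch.relu(ffn_in(x))
    if perturb_expert is not None:
        h = h * (expert_of_neuron != perturb_expert).float()
    return ffn_out(h)

x = torch.randn(8, d_model)
baseline = ffn(x)
for e in range(n_experts):
    perturbed = ffn(x, perturb_expert=e)
    # A large shift suggests the perturbed expert carried function relevant to these inputs.
    print(f"expert {e}: output shift = {(perturbed - baseline).norm().item():.3f}")
```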
Plug-and-Play Document Modules for Pre-trained Models
Large-scale pre-trained models (PTMs) have been widely used in
document-oriented NLP tasks, such as question answering. However, the
encoding-task coupling requirement results in the repeated encoding of the same
documents for different tasks and queries, which is highly computationally
inefficient. To this end, we aim to decouple document encoding from
downstream tasks, and propose to represent each document as a plug-and-play
document module, i.e., a document plugin, for PTMs (PlugD). By inserting
document plugins into the backbone PTM for downstream tasks, we can encode a
document one time to handle multiple tasks, which is more efficient than
conventional encoding-task coupling methods that simultaneously encode
documents and input queries using task-specific encoders. Extensive experiments
on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode
documents once and for all across different scenarios. In particular, PlugD can
save computational costs while achieving comparable performance to
state-of-the-art encoding-task coupling methods. Additionally, we show that
PlugD can serve as an effective post-processing approach for injecting knowledge into
task-specific models, improving model performance without any additional model
training.
Comment: Accepted by ACL 202
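The abstract does not detail the plugin format, so the sketch below only illustrates the decoupling idea with assumed shapes and an assumed injection point: a document is encoded once into a handful of plugin vectors, which are then prepended to queries from different tasks before a frozen backbone.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d = 32

# Frozen backbone standing in for the pre-trained model.
backbone = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True), num_layers=2)
for p in backbone.parameters():
    p.requires_grad = False

class DocPlugin(nn.Module):
    """Trainable plugin encoder: compresses document tokens into a few plugin vectors."""

    def __init__(self, d, n_plugin=4):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(n_plugin, d))
        self.attn = nn.MultiheadAttention(d, num_heads=4, batch_first=True)

    def forward(self, doc_tokens):
        q = self.queries.unsqueeze(0).expand(doc_tokens.size(0), -1, -1)
        plugin, _ = self.attn(q, doc_tokens, doc_tokens)
        return plugin                      # (batch, n_plugin, d)

plugin_enc = DocPlugin(d)

doc_tokens = torch.randn(1, 50, d)         # one document, encoded once
plugin = plugin_enc(doc_tokens)

# Reuse the same plugin vectors for queries from different tasks.
for task in ["qa_query", "summarization_query"]:
    query_tokens = torch.randn(1, 10, d)
    inputs = torch.cat([plugin, query_tokens], dim=1)   # prepend the document plugin
    out = backbone(inputs)
    print(task, out.shape)
```

The point of the design, as the abstract states it, is that the expensive document pass happens once, while each task only pays for its (short) query plus the cached plugin vectors.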
Plug-and-Play Knowledge Injection for Pre-trained Language Models
Injecting external knowledge can improve the performance of pre-trained
language models (PLMs) on various downstream NLP tasks. However, massive
retraining is required to deploy new knowledge injection methods or knowledge
bases for downstream tasks. In this work, we are the first to study how to
improve the flexibility and efficiency of knowledge injection by reusing
existing downstream models. To this end, we explore a new paradigm
plug-and-play knowledge injection, where knowledge bases are injected into
frozen existing downstream models by a knowledge plugin. Correspondingly, we
propose a plug-and-play injection method, map-tuning, which trains a mapping of
knowledge embeddings to enrich model inputs with the mapped embeddings while
keeping model parameters frozen. Experimental results on three knowledge-driven
NLP tasks show that existing injection methods are not suitable for the new
paradigm, while map-tuning effectively improves the performance of downstream
models. Moreover, we show that a frozen downstream model can be well adapted to
different domains with different mapping networks of domain knowledge. Our code
and models are available at https://github.com/THUNLP/Knowledge-Plugin.
Comment: ACL 202
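The reference implementation is in the linked repository; the following is only a minimal sketch of the map-tuning idea as described above, with hypothetical dimensions: a mapping network from knowledge-embedding space to the model's input-embedding space is the sole trainable component, while the downstream model and the knowledge embeddings stay frozen.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
d_kg, d_model, vocab = 64, 32, 1000

# Frozen downstream model: token embeddings + a small encoder + classifier.
tok_emb = nn.Embedding(vocab, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True), num_layers=2)
classifier = nn.Linear(d_model, 3)
for m in (tok_emb, encoder, classifier):
    for p in m.parameters():
        p.requires_grad = False

# Frozen pre-trained knowledge embeddings (one vector per knowledge-base entity).
kg_emb = nn.Embedding(500, d_kg)
kg_emb.weight.requires_grad = False

# The only trainable part: a mapping from KG space to the input-embedding space.
mapping = nn.Linear(d_kg, d_model)
opt = torch.optim.Adam(mapping.parameters(), lr=1e-3)

tokens = torch.randint(0, vocab, (4, 20))    # a batch of token ids
entities = torch.randint(0, 500, (4, 2))     # KB entities linked to each input
labels = torch.randint(0, 3, (4,))

# Enrich the input: append mapped knowledge embeddings to the token embeddings.
x = torch.cat([tok_emb(tokens), mapping(kg_emb(entities))], dim=1)
logits = classifier(encoder(x).mean(dim=1))
loss = nn.functional.cross_entropy(logits, labels)
opt.zero_grad(); loss.backward(); opt.step() # only the mapping receives gradients
```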
Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
Pre-trained language models (PLMs) have achieved remarkable results on NLP
tasks but at the expense of huge parameter sizes and the consequent
computational costs. In this paper, we propose Variator, a parameter-efficient
acceleration method that enhances computational efficiency through
plug-and-play compression plugins. Compression plugins are designed to reduce
the sequence length via compressing multiple hidden vectors into one and
trained with the original PLMs frozen. Unlike traditional model
acceleration methods, which compress PLMs to smaller sizes, Variator offers two
distinct advantages: (1) In real-world applications, the plug-and-play nature
of our compression plugins enables dynamic selection of different compression
plugins with varying acceleration ratios based on the current workload. (2) The
compression plugin comprises a few compact neural network layers with minimal
parameters, significantly saving storage and memory overhead, particularly in
scenarios with a growing number of tasks. We validate the effectiveness of
Variator on seven datasets. Experimental results show that Variator can save
53% of computational costs using only 0.9% additional parameters, with a
performance drop of less than 2%. Moreover, when the model scales to billions
of parameters, Variator matches the strong performance of uncompressed PLMs.
Comment: Accepted by Findings of EMNL
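As a rough sketch of the compression-plugin idea (the group size, pooling layer, and placement are assumptions, not Variator's actual design), a plugin can merge every few consecutive hidden vectors into one, shortening the sequence that the subsequent frozen layers must process:

```python
import torch
import torch.nn as nn

class CompressionPlugin(nn.Module):
    """Compress every `group` consecutive hidden vectors into a single vector."""

    def __init__(self, d_model, group=4):
        super().__init__()
        self.group = group
        self.proj = nn.Sequential(
            nn.Linear(group * d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model))

    def forward(self, hidden):                       # hidden: (batch, seq, d)
        b, s, d = hidden.shape
        pad = (-s) % self.group                      # pad so seq is divisible by group
        if pad:
            hidden = torch.cat([hidden, hidden.new_zeros(b, pad, d)], dim=1)
        groups = hidden.view(b, -1, self.group * d)  # (batch, seq/group, group*d)
        return self.proj(groups)                     # (batch, seq/group, d)

hidden = torch.randn(2, 128, 64)                 # hidden states from a frozen PLM layer
plugin = CompressionPlugin(d_model=64, group=4)  # only the plugin would be trained
short = plugin(hidden)
print(hidden.shape, "->", short.shape)           # (2, 128, 64) -> (2, 32, 64)
```

Since the plugin is a few small layers bolted onto a frozen backbone, several plugins with different group sizes can be stored cheaply and swapped in at inference time depending on the workload, which is the dynamic-selection advantage the abstract highlights.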
Brief research report: in-depth immunophenotyping reveals stability of CD19 CAR T-cells over time
Variability or stability might have an impact on treatment success and toxicity of CD19 CAR T-cells. We conducted a prospective observational study of 12 patients treated with Tisagenlecleucel for CD19+ B-cell malignancies. Using a 31-color spectral flow cytometry panel, we analyzed differentiation stages and exhaustion markers of CAR T-cell subsets prior to CAR T-cell infusion and longitudinally during 6 months of follow-up. The majority of activation markers on CAR T-cells showed stable expression patterns over time and were not associated with response to therapy or toxicity. Unsupervised cluster analysis revealed an immune signature of CAR T-cell products associated with the development of immune effector cell-associated neurotoxicity syndrome. Pending validation in an independent patient cohort, in-depth phenotyping of CAR T-cell products, together with longitudinal monitoring after cell transfer, might become a valuable tool to increase the efficacy and safety of CAR T-cell therapy.
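Purely as an illustration of the unsupervised cluster analysis mentioned above (synthetic data, an arbitrary cluster count, not the study's actual pipeline), clustering a cells-by-markers expression matrix and summarizing per-cluster marker profiles could be sketched as:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic stand-in for spectral flow data: 5000 cells x 31 markers (transformed intensities).
n_cells, n_markers = 5000, 31
expression = rng.normal(size=(n_cells, n_markers))

# Unsupervised clustering of cells into candidate phenotypic clusters.
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(expression)

# Per-cluster mean marker expression, the kind of profile an "immune signature" is built from.
for c in range(8):
    members = expression[km.labels_ == c]
    print(f"cluster {c}: size={len(members)}, highest-mean marker index={members.mean(axis=0).argmax()}")
```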
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
A core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst offers a unique opportunity to realize
multi-messenger observation of CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system is designed with both the prompt monitors on the
electronic board and online monitors at the data acquisition stage, in order to
ensure both the alert speed and alert coverage of progenitor stars. By assuming
a false alert rate of 1 per year, this monitoring system is sensitive to
pre-SN neutrinos out to a distance of about 1.6 (0.9) kpc and to SN neutrinos
out to about 370 (360) kpc for a progenitor mass of 30 solar masses in the case
of normal (inverted) mass ordering. The pointing ability of the CCSN is
evaluated by using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos, which, along with the early alert,
can play an important role in follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN.
Comment: 24 pages, 9 figures
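The quoted false alert rate of 1 per year implies a counting threshold; as a hedged illustration (the background rate and window length below are assumptions rather than JUNO's values, and overlapping windows are treated as independent), one can solve for the smallest event count in a time window whose Poisson tail probability keeps the expected number of false alerts below one per year:

```python
from scipy.stats import poisson

# Assumed numbers, for illustration only (not JUNO's actual background or window).
background_rate = 1.0      # expected background events per second in the monitor
window = 10.0              # sliding-window length in seconds
windows_per_year = 365.25 * 24 * 3600 / window   # crude independent-window approximation

mu = background_rate * window          # expected background counts per window

# Smallest threshold N such that P(counts >= N | background) * windows/year <= 1.
n = 0
while poisson.sf(n - 1, mu) * windows_per_year > 1.0:
    n += 1
print(f"alert threshold: >= {n} events in a {window:.0f} s window "
      f"(expected false alerts <= 1 per year for mu = {mu:.1f})")
```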