48 research outputs found
Multi-task Neural Network for Non-discrete Attribute Prediction in Knowledge Graphs
Many popular knowledge graphs, such as Freebase, YAGO, or DBpedia, maintain a
list of non-discrete attributes for each entity. Intuitively, these attributes
such as height, price or population count are able to richly characterize
entities in knowledge graphs. This additional source of information may help to
alleviate the inherent sparsity and incompleteness problems that are prevalent
in knowledge graphs. Unfortunately, many state-of-the-art relational learning
models ignore this information due to the challenging nature of dealing with
non-discrete data types in inherently binary-natured knowledge graphs. In
this paper, we propose a novel multi-task neural network approach for both
encoding and prediction of non-discrete attribute information in a relational
setting. Specifically, we train a neural network for triplet prediction along
with a separate network for attribute value regression. Via multi-task
learning, we are able to learn representations of entities, relations and
attributes that encode information about both tasks. Moreover, such attributes
are central to many predictive tasks not only as an information source but also
as a prediction target. Therefore, models that are able to encode, incorporate
and predict such information in a relational learning context are highly
attractive as well. We show that our approach outperforms many state-of-the-art
methods for the tasks of relational triplet classification and attribute value
prediction.
Comment: Accepted at CIKM 201
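A minimal PyTorch sketch of the multi-task idea in the abstract above: shared entity, relation, and attribute embeddings feed both a triplet-classification head and an attribute-value regression head, so both tasks update the shared representations. Layer sizes, scoring functions, and the loss weighting are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class MultiTaskKGModel(nn.Module):
    def __init__(self, n_entities, n_relations, n_attributes, dim=100):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.attr = nn.Embedding(n_attributes, dim)
        # Head 1: triplet plausibility score.
        self.triple_head = nn.Sequential(
            nn.Linear(3 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))
        # Head 2: regression of a non-discrete attribute value (e.g. height, price).
        self.value_head = nn.Sequential(
            nn.Linear(2 * dim, dim), nn.ReLU(), nn.Linear(dim, 1))

    def score_triple(self, h, r, t):
        x = torch.cat([self.ent(h), self.rel(r), self.ent(t)], dim=-1)
        return self.triple_head(x).squeeze(-1)

    def predict_value(self, e, a):
        x = torch.cat([self.ent(e), self.attr(a)], dim=-1)
        return self.value_head(x).squeeze(-1)

def multitask_loss(model, triples, triple_labels, attr_pairs, attr_values, alpha=0.5):
    # Joint objective: both tasks back-propagate into the shared embeddings.
    h, r, t = triples.unbind(-1)
    cls = nn.functional.binary_cross_entropy_with_logits(
        model.score_triple(h, r, t), triple_labels.float())
    e, a = attr_pairs.unbind(-1)
    reg = nn.functional.mse_loss(model.predict_value(e, a), attr_values)
    return cls + alpha * reg
```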
Boosting Inference Efficiency: Unleashing the Power of Parameter-Shared Pre-trained Language Models
Parameter-shared pre-trained language models (PLMs) have emerged as a
successful approach in resource-constrained environments, enabling substantial
reductions in model storage and memory costs without significant performance
compromise. However, it is important to note that parameter sharing does not
alleviate computational burdens associated with inference, thus impeding its
practicality in situations characterized by stringent latency requirements or
limited computational resources. Building upon neural ordinary
differential equations (ODEs), we introduce a straightforward technique to
enhance the inference efficiency of parameter-shared PLMs. Additionally, we
propose a simple pre-training technique that leads to fully or partially shared
models capable of achieving even greater inference acceleration. The
experimental results demonstrate the effectiveness of our methods on both
autoregressive and autoencoding PLMs, providing novel insights into more
efficient utilization of parameter-shared models in resource-constrained
settings.
Comment: EMNLP 2023 Findings
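A hedged sketch of the neural-ODE reading of a parameter-shared encoder: repeatedly applying one shared block can be interpreted as Euler steps of a continuous-depth dynamic, so inference can trade accuracy for speed by taking fewer, larger solver steps. This only illustrates the general idea; the paper's actual formulation, solver, and pre-training technique may differ.

```python
import torch
import torch.nn as nn

class SharedODEEncoder(nn.Module):
    def __init__(self, dim=256, n_heads=4):
        super().__init__()
        # One block whose parameters are shared across "depth" (the time axis).
        self.block = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)

    def forward(self, h, total_time=1.0, n_steps=12):
        # Euler integration of h'(t) = block(h) - h over [0, total_time].
        dt = total_time / n_steps
        for _ in range(n_steps):
            h = h + dt * (self.block(h) - h)
        return h

encoder = SharedODEEncoder()
x = torch.randn(2, 16, 256)      # (batch, seq_len, hidden)
slow = encoder(x, n_steps=12)    # finer discretization
fast = encoder(x, n_steps=4)     # fewer solver steps, faster inference
```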
Interaction Embeddings for Prediction and Explanation in Knowledge Graphs
Knowledge graph embedding aims to learn distributed representations for
entities and relations, and is proven to be effective in many applications.
Crossover interactions --- bi-directional effects between entities and
relations --- help select related information when predicting a new triple, but
have not been formally discussed before. In this paper, we propose CrossE, a
novel knowledge graph embedding method that explicitly simulates crossover
interactions. It not only learns one general embedding for each entity and
relation, as most previous methods do, but also generates multiple triple-specific
embeddings for both of them, named interaction embeddings. We evaluate these
embeddings on typical link prediction tasks and find that CrossE achieves
state-of-the-art results on complex and more challenging datasets. Furthermore,
we evaluate embeddings from a new perspective --- giving explanations for
predicted triples, which is important for real applications. In this work, an
explanation for a triple is regarded as a reliable closed path between the head
and the tail entity. Compared to other baselines, we show experimentally that
CrossE, benefiting from interaction embeddings, is more capable of generating
reliable explanations to support its predictions.
Comment: This paper is accepted by WSDM 201
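A minimal sketch of the crossover-interaction idea: besides general embeddings, each relation carries an interaction vector that modulates the head entity, producing triple-specific ("interaction") embeddings used for scoring. This follows the CrossE scoring scheme in spirit only; exact details such as biases, the loss, and negative sampling may differ from the paper.

```python
import torch
import torch.nn as nn

class CrossESketch(nn.Module):
    def __init__(self, n_entities, n_relations, dim=100):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)
        self.c = nn.Embedding(n_relations, dim)   # per-relation interaction vector
        self.b = nn.Parameter(torch.zeros(dim))   # global bias

    def score(self, h, r, t):
        h_e, r_e, t_e = self.ent(h), self.rel(r), self.ent(t)
        c_r = self.c(r)
        h_i = c_r * h_e            # interaction embedding of the head
        r_i = h_i * r_e            # interaction embedding of the relation
        q = torch.tanh(h_i + r_i + self.b)
        return torch.sigmoid((q * t_e).sum(-1))   # plausibility of (h, r, t)
```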
Emergent Modularity in Pre-trained Transformers
This work examines the presence of modularity in pre-trained Transformers, a
feature commonly found in human brains and thought to be vital for general
intelligence. In analogy to human brains, we consider two main characteristics
of modularity: (1) functional specialization of neurons: we evaluate whether
each neuron is mainly specialized in a certain function, and find that the
answer is yes. (2) function-based neuron grouping: we explore finding a
structure that groups neurons into modules by function, and each module works
for its corresponding function. Given the enormous amount of possible
structures, we focus on Mixture-of-Experts as a promising candidate, which
partitions neurons into experts and usually activates different experts for
different inputs. Experimental results show that there are functional experts,
in which neurons specialized in a certain function are clustered together. Moreover,
perturbing the activations of functional experts significantly affects the
corresponding function. Finally, we study how modularity emerges during
pre-training, and find that the modular structure is stabilized at the early
stage, earlier than the stabilization of individual neurons. This suggests that Transformers
first construct the modular structure and then learn fine-grained neuron
functions. Our code and data are available at
https://github.com/THUNLP/modularity-analysis.
Comment: Findings of ACL 202
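A hedged sketch of the kind of analysis described above: group FFN neurons into "experts" by clustering their activation profiles across inputs, then perturb one group's activations and observe the effect on the corresponding function. The clustering criterion, the zeroing perturbation, and the module path in the usage comment are illustrative assumptions, not the paper's exact procedure.

```python
import torch
from sklearn.cluster import KMeans

def group_neurons_by_function(activations, n_experts=8):
    # activations: torch tensor (n_inputs, n_neurons) of mean FFN activations.
    # Each neuron is described by how it responds across the probe inputs.
    profiles = activations.T.numpy()              # (n_neurons, n_inputs)
    labels = KMeans(n_clusters=n_experts, n_init=10).fit_predict(profiles)
    return labels                                 # expert id for every neuron

def perturb_expert(ffn_layer, neuron_labels, expert_id):
    # Zero the activations of one neuron group via a forward hook.
    mask = torch.tensor(neuron_labels != expert_id, dtype=torch.float32)
    def hook(module, inputs, output):
        return output * mask.to(output.device)
    return ffn_layer.register_forward_hook(hook)

# Usage (module path is illustrative for a BERT-style model):
# handle = perturb_expert(model.encoder.layer[3].intermediate, labels, expert_id=2)
# ... evaluate the function of interest with the expert silenced ...
# handle.remove()
```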
Variator: Accelerating Pre-trained Models with Plug-and-Play Compression Modules
Pre-trained language models (PLMs) have achieved remarkable results on NLP
tasks but at the expense of huge parameter sizes and the consequent
computational costs. In this paper, we propose Variator, a parameter-efficient
acceleration method that enhances computational efficiency through
plug-and-play compression plugins. Compression plugins are designed to reduce
the sequence length via compressing multiple hidden vectors into one and
trained with the original PLM kept frozen. Unlike traditional model
acceleration methods, which compress PLMs to smaller sizes, Variator offers two
distinct advantages: (1) In real-world applications, the plug-and-play nature
of our compression plugins enables dynamic selection of different compression
plugins with varying acceleration ratios based on the current workload. (2) The
compression plugin comprises a few compact neural network layers with minimal
parameters, significantly saving storage and memory overhead, particularly in
scenarios with a growing number of tasks. We validate the effectiveness of
Variator on seven datasets. Experimental results show that Variator can save
53% of computational costs using only 0.9% additional parameters, with a
performance drop of less than 2%. Moreover, when the model scales to billions
of parameters, Variator matches the strong performance of uncompressed PLMs.
Comment: Accepted by Findings of EMNLP
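A hedged sketch of a plug-and-play compression plugin in the spirit of Variator: a small module that merges every k consecutive hidden states into one, shortening the sequence the frozen upper layers must process. The pooling scheme, layer sizes, and placement are assumptions for illustration; swapping plugins with different ratios is how a deployment could adapt the acceleration ratio to the current workload.

```python
import torch
import torch.nn as nn

class CompressionPlugin(nn.Module):
    def __init__(self, hidden=768, ratio=2):
        super().__init__()
        self.ratio = ratio
        # A few compact layers: far fewer parameters than a full PLM layer.
        self.merge = nn.Sequential(
            nn.Linear(ratio * hidden, hidden), nn.GELU(), nn.Linear(hidden, hidden))

    def forward(self, h):                  # h: (batch, seq_len, hidden)
        b, s, d = h.shape
        pad = (-s) % self.ratio
        if pad:                            # pad so seq_len divides by the ratio
            h = torch.cat([h, h.new_zeros(b, pad, d)], dim=1)
        h = h.view(b, -1, self.ratio * d)  # group k neighbouring hidden vectors
        return self.merge(h)               # (batch, ceil(seq_len / ratio), hidden)

plugin = CompressionPlugin(ratio=2)
hidden_states = torch.randn(4, 128, 768)
compressed = plugin(hidden_states)         # (4, 64, 768): upper layers see half the tokens
```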
MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation
Understanding events in texts is a core objective of natural language
understanding, which requires detecting event occurrences, extracting event
arguments, and analyzing inter-event relationships. However, due to the
annotation challenges brought by task complexity, a large-scale dataset
covering the full process of event understanding has long been absent. In this
paper, we introduce MAVEN-Arg, which augments the MAVEN dataset with event
argument annotations, making it the first all-in-one dataset supporting event
detection, event argument extraction (EAE), and event relation extraction. As
an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive
schema covering 162 event types and 612 argument roles, all with expert-written
definitions and examples; (2) a large data scale, containing 98,591 events and
290,613 arguments obtained with laborious human annotation; (3) the exhaustive
annotation supporting all task variants of EAE, covering both entity and
non-entity event arguments at the document level. Experiments indicate that
MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary
large language models (LLMs). Furthermore, to demonstrate the benefits of an
all-in-one dataset, we preliminarily explore a potential application, future
event prediction, with LLMs. MAVEN-Arg and our code can be obtained from
https://github.com/THU-KEG/MAVEN-Argument.
Comment: Working in progress
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
A core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst provides a unique opportunity to realize
multi-messenger observation of CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system is designed with both the prompt monitors on the
electronic board and online monitors at the data acquisition stage, in order to
ensure both the alert speed and alert coverage of progenitor stars. By assuming
a false alert rate of 1 per year, this monitoring system can be sensitive to
pre-SN neutrinos up to a distance of about 1.6 (0.9) kpc and SN neutrinos
up to about 370 (360) kpc for a 30 solar-mass progenitor, for the case
of normal (inverted) mass ordering. The pointing ability for the CCSN is
evaluated by using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos, which, along with the early alert,
can play an important role in the follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN.
Comment: 24 pages, 9 figures
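A hedged numpy sketch of pointing from accumulated inverse-beta-decay (IBD) anisotropy: in IBD the delayed neutron-capture vertex is, on average, slightly displaced from the prompt positron vertex along the incoming neutrino flight direction, so averaging many unit displacement vectors yields a direction estimate (the source lies opposite to the flight direction). Vertex resolution, backgrounds, and the statistical treatment actually used by JUNO are not modelled here.

```python
import numpy as np

def estimate_sn_direction(prompt_vertices, delayed_vertices):
    """prompt_vertices, delayed_vertices: (n_events, 3) reconstructed positions."""
    d = delayed_vertices - prompt_vertices
    norms = np.linalg.norm(d, axis=1, keepdims=True)
    unit = d / np.clip(norms, 1e-9, None)       # per-event unit displacement
    mean_vec = unit.mean(axis=0)                # accumulated event anisotropy
    flight_dir = mean_vec / np.linalg.norm(mean_vec)
    return -flight_dir                          # unit vector toward the source
```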