Reliable Natural Language Understanding with Large Language Models and Answer Set Programming
Humans understand language by extracting information (meaning) from
sentences, combining it with existing commonsense knowledge, and then
performing reasoning to draw conclusions. While large language models (LLMs)
such as GPT-3 and ChatGPT are able to leverage patterns in the text to solve a
variety of NLP tasks, they fall short in problems that require reasoning. They
also cannot reliably explain the answers generated for a given question. In
order to emulate humans better, we propose STAR, a framework that combines LLMs
with Answer Set Programming (ASP). We show how LLMs can be used to effectively
extract knowledge -- represented as predicates -- from language. Goal-directed
ASP is then employed to reliably reason over this knowledge. We apply the STAR
framework to three different NLU tasks requiring reasoning: qualitative
reasoning, mathematical reasoning, and goal-directed conversation. Our
experiments show that STAR bridges the reasoning gap in NLU
tasks, leading to significant performance improvements, especially for smaller
LLMs, i.e., LLMs with fewer parameters. NLU applications
developed using the STAR framework are also explainable: along with the
predicates generated, a justification in the form of a proof tree can be
produced for a given output. Comment: In Proceedings ICLP 2023, arXiv:2308.1489
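To make the extract-then-reason idea concrete, here is a minimal Python sketch. It assumes the LLM has already produced ground predicates. The facts, the rules, and the helper names `prove` and `show` are hypothetical. The sketch handles only ground Horn clauses, using top-down (goal-directed) search. It does not implement ASP, s(CASP), or negation as failure. It does, however, return the kind of proof tree the abstract describes as a justification.

```python
from dataclasses import dataclass

# Hypothetical predicates an LLM might extract, written as ground atoms (strings).
FACTS = {"bigger(earth, moon)", "orbits(moon, earth)"}

# Hypothetical rules: head -> list of alternative bodies (each body is a conjunction).
RULES = {
    "has_stronger_gravity(earth, moon)": [["bigger(earth, moon)"]],
    "satellite(moon)": [["orbits(moon, earth)", "has_stronger_gravity(earth, moon)"]],
}


@dataclass
class Proof:
    goal: str
    children: list  # sub-proofs supporting this goal (empty for facts)


def prove(goal, depth=0, max_depth=20):
    """Goal-directed search: start from the query and work back to facts."""
    if depth > max_depth:  # crude guard against cyclic rules
        return None
    if goal in FACTS:
        return Proof(goal, [])
    for body in RULES.get(goal, []):
        subproofs = []
        for subgoal in body:
            p = prove(subgoal, depth + 1, max_depth)
            if p is None:
                break  # this body fails; try the next alternative
            subproofs.append(p)
        else:
            return Proof(goal, subproofs)  # every subgoal in the body was proved
    return None


def show(proof, indent=0):
    """Render a proof tree as indented lines (the 'justification')."""
    lines = ["  " * indent + proof.goal]
    for child in proof.children:
        lines.extend(show(child, indent + 1))
    return lines
```

For instance, `print("\n".join(show(prove("satellite(moon)"))))` prints the query followed by the indented rules and facts that support it. The depth bound is only a stand-in: goal-directed ASP systems such as s(CASP) are designed to handle cycles and negation, which this sketch does not.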
Emergent Modularity in Pre-trained Transformers
This work examines the presence of modularity in pre-trained Transformers, a
feature commonly found in human brains and thought to be vital for general
intelligence. In analogy to human brains, we consider two main characteristics
of modularity: (1) functional specialization of neurons: we evaluate whether
each neuron is mainly specialized in a certain function, and find that the
answer is yes. (2) function-based neuron grouping: we explore finding a
structure that groups neurons into modules by function, and each module works
for its corresponding function. Given the enormous number of possible
structures, we focus on Mixture-of-Experts as a promising candidate, which
partitions neurons into experts and usually activates different experts for
different inputs. Experimental results show that there are functional experts,
in which neurons specialized in a certain function are clustered. Moreover,
perturbing the activations of functional experts significantly affects the
corresponding function. Finally, we study how modularity emerges during
pre-training, and find that the modular structure is stabilized at the early
stage, earlier than individual neurons stabilize. This suggests that Transformers
first construct the modular structure and then learn fine-grained neuron
functions. Our code and data are available at
https://github.com/THUNLP/modularity-analysis. Comment: Findings of ACL 202
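The following toy sketch illustrates function-based neuron grouping. It assigns each neuron to the function with the highest mean activation. It keeps a neuron only if that function's share of the neuron's (non-negative) activation meets a threshold. Several pieces are assumptions made for illustration: the function labels, the activation values, the "share" measure, and the 0.7 threshold. This is not the specialization metric or the Mixture-of-Experts partitioning used in the paper.

```python
from collections import defaultdict


def preferred_function(acts):
    """acts: dict function_name -> list of mean activations, one per neuron.

    Returns (best_function, share) per neuron, where share is the best
    function's fraction of the neuron's clipped (non-negative) activation mass.
    """
    names = list(acts)
    n_neurons = len(acts[names[0]])
    prefs = []
    for i in range(n_neurons):
        scores = {f: acts[f][i] for f in names}
        best = max(scores, key=scores.get)
        total = sum(max(s, 0.0) for s in scores.values())
        share = max(scores[best], 0.0) / total if total > 0 else 0.0
        prefs.append((best, share))
    return prefs


def group_into_experts(prefs, threshold=0.7):
    """Group specialized neurons (share >= threshold) by preferred function."""
    experts = defaultdict(list)
    for idx, (func, share) in enumerate(prefs):
        if share >= threshold:
            experts[func].append(idx)
    return dict(experts)
```

With only two functions, the best share is always at least 0.5. The threshold therefore has to sit above that value to filter out neurons that respond about equally to both functions.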
Plug-and-Play Knowledge Injection for Pre-trained Language Models
Injecting external knowledge can improve the performance of pre-trained
language models (PLMs) on various downstream NLP tasks. However, massive
retraining is required to deploy new knowledge injection methods or knowledge
bases for downstream tasks. In this work, we are the first to study how to
improve the flexibility and efficiency of knowledge injection by reusing
existing downstream models. To this end, we explore a new paradigm,
plug-and-play knowledge injection, in which knowledge bases are injected into
frozen existing downstream models by a knowledge plugin. Correspondingly, we
propose a plug-and-play injection method, map-tuning, which trains a mapping of
knowledge embeddings to enrich model inputs with mapped embeddings while
keeping model parameters frozen. Experimental results on three knowledge-driven
NLP tasks show that existing injection methods are not suitable for the new
paradigm, while map-tuning effectively improves the performance of downstream
models. Moreover, we show that a frozen downstream model can be well adapted to
different domains with different mapping networks of domain knowledge. Our code
and models are available at https://github.com/THUNLP/Knowledge-Plugin. Comment: ACL 202
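The core constraint of map-tuning is that only the mapping is trained while the downstream model stays frozen. That constraint can be sketched in plain Python. Everything concrete here is an assumption:

- The frozen model is a fixed linear scorer `w`.
- The mapped knowledge embedding is added to the token embedding, which is not necessarily how the paper injects it.
- Training minimizes squared error with per-example gradient descent.

Only the matrix `M` receives updates. The scorer `w` is read but never written.

```python
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def frozen_model(w, x):
    """Stand-in for a frozen downstream model: a fixed linear scorer."""
    return sum(wi * xi for wi, xi in zip(w, x))


def map_tune(w, examples, d_model, d_kg, lr=0.05, epochs=200, seed=0):
    """Learn a mapping M (d_model x d_kg) from knowledge-embedding space to
    input-embedding space. The frozen model's weights w are never updated.

    examples: list of (token_emb, kg_emb, target) triples (hypothetical data).
    """
    rng = random.Random(seed)
    M = [[rng.uniform(-0.1, 0.1) for _ in range(d_kg)] for _ in range(d_model)]
    for _ in range(epochs):
        for token_emb, kg_emb, target in examples:
            # Enrich the input with the mapped knowledge embedding (additive, by assumption).
            x = [t + m for t, m in zip(token_emb, matvec(M, kg_emb))]
            err = frozen_model(w, x) - target
            # d(0.5 * err**2) / dM[i][j] = err * w[i] * kg_emb[j]; only M is changed.
            for i in range(d_model):
                for j in range(d_kg):
                    M[i][j] -= lr * err * w[i] * kg_emb[j]
    return M
```

Swapping in a different `M` for a different knowledge base, with `w` untouched, mirrors the abstract's point. A single frozen model can be adapted to several domains through separate mapping networks.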
MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation
Understanding events in texts is a core objective of natural language
understanding, which requires detecting event occurrences, extracting event
arguments, and analyzing inter-event relationships. However, due to the
annotation challenges brought by task complexity, a large-scale dataset
covering the full process of event understanding has long been absent. In this
paper, we introduce MAVEN-Arg, which augments MAVEN datasets with event
argument annotations, making it the first all-in-one dataset supporting event
detection, event argument extraction (EAE), and event relation extraction. As
an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive
schema covering 162 event types and 612 argument roles, all with expert-written
definitions and examples; (2) a large data scale, containing 98,591 events and
290,613 arguments obtained with laborious human annotation; (3) the exhaustive
annotation supporting all task variants of EAE, which annotates both entity and
non-entity event arguments at the document level. Experiments indicate that
MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary
large language models (LLMs). Furthermore, to demonstrate the benefits of an
all-in-one dataset, we conduct a preliminary exploration of a potential
application, future event prediction, with LLMs. MAVEN-Arg and our code can be
obtained from https://github.com/THU-KEG/MAVEN-Argument. Comment: Work in progress
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
A core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst offers a unique opportunity for
multi-messenger observation of CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system combines prompt monitors on the electronic
boards with online monitors at the data acquisition stage, in order to
ensure both the alert speed and the alert coverage of progenitor stars. Assuming
a false alert rate of 1 per year, this monitoring system can be sensitive to
pre-SN neutrinos up to a distance of about 1.6 (0.9) kpc and to SN neutrinos
up to about 370 (360) kpc for a progenitor mass of 30 M☉ in the case
of normal (inverted) mass ordering. The ability to point to the CCSN is
evaluated using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos. Together with the early alert, this
can play an important role in follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN. Comment: 24 pages, 9 figures
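The abstract fixes the false alert rate at 1 per year. A Poisson calculation is one simple way to see how such a rate translates into a count threshold. Let the background rate be r and the window length be T. The expected background count per window is then λ = rT. The threshold is the smallest n for which P(N ≥ n), multiplied by the number of windows per year, is at most the allowed rate.

This sketch rests on three assumptions:

- The background rate is constant.
- The windows are fixed and non-overlapping. Overlapping sliding windows are correlated, so a real trigger needs a more careful trial count.
- The background rate used in the example is hypothetical.

It is not JUNO's actual trigger logic.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def poisson_sf(n, lam):
    """P(N >= n) for N ~ Poisson(lam), computed as 1 - P(N <= n - 1)."""
    term = math.exp(-lam)  # P(N = 0)
    cdf = 0.0
    for k in range(n):
        cdf += term            # accumulate P(N = k)
        term *= lam / (k + 1)  # recurrence: P(N = k + 1) from P(N = k)
    return max(0.0, 1.0 - cdf)


def alert_threshold(bkg_rate_hz, window_s, max_false_per_year=1.0):
    """Smallest count n whose expected false alerts per year <= the allowed rate,
    assuming independent, non-overlapping windows of length window_s."""
    lam = bkg_rate_hz * window_s
    if lam > 700:
        raise ValueError("exp(-lam) underflows; this simple sketch needs moderate lam")
    windows_per_year = SECONDS_PER_YEAR / window_s
    n = 0
    while poisson_sf(n, lam) * windows_per_year > max_false_per_year:
        n += 1
    return n
```

For a hypothetical background of 0.01 Hz in 1 s windows, `alert_threshold(0.01, 1.0)` returns 4. Tightening the allowed rate to 1e-3 per year raises the threshold to 5.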
Building Intelligent Systems by Combining Machine Learning and Automated Commonsense Reasoning
We present an approach to building systems that emulate human-like intelligence. Our approach uses machine learning technology (including generative AI systems) to extract knowledge from pictures, text, etc., and represents it as (pre-defined) predicates. Next, we use the s(CASP) automated commonsense reasoning system to check the consistency of this extracted knowledge and reason over it in a manner very similar to how a human would. We have used our approach to build systems for visual question answering, task-specific chatbots that can "understand" human dialogs and interactively talk with users, and autonomous driving systems that rely on commonsense reasoning. Essentially, our approach emulates how humans process knowledge: they use sensing and pattern recognition to gain knowledge (Kahneman's System 1 thinking, akin to using a machine learning model), and then use reasoning to draw conclusions, generate responses, or take actions (Kahneman's System 2 thinking, akin to automated reasoning).
A fine-grained and noise-aware method for neural relation extraction
Distant supervision is an efficient way to generate large-scale training data for relation extraction without human effort. However, it has a drawback: the automatically annotated training labels are noisy. The problems can be summarized as the multi-instance multi-label problem and the coarse-grained (bag-level) supervision signal. To address these problems, we propose two reasonable assumptions and use reinforcement learning to capture the expressive sentence for each relation mentioned in a bag. More specifically, we extend the original expressed-at-least-once assumption to the multi-label level and introduce a novel express-at-most-one assumption. In addition, we design a fine-grained reward function. We also model the sentence selection process as an auction in which the different relations in a bag compete for possession of a specific sentence based on its expressiveness. In this way, our model adapts dynamically and eventually achieves an accurate one-to-one mapping from each relation label to its chosen expressive sentence, which serves as a training instance for the extractor. Experimental results on a public dataset demonstrate that our model consistently and substantially outperforms current state-of-the-art methods for relation extraction.
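The auction rests on a one-to-one constraint. Each relation in a bag claims one sentence, and each sentence expresses at most one relation. A greedy assignment over expressiveness scores can illustrate this constraint. The sketch swaps in simple greedy matching for the paper's reinforcement-learning auction. The scores and relation names are hypothetical.

```python
def assign_sentences(scores):
    """scores: dict relation -> list of expressiveness scores, one per sentence
    in the bag. Returns dict relation -> sentence index such that every
    relation gets one sentence and no sentence is shared (greedy by score)."""
    n_sent = len(next(iter(scores.values())))
    if len(scores) > n_sent:
        raise ValueError("one-to-one mapping needs at least as many sentences as relations")
    # Every (relation, sentence) pair is a "bid"; the highest bids are served first.
    bids = sorted(
        ((score, rel, s) for rel, row in scores.items() for s, score in enumerate(row)),
        reverse=True,
    )
    chosen, taken = {}, set()
    for score, rel, s in bids:
        if rel not in chosen and s not in taken:  # relation unassigned, sentence free
            chosen[rel] = s
            taken.add(s)
    return chosen
```

Consider the scores `{"founder_of": [0.9, 0.7, 0.1], "ceo_of": [0.8, 0.75, 0.2]}`. Both relations prefer sentence 0, and `founder_of` wins it with the higher bid, so `ceo_of` falls back to sentence 1. Greedy matching is not guaranteed to maximize the total score. An exact assignment method such as the Hungarian algorithm would be needed for that.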
Inter- and trans-generational impacts of real-world PM2.5 exposure on male-specific primary hypogonadism
Exposure to PM2.5, a harmful type of air pollution, has been associated with compromised male reproductive health; however, it remains unclear whether such exposure can elicit transgenerational effects on male fertility. Here, we examine the effect of paternal exposure to real-world PM2.5 on the reproductive health of male offspring. We observed that paternal exposure to real-world PM2.5 can lead to transgenerational primary hypogonadism in a sex-selective manner, and we confirmed this phenotype in an external model. Mechanistically, we identified small RNAs (sRNAs) that play a critical role in mediating these transgenerational effects. Specifically, miR6240 and piR016061, which are present in F0 PM sperm, regulate intergenerational transmission by targeting Lhcgr and Nsd1, respectively. We also found that piR033435 and piR006695 indirectly regulate F1 PM sperm methylation by binding to the 3′-untranslated region of Tet1 mRNA. The reduced expression of Tet1 resulted in hypermethylation of several testosterone synthesis genes, including Lhcgr and Gnas. This hypermethylation impaired Leydig cell function and ultimately led to transgenerational primary hypogonadism. Our findings provide insights into the mechanisms underlying the transgenerational effects of paternal PM2.5 exposure on reproductive health, highlighting the crucial role played by sRNAs in mediating these effects. They also underscore the significance of paternal pre-conception interventions in alleviating the adverse effects of environmental pollutants on reproductive health.