24 research outputs found

    Reliable Natural Language Understanding with Large Language Models and Answer Set Programming

    Full text link
    Humans understand language by extracting information (meaning) from sentences, combining it with existing commonsense knowledge, and then performing reasoning to draw conclusions. While large language models (LLMs) such as GPT-3 and ChatGPT are able to leverage patterns in text to solve a variety of NLP tasks, they fall short on problems that require reasoning. Nor can they reliably explain the answers they generate for a given question. To emulate humans better, we propose STAR, a framework that combines LLMs with Answer Set Programming (ASP). We show how LLMs can be used to effectively extract knowledge -- represented as predicates -- from language. Goal-directed ASP is then employed to reliably reason over this knowledge. We apply the STAR framework to three NLU tasks requiring reasoning: qualitative reasoning, mathematical reasoning, and goal-directed conversation. Our experiments reveal that STAR is able to bridge the reasoning gap in NLU tasks, leading to significant performance improvements, especially for smaller LLMs, i.e., LLMs with fewer parameters. NLU applications developed with the STAR framework are also explainable: along with the generated predicates, a justification in the form of a proof tree can be produced for a given output. Comment: In Proceedings ICLP 2023, arXiv:2308.1489
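    The two-stage pipeline described above (an LLM extracts predicates, goal-directed ASP reasons over them) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the prompt, the predicate format, and the `scasp --tree` invocation are all assumptions.

    ```python
    # Minimal sketch of a STAR-style two-stage pipeline. The prompt,
    # predicate format, and `scasp` binary invocation are illustrative
    # assumptions, not the paper's actual implementation.
    import subprocess
    import tempfile

    def extract_predicates(llm, sentence: str) -> str:
        """Stage 1: ask an LLM to translate a sentence into ASP facts."""
        prompt = (
            "Translate the sentence into ASP facts, one per line, "
            "e.g. parent(alice, bob).\nSentence: " + sentence
        )
        return llm(prompt)  # hypothetical LLM callable returning ASP facts

    RULES = "grandparent(X, Z) :- parent(X, Y), parent(Y, Z).\n"

    def reason(facts: str, query: str) -> str:
        """Stage 2: run goal-directed ASP (s(CASP)) over rules + facts."""
        with tempfile.NamedTemporaryFile("w", suffix=".pl", delete=False) as f:
            f.write(RULES + facts + f"\n?- {query}.\n")
            path = f.name
        # s(CASP) can emit a justification (proof) tree for its answer,
        # which is what makes the output explainable.
        out = subprocess.run(["scasp", "--tree", path],
                             capture_output=True, text=True)
        return out.stdout
    ```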

    Emergent Modularity in Pre-trained Transformers

    Full text link
    This work examines the presence of modularity in pre-trained Transformers, a feature commonly found in human brains and thought to be vital for general intelligence. In analogy to human brains, we consider two main characteristics of modularity: (1) functional specialization of neurons: we evaluate whether each neuron is mainly specialized in a certain function, and find that the answer is yes; (2) function-based neuron grouping: we explore finding a structure that groups neurons into modules by function, such that each module works for its corresponding function. Given the enormous number of possible structures, we focus on Mixture-of-Experts as a promising candidate, which partitions neurons into experts and usually activates different experts for different inputs. Experimental results show that there are functional experts, in which the neurons specialized in a certain function are clustered together. Moreover, perturbing the activations of functional experts significantly affects the corresponding function. Finally, we study how modularity emerges during pre-training and find that the modular structure stabilizes at an early stage, faster than neuron stabilization. This suggests that Transformers first construct the modular structure and then learn fine-grained neuron functions. Our code and data are available at https://github.com/THUNLP/modularity-analysis. Comment: Findings of ACL 2023
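    The perturbation experiment described above can be approximated with a simple forward hook that silences one hypothesized functional expert (a group of FFN neurons) and re-evaluates the corresponding function. A minimal PyTorch sketch, with illustrative layer and neuron indices:

    ```python
    # Sketch of the perturbation probe: zero out the FFN activations of
    # one hypothesized "functional expert" (a neuron group) and measure
    # the effect on that function. Indices below are assumptions.
    import torch

    def ablate_expert(layer: torch.nn.Module, expert_neurons: list[int]):
        """Register a hook that zeroes a group of neurons in `layer`'s output."""
        def hook(module, inputs, output):
            output[..., expert_neurons] = 0.0  # perturb the expert's neurons
            return output
        return layer.register_forward_hook(hook)

    # Usage sketch: if ablating the group hurts only its associated
    # function, that is evidence the group acts as a functional expert.
    # handle = ablate_expert(model.encoder.layer[3].intermediate, [17, 42, 99])
    # ... evaluate on a function-specific probe set ...
    # handle.remove()
    ```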

    Plug-and-Play Knowledge Injection for Pre-trained Language Models

    Full text link
    Injecting external knowledge can improve the performance of pre-trained language models (PLMs) on various downstream NLP tasks. However, massive retraining is required to deploy new knowledge injection methods or knowledge bases for downstream tasks. In this work, we are the first to study how to improve the flexibility and efficiency of knowledge injection by reusing existing downstream models. To this end, we explore a new paradigm, plug-and-play knowledge injection, where knowledge bases are injected into frozen existing downstream models by a knowledge plugin. Correspondingly, we propose a plug-and-play injection method, map-tuning, which trains a mapping of knowledge embeddings to enrich model inputs with mapped embeddings while keeping model parameters frozen. Experimental results on three knowledge-driven NLP tasks show that existing injection methods are not suitable for the new paradigm, while map-tuning effectively improves the performance of downstream models. Moreover, we show that a frozen downstream model can be well adapted to different domains with different mapping networks of domain knowledge. Our code and models are available at https://github.com/THUNLP/Knowledge-Plugin. Comment: ACL 2023
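    The map-tuning idea lends itself to a compact sketch: only a small mapping network from the knowledge-embedding space to the PLM's input-embedding space is trained, while the PLM itself stays frozen. The dimensions and concatenation scheme below are illustrative assumptions, not the paper's exact design:

    ```python
    # Minimal sketch of a map-tuning-style knowledge plugin: train only a
    # mapping from KB embeddings into the frozen PLM's input space.
    import torch
    import torch.nn as nn

    class KnowledgePlugin(nn.Module):
        def __init__(self, kb_dim: int = 100, plm_dim: int = 768):
            super().__init__()
            self.map = nn.Linear(kb_dim, plm_dim)  # the only trained part

        def forward(self, input_embeds, kb_embeds):
            mapped = self.map(kb_embeds)           # (batch, k, plm_dim)
            # Prepend mapped knowledge embeddings to the token embeddings.
            return torch.cat([mapped, input_embeds], dim=1)

    # Training sketch: freeze the PLM, optimize only the plugin.
    # for p in plm.parameters():
    #     p.requires_grad = False
    # optimizer = torch.optim.Adam(plugin.parameters(), lr=1e-4)
    ```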

    MAVEN-Arg: Completing the Puzzle of All-in-One Event Understanding Dataset with Event Argument Annotation

    Full text link
    Understanding events in texts is a core objective of natural language understanding, which requires detecting event occurrences, extracting event arguments, and analyzing inter-event relationships. However, due to the annotation challenges brought by task complexity, a large-scale dataset covering the full process of event understanding has long been absent. In this paper, we introduce MAVEN-Arg, which augments the MAVEN datasets with event argument annotations, making it the first all-in-one dataset supporting event detection, event argument extraction (EAE), and event relation extraction. As an EAE benchmark, MAVEN-Arg offers three main advantages: (1) a comprehensive schema covering 162 event types and 612 argument roles, all with expert-written definitions and examples; (2) a large data scale, containing 98,591 events and 290,613 arguments obtained through laborious human annotation; (3) exhaustive annotation supporting all task variants of EAE, covering both entity and non-entity event arguments at the document level. Experiments indicate that MAVEN-Arg is quite challenging for both fine-tuned EAE models and proprietary large language models (LLMs). Furthermore, to demonstrate the benefits of an all-in-one dataset, we preliminarily explore a potential application, future event prediction, with LLMs. MAVEN-Arg and our code can be obtained from https://github.com/THU-KEG/MAVEN-Argument. Comment: Work in progress
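    A sketch of how one might iterate over such an all-in-one annotation format is given below. The file name and field names are an assumed schema for illustration only; the MAVEN-Argument repository defines the actual format.

    ```python
    # Sketch of iterating over an event-argument dataset like MAVEN-Arg.
    # Field names below are an assumed schema, not the real data format.
    import json
    from collections import Counter

    role_counts = Counter()
    with open("maven_arg_train.jsonl") as f:      # hypothetical file name
        for line in f:
            doc = json.loads(line)
            for event in doc.get("events", []):   # one of 162 event types
                for arg in event.get("arguments", []):
                    # Arguments may be entity or non-entity spans,
                    # annotated at the document level.
                    role_counts[(event["type"], arg["role"])] += 1
    print(role_counts.most_common(10))
    ```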

    Real-time Monitoring for the Next Core-Collapse Supernova in JUNO

    Full text link
    A core-collapse supernova (CCSN) is one of the most energetic astrophysical events in the Universe. The early and prompt detection of neutrinos before (pre-SN) and during the SN burst is a unique opportunity to realize multi-messenger observation of CCSN events. In this work, we describe the monitoring concept and present the sensitivity of the system to pre-SN and SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), a 20 kton liquid scintillator detector under construction in South China. The real-time monitoring system is designed with both prompt monitors on the electronic board and online monitors at the data acquisition stage, in order to ensure both the alert speed and the alert coverage of progenitor stars. Assuming a false alert rate of 1 per year, this monitoring system is sensitive to pre-SN neutrinos up to a distance of about 1.6 (0.9) kpc and to SN neutrinos up to about 370 (360) kpc for a progenitor mass of 30 M⊙ in the case of normal (inverted) mass ordering. The ability to point back to the CCSN is evaluated using the accumulated event anisotropy of the inverse beta decay interactions from pre-SN or SN neutrinos, which, along with the early alert, can play an important role in follow-up multi-messenger observations of the next Galactic or nearby extragalactic CCSN. Comment: 24 pages, 9 figures
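    The 1-per-year false alert criterion above amounts to choosing a count threshold whose background exceedance probability, summed over all coincidence windows in a year, stays below the target. A back-of-envelope sketch with illustrative numbers (not JUNO's actual rates or window):

    ```python
    # Back-of-envelope threshold for a target false alert rate, assuming
    # Poisson background counts in independent sliding windows.
    from scipy.stats import poisson

    WINDOW_S = 10.0          # coincidence window length (assumed)
    BKG_RATE_HZ = 0.01       # background IBD-like event rate (assumed)
    TARGET_FA_PER_YEAR = 1.0

    windows_per_year = 365.25 * 86400 / WINDOW_S
    mu = BKG_RATE_HZ * WINDOW_S   # mean background counts per window

    # Smallest count threshold n with expected yearly false alerts,
    # windows_per_year * P(count >= n), below the target.
    n = 1
    while poisson.sf(n - 1, mu) * windows_per_year > TARGET_FA_PER_YEAR:
        n += 1
    print(f"trigger on >= {n} events in {WINDOW_S:.0f} s")
    ```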

    Building Intelligent Systems by Combining Machine Learning and Automated Commonsense Reasoning

    No full text
    We present an approach to building systems that emulate human-like intelligence. Our approach uses machine learning technology (including generative AI systems) to extract knowledge from pictures, text, etc., and represents it as (pre-defined) predicates. Next, we use the s(CASP) automated commonsense reasoning system to check the consistency of this extracted knowledge and reason over it in a manner very similar to how a human would. We have used our approach to build systems for visual question answering, task-specific chatbots that can "understand" human dialogs and interactively talk to users, and autonomous driving systems that rely on commonsense reasoning. Essentially, our approach emulates how humans process knowledge: they use sensing and pattern recognition to gain knowledge (Kahneman's System 1 thinking, akin to using a machine learning model), and then use reasoning to draw conclusions, generate responses, or take actions (Kahneman's System 2 thinking, akin to automated reasoning).
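    The consistency check mentioned above can be illustrated with an ASP integrity constraint that rejects contradictory extracted predicates. The facts below are invented for illustration and would be fed to an s(CASP) solver:

    ```python
    # Sketch of consistency checking extracted knowledge with ASP.
    # Two contradictory facts, as a pattern-recognition stage might emit:
    FACTS = (
        "light(traffic_light, red).\n"
        "light(traffic_light, green).\n"
    )

    # Integrity constraint: a light cannot show two colors at once.
    CONSTRAINT = ":- light(L, C1), light(L, C2), C1 \\= C2.\n"

    # Running s(CASP) on CONSTRAINT + FACTS yields no model, flagging
    # the extracted knowledge as inconsistent before any reasoning.
    program = CONSTRAINT + FACTS
    ```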

    A fine-grained and noise-aware method for neural relation extraction

    No full text
    Distant supervision is an efficient way to generate large-scale training data for relation extraction without human effort. However, this convenience comes at a cost: the automatically annotated labels for the training data are problematic, which can be summarized as a multi-instance multi-label problem with a coarse-grained (bag-level) supervision signal. To address these problems, we propose two reasonable assumptions and use reinforcement learning to capture the expressive sentence for each relation mentioned in a bag. More specifically, we extend the original expressed-at-least-once assumption to the multi-label level and introduce a novel express-at-most-one assumption. Besides, we design a fine-grained reward function and model the sentence selection process as an auction in which the different relations of a bag compete for possession of a specific sentence based on its expressiveness. In this way, our model can dynamically self-adapt and eventually implements an accurate one-to-one mapping from a relation label to its chosen expressive sentence, which serves as a training instance for the extractor. The experimental results on a public dataset demonstrate that our model consistently and substantially outperforms current state-of-the-art methods for relation extraction.
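    The auction can be illustrated with a greedy one-to-one assignment: each relation in a bag bids for sentences with an expressiveness score, the highest bids win first, and no sentence is sold twice (the express-at-most-one assumption). The paper learns the selection with reinforcement learning; this sketch only illustrates the resulting mapping, with made-up scores:

    ```python
    # Greedy auction sketch: relations bid for sentences in a bag and
    # each relation wins at most one sentence (express-at-most-one).
    def auction_select(scores: dict[tuple[str, int], float]) -> dict[str, int]:
        """scores maps (relation, sentence_index) -> expressiveness bid."""
        assignment: dict[str, int] = {}
        taken: set[int] = set()
        # Highest bids win first; each sentence is sold at most once.
        for (rel, idx), _ in sorted(scores.items(), key=lambda kv: -kv[1]):
            if rel not in assignment and idx not in taken:
                assignment[rel] = idx
                taken.add(idx)
        return assignment

    # Example: two relations competing over three sentences in one bag.
    bids = {("founder_of", 0): 0.9, ("founder_of", 2): 0.4,
            ("ceo_of", 0): 0.8, ("ceo_of", 1): 0.7}
    print(auction_select(bids))  # {'founder_of': 0, 'ceo_of': 1}
    ```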

    Inter- and trans-generational impacts of real-world PM2.5 exposure on male-specific primary hypogonadism

    No full text
    Exposure to PM2.5, a harmful type of air pollution, has been associated with compromised male reproductive health; however, it remains unclear whether such exposure can elicit transgenerational effects on male fertility. Here, we aim to examine the effect of paternal exposure to real-world PM2.5 on the reproductive health of male offspring. We have observed that paternal exposure to real-world PM2.5 can lead to transgenerational primary hypogonadism in a sex-selective manner, and we have confirmed this phenotype using an external model. Mechanistically, we have identified small RNAs (sRNAs) that play a critical role in mediating these transgenerational effects. Specifically, miR6240 and piR016061, which are present in F0 PM sperm, regulate intergenerational transmission by targeting Lhcgr and Nsd1, respectively. We have also uncovered that piR033435 and piR006695 indirectly regulate F1 PM sperm methylation by binding to the 3′-untranslated region of Tet1 mRNA. The reduced expression of Tet1 resulted in hypermethylation of several testosterone synthesis genes, including Lhcgr and Gnas, impaired Leydig cell function, and ultimately led to transgenerational primary hypogonadism. Our findings provide insights into the mechanisms underlying the transgenerational effects of paternal PM2.5 exposure on reproductive health, highlighting the crucial role played by sRNAs in mediating these effects. These findings underscore the significance of paternal pre-conception interventions in alleviating the adverse effects of environmental pollutants on reproductive health.