68 research outputs found

    Decouple knowledge from parameters for plug-and-play language modeling

    Pre-trained language models (PLMs) have achieved impressive results in various NLP tasks. It has been revealed that one of the key factors behind their success is that the parameters of these models implicitly learn all kinds of knowledge during pre-training. However, encoding knowledge implicitly in the model parameters has two fundamental drawbacks. First, the knowledge is neither editable nor scalable once the model is trained, which is especially problematic given that knowledge is constantly evolving. Second, it lacks interpretability and prevents humans from understanding which knowledge a PLM requires for a certain problem. In this paper, we introduce PlugLM, a pre-training model with a differentiable plug-in memory (DPM). The key intuition is to decouple the knowledge storage from the model parameters with an editable and scalable key-value memory and to leverage knowledge in an explainable manner through knowledge retrieval in the DPM. To justify this design choice, we conduct evaluations in three settings: (1) domain adaptation, where PlugLM obtains a 3.95 F1 improvement on average across four domains without any in-domain pre-training; (2) knowledge update, where PlugLM can absorb new knowledge in a training-free way after pre-training is done; and (3) in-task knowledge learning, where PlugLM can be further improved by incorporating training samples into the DPM with knowledge prompting. Comment: ACL 2023 Findings
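The editable key-value memory at the heart of this design can be illustrated with a toy retrieval step. This is a minimal NumPy sketch, not the paper's implementation; the function and array names are hypothetical. Editing knowledge amounts to replacing rows of the key/value arrays, with no retraining.

```python
import numpy as np

def retrieve_from_dpm(query, keys, values, top_k=2):
    """Toy sketch of a plug-in key-value memory lookup: score each
    memory key against the query, softmax over the top-k matches,
    and return the weighted sum of their values."""
    scores = keys @ query                      # (n_entries,)
    top = np.argsort(scores)[-top_k:]          # indices of best matches
    weights = np.exp(scores[top] - scores[top].max())
    weights /= weights.sum()                   # softmax over the top-k
    return weights @ values[top]               # (value_dim,)

keys = np.eye(4)                               # 4 memory slots, 4-dim keys
values = np.arange(8.0).reshape(4, 2)          # 2-dim value per slot
query = np.array([0.0, 10.0, 0.0, 0.0])        # strongly matches slot 1
out = retrieve_from_dpm(query, keys, values)   # ~= values[1] == [2.0, 3.0]
```

Because the lookup is a softmax over inner products, it stays differentiable, which is what lets the memory be trained jointly with the backbone while remaining editable afterwards.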

    Plug-and-Play Document Modules for Pre-trained Models

    Large-scale pre-trained models (PTMs) have been widely used in document-oriented NLP tasks, such as question answering. However, the encoding-task coupling requirement results in the repeated encoding of the same documents for different tasks and queries, which is highly computationally inefficient. To this end, we aim to decouple document encoding from downstream tasks and propose to represent each document as a plug-and-play document module, i.e., a document plugin, for PTMs (PlugD). By inserting document plugins into the backbone PTM for downstream tasks, we can encode a document once to handle multiple tasks, which is more efficient than conventional encoding-task coupling methods that simultaneously encode documents and input queries using task-specific encoders. Extensive experiments on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode documents once and for all across different scenarios. In particular, PlugD can save 69% of computational costs while achieving comparable performance to state-of-the-art encoding-task coupling methods. Additionally, we show that PlugD can serve as an effective post-processing way to inject knowledge into task-specific models, improving model performance without any additional model training. Comment: Accepted by ACL 2023
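The encode-once, reuse-everywhere idea behind document plugins can be mimicked with a simple cache. This is an illustrative sketch only; `DocumentPluginStore` and `toy_encoder` are invented stand-ins for the backbone PTM and the plugin machinery.

```python
class DocumentPluginStore:
    """Toy sketch of decoupled document encoding: encode a document
    once, cache the resulting 'plugin', and hand the cached
    representation to any downstream task that asks for it."""
    def __init__(self, encoder):
        self.encoder = encoder
        self.cache = {}
        self.encode_calls = 0        # track how often we pay encoding cost

    def plugin(self, doc_id, text):
        if doc_id not in self.cache:           # encode only on first use
            self.cache[doc_id] = self.encoder(text)
            self.encode_calls += 1
        return self.cache[doc_id]

def toy_encoder(text):
    return [ord(c) % 7 for c in text]          # stand-in for a real encoder

store = DocumentPluginStore(toy_encoder)
doc = "plug-and-play"
qa_input = store.plugin("d1", doc)             # task 1: question answering
sum_input = store.plugin("d1", doc)            # task 2: summarization
```

The second task reuses the cached representation, so the (expensive) encoder runs once per document rather than once per task-query pair, which is the source of the reported compute savings.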

    A Survey on Large Language Model based Autonomous Agents

    Autonomous agents have long been a prominent research topic in the academic community. Previous research in this field often focuses on training agents with limited knowledge within isolated environments, which diverges significantly from human learning processes and thus makes it hard for agents to achieve human-like decisions. Recently, through the acquisition of vast amounts of web knowledge, large language models (LLMs) have demonstrated remarkable potential in achieving human-level intelligence. This has sparked an upsurge in studies investigating autonomous agents based on LLMs. To harness the full potential of LLMs, researchers have devised diverse agent architectures tailored to different applications. In this paper, we present a comprehensive survey of these studies, delivering a systematic review of the field of autonomous agents from a holistic perspective. More specifically, our focus lies in the construction of LLM-based agents, for which we propose a unified framework that encompasses a majority of the previous work. Additionally, we provide a summary of the various applications of LLM-based AI agents in the domains of social science, natural science, and engineering. Lastly, we discuss the commonly employed evaluation strategies for LLM-based AI agents. Based on the previous studies, we also present several challenges and future directions in this field. To keep track of this field and continuously update our survey, we maintain a repository for the related references at https://github.com/Paitesanshi/LLM-Agent-Survey. Comment: 32 pages, 3 figures

    When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm

    User behavior analysis is crucial in human-centered AI applications. In this field, the collection of sufficient, high-quality user behavior data has always been a fundamental yet challenging problem. An intuitive idea to address this problem is to automatically simulate user behaviors. However, due to the subjective and complex nature of human cognitive processes, reliably simulating user behavior is difficult. Recently, large language models (LLMs) have achieved remarkable successes, showing great potential to achieve human-like intelligence. We argue that these models present significant opportunities for reliable user simulation and have the potential to revolutionize traditional study paradigms in user behavior analysis. In this paper, we take recommender systems as an example to explore the potential of using LLMs for user simulation. Specifically, we regard each user as an LLM-based autonomous agent and let different agents freely communicate, behave, and evolve in a virtual simulator called RecAgent. For comprehensive simulation, we not only consider the behaviors within the recommender system (e.g., item browsing and clicking) but also account for external influential factors, such as friend chatting and social advertisement. Our simulator contains at most 1000 agents, and each agent is composed of a profiling module, a memory module, and an action module, enabling it to behave consistently, reasonably, and reliably. In addition, to operate our simulator more flexibly, we also design two global functions: real-human playing and system intervention. To evaluate the effectiveness of our simulator, we conduct extensive experiments from both agent and system perspectives. In order to advance this direction, we have released our project at https://github.com/RUC-GSAI/YuLan-Rec. Comment: 26 pages, 9 figures
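The profiling/memory/action decomposition of a simulated user can be sketched in a few lines. This is a toy illustration under stated assumptions, not the RecAgent code: `SimUserAgent` is an invented class, and the rule inside `act` stands in for what would actually be an LLM call conditioned on the profile and memory.

```python
class SimUserAgent:
    """Toy sketch of a simulated user with the three modules the
    abstract describes: a profile, an append-only memory of
    observations, and an action step (LLM stubbed with a rule)."""
    def __init__(self, name, interests):
        self.profile = {"name": name, "interests": set(interests)}
        self.memory = []                       # observation log

    def observe(self, event):
        self.memory.append(event)

    def act(self, recommended_item):
        # LLM stand-in: click if the item matches a profiled interest
        # or appeared in memory (e.g. mentioned by a friend, an
        # external influence outside the recommender itself).
        seen_before = any(recommended_item in e for e in self.memory)
        click = recommended_item in self.profile["interests"] or seen_before
        self.observe(f"shown:{recommended_item}")
        return "click" if click else "skip"

alice = SimUserAgent("alice", {"sci-fi"})
alice.observe("friend said: try documentary")   # friend chatting
a1 = alice.act("sci-fi")        # in-profile interest
a2 = alice.act("documentary")   # socially influenced
a3 = alice.act("cooking")       # neither
```

Replacing the rule with a prompted LLM, and running many such agents that exchange messages, is the step that turns this skeleton into a behavior simulator.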

    Mechanism of crocin I on ANIT-induced intrahepatic cholestasis by combined metabolomics and transcriptomics

    Background: Intrahepatic cholestasis (IC) is a disorder of bile production, secretion, and excretion with various causes. Crocin I (CR) is effective in the treatment of IC, but its underlying mechanisms need to be further explored. We aimed to reveal the therapeutic mechanism of crocin I for IC with an integrated metabolomics and transcriptomics strategy. Methods: The hepatoprotective effect of CR against cholestatic liver injury induced by α-naphthylisothiocyanate (ANIT) was evaluated in rats. The serum biochemical indices, including alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bile acid (TBA), total bilirubin (TBIL), direct bilirubin (DBIL), tumor necrosis factor-α (TNF-α), interleukin 6 (IL-6), and interleukin 1β (IL-1β), as well as the liver oxidative stress indices and the pathological characteristics of the liver, were analyzed. In addition, we performed a serum metabolomics study using UPLC-Q Exactive HF-X technology to investigate the effect of CR on the serum of rats with ANIT-induced IC and screened potential biomarkers. Enrichment analysis of differentially expressed genes (DEGs) was performed by transcriptomics. Finally, the regulatory targets of CR on potential biomarkers were obtained by combined analysis, and the relevant key targets were verified by western blotting. Results: CR improved the serum and liver homogenate indices and alleviated liver histological injury. Compared with the ANIT group, the CR group had 76 differential metabolites, and 10 metabolic pathways were enriched. There were 473 DEGs significantly changed after CR treatment, most of which were enriched in retinol metabolism, the calcium signaling pathway, the PPAR signaling pathway, circadian rhythm, the chemokine signaling pathway, arachidonic acid metabolism, bile secretion, primary bile acid biosynthesis, and other pathways.
    By constructing the “compound-reaction-enzyme-gene” interaction network, three potential key-target regulation biomarkers were obtained: 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR), ATP-binding cassette transporter G5 (ABCG5), and sulfotransferase 2A1 (SULT2A1), which were further verified by western blotting. Compared with the ANIT group, the CR group showed significantly increased expression of ABCG5 and SULT2A1 and significantly decreased expression of HMGCR. Conclusion: Combined metabolomic and transcriptomic analyses show that CR exerts a therapeutic effect on IC by regulating the biosynthesis of bile acids and bilirubin in the bile secretion pathway and by regulating the expression of HMGCR, ABCG5, and SULT2A1.

    Early Second-Trimester Serum MiRNA Profiling Predicts Gestational Diabetes Mellitus

    BACKGROUND: Gestational diabetes mellitus (GDM) is a type of diabetes that presents during pregnancy and significantly increases the risk of a number of adverse consequences for the fetus and mother. MicroRNAs (miRNAs) have recently been demonstrated to exist abundantly and stably in serum and to be potentially disease-specific. However, no reported study has investigated the associations between serum miRNAs and GDM. METHODOLOGY/PRINCIPAL FINDINGS: We systematically used the TaqMan Low Density Array, followed by individual quantitative reverse transcription polymerase chain reaction assays, to screen miRNAs in serum collected at 16-19 gestational weeks. The expression levels of three miRNAs (miR-132, miR-29a and miR-222) were significantly decreased in GDM women relative to controls at similar gestational weeks in our discovery evaluation and internal validation, and two miRNAs (miR-29a and miR-222) were also consistently validated in two-center external validation sample sets. In addition, knockdown of miR-29a increased the expression level of Insulin-induced gene 1 (Insig1) and subsequently the level of Phosphoenolpyruvate Carboxykinase 2 (PCK2) in HepG2 cell lines. CONCLUSIONS/SIGNIFICANCE: Serum miRNAs are differentially expressed between GDM women and controls and could be candidate biomarkers for predicting GDM. The utility of miR-29a, miR-222 and miR-132 as serum-based non-invasive biomarkers warrants further evaluation and optimization.

    Real-time Monitoring for the Next Core-Collapse Supernova in JUNO

    A core-collapse supernova (CCSN) is one of the most energetic astrophysical events in the Universe. The early and prompt detection of neutrinos before (pre-SN) and during the SN burst provides a unique opportunity to realize multi-messenger observation of CCSN events. In this work, we describe the monitoring concept and present the sensitivity of the system to pre-SN and SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), a 20 kton liquid scintillator detector under construction in South China. The real-time monitoring system is designed with both prompt monitors on the electronic board and online monitors at the data acquisition stage, in order to ensure both the alert speed and the alert coverage of progenitor stars. Assuming a false alert rate of 1 per year, this monitoring system is sensitive to pre-SN neutrinos up to a distance of about 1.6 (0.9) kpc and to SN neutrinos up to about 370 (360) kpc for a progenitor mass of 30 M⊙ in the case of normal (inverted) mass ordering. The pointing ability for the CCSN is evaluated using the accumulated event anisotropy of the inverse beta decay interactions from pre-SN or SN neutrinos, which, along with the early alert, can play an important role in follow-up multi-messenger observations of the next Galactic or nearby extragalactic CCSN. Comment: 24 pages, 9 figures
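The "false alert rate of 1 per year" criterion can be made concrete with a toy Poisson calculation: pick the smallest event count in a counting window such that background fluctuations alone exceed it less than once per year. This is a generic illustration of the statistics, not JUNO's actual trigger logic; the rates and window length below are invented.

```python
import math

def poisson_sf(k, mu):
    """P(X >= k) for X ~ Poisson(mu), via the complement of the CDF."""
    return 1.0 - sum(math.exp(-mu) * mu**i / math.factorial(i)
                     for i in range(k))

def alert_threshold(bkg_rate_per_s, window_s, false_alerts_per_year=1.0):
    """Smallest count N in a counting window such that the expected
    number of background-only windows per year reaching N stays
    below the target false alert rate."""
    windows_per_year = 365.25 * 24 * 3600 / window_s
    mu = bkg_rate_per_s * window_s           # mean background per window
    n = 1
    while windows_per_year * poisson_sf(n, mu) > false_alerts_per_year:
        n += 1
    return n

# Hypothetical numbers: 0.01 background events/s, 10 s window.
n_alert = alert_threshold(0.01, 10.0)
```

A lower threshold buys sensitivity to fainter or more distant progenitors at the cost of more background triggers, which is exactly the trade-off the quoted distance reaches are computed under.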

    ST-GIN: An Uncertainty Quantification Approach in Traffic Data Imputation with Spatio-temporal Graph Attention and Bidirectional Recurrent United Neural Networks

    Traffic data serves as a fundamental component in both research and applications within intelligent transportation systems. However, real-world transportation data, collected from loop detectors or similar sources, often contain missing values (MVs), which can adversely impact associated applications and research. Instead of discarding this incomplete data, researchers have sought to recover the missing values through numerical statistics, tensor decomposition, and deep learning techniques. In this paper, we propose an innovative deep-learning approach for imputing missing data. A graph attention architecture is employed to capture the spatial correlations present in traffic data, while a bidirectional recurrent neural network is utilized to learn temporal information. Experimental results indicate that our proposed method outperforms all other benchmark techniques, demonstrating its effectiveness. Comment: In submission to IEEE ITSC 2023
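The structure of the imputation problem (spatial neighbors plus temporal neighbors) can be shown with a deliberately simple heuristic baseline. This sketch is not ST-GIN: where the paper learns weights with graph attention and a bidirectional recurrent network, the toy below just averages the observed neighbors.

```python
import numpy as np

def impute_spatiotemporal(x, adj):
    """Toy baseline: fill each missing reading (NaN) with the mean of
    its observed temporal neighbors (t-1, t+1) and of spatially
    adjacent sensors at the same timestep."""
    filled = x.copy()
    n_sensors, n_steps = x.shape
    for i in range(n_sensors):
        for j in range(n_steps):
            if np.isnan(x[i, j]):
                cand = []
                if j > 0 and not np.isnan(x[i, j - 1]):
                    cand.append(x[i, j - 1])        # previous timestep
                if j < n_steps - 1 and not np.isnan(x[i, j + 1]):
                    cand.append(x[i, j + 1])        # next timestep
                for k in range(n_sensors):          # adjacent sensors
                    if adj[i, k] and not np.isnan(x[k, j]):
                        cand.append(x[k, j])
                if cand:
                    filled[i, j] = np.mean(cand)
    return filled

readings = np.array([[1.0, np.nan, 3.0],    # sensor 0, one gap at t=1
                     [2.0, 2.0, 2.0]])      # sensor 1, fully observed
adjacency = np.array([[0, 1],               # sensors 0 and 1 are linked
                      [1, 0]])
recovered = impute_spatiotemporal(readings, adjacency)
```

Learned models improve on this baseline precisely because equal-weight averaging ignores how informative each neighbor is, which is what attention over the sensor graph and the recurrent pass over time are meant to capture.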