68 research outputs found
Decouple Knowledge from Parameters for Plug-and-Play Language Modeling
Pre-trained language models (PLMs) have achieved impressive results in various
NLP tasks. It has been revealed that one of the key factors in their success is
that the parameters of these models implicitly learn all kinds of knowledge during
pre-training. However, encoding knowledge implicitly in the model parameters
has two fundamental drawbacks. First, the knowledge is neither editable nor
scalable once the model is trained, which is especially problematic given that
knowledge is constantly evolving. Second, it lacks interpretability and
prevents humans from understanding which knowledge a PLM requires for a certain
problem. In this paper, we introduce PlugLM, a pre-training model with
differentiable plug-in memory (DPM). The key intuition is to decouple the
knowledge storage from model parameters with an editable and scalable key-value
memory and leverage knowledge in an explainable manner by knowledge retrieval
in the DPM. To justify this design choice, we conduct evaluations in three
settings: (1) domain adaptation, where PlugLM obtains an average improvement of
3.95 F1 across four domains without any in-domain pre-training; (2) knowledge
update, where PlugLM can absorb new knowledge in a training-free way after
pre-training is done; and (3) in-task knowledge learning, where PlugLM can be
further improved by incorporating training samples into the DPM with knowledge
prompting.
Comment: ACL 2023 Findings
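The decoupled key-value memory can be sketched roughly as follows. This is a minimal illustration only: PlugLM retrieves over representations learned during pre-training, so the toy character-hash `embed` function, the `PluginMemory` class, and the dot-product scoring here are illustrative assumptions, not the paper's implementation.

```python
# Sketch of an editable key-value knowledge memory with dot-product
# retrieval. All names and the toy embedding are illustrative
# assumptions, not PlugLM's actual components.
import math

def embed(text, dim=8):
    # Toy deterministic "embedding": a hashed bag of characters,
    # L2-normalized so dot products act like cosine similarity.
    v = [0.0] * dim
    for ch in text:
        v[ord(ch) % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

class PluginMemory:
    """Key-value store that can be edited after training."""
    def __init__(self):
        self.keys, self.values = [], []

    def add(self, knowledge_text):
        # New knowledge is absorbed without any gradient update.
        self.keys.append(embed(knowledge_text))
        self.values.append(knowledge_text)

    def retrieve(self, query, top_k=1):
        q = embed(query)
        scored = sorted(
            ((sum(a * b for a, b in zip(q, k)), v)
             for k, v in zip(self.keys, self.values)),
            reverse=True)
        return [v for _, v in scored[:top_k]]

memory = PluginMemory()
memory.add("Paris is the capital of France.")
memory.add("Water boils at 100 degrees Celsius at sea level.")
results = memory.retrieve("capital of France")
print(results)
```

Because the memory is an ordinary data structure rather than model weights, entries can be added, replaced, or deleted after pre-training, which is the editability and scalability the abstract emphasizes.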
Plug-and-Play Document Modules for Pre-trained Models
Large-scale pre-trained models (PTMs) have been widely used in
document-oriented NLP tasks, such as question answering. However, the
encoding-task coupling requirement results in the repeated encoding of the same
documents for different tasks and queries, which is highly computationally
inefficient. To this end, we aim to decouple document encoding from
downstream tasks, and propose to represent each document as a plug-and-play
document module, i.e., a document plugin, for PTMs (PlugD). By inserting
document plugins into the backbone PTM for downstream tasks, we can encode a
document once to handle multiple tasks, which is more efficient than
conventional encoding-task coupling methods that simultaneously encode
documents and input queries using task-specific encoders. Extensive experiments
on 8 datasets of 4 typical NLP tasks show that PlugD enables models to encode
documents once and for all across different scenarios. In particular, PlugD can
save computational costs while achieving comparable performance to
state-of-the-art encoding-task coupling methods. Additionally, we show that
PlugD can serve as an effective post-processing method to inject knowledge into
task-specific models, improving model performance without any additional model
training.
Comment: Accepted by ACL 202
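The encode-once idea can be sketched as below. The caching scheme, the stand-in `encode_document` function, and the task heads are assumptions for illustration; PlugD itself injects learned document representations into the PTM's layers rather than caching a dictionary.

```python
# Sketch of encode-once document plugins: each document is encoded a
# single time and the cached representation is reused by several task
# heads. The encoder and heads are toy stand-ins, not PlugD itself.

def encode_document(doc):
    # Stand-in for an expensive PTM forward pass.
    return {"length": len(doc), "words": doc.lower().split()}

class DocumentPluginStore:
    def __init__(self):
        self._cache = {}
        self.encode_calls = 0

    def plugin(self, doc_id, doc):
        # Encode each document only once, regardless of how many
        # tasks or queries later consume it.
        if doc_id not in self._cache:
            self._cache[doc_id] = encode_document(doc)
            self.encode_calls += 1
        return self._cache[doc_id]

def qa_head(plugin, query):
    # Toy "question answering": check whether the query word occurs.
    return query.lower() in plugin["words"]

def length_head(plugin):
    # A second, unrelated task consuming the same cached plugin.
    return plugin["length"]

store = DocumentPluginStore()
doc = "PlugD represents each document as a plug and play module"
answer = qa_head(store.plugin("d1", doc), "module")
length = length_head(store.plugin("d1", doc))
print(answer, length, store.encode_calls)
```

The point of the sketch is the counter: two tasks touch the document, but `encode_calls` stays at one, mirroring the efficiency claim versus encoding-task-coupled methods that re-encode per task and query.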
A Survey on Large Language Model based Autonomous Agents
Autonomous agents have long been a prominent research topic in the academic
community. Previous research in this field often focuses on training agents
with limited knowledge within isolated environments, which diverges
significantly from human learning processes and thus makes it hard for the
agents to achieve human-like decisions. Recently, through the acquisition of vast
amounts of web knowledge, large language models (LLMs) have demonstrated
remarkable potential in achieving human-level intelligence. This has sparked an
upsurge in studies investigating autonomous agents based on LLMs. To harness
the full potential of LLMs, researchers have devised diverse agent
architectures tailored to different applications. In this paper, we present a
comprehensive survey of these studies, delivering a systematic review of the
field of autonomous agents from a holistic perspective. More specifically, our
focus lies in the construction of LLM-based agents, for which we propose a
unified framework that encompasses a majority of the previous work.
Additionally, we provide a summary of the various applications of LLM-based AI
agents in the domains of social science, natural science, and engineering.
Lastly, we discuss the commonly employed evaluation strategies for LLM-based AI
agents. Based on the previous studies, we also present several challenges and
future directions in this field. To keep track of this field and continuously
update our survey, we maintain a repository for the related references at
https://github.com/Paitesanshi/LLM-Agent-Survey.
Comment: 32 pages, 3 figures
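The module decomposition that recurs across the surveyed agent architectures (a profile, a memory, a planning step conditioned on both, and an action) can be sketched minimally as follows. The rule-based `llm_stub` stands in for a real model call, and every interface name here is an illustrative assumption rather than the survey's formal framework.

```python
# Minimal sketch of the profile / memory / planning / action module
# decomposition common to LLM-based agent designs. The deterministic
# "llm_stub" replaces a real LLM call; all interfaces are assumptions.

def llm_stub(prompt):
    # Deterministic stand-in for an LLM completion: decide to gather
    # information when the context signals a knowledge gap.
    return "search" if "unknown" in prompt else "answer"

class Agent:
    def __init__(self, profile):
        self.profile = profile          # profiling module: who the agent is
        self.memory = []                # memory module: what it has observed

    def perceive(self, observation):
        self.memory.append(observation)

    def plan(self, task):
        # Planning module: condition the (stubbed) LLM on the profile,
        # the most recent observation (a simplification), and the task.
        latest = self.memory[-1] if self.memory else ""
        return llm_stub(f"{self.profile} | {latest} | {task}")

    def act(self, task):
        # Action module: execute the plan and record it in memory.
        action = self.plan(task)
        self.memory.append(f"did:{action}")
        return action

agent = Agent("research assistant")
agent.perceive("topic is unknown to me")
first = agent.act("summarize topic")
agent.perceive("found a survey paper")
second = agent.act("summarize topic")
print(first, second)
```

Even in this toy form, the loop shows the design point the survey organizes around: behavior changes not because the model changes, but because the memory and profile fed into each planning step change.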
When Large Language Model based Agent Meets User Behavior Analysis: A Novel User Simulation Paradigm
User behavior analysis is crucial in human-centered AI applications. In this
field, the collection of sufficient and high-quality user behavior data has
always been a fundamental yet challenging problem. An intuitive idea to address
this problem is automatically simulating the user behaviors. However, due to
the subjective and complex nature of human cognitive processes, reliably
simulating user behavior is difficult. Recently, large language models
(LLMs) have achieved remarkable success, showing great potential to achieve
human-like intelligence. We argue that these models present significant
opportunities for reliable user simulation, and have the potential to
revolutionize traditional study paradigms in user behavior analysis. In this
paper, we take recommender systems as an example to explore the potential of
using LLMs for user simulation. Specifically, we regard each user as an
LLM-based autonomous agent, and let different agents freely communicate, behave
and evolve in a virtual simulator called RecAgent. For comprehensive
simulation, we not only consider behaviors within the recommender system
(e.g., item browsing and clicking) but also account for external
influencing factors, such as friend chatting and social advertisement. Our
simulator supports up to 1,000 agents, and each agent is composed of a
profiling module, a memory module and an action module, enabling it to behave
consistently, reasonably, and reliably. In addition, to operate our simulator
more flexibly, we design two global functions: real-human playing
and system intervention. To evaluate the effectiveness of our simulator, we
conduct extensive experiments from both agent and system perspectives. In order
to advance this direction, we have released our project at
https://github.com/RUC-GSAI/YuLan-Rec.
Comment: 26 pages, 9 figures
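A multi-agent simulator with the described module structure and a system-intervention hook can be sketched as below. The rule-based click behavior is a stand-in for RecAgent's LLM-driven agents, and every class and method name is an illustrative assumption.

```python
# Sketch of a multi-agent user simulator: each agent has a profile,
# a memory, and an action module, and the simulator exposes a
# system-intervention hook. Behaviors are rule-based stand-ins for
# LLM-driven agents; all names are illustrative assumptions.

class UserAgent:
    def __init__(self, name, interests):
        self.name = name
        self.interests = set(interests)   # profiling module
        self.memory = []                  # memory module

    def act(self, recommended_item):
        # Action module: click if the item matches an interest,
        # otherwise just browse past it.
        action = "click" if recommended_item in self.interests else "browse"
        self.memory.append((recommended_item, action))
        return action

class Simulator:
    def __init__(self, agents):
        self.agents = agents

    def intervene(self, agent_name, new_interest):
        # System intervention: directly modify an agent's profile,
        # e.g. to model an external influence like friend chatting.
        for a in self.agents:
            if a.name == agent_name:
                a.interests.add(new_interest)

    def step(self, item):
        # One simulation step: every agent reacts to the same item.
        return {a.name: a.act(item) for a in self.agents}

sim = Simulator([UserAgent("u1", {"jazz"}), UserAgent("u2", {"rock"})])
before = sim.step("jazz")
sim.intervene("u2", "jazz")
after = sim.step("jazz")
print(before, after)
```

Running a step, intervening, and stepping again makes the intervention observable as a change in logged behavior, which is the kind of controlled experiment the real simulator's system-intervention function enables.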
Mechanism of crocin I on ANIT-induced intrahepatic cholestasis by combined metabolomics and transcriptomics
Background: Intrahepatic cholestasis (IC) is a disorder of bile production, secretion, and excretion with various causes. Crocin I (CR) is effective in the treatment of IC, but its underlying mechanisms need to be further explored. We aimed to reveal the therapeutic mechanism of crocin I for IC through an integrated strategy combining metabolomics and transcriptomics.
Methods: The hepatoprotective effect of CR against cholestatic liver injury induced by α-naphthylisothiocyanate (ANIT) was evaluated in rats. The serum biochemical indices, including alanine aminotransferase (ALT), aspartate aminotransferase (AST), total bile acid (TBA), total bilirubin (TBIL), direct bilirubin (DBIL), tumor necrosis factor-α (TNF-α), interleukin 6 (IL-6), and interleukin 1β (IL-1β), as well as liver oxidative stress indices and the pathological characteristics of the liver, were analyzed. In addition, we performed a serum metabolomics study using UPLC-Q Exactive HF-X technology to investigate the effect of CR on the serum of rats with ANIT-induced IC and screened potential biomarkers. Enrichment analysis of differentially expressed genes (DEGs) was performed by transcriptomics. Finally, the regulatory targets of CR on potential biomarkers were obtained by combined analysis, and the relevant key targets were verified by western blotting.
Results: CR improved serum and liver homogenate indices and alleviated liver histological injury. Compared with the ANIT group, the CR group had 76 differential metabolites, and 10 metabolic pathways were enriched. A total of 473 DEGs were significantly changed after CR treatment, most of which were enriched in the retinol metabolism, calcium signaling, PPAR signaling, circadian rhythm, chemokine signaling, arachidonic acid metabolism, bile secretion, and primary bile acid biosynthesis pathways, among others. By constructing a “compound-reaction-enzyme-gene” interaction network, three potential key-target regulatory biomarkers were obtained: 3-hydroxy-3-methylglutaryl-coenzyme A reductase (HMGCR), ATP-binding cassette transporter G5 (ABCG5), and sulfotransferase 2A1 (SULT2A1), which were further verified by western blotting. Compared with the ANIT group, the CR group showed significantly increased expression of ABCG5 and SULT2A1 and significantly decreased expression of HMGCR.
Conclusion: Combined metabolomic and transcriptomic analyses show that CR exerts a therapeutic effect on IC by regulating the biosynthesis of bile acids and bilirubin in the bile secretion pathway and by regulating the expression of HMGCR, ABCG5, and SULT2A1.
Early Second-Trimester Serum MiRNA Profiling Predicts Gestational Diabetes Mellitus
BACKGROUND: Gestational diabetes mellitus (GDM) is a type of diabetes that presents during pregnancy and significantly increases the risk of a number of adverse consequences for the fetus and mother. MicroRNAs (miRNAs) have recently been demonstrated to exist abundantly and stably in serum and to be potentially disease-specific. However, no reported study has investigated the associations between serum miRNAs and GDM.
METHODOLOGY/PRINCIPAL FINDINGS: We systematically used the TaqMan Low Density Array, followed by individual quantitative reverse transcription polymerase chain reaction assays, to screen miRNAs in serum collected at 16-19 gestational weeks. The expression levels of three miRNAs (miR-132, miR-29a, and miR-222) were significantly decreased in GDM women relative to controls at similar gestational weeks in our discovery evaluation and internal validation, and two miRNAs (miR-29a and miR-222) were also consistently validated in two-center external validation sample sets. In addition, knockdown of miR-29a increased the expression level of insulin-induced gene 1 (Insig1) and subsequently that of phosphoenolpyruvate carboxykinase 2 (PCK2) in HepG2 cell lines.
CONCLUSIONS/SIGNIFICANCE: Serum miRNAs are differentially expressed between GDM women and controls and could be candidate biomarkers for predicting GDM. The utility of miR-29a, miR-222, and miR-132 as serum-based non-invasive biomarkers warrants further evaluation and optimization.
Real-time Monitoring for the Next Core-Collapse Supernova in JUNO
A core-collapse supernova (CCSN) is one of the most energetic astrophysical
events in the Universe. The early and prompt detection of neutrinos before
(pre-SN) and during the SN burst is a unique opportunity to realize the
multi-messenger observation of the CCSN events. In this work, we describe the
monitoring concept and present the sensitivity of the system to the pre-SN and
SN neutrinos at the Jiangmen Underground Neutrino Observatory (JUNO), which is
a 20 kton liquid scintillator detector under construction in South China. The
real-time monitoring system is designed with both the prompt monitors on the
electronic board and online monitors at the data acquisition stage, in order to
ensure both the alert speed and alert coverage of progenitor stars. By assuming
a false alert rate of 1 per year, this monitoring system can be sensitive to
the pre-SN neutrinos up to a distance of about 1.6 (0.9) kpc and SN neutrinos
up to about 370 (360) kpc for a progenitor mass of 30 solar masses for the case
of normal (inverted) mass ordering. The pointing ability of the CCSN is
evaluated by using the accumulated event anisotropy of the inverse beta decay
interactions from pre-SN or SN neutrinos, which, along with the early alert,
can play an important role in the follow-up multi-messenger observations of the
next Galactic or nearby extragalactic CCSN.
Comment: 24 pages, 9 figures
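The large gap between the pre-SN reach (about a kiloparsec) and the SN-burst reach (hundreds of kiloparsecs) follows from the inverse-square falloff of neutrino flux with distance, combined with the far smaller pre-SN signal. A back-of-envelope sketch, where the reference count `n_at_1` is an arbitrary illustrative assumption and not a JUNO number:

```python
# Back-of-envelope illustration: the expected neutrino event count in
# a detector falls off as the inverse square of the distance to the
# progenitor. The reference count at 1 kpc is an arbitrary assumption
# for illustration, not a value from the JUNO analysis.

def expected_events(n0, d0_kpc, d_kpc):
    # Flux (and hence event count) scales as 1/d^2.
    return n0 * (d0_kpc / d_kpc) ** 2

n_at_1 = 100.0  # assumed events at 1 kpc (illustrative only)
at_1p6 = expected_events(n_at_1, 1.0, 1.6)
ratio = at_1p6 / expected_events(n_at_1, 1.0, 0.9)
print(round(at_1p6, 2), round(ratio, 3))
```

So moving the same source from 0.9 kpc to 1.6 kpc cuts the event count to roughly a third, which is why the quoted sensitivity distances shift between the normal and inverted ordering cases, where the detectable signal differs.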
ST-GIN: An Uncertainty Quantification Approach in Traffic Data Imputation with Spatio-temporal Graph Attention and Bidirectional Recurrent United Neural Networks
Traffic data serves as a fundamental component in both research and
applications within intelligent transportation systems. However, real-world
transportation data, collected from loop detectors or similar sources, often
contain missing values (MVs), which can adversely impact associated
applications and research. Instead of discarding this incomplete data,
researchers have sought to recover these missing values through numerical
statistics, tensor decomposition, and deep learning techniques. In this paper,
we propose an innovative deep-learning approach for imputing missing data. A
graph attention architecture is employed to capture the spatial correlations
present in traffic data, while a bidirectional neural network is utilized to
learn temporal information. Experimental results indicate that our proposed
method outperforms all other benchmark techniques, thus demonstrating its
effectiveness.
Comment: In submission to IEEE-ITSC 202
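The core idea of fusing spatial correlation across sensors with bidirectional temporal information can be sketched in a toy form. This version averages neighboring sensors spatially and the nearest observations before and after the gap temporally; it mirrors only the shape of the approach, since the paper uses learned graph attention and a bidirectional recurrent network, not these fixed rules.

```python
# Toy spatio-temporal imputation: combine a spatial estimate (mean of
# neighboring sensors at the same timestep) with a bidirectional
# temporal estimate (nearest observed values before and after the gap).
# A stand-in for the learned ST-GIN architecture, not the paper's model.

def impute(series_by_sensor, sensor, t, neighbors):
    series = series_by_sensor[sensor]
    if series[t] is not None:
        return series[t]  # nothing to impute
    # Spatial component: neighbor readings at the same timestep.
    spatial = [series_by_sensor[n][t] for n in neighbors
               if series_by_sensor[n][t] is not None]
    # Temporal component: nearest observations before and after t,
    # i.e. information from both directions in time.
    past = next((series[i] for i in range(t - 1, -1, -1)
                 if series[i] is not None), None)
    future = next((series[i] for i in range(t + 1, len(series))
                   if series[i] is not None), None)
    temporal = [v for v in (past, future) if v is not None]
    pool = spatial + temporal
    return sum(pool) / len(pool)

data = {
    "s1": [10.0, None, 14.0],   # missing value at t=1
    "s2": [11.0, 12.0, 13.0],
    "s3": [9.0, 12.0, 15.0],
}
filled = impute(data, "s1", 1, neighbors=["s2", "s3"])
print(filled)
```

The learned model replaces both fixed rules: graph attention weights the spatial neighbors by relevance instead of averaging them, and the bidirectional recurrent network summarizes the whole past and future instead of just the two nearest readings.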