53 research outputs found
Masked Pretraining for Multi-Agent Decision Making
Building a single generalist agent with zero-shot capability has recently
sparked significant advancements in decision-making. However, extending this
capability to multi-agent scenarios presents challenges. Most current works
struggle with zero-shot capabilities, due to two challenges particular to the
multi-agent settings: a mismatch between centralized pretraining and
decentralized execution, and varying agent numbers and action spaces, making it
difficult to create generalizable representations across diverse downstream
tasks. To overcome these challenges, we propose a Masked pretraining
framework for Multi-agent decision making (MaskMA). This
model, based on a transformer architecture, employs a mask-based collaborative
learning strategy suited for decentralized execution with partial observation.
Moreover, MaskMA integrates a generalizable action representation by dividing
the action space into actions toward self-information and actions related to
other entities. This flexibility allows MaskMA to tackle tasks with varying
agent numbers and thus different action spaces. Extensive experiments in SMAC
reveal MaskMA, with a single model pretrained on 11 training maps, can achieve
an impressive 77.8% zero-shot win rate on 60 unseen test maps by decentralized
execution, while also performing effectively on other types of downstream tasks
(e.g., varied policies collaboration and ad hoc team play). Comment: 17 page
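The split of the action space described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `action_logits`, the linear head `self_weights`, and the dot-product scoring of entity-directed actions are all assumptions introduced here. It only shows why such a split lets the number of actions follow the number of entities.

```python
def action_logits(agent_emb, entity_embs, self_weights):
    """Score a variable-size action set for one agent (hypothetical sketch).

    agent_emb:    embedding of the acting agent, a list of floats
    entity_embs:  one embedding per other entity (count may vary per map)
    self_weights: one weight row per self-directed action (fixed count)
    """
    # Self-directed actions: a fixed-size linear head on the agent embedding.
    self_logits = [
        sum(w * x for w, x in zip(row, agent_emb)) for row in self_weights
    ]
    # Entity-directed actions: one score per entity (assumed dot product),
    # so the action count grows or shrinks with the number of entities.
    entity_logits = [
        sum(a * e for a, e in zip(agent_emb, ent)) for ent in entity_embs
    ]
    return self_logits + entity_logits
```

Calling it with three entities yields two self-directed scores plus three entity-directed scores; with one entity the same weights yield three scores in total, with no change to the model parameters.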
ProKD: An Unsupervised Prototypical Knowledge Distillation Network for Zero-Resource Cross-Lingual Named Entity Recognition
For named entity recognition (NER) in zero-resource languages, utilizing
knowledge distillation methods to transfer language-independent knowledge from
the rich-resource source languages to zero-resource languages is an effective
means. Typically, these approaches adopt a teacher-student architecture, where
the teacher network is trained in the source language, and the student network
seeks to learn knowledge from the teacher network and is expected to perform
well in the target language. Despite the impressive performance achieved by
these methods, we argue that they have two limitations. Firstly, the teacher
network fails to effectively learn language-independent knowledge shared across
languages due to the differences in the feature distribution between the source
and target languages. Secondly, the student network acquires all of its
knowledge from the teacher network and ignores the learning of target
language-specific knowledge. Undesirably, these limitations would hinder the
model's performance in the target language. This paper proposes an unsupervised
prototype knowledge distillation network (ProKD) to address these issues.
Specifically, ProKD presents a contrastive learning-based prototype alignment
method to achieve class feature alignment by adjusting the distance among
prototypes in the source and target languages, boosting the teacher network's
capacity to acquire language-independent knowledge. In addition, ProKD
introduces a prototypical self-training method to learn the intrinsic structure
of the language by retraining the student network on the target data using
samples' distance information from prototypes, thereby enhancing the student
network's ability to acquire language-specific knowledge. Extensive experiments
on three benchmark cross-lingual NER datasets demonstrate the effectiveness of
our approach. Comment: AAAI 202
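As a rough illustration of the prototype idea in the abstract above, the sketch below computes one prototype per entity class and measures a sample's distance to each. The abstract does not say how prototypes are computed; taking the mean feature vector per class is an assumption here, as are the function names and the use of Euclidean distance.

```python
import math


def class_prototypes(features, labels):
    """Return one prototype per label: the mean of its feature vectors (assumed)."""
    sums, counts = {}, {}
    for vec, lab in zip(features, labels):
        acc = sums.setdefault(lab, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest_prototype(vec, prototypes):
    """Return (label, distance) of the closest prototype by Euclidean distance."""
    label, proto = min(prototypes.items(), key=lambda kv: math.dist(vec, kv[1]))
    return label, math.dist(vec, proto)
```

In a self-training loop of the kind the abstract describes, the returned distance could serve as a confidence signal for pseudo-labelled target-language samples; how ProKD actually weights samples is not stated in the abstract.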
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Offline-to-online Reinforcement Learning (O2O RL) aims to improve the
performance of offline pretrained policy using only a few online samples. Built
on offline RL algorithms, most O2O methods focus on the balance between RL
objective and pessimism, or the utilization of offline and online samples. In
this paper, from a novel perspective, we systematically study the challenges
that remain in O2O RL and identify that the reason behind the slow improvement
of the performance and the instability of online finetuning lies in the
inaccurate Q-value estimation inherited from offline pretraining. Specifically,
we demonstrate that the estimation bias and the inaccurate rank of Q-value
cause a misleading signal for the policy update, making the standard offline RL
algorithms, such as CQL and TD3-BC, ineffective in the online finetuning. Based
on this observation, we address the problem of Q-value estimation by two
techniques: (1) perturbed value update and (2) increased frequency of Q-value
updates. The first technique smooths out biased Q-value estimation with sharp
peaks, preventing early-stage policy exploitation of sub-optimal actions. The
second one alleviates the estimation bias inherited from offline pretraining by
accelerating learning. Extensive experiments on the MuJoCo and Adroit
environments demonstrate that the proposed method, named SO2, significantly
alleviates Q-value estimation issues, and consistently improves the performance
against the state-of-the-art methods by up to 83.1%. Comment: Accepted at AAAI 202
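One way to picture a perturbed value update is a TD target evaluated at a noise-perturbed next action, so that a sharp, possibly spurious peak in the Q-function is averaged with its neighbours. The sketch below is an assumption-laden illustration, not the SO2 code: the clipped Gaussian noise, the action range of [-1, 1], and every parameter name and default are hypothetical choices in the style of target policy smoothing.

```python
import random


def perturbed_td_target(reward, next_state, done, policy, q_target,
                        gamma=0.99, sigma=0.2, clip=0.5, rng=random):
    """TD target using a noise-perturbed next action (hypothetical sketch).

    policy(next_state) returns a list of action components, assumed in [-1, 1];
    q_target(next_state, action) returns a scalar Q-value estimate.
    """
    action = policy(next_state)
    noisy = []
    for a in action:
        # Clipped Gaussian perturbation (assumed form of the noise).
        eps = max(-clip, min(clip, rng.gauss(0.0, sigma)))
        # Keep the perturbed action inside the assumed valid range.
        noisy.append(max(-1.0, min(1.0, a + eps)))
    # Standard one-step bootstrap; no bootstrap on terminal transitions.
    return reward + gamma * (1.0 - done) * q_target(next_state, noisy)
```

The second technique in the abstract, more frequent Q-value updates, would correspond to calling the critic update several times per environment step; the exact ratio is not given in the abstract.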
Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning
Model-based reinforcement learning (RL) has demonstrated remarkable successes
on a range of continuous control tasks due to its high sample efficiency. To
save the computation cost of conducting planning online, recent practices tend
to distill optimized action sequences into an RL policy during the training
phase. Although the distillation can incorporate both the foresight of planning
and the exploration ability of RL policies, the theoretical understanding of
these methods remains unclear. In this paper, we extend the policy improvement
step of Soft Actor-Critic (SAC) by developing an approach to distill from
model-based planning to the policy. We then demonstrate that such an approach
of policy improvement has a theoretical guarantee of monotonic improvement and
convergence to the maximum value defined in SAC. We discuss effective design
choices and implement our theory as a practical algorithm -- Model-based
Planning Distilled to Policy (MPDP) -- that updates the policy jointly over
multiple future time steps. Extensive experiments show that MPDP achieves
better sample efficiency and asymptotic performance than both model-free and
model-based planning algorithms on six continuous control benchmark tasks in
MuJoCo.
Homozygous p.Ser267Phe in SLC10A1 is associated with a new type of hypercholanemia and implications for personalized medicine.
SLC10A1 codes for the sodium-taurocholate cotransporting polypeptide (NTCP), which is a hepatocellular transporter for bile acids (BAs) and the receptor for hepatitis B and D viruses. NTCP is also a target of multiple drugs. We aimed to evaluate the medical consequences of the loss-of-function mutation p.Ser267Phe in SLC10A1. We identified eight individuals with homozygous p.Ser267Phe mutation in SLC10A1 and followed up for 8-90 months. We compared their total serum BAs and 6 species of BAs with 170 wild-type and 107 heterozygous healthy individuals. We performed in-depth medical examinations and exome sequencing in the homozygous individuals. All homozygous individuals had persistent hypercholanemia (P = 5.8 × 10⁻²⁹). Exome sequencing excluded the involvement of other BA metabolism-associated genes in the hypercholanemia. Although asymptomatic, all individuals had low vitamin D levels. Of six adults that were subjected to bone mineral density analysis, three presented with osteoporosis/osteopenia. Sex hormones and blood lipids were deviated in all subjects. Homozygosity of p.Ser267Phe in SLC10A1 is associated with asymptomatic hypercholanemia. Individuals with homozygous p.Ser267Phe in SLC10A1 are prone to vitamin D deficiency, deviated sex hormones and blood lipids. Surveillance of these parameters may also be needed in patients treated with drugs targeting NTCP.
This project is supported by the National Natural Science Foundation of China (31471193, 81570539, 81370535, 91331204 and 31525014). S.X. acknowledges financial support from the Strategic Priority Research Program (XDB13040100) and Key Research Program of Frontier Sciences (QYZDJ-SSW-SYS009) of the Chinese Academy of Sciences (CAS).
Association between methionine sulfoxide and risk of moyamoya disease
Objective: Methionine sulfoxide (MetO) has been identified as a risk factor for vascular diseases and is considered an important indicator of oxidative stress. However, the effects of MetO and its association with moyamoya disease (MMD) remain unclear. Therefore, we performed this study to evaluate the association between serum MetO levels and the risk of MMD and its subtypes.
Methods: We included 353 consecutive MMD patients and 88 healthy controls (HCs) with complete data from September 2020 to December 2021 in our analyses. Serum levels of MetO were quantified using liquid chromatography-mass spectrometry (LC–MS) analysis. We evaluated the role of MetO in MMD using logistic regression models and confirmed the findings with receiver-operating characteristic (ROC) curves and area under curve (AUC) values.
Results: We found that the levels of MetO were significantly higher in MMD and its subtypes than in HCs (p < 0.001 for all). After adjusting for traditional risk factors, serum MetO levels were significantly associated with the risk of MMD and its subtypes (p < 0.001 for all). We further divided the MetO levels into low and high groups, and the high MetO level was significantly associated with the risk of MMD and its subtypes (p < 0.05 for all). When MetO levels were assessed as quartiles, we found that the third (Q3) and fourth (Q4) MetO quartiles had a significantly increased risk of MMD compared with the lowest quartile (Q3, OR: 2.323, 95%CI: 1.088–4.959, p = 0.029; Q4, OR: 5.559, 95%CI: 2.088–14.805, p = 0.001).
Conclusion: In this study, we found that a high level of serum MetO was associated with an increased risk of MMD and its subtypes. Our study raised a novel perspective on the pathogenesis of MMD and suggested potential therapeutic targets.
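For readers unfamiliar with how an odds ratio and its 95% confidence interval are obtained, the sketch below computes the unadjusted version from a 2 × 2 table using the standard Wald interval on the log scale. It does not reproduce the study's figures: the odds ratios above come from logistic regression adjusted for risk factors, and the per-quartile counts are not given in the abstract, so the function name and any counts passed to it are hypothetical.

```python
import math


def odds_ratio_wald(exposed_cases, exposed_controls,
                    ref_cases, ref_controls, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    "Exposed" is the group of interest (e.g. a higher quartile) and "ref"
    is the reference group (e.g. the lowest quartile). All counts must be > 0.
    """
    odds_ratio = (exposed_cases * ref_controls) / (exposed_controls * ref_cases)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / exposed_cases + 1 / exposed_controls
                   + 1 / ref_cases + 1 / ref_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper
```

An interval that excludes 1, as both reported quartile intervals do, indicates an association that is statistically significant at the 5% level.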
Actively implementing an evidence-based feeding guideline for critically ill patients (NEED): a multicenter, cluster-randomized, controlled trial
Background: Previous cluster-randomized controlled trials evaluating the impact of implementing evidence-based guidelines for nutrition therapy in critical illness do not consistently demonstrate patient benefits. A large-scale, sufficiently powered study is therefore warranted to ascertain the effects of guideline implementation on patient-centered outcomes.
Methods: We conducted a multicenter, cluster-randomized, parallel-controlled trial in intensive care units (ICUs) across China. We developed an evidence-based feeding guideline. ICUs randomly allocated to the guideline group formed a local "intervention team", which actively implemented the guideline using standardized educational materials, a graphical feeding protocol, and live online education outreach meetings conducted by members of the study management committee. ICUs assigned to the control group remained unaware of the guideline content. All ICUs enrolled patients who were expected to stay in the ICU longer than seven days. The primary outcome was all-cause mortality within 28 days of enrollment.
Results: Forty-eight ICUs were randomized to the guideline group and 49 to the control group. From March 2018 to July 2019, the guideline ICUs enrolled 1399 patients, and the control ICUs enrolled 1373 patients. Implementation of the guideline resulted in significantly earlier EN initiation (1.20 vs. 1.55 mean days to initiation of EN; difference −0.40 [95% CI −0.71 to −0.09]; P = 0.01) and delayed PN initiation (1.29 vs. 0.80 mean days to start of PN; difference 1.06 [95% CI 0.44 to 1.67]; P = 0.001). There was no significant difference in 28-day mortality (14.2% vs. 15.2%; difference −1.6% [95% CI −4.3% to 1.2%]; P = 0.42) between groups.
Conclusions: In this large-scale, multicenter trial, active implementation of an evidence-based feeding guideline reduced the time to commencement of EN and overall PN use but did not translate to a reduction in mortality from critical illness. Trial registration: ISRCTN, ISRCTN12233792. Registered November 20th, 2017
Actively implementing an evidence-based feeding guideline for critically ill patients (NEED): a multicenter, cluster-randomized, controlled trial (vol 26, 46, 2022)
Background: Previous cluster-randomized controlled trials evaluating the impact of implementing evidence-based guidelines for nutrition therapy in critical illness do not consistently demonstrate patient benefits. A large-scale, sufficiently powered study is therefore warranted to ascertain the effects of guideline implementation on patient-centered outcomes.
Methods: We conducted a multicenter, cluster-randomized, parallel-controlled trial in intensive care units (ICUs) across China. We developed an evidence-based feeding guideline. ICUs randomly allocated to the guideline group formed a local "intervention team", which actively implemented the guideline using standardized educational materials, a graphical feeding protocol, and live online education outreach meetings conducted by members of the study management committee. ICUs assigned to the control group remained unaware of the guideline content. All ICUs enrolled patients who were expected to stay in the ICU longer than seven days. The primary outcome was all-cause mortality within 28 days of enrollment.
Results: Forty-eight ICUs were randomized to the guideline group and 49 to the control group. From March 2018 to July 2019, the guideline ICUs enrolled 1399 patients, and the control ICUs enrolled 1373 patients. Implementation of the guideline resulted in significantly earlier EN initiation (1.20 vs. 1.55 mean days to initiation of EN; difference −0.40 [95% CI −0.71 to −0.09]; P = 0.01) and delayed PN initiation (1.29 vs. 0.80 mean days to start of PN; difference 1.06 [95% CI 0.44 to 1.67]; P = 0.001). There was no significant difference in 28-day mortality (14.2% vs. 15.2%; difference −1.6% [95% CI −4.3% to 1.2%]; P = 0.42) between groups.
Conclusions: In this large-scale, multicenter trial, active implementation of an evidence-based feeding guideline reduced the time to commencement of EN and overall PN use but did not translate to a reduction in mortality from critical illness. Trial registration: ISRCTN, ISRCTN12233792. Registered November 20th, 2017
Microstructure, texture evolution and mechanical properties of a large-scale multidirectionally forged Mg-Gd-Y-Zn-Zr-Ag alloy
A large-scale Mg-6.2Gd-3.7Y-0.9Zn-0.3Zr-0.3Ag (wt.%) magnesium component with dimensions of 480 × 250 × 160 mm³ was fabricated via direct-chill (DC) casting, homogenization and multidirectional forging (MDF). The evolution of the microstructure, texture and uniaxial tensile properties during the MDF process was comprehensively investigated. Anisotropy of 2.5%, 5.6% and 10.9% was obtained in the MDF alloy subjected to 9, 18 and 27 passes, respectively. The MDF alloy subjected to 27 passes, tested in tension along FD, exhibits superior comprehensive mechanical properties, with a tensile yield strength (TYS) of 292 MPa, ultimate tensile strength (UTS) of 384 MPa and elongation of 9.0%. Interdendritic Mg₅(Gd, Y, Zn) phases dissolved after homogenization, with the precipitation of intragranular lamellar 14H long period stacking ordered (LPSO) phases from the α-Mg matrix. During the MDF process, intragranular lamellar LPSO phases were initially kinked, suppressing dynamic recrystallization (DRX) behavior in the first 9 passes, and subsequently partially dissolved. Dynamic precipitation of Mg₅(Gd, Y) and ultrafine LPSO phases was also induced by MDF. With the cumulative deformation of MDF, the increased volume fraction of Mg₅(Gd, Y) phases, enhanced texture and refined α-Mg grains are likely responsible for the improved mechanical properties. We also found that prismatic slip is activated in the deformed grains with the c-axis around FD and multiple slip is activated in the other deformed grains during the MDF process. This paper provides a superior MDF processing route to manufacture high-performance large-scale Mg-Gd-Y-Zn-Zr-Ag components for industrial production.