53 research outputs found

    Masked Pretraining for Multi-Agent Decision Making

    Building a single generalist agent with zero-shot capability has recently sparked significant advancements in decision-making. However, extending this capability to multi-agent scenarios presents challenges. Most current works struggle with zero-shot capabilities, due to two challenges particular to multi-agent settings: a mismatch between centralized pretraining and decentralized execution, and varying agent numbers and action spaces, which make it difficult to create generalizable representations across diverse downstream tasks. To overcome these challenges, we propose a Masked pretraining framework for Multi-agent decision making (MaskMA). This model, based on the transformer architecture, employs a mask-based collaborative learning strategy suited for decentralized execution with partial observation. Moreover, MaskMA integrates a generalizable action representation by dividing the action space into actions toward self-information and actions related to other entities. This flexibility allows MaskMA to tackle tasks with varying agent numbers and thus different action spaces. Extensive experiments in SMAC reveal that MaskMA, with a single model pretrained on 11 training maps, can achieve an impressive 77.8% zero-shot win rate on 60 unseen test maps by decentralized execution, while also performing effectively on other types of downstream tasks (e.g., varied policies collaboration and ad hoc team play). Comment: 17 page

    ProKD: An Unsupervised Prototypical Knowledge Distillation Network for Zero-Resource Cross-Lingual Named Entity Recognition

    For named entity recognition (NER) in zero-resource languages, knowledge distillation is an effective means of transferring language-independent knowledge from rich-resource source languages to zero-resource languages. Typically, these approaches adopt a teacher-student architecture, where the teacher network is trained in the source language, and the student network seeks to learn knowledge from the teacher network and is expected to perform well in the target language. Despite the impressive performance achieved by these methods, we argue that they have two limitations. Firstly, the teacher network fails to effectively learn language-independent knowledge shared across languages due to the differences in the feature distribution between the source and target languages. Secondly, the student network acquires all of its knowledge from the teacher network and ignores the learning of target language-specific knowledge. These limitations hinder the model's performance in the target language. This paper proposes an unsupervised prototype knowledge distillation network (ProKD) to address these issues. Specifically, ProKD presents a contrastive learning-based prototype alignment method to achieve class feature alignment by adjusting the distance among prototypes in the source and target languages, boosting the teacher network's capacity to acquire language-independent knowledge. In addition, ProKD introduces a prototypical self-training method to learn the intrinsic structure of the language by retraining the student network on the target data using samples' distance information from prototypes, thereby enhancing the student network's ability to acquire language-specific knowledge. Extensive experiments on three benchmark cross-lingual NER datasets demonstrate the effectiveness of our approach. Comment: AAAI 202

    A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning

    Offline-to-online Reinforcement Learning (O2O RL) aims to improve the performance of an offline pretrained policy using only a few online samples. Built on offline RL algorithms, most O2O methods focus on the balance between the RL objective and pessimism, or on the utilization of offline and online samples. In this paper, from a novel perspective, we systematically study the challenges that remain in O2O RL and identify that the reason behind the slow performance improvement and the instability of online finetuning lies in the inaccurate Q-value estimation inherited from offline pretraining. Specifically, we demonstrate that the estimation bias and the inaccurate rank of Q-values cause a misleading signal for the policy update, making standard offline RL algorithms, such as CQL and TD3-BC, ineffective in online finetuning. Based on this observation, we address the problem of Q-value estimation with two techniques: (1) perturbed value update and (2) increased frequency of Q-value updates. The first technique smooths out biased Q-value estimation with sharp peaks, preventing early-stage policy exploitation of sub-optimal actions. The second one alleviates the estimation bias inherited from offline pretraining by accelerating learning. Extensive experiments on the MuJoCo and Adroit environments demonstrate that the proposed method, named SO2, significantly alleviates Q-value estimation issues, and consistently improves the performance against the state-of-the-art methods by up to 83.1%. Comment: Accepted at AAAI 202

    Theoretically Guaranteed Policy Improvement Distilled from Model-Based Planning

    Model-based reinforcement learning (RL) has demonstrated remarkable successes on a range of continuous control tasks due to its high sample efficiency. To save the computation cost of conducting planning online, recent practices tend to distill optimized action sequences into an RL policy during the training phase. Although the distillation can incorporate both the foresight of planning and the exploration ability of RL policies, the theoretical understanding of these methods remains unclear. In this paper, we extend the policy improvement step of Soft Actor-Critic (SAC) by developing an approach to distill from model-based planning to the policy. We then demonstrate that such an approach of policy improvement has a theoretical guarantee of monotonic improvement and convergence to the maximum value defined in SAC. We discuss effective design choices and implement our theory as a practical algorithm, Model-based Planning Distilled to Policy (MPDP), that updates the policy jointly over multiple future time steps. Extensive experiments show that MPDP achieves better sample efficiency and asymptotic performance than both model-free and model-based planning algorithms on six continuous control benchmark tasks in MuJoCo.

    Homozygous p.Ser267Phe in SLC10A1 is associated with a new type of hypercholanemia and implications for personalized medicine.

    SLC10A1 codes for the sodium-taurocholate cotransporting polypeptide (NTCP), which is a hepatocellular transporter for bile acids (BAs) and the receptor for hepatitis B and D viruses. NTCP is also a target of multiple drugs. We aimed to evaluate the medical consequences of the loss-of-function mutation p.Ser267Phe in SLC10A1. We identified eight individuals with the homozygous p.Ser267Phe mutation in SLC10A1 and followed them up for 8-90 months. We compared their total serum BAs and 6 species of BAs with 170 wild-type and 107 heterozygous healthy individuals. We performed in-depth medical examinations and exome sequencing in the homozygous individuals. All homozygous individuals had persistent hypercholanemia (P = 5.8 × 10⁻²⁹). Exome sequencing excluded the involvement of other BA metabolism-associated genes in the hypercholanemia. Although asymptomatic, all individuals had low vitamin D levels. Of six adults that were subjected to bone mineral density analysis, three presented with osteoporosis/osteopenia. Sex hormones and blood lipids were deviated in all subjects. Homozygosity of p.Ser267Phe in SLC10A1 is associated with asymptomatic hypercholanemia. Individuals with homozygous p.Ser267Phe in SLC10A1 are prone to vitamin D deficiency, deviated sex hormones and blood lipids. Surveillance of these parameters may also be needed in patients treated with drugs targeting NTCP. This project is supported by the National Natural Science Foundation of China (31471193, 81570539, 81370535, 91331204 and 31525014). S.X. acknowledges financial support from the Strategic Priority Research Program (XDB13040100) and Key Research Program of Frontier Sciences (QYZDJ-SSW-SYS009) of the Chinese Academy of Sciences (CAS).

    Association between methionine sulfoxide and risk of moyamoya disease

    Objective: Methionine sulfoxide (MetO) has been identified as a risk factor for vascular diseases and is considered an important indicator of oxidative stress. However, the effects of MetO and its association with moyamoya disease (MMD) remain unclear. Therefore, we performed this study to evaluate the association between serum MetO levels and the risk of MMD and its subtypes. Methods: We included 353 consecutive MMD patients and 88 healthy controls (HCs) with complete data from September 2020 to December 2021 in our analyses. Serum levels of MetO were quantified using liquid chromatography-mass spectrometry (LC–MS) analysis. We evaluated the role of MetO in MMD using logistic regression models and confirmed the results with receiver-operating characteristic (ROC) curves and area under curve (AUC) values. Results: We found that the levels of MetO were significantly higher in MMD and its subtypes than in HCs (p < 0.001 for all). After adjusting for traditional risk factors, serum MetO levels were significantly associated with the risk of MMD and its subtypes (p < 0.001 for all). We further divided the MetO levels into low and high groups, and the high MetO level was significantly associated with the risk of MMD and its subtypes (p < 0.05 for all). When MetO levels were assessed as quartiles, we found that the third (Q3) and fourth (Q4) MetO quartiles had a significantly increased risk of MMD compared with the lowest quartile (Q3, OR: 2.323, 95% CI: 1.088–4.959, p = 0.029; Q4, OR: 5.559, 95% CI: 2.088–14.805, p = 0.001). Conclusion: In this study, we found that a high level of serum MetO was associated with an increased risk of MMD and its subtypes. Our study raises a novel perspective on the pathogenesis of MMD and suggests potential therapeutic targets.

    Actively implementing an evidence-based feeding guideline for critically ill patients (NEED): a multicenter, cluster-randomized, controlled trial

    Background: Previous cluster-randomized controlled trials evaluating the impact of implementing evidence-based guidelines for nutrition therapy in critical illness do not consistently demonstrate patient benefits. A large-scale, sufficiently powered study is therefore warranted to ascertain the effects of guideline implementation on patient-centered outcomes. Methods: We conducted a multicenter, cluster-randomized, parallel-controlled trial in intensive care units (ICUs) across China. We developed an evidence-based feeding guideline. ICUs randomly allocated to the guideline group formed a local "intervention team", which actively implemented the guideline using standardized educational materials, a graphical feeding protocol, and live online education outreach meetings conducted by members of the study management committee. ICUs assigned to the control group remained unaware of the guideline content. All ICUs enrolled patients who were expected to stay in the ICU longer than seven days. The primary outcome was all-cause mortality within 28 days of enrollment. Results: Forty-eight ICUs were randomized to the guideline group and 49 to the control group. From March 2018 to July 2019, the guideline ICUs enrolled 1399 patients, and the control ICUs enrolled 1373 patients. Implementation of the guideline resulted in significantly earlier EN initiation (1.20 vs. 1.55 mean days to initiation of EN; difference −0.40 [95% CI −0.71 to −0.09]; P = 0.01) and delayed PN initiation (1.29 vs. 0.80 mean days to start of PN; difference 1.06 [95% CI 0.44 to 1.67]; P = 0.001). There was no significant difference in 28-day mortality (14.2% vs. 15.2%; difference −1.6% [95% CI −4.3% to 1.2%]; P = 0.42) between groups. Conclusions: In this large-scale, multicenter trial, active implementation of an evidence-based feeding guideline reduced the time to commencement of EN and overall PN use but did not translate to a reduction in mortality from critical illness. Trial registration: ISRCTN, ISRCTN12233792. Registered November 20th, 2017.

    Actively implementing an evidence-based feeding guideline for critically ill patients (NEED): a multicenter, cluster-randomized, controlled trial (vol 26, 46, 2022)

    Background: Previous cluster-randomized controlled trials evaluating the impact of implementing evidence-based guidelines for nutrition therapy in critical illness do not consistently demonstrate patient benefits. A large-scale, sufficiently powered study is therefore warranted to ascertain the effects of guideline implementation on patient-centered outcomes. Methods: We conducted a multicenter, cluster-randomized, parallel-controlled trial in intensive care units (ICUs) across China. We developed an evidence-based feeding guideline. ICUs randomly allocated to the guideline group formed a local "intervention team", which actively implemented the guideline using standardized educational materials, a graphical feeding protocol, and live online education outreach meetings conducted by members of the study management committee. ICUs assigned to the control group remained unaware of the guideline content. All ICUs enrolled patients who were expected to stay in the ICU longer than seven days. The primary outcome was all-cause mortality within 28 days of enrollment. Results: Forty-eight ICUs were randomized to the guideline group and 49 to the control group. From March 2018 to July 2019, the guideline ICUs enrolled 1399 patients, and the control ICUs enrolled 1373 patients. Implementation of the guideline resulted in significantly earlier EN initiation (1.20 vs. 1.55 mean days to initiation of EN; difference −0.40 [95% CI −0.71 to −0.09]; P = 0.01) and delayed PN initiation (1.29 vs. 0.80 mean days to start of PN; difference 1.06 [95% CI 0.44 to 1.67]; P = 0.001). There was no significant difference in 28-day mortality (14.2% vs. 15.2%; difference −1.6% [95% CI −4.3% to 1.2%]; P = 0.42) between groups. Conclusions: In this large-scale, multicenter trial, active implementation of an evidence-based feeding guideline reduced the time to commencement of EN and overall PN use but did not translate to a reduction in mortality from critical illness. Trial registration: ISRCTN, ISRCTN12233792. Registered November 20th, 2017.

    Microstructure, texture evolution and mechanical properties of a large-scale multidirectionally forged Mg-Gd-Y-Zn-Zr-Ag alloy

    A large-scale Mg-6.2Gd-3.7Y-0.9Zn-0.3Zr-0.3Ag (wt.%) magnesium component with dimensions of 480 × 250 × 160 mm³ was fabricated via direct-chill (DC) casting, homogenization and multidirectional forging (MDF). The evolution of the microstructure, texture and uniaxial tensile properties during the MDF process was comprehensively investigated. Anisotropy of 2.5%, 5.6% and 10.9% was obtained in the MDF alloy subjected to 9, 18 and 27 passes, respectively. The MDF alloy subjected to 27 passes and tested in tension along FD exhibits superior comprehensive mechanical properties, with a tensile yield strength (TYS) of 292 MPa, ultimate tensile strength (UTS) of 384 MPa and elongation of 9.0%. Interdendritic Mg5(Gd, Y, Zn) phases dissolved after homogenization, with the precipitation of intragranular lamellar 14H long period stacking ordered (LPSO) phases from the α-Mg matrix. During the MDF process, intragranular lamellar LPSO phases were initially kinked, suppressing dynamic recrystallization (DRX) behavior in the first 9 passes, and subsequently partially dissolved. Dynamic precipitation of Mg5(Gd, Y) and ultrafine LPSO phases was also induced by MDF. With the cumulative deformation of MDF, the increased volume fraction of Mg5(Gd, Y) phases, enhanced texture and refined α-Mg grains are likely responsible for the improved mechanical properties. We also found that prismatic slip is activated in the deformed grains with the c-axis around FD and multiple slip is activated in the other deformed grains during the MDF process. This paper provides a superior MDF processing route to manufacture high-performance large-scale Mg-Gd-Y-Zn-Zr-Ag components for industrial production.