73 research outputs found

    Balancing Exploration and Exploitation in Hierarchical Reinforcement Learning via Latent Landmark Graphs

    Full text link
    Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning. It decomposes the source task into subgoal conditional subtasks and conducts exploration and exploitation in the subgoal space. The effectiveness of GCHRL heavily relies on subgoal representation functions and subgoal selection strategy. However, existing works often overlook the temporal coherence in GCHRL when learning latent subgoal representations and lack an efficient subgoal selection strategy that balances exploration and exploitation. This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL) to overcome these limitations. HILL learns latent subgoal representations that satisfy temporal coherence using a contrastive representation learning objective. Based on these representations, HILL dynamically builds latent landmark graphs and employs a novelty measure on nodes and a utility measure on edges. Finally, HILL develops a subgoal selection strategy that balances exploration and exploitation by jointly considering both measures. Experimental results demonstrate that HILL outperforms state-of-the-art baselines on continuous control tasks with sparse rewards in sample efficiency and asymptotic performance. Our code is available at https://github.com/papercode2022/HILL. Comment: Accepted by the conference of International Joint Conference on Neural Networks (IJCNN) 202

    Learning Multi-Agent Action Coordination via Electing First-Move Agent

    No full text
    Learning to coordinate actions among agents is essential in complicated multi-agent systems. Prior works are constrained mainly by the assumption that all agents act simultaneously, and asynchronous action coordination between agents is rarely considered. This paper introduces a bi-level multi-agent decision hierarchy for coordinated behavior planning. We propose a novel election mechanism in which we adopt a graph convolutional network to model the interaction among agents and elect a first-move agent for asynchronous guidance. We also propose a dynamically weighted mixing network to effectively reduce the misestimation of the value function during training. This work is the first to explicitly model asynchronous multi-agent action coordination, and this explicitness makes it possible to choose the optimal first-move agent. The results on Cooperative Navigation and Google Football demonstrate that the proposed algorithm can achieve superior performance in cooperative environments. Our code is available at https://github.com/Amanda-1997/EFA-DWM

    Mixture of personality improved Spiking actor network for efficient multi-agent cooperation

    Full text link
    Adaptive human-agent and agent-agent cooperation are becoming more and more critical in the research area of multi-agent reinforcement learning (MARL), where remarkable progress has been made with the help of deep neural networks. However, many established algorithms perform well only during the learning paradigm and exhibit poor generalization during cooperation with other unseen partners. The personality theory in cognitive psychology describes that humans can handle the above cooperation challenge well by first predicting others' personalities and then their complex actions. Inspired by this two-step psychology theory, we propose a biologically plausible mixture of personality (MoP) improved spiking actor network (SAN), whereby a determinantal point process is used to simulate the complex formation and integration of different types of personality in MoP, and dynamic and spiking neurons are incorporated into the SAN for efficient reinforcement learning. The benchmark Overcooked task, containing a strong requirement for cooperative cooking, is selected to test the proposed MoP-SAN. The experimental results show that the MoP-SAN can achieve high performance not only during the learning paradigm but also during the generalization test (i.e., cooperation with other unseen agents) paradigm, where most counterpart deep actor networks failed. Necessary ablation experiments and visualization analyses were conducted to explain why MoP and SAN are effective in multi-agent reinforcement learning scenarios while DNN performs poorly in the generalization test. Comment: 20 pages, 7 figure

    Effects of Si-Miao-Yong-An decoction on myocardial I/R rats by regulating gut microbiota to inhibit LPS-induced TLR4/NF-κB signaling pathway

    No full text
    Abstract Background Coronary Artery Disease (CAD) is primarily caused by inflammation, which is closely linked to the gut microbiota. Si-Miao-Yong-An (SMYA) decoction is a traditional Chinese herbal formula with anti-inflammatory properties that has been found to be effective against CAD. However, it is still unclear whether SMYA can modulate gut microbiota and whether it contributes to the improvement of CAD by reducing inflammation and regulating the gut microbiota. Methods The identification of components in the SMYA extract was conducted using the HPLC method. A total of four groups of SD rats were orally administered with SMYA for 28 days. The levels of inflammatory biomarkers and myocardial damage biomarkers were measured through ELISA, while echocardiography was used to assess heart function. Histological alterations in the myocardial and colonic tissues were examined following H&E staining. Western blotting was performed to evaluate protein expression, whereas alterations in gut microbiota were determined by 16S rDNA sequencing. Results SMYA was found to enhance cardiac function and decrease the expression of serum CK-MB and LDH. SMYA was also observed to inhibit the TLR4/NF-κB signaling pathway by downregulating the protein expression of myocardial TLR4, MyD88, and p-P65, leading to a reduction in serum pro-inflammatory factors. SMYA modified the composition of gut microbiota by decreasing the Firmicutes/Bacteroidetes ratio, modulating Prevotellaceae_Ga6A1 and Prevotellaceae_NK3B3 linked to the LPS/TLR4/NF-κB pathway, and increasing beneficial microbiota such as Bacteroidetes, Alloprevotella, and other bacterial species. Moreover, SMYA was found to safeguard the intestinal mucosal and villi structures, elevate the expression of tight junction proteins (ZO-1, occludin), and reduce intestinal permeability and inflammation.
Conclusions The results indicate that SMYA has the potential to modulate the gut microbiota and protect the intestinal barrier, thereby reducing the translocation of LPS into circulation. SMYA was also found to inhibit the LPS-induced TLR4/NF-κB signaling pathway, leading to a decrease in the release of inflammatory factors, which ultimately mitigated myocardial injury. Hence, SMYA holds promise as a therapeutic agent for the management of CAD

    Effect of the Cooling Rate of Thermal Simulation on the Microstructure and Mechanical Properties of Low-Carbon Bainite Steel by Laser-Arc Hybrid Welding

    No full text
    A new kind of low-carbon bainite steel with excellent strength and toughness was developed, serving as the bogie of the next-generation high-speed train. However, the softening of the heat-affected zone (HAZ) in laser-arc hybrid welding (LAHW) needs to be overcome. In this study, the effect of the cooling rate of the LAHW process on the microstructure and mechanical properties in the HAZ was explored via thermal simulation. The results showed that with increased cooling rate, the grain size increased, the content of lath martensite decreased, and the lath bainite gradually changed to a granular shape in the thermal simulation specimen. With the decrease in the cooling rate, i.e., with the increase of t8/5, the strength–toughness matching of the material showed a downward trend. The thermal simulation specimen with a t8/5 of 6~8 s had higher strength and good toughness, which can be considered a potential welding parameter reference. The content of martensite-austenite (M-A) constituents was the main factor that determined the strength and toughness of the joint. During the tensile test, the axial force caused the material to tighten, and the transverse stress was obvious in the part of the M-A constituents that are prone to microcracks and many defects, resulting in cracks, paths, and multi-component layers in the center. As a result, the thermal cycle specimens had mixed fracture characteristics

    Simulation and Experimental Study on Jet Velocity of Zr-Based Amorphous Alloy Liner

    No full text
    Zr-based amorphous alloy is a new energetic material that has been closely monitored and extensively studied for the design of highly effective shaped charge warheads in recent years. In order to accurately determine the motion parameters of shaped charge jets during the detonation-driven formation process of Zr-based amorphous alloy liners, we prepared conical ZrCuNiAlAg liners by vacuum die casting and supercooled liquid high-rheological-rate formation processes. Based on jet-formation numerical simulation, pulsed X-ray imaging and copper foil target velocity measuring tests were conducted to identify the variation trend of the jet velocity of Zr-based amorphous alloy liners with time. The jet velocities at typical moments in the free flight stage were verified. The research results showed that Zr-based amorphous alloy liners could produce solid jets whose velocity decreased in a gradient from the head to the tail, and that the jet’s head velocity peaked at 12 μs and then slowly decreased with time. The average velocities measured by the X-ray imaging and copper foil target tests were 6913 m/s and 7177 m/s, respectively, and both were in good agreement with the simulation results, verifying the accuracy of the numerical simulation model for jet formation. The formation processes of shaped charge liners were found to affect the mechanical properties of the material and thus the jet’s formation process and motion parameters. The Zr-based amorphous alloy liner formed by the supercooled liquid-phase high-rheological-rate formation process exhibited a jet velocity 6.5% higher than that formed by the vacuum die casting process

    The effect of cognitive behavioral therapy on chemotherapy-induced side effects and immune function in colorectal cancer patients undergoing chemotherapy : study protocol for a randomized controlled trial

    No full text
    Background: Colorectal cancer (CRC) was one of the most widely diagnosed cancers in the United States in 2021. CRC patients may experience significant psychological stress and are susceptible to depression and anxiety. Previous studies have shown that cognitive behavioral therapy (CBT) can reduce fatigue and improve quality of life among breast cancer patients. However, as a non-pharmaceutical treatment, it remains unclear whether CBT improves chemotherapy-induced side effects and immune function in CRC patients. In this study, we will conduct a randomized controlled trial (RCT) among CRC patients undergoing chemotherapy to determine whether CBT can reduce the side effects of chemotherapy and improve the immune function of CRC patients. Methods: The study will be a single-center RCT. CRC patients undergoing chemotherapy will receive either eight sessions of group-based CBT (every 2-3 weeks) or usual care (usual oncology care). Each participant will undergo assessments at baseline (T0), immediately post-intervention (T1), 3 months post-intervention (T2), and 6 months post-intervention (T3). The primary outcome will include chemotherapy-induced side effects in CRC patients. The secondary outcome will be immune function (measured by levels of inflammatory cytokines). Other outcomes will include the levels of tumor markers, assessments of psychological status (perception of stress, depression and anxiety, self-efficacy, sleep quality, quality of life, social support condition, and cognitive function), and necessary laboratory examinations (biochemical index and blood cell counts) among CRC patients undergoing chemotherapy. Discussion: Our study will provide clinical evidence regarding whether CBT should be generalized in clinical treatment and the extent to which CBT reduces chemotherapy-induced side effects for CRC patients

    Efficacy of sequential three-step empirical therapy for chronic cough

    No full text
    Background: Empirical three-step therapy has been validated in just one hospital. This study aimed to demonstrate the applicability of sequential empirical three-step therapy for chronic cough in different clinical settings. Methods: Sequential empirical three-step therapy was given to patients with chronic cough in one tertiary and three secondary care respiratory clinics. Recruited patients were initially treated with methoxyphenamine compound as the first-step therapy, followed by corticosteroids as the second-step therapy and the combination of a proton-pump inhibitor and a prokinetic agent as the third-step therapy. The efficacy of the therapy was verified according to the changes in cough symptom score between pre- and post-treatment, and compared among the different clinics. Results: In total, 155 patients in one tertiary clinic and 193 patients in secondary care clinics were recruited. The total dropout ratio was significantly higher in the secondary care clinics than in the tertiary clinic (9.3% versus 3.2%, p = 0.023). The therapeutic success rate for cough was 38.7% at first-step therapy, 32.3% at second-step therapy and 20.0% at third-step therapy in the tertiary clinic, compared with the corresponding 49.7%, 31.1% and 4.1% in secondary care clinics. Furthermore, the overall cough resolution rate was not significantly different (91.0% versus 85.0%, p = 0.091). However, the efficacy of the third-step therapy was much higher in the tertiary clinic than in the secondary care clinics (20.0% versus 4.1%, p = 0.001). Conclusions: Sequential empirical three-step therapy is universally efficacious and useful for management of chronic cough in different clinical settings

    The Effect of Esmolol on Tissue Perfusion and Clinical Prognosis of Patients with Severe Sepsis: A Prospective Cohort Study

    No full text
    Purpose. This study aimed to investigate the effect of esmolol on tissue perfusion and the clinical prognosis of patients with severe sepsis. Materials and Methods. One hundred fifty-one patients with severe sepsis were selected and divided into the esmolol group (n=75) or the control group (n=76), who received conventional anti-septic-shock treatment. The esmolol group received a continuous infusion of esmolol via a central venous catheter, and their heart rate (HR) was maintained at 70–100 bpm over 72 hours. Results. The HR of all patients reached the target level within 72 hours of treatment for both groups. The effect of esmolol on PvaCO2 was only significant at 48 hours (P<0.05). ScvO2 increased in the esmolol group and decreased in the control group (P<0.01). Lactate (Lac) showed a linear downward trend over the treatment time, but at 48 hours the reduction differed between the two groups and was greater in the control group (P<0.05). Kaplan-Meier analysis showed a significantly shorter duration of mechanical ventilation in the esmolol group than in the control group (P<0.05). Conclusions. Esmolol reduced the duration of mechanical ventilation in patients with severe sepsis, with no significant effect on circulatory function or tissue perfusion