11 research outputs found

    Design from Policies: Conservative Test-Time Adaptation for Offline Policy Optimization

    In this work, we decouple the iterative bi-level offline RL (value estimation and policy extraction) from the offline training phase, forming a non-iterative bi-level paradigm that avoids iterative error propagation across the two levels. Specifically, this non-iterative paradigm allows us to conduct the inner-level optimization (value estimation) during training while performing the outer-level optimization (policy extraction) at test time. Naturally, such a paradigm raises three core questions that are not fully answered by prior non-iterative offline RL counterparts such as reward-conditioned policies: (q1) What information should we transfer from the inner level to the outer level? (q2) What should we pay attention to when exploiting the transferred information for safe/confident outer-level optimization? (q3) What are the benefits of concurrently conducting outer-level optimization during testing? Motivated by model-based optimization (MBO), we propose DROP (design from policies), which addresses all three questions. Specifically, at the inner level, DROP decomposes the offline data into multiple subsets and learns an MBO score model (a1). To ensure safe exploitation of the score model at the outer level, we explicitly learn a behavior embedding and introduce a conservative regularization (a2). During testing, we show that DROP permits deployment adaptation, enabling adaptive inference across states (a3). Empirically, we evaluate DROP on various tasks, showing that DROP achieves comparable or better performance than prior methods. Comment: NeurIPS 202
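
The outer-level (test-time) step can be pictured as choosing, for each state, the behavior embedding that maximizes the learned score while staying close to the embeddings learned from the offline subsets. The Python sketch below is a minimal illustration under assumptions: the toy score function, the squared-distance penalty, the weight `lam`, and the candidate set are all hypothetical stand-ins, not DROP's actual score model or conservative regularizer.

```python
from typing import Callable, Sequence, Tuple

Vector = Tuple[float, ...]


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def conservative_select(
    state: Vector,
    candidates: Sequence[Vector],
    train_embeddings: Sequence[Vector],
    score_fn: Callable[[Vector, Vector], float],
    lam: float = 1.0,
) -> Vector:
    """Return the candidate embedding with the highest penalized score for this state."""

    def objective(z: Vector) -> float:
        # Hypothetical conservative penalty: distance to the nearest embedding seen in training.
        penalty = min(sq_dist(z, e) for e in train_embeddings)
        return score_fn(state, z) - lam * penalty

    return max(candidates, key=objective)


def toy_score(state: Vector, z: Vector) -> float:
    """Hypothetical score model that naively rewards extrapolating z upward."""
    return state[0] * z[0]


train_z = [(0.0,), (1.0,)]
candidates = [(0.0,), (1.0,), (5.0,)]
greedy = conservative_select((1.0,), candidates, train_z, toy_score, lam=0.0)
safe = conservative_select((1.0,), candidates, train_z, toy_score, lam=1.0)
# greedy == (5.0,) (exploits the out-of-distribution embedding); safe == (1.0,)
```

Because the selection is re-run for every state, the chosen embedding can differ across states, which is one simple way to read "adaptive inference across states".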

    CEIL: Generalized Contextual Imitation Learning

    In this paper, we present ContExtual Imitation Learning (CEIL), a general and broadly applicable algorithm for imitation learning (IL). Inspired by the formulation of hindsight information matching, we derive CEIL by explicitly learning a hindsight embedding function together with a contextual policy that uses the hindsight embeddings. To achieve the expert-matching objective for IL, we advocate optimizing a contextual variable such that it biases the contextual policy towards mimicking expert behaviors. Beyond the typical learning from demonstrations (LfD) setting, CEIL is a generalist that can be effectively applied to multiple settings, including: 1) learning from observations (LfO), 2) offline IL, 3) cross-domain IL (mismatched experts), and 4) one-shot IL. Empirically, we evaluate CEIL on the popular MuJoCo tasks (online) and the D4RL dataset (offline). Compared to prior state-of-the-art baselines, we show that CEIL is more sample-efficient in most online IL tasks and achieves better or competitive performance in offline tasks. Comment: NeurIPS 202
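
The idea of optimizing only a contextual variable, while the contextual policy stays fixed, can be illustrated with a toy problem. The sketch below assumes a hypothetical one-dimensional linear contextual policy `action = policy_weight * state + z` and fits `z` by gradient descent on the squared error to expert actions; CEIL's learned hindsight embeddings and its actual objective are not reproduced here.

```python
from typing import List, Tuple


def fit_context(
    demos: List[Tuple[float, float]],
    policy_weight: float,
    lr: float = 0.1,
    steps: int = 500,
) -> float:
    """Fit the context z so that the frozen toy policy w*s + z imitates the expert."""
    z = 0.0
    for _ in range(steps):
        # Gradient w.r.t. z of the mean squared error between policy and expert actions.
        grad = sum(2 * (policy_weight * s + z - a) for s, a in demos) / len(demos)
        z -= lr * grad
    return z


# Hypothetical expert (state, action) pairs generated by action = 2 * state + 0.5.
expert_demos = [(1.0, 2.5), (2.0, 4.5), (3.0, 6.5)]
z_star = fit_context(expert_demos, policy_weight=2.0)
# z_star is approximately 0.5: the context alone shifts the frozen policy onto the expert.
```

The policy parameters are never updated in this sketch; only `z` moves, which mirrors the design choice of steering a shared contextual policy toward expert behavior through its input.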

    Beyond OOD State Actions: Supported Cross-Domain Offline Reinforcement Learning

    Offline reinforcement learning (RL) aims to learn a policy using only pre-collected, fixed data. Although it avoids time-consuming online interaction, offline RL faces challenges from out-of-distribution (OOD) state actions and often suffers from data inefficiency during training. Despite many efforts devoted to addressing OOD state actions, the latter issue (data inefficiency) has received little attention in offline RL. To address this, this paper proposes cross-domain offline RL, which assumes that the offline data incorporate additional source-domain data collected under varying transition dynamics (environments), and expects these data to improve offline data efficiency. To do so, we identify a new challenge of OOD transition dynamics, beyond the common OOD state-action issue, that arises when utilizing cross-domain offline data. We then propose our method BOSA, which employs two support-constrained objectives to address these OOD issues. Through extensive experiments in the cross-domain offline RL setting, we demonstrate that BOSA can greatly improve offline data efficiency: using only 10% of the target data, BOSA achieves 74.4% of the performance of SOTA offline RL that uses 100% of the target data. Additionally, we show that BOSA can be effortlessly plugged into model-based offline RL and noising data augmentation techniques (used for generating source-domain data), which naturally avoids the potential dynamics mismatch between target-domain data and newly generated source-domain data.
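
The abstract does not spell out BOSA's two support-constrained objectives. As a rough illustration of the underlying intuition (only trust source-domain transitions that the target dynamics could plausibly have produced), the sketch below keeps transitions whose estimated target-domain density exceeds a threshold. The density model, the threshold `epsilon`, and the toy dynamics s' = s + a are all hypothetical; this is a simplification, not BOSA's actual objective.

```python
import math
from typing import Callable, List, Tuple

Transition = Tuple[float, float, float]  # (state, action, next_state)


def filter_supported(
    source: List[Transition],
    target_density: Callable[[float, float, float], float],
    epsilon: float,
) -> List[Transition]:
    """Keep source-domain transitions inside the estimated support of the target dynamics."""
    return [t for t in source if target_density(*t) > epsilon]


def toy_target_density(s: float, a: float, s_next: float, sigma: float = 0.1) -> float:
    """Hypothetical target dynamics s' = s + a with Gaussian noise (unnormalized, in (0, 1])."""
    return math.exp(-((s_next - (s + a)) ** 2) / (2 * sigma**2))


# The second transition follows mismatched (OOD) dynamics and should be dropped.
source_data = [(0.0, 1.0, 1.0), (0.0, 1.0, 1.5)]
kept = filter_supported(source_data, toy_target_density, epsilon=0.5)
# kept == [(0.0, 1.0, 1.0)]
```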

    RSG: Fast Learning Adaptive Skills for Quadruped Robots by Skill Graph

    Developing intelligent robotic systems that can adapt quickly to unseen situations in the wild is one of the critical challenges in pursuing autonomous robotics. Although impressive progress has been made in walking stability and skill learning for legged robots, their ability to adapt quickly is still inferior to that of animals in nature. Animals are born with a vast repertoire of skills needed to survive and can quickly acquire new ones by composing fundamental skills with limited experience. Inspired by this, we propose a novel framework, named Robot Skill Graph (RSG), for organizing a massive number of fundamental robot skills and dexterously reusing them for fast adaptation. Bearing a structure similar to a Knowledge Graph (KG), RSG is composed of massive dynamic behavioral skills instead of the static knowledge in a KG, and enables the discovery of implicit relations between the learning context and the acquired skills of robots, serving as a starting point for understanding subtle patterns in robots' skill learning. Extensive experimental results demonstrate that RSG can provide rational skill inference for new tasks and environments and enable quadruped robots to adapt to new scenarios and learn new skills rapidly.

    Research on adaptive shooting algorithm for unmanned aerial vehicles under typical operating conditions of overhead transmission lines

    We consider typical environmental conditions, such as different shooting backgrounds, lighting, and shooting distances, and study adaptive shooting algorithms for unmanned aerial vehicles (UAVs) operating on site under typical working conditions, in order to acquire high-quality UAV inspection images and reduce the technical difficulty of recognizing inspection images with artificial intelligence. Based on the collected high-quality images and the characteristics of UAV inspection images, a deep learning framework is adopted to study intelligent identification of hidden hazards in transmission line equipment and the channel environment, achieving intelligent recognition of pin-level, fine-grained defects.

    Development and validation of machine learning based prediction model for postoperative pain risk after extraction of impacted mandibular third molars

    Background: Predicting postoperative pain risk in patients undergoing impacted mandibular third molar extraction can help guide clinical decision-making, enhance perioperative pain management, and improve the patients’ medical experience. This study aims to develop a prediction model based on machine learning algorithms to identify patients at high risk of postoperative pain after tooth extraction. Methods: We conducted a prospective cohort study. Outpatients with impacted mandibular third molars were recruited, and the outcome, a high risk of postoperative pain, was defined as a peak postoperative pain score of ≥7 on the NRS (Numerical Rating Scale) within 24 h after the operation. We compared models built using nine different machine learning algorithms and conducted internal and time-series external validation to evaluate the models' predictive performance in terms of the area under the curve (AUC), accuracy, sensitivity, specificity, and F1 score. Results: A total of 185 patients and 202 cases of impacted mandibular third molar data were included in this study. Five modeling variables were selected using least absolute shrinkage and selection operator (LASSO) regression: physician qualification, patient self-reported maximum pain sensitivity, OHI–S–CI, BMI, and systolic blood pressure. The overall performance of the random forest model was evaluated. The AUC, sensitivity, and specificity of the prediction model built using the random forest method were 0.879 (0.861–0.891), 0.857, and 0.846, respectively, for the training set and 0.724 (0.673–0.732), 0.667, and 0.600, respectively, for the time-series validation set. Conclusions: This study developed a machine learning-based postoperative pain risk prediction model for impacted mandibular third molar extraction, which is promising for providing a theoretical basis for better pain management to reduce postoperative pain after third molar extraction.
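
The abstract reports AUC, accuracy, sensitivity, specificity, and F1 score. The Python sketch below shows the standard definitions of these metrics for a binary outcome (here, high versus low postoperative pain risk). The study does not state which software it used, and the counts and scores in the usage example are made-up illustrative numbers, not the study's data.

```python
from typing import Dict, Sequence


def confusion_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # true-positive rate (recall)
    specificity = tn / (tn + fp)  # true-negative rate
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


def auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Illustrative counts only: 8 true positives, 3 false positives, 9 true negatives, 2 false negatives.
metrics = confusion_metrics(tp=8, fp=3, tn=9, fn=2)
# sensitivity 0.8, specificity 0.75, F1 = 16/21
example_auc = auc([0.9, 0.8], [0.7, 0.85])  # 3 of 4 positive/negative pairs ranked correctly: 0.75
```

The pairwise AUC is quadratic in the number of cases, which is fine for a cohort of this size; for larger datasets a rank-based formula gives the same value more efficiently.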

    Troxerutin Reduces Kidney Damage against BDE-47-Induced Apoptosis via Inhibiting NOX2 Activity and Increasing Nrf2 Activity

    2,2′,4,4′-Tetrabromodiphenyl ether (BDE-47), a persistent organic pollutant, seriously affects quality of life; however, its pathological mechanism remains unclear. Troxerutin is a flavonoid with antioxidant and anti-inflammatory pharmacological activity. In the present study, we investigated the effect of troxerutin on BDE-47-induced kidney cell apoptosis and explored the underlying mechanism. The results show that troxerutin reduced renal cell apoptosis and urinary protein secretion in BDE-47-treated mice. Western blot analysis showed that troxerutin supplementation increased the Bcl-2/Bax ratio; inhibited the release of cytochrome c from mitochondria, the activation of procaspase-9 and procaspase-3, and the cleavage of PARP; and reduced the BDE-47-induced levels of FAS, FASL, and caspase-8. In addition, troxerutin decreased the production of reactive oxygen species (ROS) and increased the activities of antioxidative enzymes. Furthermore, troxerutin blunted Nrf2 ubiquitylation, enhanced Nrf2 activity, decreased NOX2 activity, and ameliorated the renal oxidant status of BDE-47-treated mice. Together, these results confirm that troxerutin could alleviate the cytotoxicity of BDE-47 through antioxidant and antiapoptotic effects, suggesting that its protective mechanism involves the inhibition of apoptosis via suppression of NOX2 activity and enhancement of the Nrf2 signaling pathway.

    Dispersion and Polishing Mechanism of a Novel CeO2-LaOF-Based Chemical Mechanical Polishing Slurry for Quartz Glass

    Quartz glass has superior physicochemical properties and is used in modern high technology. Because of its hard and brittle characteristics, traditional polishing slurries mostly use strong acids, strong alkalis, and potent corrosive additives, which cause environmental pollution. Furthermore, the damage caused by excessive corrosion reduces the service performance of the parts. Therefore, a novel green, efficient, and non-damaging chemical mechanical polishing slurry for quartz glass was developed, consisting of cerium oxide (CeO2), lanthanum oxyfluoride (LaOF), potassium pyrophosphate (K4P2O7), sodium N-lauroyl sarcosinate (SNLS), and sodium polyacrylate (PAAS). Among these, the LaOF abrasive showed a hexahedral morphology, which increased the number of cutting sites and made the load more uniform. Two anionic dispersants, SNLS and PAAS, maintained the suspension stability of the slurry, giving the abrasive a more uniform particle size and the sample a smoother surface after polishing. After an orthogonal test, a surface roughness (Sa) of 0.23 nm was obtained over a 50 × 50 μm² area, which is lower than the current industry rating of 0.9 nm, together with a material removal rate (MRR) of 530.52 nm/min.

    Isolation of infectious SARS-CoV-2 from urine of a COVID-19 patient

    SARS-CoV-2 caused a major outbreak of severe pneumonia (COVID-19) in humans. Viral RNA was detected in multiple organs of COVID-19 patients. However, infectious SARS-CoV-2 had only been isolated from respiratory specimens. Here, infectious SARS-CoV-2 was successfully isolated from the urine of a COVID-19 patient. The isolated virus could infect new susceptible cells and was recognized by the patient's own sera. Appropriate precautions should be taken to avoid transmission via urine.

    Efficacy and safety of GST-HG171 in adult patients with mild to moderate COVID-19: a randomised, double-blind, placebo-controlled phase 2/3 trial

    Summary: Background: GST-HG171 is a potent, broad-spectrum, orally bioavailable small-molecule 3C-like protease inhibitor that has demonstrated greater potency and efficacy than Nirmatrelvir in pre-clinical studies. We aimed to evaluate the efficacy and safety of orally administered GST-HG171 plus Ritonavir in patients with coronavirus disease 2019 (COVID-19) infected with emerging XBB and non-XBB variants. Methods: This randomised, double-blind, placebo-controlled phase 2/3 trial was conducted at 47 sites in China among adult patients with mild-to-moderate COVID-19 and symptom onset ≤72 h. Eligible patients were randomised 1:1 to receive GST-HG171 (150 mg) plus Ritonavir (100 mg) or corresponding placebo tablets twice daily for 5 days, with stratification factors including the risk level of disease progression and vaccination status. The primary efficacy endpoint was time to sustained recovery of clinical symptoms within 28 days, defined as a score of 0 for 11 COVID-19-related target symptoms for 2 consecutive days, assessed in the modified intention-to-treat (mITT) population. This trial was registered at ClinicalTrials.gov (NCT05656443) and the Chinese Clinical Trial Registry (ChiCTR2200067088). Findings: Between Dec 19, 2022, and May 4, 2023, 1525 patients were screened. Among the 1246 patients who underwent randomisation, most had completed basic (21.2%) or booster (74.9%) COVID-19 immunization, and most had a low risk of disease progression at baseline. 610 of 617 patients who received GST-HG171 plus Ritonavir and 603 of 610 who received placebo were included in the mITT population. Patients who received GST-HG171 plus Ritonavir showed a shorter median time to sustained recovery of clinical symptoms than the placebo group (13.0 days [95.45% confidence interval 12.0–15.0] vs. 15.0 days [14.0–15.0], P = 0.031). Consistent results were observed in both the SARS-CoV-2 XBB (45.7%, 481/1053 of the mITT population) and non-XBB variant (54.3%, 572/1053 of the mITT population) subgroups. The incidence of adverse events was similar in the GST-HG171 plus Ritonavir (320/617, 51.9%) and placebo (298/610, 48.9%) groups. The most common adverse event in both the placebo and treatment groups was hypertriglyceridaemia (10.0% vs. 14.7%). No deaths occurred. Interpretation: Treatment with GST-HG171 plus Ritonavir demonstrated benefits in symptom recovery and viral clearance among low-risk, vaccinated adult patients with COVID-19, without apparent safety concerns. As most patients in our study were treated within 2 days after symptom onset, real-world studies will be required to confirm the potential symptom-recovery benefits for patients with a longer interval between symptom onset and treatment initiation. Funding: Fujian Akeylink Biotechnology Co., Ltd.
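
The percentages quoted in the Findings can be recomputed from the raw counts given alongside them. The short Python check below is only an arithmetic sketch: it uses no data beyond the counts stated in the abstract and simply confirms that each quoted percentage matches its count, rounded to one decimal place.

```python
from typing import Dict, Tuple

# (numerator, denominator, percentage as quoted in the abstract)
REPORTED: Dict[str, Tuple[int, int, float]] = {
    "XBB subgroup": (481, 1053, 45.7),
    "non-XBB subgroup": (572, 1053, 54.3),
    "adverse events, GST-HG171 plus Ritonavir": (320, 617, 51.9),
    "adverse events, placebo": (298, 610, 48.9),
}


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def check_reported(reported: Dict[str, Tuple[int, int, float]]) -> Dict[str, bool]:
    """Map each label to whether its recomputed percentage equals the quoted one."""
    return {label: pct(n, d) == quoted for label, (n, d, quoted) in reported.items()}


if __name__ == "__main__":
    for label, ok in check_reported(REPORTED).items():
        print(f"{label}: {'matches' if ok else 'MISMATCH'}")
```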