30 research outputs found

    Tackling Visual Control via Multi-View Exploration Maximization

    We present MEM: Multi-view Exploration Maximization for tackling complex visual control tasks. To the best of our knowledge, MEM is the first approach that combines multi-view representation learning and intrinsic reward-driven exploration in reinforcement learning (RL). More specifically, MEM first extracts the specific and shared information of multi-view observations to form high-quality features before performing RL on the learned features, enabling the agent to fully comprehend the environment and yield better actions. Furthermore, MEM transforms the multi-view features into intrinsic rewards based on entropy maximization to encourage exploration. As a result, MEM can significantly improve the sample efficiency and generalization ability of the RL agent, facilitating the solution of real-world problems with high-dimensional observations and sparse-reward space. We evaluate MEM on various tasks from DeepMind Control Suite and Procgen games. Extensive simulation results demonstrate that MEM can achieve superior performance and outperform the benchmarking schemes with simple architecture and higher efficiency.
    Comment: 21 pages, 9 figures
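The abstract does not say which entropy estimator MEM uses. As a rough illustration of how learned features can be turned into entropy-based intrinsic rewards, the sketch below assumes a k-nearest-neighbor particle estimator, a common choice for state-entropy bonuses: a feature vector far from its neighbors lies in a sparsely visited region and earns a larger reward. The function name, the value of `k`, and the sample features are all hypothetical.

```python
import math

def knn_entropy_rewards(features, k=3):
    """Return one intrinsic reward per feature vector.

    Each reward is log(1 + distance to the k-th nearest neighbor), so
    feature vectors in sparsely populated regions earn more.
    This is an assumed estimator, not necessarily the one used by MEM.
    """
    if not 1 <= k < len(features):
        raise ValueError("k must satisfy 1 <= k < number of feature vectors")
    rewards = []
    for i, x in enumerate(features):
        # Euclidean distances from x to every other feature vector, ascending
        dists = sorted(math.dist(x, y) for j, y in enumerate(features) if j != i)
        rewards.append(math.log(1.0 + dists[k - 1]))
    return rewards

# Hypothetical 2-D features: a tight cluster plus one outlier
batch = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
rewards = knn_entropy_rewards(batch, k=2)
```

With this batch, the outlier at index 4 receives the largest reward, which is the behavior an exploration bonus is meant to have.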

    Automatic Intrinsic Reward Shaping for Exploration in Deep Reinforcement Learning

    We present AIRS: Automatic Intrinsic Reward Shaping that intelligently and adaptively provides high-quality intrinsic rewards to enhance exploration in reinforcement learning (RL). More specifically, AIRS selects a shaping function from a predefined set based on the estimated task return in real time, providing reliable exploration incentives and alleviating the biased objective problem. Moreover, we develop an intrinsic reward toolkit to provide efficient and reliable implementations of diverse intrinsic reward approaches. We test AIRS on various tasks from Procgen games and DeepMind Control Suite. Extensive simulation demonstrates that AIRS can outperform the benchmarking schemes and achieve superior performance with simple architecture.
    Comment: 23 pages, 16 figures
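Choosing a shaping function from a fixed set according to the estimated task return can be read as a multi-armed bandit problem. The sketch below assumes a UCB1 selection rule, which the abstract does not confirm; the class name, the candidate names, and the noise-free returns are hypothetical placeholders.

```python
import math

class ShapingSelector:
    """UCB1 bandit over a fixed set of candidate shaping functions (assumed rule)."""

    def __init__(self, names, exploration=2.0):
        self.names = list(names)
        self.exploration = exploration
        self.counts = {n: 0 for n in self.names}
        self.mean_return = {n: 0.0 for n in self.names}
        self.total = 0

    def select(self):
        # Try every candidate once before using the confidence bound
        for name in self.names:
            if self.counts[name] == 0:
                return name

        def ucb(name):
            bonus = math.sqrt(self.exploration * math.log(self.total) / self.counts[name])
            return self.mean_return[name] + bonus

        return max(self.names, key=ucb)

    def update(self, name, task_return):
        # Incremental mean of the task return observed under this choice
        self.counts[name] += 1
        self.total += 1
        self.mean_return[name] += (task_return - self.mean_return[name]) / self.counts[name]

# Hypothetical candidates with made-up, noise-free task returns
true_return = {"identity": 0.2, "scaled": 0.5, "decayed": 0.9}
selector = ShapingSelector(true_return.keys())
for _ in range(200):
    choice = selector.select()
    selector.update(choice, true_return[choice])
```

After the loop, `selector.counts` shows most selections going to the candidate with the highest return, while the confidence bonus still triggers occasional re-checks of the others.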

    Fairness-Oriented User Scheduling for Bursty Downlink Transmission Using Multi-Agent Reinforcement Learning

    In this work, we develop practical user scheduling algorithms for downlink bursty traffic with emphasis on user fairness. In contrast to conventional scheduling algorithms that either divide the transmission time slots equally among users or maximize ratios without physical meaning, we propose to use the 5%-tile user data rate (5TUDR) as the metric to evaluate user fairness. Since it is difficult to directly optimize 5TUDR, we first cast the problem into the stochastic game framework and subsequently propose a Multi-Agent Reinforcement Learning (MARL)-based algorithm to perform distributed optimization on the resource block group (RBG) allocation. Furthermore, each MARL agent is designed to take information measured by network counters from multiple network layers (e.g., Channel Quality Indicator, buffer size) as the input states and the RBG allocation as the action, with a proposed reward function designed to maximize 5TUDR. Extensive simulation is performed to show that the proposed MARL-based scheduler can achieve fair scheduling while maintaining good average network throughput as compared to conventional schedulers.
    Comment: 30 pages, 13 figures
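To make the fairness metric concrete, the sketch below computes a 5th-percentile user data rate from a list of per-user rates. The abstract does not specify the percentile estimator, so linear interpolation between order statistics is assumed here; the function names and the sample rates are hypothetical.

```python
import math

def percentile(values, q):
    """q-th percentile (0 <= q <= 100), linear interpolation between order statistics."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = math.ceil(pos)
    frac = pos - lo
    return ordered[lo] * (1.0 - frac) + ordered[hi] * frac

def fifth_percentile_rate(user_rates):
    """5%-tile user data rate for one scheduler (assumed estimator)."""
    return percentile(user_rates, 5.0)

# Hypothetical per-user data rates in Mbit/s under two schedulers
scheduler_a = [0.5, 9.0, 10.0, 11.0, 12.0, 12.5, 13.0, 14.0, 15.0, 16.0, 17.0]
scheduler_b = [6.0, 7.0, 8.0, 9.0, 10.0, 10.5, 11.0, 11.5, 12.0, 12.5, 13.0]

mean_a = sum(scheduler_a) / len(scheduler_a)
mean_b = sum(scheduler_b) / len(scheduler_b)
tail_a = fifth_percentile_rate(scheduler_a)
tail_b = fifth_percentile_rate(scheduler_b)
```

In this made-up data, scheduler A has the higher mean rate but starves one user, so its 5th-percentile rate is lower than scheduler B's. That gap between average throughput and tail performance is what a percentile-based fairness metric is meant to expose.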

    RLLTE: Long-Term Evolution Project of Reinforcement Learning

    We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a complete and luxuriant ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia.
    Comment: 22 pages, 15 figures

    Serum protein biomarkers for HCC risk prediction in HIV/HBV co-infected people: a clinical proteomic study using mass spectrometry

    Background: HBV coinfection is frequent in people living with HIV (PLWH) and is the leading cause of hepatocellular carcinoma (HCC). While risk prediction methods for HCC in patients with HBV monoinfection have been proposed, suitable biomarkers for early diagnosis of HCC in PLWH remain uncommon.
    Methods: Liquid chromatography-tandem mass spectrometry (LC-MS/MS) was used to examine serum protein alterations in HCC and non-HCC patients with HIV and HBV co-infection. Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Disease Ontology (DO) enrichment analyses were performed on the differentially expressed proteins (DEPs). The risk prediction model was created using five-fold cross-validation and LASSO regression to filter core DEPs.
    Results: A total of 124 DEPs were discovered, with 95 proteins up-regulated and 29 proteins down-regulated. Extracellular matrix organization and membrane component were the terms most enriched among the DEPs in the biological process (BP) and cellular component (CC) categories. Proteoglycans in cancer was among the top three KEGG pathways enriched in DEPs, and 60.0% of DEPs were linked to various neoplasms in terms of DO enrichment. Eleven proteins, including GAPR1, PLTP, CLASP2, IGHV1-69D, IGLV5-45, A2M, VNN1, KLK11, ANPEP, DPP4 and HYI, were chosen as the core DEPs, and a nomogram was created to predict HCC risk.
    Conclusion: In HIV/HBV patients with HCC, several differential proteins can be detected in plasma by mass spectrometry, which can be used as screening markers for early diagnosis and risk prediction of HCC. Monitoring protease expression differences can help in the diagnosis and prognosis of HCC.

    Evaluating the efficiency of a nomogram based on the data of neurosurgical intensive care unit patients to predict pulmonary infection of multidrug-resistant Acinetobacter baumannii

    Background: Pulmonary infection caused by multidrug-resistant Acinetobacter baumannii (MDR-AB) is a common and serious complication after brain injury. There are no definitive methods for its prediction and it is usually accompanied by a poor prognosis. This study aimed to construct and evaluate a nomogram based on patient data from the neurosurgical intensive care unit (NSICU) to predict the probability of MDR-AB pulmonary infection.
    Methods: In this study, we retrospectively collected patient clinical profiles, early laboratory test results, and doctors’ prescriptions (66 variables). Univariate and backward stepwise regression analyses were used to screen the variables to identify predictors, and a nomogram was built in the primary cohort based on the results of a logistic regression model. Discriminatory validity, calibration validity, and clinical utility were evaluated using validation cohort 1 based on receiver operating characteristic curves, calibration curves, and decision curve analysis (DCA). For external validation based on predictors, we prospectively collected information from patients as validation cohort 2.
    Results: Among 2115 patients admitted to the NSICU between December 1, 2019, and December 31, 2021, 217 were eligible for the study, including 102 patients with MDR-AB infections and 115 patients with other bacterial infections. We randomly assigned the patients to the primary cohort (70%, N=152) and validation cohort 1 (30%, N=65). Validation cohort 2 consisted of 24 patients admitted to the NSICU between January 1, 2022, and March 31, 2022, whose clinical information was prospectively collected according to predictors. The nomogram, consisting of only six predictors (age, NSICU stay, Glasgow Coma Scale, meropenem, neutrophil to lymphocyte ratio, platelet to lymphocyte ratio), had high sensitivity and specificity (primary cohort AUC=0.913, validation cohort 1 AUC=0.830, validation cohort 2 AUC=0.889) for early identification of infection and was well calibrated (P=0.3801 and 0.6274 in validation cohorts 1 and 2, respectively). DCA confirmed that the nomogram is clinically useful.
    Conclusion: Our nomogram could help clinicians make early predictions regarding the onset of pulmonary infection caused by MDR-AB and implement targeted interventions.

    A novel risk stratification model for STEMI after primary PCI: global longitudinal strain and deep neural network assisted myocardial contrast echocardiography quantitative analysis

    Background: In ST-segment elevation myocardial infarction (STEMI) with the restoration of TIMI 3 flow by percutaneous coronary intervention (PCI), visually defined microvascular obstruction (MVO) has been shown to predict poor prognosis, but it is not an ideal risk stratification method. We aimed to introduce deep neural network (DNN) assisted myocardial contrast echocardiography (MCE) quantitative analysis and propose a better risk stratification model.
    Methods: A total of 194 STEMI patients with successful primary PCI and at least 6 months of follow-up were included. MCE was performed within 48 h after PCI. The major adverse cardiovascular events (MACE) were defined as cardiac death, congestive heart failure, reinfarction, stroke, and recurrent angina. The perfusion parameters were derived from a DNN-based myocardial segmentation framework. Visual microvascular perfusion (MVP) qualitative analysis distinguished three patterns: normal, delay, and MVO. Clinical markers and imaging features, including global longitudinal strain (GLS), were analyzed. A calculator for risk was constructed and validated with bootstrap resampling.
    Results: The time cost for processing 7,403 MCE frames was 773 s. The correlation coefficients of microvascular blood flow (MBF) were 0.99 to 0.97 for intra-observer and inter-observer variability. Thirty-eight patients experienced MACE during the 6-month follow-up. We proposed a risk prediction model based on MBF [HR: 0.93 (0.91–0.95)] in culprit lesion areas and GLS [HR: 0.80 (0.73–0.88)]. At the best risk threshold of 40%, the AUC was 0.95 (sensitivity: 0.84, specificity: 0.94), better than the visual MVP method (AUC: 0.70, sensitivity: 0.89, specificity: 0.40, IDI: −0.49). The Kaplan-Meier curves showed that the proposed risk prediction model allowed for better risk stratification.
    Conclusion: The MBF + GLS model allowed more accurate risk stratification of STEMI after PCI than visual qualitative analysis. The DNN-assisted MCE quantitative analysis is an objective, efficient and reproducible method to evaluate microvascular perfusion.

    Intrinsically-Motivated Reinforcement Learning: A Brief Introduction

    Reinforcement learning (RL) is one of the three basic paradigms of machine learning. It has demonstrated impressive performance in many complex tasks like Go and StarCraft, and it is increasingly applied in smart manufacturing and autonomous driving. However, RL consistently suffers from the exploration-exploitation dilemma. In this paper, we investigated the problem of improving exploration in RL and introduced intrinsically-motivated RL. In sharp contrast to the classic exploration strategies, intrinsically-motivated RL utilizes the intrinsic learning motivation to provide sustainable exploration incentives. We carefully classified the existing intrinsic reward methods and analyzed their practical drawbacks. Moreover, we proposed a new intrinsic reward method via Rényi state entropy maximization, which overcomes the drawbacks of the preceding methods and provides powerful exploration incentives. Finally, extensive simulation demonstrated that the proposed module achieves superior performance with higher efficiency and robustness.
    Comment: 40 pages, 25 figures
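For reference, the Rényi entropy of order α for a discrete distribution p is H_α(p) = log(Σ p_i^α) / (1 − α), and it recovers the Shannon entropy as α approaches 1. The sketch below evaluates this formula on a discrete state-visitation distribution. It is only an illustration of the quantity: the abstract does not describe how the proposed method estimates the entropy, and the function name and sample distributions are hypothetical.

```python
import math

def renyi_entropy(probs, alpha):
    """Rényi entropy of order alpha (natural log) for a discrete distribution."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    if abs(sum(probs) - 1.0) > 1e-9 or any(p < 0 for p in probs):
        raise ValueError("probs must be a probability distribution")
    support = [p for p in probs if p > 0]
    if abs(alpha - 1.0) < 1e-12:
        # The limit alpha -> 1 is the Shannon entropy
        return -sum(p * math.log(p) for p in support)
    return math.log(sum(p ** alpha for p in support)) / (1.0 - alpha)

# Hypothetical state-visitation distributions over four states
uniform_visits = [0.25, 0.25, 0.25, 0.25]
skewed_visits = [0.7, 0.1, 0.1, 0.1]

h_uniform = renyi_entropy(uniform_visits, alpha=0.5)
h_skewed = renyi_entropy(skewed_visits, alpha=0.5)
```

The uniform distribution attains the maximum value log(4) for any order, so a policy that raises this entropy is pushed toward visiting states more evenly.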