91 research outputs found

    In-Sample Policy Iteration for Offline Reinforcement Learning

    Offline reinforcement learning (RL) seeks to derive an effective control policy from previously collected data. To circumvent errors due to inadequate data coverage, behavior-regularized methods optimize the control policy while concurrently minimizing deviation from the data collection policy. Nevertheless, these methods often exhibit subpar practical performance, particularly when the offline dataset is collected by sub-optimal policies. In this paper, we propose a novel algorithm employing in-sample policy iteration that substantially enhances behavior-regularized methods in offline RL. The core insight is that by continuously refining the policy used for behavior regularization, in-sample policy iteration gradually improves itself while implicitly avoiding queries to out-of-sample actions, thereby averting catastrophic learning failures. Our theoretical analysis verifies its ability to learn the in-sample optimal policy, exclusively utilizing actions well-covered by the dataset. Moreover, we propose competitive policy improvement, a technique applying two competitive policies, both of which are trained by iteratively improving over the best competitor. We show that this simple yet potent technique significantly enhances learning efficiency when function approximation is applied. Lastly, experimental results on the D4RL benchmark indicate that our algorithm outperforms previous state-of-the-art methods in most tasks.

    HarmonyDream: Task Harmonization Inside World Models

    Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling. In this paper, through a dedicated empirical investigation, we gain a deeper understanding of the role each task plays in world models and uncover the overlooked potential of sample-efficient MBRL by mitigating the domination of either observation or reward modeling. Our key insight is that while prevalent approaches of explicit MBRL attempt to restore abundant details of the environment via observation models, it is difficult due to the environment's complexity and limited model capacity. On the other hand, reward models, while dominating implicit MBRL and adept at learning compact task-centric dynamics, are inadequate for sample-efficient learning without richer learning signals. Motivated by these insights and discoveries, we propose a simple yet effective approach, HarmonyDream, which automatically adjusts loss coefficients to maintain task harmonization, i.e., a dynamic equilibrium between the two tasks in world model learning. Our experiments show that the base MBRL method equipped with HarmonyDream gains 10%-69% absolute performance boosts on visual robotic tasks and sets a new state-of-the-art result on the Atari 100K benchmark. Code is available at https://github.com/thuml/HarmonyDream. Comment: ICML 2024.

    The impact of ferroptosis and ferroptosis-related non-coding RNAs on breast cancer progression

    Ferroptosis, distinct from apoptosis, is primarily characterized by the accumulation of iron-dependent lipid peroxides (LPO) and reactive oxygen species (ROS). This process plays a pivotal role in the pathophysiology of various diseases and has recently emerged as a promising therapeutic strategy in oncology, garnering significant attention. Non-coding RNAs (ncRNAs), including microRNAs (miRNAs), long non-coding RNAs (lncRNAs), and circular RNAs (circRNAs), serve as crucial regulators in numerous biological processes, particularly in cancer initiation and progression. Increasing research efforts are focused on targeting ferroptosis through modulation of these ncRNAs. This review provides an overview of the mechanisms underlying ferroptosis and explores the roles of ncRNAs in breast cancer (BC) and its regulation. Furthermore, we examine the interactions between ferroptosis and ncRNAs in BC, aiming to identify potential therapeutic targets for BC treatment.

    Ethyl Pyruvate Attenuates CaCl2-Induced Tubular Epithelial Cell Injury by Inhibiting Autophagy and Inflammatory Responses

    Background/Aims: Nephrolithiasis is one of the most prevalent diseases of the urinary system. Approximately 80% of human kidney stones are composed of calcium oxalate (CaOx), and hypercalciuria is one of the most common metabolic disorders. Emerging evidence indicates that autophagy and inflammatory responses are related to the formation of CaOx nephrolithiasis. However, the roles of autophagy and inflammation in patients with hypercalciuria remain unclear. Ethyl pyruvate (EP) displays protective effects in experimental models of many illnesses. In this study, we investigated the protective effects of EP in vitro through its inhibition of autophagy and inflammatory responses after CaCl2-induced tubular epithelial cell injury. Methods: First, we cultured human tubular epithelial (HK-2) cells in the presence of various concentrations of CaCl2 (0, 0.1, 0.25, 0.5, 1.0, 1.5, and 2.0 mg/ml) for 12 h and EP (0, 1.0, 2.5, 5.0, and 10.0 mM) for 2 h to select the optimum concentration using the Cell Counting Kit-8 assay and lactate dehydrogenase (LDH) assay. Cells in culture were stimulated with CaCl2 (1.0 mg/ml, 12 h) with or without EP pretreatment (2.5 mM, 2 h). After the exposure, we detected the expression of inflammation-related proteins using an enzyme-linked immunosorbent assay and Western blot analysis. Finally, the levels of autophagy-related proteins were determined through Western blot analysis, and the number of GFP-LC3 dots and autophagic vacuoles was detected under confocal microscopy. Results: With the use of the Cell Counting Kit-8 assay and the LDH assay, we identified the optimum concentration for CaCl2 (1.0 mg/ml) treatment and EP pretreatment (2.5 mM). Our research indicated that CaCl2 can induce autophagy and inflammatory responses in HK-2 cells. Furthermore, treatment with EP prior to CaCl2 stimulation attenuated HK-2 cell injury by inhibiting autophagy and inflammation. 
    Conclusion: Our results provide evidence that EP attenuates CaCl2-induced injury of HK-2 cells by downregulating the expression of inflammation and autophagy proteins, which may be associated with the inhibition of the high-mobility group box-1 (HMGB1)/toll-like receptor 4 (TLR4)/NF-κB pathway and the competitive interaction of HMGB1 with Beclin-1.

    Short-Term Power Prediction of a Wind Farm Based on Empirical Mode Decomposition and Mayfly Algorithm–Back Propagation Neural Network

    As the energy consumption structure improves, the installed capacity of wind power is gradually increasing. However, the inherent intermittency and instability of wind energy pose severe challenges to dispatching operations, and wind power forecasting is one of the main solutions. In this work, a new combined wind power prediction model is proposed. First, a quartile method is used for data cleaning, namely, identifying and eliminating abnormal data. Then, the wind power data sequence is decomposed by empirical mode decomposition to eliminate non-stationary characteristics. Finally, the wind generator data are used to train the MA-BP (mayfly algorithm and back propagation) network to establish the wind power prediction model. Simulation tests verify the predictive performance of the proposed method: the average MAPE decreases to 12.4979%, and the average RMSE and MAE are 107.1728 kW and 71.604 kW, respectively.
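The data-cleaning step and the error metrics named in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which quartile rule was used, so the common 1.5 × IQR fence is assumed here, and the function names (`quartile_filter`, `mape`, `rmse`, `mae`) and sample values are hypothetical. The decomposition and MA-BP training steps are not shown.

```python
import math


def quartile_filter(values, k=1.5):
    """Drop points outside [Q1 - k*IQR, Q3 + k*IQR].

    The fence multiplier k=1.5 is an assumption; the abstract only
    says that a quartile method removes abnormal data.
    """
    ordered = sorted(values)

    def quantile(q):
        # Linear interpolation between the two nearest order statistics.
        pos = q * (len(ordered) - 1)
        lo, hi = math.floor(pos), math.ceil(pos)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def mape(actual, predicted):
    """Mean absolute percentage error in percent (actual values must be nonzero)."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the unit of the inputs (e.g. kW)."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the unit of the inputs (e.g. kW)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical power readings in kW; 500 is an obvious outlier.
    readings = [10, 12, 11, 13, 12, 500]
    print(quartile_filter(readings))
    print(mape([100, 200], [110, 180]), rmse([100, 200], [110, 180]), mae([100, 200], [110, 180]))
```

Note that MAPE is undefined when an actual power value is zero, which can occur in wind power data; how the paper handles such samples is not stated in the abstract.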

    Optimization of pre-swirl stators based on CFD for a chemical product carrier

    The viscous self-propulsion flow fields of a model-scale 55k DWT chemical product carrier fitted with a rudder-bulb-fin and a pre-swirl stator are numerically simulated using the general-purpose CFD code FLUENT. The energy-saving effects of the stators are evaluated through the increase in propulsive efficiency. The computed trends of almost all self-propulsion factors after fitting a stator agree with the experiments, such as a decreased revolution rate and increased thrust deduction and mean wake. A wake energy analysis is also conducted to verify the energy-saving effects of the stators, and it shows that the stator decreases the flow of kinetic energy behind the propeller through its contra-propeller pre-swirl. Next, an optimization of pre-swirl stators is conducted by CFD. In addition to the prototype stator, three modified stators are designed, and the self-propulsion characteristics with these stators are also numerically simulated. The evaluated energy-saving effects of the modified stators increase in the same order as intended in the design. The case with the highest propulsive efficiency shows the largest increase of Ktotal before the propeller and the largest decrease of Ktotal behind the propeller relative to cases without stators.