12 research outputs found

    Imitation Learning from Observation with Automatic Discount Scheduling

    Humans often acquire new skills through observation and imitation. For robotic agents, learning from the plethora of unlabeled video demonstration data available on the Internet necessitates imitating the expert without access to its actions, presenting a challenge known as Imitation Learning from Observations (ILfO). A common approach to tackle ILfO problems is to convert them into inverse reinforcement learning problems, utilizing a proxy reward computed from the agent's and the expert's observations. Nonetheless, we identify that tasks characterized by a progress dependency property pose significant challenges for such approaches; in these tasks, the agent needs to initially learn the expert's preceding behaviors before mastering the subsequent ones. Our investigation reveals that the main cause is that the reward signals assigned to later steps hinder the learning of initial behaviors. To address this challenge, we present a novel ILfO framework that enables the agent to master earlier behaviors before advancing to later ones. We introduce an Automatic Discount Scheduling (ADS) mechanism that adaptively alters the discount factor in reinforcement learning during the training phase, prioritizing earlier rewards initially and gradually engaging later rewards only when the earlier behaviors have been mastered. Our experiments, conducted on nine Meta-World tasks, demonstrate that our method significantly outperforms state-of-the-art methods across all tasks, including those that are unsolvable by them. Comment: Accepted by ICLR 202
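The scheduling idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `DiscountScheduler`, the `progress` score in [0, 1], the threshold, and the step size are all hypothetical choices made for illustration, and the abstract does not say how mastery of earlier behaviors is measured.

```python
class DiscountScheduler:
    """Hypothetical scheduler that raises the discount factor in stages.

    A small gamma makes the agent focus on early rewards; gamma is only
    increased once an assumed progress score says early behavior is mastered.
    """

    def __init__(self, gamma_start=0.9, gamma_end=0.99, step=0.03, threshold=0.8):
        self.gamma = gamma_start      # current discount factor
        self.gamma_end = gamma_end    # upper bound on the discount factor
        self.step = step              # increment applied per promotion
        self.threshold = threshold    # assumed mastery threshold in [0, 1]

    def update(self, progress):
        """Raise gamma by one step if the progress score meets the threshold."""
        if progress >= self.threshold:
            self.gamma = min(self.gamma_end, self.gamma + self.step)
        return self.gamma


def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over one episode."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))
```

With a small discount factor, a reward that arrives late in the episode contributes little to the return, which is why starting low and raising gamma later shifts attention from early to late behaviors.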

    Krüppel-like factors in tumors: Key regulators and therapeutic avenues

    Krüppel-like factors (KLFs) are a group of DNA-binding transcriptional regulators with multiple essential functions in various cellular processes, including proliferation, migration, inflammation, and angiogenesis. The aberrant expression of KLFs is often found in tumor tissues and is essential for tumor development. At the molecular level, KLFs regulate multiple signaling pathways and mediate crosstalk among them. Some KLFs may also be molecular switches for specific biological signals, driving their transition from tumor suppressors to promoters. At the histological level, the abnormal expression of KLFs is closely associated with tumor cell stemness, proliferation, apoptosis, and alterations in the tumor microenvironment. Notably, the role of each KLF in tumors is not invariant; it varies with tumor type and with the stage of tumor development. In this review, we focus on the advances in the molecular biology of KLFs, particularly the regulation of several classical signaling pathways by these factors, and the critical role of KLFs in tumor development. We also highlight their strong potential as molecular targets in tumor therapy and suggest potential directions for clinical translational research.

    On the Role of Discount Factor in Offline Reinforcement Learning

    Offline reinforcement learning (RL) enables effective learning from previously collected data without exploration, which shows great promise in real-world applications when exploration is expensive or even infeasible. The discount factor, γ, plays a vital role in improving online RL sample efficiency and estimation accuracy, but the role of the discount factor in offline RL is not well explored. This paper examines two distinct effects of γ in offline RL with theoretical analysis, namely the regularization effect and the pessimism effect. On the one hand, γ is a regulator to trade off optimality with sample efficiency upon existing offline techniques. On the other hand, lower guidance γ can also be seen as a way of pessimism where we optimize the policy's performance in the worst possible models. We empirically verify the above theoretical observation with tabular MDPs and standard D4RL tasks. The results show that the discount factor plays an essential role in the performance of offline RL algorithms, both under small data regimes upon existing offline methods and in large data regimes without other conservative methods. Comment: Thirty-ninth International Conference on Machine Learning
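As rough intuition for why γ can trade optimality against sample efficiency, the short sketch below computes two standard quantities for a discounted objective. It is not the paper's analysis: the helper names `effective_horizon` and `tail_weight` are hypothetical, and the formulas are only the usual geometric-series identities.

```python
def effective_horizon(gamma):
    """Common rule of thumb: the planning horizon scales as 1 / (1 - gamma)."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must lie in [0, 1)")
    return 1.0 / (1.0 - gamma)


def tail_weight(gamma, k):
    """Fraction of the total discount weight that falls on steps t >= k.

    sum_{t>=k} gamma**t / sum_{t>=0} gamma**t simplifies to gamma**k.
    """
    return gamma ** k


for g in (0.9, 0.99):
    print(g, effective_horizon(g), tail_weight(g, 100))
```

A lower γ shortens the horizon and shrinks the weight placed on distant rewards, so value estimates depend less on long chains of transitions that a fixed dataset may cover poorly.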

    Modeling based-on semi-tensor product for the start-up stage of the radiant cooling system

    For nonlinear systems with multiple inputs and multiple outputs that cannot be decoupled, we use sampled data from the real system to obtain a fuzzy relation matrix model via the semi-tensor product (STP) operation of matrices, and establish a mathematical model of the complicated system based on STP. This method has been applied to analyze the dynamic performance of the floor in a radiant floor system. In this paper, a radiant floor cooling system based on concrete core radiant floors is examined. To analyze the dynamic behaviour of the floors during operation in non-working hours, a fuzzy relation matrix model based on STP is established. The model is used to quantitatively estimate the related parameters in the start-up period of the floor system under the impact of the outdoor environment, such as temperature and humidity, during the actual cool-down times. The results show that the time delay of the floor surface temperature and indoor air temperature due to thermal inertia is not evident, while the amplitude of the outdoor air temperature fluctuation is reduced significantly.

    Flow to Control: Offline Reinforcement Learning with Lossless Primitive Discovery

    Offline reinforcement learning (RL) enables the agent to effectively learn from logged data, which significantly extends the applicability of RL algorithms in real-world scenarios where exploration can be expensive or unsafe. Previous works have shown that extracting primitive skills from the recurring and temporally extended structures in the logged data yields better learning. However, these methods suffer greatly when the primitives have limited representation ability to recover the original policy space, especially in offline settings. In this paper, we give a quantitative characterization of the performance of offline hierarchical learning and highlight the importance of learning lossless primitives. To this end, we propose to use a flow-based structure as the representation for low-level policies. This allows us to represent the behaviors in the dataset faithfully while keeping the expression ability to recover the whole policy space. We show that such lossless primitives can drastically improve the performance of hierarchical policies. The experimental results and extensive ablation studies on the standard D4RL benchmark show that our method has a good representation ability for policies and achieves superior performance in most tasks.

    Protective Effects and Mechanism of a Novel Probiotic Strain Ligilactobacillus salivarius YL20 against Cronobacter sakazakii-Induced Necrotizing Enterocolitis In Vitro and In Vivo

    Exposure to probiotics in early life contributes to host intestinal development and prevention of necrotizing enterocolitis (NEC). Cronobacter sakazakii (C. sakazakii), an opportunistic pathogen, can cause NEC, bacteremia, and meningitis in neonates, but research on probiotics against C. sakazakii is limited relative to other enteropathogens. Here, the protective effect and mechanism of a novel probiotic, Ligilactobacillus salivarius (L. salivarius) YL20 isolated from breast milk, on C. sakazakii-induced intestinal injury were explored by using two in vitro models, a C. sakazakii-infected intestinal organoid model and an intestinal barrier model, as well as an in vivo experimental animal model. Our results revealed that L. salivarius YL20 could promote epithelial cell proliferation in intestinal organoids, rescue budding-impaired organoids, prevent the decrease of mRNA levels of leucine-rich repeat containing G protein-coupled receptor 5 (Lgr5), zonula occludens-1 (Zo-1) and Occludin, and reverse the C. sakazakii-induced low level of Mucin 2 (MUC2) in intestinal organoids. Additionally, YL20 could inhibit C. sakazakii invasion, increase the expression of ZO-1 and occludin in C. sakazakii-infected HT-29 cells, and reverse the TEER decrease and corresponding permeability increase across C. sakazakii-infected Caco-2 monolayers. Furthermore, YL20 administration could alleviate NEC in C. sakazakii-infected neonatal mice by increasing the survival ratio of the mice, decreasing pathology scores, and downregulating pro-inflammatory cytokines. Meanwhile, YL20 could also enhance intestinal barrier function in vivo by increasing the number of goblet cells, the level of MUC2 and the expression of ZO-1. Our overall findings demonstrated for the first time the beneficial effects of L. salivarius YL20 against C. sakazakii-induced NEC by improving intestinal stem cell function and enhancing intestinal barrier integrity.