12 research outputs found

    Speeding-up reinforcement learning through abstraction and transfer learning

    We are interested in the following general question: is it possible to abstract knowledge that is generated while learning the solution of a problem, so that this abstraction can accelerate the learning process? Moreover, is it possible to transfer and reuse the acquired abstract knowledge to accelerate the learning process for future similar tasks? We propose a framework for conducting two levels of reinforcement learning simultaneously, where an abstract policy is learned while learning a concrete policy for the problem, such that both policies are refined through exploration and interaction of the agent with the environment. We explore abstraction both to accelerate the learning process for an optimal concrete policy for the current problem, and to allow the application of the generated abstract policy in learning solutions for new problems. We report experiments in a robot navigation environment that show our framework to be effective in speeding up policy construction for practical problems and in generating abstractions that can be used to accelerate learning in new similar problems. This research was partially supported by FAPESP (2011/19280-8, 2012/02190-9, 2012/19627-0) and CNPq (311058/2011-6, 305395/2010-6

    Omecamtiv mecarbil in chronic heart failure with reduced ejection fraction, GALACTIC‐HF: baseline characteristics and comparison with contemporary clinical trials

    Aims: The safety and efficacy of the novel selective cardiac myosin activator, omecamtiv mecarbil, in patients with heart failure with reduced ejection fraction (HFrEF) is tested in the Global Approach to Lowering Adverse Cardiac outcomes Through Improving Contractility in Heart Failure (GALACTIC‐HF) trial. Here we describe the baseline characteristics of participants in GALACTIC‐HF and how these compare with other contemporary trials. Methods and Results: Adults with established HFrEF, New York Heart Association functional class (NYHA) ≄ II, EF ≀ 35%, elevated natriuretic peptides and either current hospitalization for HF or history of hospitalization/emergency department visit for HF within a year were randomized to either placebo or omecamtiv mecarbil (pharmacokinetic‐guided dosing: 25, 37.5 or 50 mg bid). 8256 patients [male (79%), non‐white (22%), mean age 65 years] were enrolled with a mean EF 27%, ischemic etiology in 54%, NYHA II 53% and III/IV 47%, and median NT‐proBNP 1971 pg/mL. HF therapies at baseline were among the most effectively employed in contemporary HF trials. GALACTIC‐HF randomized patients representative of recent HF registries and trials with substantial numbers of patients also having characteristics understudied in previous trials including more from North America (n = 1386), enrolled as inpatients (n = 2084), systolic blood pressure < 100 mmHg (n = 1127), estimated glomerular filtration rate < 30 mL/min/1.73 mÂČ (n = 528), and treated with sacubitril‐valsartan at baseline (n = 1594). Conclusions: GALACTIC‐HF enrolled a well‐treated, high‐risk population from both inpatient and outpatient settings, which will provide a definitive evaluation of the efficacy and safety of this novel therapy, as well as informing its potential future implementation

    TransferĂȘncia relacional entre tarefas de aprendizado por reforço via polĂ­ticas abstratas.

    When designing intelligent agents that must solve sequential decision problems, we often do not have enough knowledge to build a complete model for the problems at hand. Reinforcement learning enables an agent to learn behavior by acquiring experience through trial-and-error interactions with the environment. However, knowledge is usually built from scratch and learning the optimal policy may take a long time. In this work, we improve the learning performance by exploring transfer learning; that is, the knowledge acquired in previous source tasks is used to accelerate learning in new target tasks. If the tasks present similarities, then the transferred knowledge guides the agent towards faster learning. We explore the use of a relational representation that allows description of relationships among objects. This representation simplifies the use of abstraction and the extraction of the similarities among tasks, enabling the generalization of solutions that can be used across different, but related, tasks. This work presents two model-free algorithms for online learning of abstract policies: AbsSarsa(λ) and AbsProb-RL. The former builds a deterministic abstract policy from value functions, while the latter builds a stochastic abstract policy through direct search on the space of policies. We also propose the S2L-RL agent architecture, containing two levels of learning: an abstract level and a ground level. The agent simultaneously builds a ground policy and an abstract policy; not only can the abstract policy accelerate learning on the current task, but it can also guide the agent in a future task. 
Experiments in a robotic navigation environment show that these techniques are effective in improving the agent's learning performance, especially during the early stages of the learning process, when the agent is completely unaware of the new task.

    RAFT polymerization to form stimuli-responsive polymers

    Lenvatinib plus pembrolizumab versus lenvatinib plus placebo for advanced hepatocellular carcinoma (LEAP-002): a randomised, double-blind, phase 3 trial

    Background: Systemic therapies have improved the management of hepatocellular carcinoma, but there is still a need to further enhance overall survival in first-line advanced stages. This study aimed to evaluate the addition of pembrolizumab to lenvatinib versus lenvatinib plus placebo in the first-line setting for unresectable hepatocellular carcinoma. Methods: In this global, randomised, double-blind, phase 3 study (LEAP-002), patients aged 18 years or older with unresectable hepatocellular carcinoma, Child Pugh class A liver disease, an Eastern Cooperative Oncology Group performance status of 0 or 1, and no previous systemic treatment were enrolled at 172 global sites. Patients were randomly assigned (1:1) with a central interactive voice-response system (block size of 4) to receive lenvatinib (bodyweight <60 kg, 8 mg/day; bodyweight ≄60 kg, 12 mg/day) plus pembrolizumab (200 mg every 3 weeks) or lenvatinib plus placebo. Randomisation was stratified by geographical region, macrovascular portal vein invasion or extrahepatic spread or both, alpha-fetoprotein concentration, and Eastern Cooperative Oncology Group performance status. Dual primary endpoints were overall survival (superiority threshold at final overall survival analysis, one-sided p=0.019; final analysis to occur after 532 events) and progression-free survival (superiority threshold one-sided p=0.002; final analysis to occur after 571 events) in the intention-to-treat population. Results from the final analysis are reported. This study is registered with ClinicalTrials.gov, NCT03713593, and is active but not recruiting. Findings: Between Jan 17, 2019, and April 28, 2020, of 1309 patients assessed, 794 were randomly assigned to lenvatinib plus pembrolizumab (n=395) or lenvatinib plus placebo (n=399). 
Median age was 66.0 years (IQR 57.0-72.0), 644 (81%) of 794 were male, 150 (19%) were female, 345 (43%) were Asian, 345 (43%) were White, 22 (3%) were multiple races, 21 (3%) were American Indian or Alaska Native, 21 (3%) were Native Hawaiian or other Pacific Islander, 13 (2%) were Black or African American, and 46 (6%) did not have available race data. Median follow-up as of data cutoff for the final analysis (June 21, 2022) was 32.1 months (IQR 29.4-35.3). Median overall survival was 21.2 months (95% CI 19.0-23.6; 252 [64%] of 395 died) with lenvatinib plus pembrolizumab versus 19.0 months (17.2-21.7; 282 [71%] of 399 died) with lenvatinib plus placebo (hazard ratio [HR] 0.84; 95% CI 0.71-1.00; stratified log-rank p=0.023). As of data cutoff for the progression-free survival final analysis (April 5, 2021), median progression-free survival was 8.2 months (95% CI 6.4-8.4; 270 events occurred [42 deaths; 228 progressions]) with lenvatinib plus pembrolizumab versus 8.0 months (6.3-8.2; 301 events occurred [36 deaths; 265 progressions]) with lenvatinib plus placebo (HR 0.87; 95% CI 0.73-1.02; stratified log-rank p=0.047). 
The most common treatment-related grade 3-4 adverse events were hypertension (69 [17%] of 395 patients in the lenvatinib plus pembrolizumab group vs 68 [17%] of 395 patients in the lenvatinib plus placebo group), increased aspartate aminotransferase (27 [7%] vs 17 [4%]), and diarrhoea (25 [6%] vs 15 [4%]). Treatment-related deaths occurred in four (1%) patients in the lenvatinib plus pembrolizumab group (due to gastrointestinal haemorrhage and hepatorenal syndrome [n=1 each] and hepatic encephalopathy [n=2]) and in three (1%) patients in the lenvatinib plus placebo group (due to gastrointestinal haemorrhage, hepatorenal syndrome, and cerebrovascular accident [n=1 each]). Interpretation: In earlier studies, the addition of pembrolizumab to lenvatinib as first-line therapy for advanced hepatocellular carcinoma has shown promising clinical activity; however, lenvatinib plus pembrolizumab did not meet prespecified significance for improved overall survival and progression-free survival versus lenvatinib plus placebo. Our findings do not support a change in clinical practice
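
The statement that neither primary endpoint met prespecified significance can be checked with a few lines of arithmetic. The sketch below is illustrative only: it reads the thresholds as 0.019 and 0.002 and the observed p-values as 0.023 and 0.047, and it assumes the reported p-values are on the same one-sided scale as the thresholds. The names `endpoints` and `meets_superiority` are hypothetical and do not come from the trial's analysis plan.

```python
# Illustrative check, not the trial's statistical analysis.
# Thresholds and p-values are taken from the abstract; treating them as
# directly comparable (same one-sided scale) is an assumption.
endpoints = {
    "overall survival": {"threshold": 0.019, "p": 0.023},
    "progression-free survival": {"threshold": 0.002, "p": 0.047},
}


def meets_superiority(p: float, threshold: float) -> bool:
    """Return True when the observed p-value is below the prespecified threshold."""
    return p < threshold


results = {
    name: meets_superiority(values["p"], values["threshold"])
    for name, values in endpoints.items()
}

for name, met in results.items():
    print(f"{name}: threshold {'met' if met else 'not met'}")
```

Under these assumptions both comparisons come out as not met, which matches the abstract's interpretation.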

    Transfer Learning for Multiagent Reinforcement Learning Systems

    Cardiac myosin activation with omecamtiv mecarbil in systolic heart failure

    BACKGROUND The selective cardiac myosin activator omecamtiv mecarbil has been shown to improve cardiac function in patients with heart failure with a reduced ejection fraction. Its effect on cardiovascular outcomes is unknown. METHODS We randomly assigned 8256 patients (inpatients and outpatients) with symptomatic chronic heart failure and an ejection fraction of 35% or less to receive omecamtiv mecarbil (using pharmacokinetic-guided doses of 25 mg, 37.5 mg, or 50 mg twice daily) or placebo, in addition to standard heart-failure therapy. The primary outcome was a composite of a first heart-failure event (hospitalization or urgent visit for heart failure) or death from cardiovascular causes. RESULTS During a median of 21.8 months, a primary-outcome event occurred in 1523 of 4120 patients (37.0%) in the omecamtiv mecarbil group and in 1607 of 4112 patients (39.1%) in the placebo group (hazard ratio, 0.92; 95% confidence interval [CI], 0.86 to 0.99; P = 0.03). A total of 808 patients (19.6%) and 798 patients (19.4%), respectively, died from cardiovascular causes (hazard ratio, 1.01; 95% CI, 0.92 to 1.11). There was no significant difference between groups in the change from baseline on the Kansas City Cardiomyopathy Questionnaire total symptom score. At week 24, the change from baseline for the median N-terminal pro-B-type natriuretic peptide level was 10% lower in the omecamtiv mecarbil group than in the placebo group; the median cardiac troponin I level was 4 ng per liter higher. The frequency of cardiac ischemic and ventricular arrhythmia events was similar in the two groups. CONCLUSIONS Among patients with heart failure and a reduced ejection fraction, those who received omecamtiv mecarbil had a lower incidence of a composite of a heart-failure event or death from cardiovascular causes than those who received placebo. (Funded by Amgen and others; GALACTIC-HF ClinicalTrials.gov number, NCT02929329; EudraCT number, 2016-002299-28.)
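
The event percentages in the RESULTS can be reproduced from the raw counts. The sketch below is a plain arithmetic check using only the counts quoted in the abstract; the helper name `pct` is hypothetical. The crude difference it computes is not the reported hazard ratio, which comes from a time-to-event model.

```python
def pct(events: int, n: int) -> float:
    """Percentage of patients with an event, rounded to one decimal place."""
    return round(100 * events / n, 1)


# Primary-outcome events (counts from the abstract)
primary_omecamtiv = pct(1523, 4120)
primary_placebo = pct(1607, 4112)

# Deaths from cardiovascular causes (counts from the abstract)
cv_death_omecamtiv = pct(808, 4120)
cv_death_placebo = pct(798, 4112)

# Crude absolute difference in primary-outcome incidence, in percentage points
crude_difference = round(100 * (1607 / 4112 - 1523 / 4120), 1)

print(primary_omecamtiv, primary_placebo, crude_difference)
print(cv_death_omecamtiv, cv_death_placebo)
```

The counts reproduce the quoted 37.0% and 39.1% for the primary outcome and 19.6% and 19.4% for cardiovascular death.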