6 research outputs found

    Latent Exploration for Reinforcement Learning

    In Reinforcement Learning, agents learn policies by exploring and interacting with the environment. Due to the curse of dimensionality, learning policies that map high-dimensional sensory input to motor output is particularly challenging. During training, state-of-the-art methods (SAC, PPO, etc.) explore the environment by perturbing the actuation with independent Gaussian noise. While this unstructured exploration has proven successful in numerous tasks, it is likely to be suboptimal for overactuated systems. When multiple actuators, such as motors or muscles, drive behavior, uncorrelated perturbations risk diminishing each other's effect or modifying the behavior in a task-irrelevant way. While solutions to introduce time correlation across action perturbations exist, introducing correlation across actuators has been largely ignored. Here, we propose LATent TIme-Correlated Exploration (Lattice), a method to inject temporally correlated noise into the latent state of the policy network, which can be seamlessly integrated with on- and off-policy algorithms. We demonstrate that the noisy actions generated by perturbing the network's activations can be modeled as a multivariate Gaussian distribution with a full covariance matrix. In the PyBullet locomotion tasks, Lattice-SAC achieves state-of-the-art results and reaches 18% higher reward than unstructured exploration in the Humanoid environment. In the musculoskeletal control environments of MyoSuite, Lattice-PPO achieves higher reward in most reaching and object manipulation tasks, while also finding more energy-efficient policies with energy reductions of 20-60%. Overall, we demonstrate the effectiveness of structured action noise in time and actuator space for complex motor control tasks.
    Comment: Code available at https://github.com/amathislab/lattic
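    The claim that perturbing the latent state yields a full-covariance Gaussian over actions can be checked with a minimal sketch. It assumes the policy's last layer is linear, a = W z + b, so independent latent noise eps ~ N(0, diag(sigma^2)) becomes action noise W eps ~ N(0, W diag(sigma^2) W^T), whose off-diagonal entries couple actuators that read from the same latent units. The matrix W, the variances, and all function names are hypothetical illustrations, not the Lattice code, and the sketch covers only the correlation across actuators, not the temporal correlation.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def action_noise_covariance(w, latent_var):
    """Analytic covariance of W @ eps for eps ~ N(0, diag(latent_var)).

    The result W diag(latent_var) W^T has non-zero off-diagonal entries
    whenever two actuators (rows of W) read from the same latent unit.
    """
    n = len(latent_var)
    sigma_z = [[latent_var[i] if i == j else 0.0 for j in range(n)]
               for i in range(n)]
    return matmul(matmul(w, sigma_z), transpose(w))


def empirical_covariance(w, latent_var, n_samples=20000, seed=0):
    """Monte Carlo check: perturb the latent state, measure action noise."""
    rng = random.Random(seed)
    n_act = len(w)
    acc = [[0.0] * n_act for _ in range(n_act)]
    for _ in range(n_samples):
        # Independent Gaussian noise on each latent unit.
        eps = [rng.gauss(0.0, math.sqrt(v)) for v in latent_var]
        # Resulting action perturbation; the bias b cancels out.
        da = [sum(w_ik * e_k for w_ik, e_k in zip(row, eps)) for row in w]
        for i in range(n_act):
            for j in range(n_act):
                acc[i][j] += da[i] * da[j]
    # The noise has zero mean, so E[da da^T] is the covariance.
    return [[x / n_samples for x in row] for row in acc]


# Hypothetical last layer: 2 actuators driven by 3 latent units.
W = [[1.0, 0.5, 0.0],
     [0.8, 0.0, 0.3]]
latent_var = [0.04, 0.04, 0.04]
print(action_noise_covariance(W, latent_var))  # approx. [[0.05, 0.032], [0.032, 0.0292]]
print(empirical_covariance(W, latent_var))     # close to the analytic result
```

    Even though every latent unit is perturbed independently, the two actuators receive correlated noise (covariance 0.032) because both read from the first latent unit; independent per-actuator Gaussian noise would instead give a diagonal covariance.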

    Inferring Symptom State of Generalized Anxiety Disorder: A Bayesian Network Approach

    Instead of viewing psychiatric disorders as latent causes that lead to observable symptoms, a network view of psychiatric disorders argues that each disorder can be regarded as a complex network of interacting symptoms. Such a network view of psychiatric disorders enables the analysis of the inter-dependencies between individual symptoms. Here, I modeled a set of binary symptoms in Generalized Anxiety Disorder (GAD) as a Bayesian network and performed Belief Propagation on this symptom network to infer the potential states of unobserved symptom variables. In the learned symptom network, the interactions between GAD symptoms were directly supported by empirical investigations of the co-occurrences or causal relations between them. The symptom network enabled one to infer the state of unobserved symptom variables given partial observation. Furthermore, predicting symptom states on the Bayesian network outperformed state-of-the-art machine learning methods that did not explicitly model the interdependencies between symptom variables. Together, this study proposes a novel and reliable approach for measuring the risk of certain GAD symptoms for a patient by inferring the likelihood of developing the symptoms of interest on the Bayesian symptom network. The learned symptom network also predicts novel interdependencies between symptoms that can be verified in future empirical research. The Bayesian network model of GAD provides a potential mechanistic account underlying the co-occurrence of symptoms in GAD.
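    To illustrate what inferring an unobserved symptom from partial observation means, here is a toy sketch. The three-node structure (worry influencing restlessness and sleep disturbance), the symptom names, and every probability are invented for illustration and do not come from the study. It computes posteriors by exact enumeration rather than by belief propagation; on a tree-structured network such as this one, belief propagation returns the same marginals.

```python
from itertools import product

# Hypothetical toy network: worry -> restlessness, worry -> sleep disturbance.
# All probabilities below are illustrative assumptions, not fitted values.
P_WORRY = 0.3
P_RESTLESS_GIVEN_WORRY = {1: 0.8, 0: 0.2}
P_SLEEP_GIVEN_WORRY = {1: 0.7, 0: 0.1}
NAMES = ("worry", "restlessness", "sleep")


def bernoulli(p, value):
    """Probability of a binary value under a Bernoulli(p) variable."""
    return p if value == 1 else 1.0 - p


def joint(w, r, s):
    """P(worry=w, restlessness=r, sleep=s) from the network factorization."""
    return (bernoulli(P_WORRY, w)
            * bernoulli(P_RESTLESS_GIVEN_WORRY[w], r)
            * bernoulli(P_SLEEP_GIVEN_WORRY[w], s))


def posterior(query, evidence):
    """P(query=1 | evidence) by summing the joint over all assignments."""
    num = den = 0.0
    for values in product((0, 1), repeat=len(NAMES)):
        assignment = dict(zip(NAMES, values))
        # Skip assignments that contradict the observed symptoms.
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        p = joint(*values)
        den += p
        if assignment[query] == 1:
            num += p
    return num / den


print(posterior("sleep", {}))                   # prior, about 0.28
print(posterior("sleep", {"restlessness": 1}))  # about 0.479 once restlessness is observed
```

    Observing one symptom raises the inferred probability of an unobserved one only because both depend on a shared variable, which is the kind of inter-dependency a model that treats symptoms independently cannot exploit.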

    Action chunking as conditional policy compression

    Many skills in our everyday lives are learned by sequencing actions towards a desired goal. The action sequence can become a "chunk" when individual actions are grouped together and executed as one unit, making them more efficient to store and execute. While chunking has been studied extensively across various domains, a puzzle remains as to why and under what conditions action chunking occurs. To tackle these questions, we develop a model of conditional policy compression (the reduction in cognitive cost by conditioning on an additional source of information) to explain the origin of chunking. We argue that chunking is a result of optimizing the trade-off between reward and conditional policy complexity. Chunking compresses policies when there is temporal structure in the environment that can be leveraged for action selection, reducing the amount of memory necessary to encode the policy. We experimentally confirm our model's predictions, showing that chunking reduces conditional policy complexity and reaction times. Chunking also increases with working memory load, consistent with the hypothesis that the degree of policy compression scales with the scarcity of cognitive resources. Finally, chunking also reduces overall working memory load, freeing cognitive resources for the benefit of other, non-chunked information.
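    A minimal sketch of why temporal structure lowers conditional policy complexity, under two labeled assumptions: policy complexity is measured as the mutual information I(S;A) between states and actions, and conditional policy complexity as I(S;A|C), with the extra source of information C taken to be the previous action (the abstract only says "an additional source of information"). In a hypothetical environment whose states alternate deterministically, a policy that copies the state into the action costs 1 bit unconditionally but 0 bits once the previous action is known.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """I(S;A) in bits, estimated from a list of (s, a) samples."""
    n = len(pairs)
    n_sa = Counter(pairs)
    n_s = Counter(s for s, _ in pairs)
    n_a = Counter(a for _, a in pairs)
    return sum((k / n) * math.log2(k * n / (n_s[s] * n_a[a]))
               for (s, a), k in n_sa.items())


def conditional_mutual_information(triples):
    """I(S;A|C) in bits, estimated from a list of (c, s, a) samples.

    Uses I(S;A|C) = sum p(c,s,a) log[p(c,s,a) p(c) / (p(c,s) p(c,a))].
    """
    n = len(triples)
    n_csa = Counter(triples)
    n_c = Counter(c for c, _, _ in triples)
    n_cs = Counter((c, s) for c, s, _ in triples)
    n_ca = Counter((c, a) for c, _, a in triples)
    return sum((k / n) * math.log2(k * n_c[c] / (n_cs[(c, s)] * n_ca[(c, a)]))
               for (c, s, a), k in n_csa.items())


# Hypothetical environment: states alternate 0, 1, 0, 1, ...
# and the policy simply maps each state to the action with the same label.
states = [t % 2 for t in range(1001)]
actions = list(states)
pairs = list(zip(states[1:], actions[1:]))
# Condition each step on the previous action (assumed choice of C).
triples = list(zip(actions[:-1], states[1:], actions[1:]))

print(mutual_information(pairs))                # 1 bit without conditioning
print(conditional_mutual_information(triples))  # 0 bits given the previous action
```

    If the states were instead drawn independently at random, the previous action would carry no information about the next state, and the two quantities would be approximately equal; the saving appears only when the environment has temporal structure.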

    Health-status outcomes with invasive or conservative care in coronary disease

    BACKGROUND In the ISCHEMIA trial, an invasive strategy with angiographic assessment and revascularization did not reduce clinical events among patients with stable ischemic heart disease and moderate or severe ischemia. A secondary objective of the trial was to assess angina-related health status among these patients. METHODS We assessed angina-related symptoms, function, and quality of life with the Seattle Angina Questionnaire (SAQ) at randomization, at months 1.5, 3, and 6, and every 6 months thereafter in participants who had been randomly assigned to an invasive treatment strategy (2295 participants) or a conservative strategy (2322). Mixed-effects cumulative probability models within a Bayesian framework were used to estimate differences between the treatment groups. The primary outcome of this health-status analysis was the SAQ summary score (scores range from 0 to 100, with higher scores indicating better health status). All analyses were performed in the overall population and according to baseline angina frequency. RESULTS At baseline, 35% of patients reported having no angina in the previous month. SAQ summary scores increased in both treatment groups, with increases at 3, 12, and 36 months that were 4.1 points (95% credible interval, 3.2 to 5.0), 4.2 points (95% credible interval, 3.3 to 5.1), and 2.9 points (95% credible interval, 2.2 to 3.7) higher with the invasive strategy than with the conservative strategy. Differences were larger among participants who had more frequent angina at baseline (8.5 vs. 0.1 points at 3 months and 5.3 vs. 1.2 points at 36 months among participants with daily or weekly angina as compared with no angina). CONCLUSIONS In the overall trial population with moderate or severe ischemia, which included 35% of participants without angina at baseline, patients randomly assigned to the invasive strategy had greater improvement in angina-related health status than those assigned to the conservative strategy. The modest mean differences favoring the invasive strategy in the overall group reflected minimal differences among asymptomatic patients and larger differences among patients who had had angina at baseline.

    Initial invasive or conservative strategy for stable coronary disease

    BACKGROUND Among patients with stable coronary disease and moderate or severe ischemia, whether clinical outcomes are better in those who receive an invasive intervention plus medical therapy than in those who receive medical therapy alone is uncertain. METHODS We randomly assigned 5179 patients with moderate or severe ischemia to an initial invasive strategy (angiography and revascularization when feasible) and medical therapy or to an initial conservative strategy of medical therapy alone and angiography if medical therapy failed. The primary outcome was a composite of death from cardiovascular causes, myocardial infarction, or hospitalization for unstable angina, heart failure, or resuscitated cardiac arrest. A key secondary outcome was death from cardiovascular causes or myocardial infarction. RESULTS Over a median of 3.2 years, 318 primary outcome events occurred in the invasive-strategy group and 352 occurred in the conservative-strategy group. At 6 months, the cumulative event rate was 5.3% in the invasive-strategy group and 3.4% in the conservative-strategy group (difference, 1.9 percentage points; 95% confidence interval [CI], 0.8 to 3.0); at 5 years, the cumulative event rate was 16.4% and 18.2%, respectively (difference, −1.8 percentage points; 95% CI, −4.7 to 1.0). Results were similar with respect to the key secondary outcome. The incidence of the primary outcome was sensitive to the definition of myocardial infarction; a secondary analysis yielded more procedural myocardial infarctions of uncertain clinical importance. There were 145 deaths in the invasive-strategy group and 144 deaths in the conservative-strategy group (hazard ratio, 1.05; 95% CI, 0.83 to 1.32). CONCLUSIONS Among patients with stable coronary disease and moderate or severe ischemia, we did not find evidence that an initial invasive strategy, as compared with an initial conservative strategy, reduced the risk of ischemic cardiovascular events or death from any cause over a median of 3.2 years. The trial findings were sensitive to the definition of myocardial infarction that was used.

    Management of coronary disease in patients with advanced kidney disease

    BACKGROUND Clinical trials that have assessed the effect of revascularization in patients with stable coronary disease have routinely excluded those with advanced chronic kidney disease. METHODS We randomly assigned 777 patients with advanced kidney disease and moderate or severe ischemia on stress testing to be treated with an initial invasive strategy consisting of coronary angiography and revascularization (if appropriate) added to medical therapy or an initial conservative strategy consisting of medical therapy alone and angiography reserved for those in whom medical therapy had failed. The primary outcome was a composite of death or nonfatal myocardial infarction. A key secondary outcome was a composite of death, nonfatal myocardial infarction, or hospitalization for unstable angina, heart failure, or resuscitated cardiac arrest. RESULTS At a median follow-up of 2.2 years, a primary outcome event had occurred in 123 patients in the invasive-strategy group and in 129 patients in the conservative-strategy group (estimated 3-year event rate, 36.4% vs. 36.7%; adjusted hazard ratio, 1.01; 95% confidence interval [CI], 0.79 to 1.29; P=0.95). Results for the key secondary outcome were similar (38.5% vs. 39.7%; hazard ratio, 1.01; 95% CI, 0.79 to 1.29). The invasive strategy was associated with a higher incidence of stroke than the conservative strategy (hazard ratio, 3.76; 95% CI, 1.52 to 9.32; P=0.004) and with a higher incidence of death or initiation of dialysis (hazard ratio, 1.48; 95% CI, 1.04 to 2.11; P=0.03). CONCLUSIONS Among patients with stable coronary disease, advanced chronic kidney disease, and moderate or severe ischemia, we did not find evidence that an initial invasive strategy, as compared with an initial conservative strategy, reduced the risk of death or nonfatal myocardial infarction.