38 research outputs found

    Parallel reward and punishment control in humans and robots: Safe reinforcement learning using the MaxPain algorithm

    An important issue in reinforcement learning systems for autonomous agents is whether it makes sense to have separate systems for predicting rewards and punishments. In robotics, learning and control are typically achieved by a single controller, with punishments coded as negative rewards. However, in biological systems, some evidence suggests that the brain has a separate system for punishment. Although this may in part be due to biological constraints of implementing negative quantities, it raises the question as to whether there is any computational rationale for keeping reward and punishment prediction operationally distinct. Here we outline a basic argument supporting this idea, based on the proposition that learning best-case predictions (as in Q-learning) does not always achieve the safest behaviour. We introduce a modified RL scheme involving a new algorithm, which we call 'MaxPain', that backs up worst-case predictions in parallel and then scales the two predictions in a multi-attribute RL policy, i.e. it independently learns 'what to do' as well as 'what not to do' and then combines this information. We show how this scheme can improve performance in benchmark RL environments, including a grid-world experiment and a delayed version of the mountain car experiment. In particular, we demonstrate how early exploration and learning are substantially improved, leading to much 'safer' behaviour. In conclusion, the results illustrate the importance of independent punishment prediction in RL, and provide a testable framework for better understanding punishment (such as pain) and avoidance in humans, in both health and disease.
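
    The parallel scheme described in this abstract can be sketched roughly as follows. This is only one plausible reading of the abstract, not the authors' published equations: the tabular update rules, the mixing weight `w`, and the names `update` and `choose_action` are all assumptions made for illustration. Punishment is treated here as a non-negative pain magnitude, and terminal states are not handled.

```python
from collections import defaultdict

# Two separate tabular value functions, both defaulting to 0.0 (assumption).
q_reward = defaultdict(float)  # best-case reward prediction (Q-learning style)
q_pain = defaultdict(float)    # worst-case punishment prediction


def update(q_reward, q_pain, s, a, reward, pain, s_next, actions,
           alpha=0.1, gamma=0.9):
    """One learning step that updates both predictors in parallel."""
    # Reward learner: back up the best-case (maximum) future reward.
    best_next = max(q_reward[(s_next, b)] for b in actions)
    q_reward[(s, a)] += alpha * (reward + gamma * best_next - q_reward[(s, a)])
    # Pain learner: back up the worst-case (maximum) future pain.
    worst_next = max(q_pain[(s_next, b)] for b in actions)
    q_pain[(s, a)] += alpha * (pain + gamma * worst_next - q_pain[(s, a)])


def choose_action(q_reward, q_pain, s, actions, w=0.5):
    """Greedy choice on a weighted mix of 'what to do' and 'what not to do'."""
    return max(actions,
               key=lambda a: w * q_reward[(s, a)] - (1.0 - w) * q_pain[(s, a)])


# Hypothetical two-action example: 'left' hurts, 'right' pays off.
actions = ("left", "right")
update(q_reward, q_pain, 0, "left", 0.0, 1.0, 1, actions)
update(q_reward, q_pain, 0, "right", 1.0, 0.0, 1, actions)
chosen = choose_action(q_reward, q_pain, 0, actions)
```

    With `w = 1` the sketch reduces to ordinary greedy Q-learning on rewards alone; lowering `w` gives more weight to the worst-case pain estimate.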

    Evidence for a bimodal distribution of Escherichia coli doubling times below a threshold initial cell concentration

    Background: In the process of developing a microplate-based growth assay, we discovered that our test organism, a native E. coli isolate, displayed very uniform doubling times (τ) only above a certain threshold cell density. Below this cell concentration (≤ 100-1,000 CFU mL⁻¹; ≤ 27-270 CFU well⁻¹) we observed an obvious increase in the τ scatter. Results: Working with a food-borne E. coli isolate we found that τ values derived from two different microtiter plate reader-based techniques (i.e., optical density with growth time {=OD[t]} fit to the sigmoidal Boltzmann equation, or time to calculated 1/2-maximal OD {=tm} as a function of initial cell density {=tm[CI]}) were in excellent agreement with the same parameter acquired from total aerobic plate counting. Thus, using either Luria-Bertani (LB) or defined (MM) media at 37°C, τ ranged between 17-18 (LB) or 51-54 (MM) min. Making use of such OD[t] data we collected many observations of τ as a function of initial (starting) cell concentration (CI). We noticed that τ appeared to be distributed in two populations (bimodal) at low CI. When CI ≤ 100 CFU mL⁻¹ (stationary phase cells in LB), we found that about 48% of the observed τ values were normally distributed around a mean (μτ1) of 18 ± 0.68 min (± στ1) and 52% with μτ2 = 20 ± 2.5 min (n = 479). However, at higher starting cell densities (CI > 100 CFU mL⁻¹), the τ values were distributed unimodally (μτ = 18 ± 0.71 min; n = 174). Inclusion of a small amount of ethyl acetate in the LB caused a collapse of the bimodal to a unimodal form. Comparable bimodal τ distribution results were also observed using E. coli cells diluted from mid-log phase cultures. Similar results were also obtained when using either an E. coli O157:H7 or a Citrobacter strain. When sterile-filtered LB supernatants, which formerly contained relatively low concentrations of bacteria (1,000-10,000 CFU mL⁻¹), were employed as a diluent, there was an evident shift of the two populations towards each other, but the bimodal effect was still apparent using either stationary or log phase cells. Conclusion: These data argue that there is a dependence of growth rate on starting cell density.
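
    The tm[CI] technique mentioned in this abstract can be illustrated with a short sketch. It assumes pure exponential growth up to the half-maximal OD and a lag time that does not depend on CI, so that tm = lag + τ·log2(N_half / CI) and the slope of tm against log2(CI) equals -τ. The function name, the half-maximal density, and the synthetic data below are hypothetical and are not taken from the paper.

```python
import math


def doubling_time_from_tm(ci_values, tm_values):
    """Estimate tau as minus the least-squares slope of tm versus log2(CI)."""
    x = [math.log2(c) for c in ci_values]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(tm_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, tm_values))
    return -(sxy / sxx)


# Synthetic, noise-free data generated with tau = 18 min (made-up values).
true_tau = 18.0
n_half = 1e8                      # hypothetical cell density at half-maximal OD
ci = [1e2, 1e3, 1e4, 1e5]         # initial concentrations, CFU per mL
tm = [true_tau * math.log2(n_half / c) for c in ci]
tau_estimate = doubling_time_from_tm(ci, tm)
```

    On real plate-reader data the points scatter around the line, and the abstract reports that the scatter in τ grows markedly at low CI, which is exactly where a single fitted slope would be least trustworthy.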

    Modelling interactions of acid–base balance and respiratory status in the toxicity of metal mixtures in the American oyster Crassostrea virginica

    Author Posting. © The Author(s), 2009. This is the author's version of the work. It is posted here by permission of Elsevier B.V. for personal use, not for redistribution. The definitive version was published in Comparative Biochemistry and Physiology - Part A: Molecular & Integrative Physiology 155 (2010): 341-349, doi:10.1016/j.cbpa.2009.11.019. Heavy metals, such as copper, zinc and cadmium, represent some of the most common and serious pollutants in coastal estuaries. In the present study, we used a combination of linear and artificial neural network (ANN) modelling to detect and explore interactions among low-dose mixtures of these heavy metals and their impacts on fundamental physiological processes in tissues of the Eastern oyster, Crassostrea virginica. Animals were exposed to Cd (0.001 – 0.400 μM), Zn (0.001 – 3.059 μM) or Cu (0.002 – 0.787 μM), either alone or in combination for 1 to 27 days. We measured indicators of acid-base balance (hemolymph pH and total CO2), gas exchange (Po2), immunocompetence (total hemocyte counts, numbers of invasive bacteria), antioxidant status (glutathione, GSH), oxidative damage (lipid peroxidation; LPx), and metal accumulation in the gill and the hepatopancreas. Linear analysis showed that oxidative membrane damage from tissue accumulation of environmental metals was correlated with impaired acid-base balance in oysters. ANN analysis revealed interactions of metals with hemolymph acid-base chemistry in predicting oxidative damage that were not evident from linear analyses. These results highlight the usefulness of machine learning approaches, such as ANNs, for improving our ability to recognize and understand the effects of sub-acute exposure to contaminant mixtures. This study was supported by NOAA’s Center of Excellence in Oceans and Human Health at HML and the National Science Foundation.

    The behaviour of giant clams (Bivalvia: Cardiidae: Tridacninae)


    Cardio-respiratory development in bird embryos: new insights from a venerable animal model


    Multi-Task Reinforcement Learning: Shaping and Feature Selection

    Shaping functions can be used in multi-task reinforcement learning (RL) to incorporate knowledge from previously experienced source tasks to speed up learning on a new target task. Earlier work has not clearly motivated choices for the shaping function. This paper discusses and empirically compares several alternatives, and demonstrates that the most intuitive one may not always be the best option. In addition, we extend previous work on identifying good representations for the value and shaping functions, and show that selecting the right representation results in improved generalization over tasks.
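
    As a rough illustration of how source-task knowledge might enter a shaping function, the sketch below uses the common potential-based form F(s, s') = γ·Φ(s') - Φ(s), with the potential Φ taken as the mean of source-task state values. The abstract does not say which shaping functions the paper compares, so both the potential-based form and the averaging rule are assumptions here, and the function names are hypothetical.

```python
def mean_source_potential(source_values):
    """Build a potential phi(s) as the mean of V_i(s) over source tasks.

    source_values: list of dicts mapping state -> value, one per source task.
    """
    def phi(state):
        return sum(v[state] for v in source_values) / len(source_values)
    return phi


def shaped_reward(reward, state, next_state, phi, gamma=0.95):
    """Target-task reward plus the potential-based shaping term."""
    return reward + gamma * phi(next_state) - phi(state)


# Hypothetical value tables from two source tasks over states "a" and "b".
source_values = [{"a": 0.0, "b": 1.0}, {"a": 0.0, "b": 3.0}]
phi = mean_source_potential(source_values)
bonus = shaped_reward(0.0, "a", "b", phi, gamma=0.5)
```

    Moving from "a" to "b" earns a positive bonus because the source tasks agree that "b" is the more valuable state; other choices of potential would change which transitions are encouraged.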

    Predictive uncertainty estimation for out-of-distribution detection in digital pathology.

    Machine learning model deployment in clinical practice demands real-time risk assessment to identify situations in which the model is uncertain. Once deployed, models should be accurate for classes seen during training while providing informative estimates of uncertainty to flag abnormalities and unseen classes for further analysis. Although recent developments in uncertainty estimation have resulted in an increasing number of methods, a rigorous empirical evaluation of their performance on large-scale digital pathology datasets is lacking. This work provides a benchmark for evaluating prevalent methods on multiple datasets by comparing the uncertainty estimates on both in-distribution and realistic near and far out-of-distribution (OOD) data on a whole-slide level. To this end, we aggregate uncertainty values from patch-based classifiers to whole-slide level uncertainty scores. We show that results found in classical computer vision benchmarks do not always translate to the medical imaging setting. Specifically, we demonstrate that deep ensembles perform best at detecting far-OOD data but can be outperformed on a more challenging near-OOD detection task by multi-head ensembles trained for optimal ensemble diversity. Furthermore, we demonstrate the harmful impact OOD data can have on the performance of deployed machine learning models. Overall, we show that uncertainty estimates can be used to discriminate in-distribution from OOD data with high AUC scores. Still, model deployment might require careful tuning based on prior knowledge of prospective OOD data.
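
    The patch-to-slide aggregation described in this abstract might look something like the sketch below. The abstract does not state which uncertainty measure or aggregation rule is used, so two assumptions are made: per-patch uncertainty is the entropy of the ensemble-averaged class probabilities, and the slide score is the plain mean over patches. The function names and the toy probabilities are hypothetical.

```python
import math


def predictive_entropy(member_probs):
    """Entropy of the class distribution averaged over ensemble members.

    member_probs: list of per-member probability vectors for one patch.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n_members
            for c in range(n_classes)]
    # Terms with zero probability contribute nothing to the entropy.
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def slide_uncertainty(patches):
    """Aggregate per-patch uncertainty to one slide-level score (mean)."""
    return sum(predictive_entropy(p) for p in patches) / len(patches)


# Toy slide with two patches and a two-member ensemble (made-up numbers).
agreeing_patch = [[1.0, 0.0], [1.0, 0.0]]     # members agree: entropy 0
disagreeing_patch = [[1.0, 0.0], [0.0, 1.0]]  # members disagree: entropy ln 2
score = slide_uncertainty([agreeing_patch, disagreeing_patch])
```

    A slide would then be flagged as possibly out-of-distribution when its score exceeds a threshold chosen on held-out data; a mean is only one option, and a maximum or upper quantile over patches would be more sensitive to small abnormal regions.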

    Emergence of Different Mating Strategies in Artificial Embodied Evolution
