
    Beyond Hebb: Exclusive-OR and Biological Learning

    Full text link
    A learning algorithm for multilayer neural networks based on biologically plausible mechanisms is studied. Motivated by findings in experimental neurobiology, we consider synaptic averaging in the induction of plasticity changes, which happen on a slower time scale than the firing dynamics. This mechanism is shown to enable learning of the exclusive-OR (XOR) problem without the aid of error back-propagation, as well as to increase the robustness of learning in the presence of noise. Comment: 4 pages RevTeX, 2 figures PostScript, revised version.
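
    As a rough illustration of the idea, the sketch below trains a tiny two-layer network on XOR using a three-factor, reward-modulated Hebbian rule: candidate updates are accumulated over a slow averaging window before being applied, standing in for the paper's synaptic averaging. The rule, the network size, and the constants eta and window are illustrative assumptions, not the authors' exact mechanism, and convergence is not guaranteed for every seed.

        import numpy as np

        rng = np.random.default_rng(0)

        # XOR in bipolar coding so Hebbian pre*post products carry sign information
        X = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
        y = np.array([-1.0, 1.0, 1.0, -1.0])

        W1 = rng.normal(0.0, 0.5, (2, 2))   # input -> hidden weights
        w2 = rng.normal(0.0, 0.5, 2)        # hidden -> output weights
        eta, window = 0.02, 40              # learning rate, averaging window

        for step in range(5000):
            dW1, dw2 = np.zeros_like(W1), np.zeros_like(w2)
            for _ in range(window):           # fast firing dynamics: accumulate only
                i = rng.integers(len(X))
                h = np.tanh(X[i] @ W1)
                out = float(np.tanh(h @ w2))
                r = y[i] * out                # scalar reinforcement, no error vector
                dW1 += r * np.outer(X[i], h)  # three-factor Hebb: reward * pre * post
                dw2 += r * h * out
            # slow plasticity: apply the averaged update once per window
            W1 += eta * dW1 / window
            w2 += eta * dw2 / window

        print(np.sign(np.tanh(np.tanh(X @ W1) @ w2)), y)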

    Is there an integrative center in the vertebrate brain-stem? A robotic evaluation of a model of the reticular formation viewed as an action selection device

    Get PDF
    Neurobehavioral data from intact, decerebrate, and neonatal rats suggest that the reticular formation provides a brainstem substrate for action selection in the vertebrate central nervous system. In this article, Kilmer, McCulloch and Blum's (1969, 1997) landmark reticular formation model is described and re-evaluated, both in simulation and, for the first time, as a mobile robot controller. Particular model configurations are found to provide effective action selection mechanisms in a robot survival task using either simulated or physical robots. The model's competence depends on the organization of afferents from model sensory systems, and a genetic algorithm search identified a class of afferent configurations with long survival times. The results support our proposal that the reticular formation evolved to provide effective arbitration between innate behaviors and, with the forebrain basal ganglia, may constitute the integrative, 'centrencephalic' core of vertebrate brain architecture. Additionally, the results demonstrate that the Kilmer et al. model provides an alternative form of robot controller to those usually considered in the adaptive behavior literature.
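
    A schematic reading of the model's consensus mechanism, not the Kilmer et al. formulation itself: independent modules, each seeing only a subset of afferents, vote for a behavioural mode, and an action is taken only when enough modules agree. The module count, wiring, and consensus threshold below are illustrative assumptions.

        import random

        random.seed(1)
        N_MODULES, N_ACTIONS, N_SENSORS = 12, 4, 8
        CONSENSUS = 0.5   # fraction of modules that must agree before acting

        # Each module sees a random subset of afferents; this wiring is what
        # the paper's genetic algorithm search varies.
        afferents = [random.sample(range(N_SENSORS), 3) for _ in range(N_MODULES)]
        weights = [[[random.uniform(-1, 1) for _ in range(N_ACTIONS)]
                    for _ in a] for a in afferents]

        def select_action(sensors):
            votes = [0] * N_ACTIONS
            for m in range(N_MODULES):
                scores = [0.0] * N_ACTIONS
                for k, s in enumerate(afferents[m]):
                    for a in range(N_ACTIONS):
                        scores[a] += weights[m][k][a] * sensors[s]
                votes[scores.index(max(scores))] += 1   # one vote per module
            best = max(range(N_ACTIONS), key=votes.__getitem__)
            # arbitrate only on consensus; otherwise defer the decision
            return best if votes[best] >= CONSENSUS * N_MODULES else None

        print(select_action([random.uniform(0, 1) for _ in range(N_SENSORS)]))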

    A two step algorithm for learning from unspecific reinforcement

    Get PDF
    We study a simple learning model based on the Hebb rule to cope with "delayed", unspecific reinforcement. In spite of the unspecific nature of the information feedback, convergence to asymptotically perfect generalization is observed, with a rate that depends, however, in a non-universal way on the learning parameters. Asymptotic convergence can be as fast as that of Hebbian learning, but may be slower. Moreover, for a certain range of parameter settings, it depends on the initial conditions whether the system reaches the regime of asymptotically perfect generalization or instead approaches a stationary state of poor generalization. Comment: 13 pages LaTeX, 4 figures; note on a biologically motivated stochastic variant of the algorithm added.
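
    A minimal sketch of one plausible reading of the two steps, under illustrative assumptions (a perceptron student, a random teacher, and majority-based batch feedback): a Hebbian update built from the network's own outputs is first stored tentatively, then committed or reversed once the delayed, unspecific error count for the batch arrives. Whether and how fast the overlap grows depends on the parameters, echoing the abstract's observation.

        import numpy as np

        rng = np.random.default_rng(0)
        N, batch = 20, 10
        teacher = rng.normal(size=N)
        teacher /= np.linalg.norm(teacher)
        w = rng.normal(size=N)

        for epoch in range(5000):
            X = rng.choice([-1.0, 1.0], size=(batch, N))
            labels = np.sign(X @ teacher)
            preds = np.sign(X @ w)
            # step 1: tentative Hebbian update from the student's own outputs
            dw = (X * preds[:, None]).sum(axis=0) / N
            # unspecific feedback: only the number of errors in the batch is given
            errors = int((preds != labels).sum())
            # step 2: commit the stored update if the batch went well, else reverse it
            w += dw if errors < batch / 2 else -dw

        print("teacher overlap:", round(float(w @ teacher) / np.linalg.norm(w), 3))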

    Active Learning in Persistent Surveillance UAV Missions

    Get PDF
    The performance of many complex UAV decision-making problems can be extremely sensitive to small errors in the model parameters. One way of mitigating this sensitivity is to design algorithms that learn the model more effectively over the course of a mission. This paper addresses this problem by considering model uncertainty in a multi-agent Markov Decision Process (MDP) and using an active learning approach to quickly learn the transition model parameters. We build on previous research that allowed UAVs to passively update model parameter estimates by incorporating new state transition observations. In this work, however, the UAVs choose to actively reduce the uncertainty in their model parameters by taking exploratory and informative actions. These actions result in faster adaptation and, by explicitly accounting for UAV fuel dynamics, also mitigate the risk of the exploration. This paper compares the nominal, passive learning approach against two methods for incorporating active learning into the MDP framework: (1) all state transitions are rewarded equally, and (2) state transition rewards are weighted according to the expected resulting reduction in the variance of the model parameter. In both cases, agent behaviors emerge that enable faster convergence of the uncertain model parameters to their true values.
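
    The two reward schemes can be contrasted in a few lines. In the hypothetical sketch below, transition counts are kept as Dirichlet pseudo-counts, and the bonus for a visited transition is either a constant (scheme 1) or the expected drop in the posterior variance of the corresponding transition parameter (scheme 2); the function names and the beta scaling are assumptions, not the paper's formulation.

        import numpy as np

        def dirichlet_var(alpha):
            """Per-component variance of the mean of a Dirichlet posterior."""
            a0 = alpha.sum()
            p = alpha / a0
            return p * (1.0 - p) / (a0 + 1.0)

        def exploration_bonus(alpha, s_next, weighted=True, beta=1.0):
            """Bonus for observing the transition into s_next.

            weighted=False rewards all state transitions equally (scheme 1);
            weighted=True scales the bonus by the expected reduction in the
            variance of the model parameter after one more count (scheme 2).
            """
            if not weighted:
                return beta
            before = dirichlet_var(alpha)[s_next]
            alpha2 = alpha.copy()
            alpha2[s_next] += 1.0
            return beta * (before - dirichlet_var(alpha2)[s_next])

        alpha = np.array([1.0, 2.0, 1.0])   # sparse counts: model still uncertain
        print(exploration_bonus(alpha, 0), exploration_bonus(alpha, 0, weighted=False))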

    Embodied imitation-enhanced reinforcement learning in multi-agent systems

    Get PDF
    Imitation is an example of social learning in which an individual observes and copies another's actions. This paper presents a new method for using imitation as a way of enhancing the learning speed of individual agents that employ a well-known reinforcement learning algorithm, namely Q-learning. Compared with other research that uses imitation with reinforcement learning, our method uses imitation of purely observed behaviours to enhance learning, with no internal state access or sharing of experiences between agents. The paper evaluates our imitation-enhanced reinforcement learning approach in both simulation and with real robots in continuous space. Both simulation and real robot experimental results show that the learning speed of the group is improved. © The Author(s) 2013
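
    In spirit, the mechanism can be grafted onto tabular Q-learning in a few lines. The sketch below is an assumption-laden simplification (the paper works with real robots in continuous space): the imitator keeps a memory of externally observed state-action pairs and occasionally copies the observed action instead of its own greedy choice, while all value updates still come from its own experience; the imitation probability p_imitate is illustrative.

        import random
        from collections import defaultdict

        random.seed(0)
        ACTIONS = [0, 1, 2, 3]
        Q = defaultdict(float)                 # (state, action) -> value
        alpha, gamma, eps, p_imitate = 0.1, 0.95, 0.1, 0.3
        demo = []   # (state, action) pairs observed on another agent

        def choose(state):
            observed = [a for (s, a) in demo if s == state]
            if observed and random.random() < p_imitate:
                return random.choice(observed)       # copy observed behaviour
            if random.random() < eps:
                return random.choice(ACTIONS)        # ordinary exploration
            return max(ACTIONS, key=lambda a: Q[(state, a)])

        def update(s, a, r, s2):
            # standard Q-learning: no access to the demonstrator's internals
            target = r + gamma * max(Q[(s2, a2)] for a2 in ACTIONS)
            Q[(s, a)] += alpha * (target - Q[(s, a)])

        demo.append((0, 2))                    # a single observed demonstration
        print(choose(0))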

    Algebraic Theory of Promise Constraint Satisfaction Problems, First Steps

    Full text link
    What makes a computational problem easy (e.g., in P, that is, solvable in polynomial time) or hard (e.g., NP-hard)? This fundamental question now has a satisfactory answer for a quite broad class of computational problems, the so-called fixed-template constraint satisfaction problems (CSPs): it has turned out that their complexity is captured by a certain specific form of symmetry. This paper explains an extension of this theory to a much broader class of computational problems, the promise CSPs, which includes relaxed versions of CSPs such as the problem of finding a 137-coloring of a 3-colorable graph.
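
    For concreteness, the standard formal template reads as follows (in LaTeX); taking \mathbf{A} = K_3 and \mathbf{B} = K_{137} recovers the coloring problem mentioned above.

        % Fix relational structures A and B with a homomorphism A -> B.
        % The search version of the promise CSP is then:
        \mathrm{PCSP}(\mathbf{A},\mathbf{B})\colon\quad
        \text{given an instance } \mathbf{X} \text{ promised to satisfy }
        \mathbf{X}\to\mathbf{A},\ \text{find a homomorphism } \mathbf{X}\to\mathbf{B}.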

    Human Demonstrations for Fast and Safe Exploration in Reinforcement Learning

    Full text link
    Reinforcement learning is a promising framework for controlling complex vehicles with a high level of autonomy, since it does not need a dynamic model of the vehicle and it is able to adapt to changing conditions. When learning from scratch, the performance of a reinforcement learning controller may initially be poor and, for real-life applications, unsafe. In this paper the effects of using human demonstrations on the performance of reinforcement learning are investigated, using a combination of offline and online least squares policy iteration. It is found that using the human as an efficient explorer improves learning time and performance on a benchmark reinforcement learning problem. The benefit of the human demonstration is larger for problems where the human can use their understanding of the problem to explore the state space efficiently. Applied to a simplified quadrotor slung-load drop-off problem, the use of human demonstrations reduces the number of crashes during learning. As such, this paper contributes to safer and faster learning for model-free, adaptive control problems.
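
    One way to picture the sample pooling at the heart of this setup is a single LSTD-Q sweep over demonstration and online data together. The sketch below assumes a linear feature map phi(s, a) and a greedy policy callback; these names, the regularizer reg, and the plain concatenation of sample sets are illustrative, not the paper's exact offline/online combination.

        import numpy as np

        def lstdq(samples, phi, policy, gamma=0.95, reg=1e-3):
            """One least-squares Q-evaluation sweep over a pooled sample set.

            samples: (s, a, r, s2) tuples; human demonstrations and the agent's
            own online experience are simply concatenated before the call.
            phi(s, a) returns a numpy feature vector; policy(s2) the next action.
            """
            k = len(phi(*samples[0][:2]))
            A = reg * np.eye(k)           # small ridge term keeps A invertible
            b = np.zeros(k)
            for s, a, r, s2 in samples:
                f = phi(s, a)
                f2 = phi(s2, policy(s2))
                A += np.outer(f, f - gamma * f2)
                b += r * f
            return np.linalg.solve(A, b)  # weights of the linear Q approximation

        # usage sketch: w = lstdq(demo_samples + online_samples, phi, greedy_policy)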

    Major liver resection, systemic fibrinolytic activity, and the impact of tranexamic acid

    Get PDF
    The final publication is available at Elsevier via http://dx.doi.org/10.1016/j.hpb.2016.09.005 © 2016. This manuscript version is made available under the CC-BY-NC-ND 4.0 license http://creativecommons.org/licenses/by-nc-nd/4.0/
    Background: Hyperfibrinolysis may occur due to systemic inflammation or the hepatic injury that occurs during liver resection. Tranexamic acid (TXA) is an antifibrinolytic agent that decreases bleeding in various settings but has not been well studied in patients undergoing liver resection. Methods: In this prospective, phase II trial, 18 patients undergoing major liver resection were sequentially assigned to one of three cohorts: (i) Control (no TXA); (ii) TXA Dose I: 1 g bolus followed by 1 g infusion over 8 h; (iii) TXA Dose II: 1 g bolus followed by 10 mg/kg/h until the end of surgery. Serial blood samples were collected for thromboelastography (TEG), coagulation components, and TXA concentration. Results: No abnormalities in hemostatic function were identified on TEG. PAP complex levels rose to a peak of 1106 µg/L (normal 0-512 µg/L) following parenchymal transection, then returned to baseline by the morning after surgery. TXA reached stable, therapeutic concentrations early under both dosing regimens. There were no differences between patients based on TXA dose. Conclusions: There is no thromboelastographic evidence of hyperfibrinolysis in patients undergoing major liver resection. TXA does not influence the change in systemic fibrinolysis; it may reduce bleeding through a different mechanism of action.