Evolutionary Algorithms for Reinforcement Learning
There are two distinct approaches to solving reinforcement learning problems,
namely, searching in value function space and searching in policy space.
Temporal difference methods and evolutionary algorithms are well-known examples
of these approaches. Kaelbling, Littman and Moore recently provided an
informative survey of temporal difference methods. This article focuses on the
application of evolutionary algorithms to the reinforcement learning problem,
emphasizing alternative policy representations, credit assignment methods, and
problem-specific genetic operators. Strengths and weaknesses of the
evolutionary approach to reinforcement learning are presented, along with a
survey of representative applications.
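The policy-space search described above can be illustrated with a minimal, self-contained sketch. The toy task, the scalar policy parameterization, and the truncation-selection scheme below are illustrative assumptions of ours, not methods from the article: a population of policy parameters is scored by episode return, the fittest half is kept, and Gaussian mutation produces the next generation.

```python
import random

# Hypothetical toy task: an agent starts at position 0 and is rewarded for
# staying near a target position.  The "policy" is a single threshold
# parameter: step +1 while below the threshold, otherwise step -1.
def episode_return(threshold, target=5, horizon=10):
    pos, total = 0, 0.0
    for _ in range(horizon):
        pos += 1 if pos < threshold else -1
        total += -abs(target - pos)      # reward = negative distance to target
    return total

# Evolutionary policy search: evaluate, select the top half (truncation
# selection), and mutate each surviving parent into two children.
def evolve(pop_size=20, generations=50, sigma=0.5, seed=0):
    rng = random.Random(seed)
    pop = [rng.uniform(-10.0, 10.0) for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=episode_return, reverse=True)
        parents = scored[: pop_size // 2]
        pop = [p + rng.gauss(0.0, sigma) for p in parents for _ in (0, 1)]
    return max(pop, key=episode_return)

best = evolve()
```

Note that fitness here is the whole-episode return, so credit assignment happens only at the level of complete policies — one of the trade-offs of the evolutionary approach that the article discusses.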
Sequential Decision-Making for Drug Design: Towards closed-loop drug design
Drug design is a process of trial and error aimed at designing molecules that elicit a desired response toward a biological target, with the ultimate goal of finding a new medication. An estimated 10^{60} molecules are of potential interest as drugs, making it difficult to find suitable candidates. A crucial part of drug design is deciding which molecules should be experimentally tested to determine their activity toward the biological target. To test the properties of a molecule experimentally, it must first be successfully synthesized, often requiring a sequence of reactions to obtain the desired product. Machine learning can be utilized to predict the outcome of a reaction, helping to find successful reactions, but it requires data for the reaction type of interest. This thesis presents work that investigates the use of active learning to acquire training data for reaching a certain level of predictive ability in predicting whether a reaction is successful or not. However, often only a limited number of molecules can be synthesized at a time. Another line of work in this thesis therefore investigates which designed molecules should be experimentally tested, given a budget of experiments, so as to sequentially acquire new knowledge. This is formulated as a multi-armed bandit problem, and we propose an algorithm to solve it. To suggest potential drug molecules to choose from, recent advances in machine learning have also enabled the use of generative models to design novel molecules with certain predicted properties. Previous work has formulated this as a reinforcement learning problem, with success in designing and optimizing molecules with drug-like properties. This thesis presents a systematic comparison of different reinforcement learning algorithms for string-based generation of drug molecules, including a study of different ways of learning from previous and current batches of samples during the iterative generation.
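The budgeted selection problem can be illustrated with a standard bandit algorithm. The sketch below uses UCB1 on synthetic per-molecule success probabilities — a textbook baseline, not the algorithm proposed in the thesis, and the setup (Bernoulli activity outcomes, fixed arms) is our own simplifying assumption:

```python
import math
import random

# Hypothetical setup: each "arm" is a candidate molecule, and pulling an
# arm spends one experiment, which reports active (1) or inactive (0)
# with an unknown per-molecule probability.  UCB1 balances exploring
# rarely tested molecules against re-testing promising ones.
def ucb1(success_probs, budget, seed=0):
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms          # experiments spent per molecule
    rewards = [0.0] * n_arms       # observed activity per molecule
    for t in range(budget):
        if t < n_arms:             # test every molecule once first
            arm = t
        else:                      # then pick by upper confidence bound
            arm = max(range(n_arms),
                      key=lambda a: rewards[a] / counts[a]
                      + math.sqrt(2.0 * math.log(t) / counts[a]))
        outcome = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        rewards[arm] += outcome
    return counts

# The most active molecule (p = 0.8) should receive most of the budget.
counts = ucb1([0.1, 0.3, 0.8], budget=300)
```

The confidence term shrinks as a molecule accumulates experiments, so the budget concentrates on molecules whose observed activity remains competitive — the exploration/exploitation trade-off that the multi-armed bandit formulation captures.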
Transient Reward Approximation for Continuous-Time Markov Chains
We are interested in the analysis of very large continuous-time Markov chains
(CTMCs) with many distinct rates. Such models arise naturally in the context of
reliability analysis, e.g., of computer network performability analysis, of
power grids, of computer virus vulnerability, and in the study of crowd
dynamics. We use abstraction techniques together with novel algorithms for the
computation of bounds on the expected final and accumulated rewards in
continuous-time Markov decision processes (CTMDPs). These ingredients are
combined in a partly symbolic and partly explicit (symblicit) analysis
approach. In particular, we circumvent the use of multi-terminal decision
diagrams, because the latter do not work well if facing a large number of
different rates. We demonstrate the practical applicability and efficiency of
the approach on two case studies. Comment: Accepted for publication in IEEE Transactions on Reliability.
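The quantity being bounded can be made concrete on a toy example. The sketch below computes the exact expected reward at time t for a small explicit CTMC by uniformization — a textbook method, not the paper's symblicit abstraction approach, and the example chain is our own:

```python
import math

# Toy illustration: expected reward at time t in a small explicit CTMC.
# Q is the generator matrix (rows sum to 0), r assigns a reward to each
# state, p0 is the initial distribution.  Assumes at least one state has
# a positive exit rate.
def expected_final_reward(Q, r, p0, t, eps=1e-9):
    n = len(Q)
    lam = max(-Q[i][i] for i in range(n))         # uniformization rate
    # Uniformized DTMC kernel P = I + Q / lam.
    P = [[(1.0 if i == j else 0.0) + Q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = list(p0)                                 # p0 @ P^k, starting at k = 0
    transient = [0.0] * n                         # Poisson-weighted sum of pi's
    k, poisson, cum = 0, math.exp(-lam * t), 0.0  # Poisson(lam * t) weights
    while cum < 1.0 - eps:
        transient = [transient[i] + poisson * pi[i] for i in range(n)]
        cum += poisson
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        k += 1
        poisson *= lam * t / k
    return sum(transient[i] * r[i] for i in range(n))

# State 0 jumps to absorbing state 1 at rate 1; reward 1 in state 1.
# The expected reward at time 1 is P(absorbed by t = 1) = 1 - e^{-1}.
value = expected_final_reward(Q=[[-1.0, 1.0], [0.0, 0.0]],
                              r=[0.0, 1.0], p0=[1.0, 0.0], t=1.0)
```

Uniformization hinges on the single rate `lam`; with many distinct rates the symbolic representations the paper discusses (e.g. multi-terminal decision diagrams) blow up, which is precisely the problem its symblicit approach circumvents.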
Selectionist and Evolutionary Approaches to Brain Function: A Critical Appraisal
We consider approaches to brain dynamics and function that have been claimed to be Darwinian. These include Edelman’s theory of neuronal group selection, Changeux’s theory of synaptic selection and selective stabilization of pre-representations, Seung’s Darwinian synapse, Loewenstein’s synaptic melioration, Adam’s selfish synapse, and Calvin’s replicating activity patterns. Except for the last two, the proposed mechanisms are selectionist but not truly Darwinian, because no replicators with information transfer to copies and hereditary variation can be identified in them. All of them fit, however, a generalized selectionist framework conforming to the picture of Price’s covariance formulation, which deliberately was not specific even to selection in biology, and therefore does not imply an algorithmic picture of biological evolution. Bayesian models and reinforcement learning are formally in agreement with selection dynamics. A classification of search algorithms is shown to include Darwinian replicators (evolutionary units with multiplication, heredity, and variability) as the most powerful mechanism for search in a sparsely occupied search space. Examples are given of cases where parallel competitive search with information transfer among the units is more efficient than search without information transfer between units. Finally, we review our recent attempts to construct and analyze simple models of true Darwinian evolutionary units in the brain in terms of connectivity and activity copying of neuronal groups. Although none of the proposed neuronal replicators include miraculous mechanisms, their identification remains a challenge but also a great promise.
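The covariance formulation the abstract appeals to is Price's equation, which is not quoted there; in its standard form (our notation, not the paper's), with w_i the fitness of unit i, z_i its trait value, and bars denoting population averages:

```latex
\bar{w}\,\Delta\bar{z} \;=\; \operatorname{Cov}(w_i, z_i) \;+\; \operatorname{E}\!\left(w_i\,\Delta z_i\right)
```

The covariance term captures selection on the trait, while the expectation term captures transmission bias; nothing in the identity requires replicators, which is why it covers selectionist-but-not-Darwinian mechanisms as well.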