
    Cooperation in the iterated prisoner's dilemma is learned by operant conditioning mechanisms

    The prisoner's dilemma (PD) is the leading metaphor for the evolution of cooperative behavior in populations of selfish agents. Although cooperation in the iterated prisoner's dilemma (IPD) has been studied for over twenty years, most of this research has focused on strategies that involve nonlearned behavior. Another approach is to suppose that players' selection of the preferred reply might be reinforced in the same way that feeding animals track the best way to feed in changing, nonstationary environments. Learning mechanisms such as operant conditioning enable animals to acquire relevant characteristics of their environment in order to obtain reinforcement and avoid punishment. In this study, the role of operant conditioning in the learning of cooperation was evaluated in the PD. We found that operant mechanisms allow the learning of IPD play against other strategies. When random moves are allowed in the game, the operant learning model showed low sensitivity to this noise. On the basis of this evidence, it is suggested that operant learning might be involved in reciprocal altruism.
    Affiliation: Gutnisky, D. A. Universidad de Buenos Aires, Facultad de Ingeniería, Instituto de Ingeniería Biomédica, Argentina; Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Biología y Medicina Experimental, Argentina.
    Affiliation: Zanutto, Bonifacio Silvano. Consejo Nacional de Investigaciones Científicas y Técnicas, Instituto de Biología y Medicina Experimental, Argentina; Universidad de Buenos Aires, Facultad de Ingeniería, Instituto de Ingeniería Biomédica, Argentina.
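
    The learning mechanism described above can be illustrated with a small sketch. The following Python snippet is an assumption-laden toy, not the authors' model: a value-learning agent with a standard PD payoff matrix plays against tit-for-tat, reinforcing whichever move paid off given the opponent's previous move. All parameter values are illustrative.

        import random

        # Row player's payoff for (my move, opponent move); standard PD values assumed.
        PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

        def play_ipd(rounds=500, alpha=0.1, epsilon=0.1, seed=0):
            rng = random.Random(seed)
            # Action values conditioned on the opponent's previous move (the "state").
            q = {(s, a): 0.0 for s in ("C", "D") for a in ("C", "D")}
            my_last, opp_last = "C", "C"   # assume mutual cooperation before round 1
            total = 0
            for _ in range(rounds):
                state = opp_last
                # Epsilon-greedy choice: mostly emit the more strongly reinforced action.
                if rng.random() < epsilon:
                    action = rng.choice(("C", "D"))
                else:
                    action = max(("C", "D"), key=lambda a: q[(state, a)])
                opp_action = my_last       # tit-for-tat copies the learner's last move
                reward = PAYOFF[(action, opp_action)]
                # Operant-style update: the emitted action is reinforced by its payoff.
                q[(state, action)] += alpha * (reward - q[(state, action)])
                my_last, opp_last = action, opp_action
                total += reward
            return q, total

        if __name__ == "__main__":
            values, score = play_ipd()
            print(values, score)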

    A step towards a reinforcement learning de novo genome assembler

    The use of reinforcement learning has proven very promising for solving complex tasks without human supervision during the learning process. However, its successful applications are predominantly focused on fictional and entertainment problems, such as games. This work therefore aims to shed light on the application of reinforcement learning to a relevant real-world problem: genome assembly. Expanding the only approach found in the literature that addresses this problem, we carefully explored the learning of an intelligent agent, performed by the Q-learning algorithm, to understand its suitability for scenarios whose characteristics are closer to those faced by real genome projects. The improvements proposed here include changing the previously proposed reward system and adding state-space exploration optimization strategies based on dynamic pruning and mutual collaboration with evolutionary computing. These investigations were carried out on 23 new environments with larger inputs than those used previously, all of which are freely available on the internet so that the scientific community can build on this research. The results suggest consistent performance gains from the proposed improvements; however, they also demonstrate their limitations, especially those related to the high dimensionality of the state and action spaces. Finally, we outline paths that can be followed to tackle genome assembly efficiently in real scenarios, considering recent successful reinforcement learning applications, including deep reinforcement learning, from other domains dealing with high-dimensional inputs.
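
    As a rough illustration of the kind of formulation involved (a toy sketch, not the paper's environments or reward system), a tabular Q-learning agent can learn to order a handful of reads so that consecutive reads overlap as much as possible. The reads, reward definition, and hyperparameters below are assumptions.

        import random

        READS = ["ATGGC", "GGCTA", "CTAAC", "AACGT"]  # assumed toy reads

        def overlap(a, b):
            # Length of the longest suffix of a that is also a prefix of b.
            for k in range(min(len(a), len(b)), 0, -1):
                if a.endswith(b[:k]):
                    return k
            return 0

        def q_learning(episodes=2000, alpha=0.2, gamma=0.9, epsilon=0.2, seed=1):
            rng = random.Random(seed)
            Q = {}                      # (placed reads as a tuple, next read) -> value
            n = len(READS)
            for _ in range(episodes):
                placed = ()
                while len(placed) < n:
                    actions = [i for i in range(n) if i not in placed]
                    if rng.random() < epsilon:
                        a = rng.choice(actions)
                    else:
                        a = max(actions, key=lambda i: Q.get((placed, i), 0.0))
                    # Reward: overlap between the last placed read and the new one.
                    r = overlap(READS[placed[-1]], READS[a]) if placed else 0
                    nxt = placed + (a,)
                    future = 0.0
                    if len(nxt) < n:
                        future = max(Q.get((nxt, i), 0.0) for i in range(n) if i not in nxt)
                    old = Q.get((placed, a), 0.0)
                    Q[(placed, a)] = old + alpha * (r + gamma * future - old)
                    placed = nxt
            # Greedy rollout of the learned ordering.
            order = ()
            while len(order) < n:
                order += (max((i for i in range(n) if i not in order),
                              key=lambda i: Q.get((order, i), 0.0)),)
            return [READS[i] for i in order]

        if __name__ == "__main__":
            print(q_learning())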

    Social learning strategies modify the effect of network structure on group performance

    The structure of communication networks is an important determinant of the capacity of teams, organizations and societies to solve policy, business and science problems. Yet previous studies have reached contradictory results about the relationship between network structure and performance, finding support for the superiority of both well-connected, efficient and poorly connected, inefficient network structures. Here we argue that understanding how communication networks affect group performance requires taking into consideration the social learning strategies of individual team members. We show that efficient networks outperform inefficient networks when individuals rely on conformity, copying the most frequent solution among their contacts. However, inefficient networks are superior when individuals follow the best member, copying the group member with the highest payoff. In addition, groups relying on conformity based on a small sample of others excel at complex tasks, while groups following the best member achieve the greatest performance on simple tasks. Our findings reconcile contradictory results in the literature and have broad implications for the study of social learning across disciplines.
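
    The two strategies contrasted above are easy to sketch. The toy simulation below (all details assumed: ring network, payoff landscape, exploration rate) lets agents update either by conformity, copying the most frequent solution among their contacts, or by best-member copying, imitating the highest-payoff contact.

        import random
        from collections import Counter

        def payoff(x):
            # Assumed rugged one-dimensional landscape with local optima.
            return (x % 7) + (3 if x % 13 == 0 else 0)

        def simulate(strategy="conformity", n=30, steps=50, seed=0):
            rng = random.Random(seed)
            solutions = [rng.randrange(100) for _ in range(n)]
            for _ in range(steps):
                new = []
                for i in range(n):
                    neighbours = [solutions[(i + d) % n] for d in (-2, -1, 1, 2)]
                    if strategy == "conformity":
                        # Copy the most frequent solution among contacts.
                        choice = Counter(neighbours).most_common(1)[0][0]
                    else:
                        # Copy the contact with the highest payoff.
                        choice = max(neighbours, key=payoff)
                    # Occasional individual exploration keeps the search going.
                    if rng.random() < 0.1:
                        choice = rng.randrange(100)
                    new.append(choice)
                solutions = new
            return max(payoff(s) for s in solutions)

        if __name__ == "__main__":
            print("conformity:", simulate("conformity"))
            print("best member:", simulate("best_member"))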

    Adaptive Investment Strategies For Periodic Environments

    In this paper, we present an adaptive investment strategy for environments with periodic returns on investment. In our approach, we consider an investment model where the agent decides at every time step the proportion of wealth to invest in a risky asset, keeping the rest of the budget in a risk-free asset. Every investment is evaluated in the market via a stylized return-on-investment (RoI) function, modeled by a stochastic process with unknown periodicities and levels of noise. For comparison, we present two reference strategies representing agents with zero knowledge and complete knowledge of the dynamics of the returns. We also consider an investment strategy based on technical analysis that forecasts the next return by fitting a trend line to previously received returns. To assess the performance of the different strategies, we run computer experiments to calculate the average budget they obtain over a certain number of time steps. To ensure fair comparisons, we first tune the parameters of each strategy and then compare the performance of these strategies for RoIs with different periodicities and levels of noise.
    Comment: Paper submitted to Advances in Complex Systems (November 2007); 22 pages, 9 figures.
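
    A minimal sketch of the trend-line idea, under assumed parameters and a sinusoidal RoI (not the paper's exact model): fit a least-squares line to a window of past returns, forecast the next return, and invest in the risky asset only when the forecast is positive.

        import math
        import random

        def roi(t, period=20, amplitude=0.05, noise=0.02, rng=None):
            # Assumed periodic return on investment with Gaussian noise.
            eps = rng.gauss(0.0, noise) if rng else 0.0
            return amplitude * math.sin(2 * math.pi * t / period) + eps

        def slope(ys):
            # Least-squares slope of ys against 0..len(ys)-1 (the trend line).
            n = len(ys)
            xm, ym = (n - 1) / 2, sum(ys) / n
            num = sum((x - xm) * (y - ym) for x, y in enumerate(ys))
            den = sum((x - xm) ** 2 for x in range(n))
            return num / den

        def run(steps=200, window=10, seed=0):
            rng = random.Random(seed)
            budget, history = 1.0, []
            for t in range(steps):
                r = roi(t, rng=rng)
                if len(history) >= window:
                    forecast = history[-1] + slope(history[-window:])
                    fraction = 1.0 if forecast > 0 else 0.0  # invest only on a positive forecast
                else:
                    fraction = 0.5  # assumed default before enough history exists
                budget *= 1.0 + fraction * r  # risk-free part assumed to pay zero
                history.append(r)
            return budget

        if __name__ == "__main__":
            print("final budget:", run())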

    Evolving Inborn Knowledge For Fast Adaptation in Dynamic POMDP Problems

    Rapid online adaptation to changing tasks is an important problem in machine learning and, recently, a focus of meta-reinforcement learning. However, reinforcement learning (RL) algorithms struggle in POMDP environments because the state of the system, essential in an RL framework, is not always visible. Additionally, hand-designed meta-RL architectures may not include suitable computational structures for specific learning problems. By contrast, the evolution of online learning mechanisms can incorporate learning strategies into an agent that (i) evolves memory when required and (ii) optimizes adaptation speed for specific online learning problems. In this paper, we exploit the highly adaptive nature of neuromodulated neural networks to evolve a controller that uses the latent space of an autoencoder in a POMDP. Analysis of the evolved networks reveals the ability of the proposed algorithm to acquire inborn knowledge in a variety of aspects, such as detecting cues that reveal implicit rewards and evolving location neurons that help with navigation. The integration of inborn knowledge and online plasticity enabled fast adaptation and better performance in comparison with some non-evolutionary meta-reinforcement learning algorithms. The algorithm also proved successful in the 3D gaming environment Malmo Minecraft.
    Comment: 9 pages. Accepted as a full paper at the Genetic and Evolutionary Computation Conference (GECCO 2020).
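
    A minimal sketch of the neuromodulated plasticity ingredient (an assumption-based illustration, not the paper's architecture): weights change online through a Hebbian term gated by a modulatory signal, so learning can be switched on or off at each time step, while evolution would set the initial weights and plasticity rate.

        import numpy as np

        class PlasticLayer:
            def __init__(self, n_in, n_out, eta=0.05, seed=0):
                rng = np.random.default_rng(seed)
                self.w = rng.normal(0.0, 0.1, (n_out, n_in))  # "inborn" weights (evolvable)
                self.eta = eta                                 # plasticity rate (evolvable)

            def step(self, x, modulation):
                y = np.tanh(self.w @ x)
                # Neuromodulated Hebbian update: applied only when modulation is nonzero.
                self.w += self.eta * modulation * np.outer(y, x)
                return y

        # Usage: a reward-like signal gates plasticity during the agent's lifetime.
        layer = PlasticLayer(n_in=4, n_out=2)
        obs = np.array([0.5, -0.2, 0.1, 0.9])
        out = layer.step(obs, modulation=1.0)   # learning enabled this step
        out = layer.step(obs, modulation=0.0)   # weights frozen this step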

    Final report of work-with-IT: the JISC study into evolution of working practices

    Technology is increasingly being used to underpin business processes across teaching and learning, research, knowledge exchange and business support activities in both higher education (HE) and further education (FE). The introduction of technology has a significant impact on the working practices of staff, often requiring them to work in a radically different way. Change in any situation can be unsettling and problematic and, where not effectively managed, can lead to poor service or functionality and disenfranchised staff. These issues can have a direct impact on institutional effectiveness, reputation and the resulting student experience. The Work-with-IT project, based at the University of Strathclyde, sought to examine changes to working practices across HE and FE, the impact on staff roles and relationships, and the new skill sets required to meet these changes.

    Hyper-learning for population-based incremental learning in dynamic environments

    This article is posted here with permission from IEEE. Copyright © 2009 IEEE.
    The population-based incremental learning (PBIL) algorithm is a combination of evolutionary optimization and competitive learning. Recently, the PBIL algorithm has been applied to dynamic optimization problems. This paper investigates the effect of the learning rate, a key parameter of PBIL, on the performance of PBIL in dynamic environments. A hyper-learning scheme is proposed for PBIL, where the learning rate is temporarily raised whenever the environment changes. The hyper-learning scheme can be combined with other approaches, e.g., the restart and hypermutation schemes, for PBIL in dynamic environments. Based on a series of dynamic test problems, experiments are carried out to investigate the effect of different learning rates and of the proposed hyper-learning scheme, in combination with restart and hypermutation schemes, on the performance of PBIL. The experimental results show that the learning rate has a significant impact on the performance of the PBIL algorithm in dynamic environments, and that the effect of the proposed hyper-learning scheme depends on the environmental dynamics and on the other schemes combined with the PBIL algorithm.
    The work by Shengxiang Yang was supported by the Engineering and Physical Sciences Research Council (EPSRC) of the United Kingdom under Grant EP/E060722/1.
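
    A hedged sketch of PBIL with the hyper-learning idea: the probability vector is pulled toward the best sampled individual each generation, and the learning rate is temporarily raised for a few generations after an environment change. The fitness function and all parameter values below are illustrative assumptions.

        import random

        def pbil(fitness, n_bits=20, pop=50, base_lr=0.1, hyper_lr=0.5,
                 hyper_gens=5, generations=200, change_at=(100,), seed=0):
            rng = random.Random(seed)
            p = [0.5] * n_bits                 # probability of a 1 at each position
            boost_left = 0
            for g in range(generations):
                if g in change_at:             # environment change detected
                    boost_left = hyper_gens    # trigger hyper-learning for a few generations
                lr = hyper_lr if boost_left > 0 else base_lr
                boost_left = max(0, boost_left - 1)
                samples = [[1 if rng.random() < pi else 0 for pi in p] for _ in range(pop)]
                best = max(samples, key=lambda s: fitness(s, g))
                # Competitive-learning style update toward the best individual.
                p = [(1 - lr) * pi + lr * bi for pi, bi in zip(p, best)]
            return p

        # Example dynamic fitness: the target bit value flips when the environment changes.
        def onemax_dynamic(bits, gen):
            target = 1 if gen < 100 else 0
            return sum(1 for b in bits if b == target)

        if __name__ == "__main__":
            print(pbil(onemax_dynamic))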