
    Modelling collective learning in design

    In this paper, a model of collective learning in design is developed in the context of team design. It explains that a team design activity uses input knowledge, environmental information, and design goals to produce output knowledge. A collective learning activity uses input knowledge from different agents and produces learned knowledge through a process of knowledge acquisition and transformation between agents, which may be triggered by learning goals and rationale triggers. Different forms of collective learning were observed with respect to agent interactions, the goal(s) of learning, and the involvement of an agent. Three types of links between team design and collective learning were identified, namely teleological, rationale, and epistemic. Hypotheses about collective learning were formulated based upon existing theories and models in design and learning and tested using a protocol analysis approach. The model of collective learning in design is derived from the test results. The proposed model can be used as a basis for developing agent-based learning systems in design. In the future, collective learning between design teams, the links between collective learning and creativity, and computational support for collective learning can be investigated.
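    As a rough illustration of the knowledge flow described above, the sketch below (Python) models a team design activity and a collective learning activity as simple data structures; the class names, fields, and derivation rules are assumptions for illustration, not the paper's model.

    from dataclasses import dataclass, field

    # Illustrative sketch of the knowledge flow described in the abstract;
    # class names, fields, and update rules are assumptions, not the paper's model.

    @dataclass
    class Agent:
        name: str
        knowledge: set = field(default_factory=set)

    @dataclass
    class TeamDesignActivity:
        """Uses input knowledge, environmental information, and design goals
        to produce output knowledge (placeholder derivation)."""
        design_goals: set

        def run(self, agents, environment_info):
            input_knowledge = set().union(*(a.knowledge for a in agents))
            return input_knowledge | set(environment_info) | self.design_goals

    @dataclass
    class CollectiveLearningActivity:
        """Acquires and transforms knowledge between agents, possibly triggered
        by learning goals or rationale triggers."""
        learning_goals: set = field(default_factory=set)

        def run(self, source: Agent, target: Agent):
            learned = source.knowledge - target.knowledge   # knowledge acquisition
            target.knowledge |= learned                     # knowledge transfer (trivial here)
            return learned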

    Traffic Signal Control with Communicative Deep Reinforcement Learning Agents: a Case Study

    In this work we theoretically and experimentally analyze Multi-Agent Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C), two recently proposed multi-agent reinforcement learning methods that can be applied to control traffic signals in urban areas. The two methods differ in whether the reward is calculated locally or globally and in how communication between agents is managed. We analyze the methods theoretically within the framework of non-Markov decision processes, which provides useful insights into the analysis of the algorithms. Moreover, we analyze the efficacy and robustness of the methods experimentally by testing them in two traffic areas of Bologna (Italy), simulated with the SUMO traffic simulator. The experimental results indicate that MA2C achieves the best performance in the majority of cases, outperforming the alternative method considered and displaying sufficient stability during the learning process. Comment: 41 pages, 16 figures
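    To make the local-versus-global reward distinction concrete, here is a minimal advantage actor-critic sketch in Python; the actual MA2C/IA2C architectures, their communication scheme, and the SUMO interface used in the paper are not reproduced, and all function names are placeholders.

    import numpy as np

    def a2c_update(theta_pi, theta_v, trajectory, value_fn, policy_grad, value_grad,
                   gamma=0.99, lr=1e-3):
        """One advantage actor-critic step from (state, action, reward) tuples.
        value_fn, policy_grad, and value_grad stand in for whatever function
        approximators an agent uses."""
        returns, G = [], 0.0
        for _, _, r in reversed(trajectory):
            G = r + gamma * G
            returns.append(G)
        returns.reverse()
        for (s, a, _), G in zip(trajectory, returns):
            advantage = G - value_fn(theta_v, s)               # A(s, a) = G - V(s)
            theta_pi += lr * advantage * policy_grad(theta_pi, s, a)
            theta_v  += lr * advantage * value_grad(theta_v, s)
        return theta_pi, theta_v

    # An independently trained agent would fill its trajectory with a local
    # reward, e.g. the negated queue length at its own intersection, while a
    # globally rewarded variant would use the sum over all intersections.
    def local_reward(queue_lengths, agent_id):
        return -queue_lengths[agent_id]

    def global_reward(queue_lengths, agent_id):
        return -float(np.sum(queue_lengths))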

    Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian

    Offline reinforcement learning (RL), which refers to decision-making from a previously collected dataset of interactions, has received significant attention over the past years. Much effort has focused on improving offline RL practicality by addressing the prevalent issue of partial data coverage through various forms of conservative policy learning. While the majority of algorithms do not have finite-sample guarantees, several provable conservative offline RL algorithms are designed and analyzed within the single-policy concentrability framework that handles partial coverage. Yet, in the nonlinear function approximation setting where confidence intervals are difficult to obtain, existing provable algorithms suffer from computational intractability, prohibitively strong assumptions, and suboptimal statistical rates. In this paper, we leverage the marginalized importance sampling (MIS) formulation of RL and present the first set of offline RL algorithms that are statistically optimal and practical under general function approximation and single-policy concentrability, bypassing the need for uncertainty quantification. We identify that the key to successfully solving the sample-based approximation of the MIS problem is ensuring that certain occupancy validity constraints are nearly satisfied. We enforce these constraints by a novel application of the augmented Lagrangian method and prove the following result: with the MIS formulation, the augmented Lagrangian is enough for statistically optimal offline RL. In stark contrast to prior algorithms that induce additional conservatism through methods such as behavior regularization, our approach provably eliminates this need and reinterprets regularizers as "enforcers of occupancy validity" rather than "promoters of conservatism." Comment: 49 pages, 1 figure
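    A minimal tabular sketch of the augmented Lagrangian idea is given below (Python): importance weights w are pushed to maximize reward while dual variables and a quadratic penalty enforce the occupancy validity (Bellman flow) constraints. The losses, function classes, and update rules here are simplified placeholders, not the algorithms analyzed in the paper.

    import numpy as np

    def flow_violation(w, d_D, P, mu0, gamma):
        """Constraint residual c(w): how far w * d_D is from a valid discounted
        occupancy measure (one entry per state).  Shapes: w, d_D are (S, A);
        P is (S, A, S); mu0 is (S,)."""
        d = w * d_D                                            # candidate occupancy
        inflow = (1 - gamma) * mu0 + gamma * np.einsum('sat,sa->t', P, d)
        return d.sum(axis=1) - inflow

    def augmented_lagrangian_step(w, lam, d_D, r, P, mu0, gamma, rho=1.0, lr=1e-2):
        """One ascent step on w (finite-difference gradient, for clarity only)
        and one dual update on lam for
        L(w, lam) = <w * d_D, r> - <lam, c(w)> - (rho / 2) * ||c(w)||^2."""
        def L(w_):
            c = flow_violation(w_, d_D, P, mu0, gamma)
            return np.sum(w_ * d_D * r) - lam @ c - 0.5 * rho * np.sum(c ** 2)

        grad, eps = np.zeros_like(w), 1e-5
        for idx in np.ndindex(*w.shape):
            w_eps = w.copy()
            w_eps[idx] += eps
            grad[idx] = (L(w_eps) - L(w)) / eps
        w = np.clip(w + lr * grad, 0.0, None)                  # ratios stay nonnegative
        lam = lam + rho * flow_violation(w, d_D, P, mu0, gamma)
        return w, lam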

    Approximate universal artificial intelligence and self-play learning for games

    This thesis is split into two independent parts. The first is an investigation of some practical aspects of Marcus Hutter's Universal Artificial Intelligence theory. The main contributions are to show how a very general agent can be built and analysed using the mathematical tools of this theory. Before the work presented in this thesis, it was an open question as to whether this theory was of any relevance to reinforcement learning practitioners. This work suggests that it is indeed relevant and worthy of future investigation. The second part of this thesis looks at self-play learning in two-player, deterministic, adversarial, turn-based games. The main contribution is the introduction of a new technique for training the weights of a heuristic evaluation function from data collected by classical game-tree search algorithms. This method is shown to outperform previous self-play training routines based on Temporal Difference learning when applied to the game of Chess. In particular, the main highlight was using this technique to construct a Chess program that learnt to play master-level Chess by tuning a set of initially random weights from self-play games.
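    As a rough sketch of the second part's general idea, the Python fragment below fits a linear evaluation function toward values backed up by a game-tree search; the feature extractor, the search routine, and the exact update rule used in the thesis are placeholders here.

    import numpy as np

    def evaluate(weights, features):
        """Linear heuristic evaluation: V(s) = w . phi(s)."""
        return float(weights @ features)

    def update_from_search(weights, positions, search_values, featurize, lr=1e-3):
        """Move each position's static evaluation toward the value returned for
        it by a deeper game-tree search, via squared-error gradient descent."""
        for position, target in zip(positions, search_values):
            phi = featurize(position)                 # feature vector phi(s)
            error = target - evaluate(weights, phi)   # search value minus static value
            weights = weights + lr * error * phi
        return weights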

    Design agents that learn


    Learning Actions and Action Verbs from Human-Agent Interaction

    The goal of my research is to design agents that learn from human-agent interaction. Specifically, I am interested in the acquisition of procedural, conceptual, and linguistic knowledge related to novel actions from human-agent collaborative task execution.

    Asymmetric Interpretations of Positive and Negative Human Feedback for a Social Learning Agent

    The ability for people to interact with robots and teach them new skills will be crucial to the successful application of robots in everyday human environments. In order to design agents that learn efficiently and effectively from such instruction, it is important to understand how people who are not experts in Machine Learning or robotics will try to teach social robots. In prior work we have shown that human trainers use positive and negative feedback differentially when interacting with a Reinforcement Learning agent. In this paper we present experiments and implementations on two platforms, a robotic platform and a computer game platform, that explore the multiple communicative intents of positive and negative feedback from a human partner, in particular that negative feedback is both about the past and about intentions for future action.
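    The asymmetry described above can be illustrated with a small sketch (Python): positive feedback only reinforces the last action, while negative feedback both penalizes it and biases the agent away from repeating it on the next step. This is not the authors' implementation; the class, names, and update rule are assumptions.

    class FeedbackAgent:
        """Toy agent that interprets human feedback asymmetrically."""

        def __init__(self, actions, lr=0.1):
            self.q = {a: 0.0 for a in actions}   # feedback-shaped action values
            self.lr = lr
            self.suppressed = set()              # actions to avoid on the next step

        def act(self):
            candidates = [a for a in self.q if a not in self.suppressed] or list(self.q)
            self.suppressed.clear()
            self.last_action = max(candidates, key=lambda a: self.q[a])
            return self.last_action

        def receive_feedback(self, value):
            # About the past: reward or penalize the action just taken.
            self.q[self.last_action] += self.lr * value
            # About the future: negative feedback also discourages immediately
            # repeating the same action.
            if value < 0:
                self.suppressed.add(self.last_action)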