Modelling collective learning in design
In this paper, a model of collective learning in design is developed in the context of team design. It explains that a team design activity uses input knowledge, environmental information, and design goals to produce output knowledge. A collective learning activity uses input knowledge from different agents and produces learned knowledge through a process of knowledge acquisition and transformation between agents, which may be triggered by learning goals and rationale triggers. Different forms of collective learning were observed with respect to agent interactions, the goal(s) of learning, and the involvement of individual agents. Three types of links between team design and collective learning were identified: teleological, rationale, and epistemic. Hypotheses about collective learning were formulated based on existing theories and models in design and learning, and were tested using a protocol analysis approach. The model of collective learning in design is derived from the test results. The proposed model can serve as a basis for developing agent-based learning systems in design. Future work may investigate collective learning between design teams, the links between collective learning and creativity, and computational support for collective learning.
Traffic Signal Control with Communicative Deep Reinforcement Learning Agents: a Case Study
In this work we theoretically and experimentally analyze Multi-Agent
Advantage Actor-Critic (MA2C) and Independent Advantage Actor-Critic (IA2C),
two recently proposed multi-agent reinforcement learning methods that can be
applied to control traffic signals in urban areas. The two methods differ in
their use of a reward calculated locally or globally and in the management of
agents' communication. We analyze the methods theoretically within the framework of non-Markov decision processes, which provides useful insights into the behavior of the algorithms. Moreover, we analyze the efficacy and robustness of the methods experimentally by testing them in two traffic areas of Bologna (Italy), simulated with SUMO, an open-source traffic simulation tool. The
experimental results indicate that MA2C achieves the best performance in the
majority of cases, outperforms the alternative method considered, and displays
sufficient stability during the learning process.
Comment: 41 pages, 16 figures
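The local/global reward distinction between the two methods can be sketched minimally. In this toy version (the agent names, queue penalties, and spatial discount factor are illustrative, not the paper's actual setup), an IA2C-style agent optimizes only its own intersection's reward, while an MA2C-style agent also receives discounted rewards from neighboring intersections:

```python
def local_reward(queues, agent):
    # IA2C-style: each agent sees only its own intersection's queue penalty.
    return -queues[agent]

def global_reward(queues, neighbors, agent, alpha=0.75):
    # MA2C-style: own penalty plus spatially discounted neighbor penalties,
    # so an agent is also rewarded for easing congestion nearby.
    r = -queues[agent]
    for n in neighbors[agent]:
        r += alpha * -queues[n]
    return r

# Three intersections in a line: A - B - C, with current queue lengths.
queues = {"A": 4.0, "B": 2.0, "C": 6.0}
neighbors = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}

print(local_reward(queues, "B"))              # -2.0
print(global_reward(queues, neighbors, "B"))  # -2.0 + 0.75*(-4.0 - 6.0) = -9.5
```

The discounted-neighbor signal gives each agent partial visibility into network-level congestion without requiring a fully centralized reward.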
Optimal Conservative Offline RL with General Function Approximation via Augmented Lagrangian
Offline reinforcement learning (RL), which refers to decision-making from a
previously-collected dataset of interactions, has received significant
attention over the past years. Much effort has focused on improving offline RL
practicality by addressing the prevalent issue of partial data coverage through
various forms of conservative policy learning. While the majority of algorithms
do not have finite-sample guarantees, several provable conservative offline RL
algorithms are designed and analyzed within the single-policy concentrability
framework that handles partial coverage. Yet, in the nonlinear function
approximation setting where confidence intervals are difficult to obtain,
existing provable algorithms suffer from computational intractability,
prohibitively strong assumptions, and suboptimal statistical rates. In this
paper, we leverage the marginalized importance sampling (MIS) formulation of RL
and present the first set of offline RL algorithms that are statistically
optimal and practical under general function approximation and single-policy
concentrability, bypassing the need for uncertainty quantification. We identify
that the key to successfully solving the sample-based approximation of the MIS
problem is ensuring that certain occupancy validity constraints are nearly
satisfied. We enforce these constraints by a novel application of the augmented
Lagrangian method and prove the following result: with the MIS formulation,
augmented Lagrangian is enough for statistically optimal offline RL. In stark
contrast to prior algorithms that induce additional conservatism through
methods such as behavior regularization, our approach provably eliminates this
need and reinterprets regularizers as "enforcers of occupancy validity" rather
than "promoters of conservatism."
Comment: 49 pages, 1 figure
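As a rough sketch of the construction described (the notation here is mine, not necessarily the paper's): with behavior distribution \(\mu\) and marginalized importance weights \(w(s,a) \approx d^{\pi}(s,a)/\mu(s,a)\), the MIS problem and its augmented Lagrangian relaxation take the form

```latex
% MIS objective: find weights w maximizing expected reward under the data
% distribution \mu, subject to occupancy-validity (Bellman-flow) constraints:
\max_{w \ge 0} \; \mathbb{E}_{(s,a) \sim \mu}\!\left[ w(s,a)\, r(s,a) \right]
\quad \text{s.t.} \quad e(w) = 0,
% where e(w) collects the flow-conservation residuals. The augmented
% Lagrangian replaces the hard constraint with a multiplier term and a
% quadratic penalty of strength \rho:
\mathcal{L}_{\rho}(w, \lambda)
  \;=\; \mathbb{E}_{\mu}\!\left[ w\, r \right]
  \;-\; \lambda^{\top} e(w)
  \;-\; \frac{\rho}{2}\, \big\| e(w) \big\|_2^2 .
```

The abstract's claim is that driving the residuals \(e(w)\) toward zero via this penalty, rather than adding explicit pessimism, already suffices for statistical optimality.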
Approximate universal artificial intelligence and self-play learning for games
This thesis is split into two independent parts.
The first is an investigation of some practical aspects of Marcus Hutter's Universal Artificial Intelligence theory.
The main contributions are to show how a very general agent can be built and analysed using the mathematical tools of this theory.
Before the work presented in this thesis, it was an open question whether this theory was of any relevance to reinforcement learning practitioners.
This work suggests that it is indeed relevant and worthy of future investigation.
The second part of this thesis looks at self-play learning in two player, deterministic, adversarial turn-based games.
The main contribution is the introduction of a new technique for training the weights of a heuristic evaluation function from data collected by classical game tree search algorithms.
This method is shown to outperform previous self-play training routines based on Temporal Difference learning when applied to the game of Chess.
In particular, the main highlight was using this technique to construct a Chess program that learnt to play master-level Chess by tuning a set of initially random weights through self-play games.
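The core idea of fitting evaluation weights to values produced by game-tree search can be loosely sketched as follows (a toy linear evaluation, made-up features, and a fixed search value; the thesis's actual training routines are more involved):

```python
import random

def evaluate(weights, features):
    # Toy linear evaluation function: score = w . features(position).
    return sum(w * f for w, f in zip(weights, features))

def search_bootstrap_update(weights, features, search_value, lr=0.01):
    # Gradient step on the squared error between the static evaluation and
    # the (deeper, more accurate) value returned by game-tree search.
    error = search_value - evaluate(weights, features)
    return [w + lr * error * f for w, f in zip(weights, features)]

random.seed(0)
weights = [random.uniform(-1, 1) for _ in range(3)]  # initially random, as in the thesis
features = [1.0, -2.0, 0.5]   # e.g. material, mobility, king safety (illustrative)
search_value = 0.3            # value a minimax/alpha-beta search might return

for _ in range(200):
    weights = search_bootstrap_update(weights, features, search_value)

print(round(evaluate(weights, features), 3))  # converges toward 0.3
```

Rather than waiting for a game outcome as in Temporal Difference self-play, each position's search result serves as an immediate regression target for the static evaluator.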
Learning Actions and Action Verbs from Human-Agent Interaction
The goal of my research is to design agents that learn from human-agent interaction. Specifically, I am interested in the acquisition of procedural, conceptual, and linguistic knowledge related to novel actions from human-agent collaborative task execution.
Asymmetric Interpretations of Positive and Negative Human Feedback for a Social Learning Agent
Abstract — The ability for people to interact with robots and teach them new skills will be crucial to the successful application of robots in everyday human environments. To design agents that learn efficiently and effectively from instruction, it is important to understand how people who are not experts in Machine Learning or robotics will try to teach social robots. In prior work we showed that human trainers use positive and negative feedback differentially when interacting with a Reinforcement Learning agent. In this paper we present experiments and implementations on two platforms, a robotic platform and a computer game platform, that explore the multiple communicative intents of positive and negative feedback from a human partner, in particular that negative feedback is both about the past and about intentions for future action.
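The asymmetry described above can be sketched as a toy learner (the class, actions, and update rule are mine, not the paper's implementation): positive feedback only reinforces the last action, while negative feedback both penalizes it and signals an intention to avoid it on the next step.

```python
class FeedbackLearner:
    """Toy agent with asymmetric handling of human feedback."""

    def __init__(self, actions):
        self.values = {a: 0.0 for a in actions}
        self.avoid = None  # action the trainer has just vetoed

    def give_feedback(self, last_action, reward):
        # Both signs update the value of the past action...
        self.values[last_action] += 0.5 * reward
        if reward < 0:
            # ...but negative feedback is also about future intentions:
            # do not repeat the vetoed action on the next step.
            self.avoid = last_action
        else:
            self.avoid = None

    def choose(self):
        candidates = {a: v for a, v in self.values.items() if a != self.avoid}
        return max(candidates, key=candidates.get)

agent = FeedbackLearner(["wave", "sit", "spin"])
agent.give_feedback("spin", -1.0)  # trainer disapproves of "spin"
print(agent.choose())              # "spin" is excluded from the next choice
```

A purely symmetric learner would treat the negative signal only as a value update, missing the "don't do that next" intent that the paper attributes to human trainers.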