Reinforcement Learning for Value Alignment
As autonomous agents become increasingly sophisticated and we allow them to perform more complex tasks, it is of utmost importance to guarantee that they will act in alignment with human values. The AI literature calls this the value alignment problem. Current approaches apply reinforcement learning to align agents with values because of its recent successes at solving complex sequential decision-making problems. However, they follow an agent-centric approach: they expect the agent to apply the reinforcement learning algorithm correctly to learn an ethical behaviour, without formal guarantees that the learnt behaviour will in fact be ethical. This thesis proposes a novel environment-designer approach for solving the value alignment problem with theoretical guarantees.
Our proposed environment-designer approach advances the state of the art with a process for designing ethical environments wherein it is in the agent's best interest to learn ethical behaviours. Our process specifies the ethical knowledge of a moral value in terms that can be used in a reinforcement learning context. Next, our process embeds this knowledge in the agent's learning environment to design an ethical learning environment. The resulting ethical environment incentivises the agent to learn an ethical behaviour while pursuing its own objective.
We further contribute to the state of the art by providing a novel algorithm that, following our ethical environment design process, is formally guaranteed to create ethical environments. In other words, this algorithm guarantees that it is in the agent's best interest to learn value-aligned behaviours.
We illustrate our algorithm by applying it in a case study environment wherein the agent is expected to learn to behave in alignment with the moral value of respect. In it, a conversational agent is in charge of conducting surveys, and we expect it to ask the users questions respectfully while trying to get as much information as possible. In the designed ethical environment, the experiments confirm our theoretical results: the agent learns an ethical behaviour while pursuing its individual objective.
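The idea of embedding ethical knowledge in the learning environment can be illustrated with a minimal sketch. It assumes a one-step decision problem in which each action carries an individual reward and a separate ethical reward, combined as r_ind + w * r_eth. The action names, reward numbers, weight grid and function names are all hypothetical, and this is not necessarily the algorithm proposed in the thesis; it only shows how a large enough ethical weight can make the ethical action the agent's best choice.

```python
# Hypothetical survey actions: (individual reward, ethical reward).
# All names and numbers are illustrative assumptions, not taken from the thesis.
ACTIONS = {
    "ask_bluntly": (3.0, -1.0),   # most information, but disrespectful
    "ask_politely": (2.0, 0.0),   # slightly less information, respectful
    "skip_question": (0.0, 0.0),  # respectful, but gathers nothing
}


def best_action(ethical_weight):
    """Return the action maximising r_ind + ethical_weight * r_eth."""
    return max(
        ACTIONS,
        key=lambda name: ACTIONS[name][0] + ethical_weight * ACTIONS[name][1],
    )


def min_ethical_weight(candidates):
    """Return the smallest candidate weight whose best action is ethically optimal.

    An action counts as ethically optimal here if its ethical reward equals
    the maximum ethical reward over all actions. Returns None if no
    candidate weight achieves this.
    """
    best_eth = max(r_eth for _, r_eth in ACTIONS.values())
    for w in sorted(candidates):
        if ACTIONS[best_action(w)][1] == best_eth:
            return w
    return None


if __name__ == "__main__":
    for w in (0.0, 1.5):
        print(w, best_action(w))
    print(min_ethical_weight([0.0, 0.5, 1.0, 1.5, 2.0]))
```

With a weight of zero the agent simply maximises information; once the weight is large enough, the respectful question becomes the agent's own best option, which is the sense in which an environment can incentivise ethical behaviour.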
A Game-Theoretic Framework for Managing Risk in Multi-Agent Systems
In order for agents in multi-agent systems (MAS) to be safe, they need to take into account the risks posed by the actions of other agents. However, the dominant paradigm in game theory (GT) assumes that agents are not affected by risk from other agents and only strive to maximise their expected utility. For example, in hybrid human-AI driving systems, it is necessary to limit large deviations in reward resulting from car crashes. Although there are equilibrium concepts in game theory that take into account risk aversion, they either assume that agents are risk-neutral with respect to the uncertainty caused by the actions of other agents, or they are not guaranteed to exist. We introduce a new GT-based Risk-Averse Equilibrium (RAE) that always produces a solution that minimises the potential variance in reward accounting for the strategy of other agents. Theoretically and empirically, we show that RAE shares many properties with a Nash Equilibrium (NE), establishing convergence properties and generalising to risk-dominant NE in certain cases. To tackle large-scale problems, we extend RAE to the PSRO multi-agent reinforcement learning (MARL) framework. We empirically demonstrate the minimum reward variance benefits of RAE in matrix games with high-risk outcomes. Results on MARL experiments show that RAE generalises to risk-dominant NE in a trust dilemma game and that it reduces instances of crashing by 7x in an autonomous driving setting versus the best-performing baseline.
A Survey of Zero-shot Generalisation in Deep Reinforcement Learning
The study of zero-shot generalisation (ZSG) in deep Reinforcement Learning
(RL) aims to produce RL algorithms whose policies generalise well to novel
unseen situations at deployment time, avoiding overfitting to their training
environments. Tackling this is vital if we are to deploy reinforcement learning
algorithms in real world scenarios, where the environment will be diverse,
dynamic and unpredictable. This survey is an overview of this nascent field. We
rely on a unifying formalism and terminology for discussing different ZSG
problems, building upon previous works. We go on to categorise existing
benchmarks for ZSG, as well as current methods for tackling these problems.
Finally, we provide a critical discussion of the current state of the field,
including recommendations for future work. Among other conclusions, we argue
that taking a purely procedural content generation approach to benchmark design
is not conducive to progress in ZSG; we suggest fast online adaptation and
tackling RL-specific problems as some areas for future work on methods for ZSG;
and we recommend building benchmarks in underexplored problem settings such as
offline RL ZSG and reward-function variation.
Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation
This paper surveys the current state of the art in Natural Language
Generation (NLG), defined as the task of generating text or speech from
non-linguistic input. A survey of NLG is timely in view of the changes that the
field has undergone over the past decade or so, especially in relation to new
(usually data-driven) methods, as well as new applications of NLG technology.
This survey therefore aims to (a) give an up-to-date synthesis of research on
the core tasks in NLG and the architectures adopted in which such tasks are
organised; (b) highlight a number of relatively recent research topics that
have arisen partly as a result of growing synergies between NLG and other areas
of artificial intelligence; (c) draw attention to the challenges in NLG
evaluation, relating them to similar challenges faced in other areas of Natural
Language Processing, with an emphasis on different evaluation methods and the
relationships between them.
Comment: Published in Journal of AI Research (JAIR), volume 61, pp 75-170. 118
pages, 8 figures, 1 table
Housing supply chain model for innovation: research report
The aim of this research is to undertake a case study analysis of the successful delivery of an innovation to the Australian housing construction industry.
This study is conducted on the “innovator group”; that is, the group that created the idea of an innovation for the housing sector and then were intimately involved in creation, development and diffusion. It is apparent that there were key players involved in this process which are representative of various organisations along the supply chain – designer, developer, subcontractor and supplier.
Much rhetoric states that integration of the supply chain actors will solve construction problems. In reality, however, we know little beyond this in the Australian context, as little research has been conducted previously. This study will examine in detail the process undertaken by this particular group to deliver an innovation to the housing sector which required an integrated construction supply chain model.
This report was published by the Australian Housing Supply Chain Alliance and written by Professor Kerry London, School of Property, Construction and Project Management, RMIT University, with Research Fellow Jessica Siva.
Artificial Collective Intelligence Engineering: a Survey of Concepts and Perspectives
Collectiveness is an important property of many systems--both natural and
artificial. By exploiting a large number of individuals, it is often possible
to produce effects that go far beyond the capabilities of the smartest
individuals, or even to produce intelligent collective behaviour out of
not-so-intelligent individuals. Indeed, collective intelligence, namely the
capability of a group to act collectively in a seemingly intelligent way, is
increasingly often a design goal of engineered computational systems--motivated
by recent techno-scientific trends like the Internet of Things, swarm robotics,
and crowd computing, just to name a few. For several years, the collective
intelligence observed in natural and artificial systems has served as a source
of inspiration for engineering ideas, models, and mechanisms. Today, artificial
and computational collective intelligence are recognised research topics,
spanning various techniques, kinds of target systems, and application domains.
However, there is still a lot of fragmentation in the research panorama of the
topic within computer science, and the verticality of most communities and
contributions makes it difficult to extract the core underlying ideas and
frames of reference. The challenge is to identify, place in a common structure,
and ultimately connect the different areas and methods addressing intelligent
collectives. To address this gap, this paper considers a set of broad scoping
questions providing a map of collective intelligence research, mostly from the
point of view of computer scientists and engineers. Accordingly, it covers
preliminary notions, fundamental concepts, and the main research perspectives,
identifying opportunities and challenges for researchers on artificial and
computational collective intelligence engineering.
Comment: This is the author's final version of the article, accepted for
publication in the Artificial Life journal. Data: 34 pages, 2 figures
Learning how to act: making good decisions with machine learning
This thesis is about machine learning and statistical approaches
to decision making. How can we learn from data to anticipate the
consequence of, and optimally select, interventions or actions?
Problems such as deciding which medication to prescribe to
patients, who should be released on bail, and how much to charge
for insurance are ubiquitous, and have far reaching impacts on
our lives. There are two fundamental approaches to learning how
to act: reinforcement learning, in which an agent directly
intervenes in a system and learns from the outcome, and
observational causal inference, whereby we seek to infer the
outcome of an intervention from observing the system.
The goal of this thesis is to connect and unify these key
approaches. I introduce causal bandit problems: a synthesis that
combines causal graphical models, which were developed for
observational causal inference, with multi-armed bandit problems,
which are a subset of reinforcement learning problems that are
simple enough to admit formal analysis. I show that knowledge of
the causal structure allows us to transfer information learned
about the outcome of one action to predict the outcome of an
alternate action, yielding a novel form of structure between
bandit arms that cannot be exploited by existing algorithms. I
propose an algorithm for causal bandit problems and prove bounds
on the simple regret demonstrating it is close to minimax
optimal and better than algorithms that do not use the additional
causal information.
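Simple regret, the quantity bounded in the abstract above, can be made concrete with a minimal sketch. It assumes Bernoulli arms with known true means and a plain uniform-exploration baseline that uses no causal information; the arm means, round budget and function names are hypothetical. This is not the causal bandit algorithm proposed in the thesis, only the kind of baseline such an algorithm would be compared against.

```python
import random


def uniform_explore(true_means, rounds_per_arm, rng):
    """Pull every Bernoulli arm equally often; recommend the best empirical arm."""
    estimates = []
    for mu in true_means:
        pulls = [1.0 if rng.random() < mu else 0.0 for _ in range(rounds_per_arm)]
        estimates.append(sum(pulls) / rounds_per_arm)
    return max(range(len(true_means)), key=lambda i: estimates[i])


def simple_regret(true_means, recommended_arm):
    """Gap between the best arm's mean and the recommended arm's mean."""
    return max(true_means) - true_means[recommended_arm]


if __name__ == "__main__":
    means = [0.2, 0.5, 0.9]  # assumed true success probabilities
    rng = random.Random(0)   # seeded so the run is repeatable
    arm = uniform_explore(means, rounds_per_arm=50, rng=rng)
    print(arm, simple_regret(means, arm))
```

Simple regret only scores the final recommendation, not the rewards collected while exploring, which is why a pure exploration phase like the one above is a natural baseline.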