A Benchmark Environment Motivated by Industrial Control Problems
In the research area of reinforcement learning (RL), novel and promising
methods are frequently developed and introduced to the RL community. However,
although many researchers are keen to apply their methods to real-world
problems, implementing such methods in real industry environments is often a
frustrating and tedious process. Generally, academic research groups have only
limited access to real industrial data and applications. For this reason, new
methods are usually developed, evaluated and compared by using artificial
software benchmarks. On the one hand, these benchmarks are designed to provide
interpretable RL training scenarios and detailed insight into the learning
process of the method at hand. On the other hand, they usually do not share
much similarity with industrial real-world applications. For this reason we
used our industry experience to design a benchmark which bridges the gap
between freely available, documented, and motivated artificial benchmarks and
properties of real industrial problems. The resulting industrial benchmark (IB)
has been made publicly available to the RL community by publishing its Java and
Python code, including an OpenAI Gym wrapper, on GitHub. In this paper we
motivate and describe in detail the IB's dynamics and identify prototypic
experimental settings that capture common situations in real-world industry
control problems.
Learning Coordination in Multi-Agent Systems (Aprendizagem de coordenação em sistemas multi-agente)
The ability of an agent to coordinate with others within a system is a
valuable property in multi-agent systems. Agents either cooperate as a team
to accomplish a common goal, or adapt to opponents to complete different
goals without being exploited. Research has shown that learning multi-agent
coordination is significantly more complex than learning policies in single-agent
environments, and requires a variety of techniques to deal with the
properties of a system where agents learn concurrently. This thesis aims to
determine how machine learning can be used to achieve coordination within
a multi-agent system. It asks what techniques can be used to tackle the
increased complexity of such systems and their credit assignment challenges,
how to achieve coordination, and how to use communication to improve the
behavior of a team.
Many algorithms for competitive environments are tabular, which prevents
their use with high-dimensional or continuous state spaces, and may be
biased against specific equilibrium strategies. This thesis proposes multiple
deep learning extensions for competitive environments, allowing algorithms
to reach equilibrium strategies in complex and partially observable environments,
relying only on local information. A tabular algorithm is also extended
with a new update rule that eliminates its bias against deterministic strategies.
Current state-of-the-art approaches for cooperative environments rely
on deep learning to handle the environment’s complexity and benefit from a
centralized learning phase. Solutions that incorporate communication between
agents often prevent agents from being executed in a distributed
manner. This thesis proposes a multi-agent algorithm where agents learn
communication protocols to compensate for local partial-observability, and
remain independently executed. A centralized learning phase can incorporate
additional environment information to increase the robustness and speed with
which a team converges to successful policies. The algorithm outperforms
current state-of-the-art approaches in a wide variety of multi-agent environments.
A permutation invariant network architecture is also proposed
to increase the scalability of the algorithm to large team sizes. Further research
is needed to identify how the techniques proposed in this thesis,
for cooperative and competitive environments, can be used in unison for mixed
environments, and whether they are adequate for general artificial intelligence.
Financial support from FCT and FSE under the III Community Support Framework.
Doctoral Programme in Informatics
Fault Recovery in Swarm Robotics Systems using Learning Algorithms
When faults occur in swarm robotic systems they can have a detrimental effect on collective behaviours, to the point that failed individuals may jeopardise the swarm's ability to complete its task. Although fault tolerance is a desirable property of swarm robotic systems, fault recovery mechanisms have not yet been thoroughly explored. Individual robots may suffer a variety of faults, which will affect collective behaviours in different ways; a recovery process is therefore required that can cope with many different failure scenarios. In this thesis, we propose a novel approach for fault recovery in robot swarms that uses Reinforcement Learning and Self-Organising Maps to select the most appropriate recovery strategy for any given scenario. The learning process is evaluated in both centralised and distributed settings. Additionally, we experimentally evaluate the performance of this approach in comparison to random selection of fault recovery strategies, using simulated collective phototaxis, aggregation and foraging tasks as case studies. Our results show that this machine learning approach outperforms random selection, and allows swarm robotic systems to recover from faults that would otherwise prevent the swarm from completing its mission. This work builds upon existing research in fault detection and diagnosis in robot swarms, with the aim of creating a fully fault-tolerant swarm capable of long-term autonomy.
Security Considerations in AI-Robotics: A Survey of Current Methods, Challenges, and Opportunities
Robotics and Artificial Intelligence (AI) have been inextricably intertwined
since their inception. Today, AI-Robotics systems have become an integral part
of our daily lives, from robotic vacuum cleaners to semi-autonomous cars. These
systems are built upon three fundamental architectural elements: perception,
navigation and planning, and control. However, while the integration of
AI-Robotics systems has enhanced the quality of our lives, it has also presented a
serious problem - these systems are vulnerable to security attacks. The
physical components, algorithms, and data that make up AI-Robotics systems can
be exploited by malicious actors, potentially leading to dire consequences.
Motivated by the need to address the security concerns in AI-Robotics systems,
this paper presents a comprehensive survey and taxonomy across three
dimensions: attack surfaces, ethical and legal concerns, and Human-Robot
Interaction (HRI) security. Our goal is to provide users, developers and other
stakeholders with a holistic understanding of these areas to enhance the
overall AI-Robotics system security. We begin by surveying potential attack
surfaces and provide mitigating defensive strategies. We then delve into
ethical issues, such as dependency and psychological impact, as well as the
legal concerns regarding accountability for these systems. In addition, emerging
trends such as HRI are discussed, considering privacy, integrity, safety,
trustworthiness, and explainability concerns. Finally, we present our vision
for future research directions in this dynamic and promising field.
Application of Fuzzy State Aggregation and Policy Hill Climbing to Multi-Agent Systems in Stochastic Environments
Reinforcement learning is one of the more attractive machine learning technologies, due to its unsupervised learning structure and ability to continually learn even as the operating environment changes. Applying this learning to multiple cooperative software agents (a multi-agent system) not only allows each individual agent to learn from its own experience, but also opens up the opportunity for the individual agents to learn from the other agents in the system, thus accelerating the rate of learning. This research presents the novel use of fuzzy state aggregation, as the means of function approximation, combined with the policy hill climbing methods of Win or Lose Fast (WoLF) and policy-dynamics based WoLF (PD-WoLF). The combination of fast policy hill climbing (PHC) and fuzzy state aggregation (FSA) function approximation is tested in two stochastic environments: Tileworld and the robot soccer domain, RoboCup. The Tileworld results demonstrate that a single agent using the combination of FSA and PHC learns quicker and performs better than combined fuzzy state aggregation and Q-learning alone. Results from the RoboCup domain again illustrate that the policy hill climbing algorithms perform better than Q-learning alone in a multi-agent environment. The learning is further enhanced by allowing the agents to share their experience through a weighted strategy sharing.
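For readers unfamiliar with WoLF policy hill climbing, the following is a minimal tabular sketch of the update as it is commonly described in the literature. It is not taken from this work: it omits fuzzy state aggregation and PD-WoLF entirely, and the class name `WoLFPHC`, the method names, and all parameter values (`alpha`, `gamma`, `delta_win`, `delta_lose`) are illustrative assumptions.

```python
import random
from collections import defaultdict


class WoLFPHC:
    """Tabular WoLF-PHC learner for one agent (illustrative sketch only)."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.9,
                 delta_win=0.01, delta_lose=0.04):
        # Assumes n_actions >= 2; all default values are arbitrary choices.
        self.n = n_actions
        self.alpha, self.gamma = alpha, gamma
        self.delta_win, self.delta_lose = delta_win, delta_lose
        self.q = defaultdict(lambda: [0.0] * n_actions)
        self.pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.avg_pi = defaultdict(lambda: [1.0 / n_actions] * n_actions)
        self.visits = defaultdict(int)

    def act(self, state):
        # Sample an action from the current mixed policy.
        return random.choices(range(self.n), weights=self.pi[state])[0]

    def update(self, s, a, reward, s_next):
        q, pi, avg = self.q[s], self.pi[s], self.avg_pi[s]

        # 1. Standard Q-learning update.
        q[a] += self.alpha * (reward + self.gamma * max(self.q[s_next]) - q[a])

        # 2. Move the running average policy toward the current policy.
        self.visits[s] += 1
        for i in range(self.n):
            avg[i] += (pi[i] - avg[i]) / self.visits[s]

        # 3. "Winning" means the current policy has a higher expected value
        #    than the average policy; learn cautiously then, fast otherwise.
        value_now = sum(p * v for p, v in zip(pi, q))
        value_avg = sum(p * v for p, v in zip(avg, q))
        delta = self.delta_win if value_now > value_avg else self.delta_lose

        # 4. Hill-climb: shift probability mass toward the greedy action.
        best = max(range(self.n), key=q.__getitem__)
        moved = 0.0
        for i in range(self.n):
            if i != best:
                step = min(pi[i], delta / (self.n - 1))
                pi[i] -= step
                moved += step
        pi[best] += moved
```

A learner would call `act(state)` to choose an action and `update(s, a, reward, s_next)` after each transition. Choosing `delta_lose` larger than `delta_win` is what gives the "learn fast when losing" behaviour.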
On Specifying for Trustworthiness
As autonomous systems (AS) increasingly become part of our daily lives,
ensuring their trustworthiness is crucial. In order to demonstrate the
trustworthiness of an AS, we first need to specify what is required for an AS
to be considered trustworthy. This roadmap paper identifies key challenges for
specifying for trustworthiness in AS, as identified during the "Specifying for
Trustworthiness" workshop held as part of the UK Research and Innovation (UKRI)
Trustworthy Autonomous Systems (TAS) programme. We look across a range of AS
domains with consideration of the resilience, trust, functionality,
verifiability, security, and governance and regulation of AS and identify some
of the key specification challenges in these domains. We then highlight the
intellectual challenges that are involved with specifying for trustworthiness
in AS that cut across domains and are exacerbated by the inherent uncertainty
involved with the environments in which AS need to operate.
Comment: Accepted version of paper. 13 pages, 1 table, 1 figure