5,565 research outputs found
Simultaneous AUV navigation and tracking with satellite ASVs
Navigating Autonomous Underwater Vehicles (AUVs) presents a series of challenges, one of which is localization, because GPS signals are unavailable underwater. One solution is to use several surface vehicles equipped with hydrophones to locate and track an AUV that emits an acoustic signal. This Master's Thesis addresses this solution with the objective of optimizing the energy consumption of 2 Autonomous Surface Vehicles (ASVs) while maintaining a preferred formation geometry that reduces the uncertainty of the AUV position estimate. To control these surface vehicles, 2 Reinforcement Learning (RL) approaches were implemented: Deep Deterministic Policy Gradient (DDPG) and Multi-Agent Deep Deterministic Policy Gradient (MADDPG). The aim of using these 2 algorithms is to test whether the multi-agent element is necessary in this case. The RL algorithms control the linear and angular velocities of each robot. The training environment was created specifically for this project in Python. The reward function is designed as a weighted sum of Gaussian functions, which contains all the elements related to the optimization of energy consumption and the formation of the AUV. To analyse several aspects of the final models of each robot, a total of 4 different tests were carried out. These tests focus on the distribution of the weights in the reward function and on the ability to adapt to difficult scenarios. The tests are run as simulations. The results show that the ASVs trained with these implementations of RL modify their behaviour depending on the weight configuration and can adapt to more difficult scenarios depending on aspects of the training such as the noise applied to the actions. Moreover, the performance of DDPG and MADDPG is compared. Finally, the results are discussed in relation to similar works that have addressed this problem.
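A reward built as a weighted sum of Gaussian functions can be sketched as follows. This is only an illustration of the general idea: the term names (`speed`, `range`, `angle`), their targets, widths, and weights are hypothetical and are not taken from the thesis.

```python
import math


def gaussian(value, target, sigma):
    """Unnormalised Gaussian bump that peaks at 1.0 when value == target."""
    return math.exp(-0.5 * ((value - target) / sigma) ** 2)


def weighted_gaussian_reward(terms, weights):
    """Weighted sum of Gaussian terms.

    terms:   dict mapping a term name to (value, target, sigma)
    weights: dict mapping the same term names to their weights
    """
    return sum(
        weights[name] * gaussian(value, target, sigma)
        for name, (value, target, sigma) in terms.items()
    )


# Hypothetical weights: one energy-related term and two formation terms.
weights = {"speed": 0.3, "range": 0.3, "angle": 0.4}

# Hypothetical observation for one ASV at one time step.
terms = {
    "speed": (0.2, 0.0, 0.5),                  # low speed as an energy proxy
    "range": (48.0, 50.0, 10.0),               # ASV-to-AUV distance in metres
    "angle": (math.pi / 3, math.pi / 2, 0.3),  # angle between ASVs seen from the AUV
}

# The same terms with every value exactly on its target.
on_target = {name: (target, target, sigma) for name, (_, target, sigma) in terms.items()}

current_reward = weighted_gaussian_reward(terms, weights)
best_reward = weighted_gaussian_reward(on_target, weights)
```

With this form, the reward reaches the sum of the weights only when every term sits on its target, so changing the weights shifts which objective the agent is pushed to prioritise.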
Aprendizagem de coordenação em sistemas multi-agente
The ability for an agent to coordinate with others within a system is a
valuable property in multi-agent systems. Agents either cooperate as a team
to accomplish a common goal, or adapt to opponents to complete different
goals without being exploited. Research has shown that learning multi-agent
coordination is significantly more complex than learning policies in single-agent
environments, and requires a variety of techniques to deal with the
properties of a system where agents learn concurrently. This thesis aims to
determine how machine learning can be used to achieve coordination within
a multi-agent system. It asks what techniques can be used to tackle the
increased complexity of such systems and their credit assignment challenges,
how to achieve coordination, and how to use communication to improve the
behavior of a team.
Many algorithms for competitive environments are tabular-based, preventing
their use with high-dimensional or continuous state spaces, and may be
biased against specific equilibrium strategies. This thesis proposes multiple
deep learning extensions for competitive environments, allowing algorithms
to reach equilibrium strategies in complex and partially-observable environments,
relying only on local information. A tabular algorithm is also extended
with a new update rule that eliminates its bias against deterministic strategies.
Current state-of-the-art approaches for cooperative environments rely
on deep learning to handle the environment’s complexity and benefit from a
centralized learning phase. Solutions that incorporate communication between
agents often prevent agents from being executed in a distributed
manner. This thesis proposes a multi-agent algorithm where agents learn
communication protocols to compensate for local partial-observability, and
remain independently executed. A centralized learning phase can incorporate
additional environment information to increase the robustness and speed with
which a team converges to successful policies. The algorithm outperforms
current state-of-the-art approaches in a wide variety of multi-agent environments.
A permutation invariant network architecture is also proposed
to increase the scalability of the algorithm to large team sizes. Further research
is needed to identify how the techniques proposed in this thesis,
for cooperative and competitive environments, can be used in unison for mixed
environments, and whether they are adequate for general artificial intelligence.
The ability of an agent to coordinate with others in a system is a
valuable property in multi-agent systems. Agents either cooperate as a
team to accomplish a common goal, or adapt to opponents in order to
complete selfish goals without being exploited. Research shows that
learning multi-agent coordination is significantly more complex than
learning strategies in single-agent environments, and requires a variety
of techniques to deal with an environment where agents learn
simultaneously. This thesis seeks to determine how machine learning can
be used to achieve coordination in multi-agent systems. The document asks
which techniques can be used to tackle the greater complexity of these
systems and their credit assignment challenge, how to learn coordination,
and how to use communication to improve the behavior of a team.
Many algorithms for competitive environments are tabular, which prevents
their use with high-dimensional or continuous state spaces, and they may
be biased against specific equilibrium strategies. This thesis proposes
multiple deep learning extensions for competitive environments, allowing
algorithms to reach equilibrium strategies in complex and partially
observable environments based only on local information. A tabular
algorithm is also extended with a new update criterion that eliminates
its bias against deterministic strategies.
Current state-of-the-art solutions for cooperative environments rely on
deep learning to handle the complexity of the environment, and benefit
from a centralized learning phase. Solutions that incorporate
communication between agents often prevent those agents from being
executed in a distributed manner. This thesis proposes a multi-agent
algorithm in which agents learn communication protocols to compensate for
local partial observability, and continue to be executed in a distributed
manner. A centralized learning phase can incorporate additional
information about the environment to increase the robustness and speed
with which a team converges to successful strategies. The algorithm
outperforms current state-of-the-art approaches in a wide variety of
multi-agent environments. A permutation-invariant network architecture is
also proposed to increase the scalability of the algorithm to large
teams. Further research is needed to identify how the techniques proposed
in this thesis, for cooperative and competitive environments, can be used
together for mixed environments, and to ascertain whether they are
adequate for general artificial intelligence.
Financial support from FCT and FSE under the III Quadro Comunitário de Apoio.
Programa Doutoral em Informátic
Building Machines That Learn and Think Like People
Recent progress in artificial intelligence (AI) has renewed interest in
building systems that learn and think like people. Many advances have come from
using deep neural networks trained end-to-end in tasks such as object
recognition, video games, and board games, achieving performance that equals or
even beats humans in some respects. Despite their biological inspiration and
performance achievements, these systems differ from human intelligence in
crucial ways. We review progress in cognitive science suggesting that truly
human-like learning and thinking machines will have to reach beyond current
engineering trends in both what they learn, and how they learn it.
Specifically, we argue that these machines should (a) build causal models of
the world that support explanation and understanding, rather than merely
solving pattern recognition problems; (b) ground learning in intuitive theories
of physics and psychology, to support and enrich the knowledge that is learned;
and (c) harness compositionality and learning-to-learn to rapidly acquire and
generalize knowledge to new tasks and situations. We suggest concrete
challenges and promising routes towards these goals that can combine the
strengths of recent neural network advances with more structured cognitive
models.
Comment: In press at Behavioral and Brain Sciences. Open call for commentary
proposals (until Nov. 22, 2016).
https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar
Locomotion Optimization of Photoresponsive Small-scale Robot: A Deep Reinforcement Learning Approach
Soft robots consist of elastic and flexible structures, and actuatable soft materials are often used to make them respond to stimuli. They can be remotely controlled with different kinds of external stimuli, which is beneficial for designing small-scale devices. Among stimuli-responsive materials, liquid crystal networks (LCNs) have gained significant attention for soft small-scale robots in the past decade because they can be stimulated and actuated by light, which is a clean energy source, can transfer energy remotely, is easily available, and is amenable to sophisticated control.
One of the persistent challenges in photoresponsive robotics is to produce controllable autonomous locomotion behavior. In this Thesis, different types of photoresponsive soft robots were used to realize light-powered locomotion, and an artificial intelligence-based approach was developed for controlling the movement. A robot tracking system, including an automatic laser steering function, was built for efficient robotic feature detection and steering the laser beam automatically to desired locations. Another robot prototype, a swimmer robot, driven by the automatically steered laser beam, showed directional movements including some degree of uncertainty and randomness in their locomotion behavior.
A novel approach is developed to deal with the challenges related to the locomotion of photoresponsive swimmer robots. Machine learning, in particular a deep reinforcement learning method, was applied to develop a control policy for autonomous locomotion. This method learns from experience by interacting with the robot and its environment, without explicit knowledge of the robot structure, constituent material, or robotic mechanics. Because a large number of experiences is required to evaluate the quality of the control behavior, a simulator was developed that mimics the uncertain and random movement of the swimmer robots. The approach effectively adapted to the random movement behaviors and developed an optimal control policy to reach different destination points autonomously within a simulated environment. This work has successfully taken a step towards the autonomous locomotion control of soft photoresponsive robots.
Human aware robot navigation
Human-aware robot navigation refers to the navigation of a robot in an environment shared with humans in such a way that the humans feel comfortable and natural in the presence of the robot. In addition, the robot navigation should comply with the social norms of the environment. The robot can interact with humans in the environment by avoiding them, approaching them, or following them. In this thesis, we focus specifically on the approach behavior of the robot, while keeping the other use cases in mind. Studying and analyzing how humans move around other humans indicates the kind of navigation behaviors that we expect robots to exhibit. Most previous research does not focus on understanding such behavioral aspects of approaching people. Moreover, straightforward mathematical modeling of complex human behaviors is very difficult. In this thesis, we therefore propose an Inverse Reinforcement Learning (IRL) framework based on Guided Cost Learning (GCL) to learn these behaviors from demonstration. After analyzing the CongreG8 dataset, we found that the incoming human tends to form an O-space (circle) with the rest of the group, and that the approaching human slows down when getting closer to the group. We use these findings in our framework, which can learn the optimal reward and policy from the example demonstrations and imitate similar human motion.
High level coordination and decision making of a simulated robotic soccer team
Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201
Effective Task Transfer Through Indirect Encoding
An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Approaches to task transfer often focus on transforming the original representation to fit the new task. Such representational transformations are necessary because the target task often requires new state information that was not included in the original representation. In RoboCup Keepaway, changing from the 3 vs. 2 variant of the task to 4 vs. 3 adds state information for each of the new players. In contrast, this dissertation explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To this end, (1) the bird's eye view (BEV) representation is introduced, which can represent different tasks on the same two-dimensional map. Because the BEV represents state information associated with positions instead of objects, it can be scaled to more objects without manipulation. In this way, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV, which is (2) demonstrated in this dissertation. Yet a challenge for such a representation is that a raw two-dimensional map is high-dimensional and unstructured. This dissertation demonstrates how this problem is addressed naturally by the Hypercube-based NeuroEvolution of Augmenting Topologies (HyperNEAT) approach. HyperNEAT evolves an indirect encoding, which compresses the representation by exploiting its geometry. The dissertation then explores further exploiting the power of such encoding, beginning by (3) enhancing the configuration of the BEV with a focus on modularity. The need for further nonlinearity is then (4) investigated through the addition of hidden nodes. Furthermore, (5) the size of the BEV can be manipulated because it is indirectly encoded. Thus the resolution of the BEV, which is dictated by its size, is increased in precision and culminates in a HyperNEAT extension that is expressed at effectively infinite resolution. Additionally, scaling to higher resolutions through gradually increasing the size of the BEV is explored. Finally, (6) the ambitious problem of scaling from the Keepaway task to the Half-field Offense task is investigated with the BEV. Overall, this dissertation demonstrates that advanced representations in conjunction with indirect encoding can contribute to scaling learning techniques to more challenging tasks, such as the Half-field Offense RoboCup soccer domain.
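The idea that a position-based map keeps the same shape regardless of the number of players can be illustrated with a small sketch. The grid size, field length, cell values (+1 for keepers, -1 for takers), and the function name `bird_eye_view` are assumptions made for this example, not details from the dissertation.

```python
def bird_eye_view(keepers, takers, size=20, field=25.0):
    """Rasterise player positions onto a size x size grid.

    Positions are (x, y) pairs assumed to lie in [0, field] on both axes.
    Keeper cells are marked +1.0 and taker cells -1.0 (hypothetical encoding).
    """
    grid = [[0.0] * size for _ in range(size)]

    def cell(position):
        x, y = position
        col = min(int(x / field * size), size - 1)
        row = min(int(y / field * size), size - 1)
        return row, col

    for position in keepers:
        row, col = cell(position)
        grid[row][col] = 1.0
    for position in takers:
        row, col = cell(position)
        grid[row][col] = -1.0
    return grid


# 3 vs. 2 Keepaway: three keepers, two takers (made-up coordinates).
bev_3v2 = bird_eye_view(
    keepers=[(2, 2), (22, 3), (12, 20)],
    takers=[(10, 10), (14, 12)],
)

# 4 vs. 3 Keepaway: one more keeper and one more taker, same grid shape.
bev_4v3 = bird_eye_view(
    keepers=[(2, 2), (22, 3), (12, 20), (5, 18)],
    takers=[(10, 10), (14, 12), (8, 6)],
)
```

Both calls return a 20 x 20 grid, so a learner that consumes the grid does not need a new input layout when players are added.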
Reinforcement Learning in Self Organizing Cellular Networks
Self-organization is a key feature as cellular networks densify and become more heterogeneous through the addition of small cells such as pico- and femtocells. Self-organizing networks (SONs) can perform self-configuration, self-optimization, and self-healing. These operations can cover basic tasks such as the configuration of a newly installed base station, resource management, and fault management in the network. In other words, SONs attempt to minimize human intervention by using measurements from the network to minimize the cost of installation, configuration, and maintenance of the network. In fact, SONs aim to bring two main factors into play: intelligence and autonomous adaptability. One of the main requirements for achieving such goals is to learn from sensory data and signal measurements in networks. Therefore, machine learning techniques can play a major role in processing underutilized sensory data to enhance the performance of SONs.
In the first part of this dissertation, we focus on reinforcement learning as a viable approach for learning from signal measurements. We develop a general framework in heterogeneous cellular networks agnostic to the learning approach. We design multiple reward functions and study different effects of the reward function, Markov state model, learning rate, and cooperation methods on the performance of reinforcement learning in cellular networks. Further, we look into the optimality of reinforcement learning solutions and provide insights into how to achieve optimal solutions.
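To make the roles of the reward, the state model, and the learning rate concrete, here is a minimal sketch of a tabular Q-learning update. Q-learning is used only as one common reinforcement learning method; the abstract does not say which algorithm the dissertation uses, and the state names, action names, and parameter values below are hypothetical.

```python
def q_update(q, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q-value.

    alpha is the learning rate and gamma the discount factor.
    """
    best_next = max(q[next_state].values())
    td_target = reward + gamma * best_next
    q[state][action] += alpha * (td_target - q[state][action])
    return q[state][action]


# Hypothetical two-state, two-action table for a small cell adjusting its power.
q_table = {
    state: {"power_up": 0.0, "power_down": 0.0}
    for state in ("low_sinr", "high_sinr")
}

# The same rewarded transition observed twice.
first = q_update(q_table, "low_sinr", "power_up", 1.0, "high_sinr")
second = q_update(q_table, "low_sinr", "power_up", 1.0, "high_sinr")
```

A larger `alpha` moves the estimate toward each new target faster, which is one way the learning rate can affect how quickly such an agent settles.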
In the second part of the dissertation, we propose a novel architecture based on spatial indexing for system evaluation of heterogeneous 5G cellular networks. We develop an open-source platform based on the proposed architecture that can be used to study large-scale directional cellular networks. The proposed platform is used for generating training data sets of accurate signal-to-interference-plus-noise-ratio (SINR) values in millimeter-wave communications for machine learning purposes. Then, taking advantage of the developed platform, we look into dense millimeter-wave networks as one of the key technologies in 5G cellular networks. We focus on topology management of millimeter-wave backhaul networks, and we study and provide multiple insights on the evaluation and selection of proper performance metrics in dense millimeter-wave networks. Finally, we finish this part by proposing a self-organizing solution to achieve k-connectivity via reinforcement learning in the topology management of wireless networks.
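The k-connectivity property mentioned above can be checked on a small topology with a brute-force sketch. It uses the standard vertex-connectivity definition (more than k nodes, and still connected after removing any fewer than k nodes). It is not the dissertation's reinforcement learning solution, and the example topologies are made up.

```python
from itertools import combinations


def is_connected(nodes, adjacency):
    """Return True if the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    if not nodes:
        return True
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in adjacency[current]:
            if neighbour in nodes and neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen == nodes


def is_k_connected(adjacency, k):
    """Brute-force vertex k-connectivity test, suitable only for small graphs."""
    nodes = list(adjacency)
    if len(nodes) <= k:
        return False
    for removed_count in range(k):
        for removed in combinations(nodes, removed_count):
            if not is_connected(set(nodes) - set(removed), adjacency):
                return False
    return True


# Hypothetical backhaul topologies: a 5-node ring and a 3-node chain.
ring = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
chain = {0: [1], 1: [0, 2], 2: [1]}

ring_survives_one_failure = is_k_connected(ring, 2)
chain_survives_one_failure = is_k_connected(chain, 2)
```

A ring tolerates the loss of any single node, while a chain is cut by losing its middle node, which is the kind of difference a k-connectivity target is meant to capture.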