5,565 research outputs found

    Simultaneous AUV navigation and tracking with satellite ASVs

    The navigation of Autonomous Underwater Vehicles (AUVs) presents a series of challenges, one of which is localization, due to the unavailability of GPS signals underwater. A solution is to use several surface vehicles equipped with hydrophones to locate and track an AUV that emits an acoustic signal. This Master’s Thesis addresses this solution with the objective of optimizing the energy consumption of two Autonomous Surface Vehicles (ASVs) while maintaining a preferred formation geometry that reduces the uncertainty of the AUV’s position estimate. To control these surface vehicles, two Reinforcement Learning (RL) approaches have been implemented: Deep Deterministic Policy Gradient (DDPG) and Multi-Agent Deep Deterministic Policy Gradient (MADDPG). The aim of using these two algorithms is to test whether the multi-agent element is necessary in this case. The RL algorithms control the linear and angular velocities of each robot. The training environment was created specifically for this project in Python. The reward function is designed as a weighted sum of Gaussian functions, which contains all the elements related to energy-consumption optimization and the formation around the AUV. To analyse several aspects of the final models of each robot, a total of four different tests were performed, focusing on the distribution of the weights in the reward function and the ability to adapt to difficult scenarios. All tests are run in simulation. The results show that the ASVs trained with these implementations of RL modify their behaviour depending on the weight configuration, and can adapt to more difficult scenarios depending on aspects of the training such as the noise applied to the actions. Moreover, the performance of DDPG and MADDPG is compared, and the final results are discussed in relation to similar works that have treated this problem.
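
    To make the reward design concrete, here is a minimal sketch of a weighted sum of Gaussian terms of the kind described above. The weights, preferred distances, and Gaussian widths are illustrative assumptions, not the thesis's actual values.

```python
import numpy as np

def gaussian(x, mu, sigma):
    """Unnormalized Gaussian kernel: a smooth, bounded reward term."""
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def reward(dist_to_auv, inter_asv_dist, speed,
           w=(0.5, 0.3, 0.2), d_pref=50.0, b_pref=100.0):
    """Weighted sum of Gaussian terms (all numbers hypothetical):
    keep a preferred range to the AUV, keep a preferred baseline
    between the two ASVs, and favour low speeds (energy saving)."""
    r_track = gaussian(dist_to_auv, d_pref, 10.0)
    r_form = gaussian(inter_asv_dist, b_pref, 20.0)
    r_energy = gaussian(speed, 0.0, 0.5)
    return w[0] * r_track + w[1] * r_form + w[2] * r_energy
```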

    Coordination learning in multi-agent systems

    The ability for an agent to coordinate with others within a system is a valuable property in multi-agent systems. Agents either cooperate as a team to accomplish a common goal, or adapt to opponents to complete different goals without being exploited. Research has shown that learning multi-agent coordination is significantly more complex than learning policies in single-agent environments, and requires a variety of techniques to deal with the properties of a system where agents learn concurrently. This thesis aims to determine how machine learning can be used to achieve coordination within a multi-agent system. It asks what techniques can be used to tackle the increased complexity of such systems and their credit-assignment challenges, how to achieve coordination, and how to use communication to improve the behavior of a team. Many algorithms for competitive environments are tabular-based, preventing their use with high-dimensional or continuous state-spaces, and may be biased against specific equilibrium strategies. This thesis proposes multiple deep learning extensions for competitive environments, allowing algorithms to reach equilibrium strategies in complex and partially-observable environments, relying only on local information. A tabular algorithm is also extended with a new update rule that eliminates its bias against deterministic strategies. Current state-of-the-art approaches for cooperative environments rely on deep learning to handle the environment’s complexity and benefit from a centralized learning phase. Solutions that incorporate communication between agents often prevent agents from being executed in a distributed manner. This thesis proposes a multi-agent algorithm where agents learn communication protocols to compensate for local partial observability, while remaining independently executed. A centralized learning phase can incorporate additional environment information to increase the robustness and speed with which a team converges to successful policies. The algorithm outperforms current state-of-the-art approaches in a wide variety of multi-agent environments. A permutation-invariant network architecture is also proposed to increase the scalability of the algorithm to large team sizes. Further research is needed to identify how the techniques proposed in this thesis, for cooperative and competitive environments, can be used in unison for mixed environments, and whether they are adequate for general artificial intelligence. (Financial support from FCT and FSE within the III Quadro Comunitário de Apoio. Programa Doutoral em Informática.)
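
    As a rough illustration of the permutation-invariant architecture mentioned above, the following is a minimal Deep Sets-style sketch: each teammate observation is embedded independently and then pooled with an order-independent operation, so the output does not depend on how the teammates are listed. The layer sizes and pooling choice are assumptions, not the thesis's actual design.

```python
import torch
import torch.nn as nn

class PermutationInvariantEncoder(nn.Module):
    """Deep Sets-style encoder: the output is unchanged under any
    reordering of the teammates, so the same network scales to team
    sizes unseen during training. Layer sizes are illustrative."""
    def __init__(self, obs_dim, hidden=64):
        super().__init__()
        self.phi = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.rho = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU())

    def forward(self, teammate_obs):          # (batch, n_teammates, obs_dim)
        embedded = self.phi(teammate_obs)     # embed each teammate separately
        pooled = embedded.mean(dim=1)         # order-independent aggregation
        return self.rho(pooled)

# Example: the same encoder handles 3 or 7 teammates without retraining.
enc = PermutationInvariantEncoder(obs_dim=10)
out = enc(torch.randn(32, 7, 10))             # -> (32, 64)
```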

    Building Machines That Learn and Think Like People

    Recent progress in artificial intelligence (AI) has renewed interest in building systems that learn and think like people. Many advances have come from using deep neural networks trained end-to-end in tasks such as object recognition, video games, and board games, achieving performance that equals or even beats humans in some respects. Despite their biological inspiration and performance achievements, these systems differ from human intelligence in crucial ways. We review progress in cognitive science suggesting that truly human-like learning and thinking machines will have to reach beyond current engineering trends in both what they learn, and how they learn it. Specifically, we argue that these machines should (a) build causal models of the world that support explanation and understanding, rather than merely solving pattern recognition problems; (b) ground learning in intuitive theories of physics and psychology, to support and enrich the knowledge that is learned; and (c) harness compositionality and learning-to-learn to rapidly acquire and generalize knowledge to new tasks and situations. We suggest concrete challenges and promising routes towards these goals that can combine the strengths of recent neural network advances with more structured cognitive models.
    Comment: In press at Behavioral and Brain Sciences. Open call for commentary proposals (until Nov. 22, 2016). https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/information/calls-for-commentary/open-calls-for-commentar

    Locomotion Optimization of Photoresponsive Small-scale Robot: A Deep Reinforcement Learning Approach

    Soft robots are composed of elastic and flexible structures, and actuatable soft materials are often used to provide stimuli-responses that can be controlled remotely with different kinds of external stimuli, which is beneficial for designing small-scale devices. Among stimuli-responsive materials, liquid crystal networks (LCNs) have gained significant attention for soft small-scale robots in the past decade because they can be stimulated and actuated by light, which is clean, able to transduce energy remotely, easily available, and amenable to sophisticated control. One of the persistent challenges in photoresponsive robotics is to produce controllable autonomous locomotion behavior. In this Thesis, different types of photoresponsive soft robots were used to realize light-powered locomotion, and an artificial intelligence-based approach was developed for controlling the movement. A robot tracking system, including an automatic laser steering function, was built for efficient robotic feature detection and for steering the laser beam automatically to desired locations. Another prototype, a swimmer robot driven by the automatically steered laser beam, showed directional movement with some degree of uncertainty and randomness in its locomotion behavior. A novel approach was developed to deal with the challenges related to the locomotion of photoresponsive swimmer robots. Machine learning, particularly deep reinforcement learning, was applied to develop a control policy for autonomous locomotion behavior. This method learns from experience by interacting with the robot and its environment, without explicit knowledge of the robot’s structure, constituent materials, or mechanics. Because a large number of experiences is required to learn a good control behavior, a simulator was developed that mimics the uncertain and random movement behavior of the swimmer robots. This approach effectively adapted to the random movement behavior and learned an optimal control policy for reaching different destination points autonomously within the simulated environment. This work has successfully taken a step towards the autonomous locomotion control of soft photoresponsive robots.
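
    To illustrate the idea of learning a control policy for stochastic locomotion by trial and error, here is a minimal sketch. The thesis uses deep reinforcement learning on a simulator of the light-driven swimmer; this stand-in uses tabular Q-learning on a toy one-dimensional environment with random motion noise, and all states, actions, and reward values are hypothetical.

```python
import numpy as np

# Toy 1-D "swimmer" with noisy motion; the learning loop has the same
# shape as the deep RL setup: act, observe a noisy transition, update.
n_states, n_actions, goal = 20, 2, 15
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

for episode in range(2000):
    s = 0
    for t in range(100):
        # epsilon-greedy action: step "left" (0) or "right" (1)
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        drift = 1 if a == 1 else -1
        noise = rng.choice([-1, 0, 1])   # random component of the locomotion
        s2 = int(np.clip(s + drift + noise, 0, n_states - 1))
        r = 1.0 if s2 == goal else -0.01
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])
        s = s2
        if s == goal:
            break
```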

    Human aware robot navigation

    Human aware robot navigation refers to the navigation of a robot in an environment shared with humans in such a way that the humans feel comfortable and natural with the presence of the robot. On top of that, the robot navigation should comply with the social norms of the environment. The robot can interact with humans in the environment, for example by avoiding them, approaching them, or following them. In this thesis, we specifically focus on the approach behavior of the robot, keeping the other use cases in mind. Studying and analyzing how humans move around other humans gives us an idea of the kind of navigation behaviors we expect robots to exhibit. Most previous research does not focus on understanding such behavioral aspects when approaching people. Moreover, straightforward mathematical modeling of complex human behaviors is very difficult. Therefore, in this thesis, we propose an Inverse Reinforcement Learning (IRL) framework based on Guided Cost Learning (GCL) to learn these behaviors from demonstrations. After analyzing the CongreG8 dataset, we found that an incoming human tends to form an O-space (circle) with the rest of the group, and that the approaching velocity slows down as the approaching human gets closer to the group. We utilized these findings in our framework, which can learn the optimal reward and policy from the example demonstrations and imitate similar human motion.
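
    As a sketch of the Guided Cost Learning idea (learning a cost function under which the human demonstrations look optimal), the following shows a simplified maximum-entropy IRL objective: the cost of demonstrations is pushed down while a sampled estimate of the log-partition term pushes the cost of policy samples up. The network size is illustrative, and the importance weights of full GCL are omitted for brevity.

```python
import torch
import torch.nn as nn

class CostNet(nn.Module):
    """Learned cost over states; network size is illustrative."""
    def __init__(self, state_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, s):
        return self.net(s)

def irl_loss(cost, demo_states, sample_states):
    """Simplified maximum-entropy IRL objective: minimize the mean cost
    of demonstrations plus a Monte-Carlo estimate of the log-partition
    term, which raises the cost of policy samples. Full GCL additionally
    applies importance weights to the samples, omitted here."""
    demo_cost = cost(demo_states).mean()
    n = sample_states.shape[0]
    log_z = torch.logsumexp(-cost(sample_states).squeeze(-1), dim=0) \
        - torch.log(torch.tensor(float(n)))
    return demo_cost + log_z
```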

    High level coordination and decision making of a simulated robotic soccer team

    Integrated Master’s thesis. Informatics and Computing Engineering. Faculty of Engineering, University of Porto, 201

    Effective Task Transfer Through Indirect Encoding

    An important goal for machine learning is to transfer knowledge between tasks. For example, learning to play RoboCup Keepaway should contribute to learning the full game of RoboCup soccer. Often approaches to task transfer focus on transforming the original representation to fit the new task. Such representational transformations are necessary because the target task often requires new state information that was not included in the original representation. In RoboCup Keepaway, changing from the 3 vs. 2 variant of the task to 4 vs. 3 adds state information for each of the new players. In contrast, this dissertation explores the idea that transfer is most effective if the representation is designed to be the same even across different tasks. To this end, (1) the bird’s eye view (BEV) representation is introduced, which can represent different tasks on the same two-dimensional map. Because the BEV represents state information associated with positions instead of objects, it can be scaled to more objects without manipulation. In this way, both the 3 vs. 2 and 4 vs. 3 Keepaway tasks can be represented on the same BEV, which is (2) demonstrated in this dissertation. Yet a challenge for such a representation is that a raw two-dimensional map is high-dimensional and unstructured. This dissertation demonstrates how this problem is addressed naturally by the Hypercube-based NeuroEvolution of Augmenting Topologies (HyperNEAT) approach. HyperNEAT evolves an indirect encoding, which compresses the representation by exploiting its geometry. The dissertation then explores further exploiting the power of such encoding, beginning by (3) enhancing the configuration of the BEV with a focus on modularity. The need for further nonlinearity is then (4) investigated through the addition of hidden nodes. Furthermore, (5) the size of the BEV can be manipulated because it is indirectly encoded. Thus the resolution of the BEV, which is dictated by its size, is increased in precision, culminating in a HyperNEAT extension that is expressed at effectively infinite resolution. Additionally, scaling to higher resolutions through gradually increasing the size of the BEV is explored. Finally, (6) the ambitious problem of scaling from the Keepaway task to the Half-field Offense task is investigated with the BEV. Overall, this dissertation demonstrates that advanced representations in conjunction with indirect encoding can contribute to scaling learning techniques to more challenging tasks, such as the Half-field Offense RoboCup soccer domain.
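
    To illustrate how an indirect encoding can generate a network over a 2-D substrate such as the BEV, here is a minimal sketch: a hand-written placeholder stands in for an evolved CPPN and is queried for every pair of substrate coordinates, so raising the resolution simply means querying more points. The placeholder function and grid layout are assumptions for illustration; HyperNEAT evolves the CPPN rather than hand-writing it.

```python
import numpy as np

def cppn(x1, y1, x2, y2):
    """Placeholder for an evolved CPPN: a function of source and target
    coordinates that HyperNEAT would evolve rather than hand-write."""
    return np.sin(x1 * x2) * np.exp(-((x1 - x2) ** 2 + (y1 - y2) ** 2))

def substrate_weights(resolution):
    """Query the CPPN for every pair of grid points on a 2-D substrate
    (standing in for the BEV). Raising `resolution` adds connections
    without changing the encoding, which is the scaling property at issue."""
    coords = [(x, y) for x in np.linspace(-1, 1, resolution)
              for y in np.linspace(-1, 1, resolution)]
    n = len(coords)
    W = np.zeros((n, n))
    for i, (x1, y1) in enumerate(coords):
        for j, (x2, y2) in enumerate(coords):
            W[i, j] = cppn(x1, y1, x2, y2)
    return W

W_coarse = substrate_weights(4)    # 16x16 weight matrix
W_fine = substrate_weights(8)      # 64x64: same encoding, higher resolution
```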

    Reinforcement Learning in Self Organizing Cellular Networks

    Self-organization is a key feature as cellular networks densify and become more heterogeneous through additional small cells such as pico- and femtocells. Self-organizing networks (SONs) can perform self-configuration, self-optimization, and self-healing. These operations cover basic tasks such as the configuration of a newly installed base station, resource management, and fault management in the network. In other words, SONs attempt to minimize human intervention by using measurements from the network to reduce the cost of installing, configuring, and maintaining the network. In effect, SONs aim to bring two main factors into play: intelligence and autonomous adaptability. One of the main requirements for achieving such goals is learning from sensory data and signal measurements in networks. Therefore, machine learning techniques can play a major role in processing underutilized sensory data to enhance the performance of SONs. In the first part of this dissertation, we focus on reinforcement learning as a viable approach for learning from signal measurements. We develop a general framework for heterogeneous cellular networks that is agnostic to the learning approach. We design multiple reward functions and study the effects of the reward function, Markov state model, learning rate, and cooperation methods on the performance of reinforcement learning in cellular networks. Further, we look into the optimality of reinforcement learning solutions and provide insights into how to achieve optimal solutions. In the second part of the dissertation, we propose a novel architecture based on spatial indexing for the system-level evaluation of heterogeneous 5G cellular networks. We develop an open-source platform based on the proposed architecture that can be used to study large-scale directional cellular networks. The proposed platform is used to generate training data sets of accurate signal-to-interference-plus-noise-ratio (SINR) values in millimeter-wave communications for machine learning purposes. Then, taking advantage of the developed platform, we look into dense millimeter-wave networks as one of the key technologies in 5G cellular networks. We focus on the topology management of millimeter-wave backhaul networks and provide multiple insights into the evaluation and selection of proper performance metrics in dense millimeter-wave networks. Finally, we conclude this part by proposing a self-organizing solution that achieves k-connectivity via reinforcement learning for the topology management of wireless networks.
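
    The closing k-connectivity result suggests a simple shape for the learning problem. Below is a minimal, hypothetical sketch (not the dissertation's implementation) of tabular Q-learning for topology management: the agent toggles candidate backhaul links and is rewarded for keeping the network k-connected with as few active links as possible. Node count, k, and reward values are illustrative.

```python
import random
import networkx as nx

# Hypothetical setup: 6 backhaul nodes, target 2-connectivity.
nodes, k = list(range(6)), 2
candidates = [(i, j) for i in nodes for j in nodes if i < j]
alpha, gamma, eps = 0.2, 0.9, 0.1
Q = {}   # (state, action index) -> value; state = frozenset of active links

def reward(active_links):
    G = nx.Graph()
    G.add_nodes_from(nodes)
    G.add_edges_from(active_links)
    k_ok = nx.is_connected(G) and nx.node_connectivity(G) >= k
    return (10.0 - 0.5 * len(active_links)) if k_ok else -1.0

state = frozenset()
for step in range(2000):
    if random.random() < eps:                 # explore: random link toggle
        a = random.randrange(len(candidates))
    else:                                     # exploit: best known toggle
        a = max(range(len(candidates)), key=lambda i: Q.get((state, i), 0.0))
    nxt = state ^ {candidates[a]}             # toggle the chosen link
    r = reward(nxt)
    best_next = max(Q.get((nxt, i), 0.0) for i in range(len(candidates)))
    old = Q.get((state, a), 0.0)
    Q[(state, a)] = old + alpha * (r + gamma * best_next - old)
    state = nxt
```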