11 research outputs found

    Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model

    Full text link
    Deep reinforcement learning (DRL) algorithms have proven effective in robot navigation, especially in unknown environments, by directly mapping perception inputs to robot control commands. However, most existing methods ignore the local-minimum problem in navigation and therefore cannot handle complex unknown environments. In this paper, we propose the first DRL-based navigation method modeled as a semi-Markov decision process (SMDP) with a continuous action space, named Adaptive Forward Simulation Time (AFST), to overcome this problem. Specifically, we reduce the dimensionality of the action space and improve the distributed proximal policy optimization (DPPO) algorithm for the specified SMDP problem by modifying its generalized advantage estimation (GAE) to better estimate the policy gradient in SMDPs. Experiments in various unknown environments demonstrate the effectiveness of AFST.
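    The abstract's key technical point is adapting GAE to an SMDP, where each decision step spans a variable simulated duration. A minimal sketch of that idea, assuming the modification amounts to discounting each step by gamma raised to its duration (the paper's exact formulation may differ, and all names here are illustrative):

    ```python
    # Sketch: generalized advantage estimation (GAE) with variable step
    # durations, as an SMDP requires. Assumption: the SMDP correction is
    # to discount step t by gamma**durations[t] instead of a fixed gamma.
    def smdp_gae(rewards, values, durations, gamma=0.99, lam=0.95):
        """rewards[t]: reward accumulated over macro step t;
        values: V(s_0..s_T), including a bootstrap value at the end;
        durations[t]: forward-simulation time of step t."""
        advantages = [0.0] * len(rewards)
        gae = 0.0
        for t in reversed(range(len(rewards))):
            disc = gamma ** durations[t]          # duration-aware discount
            delta = rewards[t] + disc * values[t + 1] - values[t]
            gae = delta + (lam * disc) * gae       # recursive GAE backup
            advantages[t] = gae
        return advantages
    ```

    With all durations equal to 1, this reduces to the standard GAE recursion used by PPO/DPPO.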

    Hierarchical multiagent reinforcement learning for maritime traffic management

    Get PDF
    Agency for Science, Technology and Research, Fujitsu Limited; National Research Foundation Singapore

    Addressing Action Oscillations through Learning Policy Inertia

    Full text link
    Deep reinforcement learning (DRL) algorithms have been demonstrated to be effective in a wide range of challenging decision-making and control tasks. However, these methods typically suffer from severe action oscillations, particularly in discrete action settings: agents select different actions on consecutive steps even though the states differ only slightly. This issue is often neglected because a policy is usually evaluated only by its cumulative reward. Action oscillation degrades the user experience and can even pose serious security risks, especially in real-world domains where safety is the main concern, such as autonomous driving. To this end, we introduce the Policy Inertia Controller (PIC), a generic plug-in framework for off-the-shelf DRL algorithms that enables an adaptive trade-off between the optimality and the smoothness of the learned policy in a formal way. We propose Nested Policy Iteration as a general training algorithm for PIC-augmented policies, which ensures monotonically non-decreasing updates under some mild conditions. Further, we derive a practical DRL algorithm, namely Nested Soft Actor-Critic. Experiments on a collection of autonomous driving tasks and several Atari games show that our approach substantially reduces oscillation compared to a range of commonly adopted baselines, with almost no performance degradation. Comment: Accepted paper at AAAI 202
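    The plug-in idea in the abstract can be illustrated with a toy wrapper: repeat the previous action unless the new greedy action's estimated value beats it by a margin, trading a little optimality for smoothness. This margin rule is only a stand-in for the paper's PIC formulation (and says nothing about Nested Policy Iteration); every name below is invented for illustration:

    ```python
    # Illustrative policy-inertia wrapper for a discrete-action agent.
    # Assumption: a simple value-margin rule approximates the smoothness/
    # optimality trade-off; the actual PIC mechanism is more principled.
    class InertiaController:
        def __init__(self, q_function, margin=0.1):
            self.q = q_function        # q(state) -> list of action values
            self.margin = margin       # larger margin => smoother policy
            self.prev_action = None

        def act(self, state):
            values = self.q(state)
            greedy = max(range(len(values)), key=values.__getitem__)
            if (self.prev_action is not None
                    and values[greedy] - values[self.prev_action] < self.margin):
                greedy = self.prev_action   # gain too small: keep old action
            self.prev_action = greedy
            return greedy
    ```

    Because the controller only wraps action selection, it can sit on top of any off-the-shelf DRL agent that exposes per-action values.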

    Aplicações de Big Data e algoritmos de Machine Learning à gestão inteligente da rega [Applications of Big Data and Machine Learning algorithms to smart irrigation management]

    Get PDF
    Engenharia Agronómica - Hortofruticultura e Viticultura - Instituto Superior de Agronomia. Population growth and climate change are the two great challenges facing agricultural production in the 21st century. The pressure to produce more with fewer resources demands changes in production methods and in the efficient management of each resource. Irrigation water is one of the main drivers of plant growth, and its availability must be secured for the future. This requires reducing waste and ensuring that the irrigation applied matches the plants' needs throughout the crop cycle. Advances in communication technologies and in-field data collection (soil moisture sensors, dendrometers, etc.) create a new dynamic between the farmer and the information available for decision making. This work took advantage of the large volume of data available from a super-intensive olive grove plot monitored by a sensor network that included a dendrometer and a soil moisture probe. From the meteorological and sensor data collected over two years, a relationship was sought between dendrometry-derived indices and the plant's water status. This relationship was then used to build two algorithms, a neural network (ANN) and a random forest, that predict the value of the index from easily obtained variables and thereby infer the plant's water status. The neural network was subsequently used as part of a reinforcement-learning system in which an algorithm learned to irrigate autonomously based on the evolution of soil water storage and the results of the predictive algorithm obtained from the dendrometry analysis. The results show that Big Data and ML techniques are well suited to analyzing field data and to building decision-support tools.
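    The reinforcement-learning component described above can be sketched as a tiny tabular Q-learning loop: the agent decides whether to irrigate and is rewarded for keeping a simplified soil-water store inside a target band. The thesis couples this to an ANN predictor of the dendrometric index; here the soil dynamics, reward, and discretization are all invented for illustration:

    ```python
    # Toy Q-learning irrigator. Assumptions: soil water is discretized to
    # 11 levels, irrigation adds 3 units, the crop uses 1-2 units per step,
    # and the reward favors staying in a made-up target band [4, 7].
    import random

    def train_irrigator(episodes=200, seed=0):
        rng = random.Random(seed)
        levels, n_actions = 11, 2          # actions: 0 = wait, 1 = irrigate
        q = [[0.0, 0.0] for _ in range(levels)]
        for _ in range(episodes):
            s = rng.randrange(levels)
            for _ in range(50):
                # epsilon-greedy action selection
                a = rng.randrange(n_actions) if rng.random() < 0.1 else \
                    (0 if q[s][0] >= q[s][1] else 1)
                # Invented dynamics: irrigation adds water, the crop uses some.
                s2 = max(0, min(levels - 1,
                                s + (3 if a else 0) - rng.randrange(1, 3)))
                r = 1.0 if 4 <= s2 <= 7 else -1.0
                q[s][a] += 0.1 * (r + 0.9 * max(q[s2]) - q[s][a])
                s = s2
        return q
    ```

    After training, the learned table prefers irrigating when the store is empty, which is the qualitative behavior the decision-support tool needs.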

    Multi-Agent/Robot Deep Reinforcement Learning with Macro-Actions (Student Abstract)

    No full text
    We consider the challenges of learning multi-agent/robot macro-action-based deep Q-nets, including how to properly update each macro-action value and accurately maintain macro-action-observation trajectories. We address these challenges by first proposing two fundamental frameworks for learning a macro-action-value function and a joint macro-action-value function. Furthermore, we present two new approaches to learning decentralized macro-action-based policies, which involve a new double-Q update rule that facilitates the learning of decentralized Q-nets by using a centralized Q-net for action selection. Our approaches are evaluated both in simulation and on real robots.
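    The double-Q flavor the abstract mentions, with a centralized Q-net selecting actions and a decentralized Q-net evaluating them, can be sketched for a single target computation. Plain lookup tables stand in for the Q-nets, and every name and shape below is an assumption for illustration:

    ```python
    # Sketch of a decentralized target with centralized action selection:
    # the centralized joint-action Q picks the next joint macro-action,
    # and the agent's own Q evaluates only its component of that action.
    def decentralized_double_q_target(reward, next_state, q_central, q_local,
                                      agent, gamma=0.95):
        """q_central[next_state]: dict {joint_action tuple: value};
        q_local[next_state][a]: this agent's value for macro-action a."""
        # Selection: argmax over joint macro-actions in the centralized Q.
        joint = max(q_central[next_state], key=q_central[next_state].get)
        a_i = joint[agent]                 # this agent's slice of the joint action
        # Evaluation: bootstrap from the decentralized Q instead.
        return reward + gamma * q_local[next_state][a_i]
    ```

    Decoupling selection from evaluation in this way is the same mechanism that makes double Q-learning reduce overestimation; here it additionally keeps the per-agent nets consistent with a shared joint-action choice.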