Reinforcement Learning for Robot Navigation with Adaptive Forward Simulation Time (AFST) in a Semi-Markov Model
Deep reinforcement learning (DRL) algorithms have proven effective in robot
navigation, especially in unknown environments, by directly mapping perception
inputs into robot control commands. However, most existing methods ignore the
local minimum problem in navigation and therefore cannot handle complex unknown
environments. In this paper, we propose the first DRL-based navigation method
modeled by a semi-Markov decision process (SMDP) with continuous action space,
named Adaptive Forward Simulation Time (AFST), to overcome this problem.
Specifically, we reduce the dimensions of the action space and improve the
distributed proximal policy optimization (DPPO) algorithm for the specified
SMDP problem by modifying its generalized advantage estimation (GAE) to better estimate the policy gradient in
SMDPs. Experiments in various unknown environments demonstrate the
effectiveness of AFST.
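The abstract does not spell out the modified estimator. One plausible duration-aware variant of generalized advantage estimation (GAE), in which the discount between decision points becomes gamma**tau for an option lasting tau time steps, can be sketched as follows; the function name and signature are illustrative, not taken from the paper:

```python
def smdp_gae(rewards, values, durations, gamma=0.99, lam=0.95):
    """GAE with per-step durations for a semi-Markov decision process.

    In an SMDP each action persists for a variable duration tau, so the
    discount applied between consecutive decision points is gamma**tau
    rather than a fixed gamma. `values` carries one extra entry at the
    end for the bootstrap value of the final state.
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        disc = gamma ** durations[t]  # duration-aware discount factor
        delta = rewards[t] + disc * values[t + 1] - values[t]
        gae = delta + disc * lam * gae
        advantages[t] = gae
    return advantages
```

With all durations equal to 1 this reduces to standard GAE, which is one sanity check on the construction.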
Hierarchical multiagent reinforcement learning for maritime traffic management
Agency for Science, Technology and Research; Fujitsu Limited; National Research Foundation Singapore
Addressing Action Oscillations through Learning Policy Inertia
Deep reinforcement learning (DRL) algorithms have been demonstrated to be
effective in a wide range of challenging decision making and control tasks.
However, these methods typically suffer from severe action oscillations,
particularly in discrete action settings: agents select different actions in
consecutive steps even though the states differ only slightly. This issue is
often neglected, since a policy is usually evaluated by its cumulative reward
alone. Action oscillation degrades the user experience and can even pose
serious safety risks in real-world domains where safety is the main concern,
such as autonomous driving. To this end, we introduce the Policy Inertia
Controller (PIC), a generic plug-in framework for off-the-shelf DRL algorithms
that enables an adaptive trade-off between the optimality and smoothness of the
learned policy in a formal way.
formal way. We propose Nested Policy Iteration as a general training algorithm
for PIC-augmented policy which ensures monotonically non-decreasing updates
under some mild conditions. Further, we derive a practical DRL algorithm,
namely Nested Soft Actor-Critic. Experiments on a collection of autonomous
driving tasks and several Atari games suggest that our approach demonstrates
substantial oscillation reduction in comparison to a range of commonly adopted
baselines with almost no performance degradation. Comment: Accepted paper at AAAI 202
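The actual PIC trade-off is learned inside Nested Policy Iteration; as a rough intuition only, policy inertia resembles a hysteresis filter that keeps the previous action unless switching is clearly worthwhile. The sketch below is an invented illustration of that intuition, not the paper's algorithm, and all names and the margin parameter are assumptions:

```python
class PolicyInertiaWrapper:
    """Hysteresis-style action filter: keep the previous action unless a
    new action's estimated value beats it by a switching margin.

    Only a caricature of the inertia idea; PIC learns this trade-off
    formally rather than using a fixed margin.
    """
    def __init__(self, q_fn, margin=0.1):
        self.q_fn = q_fn          # q_fn(state) -> list of action values
        self.margin = margin      # value gap required to switch actions
        self.prev_action = None

    def act(self, state):
        q = self.q_fn(state)
        greedy = max(range(len(q)), key=q.__getitem__)
        if (self.prev_action is not None
                and q[greedy] - q[self.prev_action] < self.margin):
            greedy = self.prev_action  # not worth switching: stay put
        self.prev_action = greedy
        return greedy
```

A small margin suppresses flapping between near-equal actions while leaving genuinely better actions reachable, which is the oscillation/optimality tension the abstract describes.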
Applications of Big Data and Machine Learning algorithms to intelligent irrigation management
Engenharia Agronómica - Hortofruticultura e Viticultura - Instituto Superior de Agronomia
Population growth and climate change are the two great challenges facing agricultural production in the twenty-first century. The pressure to produce more with fewer resources demands changes in production methods and in the efficient management of every resource.
Irrigation water is one of the main drivers of plant growth, and its availability must be secured for the future. This requires reducing waste and ensuring that the irrigation applied matches the plants' needs throughout the crop cycle.
Advances in communication and in-field data-collection technologies (soil moisture sensors, dendrometers, etc.) enable a new dynamic between the farmer and the information available for decision making.
This work took advantage of the large volume of data available from a super-intensive olive grove plot monitored by a sensor network that included a dendrometer and a soil moisture probe. Using the meteorological and sensor data collected over two years, we sought a relationship between dendrometry-derived indices and the plant's water status, and then used this relationship to build two algorithms, a neural network (ANN) and a random forest (FDA), capable of predicting the index value from easily obtained variables and thereby inferring the plant's water status.
The neural network was then used as part of a reinforcement learning system in which an algorithm learned to irrigate autonomously based on the evolution of soil water storage and the outputs of the predictive algorithm obtained from the dendrometry analysis.
The results show that Big Data and ML techniques are well suited to analysing data collected in the field and to building decision-support tools.
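As a loose illustration of the reinforcement learning setup this thesis describes, an agent could learn an irrigation schedule over a discretized soil-water state by tabular Q-learning, rewarding good plant water status while penalizing water use. The environment dynamics, rewards, and hyperparameters below are entirely invented for the sketch:

```python
import random

def train_irrigation_policy(episodes=2000, seed=0):
    """Toy tabular Q-learning over a discretized soil-water level.

    Action 0 = wait (soil dries one bin), action 1 = irrigate (soil
    refills two bins at a cost). The reward favors keeping the plant
    out of water stress while spending as little water as possible.
    """
    rng = random.Random(seed)
    levels, actions = 5, 2
    q = [[0.0] * actions for _ in range(levels)]
    alpha, gamma, eps = 0.1, 0.9, 0.1
    for _ in range(episodes):
        s = rng.randrange(levels)
        for _ in range(20):
            a = (rng.randrange(actions) if rng.random() < eps
                 else max(range(actions), key=q[s].__getitem__))
            # irrigation refills the profile; otherwise the soil dries
            s2 = min(s + 2, levels - 1) if a == 1 else max(s - 1, 0)
            # reward plant water status, penalize water application
            r = (1.0 if s2 >= 2 else -1.0) - (0.3 if a == 1 else 0.0)
            q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])
            s = s2
    return q
```

In the trained table, irrigating should dominate waiting at the driest levels, which is the behavior the thesis's agent was expected to acquire from its (much richer) sensor-driven state.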
Multi-Agent/Robot Deep Reinforcement Learning with Macro-Actions (Student Abstract)
We consider the challenges of learning multi-agent/robot macro-action-based deep Q-nets, including how to properly update each macro-action value and accurately maintain macro-action-observation trajectories. We address these challenges by first proposing two fundamental frameworks for learning a macro-action-value function and a joint macro-action-value function. Furthermore, we present two new approaches to learning decentralized macro-action-based policies, including a new double Q-update rule that facilitates the learning of decentralized Q-nets by using a centralized Q-net for action selection. Our approaches are evaluated both in simulation and on real robots.
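The double Q-update the abstract describes, where a centralized Q-function selects the next joint macro-action and each decentralized Q-function evaluates only its own component of it, can be caricatured in tabular form. Every name, table, and hyperparameter here is an illustrative assumption, not the paper's implementation:

```python
import itertools

def decentralized_update(q_dec, q_cen, state, next_state, joint_action,
                         reward, alpha=0.1, gamma=0.95, n_actions=2):
    """One tabular step of the centralized-selection double-Q idea.

    q_dec: list of per-agent dicts mapping (state, action) -> value
    q_cen: dict mapping (state, joint_action_tuple) -> value
    """
    joint_space = itertools.product(range(n_actions), repeat=len(q_dec))
    # centralized selection: best joint macro-action at the next state
    best_joint = max(joint_space,
                     key=lambda a: q_cen.get((next_state, a), 0.0))
    for i, q in enumerate(q_dec):
        # decentralized evaluation of the centrally selected component
        target = reward + gamma * q.get((next_state, best_joint[i]), 0.0)
        old = q.get((state, joint_action[i]), 0.0)
        q[(state, joint_action[i])] = old + alpha * (target - old)
```

The point of the split is that greedy selection uses the centralized table's joint view, while each agent's table only ever scores its own action, which is what makes decentralized execution possible.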