4,147 research outputs found
Active Reward Learning for Co-Robotic Vision Based Exploration in Bandwidth Limited Environments
We present a novel POMDP problem formulation for a robot that must
autonomously decide where to go to collect new and scientifically relevant
images given a limited ability to communicate with its human operator. From
this formulation we derive constraints and design principles for the
observation model, reward model, and communication strategy of such a robot,
exploring techniques to deal with the very high-dimensional observation space
and scarcity of relevant training data. We introduce a novel active reward
learning strategy based on making queries to help the robot minimize path
"regret" online, and evaluate it for suitability in autonomous visual
exploration through simulations. We demonstrate that, in some bandwidth-limited
environments, this novel regret-based criterion enables the robotic explorer to
collect up to 17% more reward per mission than the next-best criterion.Comment: 7 pages, 4 figures; accepted for presentation in IEEE Int. Conf. on
Robotics and Automation, ICRA '20, Paris, France, June 202
Recommended from our members
Multi-SLAM Systems for Fault-Tolerant Simultaneous Localization and Mapping
Mobile robots need accurate, high fidelity models of their operating environments in order to complete their tasks safely and efficiently. Generating these models is most often done via Simultaneous Localization and Mapping (SLAM), a paradigm where the robot alternatively estimates the most up-to-date model of the environment and its position relative to this model as it acquires new information from its sensors over time. Because robots operate in many different environments with different compute, memory, sensing, and form constraints, the nature and quality of information available to individual instances of different SLAM systems varies substantially. `One-size-fits-all\u27 solutions are thus exceedingly difficult to engineer, and highly specialized systems, which represent the state-of-the-art for most types of deployments, are not robust to operating conditions in which their assumptions are not met. This thesis seeks to investigate an alternative approach to these robustness and universality problems by incorporating existing SLAM solutions within a larger framework supported by planning and learning. The central idea is to combine learned models that estimate SLAM algorithm performance under a variety of sensory conditions, in this case neural networks, with planners designed for planning under uncertainty and partial observability, in this case partially observable Markov decision problems (POMDPs). Models of existing SLAM algorithms can be learned, and these models can then be used online to estimate the performance of a range of solutions to the SLAM problem at hand. The POMDP policy then selects the appropriate algorithm, given the estimated performance, cost of switching methods, and other information. This general approach may also be applicable to many other robotics problems that rely on data-fusion, such as grasp planning, motion planning, or object identification
Thirty Years of Machine Learning: The Road to Pareto-Optimal Wireless Networks
Future wireless networks have a substantial potential in terms of supporting
a broad range of complex compelling applications both in military and civilian
fields, where the users are able to enjoy high-rate, low-latency, low-cost and
reliable information services. Achieving this ambitious goal requires new radio
techniques for adaptive learning and intelligent decision making because of the
complex heterogeneous nature of the network structures and wireless services.
Machine learning (ML) algorithms have great success in supporting big data
analytics, efficient parameter estimation and interactive decision making.
Hence, in this article, we review the thirty-year history of ML by elaborating
on supervised learning, unsupervised learning, reinforcement learning and deep
learning. Furthermore, we investigate their employment in the compelling
applications of wireless networks, including heterogeneous networks (HetNets),
cognitive radios (CR), Internet of things (IoT), machine to machine networks
(M2M), and so on. This article aims for assisting the readers in clarifying the
motivation and methodology of the various ML algorithms, so as to invoke them
for hitherto unexplored services as well as scenarios of future wireless
networks.Comment: 46 pages, 22 fig
Human Motion Trajectory Prediction: A Survey
With growing numbers of intelligent autonomous systems in human environments,
the ability of such systems to perceive, understand and anticipate human
behavior becomes increasingly important. Specifically, predicting future
positions of dynamic agents and planning considering such predictions are key
tasks for self-driving vehicles, service robots and advanced surveillance
systems. This paper provides a survey of human motion trajectory prediction. We
review, analyze and structure a large selection of work from different
communities and propose a taxonomy that categorizes existing methods based on
the motion modeling approach and level of contextual information used. We
provide an overview of the existing datasets and performance metrics. We
discuss limitations of the state of the art and outline directions for further
research.Comment: Submitted to the International Journal of Robotics Research (IJRR),
37 page
Aprendizagem de coordenação em sistemas multi-agente
The ability for an agent to coordinate with others within a system is a
valuable property in multi-agent systems. Agents either cooperate as a team
to accomplish a common goal, or adapt to opponents to complete different
goals without being exploited. Research has shown that learning multi-agent
coordination is significantly more complex than learning policies in singleagent
environments, and requires a variety of techniques to deal with the
properties of a system where agents learn concurrently. This thesis aims to
determine how can machine learning be used to achieve coordination within
a multi-agent system. It asks what techniques can be used to tackle the
increased complexity of such systems and their credit assignment challenges,
how to achieve coordination, and how to use communication to improve the
behavior of a team.
Many algorithms for competitive environments are tabular-based, preventing
their use with high-dimension or continuous state-spaces, and may be
biased against specific equilibrium strategies. This thesis proposes multiple
deep learning extensions for competitive environments, allowing algorithms
to reach equilibrium strategies in complex and partially-observable environments,
relying only on local information. A tabular algorithm is also extended
with a new update rule that eliminates its bias against deterministic strategies.
Current state-of-the-art approaches for cooperative environments rely
on deep learning to handle the environment’s complexity and benefit from a
centralized learning phase. Solutions that incorporate communication between
agents often prevent agents from being executed in a distributed
manner. This thesis proposes a multi-agent algorithm where agents learn
communication protocols to compensate for local partial-observability, and
remain independently executed. A centralized learning phase can incorporate
additional environment information to increase the robustness and speed with
which a team converges to successful policies. The algorithm outperforms
current state-of-the-art approaches in a wide variety of multi-agent environments.
A permutation invariant network architecture is also proposed
to increase the scalability of the algorithm to large team sizes. Further research
is needed to identify how can the techniques proposed in this thesis,
for cooperative and competitive environments, be used in unison for mixed
environments, and whether they are adequate for general artificial intelligence.A capacidade de um agente se coordenar com outros num sistema é uma
propriedade valiosa em sistemas multi-agente. Agentes cooperam como
uma equipa para cumprir um objetivo comum, ou adaptam-se aos oponentes
de forma a completar objetivos egoístas sem serem explorados. Investigação
demonstra que aprender coordenação multi-agente é significativamente
mais complexo que aprender estratégias em ambientes com um
único agente, e requer uma variedade de técnicas para lidar com um ambiente
onde agentes aprendem simultaneamente. Esta tese procura determinar
como aprendizagem automática pode ser usada para encontrar coordenação
em sistemas multi-agente. O documento questiona que técnicas podem ser
usadas para enfrentar a superior complexidade destes sistemas e o seu desafio
de atribuição de crédito, como aprender coordenação, e como usar
comunicação para melhorar o comportamento duma equipa.
Múltiplos algoritmos para ambientes competitivos são tabulares, o que impede
o seu uso com espaços de estado de alta-dimensão ou contínuos, e
podem ter tendências contra estratégias de equilíbrio específicas. Esta tese
propõe múltiplas extensões de aprendizagem profunda para ambientes competitivos,
permitindo a algoritmos atingir estratégias de equilíbrio em ambientes
complexos e parcialmente-observáveis, com base em apenas informação
local. Um algoritmo tabular é também extendido com um novo critério de
atualização que elimina a sua tendência contra estratégias determinísticas.
Atuais soluções de estado-da-arte para ambientes cooperativos têm base em
aprendizagem profunda para lidar com a complexidade do ambiente, e beneficiam
duma fase de aprendizagem centralizada. Soluções que incorporam
comunicação entre agentes frequentemente impedem os próprios de ser executados
de forma distribuída. Esta tese propõe um algoritmo multi-agente
onde os agentes aprendem protocolos de comunicação para compensarem
por observabilidade parcial local, e continuam a ser executados de forma
distribuída. Uma fase de aprendizagem centralizada pode incorporar informação
adicional sobre ambiente para aumentar a robustez e velocidade
com que uma equipa converge para estratégias bem-sucedidas. O algoritmo
ultrapassa abordagens estado-da-arte atuais numa grande variedade de ambientes
multi-agente. Uma arquitetura de rede invariante a permutações é
também proposta para aumentar a escalabilidade do algoritmo para grandes
equipas. Mais pesquisa é necessária para identificar como as técnicas propostas
nesta tese, para ambientes cooperativos e competitivos, podem ser
usadas em conjunto para ambientes mistos, e averiguar se são adequadas a
inteligência artificial geral.Apoio financeiro da FCT e do FSE no âmbito do III Quadro Comunitário de ApoioPrograma Doutoral em Informátic
- …