Answer Set Programming for Non-Stationary Markov Decision Processes

Author: C Baral
CJCH Watkins
E Even-Dar
J Babb
JY Yu
Leonardo A. Ferreira
M Balduccini
M Gelfond
M Nogueira
Paulo E. Santos
R Bellman
Ramon Lopez de Mantaras
Reinaldo A. C. Bianchi
S Zhang
V Lifschitz
Publication date: 03/05/2017

Non-stationary domains, where unforeseen changes happen, make it hard for agents to find an optimal policy for a sequential decision-making problem. This work investigates a solution to this problem that combines Markov Decision Processes (MDP) and Reinforcement Learning (RL) with Answer Set Programming (ASP) in a method we call ASP(RL). In this method, Answer Set Programming is used to find the possible trajectories of an MDP, and Reinforcement Learning is then applied over those trajectories to learn the optimal policy of the problem. Results show that ASP(RL) is capable of efficiently finding the optimal solution of an MDP representing non-stationary domains.
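
The two-stage idea in this abstract (first restrict the MDP to feasible transitions, then learn a policy over them) can be sketched as follows. This is a minimal illustration and not the authors' implementation: the corridor domain, the function names, and all parameter values are assumptions, and the hard-coded `allowed_actions` function merely stands in for the state-action pairs an ASP solver would return. The learning step is ordinary tabular Q-learning.

```python
import random

# Hypothetical toy domain: a corridor with states 0..4 and the goal at state 4.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # step left or step right


def allowed_actions(state):
    """Stand-in for the ASP step: list the actions that are feasible in `state`.

    In ASP(RL) this set would come from the answer sets of a logic program;
    here it is hard-coded as "stay inside the corridor".
    """
    return [a for a in ACTIONS if 0 <= state + a < N_STATES]


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning restricted to the feasible state-action pairs."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in allowed_actions(s)}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            acts = allowed_actions(state)
            rng.shuffle(acts)  # shuffling gives random tie-breaking in max()
            if rng.random() < epsilon:
                action = acts[0]  # explore: uniformly random feasible action
            else:
                action = max(acts, key=lambda a: q[(state, a)])  # exploit
            nxt = state + action
            reward = 1.0 if nxt == GOAL else 0.0
            if nxt == GOAL:
                best_next = 0.0  # terminal state has no future value
            else:
                best_next = max(q[(nxt, a)] for a in allowed_actions(nxt))
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy_policy(q):
    """Pick the highest-valued feasible action in every non-goal state."""
    return {
        s: max(allowed_actions(s), key=lambda a: q[(s, a)])
        for s in range(N_STATES)
        if s != GOAL
    }


if __name__ == "__main__":
    print(greedy_policy(q_learning()))
```

Under these assumptions, a change in the environment would be handled by recomputing the feasible set (here, by changing what `allowed_actions` returns) and continuing to learn over the updated Q-table, rather than by redefining the whole MDP by hand.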

arXiv.org e-Print Archive

Crossref

Digital.CSIC

Learning and Reasoning for Robot Sequential Decision Making under Uncertainty

Author: Amiri Saeid
Shirazi Mohammad Shokrolah
Zhang Shiqi
Publication date: 10/12/2019

Robots frequently face complex tasks that require more than one action, where sequential decision-making (SDM) capabilities become necessary. The key contribution of this work is a robot SDM framework, called LCORPP, that simultaneously supports supervised learning for passive state estimation, automated reasoning with declarative human knowledge, and planning under uncertainty toward achieving long-term goals. In particular, we use a hybrid reasoning paradigm to refine the state estimator and to provide informative priors for the probabilistic planner. In experiments, a mobile robot is tasked with estimating human intentions using their motion trajectories, declarative contextual knowledge, and human-robot interaction (dialog-based and motion-based). Results suggest that, in efficiency and accuracy, our framework performs better than its no-learning and no-reasoning counterparts in an office environment. Comment: In proceedings of 34th AAAI conference on Artificial Intelligence, 202

arXiv.org e-Print Archive

Association for the Advancement of Artificial Intelligence: AAAI Publications

iCORPP: Interleaved Commonsense Reasoning and Probabilistic Planning on Robots

Author: Stone Peter
Zhang Shiqi
Publication date: 18/04/2020

Robot sequential decision-making in the real world is a challenge because it requires the robots to simultaneously reason about the current world state and dynamics, while planning actions to accomplish complex tasks. On the one hand, declarative languages and reasoning algorithms are well suited to representing and reasoning with commonsense knowledge, but these algorithms are not good at planning actions toward maximizing cumulative reward over a long, unspecified horizon. On the other hand, probabilistic planning frameworks, such as Markov decision processes (MDPs) and partially observable MDPs (POMDPs), are well suited to planning to achieve long-term goals under uncertainty, but they are ill-equipped to represent or reason about knowledge that is not directly related to actions. In this article, we present a novel algorithm, called iCORPP, to simultaneously estimate the current world state, reason about world dynamics, and construct task-oriented controllers. In this process, robot decision-making problems are decomposed into two interdependent (smaller) subproblems that focus on reasoning to "understand the world" and planning to "achieve the goal", respectively. Contextual knowledge is represented in the reasoning component, which makes the planning component epistemic and enables active information gathering. The developed algorithm has been implemented and evaluated both in simulation and on real robots using everyday service tasks, such as indoor navigation, dialog management, and object delivery. Results show significant improvements in scalability, efficiency, and adaptiveness, compared to competitive baselines including handcrafted action policies.

arXiv.org e-Print Archive

Answer set programming for non-stationary Markov decision processes

Author: C Baral
CJCH Watkins
E Even-Dar
J Babb
JY Yu
Leonardo A. Ferreira
M Balduccini
M Gelfond
M Nogueira
Paulo E. Santos
R Bellman
Ramon Lopez de Mantaras
Reinaldo A. C. Bianchi
S Zhang
V Lifschitz
Publication venue: Springer Science and Business Media LLC

Crossref