9,794 research outputs found
Stochastic Shortest Path with Energy Constraints in POMDPs
We consider partially observable Markov decision processes (POMDPs) with a
set of target states and positive integer costs associated with every
transition. The traditional optimization objective (stochastic shortest path)
asks to minimize the expected total cost until the target set is reached. We
extend the traditional framework of POMDPs to model energy consumption, which
represents a hard constraint. The energy levels may increase and decrease with
transitions, and the hard constraint requires that the energy level must remain
positive in all steps until the target is reached. First, we present a novel
algorithm for solving POMDPs with energy levels, building on existing POMDP
solvers and using RTDP as its main method. Our second contribution is related
to policy representation. For larger POMDP instances the policies computed by
existing solvers are too large to be understandable. We present an automated
procedure based on machine learning techniques that automatically extracts
important decisions of the policy, allowing us to compute succinct,
human-readable policies. Finally, we show experimentally that our algorithm performs
well and computes succinct policies on a number of POMDP instances from the
literature that were naturally enhanced with energy levels.
Comment: Technical report accompanying a paper published in proceedings of
AAMAS 201
Hierarchical State Abstractions for Decision-Making Problems with Computational Constraints
In this semi-tutorial paper, we first review the information-theoretic
approach to account for the computational costs incurred during the search for
optimal actions in a sequential decision-making problem. The traditional (MDP)
framework ignores computational limitations while searching for optimal
policies, essentially assuming that the acting agent is perfectly rational and
aims for exact optimality. Using the free-energy, a variational principle is
introduced that accounts not only for the value of a policy alone, but also
considers the cost of finding this optimal policy. The solution of the
variational equations arising from this formulation can be obtained using
familiar Bellman-like value iterations from dynamic programming (DP) and the
Blahut-Arimoto (BA) algorithm from rate distortion theory. Finally, we
demonstrate the utility of the approach for generating hierarchies of state
abstractions that can be used to best exploit the available computational
resources. A numerical example showcases these concepts for a path-planning
problem in a grid world environment.
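The Bellman-like iteration mentioned in this abstract can be illustrated with a minimal sketch. It assumes one common form of the free-energy objective, in which the policy is penalised by its KL divergence from a fixed uniform prior and a parameter `beta` trades value against that information cost. The Blahut-Arimoto update of the prior is omitted, and the function names and the two-state toy problem are hypothetical, not taken from the paper.

```python
import math

def soft_value(q_row, beta):
    """Free-energy value of one state under a uniform prior over actions:
    (1/beta) * log(mean(exp(beta * q))), computed in a numerically stable way."""
    m = max(q_row)
    mean_exp = sum(math.exp(beta * (q - m)) for q in q_row) / len(q_row)
    return m + math.log(mean_exp) / beta

def q_values(P, R, V, gamma):
    """Q[s][a] = R[s][a] + gamma * expected value of the next state."""
    return [[R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
             for a in range(len(P[s]))] for s in range(len(P))]

def free_energy_value_iteration(P, R, beta, gamma=0.9, iters=500):
    """Soft value iteration with a fixed uniform prior (assumed form).

    P[s][a] is a list of (next_state, probability) pairs; R[s][a] is a reward.
    Small beta keeps the policy close to the prior; large beta approaches
    the greedy policy of ordinary value iteration.
    """
    V = [0.0] * len(P)
    for _ in range(iters):
        V = [soft_value(row, beta) for row in q_values(P, R, V, gamma)]
    policy = []
    for row in q_values(P, R, V, gamma):
        m = max(row)
        weights = [math.exp(beta * (q - m)) for q in row]
        total = sum(weights)
        policy.append([w / total for w in weights])
    return V, policy

# Hypothetical toy problem: in state 0, action 0 stays (reward 0) and
# action 1 moves to the absorbing state 1 (reward 1).
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(1, 1.0)]],
]
R = [[0.0, 1.0], [0.0, 0.0]]
```

Calling `free_energy_value_iteration(P, R, beta=100.0)` yields a nearly deterministic policy in state 0, while a very small `beta` leaves the policy close to the uniform prior.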
Autonomous Wireless Systems with Artificial Intelligence
This paper discusses technology and opportunities to embrace artificial
intelligence (AI) in the design of autonomous wireless systems. We aim to
provide readers with motivation and general AI methodology of autonomous agents
in the context of self-organization in real time by unifying knowledge
management with sensing, reasoning and active learning. We highlight
differences between training-based methods for matching problems and
training-free methods for environment-specific problems. Finally, we
conceptually introduce the functions of an autonomous agent with knowledge
management.
Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes are to make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
Control with Probabilistic Signal Temporal Logic
Autonomous agents often operate in uncertain environments where their
decisions are made based on beliefs over states of targets. We are interested
in controller synthesis for complex tasks defined over belief spaces. Designing
such controllers is challenging due to computational complexity and the lack of
expressivity of existing specification languages. In this paper, we propose a
probabilistic extension to signal temporal logic (STL) that expresses tasks
over continuous belief spaces. We present an efficient synthesis algorithm to
find a control input that maximises the probability of satisfying a given task.
We validate our algorithm through simulations of an unmanned aerial vehicle
deployed for surveillance and search missions.
Comment: 7 pages, submitted to the 2016 American Control Conference (ACC 2016)
on September 30, 2015 (under review)
Data Management in Industry 4.0: State of the Art and Open Challenges
Information and communication technologies are permeating all aspects of
industrial and manufacturing systems, expediting the generation of large
volumes of industrial data. This article surveys the recent literature on data
management as it applies to networked industrial environments and identifies
several open research challenges for the future. As a first step, we extract
important data properties (volume, variety, traffic, criticality) and identify
the corresponding data enabling technologies of diverse fundamental industrial
use cases, based on practical applications. Secondly, we provide a detailed
outline of recent industrial architectural designs with respect to their data
management philosophy (data presence, data coordination, data computation) and
the extent of their distributiveness. Then, we conduct a holistic survey of the
recent literature from which we derive a taxonomy of the latest advances on
industrial data enabling technologies and data centric services, spanning all
the way from the field level deep in the physical deployments, up to the cloud
and applications level. Finally, motivated by the rich conclusions of this
critical analysis, we identify interesting open challenges for future research.
The concepts presented in this article thematically cover the largest part of
the industrial automation pyramid layers. Our approach is multidisciplinary, as
the selected publications were drawn from two fields: the communications,
networking and computation field, and the industrial, manufacturing and
automation field. The article can help readers understand in depth how
data management is currently applied in networked industrial environments, and
select interesting open research opportunities to pursue.
Risk-Sensitive Reinforcement Learning Applied to Control under Constraints
In this paper, we consider Markov Decision Processes (MDPs) with error
states. Error states are states that are undesirable or dangerous to
enter. We define the risk with respect to a policy as the probability of
entering such a state when the policy is pursued. We consider the problem of
finding good policies whose risk is smaller than some user-specified threshold,
and formalize it as a constrained MDP with two criteria. The first criterion
corresponds to the value function originally given. We will show that the risk
can be formulated as a second criterion function based on a cumulative return,
whose definition is independent of the original value function. We present a
model-free, heuristic reinforcement learning algorithm that aims at finding
good deterministic policies. It is based on weighting the original value
function and the risk. The weight parameter is adapted in order to find a
feasible solution for the constrained problem that has a good performance with
respect to the value function. The algorithm was successfully applied to the
control of a feed tank with stochastic inflows that lies upstream of a
distillation column. This control task was originally formulated as an optimal
control problem with chance constraints, and it was solved under certain
assumptions on the model to obtain an optimal solution. The power of our
learning algorithm is that it can be used even when some of these restrictive
assumptions are relaxed.
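The idea of treating risk as a second criterion based on a cumulative return can be sketched as follows. The sketch assumes a fixed policy, so the MDP reduces to a Markov chain, and assigns a cost of 1 to any transition that enters an error state; the expected undiscounted sum of these costs is then the probability of ever entering an error state. The weighted score `xi * value - risk` is one plausible form of the weighting described in the abstract, not a confirmed detail of the paper, and all names and numbers are hypothetical.

```python
def policy_risk(transitions, error_states, terminal_states, sweeps=1000):
    """Probability of ever entering an error state from each state.

    transitions[s] is a list of (next_state, probability) pairs under a
    fixed policy. Terminal states (error states and goals) have no successors.
    """
    risk = {s: 0.0 for s in transitions}
    for _ in range(sweeps):
        for s in transitions:
            if s in terminal_states:
                continue
            # Cost 1 when the next state is an error state, otherwise the
            # risk still to come from the next state.
            risk[s] = sum(p * (1.0 if s2 in error_states else risk[s2])
                          for s2, p in transitions[s])
    return risk

def weighted_score(value, risk, xi):
    """Assumed scalarisation of the two criteria; xi weights the value."""
    return xi * value - risk

def is_feasible(risk, threshold):
    """Check the constraint that risk stays within the user-specified threshold."""
    return risk <= threshold

# Hypothetical chain: from "start" the process usually reaches "mid",
# and from "mid" it usually reaches "goal"; "error" is the error state.
chain = {
    "start": [("mid", 0.9), ("error", 0.1)],
    "mid": [("goal", 0.8), ("error", 0.2)],
    "goal": [],
    "error": [],
}
risk = policy_risk(chain, error_states={"error"},
                   terminal_states={"goal", "error"})
```

In this toy chain the risk from `"mid"` is 0.2 and the risk from `"start"` is 0.1 + 0.9 × 0.2 = 0.28, so the policy would be feasible for a threshold of 0.3 but not for 0.25.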
A Survey on Artificial Intelligence and Data Mining for MOOCs
Massive Open Online Courses (MOOCs) have gained tremendous popularity in the
last few years. Thanks to MOOCs, millions of learners from all over the world
have taken thousands of high-quality courses for free. Putting together an
excellent MOOC ecosystem is a multidisciplinary endeavour that requires
contributions from many different fields. Artificial intelligence (AI) and data
mining (DM) are two such fields that have played a significant role in making
MOOCs what they are today. By exploiting the vast amount of data generated by
learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem
and enables MOOC practitioners to deliver better courses. Similarly, AI,
supported by DM, can greatly improve student experience and learning outcomes.
In this survey paper, we first review the state-of-the-art artificial
intelligence and data mining research applied to MOOCs, emphasising the use of
AI and DM tools and techniques to improve student engagement, learning
outcomes, and our understanding of the MOOC ecosystem. We then offer an
overview of key trends and important research to carry out in the fields of AI
and DM so that MOOCs can reach their full potential.
Comment: Working Paper
Belief Space Scheduling
This thesis develops the belief space scheduling framework for scheduling under uncertainty in Stochastic Collection and Replenishment (SCAR) scenarios. SCAR scenarios involve the transportation of a resource such as fuel to agents operating in the field. Key characteristics of this scenario are persistent operation of the agents, and consideration of uncertainty. Belief space scheduling performs optimisation on probability distributions describing the state of the system. It consists of three major components: estimation of the current system state given uncertain sensor readings, prediction of the future state given a schedule of tasks, and optimisation of the schedule of the replenishing agents. The state estimation problem is complicated by a number of constraints that act on the state. A novel extension of the truncated Kalman Filter is developed for soft constraints that have uncertainty described by a Gaussian distribution. This is shown to outperform existing estimation methods, striking a balance between the high uncertainty of methods that ignore the constraints and the overconfidence of methods that ignore the uncertainty of the constraints. To predict the future state of the system, a novel analytical, continuous-time framework is proposed. This framework uses multiple Gaussian approximations to propagate the probability distributions describing the system state into the future. It is compared with a Monte Carlo framework and is shown to provide similar discrimination performance while computing, in most cases, orders of magnitude faster. Finally, several branch and bound tree search methods are developed for the optimisation problem. These methods focus optimisation efforts on earlier tasks within a model predictive control-like framework. Combined with the estimation and prediction methods, these are shown to outperform existing approaches.