    Stochastic Shortest Path with Energy Constraints in POMDPs

    We consider partially observable Markov decision processes (POMDPs) with a set of target states and positive integer costs associated with every transition. The traditional optimization objective (stochastic shortest path) asks to minimize the expected total cost until the target set is reached. We extend the traditional framework of POMDPs to model energy consumption, which represents a hard constraint. The energy level may increase or decrease with transitions, and the hard constraint requires that the energy level remain positive at every step until the target is reached. First, we present a novel algorithm for solving POMDPs with energy levels, building on existing POMDP solvers and using RTDP as its core method. Our second contribution concerns policy representation. For larger POMDP instances, the policies computed by existing solvers are too large to be understandable. We present an automated procedure, based on machine learning techniques, that extracts the important decisions of a policy, allowing us to compute succinct, human-readable policies. Finally, we show experimentally that our algorithm performs well and computes succinct policies on a number of POMDP instances from the literature that were naturally enhanced with energy levels.
    Comment: Technical report accompanying a paper published in the proceedings of AAMAS 201
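    The abstract leaves the construction implicit; a standard way to encode such a hard energy constraint is to fold the energy level into the state, after which an off-the-shelf solver such as RTDP applies to the augmented model. The following Python sketch is a hedged illustration of that reduction, not the paper's algorithm; the function names and the battery cap e_max are assumptions.

        def augment_with_energy(trans, energy_delta, e_max):
            """trans(s, a) -> [(s_next, prob)]; energy_delta(s, a, s_next) -> int.
            Returns a transition function over (state, energy) pairs in which
            any pair with energy <= 0 is an absorbing dead end ('dead')."""
            def aug_trans(state, action):
                s, e = state
                if e <= 0:
                    return [(("dead", 0), 1.0)]  # energy exhausted: constraint violated
                out = []
                for s_next, p in trans(s, action):
                    e_next = min(e_max, e + energy_delta(s, action, s_next))
                    out.append(((s_next, e_next), p))
                return out
            return aug_trans

    A solver run on the augmented model then automatically avoids policies that let the energy level drop to zero before the target is reached.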

    Hierarchical State Abstractions for Decision-Making Problems with Computational Constraints

    In this semi-tutorial paper, we first review the information-theoretic approach to accounting for the computational costs incurred while searching for optimal actions in a sequential decision-making problem. The traditional Markov decision process (MDP) framework ignores computational limitations when searching for optimal policies, essentially assuming that the acting agent is perfectly rational and aims for exact optimality. Using the free energy, a variational principle is introduced that accounts not only for the value of a policy but also for the cost of finding it. The solutions of the variational equations arising from this formulation can be obtained using familiar Bellman-like value iterations from dynamic programming (DP) and the Blahut-Arimoto (BA) algorithm from rate-distortion theory. Finally, we demonstrate the utility of the approach for generating hierarchies of state abstractions that best exploit the available computational resources. A numerical example showcases these concepts for a path-planning problem in a grid-world environment.
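    The abstract does not state the recursion explicitly; the sketch below shows the kind of free-energy Bellman backup this line of work uses, with the prior policy held fixed (in the full Blahut-Arimoto scheme the prior is itself re-estimated as the action marginal). The names, the discount factor, and the fixed iteration count are illustrative assumptions.

        import numpy as np

        def free_energy_iteration(P, R, prior, beta, gamma=0.95, iters=200):
            """P[a][s, s2]: transition matrices; R[s, a]: rewards;
            prior[s, a]: default policy pi_0(a|s); beta: rationality parameter.
            Large beta approaches the hard max of standard value iteration;
            small beta keeps the policy near the prior, modelling a cheap,
            computationally bounded search."""
            S, A = R.shape
            F = np.zeros(S)
            for _ in range(iters):
                Q = R + gamma * np.stack([P[a] @ F for a in range(A)], axis=1)
                F = np.log((prior * np.exp(beta * Q)).sum(axis=1)) / beta  # log-partition backup
            pi = prior * np.exp(beta * Q)  # Boltzmann reweighting of the prior
            return F, pi / pi.sum(axis=1, keepdims=True)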

    Autonomous Wireless Systems with Artificial Intelligence

    This paper discusses the technology and opportunities for embracing artificial intelligence (AI) in the design of autonomous wireless systems. We aim to provide readers with the motivation for, and a general AI methodology of, autonomous agents that self-organize in real time by unifying knowledge management with sensing, reasoning, and active learning. We highlight the differences between training-based methods for matching problems and training-free methods for environment-specific problems. Finally, we conceptually introduce the functions of an autonomous agent with knowledge management.

    Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey

    Wireless sensor networks (WSNs) consist of autonomous, resource-limited devices that cooperate to monitor one or more physical phenomena within an area of interest. WSNs operate as stochastic systems because of the randomness of the monitored environments. To achieve long service times and low maintenance costs, WSNs require adaptive and robust methods for data exchange, topology formulation, resource and power optimization, sensing coverage and object detection, and security. In these problems, sensor nodes must make optimized decisions from a set of accessible strategies to achieve design goals. This survey reviews numerous applications of the Markov decision process (MDP) framework, a powerful decision-making tool for developing adaptive algorithms and protocols for WSNs. Furthermore, various solution methods are discussed and compared to serve as a guide for using MDPs in WSNs.
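    As a toy illustration of the kind of WSN decision problem the survey covers (numbers invented for this example, not taken from the paper), consider a node choosing between sleeping, which may harvest energy, and sensing, which earns reward but drains the battery; plain value iteration then yields a duty-cycling policy.

        import numpy as np

        # Battery states: 0 empty, 1 low, 2 full. Actions: 0 sleep, 1 sense.
        P = np.array([
            [[0.5, 0.5, 0.0],   # sleep: battery rises one level w.p. 0.5
             [0.0, 0.5, 0.5],
             [0.0, 0.0, 1.0]],
            [[1.0, 0.0, 0.0],   # sense: battery drops one level w.p. 0.8
             [0.8, 0.2, 0.0],
             [0.0, 0.8, 0.2]],
        ])
        R = np.array([[0.0, 0.0],   # empty: sensing yields nothing
                      [0.0, 1.0],   # low and full: sensing earns a unit reward
                      [0.0, 1.0]])
        gamma, V = 0.95, np.zeros(3)
        for _ in range(500):        # plain value iteration
            Q = R + gamma * np.stack([P[a] @ V for a in range(2)], axis=1)
            V = Q.max(axis=1)
        print(Q.argmax(axis=1))     # greedy action per battery level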

    Control with Probabilistic Signal Temporal Logic

    Autonomous agents often operate in uncertain environments where their decisions are made based on beliefs over the states of targets. We are interested in controller synthesis for complex tasks defined over belief spaces. Designing such controllers is challenging due to computational complexity and the lack of expressivity of existing specification languages. In this paper, we propose a probabilistic extension of signal temporal logic (STL) that expresses tasks over continuous belief spaces. We present an efficient synthesis algorithm that finds a control input maximising the probability of satisfying a given task. We validate our algorithm through simulations of an unmanned aerial vehicle deployed for surveillance and search missions.
    Comment: 7 pages; submitted to the 2016 American Control Conference (ACC 2016) on September 30, 2015 (under review)
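    The core primitive such a logic needs is the probability that an atomic predicate holds under the current belief. For a scalar Gaussian belief and a linear predicate this is a closed-form Gaussian tail, as in the hedged sketch below (an assumed form for illustration; the paper's actual predicate semantics may differ).

        from scipy.stats import norm

        def prob_predicate(mu, sigma, c):
            """Probability that a belief N(mu, sigma^2) satisfies x >= c."""
            return 1.0 - norm.cdf(c, loc=mu, scale=sigma)

        # A chance-constrained atom "Pr(x >= c) >= 0.9" becomes a simple test:
        satisfied = prob_predicate(mu=2.0, sigma=0.5, c=1.0) >= 0.9

    A synthesis algorithm can then search for control inputs that maximise the probability of the whole temporal formula built from such atoms.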

    Data Management in Industry 4.0: State of the Art and Open Challenges

    Information and communication technologies are permeating all aspects of industrial and manufacturing systems, expediting the generation of large volumes of industrial data. This article surveys the recent literature on data management as it applies to networked industrial environments and identifies several open research challenges for the future. As a first step, we extract the important data properties (volume, variety, traffic, criticality) and identify the corresponding data-enabling technologies of diverse fundamental industrial use cases, based on practical applications. Secondly, we provide a detailed outline of recent industrial architectural designs with respect to their data-management philosophy (data presence, data coordination, data computation) and the extent to which they are distributed. Then, we conduct a holistic survey of the recent literature, from which we derive a taxonomy of the latest advances in industrial data-enabling technologies and data-centric services, spanning from the field level deep in the physical deployments up to the cloud and applications level. Finally, motivated by the rich conclusions of this critical analysis, we identify interesting open challenges for future research. The concepts presented in this article thematically cover most layers of the industrial automation pyramid. Our approach is multidisciplinary, as the selected publications were drawn from two fields: communications, networking, and computation on the one hand, and industrial, manufacturing, and automation on the other. The article can help readers understand in depth how data management is currently applied in networked industrial environments and select interesting open research opportunities to pursue.

    Risk-Sensitive Reinforcement Learning Applied to Control under Constraints

    In this paper, we consider Markov decision processes (MDPs) with error states, i.e., states that are undesirable or dangerous to enter. We define the risk with respect to a policy as the probability of entering such a state when the policy is pursued. We consider the problem of finding good policies whose risk is smaller than some user-specified threshold, and formalize it as a constrained MDP with two criteria. The first criterion corresponds to the value function originally given. We show that the risk can be formulated as a second criterion function based on a cumulative return, whose definition is independent of the original value function. We present a model-free, heuristic reinforcement learning algorithm that aims at finding good deterministic policies. It is based on weighting the original value function against the risk; the weight parameter is adapted in order to find a feasible solution to the constrained problem that performs well with respect to the value function. The algorithm was successfully applied to the control of a feed tank with stochastic inflows that lies upstream of a distillation column. This control task was originally formulated as an optimal control problem with chance constraints, and it was solved, under certain assumptions on the model, to obtain an optimal solution. The power of our learning algorithm is that it can be used even when some of these restrictive assumptions are relaxed.
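    A hedged sketch of the weighted criterion described above (names and the update scheme are assumptions for illustration): the agent learns one Q-function for value and one for risk, acts greedily with respect to their weighted difference, and adapts the weight until the estimated risk falls below the user-specified threshold.

        def weighted_action(Q_val, Q_risk, xi, state, actions):
            """Greedy action under the weighted criterion: trade the original
            value off against the risk of entering an error state."""
            return max(actions, key=lambda a: Q_val[state, a] - xi * Q_risk[state, a])

        def adapt_xi(xi, est_risk, omega, step=0.1):
            """Raise the risk weight while the policy's estimated probability of
            entering an error state exceeds the threshold omega; else relax it."""
            return xi + step if est_risk > omega else max(0.0, xi - step)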

    A Survey on Artificial Intelligence and Data Mining for MOOCs

    Massive Open Online Courses (MOOCs) have gained tremendous popularity in the last few years. Thanks to MOOCs, millions of learners from all over the world have taken thousands of high-quality courses for free. Putting together an excellent MOOC ecosystem is a multidisciplinary endeavour that requires contributions from many different fields. Artificial intelligence (AI) and data mining (DM) are two such fields that have played a significant role in making MOOCs what they are today. By exploiting the vast amounts of data generated by learners engaging in MOOCs, DM improves our understanding of the MOOC ecosystem and enables MOOC practitioners to deliver better courses. Similarly, AI, supported by DM, can greatly improve student experience and learning outcomes. In this survey paper, we first review the state-of-the-art AI and DM research applied to MOOCs, emphasising the use of AI and DM tools and techniques to improve student engagement, learning outcomes, and our understanding of the MOOC ecosystem. We then offer an overview of key trends and of important research to carry out in the fields of AI and DM so that MOOCs can reach their full potential.
    Comment: Working Paper

    Belief Space Scheduling

    This thesis develops the belief space scheduling framework for scheduling under uncertainty in Stochastic Collection and Replenishment (SCAR) scenarios. SCAR scenarios involve the transportation of a resource, such as fuel, to agents operating in the field. Key characteristics of this scenario are the persistent operation of the agents and the consideration of uncertainty. Belief space scheduling performs optimisation on probability distributions describing the state of the system. It consists of three major components: estimation of the current system state given uncertain sensor readings, prediction of the future state given a schedule of tasks, and optimisation of the schedule of the replenishing agents. The state estimation problem is complicated by a number of constraints that act on the state. A novel extension of the truncated Kalman filter is developed for soft constraints whose uncertainty is described by a Gaussian distribution. This is shown to outperform existing estimation methods, striking a balance between the high uncertainty of methods that ignore the constraints and the overconfidence of methods that ignore the uncertainty of the constraints. To predict the future state of the system, a novel analytical, continuous-time framework is proposed. This framework uses multiple Gaussian approximations to propagate the probability distributions describing the system state into the future. It is compared with a Monte Carlo framework and is shown to provide similar discrimination performance while computing, in most cases, orders of magnitude faster. Finally, several branch-and-bound tree search methods are developed for the optimisation problem. These methods focus optimisation effort on earlier tasks within a model-predictive-control-like framework. Combined with the estimation and prediction methods, they are shown to outperform existing approaches.
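    The soft-constrained filter itself is the thesis's contribution; the sketch below shows only the standard hard-truncation moments it builds on, i.e. moment-matching a scalar Gaussian estimate against a known lower bound (variable names are illustrative).

        import numpy as np
        from scipy.stats import norm

        def truncate_below(mu, var, lower):
            """Moment-match N(mu, var) to the hard constraint x >= lower using
            the standard truncated-normal mean and variance; the thesis's
            extension additionally models Gaussian uncertainty on 'lower'."""
            s = np.sqrt(var)
            alpha = (lower - mu) / s
            lam = norm.pdf(alpha) / max(1.0 - norm.cdf(alpha), 1e-12)  # hazard rate
            mu_t = mu + s * lam
            var_t = var * (1.0 - lam * (lam - alpha))
            return mu_t, var_t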