Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes must make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
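As a concrete illustration of the MDP machinery the survey covers, the sketch below runs value iteration on a hypothetical two-state sensor-node model, with battery level as the state and sleep/transmit as the actions. All transition probabilities and rewards are assumed values for illustration, not taken from the survey.

```python
# Minimal value iteration for a hypothetical WSN sensor-node MDP:
# states are battery levels, actions are {sleep, transmit}.
import numpy as np

states = ["low", "high"]          # battery level
actions = ["sleep", "transmit"]
gamma = 0.95                      # discount factor

# P[a][s][s'] : illustrative transition probabilities (assumed values)
P = {
    "sleep":    np.array([[0.9, 0.1],   # low  -> mostly stays low
                          [0.2, 0.8]]), # high -> mostly stays high
    "transmit": np.array([[1.0, 0.0],   # transmitting drains the battery
                          [0.7, 0.3]]),
}
# R[a][s] : immediate reward (data delivered minus energy cost, assumed)
R = {"sleep": np.array([0.0, 0.0]), "transmit": np.array([-1.0, 2.0])}

V = np.zeros(len(states))
for _ in range(500):                          # value iteration sweeps
    Q = np.array([R[a] + gamma * P[a] @ V for a in actions])
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-8:        # convergence check
        break
    V = V_new

policy = [actions[i] for i in Q.argmax(axis=0)]
print(dict(zip(states, policy)))  # {'low': 'sleep', 'high': 'transmit'}
```

The same backup structure extends to the WSN problems listed above once states, actions, and rewards encode the relevant design goal.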
Review of Markov models for maintenance optimization in the context of offshore wind
The offshore environment poses a number of challenges to wind farm operators. Harsher climatic conditions typically result in lower reliability, while limited accessibility makes maintenance difficult. One way to improve availability is to optimize Operation and Maintenance (O&M) actions such as scheduled, corrective, and proactive maintenance. Many authors have attempted to model or optimize O&M through the use of Markov models. Two examples of Markov models, Hidden Markov Models (HMMs) and Partially Observable Markov Decision Processes (POMDPs), are investigated in this paper. In general, Markov models are powerful statistical tools that have been successfully applied to component diagnostics, prognostics, and maintenance optimization across a range of industries. This paper discusses the suitability of these models for the offshore wind industry. Existing models created for the wind industry are critically reviewed and discussed. As there is little evidence of widespread application of these models, this paper aims to highlight the key factors required for successful application of Markov models to practical problems. From this, the paper identifies the theoretical and practical gaps that must be resolved in order to gain broad acceptance of Markov models to support O&M decision making in the offshore wind industry.
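To make the HMM side of this comparison concrete, here is a minimal forward-filtering sketch for component condition monitoring. The three degradation states, transition matrix, and emission probabilities are assumed for illustration and do not come from any of the reviewed models.

```python
# Hidden Markov Model filtering sketch for component condition monitoring:
# hidden states are degradation levels, observations are discretized readings.
import numpy as np

# Hidden degradation states: 0 = healthy, 1 = worn, 2 = failed (assumed model)
A = np.array([[0.95, 0.04, 0.01],    # state transition matrix
              [0.00, 0.90, 0.10],
              [0.00, 0.00, 1.00]])
# Emission matrix B[s][o] = P(observation | state); obs: 0 = normal, 1 = alarm
B = np.array([[0.90, 0.10],
              [0.40, 0.60],
              [0.05, 0.95]])
pi = np.array([1.0, 0.0, 0.0])       # start in the healthy state

def filter_states(observations):
    """Forward algorithm: posterior over the degradation state at each step."""
    belief = pi * B[:, observations[0]]
    belief /= belief.sum()
    history = [belief]
    for obs in observations[1:]:
        belief = (A.T @ belief) * B[:, obs]   # predict, then update
        belief /= belief.sum()
        history.append(belief)
    return np.array(history)

print(filter_states([0, 0, 1, 1]))   # belief drifts toward worn/failed states
```

A POMDP adds actions and rewards on top of exactly this belief recursion, which is why the two models are natural companions for O&M optimization.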
Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management
We present a multi-agent Deep Reinforcement Learning (DRL) framework for
managing large transportation infrastructure systems over their life-cycle.
Life-cycle management of such engineering systems is a computationally
intensive task, requiring appropriate sequential inspection and maintenance
decisions able to reduce long-term risks and costs, while dealing with
different uncertainties and constraints that lie in high-dimensional spaces. To
date, static age- or condition-based maintenance methods and risk-based or
periodic inspection plans have mostly addressed this class of optimization
problems. However, optimality, scalability, and uncertainty limitations are
often manifested under such approaches. The optimization problem in this work
is cast in the framework of constrained Partially Observable Markov Decision
Processes (POMDPs), which provides a comprehensive mathematical basis for
stochastic sequential decision settings with observation uncertainties, risk
considerations, and limited resources. To address significantly large state and
action spaces, a Deep Decentralized Multi-agent Actor-Critic (DDMAC) DRL method
with Centralized Training and Decentralized Execution (CTDE), termed
DDMAC-CTDE, is developed. The performance strengths of the DDMAC-CTDE method are
demonstrated in a generally representative and realistic example application of
an existing transportation network in Virginia, USA. The network includes
several bridge and pavement components with nonstationary degradation,
agency-imposed constraints, and traffic delay and risk considerations. The
proposed DDMAC-CTDE method substantially outperforms traditional management
policies for transportation networks. Overall, the proposed algorithmic
framework provides near-optimal solutions for transportation infrastructure
management under real-world constraints and complexities.
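The following sketch illustrates the general CTDE actor-critic pattern named in the abstract: per-component actors act on local observations at execution time, while a centralized critic sees the joint state during training. The network sizes, component count, and action set are assumptions for illustration, not the paper's DDMAC-CTDE architecture.

```python
# Sketch of Centralized Training / Decentralized Execution (CTDE) actor-critic.
import torch
import torch.nn as nn

N_COMPONENTS, OBS_DIM, N_ACTIONS = 4, 8, 3   # e.g. do-nothing / repair / replace

class Actor(nn.Module):
    """Decentralized actor: maps one component's local observation to action probs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, local_obs):
        return torch.softmax(self.net(local_obs), dim=-1)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint state of all components in training."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_COMPONENTS * OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, joint_obs):
        return self.net(joint_obs)

actors = [Actor() for _ in range(N_COMPONENTS)]
critic = CentralCritic()

# Execution is decentralized: each actor acts on its own observation only.
joint_obs = torch.randn(N_COMPONENTS, OBS_DIM)
actions = [torch.multinomial(actor(obs), 1) for actor, obs in zip(actors, joint_obs)]

# Training is centralized: the critic's value baseline uses the full joint state.
baseline = critic(joint_obs.flatten())
print(actions, baseline.item())
```

The key property is that the critic, and hence any global cost or constraint signal, is only needed during training; deployed agents need nothing beyond their local inputs.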
Continuous-observation partially observable semi-Markov decision processes for machine maintenance
Partially observable semi-Markov decision processes (POSMDPs) provide a rich framework for planning under both state transition uncertainty and observation uncertainty. In this paper, we widen the POSMDP literature by studying discrete-state, discrete-action, yet continuous-observation POSMDPs. We prove that the resultant α-vector set is continuous and therefore propose a point-based value iteration algorithm. This paper also bridges the gap between POSMDPs and machine maintenance by incorporating various types of maintenance actions, such as actions changing the machine state, actions changing the degradation rate, and the temporally extended action "do nothing". Both finite and infinite planning horizons are considered, and the solution methodology for each type of planning horizon is given. We illustrate the maintenance decision process via a real industrial problem and demonstrate that the developed framework can be readily applied to solve relevant maintenance problems.
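A minimal sketch of the distinctive ingredient here: updating a belief over discrete machine states from a continuous sensor reading, using a Gaussian observation likelihood per state. The transition matrix, observation means, and noise level are assumed values; the paper's point-based value iteration would plan over beliefs computed in this way.

```python
# Belief update for a discrete-state model with a *continuous* observation:
# a real-valued sensor reading with a Gaussian likelihood per hidden state.
import numpy as np
from scipy.stats import norm

# Machine states: 0 = good, 1 = degraded, 2 = failed (illustrative model)
T = np.array([[0.90, 0.08, 0.02],    # per-decision-epoch transition matrix
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])
obs_mean = np.array([0.0, 1.5, 3.0]) # mean sensor reading in each state
obs_std = 0.5                        # sensor noise (assumed)

def belief_update(belief, reading):
    """Predict with T, then weight each state by its observation likelihood."""
    predicted = T.T @ belief
    likelihood = norm.pdf(reading, loc=obs_mean, scale=obs_std)
    updated = predicted * likelihood
    return updated / updated.sum()

b = np.array([1.0, 0.0, 0.0])
for reading in [0.1, 0.8, 1.6, 2.9]:  # drifting sensor readings
    b = belief_update(b, reading)
    print(b.round(3))                 # mass shifts toward degraded/failed
```

Because the reading is continuous, the value function is defined by expectations over the observation density rather than a finite observation sum, which is exactly where the continuity result for the α-vector set matters.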
Life-cycle policies for large engineering systems under complete and partial observability
Management of structures and infrastructure systems has gained significant attention in the pursuit of optimal inspection and maintenance life-cycle policies that can handle diverse deteriorating effects of stochastic nature and satisfy long-term objectives. Such sequential decision problems can be efficiently formulated along the premises of Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs), which describe agent-based acting in environments with Markovian dynamics, equipped with rewards, actions, and complete or partial observations. In systems with relatively low-dimensional state and action spaces, MDPs and POMDPs can be satisfactorily solved using different dynamic programming algorithms, such as value iteration with or without synchronous updates and point-based approaches for partial observability cases. However, optimal planning for large systems with multiple components is computationally hard and severely suffers from the curse of dimensionality: the system states and actions can grow exponentially with the number of components, in the most general and adverse case, making the problem intractable for conventional dynamic programming schemes. In this work, Deep Reinforcement Learning (DRL) is implemented, with emphasis on the development and application of deep architectures suitable for large engineering systems. The developed approach leverages component-wise information to prescribe component-wise actions, while maintaining global optimality at the system level. Thereby, the system life-cycle cost functions are efficiently parametrized for large state and action spaces through nonlinear approximations, enabling adept planning in complex decision problems. Results are presented for a multi-component system, evaluated against various condition-based policies. This material is based upon work supported by the National Science Foundation under CAREER Grant No. 1751941.
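One way to read "component-wise actions with system-level information" is a factorized policy head, sketched below under assumed dimensions: a shared trunk encodes the full system state, and one small head per component outputs that component's action distribution, keeping the output size linear rather than exponential in the number of components. This is an illustrative construction, not the paper's specific architecture.

```python
# Factorized policy head for a multi-component system: one softmax per component
# instead of one softmax over the exponential joint action space.
import torch
import torch.nn as nn

N_COMPONENTS, STATE_DIM, N_ACTIONS = 10, 5, 3   # assumed sizes

class FactorizedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared trunk encodes the full system state (global information).
        self.trunk = nn.Sequential(nn.Linear(N_COMPONENTS * STATE_DIM, 256),
                                   nn.ReLU())
        # One small head per component prescribes that component's action.
        self.heads = nn.ModuleList(
            nn.Linear(256, N_ACTIONS) for _ in range(N_COMPONENTS))

    def forward(self, system_state):
        h = self.trunk(system_state)
        return [torch.softmax(head(h), dim=-1) for head in self.heads]

policy = FactorizedPolicy()
probs = policy(torch.randn(N_COMPONENTS * STATE_DIM))
# 10 distributions of size 3 (30 outputs) instead of 3**10 = 59049 joint actions.
print(len(probs), probs[0])
```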
Understanding Behavior via inverse reinforcement learning
Integrated Master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201
Bayesian Filtering Methods For Dynamic System Monitoring and Control
Real-time system monitoring and control represent two of the most important issues characterizing modern industries in critical areas of civilian and military interest, including the power grid, energy, healthcare, aerospace, and infrastructure. During the past decade, there has been rapid development of robust dynamic system monitoring and control methods for fault diagnosis and failure prognosis. Among various monitoring and control policies, condition-based maintenance (CBM) has been studied by many researchers due to its ability to exploit large volumes of monitoring data for real-time diagnostics and prognostics. A considerable amount of literature has been published on the subject, providing a large body of dynamic system control methods. Previously published studies are limited by assumptions that generally fall into three main categories: i) predefined system failure thresholds, ii) simplified latent dynamics, and iii) unrealistic parametric forms that describe the evolution of system dynamics through time. This thesis provides an array of solution approaches that overcome the aforementioned assumptions by introducing novel quantitative frameworks for real-time monitoring, control, and decision-making for dynamic systems. The proposed frameworks are categorized into two main phases of a comprehensive framework. The first phase contains two original Bayesian filtering methods for condition monitoring and control of systems with either linear or non-linear degradation dynamics. The former is designed only for systems with linear latent and observable dynamics and utilizes Kalman filtering for state-parameter inference. It considers a failure process that is purely stochastic and is based on logistic regression; this process is directly affected by the latent system dynamics, thereby avoiding the need for a priori failure thresholds. The latter takes into consideration multiple levels of system dynamics that evolve either linearly or non-linearly. A hybrid particle filter is developed for state-parameter inference, while an Extreme Learning Machine artificial neural network is utilized to relate sensor observations to latent system dynamics. Both frameworks are tested and validated on synthetic and real-world time-series datasets. The second phase of this thesis introduces an original method for optimal control and decision-making that employs Bayesian filtering-based deep reinforcement learning with fully stochastic environments. Sets of deep reinforcement learning agents are trained to develop control policies, with the Bayesian filtering methods from the first phase providing environment states built on estimates of the latent system dynamics. This method is applied to two different problems: maintenance cost minimization and estimation of the remaining useful life of a system under condition monitoring. Results obtained from applying the framework to simulated and real-world time-series data suggest that the proposed Bayesian filtering-based deep reinforcement learning algorithm can be trained even with limited data, which can be useful for real-time control and decision making for many dynamic systems.
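As a pointer to the kind of Bayesian filtering the first phase builds on, here is a minimal one-dimensional Kalman filter tracking a linearly drifting latent degradation state. The drift, noise variances, and measurements are assumed values for illustration only, not the thesis's model.

```python
# Minimal 1-D Kalman filter sketch for latent degradation tracking.
drift = 0.1        # assumed linear degradation rate per step
q, r = 0.01, 0.25  # process and measurement noise variances (assumed)

x_hat, p = 0.0, 1.0                     # initial state estimate and variance
measurements = [0.2, 0.1, 0.4, 0.5, 0.7]

for z in measurements:
    # Predict: degradation advances linearly, uncertainty grows.
    x_pred, p_pred = x_hat + drift, p + q
    # Update: blend the prediction with the noisy sensor reading.
    k = p_pred / (p_pred + r)           # Kalman gain
    x_hat = x_pred + k * (z - x_pred)
    p = (1 - k) * p_pred
    print(f"estimate={x_hat:.3f}  variance={p:.3f}")
```

The thesis's second phase would feed state estimates of this kind, rather than raw observations, to the reinforcement learning agents as environment states.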