Markov Decision Processes with Applications in Wireless Sensor Networks: A Survey
Wireless sensor networks (WSNs) consist of autonomous and resource-limited
devices. The devices cooperate to monitor one or more physical phenomena within
an area of interest. WSNs operate as stochastic systems because of randomness
in the monitored environments. For long service time and low maintenance cost,
WSNs require adaptive and robust methods to address data exchange, topology
formulation, resource and power optimization, sensing coverage and object
detection, and security challenges. In these problems, sensor nodes must make
optimized decisions from a set of accessible strategies to achieve design
goals. This survey reviews numerous applications of the Markov decision process
(MDP) framework, a powerful decision-making tool to develop adaptive algorithms
and protocols for WSNs. Furthermore, various solution methods are discussed and
compared to serve as a guide for using MDPs in WSNs.
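As a concrete illustration of the MDP machinery the survey covers, the sketch below runs value iteration on a hypothetical two-state sensor-node model, with battery level as the state and sleep/transmit as the actions. All transition probabilities and rewards are assumed values for illustration, not taken from the survey.

```python
# Minimal value iteration for a hypothetical WSN sensor-node MDP:
# states are battery levels, actions are {sleep, transmit}.
import numpy as np

states = ["low", "high"]          # battery level
actions = ["sleep", "transmit"]
gamma = 0.95                      # discount factor

# P[a][s][s'] : illustrative transition probabilities (assumed values)
P = {
    "sleep":    np.array([[0.9, 0.1],   # low  -> mostly stays low
                          [0.2, 0.8]]), # high -> mostly stays high
    "transmit": np.array([[1.0, 0.0],   # transmitting drains the battery
                          [0.7, 0.3]]),
}
# R[a][s] : immediate reward (data delivered minus energy cost, assumed)
R = {"sleep": np.array([0.0, 0.0]), "transmit": np.array([-1.0, 2.0])}

V = np.zeros(len(states))
for _ in range(500):                          # value iteration sweeps
    Q = np.array([R[a] + gamma * P[a] @ V for a in actions])
    V_new = Q.max(axis=0)
    if np.abs(V_new - V).max() < 1e-8:        # convergence check
        break
    V = V_new

policy = [actions[i] for i in Q.argmax(axis=0)]
print(dict(zip(states, policy)))  # {'low': 'sleep', 'high': 'transmit'}
```

The same backup structure extends to the WSN problems listed above once states, actions, and rewards encode the relevant design goal.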
Review of Markov models for maintenance optimization in the context of offshore wind
The offshore environment poses a number of challenges to wind farm operators. Harsher climatic conditions typically result in lower reliability, while limited accessibility makes maintenance difficult. One way to improve availability is to optimize Operation and Maintenance (O&M) actions such as scheduled, corrective, and proactive maintenance. Many authors have attempted to model or optimize O&M through the use of Markov models. Two examples of Markov models, Hidden Markov Models (HMMs) and Partially Observable Markov Decision Processes (POMDPs), are investigated in this paper. In general, Markov models are powerful statistical tools that have been successfully applied to component diagnostics, prognostics, and maintenance optimization across a range of industries. This paper discusses the suitability of these models for the offshore wind industry. Existing models created for the wind industry are critically reviewed and discussed. As there is little evidence of widespread application of these models, this paper aims to highlight the key factors required for successful application of Markov models to practical problems. From this, the paper identifies the theoretical and practical gaps that must be resolved in order to gain broad acceptance of Markov models to support O&M decision making in the offshore wind industry.
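To make the HMM side of this comparison concrete, here is a minimal forward-filtering sketch for component condition monitoring. The three degradation states, transition matrix, and emission probabilities are assumed for illustration and do not come from any of the reviewed models.

```python
# Hidden Markov Model filtering sketch for component condition monitoring:
# hidden states are degradation levels, observations are discretized readings.
import numpy as np

# Hidden degradation states: 0 = healthy, 1 = worn, 2 = failed (assumed model)
A = np.array([[0.95, 0.04, 0.01],    # state transition matrix
              [0.00, 0.90, 0.10],
              [0.00, 0.00, 1.00]])
# Emission matrix B[s][o] = P(observation | state); obs: 0 = normal, 1 = alarm
B = np.array([[0.90, 0.10],
              [0.40, 0.60],
              [0.05, 0.95]])
pi = np.array([1.0, 0.0, 0.0])       # start in the healthy state

def filter_states(observations):
    """Forward algorithm: posterior over the degradation state at each step."""
    belief = pi * B[:, observations[0]]
    belief /= belief.sum()
    history = [belief]
    for obs in observations[1:]:
        belief = (A.T @ belief) * B[:, obs]   # predict, then update
        belief /= belief.sum()
        history.append(belief)
    return np.array(history)

print(filter_states([0, 0, 1, 1]))   # belief drifts toward worn/failed states
```

A POMDP adds actions and rewards on top of exactly this belief recursion, which is why the two models are natural companions for O&M optimization.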
Multi-agent deep reinforcement learning with centralized training and decentralized execution for transportation infrastructure management
We present a multi-agent Deep Reinforcement Learning (DRL) framework for
managing large transportation infrastructure systems over their life-cycle.
Life-cycle management of such engineering systems is a computationally
intensive task, requiring appropriate sequential inspection and maintenance
decisions able to reduce long-term risks and costs, while dealing with
different uncertainties and constraints that lie in high-dimensional spaces. To
date, static age- or condition-based maintenance methods and risk-based or
periodic inspection plans have mostly addressed this class of optimization
problems. However, optimality, scalability, and uncertainty limitations are
often manifested under such approaches. The optimization problem in this work
is cast in the framework of constrained Partially Observable Markov Decision
Processes (POMDPs), which provides a comprehensive mathematical basis for
stochastic sequential decision settings with observation uncertainties, risk
considerations, and limited resources. To address significantly large state and
action spaces, a Deep Decentralized Multi-agent Actor-Critic (DDMAC) DRL method
with Centralized Training and Decentralized Execution (CTDE), termed
DDMAC-CTDE, is developed. The performance strengths of the DDMAC-CTDE method are
demonstrated in a generally representative and realistic example application of
an existing transportation network in Virginia, USA. The network includes
several bridge and pavement components with nonstationary degradation,
agency-imposed constraints, and traffic delay and risk considerations. The
proposed DDMAC-CTDE method substantially outperforms traditional management
policies for transportation networks. Overall, the proposed algorithmic
framework provides near-optimal solutions for transportation infrastructure
management under real-world constraints and complexities.
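The following sketch illustrates the general CTDE actor-critic pattern named in the abstract: per-component actors act on local observations at execution time, while a centralized critic sees the joint state during training. The network sizes, component count, and action set are assumptions for illustration, not the paper's DDMAC-CTDE architecture.

```python
# Sketch of Centralized Training / Decentralized Execution (CTDE) actor-critic.
import torch
import torch.nn as nn

N_COMPONENTS, OBS_DIM, N_ACTIONS = 4, 8, 3   # e.g. do-nothing / repair / replace

class Actor(nn.Module):
    """Decentralized actor: maps one component's local observation to action probs."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                                 nn.Linear(64, N_ACTIONS))
    def forward(self, local_obs):
        return torch.softmax(self.net(local_obs), dim=-1)

class CentralCritic(nn.Module):
    """Centralized critic: scores the joint state of all components in training."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(N_COMPONENTS * OBS_DIM, 128), nn.ReLU(),
                                 nn.Linear(128, 1))
    def forward(self, joint_obs):
        return self.net(joint_obs)

actors = [Actor() for _ in range(N_COMPONENTS)]
critic = CentralCritic()

# Execution is decentralized: each actor acts on its own observation only.
joint_obs = torch.randn(N_COMPONENTS, OBS_DIM)
actions = [torch.multinomial(actor(obs), 1) for actor, obs in zip(actors, joint_obs)]

# Training is centralized: the critic's value baseline uses the full joint state.
baseline = critic(joint_obs.flatten())
print(actions, baseline.item())
```

The key property is that the critic, and hence any global cost or constraint signal, is only needed during training; deployed agents need nothing beyond their local inputs.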
Continuous-observation partially observable semi-Markov decision processes for machine maintenance
Partially observable semi-Markov decision processes (POSMDPs) provide a rich framework for planning under both state transition uncertainty and observation uncertainty. In this paper, we widen the POSMDP literature by studying discrete-state, discrete-action, yet continuous-observation POSMDPs. We prove that the resultant α-vector set is continuous and therefore propose a point-based value iteration algorithm. This paper also bridges the gap between POSMDPs and machine maintenance by incorporating various types of maintenance actions, such as actions changing the machine state, actions changing the degradation rate, and the temporally extended action "do nothing". Both finite and infinite planning horizons are considered, and the solution methodology for each type of planning horizon is given. We illustrate the maintenance decision process via a real industrial problem and demonstrate that the developed framework can be readily applied to solve relevant maintenance problems.
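A minimal sketch of the distinctive ingredient here: updating a belief over discrete machine states from a continuous sensor reading, using a Gaussian observation likelihood per state. The transition matrix, observation means, and noise level are assumed values; the paper's point-based value iteration would plan over beliefs computed in this way.

```python
# Belief update for a discrete-state model with a *continuous* observation:
# a real-valued sensor reading with a Gaussian likelihood per hidden state.
import numpy as np
from scipy.stats import norm

# Machine states: 0 = good, 1 = degraded, 2 = failed (illustrative model)
T = np.array([[0.90, 0.08, 0.02],    # per-decision-epoch transition matrix
              [0.00, 0.85, 0.15],
              [0.00, 0.00, 1.00]])
obs_mean = np.array([0.0, 1.5, 3.0]) # mean sensor reading in each state
obs_std = 0.5                        # sensor noise (assumed)

def belief_update(belief, reading):
    """Predict with T, then weight each state by its observation likelihood."""
    predicted = T.T @ belief
    likelihood = norm.pdf(reading, loc=obs_mean, scale=obs_std)
    updated = predicted * likelihood
    return updated / updated.sum()

b = np.array([1.0, 0.0, 0.0])
for reading in [0.1, 0.8, 1.6, 2.9]:  # drifting sensor readings
    b = belief_update(b, reading)
    print(b.round(3))                 # mass shifts toward degraded/failed
```

Because the reading is continuous, the value function is defined by expectations over the observation density rather than a finite observation sum, which is exactly where the continuity result for the α-vector set matters.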
Life-cycle policies for large engineering systems under complete and partial observability
Management of structures and infrastructure systems has gained significant attention in the pursuit of optimal inspection and maintenance life-cycle policies that can handle diverse deteriorating effects of stochastic nature and satisfy long-term objectives. Such sequential decision problems can be efficiently formulated along the premises of Markov Decision Processes (MDPs) and Partially Observable Markov Decision Processes (POMDPs), which describe agent-based acting in environments with Markovian dynamics, equipped with rewards, actions, and complete or partial observations. In systems with relatively low-dimensional state and action spaces, MDPs and POMDPs can be satisfactorily solved using different dynamic programming algorithms, such as value iteration with or without synchronous updates and point-based approaches for partial observability cases. However, optimal planning for large systems with multiple components is computationally hard and severely suffers from the curse of dimensionality: the system states and actions can grow exponentially with the number of components, in the most general and adverse case, making the problem intractable for conventional dynamic programming schemes. In this work, Deep Reinforcement Learning (DRL) is implemented, with emphasis on the development and application of deep architectures suitable for large engineering systems. The developed approach leverages component-wise information to prescribe component-wise actions, while maintaining global optimality at the system level. Thereby, the system life-cycle cost functions are efficiently parametrized for large state and action spaces through nonlinear approximations, enabling adept planning in complex decision problems. Results are presented for a multi-component system, evaluated against various condition-based policies. This material is based upon work supported by the National Science Foundation under CAREER Grant No. 1751941.
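One way to read "component-wise actions with system-level information" is a factorized policy head, sketched below under assumed dimensions: a shared trunk encodes the full system state, and one small head per component outputs that component's action distribution, keeping the output size linear rather than exponential in the number of components. This is an illustrative construction, not the paper's specific architecture.

```python
# Factorized policy head for a multi-component system: one softmax per component
# instead of one softmax over the exponential joint action space.
import torch
import torch.nn as nn

N_COMPONENTS, STATE_DIM, N_ACTIONS = 10, 5, 3   # assumed sizes

class FactorizedPolicy(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared trunk encodes the full system state (global information).
        self.trunk = nn.Sequential(nn.Linear(N_COMPONENTS * STATE_DIM, 256),
                                   nn.ReLU())
        # One small head per component prescribes that component's action.
        self.heads = nn.ModuleList(
            nn.Linear(256, N_ACTIONS) for _ in range(N_COMPONENTS))

    def forward(self, system_state):
        h = self.trunk(system_state)
        return [torch.softmax(head(h), dim=-1) for head in self.heads]

policy = FactorizedPolicy()
probs = policy(torch.randn(N_COMPONENTS * STATE_DIM))
# 10 distributions of size 3 (30 outputs) instead of 3**10 = 59049 joint actions.
print(len(probs), probs[0])
```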
Understanding Behavior via inverse reinforcement learning
Integrated Master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia, Universidade do Porto. 201
Bayesian Filtering Methods For Dynamic System Monitoring and Control
Real-time system monitoring and control represent two of the most important issues characterizing modern industries in critical areas of civilian and military interest, including the power grid, energy, healthcare, aerospace, and infrastructure. During the past decade, there has been rapid development of robust dynamic system monitoring and control methods for fault diagnosis and failure prognosis. Among various monitoring and control policies, condition-based maintenance (CBM) has been studied by many researchers due to its ability to exploit large volumes of monitoring data for real-time diagnostics and prognostics. A considerable amount of literature has been published on the subject, providing a large body of dynamic system control methods. Previously published studies are limited by assumptions that generally fall into three main categories: i) predefined system failure thresholds, ii) simplified latent dynamics, and iii) unrealistic parametric forms that describe the evolution of system dynamics through time. This thesis provides an array of solution approaches that overcome the aforementioned assumptions by introducing novel quantitative frameworks for real-time monitoring, control, and decision-making for dynamic systems. The proposed frameworks are categorized into two main phases of a comprehensive framework. The first phase contains two original Bayesian filtering methods for condition monitoring and control of systems with either linear or non-linear degradation dynamics. The former is designed only for systems with linear latent and observable dynamics and utilizes Kalman filtering for state-parameter inference. It considers a failure process that is purely stochastic and is based on logistic regression; this process is directly affected by the latent system dynamics, thereby avoiding the need for a priori failure thresholds. The latter takes into consideration multiple levels of system dynamics that evolve either linearly or non-linearly. A hybrid particle filter is developed for state-parameter inference, while an Extreme Learning Machine artificial neural network is utilized to relate sensor observations to latent system dynamics. Both frameworks are tested and validated on synthetic and real-world time-series datasets. The second phase of this thesis introduces an original method for optimal control and decision-making that employs Bayesian filtering-based deep reinforcement learning with fully stochastic environments. Sets of deep reinforcement learning agents are trained to develop control policies, with the Bayesian filtering methods from the first phase providing environment states built on estimates of the latent system dynamics. This method is applied to two different problems: maintenance cost minimization and estimation of the remaining useful life of a system under condition monitoring. Results obtained from applying the framework to simulated and real-world time-series data suggest that the proposed Bayesian filtering-based deep reinforcement learning algorithm can be trained even with limited data, which can be useful for real-time control and decision making for many dynamic systems.
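As a pointer to the kind of Bayesian filtering the first phase builds on, here is a minimal one-dimensional Kalman filter tracking a linearly drifting latent degradation state. The drift, noise variances, and measurements are assumed values for illustration only, not the thesis's model.

```python
# Minimal 1-D Kalman filter sketch for latent degradation tracking.
drift = 0.1        # assumed linear degradation rate per step
q, r = 0.01, 0.25  # process and measurement noise variances (assumed)

x_hat, p = 0.0, 1.0                     # initial state estimate and variance
measurements = [0.2, 0.1, 0.4, 0.5, 0.7]

for z in measurements:
    # Predict: degradation advances linearly, uncertainty grows.
    x_pred, p_pred = x_hat + drift, p + q
    # Update: blend the prediction with the noisy sensor reading.
    k = p_pred / (p_pred + r)           # Kalman gain
    x_hat = x_pred + k * (z - x_pred)
    p = (1 - k) * p_pred
    print(f"estimate={x_hat:.3f}  variance={p:.3f}")
```

The thesis's second phase would feed state estimates of this kind, rather than raw observations, to the reinforcement learning agents as environment states.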