Online reinforcement learning for condition-based group maintenance using factored Markov decision processes
We investigate a condition-based group maintenance problem for multi-component systems, in which the degradation process of each component is affected only by its neighbouring ones, leading to a special type of stochastic dependence among components. We formulate the maintenance problem as a factored Markov decision process that exploits this dependence property, and develop a factored value iteration algorithm to efficiently approximate the optimal policy. Through both theoretical analysis and numerical experiments, we show that the algorithm significantly reduces the computational burden of solving the optimization problem. Moreover, since model parameters are unknown a priori in most practical scenarios, we further develop an online reinforcement learning algorithm that simultaneously learns the model parameters and determines an optimal maintenance action upon each inspection. A novel feature of this online learning algorithm is that it learns both the transition probabilities and the system structure indicating the stochastic dependence among components. We analyse the error bound and sample complexity of the learning algorithm theoretically, and test its performance through numerical experiments. The results show that the algorithm effectively learns the model parameters and approximates the optimal maintenance policy.
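As a concrete illustration of the factored-MDP idea in this abstract, the sketch below builds the transition model of a small multi-component system from local, neighbour-dependent factors and runs plain value iteration on the (small) joint state space. All components, costs, and probabilities here are hypothetical; this is a minimal sketch of the modelling idea, not the paper's algorithm.

```python
import itertools

# Hypothetical 3-component system on a line: component i's degradation
# probability depends only on its left neighbour's state (the stochastic
# dependence described in the abstract). States are binary per component:
# 0 = healthy, 1 = degraded.
N_COMP = 3
STATES = list(itertools.product([0, 1], repeat=N_COMP))
ACTIONS = [0, 1]                  # 0 = do nothing, 1 = replace all components
BASE_P, NEIGHBOUR_P = 0.1, 0.3    # assumed degradation probabilities
MAINT_COST, DOWN_COST = 2.0, 1.0  # assumed maintenance / downtime costs
GAMMA = 0.9                       # discount factor

def comp_transition(i, s, a, s_next_i):
    """Local factor P(s'_i | neighbourhood of i, action a)."""
    post = [0] * N_COMP if a == 1 else list(s)   # maintenance restores all
    if post[i] == 1:                             # degraded stays degraded
        return 1.0 if s_next_i == 1 else 0.0
    p = BASE_P + (NEIGHBOUR_P if i > 0 and post[i - 1] == 1 else 0.0)
    return p if s_next_i == 1 else 1.0 - p

def joint_transition(s, a, s_next):
    """The joint transition factorises over components."""
    p = 1.0
    for i in range(N_COMP):
        p *= comp_transition(i, s, a, s_next[i])
    return p

def cost(s, a):
    return DOWN_COST * sum(s) + (MAINT_COST if a == 1 else 0.0)

def q_value(s, a, V):
    return cost(s, a) + GAMMA * sum(joint_transition(s, a, sn) * V[sn]
                                    for sn in STATES)

# Value iteration; the factored model is used to *build* the transitions,
# mirroring the structure the abstract exploits.
V = {s: 0.0 for s in STATES}
for _ in range(200):
    V = {s: min(q_value(s, a, V) for a in ACTIONS) for s in STATES}

policy = {s: min(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
```

Even this toy case shows the point of the factorisation: each local factor `comp_transition` has a constant-size conditioning set, so the model is specified with O(N) local tables rather than a full 2^N x 2^N transition matrix.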
Learning in Congestion Games with Bandit Feedback
In this paper, we investigate Nash-regret minimization in congestion games, a
class of games with benign theoretical structure and broad real-world
applications. We first propose a centralized algorithm based on the optimism in
the face of uncertainty principle for congestion games with (semi-)bandit
feedback, and obtain finite-sample guarantees. Then we propose a decentralized
algorithm via a novel combination of the Frank-Wolfe method and G-optimal
design. By exploiting the structure of the congestion game, we show the sample
complexity of both algorithms depends only polynomially on the number of
players and the number of facilities, but not the size of the action set, which
can be exponentially large in terms of the number of facilities. We further
define a new problem class, Markov congestion games, which allows us to model
the non-stationarity in congestion games. We propose a centralized algorithm
for Markov congestion games, whose sample complexity again has only polynomial
dependence on all relevant problem parameters, but not the size of the action
set.

Comment: 34 pages, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
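For intuition about the "benign theoretical structure" the abstract mentions: congestion games are exact potential games (Rosenthal, 1973), so simple best-response dynamics reaches a pure Nash equilibrium. The toy sketch below (hypothetical facilities, cost functions, and action sets; not the paper's bandit algorithm) demonstrates this with full-information costs.

```python
# Toy congestion game: facilities with load-dependent costs, players choosing
# subsets of facilities (e.g. paths in a network). Because the game admits an
# exact potential function, best-response dynamics terminates at a pure Nash
# equilibrium.
FACILITIES = ["a", "b", "c"]
# assumed per-facility cost as a function of its load k
COST = {"a": lambda k: 1.0 * k,
        "b": lambda k: 2.0 * k,
        "c": lambda k: 0.5 * k * k}
# each player's action set: subsets of facilities
ACTION_SETS = [[("a",), ("b",)],
               [("a",), ("c",)],
               [("b", "c"), ("a",)]]

def loads(profile):
    """Number of players using each facility under this action profile."""
    ld = {f: 0 for f in FACILITIES}
    for action in profile:
        for f in action:
            ld[f] += 1
    return ld

def player_cost(i, profile):
    ld = loads(profile)
    return sum(COST[f](ld[f]) for f in profile[i])

def best_response_dynamics(profile):
    profile = list(profile)
    for _ in range(100):  # the potential argument guarantees termination
        improved = False
        for i, acts in enumerate(ACTION_SETS):
            best = min(acts, key=lambda a: player_cost(
                i, profile[:i] + [a] + profile[i + 1:]))
            if (player_cost(i, profile[:i] + [best] + profile[i + 1:])
                    < player_cost(i, profile) - 1e-12):
                profile[i] = best
                improved = True
        if not improved:        # no player can strictly improve: pure Nash
            return profile
    return profile

nash = best_response_dynamics([acts[0] for acts in ACTION_SETS])
```

Note that each player's action set can be exponentially large in the number of facilities (all subsets or all paths), which is exactly why the abstract emphasises sample complexity that is polynomial in players and facilities rather than in the action-set size.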
Cartesian Abstraction Can Yield ‘Cognitive Maps’
Abstract: It has long been debated how the so-called cognitive map, the set of place cells, develops in the rat hippocampus. The function of this organ is of high relevance, since the hippocampus is the key component of the medial temporal lobe memory system, responsible for forming episodic memory and declarative memory, the memory for facts and rules that serves cognition in humans. Here, a general mechanism is put forth: we introduce the novel concept of Cartesian factors. We show a non-linear projection of observations onto a discretized representation of one Cartesian factor in the presence of a representation of a complementing one. The computational model is demonstrated for place cells, which we produce from egocentric observations and head-direction signals. Head-direction signals constitute the observed factor, and sparse allothetic signals the complementing Cartesian one. We present numerical results, connect the model to the neural substrate, and elaborate on the differences between this model and others, including Slow Feature Analysis [17].