Online reinforcement learning for condition-based group maintenance using factored Markov decision processes
We investigate a condition-based group maintenance problem for multi-component systems, in which the degradation process of each component is affected only by its neighbouring ones, leading to a special type of stochastic dependence among components. We formulate the maintenance problem as a factored Markov decision process that exploits this dependence property, and develop a factored value iteration algorithm to efficiently approximate the optimal policy. Through both theoretical analysis and numerical experiments, we show that the algorithm significantly reduces the computational burden of solving the optimization problem. Moreover, since model parameters are unknown a priori in most practical scenarios, we further develop an online reinforcement learning algorithm that simultaneously learns the model parameters and determines an optimal maintenance action upon each inspection. A novel feature of this online learning algorithm is that it learns both the transition probabilities and the system structure indicating the stochastic dependence among components. We analyse the error bound and sample complexity of the learning algorithm theoretically, and test its performance through numerical experiments. The results show that the algorithm effectively learns the model parameters and approximates the optimal maintenance policy.
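As a concrete illustration of the factored-MDP idea in this abstract, the sketch below builds the transition model of a small multi-component system from local, neighbour-dependent factors and runs plain value iteration on the (small) joint state space. All components, costs, and probabilities here are hypothetical; this is a minimal sketch of the modelling idea, not the paper's algorithm.

```python
import itertools

# Hypothetical 3-component system on a line: component i's degradation
# probability depends only on its left neighbour's state (the stochastic
# dependence described in the abstract). States are binary per component:
# 0 = healthy, 1 = degraded.
N_COMP = 3
STATES = list(itertools.product([0, 1], repeat=N_COMP))
ACTIONS = [0, 1]                  # 0 = do nothing, 1 = replace all components
BASE_P, NEIGHBOUR_P = 0.1, 0.3    # assumed degradation probabilities
MAINT_COST, DOWN_COST = 2.0, 1.0  # assumed maintenance / downtime costs
GAMMA = 0.9                       # discount factor

def comp_transition(i, s, a, s_next_i):
    """Local factor P(s'_i | neighbourhood of i, action a)."""
    post = [0] * N_COMP if a == 1 else list(s)   # maintenance restores all
    if post[i] == 1:                             # degraded stays degraded
        return 1.0 if s_next_i == 1 else 0.0
    p = BASE_P + (NEIGHBOUR_P if i > 0 and post[i - 1] == 1 else 0.0)
    return p if s_next_i == 1 else 1.0 - p

def joint_transition(s, a, s_next):
    """The joint transition factorises over components."""
    p = 1.0
    for i in range(N_COMP):
        p *= comp_transition(i, s, a, s_next[i])
    return p

def cost(s, a):
    return DOWN_COST * sum(s) + (MAINT_COST if a == 1 else 0.0)

def q_value(s, a, V):
    return cost(s, a) + GAMMA * sum(joint_transition(s, a, sn) * V[sn]
                                    for sn in STATES)

# Value iteration; the factored model is used to *build* the transitions,
# mirroring the structure the abstract exploits.
V = {s: 0.0 for s in STATES}
for _ in range(200):
    V = {s: min(q_value(s, a, V) for a in ACTIONS) for s in STATES}

policy = {s: min(ACTIONS, key=lambda a: q_value(s, a, V)) for s in STATES}
```

Even this toy case shows the point of the factorisation: each local factor `comp_transition` has a constant-size conditioning set, so the model is specified with O(N) local tables rather than a full 2^N x 2^N transition matrix.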
Learning in Congestion Games with Bandit Feedback
In this paper, we investigate Nash-regret minimization in congestion games, a
class of games with benign theoretical structure and broad real-world
applications. We first propose a centralized algorithm based on the optimism in
the face of uncertainty principle for congestion games with (semi-)bandit
feedback, and obtain finite-sample guarantees. Then we propose a decentralized
algorithm via a novel combination of the Frank-Wolfe method and G-optimal
design. By exploiting the structure of the congestion game, we show the sample
complexity of both algorithms depends only polynomially on the number of
players and the number of facilities, but not the size of the action set, which
can be exponentially large in terms of the number of facilities. We further
define a new problem class, Markov congestion games, which allows us to model
the non-stationarity in congestion games. We propose a centralized algorithm
for Markov congestion games, whose sample complexity again has only polynomial
dependence on all relevant problem parameters, but not the size of the action
set.

Comment: 34 pages, Thirty-sixth Conference on Neural Information Processing Systems (NeurIPS 2022)
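For intuition about the "benign theoretical structure" the abstract mentions: congestion games are exact potential games (Rosenthal, 1973), so simple best-response dynamics reaches a pure Nash equilibrium. The toy sketch below (hypothetical facilities, cost functions, and action sets; not the paper's bandit algorithm) demonstrates this with full-information costs.

```python
# Toy congestion game: facilities with load-dependent costs, players choosing
# subsets of facilities (e.g. paths in a network). Because the game admits an
# exact potential function, best-response dynamics terminates at a pure Nash
# equilibrium.
FACILITIES = ["a", "b", "c"]
# assumed per-facility cost as a function of its load k
COST = {"a": lambda k: 1.0 * k,
        "b": lambda k: 2.0 * k,
        "c": lambda k: 0.5 * k * k}
# each player's action set: subsets of facilities
ACTION_SETS = [[("a",), ("b",)],
               [("a",), ("c",)],
               [("b", "c"), ("a",)]]

def loads(profile):
    """Number of players using each facility under this action profile."""
    ld = {f: 0 for f in FACILITIES}
    for action in profile:
        for f in action:
            ld[f] += 1
    return ld

def player_cost(i, profile):
    ld = loads(profile)
    return sum(COST[f](ld[f]) for f in profile[i])

def best_response_dynamics(profile):
    profile = list(profile)
    for _ in range(100):  # the potential argument guarantees termination
        improved = False
        for i, acts in enumerate(ACTION_SETS):
            best = min(acts, key=lambda a: player_cost(
                i, profile[:i] + [a] + profile[i + 1:]))
            if (player_cost(i, profile[:i] + [best] + profile[i + 1:])
                    < player_cost(i, profile) - 1e-12):
                profile[i] = best
                improved = True
        if not improved:        # no player can strictly improve: pure Nash
            return profile
    return profile

nash = best_response_dynamics([acts[0] for acts in ACTION_SETS])
```

Note that each player's action set can be exponentially large in the number of facilities (all subsets or all paths), which is exactly why the abstract emphasises sample complexity that is polynomial in players and facilities rather than in the action-set size.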
Cartesian Abstraction Can Yield ‘Cognitive Maps’
Abstract: It has long been debated how the so-called cognitive map, the set of place cells, develops in the rat hippocampus. The function of this organ is of high relevance, since the hippocampus is the key component of the medial temporal lobe memory system, responsible for forming episodic memory and declarative memory, the memory for facts and rules that serves cognition in humans. Here, a general mechanism is put forth: we introduce the novel concept of Cartesian factors. We show a non-linear projection of observations onto a discretized representation of one Cartesian factor in the presence of a representation of a complementing one. The computational model is demonstrated for place cells, which we produce from egocentric observations and head-direction signals. Head-direction signals constitute the observed factor, and sparse allothetic signals the complementing Cartesian one. We present numerical results, connect the model to the neural substrate, and elaborate on the differences between this model and others, including Slow Feature Analysis [17].