On Reward Structures of Markov Decision Processes
A Markov decision process can be parameterized by a transition kernel and a
reward function. Both play essential roles in the study of reinforcement
learning, as evidenced by their presence in the Bellman equations. In our
inquiry into the various kinds of "costs" associated with reinforcement
learning, inspired by the demands of robotic applications, we find that rewards
are central to understanding the structure of a Markov decision process and
that reward-centric notions can elucidate important concepts in reinforcement
learning.
Specifically, we study the sample complexity of policy evaluation and develop
a novel estimator with an instance-specific error bound
for estimating a single state value. Under
the online regret minimization setting, we refine the transition-based MDP
constant, diameter, into a reward-based constant, maximum expected hitting
cost, and with it, provide a theoretical explanation for how a well-known
technique, potential-based reward shaping, could accelerate learning with
expert knowledge. In an attempt to study safe reinforcement learning, we model
hazardous environments with irrecoverability and propose a quantitative notion
of safe learning via reset efficiency. In this setting, we modify a classic
algorithm to account for resets, achieving promising preliminary numerical
results. Lastly, for MDPs with multiple reward functions, we develop a
computationally efficient planning algorithm that finds Pareto-optimal
stochastic policies.
Comment: This PhD thesis draws heavily from arXiv:1907.02114 and
arXiv:2002.06299; minor edit
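As a rough illustration of the potential-based reward shaping mentioned in the abstract above, the sketch below uses the standard discounted form r + gamma * phi(s') - phi(s). This is an assumption: the thesis may use a different (for example undiscounted) variant. The function names, the potential function, and the trajectory layout are all hypothetical. The point is that the shaping terms telescope along a trajectory, so the shaped return differs from the original return only by boundary terms.

```python
from typing import Callable, List, Sequence


def shaped_reward(reward: float, state: int, next_state: int,
                  potential: Callable[[int], float], gamma: float) -> float:
    """Standard discounted potential-based shaping (assumed form)."""
    return reward + gamma * potential(next_state) - potential(state)


def shape_trajectory(states: Sequence[int], rewards: Sequence[float],
                     potential: Callable[[int], float],
                     gamma: float) -> List[float]:
    """Shape every transition; states has one more entry than rewards."""
    return [
        shaped_reward(r, s, s_next, potential, gamma)
        for r, s, s_next in zip(rewards, states, states[1:])
    ]


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over the trajectory."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))
```

For a trajectory of T transitions, the shaped return equals the original return plus gamma**T * phi(s_T) - phi(s_0), which is why shaping of this form leaves the ranking of policies unchanged in the discounted setting.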
Loop Estimator for Discounted Values in Markov Reward Processes
At the working heart of policy iteration algorithms commonly used and studied
in the discounted setting of reinforcement learning, the policy evaluation step
estimates the value of states with samples from a Markov reward process induced
by following a Markov policy in a Markov decision process. We propose a simple
and efficient estimator, called the loop estimator, that exploits the regenerative
structure of Markov reward processes without explicitly estimating a full
model. Our method has a smaller space complexity when estimating the value
of a single positive recurrent state than TD or model-based
methods. Moreover, the regenerative structure enables
us to show, without relying on the generative model approach, that the
estimator has an instance-dependent convergence rate
over the steps of a single sample
path, governed by the maximal expected hitting time to that state. In
preliminary numerical experiments, the loop estimator outperforms model-free
methods, such as TD(k), and is competitive with the model-based estimator.
Comment: accepted to AAAI 202
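To make the regenerative idea in the abstract above concrete, here is a minimal sketch under stated assumptions, not necessarily the paper's exact estimator. Each return to the target state closes a "loop". If G is the discounted reward collected within a loop and D = gamma**(loop length), then the value v of the target state satisfies v = E[G] + E[D] * v, so v = E[G] / (1 - E[D]). The function name, the (state, reward) path format, and the use of plain sample means are hypothetical choices.

```python
from typing import Hashable, Iterable, Tuple


def loop_estimate(path: Iterable[Tuple[Hashable, float]],
                  target: Hashable, gamma: float) -> float:
    """Estimate the discounted value of `target` from one sample path.

    `path` yields (state, reward) pairs in time order. Only a few running
    totals are kept, so memory does not grow with the path length.
    """
    sum_g = 0.0      # total discounted reward over completed loops
    sum_d = 0.0      # total of gamma**length over completed loops
    loops = 0        # number of completed loops
    g, disc = 0.0, 1.0
    in_loop = False  # becomes True at the first visit to target

    for state, reward in path:
        if state == target:
            if in_loop:
                # A return to target closes the current loop.
                sum_g += g
                sum_d += disc
                loops += 1
            g, disc, in_loop = 0.0, 1.0, True
        if in_loop:
            g += disc * reward
            disc *= gamma

    if loops == 0:
        raise ValueError("path contains no complete loop through target")
    mean_g = sum_g / loops
    mean_d = sum_d / loops
    return mean_g / (1.0 - mean_d)
```

On a deterministic two-state cycle that pays reward 1 in the target state and 0 elsewhere, with gamma = 0.5, the true value solves v = 1 + 0.25 * v, and the sketch recovers it from any path containing at least one full loop.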
Network analysis identifies a putative role for the PPAR and type 1 interferon pathways in glucocorticoid actions in asthmatics
Background: Asthma is a chronic inflammatory airway disease influenced by genetic and environmental factors that affects ~300 million people worldwide, leading to ~250,000 deaths annually. Glucocorticoids (GCs) are well-known therapeutics that are used extensively to suppress airway inflammation in asthmatics. The airway epithelium plays an important role in the initiation and modulation of the inflammatory response. While the role of GCs in disease management is well understood, few studies have examined the holistic effects on the airway epithelium.
Methods: Gene expression data were used to generate a co-transcriptional network, which was interrogated to identify modules of functionally related genes. In parallel, expression data were mapped to the human protein-protein interaction (PPI) network in order to identify modules with differentially expressed genes. A common pathways approach was applied to highlight genes and pathways functionally relevant and significantly altered following GC treatment.
Results: Co-transcriptional network analysis identified pathways involved in inflammatory processes in the epithelium of asthmatics, including the Toll-like receptor (TLR) and PPAR signaling pathways. Analysis of the PPI network identified RXRA, PPARGC1A, STAT1 and IRF9, among other genes, as differentially expressed. Common pathways analysis highlighted TLR and PPAR signaling pathways, providing a link between general inflammatory processes and the actions of GCs. Promoter analysis identified genes regulated by the glucocorticoid receptor (GCR) and PPAR pathways and highlighted the interferon pathway as a target of GCs.
Conclusions: Network analyses identified known genes and pathways associated with inflammatory processes in the airway epithelium of asthmatics. This workflow illustrated a hypothesis-generating experimental design that integrated multiple analysis methods to produce a weight-of-evidence-based approach upon which future focused studies can be designed. In this case, results suggested a mechanism whereby GCs repress TLR-mediated interferon production via upregulation of the PPAR signaling pathway. These results highlight the role of interferons in asthma and their potential as targets of future therapeutic efforts.