On Reward Structures of Markov Decision Processes
A Markov decision process can be parameterized by a transition kernel and a
reward function. Both play essential roles in the study of reinforcement
learning, as evidenced by their presence in the Bellman equations. In our
inquiry into the various kinds of "costs" associated with reinforcement
learning, inspired by the demands of robotic applications, we find that
rewards are central to understanding the structure of a Markov decision
process and that reward-centric notions can elucidate important concepts in
reinforcement learning.
Specifically, we study the sample complexity of policy evaluation and develop
a novel estimator with an instance-specific error bound for estimating a
single state value. Under
the online regret minimization setting, we refine the transition-based MDP
constant, diameter, into a reward-based constant, maximum expected hitting
cost, and with it, provide a theoretical explanation for how a well-known
technique, potential-based reward shaping, could accelerate learning with
expert knowledge. In an attempt to study safe reinforcement learning, we model
hazardous environments with irrecoverability and propose a quantitative notion
of safe learning via reset efficiency. In this setting, we modify a classic
algorithm to account for resets, achieving promising preliminary numerical
results. Lastly, for MDPs with multiple reward functions, we develop a planning
algorithm that efficiently computes Pareto-optimal stochastic policies.
Comment: This PhD thesis draws heavily from arXiv:1907.02114 and
arXiv:2002.06299; minor edits.
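The potential-based reward shaping mentioned above can be illustrated with a short sketch. The potential function `phi` below is a hypothetical example of "expert knowledge" (distance to a goal state on a five-state chain), not the thesis's construction; the sketch verifies the standard telescoping property that makes shaping policy-invariant.

```python
GAMMA = 0.9

def phi(s):
    # Hypothetical potential: higher for states closer to goal state 4.
    return -abs(4 - s)

def shaped_reward(r, s, s_next, gamma=GAMMA):
    # F(s, s') = gamma * phi(s') - phi(s); adding F to the reward
    # leaves the optimal policy unchanged (Ng et al., 1999).
    return r + gamma * phi(s_next) - phi(s)

# Along any trajectory the discounted shaping terms telescope, so the
# shaped return differs from the original return only by terms that
# depend on the start and end states.
trajectory = [0, 1, 2, 3, 4]        # states visited
shaping_sum = sum(
    (GAMMA ** t) * (GAMMA * phi(s_next) - phi(s))
    for t, (s, s_next) in enumerate(zip(trajectory, trajectory[1:]))
)
T = len(trajectory) - 1
# Telescoping identity: sum equals gamma^T * phi(s_T) - phi(s_0)
print(abs(shaping_sum - (GAMMA ** T * phi(trajectory[-1]) - phi(trajectory[0]))) < 1e-12)
```

Because the shaping terms cancel in this way, any policy ordering under the shaped rewards matches the ordering under the original rewards.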
Building and Evaluating Open-Vocabulary Language Models
Language models have always been a fundamental NLP tool and application. This thesis focuses on open-vocabulary language models, i.e., models that can deal with novel and unknown words at runtime. We will propose both new ways to construct such models as well as use such models in cross-linguistic evaluations to answer questions of difficulty and language-specificity in modern NLP tools.
We start by surveying linguistic background as well as past and present NLP approaches to tokenization and open-vocabulary language modeling (Mielke et al., 2021).
Thus equipped, we establish desirable principles for such models, both from an engineering mindset as well as a linguistic one and hypothesize a model based on the marriage of neural language modeling and Bayesian nonparametrics to handle a truly infinite vocabulary, boasting attractive theoretical properties and mathematical soundness, but presenting practical implementation difficulties.
As a compromise, we thus introduce a word-based two-level language model that still has many desirable characteristics while being highly feasible to run (Mielke and Eisner, 2019). Unlike the more dominant one-layer tokenization approaches based on characters or subword units, it uses words; its key feature is the ability to generate novel words in context and in isolation.
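The two-level generation idea can be sketched as follows. The distributions here are toy stand-ins, not the neural parameterization of Mielke and Eisner (2019): a word-level step either emits a known word or falls through to a character-level speller that produces a novel word letter by letter.

```python
import random

random.seed(0)

KNOWN_WORDS = ["the", "cat", "sat"]
CHARS = "abcdefghijklmnopqrstuvwxyz"

def sample_word():
    # Word level: with probability 0.8 emit a known vocabulary word.
    if random.random() < 0.8:
        return random.choice(KNOWN_WORDS)
    # Character level: spell a novel word until a random stop event.
    letters = []
    while True:
        letters.append(random.choice(CHARS))
        if random.random() < 0.3:
            break
    return "".join(letters)

sentence = " ".join(sample_word() for _ in range(8))
print(sentence)
```

The point of the two levels is that the model's effective vocabulary is unbounded: any string over the character inventory can, in principle, be generated.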
Moving on to evaluation, we ask: how do such models deal with the wide variety of languages of the world---are they struggling with some languages? Relating this question to a more linguistic one, are some languages inherently more difficult to deal with?
Using simple methods, we show that indeed some are, starting with a small pilot study that suggests typological predictors of difficulty (Cotterell et al., 2018). Thus encouraged, we design a far bigger study with more powerful methodology: a principled and highly feasible evaluation and comparison scheme based again on multi-text likelihood (Mielke et al., 2019). This larger study shows that the earlier conclusion about typological predictors is difficult to substantiate, but it also offers new insight into the complexity of Translationese.
Following that theme, we end by extending this scheme to machine translation models to answer questions that traditional evaluation metrics like BLEU cannot (Bugliarello et al., 2020).
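The likelihood-based comparison scheme can be illustrated in miniature: score held-out parallel text per language by the bits a model needs to encode it. The Laplace-smoothed character unigram model and the toy corpora below are illustrative stand-ins for the neural language models and multilingual corpora used in the actual studies.

```python
import math

def unigram_bits(train, test):
    # Bits to encode `test` under a Laplace-smoothed character
    # unigram model estimated from `train`.
    counts = {}
    for ch in train:
        counts[ch] = counts.get(ch, 0) + 1
    vocab = set(train) | set(test)
    total = len(train) + len(vocab)
    return sum(-math.log2((counts.get(ch, 0) + 1) / total) for ch in test)

# Hypothetical "parallel" data: (training text, held-out text) per language.
parallel = {
    "lang_a": ("aaabab" * 50, "aabba"),
    "lang_b": ("abcdef" * 50, "fedcb"),
}
bits = {lang: unigram_bits(tr, te) for lang, (tr, te) in parallel.items()}

# Under this toy model, lang_b's larger character inventory makes it
# "harder": more bits per held-out character.
print(bits["lang_b"] > bits["lang_a"])
```

Because the texts are parallel (same content), differences in encoding cost can be attributed to the language rather than to what is being said, which is the core idea behind the multi-text likelihood comparisons.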