Exponential Family Predictive Representations of State.

Abstract

Many agent-environment interactions can be framed as dynamical systems in which agents take actions and receive observations. These dynamical systems are diverse, representing such things as a biped walking, a stock price changing over time, the trajectory of a missile, or the shifting fish population in a lake. Often, interacting successfully with the environment requires the use of a model, which allows the agent to predict something about the future by summarizing the past. Two of the basic problems in modeling partially observable dynamical systems are selecting a representation of state and selecting a mechanism for maintaining that state. This thesis explores both problems from a learning perspective: we are interested in learning a predictive model directly from the data that arises as an agent interacts with its environment. This thesis develops models for dynamical systems which represent state as a set of statistics about the short-term future, as opposed to treating state as a latent, unobservable quantity. In other words, the agent summarizes the past into predictions about the short-term future, which allow the agent to make further predictions about the infinite future. Because all parameters in the model are defined using only observable quantities, the learning algorithms for such models are often straightforward and have attractive theoretical properties. We examine in depth the case where state is represented as the parameters of an exponential family distribution over a short-term window of future observations. We unify a number of different existing models under this umbrella, and predict and analyze new models derived from the generalization. One goal of this research is to push models with predictively defined state towards real-world applications. We contribute models and companion learning algorithms for domains with partial observability, continuous observations, structured observations, high-dimensional observations, and/or continuous actions. Our models successfully capture standard POMDPs and benchmark nonlinear timeseries problems with performance comparable to state-of-the-art models. They also allow us to perform well on novel domains which are larger than those captured by other models with predictively defined state, including traffic prediction problems and domains analogous to autonomous mobile robots with camera sensors.Ph.D.Computer Science & EngineeringUniversity of Michigan, Horace H. Rackham School of Graduate Studieshttp://deepblue.lib.umich.edu/bitstream/2027.42/58522/1/wingated_1.pd

    Similar works

    Full text

    thumbnail-image