Robust Markov decision processes (MDPs) are used for applications of dynamic
optimization in uncertain environments and have been studied extensively. Many
of the main properties and algorithms of MDPs, such as value iteration and
policy iteration, extend directly to RMDPs. Surprisingly, there is no known
analog of the MDP convex optimization formulation for solving RMDPs. This work
describes the first convex optimization formulation of RMDPs under the
classical sa-rectangularity and s-rectangularity assumptions. By using entropic
regularization and exponential change of variables, we derive a convex
formulation with a number of variables and constraints polynomial in the number
of states and actions, but with large coefficients in the constraints. We
further simplify the formulation for RMDPs with polyhedral, ellipsoidal, or
entropy-based uncertainty sets, showing that, in these cases, RMDPs can be
reformulated as conic programs based on exponential cones, quadratic cones, and
non-negative orthants. Our work opens a new research direction for RMDPs and
can serve as a first step toward obtaining a tractable convex formulation of
RMDPs