This paper studies stochastic control problems regularized by the relative
entropy, where the action space is the space of measures. This setting includes
relaxed control problems and problems of finding Markovian controls in which the
control function is replaced by an idealized infinitely wide neural network; it
can also be extended to the search for causal optimal transport maps. By exploiting
the Pontryagin optimality principle, we identify a suitable metric space on which
we construct a gradient flow for the measure-valued control process, along which
the cost functional is guaranteed to decrease. It is shown that, under
appropriate conditions, this gradient flow has an invariant measure which is
the optimal control for the regularized stochastic control problem. If the
problem is sufficiently convex, the gradient flow converges
exponentially fast. Furthermore, the optimal measure-valued control admits a
Bayesian interpretation, which means that one can incorporate prior knowledge
when solving a stochastic control problem. This work is motivated by a desire to
extend the theoretical underpinning for the convergence of stochastic
gradient-type algorithms widely used in the reinforcement learning community
to solve control problems.
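
As a schematic illustration only (generic placeholders, not the paper's exact notation): such entropy-regularized problems minimize, over measure-valued controls $\nu = (\nu_t)_{t\in[0,T]}$, a cost of the form
$$ J^{\tau}(\nu) \,=\, \mathbb{E}\Big[\int_0^T f(X_t,\nu_t)\,\mathrm{d}t + g(X_T)\Big] \,+\, \tau \int_0^T \mathrm{KL}(\nu_t \,\|\, \gamma)\,\mathrm{d}t, \qquad \mathrm{d}X_t = b(X_t,\nu_t)\,\mathrm{d}t + \sigma\,\mathrm{d}W_t, $$
where $\mathrm{KL}(\cdot\,\|\,\gamma)$ is the relative entropy with respect to a reference measure $\gamma$ and $\tau>0$ weights the regularizer; here $b$, $f$, $g$, $\sigma$, $\gamma$, and $\tau$ are assumed generic ingredients of such a formulation, and $\gamma$ playing the role of a prior is what underlies the Bayesian interpretation mentioned above.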