We consider the problem of learning control policies for discrete-time
stochastic systems that guarantee that the system stabilizes within some
specified stabilizing region with probability~1. Our approach is based on
the novel notion of stabilizing ranking supermartingales (sRSMs) that we
introduce in this work. Our sRSMs overcome a limitation of methods proposed
in previous works, whose applicability is restricted to systems in which the
stabilizing region cannot be left once entered under any control policy. We
present a learning procedure that learns a control policy together with an sRSM
that formally certifies probability~1 stability, both learned as neural
networks. We show that this procedure can also be adapted to formally verify
that, under a given Lipschitz continuous control policy, the stochastic system
stabilizes within some stabilizing region with probability~1. Our
experimental evaluation shows that our learning procedure can successfully
learn provably stabilizing policies in practice.

Comment: Accepted at ATVA 2023. Follow-up work of arXiv:2112.0949