Advances in the experimental demonstration of quantum processors have
provoked a surge of interest in the practical implementation of quantum
computing in recent years. Quantum algorithms are expected to significantly
speed up the solution of certain problems in numerical optimization and
machine learning. In this paper, we propose a quantum-enhanced version of
policy iteration (QEPI), an algorithm widely used in reinforcement learning,
and validate it on the mountain car problem. In
practice, we elaborate on a soft version of the value iteration algorithm,
which facilitates policy interpretation, and discuss a stochastic
discretization technique that makes QEPI applicable to reinforcement learning
problems with continuous state spaces. The complexity of the algorithm is
analyzed for both dense and (typical) sparse cases. Numerical results for the
mountain car problem, obtained with a quantum emulator, verify the developed
procedures and benchmark the performance of QEPI.

Comment: 12 pages, 7 figures