We propose a deep learning algorithm for high-dimensional optimal stopping
problems. Our method is inspired by the penalty method for solving free
boundary PDEs. Within our approach, the penalized PDE is approximated using the
Deep BSDE framework proposed by \cite{weinan2017deep}, which leads us to coin
the term "Deep Penalty Method (DPM)" to refer to our algorithm. We show that
the error of the DPM can be bounded by the loss function and
$O(\frac{1}{\lambda}) + O(\lambda h) + O(\sqrt{h})$, where $h$ is the step size in
time and $\lambda$ is the penalty parameter. This finding emphasizes the need
for careful consideration when selecting the penalization parameter and
suggests that the discretization error converges at a rate of order
$\frac{1}{2}$. We validate the efficacy of the DPM through numerical tests
conducted on a high-dimensional optimal stopping model in the area of American
option pricing. The numerical tests confirm both the accuracy and the
computational efficiency of our proposed algorithm.
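
As a heuristic reading of the error bound, assume it takes the form $C_1/\lambda + C_2 \lambda h + C_3 \sqrt{h}$, where $C_1, C_2, C_3 > 0$ are hypothetical constants independent of $\lambda$ and $h$, introduced here only for illustration. The two $\lambda$-dependent terms can then be balanced as follows:
\[
\frac{d}{d\lambda}\left( \frac{C_1}{\lambda} + C_2 \lambda h \right) = 0
\iff
\lambda^{*} = \sqrt{\frac{C_1}{C_2 h}},
\qquad
\frac{C_1}{\lambda^{*}} + C_2 \lambda^{*} h = 2\sqrt{C_1 C_2 h}.
\]
Under these assumptions, a penalty parameter of order $h^{-1/2}$ would keep the penalization error at the same order $\sqrt{h}$ as the time discretization error, whereas a much larger or much smaller $\lambda$ would let one of the first two terms dominate.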