This paper investigates real options models in which the assumption of positive persistence of uncertainty is violated. Without this fundamental assumption, existing methodologies are inadequate for solving the firm's investment problem. To address this, we introduce a discrete-time version of a real options model and employ reinforcement learning, specifically Q-learning, to compute the optimal solution. Our findings reveal that when positive persistence of uncertainty is violated, the firm's investment policy can exhibit disconnected investment regions.