Sequential Decision Making under Uncertainty (SDMU) is ubiquitous in domains
such as energy, finance, and supply chains. Some SDMU applications are
naturally modeled as Multistage Stochastic Optimization Problems (MSPs), but
the resulting optimizations are notoriously challenging from a computational
standpoint. Under assumptions of convexity and stage-wise independence of the
uncertainty, these problems can be solved efficiently using
Stochastic Dual Dynamic Programming (SDDP). Two-Stage Linear Decision Rules
(TS-LDRs) have been proposed to solve MSPs without the stage-wise independence
assumption. TS-LDRs are computationally tractable, but a policy that is a
linear function of past observations is typically not suitable for the
non-convex environments arising, for example, in energy systems.
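(For intuition, with notation assumed here rather than taken from the
abstract: a linear decision rule restricts the stage-$t$ recourse decision to
an affine function of the observed history, $y_t(\xi_{[t]}) = a_t + B_t
\xi_{[t]}$, where $\xi_{[t]} = (\xi_1, \ldots, \xi_t)$ collects the past
observations and $a_t$, $B_t$ are the coefficients of the rule.)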
This paper introduces a novel approach, Two-Stage General Decision Rules
(TS-GDR), which generalizes the policy space beyond linear functions, making
it suitable for non-convex environments. TS-GDR is a self-supervised learning
algorithm that trains the nonlinear decision rules using stochastic gradient
descent (SGD): its forward passes solve the policy implementation optimization
problems, and its backward passes leverage duality theory to obtain
closed-form gradients.
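As a rough sketch of this forward/backward structure (not the paper's
implementation: the toy second-stage linear program and all names and data q,
W, T, h are assumptions), the forward pass below solves the inner problem with
cvxpy, and the backward pass returns the gradient recovered in closed form
from the LP duals:

    import cvxpy as cp
    import torch

    class SecondStageLP(torch.autograd.Function):
        """Toy example: value of  min_y q'y  s.t.  W y >= h - T x,  y >= 0.
        By LP duality, a (sub)gradient of the value in x is -T' lambda,
        where lambda are the optimal duals of the coupling constraints."""

        @staticmethod
        def forward(ctx, x, q, W, T, h):
            y = cp.Variable(W.shape[1])
            coupling = W @ y >= h - T @ x.detach().numpy()
            prob = cp.Problem(cp.Minimize(q @ y), [coupling, y >= 0])
            prob.solve()  # forward pass: solve the implementation problem
            lam = coupling.dual_value
            ctx.save_for_backward(torch.as_tensor(-T.T @ lam, dtype=x.dtype))
            return x.new_tensor(prob.value)

        @staticmethod
        def backward(ctx, grad_output):
            (grad_x,) = ctx.saved_tensors  # closed-form gradient via duality
            return grad_output * grad_x, None, None, None, None

A decision-rule network producing x can then be trained end-to-end with SGD,
e.g. SecondStageLP.apply(net(history), q, W, T, h).backward().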
The effectiveness of TS-GDR is demonstrated through an instantiation based on
Deep Recurrent Neural Networks, named Two-Stage Deep Decision Rules (TS-DDR).
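As one hypothetical way such an instantiation could look (dimensions and
architecture choices are assumptions, not the paper's exact configuration), a
recurrent decision rule can encode the observed history and emit the
first-stage targets:

    import torch
    import torch.nn as nn

    class RecurrentDecisionRule(nn.Module):
        """Sketch of a TS-DDR-style policy: an LSTM encodes the history of
        uncertainty realizations; a linear head proposes the decisions that
        the downstream optimization problem then implements."""

        def __init__(self, obs_dim: int, hidden_dim: int, decision_dim: int):
            super().__init__()
            self.encoder = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
            self.head = nn.Linear(hidden_dim, decision_dim)

        def forward(self, history: torch.Tensor) -> torch.Tensor:
            # history: (batch, time, obs_dim) of observed uncertainty
            _, (h_n, _) = self.encoder(history)
            return self.head(h_n[-1])  # (batch, decision_dim)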
The method brings the flexibility and computational performance of deep
learning to SDMU problems that are typically tackled through large-scale
optimization techniques. Applied to the Long-Term Hydrothermal Dispatch (LTHD)
problem using actual power system data from Bolivia, TS-DDR not only
enhances solution quality but also reduces computation times by
several orders of magnitude.