In real-world healthcare problems, there are often multiple competing
outcomes of interest, such as treatment efficacy and side effect severity.
However, statistical methods for estimating dynamic treatment regimes (DTRs)
usually assume a single outcome of interest, and the few methods that deal with
composite outcomes suffer from important limitations. These include
restrictions to a single time point and two outcomes, the inability to
incorporate self-reported patient preferences, and limited theoretical
guarantees. To address these limitations, we propose a new method,
which we dub Latent Utility Q-Learning (LUQ-Learning). LUQ-Learning uses a
latent-model approach to naturally extend Q-learning to the composite outcome
setting and to adapt the ideal trade-off between outcomes to each patient. Unlike
previous approaches, our framework allows for an arbitrary number of time
points and outcomes, incorporates stated preferences, and achieves strong
asymptotic performance under realistic assumptions on the data. We conduct
simulation experiments based on an ongoing trial for low back pain as well as a
well-known completed trial for schizophrenia. In all experiments, our method
achieves highly competitive empirical performance compared with several
baseline methods.

Comment: Under Review
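As a rough illustration of the Q-learning backbone that LUQ-Learning extends, the sketch below runs backward induction over two decision points on a composite outcome. It is not LUQ-Learning: it assumes each patient's trade-off is a known, stated weight `pref` (a hypothetical stand-in for the latent utility model the method actually uses), combines two outcomes linearly, and replaces regression models with tabular cell means. All function names and data fields are hypothetical.

```python
from collections import defaultdict
from statistics import mean

ACTIONS = (0, 1)


def composite_utility(efficacy, side_effect, pref):
    """Hypothetical utility: trade off efficacy against side effects using a
    stated preference weight pref in [0, 1] (1 = care only about efficacy)."""
    return pref * efficacy - (1.0 - pref) * side_effect


def fit_q(rows):
    """Tabular stand-in for regression: mean target per observed (history, action)."""
    groups = defaultdict(list)
    for history, action, target in rows:
        groups[(history, action)].append(target)
    return {key: mean(vals) for key, vals in groups.items()}


def greedy(q, history):
    """Action with the largest estimated Q-value (unseen pairs rank last)."""
    return max(ACTIONS, key=lambda a: q.get((history, a), float("-inf")))


def two_stage_q_learning(patients):
    """Backward induction over two decision points.

    Each patient is a dict with keys pref, x1, a1, x2, a2, efficacy, side_effect.
    The preference weight is kept in the history so the learned policy can
    differ across preference levels."""
    def h1(p):
        return (p["pref"], p["x1"])

    def h2(p):
        return (p["pref"], p["x1"], p["a1"], p["x2"])

    # Stage 2: fit Q2 on the observed composite utility.
    q2 = fit_q([
        (h2(p), p["a2"], composite_utility(p["efficacy"], p["side_effect"], p["pref"]))
        for p in patients
    ])
    # Stage 1: pseudo-outcome is the estimated value of the best stage-2 action.
    q1 = fit_q([
        (h1(p), p["a1"], max(q2.get((h2(p), a), float("-inf")) for a in ACTIONS))
        for p in patients
    ])
    return q1, q2


# Toy data (hypothetical): stage-2 treatment 1 is more effective but causes side effects.
patients = [
    {"pref": pref, "x1": 0, "a1": a1, "x2": 0, "a2": a2,
     "efficacy": 1.0 if a2 == 1 else 0.2,
     "side_effect": 1.0 if a2 == 1 else 0.0}
    for pref in (0.2, 0.8) for a1 in ACTIONS for a2 in ACTIONS
]
q1, q2 = two_stage_q_learning(patients)
print(greedy(q2, (0.8, 0, 0, 0)))  # efficacy-focused patient -> 1
print(greedy(q2, (0.2, 0, 0, 0)))  # side-effect-averse patient -> 0
```

Keeping `pref` in each history is what lets the greedy policy differ between patients with identical clinical covariates; a realistic implementation would fit regression models for the Q-functions rather than cell means.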