In regression analysis, we consider the problem of identifying subpopulations
that exhibit different patterns of response, each of which requires a
different underlying model. Unlike statistical cohorts, these subpopulations
are not known a priori; we therefore refer to them as cadres. When the cadres and
their associated models are interpretable, modeling leads to insights about the
subpopulations and their associations with the regression target. We introduce
a discriminative model that simultaneously learns cadre-assignment and
target-prediction rules. Sparsity-inducing priors on the model parameters
enable independent feature selection for the cadre-assignment and
target-prediction processes. We learn models using stochastic gradient descent
with adaptive step sizes, and we assess cadre quality with bootstrapped sample
analysis. Results on simulated data show that,
when the true clustering rule does not depend on the entire set of features,
our method significantly outperforms methods that learn subpopulation-discovery
and target-prediction rules separately. In a materials-by-design case study,
our model provides state-of-the-art prediction of polymer glass transition
temperature. Importantly, the method identifies cadres of polymers that respond
differently to structural perturbations, thus providing design insight for
targeting or avoiding specific transition-temperature ranges. It identifies
chemically meaningful cadres, each with an interpretable model. Further
experimental results show that cadre methods generalize competitively with
linear and nonlinear regression models and can identify robust
subpopulations.

Comment: 8 pages, 6 figures
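The idea of learning cadre assignment and target prediction jointly can be illustrated with a small sketch. This is not the authors' implementation: it assumes a linear softmax gate for soft cadre membership (a mixture-of-experts stand-in, not necessarily the paper's assignment rule), one linear regression per cadre, L1 penalties as the MAP analogue of sparsity-inducing (Laplace) priors on the assignment and prediction weights separately, and Adam as one example of stochastic gradient descent with adaptive step sizes. The function names (`forward`, `objective`, `gradients`, `fit`) and the synthetic two-cadre data are hypothetical.

```python
import numpy as np


def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(params, X):
    """Return soft cadre memberships G (N x M), per-cadre predictions E (N x M), and the mixed prediction."""
    A, a, W, b = params
    G = softmax(X @ A + a)  # assumed cadre-assignment rule: linear softmax gate
    E = X @ W + b           # one linear target-prediction rule per cadre
    return G, E, (G * E).sum(axis=1)


def objective(params, X, y, lam_gate, lam_pred):
    # Squared error plus separate L1 penalties on assignment weights A and
    # prediction weights W, so each process can select its own features.
    A, _, W, _ = params
    _, _, yhat = forward(params, X)
    return (0.5 * np.mean((yhat - y) ** 2)
            + lam_gate * np.abs(A).sum() + lam_pred * np.abs(W).sum())


def gradients(params, X, y, lam_gate, lam_pred):
    A, a, W, b = params
    G, E, yhat = forward(params, X)
    r = (yhat - y)[:, None] / len(y)  # d(loss)/d(yhat), shape (N, 1)
    dE = r * G                        # yhat = sum_m G_m * E_m
    dG = r * E
    dZ = G * (dG - (dG * G).sum(axis=1, keepdims=True))  # backprop through softmax
    return [X.T @ dZ + lam_gate * np.sign(A), dZ.sum(axis=0),
            X.T @ dE + lam_pred * np.sign(W), dE.sum(axis=0)]


def fit(X, y, n_cadres=2, lam_gate=1e-3, lam_pred=1e-3, lr=0.02,
        epochs=200, batch=32, seed=0):
    """Minibatch SGD with Adam-style adaptive step sizes (one possible choice)."""
    rng = np.random.default_rng(seed)
    P = X.shape[1]
    params = [rng.normal(scale=0.1, size=(P, n_cadres)), np.zeros(n_cadres),
              rng.normal(scale=0.1, size=(P, n_cadres)), np.zeros(n_cadres)]
    m = [np.zeros_like(p) for p in params]
    v = [np.zeros_like(p) for p in params]
    beta1, beta2, eps, t = 0.9, 0.999, 1e-8, 0
    for _ in range(epochs):
        order = rng.permutation(len(y))
        for start in range(0, len(y), batch):
            idx = order[start:start + batch]
            grads = gradients(params, X[idx], y[idx], lam_gate, lam_pred)
            t += 1
            for i, g in enumerate(grads):
                m[i] = beta1 * m[i] + (1 - beta1) * g
                v[i] = beta2 * v[i] + (1 - beta2) * g ** 2
                m_hat = m[i] / (1 - beta1 ** t)
                v_hat = v[i] / (1 - beta2 ** t)
                params[i] = params[i] - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params


# Usage on hypothetical synthetic data: the sign of feature 0 decides the cadre,
# which flips the slope on feature 1; feature 2 is irrelevant to both rules.
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 3))
y = np.where(X[:, 0] > 0, 2.0, -2.0) * X[:, 1] + 0.1 * rng.normal(size=400)
params = fit(X, y)
G, _, yhat = forward(params, X)
print("training MSE:", np.mean((yhat - y) ** 2))
print("cadre sizes:", np.bincount(G.argmax(axis=1), minlength=2))
```

Because the gate and the per-cadre predictors share one objective, the cadre assignment is shaped by prediction error rather than by clustering the features alone, which is the contrast drawn above with methods that learn subpopulation-discovery and target-prediction rules separately. In this synthetic example no single linear model can fit `y`, since its average slope on feature 1 is zero.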