We demonstrate the application and comparative interpretations of
three tree-based algorithms for the analysis of data arising from
flow cytometry: classification and regression trees (CARTs), random
forests (RFs), and logic regression (LR). Specifically, we consider
the question of what best predicts CD4 T-cell recovery in
HIV-1-infected persons starting antiretroviral therapy with CD4 counts
between 200 and 350 cells/μL. A comparison to a more standard
contingency table analysis is provided. While contingency table
analysis and RFs provide information on the importance of each
potential predictor variable, CART and LR offer additional insight
into the combinations of variables that together are predictive of
the outcome. In all cases considered, baseline CD3-DR-CD56+CD16+
emerges as an important predictor variable, while the tree-based
approaches identify additional variables as potentially informative.
Application of tree-based methods to our data suggests that a
combination of baseline immune activation states, with emphasis on
CD8 T-cell activation, may be a better predictor than any single
T-cell/innate cell subset analyzed. Taken together, these results show
that tree-based methods can be successfully applied to flow cytometry
data to uncover associations that may not emerge in a univariate
analysis.
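
The contrast between single-variable and combination-based prediction can be illustrated with a minimal sketch. It is not the analysis performed in this study: the data are synthetic, the names marker_a and marker_b are hypothetical binary (high/low) baseline markers, and the Gini impurity decrease is used only because it is a splitting criterion commonly associated with CART. Under these assumptions, a rule combining both markers separates the outcome better than either marker alone.

```python
from itertools import product


def gini(labels):
    """Gini impurity of a list of 0/1 outcome labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def impurity_decrease(rows, outcome, rule):
    """Decrease in Gini impurity when subjects are split by a boolean rule."""
    left = [y for r, y in zip(rows, outcome) if rule(r)]
    right = [y for r, y in zip(rows, outcome) if not rule(r)]
    weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(outcome)
    return gini(outcome) - weighted


# Synthetic data (assumption): 10 subjects for each combination of two
# hypothetical binary markers; the outcome is 1 only when both are high.
rows = [
    {"marker_a": a, "marker_b": b}
    for a, b in product([0, 1], repeat=2)
    for _ in range(10)
]
outcome = [r["marker_a"] & r["marker_b"] for r in rows]

# Split on each marker alone, then on the combination of both.
single_a = impurity_decrease(rows, outcome, lambda r: r["marker_a"] == 1)
single_b = impurity_decrease(rows, outcome, lambda r: r["marker_b"] == 1)
combined = impurity_decrease(
    rows, outcome, lambda r: r["marker_a"] == 1 and r["marker_b"] == 1
)

print(f"marker_a alone:        {single_a:.3f}")
print(f"marker_b alone:        {single_b:.3f}")
print(f"marker_a AND marker_b: {combined:.3f}")
```

In this toy setting each marker on its own removes only part of the impurity, whereas the combined rule removes all of it, which is the kind of joint structure that a univariate analysis cannot express.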