Deep learning has been widely applied in neuroimaging, including predicting
brain-phenotype relationships from magnetic resonance imaging (MRI) volumes.
MRI data usually requires extensive preprocessing prior to modeling, but
variation introduced by different MRI preprocessing pipelines may lead to
different scientific findings, even when using identical data. Motivated by
the data-centric perspective, we first evaluate how preprocessing pipeline
selection can impact the downstream performance of a supervised learning model.
We next propose two pipeline-invariant representation learning methodologies,
MPSL and PXL, to improve robustness in classification performance and to
capture similar neural network representations across pipelines. Using 2000 human subjects from
the UK Biobank dataset, we demonstrate that the proposed models present unique and
shared advantages, in particular that MPSL can be used to improve out-of-sample
generalization to new pipelines, while PXL can be used to improve within-sample
prediction performance. Both MPSL and PXL can learn more similar
between-pipeline representations. These results suggest that our proposed
models can be applied to mitigate pipeline-related biases and to improve
prediction robustness in brain-phenotype modeling.

Comment: Extended Abstract presented at the Machine Learning for Health (ML4H)
symposium 2022, November 28th, 2022, New Orleans, United States & Virtual,
http://www.ml4h.cc, 17 pages