Brain age has been shown to be a phenotype relevant to cognitive
performance and brain disease. Accurate brain age prediction is an
essential prerequisite for using the predicted brain-age difference as a
biomarker. As a comprehensive biological characteristic, brain age is
difficult to estimate accurately with models that rely on feature engineering
and local processing, such as convolution and recurrent operations that
process one local neighborhood at a time. In contrast, Vision Transformers
learn global attentive interactions among patch tokens, introducing less
inductive bias and modeling long-range dependencies.
modeling long-range dependencies. In terms of this, we proposed a novel network
for learning brain age interpreting with global and local dependencies, where
the corresponding representations are captured by Successive Permuted
Transformer (SPT) and convolution blocks. The SPT brings computation efficiency
and locates the 3D spatial information indirectly via continuously encoding 2D
slices from different views. Finally, we collect a large cohort of 22645
subjects with ages ranging from 14 to 97 and our network performed the best
among a series of deep learning methods, yielding a mean absolute error (MAE)
of 2.855 in validation set, and 2.911 in an independent test set
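
The view-permutation idea behind the SPT can be sketched as follows. This is a minimal illustrative example, not the authors' implementation: the function name `permuted_slice_views` and the axis ordering are assumptions; it only shows how permuting a 3D volume exposes axial, coronal, and sagittal views as sequences of 2D slices that a 2D encoder could process in succession.

```python
import numpy as np

def permuted_slice_views(volume):
    """Return the three slice-stack views of a 3D volume (illustrative sketch).

    Moving a different axis to the front turns the same volume into three
    sequences of 2D slices (shape: n_slices x H x W), one per view, which a
    2D transformer encoder could process successively.
    """
    views = []
    for axis in range(3):
        # Move the chosen axis to the front: shape becomes (n_slices, H, W)
        views.append(np.moveaxis(volume, axis, 0))
    return views

# Toy 3D "scan" of shape (4, 5, 6); each view exposes a different slicing axis
vol = np.arange(4 * 5 * 6).reshape(4, 5, 6)
for view in permuted_slice_views(vol):
    print(view.shape)  # (4, 5, 6), then (5, 4, 6), then (6, 4, 5)
```

Encoding each slice stack in turn is what allows a 2D model to aggregate 3D spatial information indirectly, without 3D convolutions.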