The transformer has primarily been used in natural language
processing. Recently, it has been adopted in computer vision (CV), where it
shows promise. Medical image analysis (MIA), as a critical branch of CV,
also greatly benefits from this state-of-the-art technique. In this review, we
first recap the core component of the transformer, the attention mechanism, and
the detailed structure of the transformer. After that, we describe the recent
progress of the transformer in the field of MIA. We organize the applications
by task, including classification, segmentation,
captioning, registration, detection, enhancement, localization, and synthesis.
The mainstream classification and segmentation tasks are further divided into
eleven medical image modalities. Many of the experiments surveyed in this
review indicate that transformer-based methods outperform existing
methods across multiple evaluation metrics. Finally, we
discuss the open challenges and future opportunities in this field. This
task-modality review, with its up-to-date content, detailed information, and
comprehensive comparisons, may greatly benefit the broad MIA community.

Comment: Computers in Biology and Medicine Accepte