This study proposed a deep learning-based tracking method for ultrasound (US)
image-guided radiation therapy. The proposed cascade deep learning model is
composed of an attention network, a mask region-based convolutional neural
network (mask R-CNN), and a long short-term memory (LSTM) network. The
attention network learns a mapping from a US image to the area where the
landmark is likely to move, thereby narrowing the search region. The mask R-CNN then
produces multiple region-of-interest (ROI) proposals in the reduced region and
identifies the landmark via three network heads: bounding box
regression, proposal classification, and landmark segmentation. The LSTM
network models the temporal relationship among successive image frames to inform
bounding box regression and proposal classification. To determine the final
proposal, a selection method is designed based on the similarity between
sequential frames. The proposed method was tested on the liver US tracking
datasets of the Medical Image Computing and Computer Assisted Intervention
(MICCAI) 2015 challenges, in which each landmark was annotated by three
experienced observers and their mean position was used as the ground truth. Five-fold
cross-validation on the 24 US sequences provided with ground truth yielded a
mean tracking error of 0.65 ± 0.56 mm over all landmarks, with the errors of all
landmarks within 2 mm. We further tested the proposed model on 69 landmarks
from the testing dataset whose image patterns are similar to those of the
training data, obtaining a mean tracking error of 0.94 ± 0.83 mm. Our experimental
results demonstrate the feasibility and accuracy of the proposed method for
tracking anatomic landmarks in liver US images, offering a potential solution
for real-time liver tracking and active motion management during radiation
therapy.
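The attention network is described as reducing the search region, but how its output becomes a region is not stated. The following is a minimal sketch, assuming the network emits a per-pixel attention map in [0, 1] and that the search region is the bounding box of pixels above a threshold, expanded by a safety margin. The function name `search_region` and the `thresh` and `margin` values are hypothetical, not taken from the method.

```python
import numpy as np


def search_region(attention, thresh=0.5, margin=16):
    """Return (r0, c0, r1, c1), end-exclusive, bounding the pixels whose
    attention is >= thresh, expanded by `margin` pixels and clipped to the image.

    Hypothetical helper: assumes `attention` is a 2-D map with values in [0, 1].
    """
    att = np.asarray(attention, dtype=float)
    h, w = att.shape
    rows, cols = np.nonzero(att >= thresh)
    if rows.size == 0:
        # No confident pixels: fall back to searching the whole image.
        return 0, 0, h, w
    r0 = max(int(rows.min()) - margin, 0)
    c0 = max(int(cols.min()) - margin, 0)
    r1 = min(int(rows.max()) + margin + 1, h)
    c1 = min(int(cols.max()) + margin + 1, w)
    return r0, c0, r1, c1


# Example: a confident 10 x 10 blob; later stages would only see image[r0:r1, c0:c1].
att = np.zeros((100, 100))
att[40:50, 60:70] = 0.9
print(search_region(att, thresh=0.5, margin=5))  # (35, 55, 55, 75)
```

Cropping to this box before running the detector is what makes the reduced search region cheaper to process; the margin trades robustness to fast motion against crop size.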
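The final proposal is chosen by a selection method based on similarity between sequential frames, but the similarity measure is not named. The sketch below assumes zero-mean normalized cross-correlation (NCC) between a fixed-size patch around each ROI proposal's center and a template patch of the landmark selected in the previous frame. The helpers `ncc`, `patch_at`, `select_proposal` and the patch half-width are hypothetical illustrations, not the authors' selection rule.

```python
import numpy as np


def ncc(a, b):
    """Zero-mean normalized cross-correlation of two equal-size patches."""
    a = np.asarray(a, dtype=float) - np.mean(a)
    b = np.asarray(b, dtype=float) - np.mean(b)
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def patch_at(image, center, half):
    """(2*half+1)-square patch centred at (row, col); edges are replicated."""
    h, w = image.shape
    r = min(max(int(round(center[0])), 0), h - 1)
    c = min(max(int(round(center[1])), 0), w - 1)
    padded = np.pad(image, half, mode="edge")
    # Pixel (r, c) of `image` sits at (r + half, c + half) in `padded`.
    return padded[r:r + 2 * half + 1, c:c + 2 * half + 1]


def select_proposal(frame, boxes, prev_template, half=8):
    """Pick the ROI proposal whose centre patch best matches the previous frame.

    boxes: iterable of (r0, c0, r1, c1); prev_template: patch of the landmark
    selected in the previous frame, of shape (2*half+1, 2*half+1).
    """
    scores = [
        ncc(patch_at(frame, ((r0 + r1) / 2, (c0 + c1) / 2), half), prev_template)
        for r0, c0, r1, c1 in boxes
    ]
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(0)
frame = rng.random((64, 64))
# Stand-in for the previous frame's landmark patch (taken from the same frame here).
template = patch_at(frame, (30, 40), half=8)
boxes = [(10, 10, 20, 20), (25, 35, 35, 45), (40, 5, 50, 15)]
best, scores = select_proposal(frame, boxes, template)
print(best)  # 1: the box centred on (30, 40)
```

In practice one might also combine the similarity with the classification score of each proposal; that weighting is likewise an assumption.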
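The reported results (0.65 ± 0.56 mm and 0.94 ± 0.83 mm) are the mean and standard deviation of a tracking error. A common definition, assumed here, is the Euclidean distance between the predicted and ground-truth landmark positions, converted from pixels to millimeters with the pixel spacing and pooled over all frames. Whether the standard deviation is the population or the sample estimate is also an assumption; this sketch uses NumPy's default `ddof=0`. The function names are hypothetical.

```python
import numpy as np


def tracking_errors_mm(pred_px, gt_px, spacing_mm):
    """Per-frame Euclidean error (mm) between predicted and ground-truth
    landmark positions given in pixels; spacing_mm is the (row, col) pixel size."""
    diff = (np.asarray(pred_px, float) - np.asarray(gt_px, float)) * np.asarray(spacing_mm, float)
    return np.linalg.norm(diff, axis=-1)


def summarize(errors_per_landmark):
    """Pooled mean and SD over all frames, plus each landmark's mean error."""
    pooled = np.concatenate([np.ravel(e) for e in errors_per_landmark])
    per_landmark = [float(np.mean(e)) for e in errors_per_landmark]
    return float(pooled.mean()), float(pooled.std()), per_landmark


# Two frames of one landmark at a hypothetical 0.5 mm pixel spacing.
err = tracking_errors_mm([[10, 10], [20, 20]], [[13, 14], [20, 20]], (0.5, 0.5))
mean, sd, per_lm = summarize([err])
print(f"{mean:.2f} +/- {sd:.2f} mm")  # 1.25 +/- 1.25 mm
```

Here the first frame is off by (3, 4) pixels, i.e. (1.5, 2.0) mm, giving a 2.5 mm error, and the second frame is exact. If the 2 mm bound refers to per-landmark mean errors (not stated explicitly), `per_landmark` is the quantity to compare against it.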
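Five-fold cross-validation on 24 sequences is most naturally split at the sequence level, so that frames from one sequence never appear in both the training and the test folds. How sequences were assigned to folds is not stated; this minimal sketch assumes a seeded random permutation, which yields test folds of 5, 5, 5, 5 and 4 sequences. The function `sequence_folds` and its `seed` parameter are hypothetical.

```python
import numpy as np


def sequence_folds(n_sequences=24, n_folds=5, seed=0):
    """Yield (train_ids, test_ids) with whole sequences assigned to folds,
    so no sequence contributes frames to both training and testing."""
    order = np.random.default_rng(seed).permutation(n_sequences)
    folds = np.array_split(order, n_folds)  # 24 -> sizes 5, 5, 5, 5, 4
    for k, test_ids in enumerate(folds):
        train_ids = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield np.sort(train_ids), np.sort(test_ids)


for k, (train_ids, test_ids) in enumerate(sequence_folds()):
    print(k, len(train_ids), len(test_ids))
```

Each sequence is tested exactly once, so pooling the per-fold errors gives one error estimate per annotated frame.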