To date, endovascular surgeries are performed using the gold standard of
fluoroscopy, which uses ionising radiation to visualise catheters and
vasculature. Prolonged fluoroscopic exposure is harmful to the patient and the
clinician, and may lead to severe post-operative sequelae such as the
development of cancer. Meanwhile, the use of interventional ultrasound has
gained popularity owing to its small spatial footprint, fast data
acquisition, and higher tissue contrast. However, ultrasound
images are hard to interpret, and it is difficult to localise vessels,
catheters, and guidewires within them. This work proposes a solution using an
adaptation of a state-of-the-art machine learning transformer architecture to
detect and segment catheters in axial interventional ultrasound image
sequences. The network architecture is inspired by the Attention in Attention
mechanism and temporal tracking networks, and introduces a novel 3D segmentation
head that performs 3D deconvolution across time. To facilitate
training of such deep learning networks, we introduce a new data synthesis
pipeline that uses physics-based catheter insertion simulations, along with a
convolutional ray-casting ultrasound simulator to produce synthetic ultrasound
images of endovascular interventions. The proposed method was validated on a
hold-out validation dataset, demonstrating robustness to ultrasound noise
and a wide range of scanning angles. It was also tested on data collected from
silicone-based aorta phantoms, demonstrating its potential for sim-to-real
translation. This work represents a significant step towards safer and
more efficient endovascular surgery using interventional ultrasound.

Comment: This work has been submitted to the IEEE for possible publication.