SonoNet: Real-Time Detection and Localisation of Fetal Standard Scan Planes in Freehand Ultrasound
Identifying and interpreting fetal standard scan planes during 2D ultrasound
mid-pregnancy examinations are highly complex tasks which require years of
training. Apart from guiding the probe to the correct location, it can be
equally difficult for a non-expert to identify relevant structures within the
image. Automatic image processing can provide tools to help experienced as well
as inexperienced operators with these tasks. In this paper, we propose a novel
method based on convolutional neural networks which can automatically detect 13
fetal standard views in freehand 2D ultrasound data as well as provide a
localisation of the fetal structures via a bounding box. An important
contribution is that the network learns to localise the target anatomy using
weak supervision based on image-level labels only. The network architecture is
designed to operate in real-time while providing optimal output for the
localisation task. We present results for real-time annotation, retrospective
frame retrieval from saved videos, and localisation on a very large and
challenging dataset consisting of images and video recordings of full clinical
anomaly screenings. We found that the proposed method achieved an average
F1-score of 0.798 in a realistic classification experiment modelling real-time
detection, and obtained a 90.09% accuracy for retrospective frame retrieval.
Moreover, an accuracy of 77.8% was achieved on the localisation task.
Comment: 12 pages, 8 figures, published in IEEE Transactions on Medical Imaging
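The weak supervision described above, localising anatomy from image-level labels only, is commonly realised via class-activation-style saliency maps. The paper's network is not reproduced here; as a minimal sketch, assuming such an activation map is already available, a bounding box can be derived by thresholding it at a fraction of its peak response (the function name, threshold, and toy map are illustrative, not from the paper):

```python
import numpy as np

def cam_to_bbox(cam, frac=0.5):
    """Derive a bounding box (x_min, y_min, x_max, y_max) from a
    class-activation map: keep cells at or above `frac` of the peak
    activation, then take the extent of that region."""
    mask = cam >= frac * cam.max()
    ys, xs = np.where(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy activation map with a single hot region.
cam = np.zeros((8, 8))
cam[2:5, 3:6] = 1.0
print(cam_to_bbox(cam))  # (3, 2, 5, 4)
```

In practice the map would come from the final convolutional features of the classification network, so no box annotations are needed at training time.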
Are all the frames equally important?
In this work, we address the problem of measuring and predicting temporal
video saliency - a metric which defines the importance of a video frame for
human attention. Unlike conventional spatial saliency, which defines the
location of salient regions within a frame (as is done for still images),
temporal saliency considers the importance of a frame as a whole and may not
exist apart from its context. We propose an interactive, cursor-based
interface for collecting experimental data on temporal saliency. We collect
the first human responses and analyse them. Qualitatively, the resulting
scores clearly reflect semantic changes in a frame; quantitatively, they are
highly correlated across all observers. In addition, we show that the
proposed tool can simultaneously collect fixations similar to those produced
by an eye-tracker, at lower cost. This approach may further be used to create
the first temporal saliency datasets, enabling the training of computational
predictive models. The proposed interface does not rely on any special
equipment, which allows it to be run remotely and to reach a wide audience.
Comment: CHI'20 Late Breaking Work
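The abstract's two quantitative notions, a per-frame temporal saliency score and agreement between observers, can be sketched as follows. The response values and observer count below are purely illustrative (the paper's actual data and interface are not reproduced); saliency is taken as the mean importance across observers, and agreement as the mean pairwise Pearson correlation:

```python
import numpy as np

# Hypothetical per-frame importance responses from three observers
# (rows = observers, columns = video frames), e.g. as might be
# collected with a cursor-based interface; values are made up.
responses = np.array([
    [0.1, 0.2, 0.9, 0.8, 0.3],
    [0.2, 0.1, 0.8, 0.9, 0.2],
    [0.1, 0.3, 0.9, 0.7, 0.4],
])

# Temporal saliency of each frame: mean importance across observers.
saliency = responses.mean(axis=0)

# Inter-observer agreement: mean pairwise Pearson correlation,
# taken from the upper triangle of the correlation matrix.
corr = np.corrcoef(responses)
pairwise = corr[np.triu_indices(len(responses), k=1)]

print(saliency.round(2))
print(pairwise.mean().round(2))
```

With consistent observers, as in this toy data, the pairwise correlations are high and the per-frame means peak at the frames all observers marked as important.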