Skill, or style? Classification of fetal sonography eye-tracking data
We present a method for classifying human skill at fetal ultrasound scanning from eye-tracking and pupillary data of sonographers. Human skill characterization for this clinical task typically groups clinicians into categories such as expert and beginner based on years of professional experience; experts typically have more than 10 years and beginners 0-5 years. In some cases, such groupings also include trainees who are not yet fully-qualified professionals. Prior work has relied on eye movements, which necessitates separating the eye-tracking data into events such as fixations and saccades. Our method makes no prior assumption about the relationship between years of experience and skill, and does not require this separation of the eye-tracking data. Our best-performing skill classification model achieves F1 scores of 98% and 70% for the expert and trainee classes, respectively. We also show that years of experience, used as a direct measure of skill, is significantly correlated with a sonographer's expertise.
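The snippet below is not the paper's model; it is a minimal sketch of the idea of classifying skill from raw eye-tracking and pupillary windows without fixation/saccade segmentation, with per-class F1 reported as in the abstract. The window size, summary features, classifier, and toy data are all illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def window_features(samples, win=500):
    """Summarise fixed-length windows of raw (x, y, pupil_diameter) samples
    without segmenting them into fixations or saccades."""
    feats = []
    for start in range(0, len(samples) - win + 1, win):
        w = samples[start:start + win]
        speed = np.linalg.norm(np.diff(w[:, :2], axis=0), axis=1)   # sample-to-sample gaze speed
        feats.append([w[:, 0].std(), w[:, 1].std(),                 # gaze dispersion
                      speed.mean(), speed.std(),                    # speed statistics
                      w[:, 2].mean(), w[:, 2].std()])               # pupillary statistics
    return np.array(feats)

# Synthetic recordings standing in for raw eye-tracking streams of two groups.
rng = np.random.default_rng(0)
recordings = [rng.normal(scale=1.0 + 0.5 * (i % 2), size=(5000, 3)) for i in range(40)]
labels = [i % 2 for i in range(40)]                                 # 0 = trainee, 1 = expert

feats = [window_features(r) for r in recordings]
X, y = np.vstack(feats), np.repeat(labels, [len(f) for f in feats])

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
print("per-class F1:", f1_score(y_te, clf.predict(X_te), average=None))
```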
Towards standard plane prediction of fetal head ultrasound with domain adaption
Fetal Standard Plane (SP) acquisition is a key step in ultrasound-based assessment of fetal health. The task is to detect an ultrasound (US) image containing predefined anatomy. However, acquiring a good SP requires skill in practice, and trainees and occasional users of ultrasound devices can find this challenging. In this work, we consider the task of automatically predicting the fetal head SP from the video approaching the SP. We adopt a domain transfer learning approach that maps the encoded spatial and temporal features of video in the source domain to the spatial representation of the desired SP image in the target domain, together with adversarial training to preserve the quality of the resulting image. Experimental results show that the predicted head plane is plausible and consistent with the anatomical features expected in a real SP. The proposed approach is motivated by the need to support non-experts in finding and analysing a trans-ventricular (TV) plane, but it could also be generalized to other planes, trimesters, and ultrasound imaging tasks for which standard planes are defined.
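As a rough illustration of the kind of pipeline described above (not the published network), the sketch below maps a short video clip to a single standard-plane image and adds an adversarial term to keep the synthesised image realistic. The layer sizes, losses, and toy tensors are assumptions.

```python
import torch
import torch.nn as nn

class VideoToPlane(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(              # spatio-temporal features of the approach clip
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv3d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.decoder = nn.Sequential(              # spatial decoding into the target SP image
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 1, 4, stride=2, padding=1), nn.Tanh())

    def forward(self, clip):                       # clip: (B, 1, T, H, W)
        z = self.encoder(clip)                     # (B, 32, T/4, H/4, W/4)
        z = z.flatten(1, 2)                        # fold the temporal axis into channels
        return self.decoder(z)

disc = nn.Sequential(nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                     nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                     nn.Flatten(), nn.LazyLinear(1))

gen, bce, l1 = VideoToPlane(), nn.BCEWithLogitsLoss(), nn.L1Loss()
clip, target_sp = torch.randn(2, 1, 8, 64, 64), torch.randn(2, 1, 64, 64)   # toy data

fake = gen(clip)
g_loss = l1(fake, target_sp) + bce(disc(fake), torch.ones(2, 1))            # image fidelity + realism
d_loss = bce(disc(target_sp), torch.ones(2, 1)) + bce(disc(fake.detach()), torch.zeros(2, 1))
print(g_loss.item(), d_loss.item())
```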
D2ANET: Densely Attentional-Aware Network for first trimester ultrasound CRL and NT segmentation
Manual annotation of medical images is time-consuming for clinical experts; therefore, reliable automatic segmentation would be the ideal way to handle large medical datasets. In this paper, we are interested in the detection and segmentation of two fundamental measurements in the first-trimester ultrasound (US) scan: Nuchal Translucency (NT) and Crown Rump Length (CRL). There can be significant variation in the shape, location, or size of the anatomical structures in fetal US scans. We propose a new approach, the Densely Attentional-Aware Network for First Trimester Ultrasound CRL and NT Segmentation (D2ANet), to encode variation in feature size by relying on a powerful attention mechanism and densely connected networks. Our results show that the proposed D2ANet offers high pixel agreement (mean JSC = 84.21) with expert manual annotations.
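The pixel-agreement figure above is a Jaccard similarity coefficient (JSC). The short sketch below shows how such a score can be computed between a predicted mask and an expert annotation, assuming the value is reported on a 0-100 scale; the toy masks are illustrative.

```python
import numpy as np

def jaccard(pred, gt):
    """JSC = |pred AND gt| / |pred OR gt| for binary masks, scaled to 0-100."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:                      # both masks empty: treat as perfect agreement
        return 100.0
    return 100.0 * np.logical_and(pred, gt).sum() / union

pred = np.zeros((128, 128), dtype=np.uint8); pred[30:90, 40:100] = 1   # toy predicted NT/CRL mask
gt   = np.zeros((128, 128), dtype=np.uint8); gt[35:95, 45:105] = 1     # toy expert annotation
print(f"JSC = {jaccard(pred, gt):.2f}")
```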
Gaze-probe joint guidance with multi-task learning in obstetric ultrasound scanning
In this work, we exploit multi-task learning to jointly predict the two decision-making processes of gaze movement and probe manipulation that an experienced sonographer would perform in routine obstetric scanning. A multimodal guidance framework, Multimodal-GuideNet, is proposed to detect the causal relationship between a real-world ultrasound video signal, synchronized gaze, and probe motion. The association between the multi-modality inputs is learned and shared through a modality-aware spatial graph that leverages useful cross-modal dependencies. By estimating the probability distribution of probe and gaze movements in real scans, the predicted guidance signals also accommodate inter- and intra-sonographer variation and avoid a fixed scanning path. We validate the new multi-modality approach on three types of obstetric scanning examinations, and the results consistently outperform single-task learning under various guidance policies. To simulate a sonographer's attention on multi-structure images, we also explore multi-step estimation in gaze guidance; its visual results show that the prediction allows multiple gaze centers that are substantially aligned with the underlying anatomical structures.
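A minimal multi-task sketch of the idea, not Multimodal-GuideNet itself: one shared image encoder with two heads that each predict a Gaussian over the next gaze shift and the next probe motion, so the guidance signal admits sonographer-to-sonographer variation. The modality-aware graph is omitted, and the backbone, head dimensions (a 2-D gaze shift and a 4-D probe rotation), and toy data are assumptions.

```python
import torch
import torch.nn as nn

class GazeProbeHeads(nn.Module):
    def __init__(self, dims=(2, 4)):                    # assumed 2-D gaze shift, 4-D probe quaternion
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.gaze_head = nn.Linear(32, 2 * dims[0])     # predicts mean and log-variance
        self.probe_head = nn.Linear(32, 2 * dims[1])

    def forward(self, frame):
        z = self.backbone(frame)
        return self.gaze_head(z).chunk(2, dim=-1), self.probe_head(z).chunk(2, dim=-1)

model, nll = GazeProbeHeads(), nn.GaussianNLLLoss()
frame = torch.randn(8, 1, 64, 64)                       # toy ultrasound frames
gaze_target, probe_target = torch.randn(8, 2), torch.randn(8, 4)

(g_mu, g_logvar), (p_mu, p_logvar) = model(frame)
loss = nll(g_mu, gaze_target, g_logvar.exp()) + nll(p_mu, probe_target, p_logvar.exp())
loss.backward()                                         # both tasks share the backbone gradients
print(float(loss))
```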
Automating the human action of first-trimester biometry measurement from real-world freehand ultrasound
Objective: Automated medical image analysis solutions should closely mimic complete human actions to be useful in clinical practice. However, more often an automated image analysis solution represents only part of a human task, which restricts its practical utility. In the case of ultrasound-based fetal biometry, an automated solution should ideally recognize key fetal structures in freehand video guidance, select a standard plane from a video stream and perform biometry. A complete automated solution should automate all three subactions.
Methods: In this article, we consider how to automate the complete human action of first-trimester biometry measurement from real-world freehand ultrasound. In the proposed hybrid convolutional neural network (CNN) architecture design, a classification regression-based guidance model detects and tracks fetal anatomical structures (using visual cues) in the ultrasound video. Several high-quality standard planes that contain the mid-sagittal view of the fetus are sampled at multiple time stamps (using a custom-designed confident-frame detector) based on the estimated probability values associated with predicted anatomical structures that define the biometry plane. Automated semantic segmentation is performed on the selected frames to extract fetal anatomical landmarks. A crown–rump length (CRL) estimate is calculated as the mean CRL from these multiple frames.
Results: Our fully automated method has a high correlation with clinical expert CRL measurement (Pearson's ρ = 0.92, R² = 0.84) and a low mean absolute error of 0.834 weeks for fetal age estimation on a test data set of 42 videos.
Conclusion: The novel standard plane detection algorithm employs a quality detection mechanism defined by clinical standards, ensuring precise biometric measurements.
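A minimal sketch of the confident-frame selection and CRL averaging described in the Methods above: frames whose predicted structure probabilities all exceed a threshold are treated as confident biometry frames, and the final CRL estimate is their mean. The threshold, landmark format, and pixel spacing are illustrative assumptions.

```python
import numpy as np

def select_confident_frames(frame_probs, threshold=0.9):
    """frame_probs: (n_frames, n_structures) probabilities for the structures
    that define the mid-sagittal biometry plane."""
    return np.where(frame_probs.min(axis=1) >= threshold)[0]

def crl_mm(landmarks, mm_per_pixel):
    """Crown-rump length from (crown_xy, rump_xy) pixel landmarks."""
    crown, rump = np.asarray(landmarks[0]), np.asarray(landmarks[1])
    return float(np.linalg.norm(crown - rump)) * mm_per_pixel

rng = np.random.default_rng(0)
probs = rng.uniform(0.7, 1.0, size=(120, 3))            # toy per-frame structure probabilities
frames = select_confident_frames(probs)
landmarks_per_frame = {i: [(100, 120), (260, 180)] for i in frames}   # toy crown/rump points
crl = np.mean([crl_mm(landmarks_per_frame[i], mm_per_pixel=0.3) for i in frames])
print(f"{len(frames)} confident frames, mean CRL = {crl:.1f} mm")
```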
Self-supervised Representation Learning for Ultrasound Video
Recent advances in deep learning have achieved promising performance for medical image analysis, while in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are expensive to collect and can be scarce for medical imaging applications. Therefore, there is significant interest in learning representations from unlabelled raw data. In this paper, we propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video without any type of human annotation. We assume that in order to learn such a representation, the model should identify anatomical structures from the unlabelled data. Therefore we force the model to address anatomy-aware tasks with free supervision from the data itself. Specifically, the model is designed to correct the order of a reshuffled video clip and at the same time predict the geometric transformation applied to the video clip. Experiments on fetal ultrasound video show that the proposed approach can effectively learn meaningful and strong representations, which transfer well to downstream tasks like standard plane detection and saliency prediction.
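A minimal sketch of the two pretext tasks named above: the model receives a reshuffled, geometrically transformed clip and must predict both the shuffle permutation and the transformation, with labels generated from the data itself. The tiny 3D CNN, 3-frame clips, and 90-degree rotations are assumptions standing in for the paper's setup.

```python
import itertools
import torch
import torch.nn as nn

PERMS = list(itertools.permutations(range(3)))          # 6 possible orders of a 3-frame clip
ROTS = [0, 1, 2, 3]                                     # rotation in multiples of 90 degrees

def make_pretext_sample(clip):
    """clip: (1, 3, H, W) greyscale frames -> transformed clip plus free labels."""
    p = torch.randint(len(PERMS), ()).item()
    r = torch.randint(len(ROTS), ()).item()
    shuffled = clip[:, list(PERMS[p])]                  # reorder the temporal axis
    rotated = torch.rot90(shuffled, ROTS[r], dims=(-2, -1))
    return rotated, p, r

class PretextNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten())
        self.order_head = nn.Linear(16, len(PERMS))     # which permutation was applied
        self.rot_head = nn.Linear(16, len(ROTS))        # which rotation was applied

    def forward(self, x):                               # x: (B, 1, 3, H, W)
        z = self.features(x)
        return self.order_head(z), self.rot_head(z)

model, ce = PretextNet(), nn.CrossEntropyLoss()
clip = torch.randn(1, 3, 64, 64)                        # one toy ultrasound clip
x, p, r = make_pretext_sample(clip)
order_logits, rot_logits = model(x.unsqueeze(0))
loss = ce(order_logits, torch.tensor([p])) + ce(rot_logits, torch.tensor([r]))
loss.backward()
print(float(loss))
```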
Dual Representation Learning From Fetal Ultrasound Video and Sonographer Audio
This paper tackles the challenging problem of real-world data self-supervised representation learning from two modalities: fetal ultrasound (US) video and the corresponding speech acquired when a sonographer performs a pregnancy scan. We propose to transfer knowledge between the different modalities, even though the sonographer's speech and the US video may not be semantically correlated. We design a network architecture capable of learning useful representations, such as those of anatomical features and structures, while recognising the correlation between a US video scan and the sonographer's speech. We introduce dual representation learning from US video and audio, which consists of two concepts, Multi-Modal Contrastive Learning and Multi-Modal Similarity Learning, in a latent feature space. Experiments show that the proposed architecture learns powerful representations and transfers well to two downstream tasks. Furthermore, we experiment with two different pretraining datasets, which differ in size and in the length of the video clips (as well as of the sonographer speech), to show that the quality of the sonographer's speech plays an important role in the final performance.
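A minimal sketch of the contrastive component only: video and speech clips recorded together form positive pairs, and all other pairings in the batch act as negatives (a standard InfoNCE-style objective). The encoders, embedding size, and temperature are assumptions, and the similarity-learning component is omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

video_enc = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 64 * 64, 128))   # toy video encoder
audio_enc = nn.Sequential(nn.Flatten(), nn.Linear(64 * 100, 128))           # toy spectrogram encoder

def info_nce(v, a, temperature=0.07):
    """Symmetric contrastive loss between paired video and audio embeddings."""
    v, a = F.normalize(v, dim=-1), F.normalize(a, dim=-1)
    logits = v @ a.t() / temperature                     # cosine similarities of all pairs
    targets = torch.arange(v.size(0))                    # i-th video matches i-th audio
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

video = torch.randn(8, 3, 16, 64, 64)                    # batch of scan clips
audio = torch.randn(8, 64, 100)                          # co-occurring speech spectrograms
loss = info_nce(video_enc(video), audio_enc(audio))
loss.backward()
print(float(loss))
```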
Show from Tell: Audio-Visual Modelling in Clinical Settings
Auditory and visual signals are usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals, which are usually speech. In this paper, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without human expert annotation. A simple yet effective multi-modal self-supervised learning framework is proposed for this purpose. The proposed approach is able to localise anatomical regions of interest during ultrasound imaging, with only speech audio as a reference. Experimental evaluations on a large-scale clinical multi-modal ultrasound video dataset show that the proposed self-supervised method learns good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully-supervised solutions.
Audio-visual modelling in a clinical setting
Auditory and visual signals are two primary perception modalities that are usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals, which are usually speech audio. In this study, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without relying on dense supervisory annotations from human experts for model training. A simple yet effective multi-modal self-supervised learning framework is presented for this purpose. The proposed approach is able to help find standard anatomical planes, predict the focus of the sonographer's gaze, and localise anatomical regions of interest during ultrasound imaging. Experimental analysis on a large-scale clinical multi-modal ultrasound video dataset shows that the proposed representation learning method provides good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully-supervised solutions. Being able to learn such medical representations in a self-supervised manner will contribute to several aspects of practice, including a better understanding of obstetric imaging, the training of new sonographers, more effective assistive tools for human experts, and enhancement of the clinical workflow.
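A minimal sketch of speech-referenced localisation, assuming shared-dimension embeddings: an embedded speech segment is compared with every spatial position of the frame's feature map, giving a heat map over candidate anatomical regions. The encoders and shapes are illustrative, not the framework described above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

video_backbone = nn.Conv2d(1, 128, 3, padding=1)          # toy per-frame feature extractor
audio_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 50, 128))   # toy speech encoder

frame = torch.randn(1, 1, 64, 64)                          # one ultrasound frame
speech = torch.randn(1, 64, 50)                            # spectrogram of the co-occurring speech

fmap = F.normalize(video_backbone(frame), dim=1)           # (1, 128, 64, 64)
query = F.normalize(audio_encoder(speech), dim=1)          # (1, 128)

heatmap = torch.einsum('bchw,bc->bhw', fmap, query)        # cosine similarity at every position
y, x = divmod(int(heatmap.flatten(1).argmax()), heatmap.shape[-1])
print(f"speech-referenced region of interest centred near pixel ({x}, {y})")
```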
Discovering Salient Anatomical Landmarks by Predicting Human Gaze
Anatomical landmarks are a crucial prerequisite for many medical imaging
tasks. Usually, the set of landmarks for a given task is predefined by experts.
The landmark locations for a given image are then annotated manually or via
machine learning methods trained on manual annotations. In this paper, in
contrast, we present a method to automatically discover and localize anatomical
landmarks in medical images. Specifically, we consider landmarks that attract
the visual attention of humans, which we term visually salient landmarks. We
illustrate the method for fetal neurosonographic images. First, full-length
clinical fetal ultrasound scans are recorded with live sonographer
gaze-tracking. Next, a convolutional neural network (CNN) is trained to predict
the gaze point distribution (saliency map) of the sonographers on scan video
frames. The CNN is then used to predict saliency maps of unseen fetal
neurosonographic images, and the landmarks are extracted as the local maxima of
these saliency maps. Finally, the landmarks are matched across images by
clustering the landmark CNN features. We show that the discovered landmarks can
be used within affine image registration, with average landmark alignment
errors between 4.1% and 10.9% of the fetal head long axis length.Comment: Accepted at IEEE International Symposium on Biomedical Imaging 2020
(ISBI 2020
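A minimal sketch of the landmark-extraction step only: visually salient landmarks are taken as local maxima of the predicted gaze saliency map. The synthetic saliency map, neighbourhood size, and threshold are assumptions, and the cross-image matching and affine registration steps are omitted.

```python
import numpy as np
from scipy.ndimage import maximum_filter, gaussian_filter

def salient_landmarks(saliency, size=15, min_value=0.3):
    """Return (row, col) peaks of a saliency map above a relative threshold."""
    peaks = (saliency == maximum_filter(saliency, size=size))
    peaks &= saliency > min_value * saliency.max()
    return np.argwhere(peaks)

# Toy saliency map with two blurred hot spots standing in for the CNN prediction.
saliency = np.zeros((128, 128))
saliency[40, 40] = 1.0
saliency[90, 100] = 0.8
saliency = gaussian_filter(saliency, sigma=5)

for r, c in salient_landmarks(saliency):
    print(f"landmark at row {r}, col {c}")
```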