
    CNSEG-GAN: a lightweight generative adversarial network for segmentation of CRL and NT from first-trimester fetal ultrasound

    This paper presents a novel, low-compute and efficient generative adversarial network (GAN) design for automatic segmentation, called CNSeg-GAN, which combines 1-D kernel-factorized networks, spatial and channel attention, and multi-scale aggregation mechanisms in a conditional GAN (cGAN) fashion. The proposed CNSeg-GAN architecture is trained and tested on a first-trimester ultrasound (US) scan video dataset for automatic detection and segmentation of anatomical structures in the midsagittal plane to enable Crown Rump Length (CRL) and Nuchal Translucency (NT) measurement. Experimental results show that the proposed CNSeg-GAN is 15× faster than U-Net and yields an mIoU of 78.20% on the CRL dataset and 89.03% on the NT dataset, with only 2.19 million parameters. The accuracy of this lightweight design makes it well suited for real-time deployment in future work.
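    The abstract gives no implementation details, but the 1-D kernel factorisation it mentions is a standard lightweight-network trick: a k × k convolution is replaced by a k × 1 followed by a 1 × k convolution. Below is a minimal PyTorch sketch of such a block; all layer sizes and names are illustrative assumptions, not the published CNSeg-GAN design.

```python
# Minimal sketch of a 1-D kernel-factorised convolution block, as used in
# lightweight segmentation networks. Sizes/names are illustrative only and
# are NOT taken from the CNSeg-GAN paper.
import torch
import torch.nn as nn

class FactorizedConvBlock(nn.Module):
    """Replaces a k x k convolution with a k x 1 followed by a 1 x k convolution."""
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=(k, 1), padding=(k // 2, 0)),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=(1, k), padding=(0, k // 2)),
            nn.BatchNorm2d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

if __name__ == "__main__":
    x = torch.randn(1, 1, 128, 128)     # one grey-scale ultrasound frame
    y = FactorizedConvBlock(1, 16)(x)
    print(y.shape)                      # torch.Size([1, 16, 128, 128])
```

    Replacing a single k × k kernel (k² weights per input/output channel pair) with a k × 1 plus a 1 × k kernel (2k weights) is one common way such designs keep the total parameter count in the low millions.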

    Skill, or style? Classification of fetal sonography eye-tracking data

    We present a method for classifying human skill at fetal ultrasound scanning from eye-tracking and pupillary data of sonographers. Human skill characterization for this clinical task typically groups clinicians into skill levels such as expert and beginner based on years of professional experience; experts typically have more than 10 years and beginners between 0 and 5 years. In some cases, trainees who are not yet fully qualified professionals are also included. Prior work has considered eye movements, which necessitates separating the eye-tracking data into events such as fixations and saccades. Our method makes no prior assumptions about the relationship between years of experience and skill, and does not require such separation of the eye-tracking data. Our best-performing skill classification model achieves F1 scores of 98% and 70% for the expert and trainee classes, respectively. We also show that years of experience, as a direct measure of skill, is significantly correlated with the expertise of a sonographer.
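    As a rough illustration of working directly on raw eye-tracking and pupillary signals, without first segmenting them into fixations and saccades, here is a small scikit-learn sketch on synthetic data. The features, classifier, and labels are assumptions for illustration; they are not the paper's model.

```python
# Illustrative sketch only (not the paper's model): classify scanning skill from
# raw eye-tracking and pupillary time series without first segmenting the signal
# into fixations and saccades. Features, classifier and data are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def summary_features(gaze_xy, pupil):
    """Per-recording statistics computed directly on the raw signals."""
    speed = np.linalg.norm(np.diff(gaze_xy, axis=0), axis=1)
    return np.array([
        speed.mean(), speed.std(),                 # gaze velocity statistics
        gaze_xy[:, 0].std(), gaze_xy[:, 1].std(),  # spatial spread of gaze
        pupil.mean(), pupil.std(),                 # pupillary response statistics
    ])

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=200)              # 0 = trainee, 1 = expert (synthetic)
X = np.stack([
    summary_features(rng.normal(scale=1 + 0.5 * y, size=(1000, 2)),
                     rng.normal(3.0 + 0.2 * y, 0.1, size=1000))
    for y in labels
])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print(f1_score(y_te, clf.predict(X_te), average=None))  # per-class F1 scores
```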

    Towards standard plane prediction of fetal head ultrasound with domain adaption

    Fetal Standard Plane (SP) acquisition is a key step in ultrasound-based assessment of fetal health. The task is to detect an ultrasound (US) image showing a predefined anatomy. However, acquiring a good SP requires skill in practice, and trainees and occasional users of ultrasound devices can find this challenging. In this work, we consider the task of automatically predicting the fetal head SP from the video approaching the SP. We adopt a domain transfer learning approach that maps the encoded spatial and temporal features of video in the source domain to the spatial representation of the desired SP image in the target domain, together with adversarial training to preserve the quality of the resulting image. Experimental results show that the predicted head plane is plausible and consistent with the anatomical features expected in a real SP. The proposed approach is motivated by supporting non-experts to find and analyse a trans-ventricular (TV) plane, but it could also be generalized to other planes, trimesters, and ultrasound imaging tasks for which standard planes are defined.
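    The abstract describes mapping spatio-temporal video features to a single predicted standard-plane image while an adversarial term keeps the image realistic. The PyTorch sketch below is a conceptual toy of that idea only; the encoder/decoder/discriminator sizes, losses, and weighting are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VideoToPlane(nn.Module):
    """Toy spatio-temporal encoder + spatial decoder: video clip -> one SP-like image."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(                  # spatio-temporal features
            nn.Conv3d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d((1, 32, 32)),         # collapse the time axis
        )
        self.decoder = nn.Sequential(                  # spatial representation of the SP
            nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(16, 1, 3, padding=1), nn.Tanh(),
        )

    def forward(self, clip):                           # clip: (B, 1, T, H, W)
        feat = self.encoder(clip).squeeze(2)           # (B, 16, 32, 32)
        return self.decoder(feat)                      # (B, 1, 64, 64)

discriminator = nn.Sequential(                         # small real/fake critic
    nn.Conv2d(1, 16, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(16, 1, 4, stride=2, padding=1),
)

model = VideoToPlane()
clip = torch.randn(2, 1, 8, 64, 64)                    # toy clips approaching the SP
target_sp = torch.randn(2, 1, 64, 64)                  # the acquired SP image (toy)
pred = model(clip)
d_out = discriminator(pred)
loss = F.l1_loss(pred, target_sp) + 0.01 * F.binary_cross_entropy_with_logits(
    d_out, torch.ones_like(d_out))                     # reconstruction + adversarial term
loss.backward()
print(pred.shape)
```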

    D2ANET: Densely Attentional-Aware Network for first trimester ultrasound CRL and NT segmentation

    Manual annotation of medical images is time-consuming for clinical experts; therefore, reliable automatic segmentation would be the ideal way to handle large medical datasets. In this paper, we are interested in the detection and segmentation of the structures underlying two fundamental measurements in the first-trimester ultrasound (US) scan: Nuchal Translucency (NT) and Crown Rump Length (CRL). There can be significant variation in the shape, location, or size of the anatomical structures in fetal US scans. We propose a new approach, the Densely Attentional-Aware Network for first-trimester ultrasound CRL and NT segmentation (D2ANet), which encodes variation in feature size by relying on a powerful attention mechanism and densely connected networks. Our results show that the proposed D2ANet offers high pixel agreement (mean JSC = 84.21) with expert manual annotations.
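    To make the attention mechanism and the reported metric concrete, here is a generic squeeze-and-excitation style channel-attention block and a Jaccard similarity coefficient (JSC) helper. Both are illustrative sketches of the general techniques named in the abstract, not the specific D2ANet modules.

```python
# Generic channel-attention block plus a JSC/IoU helper. These illustrate the
# kind of attention mechanism and agreement metric the abstract refers to; they
# are NOT the D2ANet implementation.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)            # squeeze: global context per channel
        self.fc = nn.Sequential(                       # excitation: per-channel weights
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                   # reweight feature maps

def jaccard(pred_mask, gt_mask, eps=1e-6):
    """Jaccard similarity coefficient (JSC / IoU) between binary masks."""
    inter = (pred_mask & gt_mask).sum()
    union = (pred_mask | gt_mask).sum()
    return float((inter + eps) / (union + eps))

x = torch.randn(1, 32, 64, 64)
print(ChannelAttention(32)(x).shape)                   # torch.Size([1, 32, 64, 64])
print(jaccard(torch.ones(4, 4, dtype=torch.bool), torch.eye(4, dtype=torch.bool)))
```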

    Self-supervised Representation Learning for Ultrasound Video

    Recent advances in deep learning have achieved promising performance for medical image analysis, but in most cases ground-truth annotations from human experts are necessary to train the deep model. In practice, such annotations are expensive to collect and can be scarce for medical imaging applications. Therefore, there is significant interest in learning representations from unlabelled raw data. In this paper, we propose a self-supervised learning approach to learn meaningful and transferable representations from medical imaging video without any type of human annotation. We assume that, in order to learn such a representation, the model should identify anatomical structures in the unlabelled data. We therefore force the model to address anatomy-aware tasks with free supervision derived from the data itself. Specifically, the model is designed to correct the order of a reshuffled video clip and, at the same time, predict the geometric transformation applied to the clip. Experiments on fetal ultrasound video show that the proposed approach can effectively learn meaningful and strong representations, which transfer well to downstream tasks such as standard plane detection and saliency prediction.
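    The two pretext tasks described above (recover the order of a reshuffled clip, and predict the geometric transformation applied to it) can be sketched with free labels generated from the data itself. In the toy PyTorch sketch below, the transformation is assumed to be a 90-degree rotation and the clip length is 4; the backbone size is arbitrary and not the paper's.

```python
# Sketch of the two pretext tasks: (1) which permutation reshuffled the clip,
# (2) which geometric transformation (here: 90-degree rotation, an assumption)
# was applied. Labels come for free from the data itself.
import itertools
import random
import torch
import torch.nn as nn

PERMS = list(itertools.permutations(range(4)))      # all orderings of a 4-frame clip

def make_pretext_sample(clip):
    """clip: (T=4, 1, H, W) -> transformed clip plus its two free labels."""
    p = random.randrange(len(PERMS))
    r = random.randrange(4)                         # number of 90-degree rotations
    shuffled = clip[list(PERMS[p])]
    rotated = torch.rot90(shuffled, r, dims=(2, 3))
    return rotated, p, r

class TwoHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv3d(1, 8, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.order_head = nn.Linear(8, len(PERMS))  # which permutation?
        self.tform_head = nn.Linear(8, 4)           # which rotation?

    def forward(self, x):                           # x: (B, 1, T, H, W)
        f = self.backbone(x)
        return self.order_head(f), self.tform_head(f)

clip = torch.randn(4, 1, 64, 64)                    # toy 4-frame ultrasound clip
x, p, r = make_pretext_sample(clip)
order_logits, tform_logits = TwoHeadNet()(x.permute(1, 0, 2, 3).unsqueeze(0))
loss = (nn.functional.cross_entropy(order_logits, torch.tensor([p])) +
        nn.functional.cross_entropy(tform_logits, torch.tensor([r])))
print(float(loss))
```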

    Gaze-probe joint guidance with multi-task learning in obstetric ultrasound scanning

    In this work, we exploit multi-task learning to jointly predict the two decision-making processes of gaze movement and probe manipulation that an experienced sonographer would perform in routine obstetric scanning. A multimodal guidance framework, Multimodal-GuideNet, is proposed to detect the causal relationship between a real-world ultrasound video signal, synchronized gaze, and probe motion. The association between the multi-modality inputs is learned and shared through a modality-aware spatial graph that leverages useful cross-modal dependencies. By estimating the probability distribution of probe and gaze movements in real scans, the predicted guidance signals also allow for inter- and intra-sonographer variation and avoid a fixed scanning path. We validate the new multi-modality approach on three types of obstetric scanning examinations, and the results consistently outperform single-task learning under various guidance policies. To simulate the sonographer’s attention on multi-structure images, we also explore multi-step estimation in gaze guidance; its visual results show that the prediction allows multiple gaze centers that are substantially aligned with underlying anatomical structures.
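    One way to make "predicting a probability distribution of gaze and probe movements rather than a fixed path" concrete is a shared encoder with two heads, each emitting the parameters of a Gaussian, trained by negative log-likelihood. The sketch below is a generic multi-task illustration of that idea; the graph-based architecture and exact outputs of Multimodal-GuideNet are not reproduced here.

```python
# Illustrative multi-task sketch (assumptions, NOT Multimodal-GuideNet): a shared
# encoder with two heads that each output a Gaussian over the next gaze shift and
# probe motion, so guidance is a distribution rather than a single fixed path.
import torch
import torch.nn as nn
from torch.distributions import Normal

class GuidanceSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.gaze_head = nn.Linear(16, 4)      # mean (2) + log-std (2) for gaze shift
        self.probe_head = nn.Linear(16, 8)     # mean (4) + log-std (4) for probe motion

    def forward(self, frame):
        f = self.encoder(frame)
        g_mu, g_log = self.gaze_head(f).chunk(2, dim=-1)
        p_mu, p_log = self.probe_head(f).chunk(2, dim=-1)
        return Normal(g_mu, g_log.exp()), Normal(p_mu, p_log.exp())

model = GuidanceSketch()
frame = torch.randn(2, 1, 64, 64)              # toy ultrasound frames
gaze_true = torch.randn(2, 2)                  # observed gaze displacement (toy)
probe_true = torch.randn(2, 4)                 # observed probe motion, e.g. rotation (toy)
gaze_dist, probe_dist = model(frame)
loss = -(gaze_dist.log_prob(gaze_true).mean() + probe_dist.log_prob(probe_true).mean())
loss.backward()
print(float(loss))
```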

    Self-Supervised Ultrasound to MRI Fetal Brain Image Synthesis

    Fetal brain magnetic resonance imaging (MRI) offers exquisite images of the developing brain but is not suitable for second-trimester anomaly screening, for which ultrasound (US) is employed. Although expert sonographers are adept at reading US images, MR images, which closely resemble anatomical images, are much easier for non-experts to interpret. In this paper we therefore propose to generate MR-like images directly from clinical US images. Such a capability is also potentially useful in medical image analysis, for instance for automatic US-MRI registration and fusion. The proposed model is end-to-end trainable and self-supervised, without any external annotations. Specifically, based on the assumption that the US and MRI data share a similar anatomical latent space, we first utilise a network to extract the shared latent features, which are then used for MRI synthesis. Since paired data are unavailable for our study (and rare in practice), pixel-level constraints are infeasible to apply. We instead propose to enforce the distributions to be statistically indistinguishable, by adversarial learning in both the image domain and the feature space. To regularise the anatomical structures between US and MRI during synthesis, we further propose an adversarial structural constraint. A new cross-modal attention technique is proposed to utilise non-local spatial information, by encouraging multi-modal knowledge fusion and propagation. We extend the approach to the case where 3D auxiliary information (e.g., 3D neighbours and a 3D location index) from volumetric data is also available, and show that this improves image synthesis. The proposed approach is evaluated quantitatively and qualitatively against real fetal MR images and other synthesis approaches, demonstrating the feasibility of synthesising realistic MR images.
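    A generic form of cross-modal (non-local) attention takes queries from one modality's feature map and keys/values from the other, fusing spatial information across modalities. The sketch below illustrates that general mechanism only; dimensions and the residual fusion choice are assumptions, not the paper's module.

```python
# Minimal sketch of a cross-modal (non-local) attention block: queries come from
# one modality's features, keys/values from the other. Illustrative only.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.q = nn.Conv2d(channels, channels, 1)
        self.k = nn.Conv2d(channels, channels, 1)
        self.v = nn.Conv2d(channels, channels, 1)
        self.scale = channels ** -0.5

    def forward(self, feat_a, feat_b):
        """feat_a provides queries; feat_b provides keys/values. Both (B, C, H, W)."""
        b, c, h, w = feat_a.shape
        q = self.q(feat_a).flatten(2).transpose(1, 2)     # (B, HW, C)
        k = self.k(feat_b).flatten(2)                     # (B, C, HW)
        v = self.v(feat_b).flatten(2).transpose(1, 2)     # (B, HW, C)
        attn = torch.softmax(q @ k * self.scale, dim=-1)  # (B, HW, HW)
        out = (attn @ v).transpose(1, 2).reshape(b, c, h, w)
        return feat_a + out                               # residual fusion

us_feat = torch.randn(1, 32, 16, 16)                      # toy US-branch features
mri_feat = torch.randn(1, 32, 16, 16)                     # toy MRI-branch features
print(CrossModalAttention(32)(us_feat, mri_feat).shape)   # torch.Size([1, 32, 16, 16])
```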

    Automated description and workflow analysis of fetal echocardiography in first-trimester ultrasound video scans

    This paper presents a novel, fully automatic framework for fetal echocardiography analysis of full-length routine first-trimester fetal ultrasound scan video. In this study, a new deep learning architecture, which considers spatio-temporal information and spatial attention, is designed to temporally partition ultrasound video into semantically meaningful segments. The resulting automated semantic annotation is used to analyse the cardiac examination workflow. The proposed 2D+t convolutional neural network architecture achieves an A1 accuracy of 96.37%, an F1 of 95.61%, and a precision of 96.18% with 21.49% fewer parameters than the smallest ResNet-based architecture. Automated deep-learning-based semantic annotation of unlabelled video scans (n = 250) shows a high correlation with expert cardiac annotations (ρ = 0.96, p = 0.0004), demonstrating the applicability of the proposed annotation model for echocardiography workflow analysis.
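    A "2D+t" convolution typically factorises a spatio-temporal kernel into a per-frame 2-D spatial convolution followed by a 1-D convolution along time, which is part of how such architectures stay parameter-efficient. The PyTorch sketch below shows that generic factorisation; channel sizes are arbitrary and it is not the paper's exact architecture.

```python
# Sketch of a "2D + t" (factorised spatio-temporal) convolution block: a spatial
# 2-D convolution applied per frame, then a 1-D convolution along time.
import torch
import torch.nn as nn

class Conv2DPlusT(nn.Module):
    def __init__(self, in_ch, out_ch, k=3):
        super().__init__()
        self.spatial = nn.Conv3d(in_ch, out_ch, kernel_size=(1, k, k),
                                 padding=(0, k // 2, k // 2))   # per-frame 2-D conv
        self.temporal = nn.Conv3d(out_ch, out_ch, kernel_size=(k, 1, 1),
                                  padding=(k // 2, 0, 0))       # 1-D conv along time
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):                      # x: (B, C, T, H, W)
        return self.act(self.temporal(self.act(self.spatial(x))))

clip = torch.randn(1, 1, 16, 64, 64)           # 16-frame grey-scale ultrasound clip
print(Conv2DPlusT(1, 8)(clip).shape)           # torch.Size([1, 8, 16, 64, 64])
```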

    Automating the human action of first-trimester biometry measurement from real-world freehand ultrasound

    Objective: Automated medical image analysis solutions should closely mimic complete human actions to be useful in clinical practice. More often, however, an automated image analysis solution represents only part of a human task, which restricts its practical utility. In the case of ultrasound-based fetal biometry, an automated solution should ideally recognize key fetal structures in freehand video guidance, select a standard plane from a video stream, and perform biometry. A complete automated solution should automate all three sub-actions. Methods: In this article, we consider how to automate the complete human action of first-trimester biometry measurement from real-world freehand ultrasound. In the proposed hybrid convolutional neural network (CNN) architecture, a classification-regression-based guidance model detects and tracks fetal anatomical structures (using visual cues) in the ultrasound video. Several high-quality standard planes containing the mid-sagittal view of the fetus are sampled at multiple time stamps (using a custom-designed confident-frame detector), based on the estimated probability values associated with the predicted anatomical structures that define the biometry plane. Automated semantic segmentation is performed on the selected frames to extract fetal anatomical landmarks, and the crown–rump length (CRL) estimate is calculated as the mean CRL over these frames. Results: Our fully automated method has a high correlation with clinical expert CRL measurement (Pearson's r = 0.92, R-squared [R2] = 0.84) and a low mean absolute error of 0.834 weeks for fetal age estimation on a test data set of 42 videos. Conclusion: A novel algorithm for standard plane detection employs a quality detection mechanism defined by clinical standards, ensuring precise biometric measurements.
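    The frame-selection and averaging step described in the Methods can be expressed schematically: keep frames in which every structure defining the CRL plane is predicted with high confidence, then report the mean CRL over those frames. The Python sketch below does exactly that; the 0.9 threshold, field names, and example values are illustrative assumptions, not the paper's settings.

```python
# Schematic sketch of confident-frame selection and mean-CRL averaging.
# Threshold, field names and data are illustrative assumptions only.
import numpy as np

def mean_crl(frames, threshold=0.9):
    """frames: list of dicts with per-structure probabilities and a CRL estimate."""
    confident = [f["crl_mm"] for f in frames
                 if all(p >= threshold for p in f["structure_probs"].values())]
    if not confident:
        raise ValueError("no confident standard-plane frames found")
    return float(np.mean(confident))

frames = [
    {"structure_probs": {"head": 0.97, "rump": 0.95}, "crl_mm": 61.2},
    {"structure_probs": {"head": 0.62, "rump": 0.91}, "crl_mm": 58.0},  # rejected
    {"structure_probs": {"head": 0.93, "rump": 0.96}, "crl_mm": 60.4},
]
print(mean_crl(frames))   # 60.8
```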

    Audio-visual modelling in a clinical setting

    Auditory and visual signals are two primary perception modalities that are usually present together and correlate with each other, not only in natural environments but also in clinical settings. However, audio-visual modelling in the latter case can be more challenging, due to the different sources of audio/video signals and the noise (both signal-level and semantic-level) in auditory signals, which are usually speech audio. In this study, we consider audio-visual modelling in a clinical setting, providing a solution to learn medical representations that benefit various clinical tasks, without relying on dense supervisory annotations from human experts for model training. A simple yet effective multi-modal self-supervised learning framework is presented for this purpose. The proposed approach is able to help find standard anatomical planes, predict the focus of the sonographer’s gaze, and localise anatomical regions of interest during ultrasound imaging. Experimental analysis on a large-scale clinical multi-modal ultrasound video dataset shows that the proposed representation learning method provides good transferable anatomical representations that boost the performance of automated downstream clinical tasks, even outperforming fully supervised solutions. Being able to learn such medical representations in a self-supervised manner will contribute to several aspects, including a better understanding of obstetric imaging, the training of new sonographers, more effective assistive tools for human experts, and enhancement of the clinical workflow.
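    A common way to learn audio-visual representations without human annotation is a correspondence objective: embed the video and the co-occurring audio and train matching pairs to agree with a symmetric InfoNCE-style loss. The toy sketch below illustrates that general idea; the encoders, embedding size, and temperature are placeholders and not the paper's framework.

```python
# Toy sketch of self-supervised audio-visual correspondence learning with a
# symmetric InfoNCE-style loss. Encoders and sizes are placeholder assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

video_enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 32))
audio_enc = nn.Sequential(nn.Conv1d(1, 16, 9, padding=4), nn.ReLU(),
                          nn.AdaptiveAvgPool1d(1), nn.Flatten(), nn.Linear(16, 32))

frames = torch.randn(8, 1, 64, 64)        # 8 ultrasound frames (toy)
audio = torch.randn(8, 1, 400)            # the 8 co-occurring audio snippets (toy)

v = F.normalize(video_enc(frames), dim=-1)
a = F.normalize(audio_enc(audio), dim=-1)
logits = v @ a.t() / 0.07                 # similarity of every video/audio pairing
targets = torch.arange(8)                 # the i-th frame matches the i-th snippet
loss = (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
loss.backward()
print(float(loss))
```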