3 research outputs found

    Are We There Yet? The Value of Deep Learning in a Multicenter Setting for Response Prediction of Locally Advanced Rectal Cancer to Neoadjuvant Chemoradiotherapy

    No full text
    This retrospective study aims to evaluate the generalizability of a promising state-of-the-art multitask deep learning (DL) model for predicting the response of locally advanced rectal cancer (LARC) to neoadjuvant chemoradiotherapy (nCRT) using a multicenter dataset. To this end, we retrained and validated a Siamese network with two U-Nets joined at multiple layers using pre- and post-therapeutic T2-weighted (T2w), diffusion-weighted (DW) images and apparent diffusion coefficient (ADC) maps of 83 LARC patients acquired under study conditions at four different medical centers. To assess the predictive performance of the model, the trained network was then applied to an external clinical routine dataset of 46 LARC patients imaged without study conditions. The training and test datasets differed significantly in terms of their composition, e.g., T-/N-staging, the time interval between initial staging/nCRT/re-staging and surgery, as well as with respect to acquisition parameters, such as resolution, echo/repetition time, flip angle and field strength. We found that even after dedicated data pre-processing, the predictive performance dropped significantly in this multicenter setting compared to a previously published single- or two-center setting. Testing the network on the external clinical routine dataset yielded an area under the receiver operating characteristic curve of 0.54 (95% confidence interval [CI]: 0.41, 0.65), when using only pre- and post-therapeutic T2w images as input, and 0.60 (95% CI: 0.48, 0.71), when using the combination of pre- and post-therapeutic T2w, DW images, and ADC maps as input. Our study highlights the importance of data quality and harmonization in clinical trials using machine learning. Only in a joint, cross-center effort, involving a multidisciplinary team can we generate large enough curated and annotated datasets and develop the necessary pre-processing pipelines for data harmonization to successfully apply DL models clinically

    Influence of Image Processing on Radiomic Features From Magnetic Resonance Imaging

    Full text link
    OBJECTIVE Before implementing radiomics in routine clinical practice, comprehensive knowledge about the repeatability and reproducibility of radiomic features is required. The aim of this study was to systematically investigate the influence of image processing parameters on radiomic features from magnetic resonance imaging (MRI) in terms of feature values as well as test-retest repeatability. MATERIALS AND METHODS Utilizing a phantom consisting of 4 onions, 4 limes, 4 kiwifruits, and 4 apples, we acquired a test-retest dataset featuring 3 of the most commonly used MRI sequences on a 3 T scanner, namely, a T1-weighted, a T2-weighted, and a fluid-attenuated inversion recovery sequence, each at high and low resolution. After semiautomatic image segmentation, image processing with systematic variation of image processing parameters was performed, including spatial resampling, intensity discretization, and intensity rescaling. For each respective image processing setting, a total of 45 radiomic features were extracted, corresponding to the following 7 matrices/feature classes: conventional indices, histogram matrix, shape matrix, gray-level zone length matrix, gray-level run length matrix, neighboring gray-level dependence matrix, and gray-level cooccurrence matrix. Systematic differences of individual features between different resampling steps were assessed using 1-way analysis of variance with Tukey-type post hoc comparisons to adjust for multiple testing. Test-retest repeatability of radiomic features was measured using the concordance correlation coefficient, dynamic range, and intraclass correlation coefficient. RESULTS Image processing influenced radiological feature values. Regardless of the acquired sequence and feature class, significant differences (P < 0.05) in feature values were found when the size of the resampled voxels was too large, that is, bigger than 3 mm. Almost all higher-order features depended strongly on intensity discretization. The effects of intensity rescaling were negligible except for some features derived from T1-weighted sequences. For all sequences, the percentage of repeatable features (concordance correlation coefficient and dynamic range ≥ 0.9) varied considerably depending on the image processing settings. The optimal image processing setting to achieve the highest percentage of stable features varied per sequence. Irrespective of image processing, the fluid-attenuated inversion recovery sequence in high-resolution overall yielded the highest number of stable features in comparison with the other sequences (89% vs 64%-78% for the respective optimal image processing settings). Across all sequences, the most repeatable features were generally obtained for a spatial resampling close to the originally acquired voxel size and an intensity discretization to at least 32 bins. CONCLUSION Variation of image processing parameters has a significant impact on the values of radiomic features as well as their repeatability. Furthermore, the optimal image processing parameters differ for each MRI sequence. Therefore, it is recommended that these processing parameters be determined in corresponding test-retest scans before clinical application. Extensive repeatability, reproducibility, and validation studies as well as standardization are required before quantitative image analysis and radiomics can be reliably translated into routine clinical care

    End-to-end automated body composition analyses with integrated quality control for opportunistic assessment of sarcopenia in CT

    No full text
    Objectives!#!To develop a pipeline for automated body composition analysis and skeletal muscle assessment with integrated quality control for large-scale application in opportunistic imaging.!##!Methods!#!First, a convolutional neural network for extraction of a single slice at the L3/L4 lumbar level was developed on CT scans of 240 patients applying the nnU-Net framework. Second, a 2D competitive dense fully convolutional U-Net for segmentation of visceral and subcutaneous adipose tissue (VAT, SAT), skeletal muscle (SM), and subsequent determination of fatty muscle fraction (FMF) was developed on single CT slices of 1143 patients. For both steps, automated quality control was integrated by a logistic regression model classifying the presence of L3/L4 and a linear regression model predicting the segmentation quality in terms of Dice score. To evaluate the performance of the entire pipeline end-to-end, body composition metrics, and FMF were compared to manual analyses including 364 patients from two centers.!##!Results!#!Excellent results were observed for slice extraction (z-deviation = 2.46 ± 6.20 mm) and segmentation (Dice score for SM = 0.95 ± 0.04, VAT = 0.98 ± 0.02, SAT = 0.97 ± 0.04) on the dual-center test set excluding cases with artifacts due to metallic implants. No data were excluded for end-to-end performance analyses. With a restrictive setting of the integrated segmentation quality control, 39 of 364 patients were excluded containing 8 cases with metallic implants. This setting ensured a high agreement between manual and fully automated analyses with mean relative area deviations of ΔSM = 3.3 ± 4.1%, ΔVAT = 3.0 ± 4.7%, ΔSAT = 2.7 ± 4.3%, and ΔFMF = 4.3 ± 4.4%.!##!Conclusions!#!This study presents an end-to-end automated deep learning pipeline for large-scale opportunistic assessment of body composition metrics and sarcopenia biomarkers in clinical routine.!##!Key points!#!• Body composition metrics and skeletal muscle quality can be opportunistically determined from routine abdominal CT scans. • A pipeline consisting of two convolutional neural networks allows an end-to-end automated analysis. • Machine-learning-based quality control ensures high agreement between manual and automatic analysis
    corecore