13 research outputs found
ComBat Harmonization of Image Reconstruction Parameters to Improve the Repeatability of Radiomics Features
Image reconstruction parameters lead to variability in radiomic features, which challenges the repeatability and reproducibility of radiomics features, especially in multicenter clinical trials. To deal with this issue, a number of harmonization techniques have been proposed. The aim of the current study was to investigate feature-domain harmonization to cope with repeatability issues in PET radiomic features. Our study was conducted on a thorax phantom filled with 18F-FDG to acquire PET/CT images, followed by extraction of 56 radiomics features on each of the six phantom lesions. Different reconstruction algorithms, post-processing filters, numbers of iterations, numbers of subsets, and matrix sizes were investigated. The ComBat harmonization method was applied to the radiomics features across these parameters, and the results and associated p-values were reported using the Kruskal-Wallis test with a significance level of 0.05. This test indicated that 2, 25, 8, 26, and 29 features for the reconstruction algorithm, post-processing filter, number of iterations, number of subsets, and matrix size parameters, respectively, had significant variability (all p-values <0.05) before harmonization. These were reduced to 0, 2, 0, 0, and 0 features, respectively. The results of our study indicate that the repeatability of PET radiomics features among several image reconstruction parameters might be improved with the help of harmonization methods and could further support multi-institutional studies.
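The feature counts reported above come from running a Kruskal-Wallis test per feature and counting the features whose p-value falls below 0.05. The following is a minimal sketch of that counting step, not the study's code. It assumes exactly three settings of one reconstruction parameter, so that the chi-square approximation with 2 degrees of freedom has the closed form exp(-H/2). The function names and the toy feature values are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_three_groups(a, b, c):
    """Return (H, p) for exactly three groups, using the chi-square approximation."""
    groups = [a, b, c]
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)

    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)

    # Tie correction: divide H by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n ** 3 - n)
    if correction == 0:  # every observation identical: no evidence of a difference
        return 0.0, 1.0
    h /= correction

    # Survival function of a chi-square variable with 2 degrees of freedom.
    return h, math.exp(-h / 2)


def count_significant(features, alpha=0.05):
    """Count features whose values differ significantly across the three settings."""
    return sum(
        1 for groups in features.values()
        if kruskal_wallis_three_groups(*groups)[1] < alpha
    )


# Hypothetical toy data: feature name -> values under three reconstruction settings.
toy_features = {
    "feature_sensitive": ([1, 2, 3], [4, 5, 6], [7, 8, 9]),
    "feature_stable": ([1, 4, 7], [2, 5, 8], [3, 6, 9]),
}
n_significant = count_significant(toy_features)
```

The chi-square approximation is rough for samples this small; with more than three settings the degrees of freedom change and a statistics library would be the more practical choice.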
PyTomography: A Python Library for Quantitative Medical Image Reconstruction
Background: There is a scarcity of open-source libraries in medical imaging
dedicated to both (i) the development and deployment of novel reconstruction
algorithms and (ii) support for clinical data.
Purpose: To create and evaluate a GPU-accelerated, open-source, and
user-friendly image reconstruction library, designed to serve as a central
platform for the development, validation, and deployment of novel tomographic
reconstruction algorithms.
Methods: PyTomography was developed using Python and inherits the
GPU-accelerated functionality of PyTorch for fast computations. The software
uses a modular design that decouples the system matrix from reconstruction
algorithms, simplifying the process of integrating new imaging modalities or
developing novel reconstruction techniques. As example developments, SPECT
reconstruction in PyTomography is validated against both vendor-specific
software and alternative open-source libraries. Bayesian reconstruction
algorithms are implemented and validated.
Results: PyTomography is consistent with both vendor-software and alternative
open source libraries for standard SPECT clinical reconstruction, while
providing significant computational advantages. As example applications,
Bayesian reconstruction algorithms incorporating anatomical information are
shown to outperform the traditional ordered subset expectation maximization (OSEM)
algorithm in quantitative image analysis. PSF modeling in PET imaging is shown
to reduce blurring artifacts.
Conclusions: We have developed and publicly shared PyTomography, a highly
optimized and user-friendly software for quantitative image reconstruction of
medical images, with a class hierarchy that fosters the development of novel
imaging applications. Comment: 26 pages, 7 figures
Left Ventricular Myocardial Dysfunction Evaluation in Thalassemia Patients Using Echocardiographic Radiomic Features and Machine Learning Algorithms.
Heart failure caused by iron deposits in the myocardium is the primary cause of mortality in beta-thalassemia major patients. Cardiac magnetic resonance imaging (CMRI) T2* is the primary screening technique used to detect myocardial iron overload, but inherently bears some limitations. In this study, we aimed to differentiate beta-thalassemia major patients with myocardial iron overload from those without myocardial iron overload (detected by T2*CMRI) based on radiomic features extracted from echocardiography images and machine learning (ML) in patients with normal left ventricular ejection fraction (LVEF > 55%) in echocardiography. Out of 91 cases, 44 patients with thalassemia major with normal LVEF (> 55%) and T2* ≤ 20 ms and 47 people with LVEF > 55% and T2* > 20 ms as the control group were included in the study. Radiomic features were extracted for each end-systolic (ES) and end-diastolic (ED) image. Then, three feature selection (FS) methods and six different classifiers were used. The models were evaluated using various metrics, including the area under the ROC curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE). Maximum relevance-minimum redundancy-eXtreme gradient boosting (MRMR-XGB) (AUC = 0.73, ACC = 0.73, SPE = 0.73, SEN = 0.73), ANOVA-MLP (AUC = 0.69, ACC = 0.69, SPE = 0.56, SEN = 0.83), and recursive feature elimination-K-nearest neighbors (RFE-KNN) (AUC = 0.65, ACC = 0.65, SPE = 0.64, SEN = 0.65) were the best models in ED, ES, and ED&ES datasets. Using radiomic features extracted from echocardiographic images and ML, it is feasible to predict cardiac problems caused by iron overload
Cardiac Pattern Recognition from SPECT Images Using Machine Learning Algorithms
Heart failure is a fatal disease that is becoming more prevalent worldwide. Cardiac resynchronization therapy (CRT) is an approach to treat patients with end-stage heart failure. However, since one third of the patients do not respond to this invasive and expensive therapy, response prediction becomes essential for this treatment. Recent studies suggest that patients with a U-shaped left ventricular contraction pattern respond better to CRT. Therefore, our main attempt is to identify these patterns on gated-SPECT myocardial perfusion images (GSPECT MPI) using radiomics and machine learning algorithms to achieve a robust prediction of treatment response. We enrolled 88 patients, including 19 patients who underwent CRT and 69 who did not. In addition to radiomic features, easily accessible clinical features, such as age, sex, QRS complex duration, ejection fraction (EF), and phase analysis data extracted from the quantified gated SPECT (QGS), were analysed. Feature selection was performed with the maximum relevance minimum redundancy (MRMR) algorithm. After feature selection, three feature signatures (radiomics only, clinical only, and radiomics + clinical) were developed to feed machine learning algorithms. Machine learning techniques included logistic regression (LR), Random Forest (RF), Support Vector Machine (SVM), and XGBoost (XGB). The area under the ROC curve (AUC), accuracy (ACC), sensitivity (SEN), and specificity (SPE) of all models were reported. The best performance was achieved using the XGB model when applied to the clinical + radiomics feature set (AUC = 0.82). This was followed by the XGB and RF models applied to the clinical feature signature (AUC = 0.80 and 0.74, respectively). Our results demonstrated the promising potential of radiomics modelling for CRT response prediction.
Myocardial perfusion SPECT radiomic features reproducibility assessment: Impact of image reconstruction and harmonization
Background: Coronary artery disease (CAD) has one of the highest mortality rates in humans worldwide. Single-photon emission computed tomography (SPECT) myocardial perfusion imaging (MPI) provides clinicians with myocardial metabolic information non-invasively. However, there are some limitations to interpreting SPECT images performed by physicians or automatic quantitative approaches. Radiomics analyzes images objectively by extracting quantitative features and can potentially reveal biological characteristics that the human eye cannot detect. However, the reproducibility and repeatability of some radiomic features can be highly susceptible to segmentation and imaging conditions.
Purpose: We aimed to assess the reproducibility of radiomic features extracted from uncorrected MPI-SPECT images reconstructed with 15 different settings before and after ComBat harmonization, along with evaluating the effectiveness of ComBat in realigning feature distributions.
Materials and methods: A total of 200 patients (50% normal and 50% abnormal) including rest and stress (without attenuation and scatter corrections) MPI-SPECT images were included. Images were reconstructed using 15 combinations of filter cut-off frequencies, filter orders, filter types, reconstruction algorithms, number of iterations and subsets resulting in 6000 images. Image segmentation was performed on the left ventricle in the first reconstruction for each patient and applied to 14 others. A total of 93 radiomic features were extracted from the segmented area, and ComBat was used to harmonize them. The intraclass correlation coefficient (ICC) and overall concordance correlation coefficient (OCCC) tests were performed before and after ComBat to examine the impact of each parameter on feature robustness and to assess harmonization efficiency. The ANOVA and the Kruskal–Wallis tests were performed to evaluate the effectiveness of ComBat in correcting feature distributions. In addition, the Student's t-test, Wilcoxon rank-sum, and signed-rank tests were implemented to assess the significance level of the impacts made by each parameter of different batches and patient groups (normal vs. abnormal) on radiomic features.
Results: Before applying ComBat, the majority of features (ICC: 82, OCCC: 61) achieved high reproducibility (ICC/OCCC ≥ 0.900) under every batch except Reconstruction. The largest and smallest number of poor features (ICC/OCCC < 0.500) were obtained by IterationSubset and Order batches, respectively. The most reliable features were from the first-order (FO) and gray-level co-occurrence matrix (GLCM) families. Following harmonization, the minimum number of robust features increased (ICC: 84, OCCC: 78). Applying ComBat showed that Order and Reconstruction were the least and the most responsive batches, respectively. The most robust families, in a descending order, were found to be FO, neighborhood gray-tone difference matrix (NGTDM), GLCM, gray-level run length matrix (GLRLM), gray-level size zone matrix (GLSZM), and gray-level dependence matrix (GLDM) under Cut-off, Filter, and Order batches. The Wilcoxon rank-sum test showed that the number of robust features significantly differed under most batches in the Normal and Abnormal groups.
Conclusion: The majority of radiomic features show high levels of robustness across different OSEM reconstruction parameters in uncorrected MPI-SPECT. ComBat is effective in realigning feature distributions and enhancing radiomic features reproducibility.
Thyroidiomics: An Automated Pipeline for Segmentation and Classification of Thyroid Pathologies from Scintigraphy Images
The objective of this study was to develop an automated pipeline that enhances thyroid disease classification using thyroid scintigraphy images, aiming to decrease assessment time and increase diagnostic accuracy. Anterior thyroid scintigraphy images from 2,643 patients were collected and categorized into diffuse goiter (DG), multinodular goiter (MNG), and thyroiditis (TH) based on clinical reports, and then segmented by an expert. A ResUNet model was trained to perform auto-segmentation. Radiomic features were extracted from both physician (scenario 1) and ResUNet segmentations (scenario 2), followed by omitting highly correlated features using Spearman's correlation, and feature selection using Recursive Feature Elimination (RFE) with XGBoost as the core. All models were trained under a leave-one-center-out cross-validation (LOCOCV) scheme, where nine instances of algorithms were iteratively trained and validated on data from eight centers and tested on the ninth for both scenarios separately. Segmentation performance was assessed using the Dice similarity coefficient (DSC), while classification performance was assessed using metrics such as precision, recall, F1-score, accuracy, area under the Receiver Operating Characteristic curve (ROC AUC), and area under the precision-recall curve (PRC AUC). ResUNet achieved DSC values of 0.84 ± 0.03, 0.71 ± 0.06, and 0.86 ± 0.02 for MNG, TH, and DG, respectively. Classification in scenario 1 achieved an accuracy of 0.76 ± 0.04 and a ROC AUC of 0.92 ± 0.02, while in scenario 2, classification yielded an accuracy of 0.74 ± 0.05 and a ROC AUC of 0.90 ± 0.02. The automated pipeline demonstrated comparable performance to physician segmentations on several classification metrics across different classes, effectively reducing assessment time while maintaining high diagnostic accuracy. Code available at: https://github.com/ahxmeds/thyroidiomics.git. 7 pages, 4 figures, 2 tables
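Two pieces of the evaluation described above, the Dice similarity coefficient and the leave-one-center-out split, can be sketched in a few lines. This is an illustrative sketch and is not taken from the linked repository. Masks are assumed to be flat 0/1 sequences, and the function names and center labels are hypothetical.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def leave_one_center_out(center_ids):
    """Yield (held_out_center, train_indices, test_indices), one fold per center."""
    for held_out in sorted(set(center_ids)):
        train = [i for i, c in enumerate(center_ids) if c != held_out]
        test = [i for i, c in enumerate(center_ids) if c == held_out]
        yield held_out, train, test


# Hypothetical example: four patients from three centers.
centers = ["A", "A", "B", "C"]
folds = list(leave_one_center_out(centers))

# Hypothetical 4-pixel masks: expert segmentation vs. automatic segmentation.
dsc = dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0])
```

With nine centers the generator yields nine folds, each holding out all patients of one center for testing; any validation split would be carved out of the remaining eight inside each fold.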
Machine learning based readmission and mortality prediction in heart failure patients
This study intends to predict in-hospital and 6-month mortality, as well as 30-day and 90-day hospital readmission, using a machine learning (ML) approach with conventional features. A total of 737 patients remained after applying the exclusion criteria to 1101 heart failure patients. Thirty-four conventional features were collected for each patient. First, the data were divided into train and test cohorts with a 70–30% ratio. Then train data were normalized using the Z-score method, and its mean and standard deviation were applied to the test data. Subsequently, Boruta, RFE, and MRMR feature selection methods were utilized to select more important features in the training set. In the next step, eight ML approaches were used for modeling. Next, hyperparameters were optimized using tenfold cross-validation and grid search in the train dataset. All model development steps (normalization, feature selection, and hyperparameter optimization) were performed on the train set without touching the hold-out test set. Then, bootstrapping was done 1000 times on the hold-out test data. Finally, the obtained results were evaluated using four metrics: area under the ROC curve (AUC), accuracy (ACC), specificity (SPE), and sensitivity (SEN). The RFE-LR (AUC: 0.91, ACC: 0.84, SPE: 0.84, SEN: 0.83) and Boruta-LR (AUC: 0.90, ACC: 0.85, SPE: 0.85, SEN: 0.83) models generated the best results in terms of in-hospital mortality. In terms of 30-day rehospitalization, Boruta-SVM (AUC: 0.73, ACC: 0.81, SPE: 0.85, SEN: 0.50) and MRMR-LR (AUC: 0.71, ACC: 0.68, SPE: 0.69, SEN: 0.63) models performed the best. The best model for 3-month rehospitalization was MRMR-KNN (AUC: 0.60, ACC: 0.63, SPE: 0.66, SEN: 0.53) and regarding 6-month mortality, the MRMR-LR (AUC: 0.61, ACC: 0.63, SPE: 0.44, SEN: 0.66) and MRMR-NB (AUC: 0.59, ACC: 0.61, SPE: 0.48, SEN: 0.63) models outperformed the others. 
Reliable models were developed for 30-day rehospitalization and in-hospital mortality using conventional features and ML techniques. Such models can support personalized treatment, decision-making, and wiser budget allocation. The results obtained for the 3-month rehospitalization and 6-month mortality endpoints were unimpressive, and further experiments with additional information are needed to obtain promising results for these endpoints.
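The leakage-free normalization and the bootstrap evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: it uses the population standard deviation, bootstraps accuracy only, and reports a percentile interval. The abstract does not specify any of these choices, and all function names are hypothetical.

```python
import random
import statistics


def fit_zscore(train_column):
    """Estimate mean and standard deviation from the training data only."""
    mean = statistics.fmean(train_column)
    std = statistics.pstdev(train_column)  # assumption: population std
    if std == 0:
        std = 1.0  # constant feature: avoid division by zero
    return mean, std


def apply_zscore(column, mean, std):
    """Normalize any column (train or test) with the training statistics."""
    return [(x - mean) / std for x in column]


def bootstrap_accuracy(y_true, y_pred, n_rounds=1000, seed=0):
    """Resample the hold-out set with replacement and summarize accuracy.

    Returns (mean accuracy, 2.5th percentile, 97.5th percentile).
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(sum(y_true[i] == y_pred[i] for i in idx) / n)
    scores.sort()
    lower = scores[int(0.025 * n_rounds)]
    upper = scores[int(0.975 * n_rounds) - 1]
    return statistics.fmean(scores), lower, upper


# Hypothetical toy feature: statistics come from train, then get reused on test.
train_values = [1.0, 2.0, 3.0]
test_values = [2.0, 4.0]
mean, std = fit_zscore(train_values)
test_scaled = apply_zscore(test_values, mean, std)
```

Keeping `fit_zscore` separate from `apply_zscore` makes it hard to compute statistics on the test cohort by accident, which is the point of the hold-out design the abstract emphasizes.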
Artificial intelligence-based analysis of whole-body bone scintigraphy: The quest for the optimal deep learning algorithm and comparison with human observer performance
Purpose: Whole-body bone scintigraphy (WBS) is one of the most widely used modalities in diagnosing malignant bone diseases during the early stages. However, the procedure is time-consuming and requires vigour and experience. Moreover, interpretation of WBS scans in the early stages of the disorders might be challenging because the patterns often reflect normal appearance that is prone to subjective interpretation. To simplify the gruelling, subjective, and prone-to-error task of interpreting WBS scans, we developed deep learning (DL) models to automate two major analyses, namely (i) classification of scans into normal and abnormal and (ii) discrimination between malignant and non-neoplastic bone diseases, and compared their performance with human observers. Materials and Methods: After applying our exclusion criteria on 7188 patients from three different centers, 3772 and 2248 patients were enrolled for the first and second analyses, respectively. Data were split into two parts, including training and testing, while a fraction of training data were considered for validation. Ten different CNN models were applied to single- and dual-view input (posterior and anterior views) modes to find the optimal model for each analysis. In addition, three different methods, including squeeze-and-excitation (SE), spatial pyramid pooling (SPP), and attention-augmented (AA), were used to aggregate the features for dual-view input models. Model performance was reported through area under the receiver operating characteristic (ROC) curve (AUC), accuracy, sensitivity, and specificity and was compared with the DeLong test applied to ROC curves. The test dataset was evaluated by three nuclear medicine physicians (NMPs) with different levels of experience to compare the performance of AI and human observers. 
Results: DenseNet121_AA (DenseNet121, with dual-view input aggregated by AA) and InceptionResNetV2_SPP achieved the highest performance (AUC = 0.72) for the first and second analyses, respectively. Moreover, on average, in the first analysis, Inception V3 and InceptionResNetV2 CNN models and dual-view input with AA aggregating method had superior performance. In addition, in the second analysis, DenseNet121 and InceptionResNetV2 as CNN methods and dual-view input with AA aggregating method achieved the best results. Conversely, the performance of AI models was significantly higher than human observers for the first analysis, whereas their performance was comparable in the second analysis, although the AI model assessed the scans in drastically less time. Conclusion: Using the models designed in this study, a positive step can be taken toward improving and optimizing WBS interpretation. By training DL models with larger and more diverse cohorts, AI could potentially be used to assist physicians in the assessment of WBS images.
Impact of Field-of-view Zooming and Segmentation Batches on Radiomics Features Reproducibility and Machine Learning Performance in Thyroid Scintigraphy
Background: Thyroid diseases are the second most common hormonal disorders, necessitating accurate diagnostics. Advances in artificial intelligence and radiomics have enhanced diagnostic precision by analyzing quantitative imaging features. However, reproducibility challenges arising from factors such as the field-of-view (FOV) zooming and segmentation variability limit the clinical application of radiomic-based models.
Aim: This study focuses on evaluating the impact of segmentation and FOV zooming on the reproducibility of radiomic features and improved performance of machine learning (ML) when using reproducible features for classification of thyroid scintigraphy images into normal, diffuse goiter (DG), multinodular goiter (MNG), and thyroiditis.
Patients and methods: A retrospective analysis was conducted on 872 thyroid scintigraphy cases from 3 centers. Radiomic feature reproducibility was assessed using the intraclass correlation coefficient (ICC), with robust features (ICC≥0.80) identified under segmentation and zooming conditions. Four ML training scenarios were implemented to train models on Center A data, including (1) all, (2) zoom-robust, (3) segmentation-robust, and (4) mutually robust features, with 3 feature selection methods and 7 classifiers. Models were validated on external data sets (centers B and C).
Results: FOV zooming significantly reduced feature reproducibility (ICC≥0.80: 49%), while segmentation effects were minimal (ICC≥0.80: 96%). Models trained on mutually robust features outperformed those trained using all features. Boruta-MLP achieved the highest accuracy (0.71, P-value <0.001 vs. all features) in zoomed data sets, and RFE-MLP performed best (0.69, P-value <0.001 vs. all features) in the baseline data set, with Gray-Level Co-occurrence Matrix (GLCM) features frequently selected.
Conclusions: Utilizing robust radiomic features significantly improved the performance of ML models in thyroid disease classification, enabling more accurate and generalizable diagnostic outcomes across diverse data sets.
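The selection of "mutually robust" features described above amounts to computing an ICC per feature under each condition, keeping features with ICC ≥ 0.80, and intersecting the two sets. The sketch below illustrates this under one loud assumption: the abstract does not state which ICC form was used, so the two-way, consistency, single-measurement form ICC(3,1) is used here. Function names and toy values are hypothetical.

```python
def icc_consistency(table):
    """ICC(3,1): two-way mixed effects, consistency, single measurement.

    `table` has one row per case and one column per condition
    (for example original vs. zoomed field of view).
    """
    n = len(table)
    k = len(table[0])
    grand = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Between-case mean square and residual mean square.
    ms_rows = k * sum((m - grand) ** 2 for m in row_means) / (n - 1)
    ss_error = sum(
        (table[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error
    return 0.0 if denominator == 0 else (ms_rows - ms_error) / denominator


def robust_features(measurements, threshold=0.80):
    """Return the names of features whose ICC meets the threshold."""
    return {
        name for name, table in measurements.items()
        if icc_consistency(table) >= threshold
    }


# Hypothetical toy data: feature -> rows of (condition 1, condition 2) values.
zoom = {
    "feat_a": [[1, 2], [2, 3], [3, 4]],  # constant offset: fully consistent
    "feat_b": [[1, 3], [2, 2], [3, 1]],  # ordering reversed: not reproducible
}
segmentation = {
    "feat_a": [[1, 2], [2, 3], [3, 4]],
    "feat_b": [[1, 1], [2, 2], [3, 3]],
}
mutually_robust = robust_features(zoom) & robust_features(segmentation)
```

An absolute-agreement form such as ICC(2,1) would penalize the constant offset in `feat_a`, so the choice of ICC form changes which features count as robust.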
Myocardial Perfusion SPECT Imaging Radiomic Features and Machine Learning Algorithms for Cardiac Contractile Pattern Recognition
A U-shaped contraction pattern was shown to be associated with a better Cardiac resynchronization therapy (CRT) response. The main goal of this study is to automatically recognize left ventricular contractile patterns using machine learning algorithms trained on conventional quantitative features (ConQuaFea) and radiomic features extracted from Gated single-photon emission computed tomography myocardial perfusion imaging (GSPECT MPI). Among 98 patients with standard resting GSPECT MPI included in this study, 29 received CRT therapy and 69 did not (also had CRT inclusion criteria but did not receive treatment yet at the time of data collection, or refused treatment). A total of 69 non-CRT patients were employed for training, and the 29 were employed for testing. The models were built utilizing features from three distinct feature sets (ConQuaFea, radiomics, and ConQuaFea + radiomics (combined)), which were chosen using Recursive feature elimination (RFE) feature selection (FS), and then trained using seven different machine learning (ML) classifiers. In addition, CRT outcome prediction was assessed by different treatment inclusion criteria as the study’s final phase. The MLP classifier had the highest performance among ConQuaFea models (AUC, SEN, SPE = 0.80, 0.85, 0.76). RF achieved the best performance in terms of AUC, SEN, and SPE with values of 0.65, 0.62, and 0.68, respectively, among radiomic models. GB and RF approaches achieved the best AUC, SEN, and SPE values of 0.78, 0.92, and 0.63 and 0.74, 0.93, and 0.56, respectively, among the combined models. A promising outcome was obtained when using radiomic and ConQuaFea from GSPECT MPI to detect left ventricular contractile patterns by machine learning
