15 research outputs found
Medical Image Data and Datasets in the Era of Machine Learning-Whitepaper from the 2016 C-MIMI Meeting Dataset Session.
At the first annual Conference on Machine Intelligence in Medical Imaging (C-MIMI), held in September 2016, a conference session on medical image data and datasets for machine learning identified multiple issues. The common theme from attendees was that everyone participating in medical image evaluation with machine learning is data starved. There is an urgent need to find better ways to collect, annotate, and reuse medical imaging data. Unique domain issues with medical image datasets require further study, development, and dissemination of best practices and standards, and a coordinated effort among medical imaging domain experts, medical imaging informaticists, government and industry data scientists, and interested commercial, academic, and government entities. High-level attributes of reusable medical image datasets suitable to train, test, validate, verify, and regulate ML products should be better described. NIH and other government agencies should promote and, where applicable, enforce, access to medical image datasets. We should improve communication among medical imaging domain experts, medical imaging informaticists, academic clinical and basic science researchers, government and industry data scientists, and interested commercial entities
Development and Validation of the Radiology Common Data Model (R-CDM) for the International Standardization of Medical Imaging Data
Purpose: Digital Imaging and Communications in Medicine (DICOM), a standard file format for medical imaging data, contains metadata describing each file. However, metadata are often incomplete, and there is no standardized format for recording metadata, leading to inefficiency during the metadata-based data retrieval process. Here, we propose a novel standardization method for DICOM metadata termed the Radiology Common Data Model (R-CDM).
Materials and methods: R-CDM was designed to be compatible with Health Level Seven International (HL7)/Fast Healthcare Interoperability Resources (FHIR) and linked with the Observational Medical Outcomes Partnership (OMOP)-CDM to achieve a seamless link between clinical data and medical imaging data. The terminology system was standardized using the RadLex playbook, a comprehensive lexicon of radiology. As a proof of concept, the R-CDM conversion process was conducted with 41.7 TB of data from the Ajou University Hospital. The R-CDM database visualizer was developed to visualize the main characteristics of the R-CDM database.
Results: Information from 2,801,360 cases and 87,203,226 DICOM files was organized into two tables constituting the R-CDM. Information on imaging device and image resolution was recorded with more than 99.9% accuracy. Furthermore, OMOP-CDM and R-CDM were linked to efficiently extract specific types of images from specific patient cohorts.
Conclusion: R-CDM standardizes the structure and terminology for recording medical imaging data to eliminate incomplete and unstandardized information. Successful standardization was achieved by the extract, transform, and load process and image classifier. We hope that the R-CDM will contribute to deep learning research in the medical imaging field by enabling the securement of large-scale medical imaging data from multinational institutions.
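The cohort-to-image linkage described in the R-CDM results can be illustrated with a minimal sketch. The abstract does not name the two R-CDM tables or their columns, so the `imaging_study` table, its `modality` and `file_path` columns, and the concept ID `1001` below are hypothetical placeholders. Only `condition_occurrence`, `person_id`, and `condition_concept_id` follow OMOP-CDM naming, and the schema is reduced to a toy.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory toy database (hypothetical, heavily simplified schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE condition_occurrence (person_id INTEGER, condition_concept_id INTEGER);
        -- Hypothetical stand-in for an R-CDM imaging table; not the published schema.
        CREATE TABLE imaging_study (person_id INTEGER, modality TEXT, file_path TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO condition_occurrence VALUES (?, ?)",
        [(1, 1001), (2, 1001), (3, 2002)],  # 1001 and 2002 are made-up concept IDs
    )
    conn.executemany(
        "INSERT INTO imaging_study VALUES (?, ?, ?)",
        [
            (1, "CT", "p1/ct_001.dcm"),
            (1, "MR", "p1/mr_001.dcm"),
            (2, "CT", "p2/ct_001.dcm"),
            (3, "CT", "p3/ct_001.dcm"),
        ],
    )
    return conn


def find_cohort_images(conn, condition_concept_id, modality):
    """Return (person_id, file_path) for cohort members imaged with the given modality."""
    query = """
        SELECT DISTINCT c.person_id, i.file_path
        FROM condition_occurrence AS c
        JOIN imaging_study AS i ON i.person_id = c.person_id
        WHERE c.condition_concept_id = ? AND i.modality = ?
        ORDER BY c.person_id, i.file_path
    """
    return conn.execute(query, (condition_concept_id, modality)).fetchall()


if __name__ == "__main__":
    demo = build_demo_db()
    for row in find_cohort_images(demo, 1001, "CT"):
        print(row)
```

The clinical cohort is defined on the OMOP side and the image filter on the imaging side, so a shared patient key is all the join needs. A real deployment would presumably use standardized concept IDs and RadLex-coded fields rather than free-text modality strings.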
Radiogenomics of clear cell renal cell carcinoma: preliminary findings of The Cancer Genome Atlas–Renal Cell Carcinoma (TCGA–RCC) Imaging Research Group
Purpose: To investigate associations between imaging features and mutational status of clear cell renal cell carcinoma (ccRCC). Materials and methods: This multi-institutional, multi-reader study included 103 patients (77 men; median age 59 years, range 34–79) with ccRCC examined with CT in 81 patients, MRI in 19, and both CT and MRI in three; images were downloaded from The Cancer Imaging Archive, an NCI-funded project for genome-mapping and analyses. Imaging features [size (mm), margin (well-defined or ill-defined), composition (solid or cystic), necrosis (for solid tumors: 0%, 1%–33%, 34%–66% or >66%), growth pattern (endophytic, <50% exophytic, or ≥50% exophytic), and calcification (present, absent, or indeterminate)] were reviewed independently by three readers blinded to mutational data. The association of imaging features with mutational status (VHL, BAP1, PBRM1, SETD2, KDM5C, and MUC4) was assessed. Results: Median tumor size was 49 mm (range 14–162 mm), 73 (71%) tumors had well-defined margins, 98 (95%) tumors were solid, 95 (92%) showed presence of necrosis, 46 (45%) had ≥50% exophytic component, and 18 (19.8%) had calcification. VHL (n = 52) and PBRM1 (n = 24) were the most common mutations. BAP1 mutation was associated with ill-defined margin and presence of calcification (p = 0.02 and 0.002, respectively, Pearson’s χ2 test); MUC4 mutation was associated with an exophytic growth pattern (p = 0.002, Mann–Whitney U test). Conclusions: BAP1 mutation was associated with ill-defined tumor margins and presence of calcification; MUC4 mutation was associated with exophytic growth. Given the known prognostic implications of BAP1 and MUC4 mutations, these results support using radiogenomics to aid in prognostication and management
Quantitative MRI radiomics in the prediction of molecular classifications of breast cancer subtypes in the TCGA/TCIA data set
Using quantitative radiomics, we demonstrate that computer-extracted magnetic resonance (MR) image-based tumor phenotypes can be predictive of the molecular classification of invasive breast cancers. Radiomics analysis was performed on 91 MRIs of biopsy-proven invasive breast cancers from the National Cancer Institute’s multi-institutional TCGA/TCIA. Immunohistochemistry molecular classification was performed including estrogen receptor, progesterone receptor, human epidermal growth factor receptor 2, and for 84 cases, the molecular subtype (normal-like, luminal A, luminal B, HER2-enriched, and basal-like). Computerized quantitative image analysis included: three-dimensional lesion segmentation, phenotype extraction, and leave-one-case-out cross validation involving stepwise feature selection and linear discriminant analysis. The performance of the classifier model for molecular subtyping was evaluated using receiver operating characteristic analysis. The computer-extracted tumor phenotypes were able to distinguish between molecular prognostic indicators; area under the ROC curve values of 0.89, 0.69, 0.65, and 0.67 in the tasks of distinguishing between ER+ versus ER−, PR+ versus PR−, HER2+ versus HER2−, and triple-negative versus others, respectively. Statistically significant associations between tumor phenotypes and receptor status were observed. More aggressive cancers are likely to be larger in size with more heterogeneity in their contrast enhancement. Even after controlling for tumor size, a statistically significant trend was observed within each size group (P = 0.04 for lesions ≤ 2 cm; P = 0.02 for lesions > 2 to ≤ 5 cm) as with the entire data set (P = 0.006) for the relationship between enhancement texture (entropy) and molecular subtypes (normal-like, luminal A, luminal B, HER2-enriched, basal-like).
In conclusion, computer-extracted image phenotypes show promise for high-throughput discrimination of breast cancer subtypes and may yield a quantitative predictive signature for advancing precision medicine
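The leave-one-case-out cross validation mentioned in this abstract can be sketched as follows. This is an illustration only: a nearest-centroid rule stands in for the stepwise feature selection and linear discriminant analysis used in the study, and the feature values and labels are invented toy data.

```python
def nearest_centroid_predict(train_x, train_y, x):
    """Assign x to the class whose feature-wise mean (centroid) is closest."""
    sums, counts = {}, {}
    for feats, label in zip(train_x, train_y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, value in enumerate(feats):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1

    best_label, best_dist = None, float("inf")
    for label in sorted(sums):
        centroid = [s / counts[label] for s in sums[label]]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def leave_one_case_out(xs, ys):
    """Hold out each case once; fit on the rest and predict the held-out case."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        # Any fitting step (including feature selection) belongs inside this loop,
        # so the held-out case never influences the model that scores it.
        predictions.append(nearest_centroid_predict(train_x, train_y, xs[i]))
    return predictions


if __name__ == "__main__":
    # Toy two-feature cases (hypothetical values, e.g. size and texture entropy).
    features = [[1.0, 0.2], [1.2, 0.1], [0.9, 0.3], [3.0, 1.1], [3.2, 0.9], [2.8, 1.0]]
    labels = ["ER+", "ER+", "ER+", "ER-", "ER-", "ER-"]
    print(leave_one_case_out(features, labels))
```

With small cohorts such as the 91 cases here, holding out one case at a time uses nearly all the data for training in each round while still giving every case an unbiased held-out prediction.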
Integration of proteomics with CT-based qualitative and radiomic features in high-grade serous ovarian cancer patients: an exploratory analysis
Abstract: Objectives: To investigate the association between CT imaging traits and texture metrics with proteomic data in patients with high-grade serous ovarian cancer (HGSOC). Methods: This retrospective, hypothesis-generating study included 20 patients with HGSOC prior to primary cytoreductive surgery. Two readers independently assessed the contrast-enhanced computed tomography (CT) images and extracted 33 imaging traits, with a third reader adjudicating in the event of a disagreement. In addition, all sites of suspected HGSOC were manually segmented, and texture features were computed from each tumor site. Three texture features that represented intra- and inter-site tumor heterogeneity were used for analysis. An integrated analysis of transcriptomic and proteomic data identified proteins with conserved expression between primary tumor sites and metastasis. Correlations between protein abundance and various CT imaging traits and texture features were assessed using the Kendall tau rank correlation coefficient and the Mann-Whitney U test, whereas the area under the receiver operating characteristic curve (AUC) was reported as a metric of the strength and the direction of the association. P values < 0.05 were considered significant. Results: Four proteins were associated with CT-based imaging traits, with the strongest correlation observed between the CRIP2 protein and disease in the mesentery (p < 0.001, AUC = 0.05). The abundance of three proteins was associated with texture features that represented intra- and inter-site tumor heterogeneity, with the strongest negative correlation between the CKB protein and cluster dissimilarity (p = 0.047, τ = 0.326). Conclusion: This study provides the first insights into the potential associations between standard-of-care CT imaging traits and texture measures of intra- and inter-site heterogeneity, and the abundance of several proteins.
Key Points: • CT-based texture features of intra- and inter-site tumor heterogeneity correlate with the abundance of several proteins in patients with HGSOC. • CT imaging traits correlate with protein abundance in patients with HGSOC
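The abstract reports AUC as a measure of both strength and direction, which is why a value as low as 0.05 can indicate a strong association. A minimal sketch of that reading, using the standard identity AUC = U / (n1 × n2) between the AUC and the Mann-Whitney U statistic, is given below. The abundance values are invented for illustration and are not taken from the study.

```python
def auc_from_groups(with_trait, without_trait):
    """AUC = P(x > y) + 0.5 * P(x == y) for x in with_trait and y in without_trait.

    This equals the Mann-Whitney U statistic divided by (n1 * n2).
    """
    wins = 0.0
    for x in with_trait:
        for y in without_trait:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(with_trait) * len(without_trait))


if __name__ == "__main__":
    # Hypothetical protein abundances for patients with and without an imaging trait.
    abundance_with_trait = [1.0, 1.2, 0.9]
    abundance_without_trait = [2.0, 2.5, 1.1, 3.0]
    print(auc_from_groups(abundance_with_trait, abundance_without_trait))
    # Swapping the groups reflects the value around 0.5.
    print(auc_from_groups(abundance_without_trait, abundance_with_trait))
```

An AUC of 0.5 means no separation between groups. Values near 1 mean abundance is higher when the trait is present, and values near 0 mean it is lower, so the distance from 0.5 carries the strength and the side of 0.5 carries the direction.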
MR Imaging Radiomics Signatures for Predicting the Risk of Breast Cancer Recurrence as Given by Research Versions of MammaPrint, Oncotype DX, and PAM50 Gene Assays
To investigate relationships between computer-extracted breast magnetic resonance (MR) imaging phenotypes and the multigene assays MammaPrint, Oncotype DX, and PAM50, in order to assess the role of radiomics in evaluating the risk of breast cancer recurrence.
Doctor of Philosophy
Over 40 years ago, the first computer simulation of a protein was reported: the atomic motions of a 58-amino-acid protein were simulated for a few picoseconds. With today's supercomputers, simulations of large biomolecular systems with hundreds of thousands of atoms can reach biologically significant timescales. Through dynamics information, biomolecular simulations can provide new insights into molecular structure and function to support the development of new drugs or therapies. While the recent advances in high-performance computing hardware and computational methods have enabled scientists to run longer simulations, they also created new challenges for data management. Investigators need to use local and national resources to run these simulations and store their output, which can reach terabytes of data on disk. Because of the wide variety of computational methods and software packages available to the community, no standard data representation has been established to describe the computational protocol and the output of these simulations, preventing data sharing and collaboration. Data exchange is also limited due to the lack of repositories and tools to summarize, index, and search biomolecular simulation datasets. In this dissertation a common data model for biomolecular simulations is proposed to guide the design of future databases and APIs. The data model was then extended to a controlled vocabulary that can be used in the context of the semantic web. Two different approaches to data management are also proposed. The iBIOMES repository offers a distributed environment where input and output files are indexed via common data elements. The repository includes a dynamic web interface to summarize, visualize, search, and download published data. A simpler tool, iBIOMES Lite, was developed to generate summaries of datasets hosted at remote sites where user privileges and/or IT resources might be limited.
These two informatics-based approaches to data management offer new means for the community to keep track of distributed and heterogeneous biomolecular simulation data and create collaborative networks