
    J Biomed Inform

    Objective: In machine learning, it is well established that classification task performance increases when bootstrap aggregation (bagging) is applied. However, bagging deep neural networks takes tremendous amounts of computational resources and training time. The research question we aimed to answer is whether we could achieve higher task performance scores and accelerate training by dividing a problem into sub-problems.
    Materials and Methods: The data used in this study consist of free text from electronic cancer pathology reports. We applied bagging and partitioned data training using Multi-Task Convolutional Neural Network (MT-CNN) and Multi-Task Hierarchical Convolutional Attention Network (MT-HCAN) classifiers. We split a big problem into 20 sub-problems, resampled the training cases 2,000 times, and trained the deep learning model for each bootstrap sample and each sub-problem, thus generating up to 40,000 models. We trained many models concurrently in a high-performance computing environment at Oak Ridge National Laboratory (ORNL).
    Results: We demonstrated that aggregation of the models improves task performance compared with the single-model approach, which is consistent with other research studies, and that the two proposed partitioned bagging methods achieved higher classification accuracy scores on four tasks. Notably, the improvements were significant for the extraction of cancer histology data, a task with more than 500 class labels; these results show that data partitioning may alleviate the complexity of the task. In contrast, the methods did not achieve superior scores for the tasks of site and subsite classification. Because data partitioning was based on the primary cancer site, the accuracy depended on the determination of the partitions, which needs further investigation and improvement.
    Conclusion: Results in this research demonstrate that (1) the data partitioning and bagging strategy achieved higher performance scores, and (2) we achieved faster training by leveraging the high-performance Summit supercomputer at ORNL.
    2020; 2021-01-13. Grants: HHSN261201800013C/CA/NCI NIH HHS/United States; HHSN261201800016C/CA/NCI NIH HHS/United States; U58 DP003907/DP/NCCDPHP CDC HHS/United States; HHSN261201800007C/CA/NCI NIH HHS/United States; P30 CA177558/CA/NCI NIH HHS/United States; HHSN261201300021C/CA/NCI NIH HHS/United States; HHSN261201800013I/CA/NCI NIH HHS/United States; P30 CA042014/CA/NCI NIH HHS/United States. PMID: 32919043. PMCID: PMC8276580.
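The partition-then-bag idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' MT-CNN or MT-HCAN pipeline: `train_token_model` is a hypothetical stand-in for a deep model, the `(site, text, label)` case layout is an assumption, and the sketch assumes the partition (primary site) of a report is already known at prediction time, which the abstract notes is itself a source of error.

```python
import random
from collections import Counter


def bootstrap_sample(cases, rng):
    """Draw len(cases) cases with replacement (one bootstrap replicate)."""
    return [rng.choice(cases) for _ in cases]


def train_token_model(cases):
    """Hypothetical stand-in for a deep classifier: tokens vote for labels."""
    token_labels = {}
    overall = Counter()
    for _site, text, label in cases:
        overall[label] += 1
        for token in set(text.lower().split()):
            token_labels.setdefault(token, Counter())[label] += 1
    default_label = overall.most_common(1)[0][0]

    def model(text):
        score = Counter()
        for token in set(text.lower().split()):
            score.update(token_labels.get(token, {}))
        # Fall back to the replicate's most common label for unseen tokens.
        return score.most_common(1)[0][0] if score else default_label

    return model


def train_partitioned_bagging(cases, n_bootstrap, seed=0):
    """Train n_bootstrap models per partition; partition = primary site (assumed)."""
    rng = random.Random(seed)
    by_site = {}
    for case in cases:
        by_site.setdefault(case[0], []).append(case)
    return {
        site: [train_token_model(bootstrap_sample(site_cases, rng))
               for _ in range(n_bootstrap)]
        for site, site_cases in by_site.items()
    }


def predict(ensembles, site, text):
    """Aggregate one partition's models by majority vote."""
    votes = Counter(model(text) for model in ensembles[site])
    return votes.most_common(1)[0][0]
```

With 20 partitions and 2,000 bootstrap replicates, the same loop structure would yield the 40,000 models mentioned in the abstract; each model is independent of the others, which is what makes concurrent training possible.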

    Chosen abstracts of the Hungarian Society of Nuclear Medicine Congress, Debrecen, 2009


    Diet and colon cancer: investigation into the role of cholesterol in an experimental model

    Epidemiological studies have implicated dietary fats in the pathogenesis of colon cancer. More specifically, cholesterol in the diet has been shown to be co-carcinogenic in animals, and there is indirect evidence for a similar role in man. This study investigates how cholesterol may exert such an influence using the dimethylhydrazine (DMH)-induced rat colon cancer model. Changes in serum and hepatic lipids and faecal acid and neutral sterols were measured in rats sequentially killed following DMH treatment. Rats were fed on a standard diet (41B), an elemental cholesterol-free diet Vivonex (V) or Vivonex + cholesterol (VCh). Saline-injected and dietary controls were included. The data were related to the time of sacrifice and the tumour load at that time. Rats fed 41B had a significantly faster tumour development rate than those fed V, whilst the VCh-fed rats were intermediate. Rats with tumours had significantly higher levels of serum free cholesterol and decreased serum phospholipids, whilst total serum cholesterol was unchanged. Hepatic lipids were little affected by the presence of tumours, but V feeding resulted in fatty liver. No changes were seen in faecal bile acids in any of the three dietary groups. Neutral sterols, in particular the ratio of cholesterol to its metabolites coprostanol and coprostanone, showed an upward trend in rats with tumours, but this did not reach statistical significance. The % cholesterol degradation was significantly lower in the 41B group than in the other two, whilst the VCh-fed group was intermediate to and significantly different from the other two. These findings support the theory that dietary cholesterol is implicated in the pathogenesis of colon cancer. It is postulated that dietary cholesterol is diverted into the colon, where it exerts its co-carcinogenic action. It is also postulated that the Vivonex diet protects via the liver and the pattern of cholesterol breakdown.

    Scalable Profiling and Visualization for Characterizing Microbiomes

    Metagenomics is the study of the combined genetic material found in microbiome samples, and it serves as an instrument for studying microbial communities, their biodiversities, and the relationships to their host environments. Creating, interpreting, and understanding microbial community profiles produced from microbiome samples is a challenging task, as it requires large computational resources along with innovative techniques to process and analyze datasets that can contain terabytes of information. The community profiles are critical because they provide information about what microorganisms are present in the sample, and in what proportions. This is particularly important as many human diseases and environmental disasters are linked to changes in microbiome compositions. In this work we propose novel approaches for the creation and interpretation of microbial community profiles. These include: (a) a cloud-based, distributed computational system that generates detailed community profiles by processing large DNA sequencing datasets against large reference genome collections, (b) the creation of Microbiome Maps: interpretable, high-resolution visualizations of community profiles, and (c) a machine learning framework for characterizing microbiomes from the Microbiome Maps that delivers deep insights into microbial communities. The proposed approaches have been implemented in three software solutions: Flint, a large-scale profiling framework for commercial cloud systems that can process millions of DNA sequencing fragments and produce microbial community profiles at a very low cost; Jasper, a novel method for creating Microbiome Maps, which visualizes the abundance profiles based on the Hilbert curve; and Amber, a machine learning framework that characterizes microbiomes with high accuracy using the Microbiome Maps generated by Jasper. Results show that Flint scales well for reference genome collections that are an order of magnitude larger than those used by competing tools, while using less than a minute to profile a million reads on the cloud with 65 commodity processors. Microbiome Maps produced by Jasper are compact, scalable representations of extremely complex microbial community profiles with numerous demonstrable advantages, including the ability to display latent relationships that are hard to elicit. Finally, experiments show that by using images as input instead of unstructured tabular input, the carefully engineered software, Amber, can outperform other sophisticated machine learning tools available for classification of microbiomes.
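As a rough illustration of laying an abundance profile along a Hilbert curve, the Python sketch below uses the standard index-to-coordinate conversion for a square grid whose side is a power of two. It is not Jasper's implementation: the function names, the zero fill for unused cells, and the assumption that the profile is already in a meaningful linear order are all hypothetical.

```python
def hilbert_d2xy(side, d):
    """Convert position d along the Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two; d ranges over 0 .. side*side - 1.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the quadrant before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def abundance_to_map(abundances, order):
    """Place an ordered abundance profile on a 2**order x 2**order grid.

    The ordering of the profile is assumed to be supplied by the caller;
    cells beyond the end of the profile stay at 0.0.
    """
    side = 2 ** order
    if len(abundances) > side * side:
        raise ValueError("profile does not fit on a grid of this order")
    grid = [[0.0] * side for _ in range(side)]
    for index, value in enumerate(abundances):
        x, y = hilbert_d2xy(side, index)
        grid[y][x] = value
    return grid
```

Because consecutive curve positions always land on neighbouring cells, entries that are close in the linear profile stay close in the image, which is the locality property that makes this layout attractive as input to image-based classifiers.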