
    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) are analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges and exacerbates those associated with single-omics studies. Specialized computational approaches are required to perform integrative analysis of biomedical data acquired from diverse modalities effectively and efficiently. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: the curse of dimensionality, data heterogeneity, missing data, class imbalance, and scalability.

    Bioinformatic-driven search for metabolic biomarkers in disease

    The search for and validation of novel disease biomarkers require the complementary power of professional study planning and execution, modern profiling technologies, and related bioinformatics tools for data analysis and interpretation. Biomarkers have considerable impact on patient care and are urgently needed to advance the diagnosis, prognosis and treatment of disease. This survey article highlights emerging bioinformatics methods for biomarker discovery in clinical metabolomics, focusing on data preprocessing and consolidation and on the data-driven search, verification, prioritization and biological interpretation of putative metabolic candidate biomarkers in disease. In particular, it reviews data mining tools suitable for omic data gathered from the most frequently used types of experimental design, such as case-control and longitudinal biomarker cohort studies, and delineates case examples of selected discovery steps in more detail. This review demonstrates that clinical bioinformatics has evolved into an essential element of biomarker discovery, translating new innovations and successes in profiling technologies and bioinformatics into clinical application.

    Evaluating the Quality of Research into a Single Prognostic Biomarker: A Systematic Review and Meta-analysis of 83 Studies of C-Reactive Protein in Stable Coronary Artery Disease

    Background: Systematic evaluations of the quality of research on a single prognostic biomarker are rare. We sought to evaluate the quality of prognostic research evidence for the association of C-reactive protein (CRP) with fatal and nonfatal events among patients with stable coronary disease.
    Methods and Findings: We searched MEDLINE (1966 to 2009) and EMBASE (1980 to 2009) and selected prospective studies of patients with stable coronary disease that reported a relative risk for the association of CRP with death and nonfatal cardiovascular events. We included 83 studies, reporting 61,684 patients and 6,485 outcome events. No study reported a prespecified statistical analysis protocol; only two studies reported the time elapsed (in months or years) between initial presentation of symptomatic coronary disease and inclusion in the study. Studies reported a median of seven of the 17 items in the REMARK reporting guidelines, with no evidence of change over time. The pooled relative risk for the top versus bottom third of the CRP distribution was 1.97 (95% confidence interval [CI] 1.78–2.17), with substantial heterogeneity (I² = 79.5%). Only 13 studies adjusted for conventional risk factors (age, sex, smoking, obesity, diabetes, and low-density lipoprotein [LDL] cholesterol), and these had a relative risk of 1.65 (95% CI 1.39–1.96; I² = 33.7%). Studies reported ten different ways of comparing CRP values, with weaker relative risks for those based on continuous measures. There was strong evidence of publication bias (Egger's test p < 0.001); adjusting for it with a validated method reduced the relative risk to 1.19 (95% CI 1.13–1.25). Only two studies reported a measure of discrimination (c-statistic). In 20 studies the detection rate for subsequent events could be calculated; it was 31% at a 10% false-positive rate, and the calculated pooled c-statistic was 0.61 (0.57–0.66).
    Conclusion: Multiple types of reporting bias, together with publication bias, make the magnitude of any independent association between CRP and prognosis among patients with stable coronary disease sufficiently uncertain that no clinical practice recommendations can be made. Publication of prespecified statistical analysis protocols and prospective registration of studies, among other measures, might help improve the quality of prognostic biomarker research.
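    The pooled relative risk and the I² heterogeneity statistic quoted above come from standard meta-analytic machinery. Below is a minimal sketch, assuming inverse-variance fixed-effect pooling on the log scale (the abstract does not state the review's exact pooling model, and random-effects models are also common). I² is computed as max(0, (Q − df)/Q) × 100, where Q is Cochran's Q. The function name and the per-study relative risks and standard errors are hypothetical, illustrative numbers, not data from the 83 studies.

```python
import math

def pool_fixed_effect(log_rr, se, z=1.96):
    """Inverse-variance fixed-effect meta-analysis of log relative risks.

    Hypothetical helper for illustration. Returns
    (pooled RR, (lower, upper) 95% CI, I^2 in percent).
    """
    weights = [1.0 / s ** 2 for s in se]
    total_w = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_rr)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Cochran's Q: weighted squared deviations of each study from the pooled estimate.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_rr))
    df = len(log_rr) - 1
    # I^2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    ci = (math.exp(pooled - z * pooled_se), math.exp(pooled + z * pooled_se))
    return math.exp(pooled), ci, i2

# Illustrative (made-up) per-study relative risks and standard errors of log RR.
study_rr = [2.5, 1.4, 1.9, 3.1]
study_se = [0.20, 0.15, 0.25, 0.30]
rr, ci, i2 = pool_fixed_effect([math.log(r) for r in study_rr], study_se)
print(f"pooled RR {rr:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f}), I^2 = {i2:.1f}%")
```

    Pooling on the log scale keeps the relative risk symmetric around 1, and studies with smaller standard errors receive proportionally more weight.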

    Conditional Tabular Generative Adversarial Net for Enhancing Ensemble Classifiers in Sepsis Diagnosis

    Antibiotic-resistant bacteria have proliferated at an alarming rate as a result of the extensive use of antibiotics and the scarcity of research into new drugs. One of the major collateral problems for people with an antibiotic-resistant bacterial infection is the possibility that it will progress to sepsis. In England, sepsis claims 31,000 lives and costs about two billion pounds annually. This research aims to develop and evaluate several classification approaches to improve sepsis prediction and reduce the tendency toward underdiagnosis in computer-aided predictive tools. Using medical data sets for patients diagnosed with sepsis, it analyses the efficacy of ensemble machine learning techniques compared with non-ensemble techniques, and the contribution of data balancing and of Conditional Tabular Generative Adversarial Nets for data augmentation to producing reliable diagnoses. The average F-score of the non-ensemble models trained in this paper is 0.83, compared with an average of 0.94 for the ensemble techniques. Among the non-ensemble techniques, the Decision Tree achieved an F-score of 0.90, an AUC of 0.90 and an accuracy of 90%. The Histogram-based Gradient Boosting Classification Tree achieved an F-score of 0.96, an AUC of 0.96 and an accuracy of 95%, surpassing the other models tested. Additionally, compared with current state-of-the-art sepsis prediction models, the models developed in this study showed higher average performance on all metrics, indicating reduced bias and improved robustness through data balancing and Conditional Tabular Generative Adversarial Nets for data augmentation. The study revealed that data balancing and augmentation boost the efficacy of ensemble machine learning algorithms in clinical predictive models, and can help clinics decide which data types are most important when examining patients and diagnosing sepsis early through an intelligent human-machine interface.
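    To make "data balancing" and the F-score concrete, here is a minimal sketch. It is not the study's pipeline: Conditional Tabular GANs synthesise new rows, whereas this sketch swaps in plain random oversampling, a different and simpler technique that duplicates existing minority-class rows. Both function names are hypothetical. The usage lines show why accuracy alone can mislead on imbalanced data, which is one reason F-scores are reported.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest.

    Hypothetical helper; a simple stand-in for generative augmentation such as CTGAN.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        members = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive (e.g. sepsis) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# A classifier that never predicts the minority class looks good on accuracy only.
y_true = [0] * 8 + [1] * 2
always_negative = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, always_negative)) / len(y_true)
# accuracy is 0.8, while f1_score(y_true, always_negative) is 0.0

# After oversampling, both classes have 8 examples.
X_bal, y_bal = random_oversample([[i] for i in range(10)], y_true)
```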

    Identification of a gene signature for discriminating metastatic from primary melanoma using a molecular interaction network approach

    Understanding the biological factors that characterize metastasis in melanoma remains key to improving treatment. In this study, we seek to identify a gene signature of metastatic melanoma. We configured a new network-based computational pipeline, combined with a machine learning method, to mine publicly available transcriptomic data from melanoma patient samples. Our method is unbiased and scans a genome-wide protein-protein interaction network using a novel formulation for network scoring. Using this, we identify the most influential differentially expressed nodes in metastatic compared with primary melanoma. We evaluated the shortlisted genes with a machine learning method to rank them by their discriminatory capacity. From this, we identified a panel of six genes (ALDH1A1, HSP90AB1, KIT, KRT16, SPRR3 and TMEM45B) whose expression values discriminated metastatic from primary melanoma (87% classification accuracy). In an independent transcriptomic data set derived from 703 primary melanomas, we showed that all six genes were significant predictors of melanoma-specific survival (MSS) in a univariate analysis, which was also consistent with AJCC staging. Further, three of these genes (HSP90AB1, SPRR3 and KRT16) remained significant predictors of MSS in a joint analysis (HR = 2.3, P = 0.03), although only HSP90AB1 (HR = 1.9, P = 2 × 10⁻⁴) remained predictive after adjusting for clinical predictors.

    Identification of Novel Cancer-Related Genes with a Prognostic Role Using Gene Expression and Protein-Protein Interaction Network Data

    Early cancer diagnosis and prognosis prediction are essential for cancer patients. Effective identification of cancer-related genes and biomarkers, together with survival prediction, would facilitate personalized treatment of cancer patients. This study aimed to develop a method that integrates gene expression data and protein-protein interaction networks to identify cancer-related prognostic genes via the random walk with restart algorithm and survival analysis. Known cancer-related genes in protein-protein interaction networks were used as seed genes, and the random walk algorithm was used to identify candidate cancer-related genes. Next, gene expression data were screened with univariate Cox regression models to identify survival-related genes. The candidate genes and survival-related genes were then screened together to identify cancer-related prognostic genes. Finally, the effectiveness of the method was verified through gene function analysis and survival prediction. The results indicate that the identified cancer-related genes can be considered prognostic cancer biomarkers and provide a basis for cancer diagnosis.
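    The random walk with restart step can be sketched as the standard iteration p ← (1 − r)·W·p + r·p₀, where W is the column-normalised adjacency matrix of the protein-protein interaction network and p₀ spreads the restart probability evenly over the seed genes. This is a minimal sketch: the restart probability of 0.7, the convergence tolerance, the function name and the toy network are hypothetical choices, not values from the study, and the subsequent Cox-regression screening is not shown.

```python
def random_walk_with_restart(adj, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Score every node by its steady-state visiting probability from the seed nodes.

    adj: square adjacency matrix (list of lists) of an undirected network.
    seeds: indices of known cancer-related (seed) genes.
    restart: hypothetical restart probability r, not taken from the study.
    """
    n = len(adj)
    seeds = sorted(set(seeds))
    # Column sums (node degrees for an unweighted network) used to normalise W.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    # Restart vector p0: uniform over the seed genes, zero elsewhere.
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        # W @ p: each node j passes its probability to its neighbours,
        # split in proportion to edge weight (isolated nodes pass nothing).
        walked = [
            sum(adj[i][j] * p[j] / col_sums[j] for j in range(n) if col_sums[j] > 0)
            for i in range(n)
        ]
        p_next = [(1 - restart) * walked[i] + restart * p0[i] for i in range(n)]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(p_next, p)) < tol:
            return p_next
        p = p_next
    return p

# Usage on a toy 4-gene path network 0-1-2-3 with gene 0 as the only seed.
toy_adj = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
scores = random_walk_with_restart(toy_adj, seeds=[0])
# Rank non-seed genes by score; the top-ranked ones would be the candidate genes.
candidates = sorted((g for g in range(len(toy_adj)) if g != 0), key=lambda g: -scores[g])
```

    Genes close to many seeds in the network accumulate higher steady-state probability, so ranking non-seed genes by score is one plausible way to shortlist candidates before survival screening.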