17,955 research outputs found

    Machine Learning and Integrative Analysis of Biomedical Big Data.

    Get PDF
    Recent developments in high-throughput technologies have accelerated the accumulation of massive amounts of omics data from multiple sources: genome, epigenome, transcriptome, proteome, metabolome, etc. Traditionally, data from each source (e.g., genome) is analyzed in isolation using statistical and machine learning (ML) methods. Integrative analysis of multi-omics and clinical data is key to new biomedical discoveries and advancements in precision medicine. However, data integration poses new computational challenges as well as exacerbates the ones associated with single-omics studies. Specialized computational approaches are required to effectively and efficiently perform integrative analysis of biomedical data acquired from diverse modalities. In this review, we discuss state-of-the-art ML-based approaches for tackling five specific computational challenges associated with integrative analysis: curse of dimensionality, data heterogeneity, missing data, class imbalance and scalability issues

    Predicting breast cancer risk, recurrence and survivability

    Full text link
    This thesis focuses on predicting breast cancer at early stages by using machine learning algorithms based on biological datasets. The accuracy of those algorithms has been improved to enable the physicians to enhance the success of treatment, thus saving lives and avoiding several further medical tests

    Chemotherapy-Response Monitoring of Breast Cancer Patients Using Quantitative Ultrasound-Based Intra-Tumour Heterogeneities

    Get PDF
    © 2017 The Author(s). Anti-cancer therapies including chemotherapy aim to induce tumour cell death. Cell death introduces alterations in cell morphology and tissue micro-structures that cause measurable changes in tissue echogenicity. This study investigated the effectiveness of quantitative ultrasound (QUS) parametric imaging to characterize intra-tumour heterogeneity and monitor the pathological response of breast cancer to chemotherapy in a large cohort of patients (n = 100). Results demonstrated that QUS imaging can non-invasively monitor pathological response and outcome of breast cancer patients to chemotherapy early following treatment initiation. Specifically, QUS biomarkers quantifying spatial heterogeneities in size, concentration and spacing of acoustic scatterers could predict treatment responses of patients with cross-validated accuracies of 82 ± 0.7%, 86 ± 0.7% and 85 ± 0.9% and areas under the receiver operating characteristic (ROC) curve of 0.75 ± 0.1, 0.80 ± 0.1 and 0.89 ± 0.1 at 1, 4 and 8 weeks after the start of treatment, respectively. The patients classified as responders and non-responders using QUS biomarkers demonstrated significantly different survivals, in good agreement with clinical and pathological endpoints. The results form a basis for using early predictive information on survival-linked patient response to facilitate adapting standard anti-cancer treatments on an individual patient basis

    Predicting Outcomes of Hormone and Chemotherapy in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) Study by Biochemically-inspired Machine Learning

    Get PDF
    Genomic aberrations and gene expression-defined subtypes in the large METABRIC patient cohort have been used to stratify and predict survival. The present study used normalized gene expression signatures of paclitaxel drug response to predict outcome for different survival times in METABRIC patients receiving hormone (HT) and, in some cases, chemotherapy (CT) agents. This machine learning method, which distinguishes sensitivity vs. resistance in breast cancer cell lines and validates predictions in patients; was also used to derive gene signatures of other HT (tamoxifen) and CT agents (methotrexate, epirubicin, doxorubicin, and 5-fluorouracil) used in METABRIC. Paclitaxel gene signatures exhibited the best performance, however the other agents also predicted survival with acceptable accuracies. A support vector machine (SVM) model of paclitaxel response containing genes ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2, SLCO1B3, TUBB1, TUBB4A, and TUBB4B was 78.6% accurate in predicting survival of 84 patients treated with both HT and CT (median survival ≥ 4.4 yr). Accuracy was lower (73.4%) in 304 untreated patients. The performance of other machine learning approaches was also evaluated at different survival thresholds. Minimum redundancy maximum relevance feature selection of a paclitaxel-based SVM classifier based on expression of genes BCL2L1, BBC3, FGF2, FN1, and TWIST1 was 81.1% accurate in 53 CT patients. In addition, a random forest (RF) classifier using a gene signature ( ABCB1, ABCB11, ABCC1, ABCC10, BAD, BBC3, BCL2, BCL2L1, BMF, CYP2C8, CYP3A4, MAP2, MAP4, MAPT, NR1I2,SLCO1B3, TUBB1, TUBB4A, and TUBB4B) predicted \u3e3-year survival with 85.5% accuracy in 420 HT patients. A similar RF gene signature showed 82.7% accuracy in 504 patients treated with CT and/or HT. These results suggest that tumor gene expression signatures refined by machine learning techniques can be useful for predicting survival after drug therapies

    Correcting for selection bias via cross-validation in the classification of microarray data

    Full text link
    There is increasing interest in the use of diagnostic rules based on microarray data. These rules are formed by considering the expression levels of thousands of genes in tissue samples taken on patients of known classification with respect to a number of classes, representing, say, disease status or treatment strategy. As the final versions of these rules are usually based on a small subset of the available genes, there is a selection bias that has to be corrected for in the estimation of the associated error rates. We consider the problem using cross-validation. In particular, we present explicit formulae that are useful in explaining the layers of validation that have to be performed in order to avoid improperly cross-validated estimates.Comment: Published in at http://dx.doi.org/10.1214/193940307000000284 the IMS Collections (http://www.imstat.org/publications/imscollections.htm) by the Institute of Mathematical Statistics (http://www.imstat.org
    • …
    corecore