15 research outputs found

    GLAD: A mixed-membership model for heterogeneous tumor subtype classification

    MOTIVATION: Genomic analyses of many solid cancers have demonstrated extensive genetic heterogeneity between as well as within individual tumors. However, statistical methods for classifying tumors by subtype based on genomic biomarkers generally entail an all-or-none decision, which may be misleading for clinical samples containing a mixture of subtypes and/or normal cell contamination. RESULTS: We have developed a mixed-membership classification model, called GLAD, that simultaneously learns a sparse biomarker signature for each subtype as well as a distribution over subtypes for each sample. We demonstrate the accuracy of this model on simulated data, in vitro mixture experiments, and clinical samples from The Cancer Genome Atlas (TCGA) project. We show that many TCGA samples are likely a mixture of multiple subtypes.
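
To make the idea of a "distribution over subtypes for each sample" concrete, here is a minimal sketch of one half of such a model: estimating a sample's subtype proportions when the subtype signatures are already known. This is not the GLAD algorithm, which also learns the sparse signatures; the function names, the least-squares objective, and the synthetic data are all assumptions chosen for illustration.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto the probability simplex."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]  # last index that stays positive
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def estimate_membership(x, signatures, n_iter=2000):
    """Hypothetical helper: find weights w (w >= 0, sum(w) == 1) that
    minimise 0.5 * ||signatures.T @ w - x||^2 by projected gradient descent."""
    S = signatures                            # shape (n_subtypes, n_features)
    w = np.full(S.shape[0], 1.0 / S.shape[0]) # start from a uniform mixture
    step = 1.0 / np.linalg.norm(S @ S.T, 2)   # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = S @ (S.T @ w - x)
        w = project_to_simplex(w - step * grad)
    return w

# Synthetic example: one sample that is a 60/30/10 mixture of three subtypes.
rng = np.random.default_rng(0)
signatures = rng.normal(size=(3, 50))         # assumed known subtype signatures
true_w = np.array([0.6, 0.3, 0.1])
sample = true_w @ signatures + 0.01 * rng.normal(size=50)
print(np.round(estimate_membership(sample, signatures), 3))
```

An all-or-none classifier would report only the largest component for this sample; the mixed-membership view keeps the full weight vector.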

    Binary Linear Classification and Feature Selection via Generalized Approximate Message Passing

    For the problem of binary linear classification and feature selection, we propose algorithmic approaches to classifier design based on the generalized approximate message passing (GAMP) algorithm, recently proposed in the context of compressive sensing. We are particularly motivated by problems where the number of features greatly exceeds the number of training examples, but where only a few features suffice for accurate classification. We show that sum-product GAMP can be used to (approximately) minimize the classification error rate and max-sum GAMP can be used to minimize a wide variety of regularized loss functions. Furthermore, we describe an expectation-maximization (EM)-based scheme to learn the associated model parameters online, as an alternative to cross-validation, and we show that GAMP's state-evolution framework can be used to accurately predict the misclassification rate. Finally, we present a detailed numerical study to confirm the accuracy, speed, and flexibility afforded by our GAMP-based approaches to binary linear classification and feature selection.
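
The setting described above (many more features than examples, few relevant features, a regularized loss) can be illustrated with a small sketch. Note the substitution: this is not GAMP. It minimises an L1-regularized logistic loss, one member of the regularized-loss family the abstract mentions, using plain proximal gradient descent (ISTA). The function names, the value of `lam`, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def soft_threshold(z, t):
    """Proximal operator of t * ||.||_1: shrinks entries towards zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def l1_logistic(X, y, lam=0.05, n_iter=500):
    """Hypothetical helper: minimise mean logistic loss + lam * ||w||_1
    with ISTA. Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    # Step 1/L, where L = ||X||_2^2 / (4n) bounds the curvature of the loss.
    step = 4.0 * n / np.linalg.norm(X, 2) ** 2
    for _ in range(n_iter):
        margins = y * (X @ w)
        # sigma(-m) = 1 / (1 + exp(m)), written with tanh for numerical stability.
        sig_neg = 0.5 * (1.0 - np.tanh(0.5 * margins))
        grad = -(X.T @ (y * sig_neg)) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

# Synthetic example: 60 examples, 300 features, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 300))
y = np.sign(X[:, :3].sum(axis=1))
w = l1_logistic(X, y)
print("nonzero weights:", np.count_nonzero(w), "of", w.size)
print("training accuracy:", np.mean(np.sign(X @ w) == y))
```

Here `lam` is fixed by hand. The abstract's EM-based scheme is presented as a way to learn such parameters from the data instead of tuning them by cross-validation.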

    Deep learning for biomarker and outcome prediction in cancer

    Machine learning in the form of deep learning (DL) has recently transformed how computer vision tasks are solved in numerous domains, including image-based medical diagnostics. DL-based methods have the potential to enable more precise quantitative characterisation of cancer tissue specimens routinely analysed in clinical pathology laboratories for diagnostic purposes. Computer-assisted tissue analysis within pathology is not restricted to the quantification and classification of specific tissue entities. DL also makes it possible to directly address clinically relevant questions related to the prediction of cancer outcome and the efficacy of cancer treatment. This thesis focused on the following crucial research question: is it possible to predict cancer outcome, biomarker status, and treatment efficacy directly from the tissue morphology using DL, without any special stains or molecular methods? To address this question, we utilised digitised hematoxylin and eosin (H&E)-stained tissue specimens from two common types of solid tumours: breast and colorectal cancer. Tissue specimens and corresponding clinical data were retrieved from retrospective patient series collected in Finland. First, a DL-based algorithm was developed to extract prognostic information for patients diagnosed with colorectal cancer, using digitised H&E images only. Computational analysis of tumour tissue samples with DL demonstrated superhuman performance and surpassed a consensus of three expert pathologists in predicting five-year colorectal cancer-specific outcomes. Then, outcome prediction was studied in two independent breast cancer patient series. In particular, generalisation of the trained algorithms to previously unseen patients from an independent series was examined on large whole-slide tumour specimens. In breast cancer outcome prediction, we investigated a multitask learning approach that combines outcome-supervised and biomarker-supervised learning.
    Our experiments in breast and colorectal cancer show that tissue morphological features learned by DL models supervised by patient outcome provided prognostic information independent of established prognostic factors such as histological grade, tumour size and lymph node status. Additionally, the accuracy of DL-based predictors was compared to other prognostic characteristics evaluated by pathologists in breast cancer, including mitotic count, nuclear pleomorphism, tubule formation, tumour necrosis and tumour-infiltrating lymphocytes. We further assessed whether molecular biomarkers such as hormone receptor status and ERBB2 gene amplification can be predicted from H&E-stained tissue samples obtained at the time of diagnosis from patients with breast cancer, and showed that molecular alterations are reflected in the basic tissue morphology and can be captured with DL. Finally, we studied how morphological features of breast cancer can be linked to molecularly targeted treatment response. The results showed that ERBB2-associated morphology extracted with DL correlated with the efficacy of adjuvant anti-ERBB2 treatment and can contribute treatment-predictive information in breast cancer. Taken together, this thesis shows the potential utility of DL in tissue-based characterisation of cancer for prediction of cancer outcome, tumour molecular status and efficacy of molecularly targeted treatments. DL-based analysis of the basic tissue morphology can provide significant predictive information and be combined with clinicopathological and molecular data to improve the accuracy of cancer diagnostics.

    Bayesian Hierarchical Modelling for Image Processing Inverse Problems

    The main motivation of this work is to review and extend some recent ideas in Bayesian inverse problems, especially in the context of image processing. Often these problems are solved using deterministic optimisation algorithms. A Bayesian hierarchical model for total variation is presented in this thesis. This approach allows all the parameters of an inverse problem, including the “regularisation parameter”, to be estimated simultaneously from the data. The model is based on the characterisation of the Laplace density prior as a scale mixture of Gaussians. With different densities on the mixture variable, other total-variation-like regularisations are also obtained. An approximation of the resulting posterior mean is found using a variational Bayes method. In addition, algorithms for computing just the maximum a posteriori estimate, although not a fully Bayesian approach, are presented. The methods are illustrated with examples of image deblurring, image denoising and inpainting, the first of which is the main application of this thesis. Examples show that the methods generally work well for deblurring problems and that the parameters can be successfully estimated. Maximum a posteriori estimates preserve the edges of “blocky” images well. The results given by the variational Bayes method are smoother than the corresponding maximum a posteriori estimates, which makes the method more suitable for problems where preserving the edges is not the top priority. As future work, faster algorithms could be implemented and more complex and specialised models based on the ideas of this work could be considered.
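
For reference, the scale-mixture characterisation mentioned above can be written out in its standard form. The symbols are assumptions made here and the thesis may use a different parameterisation: $x$ is the quantity given the Laplace prior (in a total variation setting, presumably a difference between neighbouring pixels), $\lambda > 0$ is the Laplace rate, and $\tau$ is the mixture (variance) variable.

```latex
% Laplace density as a scale mixture of zero-mean Gaussians with an
% exponential mixing density on the variance \tau:
\frac{\lambda}{2}\, e^{-\lambda |x|}
  = \int_0^\infty
      \underbrace{\frac{1}{\sqrt{2\pi\tau}}\, e^{-x^2/(2\tau)}}_{\mathcal{N}(x \mid 0,\, \tau)}
      \;
      \underbrace{\frac{\lambda^2}{2}\, e^{-\lambda^2 \tau/2}}_{\text{Exponential}(\tau \mid \lambda^2/2)}
      \, d\tau

% Equivalent hierarchical form:
%   \tau \sim \text{Exponential}(\lambda^2/2), \qquad x \mid \tau \sim \mathcal{N}(0, \tau)
```

Conditional on $\tau$ the prior is Gaussian, which is what makes variational or alternating updates convenient. Replacing the exponential density on $\tau$ with another mixing density gives other priors of the same family, in line with the abstract's remark about different densities on the mixture variable.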