5 research outputs found

    Gene Expression based Survival Prediction for Cancer Patients: A Topic Modeling Approach

    Cancer is one of the leading causes of death worldwide. Many believe that genomic data will enable us to better predict the survival time of these patients, which will lead to better, more personalized treatment options and patient care. As standard survival prediction models have a hard time coping with the high dimensionality of gene expression (GE) data, many projects use dimensionality reduction techniques to overcome this hurdle. We introduce a novel methodology, inspired by topic modeling from the natural language domain, to derive expressive features from the high-dimensional GE data. There, a document is represented as a mixture over a relatively small number of topics, where each topic corresponds to a distribution over the words; here, to accommodate the heterogeneity of a patient's cancer, we represent each patient (~document) as a mixture over cancer-topics, where each cancer-topic is a mixture over GE values (~words). This required some extensions to the standard LDA model, e.g., to accommodate the "real-valued" expression values, leading to our novel "discretized" Latent Dirichlet Allocation (dLDA) procedure. We initially focus on the METABRIC dataset, which describes breast cancer patients using r = 49,576 GE values from microarrays. Our results show that our approach provides survival estimates that are more accurate than those of standard models, in terms of the standard Concordance measure. We then validate this approach by running it on the Pan-kidney (KIPAN) dataset, over r = 15,529 GE values (here using the mRNAseq modality), and find that it again achieves excellent results. In both cases, we also show that the resulting model is calibrated, using the recent "D-calibrated" measure. These successes, in two different cancer types and expression modalities, demonstrate the generality and the effectiveness of this approach.
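
    The abstract above evaluates survival estimates with the standard Concordance measure. As a rough illustration only, the sketch below computes Harrell's concordance index for right-censored data. The function name `concordance_index`, its arguments, and the choice to skip pairs with tied times are assumptions made for this example; none of it is taken from the paper's own code.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Harrell's C: share of comparable patient pairs that are ordered correctly.

    times  -- observed survival or censoring times
    events -- 1 if death was observed, 0 if the patient was censored
    risks  -- predicted risk scores (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that `a` is the patient with the shorter time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is comparable only if the shorter time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risks[a] > risks[b]:
            concordant += 1.0   # higher risk for the earlier death: correct
        elif risks[a] == risks[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: four patients, the third one censored at time 6.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.3, 0.1])
reversed_ = concordance_index(times, events, [0.1, 0.3, 0.7, 0.9])
constant = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

    A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and 0.0 means every pair is ranked in reverse.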

    Cross collection aspect based opinion mining using topic models

    Aspect based opinion mining is the automated science of identifying and extracting sentiments associated with individual aspects in a text document. Over the years this science has emerged as a cornerstone for the analysis of public opinion on consumer products and socio-political events. The task is more fruitful, and likewise more challenging, when comparing opinions on aspects of multiple entities is of the essence. Different methods in the literature have attempted to extract aspects in a single collection, or collection by collection across multiple collections. These approaches become unattractive when the number of collections is large, where they suffer significant performance drawbacks. In this work we perform aspect based opinion mining across multiple contrasting collections simultaneously. We utilize existing cross collection topic models to identify topics that prevail across multiple collections, and we propose a topic refinement algorithm that successfully converts these topics into semantically coherent and visually identifiable aspects. We compare the quality of aspects extracted by our algorithm to topics returned by two cross collection topic models. Finally, we evaluate the accuracy of sentiment scores when measured over features extracted by the two cross collection topic models. We conclude that, with the proposed improvements, cross collection topic models outperform state-of-the-art approaches in aspect based sentiment analysis.

    Latent Markovian Modelling and Clustering for Continuous Data Sequences


    Identification of biomarkers for the management of human prostate cancer

    A critical problem in the clinical management of prostate cancer is that it shows high intra- and inter-tumoural heterogeneity. As a result, accurate prediction of individual cancer behaviour is not achievable at the time of diagnosis, leading to substantial overtreatment. It remains an enigma that, in contrast to other cancers, no molecular biomarkers that define robust subtypes of prostate cancer with distinct clinical outcomes have been discovered. In the first part of this study, using data from exon microarrays, we developed a novel method that can identify transcriptional alterations within genes. The alterations might be the result of chromosomal rearrangements, such as translocations and deletions, or of other abnormalities, such as read-through transcription and alternative transcriptional initiation sites. Using data from two independent datasets, we identify several candidate alterations that are consistently correlated with biochemical failure or that are linked to the development of metastasis. In the second part of the study we illustrate the application of an unsupervised Bayesian procedure, which identifies a subtype of the disease in five prostate cancer transcriptome datasets. Cancers assigned to this subtype (designated DESNT cancers) are characterized by low expression of a core set of 45 genes. For the four datasets with linked PSA failure data following prostatectomy, patients with DESNT cancer exhibited poor outcome relative to other patients (p = 2.65 × 10^-5, p = 4.28 × 10^-5, p = 2.98 × 10^-8 and p = 1.22 × 10^-3). The DESNT cancers are not linked with the presence of any particular class of genetic mutation, including ETS gene status. However, the methylation analysis reveals a possible role of epigenetic changes in the generation of the DESNT subtype.
    Our results demonstrate the existence of a novel poor-prognosis category of human prostate cancer and will assist in the targeting of therapy, helping to avoid treatment-associated morbidity in men with indolent disease.

    Bayesian multi-topic microarray analysis with hyperparameter reestimation

    This paper provides a new method for multi-topic Bayesian analysis of microarray data. Our method achieves a further maximization of lower bounds in marginalized variational Bayesian inference (MVB) for Latent Process Decomposition (LPD), which is an effective probabilistic model for microarray data. In our method, hyperparameters in LPD are updated by empirical Bayes point estimation. Experiments on microarray data of realistically large size show the efficiency of our hyperparameter reestimation technique.
    Advanced Data Mining and Applications: 5th International Conference, ADMA 2009, Beijing, China, August 17-19, 2009. Proceedings