Gene Expression based Survival Prediction for Cancer Patients: A Topic Modeling Approach
Cancer is one of the leading causes of death worldwide. Many believe that
genomic data will enable us to better predict the survival time of these
patients, which will lead to better, more personalized treatment options and
patient care. As standard survival prediction models have a hard time coping
with the high-dimensionality of such gene expression (GE) data, many projects
use dimensionality reduction techniques to overcome this hurdle. We
introduce a novel methodology, inspired by topic modeling from the natural
language domain, to derive expressive features from the high-dimensional GE
data. There, a document is represented as a mixture over a relatively small
number of topics, where each topic corresponds to a distribution over the
words; here, to accommodate the heterogeneity of a patient's cancer, we
represent each patient (~document) as a mixture over cancer-topics, where each
cancer-topic is a mixture over GE values (~words). This required some
extensions to the standard LDA model, e.g., to accommodate the "real-valued"
expression values, leading to our novel "discretized" Latent Dirichlet
Allocation (dLDA) procedure. We initially focus on the METABRIC dataset, which
describes breast cancer patients using r=49,576 GE values from
microarrays. Our results show that our approach provides survival estimates
that are more accurate than standard models, in terms of the standard
Concordance measure. We then validate this approach by running it on the
Pan-kidney (KIPAN) dataset, over r=15,529 GE values - here using the mRNAseq
modality - and find that it again achieves excellent results. In both cases, we
also show that the resulting model is calibrated, using the recent
"D-calibrated" measure. These successes, in two different cancer types and
expression modalities, demonstrate the generality and the effectiveness of
this approach.
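
The Concordance measure mentioned above can be illustrated with a minimal sketch. This is the standard pairwise definition (Harrell's C) written from scratch, not the authors' evaluation code; the function name, argument names, and the choice to skip pairs with tied observed times are assumptions made for illustration.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Fraction of comparable patient pairs that the model orders correctly.

    times       -- observed time for each patient (event or censoring time)
    events      -- 1 if the event (death) was observed, 0 if censored
    risk_scores -- model output; a higher score should mean shorter survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Let `a` be the patient with the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        # Simplification (assumption): pairs with tied times are skipped.
        # A pair is only comparable if the earlier time is an observed event.
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0   # earlier death was given the higher risk
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5   # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to random ranking, and censored patients contribute only as the later member of a pair.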
Cross collection aspect based opinion mining using topic models
Aspect based opinion mining is the automated identification and extraction of sentiments associated with individual aspects in a text document. Over the years it has become a cornerstone for the analysis of public opinion on consumer products and socio-political events. The task is more fruitful, and likewise more challenging, when the goal is to compare opinions on aspects of multiple entities. Methods in the literature have attempted to extract aspects from a single collection, or collection by collection across multiple collections. These approaches do not scale when the number of collections is large, and hence suffer significant performance drawbacks. In this work we perform aspect based opinion mining across multiple contrasting collections simultaneously. We use existing cross collection topic models to identify topics that prevail across multiple collections, and we propose a topic refinement algorithm that converts these topics into semantically coherent and visually identifiable aspects. We compare the quality of the aspects extracted by our algorithm to the topics returned by two cross collection topic models. Finally, we evaluate the accuracy of sentiment scores when measured over features extracted by the two cross collection topic models. We conclude that, with the proposed improvements, cross collection topic models outperform state-of-the-art approaches in aspect based sentiment analysis.
Identification of biomarkers for the management of human prostate cancer
A critical problem in the clinical management of prostate cancer is that it shows high
intra- and inter-tumoural heterogeneity. As a result, accurate prediction of individual
cancer behaviour is not achievable at the time of diagnosis, leading to substantial
overtreatment. It remains an enigma that, in contrast to other cancers, no molecular
biomarkers which define robust subtypes of prostate cancer with distinct clinical
outcomes have been discovered.
In the first part of this study, using data from exon microarrays, we developed a novel
method that can identify transcriptional alterations within genes. The alterations might
be the result of chromosomal rearrangements, such as translocations and deletions, or
of other abnormalities, such as read-through transcription and alternative transcriptional
initiation sites. Using data from two independent datasets we identify several candidate
alterations that are consistently correlated with biochemical failure or that are linked
to the development of metastasis.
In the second part of the study we illustrate the application of an unsupervised
Bayesian procedure, which identifies a subtype of the disease in five prostate cancer
transcriptome datasets. Cancers assigned to this subtype (designated DESNT cancers)
are characterized by low expression of a core set of 45 genes. For the four datasets
with linked PSA failure data following prostatectomy, patients with DESNT cancer
exhibited poor outcome relative to other patients (p = 2.65 × 10⁻⁵, p = 4.28 × 10⁻⁵, p =
2.98 × 10⁻⁸ and p = 1.22 × 10⁻³). The DESNT cancers are not linked with the presence
of any particular class of genetic mutation, including ETS gene status. However,
methylation analysis reveals a possible role of epigenetic changes in the generation of
the DESNT subtype. Our results demonstrate the existence of a novel poor-prognosis
category of human prostate cancer and will assist in the targeting of therapy, helping
avoid treatment-associated morbidity in men with indolent disease.
Bayesian multi-topic microarray analysis with hyperparameter reestimation
This paper provides a new method for multi-topic Bayesian analysis of microarray data. Our method achieves a further maximization of the lower bound in marginalized variational Bayesian (MVB) inference for Latent Process Decomposition (LPD), which is an effective probabilistic model for microarray data. In our method, the hyperparameters of LPD are updated by empirical Bayes point estimation. Experiments on microarray data of realistically large size show the efficiency of our hyperparameter reestimation technique.
Advanced Data Mining and Applications: 5th International Conference, ADMA 2009, Beijing, China, August 17-19, 2009. Proceedings