Maximum entropy models in the analysis of genome-wide data in cancer research
This thesis studies the maximum entropy principle in statistical modelling. Applications are taken from the emerging field of cancer genomics.
We start with a short introduction to the biology of cancer in chapter 1. In chapter 2, we discuss general principles of statistical modelling and, in detail, the maximum entropy principle. In particular, we show that many statistical models can be put in a unified framework based on the principle of maximum entropy, which maps them into problems of statistical mechanics. In chapter 3, we consider a particular maximum entropy model, the Ising model, in the context of the inverse Ising problem. We introduce a Bethe–Peierls approximation to the inverse Ising problem. We also suggest a modification that allows the mean-field approximation to work at low temperatures. The following chapters apply maximum entropy models to different problems of cancer genomics. A direct application of the inverse Ising problem to gene copy-number data of cancer cells is described in chapter 4. In chapter 5, we extend the concepts of indirect correlations and direct couplings of the inverse Ising problem to investigate the influence of gene copy-numbers on gene expression in cancer cells. We show that the correlations in gene expression need not be due to regulatory interactions between genes. Instead, correlations in gene expression of cancer cells can be induced by the correlations in their copy-numbers, which are due to the geometrical organisation of the genome. We show that a simple maximum entropy model can disentangle copy-number-induced correlations and the so-called “bare correlations” in gene expression, which capture the effect of regulatory interactions alone. Chapter 6 is devoted to cancer classification. We introduce a simple semi-supervised learning algorithm to train a mixture of paramagnetic models with Ising spins to classify cancer mutation profiles. We show that, with the capability of both learning from unlabelled samples and correcting mislabelled samples, this learning algorithm outperforms both the supervised and unsupervised learning algorithms. The two appendices A and B summarise recent studies on sensitivity and resistance of cancer cells to therapy.
The results of chapter 3 were published in H. C. Nguyen and J. Berg (2012a). “Bethe–Peierls approximation and the inverse Ising problem”. J. Stat. Mech. P03004; and H. C. Nguyen and J. Berg (2012b). “Mean-field theory for the inverse Ising problem at low temperatures”. Phys. Rev. Lett. 109, p. 50602. Some results of chapter 6 were published as a part of The Clinical Lung Cancer Genome Project (CLCGP) and Network Genomic Medicine (NGM) (2013). “A genomics-based classification of human lung tumors”. Science Transl. Med. 5.209, 209ra153.
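The distinction between indirect correlations and direct couplings mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard naive mean-field inversion, J_ij = -(C^-1)_ij for i ≠ j, which is not the Bethe–Peierls or low-temperature method of the thesis. The function name `naive_mean_field_couplings` and the three-spin toy chain are assumptions made for illustration only.

```python
import numpy as np

def naive_mean_field_couplings(samples):
    """Estimate Ising couplings from +/-1 samples of shape (n_samples, n_spins).

    Naive mean-field inversion (illustrative, not the thesis's method):
    J_ij = -(C^-1)_ij for i != j, where C is the connected correlation matrix.
    """
    s = np.asarray(samples, dtype=float)
    c = np.cov(s, rowvar=False)      # connected correlations C_ij
    j = -np.linalg.inv(c)            # off-diagonal entries are the couplings
    np.fill_diagonal(j, 0.0)         # self-couplings carry no meaning here
    return j

# Toy chain 0 -> 1 -> 2: each spin copies its predecessor with probability 0.9.
# Spins 0 and 2 are correlated only indirectly, through spin 1.
rng = np.random.default_rng(0)
n = 5000
s0 = rng.choice([-1, 1], size=n)
s1 = np.where(rng.random(n) < 0.1, -s0, s0)
s2 = np.where(rng.random(n) < 0.1, -s1, s1)
samples = np.column_stack([s0, s1, s2])

C = np.cov(samples, rowvar=False)
J = naive_mean_field_couplings(samples)
print("correlation C[0,2]:", round(C[0, 2], 2))
print("coupling    J[0,1]:", round(J[0, 1], 2))
print("coupling    J[0,2]:", round(J[0, 2], 2))
```

In this toy chain the correlation between spins 0 and 2 is sizeable, while the inferred coupling between them should be close to zero: the inversion attributes the correlation to the path through spin 1.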
Combined population dynamics and entropy modelling supports patient stratification in chronic myeloid leukemia
Modelling the parameters of multistep carcinogenesis is key for a better understanding of cancer
progression, biomarker identification and the design of individualized therapies. Using chronic
myeloid leukemia (CML) as a paradigm for hierarchical disease evolution we show that combined
population dynamic modelling and CML patient biopsy genomic analysis enables patient stratification
at unprecedented resolution. Linking CD34+ similarity as a disease progression marker to patient-derived
gene expression entropy separated established CML progression stages and uncovered
additional heterogeneity within disease stages. Importantly, our patient-data-informed model enables
quantitative approximation of individual patients’ disease history within chronic phase (CP) and
significantly separates “early” from “late” CP. Our findings provide a novel rationale for personalized
and genome-informed disease progression risk assessment that is independent of and complementary to
conventional measures of CML disease burden and prognosis.
Interactions between species introduce spurious associations in microbiome studies
Microbiota contribute to many dimensions of host phenotype, including
disease. To link specific microbes to specific phenotypes, microbiome-wide
association studies compare microbial abundances between two groups of samples.
Abundance differences, however, reflect not only direct associations with the
phenotype, but also indirect effects due to microbial interactions. We found
that microbial interactions could easily generate a large number of spurious
associations that provide no mechanistic insight. Using techniques from
statistical physics, we developed a method to remove indirect associations and
applied it to the largest dataset on pediatric inflammatory bowel disease. Our
method corrected the inflation of p-values in standard association tests and
showed that only a small subset of associations is directly linked to the
disease. Direct associations had a much higher accuracy in separating cases
from controls and pointed to immunomodulation, butyrate production, and the
brain-gut axis as important factors in the inflammatory bowel disease.
On dynamic network entropy in cancer
The cellular phenotype is described by a complex network of molecular
interactions. Elucidating network properties that distinguish disease from the
healthy cellular state is therefore of critical importance for gaining
systems-level insights into disease mechanisms and ultimately for developing
improved therapies. By integrating gene expression data with a protein
interaction network to induce a stochastic dynamics on the network, we here
demonstrate that cancer cells are characterised by an increase in the dynamic
network entropy, compared to cells of normal physiology. Using a fundamental
relation between the macroscopic resilience of a dynamical system and the
uncertainty (entropy) in the underlying microscopic processes, we argue that
cancer cells will be more robust to random gene perturbations. In addition, we
formally demonstrate that gene expression differences between normal and cancer
tissue are anticorrelated with local dynamic entropy changes, thus providing a
systemic link between gene expression changes at the nodes and their local
network dynamics. In particular, we also find that genes which drive
cell-proliferation in cancer cells and which often encode oncogenes are
associated with reductions in the dynamic network entropy. In summary, our
results support the view that the observed increased robustness of cancer cells
to perturbation and therapy may be due to an increase in the dynamic network
entropy that allows cells to adapt to the new cellular stresses. Conversely,
genes that exhibit local flux entropy decreases in cancer may render cancer
cells more susceptible to targeted intervention and may therefore represent
promising drug targets.
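As a rough illustration of how expression data on an interaction network can induce a stochastic dynamics with a local entropy, the sketch below assumes one simple construction: the probability of stepping from a node to a neighbour is proportional to that neighbour's expression level. This weighting, the function name `local_entropies`, and the toy four-node network are assumptions for illustration and may differ from the construction used in the paper.

```python
from math import log

def local_entropies(adjacency, expression):
    """Normalised local entropy of an expression-weighted random walk.

    adjacency: dict mapping node -> list of neighbouring nodes
    expression: dict mapping node -> positive expression level
    Assumed weighting (hypothetical): p_ij proportional to expression[j].
    """
    entropies = {}
    for node, neighbours in adjacency.items():
        weights = [expression[n] for n in neighbours]
        total = sum(weights)
        probs = [w / total for w in weights]
        s = -sum(p * log(p) for p in probs if p > 0)
        # Divide by log(degree), the maximum possible value, to land in [0, 1].
        entropies[node] = s / log(len(neighbours)) if len(neighbours) > 1 else 0.0
    return entropies

# Toy star network: hub "A" interacts with "B", "C" and "D".
adjacency = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}
uniform = {"A": 1.0, "B": 1.0, "C": 1.0, "D": 1.0}
skewed = {"A": 1.0, "B": 10.0, "C": 1.0, "D": 1.0}

s_uniform = local_entropies(adjacency, uniform)["A"]
s_skewed = local_entropies(adjacency, skewed)["A"]
print(s_uniform, s_skewed)
```

With equal expression among the hub's neighbours the local entropy is maximal; when one neighbour dominates, the walk becomes more predictable and the local entropy drops.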
Intra-tumour signalling entropy determines clinical outcome in breast and lung cancer.
The cancer stem cell hypothesis, that a small population of tumour cells is responsible for tumorigenesis and cancer progression, is becoming widely accepted and recent evidence has suggested a prognostic and predictive role for such cells. Intra-tumour heterogeneity, the diversity of the cancer cell population within the tumour of an individual patient, is related to cancer stem cells and is also considered a potential prognostic indicator in oncology. The measurement of cancer stem cell abundance and intra-tumour heterogeneity in a clinically relevant manner, however, currently presents a challenge. Here we propose signalling entropy, a measure of signalling pathway promiscuity derived from a sample's genome-wide gene expression profile, as an estimate of the stemness of a tumour sample. By considering over 500 mixtures of diverse cellular expression profiles, we reveal that signalling entropy also associates with intra-tumour heterogeneity. By analysing 3668 breast cancer and 1692 lung adenocarcinoma samples, we further demonstrate that signalling entropy correlates negatively with survival, outperforming leading clinical gene-expression-based prognostic tools. Signalling entropy is found to be a general prognostic measure, valid in different breast cancer clinical subgroups, as well as within stage I lung adenocarcinoma. We find that its prognostic power is driven by genes involved in cancer stem cells and treatment resistance. In summary, by approximating both stemness and intra-tumour heterogeneity, signalling entropy provides a powerful prognostic measure across different epithelial cancers.
Mining Pure, Strict Epistatic Interactions from High-Dimensional Datasets: Ameliorating the Curse of Dimensionality
Background: The interaction between loci to affect phenotype is called epistasis. It is strict epistasis if no proper subset of the interacting loci exhibits a marginal effect. For many diseases, it is likely that unknown epistatic interactions affect disease susceptibility. A difficulty when mining epistatic interactions from high-dimensional datasets concerns the curse of dimensionality. There are too many combinations of SNPs to perform an exhaustive search. A method that could locate strict epistasis without an exhaustive search can be considered the brass ring of methods for analyzing high-dimensional datasets. Methodology/Findings: A SNP pattern is a Bayesian network representing SNP-disease relationships. The Bayesian score for a SNP pattern is the probability of the data given the pattern, and has been used to learn SNP patterns. We identified a bound for the score of a SNP pattern. The bound provides an upper limit on the Bayesian score of any pattern that could be obtained by expanding a given pattern. We felt that the bound might enable the data to say something about the promise of expanding a 1-SNP pattern even when there are no marginal effects. We tested the bound using simulated datasets and semi-synthetic high-dimensional datasets obtained from GWAS datasets. We found that the bound was able to dramatically reduce the search time for strict epistasis. Using an Alzheimer's dataset, we showed that it is possible to discover an interaction involving the APOE gene based on its score because of its large marginal effect, but that the bound is most effective at discovering interactions without marginal effects. Conclusions/Significance: We conclude that the bound appears to ameliorate the curse of dimensionality in high-dimensional datasets. This is a very consequential result and could be pivotal in our efforts to reveal the dark matter of genetic disease risk from high-dimensional datasets. © 2012 Jiang, Neapolitan
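The notion of strict epistasis and of scoring a SNP pattern by the probability of the data can be sketched on a toy example. The code below uses the K2 (uniform Dirichlet) marginal likelihood as a stand-in Bayesian score, which is an assumption and may not be the exact score used by the authors. It does not implement the paper's bound. The function name `log_k2_score` and the XOR dataset are hypothetical.

```python
from math import lgamma

def log_k2_score(parent_columns, disease):
    """Log marginal likelihood of a binary disease column given SNP parents.

    K2 score (assumed here for illustration): for each parent configuration j
    with class counts N_j0, N_j1 and N_j = N_j0 + N_j1, add
    log[(r-1)! / (N_j + r - 1)!] + sum_k log(N_jk!), with r = 2 classes.
    """
    r = 2
    if parent_columns:
        configs = list(zip(*parent_columns))
    else:
        configs = [()] * len(disease)   # no parents: a single configuration
    counts = {}
    for config, d in zip(configs, disease):
        counts.setdefault(config, [0, 0])[d] += 1
    score = 0.0
    for n0, n1 in counts.values():
        score += lgamma(r) - lgamma(n0 + n1 + r)
        score += lgamma(n0 + 1) + lgamma(n1 + 1)
    return score

# Toy strict epistasis: disease status is the XOR of two binary SNP indicators,
# so neither SNP shows a marginal effect on its own.
snp_a = [0, 0, 1, 1] * 25
snp_b = [0, 1, 0, 1] * 25
disease = [a ^ b for a, b in zip(snp_a, snp_b)]

score_none = log_k2_score([], disease)
score_a = log_k2_score([snp_a], disease)
score_ab = log_k2_score([snp_a, snp_b], disease)
print(score_none, score_a, score_ab)
```

In this toy dataset the 1-SNP pattern scores no better than the empty pattern, while the 2-SNP pattern scores far higher, which is why a search guided only by marginal effects would miss the interaction.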