2,587 research outputs found
A tug-of-war between driver and passenger mutations in cancer and other adaptive processes
Cancer progression is an example of a rapid adaptive process where evolving
new traits is essential for survival and requires a high mutation rate.
Precancerous cells acquire a few key mutations that drive rapid population
growth and carcinogenesis. Cancer genomics demonstrates that these few 'driver'
mutations occur alongside thousands of random 'passenger' mutations, a natural
consequence of cancer's elevated mutation rate. Some passengers can be
deleterious to cancer cells, yet have been largely ignored in cancer research.
In population genetics, however, the accumulation of mildly deleterious
mutations has been shown to cause population meltdown. Here we develop a
stochastic population model where beneficial drivers engage in a tug-of-war
with frequent mildly deleterious passengers. These passengers present a barrier
to cancer progression that is described by a critical population size, below
which most lesions fail to progress, and a critical mutation rate, above which
cancers melt down. We find support for the model in cancer age-incidence and
cancer genomics data that also allow us to estimate the fitness advantage of
drivers and fitness costs of passengers. We identify two regimes of adaptive
evolutionary dynamics and use these regimes to rationalize successes and
failures of different treatment strategies. We find that a tumor's load of
deleterious passengers can explain previously paradoxical treatment outcomes
and suggest that it could potentially serve as a biomarker of response to
mutagenic therapies. The collective deleterious effect of passengers is currently
an unexploited therapeutic target. We discuss how their effects might be
exacerbated by both current and future therapies
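
The tug-of-war described in this abstract can be illustrated with a toy simulation. The following is a minimal sketch under stated assumptions, not the authors' model: it assumes a discrete-generation branching process in which a cell with d drivers and p passengers has fitness (1 + s_driver)^d / (1 + s_passenger)^p and divides with probability f / (1 + f). The function name `simulate_lesion`, all parameter names, and all default values are hypothetical.

```python
import random


def simulate_lesion(n0=50, generations=300, capacity=2000,
                    mu_driver=1e-3, mu_passenger=5e-2,
                    s_driver=0.1, s_passenger=0.005, seed=0):
    """Toy branching process (illustrative only, not the published model).

    Returns (outcome, sizes), where outcome is "extinct", "progressed",
    or "ongoing", and sizes lists the population size per generation.
    """
    rng = random.Random(seed)
    cells = [(0, 0)] * n0  # each cell is (driver count, passenger count)
    sizes = [n0]
    for _generation in range(generations):
        offspring = []
        for drivers, passengers in cells:
            # Assumed fitness form: drivers multiply it, passengers divide it.
            fitness = (1 + s_driver) ** drivers / (1 + s_passenger) ** passengers
            # A cell divides with probability f / (1 + f) and dies otherwise,
            # so a cell with fitness 1 leaves one offspring on average.
            if rng.random() < fitness / (1 + fitness):
                for _child in range(2):
                    # Each daughter may gain one new driver and one new passenger.
                    d = drivers + (rng.random() < mu_driver)
                    p = passengers + (rng.random() < mu_passenger)
                    offspring.append((d, p))
        cells = offspring
        sizes.append(len(cells))
        if not cells:
            return "extinct", sizes
        if len(cells) >= capacity:
            return "progressed", sizes
    return "ongoing", sizes
```

Calling `simulate_lesion(seed=k)` for many seeds `k` and counting the outcomes gives a rough feel for how the initial size `n0` and the passenger rate `mu_passenger` shift the balance between progression and extinction in this toy setting.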
On the gene expression landscape of cancer
A principal component analysis of the TCGA data for 15 cancer localizations
unveils the following qualitative facts about tumors: 1) The state of a tissue
in gene expression space may be described by a few variables. In particular,
there is a single variable describing the progression from a normal tissue to a
tumor. 2) Each cancer localization is characterized by a gene expression
profile, in which genes have specific weights in the definition of the cancer
state. There are no fewer than 2500 differentially expressed genes, which lead
to power-like tails in the expression distribution functions. 3) Tumors in
different localizations share hundreds or even thousands of differentially
expressed genes. There are 6 genes common to the 15 studied tumor
localizations. 4) The tumor region is a kind of attractor. Tumors in advanced
stages converge to this region independently of patient age or genetic
variability. 5) There is a landscape of cancer in gene expression space with an
approximate border separating normal tissues from tumors
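
The idea of a single variable describing progression from normal tissue to tumor corresponds, in a principal component analysis, to the score along the first principal component. The sketch below is a hedged illustration on toy data, not the analysis pipeline used for TCGA: it assumes rows are samples and columns are genes, and it finds the leading component by power iteration on the sample covariance matrix. The function name `first_principal_component` is hypothetical, and a real analysis would normally rely on a numerical library.

```python
import math


def first_principal_component(samples, iterations=200):
    """Return (direction, scores) for the leading principal component.

    samples: list of rows (one per sample), each a list of floats (one per gene).
    """
    n, dim = len(samples), len(samples[0])
    # Center each column (gene) on its mean.
    means = [sum(row[j] for row in samples) / n for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in samples]
    # Sample covariance matrix (dim x dim).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dim)]
           for i in range(dim)]
    # Deterministic, non-uniform start vector; power iteration can stall if
    # the start happens to be orthogonal to the leading eigenvector.
    v = [1.0 / (j + 1) for j in range(dim)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate data with zero variance
            break
        v = [x / norm for x in w]
    # Project each centered sample onto the leading direction.
    scores = [sum(c * x for c, x in zip(row, v)) for row in centered]
    return v, scores
```

In this toy setting, the entries of `direction` play the role of per-gene weights, and `scores` places each sample along the single leading axis.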
Lineage-based identification of cellular states and expression programs
We present a method, LineageProgram, that uses the developmental lineage relationship of observed gene expression measurements to improve the learning of developmentally relevant cellular states and expression programs. We find that incorporating lineage information allows us to significantly improve both the predictive power and interpretability of expression programs that are derived from expression measurements from in vitro differentiation experiments. The lineage tree of a differentiation experiment is a tree graph whose nodes describe all of the unique expression states in the input expression measurements, and whose edges describe the experimental perturbations applied to cells. Our method, LineageProgram, is based on a log-linear model with parameters that reflect changes along the lineage tree. L1-based regularization controls the parameters in three distinct ways: the number of genes that change between two cellular states, the number of unique cellular states, and the number of underlying factors responsible for changes in cell state. The model is estimated with proximal operators to quickly discover a small number of key cell states and gene sets. Comparisons with existing factorization techniques, such as singular value decomposition and non-negative matrix factorization, show that our method provides higher predictive power in held-out tests while inducing sparse and biologically relevant gene sets. National Institutes of Health (U.S.) (P01-NS055923); National Institutes of Health (U.S.) (1-UL1-RR024920
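
The abstract above mentions L1 regularization estimated with proximal operators. As a hedged illustration of that general technique only, and not of the LineageProgram model itself, the sketch below shows the standard proximal operator of the L1 norm (soft-thresholding) and a plain proximal-gradient loop for an ordinary lasso problem, min 0.5 * ||Xw - y||^2 + lam * ||w||_1. The function names and the lasso objective are assumptions chosen for illustration.

```python
def soft_threshold(values, threshold):
    """Proximal operator of threshold * ||x||_1, applied element-wise."""
    shrunk = []
    for v in values:
        if v > threshold:
            shrunk.append(v - threshold)
        elif v < -threshold:
            shrunk.append(v + threshold)
        else:
            shrunk.append(0.0)  # small entries are set exactly to zero
    return shrunk


def proximal_gradient_lasso(X, y, lam, step, iterations=500):
    """Minimize 0.5 * ||Xw - y||^2 + lam * ||w||_1 by proximal gradient.

    X: list of rows, y: list of targets. The step size should not exceed
    1 / L, where L is the largest eigenvalue of X^T X.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(iterations):
        # Gradient of the smooth least-squares term: X^T (Xw - y).
        residual = [sum(x * wi for x, wi in zip(row, w)) - yi
                    for row, yi in zip(X, y)]
        grad = [sum(row[j] * r for row, r in zip(X, residual))
                for j in range(n_features)]
        # Gradient step followed by the L1 proximal step.
        w = soft_threshold([wi - step * g for wi, g in zip(w, grad)],
                           step * lam)
    return w
```

The soft-thresholding step is what produces exact zeros, which is how L1 penalties yield the kind of sparse solutions the abstract describes.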
High-Dimensional Joint Estimation of Multiple Directed Gaussian Graphical Models
We consider the problem of jointly estimating multiple related directed
acyclic graph (DAG) models based on high-dimensional data from each graph. This
problem is motivated by the task of learning gene regulatory networks based on
gene expression data from different tissues, developmental stages or disease
states. We prove that under certain regularity conditions, the proposed
$\ell_0$-penalized maximum likelihood estimator converges in Frobenius norm to
the adjacency matrices consistent with the data-generating distributions and
has the correct sparsity. In particular, we show that this joint estimation
procedure leads to a faster convergence rate than estimating each DAG model
separately. As a corollary, we also obtain high-dimensional consistency results
for causal inference from a mix of observational and interventional data. For
practical purposes, we propose \emph{jointGES} consisting of Greedy Equivalence
Search (GES) to estimate the union of all DAG models followed by variable
selection using lasso to obtain the different DAGs, and we analyze its
consistency guarantees. The proposed method is illustrated through an analysis
of simulated data as well as epithelial ovarian cancer gene expression data
Integrated study of copy number states and genotype calls using high-density SNP arrays
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations most often observed in tumor tissues only. CNVs tend to be short and more sparsely located in the genome compared with CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being decided a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are performed for both tumor and normal tissues of one individual, the genotype calls from normal tissue are used to study CNAs in tumor tissue. We evaluated genoCN by applications to 162 HapMap individuals and a brain tumor (glioblastoma) dataset and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls
Optimization algorithms for inference and classification of genetic profiles from undersampled measurements
In this thesis, we tackle three different problems, all related to optimization techniques for inference and classification of genetic profiles. First, we extend the deterministic Non-negative Matrix Factorization (NMF) framework to the probabilistic case (PNMF). We apply the PNMF algorithm to cluster and classify DNA microarray data. The proposed PNMF is shown to outperform the deterministic NMF and the sparse NMF algorithms in clustering stability and classification accuracy. Second, we propose SMURC: Small-sample MUltivariate Regression with Covariance estimation. Specifically, we consider a high-dimension, low-sample-size multivariate regression problem that accounts for correlation of the response variables. We show that, in this case, the maximum likelihood approach is ill-posed because the likelihood diverges. We propose a normalization of the likelihood function that guarantees convergence. Simulation results show that SMURC outperforms the regularized likelihood estimator with known covariance matrix and the state-of-the-art sparse Conditional Graphical Gaussian Model (sCGGM). Third, we derive a new greedy algorithm that provides an exact sparse solution of the combinatorial $\ell_0$-optimization problem in exponentially less computation time. Unlike other greedy approaches, which are only approximations of the exact sparse solution, the proposed greedy approach, called Kernel reconstruction, leads to the exact optimal solution
25 years of epidermal stem cell research.
This is a chronicle of concepts in the field of epidermal stem cell biology and a historic look at their development over time. The past 25 years have seen the evolution of epidermal stem cell science, from first fundamental studies to a sophisticated science. The study of epithelial stem cell biology was aided by the ability to visualize the distribution of stem cells and their progeny through lineage analysis studies. The excellent progress we have made in understanding epidermal stem cell biology is discussed in this article. The challenges we still face in understanding epidermal stem cells include defining molecular markers for stem and progenitor sub-populations, determining the locations and contributions of the different stem cell niches, and mapping regulatory pathways of epidermal stem cell proliferation and differentiation. However, our rapidly evolving understanding of epidermal stem cells has many potential uses that promise to translate into improved patient therapy
Comparative DNA methylome analysis of endometrial carcinoma reveals complex and distinct deregulation of cancer promoters and enhancers
BACKGROUND: Aberrant DNA methylation is a hallmark of many cancers. Classically there are two types of endometrial cancer, endometrioid adenocarcinoma (EAC), or Type I, and uterine papillary serous carcinoma (UPSC), or Type II. However, the whole genome DNA methylation changes in these two classical types of endometrial cancer are still unknown. RESULTS: Here we describe complete genome-wide DNA methylome maps of EAC, UPSC, and normal endometrium by applying a combined strategy of methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylation-sensitive restriction enzyme digestion sequencing (MRE-seq). We discovered distinct genome-wide DNA methylation patterns in EAC and UPSC: 27,009 and 15,676 recurrent differentially methylated regions (DMRs) were identified, respectively, compared with normal endometrium. Over 80% of DMRs were in intergenic and intronic regions. The majority of these DMRs were not interrogated on the commonly used Infinium 450K array platform. Large-scale demethylation of chromosome X was detected in UPSC, accompanied by decreased XIST expression. Importantly, we discovered that the majority of the DMRs harbored promoter or enhancer functions and are specifically associated with genes related to uterine development and disease. Among these, abnormal methylation of transposable elements (TEs) may provide a novel mechanism to deregulate normal endometrium-specific enhancers derived from specific TEs. CONCLUSIONS: DNA methylation changes are an important signature of endometrial cancer and regulate gene expression by affecting not only proximal promoters but also distal enhancers. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1471-2164-15-868) contains supplementary material, which is available to authorized users