Diverse correlation structures in gene expression data and their utility in improving statistical inference
It is well known that correlations in microarray data represent a serious
nuisance deteriorating the performance of gene selection procedures. This paper
is intended to demonstrate that the correlation structure of microarray data
provides a rich source of useful information. We discuss distinct correlation
substructures revealed in microarray gene expression data by an appropriate
ordering of genes. These substructures include stochastic proportionality of
expression signals in a large percentage of all gene pairs, negative
correlations hidden in ordered gene triples, and a long sequence of weakly
dependent random variables associated with ordered pairs of genes. The reported
striking regularities are of general biological interest and they also have
far-reaching implications for theory and practice of statistical methods of
microarray data analysis. We illustrate the latter point with a method for
testing differential expression of nonoverlapping gene pairs. While designed
for testing a different null hypothesis, this method provides an order of
magnitude more accurate control of type 1 error rate compared to conventional
methods of individual gene expression profiling. In addition, this method is
robust to the technical noise. Quantitative inference of the correlation
structure has the potential to extend the analysis of microarray data far
beyond currently practiced methods.
Comment: Published at http://dx.doi.org/10.1214/07-AOAS120 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
DNMT inhibitors reverse a specific signature of aberrant promoter DNA methylation and associated gene silencing in AML
Background:
Myelodysplastic syndrome (MDS) and acute myeloid leukemia (AML) are neoplastic disorders of hematopoietic stem cells. DNA methyltransferase inhibitors (DNMTi), 5-azacytidine (AzaC) and 5-aza-2’-deoxycytidine (Decitabine), benefit some MDS/AML patients. However, the role of DNMTi-induced DNA hypomethylation in regulation of gene expression in AML is unclear.
Results:
We compared the effects of AzaC on DNA methylation and gene expression using whole-genome single-nucleotide bisulfite-sequencing (WGBS) and RNA-sequencing in OCI-AML3 (AML3) cells. For data analysis, we used an approach recently developed for discovery of differential patterns of DNA methylation associated with changes in gene expression and tailored to single-nucleotide bisulfite-sequencing data (Washington University Interpolated Methylation Signatures (WIMSi)). By this approach, a subset of genes upregulated by AzaC was found to be characterized by AzaC-induced signature methylation loss flanking the transcription start site. These genes are enriched for genes increased in methylation and decreased in expression in AML3 cells compared to normal hematopoietic stem and progenitor cells. Moreover, these genes are preferentially upregulated by Decitabine in human primary AML blasts, and control cell proliferation, death and development.
Conclusions:
Our WGBS and WIMSi data analysis approach has identified a set of genes whose methylation and silencing in AML is reversed by DNMTi. These genes are good candidates for direct regulation by DNMTi, and their reactivation by DNMTi may contribute to therapeutic activity. This study also demonstrates the ability of WIMSi to reveal relationships between DNA methylation and gene expression, based on single-nucleotide bisulfite-sequencing and RNA-seq data.
Insights into the regulation of intrinsically disordered proteins in the human proteome by analyzing sequence and gene expression data
Background:
Disordered proteins need to be expressed to carry out specified functions; however, their accumulation in the cell can potentially cause major problems through protein misfolding and aggregation. Gene expression levels, mRNA decay rates, microRNA (miRNA) targeting and ubiquitination have critical roles in the degradation and disposal of human proteins and transcripts. Here, we describe a study examining these features to gain insights into the regulation of disordered proteins.
Results:
In comparison with ordered proteins, disordered proteins have a greater proportion of predicted ubiquitination sites. The transcripts encoding disordered proteins also have higher proportions of predicted miRNA target sites and higher mRNA decay rates, both of which are indicative of the observed lower gene expression levels. The results suggest that the disordered proteins and their transcripts are present in the cell at low levels and/or for a short time before being targeted for disposal. Surprisingly, we find that for a significant proportion of highly disordered proteins, all four of these trends are reversed. Predicted estimates for miRNA targets, ubiquitination and mRNA decay rate are low in the highly disordered proteins that are constitutively and/or highly expressed.
Conclusions:
Mechanisms are in place to protect the cell from these potentially dangerous proteins. The evidence suggests that the enrichment of signals for miRNA targeting and ubiquitination may help prevent the accumulation of disordered proteins in the cell. Our data also provide evidence for a mechanism by which a significant proportion of highly disordered proteins (with high expression levels) can escape rapid degradation to allow them to successfully carry out their function.
Sparse integrative clustering of multiple omics data sets
High resolution microarrays and second-generation sequencing platforms are
powerful tools to investigate genome-wide alterations in DNA copy number,
methylation and gene expression associated with a disease. An integrated
genomic profiling approach measures multiple omics data types simultaneously in
the same set of biological samples. Such an approach renders an integrated data
resolution that would not be available with any single data type. In this
study, we use penalized latent variable regression methods for joint modeling
of multiple omics data types to identify common latent variables that can be
used to cluster patient samples into biologically and clinically relevant
disease subtypes. We consider lasso [J. Roy. Statist. Soc. Ser. B 58 (1996)
267-288], elastic net [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005)
301-320] and fused lasso [J. R. Stat. Soc. Ser. B Stat. Methodol. 67 (2005)
91-108] methods to induce sparsity in the coefficient vectors, revealing
important genomic features that have significant contributions to the latent
variables. An iterative ridge regression is used to compute the sparse
coefficient vectors. In model selection, a uniform design [Monographs on
Statistics and Applied Probability (1994) Chapman & Hall] is used to seek
"experimental" points that are scattered uniformly across the search domain for
efficient sampling of tuning parameter combinations. We compared our method to
sparse singular value decomposition (SVD) and penalized Gaussian mixture model
(GMM) using both real and simulated data sets. The proposed method is applied
to integrate genomic, epigenomic and transcriptomic data for subtype analysis
in breast and lung cancer data sets.
Comment: Published at http://dx.doi.org/10.1214/12-AOAS578 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
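The abstract above mentions computing sparse coefficient vectors by iterative ridge regression but gives no algorithmic detail. One common way to do this, assumed here for illustration only, is a local quadratic approximation of the lasso penalty: each |b_j| is replaced by b_j^2 / (2|b_j_old|), so every step becomes a ridge solve with per-coefficient weights. The function name, defaults and thresholds below are hypothetical and are not taken from the paper.

```python
import numpy as np


def lasso_by_iterative_ridge(X, y, lam, n_iter=200, eps=1e-8, tol=1e-10):
    """Approximately minimize 0.5 * ||y - X b||^2 + lam * ||b||_1.

    Illustrative sketch: each iteration solves a ridge problem whose
    per-coefficient penalty is lam / |b_j| from the previous iterate
    (local quadratic approximation of the absolute value).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    XtX = X.T @ X
    Xty = X.T @ y

    # Start from an ordinary ridge solution so that no weight is infinite.
    beta = np.linalg.solve(XtX + lam * np.eye(X.shape[1]), Xty)

    for _ in range(n_iter):
        # eps keeps the weights finite as coefficients shrink toward zero.
        weights = lam / (np.abs(beta) + eps)
        new_beta = np.linalg.solve(XtX + np.diag(weights), Xty)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break

    # Coefficients only approach zero asymptotically, so snap tiny ones to 0.
    beta[np.abs(beta) < 1e-6] = 0.0
    return beta
```

A simple sanity check is an orthonormal design, where the lasso solution is known to be soft thresholding of X'y; with an identity design, y = (3, -2, 0.1) and lam = 1, the sketch should approach (2, -1, 0).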
An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling
We present an applied study in cancer genomics for integrating data and
inferences from laboratory experiments on cancer cell lines with observational
data obtained from human breast cancer studies. The biological focus is on
improving understanding of transcriptional responses of tumors to changes in
the pH level of the cellular microenvironment. The statistical focus is on
connecting experimentally defined biomarkers of such responses to clinical
outcome in observational studies of breast cancer patients. Our analysis
exemplifies a general strategy for accomplishing this kind of integration
across contexts. The statistical methodologies employed here draw heavily on
Bayesian sparse factor models for identifying, modularizing and correlating
with clinical outcome these signatures of aggregate changes in gene expression.
By projecting patterns of biological response linked to specific experimental
interventions into observational studies where such responses may be evidenced
via variation in gene expression across samples, we are able to define
biomarkers of clinically relevant physiological states and outcomes that are
rooted in the biology of the original experiment. Through this approach we
identify microenvironment-related prognostic factors capable of predicting long
term survival in two independent breast cancer datasets. These results suggest
possible directions for future laboratory studies, as well as indicate the
potential for therapeutic advances through targeted disruption of specific
pathway components.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS261 in the Annals of
Applied Statistics (http://www.imstat.org/aoas/) by the Institute of
Mathematical Statistics (http://www.imstat.org)
Mapping Dynamic Histone Acetylation Patterns to Gene Expression in Nanog-depleted Murine Embryonic Stem Cells
Embryonic stem cells (ESC) have the potential to self-renew indefinitely and
to differentiate into any of the three germ layers. The molecular mechanisms
for self-renewal, maintenance of pluripotency and lineage specification are
poorly understood, but recent results point to a key role for epigenetic
mechanisms. In this study, we focus on quantifying the impact of histone 3
acetylation (H3K9,14ac) on gene expression in murine embryonic stem cells. We
analyze genome-wide histone acetylation patterns and gene expression profiles
measured over the first five days of cell differentiation triggered by
silencing Nanog, a key transcription factor in ESC regulation. We explore the
temporal and spatial dynamics of histone acetylation data and its correlation
with gene expression using supervised and unsupervised statistical models. On a
genome-wide scale, changes in acetylation are significantly correlated to
changes in mRNA expression and, surprisingly, this coherence increases over
time. We quantify the predictive power of histone acetylation for gene
expression changes in a balanced cross-validation procedure. In an in-depth
study we focus on genes central to the regulatory network of mouse ESC,
including those identified in a recent genome-wide RNAi screen and in the
PluriNet, a computationally derived stem cell signature. We find that compared
to the rest of the genome, ESC-specific genes show significantly more
acetylation signal and a much stronger decrease in acetylation over time, which
is often not reflected in a concordant expression change. These results shed
light on the complexity of the relationship between histone acetylation and
gene expression and are a step toward dissecting the multilayer regulatory
mechanisms that determine stem cell fate.
Comment: accepted at PLoS Computational Biology
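The genome-wide comparison described in the abstract above, correlating changes in acetylation with changes in mRNA expression at each time point, can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes two hypothetical gene-by-time matrices on a log scale with the first column as the baseline, and it uses plain Pearson correlation.

```python
import numpy as np


def change_correlation_over_time(acetylation, expression):
    """Pearson correlation between acetylation and expression changes.

    Both inputs are (n_genes, n_timepoints) arrays on a log scale, with
    column 0 taken as the baseline (an assumption of this sketch).
    Returns one correlation per non-baseline time point.
    """
    acetylation = np.asarray(acetylation, dtype=float)
    expression = np.asarray(expression, dtype=float)

    # Change of every gene relative to its own baseline value.
    d_ac = acetylation[:, 1:] - acetylation[:, [0]]
    d_ex = expression[:, 1:] - expression[:, [0]]

    return np.array([
        np.corrcoef(d_ac[:, t], d_ex[:, t])[0, 1]
        for t in range(d_ac.shape[1])
    ])
```

Plotting the returned values against time would show whether the coherence between the two signals rises, which is the kind of trend the abstract reports.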