The similarity of gene expression between human and mouse tissues
Meta-analysis of human and mouse microarray data reveals conservation of patterns of gene expression that will help to better characterize the evolution of gene expression.
Evaluation of several lightweight stochastic context-free grammars for RNA secondary structure prediction
BACKGROUND: RNA secondary structure prediction methods based on probabilistic modeling can be developed using stochastic context-free grammars (SCFGs). Such methods can readily combine different sources of information that can be expressed probabilistically, such as an evolutionary model of comparative RNA sequence analysis and a biophysical model of structure plausibility. However, the number of free parameters in an integrated model for consensus RNA structure prediction can become untenable if the underlying SCFG design is too complex. Thus, a key question is: which small, simple SCFG designs perform best for RNA secondary structure prediction? RESULTS: Nine different small SCFGs were implemented to explore the tradeoffs between model complexity and prediction accuracy. Each model was tested for single-sequence structure prediction accuracy on a benchmark set of RNA secondary structures. CONCLUSIONS: Four SCFG designs had prediction accuracies near the performance of current energy minimization programs. One of these designs, introduced by Knudsen and Hein in their PFOLD algorithm, has only 21 free parameters and is significantly simpler than the others.
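To make the grammar discussion concrete, the sketch below parses an RNA sequence with a small SCFG shaped like the Knudsen-Hein (PFOLD) design as it is usually written: S → LS | L, L → aFâ | a, F → aFâ | LS, where aFâ denotes a base pair enclosing F. It is a minimal illustration under stated assumptions: the rule and emission probabilities, the function name `fold`, and the choice of CYK (max-product) decoding are made up here for illustration and are not trained PFOLD parameters or the authors' code.

```python
import math

# Hypothetical rule probabilities for a Knudsen-Hein style grammar:
#   S -> L S | L        (a chain of structural elements)
#   L -> a F a' | a     (a base pair closing F, or an unpaired base)
#   F -> a F a' | L S   (a stacked pair, or the loop contents)
# Illustrative values only; these are NOT trained PFOLD parameters.
RULES = {"S->LS": 0.5, "S->L": 0.5, "L->pair": 0.3, "L->single": 0.7,
         "F->pair": 0.6, "F->LS": 0.4}
# Hypothetical pair emissions: 6 canonical/wobble pairs plus 10 other
# pairs at 0.02 each, so all 16 pair probabilities sum to 1.
PAIR = {"GC": 0.2, "CG": 0.2, "AU": 0.15, "UA": 0.15, "GU": 0.05, "UG": 0.05}
OTHER_PAIR, SINGLE = 0.02, 0.25
NEG_INF = float("-inf")


def fold(seq):
    """Most probable dot-bracket structure via CYK (max-product) parsing."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        return ""
    lr = {rule: math.log(p) for rule, p in RULES.items()}
    # best[nt][i][j]: best log score of nonterminal nt deriving seq[i..j]
    best = {nt: [[NEG_INF] * n for _ in range(n)] for nt in "SLF"}
    back = {nt: [[None] * n for _ in range(n)] for nt in "SLF"}

    def update(nt, i, j, score, choice):
        if score > best[nt][i][j]:
            best[nt][i][j], back[nt][i][j] = score, choice

    for span in range(1, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            # Log score of pairing seq[i] with seq[j] around an inner F.
            inner = NEG_INF
            if span >= 3:
                pair_p = PAIR.get(seq[i] + seq[j], OTHER_PAIR)
                inner = math.log(pair_p) + best["F"][i + 1][j - 1]
            # Fill F, then L, then S: each needs only shorter spans or
            # nonterminals already filled for this span.
            update("F", i, j, lr["F->pair"] + inner, ("pair",))
            for k in range(i, j):
                update("F", i, j, lr["F->LS"] + best["L"][i][k]
                       + best["S"][k + 1][j], ("split", k))
            if span == 1:
                update("L", i, j, lr["L->single"] + math.log(SINGLE), ("single",))
            update("L", i, j, lr["L->pair"] + inner, ("pair",))
            update("S", i, j, lr["S->L"] + best["L"][i][j], ("L",))
            for k in range(i, j):
                update("S", i, j, lr["S->LS"] + best["L"][i][k]
                       + best["S"][k + 1][j], ("split", k))

    out = ["."] * n

    def trace(nt, i, j):
        choice = back[nt][i][j]
        if choice[0] == "pair":
            out[i], out[j] = "(", ")"
            trace("F", i + 1, j - 1)
        elif choice[0] == "L":
            trace("L", i, j)
        elif choice[0] == "split":
            trace("L", i, choice[1])
            trace("S", choice[1] + 1, j)
        # "single": an unpaired base, nothing to mark

    trace("S", 0, n - 1)
    return "".join(out)
```

With these toy numbers the GC stem in `GGGAAACCC` outscores leaving the bases unpaired, so `fold("GGGAAACCC")` returns `(((...)))`. Because F cannot derive a single nucleotide, every hairpin loop in this grammar contains at least two bases.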
Survey of cryptic unstable transcripts in yeast
BACKGROUND: Cryptic unstable transcripts (CUTs) are a largely unexplored class of nuclear exosome-degraded, non-coding RNAs in budding yeast. It is highly debated whether CUT transcription has a functional role in the cell or whether CUTs represent noise in the yeast transcriptome. We sought to ascertain the extent of conserved CUT expression across a variety of Saccharomyces yeast strains to further understand and characterize the nature of CUT expression.
RESULTS: We sequenced the WT and rrp6Δ transcriptomes of three S. cerevisiae strains (S288c, Σ1278b, and JAY291) and the S. paradoxus strain N17, and used a hidden Markov model to annotate CUTs in these four strains. Using a four-way genomic alignment, we identified a large population of CUTs with conserved syntenic expression across all four strains. By identifying configurations of gene-CUT pairs, in which CUT expression originates from the gene's 5′ or 3′ nucleosome-free region, we observed distinct gene expression trends specific to these configurations, which were most prevalent in the presence of conserved CUT expression. Divergent pairs correlate with higher gene expression, and convergent pairs correlate with reduced gene expression.
CONCLUSIONS: Our RNA-seq-based method has greatly expanded upon previous CUT annotations in S. cerevisiae, underscoring the extensive and pervasive nature of unstable transcription. Furthermore, we provide the first assessment of conserved CUT expression in yeast and globally demonstrate possible modes of CUT-based regulation of gene expression.
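The CUT annotation in this study relies on a hidden Markov model. As a rough illustration only, the following sketch decodes a two-state HMM (background vs. transcribed) over binned read counts with the Viterbi algorithm. The Poisson emissions, the rates, the transition probabilities, and the function names are all assumptions made for this example; the abstract does not describe the study's actual model structure or parameters.

```python
import math

# Hypothetical two-state HMM over binned read counts along one strand:
# state 0 = background, state 1 = transcribed (e.g. a candidate CUT).
# Poisson means and transition probabilities are illustrative assumptions.
RATES = [0.5, 8.0]        # expected reads per bin in each state
STAY, SWITCH = 0.9, 0.1   # transition probabilities
START = [0.5, 0.5]        # initial state probabilities


def log_poisson(k, lam):
    """Log of the Poisson probability of observing k reads with mean lam."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def viterbi(counts):
    """Return the most likely state (0 or 1) for each bin, in log space."""
    if not counts:
        return []
    log_trans = [[math.log(STAY), math.log(SWITCH)],
                 [math.log(SWITCH), math.log(STAY)]]
    scores = [math.log(START[s]) + log_poisson(counts[0], RATES[s]) for s in (0, 1)]
    backptr = []  # backptr[t-1][s]: best previous state given state s at bin t
    for k in counts[1:]:
        step, new_scores = [], []
        for s in (0, 1):
            prev = max((0, 1), key=lambda p: scores[p] + log_trans[p][s])
            step.append(prev)
            new_scores.append(scores[prev] + log_trans[prev][s]
                              + log_poisson(k, RATES[s]))
        backptr.append(step)
        scores = new_scores
    # Trace back from the best final state.
    state = max((0, 1), key=lambda s: scores[s])
    path = [state]
    for step in reversed(backptr):
        state = step[state]
        path.append(state)
    return path[::-1]
```

For instance, `viterbi([0, 0, 0, 9, 10, 8, 0, 0])` labels the three high-count bins as transcribed, giving `[0, 0, 0, 1, 1, 1, 0, 0]`. Working in log space avoids numerical underflow on long runs of bins.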
Hierarchical Dirichlet Process-Based Models For Discovery of Cross-species Mammalian Gene Expression
An important research problem in computational biology is the identification of expression programs, sets of co-activated genes orchestrating physiological processes, and the characterization of the functional breadth of these programs. The use of mammalian expression data compendia for discovery of such programs presents several challenges, including: 1) cellular inhomogeneity within samples, 2) genetic and environmental variation across samples, and 3) uncertainty in the numbers of programs and sample populations. We developed GeneProgram, a new unsupervised computational framework that uses expression data to simultaneously organize genes into overlapping programs and tissues into groups to produce maps of inter-species expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Our method addresses each of the above challenges by using a probabilistic model that: 1) allocates mRNA to different expression programs that may be shared across tissues, 2) is hierarchical, treating each tissue as a sample from a population of related tissues, and 3) uses Dirichlet Processes, a non-parametric Bayesian method that provides prior distributions over numbers of sets while penalizing model complexity. Using real gene expression data, we show that GeneProgram outperforms several popular expression analysis methods in recovering biologically interpretable gene sets. From a large compendium of mouse and human expression data, GeneProgram discovers 19 tissue groups and 100 expression programs active in mammalian tissues. Our method automatically constructs a comprehensive, body-wide map of expression programs and characterizes their functional generality. This map can be used for guiding future biological experiments, such as discovery of genes for new drug targets that exhibit minimal "cross-talk" with unintended organs, or genes that maintain general physiological responses that go awry in disease states.
Further, our method is general and can be applied readily to novel compendia of biological data.
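GeneProgram's model is a Hierarchical Dirichlet Process. The hierarchy is beyond a short example, but the property the abstract relies on, a prior over the number of sets that does not fix that number in advance, can be seen in the Chinese restaurant process (CRP) view of a single Dirichlet process. The sketch below is an illustration under that simplification; `crp_assignments`, `expected_groups`, and any concentration value passed to them are hypothetical and are not GeneProgram's code.

```python
import random


def crp_assignments(n, alpha, rng):
    """Sample group labels for n items from a Chinese restaurant process.

    Item i joins existing group g with probability count[g] / (i + alpha)
    and opens a new group with probability alpha / (i + alpha).
    alpha must be positive.
    """
    counts, labels = [], []
    for i in range(n):
        r = rng.random() * (i + alpha)  # uniform on [0, i + alpha)
        cumulative = 0.0
        for g, c in enumerate(counts):
            cumulative += c
            if r < cumulative:
                break
        else:
            g = len(counts)  # r fell in the alpha-sized slice: new group
            counts.append(0)
        counts[g] += 1
        labels.append(g)
    return labels


def expected_groups(n, alpha):
    """Expected number of distinct groups after n items: sum of alpha/(alpha+i)."""
    return sum(alpha / (alpha + i) for i in range(n))
```

The expected number of groups grows only logarithmically with n: for α = 1 and n = 100 it equals the harmonic number H₁₀₀ ≈ 5.19. This rich-get-richer behavior is how a Dirichlet process prior favors a few well-populated groups over many small ones while still allowing new groups when the data demand them.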
Automated Discovery of Functional Generality of Human Gene Expression Programs
An important research problem in computational biology is the identification of expression programs, sets of co-expressed genes orchestrating normal or pathological processes, and the characterization of the functional breadth of these programs. The use of human expression data compendia for discovery of such programs presents several challenges including cellular inhomogeneity within samples, genetic and environmental variation across samples, uncertainty in the numbers of programs and sample populations, and temporal behavior. We developed GeneProgram, a new unsupervised computational framework based on Hierarchical Dirichlet Processes that addresses each of the above challenges. GeneProgram uses expression data to simultaneously organize tissues into groups and genes into overlapping programs with consistent temporal behavior, to produce maps of expression programs, which are sorted by generality scores that exploit the automatically learned groupings. Using synthetic and real gene expression data, we showed that GeneProgram outperformed several popular expression analysis methods. We applied GeneProgram to a compendium of 62 short time-series gene expression datasets exploring the responses of human cells to infectious agents and immune-modulating molecules. GeneProgram produced a map of 104 expression programs, a substantial number of which were significantly enriched for genes involved in key signaling pathways and/or bound by NF-κB transcription factors in genome-wide experiments. Further, GeneProgram discovered expression programs that appear to implicate surprising signaling pathways or receptor types in the response to infection, including Wnt signaling and neurotransmitter receptors. 
We believe the discovered map of expression programs involved in the response to infection will be useful for guiding future biological experiments; genes from programs with low generality scores might serve as new drug targets that exhibit minimal “cross-talk,” and genes from high-generality programs may maintain common physiological responses that go awry in disease states. Further, our method is multipurpose and can be applied readily to novel compendia of biological data.
Model based heritability scores for high-throughput sequencing data
Supplementary materials (PDF, 1370 KB).
The Δ40p53 isoform inhibits p53-dependent eRNA transcription and enables regulation by signal-specific transcription factors during p53 activation
The naturally occurring Δ40p53 isoform heterotetramerizes with wild-type p53 (WTp53) to regulate development, aging, and stress responses. How Δ40p53 alters WTp53 function remains enigmatic because their co-expression causes tetramer heterogeneity. We circumvented this issue with a well-tested strategy that expressed Δ40p53:WTp53 as a single transcript, ensuring a 2:2 tetramer stoichiometry. Human MCF10A cell lines expressing Δ40p53:WTp53, WTp53, or WTp53:WTp53 (as controls) from the native TP53 locus were examined with transcriptomics (precision nuclear run-on sequencing [PRO-seq] and RNA sequencing [RNA-seq]), metabolomics, and other methods. Δ40p53:WTp53 was transcriptionally active, and, although phenotypically similar to WTp53 under normal conditions, it failed to induce growth arrest upon Nutlin-induced p53 activation. This occurred via Δ40p53:WTp53-dependent inhibition of enhancer RNA (eRNA) transcription and subsequent failure to induce mRNA biogenesis, despite similar genomic occupancy to WTp53. A different stimulus (5-fluorouracil [5FU]) also showed Δ40p53:WTp53-specific changes in mRNA induction; however, other transcription factors (TFs; e.g., E2F2) could then drive the response, yielding similar outcomes vs. WTp53. Our results establish that Δ40p53 tempers WTp53 function to enable compensatory responses by other stimulus-specific TFs. Such modulation of WTp53 activity may be an essential physiological function for Δ40p53. Moreover, Δ40p53:WTp53 functional distinctions uncovered herein suggest an eRNA requirement for mRNA biogenesis and that human p53 evolved as a tetramer to support eRNA transcription.
Protocol variations in run-on transcription dataset preparation produce detectable signatures in sequencing libraries
Background
A variety of protocols exist for producing whole genome run-on transcription datasets. However, little is known about how differences between these protocols affect the signal within the resulting libraries.
Results
Using run-on transcription datasets generated from the same biological system, we show that a variety of GRO- and PRO-seq preparation methods leave identifiable signatures within each library. Specifically, we show that the library preparation method results in differences in quality control metrics, as well as differences in the signal distribution at the 5′ end of transcribed regions. These shifts lead to disparities in eRNA identification, but do not impact analyses aimed at inferring the key regulators involved in changes to transcription.
Conclusions
Run-on sequencing protocol variations result in technical signatures that can be used to identify both the enrichment and library preparation method of a particular data set. These technical signatures are batch effects that limit detailed comparisons of pausing ratios and eRNAs identified across protocols. However, these batch effects have only limited impact on our ability to infer which regulators underlie the observed transcriptional changes.
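Pausing ratios compare polymerase signal near the promoter with signal further into the gene, which is why protocol-dependent shifts in 5′-end signal can distort comparisons of them. A minimal sketch of one common form of the calculation follows; the per-base coverage input, the 300 bp default window, and the function name are assumptions, and published analyses differ in how they define both the promoter-proximal and gene-body windows.

```python
def pausing_ratio(coverage, proximal_len=300):
    """Promoter-proximal read density divided by gene-body read density.

    coverage: per-base read counts for one gene, oriented 5' -> 3' so that
    index 0 is the annotated TSS (an assumed input format).
    proximal_len: hypothetical promoter-proximal window size in bases.
    """
    if len(coverage) <= proximal_len:
        raise ValueError("gene must be longer than the promoter-proximal window")
    proximal = coverage[:proximal_len]
    body = coverage[proximal_len:]
    body_density = sum(body) / len(body)
    if body_density == 0:
        return float("inf")  # signal only at the promoter
    return (sum(proximal) / len(proximal)) / body_density
```

A gene with 10 reads per base over its first 100 bases and 1 read per base over the remaining 900 has a ratio of 10 when `proximal_len=100`. Any protocol that moves reads between these two windows changes the ratio even if the underlying biology is identical.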
Core transcriptional regulatory circuitry in human hepatocytes
We mapped the transcriptional regulatory circuitry for six master regulators in human hepatocytes using chromatin immunoprecipitation and high-resolution promoter microarrays. The results show that these regulators form a highly interconnected core circuitry, and reveal the local regulatory network motifs created by regulator–gene interactions. Autoregulation was a prominent theme among these regulators. We found that hepatocyte master regulators tend to bind promoter regions combinatorially and that the number of transcription factors bound to a promoter corresponds with observed gene expression. Our studies reveal portions of the core circuitry of human hepatocytes.
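Two of the analyses named in this abstract, detecting autoregulation and counting how many regulators bind each promoter, reduce to simple operations on a regulator-to-targets map. The sketch below assumes binding calls have already been made from the ChIP data; the regulator names, gene names, and binding sets are hypothetical placeholders, not results from this study.

```python
from collections import defaultdict

# Hypothetical binding calls: regulator -> set of promoters it binds.
# Placeholder names and sets, not data from the hepatocyte study.
BOUND = {
    "TF_A": {"TF_A", "TF_B", "GENE_1", "GENE_2"},
    "TF_B": {"TF_B", "GENE_1"},
    "TF_C": {"TF_A", "GENE_1", "GENE_2"},
}


def autoregulated(bound):
    """Regulators that bind their own promoter (the autoregulation motif)."""
    return sorted(tf for tf, targets in bound.items() if tf in targets)


def regulators_per_promoter(bound):
    """Number of distinct regulators bound at each promoter."""
    counts = defaultdict(int)
    for targets in bound.values():
        for gene in targets:
            counts[gene] += 1
    return dict(counts)
```

The per-promoter counts are the quantity one would then compare against expression levels to examine the relationship between combinatorial binding and gene expression that the abstract describes.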
- …