96 research outputs found
Diversity of core promoter elements comprising human bidirectional promoters
<p>Abstract</p> <p>Background</p> <p>Bidirectional promoters lie between adjacent genes, which are transcribed from opposite strands of DNA. The functional mechanisms underlying the activation of bidirectional promoters are currently uncharacterised. To define the core promoter elements of bidirectional promoters in human, we mapped motifs for TATA, INR, BRE, DPE, INR, as well as CpG-islands.</p> <p>Results</p> <p>We found a consistently high correspondence between C+G content, CpG-island presence and an average expression level increasing the median level for all genes in bidirectional promoters. These CpG-rich promoters showed discrete initiation patterns rather than broad regions of transcription initiation, as are typically seen for CpG-island promoters. CpG-islands encompass both TSSs within bidirectional promoters, providing an explanation for the symmetrical co-expression patterns of many of these genes. In contrast, TATA motifs appear to be asymmetrically positioned at one TSS or the other.</p> <p>Conclusion</p> <p>Our findings demonstrate that bidirectional promoters utilize a variety of core promoter elements to initiate transcription. CpG-islands dominate the regulatory landscape of this group of promoters.</p
Prediction-based approaches to characterize bidirectional promoters in the mammalian genome
<p>Abstract</p> <p>Background</p> <p>Machine learning approaches are emerging as a way to discriminate various classes of functional elements. Previous attempts to create Regulatory Potential (RP) scores to discriminate functional DNA from nonfunctional DNA included using Markov models trained to identify sequences from promoters and enhancers from ancestral repeats. We proposed that knowledge gleaned from those methods could be further refined using a multiple class predictor to separate classes of promoter elements from enhancers or nonfunctional DNA.</p> <p>Results</p> <p>We extended our previous work, which identified over 5,000 candidate bidirectional promoters in the human genome, to map the orthologous promoter regions in the mouse genome. Our algorithm measured the robustness of evidence provided by the spliced EST annotations and incorporated evidence from annotations of UCSC Known Genes and GenBank mRNA. In preparation for de novo prediction of this promoter type, we examined characteristic features of the dataset as a whole. For instance, bidirectional promoters score very highly among all functional elements for Regulatory Potential Scores. This result was unexpected due to the limited sequence conservation found in these noncoding regions. We demonstrate that bidirectional promoters can be classified apart from other genomic features including non-bidirectional promoters, i.e. those promoters having no nearby upstream genes. Furthermore bidirectional promoters consistently score at the level of very highly conserved functional elements in the genome- developmental enhancers. The high scores are due to sequence-based characteristics within the promoters, not the surrounding exons. These results indicate that high-scoring RP regions can be deconvoluted into various functional classes of genomic elements. Using a multiple class predictor we are able to discriminate bidirectional promoters from enhancers, non-bidirectional promoters, and non-promoter regions on the basis of RP scores and CpG islands.</p> <p>Conclusions</p> <p>We examine orthology at bidirectional promoters, use discriminatory machine learning approaches to differentiate multiple types of promoters from other functional and nonfunctional features in the genome and begin the process of deconvoluting classes of functional regions that score well with RP scores. These types of approaches precede supervised learning techniques to discover unannotated promoter regions.</p
Comprehensive Annotation of Bidirectional Promoters Identifies Co-Regulation among Breast and Ovarian Cancer Genes
A βbidirectional gene pairβ comprises two adjacent genes whose transcription start sites are neighboring and directed away from each other. The intervening regulatory region is called a βbidirectional promoter.β These promoters are often associated with genes that function in DNA repair, with the potential to participate in the development of cancer. No connection between these gene pairs and cancer has been previously investigated. Using the database of spliced-expressed sequence tags (ESTs), we identified the most complete collection of human transcripts under the control of bidirectional promoters. A rigorous screen of the spliced EST data identified new bidirectional promoters, many of which functioned as alternative promoters or regulated novel transcripts. Additionally, we show a highly significant enrichment of bidirectional promoters in genes implicated in somatic cancer, including a substantial number of genes implicated in breast and ovarian cancers. The repeated use of this promoter structure in the human genome suggests it could regulate co-expression patterns among groups of genes. Using microarray expression data from 79 human tissues, we verify regulatory networks among genes controlled by bidirectional promoters. Subsets of these promoters contain similar combinations of transcription factor binding sites, including evolutionarily conserved ETS factor binding sites in ERBB2, FANCD2, and BRCA2. Interpreting the regulation of genes involved in co-expression networks, especially those involved in cancer, will be an important step toward defining molecular events that may contribute to disease
Wide-Scale Analysis of Human Functional Transcription Factor Binding Reveals a Strong Bias towards the Transcription Start Site
We introduce a novel method to screen the promoters of a set of genes with
shared biological function, against a precompiled library of motifs, and find
those motifs which are statistically over-represented in the gene set. The gene
sets were obtained from the functional Gene Ontology (GO) classification; for
each set and motif we optimized the sequence similarity score threshold,
independently for every location window (measured with respect to the TSS),
taking into account the location dependent nucleotide heterogeneity along the
promoters of the target genes. We performed a high throughput analysis,
searching the promoters (from 200bp downstream to 1000bp upstream the TSS), of
more than 8000 human and 23,000 mouse genes, for 134 functional Gene Ontology
classes and for 412 known DNA motifs. When combined with binding site and
location conservation between human and mouse, the method identifies with high
probability functional binding sites that regulate groups of biologically
related genes. We found many location-sensitive functional binding events and
showed that they clustered close to the TSS. Our method and findings were put
to several experimental tests. By allowing a "flexible" threshold and combining
our functional class and location specific search method with conservation
between human and mouse, we are able to identify reliably functional TF binding
sites. This is an essential step towards constructing regulatory networks and
elucidating the design principles that govern transcriptional regulation of
expression. The promoter region proximal to the TSS appears to be of central
importance for regulation of transcription in human and mouse, just as it is in
bacteria and yeast.Comment: 31 pages, including Supplementary Information and figure
Defining functional DNA elements in the human genome
With the completion of the human genome sequence, attention turned to identifying and annotating its functional DNA elements. As a complement to genetic and comparative genomics approaches, the Encyclopedia of DNA Elements Project was launched to contribute maps of RNA transcripts, transcriptional regulator binding sites, and chromatin states in many cell types. The resulting genome-wide data reveal sites of biochemical activity with high positional resolution and cell type specificity that facilitate studies of gene regulation and interpretation of noncoding variants associated with human disease. However, the biochemically active regions cover a much larger fraction of the genome than do evolutionarily conserved regions, raising the question of whether nonconserved but biochemically active regions are truly functional. Here, we review the strengths and limitations of biochemical, evolutionary, and genetic approaches for defining functional DNA segments, potential sources for the observed differences in estimated genomic coverage, and the biological implications of these discrepancies. We also analyze the relationship between signal intensity, genomic coverage, and evolutionary conservation. Our results reinforce the principle that each approach provides complementary information and that we need to use combinations of all three to elucidate genome function in human biology and disease
A Dual Origin of the Xist Gene from a Protein-Coding Gene and a Set of Transposable Elements
X-chromosome inactivation, which occurs in female eutherian mammals is controlled by a complex X-linked locus termed the X-inactivation center (XIC). Previously it was proposed that genes of the XIC evolved, at least in part, as a result of pseudogenization of protein-coding genes. In this study we show that the key XIC gene Xist, which displays fragmentary homology to a protein-coding gene Lnx3, emerged de novo in early eutherians by integration of mobile elements which gave rise to simple tandem repeats. The Xist gene promoter region and four out of ten exons found in eutherians retain homology to exons of the Lnx3 gene. The remaining six Xist exons including those with simple tandem repeats detectable in their structure have similarity to different transposable elements. Integration of mobile elements into Xist accompanies the overall evolution of the gene and presumably continues in contemporary eutherian species. Additionally we showed that the combination of remnants of protein-coding sequences and mobile elements is not unique to the Xist gene and is found in other XIC genes producing non-coding nuclear RNA
A ChIP-Seq Benchmark Shows That Sequence Conservation Mainly Improves Detection of Strong Transcription Factor Binding Sites
Transcription factors are important controllers of gene expression and mapping transcription factor binding sites (TFBS) is key to inferring transcription factor regulatory networks. Several methods for predicting TFBS exist, but there are no standard genome-wide datasets on which to assess the performance of these prediction methods. Also, it is believed that information about sequence conservation across different genomes can generally improve accuracy of motif-based predictors, but it is not clear under what circumstances use of conservation is most beneficial.Here we use published ChIP-seq data and an improved peak detection method to create comprehensive benchmark datasets for prediction methods which use known descriptors or binding motifs to detect TFBS in genomic sequences. We use this benchmark to assess the performance of five different prediction methods and find that the methods that use information about sequence conservation generally perform better than simpler motif-scanning methods. The difference is greater on high-affinity peaks and when using short and information-poor motifs. However, if the motifs are specific and information-rich, we find that simple motif-scanning methods can perform better than conservation-based methods.Our benchmark provides a comprehensive test that can be used to rank the relative performance of transcription factor binding site prediction methods. Moreover, our results show that, contrary to previous reports, sequence conservation is better suited for predicting strong than weak transcription factor binding sites
High Resolution Detection and Analysis of CpG Dinucleotides Methylation Using MBD-Seq Technology
Methyl-CpG binding domain protein sequencing (MBD-seq) is widely used to survey DNA methylation patterns. However, the optimal experimental parameters for MBD-seq remain unclear and the data analysis remains challenging. In this study, we generated high depth MBD-seq data in MCF-7 cell and developed a bi-asymmetric-Laplace model (BALM) to perform data analysis. We found that optimal efficiency of MBD-seq experiments was achieved by sequencing βΌ100 million unique mapped tags from a combination of 500 mM and 1000 mM salt concentration elution in MCF-7 cells. Clonal bisulfite sequencing results showed that the methylation status of each CpG dinucleotides in the tested regions was accurately detected with high resolution using the proposed model. These results demonstrated the combination of MBD-seq and BALM could serve as a useful tool to investigate DNA methylome due to its low cost, high specificity, efficiency and resolution
Disease-Associated Mutations That Alter the RNA Structural Ensemble
Genome-wide association studies (GWAS) often identify disease-associated mutations in intergenic and non-coding regions of the genome. Given the high percentage of the human genome that is transcribed, we postulate that for some observed associations the disease phenotype is caused by a structural rearrangement in a regulatory region of the RNA transcript. To identify such mutations, we have performed a genome-wide analysis of all known disease-associated Single Nucleotide Polymorphisms (SNPs) from the Human Gene Mutation Database (HGMD) that map to the untranslated regions (UTRs) of a gene. Rather than using minimum free energy approaches (e.g. mFold), we use a partition function calculation that takes into consideration the ensemble of possible RNA conformations for a given sequence. We identified in the human genome disease-associated SNPs that significantly alter the global conformation of the UTR to which they map. For six disease-states (Hyperferritinemia Cataract Syndrome, Ξ²-Thalassemia, Cartilage-Hair Hypoplasia, Retinoblastoma, Chronic Obstructive Pulmonary Disease (COPD), and Hypertension), we identified multiple SNPs in UTRs that alter the mRNA structural ensemble of the associated genes. Using a Boltzmann sampling procedure for sub-optimal RNA structures, we are able to characterize and visualize the nature of the conformational changes induced by the disease-associated mutations in the structural ensemble. We observe in several cases (specifically the 5β² UTRs of FTL and RB1) SNPβinduced conformational changes analogous to those observed in bacterial regulatory Riboswitches when specific ligands bind. We propose that the UTR and SNP combinations we identify constitute a βRiboSNitch,β that is a regulatory RNA in which a specific SNP has a structural consequence that results in a disease phenotype. Our SNPfold algorithm can help identify RiboSNitches by leveraging GWAS data and an analysis of the mRNA structural ensemble
- β¦