
    Pinpointing transcription factor binding sites from ChIP-seq data with SeqSite

    Background: Chromatin immunoprecipitation combined with next-generation DNA sequencing (ChIP-seq) has become a key approach for detecting genome-wide sets of genomic sites bound by proteins, such as transcription factors (TFs). Several methods and open-source tools have been developed to analyze ChIP-seq data. However, most of them are designed to detect TF binding regions rather than to accurately locate transcription factor binding sites (TFBSs). It is still challenging to pinpoint TFBSs directly from ChIP-seq data, especially in regions with closely spaced binding events.
    Results: With the aim of pinpointing TFBSs at high resolution, we propose a novel method named SeqSite, which implements a two-step strategy: first detecting tag-enriched regions, then pinpointing binding sites within the detected regions. The second step is done by modeling the tag density profile, locating TFBSs on each strand with a least-squares model fitting strategy, and merging the detections from the two strands. Experiments on simulated data show that SeqSite can locate most binding sites that are more than 40 bp apart. Applications to three human TF ChIP-seq datasets demonstrate the advantage of SeqSite: it pinpoints binding sites at higher resolution than existing methods.
    Conclusions: We have developed a computational tool named SeqSite, which can pinpoint both closely spaced and isolated binding sites, and consequently improves the resolution of TFBS detection from ChIP-seq data.
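
The least-squares fitting step mentioned in this abstract can be illustrated with a minimal sketch. This is not SeqSite's actual model: the function name `fit_single_site`, the fixed template `kernel`, and the restriction to a single binding event on one strand are all assumptions made for illustration. The sketch slides a template over a tag density profile, computes the best non-negative amplitude in closed form at each offset, and keeps the offset with the smallest sum of squared errors.

```python
def fit_single_site(profile, kernel):
    """Fit one shifted, scaled copy of `kernel` to `profile` by least squares.

    Hypothetical helper for illustration only (not SeqSite's model).
    Returns (position, amplitude, sse) for the best-fitting offset.
    """
    n, m = len(profile), len(kernel)
    kk = sum(k * k for k in kernel)        # <kernel, kernel>
    total = sum(y * y for y in profile)    # energy of the whole profile
    best = None
    for pos in range(n - m + 1):
        window = profile[pos:pos + m]
        yk = sum(y * k for y, k in zip(window, kernel))  # <window, kernel>
        # Closed-form least-squares amplitude, clipped to be non-negative.
        amp = max(yk / kk, 0.0)
        # SSE over the whole profile for the model amp * kernel placed at pos.
        sse = total - 2 * amp * yk + amp * amp * kk
        if best is None or sse < best[2]:
            best = (pos, amp, sse)
    return best


if __name__ == "__main__":
    kernel = [1, 2, 3, 2, 1]               # assumed template shape
    profile = [0] * 20
    for i, k in enumerate(kernel):
        profile[7 + i] = 2 * k              # one synthetic peak at offset 7
    print(fit_single_site(profile, kernel))
```

Resolving closely spaced events, which is the case the abstract emphasizes, would require fitting several shifted templates jointly; the single-template version above only shows the fitting criterion.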

    Network-based group variable selection for detecting expression quantitative trait loci (eQTL)

    Background: Analysis of expression quantitative trait loci (eQTL) aims to identify the genetic loci associated with the expression levels of genes. Penalized regression with a proper penalty is suitable for high-dimensional biological data. Its performance should improve when biological knowledge of the gene expression network and of the linkage disequilibrium (LD) structure between loci is incorporated, particularly against a high-noise background.
    Results: We propose a network-based group variable selection (NGVS) method for QTL detection. Our method simultaneously maps highly correlated expression traits sharing the same biological function to marker sets formed by LD. By grouping markers, the complex joint activity of multiple SNPs can be considered and the dimensionality of the eQTL problem is reduced dramatically. To demonstrate the power and flexibility of our method, we used it to analyze two simulations and a mouse obesity and diabetes dataset. We considered the gene co-expression network, grouped markers into marker sets, and treated the additive and dominant effects of each locus as a group; as a consequence, we were able to replicate results previously obtained on the mouse linkage dataset. Furthermore, we observed several possible sex-dependent loci and interactions of multiple SNPs.
    Conclusions: The proposed NGVS method is appropriate for problems with high-dimensional data and a high-noise background. On the eQTL problem it outperforms the classical Lasso method, which does not consider biological knowledge. Introducing proper gene expression and loci correlation information makes the detection of causal markers more accurate. With reasonable model settings, NGVS can lead to novel biological findings.
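
For orientation, group-wise penalized regression of the kind this abstract refers to is often written in the standard group lasso form below. This is the textbook formulation, not necessarily the NGVS penalty, and the symbols are assumptions introduced here: $y$ is an expression trait, $X_g$ holds the markers (or the additive and dominant codings of a locus) in group $g$, $\beta_g$ are their coefficients, $p_g$ is the size of group $g$, and $\lambda$ is the tuning parameter.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
  \frac{1}{2}\,\Bigl\| y - \sum_{g=1}^{G} X_g \beta_g \Bigr\|_2^2
  \;+\; \lambda \sum_{g=1}^{G} \sqrt{p_g}\,\bigl\| \beta_g \bigr\|_2
```

Because the penalty uses the unsquared Euclidean norm of each $\beta_g$, a whole group is either dropped or kept together, in contrast to the classical Lasso, whose $\ell_1$ penalty selects coefficients one at a time.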

    Observations on shifted cumulative regulation

    A response to Dynamic cumulative activity of transcription factors as a mechanism of quantitative gene regulation by F He, J Buer, AP Zeng and R Balling. Genome Biol 2007, 8:R181

    Gene-set analysis identifies master transcription factors in developmental courses

    Transcriptional regulation plays key roles in many biological processes. The regulation is dynamic in time and space. Identifying the transcription factors that play major roles in a developmental time course is very important for understanding the regulation. This cannot be achieved by studying the relations between the expression of individual genes. We developed a gene-set analysis approach to study master regulators and their actively regulated targets during a time course from gene expression data. We applied the method to a mouse liver development dataset and a mouse embryonic stem cell (mESC) development dataset, and identified 14 and 9 transcription factors that play major regulatory roles in the two developmental courses, respectively. Some of these transcription factors could not be identified as active in the process by studying their correlation with individual targets. The method was also extended to study other regulatory factors or pathways from time-course expression data.

    Putative Zinc Finger Protein Binding Sites Are Over-Represented in the Boundaries of Methylation-Resistant CpG Islands in the Human Genome

    The majority of CpG dinucleotides in mammalian genomes tend to undergo DNA methylation, but most CpG islands are resistant to this epigenetic modification. The mechanisms that may lead to the methylation resistance of CpG islands are still very poorly understood. Using genome-scale in vivo DNA methylation data from human brain, we investigated the flanking sequence features of methylation-resistant CpG islands and discovered that several putative transcription factor binding sites (TFBSs) are over-represented in methylation-resistant CpG islands, and that a specific group of zinc finger protein binding sites is over-represented in the boundary regions (approximately 400 bp) flanking such CpG islands. About 77% of the over-represented putative TFBSs are conserved among human, mouse and rat. We also observed the enrichment of 4 histone methylations in methylation-resistant CpG islands or their boundaries. Our results suggest a possible mechanism in which certain putative zinc finger protein binding sites over-represented in the boundary regions of methylation-resistant CpG islands may block the spreading of methylation into these islands, while the TFBSs over-represented within the islands may both reinforce the methylation blocking and promote transcription. Some histone modifications may also enhance the immunity of the CpG islands against DNA methylation by augmenting the binding of these TFs. We speculate that the dynamic equilibrium between methylation spreading and blocking is likely to be responsible for the establishment and maintenance of the relatively stable DNA methylation pattern in human somatic cells.

    Embryonics: A path to artificial life?

    Electronic systems, no matter how clever and intelligent they are, cannot yet demonstrate the reliability that biological systems can. Perhaps we can learn from these processes, which have developed through millions of years of evolution, in our pursuit of highly reliable systems. This article discusses how such systems, inspired by biological principles, might be built using simple embryonic cells. We illustrate how they can monitor their own functional integrity in order to protect themselves from internal failure or from hostile environmental effects and how faults caused by DNA mutation or cell death can be repaired and thus full system functionality restored. ©2006 Massachusetts Institute of Technology

    Identifications of conserved 7-mers in 3'-UTRs and microRNAs in Drosophila

    Background: MicroRNAs (miRNAs) are a class of endogenous regulatory small RNAs that play an important role in post-transcriptional regulation by targeting mRNAs for cleavage or translational repression. The base-pairing between the 5'-end of the miRNA and the 3'-UTR of the target mRNA is essential for miRNA:mRNA recognition. Recent studies show that many seed matches in 3'-UTRs, which are fully complementary to miRNA 5'-ends, are highly conserved. Based on these features, a two-stage strategy can be implemented to achieve the de novo identification of miRNAs by requiring complete base-pairing between the 5'-ends of miRNA candidates and the potential seed matches in 3'-UTRs.
    Results: We present a new method that combines multiple pairwise conservation information to identify frequently occurring and conserved 7-mers in 3'-UTRs. A pairwise conservation score (PCS) was introduced to describe the conservation of all 7-mers in 3'-UTRs between any two Drosophila species. Using PCSs computed from 6 pairs of flies, we developed a support vector machine (SVM) classifier ensemble, named Cons-SVM, and identified 689 conserved 7-mers, including 63 seed matches covering 32 out of 38 known miRNA families in the reference dataset. In the second stage, we searched for 90 nt conserved stem-loop regions containing sequences complementary to the identified 7-mers and used previously published miRNA prediction software to analyze these stem-loops. We predicted 47 miRNA candidates in the genome-wide screen.
    Conclusion: Cons-SVM takes advantage of the independent evolutionary information from the 6 pairs of flies and shows high sensitivity in identifying seed matches in 3'-UTRs. By combining the multiple pairwise conservation information with a machine learning approach, we finally identified 47 miRNA candidates in D. melanogaster.
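
The notion of a 7-mer seed match in a 3'-UTR can be made concrete with a small sketch. It assumes the common convention that the seed is nucleotides 2 to 8 of the miRNA (the abstract only says "5'-end"), and the function names and toy sequences are hypothetical. It covers only exact site lookup, not the conservation scoring or the SVM ensemble described above.

```python
# RNA complement table (A<->U, C<->G); sequences are assumed to be uppercase RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def seed_match_site(mirna, seed_start=1, seed_len=7):
    """Return the 7-mer a 3'-UTR must contain to pair with the miRNA seed.

    Assumption: the seed is miRNA positions 2-8 (0-based slice 1:8).
    The site is the reverse complement of that seed.
    """
    seed = mirna[seed_start:seed_start + seed_len]
    return seed.translate(COMPLEMENT)[::-1]


def find_seed_matches(utr, mirna):
    """Return 0-based start offsets of exact seed-match sites in `utr`."""
    site = seed_match_site(mirna)
    k = len(site)
    return [i for i in range(len(utr) - k + 1) if utr[i:i + k] == site]


if __name__ == "__main__":
    mirna = "UGAGGUAGUAGGUUGUAUAGUU"   # toy miRNA sequence used as input
    utr = "AAACUACCUCGGGCUACCUCA"       # toy 3'-UTR fragment with two sites
    print(seed_match_site(mirna))
    print(find_seed_matches(utr, mirna))
```

Counting how often each such 7-mer occurs across the 3'-UTRs of several species would be the natural next step toward a conservation score, but the scoring used by Cons-SVM is not specified in enough detail here to reproduce.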