
    BayesPeak: Bayesian analysis of ChIP-seq data.

    BACKGROUND: High-throughput sequencing technology has become popular and widely used to study protein–DNA interactions. Chromatin immunoprecipitation, followed by sequencing of the resulting samples, produces large amounts of data that can be used to map genomic features such as transcription factor binding sites and histone modifications. METHODS: Our proposed statistical algorithm, BayesPeak, uses a fully Bayesian hidden Markov model to detect enriched locations in the genome. The model structure accommodates the natural features of Solexa/Illumina sequencing data and allows for overdispersion in the abundance of reads in different regions. Moreover, a control sample can be incorporated in the analysis to account for experimental and sequence biases. Markov chain Monte Carlo algorithms are applied to estimate the posterior distributions of the model parameters, and posterior probabilities are used to detect the sites of interest. CONCLUSION: We have presented a flexible approach for identifying peaks from ChIP-seq reads, suitable for use on both transcription factor binding and histone modification data. Our method estimates probabilities of enrichment that can be used in downstream analysis. The method is assessed using experimentally verified data and is shown to provide high-confidence calls with low false positive rates.
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license, which is similar to the 'Creative Commons Attribution Licence'. In brief, you may copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit, and for any reuse or distribution, it must be made clear to others what the license terms of this work are.
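The core idea above, turning binned read counts into per-bin posterior probabilities of enrichment with a hidden Markov model whose emissions allow overdispersion, can be illustrated with a minimal sketch. This is not BayesPeak's implementation: it assumes a two-state HMM (background vs. enriched) with negative binomial emissions (one standard way to model overdispersed counts), replaces MCMC parameter estimation with fixed, hand-chosen parameters, and ignores strands and the control sample. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def nb_logpmf(k: int, mean: float, size: float) -> float:
    """Log pmf of a negative binomial parameterised by mean and size r.
    Variance is mean + mean**2 / size, so a small size means strong overdispersion."""
    p = size / (size + mean)
    return (math.lgamma(k + size) - math.lgamma(size) - math.lgamma(k + 1)
            + size * math.log(p) + k * math.log(1.0 - p))


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def posterior_enriched(counts, means, sizes, trans, init) -> List[float]:
    """Forward-backward in log space for a 2-state HMM
    (state 0 = background, state 1 = enriched) over a non-empty list of bins.
    Returns P(state_t = 1 | all counts) for every bin t."""
    n, S = len(counts), 2
    log_trans = [[math.log(trans[i][j]) for j in range(S)] for i in range(S)]
    emit = [[nb_logpmf(c, means[s], sizes[s]) for s in range(S)] for c in counts]

    # Forward pass: fwd[t][s] = log P(counts[0..t], state_t = s)
    fwd = [[0.0] * S for _ in range(n)]
    for s in range(S):
        fwd[0][s] = math.log(init[s]) + emit[0][s]
    for t in range(1, n):
        for s in range(S):
            fwd[t][s] = emit[t][s] + logsumexp(
                [fwd[t - 1][r] + log_trans[r][s] for r in range(S)])

    # Backward pass: bwd[t][s] = log P(counts[t+1..] | state_t = s)
    bwd = [[0.0] * S for _ in range(n)]
    for t in range(n - 2, -1, -1):
        for s in range(S):
            bwd[t][s] = logsumexp(
                [log_trans[s][r] + emit[t + 1][r] + bwd[t + 1][r] for r in range(S)])

    log_evidence = logsumexp([fwd[n - 1][s] for s in range(S)])
    return [math.exp(fwd[t][1] + bwd[t][1] - log_evidence) for t in range(n)]
```

For example, `posterior_enriched([2, 1, 3, 2, 25, 30, 28, 2, 1, 0], means=(2.0, 25.0), sizes=(5.0, 5.0), trans=((0.95, 0.05), (0.1, 0.9)), init=(0.9, 0.1))` assigns high enrichment probabilities to the three high-count bins and low ones elsewhere. A fully Bayesian treatment, as the abstract describes, would instead sample the means, sizes and transition probabilities and average these posteriors over the samples.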

    Phylogeography of the second plague pandemic revealed through analysis of historical Yersinia pestis genomes

    The second plague pandemic, caused by Yersinia pestis, devastated Europe and nearby regions between the 14th and 18th centuries AD. Here we analyse human remains from ten European archaeological sites spanning this period and reconstruct 34 ancient Y. pestis genomes. Our data support an initial entry of the bacterium through eastern Europe, the absence of genetic diversity during the Black Death, and low within-outbreak diversity thereafter. Analysis of post-Black Death genomes shows the diversification of a Y. pestis lineage into multiple genetically distinct clades that may have given rise to more than one disease reservoir in, or close to, Europe. In addition, we show the loss of a genomic region that includes virulence-related genes in strains associated with late stages of the pandemic. The deletion was also identified in genomes connected with the first plague pandemic (541–750 AD), suggesting a comparable evolutionary trajectory of Y. pestis during both events.

    Cell Cycle Genes Are the Evolutionarily Conserved Targets of the E2F4 Transcription Factor

    Maintaining quiescent cells in G0 phase is achieved in part through the multi-subunit protein complex known as DREAM, and in human cell lines the transcription factor E2F4 directs this complex to its cell cycle targets. We found that E2F4 binds a highly overlapping set of human genes among three diverse primary tissues and an asynchronous cell line, which suggests that tissue-specific binding partners and chromatin structure have minimal influence on E2F4 targeting. To investigate the conservation of these transcription factor binding events, we identified the mouse genes bound by E2f4 in seven primary mouse tissues and a cell line. E2f4 bound a set of mouse genes that was common among mouse tissues, but largely distinct from the genes bound in human. The evolutionarily conserved set of E2F4-bound genes is highly enriched for functionally relevant regulatory interactions important for maintaining cellular quiescence. In contrast, we found minimal mRNA expression perturbations in this core set of E2f4-bound genes in the liver, kidney, and testes of E2f4 null mice. Thus, the regulatory mechanisms maintaining quiescence are robust even to complete loss of conserved transcription factor binding events.

    Investigating the Transition of Pre-Symptomatic to Symptomatic Huntington’s Disease Status Based on Omics Data

    Huntington’s disease (HD) is a rare neurodegenerative disease caused by a cytosine–adenine–guanine (CAG) trinucleotide repeat expansion in the Huntingtin (HTT) gene. Although HD is well studied, the pathophysiological mechanisms, genes and metabolites involved in HD remain poorly understood. Systems bioinformatics can reveal synergistic relationships among different omics levels and enables the integration of biological data, allowing an overall understanding of the biological mechanisms, pathways, genes and metabolites involved in HD. The purpose of this study was to identify the differentially expressed genes (DEGs), pathways and metabolites, and to observe how these differ between the pre-symptomatic and symptomatic HD stages. A publicly available dataset from the Gene Expression Omnibus (GEO) was analyzed to obtain the DEGs for each HD stage, and a gene co-expression network was constructed for each stage. Network rewiring highlights the nodes whose connectivity with their neighbors changes most and infers their possible involvement in the transition between states. CACNA1I was the most highly rewired node between the pre-symptomatic and symptomatic HD networks. Furthermore, we identified AF198444 as common to the rewired genes and the DEGs of symptomatic HD; CNTN6, DEK, LTN1, MST4, ZFYVE16, CEP135, DCAKD, MAP4K3, NUPL1 and RBM15 as common to the DEGs of pre-symptomatic and symptomatic HD; and CACNA1I, DNAJB14, EPS8L3, HSDL2, SNRPD3, SOX12, ACLY, ATF2, BAG5, ERBB4, FOCAD, GRAMD1C, LIN7C, MIR22, MTHFR, NABP1, NRG2, OTC, PRAMEF12, SLC30A10, STAG2 and Y16709 as common to the rewired genes and the DEGs of pre-symptomatic HD.
The proteins encoded by these genes are involved in various biological pathways such as phosphatidylinositol-4,5-bisphosphate 3-kinase activity, cAMP response element-binding protein binding, protein tyrosine kinase activity, voltage-gated calcium channel activity, ubiquitin protein ligase activity, adenosine triphosphate (ATP) binding, and protein serine/threonine kinase activity. Additionally, prominent molecular pathways were obtained for each HD stage, and metabolites related to each pathway were identified for both disease stages. The transforming growth factor beta (TGF-β) signaling pathway (pre-symptomatic and symptomatic stages), calcium (Ca²⁺) signaling pathway (pre-symptomatic), dopaminergic synapse pathway (symptomatic) and Hippo signaling pathway (pre-symptomatic) were identified. The in silico metabolites we identified include Ca²⁺, inositol 1,4,5-trisphosphate, sphingosine 1-phosphate, dopamine, homovanillate and L-tyrosine. The genes, pathways and metabolites identified for each HD stage can provide a better understanding of the mechanisms that become altered at each disease stage. Our results can guide the development of therapies that target the altered genes and metabolites of the perturbed pathways, potentially improving clinical symptoms and delaying the age of onset.
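The abstract does not say how the co-expression networks were built or how rewiring was scored, so the following is only a minimal sketch under stated assumptions: each stage's network links two genes when the absolute Pearson correlation of their expression profiles is at least a hypothetical threshold, and a gene's rewiring score is the Jaccard distance between its neighbor sets in the two networks (0 = same neighbors, 1 = completely different). The function names and the threshold are illustrative, not the study's method.

```python
import math
from itertools import combinations
from typing import Dict, List, Set


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def coexpression_network(expr: Dict[str, List[float]], threshold: float) -> Dict[str, Set[str]]:
    """Undirected adjacency sets: edge when |r| >= threshold (hypothetical rule)."""
    adj: Dict[str, Set[str]] = {g: set() for g in expr}
    for a, b in combinations(expr, 2):
        if abs(pearson(expr[a], expr[b])) >= threshold:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def rewiring_scores(net1: Dict[str, Set[str]], net2: Dict[str, Set[str]]) -> Dict[str, float]:
    """Jaccard distance between each gene's neighbor sets in the two networks."""
    scores: Dict[str, float] = {}
    for g in net1.keys() | net2.keys():
        n1, n2 = net1.get(g, set()), net2.get(g, set())
        union = n1 | n2
        scores[g] = 0.0 if not union else 1.0 - len(n1 & n2) / len(union)
    return scores
```

Ranking genes by these scores gives a "most highly rewired" list; a real analysis would use many more samples and would usually control for degree and significance of the correlations.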

    BayesPeak – an R package for analysing ChIP-seq data

    Motivation: Identification of genomic regions of interest in ChIP-seq data, commonly referred to as peak-calling, aims to find the locations of transcription factor binding sites, modified histones or nucleosomes. The BayesPeak algorithm was developed to model the data structure using Bayesian statistical techniques and was shown to be a reliable method, but did not have a full-genome implementation. Results: In this note we present BayesPeak, an R package for genome-wide peak-calling that provides a flexible implementation of the BayesPeak algorithm and is compatible with downstream BioConductor packages. The BayesPeak package introduces a new method for summarizing posterior probability output, along with methods for handling overfitting and support for parallel processing. We briefly compare the package with other common peak-callers. Availability: Available as part of BioConductor version 2.6. URL: http://bioconductor.org/packages/release/bioc/html/BayesPeak.html Supplementary information: Supplementary data are available at Bioinformatics online
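The abstract mentions a new method for summarizing posterior probability output but does not describe it. As a generic illustration only (not the BayesPeak package's summarization method or its R API), per-bin posterior probabilities can be turned into peak calls by keeping bins at or above a threshold and merging consecutive kept bins into one region, reporting each region's maximum posterior. The bin width, threshold, offset and all names here are hypothetical.

```python
from typing import List, NamedTuple, Optional


class Peak(NamedTuple):
    start: int     # genomic coordinate of the first bin in the run (inclusive)
    end: int       # coordinate just past the last bin in the run (exclusive)
    max_pp: float  # highest per-bin posterior probability within the run


def call_peaks(posteriors: List[float], bin_width: int,
               threshold: float = 0.5, offset: int = 0) -> List[Peak]:
    """Merge runs of consecutive bins with posterior >= threshold into peaks."""
    peaks: List[Peak] = []
    run_start: Optional[int] = None
    run_max = 0.0
    for i, pp in enumerate(posteriors):
        if pp >= threshold:
            if run_start is None:
                run_start, run_max = i, pp   # open a new run
            else:
                run_max = max(run_max, pp)   # extend the current run
        elif run_start is not None:
            # A below-threshold bin closes the current run.
            peaks.append(Peak(offset + run_start * bin_width,
                              offset + i * bin_width, run_max))
            run_start = None
    if run_start is not None:
        # Close a run that extends to the last bin.
        peaks.append(Peak(offset + run_start * bin_width,
                          offset + len(posteriors) * bin_width, run_max))
    return peaks
```

For example, `call_peaks([0.1, 0.7, 0.95, 0.2, 0.6], bin_width=100)` returns `[Peak(start=100, end=300, max_pp=0.95), Peak(start=400, end=500, max_pp=0.6)]`. Reporting the maximum posterior per region is one simple design choice; averaging, or requiring a minimum run length to suppress isolated calls, are common alternatives.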