31 research outputs found

    Zerone: a ChIP-seq discretizer for multiple replicates with built-in quality control

    Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard method to investigate chromatin protein composition. As the number of community-available ChIP-seq profiles increases, it becomes more common to use data from different sources, which makes joint analysis challenging. Issues such as lack of reproducibility, heterogeneous quality and conflicts between replicates become evident when comparing datasets, especially when they are produced by different laboratories. Results: Here, we present Zerone, a ChIP-seq discretizer with built-in quality control. Zerone is powered by a Hidden Markov Model with zero-inflated negative multinomial emissions, which allows it to merge several replicates into a single discretized profile. To identify low quality or irreproducible data, we trained a Support Vector Machine and integrated it as part of the discretization process. The result is a classifier reaching 95% accuracy in detecting low quality profiles. We also introduce a graphical representation to compare discretization quality and we show that Zerone achieves outstanding accuracy. Finally, on current hardware, Zerone discretizes a ChIP-seq experiment on mammalian genomes in about 5 min using less than 700 MB of memory. Availability and Implementation: Zerone is available as a command line tool and as an R package. The C source code and R scripts can be downloaded from https://github.com/nanakiksc/zerone. The information to reproduce the benchmark and the figures is stored in a public Docker image that can be downloaded from https://hub.docker.com/r/nanakiksc/zerone/. Contact: [email protected]. Supplementary information: Supplementary data are available at Bioinformatics online. This research was supported by the Government of Catalonia and the Spanish Ministry of Economy and Competitiveness (Plan Nacional BFU2012-37168, Centro de Excelencia Severo Ochoa 2013-2017, SEV-2012-0208). The fellowship of P.C. was partly supported by the Spanish Ministry of Economy and Competitiveness [State Training Subprogram: predoctoral fellowships for the training of PhD students (FPI) 2013].

    A Family of Human Zinc Finger Proteins That Bind Methylated DNA and Repress Transcription

    In vertebrates, densely methylated DNA is associated with inactive transcription. Actors in this process include proteins of the MBD family that can recognize methylated CpGs and repress transcription. Kaiso, a structurally unrelated protein, has also been shown to bind methylated CGCGs through its three Krüppel-like C2H2 zinc fingers. The human genome contains two uncharacterized proteins, ZBTB4 and ZBTB38, that contain Kaiso-like zinc fingers. We report that ZBTB4 and ZBTB38 bind methylated DNA in vitro and in vivo. Unlike Kaiso, they can bind single methylated CpGs. When transfected in mouse cells, the proteins colocalize with foci of heavily methylated satellite DNA and become delocalized upon loss of DNA methylation. Chromatin immunoprecipitation suggests that both of these proteins specifically bind to the methylated allele of the H19/Igf2 differentially methylated region. ZBTB4 and ZBTB38 repress the transcription of methylated templates in transfection assays. The two genes have distinct tissue-specific expression patterns, but both are highly expressed in the brain. Our results reveal the existence of a family of Kaiso-like proteins that bind methylated CpGs. Like proteins of the MBD family, they are able to repress transcription in a methyl-dependent manner, yet their tissue-specific expression pattern suggests nonoverlapping functions.

    Bayesian network analysis of targeting interactions in chromatin

    In eukaryotes, many chromatin proteins together regulate gene expression. Chromatin proteins often direct the genomic binding pattern of other chromatin proteins, for example, by recruitment or competition mechanisms. The network of such targeting interactions in chromatin is complex and still poorly understood. Based on genome-wide binding maps, we constructed a Bayesian network model of the targeting interactions among a broad set of 43 chromatin components in Drosophila cells. This model predicts many novel functional relationships. For example, we found that the homologous proteins HP1 and HP1C each target the heterochromatin protein HP3 to distinct sets of genes in a competitive manner. We also discovered a central role for the remodeling factor Brahma in the targeting of several DNA-binding factors, including GAGA factor, JRA, and SU(VAR)3-7. Our network model provides a global view of the targeting interplay among dozens of chromatin components

    Machine Learning: How Much Does It Tell about Protein Folding Rates?

    The prediction of protein folding rates is a necessary step towards understanding the principles of protein folding. Due to the increasing amount of experimental data, numerous protein folding models and predictors of protein folding rates have been developed in the last decade. The problem has also attracted the attention of scientists from computational fields, which led to the publication of several machine learning-based models to predict the rate of protein folding. Some of them claim to predict the logarithm of protein folding rate with an accuracy greater than 90%. However, there are reasons to believe that such claims are exaggerated due to large fluctuations and overfitting of the estimates. When we confronted three selected published models with new data, we found a much lower predictive power than reported in the original publications. Overly optimistic predictive powers appear from violations of the basic principles of machine-learning. We highlight common misconceptions in the studies claiming excessive predictive power and propose to use learning curves as a safeguard against those mistakes. As an example, we show that the current amount of experimental data is insufficient to build a linear predictor of logarithms of folding rates based on protein amino acid composition

    OneD: increasing reproducibility of Hi-C samples with abnormal karyotypes

    The three-dimensional conformation of genomes is an essential component of their biological activity. The advent of the Hi-C technology enabled an unprecedented progress in our understanding of genome structures. However, Hi-C is subject to systematic biases that can compromise downstream analyses. Several strategies have been proposed to remove those biases, but the issue of abnormal karyotypes received little attention. Many experiments are performed in cancer cell lines, which typically harbor large-scale copy number variations that create visible defects on the raw Hi-C maps. The consequences of these widespread artifacts on the normalized maps are mostly unexplored. We observed that current normalization methods are not robust to the presence of large-scale copy number variations, potentially obscuring biological differences and enhancing batch effects. To address this issue, we developed an alternative approach designed to take into account chromosomal abnormalities. The method, called OneD, increases reproducibility among replicates of Hi-C samples with abnormal karyotype, outperforming previous methods significantly. On normal karyotypes, OneD fared equally well as state-of-the-art methods, making it a safe choice for Hi-C normalization. OneD is fast and scales well in terms of computing resources for resolutions up to 5 kb

    TADbit flowchart.

    Main functions of the TADbit library from FASTQ files to 3D model analysis. TADbit accepts many input data types such as FASTQ files, interaction matrices and 3D models. A series of Python functions in TADbit (Supplementary Text) allow for the full analysis of the interaction data, interaction matrices as well as derived 3D models.

    The Human Enhancer Blocker CTC-binding Factor Interacts with the Transcription Factor Kaiso

    CTC-binding factor (CTCF) is a DNA-binding protein of vertebrates that plays essential roles in regulating genome activity through its capacity to act as an enhancer blocker. We performed a yeast two-hybrid screen to identify protein partners of CTCF that could regulate its activity. Using full-length CTCF as bait, we recovered Kaiso, a POZ-zinc finger transcription factor, as a specific binding partner. The interaction occurs through a C-terminal region of CTCF and the POZ domain of Kaiso. CTCF and Kaiso are co-expressed in many tissues, and CTCF was specifically co-immunoprecipitated by several Kaiso monoclonal antibodies from nuclear lysates. Kaiso is a bimodal transcription factor that recognizes methylated CpG dinucleotides or a conserved unmethylated sequence (TNGCAGGA, the Kaiso binding site). We identified one consensus unmethylated Kaiso binding site in close proximity to the CTCF binding site in the human 5′ β-globin insulator. We found, in an insulation assay, that the presence of this Kaiso binding site reduced the enhancer-blocking activity of CTCF. These data suggest that the Kaiso-CTCF interaction negatively regulates CTCF insulator activity.

    Structural properties of the five described chromatin colors.

    (a) Distribution of each of the four structural properties (that is, accessibility, density, interactions, and angle) grouped by chromatin colors (including the undefined “white” color for particles of non-homogeneous coloring). Statistical significance of the differences as computed by Tukey’s ‘Honest Significant Difference’ test (*: p < 0.01, ***: p < 0.001, ns: non-significant). (b) Schematic representation of the structural properties of the five colors for the Drosophila chromatin.