31 research outputs found

Correction: Filion, G.J. Analytic Combinatorics for Computing Seeding Probabilities. Algorithms 2018, 11, 3

Author: Guillaume J. Filion
Publication venue: 'MDPI AG'
Publication date: 01/06/2022

The author wishes to make the following correction to this paper [...]

Multidisciplinary Digital Publishing Institute

Directory of Open Access Journals

Heterochromatin: did H3K9 methylation evolve to tame transposons?

Author: Filion Guillaume J.
Kabi Manisha
Publication venue: University of Toronto
Publication date: 03/12/2021

University of Toronto Research Repository

PubMed Central

Zerone: a ChIP-seq discretizer for multiple replicates with built-in quality control

Author: Chang
Dempster
Guillaume J. Filion
Pol Cuscó
Publication venue: 'Oxford University Press (OUP)'
Publication date: 01/01/2016

Motivation: Chromatin immunoprecipitation followed by high-throughput sequencing (ChIP-seq) is the standard method to investigate chromatin protein composition. As the number of community-available ChIP-seq profiles increases, it becomes more common to use data from different sources, which makes joint analysis challenging. Issues such as lack of reproducibility, heterogeneous quality and conflicts between replicates become evident when comparing datasets, especially when they are produced by different laboratories. Results: Here, we present Zerone, a ChIP-seq discretizer with built-in quality control. Zerone is powered by a Hidden Markov Model with zero-inflated negative multinomial emissions, which allows it to merge several replicates into a single discretized profile. To identify low-quality or irreproducible data, we trained a Support Vector Machine and integrated it into the discretization process. The result is a classifier reaching 95% accuracy in detecting low-quality profiles. We also introduce a graphical representation to compare discretization quality, and we show that Zerone achieves outstanding accuracy. Finally, on current hardware, Zerone discretizes a ChIP-seq experiment on mammalian genomes in about 5 min using less than 700 MB of memory. Availability and Implementation: Zerone is available as a command line tool and as an R package. The C source code and R scripts can be downloaded from https://github.com/nanakiksc/zerone. The information to reproduce the benchmark and the figures is stored in a public Docker image that can be downloaded from https://hub.docker.com/r/nanakiksc/zerone/. Supplementary information: Supplementary data are available at Bioinformatics online. This research was supported by the Government of Catalonia and the Spanish Ministry of Economy and Competitiveness (Plan Nacional BFU2012-37168, Centro de Excelencia Severo Ochoa 2013-2017 SEV-2012-0208). The fellowship of P.C. was partly supported by the Spanish Ministry of Economy and Competitiveness [State Training Subprogram: predoctoral fellowships for the training of PhD students (FPI) 2013].
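The abstract names the two ingredients of the discretizer: a Hidden Markov Model whose emissions are zero-inflated negative multinomial (ZINM) distributions over the replicate counts of each genomic window. The sketch below shows how those pieces could fit together for decoding, assuming two states (0 = unbound, 1 = bound), fixed hand-picked parameters and Viterbi decoding. All function names and parameter values are hypothetical; this is not Zerone's C implementation, and parameter estimation and the SVM quality-control step are omitted.

```python
import math

def log_zinm(counts, a, p, pi):
    """Log-probability of one window's replicate counts under a
    zero-inflated negative multinomial (ZINM) distribution.

    counts: non-negative ints, one per replicate (x_1, ..., x_n)
    a:      shape parameter of the negative multinomial (a > 0)
    p:      probabilities [p_0, p_1, ..., p_n] summing to 1
    pi:     probability of a structural all-zero window (0 <= pi < 1)
    """
    total = sum(counts)
    # Negative multinomial:
    # Gamma(a + sum x) / (Gamma(a) * prod x_i!) * p_0^a * prod p_i^x_i
    log_nm = math.lgamma(a + total) - math.lgamma(a) + a * math.log(p[0])
    for x, p_i in zip(counts, p[1:]):
        log_nm += x * math.log(p_i) - math.lgamma(x + 1)
    if total == 0:
        # An all-zero window can come from the inflation or the NM component.
        return math.log(pi + (1.0 - pi) * math.exp(log_nm))
    return math.log(1.0 - pi) + log_nm


def viterbi(windows, emissions, log_trans, log_init):
    """Most likely hidden-state path for a sequence of count windows.

    emissions: one function per state, mapping counts -> log-probability
    log_trans: log_trans[r][s] = log P(state s | previous state r)
    log_init:  log_init[s] = log P(first state = s)
    """
    n_states = len(emissions)
    # score[s]: best log-probability of any path ending in state s so far.
    score = [log_init[s] + emissions[s](windows[0]) for s in range(n_states)]
    back = []
    for obs in windows[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            new_score.append(score[prev] + log_trans[prev][s] + emissions[s](obs))
            ptr.append(prev)
        score = new_score
        back.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


# Hypothetical parameters for two replicates. The negative multinomial
# component has mean a * p_i / p_0 per replicate: 0.25 in state 0, 9 in state 1.
def unbound(c):
    return log_zinm(c, a=2.0, p=[0.8, 0.1, 0.1], pi=0.3)

def bound(c):
    return log_zinm(c, a=2.0, p=[0.1, 0.45, 0.45], pi=0.0)

log_trans = [[math.log(0.9), math.log(0.1)], [math.log(0.1), math.log(0.9)]]
log_init = [math.log(0.5), math.log(0.5)]
windows = [[0, 0], [1, 0], [0, 0], [8, 11], [10, 7], [9, 12], [0, 1], [0, 0]]

path = viterbi(windows, [unbound, bound], log_trans, log_init)
print(path)  # [0, 0, 0, 1, 1, 1, 0, 0]
```

Because the emission is a joint distribution over all replicates of a window, the replicates are merged before decoding rather than discretized separately and reconciled afterwards.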

Crossref

UPF Digital Repository

A Family of Human Zinc Finger Proteins That Bind Methylated DNA and Repress Transcription

Author: Defossez Pierre-Antoine
Filion Guillaume J. P.
Prokhortchouk Egor
Salozhin Sergey
Yamada Daisuke
Zhenilo Svetlana
Publication venue: American Society for Microbiology
Publication date: 01/01/2006

In vertebrates, densely methylated DNA is associated with inactive transcription. Actors in this process include proteins of the MBD family that can recognize methylated CpGs and repress transcription. Kaiso, a structurally unrelated protein, has also been shown to bind methylated CGCGs through its three Krüppel-like C2H2 zinc fingers. The human genome contains two uncharacterized proteins, ZBTB4 and ZBTB38, that contain Kaiso-like zinc fingers. We report that ZBTB4 and ZBTB38 bind methylated DNA in vitro and in vivo. Unlike Kaiso, they can bind single methylated CpGs. When transfected into mouse cells, the proteins colocalize with foci of heavily methylated satellite DNA and become delocalized upon loss of DNA methylation. Chromatin immunoprecipitation suggests that both proteins specifically bind to the methylated allele of the H19/Igf2 differentially methylated region. ZBTB4 and ZBTB38 repress the transcription of methylated templates in transfection assays. The two genes have distinct tissue-specific expression patterns, but both are highly expressed in the brain. Our results reveal the existence of a family of Kaiso-like proteins that bind methylated CpGs. Like proteins of the MBD family, they are able to repress transcription in a methyl-dependent manner, yet their tissue-specific expression patterns suggest nonoverlapping functions.

CiteSeerX

Crossref

PubMed Central

Bayesian network analysis of targeting interactions in chromatin

Author: Braunschweig Ulrich
Chen Menzies
Filion Guillaume J.
Ideker Trey
van Bemmel Joke G.
van Steensel Bas
Publication venue: Cold Spring Harbor Laboratory Press

In eukaryotes, many chromatin proteins jointly regulate gene expression. Chromatin proteins often direct the genomic binding pattern of other chromatin proteins, for example, by recruitment or competition mechanisms. The network of such targeting interactions in chromatin is complex and still poorly understood. Based on genome-wide binding maps, we constructed a Bayesian network model of the targeting interactions among a broad set of 43 chromatin components in Drosophila cells. This model predicts many novel functional relationships. For example, we found that the homologous proteins HP1 and HP1C each target the heterochromatin protein HP3 to distinct sets of genes in a competitive manner. We also discovered a central role for the remodeling factor Brahma in the targeting of several DNA-binding factors, including GAGA factor, JRA, and SU(VAR)3-7. Our network model provides a global view of the targeting interplay among dozens of chromatin components.
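Learning a Bayesian network from binding maps typically involves scoring candidate parent sets for each protein and keeping the best-scoring ones. The sketch below illustrates one generic way to do this, assuming binarized (bound/unbound) profiles and the BIC score; it is not necessarily the scoring or search procedure used in the study, and the toy data, the rule that HP3 follows HP1, and the variable names are hypothetical.

```python
import math
import random
from collections import Counter
from itertools import product

def bic_score(data, child, parents):
    """BIC score of a binary `child` variable given a tuple of binary `parents`.

    data: list of dicts mapping variable name -> 0 or 1 (one dict per locus).
    Higher is better: maximized log-likelihood minus (log N / 2) per parameter.
    """
    n = len(data)
    counts = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    loglik = 0.0
    for config in product((0, 1), repeat=len(parents)):
        n0, n1 = counts[(config, 0)], counts[(config, 1)]
        for k in (n0, n1):
            if k > 0:
                # ML estimate of P(child = value | config) is k / (n0 + n1).
                loglik += k * math.log(k / (n0 + n1))
    # One free parameter, P(child = 1 | config), per parent configuration.
    n_params = 2 ** len(parents)
    return loglik - 0.5 * math.log(n) * n_params


# Hypothetical toy data: HP3 is mostly bound where HP1 is bound,
# while BRM is drawn independently of both.
random.seed(0)
loci = []
for _ in range(500):
    hp1 = random.random() < 0.4
    hp3 = random.random() < (0.9 if hp1 else 0.05)
    brm = random.random() < 0.5
    loci.append({"HP1": int(hp1), "HP3": int(hp3), "BRM": int(brm)})

for parents in [(), ("HP1",), ("BRM",), ("HP1", "BRM")]:
    print(parents, round(bic_score(loci, "HP3", parents), 1))
```

With this simulated data, ("HP1",) should outscore both the empty parent set and ("BRM",) for HP3, because HP3 was generated from HP1. The log N penalty is what keeps the search from adding parents that improve the likelihood only by chance.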

Crossref

PubMed Central

Machine Learning: How Much Does It Tell about Protein Folding Rates?

Author: Dinara R Usmanova
Dmitry N Ivankov
Guillaume J Filion
Heng-Chang Chen
Marc Corrales
Natalya S Bogatyreva
Pol Cuscó
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2015

The prediction of protein folding rates is a necessary step towards understanding the principles of protein folding. Due to the increasing amount of experimental data, numerous protein folding models and predictors of protein folding rates have been developed in the last decade. The problem has also attracted the attention of scientists from computational fields, which led to the publication of several machine learning-based models to predict the rate of protein folding. Some of them claim to predict the logarithm of the protein folding rate with an accuracy greater than 90%. However, there are reasons to believe that such claims are exaggerated due to large fluctuations and overfitting of the estimates. When we confronted three selected published models with new data, we found much lower predictive power than reported in the original publications. Overly optimistic estimates of predictive power arise from violations of basic machine-learning principles. We highlight common misconceptions in the studies claiming excessive predictive power and propose using learning curves as a safeguard against those mistakes. As an example, we show that the current amount of experimental data is insufficient to build a linear predictor of logarithms of folding rates based on protein amino acid composition.
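A learning curve plots training and validation error against training-set size. If the validation error is still falling at the largest available size, or stays far above the training error, the data are likely too few for the model and a single reported accuracy can be misleading. The sketch below illustrates the idea on simulated data with one feature and an ordinary least-squares line; the data-generating rule, the sizes and the function names are hypothetical, and it does not reproduce the paper's analysis.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b

def mse(model, xs, ys):
    """Mean squared error of a fitted line on the given points."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def learning_curve(train, test, sizes):
    """Fit on the first n training points for each n and report
    (n, training MSE, test MSE) against a fixed held-out test set."""
    tx, ty = zip(*test)
    curve = []
    for n in sizes:
        xs, ys = zip(*train[:n])
        model = fit_line(xs, ys)
        curve.append((n, mse(model, xs, ys), mse(model, tx, ty)))
    return curve

def simulate(k):
    # Hypothetical ground truth: y = 2x + 1 plus Gaussian noise (sd = 1).
    pts = []
    for _ in range(k):
        x = random.uniform(0, 10)
        pts.append((x, 2 * x + 1 + random.gauss(0, 1)))
    return pts

random.seed(1)
train, test = simulate(200), simulate(1000)
for n, tr, te in learning_curve(train, test, [3, 5, 10, 50, 200]):
    print(f"n={n:4d}  train MSE={tr:.2f}  test MSE={te:.2f}")
```

Holding the test set fixed while only the training size changes is what makes the points comparable; in practice one would also average over several random splits.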

Crossref

Directory of Open Access Journals

PubMed Central

UPF Digital Repository

FigShare

OneD: increasing reproducibility of Hi-C samples with abnormal karyotypes

Author: Beato Miguel
Cuartero Yasmina
Le Dily François
Filion Guillaume J.
Graf Thomas
Marti-Renom Marc A.
Quilez Javier
Stadhouders Ralph
Vidal Enrique
Publication venue: 'Oxford University Press (OUP)'
Publication date: 04/05/2018

The three-dimensional conformation of genomes is an essential component of their biological activity. The advent of the Hi-C technology enabled unprecedented progress in our understanding of genome structures. However, Hi-C is subject to systematic biases that can compromise downstream analyses. Several strategies have been proposed to remove those biases, but the issue of abnormal karyotypes has received little attention. Many experiments are performed in cancer cell lines, which typically harbor large-scale copy number variations that create visible defects on the raw Hi-C maps. The consequences of these widespread artifacts on the normalized maps are mostly unexplored. We observed that current normalization methods are not robust to the presence of large-scale copy number variations, potentially obscuring biological differences and enhancing batch effects. To address this issue, we developed an alternative approach designed to take chromosomal abnormalities into account. The method, called OneD, increases reproducibility among replicates of Hi-C samples with abnormal karyotypes, significantly outperforming previous methods. On normal karyotypes, OneD performed as well as state-of-the-art methods, making it a safe choice for Hi-C normalization. OneD is fast and scales well in terms of computing resources for resolutions up to 5 kb.
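The abstract does not name the normalization methods it compared against. As context, one widely used Hi-C normalization is matrix balancing by iterative correction (ICE), which assumes each bin carries a single multiplicative bias and rescales rows and columns until every bin has the same total coverage. The standard-library sketch below implements that baseline idea only; it is not OneD, and the toy matrix and bias values are hypothetical.

```python
def ice_balance(matrix, n_iter=200, tol=1e-10):
    """Iterative correction of a symmetric contact matrix.

    Repeatedly divides m[i][j] by f_i * f_j, where f_i is row i's sum relative
    to the mean row sum, until all rows (and columns) sum to the same value.
    Assumes every row has a positive sum (empty bins must be filtered first).
    Returns the balanced matrix and the cumulative bias of each bin.
    """
    n = len(matrix)
    m = [row[:] for row in matrix]
    biases = [1.0] * n
    for _ in range(n_iter):
        row_sums = [sum(row) for row in m]
        mean_sum = sum(row_sums) / n
        factors = [s / mean_sum for s in row_sums]
        for i in range(n):
            for j in range(n):
                m[i][j] /= factors[i] * factors[j]
        biases = [b * f for b, f in zip(biases, factors)]
        if max(abs(f - 1.0) for f in factors) < tol:
            break
    return m, biases


# Hypothetical toy map: contacts decay with distance, then bin-specific
# multiplicative factors distort the observed counts.
n_bins = 5
true_bias = [1.0, 2.0, 1.0, 1.0, 0.5]
observed = [[true_bias[i] * true_bias[j] * 10.0 / (1 + abs(i - j))
             for j in range(n_bins)] for i in range(n_bins)]

balanced, biases = ice_balance(observed)
print([round(sum(row), 6) for row in balanced])
```

Because balancing forces equal coverage on every bin, any genuine coverage difference between bins, whatever its origin, is treated as bias and removed.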

TADbit flowchart.

Author: David Castillo (4258543)
Davide Baù (4258549)
François Serra (3448964)
Guillaume J. Filion (830737)
Marc A. Marti-Renom (135993)
Mike Goodstadt (4258552)

Main functions of the TADbit library, from FASTQ files to 3D model analysis. TADbit accepts many input data types, such as FASTQ files, interaction matrices and 3D models. A series of Python functions in TADbit (Supplementary Text) allows full analysis of the interaction data and interaction matrices, as well as of the derived 3D models.

FigShare

The Human Enhancer Blocker CTC-binding Factor Interacts with the Transcription Factor Kaiso

Author: Daniel Juliet
Defossez Pierre-Antoine
Filion Guillaume J. P.
Gilson Eric
Kelly Kevin F.
Magdinier Frédérique
Menoni Hervé
Nordgaard Curtis
Pérez-Torrado Roberto
Publication venue: 'American Society for Biochemistry & Molecular Biology (ASBMB)'
Publication date: 01/01/2005

CTC-binding factor (CTCF) is a DNA-binding protein of vertebrates that plays essential roles in regulating genome activity through its capacity to act as an enhancer blocker. We performed a yeast two-hybrid screen to identify protein partners of CTCF that could regulate its activity. Using full-length CTCF as bait, we recovered Kaiso, a POZ-zinc finger transcription factor, as a specific binding partner. The interaction occurs through a C-terminal region of CTCF and the POZ domain of Kaiso. CTCF and Kaiso are co-expressed in many tissues, and CTCF was specifically co-immunoprecipitated by several Kaiso monoclonal antibodies from nuclear lysates. Kaiso is a bimodal transcription factor that recognizes methylated CpG dinucleotides or a conserved unmethylated sequence (TNGCAGGA, the Kaiso binding site). We identified one consensus unmethylated Kaiso binding site in close proximity to the CTCF binding site in the human 5′ β-globin insulator. In an insulation assay, we found that the presence of this Kaiso binding site reduced the enhancer-blocking activity of CTCF. These data suggest that the Kaiso-CTCF interaction negatively regulates CTCF insulator activity.
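The consensus Kaiso binding site has a single degenerate position (TNGCAGGA, where N is any base), so it maps directly onto a regular expression. The sketch below scans both strands of a DNA sequence with Python's standard `re` module; the function names and the example sequence are hypothetical, and a sequence-only scan cannot detect the methylated-CpG binding mode described in the abstract.

```python
import re

# Consensus Kaiso binding site (KBS); N stands for any base.
KBS = "TNGCAGGA"
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def to_regex(motif):
    """Turn a consensus with N wildcards into a regular expression."""
    return motif.replace("N", "[ACGT]")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def find_sites(seq, motif=KBS):
    """Return sorted (start, strand) hits of `motif` in `seq`.

    Starts are 0-based forward-strand coordinates for both strands; a '-'
    hit means the reverse complement of the motif matches at that start.
    The lookahead lets overlapping hits be reported.
    """
    seq = seq.upper()
    hits = set()
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        for m in re.finditer("(?=" + to_regex(pattern) + ")", seq):
            hits.add((m.start(), strand))
    return sorted(hits)


print(find_sites("GGTCGCAGGATTTTCCTGCAAGG"))  # [(2, '+'), (13, '-')]
```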

Structural properties of the five described chromatin colors.

Author: David Castillo (4258543)
Davide Baù (4258549)
François Serra (3448964)
Guillaume J. Filion (830737)
Marc A. Marti-Renom (135993)
Mike Goodstadt (4258552)

(a) Distribution of each of the four structural properties (that is, accessibility, density, interactions, and angle) grouped by chromatin color (including the undefined “white” color for particles of non-homogeneous coloring). Statistical significance of the differences was computed with Tukey’s ‘Honest Significant Difference’ test (*: p < 0.01, ***: p < 0.001, ns: non-significant). (b) Schematic representation of the structural properties of the five colors for the Drosophila chromatin.
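Tukey's HSD test compares every pair of groups with the studentized range statistic; for unequal group sizes (the Tukey-Kramer form) it is q = |mean_i - mean_j| / sqrt((MSE / 2) * (1/n_i + 1/n_j)), where MSE is the pooled within-group variance with N - k degrees of freedom. The sketch below computes q for every pair using only the standard library. Converting q to a p-value requires the studentized range distribution (SciPy 1.7 and later provide `scipy.stats.studentized_range`), which is omitted here; the group names and values are hypothetical.

```python
import math
from itertools import combinations

def tukey_q(groups):
    """Studentized range statistic q for every pair of groups.

    groups: dict mapping group name -> list of observations
    Returns ({(a, b): q}, residual degrees of freedom N - k).
    """
    k = len(groups)
    n_total = sum(len(v) for v in groups.values())
    means = {g: sum(v) / len(v) for g, v in groups.items()}
    # Pooled within-group variance (the one-way ANOVA mean squared error).
    ss_within = sum((x - means[g]) ** 2 for g, v in groups.items() for x in v)
    df = n_total - k
    mse = ss_within / df
    q = {}
    for a, b in combinations(groups, 2):
        se = math.sqrt(mse / 2 * (1 / len(groups[a]) + 1 / len(groups[b])))
        q[(a, b)] = abs(means[a] - means[b]) / se
    return q, df


# Hypothetical accessibility values for three chromatin colors.
groups = {
    "black": [0.21, 0.25, 0.19, 0.23, 0.22],
    "blue": [0.30, 0.33, 0.29, 0.31],
    "yellow": [0.45, 0.41, 0.47, 0.44, 0.43],
}
q, df = tukey_q(groups)
for pair, value in q.items():
    print(pair, round(value, 2), "df =", df)
```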

FigShare