93 research outputs found

    Predicting transcription factor binding sites using local over-representation and comparative genomics

    BACKGROUND: Identifying cis-regulatory elements is crucial to understanding gene expression, which highlights the importance of the computational detection of overrepresented transcription factor binding sites (TFBSs) in coexpressed or coregulated genes. However, this is a challenging problem, especially when considering higher eukaryotic organisms. RESULTS: We have developed a method, named TFM-Explorer, that searches for locally overrepresented TFBSs in a set of coregulated genes, which are modeled by profiles provided by a database of position weight matrices. The novelty of the method is that it takes advantage of spatial conservation in the sequence and supports multiple species. The efficiency of the underlying algorithm and its robustness to noise allow weak regulatory signals to be detected in large heterogeneous data sets. CONCLUSION: TFM-Explorer provides an efficient way to predict TFBS overrepresentation in related sequences. Promising results were obtained in a variety of examples in human, mouse, and rat genomes. The software is publicly available at
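The idea of scoring sequences with a position weight matrix and then looking for a region where hits cluster across coregulated sequences can be illustrated with a small sketch. This is not TFM-Explorer's algorithm or its statistical model: the count matrix, the threshold of 4.0, the toy sequences, and the function names `log_odds_matrix`, `scan`, and `densest_window` are all assumptions made for illustration, and the raw hit count used here stands in for a proper over-representation statistic.

```python
import math

# Hypothetical 4-position count matrix (illustrative values, not from any database).
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert a count matrix into a log-odds position weight matrix."""
    width = len(next(iter(counts.values())))
    pwm = []
    for pos in range(width):
        total = sum(counts[base][pos] for base in "ACGT") + 4 * pseudocount
        pwm.append({
            base: math.log2(((counts[base][pos] + pseudocount) / total) / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return start positions whose PWM score reaches the threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous symbols
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append(start)
    return hits


def densest_window(hit_lists, seq_length, window_size):
    """Find the first window start, shared by all sequences, covering the most hits."""
    best_start, best_count = 0, -1
    for start in range(seq_length - window_size + 1):
        count = sum(
            1 for hits in hit_lists for h in hits if start <= h < start + window_size
        )
        if count > best_count:
            best_start, best_count = start, count
    return best_start, best_count


PWM = log_odds_matrix(COUNTS)
# Toy "promoters" of equal length, assumed to be aligned on a common reference point.
SEQUENCES = ["TTTTACGTTTTTTTTT", "GGGGGACGTGGGGGGG", "CCCACGTCCCCCCCCC"]
HITS = [scan(seq, PWM, threshold=4.0) for seq in SEQUENCES]
BEST = densest_window(HITS, seq_length=16, window_size=4)
print(HITS, BEST)
```

A real analysis would replace the raw count with a significance estimate against a background model, which is the part the abstract attributes to the method's statistical evaluation.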

    Combining single-cell RNA-sequencing with a molecular atlas unveils new markers for Caenorhabditis elegans neuron classes

    Single-cell RNA-sequencing (scRNA-seq) of the Caenorhabditis elegans nervous system offers the unique opportunity to obtain a partial expression profile for each neuron within a known connectome. Building on recent scRNA-seq data and on a molecular atlas describing the expression pattern of ∼800 genes at single-cell resolution, we designed an iterative clustering analysis aiming to match each cell cluster to the ∼100 anatomically defined neuron classes of C. elegans. This heuristic approach successfully assigned 97 of the 118 neuron classes to a cluster. Sixty-two clusters were assigned to a single neuron class and 15 clusters grouped neuron classes sharing close molecular signatures. Pseudotime analysis revealed a maturation process occurring in some neurons (e.g. PDA) during the L2 stage. Based on the molecular profiles of all identified neurons, we predicted cell fate regulators and experimentally validated unc-86 for the normal differentiation of RMG neurons. Furthermore, we observed that different classes of genes functionally diversify sensory neurons, interneurons and motor neurons. Finally, we designed 15 new neuron class-specific promoters validated in vivo. Among them, 10 represent the only specific promoter reported to date, expanding the list of neurons amenable to genetic manipulation.
    Affiliation: Lorenzo Lopez, Juan Ramiro. Consejo Nacional de Investigaciones Científicas y Técnicas. Centro Científico Tecnológico Conicet - Tandil. Centro de Investigación Veterinaria de Tandil. Universidad Nacional del Centro de la Provincia de Buenos Aires. Centro de Investigación Veterinaria de Tandil. Provincia de Buenos Aires. Gobernación. Comision de Investigaciones Científicas. Centro de Investigación Veterinaria de Tandil; Argentina. Université Libre de Bruxelles; Belgium
    Affiliation: Onizuka, Michiho. Université Libre de Bruxelles; Belgium
    Affiliation: Defrance, Matthieu. Université Libre de Bruxelles; Belgium
    Affiliation: Laurent, Patrick. Université Libre de Bruxelles; Belgium

    Proteins Do Not Have Strong Spines After All

    In this issue of Structure, Berkholz et al. show that the detailed backbone geometry of proteins depends on the local conformation and suggest how this information can be practically used in modeling and refining protein structures.

    DNA methylation profiling identifies epigenetic dysregulation in pancreatic islets from type 2 diabetic patients

    The first genome-scale DNA methylation study on pancreatic islets from type 2 diabetic patients identifies disease-associated DNA methylation patterns that translate into aberrant gene expression in novel factors relevant for β-cell function and survival.

    5-hydroxymethylcytosine marks promoters in colon that resist DNA hypermethylation in cancer

    Background: The discovery of cytosine hydroxymethylation (5hmC) as a mechanism that potentially controls DNA methylation changes typical of neoplasia prompted us to investigate its behaviour in colon cancer. 5hmC is globally reduced in proliferating cells such as colon tumours and the gut crypt progenitors, from which tumours can arise.
    Results: Here, we show that colorectal tumours and cancer cells express Ten-Eleven-Translocation (TET) transcripts at levels similar to normal tissues. Genome-wide analyses show that promoters marked by 5hmC in normal tissue, and those identified as TET2 targets in colorectal cancer cells, are resistant to methylation gain in cancer. In vitro studies of TET2 in cancer cells confirm that these promoters are resistant to methylation gain independently of sustained TET2 expression. We also find that a considerable number of the methylation gain-resistant promoters marked by 5hmC in normal colon overlap with those that are marked with poised bivalent histone modifications in embryonic stem cells.
    Conclusions: Together our results indicate that promoters that acquire 5hmC upon normal colon differentiation are innately resistant to neoplastic hypermethylation by mechanisms that do not require high levels of 5hmC in tumours. Our study highlights the potential of cytosine modifications as biomarkers of cancerous cell proliferation.
    Acknowledgements: The authors would like to acknowledge the support of The University of Cambridge, Cancer Research UK (CRUK SEB-Institute Group Award A ref10182; CRUK Senior fellowship C10112/A11388 to AEKI) and Hutchison Whampoa Limited. The Human Research Tissue Bank is supported by the NIHR Cambridge Biomedical Research Centre. FF is a ULB Professor funded by grants from the F.N.R.S. and Télévie, the IUAP P7/03 programme, the ARC (AUWB-2010-2015 ULB-No 7), the WB Health program and the Fonds Gaston Ithier.
    Data access: http://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?token=jpwzvsowiyuamzs&acc=GSE47592

    Regulation of DNA Methylation Patterns by CK2-Mediated Phosphorylation of Dnmt3a

    DNA methylation is a central epigenetic modification that is established by de novo DNA methyltransferases. The mechanisms underlying the generation of genomic methylation patterns are still poorly understood. Using mass spectrometry and a phosphospecific Dnmt3a antibody, we demonstrate that CK2 phosphorylates endogenous Dnmt3a at two key residues located near its PWWP domain, thereby downregulating the ability of Dnmt3a to methylate DNA. Genome-wide DNA methylation analysis shows that CK2 primarily modulates CpG methylation of several repeats, most notably of Alu SINEs. This modulation can be directly attributed to CK2-mediated phosphorylation of Dnmt3a. We also find that CK2-mediated phosphorylation is required for localization of Dnmt3a to heterochromatin. By revealing phosphorylation as a mode of regulation of de novo DNA methyltransferase function and by uncovering a mechanism for the regulation of methylation at repetitive elements, our results shed light on the origin of DNA methylation patterns.

    Algorithmes pour l'analyse de régions régulatrices dans le génome d'eucaryotes supérieurs (Algorithms for the analysis of regulatory regions in the genomes of higher eukaryotes)

    The work presented in this thesis is oriented towards computational genome analysis. More precisely, it focuses on gene expression and cis-regulatory elements. The problem of identifying cis-regulatory elements can be treated as a computational motif-finding problem. Finding such motifs is challenging because of their low specificity. To address this, new kinds of information need to be taken into account: cross-species conservation (comparative genomics), over-represented signals such as those shared by co-regulated genes, or even spatial conservation of binding site locations. We have developed a method that searches for locally overrepresented regulatory motifs in a set of regulated genes. This method takes advantage of spatial conservation in the sequence and supports multiple species. The method has been implemented in software named TFM-Explorer. Promising results were obtained in a variety of examples in human, mouse, and rat genomes.
    French abstract (translated): The work presented in this thesis falls within the bioinformatics field of genome analysis. More specifically, it concerns gene expression and the regulatory elements, present in DNA, that help modulate this expression. The problem of finding these regulatory elements can be viewed computationally as a search for particular approximate motifs. Finding regulatory motifs is difficult because the motifs sought have low specificity. Addressing it requires taking several forms of information into account. In particular, it is relevant to consider conservation between species (comparative genomics), conservation among genomic sequences that share regulatory elements (co-regulated genes) and, in some cases, the spatial conservation of binding sites. In this setting, we propose a method that exploits both spatial conservation and conservation between species. The approach consists of a local search algorithm and statistical evaluators suited to finding locally over-represented motifs when the search environment is heterogeneous, that is, for sequences that may come from different organisms or from different regions of the genome. This work was implemented in a software package called TFM-Explorer, which we evaluated successfully on data from the human, mouse, and rat genomes.

    Contrastive self-supervised clustering of scRNA-seq data.

    Single-cell RNA sequencing (scRNA-seq) has emerged as a main strategy to study transcriptional activity at the cellular level. Clustering analysis is routinely performed on scRNA-seq data to explore, recognize or discover underlying cell identities. The high dimensionality of scRNA-seq data and its significant sparsity, accentuated by frequent dropout events that introduce false zero-count observations, make the clustering analysis computationally challenging. Even though multiple scRNA-seq clustering techniques have been proposed, there is no consensus on the best-performing approach. On a parallel research track, self-supervised contrastive learning recently achieved state-of-the-art results on image clustering and, subsequently, image classification.

    GNN-based embedding for clustering scRNA-seq data

    Motivation: Single-cell RNA sequencing (scRNA-seq) provides transcriptomic profiling for individual cells, allowing researchers to study the heterogeneity of tissues, recognize rare cell identities and discover new cellular subtypes. Clustering analysis is usually used to predict cell class assignments and infer cell identities. However, the high sparsity of scRNA-seq data, accentuated by dropout events, generates challenges that have motivated the development of numerous dedicated clustering methods. Nevertheless, there is still no consensus on the best-performing method. Results: graph-sc is a new method leveraging a graph autoencoder network to create embeddings for scRNA-seq cell data. While this work analyzes the performance of clustering the embeddings with various clustering algorithms, other downstream tasks can also be performed. A broad experimental study has been performed on both simulated and scRNA-seq datasets. The results indicate that although there is no consistently best method across all the analyzed datasets, graph-sc compares favorably to competing techniques across all types of datasets. Furthermore, the proposed method is stable across consecutive runs, robust to input down-sampling, generally insensitive to changes in the network architecture or training parameters and more computationally efficient than other competing methods based on neural networks. Modeling the data as a graph provides increased flexibility to define custom features characterizing the genes, the cells and their interactions. Moreover, external data (e.g. gene network) can easily be integrated into the graph and used seamlessly under the same optimization task. Availability and implementation: https://github.com/ciortanmadalina/graph-sc. Supplementary information: Supplementary data are available at Bioinformatics online.
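As a rough illustration of what modeling scRNA-seq data as a graph can mean, the sketch below turns a toy count matrix into a weighted bipartite edge list linking cells to the genes they express. It is not taken from the graph-sc code base: the matrix values, the node names, and the `bipartite_edges` helper are assumptions, and the graph autoencoder that would consume such a graph is left out.

```python
# Toy count matrix: rows are cells, columns are genes (illustrative values only).
COUNTS = [
    [0, 3, 0],
    [2, 0, 0],
    [0, 1, 4],
]
CELLS = ["cell_0", "cell_1", "cell_2"]
GENES = ["gene_a", "gene_b", "gene_c"]


def bipartite_edges(counts, cells, genes):
    """List (cell, gene, weight) edges for every non-zero count."""
    edges = []
    for cell, row in zip(cells, counts):
        for gene, value in zip(genes, row):
            if value > 0:  # zero counts, including dropouts, produce no edge
                edges.append((cell, gene, value))
    return edges


EDGES = bipartite_edges(COUNTS, CELLS, GENES)
# Fraction of cell-gene pairs with no observed count.
SPARSITY = 1 - len(EDGES) / (len(CELLS) * len(GENES))
print(EDGES, round(SPARSITY, 3))
```

With this representation, extra node features or external edges such as a gene network could be appended to the same edge list, which is the kind of flexibility the abstract describes.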