334 research outputs found

    Computational analysis of transcriptional regulation in metazoans

    This HDR thesis presents my work on transcriptional regulation in metazoans (animals). As a computational biologist, my research activities cover both the development of new bioinformatics tools and contributions to a better understanding of biological questions. The first part focuses on transcription factors, with a study of the evolution of Hox and ParaHox gene families across metazoans, for which I developed HoxPred, a bioinformatics tool to automatically classify these genes into their groups of homology. Transcription factors regulate their target genes by binding to short cis-regulatory elements in DNA. The second part of this thesis introduces the prediction of these cis-regulatory elements in genomic sequences, and my contributions to the development of user-friendly computational tools (RSAT software suite and TRAP). The third part covers the detection of these cis-regulatory elements using high-throughput sequencing experiments such as ChIP-seq or ChIP-exo. The bioinformatics developments include reusable pipelines to process these datasets, and novel motif analysis tools adapted to these large datasets (RSAT peak-motifs and ExoProfiler). As all these approaches are generic, I naturally apply them to diverse biological questions, in close collaboration with experimental groups. In particular, this third part presents the studies uncovering new DNA sequences that drive or prevent the binding of the glucocorticoid receptor. Finally, my research perspectives are introduced, especially regarding further developments within the RSAT suite enabling cross-species conservation analyses, and new collaborations with experimental teams, notably to tackle epigenomic remodelling during osteoporosis.

    Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni: comment

    A reanalysis of the sequences reported by Hoegg et al. has highlighted the presence of a putative HoxC1a gene in Astatotilapia burtoni. We discuss the evolutionary history of the HoxC1a gene in the teleost fish lineages and suggest that the HoxC1a gene was lost twice independently in the Neoteleosts. This comment points out that combining several gene-finding methods with a Hox-dedicated program can improve the identification of Hox genes.

    HoxPred: automated classification of Hox proteins using combinations of generalised profiles

    Background: Correct identification of individual Hox proteins is an essential basis for their study in diverse research fields. Common methods to classify Hox proteins focus on the homeodomain that characterises homeobox transcription factors. Classification is hampered by the high conservation of this short domain. Phylogenetic tree reconstruction is a widely used but time-consuming classification method. Results: We have developed an automated procedure, HoxPred, that classifies Hox proteins in their groups of homology. The method relies on a discriminant analysis that classifies Hox proteins according to their scores for a combination of protein generalised profiles. 54 generalised profiles dedicated to each Hox homology group were produced de novo from a curated dataset of vertebrate Hox proteins. Several classification methods were investigated to select the most accurate discriminant functions. These functions were then incorporated into the HoxPred program. Conclusion: HoxPred shows a mean accuracy of 97%. Predictions on the recently sequenced stickleback fish proteome identified 44 Hox proteins, including HoxC1a, only found so far in zebrafish. Using the Uniprot databank, we demonstrate that HoxPred can efficiently contribute to large-scale automatic annotation of Hox proteins into their paralogous groups. As orthologous group predictions show a higher risk of misclassification, they should be corroborated by additional supporting evidence. HoxPred is accessible via SOAP and Web interface at http://cege.vub.ac.be/hoxpred/. Complete datasets, results and source code are available at the same site.

    A non-tree-based comprehensive study of metazoan Hox and ParaHox genes prompts new insights into their origin and evolution

    Hox and the closely related ParaHox genes, which emerged prior to the divergence between cnidarians and bilaterians, are the best-known members of the ancient genetic toolkit that controls embryonic development across all metazoans. Fundamental questions about their origin and evolutionary relationships remain, however, unresolved. Here we investigate the evolution of metazoan Hox and ParaHox genes using the HoxPred program, which allows the identification of Hox genes without the need for phylogenetic tree reconstruction.

    Theoretical and empirical quality assessment of transcription factor-binding motifs

    Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability in predicting novel binding sites can be far from optimal, due to the use of a small number of training sites or an inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program ‘matrix-quality’, that combines theoretical and empirical score distributions to assess the reliability of PSSMs for predicting TF-binding sites. We applied ‘matrix-quality’ to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip (bacterial and yeast TFs) and ChIP-seq (mouse TFs) experiments. The method presented here has many applications, including: selecting reliable motifs before scanning sequences; improving motif collections in TF databases; and evaluating motifs discovered using high-throughput data sets.
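
To make the idea of scanning with a PSSM and inspecting an empirical score distribution concrete, here is a minimal sketch in Python. It is not the ‘matrix-quality’ implementation: the toy binding sites, the uniform background, the pseudocount scheme and every function name below are assumptions chosen for illustration only.

```python
import math
import random

BASES = "ACGT"

def build_pssm(sites, background=None, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from aligned, equal-length sites."""
    if background is None:
        background = {b: 0.25 for b in BASES}  # assumption: uniform background model
    pssm = []
    for i in range(len(sites[0])):
        column = [site[i] for site in sites]
        weights = {}
        for b in BASES:
            # Pseudocount shared out according to the background frequencies.
            freq = (column.count(b) + pseudocount * background[b]) / (len(sites) + pseudocount)
            weights[b] = math.log(freq / background[b])
        pssm.append(weights)
    return pssm

def score_window(pssm, window):
    """Sum the column weights for one window of the same width as the matrix."""
    return sum(col[base] for col, base in zip(pssm, window))

def scan(pssm, sequence):
    """Return (position, score) for every window on the forward strand."""
    width = len(pssm)
    return [(i, score_window(pssm, sequence[i:i + width]))
            for i in range(len(sequence) - width + 1)]

def fraction_at_or_above(scores, threshold):
    """Empirical fraction of scores reaching a threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

# Hypothetical training sites (toy data, not taken from any database).
toy_sites = ["TGACTCA", "TGAGTCA", "TGACTCA", "TGACTAA"]
pssm = build_pssm(toy_sites)

# Scan one sequence and keep the best-scoring window.
hits = scan(pssm, "CCCTGACTCAGGG")
best_position, best_score = max(hits, key=lambda hit: hit[1])

# Empirical score distribution on random sequences (fixed seed for repeatability).
rng = random.Random(0)
random_scores = []
for _ in range(200):
    seq = "".join(rng.choice(BASES) for _ in range(50))
    random_scores.extend(score for _, score in scan(pssm, seq))

# How often random windows score at least as high as the best window above.
empirical_rate = fraction_at_or_above(random_scores, best_score)
print(best_position, round(best_score, 3), empirical_rate)
```

Comparing `empirical_rate` with the rate expected under a theoretical background model is the kind of contrast the abstract describes; a real analysis would also scan the reverse strand and use genome-scale sequence sets.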

    MEME-ChIP: motif analysis of large DNA datasets

    Motivation: Advances in high-throughput sequencing have resulted in rapid growth in large, high-quality datasets including those arising from transcription factor (TF) ChIP-seq experiments. While there are many existing tools for discovering TF binding site motifs in such datasets, most web-based tools cannot directly process such large datasets.

    A naturally occuring insertion of a single amino acid rewires transcriptional regulation by glucocorticoid receptor isoforms

    In addition to guiding proteins to defined genomic loci, DNA can act as an allosteric ligand that influences protein structure and activity. Here we compared genome-wide binding, transcriptional regulation, and, using NMR, the conformation of two glucocorticoid receptor (GR) isoforms that differ by a single amino acid insertion in the lever arm, a domain that adopts DNA sequence-specific conformations. We show that these isoforms differentially regulate gene expression levels through two mechanisms: differential DNA binding and altered communication between GR domains. Our studies suggest a versatile role for DNA in both modulating GR activity and also in directing the use of GR isoforms. We propose that the lever arm is a "fulcrum" for bidirectional allosteric signaling, conferring conformational changes in the DNA reading head that influence DNA sequence selectivity, as well as conferring changes in the dimerization domain that connect functionally with remote regulatory surfaces, thereby influencing which genes are regulated and the magnitude of their regulation

    Origin and diversification of the basic helix-loop-helix gene family in metazoans: insights from comparative genomics

    BACKGROUND: Molecular and genetic analyses conducted in model organisms such as Drosophila and vertebrates have provided a wealth of information about how networks of transcription factors control the proper development of these species. Much less is known, however, about the evolutionary origin of these elaborate networks and their large-scale evolution. Here we report the first evolutionary analysis of a whole superfamily of transcription factors, the basic helix-loop-helix (bHLH) proteins, at the scale of the whole metazoan kingdom. RESULTS: We identified in silico the putative full complement of bHLH genes in the sequenced genomes of 12 different species representative of the main metazoan lineages, including three non-bilaterian metazoans, the cnidarians Nematostella vectensis and Hydra magnipapillata and the demosponge Amphimedon queenslandica. We performed extensive phylogenetic analyses of the 695 identified bHLHs, which allowed us to allocate most of them to defined, evolutionarily conserved groups of orthology. CONCLUSION: Three main features in the history of the bHLH gene superfamily can be inferred from these analyses: (i) an initial diversification of the bHLHs occurred in the pre-Cambrian, prior to metazoan cladogenesis; (ii) a second expansion of the bHLH superfamily occurred early in metazoan evolution, before bilaterians and cnidarians diverged; and (iii) the bHLH complement has been remarkably stable during the evolution of the bilaterians. We suggest that these features may extend to other developmental gene families and reflect a general trend in the evolution of the developmental gene repertoires of metazoans.

    RSAT 2011: regulatory sequence analysis tools

    RSAT (Regulatory Sequence Analysis Tools) comprises a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. Thirteen new programs have been added to the 30 described in the 2008 NAR Web Software Issue, including an automated sequence retrieval from EnsEMBL (retrieve-ensembl-seq), two novel motif discovery algorithms (oligo-diff and info-gibbs), a 100-times faster version of matrix-scan enabling the scanning of genome-scale sequence sets, and a series of facilities for random model generation and statistical evaluation (random-genome-fragments, random-motifs, random-sites, implant-sites, sequence-probability, permute-matrix). Our most recent work also focused on motif comparison (compare-matrices) and evaluation of motif quality (matrix-quality) by combining theoretical and empirical measures to assess the predictive capability of position-specific scoring matrices. To process large collections of peak sequences obtained from ChIP-seq or related technologies, RSAT provides a new program (peak-motifs) that combines several efficient motif discovery algorithms to predict transcription factor binding motifs, match them against motif databases and predict their binding sites. Availability (web site, stand-alone programs and SOAP/WSDL (Simple Object Access Protocol/Web Services Description Language) web services): http://rsat.ulb.ac.be/rsat/

    Assigning roles to DNA regulatory motifs using comparative genomics

    Motivation: Transcription factors (TFs) are crucial during the lifetime of the cell. Their functional roles are defined by the genes they regulate. Uncovering these roles not only sheds light on the TF at hand but puts it into the context of the complete regulatory network.