57 research outputs found
Computational analysis of transcriptional regulation in metazoans
This HDR thesis presents my work on transcriptional regulation in metazoans (animals). As a computational biologist, my research activities cover both the development of new bioinformatics tools and contributions to a better understanding of biological questions. The first part focuses on transcription factors, with a study of the evolution of the Hox and ParaHox gene families across metazoans, for which I developed HoxPred, a bioinformatics tool that automatically classifies these genes into their groups of homology. Transcription factors regulate their target genes by binding to short cis-regulatory elements in DNA. The second part of this thesis introduces the prediction of these cis-regulatory elements in genomic sequences, and my contributions to the development of user-friendly computational tools (the RSAT software suite and TRAP). The third part covers the detection of these cis-regulatory elements using high-throughput sequencing experiments such as ChIP-seq or ChIP-exo. The bioinformatics developments include reusable pipelines to process these datasets, and novel motif analysis tools adapted to these large datasets (RSAT peak-motifs and ExoProfiler). As all these approaches are generic, I apply them to diverse biological questions, in close collaboration with experimental groups. In particular, this third part presents the studies that uncovered new DNA sequences that drive or prevent the binding of the glucocorticoid receptor. Finally, my research perspectives are introduced, especially regarding further developments within the RSAT suite enabling cross-species conservation analyses, and new collaborations with experimental teams, notably to tackle epigenomic remodelling during osteoporosis.
Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni: comment
A reanalysis of the sequences reported by Hoegg et al. has highlighted the presence of a putative HoxC1a gene in Astatotilapia burtoni. We discuss the evolutionary history of the HoxC1a gene in the teleost fish lineages and suggest that the HoxC1a gene was lost twice independently in the Neoteleosts. This comment points out that combining several gene-finding methods and a Hox-dedicated program can improve the identification of Hox genes.
HoxPred: automated classification of Hox proteins using combinations of generalised profiles
BACKGROUND: Correct identification of individual Hox proteins is an essential basis for their study in diverse research fields. Common methods to classify Hox proteins focus on the homeodomain that characterises homeobox transcription factors. Classification is hampered by the high conservation of this short domain. Phylogenetic tree reconstruction is a widely used but time-consuming classification method. RESULTS: We have developed an automated procedure, HoxPred, that classifies Hox proteins in their groups of homology. The method relies on a discriminant analysis that classifies Hox proteins according to their scores for a combination of protein generalised profiles. 54 generalised profiles dedicated to each Hox homology group were produced de novo from a curated dataset of vertebrate Hox proteins. Several classification methods were investigated to select the most accurate discriminant functions. These functions were then incorporated into the HoxPred program. CONCLUSION: HoxPred shows a mean accuracy of 97%. Predictions on the recently sequenced stickleback fish proteome identified 44 Hox proteins, including HoxC1a, only found so far in zebrafish. Using the Uniprot databank, we demonstrate that HoxPred can efficiently contribute to large-scale automatic annotation of Hox proteins into their paralogous groups. As orthologous group predictions show a higher risk of misclassification, they should be corroborated by additional supporting evidence. HoxPred is accessible via SOAP and Web interface at http://cege.vub.ac.be/hoxpred/. Complete datasets, results and source code are available at the same site.
A non-tree-based comprehensive study of metazoan Hox and ParaHox genes prompts new insights into their origin and evolution
Hox and the closely related ParaHox genes, which emerged prior to the divergence between cnidarians and bilaterians, are the best-known members of the ancient genetic toolkit that controls embryonic development across all metazoans. However, fundamental questions about their origin and evolutionary relationships remain unresolved. Here we investigate the evolution of metazoan Hox and ParaHox genes using the HoxPred program, which allows the identification of Hox genes without the need for phylogenetic tree reconstruction.
Theoretical and empirical quality assessment of transcription factor-binding motifs
Position-specific scoring matrices (PSSMs) are routinely used to predict transcription factor (TF)-binding sites in genome sequences. However, their reliability in predicting novel binding sites can be far from optimal, due to the use of a small number of training sites or an inappropriate choice of parameters when building the matrix or when scanning sequences with it. Measures of matrix quality such as E-value and information content rely on theoretical models, and may fail in the context of full genome sequences. We propose a method, implemented in the program ‘matrix-quality’, that combines theoretical and empirical score distributions to assess the reliability of PSSMs for predicting TF-binding sites. We applied ‘matrix-quality’ to estimate the predictive capacity of matrices for bacterial, yeast and mouse TFs. The evaluation of matrices from RegulonDB revealed some poorly predictive motifs, and allowed us to quantify the improvements obtained by applying multi-genome motif discovery. Interestingly, the method reveals differences between global and specific regulators. It also highlights the enrichment of binding sites in sequence sets obtained from high-throughput ChIP-chip (bacterial and yeast TFs) and ChIP-seq (mouse TFs) experiments. The method presented here has many applications, including: selecting reliable motifs before scanning sequences; improving motif collections in TF databases; and evaluating motifs discovered using high-throughput data sets.
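To make the idea of comparing theoretical and empirical score distributions concrete, the following is a minimal Python sketch and not the ‘matrix-quality’ implementation. It assumes a uniform background model, a log-odds PSSM built with a simple pseudocount, and scores discretised to two decimals; all function names and the example sites are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"
UNIFORM = {b: 0.25 for b in BASES}  # assumed background model


def build_pssm(sites, background=UNIFORM, pseudocount=1.0):
    """Build a log-odds PSSM (one dict per column) from aligned sites."""
    n = len(sites)
    pssm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        column = {}
        for b in BASES:
            # Pseudocount is spread over bases according to the background.
            freq = (counts[b] + pseudocount * background[b]) / (n + pseudocount)
            column[b] = math.log(freq / background[b])
        pssm.append(column)
    return pssm


def scan(pssm, sequence):
    """Score every window of the sequence (single strand only)."""
    w = len(pssm)
    return [
        sum(col[base] for col, base in zip(pssm, sequence[i:i + w]))
        for i in range(len(sequence) - w + 1)
    ]


def theoretical_pvalue(pssm, threshold, background=UNIFORM, precision=100):
    """P(score >= threshold) under the background model.

    The score distribution is computed column by column by dynamic
    programming on scores discretised to 1/precision units.
    """
    dist = {0: 1.0}
    for col in pssm:
        nxt = {}
        for score, prob in dist.items():
            for b in BASES:
                key = score + round(col[b] * precision)
                nxt[key] = nxt.get(key, 0.0) + prob * background[b]
        dist = nxt
    cutoff = round(threshold * precision)
    return sum(p for s, p in dist.items() if s >= cutoff)


def empirical_pvalue(pssm, sequences, threshold):
    """Fraction of observed windows scoring at or above the threshold."""
    scores = [s for seq in sequences for s in scan(pssm, seq)]
    return sum(s >= threshold for s in scores) / len(scores)


if __name__ == "__main__":
    sites = ["TGACGT", "TGACGT", "TGACGC", "TGATGT"]  # hypothetical sites
    pssm = build_pssm(sites)
    sequences = ["AAATGACGTAAA"]
    threshold = 4.0
    print("theoretical:", theoretical_pvalue(pssm, threshold))
    print("empirical:  ", empirical_pvalue(pssm, sequences, threshold))
```

An empirical frequency well above the theoretical one at high score thresholds suggests that the sequence set is enriched in sites matching the matrix, which is the kind of contrast the abstract describes. A realistic analysis would also scan the reverse strand and use a background model estimated from the genome.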
Origin and diversification of the basic helix-loop-helix gene family in metazoans: insights from comparative genomics
BACKGROUND: Molecular and genetic analyses conducted in model organisms such as Drosophila and vertebrates have provided a wealth of information about how networks of transcription factors control the proper development of these species. Much less is known, however, about the evolutionary origin of these elaborate networks and their large-scale evolution. Here we report the first evolutionary analysis of a whole superfamily of transcription factors, the basic helix-loop-helix (bHLH) proteins, at the scale of the whole metazoan kingdom. RESULTS: We identified in silico the putative full complement of bHLH genes in the sequenced genomes of 12 different species representative of the main metazoan lineages, including three non-bilaterian metazoans, the cnidarians Nematostella vectensis and Hydra magnipapillata and the demosponge Amphimedon queenslandica. We performed extensive phylogenetic analyses of the 695 identified bHLHs, which allowed us to allocate most of these bHLHs to defined, evolutionarily conserved groups of orthology. CONCLUSION: Three main features in the history of the bHLH gene superfamily can be inferred from these analyses: (i) an initial diversification of the bHLHs occurred in the pre-Cambrian, prior to metazoan cladogenesis; (ii) a second expansion of the bHLH superfamily occurred early in metazoan evolution, before bilaterians and cnidarians diverged; and (iii) the bHLH complement has been remarkably stable during the evolution of the bilaterians. We suggest that these features may be extended to other developmental gene families and reflect a general trend in the evolution of the developmental gene repertoires of metazoans.
Evolutionary study of the Hox gene family with matrix-based bioinformatics approaches
Hox transcription factors are extensively investigated in diverse fields of molecular and evolutionary biology. Hox genes belong to the family of homeobox transcription factors, characterised by a 60-amino-acid region called the homeodomain. These genes are evolutionarily conserved and play crucial roles in the development of animals. In particular, they are involved in the specification of segmental identity and in tetrapod limb differentiation. In vertebrates, this family of genes can be divided into 14 groups of homology. Common methods to classify Hox proteins focus on the homeodomain. Classification is, however, hampered by the high conservation of this short domain. Since phylogenetic tree reconstruction is time-consuming, it is not suitable for classifying the growing number of Hox sequences. The first goal of this thesis is therefore to design an automated approach to classify vertebrate Hox proteins in their groups of homology. This approach classifies Hox proteins on the basis of their scores for a combination of protein generalised profiles. The resulting program, HoxPred, combines predictive accuracy and time efficiency. We used this program to detect and classify Hox genes in several teleost fish genomes. In particular, it allowed us to clarify the evolutionary history of the HoxC1a genes in teleosts. Overall, HoxPred could efficiently contribute to the bioinformatics toolbox commonly used to annotate vertebrate Hox sequences. This program was then evaluated in non-vertebrate species. Although not intended for the classification of Hox proteins in distantly related species, HoxPred showed a high accuracy in bilaterians. It has also given insights into the evolutionary relationships between bilaterian posterior Hox genes, which are notoriously difficult to classify with phylogenetic trees.
As transcription factors, Hox proteins regulate target genes by specifically binding DNA on cis-regulatory elements. Only a few of these target genes have been identified so far. The second goal of this work was to evaluate whether computational approaches can be applied to detect Hox cis-regulatory elements in genomic sequences. Regulatory Sequence Analysis Tools (RSAT) is a suite of bioinformatics tools dedicated to the detection of cis-regulatory elements in genomes. We participated in the development of matrix-based pattern-matching approaches in RSAT. After performing a statistical validation of the pattern-matching scores, we focused on a case study based on the vertebrate HoxB1 protein, which binds DNA with its cofactors Pbx and Meis. This study aimed at predicting combinations of cis-regulatory elements for these three transcription factors.
Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni: comment-1
Copyright information: Taken from "Comparative phylogenomic analyses of teleost fish Hox gene clusters: lessons from the cichlid fish Astatotilapia burtoni: comment", http://www.biomedcentral.com/1471-2164/9/35. BMC Genomics 2008;9:35. Published online 24 Jan 2008. PMCID: PMC2246111. in the phylogeny is hypothetic