
    More robust detection of motifs in coexpressed genes by using phylogenetic information

    BACKGROUND: Several motif detection algorithms have been developed to discover overrepresented motifs in sets of coexpressed genes. However, in a noisy gene list, the number of genes containing the motif versus the number lacking the motif might not be sufficiently high to allow detection by classical motif detection tools. To still recover motifs which are not significantly enriched but still present, we developed a procedure in which we use phylogenetic footprinting to first delineate all potential motifs in each gene. Then we mutually compare all detected motifs and identify the ones that are shared by at least a few genes in the data set as potential candidates. RESULTS: We applied our methodology to a compiled test data set containing known regulatory motifs and to two biological data sets derived from genome wide expression studies. By executing four consecutive steps of 1) identifying conserved regions in orthologous intergenic regions, 2) aligning these conserved regions, 3) clustering the conserved regions containing similar regulatory regions followed by extraction of the regulatory motifs and 4) screening the input intergenic sequences with detected regulatory motif models, our methodology proves to be a powerful tool for detecting regulatory motifs when a low signal to noise ratio is present in the input data set. Comparing our results with two other motif detection algorithms points out the robustness of our algorithm. CONCLUSION: We developed an approach that can reliably identify multiple regulatory motifs lacking a high degree of overrepresentation in a set of coexpressed genes (motifs belonging to sparsely connected hubs in the regulatory network) by exploiting the advantages of using both coexpression and phylogenetic information

    Systematic identification of functional plant modules through the integration of complementary data sources

    A major challenge is to unravel how genes interact and are regulated to exert specific biological functions. The integration of genome-wide functional genomics data, followed by the construction of gene networks, provides a powerful approach to identify functional gene modules. Large-scale expression data, functional gene annotations, experimental protein-protein interactions, and transcription factor-target interactions were integrated to delineate modules in Arabidopsis (Arabidopsis thaliana). The different experimental input data sets showed little overlap, demonstrating the advantage of combining multiple data types to study gene function and regulation. In the set of 1,563 modules covering 13,142 genes, most modules displayed strong coexpression, but functional and cis-regulatory coherence was less prevalent. Highly connected hub genes showed a significant enrichment toward embryo lethality and evidence for cross talk between different biological processes. Comparative analysis revealed that 58% of the modules showed conserved coexpression across multiple plants. Using module-based functional predictions, 5,562 genes were annotated, and an evaluation experiment disclosed that, based on 197 recently experimentally characterized genes, 38.1% of these functions could be inferred through the module context. Examples of confirmed genes of unknown function related to cell wall biogenesis, xylem and phloem pattern formation, cell cycle, hormone stimulus, and circadian rhythm highlight the potential to identify new gene functions. The module-based predictions offer new biological hypotheses for functionally unknown genes in Arabidopsis (1,701 genes) and six other plant species (43,621 genes). Furthermore, the inferred modules provide new insights into the conservation of coexpression and coregulation as well as a starting point for comparative functional annotation

    A functional and regulatory perspective on Arabidopsis thaliana


    Identification of novel regulatory modules in dicotyledonous plants using expression data and comparative genomics

    BACKGROUND: Transcriptional regulation plays an important role in the control of many biological processes. Transcription factor binding sites (TFBSs) are the functional elements that determine transcriptional activity and are organized into separable cis-regulatory modules, each defining the cooperation of several transcription factors required for a specific spatio-temporal expression pattern. Consequently, the discovery of novel TFBSs in promoter sequences is an important step to improve our understanding of gene regulation. RESULTS: Here, we applied a detection strategy that combines features of classic motif overrepresentation approaches in co-regulated genes with general comparative footprinting principles for the identification of biologically relevant regulatory elements and modules in Arabidopsis thaliana, a model system for plant biology. In total, we identified 80 TFBSs and 139 regulatory modules, most of which are novel, and primarily consist of two or three regulatory elements that could be linked to different important biological processes, such as protein biosynthesis, cell cycle control, photosynthesis and embryonic development. Moreover, studying the physical properties of some specific regulatory modules revealed that Arabidopsis promoters have a compact nature, with cooperative TFBSs located in close proximity of each other. CONCLUSION: These results create a starting point to unravel regulatory networks in plants and to study the regulation of biological processes from a systems biology point of view

    In silico identification and experimental validation of PmrAB targets in Salmonella typhimurium by regulatory motif detection

    BACKGROUND: The PmrAB (BasSR) two-component regulatory system is required for Salmonella typhimurium virulence. PmrAB-controlled modifications of the lipopolysaccharide (LPS) layer confer resistance to cationic antibiotic polypeptides, which may allow bacteria to survive within macrophages. The PmrAB system also confers resistance to Fe(3+)-mediated killing. New targets of the system have recently been discovered that seem not to have a role in the well-described functions of PmrAB, suggesting that the PmrAB-dependent regulon might contain additional, unidentified targets. RESULTS: We performed an in silico analysis of possible targets of the PmrAB system. Using a motif model of the PmrA binding site in DNA, genome-wide screening was carried out to detect PmrAB target genes. To increase confidence in the predictions, all putative targets were subjected to a cross-species comparison (phylogenetic footprinting) using a Gibbs sampling-based motif-detection procedure. As well as the known targets, we detected additional targets with unknown functions. Four of these were experimentally validated (yibD, aroQ, mig-13 and sseJ). Site-directed mutagenesis of the PmrA-binding site (PmrA box) in yibD revealed specific sequence requirements. CONCLUSIONS: We demonstrated the efficiency of our procedure by recovering most of the known PmrAB-dependent targets and by identifying unknown targets that we were able to validate experimentally. We also pinpointed directions for further research that could help elucidate the S. typhimurium virulence pathway
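The genome-wide screening step described in this abstract can be illustrated with a log-odds scan using a position weight matrix. What follows is a minimal sketch under stated assumptions: the count matrix, the uniform background, the pseudocount, the threshold and the test sequence are all hypothetical and are not the PmrA model or parameters used in the paper. The phylogenetic footprinting step is not shown.

```python
import math

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds scores."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix

def scan(sequence, matrix, threshold):
    """Return (start, score) for every window scoring at or above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(matrix[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits

# Hypothetical 4-column count matrix that favours the word "TTAA".
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
]
toy_matrix = log_odds_matrix(toy_counts)
hits = scan("GGCTTAAGGC", toy_matrix, threshold=4.0)
```

In a real screen both strands would normally be scanned and the threshold calibrated against a background model. Both are left out here to keep the sketch short.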

    The Effect of Orthology and Coregulation on Detecting Regulatory Motifs

    Background: Computational de novo discovery of transcription factor binding sites is still a challenging problem. The growing number of sequenced genomes allows integrating orthology evidence with coregulation information when searching for motifs. Moreover, the more advanced motif detection algorithms explicitly model the phylogenetic relatedness between the orthologous input sequences and thus should be well adapted towards using orthologous information. In this study, we evaluated the conditions under which complementing coregulation with orthologous information improves motif detection for the class of probabilistic motif detection algorithms with an explicit evolutionary model. Methodology: We designed datasets (real and synthetic) covering different degrees of coregulation and orthologous information to test how well Phylogibbs and Phylogenetic sampler, as representatives of the motif detection algorithms with an evolutionary model, performed compared to MEME, a more classical motif detection algorithm that treats orthologs independently. Results and Conclusions: Under certain conditions detecting motifs in the combined coregulation-orthology space is indeed more efficient than using each space separately, but this is not always the case. Moreover, the difference in success rate between the advanced algorithms and MEME is still marginal. The success rate of motif detection depends on the complex interplay between the added information and the specificities of the applied algorithms. Insights into this relation provide information useful to both developers and users. All benchmark datasets are available at http://homes.esat.kuleuven.be/~kmarchal/Supplementary_Storms_Valerie_PlosONE

    Comparative transcriptomics in plants

    Comparative genomics is the study of the structural and functional relationships between the genomes of different species or strains. Recently, microarray experiments have yielded massive amounts of expression information for many genes under various conditions or in different tissues for different model species. Expression compendia grouping multiple microarray experiments performed in similar (or different) experimental conditions make it possible to define correlated expression patterns between genes. Genes within such a coexpression cluster are expected to have more similar functionality compared to genes lacking expression similarity. In this thesis the different steps required to systematically compare expression data across species are described and some future applications of plant comparative transcriptomics are highlighted. Then we analyzed if functionally related genes show coexpression in Arabidopsis and rice and developed a general framework to measure expression context conservation (ECC) for orthologous genes. Additionally, we studied the evolutionary parameters influencing ECC conservation and compared expression with sequence evolution. At the end, a new method is presented to define high quality tissue-specific genes in seven different plant species: A. thaliana (Arabidopsis), Z. mays (Maize), M. truncatula (Medicago), P. trichocarpa (Poplar), O. sativa (Rice), G. max (Soybean) and V. vinifera (Grape), using Affymetrix microarray expression profiles. We also performed an in-depth study on the relationship between leaf tissue-specific gene coexpression clusters, within a species and in comparison with other species for a set of strictly selected genes.
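Correlated expression between two genes is commonly quantified with the Pearson correlation coefficient over a shared set of experiments. The abstract does not say which measure the thesis uses, so the choice of Pearson correlation here is an assumption. The gene names and expression values are hypothetical toy data.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("constant profile: correlation is undefined")
    return cov / (sd_x * sd_y)

# Hypothetical expression profiles across four experiments.
profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_b": [2.0, 4.0, 6.0, 8.0],   # rises with gene_a
    "gene_c": [4.0, 3.0, 2.0, 1.0],   # falls as gene_a rises
}
r_ab = pearson(profiles["gene_a"], profiles["gene_b"])
r_ac = pearson(profiles["gene_a"], profiles["gene_c"])
```

Genes whose pairwise correlation exceeds a chosen cutoff could then be grouped into a coexpression cluster. The cutoff itself would have to be chosen per dataset.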

    Positional clustering improves computational binding site detection and identifies novel cis-regulatory sites in mammalian GABA(A) receptor subunit genes

    Understanding transcription factor (TF) mediated control of gene expression remains a major challenge at the interface of computational and experimental biology. Computational techniques predicting TF-binding site specificity are frequently unreliable. On the other hand, comprehensive experimental validation is difficult and time consuming. We introduce a simple strategy that dramatically improves robustness and accuracy of computational binding site prediction. First, we evaluate the rate of recurrence of computational TFBS predictions by commonly used sampling procedures. We find that the vast majority of results are biologically meaningless. However clustering results based on nucleotide position improves predictive power. Additionally, we find that positional clustering increases robustness to long or imperfectly selected input sequences. Positional clustering can also be used as a mechanism to integrate results from multiple sampling approaches for improvements in accuracy over each one alone. Finally, we predict and validate regulatory sequences partially responsible for transcriptional control of the mammalian type A γ-aminobutyric acid receptor (GABA(A)R) subunit genes. Positional clustering is useful for improving computational binding site predictions, with potential application to improving our understanding of mammalian gene expression. In particular, predicted regulatory mechanisms in the mammalian GABA(A)R subunit gene family may open new avenues of research towards understanding this pharmacologically important neurotransmitter receptor system
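The idea of clustering predictions by nucleotide position can be sketched as a one-dimensional grouping of predicted site starts pooled from repeated sampling runs. The abstract does not give the clustering procedure, so this simple gap-based grouping is a stand-in rather than the authors' method. The `max_gap` and `min_support` parameters and the positions are hypothetical.

```python
def cluster_positions(positions, max_gap=5, min_support=3):
    """Group positions whose sorted neighbours differ by at most max_gap,
    then keep only clusters with at least min_support predictions."""
    clusters = []
    current = []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(pos)
    if current:
        clusters.append(current)
    return [c for c in clusters if len(c) >= min_support]

# Hypothetical site starts pooled from several sampling runs on one sequence.
predicted = [102, 100, 415, 101, 104, 730, 733, 99, 250]
recurrent = cluster_positions(predicted)
```

With these toy values only the predictions around position 100 recur often enough to be kept. Isolated predictions such as 250 and 415 are discarded as likely noise.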

    Identification of Avramr1 from Phytophthora infestans using long read and cDNA pathogen-enrichment sequencing (PenSeq)

    Potato late blight, caused by the oomycete pathogen Phytophthora infestans, significantly hampers potato production. Recently, a new Resistance to Phytophthora infestans (Rpi) gene, Rpi‐amr1, was cloned from a wild Solanum species, Solanum americanum. Identification of the corresponding recognized effector (Avirulence or Avr) genes from P. infestans is key to elucidating their naturally occurring sequence variation, which in turn informs the potential durability of the cognate late blight resistance. To identify the P. infestans effector recognized by Rpi‐amr1, we screened available RXLR effector libraries and used long read and cDNA pathogen‐enrichment sequencing (PenSeq) on four P. infestans isolates to explore the untested effectors. Using single‐molecule real‐time sequencing (SMRT) and cDNA PenSeq, we identified 47 highly expressed effectors from P. infestans, including PITG_07569, which triggers a highly specific cell death response when transiently coexpressed with Rpi‐amr1 in Nicotiana benthamiana, suggesting that PITG_07569 is Avramr1. Here we demonstrate that long read and cDNA PenSeq enables the identification of full‐length RXLR effector families and their expression profile. This study has revealed key insights into the evolution and polymorphism of a complex RXLR effector family that is associated with the recognition by Rpi‐amr1

    Fast, sensitive discovery of conserved genome-wide motifs

    Regulatory sites that control gene expression are essential to the proper functioning of cells, and identifying them is critical for modeling regulatory networks. We have developed Magma (Multiple Aligner of Genomic Multiple Alignments), a software tool for multiple species, multiple gene motif discovery. Magma identifies putative regulatory sites that are conserved across multiple species and occur near multiple genes throughout a reference genome. Magma takes as input multiple alignments that can include gaps. It uses efficient clustering methods that make it about 70 times faster than PhyloNet, a previous program for this task, with slightly greater sensitivity. We ran Magma on all non-coding DNA conserved between Caenorhabditis elegans and five additional species, about 70 Mbp in total, in <4 h. We obtained 2,309 motifs with lengths of 6–20 bp, each occurring at least 10 times throughout the genome, which collectively covered about 566 kbp of the genomes, approximately 0.8% of the input. Predicted sites occurred in all types of non-coding sequence but were especially enriched in the promoter regions. Comparisons to several experimental datasets show that Magma motifs correspond to a variety of known regulatory motifs