
    Identification of shared single copy nuclear genes in Arabidopsis, Populus, Vitis and Oryza and their phylogenetic utility across various taxonomic levels

    Abstract

    Background
    Although the overwhelming majority of genes found in angiosperms are members of gene families, and both gene- and genome-duplication are pervasive forces in plant genomes, some genes are sufficiently distinct from all other genes in a genome that they can be operationally defined as 'single copy'. Using the gene clustering algorithm MCL-tribe, we have identified a set of 959 genes that are single copy in each of the genomes of Arabidopsis thaliana, Populus trichocarpa, Vitis vinifera and Oryza sativa. To characterize these genes, we have performed a number of analyses examining GO annotations, coding sequence length, number of exons, number of domains, and presence in distant lineages such as Selaginella and Physcomitrella, as well as phylogenetic analyses to estimate copy number in other seed plants and to demonstrate their phylogenetic utility. We then provide examples of how these genes may be used in phylogenetic analyses to reconstruct organismal history, both by using existing coverage in EST databases for seed plants and by de novo amplification via RT-PCR in the family Brassicaceae.

    Results
    There are 959 single copy nuclear genes shared by Arabidopsis, Populus, Vitis and Oryza ("APVO SSC genes"). The majority of these genes are also present in the Selaginella and Physcomitrella genomes. Public EST sets for 197 species suggest that most of these genes are present across a diverse collection of seed plants and appear to exist as single or very low copy genes, though exceptions are seen in recently polyploid taxa and in lineages with significant evidence for a shared large-scale duplication event. Genes encoding proteins localized in organelles are more commonly single copy than expected by chance, but the evolutionary forces responsible for this bias are unknown.

    Regardless of the evolutionary mechanisms responsible for the large number of shared single copy genes in diverse flowering plant lineages, these genes are valuable for phylogenetic and comparative analyses. Eighteen of the APVO SSC genes were amplified in the Brassicaceae using RT-PCR and directly sequenced. Alignments of these sequences provide improved resolution of Brassicaceae phylogeny compared to recent studies using plastid and ITS sequences. An analysis of sequences from 13 APVO SSC genes from 69 species of seed plants, derived mainly from public EST databases, yielded a phylogeny that was largely congruent with prior hypotheses based on multiple plastid sequences. Whereas single gene phylogenies that rely on EST sequences have limited bootstrap support because of limited sequence information, concatenated alignments result in phylogenetic trees with strong bootstrap support for already established relationships. Overall, these single copy nuclear genes are promising markers for phylogenetics and contain a greater proportion of phylogenetically informative sites than commonly used protein-coding sequences from the plastid or mitochondrial genomes.

    Conclusions
    Putatively orthologous, shared single copy nuclear genes provide a vast source of new evidence for plant phylogenetics, genome mapping, and other applications, as well as a substantial class of genes for which functional characterization is needed. Preliminary evidence indicates that many of the shared single copy nuclear genes identified in this study may be well suited as markers for addressing phylogenetic hypotheses at a variety of taxonomic levels.
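The operational definition of 'single copy' in the abstract above comes down to filtering gene clusters. The following is a rough illustrative sketch only and not the authors' pipeline. It assumes that the clustering output (for example, from MCL-tribe) has already been parsed into a dictionary mapping each cluster ID to (species_code, gene_id) pairs, and it keeps only the clusters that contain exactly one gene from each of the four genomes and nothing else. The clustering step itself is not implemented. The species codes, gene IDs and input format are all hypothetical.

```python
from collections import Counter

# Hypothetical short codes for Arabidopsis, Populus, Vitis and Oryza.
SPECIES = ("Ath", "Ptr", "Vvi", "Osa")


def shared_single_copy(clusters: dict[str, list[tuple[str, str]]],
                       species: tuple[str, ...] = SPECIES) -> list[str]:
    """Return IDs of clusters holding exactly one gene from every target species.

    `clusters` maps a cluster ID to its members as (species_code, gene_id)
    pairs; this input format is an assumption made for illustration.
    """
    selected = []
    for cluster_id, members in clusters.items():
        # Count how many genes each species contributes to this cluster.
        counts = Counter(sp for sp, _gene in members)
        # Shared single copy: every target species present exactly once,
        # and no members from any other species.
        if set(counts) == set(species) and all(counts[sp] == 1 for sp in species):
            selected.append(cluster_id)
    return selected


# Toy clusters with made-up gene IDs.
toy = {
    "c1": [("Ath", "ath_g1"), ("Ptr", "ptr_g1"), ("Vvi", "vvi_g1"), ("Osa", "osa_g1")],
    "c2": [("Ath", "ath_g2"), ("Ath", "ath_g3"), ("Ptr", "ptr_g2"),
           ("Vvi", "vvi_g2"), ("Osa", "osa_g2")],  # duplicated in Arabidopsis
    "c3": [("Ath", "ath_g4"), ("Ptr", "ptr_g3"), ("Vvi", "vvi_g3")],  # missing from Oryza
}
print(shared_single_copy(toy))  # ['c1']
```

Only "c1" passes. "c2" fails because it carries two Arabidopsis paralogs, and "c3" fails because it lacks a rice member.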

    Floral gene resources from basal angiosperms for comparative genomics research

    BACKGROUND: The Floral Genome Project was initiated to bridge the genomic gap between the most broadly studied plant model systems. Arabidopsis and rice, although now completely sequenced and under intensive comparative genomic investigation, are separated by at least 125 million years of evolutionary time, and cannot in isolation provide a comprehensive perspective on structural and functional aspects of flowering plant genome dynamics. Here we discuss new genomic resources available to the scientific community, comprising cDNA libraries and Expressed Sequence Tag (EST) sequences for a suite of phylogenetically basal angiosperms specifically selected to bridge the evolutionary gaps between model plants and provide insights into gene content and genome structure in the earliest flowering plants. RESULTS: Random sequencing of cDNAs from representatives of phylogenetically important eudicot, non-grass monocot, and gymnosperm lineages has so far (as of 12/1/04) generated 70,514 ESTs and 48,170 assembled unigenes. Efficient sorting of EST sequences into putative gene families based on whole Arabidopsis/rice proteome comparison has permitted ready identification of cDNA clones for finished sequencing. Preliminarily, (i) proportions of functional categories among sequenced floral genes seem representative of the entire Arabidopsis transcriptome, (ii) many known floral gene homologues have been captured, and (iii) phylogenetic analyses of ESTs are providing new insights into the process of gene family evolution in relation to the origin and diversification of the angiosperms. CONCLUSION: Initial comparisons illustrate the utility of the EST data sets toward discovery of the basic floral transcriptome. 
These first findings also afford the opportunity to address a number of conspicuous evolutionary genomic questions, including reproductive organ transcriptome overlap between angiosperms and gymnosperms, genome-wide duplication history, lineage-specific gene duplication and functional divergence, and analyses of adaptive molecular evolution. Since not all genes in the floral transcriptome will be associated with flowering, these EST resources will also be of interest to plant scientists working on other functions, such as photosynthesis, signal transduction, and metabolic pathways.

    EST database for early flower development in California poppy (Eschscholzia californica Cham., Papaveraceae) tags over 6000 genes from a basal eudicot

    The Floral Genome Project (FGP) selected California poppy (Eschscholzia californica Cham. ssp. californica) to help identify new florally expressed genes related to floral diversity in basal eudicots. A large, non-normalized cDNA library was constructed from premeiotic and meiotic floral buds and sequenced to generate a database of 9079 high-quality Expressed Sequence Tags (ESTs). These sequences clustered into 5713 unigenes, including 1414 contigs and 4299 singletons. Homologs of genes regulating many aspects of flower development were identified, including those for organ identity and development, cell and tissue differentiation, cell cycle control, and secondary metabolism. Over 5% of the transcriptome consisted of homologs of known floral gene families. Most are the first representatives of their respective gene families in basal eudicots, and their conservation suggests they are important for floral development and/or function. Approximately 10% of the transcripts encoded transcription factors and other regulatory genes, including nine genes from the seven major lineages of the important MADS-box family of developmental regulators. Homologs of alkaloid pathway genes were also recovered, providing opportunities to explore adaptive evolution in secondary products. Furthermore, comparison of the poppy ESTs with the Arabidopsis genome provided support for putative Arabidopsis genes that previously lacked annotation. Finally, over 1800 unique sequences had no observable homology in the public databases. The California poppy EST database and library will help bridge our understanding of flower initiation and development among higher eudicot and monocot model plants and provide new opportunities for comparative analysis of gene families across angiosperm species.