139 research outputs found

    Genomic Resources for Asparagales

    Enormous genomic resources have been developed for plants in the monocot order Poales; however, it is not known how useful these resources will be for other economically important monocots. Asparagales is a monophyletic order sister to the class Commelinanae, which carries Poales, and is the second most economically important monocot order. Developing genomic resources for Asparagales and applying them is challenging because of huge nuclear genomes and the relatively long generation times required to develop segregating families. We synthesized a normalized cDNA library of onion (Allium cepa) and produced 11,008 unique expressed sequence tags (ESTs) for comparative genomic analyses of Asparagales and Poales. Alignments of onion ESTs, Poales ESTs, and genomic sequences from rice were used to design oligonucleotide primers amplifying genomic regions from asparagus, garlic, and onion. Sequence analyses of these genomic regions revealed microsatellites, insertions/deletions, and single nucleotide polymorphisms for comparative mapping of rice and Asparagales vegetables. Initial mapping revealed no obvious synteny at the recombinational level between onion and rice, indicating that genomic resources developed for Poales may not be applicable to the monocots as a whole. Genomic analyses of Asparagales would greatly benefit from EST sequencing and deep-coverage, large-insert genomic libraries of representative small-genome model species within the higher and lower Asparagales, such as asparagus and orchid, respectively.

    Highly syntenic regions in the genomes of soybean, Medicago truncatula, and Arabidopsis thaliana

    BACKGROUND: Recent genome sequencing enables mega-base scale comparisons between related genomes. Comparisons between animals, plants, fungi, and bacteria demonstrate extensive synteny tempered by rearrangements. Within the legume plant family, glimpses of synteny have also been observed. Characterizing syntenic relationships in legumes is important in transferring knowledge from model legumes to crops that are important sources of protein, fixed nitrogen, and health-promoting compounds. RESULTS: We have uncovered two large soybean regions exhibiting synteny with M. truncatula and with a network of segmentally duplicated regions in Arabidopsis. In all, syntenic regions comprise over 500 predicted genes spanning 3 Mb. Up to 75% of soybean genes are colinear with M. truncatula, including one region in which 33 of 35 soybean predicted genes with database support are colinear to M. truncatula. In some regions, 60% of soybean genes share colinearity with a network of A. thaliana duplications. One region is especially interesting because this 500 kbp segment of soybean is syntenic to two paralogous regions in M. truncatula on different chromosomes. Phylogenetic analysis of individual genes within these regions demonstrates that one is orthologous to the soybean region, with which it also shows substantially denser synteny and significantly lower levels of synonymous nucleotide substitutions. The other M. truncatula region is inferred to be paralogous, presumably resulting from a duplication event preceding speciation. CONCLUSION: The presence of well-defined M. truncatula segments showing orthologous and paralogous relationships with soybean allows us to explore the evolution of contiguous genomic regions in the context of ancient genome duplication and speciation events

    The complete chloroplast genome sequence of Gossypium hirsutum: organization and phylogenetic relationships to other angiosperms

    BACKGROUND: Cotton (Gossypium hirsutum) is the most important fiber crop grown in 90 countries. In 2004–2005, US farmers planted 79% of the 5.7-million hectares of nuclear transgenic cotton. Unfortunately, genetically modified cotton has the potential to hybridize with other cultivated and wild relatives, resulting in geographical restrictions to cultivation. However, chloroplast genetic engineering offers the possibility of containment because of maternal inheritance of transgenes. The complete chloroplast genome of cotton provides essential information required for genetic engineering. In addition, the sequence data were used to assess phylogenetic relationships among the major clades of rosids using cotton and 25 other completely sequenced angiosperm chloroplast genomes. RESULTS: The complete cotton chloroplast genome is 160,301 bp in length, with 112 unique genes and 19 duplicated genes within the IR, containing a total of 131 genes. There are four ribosomal RNAs, 30 distinct tRNA genes and 17 intron-containing genes. The gene order in cotton is identical to that of tobacco but lacks rpl22 and infA. There are 30 direct and 24 inverted repeats 30 bp or longer with a sequence identity ≥ 90%. Most of the direct repeats are within intergenic spacer regions and introns, and a 72 bp-long direct repeat is within the psaA and psaB genes. Comparison of protein coding sequences with expressed sequence tags (ESTs) revealed nucleotide substitutions resulting in amino acid changes in ndhC, rpl23, rpl20, rps3 and clpP. Phylogenetic analyses of a data set including 61 protein-coding genes using both maximum likelihood and maximum parsimony were performed for 28 taxa, including cotton and five other angiosperm chloroplast genomes that were not included in any previous phylogenies. CONCLUSION: The cotton chloroplast genome lacks rpl22 and infA and contains a number of dispersed direct and inverted repeats. RNA editing resulted in amino acid changes with significant impact on their hydropathy. Phylogenetic analysis provides strong support for the position of cotton in the Malvales in the eurosids II clade sister to Arabidopsis in the Brassicales. Furthermore, there is strong support for the placement of the Myrtales sister to the eurosid I clade, although expanded taxon sampling is needed to further test this relationship.

    Complete Plastid Genome Sequence of Daucus carota: Implications for Biotechnology and Phylogeny of Angiosperms

    BACKGROUND: Carrot (Daucus carota) is a major food crop in the US and worldwide. Its capacity for storage and its lifecycle as a biennial make it an attractive species for the introduction of foreign genes, especially for oral delivery of vaccines and other therapeutic proteins. Until recently, efforts to express recombinant proteins in carrot have had limited success in terms of protein accumulation in the edible tap roots. Plastid genetic engineering offers the potential to overcome this limitation, as demonstrated by the accumulation of BADH in chromoplasts of carrot taproots to confer exceedingly high levels of salt resistance. The complete plastid genome of carrot provides essential information required for genetic engineering. Additionally, the sequence data add to the rapidly growing database of plastid genomes for assessing phylogenetic relationships among angiosperms. RESULTS: The complete carrot plastid genome is 155,911 bp in length, with 115 unique genes and 21 duplicated genes within the IR. There are four ribosomal RNAs, 30 distinct tRNA genes and 18 intron-containing genes. Repeat analysis reveals 12 direct and 2 inverted repeats ≥ 30 bp with a sequence identity ≥ 90%. Phylogenetic analyses of nucleotide sequences for 61 protein-coding genes using both maximum parsimony (MP) and maximum likelihood (ML) were performed for 29 angiosperms. Phylogenies from both methods provide strong support for the monophyly of several major angiosperm clades, including monocots, eudicots, rosids, asterids, eurosids II, euasterids I, and euasterids II. CONCLUSION: The carrot plastid genome contains a number of dispersed direct and inverted repeats scattered throughout coding and non-coding regions. This is the first sequenced plastid genome of the family Apiaceae and only the second published genome sequence of the species-rich euasterid II clade. Both MP and ML trees provide very strong support (100% bootstrap) for the sister relationship of Daucus with Panax in the euasterid II clade. These results provide the best taxon sampling of complete chloroplast genomes and the strongest support yet for the sister relationship of Caryophyllales to the asterids. The availability of the complete plastid genome sequence should facilitate improved transformation efficiency and foreign gene expression in carrot through utilization of endogenous flanking sequences and regulatory elements.

    A newly-developed community microarray resource for transcriptome profiling in Brassica species enables the confirmation of Brassica-specific expressed sequences

    BACKGROUND: The Brassica species include an important group of crops and provide opportunities for studying the evolutionary consequences of polyploidy. They are related to Arabidopsis thaliana, for which the first complete plant genome sequence was obtained, and their genomes show extensive, although imperfect, conserved synteny with that of A. thaliana. A large number of EST sequences, derived from a range of different Brassica species, are available in the public database, but no public microarray resource has so far been developed for these species. RESULTS: We assembled unigenes using ~800,000 EST sequences, mainly from three species: B. napus, B. rapa and B. oleracea. The assembly was conducted with the aim of co-assembling ESTs of orthologous genes (including homoeologous pairs of genes in B. napus from each of the A and C genomes), but resolving assemblies of paralogous, or paleo-homoeologous, genes (i.e., the genes related by the ancestral genome triplication observed in diploid Brassica species). 90,864 unique sequence assemblies were developed. These were incorporated into the BAC sequence annotation for the Brassica rapa Genome Sequencing Project, enabling the identification of cognate genomic sequences for a proportion of them. A 60-mer oligo microarray comprising 94,558 probes was developed using the unigene sequences. Gene expression was analysed in reciprocal resynthesised B. napus lines and the B. oleracea and B. rapa lines used to produce them. The analysis showed that significant expression could consistently be detected in leaf tissue for 35,386 unigenes. Expression was detected across all four genotypes for 27,355 unigenes, genome-specific expression patterns were observed for 7,851 unigenes and 180 unigenes displayed other classes of expression pattern. Principal component analysis (PCA) clearly resolved the individual microarray datasets for B. rapa, B. oleracea and resynthesised B. napus. Quantitative differences in expression were observed between the resynthesised B. napus lines for 98 unigenes, most of which could be classified into non-additive expression patterns, including 17 that showed cytoplasm-specific patterns. We further characterized the unigenes for which A genome-specific expression was observed and cognate genomic sequences could be identified. Ten of these unigenes were found to be Brassica-specific sequences, including two that originate from complex loci comprising gene clusters. CONCLUSION: We succeeded in developing a Brassica community microarray resource. Although expression can be measured for the majority of unigenes across species, there were numerous probes that reported in a genome-specific manner. We anticipate that some proportion of these will represent species-specific transcripts and the remainder will be the consequence of variation of sequences within the regions represented by the array probes. Our studies demonstrated that the datasets obtained from the arrays can be used for typical analyses, including PCA and the analysis of differential expression. We have also demonstrated that Brassica-specific transcripts identified in silico in the sequence assembly of public EST database accessions are indeed reported by the array. These would not be detectable using arrays designed using A. thaliana sequences.

    Sequencing and analysis of the gene-rich space of cowpea

    BACKGROUND: Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing. RESULTS: We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO), with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A total of 5,888 GSRs had homology to genes encoding transcription factors (TFs) and transcription associated factors (TAFs), representing about 5% of the total annotated sequences in the dataset. Sixty-two of the 64 well-characterized plant TF gene families are represented in the cowpea GSRs, and these families are of similar size and phylogenetic organization to those characterized in other plants. The cowpea GSRs also provide a rich source of genes involved in photoperiodic control, symbiosis, and defense-related responses. Comparisons to available databases revealed that about 74% of cowpea ESTs and 70% of all legume ESTs were represented in the GSR dataset. As approximately 12% of all GSRs contain an identifiable simple-sequence repeat, the dataset is a powerful resource for the design of microsatellite markers. CONCLUSION: The availability of extensive public genomic data for cowpea, a non-model legume with significant importance in the developing world, represents a significant step forward in legume research. Not only does the gene space sequence enable the detailed analysis of gene structure, gene family organization and phylogenetic relationships within cowpea, but it also facilitates the characterization of syntenic relationships with other cultivated and model legumes, and will contribute to determining patterns of chromosomal evolution in the Leguminosae. The micro and macrosyntenic relationships detected between cowpea and other cultivated and model legumes should simplify the identification of informative markers for marker-assisted trait selection and map-based gene isolation necessary for cowpea improvement.

    Phylogenomics and the dynamic genome evolution of the genus Streptococcus

    The genus Streptococcus comprises important pathogens that have a severe impact on human health and are responsible for substantial economic losses to agriculture. Here, we utilize 46 Streptococcus genome sequences (44 species), including eight species sequenced here, to provide the first genomic level insight into the evolutionary history and genetic basis underlying the functional diversity of all major groups of this genus. Gene gain/loss analysis revealed a dynamic pattern of genome evolution characterized by an initial period of gene gain followed by a period of loss, as the major groups within the genus diversified. This was followed by a period of genome expansion associated with the origins of the present extant species. The pattern is concordant with an emerging view that genomes evolve through a dynamic process of expansion and streamlining. A large proportion of the pan-genome has experienced lateral gene transfer (LGT) with causative factors, such as relatedness and shared environment, operating over different evolutionary scales. Multiple gene ontology terms were significantly enriched for each group, and mapping terms onto the phylogeny showed that those corresponding to genes born on branches leading to the major groups represented approximately one-fifth of those enriched. Furthermore, despite the extensive LGT, several biochemical characteristics have been retained since group formation, suggesting genomic cohesiveness through time, and that these characteristics may be fundamental to each group. For example, proteolysis: mitis group; urea metabolism: salivarius group; carbohydrate metabolism: pyogenic group; and transcription regulation: bovis group

    High throughput generation of promoter reporter (GFP) transgenic lines of low expressing genes in Arabidopsis and analysis of their expression patterns

    BACKGROUND: Although the complete genome sequence and annotation of Arabidopsis were released at the end of 2000, it is still a great challenge to understand the function of each gene in the Arabidopsis genome. One way to understand the function of genes on a genome-wide scale is expression profiling by microarrays. However, the expression level of many genes in the Arabidopsis genome cannot be detected by microarray experiments. In addition, there are many more novel genes that have been discovered by experiments or predicted by new gene prediction programs. Another way to understand the function of individual genes is to investigate their in vivo expression patterns by reporter constructs in transgenic plants, which can provide basic information on the patterns of gene expression. RESULTS: A high throughput pipeline was developed to generate promoter-reporter (GFP) transgenic lines for Arabidopsis genes expressed at very low levels and to examine their expression patterns in vivo. Promoter regions from a total of 627 non- or low-expressed genes in Arabidopsis, based on Arabidopsis annotation release 5, were amplified and cloned into a Gateway vector. A total of 353 promoter-reporter (GFP) constructs were successfully transferred into Agrobacterium (GV3101) by triparental mating and subsequently used for Arabidopsis transformation. Kanamycin-resistant transgenic lines were obtained from 266 constructs, and among them positive GFP expression was detected from 150 constructs. Of these 150 constructs, multiple transgenic lines exhibiting consistent expression patterns were obtained for 112 constructs. A total of 81 different regions of expression were discovered during our screening of positive transgenic plants and assigned Plant Ontology (PO) codes. CONCLUSIONS: Many of the genes tested for which expression data were previously lacking are indeed expressed in Arabidopsis during the developmental stages screened. More importantly, our study provides plant researchers with another resource of gene expression information in Arabidopsis. The results of this study are captured in a MySQL database and can be searched at http://www.jcvi.org/arabidopsis/qpcr/index.shtml. Transgenic seeds and constructs are also available for the research community.