139 research outputs found

    Identification and analysis of gene families from the duplicated genome of soybean using EST sequences

    BACKGROUND: Large-scale gene analysis of most organisms is hampered by incomplete genomic sequences. In many organisms, such as soybean, the best source of sequence information is expressed sequence tag (EST) libraries. Soybean has a large (1115 Mbp) genome that has yet to be fully sequenced. However, it has the 6th largest EST collection, comprising ESTs from a variety of soybean genotypes. Many EST libraries were constructed from RNA extracted from various genetic backgrounds; thus gene identification from these sources is complicated by the existence of both gene and allele sequence differences. We used the ESTminer suite of programs to identify potential soybean gene transcripts from a single genetic background, allowing us to observe functional classifications between gene families as well as structural differences between genes and gene paralogs within families. The identification of potential gene sequences (pHaps) from soybean allows us to begin to get a picture of the genomic history of the organism as well as begin to observe the evolutionary fates of gene copies in this highly duplicated genome. RESULTS: We identified approximately 45,000 potential gene sequences (pHaps) from EST sequences of Williams/Williams82, an inbred genotype of soybean (Glycine max (L.) Merr.), using a redundancy criterion to identify reproducible sequence differences between related genes within gene families. Analysis of these sequences revealed that single base substitutions and single base indels are the most frequently observed forms of sequence variation between genes within families in the dataset. Genomic sequencing of selected loci indicates that intron-like intervening sequences are numerous and are approximately 220 bp in length. Functional annotation of gene sequences indicates that functional classifications are not randomly distributed among gene families containing few or many genes. 
CONCLUSION: The predominance of single nucleotide insertion/deletions and substitution events between genes within families (individual genes and gene paralogs) is consistent with a model of gene amplification followed by single base random mutational events expected under the classical model of duplicated gene evolution. Molecular functions of small and large gene families appear to be non-randomly distributed, possibly indicating a difference in retention of duplicates or local expansion.

    SoyTEdb: a comprehensive database of transposable elements in the soybean genome

    BACKGROUND: Transposable elements are the most abundant components of all characterized genomes of higher eukaryotes. It has been documented that these elements not only contribute to the shaping and reshaping of their host genomes, but also play significant roles in regulating gene expression, altering gene function, and creating new genes. Thus, complete identification of transposable elements in sequenced genomes and construction of comprehensive transposable element databases are essential for accurate annotation of genes and other genomic components, for investigation of potential functional interaction between transposable elements and genes, and for study of genome evolution. The recent availability of the soybean genome sequence has provided an unprecedented opportunity for discovery, and structural and functional characterization of transposable elements in this economically important legume crop. DESCRIPTION: Using a combination of structure-based and homology-based approaches, a total of 32,552 retrotransposons (Class I) and 6,029 DNA transposons (Class II) with clear boundaries and insertion sites were structurally annotated and clearly categorized, and a soybean transposable element database, SoyTEdb, was established. These transposable elements have been anchored in and integrated with the soybean physical map and genetic map, and are browsable and visualizable at any scale along the 20 soybean chromosomes, along with predicted genes and other sequence annotations. BLAST search and other infrastructure tools were implemented to facilitate annotation of transposable elements or fragments from soybean and other related legume species. 
The majority (> 95%) of these elements (particularly a few hundred low-copy-number families) are first described in this study. CONCLUSION: SoyTEdb provides resources and information related to transposable elements in the soybean genome, representing the most comprehensive and the largest manually curated transposable element database for any individual plant genome completely sequenced to date. Transposable elements previously identified in legumes, the third largest family of flowering plants, are relatively scarce. Thus this database will facilitate structural, evolutionary, functional, and epigenetic analyses of transposable elements in soybean and other legume species.

    SoyBase, the USDA-ARS soybean genetics and genomics database

    SoyBase, the USDA-ARS soybean genetic database, is a comprehensive repository for professionally curated genetics, genomics and related data resources for soybean. SoyBase contains the most current genetic, physical and genomic sequence maps integrated with qualitative and quantitative traits. The quantitative trait loci (QTL) represent more than 18 years of QTL mapping of more than 90 unique traits. SoyBase also contains the well-annotated ‘Williams 82’ genomic sequence and associated data mining tools. The genetic and sequence views of the soybean chromosomes and the extensive data on traits and phenotypes are extensively interlinked. This allows entry to the database using almost any kind of available information, such as genetic map symbols, soybean gene names or phenotypic traits. SoyBase is the repository for controlled vocabularies for soybean growth, development and trait terms, which are also linked to the more general plant ontologies. SoyBase can be accessed at http://soybase.org

    Integrating microarray analysis and the soybean genome to understand the soybeans iron deficiency response

    BACKGROUND: Soybeans grown in the upper Midwestern United States often suffer from iron deficiency chlorosis, which results in yield loss at the end of the season. To better understand the effect of iron availability on soybean yield, we identified genes in two near isogenic lines with changes in expression patterns when plants were grown in iron sufficient and iron deficient conditions. RESULTS: Transcriptional profiles of soybean (Glycine max (L.) Merr.) near isogenic lines Clark (PI548553, iron efficient) and IsoClark (PI547430, iron inefficient) grown under Fe-sufficient and Fe-limited conditions were analyzed and compared using the Affymetrix® GeneChip® Soybean Genome Array. There were 835 candidate genes in the Clark (PI548553) genotype and 200 candidate genes in the IsoClark (PI547430) genotype putatively involved in soybean's iron stress response. Of these candidate genes, 58 in the Clark genotype and 21 in the IsoClark genotype were identified with a genetic location within known iron efficiency QTL. The arrays also identified 170 single feature polymorphisms (SFPs) specific to either Clark or IsoClark. A sliding window analysis of the microarray data and the 7X genome assembly, coupled with an iterative model of the data, showed that the candidate genes are clustered in the genome. An analysis of 5' untranslated regions in the promoter of candidate genes identified 11 conserved motifs in 248 differentially expressed genes, all from the Clark genotype, representing 129 clusters identified earlier, confirming the cluster analysis results. CONCLUSION: These analyses have identified the first genes with expression patterns that are affected by iron stress and are located within QTL specific to iron deficiency stress. The genetic location and promoter motif analysis results support the hypothesis that the differentially expressed genes are co-regulated. 
The combined results of all analyses lead us to postulate that iron inefficiency in soybean is a result of a mutation in a transcription factor(s), which controls the expression of genes required in inducing an iron stress response.
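
The sliding window idea mentioned in this abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not give window sizes, thresholds, or the iterative model, so the function names, the `window`, `step`, and `min_genes` parameters, and the positions used below are all hypothetical. The sketch only counts candidate-gene positions in overlapping windows along one chromosome and reports windows that hold several of them.

```python
def window_counts(positions, chrom_length, window=1_000_000, step=500_000):
    """Count candidate-gene positions in overlapping windows (hypothetical sizes)."""
    counts = []
    start = 0
    while start < chrom_length:
        # The last windows are clipped at the chromosome end.
        end = min(start + window, chrom_length)
        n = sum(1 for p in positions if start <= p < end)
        counts.append((start, end, n))
        start += step
    return counts


def clustered_windows(positions, chrom_length, min_genes=3, **kwargs):
    """Return the windows holding at least min_genes candidate genes."""
    return [w for w in window_counts(positions, chrom_length, **kwargs)
            if w[2] >= min_genes]
```

For example, `clustered_windows([100, 200, 300, 900, 2500], 3000, window=1000, step=500)` flags only the first window, where four of the five made-up positions fall. A real analysis would also need a null model to decide whether such a count exceeds what is expected by chance.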

    Climate change mitigation beyond agriculture: A review of food system opportunities and implications

    A large body of research has explored opportunities to mitigate climate change in agricultural systems; however, less research has explored opportunities across the food system. Here we expand the existing research with a review of potential mitigation opportunities across the entire food system, including in pre-production, production, processing, transport, consumption and loss and waste. We detail and synthesize recent research on the topic, and explore the applicability of different climate mitigation strategies in varying country contexts with different economic and agricultural systems. Further, we highlight some potential adaptation co-benefits of food system mitigation strategies and explore the potential implications of such strategies on food systems as a whole. We suggest that a food systems research approach is greatly needed to capture such potential synergies, and highlight key areas of additional research including a greater focus on low- and middle-income countries in particular. We conclude by discussing the policy and finance opportunities needed to advance mitigation strategies in food systems

    Gene duplication and paleopolyploidy in soybean and the implications for whole genome sequencing

    BACKGROUND: Soybean, Glycine max (L.) Merr., is a well documented paleopolyploid. What remains relatively under characterized is the level of sequence identity in retained homeologous regions of the genome. Recently, the Department of Energy Joint Genome Institute and United States Department of Agriculture jointly announced the sequencing of the soybean genome. One of the initial concerns is what effect sequence identity in homeologous regions would have on whole genome shotgun sequence assembly. RESULTS: Seventeen BACs representing ~2.03 Mb were sequenced as representative potential homeologous regions from the soybean genome. Genetic mapping of each BAC shows that 11 of the 20 chromosomes are represented. Sequence comparisons between homeologous BACs show that the soybean genome is a mosaic of retained paleopolyploid regions. Some regions appear to be highly conserved while other regions have diverged significantly. Large-scale "batch" reassembly of all 17 BACs combined showed that even the most homeologous BACs with upwards of 95% sequence identity resolve into their respective homeologous sequences. Potential assembly errors were generated by tandemly duplicated pentatricopeptide repeat containing genes and long simple sequence repeats. Analysis of a whole-genome shotgun assembly of 80,000 randomly chosen JGI-DOE sequence traces reveals some new soybean-specific repeat sequences. CONCLUSION: This analysis investigated both the structure of the paleopolyploid soybean genome and the potential effects retained homeology will have on assembling the whole genome shotgun sequence. Based upon these results, homeologous regions similar to those characterized here will not cause major assembly issues.

    Data sharing and ontology use among agricultural genetics, genomics, and breeding databases and resources of the AgBioData Consortium

    Over the last several decades, there has been rapid growth in the number and scope of agricultural genetics, genomics and breeding (GGB) databases and resources. The AgBioData Consortium (https://www.agbiodata.org/) currently represents 44 databases and resources covering model or crop plant and animal GGB data, ontologies, pathways, genetic variation and breeding platforms (referred to as 'databases' throughout). One of the goals of the Consortium is to facilitate FAIR (Findable, Accessible, Interoperable, and Reusable) data management and the integration of datasets, which requires data sharing, along with structured vocabularies and/or ontologies. Two AgBioData working groups, focused on Data Sharing and Ontologies, conducted a survey to assess the status and future needs of the members in those areas. A total of 33 researchers responded to the survey, representing 37 databases. Results suggest that data sharing practices by AgBioData databases are in a healthy state, but it is not clear whether this is true for all metadata and data types across all databases; and that ontology use has not substantially changed since a similar survey was conducted in 2017. We recommend 1) providing training for database personnel in specific data sharing techniques, as well as in ontology use; 2) further study on what metadata is shared, and how well it is shared among databases; 3) promoting an understanding of data sharing and ontologies in the stakeholder community; 4) improving data sharing and ontologies for specific phenotypic data types and formats; and 5) lowering specific barriers to data sharing and ontology use, by identifying sustainability solutions, and the identification, promotion, or development of data standards. Combined, these improvements are likely to help AgBioData databases increase development efforts towards improved ontology use, and data sharing via programmatic means. Comment: 17 pages, 8 figures

    An ontology approach to comparative phenomics in plants

    BACKGROUND: Plant phenotype datasets include many different types of data, formats, and terms from specialized vocabularies. Because these datasets were designed for different audiences, they frequently contain language and details tailored to investigators with different research objectives and backgrounds. Although phenotype comparisons across datasets have long been possible on a small scale, comprehensive queries and analyses that span a broad set of reference species, research disciplines, and knowledge domains continue to be severely limited by the absence of a common semantic framework. RESULTS: We developed a workflow to curate and standardize existing phenotype datasets for six plant species, encompassing both model species and crop plants with established genetic resources. Our effort focused on mutant phenotypes associated with genes of known sequence in Arabidopsis thaliana (L.) Heynh. (Arabidopsis), Zea mays L. subsp. mays (maize), Medicago truncatula Gaertn. (barrel medic or Medicago), Oryza sativa L. (rice), Glycine max (L.) Merr. (soybean), and Solanum lycopersicum L. (tomato). We applied the same ontologies, annotation standards, formats, and best practices across all six species, thereby ensuring that the shared dataset could be used for cross-species querying and semantic similarity analyses. Curated phenotypes were first converted into a common format using taxonomically broad ontologies such as the Plant Ontology, Gene Ontology, and Phenotype and Trait Ontology. We then compared ontology-based phenotypic descriptions with an existing classification system for plant phenotypes and evaluated our semantic similarity dataset for its ability to enhance predictions of gene families, protein functions, and shared metabolic pathways that underlie informative plant phenotypes. 
CONCLUSIONS: The use of ontologies, annotation standards, shared formats, and best practices for cross-taxon phenotype data analyses represents a novel approach to plant phenomics that enhances the utility of model genetic organisms and can be readily applied to species with fewer genetic resources and less well-characterized genomes. In addition, these tools should enhance future efforts to explore the relationships among phenotypic similarity, gene function, and sequence similarity in plants, and to make genotype-to-phenotype predictions relevant to plant biology, crop improvement, and potentially even human health. This item is part of the UA Faculty Publications collection.

    2007 AAPP Monograph Series

    The African American Professors Program (AAPP) at the University of South Carolina is proud to publish the seventh edition of its annual monograph series. Furthermore, it is an honor to celebrate the remarkable tenth anniversary of AAPP through these manuscripts. The program recognizes the significance of offering its scholars a venue to engage actively in research and publish papers related thereto. Parallel with the publication of their refereed manuscripts is the opportunity to gain visibility among scholars throughout institutions worldwide. Scholars who have contributed papers for this monograph are to be commended for adding this responsibility to their academic workload. Writing across disciplines adds to the intellectual diversity of these papers. From neophytes, relatively speaking, to an array of very experienced individuals, the chapters have been researched and comprehensively written. Founded in 1997 through the Department of Educational Leadership and Policies in the College of Education, AAPP was designed to address the underrepresentation of African American professors on college and university campuses. Its mission is to expand the pool of these professors in critical academic and research areas. Sponsored by the University of South Carolina, the W.K. Kellogg Foundation, and the South Carolina General Assembly, the program recruits doctoral students for disciplines in which African Americans currently are underrepresented among faculty in higher education. The continuation of this monograph series is seen as responding to a window of opportunity to be sensitive to an academic expectation of graduates as they pursue career placement and, at the same time, one that allows for the dissemination of products to a broader community. The importance of this monograph series has been voiced by one of our 2002 AAPP graduates, Dr. 
Shundele LaTjuan Dogan, formerly an Administrative Fellow at Harvard University and a Program Officer for the Southern Education Foundation. She is currently a Program Officer for the Arthur M. Blank Foundation in Atlanta. Dr. Dogan wrote: One thing in particular that I want to thank you for is having the African American Professors Program scholars publish articles for the monograph. I have to admit that writing the articles seemed like extra work at the time. However, in my recent interview process, organizations have asked me for samples of my writing. Including an article from a published monograph helped to make my portfolio much more impressive. You were 'right on target' in having us do the monograph series. (AAPP 2003 Monograph, p. xi) The African American Professors Program dedicates this 2007 tenth anniversary publication as a special contribution to its readership and hopes that each will be inspired by this interdisciplinary group of manuscripts. John McFadden, Ph.D., The Benjamin Elijah Mays Distinguished Professor Emeritus; Director, African American Professors Program, University of South Carolina

    Abundance of SSR Motifs and Development of Candidate Polymorphic SSR Markers (BARCSOYSSR_1.0) in Soybean

    Simple sequence repeat (SSR) genetic markers, also referred to as microsatellites, function in map-based cloning and for marker-assisted selection in plant breeding. The objectives of this study were to determine the abundance of SSRs in the soybean genome and to develop and test soybean SSR markers to create a database of locus-specific markers with a high likelihood of polymorphism. A total of 210,990 SSRs with di-, tri-, and tetranucleotide repeats of five or more were identified in the soybean whole genome sequence (WGS), which included 61,458 SSRs consisting of repeat units of di- (≥10), tri- (≥8), and tetranucleotide (≥7). Among the 61,458 SSRs, (AT)n, (ATT)n and (AAAT)n were the most abundant motifs among di-, tri-, and tetranucleotide SSRs, respectively. After screening for a number of factors including locus-specificity using e-PCR, a soybean SSR database (BARCSOYSSR_1.0) with the genome position and primer sequences for 33,065 SSRs was created. To examine the likelihood that primers in the database would function to amplify locus-specific polymorphic products, 1034 primer sets were evaluated by amplifying DNAs of seven diverse Glycine max (L.) Merr. genotypes and one wild soybean (Glycine soja Siebold & Zucc.) genotype. A total of 978 (94.6%) of the primer sets amplified a single polymerase chain reaction (PCR) product and 798 (77.2%) amplified polymorphic amplicons as determined by 4.5% agarose gel electrophoresis. The BARCSOYSSR_1.0 SSR markers can be found in SoyBase (http://soybase.org; verified 21 June 2010), the USDA-ARS Soybean Genome Database
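
The repeat thresholds quoted in this abstract (di- ≥10, tri- ≥8, tetranucleotide ≥7) can be illustrated with a small Python sketch that scans a DNA string for perfect SSRs using regular-expression backreferences. This is only an illustration under stated assumptions, not the pipeline used to build BARCSOYSSR_1.0: the function name `find_ssrs`, the `MIN_REPEATS` table, and the rule for skipping motifs that are themselves repeats of a shorter unit are choices made here for the example.

```python
import re

# Minimum repeat counts per motif length, from the abstract's stricter
# thresholds: di- (>=10), tri- (>=8), tetranucleotide (>=7).
MIN_REPEATS = {2: 10, 3: 8, 4: 7}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    sequence = sequence.upper()
    hits = []
    for unit_len, min_count in sorted(min_repeats.items()):
        # ([ACGT]{k}) captures one motif; \1{n-1,} needs n-1 or more further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_count - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Assumption: skip motifs that are a shorter unit repeated (AA, ATAT).
            if any(motif == motif[:d] * (unit_len // d)
                   for d in range(1, unit_len) if unit_len % d == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)
```

Called on a made-up sequence such as `"GGC" + "AT" * 12 + "CCG" + "ATT" * 8 + "G"`, the function reports one (AT)12 repeat starting at index 3 and one (ATT)8 repeat starting at index 30. Imperfect or compound repeats, which real SSR surveys often handle, are outside the scope of this sketch.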