160 research outputs found

    From manual curation to visualization of gene families and networks across Solanaceae plant species

    Get PDF
    High-quality manual annotation methods and practices need to be scaled to the increased rate of genomic data production. Curation based on gene families and gene networks is one approach that can significantly increase both curation efficiency and quality. The Sol Genomics Network (SGN; http://solgenomics.net) is a comparative genomics platform, with genetic, genomic and phenotypic information of the Solanaceae family and its closely related species that incorporates a community-based gene and phenotype curation system. In this article, we describe a manual curation system for gene families aimed at facilitating curation, querying and visualization of gene interaction patterns underlying complex biological processes, including an interface for efficiently capturing information from experiments with large data sets reported in the literature. Well-annotated multigene families are useful for further exploration of genome organization and gene evolution across species. As an example, we illustrate the system with the multigene transcription factor families, WRKY and Small Auxin Up-regulated RNA (SAUR), which both play important roles in responding to abiotic stresses in plants

    TomExpress, a unified tomato RNA-Seq platform for visualization of expression data, clustering and correlation networks

    Get PDF
    The TomExpress platform was developed to provide the tomato research community with a browser and integrated web tools for public RNA-Seq data visualization and data mining. To avoid major biases that can result from the use of different mapping and statistical processing methods, RNA-Seq raw sequence data available in public databases were mapped de novo on a unique tomato reference genome sequence and post-processed using the same pipeline with accurate parameters. Following the calculation of the number of counts per gene in each RNA-Seq sample, a communal global normalization method was applied to all expression values. This unifies the whole set of expression data and makes them comparable. A database was designed where each expression value is associated with corresponding experimental annotations. Sample details were manually curated to be easily understandable by biologists. To make the data easily searchable, a user-friendly web interface was developed that provides versatile data mining web tools via on-the-fly generation of output graphics, such as expression bar plots, comprehensive in planta representations and heatmaps of hierarchically clustered expression data. In addition, it allows for the identification of co-expressed genes and the visualization of correlation networks of co-regulated gene groups. TomExpress provides one of the most complete free resources of publicly available tomato RNA-Seq data, and allows for the immediate interrogation of transcriptional programs that regulate vegetative and reproductive development in tomato under diverse conditions. The design of the pipeline developed in this project enables easy updating of the database with newly published RNA-Seq data, thereby allowing for continuous enrichment of the resource

    The Sol Genomics Network (SGN)—from genotype to phenotype to breeding

    Get PDF
    The Sol Genomics Network (SGN, http://solgenomics.net) is a web portal with genomic and phenotypic data, and analysis tools for the Solanaceae family and close relatives. SGN hosts whole genome data for an increasing number of Solanaceae family members including tomato, potato, pepper, eggplant, tobacco and Nicotiana benthamiana. The database also stores loci and phenotype data, which researchers can upload and edit with user-friendly web interfaces. Tools such as BLAST, GBrowse and JBrowse for browsing genomes, expression and map data viewers, a locus community annotation system and a QTL analysis tools are available. A new tool was recently implemented to improve Virus-Induced Gene Silencing (VIGS) constructs called the SGN VIGS tool. With the growing genomic and phenotypic data in the database, SGN is now advancing to develop new web-based breeding tools and implement the code and database structure for other species or clade-specific databases

    The Sol Genomics Network (SGN)--from genotype to phenotype to breeding

    Get PDF
    The Sol Genomics Network (SGN, http://solgenomics.net) is a web portal with genomic and phenotypic data, and analysis tools for the Solanaceae family and close relatives. SGN hosts whole genome data for an increasing number of Solanaceae family members including tomato, potato, pepper, eggplant, tobacco and Nicotiana benthamiana. The database also stores loci and phenotype data, which researchers can upload and edit with user-friendly web interfaces. Tools such as BLAST, GBrowse and JBrowse for browsing genomes, expression and map data viewers, a locus community annotation system and a QTL analysis tools are available. A new tool was recently implemented to improve Virus-Induced Gene Silencing (VIGS) constructs called the SGN VIGS tool. With the growing genomic and phenotypic data in the database, SGN is now advancing to develop new web-based breeding tools and implement the code and database structure for other species or clade-specific databases.Peer reviewe

    AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture

    Get PDF
    The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgbases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices

    Genomics data integration for knowledge discovery using genome annotations from molecular databases and scientific literature

    Get PDF
    One of the major global challenges of today is to meet the food demands of an ever increasing population (food demand will increase by 50% in 2030). One approach to address this challenge is to breed new crop varieties that yield more even under unfavorable conditions e.g. have improved tolerance to drought and/or resistance to pathogens. However, designing a breeding program is a laborious and time consuming effort that often lacks the capacity to generate new cultivars quickly in response to the required traits. Recent advances in biotechnology and genomics data science have the potential to accelerate and precise breeding programs greatly. As large-scale genomic data sets for crop species are available in multiple independent data sources and scientific literature, this thesis provides innovative technologies that use natural language processing (NLP) and semantic web technologies to address challenges of integrating genomic data for improving plant breeding. Firstly, in this research study, we developed a supervised Natural language processing (NLP) model with the help of IBM Watson, to extract knowledge networks containing genotypic-phenotypic associations of potato tuber flesh color from the scientific literature. Secondly, a table mining tool called QTLTableMiner++ (QTM) was developed which enables knowledge discovery of novel genomic regions (such as QTL regions), which positively or negatively affect the traits of interest. The objective of both above mentioned, NLP techniques was to extract information which is implicitly described in the literature and is not available in structured resources, like databases. Thirdly, with the help of semantic web technology, a linked-data platform called Solanaceae linked data platform(pbg-ld) was developed, to semantically integrates geno- and pheno-typic data of Solanaceae species. This platform combines both unstructured data from scientific literature and structured data from publicly available biological databases using the Linked Data approach. Lastly, analysis workflows for prioritizing candidate genes with QTL regions were tested using pbg-ld. Hence, this research provides in-silico knowledge discovery tools and genomic data infrastructure, which aids researchers and breeders in the design of a precise and improved breeding program.</p

    Molecular Evolutionary Trends and Feeding Ecology Diversification in the Hemiptera, Anchored by the Milkweed Bug Genome

    Get PDF
    Background: The Hemiptera (aphids, cicadas, and true bugs) are a key insect order, with high diversity for feeding ecology and excellent experimental tractability for molecular genetics. Building upon recent sequencing of hemipteran pests such as phloem-feeding aphids and blood-feeding bed bugs, we present the genome sequence and comparative analyses centered on the milkweed bug Oncopeltus fasciatus, a seed feeder of the family Lygaeidae. Results: The 926-Mb Oncopeltus genome is well represented by the current assembly and official gene set. We use our genomic and RNA-seq data not only to characterize the protein-coding gene repertoire and perform isoform-specific RNAi, but also to elucidate patterns of molecular evolution and physiology. We find ongoing, lineage-specific expansion and diversification of repressive C2H2 zinc finger proteins. The discovery of intron gain and turnover specific to the Hemiptera also prompted the evaluation of lineage and genome size as predictors of gene structure evolution. Furthermore, we identify enzymatic gains and losses that correlate with feeding biology, particularly for reductions associated with derived, fluid nutrition feeding. Conclusions: With the milkweed bug, we now have a critical mass of sequenced species for a hemimetabolous insect order and close outgroup to the Holometabola, substantially improving the diversity of insect genomics. We thereby define commonalities among the Hemiptera and delve into how hemipteran genomes reflect distinct feeding ecologies. Given Oncopeltus’s strength as an experimental model, these new sequence resources bolster the foundation for molecular research and highlight technical considerations for the analysis of medium-sized invertebrate genomes

    A genome scale metabolic network for rice and accompanying analysis of tryptophan, auxin and serotonin biosynthesis regulation under biotic stress

    Get PDF
    Background Functional annotations of large plant genome projects mostly provide information on gene function and gene families based on the presence of protein domains and gene homology, but not necessarily in association with gene expression or metabolic and regulatory networks. These additional annotations are necessary to understand the physiology, development and adaptation of a plant and its interaction with the environment. Results RiceCyc is a metabolic pathway networks database for rice. It is a snapshot of the substrates, metabolites, enzymes, reactions and pathways of primary and intermediary metabolism in rice. RiceCyc version 3.3 features 316 pathways and 6,643 peptide-coding genes mapped to 2,103 enzyme-catalyzed and 87 protein-mediated transport reactions. The initial functional annotations of rice genes with InterPro, Gene Ontology, MetaCyc, and Enzyme Commission (EC) numbers were enriched with annotations provided by KEGG and Gramene databases. The pathway inferences and the network diagrams were first predicted based on MetaCyc reference networks and plant pathways from the Plant Metabolic Network, using the Pathologic module of Pathway Tools. This was enriched by manually adding metabolic pathways and gene functions specifically reported for rice. The RiceCyc database is hierarchically browsable from pathway diagrams to the associated genes, metabolites and chemical structures. Through the integrated tool OMICs Viewer, users can upload transcriptomic, proteomic and metabolomic data to visualize expression patterns in a virtual cell. RiceCyc, along with additional species-specific pathway databases hosted in the Gramene project, facilitates comparative pathway analysis. Conclusions Here we describe the RiceCyc network development and discuss its contribution to rice genome annotations. As a case study to demonstrate the use of RiceCyc network as a discovery environment we carried out an integrated bioinformatic analysis of rice metabolic genes that are differentially regulated under diurnal photoperiod and biotic stress treatments. The analysis of publicly available rice transcriptome datasets led to the hypothesis that the complete tryptophan biosynthesis and its dependent metabolic pathways including serotonin biosynthesis are induced by taxonomically diverse pathogens while also being under diurnal regulation. The RiceCyc database is available online for free access at http://www.gramene.org/pathway

    Whole-Plant Growth Stage Ontology for Angiosperms and Its Application in Plant Biology

    Get PDF
    • …
    corecore