45 research outputs found
G-Quadruplex (G4) Motifs in the Maize (Zea mays L.) Genome Are Enriched at Specific Locations in Thousands of Genes Coupled to Energy Status, Hypoxia, Low Sugar, and Nutrient Deprivation
The G-quadruplex (G4) elements comprise a class of nucleic acid structures formed by stacking of guanine base quartets in a quadruple helix. This G4 DNA can form within or across single-stranded DNA molecules and is mutually exclusive with duplex B-form DNA. The reversibility and structural diversity of G4s make them highly versatile genetic structures, as demonstrated by their roles in various functions including telomere metabolism, genome maintenance, immunoglobulin gene diversification, transcription, and translation. Sequence motifs capable of forming G4 DNA are typically located in telomere repeat DNA and other non-telomeric genomic loci. To investigate their potential roles in a large-genome model plant species, we computationally identified 149,988 non-telomeric G4 motifs in maize (Zea mays L., B73 AGPv2), 29% of which were in non-repetitive genomic regions. G4 motif hotspots exhibited non-random enrichment in genes at two locations on the antisense strand, one in the 5âČ UTR and the other at the 5âČ end of the first intron. Several genic G4 motifs were shown to adopt sequence-specific and potassium-dependent G4 DNA structures in vitro. The G4 motifs were prevalent in key regulatory genes associated with hypoxia (group VII ERFs), oxidative stress (DJ-1/GATase1), and energy status (AMPK/SnRK) pathways. They also showed statistical enrichment for genes in metabolic pathways that function in glycolysis, sugar degradation, inositol metabolism, and base excision repair. Collectively, the maize G4 motifs may represent conditional regulatory elements that can aid in energy status gene responses. Such a network of elements could provide a mechanistic basis for linking energy status signals to gene regulation in maize, a model genetic system and major world crop species for feed, food, and fuel
The Locus Lookup tool at MaizeGDB: identification of genomic regions in maize by integrating sequence information with physical and genetic maps
Methods to automatically integrate sequence information with physical and genetic maps are scarce. The Locus Lookup tool enables researchers to define windows of genomic sequence likely to contain loci of interest where only genetic or physical mapping associations are reported. Using the Locus Lookup tool, researchers will be able to locate specific genes more efficiently that will ultimately help them develop a better maize plant. With the availability of the well-documented source code, the tool can be easily adapted to other biological systems
qTeller: a tool for comparative multi-genomic gene expression analysis
Motivation: Over the last decade, RNA-Seq whole-genome sequencing has become a widely used method for measuring and understanding transcriptome-level changes in gene expression. Since RNA-Seq is relatively inexpensive, it can be used on multiple genomes to evaluate gene expression across many different conditions, tissues and cell types. Although many tools exist to map and compare RNA-Seq at the genomics level, few web-based tools are dedicated to making data generated for individual genomic analysis accessible and reusable at a gene-level scale for comparative analysis between genes, across different genomes and meta-analyses. Results: To address this challenge, we revamped the comparative gene expression tool qTeller to take advantage of the growing number of public RNA-Seq datasets. qTeller allows users to evaluate gene expression data in a defined genomic interval and also perform two-gene comparisons across multiple user-chosen tissues. Though previously unpublished, qTeller has been cited extensively in the scientific literature, demonstrating its importance to researchers. Our new version of qTeller now supports multiple genomes for intergenomic comparisons, and includes capabilities for both mRNA and protein abundance datasets. Other new features include support for additional data formats, modernized interface and back-end database and an optimized framework for adoption by other organismsâ databases.
Availability and implementation: The source code for qTeller is open-source and available through GitHub (https:// github.com/Maize-Genetics-and-Genomics-Database/qTeller). A maize instance of qTeller is available at the Maize Genetics and Genomics database (MaizeGDB) (https://qteller.maizegdb.org/), where we have mapped over 200 unique datasets from GenBank across 27 maize genomes
Association mapping across a multitude of traits collected in diverse environments in maize
Classical genetic studies have identified many cases of pleiotropy where mutations in individual genes alter many different phenotypes. Quantitative genetic studies of natural genetic variants frequently examine one or a few traits, limiting their potential to identify pleiotropic effects of natural genetic variants. Widely adopted community association panels have been employed by plant genetics communities to study the genetic basis of naturally occurring phenotypic variation in a wide range of traits. High-density genetic marker dataâ18M markersâfrom 2 partially overlapping maize association panels comprising 1,014 unique genotypes grown in field trials across at least 7 US states and scored for 162 distinct trait data sets enabled the identification of of 2,154 suggestive marker-trait associations and 697 confident associations in the maize genome using a resampling-based genome-wide association strategy. The precision of individual marker-trait associations was estimated to be 3 genes based on a reference set of genes with known phenotypes. Examples were observed of both genetic loci associated with variation in diverse traits (e.g., above-ground and below-ground traits), as well as individual loci associated with the same or similar traits across diverse environments. Many significant signals are located near genes whose functions were previously entirely unknown or estimated purely via functional data on homologs. This study demonstrates the potential of mining community association panel data using new higher-density genetic marker sets combined with resampling-based genome-wide association tests to develop testable hypotheses about gene functions, identify potential pleiotropic effects of natural genetic variants, and study genotype-by-environment interaction
POPcorn: An Online Resource Providing Access to Distributed and Diverse Maize Project Data
The purpose of the online resource presented here, POPcorn (Project Portal for corn), is to enhance accessibility of maize genetic and genomic resources for plant biologists. Currently, many online locations are difficult to find, some are best searched independently, and individual project websites often degrade over timeâsometimes disappearing entirely. The POPcorn site makes available (1) a centralized, web-accessible resource to search and browse descriptions of ongoing maize genomics projects, (2) a single, stand-alone tool that uses web Services and minimal data warehousing to search for sequence matches in online resources of diverse offsite projects, and (3) a set of tools that enables researchers to migrate their data to the long-term model organism database for maize genetic and genomic information: MaizeGDB. Examples demonstrating POPcorn's utility are provided herein
Choosing a genome browser for a Model Organism Database: surveying the Maize community
As the B73 maize genome sequencing project neared completion, MaizeGDB began to integrate a graphical genome browser with its existing web interface and database. To ensure that maize researchers would optimally benefit from the potential addition of a genome browser to the existing MaizeGDB resource, personnel at MaizeGDB surveyed researchersâ needs. Collected data indicate that existing genome browsers for maize were inadequate and suggest implementation of a browser with quick interface and intuitive tools would meet most researchersâ needs. Here, we document the surveyâs outcomes, review functionalities of available genome browser software platforms and offer our rationale for choosing the GBrowse software suite for MaizeGDB. Because the genome as represented within the MaizeGDB Genome Browser is tied to detailed phenotypic data, molecular marker information, available stocks, etc., the MaizeGDB Genome Browser represents a novel mechanism by which the researchers can leverage maize sequence information toward crop improvement directly
AgBioData consortium recommendations for sustainable genomics and genetics databases for agriculture
The future of agricultural research depends on data. The sheer volume of agricultural biological data being produced today makes excellent data management essential. Governmental agencies, publishers and science funders require data management plans for publicly funded research. Furthermore, the value of data increases exponentially when they are properly stored, described, integrated and shared, so that they can be easily utilized in future analyses. AgBioData (https://www.agbiodata.org) is a consortium of people working at agricultural biological databases, data archives and knowledgbases who strive to identify common issues in database development, curation and management, with the goal of creating database products that are more Findable, Accessible, Interoperable and Reusable. We strive to promote authentic, detailed, accurate and explicit communication between all parties involved in scientific data. As a step toward this goal, we present the current state of biocuration, ontologies, metadata and persistence, database platforms, programmatic (machine) access to data, communication and sustainability with regard to data curation. Each section describes challenges and opportunities for these topics, along with recommendations and best practices
Predicting Tissue-Specific mRNA and Protein Abundance in Maize: A Machine Learning Approach
Machine learning and modeling approaches have been used to classify protein sequences for a broad set of tasks including predicting protein function, structure, expression, and localization. Some recent studies have successfully predicted whether a given gene is expressed as mRNA or even translated to proteins potentially, but given that not all genes are expressed in every condition and tissue, the challenge remains to predict condition-specific expression. To address this gap, we developed a machine learning approach to predict tissue-specific gene expression across 23 different tissues in maize, solely based on DNA promoter and protein sequences. For class labels, we defined high and low expression levels for mRNA and protein abundance and optimized classifiers by systematically exploring various methods and combinations of k-mer sequences in a two-phase approach. In the first phase, we developed Markov model classifiers for each tissue and built a feature vector based on the predictions. In the second phase, the feature vector was used as an input to a Bayesian network for final classification. Our results show that these methods can achieve high classification accuracy of up to 95% for predicting gene expression for individual tissues. By relying on sequence alone, our method works in settings where costly experimental data are unavailable and reveals useful insights into the functional, evolutionary, and regulatory characteristics of genes
Co-expression pan-network reveals genes involved in complex traits within maize pan-genome.
BackgroundWith the advances in the high throughput next generation sequencing technologies, genome-wide association studies (GWAS) have identified a large set of variants associated with complex phenotypic traits at a very fine scale. Despite the progress in GWAS, identification of genotype-phenotype relationship remains challenging in maize due to its nature with dozens of variants controlling the same trait. As the causal variations results in the change in expression, gene expression analyses carry a pivotal role in unraveling the transcriptional regulatory mechanisms behind the phenotypes.ResultsTo address these challenges, we incorporated the gene expression and GWAS-driven traits to extend the knowledge of genotype-phenotype relationships and transcriptional regulatory mechanisms behind the phenotypes. We constructed a large collection of gene co-expression networks and identified more than 2 million co-expressing gene pairs in the GWAS-driven pan-network which contains all the gene-pairs in individual genomes of the nested association mapping (NAM) population. We defined four sub-categories for the pan-network: (1) core-network contains the highest represented ~â1% of the gene-pairs, (2) near-core network contains the next highest represented 1-5% of the gene-pairs, (3) private-network contains ~â50% of the gene pairs that are unique to individual genomes, and (4) the dispensable-network contains the remaining 50-95% of the gene-pairs in the maize pan-genome. Strikingly, the private-network contained almost all the genes in the pan-network but lacked half of the interactions. We performed gene ontology (GO) enrichment analysis for the pan-, core-, and private- networks and compared the contributions of variants overlapping with genes and promoters to the GWAS-driven pan-network.ConclusionsGene co-expression networks revealed meaningful information about groups of co-regulated genes that play a central role in regulatory processes. Pan-network approach enabled us to visualize the global view of the gene regulatory network for the studied system that could not be well inferred by the core-network alone