Using Deep RNA Sequencing for the Structural Annotation of the Laccaria Bicolor Mycorrhizal Transcriptome
BACKGROUND: Accurate structural annotation is important for prediction of function and required for in vitro approaches to characterize or validate the gene expression products. Despite significant efforts in the field, determination of the gene structure from genomic data alone is a challenging and inaccurate process. The ease of acquisition of transcriptomic sequence provides a direct route to identify expressed sequences and determine the correct gene structure. METHODOLOGY: We developed methods to utilize RNA-seq data to correct errors in the structural annotation and extend the boundaries of current gene models using assembly approaches. The methods were validated with a transcriptomic data set derived from the fungus Laccaria bicolor, which develops a mycorrhizal symbiotic association with the roots of many tree species. Our analysis focused on the subset of 1501 gene models that are differentially expressed in the free living vs. mycorrhizal transcriptome and are expected to be important elements related to carbon metabolism, membrane permeability and transport, and intracellular signaling. Of the set of 1501 gene models, 1439 (96%) successfully generated modified gene models in which all error flags were successfully resolved and the sequences aligned to the genomic sequence. The remaining 4% (62 gene models) either had deviations from transcriptomic data that could not be spanned or generated sequence that did not align to genomic sequence. The outcome of this process is a set of high confidence gene models that can be reliably used for experimental characterization of protein function. CONCLUSIONS: 69% of expressed mycorrhizal JGI "best" gene models deviated from the transcript sequence derived by this method. The transcriptomic sequence enabled correction of a majority of the structural inconsistencies and resulted in a set of validated models for 96% of the mycorrhizal genes. 
The method described here can be applied to improve gene structural annotation in other species, provided that there is a sequenced genome and a set of gene models.
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
Biological organisms are complex systems that are composed of functional networks of interacting molecules and macro-molecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. However, the effects of these variants are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can be seen as genome-wide correlations in a number of different manners. Biomass recalcitrance (i.e., the resistance of plants to degradation or deconstruction, which ultimately enables access to a plant’s sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes from over 800 different Populus trichocarpa genotypes in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks in order to better understand the molecular interactions involved in recalcitrance, and identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses surrounding potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes. 
The resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation, and co-expression networks through the LOE scores are proving to be a powerful approach to determine the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis for complex phenotypes, such as recalcitrance.
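The idea of counting lines of evidence across network layers can be illustrated with a minimal sketch. It assumes each layer is a list of undirected gene-gene edges and that a gene earns one point per layer in which it is directly connected to at least one target gene. That counting rule, the function name `loe_scores`, and all gene and layer names are hypothetical choices made for illustration; they are not taken from the abstract and should not be read as the authors' scoring definition.

```python
from collections import defaultdict


def loe_scores(layers, targets):
    """Count, per gene, the number of layers linking it to any target gene.

    layers  -- dict mapping a layer name to a list of (gene, gene) edges
    targets -- iterable of target gene identifiers
    Assumption: one point per layer with at least one direct edge to a target.
    """
    targets = set(targets)
    scores = defaultdict(int)
    for edges in layers.values():
        linked = set()  # non-target genes touching a target in this layer
        for a, b in edges:
            if a in targets and b not in targets:
                linked.add(b)
            elif b in targets and a not in targets:
                linked.add(a)
        for gene in linked:
            scores[gene] += 1
    return dict(scores)


# Hypothetical toy data: three layers and two lignin-related target genes.
layers = {
    "coexpression": [("geneA", "lignin1"), ("geneB", "geneC")],
    "comethylation": [("lignin2", "geneA"), ("geneB", "lignin1")],
    "snp_correlation": [("geneA", "geneB")],
}
targets = ["lignin1", "lignin2"]
scores = loe_scores(layers, targets)
print(scores)
```

In this toy example `geneA` is linked to a target in two layers and `geneB` in one, so ranking genes by score would place `geneA` first as a candidate.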
Wavelet-Based Genomic Signal Processing for Centromere Identification and Hypothesis Generation
Various ‘omics data types have been generated for Populus trichocarpa, each providing a layer of information which can be represented as a density signal across a chromosome. We make use of genome sequence data, variant data across a population, and methylation data across 10 different tissues, combined with wavelet-based signal processing, to perform a comprehensive analysis of the signature of the centromere in these different data signals, and successfully identify putative centromeric regions in P. trichocarpa from these signals. Furthermore, using SNP (single nucleotide polymorphism) correlations across a natural population of P. trichocarpa, we find evidence for the co-evolution of the centromeric histone CENH3 with the sequence of the newly identified centromeric regions, and identify a new CENH3 candidate in P. trichocarpa.
The Sorghum bicolor reference genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of genome organization.
Sorghum bicolor is a drought tolerant C4 grass used for the production of grain, forage, sugar, and lignocellulosic biomass and a genetic model for C4 grasses due to its relatively small genome (approximately 800 Mbp), diploid genetics, diverse germplasm, and colinearity with other C4 grass genomes. In this study, deep sequencing, genetic linkage analysis, and transcriptome data were used to produce and annotate a high-quality reference genome sequence. Reference genome sequence order was improved, 29.6 Mbp of additional sequence was incorporated, the number of genes annotated increased 24% to 34,211, average gene length and N50 increased, and error frequency was reduced 10-fold to 1 per 100 kbp. Subtelomeric repeats with characteristics of Tandem Repeats in Miniature (TRIM) elements were identified at the termini of most chromosomes. Nucleosome occupancy predictions identified nucleosomes positioned immediately downstream of transcription start sites and at different densities across chromosomes. Alignment of more than 50 resequenced genomes from diverse sorghum genotypes to the reference genome identified approximately 7.4 M single nucleotide polymorphisms (SNPs) and 1.9 M indels. Large-scale variant features in euchromatin were identified with periodicities of approximately 25 kbp. A transcriptome atlas of gene expression was constructed from 47 RNA-seq profiles of growing and developed tissues of the major plant organs (roots, leaves, stems, panicles, and seed) collected during the juvenile, vegetative and reproductive phases. Analysis of the transcriptome data indicated that tissue type and protein kinase expression had large influences on transcriptional profile clustering. The updated assembly, annotation, and transcriptome data represent a resource for C4 grass research and crop improvement.
Genome-wide analysis of lectin receptor-like kinases in Populus
Transcript level of C-type PtLecRLK gene in 24 different datasets from the Populus Gene Atlas Study. RNA-seq data were collected from the Populus Gene Atlas Study in Phytozome v11.0 (http://phytozome.jgi.doe.gov/pz/portal.html). The transcript level was expressed as FPKM. The sheet labeled as “whole_set” contains the original FPKM values from Gene Atlas. The data of four different tissues under standard condition are sorted in the data sheet labeled as “standard”. (XLSX 10 kb)
Light-responsive expression atlas reveals the effects of light quality and intensity in Kalanchoë fedtschenkoi, a plant with crassulacean acid metabolism.
Background: Crassulacean acid metabolism (CAM), a specialized mode of photosynthesis, enables plant adaptation to water-limited environments and improves photosynthetic efficiency via an inorganic carbon-concentrating mechanism. Kalanchoë fedtschenkoi is an obligate CAM model featuring a relatively small genome and easy stable transformation. However, the molecular responses to light quality and intensity in CAM plants remain understudied. Results: Here we present a genome-wide expression atlas of K. fedtschenkoi plants grown under 12 h/12 h photoperiod with different light quality (blue, red, far-red, white light) and intensity (0, 150, 440, and 1,000 μmol m⁻² s⁻¹) based on RNA sequencing performed for mature leaf samples collected at dawn (2 h before the light period) and dusk (2 h before the dark period). An eFP web browser was created for easy access of the gene expression data. Based on the expression atlas, we constructed a light-responsive co-expression network to reveal the potential regulatory relationships in K. fedtschenkoi. Measurements of leaf titratable acidity, soluble sugar, and starch turnover provided metabolic indicators of the magnitude of CAM under the different light treatments and were used to provide biological context for the expression dataset. Furthermore, CAM-related subnetworks were highlighted to showcase genes relevant to CAM pathway, circadian clock, and stomatal movement. In comparison with white light, monochrome blue/red/far-red light treatments repressed the expression of several CAM-related genes at dusk, along with a major reduction in acid accumulation. 
Increasing light intensity from an intermediate level (440 μmol m⁻² s⁻¹) of white light to a high light treatment (1,000 μmol m⁻² s⁻¹) increased expression of several genes involved in dark CO2 fixation and malate transport at dawn, along with an increase in organic acid accumulation. Conclusions: This study provides a useful genomics resource for investigating the molecular mechanism underlying the light regulation of physiology and metabolism in CAM plants. Our results support the hypothesis that both light intensity and light quality can modulate the CAM pathway through regulation of CAM-related genes in K. fedtschenkoi.
Finding New Cell Wall Regulatory Genes in Populus trichocarpa Using Multiple Lines of Evidence.
Understanding the regulatory network controlling cell wall biosynthesis is of great interest in Populus trichocarpa, both because of its status as a model woody perennial and its importance for lignocellulosic products. We searched for genes with putatively unknown roles in regulating cell wall biosynthesis using an extended network-based Lines of Evidence (LOE) pipeline to combine multiple omics data sets in P. trichocarpa, including gene coexpression, gene comethylation, population level pairwise SNP correlations, and two distinct SNP-metabolite Genome Wide Association Study (GWAS) layers. By incorporating validation, ranking, and filtering approaches we produced a list of nine high priority gene candidates for involvement in the regulation of cell wall biosynthesis. We subsequently performed a detailed investigation of candidate gene GROWTH-REGULATING FACTOR 9 (PtGRF9). To investigate the role of PtGRF9 in regulating cell wall biosynthesis, we assessed the genome-wide connections of PtGRF9 and a paralog across data layers with functional enrichment analyses, predictive transcription factor binding site analysis, and an independent comparison to eQTN data. Our findings indicate that PtGRF9 likely affects the cell wall by directly repressing genes involved in cell wall biosynthesis, such as PtCCoAOMT and PtMYB.41, and indirectly by regulating homeobox genes. Furthermore, evidence suggests that PtGRF9 paralogs may act as transcriptional co-regulators that direct the global energy usage of the plant. Using our extended pipeline, we show multiple lines of evidence implicating the involvement of these genes in cell wall regulatory functions and demonstrate the value of this method for prioritizing candidate genes for experimental validation.
Engineering Tree Seasonal Cycles of Growth Through Chromatin Modification
In temperate and boreal regions, perennial trees arrest cell division in their meristematic tissues during winter dormancy until environmental conditions become appropriate for their renewed growth. Release from the dormant state requires exposure to a period of chilling temperatures similar to the vernalization required for flowering in Arabidopsis. Over the past decade, genomic DNA (gDNA) methylation and transcriptome studies have revealed signatures of chromatin regulation during active growth and winter dormancy. To date, only a few chromatin modification genes, as candidate regulators of these developmental stages, have been functionally characterized in trees. In this work, we summarize the major findings of the chromatin-remodeling role during growth-dormancy cycles and we explore the transcriptional profiling of vegetative apical bud and stem tissues during dormancy. Finally, we discuss genetic strategies designed to improve the growth and quality of forest trees.
Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement
Polyploidy is an evolutionary innovation for many animals and all flowering plants, but its impact on selection and domestication remains elusive. Here we analyze genome evolution and diversification for all five allopolyploid cotton species, including economically important Upland and Pima cottons. Although these polyploid genomes are conserved in gene content and synteny, they have diversified by subgenomic transposon exchanges that equilibrate genome size, evolutionary rate heterogeneities and positive selection between homoeologs within and among lineages. These differential evolutionary trajectories are accompanied by gene-family diversification and homoeolog expression divergence among polyploid lineages. Selection and domestication drive parallel gene expression similarities in fibers of two cultivated cottons, involving coexpression networks and N6-methyladenosine RNA modifications. Furthermore, polyploidy induces recombination suppression, which correlates with altered epigenetic landscapes and can be overcome by wild introgression. These genomic insights will empower efforts to manipulate genetic recombination and modify epigenetic landscapes and target genes for crop improvement.