Network and multi-scale signal analysis for the integration of large omic datasets: applications in Populus trichocarpa
Poplar species are promising sources of cellulosic biomass for biofuels because of their fast growth rate, high cellulose content and moderate lignin content. There is a growing movement toward integrating multiple layers of 'omics data in a systems biology approach to understand gene-phenotype relationships and assist plant breeding programs. This dissertation uses network and signal processing techniques for the combined analysis of these various data types, with the goals of (1) increasing fundamental knowledge of P. trichocarpa and (2) facilitating the generation of hypotheses about target genes and phenotypes of interest. A "Lines of Evidence" data integration method is presented for the identification and prioritization of target genes involved in functions of interest. A new post-GWAS method, Pleiotropy Decomposition, is presented, which extracts pleiotropic relationships between genes and phenotypes from GWAS results, allowing for identification of genes with signatures favorable to genome editing. Continuous wavelet transform signal processing is applied to characterize the genomic distributions of various features (including variant density, gene density, and methylation profiles) in order to identify chromosome structures such as the centromere. This yielded approximate centromere locations on all P. trichocarpa chromosomes, which had not previously been adequately reported in the scientific literature. Discrete wavelet transform signal processing was applied to genomic features from various data types, including transposable element density, methylation density, SNP density, gene density, centromere position and putative ancestral centromere position. Subsequent correlation analysis of the resulting wavelet coefficients identified scale-specific relationships between these genomic features and provided insights into the evolution of the genome structure of P. trichocarpa.
These methods have provided strategies both to increase fundamental knowledge about the P. trichocarpa system and to identify new target genes for biofuels applications. We intend that these approaches will ultimately be used in the design of better plants for more efficient and sustainable production of bioenergy.
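As a rough illustration of the scale-specific correlation step described above, the sketch below applies a multilevel Haar discrete wavelet transform to two binned density signals and computes a Pearson correlation between their detail coefficients at each level. The Haar wavelet, the power-of-two signal length, the `min_coeffs` cutoff and all function names are assumptions made for this example; the wavelet family and pipeline actually used in the dissertation are not described here.

```python
import math


def haar_dwt(signal):
    """Multilevel Haar DWT returning detail coefficients per level (finest first).

    Assumes len(signal) is a power of two (an assumption of this sketch).
    """
    approx = [float(v) for v in signal]
    details = []
    while len(approx) > 1:
        next_approx, detail = [], []
        for i in range(0, len(approx), 2):
            a, b = approx[i], approx[i + 1]
            next_approx.append((a + b) / math.sqrt(2))  # smooth (low-pass) part
            detail.append((a - b) / math.sqrt(2))       # difference (high-pass) part
        details.append(detail)
        approx = next_approx
    return details


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def scale_correlations(sig_a, sig_b, min_coeffs=4):
    """Correlate two signals level by level in the Haar wavelet domain.

    Level 1 is the finest scale (pairs of adjacent bins); each higher level
    doubles the scale. Levels with fewer than `min_coeffs` coefficients are
    skipped because a correlation on so few points is not meaningful.
    """
    result = {}
    for level, (ca, cb) in enumerate(zip(haar_dwt(sig_a), haar_dwt(sig_b)), start=1):
        if len(ca) >= min_coeffs:
            result[level] = pearson(ca, cb)
    return result


# Hypothetical binned densities along one chromosome (16 bins).
gene_density = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3]
snp_density = [2, 2, 5, 1, 4, 8, 3, 5, 6, 2, 4, 9, 8, 8, 7, 4]
print(scale_correlations(gene_density, snp_density))
```

Comparing features level by level lets two signals correlate strongly at broad scales (for example, chromosome-arm trends) while being unrelated at fine scales, which is the kind of scale-specific relationship the abstract refers to.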
Linking crop traits to transcriptome differences in a progeny population of tetraploid potato
Background: Potato is the third most consumed crop in the world. Breeding for traits such as yield, product quality and pathogen resistance is a main priority. Identifying molecular signatures of these and other important traits will be important in future breeding efforts. In this study, a progeny population from a cross between a breeding line, SW93-1015, and a cultivar, Desiree, was studied by trait analysis and RNA-seq in order to understand segregating traits at the molecular level and to identify transcripts whose expression correlates with these traits. Transcript markers, measured under controlled environments, that predict field performance would be of great value for plant breeding. Results: A total of 34 progeny lines from SW93-1015 and Desiree were phenotyped for 17 different traits in the field under Nordic climate conditions and in controlled climate settings. A master transcriptome was constructed for all 34 progeny lines and the parents through de novo assembly of RNA-seq reads. Gene expression data obtained in a controlled environment from the 34 lines were correlated to traits using different similarity indices, including Pearson and Spearman correlation as well as DUO, which calculates the co-occurrence between high and low values of gene expression and trait. Our study linked transcripts to traits such as yield, growth rate, high-laying tubers, late blight and tuber blight, tuber greening and early flowering. We found several transcripts associated with late blight resistance, and transcripts encoding receptors were associated with Dickeya solani susceptibility. Transcript levels of a UBX-domain protein were negatively associated with yield, and those of a GLABRA2 expression modulator were negatively associated with growth rate. Conclusion: In our study, we identify hundreds of transcripts putatively linked, based on expression, to 17 traits of potato, representing both well-known and novel associations. This approach can be used to link the transcriptome to traits.
We explore the possibility of associating transcript expression levels measured in controlled, optimal environments with traits in a progeny population using different methods, introducing the application of DUO to transcriptome data for the first time. We verify the expression pattern of five of the putative transcript markers in another progeny population.
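The co-occurrence idea behind DUO can be illustrated with a simplified sketch: each vector (expression of one transcript across lines, and one trait across the same lines) is split into "high" and "low" values at its median, and the fraction of lines falling into each high/low combination is counted. This is not the published DUO formula; the median split, the `concordance` summary and all function names are hypothetical choices for illustration only.

```python
from statistics import median


def high_low(values):
    """Label each value 'H' if above the median, else 'L' (hypothetical split rule)."""
    m = median(values)
    return ["H" if v > m else "L" for v in values]


def cooccurrence(expression, trait):
    """Fraction of lines in each (expression, trait) high/low combination."""
    if len(expression) != len(trait):
        raise ValueError("expression and trait must cover the same lines")
    counts = {"HH": 0, "HL": 0, "LH": 0, "LL": 0}
    for e, t in zip(high_low(expression), high_low(trait)):
        counts[e + t] += 1
    n = len(expression)
    return {k: c / n for k, c in counts.items()}


def concordance(expression, trait):
    """Summary score: +1 when high/low always co-occur, -1 when always opposed."""
    f = cooccurrence(expression, trait)
    return (f["HH"] + f["LL"]) - (f["HL"] + f["LH"])


# Hypothetical values for six progeny lines.
expr = [1.2, 0.4, 3.3, 2.8, 0.9, 3.0]
yield_t = [20, 14, 35, 31, 17, 29]
print(cooccurrence(expr, yield_t), concordance(expr, yield_t))
```

Unlike Pearson correlation, a count of high/low co-occurrences ignores the magnitude of values beyond the split, which makes it less sensitive to a few extreme lines.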
A Variable Polyglutamine Repeat Affects Subcellular Localization and Regulatory Activity of a Populus ANGUSTIFOLIA Protein
Polyglutamine (polyQ) stretches have been reported to occur in proteins across many organisms, including animals, fungi and plants. Expansion of these repeats has attracted much attention due to their associations with numerous human diseases, including Huntington's disease and other neurological maladies. This suggests that the relative length of polyQ stretches is an important modulator of their function. Here, we report the identification of a Populus C-terminus binding protein (CtBP) ANGUSTIFOLIA (PtAN1) that contains a polyQ stretch whose functional relevance had not been established. Analysis of 917 resequenced Populus trichocarpa genotypes revealed three allelic variants at this locus, encoding 11, 13 and 15 glutamine residues. Transient expression assays using Populus leaf mesophyll protoplasts revealed that the 11Q variant exhibited strong nuclear localization, whereas the 15Q variant was found only in the cytosol and the 13Q variant localized to both subcellular compartments. We assessed functional implications by evaluating expression changes of putative PtAN1 targets in response to overexpression of the three allelic variants, and observed allele-specific differences in the expression levels of putative targets. Our results provide evidence that variation in polyQ length modulates PtAN1 function by altering subcellular localization.
Pleiotropic and Epistatic Network-Based Discovery: Integrated Networks for Target Gene Discovery
Biological organisms are complex systems composed of functional networks of interacting molecules and macromolecules. Complex phenotypes are the result of orchestrated, hierarchical, heterogeneous collections of expressed genomic variants. The effects of these variants, however, are the result of historic selective pressure and current environmental and epigenetic signals, and, as such, their co-occurrence can manifest as genome-wide correlations in a number of different ways. Biomass recalcitrance (i.e., the resistance of plants to the degradation or deconstruction that ultimately enables access to a plant's sugars) is a complex polygenic phenotype of high importance to biofuels initiatives. This study makes use of data derived from the re-sequenced genomes of over 800 different Populus trichocarpa genotypes, in combination with metabolomic and pyMBMS data across this population, as well as co-expression and co-methylation networks, in order to better understand the molecular interactions involved in recalcitrance and to identify target genes involved in lignin biosynthesis/degradation. A Lines Of Evidence (LOE) scoring system is developed to integrate the information in the different layers and quantify the number of lines of evidence linking genes to target functions. This new scoring system was applied to quantify the lines of evidence linking genes to lignin-related genes and phenotypes across the network layers, and allowed for the generation of new hypotheses about potential new candidate genes involved in lignin biosynthesis in P. trichocarpa, including various AGAMOUS-LIKE genes.
The resulting Genome Wide Association Study networks, integrated with Single Nucleotide Polymorphism (SNP) correlation, co-methylation and co-expression networks through the LOE scores, are proving to be a powerful approach for determining the pleiotropic and epistatic relationships underlying cellular functions and, as such, the molecular basis of complex phenotypes such as recalcitrance.
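One plausible reading of the LOE score is sketched below: each network layer is a list of edges, and a gene's score is the number of distinct layers in which it is directly linked to at least one target (a known lignin-related gene or a lignin phenotype). The edge-list representation, the direct-neighbour rule, the function names and the example identifiers are all assumptions for illustration, not the published scoring system.

```python
from collections import defaultdict


def loe_scores(layers, targets):
    """Count, for each non-target node, how many layers link it to any target.

    layers:  dict mapping layer name -> iterable of (node_a, node_b) edges
    targets: set of target nodes (e.g. known lignin genes or lignin phenotypes)
    """
    evidence = defaultdict(set)  # node -> names of layers supporting it
    for name, edges in layers.items():
        for a, b in edges:
            if a in targets and b not in targets:
                evidence[b].add(name)
            elif b in targets and a not in targets:
                evidence[a].add(name)
    return {node: len(names) for node, names in evidence.items()}


def rank_candidates(layers, targets):
    """Sort candidate genes by descending LOE score (ties broken by name)."""
    scores = loe_scores(layers, targets)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Placeholder gene and phenotype identifiers.
layers = {
    "co-expression": [("g1", "CAD"), ("g2", "g3")],
    "co-methylation": [("g1", "CAD"), ("g2", "CCR")],
    "GWAS": [("g1", "lignin_content")],
}
targets = {"CAD", "CCR", "lignin_content"}
print(rank_candidates(layers, targets))
```

Counting layers rather than edges rewards genes supported by independent data types, so a gene with a single link in each of three layers outranks one with many links in a single layer.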
Wavelet-Based Genomic Signal Processing for Centromere Identification and Hypothesis Generation
Various 'omics data types have been generated for Populus trichocarpa, each providing a layer of information that can be represented as a density signal along a chromosome. We use genome sequence data, variant data across a population and methylation data across 10 different tissues, combined with wavelet-based signal processing, to perform a comprehensive analysis of the centromere's signature in these different data signals, and successfully identify putative centromeric regions in P. trichocarpa from these signals. Furthermore, using SNP (single nucleotide polymorphism) correlations across a natural population of P. trichocarpa, we find evidence for the co-evolution of the centromeric histone CENH3 with the sequence of the newly identified centromeric regions, and identify a new CENH3 candidate in P. trichocarpa.
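A minimal sketch of how a continuous wavelet transform can flag a broad feature in a density signal is given below. It assumes a feature that is depleted near the centromere (gene density is a common example); for enriched features such as transposable elements one would take the maximum instead. The Ricker wavelet, the chosen widths, the summation across scales and all function names are assumptions for illustration, not the method used in the work above. It is written in plain Python so it has no dependencies.

```python
import math


def ricker(points, width):
    """Ricker (Mexican hat) wavelet sampled at `points` positions, scale `width`."""
    amp = 2 / (math.sqrt(3 * width) * math.pi ** 0.25)
    center = (points - 1) / 2
    values = []
    for i in range(points):
        x = (i - center) / width
        values.append(amp * (1 - x * x) * math.exp(-x * x / 2))
    return values


def cwt_row(signal, width):
    """CWT coefficients at one scale; positions outside the signal count as 0."""
    n = len(signal)
    kernel = ricker(min(10 * width, n), width)
    half = len(kernel) // 2
    row = []
    for i in range(n):  # slide the wavelet along the signal
        total = 0.0
        for j, k in enumerate(kernel):
            idx = i + j - half
            if 0 <= idx < n:
                total += signal[idx] * k
        row.append(total)
    return row


def depletion_candidate(density, widths=(5, 10)):
    """Index of the strongest broad dip, summed over the given wavelet scales.

    The signal is mean-centred first so flat regions contribute little.
    """
    mean = sum(density) / len(density)
    centred = [v - mean for v in density]
    combined = [0.0] * len(density)
    for w in widths:
        for i, c in enumerate(cwt_row(centred, w)):
            combined[i] += c
    return min(range(len(combined)), key=combined.__getitem__)


# Hypothetical gene-density track: 200 bins with a gene-poor block at bins 90-110.
density = [10.0] * 200
for i in range(90, 111):
    density[i] = 2.0
print(depletion_candidate(density))
```

Wider wavelets respond to broad regions such as a pericentromeric block, while narrower ones pick up local fluctuations; choosing the widths to match the expected size of the centromeric region is the main tuning decision in this sketch.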
Fungal-Bacterial Networks in the Populus Rhizobiome Are Impacted by Soil Properties and Host Genotype
Plant root-associated microbial symbionts comprise the plant rhizobiome. These microbes function in provisioning nutrients and water to their hosts, impacting plant health and disease. The plant microbiome is shaped by plant species, plant genotype, soil and environmental conditions, but the contributions of these variables are hard to disentangle from each other in natural systems. We used bioassay common garden experiments to decouple the impacts of plant genotype and soil properties on fungal and bacterial community structure in the Populus rhizobiome. High-throughput amplification and sequencing of 16S, ITS, 28S and 18S rDNA was accomplished through 454 pyrosequencing. Co-association patterns of fungal and bacterial taxa were assessed with the 16S and ITS datasets. Community bipartite fungal-bacterial networks and PERMANOVA results attribute significant differences in fungal or bacterial communities to soil origin, soil chemical properties and plant genotype. Indicator species analysis identified a common set of root bacteria as well as endophytic and ectomycorrhizal fungi associated with Populus in different soils. However, no single taxon, or consortium of microbes, was indicative of a particular Populus genotype. Arbuscular mycorrhizal, endophytic, and ectomycorrhizal fungi, as well as bacteria belonging to the orders Rhizobiales, Chitinophagales, Cytophagales, and Burkholderiales, were over-represented in the fungal-bacterial networks. These results demonstrate the importance of soil and plant genotype in shaping fungal-bacterial networks in the belowground plant microbiome.
High Throughput Screening Technologies in Biomass Characterization
Biomass analysis is a slow and tedious process, and not solely because of the long generation time of most plant species. Screening large numbers of plant variants for various geno-, pheno-, and chemo-types, whether naturally occurring or engineered in the lab, poses multiple challenges. Plant cell walls are complex, heterogeneous networks that are difficult to deconstruct and analyze. Macroheterogeneity arising from tissue type, age and environmental factors makes representative sampling a challenge, and natural variability generates a significant range in the data. High throughput (HTP) methodologies allow large sample sets and replicates to be examined, yielding more precise data for various analyses. This review provides a comprehensive survey of high throughput screening as applied to biomass characterization, from compositional analysis of cell walls by NIR, NMR, mass spectrometry and wet chemistry to functional screening of changes in recalcitrance via HTP thermochemical pretreatment coupled to enzyme hydrolysis and microscale fermentation. Most high throughput methods have been advanced through the use of state-of-the-art equipment and robotics, rapid detection methods, and reductions in sample size and preparation procedures. Computational analysis of the large amounts of data generated by high throughput analytical techniques has recently become more sophisticated, faster and economically viable, enabling a more comprehensive understanding of biomass genomics, structure, composition and properties. Methodology for analyzing the large datasets generated by the various analytical techniques is therefore also covered.
Multi-Phenotype Association Decomposition: Unraveling Complex Gene-Phenotype Relationships
Various patterns of multi-phenotype associations (MPAs), involving different topologies of single nucleotide polymorphism (SNP)-phenotype associations, exist in the results of Genome Wide Association Studies (GWAS). These can provide information about the different impacts of a gene on closely related phenotypes or on disparate phenotypes (pleiotropy). In this work we present MPA Decomposition, a new network-based approach that decomposes the results of a multi-phenotype GWAS into three bipartite networks which, when used together, unravel the multi-phenotype signatures of genes on a genome-wide scale. The decomposition involves the construction of a phenotype powerset space and the subsequent mapping of genes into this new space. Clustering genes in this powerset space groups them by their detailed MPA signatures. We show that this method allows us to find multiple different MPA and pleiotropic signatures within individual genes, and to classify and cluster genes based on these SNP-phenotype association topologies. We demonstrate this approach on a GWAS analysis of a large population of 882 Populus trichocarpa genotypes using untargeted metabolomics phenotypes. This method should prove invaluable in the interpretation of large GWAS datasets and aid future synthetic biology efforts designed to optimize phenotypes of interest.
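To make the powerset idea concrete, here is a minimal sketch under stated assumptions: each SNP's set of associated phenotypes is treated as one element of the phenotype powerset, each gene is mapped to the powerset elements hit by its SNPs (with SNP counts), and genes that hit the same set of elements are grouped together. Exact-match grouping stands in for the clustering step; the input dictionaries, function names and identifiers are hypothetical, and the three bipartite networks of the actual method are not reproduced.

```python
from collections import Counter, defaultdict


def decompose(snp_gene, snp_phenotypes):
    """Map each gene to a Counter of powerset elements (phenotype sets) hit by its SNPs.

    snp_gene:       dict SNP id -> gene id
    snp_phenotypes: dict SNP id -> iterable of phenotypes the SNP is associated with
    """
    gene_sig = defaultdict(Counter)
    for snp, gene in snp_gene.items():
        element = frozenset(snp_phenotypes.get(snp, ()))
        if element:  # SNPs with no associations do not enter the powerset space
            gene_sig[gene][element] += 1
    return gene_sig


def group_by_signature(gene_sig):
    """Group genes that hit exactly the same powerset elements (SNP counts ignored)."""
    groups = defaultdict(list)
    for gene, sig in gene_sig.items():
        groups[frozenset(sig)].append(gene)
    return {key: sorted(genes) for key, genes in groups.items()}


# Hypothetical SNP-to-gene map and SNP-phenotype associations (m1..m3 = metabolites).
snp_gene = {"s1": "gA", "s2": "gA", "s3": "gB", "s4": "gC", "s5": "gC"}
snp_phenotypes = {"s1": {"m1", "m2"}, "s2": {"m3"}, "s3": {"m1", "m2"},
                  "s4": {"m1", "m2"}, "s5": {"m3"}}
for key, genes in group_by_signature(decompose(snp_gene, snp_phenotypes)).items():
    print(sorted(sorted(el) for el in key), genes)
```

In this toy input, gA and gC share a signature with two distinct elements ({m1, m2} and {m3}), illustrating how a single gene can carry more than one multi-phenotype signature, while gB hits only {m1, m2}.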