42 research outputs found
Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris
Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications, have not been fully investigated. Here, we assessed three assembly strategies for short-read data, including the utility of haploid megagametophyte tissue during de novo assembly as single-allele guides, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than the application of strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some levels of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity.
Does the seed fall far from the tree? Weak fine-scale genetic structure in a continuous Scots pine population
Knowledge of fine-scale spatial genetic structure, i.e., the distribution of genetic diversity at short distances, is important in evolutionary research and in practical applications such as conservation and breeding programs. In trees, related individuals often grow close to each other due to limited seed and/or pollen dispersal. The extent of seed dispersal also limits the speed at which a tree species can spread to new areas. We studied the fine-scale spatial genetic structure of Scots pine (Pinus sylvestris) in two naturally regenerated sites located 20 km from each other in continuous south-eastern Finnish forest. We genotyped almost 500 adult trees for 150k SNPs using a custom-made Affymetrix array. We detected some pairwise relatedness at short distances, but the average relatedness was low and decreased with increasing distance, as expected. Despite the clustering of related individuals, the sampling sites were not differentiated (FST = 0.0005). According to our results, Scots pine has a large neighborhood size (Nb = 1680–3210), but a relatively short gene dispersal distance (σg = 36.5–71.3 m). Knowledge of Scots pine fine-scale spatial genetic structure can be used to define suitable sampling distances for evolutionary studies and practical applications. Detailed empirical estimates of dispersal are necessary both in studying post-glacial recolonization and predicting the response of forest trees to climate change.
Taming the massive genome of Scots pine with PiSy50k, a new genotyping array for conifer research
Pinus sylvestris (Scots pine) is the most widespread coniferous tree in the boreal forests of Eurasia, with major economic and ecological importance. However, its large and repetitive genome presents a challenge for conducting genome-wide analyses such as association studies, genetic mapping and genomic selection. We present a new 50K single-nucleotide polymorphism (SNP) genotyping array for Scots pine research, breeding and other applications. To select the SNP set, we first genotyped 480 Scots pine samples on a 407 540 SNP screening array and identified 47 712 high-quality SNPs for the final array (called 'PiSy50k'). Here, we provide details of the design and testing, as well as allele frequency estimates from the discovery panel, functional annotation, tissue-specific expression patterns and expression level information for the SNPs or corresponding genes, when available. We validated the performance of the PiSy50k array using samples from Finland and Scotland. Overall, 39 678 (83.2%) SNPs showed low error rates (mean = 0.9%). Relatedness estimates based on array genotypes were consistent with the expected pedigrees, and the level of Mendelian error was negligible. In addition, array genotypes successfully discriminate between Scots pine populations of Finnish and Scottish origins. The PiSy50k SNP array will be a valuable tool for a wide variety of future genetic studies and forestry applications.
The GenTree Dendroecological Collection, tree-ring and wood density data from seven tree species across Europe
The dataset presented here was collected by the GenTree project (EU-Horizon 2020), which aims to improve the use of forest genetic resources across Europe by better understanding how trees adapt to their local environment. This dataset of individual tree-core characteristics, including ring-width series and whole-core wood density, was collected for seven ecologically and economically important European tree species: silver birch (Betula pendula), European beech (Fagus sylvatica), Norway spruce (Picea abies), European black poplar (Populus nigra), maritime pine (Pinus pinaster), Scots pine (Pinus sylvestris), and sessile oak (Quercus petraea). Tree-ring width measurements were obtained from 3600 trees in 142 populations and whole-core wood density was measured for 3098 trees in 125 populations. This dataset covers most of the geographical and climatic range occupied by the selected species. It should be highly valuable for assessing ecological and evolutionary responses to environmental conditions, as well as for model development and parameterization to predict adaptability under climate change scenarios.
Germline variation at 8q24 and prostate cancer risk in men of European ancestry
Chromosome 8q24 is a susceptibility locus for multiple cancers, including prostate cancer. Here we combine genetic data across the 8q24 susceptibility region from 71,535 prostate cancer cases and 52,935 controls of European ancestry to define the overall contribution of germline variation at 8q24 to prostate cancer risk. We identify 12 independent risk signals for prostate cancer (p < 4.28 × 10^-15), including three risk variants that have yet to be reported. From a polygenic risk score (PRS) model, derived to assess the cumulative effect of risk variants at 8q24, men in the top 1% of the PRS have a 4-fold (95% CI = 3.62–4.40) greater risk compared to the population average. These 12 variants account for ~25% of what can be currently explained of the familial risk of prostate cancer by known genetic risk factors. These findings highlight the overwhelming contribution of germline variation at 8q24 on prostate cancer risk, which has implications for population risk stratification.
Fine-mapping of prostate cancer susceptibility loci in a large meta-analysis identifies candidate causal variants
Prostate cancer is a polygenic disease with a large heritable component. A number of common, low-penetrance prostate cancer risk loci have been identified through GWAS. Here we apply the Bayesian multivariate variable selection algorithm JAM to fine-map 84 prostate cancer susceptibility loci, using summary data from a large European ancestry meta-analysis. We observe evidence for multiple independent signals at 12 regions and 99 risk signals overall. Only 15 original GWAS tag SNPs remain among the catalogue of candidate variants identified; the remainder are replaced by more likely candidates. Biological annotation of our credible set of variants indicates significant enrichment within promoter and enhancer elements, and transcription factor-binding sites, including AR, ERG and FOXA1. In 40 regions at least one variant is colocalised with an eQTL in prostate cancer tissue. The refined set of candidate variants substantially increases the proportion of familial relative risk explained by these known susceptibility regions, which highlights the importance of fine-mapping studies and has implications for clinical risk profiling. © 2018 The Author(s).
The GenTree Platform: growth traits and tree-level environmental data in 12 European forest tree species
Background: Progress in the field of evolutionary forest ecology has been hampered by the huge challenge of phenotyping trees across their ranges in their natural environments, and the limitation in high-resolution environmental information.
Findings: The GenTree Platform contains phenotypic and environmental data from 4,959 trees from 12 ecologically and economically important European forest tree species: Abies alba Mill. (silver fir), Betula pendula Roth. (silver birch), Fagus sylvatica L. (European beech), Picea abies (L.) H. Karst (Norway spruce), Pinus cembra L. (Swiss stone pine), Pinus halepensis Mill. (Aleppo pine), Pinus nigra Arnold (European black pine), Pinus pinaster Aiton (maritime pine), Pinus sylvestris L. (Scots pine), Populus nigra L. (European black poplar), Taxus baccata L. (English yew), and Quercus petraea (Matt.) Liebl. (sessile oak). Phenotypic (height, diameter at breast height, crown size, bark thickness, biomass, straightness, forking, branch angle, fructification), regeneration, environmental in situ measurements (soil depth, vegetation cover, competition indices), and environmental modeling data extracted by using bilinear interpolation accounting for surrounding conditions of each tree (precipitation, temperature, insolation, drought indices) were obtained from trees in 194 sites covering the species' geographic ranges and reflecting local environmental gradients.
Conclusion: The GenTree Platform is a new resource for investigating ecological and evolutionary processes in forest trees. The coherent phenotyping and environmental characterization across 12 species in their European ranges allow for a wide range of analyses by forest ecologists, conservationists, and macro-ecologists. Also, the data presented here can be linked to the GenTree Dendroecological collection, the GenTree Leaf Trait collection, and the GenTree Genomic collection presented elsewhere, which together build the largest evolutionary forest ecology data collection available.
Between but not within species variation in the distribution of fitness effects
New mutations provide the raw material for evolution and adaptation. The distribution of fitness effects (DFE) describes the spectrum of effects of new mutations that can occur along a genome, and is therefore of vital interest in evolutionary biology. Recent work has uncovered striking similarities in the DFE between closely related species, prompting us to ask whether there is variation in the DFE among populations of the same species, or among species with different degrees of divergence, i.e., whether there is variation in the DFE at different levels of evolution. Using exome capture data from six tree species sampled across Europe, we characterised the DFE for multiple species, and for each species, multiple populations, and investigated the factors potentially influencing the DFE, such as demography, population divergence and genetic background. We find statistical support for there being variation in the DFE at the species level, even among relatively closely related species. However, we find very little difference at the population level, suggesting that differences in the DFE are primarily driven by deep features of species biology, and that evolutionarily recent events, such as demographic changes and local adaptation, have little impact.