38 research outputs found

    Utilization of Tissue Ploidy Level Variation in de Novo Transcriptome Assembly of Pinus sylvestris

    Compared to angiosperms, gymnosperms lag behind in the availability of assembled and annotated genomes. Most genomic analyses in gymnosperms, especially conifer tree species, rely on the use of de novo assembled transcriptomes. However, the level of allelic redundancy and transcript fragmentation in these assembled transcriptomes, and their effect on downstream applications, have not been fully investigated. Here, we assessed three assembly strategies for short-read data, including the utility of haploid megagametophyte tissue during de novo assembly as single-allele guides, for six individuals and five different tissues in Pinus sylvestris. We then contrasted haploid and diploid tissue genotype calls obtained from the assembled transcriptomes to evaluate the extent of paralog mapping. The use of the haploid tissue during assembly increased its completeness without reducing the number of assembled transcripts. Our results suggest that current strategies that rely on available genomic resources as guidance to minimize allelic redundancy are less effective than the application of strategies that cluster redundant assembled transcripts. The strategy yielding the lowest levels of allelic redundancy among the assembled transcriptomes assessed here was the generation of SuperTranscripts with Lace followed by CD-HIT clustering. However, we still observed some levels of heterozygosity (multiple gene fragments per transcript reflecting allelic redundancy) in this assembled transcriptome on the haploid tissue, indicating that further filtering is required before using these assemblies for downstream applications. We discuss the influence of allelic redundancy when these reference transcriptomes are used to select regions for probe design of exome capture baits and for estimation of population genetic diversity. Peer reviewed

    Taming the massive genome of Scots pine with PiSy50k, a new genotyping array for conifer research

    Pinus sylvestris (Scots pine) is the most widespread coniferous tree in the boreal forests of Eurasia, with major economic and ecological importance. However, its large and repetitive genome presents a challenge for conducting genome-wide analyses such as association studies, genetic mapping and genomic selection. We present a new 50K single-nucleotide polymorphism (SNP) genotyping array for Scots pine research, breeding and other applications. To select the SNP set, we first genotyped 480 Scots pine samples on a 407 540 SNP screening array and identified 47 712 high-quality SNPs for the final array (called 'PiSy50k'). Here, we provide details of the design and testing, as well as allele frequency estimates from the discovery panel, functional annotation, tissue-specific expression patterns and expression level information for the SNPs or corresponding genes, when available. We validated the performance of the PiSy50k array using samples from Finland and Scotland. Overall, 39 678 (83.2%) SNPs showed low error rates (mean = 0.9%). Relatedness estimates based on array genotypes were consistent with the expected pedigrees, and the level of Mendelian error was negligible. In addition, array genotypes successfully discriminate between Scots pine populations of Finnish and Scottish origins. The PiSy50k SNP array will be a valuable tool for a wide variety of future genetic studies and forestry applications. Peer reviewed

    The GenTree Dendroecological Collection, tree-ring and wood density data from seven tree species across Europe

    The dataset presented here was collected by the GenTree project (EU-Horizon 2020), which aims to improve the use of forest genetic resources across Europe by better understanding how trees adapt to their local environment. This dataset of individual tree-core characteristics, including ring-width series and whole-core wood density, was collected for seven ecologically and economically important European tree species: silver birch (Betula pendula), European beech (Fagus sylvatica), Norway spruce (Picea abies), European black poplar (Populus nigra), maritime pine (Pinus pinaster), Scots pine (Pinus sylvestris), and sessile oak (Quercus petraea). Tree-ring width measurements were obtained from 3600 trees in 142 populations and whole-core wood density was measured for 3098 trees in 125 populations. This dataset covers most of the geographical and climatic range occupied by the selected species. It should be highly valuable for assessing ecological and evolutionary responses to environmental conditions, as well as for model development and parameterization to predict adaptability under climate change scenarios.

    Germline variation at 8q24 and prostate cancer risk in men of European ancestry

    Chromosome 8q24 is a susceptibility locus for multiple cancers, including prostate cancer. Here we combine genetic data across the 8q24 susceptibility region from 71,535 prostate cancer cases and 52,935 controls of European ancestry to define the overall contribution of germline variation at 8q24 to prostate cancer risk. We identify 12 independent risk signals for prostate cancer (p < 4.28 × 10^−15), including three risk variants that have yet to be reported. From a polygenic risk score (PRS) model, derived to assess the cumulative effect of risk variants at 8q24, men in the top 1% of the PRS have a 4-fold (95% CI = 3.62–4.40) greater risk compared to the population average. These 12 variants account for ~25% of what can be currently explained of the familial risk of prostate cancer by known genetic risk factors. These findings highlight the overwhelming contribution of germline variation at 8q24 to prostate cancer risk, which has implications for population risk stratification.

    Fine-mapping of prostate cancer susceptibility loci in a large meta-analysis identifies candidate causal variants

    Prostate cancer is a polygenic disease with a large heritable component. A number of common, low-penetrance prostate cancer risk loci have been identified through GWAS. Here we apply the Bayesian multivariate variable selection algorithm JAM to fine-map 84 prostate cancer susceptibility loci, using summary data from a large European ancestry meta-analysis. We observe evidence for multiple independent signals at 12 regions and 99 risk signals overall. Only 15 original GWAS tag SNPs remain among the catalogue of candidate variants identified; the remainder are replaced by more likely candidates. Biological annotation of our credible set of variants indicates significant enrichment within promoter and enhancer elements, and transcription factor-binding sites, including AR, ERG and FOXA1. In 40 regions at least one variant is colocalised with an eQTL in prostate cancer tissue. The refined set of candidate variants substantially increases the proportion of familial relative risk explained by these known susceptibility regions, which highlights the importance of fine-mapping studies and has implications for clinical risk profiling. © 2018 The Author(s). Peer reviewed

    The GenTree Platform: growth traits and tree-level environmental data in 12 European forest tree species

    Background: Progress in the field of evolutionary forest ecology has been hampered by the huge challenge of phenotyping trees across their ranges in their natural environments, and the limitation in high-resolution environmental information. Findings: The GenTree Platform contains phenotypic and environmental data from 4,959 trees from 12 ecologically and economically important European forest tree species: Abies alba Mill. (silver fir), Betula pendula Roth. (silver birch), Fagus sylvatica L. (European beech), Picea abies (L.) H. Karst (Norway spruce), Pinus cembra L. (Swiss stone pine), Pinus halepensis Mill. (Aleppo pine), Pinus nigra Arnold (European black pine), Pinus pinaster Aiton (maritime pine), Pinus sylvestris L. (Scots pine), Populus nigra L. (European black poplar), Taxus baccata L. (English yew), and Quercus petraea (Matt.) Liebl. (sessile oak). Phenotypic (height, diameter at breast height, crown size, bark thickness, biomass, straightness, forking, branch angle, fructification), regeneration, environmental in situ measurements (soil depth, vegetation cover, competition indices), and environmental modeling data extracted by using bilinear interpolation accounting for surrounding conditions of each tree (precipitation, temperature, insolation, drought indices) were obtained from trees in 194 sites covering the species’ geographic ranges and reflecting local environmental gradients. Conclusion: The GenTree Platform is a new resource for investigating ecological and evolutionary processes in forest trees. The coherent phenotyping and environmental characterization across 12 species in their European ranges allow for a wide range of analyses from forest ecologists, conservationists, and macro-ecologists. Also, the data presented here can be linked to the GenTree Dendroecological collection, the GenTree Leaf Trait collection, and the GenTree Genomic collection presented elsewhere, which together build the largest evolutionary forest ecology data collection available.

    Between but not within species variation in the distribution of fitness effects

    New mutations provide the raw material for evolution and adaptation. The distribution of fitness effects (DFE) describes the spectrum of effects of new mutations that can occur along a genome, and is therefore of vital interest in evolutionary biology. Recent work has uncovered striking similarities in the DFE between closely related species, prompting us to ask whether there is variation in the DFE among populations of the same species, or among species with different degrees of divergence, i.e., whether there is variation in the DFE at different levels of evolution. Using exome capture data from six tree species sampled across Europe we characterised the DFE for multiple species, and for each species, multiple populations, and investigated the factors potentially influencing the DFE, such as demography, population divergence and genetic background. We find statistical support for there being variation in the DFE at the species level, even among relatively closely related species. However, we find very little difference at the population level, suggesting that differences in the DFE are primarily driven by deep features of species biology, and that evolutionarily recent events, such as demographic changes and local adaptation, have little impact.

    Data from: Genetic heterogeneity underlying variation in a locally adaptive clinal trait in Pinus sylvestris revealed by a Bayesian multipopulation analysis

    Local adaptation is a common feature of plant and animal populations. Adaptive phenotypic traits are genetically differentiated along environmental gradients, but the genetic basis of such adaptation is still poorly known. Genetic association studies of local adaptation combine data over populations. Correcting for population structure in these studies can be problematic since both selection and neutral demographic events can create similar allele frequency differences between populations. Correcting for demography with traditional methods may lead to eliminating some true associations. We developed a new Bayesian approach for identifying the loci underlying an adaptive trait in a multipopulation situation in the presence of possible double confounding due to population stratification and adaptation. With this method we studied the genetic basis of timing of bud set, a surrogate trait for timing of yearly growth cessation that confers local adaptation to the populations of Scots pine (Pinus sylvestris). Population means of timing of bud set were highly correlated with latitude. Most effects at individual loci were small. Interestingly, we found genetic heterogeneity (that is, different sets of loci associated with the trait) between the northern and central European parts of the cline. We also found indications of stronger stabilizing selection toward the northern part of the range. The harsh northern conditions may impose greater selective pressure on timing of growth cessation, and the relative importance of different environmental cues used for tracking the seasons might differ depending on latitude of origin.

    Gibbs sampler

    Gibbs sampler in a C-language module implemented as an extension to R