49 research outputs found

    SparkBeagle: Scalable Genotype Imputation from Distributed Whole-Genome Reference Panels in the Cloud

    Massive whole-genome genotype reference panels now provide accurate and fast genotyping by imputation for high-resolution genome-wide association (GWA) studies. Imputation-assisted genotyping can increase the genomic coverage of genotypes and thus satisfy the resolution required in comprehensive GWA studies in a cost-effective manner. However, the imputation of missing genotypes from large reference panels is a compute-intensive process that requires high-performance computing (HPC). Although HPC uses extremely distributed and parallel computing, current imputation tools and existing algorithms have not been developed to fully exploit the power of distributed computing. To this end, we have developed SparkBeagle, a scalable, fast, and accurate distributed genotype imputation tool based on the popular Beagle software. SparkBeagle is designed for HPC and cloud computing environments and is implemented on top of the Apache Spark distributed computing framework. We have carried out scalability experiments by imputing 64,976,316 variants of 2504 samples from the 1000 Genomes reference panel in the cloud. SparkBeagle shows near-linear scalability as the number of computing nodes increases. A speedup of 30x was achieved with 40 nodes. The imputation time of the whole data set decreased from 565 minutes in single-node parallel execution to 18 minutes. Near identical imputation accuracy was measured in the concordance analysis between the original Beagle and the distributed SparkBeagle tool. Peer reviewed
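The scaling figures quoted in the SparkBeagle abstract can be checked with a little arithmetic. The sketch below is only illustrative: the function names are hypothetical, the inputs are the rounded run times from the abstract (565 and 18 minutes, 40 nodes), and parallel efficiency is a derived metric that the abstract itself does not report.

```python
def speedup(baseline_minutes: float, distributed_minutes: float) -> float:
    """Ratio of the single-node run time to the distributed run time."""
    return baseline_minutes / distributed_minutes


def parallel_efficiency(speedup_factor: float, nodes: int) -> float:
    """Speedup per node; 1.0 would be perfectly linear scaling."""
    return speedup_factor / nodes


# Rounded figures taken from the abstract.
s = speedup(565, 18)
e = parallel_efficiency(s, 40)
print(f"speedup ~ {s:.1f}x, efficiency ~ {e:.0%}")
# prints "speedup ~ 31.4x, efficiency ~ 78%"
```

The rounded run times give roughly 31x, in line with the reported speedup of about 30x on 40 nodes.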

    Primary age-related tauopathy in a Finnish population-based study of the oldest old (Vantaa 85+)

    Aims: Few studies have investigated primary age-related tauopathy (PART) in a population-based setting. Here, we assessed its prevalence, genetic background, comorbidities and features of cognitive decline in an unselected elderly population. Methods: The population-based Vantaa 85+ study includes all 601 inhabitants of Vantaa aged ≥ 85 years in 1991. Neuropathological assessment was possible in 301. Dementia (DSM IIIR criteria) and Mini-Mental State Examination (MMSE) scores were assessed at the baseline of the study and follow-ups. PART subjects were identified according to the criteria by Crary et al. and were compared with subjects with mild and severe Alzheimer's disease (AD) neuropathological changes. The effects of other neuropathologies were taken into account using multivariate and sensitivity assays. Genetic analyses included APOE genotypes and 29 polymorphisms of the MAPT 3′ untranslated region (3′UTR). Results: The frequency of PART was 20% (n = 61/301; definite PART 5%). When PART subjects were compared with those with severe AD pathology, dementia was less common, its age at onset was higher and its duration shorter. No such differences were seen when compared with those with milder AD pathology. However, both AD groups showed a steeper decline in MMSE scores in follow-ups compared with PART. APOE ε4 frequency was lower, and APOE ε2 frequency higher, in the PART group compared with each AD group. The detected nominally significant associations between PART and two MAPT 3′UTR polymorphisms and haplotypes did not survive Bonferroni correction. Conclusions: PART is common among the very elderly. PART subjects differ from individuals with AD-type changes in the pattern of cognitive decline and in associated genetic and neuropathological features. Peer reviewed

    High denitrification potential but low nitrous oxide emission in a constructed wetland treating nitrate-polluted agricultural run-off

    Acknowledgements: The study was conducted within the framework of several scientific projects: “Efficacité des Zones Tampons” by OFB (French Office for Biodiversity, and technical group “Zones Tampons”), and HydroGES (financed by the Agency for the Environment and Mastery of Energy, ADEME). Travel was supported by two French–Estonian Parrot RTD projects, “Ecological engineering for nutrient control in rural catchments” and “Process-based approach and enhanced technologies of treatment wetlands” (2014–2016). The PIREN-Seine programme and the Fédération Ile-de-France de Recherche pour l'Environnement (FIRE) are also acknowledged for their support. The authors also thank the AQUI'Brie association for their support and stakeholders' involvement. This study was also supported by the Estonian Research Council (grants IUT2 16, PRG352 and MOBERC20) and by the EU through the European Regional Development Fund (Centres of Excellence ENVIRON and EcolChange, and MOBTP101 returning researcher grant by the Mobilitas Pluss programme). Peer reviewed; Postprint

    The role of polygenic risk and susceptibility genes in breast cancer over the course of life

    Polygenic risk scores (PRS) for breast cancer have potential to improve risk prediction, but there is limited information on their utility in various clinical situations. Here we show that among 122,978 women in the FinnGen study with 8401 breast cancer cases, the PRS modifies the breast cancer risk of two high-impact frameshift risk variants. Similarly, we show that after the breast cancer diagnosis, individuals with elevated PRS have an elevated risk of developing contralateral breast cancer, and that the PRS can considerably improve risk assessment among their female first-degree relatives. In more detail, women with the c.1592delT variant in PALB2 (242-fold enrichment in Finland, 336 carriers) and an average PRS (10th-90th percentile) have a lifetime risk of breast cancer of 55% (95% CI 49-61%), which increases to 84% (71-97%) with a high PRS (>90th percentile), and decreases to 49% (30-68%) with a low PRS (… Peer reviewed

    Comprehensive population-based genome sequencing provides insight into hematopoietic regulatory mechanisms

    Genetic variants affecting hematopoiesis can influence commonly measured blood cell traits. To identify factors that affect hematopoiesis, we performed association studies for blood cell traits in the population-based Estonian Biobank using high-coverage whole-genome sequencing (WGS) in 2,284 samples and SNP genotyping in an additional 14,904 samples. Using up to 7,134 samples with available phenotype data, our analyses identified 17 associations across 14 blood cell traits. Integration of WGS-based fine-mapping and complementary epigenomic datasets provided evidence for causal mechanisms at several loci, including at a previously undiscovered basophil count-associated locus near the master hematopoietic transcription factor CEBPA. The fine-mapped variant at this basophil count association near CEBPA overlapped an enhancer active in common myeloid progenitors and influenced its activity. In situ perturbation of this enhancer by CRISPR/Cas9 mutagenesis in hematopoietic stem and progenitor cells demonstrated that it is necessary for and specifically regulates CEBPA expression during basophil differentiation. We additionally identified basophil count-associated variation at another more pleiotropic myeloid enhancer near GATA2, highlighting regulatory mechanisms for ordered expression of master hematopoietic regulators during lineage specification. Our study illustrates how population-based genetic studies can provide key insights into poorly understood cell differentiation processes of considerable physiologic relevance. Peer reviewed

    FinnGen provides genetic insights from a well-phenotyped isolated population

    Population isolates such as those in Finland benefit genetic research because deleterious alleles are often concentrated on a small number of low-frequency variants (0.1% ≤ minor allele frequency < 5%). These variants survived the founding bottleneck rather than being distributed over a large number of ultrarare variants. Although this effect is well established in Mendelian genetics, its value in common disease genetics is less explored1,2. FinnGen aims to study the genome and national health register data of 500,000 Finnish individuals. Given the relatively high median age of participants (63 years) and the substantial fraction of hospital-based recruitment, FinnGen is enriched for disease end points. Here we analyse data from 224,737 participants from FinnGen and study 15 diseases that have previously been investigated in large genome-wide association studies (GWASs). We also include meta-analyses of biobank data from Estonia and the United Kingdom. We identified 30 new associations, primarily low-frequency variants, enriched in the Finnish population. A GWAS of 1,932 diseases also identified 2,733 genome-wide significant associations (893 phenome-wide significant (PWS), P < 2.6 × 10⁻¹¹) at 2,496 (771 PWS) independent loci with 807 (247 PWS) end points. Among these, fine-mapping implicated 148 (73 PWS) coding variants associated with 83 (42 PWS) end points. Moreover, 91 (47 PWS) had an allele frequency of <5% in non-Finnish European individuals, of which 62 (32 PWS) were enriched by more than twofold in Finland. These findings demonstrate the power of bottlenecked populations to find entry points into the biology of common diseases through low-frequency, high-impact variants. Published version; Peer reviewed

    Cerebral small vessel disease genomics and its implications across the lifespan

    White matter hyperintensities (WMH) are the most common brain-imaging feature of cerebral small vessel disease (SVD), hypertension being the main known risk factor. Here, we identify 27 genome-wide loci for WMH-volume in a cohort of 50,970 older individuals, accounting for modification/confounding by hypertension. Aggregated WMH risk variants were associated with altered white matter integrity (p = 2.5 × 10⁻⁷) in brain images from 1,738 young healthy adults, providing insight into the lifetime impact of SVD genetic risk. Mendelian randomization suggested causal association of increasing WMH-volume with stroke, Alzheimer-type dementia, and of increasing blood pressure (BP) with larger WMH-volume, notably also in persons without clinical hypertension. Transcriptome-wide colocalization analyses showed association of WMH-volume with expression of 39 genes, of which four encode known drug targets. Finally, we provide insight into BP-independent biological pathways underlying SVD and suggest potential for genetic stratification of high-risk individuals and for genetically-informed prioritization of drug targets for prevention trials. Peer reviewed

    Meta-analysis of 375,000 individuals identifies 38 susceptibility loci for migraine

    Migraine is a debilitating neurological disorder affecting around one in seven people worldwide, but its molecular mechanisms remain poorly understood. There is some debate about whether migraine is a disease of vascular dysfunction or a result of neuronal dysfunction with secondary vascular changes. Genome-wide association (GWA) studies have thus far identified 13 independent loci associated with migraine. To identify new susceptibility loci, we carried out a genetic study of migraine on 59,674 affected subjects and 316,078 controls from 22 GWA studies. We identified 44 independent single-nucleotide polymorphisms (SNPs) significantly associated with migraine risk (P < 5 × 10⁻⁸) that mapped to 38 distinct genomic loci, including 28 loci not previously reported and a locus that to our knowledge is the first to be identified on chromosome X. In subsequent computational analyses, the identified loci showed enrichment for genes expressed in vascular and smooth muscle tissues, consistent with a predominant theory of migraine that highlights vascular etiologies.