59 research outputs found
A framework for the detection of de novo mutations in family-based sequencing data
Germline mutation detection from human DNA sequence data is challenging due to the rarity of such events relative to the intrinsic error rates of sequencing technologies and the uneven coverage across the genome. We developed PhaseByTransmission (PBT) to identify de novo single nucleotide variants and short insertions and deletions (indels) from sequence data collected in parent-offspring trios. We compute the joint probability of the data given the genotype likelihoods in the individual family members, the known familial relationships and a prior probability for the mutation rate. Candidate de novo mutations (DNMs) are reported along with their posterior probability, providing a systematic way to prioritize them for validation. Our tool is integrated in the Genome Analysis Toolkit and can be used together with the ReadBackedPhasing module to infer the parental origin of DNMs based on phase-informative reads. Using simulated data, we show that PBT outperforms existing tools, especially in low coverage data and on the X chromosome. We further show that PBT displays high validation rates on empirical parent-offspring sequencing data for whole-exome data from 104 trios and X-chromosome data from 249 parent-offspring families. Finally, we demonstrate an association between father's age at conception and the number of DNMs in female offspring's X chromosome, consistent with previous literature reports
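The joint-probability idea described in this abstract can be illustrated with a minimal sketch. It is not the PBT implementation: it assumes a biallelic autosomal site, genotypes coded as alternate-allele counts (0, 1, 2), a uniform prior over parental genotypes, and a per-allele mutation probability `mu`. All function names and default values are hypothetical.

```python
from itertools import product

def gamete_alt_prob(genotype, mu):
    """Probability that a parent with `genotype` alt alleles transmits an alt allele,
    allowing the transmitted allele to mutate with probability mu (assumed model)."""
    p = genotype / 2.0
    return p * (1.0 - mu) + (1.0 - p) * mu

def child_probs(g_mother, g_father, mu):
    """P(child genotype = 0, 1, 2 | parental genotypes) under the assumed model."""
    a = gamete_alt_prob(g_mother, mu)
    b = gamete_alt_prob(g_father, mu)
    return [(1 - a) * (1 - b), a * (1 - b) + (1 - a) * b, a * b]

def de_novo_posterior(lik_mother, lik_father, lik_child, mu=1e-8,
                      parent_prior=(1 / 3, 1 / 3, 1 / 3)):
    """Posterior probability that the trio genotype configuration is a Mendelian
    violation, i.e. impossible without a mutation. Each lik_* holds P(reads | genotype)
    for genotypes 0, 1, 2. The uniform parent_prior is an illustrative assumption."""
    total = 0.0
    de_novo = 0.0
    for gm, gf, gc in product(range(3), repeat=3):
        joint = (parent_prior[gm] * parent_prior[gf]
                 * child_probs(gm, gf, mu)[gc]
                 * lik_mother[gm] * lik_father[gf] * lik_child[gc])
        total += joint
        # With mu = 0 the transmission model is purely Mendelian.
        if child_probs(gm, gf, 0.0)[gc] == 0.0:
            de_novo += joint
    return de_novo / total

# Confident hom-ref parents and a confident het child: high posterior.
strong = de_novo_posterior((1, 1e-12, 1e-24), (1, 1e-12, 1e-24), (1e-12, 1, 1e-12))
# Weakly supported genotypes: a parental het or a child hom-ref call explains the data better.
weak = de_novo_posterior((1, 1e-3, 1e-6), (1, 1e-3, 1e-6), (1e-3, 1, 1e-3))
```

Ranking candidates by such a posterior is what allows them to be prioritized for validation, as the abstract describes.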
Modeling the cumulative genetic risk for multiple sclerosis from genome-wide association data
BACKGROUND:
Multiple sclerosis (MS) is the most common cause of chronic neurologic disability beginning in early to middle adult life. Results from recent genome-wide association studies (GWAS) have substantially lengthened the list of disease loci and provide convincing evidence supporting a multifactorial and polygenic model of inheritance. Nevertheless, the knowledge of MS genetics remains incomplete, with many risk alleles still to be revealed.
METHODS:
We used a discovery GWAS dataset (8,844 samples, 2,124 cases and 6,720 controls) and a multi-step logistic regression protocol to identify novel genetic associations. The emerging genetic profile included 350 independent markers and was used to calculate and estimate the cumulative genetic risk in an independent validation dataset (3,606 samples). Analysis of covariance (ANCOVA) was implemented to compare clinical characteristics of individuals with various degrees of genetic risk. Gene ontology and pathway enrichment analysis was done using the DAVID functional annotation tool, the GO Tree Machine, and the Pathway-Express profiling tool.
RESULTS:
In the discovery dataset, the median cumulative genetic risk (P-Hat) was 0.903 and 0.007 in the case and control groups, respectively, together with 79.9% classification sensitivity and 95.8% specificity. The identified profile shows a significant enrichment of genes involved in the immune response, cell adhesion, cell communication/signaling, nervous system development, and neuronal signaling, including ionotropic glutamate receptors, which have been implicated in the pathological mechanism driving neurodegeneration. In the validation dataset, the median cumulative genetic risk was 0.59 and 0.32 in the case and control groups, respectively, with classification sensitivity 62.3% and specificity 75.9%. No differences in disease progression or T2-lesion volumes were observed among four levels of predicted genetic risk groups (high, medium, low, misclassified). On the other hand, a significant difference (F = 2.75, P = 0.04) was detected for age of disease onset between the affected misclassified as controls (mean = 36 years) and the other three groups (high, 33.5 years; medium, 33.4 years; low, 33.1 years).
CONCLUSIONS:
The results are consistent with the polygenic model of inheritance. The cumulative genetic risk established using currently available genome-wide association data provides important insights into disease heterogeneity and completeness of current knowledge in MS genetics
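As a rough illustration of how a cumulative risk value such as P-Hat and the reported sensitivity and specificity relate, the sketch below computes a logistic-model probability from genotypes and then classifies individuals at a cutoff. The 0.5 threshold, the function names, and the toy numbers are assumptions; the abstract does not state the cutoff or the fitted weights.

```python
import math

def p_hat(genotypes, weights, intercept):
    """Predicted probability of being a case from a logistic model
    (hypothetical helper; weights would come from the fitted regression)."""
    score = intercept + sum(w * g for w, g in zip(weights, genotypes))
    return 1.0 / (1.0 + math.exp(-score))

def sensitivity_specificity(p_hats, is_case, threshold=0.5):
    """Classify as 'case' when p_hat >= threshold (assumed cutoff) and
    return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for p, case in zip(p_hats, is_case):
        predicted_case = p >= threshold
        if case and predicted_case:
            tp += 1
        elif case:
            fn += 1
        elif predicted_case:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one case and one control")
    return tp / (tp + fn), tn / (tn + fp)

# Toy data, not from the study: three cases followed by three controls.
toy_p = [0.9, 0.8, 0.3, 0.1, 0.6, 0.2]
toy_labels = [True, True, True, False, False, False]
sens, spec = sensitivity_specificity(toy_p, toy_labels)
```

Evaluating the same fixed model on an independent dataset, as the study did, is what exposes the drop from discovery to validation performance.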
A replication study of genetic risk loci for ischemic stroke in a Dutch population: A case-control study
We aimed to replicate reported associations of 10 SNPs at eight distinct loci with overall ischemic stroke (IS) and its subtypes in an independent cohort of Dutch IS patients. We included 1,375 IS patients enrolled in a prospective multicenter hospital-based cohort in the Netherlands, and 1,533 population-level controls of Dutch descent. We tested these SNPs for association with overall IS and its subtypes (large artery atherosclerosis, small vessel disease and cardioembolic stroke (CE), as classified by TOAST) using an additive multivariable logistic regression model, adjusting for age and sex. We obtained odds ratios (OR) with 95% confidence intervals (95% CI) for the risk allele of each SNP analyzed and exact p-values by permutation. We confirmed the association at 4q25 (PITX2) (OR 1.43; 95% CI, 1.13-1.81, p = 0.029) and 16q22 (ZFHX3) (OR 1.62; 95% CI, 1.26-2.07, p = 0.001) as risk loci for CE. Locus 16q22 was also associated with overall IS (OR 1.24; 95% CI, 1.08-1.42, p = 0.016). Other loci previously associated with IS and/or its subtypes were not confirmed. In conclusion, we validated two loci (4q25, 16q22) associated with CE. In addition, our study suggests that the association of locus 16q22 may not be limited to CE but may extend to overall IS
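The odds ratios and confidence intervals quoted above follow the usual relationship between a logistic regression coefficient and its standard error. The sketch below assumes a Wald-type interval on the log-odds scale, which the abstract does not explicitly confirm, and uses hypothetical helper names. It backs out the implied standard error from a reported interval and rebuilds the interval from it.

```python
import math

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval

def odds_ratio_ci(beta, se, z=Z_95):
    """Return (OR, lower, upper) for a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

def se_from_ci(lower, upper, z=Z_95):
    """Standard error implied by a Wald interval reported on the OR scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)

# Values reported in the abstract for 4q25 (PITX2) and cardioembolic stroke.
beta_pitx2 = math.log(1.43)
se_pitx2 = se_from_ci(1.13, 1.81)
or_pitx2, lo_pitx2, hi_pitx2 = odds_ratio_ci(beta_pitx2, se_pitx2)
```

The reconstructed bounds land close to the published 1.13 and 1.81, which is consistent with a symmetric interval on the log scale.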
Improved imputation quality of low-frequency and rare variants in European samples using the 'Genome of the Netherlands'
Although genome-wide association studies (GWAS) have identified many common variants associated with complex traits, low-frequency and rare variants have not been interrogated in a comprehensive manner. Imputation from dense reference panels, such as the 1000 Genomes Project (1000G), enables testing of ungenotyped variants for association. Here we present the results of imputation using a large, new population-specific panel: the Genome of The Netherlands (GoNL). We benchmarked the performance of the 1000G and GoNL reference sets by comparing imputed genotypes with 'true' genotypes typed on ImmunoChip in three European populations (Dutch, British, and Italian). GoNL showed significant improvement in the imputation quality for rare variants (MAF 0.05-0.5%) compared with 1000G. In Dutch samples, the mean observed Pearson correlation, r², increased from 0.61 to 0.71. W
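The imputation-quality metric mentioned in this abstract, the squared Pearson correlation between imputed and directly typed genotypes, can be computed per variant as in the following sketch. The function name and the toy dosages are illustrative assumptions, not data from the study.

```python
def imputation_r2(dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages (0..2, possibly fractional)
    and directly genotyped alt-allele counts (0, 1, 2) at a single variant."""
    n = len(dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(dosages) / n
    mean_y = sum(true_genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, true_genotypes))
    if sxx == 0 or syy == 0:
        # Monomorphic in the sample: the correlation is undefined.
        raise ValueError("zero variance; r2 is undefined for this variant")
    return (sxy * sxy) / (sxx * syy)

# Toy example: imputed dosages that track the typed genotypes closely.
toy_r2 = imputation_r2([0.1, 0.9, 1.8, 0.2], [0, 1, 2, 0])
```

Averaging this quantity over variants within a frequency bin would give a summary comparable in form to the mean r² values quoted for the two reference panels.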
No additional prognostic value of genetic information in the prediction of vascular events after cerebral ischemia of arterial origin
Background: Patients who have suffered from cerebral ischemia have a high risk of recurrent vascular events. Predictive models based on classical risk factors typically have limited prognostic value. Given that cerebral ischemia has a heritable component, genetic information might improve performance of these risk models. Our aim was to develop and compare two models: one containing traditional vascular risk factors, the other also including genetic information. Methods and Results: We studied 1020 patients with cerebral ischemia and genotyped them with the Illumina Immunochip. Median follow-up time was 6.5 years; the annual incidence of new ischemic events (primary outcome, n=198) was 3.0%. The prognostic model based on classical vascular risk factors had an area under the receiver operating characteristics curve (AUC-ROC) of 0.65 (95% confidence interval 0.61-0.69). When we added a genetic risk score based on prioritized SNPs from a genome-wide association study of ischemic stroke (using summary statistics from the METASTROKE study which included 12389 cases and 62004 controls), the AUC-ROC remained the same. Similar results were found for the secondary outcome ischemic stroke. Conclusions: We found no additional value of genetic information in a prognostic model for the risk of ischemic events in patients with cerebral ischemia of arterial origin. This is consistent with a complex, polygenic architecture, where many genes of weak effect likely act in concert to influence the heritable risk of an individual to develop (recurrent) vascular events. At present, genetic information cannot help clinicians to distinguish patients at high risk for recurrent vascular events
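The AUC-ROC used to compare the two prognostic models can be read as the probability that a randomly chosen patient with an event receives a higher predicted risk than a randomly chosen patient without one. The sketch below computes it by direct pairwise comparison. The function name and toy scores are assumptions, and the study itself may have used a different estimator.

```python
def auc_roc(scores, had_event):
    """AUC as the fraction of (event, non-event) pairs ranked correctly,
    counting ties as half. Quadratic in sample size, fine for illustration."""
    positives = [s for s, e in zip(scores, had_event) if e]
    negatives = [s for s, e in zip(scores, had_event) if not e]
    if not positives or not negatives:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Toy risk scores: two patients without an event, then two with an event.
toy_auc = auc_roc([0.1, 0.4, 0.35, 0.8], [False, False, True, True])
```

Under this reading, an unchanged AUC after adding the genetic risk score means the added term did not improve how patients were ranked by risk.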
Twenty-eight genetic loci associated with ST-T-wave amplitudes of the electrocardiogram
The ST-segment and adjacent T-wave (ST-T wave) amplitudes of the electrocardiogram are quantitative characteristics of cardiac repolarization. Repolarization abnormalities have been linked to ventricular arrhythmias and sudden cardiac death. We performed the first genome-wide association meta-analysis of ST-T-wave amplitudes in up to 37 977 individuals identifying 71 robust genotype-phenotype associations clustered within 28 independent loci. Fifty-four genes were prioritized as candidates underlying the phenotypes, including genes with established roles in the cardiac repolarization phase (SCN5A/SCN10A, KCND3, KCNB1, NOS1AP and HEY2) and others with as yet undefined cardiac function. These associations may provide insights into the spatiotemporal contribution of genetic variation influencing cardiac repolarization and provide novel leads for future functional follow-up
The Genome of the Netherlands: Design, and project goals
Within the Netherlands a national network of biobanks has been established (Biobanking and Biomolecular Research Infrastructure-Netherlands (BBMRI-NL)) as a national node of the European BBMRI. One of the aims of BBMRI-NL is to enrich biobanks with different types of molecular and phenotype data. Here, we describe the Genome of the Netherlands (GoNL), one of the projects within BBMRI-NL. GoNL is a whole-genome-sequencing project in a representative sample consisting of 250 trio-families from all provinces in the Netherlands, which aims to characterize DNA sequence variation in the Dutch population. The parent-offspring trios include adult individuals ranging in age from 19 to 87 years (mean=53 years; SD=16 years) from birth cohorts 1910-1994. Sequencing was done on blood-derived DNA from uncultured cells and accomplished coverage was 14-15x. The family-based design represents a unique resource to assess the frequency of regional variants, accurately reconstruct haplotypes by family-based phasing, characterize short indels and complex structural variants, and establish the rate of de novo mutational events. GoNL will also serve as a reference panel for imputation in the available genome-wide association studies in Dutch and other cohorts to refine association signals and uncover population-specific variants. GoNL will create a catalog of human genetic variation in this sample that is uniquely characterized with respect to micro-geographic location and a wide range of phenotypes. The resource will be made available to the research and medical community to guide the interpretation of sequencing projects. The present paper summarizes the global characteristics of the project
Meta-analysis in more than 17,900 cases of ischemic stroke reveals a novel association at 12q24.12
Results: In an overall analysis of 17,970 cases of ischemic stroke and 70,764 controls, we identified a novel association on chromosome 12q24 (rs10744777, odds ratio [OR] 1.10 [1.07-1.13], p = 7.12 × 10^-11) with ischemic stroke. The association was with all ischemic stroke rather than an individual stroke subtype, with similar effect sizes seen in different stroke subtypes. There was no association with intracerebral hemorrhage (OR 1.03 [0.90-1.17], p = 0.695). Conclusion: Our results show, for the first time, a genetic risk locus associated with ischemic stroke as a whole, rather than in a subtype-specific manner. This finding was not associated with intracerebral hemorrhage. Methods: Using the Immunochip, we genotyped 3,420 ischemic stroke cases and 6,821 controls. After imputation we meta-analyzed the results with imputed GWAS data from 3,548 case
A high-quality human reference panel reveals the complexity and distribution of genomic structural variants
Structural variation (SV) represents a major source of differences between individual human genomes and has been linked to disease phenotypes. However, the majority of studies provide neither a global view of the full spectrum of these variants nor integrate them into reference panels of genetic variation. Here, we analyse whole genome sequencing data of 769 individuals from 250 Dutch families, and provide a haplotype-resolved map of 1.9 million genome variants across 9 different variant classes, including novel forms of complex indels, and retrotransposition-mediated insertions of mobile elements and processed RNAs. A large proportion are previously underreported variants sized between 21 and 100 bp. We detect 4 megabases of novel sequence, encoding 11 new transcripts. Finally, we show 191 known, trait-associated SNPs to be in strong linkage disequilibrium with SVs and demonstrate that our panel facilitates accurate imputation of SVs in unrelated individuals
Characteristics of de novo structural changes in the human genome
Small insertions and deletions (indels) and large structural variations (SVs) are major contributors to human genetic diversity and disease. However, mutation rates and characteristics of de novo indels and SVs in the general population have remained largely unexplored. We report 332 validated de novo structural changes identified in whole genomes of 250 families, including complex indels, retrotransposon insertions, and interchromosomal events. These data indicate a mutation rate of 2.94 indels (1-20 bp) and 0.16 SVs (>20 bp) per generation. De novo structural changes affect on average 4.1 kbp of genomic sequence and 29 coding bases per generation, which is 91 and 52 times more nucleotides than de novo substitutions, respectively. This contrasts with the equal genomic footprint of inherited SVs and substitutions. An excess of structural changes originated on paternal haplotypes. Additionally, we observed a nonuniform distribution of de novo SVs across offspring. These results reveal the importance of different mutational mechanisms to changes in human genome structure across generations