86 research outputs found
The functional spectrum of low-frequency coding variation
Background
Rare coding variants constitute an important class of human genetic variation, but are underrepresented in current databases that are based on small population samples. Recent studies show that variants altering amino acid sequence and protein function are enriched at low variant allele frequency, 2 to 5%, but because of insufficient sample size it is not clear if the same trend holds for rare variants below 1% allele frequency.
Results
The 1000 Genomes Exon Pilot Project has collected deep-coverage exon-capture data in roughly 1,000 human genes, for nearly 700 samples. Although medical whole-exome projects are currently afoot, this is still the deepest reported sampling of a large number of human genes with next-generation technologies. According to the goals of the 1000 Genomes Project, we created effective informatics pipelines to process and analyze the data, and discovered 12,758 exonic SNPs, 70% of them novel, and 74% below 1% allele frequency in the seven population samples we examined. Our analysis confirms that coding variants below 1% allele frequency show increased population-specificity and are enriched for functional variants.
Conclusions
This study represents a large step toward detecting and interpreting low-frequency coding variation, clearly lays out technical steps for effective analysis of DNA capture data, and articulates functional and population properties of this important class of genetic variation.
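The allele-frequency thresholds discussed in this abstract (below 1%, and the 2 to 5% low-frequency range) can be illustrated with a minimal Python sketch. It is not the project's pipeline: the function names, the genotype encoding (0, 1, or 2 alternate alleles per diploid sample, `None` for a missing call), and the exact bin boundaries are assumptions chosen for illustration.

```python
from collections import Counter

def alt_allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes.

    Each genotype is the count of alternate alleles (0, 1, or 2);
    None marks a missing call and is skipped. (Assumed encoding.)
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes")
    # Each called diploid sample contributes two alleles.
    return sum(called) / (2 * len(called))

def frequency_class(af, rare_cutoff=0.01, low_cutoff=0.05):
    """Bin an allele frequency; the cutoffs are illustrative assumptions."""
    if af < rare_cutoff:
        return "rare"
    if af <= low_cutoff:
        return "low-frequency"
    return "common"

def summarize(variants):
    """Count variants per frequency class.

    `variants` maps a variant identifier to its list of genotypes.
    """
    return Counter(
        frequency_class(alt_allele_frequency(genotypes))
        for genotypes in variants.values()
    )

if __name__ == "__main__":
    # Hypothetical toy data: three variants typed in 100 samples each.
    toy = {
        "var_a": [0] * 99 + [1],        # 1 alternate allele in 200
        "var_b": [0] * 95 + [1] * 5,    # 5 alternate alleles in 200
        "var_c": [0] * 50 + [1] * 50,   # 50 alternate alleles in 200
    }
    print(summarize(toy))
```

With real data the genotypes would come from the called variant set rather than from hand-written lists, and frequencies would typically be computed per population sample before comparing across populations.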
Mapping Copy Number Variation by Population Scale Genome Sequencing
Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications. Most SVs (53%) were mapped to nucleotide resolution, which facilitated analysing their origin and functional impact. We examined numerous whole and partial gene deletions with a genotyping approach and observed a depletion of gene disruptions amongst high frequency deletions. Furthermore, we observed differences in the size spectra of SVs originating from distinct formation mechanisms, and constructed a map of SV hotspots formed by common mechanisms. Our analytical framework and SV map serve as a resource for sequencing-based association studies.
Multi-Ethnic Analysis of Lipid-Associated Loci: The NHLBI CARe Project
Background: Whereas it is well established that plasma lipid levels have substantial heritability within populations, it remains unclear how many of the genetic determinants reported in previous studies (largely performed in European American cohorts) are relevant in different ethnicities. Methodology/Principal Findings: We tested a set of 50,000 polymorphisms from 2,000 candidate genes and genetic loci from genome-wide association studies (GWAS) for association with low-density lipoprotein cholesterol (LDL-C), high-density lipoprotein cholesterol (HDL-C), and triglycerides (TG) in 25,000 European Americans and 9,000 African Americans in the National Heart, Lung, and Blood Institute (NHLBI) Candidate Gene Association Resource (CARe). We replicated associations for a number of genes in one or both ethnicities and identified a novel lipid-associated variant in a locus harboring ICAM1. We compared the architecture of genetic loci associated with lipids in both African Americans and European Americans and found that the same genes were relevant across ethnic groups but the specific associated variants at each gene often differed. Conclusions/Significance: We identify or provide further evidence for a number of genetic determinants of plasma lipid levels through population association studies. In many loci the determinants appear to differ substantially between African Americans and European Americans.
Patterns and rates of exonic de novo mutations in autism spectrum disorders
Autism spectrum disorders (ASD) are believed to have genetic and environmental origins, yet in only a modest fraction of individuals can specific causes be identified [1,2]. To identify further genetic risk factors, we assess the role of de novo mutations in ASD by sequencing the exomes of ASD cases and their parents (n = 175 trios). Fewer than half of the cases (46.3%) carry a missense or nonsense de novo variant and the overall rate of mutation is only modestly higher than the expected rate. In contrast, there is significantly enriched connectivity among the proteins encoded by genes harboring de novo missense or nonsense mutations, and excess connectivity to prior ASD genes of major effect, suggesting a subset of observed events are relevant to ASD risk. The small increase in rate of de novo events, when taken together with the connections among the proteins themselves and to ASD, is consistent with an important but limited role for de novo point mutations, similar to that documented for de novo copy number variants. Genetic models incorporating these data suggest that the majority of observed de novo events are unconnected to ASD; those that do confer risk are distributed across many genes and are incompletely penetrant (i.e., not necessarily causal). Our results support polygenic models in which spontaneous coding mutations in any of a large number of genes increase risk by 5- to 20-fold. Despite the challenge posed by such models, results from de novo events and a large parallel case-control study provide strong evidence in favor of CHD8 and KATNAL2 as genuine autism risk factors.
Analysis of Rare, Exonic Variation amongst Subjects with Autism Spectrum Disorders and Population Controls
We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers using different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure the distribution of rare variation was similar for data from different centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. Results were evaluated using seven samples sequenced at both centers and by results from the association study. Next we addressed how the data and/or results from the centers should be combined. Gene-based analysis of association was an obvious choice, but should statistics for association be combined across centers (meta-analysis) or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. Like other recent studies, we found evidence that population structure can confound case-control studies by the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD. © 2013 Liu et al.
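The filtering step named in this abstract (fraction of missing data, read depth, and balance of alternative to reference reads) might look roughly like the following Python sketch. The abstract gives no thresholds or data layout, so the `VariantCall` record, its field names, and every cutoff below are hypothetical and serve only to show the shape of such a filter.

```python
from dataclasses import dataclass

@dataclass
class VariantCall:
    """Per-variant summary statistics (hypothetical layout)."""
    variant_id: str
    missing_fraction: float  # fraction of samples without a genotype call
    mean_depth: float        # mean read depth over called samples
    alt_reads: int           # alternative-allele reads in heterozygous calls
    ref_reads: int           # reference-allele reads in heterozygous calls

def allele_balance(call):
    """Fraction of reads supporting the alternative allele."""
    total = call.alt_reads + call.ref_reads
    return call.alt_reads / total if total else 0.0

def passes_qc(call, max_missing=0.10, min_depth=10.0,
              balance_range=(0.3, 0.7)):
    """Apply the three filters; all cutoffs are illustrative assumptions."""
    low, high = balance_range
    return (
        call.missing_fraction <= max_missing
        and call.mean_depth >= min_depth
        and low <= allele_balance(call) <= high
    )

def filter_calls(calls, **cutoffs):
    """Keep only the calls that pass every filter."""
    return [call for call in calls if passes_qc(call, **cutoffs)]

if __name__ == "__main__":
    # Hypothetical toy calls: one clean, one with skewed allele balance.
    calls = [
        VariantCall("var_1", 0.02, 35.0, alt_reads=48, ref_reads=52),
        VariantCall("var_2", 0.01, 40.0, alt_reads=5, ref_reads=95),
    ]
    print([call.variant_id for call in filter_calls(calls)])
```

Applying the same cutoffs to data from both centers is one way to check that the resulting distributions of rare variation are comparable, which is the goal the abstract describes.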
RANTES/CCL5 and Risk for Coronary Events: Results from the MONICA/KORA Augsburg Case-Cohort, Athero-Express and CARDIoGRAM Studies
BACKGROUND: The chemokine RANTES (regulated on activation, normal T-cell expressed and secreted)/CCL5 is involved in the pathogenesis of cardiovascular disease in mice, whereas less is known in humans. We hypothesised that its relevance for atherosclerosis should be reflected by associations between CCL5 gene variants, RANTES serum concentrations and protein levels in atherosclerotic plaques and risk for coronary events.
METHODS AND FINDINGS: We conducted a case-cohort study within the population-based MONICA/KORA Augsburg studies. Baseline RANTES serum levels were measured in 363 individuals with incident coronary events and 1,908 non-cases (mean follow-up: 10.2±4.8 years). Cox proportional hazard models adjusting for age, sex, body mass index, metabolic factors and lifestyle factors revealed no significant association between RANTES and incident coronary events (HR [95% CI] for increasing RANTES tertiles 1.0, 1.03 [0.75-1.42] and 1.11 [0.81-1.54]). None of six CCL5 single nucleotide polymorphisms and no common haplotype showed significant associations with coronary events. Also in the CARDIoGRAM study (>22,000 cases, >60,000 controls), none of these CCL5 SNPs was significantly associated with coronary artery disease. In the prospective Athero-Express biobank study, RANTES plaque levels were measured in 606 atherosclerotic lesions from patients who underwent carotid endarterectomy. RANTES content in atherosclerotic plaques was positively associated with macrophage infiltration and inversely associated with plaque calcification. However, there was no significant association between RANTES content in plaques and risk for coronary events (mean follow-up 2.8±0.8 years).
CONCLUSIONS: High RANTES plaque levels were associated with an unstable plaque phenotype. However, the absence of associations between (i) RANTES serum levels, (ii) CCL5 genotypes and (iii) RANTES content in carotid plaques and either coronary artery disease or incident coronary events in our cohorts suggests that RANTES may not be a novel coronary risk biomarker. Even so, the potential relevance of RANTES levels in platelet-poor plasma needs to be investigated in further studies.
A polygenic burden of rare disruptive mutations in schizophrenia
By analyzing the exome sequences of 2,536 schizophrenia cases and 2,543 controls, we have demonstrated a polygenic burden primarily arising from rare (<1/10,000), disruptive mutations distributed across many genes. Especially enriched genesets included the voltage-gated calcium ion channel and the signaling complex formed by the activity-regulated cytoskeleton-associated (ARC) scaffold protein of the postsynaptic density (PSD), sets previously implicated by genome-wide association studies (GWAS) and copy-number variation (CNV) studies. Similar to reports in autism, targets of the fragile X mental retardation protein (FMRP, product of FMR1) were enriched for case mutations. No individual gene-based test achieved significance after correction for multiple testing and we did not detect any alleles of moderately low frequency (~0.5-1%) and moderately large effect. Taken together, these data suggest that population-based exome sequencing can discover risk alleles and complements established gene mapping paradigms in neuropsychiatric disease.