Population-based rare variant detection via pooled exome or custom hybridization capture with or without individual indexing
BACKGROUND: Rare genetic variation in the human population is a major source of pathophysiological variability and has been implicated in a host of complex phenotypes and diseases. Finding disease-related genes harboring disparate functional rare variants requires sequencing of many individuals across many genomic regions and comparing against unaffected cohorts. However, despite persistent declines in sequencing costs, population-based rare variant detection across large genomic target regions remains cost prohibitive for most investigators. In addition, DNA samples are often precious and hybridization methods typically require large amounts of input DNA. Pooled sample DNA sequencing is a cost- and time-efficient strategy for surveying populations of individuals for rare variants. We set out to 1) create a scalable, multiplexing method for custom capture, with or without individual DNA indexing, that was amenable to low amounts of input DNA and 2) expand the functionality of the SPLINTER algorithm for calling substitutions, insertions and deletions across either candidate genes or the entire exome by integrating the variant calling algorithm with the dynamic programming aligner, Novoalign. RESULTS: We report methodology for pooled hybridization capture with pre-enrichment, indexed multiplexing of up to 48 individuals or non-indexed pooled sequencing of up to 92 individuals with as little as 70 ng of DNA per person. Modified solid phase reversible immobilization bead purification strategies eliminate sample transfers from sonication in 96-well plates through adapter ligation, resulting in 50% less library preparation reagent consumption. Custom Y-shaped adapters containing novel 7 base pair index sequences with a Hamming distance of ≥2 were directly ligated onto fragmented source DNA, eliminating the need for PCR to incorporate indexes, and this was followed by a custom blocking strategy using a single oligonucleotide regardless of index sequence.
These results were obtained by aligning raw reads against the entire genome using Novoalign, followed by variant calling with SPLINTER for non-indexed pools or SAMtools for indexed samples. With these pipelines, we find sensitivity and specificity of 99.4% and 99.7% for pooled exome sequencing. Sensitivity, and to a lesser degree specificity, proved to be a function of coverage. For rare variants (≤2% minor allele frequency), we achieved sensitivity and specificity of ≥94.9% and ≥99.99% for custom capture of 2.5 Mb in multiplexed libraries of 22–48 individuals with only ≥5-fold coverage/chromosome, but these parameters improved to ≥98.7% and 100% with 20-fold coverage/chromosome. CONCLUSIONS: This highly scalable methodology enables accurate rare variant detection, with or without individual DNA sample indexing, while reducing the amount of required source DNA and total costs through less hybridization reagent consumption, multi-sample sonication in a standard PCR plate, multiplexed pre-enrichment pooling with a single hybridization, and the lower sequencing coverage required to obtain high sensitivity
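The index design constraint mentioned in the abstract above (7 base pair indexes with a Hamming distance of ≥2) can be checked with a short script. The following is a minimal sketch, not the authors' tooling: the function names and the four example index sequences are hypothetical and were chosen only for illustration.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(indexes: list[str]) -> int:
    """Return the smallest Hamming distance over all pairs of indexes."""
    return min(hamming(a, b) for a, b in combinations(indexes, 2))


# Hypothetical 7 bp index sequences (NOT the indexes used in the study).
indexes = ["ACGTACG", "ACGTTGG", "TTGCACG", "GGATCCA"]

# The set satisfies the design rule if every pair differs in at least 2 bases.
meets_rule = min_pairwise_distance(indexes) >= 2
```

A minimum distance of 2 means a single sequencing error in an index read cannot turn one valid index into another, so such errors can be detected, although not corrected.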
Genome wide association and linkage analyses identified three loci (4q25, 17q23.2, and 10q11.21) associated with variation in leukocyte telomere length: the Long Life Family Study
Leukocyte telomere length is believed to measure cellular aging in humans, and short leukocyte telomere length is associated with increased risks of late onset diseases, including cardiovascular disease, dementia, etc. Many studies have shown that leukocyte telomere length is a heritable trait, and several candidate genes have been identified, including TERT, TERC, OBFC1, and CTC1. Unlike most studies that have focused on genetic causes of chronic diseases such as heart disease and diabetes in relation to leukocyte telomere length, the present study examined the genome to identify variants that may contribute to variation in leukocyte telomere length among families with exceptional longevity. From the genome wide association analysis in 4,289 LLFS participants, we identified a novel intergenic SNP rs7680468 located near PAPSS1 and DKK2 on 4q25 (p = 4.7E-8). From our linkage analysis, we identified two additional novel loci with HLOD scores exceeding three, including 4.77 for 17q23.2, and 4.36 for 10q11.21. These two loci harbor a number of novel candidate genes with SNPs, and our gene-wise association analysis identified multiple genes, including DCAF7, POLG2, CEP95, and SMURF2 at 17q23.2; and RASGEF1A, HNRNPF, ANF487, CSTF2T, and PRKG1 at 10q11.21. Among these genes, multiple SNPs were associated with leukocyte telomere length, but the strongest association was observed with one contiguous haplotype in CEP95 and SMURF2. We also show that three previously reported genes—TERC, MYNN, and OBFC1—were significantly associated with leukocyte telomere length at p(empirical) < 0.05
Deep Sequencing of the Nicastrin Gene in Pooled DNA, the Identification of Genetic Variants That Affect Risk of Alzheimer's Disease
Nicastrin is an obligatory component of γ-secretase, the enzyme complex that leads to the production of Aβ fragments central to the pathogenesis of Alzheimer's disease (AD). Analyses of the effects of common variation in this gene on risk for late onset AD have been inconclusive. We investigated the effect of rare variation in the coding regions of the Nicastrin gene in a cohort of AD patients and matched controls using an innovative pooling approach and next generation sequencing. Five SNPs were identified and validated by individual genotyping from 311 cases and 360 controls. Association analysis identified a non-synonymous rare SNP (N417Y) with a statistically significantly higher frequency in cases compared to controls in the Greek population (OR 3.994, CI 1.105–14.439, p = 0.035). This finding warrants further investigation in a larger cohort and adds weight to the hypothesis that rare variation explains some of the genetic heritability still to be identified in Alzheimer's disease
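For readers unfamiliar with how an odds ratio and its confidence interval of the kind quoted above are derived from carrier counts, the following is a minimal sketch. It assumes the standard Woolf (log-based) interval; the abstract does not state which method the authors used, and the counts in the example are hypothetical, not the study's data.

```python
import math


def odds_ratio_ci(case_carriers: int, case_noncarriers: int,
                  control_carriers: int, control_noncarriers: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-based) confidence interval.

    z = 1.96 gives an approximate 95% interval.
    """
    a, b, c, d = (case_carriers, case_noncarriers,
                  control_carriers, control_noncarriers)
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of ln(OR) under the Woolf approximation.
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical 2x2 table (NOT taken from the study).
odds_ratio, lower, upper = odds_ratio_ci(10, 20, 5, 40)
```

With small carrier counts, as is typical for rare variants, the interval is wide, and an exact method such as Fisher's exact test may be preferable to this approximation.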
Integrated analysis of germline and somatic variants in ovarian cancer
We report the first large-scale exome-wide analysis of the combined germline-somatic landscape in ovarian cancer. Here we analyze germline and somatic alterations in 429 ovarian carcinoma cases and 557 controls. We identify 3,635 high confidence, rare truncation and 22,953 missense variants with predicted functional impact. We find germline truncation variants and large deletions across Fanconi pathway genes in 20% of cases. Enrichment of rare truncations is shown in BRCA1, BRCA2, and PALB2. Additionally, we observe germline truncation variants in genes not previously associated with ovarian cancer susceptibility (NF1, MAP3K4, CDKN2B, and MLL3). Evidence for loss of heterozygosity was found in 100% and 76% of cases with germline BRCA1 and BRCA2 truncations respectively. Germline-somatic interaction analysis combined with extensive bioinformatics annotation identifies 237 candidate functional germline truncation and missense variants, including 2 pathogenic BRCA1 and 1 TP53 deleterious variants. Finally, integrated analyses of germline and somatic variants identify significantly altered pathways, including the Fanconi, MAPK, and MLL pathways
Two Evolutionary Histories in the Genome of Rice: the Roles of Domestication Genes
Genealogical patterns in different genomic regions may be different due to the joint influence of gene flow and selection. The existence of two subspecies of cultivated rice provides a unique opportunity for analyzing these effects during domestication. We chose 66 accessions from the three rice taxa (about 22 each from Oryza sativa indica, O. sativa japonica, and O. rufipogon) for whole-genome sequencing. In the search for the signature of selection, we focus on low diversity regions (LDRs) shared by both cultivars. We found that the genealogical histories of these overlapping LDRs are distinct from the genomic background. While indica and japonica genomes generally appear to be of independent origin, many overlapping LDRs may have originated only once, as a result of selection and subsequent introgression. Interestingly, many such LDRs contain only one candidate gene of rice domestication, and several known domestication genes have indeed been “rediscovered” by this approach. In summary, we identified 13 additional candidate genes of domestication
Nomenclature and definition in asymmetric regional body overgrowth
We designate a novel term “isolated lateralized overgrowth” (ILO) for the findings previously described as “isolated hemihypertrophy” and “isolated hemihyperplasia.” ILO is defined as lateralized overgrowth in the absence of a recognized pattern of malformations, dysplasia, or morphologic variants. ILO is likely genetically heterogeneous. Further study is required to determine more of the underlying genetic etiologies and potential associations with currently unrecognized patterns of malformation. Funding: National Cancer Institute, Grant number: K08CA193915; Alex’s Lemonade Stand Foundation for Childhood Cancer; St. Baldrick’s Foundation
Ultradeep Sequencing of a Human Ultraconserved Region Reveals Somatic and Constitutional Genomic Instability
Ultradeep sequencing of genomes permits the detection of very low-level genomic instability in non-neoplastic tissues of patients with the most common form of inherited colorectal cancer
Whole-Exome Capture and Sequencing Identifies HEATR2 Mutation as a Cause of Primary Ciliary Dyskinesia
Motile cilia are essential components of the mucociliary escalator and are central to respiratory-tract host defenses. Abnormalities in these evolutionarily conserved organelles cause primary ciliary dyskinesia (PCD). Despite recent strides characterizing the ciliome and sensory ciliopathies through exploration of the phenotype-genotype associations in model organisms, the genetic bases of most cases of PCD remain elusive. We identified nine related subjects with PCD from geographically dispersed Amish communities and performed exome sequencing of two affected individuals and their unaffected parents. A single autosomal-recessive nonsynonymous missense mutation was identified in HEATR2, an uncharacterized gene that belongs to a family not previously associated with ciliary assembly or function. Airway epithelial cells isolated from PCD-affected individuals had markedly reduced HEATR2 levels, absent dynein arms, and loss of ciliary beating. MicroRNA-mediated silencing of the orthologous gene in Chlamydomonas reinhardtii resulted in absent outer dynein arms, reduced flagellar beat frequency, and decreased cell velocity. These findings were recapitulated by small hairpin RNA-mediated knockdown of HEATR2 in airway epithelial cells from unaffected donors. Moreover, immunohistochemistry studies in human airway epithelial cells showed that HEATR2 was localized to the cytoplasm and not in cilia, which suggests a role in either dynein arm transport or assembly. The identification of HEATR2 contributes to the growing number of genes associated with PCD identified in both individuals and model organisms and shows that exome sequencing in family studies facilitates the discovery of novel disease-causing gene mutations
Clonal haematopoiesis and risk of acute myeloid leukemia
Nearly all adults harbor acute myeloid leukemia-related clonal hematopoietic mutations at a variant allele fraction of ≥0.0001, yet relatively few develop hematologic malignancies. We conducted a nested analysis in the Nurses' Health Study and Health Professionals Follow-Up Study blood subcohorts, with up to 22 years of follow-up, to investigate associations of clonal mutations of ≥0.0001 allele frequency with future risk of acute myeloid leukemia. We identified 35 cases with acute myeloid leukemia that had pre-diagnosis peripheral blood samples and matched two controls without history of cancer per case by sex, age, and ethnicity. We conducted blinded error-corrected sequencing on all study samples and assessed variant-associated risk using conditional logistic regression. We detected acute myeloid leukemia-associated mutations in 97% of all participants (598 mutations, 5.8/person). Individuals with mutations ≥0.01 variant allele fraction had a significantly increased acute myeloid leukemia risk (OR 5.4, 95% CI 1.8-16.6), as did individuals with higher-frequency clones and those with DNMT3A R882H/C mutations. The risk of lower-frequency clones was less clear. In the 11 case-control sets with samples banked 10 years apart, clonal mutations rarely expanded over time. Our findings are consistent with published evidence that detection of clonal mutations ≥0.01 variant allele fraction identifies individuals at increased risk for acute myeloid leukemia. Further study of larger populations, of mutations co-occurring within the same pre-leukemic clone, and of other risk factors (lifestyle, epigenetics, etc.) is still needed to fully elucidate the risk conferred by low-frequency clonal hematopoiesis in asymptomatic adults