A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries
Genome targeting methods enable cost-effective capture of specific subsets of the genome for sequencing. We present here an automated, highly scalable method for carrying out the Solution Hybrid Selection capture approach that dramatically increases the scale and throughput of sequence-ready library production. Significant process improvements and a series of in-process quality control checkpoints are also added. These process improvements can also be used in a manual version of the protocol.
Mutations causing medullary cystic kidney disease type 1 (MCKD1) lie in a large VNTR in MUC1 missed by massively parallel sequencing
While genetic lesions responsible for some Mendelian disorders can be rapidly discovered through massively parallel sequencing (MPS) of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple Mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing, and de novo assembly did we find that each of six MCKD1 families harbors an equivalent, but apparently independently arising, mutation in sequence dramatically underrepresented in MPS data: the insertion of a single C in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5-5 kb), GC-rich (>80%), coding VNTR in the mucin 1 gene. The results provide a cautionary tale about the challenges in identifying genes responsible for Mendelian, let alone more complex, disorders through MPS.
Whole exome sequencing of circulating tumor cells provides a window into metastatic prostate cancer
Comprehensive analyses of cancer genomes promise to inform prognoses and precise cancer treatments. A major barrier, however, is inaccessibility of metastatic tissue. A potential solution is to characterize circulating tumor cells (CTCs), but this requires overcoming the challenges of isolating rare cells and sequencing low-input material. Here we report an integrated process to isolate, qualify and sequence whole exomes of CTCs with high fidelity, using a census-based sequencing strategy. Power calculations suggest that mapping of >99.995% of the standard exome is possible in CTCs. We validated our process in two prostate cancer patients including one for whom we sequenced CTCs, a lymph node metastasis and nine cores of the primary tumor. Fifty-one of 73 CTC mutations (70%) were observed in matched tissue. Moreover, we identified 10 early-trunk and 56 metastatic-trunk mutations in the non-CTC tumor samples and found 90% and 73% of these, respectively, in CTC exomes. This study establishes a foundation for CTC genomics in the clinic
Mutations causing medullary cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing
Although genetic lesions responsible for some mendelian disorders can be rapidly discovered through massively parallel sequencing of whole genomes or exomes, not all diseases readily yield to such efforts. We describe the illustrative case of the simple mendelian disorder medullary cystic kidney disease type 1 (MCKD1), mapped more than a decade ago to a 2-Mb region on chromosome 1. Ultimately, only by cloning, capillary sequencing and de novo assembly did we find that each of six families with MCKD1 harbors an equivalent but apparently independently arising mutation in sequence markedly under-represented in massively parallel sequencing data: the insertion of a single cytosine in one copy (but a different copy in each family) of the repeat unit comprising the extremely long (~1.5–5 kb), GC-rich (>80%) coding variable-number tandem repeat (VNTR) sequence in the MUC1 gene encoding mucin 1. These results provide a cautionary tale about the challenges in identifying the genes responsible for mendelian, let alone more complex, disorders through massively parallel sequencing.
Funding: National Institutes of Health (U.S.) (Intramural Research Program); National Human Genome Research Institute (U.S.); Charles University (program UNCE 204011); Charles University (program PRVOUK-P24/LF1/3); Czech Republic. Ministry of Education, Youth, and Sports (grant NT13116-4/2012); Czech Republic. Ministry of Health (grant NT13116-4/2012); Czech Republic. Ministry of Health (grant LH12015); National Institutes of Health (U.S.) (Harvard Digestive Diseases Center, grant DK34854
Using viral load and epidemic dynamics to optimize pooled testing in resource-constrained settings
Virological testing is central to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) containment, but many settings face severe limitations on testing. Group testing offers a way to increase throughput by testing pools of combined samples; however, most proposed designs have not yet addressed key concerns over sensitivity loss and implementation feasibility. Here, we combined a mathematical model of epidemic spread and empirically derived viral kinetics for SARS-CoV-2 infections to identify pooling designs that are robust to changes in prevalence and to ratify sensitivity losses against the time course of individual infections. We show that prevalence can be accurately estimated across a broad range, from 0.02 to 20%, using only a few dozen pooled tests and using up to 400 times fewer tests than would be needed for individual identification. We then exhaustively evaluated the ability of different pooling designs to maximize the number of detected infections under various resource constraints, finding that simple pooling designs can identify up to 20 times as many true positives as individual testing with a given budget. Crucially, we confirmed that our theoretical results can be translated into practice using pooled human nasopharyngeal specimens by accurately estimating a 1% prevalence among 2304 samples using only 48 tests and through pooled sample identification in a panel of 960 samples. Our results show that accounting for variation in sampled viral loads provides a nuanced picture of how pooling affects sensitivity to detect infections. Using simple, practical group testing designs can vastly increase surveillance capabilities in resource-limited settings.
Funding: National Institute of General Medical Sciences (Grant U54GM088558
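To make the idea of estimating prevalence from pooled tests concrete, the following is a minimal sketch of the textbook group-testing estimator, not the viral-load-aware method the abstract describes. It assumes a perfect test (no dilution-related sensitivity loss), disjoint pools of equal size, and independent infections. The function names and the count of 18 positive pools are hypothetical and chosen only for illustration; the two-stage (Dorfman) cost formula is likewise a standard result rather than one of the designs evaluated in the paper.

```python
def pooled_prevalence_mle(positive_pools: int, total_pools: int, pool_size: int) -> float:
    """Maximum-likelihood prevalence estimate from pooled test results.

    Assumes a perfect test and disjoint, equal-sized pools, so that a pool is
    negative with probability (1 - p) ** pool_size.
    """
    if total_pools <= 0 or pool_size <= 0:
        raise ValueError("total_pools and pool_size must be positive")
    if not 0 <= positive_pools <= total_pools:
        raise ValueError("positive_pools must lie between 0 and total_pools")
    negative_fraction = 1.0 - positive_pools / total_pools
    # Invert P(pool negative) = (1 - p) ** pool_size for p.
    return 1.0 - negative_fraction ** (1.0 / pool_size)


def dorfman_tests_per_sample(prevalence: float, pool_size: int) -> float:
    """Expected tests per sample for two-stage pooling with a perfect test.

    One test per pool, plus one test per member of every positive pool.
    """
    if not 0.0 <= prevalence <= 1.0 or pool_size <= 0:
        raise ValueError("prevalence must be in [0, 1] and pool_size positive")
    return 1.0 / pool_size + 1.0 - (1.0 - prevalence) ** pool_size


if __name__ == "__main__":
    # Hypothetical outcome: 18 of 48 pools positive, 48 samples per pool.
    estimate = pooled_prevalence_mle(positive_pools=18, total_pools=48, pool_size=48)
    print(f"estimated prevalence: {estimate:.4f}")
    print(f"tests per sample at 1% prevalence, pools of 10: "
          f"{dorfman_tests_per_sample(0.01, 10):.3f}")
```

Under these simplifying assumptions, a few dozen pooled results are enough to pin down a prevalence near 1%. Real assays lose sensitivity as samples are diluted, which is the effect the abstract says must be accounted for.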
EGFR Variant Heterogeneity in Glioblastoma Resolved through Single-Nucleus Sequencing
Glioblastomas (GBM) with EGFR amplification represent approximately 50% of newly diagnosed cases, and recent studies have revealed frequent coexistence of multiple EGFR aberrations within the same tumor, which has implications for mutation cooperation and treatment resistance. However, bulk tumor sequencing studies cannot resolve the patterns of how the multiple EGFR aberrations coexist with other mutations within single tumor cells. Here, we applied a population-based single-cell whole-genome sequencing methodology to characterize genomic heterogeneity in EGFR-amplified glioblastomas. Our analysis effectively identified clonal events, including a novel translocation of a super enhancer to the TERT promoter, as well as subclonal LOH and multiple EGFR mutational variants within tumors. Correlating the EGFR mutations onto the cellular hierarchy revealed that EGFR truncation variants (EGFRvII and EGFR carboxyl-terminal deletions) identified in the bulk tumor segregate into nonoverlapping subclonal populations. In vitro and in vivo functional studies show that EGFRvII is oncogenic and sensitive to EGFR inhibitors currently in clinical trials. Thus, the association between diverse activating mutations in EGFR and other subclonal mutations within a single tumor supports an intrinsic mechanism for proliferative and clonal diversification with broad implications in resistance to treatment.
Significance: We developed a novel single-cell sequencing methodology capable of identifying unique, nonoverlapping subclonal alterations from archived frozen clinical specimens. Using GBM as an example, we validated our method to successfully define tumor cell subpopulations containing distinct genetic and treatment resistance profiles and potentially mutually cooperative combinations of alterations in EGFR and other genes.
Funding: Dana-Farber/Harvard Cancer Center (MIT Bridge Project Fund); National Brain Tumor Societ
Noninvasive Immunohistochemical Diagnosis and Novel MUC1 Mutations Causing Autosomal Dominant Tubulointerstitial Kidney Disease
Background: Autosomal dominant tubulointerstitial kidney disease caused by mucin-1 gene (MUC1) mutations (ADTKD-MUC1) is characterized by progressive kidney failure. Genetic evaluation for ADTKD-MUC1 specifically tests for a cytosine duplication that creates a unique frameshift protein (MUC1fs). Our goal was to develop immunohistochemical methods to detect the MUC1fs created by the cytosine duplication and, possibly, by other similar frameshift mutations and to identify novel MUC1 mutations in individuals with positive immunohistochemical staining for the MUC1fs protein. Methods: We performed MUC1fs immunostaining on urinary cell smears and various tissues from ADTKD-MUC1-positive and -negative controls as well as in individuals from 37 ADTKD families that were negative for mutations in known ADTKD genes. We used novel analytic methods to identify MUC1 frameshift mutations. Results: After technique refinement, the sensitivity and specificity for MUC1fs immunostaining of urinary cell smears were 94.2% and 88.6%, respectively. Further genetic testing on 17 families with positive MUC1fs immunostaining revealed six families with five novel MUC1 frameshift mutations that all predict production of the identical MUC1fs protein. Conclusions: We developed a noninvasive immunohistochemical method to detect MUC1fs that, after further validation, may be useful in the future for diagnostic testing. Production of the MUC1fs protein may be central to the pathogenesis of ADTKD-MUC1.
Sensitive Detection of Minimal Residual Disease in Patients Treated for Early-Stage Breast Cancer
© 2020 American Association for Cancer Research. Purpose: Existing cell-free DNA (cfDNA) methods lack the sensitivity needed for detecting minimal residual disease (MRD) following therapy. We developed a test for tracking hundreds of patient-specific mutations to detect MRD with a 1,000-fold lower error rate than conventional sequencing. Experimental Design: We compared the sensitivity of our approach to digital droplet PCR (ddPCR) in a dilution series, then retrospectively identified two cohorts of patients who had undergone prospective plasma sampling and clinical data collection: 16 patients with ER+/HER2- metastatic breast cancer (MBC) sampled within 6 months following metastatic diagnosis and 142 patients with stage 0 to III breast cancer who received curative-intent treatment with most sampled at surgery and 1 year postoperative. We performed whole-exome sequencing of tumors and designed individualized MRD tests, which we applied to serial cfDNA samples. Results: Our approach was 100-fold more sensitive than ddPCR when tracking 488 mutations, but most patients had fewer identifiable tumor mutations to track in cfDNA (median = 57; range = 2–346). Clinical sensitivity was 81% (n = 13/16) in newly diagnosed MBC, 23% (n = 7/30) at postoperative and 19% (n = 6/32) at 1 year in early-stage disease, and highest in patients with the most tumor mutations available to track. MRD detection at 1 year was strongly associated with distant recurrence [HR = 20.8; 95% confidence interval, 7.3–58.9]. Median lead time from first positive sample to recurrence was 18.9 months (range = 3.4–39.2 months). Conclusions: Tracking large numbers of individualized tumor mutations in cfDNA can improve MRD detection, but its sensitivity is driven by the number of tumor mutations available to track
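The conclusion that sensitivity depends on the number of trackable mutations can be illustrated with a simple sampling model. This is a hedged sketch, not the authors' assay or error model. It assumes that every tracked site in every sampled genome equivalent independently carries the tumor allele with probability equal to the tumor fraction, that a single mutant molecule suffices for detection, and that sequencing errors are negligible. The tumor fraction (1e-5) and the 3,000 genome equivalents are hypothetical inputs; only the mutation counts 57 and 488 come from the abstract.

```python
def detection_probability(tumor_fraction: float,
                          genome_equivalents: int,
                          n_mutations: int) -> float:
    """Probability of sampling at least one mutant molecule.

    Simplified model: each of the genome_equivalents * n_mutations sampled
    sites is mutant independently with probability tumor_fraction.
    """
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be in [0, 1]")
    if genome_equivalents < 0 or n_mutations < 0:
        raise ValueError("counts must be non-negative")
    sampled_sites = genome_equivalents * n_mutations
    # P(no mutant molecule at any sampled site), then take the complement.
    return 1.0 - (1.0 - tumor_fraction) ** sampled_sites


if __name__ == "__main__":
    # Hypothetical tumor fraction and cfDNA input, for illustration only.
    for m in (1, 57, 488):
        p = detection_probability(tumor_fraction=1e-5,
                                  genome_equivalents=3000,
                                  n_mutations=m)
        print(f"{m:>3} tracked mutations -> detection probability {p:.3f}")
```

In this model the chance of detection rises steeply with the number of tracked mutations at a fixed tumor fraction, which is consistent with the abstract's observation that sensitivity was highest in patients with the most mutations available to track.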
Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels
New strategies for prevention and treatment of type 2 diabetes (T2D) require improved insight into disease etiology. We analyzed 386,731 common single-nucleotide polymorphisms (SNPs) in 1464 patients with T2D and 1467 matched controls, each characterized for measures of glucose metabolism, lipids, obesity, and blood pressure. With collaborators (FUSION and WTCCC/UKT2D), we identified and confirmed three loci associated with T2D - in a noncoding region near CDKN2A and CDKN2B, in an intron of IGF2BP2, and an intron of CDKAL1 - and replicated associations near HHEX and in SLC30A8 found by a recent whole-genome association study. We identified and confirmed association of a SNP in an intron of glucokinase regulatory protein (GCKR) with serum triglycerides. The discovery of associated variants in unsuspected genes and outside coding regions illustrates the ability of genome-wide association studies to provide potentially important clues to the pathogenesis of common diseases