13 research outputs found

    A Macaque's-Eye View of Human Insertions and Deletions: Differences in Mechanisms

    Insertions and deletions (indels) cause numerous genetic diseases and lead to pronounced evolutionary differences among genomes. The macaque sequences provide an opportunity to gain insights into the mechanisms generating these mutations on a genome-wide scale by establishing the polarity of indels occurring in the human lineage since its divergence from the chimpanzee. Here we apply novel regression techniques and multiscale analyses to demonstrate an extensive regional indel rate variation stemming from local fluctuations in divergence, GC content, male and female recombination rates, proximity to telomeres, and other genomic factors. We find that both replication and, surprisingly, recombination are significantly associated with the occurrence of small indels. Intriguingly, the relative inputs of replication versus recombination differ between insertions and deletions, thus the two types of mutations are likely guided in part by distinct mechanisms. Namely, insertions are more strongly associated with factors linked to recombination, while deletions are mostly associated with replication-related features. Indel as a term misleadingly groups the two types of mutations together by their effect on a sequence alignment. However, here we establish that the correct identification of a small gap as an insertion or a deletion (by use of an outgroup) is crucial to determining its mechanism of origin. In addition to providing novel insights into insertion and deletion mutagenesis, these results will assist in gap penalty modeling and eventually lead to more reliable genomic alignments
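The polarity step described above (using an outgroup to decide whether a human-chimpanzee alignment gap is an insertion or a deletion) can be illustrated with a minimal parsimony sketch. The function name `polarize_gap` and the input convention (the three aligned segments spanning one gap, with `-` as the gap character) are assumptions made for illustration; the abstract does not describe the authors' actual pipeline.

```python
def polarize_gap(human: str, chimp: str, macaque: str) -> str:
    """Classify a human-chimp alignment gap using the macaque state (parsimony).

    Each argument is the aligned segment spanning the gap; a segment made up
    entirely of '-' counts as "gapped". Partially gapped outgroup segments are
    treated as ungapped here, which is a simplification.
    """
    h_gap, c_gap, m_gap = (set(s) == {"-"} for s in (human, chimp, macaque))
    if h_gap == c_gap:
        return "no human-chimp gap"
    if m_gap == c_gap:
        # Human differs from both chimp and outgroup: change on the human lineage.
        return "human deletion" if h_gap else "human insertion"
    # Chimp differs from both human and outgroup: change on the chimp lineage.
    return "chimp deletion" if c_gap else "chimp insertion"


if __name__ == "__main__":
    print(polarize_gap("---", "ACG", "ACG"))
    print(polarize_gap("ACG", "---", "---"))
```

Without the third sequence, the two calls above would be indistinguishable from their chimp-lineage counterparts, which is the point the abstract makes about needing an outgroup.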

    Clinically actionable mutation profiles in patients with cancer identified by whole-genome sequencing

    Next-generation sequencing (NGS) efforts have established catalogs of mutations relevant to cancer development. However, the clinical utility of this information remains largely unexplored. Here, we present the results of the first eight patients recruited into a clinical whole-genome sequencing (WGS) program in the United Kingdom. We performed PCR-free WGS of fresh frozen tumors and germline DNA at 75× and 30×, respectively, using the HiSeq2500 HTv4. Subtracted tumor VCFs and paired germlines were subjected to comprehensive analysis of coding and noncoding regions, integration of germline with somatically acquired variants, and global mutation signatures and pathway analyses. Results were classified into tiers and presented to a multidisciplinary tumor board. WGS results helped to clarify an uncertain histopathological diagnosis in one case, led to informed or supported prognosis in two cases, leading to de-escalation of therapy in one, and indicated potential treatments in all eight. Overall 26 different tier 1 potentially clinically actionable findings were identified using WGS compared with six SNVs/indels using routine targeted NGS. These initial results demonstrate the potential of WGS to inform future diagnosis, prognosis, and treatment choice in cancer and justify the systematic evaluation of the clinical utility of WGS in larger cohorts of patients with cancer

    Structural and non-coding variants increase the diagnostic yield of clinical whole genome sequencing for rare diseases

    BACKGROUND: Whole genome sequencing is increasingly being used for the diagnosis of patients with rare diseases. However, the diagnostic yields of many studies, particularly those conducted in a healthcare setting, are often disappointingly low, at 25-30%. This is in part because although entire genomes are sequenced, analysis is often confined to in silico gene panels or coding regions of the genome. METHODS: We undertook WGS on a cohort of 122 unrelated rare disease patients and their relatives (300 genomes) who had been pre-screened by gene panels or arrays. Patients were recruited from a broad spectrum of clinical specialties. We applied a bioinformatics pipeline that would allow comprehensive analysis of all variant types. We combined established bioinformatics tools for phenotypic and genomic analysis with our novel algorithms (SVRare, ALTSPLICE and GREEN-DB) to detect and annotate structural, splice site and non-coding variants. RESULTS: Our diagnostic yield was 43/122 cases (35%), although 47/122 cases (39%) were considered solved when novel candidate genes with supporting functional data were taken into account. Structural, splice site and deep intronic variants contributed to 20/47 (43%) of our solved cases. Five genes that are novel, or were novel at the time of discovery, were identified, whilst a further three genes are putative novel disease genes with evidence of causality. We identified variants of uncertain significance in a further fourteen candidate genes. The phenotypic spectrum associated with RMND1 was expanded to include polymicrogyria. Two patients with secondary findings in FBN1 and KCNQ1 were confirmed to have previously unidentified Marfan and long QT syndromes, respectively, and were referred for further clinical interventions. Clinical diagnoses were changed in six patients and treatment adjustments made for eight individuals, which for five patients was considered life-saving. CONCLUSIONS: Genome sequencing is increasingly being considered as a first-line genetic test in routine clinical settings and can make a substantial contribution to rapidly identifying a causal aetiology for many patients, shortening their diagnostic odyssey. We have demonstrated that structural, splice site and intronic variants make a significant contribution to diagnostic yield and that comprehensive analysis of the entire genome is essential to maximise the value of clinical genome sequencing.

    Strong heterogeneity in mutation rate causes misleading hallmarks of natural selection on indel mutations in the human genome

    Elucidating the mechanisms of mutation accumulation and fixation is critical to understand the nature of genetic variation and its contribution to genome evolution. Of particular interest is the effect of insertions and deletions (indels) on the evolution of genome landscapes. Recent population-scaled sequencing efforts provide unprecedented data for analyzing the relative impact of selection versus nonadaptive forces operating on indels. Here, we combined McDonald–Kreitman tests with the analysis of derived allele frequency spectra to investigate the dynamics of allele fixation of short (1–50 bp) indels in the human genome. Our analyses revealed apparently higher fixation probabilities for insertions than deletions. However, this fixation bias is not consistent with either selection or biased gene conversion and varies with local mutation rate, being particularly pronounced at indel hotspots. Furthermore, we identified an unprecedented number of loci with evidence for multiple indel events in the primate phylogeny. Even in nonrepetitive sequence contexts (a priori not prone to indel mutations), such loci are 60-fold more frequent than expected according to a model of uniform indel mutation rate. This provides evidence of as yet unidentified cryptic indel hotspots. We propose that indel homoplasy, at known and cryptic hotspots, produces systematic errors in determination of ancestral alleles via parsimony and advise caution interpreting classic selection tests given the strong heterogeneity in indel rates across the genome. These results will have great impact on studies seeking to infer evolutionary forces operating on indels observed in closely related species, because such mutations are traditionally presumed homoplasy-free
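For readers unfamiliar with the McDonald–Kreitman test mentioned above, the following is a minimal sketch of its core computation: a 2x2 table of fixed differences (D) versus polymorphisms (P) for a test class and a reference class, summarised by the neutrality index and a two-sided Fisher exact test. The function name `mk_test`, the argument names, and the choice of which mutation class plays the "test" and "reference" roles are illustrative assumptions; the abstract does not specify how the authors set up their tables.

```python
from math import comb


def mk_test(d_test: int, p_test: int, d_ref: int, p_ref: int):
    """Return (neutrality index, two-sided Fisher exact p-value) for an MK table.

    d_* are counts of fixed differences, p_* counts of polymorphisms.
    Assumes d_test and p_ref are non-zero so the index is defined.
    """
    # Neutrality index: (P_test / D_test) / (P_ref / D_ref); close to 1 under neutrality.
    ni = (p_test * d_ref) / (d_test * p_ref)

    row1 = d_test + p_test          # total sites in the test class
    col1 = d_test + d_ref           # total fixed differences
    n = row1 + d_ref + p_ref        # grand total

    def prob(k: int) -> float:
        # Hypergeometric probability of k fixed differences in the test class.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(d_test)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Two-sided p-value: sum over all tables no more probable than the observed one.
    p_value = sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))
    return ni, min(p_value, 1.0)


if __name__ == "__main__":
    ni, p = mk_test(d_test=7, p_test=2, d_ref=17, p_ref=42)
    print(f"NI = {ni:.3f}, p = {p:.4f}")
```

The test treats every site as an independent draw from a homogeneous process, which is the assumption the abstract argues is violated when indel mutation rates are strongly heterogeneous and homoplasy misleads ancestral-state inference.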

    The (r)evolution of SINE versus LINE distributions in primate genomes: Sex chromosomes are important

    The densities of transposable elements (TEs) in the human genome display substantial variation both within individual chromosomes and among chromosome types (autosomes and the two sex chromosomes). Finding an explanation for this variability has been challenging, especially in light of genome landscapes unique to the sex chromosomes. Here, using a multiple regression framework, we investigate primate Alu and L1 densities shaped by regional genome features and location on a particular chromosome type. As a result of our analysis, first, we build statistical models explaining up to 79% and 44% of variation in Alu and L1 element density, respectively. Second, we analyze sex chromosome versus autosome TE densities corrected for regional genomic effects. We discover that sex-chromosome bias in Alu and L1 distributions not only persists after accounting for these effects, but even presents differences in patterns, confirming preferential Alu integration in the male germline, yet likely integration of L1s in both male and female germlines or in early embryogenesis. Additionally, our models reveal that local base composition (measured by GC content and density of L1 target sites) and natural selection (inferred via density of most conserved elements) are significant to predicting densities of L1s. Interestingly, measurements of local double-stranded breaks (a 13-mer associated with genome instability) strongly correlate with densities of Alu elements; little evidence was found for the role of recombination-driven deletion in driving TE distributions over evolutionary time. Thus, Alu and L1 densities have been influenced by the combination of distinct local genome landscapes and the unique evolutionary dynamics of sex chromosomes

    Ride the wavelet: A multiscale analysis of genomic contexts flanking small insertions and deletions

    Recent studies have revealed that insertions and deletions (indels) are more different in their formation than previously assumed. What remains enigmatic is how the local DNA sequence context contributes to these differences. To investigate the relative impact of various molecular mechanisms to indel formation, we analyzed sequence contexts of indels in the non protein- or RNA-coding, nonrepetitive (NCNR) portion of the human genome. We considered small (≤30-bp) indels occurring in the human lineage since its divergence from chimpanzee and used wavelet techniques to study, simultaneously for multiple scales, the spatial patterns of short sequence motifs associated with indel mutagenesis. In particular, we focused on motifs associated with DNA polymerase activity, topoisomerase cleavage, double-strand breaks (DSBs), and their repair. We came to the following conclusions. First, many motifs are characterized by unique enrichment profiles in the vicinity of indels vs. indel-free portions of the genome, verifying the importance of sequence context in indel mutagenesis. Second, only limited similarity in motif frequency profiles is evident flanking insertions vs. deletions, confirming differences in their mutagenesis. Third, substantial similarity in frequency profiles exists between pairs of individual motifs flanking insertions (and separately deletions), suggesting “cooperation” among motifs, and thus molecular mechanisms, during indel formation. Fourth, the wavelet analyses demonstrate that all these patterns are highly dependent on scale (the size of an interval considered). Finally, our results depict a model of indel mutagenesis comprising both replication and recombination (via repair of paused replication forks and site-specific recombination)
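The idea of studying a motif's spatial pattern "simultaneously for multiple scales" can be sketched with a plain Haar wavelet decomposition of a motif-count signal measured in consecutive windows around an indel. The Haar basis, the function names `haar_details` and `scale_energy`, and the toy signals are assumptions chosen for illustration; the abstract does not state which wavelet family or software the authors used.

```python
def haar_details(signal):
    """Return Haar detail coefficients per scale (finest first).

    The signal length is assumed to be a power of two.
    """
    approx, details = list(signal), []
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        # Detail = scaled difference within each pair; approximation = scaled sum.
        details.append([(a - b) / 2 ** 0.5 for a, b in pairs])
        approx = [(a + b) / 2 ** 0.5 for a, b in pairs]
    return details


def scale_energy(signal):
    """Sum of squared detail coefficients at each scale, finest first."""
    return [sum(d * d for d in level) for level in haar_details(signal)]


if __name__ == "__main__":
    fine = [1, 0, 1, 0, 1, 0, 1, 0]     # motif counts alternating window to window
    coarse = [1, 1, 0, 0, 1, 1, 0, 0]   # same motif, pattern at twice the window size
    print(scale_energy(fine))
    print(scale_energy(coarse))
```

The two toy signals have the same mean motif count but concentrate their energy at different scales, which is the kind of scale dependence the fourth conclusion above refers to.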

    A high throughput screen for active human transposable elements

    BACKGROUND: Transposable elements (TEs) are mobile genetic sequences that randomly propagate within their host’s genome. This mobility has the potential to affect gene transcription and cause disease. However, TEs are technically challenging to identify, which complicates efforts to assess the impact of TE insertions on disease. Here we present a targeted sequencing protocol and computational pipeline to identify polymorphic and novel TE insertions using next-generation sequencing: TE-NGS. The method simultaneously targets the three subfamilies that are responsible for the majority of recent TE activity (L1HS, AluYa5/8, and AluYb8/9) thereby obviating the need for multiple experiments and reducing the amount of input material required. RESULTS: Here we describe the laboratory protocol and detection algorithm, and a benchmark experiment for the reference genome NA12878. We demonstrate a substantial enrichment for on-target fragments, and high sensitivity and precision to both reference and NA12878-specific insertions. We report 17 previously unreported loci for this individual which are supported by orthogonal long-read evidence, and we identify 1470 polymorphic and novel TEs in 12 additional samples that were previously undocumented in databases of insertion polymorphisms. CONCLUSIONS: We anticipate that future applications of TE-NGS alongside exome sequencing of patients with sporadic disease will reduce the number of unresolved cases, and improve estimates of the contribution of TEs to human genetic disease.