
    QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles

    Background: Next-generation sequencing makes it possible to study heterogeneous populations of viral infections. When sequencing is done at high coverage depth ("deep sequencing"), low-frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier developed for the Illumina sequencing platforms. It uses the quantiles of the quality scores to distinguish true single nucleotide variants (SNVs) from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. We compared our method with the existing methods LoFreq, ShoRAH, and V-Phaser 2 on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. Results: In the default application of QQ-SNV, variants were called using an SNV probability cutoff of 0.5 (QQ-SNVD). To improve sensitivity, we used an SNV probability cutoff of 0.0001 (QQ-SNVHS). To also increase specificity, SNV calls were overruled when their frequency was below the 80th percentile of the distribution of error frequencies (QQ-SNVHS-P80). On the plasmid mixture test sets, QQ-SNVD performed similarly to the existing approaches. QQ-SNVHS was more sensitive on all test sets but produced more false positives. QQ-SNVHS-P80 was the most accurate method over all test sets because it balanced sensitivity and specificity. When applied to a paired-end HCV sequencing study with a lowest spiked-in true frequency of 0.5%, QQ-SNVHS-P80 achieved a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets. Finally, when tested on a clinical sample, QQ-SNVHS-P80 consistently detected four putative true variants with frequency below 0.5% across different generations of Illumina sequencers. Conclusions: We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
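    The calling scheme above can be illustrated with a minimal sketch, assuming a candidate variant is summarized by the 25th/50th/75th percentiles of the Phred scores of its supporting bases. The coefficients in HYPOTHETICAL_WEIGHTS are invented for illustration because the trained QQ-SNV weights are not given here. The function names and the nearest-rank percentile used for the P80-style error-frequency filter are also assumptions. Only the two probability cutoffs (0.5 and 0.0001) and the 80th-percentile rule come from the abstract.

```python
import math
from statistics import quantiles

# Hypothetical coefficients for illustration only; the trained QQ-SNV
# weights are not stated in the abstract.
HYPOTHETICAL_WEIGHTS = {"intercept": -8.0, "q25": 0.1, "q50": 0.1, "q75": 0.05}

QQ_SNV_D_CUTOFF = 0.5      # default probability cutoff (QQ-SNVD)
QQ_SNV_HS_CUTOFF = 0.0001  # high-sensitivity cutoff (QQ-SNVHS)


def quality_quartiles(qualities):
    """25th, 50th and 75th percentiles of the Phred scores of supporting bases."""
    q25, q50, q75 = quantiles(qualities, n=4)
    return q25, q50, q75


def snv_probability(qualities, w=HYPOTHETICAL_WEIGHTS):
    """Logistic regression on the quality quantiles -> estimated SNV probability."""
    q25, q50, q75 = quality_quartiles(qualities)
    z = w["intercept"] + w["q25"] * q25 + w["q50"] * q50 + w["q75"] * q75
    return 1.0 / (1.0 + math.exp(-z))


def percentile_nearest_rank(values, p):
    """Nearest-rank percentile (p in 0-100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def call_snv(prob, frequency, cutoff, error_freqs=None, percentile=80):
    """Keep a candidate if prob >= cutoff and, when error frequencies are
    supplied (P80-style filter), its frequency is not below their percentile."""
    if prob < cutoff:
        return False
    if error_freqs is not None:
        if frequency < percentile_nearest_rank(error_freqs, percentile):
            return False
    return True
```

    With these made-up weights, a low-quality candidate passes the 0.0001 cutoff but not the 0.5 cutoff. The percentile filter then removes low-frequency calls that fall inside the range of typical error frequencies.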

    Inferring clonal evolution of tumors from single nucleotide somatic mutations

    High-throughput sequencing allows the detection and quantification of frequencies of somatic single nucleotide variants (SNVs) in heterogeneous tumor cell populations. In some cases, the evolutionary history and population frequency of the subclonal lineages of tumor cells present in the sample can be reconstructed from these SNV frequency measurements. However, automated methods for this reconstruction are not available, and the conditions under which reconstruction is possible have not been described. We describe the conditions under which the evolutionary history can be uniquely reconstructed from SNV frequencies from single or multiple samples of the tumor population, and we introduce a new statistical model, PhyloSub, that infers the phylogeny and genotype of the major subclonal lineages represented in the population of cancer cells. It uses a Bayesian nonparametric prior over trees that groups SNVs into major subclonal lineages and automatically estimates the number of lineages and their ancestry. We sample from the joint posterior distribution over trees to identify evolutionary histories and cell population frequencies that have the highest probability of generating the observed SNV frequency data. When multiple phylogenies are consistent with a given set of SNV frequencies, PhyloSub represents the uncertainty in the tumor phylogeny using a partial order plot. Experiments on a simulated dataset and two real datasets comprising tumor samples from acute myeloid leukemia and chronic lymphocytic leukemia patients demonstrate that PhyloSub can infer both linear (or chain) and branching lineages, and that its inferences are in good agreement with ground truth where it is available.
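    Why frequencies can pin down a tree, or fail to, can be seen from the ancestry ("sum") constraint that is commonly used in tumor phylogeny reconstruction. Under it, a lineage's cell frequency must be at least the combined frequency of its child lineages. The sketch below is not PhyloSub's Bayesian sampler. It assumes infinite sites, no mutation losses, and frequencies already converted to cell fractions (for a heterozygous SNV in a diploid region, roughly twice the variant allele frequency). The node names and frequencies are hypothetical.

```python
def satisfies_sum_rule(children, freq, tol=1e-9):
    """Check that every node's cell frequency is at least the total frequency
    of its children (children: dict mapping parent -> list of child nodes)."""
    for parent, kids in children.items():
        if sum(freq[k] for k in kids) > freq[parent] + tol:
            return False
    return True


# Hypothetical subclone frequencies (fraction of cells carrying each cluster's SNVs).
chain = {"A": ["B"], "B": ["C"], "C": []}
branching = {"A": ["B", "C"], "B": [], "C": []}

freq_unique = {"A": 0.9, "B": 0.6, "C": 0.5}     # 0.6 + 0.5 > 0.9: only the chain fits
freq_ambiguous = {"A": 0.9, "B": 0.4, "C": 0.3}  # both topologies fit

for name, f in (("unique", freq_unique), ("ambiguous", freq_ambiguous)):
    print(name, satisfies_sum_rule(chain, f), satisfies_sum_rule(branching, f))
```

    In the second case both trees are admissible. That is the kind of ambiguity the abstract says PhyloSub summarizes with a partial order plot.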

    Quantifying single nucleotide variant detection sensitivity in exome sequencing

    BACKGROUND: The targeted capture and sequencing of genomic regions has rapidly demonstrated its utility in genetic studies. Inherent in this technology is considerable heterogeneity of target coverage, and this is expected to systematically affect our sensitivity to detect genuine polymorphisms. To fully interpret the polymorphisms identified in a genetic study, it is often essential both to detect polymorphisms and to understand where, and with what probability, real polymorphisms may have been missed. RESULTS: Using down-sampling of 30 deeply sequenced exomes and a set of gold-standard single nucleotide variant (SNV) genotype calls for each sample, we developed an empirical model relating the read depth at a polymorphic site to the probability of calling the correct genotype at that site. We find that measured sensitivity in SNV detection is substantially worse than that predicted from the naive expectation of sampling from a binomial. This calibrated model allows us to produce single-nucleotide-resolution SNV sensitivity estimates, which can be merged to give summary sensitivity measures for any arbitrary partition of the target sequences (nucleotide, exon, gene, pathway, exome). These metrics are directly comparable between platforms and can be combined between samples to give "power estimates" for an entire study. We estimate that a local read depth of 13X is required to detect the alleles and genotype of a heterozygous SNV 95% of the time, but only 3X is required for a homozygous SNV. At a mean on-target read depth of 20X, commonly used for rare disease exome sequencing studies, we predict that 5-15% of heterozygous and 1-4% of homozygous SNVs in the targeted regions will be missed. CONCLUSIONS: Non-reference alleles in the heterozygous state have a high chance of being missed when commonly applied read coverage thresholds are used, despite the widely held assumption that polymorphism detection is good at these coverage levels. Such alleles are likely to be of functional importance in population-based studies of rare diseases, somatic mutations in cancer, and explaining the "missing heritability" of quantitative traits.
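    The "naive expectation of sampling from a binomial" can be made concrete with a short sketch. It assumes that each read at a heterozygous site independently carries either allele with probability 0.5, and that a genotype is called when both alleles are seen in at least `min_reads` reads. The threshold and the function name are hypothetical and are not taken from the study.

```python
from math import comb


def naive_het_sensitivity(depth, min_reads):
    """Probability that a heterozygous site at the given depth shows at least
    `min_reads` reads of BOTH alleles, if each read independently carries
    either allele with probability 0.5 (naive binomial expectation)."""
    if depth < 2 * min_reads:
        return 0.0
    hits = sum(comb(depth, k) for k in range(min_reads, depth - min_reads + 1))
    return hits / 2 ** depth


for depth in (5, 10, 13, 20):
    print(depth, round(naive_het_sensitivity(depth, min_reads=3), 4))
```

    With this hypothetical 3-read threshold, the naive model gives 8008/8192 (about 0.978) at 13X. The study instead measured 95% sensitivity at that depth, so the real data fall short of the naive prediction, as the abstract reports. The exact size of the gap depends on the threshold chosen.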

    Coupling of a Single Tin-vacancy Center to a Photonic Crystal Cavity in Diamond

    We demonstrate optical coupling between a single tin-vacancy (SnV) center in diamond and a free-standing photonic crystal nanobeam cavity. The cavities are fabricated using quasi-isotropic etching and have experimentally measured quality factors as high as ~11,000. We investigate how the emission of a single SnV center depends on the cavity wavelength, which we tune using a laser-induced gas desorption technique. Under resonance conditions, we observe a 12-fold enhancement of the SnV emission intensity and a 16-fold reduction of the SnV lifetime. From the large enhancement of the SnV emission rate inside the cavity, we estimate the Purcell factor for the SnV zero-phonon line to be 37 and the coupling efficiency of the SnV center to the cavity (the beta factor) to be 95%. Our work paves the way for quantum photonic devices and systems based on efficient photonic interfaces using the SnV color center in diamond.

    Germ-line and somatic EPHA2 coding variants in lens aging and cataract

    Rare germ-line mutations in the coding regions of the human EPHA2 gene (EPHA2) have been associated with inherited forms of pediatric cataract, whereas frequent non-coding single nucleotide variants (SNVs) have been associated with age-related cataract. Here we sought to determine whether germ-line EPHA2 coding SNVs were associated with age-related cataract in a case-control DNA panel (> 50 years), and whether somatic EPHA2 coding SNVs were associated with lens aging and/or cataract in a post-mortem lens DNA panel (> 48 years). Microfluidic PCR amplification followed by targeted amplicon (exon) next-generation (deep) sequencing of EPHA2 (17 exons) afforded high read-depth coverage (1000x) for > 82% of reads in the cataract case-control panel (161 cases, 64 controls) and > 70% of reads in the post-mortem lens panel (35 clear lens pairs, 22 cataract lens pairs). Novel and reference (known) missense SNVs in EPHA2 that were predicted in silico to be functionally damaging were found in both cases and controls from the age-related cataract panel, at variant allele frequencies (VAFs) consistent with germ-line transmission (VAF > 20%). Similarly, both novel and reference missense SNVs in EPHA2 were found in the post-mortem lens panel at VAFs consistent with a somatic origin (VAF > 3%). The majority of SNVs found in the cataract case-control panel and the post-mortem lens panel were transitions, and many occurred at dipyrimidine sites that are susceptible to ultraviolet (UV) radiation-induced mutation. These data suggest that novel germ-line (blood) and somatic (lens) coding SNVs in EPHA2 that are predicted to be functionally deleterious occur in adults over 50 years of age. However, both types of EPHA2 coding variants were present at comparable levels in individuals with or without age-related cataract, making simple genotype-phenotype correlations inconclusive.
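    The variant annotations used above (VAF bands, transition vs. transversion, and dipyrimidine context) can be sketched as follows. The 20% and 3% VAF thresholds come from the abstract. Treating 3-20% as a "somatic-consistent" band is an assumption. The dipyrimidine test also counts a dipurine on the given strand, because that is a dipyrimidine on the complementary strand. All function names are hypothetical.

```python
PYRIMIDINES = set("CT")
PURINES = set("AG")


def variant_allele_frequency(alt_reads, depth):
    """Fraction of reads at the site that carry the alternate allele."""
    return alt_reads / depth


def origin_band(vaf, germline_min=0.20, somatic_min=0.03):
    """Assumed banding based on the VAF thresholds quoted in the abstract."""
    if vaf > germline_min:
        return "germline-consistent"
    if vaf > somatic_min:
        return "somatic-consistent"
    return "below threshold"


def is_transition(ref, alt):
    """Purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T) substitution."""
    if ref == alt:
        return False
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def at_dipyrimidine(seq, pos):
    """True if the base at `pos` and a neighbour form a dipyrimidine on either
    strand: two pyrimidines here, or two purines here (pyrimidines on the other strand)."""
    for nb in (pos - 1, pos + 1):
        if 0 <= nb < len(seq):
            pair = {seq[pos], seq[nb]}
            if pair <= PYRIMIDINES or pair <= PURINES:
                return True
    return False
```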

    DeCiFering the elusive cancer cell fraction in tumor heterogeneity and evolution

    The cancer cell fraction (CCF), the proportion of cancerous cells in a tumor that contain a given single-nucleotide variant (SNV), is a fundamental statistic used to quantify tumor heterogeneity and evolution. Existing CCF estimation methods from bulk DNA sequencing data assume that every cell with an SNV contains the same number of copies of the SNV. This assumption is unrealistic in tumors with copy-number aberrations that alter SNV multiplicities. Furthermore, the CCF does not account for SNV losses due to copy-number aberrations, which confounds downstream phylogenetic analyses. We introduce DeCiFer, an algorithm that overcomes these limitations by clustering SNVs using a novel statistic, the descendant cell fraction (DCF). The DCF quantifies both the prevalence of an SNV at the present time and its past evolutionary history, using an evolutionary model that allows mutation losses. We show that DeCiFer yields more parsimonious reconstructions of tumor evolution than previously reported for 49 prostate cancer samples.
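    The single-multiplicity assumption criticized above is visible in the conventional CCF estimator. The estimator rescales the variant allele frequency (VAF) by the total number of copies at the locus, weighted by tumor purity, and divides by an assumed per-cell multiplicity. The sketch below shows that conventional formula only, not DeCiFer's descendant cell fraction. The example inputs are hypothetical.

```python
def ccf(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Conventional CCF estimate, assuming every mutated cell carries exactly
    `multiplicity` copies of the SNV (the assumption DeCiFer relaxes)."""
    total_copies = purity * tumor_cn + (1 - purity) * normal_cn
    return vaf * total_copies / (purity * multiplicity)


# Same observed VAF in a pure tumor with 4 copies of the locus:
print(ccf(0.5, purity=1.0, tumor_cn=4, multiplicity=1))  # 2.0, an impossible fraction
print(ccf(0.5, purity=1.0, tumor_cn=4, multiplicity=2))  # 1.0, clonal
```

    The estimate can change by a factor of two depending only on the assumed multiplicity. This is why a single global multiplicity is fragile in tumors with copy-number aberrations.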

    A novel method for detecting SNV genotypes from personal genome sequencing data

    Genome variation studies are important in many areas, such as personalized medicine, evolutionary analysis, and bacterial strain identification. Single nucleotide variants (SNVs) are the most thoroughly studied variations in the genome and are associated with various traits and diseases. Genomic studies depend greatly on the ability to detect the alleles of these variants present in a personal genome. However, the methods used for calling SNV genotypes from personal sequencing data are neither very fast nor reliable. The aim of this master's thesis was to develop a novel method for detecting SNV genotypes quickly and reliably, using a new approach that omits the often error-prone read-mapping step of general variant calling pipelines. A k-mer-based approach to detecting SNV genotypes was introduced. The method uses unique k-mers covering the SNV locations for the different allele variants to identify the genotypes of these SNVs. A program was created to compile a list of unique k-mers for the allele variants of given SNVs, and the method was tested using a program that detects the genotypes of these SNVs from personal genome sequencing data. The method was tested on both simulated and real sequencing data, and its memory and time usage were measured. Recommendations were made for future work to reduce the program's running time and to increase the number of SNV genotypes determined from the sequencing data.
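    The mapping-free idea described above can be sketched as follows. For each allele, collect the k-mers that overlap the SNV, discard k-mers that are not unique, count them in the raw reads, and call the genotype from which alleles have support. This is a minimal illustration, not the thesis program. `background` stands for a hypothetical precomputed set of k-mers from the rest of the genome. `min_count` is an arbitrary threshold, and reverse-complement (canonical) k-mers are ignored for brevity, although real reads come from both strands.

```python
from collections import Counter


def snv_kmers(left, right, allele, k):
    """All k-mers of left + allele + right that overlap the SNV position."""
    seq = left + allele + right
    pos = len(left)
    start = max(0, pos - k + 1)
    stop = min(pos, len(seq) - k)
    return {seq[i:i + k] for i in range(start, stop + 1)}


def unique_allele_kmers(left, right, ref, alt, k, background=frozenset()):
    """Drop k-mers shared by both alleles or present elsewhere in the genome."""
    ref_k = snv_kmers(left, right, ref, k)
    alt_k = snv_kmers(left, right, alt, k)
    shared = ref_k & alt_k
    return ref_k - shared - background, alt_k - shared - background


def count_kmers(reads, k):
    """Count every k-mer occurring in the reads (no read mapping needed)."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def call_genotype(ref_k, alt_k, counts, min_count=2):
    """Return 0/0, 0/1 or 1/1 from k-mer support, or None if neither allele is seen."""
    ref_seen = sum(counts[km] for km in ref_k) >= min_count
    alt_seen = sum(counts[km] for km in alt_k) >= min_count
    if ref_seen and alt_seen:
        return "0/1"
    if ref_seen:
        return "0/0"
    if alt_seen:
        return "1/1"
    return None
```

    Only the k-mer table and read counts are needed, so no alignment step is involved. Memory use, however, grows with the number of distinct k-mers counted.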

    Defect interactions in Sn1-xGex random alloys

    Sn1-xGex alloys are candidates for buffer layers that match the lattices of III-V or II-VI compounds to Si or Ge in microelectronic or optoelectronic applications. In the present work, electronic structure calculations are used to study the relative energies of clusters formed between Sn atoms and lattice vacancies in Ge, which are relevant to alloys with low Sn content. We also establish that the special quasirandom structure approach correctly describes the random-alloy nature of Sn1-xGex with higher Sn content. In particular, the calculated deviations of the lattice parameters from Vegard's law are consistent with experimental results.
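    Vegard's law, the reference against which the calculated lattice-parameter deviations are measured, interpolates linearly between the end members: a(x) = (1 - x) a_Sn + x a_Ge for Sn1-xGex. The sketch below assumes approximate room-temperature lattice constants (about 6.489 Å for alpha-Sn and 5.658 Å for Ge). These values are not taken from the paper. Any calculated or measured lattice parameter can be compared with the linear prediction.

```python
# Approximate room-temperature lattice constants in angstroms (assumed values).
A_ALPHA_SN = 6.489
A_GE = 5.658


def vegard_lattice(x_ge):
    """Vegard's law for Sn(1-x)Ge(x): linear interpolation between the end members."""
    return (1 - x_ge) * A_ALPHA_SN + x_ge * A_GE


def vegard_deviation(a_calc, x_ge):
    """Deviation of a calculated (or measured) lattice parameter from Vegard's law."""
    return a_calc - vegard_lattice(x_ge)
```

    A nonzero deviation, often summarized as a bowing term proportional to x(1 - x), is the quantity the abstract compares with experiment.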