24 research outputs found

    Utilizing Haplotypes for Sensitive SNP Array-based Discovery of Somatic Chromosomal Mutations

    Somatic copy-number (CN) gains and losses and copy-neutral loss of heterozygosity (CNLOH) frequently occur in tumors and play a major role in disease progression by altering gene dosage and unmasking deleterious recessive variants. Characterizing these mutations in an individual tumor sample is therefore critical both for research on the relationship of specific mutations to disease outcome and for clinical decision-making based on mutations with known impact. A pervasive hindrance to sensitive detection of these mutations is genetic heterogeneity together with high levels of contaminating normal cells in tumor samples, which limit the fraction of cells carrying informative mutations. The method presented here is the first to use population-based haplotype estimates to discover low-frequency somatic kilobase- to megabase-size CN alterations and CNLOH mutations using DNA microarrays. Its major innovation is the use of phase concordance as a robust metric of allelic imbalance in the face of sporadic phasing errors in the statistical haplotype estimates and stochastic variation in the microarray data. In addition to presenting a hidden Markov model that uses the phase concordance data to perform agnostic whole-genome discovery of imbalanced regions, we describe how to test candidate regions and how to infer the haplotype of the major chromosome. Through controlled experiments using lab-created tumor-normal mixture samples and in silico simulated data, we demonstrate that the method is more sensitive than existing methods, detecting specific imbalance events in samples with 7% tumor or less while maintaining specificity. We also demonstrate its potential through a real-data analysis of genomic mosaicism in the general population, using over 30,000 samples that were previously analyzed with another method. We made nearly three times as many calls in these samples as the previous analysis (1,119 vs. 379), most of which appear to exist at low frequencies. These findings validate recent hypotheses that somatic variation in healthy tissues is more prevalent than previously reported, and they provide valuable observations of in vivo mutations that can be studied to draw inferences about genetic robustness and about how these mutations affect cell fitness.
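The abstract does not spell out how phase concordance is computed, so the following is only a minimal Python sketch of one plausible formulation, not the paper's definition. It assumes that at each heterozygous SNP we know the BAF and whether the estimated haplotypes place the B allele on haplotype 1 (the input `b_on_hap1` and both function names are hypothetical). Each SNP "favors" the haplotype its BAF deviation points to, and concordance compares neighboring SNPs, so a single phase switch error flips only one pair instead of an entire region.

```python
import numpy as np


def phase_concordance(baf, b_on_hap1):
    """Pairwise phase concordance at consecutive heterozygous SNPs.

    baf: B allele frequencies at heterozygous SNPs, in genomic order.
    b_on_hap1: booleans, True where the estimated haplotypes place the
        B allele on haplotype 1 (hypothetical input format).
    Returns one value per adjacent pair: +1 if both SNPs' BAF deviations
    favor the same haplotype, -1 if they disagree, 0 if a BAF is exactly 0.5.
    """
    baf = np.asarray(baf, dtype=float)
    hap_sign = np.where(np.asarray(b_on_hap1, dtype=bool), 1.0, -1.0)
    # +1 when the BAF deviation favors haplotype 1, -1 when it favors haplotype 2
    favored = np.sign(baf - 0.5) * hap_sign
    # Compare neighbors only, so one phase switch error flips a single pair
    return favored[:-1] * favored[1:]


def windowed_mean(values, window):
    """Sliding-window mean: near 0 under allelic balance, near +1 under imbalance."""
    kernel = np.ones(window) / window
    return np.convolve(values, kernel, mode="valid")


# Haplotype 1 is in excess, but the estimated phase has a switch error
# between the second and third SNPs (true phase would be T, F, T, F)
baf = [0.55, 0.45, 0.56, 0.44]
b_on_hap1 = [True, False, False, True]
print(phase_concordance(baf, b_on_hap1))  # pairs: +1, -1, +1
```

Under balance the signs are random, so windowed means hover near 0. Under imbalance most pairs are +1 even with sporadic switch errors, which is the kind of robustness the abstract attributes to the metric.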

    DNA isolation protocol effects on nuclear DNA analysis by microarrays, droplet digital PCR, and whole genome sequencing, and on mitochondrial DNA copy number estimation.

    Potential bias introduced during DNA isolation is inadequately explored, although it could have a significant impact on downstream analysis. To investigate this in human brain, we isolated DNA from cerebellum and frontal cortex using spin columns under different conditions, and by salting-out. We first analysed DNA using array CGH (aCGH), which revealed a striking wave pattern suggesting primarily GC-rich cerebellar losses, even against matched frontal cortex DNA, with a similar pattern on a SNP array. The aCGH changes varied with the isolation protocol. Droplet digital PCR of two genes also showed protocol-dependent losses. Whole genome sequencing showed GC-dependent variation in coverage with spin column isolation from cerebellum. We also extracted and sequenced DNA from substantia nigra using salting-out and phenol/chloroform. The mtDNA copy number, assessed from reads mapping to the mitochondrial genome, was higher in substantia nigra when using phenol/chloroform. We thus provide evidence for significant method-dependent bias in DNA isolation from human brain, as previously reported in rat tissues. This bias may contribute to array "waves" and could affect both copy number determination (particularly when mosaicism is being sought) and sequencing coverage. Variations in isolation protocol may also affect apparent mtDNA abundance.
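The abstract states only that mtDNA copy number was assessed from reads mapping to the mitochondrial genome. A common way to turn such counts into copies per cell is to divide mitochondrial read depth by nuclear read depth. The minimal Python sketch below assumes that normalization, equal read lengths, and uniform coverage; the function name and the example counts are hypothetical, and the study's exact calculation may differ.

```python
MT_GENOME_BP = 16_569  # length of the human mitochondrial reference genome (rCRS)


def mtdna_copy_number(mt_reads, nuclear_reads, nuclear_genome_bp, ploidy=2):
    """Approximate mtDNA copies per cell from mapped read counts.

    Assumes equal read lengths and uniform coverage, and normalizes
    mitochondrial read depth to nuclear read depth (an assumed
    normalization, not necessarily the study's method).
    """
    mt_depth = mt_reads / MT_GENOME_BP
    nuclear_depth = nuclear_reads / nuclear_genome_bp
    # A diploid nucleus carries two copies of each nuclear locus
    return ploidy * mt_depth / nuclear_depth


# Hypothetical counts: 100x mitochondrial depth vs 1x nuclear depth
print(mtdna_copy_number(1_656_900, 3_000_000_000, 3_000_000_000))  # 200.0
```

Because both depths come from the same library, any isolation step that preferentially recovers or loses mtDNA relative to nuclear DNA shifts this ratio directly. This is consistent with the abstract's point that the protocol can change apparent mtDNA abundance.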

    Selection Signatures in Worldwide Sheep Populations

    The diversity of populations in domestic species offers great opportunities to study the genome's response to selection. The recently published Sheep HapMap dataset is a prime example of characterizing worldwide genetic diversity in sheep. In this study, we re-analyzed the Sheep HapMap dataset to identify selection signatures in worldwide sheep populations. Compared to previous analyses, we used statistical methods that (i) take into account the hierarchical structure of sheep populations, (ii) use linkage disequilibrium information, and (iii) focus specifically on either recent or older selection signatures. We show that this approach pinpoints several new selection signatures in the sheep genome and distinguishes those related to modern breeding objectives from those related to earlier post-domestication constraints. The newly identified regions, together with those previously identified, reveal an extensive genome response to selection on morphology, color, and adaptation to new environments.

    Genetic testing for TMEM154 mutations associated with lentivirus susceptibility in sheep

    Stefan Hiendleder is a member of the International Sheep Genomics Consortium.
    In sheep, small ruminant lentiviruses cause an incurable, progressive, lymphoproliferative disease that affects millions of animals worldwide. Known as ovine progressive pneumonia virus (OPPV) in the U.S. and Visna/Maedi virus (VMV) elsewhere, these viruses reduce an animal's health, productivity, and lifespan. Genetic variation in the ovine transmembrane protein 154 gene (TMEM154) has previously been associated with OPPV infection in U.S. sheep. Sheep with the ancestral TMEM154 haplotype encoding glutamate (E) at position 35, and either form of an N70I variant, were highly susceptible compared with sheep homozygous for the K35 missense mutation. Our overall aim here was to characterize TMEM154 in sheep from around the world in order to develop an efficient genetic test for reduced susceptibility. The average frequency of TMEM154 E35 among 74 breeds was 0.51, indicating that highly susceptible alleles were present in most breeds around the world. Analysis of whole genome sequences from an international panel of 75 sheep revealed more than 1,300 previously unreported polymorphisms in a 62 kb region containing TMEM154 and confirmed that the most susceptible haplotypes were distributed worldwide. Novel missense mutations were discovered in the signal peptide (A13V) and the extracellular domains (E31Q, I74F, and I102T) of TMEM154. A matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) assay was developed to detect these mutations as well as six previously reported missense and two deletion mutations in TMEM154. In blinded trials, the call rate for the eight most common coding polymorphisms was 99.4% for the 499 sheep tested, and 96.0% of the animals were assigned paired TMEM154 haplotypes (i.e., diplotypes). The widespread distribution of highly susceptible TMEM154 alleles suggests that genetic testing and selection may improve the health and productivity of infected flocks.
    Michael P. Heaton, Theodore S. Kalbfleisch, Dustin T. Petrik, Barry Simpson, James W. Kijas, Michael L. Clawson, Carol G. Chitko-McKown, Gregory P. Harhay, Kreg A. Leymaster, the International Sheep Genomics Consortium
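The allele frequency and call rate figures above are simple ratios over genotype calls. The minimal Python sketch below illustrates them under stated assumptions. Genotypes use a hypothetical two-letter encoding ("EE", "EK", "KK" for the E35/K35 site), None marks a failed call, and the across-breed figure is taken as an unweighted mean of per-breed frequencies. The abstract does not say how the 0.51 average was computed, so that last step is an assumption.

```python
from statistics import mean


def allele_frequency(genotypes, allele="E"):
    """Frequency of `allele` among called diploid genotypes.

    Genotypes are two-letter codes such as "EE", "EK", "KK" for the
    E35/K35 site (a hypothetical encoding); None marks a failed call.
    """
    called = [g for g in genotypes if g is not None]
    copies = sum(g.count(allele) for g in called)
    return copies / (2 * len(called))


def call_rate(genotypes):
    """Fraction of attempted genotypes that produced a call."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def mean_breed_frequency(breeds, allele="E"):
    """Unweighted mean of per-breed allele frequencies (assumed averaging)."""
    return mean(allele_frequency(g, allele) for g in breeds.values())


# Hypothetical genotypes for two breeds
breeds = {
    "breed_a": ["EE", "EK", None, "KK"],
    "breed_b": ["EE", "EE"],
}
print(call_rate(breeds["breed_a"]))  # 0.75
print(mean_breed_frequency(breeds))  # 0.75
```

An unweighted mean treats every breed equally regardless of how many animals were genotyped. Weighting by sample size would give a different number whenever breed sizes differ.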

    Identification of Allelic Imbalance with a Statistical Model for Subtle Genomic Mosaicism

    Genetic heterogeneity in a mixed sample of tumor and normal DNA can confound characterization of the tumor genome. Numerous computational methods have been proposed to detect aberrations in DNA samples from mixtures of tumor and normal tissue, but most of them require tumor purities of at least 10–15%. Here, we present a statistical model that captures the information contained in an individual's germline haplotypes about expected patterns in the B allele frequencies (BAFs) from SNP microarrays while fully modeling their magnitude; it is the first such model for SNP microarray data. Our model consists of a pair of hidden Markov models, one for the germline and one for the tumor genome, which, conditional on the observed array data and patterns of population haplotype variation, have a dependence structure induced by the relative imbalance of the individual's inherited haplotypes. Together, these hidden Markov models offer a powerful approach for analyzing DNA mixtures whose main component is germline DNA. This suggests natural applications in characterizing primary clones when stromal contamination is extremely high, and in identifying lesions in rare tumor subclones when tumor purity is sufficient to characterize the primary lesions. Our joint model for germline haplotypes and acquired DNA aberrations is flexible, accommodating a large number of chromosomal alterations, including balanced and imbalanced losses and gains, copy-neutral loss of heterozygosity (LOH), and tetraploidy. We found our model (which we term J-LOH) to be superior for localizing rare aberrations in a simulated 3% mixture sample. More generally, our model provides a framework for fully integrating the germline and tumor genomes to deal more effectively with missing or uncertain features, and thus to extract maximal information from difficult scenarios where existing methods fail.
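Why low purity is hard can be seen from the expected BAF at a germline heterozygous SNP. The minimal Python sketch below assumes that the BAF equals the fraction of B-allele copies pooled over tumor and normal cells, with diploid heterozygous normal cells. This is a common simplification and is not J-LOH's full emission model, which also models noise and the log R ratio. The function name is hypothetical.

```python
def expected_baf(purity, tumor_a, tumor_b, normal_a=1, normal_b=1):
    """Expected BAF at a germline heterozygous SNP in a tumor-normal mixture.

    purity: fraction of cells from the tumor (0-1).
    tumor_a, tumor_b: copies of the A and B alleles in tumor cells.
    Assumes BAF is proportional to B-allele copies pooled over all cells.
    """
    b_copies = purity * tumor_b + (1 - purity) * normal_b
    total_copies = purity * (tumor_a + tumor_b) + (1 - purity) * (normal_a + normal_b)
    return b_copies / total_copies


for label, a, b in [("hemizygous deletion of B", 1, 0), ("copy-neutral LOH", 2, 0)]:
    print(label, round(expected_baf(0.03, a, b), 3))
```

At 3% purity, a hemizygous deletion moves the expected BAF only to about 0.492 and copy-neutral LOH only to about 0.485. Both shifts are small relative to typical array noise, which is why the abstract leans on haplotype information to pool evidence across many SNPs.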

    A joint model for germline haplotypes and acquired DNA aberration (J-LOH).

    Here we extend the HMM-based GPHMM model (bottom left) to include haplotype information, which is also modeled via an HMM similar to fastPHASE (top left). One key difference from GPHMM is that our model does not use mirrored BAFs; instead, it models the untransformed BAF (b) and log R ratio (r) data directly. Also, unlike in fastPHASE, the pair of z in our model is ordered. In the joint model (right), l_1, …, l_M and z_1, …, z_M form two a priori independent Markov chains, with l describing the somatic mutation events and z the germline allelic dependence. Including the germline genotype information helps to model the dependence of the observed BAF (b_m) more accurately and yields more accurate posterior probabilities of the aberrant states (l_m).
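Because l and z are a priori independent Markov chains, their prior over joint states (l_m, z_m) is itself a Markov chain whose transition matrix is the Kronecker product of the two individual matrices. The minimal Python sketch below assumes homogeneous (position-independent) transition matrices with made-up values. Real haplotype models such as fastPHASE use position-dependent transitions, and the state counts and function name here are hypothetical.

```python
import numpy as np


def joint_transition(A_l, A_z):
    """Prior transition matrix over joint states (l, z) for two independent chains.

    Joint state (i, j), meaning somatic state i and germline state j, has
    index i * n_z + j, so P((i, j) -> (k, m)) = A_l[i, k] * A_z[j, m].
    """
    return np.kron(A_l, A_z)


# Hypothetical homogeneous transition matrices
A_l = np.array([[0.99, 0.01],
                [0.02, 0.98]])         # somatic chain l: normal / aberrant
A_z = np.array([[0.90, 0.05, 0.05],
                [0.05, 0.90, 0.05],
                [0.05, 0.05, 0.90]])   # germline chain z: 3 hypothetical states
A = joint_transition(A_l, A_z)
print(A.shape)  # (6, 6)
```

The product form holds only for the prior. Once an emission for b_m depends on both l_m and z_m, conditioning on the observed BAFs couples the two chains, which matches the caption's point that germline information sharpens the posterior over aberrant states.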

    Whole genome posterior marginal probabilities for simulated 3% tumor sample.

    Results from J-LOH and J-LOH (K = 1) are presented for the simulated 3% tumor sample. The vertical bars represent the model state probabilities as in Figure 1 (http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1003765#pcbi-1003765-g001). The horizontal bar at the top depicts the simulated aberration regions. The white gaps in the plot represent genome regions where the pure normal cell line sample shows LOH.
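Posterior marginal probabilities like those plotted here are the standard per-marker HMM quantity P(state at marker m | all data), computed by the forward-backward algorithm. The minimal Python sketch below uses scaling for numerical stability. It assumes a homogeneous transition matrix and precomputed per-marker emission likelihoods, both simplifications; J-LOH's actual emission model over BAF and LRR is not reproduced, and the function name is hypothetical.

```python
import numpy as np


def posterior_marginals(init, trans, emissions):
    """Posterior marginals P(state_m | all data) via scaled forward-backward.

    init: (S,) initial state probabilities.
    trans: (S, S) transition matrix, rows summing to 1.
    emissions: (M, S) likelihood of the data at marker m under each state.
    """
    M, S = emissions.shape
    alpha = np.zeros((M, S))
    scale = np.zeros(M)
    # Forward pass, normalizing at each marker to avoid underflow
    alpha[0] = init * emissions[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for m in range(1, M):
        alpha[m] = (alpha[m - 1] @ trans) * emissions[m]
        scale[m] = alpha[m].sum()
        alpha[m] /= scale[m]
    # Backward pass reusing the forward scaling factors
    beta = np.ones((M, S))
    for m in range(M - 2, -1, -1):
        beta[m] = trans @ (emissions[m + 1] * beta[m + 1]) / scale[m + 1]
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


init = np.array([0.5, 0.5])
trans = np.array([[0.95, 0.05],
                  [0.05, 0.95]])
# Hypothetical per-marker likelihoods under (normal, aberrant)
emissions = np.array([[0.9, 0.1], [0.4, 0.6], [0.3, 0.7]])
print(posterior_marginals(init, trans, emissions).round(3))
```

Each row of the result sums to 1. A bar chart of these rows along the genome gives the kind of per-marker state-probability display the caption describes.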

    Unpaired analyses of adjacent normal samples.

    Posterior probabilities from the normal sample, tumor sample BAFs and LRRs, and normal sample BAFs and LRRs are presented for sample pair GSM809143/GSM809144 (a–c) and sample pair GSM809109/GSM809110 (d–f). Results from GPHMM are represented by horizontal bars above the posterior probability plots (a, d), and results from hapLOH are represented by green and orange curves (higher and lower levels of imbalance, respectively) overlaid on the BAF data (c, f).

    Genome-wide sensitivity and specificity for low purity simulations.

    Sensitivity is defined as the proportion of simulated aberrant markers that are called correctly. Specificity (shown in parentheses) is defined as the proportion of simulated non-aberrant markers that are called correctly. GPHMM has sensitivity below 0.01 for purities under 9%. Blank table entries ("-") indicate either zero output or sensitivity below 0.01. PSCN, genoCN, and ASCAT failed to produce meaningful output at all purity levels.
    † With a limited state space (normal, cn-LOH, and hemizygous deletion only) and no use of LRR, approximating the settings for hapLOH.
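The marker-level sensitivity and specificity defined above can be computed directly from per-marker labels. In the minimal Python sketch below, "called correctly" for an aberrant marker is taken to mean that the called label matches the simulated aberration type, and a non-aberrant marker counts as correct when it is called normal. That reading, the label strings, and the function name are assumptions.

```python
def marker_sensitivity_specificity(truth, called, normal="normal"):
    """Marker-level sensitivity and specificity from per-marker labels.

    truth, called: sequences of labels such as "normal", "cnLOH", "del"
    (hypothetical label set). Assumes at least one aberrant and one
    non-aberrant marker in `truth`.
    """
    pairs = list(zip(truth, called))
    aberrant = [(t, c) for t, c in pairs if t != normal]
    non_aberrant = [(t, c) for t, c in pairs if t == normal]
    # Aberrant markers count only if the called type matches the simulated type
    sensitivity = sum(t == c for t, c in aberrant) / len(aberrant)
    # Non-aberrant markers count if they are called normal
    specificity = sum(c == normal for _, c in non_aberrant) / len(non_aberrant)
    return sensitivity, specificity


truth = ["normal", "normal", "cnLOH", "cnLOH", "del"]
called = ["normal", "del", "cnLOH", "normal", "del"]
print(marker_sensitivity_specificity(truth, called))
```

Here two of three aberrant markers and one of two normal markers are called correctly. A looser definition that only asks whether an aberrant marker is called aberrant of any type would give equal or higher sensitivity.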

    Posterior marginal probabilities for p-arm of chromosome 1.

    Results from J-LOH and J-LOH (K = 1) are presented for the 30% tumor sample (top panel) and the 10% tumor sample (bottom panel). The vertical height of the colored bars at each marker is proportional to the posterior marginal probability of the corresponding aberration category. Aberration types were placed into categories based on allele copy gain or loss. Horizontal bars at the top of each panel depict the regions called by other methods, from bottom: GAP, GPHMM, ASCAT, genoCN, and PSCN. ASCAT and genoCN did not produce results in the 10% tumor sample. Empty segments of the GAP bar indicate regions with sub-clones or low confidence scores.
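Grouping aberration types into categories amounts to summing the per-state posterior marginals within each category at every marker. The minimal Python sketch below shows that step. The state names and the state-to-category mapping (including placing cnLOH under "loss") are hypothetical; the caption does not list J-LOH's states or how each was assigned.

```python
import numpy as np


def category_marginals(posterior, state_names, state_to_category):
    """Sum per-state posterior marginals (M x S) into per-category marginals.

    Returns (categories, array of shape M x n_categories); categories are sorted
    alphabetically so the column order is deterministic.
    """
    categories = sorted(set(state_to_category.values()))
    out = np.zeros((posterior.shape[0], len(categories)))
    for s, name in enumerate(state_names):
        out[:, categories.index(state_to_category[name])] += posterior[:, s]
    return categories, out


# Hypothetical states and category assignment
states = ["normal", "del", "cnLOH", "gain"]
mapping = {"normal": "normal", "del": "loss", "cnLOH": "loss", "gain": "gain"}
posterior = np.array([[0.7, 0.1, 0.1, 0.1],
                      [0.1, 0.2, 0.6, 0.1]])
cats, out = category_marginals(posterior, states, mapping)
print(cats)  # ['gain', 'loss', 'normal']
print(out)
```

Since posterior marginals over all states sum to 1 at each marker, the category marginals do too. That is why bar heights stacked per marker in such a plot can be read directly as probabilities.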