201 research outputs found

    Computational and statistical approaches to analyzing variants identified by exome sequencing

    New sequencing technology has enabled the identification of thousands of single nucleotide polymorphisms in the exome, and many computational and statistical approaches to identify disease-association signals have emerged. Funding: National Institutes of Health (U.S.) (Grant R01-MH084676); National Institutes of Health (U.S.) (Grant R01-GM078598); National Institutes of Health (U.S.) (Training grant T32-HL07604-25); Brigham and Women's Hospital (Division of Cardiovascular Medicine)

    Analysis of Sequence Conservation at Nucleotide Resolution

    One of the major goals of comparative genomics is to understand the evolutionary history of each nucleotide in the human genome sequence, and the degree to which it is under selective pressure. Ascertainment of selective constraint at nucleotide resolution is particularly important for predicting the functional significance of human genetic variation and for analyzing the sequence substructure of cis-regulatory sequences and other functional elements. Current methods for analysis of sequence conservation are focused on delineation of conserved regions comprising tens or even hundreds of consecutive nucleotides. We therefore developed a novel computational approach designed specifically for scoring evolutionary conservation at individual base-pair resolution. Our approach estimates the rate at which each nucleotide position is evolving, computes the probability of neutrality given this rate estimate, and summarizes the result in a Sequence CONservation Evaluation (SCONE) score. We computed SCONE scores in a continuous fashion across 1% of the human genome for which high-quality sequence information from up to 23 genomes is available. We show that SCONE scores are clearly correlated with the allele frequency of human polymorphisms in both coding and noncoding regions. We find that the majority of noncoding conserved nucleotides lie outside of longer conserved elements predicted by other conservation analyses, and are experiencing ongoing selection in modern humans as evident from the allele frequency spectrum of human polymorphism. We also applied SCONE to analyze the distribution of conserved nucleotides within functional regions. These regions are markedly enriched in individually conserved positions and short (<15 bp) conserved “chunks.” Our results collectively suggest that the majority of functionally important noncoding conserved positions are highly fragmented and reside outside of canonically defined long conserved noncoding sequences. A small subset of these fragmented positions may be identified with high confidence.
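
The idea of scoring a single alignment column against a neutral expectation can be illustrated with a toy calculation. The sketch below is not the SCONE method, whose model is not described in this abstract. It is a deliberately simplified stand-in that assumes substitutions at a neutral site follow a Poisson distribution with mean equal to a neutral rate times the total branch length of the tree, and it uses a crude lower bound on the number of substitutions in the column. All function names, parameters, and values are hypothetical.

```python
import math


def min_substitutions(column: str) -> int:
    """Lower bound on substitutions needed to explain an alignment column.

    Counts distinct A/C/G/T bases and subtracts one; gaps and ambiguity
    codes are ignored. This is a crude proxy, not a phylogenetic estimate.
    """
    bases = {b for b in column.upper() if b in "ACGT"}
    return max(len(bases) - 1, 0)


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def neutrality_p_value(column: str, neutral_rate: float, total_branch_length: float) -> float:
    """Probability of seeing this few substitutions (or fewer) under neutrality.

    Assumption: a neutral site accumulates Poisson-distributed substitutions
    with mean neutral_rate * total_branch_length. Small values suggest the
    position changes more slowly than a neutral site would.
    """
    expected = neutral_rate * total_branch_length
    return poisson_cdf(min_substitutions(column), expected)
```

For example, with an assumed neutral rate of 1.0 and total branch length of 3.0, an invariant column such as `"AAAA"` gets a smaller probability than a fully variable column such as `"ACGT"`, so it would rank as more conserved under this toy model.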

    Adaptive Mutations in the JC Virus Protein Capsid Are Associated with Progressive Multifocal Leukoencephalopathy (PML)

    PML is a progressive and mostly fatal demyelinating disease caused by JC virus infection and destruction of infected oligodendrocytes in multiple brain foci of susceptible individuals. While JC virus is highly prevalent in the human population, PML is a rare disease that afflicts only a small percentage of immunocompromised individuals, including those affected by HIV (AIDS) or immunosuppressive drugs. Viral- and/or host-specific factors, and not simply immune status, must be at play to account for the very large discrepancy between high viral prevalence and low disease incidence. Here, we show that several amino acids on the surface of the JC virus capsid protein VP1 display accelerated evolution in viral sequences isolated from PML patients but not in sequences isolated from healthy subjects. We provide strong evidence that at least some of these mutations are involved in binding of sialic acid, a known receptor for the JC virus. Using statistical methods of molecular evolution, we performed a comprehensive analysis of JC virus VP1 sequences isolated from 55 PML patients and 253 sequences isolated from the urine of healthy individuals and found that a subset of amino acids found exclusively among PML VP1 sequences is acquired via adaptive evolution. By modeling the 3-D structure of the JC virus capsid, we showed that these residues are located within the binding site for sialic acid, a JC virus receptor for cell infection. Finally, we demonstrate the involvement of some of these sites in receptor binding by showing a profound reduction in the hemagglutination properties of virus-like particles made of the VP1 protein carrying these mutations. Collectively, these results suggest that a more virulent, PML-causing phenotype of JC virus is acquired via adaptive evolution that changes viral specificity for its cellular receptor(s).

    Fine-Scale Haplotype Structure Reveals Strong Signatures of Positive Selection in a Recombining Bacterial Pathogen

    Identifying genetic variation in bacteria that has been shaped by ecological differences remains an important challenge. For recombining bacteria, the sign and strength of linkage provide a unique lens into ongoing selection. We show that derived alleles … Peer reviewed

    Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects

    As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have a loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation. Funding: United States. National Institutes of Health (U54DK105566); United States. National Institutes of Health (R01GM104371)
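
The dependence of variant discovery on cohort size can be illustrated with a basic sampling calculation. The sketch below is not UnseenEst, whose algorithm is not described in this abstract. It assumes diploid individuals, independently sampled chromosomes, and a known population allele frequency for each variant. The function names and the example frequency are hypothetical.

```python
from typing import Iterable


def detection_probability(allele_freq: float, n_individuals: int) -> float:
    """Probability that a variant is seen at least once in a cohort.

    Assumption: each of the 2 * n_individuals chromosomes carries the
    variant independently with probability allele_freq.
    """
    n_chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


def expected_fraction_observed(allele_freqs: Iterable[float], n_individuals: int) -> float:
    """Expected fraction of a set of variants observed at least once."""
    freqs = list(allele_freqs)
    if not freqs:
        raise ValueError("allele_freqs must not be empty")
    total = sum(detection_probability(f, n_individuals) for f in freqs)
    return total / len(freqs)
```

Under these assumptions, a variant with an illustrative frequency of 0.00001 has roughly a 70% chance of appearing in a cohort of 60,706 individuals and is almost certain to appear in a cohort of 500,000, while far rarer variants remain mostly unobserved at either size.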