9 research outputs found

    Association mapping from sequencing reads using k-mers

    Get PDF
    Genome wide association studies (GWAS) rely on microarrays, or more recently mapping of sequencing reads, to genotype individuals. The reliance on prior sequencing of a reference genome limits the scope of association studies, and also precludes mapping associations outside of the reference. We present an alignment free method for association studies of categorical phenotypes based on counting k-mers in whole-genome sequencing reads, testing for associations directly between k-mers and the trait of interest, and local assembly of the statistically significant k-mers to identify sequence differences. An analysis of the 1000 genomes data show that sequences identified by our method largely agree with results obtained using the standard approach. However, unlike standard GWAS, our method identifies associations with structural variations and sites not present in the reference genome. We also demonstrate that population stratification can be inferred from k-mers. Finally, application to an E.coli dataset on ampicillin resistance validates the approach

    Association mapping from sequencing reads using k-mers

    Get PDF
    Genome wide association studies (GWAS) rely on microarrays, or more recently mapping of sequencing reads, to genotype individuals. The reliance on prior sequencing of a reference genome limits the scope of association studies, and also precludes mapping associations outside of the reference. We present an alignment free method for association studies of categorical phenotypes based on counting k-mers in whole-genome sequencing reads, testing for associations directly between k-mers and the trait of interest, and local assembly of the statistically significant k-mers to identify sequence differences. An analysis of the 1000 genomes data show that sequences identified by our method largely agree with results obtained using the standard approach. However, unlike standard GWAS, our method identifies associations with structural variations and sites not present in the reference genome. We also demonstrate that population stratification can be inferred from k-mers. Finally, application to an E.coli dataset on ampicillin resistance validates the approach

    A complete classification of epistatic two-locus models

    Get PDF
    Background: The study of epistasis is of great importance in statistical genetics in fields such as linkage and association analysis and QTL mapping. In an effort to classify the types of epistasis in the case of two biallelic loci Li and Reich listed and described all models in the simplest case of 0/ 1 penetrance values. However, they left open the problem of finding a classification of two-locus models with continuous penetrance values. Results: We provide a complete classification of biallelic two-locus models. In addition to solving the classification problem for dichotomous trait disease models, our results apply to any instance where real numbers are assigned to genotypes, and provide a complete framework for studying epistasis in QTL data. Our approach is geometric and we show that there are 387 distinct types of two-locus models, which can be reduced to 69 when symmetry between loci and alleles is accounted for. The model types are defined by 86 circuits, which are linear combinations of genotype values, each of which measures a fundamental unit of interaction. Conclusion: The circuits provide information on epistasis beyond that contained in the additive × additive, additive × dominance, and dominance × dominance interaction terms. We discuss th

    Resultants in genetic linkage analysis

    Get PDF
    AbstractStatistical models for genetic linkage analysis of k locus diseases are k-dimensional subvarieties of a (3k−1)-dimensional probability simplex. We determine the algebraic invariants of these models with general characteristics for k=1; in particular we recover, and generalize, the Hardy–Weinberg curve. For k=2, the algebraic invariants are presented as determinants of 32×32-matrices of linear forms in nine unknowns, a suitable format for computations with numerical data

    Additional file 1: Figure S1. of A longitudinal genome-wide association study of anti-tumor necrosis factor response among Japanese patients with rheumatoid arthritis

    No full text
    Regional plots showing association results from GEE models at 6q15, 6q27 and 10q25.3, when the analyses were restricted to patients with moderate or severe disease activity at baseline (n = 413). (PDF 350 kb

    Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project.

    No full text
    We report the generation and analysis of functional data from multiple, diverse experiments performed on a targeted 1% of the human genome as part of the pilot phase of the ENCODE Project. These data have been further integrated and augmented by a number of evolutionary and computational analyses. Together, our results advance the collective knowledge about human genome function in several major areas. First, our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another. Second, systematic examination of transcriptional regulation has yielded new understanding about transcription start sites, including their relationship to specific regulatory sequences and features of chromatin accessibility and histone modification. Third, a more sophisticated view of chromatin structure has emerged, including its inter-relationship with DNA replication and transcriptional regulation. Finally, integration of these new sources of information, in particular with respect to mammalian evolution based on inter- and intra-species sequence comparisons, has yielded new mechanistic and evolutionary insights concerning the functional landscape of the human genome. Together, these studies are defining a path for pursuit of a more comprehensive characterization of human genome function
    corecore