
    Linkage disequilibrium based genotype calling from low-coverage shotgun sequencing reads

    Background: Recent technological advances have enabled the sequencing of individual genomes, promising to revolutionize biomedical research. However, deep sequencing remains more expensive than microarrays for whole-genome SNP genotyping. Results: In this paper we introduce a new multi-locus statistical model and computationally efficient genotype calling algorithms that integrate shotgun sequencing data with linkage disequilibrium (LD) information extracted from reference population panels such as HapMap or the 1000 Genomes Project. Experiments on publicly available 454, Illumina, and ABI SOLiD sequencing datasets suggest that integrating LD information yields genotype calling accuracy from low-coverage sequencing data comparable to that of microarray platforms. A software package implementing our algorithm, released under the GNU General Public License, is available at http://dna.engr.uconn.edu/software/GeneSeq/. Conclusions: Integrating LD information significantly improves genotype calling accuracy compared with prior LD-oblivious methods, making low-coverage sequencing a viable alternative to microarrays for large-scale genome-wide association studies.
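    At low coverage many sites are covered by only a few reads, so a call from reads alone is uncertain. As a point of reference, the sketch below shows a generic single-site (LD-oblivious) Bayesian genotype call: a simple read-error likelihood combined with a Hardy-Weinberg prior taken from a reference-panel allele frequency. It is not the GeneSeq algorithm, which additionally conditions on neighbouring SNPs through LD; the error model, function names, and the default error rate of 0.01 are assumptions for illustration.

```python
from math import comb


def genotype_likelihoods(ref_reads: int, alt_reads: int, error: float = 0.01) -> list[float]:
    """P(reads | G) for G = 0, 1, 2 copies of the alternate allele.

    Hypothetical simple model: each read samples one of the two chromosomes
    uniformly and its base is miscalled with probability `error`.
    """
    n = ref_reads + alt_reads
    likelihoods = []
    for g in range(3):
        # Probability that a single read shows the alternate allele given genotype g.
        p_alt = (g / 2) * (1 - error) + (1 - g / 2) * error
        # The binomial coefficient is the same for every g and cancels in the posterior.
        likelihoods.append(comb(n, alt_reads) * p_alt**alt_reads * (1 - p_alt) ** ref_reads)
    return likelihoods


def genotype_posterior(ref_reads: int, alt_reads: int, alt_freq: float,
                       error: float = 0.01) -> list[float]:
    """Posterior over G = 0, 1, 2 using a Hardy-Weinberg prior from a panel allele frequency."""
    q = alt_freq
    prior = [(1 - q) ** 2, 2 * q * (1 - q), q**2]
    joint = [lik * p for lik, p in zip(genotype_likelihoods(ref_reads, alt_reads, error), prior)]
    total = sum(joint)
    return [j / total for j in joint]


if __name__ == "__main__":
    # A single read carrying the alternate allele at a site with panel frequency 0.3.
    post = genotype_posterior(ref_reads=0, alt_reads=1, alt_freq=0.3)
    print([round(p, 3) for p in post])
```

    With no reads the posterior falls back to the prior, which is exactly where extra information such as LD with nearby SNPs can help.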

    Informative SNP Selection and Validation

    The search for genetic regions associated with complex diseases, such as cancer or Alzheimer's disease, is an important challenge that may lead to better diagnosis and treatment. The millions of DNA variations, primarily single nucleotide polymorphisms (SNPs), may allow fine dissection of such associations. However, disease association studies are limited by the cost of genotyping SNPs, so it is essential to find a small subset of informative SNPs (tag SNPs) that can serve as good representatives of the rest. Several informative SNP selection methods have been developed. In our experiments, our methods compare favorably with all the prediction and statistical methods by selecting the fewest informative SNPs. We also proposed algorithms for faster prediction that yield an acceptable trade-off between speed and accuracy. We validated our results using k-fold cross-validation and many of its variations.
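    The abstract does not spell out its selection algorithms. For orientation, a widely used baseline is greedy pairwise tagging: repeatedly choose the SNP that represents the most not-yet-covered SNPs at r^2 at or above a threshold. The sketch below implements that generic baseline, not the authors' method; the 0/1/2 genotype coding, the function names, and the 0.8 threshold are assumptions.

```python
def r_squared(a: list[int], b: list[int]) -> float:
    """Squared Pearson correlation between two SNPs coded as 0/1/2 allele counts."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        # A monomorphic SNP carries no correlation information.
        return 0.0
    return cov * cov / (var_a * var_b)


def greedy_tag_snps(genotypes: list[list[int]], threshold: float = 0.8) -> list[int]:
    """Pick SNP indices until every SNP has r^2 >= threshold with some chosen tag.

    `genotypes[i]` holds the 0/1/2 genotypes of SNP i across the same individuals.
    """
    m = len(genotypes)
    # covers[i]: SNPs that SNP i would represent if chosen (always includes itself).
    covers = [
        {j for j in range(m) if j == i or r_squared(genotypes[i], genotypes[j]) >= threshold}
        for i in range(m)
    ]
    uncovered = set(range(m))
    tags = []
    while uncovered:
        # Every uncovered SNP covers itself, so each pass makes progress.
        best = max(range(m), key=lambda i: len(covers[i] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


if __name__ == "__main__":
    panel = [
        [0, 1, 2, 0, 1],  # SNP 0
        [0, 1, 2, 0, 1],  # SNP 1: identical to SNP 0
        [2, 1, 0, 2, 1],  # SNP 2: perfectly negatively correlated with SNP 0
        [0, 0, 1, 2, 2],  # SNP 3: uncorrelated with SNP 0
    ]
    print(greedy_tag_snps(panel))
```

    The selected tags could then be assessed with k-fold cross-validation, as the abstract describes, by predicting the untyped SNPs of held-out individuals from their tag genotypes.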

    Linkage, association, and haplotype analysis: A spectrum of approaches to elucidate the genetic influences of complex human traits

    The goal of human genetics is to identify genetic variants that influence a given trait, with the intent of better understanding the biology behind that trait. As technologies and statistical methods toward this goal have developed, the approaches used to identify trait-causing variants have changed. The three projects reported here cover a range of approaches. Early studies focused on family-based data, using linkage analysis to find regions of the genome shared by family members with similar trait values. This approach was used to confirm the involvement of CYP2E1 in the level of response to alcohol in sibling pairs with an alcoholic parent. With the advent of high-throughput genotyping panels, the field of human genetics has shifted to population-based association studies that seek variants correlated with a trait. This approach was used to search for regions of the genome that confer risk for Pick's disease, part of a heterogeneous spectrum of dementias, and to reproduce the association with MAPT, a gene with known disease-causing mutations. Haplotype-based analysis approaches have emerged to improve the analysis of genomic data. A novel algorithm for haplotype-based analysis was developed to identify long haplotypes shared in a population from genome-wide association genotype data, and it was found to be very accurate at predicting haplotypes within the shared regions. Together, these three projects represent the past, present, and future of the study of human genetics.
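    The abstract does not describe how its algorithm detects long shared haplotypes from unphased genotypes. One simple, well-known signal is the absence of opposite homozygotes: two people who share a haplotype over a region cannot be homozygous for different alleles (0 versus 2) anywhere in it, barring genotyping errors. The sketch below scans a pair of individuals for such runs; it is an illustration of that heuristic, not the thesis's algorithm, and the function name and `min_len` parameter are hypothetical. Because the condition is necessary but not sufficient, its output is only a set of candidate segments.

```python
def shared_segments(g1: list[int], g2: list[int], min_len: int = 100) -> list[tuple[int, int]]:
    """Return [start, end) SNP index ranges of at least min_len SNPs with no opposite homozygotes.

    g1 and g2 are 0/1/2 alternate-allele counts for two individuals at the
    same ordered SNPs.
    """
    if len(g1) != len(g2):
        raise ValueError("genotype vectors must cover the same SNPs")
    segments = []
    start = 0
    for i, (a, b) in enumerate(zip(g1, g2)):
        if abs(a - b) == 2:  # opposite homozygotes rule out a shared haplotype here
            if i - start >= min_len:
                segments.append((start, i))
            start = i + 1
    # Close the final run at the end of the chromosome.
    if len(g1) - start >= min_len:
        segments.append((start, len(g1)))
    return segments


if __name__ == "__main__":
    person_a = [0, 0, 1, 2, 1, 0, 0, 0]
    person_b = [0, 1, 1, 2, 1, 2, 0, 0]
    print(shared_segments(person_a, person_b, min_len=3))
```

    In practice `min_len` would be set in SNPs or genetic distance long enough that runs of that size are unlikely to arise by chance.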