    Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases

    Recent advances in information technology in the biomedical sciences and other applied areas have created numerous large, diverse data sets with high dimensional feature spaces, which provide a tremendous amount of information and new opportunities for improving the quality of human life. At the same time, the continuous arrival of new data creates great challenges, because researchers must convert these raw data into scientific knowledge in order to benefit from them. Association studies of complex diseases using SNP data have become increasingly popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic association studies for complex diseases. The review includes both general feature reduction approaches for high dimensional correlated data and more specific approaches for SNP data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNP selection using statistical testing/scoring, statistical modeling and machine learning methods, with an emphasis on how to identify interacting loci. Comment: Published at http://dx.doi.org/10.1214/07-SS026 in Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Searching Genome-wide Disease Association Through SNP Data

    Taking advantage of high-throughput Single Nucleotide Polymorphism (SNP) genotyping technology, Genome-Wide Association Studies (GWASs) are regarded as holding promise for unravelling complex relationships between genotype and phenotype. GWASs aim to identify genetic variants associated with disease by assaying and analyzing hundreds of thousands of SNPs. Traditional single-locus-based and two-locus-based methods have been standardized and have led to many interesting findings. Recently, a substantial number of GWASs have indicated that, for most disorders, joint genetic effects (epistatic interactions) across the whole genome exist broadly in complex traits. At present, identifying high-order epistatic interactions from GWASs is computationally and methodologically challenging. My dissertation research focuses on the problem of searching for genome-wide associations in three frequently encountered scenarios, i.e. one case one control, multi-cases multi-controls, and Linkage Disequilibrium (LD) block structure. For the first scenario, we present a simple and fast method, named DCHE, using dynamic clustering. For the second scenario, we design two methods, a Bayesian inference based method and a heuristic method, to detect genome-wide multi-locus epistatic interactions on multiple diseases. For the last scenario, we propose a block-based Bayesian approach to model the LD and conditional disease association simultaneously. Experimental results on both synthetic and real GWAS datasets show that the proposed methods improve the detection accuracy of disease-specific associations and lessen the computational cost compared with current popular methods.
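
    As an illustration of what a single-locus test in a case-control GWAS can look like, the sketch below computes a Pearson chi-square statistic on a 2x2 table of allele counts for one SNP. It is not taken from the dissertation and is not the DCHE method; the function name, the input layout, and the choice of an allelic (rather than genotypic) test are assumptions made for the example.

```python
import math


def allelic_chi_square(case_counts, control_counts):
    """Pearson chi-square test on a 2x2 allele-count table for one SNP.

    case_counts / control_counts: (minor_allele_count, major_allele_count).
    Assumes every row and column total is non-zero.
    Returns (chi-square statistic, p-value with 1 degree of freedom).
    """
    table = [list(case_counts), list(control_counts)]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of allele and disease status.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: 30 minor / 70 major alleles in cases, 20 / 80 in controls.
chi2, p_value = allelic_chi_square((30, 70), (20, 80))
print(f"chi2={chi2:.3f}, p={p_value:.3f}")
```

    In a genome-wide scan this test would be repeated for every SNP, which is why single-locus results are normally followed by a multiple-testing correction.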

    Algorithms for Computational Genetics Epidemiology

    The most intriguing problems in genetic epidemiology are to predict genetic disease susceptibility and to associate single nucleotide polymorphisms (SNPs) with diseases. In such studies, it is necessary to resolve the ambiguities in genetic data. The primary obstacle to ambiguity resolution is that the physical methods for separating two haplotypes from an individual genotype (phasing) are too expensive. Although computational haplotype inference is a well-explored problem, high error rates continue to degrade association accuracy. Secondly, it is essential to use a small subset of informative SNPs (tag SNPs) that accurately represents the rest of the SNPs (tagging). Tagging can achieve budget savings by genotyping only a limited number of SNPs and computationally inferring all other SNPs. Recent successes in high-throughput genotyping technologies have drastically increased the length of available SNP sequences. This elevates the importance of informative SNP selection for compacting huge genetic data sets in order to make fine genotype analysis feasible. Finally, even if complete and accurate data are available, it is unclear whether common statistical methods can determine the susceptibility to complex diseases. The dissertation explores the above computational problems with a variety of methods, including linear algebra, graph theory, linear programming, and greedy methods. The contributions include (1) significant speed-up of popular phasing tools without compromising their quality, (2) state-of-the-art tagging tools applied to disease association, and (3) a graph-based method for disease tagging and predicting disease susceptibility.
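
    The tagging idea described above can be sketched with a generic greedy heuristic: repeatedly pick the SNP that is strongly correlated with the largest number of SNPs not yet represented. This is a minimal sketch under stated assumptions, not the dissertation's tagging tool. The 0/1/2 genotype coding, the squared Pearson correlation as the similarity measure, the 0.8 threshold, and all names are assumptions chosen for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP cannot predict any other SNP
    return cov * cov / (var_x * var_y)


def greedy_tag_snps(genotypes, threshold=0.8):
    """Pick tag SNPs so that every SNP has r^2 >= threshold with some tag.

    genotypes: dict mapping SNP name -> list of genotype codes, one per individual.
    """
    names = list(genotypes)
    # covers[s] = set of SNPs that s can represent (always includes s itself).
    covers = {s: {s} for s in names}
    for a, b in combinations(names, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            covers[a].add(b)
            covers[b].add(a)

    uncovered = set(names)
    tags = []
    while uncovered:
        # Greedy step: take the SNP representing the most still-uncovered SNPs.
        best = max(names, key=lambda s: len(covers[s] & uncovered))
        tags.append(best)
        uncovered -= covers[best]
    return tags


# Hypothetical genotypes for six individuals at four SNPs.
example = {
    "snp1": [0, 1, 2, 0, 1, 2],
    "snp2": [0, 1, 2, 0, 1, 2],  # identical to snp1
    "snp3": [2, 1, 0, 2, 1, 0],  # perfectly anti-correlated with snp1
    "snp4": [0, 0, 1, 1, 2, 2],  # only weakly correlated with the others
}
print(greedy_tag_snps(example))
```

    Only the selected tags would then need to be genotyped; the remaining SNPs are inferred from them. The greedy choice does not guarantee a minimum-size tag set, which is one reason the abstract also mentions linear programming and graph-based formulations.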

    Mining whole genome sequence data to efficiently attribute individuals to source populations

    Acknowledgements: The Campylobacter work in this project was supported by Food Standards Scotland project FSS00017 and the Scottish Government (Rural and Environment Science and Analytical Services Division) project A13559368. Peer reviewed. Publisher PD

    Discrete Algorithms for Analysis of Genotype Data

    The accessibility of high-throughput genotyping technology makes genome-wide association studies possible for common complex diseases. When dealing with common diseases, it is necessary to search for and analyze multiple independent causes resulting from interactions of multiple genes scattered over the entire genome. Optimization formulations have been introduced for searching for disease-associated risk/resistance factors and for predicting disease susceptibility in a given case-control study. Several discrete methods for disease association search, exploiting a greedy strategy and topological properties of case-control studies, have been developed. New disease susceptibility prediction methods based on the developed search methods have been validated on datasets from case-control studies for several common diseases. In our experiments, the proposed algorithms compare favorably with existing association search and susceptibility prediction methods.

    Association of Interacting Genes in the Toll-Like Receptor Signaling Pathway and the Antibody Response to Pertussis Vaccination

    BACKGROUND: Activation of the Toll-like receptor (TLR) signaling pathway through TLR4 may be important in the induction of protective immunity against Bordetella pertussis with TLR4-mediated activation of dendritic and B cells, induction of cytokine expression, and reversal of tolerance as crucial steps. We examined whether single nucleotide polymorphisms (SNPs) in genes of the TLR4 pathway and their interaction are associated with the response to whole-cell vaccine (WCV) pertussis vaccination in 490 one-year-old children. METHODOLOGY/PRINCIPAL FINDINGS: We analyzed associations of 75 haplotype-tagging SNPs in genes in the TLR4 signaling pathway with pertussis toxin (PT)-IgG titers. We found significant associations between the PT-IgG titer and SNPs in CD14, TLR4, TOLLIP, TIRAP, IRAK3, IRAK4, TICAM1, and TNFRSF4 in one or more of the analyses. The strongest evidence for association was found for two SNPs (rs5744034 and rs5743894) in TOLLIP that were almost completely in linkage disequilibrium, provided statistically significant associations in all tests with the lowest p-values, and displayed a dominant mode of inheritance. However, none of these single gene associations would withstand correction for multiple testing. In addition, Multifactor Dimensionality Reduction Analysis, an approach that does not need correction for multiple testing, showed significant and strong two and three locus interactions between SNPs in TOLLIP (rs4963060), TLR4 (rs6478317) and IRAK1 (rs1059703). CONCLUSIONS/SIGNIFICANCE: We have identified significant interactions between genes in the TLR pathway in the induction of vaccine-induced immunity. These interactions underline that these genes are functionally related and together form a true biological relationship in a protein-protein interaction network. Practically all our findings may be explained by genetic variation in directly or indirectly interacting proteins at the extra- and intracytoplasmic sites of the cell membrane of antigen-presenting cells, B cells, or both. Fine tuning of interacting proteins in the TLR pathway appears important for the induction of an optimal vaccine response.
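
    To make the multiple-testing remark concrete, the sketch below applies a Bonferroni correction to a panel of 75 tests, the number of haplotype-tagging SNPs reported in the abstract. The abstract does not say which correction the authors considered, so Bonferroni is an assumption here, and the p-values in the example are invented purely for illustration.

```python
def bonferroni_significant(p_values, alpha=0.05):
    """Return the per-test threshold and the tests that pass it.

    p_values: dict mapping a test name (e.g. a SNP id) -> unadjusted p-value.
    The family-wise error rate alpha is split evenly across all tests.
    """
    threshold = alpha / len(p_values)
    passing = [name for name, p in p_values.items() if p <= threshold]
    return threshold, passing


# Hypothetical panel of 75 SNPs: one strong signal, 74 nominal p-values of 0.01.
panel = {f"snp{i}": 0.01 for i in range(75)}
panel["snp0"] = 1e-5

threshold, passing = bonferroni_significant(panel)
print(f"per-test threshold: {threshold:.6f}")
print("passing:", passing)
```

    With 75 tests the per-test threshold falls to roughly 0.00067, so an association that is nominally significant at 0.05 or even 0.01 would not survive this correction.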