26 research outputs found

    Statistical advances and challenges for analyzing correlated high dimensional SNP data in genomic study for complex diseases

    Recent advances in information technology in the biomedical sciences and other applied areas have created numerous large, diverse data sets with high-dimensional feature spaces, which provide a tremendous amount of information and new opportunities for improving the quality of human life. At the same time, the continuous arrival of new data creates great challenges, because researchers must convert these raw data into scientific knowledge in order to benefit from them. Association studies of complex diseases using SNP data have become increasingly popular in biomedical research in recent years. In this paper, we present a review of recent statistical advances and challenges in analyzing correlated high-dimensional SNP data in genomic association studies of complex diseases. The review covers both general feature reduction approaches for high-dimensional correlated data and more specific approaches for SNP data, which include unsupervised haplotype mapping, tag SNP selection, and supervised SNP selection using statistical testing/scoring, statistical modeling and machine learning methods, with an emphasis on how to identify interacting loci. Comment: Published at http://dx.doi.org/10.1214/07-SS026 in the Statistics Surveys (http://www.i-journals.org/ss/) by the Institute of Mathematical Statistics (http://www.imstat.org)

    Use of supplementary phenotype to identify additional rheumatoid arthritis loci in a linkage analysis of 342 UK affected sibling pair families

    Background: Although rheumatoid arthritis has been shown to have a moderately strong genetic component, the linked loci identified in linkage analyses and the susceptibility variants from association studies fall short of providing a comprehensive catalogue of the molecular factors underlying this complex disease. The objective of this study was to use a supplementary phenotype based on the cumulative hazard of rheumatoid arthritis to identify linkage evidence for new and additional rheumatoid arthritis loci in a genome-wide linkage analysis of 342 affected sibling pair families from the United Kingdom. Methods: Using a proportional hazards model, we estimated the cumulative hazard of rheumatoid arthritis and then used it as a quantitative trait in a non-parametric multipoint variance component linkage analysis with 353 microsatellite markers distributed across the 22 autosomal chromosomes. Results: We identified 3 new loci with genome-wide suggestive linkage evidence for rheumatoid arthritis on 9q21.13, 15p11.1 and 20q13.33. Our results also confirmed previously reported linkage evidence in the HLA-DRB1 region on chromosome 6 and on locus 1q32.1. Conclusion: This study demonstrates the potential for information gain through the use of supplementary phenotypes in genetic studies of complex diseases to identify new and additional potentially linked loci that are not detected by linkage analysis of traditional phenotypes; our results also provide further evidence of the involvement of multiple loci in the genetic aetiology of rheumatoid arthritis.

    Computational dynamic approaches for temporal omics data with applications to systems medicine

    Modeling and predicting biological dynamic systems while simultaneously estimating their kinetic, structural and functional parameters are extremely important in systems and computational biology. This is key to understanding the complexity of human health, drug response, disease susceptibility and pathogenesis for systems medicine. Temporal omics data, which measure dynamic biological systems, are essential for discovering complex biological interactions and clinical mechanisms and causation. However, delineating the possible associations and causalities among genes, proteins, metabolites, cells and other biological entities from high-throughput time course omics data is challenging, and conventional experimental techniques are not suited to this task in the big omics era. In this paper, we present various recently developed dynamic trajectory and causal network approaches for temporal omics data, which are extremely useful for researchers who want to start working in this challenging research area. Moreover, we present applications to various biological systems, health conditions and disease statuses, along with examples that summarize state-of-the-art performance on different specific mining tasks. We critically discuss the merits, drawbacks and limitations of the approaches, and the main associated challenges for the years ahead. The most recent computing tools and software for analyzing specific problem types, associated platform resources, and other potential uses of the dynamic trajectory and interaction methods are also presented and discussed in detail.

    Hierarchical Bayesian Neural Network for Gene Expression Temporal Patterns

    Several important issues must be addressed in the analysis of gene expression temporal patterns: first, the correlation structure of multidimensional temporal data; second, the numerous sources of variation combined with high noise levels; and last, the fact that gene expression mostly involves multiple heterogeneous dynamic patterns. We propose a Hierarchical Bayesian Neural Network model to account for the input correlations of time course gene array data. The variations in absolute gene expression levels and the noise can be estimated within the hierarchical Bayesian setting. The network parameters and the hyperparameters were simultaneously optimized with Markov chain Monte Carlo simulation. Results show that the proposed model and algorithm capture the dynamic features of gene expression temporal patterns well despite the high noise levels, the highly correlated inputs, the overwhelming interactions, and other complex features typically present in microarray data. We test and demonstrate the proposed models with yeast cell cycle temporal data sets. The performance of the Hierarchical Bayesian Neural Network was compared to that of other popular machine learning methods such as Nearest Neighbor, Support Vector Machine, and Self-Organizing Map.