237 research outputs found

    PCA-based bootstrap confidence interval tests for gene-disease association involving multiple SNPs.

    BACKGROUND: Genetic association studies are currently the primary vehicle for identifying and characterizing disease-predisposing variants, and they usually involve multiple single-nucleotide polymorphisms (SNPs). However, SNP-wise association tests raise concerns over multiple testing. Haplotype-based methods can account for correlations between neighbouring SNPs, yet their assumption of Hardy-Weinberg equilibrium (HWE) and their potentially large number of degrees of freedom can harm their statistical power and robustness. Approaches based on principal component analysis (PCA) are preferable in this regard, but their performance varies with the method used to extract principal components (PCs). RESULTS: A PCA-based bootstrap confidence interval test (PCA-BCIT), which directly uses the PC scores to assess gene-disease association, was developed and evaluated for three ways of extracting PCs: from cases only (CAES), from controls only (COES), and from cases and controls combined (CES). Extraction of PCs with COES is preferred to extraction with CAES or CES. Performance of the test was examined via simulations and analyses of rheumatoid arthritis and heroin addiction data; the test maintained the nominal level under the null hypothesis and showed performance comparable to that of the permutation test. CONCLUSIONS: PCA-BCIT is a valid and powerful method for assessing gene-disease association involving multiple SNPs. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
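
The bootstrap confidence interval idea in the abstract above can be illustrated with a minimal sketch. This is not the authors' PCA-BCIT procedure: it assumes the PC scores have already been computed, uses a plain percentile bootstrap for the mean score in each group, and treats non-overlapping intervals as evidence of association. The function names, the decision rule, and the toy scores are all assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `scores`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(scores)
    # Resample with replacement n_boot times and record each resample mean.
    boot_means = sorted(mean(rng.choices(scores, k=n)) for _ in range(n_boot))
    lower = boot_means[int((alpha / 2) * n_boot)]
    upper = boot_means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


def intervals_overlap(a, b):
    """True if closed intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical first-PC scores; real scores would come from a PCA of genotypes.
case_scores = [1.0 + 0.01 * i for i in range(50)]
control_scores = [0.01 * i for i in range(50)]

case_ci = bootstrap_ci(case_scores, seed=1)
control_ci = bootstrap_ci(control_scores, seed=2)
# Assumed decision rule: non-overlapping intervals suggest association.
associated = not intervals_overlap(case_ci, control_ci)
```

In this toy data the two groups are shifted by a full unit, so the intervals are far apart; with real PC scores the choice of which sample the PCs are extracted from (CAES, COES or CES) would change the scores that enter `bootstrap_ci`.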

    Comparing Partial Least Square Approaches in Gene-or Region-based Association Study for Multiple Quantitative Phenotypes

    When complex diseases are considered quantitatively, there are at least three statistical strategies for association study: a single SNP on a single trait, a gene or region (with multiple SNPs) on a single trait, and a gene or region on multiple traits. The third is the most general for dissecting the genetic mechanism underlying complex diseases that involve multiple quantitative traits. Gene- or region-based association methods that use partial least squares (PLS) approaches have been shown to have an apparent power advantage. However, few methods have been developed for multiple quantitative phenotypes or traits underlying a condition or disease, and the performance of various PLS approaches in association studies of multiple quantitative traits has not been assessed. From a regression perspective, we exploit the association between multiple SNPs and multiple phenotypes or traits through exhaustive scan statistics (sliding window) using PLS and sparse PLS (SPLS) regression. Simulations are conducted to assess the performance of the proposed scan statistics and to compare them with the existing method. The proposed methods are applied to 12 regions of GWAS data from the European Prospective Investigation of Cancer (EPIC)-Norfolk study.
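
The sliding-window scan mentioned in the abstract above can be sketched as follows. Only the window enumeration and the "take the best window" step are shown; the per-window statistic is a placeholder callable, not a PLS or SPLS fit, and the names `sliding_windows`, `scan`, `window_score` and the toy signal are assumptions.

```python
def sliding_windows(n_snps, width, step=1):
    """Yield (start, end) index pairs for every window of `width` SNPs."""
    for start in range(0, n_snps - width + 1, step):
        yield start, start + width


def scan(n_snps, width, score, step=1):
    """Return (best_score, (start, end)) over all windows."""
    return max(
        (score(s, e), (s, e)) for s, e in sliding_windows(n_snps, width, step)
    )


# Hypothetical per-SNP association signal. The placeholder score sums it over
# the window; a PLS or SPLS regression statistic would replace `window_score`.
per_snp_signal = [0.1, 0.2, 3.0, 2.5, 0.1, 0.0]


def window_score(start, end):
    return sum(per_snp_signal[start:end])


best_score, best_window = scan(len(per_snp_signal), width=2, score=window_score)
```

Because the maximum is taken over many overlapping windows, its null distribution is not that of a single test, which is one reason scan statistics are usually calibrated by simulation or permutation.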

    Gene- or region-based association study via kernel principal component analysis.

    BACKGROUND: In genetic association studies, especially GWAS, gene- or region-based methods have become more popular for detecting association between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with a logistic regression test (KPCA-LRT) has been successfully used in classifying gene expression data. Nevertheless, the purpose of an association study is to detect the correlation between genetic variations and disease rather than to classify the sample, and genomic data are categorical rather than numerical. Although a kernel-based logistic regression model has recently been proposed for association studies, projecting the nonlinear original SNP data into a linear feature space, it is still affected by multicollinearity between the projections, which may lead to loss of power. We therefore proposed a KPCA-LRT model to avoid the multicollinearity. RESULTS: Simulation results showed that KPCA-LRT was always more powerful than principal component analysis combined with a logistic regression test (PCA-LRT) at different sample sizes, significance levels and relative risks, especially at the genewide level (1E-5) and lower relative risks (RR = 1.2, 1.3). Application to the four gene regions of rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT performed better than the single-locus test and PCA-LRT. CONCLUSIONS: KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data sets, especially under lower relative risks and lower significance levels. RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'.
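
As a rough illustration of the kernel step behind kernel PCA, the sketch below builds a Gram matrix and double-centers it. It is a sketch under assumptions, not the KPCA-LRT method itself: the RBF kernel, the `gamma` value, the 0/1/2 genotype coding and the function names are all illustrative choices, and the later steps (eigendecomposition of the centered matrix and a logistic regression on the leading components) are not shown.

```python
import math


def rbf_kernel_matrix(rows, gamma=0.5):
    """Gram matrix K[i][j] = exp(-gamma * ||rows[i] - rows[j]||^2)."""
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [[math.exp(-gamma * sqdist(a, b)) for b in rows] for a in rows]


def center_kernel(K):
    """Double-center K, which centers the data in the implicit feature space."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    total_mean = sum(row_means) / n
    # K is symmetric, so its column means equal its row means.
    return [
        [K[i][j] - row_means[i] - row_means[j] + total_mean for j in range(n)]
        for i in range(n)
    ]


# Hypothetical genotypes coded as minor-allele counts (0, 1, 2) at three SNPs.
genotypes = [[0, 1, 2], [1, 1, 0], [2, 0, 1], [0, 0, 0]]
K_centered = center_kernel(rbf_kernel_matrix(genotypes))
```

Kernel principal components are obtained from the eigenvectors of the centered matrix; because those components are mutually uncorrelated, regressing on them sidesteps the multicollinearity described in the abstract.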

    Spatial epidemiology and spatial ecology study of worldwide drug-resistant tuberculosis

    BACKGROUND: Drug-resistant tuberculosis (DR-TB) is a major public health problem caused by various factors. It is essential to systematically investigate the epidemiological and, in particular, the ecological factors of DR-TB for its prevention and control. Studies of the ecological factors can provide information on etiology and assist in the effective prevention and control of disease. Exploring the ecological factors of DR-TB is therefore of great significance for public health and can guide the formulation of regional prevention and control strategies. METHODS: Anti-TB drug resistance data were obtained from the World Health Organization/International Union Against Tuberculosis and Lung Disease (WHO/UNION) Global Project on Anti-Tuberculosis Drug Resistance Surveillance, and data on ecological factors were collected to explore the ecological factors for DR-TB. Partial least square path modeling (PLS-PM), in combination with ordinary least squares (OLS) regression and geographically weighted regression (GWR), was used to build global and local spatial regression models between the latent synthetic DR-TB factor ("DR-TB") and latent synthetic risk factors. RESULTS: OLS regression and PLS-PM indicated a significant globally linear spatial association between "DR-TB" and its latent synthetic risk factors. However, the GWR model showed marked spatial variability across the study regions. The "TB Epidemic", "Health Service" and "DOTS (directly-observed treatment strategy) Effect" factors were all positively related to "DR-TB" in most regions of the world, while the "Health Expenditure" and "Temperature" factors were negatively related in most areas of the world, and the "Humidity" factor had a negative influence on "DR-TB" in all regions of the world. CONCLUSIONS: In summary, the influences of the latent synthetic risk factors on DR-TB presented spatial variability. Regional DR-TB monitoring plans and prevention and control strategies should be formulated based on the spatial characteristics of the latent synthetic risk factors and the spatial variability of the local relationship between DR-TB and those factors.
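
The contrast between a global regression and a geographically weighted one can be made concrete with a minimal single-predictor sketch. It assumes a Gaussian distance kernel with a fixed bandwidth and fits a weighted simple regression at each location; the function name, the bandwidth and the toy data are assumptions and do not reproduce the study's models or variables.

```python
import math


def gwr_local_slopes(coords, x, y, bandwidth):
    """Local slope of y on x at each location, using Gaussian distance weights."""
    slopes = []
    for ui, vi in coords:
        # Nearby observations get weights near 1, distant ones near 0.
        w = [
            math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
            for uj, vj in coords
        ]
        sw = sum(w)
        x_bar = sum(wj * xj for wj, xj in zip(w, x)) / sw
        y_bar = sum(wj * yj for wj, yj in zip(w, y)) / sw
        sxy = sum(wj * (xj - x_bar) * (yj - y_bar) for wj, xj, yj in zip(w, x, y))
        sxx = sum(wj * (xj - x_bar) ** 2 for wj, xj in zip(w, x))
        slopes.append(sxy / sxx)  # weighted least-squares slope at this location
    return slopes


# Hypothetical data: two distant clusters with opposite x-y relationships.
west = [(0.0, float(i)) for i in range(10)]
east = [(100.0, float(i)) for i in range(10)]
coords = west + east
x = [float(i) for i in range(10)] * 2
y = [float(i) for i in range(10)] + [-float(i) for i in range(10)]

slopes = gwr_local_slopes(coords, x, y, bandwidth=1.0)
```

On this toy data the local slopes are positive in one cluster and negative in the other, while a single global slope fitted to all points would be zero, which is the kind of spatial variability a global OLS fit cannot show.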

    A new insight into underlying disease mechanism through semi-parametric latent differential network model

    BACKGROUND: In genomic studies, investigating how the structure of a genetic network differs between two experimental conditions is an interesting but challenging problem, especially in the high-dimensional setting. The existing literature mostly focuses on differential network modelling for continuous data. However, in real applications we may encounter discrete or mixed data, which calls for a unified differential network model for various data types. RESULTS: We propose a unified latent Gaussian copula differential network model, which provides a deeper understanding of the unknown mechanism than a network among the observed variables. Adaptive rank-based estimation approaches are proposed under the assumption that the true differential network is sparse. These approaches do not require the precision matrices to be sparse, and thus allow the individual networks to contain hub nodes. Theoretical analysis shows that the proposed methods achieve the same parametric convergence rate for both estimation of the difference of the precision matrices and differential structure recovery, which means that the extra modelling flexibility comes at almost no cost in statistical efficiency. Besides the theoretical analysis, thorough numerical simulations are conducted to compare the empirical performance of the proposed methods with other state-of-the-art methods. The results show that the proposed methods work well for various data types. The proposed method is then applied to gene expression data associated with lung cancer to illustrate its empirical usefulness. CONCLUSIONS: The proposed latent variable differential network models allow for various data types and are thus more flexible; they also provide a deeper understanding of the unknown mechanism than a network among the observed variables. Theoretical analysis, numerical simulation and real application all demonstrate the advantages of latent differential network modelling, which is thus highly recommended.
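
One standard ingredient of rank-based estimation under a Gaussian copula can be sketched briefly. For a pair of continuous variables, the latent correlation is related to Kendall's tau by r = sin(pi/2 * tau). The sketch below covers only that continuous case; discrete and mixed data need different bridge functions, and the adaptive estimation of the difference of precision matrices described in the abstract is not shown. The function names and toy data are assumptions.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tie
    return 2.0 * s / (n * (n - 1))


def latent_correlation(x, y):
    """Gaussian-copula latent correlation for continuous data: sin(pi/2 * tau)."""
    return math.sin(math.pi / 2.0 * kendall_tau(x, y))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y_monotone = [2.0, 4.0, 5.0, 9.0, 20.0]  # increasing but nonlinear in x
r = latent_correlation(x, y_monotone)
```

Because the estimate depends only on ranks, any strictly increasing transformation of either variable leaves it unchanged, which is what lets the model handle non-Gaussian margins.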

    An Integrative Framework for Bayesian Variable Selection with Informative Priors for Identifying Genes and Pathways

    The discovery of genetic or genomic markers plays a central role in the development of personalized medicine. A notable challenge is the high dimensionality of the data sets, as thousands of genes or millions of genetic variants are collected on a relatively small number of subjects. Traditional gene-wise selection methods using univariate analyses have difficulty incorporating correlational, structural, or functional relationships among the molecular measures. For microarray gene expression data, we first summarize solutions to ‘large p, small n’ problems, and then propose an integrative Bayesian variable selection (iBVS) framework for simultaneously identifying causal or marker genes and regulatory pathways. A novel partial least squares (PLS) g-prior for iBVS is developed to allow the incorporation of prior knowledge on gene-gene interactions or functional relationships. From the point of view of systems biology, iBVS enables the user to directly target the joint effects of multiple genes and pathways in a hierarchical modeling diagram to predict disease status or phenotype. The estimated posterior selection probabilities offer probabilistic and biological interpretations. Both simulated data and a set of microarray data for predicting stroke status are used to validate the performance of iBVS in a probit model with binary outcomes. iBVS offers a general framework for effective discovery of various molecular biomarkers by combining data-based statistics and knowledge-based priors. Guidelines on making posterior inferences, determining Bayesian significance levels, and improving computational efficiency are also discussed.
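
The posterior selection probabilities mentioned in the abstract above are commonly estimated as the fraction of posterior draws in which a variable is included. The sketch below shows only that summary step on made-up sampler output; it does not implement iBVS or the PLS g-prior, and the 0.5 cut-off, the function name and the draws are assumptions.

```python
def selection_probabilities(draws):
    """Fraction of posterior draws in which each gene is included."""
    n_draws = len(draws)
    n_genes = len(draws[0])
    return [sum(d[g] for d in draws) / n_draws for g in range(n_genes)]


# Hypothetical sampler output: each row is one draw of 0/1 inclusion flags.
draws = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [1, 0, 0],
]
probs = selection_probabilities(draws)
# Assumed rule: keep genes included in more than half of the draws.
selected = [g for g, p in enumerate(probs) if p > 0.5]
```

In practice the threshold would follow whatever Bayesian significance level the analysis adopts rather than a fixed 0.5.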