97 research outputs found

    Comparing Partial Least Square Approaches in Gene- or Region-based Association Study for Multiple Quantitative Phenotypes

    When thinking quantitatively about complex diseases, there are at least three statistical strategies for association studies: a single SNP on a single trait, a gene or region (with multiple SNPs) on a single trait, and a gene or region on multiple traits. The third is the most general for dissecting the genetic mechanisms underlying complex diseases that involve multiple quantitative traits. Gene- or region-based association methods using partial least squares (PLS) approaches have been shown to have an apparent power advantage. However, few methods have been developed for multiple quantitative phenotypes or traits underlying a condition or disease, and the performance of the various PLS approaches used in association studies for multiple quantitative traits has not been assessed. From a regression perspective, we examine the association between multiple SNPs and multiple phenotypes or traits through exhaustive scan statistics (sliding window) using PLS and sparse PLS (SPLS) regression. Simulations are conducted to assess the performance of the proposed scan statistics and to compare them with the existing method. The proposed methods are applied to 12 regions of GWAS data from the European Prospective Investigation of Cancer (EPIC)-Norfolk study.
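
The sliding-window scan mentioned in the abstract above can be illustrated with a minimal sketch. It is not the authors' statistic: it assumes a simplified score, namely the covariance captured by the first PLS component between the SNPs in a window and the set of traits, which equals the leading singular value of the sample cross-covariance matrix. The function name `pls_window_scan` and the `width` parameter are hypothetical, and significance assessment (for example by permutation) is left out.

```python
import numpy as np

def pls_window_scan(X, Y, width):
    """Slide a window of `width` adjacent SNPs across X and score each window.

    X: (n, p) genotype matrix, Y: (n, q) trait matrix (or a length-n vector).
    The score is the leading singular value of the sample cross-covariance
    between the window's SNPs and the traits, i.e. the covariance attained
    by the first PLS component pair. This is an illustrative assumption,
    not the statistic used in the paper.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if Y.ndim == 1:
        Y = Y[:, None]
    n, p = X.shape
    if not 1 <= width <= p:
        raise ValueError("width must be between 1 and the number of SNPs")
    # Centre both blocks so the cross-product is a covariance.
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    results = []
    for start in range(p - width + 1):
        cross_cov = Xc[:, start:start + width].T @ Yc / (n - 1)
        stat = np.linalg.svd(cross_cov, compute_uv=False)[0]
        results.append((start, float(stat)))
    return results
```

Each returned pair gives the index of the first SNP in a window and that window's score; an exhaustive scan would repeat this over several window widths.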

    Gene- or region-based association study via kernel principal component analysis.

    BACKGROUND: In genetic association studies, especially GWAS, gene- or region-based methods have become more popular for detecting associations between multiple SNPs and diseases (or traits). Kernel principal component analysis combined with a logistic regression test (KPCA-LRT) has been successfully used in classifying gene expression data. Nevertheless, the purpose of an association study is to detect the correlation between genetic variations and disease rather than to classify the sample, and genomic data are categorical rather than numerical. Although a kernel-based logistic regression model for association studies has recently been proposed, which projects the nonlinear original SNP data into a linear feature space, it is still affected by multicollinearity between the projections, which may lead to loss of power. We therefore proposed a KPCA-LRT model to avoid the multicollinearity. RESULTS: Simulation results showed that KPCA-LRT was always more powerful than principal component analysis combined with a logistic regression test (PCA-LRT) at different sample sizes, different significance levels and different relative risks, especially at the genewide level (1E-5) and lower relative risks (RR = 1.2, 1.3). Application to the four gene regions of rheumatoid arthritis (RA) data from Genetic Analysis Workshop 16 (GAW16) indicated that KPCA-LRT performed better than the single-locus test and PCA-LRT. CONCLUSIONS: KPCA-LRT is a valid and powerful gene- or region-based method for the analysis of GWAS data sets, especially under lower relative risks and lower significance levels.
    RIGHTS: This article is licensed under the BioMed Central licence at http://www.biomedcentral.com/about/license which is similar to the 'Creative Commons Attribution Licence'. In brief you may: copy, distribute, and display the work; make derivative works; or make commercial use of the work, under the following conditions: the original author must be given credit; for any reuse or distribution, it must be made clear to others what the license terms of this work are.
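
To illustrate why kernel principal components sidestep the multicollinearity mentioned in the KPCA-LRT abstract above, here is a minimal sketch of the projection step only. It assumes an RBF kernel with a hand-picked `gamma`; the abstract does not state the kernel or its parameters, and the function name `kpca_scores` is hypothetical. The logistic regression likelihood ratio test that would follow is not shown.

```python
import numpy as np

def kpca_scores(X, n_components, gamma):
    """Project samples onto the leading kernel principal components.

    Assumes an RBF kernel exp(-gamma * ||x_i - x_j||^2); this choice is an
    illustration, not necessarily the kernel used in the paper.
    """
    X = np.asarray(X, dtype=float)
    # Pairwise squared Euclidean distances between samples.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    K = np.exp(-gamma * np.maximum(sq_dists, 0.0))
    # Double-centre the kernel matrix (centring in feature space).
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J
    # Eigen-decompose and keep the largest eigenvalues.
    vals, vecs = np.linalg.eigh(Kc)
    order = np.argsort(vals)[::-1][:n_components]
    vals, vecs = vals[order], vecs[:, order]
    # Training-sample scores: eigenvector scaled by sqrt(eigenvalue).
    return vecs * np.sqrt(np.maximum(vals, 0.0))
```

Because the score columns come from orthogonal eigenvectors of the centred kernel matrix, they are mutually uncorrelated, so using them as covariates in a logistic regression avoids the collinearity that affects the raw projections.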

    Marginal hazard regression for correlated failure time data with auxiliary covariates

    In many biomedical studies, budget constraints commonly mean that the primary covariate is collected only in a randomly selected subset of the full study cohort. Often, there is an inexpensive auxiliary covariate for the primary exposure variable that is readily available for all the cohort subjects. Valid statistical methods that make use of the auxiliary information to improve study efficiency need to be developed. To this end, we develop an estimated partial likelihood approach for correlated failure time data with auxiliary information. We assume a marginal hazard model with a common baseline hazard function. The asymptotic properties of the proposed estimators are developed. The proof of the asymptotic results for the proposed estimators is nontrivial since the moments used in the estimating equation are not martingale-based and the classical martingale theory is not sufficient. Instead, our proofs rely on modern empirical theory. The proposed estimator is evaluated through simulation studies and is shown to have increased efficiency compared to existing methods. The proposed methods are illustrated with a data set from the Framingham study.

    Spatial epidemiology and spatial ecology study of worldwide drug-resistant tuberculosis

    BACKGROUND: Drug-resistant tuberculosis (DR-TB) is a major public health problem caused by various factors. It is essential to systematically investigate the epidemiological and, in particular, the ecological factors of DR-TB for its prevention and control. Studies of the ecological factors can provide information on etiology, and assist in the effective prevention and control of disease. So it is of great significance for public health to explore the ecological factors of DR-TB, which can provide guidance for formulating regional prevention and control strategies. METHODS: Anti-TB drug resistance data were obtained from the World Health Organization/International Union Against Tuberculosis and Lung Disease (WHO/UNION) Global Project on Anti-Tuberculosis Drug Resistance Surveillance, and data on ecological factors were collected to explore the ecological factors for DR-TB. Partial least square path modeling (PLS-PM), in combination with ordinary least squares (OLS) regression, as well as geographically weighted regression (GWR), were used to build a global and local spatial regression model between the latent synthetic DR-TB factor ("DR-TB") and latent synthetic risk factors. RESULTS: OLS regression and PLS-PM indicated a significant globally linear spatial association between "DR-TB" and its latent synthetic risk factors. However, the GWR model showed marked spatial variability across the study regions. The "TB Epidemic", "Health Service" and "DOTS (directly-observed treatment strategy) Effect" factors were all positively related to "DR-TB" in most regions of the world, while "Health Expenditure" and "Temperature" factors were negatively related in most areas of the world, and the "Humidity" factor had a negative influence on "DR-TB" in all regions of the world. CONCLUSIONS: In summary, the influences of the latent synthetic risk factors on DR-TB presented spatial variability. We should formulate regional DR-TB monitoring planning and prevention and control strategies, based on the spatial characteristics of the latent synthetic risk factors and spatial variability of the local relationship between DR-TB and latent synthetic risk factors.

    A PLSPM-Based Test Statistic for Detecting Gene-Gene Co-Association in Genome-Wide Association Study with Case-Control Design

    In genome-wide association data analysis, two genes in any pathway, two SNPs in two linked gene regions, or two SNPs in two linked exons within one gene are often correlated with each other. We therefore proposed the concept of gene-gene co-association, which refers to effects due not only to the traditional interaction under a nearly independent condition but also to the correlation between two genes. Furthermore, we constructed a novel statistic for detecting gene-gene co-association based on Partial Least Squares Path Modeling (PLSPM). Through simulation, the relationship between traditional interaction and co-association was highlighted under three different types of co-association. Both simulation and real data analysis demonstrated that the proposed PLSPM-based statistic performs better than the single SNP-based logistic model, the PCA-based logistic model, and other gene-based methods.

    A Latent Variable Partial Least Squares Path Modeling Approach to Regional Association and Polygenic Effect with Applications to a Human Obesity Study

    Genetic association studies are now routinely used to identify single nucleotide polymorphisms (SNPs) linked with human diseases or traits through single SNP-single trait tests. Here we introduced partial least squares path modeling (PLSPM) for association between single or multiple SNPs and a latent trait that can involve single or multiple correlated measurement(s). Furthermore, the framework naturally provides estimators of polygenic effect by appropriately weighting trait-attributing alleles. We conducted computer simulations to assess the performance via multiple SNPs and human obesity-related traits as measured by body mass index (BMI), waist and hip circumferences. Our results showed that the association statistics had type I error rates close to the nominal level and were powerful for a range of effect and sample sizes. When applied to 12 candidate regions in data (N = 2,417) from the European Prospective Investigation of Cancer (EPIC)-Norfolk study, a region in FTO was found to have stronger association (rs7204609∼rs9939881 at the first intron, P = 4.29 × 10^-7) than single SNP analysis (all with P > 10^-4), and a latent quantitative phenotype was obtained using a subset sample of EPIC-Norfolk (N = 12,559). We believe our method is appropriate for assessment of regional association and polygenic effect on a single or multiple traits.