
    PredictABEL: an R package for the assessment of risk prediction models

    The rapid identification of genetic markers for multifactorial diseases from genome-wide association studies is fuelling interest in investigating the predictive ability and health care utility of genetic risk models. Various measures are available for the assessment of risk prediction models, each addressing a different aspect of performance and utility. We developed PredictABEL, a package in R that covers the descriptive tables, measures and figures used in the analysis of risk prediction studies, such as measures of model fit, predictive ability and clinical utility, as well as risk distributions, the calibration plot and the receiver operating characteristic plot. Tables and figures are saved as separate files in a user-specified format, including publication-quality EPS and TIFF formats. All figures are available in a ready-made layout, but they can be customized to the preferences of the user. The package has been developed for the analysis of genetic risk prediction studies, but can also be used for studies that only include non-genetic risk factors. PredictABEL is freely available at the websites of GenABEL (http://www.genabel.org) and CRAN (http://cran.r-project.org/).
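
One measure of predictive ability behind the receiver operating characteristic plot is the area under the curve (AUC). The following is a minimal Python sketch of that quantity, not PredictABEL's own R code: the function name `auc` and the example risks are hypothetical, and the AUC is computed as the share of case-control pairs in which the case has the higher predicted risk, with ties counted as one half.

```python
def auc(case_risks, control_risks):
    """Area under the ROC curve via pairwise comparison of predicted risks.

    Counts the fraction of (case, control) pairs in which the case has the
    higher predicted risk; ties contribute one half.
    """
    wins = 0.0
    for case in case_risks:
        for control in control_risks:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_risks) * len(control_risks))


# Hypothetical predicted risks, for illustration only.
cases = [0.9, 0.8, 0.3]
controls = [0.4, 0.2]
print(auc(cases, controls))
```

The pairwise form is quadratic in the sample size, which is acceptable for a sketch; a rank-based formulation would be preferable for large studies.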

    Association between Type 2 Diabetes Loci and Measures of Fatness

    Background: Type 2 diabetes (T2D) is a metabolic disorder characterized by disturbances of carbohydrate, fat and protein metabolism and insulin resistance. The majority of T2D patients are obese, and obesity by itself may be a cause of insulin resistance. Our aim was to evaluate whether the recently identified T2D risk alleles are associated with human measures of fatness as characterized with Dual Energy X-ray Absorptiometry (DEXA). Methodology/Principal Findings: Genotypes and phenotypes of approximately 3,000 participants from the cross-sectional ERF study were analyzed. Nine single nucleotide polymorphisms (SNPs) in CDKN2AB, CDKAL1, FTO, HHEX, IGF2BP2, KCNJ11, PPARG, SLC30A8 and TCF7L2 were genotyped. We used linear regression to study the association of individual SNPs and the combined allelic risk score with body mass index (BMI), fat mass index (FMI), fat percentage (FAT), waist circumference (WC) and waist to hip ratio (WHR). Significant association was observed between rs8050136 (FTO) and BMI (p = 0.003), FMI (p = 0.007) and WC (p = 0.03); fat percentage was borderline significant (p = 0.053). No other SNPs, alone or combined in a risk score, demonstrated significant association with the measures of fatness. Conclusions/Significance: Of the recently identified T2D risk variants, only the risk variant of the FTO gene (rs8050136) showed statistically significant association with BMI, FMI, and WC.

    ParallABEL: an R library for generalized parallelization of genome-wide association studies

    Background: Genome-Wide Association (GWA) analysis is a powerful method for identifying loci associated with complex traits and drug response. Parts of GWA analyses, especially those involving thousands of individuals and consuming hours to months, will benefit from parallel computation. Acquiring the programming skills needed to correctly partition and distribute data, control and monitor tasks on clustered computers, and merge output files is arduous. Results: Most components of GWA analysis can be divided into four groups based on the types of input data and statistical outputs. The first group contains statistics computed for a particular Single Nucleotide Polymorphism (SNP), or trait, such as SNP characterization statistics or association test statistics. The input data of this group includes the SNPs/traits. The second group concerns statistics characterizing an individual in a study, for example, the summary statistics of genotype quality for each sample. The input data of this group includes individuals. The third group consists of pair-wise statistics derived from analyses between each pair of individuals in the study, for example genome-wide identity-by-state or genomic kinship analyses. The input data of this group includes pairs of SNPs/traits. The final group concerns pair-wise statistics derived for pairs of SNPs, such as the linkage disequilibrium characterisation. The input data of this group includes pairs of individuals. We developed the ParallABEL library, which utilizes the Rmpi library, to parallelize these four types of computations. The ParallABEL library is not only aimed at GenABEL, but may also be employed to parallelize various GWA packages in R. The data set from the North American Rheumatoid Arthritis Consortium (NARAC), which includes 2,062 individuals genotyped for 545,080 SNPs, was used to measure ParallABEL performance. Almost perfect speed-up was achieved for many types of analyses. For example, the computing time for the identity-by-state matrix was linearly reduced from approximately eight hours to one hour when ParallABEL employed eight processors. Conclusions: Executing genome-wide association analysis using the ParallABEL library on a computer cluster is an effective way to boost performance and simplify the parallelization of GWA studies. ParallABEL is a user-friendly parallelization of GenABEL.
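
The partitioning and speed-up figures in this abstract can be illustrated with a small Python sketch. It is not ParallABEL's API (the library is written in R on top of Rmpi); the helper names `split_evenly`, `speedup` and `efficiency` are hypothetical, and the timings are the approximate values quoted above.

```python
def split_evenly(n_items, n_workers):
    """Split item indices 0..n_items-1 into contiguous (start, stop) ranges.

    Range sizes differ by at most one, so each worker gets a near-equal
    share of independent per-item tasks (for example, per-SNP statistics).
    """
    base, extra = divmod(n_items, n_workers)
    ranges = []
    start = 0
    for worker in range(n_workers):
        stop = start + base + (1 if worker < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


def speedup(serial_time, parallel_time):
    """Ratio of serial to parallel running time."""
    return serial_time / parallel_time


def efficiency(serial_time, parallel_time, n_workers):
    """Speed-up per worker; 1.0 corresponds to perfectly linear scaling."""
    return speedup(serial_time, parallel_time) / n_workers


# 545,080 SNPs spread over eight processors, as in the NARAC example.
snp_ranges = split_evenly(545080, 8)
print(snp_ranges[0], snp_ranges[-1])

# Approximate timings from the abstract: about eight hours down to one hour.
print(speedup(8.0, 1.0), efficiency(8.0, 1.0, 8))
```

Contiguous ranges keep each worker's slice of the genotype data compact; whether that matches how ParallABEL distributes data is an assumption of this sketch.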

    ProbABEL package for genome-wide association analysis of imputed data

    Background: Over the last few years, genome-wide association (GWA) studies have become a tool of choice for the identification of loci associated with complex traits. Currently, imputed single nucleotide polymorphism (SNP) data are frequently used in GWA analyses. Correct analysis of imputed data calls for the implementation of specific methods which take genotype imputation uncertainty into account. Results: We developed the ProbABEL software package for the analysis of genome-wide imputed SNP data and quantitative, binary, and time-till-event outcomes under linear, logistic, and Cox proportional hazards models, respectively. For quantitative traits, the package also implements a fast two-step mixed model-based score test for association in samples with differential relationships, facilitating analysis in family-based studies, studies performed in human genetically isolated populations and outbred animal populations. Conclusions: The ProbABEL package provides a fast, efficient way to analyze imputed data in a genome-wide context and will facilitate future identification of complex trait loci.
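
One common way to carry imputation uncertainty into a linear model is to regress the trait on the expected allele dosage rather than on a hard genotype call. The Python sketch below illustrates that idea only; it is not ProbABEL's implementation, the names `expected_dosage` and `ols_slope` are hypothetical, and the genotype probabilities and trait values are made up for illustration.

```python
from statistics import mean


def expected_dosage(p_het, p_hom):
    """Expected count of the coded allele given imputed genotype probabilities.

    p_het is the probability of carrying one copy, p_hom of carrying two.
    """
    return p_het + 2.0 * p_hom


def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return sxy / sxx


# Made-up imputed probabilities (one copy, two copies) and trait values.
genotype_probs = [(0.05, 0.00), (0.60, 0.10), (0.20, 0.75), (0.90, 0.05)]
trait = [1.2, 2.1, 3.4, 2.0]

dosages = [expected_dosage(p_het, p_hom) for p_het, p_hom in genotype_probs]
beta = ols_slope(dosages, trait)
print(dosages, beta)
```

The logistic and Cox models and the mixed-model score test mentioned in the abstract are outside the scope of this sketch.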

    Are your covariates under control? How normalization can re-introduce covariate effects

    Many statistical tests rely on the assumption that the residuals of a model are normally distributed. Rank-based inverse normal transformation (INT) of the dependent variable is one of the most popular approaches to satisfy the normality assumption. Studies regularly adjust for covariates and then normalize the residuals. This study investigated the effect of regressing covariates against the dependent variable and then applying rank-based INT to the residuals. The correlation between the dependent variable and covariates at each stage of processing was assessed. An alternative approach, applying rank-based INT to the dependent variable before regressing out covariates, was also tested. Analyses based on both simulated and real data examples demonstrated that applying rank-based INT to the dependent variable residuals after regressing out covariates re-introduces a linear correlation between the dependent variable and covariates in almost all situations. This will increase type-1 errors and reduce power. Our proposed alternative approach, where rank-based INT was applied prior to controlling for covariate effects, gave residuals that were normally distributed and linearly uncorrelated with covariates. This approach is therefore recommended.
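
The two processing orders compared in this abstract can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the helper names are hypothetical, the data are simulated with an arbitrary seed, ties receive average ranks, and the offset `c = 0.5` in the rank-to-quantile formula is one common convention among several.

```python
import math
import random
from statistics import NormalDist, mean


def rank_int(values, c=0.5):
    """Rank-based inverse normal transform; ties get their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def residuals(y, x):
    """Residuals from a least-squares fit of y on a single covariate x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / math.sqrt(va * vb)


# Simulated skewed outcome that depends on one covariate (illustrative only).
rng = random.Random(0)
covariate = [rng.gauss(0, 1) for _ in range(500)]
outcome = [math.exp(0.5 * x + rng.gauss(0, 1)) for x in covariate]

# Order recommended in the abstract: transform first, then adjust.
recommended = residuals(rank_int(outcome), covariate)
# Order the abstract warns against: adjust first, then transform.
discouraged = rank_int(residuals(outcome, covariate))

print(pearson(recommended, covariate), pearson(discouraged, covariate))
```

In the recommended order the final step is the least-squares fit, so the residuals are linearly uncorrelated with the covariate by construction; in the other order the final step is a nonlinear transform, which carries no such guarantee.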