18 research outputs found

    Statistical Inference with Multiple Data Sources

    In this dissertation, I develop statistical methods to address three important scientific problems. A common theme behind these methods is the stitching together of multiple data sources to address the scientific questions of interest. In the first paper (Chapter 2), I propose a novel statistical framework to learn about the association between a secondary outcome (e.g., obesity) and a genetic risk factor (e.g., the ORMDL3 locus on Chromosome 17) from a genetic case-control study based on asthma. The method uses asthma prevalence information from a relevant sample survey. In the second paper (Chapter 3), I develop a method to evaluate whether there are adverse health consequences of kidney donation. To address this question, I use data on donors from the Live Kidney Donors Database from the Johns Hopkins Hospital and on healthy non-donors from the Atherosclerosis Risk in Communities (ARIC) and Coronary Artery Risk Development in Young Adults (CARDIA) studies. In the third paper (Chapter 4), I propose a covariate-adjusted method for testing the difference between two treatment groups where the measured outcome is a function. The proposed method utilizes information from repeated measures of the daily oxygen consumption function and scalar body composition measures (e.g., lean mass, fat mass) on two groups of mice, one with and one without a specific gene

    Wasm-iCARE: a portable and privacy-preserving web module to build, validate, and apply absolute risk models

    Objective: Absolute risk models estimate an individual's future disease risk over a specified time interval. Applications that use server-side risk tooling, such as the R-based iCARE (R-iCARE), to build, validate, and apply absolute risk models face serious limitations in portability and privacy because they must circulate user data through remote servers to operate. Our objective was to overcome these limitations. Materials and Methods: We refactored R-iCARE into a Python package (Py-iCARE), then compiled it to WebAssembly (Wasm-iCARE): a portable web module that operates entirely within the privacy of the user's device. Results: We showcase the portability and privacy of Wasm-iCARE through two applications: one for researchers to statistically validate risk models, and one to deliver them to end-users. Both applications run entirely on the client side, require no downloads or installations, and keep user data on-device during risk calculation. Conclusions: Wasm-iCARE fosters accessible and privacy-preserving risk tools, accelerating their validation and delivery. Comment: 10 pages, 2 figures

    Combined Associations of a Polygenic Risk Score and Classical Risk Factors With Breast Cancer Risk.

    We evaluated the joint associations between a new 313-variant PRS (PRS313) and questionnaire-based breast cancer risk factors for women of European ancestry, using 72 284 cases and 80 354 controls from the Breast Cancer Association Consortium. Interactions were evaluated using standard logistic regression and a newly developed case-only method for breast cancer risk overall and by estrogen receptor status. After accounting for multiple testing, we did not find evidence that the per-standard deviation PRS313 odds ratio differed across strata defined by individual risk factors. Goodness-of-fit tests did not reject the assumption of a multiplicative model between PRS313 and each risk factor. Variation in projected absolute lifetime risk of breast cancer associated with classical risk factors was greater for women with higher genetic risk (PRS313 and family history) and, on average, 17.5% higher in the highest vs lowest deciles of genetic risk. These findings have implications for risk prevention for women at increased risk of breast cancer.

    Assessment of polygenic architecture and risk prediction based on common variants across fourteen cancers

    Genome-wide association studies (GWAS) have led to the identification of hundreds of susceptibility loci across cancers, but the impact of further studies remains uncertain. Here we analyse summary-level data from GWAS of European ancestry across fourteen cancer sites to estimate the number of common susceptibility variants (polygenicity) and the underlying effect-size distribution. All cancers show a high degree of polygenicity, involving at a minimum thousands of loci. We project that the sample sizes required to explain 80% of GWAS heritability vary from 60,000 cases for testicular cancer to over 1,000,000 cases for lung cancer. The maximum relative risk achievable for subjects at the 99th risk percentile of underlying polygenic risk scores (PRS), compared to average risk, ranges from 12 for testicular cancer to 2.5 for ovarian cancer. We show that PRS have potential for risk stratification for cancers of the breast, colon and prostate, but less so for others because of modest heritability and lower incidence.

    iCARE: An R package to build, validate and apply absolute risk models.

    This report describes an R package, called the Individualized Coherent Absolute Risk Estimator (iCARE) tool, that allows researchers to build and evaluate models for absolute risk and apply them to estimate an individual's risk of developing disease during a specified time interval based on a set of user-defined input parameters. An attractive feature of the software is that it gives users flexibility to update models rapidly based on new knowledge of risk factors and to tailor models to different populations by specifying three input arguments: a model for relative risk, an age-specific disease incidence rate and the distribution of risk factors for the population of interest. The tool can handle missing information on risk factors for individuals for whom risks are to be predicted, using a coherent approach where all estimates are derived from a single model after appropriate model averaging. The software allows single nucleotide polymorphisms (SNPs) to be incorporated into the model using published odds ratios and allele frequencies. The validation component of the software implements methods for evaluating model calibration, discrimination and risk stratification based on independent validation datasets. We provide an illustration of the utility of iCARE for building, validating and applying absolute risk models using breast cancer as an example.
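
To make the three input arguments concrete, the following is a minimal sketch of how a relative-risk model, age-specific incidence rates and a reference risk-factor distribution might combine into an absolute risk. It is not the iCARE API: the function `absolute_risk`, its parameter names and all numbers are hypothetical, and the sketch assumes a simplified setting with a piecewise-constant yearly hazard, no competing mortality, no missing risk factors, and a single age-independent calibration of the baseline hazard.

```python
import math

def absolute_risk(log_rr, profile, incidence, ref_population, age_start, age_end):
    """Simplified absolute risk of disease in [age_start, age_end).

    Hypothetical illustration only (not the iCARE interface):
      log_rr         -- dict of log relative risks per risk factor
      profile        -- dict of the individual's risk-factor values
      incidence      -- dict mapping age -> marginal yearly incidence rate
      ref_population -- list of risk-factor dicts for the reference population
    Assumes no competing mortality and complete risk-factor data.
    """
    def rel_risk(p):
        return math.exp(sum(log_rr[k] * p[k] for k in log_rr))

    # Calibrate the baseline hazard so that the population-average hazard
    # matches the marginal incidence rate (a simplifying assumption).
    mean_rr = sum(rel_risk(p) for p in ref_population) / len(ref_population)
    individual_rr = rel_risk(profile)

    # Accumulate the individual's hazard over each year of the interval.
    cumulative_hazard = sum(
        incidence[age] / mean_rr * individual_rr
        for age in range(age_start, age_end)
    )
    return 1.0 - math.exp(-cumulative_hazard)

# Made-up inputs: one binary risk factor that doubles the risk,
# present in half of the reference population.
log_rr = {"exposed": math.log(2.0)}
reference = [{"exposed": 0}, {"exposed": 1}]
incidence = {50: 0.003, 51: 0.003}

risk_exposed = absolute_risk(log_rr, {"exposed": 1}, incidence, reference, 50, 52)
risk_unexposed = absolute_risk(log_rr, {"exposed": 0}, incidence, reference, 50, 52)
```

In this toy setting the exposed individual's two-year risk is about twice that of the unexposed individual, because the disease is rare over the interval. A fuller treatment would also account for competing causes of death and for missing risk factors, which the abstract says the package handles.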