1,190 research outputs found

    Analysis of Case-Control Association Studies: SNPs, Imputation and Haplotypes

    Get PDF

    The genetic architecture of type 2 diabetes

    Get PDF
    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes

    The genetic architecture of type 2 diabetes

    Get PDF
    The genetic architecture of common traits, including the number, frequency, and effect sizes of inherited variants that contribute to individual risk, has been long debated. Genome-wide association studies have identified scores of common variants associated with type 2 diabetes, but in aggregate, these explain only a fraction of heritability. To test the hypothesis that lower-frequency variants explain much of the remainder, the GoT2D and T2D-GENES consortia performed whole genome sequencing in 2,657 Europeans with and without diabetes, and exome sequencing in a total of 12,940 subjects from five ancestral groups. To increase statistical power, we expanded sample size via genotyping and imputation in a further 111,548 subjects. Variants associated with type 2 diabetes after sequencing were overwhelmingly common and most fell within regions previously identified by genome-wide association studies. Comprehensive enumeration of sequence variation is necessary to identify functional alleles that provide important clues to disease pathophysiology, but large-scale sequencing does not support a major role for lower-frequency variants in predisposition to type 2 diabetes

    Recent Developments in the Econometrics of Program Evaluation

    Get PDF
    Many empirical questions in economics and other social sciences depend on causal effects of programs or policies. In the last two decades much research has been done on the econometric and statistical analysis of the effects of such programs or treatments. This recent theoretical literature has built on, and combined features of, earlier work in both the statistics and econometrics literatures. It has by now reached a level of maturity that makes it an important tool in many areas of empirical research in economics, including labor economics, public finance, development economics, industrial organization and other areas of empirical micro-economics. In this review we discuss some of the recent developments. We focus primarily on practical issues for empirical researchers, as well as provide a historical overview of the area and give references to more technical research.program evaluation, causality, unconfoundedness, Rubin Causal Model, potential outcomes, instrumental variables

    Fitting stratified proportional odds models by amalgamating conditional likelihoods

    Full text link
    Classical methods for fitting a varying intercept logistic regression model to stratified data are based on the conditional likelihood principle to eliminate the stratum-specific nuisance parameters. When the outcome variable has multiple ordered categories, a natural choice for the outcome model is a stratified proportional odds or cumulative logit model. However, classical conditioning techniques do not apply to the general K -category cumulative logit model ( K >2) with varying stratum-specific intercepts as there is no reduction due to sufficiency; the nuisance parameters remain in the conditional likelihood. We propose a methodology to fit stratified proportional odds model by amalgamating conditional likelihoods obtained from all possible binary collapsings of the ordinal scale. The method allows for categorical and continuous covariates in a general regression framework. We provide a robust sandwich estimate of the variance of the proposed estimator. For binary exposures, we show equivalence of our approach to the estimators already proposed in the literature. The proposed recipe can be implemented very easily in standard software. We illustrate the methods via three real data examples related to biomedical research. Simulation results comparing the proposed method with a random effects model on the stratification parameters are also furnished. Copyright © 2008 John Wiley & Sons, Ltd.Peer Reviewedhttp://deepblue.lib.umich.edu/bitstream/2027.42/60967/1/3325_ftp.pd

    A Frailty-Model-Based Approach to Estimating the Age-Dependent Penetrance Function of Candidate Genes Using Population-Based Case-Control Study Designs: An Application to Data on the BRCA1 Gene.

    Get PDF
    Summary. The population-based case-control study design is perhaps one of, if not the most, commonly used designs for investigating the genetic and environmental contributions to disease risk in epidemiological studies. Ages at onset and disease status of family members are routinely and systematically collected from the participants in this design. Considering age at onset in relatives as an outcome, this article is focused on using the family history information to obtain the hazard function, i.e., age-dependent penetrance function, of candidate genes from case-control studies. A frailty-model-based approach is proposed to accommodate the shared risk among family members that is not accounted for by observed risk factors. This approach is further extended to accommodate missing genotypes in family members and a two-phase case-control sampling design. Simulation results show that the proposed method performs well in realistic settings. Finally, a population-based two-phase case-control breast cancer study of the BRCA1 gene is used to illustrate the method

    Combining multiple observational data sources to estimate causal effects

    Full text link
    The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators for causal effects, but the big main data can only give error-prone estimators in general. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiencies yet preserve the consistencies of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly-used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which makes our method straightforward to implement using software routines for existing estimators
    corecore