
    Specifying Exposure Classification Parameters for Sensitivity Analysis: Family Breast Cancer History

    One of the challenges in implementing sensitivity analysis for exposure misclassification is specifying the classification proportions (e.g., sensitivity and specificity). The specification of these assignments is guided by three sources of information: estimates from validation studies, expert judgment, and numerical constraints given the data. The purpose of this teaching paper is to describe the process of using validation data and expert judgment to adjust a breast cancer odds ratio for misclassification of family breast cancer history. The parameterization of various point estimates and prior distributions for sensitivity and specificity was guided by external validation data and expert judgment. We used both nonprobabilistic and probabilistic sensitivity analyses to investigate the dependence of the odds ratio estimate on the classification error. Under our assumptions, the range of odds ratios adjusted for misclassification of family breast cancer history was wider than that portrayed by the conventional frequentist confidence interval.
    Funding: Children's Cancer Research Fund, Minneapolis, MN, US.
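    The simple (nonprobabilistic) bias analysis the abstract describes can be sketched in a few lines: back-calculate the expected true exposure counts from an observed 2×2 table, then recompute the odds ratio. This is a minimal illustration assuming nondifferential misclassification with known sensitivity `se` and specificity `sp`; the function names and counts are hypothetical, not from the paper.

```python
def true_exposed(observed_exposed, total, se, sp):
    """Back-calculate the expected number of truly exposed subjects
    from an observed count, given sensitivity (se) and specificity (sp)."""
    return (observed_exposed - (1 - sp) * total) / (se + sp - 1)

def adjusted_odds_ratio(a1, b1, a0, b0, se, sp):
    """Misclassification-adjusted odds ratio from an observed 2x2 table.
    a1/b1: exposed/unexposed cases; a0/b0: exposed/unexposed controls."""
    cases, controls = a1 + b1, a0 + b0
    ta1 = true_exposed(a1, cases, se, sp)      # corrected exposed cases
    ta0 = true_exposed(a0, controls, se, sp)   # corrected exposed controls
    return (ta1 * (controls - ta0)) / ((cases - ta1) * ta0)
```

    With perfect classification (se = sp = 1) this reduces to the crude odds ratio; the probabilistic version of the analysis would draw `se` and `sp` from prior distributions and repeat the calculation.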

    Accounting for error due to misclassification of exposures in case–control studies of gene–environment interaction

    We consider analysis of data from an unmatched case–control study design with a binary genetic factor and a binary environmental exposure when both could potentially be misclassified. We devise an estimation strategy that corrects for misclassification errors and also exploits the gene–environment independence assumption. The proposed corrected point estimates and confidence intervals for misclassified data reduce to standard analytical forms as the misclassification error rates go to zero. We illustrate the methods by simulating unmatched case–control data sets under varying levels of disease–exposure association and with different degrees of misclassification. A real data set from a case–control study of colorectal cancer, where a validation subsample is available for assessing genotyping error, is used to illustrate our methods. Copyright © 2007 John Wiley & Sons, Ltd.
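    The paper's corrected estimator is more involved, but the gene–environment independence assumption it exploits is the same one behind the classic case-only design, which can be shown in one line: under G–E independence in the source population (and a rare disease), the G–E odds ratio among cases alone estimates the multiplicative interaction. A hedged illustration, not the paper's method:

```python
def case_only_interaction(n11, n10, n01, n00):
    """Case-only estimator of the multiplicative G-E interaction odds ratio.
    n_ge: count of cases with genotype g and exposure e.
    Valid only if G and E are independent in the source population
    and the disease is rare; misclassification would bias this too."""
    return (n11 * n00) / (n10 * n01)
```

    This shows why the independence assumption buys efficiency: it removes the need for control-group information about the joint G–E distribution.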

    Methods to account for outcome misclassification in epidemiology

    Outcome misclassification occurs when the endpoint of an epidemiologic study is measured with error. Outcome misclassification is common in epidemiology but is frequently ignored in the analysis of exposure-outcome relationships. We focus on two common types of outcomes in epidemiology that are subject to mismeasurement: participant-reported outcomes and cause-specific mortality. In this work, we leverage information on the misclassification probabilities obtained from internal validation studies, external validation studies, and expert opinion to account for outcome misclassification in various epidemiologic settings. This work describes the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach worked well to account for bias due to outcome misclassification in the odds ratio and risk ratio comparing herpes simplex virus recurrence between participants randomized to receive acyclovir or placebo in the Herpetic Eye Disease Study. In simulations, multiple imputation had greater statistical power than analysis restricted to the validation subgroup, yet both provided unbiased estimates of the odds ratio. Modified maximum likelihood and Bayesian methods are used to explore the effects of outcome misclassification in situations with no validation subgroup. In a cohort of textile workers exposed to asbestos in South Carolina, we perform sensitivity analysis using modified maximum likelihood to estimate the rate ratio of lung cancer death per 100 fiber-years/mL asbestos exposure under varying assumptions about sensitivity and specificity. When specificity of outcome classification was nearly perfect, the modified maximum likelihood approach produced estimates that were similar to analyses that ignore outcome misclassification. Uncertainty in the misclassification parameters is expressed by placing informative prior distributions on sensitivity and specificity in Bayesian analysis. 
Because, in our examples, lung cancer death is unlikely to be misclassified, posterior estimates are similar to standard estimates. However, modified maximum likelihood and Bayesian methods are needed to verify the robustness of standard estimates, and these approaches will provide unbiased estimates in settings with more misclassification. This work has highlighted the potential for bias due to outcome misclassification and described three flexible tools to account for misclassification. Use of such techniques will improve inference from epidemiologic studies.
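    The modified-maximum-likelihood idea can be sketched for the simplest possible case: estimating a misclassified prevalence with fixed sensitivity and specificity. The observed-data success probability is P(Y* = 1) = se·p + (1 − sp)·(1 − p), and the likelihood is maximized in p. This is a toy version under assumed known `se`/`sp`, not the authors' regression models:

```python
import math

def mle_true_prevalence(y_star, se, sp, grid=2000):
    """Grid-search MLE of the true prevalence p when the binary outcome is
    misclassified with known sensitivity (se) and specificity (sp)."""
    n1, n = sum(y_star), len(y_star)
    best_p, best_ll = None, -math.inf
    for i in range(1, grid):
        p = i / grid
        q = se * p + (1 - sp) * (1 - p)   # P(Y* = 1) under misclassification
        if not 0.0 < q < 1.0:
            continue
        ll = n1 * math.log(q) + (n - n1) * math.log(1 - q)
        if ll > best_ll:
            best_p, best_ll = p, ll
    return best_p
```

    A sensitivity analysis in the spirit of the asbestos example would rerun this over a range of (se, sp) pairs; the Bayesian variant replaces the fixed values with informative priors.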

    Bias Analysis for Logistic Regression with a Misclassified Multi-categorical Exposure

    In epidemiological studies, it is a common issue that the collected data are imperfect owing to technical and/or financial difficulties. It is well known that ignoring such imperfections may lead to misleading inference (e.g., failure to detect a true association between two variables). Davidov et al. (2003) studied the asymptotic biases caused by misclassification of a binary exposure in a logistic regression context. The aim of this thesis is to extend the work of Davidov et al. to a multi-categorical scenario. I examine the asymptotic biases in the regression coefficients of a logistic regression model when a multi-categorical exposure is subject to misclassification. The asymptotic results may provide guidance for large-scale studies when considering whether bias corrections are necessary. To better understand the asymptotic results, I also present numerical examples and simulation studies.
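    For the binary base case studied by Davidov et al., the asymptotic behavior can be computed directly: fix the true odds ratio and the control exposure prevalence, push both exposure distributions through the misclassification probabilities, and read off the odds ratio that a large study would observe. A minimal sketch under nondifferential misclassification (the multi-categorical extension replaces `se`/`sp` with a full misclassification matrix; parameter names are ours):

```python
def observed_or(true_or, p0, se, sp):
    """Asymptotic observed odds ratio for a nondifferentially misclassified
    binary exposure. p0: exposure prevalence among controls."""
    odds0 = p0 / (1 - p0)
    p1 = true_or * odds0 / (1 + true_or * odds0)  # case exposure prevalence

    def star(p):  # prevalence of the *observed* (misclassified) exposure
        return se * p + (1 - sp) * (1 - p)

    q0, q1 = star(p0), star(p1)
    return (q1 / (1 - q1)) / (q0 / (1 - q0))
```

    The familiar result that nondifferential misclassification attenuates the odds ratio toward the null drops straight out of this calculation.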

    Accounting for Misclassified Outcomes in Binary Regression Models Using Multiple Imputation With Internal Validation Data

    Outcome misclassification is widespread in epidemiology, but methods to account for it are rarely used. We describe the use of multiple imputation to reduce bias when validation data are available for a subgroup of study participants. This approach is illustrated using data from 308 participants in the multicenter Herpetic Eye Disease Study between 1992 and 1998 (48% female; 85% white; median age, 49 years). The odds ratio comparing the acyclovir group with the placebo group on the gold-standard outcome (physician-diagnosed herpes simplex virus recurrence) was 0.62 (95% confidence interval (CI): 0.35, 1.09). We masked ourselves to physician diagnosis except for a 30% validation subgroup used to compare methods. Multiple imputation (odds ratio (OR) = 0.60; 95% CI: 0.24, 1.51) was compared with naive analysis using self-reported outcomes (OR = 0.90; 95% CI: 0.47, 1.73), analysis restricted to the validation subgroup (OR = 0.57; 95% CI: 0.20, 1.59), and direct maximum likelihood (OR = 0.62; 95% CI: 0.26, 1.53). In simulations, multiple imputation and direct maximum likelihood had greater statistical power than did analysis restricted to the validation subgroup, yet all 3 provided unbiased estimates of the odds ratio. The multiple-imputation approach was extended to estimate risk ratios using log-binomial regression. Multiple imputation has advantages regarding flexibility and ease of implementation for epidemiologists familiar with missing data methods.
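    The mechanics of the imputation step can be illustrated on a stripped-down problem: treat the gold-standard outcome as missing outside the validation subgroup, estimate P(Y = 1 | Y*) from the validation records, draw completed data sets, and combine the estimates. This sketch targets a prevalence rather than a regression odds ratio, and all names are ours, not the study's:

```python
import random

def mi_prevalence(records, m=20, seed=1):
    """Estimate outcome prevalence by multiply imputing the gold-standard
    outcome. records: list of (y_star, y_true) pairs, with y_true set to
    None outside the validation subgroup."""
    val = [(ys, yt) for ys, yt in records if yt is not None]
    # P(Y = 1 | Y* = ys), estimated from the validation subgroup
    post = {ys: sum(yt for s, yt in val if s == ys) /
                sum(1 for s, _ in val if s == ys)
            for ys in (0, 1)}
    rng = random.Random(seed)
    estimates = []
    for _ in range(m):
        imputed = [yt if yt is not None else int(rng.random() < post[ys])
                   for ys, yt in records]
        estimates.append(sum(imputed) / len(imputed))
    return sum(estimates) / m  # Rubin's-rules point estimate
```

    In a regression setting the same idea applies, except the imputation model would condition on treatment arm and covariates, and Rubin's rules would also combine the within- and between-imputation variances.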

    Assessing model stability and sensitivity

    Statistical inferences from observational studies with error-prone measurements are often biased. The bias is a consequence of the deviation of the probability distribution that generates the observed data from the one that generates the true, unobserved data. For example, in binary data, where measurement error is a misclassification problem, an observation with a true value of 0 is observed as 1, or vice versa. Past research in this framework has often focused on using a validation study to account for measurement error in the main study. A shortcoming of this approach is the lack of validation data to inform the correction of measurement error in the main study. Another challenge is the lack of ready-to-use statistical software for implementation. To overcome some of these challenges in the analysis of binary data with measurement error, we investigate the performance of the naive logistic regression model, which we refer to as the assumed model, against a modified model. By modified model we mean an extended logistic regression model in which we introduce the probability of measurement error as a modification weight. The modification weight is introduced in the direction of the nondifferential and differential misclassification patterns. Following Cook's (1986) normal curvature approach, we derive an influence measure for the special cases in which the presence of the binary outcome (Y*_i = 1) is error-prone, and also for the absence of the outcome (Y*_i = 0). The method is applied to a data set from a rehabilitation programme study for juvenile offenders, where Y*_i = 0 is measured with error. As different compositions of measured values of binary outcomes often exist in real studies, we further conducted a simulation study covering different scenarios for the special cases Y*_i = 1 and Y*_i = 0.
Our theoretical results show that when there is no information about the error size, the assumed model appears to be the most stable model compared with the modified models. However, the assumed model's estimates can be biased when measurement error is present. It is therefore important to investigate model stability and how model estimates behave within a plausible range of error sizes, and to report all the findings.
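    The core of such a modified model is the observed-outcome success probability: the logistic probability for the true outcome, perturbed by the two error probabilities. A minimal sketch with our own parameter names (g01 = P(Y* = 1 | Y = 0), g10 = P(Y* = 0 | Y = 1)); the assumed (naive) model is the special case g01 = g10 = 0:

```python
import math

def expit(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def modified_prob(x, beta0, beta1, g01, g10):
    """P(Y* = 1 | x) under the modified model: the true logistic probability
    weighted by the misclassification error probabilities."""
    p = expit(beta0 + beta1 * x)          # assumed-model probability
    return (1 - g10) * p + g01 * (1 - p)  # misclassification-weighted
```

    Fitting the modified model means maximizing the likelihood built from `modified_prob` over (beta0, beta1) at fixed error probabilities, then examining how the estimates move as g01 and g10 range over plausible values, which is exactly the stability exercise the abstract recommends.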

    Use of pathway information in molecular epidemiology


    Modelling non-linear exposure-disease relationships in a large individual participant meta-analysis allowing for the effects of exposure measurement error

    This thesis was motivated by data from the Emerging Risk Factors Collaboration (ERFC), a large individual participant data (IPD) meta-analysis of risk factors for coronary heart disease (CHD). Cardiovascular disease is the largest cause of death in almost all countries of the world, so it is important to be able to characterise the shape of risk factor–CHD relationships. Many of the risk factors for CHD considered by the ERFC are subject to substantial measurement error, and their relationships with CHD are non-linear. We first consider issues associated with modelling the risk factor–disease relationship in a single study, before using meta-analysis to combine relationships across studies. It is well known that classical measurement error generally attenuates linear exposure–disease relationships; however, its precise effect on non-linear relationships is less well understood. We investigate the effect of classical measurement error on the shapes of exposure–disease relationships commonly encountered in epidemiological studies, and then consider methods for correcting for classical measurement error. We propose applying a widely used correction method, regression calibration, to fractional polynomial models. We also consider the effects of non-classical error on the observed exposure–disease relationship, and the impact on our correction methods when we erroneously assume classical measurement error. Analyses performed using categorised continuous exposures are common in epidemiology. We show that MacMahon's method for correcting for measurement error in analyses that use categorised continuous exposures, although simple, does not recover the correct shape of non-linear exposure–disease relationships. We perform a simulation study to compare alternative methods for categorised continuous exposures. Meta-analysis is the statistical synthesis of results from a number of studies addressing similar research hypotheses.
The use of IPD is the gold-standard approach because it allows consistent analysis of the exposure–disease relationship across studies. Methods have recently been proposed for combining non-linear relationships across studies. We discuss these methods, extend them to P-spline models, and consider alternative methods of combining relationships across studies. We apply the methods developed to the relationships of fasting blood glucose and lipoprotein(a) with CHD, using data from the ERFC.
This work was supported by the Medical Research Council.
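    The attenuation result the thesis builds on, and the regression-calibration fix, are easy to demonstrate in the linear base case: with classical error X* = X + U, the naive slope shrinks by the reliability ratio λ = var(X)/(var(X) + var(U)), and dividing by λ (equivalently, regressing on E[X | X*]) restores it. A simulation sketch with our own parameter choices; the thesis treats the harder non-linear and fractional-polynomial settings:

```python
import random

def simulate_attenuation(n=20000, beta=2.0, sd_u=1.0, seed=7):
    """Show classical measurement error attenuating a linear slope,
    and the regression-calibration correction recovering it.
    Assumes X ~ N(0, 1) and known error variance sd_u**2."""
    rng = random.Random(seed)
    xs = [rng.gauss(0, 1) for _ in range(n)]          # true exposure
    ys = [beta * x + rng.gauss(0, 0.5) for x in xs]   # outcome
    xstars = [x + rng.gauss(0, sd_u) for x in xs]     # error-prone exposure

    def slope(x, y):
        mx, my = sum(x) / len(x), sum(y) / len(y)
        sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
        sxx = sum((a - mx) ** 2 for a in x)
        return sxy / sxx

    naive = slope(xstars, ys)            # attenuated toward zero
    lam = 1.0 / (1.0 + sd_u ** 2)        # reliability ratio, var(X) = 1
    corrected = naive / lam              # regression-calibration correction
    return naive, corrected
```

    With sd_u = 1 the reliability ratio is 0.5, so the naive slope estimates roughly half the true effect; the correction doubles it back. For non-linear relationships the attenuation distorts the curve's shape rather than scaling it, which is why simple rescaling no longer suffices.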