
    Partially identified prevalence estimation under misclassification using the kappa coefficient

    We discuss prevalence estimation under misclassification. That is, we are concerned with estimating the proportion of units having a certain property (being diseased, showing deviant behavior, etc.) from a random sample when the true variable of interest cannot be observed, but a related proxy variable (e.g. the outcome of a diagnostic test) is available. If the misclassification probabilities were known, unbiased prevalence estimation would be possible. We focus on the frequent case where the misclassification probabilities are unknown but two independent replicate measurements have been taken. While in the traditional precise probabilistic framework a correction from this information is not possible due to non-identifiability, the imprecise probability methodology of partial identification and systematic sensitivity analysis allows one to obtain valuable insights into possible bias due to misclassification. We derive tight identification intervals and corresponding confidence regions for the true prevalence, based on the often-reported kappa coefficient, which condenses the information in the replicates by measuring agreement between the two measurements. Our method is illustrated in several theoretical scenarios and in an example from oral health on the prevalence of caries in children.
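
    The abstract's two ingredients can be made concrete with a short sketch: the kappa coefficient computed from two replicate binary measurements, and the standard Rogan-Gladen correction that would recover the true prevalence if the misclassification probabilities were known. The paper's identification intervals arise by ranging over all sensitivity/specificity pairs compatible with the observed kappa; that derivation is not reproduced here, and all names and numbers below are illustrative.

```python
import numpy as np

def cohen_kappa(y1, y2):
    """Agreement between two binary replicate measurements, corrected for chance."""
    y1, y2 = np.asarray(y1), np.asarray(y2)
    p_obs = np.mean(y1 == y2)                      # observed agreement
    p1, p2 = y1.mean(), y2.mean()
    p_exp = p1 * p2 + (1 - p1) * (1 - p2)          # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)

def rogan_gladen(p_apparent, sensitivity, specificity):
    """Corrected prevalence, valid only if the misclassification rates were known."""
    return (p_apparent + specificity - 1) / (sensitivity + specificity - 1)

rng = np.random.default_rng(0)
true = rng.random(1000) < 0.3                      # latent true status, 30% prevalence
se, sp = 0.9, 0.95                                 # hypothetical test characteristics
y1 = np.where(true, rng.random(1000) < se, rng.random(1000) < 1 - sp)
y2 = np.where(true, rng.random(1000) < se, rng.random(1000) < 1 - sp)

print("kappa:", cohen_kappa(y1, y2))
print("apparent prevalence:", y1.mean())
print("corrected prevalence:", rogan_gladen(y1.mean(), se, sp))
```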

    Construction and Validation of a 14-Year Cardiovascular Risk Score for Use in the General Population: The Puras-GEVA Chart

    The current cardiovascular risk tables are based on a 10-year period and therefore do not allow for predictions in the short or medium term, so more aggressive therapeutic decisions cannot be taken when this risk is very high. Our aim was to develop and validate a predictive model of cardiovascular disease (CVD) that enables calculation of risk in the short, medium, and long term in the general population. A cohort study with 14 years of follow-up (1992–2006) was conducted on a sample drawn randomly from 342,667 inhabitants of a Spanish region. Main outcome: time-to-CVD. The sample was randomly divided into two parts [823 (80%), construction; 227 (20%), validation]. A stepwise Cox model was constructed to determine which baseline variables (age, sex, blood pressure, etc.) were associated with CVD. The model was adapted to a points system, and risk groups were established based on epidemiological criteria (sensitivity and specificity). The risk associated with each score was calculated every 2 years up to a maximum of 14. The estimated model was validated by calculating the C-statistic and comparing observed and expected events. In the construction sample, 76 patients experienced a CVD during follow-up (82 cases per 10,000 person-years). Factors in the model included sex, diabetes, left ventricular hypertrophy, occupational physical activity, age, systolic blood pressure × heart rate, number of cigarettes, and total cholesterol. Validation yielded a C-statistic of 0.886, and the comparison between expected and observed events was not significant (P: 0.49–0.75). We constructed and validated a scoring system able to determine, with very high discriminating power, which patients will develop a CVD in the short, medium, and long term (maximum 14 years). Validation studies are needed for the model constructed. This study has been partially funded by: 1) the Community Board of Castilla-La Mancha, Regional Ministry of Health and Social Affairs (Order of July 3rd, 1992 and Order of September 14th, 1993, both published in Diario Oficial de Castilla-La Mancha, DOCM); 2) a grant from the Foundation for Health Research in Castilla-La Mancha (FISCAM), file number 03069–00.
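
    As a rough illustration of the pipeline described above (a Cox model on baseline variables, a points system derived from the coefficients, and a discrimination check via the C-statistic), the following sketch uses the lifelines library on synthetic data. The variables, coefficients, and the crude points scaling are hypothetical stand-ins, not the Puras-GEVA specification.

```python
import numpy as np
import pandas as pd
from lifelines import CoxPHFitter

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(35, 75, n),
    "sbp": rng.normal(130, 15, n),                  # systolic blood pressure
    "smoker": rng.integers(0, 2, n),
})
# Synthetic survival times driven by the covariates (illustration only)
risk = 0.03 * df["age"] + 0.01 * df["sbp"] + 0.5 * df["smoker"]
df["time"] = rng.exponential(1 / np.exp(risk - risk.mean()), n) * 14
df["event"] = (df["time"] < 14).astype(int)         # events observed within follow-up
df["time"] = df["time"].clip(upper=14)              # administrative censoring at 14 years

cph = CoxPHFitter().fit(df, duration_col="time", event_col="event")
print(cph.summary[["coef", "exp(coef)"]])
print("C-statistic:", cph.concordance_index_)

# A crude points system: scale and round the log-hazard coefficients
points = (cph.params_ / cph.params_.abs().min()).round().astype(int)
print(points)
```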

    Combining multiple observational data sources to estimate causal effects

    The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects by combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators for causal effects, but the big main data can in general only give error-prone estimators. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiency yet preserve the consistency of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators.
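
    One way to read the efficiency-improvement idea is as a control-variate adjustment: the error-prone estimator (which ignores the unmeasured confounder) converges to the same biased limit whether computed on the main or the validation data, so the difference between its two versions has mean near zero and can be used to shrink the variance of the consistent validation-data estimator. The sketch below illustrates this logic with regression-imputation estimators on simulated data; it is a schematic reading of the abstract, not the authors' exact procedure.

```python
import numpy as np

rng = np.random.default_rng(2)

def reg_imputation_ate(X, T, Y):
    """Regression-imputation ATE: fit a linear outcome model per arm, average predictions."""
    Xd = np.column_stack([np.ones(len(X)), X])
    b1, *_ = np.linalg.lstsq(Xd[T == 1], Y[T == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Xd[T == 0], Y[T == 0], rcond=None)
    return np.mean(Xd @ b1 - Xd @ b0)

def simulate(n):
    """Confounders X (always observed) and U (observed only in validation data)."""
    X, U = rng.normal(size=n), rng.normal(size=n)
    T = (rng.random(n) < 1 / (1 + np.exp(-(X + U)))).astype(int)
    Y = 1.0 * T + X + U + rng.normal(size=n)        # true ATE = 1
    return X, U, T, Y

Xm, Um, Tm, Ym = simulate(10000)                    # big main data (U hidden in practice)
Xv, Uv, Tv, Yv = simulate(500)                      # small validation data (U observed)

tau_val = reg_imputation_ate(np.column_stack([Xv, Uv]), Tv, Yv)  # consistent
tau_ep_val = reg_imputation_ate(Xv[:, None], Tv, Yv)             # error-prone, validation
tau_ep_main = reg_imputation_ate(Xm[:, None], Tm, Ym)            # error-prone, main

# Bootstrap the validation data to estimate the optimal control-variate weight.
taus, eps = [], []
for _ in range(200):
    i = rng.integers(0, len(Xv), len(Xv))
    taus.append(reg_imputation_ate(np.column_stack([Xv[i], Uv[i]]), Tv[i], Yv[i]))
    eps.append(reg_imputation_ate(Xv[i][:, None], Tv[i], Yv[i]))
cov = np.cov(taus, eps)
gamma = cov[0, 1] / cov[1, 1]

tau_combined = tau_val - gamma * (tau_ep_val - tau_ep_main)
print("validation-only:", tau_val, " combined:", tau_combined)
```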

    A General Framework for Updating Belief Distributions

    We propose a framework for general Bayesian inference. We argue that a valid update of a prior belief distribution to a posterior can be made for parameters which are connected to observations through a loss function rather than the traditional likelihood function, which is recovered as the special case of the self-information loss. Modern application areas make it increasingly challenging for Bayesians to attempt to model the true data-generating mechanism. Moreover, when the object of interest is low dimensional, such as a mean or median, it is cumbersome to have to achieve this via a complete model for the whole data distribution. More importantly, there are settings where the parameter of interest does not directly index a family of density functions, and the Bayesian approach to learning about such parameters is then currently regarded as problematic. Our proposed framework uses loss functions to connect information in the data to functionals of interest. The updating of beliefs then follows from a decision-theoretic approach involving cumulative loss functions. Importantly, the procedure coincides with Bayesian updating when a true likelihood is known, yet provides coherent subjective inference in much more general settings. Connections to other inference frameworks are highlighted. Comment: this is the pre-peer-reviewed version of the article "A General Framework for Updating Belief Distributions", which has been accepted for publication in the Journal of the Royal Statistical Society: Series B. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
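
    The update this framework describes is often written as a Gibbs posterior, π(θ | x) ∝ π(θ) exp{−w Σᵢ ℓ(θ, xᵢ)}, which reduces to Bayes' rule when ℓ is the self-information (negative log-likelihood) loss. The grid-based sketch below targets a median via the absolute loss; the loss scale w is fixed at 1 purely for illustration, whereas its calibration is a substantive question in this literature.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_t(df=3, size=200) + 2.0            # data; target functional: the median (about 2)

theta = np.linspace(-2, 6, 2001)                    # grid over the parameter of interest
prior = np.exp(-0.5 * (theta / 10) ** 2)            # vague Gaussian prior, unnormalised
w = 1.0                                             # loss scale ("learning rate"), set by fiat here

# Cumulative absolute loss connects theta to the data; no likelihood is required.
loss = np.abs(x[:, None] - theta[None, :]).sum(axis=0)
post = prior * np.exp(-w * (loss - loss.min()))     # subtract the minimum for numerical stability
post /= np.trapz(post, theta)                       # normalise on the grid

print("general-Bayes point estimate:", theta[post.argmax()])
print("sample median:", np.median(x))
```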

    Prognostic Predictive Model to Estimate the Risk of Multiple Chronic Diseases: Constructing Copulas Using Electronic Medical Record Data

    Introduction: Multimorbidity, the presence of two or more chronic diseases in an individual, is a pressing medical concern, and novel prevention methods are required to reduce its incidence. Prognostic predictive models estimate a patient's risk of developing chronic disease. This thesis developed a single predictive model for three diseases associated with multimorbidity: diabetes, hypertension, and osteoarthritis. Methods: Univariate logistic regression models were constructed, followed by an analysis of the dependence between the diseases using copulas. All analyses were based on data from the Canadian Primary Care Sentinel Surveillance Network. Results: All univariate models were highly predictive, as demonstrated by their discrimination and calibration. Copula models revealed the dependence between each disease pair. Discussion: By estimating the risk of multiple chronic diseases, prognostic predictive models may enable the prevention of chronic disease through identification of high-risk individuals or delivery of individualized risk assessments to inform patient and health care provider decision-making.
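
    To make the copula step concrete: once the univariate models give a patient marginal risks for two diseases, a copula turns those marginals into a joint risk. The sketch below uses a Gaussian copula (the abstract does not state which copula family the thesis used) with hypothetical marginal risks; scipy supplies the bivariate normal CDF.

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

def joint_risk(p_a, p_b, rho):
    """P(both diseases) when the marginal risks are coupled by a Gaussian copula."""
    z = [norm.ppf(p_a), norm.ppf(p_b)]              # map marginal risks to normal quantiles
    cov = [[1.0, rho], [rho, 1.0]]
    return multivariate_normal(mean=[0.0, 0.0], cov=cov).cdf(z)

# Hypothetical marginal risks, as would come from two univariate logistic models
p_diabetes, p_hypertension = 0.12, 0.25
for rho in (0.0, 0.3, 0.6):                          # copula dependence parameter
    p_both = joint_risk(p_diabetes, p_hypertension, rho)
    print(f"rho={rho}: P(both)={p_both:.4f}  independence={p_diabetes * p_hypertension:.4f}")
```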

    A radiologic-laparoscopic model to predict suboptimal (or complete and optimal) debulking surgery in advanced ovarian cancer: a pilot study

    Introduction: Medical models assist clinicians in making diagnostic and prognostic decisions in complex situations. In advanced ovarian cancer, medical models could help prevent unnecessary exploratory surgery. We designed two models to predict suboptimal versus complete and optimal cytoreductive surgery in patients with advanced ovarian cancer. Methods: We collected clinical, pathological, surgical, and residual-tumor data from 110 patients with advanced ovarian cancer. Computed tomographic and laparoscopic data from these patients were used to determine the peritoneal cancer index (PCI) and a lesion size score. These data were then used to construct two-by-two contingency tables and our two predictive models. Each model included three risk score levels; the R4 model also included operative PCI, while the R3 model did not. Finally, we used the original patient data to validate the models (narrow validation). Results: Our models predicted suboptimal versus complete and optimal cytoreductive surgery with a sensitivity of 83% (R4 model) and 69% (R3 model). Our results also showed that PCI > 20 was a major risk factor for unresectability. Conclusion: Our medical models successfully predicted suboptimal versus complete and optimal cytoreductive surgery in 110 patients with advanced ovarian cancer. Our models are easy to construct from readily available test data, simple to use clinically, and could reduce unnecessary exploratory surgery in this patient group.
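
    The evaluation step reported above (sensitivity derived from two-by-two contingency tables, with PCI > 20 flagged as a major risk factor) can be sketched as follows; the simulated data and the single-cutoff classifier are illustrative stand-ins, not the paper's R3/R4 scoring rules.

```python
import numpy as np

def diagnostics(pred_subopt, actual_subopt):
    """Sensitivity and specificity of a binary predictor of suboptimal surgery."""
    pred = np.asarray(pred_subopt, bool)
    act = np.asarray(actual_subopt, bool)
    tp, fn = np.sum(pred & act), np.sum(~pred & act)
    tn, fp = np.sum(~pred & ~act), np.sum(pred & ~act)
    return tp / (tp + fn), tn / (tn + fp)

# Illustrative only: classify by the PCI > 20 cutoff on a simulated 110-patient cohort
rng = np.random.default_rng(4)
pci = rng.integers(0, 40, 110)                      # peritoneal cancer index, range 0-39
suboptimal = rng.random(110) < 1 / (1 + np.exp(-(pci - 20) / 4))
sens, spec = diagnostics(pci > 20, suboptimal)
print(f"sensitivity={sens:.2f}  specificity={spec:.2f}")
```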