Partially Identified Prevalence Estimation under Misclassification using the Kappa Coefficient
We discuss prevalence estimation under misclassification, that is, the estimation of the proportion of units having a certain property (being diseased, showing deviant behavior, etc.) from a random sample when the true variable of interest cannot be observed, but a related proxy variable (e.g. the outcome of a diagnostic test) is available. If the misclassification probabilities were known, unbiased prevalence estimation would be possible. We focus on the frequent case where the misclassification probabilities are unknown but two independent replicate measurements have been taken. While in the traditional precise probabilistic framework a correction from this information is not possible due to non-identifiability, the imprecise-probability methodology of partial identification and systematic sensitivity analysis allows us to obtain valuable insights into possible bias due to misclassification. We derive tight identification intervals and corresponding confidence regions for the true prevalence, based on the often-reported kappa coefficient, which condenses the information in the replicates by measuring agreement between the two measurements. Our method is illustrated in several theoretical scenarios and in an example from oral health, where the issue is prevalence estimation of caries in children.
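A minimal sketch of the quantities involved, assuming the classical Rogan–Gladen misclassification correction swept over a hypothetical grid of sensitivity/specificity values; the paper's tight identification intervals derived from the kappa coefficient itself are not reproduced here, and all counts are invented for illustration:

```python
import numpy as np

def cohens_kappa(table):
    """Cohen's kappa from a 2x2 agreement table of two replicate measurements.
    table[i][j] = count of units classified i by replicate 1, j by replicate 2."""
    t = np.asarray(table, dtype=float)
    n = t.sum()
    p_o = np.trace(t) / n                         # observed agreement
    p_e = (t.sum(axis=1) @ t.sum(axis=0)) / n**2  # chance agreement
    return (p_o - p_e) / (1 - p_e)

def rogan_gladen(p_obs, se, sp):
    """Corrected prevalence when sensitivity/specificity are known."""
    return (p_obs + sp - 1) / (se + sp - 1)

# Hypothetical data: 100 children, caries status rated by two examiners
table = [[20, 5], [7, 68]]
kappa = cohens_kappa(table)

# Systematic sensitivity analysis: sweeping (se, sp) over a plausible region
# yields an interval of prevalence estimates rather than a point estimate.
p_obs = 0.25
grid = [(se, sp) for se in np.linspace(0.8, 0.95, 4)
                 for sp in np.linspace(0.85, 0.99, 4)]
estimates = [rogan_gladen(p_obs, se, sp) for se, sp in grid]
interval = (min(estimates), max(estimates))
```

The width of `interval` makes the possible misclassification bias visible: the observed proportion `p_obs` lies strictly inside it, but so do substantially smaller and larger true prevalences.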
Construction and Validation of a 14-Year Cardiovascular Risk Score for Use in the General Population: The Puras-GEVA Chart
The current cardiovascular risk tables are based on a 10-year period and therefore do not allow predictions in the short or medium term, so more aggressive therapeutic decisions cannot be taken when this risk is very high. Our aim was to develop and validate a predictive model of cardiovascular disease (CVD) enabling calculation of risk in the short, medium, and long term in the general population. A cohort study with 14 years of follow-up (1992–2006) was obtained through random sampling from a population of 342,667 inhabitants of a Spanish region; the main outcome was time to CVD. The sample was randomly divided into two parts [823 (80%), construction; 227 (20%), validation]. A stepwise Cox model was constructed to determine which baseline variables (age, sex, blood pressure, etc.) were associated with CVD. The model was adapted to a points system, and risk groups were established based on epidemiological criteria (sensitivity and specificity). The risk associated with each score was calculated every 2 years up to a maximum of 14. The estimated model was validated by calculating the C-statistic and comparing observed and expected events. In the construction sample, 76 patients experienced a CVD during follow-up (82 cases per 10,000 person-years). Factors in the model included sex, diabetes, left ventricular hypertrophy, occupational physical activity, age, systolic blood pressure × heart rate, number of cigarettes, and total cholesterol. Validation yielded a C-statistic of 0.886, and the comparison between expected and observed events was not significant (P = 0.49–0.75). We constructed and validated a scoring system able to determine, with very high discriminating power, which patients will develop a CVD in the short, medium, and long term (maximum 14 years). Further validation studies of the constructed model are needed.
This study has been partially funded by: 1) the Community Board of Castilla-La Mancha, Regional Ministry of Health and Social Affairs (Order of July 3rd, 1992 and Order of September 14th, 1993, both published in the Diario Oficial de Castilla-La Mancha, DOCM); 2) a grant from the Foundation for Health Research in Castilla-La Mancha (FISCAM), file number 03069–00.
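The "Cox model adapted to a points system" step can be sketched along the lines of the widely used Sullivan points-system method; the coefficients, reference unit, and baseline survival below are hypothetical, not the Puras-GEVA values:

```python
import math

# Hypothetical Cox coefficients (log hazard ratios) per unit of each factor
betas = {"age_per_10y": 0.55, "diabetes": 0.70, "sbp_per_20mmHg": 0.30}

# Reference: 1 point = the hazard effect of 10 years of age
B = betas["age_per_10y"]

# Sullivan-style conversion: points for a factor = its coefficient / B, rounded
points = {k: round(b / B) for k, b in betas.items()}

def risk_at(total_points, s0_t=0.97):
    """Absolute risk at time t from a points total, given baseline survival
    S0(t): risk = 1 - S0(t) ** exp(total_points * B)."""
    return 1 - s0_t ** math.exp(total_points * B)
```

Tabulating `risk_at` for each achievable score at 2-year intervals reproduces the kind of score-to-risk chart the abstract describes.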
Combining multiple observational data sources to estimate causal effects
The era of big data has witnessed an increasing availability of multiple data sources for statistical analyses. We consider estimation of causal effects by combining big main data with unmeasured confounders and smaller validation data with supplementary information on these confounders. Under the unconfoundedness assumption with completely observed confounders, the smaller validation data allow for constructing consistent estimators for causal effects, but the big main data can in general only give error-prone estimators. However, by leveraging the information in the big main data in a principled way, we can improve the estimation efficiencies while preserving the consistencies of the initial estimators based solely on the validation data. Our framework applies to asymptotically normal estimators, including the commonly used regression imputation, weighting, and matching estimators, and does not require a correct specification of the model relating the unmeasured confounders to the observed variables. We also propose appropriate bootstrap procedures, which make our method straightforward to implement using software routines for existing estimators.
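The efficiency-improvement idea can be sketched as a control-variates-style adjustment; the function and numbers below are a hypothetical illustration of the general mechanism, not the paper's exact estimator:

```python
def control_variate_combine(theta_val, ep_val, ep_main, gamma):
    """Adjust the consistent-but-noisy validation-data estimator by the
    difference of an error-prone estimator computed on both samples.
    Because ep_val - ep_main converges to zero, consistency is preserved;
    choosing gamma = cov(theta_val, ep_val - ep_main) / var(ep_val - ep_main),
    e.g. from a bootstrap, minimizes the adjusted estimator's variance."""
    return theta_val - gamma * (ep_val - ep_main)

# Hypothetical numbers: validation-only effect estimate 1.00; error-prone
# estimate 0.50 on the validation sample vs. 0.40 on the big main sample
theta_adj = control_variate_combine(1.00, 0.50, 0.40, gamma=0.5)  # 0.95
```

The adjustment pulls the noisy validation estimate toward what the big sample implies, without requiring the error-prone estimator to be unbiased.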
A General Framework for Updating Belief Distributions
We propose a framework for general Bayesian inference. We argue that a valid update of a prior belief distribution to a posterior can be made for parameters which are connected to observations through a loss function rather than the traditional likelihood function, which is recovered under the special case of using the self-information loss. Modern application areas make it increasingly challenging for Bayesians to attempt to model the true data-generating mechanism. Moreover, when the object of interest is low dimensional, such as a mean or median, it is cumbersome to have to achieve this via a complete model for the whole data distribution. More importantly, there are settings where the parameter of interest does not directly index a family of density functions, and thus the Bayesian approach to learning about such parameters is currently regarded as problematic. Our proposed framework uses loss functions to connect information in the data to functionals of interest. The updating of beliefs then follows from a decision-theoretic approach involving cumulative loss functions. Importantly, the procedure coincides with Bayesian updating when a true likelihood is known, yet provides coherent subjective inference in much more general settings. Connections to other inference frameworks are highlighted.
Comment: This is the pre-peer-reviewed version of the article "A General Framework for Updating Belief Distributions", which has been accepted for publication in the Journal of the Royal Statistical Society: Series B. This article may be used for non-commercial purposes in accordance with Wiley Terms and Conditions for Self-Archiving.
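A minimal grid-based sketch of a loss-driven belief update of this kind, assuming a flat prior and a fixed loss weight `w` (the calibration of `w` is a separate question the framework addresses):

```python
import numpy as np

def loss_posterior(data, loss, theta_grid, log_prior, w=1.0):
    """Belief update: pi(theta | data) proportional to
    exp(-w * cumulative loss over the data) * pi(theta), on a grid.
    With the self-information loss (negative log-likelihood) and w = 1
    this reduces to standard Bayesian updating."""
    cum_loss = np.array([sum(loss(th, x) for x in data) for th in theta_grid])
    log_post = log_prior(theta_grid) - w * cum_loss
    log_post -= log_post.max()          # stabilize before exponentiating
    post = np.exp(log_post)
    return post / post.sum()            # normalize to unit mass on the grid

# Learning about a median directly, via absolute-error loss, with no
# likelihood model for the whole data distribution
data = [1.0, 2.0, 2.5, 3.0, 10.0]
grid = np.linspace(-5.0, 15.0, 2001)
post = loss_posterior(data, lambda th, x: abs(th - x), grid,
                      log_prior=lambda th: np.zeros_like(th))
posterior_mode = grid[np.argmax(post)]  # sits at the sample median, 2.5
```

The example targets exactly the situation the abstract highlights: the parameter of interest (a median) does not index a family of densities, yet a coherent posterior over it is still produced.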
Prognostic Predictive Model to Estimate the Risk of Multiple Chronic Diseases: Constructing Copulas Using Electronic Medical Record Data
Introduction: Multimorbidity, the presence of two or more chronic diseases in an individual, is a pressing medical condition. Novel prevention methods are required to reduce the incidence of multimorbidity. Prognostic predictive models estimate a patient’s risk of developing chronic disease. This thesis developed a single predictive model for three diseases associated with multimorbidity: diabetes, hypertension, and osteoarthritis.
Methods: Univariate logistic regression models were constructed, followed by an analysis, using copulas, of the dependence between the diseases. All analyses were based on data from the Canadian Primary Care Sentinel Surveillance Network.
Results: All univariate models were highly predictive, as demonstrated by their discrimination and calibration. Copula models revealed the dependence between each disease pair.
Discussion: By estimating the risk of multiple chronic diseases, prognostic predictive models may enable the prevention of chronic disease through identification of high-risk individuals or delivery of individualized risk assessments to inform patient and health care provider decision-making.
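How a copula joins univariate risk estimates into a joint risk can be sketched with a Gaussian copula (a common choice; the thesis does not state the copula family used here, and the marginal risks and dependence parameter below are hypothetical):

```python
from scipy.stats import norm, multivariate_normal

def joint_risk_gaussian_copula(p1, p2, rho):
    """Joint probability of two events from their marginal risks p1, p2 via
    a Gaussian copula: C(p1, p2) = Phi2(Phi^-1(p1), Phi^-1(p2); rho)."""
    z1, z2 = norm.ppf(p1), norm.ppf(p2)
    return multivariate_normal(mean=[0.0, 0.0],
                               cov=[[1.0, rho], [rho, 1.0]]).cdf([z1, z2])

# Hypothetical marginal risks from two univariate logistic models
p_diabetes, p_hypertension = 0.10, 0.25
independent = p_diabetes * p_hypertension  # joint risk if independent
dependent = joint_risk_gaussian_copula(p_diabetes, p_hypertension, rho=0.5)
# positive dependence raises the joint risk above the independence product
```

This is the practical payoff of the copula step: the joint multimorbidity risk can differ substantially from the product of the univariate model outputs.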
A radiologic-laparoscopic model to predict suboptimal (or complete and optimal) debulking surgery in advanced ovarian cancer: a pilot study
Introduction: Medical models assist clinicians in making diagnostic and prognostic decisions in complex situations. In advanced ovarian cancer, medical models could help prevent unnecessary exploratory surgery. We designed two models to predict suboptimal or complete and optimal cytoreductive surgery in patients with advanced ovarian cancer.
Methods: We collected clinical, pathological, surgical, and residual tumor data from 110 patients with advanced ovarian cancer. Computed tomographic and laparoscopic data from these patients were used to determine peritoneal cancer index (PCI) and lesion size score. These data were then used to construct two-by-two contingency tables and our two predictive models. Each model included three risk score levels; the R4 model also included operative PCI, while the R3 model did not. Finally, we used the original patient data to validate the models (narrow validation).
Results: Our models predicted suboptimal or complete and optimal cytoreductive surgery with a sensitivity of 83% (R4 model) and 69% (R3 model). Our results also showed that PCI > 20 was a major risk factor for unresectability.
Conclusion: Our medical models successfully predicted suboptimal or complete and optimal cytoreductive surgery in 110 patients with advanced ovarian cancer. Our models are easy to construct, based on readily available laboratory test data, simple to use clinically, and could reduce unnecessary exploratory surgery in this patient group.
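The sensitivity figures quoted above come from two-by-two contingency tables of model prediction against surgical outcome; computing them is straightforward (the counts below are illustrative only, not the study's data):

```python
def sens_spec(tp, fn, fp, tn):
    """Sensitivity and specificity from a two-by-two contingency table:
    tp/fn = suboptimal surgeries the model did/did not flag,
    fp/tn = non-suboptimal surgeries the model did/did not flag."""
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical counts for illustration
se, sp = sens_spec(tp=25, fn=5, fp=8, tn=72)  # se ~ 0.83, sp = 0.90
```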