73,106 research outputs found

    Approaches to canine health surveillance

    Effective canine health surveillance systems can be used to monitor disease in the general population, to prioritise disorders for strategic control and to focus clinical research, and to evaluate the success of these measures. The key attributes for optimal data collection systems that support canine disease surveillance are representativeness of the general population, validity of disorder data and sustainability. Limitations in these areas present as selection bias, misclassification bias and discontinuation of the system, respectively. Canine health data sources are reviewed to identify their strengths and weaknesses for supporting effective canine health surveillance. Insurance data benefit from large and well-defined denominator populations but are limited by selection bias relating to the clinical events claimed and the animals covered. Veterinary referral clinical data offer good reliability for diagnoses but are limited by referral bias in the disorders and animals included. Primary-care practice data have the advantages of excellent representation of the general dog population and recording at the point of care by veterinary professionals, but may encounter misclassification problems and technical difficulties in managing and analysing large datasets. Questionnaire surveys offer speed and low cost but may suffer from low response rates, poor data validation, recall bias and ill-defined denominator population information. Canine health scheme data benefit from well-characterised disorder and animal data but reflect selection bias during the voluntary submission process. Formal UK passive surveillance systems are limited by chronic under-reporting and selection bias. It is concluded that active collection systems using secondary health data provide the optimal resource for canine health surveillance.

    Partially Identified Prevalence Estimation under Misclassification using the Kappa Coefficient

    We discuss a new strategy for prevalence estimation in the presence of misclassification. Our method is applicable when misclassification probabilities are unknown but independent replicate measurements are available. These replicates yield the kappa coefficient, which indicates the agreement between the two measurements. From this information alone, a direct correction for misclassification is not feasible because of non-identifiability. However, it is possible to derive estimation intervals based on the concept of partial identification. These intervals give insight into the possible bias due to misclassification, and confidence intervals can also be constructed. Our method is illustrated in several theoretical scenarios and in an example from oral health, where the issue is estimating the prevalence of caries in children.
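
    The starting quantity in the abstract above is the kappa coefficient obtained from two independent replicate measurements. As a minimal illustration only (it is not the authors' partial-identification procedure), the sketch below computes Cohen's kappa from a 2 × 2 table of replicate binary ratings; the function name and the counts are hypothetical.

```python
def cohens_kappa(table):
    """Cohen's kappa for two replicate binary ratings.

    table[i][j] counts subjects rated i by the first measurement and j by
    the second (0 = condition absent, 1 = condition present).
    """
    n = sum(sum(row) for row in table)
    # Observed agreement: share of subjects on the diagonal.
    p_obs = (table[0][0] + table[1][1]) / n
    # Share rated "present" by the first and by the second measurement.
    first_pos = (table[1][0] + table[1][1]) / n
    second_pos = (table[0][1] + table[1][1]) / n
    # Agreement expected by chance if the two ratings were independent.
    p_exp = first_pos * second_pos + (1 - first_pos) * (1 - second_pos)
    return (p_obs - p_exp) / (1 - p_exp)


# Hypothetical replicate caries assessments of 200 children.
table = [[120, 10],
         [15, 55]]
kappa = cohens_kappa(table)
naive_prevalence = (table[1][0] + table[1][1]) / 200
print(f"kappa = {kappa:.3f}, naive prevalence = {naive_prevalence:.3f}")
```

    Kappa summarises agreement but does not by itself say how far the naive prevalence is from the truth, which is why the abstract turns to estimation intervals rather than a point correction.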

    Matched-Pair Studies with Misclassified Ordinal Data

    The problem of matched-pair studies with misclassified ordinal data is considered. Misclassification is assumed to occur only between adjacent categories (columns/rows). A bias-adjusted generalized odds ratio and a test for marginal homogeneity are presented to account for misclassification bias. Data from the lambing records of 227 Merino ewes are used to illustrate how to calculate these bias-adjusted estimators; because validation data are not available, a sensitivity analysis is conducted.
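
    The idea of adjacent-only misclassification can be sketched with a tridiagonal matrix M[observed][true]. In this hypothetical example, rows and columns of a matched-pair table are misclassified independently with the same M, so the observed table is M P M^T. Inverting M for a range of assumed error rates gives a simple sensitivity analysis. Several elements are assumptions rather than the paper's estimator: the single rate eps, the generic matrix-inversion correction, the definition of the generalized odds ratio as P(second > first) / P(first > second), and all numbers.

```python
def misclassification_matrix(k, eps):
    """Hypothetical k x k matrix M[observed][true]: each true category is
    recorded as each adjacent category with probability eps (eps < 0.5)."""
    m = [[0.0] * k for _ in range(k)]
    for t in range(k):
        neighbours = [o for o in (t - 1, t + 1) if 0 <= o < k]
        for o in neighbours:
            m[o][t] = eps
        m[t][t] = 1.0 - eps * len(neighbours)
    return m


def matmul(a, b):
    return [[sum(a[i][r] * b[r][j] for r in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small matrices only)."""
    n = len(a)
    aug = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        pivot = aug[c][c]
        aug[c] = [x / pivot for x in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[c])]
    return [row[n:] for row in aug]


def generalized_odds_ratio(p):
    """Assumed definition: upper-triangle mass (second > first) divided by
    lower-triangle mass (first > second) of the square table."""
    k = len(p)
    upper = sum(p[i][j] for i in range(k) for j in range(i + 1, k))
    lower = sum(p[i][j] for i in range(k) for j in range(i))
    return upper / lower


# Hypothetical true joint distribution of (first, second) ordinal scores.
true_p = [[0.20, 0.10, 0.05],
          [0.04, 0.25, 0.10],
          [0.01, 0.05, 0.20]]
M = misclassification_matrix(3, 0.10)
# Row and column scores misclassified independently: observed = M P M^T.
obs_p = matmul(matmul(M, true_p), transpose(M))

# Sensitivity analysis: adjust the observed table for several assumed eps.
for eps in (0.0, 0.05, 0.10):
    M_inv = inverse(misclassification_matrix(3, eps))
    adj_p = matmul(matmul(M_inv, obs_p), transpose(M_inv))
    print(eps, round(generalized_odds_ratio(adj_p), 3))
```

    With eps = 0.10, the value used to generate the table, the adjustment recovers the true ratio of 2.5. The unadjusted table (eps = 0) gives about 1.784, attenuated towards 1. Large assumed error rates can produce negative adjusted cell probabilities, which would signal an implausible scenario.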

    Inducing safer oblique trees without costs

    Decision tree induction has been widely studied and applied. In safety applications, such as determining whether a chemical process is safe or whether a person has a medical condition, the cost of misclassification in one class is significantly higher than in the other. Several authors have tackled this problem by developing cost-sensitive decision tree learning algorithms, or have suggested ways of changing the distribution of training examples to bias the learning process so that it takes costs into account. A prerequisite for applying such algorithms is that the costs of misclassification are available. Although this may be possible for some applications, obtaining reasonable estimates of misclassification costs is not easy in the area of safety. This paper presents a new algorithm for applications where the cost of misclassification cannot be quantified, although the cost of misclassification in one class is known to be significantly higher than in the other. The algorithm uses linear discriminant analysis to identify oblique relationships between continuous attributes and then applies an appropriate modification to ensure that the resulting tree errs on the side of safety. The algorithm is evaluated against one of the best-known cost-sensitive algorithms (ICET), a well-known oblique decision tree algorithm (OC1) and an algorithm that uses robust linear programming.
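
    The abstract does not say how the split is modified, so the following is only a hypothetical sketch of the general idea for a single node with two continuous attributes. Fisher's linear discriminant supplies an oblique direction w. The threshold on w·x is then moved so that every training example of the high-cost ("unsafe") class lands on the unsafe side. The threshold rule, the function names and the data points are assumptions, not the paper's algorithm.

```python
def mean(points):
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def fisher_direction(safe, unsafe):
    """Fisher LDA direction w = S_w^{-1} (mu_unsafe - mu_safe) for 2-D data."""
    m0, m1 = mean(safe), mean(unsafe)
    # Pooled within-class scatter matrix [[a, b], [b, d]].
    a = b = d = 0.0
    for pts, m in ((safe, m0), (unsafe, m1)):
        for x, y in pts:
            dx, dy = x - m[0], y - m[1]
            a += dx * dx
            b += dx * dy
            d += dy * dy
    det = a * d - b * b
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    # Closed-form inverse of the symmetric 2x2 scatter matrix applied to diff.
    return [(d * diff[0] - b * diff[1]) / det, (-b * diff[0] + a * diff[1]) / det]


def safe_oblique_split(safe, unsafe):
    """Return a predicate that is True when a point is predicted unsafe."""
    w = fisher_direction(safe, unsafe)

    def score(p):
        return w[0] * p[0] + w[1] * p[1]

    # Hypothetical safety rule: place the threshold at the lowest-scoring
    # unsafe training example instead of midway between the class means.
    threshold = min(score(p) for p in unsafe)
    return lambda p: score(p) >= threshold


safe = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (2.5, 2.2), (1.2, 0.5)]
unsafe = [(3.0, 3.5), (3.5, 2.8), (2.4, 2.6), (4.0, 4.0)]
is_unsafe = safe_oblique_split(safe, unsafe)
print([is_unsafe(p) for p in unsafe], [is_unsafe(p) for p in safe])
```

    Moving the threshold trades extra false alarms on safe examples for no missed unsafe examples on the training data. That is one way to "err on the side of safety" without assigning numeric costs.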

    Asymptotic Variance Estimation for the Misclassification SIMEX

    Most epidemiological studies suffer from misclassification in the response and/or the covariates. Since ignoring misclassification biases the parameter estimates, correcting for such errors is important. For measurement error, the continuous analogue of misclassification, a general approach to bias correction is SIMEX (simulation extrapolation), originally suggested by Cook and Stefanski (1994). Küchenhoff et al. (2005) recently extended this approach to regression models with a possibly misclassified categorical response and/or covariates; the extension is called the MC-SIMEX approach. To assess the importance of a regressor, not only its (corrected) estimate but also its standard error is needed. For the original SIMEX approach, Carroll et al. (1996) developed a method for estimating the asymptotic variance. Here we derive asymptotic variance estimators for the MC-SIMEX approach, extending the methodology of Carroll et al. (1996). We also include the case where the misclassification probabilities are estimated from a validation study. An extensive simulation study shows that our approach performs well. The approach is illustrated with an example from caries research involving a logistic regression model in which the response and a binary covariate are possibly misclassified.
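
    The MC-SIMEX point estimate that these variance estimators build on can be sketched in its simplest setting. The parameter of interest is the prevalence of one binary variable recorded with a known, symmetric misclassification probability. Extra misclassification Π^λ is simulated at several levels λ, the naive estimates are averaged, a quadratic in λ is fitted, and the curve is extrapolated to λ = −1. The quadratic extrapolant, the λ grid, the number of replications, the helper names and the data are assumptions of this sketch. It covers only the point estimate, not the asymptotic variance derived in the paper.

```python
import random


def quadratic_fit(xs, ys):
    """Least-squares (c0, c1, c2) for y = c0 + c1*x + c2*x^2 via normal equations."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    a = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    # Forward elimination, then back substitution on the 3 x 3 system.
    for c in range(3):
        for r in range(c + 1, 3):
            f = a[r][c] / a[c][c]
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
    coef = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        rest = sum(a[r][k] * coef[k] for k in range(r + 1, 3))
        coef[r] = (a[r][3] - rest) / a[r][r]
    return coef


def mc_simex_prevalence(observed, eps, lambdas=(0.5, 1.0, 1.5, 2.0), b=20, seed=1):
    """MC-SIMEX point estimate of the prevalence of a binary variable that is
    recorded with a known, symmetric misclassification probability eps."""
    rng = random.Random(seed)
    xs, ys = [0.0], [sum(observed) / len(observed)]  # lambda = 0: naive estimate
    for lam in lambdas:
        # Off-diagonal entry of Pi^lam for Pi = [[1-eps, eps], [eps, 1-eps]].
        flip = (1 - (1 - 2 * eps) ** lam) / 2
        total = 0.0
        for _ in range(b):
            remeasured = [1 - v if rng.random() < flip else v for v in observed]
            total += sum(remeasured) / len(remeasured)
        xs.append(lam)
        ys.append(total / b)
    c0, c1, c2 = quadratic_fit(xs, ys)
    return c0 - c1 + c2  # extrapolate to lambda = -1 (no misclassification)


# Hypothetical data: true prevalence 0.3, recorded with 10% misclassification.
rng = random.Random(0)
truth = [1 if rng.random() < 0.3 else 0 for _ in range(10000)]
observed = [1 - v if rng.random() < 0.1 else v for v in truth]
naive = sum(observed) / len(observed)
corrected = mc_simex_prevalence(observed, 0.1)
print(round(naive, 3), round(corrected, 3))
```

    The naive estimate is pulled towards 0.5, at roughly 0.34 in expectation. The extrapolated value lands near 0.3 but not exactly on it, because the quadratic only approximates the true extrapolation curve. Standard errors for such an estimate are what the variance estimators in the abstract provide.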

    The existence of standard-biased mortality ratios due to death certificate misclassification - a simulation study based on a true story

    Background: Mortality statistics are used to compare the health status of populations; ideally, they are based on individual death certificates. However, determining the cause of death is error-prone. For example, the determination of cardiovascular disease (CVD) death is characterized by sensitivity (SE) and specificity (SP) below 85%. Furthermore, differential misclassification may be present when target populations are homogeneous. We investigate the bias of standardized mortality ratios (SMR) using real-world data. Methods: CVD mortality of 6378 ethnic German repatriates was assessed and the SMR calculated. Non-differential, age-dependent misclassification was introduced into the data through scenarios with equal SE and SP ranging from 0.7 to 0.85. The bias between the originally reported and the actual SMR was calculated for each pair of values. Additionally, four differential misclassification scenarios were simulated: two extreme scenarios in which both quality criteria varied in the cohort but were fixed at either a higher or a lower value in the reference, and two scenarios with crossed criteria values. Results: Under non-differential misclassification, the bias is always towards the null hypothesis. The lowest bias was 13.5% (SE = SP = 0.85 throughout) and the maximum bias was 40% (SP = 0.7). Under differential misclassification, however, the observed SMR can be misleading. If SP is high but SE low in the cohort, a negative bias of up to −10% can occur. If SE is low but SP is high in the reference, the bias always remains positive. In the opposite case, combined with high SP in the cohort, the bias can reach −30%. Conclusion: SMR values are always biased because death determination has the character of a diagnostic test. In the majority of epidemiological studies, the bias should be towards the null hypothesis (non-differential misclassification). However, caution is needed in the case of differential misclassification, which may arise in studies of homogeneous subgroups and in large prospective cohorts with specifically trained personnel.
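
    The mechanism behind these results can be seen in a single-stratum toy calculation. Deaths certified as CVD are the true CVD deaths detected with probability SE plus non-CVD deaths wrongly certified with probability 1 − SP, in both the cohort and the reference population. The sketch ignores the age stratification used in the study, and the rates, function names and scenarios are hypothetical.

```python
def observed_rate(cvd_rate, other_rate, se, sp):
    """Rate of deaths certified as CVD: true CVD deaths detected with
    sensitivity se plus non-CVD deaths wrongly certified with 1 - sp."""
    return se * cvd_rate + (1 - sp) * other_rate


def smr(cohort, reference, se_c, sp_c, se_r, sp_r):
    """Single-stratum SMR: observed cohort rate divided by the rate expected
    from the reference (equal person-time, so it cancels)."""
    return observed_rate(*cohort, se_c, sp_c) / observed_rate(*reference, se_r, sp_r)


# Hypothetical annual death rates per person: (CVD, all other causes).
cohort = (0.006, 0.004)
reference = (0.005, 0.005)

true_smr = smr(cohort, reference, 1, 1, 1, 1)
print("true SMR:", round(true_smr, 3))
# Non-differential: the same SE and SP in cohort and reference.
for q in (0.85, 0.7):
    print("SE = SP =", q, "->", round(smr(cohort, reference, q, q, q, q), 3))
# Differential: low SE but high SP in the cohort only.
print("differential:", round(smr(cohort, reference, 0.7, 0.95, 0.85, 0.85), 3))
```

    With these made-up rates, the true SMR of 1.2 shrinks to about 1.14 and 1.08 under non-differential misclassification, i.e. towards the null. The differential scenario yields about 0.88, which falls on the other side of 1.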

    No Free Lunch versus Occam's Razor in Supervised Learning

    The No Free Lunch theorems are often used to argue that domain-specific knowledge is required to design successful algorithms. We use algorithmic information theory to argue the case for a universal bias that allows an algorithm to succeed in all interesting problem domains. Additionally, we give a new algorithm for off-line classification, inspired by Solomonoff induction, with good performance on all structured problems under reasonable assumptions. This includes a proof of the efficacy of the well-known heuristic of randomly selecting training data in the hope of reducing misclassification rates. Comment: 16 LaTeX pages, 1 figure

    Misreported schooling and returns to education: Evidence from the UK

    In this paper we study the impact of misreported treatment status on the estimation of causal treatment effects. We characterise the bias that misclassification introduces into the average treatment effect on the treated (ATT) under the assumption of selection on observables. Although the bias of matching-type estimators computed from misclassified data cannot in general be signed, we show that it is most likely to be downward if misclassification does not depend on the variables entering the selection-on-observables assumption, or depends on them only through the propensity score index. We extend the framework to multiple treatments. We provide semi-parametric bounds on the returns to a number of educational qualifications in the UK and, using the unique nature of our data, assess the plausibility of the biases from measurement error and from omitted variables cancelling each other out.
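
    Why misreporting tends to push the estimate down can be seen in the simplest case, with no covariates and a homogeneous effect, so the ATT equals mu1 − mu0. The naive contrast by reported status then equals the true effect multiplied by P(truly treated | reported treated) − P(truly treated | reported untreated). That factor is below 1 whenever reporting is imperfect but informative. This is a hypothetical illustration, not the paper's matching framework, and the parameter names and values are assumptions.

```python
def naive_difference(p, mu0, mu1, fp, fn):
    """Difference in mean outcome by *reported* treatment status.

    p: share truly treated; mu0 / mu1: mean outcome of the truly untreated /
    treated; fp: P(report treated | untreated); fn: P(report untreated | treated).
    """
    # Joint probabilities of (true status, reported status).
    t1_r1, t1_r0 = p * (1 - fn), p * fn
    t0_r1, t0_r0 = (1 - p) * fp, (1 - p) * (1 - fp)
    mean_r1 = (t1_r1 * mu1 + t0_r1 * mu0) / (t1_r1 + t0_r1)
    mean_r0 = (t1_r0 * mu1 + t0_r0 * mu0) / (t1_r0 + t0_r0)
    return mean_r1 - mean_r0


# Hypothetical numbers: the true effect is mu1 - mu0 = 2.
print(naive_difference(p=0.4, mu0=10.0, mu1=12.0, fp=0.05, fn=0.10))
```

    Here 5% false reports of treatment and 10% false reports of non-treatment shrink the estimated effect from 2 to about 1.71. With covariates and matching the sign is no longer guaranteed, which is the situation the abstract analyses.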

    Case-Control Studies with Jointly Misclassified Exposure and Confounding Variables

    This paper addresses 2 × 2 × 2 case-control studies in which the exposure and confounding variables are jointly misclassified. Two scenarios are considered: the classification errors of the exposure and confounding variables are either independent or not independent. Bias-adjusted cell probability estimates that account for the misclassification bias are presented. The effects of misclassification on the crude odds ratio (unstratified or stratified by the confounder), the Mantel-Haenszel summary odds ratio, the confounding component in the crude odds ratio, and the first- and second-order multiplicative interactions are assessed through a sensitivity analysis using data on asthma deaths among patients aged 5 to 45 in New Zealand.
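
    For the independent-errors scenario, one generic way to express joint misclassification of a binary exposure E and confounder C is a 4 × 4 matrix. It is the Kronecker product of the two 2 × 2 classification matrices, acting on cells ordered 2·E + C. Inverting it gives bias-adjusted expected cell counts, from which a Mantel-Haenszel odds ratio can be recomputed. This matrix-inversion sketch is not necessarily the authors' estimator. It also assumes non-differential errors (the same matrices for cases and controls) and uses hypothetical sensitivities, specificities and counts.

```python
def kron(a, b):
    """Kronecker product of two 2x2 matrices (4x4 result, index 2*i + k)."""
    return [[a[i][j] * b[k][l] for j in range(2) for l in range(2)]
            for i in range(2) for k in range(2)]


def inv2(m):
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]


def classification_matrix(se, sp):
    # M[observed][true], index 0 = negative, 1 = positive.
    return [[sp, 1 - se], [1 - sp, se]]


def mantel_haenszel_or(cases, controls):
    """Counts are ordered by cell index 2*exposure + confounder."""
    num = den = 0.0
    for c in (0, 1):
        a, b = cases[2 + c], controls[2 + c]   # exposed in stratum c
        c_, d = cases[c], controls[c]          # unexposed in stratum c
        n = a + b + c_ + d
        num += a * d / n
        den += b * c_ / n
    return num / den


M_E = classification_matrix(0.85, 0.90)  # hypothetical exposure SE, SP
M_C = classification_matrix(0.80, 0.95)  # hypothetical confounder SE, SP
K = kron(M_E, M_C)                       # independent errors
K_inv = kron(inv2(M_E), inv2(M_C))       # (A kron B)^-1 = A^-1 kron B^-1

true_cases, true_controls = [100, 60, 80, 90], [200, 80, 60, 50]
obs_cases, obs_controls = matvec(K, true_cases), matvec(K, true_controls)
adj_cases, adj_controls = matvec(K_inv, obs_cases), matvec(K_inv, obs_controls)
print(mantel_haenszel_or(obs_cases, obs_controls),
      mantel_haenszel_or(adj_cases, adj_controls))
```

    When the errors are not independent, the same inversion applies with a general invertible 4 × 4 matrix in place of the Kronecker product. With real data, the matrix entries become the sensitivity-analysis inputs.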