
    A Metadata Manifesto: The Need for Global Health Metadata

    Administrative health data recorded for individual health episodes (such as births, deaths, physician visits, and hospital stays) are widely used to study policy-relevant scientific questions about population health, health services, and quality of care. Furthermore, an increasing number of international health comparisons are being undertaken with these data. An essential prerequisite to such international comparative work is a detailed characterization of existing international health data resources, so that they can be more readily used in comparative studies across countries. A major challenge to such work is the variability across countries in the extent, content, and validity of existing administrative data holdings. Recognizing this, we have undertaken an international pilot process of compiling detailed data about data – i.e., a “metadata catalogue” – for existing international administrative health data holdings. The methodological process for collecting these metadata is described here, along with some general descriptive results for selected countries included in the pilot.

    Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data.

    OBJECTIVE: Data quality assessment is a challenging facet of research using coded administrative health data. Current assessment approaches are time and resource intensive. We explored whether association rule mining (ARM) can be used to develop rules for assessing data quality. MATERIALS AND METHODS: We extracted 2013 and 2014 records from the hospital discharge abstract database (DAD) for patients between the ages of 55 and 65 from five acute care hospitals in Alberta, Canada. ARM was conducted on the 2013 DAD to extract rules with support ≥0.0019 and confidence ≥0.5 using the bootstrap technique, and the rules were tested in the 2014 DAD. The rules were compared against the coding-frequency method and assessed for their ability to detect errors introduced by two kinds of data manipulation: random permutation and random deletion. RESULTS: The association rules generally had clear clinical meanings. Comparing the 2014 data to the 2013 data (both original), there were 3 rules with a confidence difference >0.1, while the coding frequency difference of codes on the right-hand side of rules was less than 0.004. After random permutation of 50% of codes in the 2014 data, average rule confidence dropped from 0.72 to 0.27 while coding frequency remained unchanged. Rule confidence decreased as coding deletion increased, as expected. Rule confidence was more sensitive to code deletion than coding frequency, with slopes of change ranging from 1.7 to 184.9 (median 9.1). CONCLUSION: ARM is a promising technique for assessing data quality. It offers a systematic way to derive coding association rules hidden in data, and potentially provides a more sensitive and efficient method of assessing data quality than standard methods.
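
    The support and confidence thresholds above map directly onto standard ARM tooling. As a minimal sketch of the technique (not the study's pipeline; the `mlxtend` dependency and the toy ICD-10 code sets are assumptions for illustration), rules can be mined from per-admission diagnosis code sets like this:

```python
# Minimal association-rule-mining sketch using mlxtend (assumed dependency);
# illustrative only, not the study's code. Each "transaction" is the set of
# ICD-10 codes recorded on one discharge abstract.
import pandas as pd
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

admissions = [
    ["E11.9", "I10", "N18.3"],  # diabetes, hypertension, CKD (toy data)
    ["E11.9", "I10"],
    ["I10", "N18.3"],
    ["E11.9", "I10", "N18.3"],
]

te = TransactionEncoder()
onehot = pd.DataFrame(te.fit(admissions).transform(admissions),
                      columns=te.columns_)

# The paper's thresholds were support >= 0.0019 and confidence >= 0.5;
# a higher support is used here so the four toy records yield itemsets.
itemsets = apriori(onehot, min_support=0.5, use_colnames=True)
rules = association_rules(itemsets, metric="confidence", min_threshold=0.5)
print(rules[["antecedents", "consequents", "support", "confidence"]])
```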

    Data on coding association rules from an inpatient administrative health data coded by International classification of disease - 10th revision (ICD-10) codes

    Data presented in this article relate to the research article entitled “Exploration of association rule mining for coding consistency and completeness assessment in inpatient administrative health data” (Peng et al. [1], in preparation). We provide a set of ICD-10 coding association rules for the age group of 55 to 65. The rules were extracted from inpatient administrative health data at five acute care hospitals in Alberta, Canada, using association rule mining. Thresholds of support and confidence for the association rule mining process were set at 0.19% and 50%, respectively. The data set contains 426 rules, of which 86 are not nested. Data are provided in the supplementary material. The presented coding association rules provide a reference for future research on the use of association rule mining for data quality assessment.
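
    The notion of a "nested" rule is not defined in this abstract; under one plausible reading (a rule is nested if another rule subsumes it, i.e., has a superset antecedent and the same or a superset consequent), the non-nested subset could be isolated with a sketch like the following, where the nesting criterion itself is an assumption:

```python
# Hypothetical filter for non-nested rules; the nesting criterion here is an
# assumed reading, not the paper's definition.
def non_nested(rules):
    """rules: list of (antecedent, consequent) pairs of frozensets of codes."""
    keep = []
    for ant, con in rules:
        nested = any(
            (ant, con) != (a2, c2) and ant <= a2 and con <= c2
            for a2, c2 in rules
        )
        if not nested:
            keep.append((ant, con))
    return keep

rules = [
    (frozenset({"E11.9"}), frozenset({"I10"})),            # subsumed below
    (frozenset({"E11.9", "N18.3"}), frozenset({"I10"})),
]
print(non_nested(rules))  # keeps only the more specific rule under this reading
```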

    Serially Combining Epidemiological Designs Does Not Improve Overall Signal Detection in Vaccine Safety Surveillance

    Introduction: Vaccine safety surveillance commonly follows a serial testing approach, with a sensitive method for ‘signal generation’ and a specific method for ‘signal validation.’ The extent to which serial testing in real-world studies improves or hinders overall performance in terms of sensitivity and specificity remains unknown. Methods: We assessed the overall performance of serial testing using three administrative claims databases and one electronic health record database. We compared type I and II errors before and after empirical calibration for the historical comparator design, the self-controlled case series (SCCS), and the serial combination of those designs against six vaccine exposure groups with 93 negative control and 279 imputed positive control outcomes. Results: The historical comparator design mostly had fewer type II errors than SCCS. SCCS had fewer type I errors than the historical comparator. Before empirical calibration, the serial combination increased specificity and decreased sensitivity; type II errors mostly exceeded 50%. After empirical calibration, type I errors returned to nominal levels; sensitivity was lowest when the methods were combined. Conclusion: While the serial combination produced fewer false-positive signals than the most specific method, it generated more false-negative signals than the most sensitive method. Using a historical comparator design followed by an SCCS analysis yielded decreased sensitivity in evaluating safety signals relative to a one-stage SCCS approach. While the current use of serial testing in vaccine surveillance may provide a practical paradigm for signal identification and triage, single epidemiological designs should be explored as valuable approaches to detecting signals.
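
    The reported trade-off follows from simple probability arithmetic: when a signal must be flagged by both methods in series, false positives multiply away but so do true positives. A back-of-envelope sketch (the operating characteristics below are made-up numbers and assume independence of the two tests, which real designs do not satisfy):

```python
# Illustrative arithmetic for a serial (AND) combination of two tests,
# assuming independence; the numbers are invented, not study estimates.
sens_hist, spec_hist = 0.90, 0.60  # sensitive 'signal generation' method
sens_sccs, spec_sccs = 0.70, 0.90  # specific 'signal validation' method

# A signal is raised only if BOTH methods flag it.
sens_serial = sens_hist * sens_sccs                  # 0.63: below either alone
spec_serial = 1 - (1 - spec_hist) * (1 - spec_sccs)  # 0.96: above either alone

print(f"serial sensitivity = {sens_serial:.2f}")
print(f"serial specificity = {spec_serial:.2f}")
```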

    Bias, Precision and Timeliness of Historical (Background) Rate Comparison Methods for Vaccine Safety Monitoring: An Empirical Multi-Database Analysis

    Using real-world data and past vaccination data, we conducted a large-scale experiment to quantify the bias, precision, and timeliness of different study designs for estimating historical background (expected) rates compared with post-vaccination (observed) rates of safety events for several vaccines. We used negative control outcomes (not causally related to vaccination) and positive control outcomes, the latter being synthetically generated true safety signals with incidence rate ratios ranging from 1.5 to 4. Observed vs. expected analysis using within-database historical background rates is a sensitive but non-specific method for identifying potential vaccine safety signals. Despite good discrimination, most analyses showed a tendency to overestimate risks, with 20%-100% type 1 error but low (0% to 20%) type 2 error in the large databases included in our study. Efforts to improve the comparability of background and post-vaccination rates, including age-sex adjustment and anchoring background rates around a visit, reduced type 1 error and improved precision, but residual systematic error persisted. Additionally, empirical calibration dramatically reduced type 1 error to nominal levels, but at the cost of increased type 2 error.
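
    For readers unfamiliar with the design, an observed-vs-expected analysis reduces to comparing post-vaccination event counts against counts projected from a historical rate. A minimal sketch (the rate, person-time, and count below are invented, and `scipy` is an assumed dependency):

```python
# Minimal observed-vs-expected sketch; all inputs are invented for illustration.
from scipy.stats import poisson

background_rate = 5.0e-5  # historical events per person-year (assumed)
person_years = 200_000    # post-vaccination follow-up time (assumed)
observed = 18             # post-vaccination event count (assumed)

expected = background_rate * person_years  # 10 expected events
oe_ratio = observed / expected             # observed/expected, read as an IRR

# One-sided exact Poisson p-value for seeing >= 18 events when 10 are expected
p_value = poisson.sf(observed - 1, expected)
print(f"O/E = {oe_ratio:.2f}, p = {p_value:.4f}")
```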

    COVID-19 in patients with autoimmune diseases: characteristics and outcomes in a multinational network of cohorts across three countries

    Patients with autoimmune diseases were advised to shield to avoid COVID-19, but information on their prognosis is lacking. We characterised 30-day outcomes and mortality after hospitalisation with COVID-19 among patients with prevalent autoimmune diseases, and compared outcomes with those of similar patients hospitalised with seasonal influenza. A multinational network cohort study was conducted using electronic health records data from Columbia University Irving Medical Center (CUIMC) (United States [US]), Optum (US), the Department of Veterans Affairs (VA) (US), and the Information System for Research in Primary Care-Hospitalisation Linked Data (SIDIAP-H) (Spain), and claims data from IQVIA Open Claims (US) and the Health Insurance Review and Assessment Service (HIRA) (South Korea). All patients with prevalent autoimmune diseases, diagnosed and/or hospitalised with COVID-19 between January and June 2020, and similar patients hospitalised with influenza in 2017-2018 were included. Outcomes were death and complications within 30 days of hospitalisation. We studied 133 589 patients diagnosed and 48 418 hospitalised with COVID-19 with prevalent autoimmune diseases. Most patients were female, aged ≥50 years, with previous comorbidities. The prevalence of hypertension (45.5-93.2%), chronic kidney disease (14.0-52.7%), and heart disease (29.0-83.8%) was higher in hospitalised than in diagnosed patients with COVID-19. Compared with the 70 660 patients hospitalised with influenza, those admitted with COVID-19 had more respiratory complications, including pneumonia and acute respiratory distress syndrome, and higher 30-day mortality (influenza: 2.2% to 4.3% vs COVID-19: 6.3% to 24.6%). Compared with influenza, COVID-19 is a more severe disease, leading to more complications and higher mortality.

    Real‐world evidence to advance knowledge in pulmonary hypertension: Status, challenges, and opportunities. A consensus statement from the Pulmonary Vascular Research Institute's Innovative Drug Development Initiative's Real‐world Evidence Working Group

    This manuscript on real-world evidence (RWE) in pulmonary hypertension (PH) incorporates the broad experience of members of the Pulmonary Vascular Research Institute's Innovative Drug Development Initiative Real-World Evidence Working Group. We aim to strengthen the research community's understanding of RWE in PH to facilitate clinical research advances and ultimately improve patient care. Herein, we review real-world data (RWD) sources, discuss challenges and opportunities when using RWD sources to study PH populations, and identify resources needed to support the generation of meaningful RWE for the global PH community.

    Reproducible variability: assessing investigator discordance across 9 research teams attempting to reproduce the same observational study

    OBJECTIVE: Observational studies can impact patient care but must be robust and reproducible. Nonreproducibility is primarily caused by unclear reporting of design choices and analytic procedures. This study aimed to: (1) assess how the study logic described in an observational study can be interpreted by independent researchers, and (2) quantify the impact of variability in those interpretations on patient characteristics. MATERIALS AND METHODS: Nine teams of highly qualified researchers reproduced a cohort from a study by Albogami et al. The teams were provided with the clinical codes and access to the tools for creating cohort definitions, such that the only variable part was their logic choices. We executed the teams' cohort definitions against the database and compared the number of subjects, patient overlap, and patient characteristics. RESULTS: On average, the teams' interpretations fully aligned with the master implementation in 4 out of 10 inclusion criteria, with at least 4 deviations per team. Cohort sizes varied from one-third of the master cohort to 10 times its size (2159-63 619 subjects compared with 6196 subjects). Median agreement was 9.4% (interquartile range 15.3-16.2%). The teams' cohorts differed significantly from the master implementation in at least 2 baseline characteristics, and most teams differed in at least 5. CONCLUSIONS: Independent research teams attempting to reproduce a study from its free-text description alone produce different implementations that vary in population size and composition. Sharing analytical code supported by a common data model and open-source tools allows a study to be reproduced unambiguously, thereby preserving the initial design choices.
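
    Patient overlap between two implementations of the "same" cohort can be summarised as a set-agreement measure over patient IDs; the sketch below uses Jaccard overlap, which is an assumed choice rather than necessarily the study's metric:

```python
# Cohort agreement as Jaccard overlap of patient ID sets; illustrative only.
def jaccard_agreement(cohort_a: set, cohort_b: set) -> float:
    """Patients common to both cohorts as a share of patients in either."""
    if not cohort_a and not cohort_b:
        return 1.0
    return len(cohort_a & cohort_b) / len(cohort_a | cohort_b)

master = {101, 102, 103, 104, 105, 106}  # master implementation (toy IDs)
team_x = {103, 104, 105, 200, 201}       # one team's reproduction (toy IDs)
print(f"agreement = {jaccard_agreement(master, team_x):.1%}")  # 37.5%
```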