
    Assessing heterogeneity of electronic health-care databases: A case study of background incidence rates of venous thromboembolism

    Purpose: Heterogeneous results from multi-database studies have been observed, for example, in the context of generating background incidence rates (IRs) for adverse events of special interest for SARS-CoV-2 vaccines. In this study, we aimed to explore different between-database sources of heterogeneity influencing the estimated background IR of venous thromboembolism (VTE). Methods: Through forest plots and random-effects models, we performed a qualitative and quantitative assessment of heterogeneity of VTE background IRs derived from 11 databases from 6 European countries, using age- and gender-stratified background IRs for the years 2017–2019 estimated in two studies. Sensitivity analyses were performed to assess the impact of selection criteria on the variability of the reported IRs. Results: A total of 54 257 284 subjects were included in this study. Age- and gender-pooled VTE IRs varied from 5 to 421 per 100 000 person-years, and IRs increased with age for both genders. Wide confidence intervals (CIs) demonstrated considerable within-data-source heterogeneity. Selecting databases with similar characteristics had only a minor impact on the variability, as shown in forest plots and by the magnitude of the I² statistic, which remained large. Solely including databases with both primary care and hospital data resulted in a noticeable decrease in heterogeneity. Conclusions: Large variability in IRs between data sources and within age-group and gender strata underscores the need for stratification and limits the feasibility of a meaningful pooled estimate. More detailed knowledge of data characteristics, the operationalisation of case definitions and the cohort population might support an informed choice of adequate databases for calculating reliable estimates.
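    As a rough illustration of the pooling described under Methods (random-effects models and the I² statistic), the sketch below computes a DerSimonian-Laird pooled incidence rate and I² from made-up per-database event counts; all numbers and variable names are illustrative and are not taken from the study.

```python
import numpy as np

# Hypothetical per-database event counts and person-years for a single
# age-gender stratum (illustrative values only, not the study's data).
events = np.array([120, 85, 240, 60, 310])
person_years = np.array([90_000, 70_000, 150_000, 55_000, 210_000])

# Log incidence rates and their approximate (Poisson) variances.
log_ir = np.log(events / person_years)
var_log_ir = 1.0 / events

# Inverse-variance (fixed-effect) weights and Cochran's Q.
w = 1.0 / var_log_ir
pooled_fe = np.sum(w * log_ir) / np.sum(w)
Q = np.sum(w * (log_ir - pooled_fe) ** 2)
df = len(events) - 1

# DerSimonian-Laird estimate of the between-database variance tau^2.
tau2 = max(0.0, (Q - df) / (np.sum(w) - np.sum(w ** 2) / np.sum(w)))

# Random-effects pooled rate and the I^2 heterogeneity statistic.
w_re = 1.0 / (var_log_ir + tau2)
pooled_re = np.sum(w_re * log_ir) / np.sum(w_re)
i2 = 100 * max(0.0, (Q - df) / Q)

print(f"Pooled IR: {np.exp(pooled_re) * 1e5:.1f} per 100 000 person-years, I^2 = {i2:.0f}%")
```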

    Multiple imputation of incomplete multilevel data using Heckman selection models

    Missing data is a common problem in medical research and is often addressed using multiple imputation. Although traditional imputation methods allow for valid statistical inference when data are missing at random (MAR), their implementation is problematic when the presence of missingness depends on unobserved variables, that is, when the data are missing not at random (MNAR). Unfortunately, this MNAR situation is rather common in observational studies, registries and other sources of real-world data. While several imputation methods have been proposed for individual studies when data are MNAR, their application and validity in large datasets with a multilevel structure remain unclear. We therefore explored the consequences of MNAR data in hierarchical data in depth, and proposed a novel multilevel imputation method for common missing-data patterns in clustered datasets. This method is based on the principles of Heckman selection models and adopts a two-stage meta-analysis approach to impute binary and continuous variables that may be outcomes or predictors and that are systematically or sporadically missing. After evaluating the proposed imputation model in simulated scenarios, we illustrate its use in a cross-sectional community survey to estimate the prevalence of malaria parasitemia in children aged 2–10 years in five regions of Uganda.
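    For intuition only, the sketch below applies the classic single-study Heckman two-step (a probit selection model followed by an outcome regression that includes the inverse Mills ratio) to simulated MNAR data; the article's multilevel, two-stage meta-analytic imputation method is not reproduced here, and all data and names are simulated assumptions.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(42)

# Simulated data: the outcome y is MNAR because selection depends on an error
# term that is correlated with the outcome error (illustrative only).
n = 2_000
x = rng.normal(size=n)                      # covariate in the outcome model
z = rng.normal(size=n)                      # exclusion restriction: affects selection only
u = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.6], [0.6, 1.0]], size=n)
y = 1.0 + 0.5 * x + u[:, 0]
observed = (0.3 + 0.8 * z + u[:, 1]) > 0    # selection indicator

# Step 1: probit model for the probability of being observed.
X_sel = sm.add_constant(np.column_stack([x, z]))
probit = sm.Probit(observed.astype(int), X_sel).fit(disp=0)
xb = X_sel @ probit.params
mills = norm.pdf(xb) / norm.cdf(xb)         # inverse Mills ratio

# Step 2: outcome regression on the observed records, with the Mills ratio as
# an extra covariate correcting for non-random selection.
X_out = sm.add_constant(np.column_stack([x[observed], mills[observed]]))
ols = sm.OLS(y[observed], X_out).fit()
print(ols.params)                           # intercept, slope for x, Mills-ratio coefficient
```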

    Adjusting for misclassification of an exposure in an individual participant data meta-analysis

    A common problem in the analysis of multiple data sources, including individual participant data meta-analysis (IPD-MA), is the misclassification of binary variables. Misclassification may lead to biased estimators of model parameters, even when the misclassification is entirely random. We aimed to develop statistical methods that facilitate unbiased estimation of adjusted and unadjusted exposure-outcome associations and of between-study heterogeneity in IPD-MA, where the extent and nature of exposure misclassification may vary across studies. We present Bayesian methods that allow misclassification of binary exposure variables to depend on study- and participant-level characteristics. In an example of the differential diagnosis of dengue using two variables, where the gold standard measurement for the exposure variable was unavailable for some studies that measured only a surrogate prone to misclassification, our methods yielded more accurate estimates than analyses that were naive with regard to misclassification or that were based on gold standard measurements alone. In a simulation study, the evaluated misclassification model yielded valid estimates of the exposure-outcome association and was more accurate than analyses restricted to gold standard measurements. Our proposed framework can appropriately account for the presence of binary exposure misclassification in IPD-MA. It requires that some studies supply IPD for both the surrogate and the gold standard exposure, and it allows misclassification to follow a random-effects distribution across studies conditional on observed covariates (and the outcome). The proposed methods are most beneficial when few large studies that measured the gold standard are available, and when misclassification is frequent.
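    As a minimal sketch of the kind of measurement model involved, in generic notation (the article's exact parameterisation may differ), the surrogate W is linked to the true exposure X through study-specific sensitivity and specificity that follow a random-effects distribution across studies:

```latex
\begin{align*}
  Y_{ij} \mid X_{ij} &\sim \mathrm{Bernoulli}\big(\operatorname{expit}(\alpha_j + \beta_j X_{ij})\big)
    && \text{(outcome model in study } j\text{)}\\
  W_{ij} \mid X_{ij} &\sim \mathrm{Bernoulli}\big(X_{ij}\,\mathrm{Se}_j + (1 - X_{ij})(1 - \mathrm{Sp}_j)\big)
    && \text{(surrogate given true exposure)}\\
  \big(\operatorname{logit}\mathrm{Se}_j,\ \operatorname{logit}\mathrm{Sp}_j\big) &\sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})
    && \text{(misclassification varies across studies)}
\end{align*}
```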

    Bayesian adjustment for preferential testing in estimating infection fatality rates, as motivated by the COVID-19 pandemic

    A key challenge in estimating the infection fatality rate (IFR), along with its relation to various factors of interest, is determining the total number of cases. The total number of cases is unknown not only because not everyone is tested but also, more importantly, because tested individuals are not representative of the population at large. We refer to the phenomenon whereby infected individuals are more likely to be tested than noninfected individuals as “preferential testing.” An open question is whether it is possible to reliably estimate the IFR without any specific knowledge about the degree to which the data are biased by preferential testing. In this paper we take a partial identifiability approach, formulating clearly where deliberate prior assumptions can be made and presenting a Bayesian model that pools information from different samples. When the model is fit to European data obtained from seroprevalence studies and national official COVID-19 statistics, we estimate the overall COVID-19 IFR for Europe to be 0.53% (95% CI: 0.38% to 0.70%).
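    As a toy numerical illustration (made-up numbers, not the paper's data) of why preferential testing matters: if infected individuals are tested more often than uninfected ones, extrapolating test positivity to the whole population overstates the number of infections and understates the IFR. The paper's Bayesian partial-identifiability model, which pools seroprevalence samples, is not reproduced here.

```python
# Toy example of the bias induced by preferential testing (invented numbers).
population = 1_000_000
true_infections = 50_000
deaths = 265                              # deaths among the infected

p_test_infected = 0.30                    # infected people: 30% get tested
p_test_uninfected = 0.02                  # uninfected people: 2% get tested

tested_pos = true_infections * p_test_infected
tested_neg = (population - true_infections) * p_test_uninfected
positivity = tested_pos / (tested_pos + tested_neg)

# Naively treating the tested as representative of the population overstates
# the number of infections and therefore understates the IFR.
naive_infections = positivity * population
naive_ifr = deaths / naive_infections
true_ifr = deaths / true_infections

print(f"test positivity: {positivity:.1%}")
print(f"naive IFR (representative-testing assumption): {naive_ifr:.3%}")
print(f"true IFR: {true_ifr:.3%}")
```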

    Propensity-based standardization to enhance the validation and interpretation of prediction model discrimination for a target population

    External validation of the discriminative ability of prediction models is of key importance. However, the interpretation of such evaluations is challenging, as the ability to discriminate depends both on the sample characteristics (i.e., case-mix) and on the generalizability of the predictor coefficients, but most discrimination indices do not provide any insight into their respective contributions. To disentangle differences in discriminative ability across external validation samples that are due to a lack of model generalizability from differences in sample characteristics, we propose propensity-weighted measures of discrimination. These weighted metrics, which are derived from propensity scores for sample membership, are standardized for case-mix differences between the model development and validation samples, allowing for a fair comparison of discriminative ability in terms of model characteristics in a target population of interest. We illustrate our methods with the validation of eight prediction models for deep vein thrombosis in 12 external validation data sets and assess our methods in a simulation study. In the illustrative example, propensity score standardization reduced between-study heterogeneity of discrimination, indicating that between-study variability was partially attributable to case-mix. The simulation study showed that only flexible propensity-score methods (allowing for non-linear effects) produced unbiased estimates of model discrimination in the target population, and only when the positivity assumption was met. Propensity score-based standardization may facilitate the interpretation of (heterogeneity in) the discriminative ability of a prediction model as observed across multiple studies, and may guide model-updating strategies for a particular target population. Careful propensity score modeling with attention to non-linear relations is recommended.
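    The sketch below illustrates the general idea with assumed data and a hypothetical membership model: a logistic model estimates the probability that a subject belongs to the development sample given case-mix variables, and the resulting weights enter a pairwise-weighted concordance statistic. The weighting scheme, data, and names are illustrative and not the authors' exact implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_cstat(y, risk, w):
    """Concordance statistic with each case-control pair weighted by w_i * w_j."""
    cases, controls = np.where(y == 1)[0], np.where(y == 0)[0]
    num = den = 0.0
    for i in cases:
        for j in controls:
            pair_w = w[i] * w[j]
            den += pair_w
            num += pair_w * (1.0 if risk[i] > risk[j] else 0.5 if risk[i] == risk[j] else 0.0)
    return num / den

rng = np.random.default_rng(1)
# Hypothetical case-mix variables, outcomes, and model predictions for a
# development (dev) and an external validation (val) sample.
X_dev, X_val = rng.normal(0.0, 1.0, (500, 2)), rng.normal(0.5, 1.2, (400, 2))
y_val = rng.binomial(1, 1 / (1 + np.exp(-X_val[:, 0])))
risk_val = 1 / (1 + np.exp(-(0.9 * X_val[:, 0] + 0.1 * X_val[:, 1])))

# Membership model: probability of belonging to the development sample given
# case-mix; the weights standardize the validation case-mix towards it.
X_all = np.vstack([X_dev, X_val])
member = np.r_[np.ones(len(X_dev)), np.zeros(len(X_val))]
ps = LogisticRegression().fit(X_all, member).predict_proba(X_val)[:, 1]
weights = ps / (1 - ps)                    # odds of development-sample membership

print("unweighted c-statistic:  ", round(weighted_cstat(y_val, risk_val, np.ones(len(y_val))), 3))
print("case-mix standardized c: ", round(weighted_cstat(y_val, risk_val, weights), 3))
```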

    Network Meta-analysis for the Diagnostic Approach to Pathologic Nipple Discharge

    Pathologic nipple discharge (PND) is one of the most common breast-related complaints for referral because of its supposed association with breast cancer. The aim of this network meta-analysis (NMA) was to compare the diagnostic efficacy of ultrasound, mammogram, cytology, magnetic resonance imaging (MRI), and ductoscopy in patients with PND, as well as to determine the best diagnostic strategy to assess the risk of malignancy as a cause of PND. The Cochrane Library, PubMed, and Embase were searched to collect relevant literature from the inception of each of the diagnostic methods until January 27, 2020. The search yielded 1472 original citations, of which 36 studies with 3764 patients were finally included in the analysis. Direct and indirect comparisons were performed using an NMA approach to evaluate the combined odds ratios and to determine the surface under the cumulative ranking curve (SUCRA) for the diagnostic value of the different imaging methods in detecting breast cancer in patients with PND. Additionally, a subgroup meta-analysis comparing ductoscopy to MRI when conventional imaging was negative was also performed. According to this NMA, sensitivity for the detection of malignancy in patients with PND was highest for MRI (83%), followed by ductoscopy (58%), ultrasound (50%), cytology (38%), and mammogram (22%). Specificity was highest for mammogram (93%), followed by ductoscopy (92%), cytology (90%), MRI (76%), and ultrasound (69%). Diagnostic accuracy was highest for ductoscopy (88%), followed by cytology (82%), MRI (77%), mammogram (76%), and ultrasound (65%). The subgroup meta-analysis (comparing ductoscopy to MRI when ultrasound and mammogram were negative) showed no significant difference in sensitivity, but ductoscopy was statistically significantly better with regard to specificity and diagnostic accuracy. The results from this NMA indicate that, although ultrasound and mammogram may remain useful low-cost first choices for the detection of malignancy in patients with PND, ductoscopy outperforms most imaging techniques (especially MRI) and cytology.
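    As a small illustration of the SUCRA summary used for ranking, the sketch below computes SUCRA values from a hypothetical rank-probability matrix; the probabilities are invented and are not the study's results.

```python
import numpy as np

# Rows = diagnostic tests, columns = ranks (1 = best); entries are the
# probability of each test occupying each rank (invented values).
tests = ["MRI", "ductoscopy", "ultrasound", "cytology", "mammogram"]
rank_prob = np.array([
    [0.55, 0.25, 0.12, 0.05, 0.03],
    [0.30, 0.40, 0.20, 0.07, 0.03],
    [0.10, 0.20, 0.35, 0.20, 0.15],
    [0.04, 0.10, 0.20, 0.40, 0.26],
    [0.01, 0.05, 0.13, 0.28, 0.53],
])

# SUCRA = mean of the cumulative ranking probabilities over the first a-1 ranks;
# 1 means a test always ranks best, 0 means it always ranks worst.
a = rank_prob.shape[1]
sucra = rank_prob.cumsum(axis=1)[:, : a - 1].mean(axis=1)
for test, value in zip(tests, sucra):
    print(f"{test:12s} SUCRA = {value:.2f}")
```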

    Current trends in the application of causal inference methods to pooled longitudinal non-randomised data: A protocol for a methodological systematic review

    Introduction: Causal methods have been adopted and adapted across health disciplines, particularly for the analysis of single studies. However, the sample sizes necessary to best inform decision-making are often not attainable with single studies, making pooled individual-level data analysis invaluable for public health efforts. Researchers commonly implement the causal methods prevailing in their home disciplines, and how these are selected, evaluated, implemented and reported may vary widely. To our knowledge, no article has yet evaluated trends in the implementation and reporting of causal methods in studies leveraging individual-level data pooled from several studies. We undertake this review to uncover patterns in the implementation and reporting of causal methods used across disciplines in research focused on health outcomes. We will investigate variations in the methods used to infer causality across disciplines, time and geography, and identify gaps in the reporting of methods to inform the development of reporting standards and the conversation required to effect change. Methods and analysis: We will search four databases (EBSCO, Embase, PubMed, Web of Science) using a search strategy developed with librarians from three universities (Heidelberg University, Harvard University, and University of California, San Francisco). The search strategy includes terms such as 'pool*', 'harmoniz*', 'cohort*', 'observational', and variations on 'individual-level data'. Four reviewers will independently screen articles using Covidence and extract data from the included articles. The extracted data will be analysed descriptively, in tables and graphically, to reveal patterns in methods implementation and reporting. This protocol has been registered with PROSPERO (CRD42020143148). Ethics and dissemination: No ethical approval was required as only publicly available data were used. The results will be submitted as a manuscript to a peer-reviewed journal, disseminated at conferences where relevant, and published as part of doctoral dissertations in Global Health at the Heidelberg University Hospital.

    Application of causal inference methods in individual-participant data meta-analyses in medicine: addressing data handling and reporting gaps with new proposed reporting guidelines

    Observational data provide invaluable real-world information in medicine, but certain methodological considerations are required to derive causal estimates. In this systematic review, we evaluated the methodology and reporting quality of individual-level patient data meta-analyses (IPD-MAs) conducted with non-randomized exposures, published in 2009, 2014, and 2019, that sought to estimate a causal relationship in medicine. We screened over 16,000 titles and abstracts, reviewed 45 full-text articles out of the 167 deemed potentially eligible, and included 29 in the analysis. Unfortunately, we found that causal methodologies were rarely implemented and that reporting was generally poor across studies. Specifically, only three of the 29 articles used quasi-experimental methods, and no study used G-methods to adjust for time-varying confounding. To address these issues, we propose stronger collaborations between physicians and methodologists to ensure that causal methodologies are properly implemented in IPD-MAs. In addition, we put forward a suggested checklist of reporting guidelines for IPD-MAs that utilize causal methods. This checklist could improve reporting and thereby potentially enhance the quality and trustworthiness of IPD-MAs, which can be considered one of the most valuable sources of evidence for health policy.

    Dealing with missing data using the Heckman selection model: methods primer for epidemiologists

    Missing data is a common problem in epidemiologic studies and is often addressed by omitting incomplete records or adopting multiple imputation. Although these methods can produce unbiased estimates of study associations, their validity becomes problematic when data are missing not at random (MNAR) and the missing data mechanism is nonignorable. This situation typically arises when the presence of missing values depends on characteristics of the measurement or recording process, which is common in surveys and in databases with electronic healthcare records. In this article, we discuss the relevance and implementation of Heckman selection models for imputing variables that are missing not at random.
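    For reference, a sketch of the standard Heckman selection model in generic notation (the article may parameterise it differently): an outcome equation and a selection equation with correlated normal errors, where a non-zero error correlation corresponds to nonignorable (MNAR) missingness and the inverse Mills ratio enters the mean of the observed outcomes.

```latex
\begin{align*}
  y_i^{*} &= \mathbf{x}_i^{\top}\boldsymbol{\beta} + \varepsilon_i
      && \text{(outcome equation)}\\
  s_i^{*} &= \mathbf{z}_i^{\top}\boldsymbol{\gamma} + \eta_i,
      \qquad s_i = \mathbf{1}\{s_i^{*} > 0\}
      && \text{(selection: } y_i \text{ observed iff } s_i = 1\text{)}\\
  (\varepsilon_i, \eta_i)^{\top} &\sim
      \mathcal{N}\!\left(\mathbf{0},
      \begin{pmatrix}\sigma^{2} & \rho\sigma\\ \rho\sigma & 1\end{pmatrix}\right)
      && (\rho \neq 0 \Rightarrow \text{nonignorable missingness})\\
  \mathbb{E}[\,y_i \mid s_i = 1\,] &= \mathbf{x}_i^{\top}\boldsymbol{\beta}
      + \rho\sigma\,\lambda(\mathbf{z}_i^{\top}\boldsymbol{\gamma}),
      \qquad \lambda(u) = \phi(u)/\Phi(u)
      && \text{(inverse Mills ratio correction)}
\end{align*}
```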