5 research outputs found

    Estimating the sample mean and standard deviation from commonly reported quantiles in meta-analysis

    Researchers increasingly use meta-analysis to synthesize the results of several studies in order to estimate a common effect. When the outcome variable is continuous, standard meta-analytic approaches assume that the primary studies report the sample mean and standard deviation of the outcome. However, when the outcome is skewed, authors sometimes summarize the data by reporting the sample median and one or both of (i) the minimum and maximum values and (ii) the first and third quartiles, but do not report the mean or standard deviation. To include these studies in meta-analysis, several methods have been developed to estimate the sample mean and standard deviation from the reported summary data. A major limitation of these widely used methods is that they assume that the outcome distribution is normal, which is unlikely to be tenable for studies reporting medians. We propose two novel approaches to estimate the sample mean and standard deviation when data are suspected to be non-normal. Our simulation results and empirical assessments show that the proposed methods often perform better than the existing methods when applied to non-normal data.
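The abstract does not spell out the existing normal-based estimators it refers to, but the most widely used are the Wan et al. (2014) formulas. A minimal sketch, assuming those formulas and the reporting scenarios described above (the function name and interface are illustrative, and this is the existing approach the paper aims to improve on, not the paper's proposed method):

```python
from scipy.stats import norm

def wan_mean_sd(n, median, minimum=None, maximum=None, q1=None, q3=None):
    """Normal-based estimates of the sample mean and SD from reported
    quantiles, using the Wan et al. (2014) formulas."""
    have_range = minimum is not None and maximum is not None
    have_iqr = q1 is not None and q3 is not None
    # Expected standardized half-widths of the range and IQR under normality
    z_range = norm.ppf((n - 0.375) / (n + 0.25))
    z_iqr = norm.ppf((0.75 * n - 0.125) / (n + 0.25))
    if have_range and have_iqr:    # median plus min/max plus quartiles
        mean = (minimum + 2 * q1 + 2 * median + 2 * q3 + maximum) / 8
        sd = ((maximum - minimum) / (4 * z_range)
              + (q3 - q1) / (4 * z_iqr))
    elif have_range:               # median plus min/max only
        mean = (minimum + 2 * median + maximum) / 4
        sd = (maximum - minimum) / (2 * z_range)
    elif have_iqr:                 # median plus quartiles only
        mean = (q1 + median + q3) / 3
        sd = (q3 - q1) / (2 * z_iqr)
    else:
        raise ValueError("need min/max and/or first/third quartiles")
    return mean, sd

# Example: a study reporting n = 50, median 12, Q1 8, Q3 20
print(wan_mean_sd(n=50, median=12, q1=8, q3=20))
```

Because these formulas assume normality, they can be badly biased for the skewed outcomes the abstract describes, which is the gap the proposed methods address.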

    External validation of a shortened screening tool using individual participant data meta-analysis: A case study of the Patient Health Questionnaire-Dep-4

    Shortened versions of self-report questionnaires may be used to reduce respondent burden. When shortened screening tools are used, it is desirable to maintain diagnostic accuracy equivalent to the full-length form. This manuscript presents a case study that illustrates how external data and individual participant data meta-analysis can be used to assess the equivalence in diagnostic accuracy between a shortened and a full-length form. The case study compares the Patient Health Questionnaire-9 (PHQ-9) and a 4-item shortened version (PHQ-Dep-4) that was previously developed using optimal test assembly methods. Using a large database of 75 primary studies (34,698 participants, 3,392 major depression cases), we evaluated whether a PHQ-Dep-4 cutoff of ≥ 4 maintained diagnostic accuracy equivalent to a PHQ-9 cutoff of ≥ 10. In this external validation dataset, a PHQ-Dep-4 cutoff of ≥ 4 maximized the sum of sensitivity and specificity, with sensitivities of 0.88 (95% CI 0.81, 0.93), 0.68 (95% CI 0.56, 0.78), and 0.80 (95% CI 0.73, 0.85) and specificities of 0.79 (95% CI 0.74, 0.83), 0.85 (95% CI 0.78, 0.90), and 0.83 (95% CI 0.80, 0.86) for the semi-structured, fully structured, and MINI reference standard categories, respectively. While equivalence with a PHQ-9 cutoff of ≥ 10 was not established, we found the sensitivity of the PHQ-Dep-4 to be non-inferior to that of the PHQ-9, and its specificity to be marginally smaller.
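As a rough illustration of the accuracy estimates reported above, the sketch below computes sensitivity and specificity at a single cutoff with Wilson 95% CIs in one pooled sample. This is a simplification: the study pools estimates across 75 primary studies with meta-analytic models, and the variable names here are hypothetical.

```python
import numpy as np
from statsmodels.stats.proportion import proportion_confint

def accuracy_at_cutoff(scores, depressed, cutoff):
    """Sensitivity and specificity of `scores >= cutoff` against a
    reference-standard diagnosis, with Wilson 95% CIs.
    Single-sample sketch; the paper pools these across studies."""
    scores = np.asarray(scores)
    depressed = np.asarray(depressed, dtype=bool)
    positive = scores >= cutoff
    tp = int(np.sum(positive & depressed))    # true positives
    fn = int(np.sum(~positive & depressed))   # false negatives
    tn = int(np.sum(~positive & ~depressed))  # true negatives
    fp = int(np.sum(positive & ~depressed))   # false positives
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    sens_ci = proportion_confint(tp, tp + fn, method="wilson")
    spec_ci = proportion_confint(tn, tn + fp, method="wilson")
    return sens, sens_ci, spec, spec_ci

# Usage with hypothetical arrays of PHQ-Dep-4 totals and MDD diagnoses:
# sens, sens_ci, spec, spec_ci = accuracy_at_cutoff(phq_dep4, mdd, cutoff=4)
```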

    An empirical comparison of statistical methods for multiple cut-off diagnostic test accuracy meta-analysis of the Edinburgh postnatal depression scale (EPDS) depression screening tool using published results vs individual participant data

    Background: Selective reporting of results from only well-performing cut-offs leads to biased estimates of accuracy in primary studies of questionnaire-based screening tools and in meta-analyses that synthesize their results. Individual participant data meta-analysis (IPDMA) of sensitivity and specificity at each cut-off via bivariate random-effects models (BREMs) can overcome this problem. However, IPDMA is laborious and depends on successfully obtaining the primary datasets, and BREMs ignore the correlation between cut-offs within primary studies. Methods: We compared the performance of three recent multiple cut-off models, developed by Steinhauser et al., Jones et al., and Hoyer and Kuss, which account for missing cut-offs when meta-analyzing diagnostic accuracy studies with multiple cut-offs, to BREMs fitted at each cut-off. We used data from 22 studies of the accuracy of the Edinburgh Postnatal Depression Scale (EPDS; 4,475 participants, 758 major depression cases). We fitted each of the three multiple cut-off models and BREMs to a dataset with results from only the published cut-offs of each study (published data) and to an IPD dataset with results for all cut-offs (full IPD data). We estimated pooled sensitivity and specificity with 95% confidence intervals (CIs) for each cut-off and the area under the curve. Results: Compared to the BREMs fitted to the full IPD data, the Steinhauser et al., Jones et al., and Hoyer and Kuss models fitted to the published data produced similar receiver operating characteristic curves, though the Hoyer and Kuss model had a lower area under the curve, mainly due to estimating slightly lower sensitivity at lower cut-offs. A similar pattern was observed when fitting the three multiple cut-off models to the full IPD data. Importantly, all models had similar 95% CIs for sensitivity and specificity, and CI width increased with the cut-off for sensitivity and decreased with the cut-off for specificity, even for the BREMs, which treat each cut-off separately. Conclusions: Multiple cut-off models appear to be the preferable methods when only published data are available. While collecting IPD is expensive and time consuming, IPD can facilitate subgroup analyses that cannot be conducted with published data only.
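The multiple cut-off models and BREMs compared here require specialized meta-analytic software, but the IPD side of the comparison can be illustrated with a much simpler participant-pooling sketch: sensitivity and specificity computed at every cut-off, plus a trapezoidal area under the curve. This is an assumption-laden simplification that ignores study clustering, which the paper's random-effects models respect; all names are illustrative.

```python
import numpy as np

def roc_points_from_ipd(scores, cases):
    """Sensitivity and specificity at every integer cut-off of a
    screening score, pooling participants directly (a simplification
    of the study-level modeling done in the paper)."""
    scores = np.asarray(scores)
    cases = np.asarray(cases, dtype=bool)
    # Include max + 1 so the ROC curve reaches (FPR, sens) = (0, 0).
    cutoffs = np.arange(scores.min(), scores.max() + 2)
    sens = np.array([np.mean(scores[cases] >= c) for c in cutoffs])
    spec = np.array([np.mean(scores[~cases] < c) for c in cutoffs])
    return cutoffs, sens, spec

def auc_trapezoid(sens, spec):
    """Area under the ROC curve (x-axis: 1 - specificity) via the
    trapezoidal rule."""
    fpr = 1.0 - spec
    order = np.argsort(fpr)
    x, y = fpr[order], sens[order]
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))

# Usage with hypothetical IPD arrays of EPDS totals and diagnoses:
# cutoffs, sens, spec = roc_points_from_ipd(epds_scores, depression_cases)
# print(auc_trapezoid(sens, spec))
```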

    Probability of major depression classification based on the SCID, CIDI, and MINI diagnostic interviews: a synthesis of three individual participant data meta-analyses

    Introduction: Three previous individual participant data meta-analyses (IPDMAs) reported that, compared to the Structured Clinical Interview for the DSM (SCID), alternative reference standards, primarily the Composite International Diagnostic Interview (CIDI) and the Mini International Neuropsychiatric Interview (MINI), tended to misclassify major depression status when controlling for depression symptom severity. However, there was an important lack of precision in the results. Objective: To compare the odds of major depression classification based on the SCID, CIDI, and MINI. Methods: We included and standardized data from three IPDMA databases. For each IPDMA separately, we fitted binomial generalized linear mixed models to compare the adjusted odds ratios (aORs) of major depression classification, controlling for symptom severity and participant characteristics, and including the interaction between interview and symptom severity. Next, we synthesized results using a DerSimonian-Laird random-effects meta-analysis. Results: In total, 69,405 participants (7,574 [11%] with major depression) from 212 studies were included. Controlling for symptom severity and participant characteristics, the MINI (74 studies; 25,749 participants) classified major depression more often than the SCID (108 studies; 21,953 participants; aOR 1.46; 95% confidence interval [CI] 1.11–1.92). Classification odds for the CIDI (30 studies; 21,703 participants) and the SCID did not differ overall (aOR 1.19; 95% CI 0.79–1.75); however, as screening scores increased, the aOR increased less for the CIDI than for the SCID (interaction aOR 0.64; 95% CI 0.52–0.80). Conclusions: Compared to the SCID, the MINI classified major depression more often, and the odds of a depression classification with the CIDI increased less as symptom levels increased. Interpretation of research that uses diagnostic interviews to classify depression should consider the characteristics of the interview used.
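The DerSimonian-Laird synthesis step can be illustrated with a short sketch. The method-of-moments estimator below is the standard DL procedure; the three log-aORs and variances in the usage example are invented for illustration, not the paper's values.

```python
import numpy as np

def dersimonian_laird(yi, vi):
    """DerSimonian-Laird random-effects pooling of effect estimates
    (here: log adjusted odds ratios) with within-study variances vi."""
    yi, vi = np.asarray(yi, float), np.asarray(vi, float)
    w = 1.0 / vi                          # fixed-effect weights
    y_fe = np.sum(w * yi) / np.sum(w)     # fixed-effect pooled estimate
    q = np.sum(w * (yi - y_fe) ** 2)      # Cochran's Q heterogeneity statistic
    df = len(yi) - 1
    c = np.sum(w) - np.sum(w ** 2) / np.sum(w)
    tau2 = max(0.0, (q - df) / c)         # method-of-moments tau^2, floored at 0
    w_re = 1.0 / (vi + tau2)              # random-effects weights
    y_re = np.sum(w_re * yi) / np.sum(w_re)
    se = np.sqrt(1.0 / np.sum(w_re))
    return y_re, se, tau2

# Illustrative only: pooling three hypothetical log-aORs across IPDMAs
log_aors = np.log([1.3, 1.6, 1.5])
variances = [0.02, 0.05, 0.03]
pooled, se, tau2 = dersimonian_laird(log_aors, variances)
print(np.exp(pooled),                      # pooled aOR
      np.exp(pooled - 1.96 * se),          # lower 95% CI
      np.exp(pooled + 1.96 * se))          # upper 95% CI
```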

    Data-driven methods distort optimal cutoffs and accuracy estimates of depression screening tools: a simulation study using individual participant data

    Objective: To evaluate, across multiple sample sizes, the degree to which data-driven methods result in (1) optimal cutoffs different from the population optimal cutoff and (2) biased accuracy estimates. Study design and setting: A total of 1,000 samples each of size 100, 200, 500, and 1,000 were randomly drawn, to simulate studies of different sample sizes, from a database (n = 13,255) synthesized to assess Edinburgh Postnatal Depression Scale (EPDS) screening accuracy. Optimal cutoffs were selected by maximizing Youden's J (sensitivity + specificity − 1). Optimal cutoffs and accuracy estimates in the simulated samples were compared to population values. Results: Optimal cutoffs in the simulated samples ranged from ≥ 5 to ≥ 17 for n = 100, ≥ 6 to ≥ 16 for n = 200, ≥ 6 to ≥ 14 for n = 500, and ≥ 8 to ≥ 13 for n = 1,000. The percentage of simulated samples identifying the population optimal cutoff (≥ 11) was 30% for n = 100, 35% for n = 200, 53% for n = 500, and 71% for n = 1,000. Mean overestimation of sensitivity and underestimation of specificity were 6.5 percentage points (pp) and −1.3 pp for n = 100, 4.2 pp and −1.1 pp for n = 200, 1.8 pp and −1.0 pp for n = 500, and 1.4 pp and −1.0 pp for n = 1,000. Conclusions: With data-driven methods, small accuracy studies may identify inaccurate optimal cutoffs and overstate accuracy estimates.
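A minimal sketch of the simulation design, with an invented score distribution standing in for the EPDS database (the prevalence and score distributions below are illustrative assumptions, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(0)

def optimal_cutoff_youden(scores, cases, cutoffs):
    """Data-driven cutoff selection: maximize Youden's J = sens + spec - 1."""
    best_c, best_j = None, -np.inf
    for c in cutoffs:
        sens = np.mean(scores[cases] >= c)
        spec = np.mean(scores[~cases] < c)
        j = sens + spec - 1
        if j > best_j:
            best_c, best_j = c, j
    return best_c

# Toy "population": integer scores 0-30, ~12% cases scoring higher on
# average. These distributions are invented for illustration only.
n_pop = 100_000
cases_pop = rng.random(n_pop) < 0.12
scores_pop = np.where(cases_pop,
                      rng.binomial(30, 0.45, n_pop),
                      rng.binomial(30, 0.20, n_pop))
cutoffs = np.arange(1, 31)
pop_cutoff = optimal_cutoff_youden(scores_pop, cases_pop, cutoffs)

# Re-select the "optimal" cutoff in 1,000 small samples; the fraction
# matching the population cutoff illustrates the instability quantified
# in the study.
hits = 0
for _ in range(1000):
    idx = rng.integers(0, n_pop, size=100)   # n = 100 per simulated study
    hits += optimal_cutoff_youden(scores_pop[idx], cases_pop[idx],
                                  cutoffs) == pop_cutoff
print(pop_cutoff, hits / 1000)
```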