3 research outputs found

    A case study of an individual participant data meta-analysis of diagnostic accuracy showed that prediction regions represented heterogeneity well

    Abstract: The diagnostic accuracy of a screening tool is often characterized by its sensitivity and specificity. An analysis of these measures must consider their intrinsic correlation. In an individual participant data meta-analysis, heterogeneity is one of the main components of the analysis. When using a random-effects meta-analytic model, prediction regions provide deeper insight into the effect of heterogeneity on the variability of estimated accuracy measures across the entire studied population, not just the average. This study aimed to investigate heterogeneity via prediction regions in an individual participant data meta-analysis of the sensitivity and specificity of the Patient Health Questionnaire-9 for screening to detect major depression. From the pool of studies, four dates were selected such that the studies published up to each date contained roughly 25%, 50%, 75% and 100% of the total number of participants. A bivariate random-effects model was fitted to studies up to and including each of these dates to jointly estimate sensitivity and specificity. Two-dimensional prediction regions were plotted in ROC space. Subgroup analyses were carried out on sex and age, regardless of the date of the study. The dataset comprised 17,436 participants from 58 primary studies, of which 2,322 (13.3%) were cases of major depression. Point estimates of sensitivity and specificity did not differ importantly as more studies were added to the model. However, the correlation between the two measures increased. As expected, standard errors of the logit pooled TPR and FPR consistently decreased as more studies were used, while standard deviations of the random effects did not decrease monotonically. Subgroup analysis by sex did not reveal important contributions to the observed heterogeneity; however, the shape of the prediction regions differed. Subgroup analysis by age did not reveal meaningful contributions to the heterogeneity, and the prediction regions were similar in shape. Prediction intervals and regions reveal previously unseen trends in a dataset. In the context of a meta-analysis of diagnostic test accuracy, prediction regions can display the range of accuracy measures in different populations and settings.
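The contrast drawn in this abstract, between standard errors that shrink as studies are added and random-effects standard deviations that need not, can be illustrated with a one-dimensional simplification. The sketch below is not the study's method: the study plots two-dimensional prediction regions from a bivariate model, whereas this treats a single measure on the logit scale with a normal approximation. The function names and all input values are hypothetical.

```python
import math


def logit(p):
    """Map a proportion in (0, 1) to the logit scale."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Map a logit-scale value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-x))


def approx_confidence_interval(pooled, se_logit, z=1.96):
    """Normal-approximation interval for the pooled (average) proportion."""
    centre = logit(pooled)
    return expit(centre - z * se_logit), expit(centre + z * se_logit)


def approx_prediction_interval(pooled, se_logit, tau_logit, z=1.96):
    """Normal-approximation interval for the proportion in a new study.

    Adds the between-study variance (tau_logit squared) to the squared
    standard error, so the interval stays wide when heterogeneity is large,
    however precisely the average is estimated.
    """
    centre = logit(pooled)
    half_width = z * math.sqrt(tau_logit ** 2 + se_logit ** 2)
    return expit(centre - half_width), expit(centre + half_width)


# Hypothetical inputs, not taken from the study.
pooled_sensitivity = 0.85
ci = approx_confidence_interval(pooled_sensitivity, se_logit=0.10)
pi = approx_prediction_interval(pooled_sensitivity, se_logit=0.10, tau_logit=0.60)
print("confidence interval:", ci)
print("prediction interval:", pi)
```

Under these assumptions the prediction interval always contains the confidence interval, and the two coincide only when the between-study standard deviation is zero.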

    Accuracy of the PHQ-2 alone and in combination with the PHQ-9 for screening to detect major depression

    Importance: The Patient Health Questionnaire depression module (PHQ-9) is a 9-item self-administered instrument used for detecting depression and assessing severity of depression. The Patient Health Questionnaire–2 (PHQ-2) consists of the first 2 items of the PHQ-9 (which assess the frequency of depressed mood and anhedonia) and can be used as a first step to identify patients for evaluation with the full PHQ-9. Objective: To estimate PHQ-2 accuracy alone and combined with the PHQ-9 for detecting major depression. Data Sources: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, and Web of Science (January 2000-May 2018). Study Selection: Eligible data sets compared PHQ-2 scores with major depression diagnoses from a validated diagnostic interview. Data Extraction and Synthesis: Individual participant data were synthesized with bivariate random-effects meta-analysis to estimate pooled sensitivity and specificity of the PHQ-2 alone among studies using semistructured, fully structured, or Mini International Neuropsychiatric Interview (MINI) diagnostic interviews separately and in combination with the PHQ-9 vs the PHQ-9 alone for studies that used semistructured interviews. The PHQ-2 score ranges from 0 to 6, and the PHQ-9 score ranges from 0 to 27. Results: Individual participant data were obtained from 100 of 136 eligible studies (44 318 participants; 4572 with major depression [10%]; mean [SD] age, 49 [17] years; 59% female). Among studies that used semistructured interviews, PHQ-2 sensitivity and specificity (95% CI) were 0.91 (0.88-0.94) and 0.67 (0.64-0.71) for cutoff scores of 2 or greater and 0.72 (0.67-0.77) and 0.85 (0.83-0.87) for cutoff scores of 3 or greater. Sensitivity was significantly greater for semistructured vs fully structured interviews. Specificity was not significantly different across the types of interviews. The area under the receiver operating characteristic curve was 0.88 (0.86-0.89) for semistructured interviews, 0.82 (0.81-0.84) for fully structured interviews, and 0.87 (0.85-0.88) for the MINI. There were no significant subgroup differences. For semistructured interviews, sensitivity for PHQ-2 scores of 2 or greater followed by PHQ-9 scores of 10 or greater (0.82 [0.76-0.86]) was not significantly different than PHQ-9 scores of 10 or greater alone (0.86 [0.80-0.90]); specificity for the combination was significantly but minimally higher (0.87 [0.84-0.89] vs 0.85 [0.82-0.87]). The area under the curve was 0.90 (0.89-0.91). The combination was estimated to reduce the number of participants needing to complete the full PHQ-9 by 57% (56%-58%). Conclusions and Relevance: In an individual participant data meta-analysis of studies that compared PHQ scores with major depression diagnoses, the combination of PHQ-2 (with cutoff ≥2) followed by PHQ-9 (with cutoff ≥10) had similar sensitivity but higher specificity compared with PHQ-9 cutoff scores of 10 or greater alone. Further research is needed to understand the clinical and research value of this combined approach to screening.
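The two-step rule described in this abstract (PHQ-2 score of 2 or greater, followed by PHQ-9 score of 10 or greater) can be sketched on raw counts. This is only an illustration of the screening logic: the study pooled results with a bivariate random-effects meta-analysis, which this sketch does not attempt. The records are invented toy data and the function names are hypothetical.

```python
def accuracy(flags, cases):
    """Return (sensitivity, specificity) from parallel lists of booleans."""
    tp = sum(1 for f, c in zip(flags, cases) if f and c)
    fn = sum(1 for f, c in zip(flags, cases) if not f and c)
    tn = sum(1 for f, c in zip(flags, cases) if not f and not c)
    fp = sum(1 for f, c in zip(flags, cases) if f and not c)
    return tp / (tp + fn), tn / (tn + fp)


def two_step_positive(phq2, phq9, phq2_cutoff=2, phq9_cutoff=10):
    """Screen positive only if both the PHQ-2 and the PHQ-9 reach their cutoffs."""
    return phq2 >= phq2_cutoff and phq9 >= phq9_cutoff


# Toy records (PHQ-2 score, PHQ-9 score, major depression), invented for illustration.
records = [
    (5, 18, True),
    (3, 12, True),
    (1, 10, True),   # flagged by the PHQ-9 alone, missed by the two-step rule
    (4, 8, True),    # missed by both rules
    (0, 3, False),
    (1, 11, False),  # false positive for the PHQ-9 alone only
    (2, 6, False),
    (3, 14, False),  # false positive for both rules
    (0, 1, False),
    (1, 4, False),
]
cases = [case for _, _, case in records]

phq9_alone = accuracy([phq9 >= 10 for _, phq9, _ in records], cases)
combined = accuracy([two_step_positive(p2, p9) for p2, p9, _ in records], cases)

# Participants below the PHQ-2 cutoff never need to complete the full PHQ-9.
skipped_full_phq9 = sum(1 for p2, _, _ in records if p2 < 2) / len(records)

print("PHQ-9 alone (sens, spec):", phq9_alone)
print("PHQ-2 then PHQ-9 (sens, spec):", combined)
print("share skipping the full PHQ-9:", skipped_full_phq9)
```

Because the combined rule can only remove positives relative to the PHQ-9 alone, its sensitivity can never be higher and its specificity can never be lower, which matches the direction of the differences reported in the abstract.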

    External validation of a shortened screening tool using individual participant data meta-analysis: A case study of the Patient Health Questionnaire-Dep-4

    Shortened versions of self-reported questionnaires may be used to reduce respondent burden. When shortened screening tools are used, it is desirable to maintain diagnostic accuracy equivalent to that of the full-length forms. This manuscript presents a case study that illustrates how external data and individual participant data meta-analysis can be used to assess the equivalence in diagnostic accuracy between a shortened and a full-length form. This case study compares the Patient Health Questionnaire-9 (PHQ-9) and a 4-item shortened version (PHQ-Dep-4) that was previously developed using optimal test assembly methods. Using a large database of 75 primary studies (34,698 participants, 3,392 major depression cases), we evaluated whether the PHQ-Dep-4 cutoff of ≥ 4 maintained equivalent diagnostic accuracy to a PHQ-9 cutoff of ≥ 10. Using this external validation dataset, a PHQ-Dep-4 cutoff of ≥ 4 maximized the sum of sensitivity and specificity, with a sensitivity of 0.88 (95% CI 0.81, 0.93), 0.68 (95% CI 0.56, 0.78), and 0.80 (95% CI 0.73, 0.85) for the semi-structured, fully structured, and MINI reference standard categories, respectively, and a specificity of 0.79 (95% CI 0.74, 0.83), 0.85 (95% CI 0.78, 0.90), and 0.83 (95% CI 0.80, 0.86) for the semi-structured, fully structured, and MINI reference standard categories, respectively. While equivalence with a PHQ-9 cutoff of ≥ 10 was not established, we found the sensitivity of the PHQ-Dep-4 to be non-inferior to that of the PHQ-9, and the specificity of the PHQ-Dep-4 to be marginally lower than that of the PHQ-9.
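Choosing the cutoff that maximizes the sum of sensitivity and specificity, as mentioned in this abstract, can be sketched as a simple search over candidate cutoffs. The sketch works on raw counts from invented toy data rather than the meta-analytic estimates used in the study. The 0 to 12 score range is an assumption (four items scored 0 to 3 each), and the function names are hypothetical.

```python
def sens_spec_at(scores, cases, cutoff):
    """Sensitivity and specificity when scores >= cutoff count as screen-positive."""
    tp = sum(1 for s, c in zip(scores, cases) if c and s >= cutoff)
    fn = sum(1 for s, c in zip(scores, cases) if c and s < cutoff)
    tn = sum(1 for s, c in zip(scores, cases) if not c and s < cutoff)
    fp = sum(1 for s, c in zip(scores, cases) if not c and s >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores, cases, cutoffs):
    """Return the cutoff with the largest sensitivity + specificity.

    Ties are resolved in favour of the first cutoff in iteration order.
    """
    return max(cutoffs, key=lambda c: sum(sens_spec_at(scores, cases, c)))


# Toy data, invented for illustration: six non-cases followed by five cases.
scores = [0, 1, 2, 3, 3, 5, 2, 4, 6, 8, 10]
cases = [False] * 6 + [True] * 5

# Assumed score range of 0 to 12 for a 4-item scale with items scored 0 to 3.
chosen = best_cutoff(scores, cases, range(0, 13))
sensitivity, specificity = sens_spec_at(scores, cases, chosen)
print("chosen cutoff:", chosen)
print("sensitivity, specificity:", sensitivity, specificity)
```

Maximizing this sum is equivalent to maximizing Youden's J (sensitivity + specificity - 1), which weights false negatives and false positives equally.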