12 research outputs found
Accuracy of the Patient Health Questionnaire-9 for screening to detect major depression : updated systematic review and individual participant data meta-analysis
Objective: To update a previous individual participant data meta-analysis and determine the accuracy of the Patient Health Questionnaire-9 (PHQ-9), the most commonly used depression screening tool in general practice, for detecting major depression overall and by study or participant subgroups.
Design: Systematic review and individual participant data meta-analysis.
Data sources: Medline, Medline In-Process, and Other Non-Indexed Citations via Ovid, PsycINFO, Web of Science searched through 9 May 2018.
Review methods: Eligible studies administered the PHQ-9 and classified current major depression status using a validated semistructured diagnostic interview (designed for clinician administration), fully structured interview (designed for lay administration), or the Mini International Neuropsychiatric Interview (MINI; a brief interview designed for lay administration). A bivariate random effects meta-analytic model was used to obtain point and interval estimates of pooled PHQ-9 sensitivity and specificity at cut-off values 5-15, separately, among studies that used semistructured diagnostic interviews (eg, Structured Clinical Interview for Diagnostic and Statistical Manual), fully structured interviews (eg, Composite International Diagnostic Interview), and the MINI. Meta-regression was used to investigate whether PHQ-9 accuracy correlated with reference standard categories and participant characteristics.
Results: Data from 44 503 total participants (27 146 additional from the update) were obtained from 100 of 127 eligible studies (42 additional studies; 79% eligible studies; 86% eligible participants). Among studies with a semistructured interview reference standard, pooled PHQ-9 sensitivity and specificity (95% confidence interval) at the standard cut-off value of ≥10, which maximised combined sensitivity and specificity, were 0.85 (0.79 to 0.89) and 0.85 (0.82 to 0.87), respectively. Specificity was similar across reference standards, but sensitivity in studies with semistructured interviews was 7-24% (median 21%) higher than with fully structured reference standards and 2-14% (median 11%) higher than with the MINI across cut-off values. Across reference standards and cut-off values, specificity was 0-10% (median 3%) higher for men and 0-12 (median 5%) higher for people aged 60 or older.
Conclusions: Researchers and clinicians could use results to determine outcomes, such as total number of positive screens and false positive screens, at different PHQ-9 cut-off values for different clinical settings using the knowledge translation tool at www.depressionscreening100.com/phq.
Study registration: PROSPERO CRD42014010673
An evaluation of computational methods for aggregate data meta-analyses of diagnostic test accuracy studies
Abstract Background A Generalized Linear Mixed Model (GLMM) is recommended to meta-analyze diagnostic test accuracy studies (DTAs) based on aggregate or individual participant data. Since a GLMM does not have a closed-form likelihood function or parameter solutions, computational methods are conventionally used to approximate the likelihoods and obtain parameter estimates. The most commonly used computational methods are the Iteratively Reweighted Least Squares (IRLS), the Laplace approximation (LA), and the Adaptive Gauss-Hermite quadrature (AGHQ). Despite being widely used, it has not been clear how these computational methods compare and perform in the context of an aggregate data meta-analysis (ADMA) of DTAs. Methods We compared and evaluated the performance of three commonly used computational methods for GLMM - the IRLS, the LA, and the AGHQ, via a comprehensive simulation study and real-life data examples, in the context of an ADMA of DTAs. By varying several parameters in our simulations, we assessed the performance of the three methods in terms of bias, root mean squared error, confidence interval (CI) width, coverage of the 95% CI, convergence rate, and computational speed. Results For most of the scenarios, especially when the meta-analytic data were not sparse (i.e., there were no or negligible studies with perfect diagnosis), the three computational methods were comparable for the estimation of sensitivity and specificity. However, the LA had the largest bias and root mean squared error for pooled sensitivity and specificity when the meta-analytic data were sparse. Moreover, the AGHQ took a longer computational time to converge relative to the other two methods, although it had the best convergence rate. Conclusions We recommend practitioners and researchers carefully choose an appropriate computational algorithm when fitting a GLMM to an ADMA of DTAs. We do not recommend the LA for sparse meta-analytic data sets. However, either the AGHQ or the IRLS can be used regardless of the characteristics of the meta-analytic data
An empirical comparison of statistical methods for multiple cut-off diagnostic test accuracy meta-analysis of the Edinburgh postnatal depression scale (EPDS) depression screening tool using published results vs individual participant data
Abstract Background Selective reporting of results from only well-performing cut-offs leads to biased estimates of accuracy in primary studies of questionnaire-based screening tools and in meta-analyses that synthesize results. Individual participant data meta-analysis (IPDMA) of sensitivity and specificity at each cut-off via bivariate random-effects models (BREMs) can overcome this problem. However, IPDMA is laborious and depends on the ability to successfully obtain primary datasets, and BREMs ignore the correlation between cut-offs within primary studies. Methods We compared the performance of three recent multiple cut-off models developed by Steinhauser et al., Jones et al., and Hoyer and Kuss, that account for missing cut-offs when meta-analyzing diagnostic accuracy studies with multiple cut-offs, to BREMs fitted at each cut-off. We used data from 22 studies of the accuracy of the Edinburgh Postnatal Depression Scale (EPDS; 4475 participants, 758 major depression cases). We fitted each of the three multiple cut-off models and BREMs to a dataset with results from only published cut-offs from each study (published data) and an IPD dataset with results for all cut-offs (full IPD data). We estimated pooled sensitivity and specificity with 95% confidence intervals (CIs) for each cut-off and the area under the curve. Results Compared to the BREMs fitted to the full IPD data, the Steinhauser et al., Jones et al., and Hoyer and Kuss models fitted to the published data produced similar receiver operating characteristic curves; though, the Hoyer and Kuss model had lower area under the curve, mainly due to estimating slightly lower sensitivity at lower cut-offs. When fitting the three multiple cut-off models to the full IPD data, a similar pattern of results was observed. Importantly, all models had similar 95% CIs for sensitivity and specificity, and the CI width increased with cut-off levels for sensitivity and decreased with an increasing cut-off for specificity, even the BREMs which treat each cut-off separately. Conclusions Multiple cut-off models appear to be the favorable methods when only published data are available. While collecting IPD is expensive and time consuming, IPD can facilitate subgroup analyses that cannot be conducted with published data only
Galactomannan, Beta-D-Glucan and PCR-Based Assays for the Diagnosis of Invasive Fungal Disease in Pediatric Cancer and Hematopoietic Stem Cell Transplantation : A Systematic Review and Meta-Analysis
We systematically reviewed and analyzed the available data for galactomannan (GM), beta-D-glucan (BG), and polymerase-chain reaction (PCR)-based assays to detect invasive fungal disease (IFD) in pediatric cancer or hematopoietic stem cell transplantation (HSCT) patients when used as screening tools during immunosuppression or as diagnostic tests in patients presenting with symptoms such as fever during neutropenia (FN). Out of 1,532 studies screened, 25 studies reported on GM (n=19), BG (n=3) and PCR (n=11). All fungal biomarkers demonstrated highly variable sensitivity, specificity and positive predictive values, and these were generally poor in both clinical settings. GM negative predictive values were high, ranging from 85-100% for screening and 70-100% in the diagnostic setting, but failure to identify non-Aspergillus molds limits its usefulness. Future work could focus on the usefulness of combinations of fungal biomarkers in pediatric cancer and HSCT
Accuracy of the Patient Health Questionnaire-9 for screening to detect major depression: updated systematic review and individual participant data meta-analysis
Objective To evaluate the accuracy of the depression subscale of the Hospital Anxiety and Depression Scale (HADS-D) to screen for major depression among people with physical health problems. Design Systematic review and individual participant data meta-analysis. Data sources Medline, Medline In-Process and Other Non-Indexed Citations, PsycInfo, and Web of Science (from inception to 25 October 2018). Review methods Eligible datasets included HADS-D scores and major depression status based on a validated diagnostic interview. Primary study data and study level data extracted from primary reports were combined. For HADS-D cut-off thresholds of 5-15, a bivariate random effects meta-analysis was used to estimate pooled sensitivity and specificity, separately, in studies that used semi-structured diagnostic interviews (eg, Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders), fully structured interviews (eg, Composite International Diagnostic Interview), and the Mini International Neuropsychiatric Interview. One stage metaregression was used to examine whether accuracy was associated with reference standard categories and the characteristics of participants. Sensitivity analyses were done to assess whether including published results from studies that did not provide raw data influenced the results. Results Individual participant data were obtained from 101 of 168 eligible studies (60%; 25574 participants (72% of eligible participants), 2549 with major depression). Combined sensitivity and specificity was maximised at a cut-off value of seven or higher for semi-structured interviews, fully structured interviews, and the Mini International Neuropsychiatric Interview. Among studies with a semi-structured interview (57 studies, 10664 participants, 1048 with major depression), sensitivity and specificity were 0.82 (95% confidence interval 0.76 to 0.87) and 0.78 (0.74 to 0.81) for a cut-off value of seven or higher, 0.74 (0.68 to 0.79) and 0.84 (0.81 to 0.87) for a cut-off value of eight or higher, and 0.44 (0.38 to 0.51) and 0.95 (0.93 to 0.96) for a cut-off value of 11 or higher. Accuracy was similar across reference standards and subgroups and when published results from studies that did not contribute data were included. Co nclusions When screening for major depression, a HADS-D cut-off value of seven or higher maximised combined sensitivity and specificity. A cut-off value of eight or higher generated similar combined sensitivity and specificity but was less sensitive and more specific. To identify medically ill patients with depression with the HADS-D, lower cut-off values could be used to avoid false negatives and higher cut-off values to reduce false positives and identify people with higher symptom levels.Fil: Negeri, Zelalem F. McGill University; CanadáFil: Levis, Brooke. Keele University; Reino UnidoFil: Sun, Ying. Lady Davis Institute For Medical Research; CanadáFil: He, Chen. Lady Davis Institute For Medical Research; CanadáFil: Krishnan, Ankur. Lady Davis Institute For Medical Research; CanadáFil: Wu, Yin. Lady Davis Institute For Medical Research; CanadáFil: Bhandari, Parash Mani. McGill University; CanadáFil: Neupane, Dipika. McGill University; CanadáFil: Brehaut, Eliana. Lady Davis Institute For Medical Research; CanadáFil: Benedetti, Andrea. McGill University; CanadáFil: Thombs, Brett D.. McGill University; CanadáFil: Daray, Federico Manuel. Universidad de Buenos Aires. Facultad de Medicina. Instituto de Farmacologia; Argentina. Consejo Nacional de Investigaciones Científicas y Técnicas. Oficina de Coordinación Administrativa Houssay; Argentin
Preliminary COVID-19 Fears Questionnaire: Systemic Sclerosis and Chronic Medical Conditions Versions
Objective: People with chronic medical conditions are at risk of complications and mortality during the COVID-19 outbreak. Understanding mental health implications, including effects of isolation and movement restrictions, should consider specific fears and level of fear experienced. No questionnaires or scales, however, have been developed for this purpose. We sought input from people living with the chronic autoimmune disease systemic sclerosis (SSc; scleroderma) on items for inclusion in a preliminary version of the COVID-19 Fears Questionnaire for Systemic Sclerosis, which we also adapted for a preliminary general version, the COVID-19 Fears Questionnaire for Chronic Medical Conditions.
Methods: We solicited lists of general and disease-specific fears during COVID-19 from people with SSc via social media and postings by patient organizations. Respondents could provide between 1 and 10 fears via a Qualtrics survey. We used content analysis to generate initial item themes. Selection of items for inclusion, item content, and item wording were done iteratively via email discussion among research team members and the project’s patient advisory team. Items were then adapted for the general chronic diseases version.
Results: A total of 121 respondents from 10 countries provided a mean and median of 4 item suggestions. The preliminary COVID-19 Fears Questionnaire for Systemic Sclerosis includes 16 items, and the general version includes 15 items.
Conclusion: The COVID-19 Fears Questionnaire for Systemic Sclerosis will be validated via a cohort study, and other researchers are encouraged to validate the general version. Suggestions are made to facilitate rapid sharing of methods and results
External validation of a shortened screening tool using individual participant data meta-analysis: A case study of the Patient Health Questionnaire-Dep-4
Shortened versions of self-reported questionnaires may be used to reduce respondent burden. When shortened screening tools are used, it is desirable to maintain equivalent diagnostic accuracy to full-length forms. This manuscript presents a case study that illustrates how external data and individual participant data meta-analysis can be used to assess the equivalence in diagnostic accuracy between a shortened and full-length form. This case study compares the Patient Health Questionnaire-9 (PHQ-9) and a 4-item shortened version (PHQ-Dep-4) that was previously developed using optimal test assembly methods. Using a large database of 75 primary studies (34,698 participants, 3,392 major depression cases), we evaluated whether the PHQ-Dep-4 cutoff of ≥ 4 maintained equivalent diagnostic accuracy to a PHQ-9 cutoff of ≥ 10. Using this external validation dataset, a PHQ-Dep-4 cutoff of ≥ 4 maximized the sum of sensitivity and specificity, with a sensitivity of 0.88 (95% CI 0.81, 0.93), 0.68 (95% CI 0.56, 0.78), and 0.80 (95% CI 0.73, 0.85) for the semi-structured, fully structured, and MINI reference standard categories, respectively, and a specificity of 0.79 (95% CI 0.74, 0.83), 0.85 (95% CI 0.78, 0.90), and 0.83 (95% CI 0.80, 0.86) for the semi-structured, fully structured, and MINI reference standard categories, respectively. While equivalence with a PHQ-9 cutoff of ≥ 10 was not established, we found the sensitivity of the PHQ-Dep-4 to be non-inferior to that of the PHQ-9, and the specificity of the PHQ-Dep-4 to be marginally smaller than the PHQ-9
Comparison of the accuracy of the 7-item HADS Depression subscale and 14-item total HADS for screening for major depression: A systematic review and individual participant data meta-analysis.
The seven-item Hospital Anxiety and Depression Scale Depression subscale (HADS-D) and the total score of the 14-item HADS (HADS-T) are both used for major depression screening. Compared to the HADS-D, the HADS-T includes anxiety items and requires more time to complete. We compared the screening accuracy of the HADS-D and HADS-T for major depression detection. We conducted an individual participant data meta-analysis and fit bivariate random effects models to assess diagnostic accuracy among participants with both HADS-D and HADS-T scores. We identified optimal cutoffs, estimated sensitivity and specificity with 95% confidence intervals, and compared screening accuracy across paired cutoffs via two-stage and individual-level models. We used a 0.05 equivalence margin to assess equivalency in sensitivity and specificity. 20,700 participants (2,285 major depression cases) from 98 studies were included. Cutoffs of ≥7 for the HADS-D (sensitivity 0.79 [0.75, 0.83], specificity 0.78 [0.75, 0.80]) and ≥15 for the HADS-T (sensitivity 0.79 [0.76, 0.82], specificity 0.81 [0.78, 0.83]) minimized the distance to the top-left corner of the receiver operating characteristic curve. Across all sets of paired cutoffs evaluated, differences of sensitivity between HADS-T and HADS-D ranged from -0.05 to 0.01 (0.00 at paired optimal cutoffs), and differences of specificity were within 0.03 for all cutoffs (0.02-0.03). The pattern was similar among outpatients, although the HADS-T was slightly (not nonequivalently) more specific among inpatients. The accuracy of HADS-T was equivalent to the HADS-D for detecting major depression. In most settings, the shorter HADS-D would be preferred