3 research outputs found

A case study of an individual participant data meta-analysis of diagnostic accuracy showed that prediction regions represented heterogeneity well

Author: Andrea Benedetti
Aurelio López Malo Vázquez de Lara
Brett Thombs
Brooke Levis
DEPRESsion Screening Data (DEPRESSD) PHQ-9 Collaboration
Parash Mani Bhandari
Yin Wu
Publication venue: Springer Science and Business Media LLC
Publication date: 01/06/2023

Abstract: The diagnostic accuracy of a screening tool is often characterized by its sensitivity and specificity. An analysis of these measures must consider their intrinsic correlation. In the context of an individual participant data meta-analysis, heterogeneity is one of the main components of the analysis. When using a random-effects meta-analytic model, prediction regions provide deeper insight into the effect of heterogeneity on the variability of estimated accuracy measures across the entire studied population, not just the average. This study aimed to investigate heterogeneity via prediction regions in an individual participant data meta-analysis of the sensitivity and specificity of the Patient Health Questionnaire-9 for screening to detect major depression. Four dates were selected such that the studies up to each date contained roughly 25%, 50%, 75% and 100% of the total number of participants. A bivariate random-effects model was fitted to studies up to and including each of these dates to jointly estimate sensitivity and specificity. Two-dimensional prediction regions were plotted in ROC space. Subgroup analyses were carried out on sex and age, regardless of the date of the study. The dataset comprised 17,436 participants from 58 primary studies, of whom 2,322 (13.3%) had major depression. Point estimates of sensitivity and specificity did not differ importantly as more studies were added to the model. However, the correlation between the two measures increased. As expected, standard errors of the logit-transformed pooled true-positive rate (TPR) and false-positive rate (FPR) consistently decreased as more studies were used, while standard deviations of the random effects did not decrease monotonically. Subgroup analysis by sex did not reveal important contributions to the observed heterogeneity; however, the shape of the prediction regions differed. Subgroup analysis by age did not reveal meaningful contributions to the heterogeneity, and the prediction regions were similar in shape.
Prediction intervals and regions reveal previously unseen trends in a dataset. In the context of a meta-analysis of diagnostic test accuracy, prediction regions can display the range of accuracy measures in different populations and settings.

Directory of Open Access Journals

Accuracy of the PHQ-2 alone and in combination with the PHQ-9 for screening to detect major depression

Author: Akena D
Amtmann D
Arroll B
Asunción Lara M
Ayalon L
Azar M
Baradaran H
Benedetti A
Beraldi A
Bernstein C
Bhana A
Bhandari PM
Bombardier C
Boruff J
Brehaut E
Buji RI
Butterworth P
Carter G
Chagas M
Chan J
Chan LF
Che L
Chibanda D
Cholera R
Clover K
Conway A
Conwell Y
Cuijpers P
Daray F
de Man-van Ginkel J
Delgadillo J
Depression Screening Data (DEPRESSD) PHQ Collaboration
Diez-Quevedo C
Fann J
Field S
Fischer FH
Fisher J
Fung D
Garman E
Gelaye B
Gholizadeh L
Gibson L
Gilbody S
Goodyear-Smith F
Green E
Greeno C
Hall B
Hampel P
Hantsoo L
Haroz E
Harter M
He C
Hegerl U
Hides L
Hobfoll S
Honikman S
Hudson M
Hyphantis T
Imran M
Inagaki M
Ioannidis J
Ismail K
Jeon HJ
Jetté N
Khamseh M
Kiely K
Kloda L
Kohler S
Kohrt B
Krishnan A
Kwan Y
Lamers F
Levin-Aspenson H
Levis A
Levis B
Li Wang J
Lino V
Liu S-I
Lotrakul M
Loureiro S
Luitel N
Lund C
Löwe B
Marrie RA
Marsh L
Marx B
McGuire A
McMillan D
Mohd Sidik S
Moore A
Munhoz T
Muramatsu K
Nakku J
Navarrete L
Negeri Z
Neupane D
Osório F
Patel V
Patten S
Pence B
Persoons P
Petersen I
Picardi A
Pugh S
Quinn T
Rancans E
Rathod S
Reuter K
Rice D
Riehm K
Roch S
Rooney A
Rowe H
Saadat N
Santos I
Schram M
Shaaban J
Shinn E
Shrier I
Sidebottom A
Simning A
Spangenberg L
Stafford L
Sun Y
Sung S
Suzuki K
Swartz R
Tan PLL
Taylor-Rowan M
Thombs BD
Tran T
Turner A
van der Feltz-Cornelis C
van Heyningen T
van Weert H
Wagner L
White J
Winkley K
Wu Y
Wynter K
Yamada M
Zhang Y
Zhi Zeng Q
Ziegelstein R
Publication venue: American Medical Association (AMA)
Publication date: 09/06/2020

Importance: The Patient Health Questionnaire depression module (PHQ-9) is a 9-item self-administered instrument used for detecting depression and assessing severity of depression. The Patient Health Questionnaire–2 (PHQ-2) consists of the first 2 items of the PHQ-9 (which assess the frequency of depressed mood and anhedonia) and can be used as a first step to identify patients for evaluation with the full PHQ-9. Objective: To estimate PHQ-2 accuracy alone and combined with the PHQ-9 for detecting major depression. Data Sources: MEDLINE, MEDLINE In-Process & Other Non-Indexed Citations, PsycINFO, and Web of Science (January 2000-May 2018). Study Selection: Eligible data sets compared PHQ-2 scores with major depression diagnoses from a validated diagnostic interview. Data Extraction and Synthesis: Individual participant data were synthesized with bivariate random-effects meta-analysis to estimate pooled sensitivity and specificity of the PHQ-2 alone among studies using semistructured, fully structured, or Mini International Neuropsychiatric Interview (MINI) diagnostic interviews separately and in combination with the PHQ-9 vs the PHQ-9 alone for studies that used semistructured interviews. The PHQ-2 score ranges from 0 to 6, and the PHQ-9 score ranges from 0 to 27. Results: Individual participant data were obtained from 100 of 136 eligible studies (44 318 participants; 4572 with major depression [10%]; mean [SD] age, 49 [17] years; 59% female). Among studies that used semistructured interviews, PHQ-2 sensitivity and specificity (95% CI) were 0.91 (0.88-0.94) and 0.67 (0.64-0.71) for cutoff scores of 2 or greater and 0.72 (0.67-0.77) and 0.85 (0.83-0.87) for cutoff scores of 3 or greater. Sensitivity was significantly greater for semistructured vs fully structured interviews. Specificity was not significantly different across the types of interviews. The area under the receiver operating characteristic curve was 0.88 (0.86-0.89) for semistructured interviews, 0.82 (0.81-0.84) for fully structured interviews, and 0.87 (0.85-0.88) for the MINI. There were no significant subgroup differences. For semistructured interviews, sensitivity for PHQ-2 scores of 2 or greater followed by PHQ-9 scores of 10 or greater (0.82 [0.76-0.86]) was not significantly different from that for PHQ-9 scores of 10 or greater alone (0.86 [0.80-0.90]); specificity for the combination was significantly but minimally higher (0.87 [0.84-0.89] vs 0.85 [0.82-0.87]). The area under the curve was 0.90 (0.89-0.91). The combination was estimated to reduce the number of participants needing to complete the full PHQ-9 by 57% (56%-58%). Conclusions and Relevance: In an individual participant data meta-analysis of studies that compared PHQ scores with major depression diagnoses, the combination of PHQ-2 (with cutoff ≥2) followed by PHQ-9 (with cutoff ≥10) had similar sensitivity but higher specificity compared with PHQ-9 cutoff scores of 10 or greater alone. Further research is needed to understand the clinical and research value of this combined approach to screening.
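
The two-step rule described in this abstract (PHQ-2 ≥2, then PHQ-9 ≥10) can be illustrated with a minimal Python sketch. The records below are invented toy values, not study data, and the names `PARTICIPANTS`, `sens_spec`, and `combined_positive` are hypothetical. The sketch computes plain per-sample sensitivity and specificity only; it does not reproduce the bivariate random-effects meta-analysis used in the paper.

```python
from typing import List, Tuple

# Hypothetical toy records: (PHQ-2 score, PHQ-9 score, has major depression).
# Invented for illustration only; these are NOT data from the study.
PARTICIPANTS: List[Tuple[int, int, bool]] = [
    (4, 15, True),
    (1, 11, True),
    (3, 8, True),
    (2, 12, False),
    (0, 10, False),
    (1, 3, False),
    (0, 1, False),
    (5, 9, False),
]


def sens_spec(flags: List[bool], truth: List[bool]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for screen-positive flags vs. diagnoses."""
    tp = sum(f and t for f, t in zip(flags, truth))
    fn = sum((not f) and t for f, t in zip(flags, truth))
    tn = sum((not f) and (not t) for f, t in zip(flags, truth))
    fp = sum(f and (not t) for f, t in zip(flags, truth))
    return tp / (tp + fn), tn / (tn + fp)


def combined_positive(phq2: int, phq9: int,
                      phq2_cutoff: int = 2, phq9_cutoff: int = 10) -> bool:
    """Screen positive only if PHQ-2 meets its cutoff AND PHQ-9 meets its cutoff."""
    return phq2 >= phq2_cutoff and phq9 >= phq9_cutoff


truth = [dep for _, _, dep in PARTICIPANTS]
combo_flags = [combined_positive(p2, p9) for p2, p9, _ in PARTICIPANTS]
phq9_flags = [p9 >= 10 for _, p9, _ in PARTICIPANTS]

combo_sens, combo_spec = sens_spec(combo_flags, truth)
phq9_sens, phq9_spec = sens_spec(phq9_flags, truth)

# Share of participants who pass the PHQ-2 step and so complete the full PHQ-9.
full_phq9_share = sum(p2 >= 2 for p2, _, _ in PARTICIPANTS) / len(PARTICIPANTS)

print(f"combination: sens={combo_sens:.2f} spec={combo_spec:.2f}")
print(f"PHQ-9 alone: sens={phq9_sens:.2f} spec={phq9_spec:.2f}")
print(f"share completing full PHQ-9: {full_phq9_share:.2f}")
```

Because the combined rule can only remove screen-positives relative to PHQ-9 alone, its sensitivity can never be higher and its specificity can never be lower, which matches the direction of the pooled estimates reported above.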

Deakin Research Online

OPUS - University of Technology Sydney

Edinburgh Research Explorer

Enlighten

White Rose Research Online

External validation of a shortened screening tool using individual participant data meta-analysis: A case study of the Patient Health Questionnaire-Dep-4

Author: Amtmann Dagmar
Ann Marrie Ruth
Asunción Lara Maria
Ayalon Liat
Azar Marleine
Baradaran Hamid R.
Benedetti Andrea
Beraldi Anna
Bernstein Charles N.
Bhana Arvin
Boruff Jill
C. N. Chan Juliana
Chagas Marcos H.
Chibanda Dixon
Conway Aaron
Cuijpers Pim
Daray Federico M.
de Man-van Ginkel Janneke M.
DEPRESsion Screening Data (DEPRESSD) PHQ Collaboration
Diez-Quevedo Crisanto
E. M. Nakku Juliet
Field Sally
Fischer Felix
Fisher Jane R.W.
Flisher Alan J
Fong Chan Lai
Fung Daniel
Garman Emily C.
Gelaye Bizu
Gholizadeh Leila
Gibson Lorna J.
Gilbody Simon
Green Eric P.
Hall Brian J.
Hantsoo Liisa
Harel Daphna
Haroz Emily E.
He Chen
Hegerl Ulrich
Hides Leanne
Hobfoll Stevan E.
Honikman Simone
Hudson Marie
Hyphantis Thomas
Härter Martin
Imma Buji Ryna
Imran Mahrukh
Inagaki Masatoshi
Ioannidis John P.A.
Jetté Nathalie
Jin Jeon Hong
Khamseh Mohammad E.
Kloda Lorie A.
Kohrt Brandon A.
Krishnan Ankur
Kwan Yunxin
Köhler Sebastian
Lamers Femke
Levin-Aspenson Holly F.
Levis Alexander W.
Levis Brooke
Li Wang Jian
Lin Lynnette Tan Pei
Liu Shen-Ing
Lotrakul Manote
Loureiro Sonia R.
Luitel Nagendra P.
Lund Crick
Löwe Bernd
Mani Bhandari Parash
Markham Sarah
Marx Brian P.
Mohd Sidik Sherina
Munhoz Tiago N.
Muramatsu Kumiko
Navarrete Laura
Negeri Zelalem
Neupane Dipika
Osório Flávia L.
Patten Scott B.
Persoons Philippe
Picardi Angelo
Pugh Stephanie L.
Quinn Terence J.
Rancans Elmars
Rathod Sujit D.
Reuter Katrin
Rice Danielle B.
Riehm Kira E.
Rowe Heather J.
Santos Iná S.
Schram Miranda T.
Shaaban Juwita
Shinn Eileen H.
Spangenberg Lena
Stafford Lesley
Sun Ying
Sung Sharon C.
Suzuki Keiko
Taylor-Rowan Martin
Thombs Brett D.
Tran Thach D.
van der Feltz-Cornelis Christina M.
van Heyningen Thandi
van Weert Henk C.
Wagner Lynne I.
Watson David
Wu Yin
Wynter Karen
Yamada Mitsuhiko
Zhang Yuying
Zhi Zeng Qing
Ziegelstein Roy C.
Publication venue: Elsevier
Publication date: 01/08/2022

Shortened versions of self-reported questionnaires may be used to reduce respondent burden. When shortened screening tools are used, it is desirable to maintain equivalent diagnostic accuracy to full-length forms. This manuscript presents a case study that illustrates how external data and individual participant data meta-analysis can be used to assess the equivalence in diagnostic accuracy between a shortened and full-length form. This case study compares the Patient Health Questionnaire-9 (PHQ-9) and a 4-item shortened version (PHQ-Dep-4) that was previously developed using optimal test assembly methods. Using a large database of 75 primary studies (34,698 participants, 3,392 major depression cases), we evaluated whether the PHQ-Dep-4 cutoff of ≥ 4 maintained equivalent diagnostic accuracy to a PHQ-9 cutoff of ≥ 10. Using this external validation dataset, a PHQ-Dep-4 cutoff of ≥ 4 maximized the sum of sensitivity and specificity. For the semi-structured, fully structured, and MINI reference standard categories, respectively, sensitivity was 0.88 (95% CI 0.81, 0.93), 0.68 (95% CI 0.56, 0.78), and 0.80 (95% CI 0.73, 0.85), and specificity was 0.79 (95% CI 0.74, 0.83), 0.85 (95% CI 0.78, 0.90), and 0.83 (95% CI 0.80, 0.86). While equivalence with a PHQ-9 cutoff of ≥ 10 was not established, we found the sensitivity of the PHQ-Dep-4 to be non-inferior to that of the PHQ-9, and the specificity of the PHQ-Dep-4 to be marginally lower than that of the PHQ-9.
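
The criterion named in this abstract, choosing the cutoff that maximizes the sum of sensitivity and specificity, can be sketched in a few lines of Python. The scores and diagnoses below are invented toy values, not study data, and the names `SCORES`, `HAS_MDD`, `sens_spec_at`, and `best_cutoff` are hypothetical. The 0 to 12 score range is an assumption (four items scored 0 to 3 each). The sketch scans cutoffs in a single sample and does not reproduce the pooled meta-analytic estimates reported in the paper.

```python
from typing import List, Tuple

# Hypothetical toy data: a 4-item score per participant and a reference diagnosis.
# Invented for illustration only; these are NOT data from the study.
SCORES: List[int] = [0, 1, 2, 3, 3, 4, 5, 6, 7, 9]
HAS_MDD: List[bool] = [False, False, False, False, True,
                       False, True, True, True, True]


def sens_spec_at(cutoff: int, scores: List[int],
                 truth: List[bool]) -> Tuple[float, float]:
    """Sensitivity and specificity when 'score >= cutoff' counts as screen-positive."""
    tp = sum(s >= cutoff and t for s, t in zip(scores, truth))
    fn = sum(s < cutoff and t for s, t in zip(scores, truth))
    tn = sum(s < cutoff and not t for s, t in zip(scores, truth))
    fp = sum(s >= cutoff and not t for s, t in zip(scores, truth))
    return tp / (tp + fn), tn / (tn + fp)


def best_cutoff(scores: List[int], truth: List[bool],
                max_score: int = 12) -> Tuple[int, float]:
    """Return (cutoff, sensitivity + specificity) with the largest sum.

    max_score=12 is an assumed score range. Ties resolve to the lowest cutoff
    because max() keeps the first maximal element.
    """
    candidates = []
    for cutoff in range(0, max_score + 1):
        sens, spec = sens_spec_at(cutoff, scores, truth)
        candidates.append((cutoff, sens + spec))
    return max(candidates, key=lambda pair: pair[1])


cutoff, total = best_cutoff(SCORES, HAS_MDD)
print(f"best cutoff: >= {cutoff} (sensitivity + specificity = {total:.2f})")
```

Maximizing this sum is equivalent to maximizing Youden's J statistic (sensitivity + specificity − 1), since the two differ only by a constant.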

Queensland University of Technology ePrints Archive

Enlighten