
    Are false positives in suicide classification models a risk group? Evidence for “true alarms” in a population-representative longitudinal study of Norwegian adolescents

    Introduction: False positives in retrospective binary suicide attempt classification models are commonly attributed to sheer classification error. However, when machine learning suicide attempt classification models are trained with a multitude of psycho-socio-environmental factors and achieve high accuracy in suicide risk assessment, false positives may turn out to be at high risk of developing suicidal behavior or attempting suicide in the future. Thus, they may be better viewed as “true alarms,” relevant for a suicide prevention program. In this study, using a large population-based longitudinal dataset, we examine three hypotheses: (1) false positives, compared to true negatives, are at higher risk of a suicide attempt in the future; (2) the suicide attempt risk for false positives increases as the specificity threshold increases; and (3) as specificity increases, the severity of risk factors between false positives and true positives becomes more similar.
    Methods: Utilizing the Gradient Boosting algorithm, we used a sample of 11,369 Norwegian adolescents, assessed at two time points (1992 and 1994), to classify suicide attempters at the first time point. We then assessed the relative risk of a suicide attempt at the second time point for false positives in comparison to true negatives, and in relation to the level of specificity.
    Results: We found that false positives were at significantly higher risk of attempting suicide than true negatives. When selecting a higher classification risk threshold by gradually increasing the specificity cutoff from 60% to 97.5%, the relative suicide attempt risk of the false positive group increased, ranging from a minimum of 2.96 to 7.22 times that of the true negatives. As the risk threshold increased, the severity of various mental health indicators became significantly more comparable between false positives and true positives.
    Conclusion: We argue that the performance evaluation of machine learning suicide classification models should take clinical relevance into account rather than focusing solely on classification error metrics. As shown here, the so-called false positives represent a truly at-risk group that should be included in suicide prevention programs. Hence, these findings should be taken into consideration when interpreting machine learning suicide classification models as well as when planning future suicide prevention interventions for adolescents.
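The core comparison in this study, the risk of a later attempt among false positives versus true negatives, can be sketched in a few lines of Python. This is an illustrative reconstruction on synthetic data, not the study's actual code; the function and variable names are hypothetical:

```python
def relative_risk_fp_vs_tn(pred_t1, attempt_t1, attempt_t2):
    """Relative risk of a time-2 suicide attempt for false positives
    versus true negatives of a time-1 classifier (illustrative sketch).

    pred_t1:    1 if the model flagged the adolescent at time 1, else 0
    attempt_t1: 1 if a suicide attempt was reported at time 1, else 0
    attempt_t2: 1 if a suicide attempt was reported at time 2, else 0
    """
    # False positives: flagged at time 1 despite no reported attempt
    fp = [a2 for p, a1, a2 in zip(pred_t1, attempt_t1, attempt_t2)
          if p == 1 and a1 == 0]
    # True negatives: not flagged and no reported attempt at time 1
    tn = [a2 for p, a1, a2 in zip(pred_t1, attempt_t1, attempt_t2)
          if p == 0 and a1 == 0]
    # Ratio of time-2 attempt rates in the two groups
    return (sum(fp) / len(fp)) / (sum(tn) / len(tn))

# Tiny synthetic example (not the study's data)
pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
a1   = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
a2   = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
rr = relative_risk_fp_vs_tn(pred, a1, a2)  # 0.5 / (1/6) = 3.0
```

Raising the specificity cutoff shrinks the false-positive group; the study's finding is that the adolescents who remain flagged at stricter cutoffs carry progressively higher future risk.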

    Validating the Spanish translation of the posttraumatic stress disorder checklist (PCL-5) in a sample of individuals with traumatic brain injury

    Introduction: There is controversy regarding the comorbidity of posttraumatic stress disorder (PTSD) and traumatic brain injury (TBI). The present study translated the PTSD Checklist for DSM-5 (PCL-5) into Spanish and validated it in a sample of patients with TBI 6 months after the injury.
    Methods: The study included 233 patients (162 males and 71 females) recruited from four Spanish hospitals within 24 h of traumatic brain injury. A total of 12.2% of the sample met the provisional PTSD diagnostic criteria, and the prevalence was equal between male and female participants.
    Results: The analysis confirmed the internal consistency of the translated instrument (α = 0.95). The concurrent validity of the instrument was supported by high correlation coefficients of 0.70 and 0.74 with the Generalized Anxiety Disorder-7 (GAD-7) and the Patient Health Questionnaire-9 (PHQ-9), respectively. Exploratory factor analysis also confirmed that the items on the PCL-5 can be differentiated from the GAD-7 and PHQ-9 items. Confirmatory factor analysis (CFA) was used to examine the structural validity of the Spanish translation of the PCL-5 with three different models. CFA partially confirmed the four-factor PTSD model, whereas both the six-factor anhedonia model and the seven-factor hybrid model showed adequate fit. However, the difference between the anhedonia and hybrid models was not statistically significant; moreover, both models showed signs of overfitting. Therefore, the utility of these models should be reexamined in future studies.
    Conclusion: Overall, the results suggest that the Spanish translation of the PCL-5 is a reliable and valid instrument for screening PTSD symptoms among Spanish TBI patients. The Spanish translation of the PCL-5 is also presented in the manuscript.
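The internal-consistency figure (α = 0.95) is Cronbach's alpha. A minimal Python sketch of the standard formula, run here on made-up toy scores rather than the study's data:

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale. `items` is a list of per-item score
    lists; each inner list holds one item's scores for the same
    respondents in the same order (illustrative sketch)."""
    k = len(items)
    # Sum of the individual item variances
    sum_item_vars = sum(variance(item) for item in items)
    # Variance of each respondent's total score across all items
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum_item_vars / variance(totals))

# Two perfectly correlated toy items -> alpha = 1.0
alpha = cronbach_alpha([[1, 2, 3, 4], [2, 3, 4, 5]])
```

With real questionnaire data the item lists would hold the 20 PCL-5 item scores per respondent; values near 0.95, as reported here, indicate very high internal consistency.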

    Predicting suicide attempts among Norwegian adolescents without using suicide-related items: a machine learning approach

    IntroductionResearch on the classification models of suicide attempts has predominantly depended on the collection of sensitive data related to suicide. Gathering this type of information at the population level can be challenging, especially when it pertains to adolescents. We addressed two main objectives: (1) the feasibility of classifying adolescents at high risk of attempting suicide without relying on specific suicide-related survey items such as history of suicide attempts, suicide plan, or suicide ideation, and (2) identifying the most important predictors of suicide attempts among adolescents.MethodsNationwide survey data from 173,664 Norwegian adolescents (ages 13–18) were utilized to train a binary classification model, using 169 questionnaire items. The Extreme Gradient Boosting (XGBoost) algorithm was fine-tuned to classify adolescent suicide attempts, and the most important predictors were identified.ResultsXGBoost achieved a sensitivity of 77% with a specificity of 90%, and an AUC of 92.1% and an AUPRC of 47.1%. A coherent set of predictors in the domains of internalizing problems, substance use, interpersonal relationships, and victimization were pinpointed as the most important items related to recent suicide attempts.ConclusionThis study underscores the potential of machine learning for screening adolescent suicide attempts on a population scale without requiring sensitive suicide-related survey items. Future research investigating the etiology of suicidal behavior may direct particular attention to internalizing problems, interpersonal relationships, victimization, and substance use
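The reported sensitivity and specificity follow from a confusion matrix at a chosen probability cutoff on the classifier's output. A minimal sketch with hypothetical scores (not actual XGBoost output):

```python
def sensitivity_specificity(scores, labels, threshold):
    """Sensitivity and specificity of predicted probabilities at a cutoff
    (illustrative sketch; scores and labels are made up)."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    tn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 0)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    # Sensitivity = recall on positives; specificity = recall on negatives
    return tp / (tp + fn), tn / (tn + fp)

# Toy predicted probabilities and true labels
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 0, 0, 0]
sens, spec = sensitivity_specificity(scores, labels, 0.5)
```

Sweeping the threshold over all possible cutoffs traces the ROC curve whose area is the AUC the study reports; AUPRC does the same over the precision-recall curve, which is more informative for rare outcomes such as suicide attempts.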

    Measurement invariance of assessments of depression (PHQ-9) and anxiety (GAD-7) across sex, strata and linguistic backgrounds in a European-wide sample of patients after Traumatic Brain Injury

    Background: The Patient Health Questionnaire-9 (PHQ-9) and the Generalized Anxiety Disorder-7 (GAD-7) are two widely used instruments for screening patients for depression and anxiety. Comparable psychometric properties across different demographic and linguistic groups are necessary for multiple-group comparisons and international research on depression and anxiety.
    Objectives and Method: We examine measurement invariance for the PHQ-9 and GAD-7 by (a) the sex of the participants, (b) recruitment stratum, and (c) linguistic background. This study is based on non-randomized observational data collected in 18 countries six months after Traumatic Brain Injury (TBI). We used multiple methods to detect Differential Item Functioning (DIF), including Item Response Theory, logistic regression, and the Mantel-Haenszel method.
    Results: At 6 months post-injury, 2,137 participants (738 [34.5%] women) completed the PHQ-9 and GAD-7 questionnaires: 885 [41.4%] patients were primarily admitted to the Intensive Care Unit (ICU), 805 [37.7%] were admitted to a hospital ward, and 447 [20.9%] were evaluated in the Emergency Room and discharged. The results supported the invariance of the PHQ-9 and GAD-7 across sex, patient strata, and linguistic background. Across strata, three PHQ-9 items and one GAD-7 item were flagged as showing differences in two out of four DIF tests; across linguistic groups, only two GAD-7 items were flagged. However, the magnitude of the DIF effect was negligible.
    Limitations: Despite the high number of participants from the ICU, most patients had mild TBI.
    Conclusion: The findings demonstrate adequate psychometric properties for the PHQ-9 and GAD-7, allowing direct multigroup comparison across sex, strata, and linguistic background.
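One of the DIF methods named above, the Mantel-Haenszel procedure, pools 2x2 item-endorsement tables across matched total-score levels into a common odds ratio; a value near 1 indicates negligible DIF for that item. A sketch with made-up counts, not the study's data:

```python
def mantel_haenszel_or(tables):
    """Mantel-Haenszel common odds ratio over stratified 2x2 tables
    (illustrative sketch). Each table is (a, b, c, d):
      a = focal group endorsing the item,  b = focal group not endorsing,
      c = reference group endorsing,       d = reference group not endorsing.
    """
    # Weighted concordant products over weighted discordant products
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

# Two toy score strata with identical endorsement odds in both groups
or_mh = mantel_haenszel_or([(10, 10, 10, 10), (20, 5, 20, 5)])  # 1.0
```

An odds ratio of exactly 1 means that, after matching on total score, the two groups are equally likely to endorse the item, which is the "no DIF" pattern the study largely found.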

    Rethinking literate programming in statistics

    Literate programming is becoming increasingly popular for data analysis because it allows the generation of dynamic analysis reports for communicating data analysis and eliminates untraceable human errors in analysis reports. Traditionally, literate programming includes separate processes for compiling the code and preparing the documentation. While this workflow might be satisfactory for software documentation, it is not ideal for writing statistical analysis reports. Instead, these processes should run in parallel. In this article, I introduce the weaver package, which examines this idea by creating a new log system in HTML or LaTeX that can be used simultaneously with the Stata log system. The new log system provides many features that the Stata log system lacks; for example, it can render mathematical notation, insert figures, create publication-ready dynamic tables, and style text, and it includes a built-in syntax highlighter. The weaver package also produces dynamic PDF documents by converting the HTML log to PDF or by typesetting the LaTeX log, and it thus provides a real-time preview of the document without recompiling the code. I also discuss potential applications of the weaver package.

    markdoc: Literate programming in Stata

    Rigorous documentation of the analysis plan, procedure, and computer code enhances the comprehensibility and transparency of data analysis. Documentation is particularly critical when the code and data are meant to be publicly shared and examined by the scientific community to evaluate the analysis or adapt the results. The popular approach for documenting computer code is known as literate programming, which requires preparing a trilingual script file that includes a programming language for running the data analysis, a human language for documentation, and a markup language for typesetting the document. In this article, I introduce markdoc, a software package for interactive literate programming and generating dynamic analysis documents in Stata. markdoc recognizes the Markdown, LaTeX, and HTML markup languages and can export documents in several formats, such as PDF, Microsoft Office .docx, OpenOffice and LibreOffice .odt, LaTeX, HTML, ePub, and Markdown.

    Mental Health, Well-Being, and Extremism: A Machine Learning Study on Norwegian Adolescents

    A repository for the journal article.

    Seamless interactive language interfacing between R and Stata

    In this article, I propose a new approach to language interfacing for statistical software that allows automatic interprocess communication between R and Stata. I advocate interactive language interfacing in statistical software by automating data communication. I introduce the rcall package and provide examples of how the R language can be used interactively within Stata or embedded into Stata programs using the proposed approach to interfacing. Moreover, I discuss the pros and cons of object synchronization in language interfacing.

    Software documentation with markdoc 5.0

    markdoc is a general-purpose literate programming package for generating dynamic documents, dynamic presentation slides, Stata help files, and package vignettes in various formats. In this article, I introduce markdoc version 5.0, which performs independently of any third-party software, using the mini engine. The mini engine is a lightweight alternative to Pandoc (MacFarlane [2006, https://pandoc.org/]), written completely in Stata. I also propose a procedure for remodeling package documentation and data documentation in Stata and present a tutorial for generating help files, package vignettes, and GitHub Wiki documentation using markdoc.

    Developing, maintaining, and hosting Stata statistical software on GitHub

    The popularity of GitHub is growing, not only among software developers but also among statisticians and data scientists. In this article, I discuss why social coding platforms such as GitHub are preferable for developing, documenting, maintaining, and collaborating on statistical software. Furthermore, I introduce the github command version 2.0 for Stata, which facilitates building, searching, installing, and managing statistical packages hosted on GitHub. I also provide a command for searching filenames in all Stata packages published on the Statistical Software Components Archive and GitHub to ensure unique filenames and package names, which is a common concern among Stata programmers. I make further suggestions to enhance the practice of developing and hosting statistical packages on GitHub, as well as using them for data analysis.