4 research outputs found

    Information extraction framework for disability determination using a mental functioning use-case

    Natural language processing (NLP) in health care enables transformation of complex narrative information into high value products such as clinical decision support and adverse event monitoring in real time via the electronic health record (EHR). However, information technologies for mental health have consistently lagged because of the complexity of measuring and modeling mental health and illness. The use of NLP to support management of mental health conditions is a viable topic that has not been explored in depth. This paper provides a framework for the advanced application of NLP methods to identify, extract, and organize information on mental health and functioning to inform the decision-making process applied to assessing mental health. We present a use-case related to work disability, guided by the disability determination process of the US Social Security Administration (SSA). From this perspective, the following questions must be addressed about each problem that leads to a disability benefits claim: When did the problem occur and how long has it existed? How severe is it? Does it affect the person’s ability to work? and What is the source of the evidence about the problem? Our framework includes 4 dimensions of medical information that are central to assessing disability—temporal sequence and duration, severity, context, and information source. We describe key aspects of each dimension and promising approaches for application in mental functioning. For example, to address temporality, a complete functional timeline must be created with all relevant aspects of functioning such as intermittence, persistence, and recurrence. Severity of mental health symptoms can be successfully identified and extracted on a 4-level ordinal scale from absent to severe. Some NLP work has been reported on the extraction of context for specific cases of wheelchair use in clinical settings. 
We discuss the links between the task of information source assessment and work on source attribution, coreference resolution, event extraction, and rule-based methods. Gaps were identified in NLP applications that directly applied to the framework and in existing relevant annotated data sets. We highlighted NLP methods with the potential for advanced application in the field of mental functioning. Findings of this work will inform the development of instruments for supporting SSA adjudicators in their disability determination process. The 4 dimensions of medical information may have relevance for a broad array of individuals and organizations responsible for assessing mental health function and ability. Further, our framework with 4 specific dimensions presents significant opportunity for the application of NLP in the realm of mental health and functioning beyond the SSA setting, and it may support the development of robust tools and methods for decision-making related to clinical care, program implementation, and other outcomes.

    A Likelihood Ratio Based Forensic Text Comparison with Multiple Types of Features

    This study aims to further improve forensic text comparison (FTC) under the likelihood ratio (LR) framework. While the use of the LR framework to express the strength of evidence is well recognised in forensic science, studies on forensic text evidence within the LR framework are limited, and this study is an attempt to alleviate this situation. There have already been initiatives to obtain LRs for textual evidence by adopting various approaches and using different sets of stylometric features (Carne & Ishihara, 2020; Ishihara, 2014, 2017a, 2017b, 2021). However, only a few features have been tested in the similarity-only score-based approach (Ishihara, 2021), and many features remain to be investigated. To achieve the aim of the study, we investigate some of these features in LR-based FTC and demonstrate how they contribute to the further improvement of the LR-based FTC system. Statistical, word n-gram (n=1,2,3), character n-gram (n=1,2,3,4), and part-of-speech (POS) n-gram (n=1,2,3) features were first tested separately, and the separately estimated LRs were then fused into overall LRs. The database used was prepared by Ishihara (2021), and the documents under comparison were modelled as feature vectors using a bag-of-words model. Two groups of documents, each containing documents of 700, 1,400, and 2,100 words, were concatenated for each author, resulting in a total of 719 same-author comparisons and 516,242 different-author comparisons. Cosine similarity was used to measure the similarity of texts, and the similarity-only score-based approach was used to estimate the LRs from the similarity scores (Helper et al., 2012; Bolck et al., 2015). The log-likelihood ratio cost (Cllr) and its components, Cllrmin and Cllrcal, were used as assessment metrics.
Findings indicate that (a) when the LRs of all the feature types are fused, the fused Cllr values are 0.56, 0.30, and 0.19 for 700, 1,400, and 2,100 words, respectively, and (b) feature selection depending on the nature of an FTC task matters to the performance of the FTC system and can contribute to the improvement of LR-based FTC.
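
    The scoring and evaluation steps named in this abstract can be illustrated with a minimal Python sketch. It is not the study's implementation. The function names `char_ngrams`, `cosine_similarity`, and `cllr` are hypothetical, and the step that converts similarity scores into LRs (which needs score distributions estimated from training comparisons) is deliberately left out. The `cllr` function follows the standard definition of the log-likelihood ratio cost, which penalises same-author LRs below 1 and different-author LRs above 1.

```python
import math
from collections import Counter


def char_ngrams(text, n):
    """Count overlapping character n-grams in a text (a bag-of-n-grams vector)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as dissimilar
    return dot / (norm_a * norm_b)


def cllr(same_author_lrs, diff_author_lrs):
    """Log-likelihood ratio cost for lists of (non-log) LRs.

    Cllr = 0.5 * (mean(log2(1 + 1/LR)) over same-author comparisons
                  + mean(log2(1 + LR)) over different-author comparisons)
    """
    pen_same = sum(math.log2(1 + 1 / lr) for lr in same_author_lrs) / len(same_author_lrs)
    pen_diff = sum(math.log2(1 + lr) for lr in diff_author_lrs) / len(diff_author_lrs)
    return 0.5 * (pen_same + pen_diff)


if __name__ == "__main__":
    # Toy texts, purely illustrative.
    score = cosine_similarity(char_ngrams("the quick brown fox", 3),
                              char_ngrams("the quick red fox", 3))
    print(f"similarity score: {score:.3f}")
    # A system that always outputs LR = 1 carries no information.
    print(f"Cllr for uninformative LRs: {cllr([1.0, 1.0], [1.0, 1.0])}")
```

    Under this definition, a system that always returns LR = 1 has a Cllr of exactly 1, so the fused values of 0.56, 0.30, and 0.19 reported above all indicate better-than-uninformative performance, improving as documents get longer.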

    The anonymous 1821 translation of Goethe's Faust :a cluster analytic approach

    PhD Thesis. This study tests the hypothesis proposed by Frederick Burwick and James McKusick in 2007 that Samuel Taylor Coleridge was the author of the anonymous translation of Goethe's Faust published by Thomas Boosey in 1821. The approach to hypothesis testing is stylometric. Specifically, function word usage is selected as the stylometric criterion, and 80 function words are used to define an 80-dimensional function word frequency profile vector for each text in the corpus of Coleridge's literary works and for a selection of works by a range of contemporary English authors. Each profile vector is a point in 80-dimensional vector space, and cluster analytic methods are used to determine the distribution of profile vectors in the space. If the hypothesis being tested is valid, then the profile for the 1821 translation should be closer in the space to works known to be by Coleridge than to works by the other authors. The cluster analytic results show, however, that this is not the case, and the conclusion is that the Burwick and McKusick hypothesis is falsified relative to the stylometric criterion and analytic methodology used.
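
    The idea of a function word frequency profile can be sketched in a few lines of Python. This is a toy illustration, not the thesis's method. The five-word `FUNCTION_WORDS` list is a hypothetical stand-in for the 80 function words, which the abstract does not enumerate, and a nearest-neighbour comparison by Euclidean distance is swapped in as a much simpler substitute for the cluster analytic methods the thesis uses.

```python
import math
import re

# Hypothetical short list for illustration only; the thesis uses 80 function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "in"]


def profile(text, words=FUNCTION_WORDS):
    """Relative frequency of each function word: one vector component per word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return [0.0] * len(words)
    return [tokens.count(w) / total for w in words]


def euclidean(p, q):
    """Euclidean distance between two profile vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def nearest_author(query_profile, known_profiles):
    """Return the author whose closest known text lies nearest to the query.

    known_profiles maps an author name to a list of profile vectors.
    """
    return min(
        known_profiles,
        key=lambda author: min(euclidean(query_profile, p)
                               for p in known_profiles[author]),
    )


if __name__ == "__main__":
    # Made-up sentences, not drawn from any of the works discussed.
    known = {
        "Author A": [profile("the sea and the sky and the shore")],
        "Author B": [profile("to go in haste is to fall in error")],
    }
    query = profile("the wind and the rain")
    print(nearest_author(query, known))
```

    With 80 function words each text becomes a point in 80-dimensional space, and the question the thesis asks is whether the point for the 1821 translation falls among the points for Coleridge's known works.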