13 research outputs found

    TagLine: Information Extraction for Semi-Structured Text Elements In Medical Progress Notes

    Text analysis has become an important research activity in the Department of Veterans Affairs (VA). Statistical text mining and natural language processing have been shown to be very effective for extracting useful information from medical documents. However, neither of these techniques is effective at extracting the information stored in semi-structured text elements. A prototype system (TagLine) was developed as a method for extracting information from the semi-structured portions of text using machine learning. Features for the learner were suggested by prior work, as well as by examining the text and selecting attributes that help distinguish the various classes of text lines. The classes were derived empirically from the text and guided by an ontology developed by the Consortium for Health Informatics Research (CHIR), a nationwide research initiative focused on medical informatics. Decision trees and Levenshtein approximate string matching techniques were tested and compared on 5,055 unseen lines of text. The decision tree method was found to be superior to the fuzzy string matching method on this task: decision trees achieved an overall accuracy of 98.5 percent, while string matching achieved only 87 percent. Overall, the results for line classification were very encouraging. The labels applied to the lines were used to evaluate TagLine's performance for identifying the semi-structured text elements, including tables, slots, and fillers. Results for slots and fillers were impressive, while the results for tables were also acceptable.
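    For context on the two techniques being compared, the sketch below contrasts a decision tree trained on surface features of each text line with a Levenshtein (fuzzy) match against labeled example lines. The feature set, class labels, and example lines are illustrative assumptions, not TagLine's actual design.

    # Minimal line-classification sketch (assumed features and labels, not TagLine's).
    import Levenshtein  # pip install python-Levenshtein
    from sklearn.tree import DecisionTreeClassifier

    def line_features(line: str) -> list[float]:
        """Surface features of a single text line (illustrative only)."""
        tokens = line.split()
        return [
            len(line),                                            # line length
            len(tokens),                                          # token count
            sum(c.isdigit() for c in line) / max(len(line), 1),   # digit ratio
            line.count(":"),                                      # colon count (slot markers)
            line.count("|") + line.count("\t"),                   # table-like delimiters
        ]

    # Hypothetical labeled lines: (text, class)
    train = [
        ("Chief complaint: dizziness", "slot_filler"),
        ("Medication | Dose | Frequency", "table"),
        ("Patient ambulates independently without an assistive device.", "free_text"),
    ]
    tree = DecisionTreeClassifier().fit([line_features(t) for t, _ in train],
                                        [label for _, label in train])

    def classify_by_string_match(line: str) -> str:
        """Fuzzy baseline: label of the most similar training line by Levenshtein ratio."""
        return max(train, key=lambda pair: Levenshtein.ratio(line, pair[0]))[1]

    new_line = "Allergies: penicillin"
    print(tree.predict([line_features(new_line)])[0])   # decision tree prediction
    print(classify_by_string_match(new_line))           # fuzzy string match prediction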

    Finding Falls in Ambulatory Care Clinical Documents Using Statistical Text Mining

    Objective: To determine how well statistical text mining (STM) models can identify falls within clinical text associated with an ambulatory encounter. Materials and Methods: 2,241 patients were selected with a fall-related ICD-9-CM E-code or matched injury diagnosis code while being treated as an outpatient at one of four sites within the Veterans Health Administration. All clinical documents within a 48-hour window of the recorded E-code or injury diagnosis code for each patient were obtained (n=26,010; 611 distinct document titles) and annotated for falls. Logistic regression, support vector machine, and cost-sensitive support vector machine (SVM-cost) models were trained on a stratified sample of 70% of documents from one location (dataset Atrain) and then applied to the remaining unseen documents (datasets Atest-D). Results: All three STM models obtained area under the receiver operating characteristic curve (AUC) scores above 0.950 on the four test datasets (Atest-D). The SVM-cost model obtained the highest AUC scores, ranging from 0.953 to 0.978. The SVM-cost model also achieved F-measure values ranging from 0.745 to 0.853, sensitivity from 0.890 to 0.931, and specificity from 0.877 to 0.944. Discussion: The STM models performed well across a large heterogeneous collection of document titles. In addition, the models also generalized across other sites, including a traditionally bilingual site that had distinctly different grammatical patterns. Conclusions: The results of this study suggest that STM-based models have the potential to improve surveillance of falls. Furthermore, the encouraging evidence shown here that STM is a robust technique for mining clinical documents bodes well for other surveillance-related topics.
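    As a rough illustration of the cost-sensitive variant described above, the sketch below weights the positive (fall) class more heavily in a linear SVM over TF-IDF features and scores it by AUC with scikit-learn. The features, class weights, and toy documents are assumptions for demonstration, not the study's actual pipeline.

    # Cost-sensitive SVM sketch: penalize misclassified fall documents more heavily.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import roc_auc_score
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    # Hypothetical documents (1 = fall documented, 0 = no fall)
    train_docs = ["patient slipped on ice and fell, struck head",
                  "routine follow-up, no acute events reported"]
    train_labels = [1, 0]
    test_docs = ["tripped over rug at home, landed on left hip",
                 "medication refill requested by phone"]
    test_labels = [1, 0]

    # class_weight acts as the misclassification cost for the rarer positive class
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LinearSVC(class_weight={0: 1.0, 1: 5.0}))
    model.fit(train_docs, train_labels)

    # AUC computed from the signed distance to the separating hyperplane
    scores = model.decision_function(test_docs)
    print("AUC:", roc_auc_score(test_labels, scores))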

    The Value of Extracting Clinician-Recorded Affect for Advancing Clinical Research on Depression: Proof-of-Concept Study Applying Natural Language Processing to Electronic Health Records

    Background: Affective characteristics are associated with depression severity, course, and prognosis. Patients’ affect captured by clinicians during sessions may provide a rich source of information that more naturally aligns with the depression course and patient-desired depression outcomes. Objective: In this paper, we propose an information extraction vocabulary used to pilot the feasibility and reliability of identifying clinician-recorded patient affective states in clinical notes from electronic health records. Methods: Affect and mood were annotated in 147 clinical notes of 109 patients by 2 independent coders across 3 pilots. Intercoder discrepancies were settled by a third coder. This reference annotation set was used to test a proof-of-concept natural language processing (NLP) system using a named entity recognition approach. Results: Concepts were frequently addressed in templated format and free text in clinical notes. Annotated data demonstrated that affective characteristics were identified in 87.8% (129/147) of the notes, while mood was identified in 97.3% (143/147) of the notes. The intercoder reliability was consistently good across the pilots (interannotator agreement [IAA] >70%). The final NLP system showed good reliability with the final reference annotation set (mood IAA=85.8%; affect IAA=80.9%). Conclusions: Affect and mood can be reliably identified in clinician reports and are good targets for NLP. We discuss several next steps to expand on this proof of concept and the value of this research for depression clinical research.
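    The named entity recognition approach mentioned above can be approximated with a simple dictionary lookup. The sketch below uses spaCy's PhraseMatcher with a few illustrative affect and mood terms; it is not the vocabulary or system developed in the paper.

    # Dictionary-based stand-in for affect/mood entity recognition (terms are assumptions).
    import spacy
    from spacy.matcher import PhraseMatcher

    nlp = spacy.blank("en")                      # tokenizer only; no trained model needed
    matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
    matcher.add("AFFECT", [nlp.make_doc(t) for t in ["flat affect", "blunted affect", "tearful"]])
    matcher.add("MOOD", [nlp.make_doc(t) for t in ["depressed mood", "euthymic", "irritable"]])

    note = "Mental status exam: depressed mood with blunted affect; tearful at times."
    doc = nlp(note)
    for match_id, start, end in matcher(doc):
        print(nlp.vocab.strings[match_id], "->", doc[start:end].text)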

    Using information from the electronic health record to improve measurement of unemployment in service members and veterans with mTBI and post-deployment stress.

    OBJECTIVE: The purpose of this pilot study is 1) to develop an annotation schema and a training set of annotated notes to support the future development of a natural language processing (NLP) system to automatically extract employment information, and 2) to determine if information about employment status, goals, and work-related challenges reported by service members and Veterans with mild traumatic brain injury (mTBI) and post-deployment stress can be identified in the Electronic Health Record (EHR). DESIGN: Retrospective cohort study using data from selected progress notes stored in the EHR. SETTING: Post-deployment Rehabilitation and Evaluation Program (PREP), an inpatient rehabilitation program for Veterans with TBI at the James A. Haley Veterans' Hospital in Tampa, Florida. PARTICIPANTS: Service members and Veterans with TBI who participated in the PREP program (N = 60). MAIN OUTCOME MEASURES: Documentation of employment status, goals, and work-related challenges reported by service members and recorded in the EHR. RESULTS: Two hundred notes were examined and unique vocational information was found indicating a variety of self-reported employment challenges. Current employment status and future vocational goals, along with information about cognitive, physical, and behavioral symptoms that may affect return to work, were extracted from the EHR. The annotation schema developed for this study provides an excellent tool upon which NLP studies can be developed. CONCLUSIONS: Information related to employment status and vocational history is stored in text notes in the EHR system. Information stored in text does not lend itself to easy extraction or summarization for research and rehabilitation planning purposes. Development of NLP systems to automatically extract text-based employment information provides data that may improve the understanding and measurement of employment in this important cohort.
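    Until an NLP system is built on the annotation schema, extraction of employment mentions from note text can be sketched with simple keyword patterns, as below. The categories and patterns are hypothetical placeholders, not the schema developed in this study.

    # Rule-based sketch for pulling employment mentions from a progress note.
    import re

    # Hypothetical categories and keyword patterns (not the study's annotation schema)
    EMPLOYMENT_PATTERNS = {
        "employment_status": r"\b(unemployed|employed (full|part)[- ]time|medically retired)\b",
        "vocational_goal": r"\b(return to work|vocational rehab\w*|finish \w+ degree)\b",
        "work_challenge": r"\b(memory problems|headaches|irritability|difficulty concentrating)\b",
    }

    def extract_employment_info(note_text: str) -> dict[str, list[str]]:
        """Return every pattern match per category from a single note."""
        found: dict[str, list[str]] = {}
        for category, pattern in EMPLOYMENT_PATTERNS.items():
            hits = [m.group(0).lower() for m in re.finditer(pattern, note_text, re.IGNORECASE)]
            if hits:
                found[category] = hits
        return found

    note = ("Veteran is currently unemployed, reports headaches and difficulty "
            "concentrating, and states his goal is to return to work within a year.")
    print(extract_employment_info(note))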

    Veterans with TBI Demographic and other Characteristics (N = 60).

    Note: * A large percentage of Veterans indicated they took college classes but were unable to finish their degree. † Other = Education that was not otherwise specified. ‡ Status from most recent note reviewed.

    Military Rank, MOS, Work History, Employment Goals, Vocational Interests & Driving Ability (N = 60).

    Note: * Veteran may report more than one previous work history. † Veteran may have more than one future goal. ‡ Veterans had a record of a CareerScope assessment including interest inventory clusters.

    Note Titles Reviewed and Selected.

    * Selection based upon richness of employment information found within each note title.