1,343 research outputs found

    Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation

    Background: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing works have employed machine learning techniques to construct a knowledge base. However, they often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. We combined natural language processing (NLP) and semantic network analysis in our model generation pipeline. Methods: As a use case of our pipeline, we utilized data from an open source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the “Subject:Relationship:Object” syntactic data schema. The identified noun phrases were tagged with the Unified Medical Language System (UMLS) semantic types. An evaluation was done on a dataset comprising 83 image notes from four data sources. Results: A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model was able to cover 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%. Conclusion: The results indicated that our pipeline was able to produce a comprehensive content-based knowledge model that could represent context from various sources in the same domain.

    Information Technology and Computer Science

    The healthcare system is a knowledge-driven industry that holds vast and growing volumes of narrative information obtained from discharge summaries and reports, physicians' case notes, and pathologists' and radiologists' reports. This information is usually stored in unstructured and non-standardized formats in electronic healthcare systems, which makes it difficult for the systems to understand its content. Thus, access to valuable and meaningful healthcare information for decision making is a challenge. Nevertheless, Natural Language Processing (NLP) techniques have been used to structure narrative information in healthcare. NLP techniques can capture unstructured healthcare information, analyze its grammatical structure, determine its meaning, and translate it so that it can be easily understood by electronic healthcare systems. Consequently, NLP techniques reduce cost as well as improve the quality of healthcare. It is against this background that this paper reviews the NLP techniques used in healthcare, their applications, and their limitations.

    Doctor of Philosophy

    Health information technology (HIT) in conjunction with quality improvement (QI) methodologies can promote higher quality care at lower costs. Unfortunately, most inpatient hospital settings have been slow to adopt HIT and QI methodologies. Successful adoption requires close attention to workflow. Workflow is the sequence of tasks and processes, together with the set of people or resources needed for those tasks, that is necessary to accomplish a given goal. Assessing the impact on workflow is an important component of determining whether a HIT implementation will be successful, but little research has been conducted on the impact of eMeasure (electronic performance measure) implementation on workflow. One solution to addressing implementation challenges such as the lack of attention to workflow is an implementation toolkit. An implementation toolkit is an assembly of instruments such as checklists, forms, and planning documents. We developed an initial eMeasure Implementation Toolkit for the heart failure (HF) eMeasure to allow QI and information technology (IT) professionals and their teams to assess the impact of implementation on workflow. During the development phase of the toolkit, we undertook a literature review to determine the components of the toolkit. We conducted stakeholder interviews with HIT and QI key informants and subject matter experts (SMEs) at the US Department of Veterans Affairs (VA). Key informants provided a broad understanding of the context of workflow during eMeasure implementation. Using snowball sampling, we also interviewed other SMEs, recommended by the key informants, who suggested tools and provided information essential to the toolkit development. The second phase involved evaluation of the toolkit for relevance and clarity by experts in non-VA settings. The experts evaluated the sections of the toolkit that contained the tools via a survey.
The final toolkit provides a distinct set of resources and tools, which were iteratively developed during the research and are available to users in a single source document. The research methodology provided a strong, unified, overarching implementation framework in the form of the Promoting Action on Research Implementation in Health Services (PARIHS) model in combination with a sociotechnical model of HIT that strengthened the overall design of the study.

    Automated Transformation of Semi-Structured Text Elements

    Interconnected systems, such as electronic health records (EHR), have considerably improved the handling and processing of health information while keeping costs at a controlled level. Since the EHR stores virtually all data in digitized form, personal medical documents are easily and swiftly available when needed. However, multiple formats and differences in the health documents managed by various health care providers severely reduce the efficiency of the data sharing process. This paper presents a rule-based transformation system that converts semi-structured (annotated) text into standardized formats, such as HL7 CDA. It identifies relevant information in the input document by analyzing its structure as well as its content and inserts the required elements into corresponding reusable CDA templates, where the templates are selected according to the CDA document type-specific requirements.

    Enhancing rule-based text classification of neurosurgical notes using filtered feature weight vectors

    Clinicians need to record clinical encounters in written or spoken language, not only for its workflow naturalness but also for its expressivity, precision, and capacity to convey all required information, which codified structured data cannot match. Therefore, the structured data required for aggregation and analysis must be obtained from clinical text as a later step. Specialised areas of medicine use their own clinical language and clinical coding systems, resulting in unique challenges for the extraction process. Rule-based information extraction systems have been used effectively in commercial products and are favoured because they are easily understood and controlled. However, there is promising research into the use of machine learning techniques for extracting information, and this research explores the effectiveness of a hybrid rule-based and machine learning-based audit coding system developed for the neurosurgical department of a major trauma hospital.

    Intelligent audit code generation from free text in the context of neurosurgery

    Clinical auditing requires codified data for aggregation and analysis of patterns. However, in the medical domain, obtaining structured data can be difficult because the most natural, expressive and comprehensive way to record a clinical encounter is through natural language. The task of creating structured data from naturally expressed information is known as information extraction. Specialised areas of medicine use their own language and data structures; the translation process has unique challenges, and often requires a fresh approach. This research is devoted to creating a novel semi-automated method for generating codified auditing data from clinical notes recorded in a neurosurgical department in an Australian teaching hospital. The method encapsulates specialist knowledge in rules that instantaneously make precise decisions for the majority of the matches, followed up by dictionary-based matching of the remaining text.

    COHORT IDENTIFICATION FROM FREE-TEXT CLINICAL NOTES USING SNOMED CT’S SEMANTIC RELATIONS

    In this paper, a new cohort identification framework that exploits the semantic hierarchy of SNOMED CT is proposed to overcome the limitations of supervised machine learning-based approaches. Eligibility criteria descriptions and free-text clinical notes from the 2018 National NLP Clinical Challenge (n2c2) were processed to map to relevant SNOMED CT concepts and to measure semantic similarity between the eligibility criteria and patients. A patient was deemed eligible if the patient had a similarity score higher than a threshold cut-off value, which was set where the best F1 score could be achieved. The performance of the proposed system was evaluated for three eligibility criteria. The current framework’s macro-average F1 score across three eligibility criteria was higher than the previously reported results of the 2018 n2c2 (0.933 vs. 0.889). This study demonstrated that SNOMED CT alone can be leveraged for cohort identification tasks without referring to external textual sources for training.
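
    The threshold selection described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores, and the choice to treat a score at or above the threshold as eligible are all assumptions made for illustration, and the similarity scores are simply taken as given.

```python
from typing import Sequence, Tuple


def f1_score(predicted: Sequence[bool], actual: Sequence[bool]) -> float:
    """F1 = harmonic mean of precision and recall; 0.0 when there are no true positives."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(a and not p for p, a in zip(predicted, actual))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(scores: Sequence[float], labels: Sequence[bool]) -> Tuple[float, float]:
    """Try every observed score as a cut-off and keep the one with the highest F1.

    Assumption: a patient counts as eligible when score >= threshold.
    Ties are resolved in favour of the lowest threshold.
    """
    best_t, best_f1 = min(scores), -1.0
    for t in sorted(set(scores)):
        predicted = [s >= t for s in scores]
        f1 = f1_score(predicted, labels)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


# Hypothetical similarity scores and gold-standard eligibility labels.
scores = [0.9, 0.8, 0.4, 0.3, 0.1]
labels = [True, True, False, True, False]
threshold, f1 = best_threshold(scores, labels)
print(threshold, round(f1, 3))
```

    A cut-off tuned this way is fitted to the data it is scored on, so in practice it would be chosen on a training split and reported on held-out patients.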

    Doctor of Philosophy

    Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free-text query and text processing, and to demonstrate the advantages of using these methods for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified, or the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. The addition of free text search compared to structured data search alone demonstrated increased cohort size in all cases, with dramatic increases in some. Representation of patients in subpopulations that may have been underrepresented otherwise is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free text search. A novel information extraction algorithm is developed and evaluated (Regular Expression Discovery for Extraction, or REDEx) for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted. Additional patients and additional measurement occurrences are identified using these extracted values that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts.
We developed automated query expansion methods that greatly improve performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions can be detected that are often missed otherwise. We found a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding additional information and observations over what is available in structured text alone.
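
    As a rough illustration of the kind of value extraction this abstract describes, the Python sketch below pulls bodyweight measurements out of a free-text note with a single hand-written regular expression. This is not the REDEx algorithm, which discovers its expressions from examples. The pattern, the function name, and the sample notes are all hypothetical.

```python
import re
from typing import List, Tuple

# Hand-written pattern (an assumption, not a REDEx-discovered expression):
# the word "weight" or "wt", up to 10 non-digit characters, a number, then a unit.
WEIGHT_PATTERN = re.compile(
    r"\b(?:weight|wt)\b\D{0,10}?(\d{1,3}(?:\.\d+)?)\s*(kg|lbs?)\b",
    re.IGNORECASE,
)


def extract_weights(note: str) -> List[Tuple[float, str]]:
    """Return (value, unit) pairs for each bodyweight mention found in the note."""
    return [(float(value), unit.lower()) for value, unit in WEIGHT_PATTERN.findall(note)]


# Hypothetical clinical note fragments.
print(extract_weights("Pt seen today. Weight: 82.5 kg. Denies pain."))
print(extract_weights("Follow-up visit, wt 180 lbs, BP stable."))
print(extract_weights("No weight recorded at this visit."))
```

    A fixed pattern like this misses phrasings it was not written for, which is the gap that learning expressions from annotated examples is meant to close.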