Using data-driven sublanguage pattern mining to induce knowledge models: application in medical image reports knowledge representation
Background: The use of knowledge models facilitates information retrieval and knowledge base development, and therefore supports new knowledge discovery that ultimately enables decision support applications. Most existing works have employed machine learning techniques to construct a knowledge base. However, they often suffer from low precision in extracting entities and relationships. In this paper, we describe a data-driven sublanguage pattern mining method that can be used to create a knowledge model. We combined natural language processing (NLP) and semantic network analysis in our model generation pipeline.
Methods: As a use case of our pipeline, we utilized data from an open-source imaging case repository, Radiopaedia.org, to generate a knowledge model that represents the contents of medical imaging reports. We extracted entities and relationships using the Stanford part-of-speech parser and the “Subject:Relationship:Object” syntactic data schema. The identified noun phrases were tagged with Unified Medical Language System (UMLS) semantic types. An evaluation was performed on a dataset of 83 image notes from four data sources.
Results: A semantic type network was built based on the co-occurrence of 135 UMLS semantic types in 23,410 medical image reports. By regrouping the semantic types and generalizing the semantic network, we created a knowledge model that contains 14 semantic categories. Our knowledge model was able to cover 98% of the content in the evaluation corpus and revealed 97% of the relationships. Machine annotation achieved a precision of 87%, recall of 79%, and F-score of 82%.
Conclusion: The results indicated that our pipeline was able to produce a comprehensive content-based knowledge model that could represent content from various sources in the same domain.
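The co-occurrence network construction described in the Results can be sketched as follows; the semantic-type labels and mini-corpus below are hypothetical stand-ins for the 135 UMLS semantic types tagged across the 23,410 reports, not the paper's actual data.

```python
from collections import Counter
from itertools import combinations

def build_cooccurrence_network(reports):
    """Count how often pairs of semantic types co-occur in the same report.

    `reports` is a list of sets of semantic-type labels, one set per report
    (stand-ins for UMLS semantic types assigned to noun phrases).
    """
    edge_counts = Counter()
    for types_in_report in reports:
        # every unordered pair of distinct types in a report adds one edge
        for pair in combinations(sorted(types_in_report), 2):
            edge_counts[pair] += 1
    return edge_counts

# Hypothetical mini-corpus: each set is the semantic types found in one report.
reports = [
    {"Body Part", "Finding", "Diagnostic Procedure"},
    {"Body Part", "Finding"},
    {"Finding", "Diagnostic Procedure"},
]
network = build_cooccurrence_network(reports)
print(network[("Body Part", "Finding")])  # 2
```

The resulting weighted edge list is the raw material for the regrouping and generalization steps that yield the 14 semantic categories.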
Information Technology and Computer Science
Abstract: The healthcare system is a knowledge-driven industry that generates vast and growing volumes of narrative information from discharge summaries and reports, physicians' case notes, and pathologists' and radiologists' reports. This information is usually stored in unstructured, non-standardized formats in electronic healthcare systems, which makes it difficult for those systems to understand the information content of the narratives. Thus, access to valuable and meaningful healthcare information for decision making is a challenge. Nevertheless, Natural Language Processing (NLP) techniques have been used to structure narrative information in healthcare: they can capture unstructured healthcare information, analyze its grammatical structure, determine the meaning of the information, and translate it so that it can be easily understood by electronic healthcare systems. Consequently, NLP techniques reduce cost and improve the quality of healthcare. It is against this background that this paper reviews the NLP techniques used in healthcare, their applications, and their limitations.
Doctor of Philosophy Dissertation
Health information technology (HIT) in conjunction with quality improvement (QI) methodologies can promote higher-quality care at lower costs. Unfortunately, most inpatient hospital settings have been slow to adopt HIT and QI methodologies. Successful adoption requires close attention to workflow: the sequence of tasks and processes, and the set of people or resources needed for those tasks, that are necessary to accomplish a given goal. Assessing the impact on workflow is an important component of determining whether a HIT implementation will be successful, but little research has been conducted on the impact of eMeasure (electronic performance measure) implementation on workflow. One solution to implementation challenges such as the lack of attention to workflow is an implementation toolkit: an assembly of instruments such as checklists, forms, and planning documents. We developed an initial eMeasure Implementation Toolkit for the heart failure (HF) eMeasure to allow QI and information technology (IT) professionals and their teams to assess the impact of implementation on workflow. During the development phase of the toolkit, we undertook a literature review to determine its components. We conducted stakeholder interviews with HIT and QI key informants and subject matter experts (SMEs) at the US Department of Veterans Affairs (VA). Key informants provided a broad understanding of the context of workflow during eMeasure implementation. Using snowball sampling, we also interviewed other SMEs recommended by the key informants; these SMEs suggested tools and provided information essential to the toolkit's development. The second phase involved evaluation of the toolkit for relevance and clarity by experts in non-VA settings, who assessed the sections of the toolkit containing the tools via a survey.
The final toolkit provides a distinct set of resources and tools, iteratively developed during the research and available to users in a single source document. The research methodology provided a strong, unified, overarching implementation framework in the form of the Promoting Action on Research Implementation in Health Services (PARIHS) model, in combination with a sociotechnical model of HIT that strengthened the overall design of the study.
Automated Transformation of Semi-Structured Text Elements
Interconnected systems such as electronic health records (EHRs) have considerably improved the handling and processing of health information while keeping costs at a controlled level. Since the EHR stores virtually all data in digitized form, personal medical documents are easily and swiftly available when needed. However, multiple formats and differences in the health documents managed by various health care providers severely reduce the efficiency of the data sharing process. This paper presents a rule-based transformation system that converts semi-structured (annotated) text into standardized formats such as HL7 CDA. It identifies relevant information in the input document by analyzing its structure as well as its content, and inserts the required elements into corresponding reusable CDA templates, where the templates are selected according to the CDA document type-specific requirements.
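The rule-based transformation idea can be sketched roughly as follows; the section patterns, codes, and element names are hypothetical simplifications for illustration, not the paper's actual CDA templates or selection logic.

```python
import re

# Hypothetical rules: each maps a regex over an annotated input line to a
# CDA-like template element (the real system selects reusable HL7 CDA
# templates per document type; this only sketches the rule-to-template idea).
RULES = [
    (re.compile(r"^Diagnosis:\s*(?P<text>.+)$"),
     '<observation code="DX">{text}</observation>'),
    (re.compile(r"^Medication:\s*(?P<text>.+)$"),
     '<substanceAdministration>{text}</substanceAdministration>'),
]

def transform(lines):
    """Convert semi-structured lines into standardized XML-like elements."""
    elements = []
    for line in lines:
        for pattern, template in RULES:
            match = pattern.match(line.strip())
            if match:
                elements.append(template.format(**match.groupdict()))
                break  # first matching rule wins
    return elements

doc = ["Diagnosis: chronic subdural hematoma", "Medication: levetiracetam 500 mg"]
print(transform(doc)[0])  # <observation code="DX">chronic subdural hematoma</observation>
```

In the described system, the matched elements would then be inserted into the document-type-specific CDA template rather than emitted as a flat list.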
Enhancing rule-based text classification of neurosurgical notes using filtered feature weight vectors
Clinicians need to record clinical encounters in written or spoken language, not only because it is natural to their workflow but also for its expressivity, precision, and capacity to convey all required information, which codified structured data cannot. Therefore, the structured data required for aggregation and analysis must be obtained from clinical text as a later step. Specialised areas of medicine use their own clinical language and clinical coding systems, resulting in unique challenges for the extraction process. Rule-based information extraction has been used effectively in commercial systems and is favoured because it is easily understood and controlled. However, there is promising research into the use of machine learning techniques for extracting information, and this research explores the effectiveness of a hybrid rule-based and machine learning-based audit coding system developed for the neurosurgical department of a major trauma hospital.
Intelligent audit code generation from free text in the context of neurosurgery
Clinical auditing requires codified data for the aggregation and analysis of patterns. However, in the medical domain, obtaining structured data can be difficult, as the most natural, expressive and comprehensive way to record a clinical encounter is through natural language. The task of creating structured data from naturally expressed information is known as information extraction. Specialised areas of medicine use their own language and data structures; the translation process has unique challenges and often requires a fresh approach. This research is devoted to creating a novel semi-automated method for generating codified auditing data from clinical notes recorded in a neurosurgical department of an Australian teaching hospital. The method encapsulates specialist knowledge in rules that instantaneously make precise decisions for the majority of the matches, followed by dictionary-based matching of the remaining text.
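The two-stage matching described above (precise rules first, dictionary-based matching of the remaining text second) can be sketched as follows; the phrases and audit codes are invented for illustration and are not the department's actual coding scheme.

```python
# Hypothetical sketch of the two-stage method: precise specialist rules decide
# most matches instantly; a dictionary lookup handles the remaining text.
RULES = {
    "craniotomy for tumour": "CRANI-TUM",   # stage 1: precise phrase rules
    "burr hole drainage": "BURR-DRAIN",
}
DICTIONARY = {
    "craniotomy": "CRANI-GEN",              # stage 2: single-term fallback
    "laminectomy": "LAMI-GEN",
}

def audit_code(note):
    """Return an audit code for a clinical note, or None if nothing matches."""
    text = note.lower()
    for phrase, code in RULES.items():       # precise rules take priority
        if phrase in text:
            return code
    for term, code in DICTIONARY.items():    # dictionary-based matching
        if term in text:
            return code
    return None

print(audit_code("Emergency craniotomy for tumour resection"))  # CRANI-TUM
print(audit_code("Elective craniotomy, no complications"))      # CRANI-GEN
```

Ordering matters here: running the precise rules before the dictionary keeps the specific phrase "craniotomy for tumour" from being swallowed by the generic "craniotomy" entry.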
Addressing Semantic Interoperability and Text Annotation Concerns in Electronic Health Records using Word Embedding, Ontology and Analogy
Electronic Health Records (EHRs) create a huge number of databases that are updated dynamically. The major goal of interoperability in healthcare is to facilitate the seamless exchange of healthcare-related data in an environment that supports the secure transfer of data. Healthcare organisations face difficulties in exchanging patients' healthcare information, laboratory reports, and similar records due to a lack of semantic interoperability. Hence, semantic web technologies are needed to address healthcare interoperability problems by enabling the various standards used by different healthcare entities (doctors, clinics, hospitals, etc.) to exchange data together with its semantics, in a form that both machines and humans can understand. A framework with a similarity analyser that deals with semantic interoperability is therefore proposed in this thesis. A further consideration was the use of word embedding and ontology for knowledge discovery. In the medical domain, the main challenge for a medical information extraction system is to find the required information by considering explicit and implicit clinical context with a high degree of precision and accuracy. For semantic similarity of medical text at different levels (concept, sentence and document level), many methods and techniques have been presented; here, care was taken that the semantic content of a text includes the correct meaning of its words and sentences. A comparative analysis of approaches that apply ontology followed by word embedding, or vice versa, was carried out to determine which approach yields higher semantic similarity. Using the Kidney Cancer dataset as a use case, I concluded that each approach works better in different circumstances; however, the approach in which ontology is followed by word embedding, enriching the data first, showed better results. Apart from enriching the EHR, extracting relevant information is also challenging. To address this, the concept of analogy was applied to explain similarities between two different contents, as analogies play a significant role in understanding new concepts and help healthcare professionals communicate effectively with patients, helping them understand their disease and treatment. I therefore utilised analogies in this thesis to support the extraction of relevant information from medical text. Since accessing EHRs is challenging, tweet text was used as an alternative, as social media has emerged as a relevant data source in recent years. An algorithm is proposed to analyse medical tweets based on analogous words, and the results were used to validate the proposed methods. Two experts from the medical domain gave their views on the proposed methods in comparison with a similar method named SemDeep. The quantitative and qualitative results show that the proposed analogy-based method brings diversity and is helpful in analysing a specific disease and in text classification
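The analogy component can be illustrated with the classic vector-offset method over word embeddings; the words and 3-dimensional vectors below are toy values invented for this sketch, not the thesis's trained embeddings or its actual algorithm.

```python
from math import sqrt

# Hypothetical 3-d embeddings; a real system would use trained word vectors.
VECS = {
    "tumour": [1.0, 0.2, 0.0],
    "growth": [0.9, 0.3, 0.1],
    "kidney": [0.1, 1.0, 0.2],
    "filter": [0.2, 0.9, 0.3],
    "scan":   [0.5, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm

def analogy(a, b, c):
    """Solve a : b :: c : ? by vector offset, excluding the input words."""
    target = [vb - va + vc for va, vb, vc in zip(VECS[a], VECS[b], VECS[c])]
    candidates = {w: v for w, v in VECS.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))

print(analogy("tumour", "growth", "kidney"))  # "filter" with these toy vectors
```

With real embeddings, the same offset arithmetic surfaces analogous words that can anchor an unfamiliar medical concept to a familiar one.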
COHORT IDENTIFICATION FROM FREE-TEXT CLINICAL NOTES USING SNOMED CT’S SEMANTIC RELATIONS
In this paper, a new cohort identification framework that exploits the semantic hierarchy of SNOMED CT is proposed to overcome the limitations of supervised machine learning-based approaches. Eligibility criteria descriptions and free-text clinical notes from the 2018 National NLP Clinical Challenge (n2c2) were processed to map them to relevant SNOMED CT concepts and to measure semantic similarity between the eligibility criteria and patients. A patient was deemed eligible if their similarity score exceeded a threshold cut-off value, established at the point where the best F1 score could be achieved. The performance of the proposed system was evaluated for three eligibility criteria. The framework's macro-average F1 score across the three criteria was higher than the previously reported results of the 2018 n2c2 (0.933 vs. 0.889). This study demonstrated that SNOMED CT alone can be leveraged for cohort identification tasks without referring to external textual sources for training.
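The threshold selection step can be sketched as follows; the similarity scores and eligibility labels are hypothetical stand-ins for the criterion-to-patient similarity scores the framework computes from SNOMED CT.

```python
def f1(labels, preds):
    """F1 score for boolean labels and predictions."""
    tp = sum(1 for l, p in zip(labels, preds) if l and p)
    fp = sum(1 for l, p in zip(labels, preds) if not l and p)
    fn = sum(1 for l, p in zip(labels, preds) if l and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def best_threshold(scores, labels):
    """Pick the cut-off where F1 on the development data is highest."""
    return max(set(scores),
               key=lambda t: f1(labels, [s >= t for s in scores]))

# Hypothetical criterion-to-patient similarity scores with gold eligibility.
scores = [0.91, 0.85, 0.40, 0.78, 0.30]
labels = [True, True, False, True, False]
cutoff = best_threshold(scores, labels)
eligible = [s >= cutoff for s in scores]  # patients at or above the cut-off
```

Here the observed scores themselves serve as candidate thresholds, which is sufficient because F1 only changes when the threshold crosses a score.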
Doctor of Philosophy Dissertation
Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free-text query and text processing, and to demonstrate the advantages of using these methods for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research: problems may arise when too few patients are identified, or when the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. The inclusion of free-text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance; an ensemble method was not successful. The addition of free-text search compared to structured data search alone increased cohort size in all cases, with dramatic increases in some, and improved the representation of patients in subpopulations that may otherwise have been underrepresented. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free-text search. A novel information extraction algorithm, Regular Expression Discovery for Extraction (REDEx), is developed and evaluated for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free-text clinical narratives; temporal expressions as well as bodyweight-related measures are extracted. Additional patients and additional measurement occurrences are identified using these extracted values that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts.
We developed automated query expansion methods that greatly improve the performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrated that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions can be detected that are otherwise often missed. We found that a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
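The kind of pattern REDEx discovers can be illustrated with a hand-written regular expression for bodyweight measures; REDEx learns such expressions automatically from annotated snippets, so the pattern below is only a hypothetical example of the sort of rule it would produce.

```python
import re

# Hand-written sketch of a bodyweight-extraction pattern: matches phrasings
# like "Wt 82.5 kg" or "weight: 85 kg" and captures the value and unit.
WEIGHT_RE = re.compile(
    r"\b(?:weight|wt)\.?\s*(?:is|:|=)?\s*(\d{1,3}(?:\.\d+)?)\s*(kg|lbs?)\b",
    re.IGNORECASE,
)

def extract_weights(note):
    """Return (value, unit) pairs found in a free-text clinical note."""
    return [(float(value), unit.lower()) for value, unit in WEIGHT_RE.findall(note)]

note = "Pt seen today. Wt 82.5 kg, down from weight: 85 kg last visit."
print(extract_weights(note))  # [(82.5, 'kg'), (85.0, 'kg')]
```

Values extracted this way can then be joined back to the patient record, surfacing measurement occurrences that structured fields alone miss.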