Coreference resolution on entities and events for hospital discharge summaries
Includes bibliographical references (p. 76-80). Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2007. The wealth of medical information contained in electronic medical records (EMRs) and Natural Language Processing (NLP) technologies that can automatically extract information from them have opened the doors to automatic patient-care quality monitoring and medical-assist question answering systems. This thesis studies coreference resolution, an information extraction (IE) subtask that links together the mentions that refer to the same entity. Coreference resolution enables us to find changes in the state of entities and makes it possible to answer questions regarding the information thus obtained. We perform coreference resolution on a specific type of EMR, the hospital discharge summary. We treat coreference resolution as a binary classification problem. Our approach yields insights into the critical features for coreference resolution for entities that fall into five medical semantic categories that commonly appear in discharge summaries. by Tian Ye He. M.Eng
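The binary-classification framing mentioned in the abstract above can be illustrated with a minimal mention-pair sketch. This is not the thesis's implementation: the features, the hand-set weights (standing in for a trained classifier), the category labels, and all names such as `Mention`, `pair_features`, and `cluster` are assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mention:
    text: str      # surface string of the mention (assumed non-empty)
    category: str  # hypothetical semantic category label


def pair_features(a: Mention, b: Mention) -> dict:
    """Compute illustrative features for one pair of mentions."""
    ta, tb = a.text.lower().split(), b.text.lower().split()
    return {
        "exact_match": float(ta == tb),
        "head_match": float(ta[-1] == tb[-1]),  # last token as a crude head
        "same_category": float(a.category == b.category),
        "token_overlap": len(set(ta) & set(tb)) / len(set(ta) | set(tb)),
    }


# Hand-set weights stand in for a trained binary classifier (assumption).
WEIGHTS = {"exact_match": 2.0, "head_match": 1.0,
           "same_category": 0.5, "token_overlap": 1.0}
BIAS = -2.0


def is_coreferent(a: Mention, b: Mention) -> bool:
    """Binary decision: do the two mentions refer to the same entity?"""
    feats = pair_features(a, b)
    score = BIAS + sum(WEIGHTS[name] * value for name, value in feats.items())
    return score > 0


def cluster(mentions: list) -> list:
    """Merge positively classified pairs into chains with union-find."""
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if is_coreferent(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    groups = {}
    for i, m in enumerate(mentions):
        groups.setdefault(find(i), []).append(m.text)
    return list(groups.values())


mentions = [
    Mention("chest pain", "problem"),
    Mention("the chest pain", "problem"),
    Mention("aspirin", "treatment"),
]
chains = cluster(mentions)
```

In this toy input the two "chest pain" mentions share a head token and a category, so the pair scores above zero and they are merged into one chain, while "aspirin" stays in its own chain. A real system would learn the weights from annotated pairs.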
A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives
In this thesis, a synthesis is presented of the knowledge models required by clinical information systems that provide decision support for longitudinal processes of care. Qualitative research techniques and thematic analysis are novelly applied to a systematic review of the literature on the challenges in implementing such systems, leading to the development of an original conceptual framework. The thesis demonstrates how these process-oriented systems make use of a knowledge base derived from workflow models and clinical guidelines, and argues that one of the major barriers to implementation is the need to extract explicit and implicit information from diverse resources in order to construct the knowledge base. Moreover, concepts in both the knowledge base and in the electronic health record (EHR) must be mapped to a common ontological model. However, the majority of clinical guideline information remains in text form, and much of the useful clinical information residing in the EHR resides in the free text fields of progress notes and laboratory reports. In this thesis, it is shown how natural language processing and information extraction techniques provide a means to identify and formalise the knowledge components required by the knowledge base. Original contributions are made in the development of lexico-syntactic patterns and the use of external domain knowledge resources to tackle a variety of information extraction tasks in the clinical domain, such as recognition of clinical concepts, events, temporal relations, term disambiguation and abbreviation expansion. Methods are developed for adapting existing tools and resources in the biomedical domain to the processing of clinical texts, and approaches to improving the scalability of these tools are proposed and evaluated. These tools and techniques are then combined in the creation of a novel approach to identifying processes of care in the clinical narrative.
It is demonstrated that resolution of coreferential and anaphoric relations as narratively and temporally ordered chains provides a means to extract linked narrative events and processes of care from clinical notes. Coreference performance in discharge summaries and progress notes is largely dependent on correct identification of protagonist chains (patient, clinician, family relation), pronominal resolution, and string matching that takes account of experiencer, temporal, spatial, and anatomical context; whereas for laboratory reports additional, external domain knowledge is required. The types of external knowledge and their effects on system performance are identified and evaluated. Results are compared against existing systems for solving these tasks and are found to improve on them, or to approach the performance of recently reported, state-of-the-art systems. Software artefacts developed in this research have been made available as open-source components within the General Architecture for Text Engineering framework.
How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?
Electronic health records capture patient information using structured
controlled vocabularies and unstructured narrative text. While structured data
typically encodes lab values, encounters and medication lists, unstructured
data captures the physician's interpretation of the patient's condition,
prognosis, and response to therapeutic intervention. In this paper, we
demonstrate that information extraction from unstructured clinical narratives
is essential to most clinical applications. We perform an empirical study to
validate the argument and show that structured data alone is insufficient in
resolving eligibility criteria for recruiting patients onto clinical trials for
chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is
essential to solving 59% of the CLL trial criteria and 77% of the prostate
cancer trial criteria. More specifically, for resolving eligibility criteria
with temporal constraints, we show the need for temporal reasoning and
information integration with medical events within and across unstructured
clinical narratives and structured data. Comment: AMIA TBI 2014, 6 pages
Knowledge-based Biomedical Data Science 2019
Knowledge-based biomedical data science (KBDS) involves the design and
implementation of computer systems that act as if they knew about biomedicine.
Such systems depend on formally represented knowledge in computer systems,
often in the form of knowledge graphs. Here we survey the progress in the last
year in systems that use formally represented knowledge to address data science
problems in both clinical and biological domains, as well as on approaches for
creating knowledge graphs. Major themes include the relationships between
knowledge graphs and machine learning, the use of natural language processing,
and the expansion of knowledge-based approaches to novel domains, such as
Chinese Traditional Medicine and biodiversity. Comment: Manuscript 43 pages with 3 tables; Supplemental material 43 pages
with 3 tables
Information extraction from medication leaflets
Integrated master's thesis. Informatics and Computing Engineering. Faculdade de Engenharia. Universidade do Porto. 201
DEVELOPING A CLINICAL LINGUISTIC FRAMEWORK FOR PROBLEM LIST GENERATION FROM CLINICAL TEXT
Regulatory institutions such as the Institute of Medicine and Joint Commission endorse problem lists as an effective method to facilitate transitions of care for patients. In practice, the problem list is a common model for documenting a care provider's medical reasoning with respect to a problem and its status during patient care. Although natural language processing (NLP) systems have been developed to support problem list generation, encoding many information layers - morphological, syntactic, semantic, discourse, and pragmatic - can prove computationally expensive. The contribution of each information layer for accurate problem list generation has not been formally assessed. We would expect that a problem list generator that relies on natural language processing would improve its performance with the addition of rich semantic features.
We hypothesize that problem list generation can be approached as a two-step classification problem - problem mention status (Aim One) and patient problem status (Aim Two) classification. In Aim One, we will automatically classify the status of each problem mention using semantic features about problems described in the clinical narrative. In Aim Two, we will classify active patient problems from individual problem mentions and their statuses.
We believe our proposal is significant in two ways. First, our experiments will develop and evaluate semantic features, some commonly modeled and others not in the clinical text. The annotations we use will be made openly available to other NLP researchers to encourage future research on this task and other related problems including foundational NLP algorithms (assertion classification and coreference resolution) and applied clinical applications (patient timeline and record visualization). Second, by generating and evaluating existing NLP systems, we are building an open-source problem list generator and demonstrating the performance for problem list generation using these features.
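The two-step framing in this abstract (mention status first, then patient problem status) can be sketched as follows. This is a toy illustration under stated assumptions, not the dissertation's method: the cue lists, the naive substring matching, the "latest mention wins" aggregation rule, and the names `mention_status` and `active_problems` are all hypothetical stand-ins for classifiers trained on semantic features.

```python
# Hypothetical cue phrases; a real system would use learned semantic features.
NEGATION_CUES = ("no ", "denies ", "without ")
HISTORY_CUES = ("history of ", "resolved ")


def mention_status(context: str) -> str:
    """Step one: classify the status of a single problem mention."""
    text = context.lower()
    # Naive substring matching; it can misfire on words that end in a cue.
    if any(cue in text for cue in NEGATION_CUES):
        return "negated"
    if any(cue in text for cue in HISTORY_CUES):
        return "historical"
    return "active"


def active_problems(mentions: list) -> list:
    """Step two: decide patient-level status from mention-level statuses.

    `mentions` holds (problem, context) pairs in document order.
    Assumption: the most recent mention of a problem decides its status.
    """
    latest = {}
    for problem, context in mentions:
        latest[problem] = mention_status(context)
    return sorted(p for p, status in latest.items() if status == "active")


notes = [
    ("pneumonia", "Admitted with pneumonia."),
    ("pneumonia", "Pneumonia resolved after antibiotics."),
    ("diabetes", "Patient has diabetes."),
    ("fever", "Denies fever."),
]
problem_list = active_problems(notes)
```

Separating the two steps keeps the mention-level decision independent of the aggregation policy, so either part can be swapped for a trained model without touching the other.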
Doctor of Philosophy
Dissertation. The primary objective of cancer registries is to capture clinical care data of cancer populations and aid in prevention, allow early detection, determine prognosis, and assess quality of various treatments and interventions. Furthermore, the role of cancer registries is paramount in supporting cancer epidemiological studies and medical research. Existing cancer registries depend mostly on humans, known as Cancer Tumor Registrars (CTRs), to conduct manual abstraction of the electronic health records to find reportable cancer cases and extract other data elements required for regulatory reporting. This is often a time-consuming and laborious task prone to human error affecting quality, completeness and timeliness of cancer registries. Central state cancer registries take responsibility for consolidating data received from multiple sources for each cancer case and assigning the most accurate information. The Utah Cancer Registry (UCR) at the University of Utah, for instance, leads and oversees more than 70 cancer treatment facilities in the state of Utah to collect data for each diagnosed cancer case and consolidate multiple sources of information. Although software tools helping with the manual abstraction process exist, they mainly focus on cancer case findings based on pathology reports and do not support automatic extraction of other data elements such as TNM cancer stage information, an important prognostic factor required before initiating clinical treatment. In this study, I present novel applications of natural language processing (NLP) and machine learning (ML) to automatically extract clinical and pathological TNM stage information from unconsolidated clinical records of cancer patients available at the central Utah Cancer Registry. To further support CTRs in their manual efforts, I demonstrate a new approach based on machine learning to consolidate TNM stages from multiple records at the patient level.
Ontologies and Information Extraction
This report argues that, even in the simplest cases, IE is an ontology-driven
process. It is not a mere text filtering method based on simple pattern
matching and keywords, because the extracted pieces of texts are interpreted
with respect to a predefined partial domain model. This report shows that
depending on the nature and the depth of the interpretation to be done for
extracting the information, more or less knowledge must be involved. This
report is mainly illustrated in biology, a domain in which there are critical
needs for content-based exploration of the scientific literature and which
becomes a major application domain for IE.