EliXR-TIME: A Temporal Knowledge Representation for Clinical Research Eligibility Criteria.
Effective clinical text processing requires accurate extraction and representation of temporal expressions. Multiple temporal information extraction models have been developed, but the need to extract temporal expressions from eligibility criteria (e.g., for eligibility determination) remains unmet. We identified the temporal knowledge representation requirements of eligibility criteria by reviewing 100 temporal criteria. We developed EliXR-TIME, a frame-based representation designed to support semantic annotation of temporal expressions in eligibility criteria by reusing applicable classes from well-known clinical temporal knowledge representations. We used EliXR-TIME to analyze a training set of 50 new temporal eligibility criteria. We evaluated EliXR-TIME using an additional random sample of 20 eligibility criteria with temporal expressions that have no overlap with the training data, yielding 92.7% (76 / 82) inter-coder agreement on sentence chunking and 72% (72 / 100) agreement on semantic annotation. We conclude that this knowledge representation can facilitate semantic annotation of the temporal expressions in eligibility criteria.
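The two agreement figures above can be checked with a few lines of arithmetic. The sketch below assumes simple percent agreement (agreed items divided by total items); the abstract does not say whether a chance-corrected statistic was also computed, and the function name is only illustrative.

```python
def percent_agreement(agreed: int, total: int) -> float:
    """Simple percent agreement between two coders, rounded to one decimal."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100.0 * agreed / total, 1)


# Counts taken from the abstract: 76 of 82 chunks, 72 of 100 annotations.
chunking = percent_agreement(76, 82)
annotation = percent_agreement(72, 100)
print(chunking, annotation)
```

Percent agreement does not correct for agreement expected by chance, so it tends to read higher than a statistic such as Cohen's kappa on the same data.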
Doctor of Philosophy dissertation
Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve the usefulness of free-text querying and text processing, and the advantages of these methods are demonstrated for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified, or when the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. The addition of free text search compared to structured data search alone demonstrated increased cohort size in all cases, with dramatic increases in some. Representation of patients in subpopulations that may have been underrepresented otherwise is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free text search. A novel information extraction algorithm, Regular Expression Discovery for Extraction (REDEx), is developed and evaluated for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted. Additional patients and additional measurement occurrences are identified using these extracted values that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts.
We developed automated query expansion methods that greatly improve the performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions can be detected that are often missed otherwise. We found that a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding information and observations beyond what is available in structured data alone.
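To make the idea of pulling bodyweight values out of free text concrete, here is a minimal sketch using a single hand-written regular expression. It is not the REDEx algorithm, which per the abstract discovers its expressions rather than relying on hand-written ones; the pattern, the function name, and the sample note are all assumptions made for illustration.

```python
import re

# Hypothetical hand-written pattern: a weight keyword, a short gap, a number, a unit.
WEIGHT_RE = re.compile(
    r"\b(?:weight|wt)\b[^\d\n]{0,15}?(\d{2,3}(?:\.\d+)?)\s*(kg|lbs?|pounds?)\b",
    re.IGNORECASE,
)

LB_TO_KG = 0.45359237  # exact definition of the international pound


def extract_weights_kg(note: str) -> list[float]:
    """Return every bodyweight mention in the note, converted to kilograms."""
    results = []
    for value, unit in WEIGHT_RE.findall(note):
        number = float(value)
        if unit.lower() != "kg":
            number *= LB_TO_KG  # anything that is not kg is treated as pounds
        results.append(round(number, 1))
    return results


# Invented example note, not taken from any real record.
note = "Pt seen today. Weight: 82.5 kg. Prior wt was 190 lbs in March."
print(extract_weights_kg(note))
```

A single fixed pattern like this misses many phrasings found in real notes, which is the gap that learning expressions from examples is meant to close.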
How essential are unstructured clinical narratives and information fusion to clinical trial recruitment?
Electronic health records capture patient information using structured
controlled vocabularies and unstructured narrative text. While structured data
typically encodes lab values, encounters and medication lists, unstructured
data captures the physician's interpretation of the patient's condition,
prognosis, and response to therapeutic intervention. In this paper, we
demonstrate that information extraction from unstructured clinical narratives
is essential to most clinical applications. We perform an empirical study to
validate the argument and show that structured data alone is insufficient in
resolving eligibility criteria for recruiting patients onto clinical trials for
chronic lymphocytic leukemia (CLL) and prostate cancer. Unstructured data is
essential to solving 59% of the CLL trial criteria and 77% of the prostate
cancer trial criteria. More specifically, for resolving eligibility criteria
with temporal constraints, we show the need for temporal reasoning and
information integration with medical events within and across unstructured
clinical narratives and structured data.
Comment: AMIA TBI 2014, 6 pages
J Biomed Inform
We followed a systematic approach based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses to identify existing clinical natural language processing (NLP) systems that generate structured information from unstructured free text. Seven literature databases were searched with a query combining the concepts of natural language processing and structured data capture. Two reviewers screened all records for relevance during two screening phases, and information about clinical NLP systems was collected from the final set of papers. A total of 7149 records (after removing duplicates) were retrieved and screened, and 86 were determined to fit the review criteria. These papers contained information about 71 different clinical NLP systems, which were then analyzed. The NLP systems address a wide variety of important clinical and research tasks. Certain tasks are well addressed by the existing systems, while others remain as open challenges that only a small number of systems attempt, such as extraction of temporal information or normalization of concepts to standard terminologies. This review has identified many NLP systems capable of processing clinical free text and generating structured output, and the information collected and evaluated here will be important for prioritizing development of new approaches for clinical NLP.
CC999999/ImCDC/Intramural CDC HHS/United States; 2019-11-20; PMID 28729030; PMC6864736
Temporal disambiguation of relative temporal expressions in clinical texts using temporally fine-tuned contextual word embeddings.
Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references as compared to general domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state of the art still has some way to go before practical application in the clinical setting will be possible. Much of the current work in this field is focused on direct and explicit temporal expressions and on identifying temporal relations. However, there is little work focused on relative temporal expressions, which can be difficult to normalize but are vital to ordering events on a timeline. This work introduces a new temporal expression recognition and normalization tool, Chrono, that normalizes temporal expressions into both the SCATE and TimeML schemes. Chrono advances clinical timeline extraction because it can identify more vague and relative temporal expressions than the current state of the art, and because it uses contextualized word embeddings from fine-tuned BERT models to disambiguate temporal types, achieving state-of-the-art performance on relative temporal expressions. In addition, this work shows that fine-tuning BERT models on temporal tasks modifies the contextualized embeddings so that they achieve improved performance in classical SVM and CNN classifiers. Finally, this work provides a new tool for linking temporal expressions to events or other entities by introducing a novel method that summarizes the attention weight matrices output by BERT models to identify which tokens an entire temporal expression attends to most.
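As an illustration of why relative temporal expressions need an anchor before they can be placed on a timeline, the sketch below resolves a few simple expressions against a reference date. It is a hand-written rule set, not Chrono's method, and it assumes the anchor is the note's creation date; the function name and the tiny vocabulary are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Optional

UNIT_DAYS = {"day": 1, "week": 7}
REL_RE = re.compile(r"(\d+)\s+(day|week)s?\s+(ago|later)")


def normalize_relative(expr: str, anchor: date) -> Optional[str]:
    """Resolve a relative expression to an ISO 8601 date, or None if unsupported."""
    text = expr.strip().lower()
    if text == "today":
        return anchor.isoformat()
    if text == "yesterday":
        return (anchor - timedelta(days=1)).isoformat()
    match = REL_RE.fullmatch(text)
    if match is None:
        return None  # vague expressions such as "last spring" need richer handling
    count, unit, direction = int(match.group(1)), match.group(2), match.group(3)
    delta = timedelta(days=count * UNIT_DAYS[unit])
    resolved = anchor - delta if direction == "ago" else anchor + delta
    return resolved.isoformat()


anchor = date(2012, 3, 15)  # assumed document creation date
print(normalize_relative("3 days ago", anchor))
print(normalize_relative("2 weeks later", anchor))
print(normalize_relative("last spring", anchor))
```

The unsupported case at the end is the interesting one: rules of this kind break down on vague expressions, which is where a learned disambiguation step becomes useful.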
Improving Syntactic Parsing of Clinical Text Using Domain Knowledge
Syntactic parsing is one of the fundamental tasks of Natural Language Processing (NLP). However, few studies have explored syntactic parsing in the medical domain. This dissertation systematically investigated different methods to improve the performance of syntactic parsing of clinical text, including (1) Constructing two clinical treebanks of discharge summaries and progress notes by developing annotation guidelines that handle missing elements in clinical sentences; (2) Retraining four state-of-the-art parsers, including the Stanford parser, Berkeley parser, Charniak parser, and Bikel parser, using clinical treebanks, and comparing their performance to identify better parsing approaches; and (3) Developing new methods to reduce syntactic ambiguity caused by Prepositional Phrase (PP) attachment and coordination using semantic information.
Our evaluation showed that clinical treebanks greatly improved the performance of existing parsers. The Berkeley parser achieved the best F-1 score of 86.39% on the MiPACQ treebank. For PP attachment, our proposed methods improved accuracy by 2.35% on the MiPACQ corpus and 1.77% on the i2b2 corpus. For coordination, our method achieved precisions of 94.9% and 90.3% on the MiPACQ and i2b2 corpora, respectively. To further demonstrate the effectiveness of the improved parsing approaches, we applied the outputs of our parsers to two external NLP tasks: semantic role labeling and temporal relation extraction. The experimental results showed that the performance of both tasks was improved by using the parse tree information from our optimized parsers, with an improvement of 3.26% in F-measure for semantic role labeling and 1.5% in F-measure for temporal relation extraction.
A modular, open-source information extraction framework for identifying clinical concepts and processes of care in clinical narratives
In this thesis, a synthesis is presented of the knowledge models required by clinical information systems that provide decision support for longitudinal processes of care. Qualitative research techniques and thematic analysis are novelly applied to a systematic review of the literature on the challenges in implementing such systems, leading to the development of an original conceptual framework. The thesis demonstrates how these process-oriented systems make use of a knowledge base derived from workflow models and clinical guidelines, and argues that one of the major barriers to implementation is the need to extract explicit and implicit information from diverse resources in order to construct the knowledge base. Moreover, concepts in both the knowledge base and in the electronic health record (EHR) must be mapped to a common ontological model. However, the majority of clinical guideline information remains in text form, and much of the useful clinical information residing in the EHR resides in the free text fields of progress notes and laboratory reports. In this thesis, it is shown how natural language processing and information extraction techniques provide a means to identify and formalise the knowledge components required by the knowledge base. Original contributions are made in the development of lexico-syntactic patterns and the use of external domain knowledge resources to tackle a variety of information extraction tasks in the clinical domain, such as recognition of clinical concepts, events, temporal relations, term disambiguation and abbreviation expansion. Methods are developed for adapting existing tools and resources in the biomedical domain to the processing of clinical texts, and approaches to improving the scalability of these tools are proposed and evaluated. These tools and techniques are then combined in the creation of a novel approach to identifying processes of care in the clinical narrative.
It is demonstrated that resolution of coreferential and anaphoric relations as narratively and temporally ordered chains provides a means to extract linked narrative events and processes of care from clinical notes. Coreference performance in discharge summaries and progress notes is largely dependent on correct identification of protagonist chains (patient, clinician, family relation), pronominal resolution, and string matching that takes account of experiencer, temporal, spatial, and anatomical context; whereas for laboratory reports additional, external domain knowledge is required. The types of external knowledge and their effects on system performance are identified and evaluated. Results are compared against existing systems for solving these tasks and are found to improve on them, or to approach the performance of recently reported, state-of-the-art systems. Software artefacts developed in this research have been made available as open-source components within the General Architecture for Text Engineering framework.
End to end approach for i2b2 2012 challenge based on Cross-lingual models
BACKGROUND - We propose a cross-lingual approach to the i2b2 2012 challenge on clinical
records, which focuses on temporal relations in clinical narratives. A corpus of discharge
summaries annotated with temporal information was provided for automatically
extracting: (1) clinically significant events, including both clinical concepts such as
problems, tests, treatments, and clinical departments, and events relevant to the patient’s
clinical timeline, such as admissions and transfers between departments; (2) temporal
expressions, referring to the dates, times, durations, or frequencies in the clinical text,
whose values had to be normalized to an ISO specification
standard; and (3) temporal relations among the clinical events and temporal expressions.
GOALS - The objectives of this work are to outperform the previous
state of the art on the i2b2 2012 challenge and to adapt a cross-lingual model to the
clinical domain, where few data resources are available.
METHODS - The task is conceived as a pipeline of independently developed modules: a
token classifier for events and temporal expressions, and a text classifier for relation
extraction. We used the XLM-RoBERTa cross-lingual
model.
RESULTS - For event detection, the proposed token classifier obtains a Span F1 of 0.91. For
temporal expressions, our token classifier achieves a Span F1 of 0.91. For temporal
relations, we propose a sentence classifier based on sequential taggers that performs at 0.29
F1 measure.
DESCRIPTION - We propose a solution that takes a cross-lingual approach to the i2b2 2012
challenge in the domain of clinical narratives. The challenge aims to predict the temporal
relations between the events reflected in medical reports. To that end, this work starts from
a corpus annotated with (1) clinically significant events, such as clinical
concepts, tests, treatments, and clinical departments; (2) temporal expressions,
such as expressions indicating the date, time, duration, or frequency assigned in the
report; and (3) the relations between clinical events and
temporal expressions.
GOALS - The goals of this work are to improve on the state of the art for i2b2 2012 and to
adapt the cross-lingual model to a specific clinical domain with few data resources.
METHODS - The work is conceived as a pipeline of different modules: sequence taggers for
events and temporal expressions, and a sentence classifier for temporal
relations, each developed independently. The XLM-RoBERTa cross-lingual model
was used in this work.
RESULTS - For event detection, we propose a sequence tagger that achieves
0.91 Span F1. For temporal expressions, we propose a sequence tagger that achieves
0.91 Span F1. For temporal relations, we propose a sentence classifier
based on sequence taggers that achieves an F1 measure
of 0.29.
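The Span F1 figures reported above can be read as follows. This is a minimal sketch that assumes exact-match scoring, where a predicted span counts only if its start, end, and label all match a gold span; the challenge's official scorer may use different matching rules, and the spans below are invented for illustration.

```python
from typing import Set, Tuple

Span = Tuple[int, int, str]  # (start offset, end offset, label)


def span_f1(gold: Set[Span], predicted: Set[Span]) -> float:
    """Exact-match span F1: harmonic mean of span precision and recall."""
    if not gold and not predicted:
        return 1.0  # nothing to find and nothing predicted
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations for one note.
gold = {(0, 9, "EVENT"), (14, 24, "TIMEX3"), (30, 38, "EVENT")}
predicted = {(0, 9, "EVENT"), (14, 24, "TIMEX3"), (40, 45, "EVENT")}
print(span_f1(gold, predicted))
```

Under exact matching, a span that is off by one character counts as both a false positive and a false negative, so the score is stricter than overlap-based variants.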