Analysis of Temporal Expressions Annotated in Clinical Notes
Annotating the semantics of time in language is important. THYME is a recent temporal annotation standard for clinical texts. This paper examines temporal expressions in the first major corpus released under this standard. It investigates where the standard has proven difficult to apply, and gives a series of recommendations regarding temporal annotation in this important domain.
Building a semantically annotated corpus of clinical texts
In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, and its value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.
Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus
The free-text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult for existing natural language analysis tools to process because they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free-text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
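Shallow syntactic chunks like those annotated in the Harvey corpus are commonly converted to token-level BIO tags before a statistical chunker is trained on them. The sketch below illustrates that conversion and its inverse. It assumes a hypothetical representation of chunks as (start token, end token exclusive, label) tuples; the abstract does not describe the corpus file format, and the `NP` label and example tokens are illustrative only.

```python
from typing import List, Tuple

# Hypothetical chunk representation: (start token index, end token index exclusive, label).
Chunk = Tuple[int, int, str]


def chunks_to_bio(num_tokens: int, chunks: List[Chunk]) -> List[str]:
    """Convert non-overlapping chunk spans into one BIO tag per token."""
    tags = ["O"] * num_tokens
    for start, end, label in chunks:
        if not 0 <= start < end <= num_tokens:
            raise ValueError(f"invalid chunk span: {(start, end, label)}")
        if any(tag != "O" for tag in tags[start:end]):
            raise ValueError(f"overlapping chunk: {(start, end, label)}")
        tags[start] = f"B-{label}"          # first token of the chunk
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the chunk
    return tags


def bio_to_chunks(tags: List[str]) -> List[Chunk]:
    """Recover chunk spans from BIO tags (an I- tag with no open chunk starts one)."""
    chunks: List[Chunk] = []
    start, label = None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing chunk
        if tag.startswith("I-") and label == tag[2:]:
            continue                        # continuation of the open chunk
        if start is not None:
            chunks.append((start, i, label))
            start, label = None, None
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    return chunks
```

For example, `chunks_to_bio(6, [(2, 4, "NP"), (4, 6, "NP")])` yields `["O", "O", "B-NP", "I-NP", "B-NP", "I-NP"]`, and `bio_to_chunks` maps that list back to the same spans. Marking every chunk start with `B-` is what keeps two adjacent chunks of the same type separate.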
Temporal expression normalisation in natural language texts
Automatic annotation of temporal expressions is a research challenge of great interest in the field of information extraction. In this report, I describe a novel rule-based architecture, built on top of a pre-existing system, that normalises temporal expressions detected in English texts. Gold-standard temporally annotated resources are limited in size, which makes research difficult. The proposed system outperforms state-of-the-art systems on the TempEval-2 shared task (value attribute) and achieves substantially better results than the pre-existing system on which it is built. I also introduce a new free corpus of 2822 unique annotated temporal expressions. Both the corpus and the system are freely available online.
Comment: 7 pages, 1 figure, 5 tables
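A rule-based normaliser maps each detected expression to a TimeML TIMEX3 `value`, typically an ISO 8601 date such as `2010-03-09` or a duration such as `P3Y`, resolving relative expressions against an anchor such as the document creation time. The report does not list its rules, so the following is only a minimal sketch under that assumption: the function name `normalise` and its handful of rules are hypothetical and cover far less than a real system would.

```python
import re
from datetime import date, timedelta
from typing import Optional

# Hypothetical rule resources; a real normaliser would need far more coverage.
MONTHS = {name: i for i, name in enumerate(
    ["january", "february", "march", "april", "may", "june", "july",
     "august", "september", "october", "november", "december"], start=1)}
NUMBER_WORDS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
                "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10}
UNIT_CODES = {"day": "D", "week": "W", "month": "M", "year": "Y"}


def normalise(expression: str, anchor: date) -> Optional[str]:
    """Map a temporal expression to a TIMEX3-style value string.

    Relative expressions are resolved against `anchor` (for example the
    document creation time). Returns None when no rule applies.
    """
    text = expression.lower().strip()

    # Rule 1: deictic day references.
    offsets = {"today": 0, "yesterday": -1, "tomorrow": 1}
    if text in offsets:
        return (anchor + timedelta(days=offsets[text])).isoformat()

    # Rule 2: "N days/weeks ago", with digits or small number words.
    m = re.fullmatch(r"(\d+|[a-z]+) (day|week)s? ago", text)
    if m:
        count = m.group(1)
        n = int(count) if count.isdigit() else NUMBER_WORDS.get(count)
        if n is not None:
            days = n * (7 if m.group(2) == "week" else 1)
            return (anchor - timedelta(days=days)).isoformat()

    # Rule 3: explicit dates such as "March 5, 2010".
    m = re.fullmatch(r"([a-z]+) (\d{1,2}),? (\d{4})", text)
    if m and m.group(1) in MONTHS:
        try:
            return date(int(m.group(3)), MONTHS[m.group(1)], int(m.group(2))).isoformat()
        except ValueError:  # impossible dates such as "February 30, 2010"
            return None

    # Rule 4: bare durations such as "3 years" become ISO 8601 periods.
    m = re.fullmatch(r"(\d+) (day|week|month|year)s?", text)
    if m:
        return f"P{m.group(1)}{UNIT_CODES[m.group(2)]}"

    return None
```

With an anchor of 10 March 2010, `normalise("2 weeks ago", anchor)` returns `"2010-02-24"` and `normalise("3 years", anchor)` returns `"P3Y"`; expressions that no rule covers return `None` rather than a guess.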
A lightweight, pattern-based approach to identification and formalisation of TimeML expressions in clinical narratives
General Architecture for Text Engineering (GATE) components for identifying clinical events and temporal expressions are developed and evaluated against a corpus of 120 discharge summaries.
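The pattern-based identification step can be illustrated with ordinary regular expressions. This is a swapped-in technique for illustration only: the paper's components are built in GATE, whereas the sketch below uses Python's `re` module, and its patterns and the `find_timexes` helper are hypothetical. Each match is labelled with one of the TIMEX3 types defined by TimeML (DATE, TIME, DURATION, SET), and patterns listed earlier take precedence when matches overlap.

```python
import re
from typing import List, NamedTuple


class Timex(NamedTuple):
    start: int        # character offset (inclusive)
    end: int          # character offset (exclusive)
    text: str
    timex_type: str   # TIMEX3 type: DATE, TIME, DURATION or SET


# Hypothetical patterns; earlier entries win when two matches overlap.
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b")),
    ("TIME", re.compile(r"\b\d{1,2}:\d{2}\b")),
    ("DATE", re.compile(r"\b\d+ (?:day|week|month|year)s? ago\b", re.IGNORECASE)),
    ("DURATION", re.compile(r"\b\d+ (?:day|week|month|year)s?\b", re.IGNORECASE)),
    ("SET", re.compile(r"\b(?:q\d+h\b|daily\b|b\.i\.d\.|t\.i\.d\.)", re.IGNORECASE)),
    ("DATE", re.compile(r"\b(?:today|yesterday|tomorrow)\b", re.IGNORECASE)),
]


def find_timexes(text: str) -> List[Timex]:
    """Return non-overlapping temporal expression spans, sorted by offset."""
    found: List[Timex] = []
    for timex_type, pattern in PATTERNS:
        for m in pattern.finditer(text):
            # Skip a match that overlaps a span claimed by an earlier pattern.
            if any(m.start() < t.end and t.start < m.end() for t in found):
                continue
            found.append(Timex(m.start(), m.end(), m.group(), timex_type))
    return sorted(found, key=lambda t: t.start)
```

On the note `Seen 03/05/2010 at 14:30. Cough for 3 days, started 2 days ago. Amoxicillin 500 mg t.i.d. for 7 days.` the sketch returns `03/05/2010` (DATE), `14:30` (TIME), `3 days` (DURATION), `2 days ago` (DATE), `t.i.d.` (SET) and `7 days` (DURATION). Because the "ago" pattern precedes the bare-duration pattern, `2 days` is not also reported as a DURATION.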
RSDD-Time: Temporal Annotation of Self-Reported Mental Health Diagnoses
Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media. However, existing research has largely ignored the temporality of mental health diagnoses. In this work, we introduce RSDD-Time: a new dataset of 598 manually annotated self-reported depression diagnosis posts from Reddit that include temporal information about the diagnosis. Annotations include whether a mental health condition is present and how recently the diagnosis happened. Furthermore, we include exact temporal spans that relate to the date of diagnosis. This information is valuable for various computational methods that examine mental health through social media, because one's mental health state is not static. We also test several baseline classification and extraction approaches, which suggest that extracting temporal information from self-reported diagnosis statements is challenging.
Comment: 6 pages, accepted for publication at the CLPsych workshop at NAACL-HLT 201
Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective
This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized, and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It uses UMLS to perform domain adaptation when integrating generic-domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval-tree-based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations, enabling the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP-facilitated tasks, including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP.
Comment: 6 pages, accepted by IEEE BIBM 2018 as regular paper
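Stand-off annotations of this kind can be sketched as character-offset spans into the original text. The Python sketch below is not LAPNLP's Lisp implementation; it assumes hypothetical non-nested inline tags such as `<PROBLEM>chest pain</PROBLEM>`, converts them to stand-off spans, and indexes the spans in a centered interval tree (one common interval-tree variant; the abstract does not say which one LAPNLP uses) to answer "which annotations overlap this character range" queries. For brevity, each node scans its own span list linearly instead of keeping it sorted.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Annotation:
    start: int  # character offset into the plain text (inclusive)
    end: int    # character offset (exclusive)
    label: str

    def __post_init__(self) -> None:
        if not 0 <= self.start < self.end:
            raise ValueError(f"empty or invalid span: {self}")


def inline_to_standoff(marked: str) -> Tuple[str, List[Annotation]]:
    """Strip non-nested inline tags like <X>...</X>, returning plain text plus
    stand-off spans that point into it. Empty tagged spans raise ValueError."""
    parts: List[str] = []
    annotations: List[Annotation] = []
    pos = 0   # length of plain text emitted so far
    last = 0  # index in `marked` just after the previous tag pair
    for m in re.finditer(r"<(\w+)>(.*?)</\1>", marked, flags=re.DOTALL):
        before, inner = marked[last:m.start()], m.group(2)
        pos += len(before)
        annotations.append(Annotation(pos, pos + len(inner), m.group(1)))
        parts += [before, inner]
        pos += len(inner)
        last = m.end()
    parts.append(marked[last:])
    return "".join(parts), annotations


@dataclass
class _Node:
    center: int
    here: List[Annotation]  # spans containing `center`
    left: Optional["_Node"]   # spans ending at or before `center`
    right: Optional["_Node"]  # spans starting after `center`


def build_tree(anns: List[Annotation]) -> Optional[_Node]:
    """Build a centered interval tree over half-open [start, end) spans."""
    if not anns:
        return None
    starts = sorted(a.start for a in anns)
    center = starts[len(starts) // 2]  # some span starts here, so `here` is non-empty
    here = [a for a in anns if a.start <= center < a.end]
    left = [a for a in anns if a.end <= center]
    right = [a for a in anns if a.start > center]
    return _Node(center, here, build_tree(left), build_tree(right))


def query(node: Optional[_Node], qs: int, qe: int) -> List[Annotation]:
    """Return spans overlapping the half-open query range [qs, qe)."""
    if node is None or qs >= qe:
        return []
    hits = [a for a in node.here if a.start < qe and qs < a.end]
    if qs < node.center:   # left spans end at or before center
        hits += query(node.left, qs, qe)
    if qe > node.center:   # right spans start after center
        hits += query(node.right, qs, qe)
    return hits
```

For `Pt with <PROBLEM>chest pain</PROBLEM>, given <DRUG>aspirin</DRUG>.`, `inline_to_standoff` returns the plain text `Pt with chest pain, given aspirin.` with spans (8, 18, PROBLEM) and (26, 33, DRUG); querying the tree for [0, 20) returns only the PROBLEM span, while [17, 30) returns both. Choosing each node's center from the span start points guarantees that every recursive call handles strictly fewer spans.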
- …