    Building a semantically annotated corpus of clinical texts

    In this paper, we describe the construction of a semantically annotated corpus of clinical texts for use in the development and evaluation of systems for automatically extracting clinically significant information from the textual component of patient records. The paper details the sampling of textual material from a collection of 20,000 cancer patient records, the development of a semantic annotation scheme, the annotation methodology, the distribution of annotations in the final corpus, and the use of the corpus for development of an adaptive information extraction system. The resulting corpus is the most richly semantically annotated resource for clinical text processing built to date, whose value has been demonstrated through its use in developing an effective information extraction system. The detailed presentation of our corpus construction and annotation methodology will be of value to others seeking to build high-quality semantically annotated corpora in biomedical domains.

    Implementing a Portable Clinical NLP System with a Common Data Model - a Lisp Perspective

    This paper presents a Lisp architecture for a portable NLP system, termed LAPNLP, for processing clinical notes. LAPNLP integrates multiple standard, customized and in-house developed NLP tools. Our system facilitates portability across different institutions and data systems by incorporating an enriched Common Data Model (CDM) to standardize necessary data elements. It utilizes UMLS to perform domain adaptation when integrating generic domain NLP tools. It also features stand-off annotations that are specified by positional reference to the original document. We built an interval-tree-based search engine to efficiently query and retrieve the stand-off annotations by specifying positional requirements. We also developed a utility to convert an inline annotation format to stand-off annotations to enable the reuse of clinical text datasets with inline annotations. We experimented with our system on several NLP-facilitated tasks including computational phenotyping for lymphoma patients and semantic relation extraction for clinical notes. These experiments showcased the broader applicability and utility of LAPNLP. Comment: 6 pages, accepted by IEEE BIBM 2018 as regular paper.
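The idea of querying stand-off annotations by position can be illustrated with a small sketch. This is not the LAPNLP implementation (which is written in Lisp); it is a minimal Python illustration that assumes half-open character offsets and uses a balanced search tree augmented with the maximum end offset of each subtree. The names `Annotation`, `build_tree`, and `query` are hypothetical, as are the example labels.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Annotation:
    start: int   # inclusive character offset into the original document
    end: int     # exclusive character offset
    label: str   # hypothetical annotation type


class Node:
    def __init__(self, ann: Annotation, left: Optional["Node"], right: Optional["Node"]):
        self.ann = ann
        self.left = left
        self.right = right
        # Largest end offset anywhere in this subtree, used to prune searches.
        self.max_end = max(
            [ann.end] + [child.max_end for child in (left, right) if child is not None]
        )


def _build(sorted_anns: List[Annotation]) -> Optional[Node]:
    if not sorted_anns:
        return None
    mid = len(sorted_anns) // 2
    return Node(sorted_anns[mid], _build(sorted_anns[:mid]), _build(sorted_anns[mid + 1:]))


def build_tree(anns: List[Annotation]) -> Optional[Node]:
    """Build a balanced tree keyed on start offset."""
    return _build(sorted(anns, key=lambda a: (a.start, a.end)))


def query(node: Optional[Node], start: int, end: int) -> List[Annotation]:
    """Return all annotations overlapping the half-open span [start, end)."""
    if node is None or node.max_end <= start:
        return []  # every interval in this subtree ends before the query begins
    found = query(node.left, start, end)
    if node.ann.start < end and node.ann.end > start:
        found.append(node.ann)
    if node.ann.start < end:
        # Right subtree starts are >= this node's start, so skip it otherwise.
        found.extend(query(node.right, start, end))
    return found


text = "Patient denies chest pain."
tree = build_tree([
    Annotation(8, 14, "NEGATION_CUE"),   # "denies"
    Annotation(15, 25, "SYMPTOM"),       # "chest pain"
    Annotation(15, 20, "ANATOMY"),       # "chest"
])
hits = query(tree, 16, 22)
print([(text[a.start:a.end], a.label) for a in hits])
```

Because the annotations only store offsets, the original text is never modified, and several annotation layers can refer to the same span.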

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process by existing natural language analysis tools since they are highly telegraphic (omitting many words), and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.

    Automated Measurement of Adherence to Traumatic Brain Injury (TBI) Guidelines using Neurological ICU Data

    Using a combination of physiological and treatment information from neurological ICU data-sets, adherence to traumatic brain injury (TBI) guidelines on hypotension, intracranial pressure (ICP) and cerebral perfusion pressure (CPP) is calculated automatically. The ICU output is evaluated to capture pressure events and actions taken by clinical staff for patient management, which are then re-expressed as simplified process models. The official TBI guidelines from the Brain Trauma Foundation (BTF) are similarly evaluated, so the two structures can be compared and a quantifiable distance between the two calculated (the measure of adherence). The methods used include: the compilation of physiological and treatment information into event logs and subsequently process models; the expression of the BTF guidelines in process models within the real-time context of the ICU; a calculation of distance between the two processes using two algorithms (“Direct” and “Weighted”) building on work conducted in the business process domain. Results are presented across two categories each with clinical utility (minute-by-minute and single patient stays) using a real ICU data-set. Results of two sample patients using a weighted algorithm show a non-adherence level of 6.25% for 42 minutes and 56.25% for 708 minutes, and non-adherence of 18.75% for 17 minutes and 56.25% for 483 minutes. Expressed as two combinatorial metrics (duration/non-adherence (A) and duration * non-adherence (B)), which together indicate the clinical importance of the non-adherence, one has a mean of A=4.63 and B=10014.16 and the other a mean of A=0.43 and B=500.0
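The two combinatorial metrics named in the abstract, A (duration divided by non-adherence) and B (duration multiplied by non-adherence), can be illustrated with a short Python sketch. The abstract does not say how the per-patient means are aggregated or how many episodes they cover, so the sketch only evaluates A and B for the individual episodes quoted above and is not expected to reproduce the reported means. The function name `combinatorial_metrics` and the units (minutes, percent) are assumptions.

```python
def combinatorial_metrics(duration_min: float, non_adherence_pct: float):
    """Return (A, B) for one episode of non-adherence.

    A = duration / non-adherence, B = duration * non-adherence,
    with duration in minutes and non-adherence in percent (assumed units).
    """
    a = duration_min / non_adherence_pct
    b = duration_min * non_adherence_pct
    return a, b


# Episodes quoted in the abstract: (duration in minutes, non-adherence in %).
patient_1 = [(42, 6.25), (708, 56.25)]
patient_2 = [(17, 18.75), (483, 56.25)]

for name, episodes in (("patient 1", patient_1), ("patient 2", patient_2)):
    for duration, level in episodes:
        a, b = combinatorial_metrics(duration, level)
        print(f"{name}: {duration} min at {level}% -> A={a:.2f}, B={b:.2f}")
```

A long episode at a high non-adherence level gives a large B, while A separates long mild episodes from short severe ones, which is one way to read the claim that the two metrics together indicate clinical importance.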

    Hypotheses, evidence and relationships: The HypER approach for representing scientific knowledge claims

    Biological knowledge is increasingly represented as a collection of (entity-relationship-entity) triplets. These are queried, mined, appended to papers, and published. However, this representation ignores the argumentation contained within a paper and the relationships between hypotheses, claims and evidence put forth in the article. In this paper, we propose an alternate view of the research article as a network of 'hypotheses and evidence'. Our knowledge representation focuses on scientific discourse as a rhetorical activity, which leads to a different direction in the development of tools and processes for modeling this discourse. We propose to extract knowledge from the article to allow the construction of a system where a specific scientific claim is connected, through trails of meaningful relationships, to experimental evidence. We discuss some current efforts and future plans in this area.

    Fuzzy Logic in Clinical Practice Decision Support Systems

    Computerized clinical guidelines can provide significant benefits to health outcomes and costs; however, their effective implementation presents significant problems. The vagueness and ambiguity inherent in natural (textual) clinical guidelines are not readily amenable to formulating automated alerts or advice. Fuzzy logic allows us to formalize the treatment of vagueness in a decision support architecture. This paper discusses sources of fuzziness in clinical practice guidelines. We consider how fuzzy logic can be applied and give a set of heuristics for the clinical guideline knowledge engineer for addressing uncertainty in practice guidelines. We describe the specific applicability of fuzzy logic to the decision support behavior of Care Plan On-Line, an intranet-based chronic care planning system for General Practitioners.
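How fuzzy logic can formalize a vague guideline term may be easier to see in a small sketch. The example below is a generic Python illustration and does not come from the paper or from Care Plan On-Line. The rule, the function names, and every threshold are hypothetical and carry no clinical meaning.

```python
def ramp_membership(x: float, low: float, high: float) -> float:
    """Degree (0.0 to 1.0) to which x counts as 'high'.

    Below `low` the degree is 0, above `high` it is 1, and it rises
    linearly in between. Requires low < high.
    """
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)


def fuzzy_and(a: float, b: float) -> float:
    """Conjunction of two membership degrees using the common min operator."""
    return min(a, b)


# Hypothetical vague rule: "consider review if blood pressure is elevated
# AND the patient is older". All cut-offs are illustrative only.
elevated_bp = ramp_membership(140, low=130, high=150)   # systolic, mmHg
older = ramp_membership(70, low=60, high=80)            # age, years
alert_strength = fuzzy_and(elevated_bp, older)
print(elevated_bp, older, alert_strength)
```

Instead of a hard cut-off that fires or stays silent, the rule yields a graded strength that a decision support system could map to different levels of advice.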

    Improving automation standards via semantic modelling: Application to ISA88

    Standardization is essential for automation. Extensibility, scalability, and reusability are important features for automation software that rely on the efficient modelling of the addressed systems. The work presented here is from the ongoing development of a methodology for semi-automatic ontology construction from technical documents. The main aim of this work is to systematically check the consistency of technical documents and support the improvement of technical document consistency. The formalization of conceptual models and the subsequent writing of technical standards are simultaneously analyzed, and guidelines proposed for application to future technical standards. Three paradigms are discussed for the development of domain ontologies from technical documents, starting from the current state of the art, continuing with the intermediate method presented and used in this paper, and ending with the suggested paradigm for the future. The ISA88 Standard is taken as a representative case study. Linguistic techniques from the semi-automatic ontology construction methodology are applied to the ISA88 Standard, and different modelling and standardization aspects that are worth sharing with the automation community are addressed. This study discusses different paradigms for developing and sharing conceptual models for the subsequent development of automation software, along with presenting the systematic consistency checking method. Peer Reviewed. Postprint (author's final draft)