
    Temporal disambiguation of relative temporal expressions in clinical texts using temporally fine-tuned contextual word embeddings.

    Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events such that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, a variety of writing styles used by different medical units, redundancy of information that has to be reconciled, and an increased number of temporal references as compared to general domain texts. Work in the area of clinical temporal reasoning has progressed, but the current state of the art still has a way to go before practical application in the clinical setting is possible. Much of the current work in this field is focused on direct and explicit temporal expressions and on identifying temporal relations. However, there is little work focused on relative temporal expressions, which can be difficult to normalize but are vital to ordering events on a timeline. This work introduces a new temporal expression recognition and normalization tool, Chrono, that normalizes temporal expressions into both SCATE and TimeML schemes. Chrono advances clinical timeline extraction, as it is capable of identifying more vague and relative temporal expressions than the current state of the art and utilizes contextualized word embeddings from fine-tuned BERT models to disambiguate temporal types, achieving state-of-the-art performance on relative temporal expressions. In addition, this work shows that fine-tuning BERT models on temporal tasks modifies the contextualized embeddings so that they achieve improved performance in classical SVM and CNN classifiers. Finally, this work provides a new tool for linking temporal expressions to events or other entities by introducing a novel method to identify which tokens an entire temporal expression is paying the most attention to by summarizing the attention weight matrices output by BERT models.
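The attention-summarization idea described above can be sketched as follows. This is an illustrative toy, not Chrono's actual implementation: the matrix layout (rows as query tokens, columns as key tokens) and the token positions are assumptions for the example.

```python
# Illustrative sketch (not Chrono's actual method): summarize a BERT-style
# attention matrix to find which tokens a multi-token temporal expression
# pays the most attention to overall.

def summarize_attention(attn, expr_indices):
    """attn[i][j] = attention paid by query token i to key token j.
    Sum the attention rows of the temporal-expression tokens, then rank
    all other token positions by total received attention."""
    n = len(attn[0])
    totals = [0.0] * n
    for i in expr_indices:
        for j in range(n):
            totals[j] += attn[i][j]
    expr = set(expr_indices)
    # Rank non-expression tokens by how much attention they received.
    ranked = sorted((j for j in range(n) if j not in expr),
                    key=lambda j: totals[j], reverse=True)
    return ranked, totals

# Toy example: 4 tokens, with the expression "three days" at positions 1-2.
attn = [
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.20, 0.30, 0.40],  # "three" attends most to token 3
    [0.10, 0.10, 0.20, 0.60],  # "days" attends most to token 3
    [0.25, 0.25, 0.25, 0.25],
]
ranked, totals = summarize_attention(attn, [1, 2])
# ranked[0] is the token the whole expression attends to most (token 3).
```

In a real setting, `attn` would come from one head (or an average over heads) of a fine-tuned BERT model's attention output.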

    Chrono: A System for Normalizing Temporal Expressions

    The Chrono System: Chrono is a hybrid rule-based and machine learning system written in Python and built from the ground up to identify temporal expressions in text and normalize them into the SCATE schema. Input text is preprocessed using Python’s NLTK package and run through each of the four primary modules highlighted here. Note that Chrono does not remove stopwords, because they add temporal information and context, and Chrono does not tokenize sentences. Output is an Anafora XML file with annotated SCATE entities. After minor parsing logic adjustments, Chrono emerged as the top-performing system for SemEval 2018 Task 6. Chrono is available on GitHub at https://github.com/AmyOlex/Chrono. Future Work: Chrono is still under development. Future improvements will include: additional entity parsing, such as “event”; evaluation of the impact of sentence tokenization; an ensemble ML module that utilizes all four ML methods for disambiguation; extraction of the temporal phrase parsing algorithm into a stand-alone component for comparison with similar systems; performance evaluation on the THYME medical corpus; and migration to the UIMA framework with Ruta rules for portability and easier customization.
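A minimal sketch of the rule-based half of such a pipeline is shown below. The regex rules and the simplified XML tags are illustrative assumptions, not Chrono's actual rules or the full Anafora schema; the point is the shape of the pipeline: match spans, then emit stand-off annotations keyed by character offsets.

```python
# Sketch of a rule-based temporal-expression pass emitting simplified
# Anafora-style stand-off XML. Rules and tag names are illustrative only.
import re
import xml.etree.ElementTree as ET

MONTHS = r"(January|February|March|April|May|June|July|August|September|October|November|December)"
RULES = [
    ("MonthOfYear", re.compile(MONTHS)),
    ("FourDigitYear", re.compile(r"\b(19|20)\d{2}\b")),
    ("RelativeExpr", re.compile(r"\b(yesterday|today|tomorrow|last week|next month)\b", re.I)),
]

def annotate(text):
    root = ET.Element("data")
    anns = ET.SubElement(root, "annotations")
    eid = 0
    for etype, pattern in RULES:
        for m in pattern.finditer(text):
            ent = ET.SubElement(anns, "entity")
            # Anafora-style: span stored as "start,end" character offsets.
            ET.SubElement(ent, "id").text = f"{eid}@sketch"
            ET.SubElement(ent, "span").text = f"{m.start()},{m.end()}"
            ET.SubElement(ent, "type").text = etype
            eid += 1
    return ET.tostring(root, encoding="unicode")

xml_out = annotate("The patient was seen last week, on June 3, 2019.")
```

Real SCATE normalization additionally links entities (e.g. a month to its year) via properties on each entity, which this sketch omits.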

    Using Active Learning To Build A Foundation For Bioinformatics Training.

    As Health Sciences Libraries evolve, the support they offer graduate students has evolved to incorporate many aspects of the research life cycle. At Tompkins-McCaw Library for the Health Sciences, we have partnered with the Wright Center for Clinical and Translational Research to offer training workshops for graduate students who are interested in using bioinformatics to plan, analyze, or execute scientific experiments. We offer two series: 1) an 8-week, 1-hour-per-week seminar series providing a general overview of available techniques and 2) a week-long intensive series, two hours per session, on utilizing free databases from the National Center for Biotechnology Information (NCBI). Workshops have been offered for four years; a consistent challenge has been the variety of experience of participants, particularly in their biological science content background. To address this challenge and provide a solid foundation for the series, in 2019 we conducted a basic genetics session prior to engaging with the NCBI databases. In this lesson, we introduced participants to the central dogma of biology and utilized that knowledge in active learning sessions, with the goal of a shared understanding of the biological processes of transcription and translation. This understanding is essential to effectively using the gene and protein databases to interpret data and plan experiments. In addition to laying a solid content foundation, these activities set the stage for an interactive series and allowed participants to feel comfortable with the content and with interacting with each other. Feedback for the sessions was largely positive, with 86% of survey respondents indicating that they enjoyed the genetics portion specifically. The activities utilized open access learning materials and could be adapted for bioinformatics workshops at other institutions.
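The central-dogma steps taught in the session (DNA transcribed to mRNA, mRNA translated codon by codon to protein) can be illustrated programmatically. This toy uses a deliberately tiny excerpt of the standard codon table; a real exercise would use all 64 codons.

```python
# Toy illustration of transcription and translation. The codon table is a
# small excerpt of the standard genetic code, for demonstration only.
CODON_TABLE = {"AUG": "Met", "UUU": "Phe", "GGC": "Gly", "UAA": "STOP"}

def transcribe(dna):
    # Coding-strand shortcut: the mRNA has the same sequence with T -> U.
    return dna.upper().replace("T", "U")

def translate(mrna):
    protein = []
    for i in range(0, len(mrna) - 2, 3):     # read codons in frame
        aa = CODON_TABLE.get(mrna[i:i + 3], "?")
        if aa == "STOP":                     # stop codon ends translation
            break
        protein.append(aa)
    return "-".join(protein)

mrna = transcribe("ATGTTTGGCTAA")
protein = translate(mrna)   # "Met-Phe-Gly"
```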

    Dynamics of dendritic cell maturation are identified through a novel filtering strategy applied to biological time-course microarray replicates

    Background: Dendritic cells (DC) play a central role in primary immune responses and become potent stimulators of the adaptive immune response after undergoing the critical process of maturation. Understanding the dynamics of DC maturation would provide key insights into this important process. Time course microarray experiments can provide unique insights into DC maturation dynamics. Replicate experiments are necessary to address the issues of experimental and biological variability. Statistical methods and averaging are often used to identify significant signals. Here a novel strategy for filtering of replicate time course microarray data, which identifies consistent signals between the replicates, is presented and applied to a DC time course microarray experiment. Results: The temporal dynamics of DC maturation were studied by stimulating DC with poly(I:C) and following gene expression at 5 time points from 1 to 24 hours. The novel filtering strategy uses standard statistical and fold change techniques, along with the consistency of replicate temporal profiles, to identify those differentially expressed genes that were consistent in two biological replicate experiments. To address the issue of cluster reproducibility, a consensus clustering method, which identifies clusters of genes whose expression varies consistently between replicates, was also developed and applied. Analysis of the resulting clusters revealed many known and novel characteristics of DC maturation, such as the up-regulation of specific immune response pathways. Intriguingly, more genes were down-regulated than up-regulated. Results identify a more comprehensive program of down-regulation, including many genes involved in protein synthesis, metabolism, and housekeeping functions needed for maintenance of cellular integrity and metabolism. Conclusions: The new filtering strategy emphasizes the importance of consistent and reproducible results when analyzing microarray data and utilizes consistency between replicate experiments as a criterion in both feature selection and clustering, without averaging or otherwise combining replicate data. Observation of a significant down-regulation program during DC maturation indicates that DC are preparing for cell death and provides a path to better understand the process. This new filtering strategy can be adapted for use in analyzing other large-scale time course data sets with replicates.
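The replicate-consistency criterion can be sketched as follows. The specific thresholds and the use of Pearson correlation as the profile-agreement measure are assumptions for illustration, not the paper's exact statistics; the key idea is that a gene passes only if both replicates show a real change and their temporal profiles agree.

```python
# Sketch of replicate-consistency filtering: keep a gene only if both
# replicates show a sufficient fold change AND their temporal profiles
# are correlated. Thresholds and the correlation measure are assumptions.
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def consistent(rep1, rep2, min_fold=2.0, min_corr=0.8):
    """rep1, rep2: log2 expression ratios at each time point."""
    thresh = math.log2(min_fold)
    fold_ok = (max(abs(v) for v in rep1) >= thresh
               and max(abs(v) for v in rep2) >= thresh)
    return fold_ok and pearson(rep1, rep2) >= min_corr

# Gene up-regulated consistently in both replicates:
up = consistent([0.1, 1.2, 2.0, 1.8, 1.5], [0.0, 1.0, 2.2, 1.9, 1.4])
# Replicates change strongly but disagree in direction -> filtered out:
bad = consistent([0.1, 1.2, 2.0, 1.8, 1.5], [0.0, -1.1, -2.0, -1.7, -1.2])
```

Averaging the two replicates would have hidden the disagreement in the second case, which is exactly what the consistency criterion is designed to catch.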

    Untapped Potential of Clinical Text for Opioid Surveillance

    Accurate surveillance is needed to combat the growing opioid epidemic. To investigate the potential volume of missed opioid overdose (OOD) encounters, we compare overdose encounters identified by ICD-10-CM codes with those identified by an NLP pipeline across two different medical systems. Our results show that the NLP pipeline identified a larger percentage of OOD encounters than ICD-10-CM codes did. Thus, incorporating sophisticated NLP techniques into current diagnostic methods has the potential to improve surveillance of the incidence of opioid overdoses.
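The comparison described above amounts to set arithmetic over encounter identifiers. The encounter IDs and counts below are hypothetical, invented purely to show the shape of the computation, not the paper's cohorts or results.

```python
# Hypothetical comparison of overdose encounters flagged by ICD-10-CM
# codes versus an NLP pipeline over clinical notes. IDs are made up.
icd_flagged = {"e01", "e02", "e03"}
nlp_flagged = {"e01", "e02", "e03", "e04", "e05"}

missed_by_icd = nlp_flagged - icd_flagged   # found only via NLP
flagged_by_both = nlp_flagged & icd_flagged
pct_missed = 100 * len(missed_by_icd) / len(nlp_flagged)
```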

    SC2ATmd: a tool for integration of the figure of merit with cluster analysis for gene expression data

    Summary: Standard and Consensus Clustering Analysis Tool for Microarray Data (SC2ATmd) is a MATLAB-implemented application specifically designed for the exploration of microarray gene expression data via clustering. Implementation of two versions of the clustering validation method figure of merit allows for performance comparisons between different clustering algorithms and tailors the cluster analysis process to the varying characteristics of each dataset. Along with standard clustering algorithms, this application also offers a consensus clustering method that can generate reproducible clusters across replicate experiments or different clustering algorithms. This application was designed specifically for the analysis of gene expression data, but may be applied to any numerical data supplied in the expected input format.
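The figure-of-merit (FOM) validation idea can be sketched as below. This is a simplified version: the full leave-one-out FOM re-clusters the genes for each left-out condition, whereas here a cluster assignment is passed in, and the toy data and labels are invented for illustration.

```python
# Simplified sketch of the clustering "figure of merit": how well does a
# clustering computed WITHOUT one condition predict expression in that
# left-out condition? Lower is better.
import math

def fom(data, labels, left_out):
    """data[g][c]: expression of gene g under condition c.
    labels[g]: cluster of gene g (assumed computed without left_out).
    Returns the RMS deviation of each gene from its cluster's mean in
    the left-out condition."""
    clusters = {}
    for g, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(g)
    total, n = 0.0, len(data)
    for members in clusters.values():
        mean = sum(data[g][left_out] for g in members) / len(members)
        total += sum((data[g][left_out] - mean) ** 2 for g in members)
    return math.sqrt(total / n)

data = [
    [1.0, 1.1, 0.9],   # genes 0-1 behave alike
    [1.0, 1.0, 1.1],
    [5.0, 5.2, 4.9],   # genes 2-3 behave alike
    [5.1, 5.0, 5.0],
]
good = fom(data, labels=[0, 0, 1, 1], left_out=2)  # coherent clusters
bad = fom(data, labels=[0, 1, 0, 1], left_out=2)   # mixed clusters
```

A clustering that groups co-expressed genes yields a much smaller FOM than one that mixes them, which is how the tool compares algorithms.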