Temporal disambiguation of relative temporal expressions in clinical texts using temporally fine-tuned contextual word embeddings.
Temporal reasoning is the ability to extract and assimilate temporal information so that a series of events can be reconstructed and reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging due to specialized medical terms and nomenclature, shorthand notation, fragmented text, the variety of writing styles used by different medical units, redundant information that must be reconciled, and a higher density of temporal references than in general-domain texts. Work in clinical temporal reasoning has progressed, but the current state of the art still has a long way to go before practical application in the clinical setting will be possible. Much of the current work in this field focuses on direct, explicit temporal expressions and on identifying temporal relations. However, there is little work focused on relative temporal expressions, which can be difficult to normalize but are vital to ordering events on a timeline. This work introduces a new temporal expression recognition and normalization tool, Chrono, that normalizes temporal expressions into both the SCATE and TimeML schemes. Chrono advances clinical timeline extraction: it identifies more vague and relative temporal expressions than the current state of the art, and it utilizes contextualized word embeddings from fine-tuned BERT models to disambiguate temporal types, achieving state-of-the-art performance on relative temporal expressions. In addition, this work shows that fine-tuning BERT models on temporal tasks modifies the contextualized embeddings so that they achieve improved performance in classical SVM and CNN classifiers. Finally, this work provides a new tool for linking temporal expressions to events or other entities by introducing a novel method that identifies which tokens an entire temporal expression is paying the most attention to, by summarizing the attention weight matrices output by BERT models.
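The attention-summarization idea can be illustrated with a small sketch. This is not the tool's actual implementation; the tensor layout and the simple averaging over layers and heads are assumptions, and the attention matrix here is synthetic:

```python
import numpy as np

def expression_attention(attn, expr_span):
    """Summarize attention paid by a temporal expression's tokens to all tokens.

    attn:      array of shape (layers, heads, seq, seq) where attn[l, h, i, j]
               is the attention token i pays to token j (BERT-style).
    expr_span: (start, end) token indices covering the temporal expression.
    Returns a (seq,) vector of summed attention weights per target token.
    """
    # Average over layers and heads, then sum the rows belonging to the
    # expression so the whole phrase "votes" on each target token.
    mean_attn = attn.mean(axis=(0, 1))          # (seq, seq)
    start, end = expr_span
    return mean_attn[start:end].sum(axis=0)     # (seq,)

# Toy example: 2 layers, 2 heads, 5 tokens; uniform attention except that
# the expression tokens (indices 1-2) attend strongly to token 4.
attn = np.full((2, 2, 5, 5), 0.2)
attn[:, :, 1:3, 4] = 0.6
scores = expression_attention(attn, (1, 3))
print(int(scores.argmax()))  # -> 4, the token the expression attends to most
```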
Chrono: A System for Normalizing Temporal Expressions
The Chrono System: Chrono is a hybrid rule-based and machine learning system, written in Python and built from the ground up, that identifies temporal expressions in text and normalizes them into the SCATE schema. Input text is preprocessed using Python's NLTK package and then run through each of the four primary modules highlighted here. Note that Chrono does not remove stopwords, because they add temporal information and context, and it does not tokenize sentences. Output is an Anafora XML file with annotated SCATE entities. After minor adjustments to its parsing logic, Chrono emerged as the top-performing system for SemEval 2018 Task 6. Chrono is available on GitHub at https://github.com/AmyOlex/Chrono.
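To illustrate what a rule-based recognition module looks like, here is a deliberately simplified, hypothetical sketch; Chrono's real rule set is far richer and is paired with ML-based disambiguation:

```python
import re

# Hypothetical, greatly simplified stand-in for a rule-based temporal
# expression recognizer: spot a few explicit expressions with regexes.
PATTERNS = [
    (r"\b\d{4}-\d{2}-\d{2}\b", "ISO-date"),
    (r"\b(?:January|February|March|April|May|June|July|August|"
     r"September|October|November|December)\s+\d{1,2},\s+\d{4}\b", "Month-date"),
    (r"\b\d{1,2}:\d{2}\s*(?:AM|PM)?\b", "Time"),
]

def find_temporal_expressions(text):
    """Return (start, end, matched text, label) spans, sorted by position."""
    spans = []
    for pattern, label in PATTERNS:
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end(), m.group(), label))
    return sorted(spans)

print(find_temporal_expressions("Seen on 2018-03-14 at 9:30 AM."))
```

A real system layers many such rules, handles relative expressions ("two days ago"), and resolves ambiguity with context, which is where the ML modules come in.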
Future Work: Chrono is still under development. Future improvements include: parsing additional entities, such as "event"; evaluating the impact of sentence tokenization; implementing an ensemble ML module that utilizes all four ML methods for disambiguation; extracting the temporal phrase parsing algorithm into a stand-alone component and comparing it to similar systems; evaluating performance on the THYME medical corpus; and migrating to the UIMA framework and implementing Ruta rules for portability and easier customization.
Using Active Learning To Build A Foundation For Bioinformatics Training.
As Health Sciences Libraries evolve, the support they offer graduate students has grown to incorporate many aspects of the research life cycle. At Tompkins-McCaw Library for the Health Sciences, we have partnered with the Wright Center for Clinical and Translational Research to offer training workshops for graduate students who are interested in using bioinformatics to plan, analyze, or execute scientific experiments. We offer two series: 1) an 8-week, 1-hour-per-week seminar series providing a general overview of available techniques and 2) a week-long intensive series, two hours per session, on utilizing free databases from the National Center for Biotechnology Information (NCBI). Workshops have been offered for four years; a consistent challenge has been the varied experience of participants, particularly in their biological science content background. To address this challenge and provide a solid foundation for the series, in 2019 we conducted a basic genetics session prior to engaging with the NCBI databases. In this lesson, we introduced participants to the central dogma of biology and applied that knowledge in active learning sessions, with the goal of a shared understanding of the biological processes of transcription and translation. This understanding is essential to effectively using the gene and protein databases to interpret data and plan experiments. In addition to laying a solid content foundation, these activities set the stage for an interactive series and allowed participants to feel comfortable with the content and with interacting with each other. Feedback for the sessions was largely positive, with 86% of survey respondents specifically indicating that they enjoyed the genetics portion. The activities utilized open access learning materials and could be adapted for bioinformatics workshops at other institutions.
Dynamics of dendritic cell maturation are identified through a novel filtering strategy applied to biological time-course microarray replicates
<p>Abstract</p> <p>Background</p> <p>Dendritic cells (DC) play a central role in primary immune responses and become potent stimulators of the adaptive immune response after undergoing the critical process of maturation. Understanding the dynamics of DC maturation would provide key insights into this important process. Time course microarray experiments can provide unique insights into DC maturation dynamics. Replicate experiments are necessary to address the issues of experimental and biological variability. Statistical methods and averaging are often used to identify significant signals. Here a novel strategy for filtering of replicate time course microarray data, which identifies consistent signals between the replicates, is presented and applied to a DC time course microarray experiment.</p> <p>Results</p> <p>The temporal dynamics of DC maturation were studied by stimulating DC with poly(I:C) and following gene expression at 5 time points from 1 to 24 hours. The novel filtering strategy uses standard statistical and fold change techniques, along with the consistency of replicate temporal profiles, to identify those differentially expressed genes that were consistent in two biological replicate experiments. To address the issue of cluster reproducibility a consensus clustering method, which identifies clusters of genes whose expression varies consistently between replicates, was also developed and applied. Analysis of the resulting clusters revealed many known and novel characteristics of DC maturation, such as the up-regulation of specific immune response pathways. Intriguingly, more genes were down-regulated than up-regulated. 
Results identify a more comprehensive program of down-regulation, including many genes involved in protein synthesis, metabolism, and housekeeping needed for maintenance of cellular integrity and metabolism.</p> <p>Conclusions</p> <p>The new filtering strategy emphasizes the importance of consistent and reproducible results when analyzing microarray data and utilizes consistency between replicate experiments as a criterion in both feature selection and clustering, without averaging or otherwise combining replicate data. Observation of a significant down-regulation program during DC maturation indicates that DC are preparing for cell death and provides a path to better understand the process. This new filtering strategy can be adapted for use in analyzing other large-scale time course data sets with replicates.</p>
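The replicate-consistency idea (keep a gene only if it changes enough in both replicates and its temporal profiles agree) can be sketched as follows. The thresholds and the use of Pearson correlation are illustrative assumptions, not the paper's exact statistical procedure:

```python
import numpy as np

def consistent_de_genes(rep1, rep2, fc_thresh=1.0, corr_thresh=0.9):
    """Filter genes consistently differentially expressed in two replicates.

    rep1, rep2: (genes, timepoints) arrays of log2 expression, with the
    first column taken as the baseline. Thresholds are illustrative.
    """
    keep = []
    for g in range(rep1.shape[0]):
        # Fold change vs. baseline must pass the cutoff in BOTH replicates.
        fc1 = np.abs(rep1[g] - rep1[g, 0]).max()
        fc2 = np.abs(rep2[g] - rep2[g, 0]).max()
        # Temporal profiles must agree between replicates.
        r = np.corrcoef(rep1[g], rep2[g])[0, 1]
        if fc1 >= fc_thresh and fc2 >= fc_thresh and r >= corr_thresh:
            keep.append(g)
    return keep

rep1 = np.array([[0, 1.5, 2.0, 1.8],    # consistent up-regulation
                 [0, 0.1, -0.1, 0.0],   # flat in both replicates
                 [0, 2.0, -2.0, 1.0]])  # large but inconsistent change
rep2 = np.array([[0, 1.4, 2.1, 1.7],
                 [0, -0.1, 0.1, 0.0],
                 [0, -2.0, 2.0, -1.0]])
print(consistent_de_genes(rep1, rep2))  # -> [0]
```

Note how the third gene is rejected despite large fold changes: its profiles anti-correlate across replicates, which averaging-based approaches would mask.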
Untapped Potential of Clinical Text for Opioid Surveillance
Accurate surveillance is needed to combat the growing opioid epidemic. To investigate the potential volume of missed opioid overdoses, we compare overdose encounters identified by ICD-10-CM codes with those identified by an NLP pipeline across two different medical systems. Our results show that the NLP pipeline identified a larger percentage of opioid overdose (OOD) encounters than ICD-10-CM codes. Thus, incorporating sophisticated NLP techniques into current diagnostic methods has the potential to improve surveillance of the incidence of opioid overdoses.
Increased Incidence of Vestibular Disorders in Patients With SARS-CoV-2
OBJECTIVE: Determine the incidence of vestibular disorders in patients with SARS-CoV-2 compared to the control population.
STUDY DESIGN: Retrospective.
SETTING: Clinical data in the National COVID Cohort Collaborative (N3C) database.
METHODS: Deidentified patient data from the National COVID Cohort Collaborative (N3C) database were queried by variant peak-prevalence period (untyped, alpha, delta, omicron 21K, and omicron 23A), as defined by covariants.org, to retrospectively analyze the incidence of vestibular disorders in patients with SARS-CoV-2 compared to a control population consisting of patients without documented evidence of COVID-19 infection during the same period.
RESULTS: Patients testing positive for COVID-19 were significantly more likely to have a vestibular disorder than the control population. Compared to control patients, the odds ratio of vestibular disorders was significantly elevated in patients with untyped (odds ratio [OR], 2.39; confidence interval [CI], 2.29-2.50;
CONCLUSIONS: The incidence of vestibular disorders differed between COVID-19 variants and was significantly elevated in COVID-19-positive patients compared to the control population. These findings have implications for patient counseling, and further research is needed to discern their long-term effects.
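For readers unfamiliar with the statistic reported above, an odds ratio and its Wald confidence interval can be computed from a 2x2 contingency table. The counts below are made up for illustration only and are not the N3C data:

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald 95% CI from a 2x2 table:
        a = exposed cases,    b = exposed non-cases
        c = unexposed cases,  d = unexposed non-cases
    """
    or_ = (a * d) / (b * c)
    # Standard error of log(OR) for the Wald interval.
    se_log = math.sqrt(1/a + 1/b + 1/c + 1/d)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi

# Hypothetical counts: 120/10,000 COVID-positive patients with a vestibular
# disorder vs. 52/10,000 controls.
or_, lo, hi = odds_ratio_ci(120, 9880, 52, 9948)
print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```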
SC2ATmd: a tool for integration of the figure of merit with cluster analysis for gene expression data
Summary: The Standard and Consensus Clustering Analysis Tool for Microarray Data (SC2ATmd) is a MATLAB-implemented application specifically designed for the exploration of microarray gene expression data via clustering. Implementation of two versions of the figure-of-merit clustering validation method allows performance comparisons between different clustering algorithms and tailors the cluster analysis process to the varying characteristics of each dataset. Along with standard clustering algorithms, the application also offers a consensus clustering method that can generate reproducible clusters across replicate experiments or different clustering algorithms. Although designed specifically for the analysis of gene expression data, it may be used with any appropriately formatted numerical data.
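The 2-norm figure of merit (Yeung, Haynor and Ruzzo, 2001) underlying such validation can be sketched in a few lines. SC2ATmd itself is MATLAB; this Python sketch only illustrates the idea and is not the tool's code:

```python
import numpy as np

def figure_of_merit(data, labels, left_out):
    """2-norm figure of merit for one left-out condition.

    data:     (genes, conditions) expression matrix
    labels:   cluster assignment per gene, computed WITHOUT column `left_out`
    left_out: index of the condition excluded from clustering

    A lower FOM means the clustering predicts the held-out condition better.
    """
    col = data[:, left_out]
    sse = 0.0
    for k in np.unique(labels):
        members = col[labels == k]
        # Within-cluster squared deviation in the held-out condition.
        sse += ((members - members.mean()) ** 2).sum()
    return np.sqrt(sse / data.shape[0])

# Toy data: two well-separated groups of genes; labels are assumed to come
# from clustering on the remaining conditions.
data = np.array([[1.0, 1.1, 1.0],
                 [1.1, 0.9, 1.0],
                 [5.0, 5.2, 5.1],
                 [4.9, 5.1, 5.0]])
good = figure_of_merit(data, np.array([0, 0, 1, 1]), left_out=2)
bad = figure_of_merit(data, np.array([0, 1, 0, 1]), left_out=2)
print(good < bad)  # the correct grouping predicts the held-out column better
```

Summing the FOM over all left-out conditions, for each candidate number of clusters or algorithm, gives the comparison curves used to choose between clustering methods.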