
    Temporal disambiguation of relative temporal expressions in clinical texts using temporally fine-tuned contextual word embeddings.

    Temporal reasoning is the ability to extract and assimilate temporal information to reconstruct a series of events so that they can be reasoned over to answer questions involving time. Temporal reasoning in the clinical domain is challenging because of specialized medical terminology, shorthand notation, fragmented text, the variety of writing styles used by different medical units, redundant information that must be reconciled, and a larger number of temporal references than in general-domain texts. Work on clinical temporal reasoning has progressed, but the current state of the art is still far from ready for practical use in the clinical setting. Much of the current work in this field focuses on direct, explicit temporal expressions and on identifying temporal relations. However, little work has addressed relative temporal expressions, which can be difficult to normalize but are vital to ordering events on a timeline. This work introduces a new temporal expression recognition and normalization tool, Chrono, that normalizes temporal expressions into both the SCATE and TimeML schemes. Chrono advances clinical timeline extraction: it identifies more vague and relative temporal expressions than the current state of the art, and it uses contextualized word embeddings from fine-tuned BERT models to disambiguate temporal types, achieving state-of-the-art performance on relative temporal expressions. In addition, this work shows that fine-tuning BERT models on temporal tasks modifies the contextualized embeddings so that they improve the performance of classical SVM and CNN classifiers. Finally, this work provides a new tool for linking temporal expressions to events or other entities, introducing a novel method that identifies which tokens an entire temporal expression attends to most by summarizing the attention weight matrices output by BERT models.
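    The attention-summarization idea can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Chrono's implementation: it assumes the attention weights from one BERT forward pass are available as nested lists indexed `[layer][head][query][key]`, and it assumes that "summarizing" means averaging over all layers and heads and then summing the rows belonging to the temporal expression's tokens. The functions `average_attention` and `most_attended_token` are hypothetical names.

```python
from typing import List, Sequence

# attn[layer][head][query][key]: attention weights from one forward pass (assumed layout).
Attention = List[List[List[List[float]]]]


def average_attention(attn: Attention) -> List[List[float]]:
    """Average the seq x seq attention matrices over all layers and heads."""
    seq_len = len(attn[0][0])
    total = [[0.0] * seq_len for _ in range(seq_len)]
    n_matrices = 0
    for layer in attn:
        for head in layer:
            n_matrices += 1
            for q in range(seq_len):
                for k in range(seq_len):
                    total[q][k] += head[q][k]
    return [[value / n_matrices for value in row] for row in total]


def most_attended_token(attn: Attention, expr_indices: Sequence[int],
                        exclude: Sequence[int] = ()) -> int:
    """Return the index of the token outside the temporal expression that receives
    the most summed attention from all tokens inside the expression."""
    avg = average_attention(attn)
    expr = set(expr_indices)
    skip = expr | set(exclude)  # e.g. special tokens such as [CLS] and [SEP]
    candidates = [k for k in range(len(avg)) if k not in skip]
    # Score each candidate key by the attention paid to it from every expression token.
    return max(candidates, key=lambda k: sum(avg[q][k] for q in expr))


head = [[0.1, 0.1, 0.2, 0.6],
        [0.2, 0.2, 0.1, 0.5],
        [0.25, 0.25, 0.25, 0.25],
        [0.25, 0.25, 0.25, 0.25]]
print(most_attended_token([[head, head]], expr_indices=[0, 1]))  # 3
```

    The `exclude` parameter is a design choice worth keeping: special tokens often absorb large amounts of attention, so a linking step would usually skip them before choosing a candidate event token.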

    Parsing MetaMap Files in Hadoop

    The UMLS::Association CUICollector module identifies UMLS Concept Unique Identifier (CUI) bigrams and their frequencies in a biomedical text corpus. CUICollector was re-implemented in Hadoop MapReduce to improve its speed, flexibility, and scalability. Compared with the serial module, the Hadoop implementation produced equivalent results and achieved a 28x speedup on a single-node Hadoop system.
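    The map, shuffle, and reduce phases of a bigram count can be simulated in a few lines of Python. This is a sketch under assumptions, not the CUICollector code or the Hadoop Java API: it assumes each MetaMap utterance has already been parsed into an ordered list of CUIs, and it counts only directly adjacent CUIs (a window of 1). The CUIs in the example are placeholders, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple

Bigram = Tuple[str, str]


def map_utterance(cuis: List[str]) -> Iterator[Tuple[Bigram, int]]:
    """Mapper: emit ((cui_i, cui_i+1), 1) for each adjacent CUI pair in one utterance."""
    for first, second in zip(cuis, cuis[1:]):
        yield (first, second), 1


def shuffle(pairs: Iterable[Tuple[Bigram, int]]) -> Dict[Bigram, List[int]]:
    """Group emitted values by key, as the Hadoop shuffle/sort phase would."""
    grouped: Dict[Bigram, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key: Bigram, values: List[int]) -> Tuple[Bigram, int]:
    """Reducer: sum the partial counts for one bigram."""
    return key, sum(values)


def count_cui_bigrams(utterances: Iterable[List[str]]) -> Dict[Bigram, int]:
    emitted = chain.from_iterable(map_utterance(u) for u in utterances)
    return dict(reduce_counts(k, v) for k, v in shuffle(emitted).items())


print(count_cui_bigrams([["C0000001", "C0000002", "C0000003"],
                         ["C0000001", "C0000002"]]))
# {('C0000001', 'C0000002'): 2, ('C0000002', 'C0000003'): 1}
```

    Because each mapper call depends only on its own utterance and each reducer only on one key, the work splits naturally across Hadoop nodes, which is what makes the MapReduce formulation scale.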

    Chrono: A System for Normalizing Temporal Expressions

    The Chrono System: Chrono is a hybrid rule-based and machine learning system, written in Python and built from the ground up, that identifies temporal expressions in text and normalizes them into the SCATE schema. Input text is preprocessed with Python’s NLTK package and then run through each of Chrono’s four primary modules. Chrono does not remove stopwords, because they add temporal information and context, and it does not tokenize sentences. The output is an Anafora XML file with annotated SCATE entities. After minor adjustments to its parsing logic, Chrono emerged as the top-performing system for SemEval 2018 Task 6. Chrono is available on GitHub at https://github.com/AmyOlex/Chrono. Future Work: Chrono is still under development. Planned improvements include parsing additional entities, such as “event”; evaluating the impact of sentence tokenization; implementing an ensemble ML module that uses all four ML methods for disambiguation; extracting the temporal phrase parsing algorithm into a stand-alone tool and comparing it to similar systems; evaluating performance on the THYME medical corpus; and migrating to the UIMA framework with Ruta rules for portability and easier customization.
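    A toy version of the rule-based side of this pipeline helps show what "identify, then normalize into SCATE, then write Anafora XML" means in practice. This is a hypothetical sketch, not Chrono's code: a regex tokenizer stands in for NLTK (keeping stopwords and skipping sentence splitting, as Chrono does), three small lexical rules stand in for Chrono's modules, and the XML is a simplified Anafora-style layout with an illustrative id format.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Hypothetical lexicons standing in for Chrono's rule-based modules.
MONTHS = {"january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"}
DAYS = {"monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"}

# Regex tokenizer standing in for NLTK: keeps stopwords, no sentence splitting.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")
YEAR_RE = re.compile(r"(19|20)\d\d")

Entity = Tuple[int, int, str]  # (start offset, end offset, SCATE-style type)


def tag_temporal_tokens(text: str) -> List[Entity]:
    """Label month names, weekday names, and four-digit years with SCATE-style types."""
    entities: List[Entity] = []
    for match in TOKEN_RE.finditer(text):
        token, start, end = match.group(), match.start(), match.end()
        lowered = token.lower()
        if lowered in MONTHS:
            entities.append((start, end, "Month-Of-Year"))
        elif lowered in DAYS:
            entities.append((start, end, "Day-Of-Week"))
        elif YEAR_RE.fullmatch(token):
            entities.append((start, end, "Year"))
    return entities


def to_anafora_xml(entities: List[Entity], doc_id: str = "doc") -> str:
    """Serialize entities into a simplified Anafora-style XML string."""
    data = ET.Element("data")
    annotations = ET.SubElement(data, "annotations")
    for i, (start, end, etype) in enumerate(entities, start=1):
        entity = ET.SubElement(annotations, "entity")
        ET.SubElement(entity, "id").text = f"{i}@e@{doc_id}"  # illustrative id format
        ET.SubElement(entity, "span").text = f"{start},{end}"
        ET.SubElement(entity, "type").text = etype
    return ET.tostring(data, encoding="unicode")


print(tag_temporal_tokens("Seen on Monday, March 3 2015."))
# [(8, 14, 'Day-Of-Week'), (16, 21, 'Month-Of-Year'), (24, 28, 'Year')]
```

    A lexicon this naive would also tag the modal verb "may" as a month, which is exactly the kind of ambiguity that Chrono's machine learning disambiguation step is meant to resolve.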

    Short Courses: Flexible Learning Opportunities in Informatics

    In today’s fast-paced, data-driven world, researchers need a good foundation in informatics to store, organize, process, and analyze growing amounts of data. However, not all degree programs offer such training, and acquiring it independently can be daunting for both new and established researchers with little informatics experience. Providing educational opportunities suited to various skill levels, and that fit within a full-time schedule, can remove these barriers and foster a collaborative, informatics-savvy community that is better equipped to push science forward. To enhance bioinformatics education, VCU’s Wright Center for Clinical and Translational Research offers a complementary series of seminars and workshops. These short courses introduce attendees to bioinformatics concepts and applications and provide hands-on experience with online bioinformatics databases. Bioinformatics 101 (B101) is an 8-week series of 1-hour seminars introducing bioinformatics topics related to Next Generation Sequencing (NGS). Lectures are application-focused and include overviews of NGS technology, practical bioinformatics pipelines, and examples of how the technology can influence downstream bioinformatics analyses. Bioinformatics 102 (B102) is a 5-day workshop, 2 hours per day, developed in collaboration with VCU Libraries, that gives attendees hands-on experience accessing and using public data repositories. Each session includes a brief lecture followed by hands-on exercises. A Certificate of Completion is awarded upon meeting certain criteria in either course. Bioinformatics 101 has been offered 3 times with a combined total of 246 registrants, and Bioinformatics 102 has been offered twice with a total of 78 registrants (limited to 30 per session per day). In course surveys, 82% (n=108) and 95% (n=47) of respondents gave B101 and B102, respectively, a positive rating. In addition, 89% of B101 respondents and 100% of B102 respondents indicated that their knowledge had improved. A total of 84 and 33 certificates have been awarded for B101 and B102, respectively. The Bioinformatics 101 and 102 courses have become highly anticipated across the university and have drawn the attention of surrounding businesses and colleges. Registrants come from diverse backgrounds, including biological, clinical, computational, administrative, librarian, and business roles, among others, representing a total of 77 departments across VCU and VCU Health. Because of this interest, Bioinformatics 101 began offering live online attendance for those unable to travel across campus or attending from outside VCU. This past year, 50% of attendance was online, indicating a growing need for flexible educational opportunities in informatics. Increasing researchers’ knowledge of bioinformatics, along with their awareness of university resources for informatics support, fosters an informatics-savvy research community empowered to take advantage of existing and new data sources in pursuit of new insights and scientific discoveries for the betterment of human health. Future work will include developing a more comprehensive educational framework by creating new, flexible learning opportunities that make informatics education easy and convenient for our dedicated researchers.

    Using Active Learning To Build A Foundation For Bioinformatics Training.

    As health sciences libraries evolve, the support they offer graduate students has expanded to cover many aspects of the research life cycle. At Tompkins-McCaw Library for the Health Sciences, we have partnered with the Wright Center for Clinical and Translational Research to offer training workshops for graduate students interested in using bioinformatics to plan, analyze, or execute scientific experiments. We offer two series: 1) an 8-week seminar series of 1-hour weekly sessions providing a general overview of available techniques, and 2) a week-long intensive series of 2-hour sessions on using free databases from the National Center for Biotechnology Information (NCBI). The workshops have been offered for four years, and a consistent challenge has been the varied experience of participants, particularly their background in biological science. To address this challenge and provide a solid foundation for the series, in 2019 we held a basic genetics session before engaging with the NCBI databases. In this lesson, we introduced participants to the central dogma of biology and applied that knowledge in active learning sessions, with the goal of building a shared understanding of the biological processes of transcription and translation. This understanding is essential for using the gene and protein databases effectively to interpret data and plan experiments. Besides laying a solid content foundation, these activities set the stage for an interactive series and helped participants feel comfortable with the content and with one another. Feedback on the sessions was largely positive, with 86% of survey respondents indicating that they enjoyed the genetics portion specifically. The activities used open-access learning materials and could be adapted for bioinformatics workshops at other institutions.
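    The two central-dogma steps the genetics session covers, transcription and translation, can also be expressed computationally. The sketch below is an illustration added here, not part of the workshop's materials: it assumes the input is the DNA coding (sense) strand, uses the standard genetic code, and starts translation at the first AUG. The function names are hypothetical.

```python
# Standard genetic code, with codons enumerated in U, C, A, G order at each position.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}


def transcribe(coding_strand: str) -> str:
    """Transcription: the mRNA matches the DNA coding strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Translation: read codons from the first AUG until a stop codon ('*')."""
    start = mrna.find("AUG")
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate(transcribe("GGATGTTTTGA")))  # MF
```

    The same mapping, from gene sequence to mRNA to protein, is what links records in the NCBI gene, nucleotide, and protein databases.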

    Dynamics of dendritic cell maturation are identified through a novel filtering strategy applied to biological time-course microarray replicates

    Background: Dendritic cells (DC) play a central role in primary immune responses and become potent stimulators of the adaptive immune response after undergoing the critical process of maturation. Understanding the dynamics of DC maturation would provide key insights into this important process. Time-course microarray experiments can provide unique insights into DC maturation dynamics. Replicate experiments are necessary to address experimental and biological variability, and statistical methods and averaging are often used to identify significant signals. Here, a novel strategy for filtering replicate time-course microarray data, which identifies consistent signals between replicates, is presented and applied to a DC time-course microarray experiment. Results: The temporal dynamics of DC maturation were studied by stimulating DC with poly(I:C) and following gene expression at 5 time points from 1 to 24 hours. The novel filtering strategy uses standard statistical and fold-change techniques, along with the consistency of replicate temporal profiles, to identify differentially expressed genes that were consistent across two biological replicate experiments. To address cluster reproducibility, a consensus clustering method, which identifies clusters of genes whose expression varies consistently between replicates, was also developed and applied. Analysis of the resulting clusters revealed many known and novel characteristics of DC maturation, such as the up-regulation of specific immune response pathways. Intriguingly, more genes were down-regulated than up-regulated. The results identify a more comprehensive program of down-regulation, including many genes involved in protein synthesis, metabolism, and the housekeeping needed to maintain cellular integrity and metabolism. Conclusions: The new filtering strategy emphasizes the importance of consistent and reproducible results when analyzing microarray data, and it uses consistency between replicate experiments as a criterion in both feature selection and clustering, without averaging or otherwise combining replicate data. The observation of a significant down-regulation program during DC maturation indicates that DC are preparing for cell death and provides a path to better understanding the process. This new filtering strategy can be adapted to analyze other large-scale time-course data sets with replicates.
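    The core idea, keeping a gene only if each replicate passes a fold-change filter and the two replicate profiles agree, without ever averaging the replicates, can be sketched as follows. The abstract does not give the exact criteria, so everything here is an assumption: profiles are log2 fold changes versus baseline at each time point, the thresholds (|log2 FC| of at least 1 at some time point, Pearson r of at least 0.8) are illustrative, and the statistical-test step is omitted. The consensus step is approximated by grouping genes that share the same cluster assignment in both replicates. All function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def passes_fold_change(profile: List[float], min_log2_fc: float) -> bool:
    """True if the gene changes by at least min_log2_fc at any time point."""
    return any(abs(v) >= min_log2_fc for v in profile)


def consistent_genes(rep1: Dict[str, List[float]], rep2: Dict[str, List[float]],
                     min_log2_fc: float = 1.0, min_corr: float = 0.8) -> List[str]:
    """Keep genes that pass the fold-change filter in BOTH replicates and whose
    temporal profiles are correlated across replicates (replicates never averaged)."""
    kept = []
    for gene in rep1.keys() & rep2.keys():
        p1, p2 = rep1[gene], rep2[gene]
        if (passes_fold_change(p1, min_log2_fc) and passes_fold_change(p2, min_log2_fc)
                and pearson(p1, p2) >= min_corr):
            kept.append(gene)
    return sorted(kept)


def consensus_clusters(labels1: Dict[str, int],
                       labels2: Dict[str, int]) -> Dict[Tuple[int, int], List[str]]:
    """Group genes by their (replicate-1 cluster, replicate-2 cluster) pair, so each
    group holds genes that were clustered together in both replicates."""
    groups: Dict[Tuple[int, int], List[str]] = {}
    for gene in labels1.keys() & labels2.keys():
        groups.setdefault((labels1[gene], labels2[gene]), []).append(gene)
    return {key: sorted(genes) for key, genes in groups.items()}
```

    Requiring agreement between replicates, rather than averaging them, discards genes like a transient spike in one replicate that is absent in the other, which an average could otherwise turn into a plausible-looking signal.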

    Providing Hands-on Training with Bioinformatics Databases: A Collaboration Between VCU Libraries & Wright Center for Clinical and Translational Research

    Background: With the goal of increasing specialized services for researchers, Virginia Commonwealth University (VCU) Libraries sent its basic science librarians to an intensive training on bioinformatics databases, “A Librarian’s Guide to NCBI.” Around the same time, VCU’s Wright Center for Clinical and Translational Research (Wright CCTR) was expanding the educational component of its bioinformatics support. This year, the librarians partnered with the Wright CCTR to offer an introductory workshop introducing researchers to genetic and genomic databases. Methods: For one week in June, sessions introduced up to 30 faculty and staff to The Cancer Genome Atlas and to NCBI’s Gene, BLAST, Variation Viewer, and Gene Expression Omnibus. Librarians taught resources they had learned in the NCBI training, and Wright CCTR staff taught resources they use often. Each day’s 1.5-hour session included presentations, demonstrations, and hands-on assignment time. Certificates were awarded to participants who completed 4 of 5 assignments. Results: Registration for the workshop filled in under a week, with a waiting list. All survey respondents (n=27) rated the overall quality of the workshop as good or excellent and indicated that they would recommend it to a colleague or student. Conclusions: This successful partnership between VCU Libraries and the Wright CCTR allowed a broader range of bioinformatics topics to be covered while easing the planning and teaching workload for each group. The strong interest in this series across a variety of disciplines at both VCU and VCU Health indicates a need for staff- and faculty-oriented bioinformatics training within the university.

    Expanding Our Understanding of Adherence: The Role of Health Literacy and Cognitive Function in Adherence and Outcomes in Head and Neck Cancer

    Background: Health literacy is the degree to which a person has the capacity to obtain, process, and understand the basic information and services needed to make decisions about their health care. Poor health literacy has been associated with difficulties managing medications, assessing and evaluating health information, completing medical and financial forms, and comparing the nutritional information of foods. As such, health literacy is closely related to adherence to medical treatment. Cognitive function contributes to one's health literacy but also contributes independently to adherence. Patients with head and neck cancers require complex, often multimodal care, and both their health literacy and their cognitive function have been found to be lower than those of the general population. However, no study has examined the interaction between cognitive function and health literacy in relation to treatment and outcomes in head and neck cancer. Objectives: To examine the role of cognitive function and health literacy in adherence to definitive and adjuvant radiotherapy and chemoradiotherapy, and in disease-free and overall survival, in patients with head and neck cancer. Methods: The study included 149 patients who received definitive or adjuvant radiotherapy or chemoradiotherapy for squamous cell carcinoma of the head and neck and who were assessed by a psycho-oncology provider before initiating treatment; the study period ran from August 2017 through March 2020. Before treatment began, the psycho-oncologist administered the Montreal Cognitive Assessment (MoCA) and the Rapid Estimate of Adult Literacy in Medicine (REALM-SF). Cancer- and treatment-related variables, including adherence, were obtained via chart review. Adherence was defined as completing the treatment recommended by the Multidisciplinary Tumor Board. Results: Patients were predominantly male (78%) and white (73%), with an average age of 62 years (SD=9.1). The average education was 13.6 years (SD=2.6). The mean health literacy score was 6.3 out of 7 (SD=1.3, range 0-7), indicating reading at a 7th-8th grade level. The mean cognitive function score was 23.8 out of 30 (SD=3.6, range 10-30; scores below 26 indicate cognitive impairment). Sixteen percent of patients were non-adherent to treatment recommendations, and non-adherence was not associated with either health literacy or cognitive function (P=0.5 and 0.36, respectively). Lower health literacy was associated with later stage at presentation (P<0.05). Health literacy was not associated with disease-free or overall survival (P=0.66 and 0.11, respectively). However, cognitive function was associated with overall survival (P<0.0001) but not with disease-free survival (P=0.22). Conclusions: Psychosocial variables such as health literacy and cognitive function are infrequently considered or studied in head and neck cancer, yet there is significant evidence that patients with head and neck cancer tend to have higher rates of cognitive impairment and lower health literacy than the general population. Further, literacy and cognitive function are known to contribute to health outcomes in other populations. The current study found that cognitive impairment, but not health literacy, is associated with overall survival, while neither is associated with treatment adherence. Further research is needed into the pathways through which cognitive function interacts with cancer care and survival. This study highlights the need to assess cognitive function in patients with head and neck cancer, as identifying and intervening with these patients can aid survival outcomes and quality of life.

    Untapped Potential of Clinical Text for Opioid Surveillance

    Accurate surveillance is needed to combat the growing opioid epidemic. To investigate the potential volume of missed opioid overdoses, we compare opioid overdose (OOD) encounters identified by ICD-10-CM codes with those identified by an NLP pipeline in two different medical systems. Our results show that the NLP pipeline identified a larger percentage of OOD encounters than ICD-10-CM codes did. Thus, incorporating sophisticated NLP techniques into current diagnostic methods has the potential to improve surveillance of the incidence of opioid overdoses.
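    The comparison itself comes down to set arithmetic over encounter IDs. The sketch below is hypothetical and does not reproduce the study's case definition or NLP pipeline: the ICD-10-CM prefix list is an illustrative assumption (real surveillance definitions are more specific, for example excluding adverse-effect and underdosing codes), and the NLP output is simply taken as a given set of flagged encounter IDs.

```python
from typing import Dict, List, Set

# Assumed ICD-10-CM prefixes for opioid poisoning (opium, heroin, other opioids,
# methadone, synthetic narcotics, other/unspecified narcotics); illustrative only.
OPIOID_ICD_PREFIXES = ("T40.0", "T40.1", "T40.2", "T40.3", "T40.4", "T40.6")


def icd_flagged(encounters: Dict[str, List[str]]) -> Set[str]:
    """Return encounter IDs with at least one code matching an opioid prefix."""
    return {
        encounter_id
        for encounter_id, codes in encounters.items()
        if any(code.startswith(OPIOID_ICD_PREFIXES) for code in codes)
    }


def compare_methods(icd_ids: Set[str], nlp_ids: Set[str], total: int) -> Dict[str, float]:
    """Summarize how the ICD-based and NLP-based encounter sets overlap."""
    return {
        "icd_pct": 100.0 * len(icd_ids) / total,
        "nlp_pct": 100.0 * len(nlp_ids) / total,
        "both": len(icd_ids & nlp_ids),
        "nlp_only": len(nlp_ids - icd_ids),  # candidates missed by codes alone
        "icd_only": len(icd_ids - nlp_ids),
    }


encounters = {"e1": ["T40.1X1A"], "e2": ["I10"], "e3": ["F11.20"]}
nlp_ids = {"e1", "e3"}  # hypothetical NLP pipeline output
print(compare_methods(icd_flagged(encounters), nlp_ids, total=len(encounters))["nlp_only"])  # 1
```

    The `nlp_only` count is the quantity of interest for surveillance: encounters documented in clinical text as overdoses that the billing codes alone would miss.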

    Increased Incidence of Vestibular Disorders in Patients With SARS-CoV-2

    OBJECTIVE: To determine the incidence of vestibular disorders in patients with SARS-CoV-2 compared with the control population. STUDY DESIGN: Retrospective. SETTING: Clinical data in the National COVID Cohort Collaborative database (N3C). METHODS: Deidentified patient data from the N3C database were queried by variant peak prevalence (untyped, alpha, delta, omicron 21K, and omicron 23A), with prevalence periods taken from covariants.org, to retrospectively analyze the incidence of vestibular disorders in patients with SARS-CoV-2 compared with a control population of patients without documented evidence of COVID-19 infection during the same period. RESULTS: Patients testing positive for COVID-19 were significantly more likely to have a vestibular disorder than the control population. Compared with control patients, the odds ratio of vestibular disorders was significantly elevated in patients from the untyped variant period (odds ratio [OR], 2.39; confidence interval [CI], 2.29-2.50). CONCLUSIONS: The incidence of vestibular disorders differed between COVID-19 variants and was significantly elevated in COVID-19-positive patients compared with the control population. These findings have implications for patient counseling, and further research is needed to discern their long-term effects.
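    Odds ratios and confidence intervals like those reported above are typically derived from a 2x2 table of exposure (COVID-19 positive or control) against outcome (vestibular disorder or not). The abstract does not state the method used, so this sketch assumes the common Woolf (log-based Wald) interval with z = 1.96 for 95% coverage; the counts in the example are hypothetical, not study data.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(a: int, b: int, c: int, d: int,
                       z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with a log-scale (Woolf/Wald) confidence interval.

    a: exposed with outcome,   b: exposed without outcome,
    c: unexposed with outcome, d: unexposed without outcome.
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell; apply a continuity correction first")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


or_, lower, upper = odds_ratio_wald_ci(20, 80, 10, 90)  # hypothetical counts
print(round(or_, 2))  # 2.25
```

    With cohorts as large as N3C's, the standard error of ln(OR) becomes small, which is why an interval as narrow as 2.29-2.50 is plausible for an odds ratio of 2.39.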