833 research outputs found

    Enhancing clinical concept extraction with distributional semantics

    Abstract: Extracting concepts (such as drugs, symptoms, and diagnoses) from clinical narratives constitutes a basic enabling technology to unlock the knowledge within and support more advanced reasoning applications such as diagnosis explanation, disease progression modeling, and intelligent analysis of the effectiveness of treatment. The recent release of annotated training sets of de-identified clinical narratives has contributed to the development and refinement of concept extraction methods. However, as the annotation process is labor-intensive, training data are necessarily limited in the concepts and concept patterns covered, which impacts the performance of supervised machine learning applications trained with these data. This paper proposes an approach to minimize this limitation by combining supervised machine learning with empirical learning of semantic relatedness from the distribution of the relevant words in additional unannotated text. The approach uses a sequential discriminative classifier (Conditional Random Fields) to extract the mentions of medical problems, treatments and tests from clinical narratives. It takes advantage of all Medline abstracts indexed as being of the publication type “clinical trials” to estimate the relatedness between words in the i2b2/VA training and testing corpora. In addition to the traditional features such as dictionary matching, pattern matching and part-of-speech tags, we also used as a feature words that appear in similar contexts to the word in question (that is, words that have a similar vector representation measured with the commonly used cosine metric, where vector representations are derived using methods of distributional semantics). To the best of our knowledge, this is the first effort exploring the use of distributional semantics, the semantics derived empirically from unannotated text often using vector space models, for a sequence classification task such as concept extraction.
Therefore, we first experimented with different sliding window models and found the model with parameters that led to best performance in a preliminary sequence labeling task. The evaluation of this approach, performed against the i2b2/VA concept extraction corpus, showed that incorporating features based on the distribution of words across a large unannotated corpus significantly aids concept extraction. Compared to a supervised-only approach as a baseline, the micro-averaged F-score for exact match increased from 80.3% to 82.3% and the micro-averaged F-score based on inexact match increased from 89.7% to 91.3%. These improvements are highly significant according to the bootstrap resampling method and also considering the performance of other systems. Thus, distributional semantic features significantly improve the performance of concept extraction from clinical narratives by taking advantage of word distribution information obtained from unannotated data.

    TRESTLE: Toolkit for Reproducible Execution of Speech, Text and Language Experiments

    The evidence is growing that machine and deep learning methods can learn the subtle differences between the language produced by people with various forms of cognitive impairment such as dementia and cognitively healthy individuals. Valuable public data repositories such as TalkBank have made it possible for researchers in the computational community to join forces and learn from each other to make significant advances in this area. However, due to variability in approaches and data selection strategies used by various researchers, results obtained by different groups have been difficult to compare directly. In this paper, we present TRESTLE (Toolkit for Reproducible Execution of Speech, Text and Language Experiments), an open source platform that focuses on two datasets from the TalkBank repository with dementia detection as an illustrative domain. Successfully deployed in the hackallenge (Hackathon/Challenge) of the International Workshop on Health Intelligence at AAAI 2022, TRESTLE provides a precise digital blueprint of the data pre-processing and selection strategies that can be reused via TRESTLE by other researchers seeking comparable results with their peers and current state-of-the-art (SOTA) approaches. Comment: Accepted at AMIA Informatics Summi

    CELLS: A Parallel Corpus for Biomedical Lay Language Generation

    Recent lay language generation systems have used Transformer models trained on a parallel corpus to increase health information accessibility. However, the applicability of these models is constrained by the limited size and topical breadth of available corpora. We introduce CELLS, the largest (63k pairs) and broadest-ranging (12 journals) parallel corpus for lay language generation. The abstract and the corresponding lay language summary are written by domain experts, assuring the quality of our dataset. Furthermore, qualitative evaluation of expert-authored plain language summaries has revealed background explanation as a key strategy to increase accessibility. Such explanation is challenging for neural models to generate because it goes beyond simplification by adding content absent from the source. We derive two specialized paired corpora from CELLS to address key challenges in lay language generation: generating background explanations and simplifying the original abstract. We adopt retrieval-augmented models as an intuitive fit for the task of background explanation generation, and show improvements in summary quality and simplicity while maintaining factual correctness. Taken together, this work presents the first comprehensive study of background explanation for lay language generation, paving the path for disseminating scientific knowledge to a broader audience. CELLS is publicly available at: https://github.com/LinguisticAnomalies/pls_retrieval

    Embedding Probabilities in Predication Space with Hermitian Holographic Reduced Representations

    Abstract. Predication-based Semantic Indexing (PSI) is an approach to generating high-dimensional vector representations of concept-relation-concept triplets. In this paper, we develop a variant of PSI that accommodates estimation of the probability of encountering a particular predication (such as fluoxetine TREATS major depressive disorder) in a collection of predications concerning a concept of interest (such as major depressive disorder). PSI leverages reversible vector transformations provided by representational approaches known as Vector Symbolic Architectures (VSA). To embed probabilities we develop a novel VSA variant, Hermitian Holographic Reduced Representations, with improvements in predictive modeling experiments. The probabilistic interpretation this facilitates reveals previously unrecognized connections between PSI and quantum theory, perhaps most notably that PSI's estimation of relatedness across multiple reasoning pathways corresponds to the estimation of the probability of traversing indistinguishable pathways in accordance with the rules of quantum probability.

    EpiphaNet: An Interactive Tool to Support Biomedical Discoveries

    Background: EpiphaNet (http://epiphanet.uth.tmc.edu) is an interactive knowledge discovery system that enables researchers to visually explore sets of relations extracted from MEDLINE using a combination of language processing techniques. In this paper, we discuss the theoretical and methodological foundations of the system, and evaluate the utility of the models that underlie it for literature‐based discovery. In addition, we present a summary of results drawn from a qualitative analysis of over six hours of interaction with the system by basic medical scientists. Results: The system is able to simulate open and closed discovery, and is shown to generate associations that are both surprising and interesting within the area of expertise of the researchers concerned. Conclusions: EpiphaNet provides an interactive visual representation of associations between concepts, which is derived from distributional statistics drawn from across the spectrum of biomedical citations in MEDLINE. This tool is available online, providing biomedical scientists with the opportunity to identify and explore associations of interest to them.

    Students’ perceptions of school acoustics and the impact of noise on teaching and learning in secondary schools: findings of a questionnaire survey

    This paper will present the design and findings of an online questionnaire survey of 11–16 year olds’ impressions of their school's acoustic environment, and of an experimental study into the effects of typical levels of classroom noise on adolescents' performance on numeracy and cognitive functioning tasks. Analysis of the responses to the questionnaire found that pupils who reported additional learning needs such as hearing impairment, speaking English as an additional language or receiving learning support reported being significantly more affected by poor school acoustics than pupils reporting no additional learning needs. Pupils attending suburban schools featuring cellular classrooms that were not exposed to nearby noise sources were more positive about their school acoustics than pupils at schools with open plan classroom designs or attending schools that were exposed to external noise sources. The study demonstrates that adolescents are reliable judges of their school's acoustic environment, and have insight into the disruption to teaching and learning caused by poor listening conditions. Furthermore, pupils with additional learning needs are more at risk from the negative effects of poor school acoustics.

    Maternal obesity reduces placental autophagy marker expression in uncomplicated pregnancies

    AIM: Obesity has been associated with changes in autophagy and its increasing prevalence among pregnant women is implicated in higher rates of placental-mediated complications of pregnancy such as pre-eclampsia and intrauterine growth restriction. Autophagy is involved in normal placentation, thus changes in autophagy may lead to impaired placental function and development. The aim of this study was to investigate the connection between obesity and autophagy in the placenta in otherwise uncomplicated pregnancies. METHODS: Immunohistochemistry and western blot analysis were done on placental and omental samples from obese (body mass index [BMI] ≥30 kg/m RESULTS: As pre-pregnancy BMI increased, there was an increase in both placental and fetal weight as well as decreased levels of LC3B in the central region of the placenta (P = 0.0046). Within the obese patient group, LC3B levels were significantly decreased in the placentas of male fetuses compared to females (P < 0.0001). Adipocytes, compared to milky spots and vasculature, had lower levels of p62 (P = 0.0127) and LC3B (P = 0.003) in obese omenta and lower levels of LC3B in control omenta (P = 0.0071). CONCLUSION: Obesity leads to reduced placental autophagy in uncomplicated pregnancies; thus, changes in autophagy may be involved in the underlying mechanisms of obesity-related placental diseases of pregnancy.

    Robotic-assisted laparoscopic donor nephrectomy: Decreasing length of stay

    Background: The number of robotic operations performed with the da Vinci Surgical System has increased during the past decade. This system allows for greater maneuverability and control than hand-assisted laparoscopic procedures, resulting in less tissue manipulation and irritation.