
    End to end approach for i2b2 2012 challenge based on Cross-lingual models

    BACKGROUND - We propose a cross-lingual approach to the i2b2 2012 challenge for clinical records, focused on temporal relations in clinical narratives. A corpus of discharge summaries annotated with temporal information was provided for automatically extracting: (1) clinically significant events, including both clinical concepts, such as problems, tests, treatments, and clinical departments, and events relevant to the patient’s clinical timeline, such as admissions and transfers between departments; (2) temporal expressions, referring to dates, times, durations, or frequencies in the clinical text, whose values had to be normalized to an ISO standard; and (3) temporal relations among the clinical events and temporal expressions. GOALS - The objectives of this work are to outperform the previous state of the art on the i2b2 2012 challenge and to adapt a cross-lingual model to a specialized clinical domain with few data resources available. METHODS - The task was conceived as a pipeline of modules developed independently of one another: a token classifier for events and temporal expressions, and a text classifier for relation extraction. We used the XLM-RoBERTa cross-lingual model. RESULTS - For event detection, the proposed token classifier obtains a 0.91 Span F1. For temporal expressions, our token classifier achieves a 0.91 Span F1. For temporal relations, we propose a sentence classifier based on sequence taggers that performs at a 0.29 F1 measure.
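The Span F1 figures above can be made concrete with a small sketch of span-level F1 over BIO tags, the usual way such scores are computed for token classifiers. The tag names and example sequences below are illustrative, not taken from the i2b2 data.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into a set of (start, end, label) spans."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(tags + ["O"]):          # sentinel flushes the last span
        if tag.startswith("B-") or tag == "O":
            if start is not None:
                spans.add((start, i, label))
                start, label = None, None
            if tag.startswith("B-"):
                start, label = i, tag[2:]
        elif tag.startswith("I-") and label != tag[2:]:
            # I- without a matching B-: start a new span (lenient decoding)
            if start is not None:
                spans.add((start, i, label))
            start, label = i, tag[2:]
    return spans

def span_f1(gold_tags, pred_tags):
    """Exact-match span F1: a span counts only if boundaries and label agree."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    return 2 * precision * recall / (precision + recall) if tp else 0.0

gold = ["B-EVENT", "I-EVENT", "O", "B-TIMEX", "O"]
pred = ["B-EVENT", "I-EVENT", "O", "B-TIMEX", "B-EVENT"]
print(round(span_f1(gold, pred), 2))  # one spurious predicted span lowers precision
```

Libraries such as seqeval implement the same entity-level scoring; the sketch only shows what "Span F1" measures.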

    Search strategy formulation for systematic reviews: Issues, challenges and opportunities

    Systematic literature reviews play a vital role in identifying the best available evidence for health and social care research, policy, and practice. The resources required to produce systematic reviews can be significant, and a key to the success of any review is the search strategy used to identify relevant literature. However, the methods used to construct search strategies can be complex, time-consuming, resource-intensive, and error-prone. In this review, we examine the state of the art in resolving complex structured information needs, focusing primarily on the healthcare context. We analyse the literature to identify key challenges and issues and explore appropriate solutions and workarounds. From this analysis, we propose a way forward to facilitate trust, explainability, transparency, reproducibility, and replicability through a set of key design principles for tools that support the development of search strategies in systematic literature reviews.

    SemClinBr -- a multi institutional and multi specialty semantically annotated corpus for Portuguese clinical NLP tasks

    The high volume of research focused on extracting patient information from electronic health records (EHRs) has led to an increase in demand for annotated corpora, which are a very valuable resource for both the development and evaluation of natural language processing (NLP) algorithms. The absence of a multi-purpose clinical corpus outside the scope of the English language, especially in Brazilian Portuguese, is glaring and severely impacts scientific progress in the biomedical NLP field. In this study, we developed a semantically annotated corpus using clinical texts from multiple medical specialties, document types, and institutions. We present the following: (1) a survey listing common aspects of and lessons learned from previous research, (2) a fine-grained annotation schema that can be replicated and can guide other annotation initiatives, (3) a web-based annotation tool centered on an annotation suggestion feature, and (4) both intrinsic and extrinsic evaluations of the annotations. The result of this work is SemClinBr, a corpus of 1,000 clinical notes labeled with 65,117 entities and 11,263 relations, which can support a variety of clinical NLP tasks and boost the secondary use of EHRs for the Portuguese language.

    Normalizing acronyms and abbreviations to aid patient understanding of clinical texts: ShARe/CLEF eHealth Challenge 2013, Task 2

    Background: The ShARe/CLEF eHealth challenge lab aims to stimulate the development of natural language processing and information retrieval technologies that aid patients in understanding their clinical reports. In clinical text, acronyms and abbreviations, also referred to as short forms, can be difficult for patients to understand. For one of three shared tasks in 2013 (Task 2), we generated a reference standard of clinical short forms normalized to the Unified Medical Language System. This reference standard can be used to improve patient understanding by linking to web sources with lay descriptions of annotated short forms or by substituting short forms with a more simplified, lay term. Methods: In this study, we evaluate 1) the accuracy of participating systems in normalizing short forms compared to a majority sense baseline approach, 2) the performance of participants’ systems on short forms with variable majority sense distributions, and 3) the accuracy of participating systems in normalizing concepts shared between the test set and the Consumer Health Vocabulary, a vocabulary of lay medical terms. Results: The best systems submitted by the five participating teams performed with accuracies ranging from 43 to 72 %. A majority sense baseline approach achieved the second-best performance. The accuracy of participating systems for normalizing short forms with two or more senses ranged from 52 to 78 % for low ambiguity (majority sense greater than 80 %), from 23 to 57 % for moderate ambiguity (majority sense between 50 and 80 %), and from 2 to 45 % for high ambiguity (majority sense less than 50 %). With respect to the ShARe test set, 69 % of short form annotations contained concept unique identifiers in common with the Consumer Health Vocabulary. For these 2594 possible annotations, the accuracy of participating systems ranged from 50 to 75 %. 
Conclusion: Short form normalization continues to be a challenging problem. Short form normalization systems perform with moderate to reasonable accuracies. The Consumer Health Vocabulary could enrich its knowledge base with missed concept unique identifiers from the ShARe test set to further support patient understanding of unfamiliar medical terms.
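The majority sense baseline that outperformed most submissions is simple to sketch: always resolve a short form to its most frequent sense (CUI) in the training annotations. The short forms and CUIs below are made-up illustrative examples, not taken from the ShARe data.

```python
from collections import Counter

def train_majority_baseline(annotations):
    """annotations: iterable of (short_form, cui) pairs from training data.

    Returns a lookup that maps each short form to its most frequent CUI.
    """
    counts = {}
    for short_form, cui in annotations:
        counts.setdefault(short_form, Counter())[cui] += 1
    return {sf: c.most_common(1)[0][0] for sf, c in counts.items()}

# Hypothetical training annotations: "RA" is ambiguous, "pt" is not.
train = [("RA", "C0003873"), ("RA", "C0003873"), ("RA", "C0035078"),
         ("pt", "C0030705")]
model = train_majority_baseline(train)
print(model["RA"])  # the majority sense wins regardless of context
```

Because the baseline ignores context entirely, its accuracy tracks the majority sense distribution, which is why the results above are broken down by ambiguity level.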

    Ordinal Convolutional Neural Networks for Predicting RDoC Positive Valence Psychiatric Symptom Severity Scores

    Background: The CEGS N-GRID 2016 Shared Task in Clinical Natural Language Processing (NLP) provided a set of 1000 neuropsychiatric notes to participants as part of a competition to predict psychiatric symptom severity scores. This paper summarizes our methods, results, and experiences based on our participation in the second track of the shared task. Objective: Classical methods of text classification usually fall into one of three problem types: binary, multi-class, and multi-label classification. In this effort, we study ordinal regression problems with text data, where misclassifications are penalized differently based on how far apart the ground truth and model predictions are on the ordinal scale. Specifically, we present our entries (methods and results) in the N-GRID shared task of predicting research domain criteria (RDoC) positive valence ordinal symptom severity scores (absent, mild, moderate, and severe) from psychiatric notes. Methods: We propose a novel convolutional neural network (CNN) model designed to handle ordinal regression tasks on psychiatric notes. Broadly speaking, our model combines an ordinal loss function, a CNN, and conventional feature engineering (wide features) into a single model that is learned end to end. Given that interpretability is an important concern with nonlinear models, we apply a recent approach called local interpretable model-agnostic explanations (LIME) to identify important words that lead to instance-specific predictions. Results: Our best model entered into the shared task placed third among 24 teams, scoring a macro mean absolute error (MMAE) based normalized score (100 · (1 − MMAE)) of 83.86. Since the competition, we improved our score (using basic ensembling) to 85.55, comparable with the winning shared task entry. Applying LIME to model predictions, we demonstrate the feasibility of instance-specific prediction interpretation by identifying words that led to a particular decision.
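One common way to set up such an ordinal target, sketched below, is the cumulative binary encoding of labels, shown here together with the macro mean absolute error used for scoring. The paper's exact loss formulation may differ; the severity labels and toy predictions are illustrative only.

```python
# Severity scale: 0 = absent, 1 = mild, 2 = moderate, 3 = severe.

def ordinal_encode(y, n_classes=4):
    """Cumulative encoding: label k becomes k leading ones, e.g. 2 -> [1, 1, 0].
    A model trained against this encoding pays more for predictions far from
    the true level, since more of the binary targets disagree."""
    return [1.0 if k < y else 0.0 for k in range(n_classes - 1)]

def ordinal_decode(probs, threshold=0.5):
    """Count consecutive thresholds passed, stopping at the first failure."""
    level = 0
    for p in probs:
        if p > threshold:
            level += 1
        else:
            break
    return level

def mmae(gold, pred, n_classes=4):
    """Macro MAE: mean absolute error computed per gold class, then averaged,
    so rare severity levels count as much as common ones."""
    per_class = []
    for c in range(n_classes):
        diffs = [abs(g - p) for g, p in zip(gold, pred) if g == c]
        if diffs:
            per_class.append(sum(diffs) / len(diffs))
    return sum(per_class) / len(per_class)

print(ordinal_encode(2))                 # moderate -> [1.0, 1.0, 0.0]
print(ordinal_decode([0.9, 0.7, 0.2]))   # decodes back to level 2
score = 100 * (1 - mmae([0, 1, 2, 3], [0, 1, 2, 2]))
print(score)  # one off-by-one error on the "severe" class
```

The normalized score in the abstract is exactly `100 · (1 − MMAE)`, so a perfect system scores 100.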

    Study of Short-Term Personalized Glucose Predictive Models on Type-1 Diabetic Children

    Research in diabetes, especially when it comes to building data-driven models to forecast future glucose values, is hindered by the sensitive nature of the data. Because researchers do not share the same data between studies, progress is hard to assess. This paper compares the most promising algorithms in the field, namely Feedforward Neural Networks (FFNN), Long Short-Term Memory (LSTM) Recurrent Neural Networks, Extreme Learning Machines (ELM), Support Vector Regression (SVR), and Gaussian Processes (GP). They are personalized and trained on a population of 10 virtual children from the Type 1 Diabetes Metabolic Simulator software to predict future glucose values at a prediction horizon of 30 minutes. The performance of the models is evaluated using the Root Mean Squared Error (RMSE) and the Continuous Glucose-Error Grid Analysis (CG-EGA). While most of the models end up having a low RMSE, the GP model with a Dot-Product kernel (GP-DP), a novel usage in the context of glucose prediction, has the lowest. Despite having good RMSE values, we show that the models do not necessarily exhibit good clinical acceptability, as measured by the CG-EGA. Only the LSTM, SVR, and GP-DP models have overall acceptable results, each of them performing best in one of the glycemia regions.
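A minimal sketch of GP regression with a dot-product kernel (the GP-DP family above) follows directly from the kernel definition k(x, x') = σ0² + x·x' and the standard GP posterior mean. The window size, noise level, and toy glucose series below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def dot_product_kernel(A, B, sigma0=1.0):
    """k(x, x') = sigma0^2 + x . x', a non-stationary kernel equivalent
    to Bayesian linear regression on the input features."""
    return sigma0 ** 2 + A @ B.T

def gp_predict(X_train, y_train, X_test, noise=0.1):
    """GP posterior mean: K_* (K + noise * I)^-1 y."""
    K = dot_product_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_star = dot_product_kernel(X_test, X_train)
    return K_star @ np.linalg.solve(K, y_train)

# Toy example: predict the next glucose reading (mg/dL) from the
# previous 3 readings, mimicking a short prediction-horizon setup.
history = np.array([110.0, 120.0, 135.0, 150.0, 160.0, 165.0, 162.0])
X = np.array([history[i:i + 3] for i in range(len(history) - 3)])
y = history[3:]
pred = gp_predict(X[:-1], y[:-1], X[-1:])
print(round(float(pred[0]), 1))
```

In practice a library implementation (e.g. scikit-learn's `GaussianProcessRegressor` with a `DotProduct` kernel) would also fit the kernel hyperparameters and return predictive variances, which matter for clinical acceptability analyses like CG-EGA.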

    Novel Event Detection and Classification for Historical Texts

    Event processing is an active area of research in the Natural Language Processing community, but the resources and automatic systems developed so far have mainly addressed contemporary texts. However, the recognition and elaboration of events is a crucial step when dealing with historical texts, particularly in the current era of massive digitization of historical sources: research in this domain can lead to methodologies and tools that assist historians in enhancing their work, while also having an impact on the field of Natural Language Processing. Our work aims at shedding light on the complex concept of events when dealing with historical texts. More specifically, we introduce new annotation guidelines for event mentions and types, categorised into 22 classes. We then annotate a historical corpus accordingly and compare two approaches for automatic event detection and classification following this novel scheme. We believe that this work can foster research in a field of inquiry so far underestimated in the area of Temporal Information Processing. To this end, we release the new annotation guidelines, the corpus, and new models for automatic annotation.