    The use of Natural Language Processing techniques to support Health Literacy: an evidence-based review

    Background and objectives: To conduct a literature search and analysis of the existing research using natural language processing for improving or helping health literacy, as well as to discuss the importance and potentials of addressing both fields in a joint manner. This review targets researchers who are unfamiliar with natural language processing in the field of health literacy, and in general, any researcher, regardless of his or her background, interested in multi-disciplinary research involving technology and health care. Methods: We introduce the concepts of health literacy and natural language processing. Then, a thorough search is performed using relevant databases and well-defined criteria. We review the existing literature addressing these topics, both in an independent and joint manner, and provide an overview of the state of the art using natural language processing in health literacy. We additionally discuss how the different issues in health literacy that are related to the comprehension of specialised health texts can be improved using natural language processing techniques, and the challenges involved in these processes. Results: The search process yielded 235 potential relevant references, 49 of which fully fulfilled the established search criteria, and therefore they were later analysed in more detail. 
These articles were clustered into groups according to their purpose; most of them focused on the development of specific natural language processing modules, such as question answering, information retrieval, text simplification, or natural language generation, in order to facilitate the understanding of health information. This research work has been partially funded by the University of Alicante, Generalitat Valenciana, Spanish Government and the European Commission through the projects "Tratamiento inteligente de la informacion para la ayuda a la toma de decisiones" (GRE12-44), "Explotacion y tratamiento de la informacion disponible en Internet para la anotacion y generacion de textos adaptados al usuario" (GRE13-15), DIIM2.0 (PROMETEOII/2014/001), ATTOS (TIN2012-38536-C03-03), LEGOLANGUAGE (TIN2012-31224), SAM (FP7-611312), and FIRST (FP7-287607)

    Automated Detection of Substance-Use Status and Related Information from Clinical Text

    This study aims to develop and evaluate an automated system for extracting information related to patient substance use (smoking, alcohol, and drugs) from unstructured clinical text (medical discharge records). The authors propose a four-stage system for the extraction of the substance-use status and related attributes (type, frequency, amount, quit-time, and period). The first stage uses a keyword search technique to detect sentences related to substance use and to exclude unrelated records. In the second stage, an extension of the NegEx negation detection algorithm is developed and employed for detecting the negated records. The third stage involves identifying the temporal status of the substance use by applying windowing and chunking methodologies. Finally, in the fourth stage, regular expressions, syntactic patterns, and keyword search techniques are used in order to extract the substance-use attributes. The proposed system achieves an F1-score of up to 0.99 for identifying substance-use-related records, 0.98 for detecting the negation status, and 0.94 for identifying temporal status. Moreover, F1-scores of up to 0.98, 0.98, 1.00, 0.92, and 0.98 are achieved for the extraction of the amount, frequency, type, quit-time, and period attributes, respectively. Natural Language Processing (NLP) and rule-based techniques are employed efficiently for extracting substance-use status and attributes, with the proposed system being able to detect substance-use status and attributes over both sentence-level and document-level data. Results show that the proposed system outperforms the compared state-of-the-art substance-use identification system on an unseen dataset, demonstrating its generalisability
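    The four stages described in this abstract can be sketched roughly as follows. This is a minimal illustration only, not the authors' system: the keyword stems, negation cues, past-use cues, and regular expressions are all hypothetical stand-ins, and the negation check is a simple look-behind window in the spirit of NegEx rather than the extended algorithm the paper describes.

```python
import re

# Hypothetical lexicons (assumptions for illustration; the paper's lists are not given).
SUBSTANCE_KEYWORDS = {
    "smoking": ["smok", "tobacco", "cigarette"],
    "alcohol": ["alcohol", "etoh", "drink"],
    "drugs": ["cocaine", "heroin", "marijuana"],
}
NEGATION_CUES = ["denies", "no", "never", "without", "negative for"]
PAST_CUES = ["quit", "former", "stopped", "used to"]

# Hypothetical attribute patterns for amount and frequency.
AMOUNT_RE = re.compile(r"\d+(?:\.\d+)?\s*(?:packs?|cigarettes?|drinks?|beers?|glasses)", re.I)
FREQUENCY_RE = re.compile(r"\b(?:per|a|each)\s+(?:day|week|month|year)\b", re.I)


def detect_type(sentence):
    """Stage 1: keyword search. Returns (substance, position) or (None, -1)."""
    lowered = sentence.lower()
    for substance, stems in SUBSTANCE_KEYWORDS.items():
        for stem in stems:
            pos = lowered.find(stem)
            if pos != -1:
                return substance, pos
    return None, -1


def is_negated(sentence, keyword_pos, window=40):
    """Stage 2: look for a negation cue in a character window before the keyword."""
    scope = sentence.lower()[max(0, keyword_pos - window):keyword_pos]
    return any(re.search(r"\b" + re.escape(cue) + r"\b", scope) for cue in NEGATION_CUES)


def extract_substance_use(sentence):
    """Run all four stages on one sentence; returns None for unrelated sentences."""
    substance, pos = detect_type(sentence)
    if substance is None:
        return None
    negated = is_negated(sentence, pos)
    lowered = sentence.lower()
    # Stage 3: coarse temporal status from cue words.
    if negated:
        temporal = "none"
    elif any(cue in lowered for cue in PAST_CUES):
        temporal = "past"
    else:
        temporal = "current"
    # Stage 4: regular expressions for attributes.
    amount = AMOUNT_RE.search(sentence)
    frequency = FREQUENCY_RE.search(sentence)
    return {
        "type": substance,
        "negated": negated,
        "temporal": temporal,
        "amount": amount.group(0) if amount else None,
        "frequency": frequency.group(0) if frequency else None,
    }


if __name__ == "__main__":
    for text in [
        "Patient denies smoking or alcohol use.",
        "He quit smoking 5 years ago; previously 2 packs per day.",
        "Blood pressure was 120/80.",
    ]:
        print(text, "->", extract_substance_use(text))
```

    A rule-based design like this keeps every decision traceable to a cue word or pattern, which is one reason such pipelines remain common for clinical text; the trade-off is that the lexicons must be maintained by hand.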

    Hypersensitivity Adverse Event Reporting in Clinical Cancer Trials: Barriers and Potential Solutions to Studying Severe Events on a Population Level

    Dissertation abstract. Christina Eldredge, The University of Wisconsin-Milwaukee, 2020, under the supervision of Professor Timothy Patrick. Clinical cancer trial interventions are associated with hypersensitivity events (HEs), which are recorded in the national clinical trial registry, ClinicalTrials.gov, and are publicly available. These data could potentially be leveraged to study predictors of HEs and to identify at-risk patients who may benefit from desensitization therapies to prevent these potentially life-threatening reactions. However, variation in investigator reporting methods is a barrier to aggregating and analyzing the data. The National Cancer Institute has developed the CTCAE classification system to address this barrier. This study analyzes how comprehensively CTCAE describes severe HEs in clinical cancer trials in comparison with other systems or terminologies. An XML parser was used to extract readable text from adverse event tables. The parsed data elements were queried to identify immune disorder events associated with biological and chemotherapy interventions. A data subset of severe anaphylactic and anaphylactoid events was created and analyzed. The data covered 1,331 clinical trials with 13,088 immune disorder events recorded from September 20, 1999 to March 2018; 2,409 (18.4%) of these events were recorded as “serious”. In the severe subset, MedDRA terminology or the CTCAE or CTC classification systems were used to describe HEs; however, a large number of studies did not specify the system. The CTCAE term “anaphylaxis” was miscoded as “other (not including serious)” in 76.2% of events.
The CTCAE severity grade levels were not used to describe any of the severe events, and the majority of terms did not include the allergen; therefore, in dual- or multi-drug therapies, the etiologic agent was not identifiable. Furthermore, collection methods were not specified in 76% of events. CTCAE was therefore not found to improve the ability to capture event etiology or severity for anaphylaxis and anaphylactoid events in cancer clinical trials. Potential solutions for improving CTCAE HE description include adapting terms with a low percentage of HE severity miscoding (e.g., anaphylactic reaction) and terms that include drugs, biological agents, and/or drug classes, in order to improve the study of anaphylaxis etiology and incidence in multi-drug cancer therapy and thereby make a significant impact on patient safety
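    The parsing and querying step mentioned in this abstract might look something like the sketch below. It is an illustration under stated assumptions, not the author's parser: the element and attribute names (`serious_events`, `other_events`, `category`, `sub_title`, `vocab`, `subjects_affected`) follow a simplified layout assumed here for adverse event tables, and the sample record is invented.

```python
import xml.etree.ElementTree as ET

# Invented sample record using an assumed, simplified element layout.
SAMPLE_XML = """
<clinical_study>
  <reported_events>
    <serious_events>
      <category>
        <title>Immune system disorders</title>
        <event>
          <sub_title vocab="CTCAE (4.0)">Anaphylaxis</sub_title>
          <counts group_id="E1" subjects_affected="2"/>
        </event>
      </category>
    </serious_events>
    <other_events>
      <category>
        <title>Immune system disorders</title>
        <event>
          <sub_title>Hypersensitivity</sub_title>
          <counts group_id="E1" subjects_affected="3"/>
          <counts group_id="E2" subjects_affected="1"/>
        </event>
      </category>
      <category>
        <title>Gastrointestinal disorders</title>
        <event>
          <sub_title vocab="MedDRA 12.0">Nausea</sub_title>
          <counts group_id="E1" subjects_affected="10"/>
        </event>
      </category>
    </other_events>
  </reported_events>
</clinical_study>
"""


def immune_events(xml_text):
    """Return one row per immune-disorder event, noting section and vocabulary."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for section in ("serious_events", "other_events"):
        for section_el in root.iter(section):
            for category in section_el.iter("category"):
                title = category.findtext("title", default="")
                if "immune" not in title.lower():
                    continue  # keep only immune system disorder categories
                for event in category.iter("event"):
                    sub = event.find("sub_title")
                    term = sub.text.strip() if sub is not None and sub.text else ""
                    # A missing vocab attribute means the coding system was not specified.
                    vocab = sub.get("vocab") if sub is not None else None
                    affected = sum(
                        int(c.get("subjects_affected", 0)) for c in event.iter("counts")
                    )
                    rows.append(
                        {"section": section, "term": term, "vocab": vocab, "affected": affected}
                    )
    return rows


if __name__ == "__main__":
    for row in immune_events(SAMPLE_XML):
        print(row)
```

    Keeping the section name and the vocabulary on each row makes it straightforward to count events filed under the non-serious section or lacking a stated coding system, which are the kinds of reporting gaps the abstract discusses.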

    Safeguarding Privacy Through Deep Learning Techniques

    Over the last few years, there has been a growing need to meet minimum security and privacy requirements. Both public and private companies have had to comply with increasingly stringent standards, such as the ISO 27000 family of standards, and with the various laws governing the management of personal data. The sheer amount of data to be managed has demanded a great effort from employees who, in the absence of automatic techniques, have had to work tirelessly to achieve the certification objectives. Unfortunately, because the documentation relating to these problems contains sensitive information, it is difficult if not impossible to obtain material for research and study purposes on which to test new ideas and techniques for automating these processes, for instance by exploiting current developments in the scientific community in the fields of ontologies and artificial intelligence for data management. To work around this problem, it was decided to examine data from the medical world, which, largely for important reasons related to the health of individuals, have gradually become more freely accessible over time. This choice does not affect the generality of the proposed methods, which can be reapplied to the most diverse fields in which privacy-sensitive information must be managed

    SIFR BioPortal : Un portail ouvert et générique d’ontologies et de terminologies biomédicales françaises au service de l’annotation sémantique

    Context: The volume of biomedical data keeps growing. Despite the wide adoption of English, a significant share of these data is in French. In the field of data integration, terminologies and ontologies play a central role in structuring biomedical data and making them interoperable. However, while many resources exist in English, far fewer ontologies are available in French, and tools and services to exploit them are sorely lacking. This gap contrasts with the considerable amount of biomedical data produced in French, particularly in the clinical world (e.g., electronic medical records). Methods and results: In this article, we present some results of the project Semantic Indexing of French Biomedical Resources (SIFR, Indexation sémantique de ressources biomédicales francophones), in particular the SIFR BioPortal, an open and generic platform for hosting French biomedical ontologies and terminologies, based on the technology of the National Center for Biomedical Ontology. The portal facilitates the use and dissemination of the domain's ontologies by offering a set of services (search, mappings, metadata, versioning, visualization, recommendation), including semantic annotation. Indeed, the SIFR Annotator is an ontology-based annotation tool for processing French text data. A preliminary evaluation shows that the web service obtains results equivalent to those previously reported, while being public, functional, and oriented toward semantic web standards. We also present new features for ontology-based services for English and French

    Neuroanatomical domain of the foundational model of anatomy ontology

    Doctor of Philosophy

    Dissertation. Electronic Health Records (EHRs) provide a wealth of information for secondary uses. Methods are developed to improve usefulness of free text query and text processing and demonstrate advantages to using these methods for clinical research, specifically cohort identification and enhancement. Cohort identification is a critical early step in clinical research. Problems may arise when too few patients are identified, or the cohort consists of a nonrepresentative sample. Methods of improving query formation through query expansion are described. Inclusion of free text search in addition to structured data search is investigated to determine the incremental improvement of adding unstructured text search over structured data search alone. Query expansion using topic- and synonym-based expansion improved information retrieval performance. An ensemble method was not successful. The addition of free text search compared to structured data search alone demonstrated increased cohort size in all cases, with dramatic increases in some. Representation of patients in subpopulations that may have been underrepresented otherwise is also shown. We demonstrate clinical impact by showing that a serious clinical condition, scleroderma renal crisis, can be predicted by adding free text search. A novel information extraction algorithm is developed and evaluated (Regular Expression Discovery for Extraction, or REDEx) for cohort enrichment. The REDEx algorithm is demonstrated to accurately extract information from free text clinical narratives. Temporal expressions as well as bodyweight-related measures are extracted. Additional patients and additional measurement occurrences are identified using these extracted values that were not identifiable through structured data alone. The REDEx algorithm transfers the burden of machine learning training from annotators to domain experts. 
We developed automated query expansion methods that greatly improve performance of keyword-based information retrieval. We also developed NLP methods for unstructured data and demonstrate that cohort size can be greatly increased, a more complete population can be identified, and important clinical conditions can be detected that are often missed otherwise. We found a much more complete representation of patients can be obtained. We also developed a novel machine learning algorithm for information extraction, REDEx, that efficiently extracts clinical values from unstructured clinical text, adding additional information and observations over what is available in structured text alone
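    The idea of synonym-based query expansion for cohort identification can be illustrated with a small sketch. This is not the dissertation's method and not the REDEx algorithm: the synonym table, the patient notes, and the function names are hypothetical, and a real system would draw synonyms from a curated vocabulary rather than a hand-written dictionary.

```python
import re

# Hypothetical synonym table (assumption for illustration only).
SYNONYMS = {
    "scleroderma": ["systemic sclerosis"],
    "kidney": ["renal"],
}


def expand_query(terms):
    """Return the original terms plus any listed synonyms, lowercased, without duplicates."""
    expanded = []
    for term in terms:
        for candidate in [term] + SYNONYMS.get(term.lower(), []):
            candidate = candidate.lower()
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def search(notes, terms):
    """Return sorted patient ids whose free-text note contains any term as a whole word."""
    patterns = [re.compile(r"\b" + re.escape(t) + r"\b", re.I) for t in terms]
    return sorted(pid for pid, text in notes.items() if any(p.search(text) for p in patterns))


if __name__ == "__main__":
    # Invented notes keyed by a hypothetical patient id.
    notes = {
        "p1": "History of scleroderma with Raynaud phenomenon.",
        "p2": "Known systemic sclerosis, now with hypertension.",
        "p3": "No rheumatologic disease.",
    }
    query = ["scleroderma"]
    print("plain:   ", search(notes, query))
    print("expanded:", search(notes, expand_query(query)))
```

    In this toy data the expanded query retrieves a patient that the plain keyword misses, which mirrors the abstract's point that expansion can enlarge the identified cohort.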

    Neuroanatomical Domain of the Foundational Model of Anatomy Ontology

    Background: The diverse set of human brain structure and function analysis methods represents a difficult challenge for reconciling multiple views of neuroanatomical organization. While different views of organization are expected and valid, no widely adopted approach exists to harmonize different brain labeling protocols and terminologies. Our approach uses the natural organizing framework provided by anatomical structure to correlate terminologies commonly used in neuroimaging. Description: The Foundational Model of Anatomy (FMA) Ontology provides a semantic framework for representing the anatomical entities and relationships that constitute the phenotypic organization of the human body. In this paper we describe recent enhancements to the neuroanatomical content of the FMA that models cytoarchitectural and morphological regions of the cerebral cortex, as well as white matter structure and connectivity. This modeling effort is driven by the need to correlate and reconcile the terms used in neuroanatomical labeling protocols. By providing an ontological framework that harmonizes multiple views of neuroanatomical organization, the FMA provides developers with reusable and computable knowledge for a range of biomedical applications. Conclusions: A requirement for facilitating the integration of basic and clinical neuroscience data from diverse sources is a well-structured ontology that can incorporate, organize, and associate neuroanatomical data. We applied the ontological framework of the FMA to align the vocabularies used by several human brain atlases, and to encode emerging knowledge about structural connectivity in the brain. We highlighted several use cases of these extensions, including ontology reuse, neuroimaging data annotation, and organizing 3D brain models