589 research outputs found

    Integrating speculation detection and deep learning to extract lung cancer diagnosis from clinical notes

    Despite efforts to develop models for extracting medical concepts from clinical notes, some challenges remain, in particular relating concepts to dates. The high number of clinical notes written for each single patient, the use of negation and speculation, and different date formats cause ambiguity that has to be resolved to reconstruct the patient’s natural history. In this paper, we concentrate on extracting the cancer diagnosis from clinical narratives and relating it to the diagnosis date. To address this challenge, a hybrid approach that combines deep learning-based and rule-based methods is proposed. The approach integrates three steps: (i) lung cancer named entity recognition, (ii) negation and speculation detection, and (iii) relating the cancer diagnosis to a valid date. In particular, we apply the proposed approach to extract the lung cancer diagnosis and its diagnosis date from clinical narratives written in Spanish. Results obtained show an F-score of 90% in the named entity recognition task, and an 89% F-score in the task of relating the cancer diagnosis to the diagnosis date. Our findings suggest that speculation detection, together with negation detection, is a key component to properly extract cancer diagnosis from clinical notes. This work is supported by the EU Horizon 2020 innovation program under grant agreement No. 780495, project BigMedilytics (Big Data for Medical Analytics). It has also been supported by Fundación AECC and Instituto de Salud Carlos III (grant AC19/00034), under the frame of ERA-NET PerMe
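The three steps listed in the abstract above can be illustrated with a toy, purely rule-based sketch. It is not the authors' system: the paper uses a deep learning model for entity recognition on Spanish text, whereas this sketch assumes a simple keyword match, English cue lists, and a single `dd/mm/yyyy` date format. The cue lists and the function name `extract_diagnosis` are hypothetical.

```python
import re
from typing import Optional

# Hypothetical cue lists; the abstract does not give the paper's rules or lexicon.
NEGATION_CUES = ("no evidence of", "negative for", "rules out")
SPECULATION_CUES = ("possible", "suspected", "cannot exclude")
# Assumed date format for illustration only (e.g. 12/03/2019).
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def extract_diagnosis(sentence: str, entity: str = "lung cancer") -> Optional[dict]:
    """Return the diagnosis status and any date found in one sentence."""
    text = sentence.lower()

    # Step (i): keyword match standing in for the named entity recognizer.
    if entity not in text:
        return None

    # Step (ii): negation is checked first, then speculation.
    if any(cue in text for cue in NEGATION_CUES):
        status = "negated"
    elif any(cue in text for cue in SPECULATION_CUES):
        status = "speculated"
    else:
        status = "affirmed"

    # Step (iii): attach the first date mentioned in the same sentence, if any.
    match = DATE_PATTERN.search(sentence)
    date = match.group(0) if match else None

    return {"entity": entity, "status": status, "date": date}
```

In a sketch like this, only records with status `affirmed` and a non-empty `date` would be candidates for the diagnosis date; negated and speculated mentions are kept so they can be filtered out explicitly.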

    Exploiting thesauri knowledge in medical guideline formalization

    Abstract. As in the software product lifecycle, the effort spent in maintaining medical knowledge in guidelines can be reduced if modularization, formalization and tracking of domain knowledge are employed across the guideline development phases. We propose to exploit and combine knowledge templates with medical background knowledge from existing thesauri in order to produce reusable building blocks used in guideline development. These templates enable easier guideline formalization by describing how chunks of medical knowledge can be combined into more complex ones and how they are linked to a textual representation. By linking our ontology used in guideline formalization with existing thesauri, we can use compilations of thesauri knowledge as building blocks for modeling and maintaining the content of a medical guideline. Our paper investigates whether medical knowledge acquired from several medical thesauri can be molded on a guideline pattern, such that it supports the building of executable models of guidelines. Keywords: Linguistic and Control Patterns, Guideline Modelling and Formalization. 1. Objective. Evidence-based clinical guidelines, representing disseminated state-of-the-art medical practice, undergo frequent changes due to new research results, and require permanent maintenance, similar to that required in a software project. Existing guideline modelin

    Exploring Identifiers of Research Articles Related to Food and Disease using Artificial Intelligence

    The research project aims to understand how variation in writing styles and the flexibility of text mining methods control the ability of those methods to extract useful information from articles about food and health. Those areas of study are significant because they incorporate features of text mining methods and food-health articles. The project will build a database and mining tools that would change the way we search and collect information from scientific publications and the way we analyze this information for further applications. The strategy to achieve the project’s goal is to engage several teams of undergraduate students in Applied Computing to develop a food-health portal. Some teams will develop text mining tools, and others will use these tools and existing data-mining tools to extract the portal contents from articles about food and health. The information extracted will create and inform a database of food/health relationships. The project addresses several issues of central importance to the success of text mining techniques in extracting useful food-health information to serve society now and in the future. Those include: how the writing style of an article is determined automatically, how the main topic of an article or document is identified automatically, how useful information is extracted from an article or document to help national and international researchers conduct further research, how available food articles can be quickly utilized to help society, and how undergraduate students gain the skills required for extracting useful information from the huge amount of data available on the internet.

    Cancer biotherapy resource

    'Cancer Biotherapy', as opposed to cancer chemotherapy, is the use of macromolecular, biological agents instead of organic chemicals or drugs to treat cancer. Biotherapy is a treatment modality that blocks the growth of cancer cells by interfering with specific, targeted molecules needed for carcinogenesis and tumor growth instead of simply interfering with rapidly dividing cells as in chemotherapy [1]. Because biological agents are much more selective than chemical agents for cancer cells over normal cells, biotherapy has far fewer toxic side effects than chemotherapy. As solid tumor cancer continues to be analyzed as a chronic condition, there is an absolute need for long-term treatment with minimal side effects. The International Society for Biological Therapy of Cancer, being the only available information database for cancer biotherapy, lacks some crucial information about various cancer biotherapy regimens, and the information presented seems unorganized and unsystematic, making it difficult to search for results. With the increasing rate of cancer deaths across the world and the growing number of biotherapy studies, it is acutely necessary to have a comprehensive curated cancer biotherapy database. The database should be accessible to cancer patients and should also be a sounding board for scientific ideas by cancer researchers. The database/web server has information about the main families of cancer biotherapy regimens to date, namely, 1.) Protein Kinase Inhibitors, 2.) Ras Pathway Inhibitors, 3.) Cell-Cycle Active Agents, 4.) MAbs (monoclonal antibodies), 5.) ADEPT (Antibody-Directed Enzyme Pro-Drug Therapy), 6.) Cytokines (interferons, interleukins, etc.), 7.) Anti-Angiogenesis Agents, 8.) Cancer Vaccines (peptides, proteins, DNA), 9.) Cell-based Immunotherapeutics, 10.) Gene Therapy, 11.) Hematopoietic Growth Factors, 12.) Retinoids, and 13.) CAAT. For each biotherapy regimen, we will extract the following attributes in populating the database: (a.) Cancer type, (b.) Gene/s and gene product/s involved, (c.) Gene sequence (GenBank ID), (d.) Organs affected, (e.) Chemo treatment, (f.) Reference papers, (g.) Clinical phase/stage, (h.) Survival rate (chemotherapy vs. biotherapy), (i.) Clinical test center locations, (j.) Cost, (k.) Patient blog, (l.) Researcher blog, (m.) Future work. The database is accessible to the public through a website and has FAQs to make it understandable to laypeople and a discussion page for researchers to express their views and ideas. In addition to information about the biotherapy regimens, the website is linked to other biologically significant databases such as structural proteomics, metabolomics, glycomics, and lipidomics web servers. The website also presents news in the field of biotherapy and other links that are relevant from a biotherapy point of view. The database attributes will be regularly updated with novel attributes as discoveries are made.

    Locating previously unknown patterns in data-mining results: a dual data- and knowledge-mining method

    BACKGROUND: Data mining can be utilized to automate analysis of substantial amounts of data produced in many organizations. However, data mining produces large numbers of rules and patterns, many of which are not useful. Existing methods for pruning uninteresting patterns have only begun to automate the knowledge acquisition step (which is required for subjective measures of interestingness), hence leaving a serious bottleneck. In this paper we propose a method for automatically acquiring knowledge to shorten the pattern list by locating the novel and interesting ones. METHODS: The dual-mining method is based on automatically comparing the strength of patterns mined from a database with the strength of equivalent patterns mined from a relevant knowledgebase. When these two estimates of pattern strength do not match, a high "surprise score" is assigned to the pattern, identifying the pattern as potentially interesting. The surprise score captures the degree of novelty or interestingness of the mined pattern. In addition, we show how to compute p values for each surprise score, thus filtering out noise and attaching statistical significance. RESULTS: We have implemented the dual-mining method using scripts written in Perl and R. We applied the method to a large patient database and a biomedical literature citation knowledgebase. The system estimated association scores for 50,000 patterns, composed of disease entities and lab results, by querying the database and the knowledgebase. It then computed the surprise scores by comparing the pairs of association scores. Finally, the system estimated statistical significance of the scores. CONCLUSION: The dual-mining method eliminates more than 90% of patterns with strong associations, thus identifying them as uninteresting. We found that the pruning of patterns using the surprise score matched the biomedical evidence in the 100 cases that were examined by hand. The method automates the acquisition of knowledge, thus reducing dependence on the knowledge elicited from human experts, which is usually a rate-limiting step.
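The comparison step in the METHODS paragraph above can be sketched as follows. This is a minimal illustration, not the authors' Perl/R implementation: the abstract does not give the formula, so the sketch assumes both association scores are already scaled to [0, 1] and that the surprise score is their absolute difference. The names `surprise_score` and `rank_by_surprise` are hypothetical, and the p-value computation is omitted.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, str]  # (disease entity, lab result), as in the abstract


def surprise_score(db_strength: float, kb_strength: float) -> float:
    """Assumed definition: absolute mismatch between the two strength estimates."""
    return abs(db_strength - kb_strength)


def rank_by_surprise(
    db_scores: Dict[Pattern, float],
    kb_scores: Dict[Pattern, float],
) -> List[Tuple[Pattern, float]]:
    """Score every pattern present in both sources, most surprising first."""
    shared = db_scores.keys() & kb_scores.keys()
    scored = [(p, surprise_score(db_scores[p], kb_scores[p])) for p in shared]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Made-up numbers for illustration only; they are not data from the paper.
db = {("diabetes", "high glucose"): 0.9, ("anemia", "low sodium"): 0.8}
kb = {("diabetes", "high glucose"): 0.95, ("anemia", "low sodium"): 0.1}
ranking = rank_by_surprise(db, kb)
```

Under these assumptions, a pattern that is strong in both sources (already known) gets a low score and can be pruned, while a pattern that is strong in the database but weak in the knowledgebase rises to the top of `ranking`.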

    Network-based approaches to explore complex biological systems towards network medicine

    Network medicine relies on different types of networks: from the molecular level of protein–protein interactions to gene regulatory networks and correlation studies of gene expression. Among network approaches based on the analysis of the topological properties of protein–protein interaction (PPI) networks, we discuss the widespread DIAMOnD (disease module detection) algorithm. Starting from the assumption that PPI networks can be viewed as maps where diseases can be identified with localized perturbation within a specific neighborhood (i.e., disease modules), DIAMOnD performs a systematic analysis of the human PPI network to uncover new disease-associated genes by exploiting the connectivity significance instead of connection density. The past few years have witnessed increasing interest in understanding the molecular mechanism of post-transcriptional regulation, with a special emphasis on non-coding RNAs, since they are emerging as key regulators of many cellular processes in both physiological and pathological states. Recent findings show that coding genes are not the only targets that microRNAs interact with. In fact, there is a pool of different RNAs, including long non-coding RNAs (lncRNAs), competing with each other to attract microRNAs for interactions, thus acting as competing endogenous RNAs (ceRNAs). The framework of regulatory networks provides a powerful tool to gather new insights into ceRNA regulatory mechanisms. Here, we describe a data-driven model recently developed to explore the lncRNA-associated ceRNA activity in breast invasive carcinoma. On the other hand, a very promising example of the co-expression network is the one implemented by the software SWIM (switch miner), which combines topological properties of correlation networks with gene expression data in order to identify a small pool of genes, called switch genes, critically associated with drastic changes in cell phenotype. Here, we describe the SWIM tool along with its applications to cancer research and compare its predictions with DIAMOnD disease genes.
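The difference between connectivity significance and connection density can be made concrete with a small sketch. The abstract does not state the formula; the sketch assumes the hypergeometric tail probability used in the published description of DIAMOnD, i.e. the probability that a node with `degree` links has at least `seed_links` of them attached to known disease (seed) genes if links were placed at random. The function name `connectivity_pvalue` and the example numbers are hypothetical.

```python
from math import comb


def connectivity_pvalue(n_nodes: int, n_seeds: int, degree: int, seed_links: int) -> float:
    """P(at least `seed_links` of `degree` neighbours are seeds) under random wiring."""
    total = comb(n_nodes, degree)
    # math.comb returns 0 when more seeds are requested than exist,
    # so the sum can safely run up to `degree`.
    tail = sum(
        comb(n_seeds, i) * comb(n_nodes - n_seeds, degree - i)
        for i in range(seed_links, degree + 1)
    )
    return tail / total


# Illustrative network: 100 genes, 5 of them known disease (seed) genes.
# Both candidates have two links to seed genes, but very different degrees.
p_hub = connectivity_pvalue(n_nodes=100, n_seeds=5, degree=20, seed_links=2)
p_small = connectivity_pvalue(n_nodes=100, n_seeds=5, degree=2, seed_links=2)
```

Here both candidates have the same number of seed neighbours, yet the low-degree gene receives the smaller p-value, because two seed links out of two are far less likely by chance than two out of twenty. Ranking by this p-value rather than by raw link counts is what keeps hubs from dominating the candidate list.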