110 research outputs found

    Deep learning in clinical natural language processing: a methodical review.

    OBJECTIVE: This article methodically reviews the literature on deep learning (DL) for natural language processing (NLP) in the clinical domain, providing quantitative analysis to answer 3 research questions concerning methods, scope, and context of current research. MATERIALS AND METHODS: We searched MEDLINE, EMBASE, Scopus, the Association for Computing Machinery Digital Library, and the Association for Computational Linguistics Anthology for articles using DL-based approaches to NLP problems in electronic health records. After screening 1,737 articles, we collected data on 25 variables across 212 papers. RESULTS: DL in clinical NLP publications more than doubled each year, through 2018. Recurrent neural networks (60.8%) and word2vec embeddings (74.1%) were the most popular methods; the information extraction tasks of text classification, named entity recognition, and relation extraction were dominant (89.2%). However, there was a long tail of other methods and specific tasks. Most contributions were methodological variants or applications, but 20.8% were new methods of some kind. The earliest adopters were in the NLP community, but the medical informatics community was the most prolific. DISCUSSION: Our analysis shows growing acceptance of deep learning as a baseline for NLP research, and of DL-based NLP in the medical community. A number of common associations were substantiated (eg, the preference of recurrent neural networks for sequence-labeling named entity recognition), while others were surprisingly nuanced (eg, the scarcity of French language clinical NLP with deep learning). CONCLUSION: Deep learning has not yet fully penetrated clinical NLP and is growing rapidly. This review highlighted both the popular and unique trends in this active field

    Predicting the artificial immunity induced by RUTI® vaccine against tuberculosis using universal immune system simulator (UISS)

    BACKGROUND: Tuberculosis (TB) represents a worldwide cause of mortality (it infects one third of the world's population) affecting mostly developing countries, including India, and recently also developed ones due to the increased mobility of the world population and the evolution of new bacterial strains capable of provoking multi-drug resistance. Currently, antitubercular drugs are unable to eradicate subpopulations of Mycobacterium tuberculosis (MTB) bacilli, and therapeutic vaccinations have been postulated to overcome some of the critical issues related to the increase of drug-resistant forms and the difficult clinical and public health management of tuberculosis patients. The Horizon 2020 EC-funded project "In Silico Trial for Tuberculosis Vaccine Development" (STriTuVaD) aims to support the identification of new therapeutic interventions against tuberculosis through novel in silico modelling of human immune responses to disease and vaccines, thereby drastically reducing the cost of clinical trials in this critical sector of public healthcare.

    DeepEP: A Deep Learning Framework for Identifying Essential Proteins

    Background: Essential proteins are crucial for cellular life, and thus identifying them is an important and challenging problem for researchers. Recently, many computational approaches have been proposed to handle this problem. However, traditional centrality methods cannot fully represent the topological features of biological networks. In addition, identifying essential proteins is an imbalanced learning problem, but few current shallow machine learning-based methods are designed to handle the imbalance. Results: We develop DeepEP based on a deep learning framework that uses the node2vec technique, multi-scale convolutional neural networks and a sampling technique to identify essential proteins. In DeepEP, the node2vec technique is applied to automatically learn topological and semantic features for each protein in the protein-protein interaction (PPI) network. Gene expression profiles are treated as images, and multi-scale convolutional neural networks are applied to extract their patterns. In addition, DeepEP uses a sampling method to alleviate the class imbalance. The sampling method draws the same number of majority and minority samples in each training epoch, so that training is not biased toward either class. The experimental results show that DeepEP outperforms traditional centrality methods. Moreover, DeepEP is better than shallow machine learning-based methods. Detailed analyses show that the dense vectors generated by the node2vec technique contribute substantially to the improved performance, indicating that node2vec effectively captures the topological and semantic properties of the PPI network. The sampling method also improves the performance of identifying essential proteins. Conclusion: We demonstrate that DeepEP improves the prediction performance by integrating multiple deep learning techniques and a sampling method. DeepEP is more effective than existing methods.
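    The per-epoch balanced sampling described in the DeepEP abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function name `balanced_epoch`, the toy protein identifiers, and the choice to undersample the majority class down to the minority size are all assumptions, since the abstract only states that equal numbers of majority and minority samples are drawn in each epoch.

```python
import random


def balanced_epoch(majority, minority, rng):
    """Return one epoch's training set with equal counts from each class.

    Assumption: the larger class is undersampled to the size of the smaller
    one and redrawn every epoch, so different majority samples are seen over
    time. The abstract only says the per-epoch counts are equal.
    """
    k = min(len(majority), len(minority))
    batch = rng.sample(majority, k) + rng.sample(minority, k)
    rng.shuffle(batch)
    return batch


rng = random.Random(0)
# Hypothetical toy data: (protein id, label), label 1 = essential.
non_essential = [(f"P{i}", 0) for i in range(90)]  # majority class
essential = [(f"E{i}", 1) for i in range(10)]      # minority class

epoch_sizes = []
for epoch in range(3):
    batch = balanced_epoch(non_essential, essential, rng)
    positives = sum(label for _, label in batch)
    epoch_sizes.append((len(batch), positives))
    print(f"epoch {epoch}: {len(batch)} samples, {positives} essential")
```

    Redrawing the majority subset each epoch lets the model eventually see most majority examples while every individual epoch stays balanced.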

    Interpretable Deep Neural Network for Cancer Survival Analysis by Integrating Genomic and Clinical Data

    Background: Understanding the complex biological mechanisms of cancer patient survival using genomic and clinical data is vital, not only to develop new treatments for patients, but also to improve survival prediction. However, highly nonlinear and high-dimension, low-sample size (HDLSS) data cause computational challenges to applying conventional survival analysis. Results: We propose a novel biologically interpretable pathway-based sparse deep neural network, named Cox-PASNet, which integrates high-dimensional gene expression data and clinical data on a simple neural network architecture for survival analysis. Cox-PASNet is biologically interpretable where nodes in the neural network correspond to biological genes and pathways, while capturing the nonlinear and hierarchical effects of biological pathways associated with cancer patient survival. We also propose a heuristic optimization solution to train Cox-PASNet with HDLSS data. Cox-PASNet was intensively evaluated by comparing the predictive performance of current state-of-the-art methods on glioblastoma multiforme (GBM) and ovarian serous cystadenocarcinoma (OV) cancer. In the experiments, Cox-PASNet showed out-performance, compared to the benchmarking methods. Moreover, the neural network architecture of Cox-PASNet was biologically interpreted, and several significant prognostic factors of genes and biological pathways were identified. Conclusions: Cox-PASNet models biological mechanisms in the neural network by incorporating biological pathway databases and sparse coding. The neural network of Cox-PASNet can identify nonlinear and hierarchical associations of genomic and clinical data to cancer patient survival. The open-source code of Cox-PASNet in PyTorch implemented for training, evaluation, and model interpretation is available at: https://github.com/DataX-JieHao/Cox-PASNet
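    As background for the survival-analysis setting of Cox-PASNet, a Cox-style model is typically trained by minimizing the negative log partial likelihood of the observed events. The sketch below shows that quantity in plain Python; it is an assumption that the network is trained with this objective (the abstract does not state the loss), and the function name and toy inputs are hypothetical. The authors' actual code is in the repository linked in the abstract.

```python
import math


def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative log partial likelihood of a Cox proportional hazards model.

    risk  : predicted log-risk score per patient (e.g. a network output)
    time  : observed survival or censoring time per patient
    event : 1 if the event (death) was observed, 0 if censored

    For each observed event i, the risk set contains every patient j whose
    time is at least time[i]. Ties are handled in the simple Breslow manner.
    """
    total = 0.0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        denom = sum(math.exp(risk[j]) for j in range(n) if time[j] >= time[i])
        total += risk[i] - math.log(denom)
    return -total


times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
# Scores that rank the earliest death as highest risk fit the data better
# (lower loss) than scores with the reverse ranking.
good = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
bad = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
print(good < bad)
```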

    Medinoid: computer-aided diagnosis and localization of glaucoma using deep learning

    Glaucoma is a leading eye disease, causing vision loss by gradually affecting peripheral vision if left untreated. Current diagnosis of glaucoma is performed by ophthalmologists, human experts who typically need to analyze different types of medical images generated by different types of medical equipment: fundus, Retinal Nerve Fiber Layer (RNFL), Optical Coherence Tomography (OCT) disc, OCT macula, perimetry, and/or perimetry deviation. Capturing and analyzing these medical images is labor intensive and time consuming. In this paper, we present a novel approach for glaucoma diagnosis and localization, only relying on fundus images that are analyzed by making use of state-of-the-art deep learning techniques. Specifically, our approach towards glaucoma diagnosis and localization leverages Convolutional Neural Networks (CNNs) and Gradient-weighted Class Activation Mapping (Grad-CAM), respectively. We built and evaluated different predictive models using a large set of fundus images, collected and labeled by ophthalmologists at Samsung Medical Center (SMC). Our experimental results demonstrate that our most effective predictive model is able to achieve a high diagnosis accuracy of 96%, as well as a high sensitivity of 96% and a high specificity of 100% for Dataset-Optic Disc (OD), a set of center-cropped fundus images highlighting the optic disc. Furthermore, we present Medinoid, a publicly-available prototype web application for computer-aided diagnosis and localization of glaucoma, integrating our most effective predictive model in its back-end

    Graph Theoretic and Pearson Correlation-Based Discovery of Network Biomarkers for Cancer

    Two graph theoretic concepts, cliques and bipartite graphs, are explored to identify network biomarkers for cancer at the gene network level. The rationale is that a group of genes work together by forming a cluster or clique-like structure to initiate a cancer. After initiation, the disease signal goes to the next group of genes related to the second stage of a cancer, which can be represented as a bipartite graph. In other words, bipartite graphs represent the cross-talk among the genes between two disease stages. To test this hypothesis, gene expression values for three cancers, breast invasive carcinoma (BRCA), colorectal adenocarcinoma (COAD) and glioblastoma multiforme (GBM), are used for analysis. First, a co-expression gene network is generated from highly correlated gene pairs with a Pearson correlation coefficient ≥ 0.9. Second, clique structures of all sizes are isolated from the co-expression network. Then, combining these cliques, three different biomarker modules are developed: maximal clique-like modules, 2-clique-1-bipartite modules, and 3-clique-2-bipartite modules. The list of biomarker genes discovered from these network modules is validated as the essential genes for causing a cancer in terms of network properties and survival analysis. This list of biomarker genes will help biologists to design wet lab experiments for further elucidating the complex mechanism of cancer.
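    The first two steps of this pipeline (thresholding pairwise Pearson correlations at 0.9, then isolating cliques) can be sketched as follows. This is an illustrative sketch under assumptions: the gene names and expression values are made up, and the basic Bron-Kerbosch enumeration of maximal cliques is a swapped-in standard technique, since the abstract does not say which clique-finding algorithm was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coexpression_graph(expr, threshold=0.9):
    """Connect two genes when their expression profiles correlate >= threshold."""
    graph = {gene: set() for gene in expr}
    for g, h in combinations(expr, 2):
        if pearson(expr[g], expr[h]) >= threshold:
            graph[g].add(h)
            graph[h].add(g)
    return graph


def maximal_cliques(graph):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(sorted(current))
            return
        for v in list(candidates):
            expand(current | {v}, candidates & graph[v], excluded & graph[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(graph), set())
    return sorted(cliques)


# Hypothetical expression values across five samples.
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "G2": [2.0, 4.0, 6.0, 8.0, 10.5],
    "G3": [1.1, 1.9, 3.2, 3.9, 5.1],
    "G4": [5.0, 1.0, 4.0, 2.0, 3.0],
}
graph = coexpression_graph(expr)
cliques = maximal_cliques(graph)
print(cliques)
```

    Here G1, G2 and G3 are mutually co-expressed and form one clique, while G4 correlates with none of them and is reported as an isolated single-gene clique, which a real analysis would likely filter out by size.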

    The RareDis corpus: A corpus annotated with rare diseases, their signs and symptoms

    Rare diseases affect a small number of people compared to the general population. However, more than 6,000 different rare diseases exist and, in total, they affect more than 300 million people worldwide. A central problem shared by rare diseases is the delay in diagnosis and the sparse information available for researchers, clinicians, and patients. Finding a diagnosis can be a very long and frustrating experience for patients and their families. The average diagnostic delay is between 6 and 8 years. Many of these diseases manifest differently among patients, which further hampers their detection and the choice of the correct treatment. Therefore, there is an urgent need to increase the scientific and medical knowledge about rare diseases. Natural Language Processing (NLP) can help to extract relevant information about rare diseases to facilitate their diagnosis and treatment, but most NLP techniques require manually annotated corpora. Therefore, our goal is to create a gold standard corpus annotated with rare diseases and their clinical manifestations. It could be used to train and test NLP approaches, and the information extracted through NLP could enrich the knowledge of rare diseases and thereby help to reduce the diagnostic delay and improve the treatment of rare diseases. The paper describes the selection of 1,041 texts to be included in the corpus, the annotation process and the annotation guidelines. The entities (disease, rare disease, symptom, sign and anaphor) and the relationships (produces, is a, is acron, is synon, increases risk of, anaphora) were annotated. The RareDis corpus contains annotations of more than 5,000 rare diseases and almost 6,000 clinical manifestations. Moreover, the Inter Annotator Agreement evaluation shows a relatively high agreement (F1-measure equal to 83.5% under exact match criteria for the entities and equal to 81.3% for the relations). Based on these results, this corpus is of high quality and represents a significant step for the field, since there is a scarcity of available corpora annotated with rare diseases. This could open the door to further NLP applications, which would facilitate the diagnosis and treatment of these rare diseases and, therefore, would dramatically improve the quality of life of these patients.
    This work was supported by the Madrid Government (Comunidad de Madrid) under the Multiannual Agreement with UC3M in the line of "Fostering Young Doctors Research" (NLP4RARE-CM-UC3M) and in the context of the V PRICIT (Regional Programme of Research and Technological Innovation); the Multiannual Agreement with UC3M in the line of "Excellence of University Professors (EPUC3M17)"; and a grant from Spanish Ministry of Economy and Competitiveness (SAF2017-86810-R).
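    An inter-annotator agreement F1 under exact match, as reported for the RareDis corpus, can be computed along these lines. This is a minimal sketch under assumptions: entities are represented as hypothetical `(start, end, type)` tuples, one annotator is treated as the reference, and the example spans are invented. The abstract does not specify how the authors computed the score.

```python
def exact_match_f1(reference, other):
    """F1 agreement between two annotators under exact match.

    Each annotation is a (start_offset, end_offset, entity_type) tuple; two
    annotations agree only if all three fields are identical. Treating one
    annotator as the reference is a convention: swapping the roles swaps
    precision and recall but leaves F1 unchanged.
    """
    ref, oth = set(reference), set(other)
    matches = len(ref & oth)
    if matches == 0:
        return 0.0
    precision = matches / len(oth)
    recall = matches / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical annotations of the same text by two annotators.
annotator_a = [(0, 14, "RAREDISEASE"), (30, 38, "SYMPTOM"),
               (45, 52, "SIGN"), (60, 71, "DISEASE")]
annotator_b = [(0, 14, "RAREDISEASE"), (30, 38, "SYMPTOM"),
               (45, 52, "SYMPTOM"),  # same span, different type: no match
               (60, 71, "DISEASE")]

f1 = exact_match_f1(annotator_a, annotator_b)
print(f"exact-match F1: {f1:.3f}")
```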