9 research outputs found

    An information model for computable cancer phenotypes

    Get PDF

    Enhance Representation Learning of Clinical Narrative with Neural Networks for Clinical Predictive Modeling

    Get PDF
    Medicine is undergoing a technological revolution. Understanding human health from clinical data has major challenges from technical and practical perspectives, thus prompting methods that understand large, complex, and noisy data. These methods are particularly necessary for natural language data from clinical narratives/notes, which contain some of the richest information on a patient. Meanwhile, deep neural networks have achieved superior performance in a wide variety of natural language processing (NLP) tasks because of their capacity to encode meaningful but abstract representations and learn the entire task end-to-end. In this thesis, I investigate representation learning of clinical narratives with deep neural networks through a number of tasks ranging from clinical concept extraction, clinical note modeling, and patient-level language representation. I present methods utilizing representation learning with neural networks to support understanding of clinical text documents. I first introduce the notion of representation learning from natural language processing and patient data modeling. Then, I investigate word-level representation learning to improve clinical concept extraction from clinical notes. I present two works on learning word representations and evaluate them to extract important concepts from clinical notes. The first study focuses on cancer-related information, and the second study evaluates shared-task data. The aims of these two studies are to automatically extract important entities from clinical notes. Next, I present a series of deep neural networks to encode hierarchical, longitudinal, and contextual information for modeling a series of clinical notes. I also evaluate the models by predicting clinical outcomes of interest, including mortality, length of stay, and phenotype predictions. Finally, I propose a novel representation learning architecture to develop a generalized and transferable language representation at the patient level. I also identify pre-training tasks appropriate for constructing a generalizable language representation. The main focus is to improve predictive performance of phenotypes with limited data, a challenging task due to a lack of data. Overall, this dissertation addresses issues in natural language processing for medicine, including clinical text classification and modeling. These studies show major barriers to understanding large-scale clinical notes. It is believed that developing deep representation learning methods for distilling enormous amounts of heterogeneous data into patient-level language representations will improve evidence-based clinical understanding. The approach to solving these issues by learning representations could be used across clinical applications despite noisy data. I conclude that considering different linguistic components in natural language and sequential information between clinical events is important. Such results have implications beyond the immediate context of predictions and further suggest future directions for clinical machine learning research to improve clinical outcomes. This could be a starting point for future phenotyping methods based on natural language processing that construct patient-level language representations to improve clinical predictions. While significant progress has been made, many open questions remain, so I will highlight a few works to demonstrate promising directions

    Learning Eligibility in Cancer Clinical Trials using Deep Neural Networks

    Get PDF
    Interventional cancer clinical trials are generally too restrictive, and some patients are often excluded on the basis of comorbidity, past or concomitant treatments, or the fact that they are over a certain age. The efficacy and safety of new treatments for patients with these characteristics are, therefore, not defined. In this work, we built a model to automatically predict whether short clinical statements were considered inclusion or exclusion criteria. We used protocols from cancer clinical trials that were available in public registries from the last 18 years to train word-embeddings, and we constructed a~dataset of 6M short free-texts labeled as eligible or not eligible. A text classifier was trained using deep neural networks, with pre-trained word-embeddings as inputs, to predict whether or not short free-text statements describing clinical information were considered eligible. We additionally analyzed the semantic reasoning of the word-embedding representations obtained and were able to identify equivalent treatments for a type of tumor analogous with the drugs used to treat other tumors. We show that representation learning using {deep} neural networks can be successfully leveraged to extract the medical knowledge from clinical trial protocols for potentially assisting practitioners when prescribing treatments

    COMPUTATIONAL PHENOTYPING AND DRUG REPURPOSING FROM ELECTRONIC MEDICAL RECORDS

    Get PDF
    Using electronic medical records (EMR) for research involves selecting cohorts and manipulating data for tasks like predictive analysis. Computational phenotyping for cohort characterization and stratification is becoming increasingly important for researchers to produce clinically relevant findings. There are significant amounts of time and effort devoted to manual chart abstraction by subject matter experts and researchers, which creates a large bottleneck for progress in clinical research. I focus on developing computational phenotyping pipelines, and I also focus on using EMR for drug repurposing in breast cancer. Drug repurposing is defined as the process of applying known drugs that are already on the market to new disease indications. Using EMR data for drug repurposing has the unique advantage of being able to observe a patient cohort over time and see drug effects on outcomes. In this dissertation, I present work on computational phenotyping and EMR-based drug repurposing. First, I use embedding models and foundational natural language processing methods to predict oral cancer risk with pathology notes. Second, I use natural language processing methods and transfer learning for breast cancer cohort selection and information extraction. Third, I present a pipeline for producing drug repurposing candidates from EMR and provide supporting evidence for predictions with biomedical literature and existing clinical trials.Doctor of Philosoph

    Classificação automática de registos eletrónicos médicos por diagnóstico

    Get PDF
    A crescente implementação de sistemas de registos eletrônicos médicos (REM’s) nos Hospitais, com vista a apoiar o atendimento individual dos pacientes, está a provocar um aumento do processamento e armazenamento dos dados clínicos diariamente. Estes registos contêm uma fonte infindável de informação clínica, no entanto o facto de não haver estrutura no texto produzido pelos médicos e o facto das informações introduzidas divergirem de paciente para paciente e de especialidade médica para especialidade médica, dificulta o aproveitamento destes dados. Outra dificuldade que existe na análise deste tipo de dados é conseguir criar um sistema capaz de extrair informação minuciosa presente nos REM’s, de forma a ajudar os profissionais de saúde a reduzir a taxa de erro de diagnóstico, prevendo o tipo de doença do paciente. Atualmente, para superar este desafio os hospitais realizam este processo manualmente, no entanto este processo é longo e está suscetível a erros. Esta dissertação pretende propor uma solução para este problema, ao utilizar técnicas de Processamento de Linguagem Natural e de Aprendizagem Automática, de forma a permitir um sistema que possibilite a extração de conhecimento clínico e respetiva classificação do REM por tipo de doença/ diagnóstico, de uma forma automática. Este sistema foi desenvolvido em língua portuguesa, visto que todos os sistemas médicos de extração de conhecimento existentes são desenvolvidos para língua inglesa. Este cenário visa ajudar na evolução do aproveitamento das informações contidas nos REM’s e, consequentemente, visa contribuir para o crescimento deste tipo de sistemas dentro do hospital português envolvido nesta dissertação.The growing implementation of electronic medical record (EMR’s) systems in Hospitals, to support individual patient care, is causing an increase in the processing and storage of clinical data daily. These records contain an endless source of clinical information, however, the fact that there is no structure in the text produced by doctors and the fact that the information entered differ from patient to patient and from medical speciality to medical speciality, makes it difficult to use these data. Another difficulty that exists in the analysis of this type of data is to be able to create a system capable of extracting detailed information present in the EMR's, in order to help health professionals to reduce the error rate of diagnosis, predicting the type of disease of the patient. Currently, to overcome this challenge, hospitals carry out this process manually, however, this process is long and susceptible to errors. This dissertation intends to propose a solution to this problem, using techniques of Natural Language Processing and Machine Learning, in order to allow a system that allows the extraction of clinical knowledge and respective classification of EMR by type of disease/diagnosis, from an automatically. This system was developed in Portuguese language since all existing medical knowledge extraction systems are developed for English. This scenario aims to help in the evolution of the use of the information contained in the EMR’s and, consequently, aims to contribute to the growth of this type of systems within the Portuguese hospital involved in this dissertation

    Semantics Enhanced Deep Learning Medical Text Classifier

    Get PDF
    Electronic health records (EHR) contain a vast amount of data with the potential to leverage applications that improve patient outcomes and enhance the work of health care providers. A major portion of this data is inside unstructured text in the form of clinical narratives. To effectively use clinical text, NLP tools have been developed and applied to numerous problems involving clinical decision support systems, cohort identification, and phenotyping among others. However, one of the main problems that face the development of NLP tools for the clinical domain is the lack of large annotated data sets. Clinical language and report style variations are another major problem for clinical NLP. These variations lead to problems where NLP systems created with data from one institution exhibit significantly different performance when tested in a different institution. One way to address the lack of large annotated datasets and variations in clinical language is the explicit incorporation of semantics into the development of clinical NLP tools. Semantics allow us to know that the meaning of words, and thus help us account for language variations. In this work, we incorporate the semantics from ontologies into a loss function of a deep learning text classifier. Also, to specifically address the problem of the lack of large annotated datasets we used a large unannotated or unlabeled dataset, increasing the sample size as a result. To properly use such unlabeled data, we adapted a semi-supervised binary approach that uses the unlabeled dataset during training. To the best of our knowledge we are the first to do so, and for that reason, this is one of the main theoretical contributions of this work. Also, by reducing the need for extensive annotations, we believe this work could enable researchers in clinical settings to embrace and leverage the full potential of clinical NLP tools given the reduced effort required to achieve the desired performance. Furthermore, all the methods in this work are designed as reproducible and extensible software tools that aid further biomedical informatics research in this area
    corecore