
    Toward More Predictive Models by Leveraging Multimodal Data

    Datasets are often composed of both structured and unstructured data, and both forms carry information that machine learning models can exploit to improve predictive performance on a task. However, integrating features from these two forms is a hard, complicated task, all the more so for models that operate under time constraints. Time-constrained models are machine learning models whose inputs must preserve time causality, such as predicting a future event from past data. Most previous work lacks a dedicated pipeline that generalizes across tasks and domains, especially under time constraints. In this work, we present a systematic, domain-agnostic pipeline for integrating features from structured and unstructured data while maintaining time causality. We focus on the healthcare and consumer-market domains, performing experiments, preprocessing data, and building models to demonstrate the pipeline's generalizability. More specifically, we focus on the task of identifying patients at risk of an imminent ICU admission. We use our pipeline to solve this task and show how augmenting unstructured data with structured data improves model performance, with a performance improvement of up to 8.5%.
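    The time-causality constraint described above amounts to a point-in-time feature join: for each prediction timestamp, only records observed strictly before it may contribute features. A minimal sketch of that idea (record structures and field names are hypothetical, not the paper's actual pipeline):

```python
from datetime import datetime

def point_in_time_features(structured, notes, cutoff):
    """Build a feature dict for one patient using only data observed
    strictly before `cutoff`, preserving time causality."""
    past_vitals = [r["value"] for r in structured if r["time"] < cutoff]
    past_notes = [n["text"] for n in notes if n["time"] < cutoff]
    return {
        "last_heart_rate": past_vitals[-1] if past_vitals else None,
        "mean_heart_rate": sum(past_vitals) / len(past_vitals) if past_vitals else None,
        "note_count": len(past_notes),
        "concatenated_notes": " ".join(past_notes),  # input to a text model
    }

structured = [
    {"time": datetime(2020, 1, 1, 8), "value": 80},
    {"time": datetime(2020, 1, 1, 12), "value": 110},
    {"time": datetime(2020, 1, 2, 9), "value": 130},  # after cutoff: excluded
]
notes = [
    {"time": datetime(2020, 1, 1, 9), "text": "patient stable"},
    {"time": datetime(2020, 1, 2, 10), "text": "rapid decline"},  # excluded
]
feats = point_in_time_features(structured, notes, datetime(2020, 1, 2))
```

    Leakage of any record at or after the cutoff would let the model "see the future," which is exactly what such a pipeline must prevent.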

    Linking social media, medical literature, and clinical notes using deep learning.

    Researchers analyze data, information, and knowledge from many sources, in many formats, and with many methods; the dominant data formats are text and images. In the healthcare industry, professionals generate a large quantity of unstructured data, and its complexity, together with limited computational power, has caused delays in analysis. With emerging deep learning algorithms and access to accelerators such as graphics processing units (GPUs) and tensor processing units (TPUs), however, processing text and images is becoming more accessible, and deep learning algorithms achieve remarkable results in natural language processing (NLP) and computer vision. In this study, we focus on NLP in the healthcare industry and collect data not only from electronic medical records (EMRs) but also from medical literature and social media. We propose a framework for linking social media, medical literature, and EMR clinical notes using deep learning algorithms. Connecting data sources requires defining a link between them, and our key is finding concepts in medical text. The National Library of Medicine (NLM) maintains the Unified Medical Language System (UMLS), which we use as the foundation of our own system. We recognize social media's dynamic nature and apply supervised and semi-supervised methodologies to generate concepts. Named entity recognition (NER) allows efficient extraction of information, or entities, from medical literature, and we extend the model to process EMR clinical notes via transfer learning. The result is an integrated, end-to-end, web-based system that unifies social media, literature, and clinical notes and improves access to medical knowledge for the public and experts.
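    The concept-finding step that links the three sources can be approximated with a greedy longest-match lookup against a UMLS-style lexicon mapping surface phrases to concept identifiers. The tiny lexicon and CUI codes below are illustrative only; the paper's actual system uses learned NER models rather than dictionary matching:

```python
import re

def match_concepts(text, lexicon, max_len=4):
    """Greedy longest-match of lexicon phrases to UMLS-style concept IDs."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    matches, i = [], 0
    while i < len(tokens):
        found = None
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                found = (phrase, lexicon[phrase], n)
                break
        if found:
            matches.append((found[0], found[1]))
            i += found[2]
        else:
            i += 1
    return matches

# Illustrative lexicon; real UMLS strings and CUIs differ.
lexicon = {"myocardial infarction": "C0027051", "aspirin": "C0004057"}
found = match_concepts("Patient denies myocardial infarction, takes aspirin daily", lexicon)
```

    Matching on shared concept identifiers is what lets a tweet, a journal abstract, and a clinical note be linked even though their vocabularies differ.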

    USING ARTIFICIAL INTELLIGENCE TO IMPROVE HEALTHCARE QUALITY AND EFFICIENCY

    In recent years, artificial intelligence (AI), especially machine learning (ML) and deep learning (DL), has represented one of the most exciting advances in science. The performance of ML-based AI in many areas, such as computer vision, voice recognition, and natural language processing, has improved dramatically, offering unprecedented opportunities for application in a variety of domains. In the critical domain of healthcare, great potential exists for a broader application of ML to improve quality and efficiency. At the same time, there are substantial challenges in the development and implementation of AI in healthcare. This dissertation studies the application of state-of-the-art AI technologies in healthcare, ranging from original method development to model interpretation and real-world implementation. First, a novel DL-based method is developed to efficiently analyze rich and complex electronic health record data. This approach shows promise in facilitating the analysis of real-world data and can complement clinical knowledge by revealing deeper insights; both knowledge discovery and the performance of predictive models are demonstrably boosted by it. Second, a recurrent neural network (named LSTM-DL) is developed and shown to outperform all existing methods on an important real-world question, patient cost prediction. A series of novel analyses is used to derive a deeper understanding of deep learning's advantages. The LSTM-DL model consistently outperforms other models by nearly the same margin across different subgroups, and interestingly, its advantage is significantly driven by the amount of fluctuation in the sequential data. By opening the "black box" and examining the parameters learned during training, it is demonstrated that LSTM-DL's ability to react to high fluctuation is gained during training rather than inherited from its special architecture. LSTM-DL can also learn to be less sensitive to fluctuations when fluctuation does not play an important role. Finally, the implementation of ML models in real practice is studied. Since, at its current stage of development, ML-based AI will most likely assist human workers rather than replace them, it is critical to understand how human workers collaborate with AI. An AI tool was developed in collaboration with a medical coding company and successfully implemented in a real work environment, and its impact on worker performance is examined. Findings show that use of AI can significantly boost the productivity of human coders. The heterogeneity of AI's effects is further investigated: the human circadian rhythm and coder seniority are both significant factors conditioning productivity gains. One interesting finding is that the AI has its best effects when a coder is at her/his peak of performance (as opposed to other times), which supports the theory of human-AI complementarity. However, this theory does not necessarily hold across different coders: while it could be assumed that senior coders would benefit more from the AI, junior coders' productivity is found to improve more. A further qualitative study uncovers the underlying mechanism driving this effect: senior coders express strong resistance to AI, and their low trust in it significantly hinders them from realizing its value.
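    The gating mechanism that lets an LSTM modulate its reaction to fluctuating inputs, which the dissertation identifies as the source of LSTM-DL's advantage, can be sketched with a single scalar cell. The weights here are hand-picked for illustration, not learned, and the cell is a textbook LSTM step rather than the dissertation's actual architecture:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h, c, w):
    """One scalar LSTM step; w holds (input, hidden, bias) weights per gate."""
    i = sigmoid(w["i"][0] * x + w["i"][1] * h + w["i"][2])   # input gate
    f = sigmoid(w["f"][0] * x + w["f"][1] * h + w["f"][2])   # forget gate
    o = sigmoid(w["o"][0] * x + w["o"][1] * h + w["o"][2])   # output gate
    g = math.tanh(w["g"][0] * x + w["g"][1] * h + w["g"][2]) # candidate value
    c_new = f * c + i * g          # memory blends old state with new input
    h_new = o * math.tanh(c_new)   # output is a gated view of the memory
    return h_new, c_new

w = {"i": (1.0, 0.0, 0.0), "f": (0.0, 0.0, 2.0),
     "o": (0.0, 0.0, 2.0), "g": (1.0, 0.0, 0.0)}
h = c = 0.0
for x in [0.1, 0.1, 2.0]:  # a fluctuation arrives at the last step
    h, c = lstm_step(x, h, c, w)
```

    Because the input gate `i` grows with `x`, a sudden fluctuation writes strongly into the cell state, while training can equally drive the gate weights toward ignoring fluctuations when they carry no signal, consistent with the behavior the dissertation reports.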

    Enrichment of ontologies using machine learning and summarization

    Biomedical ontologies are structured knowledge systems in biomedicine. They play a major role in enabling precise communication in support of healthcare applications, e.g., Electronic Health Record (EHR) systems. Biomedical ontologies are used in many different contexts to facilitate information and knowledge management, and the most widely used clinical ontology is SNOMED CT. Placing a new concept into its proper position in an ontology is a fundamental task in the ontology's lifecycle of curation and enrichment. A large biomedical ontology, which typically consists of many tens of thousands of concepts and relationships, can be viewed as a complex network with concepts as nodes and relationships as links; such a large node-link diagram easily becomes overwhelming for humans to understand or work with. Adding concepts is a challenging and time-consuming task that requires domain knowledge and ontology skills. IS-A links (aka subclass links) are the most important relationships of an ontology, enabling the inheritance of other relationships. The position of a concept, represented by its IS-A links to other concepts, determines how accurately it is modeled; therefore, considering as many parent candidate concepts as possible leads to better modeling of the concept. Traditionally, curators rely on classifiers to place concepts into ontologies. However, this assumes accurate relationship modeling of the new concept as well as of the existing concepts. Since many concepts in existing ontologies are underspecified in terms of their relationships, placement by classifiers may be wrong. In cases where the curator does not manually check the automatic placement by classifier programs, concepts may end up in wrong positions in the IS-A hierarchy, and a user searching for a concept without knowing its precise name would not find it in its expected location.
    Automated or semi-automated techniques that can place a concept, or narrow down the places where to insert it, are highly desirable. Hence, this dissertation addresses the problem of correctly and effectively identifying IS-A links and potential parent concepts for new concepts, with the assistance of two powerful techniques: Machine Learning (ML) and Abstraction Networks (AbNs). Modern neural networks have revolutionized machine learning in vision and Natural Language Processing (NLP), and they also show great promise for ontology-related tasks, including ontology enrichment, i.e., the insertion of new concepts. This dissertation presents research using ML and AbNs to achieve knowledge enrichment of ontologies. Abstraction networks are compact summary networks that preserve a significant amount of the semantics and structure of the underlying ontology. An AbN is automatically derived from the ontology itself; it consists of nodes, where each node represents a set of concepts that are similar in structure and semantics. Various kinds of AbNs have been developed by the Structural Analysis of Biomedical Ontologies Center (SABOC) to support the summarization, visualization, and quality assurance (QA) of biomedical ontologies. Two basic kinds are the Area Taxonomy and the Partial-area Taxonomy, which have been developed for various biomedical ontologies (e.g., SNOMED CT of SNOMED International and NCIt of the National Cancer Institute). This dissertation presents four enrichment studies of SNOMED CT, utilizing both ML- and AbN-based techniques.
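    As a toy stand-in for the ML-based parent identification described above, candidate parents for a new concept can be ranked by a similarity score over concept names. A real system would use learned representations and the AbN structure rather than this Jaccard heuristic, and the concept names below are illustrative, not drawn from SNOMED CT:

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two concept names."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def rank_parent_candidates(new_concept, ontology_concepts, top_k=2):
    """Rank existing concepts as potential IS-A parents of a new concept,
    narrowing the candidates a human curator must review."""
    scored = [(jaccard(new_concept, c), c) for c in ontology_concepts]
    scored.sort(reverse=True)
    return [c for _, c in scored[:top_k]]

candidates = rank_parent_candidates(
    "acute bacterial pneumonia",
    ["bacterial pneumonia", "viral pneumonia", "fracture of femur"],
)
```

    Even a weak scorer like this illustrates the workflow: the system proposes a short ranked list of plausible parents, and the curator confirms or rejects the IS-A links.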

    Histopathology Image Analysis and NLP for Digital Pathology

    Information technologies based on ML with quantitative imaging and text are playing an essential role, particularly in general medicine and oncology. DL in particular has demonstrated significant breakthroughs in computer vision and NLP that could enhance disease detection and the establishment of efficient treatments. Furthermore, considering the large number of people with cancer and the substantial volume of data generated during cancer treatment, there is significant interest in the use of AI to improve oncologic care. In digital pathology, high-resolution microscope images of tissue samples are stored, along with written medical reports, in databases used by pathologists. The diagnosis is made through tissue analysis of the biopsy sample and is written as a brief unstructured report stored as free text in Electronic Medical Record (EMR) systems. For the transition towards digitization of medical records to achieve its maximum benefits, these reports must be accessible and usable so that medical practitioners can easily understand them and precisely identify the disease. Concerning histopathology images, which are the basis for the diagnosis and study of diseases of the tissues, image analysis helps identify a disease's location and classify the type of cancer. Recently, due to the abundant accumulation of whole-slide images (WSIs), there has been increased demand for effective and efficient gigapixel image analysis, such as computer-aided diagnosis using DL techniques. Also, due to the high diversity of shapes and structures in WSIs, conventional DL techniques cannot be applied directly to classification. And although computer-aided diagnosis using DL achieves good prediction accuracy, in the medical domain there is a need to explain a model's predictions beyond standard quantitative performance evaluation. This thesis presents three different findings.
    Firstly, I provide a comparative analysis of transformer models such as BioBERT, Clinical BioBERT, and BioMed-RoBERTa against a TF-IDF baseline, and the results demonstrate the effectiveness of various word embedding techniques for classifying pathology reports. Secondly, using the slide-level labels of WSIs, I classify them into their disease types with an architecture combining an attention mechanism and instance-level clustering. Finally, I introduce a method to fuse the features of pathology reports with the features of their respective images and investigate the effect of the combined features on classifying both histopathology images and their respective reports simultaneously. This proves better than the individual classification tasks, achieving an accuracy of 95.73%.
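    The fusion step in the third study, combining report features with image features before classification, can be sketched as concatenation followed by a linear scorer. The feature vectors and weights below are illustrative stand-ins, not the thesis's learned model, which uses attention-derived WSI features and transformer report embeddings:

```python
def fuse_and_score(image_feats, text_feats, weights, bias=0.0):
    """Concatenate per-modality feature vectors and apply a linear score."""
    fused = image_feats + text_feats  # list concatenation = feature fusion
    assert len(fused) == len(weights)
    return sum(f * w for f, w in zip(fused, weights)) + bias

image_feats = [0.2, 0.9]   # e.g., pooled WSI attention features (illustrative)
text_feats = [0.7, 0.1]    # e.g., pooled report embedding (illustrative)
weights = [1.0, -0.5, 0.8, 0.3]
score = fuse_and_score(image_feats, text_feats, weights)
```

    The design point is that the classifier sees both modalities at once, so complementary evidence in the report can compensate for ambiguity in the image and vice versa.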

    Transforming unstructured digital clinical notes for improved health literacy

    Purpose – Clinical notes typically contain medical jargon and specialized words and phrases that are complicated and overly technical for most people, which is one of the most challenging obstacles to health information dissemination from healthcare providers to consumers. The authors aim to investigate how to leverage machine learning techniques to transform clinical notes of interest into understandable expressions.
    Design/methodology/approach – The authors propose a natural language processing pipeline capable of extracting relevant information from long unstructured clinical notes and simplifying lexicons by replacing medical jargon and technical terms. In particular, the authors develop an unsupervised keyword-matching method to extract relevant information from clinical notes. To automatically evaluate the completeness of the extracted information, the authors perform a multi-label classification task on the relevant texts. To simplify lexicons in the relevant text, the authors identify complex words using a sequence labeler and leverage transformer models to generate candidate words for substitution. The authors validate the proposed pipeline using 58,167 discharge summaries from critical care services.
    Findings – The results show that the proposed pipeline can identify relevant information with high completeness and simplify complex expressions in clinical notes, so that the converted notes have a high level of readability but a low degree of meaning change.
    Social implications – The proposed pipeline can help healthcare consumers better understand their medical information and therefore strengthen communication between healthcare providers and consumers for better care.
    Originality/value – An innovative pipeline approach is developed to address the health literacy problem confronting healthcare providers and consumers in the ongoing digital transformation of the healthcare industry.
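    The lexical-simplification stage can be approximated by flagging complex words with a length/frequency heuristic and substituting from a consumer-health synonym table. The word list and substitutions below are illustrative stand-ins for the paper's sequence labeler and transformer-generated candidates:

```python
import re

COMMON = {"the", "a", "of", "and", "was", "is", "patient", "no", "with"}
# Illustrative consumer-health substitutions, not the paper's generated candidates.
SUBSTITUTIONS = {"dyspnea": "shortness of breath", "edema": "swelling"}

def is_complex(word):
    """Heuristic stand-in for the sequence labeler: longish and uncommon."""
    return len(word) >= 5 and word not in COMMON

def simplify(sentence):
    out = []
    for word in re.findall(r"[a-z]+", sentence.lower()):
        if is_complex(word) and word in SUBSTITUTIONS:
            out.append(SUBSTITUTIONS[word])  # replace jargon with a lay phrase
        else:
            out.append(word)                 # keep words already readable
    return " ".join(out)

result = simplify("Patient presents with dyspnea and no edema")
```

    The hard part the paper addresses, and this sketch does not, is generating substitutions in context and verifying that readability improves without changing the note's meaning.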

    Identifying Outcomes of Care from Medical Records to Improve Doctor-Patient Communication

    Between appointments, healthcare providers have limited interaction with their patients, yet patients share similar patterns of care: medications have common side effects, injuries have an expected healing time, and so on. By modeling patient interventions together with their outcomes, healthcare systems can equip providers with better feedback. In this work, we present a pipeline for analyzing medical records according to an ontology directed at allowing closed-loop feedback between medical encounters. Working with medical data from multiple domains, we use a combination of data processing, machine learning, and clinical expertise to extract knowledge from patient records. While our current focus is on technique, the ultimate goal of this research is to inform the development of a system that uses these models to provide knowledge-driven clinical decision-making.