112 research outputs found

    Clinical Data Reuse or Secondary Use: Current Status and Potential Future Progress

    Get PDF
    Objective: To perform a review of recent research in clinical data reuse or secondary use, and envision future advances in this field. Methods: The review is based on a large literature search in MEDLINE (through PubMed), conference proceedings, and the ACM Digital Library, focusing only on research published between 2005 and early 2016. Each selected publication was reviewed by the authors, and a structured analysis and summarization of its content was developed. Results: The initial search produced 359 publications, reduced after a manual examination of abstracts and full publications. The following aspects of clinical data reuse are discussed: motivations and challenges, privacy and ethical concerns, data integration and interoperability, data models and terminologies, unstructured data reuse, structured data mining, clinical practice and research integration, and examples of clinical data reuse (quality measurement and learning healthcare systems). Conclusion: Reuse of clinical data is a fast-growing field recognized as essential to realize the potentials for high quality healthcare, improved healthcare management, reduced healthcare costs, population health management, and effective clinical research

    Clinical text data in machine learning: Systematic review

    Get PDF
    Background: Clinical narratives represent the main form of communication within healthcare providing a personalized account of patient history and assessments, offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. Objective: The main aim of this study is to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigate the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multi-faceted interface, to perform a literature search against MEDLINE. We identified a total of 110 relevant studies and extracted information about the text data used to support machine learning, the NLP tasks supported and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation and any relevant statistics. Results: The vast majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable due to sensitive nature of data considered. Beside the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The vast majority of studies focused on the task of text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management and surveillance. Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which does not require data annotation

    Artificial intelligence (AI) in rare diseases: is the future brighter?

    Get PDF
    The amount of data collected and managed in (bio)medicine is ever-increasing. Thus, there is a need to rapidly and efficiently collect, analyze, and characterize all this information. Artificial intelligence (AI), with an emphasis on deep learning, holds great promise in this area and is already being successfully applied to basic research, diagnosis, drug discovery, and clinical trials. Rare diseases (RDs), which are severely underrepresented in basic and clinical research, can particularly benefit from AI technologies. Of the more than 7000 RDs described worldwide, only 5% have a treatment. The ability of AI technologies to integrate and analyze data from different sources (e.g., multi-omics, patient registries, and so on) can be used to overcome RDs' challenges (e.g., low diagnostic rates, reduced number of patients, geographical dispersion, and so on). Ultimately, RDs' AI-mediated knowledge could significantly boost therapy development. Presently, there are AI approaches being used in RDs and this review aims to collect and summarize these advances. A section dedicated to congenital disorders of glycosylation (CDG), a particular group of orphan RDs that can serve as a potential study model for other common diseases and RDs, has also been included.info:eu-repo/semantics/publishedVersio

    A review of automatic phenotyping approaches using electronic health records

    Get PDF
    Electronic Health Records (EHR) are a rich repository of valuable clinical information that exist in primary and secondary care databases. In order to utilize EHRs for medical observational research a range of algorithms for automatically identifying individuals with a specific phenotype have been developed. This review summarizes and offers a critical evaluation of the literature relating to studies conducted into the development of EHR phenotyping systems. This review describes phenotyping systems and techniques based on structured and unstructured EHR data. Articles published on PubMed and Google scholar between 2013 and 2017 have been reviewed, using search terms derived from Medical Subject Headings (MeSH). The popularity of using Natural Language Processing (NLP) techniques in extracting features from narrative text has increased. This increased attention is due to the availability of open source NLP algorithms, combined with accuracy improvement. In this review, Concept extraction is the most popular NLP technique since it has been used by more than 50% of the reviewed papers to extract features from EHR. High-throughput phenotyping systems using unsupervised machine learning techniques have gained more popularity due to their ability to efficiently and automatically extract a phenotype with minimal human effort

    The Impact of Automatic Pre-annotation in Clinical Note Data Element Extraction - the CLEAN Tool

    Full text link
    Objective. Annotation is expensive but essential for clinical note review and clinical natural language processing (cNLP). However, the extent to which computer-generated pre-annotation is beneficial to human annotation is still an open question. Our study introduces CLEAN (CLinical note rEview and ANnotation), a pre-annotation-based cNLP annotation system to improve clinical note annotation of data elements, and comprehensively compares CLEAN with the widely-used annotation system Brat Rapid Annotation Tool (BRAT). Materials and Methods. CLEAN includes an ensemble pipeline (CLEAN-EP) with a newly developed annotation tool (CLEAN-AT). A domain expert and a novice user/annotator participated in a comparative usability test by tagging 87 data elements related to Congestive Heart Failure (CHF) and Kawasaki Disease (KD) cohorts in 84 public notes. Results. CLEAN achieved higher note-level F1-score (0.896) over BRAT (0.820), with significant difference in correctness (P-value < 0.001), and the mostly related factor being system/software (P-value < 0.001). No significant difference (P-value 0.188) in annotation time was observed between CLEAN (7.262 minutes/note) and BRAT (8.286 minutes/note). The difference was mostly associated with note length (P-value < 0.001) and system/software (P-value 0.013). The expert reported CLEAN to be useful/satisfactory, while the novice reported slight improvements. Discussion. CLEAN improves the correctness of annotation and increases usefulness/satisfaction with the same level of efficiency. Limitations include untested impact of pre-annotation correctness rate, small sample size, small user size, and restrictedly validated gold standard. Conclusion. CLEAN with pre-annotation can be beneficial for an expert to deal with complex annotation tasks involving numerous and diverse target data elements

    Enhance Representation Learning of Clinical Narrative with Neural Networks for Clinical Predictive Modeling

    Get PDF
    Medicine is undergoing a technological revolution. Understanding human health from clinical data has major challenges from technical and practical perspectives, thus prompting methods that understand large, complex, and noisy data. These methods are particularly necessary for natural language data from clinical narratives/notes, which contain some of the richest information on a patient. Meanwhile, deep neural networks have achieved superior performance in a wide variety of natural language processing (NLP) tasks because of their capacity to encode meaningful but abstract representations and learn the entire task end-to-end. In this thesis, I investigate representation learning of clinical narratives with deep neural networks through a number of tasks ranging from clinical concept extraction, clinical note modeling, and patient-level language representation. I present methods utilizing representation learning with neural networks to support understanding of clinical text documents. I first introduce the notion of representation learning from natural language processing and patient data modeling. Then, I investigate word-level representation learning to improve clinical concept extraction from clinical notes. I present two works on learning word representations and evaluate them to extract important concepts from clinical notes. The first study focuses on cancer-related information, and the second study evaluates shared-task data. The aims of these two studies are to automatically extract important entities from clinical notes. Next, I present a series of deep neural networks to encode hierarchical, longitudinal, and contextual information for modeling a series of clinical notes. I also evaluate the models by predicting clinical outcomes of interest, including mortality, length of stay, and phenotype predictions. Finally, I propose a novel representation learning architecture to develop a generalized and transferable language representation at the patient level. I also identify pre-training tasks appropriate for constructing a generalizable language representation. The main focus is to improve predictive performance of phenotypes with limited data, a challenging task due to a lack of data. Overall, this dissertation addresses issues in natural language processing for medicine, including clinical text classification and modeling. These studies show major barriers to understanding large-scale clinical notes. It is believed that developing deep representation learning methods for distilling enormous amounts of heterogeneous data into patient-level language representations will improve evidence-based clinical understanding. The approach to solving these issues by learning representations could be used across clinical applications despite noisy data. I conclude that considering different linguistic components in natural language and sequential information between clinical events is important. Such results have implications beyond the immediate context of predictions and further suggest future directions for clinical machine learning research to improve clinical outcomes. This could be a starting point for future phenotyping methods based on natural language processing that construct patient-level language representations to improve clinical predictions. While significant progress has been made, many open questions remain, so I will highlight a few works to demonstrate promising directions

    ‘Big data’ in mental health research:current status and emerging possibilities

    Get PDF
    PURPOSE: ‘Big data’ are accumulating in a multitude of domains and offer novel opportunities for research. The role of these resources in mental health investigations remains relatively unexplored, although a number of datasets are in use and supporting a range of projects. We sought to review big data resources and their use in mental health research to characterise applications to date and consider directions for innovation in future. METHODS: A narrative review. RESULTS: Clear disparities were evident in geographic regions covered and in the disorders and interventions receiving most attention. DISCUSSION: We discuss the strengths and weaknesses of the use of different types of data and the challenges of big data in general. Current research output from big data is still predominantly determined by the information and resources available and there is a need to reverse the situation so that big data platforms are more driven by the needs of clinical services and service users

    From Big Data to Precision Medicine.

    Get PDF
    For over a decade the term "Big data" has been used to describe the rapid increase in volume, variety and velocity of information available, not just in medical research but in almost every aspect of our lives. As scientists, we now have the capacity to rapidly generate, store and analyse data that, only a few years ago, would have taken many years to compile. However, "Big data" no longer means what it once did. The term has expanded and now refers not to just large data volume, but to our increasing ability to analyse and interpret those data. Tautologies such as "data analytics" and "data science" have emerged to describe approaches to the volume of available information as it grows ever larger. New methods dedicated to improving data collection, storage, cleaning, processing and interpretation continue to be developed, although not always by, or for, medical researchers. Exploiting new tools to extract meaning from large volume information has the potential to drive real change in clinical practice, from personalized therapy and intelligent drug design to population screening and electronic health record mining. As ever, where new technology promises "Big Advances," significant challenges remain. Here we discuss both the opportunities and challenges posed to biomedical research by our increasing ability to tackle large datasets. Important challenges include the need for standardization of data content, format, and clinical definitions, a heightened need for collaborative networks with sharing of both data and expertise and, perhaps most importantly, a need to reconsider how and when analytic methodology is taught to medical researchers. We also set "Big data" analytics in context: recent advances may appear to promise a revolution, sweeping away conventional approaches to medical science. However, their real promise lies in their synergy with, not replacement of, classical hypothesis-driven methods. The generation of novel, data-driven hypotheses based on interpretable models will always require stringent validation and experimental testing. Thus, hypothesis-generating research founded on large datasets adds to, rather than replaces, traditional hypothesis driven science. Each can benefit from the other and it is through using both that we can improve clinical practice.Wellcome Trus
    • …
    corecore