393 research outputs found

    Clinical text data in machine learning: Systematic review

    Background: Clinical narratives represent the main form of communication within healthcare, providing a personalized account of patient history and assessments and offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. Objective: The main aim of this study is to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigate the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multi-faceted interface, to perform a literature search against MEDLINE. We identified a total of 110 relevant studies and extracted information about the text data used to support machine learning, the NLP tasks supported and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation and any relevant statistics. Results: The vast majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents, with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing the predictive performance of the model. Supervised learning was successfully used where clinical codes, integrated with free-text notes in electronic health records, were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable due to the sensitive nature of the data considered. Besides their small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The vast majority of studies focused on the task of text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management and surveillance. Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as ways of reducing the annotation effort. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which does not require data annotation.
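
The active learning idea mentioned in this abstract (iteratively choosing which documents to annotate) can be illustrated with a minimal sketch. The abstract does not say which sampling strategy the reviewed studies used; the example assumes uncertainty sampling for a binary classifier, and the function name `select_for_annotation` and the probability values are hypothetical.

```python
def select_for_annotation(probabilities, batch_size):
    """Pick the documents the model is least sure about.

    `probabilities` holds a (hypothetical) classifier's predicted
    positive-class probability for each unlabeled document. Documents
    whose probability is closest to 0.5 are ranked first.
    """
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: abs(probabilities[i] - 0.5),
    )
    return ranked[:batch_size]


# Illustrative predictions for five unlabeled clinical notes (made-up values).
probs = [0.97, 0.52, 0.10, 0.45, 0.80]
picked = select_for_annotation(probs, batch_size=2)
print(picked)  # indices of the two most uncertain notes
```

In a full loop, the selected notes would be annotated manually, added to the training set, and the model retrained before the next round of selection.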

    TEMPORAL DATA EXTRACTION AND QUERY SYSTEM FOR EPILEPSY SIGNAL ANALYSIS

    The 2016 Epilepsy Innovation Institute (Ei2) community survey reported that unpredictability is the most challenging aspect of seizure management. Effective and precise detection, prediction, and localization of epileptic seizures is a fundamental computational challenge. Utilizing epilepsy data from multiple epilepsy monitoring units can enhance the quantity and diversity of datasets, which can lead to more robust epilepsy data analysis tools. The contributions of this dissertation are two-fold. One is the implementation of a temporal query for epilepsy data; the other is the machine learning approach for seizure detection, seizure prediction, and seizure localization. The three key components of our temporal query interface are: 1) a pipeline for automatically extracting European Data Format (EDF) information and epilepsy annotation data from cross-site sources; 2) data quantity monitoring for epilepsy temporal data; 3) a web-based annotation query interface for preliminary research and building customized epilepsy datasets. The system extracted and stored about 450,000 epilepsy-related events of more than 2,497 subjects from seven institutes up to September 2019. Leveraging the epilepsy temporal events query system, we developed machine learning models for seizure detection, prediction, and localization. Using 135 extracted features from EEG signals, we trained a channel-based eXtreme Gradient Boosting model to detect seizures on 8-second EEG segments. A long-term EEG recording evaluation shows that the model can detect about 90.34% of seizures on an existing EEG dataset with 961 hours of data. The model achieved 89.88% accuracy, 92.32% sensitivity, and 84.76% AUC based on the segments evaluation. We also introduced a transfer learning approach, consisting of 1) a base deep learning model pre-trained on the ImageNet dataset and 2) customized fully connected layers, trained on patient-specific pre-ictal and inter-ictal data from our database. Two convolutional neural network architectures were evaluated using 53 pre-ictal segments and 265 continuous hours of inter-ictal EEG data. The evaluation shows that our model reached 86.79% sensitivity and a 3.38% false-positive rate. Another transfer learning model for seizure localization uses a pre-trained ResNeXt50 structure and was trained with an image augmentation dataset labeled by fingerprint. Our model achieved 88.22% accuracy, 34.99% sensitivity, a 1.02% false-positive rate, and a positive likelihood ratio of 34.3.
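
The last figure in this abstract is consistent with the standard definition of the positive likelihood ratio, LR+ = sensitivity / false-positive rate, which is a ratio and not a percentage. The short check below uses only the sensitivity and false-positive rate reported for the localization model; the variable names are illustrative.

```python
# Reported localization results, expressed as fractions.
sensitivity = 0.3499          # 34.99% sensitivity
false_positive_rate = 0.0102  # 1.02% false-positive rate

# Positive likelihood ratio: how much more likely a positive result is
# for a true positive than for a false positive.
positive_likelihood_ratio = sensitivity / false_positive_rate

print(round(positive_likelihood_ratio, 1))
```

The result rounds to 34.3, matching the value given in the abstract.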

    Towards a Personalized Multi-Domain Digital Neurophenotyping Model for the Detection and Treatment of Mood Trajectories

    The commercial availability of many real-life smart sensors, wearables, and mobile apps provides a valuable source of information about a wide range of human behavioral, physiological, and social markers that can be used to infer the user’s mental state and mood. However, there are currently no commercial digital products that integrate these psychosocial metrics with the real-time measurement of neural activity. In particular, electroencephalography (EEG) is a well-validated and highly sensitive neuroimaging method that yields robust markers of mood and affective processing, and has been widely used in mental health research for decades. The integration of wearable neuro-sensors into existing multimodal sensor arrays could hold great promise for deep digital neurophenotyping in the detection and personalized treatment of mood disorders. In this paper, we propose a multi-domain digital neurophenotyping model based on the socioecological model of health. The proposed model presents a holistic approach to digital mental health, leveraging recent neuroscientific advances, and could deliver highly personalized diagnoses and treatments. The technological and ethical challenges of this model are discussed.

    Natural language processing (NLP) for clinical information extraction and healthcare research

    Introduction: Epilepsy is a common disease with multiple comorbidities. Routinely collected health care data have been successfully used in epilepsy research, but they lack the level of detail needed for in-depth study of complex interactions between the aetiology, comorbidities, and treatment that affect patient outcomes. The aim of this work is to use natural language processing (NLP) technology to create detailed disease-specific datasets derived from the free text of clinic letters in order to enrich the information that is already available. Method: An NLP pipeline for the extraction of epilepsy clinical text (ExECT) was redeveloped to extract a wider range of variables. A gold standard annotation set for epilepsy clinic letters was created for the validation of the ExECT v2 output. A set of clinic letters from the Epi25 study was processed and the datasets produced were validated against Swansea Neurology Biobank records. A data linkage study investigating genetic influences on epilepsy outcomes using GP and hospital records was supplemented with the seizure frequency dataset produced by ExECT v2. Results: The validation of ExECT v2 produced overall precision, recall, and F1 score of 0.90, 0.86, and 0.88, respectively. A method of uploading, annotating, and linking genetic variant datasets within the SAIL databank was established. No significant differences in the genetic burden of rare and potentially damaging variants were observed between individuals with and without unscheduled admissions, or between individuals on monotherapy and those on polytherapy. No significant difference was observed in the genetic burden between people who were seizure free for over a year and those who experienced at least one seizure a year. Conclusion: This work presents successful extraction of epilepsy clinical information and explores how this information can be used in epilepsy research. The approach taken in the development of ExECT v2, and the research linking the NLP outputs, routinely collected health care data, and genetics, pave the way for wider research.
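
The F1 score reported in this abstract follows from the stated precision and recall, since F1 is their harmonic mean. A quick check, using only the two values given above (variable names are illustrative):

```python
# Overall validation figures reported for ExECT v2.
precision = 0.90
recall = 0.86

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))
```

Rounded to two decimal places, this reproduces the reported F1 score of 0.88.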

    Artificial Intelligence: Development and Applications in Neurosurgery

    The last decade has witnessed a significant increase in the relevance of artificial intelligence (AI) in neuroscience. Gaining attention for its potential to revolutionize medical decision making, data analytics, and clinical workflows, AI is poised to be increasingly implemented into neurosurgical practice. However, certain considerations pose significant challenges to its immediate and widespread implementation. Hence, this chapter will explore current developments in AI as it pertains to the field of clinical neuroscience, with a primary focus on neurosurgery. Additionally included is a brief discussion of important economic and ethical considerations related to the feasibility and implementation of AI-based technologies in neurosciences, including future horizons such as the operational integrations of human and non-human capabilities.