19,978 research outputs found

    Clinical Text Mining: Secondary Use of Electronic Patient Records

    This open access book describes the results of natural language processing and machine learning methods applied to clinical text from electronic patient records. It is divided into twelve chapters. Chapters 1-4 discuss the history and background of the original paper-based patient records, their purpose, and how they are written and structured. These initial chapters do not require any technical or medical background knowledge. The remaining eight chapters are more technical in nature and describe various medical classifications and terminologies such as ICD diagnosis codes, SNOMED CT, MeSH, UMLS, and ATC. Chapters 5-10 cover basic tools for natural language processing and information retrieval, and how to apply them to clinical text. The differences between rule-based and machine learning-based methods, and between supervised and unsupervised machine learning methods, are also explained. Next, ethical concerns regarding the use of sensitive patient records for research purposes are discussed, including methods for de-identifying electronic patient records and safely storing patient records. The book’s closing chapters present a number of applications in clinical text mining and summarise the lessons learned from the previous chapters. The book provides a comprehensive overview of technical issues arising in clinical text mining, and offers a valuable guide for advanced students in health informatics, computational linguistics, and information retrieval, and for researchers entering these fields.

    Learning signals of adverse drug-drug interactions from the unstructured text of electronic health records.

    Drug-drug interactions (DDIs) account for 30% of all adverse drug reactions, which are the fourth leading cause of death in the US. Current methods for post-marketing surveillance primarily use spontaneous reporting systems for learning DDI signals and validate their signals using the structured portions of Electronic Health Records (EHRs). We demonstrate a fast, annotation-based approach, which uses standard odds ratios for identifying signals of DDIs directly from the textual portion of EHRs and which, to our knowledge, is the first effort of its kind. We developed a gold standard of 1,120 DDIs spanning 14 adverse events and 1,164 drugs. Our evaluations on this gold standard using millions of clinical notes from the Stanford Hospital confirm that identifying DDI signals from clinical text is feasible (AUROC=81.5%). We conclude that the text in EHRs contains valuable information for learning DDI signals and has enormous utility in drug surveillance and clinical decision support.
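
    The signal score described above is a standard odds ratio computed from a 2x2 contingency table of note counts. A minimal sketch of that calculation, in Python, is shown below; the counts, the continuity correction, and the confidence-interval step are illustrative assumptions rather than the authors' exact procedure.

    import math

    def odds_ratio(a, b, c, d, correction=0.5):
        """2x2 contingency table over clinical notes:
           a = notes mentioning the drug pair AND the adverse event
           b = notes mentioning the drug pair, without the adverse event
           c = notes without the drug pair but WITH the adverse event
           d = notes with neither.
           A small continuity correction avoids division by zero."""
        a, b, c, d = (x + correction for x in (a, b, c, d))
        ratio = (a * d) / (b * c)
        # 95% confidence interval on the log-odds scale
        se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
        lower = math.exp(math.log(ratio) - 1.96 * se)
        upper = math.exp(math.log(ratio) + 1.96 * se)
        return ratio, (lower, upper)

    # Hypothetical counts for one candidate drug pair and one adverse event
    print(odds_ratio(a=40, b=960, c=1200, d=97800))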

    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only the coded parts of EMRs for case detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from the free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic curve in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic curve 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open-source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics such as positive predictive value (precision) and sensitivity (recall).
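
    As a toy illustration of the codes-versus-codes-plus-text comparison reviewed above, the sketch below flags a case either from structured diagnosis codes or from a keyword match in free-text notes; the condition, code lists, and keywords are hypothetical and are not drawn from any reviewed study.

    import re

    # Example ICD-10 / ICD-9 codes and note keywords (hypothetical)
    HEART_FAILURE_CODES = {"I50.9", "428.0"}
    HEART_FAILURE_KEYWORDS = re.compile(
        r"\b(heart failure|chf|reduced ejection fraction)\b", re.IGNORECASE
    )

    def detect_case(coded_diagnoses, note_text, use_text=True):
        by_code = bool(HEART_FAILURE_CODES & set(coded_diagnoses))
        by_text = use_text and bool(HEART_FAILURE_KEYWORDS.search(note_text))
        return by_code or by_text

    # A patient missed by codes alone but captured once text is included
    note = "Impression: chronic CHF with reduced ejection fraction."
    print(detect_case([], note, use_text=False))  # False
    print(detect_case([], note, use_text=True))   # True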

    Deepr: A Convolutional Net for Medical Records

    Feature engineering remains a major bottleneck when creating predictive systems from electronic medical records. At present, an important missing element is detecting predictive regular clinical motifs from irregular episodic records. We present Deepr (short for Deep record), a new end-to-end deep learning system that learns to extract features from medical records and predicts future risk automatically. Deepr transforms a record into a sequence of discrete elements separated by coded time gaps and hospital transfers. On top of the sequence is a convolutional neural net that detects and combines predictive local clinical motifs to stratify the risk. Deepr permits transparent inspection and visualization of its inner workings. We validate Deepr on hospital data to predict unplanned readmission after discharge. Deepr achieves superior accuracy compared to traditional techniques, detects meaningful clinical motifs, and uncovers the underlying structure of the disease and intervention space.
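
    A rough sketch of a Deepr-style model is given below, assuming codes and discretized time-gap tokens share one vocabulary, a single convolutional layer detects local motifs, and max-pooling over the record feeds a risk output; the hyperparameters are illustrative and not the published configuration.

    import torch
    import torch.nn as nn

    class DeeprLike(nn.Module):
        def __init__(self, vocab_size, embed_dim=64, n_filters=100, kernel_size=3):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size, padding=1)
            self.out = nn.Linear(n_filters, 1)

        def forward(self, token_ids):                     # (batch, seq_len) of code/gap tokens
            x = self.embed(token_ids)                     # (batch, seq_len, embed_dim)
            x = torch.relu(self.conv(x.transpose(1, 2)))  # motif detectors over the sequence
            x = x.max(dim=2).values                       # max-pool over the whole record
            return torch.sigmoid(self.out(x))             # predicted readmission risk

    model = DeeprLike(vocab_size=5000)
    dummy_records = torch.randint(1, 5000, (2, 120))      # two padded example records
    print(model(dummy_records).shape)                     # torch.Size([2, 1])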

    Big data and data repurposing – using existing data to answer new questions in vascular dementia research

    Introduction: Traditional approaches to clinical research have, as yet, failed to provide effective treatments for vascular dementia (VaD). Novel approaches to the collation and synthesis of data may allow for time- and cost-efficient hypothesis generation and testing. These approaches may have particular utility in helping us understand and treat a complex condition such as VaD. Methods: We present an overview of new uses for existing data to progress VaD research. The overview is the result of consultation with various stakeholders, a focused literature review, and learning from the group’s experience of successful approaches to data repurposing. In particular, we benefitted from the expert discussion and input of delegates at the 9th International Congress on Vascular Dementia (Ljubljana, 16-18 October 2015). Results: We agreed on key areas that could be of relevance to VaD research: systematic review of existing studies; individual patient-level analyses of existing trials and cohorts; and linking electronic health record data to other datasets. We illustrated each theme with a case study of an existing project that has utilised this approach. Conclusions: There are many opportunities for the VaD research community to make better use of existing data. The volume of potentially available data is increasing, and the opportunities for using these resources to progress the VaD research agenda are exciting. Of course, these approaches come with inherent limitations and biases: bigger datasets are not necessarily better datasets, and maintaining rigour and critical analysis will be key to optimising data use.

    Named Entity Recognition in Electronic Health Records Using Transfer Learning Bootstrapped Neural Networks

    Neural networks (NNs) have become the state of the art in many machine learning applications, especially in image and sound processing [1]. The same, although to a lesser extent [2,3], could be said of natural language processing (NLP) tasks such as named entity recognition. However, the success of NNs remains dependent on the availability of large labelled datasets, which is a significant hurdle in many important applications. One such case is electronic health records (EHRs), which are arguably the largest source of medical data, most of which lies hidden in natural text [4,5]. Data access is difficult due to data privacy concerns, and therefore annotated datasets are scarce. With scarce data, NNs will likely not be able to extract this hidden information with practical accuracy. In our study, we develop an approach that solves these problems for named entity recognition, obtaining a 94.6 F1 score in the I2B2 2009 Medical Extraction Challenge [6], 4.3 points above the architecture that won the competition. Beyond the official I2B2 challenge, we further achieve an 82.4 F1 score on extracting relationships between medical terms. To reach this state-of-the-art accuracy, our approach applies transfer learning to leverage datasets annotated for other I2B2 tasks, and designs and trains embeddings that specially benefit from such transfer.
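
    A minimal sketch of the transfer-learning idea follows, assuming a shared token-embedding plus BiLSTM encoder whose weights are pretrained on a label-rich source task and copied into a tagger for the target task; the layers and dimensions are illustrative and not the authors' architecture.

    import torch.nn as nn

    class TokenTagger(nn.Module):
        def __init__(self, vocab_size, n_tags, embed_dim=100, hidden=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
            self.encoder = nn.LSTM(embed_dim, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_tags)     # task-specific output layer

        def forward(self, token_ids):
            hidden_states, _ = self.encoder(self.embed(token_ids))
            return self.head(hidden_states)               # per-token tag logits

    source_model = TokenTagger(vocab_size=30000, n_tags=9)
    # ... train source_model on the label-rich source task here ...

    target_model = TokenTagger(vocab_size=30000, n_tags=5)
    # Transfer: copy the embeddings and encoder, then fine-tune on the scarce target data.
    target_model.embed.load_state_dict(source_model.embed.state_dict())
    target_model.encoder.load_state_dict(source_model.encoder.state_dict())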

    Annotating patient clinical records with syntactic chunks and named entities: the Harvey corpus

    The free text notes typed by physicians during patient consultations contain valuable information for the study of disease and treatment. These notes are difficult to process with existing natural language analysis tools, since they are highly telegraphic (omitting many words) and contain many spelling mistakes, inconsistencies in punctuation, and non-standard word order. To support information extraction and classification tasks over such text, we describe a de-identified corpus of free text notes, a shallow syntactic and named entity annotation scheme for this kind of text, and an approach to training domain specialists with no linguistic background to annotate the text. Finally, we present a statistical chunking system for such clinical text with a stable learning rate and good accuracy, indicating that the manual annotation is consistent and that the annotation scheme is tractable for machine learning.
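
    As a small illustration of the kind of BIO-style shallow chunking the corpus supports, the sketch below trains a per-token statistical classifier over simple hand-crafted features; the tags, features, and toy training sentence are hypothetical and far simpler than the Harvey corpus annotation scheme and the chunker described above.

    from sklearn.feature_extraction import DictVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    def token_features(tokens, i):
        """Simple contextual features for the token at position i."""
        return {
            "word": tokens[i].lower(),
            "is_upper": tokens[i].isupper(),
            "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
            "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        }

    # Tiny hypothetical training example in BIO chunk tags
    sentence = ["pt", "c/o", "chest", "pain", "x", "2", "days"]
    tags = ["B-NP", "B-VP", "B-NP", "I-NP", "B-PP", "B-NP", "I-NP"]

    features = [token_features(sentence, i) for i in range(len(sentence))]
    chunker = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
    chunker.fit(features, tags)

    test = ["chest", "pain"]
    print(chunker.predict([token_features(test, i) for i in range(len(test))]))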