1,750 research outputs found

    Learning to detect chest radiographs containing lung nodules using visual attention networks

    Get PDF
    Machine learning approaches hold great potential for the automated detection of lung nodules in chest radiographs, but training the algorithms requires vary large amounts of manually annotated images, which are difficult to obtain. Weak labels indicating whether a radiograph is likely to contain pulmonary nodules are typically easier to obtain at scale by parsing historical free-text radiological reports associated to the radiographs. Using a repositotory of over 700,000 chest radiographs, in this study we demonstrate that promising nodule detection performance can be achieved using weak labels through convolutional neural networks for radiograph classification. We propose two network architectures for the classification of images likely to contain pulmonary nodules using both weak labels and manually-delineated bounding boxes, when these are available. Annotated nodules are used at training time to deliver a visual attention mechanism informing the model about its localisation performance. The first architecture extracts saliency maps from high-level convolutional layers and compares the estimated position of a nodule against the ground truth, when this is available. A corresponding localisation error is then back-propagated along with the softmax classification error. The second approach consists of a recurrent attention model that learns to observe a short sequence of smaller image portions through reinforcement learning. When a nodule annotation is available at training time, the reward function is modified accordingly so that exploring portions of the radiographs away from a nodule incurs a larger penalty. Our empirical results demonstrate the potential advantages of these architectures in comparison to competing methodologies

    Generating semantically enriched diagnostics for radiological images using machine learning

    Get PDF
    Development of Computer Aided Diagnostic (CAD) tools to aid radiologists in pathology detection and decision making relies considerably on manually annotated images. With the advancement of deep learning techniques for CAD development, these expert annotations no longer need to be hand-crafted, however, deep learning algorithms require large amounts of data in order to generalise well. One way in which to access large volumes of expert-annotated data is through radiological exams consisting of images and reports. Using past radiological exams obtained from hospital archiving systems has many advantages: they are expert annotations available in large quantities, covering a population-representative variety of pathologies, and they provide additional context to pathology diagnoses, such as anatomical location and severity. Learning to auto-generate such reports from images presents many challenges such as the difficulty in representing and generating long, unstructured textual information, accounting for spelling errors and repetition or redundancy, and the inconsistency across different annotators. In this thesis, the problem of learning to automate disease detection from radiological exams is approached from three directions. Firstly, a report generation model is developed such that it is conditioned on radiological image features. Secondly, a number of approaches are explored aimed at extracting diagnostic information from free-text reports. Finally, an alternative approach to image latent space learning from current state-of-the-art is developed that can be applied to accelerated image acquisition.Open Acces

    Clinical text data in machine learning: Systematic review

    Get PDF
    Background: Clinical narratives represent the main form of communication within healthcare providing a personalized account of patient history and assessments, offering rich information for clinical decision making. Natural language processing (NLP) has repeatedly demonstrated its feasibility to unlock evidence buried in clinical narratives. Machine learning can facilitate rapid development of NLP tools by leveraging large amounts of text data. Objective: The main aim of this study is to provide systematic evidence on the properties of text data used to train machine learning approaches to clinical NLP. We also investigate the types of NLP tasks that have been supported by machine learning and how they can be applied in clinical practice. Methods: Our methodology was based on the guidelines for performing systematic reviews. In August 2018, we used PubMed, a multi-faceted interface, to perform a literature search against MEDLINE. We identified a total of 110 relevant studies and extracted information about the text data used to support machine learning, the NLP tasks supported and their clinical applications. The data properties considered included their size, provenance, collection methods, annotation and any relevant statistics. Results: The vast majority of datasets used to train machine learning models included only hundreds or thousands of documents. Only 10 studies used tens of thousands of documents with a handful of studies utilizing more. Relatively small datasets were utilized for training even when much larger datasets were available. The main reason for such poor data utilization is the annotation bottleneck faced by supervised machine learning algorithms. Active learning was explored to iteratively sample a subset of data for manual annotation as a strategy for minimizing the annotation effort while maximizing predictive performance of the model. Supervised learning was successfully used where clinical codes integrated with free text notes into electronic health records were utilized as class labels. Similarly, distant supervision was used to utilize an existing knowledge base to automatically annotate raw text. Where manual annotation was unavoidable, crowdsourcing was explored, but it remains unsuitable due to sensitive nature of data considered. Beside the small volume, training data were typically sourced from a small number of institutions, thus offering no hard evidence about the transferability of machine learning models. The vast majority of studies focused on the task of text classification. Most commonly, the classification results were used to support phenotyping, prognosis, care improvement, resource management and surveillance. Conclusions: We identified the data annotation bottleneck as one of the key obstacles to machine learning approaches in clinical NLP. Active learning and distant supervision were explored as a way of saving the annotation efforts. Future research in this field would benefit from alternatives such as data augmentation and transfer learning, or unsupervised learning, which does not require data annotation

    Annotation analysis for testing drug safety signals using unstructured clinical notes

    Get PDF
    BackgroundThe electronic surveillance for adverse drug events is largely based upon the analysis of coded data from reporting systems. Yet, the vast majority of electronic health data lies embedded within the free text of clinical notes and is not gathered into centralized repositories. With the increasing access to large volumes of electronic medical data-in particular the clinical notes-it may be possible to computationally encode and to test drug safety signals in an active manner.ResultsWe describe the application of simple annotation tools on clinical text and the mining of the resulting annotations to compute the risk of getting a myocardial infarction for patients with rheumatoid arthritis that take Vioxx. Our analysis clearly reveals elevated risks for myocardial infarction in rheumatoid arthritis patients taking Vioxx (odds ratio 2.06) before 2005.ConclusionsOur results show that it is possible to apply annotation analysis methods for testing hypotheses about drug safety using electronic medical records

    Recuperação de informação multimodal em repositórios de imagem médica

    Get PDF
    The proliferation of digital medical imaging modalities in hospitals and other diagnostic facilities has created huge repositories of valuable data, often not fully explored. Moreover, the past few years show a growing trend of data production. As such, studying new ways to index, process and retrieve medical images becomes an important subject to be addressed by the wider community of radiologists, scientists and engineers. Content-based image retrieval, which encompasses various methods, can exploit the visual information of a medical imaging archive, and is known to be beneficial to practitioners and researchers. However, the integration of the latest systems for medical image retrieval into clinical workflows is still rare, and their effectiveness still show room for improvement. This thesis proposes solutions and methods for multimodal information retrieval, in the context of medical imaging repositories. The major contributions are a search engine for medical imaging studies supporting multimodal queries in an extensible archive; a framework for automated labeling of medical images for content discovery; and an assessment and proposal of feature learning techniques for concept detection from medical images, exhibiting greater potential than feature extraction algorithms that were pertinently used in similar tasks. These contributions, each in their own dimension, seek to narrow the scientific and technical gap towards the development and adoption of novel multimodal medical image retrieval systems, to ultimately become part of the workflows of medical practitioners, teachers, and researchers in healthcare.A proliferação de modalidades de imagem médica digital, em hospitais, clínicas e outros centros de diagnóstico, levou à criação de enormes repositórios de dados, frequentemente não explorados na sua totalidade. Além disso, os últimos anos revelam, claramente, uma tendência para o crescimento da produção de dados. Portanto, torna-se importante estudar novas maneiras de indexar, processar e recuperar imagens médicas, por parte da comunidade alargada de radiologistas, cientistas e engenheiros. A recuperação de imagens baseada em conteúdo, que envolve uma grande variedade de métodos, permite a exploração da informação visual num arquivo de imagem médica, o que traz benefícios para os médicos e investigadores. Contudo, a integração destas soluções nos fluxos de trabalho é ainda rara e a eficácia dos mais recentes sistemas de recuperação de imagem médica pode ser melhorada. A presente tese propõe soluções e métodos para recuperação de informação multimodal, no contexto de repositórios de imagem médica. As contribuições principais são as seguintes: um motor de pesquisa para estudos de imagem médica com suporte a pesquisas multimodais num arquivo extensível; uma estrutura para a anotação automática de imagens; e uma avaliação e proposta de técnicas de representation learning para deteção automática de conceitos em imagens médicas, exibindo maior potencial do que as técnicas de extração de features visuais outrora pertinentes em tarefas semelhantes. Estas contribuições procuram reduzir as dificuldades técnicas e científicas para o desenvolvimento e adoção de sistemas modernos de recuperação de imagem médica multimodal, de modo a que estes façam finalmente parte das ferramentas típicas dos profissionais, professores e investigadores da área da saúde.Programa Doutoral em Informátic

    Text Mining to Support Knowledge Discovery from Electronic Health Records

    Get PDF
    The use of electronic health records (EHRs) has grown rapidly in the last decade. The EHRs are no longer being used only for storing information for clinical purposes but the secondary use of the data in the healthcare research has increased rapidly as well. The data in EHRs are recorded in a structured manner as much as possible, however, many EHRs often also contain large amount of unstructured free‐text. The structured and unstructured clinical data presents several challenges to the researchers since the data are not primarily collected for research purposes. The issues related to structured data can be missing data, noise, and inconsistency. The unstructured free-text is even more challenging to use since they often have no fixed format and may vary from clinician to clinician and from database to database. Text and data mining techniques are increasingly being used to effectively and efficiently process large EHRs for research purposes. Most of the me

    Added benefits of computer-assisted analysis of Hematoxylin-Eosin stained breast histopathological digital slides

    Get PDF
    This thesis aims at determining if computer-assisted analysis can be used to better understand pathologists’ perception of mitotic figures on Hematoxylin-Eosin (HE) stained breast histopathological digital slides. It also explores the feasibility of reproducible histologic nuclear atypia scoring by incorporating computer-assisted analysis to cytological scores given by a pathologist. In addition, this thesis investigates the possibility of computer-assisted diagnosis for categorizing HE breast images into different subtypes of cancer or benign masses. In the first study, a data set of 453 mitoses and 265 miscounted non-mitoses within breast cancer digital slides were considered. Different features were extracted from the objects in different channels of eight colour spaces. The findings from the first research study suggested that computer-aided image analysis can provide a better understanding of image-related features related to discrepancies among pathologists in recognition of mitoses. Two tasks done routinely by the pathologists are making diagnosis and grading the breast cancer. In the second study, a new tool for reproducible nuclear atypia scoring in breast cancer histological images was proposed. The third study proposed and tested MuDeRN (MUlti-category classification of breast histopathological image using DEep Residual Networks), which is a framework for classifying hematoxylin-eosin stained breast digital slides either as benign or cancer, and then categorizing cancer and benign cases into four different subtypes each. The studies indicated that computer-assisted analysis can aid in both nuclear grading (COMPASS) and breast cancer diagnosis (MuDeRN). The results could be used to improve current status of breast cancer prognosis estimation through reducing the inter-pathologist disagreement in counting mitotic figures and reproducible nuclear grading. It can also improve providing a second opinion to the pathologist for making a diagnosis

    Clinical Big Data and Deep Learning: Applications, Challenges, and Future Outlooks

    Get PDF
    The explosion of digital healthcare data has led to a surge of data-driven medical research based on machine learning. In recent years, as a powerful technique for big data, deep learning has gained a central position in machine learning circles for its great advantages in feature representation and pattern recognition. This article presents a comprehensive overview of studies that employ deep learning methods to deal with clinical data. Firstly, based on the analysis of the characteristics of clinical data, various types of clinical data (e.g., medical images, clinical notes, lab results, vital signs and demographic informatics) are discussed and details provided of some public clinical datasets. Secondly, a brief review of common deep learning models and their characteristics is conducted. Then, considering the wide range of clinical research and the diversity of data types, several deep learning applications for clinical data are illustrated: auxiliary diagnosis, prognosis, early warning, and other tasks. Although there are challenges involved in applying deep learning techniques to clinical data, it is still worthwhile to look forward to a promising future for deep learning applications in clinical big data in the direction of precision medicine
    corecore