
    Identification of Patients with Family History of Pancreatic Cancer - Investigation of an NLP System Portability

    In this study we developed a rule-based natural language processing (NLP) system to identify patients with a family history of pancreatic cancer. The algorithm was developed in an Unstructured Information Management Architecture (UIMA) framework and consisted of section segmentation, relation discovery, and negation detection. The system was evaluated on data from two institutions. Family history identification precision was consistent across institutions: 88.9% on the Indiana University (IU) dataset and 87.8% on the Mayo Clinic dataset. Customizing the algorithm on the Mayo Clinic data increased its precision to 88.1%. Family member relation discovery achieved precision, recall, and F-measure of 75.3%, 91.6%, and 82.6%, respectively. Negation detection achieved a precision of 99.1%. The results show that rule-based NLP approaches for specific information extraction tasks are portable across institutions; however, customizing the algorithm on the new dataset improves its performance.
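    The paper's UIMA pipeline is not reproduced here, but the negation-detection step it describes can be illustrated with a minimal NegEx-style rule: a concept mention is negated if a trigger term precedes it in the same sentence. The trigger list and scope rule below are illustrative assumptions, not the study's actual component.

```python
import re

# Minimal NegEx-style negation detector (illustrative sketch; trigger
# terms and the "same sentence, trigger before concept" scope rule are
# assumptions, not the paper's actual UIMA component).
NEGATION_TRIGGERS = ["no", "denies", "without", "negative for", "no history of"]

def is_negated(sentence: str, concept: str) -> bool:
    """Return True if `concept` appears after a negation trigger
    within the given sentence."""
    s = sentence.lower()
    idx = s.find(concept.lower())
    if idx == -1:
        return False
    prefix = s[:idx]  # text preceding the concept mention
    return any(re.search(r"\b" + re.escape(t) + r"\b", prefix)
               for t in NEGATION_TRIGGERS)

print(is_negated("Patient denies family history of pancreatic cancer.",
                 "family history of pancreatic cancer"))  # True
print(is_negated("Mother had pancreatic cancer.",
                 "pancreatic cancer"))  # False
```

    Real systems such as NegEx additionally bound the trigger's scope (e.g., a window of tokens) and handle pseudo-negation phrases, which this sketch omits.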

    Extracting information from the text of electronic medical records to improve case detection: a systematic review

    Background: Electronic medical records (EMRs) are revolutionizing health-related research. One key issue for study quality is the accurate identification of patients with the condition of interest. Information in EMRs can be entered as structured codes or unstructured free text. The majority of research studies have used only the coded parts of EMRs for case detection, which may bias findings, miss cases, and reduce study quality. This review examines whether incorporating information from text into case-detection algorithms can improve research quality. Methods: A systematic search returned 9659 papers, 67 of which reported on the extraction of information from the free text of EMRs with the stated purpose of detecting cases of a named clinical condition. Methods for extracting information from text and the technical accuracy of case-detection algorithms were reviewed. Results: Studies mainly used US hospital-based EMRs, and extracted information from text for 41 conditions using keyword searches, rule-based algorithms, and machine learning methods. There was no clear difference in case-detection algorithm accuracy between rule-based and machine learning methods of extraction. Inclusion of information from text resulted in a significant improvement in algorithm sensitivity and area under the receiver operating characteristic in comparison to codes alone (median sensitivity 78% (codes + text) vs 62% (codes), P = .03; median area under the receiver operating characteristic 95% (codes + text) vs 88% (codes), P = .025). Conclusions: Text in EMRs is accessible, especially with open-source information extraction algorithms, and significantly improves case detection when combined with codes. More harmonization of reporting within EMR studies is needed, particularly standardized reporting of algorithm accuracy metrics like positive predictive value (precision) and sensitivity (recall).
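    The review's central comparison, detection from structured codes alone versus codes combined with free text, can be sketched as follows. The record layout, ICD-10 prefix, and keywords are hypothetical examples, not drawn from any reviewed study.

```python
# Illustrative sketch of the codes-alone vs codes-plus-text comparison.
# The record structure, the "I21" ICD-10 prefix (acute myocardial
# infarction), and the keyword list are hypothetical examples.
def detect_case_codes(record: dict) -> bool:
    """Case detection from structured diagnosis codes only."""
    return any(c.startswith("I21") for c in record.get("icd10", []))

def detect_case_codes_plus_text(record: dict) -> bool:
    """Case detection from codes OR a keyword hit in free-text notes."""
    keywords = ("myocardial infarction", "nstemi", "stemi")
    text_hit = any(k in record.get("notes", "").lower() for k in keywords)
    return detect_case_codes(record) or text_hit

record = {"icd10": [], "notes": "Impression: NSTEMI, troponin elevated."}
print(detect_case_codes(record))            # False: missed by codes alone
print(detect_case_codes_plus_text(record))  # True: recovered from text
```

    The example shows the sensitivity gain the review quantifies: a case documented only in the narrative note is missed by the code-based rule but caught once text is included.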

    Data Analysis Methods for Software Systems

    Using statistics, econometrics, machine learning, and functional data analysis methods, we evaluate the consequences of the lockdown during the COVID-19 pandemic for wage inequality and unemployment. We find that these two indicators reacted mostly to the first lockdown, from March to June 2020. Analysing wage inequality, we also conduct separate analyses for males and females and for different age groups. We noticed that young females were the most affected by the lockdown. Nevertheless, all groups reacted to the lockdown to some degree.

    ANALYSIS OF INFORMATION VALUE CHAINS FOR GOUT SELF-CARE MANAGEMENT

    This value chain analysis study sought to identify the information gout patients need to successfully manage their disease, leading to a model for information extraction from patient health records. A scoping review was conducted to identify the types of information needed by gout patients. The findings of each included study were divided and analyzed according to the stages of the care delivery value chain. The results of the review were then used to create a gout information value chain, which served as criteria for annotating the information deemed important for gout patients in publicly available patient education materials according to the stages of care delivery. The resulting annotations were used to develop a named entity recognition model capable of automatically labelling medical concepts from clinical notes by value chain stage. To identify concepts specifically relevant to gout patients, the concepts extracted from patient notes were used as candidate features in a phenotyping algorithm to identify concepts associated with gout flares. While this study was able to develop a model for identifying information relevant to gout flares, the findings suggest that information valuable to gout patients for self-management is missing from both patient education materials and their clinical notes.
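    The study trained a named entity recognition model to label concepts by value-chain stage; a much cruder dictionary lookup conveys the idea of stage-wise labelling. The stage names and terms below are illustrative assumptions, not the study's annotation scheme.

```python
# Hypothetical sketch of labelling note spans by care-delivery value-chain
# stage using a dictionary lookup. The study used a trained NER model;
# the stages and terms here are illustrative assumptions only.
STAGE_LEXICON = {
    "monitoring":  ["serum urate", "uric acid level"],
    "intervening": ["allopurinol", "colchicine"],
}

def tag_by_stage(note: str) -> dict:
    """Return, for each stage, the lexicon terms found in the note."""
    note_l = note.lower()
    return {stage: [t for t in terms if t in note_l]
            for stage, terms in STAGE_LEXICON.items()}

print(tag_by_stage("Started allopurinol; recheck serum urate in 6 weeks."))
```

    A trained NER model replaces the fixed lexicon with learned span detection, which generalizes to surface forms the dictionary does not list.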

    Better Decision Making in Cancer: Screening tests and prediction models


    MEDICAL MACHINE INTELLIGENCE: DATA-EFFICIENCY AND KNOWLEDGE-AWARENESS

    Traditional clinical diagnosis requires extensive manual effort from experienced doctors, which is time-consuming and costly. Computer-aided systems have therefore been proposed to reduce doctors’ effort by using machines to automatically generate diagnosis and treatment recommendations. The recent success of deep learning has largely advanced the field of computer-aided diagnosis by offering an avenue to deliver automated medical image analysis. Despite such progress, several challenges remain for medical machine intelligence, such as unsatisfactory performance on challenging small targets, insufficient training data, high annotation cost, and the lack of domain-specific knowledge. These challenges motivate the development of data-efficient and knowledge-aware deep learning techniques that can generalize to different medical tasks without requiring intensive manual labeling effort, and that incorporate domain-specific knowledge into the learning process. In this thesis, we rethink the current progress of deep learning in medical image analysis, with a focus on the aforementioned challenges, and present data-efficient and knowledge-aware deep learning approaches to address them. Firstly, we introduce coarse-to-fine mechanisms which use the prediction from the first (coarse) stage to shrink the input region for the second (fine) stage, to enhance model performance especially for segmenting small, challenging structures such as the pancreas, which occupies only a very small fraction (e.g., < 0.5%) of the entire CT volume. The method achieved the state-of-the-art result on the NIH pancreas segmentation dataset. Further extensions also demonstrated effectiveness for segmenting neoplasms such as pancreatic cysts, and for multi-organ segmentation. Secondly, we present a semi-supervised learning framework for medical image segmentation that leverages both limited labeled data and abundant unlabeled data. Our learning method encourages the segmentation output to be consistent for the same input under different viewing conditions. More importantly, the outputs from different viewing directions are fused together to improve the quality of the target, which further enhances overall performance. Comparison with fully-supervised methods on multi-organ segmentation confirms the effectiveness of this method. Thirdly, we discuss how to incorporate knowledge priors for multi-organ segmentation. Noticing that abdominal organ sizes exhibit similar distributions across different cohorts, we propose to explicitly incorporate anatomical priors on abdominal organ sizes, guiding the training process with domain-specific knowledge. The approach achieves 84.97% on the MICCAI 2015 challenge “Multi-Atlas Labeling Beyond the Cranial Vault”, significantly outperforming the previous state of the art even while using fewer annotations. Lastly, by rethinking how radiologists interpret medical images, we identify a limitation of existing deep-learning-based work on detecting pancreatic ductal adenocarcinoma: the lack of knowledge integration from multi-phase images. We therefore introduce a dual-path network in which the paths are connected for multi-phase information exchange, with an additional loss for removing view divergence. By effectively incorporating multi-phase information, the presented method outperforms prior art on this task.
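    The coarse-to-fine mechanism described above can be sketched in a few lines: the coarse-stage mask determines a tight bounding box (plus a safety margin) around the small target, and only that sub-volume is passed to the fine stage. The function and margin value below are illustrative stand-ins, not the thesis's actual networks or hyperparameters.

```python
import numpy as np

# Sketch of the coarse-to-fine idea: crop the CT volume to a bounding box
# around the coarse-stage mask (plus a margin), so the fine stage sees a
# region dominated by the small target rather than the full volume.
# `margin=10` is an assumed illustrative value.
def crop_by_coarse_mask(volume, coarse_mask, margin=10):
    zs, ys, xs = np.where(coarse_mask > 0)
    z0, z1 = max(zs.min() - margin, 0), min(zs.max() + margin + 1, volume.shape[0])
    y0, y1 = max(ys.min() - margin, 0), min(ys.max() + margin + 1, volume.shape[1])
    x0, x1 = max(xs.min() - margin, 0), min(xs.max() + margin + 1, volume.shape[2])
    # Return the crop and its origin so fine-stage output can be pasted back.
    return volume[z0:z1, y0:y1, x0:x1], (z0, y0, x0)

vol = np.zeros((100, 100, 100))
mask = np.zeros_like(vol)
mask[40:45, 50:55, 60:65] = 1  # tiny "organ", well under 0.5% of the volume
crop, origin = crop_by_coarse_mask(vol, mask)
print(crop.shape, origin)  # (25, 25, 25) (30, 40, 50)
```

    Shrinking the input this way raises the foreground fraction seen by the fine model by orders of magnitude, which is the mechanism behind the improved small-structure segmentation.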

    Ontology Enrichment from Free-text Clinical Documents: A Comparison of Alternative Approaches

    While the biomedical informatics community widely acknowledges the utility of domain ontologies, many barriers to their effective use remain. One important requirement of domain ontologies is that they achieve a high degree of coverage of the domain concepts and concept relationships. However, the development of these ontologies is typically a manual, time-consuming, and often error-prone process. Limited resources result in missing concepts and relationships, as well as difficulty in updating the ontology as domain knowledge changes. Methodologies developed in the fields of Natural Language Processing (NLP), Information Extraction (IE), Information Retrieval (IR), and Machine Learning (ML) provide techniques for automating the enrichment of ontologies from free-text documents. In this dissertation, I extended these methodologies to biomedical ontology development. First, I reviewed existing methodologies and systems developed in the fields of NLP, IR, and IE, and discussed how existing methods can benefit the development of biomedical ontologies. This review, the first of its kind, was published in the Journal of Biomedical Informatics. Second, I compared the effectiveness of three methods from two different approaches, the symbolic (the Hearst method) and the statistical (the Church and Lin methods), using clinical free-text documents. Third, I developed a methodological framework for Ontology Learning (OL) evaluation and comparison. This framework permits evaluation of the two types of OL approaches, covering the three OL methods. The significance of this work is as follows: 1) The results from the comparative study showed the potential of these methods for biomedical ontology enrichment. For the two targeted domains (NCIT and RadLex), the Hearst method yielded average new-concept acceptance rates of 21% and 11%, respectively. The Lin method produced a 74% acceptance rate for NCIT; the Church method, 53%. As a result of this study (published in Methods of Information in Medicine), many suggested candidates have been incorporated into the NCIT. 2) The evaluation framework is flexible and general enough to analyze the performance of ontology enrichment methods in many domains, thus expediting the process of automation and minimizing the likelihood that key concepts and relationships are missed as domain knowledge evolves.
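    The symbolic approach compared in the study, the Hearst method, extracts is-a candidates from free text via lexico-syntactic patterns. The sketch below implements one classic Hearst template ("NP such as NP, NP and NP"); the full method uses several such patterns plus proper noun-phrase chunking, which a regular expression only approximates.

```python
import re

# One classic Hearst pattern: "<hypernym> such as <hyponym>(, <hyponym>)*
# (and <hyponym>)?". This is a toy approximation of the symbolic approach;
# real implementations use noun-phrase chunking and multiple templates.
PATTERN = re.compile(r"([\w ]+?) such as ([\w ,]+)")

def hearst_candidates(text: str):
    """Extract (hyponym, hypernym) is-a candidate pairs from text."""
    pairs = []
    for m in PATTERN.finditer(text):
        hypernym = m.group(1).strip()
        for hyponym in re.split(r",\s*|\s+and\s+", m.group(2)):
            pairs.append((hyponym.strip(), hypernym))
    return pairs

print(hearst_candidates("imaging modalities such as CT, MRI and ultrasound"))
# each listed term becomes a candidate hyponym of "imaging modalities"
```

    Candidate pairs produced this way are exactly what the study's acceptance rates measure: the fraction of extracted candidates that domain experts accepted into the target ontology.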