
    Selecting information in electronic health records for knowledge acquisition

    Knowledge acquisition of relations between biomedical entities is critical for many automated biomedical applications, including pharmacovigilance and decision support. Automated acquisition of statistical associations from biomedical and clinical documents has shown some promise. However, acquisition of clinically meaningful relations (i.e. specific associations) remains challenging because textual information is noisy and co-occurrence does not typically determine specific relations. In this work, we focus on the acquisition of two types of relations from clinical reports, disease-manifestation-related symptom (MRS) and drug-adverse drug event (ADE), and explore the use of filtering by sections of the reports to improve performance. Evaluation indicated that applying the filters improved recall (disease-MRS: from 0.85 to 0.90; drug-ADE: from 0.43 to 0.75) and precision (disease-MRS: from 0.82 to 0.92; drug-ADE: from 0.16 to 0.31). This preliminary study demonstrates that selecting information in narrative electronic reports based on report sections improves the detection of disease-MRS and drug-ADE relations. Further investigation of complementary methods, such as more sophisticated statistical methods, more complex temporal models, and the use of information from other knowledge sources, is needed.
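    The section-filtering idea lends itself to a brief illustration. Below is a minimal sketch, assuming entity mentions have already been recognised upstream: co-occurrence pairs are counted only within report sections deemed informative for a given relation type. The section names, the SECTION_FILTERS mapping, and count_cooccurrences are illustrative assumptions, not the study's actual configuration.

```python
from collections import Counter

# Hypothetical mapping from relation type to the report sections considered
# informative for it; the sections actually used in the study are not listed here.
SECTION_FILTERS = {
    "disease-MRS": {"history of present illness", "review of systems"},
    "drug-ADE": {"hospital course", "medications on admission"},
}

def count_cooccurrences(reports, section_filter=None):
    """Count entity-pair co-occurrences, optionally restricted to selected sections.

    Each report is a dict mapping section name -> list of (entity1, entity2) pairs
    co-occurring in that section (entity recognition is assumed to happen upstream).
    """
    counts = Counter()
    for report in reports:
        for section, pairs in report.items():
            if section_filter is not None and section not in section_filter:
                continue  # drop sections judged unlikely to carry the relation
            counts.update(pairs)
    return counts

# Toy usage: compare unfiltered vs. section-filtered association counts.
reports = [{
    "history of present illness": [("pneumonia", "productive cough")],
    "family history": [("pneumonia", "diabetes")],  # spurious pair outside target sections
}]
print(count_cooccurrences(reports))
print(count_cooccurrences(reports, SECTION_FILTERS["disease-MRS"]))
```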

    Doctor of Philosophy

    The primary objective of cancer registries is to capture clinical care data on cancer populations and to aid prevention, enable early detection, determine prognosis, and assess the quality of various treatments and interventions. Furthermore, the role of cancer registries is paramount in supporting cancer epidemiological studies and medical research. Existing cancer registries depend mostly on humans, known as Cancer Tumor Registrars (CTRs), to conduct manual abstraction of the electronic health records to find reportable cancer cases and extract other data elements required for regulatory reporting. This is often a time-consuming and laborious task prone to human error, affecting the quality, completeness, and timeliness of cancer registries. Central state cancer registries take responsibility for consolidating data received from multiple sources for each cancer case and for assigning the most accurate information. The Utah Cancer Registry (UCR) at the University of Utah, for instance, leads and oversees more than 70 cancer treatment facilities in the state of Utah, collecting data for each diagnosed cancer case and consolidating multiple sources of information. Although software tools that help with the manual abstraction process exist, they mainly focus on cancer case finding based on pathology reports and do not support automatic extraction of other data elements such as TNM cancer stage information, an important prognostic factor required before initiating clinical treatment. In this study, I present novel applications of natural language processing (NLP) and machine learning (ML) to automatically extract clinical and pathological TNM stage information from the unconsolidated clinical records of cancer patients available at the central Utah Cancer Registry. To further support CTRs in their manual efforts, I demonstrate a new machine learning approach to consolidate TNM stages from multiple records at the patient level.
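    As an illustration of the kind of approach described, the sketch below treats TNM stage extraction as text classification and consolidates per-record predictions into a patient-level stage by majority vote. The toy training data, the bag-of-words features, the logistic regression model, and the consolidation rule are assumptions made for illustration; they are not the features, models, or label set used in the dissertation.

```python
from collections import Counter
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: record text -> clinical T stage label (illustrative only).
texts = [
    "tumor 1.5 cm confined to the breast, no skin involvement",
    "tumor 4.2 cm with extension to the chest wall",
    "no evidence of primary tumor on imaging",
]
labels = ["cT1", "cT4", "cT0"]

# Bag-of-words stage classifier: TF-IDF features feeding logistic regression.
stage_clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                          LogisticRegression(max_iter=1000))
stage_clf.fit(texts, labels)

def consolidate_patient_stage(record_texts):
    """Predict a stage per record, then consolidate by majority vote (one simple rule)."""
    predictions = stage_clf.predict(record_texts)
    return Counter(predictions).most_common(1)[0][0]

print(consolidate_patient_stage(["mass measuring 1.2 cm, limited to the organ of origin"]))
```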

    A cascade of classifiers for extracting medication information from discharge summaries

    Background: Extracting medication information from clinical records has many potential applications, and recently published research, systems, and competitions reflect an interest therein. Much of the early extraction work involved rules and lexicons, but more recently machine learning has been applied to the task.
    Methods: We present a hybrid system consisting of two parts. The first part, field detection, uses a cascade of statistical classifiers to identify medication-related named entities. The second part uses simple heuristics to link those entities into medication events.
    Results: The system achieved performance that is comparable to other approaches to the same task. This performance is further improved by adding features that reference external medication name lists.
    Conclusions: This study demonstrates that our hybrid approach outperforms purely statistical or rule-based systems. The study also shows that a cascade of classifiers works better than a single classifier in extracting medication information. The system is available as is upon request from the first author.
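    A minimal sketch of the two-part idea follows, assuming placeholder stand-ins for both classifier stages: stage one decides whether a token is medication-related at all, stage two assigns a field type, and a simple proximity heuristic then links detected fields into medication events. The lexicon, the field labels, and the max_gap threshold are illustrative assumptions, not the system's actual components.

```python
# Stage 1 stand-in: is this token medication-related at all? (placeholder lexicon)
def stage_one_is_medication_related(token):
    return token.lower() in {"lisinopril", "10", "mg", "daily"}

# Stage 2 stand-in: assign a specific field type to a medication-related token.
def stage_two_field_type(token):
    if token.isalpha() and token.lower() not in {"mg", "daily"}:
        return "DRUG"
    if token.isdigit():
        return "DOSE"
    if token.lower() == "mg":
        return "UNIT"
    return "FREQUENCY"

def extract_medication_events(tokens, max_gap=2):
    """Run the cascade, then link fields within `max_gap` tokens of each other into events."""
    fields = []
    for i, token in enumerate(tokens):
        if stage_one_is_medication_related(token):               # cascade stage 1
            fields.append((i, stage_two_field_type(token), token))  # cascade stage 2
    events, current = [], []
    for i, label, token in fields:
        if current and i - current[-1][0] > max_gap:
            events.append(current)   # gap too large: start a new medication event
            current = []
        current.append((i, label, token))
    if current:
        events.append(current)
    return events

print(extract_medication_events("Start lisinopril 10 mg daily for hypertension".split()))
```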

    Building blocks for meta-synthesis: data integration tables for summarising, mapping, and synthesising evidence on interventions for communicating with health consumers

    Background: Systematic reviews have developed into a powerful method for summarising and synthesising evidence. The rise in systematic reviews creates a methodological opportunity and associated challenges, and this is seen in the development of overviews, or reviews of systematic reviews. One of these challenges is how to summarise evidence from systematic reviews of complex interventions for inclusion in an overview. Interventions for communicating with and involving consumers in their care are frequently complex. In this article we outline a method for preparing data integration tables to enable review-level synthesis of the evidence on interventions for communication and participation in health.
    Methods and Results: Systematic reviews published by the Cochrane Consumers and Communication Review Group were utilised as the basis from which to develop linked steps for data extraction, evidence assessment and synthesis. The resulting output is called a data integration table. Four steps were undertaken in designing the data integration tables: first, relevant information for a comprehensive picture of the characteristics of the review was identified from each review, extracted and summarised. Second, results for the outcomes of the review were assessed and translated to standardised evidence statements. Third, outcomes and evidence statements were mapped into an outcome taxonomy that we developed, using language specific to the field of interventions for communication and participation. Fourth, the implications of the review were assessed after the mapping step clarified the level of evidence available for each intervention.
    Conclusion: The data integration tables represent building blocks for constructing overviews of review-level evidence and for the conduct of meta-synthesis. Individually, each table aims to improve the consistency of reporting on the features and effects of interventions for communication and participation; provides a broad assessment of the strength of evidence derived from different methods of analysis; indicates a degree of certainty with results; and reports outcomes and gaps in the evidence in a consistent and coherent way. In addition, individual tables can serve as a valuable tool for accurate dissemination of large amounts of complex information on communication and participation to professionals as well as to members of the public.
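    To make the four-step output concrete, the sketch below models one row of a data integration table as a structured record. The field names and example values are illustrative assumptions rather than the Review Group's actual template.

```python
from dataclasses import dataclass

@dataclass
class DataIntegrationRow:
    review_title: str            # step 1: review characteristics extracted and summarised
    intervention: str
    outcome: str                 # step 3: outcome mapped into the outcome taxonomy
    outcome_category: str
    evidence_statement: str      # step 2: standardised evidence statement
    implications: str = ""       # step 4: implications, given the level of evidence

# Hypothetical example row (values invented for illustration).
row = DataIntegrationRow(
    review_title="Interventions for communicating risk to consumers",
    intervention="Decision aid",
    outcome="Knowledge of treatment options",
    outcome_category="Consumer knowledge and understanding",
    evidence_statement="Sufficient evidence of benefit",
    implications="Consider use in practice; monitor outcomes.",
)
print(row)
```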

    Automatic Population of Structured Reports from Narrative Pathology Reports

    Structured pathology reports offer a number of advantages: they help ensure the accuracy and completeness of pathology reporting, and they make it easier for referring doctors to glean pertinent information. The goal of this thesis is to extract pertinent information from free-text pathology reports, automatically populate structured reports for cancers, and identify the commonalities and differences in processing principles needed to obtain maximum accuracy. Three pathology corpora were annotated with entities and relationships between the entities in this study, namely the melanoma corpus, the colorectal cancer corpus and the lymphoma corpus. A supervised machine-learning-based approach, utilising conditional random field learners, was developed to recognise medical entities from the corpora. Through feature engineering, the best feature configurations were attained, which boosted the F-scores significantly, by 4.2% to 6.8% on the training sets. Without proper negation and uncertainty detection, the quality of the structured reports would be diminished. Negation and uncertainty detection modules were built to handle this problem, obtaining overall F-scores ranging from 76.6% to 91.0% on the test sets. A relation extraction system was presented to extract four relations from the lymphoma corpus. The system achieved very good performance on the training set, with a 100% F-score obtained by the rule-based module and a 97.2% F-score attained by the support vector machine classifier. Rule-based approaches were used to generate the structured outputs and populate predefined templates; this rule-based system attained over 97% F-scores on the training sets. A pipeline system was implemented as an assembly of all the components described above. It achieved promising results in the end-to-end evaluations, with 86.5%, 84.2% and 78.9% F-scores on the melanoma, colorectal cancer and lymphoma test sets respectively.
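    The end-to-end assembly can be pictured as a sequence of stages, as in the minimal sketch below. Every stage is a placeholder stand-in (a lexicon lookup instead of the CRF recogniser, a single cue phrase instead of the negation and uncertainty modules, and toy rules for relation extraction and template population); none of it reproduces the thesis's actual models or templates.

```python
def recognise_entities(text):
    """Stand-in entity recogniser: looks up a tiny fixed lexicon of (term, label) pairs."""
    lexicon = [("melanoma", "DIAGNOSIS"), ("2.1 mm", "THICKNESS")]
    return [(term, label) for term, label in lexicon if term in text]

def detect_negation_uncertainty(text, entities):
    """Stand-in negation module: flags an entity as negated if preceded by one cue phrase."""
    return [(span, label, "negated" if f"no evidence of {span}" in text else "affirmed")
            for span, label in entities]

def extract_relations(entities):
    """Stand-in relation extractor: links every non-diagnosis entity to each diagnosis."""
    diagnoses = [span for span, label, _ in entities if label == "DIAGNOSIS"]
    return [(d, span) for d in diagnoses
            for span, label, _ in entities if label != "DIAGNOSIS"]

def populate_template(entities, relations):
    """Rule-based population of a (hypothetical) predefined structured-report template."""
    report = {"diagnosis": None, "findings": [], "relations": relations}
    for span, label, polarity in entities:
        if polarity != "affirmed":
            continue  # negated or uncertain findings are kept out of the structured report
        if label == "DIAGNOSIS":
            report["diagnosis"] = span
        else:
            report["findings"].append({"type": label, "value": span})
    return report

# Run the assembled pipeline on a toy sentence.
text = "Sections show melanoma with a Breslow thickness of 2.1 mm."
entities = detect_negation_uncertainty(text, recognise_entities(text))
print(populate_template(entities, extract_relations(entities)))
```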