10 research outputs found

    SORTA: a system for ontology-based re-coding and technical annotation of biomedical phenotype data

    There is an urgent need to standardize the semantics of biomedical data values, such as phenotypes, to enable comparative and integrative analyses. However, it is unlikely that all studies will use the same data collection protocols. As a result, retrospective standardization is often required, which involves matching original (unstructured or locally coded) data to widely used coding or ontology systems such as SNOMED CT (clinical terms), ICD-10 (International Classification of Disease) and HPO (Human Phenotype Ontology). This data curation is usually time-consuming and performed by a human expert. To help mechanize this process, we have developed SORTA, a computer-aided system for rapidly encoding free text or locally coded values to a formal coding system or ontology. SORTA matches original data values (uploaded in semicolon-delimited format) to a target coding system (uploaded as an Excel spreadsheet, or in OWL (Web Ontology Language) or OBO (Open Biomedical Ontologies) format). It then semi-automatically shortlists candidate codes for each data value using Lucene and n-gram based matching algorithms, and can also learn from matches chosen by human experts. We evaluated SORTA's applicability in two use cases. For the LifeLines biobank, we used SORTA to recode 90 000 free text values (including 5211 unique values) about physical exercise to MET (Metabolic Equivalent of Task) codes. For the CINEAS clinical symptom coding system, we used SORTA to map to HPO, enriching HPO when necessary (315 terms matched so far). Out of the shortlists at rank 1, we found a precision/recall of 0.97/0.98 in LifeLines and of 0.58/0.45 in CINEAS. More importantly, users found the tool both a major time saver and a quality improvement because SORTA reduced the chances of human mistakes. Thus, SORTA can dramatically ease data (re)coding tasks and we believe it will prove useful for many more projects.
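The n-gram shortlisting idea mentioned in this abstract can be illustrated with a minimal sketch. It is not SORTA's implementation: the abstract does not specify the n-gram size, the similarity measure, or how Lucene scores are combined, so the character-bigram Dice coefficient, the function names, and the example codes below are all assumptions chosen for illustration.

```python
def bigrams(text: str) -> set[str]:
    """Return the set of character bigrams of a lower-cased, padded string."""
    padded = f"^{text.lower().strip()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient over character bigrams, between 0 and 1."""
    grams_a, grams_b = bigrams(a), bigrams(b)
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def shortlist(value: str, codes: dict[str, str], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank candidate codes by similarity between the value and each code label."""
    scored = [(code, dice_similarity(value, label)) for code, label in codes.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical target codes, not taken from any real coding system.
example_codes = {"C1": "running", "C2": "swimming", "C3": "cycling"}
ranked = shortlist("runing", example_codes)  # misspelled free-text value
```

In this sketch the misspelled value still ranks the intended label first, which is the kind of tolerance to spelling variation that makes n-gram matching attractive for free-text data; a human expert would then confirm or reject the top candidate.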

    Reusability of coded data in the primary care electronic medical record: A dynamic cohort study concerning cancer diagnoses

    Objectives: To assess quality and reusability of coded cancer diagnoses in routine primary care data. To identify factors that influence data quality and areas for improvement. Methods: A dynamic cohort study in a Dutch network database containing 250,000 anonymized electronic medical records (EMRs) from 52 general practices was performed. Coded data from 2000 to 2011 for the three most common cancer types (breast, colon and prostate cancer) were compared to the Netherlands Cancer Registry. Measurements: Data quality is expressed in Standard Incidence Ratios (SIRs): the ratio between the number of coded cases observed in the primary care network database and the expected number of cases based on the Netherlands Cancer Registry. Ratios were multiplied by 100% for readability. Results: The overall SIR was 91.5% (95% CI 88.5 to 94.5) and showed improvement over the years. SIRs differ between cancer types: from 71.5% for colon cancer in males to 103.9% for breast cancer. There are differences in data quality (SIRs 76.2% - 99.7%) depending on the EMR system used, with SIRs up to 232.9% for breast cancer. Frequently observed errors in routine healthcare data can be classified as: lack of integrity checks, inaccurate use and/or lack of codes, and lack of EMR system functionality. Conclusions: Re-users of coded routine primary care Electronic Medical Record data should be aware that 30% of cancer cases can be missed. Up to 130% of cancer cases found in the EMR data can be false-positive. The type of EMR system and the type of cancer influence the quality of coded diagnosis registry. While data quality can be improved (e.g. through improving system design and by training EMR system users), re-use should only be taken care of by appropriately trained experts. (C) 2016 Published by Elsevier Ireland Ltd.
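The SIR definition given under Measurements (observed coded cases divided by expected cases, multiplied by 100%) can be written out as a short sketch. The function name and the counts are hypothetical and are not taken from the study; they were picked only so that the arithmetic is easy to follow.

```python
def standard_incidence_ratio(observed: int, expected: float) -> float:
    """Observed coded cases as a percentage of the expected number of cases."""
    if expected <= 0:
        raise ValueError("expected number of cases must be positive")
    return 100.0 * observed / expected


# Hypothetical counts: 183 coded cases where the registry would predict 200.
sir = standard_incidence_ratio(observed=183, expected=200.0)
```

Read this way, a value below 100% suggests that coded cases are missing from the primary care data, while a value above 100% suggests that more cases are coded than the registry would predict.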

    Predicting COVID-19 symptoms from free text in medical records using Artificial Intelligence: feasibility study

    BACKGROUND: Electronic medical records have opened opportunities to analyze clinical practice at large scale. Structured registries and coding procedures such as the International Classification of Primary Care further improved these procedures. However, a large part of the information about the state of patient and the doctors’ observations is still entered in free text fields. The main function of those fields is to report the doctor’s line of thought, to remind oneself and his or her colleagues on follow-up actions, and to be accountable for clinical decisions. These fields contain rich information that can be complementary to that in coded fields, and until now, they have been hardly used for analysis. OBJECTIVE: This study aims to develop a prediction model to convert the free text information on COVID-19–related symptoms from out of hours care electronic medical records into usable symptom-based data that can be analyzed at large scale. METHODS: The design was a feasibility study in which we examined the content of the raw data, steps and methods for modelling, as well as the precision and accuracy of the models. A data prediction model for 27 preidentified COVID-19–relevant symptoms was developed for a data set derived from the database of primary-care out-of-hours consultations in Flanders. A multiclass, multilabel categorization classifier was developed. We tested two approaches, which were (1) a classical machine learning–based text categorization approach, Binary Relevance, and (2) a deep neural network learning approach with BERTje, including a domain-adapted version. Ethical approval was acquired through the Institutional Review Board of the Institute of Tropical Medicine and the ethics committee of the University Hospital of Antwerpen (ref 20/50/693). RESULTS: The sample set comprised 3957 fields. After cleaning, 2313 could be used for the experiments. Of the 2313 fields, 85% (n=1966) were used to train the model, and 15% (n=347) for testing. 
The normal BERTje model performed the best on the data. It reached a weighted F1 score of 0.70 and an exact match ratio or accuracy score of 0.38, indicating the instances for which the model has identified all correct codes. The other models achieved respectable results as well, ranging from 0.59 to 0.70 weighted F1. The Binary Relevance method performed the best on the data without a frequency threshold. As for the individual codes, the domain-adapted version of BERTje performs better on several of the less common objective codes, while BERTje reaches higher F1 scores for the least common labels especially, and for most other codes in general. CONCLUSIONS: The artificial intelligence model BERTje can reliably predict COVID-19–related information from medical records using text mining from the free text fields generated in primary care settings. This feasibility study invites researchers to examine further possibilities to use primary care routine data
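The two headline metrics in this abstract, weighted F1 and exact match ratio, measure different things in a multilabel setting, which is why a model can score 0.70 on one and 0.38 on the other. The following is a minimal pure-Python sketch of both metrics under their usual definitions; it is not the study's evaluation code, and the symptom labels are hypothetical placeholders.

```python
def exact_match_ratio(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Share of records whose predicted label set equals the true label set."""
    matches = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return matches / len(y_true)


def weighted_f1(y_true: list[set[str]], y_pred: list[set[str]]) -> float:
    """Per-label F1 averaged with weights equal to each label's true frequency."""
    labels = sorted({label for row in y_true for label in row})
    weighted_sum, total_support = 0.0, 0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if label in t and label in p)
        fp = sum(1 for t, p in zip(y_true, y_pred) if label not in t and label in p)
        fn = sum(1 for t, p in zip(y_true, y_pred) if label in t and label not in p)
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        support = tp + fn  # number of records that truly carry this label
        weighted_sum += f1 * support
        total_support += support
    return weighted_sum / total_support if total_support else 0.0


# Hypothetical labels for three free-text fields.
truth = [{"fever", "cough"}, {"cough"}, {"fatigue"}]
predicted = [{"fever", "cough"}, {"fever"}, {"fatigue"}]
emr = exact_match_ratio(truth, predicted)
f1 = weighted_f1(truth, predicted)
```

Exact match only credits a record when every label is right, so a single wrong label on an otherwise correct record lowers it, whereas weighted F1 gives partial credit label by label.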

    Do GPs know their patients with cancer? Assessing the quality of cancer registration in Dutch primary care: a cross-sectional validation study

    OBJECTIVES: To assess the quality of cancer registry in primary care. DESIGN AND SETTING: A cross-sectional validation study using linked data from primary care electronic health records (EHRs) and the Netherlands Cancer Registry (NCR). POPULATION: 290 000 patients, registered with 120 general practitioners (GPs), from 50 practice centres in the Utrecht area, the Netherlands, in January 2013. INTERVENTION: Linking the EHRs of all patients in the Julius General Practitioners' Network database at an individual patient level to the full NCR (∼1.7 million tumours between 1989 and 2011), to determine the proportion of matching cancer diagnoses. Full-text EHR extraction and manual analysis for non-matching diagnoses. MAIN OUTCOME MEASURES: Proportions of matching and non-matching breast, lung, colorectal and prostate cancer diagnoses between 2007 and 2011, stratified by age category, cancer type and EHR system. Differences in year of diagnosis between the EHR and the NCR. Reasons for non-matching diagnoses. RESULTS: In the Primary Care EHR, 60.6% of cancer cases were registered and coded in accordance with the NCR. Of the EHR diagnoses, 48.9% were potentially false positive (not registered in the NCR). Results differed between EHR systems but not between age categories or cancer types. The year of diagnosis corresponded in 80.6% of matching coded diagnoses. Adding full-text EHR analysis improved results substantially. A national disease registry (the NCR) proved incomplete. CONCLUSIONS: Even though GPs do know their patients with cancer, only 60.6% are coded in concordance with the NCR. Reusers of coded EHR data should be aware that 40% of cases can be missed, and almost half can be false positive. The type of EHR system influences registration quality. If full-text manual EHR analysis is used, only 10% of cases will be missed and 20% of cases found will be wrong. EHR data should only be reused with care
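The two proportions reported in the results (cases coded in accordance with the registry, and EHR diagnoses not found in the registry) can be sketched as set operations on linked patient identifiers. This is an illustration only: the identifiers are invented, the function and key names are hypothetical, and the choice of registry cases as the denominator for the first proportion is an assumption, since the abstract does not spell it out.

```python
def linkage_summary(ehr_ids: set[str], registry_ids: set[str]) -> dict[str, float]:
    """Compare coded EHR diagnoses with registry diagnoses after linkage."""
    matched = ehr_ids & registry_ids
    return {
        # Assumed denominator: all registry cases.
        "registry_cases_coded_in_ehr": len(matched) / len(registry_ids),
        # Denominator stated in the abstract: all EHR diagnoses.
        "ehr_cases_not_in_registry": len(ehr_ids - registry_ids) / len(ehr_ids),
    }


# Hypothetical linked identifiers.
registry = {"p1", "p2", "p3", "p4", "p5"}
ehr = {"p1", "p2", "p3", "p8"}
summary = linkage_summary(ehr, registry)
```

Because the two proportions use different denominators, they do not have to sum to 100%, which is consistent with the abstract reporting 60.6% and 48.9% side by side.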

    Interventions for cutaneous molluscum contagiosum

    Background: Molluscum contagiosum is a common skin infection that is caused by a pox virus and occurs mainly in children. The infection usually resolves within months in people without immune deficiency, but treatment may be preferred for social and cosmetic reasons or to avoid spreading the infection. A clear evidence base supporting the various treatments is lacking. This is an update of a Cochrane Review first published in 2006, and updated previously in 2009. Objectives: To assess the effects of specific treatments and management strategies, including waiting for natural resolution, for cutaneous, non-genital molluscum contagiosum in people without immune deficiency. Search methods: We updated our searches of the following databases to July 2016: the Cochrane Skin Group Specialised Register, CENTRAL, MEDLINE, Embase, and LILACS. We searched six trial registers and checked the reference lists of included studies and review articles for further references to relevant randomised controlled trials. We contacted pharmaceutical companies and experts in the field to identify further relevant randomised controlled trials. Selection criteria: Randomised controlled trials of any treatment of molluscum contagiosum in people without immune deficiency. We excluded trials on sexually transmitted molluscum contagiosum and in people with immune deficiency (including those with HIV infection). Data collection and analysis: Two review authors independently selected studies, assessed methodological quality, and extracted data from selected studies. We obtained missing data from study authors where possible. Main results: We found 11 new studies for this update, resulting in 22 included studies with a total of 1650 participants. The studies examined the effects of topical (20 studies) and systemic interventions (2 studies). Among the new included studies were the full trial reports of three large unpublished studies, brought to our attention by an expert in the field. 
They all provided moderate-quality evidence for a lack of effect of 5% imiquimod compared to vehicle (placebo) on short-term clinical cure (4 studies, 850 participants, 12 weeks after start of treatment, risk ratio (RR) 1.33, 95% confidence interval (CI) 0.92 to 1.93), medium-term clinical cure (2 studies, 702 participants, 18 weeks after start of treatment, RR 0.88, 95% CI 0.67 to 1.14), and long-term clinical cure (2 studies, 702 participants, 28 weeks after start of treatment, RR 0.97, 95% CI 0.79 to 1.17). We found similar but more certain results for short-term improvement (4 studies, 850 participants, 12 weeks after start of treatment, RR 1.14, 95% CI 0.89 to 1.47; high-quality evidence). For the outcome 'any adverse effect', we found high-quality evidence for little or no difference between topical 5% imiquimod and vehicle (3 studies, 827 participants, RR 0.97, 95% CI 0.88 to 1.07), but application site reactions were more frequent in the groups treated with imiquimod (moderate-quality evidence): any application site reaction (3 studies, 827 participants, RR 1.41, 95% CI 1.13 to 1.77, the number needed to treat for an additional harmful outcome (NNTH) was 11); severe application site reaction (3 studies, 827 participants, RR 4.33, 95% CI 1.16 to 16.19, NNTH over 40). 
For the following 11 comparisons, there was limited evidence to show which treatment was superior in achieving short-term clinical cure (low-quality evidence): 5% imiquimod less effective than cryospray (1 study, 74 participants, RR 0.60, 95% CI 0.46 to 0.78) and 10% potassium hydroxide (2 studies, 67 participants, RR 0.65, 95% CI 0.46 to 0.93); 10% Australian lemon myrtle oil more effective than olive oil (1 study, 31 participants, RR 17.88, 95% CI 1.13 to 282.72); 10% benzoyl peroxide cream more effective than 0.05% tretinoin (1 study, 30 participants, RR 2.20, 95% CI 1.01 to 4.79); 5% sodium nitrite co-applied with 5% salicylic acid more effective than 5% salicylic acid alone (1 study, 30 participants, RR 3.50, 95% CI 1.23 to 9.92); and iodine plus tea tree oil more effective than tea tree oil (1 study, 37 participants, RR 0.20, 95% CI 0.07 to 0.57) or iodine alone (1 study, 37 participants, RR 0.07, 95% CI 0.01 to 0.50). Although there is some uncertainty, 10% potassium hydroxide appears to be more effective than saline (1 study, 20 participants, RR 3.50, 95% CI 0.95 to 12.90); homeopathic calcarea carbonica appears to be more effective than placebo (1 study, 20 participants, RR 5.57, 95% CI 0.93 to 33.54); 2.5% appears to be less effective than 5% solution of potassium hydroxide (1 study, 25 participants, RR 0.35, 95% CI 0.12 to 1.01); and 10% povidone iodine solution plus 50% salicylic acid plaster appears to be more effective than salicylic acid plaster alone (1 study, 30 participants, RR 1.43, 95% CI 0.95 to 2.16). We found no statistically significant differences for other comparisons (most of which addressed two different topical treatments). We found no randomised controlled trial evidence for expressing lesions or topical hydrogen peroxide. Study limitations included no blinding, many dropouts, and no intention-to-treat analysis. 
Except for the severe application site reactions of imiquimod, none of the evaluated treatments described above were associated with serious adverse effects (low-quality evidence). Among the most common adverse events were pain during application, erythema, and itching. Included studies of the following comparisons did not report adverse effects: calcarea carbonica versus placebo, 10% povidone iodine plus 50% salicylic acid plaster versus salicylic acid plaster, and 10% benzoyl peroxide versus 0.05% tretinoin. We were unable to judge the risk of bias in most studies due to insufficient information, especially regarding concealment of allocation and possible selective reporting. We considered five studies to be at low risk of bias. Authors' conclusions: No single intervention has been shown to be convincingly effective in the treatment of molluscum contagiosum. We found moderate-quality evidence that topical 5% imiquimod was no more effective than vehicle in terms of clinical cure, but led to more application site reactions, and high-quality evidence that there was no difference between the treatments in terms of short-term improvement. However, high-quality evidence showed a similar number of general side effects in both groups. As the evidence found did not favour any one treatment, the natural resolution of molluscum contagiosum remains a strong method for dealing with the condition
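The effect measures quoted throughout this abstract (risk ratio with a 95% confidence interval, and number needed to treat for an additional harmful outcome) can be illustrated for a single two-arm trial. This sketch uses the common log-scale normal approximation for the interval and does not reproduce the review's pooling of several studies; the function names and event counts are hypothetical.

```python
import math


def risk_ratio_ci(events_t: int, n_t: int, events_c: int, n_c: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio and approximate 95% CI from one 2x2 table (log method)."""
    rr = (events_t / n_t) / (events_c / n_c)
    se_log = math.sqrt(1 / events_t - 1 / n_t + 1 / events_c - 1 / n_c)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


def number_needed_to_harm(events_t: int, n_t: int, events_c: int, n_c: int) -> float:
    """Reciprocal of the absolute risk increase in the treated group."""
    risk_difference = events_t / n_t - events_c / n_c
    if risk_difference <= 0:
        raise ValueError("treated group shows no excess risk")
    return 1 / risk_difference


# Hypothetical counts: 30 of 100 treated and 20 of 100 controls report a reaction.
rr, lower, upper = risk_ratio_ci(30, 100, 20, 100)
nnth = number_needed_to_harm(30, 100, 20, 100)
```

An interval that includes 1, as in several of the comparisons above, means the data are compatible with no difference between the treatments.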

    A New Coding System for Metabolic Disorders Demonstrates Gaps in the International Disease Classifications ICD-10 and SNOMED-CT, Which Can Be Barriers to Genotype-Phenotype Data Sharing

    Data sharing is essential for a better understanding of genetic disorders. Good phenotype coding plays a key role in this process. Unfortunately, the two most widely used coding systems in medicine, ICD-10 and SNOMED-CT, lack information necessary for the detailed classification and annotation of rare and genetic disorders. This prevents the optimal registration of such patients in databases and thus data-sharing efforts. To improve care and to facilitate research for patients with metabolic disorders, we developed a new coding system for metabolic diseases with a dedicated group of clinical specialists. Next, we compared the resulting codes with those in ICD and SNOMED-CT. No matches were found in 76% of cases in ICD-10 and in 54% in SNOMED-CT. We conclude that there are sizable gaps in the SNOMED-CT and ICD coding systems for metabolic disorders. There may be similar gaps for other classes of rare and genetic disorders. We have demonstrated that expert groups can help in addressing such coding issues. Our coding system has been made available to the ICD and SNOMED-CT organizations as well as to the Orphanet and HPO organizations for further public application, and updates will be published online (www.ddrmd.nl and www.cineas.org).