297 research outputs found

    Rewriting and suppressing UMLS terms for improved biomedical term identification

    Background: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact on the number of terms identified by the different rules on a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms together with a sample of 100 randomly selected terms were evaluated for every rule. Results: Five of the nine rewrite rules were found to generate additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms and seven out of the eight suppression rules were found to suppress only undesired terms. Using the five rewrite rules that passed our evaluation, we were able to identify 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without the rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, which is an increase of 2.8% in the number of terms and an increase of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size. 7,397 terms were suppressed in the corpus. Conclusions: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is to be used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.

    Thesaurus-based disambiguation of gene symbols

    BACKGROUND: Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. RESULTS: We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. CONCLUSION: The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications

    Application of a Common Data Model (CDM) to rank the paediatric user and prescription prevalence of 15 different drug classes in South Korea, Hong Kong, Taiwan, Japan and Australia: an observational, descriptive study

    Objective: To measure the paediatric user and prescription prevalence in inpatient and ambulatory settings in South Korea, Hong Kong, Taiwan, Japan and Australia by age and gender. A further objective was to list the most commonly used drugs per drug class, per country. Design and setting: Hospital inpatient and insurance paediatric healthcare data from the following databases were used to conduct this descriptive drug utilisation study: (i) the South Korean Ajou University School of Medicine database; (ii) the Hong Kong Clinical Data Analysis and Reporting System; (iii) the Japan Medical Data Center; (iv) Taiwan’s National Health Insurance Research Database and (v) the Australian Pharmaceutical Benefits Scheme. Country-specific data were transformed into the Observational Medical Outcomes Partnership Common Data Model. Patients: Children (≤18 years) with at least 1 day of observation in any of the respective databases from January 2009 until December 2013 were included. Main outcome measures: For each drug class, we assessed the per-protocol overall user and prescription prevalence rates (per 1000 persons) per country and setting. Results: Our study population comprised 1 574 524 children (52.9% male). The highest proportion of dispensings was recorded in the youngest age category (<2 years) for inpatients (45.1%) with a relatively high user prevalence of analgesics and antibiotics. Adrenergics, antihistamines, mucolytics and corticosteroids were used in 10%–15% of patients. For ambulatory patients, the highest proportion of dispensings was recorded in the middle age category (2–11 years, 67.1%) with antibiotics the most dispensed drug overall. Conclusions: Country-specific paediatric drug utilisation patterns were described, ranked and compared between four East Asian countries and Australia. The widespread use of mucolytics in East Asia warrants further investigation

    Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems

    Objective: To evaluate positive predictive value (PPV) of different disease codes and free text in identifying acute myocardial infarction (AMI) from electronic healthcare records (EHRs). Design: Validation study of cases of AMI identified from general practitioner records and hospital discharge diagnoses using free text and codes from the International Classification of Primary Care (ICPC), International Classification of Diseases 9th revision-clinical modification (ICD9-CM) and ICD-10th revision (ICD-10). Setting: Population-based databases comprising routinely collected data from primary care in Italy and the Netherlands and from secondary care in Denmark from 1996 to 2009. Participants: A total of 4 034 232 individuals with 22 428 883 person-years of follow-up contributed to the data, from which 42 774 potential AMI cases were identified. A random sample of 800 cases was subsequently obtained for validation. Main outcome measures: PPVs were calculated overall and for each code/free text. 'Best-case scenario' and 'worst-case scenario' PPVs were calculated, the latter taking into account non-retrievable/non-assessable cases. We further assessed the effects of AMI misclassification on estimates of risk during drug exposure. Results: Records of 748 cases (93.5% of sample) were retrieved. ICD-10 codes had a 'best-case scenario' PPV of 100% while ICD9-CM codes had a PPV of 96.6% (95% CI 93.2% to 99.9%). ICPC codes had a 'best-case scenario' PPV of 75% (95% CI 67.4% to 82.6%) and free text had PPV ranging from 20% to 60%. Corresponding PPVs in the 'worst-case scenario' all decreased. Use of codes with lower PPV generally resulted in small changes in AMI risk during drug exposure, but codes with higher PPV resulted in attenuation of risk for positive associations. Conclusions: ICD9-CM and ICD-10 codes have good PPV in identifying AMI from EHRs; strategies are necessary to further optimise utility of ICPC codes and free-text search. Use of specific AMI disease codes in estimation of risk during drug exposure may lead to small but significant changes and at the expense of decreased precision
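
The best-case and worst-case PPVs described in this abstract can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not confirm: the interval is a Wald (normal-approximation) 95% CI, and the worst case counts every non-retrievable or non-assessable record as not confirmed. The function names and the counts in the usage lines are hypothetical and are not taken from the study.

```python
import math


def ppv_with_ci(confirmed, assessed, z=1.96):
    """Return (ppv, lower, upper) using a Wald interval (assumed method)."""
    if assessed <= 0:
        raise ValueError("assessed must be positive")
    ppv = confirmed / assessed
    half_width = z * math.sqrt(ppv * (1.0 - ppv) / assessed)
    # Clamp the interval to the valid probability range [0, 1].
    return ppv, max(0.0, ppv - half_width), min(1.0, ppv + half_width)


def best_and_worst_case(confirmed, retrieved, sampled):
    """Best case: denominator is retrieved records only.

    Worst case (assumption): records that could not be retrieved or
    assessed are treated as unconfirmed, so the denominator is the
    full validation sample.
    """
    best = ppv_with_ci(confirmed, retrieved)
    worst = ppv_with_ci(confirmed, sampled)
    return best, worst


if __name__ == "__main__":
    # Hypothetical counts for one code, for illustration only.
    best, worst = best_and_worst_case(confirmed=90, retrieved=120, sampled=130)
    print("best case  PPV %.3f (95%% CI %.3f to %.3f)" % best)
    print("worst case PPV %.3f (95%% CI %.3f to %.3f)" % worst)
```

Because the worst case only enlarges the denominator, its PPV can never exceed the best-case value, which is consistent with the abstract's statement that all worst-case PPVs decreased.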

    Medication-Wide Association Studies

    Undiscovered side effects of drugs can have a profound effect on the health of the nation, and electronic health-care databases offer opportunities to speed up the discovery of these side effects. We applied a “medication-wide association study” approach that combined multivariate analysis with exploratory visualization to study four health outcomes of interest in an administrative claims database of 46 million patients and a clinical database of 11 million patients. The technique had good predictive value, but there was no threshold high enough to eliminate false-positive findings. The visualization not only highlighted the class effects that strengthened the review of specific products but also underscored the challenges in confounding. These findings suggest that observational databases are useful for identifying potential associations that warrant further consideration but are unlikely to provide definitive evidence of causal effects

    Incidence, prevalence and prescription patterns of antipsychotic medications use in Asia and US: A cross-nation comparison with common data model

    The use of antipsychotic medications (APMs) may differ among countries because of availability, approved indications, drug characteristics and clinical practice. However, there is limited literature comparing APM use among countries. To examine trends in antipsychotic prescribing in Taiwan, Hong Kong, Japan, and the United States, we conducted a cross-national study from 2002 to 2014 using a distributed network approach with a common data model. We included all patients who had at least one record of an antipsychotic prescription, and defined patients without exposure to antipsychotics in the 6 months before the index date as new users for incidence estimation. We calculated the incidence, prevalence, and prescription rate of each medication by calendar year. Among older patients, sulpiride was the most incident [incidence rate (IR) 11.0-23.3] and prevalent [prevalence rate (PR) 11.9-14.3] APM in Taiwan, and most prevalent (PR 2.5-3.9) in Japan. Quetiapine and haloperidol were most common in the United States (IR 8.1-9.5; PR 18.0-18.4) and Hong Kong (PR 8.8-13.7; PR 10.6-12.7), respectively. Quetiapine use was increasing in Taiwan, Hong Kong and the United States. Compared with older patients, younger patients were more likely to be prescribed second-generation APMs in all four countries. Trends in antipsychotic prescribing varied among countries. Quetiapine use was most prevalent in the United States and increasing in Taiwan and Hong Kong. The increasing use of quetiapine in elderly patients might be due to its safety profile compared to other APMs
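
The new-user definition in this abstract (no antipsychotic exposure in the 6 months before the index date) can be sketched as a simple washout check. This is an illustration under stated assumptions, not the study's code: "6 months" is approximated as 183 days, the function name `is_new_user` is hypothetical, and the prescription on the index date itself is not counted as prior exposure.

```python
from datetime import date, timedelta

# Assumption: the abstract's "6 months" is approximated as 183 days.
WASHOUT = timedelta(days=183)


def is_new_user(prescription_dates, index_date, washout=WASHOUT):
    """Return True if no prescription falls in [index_date - washout, index_date)."""
    window_start = index_date - washout
    return not any(window_start <= d < index_date for d in prescription_dates)


if __name__ == "__main__":
    index = date(2010, 1, 5)
    # Prior prescription about three months earlier: not a new user.
    print(is_new_user([date(2009, 10, 1), index], index))
    # Prior prescription two years earlier: counts as a new user.
    print(is_new_user([date(2008, 1, 1), index], index))
```

Patients flagged as new users in a given calendar year would then form the numerator of that year's incidence rate.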

    PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model.

    Motivation: Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. Results: We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install and users to expand and modify the application for their own purposes. Availability and implementation: PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. Supplementary information: Supplementary data are available at Bioinformatics online

    Prenatal antidepressant use and risk of attention-deficit/hyperactivity disorder in offspring: population based cohort study

    Objective To assess the potential association between prenatal use of antidepressants and the risk of attention-deficit/hyperactivity disorder (ADHD) in offspring. Design Population based cohort study. Setting Data from the Hong Kong population based electronic medical records on the Clinical Data Analysis and Reporting System. Participants 190 618 children born in Hong Kong public hospitals between January 2001 and December 2009 and followed-up to December 2015. Main outcome measure Hazard ratio of maternal antidepressant use during pregnancy and ADHD in children aged 6 to 14 years, with an average follow-up time of 9.3 years (range 7.4-11.0 years). Results Among 190 618 children, 1252 had a mother who used prenatal antidepressants. 5659 children (3.0%) were given a diagnosis of ADHD or received treatment for ADHD. The crude hazard ratio of maternal antidepressant use during pregnancy was 2.26 (P<0.01) compared with non-use. After adjustment for potential confounding factors, including maternal psychiatric disorders and use of other psychiatric drugs, the adjusted hazard ratio was reduced to 1.39 (95% confidence interval 1.07 to 1.82, P=0.01). Likewise, similar results were observed when comparing children of mothers who had used antidepressants before pregnancy with those who were never users (1.76, 1.36 to 2.30, P<0.01). The risk of ADHD in the children of mothers with psychiatric disorders was higher compared with the children of mothers without psychiatric disorders even if the mothers had never used antidepressants (1.84, 1.54 to 2.18, P<0.01). All sensitivity analyses yielded similar results. Sibling matched analysis identified no significant difference in risk of ADHD in siblings exposed to antidepressants during gestation and those not exposed during gestation (0.54, 0.17 to 1.74, P=0.30). Conclusions The findings suggest that the association between prenatal use of antidepressants and risk of ADHD in offspring can be partially explained by confounding by indication of antidepressants. If there is a causal association, the size of the effect is probably smaller than that reported previously