256 research outputs found

    Thesaurus-based disambiguation of gene symbols

    BACKGROUND: Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. RESULTS: We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. CONCLUSION: The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications
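A thesaurus-based disambiguation step of the kind described in this abstract might look roughly like the sketch below. It is not the authors' algorithm: the toy thesaurus, the sense labels, and the functions `tokenize` and `disambiguate` are all hypothetical, and the scoring rule (counting context words shared with each thesaurus entry) is an assumed simplification.

```python
import re

# Hypothetical toy thesaurus: each sense of an ambiguous symbol maps to
# context words drawn from its entry (names, synonyms, definition words).
TOY_THESAURUS = {
    "PSA": {
        "gene:KLK3": {"prostate", "specific", "antigen", "kallikrein", "serum"},
        "non-gene:psoriatic arthritis": {"psoriatic", "arthritis", "joint", "skin"},
    }
}


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def disambiguate(symbol, abstract, thesaurus=TOY_THESAURUS):
    """Return the sense whose thesaurus words overlap most with the abstract.

    Returns None when no sense shares any word with the context, which
    leaves the symbol unresolved instead of forcing a guess.
    """
    context = tokenize(abstract)
    scores = {
        sense: len(words & context)
        for sense, words in thesaurus[symbol].items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(disambiguate("PSA", "Serum PSA was elevated in prostate cancer."))
```

Listing non-gene senses next to gene senses is what lets the same lookup make the gene/not-a-gene decision that the abstract highlights.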

    Application of a Common Data Model (CDM) to rank the paediatric user and prescription prevalence of 15 different drug classes in South Korea, Hong Kong, Taiwan, Japan and Australia: an observational, descriptive study

    Objective: To measure the paediatric user and prescription prevalence in inpatient and ambulatory settings in South Korea, Hong Kong, Taiwan, Japan and Australia by age and gender. A further objective was to list the most commonly used drugs per drug class, per country. Design and setting: Hospital inpatient and insurance paediatric healthcare data from the following databases were used to conduct this descriptive drug utilisation study: (i) the South Korean Ajou University School of Medicine database; (ii) the Hong Kong Clinical Data Analysis and Reporting System; (iii) the Japan Medical Data Center; (iv) Taiwan’s National Health Insurance Research Database and (v) the Australian Pharmaceutical Benefits Scheme. Country-specific data were transformed into the Observational Medical Outcomes Partnership Common Data Model. Patients: Children (≤18 years) with at least 1 day of observation in any of the respective databases from January 2009 until December 2013 were included. Main outcome measures: For each drug class, we assessed the per-protocol overall user and prescription prevalence rates (per 1000 persons) per country and setting. Results: Our study population comprised 1 574 524 children (52.9% male). The highest proportion of dispensings was recorded in the youngest age category (<2 years) for inpatients (45.1%) with a relatively high user prevalence of analgesics and antibiotics. Adrenergics, antihistamines, mucolytics and corticosteroids were used in 10%–15% of patients. For ambulatory patients, the highest proportion of dispensings was recorded in the middle age category (2–11 years, 67.1%) with antibiotics the most dispensed drug overall. Conclusions: Country-specific paediatric drug utilisation patterns were described, ranked and compared between four East Asian countries and Australia. The widespread use of mucolytics in East Asia warrants further investigation

    Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems

    Objective: To evaluate positive predictive value (PPV) of different disease codes and free text in identifying acute myocardial infarction (AMI) from electronic healthcare records (EHRs). Design: Validation study of cases of AMI identified from general practitioner records and hospital discharge diagnoses using free text and codes from the International Classification of Primary Care (ICPC), International Classification of Diseases 9th revision-clinical modification (ICD9-CM) and ICD-10th revision (ICD-10). Setting: Population-based databases comprising routinely collected data from primary care in Italy and the Netherlands and from secondary care in Denmark from 1996 to 2009. Participants: A total of 4 034 232 individuals with 22 428 883 person-years of follow-up contributed to the data, from which 42 774 potential AMI cases were identified. A random sample of 800 cases was subsequently obtained for validation. Main outcome measures: PPVs were calculated overall and for each code/free text. 'Best-case scenario' and 'worst-case scenario' PPVs were calculated, the latter taking into account non-retrievable/non-assessable cases. We further assessed the effects of AMI misclassification on estimates of risk during drug exposure. Results: Records of 748 cases (93.5% of sample) were retrieved. ICD-10 codes had a 'best-case scenario' PPV of 100% while ICD9-CM codes had a PPV of 96.6% (95% CI 93.2% to 99.9%). ICPC codes had a 'best-case scenario' PPV of 75% (95% CI 67.4% to 82.6%) and free text had PPV ranging from 20% to 60%. Corresponding PPVs in the 'worst-case scenario' all decreased. Use of codes with lower PPV generally resulted in small changes in AMI risk during drug exposure, but codes with higher PPV resulted in attenuation of risk for positive associations. Conclusions: ICD9-CM and ICD-10 codes have good PPV in identifying AMI from EHRs; strategies are necessary to further optimise utility of ICPC codes and free-text search. 
    Using specific AMI disease codes to estimate risk during drug exposure may lead to small but significant changes in the estimates, at the expense of decreased precision.
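The 'best-case' and 'worst-case' PPVs in this abstract can be illustrated with a short calculation. The sketch below rests on two assumptions that the abstract does not state: the worst case counts every non-retrievable or non-assessable case as a false positive, and the confidence interval is a simple Wald interval. The function names and the counts in the example are hypothetical.

```python
import math


def ppv_with_ci(confirmed, total, z=1.96):
    """Return (PPV, lower, upper) using a Wald interval (an assumption)."""
    p = confirmed / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


def best_and_worst_case(confirmed, assessed, not_retrieved):
    """Best case ignores non-retrievable cases; worst case counts them
    all as false positives (assumed reading of the abstract)."""
    best = ppv_with_ci(confirmed, assessed)
    worst = ppv_with_ci(confirmed, assessed + not_retrieved)
    return best, worst


if __name__ == "__main__":
    # Hypothetical counts, not taken from the study.
    best, worst = best_and_worst_case(confirmed=90, assessed=120, not_retrieved=10)
    print("best case PPV and CI:", best)
    print("worst case PPV and CI:", worst)
```

Because the worst case only adds cases to the denominator, it can never exceed the best case, which matches the abstract's remark that all worst-case PPVs decreased.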

    Medication-Wide Association Studies

    Undiscovered side effects of drugs can have a profound effect on the health of the nation, and electronic health-care databases offer opportunities to speed up the discovery of these side effects. We applied a “medication-wide association study” approach that combined multivariate analysis with exploratory visualization to study four health outcomes of interest in an administrative claims database of 46 million patients and a clinical database of 11 million patients. The technique had good predictive value, but there was no threshold high enough to eliminate false-positive findings. The visualization not only highlighted the class effects that strengthened the review of specific products but also underscored the challenges in confounding. These findings suggest that observational databases are useful for identifying potential associations that warrant further consideration but are unlikely to provide definitive evidence of causal effects

    Incidence, prevalence and prescription patterns of antipsychotic medications use in Asia and US: A cross-nation comparison with common data model

    The use of antipsychotic medications (APMs) may differ among countries because of availability, approved indications, drug characteristics and clinical practice. However, few studies compare APM use among countries. To examine trends in antipsychotic prescribing in Taiwan, Hong Kong, Japan, and the United States, we conducted a cross-national study from 2002 to 2014 using the distributed network approach with a common data model. We included all patients who had at least one record of an antipsychotic prescription, and defined patients without exposure to antipsychotics in the 6 months before the index date as new users for incidence estimation. We calculated the incidence, prevalence, and prescription rate of each medication by calendar year. Among older patients, sulpiride was the most incident [incidence rate (IR) 11.0-23.3] and most prevalent [prevalence rate (PR) 11.9-14.3] APM in Taiwan, and the most prevalent (PR 2.5-3.9) in Japan. Quetiapine and haloperidol were most common in the United States (IR 8.1-9.5; PR 18.0-18.4) and Hong Kong (PR 8.8-13.7; PR 10.6-12.7), respectively. Quetiapine use was increasing in Taiwan, Hong Kong and the United States. Compared with older patients, younger patients were more likely to be prescribed second-generation APMs in all four countries. Trends in antipsychotic prescribing varied among countries. Quetiapine use was most prevalent in the United States and increasing in Taiwan and Hong Kong. The increasing use of quetiapine in elderly patients might be due to its safety profile compared with other APMs
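The new-user definition in this abstract (no antipsychotic exposure in the 6 months before the index date) can be sketched as follows. This is an illustrative reading, not the study's code: the index date is assumed to be a patient's first prescription in the calendar year, "6 months" is approximated as 183 days, and `is_new_user` and `count_new_users` are hypothetical names.

```python
from datetime import date, timedelta

# Assumption: "6 months" is approximated as a 183-day washout window.
WASHOUT = timedelta(days=183)


def is_new_user(prescription_dates, year):
    """True if the patient's first prescription in `year` has no earlier
    prescription within the washout window before it."""
    in_year = sorted(d for d in prescription_dates if d.year == year)
    if not in_year:
        return False  # no prescription in this year, so no index date
    index_date = in_year[0]
    window_start = index_date - WASHOUT
    return not any(window_start <= d < index_date for d in prescription_dates)


def count_new_users(patients, year):
    """Count new users among a mapping of patient id to prescription dates."""
    return sum(is_new_user(dates, year) for dates in patients.values())


if __name__ == "__main__":
    # Hypothetical patients, for illustration only.
    patients = {
        "a": [date(2010, 3, 1)],
        "b": [date(2009, 11, 1), date(2010, 2, 1)],
        "c": [date(2009, 1, 1), date(2010, 6, 1)],
    }
    print(count_new_users(patients, 2010))
```

Dividing such a count by the population at risk in the same year would give an incidence rate; the abstract does not state the denominator or the rate unit, so that step is left out.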

    PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model.

    Motivation: Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. Results: We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install and users to expand and modify the application for their own purposes. Availability and implementation: PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. Supplementary information: Supplementary data are available at Bioinformatics online

    Prenatal antidepressant use and risk of attention-deficit/hyperactivity disorder in offspring: population based cohort study

    Objective To assess the potential association between prenatal use of antidepressants and the risk of attention-deficit/hyperactivity disorder (ADHD) in offspring. Design Population based cohort study. Setting Data from the Hong Kong population based electronic medical records on the Clinical Data Analysis and Reporting System. Participants 190 618 children born in Hong Kong public hospitals between January 2001 and December 2009 and followed up to December 2015. Main outcome measure Hazard ratio of maternal antidepressant use during pregnancy and ADHD in children aged 6 to 14 years, with an average follow-up time of 9.3 years (range 7.4-11.0 years). Results Among 190 618 children, 1252 had a mother who used prenatal antidepressants. 5659 children (3.0%) were given a diagnosis of ADHD or received treatment for ADHD. The crude hazard ratio of maternal antidepressant use during pregnancy was 2.26 (P<0.01) compared with non-use. After adjustment for potential confounding factors, including maternal psychiatric disorders and use of other psychiatric drugs, the adjusted hazard ratio was reduced to 1.39 (95% confidence interval 1.07 to 1.82, P=0.01). Similar results were observed when comparing children of mothers who had used antidepressants before pregnancy with those who were never users (1.76, 1.36 to 2.30, P<0.01). The risk of ADHD in the children of mothers with psychiatric disorders was higher than in the children of mothers without psychiatric disorders, even if the mothers had never used antidepressants (1.84, 1.54 to 2.18, P<0.01). All sensitivity analyses yielded similar results. Sibling matched analysis identified no significant difference in risk of ADHD between siblings exposed to antidepressants during gestation and those not exposed during gestation (0.54, 0.17 to 1.74, P=0.30). Conclusions The findings suggest that the association between prenatal use of antidepressants and risk of ADHD in offspring can be partially explained by confounding by indication of antidepressants. If there is a causal association, the size of the effect is probably smaller than that reported previously

    Refining adverse drug reaction signals by incorporating interaction variables identified using emergent pattern mining

    Purpose: To develop a framework for identifying and incorporating candidate confounding interaction terms into a regularised Cox regression analysis to refine adverse drug reaction signals obtained via longitudinal observational data. Methods: We considered six drug families that are commonly associated with myocardial infarction in observational healthcare data, but where the causal relationship ground truth is known (adverse drug reaction or not). We applied emergent pattern mining to find itemsets of drugs and medical events that are associated with the development of myocardial infarction. These are the candidate confounding interaction terms. We then implemented a cohort study design using regularised Cox regression that incorporated and accounted for the candidate confounding interaction terms. Results: The methodology was able to account for signals generated due to confounding, and a Cox regression with elastic net regularisation correctly ranked the drug families known to be true adverse drug reactions above those that are not. This was not the case without the inclusion of the candidate confounding interaction terms, where confounding led to a non-adverse drug reaction being ranked highest. Conclusions: The methodology is efficient, can identify high-order confounding interactions and does not require expert input to specify outcome-specific confounders, so it can be applied to any outcome of interest to quickly refine its signals. The proposed method shows excellent potential to overcome some forms of confounding and therefore reduce the false positive rate for signal analysis using longitudinal data

    Knowledge-based biomedical word sense disambiguation: comparison of approaches

    Background: Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus to be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data that covers the whole domain. Methods: We present research on existing WSD approaches based on knowledge bases, which complement the studies performed on statistical learning. We compare four approaches which rely on the UMLS Metathesaurus as the source of knowledge. The first approach compares the overlap of the context of the ambiguous word to the candidate senses based on a representation built out of the definitions, synonyms and related terms. The second approach collects training data for each of the candidate senses to perform WSD based on queries built using monosemous synonyms and related terms. These queries are used to retrieve MEDLINE citations. Then, a machine learning approach is trained on this corpus. The third approach is a graph-based method which exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD. This approach ranks nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to the concepts in the Metathesaurus to perform WSD. The context of the ambiguous word and semantic types of the candidate concepts are mapped to Journal Descriptors. These mappings are compared to decide among the candidate concepts. Results are provided estimating accuracy of the different methods on the WSD test collection available from the NLM. Conclusions: We have found that the last approach achieves better results compared to the other methods. The graph-based approach, using the structure of the Metathesaurus network to estimate the relevance of the Metathesaurus concepts, does not perform well compared to the first two methods. In addition, the combination of methods improves the performance over the individual approaches. On the other hand, the performance is still below statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions to improve the existing methods and to improve the Metathesaurus to be more effective in WSD.