
    Rewriting and suppressing UMLS terms for improved biomedical term identification

    Background: Identification of terms is essential for biomedical text mining. We concentrate here on the use of vocabularies for term identification, specifically the Unified Medical Language System (UMLS). To make the UMLS more suitable for biomedical text mining we implemented and evaluated nine term rewrite and eight term suppression rules. The rules rely on UMLS properties that have been identified in previous work by others, together with an additional set of new properties discovered by our group during our work with the UMLS. Our work complements the earlier work in that we measure the impact of the different rules on the number of terms identified in a MEDLINE corpus. The number of uniquely identified terms and their frequency in MEDLINE were computed before and after applying the rules. The 50 most frequently found terms, together with a sample of 100 randomly selected terms, were evaluated for every rule. Results: Five of the nine rewrite rules generated additional synonyms and spelling variants that correctly corresponded to the meaning of the original terms, and seven of the eight suppression rules suppressed only undesired terms. Using the five rewrite rules that passed our evaluation, we identified 1,117,772 new occurrences of 14,784 rewritten terms in MEDLINE. Without rewriting, we recognized 651,268 terms belonging to 397,414 concepts; with rewriting, we recognized 666,053 terms belonging to 410,823 concepts, an increase of 2.8% in the number of terms and of 3.4% in the number of concepts recognized. Using the seven suppression rules, a total of 257,118 undesired terms were suppressed in the UMLS, notably decreasing its size; 7,397 terms were suppressed in the corpus. Conclusions: We recommend applying the five rewrite rules and seven suppression rules that passed our evaluation when the UMLS is used for biomedical term identification in MEDLINE. A software tool to apply these rules to the UMLS is freely available at http://biosemantics.org/casper.
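The abstract does not spell out the individual rules, but the general shape of a term rewrite rule is easy to illustrate. The sketch below shows two hypothetical rules of the kind such a tool applies: stripping a trailing parenthetical qualifier and undoing a comma inversion. The rule details are my assumption, not the paper's actual nine rules.

```python
import re

def rewrite_term(term):
    """Generate hypothetical rewrite variants of a thesaurus term.

    Two illustrative rules (assumed, not the paper's actual rule set):
    - strip a trailing parenthetical qualifier: "Cold (disease)" -> "Cold"
    - undo a single comma inversion: "Failure, Renal" -> "Renal Failure"
    """
    variants = set()
    # Rule 1: drop a final "(...)" qualifier, if present.
    no_paren = re.sub(r"\s*\([^()]*\)\s*$", "", term)
    if no_paren != term:
        variants.add(no_paren)
    # Rule 2: invert "B, A" into "A B" when exactly one comma occurs.
    if term.count(", ") == 1 and "(" not in term:
        head, tail = term.split(", ")
        variants.add(f"{tail} {head}")
    return variants

print(rewrite_term("Cold (disease)"))   # {'Cold'}
print(rewrite_term("Failure, Renal"))   # {'Renal Failure'}
```

Each variant would then be added as an extra synonym of the original concept before term identification runs over the corpus.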

    Thesaurus-based disambiguation of gene symbols

    BACKGROUND: Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. RESULTS: We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. CONCLUSION: The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications
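A thesaurus-based disambiguator of this kind can be reduced to a simple scoring idea: each candidate sense of a symbol carries a bag of words drawn from its database and MeSH entries, and the sense sharing the most words with the surrounding abstract wins. The profile words and the scoring below are an illustrative assumption, not the paper's exact algorithm.

```python
def disambiguate(symbol, abstract_words, thesaurus):
    """Pick the sense of an ambiguous symbol whose thesaurus profile
    shares the most words with the abstract (hypothetical scoring)."""
    abstract_words = set(w.lower() for w in abstract_words)
    best_sense, best_score = None, -1
    for sense, profile_words in thesaurus.get(symbol, {}).items():
        score = len(abstract_words & set(w.lower() for w in profile_words))
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Invented thesaurus entry: "PSA" as a gene vs. a non-gene meaning.
thesaurus = {
    "PSA": {
        "KLK3 (gene)": ["prostate", "antigen", "kallikrein", "serum"],
        "pressure-sensitive adhesive": ["adhesive", "tape", "polymer"],
    }
}
context = "serum prostate specific antigen levels".split()
print(disambiguate("PSA", context, thesaurus))  # KLK3 (gene)
```

Because scoring is a single set intersection per candidate sense, the approach scales to the massive text mining setting the abstract describes.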

    Identification of acute myocardial infarction from electronic healthcare records using different disease coding systems

    Objective: To evaluate positive predictive value (PPV) of different disease codes and free text in identifying acute myocardial infarction (AMI) from electronic healthcare records (EHRs). Design: Validation study of cases of AMI identified from general practitioner records and hospital discharge diagnoses using free text and codes from the International Classification of Primary Care (ICPC), International Classification of Diseases 9th revision-clinical modification (ICD9-CM) and ICD-10th revision (ICD-10). Setting: Population-based databases comprising routinely collected data from primary care in Italy and the Netherlands and from secondary care in Denmark from 1996 to 2009. Participants: A total of 4 034 232 individuals with 22 428 883 person-years of follow-up contributed to the data, from which 42 774 potential AMI cases were identified. A random sample of 800 cases was subsequently obtained for validation. Main outcome measures: PPVs were calculated overall and for each code/free text. 'Best-case scenario' and 'worst-case scenario' PPVs were calculated, the latter taking into account non-retrievable/non-assessable cases. We further assessed the effects of AMI misclassification on estimates of risk during drug exposure. Results: Records of 748 cases (93.5% of sample) were retrieved. ICD-10 codes had a 'best-case scenario' PPV of 100% while ICD9-CM codes had a PPV of 96.6% (95% CI 93.2% to 99.9%). ICPC codes had a 'best-case scenario' PPV of 75% (95% CI 67.4% to 82.6%) and free text had PPV ranging from 20% to 60%. Corresponding PPVs in the 'worst-case scenario' all decreased. Use of codes with lower PPV generally resulted in small changes in AMI risk during drug exposure, but codes with higher PPV resulted in attenuation of risk for positive associations. Conclusions: ICD9-CM and ICD-10 codes have good PPV in identifying AMI from EHRs; strategies are necessary to further optimise utility of ICPC codes and free-text search. 
Use of specific AMI disease codes in estimation of risk during drug exposure may lead to small but significant changes, at the expense of decreased precision.
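The headline metric here, positive predictive value with a 95% confidence interval, is straightforward to compute from validation counts. The sketch below uses a normal-approximation interval and invented counts, not the study's actual numbers.

```python
import math

def ppv_with_ci(true_pos, flagged, z=1.96):
    """Positive predictive value of a case definition with a
    normal-approximation 95% CI: of `flagged` records matching a
    code/free-text rule, `true_pos` were confirmed true cases."""
    p = true_pos / flagged
    se = math.sqrt(p * (1 - p) / flagged)
    return p, max(0.0, p - z * se), min(1.0, p + z * se)

# Illustrative counts only (not the study's data):
p, lo, hi = ppv_with_ci(true_pos=145, flagged=150)
print(f"PPV = {p:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The 'worst-case scenario' PPV in the abstract corresponds to keeping non-retrievable cases in the denominator while counting none of them as true positives.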

    Medication-Wide Association Studies

    Undiscovered side effects of drugs can have a profound effect on the health of the nation, and electronic health-care databases offer opportunities to speed up the discovery of these side effects. We applied a “medication-wide association study” approach that combined multivariate analysis with exploratory visualization to study four health outcomes of interest in an administrative claims database of 46 million patients and a clinical database of 11 million patients. The technique had good predictive value, but there was no threshold high enough to eliminate false-positive findings. The visualization not only highlighted the class effects that strengthened the review of specific products but also underscored the challenges in confounding. These findings suggest that observational databases are useful for identifying potential associations that warrant further consideration but are unlikely to provide definitive evidence of causal effects

    Prenatal antidepressant use and risk of attention-deficit/hyperactivity disorder in offspring:population based cohort study

Objective To assess the potential association between prenatal use of antidepressants and the risk of attention-deficit/hyperactivity disorder (ADHD) in offspring. Design Population based cohort study. Setting Data from the Hong Kong population based electronic medical records on the Clinical Data Analysis and Reporting System. Participants 190 618 children born in Hong Kong public hospitals between January 2001 and December 2009 and followed up to December 2015. Main outcome measure Hazard ratio of maternal antidepressant use during pregnancy and ADHD in children aged 6 to 14 years, with an average follow-up time of 9.3 years (range 7.4-11.0 years). Results Among 190 618 children, 1252 had a mother who used prenatal antidepressants. 5659 children (3.0%) were given a diagnosis of ADHD or received treatment for ADHD. The crude hazard ratio of maternal antidepressant use during pregnancy was 2.26 (P<0.01) compared with non-use. After adjustment for potential confounding factors, including maternal psychiatric disorders and use of other psychiatric drugs, the adjusted hazard ratio was reduced to 1.39 (95% confidence interval 1.07 to 1.82, P=0.01). Likewise, similar results were observed when comparing children of mothers who had used antidepressants before pregnancy with those who were never users (1.76, 1.36 to 2.30, P<0.01). The risk of ADHD in the children of mothers with psychiatric disorders was higher compared with the children of mothers without psychiatric disorders even if the mothers had never used antidepressants (1.84, 1.54 to 2.18, P<0.01). All sensitivity analyses yielded similar results. Sibling matched analysis identified no significant difference in risk of ADHD in siblings exposed to antidepressants during gestation and those not exposed during gestation (0.54, 0.17 to 1.74, P=0.30).
Conclusions The findings suggest that the association between prenatal use of antidepressants and risk of ADHD in offspring can be partially explained by confounding by indication of antidepressants. If there is a causal association, the size of the effect is probably smaller than that reported previously
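The gap between the crude hazard ratio (2.26) and the adjusted one (1.39) is the signature of confounding by indication. As a rough illustration of the unadjusted comparison only, a crude incidence rate ratio on invented counts is shown below; the real adjusted estimate requires a Cox model with covariates, which this sketch does not attempt.

```python
def incidence_rate_ratio(cases_exposed, py_exposed,
                         cases_unexposed, py_unexposed):
    """Crude incidence rate ratio: a rough stand-in for an unadjusted
    hazard ratio when event rates are low. All counts are invented
    for illustration, not taken from the study."""
    rate_exposed = cases_exposed / py_exposed
    rate_unexposed = cases_unexposed / py_unexposed
    return rate_exposed / rate_unexposed

# Hypothetical counts shaped to mimic a crude contrast near 2:
print(round(incidence_rate_ratio(78, 11_600, 5_581, 1_760_000), 2))
```

Adjustment shrinks such a crude ratio when the exposed group (here, mothers using antidepressants) already carries a higher baseline risk, exactly the pattern the sibling analysis supports.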

    PatientExploreR: an extensible application for dynamic visualization of patient clinical history from electronic health records in the OMOP common data model.

Motivation: Electronic health records (EHRs) are quickly becoming omnipresent in healthcare, but interoperability issues and technical demands limit their use for biomedical and clinical research. Interactive and flexible software that interfaces directly with EHR data structured around a common data model (CDM) could accelerate more EHR-based research by making the data more accessible to researchers who lack computational expertise and/or domain knowledge. Results: We present PatientExploreR, an extensible application built on the R/Shiny framework that interfaces with a relational database of EHR data in the Observational Medical Outcomes Partnership CDM format. PatientExploreR produces patient-level interactive and dynamic reports and facilitates visualization of clinical data without any programming required. It allows researchers to easily construct and export patient cohorts from the EHR for analysis with other software. This application could enable easier exploration of patient-level data for physicians and researchers. PatientExploreR can incorporate EHR data from any institution that employs the CDM for users with approved access. The software code is free and open source under the MIT license, enabling institutions to install and users to expand and modify the application for their own purposes. Availability and implementation: PatientExploreR can be freely obtained from GitHub: https://github.com/BenGlicksberg/PatientExploreR. We provide instructions for how researchers with approved access to their institutional EHR can use this package. We also release an open sandbox server of synthesized patient data for users without EHR access to explore: http://patientexplorer.ucsf.edu. Supplementary information: Supplementary data are available at Bioinformatics online.

    Knowledge-based biomedical word sense disambiguation: comparison of approaches

    Background: Word sense disambiguation (WSD) algorithms attempt to select the proper sense of ambiguous terms in text. Resources like the UMLS provide a reference thesaurus that can be used to annotate the biomedical literature. Statistical learning approaches have produced good results, but the size of the UMLS makes it infeasible to produce training data covering the whole domain. Methods: We present research on existing WSD approaches based on knowledge bases, which complements the studies performed on statistical learning. We compare four approaches that rely on the UMLS Metathesaurus as the source of knowledge. The first approach compares the overlap of the context of the ambiguous word with the candidate senses, based on a representation built out of the definitions, synonyms and related terms. The second approach collects training data for each of the candidate senses using queries built from monosemous synonyms and related terms; these queries are used to retrieve MEDLINE citations, and a machine learning approach is trained on this corpus. The third approach is a graph-based method that exploits the structure of the Metathesaurus network of relations to perform unsupervised WSD, ranking nodes in the graph according to their relative structural importance. The last approach uses the semantic types assigned to the concepts in the Metathesaurus: the context of the ambiguous word and the semantic types of the candidate concepts are mapped to Journal Descriptors, and these mappings are compared to decide among the candidate concepts. Results are provided estimating the accuracy of the different methods on the WSD test collection available from the NLM. Conclusions: We found that the last approach achieves better results than the other methods. The graph-based approach, which uses the structure of the Metathesaurus network to estimate the relevance of Metathesaurus concepts, does not perform well compared to the first two methods. In addition, the combination of methods improves performance over the individual approaches. On the other hand, performance is still below statistical learning trained on manually produced data and below the maximum frequency sense baseline. Finally, we propose several directions to improve the existing methods and to improve the Metathesaurus to be more effective in WSD.
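The core intuition behind the graph-based approach can be shown in miniature: score each candidate sense by how well it connects to concepts already found in the context. The toy below uses plain direct-neighbor counting rather than the PageRank-style structural ranking the real method applies, and the concept names and edges are invented.

```python
from collections import defaultdict

def rank_by_connectivity(candidates, context_concepts, edges):
    """Toy stand-in for graph-based WSD: score each candidate sense
    by how many relations link it directly to context concepts.
    Real methods rank nodes by PageRank-style structural importance."""
    graph = defaultdict(set)
    for a, b in edges:          # build an undirected relation graph
        graph[a].add(b)
        graph[b].add(a)
    scores = {c: len(graph[c] & set(context_concepts)) for c in candidates}
    return max(scores, key=scores.get)

# Invented mini-network around the ambiguous term "cold":
edges = [("cold_temperature", "weather"), ("common_cold", "virus"),
         ("common_cold", "cough"), ("common_cold", "fever")]
print(rank_by_connectivity(["cold_temperature", "common_cold"],
                           ["cough", "fever"], edges))  # common_cold
```

The appeal of this family of methods is that it needs no labeled training data, only the relation network itself, which is exactly the unsupervised setting the abstract describes.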

    A standardized analytics pipeline for reliable and rapid development and validation of prediction models using observational health data

    Background and objective: As a response to the ongoing COVID-19 pandemic, several prediction models in the existing literature were rapidly developed, with the aim of providing evidence-based guidance. However, none of these COVID-19 prediction models have been found to be reliable. Models are commonly assessed to have a risk of bias, often due to insufficient reporting, use of non-representative data, and lack of large-scale external validation. In this paper, we present the Observational Health Data Sciences and Informatics (OHDSI) analytics pipeline for patient-level prediction modeling as a standardized approach for rapid yet reliable development and validation of prediction models. We demonstrate how our analytics pipeline and open-source software tools can be used to answer important prediction questions while limiting potential causes of bias (e.g., by validating phenotypes, specifying the target population, performing large-scale external validation, and publicly providing all analytical source code). Methods: We show step-by-step how to implement the analytics pipeline for the question: ‘In patients hospitalized with COVID-19, what is the risk of death 0 to 30 days after hospitalization?’. We develop models using six different machine learning methods in a USA claims database containing over 20,000 COVID-19 hospitalizations and externally validate the models using data containing over 45,000 COVID-19 hospitalizations from South Korea, Spain, and the USA. Results: Our open-source software tools enabled us to efficiently go end-to-end from problem design to reliable model development and evaluation. When predicting death in patients hospitalized with COVID-19, AdaBoost, random forest, gradient boosting machine, and decision tree yielded similar or lower internal and external validation discrimination performance compared to L1-regularized logistic regression, whereas the MLP neural network consistently resulted in lower discrimination. L1-regularized logistic regression models were well calibrated. Conclusion: Our results show that following the OHDSI analytics pipeline for patient-level prediction modeling can enable rapid development of reliable prediction models. The OHDSI software tools and pipeline are open source and available to researchers from all around the world.
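The discrimination performance compared across models above is typically the area under the ROC curve, which has a simple rank interpretation: the probability that a randomly chosen case scores higher than a randomly chosen non-case. A minimal pairwise implementation, with invented risk scores, is sketched below.

```python
def auc(scores_pos, scores_neg):
    """Discrimination (AUROC) computed pairwise: the probability that
    a randomly chosen positive outscores a randomly chosen negative,
    counting ties as half. O(n*m), fine for small illustrations."""
    wins = ties = 0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1
            elif p == n:
                ties += 1
    return (wins + 0.5 * ties) / (len(scores_pos) * len(scores_neg))

# Invented predicted risks for patients who died vs. survived:
print(auc([0.9, 0.7, 0.4], [0.3, 0.2, 0.6]))  # 0.8888888888888888
```

External validation in the pipeline amounts to recomputing this statistic (plus calibration) on databases the model never saw, which is what exposes overfit models.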