26 research outputs found

Thesaurus-based disambiguation of gene symbols

Author: Kors Jan A
Mons Barend
Schijvenaars Bob JA
Schuemie Martijn J
van Mulligen Erik M
Wain Hester M
Weeber Marc
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Massive text mining of the biological literature holds great promise of relating disparate information and discovering new knowledge. However, disambiguation of gene symbols is a major bottleneck. RESULTS: We developed a simple thesaurus-based disambiguation algorithm that can operate with very little training data. The thesaurus comprises the information from five human genetic databases and MeSH. The extent of the homonym problem for human gene symbols is shown to be substantial (33% of the genes in our combined thesaurus had one or more ambiguous symbols), not only because one symbol can refer to multiple genes, but also because a gene symbol can have many non-gene meanings. A test set of 52,529 Medline abstracts, containing 690 ambiguous human gene symbols taken from OMIM, was automatically generated. Overall accuracy of the disambiguation algorithm was up to 92.7% on the test set. CONCLUSION: The ambiguity of human gene symbols is substantial, not only because one symbol may denote multiple genes but particularly because many symbols have other, non-gene meanings. The proposed disambiguation approach resolves most ambiguities in our test set with high accuracy, including the important gene/not a gene decisions. The algorithm is fast and scalable, enabling gene-symbol disambiguation in massive text mining applications.
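
The homonym measure quoted in the abstract (the share of genes with one or more ambiguous symbols) can be illustrated with a minimal sketch. This is not the authors' algorithm or data: the function name `ambiguous_gene_fraction`, the thesaurus layout, and the toy gene identifiers and symbols are all hypothetical, chosen only to show the two sources of ambiguity the abstract names (a symbol shared by several genes, and a symbol with a non-gene meaning).

```python
from collections import defaultdict


def ambiguous_gene_fraction(thesaurus, non_gene_terms=frozenset()):
    """Return the fraction of genes that have at least one ambiguous symbol.

    thesaurus: dict mapping a gene identifier to the set of its symbols.
    non_gene_terms: symbols that also carry a non-gene meaning.
    Both structures are illustrative assumptions, not the paper's format.
    """
    # Invert the thesaurus: symbol -> set of genes that use it.
    genes_by_symbol = defaultdict(set)
    for gene, symbols in thesaurus.items():
        for symbol in symbols:
            genes_by_symbol[symbol].add(gene)

    def is_ambiguous(symbol):
        # Ambiguous if shared by several genes or if it has a non-gene meaning.
        return len(genes_by_symbol[symbol]) > 1 or symbol in non_gene_terms

    ambiguous = [
        gene
        for gene, symbols in thesaurus.items()
        if any(is_ambiguous(symbol) for symbol in symbols)
    ]
    return len(ambiguous) / len(thesaurus) if thesaurus else 0.0


# Hypothetical toy thesaurus; identifiers and symbols are made up.
toy = {
    "GENE_1": {"AB1", "XYZ"},   # shares "XYZ" with GENE_2
    "GENE_2": {"XYZ"},
    "GENE_3": {"CAT"},          # "CAT" also has a non-gene meaning
    "GENE_4": {"QRS7"},         # unambiguous
}
fraction = ambiguous_gene_fraction(toy, non_gene_terms={"CAT"})
print(fraction)
```

In this toy thesaurus three of the four genes carry an ambiguous symbol; the sketch only counts ambiguity and does not attempt the disambiguation step itself.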

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

EUR Research Repository

Erasmus University Digital Repository

Using contextual queries

Author: Diwersy M.
Eijk C.C. (Christiaan) van der
Jelier R. (Rob)
Kors J.A. (Jan)
Mons B. (Barend)
Mulligen E.M. (Erik) van
Schijvenaars R.J.A. (Bob)
Weeber M. (Marc)
Publication date: 01/01/2003

Search engines generally treat search requests in isolation. The results for a given query are identical, independent of the user or the context in which the user made the request. An approach is demonstrated that explores implicit contexts as obtained from a document the user is reading. The approach inserts into an original (web) document functionality to directly activate context-driven queries that yield related articles obtained from various information sources.

EUR Research Repository

Erasmus University Digital Repository

Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection

Author: Eijk C.C. (Christiaan) van der
Jelier R. (Rob)
Kors J.A. (Jan)
Mons B. (Barend)
Mulligen E.M. (Erik) van
Schijvenaars R.J.A. (Bob)
Weeber M. (Marc)
Publication date: 01/01/2003

Genes are discovered almost on a daily basis and new names have to be found. Although there are guidelines for gene nomenclature, the naming process is highly creative. Human genes are often named with a gene symbol and a longer, more descriptive term; the short form is very often an abbreviation of the long form. Abbreviations in biomedical language are highly ambiguous, i.e., one gene symbol often refers to more than one gene. Using an existing abbreviation expansion algorithm, we explore MEDLINE for the use of human gene symbols derived from LocusLink. It turns out that just over 40% of these symbols occur in MEDLINE; however, many of these occurrences are not related to genes. In the process of making an inventory, a disambiguation test collection is constructed automatically.

EUR Research Repository

Erasmus University Digital Repository

Applied information retrieval and multidisciplinary research: new mechanistic hypotheses in Complex Regional Pain Syndrome

Author: Boyer Scott
Cases Montserrat
de Bruijn Anke GJ
de Mos Marissa
Hettne Kristina M
Mestres Jordi
van der Lei Johan
van Mulligen Erik M
Weeber Marc
Publication venue: BioMed Central
Publication date: 01/01/2007

Background: Collaborative efforts of physicians and basic scientists are often necessary in the investigation of complex disorders. Difficulties can arise, however, when large amounts of information need to be reviewed. Advanced information retrieval can be beneficial in combining and reviewing data obtained from the various scientific fields. In this paper, a team of investigators with varying backgrounds has applied advanced information retrieval methods, in the form of text mining and entity relationship tools, to review the current literature, with the intention of generating new insights into the molecular mechanisms underlying a complex disorder. Complex Regional Pain Syndrome (CRPS) was chosen as an example of such a disorder. CRPS is a painful and debilitating syndrome with a complex etiology that remains largely unexplained, resulting in suboptimal diagnosis and treatment. Results: A text mining based approach combined with a simple network analysis identified Nuclear Factor kappa B (NFκB) as a possible central mediator in both the initiation and progression of CRPS. Conclusion: The result shows the added value of a multidisciplinary approach combined with information retrieval in hypothesis discovery in biomedical research. The new hypothesis, which was derived in silico, provides a framework for further mechanistic studies into the underlying molecular mechanisms of CRPS and requires evaluation in clinical and epidemiological studies.

CiteSeerX

Maastricht University Research Portal

Crossref

Springer - Publisher Connector

PubMed Central

EUR Research Repository

Leiden University Scholarly Publications

Erasmus University Digital Repository

Using Noun Phrases for Navigating Biomedical Literature on Pubmed: How Many Updates Are We Losing Track of?

Author: A Névéol
A Rzhetsky
Andrey Rzhetsky
BM Fonseca
C Jacquemin
C Manning
CD Manning
D Beeferman
D Rebholz-Schuhmann
D Shotton
D Srikrishna
D Trieschnigg
Devabhaktuni Srikrishna
DR Hunter
GF Cooper
J Evans
J Lin
JPA Ioannidis
M Muin
M Weeber
Marc A. Coram
MH MacRoberts
MJ Schuemie
N Tran
O Bodenreider
P Srinivasan
PL Elkin
Q He
Q Li
R Islamaj Dogan
R Schifanella
RA DiGiacomo
S Bird
T Rindflesch
T Wachter
V Sintchenko
W Kim
Y Huang
Z Lu
Z Sun
Publication venue: Public Library of Science
Publication date: 14/09/2011

Author-supplied citations are a fraction of the related literature for a paper. The “related citations” list on PubMed is typically dozens or hundreds of results long and offers no hint as to why these results are related. Using noun phrases derived from the sentences of the paper, we show it is possible to navigate more transparently to PubMed updates through search terms that can associate a paper with its citations. The algorithm to generate these search terms involved automatically extracting noun phrases from the paper using natural language processing tools, and ranking them by the number of occurrences in the paper compared to the number of occurrences on the web. We define search queries having at least one instance of overlap between the author-supplied citations of the paper and the top 20 search results as citation validated (CV). When the overlapping citations were written by the same authors as the paper itself, we define the query as CV-S; when they were written by different authors, we define it as CV-D. For a systematic sample of 883 papers on PubMed Central, at least one of the search terms for 86% of the papers is CV-D versus 65% for the top 20 PubMed “related citations.” We hypothesize that these quantities, computed for the 20 million papers on PubMed, would differ by no more than 5% from these percentages. Averaged across all 883 papers, 5 search terms are CV-D, 10 search terms are CV-S, and 6 unique citations validate these searches. Potentially related literature uncovered by citation-validated searches (either CV-S or CV-D) is on the order of ten items per paper, and many more if the remaining searches that are not citation-validated are taken into account. The significance and relationship of each search result to the paper can only be vetted and explained by a researcher with knowledge of or interest in that paper.
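
The ranking step and the CV-S/CV-D definitions in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: noun-phrase extraction is taken as already done, the web counts are supplied as a plain dictionary, the score is a simple ratio with add-one smoothing, and a cited result counts as "same authors" when it shares at least one author with the paper. The names `rank_phrases`, `classify_query`, and all toy data are hypothetical.

```python
from collections import Counter


def rank_phrases(paper_phrases, web_counts):
    """Rank noun phrases by paper frequency relative to web frequency.

    paper_phrases: list of noun phrases extracted from the paper (with repeats).
    web_counts: dict mapping a phrase to its occurrence count on the web.
    """
    paper_counts = Counter(paper_phrases)

    def score(phrase):
        # Add 1 to the web count so unseen phrases do not divide by zero
        # (a smoothing choice assumed here, not taken from the source).
        return paper_counts[phrase] / (web_counts.get(phrase, 0) + 1)

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(paper_counts, key=lambda p: (-score(p), p))


def classify_query(top_results, citations, paper_authors, authors_of, k=20):
    """Return the labels ('CV-S', 'CV-D') earned by one search query.

    A query is citation validated when any of its top-k results is among the
    paper's author-supplied citations. An empty set means not validated.
    """
    labels = set()
    for doc in top_results[:k]:
        if doc in citations:
            # Assumption: any shared author counts as "same authors".
            shared = paper_authors & authors_of[doc]
            labels.add("CV-S" if shared else "CV-D")
    return labels


# Hypothetical toy inputs.
ranked = rank_phrases(
    ["gene symbol", "gene symbol", "text mining", "the results"],
    {"gene symbol": 9, "text mining": 99, "the results": 9999},
)
labels = classify_query(
    top_results=["d1", "d2", "d3"],
    citations={"d2", "d3"},
    paper_authors={"A"},
    authors_of={"d1": {"Z"}, "d2": {"A", "B"}, "d3": {"C"}},
)
print(ranked, sorted(labels))
```

Phrases that are frequent in the paper but rare on the web rise to the top, which matches the abstract's description of the ranking; the exact scoring function used in the paper is not stated there.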

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central