14 research outputs found

Ambiguity of human gene symbols in LocusLink and MEDLINE: creating an inventory and a disambiguation test collection

Author: Eijk C.C. (Christiaan) van der
Jelier R. (Rob)
Kors J.A. (Jan)
Mons B. (Barend)
Mulligen E.M. (Erik) van
Schijvenaars R.J.A. (Bob)
Weeber M. (Marc)
Publication date: 01/01/2003

Genes are discovered almost on a daily basis, and new names have to be found for them. Although there are guidelines for gene nomenclature, the naming process is highly creative. Human genes are often named with a gene symbol and a longer, more descriptive term; the short form is very often an abbreviation of the long form. Abbreviations in biomedical language are highly ambiguous, i.e., one gene symbol often refers to more than one gene. Using an existing abbreviation expansion algorithm, we explore MEDLINE for the use of human gene symbols derived from LocusLink. It turns out that just over 40% of these symbols occur in MEDLINE; however, many of these occurrences are not related to genes. In the course of making this inventory, a disambiguation test collection is constructed automatically.

EUR Research Repository

Erasmus University Digital Repository
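The symbol inventory described in the abstract above can be illustrated with a minimal sketch, assuming gene records are available as (symbol, gene ID) pairs and that an "occurrence" in text is simply an exact token match. The toy records, the symbols SYM1/SYM2, and the helper names are hypothetical; this is not the authors' pipeline and does not reproduce their abbreviation expansion algorithm.

```python
import re
from collections import defaultdict


def build_symbol_inventory(records):
    """Map each (upper-cased) gene symbol to the set of gene IDs that use it."""
    inventory = defaultdict(set)
    for symbol, gene_id in records:
        inventory[symbol.upper()].add(gene_id)
    return dict(inventory)


def ambiguous_symbols(inventory):
    """Return, sorted, the symbols that refer to more than one gene."""
    return sorted(s for s, genes in inventory.items() if len(genes) > 1)


def fraction_found(symbols, texts):
    """Fraction of symbols that appear as a whole, case-sensitive token in at least one text."""
    tokens = set()
    for text in texts:
        tokens.update(re.findall(r"[A-Za-z0-9-]+", text))
    if not symbols:
        return 0.0
    return sum(1 for s in symbols if s in tokens) / len(symbols)


# Hypothetical toy data, not taken from LocusLink or MEDLINE.
records = [("SYM1", "gene-001"), ("SYM1", "gene-002"), ("SYM2", "gene-003"), ("sym1", "gene-001")]
inventory = build_symbol_inventory(records)
print(ambiguous_symbols(inventory))                                       # ['SYM1']
print(fraction_found(sorted(inventory), ["SYM1 is expressed in liver."]))  # 0.5
```

A token match only shows that a symbol string occurs; as the abstract notes, many such occurrences are not related to genes, which is why a disambiguation step is needed on top of the raw counts.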

Full-Text Mining: Linking Practice, Protocols and Articles in Biological Research

Author: Eales JM
Robertson DL
Stevens RD
Publication date: 01/01/2008

The University of Manchester - Institutional Repository

Electronic data sources for kinetic models of cell signaling

Author: Bhalla Upinder S.
Harsharani G. V.
Vayttaden Sharat J.
Publication venue: Oxford University Press (OUP)
Publication date: 01/01/2005

Functional understanding of signaling pathways requires detailed information about the constituent molecules and their interactions. Simulations of signaling pathways therefore build upon a great deal of data from various sources. We first survey electronic data resources for cell signaling modeling and then, based on the type of data representation, broadly classify the data sources into five groups. None of the data sources surveyed provides all the required data in a ready-to-be-modeled form. We then put forward a wish list of the attributes desired in an ideal modeling-centric database. Finally, we close with perspectives on how electronic data sources for cell signaling modeling have developed. We suggest that future directions in such data sources are largely model-driven and hinge on the interoperability of data sources.

Do peers see more in a paper than its authors?

Author: Divoli Anna
Hearst Marti
Nakov Preslav
Publication venue: eScholarship, University of California
Publication date: 01/01/2012

Recent years have shown a gradual shift in the content of freely accessible biomedical publications, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances, i.e., sentences containing citations to that article. We contrast the important points of an article as judged by its authors with those seen by its peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis and find that the set of all citances to a target article not only covers most of the information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects that other citations and time have on the content of citances.

Directory of Open Access Journals

PubMed Central

eScholarship - University of California
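The comparison between an abstract and its citances can be sketched as follows, assuming concepts have already been identified and are given as plain sets of strings, and that citations in the citing text are marked with bracketed numbers such as "[3]". The helper names, the marker format, and the naive sentence splitter are illustrative assumptions, not the authors' method.

```python
import re


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def extract_citances(text, marker):
    """Return the sentences of `text` that contain the given (assumed) citation marker."""
    return [s for s in split_sentences(text) if marker in s]


def compare_concepts(abstract_concepts, citance_concepts):
    """Return (share of abstract concepts also found in citances, concepts found only in citances)."""
    covered = abstract_concepts & citance_concepts
    extra = citance_concepts - abstract_concepts
    coverage = len(covered) / len(abstract_concepts) if abstract_concepts else 0.0
    return coverage, extra


# Hypothetical citing text; "[3]" marks citations of the target article.
citing_text = ("Earlier work showed that A binds B [3]. Other results differ [5]. "
               "Yeast two-hybrid assays confirmed the interaction [3].")
print(extract_citances(citing_text, "[3]"))
# ['Earlier work showed that A binds B [3].', 'Yeast two-hybrid assays confirmed the interaction [3].']

coverage, extra = compare_concepts({"A", "B", "binding"}, {"A", "B", "binding", "yeast two-hybrid"})
print(coverage, extra)
```

Here the citances cover every concept in the toy abstract and add one experimental method, which mirrors, on a tiny scale, the kind of "more concepts than the abstract" comparison the study reports.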

LAITOR - Literature Assistant for Identification of Terms co-Occurrences and Relationships

Author: Andrade-Navarro Miguel A
Barbosa-Silva Adriano
Fontaine Jean-Fred
Magalhães Ivan LF
Ortega J Miguel
Pavlopoulos Georgios A
Schneider Reinhard
Soldatos Theodoros G
Publication venue: BioMed Central
Publication date: 01/01/2010

Background: Biological knowledge is represented in scientific literature that often describes the function of genes/proteins (bioentities) in terms of their interactions (biointeractions). Such bioentities are often related to biological concepts of interest that are specific to a given research field. Therefore, studying the current literature on a selected topic deposited in public databases facilitates the generation of novel hypotheses associating a set of bioentities with a common context.

Results: We created a text mining system (LAITOR: Literature Assistant for Identification of Terms co-Occurrences and Relationships) that analyses co-occurrences of bioentities, biointeractions, and other biological terms in MEDLINE abstracts. The method accounts for the position of the co-occurring terms within sentences or abstracts. The system detected abstracts mentioning protein-protein interactions in a standard test (BioCreative II IAS test data) with a precision of 0.82-0.89 and a recall of 0.48-0.70. We illustrate the application of LAITOR to the detection of plant response genes in a dataset of 1000 abstracts relevant to the topic.

Conclusions: Text mining tools combining the extraction of interacting bioentities and biological concepts with network displays can help in developing reasonable hypotheses in different scientific backgrounds.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

Open Repository and Bibliography - Luxembourg
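LAITOR's idea of accounting for whether co-occurring terms fall in the same sentence or only in the same abstract can be sketched as below. This is a minimal illustration under stated assumptions: terms are matched by case-insensitive substring search, and the weights in the hypothetical score_pair helper (2.0 per same-sentence co-occurrence, 1.0 for appearing anywhere in the same abstract) are invented for the example, not LAITOR's actual scoring.

```python
import re
from collections import Counter
from itertools import combinations


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def cooccurrences(abstract, terms):
    """Return (same-sentence pair counts, set of pairs co-occurring anywhere in the abstract)."""
    sentence_pairs = Counter()
    found = set()
    for sentence in split_sentences(abstract):
        lowered = sentence.lower()
        # Case-insensitive substring match; a real system would use a proper term tagger.
        present = sorted(t for t in terms if t.lower() in lowered)
        found.update(present)
        for pair in combinations(present, 2):
            sentence_pairs[pair] += 1
    abstract_pairs = set(combinations(sorted(found), 2))
    return sentence_pairs, abstract_pairs


def score_pair(pair, sentence_pairs, abstract_pairs, sentence_weight=2.0, abstract_weight=1.0):
    """Hypothetical score: same-sentence co-occurrence counts more than same-abstract co-occurrence."""
    pair = tuple(sorted(pair))
    score = sentence_weight * sentence_pairs[pair]
    if pair in abstract_pairs:
        score += abstract_weight
    return score


# Hypothetical abstract and term list.
sp, ap = cooccurrences("GENEA binds GENEB. GENEC is induced by drought.",
                       ["GENEA", "GENEB", "GENEC", "drought"])
print(score_pair(("GENEB", "GENEA"), sp, ap))    # same sentence and same abstract
print(score_pair(("GENEA", "drought"), sp, ap))  # same abstract only
```

Sorting each pair before lookup makes the score independent of the order in which the two terms are given.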

Exploring the Unexplored: Identifying Implicit and Indirect Descriptions of Biomedical Terminologies Based on Multifaceted Weighting Combinations

Author: Sung-Pil Choi
Publication date: 23/04/2020

To retrieve relevant scholarly information from biomedical databases, researchers generally use technical terms, such as the names of proteins, genes, diseases, and other biomedical descriptors, as queries. However, technical terms have limits as queries because the scientific literature contains many indirect and conceptual expressions that denote them. Combinatorial weighting schemes, which use various indexing and weighting methods and their combinations, are proposed as an initial approach to this problem. In experiments based on the proposed system and a previously constructed evaluation collection, this approach showed promising results: new relevant expressions could be continually located by combining the proposed weighting schemes. Furthermore, it was found that the best-performing pairwise combinations of weighting schemes, which reflect the inherent traits of each scheme, can complement each other, and that hidden relevant documents can be found with the proposed methods.

CiteSeerX
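One way to picture a pairwise ("binary") combination of weighting schemes is to rescale each scheme's scores to [0, 1] and rank candidate expressions by a weighted sum, as in this sketch. The scheme names (freq, idf, position), their scores, min-max normalization, and linear interpolation with alpha are all assumptions for illustration; the paper's actual weighting schemes and combination method may differ.

```python
from itertools import combinations


def min_max(scores):
    """Rescale a {term: score} dict to [0, 1]; constant scores all map to 0."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {term: 0.0 for term in scores}
    return {term: (s - lo) / (hi - lo) for term, s in scores.items()}


def combine(scores_a, scores_b, alpha=0.5):
    """Rank terms scored by both schemes by a weighted sum of their normalized scores."""
    a, b = min_max(scores_a), min_max(scores_b)
    shared = a.keys() & b.keys()
    combined = {t: alpha * a[t] + (1 - alpha) * b[t] for t in shared}
    # Sort by descending score, breaking ties alphabetically for a deterministic output.
    return sorted(combined, key=lambda t: (-combined[t], t))


def rank_all_pairs(schemes, alpha=0.5):
    """Apply `combine` to every pair of named weighting schemes."""
    return {(n1, n2): combine(schemes[n1], schemes[n2], alpha)
            for n1, n2 in combinations(sorted(schemes), 2)}


# Hypothetical scores for three candidate expressions under three hypothetical schemes.
schemes = {
    "freq": {"x": 3, "y": 1, "z": 2},
    "idf": {"x": 0.4, "y": 0.2, "z": 1.0},
    "position": {"x": 5, "y": 9, "z": 1},
}
for pair, ranking in rank_all_pairs(schemes).items():
    print(pair, ranking)
```

Each pair of schemes can put a different expression on top, which is the sense in which combinations may surface expressions that no single scheme ranks highly.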

Do Peers See More in a Paper Than Its Authors?

Author: Anna Divoli
Marti A. Hearst
Preslav Nakov
Publication venue: Hindawi Limited
Publication date: 01/01/2012

Recent years have shown a gradual shift in the content of freely accessible biomedical publications, from titles and abstracts to full text. This has enabled new forms of automatic text analysis and has given rise to some interesting questions: How informative is the abstract compared to the full text? What important information in the full text is not present in the abstract? What should a good summary contain that is not already in the abstract? Do authors and peers see an article differently? We answer these questions by comparing the information content of the abstract to that in citances, i.e., sentences containing citations to that article. We contrast the important points of an article as judged by its authors with those seen by its peers. Focusing on the area of molecular interactions, we perform manual and automatic analysis and find that the set of all citances to a target article not only covers most of the information (entities, functions, experimental methods, and other biological concepts) found in its abstract, but also contains 20% more concepts. We further present a detailed summary of the differences across information types, and we examine the effects that other citations and time have on the content of citances.

Crossref

Directory of Open Access Journals

PubMed Central

eScholarship - University of California

Ontology-Based Clinical Information Extraction Using SNOMED CT

Author: Li Jun
Publication venue: DigitalCommons@TMC
Publication date: 15/08/2018

Extracting and encoding clinical information captured in unstructured clinical documents with standard medical terminologies is vital to enable secondary use of clinical data from practice. SNOMED CT is the most comprehensive medical ontology, with broad types of concepts and detailed relationships, and it has been widely used in many clinical applications. However, few studies have investigated the use of SNOMED CT in clinical information extraction. In this dissertation, we developed a fine-grained information model based on SNOMED CT and built novel information extraction systems that recognize clinical entities, identify their relations, and encode them as SNOMED CT concepts. Our evaluation shows that such ontology-based information extraction systems using SNOMED CT could achieve state-of-the-art performance, indicating their potential in clinical natural language processing.

DigitalCommons@The Texas Medical Center
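As a point of comparison for encoding clinical text as SNOMED CT concepts, the simplest baseline is a dictionary lookup that maps the longest matching phrase to a concept. The sketch below is that baseline, not the dissertation's information extraction systems; the lexicon and its concept IDs are placeholders, not real SNOMED CT content.

```python
import re


def encode_mentions(text, lexicon, max_len=4):
    """Greedy longest-match lookup of lower-cased token n-grams in a term-to-concept lexicon."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    results, i = [], 0
    while i < len(tokens):
        # Try the longest candidate phrase starting at token i first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in lexicon:
                results.append((phrase, lexicon[phrase]))
                i += n
                break
        else:
            i += 1  # no phrase starting here matched; move on one token
    return results


# Placeholder concept IDs; these are NOT real SNOMED CT identifiers.
lexicon = {
    "chest pain": "CONCEPT_1",
    "pain": "CONCEPT_2",
    "myocardial infarction": "CONCEPT_3",
}
print(encode_mentions("Patient reports chest pain, history of myocardial infarction.", lexicon))
# [('chest pain', 'CONCEPT_1'), ('myocardial infarction', 'CONCEPT_3')]
```

Preferring the longest match keeps "chest pain" from being encoded as the more general "pain"; a real system would also need to handle negation, abbreviations, and relations between entities, which this lookup ignores.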

Large-scale biomedical concept recognition: an evaluation of current automatic annotators and their parameters

Author: A Aronson
A Doms
A Jimeno
A Koike
A Sokolov
AT McCray
B Settles
Benjamin Garcia
C Brewster
C Jonquet
C Roeder
C Verspoor
Christophe Roeder
Christopher Funk
D Ferrucci
D Hancock
D Rebholz-Schuhmann
DA Natale
DL Wheeler
DS DeLuca
FM Couto
H Liu
H Yu
HM Muller
IBM
J Bard
JC Denny
JG Caporaso
K Bretonnel Cohen
K Degtyarenko
K Eilbeck
K Verspoor
Karin Verspoor
KB Cohen
L Hunter
L Reeve
L Yao
Lawrence E Hunter
M Bada
M Krallinger
M Tanenblatt
Michael Bada
MJ Schuemie
N Kang
N Shah
The Gene Ontology Consortium
P Khatri
PV Ogren
Q Zou
R Leaman
S Ray
S Van Landeghem
SA Stewart
T Rocktaschel
William Baumgartner
WW Chu
Publication venue: Springer Science and Business Media LLC

Crossref

Automatic extraction of gene and protein synonyms from MEDLINE and journal articles.

Author: Friedman Carol
Hatzivassiloglou Vasileios
Rzhetsky Andrey
Wilbur W. John
Yu Hong
Publication venue: American Medical Informatics Association
Publication date: 01/01/2002

Genes and proteins are often associated with multiple names, and more names are added as new functional or structural information is discovered. Because authors often alternate between these synonyms, information retrieval and extraction benefit from identifying these synonymous names. We have developed a method to automatically extract synonymous gene and protein names from MEDLINE and journal articles. We first identified patterns that authors use to list synonymous gene and protein names. We developed SGPE (for synonym extraction of gene and protein names), a software program that recognizes these patterns and extracts candidate synonymous terms from MEDLINE abstracts and full-text journal articles. SGPE then applies a sequence of filters that automatically screen out terms that are not gene or protein names. Our evaluation showed an overall precision of 71% on both MEDLINE and journal articles, and 90% precision on the more suitable full-text articles alone.

CiteSeerX

PubMed Central
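SGPE's two stages, pattern matching followed by filtering, can be sketched with regular expressions. The two patterns and the looks_like_gene_name filter below are hypothetical stand-ins chosen for illustration; they are not SGPE's actual patterns or filters.

```python
import re

# Hypothetical patterns; SGPE's own pattern set and filters are not reproduced here.
SYNONYM_PATTERNS = [
    # "X (also known as Y)", "X (also called Y)", "X (or Y)"
    re.compile(r"([A-Za-z0-9-]+) \((?:also known as|also called|or) ([A-Za-z0-9-]+)\)"),
    # "X, also known as Y" and "X, also called Y"
    re.compile(r"([A-Za-z0-9-]+), also (?:known as|called) ([A-Za-z0-9-]+)"),
]


def looks_like_gene_name(term):
    """Hypothetical crude filter: keep terms that contain a digit or are entirely upper case."""
    return any(c.isdigit() for c in term) or term.isupper()


def extract_synonyms(text):
    """Return candidate (name, synonym) pairs that match a pattern and pass the filter."""
    pairs = []
    for pattern in SYNONYM_PATTERNS:
        for name, synonym in pattern.findall(text):
            if looks_like_gene_name(name) and looks_like_gene_name(synonym):
                pairs.append((name, synonym))
    return pairs


text = ("We studied p53 (also known as TP53) and MDM2, also called HDM2, in cells. "
        "The protein (or complex) binds DNA.")
print(extract_synonyms(text))
# [('p53', 'TP53'), ('MDM2', 'HDM2')]
```

The "protein (or complex)" match is produced by the first pattern but removed by the filter, which illustrates why a pattern stage alone would yield many candidates that are not gene or protein names.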