Identifying gene and protein mentions in text using conditional random fields

Author: McDonald Ryan
Pereira Fernando
Publication venue: BioMed Central
Publication date: 01/01/2005

We present a model for tagging gene and protein mentions in text using the probabilistic sequence tagging framework of conditional random fields (CRFs). Conditional random fields model the probability P(t|o) of a tag sequence given an observation sequence directly, and have previously been employed successfully for other tagging tasks. The mechanics of CRFs and their relationship to maximum entropy are discussed in detail. We employ a diverse feature set containing standard orthographic features combined with expert features in the form of gene and biological term lexicons to achieve a precision of 86.4% and recall of 78.7%. An analysis of the contribution of the various features of the model is provided.
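
As a rough illustration of what modeling P(t|o) directly means, the sketch below scores a tag sequence with weighted feature functions and normalizes by brute-force enumeration over all tag sequences. It is not the authors' implementation: the tag set, the toy lexicon, the feature names and the weights are all assumptions made for this example, and a real CRF would compute the normalizer with dynamic programming rather than enumeration.

```python
import itertools
import math

TAGS = ["O", "GENE"]          # assumed two-tag scheme, for illustration only
LEXICON = {"brca1", "p53"}    # toy stand-in for a gene lexicon


def score(tags, tokens, weights):
    """Unnormalized log-score: sum of weighted features over all positions."""
    total = 0.0
    prev = "START"
    for tag, tok in zip(tags, tokens):
        # Transition feature on (previous tag, current tag).
        total += weights.get(("trans", prev, tag), 0.0)
        # Orthographic feature: the token contains a digit.
        if any(ch.isdigit() for ch in tok):
            total += weights.get(("has_digit", tag), 0.0)
        # Lexicon feature: the token appears in the toy lexicon.
        if tok.lower() in LEXICON:
            total += weights.get(("in_lexicon", tag), 0.0)
        prev = tag
    return total


def crf_probability(tags, tokens, weights):
    """P(t|o) = exp(score(t, o)) / Z(o), with Z(o) summed over every tag sequence."""
    z = sum(
        math.exp(score(cand, tokens, weights))
        for cand in itertools.product(TAGS, repeat=len(tokens))
    )
    return math.exp(score(tags, tokens, weights)) / z


# Hypothetical weights; in practice these are learned from annotated text.
example_weights = {
    ("has_digit", "GENE"): 1.5,
    ("in_lexicon", "GENE"): 2.0,
    ("trans", "O", "O"): 0.5,
}
example_tokens = ["the", "p53", "gene"]
p_gene = crf_probability(("O", "GENE", "O"), example_tokens, example_weights)
```

Enumeration is exponential in sentence length, so it only serves to make the definition concrete on very short inputs.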

ScholarlyCommons@Penn

Automatically annotating documents with normalized gene lists

Author: Crim Jeremiah
McDonald Ryan
Pereira Fernando
Publication venue: BioMed Central
Publication date: 01/01/2005

BACKGROUND: Document gene normalization is the problem of creating a list of unique identifiers for genes that are mentioned within a document. Automating this process has many potential applications in both information extraction and database curation systems. Here we present two separate solutions to this problem. The first is primarily based on standard pattern matching and information extraction techniques. The second and more novel solution uses a statistical classifier to recognize valid gene matches from a list of known gene synonyms. RESULTS: We compare the results of the two systems, analyze their merits and argue that the classification-based system is preferable for many reasons including performance, simplicity and robustness. Our best systems attain a balanced precision and recall in the range of 74%–92%, depending on the organism.
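
The matching step that both approaches start from can be sketched as a lookup of known synonyms that returns a deduplicated identifier list. This is a minimal sketch, not the paper's system: the synonym table and the `GENE:000x` identifiers are made up for illustration, and the classifier that the second approach uses to accept or reject each candidate match is not shown.

```python
import re

# Hypothetical synonym lexicon: surface form -> made-up identifier.
SYNONYMS = {
    "tumor protein p53": "GENE:0001",
    "p53": "GENE:0001",
    "tp53": "GENE:0001",
    "brca1": "GENE:0002",
}


def normalize_genes(text, synonyms=SYNONYMS):
    """Return the sorted list of unique identifiers whose synonyms occur in text."""
    lowered = text.lower()
    found = set()
    for surface, gene_id in synonyms.items():
        # Require non-word characters (or string edges) around the match,
        # so that "p53" does not fire inside "p530".
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(gene_id)
    return sorted(found)


ids = normalize_genes("Mutations in TP53 and BRCA1 were found; p53 is a tumor suppressor.")
```

Several synonyms map to the same identifier, which is why the result is built as a set before being returned as a list.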

Directory of Open Access Journals

ScholarlyCommons@Penn

How to make the most of NE dictionaries in statistical NER

Author: John McNaught
Sophia Ananiadou
Yoshimasa Tsuruoka
Yutaka Sasaki
Publication venue: BioMed Central
Publication date: 01/01/2008

The University of Manchester - Institutional Repository

Biomedical Named Entity Recognition: A Review

Author: Ahmad Kamsuriah
Alshaikhdeeb Basel
Publication venue: Insight Society
Publication date: 22/12/2016

Biomedical Named Entity Recognition (BNER) is the task of identifying biomedical instances such as chemical compounds, genes, proteins, viruses, disorders, DNAs and RNAs. The key challenge in BNER lies in the methods used for extracting such entities. Most methods used for BNER rely on Supervised Machine Learning (SML) techniques. In SML techniques, features play an essential role in improving the effectiveness of the recognition process. Features can be defined as a set of discriminating and distinguishing characteristics that indicate the occurrence of an entity. Accordingly, features should generalize, that is, discriminate entities correctly even on new and unseen samples. Several studies have examined the role of features in identifying named entities. However, with the surge of biomedical research, there is a strong demand to explore biomedical features. This paper presents a review of the features that can be used for BNER, examining various types including morphological features, dictionary-based features, lexical features and distance-based features.
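
To make these feature families concrete, here is a small sketch of a per-token feature extractor covering morphological, dictionary-based and lexical (context word) features. The feature names, the toy lexicon and the boundary markers are assumptions for this example rather than the feature set reviewed in the paper, and distance-based features are left out.

```python
def token_features(tokens, i, lexicon):
    """Build a feature dictionary for tokens[i]; lexicon holds lowercased entries."""
    tok = tokens[i]
    return {
        # Lexical features: the token itself and its neighbours.
        "lower": tok.lower(),
        "prev_lower": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next_lower": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
        # Morphological / orthographic features.
        "prefix3": tok[:3],
        "suffix3": tok[-3:],
        "is_capitalized": tok[:1].isupper(),
        "has_digit": any(ch.isdigit() for ch in tok),
        "has_hyphen": "-" in tok,
        # Dictionary-based feature.
        "in_lexicon": tok.lower() in lexicon,
    }


feats = token_features(["The", "IL-2", "receptor"], 1, {"il-2"})
```

A dictionary of this kind can be passed to most supervised taggers after one-hot encoding of the string-valued entries.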

International Journal on Advanced Science, Engineering and Information Technology

Recognition of protein/gene names from text using an ensemble of classifiers

Author: Shen Dan
Su Jian
Tan SoonHeng
Zhang Jie
Zhou GuoDong
Publication venue: BioMed Central
Publication date: 01/01/2005

This paper proposes an ensemble of classifiers for biomedical name recognition in which three classifiers, one Support Vector Machine and two discriminative Hidden Markov Models, are combined effectively using a simple majority voting strategy. In addition, we incorporate three post-processing modules, including an abbreviation resolution module, a protein/gene name refinement module and a simple dictionary matching module, into the system to further improve the performance. Evaluation shows that our system achieves the best performance from among 10 systems with a balanced F-measure of 82.58 on the closed evaluation of the BioCreative protein/gene name recognition task (Task 1A).
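
A simple majority vote over the outputs of three taggers could look like the sketch below. The abstract does not say at what granularity the votes are taken or how ties are broken, so token-level voting and the fallback to the first classifier when no tag has a strict majority are both assumptions, as are the example tag sequences.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-token tags from several classifiers by strict majority.

    predictions: list of tag sequences, one per classifier, all the same length.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("all classifiers must tag the same number of tokens")
    voted = []
    for tags in zip(*predictions):
        top_tag, top_count = Counter(tags).most_common(1)[0]
        if top_count * 2 > len(tags):
            voted.append(top_tag)
        else:
            # Assumed tie-break: trust the first classifier in the list.
            voted.append(tags[0])
    return voted


# Hypothetical outputs from one SVM and two HMM taggers on four tokens.
svm = ["O", "B", "I", "O"]
hmm1 = ["O", "B", "O", "O"]
hmm2 = ["O", "O", "I", "O"]
combined = majority_vote([svm, hmm1, hmm2])
```

Voting token by token can produce tag sequences that no single classifier proposed, which is one reason a post-processing stage may be useful afterwards.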

Directory of Open Access Journals

ScholarBank@NUS

A System for Identifying Named Entities in Biomedical Text: how Results From two Evaluations Reflect on Both the System and the Evaluations

Author: Dingare Shipra
Finkel Jenny
Grover Claire
Manning Christopher
Nissim Malvina
Publication venue: Hindawi Publishing Corporation
Publication date: 01/01/2005

We present a maximum entropy-based system for identifying named entities (NEs) in biomedical abstracts and present its performance in the only two biomedical named entity recognition (NER) comparative evaluations that have been held to date, namely BioCreative and Coling BioNLP. Our system obtained an exact match F-score of 83.2% in the BioCreative evaluation and 70.1% in the BioNLP evaluation. We discuss our system in detail, including its rich use of local features, attention to correct boundary identification, innovative use of external knowledge resources, including parsing and web searches, and rapid adaptation to new NE sets. We also discuss in depth problems with data annotation in the evaluations which caused the final performance to be lower than optimal.
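
Exact match scoring counts a predicted entity as correct only when both of its boundaries and its type agree with the gold annotation, which is why boundary identification matters so much for the reported F-scores. The sketch below assumes entities are represented as `(start, end, type)` tuples; that representation and the example spans are illustrative and are not taken from either evaluation's official scorer.

```python
def exact_match_scores(gold, predicted):
    """Return (precision, recall, F1) where only identical spans count as hits."""
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical spans: the second prediction has the right type and start,
# but its end boundary is off by one, so it earns no credit.
gold_spans = [(0, 2, "protein"), (5, 6, "DNA"), (9, 10, "protein")]
pred_spans = [(0, 2, "protein"), (5, 7, "DNA")]
precision, recall, f1 = exact_match_scores(gold_spans, pred_spans)
```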