22 research outputs found

How to make the most of NE dictionaries in statistical NER

Author: A McCallum
B Settles
D Okanohara
EF Tjong Kim Sang
GD Zhou
J Aoe
J Finkel
J Kazama
J Lafferty
J-D Kim
JD Kim
John McNaught
K Franzen
K Fukuda
K Yamamoto
K-M Park
KJ Lee
L Tanabe
LE Baum
M Rössler
N Collier
S Kim
Sophia Ananiadou
T Kudo
TH Tsai
Y Song
Yoshimasa Tsuruoka
Yutaka Sasaki
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

The University of Manchester - Institutional Repository

Analysing Entity Type Variation across Biomedical Subdomains

Author: Ananiadou Sophia
Batista-Navarro Riza Theresa
Mihaila Claudiu
Publication date: 26/05/2012

The University of Manchester - Institutional Repository

Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently, the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus, it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and associate structures where possible, to define the reported yeast metabolome. With ten-fold cross-validation on a manually annotated corpus, our recognition tool generates an F-score of 78.49 (precision of 83.02) and demonstrates greater suitability in identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials.
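
The abstract above reports an F-score and a precision but no recall. As a rough illustration only, the recall can be back-calculated if one assumes the reported F-score is the balanced F1 measure, F1 = 2PR / (P + R); the abstract does not state which F-measure was used, so the result is an estimate under that assumption and not a figure from the paper. The function name `recall_from_f1` is hypothetical.

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Solve F1 = 2PR / (P + R) for R.

    Assumes a balanced F1 score; f1 and precision must be on the same
    scale (here both are percentages).
    """
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("no valid recall: F1 must be below 2 * precision")
    return f1 * precision / denominator


# Values quoted in the abstract (percentages).
reported_f1 = 78.49
reported_precision = 83.02

recall = recall_from_f1(reported_f1, reported_precision)
print(f"implied recall (assuming balanced F1): {recall:.2f}")
```

Under the balanced-F1 assumption, the implied recall comes out a little below the reported precision, which is what one would expect when the F-score is lower than the precision.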

Crossref

PubMed Central

The University of Manchester - Institutional Repository

A Survey of Biological Entity Recognition Approaches

Author: Gurinder Pal Singh Gosal
Publication venue: 'Auricle Technologies, Pvt., Ltd.'
Publication date: 30/09/2015

There has been growing interest in Named Entity Recognition (NER), and much research has been done in this direction over the last two decades. In particular, considerable progress has been made in the biomedical domain, with emphasis on identifying domain-specific entities; there the task is often known as Biological Named Entity Recognition (BER). BER has proved challenging for several reasons identified by many researchers. The recognition of biological entities in text and the extraction of relationships between them have paved the way for more complex text-mining tasks and further applications. This paper looks at the challenges researchers perceive in the BER task and surveys work in the BER domain across the multiple approaches available for the task.

International Journal on Recent and Innovation Trends in Computing and Communication

Automatic extraction of angiogenesis bioprocess from text

Author: Ananiadou
Hunter
Hvidsten
I. Barrett
I. Dix
I. McKendrick
J. Tsujii
Kola
S. Ananiadou
T. French
X. Wang
Publication venue: Oxford University Press
Publication date: 01/09/2011

Motivation: Understanding key biological processes (bioprocesses) and their relationships with constituent biological entities and pharmaceutical agents is crucial for drug design and discovery. One way to harvest such information is to search the literature. However, bioprocesses are difficult to capture because they may appear in text in a variety of expressions. Moreover, a bioprocess is often composed of a series of bioevents, where a bioevent denotes changes to one or a group of cells involved in the bioprocess. Such bioevents are often used to refer to bioprocesses in text, which current techniques, relying solely on specialized lexicons, struggle to find.

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Named Entity Recognition for Bacterial Type IV Secretion Systems

Author: Ananiadou Sophia
Black William
Gillespie Joseph J.
Kolluru BalaKrishna
Levow Gina-Anne
Mao Chunhong
Pyysalo Sampo
Sobral Bruno
Sullivan Dan
Tsujii Junichi
Publication venue: Public Library of Science
Publication date: 01/01/2011

Research on specialized biological systems is often hampered by a lack of consistent terminology, especially across species. In bacterial Type IV secretion systems, genes within one set of orthologs may have over a dozen different names. Classifying research publications based on biological processes, cellular components, molecular functions, and microorganism species should improve the precision and recall of literature searches, allowing researchers to keep up with the exponentially growing literature through resources such as the Pathosystems Resource Integration Center (PATRIC, patricbrc.org). We developed named entity recognition (NER) tools for four entities related to Type IV secretion systems: 1) bacteria names, 2) biological processes, 3) molecular functions, and 4) cellular components. These four entities are important to pathogenesis and virulence research but have received less attention than other entities, e.g., genes and proteins. Based on an annotated corpus, large domain terminological resources, and machine learning techniques, we developed recognizers for these entities. High accuracy rates (>80%) are achieved for bacteria, biological processes, and molecular function. Contrastive experiments highlighted the effectiveness of alternate recognition strategies; results of term extraction on contrasting document sets demonstrated the utility of these classes for identifying T4SS-related documents.

CiteSeerX

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository