Search CORE

239 research outputs found

Analysing Entity Type Variation across Biomedical Subdomains

Author: Ananiadou Sophia
Batista-Navarro Riza Theresa
Mihaila Claudiu
Publication date: 26/05/2012

The University of Manchester - Institutional Repository

Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

Author: A Copestake
A Tiwari
B Florian
B Ludascher
B Mellebeek
B Muller
BalaKrishna Kolluru
C Kolarik
C Kolrik
C Nobata
C Steinbeck
CJ Rupp
D Banville
D Ferrucci
D Jiao
I Taylor
J Shon
J Wren
JA Townsend
Junichi Tsujii
K Hettne
Lezan Hawizy
M Hassan
N Kemp
P Corbett
P Murray-Rust
Peter Murray-Rust
R Klinger
SG Vellay
Sophia Ananiadou
T Kuhn
T Oinn
Tim J. Hubbard
WJ Wilbur
Y Kano
Y Miyao
Y Tsuruoka
Publication venue: Public Library of Science
Publication date: 01/01/2011

Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored OSCAR, an established named entity recogniser in the chemistry domain, and studied the impact of each component on overall performance. We developed two reconfigurable workflows from OSCAR using U-Compare, an interoperable text mining framework. These workflows can be altered using the drag-and-drop mechanism of U-Compare's graphical user interface. They also provide a platform for studying the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation leads to slightly better named entity recognition (NER) accuracy than the alternatives. Poor tokenisation produces poorer input to the classifier components, which in turn increases Type I (false positive) or Type II (false negative) errors, lowering overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser while retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84%, compared with 84.23% for OSCAR.
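A toy sketch of the abstract's central point, that tokenisation quality propagates into NER errors and hence into the F-score. It rests on loud assumptions: the gazetteer `CHEMICALS` is a hypothetical stand-in for OSCAR's MEMM classifier, both tokenisers are illustrative (neither is OSCAR's nor U-Compare's), and entities are compared as strings rather than character spans for brevity. Swapping only the tokeniser while keeping the "classifier" fixed mirrors, in miniature, how the workflows were reconfigured.

```python
import re
from typing import Callable, List, Set

# Hypothetical stand-in for a trained classifier: a tiny gazetteer.
# OSCAR's real component is an MEMM, not a lookup table.
CHEMICALS: Set[str] = {"benzene", "toluene", "acetone"}


def whitespace_tokenise(text: str) -> List[str]:
    """Naive tokeniser: punctuation stays glued to words ('benzene,')."""
    return text.split()


def punct_tokenise(text: str) -> List[str]:
    """Splits punctuation into its own tokens: 'benzene,' -> 'benzene', ','."""
    return re.findall(r"[\w-]+|[^\w\s]", text)


def tag(tokens: List[str]) -> Set[str]:
    """Return the tokens the stand-in classifier labels as chemicals."""
    return {t.lower() for t in tokens if t.lower() in CHEMICALS}


def f_score(predicted: Set[str], gold: Set[str]) -> float:
    """F1 over entity strings (not spans, for brevity)."""
    tp = len(predicted & gold)   # correctly recognised entities
    fp = len(predicted - gold)   # Type I errors (false positives)
    fn = len(gold - predicted)   # Type II errors (false negatives)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def run_workflow(tokenise: Callable[[str], List[str]], text: str, gold: Set[str]) -> float:
    """A minimal 'workflow': the tokeniser is swappable, the classifier fixed."""
    return f_score(tag(tokenise(text)), gold)


text = "We mixed benzene, toluene and acetone."
gold = {"benzene", "toluene", "acetone"}
print(run_workflow(whitespace_tokenise, text, gold))  # → 0.5
print(run_workflow(punct_tokenise, text, gold))       # → 1.0
```

With the whitespace tokeniser, `benzene,` and `acetone.` keep their punctuation, so two gold entities are missed (Type II errors) and the F-score drops to 0.5; splitting punctuation recovers all three. These numbers describe the toy sentence only, not the Sciborg or PubMed results.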

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

Biomedical Named Entity Recognition: A Review

Author: Ahmad Kamsuriah
Alshaikhdeeb Basel
Publication venue: Insight Society
Publication date: 22/12/2016

Biomedical Named Entity Recognition (BNER) is the task of identifying biomedical instances such as chemical compounds, genes, proteins, viruses, disorders, DNA and RNA. The key challenge in BNER lies in the methods used to extract such entities. Most BNER methods rely on Supervised Machine Learning (SML) techniques, in which features play an essential role in the effectiveness of the recognition process. Features are sets of discriminating characteristics that indicate the occurrence of an entity; to be useful, they must generalize, that is, discriminate entities correctly even in new, unseen samples. Several studies have examined the role of features in identifying named entities. However, with the growth of biomedical research, there is a pressing need to explore biomedical features. This paper reviews the features that can be used for BNER, examining morphological, dictionary-based, lexical and distance-based features.
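A minimal sketch of how the four feature families named above might be computed for one token before being fed to a supervised tagger such as a CRF. Assumptions: `LEXICON` is a hypothetical dictionary, and "distance-based" is interpreted here as the edit distance to the closest lexicon entry; the review may define it differently.

```python
import re
from typing import Dict, List, Set

# Hypothetical lexicon; a real system would load a curated gene/protein list.
LEXICON: Set[str] = {"p53", "brca1", "insulin"}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance using one rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def word_shape(tok: str) -> str:
    """Collapse runs: digits -> '0', upper-case -> 'A', lower-case -> 'a'."""
    shape = re.sub(r"[0-9]+", "0", tok)
    shape = re.sub(r"[A-Z]+", "A", shape)
    return re.sub(r"[a-z]+", "a", shape)


def token_features(tokens: List[str], i: int,
                   lexicon: Set[str] = LEXICON) -> Dict[str, object]:
    tok = tokens[i]
    low = tok.lower()
    return {
        # Morphological features: the surface form of the token
        "has_upper": any(c.isupper() for c in tok),
        "has_digit": any(c.isdigit() for c in tok),
        "has_hyphen": "-" in tok,
        "suffix3": low[-3:],
        "shape": word_shape(tok),
        # Dictionary-based feature: exact lexicon membership
        "in_lexicon": low in lexicon,
        # Lexical features: the word and its immediate neighbours
        "word": low,
        "prev_word": tokens[i - 1].lower() if i > 0 else "<S>",
        "next_word": tokens[i + 1].lower() if i + 1 < len(tokens) else "</S>",
        # Distance-based feature (assumed interpretation): closeness to lexicon
        "lex_distance": min(edit_distance(low, e) for e in lexicon),
    }


tokens = ["Mutations", "in", "BRCA2", "and", "p53"]
print(token_features(tokens, 2))
```

For the unseen variant `BRCA2`, `in_lexicon` is False but `lex_distance` is 1 (one substitution away from `brca1`), which illustrates why features that generalize beyond exact matches matter for unseen samples.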

International Journal on Advanced Science, Engineering and Information Technology

Chemical entity extraction using CRF and an ensemble of extractors

Publication venue: Springer
Publication date: 19/01/2015

Springer - Publisher Connector

TechMiner: Extracting Technologies from Academic Publications

Author: A Bandrowski
C Bizer
C Fellbaum
F Osborne
F Ronzano
K Scanning Douw
P Corbett
R Usbeck
S Peroni
T Groza
W Huang
Publication date: 01/01/2016

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach that combines NLP, machine learning and semantic technologies to mine technologies from research publications and generate an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard, and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features significantly improve both recall and precision.

Crossref

Online Research @ Cardiff

Open Research Online (The Open University)

Chemical named entities recognition: a review on approaches and applications

Author: Eltyeb Safaa
Salim Naomie
Publication venue: Springer Science and Business Media LLC
Publication date: 01/01/2014

The rapid growth in the volume of published digital information in all disciplines has created a pressing need for techniques that simplify its use. The chemistry literature is rich in information about chemical entities. Extracting molecules and their related properties and activities from the scientific literature, and text mining these data to determine contextual relationships, helps research scientists, particularly those in drug development. One of the most important challenges in chemical text mining is recognising the chemical entities mentioned in texts. In this review, the authors briefly introduce the fundamental concepts of chemical literature mining, the textual contents of chemical documents, and the methods of naming chemicals in documents. We outline dictionary-based, rule-based, machine learning and hybrid chemical named entity recognition approaches, together with their applied solutions. We end with an outlook on the pros and cons of these approaches and the types of chemical entities extracted.
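To make the contrast between dictionary-based, rule-based and hybrid approaches concrete, here is a minimal hybrid sketch: dictionary lookup first, with a regular-expression rule for simple molecular formulas as a fallback. `CHEM_DICT` and `FORMULA_RE` are hypothetical illustrations, not taken from any system the review covers, and the formula rule is deliberately crude so that its trade-offs are visible.

```python
import re
from typing import List, Tuple

# Hypothetical dictionary of trivial names; real systems use large resources.
CHEM_DICT = {"aspirin", "caffeine", "ethanol"}

# Illustrative rule for simple molecular formulas (H2O, C6H12O6): a run of
# element-like symbols, each optionally followed by a count. The lookahead
# requires at least one digit, so "NaCl" is missed and "BRCA1" is a false hit.
FORMULA_RE = re.compile(r"(?=.*\d)(?:[A-Z][a-z]?\d*)+")


def tag_chemicals(text: str) -> List[Tuple[str, str]]:
    """Hybrid tagger: dictionary lookup first, formula rule as a fallback."""
    hits = []
    for tok in re.findall(r"[A-Za-z0-9]+", text):
        if tok.lower() in CHEM_DICT:
            hits.append((tok, "dictionary"))   # dictionary-based match
        elif FORMULA_RE.fullmatch(tok):
            hits.append((tok, "rule"))         # rule-based match
    return hits


print(tag_chemicals("Aspirin was dissolved in H2O with 5 g of C6H12O6."))
# → [('Aspirin', 'dictionary'), ('H2O', 'rule'), ('C6H12O6', 'rule')]
```

The dictionary gives high precision but only covers names it already lists, while the rule generalizes to unseen formulas at the cost of false positives such as gene symbols, one of the pros-and-cons trade-offs the review discusses.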

Springer - Publisher Connector

PubMed Central

Universiti Teknologi Malaysia Institutional Repository