Search CORE


How to make the most of NE dictionaries in statistical NER

Author: A McCallum
B Settles
D Okanohara
EF Tjong Kim Sang
GD Zhou
J Aoe
J Finkel
J Kazama
J Lafferty
J-D Kim
JD Kim
John McNaught
K Franzen
K Fukuda
K Yamamoto
K-M Park
KJ Lee
L Tanabe
LE Baum
M Rössler
N Collier
S Kim
Sophia Ananiadou
T Kudo
TH Tsai
Y Song
Yoshimasa Tsuruoka
Yutaka Sasaki
Publication venue: BioMed Central
Publication date: 01/01/2008

Crossref

Springer - Publisher Connector

PubMed Central

The University of Manchester - Institutional Repository

Adapting a relation extraction pipeline for the BioCreAtIvE II task

Author: Grover Claire
Haddow Barry
Klein Ewan
Matthews Michael
Nielsen Leif Arda
Tobin Richard
Wang Xinglong
Publication date: 01/01/2007

Edinburgh Research Explorer

Enhanced services for targeted information retrieval by event extraction and data mining

Author: Jungermann Felix
Morik Katharina

Whereas Information Retrieval (IR) and Text Categorization deliver a set of (ranked) documents in response to a query, users of large document collections would rather receive answers. Question answering from text was already a goal of the Message Understanding Conferences. Since then, the task of text understanding has been reduced to several more tractable tasks, most prominently Named Entity Recognition (NER) and Relation Extraction. These pieces can now be put together to form enhanced services on top of an IR system. In this paper, we present a framework that combines standard IR with machine learning and (pre-)processing for NER in order to extract events from a large document collection. Some questions can already be answered by individual events; others require an analysis of a set of events. Hence, the extracted events become the input to a further machine learning process, which delivers the final answer to the user's question. Our case study is the public collection of minutes of plenary sessions of the German parliament and of petitions to the German parliament.
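The abstract describes a staged pipeline: retrieve documents, recognise named entities, extract events, then analyse the set of events to answer a question. The following is only a toy sketch of that flow under heavy assumptions: Boolean AND keyword retrieval stands in for standard IR, a fixed entity list stands in for the learned NER step, an "entity immediately followed by a trigger word" rule stands in for event extraction, and a simple count stands in for the second machine learning process. The documents, the names `Mueller` and `Schmidt`, and the trigger words are all hypothetical; this is not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical toy collection standing in for parliamentary minutes and petitions.
DOCS = {
    1: "Mueller supports the petition on energy prices",
    2: "Schmidt rejects the petition on energy prices",
    3: "Mueller supports the budget proposal",
}
ENTITIES = {"Mueller", "Schmidt"}   # hypothetical stand-in for a trained NER model
TRIGGERS = {"supports", "rejects"}  # hypothetical event trigger words


@dataclass(frozen=True)
class Event:
    actor: str
    action: str
    doc_id: int


def retrieve(query: str, docs: dict[int, str]) -> list[int]:
    """Boolean AND retrieval: ids of documents containing every query term."""
    terms = set(query.lower().split())
    return [doc_id for doc_id, text in docs.items() if terms <= set(text.lower().split())]


def extract_events(doc_id: int, text: str) -> list[Event]:
    """Rule-based stand-in: an entity immediately followed by a trigger word is an event."""
    tokens = text.split()
    return [
        Event(tok, nxt, doc_id)
        for tok, nxt in zip(tokens, tokens[1:])
        if tok in ENTITIES and nxt in TRIGGERS
    ]


def answer(query: str, docs: dict[int, str]) -> Counter:
    """Aggregate events from the retrieved documents (stand-in for the second learner)."""
    events = [e for doc_id in retrieve(query, docs) for e in extract_events(doc_id, docs[doc_id])]
    return Counter(e.action for e in events)


print(answer("petition energy", DOCS))  # Counter({'supports': 1, 'rejects': 1})
```

Keeping each stage a separate function mirrors the abstract's point that the extracted events, rather than the raw documents, become the input to the final analysis step.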

Research Papers in Economics

NERD: Evaluating Named Entity Recognition Tools in the Web of Data

Author: Rizzo G.
Troncy R.
Publication date: 01/01/2011

EURECOM Repository

PORTO Publications Open Repository TOrino

Mining metabolites: extracting the yeast metabolome from the literature

Author: Chikashi Nobata
CR Batchelor
D Banville
D Broadhurst
D Jiao
DB Kell
Douglas B. Kell
GA Eller
J Brecher
J Finkel
J Townsend
J Wisniewski
J Wren
JD Kim
Jun’ichi Tsujii
K Degtyarenko
KM Hettne
L Goebels
M Hucka
M Kanehisa
M Krallinger
N Okazaki
P Corbett
P Mendes
Paul D. Dobson
PD Dobson
Pedro Mendes
R Hoffmann
R Klinger
S Ananiadou
Sophia Ananiadou
Syed A. Iqbal
X Wang
Y Kano
Y Miyao
Y Sasaki
Y Tsuruoka
Publication venue: Springer US
Publication date: 01/01/2011

Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently, the field of systems biology has begun to model and simulate metabolic networks, which requires knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature rather than through large-scale experimental surveys, so it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and to associate structures where possible, in order to define the reported yeast metabolome. With ten-fold cross-validation on a manually annotated corpus, our recognition tool achieves an F-score of 78.49 (precision 83.02) and is better suited to identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling the tool to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, have been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials.
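The abstract reports an F-score and a precision but not the recall. Assuming the F-score is the usual balanced F1, the harmonic mean F = 2PR / (P + R), recall can be recovered as R = FP / (2P - F). The sketch below only performs that arithmetic on the two reported figures; the resulting recall is derived here and is not a number stated in the abstract.

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Invert F1 = 2PR / (P + R) for R, assuming balanced F1 (beta = 1)."""
    denominator = 2 * precision - f1
    if denominator <= 0:
        raise ValueError("inconsistent inputs: F1 must be below 2 * precision")
    return f1 * precision / denominator


recall = recall_from_f1(78.49, 83.02)    # percentages as reported in the abstract
print(f"implied recall = {recall:.2f}")  # implied recall = 74.43
```

Under that assumption, the tool trades roughly nine points of recall for its higher precision.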

Crossref

PubMed Central

The University of Manchester - Institutional Repository

Boosting Drug Named Entity Recognition using an Aggregate Classifier

Author: Ananiadou Sophia
Dowsey Andrew W.
Korkontzelos Ioannis
Piliouras Dimitrios
Publication venue: Elsevier BV
Publication date: 17/06/2015

Objective: Drug named entity recognition (NER) is a critical step for complex biomedical NLP tasks such as the extraction of pharmacogenomic, pharmacodynamic and pharmacokinetic parameters. Large quantities of high-quality training data are almost always a prerequisite for employing supervised machine-learning techniques to achieve high classification performance. However, the human labour needed to produce and maintain such resources is a significant limitation. In this study, we improve the performance of drug NER without relying exclusively on manual annotations.
Methods: We perform drug NER using either a small gold-standard corpus (120 abstracts) or no corpus at all. In our approach, we develop a voting system that combines a number of heterogeneous models, based on dictionary knowledge, gold-standard corpora and silver annotations, to enhance performance. To improve recall, we employed genetic programming to evolve 11 regular-expression patterns that capture common drug suffixes and used them as an additional means of recognition.
Materials: Our approach uses a dictionary of drug names (DrugBank), a small manually annotated corpus (the pharmacokinetic corpus), and part of the UKPMC database as raw biomedical text. Gold-standard and silver annotated data are used to train maximum entropy and multinomial logistic regression classifiers.
Results: Aggregating drug NER methods based on gold-standard annotations, dictionary knowledge and patterns improved on the performance of models trained only on gold-standard annotations, achieving a maximum F-score of 95%. In addition, combining models trained on silver annotations, dictionary knowledge and patterns is shown to achieve performance comparable to that of models trained exclusively on gold-standard data. The main reason appears to be the morphological similarities shared among drug names.
Conclusion: We conclude that gold-standard data are not a hard requirement for drug NER. Combining heterogeneous models built on dictionary knowledge can achieve classification performance similar or comparable to that of the best-performing model trained on gold-standard annotations.
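The Methods section describes combining heterogeneous models (dictionary knowledge, silver annotations and suffix patterns) through a voting system. The sketch below shows one simple form of that idea: token-level voting with a minimum-agreement threshold. The paper's actual voting scheme, classifiers and 11 evolved patterns are not given in the abstract, so everything here is an assumption. The dictionary and the "silver" set are tiny hypothetical stand-ins for DrugBank and a silver-trained classifier, and the suffix patterns are common drug-name stems chosen for illustration, not the genetically evolved ones.

```python
import re
from typing import Callable, Sequence

Model = Callable[[str], bool]

# Hypothetical stand-in for a drug dictionary such as DrugBank.
DRUG_DICTIONARY = {"aspirin", "ibuprofen", "warfarin"}

# Hypothetical stand-in for a model trained on silver annotations.
SILVER_LABELLED = {"atorvastatin", "warfarin"}

# Illustrative drug-name stems; NOT the 11 patterns evolved in the paper.
SUFFIX_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"[a-z]+mab", r"[a-z]+pril", r"[a-z]+olol", r"[a-z]+statin")
]


def dictionary_model(token: str) -> bool:
    return token.lower() in DRUG_DICTIONARY


def silver_model(token: str) -> bool:
    return token.lower() in SILVER_LABELLED


def suffix_model(token: str) -> bool:
    # fullmatch: the entire token must be letters ending in one of the stems
    return any(p.fullmatch(token) for p in SUFFIX_PATTERNS)


def vote(token: str, models: Sequence[Model], min_votes: int) -> bool:
    """Label a token as a drug if at least `min_votes` models say so."""
    return sum(model(token) for model in models) >= min_votes


def tag_drugs(tokens: Sequence[str], models: Sequence[Model], min_votes: int = 2) -> list[str]:
    return [t for t in tokens if vote(t, models, min_votes)]


MODELS = [dictionary_model, silver_model, suffix_model]
tokens = "Patients took atorvastatin and warfarin or rituximab".split()
print(tag_drugs(tokens, MODELS))               # ['atorvastatin', 'warfarin']
print(tag_drugs(tokens, MODELS, min_votes=1))  # ['atorvastatin', 'warfarin', 'rituximab']
```

Lowering `min_votes` trades precision for recall: with a threshold of 1, the suffix pattern alone is enough to tag `rituximab`, which loosely mirrors the abstract's point that patterns were added to improve recall.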

Elsevier - Publisher Connector

Edge Hill University Research Information Repository

The University of Manchester - Institutional Repository

Explore Bristol Research