492 research outputs found

    Normalizing biomedical terms by minimizing ambiguity and variability

    Background: One of the difficulties in mapping biomedical named entities, e.g. genes, proteins, chemicals and diseases, to their concept identifiers stems from the potential variability of the terms. Soft string matching is a possible solution to the problem, but its inherent heavy computational cost discourages its use when the dictionaries are large or when real-time processing is required. A less computationally demanding approach is to normalize the terms by using heuristic rules, which enables us to look up a dictionary in constant time regardless of its size. The development of good heuristic rules, however, requires extensive knowledge of the terminology in question and is thus the bottleneck of the normalization approach.
    Results: We present a novel framework for discovering a list of normalization rules from a dictionary in a fully automated manner. The rules are discovered in such a way that they minimize the ambiguity and variability of the terms in the dictionary. We evaluated our algorithm using two large dictionaries: a human gene/protein name dictionary built from BioThesaurus and a disease name dictionary built from UMLS.
    Conclusions: The experimental results showed that automatically discovered rules can perform comparably to carefully crafted heuristic rules in term-mapping tasks, and the computational overhead of rule application is small enough that a very fast implementation is possible. This work will help improve the performance of term-concept mapping tasks in biomedical information extraction, especially when good normalization heuristics for the target terminology are not fully known.
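The constant-time lookup scheme described above can be sketched as follows. The rules here (lowercasing, Greek-letter spelling, punctuation removal) are illustrative stand-ins, not the automatically discovered rules from the paper, and the dictionary entries are hypothetical:

```python
def normalize(term: str) -> str:
    """Apply simple normalization rules so variant forms collapse to one key."""
    term = term.lower()
    term = term.replace("alpha", "a").replace("beta", "b")  # example spelling rule
    return "".join(ch for ch in term if ch.isalnum())       # drop hyphens, spaces

# Build a normalized dictionary once; lookup is then O(1)
# regardless of dictionary size.
dictionary = {"IL-2": "CID:001", "interleukin 2": "CID:001", "TNF-alpha": "CID:002"}
normalized = {normalize(k): v for k, v in dictionary.items()}

def lookup(mention: str):
    """Map a surface mention to its concept identifier, or None."""
    return normalized.get(normalize(mention))
```

Because both dictionary entries and query mentions pass through the same rules, spelling variants such as "Interleukin-2" and "il 2" resolve to the same concept identifier with a single hash lookup.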

    PPLook: an automated data mining tool for protein-protein interaction

    Background: Extracting and visualizing protein-protein interactions (PPIs) from the text literature is a meaningful topic in protein science, as it assists the identification of interactions among proteins. However, there is a lack of tools that extract PPIs, visualize them and classify the results.
    Results: We developed a PPI search system, termed PPLook, which automatically extracts and visualizes protein-protein interactions (PPIs) from text. Given a query protein name, PPLook can search a dataset for other proteins interacting with it by using a keyword-dictionary pattern-matching algorithm, and display topological parameters such as the number of nodes, edges, and connected components. The visualization component of PPLook enables us to view the interaction relationships among the proteins in three-dimensional space based on the OpenGL graphics interface. PPLook can also select a protein semantic class, count the number of proteins of a given semantic class that interact with the query protein, and count the number of articles in which an interaction involving the query protein appears. Moreover, PPLook provides heterogeneous search and a user-friendly graphical interface.
    Conclusions: PPLook is an effective tool for biologists and biosystem developers who need to access PPI information from the literature. PPLook is freely available for non-commercial users at http://meta.usc.edu/softs/PPLook.
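A dictionary/keyword pattern-matching step of the kind described above can be sketched minimally. The protein dictionary and interaction keywords below are toy examples for illustration, not PPLook's actual resources:

```python
import re

# Hypothetical resources: a protein-name dictionary and interaction keywords.
PROTEINS = {"p53", "mdm2", "bax", "akt1"}
INTERACTION_KEYWORDS = {"binds", "interacts", "phosphorylates", "inhibits"}

def find_partners(query, sentences):
    """Return proteins that co-occur with `query` in a sentence that
    also contains an interaction keyword."""
    partners = set()
    for s in sentences:
        tokens = set(re.findall(r"[A-Za-z0-9]+", s.lower()))
        if query.lower() in tokens and tokens & INTERACTION_KEYWORDS:
            partners |= (tokens & PROTEINS) - {query.lower()}
    return partners
```

A real system would add entity disambiguation and visualization on top, but the core retrieval step is this kind of sentence-level dictionary match.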

    Predictability study on the aftershock sequence following the 2011 Tohoku-Oki, Japan, earthquake: first results

    Although no deterministic and reliable earthquake precursor is known to date, we are steadily gaining insight into probabilistic forecasting that draws on space–time characteristics of earthquake clustering. Clustering-based models aiming to forecast earthquakes within the next 24 hours are under test in the global project ‘Collaboratory for the Study of Earthquake Predictability’ (CSEP). The 2011 March 11 magnitude 9.0 Tohoku-Oki earthquake in Japan provides a unique opportunity to test the existing 1-day CSEP models against its unprecedentedly active aftershock sequence. The original CSEP experiment performs tests after the catalogue is finalized to avoid bias due to poor data quality. However, this study departs from that tradition and uses the preliminary catalogue revised and updated by the Japan Meteorological Agency (JMA), which is often incomplete but is immediately available. This study is intended as a first step towards operability-oriented earthquake forecasting in Japan. Encouragingly, at least one model passed the test in most combinations of the target day and the testing method, although the models could not take account of the megaquake in advance and the catalogue used for forecast generation was incomplete. However, it can also be seen that all models have only limited forecasting power for the period immediately after the quake. Our conclusion does not change when the preliminary JMA catalogue is replaced by the finalized one, implying that the models perform stably over the catalogue replacement and are applicable to operational earthquake forecasting. However, we emphasize the need for further research on model improvement to assure the reliability of forecasts for the days immediately after the main quake. Seismicity is expected to remain high in all parts of Japan over the coming years. Our results present a way to answer the urgent need to promote research on time-dependent earthquake predictability to prepare for subsequent large earthquakes in Japan in the near future.

    Text Mining the History of Medicine

    Historical text archives constitute a rich and diverse source of information, which is becoming increasingly readily accessible, due to large-scale digitisation efforts. However, it can be difficult for researchers to explore and search such large volumes of data in an efficient manner. Text mining (TM) methods can help, through their ability to recognise various types of semantic information automatically, e.g., instances of concepts (places, medical conditions, drugs, etc.), synonyms/variant forms of concepts, and relationships holding between concepts (which drugs are used to treat which medical conditions, etc.). TM analysis allows search systems to incorporate functionality such as automatic suggestions of synonyms of user-entered query terms, exploration of different concepts mentioned within search results or isolation of documents in which concepts are related in specific ways. However, applying TM methods to historical text can be challenging, owing to differences in, and the evolution of, vocabulary, terminology, language structure and style compared to more modern text. In this article, we present our efforts to overcome the various challenges faced in the semantic analysis of published historical medical text dating back to the mid-19th century. Firstly, we used evidence from diverse historical medical documents from different periods to develop new resources that provide accounts of the multiple, evolving ways in which concepts, their variants and relationships amongst them may be expressed. These resources were employed to support the development of a modular processing pipeline of TM tools for the robust detection of semantic information in historical medical documents with varying characteristics. We applied the pipeline to two large-scale medical document archives covering wide temporal ranges as the basis for the development of a publicly accessible semantically-oriented search system.
    The novel resources are available for research purposes, while the processing pipeline and its modules may be used and configured within the Argo TM platform.
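The synonym-expansion functionality mentioned above can be sketched simply: a user's query term is expanded with known variant forms before retrieval. The lexicon below is a hypothetical fragment for illustration, not the project's actual historical-medicine resources:

```python
# Hypothetical synonym lexicon mapping a term to its historical variants.
SYNONYMS = {
    "consumption": ["phthisis", "tuberculosis"],
    "dropsy": ["oedema", "edema"],
}

def expand_query(term):
    """Return the query term plus any recorded synonyms/variants."""
    return [term] + SYNONYMS.get(term.lower(), [])

def search(term, documents):
    """Retrieve documents containing the term or any of its variants."""
    variants = {v.lower() for v in expand_query(term)}
    return [d for d in documents if any(v in d.lower() for v in variants)]
```

A query for "consumption" thus also retrieves documents that only mention "phthisis", which is exactly the kind of vocabulary drift that makes historical search hard.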

    Mining metabolites: extracting the yeast metabolome from the literature

    Text mining methods have added considerably to our capacity to extract biological knowledge from the literature. Recently the field of systems biology has begun to model and simulate metabolic networks, requiring knowledge of the set of molecules involved. While genomics and proteomics technologies are able to supply the macromolecular parts list, the metabolites are less easily assembled. Most metabolites are known and reported through the scientific literature, rather than through large-scale experimental surveys. Thus it is important to recover them from the literature. Here we present a novel tool to automatically identify metabolite names in the literature, and associate structures where possible, to define the reported yeast metabolome. With ten-fold cross-validation on a manually annotated corpus, our recognition tool generates an F-score of 78.49 (precision of 83.02) and demonstrates greater suitability in identifying metabolite names than other existing recognition tools for general chemical molecules. The metabolite recognition tool has been applied to the literature covering an important model organism, the yeast Saccharomyces cerevisiae, to define its reported metabolome. By coupling to ChemSpider, a major chemical database, we have identified structures for much of the reported metabolome and, where structure identification fails, been able to suggest extensions to ChemSpider. Our manually annotated gold-standard data on 296 abstracts are available as supplementary materials. Metabolite names and, where appropriate, structures are also available as supplementary materials.
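The abstract reports an F-score of 78.49 at a precision of 83.02 but does not state recall. Assuming the standard F1 definition, F1 = 2PR / (P + R), the implied recall can be recovered algebraically:

```python
def recall_from_f1(f1: float, precision: float) -> float:
    """Solve F1 = 2PR/(P+R) for recall given F1 and precision."""
    return f1 * precision / (2 * precision - f1)

# Recall implied by the reported figures (F1 = 78.49, P = 83.02).
implied_recall = recall_from_f1(78.49, 83.02)
print(round(implied_recall, 2))  # ≈ 74.43
```

So the tool trades a few points of recall for its comparatively high precision, which is a common profile for dictionary-assisted recognisers.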

    Gene and protein nomenclature in public databases

    BACKGROUND: Frequently, several alternative names are in use for biological objects such as genes and proteins. Applications like manual literature search, automated text-mining, named entity identification, gene/protein annotation, and linking of knowledge from different information sources require knowledge of all names used to refer to a given gene or protein. Various organism-specific or general public databases aim at organizing knowledge about genes and proteins. These databases can be used for deriving gene and protein name dictionaries. So far, little is known about the differences between databases in terms of size, ambiguities and overlap. RESULTS: We compiled five gene and protein name dictionaries for each of the five model organisms (yeast, fly, mouse, rat, and human) from different organism-specific and general public databases. We analyzed the degree of ambiguity of gene and protein names within and between dictionaries and against a lexicon of common English words and domain-related non-gene terms, and we compared the different data sources in terms of the size of the extracted dictionaries and the overlap of synonyms between them. The study shows that the number of genes/proteins and synonyms covered in individual databases varies significantly for a given organism, and that the degree of ambiguity of synonyms varies significantly between different organisms. Furthermore, it shows that, despite considerable efforts of co-curation, the overlap of synonyms in different data sources is rather moderate and that the degree of ambiguity of gene names with common English words and domain-related non-gene terms varies depending on the considered organism. CONCLUSION: In conclusion, these results indicate that the combination of data contained in different databases allows the generation of gene and protein name dictionaries that contain significantly more used names than dictionaries obtained from individual data sources.
    Furthermore, curation of combined dictionaries considerably increases size and decreases ambiguity. The entries of the curated synonym dictionary are available for manual querying, editing, and PubMed or Google search via the ProThesaurus-wiki. For automated querying via custom software, we offer a web service and an exemplary client application.
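The two measurements at the heart of this study, name ambiguity and synonym overlap between data sources, can be sketched on toy data. The dictionaries below (synonym → set of gene identifiers) are hypothetical, chosen only to show the shape of the analysis:

```python
def ambiguous_names(dictionary):
    """Names mapped to more than one gene identifier."""
    return {name for name, ids in dictionary.items() if len(ids) > 1}

def synonym_overlap(d1, d2):
    """Directional overlap: fraction of d1's synonyms also present in d2."""
    shared = set(d1) & set(d2)
    return len(shared) / len(d1)

# Hypothetical dictionaries from two data sources.
d1 = {"p53": {"GeneA"}, "tp53": {"GeneA"}, "cat": {"GeneB", "GeneC"}}
d2 = {"p53": {"GeneA"}, "trp53": {"GeneA"}}
```

Note that "cat" is doubly problematic: it is ambiguous between two genes and collides with a common English word, the two kinds of ambiguity the study quantifies.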

    Solid 4He and the Supersolid Phase: from Theoretical Speculation to the Discovery of a New State of Matter? A Review of the Past and Present Status of Research

    The possibility of a supersolid state of matter, i.e., a crystalline solid exhibiting superfluid properties, first appeared in theoretical studies about forty years ago. After a long period of little interest due to the lack of experimental evidence, it has attracted strong experimental and theoretical attention in the last few years since Kim and Chan (Penn State, USA) reported evidence for nonclassical rotational inertia effects, a typical signature of superfluidity, in samples of solid 4He. Since this "first observation", other experimental groups have observed such effects in the response to the rotation of samples of crystalline helium, and it has become clear that the response of the solid is extremely sensitive to growth conditions, annealing processes, and 3He impurities. A peak in the specific heat in the same range of temperatures has been reported as well as anomalies in the elastic behaviour of solid 4He with a strong resemblance to the phenomena revealed by torsional oscillator experiments. Very recently, the observation of unusual mass transport in hcp solid 4He has also been reported, suggesting superflow. From the theoretical point of view, powerful simulation methods have been used to study solid 4He, but the interpretation of the data is still rather difficult; dealing with the question of supersolidity means that one has to face not only the problem of the coexistence of quantum coherence phenomena and crystalline order, exploring the realm of spontaneous symmetry breaking and quantum field theory, but also the problem of the role of disorder, i.e., how defects, such as vacancies, impurities, dislocations, and grain boundaries, participate in the phase transition mechanism. (Comment: Published in J. Phys. Soc. Jpn., Vol. 77, No. 11, p. 11101.)

    Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser in the chemistry domain, OSCAR, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. They also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern-recognition-based classifiers). Results indicate that, for chemistry in particular, eliminating the noise generated by tokenisation leads to slightly better named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84%, against 84.23% for OSCAR.
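Why tokenisation dominates chemical NER performance can be illustrated directly: a tokeniser that splits on all punctuation fragments systematic chemical names, so the downstream classifier never sees the full entity span. Both tokenisers below are simplified stand-ins written for this sketch, not OSCAR's actual components:

```python
import re

def naive_tokenize(text):
    """Split on every non-alphanumeric character (fragments chemical names)."""
    return re.findall(r"[A-Za-z0-9]+", text)

def chem_aware_tokenize(text):
    """Keep hyphens, commas and brackets inside chemical-like tokens."""
    return re.findall(r"[A-Za-z0-9][A-Za-z0-9,()\-\[\]]*|[^\sA-Za-z0-9]", text)

name = "2,4-dinitrophenol"
```

The naive tokeniser turns "2,4-dinitrophenol" into three meaningless fragments, while the chemistry-aware one preserves it as a single candidate token for the classifier, which is the kind of noise elimination the results above quantify.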