Search CORE

36 research outputs found

Caipirini: using gene sets to rank literature

Author: A Barbosa-Silva
Adriano Barbosa-Silva
AM Cohen
Ana Carolina Wanderley-Nogueira
C Nobata
GB Martin
Georgios A Pavlopoulos
GL Poulter
H Kessman
H Kilicoglu
J Lewis
J-B Morel
JF Fontaine
KA Pattin
LJ Jensen
N Polavarapu
Nina Mota Soares-Cavalcanti
PK Shah
R Altman
R Rodriguez-Esteban
Reinhard Schneider
S Yu
Seán I O'Donoghue
T Etzold
T Goetz
T Soldatos
T Tuchler
Theodoros G Soldatos
Venkata P Satagopam
W Yu
Publication venue: BioMed Central
Publication date: 01/01/2012
Background: Keeping up-to-date with bioscience literature is becoming increasingly challenging. Several recent methods help meet this challenge by allowing literature search to be launched based on lists of abstracts that the user judges to be 'interesting'. Some methods go further by allowing the user to provide a second input set of 'uninteresting' abstracts; these two input sets are then used to search and rank literature by relevance. In this work we present the service 'Caipirini' (http://caipirini.org) that also allows two input sets, but takes the novel approach of allowing ranking of literature based on one or more sets of genes. Results: To evaluate the usefulness of Caipirini, we used two test cases, one related to the human cell cycle, and a second related to disease defense mechanisms in Arabidopsis thaliana. In both cases, the new method achieved high precision in finding literature related to the biological mechanisms underlying the input data sets. Conclusions: To our knowledge Caipirini is the first service enabling literature search directly based on biological relevance to gene sets; thus, Caipirini gives the research community a new way to unlock hidden knowledge from gene sets derived via high-throughput experiments.

Crossref

Springer - Publisher Connector

Directory of Open Access Journals

PubMed Central

UNSWorks

MDC Repository

Open Repository and Bibliography - Luxembourg

Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

Author: A Copestake
A Tiwari
Apache
B Florian
B Ludascher
B Mellebeek
B Muller
BalaKrishna Kolluru
C Kolarik
C Kolrik
C Nobata
C Steinbeck
CJ Rupp
D Banville
D Ferrucci
D Jiao
I Taylor
J Shon
J Wren
JA Townsend
Junichi Tsujii
K Hettne
Lezan Hawizy
M Hassan
N Kemp
P Corbett
P Murray-Rust
Peter Murray-Rust
R Klinger
SG Vellay
Sophia Ananiadou
T Kuhn
T Oinn
Tim J. Hubbard
WJ Wilbur
Y Kano
Y Miyao
Y Tsuruoka
Publication venue: Public Library of Science
Publication date: 01/01/2011
Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored OSCAR, an established named entity recogniser for the chemistry domain, and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-and-drop mechanism of the U-Compare graphical user interface. They also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques leads to slightly better performance than other approaches, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components, which in turn leads to an increase in Type I or Type II errors, thus lowering the overall performance. On the Sciborg corpus, the workflow-based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84%, compared with 84.23% for OSCAR.
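As a rough illustration of why additional Type I (false positive) and Type II (false negative) errors lower the reported score, the F-score can be computed directly from error counts. The sketch below is not taken from the paper: the function name `f_score` and all counts are hypothetical, and it assumes the balanced F1 measure by default.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from true positives, false positives (Type I errors)
    and false negatives (Type II errors)."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only, not results from the paper.
noisy_tokens = f_score(tp=80, fp=20, fn=20)
cleaner_tokens = f_score(tp=85, fp=15, fn=15)
print(f"noisy tokenisation:   F = {noisy_tokens:.4f}")
print(f"cleaner tokenisation: F = {cleaner_tokens:.4f}")
```

Fewer errors of either kind raise precision, recall, or both, so the F-score rises with them.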

Public Library of Science (PLOS)

Crossref

Directory of Open Access Journals

PubMed Central

The University of Manchester - Institutional Repository

Stringent response of Escherichia coli: revisiting the bibliome using literature mining

Background: Understanding the mechanisms responsible for cellular responses depends on the systematic collection and analysis of information on the main biological concepts involved. Indeed, the identification of biologically relevant concepts in free text, namely genes, tRNAs, mRNAs, gene products and small molecules, is crucial to capture the structure and functioning of different responses. Results: In this work, we review literature reports on the study of the stringent response in Escherichia coli. Rather than undertaking the development of a highly specialised literature mining approach, we investigate the suitability of concept recognition and statistical analysis of concept occurrence as means to highlight the concepts that are most likely to be biologically engaged during this response. The co-occurrence of the core concepts in this stringent response, i.e. the (p)ppGpp nucleotides, with gene products was also inspected and suggests that, besides the enzymes RelA and SpoT that control the basal levels of (p)ppGpp nucleotides, many other proteins have a key role in this response. Functional enrichment analysis revealed that basic cellular processes such as metabolism and transcriptional and translational regulation are central, but other stress-associated responses might be elicited during the stringent response. In addition, the identification of less annotated concepts revealed that some (p)ppGpp-induced functional activities are still overlooked in most reviews. Conclusions: In this paper we applied a literature mining approach that offers a more comprehensive analysis of the stringent response in E. coli. The compilation of biological entities relevant to this stress response and the assessment of their functional roles provided a more systematic understanding of this cellular response. Overlooked regulatory entities, such as transcriptional regulators, were found to play a role in this stress response. Moreover, the involvement of other stress-associated concepts demonstrates the complexity of this cellular response.
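The co-occurrence analysis mentioned in this abstract can be pictured with a minimal sketch, assuming each document has already been reduced to the list of concepts recognised in it. The function name `cooccurrence_counts` and the example documents are hypothetical; `ProteinX` is a placeholder, and the paper's actual pipeline and statistics are not reproduced here.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """For each unordered pair of concepts, count the documents mentioning both."""
    counts = Counter()
    for concepts in docs:
        # De-duplicate within a document and sort so each pair has one key.
        for a, b in combinations(sorted(set(concepts)), 2):
            counts[(a, b)] += 1
    return counts


# Hypothetical concept annotations, one list per document.
docs = [
    ["(p)ppGpp", "RelA", "SpoT"],
    ["(p)ppGpp", "RelA", "RelA"],
    ["SpoT", "ProteinX"],
]

counts = cooccurrence_counts(docs)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that share many documents, such as a nucleotide and an enzyme, are candidates for closer inspection; raw counts like these would normally be followed by a statistical test, which is left out of this sketch.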

Universidade do Minho: RepositoriUM

Crossref

Springer - Publisher Connector

PubMed Central