36 research outputs found

    Caipirini: using gene sets to rank literature

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Keeping up-to-date with bioscience literature is becoming increasingly challenging. Several recent methods help meet this challenge by allowing literature search to be launched based on lists of abstracts that the user judges to be 'interesting'. Some methods go further by allowing the user to provide a second input set of 'uninteresting' abstracts; these two input sets are then used to search and rank literature by relevance. In this work we present the service 'Caipirini' (<url>http://caipirini.org</url>) that also allows two input sets, but takes the novel approach of allowing ranking of literature based on one or more sets of genes.</p> <p>Results</p> <p>To evaluate the usefulness of Caipirini, we used two test cases, one related to the human cell cycle, and a second related to disease defense mechanisms in <it>Arabidopsis thaliana</it>. In both cases, the new method achieved high precision in finding literature related to the biological mechanisms underlying the input data sets.</p> <p>Conclusions</p> <p>To our knowledge Caipirini is the first service enabling literature search directly based on biological relevance to gene sets; thus, Caipirini gives the research community a new way to unlock hidden knowledge from gene sets derived via high-throughput experiments.</p

    Using Workflows to Explore and Optimise Named Entity Recognition for Chemistry

    Get PDF
    Chemistry text mining tools should be interoperable and adaptable regardless of system-level implementation, installation or even programming issues. We aim to abstract the functionality of these tools from the underlying implementation via reconfigurable workflows for automatically identifying chemical names. To achieve this, we refactored an established named entity recogniser (in the chemistry domain), OSCAR and studied the impact of each component on the net performance. We developed two reconfigurable workflows from OSCAR using an interoperable text mining framework, U-Compare. These workflows can be altered using the drag-&-drop mechanism of the graphical user interface of U-Compare. These workflows also provide a platform to study the relationship between text mining components such as tokenisation and named entity recognition (using maximum entropy Markov model (MEMM) and pattern recognition based classifiers). Results indicate that, for chemistry in particular, eliminating noise generated by tokenisation techniques lead to a slightly better performance than others, in terms of named entity recognition (NER) accuracy. Poor tokenisation translates into poorer input to the classifier components which in turn leads to an increase in Type I or Type II errors, thus, lowering the overall performance. On the Sciborg corpus, the workflow based system, which uses a new tokeniser whilst retaining the same MEMM component, increases the F-score from 82.35% to 84.44%. On the PubMed corpus, it recorded an F-score of 84.84% as against 84.23% by OSCAR

    Stringent response of Escherichia coli: revisiting the bibliome using literature mining

    Get PDF
    Understanding the mechanisms responsible for cellular responses depends on the systematic collection and analysis of information on the main biological concepts involved. Indeed, the identification of biologically relevant concepts in free text, namely genes, tRNAs, mRNAs, gene products and small molecules, is crucial to capture the structure and functioning of different responses. Results In this work, we review literature reports on the study of the stringent response in Escherichia coli. Rather than undertaking the development of a highly specialised literature mining approach, we investigate the suitability of concept recognition and statistical analysis of concept occurrence as means to highlight the concepts that are most likely to be biologically engaged during this response. The co-occurrence analysis of core concepts in this stringent response, i.e. the (p)ppGpp nucleotides with gene products was also inspected and suggest that besides the enzymes RelA and SpoT that control the basal levels of (p)ppGpp nucleotides, many other proteins have a key role in this response. Functional enrichment analysis revealed that basic cellular processes such as metabolism, transcriptional and translational regulation are central, but other stress-associated responses might be elicited during the stringent response. In addition, the identification of less annotated concepts revealed that some (p)ppGpp-induced functional activities are still overlooked in most reviews. Conclusions In this paper we applied a literature mining approach that offers a more comprehensive analysis of the stringent response in E. coli. The compilation of relevant biological entities to this stress response and the assessment of their functional roles provided a more systematic understanding of this cellular response. Overlooked regulatory entities, such as transcriptional regulators, were found to play a role in this stress response. Moreover, the involvement of other stress-associated concepts demonstrates the complexity of this cellular response

    Diagnostic accuracy on fat-saturated T

    No full text
    corecore