A realistic assessment of methods for extracting gene/protein interactions from free text
Background: The automated extraction of gene and/or protein interactions from the literature is one of the most important targets of biomedical text mining research. In this paper we present a realistic evaluation of gene/protein interaction mining relevant to potential non-specialist users. Hence we have specifically avoided methods that are complex to install or require reimplementation, and we coupled our chosen extraction methods with a state-of-the-art biomedical named entity tagger. Results: Our results show that performance across different evaluation corpora is extremely variable; that the use of tagged (as opposed to gold standard) gene and protein names has a significant impact on performance, with a drop in F-score of over 20 percentage points being commonplace; and that a simple keyword-based benchmark algorithm, when coupled with a named entity tagger, outperforms two of the tools most widely used to extract gene/protein interactions. Conclusion: In terms of availability, ease of use and performance, the potential non-specialist user community interested in automatically extracting gene and/or protein interactions from free text is poorly served by current tools and systems. The public release of extraction tools that are easy to install and use, and that achieve state-of-the-art levels of performance, should be treated as a high priority by the biomedical text mining community.
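To make the idea of a keyword-based benchmark concrete, here is a minimal sketch in Python. It is not the algorithm evaluated in the paper: the keyword list, the sentence, the entity names and the function names `extract_pairs` and `f_score` are all assumptions chosen for illustration. The sketch proposes every pair of tagged entities in a sentence that also contains an interaction keyword, then scores the proposals against a gold standard with the usual F-score.

```python
import re
from itertools import combinations

# Hypothetical keyword list; the list used in the paper is not given here.
INTERACTION_KEYWORDS = {"binds", "interacts", "phosphorylates", "activates", "inhibits"}

def extract_pairs(sentence, entities):
    """Propose every pair of entities found in a sentence that also
    contains at least one interaction keyword."""
    tokens = set(re.findall(r"[A-Za-z0-9-]+", sentence.lower()))
    if not tokens & INTERACTION_KEYWORDS:
        return set()
    present = sorted(e for e in entities if e.lower() in tokens)
    return {frozenset(pair) for pair in combinations(present, 2)}

def f_score(predicted, gold):
    """Harmonic mean of precision and recall over sets of entity pairs."""
    tp = len(predicted & gold)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up example sentence and gold-standard interaction.
sentence = "RAD51 binds BRCA2 in the presence of TP53."
gold = {frozenset({"RAD51", "BRCA2"})}

# Gold-standard entity names versus a tagger that missed one name.
with_gold_names = extract_pairs(sentence, {"RAD51", "BRCA2", "TP53"})
with_tagged_names = extract_pairs(sentence, {"RAD51", "TP53"})

print(round(f_score(with_gold_names, gold), 3))
print(round(f_score(with_tagged_names, gold), 3))
```

In this toy case a single name missed by the tagger removes the only correct pair, which illustrates why scores obtained with tagged names can fall well below those obtained with gold standard names.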
Benchmarking natural-language parsers for biological applications using dependency graphs
BACKGROUND: Interest is growing in the application of syntactic parsers to natural language processing problems in biology, but assessing their performance is difficult because differences in linguistic convention can falsely appear to be errors. We present a method for evaluating their accuracy using an intermediate representation based on dependency graphs, in which the semantic relationships important in most information extraction tasks are closer to the surface. We also demonstrate how this method can be easily tailored to various application-driven criteria. RESULTS: Using the GENIA corpus as a gold standard, we tested four open-source parsers which have been used in bioinformatics projects. We first present overall performance measures, and then test the two leading tools, the Charniak-Lease and Bikel parsers, on subtasks tailored to reflect the requirements of a system for extracting gene expression relationships. These two tools clearly outperform the other parsers in the evaluation, and achieve accuracy levels comparable to or exceeding native dependency parsers on similar tasks in previous biological evaluations. CONCLUSION: Evaluating using dependency graphs allows parsers to be tested easily on criteria chosen according to the semantics of particular biological applications, drawing attention to important mistakes and soaking up many insignificant differences that would otherwise be reported as errors. Generating high-accuracy dependency graphs from the output of phrase-structure parsers also provides access to the more detailed syntax trees that are used in several natural-language processing techniques.
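One way to picture a dependency-graph evaluation with an application-driven criterion is sketched below in Python. This is an assumption-laden illustration, not the paper's procedure: parses are reduced to sets of unlabeled (head, dependent) edges, the example edges are invented, and the `keep` filter and the name `edge_scores` are hypothetical.

```python
def edge_scores(predicted, gold, keep=lambda edge: True):
    """Precision and recall over (head, dependent) edges.
    `keep` selects the edges that matter for a given application."""
    pred = {e for e in predicted if keep(e)}
    ref = {e for e in gold if keep(e)}
    tp = len(pred & ref)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    return precision, recall

# Invented edges for a sentence fragment; not taken from GENIA.
gold = {("induces", "IL-2"), ("induces", "expression"), ("expression", "the")}
parsed = {("induces", "IL-2"), ("induces", "expression"), ("induces", "the")}

# Overall score: the misattached determiner counts as an error.
print(edge_scores(parsed, gold))

# Application-driven score: ignore determiner attachments entirely.
def ignore_determiners(edge):
    return edge[1] != "the"

print(edge_scores(parsed, gold, keep=ignore_determiners))
```

The second call scores only the edges relevant to the hypothetical extraction task, so a difference in determiner attachment no longer lowers the result.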
Information Routing Driven by Background Chatter in a Signaling Network
Living systems are capable of processing multiple sources of information simultaneously. This is true even at the cellular level, where not only do coexisting signals stimulate the cell, but fluctuating conditions also play a significant role. When information is received by a cell signaling network via one specific input, the existence of other stimuli can provide a background activity (or chatter) that may affect signal transmission through the network and, therefore, the response of the cell. Here we study the modulation of information processing by chatter in the signaling network of a human cell, specifically, in a Boolean model of the signal transduction network of a fibroblast. We observe that the level of external chatter shapes the response of the system to information-carrying signals in a nontrivial manner, modulates the activity levels of the network outputs, and effectively determines the paths of information flow. Our results show that the interactions and node dynamics, far from being random, confer versatility to the signaling network and allow transitions between different information-processing scenarios.
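The effect of chatter on a Boolean network can be illustrated with a deliberately tiny toy model in Python. Everything here is an assumption made for illustration: the three-node wiring, the update rules, the chatter probability `q` and the function names are hypothetical and have no connection to the fibroblast network studied in the paper.

```python
import random

def step(state, signal, chatter):
    """One synchronous update of a toy three-node Boolean network."""
    return {
        "a": signal or chatter,        # activated by either input
        "b": signal and not chatter,   # activated by the signal, blocked by chatter
        "out": state["a"] and state["b"],
    }

def mean_output(signal, q, steps=2000, seed=0):
    """Average activity of the output node when the background input
    is switched on at random with probability q at each time step."""
    rng = random.Random(seed)
    state = {"a": False, "b": False, "out": False}
    total = 0
    for _ in range(steps):
        chatter = rng.random() < q
        state = step(state, signal, chatter)
        total += state["out"]
    return total / steps

for q in (0.0, 0.5, 1.0):
    print(q, mean_output(True, q))
```

In this toy wiring the output follows the signal when there is no chatter and is silenced when chatter is always on, so the chatter level sets how much of the signal reaches the output.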
Relaxation dynamics and frequency response of a noisy cell signaling network
We investigate the dynamics of cell signaling using an experimentally based Boolean model of the human fibroblast signal transduction network. We determine via systematic numerical simulations the relaxation dynamics of the network in response to a constant set of inputs, both in the absence and in the presence of environmental fluctuations. We then study the network's response to periodically modulated signals, uncovering different types of behaviors for different pairs of driven input and output nodes. The phenomena observed include low-pass, high-pass, and band-pass filtering of the input modulations, among other nontrivial responses, at frequencies around the relaxation frequency of the network. The results reveal that the dynamic response to the external modulation of biologically realistic signaling networks is versatile and robust to noise.
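Low-pass filtering in a Boolean system can be shown with a two-node toy example in Python. This is a sketch under stated assumptions, not the model from the paper: the wiring (an output that needs the input to be on for two consecutive steps), the square-wave drive and the names `square_wave` and `output_fraction` are all hypothetical.

```python
def square_wave(t, half_period):
    """Periodic Boolean input: on for half_period steps, then off."""
    return (t // half_period) % 2 == 0

def output_fraction(half_period, steps=240):
    """Fraction of time the output is on under a square-wave input.
    The output fires only if the input is on now and was on one step ago."""
    x1 = out = False
    on = 0
    for t in range(steps):
        s = square_wave(t, half_period)
        # Synchronous update: x1 remembers the input, out combines both.
        x1, out = s, s and x1
        on += out
    return on / steps

for half_period in (1, 2, 4):
    print(half_period, output_fraction(half_period))
```

The fastest modulation never sustains the input long enough to switch the output on, while slower modulations pass through with increasing strength, which is the qualitative signature of a low-pass response.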