An improved hidden vector state model approach and its adaptation in extracting protein interaction information from biomedical literature
A large body of knowledge that is important for biological researchers seeking to unveil the mechanisms of life is often hidden in the literature, such as journal articles, reports and books. Many approaches to extracting information from unstructured text, such as pattern matching and shallow and full parsing, have been proposed, especially for biomedical applications. In this paper, we present an information extraction system that employs a semantic parser based on the Hidden Vector State (HVS) model to extract protein-protein interactions. We found that it performed better than other established statistical methods, achieving 58.3% recall and 76.8% precision. Moreover, the purely data-driven HVS model can be easily adapted to other domains, a property that is rarely discussed or offered by other approaches. Experimental results show that a model trained on one domain can still generate satisfactory results when shifted to another domain with a small amount of adaptation training data.
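The abstract above reports recall and precision separately. As a minimal sketch, the two figures can be combined into a single F-score using the standard harmonic-mean definition; the function name `f_score` and the `beta` parameter are illustrative choices, and the resulting value is derived here rather than quoted from the paper.

```python
def f_score(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall (F1 when beta == 1)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)


# Figures taken from the abstract: 76.8% precision, 58.3% recall.
f1 = f_score(precision=0.768, recall=0.583)
print(f"F1 = {f1:.3f}")  # about 0.663
```

With equal weighting, the reported figures correspond to an F1 of roughly 66%.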
A Labeled Graph Kernel for Relationship Extraction
In this paper, we propose an approach for Relationship Extraction (RE) based
on labeled graph kernels. The kernel we propose is a particularization of a
random walk kernel that exploits two properties previously studied in the RE
literature: (i) the words between the candidate entities or connecting them in
a syntactic representation are particularly likely to carry information
regarding the relationship; and (ii) combining information from distinct
sources in a kernel may help the RE system make better decisions. We performed
experiments on a dataset of protein-protein interactions and the results show
that our approach obtains effectiveness values that are comparable with the
state-of-the-art kernel methods. Moreover, our approach is able to outperform
the state-of-the-art kernels when combined with other kernel methods.
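To make the idea of a random walk kernel on labeled graphs concrete, here is a minimal generic sketch. It is not the particular kernel proposed in the paper: it simply counts label-matching walks shared by two directed graphs, up to a fixed length, with a geometric decay. The graph encoding (a label dictionary plus an edge list), the function name and the `max_len` and `decay` parameters are all assumptions made for illustration.

```python
from collections import defaultdict


def random_walk_kernel(g1, g2, max_len=3, decay=0.5):
    """Count common label-matching walks of length 0..max_len in two graphs.

    Each graph is a pair (labels, edges): labels maps node -> label,
    edges is a list of directed (source, target) node pairs.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    adj1, adj2 = defaultdict(list), defaultdict(list)
    for u, v in edges1:
        adj1[u].append(v)
    for u, v in edges2:
        adj2[u].append(v)

    # Nodes of the direct product graph: pairs of nodes with equal labels.
    pairs = {(u, v) for u in labels1 for v in labels2
             if labels1[u] == labels2[v]}

    # counts[p] = number of common walks of the current length ending at p.
    counts = {p: 1 for p in pairs}
    total = float(len(pairs))  # walks of length 0
    for k in range(1, max_len + 1):
        nxt = {p: 0 for p in pairs}
        for (u, v), c in counts.items():
            if c == 0:
                continue
            for u2 in adj1[u]:
                for v2 in adj2[v]:
                    if (u2, v2) in nxt:
                        nxt[(u2, v2)] += c
        counts = nxt
        total += decay ** k * sum(counts.values())
    return total


# Toy graphs: PROT -> verb -> PROT chains, differing only in the verb label.
binds = ({0: "PROT", 1: "binds", 2: "PROT"}, [(0, 1), (1, 2)])
inhibits = ({0: "PROT", 1: "inhibits", 2: "PROT"}, [(0, 1), (1, 2)])

print(random_walk_kernel(binds, binds))     # 6.25
print(random_walk_kernel(binds, inhibits))  # 4.0
```

The identical graphs share walks that pass through the connecting word, while the mismatched pair only shares the isolated protein nodes. This loosely mirrors property (i) in the abstract, namely that the words connecting the candidate entities carry much of the relational signal.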
Extracting protein-protein interactions from text using rich feature vectors and feature selection
Because of the intrinsic complexity of natural language, automatically extracting accurate information from text remains a challenge. We have applied rich feature vectors derived from dependency graphs to predict protein-protein interactions using machine learning techniques. We present the first extensive analysis of applying feature selection in this domain, and show that it can produce more cost-effective models. For the first time, our technique was also evaluated on several large-scale cross-dataset experiments, which offers a more realistic view on model performance.
During benchmarking, we encountered several fundamental problems hindering comparability with other methods. We present a set of practical guidelines to set up a meaningful evaluation.
Finally, we have analysed the feature sets from our experiments before and after feature selection, and evaluated the contribution of both lexical and syntactic information to our method. The gained insight will be useful to develop better performing methods in this domain.
A realistic assessment of methods for extracting gene/protein interactions from free text
Background: The automated extraction of gene and/or protein interactions from the literature is one of the most important targets of biomedical text mining research. In this paper we present a realistic evaluation of gene/protein interaction mining relevant to potential non-specialist users. Hence we have specifically avoided methods that are complex to install or require reimplementation, and we coupled our chosen extraction methods with a state-of-the-art biomedical named entity tagger. Results: Our results show: that performance across different evaluation corpora is extremely variable; that the use of tagged (as opposed to gold standard) gene and protein names has a significant impact on performance, with a drop in F-score of over 20 percentage points being commonplace; and that a simple keyword-based benchmark algorithm when coupled with a named entity tagger outperforms two of the tools most widely used to extract gene/protein interactions. Conclusion: In terms of availability, ease of use and performance, the potential non-specialist user community interested in automatically extracting gene and/or protein interactions from free text is poorly served by current tools and systems. The public release of extraction tools that are easy to install and use, and that achieve state-of-the-art levels of performance, should be treated as a high priority by the biomedical text mining community.
Biomedical Event Extraction with Machine Learning
Biomedical natural language processing (BioNLP) is a subfield of natural
language processing, an area of computational linguistics concerned with
developing programs that work with natural language: written texts and
speech. Biomedical relation extraction concerns the detection of semantic
relations such as protein-protein interactions (PPI) from scientific texts.
The aim is to enhance information retrieval by detecting relations between
concepts, not just individual concepts as with a keyword search.
In recent years, events have been proposed as a more detailed alternative
for simple pairwise PPI relations. Events provide a systematic, structural
representation for annotating the content of natural language texts. Events
are characterized by annotated trigger words, directed and typed arguments
and the ability to nest other events. For example, the sentence “Protein A
causes protein B to bind protein C” can be annotated with the nested event
structure CAUSE(A, BIND(B, C)). Converted to such formal representations,
the information of natural language texts can be used by computational
applications. Biomedical event annotations were introduced by the
BioInfer and GENIA corpora, and event extraction was popularized by the
BioNLP'09 Shared Task on Event Extraction.
In this thesis we present a method for automated event extraction, implemented
as the Turku Event Extraction System (TEES). A unified graph
format is defined for representing event annotations and the problem of
extracting complex event structures is decomposed into a number of independent
classification tasks. These classification tasks are solved using SVM
and RLS classifiers, utilizing rich feature representations built from full dependency
parsing. Building on earlier work on pairwise relation extraction
and using a generalized graph representation, the resulting TEES system is
capable of detecting binary relations as well as complex event structures.
We show that this event extraction system has good performance, reaching
the first place in the BioNLP'09 Shared Task on Event Extraction.
Subsequently, TEES has achieved several first ranks in the BioNLP'11 and
BioNLP'13 Shared Tasks, as well as shown competitive performance in the
binary relation Drug-Drug Interaction Extraction 2011 and 2013 shared
tasks.
The Turku Event Extraction System is published as a freely available
open-source project, documenting the research in detail as well as making
the method available for practical applications. In particular, in this thesis
we describe the application of the event extraction method to PubMed-scale
text mining, showing how the developed approach not only shows good
performance, but is generalizable and applicable to large-scale real-world
text mining projects.
Finally, we discuss related literature, summarize the contributions of the
work and present some thoughts on future directions for biomedical event
extraction. This thesis includes and builds on six original research publications.
The first of these introduces the analysis of dependency parses that
leads to development of TEES. The entries in the three BioNLP Shared
Tasks, as well as in the DDIExtraction 2011 task are covered in four publications,
and the sixth one demonstrates the application of the system to
PubMed-scale text mining.
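The nested event structure CAUSE(A, BIND(B, C)) given as an example in the abstract above can be modelled as a small recursive data type. This is a minimal sketch for illustration only, not the TEES graph format: the class names `Entity` and `Event` and the helpers `render` and `depth` are hypothetical, and the typed argument roles mentioned in the abstract are simplified to positional arguments.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Entity:
    """A named entity such as a protein mention."""
    name: str


@dataclass(frozen=True)
class Event:
    """An event with a type and ordered arguments, which may be events."""
    event_type: str
    args: Tuple[Union[Entity, "Event"], ...]


def render(node) -> str:
    """Format an entity or event in the TYPE(arg, ...) notation."""
    if isinstance(node, Entity):
        return node.name
    return f"{node.event_type}({', '.join(render(a) for a in node.args)})"


def depth(node) -> int:
    """Nesting depth: 0 for an entity, 1 for an event over entities only."""
    if isinstance(node, Entity):
        return 0
    return 1 + max((depth(a) for a in node.args), default=0)


# "Protein A causes protein B to bind protein C"
bind = Event("BIND", (Entity("B"), Entity("C")))
cause = Event("CAUSE", (Entity("A"), bind))

print(render(cause))  # CAUSE(A, BIND(B, C))
print(depth(cause))   # 2
```

A pairwise PPI relation corresponds to a depth-1 event here, which is one way to see why events are described as a more detailed alternative to binary relations.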
OpenDMAP: An open source, ontology-driven concept analysis engine, with applications to capturing knowledge regarding protein transport, protein interactions and cell-type-specific gene expression
Background: Information extraction (IE) efforts are widely acknowledged to be important in harnessing the rapid advance of biomedical knowledge, particularly in areas where important factual information is published in a diverse literature. Here we report on the design, implementation and several evaluations of OpenDMAP, an ontology-driven, integrated concept analysis system. It significantly advances the state of the art in information extraction by leveraging knowledge in ontological resources, integrating diverse text processing applications, and using an expanded pattern language that allows the mixing of syntactic and semantic elements and variable ordering. Results: OpenDMAP information extraction systems were produced for extracting protein transport assertions (transport), protein-protein interaction assertions (interaction) and assertions that a gene is expressed in a cell type (expression). Evaluations were performed on each system, resulting in F-scores ranging from 0.26 to 0.72 (precision 0.39 to 0.85, recall 0.16 to 0.85). Additionally, each of these systems was run over all abstracts in MEDLINE, producing a total of 72,460 transport instances, 265,795 interaction instances and 176,153 expression instances. Conclusion: OpenDMAP advances the performance standards for extracting protein-protein interaction predications from the full texts of biomedical research articles. Furthermore, this level of performance appears to generalize to other information extraction tasks, including extracting information about predicates of more than two arguments. The output of the information extraction system is always constructed from elements of an ontology, ensuring that the knowledge representation is grounded with respect to a carefully constructed model of reality. The results of these efforts can be used to increase the efficiency of manual curation efforts and to provide additional features in systems that integrate multiple sources for information extraction. The open source OpenDMAP code library is freely available at http://bionlp.sourceforge.net/